
241
F
3
KM: Federated, Fair, and Fast 𝑘-means
SHENGKUN ZHU, School of Computer Science, Wuhan University, China
QUANQING XU, OceanBase, Ant Group, China
JINSHAN ZENG, School of Computer and Information Engineering, Jiangxi Normal University, China
SHENG WANG
∗
, School of Computer Science, Wuhan University, China
YUAN SUN, La Trobe Business School, La Trobe University, Australia
ZHIFENG YANG, OceanBase, Ant Group, China
CHUANHUI YANG, OceanBase, Ant Group, China
ZHIYONG PENG, School of Computer Science & Big Data Institute, Wuhan University, China
This paper proposes a federated, fair, and fast
𝑘
-means algorithm (F
3
KM) to solve the fair clustering problem
eciently in scenarios where data cannot be shared among dierent parties. The proposed algorithm decom-
poses the fair
𝑘
-means problem into multiple subproblems and assigns each subproblem to a client for local
computation. Our algorithm allows each client to possess multiple sensitive attributes (or have no sensitive
attributes). We propose an in-processing method that employs the alternating direction method of multipliers
(ADMM) to solve each subproblem. During the procedure of solving subproblems, only the computation results
are exchanged between the server and the clients, without exchanging the raw data. Our theoretical analysis
shows that F
3
KM is ecient in terms of both communication and computation complexities. Specically,
it achieves a better trade-o between utility and communication complexity, and reduces the computation
complexity to linear with respect to the dataset size. Our experiments show that F
3
KM achieves a better
trade-o between utility and fairness than other methods. Moreover, F
3
KM is able to cluster ve million
points in one hour, highlighting its impressive eciency.
CCS Concepts: • Computing methodologies → Cluster analysis.
Additional Key Words and Phrases: Federated, Fair, Fast, 𝑘-means, ADMM
ACM Reference Format:
Shengkun Zhu, Quanqing Xu, Jinshan Zeng, Sheng Wang, Yuan Sun, Zhifeng Yang, Chuanhui Yang, and Zhiy-
ong Peng. 2023. F
3
KM: Federated, Fair, and Fast
𝑘
-means. Proc. ACM Manag. Data 1, 4 (SIGMOD), Article 241
(December 2023), 25 pages. https://doi.org/10.1145/3626728
1 INTRODUCTION
The rise of big data has emphasized the crucial role of clustering analysis in data science. This
statistical technique facilitates the exploration of a dataset’s internal structure by organizing data
∗
Corresponding author
Authors’ addresses: Shengkun Zhu, whuzsk66@whu.edu.cn, School of Computer Science, Wuhan University, China;
Quanqing Xu, OceanBase, Ant Group, China, xuquanqing.xqq@oceanbase.com; Jinshan Zeng, School of Computer and
Information Engineering, Jiangxi Normal University, China, jinshanzeng@jxnu.edu.cn; Sheng Wang, School of Computer
Science, Wuhan University, China, swangcs@whu.edu.cn; Yuan Sun, La Trobe Business School, La Trobe University,
Australia, yuan.sun@latrobe.edu.au; Zhifeng Yang, OceanBase, Ant Group, China, zhuweng.yzf@oceanbase.com; Chuanhui
Yang, OceanBase, Ant Group, China, rizhao.ych@oceanbase.com; Zhiyong Peng, School of Computer Science & Big Data
Institute, Wuhan University, China, peng@whu.edu.cn.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for prot or commercial advantage and that copies bear this notice and the
full citation on the rst page. Copyrights for components of this work owned by others than the author(s) must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specic permission and/or a fee. Request permissions from permissions@acm.org.
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2836-6573/2023/12-ART241 $15.00
https://doi.org/10.1145/3626728
Proc. ACM Manag. Data, Vol. 1, No. 4 (SIGMOD), Article 241. Publication date: December 2023.
BBAAD9C20180234D78A0072836F0B91042B9B20910300BA0A6D98A34B1CB2BF4AB46B438915CAB0722492008984625EBF2E9210AC1D01B511BBFC26E770E3FD824112BAD3C22B9F764942AC767674B912927EA8A7DFF741658BA519C5A64DCC8DEE6249A9E3
评论