FEAST: A Communication-efficient Federated Feature
Selection Framework for Relational Data
RUI FU, Beijing Institute of Technology, China
YUNCHENG WU, National University of Singapore, Singapore
QUANQING XU, OceanBase, Ant Group, China
MEIHUI ZHANG∗, Beijing Institute of Technology, China
Vertical federated learning (VFL) is an emerging paradigm for cross-silo organizations to build more accurate
machine learning (ML) models. In this setting, multiple organizations (i.e., parties) hold the same set of samples
with different features. However, different parties may have redundant or highly correlated features, leading
to inefficient and ineffective VFL model training. Effective feature selection in VFL is therefore essential to
mitigate such a problem and improve model effectiveness, as well as computation and communication efficiency.
To this end, in this paper, we propose a federated feature selection framework, called FEAST, which leverages
conditional mutual information (CMI) to select more informative features while having low redundancy.
Furthermore, we design a communication-efficient method to reduce the information exchanged among the
parties while protecting the parties’ raw data. Extensive experiments on four real-world datasets demonstrate
that the proposed framework achieves state-of-the-art performance in terms of accuracy, communication and
computation costs.
CCS Concepts: • Computing methodologies → Feature selection; Cooperation and coordination; Supervised
learning by classification; • Mathematics of computing → Information theory.
Additional Key Words and Phrases: feature selection, vertical federated learning, communication-efficient,
conditional mutual information
ACM Reference Format:
Rui Fu, Yuncheng Wu, Quanqing Xu, and Meihui Zhang. 2023. FEAST: A Communication-efficient Federated
Feature Selection Framework for Relational Data. Proc. ACM Manag. Data 1, 1, Article 107 (May 2023), 28 pages.
https://doi.org/10.1145/3588961
1 INTRODUCTION
Recent years have witnessed a growing interest in exploiting data from cross-silo organizations
to design more accurate machine learning (ML) [46, 59] models and provide better customer
services [16, 39, 66]. However, the raw data held by the distributed organizations cannot be shared
with each other due to privacy concerns. To this end, the federated learning (FL) [6, 41, 62] paradigm
is proposed, which enables cross-silo organizations to collaboratively build ML models without
disclosing their raw data. FL can be categorized into different settings based on the data partitioning.
In this paper, we consider the vertically-partitioned setting (aka. VFL), where the organizations
∗Meihui Zhang is the corresponding author.
Authors’ addresses: Rui Fu, Beijing Institute of Technology, Beijing, China, 3120201016@bit.edu.cn; Yuncheng Wu, National
University of Singapore, Singapore, Singapore, wuyc@comp.nus.edu.sg; Quanqing Xu, OceanBase, Ant Group, Hangzhou,
China, xuquanqing.xqq@antgroup.com; Meihui Zhang, Beijing Institute of Technology, Beijing, China, meihui_zhang@bit.edu.cn.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the
full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2836-6573/2023/5-ART107 $15.00
Proc. ACM Manag. Data, Vol. 1, No. 1, Article 107. Publication date: May 2023.