
VFDV-IM: An Efficient and Secure Vertical Federated Data Valuation
Xiaokai Zhou1, Xiao Yan2, Xinyan Li1, Hao Huang1, Quanqing Xu3, Qinbo Zhang1, Jerome Yen4, Zhaohui Cai1(B), and Jiawei Jiang1(B)

1 Wuhan University, Wuhan, Hubei Province, China
{xiaokaizhou,xinyan li,haohuang,qinbo zhang,zhcai,jiawei.jiang}@whu.edu.cn
2 Centre for Perceptual and Interactive Intelligence (CPII), Hong Kong SAR, China
3 OceanBase, Ant Group, China
xuquanqing.xqq@oceanbase.com
4 The University of Macau, Macau, China
jeromeyen@um.edu.mo
Abstract. Vertical federated learning (VFL) enables multiple participants to build a joint machine learning model upon distributed features of overlapping samples. The performance of VFL models heavily depends on the quality of participants' local data. It is therefore essential to measure the contributions of the participants for various purposes, e.g., participant selection and reward allocation. The Shapley value is widely adopted by previous works for contribution assessment. However, computing the Shapley value in VFL requires repetitive model training from scratch, incurring expensive computation and communication overheads. Motivated by this challenge, we ask in this paper: can we efficiently and securely perform data valuation for participants via the Shapley value in VFL?
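For context, the Shapley value referenced here is the standard definition from cooperative game theory; the following is only a reference sketch, where $N$ denotes the set of participants and $v(S)$ the utility (e.g., validation accuracy) of a VFL model trained on coalition $S \subseteq N$:

\[
\phi_i \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr].
\]

Evaluating this sum exactly requires the utility of every coalition, i.e., training a VFL model from scratch for each of the $2^{|N|}$ subsets, which is the overhead that motivates reusing historical training results.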
We call this problem Vertical Federated Data Valuation, and introduce VFDV-IM, a method utilizing an Inheritance Mechanism to expedite Shapley value calculations by leveraging historical training records. We first propose a simple, yet effective, strategy that directly inherits the model trained over the entire consortium. To further optimize VFDV-IM, we propose a model ensemble approach that measures the similarity of evaluated consortiums, based on which we reweight the historical models. We conduct extensive experiments on various datasets and show that our VFDV-IM can efficiently calculate the Shapley value while maintaining accuracy.
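To make the ensemble idea above concrete, here is a minimal, purely illustrative Python sketch: the function names, the Jaccard similarity, and the weighted-average combination are our own assumptions for exposition, not the exact VFDV-IM mechanism. It estimates the utility of a coalition as a similarity-weighted average over coalitions whose training results were recorded earlier, instead of retraining from scratch.

# Hypothetical sketch: reweight historical coalition records by the
# Jaccard similarity between the coalition being evaluated and the
# coalitions seen in previous training rounds. Illustrative only.
from typing import Dict, FrozenSet


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Similarity between two coalitions (sets of participant ids)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def ensemble_utility(target: FrozenSet[int],
                     history: Dict[FrozenSet[int], float]) -> float:
    """Estimate v(target) as a similarity-weighted average of the
    utilities recorded for previously trained coalitions."""
    weights = {coal: jaccard(target, coal) for coal in history}
    total = sum(weights.values())
    if total == 0.0:
        return 0.0
    return sum(w * history[coal] for coal, w in weights.items()) / total


if __name__ == "__main__":
    # Utilities (e.g., validation accuracy) of coalitions trained earlier,
    # including the full consortium {0, 1, 2}.
    history = {
        frozenset({0, 1, 2}): 0.91,
        frozenset({0, 1}): 0.86,
        frozenset({1, 2}): 0.83,
    }
    print(ensemble_utility(frozenset({0, 2}), history))

In VFDV-IM itself the reweighting is applied to historical models rather than to scalar utilities, but the reuse-and-reweight pattern is the same.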
Keywords: Vertical federated learning · Data valuation
1 Introduction
Building high-quality machine learning (ML) models usually requires collecting
and merging data from various organizations. However, due to data protection
X. Zhou and X. Yan—Equal contribution.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
M. Onizuka et al. (Eds.): DASFAA 2024, LNCS 14850, pp. 409–424, 2024.
https://doi.org/10.1007/978-981-97-5552-3_28