暂无图片
暂无图片
暂无图片
暂无图片
暂无图片
SIGMOD 2025_Efficient and Accurate Differentially Private Cardinality Continual Releases_OceanBase.pdf
82
27页
0次
2025-07-30
免费下载
Eicient and Accurate Dierentially Private Cardinality
Continual Releases
DONGDONG XIE, MOE KLINNS Lab, Xi’an Jiaotong University, OceanBase, Ant Group, China
PINGHUI WANG
, MOE KLINNS Lab, Xi’an Jiaotong University, China
QUANQING XU, OceanBase, Ant Group, China
CHUANHUI YANG, OceanBase, Ant Group, China
RUNDONG LI, MOE KLINNS Lab, Xi’an Jiaotong University, China
Accurately estimating the number of unique elements that appear in data streams in real time is a funda-
mental problem with applications including network trac monitoring and real-time social media analytics.
Traditional sketch-based algorithms such as FM Sketch and HyperLogLog oer memory-friendly solutions
for cardinality estimation but fall short in scenarios where the stream elements are privacy-sensitive and
require dierential privacy. Although recent approaches have incorporated dierential privacy into the above
cardinality estimators, they are limited to single-query settings, restricting their applicability. Previous methods
for private cardinality continual release settings—i.e., releasing the cardinality after each new element in the
stream—demand large memory resources and are thus dicult to apply in practice. In this paper, we present
a novel cardinality estimation framework, FC, which ensures dierential privacy under continual releases
while simultaneously achieving low memory usage, high accuracy, and ecient computation. Our approach
innovatively leverages an ecient cardinality estimator and privacy-preserving mechanisms to overcome the
limitations of existing methods. Comprehensive experiments demonstrate that our method reduces memory
usage by up to 504 times compared to the best previous method while maintaining nearly the same accuracy.
Additionally, under identical memory constraints, our method improves the estimation accuracy by orders of
magnitude.
CCS Concepts: Security and privacy
Database and storage security; Information systems
Data management systems; Theory of computation Approximation algorithms analysis.
Additional Key Words and Phrases: Cardinality Estimation, Dierential Privacy, Stream Processing
ACM Reference Format:
Dongdong Xie, Pinghui Wang, Quanqing Xu, Chuanhui Yang, and Rundong Li. 2025. Ecient and Accurate
Dierentially Private Cardinality Continual Releases. Proc. ACM Manag. Data 3, 3 (SIGMOD), Article 151
(June 2025), 27 pages. https://doi.org/10.1145/3725288
1 Introduction
Releasing query results based on sensitive data while preserving privacy is a fundamental task that
has raised much attention. For example, [
1
] demonstrates that a database can be reconstructed
Pinghui Wang is the corresponding author.
Authors’ Contact Information: Dongdong Xie, MOE KLINNS Lab, Xi’an Jiaotong University, OceanBase, Ant Group,
Xi’an, China, xdddd1@stu.xjtu.edu.cn; Pinghui Wang, MOE KLINNS Lab, Xi’an Jiaotong University, Xi’an, China,
wangpinghui369@gmail.com; Quanqing Xu, OceanBase, Ant Group, Hangzhou, China, xuquanqing.xqq@oceanbase.com;
Chuanhui Yang, OceanBase, Ant Group, Hangzhou, China, rizhao.ych@oceanbase.com; Rundong Li, MOE KLINNS Lab,
Xi’an Jiaotong University, Xi’an, China, xjtulirundong@stu.xjtu.edu.cn.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for prot or commercial advantage and that copies bear this notice and the
full citation on the rst page. Copyrights for components of this work owned by others than the author(s) must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specic permission and/or a fee. Request permissions from permissions@acm.org.
© 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM 2836-6573/2025/6-ART151
https://doi.org/10.1145/3725288
Proc. ACM Manag. Data, Vol. 3, No. 3 (SIGMOD), Article 151. Publication date: June 2025.
BBAAD9C20180234D78A0072836F0BBD0B2B9B20A1D186BB0A1D9893CB13D2BA93B49B438015B7B0F22392208984657EB6AE9215AE1D01BB11BBFC21E724E39D7241D99AD552D898764AD2FC763F743F4560CE4C5735D62A508FC319F10D29C08DFB6269FFE3
151:2 Dongdong Xie, Pinghui Wang, anqing Xu, Chuanhui Yang, and Rundong Li
by observing sucient query results over the database and provides perturbations for avoiding
privacy leakage. In recent years, dierential privacy [
2
] (DP) has become the de facto standard for
privacy-preserving query releases and has been applied by Apple, Google, Microsoft, etc. Since it
oers an elegant and rigorous denition of privacy. When releasing the query result of a database,
the best way to preserve the privacy of an element is to just remove that element from the database
and then make queries. That leads to the principle idea of DP: deleting or altering any element
in the dataset does not signicantly change the probability distribution of the query outputs.
Consequently, observers of these outputs cannot reliably infer information about the presence of
any individual element within the dataset.
Instead of querying static datasets [
2
,
3
], in many real-world applications, queries are computed
over ever-evolving data. For example, a webpage may display the number of its visitors in real
time. Therefore, it is essential to consider dierential privacy under continual releases [
4
,
5
]. In this
setting, time is divided into discrete periods, with no more than one element coming at each period.
We can model the input as a data stream
𝒆 = (𝑒
1
, . . . , 𝑒
𝑇
)
. A new query result is given at every time
step
𝑡
, which is computed over the seen stream prex
(𝑒
1
, 𝑒
2
, . . . , 𝑒
𝑡
)
. Similar to the static setting,
DP under continual releases should ensure that for any pair of data streams that dier only at one
time period, the probability distributions of the pair of query results should be hard to distinguish.
In recent years, DP under continual releases has been extensively studied [
4
11
]. Researchers have
applied DP under continual releases for tasks such as computing private sums [
12
], answering range
queries [
13
], answering join queries on databases [
14
], and more. Among these tasks, dierentially
private cardinality continual releases (DPCCR) - which involves continually releasing the number of
unique elements in a stream where duplicates may appear - is one of the most fundamental queries
with many applications. For example, e-commerce platforms may track unique user visits to product
pages, helping store owners predict product popularity and optimize inventory management. Online
video-sharing platforms like YouTube might monitor the number of unique viewers of a video,
providing a more accurate measure of a video’s popularity than simply tracking the number of
views. The results can help users assess the quality of a video. This can also assist advertisers in
estimating the reach - the number of distinct ad exposures - of their advertisements within the
video content. Additionally, understanding the number of dierent connections can inform security
measures and resource allocation in network trac monitoring.
Directly revealing the exact cardinalities would violate DP under continual releases and leak
privacy. An adversary who observes the output cardinalities
{𝑛
1
, . . . ,𝑛
𝑇
}
could potentially deduce
the existence of specic elements in the stream, especially when combined with additional side
information. A trivial example is, suppose
𝑛
𝑇
exactly equals the element universe size, observers
can deduce that any element in the universe is in the stream. There is another example. Suppose a
webpage displays the exact number of distinct viewers in real time. The attacker sends the link of
the webpage to another one and requires the link receiver to click on the link. If the number of
distinct viewers does not grow, then the attacker deduces that the receiver has already visited the
webpage before, compromising the privacy of the link receiver. Therefore, to preserve privacy, the
disturbed cardinalities {
ˆ
𝑛
1
, . . . ,
ˆ
𝑛
𝑇
} should be released instead of the exact cardinalities.
Sketch-based methods like the Flajolet-Martin (FM) sketch [
15
] and HyperLogLog (HLL) [
16
]
are widely used for their compact memory usage and accurate cardinality estimations. They adopt
a compact data structure (called a sketch) to record summaries of the stream elements and give
estimations based on the updated data structure. Previous works [
17
19
] have proved that some of
them are inherently or can be modied to preserve DP. However, their results are restricted in the
single query settings, i.e., only output the estimation once. One may execute the above methods
at every time step. However, such a solution is not viable. In DP, the privacy budget
𝜀
serves as
a metric for quantifying the level of privacy guarantee. The formal denition of
𝜀
is provided in
Proc. ACM Manag. Data, Vol. 3, No. 3 (SIGMOD), Article 151. Publication date: June 2025.
BBAAD9C20180234D78A0072836F0BBD0B2B9B20A1D186BB0A1D9893CB13D2BA93B49B438015B7B0F22392208984657EB6AE9215AE1D01BB11BBFC21E724E39D7241D99AD552D898764AD2FC763F743F4560CE4C5735D62A508FC319F10D29C08DFB6269FFE3
of 27
免费下载
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文档的来源(墨天轮),文档链接,文档作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

关注
最新上传
暂无内容,敬请期待...
下载排行榜
Top250 周榜 月榜