151:2 Dongdong Xie, Pinghui Wang, anqing Xu, Chuanhui Yang, and Rundong Li
by observing sucient query results over the database and provides perturbations for avoiding
privacy leakage. In recent years, dierential privacy [
2
] (DP) has become the de facto standard for
privacy-preserving query releases and has been applied by Apple, Google, Microsoft, etc. Since it
oers an elegant and rigorous denition of privacy. When releasing the query result of a database,
the best way to preserve the privacy of an element is to just remove that element from the database
and then make queries. That leads to the principle idea of DP: deleting or altering any element
in the dataset does not signicantly change the probability distribution of the query outputs.
Consequently, observers of these outputs cannot reliably infer information about the presence of
any individual element within the dataset.
Instead of querying static datasets [
2
,
3
], in many real-world applications, queries are computed
over ever-evolving data. For example, a webpage may display the number of its visitors in real
time. Therefore, it is essential to consider dierential privacy under continual releases [
4
,
5
]. In this
setting, time is divided into discrete periods, with no more than one element coming at each period.
We can model the input as a data stream
𝒆 = (𝑒
1
, . . . , 𝑒
𝑇
)
. A new query result is given at every time
step
𝑡
, which is computed over the seen stream prex
(𝑒
1
, 𝑒
2
, . . . , 𝑒
𝑡
)
. Similar to the static setting,
DP under continual releases should ensure that for any pair of data streams that dier only at one
time period, the probability distributions of the pair of query results should be hard to distinguish.
In recent years, DP under continual releases has been extensively studied [
4
–
11
]. Researchers have
applied DP under continual releases for tasks such as computing private sums [
12
], answering range
queries [
13
], answering join queries on databases [
14
], and more. Among these tasks, dierentially
private cardinality continual releases (DPCCR) - which involves continually releasing the number of
unique elements in a stream where duplicates may appear - is one of the most fundamental queries
with many applications. For example, e-commerce platforms may track unique user visits to product
pages, helping store owners predict product popularity and optimize inventory management. Online
video-sharing platforms like YouTube might monitor the number of unique viewers of a video,
providing a more accurate measure of a video’s popularity than simply tracking the number of
views. The results can help users assess the quality of a video. This can also assist advertisers in
estimating the reach - the number of distinct ad exposures - of their advertisements within the
video content. Additionally, understanding the number of dierent connections can inform security
measures and resource allocation in network trac monitoring.
Directly revealing the exact cardinalities would violate DP under continual releases and leak
privacy. An adversary who observes the output cardinalities
{𝑛
1
, . . . ,𝑛
𝑇
}
could potentially deduce
the existence of specic elements in the stream, especially when combined with additional side
information. A trivial example is, suppose
𝑛
𝑇
exactly equals the element universe size, observers
can deduce that any element in the universe is in the stream. There is another example. Suppose a
webpage displays the exact number of distinct viewers in real time. The attacker sends the link of
the webpage to another one and requires the link receiver to click on the link. If the number of
distinct viewers does not grow, then the attacker deduces that the receiver has already visited the
webpage before, compromising the privacy of the link receiver. Therefore, to preserve privacy, the
disturbed cardinalities {
ˆ
𝑛
1
, . . . ,
ˆ
𝑛
𝑇
} should be released instead of the exact cardinalities.
Sketch-based methods like the Flajolet-Martin (FM) sketch [
15
] and HyperLogLog (HLL) [
16
]
are widely used for their compact memory usage and accurate cardinality estimations. They adopt
a compact data structure (called a sketch) to record summaries of the stream elements and give
estimations based on the updated data structure. Previous works [
17
–
19
] have proved that some of
them are inherently or can be modied to preserve DP. However, their results are restricted in the
single query settings, i.e., only output the estimation once. One may execute the above methods
at every time step. However, such a solution is not viable. In DP, the privacy budget
𝜀
serves as
a metric for quantifying the level of privacy guarantee. The formal denition of
𝜀
is provided in
Proc. ACM Manag. Data, Vol. 3, No. 3 (SIGMOD), Article 151. Publication date: June 2025.
BBAAD9C20180234D78A0072836F0BBD0B2B9B20A1D186BB0A1D9893CB13D2BA93B49B438015B7B0F22392208984657EB6AE9215AE1D01BB11BBFC21E724E39D7241D99AD552D898764AD2FC763F743F4560CE4C5735D62A508FC319F10D29C08DFB6269FFE3
文档被以下合辑收录
评论