
𝜏-LevelIndex: Towards Eicient ery Processing in
Continuous Preference Space
Jiahao Zhang
†
, Bo Tang
‡§
, Man Lung Yiu
†
, Xiao Yan
‡§
, Keming Li
‡§ ∗
†
Department of Computing, Hong Kong Polytechnic University
§
Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology
‡
Department of Computer Science and Engineering, Southern University of Science and Technology
{csjhzhang,csmlyiu}@comp.polyu.edu.hk,{tangb3@,yanx@,likm2020@mail.}sustech.edu.cn
ABSTRACT
Top-
𝑘
related queries in continuous preference space (e.g., k-shortlist
preference query
kSPR
, uncertain top-
𝑘
query
UTK
, output-size
specied utility-based query
ORU
) have numerous applications but
are expensive to process. Existing algorithms process each query via
specialized optimizations, which are dicult to generalize. In this
work, we propose a novel and general index structure
𝜏
-
LevelIndex
,
which can be used to process various queries in continuous pref-
erence space eciently. We devise ecient approaches to build
the
𝜏
-
LevelIndex
by fully exploiting the properties of continuous
preference space. We conduct extensive experimental studies on
both real- and synthetic- benchmarks. The results show that (i)
our proposed index building approaches have low costs in terms
of both space and time, and (ii)
𝜏
-
LevelIndex
signicantly outper-
forms specialized solutions for processing a spectrum of queries in
continuous preference space, and the speedup can be two to three
orders of magnitude.
CCS CONCEPTS
• Information systems → Top-k retrieval in databases.
KEYWORDS
k-level index, preference space, top-𝑘 query processing
ACM Reference Format:
Jiahao Zhang, Bo Tang, Man Lung Yiu, Xiao Yan, Keming Li. 2022.
𝜏
-
LevelIndex: Towards Ecient Query Processing in Continuous Preference
Space. In Proceedings of the 2022 International Conference on Management
of Data (SIGMOD’22), June 12–17, 2022, Philadelphia, PA, USA. ACM, New
York, NY, USA, 14 pages. https://doi.org/10.1145/3514221.3526182
1 INTRODUCTION
The preference model is widely used in many applications, e.g.,
recommender system, multi-criteria decision making, and prod-
uct ranking. Specically, each option in the market has several
∗
Dr. Bo Tang is the corresponding author.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for prot or commercial advantage and that copies bear this notice and the full citation
on the rst page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specic permission and/or a
fee. Request permissions from permissions@acm.org.
SIGMOD’22, June 12–17, 2022, Philadelphia, PA, USA.
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9249-5/22/06.. . $15.00
https://doi.org/10.1145/3514221.3526182
attributes and the value of each attribute models the competitive-
ness of the option. For example, each hotel in the online portals
(e.g, Booking [
1
] and TripAdvisor [
6
]) includes value, facility, and
cleanliness attributes. The attribute values of Crowne Plaza Hotel
in Booking are 0.83, 0.86, 0.89, respectively. We denote an option
with
𝑑
attributes as r
= (𝑟 [
1
], 𝑟 [
2
], · · · , 𝑟 [𝑑])
and all options form
the dataset
D
. A user chooses options based on her preference
weight vector w
= (𝑤 [
1
], 𝑤 [
2
], · · · , 𝑤 [𝑑])
, which species a nu-
meric weight for each attribute. To compute the score of each option
for the user, a linear scoring function (i.e.,
S
w
(
r
) =
Í
𝑑
𝑖=1
𝑟 [𝑖]𝑤 [𝑖]
)
is used and the top-
𝑘
products with the highest scores are appeal-
ing to the user. In the past decades, the database community has
conducted extensive studies on shortlisting product options for
users, i.e., top-
𝑘
query [
22
,
39
], skyline query [
11
,
32
], and hybrids
of top-
𝑘
and skyline queries [
14
,
28
].
Rtree
and its variants [
10
,
19
]
are widely used to accelerate the above query processing.
Besides nding top-
𝑘
options for users, it is also important to
consider user preferences from the perspective of product providers.
For instance, a hotel manager may want to identify potential cus-
tomers who rank his hotel as top-
𝑘
, which should be targeted in
advertising campaigns. Recently, many queries (e.g., Monochromatic
reverse top-
𝑘
[
42
],
MaxRank
[
31
],
kSPR
[
37
], restricted skyline [
14
],
UTK
[
30
], and
ORU
[
28
]) are proposed to explore the entire user
preference space (i.e., the continuous space dened by preference
weight w) instead of discrete user preference weights in traditional
top-
𝑘
ranking queries (i.e., points in the preference space). These
queries can help product providers analyze the competitiveness of
their products in the market, identify target users, adjust the design
of products to attract more users, etc. For example, the
MaxRank
query [
31
] reports the maximum rank a product can have among
all possible users’ preferences, which tells the product provider the
market status of his product.
Many algorithms have been proposed to accelerate the query
processing in continuous preference space, but they are still inade-
quate in two aspects. First, the query processing complexity is high
even with state-of-the-art solutions as many expensive geometric
operations are involved. For example, our experiments show that
an ORU query can take more than 1000 seconds. Second, Existing
algorithms develop specialized techniques for each query and there
lacks a general index structure that accelerates a wide range of
queries in continuous preference space, i.e., an analogue of
Rtree
for top-𝑘 ranking queries.
In this work, we devise a general index (
𝜏
-
LevelIndex
) for queries
in continuous preference space, where
𝜏
is a user-specied param-
eter. A general index is favorable as it not only reduces query
Session 28: Spatial, Temporal, and Multimedia Databases
SIGMOD ’22, June 12–17, 2022, Philadelphia, PA, USA
评论