APWeb-WAIM2022_Accelerated Algorithms for α-happiness Query_深算院_opt.pdf

tinge

229

16页

3次

2023-10-11

免费下载

Accelerated Algorithms for α-Happiness

Query

Min Xie

(

)

Shenzhen Institute of Computing Sciences, Shenzhen University, Shenzhen, China

xiemin@sics.ac.cn

Abstract. Extracting a good representative subset of tuples that meets

the user’s needs from a large database is an important problem in multi-

criteria decision making. Many queries have been proposed for this pur-

pose, including the top-k query and the skyline query. Unfortunately,

these traditional queries either ask the user to specify their needs explic-

itly or overwhelm users with a large output size. Recently, an α-happiness

query was proposed, which overcomes the deﬁciencies of existing queries:

users do not need to specify any preference, while they can obtain a small

set of tuples such that users are happy with the results, i.e., their favorite

tuples in the returned subset is guaranteed to be not much worse than

their favorite tuples in the whole database. In this paper, we study the

α-happiness query. Inspired by the techniques of incremental convex hull

computation, we develop two accelerated algorithms, which maintain use-

ful information to avoid redundant computation, in both 2-dimensional

and d-dimensional space (d>2). We performed extensive experiments,

comparing against the best-known method under various settings on

both real and synthetic datasets. Our superiority is demonstrated: we

can achieve up to two orders and 7 times of improvements in execution

times in 2-dimensional and d-dimensional space, respectively.

Keywords: α-happiness

· Incremental convex hull · Decision making

1 Introduction

Nowadays, a database system usually contains millions of tuples and end users

may only want to ﬁnd those tuples that ﬁt their needs. This problem is known

as multi-criteria decision making [5,18,19], and various queries were proposed

to obtain a small representative subset of tuples without asking the user to scan

the whole database. An example is the traditional top-k query [18,19], where a

user has to provide her preference function, called the utility function. Based on

the user’s utility function, the utility of each tuple for this user can be computed,

where a higher utility means that the tuple is more preferred. Finally, the best k

tuples with the highest utilities are returned. Unfortunately, requiring the user

to provide the exact utility function is too restrictive in many scenarios. In this

case, the skyline query can be applied [5], which adopts the “dominance” concept.

A tuple p is said to dominate another tuple q if p is not worse than q on each

 The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

B. Li et al. (Eds.): APWeb-WAIM 2022, LNCS 13422, pp. 53–68, 2023.

https://doi.org/10.1007/978-3-031-25198-6

54 M. Xie

attribute and p is better than q on at least one attribute. Intuitively, p will have a

higher utility than q w.r.t. all monotonic utility functions. Tuples which are not

dominated by any other tuples are returned in the skyline query. However, since

there is no constraint on the output size of a skyline query, a skyline query can

overwhelm the user with many results (at worst the whole database). Motivated

by this, a query called α-happiness query was studied recently in [27]toovercome

the deﬁciencies of both the top-k query (which requires the users to specify their

utility functions) and the skyline query (which might have a large output size).

Informally, an α-happiness query computes a set of tuples, with size as small

as possible, that makes the users happy where the degree of happiness is quanti-

ﬁed as the happiness ratio of the user. Speciﬁcally, given a set of tuples, a user is

x% happy with the set if the highest utility of tuples in the set is at least x% of

the highest utility of all tuples in the whole database. In this case, we say that the

happiness ratio of the user is x%. Clearly, the happiness ratio is a value from 0 to

1. The larger the happiness ratio, the happier the user. The α-happiness query

guarantees the happiness ratio of an end user is at least α. In practice, more

tuples have to be returned to guarantee a higher happiness level, as exp ected.

However, with more tuples returned, users have to spend more eﬀort to examine

the output, which is not desirable as they did in the skyline query. Hence, we

want the solution to be as small as possible, to ensure the given happiness level.

Consider a car database application. Assume that Alice wants to buy a car

from the car database where each car is described by two attributes, namely

horse power (HP) and miles per gallon (MPG). To help Alice for ﬁnding her

desired car, Alice can specify an α value, which represents the least happiness

level she expects. In practice, Alice can set α to be 0.9, indicating that she wants

a set of cars whose highest utility is at least 90% of the highest utility of all cars in

the database. Then, we execute the α-happiness query, which returns a small set

of cars from the database, hoping that Alice will be satisﬁed (since the happiness

ratio of Alice is at least α, as speciﬁed). However, if Alice is not satisﬁed with

those cars, she can increase the value of α, and execute the α-happiness query

again to obtain more cars with better quality to ensure a higher α.

Although it is NP-hard to solve the α-happiness query [27], various practical

algorithms were proposed in the literature. The best-known previous approach

for the α-happiness query is Cone-Greedy [27]. However, when we experimen-

tally evaluated Cone-Greedy, its execution time is unnecessarily long. This

is because Cone-Greedy did not keep suﬃcient information and thus, might

perform redundant computation, resulting in a long query time. The situation

is even worse when the user wants to perform multiple α-happiness queries with

diﬀerent values of

α on the same database, which is common in reality since users

might adjust the value of α to obtain more/less tuples to ﬁt their needs. Moti-

vated by this, we propose two novel approaches which accelerate Cone-Greedy

in both 2-dimensional and d-dimensional space (d>2). Our algorithms are

inspired by the incremental convex hull computation in computational geome-

try, and diﬀerent from Cone-Greedy, they eﬀectively maintain the information

needed during the computation and re-use them when necessary. Our exper-

iments show that our algorithms substantially outperform Cone-Greedy in

execution time. Our major contributions are summarized as follows:

of 16

免费下载

关注

评论