Are We Ready For Learned Cardinality Estimation?
Xiaoying Wang†∗, Changbo Qu†∗, Weiyuan Wu†∗, Jiannan Wang†, Qingqing Zhou♦
†Simon Fraser University    ♦Tencent Inc.
{xiaoying_wang, changboq, youngw, jnwang}@sfu.ca    hewanzhou@tencent.com
∗The first three authors contributed equally to this research.
ABSTRACT
Cardinality estimation is a fundamental but long unresolved prob-
lem in query optimization. Recently, multiple papers from different
research groups consistently report that learned models have the
potential to replace existing cardinality estimators. In this paper,
we ask a forward-thinking question: Are we ready to deploy these
learned cardinality models in production? Our study consists of three
main parts. Firstly, we focus on the static environment (i.e., no data
updates) and compare five new learned methods with nine tradi-
tional methods on four real-world datasets under a unified workload
setting. The results show that learned models are indeed more ac-
curate than traditional methods, but they often suffer from high
training and inference costs. Secondly, we explore whether these
learned models are ready for dynamic environments (i.e., frequent
data updates). We find that they cannot catch up with fast data up-
dates and return large errors for different reasons. For less frequent
updates, they can perform better but there is no clear winner among
themselves. Thirdly, we take a deeper look into learned models and
explore when they may go wrong. Our results show that the perfor-
mance of learned methods can be greatly affected by the changes
in correlation, skewness, or domain size. More importantly, their
behaviors are much harder to interpret and often unpredictable.
Based on these findings, we identify two promising research direc-
tions (control the cost of learned models and make learned models
trustworthy) and suggest a number of research opportunities. We
hope that our study can guide researchers and practitioners to work
together to eventually push learned cardinality estimators into real
database systems.
PVLDB Reference Format:
Xiaoying Wang, Changbo Qu, Weiyuan Wu, Jiannan Wang, Qingqing Zhou.
Are We Ready For Learned Cardinality Estimation?. PVLDB, 14(9): 1640-1654, 2021.
doi:10.14778/3461535.3461552
PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at
https://github.com/sfu-db/AreCELearnedYet.
1 INTRODUCTION
The rise of “ML for DB” has sparked a large body of exciting research
exploring how to replace existing database components with
learned models [32, 37, 39, 68, 84, 98]. Impressive results have been
repeatedly reported from these papers, which suggest that “ML for
DB” is a promising research area for the database community to
explore. To maximize the impact of this research area, one natural
question that we should keep asking ourselves is: Are we ready to
deploy these learned models in production?
In this paper, we seek to answer this question for cardinality
estimation. In particular, we focus on single-table cardinality esti-
mation, a fundamental and long-standing problem in query opti-
mization [18, 95]. It is the task of estimating the number of tuples
of a table that satisfy the query predicates. Database systems use
a query optimizer to choose an execution plan with the estimated
minimum cost. The performance of a query optimizer largely de-
pends on the quality of cardinality estimation. A query plan based
on a wrongly estimated cardinality can be orders of magnitude
slower than the best plan [42].
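To make the task concrete, here is a minimal sketch (ours, not the paper's; the schema, data, and the naive uniformity-based estimate are purely illustrative) that contrasts the true cardinality of a range predicate with a simple estimate and quantifies the gap with the q-error metric commonly used in this line of work.

```python
import sqlite3

# Toy, deliberately skewed table; schema and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (age INT, salary INT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(a, a * 1000) for a in range(18, 66) for _ in range(66 - a)])

# True cardinality: the number of tuples satisfying the predicate.
true_card = conn.execute(
    "SELECT COUNT(*) FROM t WHERE age BETWEEN 30 AND 40").fetchone()[0]

# A naive estimate in the spirit of simple traditional estimators:
# assume age values are spread uniformly over [min, max]; the skew in
# the data makes this assumption (and hence the estimate) inaccurate.
n, lo, hi = conn.execute(
    "SELECT COUNT(*), MIN(age), MAX(age) FROM t").fetchone()
est_card = n * (40 - 30 + 1) / (hi - lo + 1)

# Q-error: how many times the estimate is off, in either direction.
q_error = max(true_card, est_card) / min(true_card, est_card)
print(true_card, est_card, round(q_error, 2))
```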
Multiple recent papers [18, 28, 30, 34, 95] have shown that learned
models can greatly improve the cardinality estimation accuracy
compared with traditional methods. However, their experiments
have a number of limitations (see Section 2.5 for more detailed
discussion). Firstly, they do not include all the learned methods in
their evaluation. Secondly, they do not use the same datasets and
workload. Thirdly, they do not extensively test how well learned
methods perform in dynamic environments (e.g., by varying update
rate). Lastly, they mainly focus on when learned methods will go
right rather than when they may go wrong.
We overcome these limitations and conduct comprehensive ex-
periments and analyses. The paper makes four contributions:
Are Learned Methods Ready For Static Environments?
We propose a unified workload generator and collect four real-world
benchmark datasets. We compare five new learned methods with
nine traditional methods using the same datasets and workload in
static environments (i.e., no data updates). The results on accuracy
are quite promising. In terms of training/inference time, there is
only one method [18] that can achieve performance comparable to
existing DBMSs. The other learned methods typically require 10-1000×
more time for training and inference. Moreover, all learned
methods have an extra cost for hyper-parameter tuning.
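The workload generator is not specified at this point in the paper, so the sketch below is only one plausible design under our own assumptions: each query touches a random subset of columns, is centered on a tuple sampled from the data, and uses uniformly drawn range widths. The names `generate_workload` and `true_cardinality` are ours, and the actual generator may differ.

```python
import numpy as np

def generate_workload(data: np.ndarray, n_queries: int, seed: int = 0):
    """Draw random range queries over a numeric table (rows x columns).

    Assumed design (not necessarily the paper's): each query touches a
    random subset of columns, is centered on a tuple sampled from the data,
    and uses a uniformly drawn range width per column.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = data.shape
    lo, hi = data.min(axis=0), data.max(axis=0)
    queries = []
    for _ in range(n_queries):
        cols = rng.choice(n_cols, size=rng.integers(1, n_cols + 1), replace=False)
        center = data[rng.integers(n_rows)]   # center the query on a real tuple
        width = rng.uniform(0, hi - lo)       # per-column range width
        lower = np.maximum(lo, center - width / 2)
        upper = np.minimum(hi, center + width / 2)
        queries.append([(int(c), lower[c], upper[c]) for c in cols])
    return queries

def true_cardinality(data: np.ndarray, query) -> int:
    """Ground-truth label: count tuples satisfying every range predicate."""
    mask = np.ones(len(data), dtype=bool)
    for c, l, u in query:
        mask &= (data[:, c] >= l) & (data[:, c] <= u)
    return int(mask.sum())
```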
Are Learned Methods Ready For Dynamic Environments?
We explore how each learned method performs by varying the update
rate on four real-world datasets. The results show that learned
methods fail to catch up with fast data updates and tend to return
large errors for various reasons (e.g., the stale model processes
too many queries, the update period is not long enough to get a
good updated model). When data updates are less frequent, learned
methods can perform better but there is no clear winner among
themselves. We further explore the update time vs. accuracy trade-
off, and investigate how much GPUs can help learned methods in
dynamic environments.
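As a rough illustration of this setting, the loop below sketches one possible evaluation protocol under our own assumptions: new tuples and queries arrive interleaved, the model is retrained only after every `update_interval` inserts, and queries in between are answered by the stale model. The `model.fit`/`model.estimate` interface is hypothetical, `true_cardinality` comes from the earlier sketch, and the paper's actual protocol may differ.

```python
import numpy as np

def simulate_dynamic_eval(model, base_data, updates, queries, update_interval):
    """Interleave inserts and queries; retrain periodically, measure q-error."""
    errors = []
    rows = list(base_data)
    for i, (new_tuple, query) in enumerate(zip(updates, queries)):
        rows.append(new_tuple)                  # the data keeps changing
        if (i + 1) % update_interval == 0:
            model.fit(np.asarray(rows))         # periodic (expensive) retraining
        est = max(float(model.estimate(query)), 1.0)
        true = max(true_cardinality(np.asarray(rows), query), 1)
        errors.append(max(est, true) / min(est, true))   # q-error per query
    return errors
```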
When Do Learned Methods Go Wrong?
We vary correlation,
skewness, and domain size, respectively, on a synthetic dataset, and
try to understand when learned methods may go wrong. We find
that all learned methods tend to produce larger errors on more
correlated data, but they react differently w.r.t. skewness and domain
size. Due to the use of black-box models, their wrong behaviors are