TopTune: Tailored Optimization for Categorical and
Continuous Knobs Towards Accelerated and
Improved Database Performance Tuning
Rukai Wei†‡, Yu Liu, Yufeng Hou, Heng Cui, Yongqiang Zhang, Ke Zhou
Huazhong University of Science and Technology, Wuhan, China
Dameng Database Co.,Ltd, Wuhan, China
[weirukai, liu yu, houyufeng, hengcui, zhke]@hust.edu.cn, zyq@dameng.com
contribute equally, corresponding author
Abstract—Using a machine learning (ML) model as a core
component in database knob tuning has demonstrated remarkable
advancements in recent years. However, a model that optimizes
both categorical and continuous values in the same way may
not guarantee efficiency and effectiveness in knob tuning.
This is because the usual assumption of a differentiable
input space, which enables efficient exploration of
continuous spaces, does not hold in categorical spaces.
Moreover, the inherent complexity of interdependences among
knobs and the high dimensionality of the configuration space
compound the challenges of tuning. In this paper, we propose
TOPTUNE, which employs tailored optimization for continuous
and categorical knobs to accelerate tuning and improve
tuning performance. Specifically, we decompose the
configuration space into two orthogonal subspaces: a
categorical space and a continuous space. Subsequently, we
employ two Bayesian optimization models, SMAC and GP, to
explore the categorical and continuous subspaces,
respectively. The two models alternately explore the two
spaces using the proposed communication mechanism, which
ensures that TOPTUNE can capture the dependence between
continuous and categorical knobs. Furthermore, to balance
efficiency and accuracy, we use a knob-dimensional projection
strategy that reduces the exploration domain by embedding the
high-dimensional configuration space into a lower-dimensional
proxy space. In addition, we implement batch Bayesian
optimization, which enables parallel knob evaluation while
balancing exploration and exploitation. We evaluate TOPTUNE
under different benchmarks (SYSBENCH, TPC-C, and JOB),
metrics (throughput and latency), and DBMSs (MySQL and
Dameng). Extensive experiments demonstrate that TOPTUNE
identifies better configurations in up to approximately
12.2× less time while achieving a 10.7% improvement in
throughput compared to state-of-the-art methods.
Index Terms—knob tuning, tailored optimization, space
decomposition, knob dimensional projection, batch Bayesian
optimization
I. INTRODUCTION
Fig. 1: A toy example using GP, SMAC, and Mixed-kernel BO
models on the SYSBENCH benchmark. (a) Categorical knobs:
tuning 10 categorical knobs. (b) Continuous knobs: tuning 10
continuous knobs.
Efficiently exploring the configuration space to find sensible
knob values is crucial for database performance optimization.
Modern Database Management Systems (DBMSs) expose hundreds of
configurable knobs. These knobs can be broadly classified as
categorical or continuous, based on their value domain. A
categorical knob takes its value from a set of candidate
values, while a continuous knob can be adjusted within a
value range [min, max]. For example, in the Dameng database,
the categorical knob PARALLEL_POLICY has three options, i.e.,
{0, 1, 2}, each representing a distinct parallel policy
employed for query optimization. In contrast, the continuous
knob WORKER_CPU_PERCENT accepts integer values ranging from
0 to 100, where a higher value indicates a greater percentage
of CPU resources that the DBMS can utilize. Categorical and
continuous knobs differ in their effect on performance in two
ways.
1) Continuity and gradient. Performance varies in an ordered
way with the value of a continuous knob, but not with the
value of a categorical knob. For example,
WORKER_CPU_PERCENT=100 usually yields higher throughput
than WORKER_CPU_PERCENT=90 or 80. In contrast, for
PARALLEL_POLICY, throughput shows no obvious gradient
across the choices 0, 1, and 2.
2) Sensitivity to value changes. Continuous knobs typically
have more redundant values than categorical knobs, and
small changes are unlikely to significantly affect DBMS
performance. For instance, adjusting the continuous knob
VM_POOL_SIZE from 32 to 1048576, a broad range,
results in relatively stable throughput within 1024 to
2048. In contrast, a change in a categorical knob often
has a great impact on throughput or latency.
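The two knob types described above can be written down as a small data model. The following is a minimal sketch, not TOPTUNE's implementation: the class names, the `split_by_type` and `is_valid` helpers, and the choice of Python are assumptions made for illustration; only the knob names and value domains come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class CategoricalKnob:
    """Knob whose value is picked from an unordered set of candidates."""
    name: str
    choices: Tuple[int, ...]


@dataclass(frozen=True)
class ContinuousKnob:
    """Knob whose value is adjusted within the range [low, high]."""
    name: str
    low: int
    high: int


Knob = Union[CategoricalKnob, ContinuousKnob]

# Knob names and domains as quoted in the text for the Dameng database.
KNOBS: List[Knob] = [
    CategoricalKnob("PARALLEL_POLICY", (0, 1, 2)),
    ContinuousKnob("WORKER_CPU_PERCENT", 0, 100),
    ContinuousKnob("VM_POOL_SIZE", 32, 1048576),
]


def split_by_type(knobs: List[Knob]):
    """Separate a mixed knob list into categorical and continuous parts."""
    categorical = [k for k in knobs if isinstance(k, CategoricalKnob)]
    continuous = [k for k in knobs if isinstance(k, ContinuousKnob)]
    return categorical, continuous


def is_valid(knob: Knob, value: int) -> bool:
    """Check a value against the knob's domain (membership vs. range)."""
    if isinstance(knob, CategoricalKnob):
        return value in knob.choices
    return knob.low <= value <= knob.high
```

Note that validity is a set-membership test for a categorical knob and a range test for a continuous one, which mirrors the difference in value domains that the list above builds on.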
2025 IEEE 41st International Conference on Data Engineering (ICDE)
2375-026X/25/$31.00 ©2025 IEEE
DOI 10.1109/ICDE65448.2025.00052
Motivation. Compared to knob dependence [1], these
discrepancies have not received adequate attention.
Traditional methods concatenate the selected knobs into a
vector and feed the vector into a single optimizer for
uniform tuning [2]–[4]. This ensures that all types of knobs
are optimized toward the same objective, but it implicitly
assumes that exploring different spaces in a similar manner
yields the same benefits. However, this assumption may be
violated when exploring a combinatorial domain that mixes
categorical and continuous spaces [5]. Forcing a single
optimizer to search for near-optimal configurations within
this combinatorial domain makes the search sub-optimal and
inefficient. To address this problem, we aim to develop a
novel knob optimizer that can (1) optimize continuous and
categorical knobs in different ways while considering the
dependence between knobs, and (2) find near-optimal
configurations with fewer tuning iterations than
state-of-the-art methods. This poses three challenges.
(C1) Limited ability of the model to optimize continuous
and categorical variables simultaneously. Tuning a
combination of categorical and continuous variables can be
challenging. If some inputs are categorical variables, then
the common assumption that all variables are differentiable
over the input space, which allows for efficient exploration,
is no longer valid, and vice versa [6], [7]. For example,
OtterTune [1] and ResTune [8] adopt a Gaussian Process
(GP) [9], [10] model as the knob optimizer, based on the
assumption that the configuration space is continuous and
differentiable. Conversely, GPTuner [4] employs a SMAC [11]
model since it excels in handling categorical knobs. These
methods may therefore be biased in model selection and are
not well suited to exploring both categorical and continuous
spaces. To verify this, we tune 10 categorical and 10
continuous knobs separately, using SMAC, GP, and
Mixed-kernel BO [12] on the SYSBENCH benchmark. As shown in
Fig. 1(a), SMAC achieves superior throughput for categorical
knobs. Conversely, for continuous knobs, GP consistently
delivers more accurate and efficient optimization results
than SMAC, as shown in Fig. 1(b). While Mixed-kernel BO can
handle both continuous and categorical knobs using different
kernel functions, it still struggles to achieve top
performance in both spaces. This indicates that the optimal
model varies for each type of knob.
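One way to see why a model built for continuous, differentiable spaces fits categorical knobs poorly is to look at how a smooth kernel treats integer-coded categories. The sketch below is illustrative only: the RBF and overlap kernels, the function names, and the length scale are assumptions chosen for this example, and they are not the models compared in Fig. 1.

```python
import math


def rbf_kernel(x: float, y: float, length_scale: float = 1.0) -> float:
    """Smooth kernel: similarity decays with the numeric distance |x - y|."""
    return math.exp(-((x - y) ** 2) / (2.0 * length_scale ** 2))


def overlap_kernel(x: int, y: int) -> float:
    """Categorical kernel: two values are either identical or not."""
    return 1.0 if x == y else 0.0


# PARALLEL_POLICY options from the text; the codes are labels, not magnitudes.
policies = [0, 1, 2]

# Pairwise similarities under each kernel.
rbf_sim = {(a, b): rbf_kernel(a, b) for a in policies for b in policies}
overlap_sim = {(a, b): overlap_kernel(a, b) for a in policies for b in policies}
```

Under the RBF kernel, policy 1 looks more similar to policy 0 than policy 2 does, which imposes an ordering that the knob does not have. The overlap kernel treats all distinct policies alike, at the cost of the smoothness that makes exploration of continuous knobs efficient.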
Fig. 2: Toy examples on the SYSBENCH benchmark. (a)
Dependence between continuous and categorical knobs:
throughput corresponding to various combinations of
categorical and continuous knob values. (b) Performance
limitation when tuning fewer knobs: throughput and
95th-percentile latency comparisons between tuning all 60
knobs and the 20 key knobs.
(C2) Complex dependence between categorical and continuous
knobs during the exploration for near-optimal
configurations. Training optimal models for the two types of
knobs in isolation does not guarantee optimal database
performance. Fig. 2(a) shows a toy example on the SYSBENCH
benchmark: throughput changes as the values of two
categorical and two continuous knobs vary, where Cat_x and
Con_x represent the value combinations of the two
categorical and two continuous knobs, respectively. When the
categorical knobs are set to Cat_1, the optimal throughput
corresponds to Con_1. Conversely, when the categorical knobs
are set to Cat_2, the optimal continuous setting shifts to
Con_3. This means that tuning the categorical and continuous
knobs in a coordinated way may provide more benefit than
tuning them independently. Therefore, it is crucial to
account for inter-space communication when designing a knob
optimizer that explores categorical and continuous spaces
differently.
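The dependence described in challenge C2 can be reproduced on a small table. The sketch below uses made-up throughput numbers (not measured data) that only preserve the pattern stated in the text: Con_1 is best under Cat_1, and Con_3 is best under Cat_2. The function names are hypothetical, and an exhaustive argmax stands in for the learned models, so this illustrates the idea of inter-space communication rather than any particular optimizer.

```python
# Made-up throughput values for illustration only (not from the paper).
THROUGHPUT = {
    ("Cat_1", "Con_1"): 900, ("Cat_1", "Con_2"): 700, ("Cat_1", "Con_3"): 600,
    ("Cat_2", "Con_1"): 650, ("Cat_2", "Con_2"): 800, ("Cat_2", "Con_3"): 1000,
}
CATS = ["Cat_1", "Cat_2"]
CONS = ["Con_1", "Con_2", "Con_3"]


def best_con(cat: str) -> str:
    """Best continuous setting given a fixed categorical setting."""
    return max(CONS, key=lambda con: THROUGHPUT[(cat, con)])


def best_cat(con: str) -> str:
    """Best categorical setting given a fixed continuous setting."""
    return max(CATS, key=lambda cat: THROUGHPUT[(cat, con)])


def tune_independently(default_cat: str, default_con: str):
    """Tune each subspace against the other's default; results never meet."""
    cat = best_cat(default_con)
    con = best_con(default_cat)
    return cat, con


def tune_alternately(cat: str, con: str, rounds: int = 3):
    """Alternate the two searches, each seeing the other's current best."""
    for _ in range(rounds):
        cat = best_cat(con)  # categorical step conditions on current con
        con = best_con(cat)  # continuous step conditions on updated cat
    return cat, con


independent = tune_independently("Cat_1", "Con_2")
alternating = tune_alternately("Cat_1", "Con_2")
```

Starting from the defaults (Cat_1, Con_2), independent tuning combines two choices that were never evaluated together, while the alternating loop settles on the best pair in this table. Alternation of this kind is not guaranteed to escape a poor starting point in general; starting from (Cat_1, Con_1) in the same table, it stays there.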
(C3) Tricky balance between efficiency and effectiveness.
Enhancing the efficiency of knob exploration in a
high-dimensional space often involves reducing the
configuration space. Traditional methods achieve this by
freezing the knobs deemed unimportant [1], [3] or by
discretizing the search space of continuous knobs with fixed
step sizes [13]. However, the preference assumptions behind
these approaches are overly restrictive and reduce the
chances of finding a near-optimal configuration, as all the
input knobs may contribute to the tuning. As shown in
Fig. 2(b), tuning all 60 knobs, not just the seemingly
important ones, consistently leads to a better
configuration. This implies that a solution for finding a
near-optimal configuration with fewer tuning iterations
should avoid favoring only the key knobs.
Our Approach. To tackle these challenges, we propose
TOPTUNE, an innovative database knob optimizer that uses
Bayesian optimization (BO)-based methods for enhanced tuning
efficiency and superior performance. Our optimizer has four
highlights. (1) We propose to decompose the configuration
space into categorical and continuous subspaces [6], [7],
addressing the C1 issue. Two tuning models, one that excels
at continuous optimization (e.g., GP) and one that excels at
categorical optimization (e.g., SMAC), handle the
optimization problems within the two subspaces,
respectively. This scheme takes full advantage of the
different tuning models, improving overall performance in
the heterogeneous configuration space. Furthermore, search
space decomposition helps overcome the curse of
dimensionality: assigning fewer knobs to each tuning model
after decomposition enhances tuning efficiency. (2) We
design a mechanism to facilitate information communication
between the two tuning models [14], addressing the C2 issue.
In this mechanism, the two models alternate in conducting
the tuning process. Each model shares its tuning decisions,
specifically the current optimal knob values, with the other
model via a context feature. This ensures that TOPTUNE can
achieve the global optimum based on the dependence between
knobs. (3) We intro-