TopTune: Tailored Optimization for Categorical and Continuous Knobs Towards Accelerated and Improved Database Performance Tuning

Rukai Wei†‡⋆, Yu Liu†⋆, Yufeng Hou†⋆, Heng Cui†, Yongqiang Zhang‡, Ke Zhou†✉

† Huazhong University of Science and Technology, Wuhan, China

‡ Dameng Database Co., Ltd, Wuhan, China

[weirukai, liu yu, houyufeng, hengcui, zhke]@hust.edu.cn, zyq@dameng.com

⋆ contributed equally, ✉ corresponding author

Abstract—Using a machine learning (ML) model as a core component in database knob tuning has demonstrated remarkable advancements in recent years. However, a model that optimizes both categorical and continuous values in the same way may not guarantee efficiency and effectiveness in knob tuning. This is because the usual assumption of a differentiable input space, which enables efficient exploration of continuous spaces, does not hold in categorical spaces. Moreover, the inherent complexity of interdependences among knobs and the high dimensionality of the configuration space compound the challenges of tuning. In this paper, we propose TOPTUNE, which employs tailored optimization for continuous and categorical knobs to achieve accelerated tuning efficiency and improved tuning performance. Specifically, we decompose the configuration space into two orthogonal subspaces: categorical and continuous spaces. Subsequently, we employ Bayesian optimization models, i.e., SMAC and GP, to explore the categorical and continuous subspaces, respectively. These two models alternately explore the two spaces with the proposed communication mechanism to ensure TOPTUNE can capture the dependence between continuous and categorical knobs. Furthermore, to balance efficiency and accuracy, we utilize a knob-dimensional projection strategy to reduce the exploration domain by embedding the high-dimensional configuration space into a lower-dimensional proxy space. In addition, we implement batch Bayesian optimization technology, which enables parallel knob evaluation while balancing exploration and exploitation. We evaluate TOPTUNE under different benchmarks (SYSBENCH, TPC-C, and JOB), metrics (throughput and latency), and DBMSs (MySQL and Dameng). Extensive experiments demonstrate that TOPTUNE identifies better configurations in up to approximately 12.2× less time while achieving a 10.7% improvement in throughput compared to state-of-the-art methods.

Index Terms—knob tuning, tailored optimization, space decomposition, knob dimensional projection, batch Bayesian optimization

I. INTRODUCTION

Efficiently exploring the configuration space to find sensible knob values is crucial for database performance optimization. Modern Database Management Systems (DBMSs) offer access to hundreds of configurable knobs. These knobs can be broadly classified into categorical and continuous knobs, based on their tuning value domain. A categorical knob takes its value from a set of candidate values, while a continuous knob can be adjusted within a value range of [min, max]. For example, in the Dameng database, the categorical knob PARALLEL_POLICY has three options, i.e., {0, 1, 2}, each representing a distinct parallel policy employed for query optimization. In contrast, the continuous knob WORKER_CPU_PERCENT accepts integer values ranging from 0 to 100, where a higher value indicates a greater percentage of CPU resources that the DBMS can utilize. Categorical and continuous knobs differ in their effect on performance in two ways.

1) Continuity and gradient. The values of continuous knobs correspond to performance in an orderly way, while those of categorical knobs do not. For example, WORKER_CPU_PERCENT=100 usually yields better throughput than WORKER_CPU_PERCENT=90 or 80. In contrast, for PARALLEL_POLICY, the throughput shows no obvious gradient across the choices 0, 1, and 2.

2) Sensitivity to value changes. Continuous knobs typically have more redundant values than categorical knobs, and small changes are unlikely to significantly affect DBMS performance. For instance, the continuous knob VM_POOL_SIZE spans a broad range, from 32 to 1048576, yet throughput remains relatively stable for values between 1024 and 2048. In contrast, a change in a categorical knob often has a great impact on throughput or latency.

Fig. 1: A toy example using GP, SMAC, and Mixed-kernel BO models on the SYSBENCH benchmark. Panel (a), "Categorical knobs", involves tuning 10 categorical knobs and panel (b), "Continuous knobs", involves tuning 10 continuous knobs.
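The distinction between the two knob types can be made concrete with a small data model. The following is a minimal sketch, not TOPTUNE's implementation: the class names `CategoricalKnob` and `ContinuousKnob` and the helpers `is_valid` and `split_by_type` are hypothetical, while the knob names and value domains are the Dameng examples quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoricalKnob:
    """A knob whose value is picked from an unordered candidate set (hypothetical class)."""
    name: str
    choices: tuple


@dataclass(frozen=True)
class ContinuousKnob:
    """A knob whose value lies in an inclusive range [low, high] (hypothetical class)."""
    name: str
    low: int
    high: int


# Knob names and domains are the examples given in the text.
KNOBS = [
    CategoricalKnob("PARALLEL_POLICY", (0, 1, 2)),
    ContinuousKnob("WORKER_CPU_PERCENT", 0, 100),
    ContinuousKnob("VM_POOL_SIZE", 32, 1048576),
]


def is_valid(knob, value):
    """Check a candidate value against the knob's domain."""
    if isinstance(knob, CategoricalKnob):
        return value in knob.choices
    return knob.low <= value <= knob.high


def split_by_type(knobs):
    """Partition the knobs into categorical and continuous groups."""
    categorical = [k for k in knobs if isinstance(k, CategoricalKnob)]
    continuous = [k for k in knobs if isinstance(k, ContinuousKnob)]
    return categorical, continuous


if __name__ == "__main__":
    cat, con = split_by_type(KNOBS)
    print([k.name for k in cat], [k.name for k in con])
```

Keeping the two types as separate classes makes it straightforward to hand each group to a different optimizer, since an optimizer only ever sees knobs of the kind it handles well.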

2025 IEEE 41st International Conference on Data Engineering (ICDE), DOI 10.1109/ICDE65448.2025.00052

Motivation. Compared to knob dependence [1], these discrepancies have not received adequate attention. Traditional methods prefer to concatenate the selected knobs into a vector and feed the vector into a single optimizer for uniform tuning [2]–[4]. This behavior ensures that various types of knobs are optimized toward the same objective, while implicitly assuming that exploring different spaces in similar manners yields the same benefits. However, this assumption may be violated when exploring a combinatorial domain that mixes categorical and continuous spaces [5]. Compelling the optimizer to explore near-optimal configurations within this combinatorial domain renders the search for the desired configuration sub-optimal and inefficient. To address this problem, we aim to develop a novel knob optimizer that can (1) optimize continuous and categorical knobs in different ways while considering the dependence between knobs, and (2) find near-optimal configurations with fewer tuning iterations than state-of-the-art methods. This poses three challenges.

(C1) Limited ability of the model to optimize continuous and categorical variables simultaneously. Tuning with a combination of categorical and continuous variables can be challenging. If some inputs are categorical variables, then the common assumption that all variables are differentiable over the input space, which allows for efficient exploration, is no longer valid, and vice versa [6], [7]. For example, OtterTune [1] and ResTune [8] adopt a Gaussian Process (GP) [9], [10] model as a knob optimizer based on the assumption that the configuration space is continuous and differentiable. Conversely, GPTuner [4] employs a SMAC [11] model since it excels in handling categorical knobs. As a result, these methods may be biased in model selection and cannot explore both categorical and continuous spaces well. To verify this claim, we tune 10 categorical and 10 continuous knobs, respectively, using SMAC, GP, and Mixed-kernel BO [12] on the SYSBENCH benchmark. As shown in Fig. 1(a), SMAC achieves superior throughput for categorical knobs. Conversely, for continuous knobs, GP consistently delivers more accurate and efficient optimization results than SMAC, as shown in Fig. 1(b). While Mixed-kernel BO can handle both continuous and categorical knobs using different kernel functions, it still struggles to achieve top performance in both spaces. This indicates that the optimal model varies for each type of knob.
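One way to see why a model built around a smooth, ordered input space can mislead on categorical knobs is to compare similarity measures directly. The sketch below is an illustration under stated assumptions, not the kernels used by the GP, SMAC, or Mixed-kernel BO models in the experiment: `rbf_kernel` is a standard squared-exponential kernel applied to the integer labels of PARALLEL_POLICY, and `overlap_kernel` is a simple match/mismatch kernel of the kind often used for categorical inputs. Both function names are hypothetical.

```python
import math


def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential kernel: similarity decays with numeric distance."""
    return math.exp(-((x - y) ** 2) / (2.0 * length_scale ** 2))


def overlap_kernel(x, y):
    """Match/mismatch kernel: 1 for the same category, 0 otherwise."""
    return 1.0 if x == y else 0.0


# Treat the labels of PARALLEL_POLICY {0, 1, 2} as plain numbers.
rbf_01 = rbf_kernel(0, 1)
rbf_02 = rbf_kernel(0, 2)

# The RBF kernel claims policy 0 is "closer" to policy 1 than to policy 2,
# an ordering that comes from the labels, not from the policies themselves.
ordering_imposed = rbf_01 > rbf_02

# The overlap kernel treats every pair of distinct policies the same way.
no_ordering = overlap_kernel(0, 1) == overlap_kernel(0, 2)

if __name__ == "__main__":
    print(f"RBF k(0,1)={rbf_01:.3f}, k(0,2)={rbf_02:.3f}")
    print("RBF imposes an ordering:", ordering_imposed)
    print("Overlap kernel is order-free:", no_ordering)
```

The same smoothness that makes the RBF kernel useful on WORKER_CPU_PERCENT becomes a spurious prior on PARALLEL_POLICY, which is consistent with the observation that no single model is best in both spaces.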

(C2) Complex dependence between categorical and continuous knobs during the exploration for near-optimal configurations. Training optimal models for the two types of knobs in isolation does not guarantee optimal performance for databases. Fig. 2(a) shows a toy example on the SYSBENCH benchmark. It plots the change in throughput for varying values of two categorical and two continuous knobs, where Cat_x and Con_x represent the value combinations of the two categorical and two continuous knobs, respectively. When the values of the categorical knobs are set to Cat_1, the optimal throughput corresponds to Con_1. Conversely, when the values of the categorical knobs are set to Cat_2, the optimal continuous setting changes to Con_3. This means that tuning both the categorical and the continuous knobs in a coordinated way may provide more benefits than tuning them independently. Therefore, it is crucial to account for inter-space communication when designing a knob optimizer that explores categorical and continuous spaces differently.

Fig. 2: Toy examples on the SYSBENCH benchmark. (a) Dependence between continuous and categorical knobs: throughput corresponding to various combinations of categorical and continuous knob values. (b) Performance limitation when tuning fewer knobs: throughput and 95th-percentile latency comparisons between tuning all 60 knobs and the 20 key knobs.
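The dependence described in (C2) can be reproduced with a tiny lookup table. This is a minimal sketch with made-up numbers: the throughput values in `THROUGHPUT` are hypothetical and only preserve the pattern stated above (Con_1 is best under Cat_1, Con_3 is best under Cat_2), and the default settings used for the isolated runs are likewise an assumption.

```python
CATS = ("Cat_1", "Cat_2")
CONS = ("Con_1", "Con_2", "Con_3")

# Hypothetical throughput values; only the pattern follows the text.
THROUGHPUT = {
    ("Cat_1", "Con_1"): 900, ("Cat_1", "Con_2"): 700, ("Cat_1", "Con_3"): 600,
    ("Cat_2", "Con_1"): 650, ("Cat_2", "Con_2"): 800, ("Cat_2", "Con_3"): 1000,
}


def best_con_given(cat):
    """Best continuous setting when the categorical setting is held fixed."""
    return max(CONS, key=lambda con: THROUGHPUT[(cat, con)])


def best_cat_given(con):
    """Best categorical setting when the continuous setting is held fixed."""
    return max(CATS, key=lambda cat: THROUGHPUT[(cat, con)])


# Isolated tuning: each side is tuned once against the other's default
# (defaults assumed to be Cat_1 and Con_1) and the results are combined.
isolated = (best_cat_given("Con_1"), best_con_given("Cat_1"))

# Coordinated tuning: search over the joint space.
joint = max(THROUGHPUT, key=THROUGHPUT.get)

if __name__ == "__main__":
    print("isolated:", isolated, THROUGHPUT[isolated])
    print("joint:   ", joint, THROUGHPUT[joint])
```

In this toy table the isolated result stays at (Cat_1, Con_1), while the joint optimum is (Cat_2, Con_3); neither side can discover the better combination without knowing what the other side chose.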

(C3) Tricky balance between efficiency and effectiveness. Enhancing the efficiency of knob exploration in a high-dimensional space often involves reducing the configuration space. Traditional methods achieve this by freezing the knobs deemed unimportant [1], [3] or by discretizing the search space of continuous knobs with fixed step sizes [13]. However, the assumptions behind these choices are overly restrictive and reduce the chances of finding a near-optimal configuration, as all the input knobs may contribute to the tuning. As shown in Fig. 2(b), tuning all 60 knobs, not just the seemingly important ones, consistently leads to a better configuration. This implies that a solution for finding a near-optimal configuration with fewer tuning iterations should avoid favoring only the key knobs.
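The cost of fixed-step discretization mentioned in (C3) can be illustrated on a single knob. The sketch below is a toy under stated assumptions: the range reuses the VM_POOL_SIZE bounds quoted earlier, but the objective `throughput`, the peak location `PEAK`, and both step sizes are hypothetical and chosen only to make the effect visible.

```python
LOW, HIGH = 32, 1048576  # VM_POOL_SIZE bounds from the text
PEAK = 1500              # hypothetical location of the best value


def throughput(value):
    """Hypothetical single-peak objective; higher is better."""
    return -abs(value - PEAK)


def best_on_grid(step):
    """Best value among the grid points LOW, LOW + step, LOW + 2*step, ..."""
    return max(range(LOW, HIGH + 1, step), key=throughput)


coarse = best_on_grid(4096)  # few candidates, cheap to explore
fine = best_on_grid(4)       # many candidates, expensive to explore

if __name__ == "__main__":
    print("coarse grid best:", coarse, "throughput:", throughput(coarse))
    print("fine grid best:  ", fine, "throughput:", throughput(fine))
```

The coarse grid is far cheaper to search but its best point sits well away from the peak, while the fine grid reaches the peak at the price of many more candidates. This is the efficiency versus effectiveness tension the paragraph describes.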

Our Approach. To tackle these challenges, we propose TOPTUNE, an innovative database knob optimizer using Bayesian optimization (BO)-based methods for enhanced tuning efficiency and superior performance. There are four highlights in our optimizer. (1) We propose to decompose the configuration space into categorical and continuous subspaces [6], [7], addressing the C1 issue. Two tuning models, one that excels at continuous optimization (e.g., GP) and one that excels at categorical optimization (e.g., SMAC), handle the optimization problems within their respective subspaces. This scheme can take full advantage of the different tuning models, improving overall performance in the heterogeneous configuration space. Furthermore, search space decomposition aids in overcoming the curse of dimensionality: assigning fewer knobs to each tuning model after decomposition enhances tuning efficiency. (2) We design a mechanism to facilitate information communication between the two tuning models [14], addressing the C2 issue. In this mechanism, the two models alternate in conducting the tuning process. Each model shares its tuning decisions, specifically the current optimal knob values, with the other model via a context feature. This ensures that TOPTUNE can achieve the global optimum based on the dependence between knobs. (3) We intro-
