
ods prefer to link the picked knobs into a vector and input the
vector into a single optimizer for uniform tuning [2]–[4]. This
behavior ensures that various types of knobs are optimized
toward the same objective, while implicitly assuming that
exploring different spaces in similar manners yields the same
benefits. However, this assumption may be violated when ex-
ploring the combinatorial domain mixed with categorical and
continuous spaces [5]. Compelling the optimizer to explore
near-optimal configurations within this combinatorial domain
renders the quest for the desired configuration sub-optimal
and inefficient. To address this problem, we aim to develop
a novel knob optimizer that can (1) optimize continuous
and categorical knobs in different ways while considering
the dependence between knobs, and (2) find near-optimal
configurations with fewer tuning iterations than state-of-
the-art methods. This poses three challenges.
(C1) Limited ability of the model to optimize continuous
and categorical variables simultaneously. Tuning with a
combination of categorical and continuous variables can be
challenging. If some inputs are categorical variables, then
the common assumption that all variables are differentiable
over the input space, which allows for efficient exploration,
is no longer valid, and vice versa [6], [7]. For example,
OtterTune [1] and ResTune [8] adopt a Gaussian Process
(GP) [9], [10] model as a knob optimizer based on the
assumption that the configuration space is continuous and
differentiable. Conversely, GPTuner [4] employs a SMAC [11]
model since it excels in handling categorical knobs. Indeed,
these methods may be biased in model selection, and cannot be
competent to explore both categorical and continuous spaces
exactly. To verify this assumption, we tune 10 categorical
and 10 continuous knobs, respectively, using SMAC, GP,
and Mixed-kernel BO [12] on the SYSBENCH benchmark.
As shown in Fig. 1(a), SMAC achieves superior throughput
for categorical knobs. Conversely, for continuous knobs, GP
consistently delivers more accurate and efficient optimization
results than SMAC, as shown in Fig. 1(b). While Mixed-
kernel BO can handle both continuous and categorical knobs
using different kernel functions, it still struggles to achieve top
performance in both spaces. This indicates that the optimal
model varies for each type of knob.
(C2) Complex dependence between categorical and con-
tinuous knobs during the exploration for near-optimal
configurations. Training optimal models for two types of
knobs in isolation does not guarantee optimal performance
for databases. Fig. 2 (a) shows a toy example on the SYS-
BENCH benchmark. The change occurs in throughput for
varying values of two categorical and two continuous knobs,
where Cat_x and Con_x represent the value combinations
of the two categorical and two continuous knobs, respectively.
When the values of categorical knobs are set to Cat_1,
the optimal throughput corresponds to Con_1. Conversely,
when the values of categorical knobs are set to Cat_2, the
optimal continuous knobs alter to Con_3. This means that
tuning both the categorical and the continuous knobs in a
coordinated way may provide more benefits than tuning them
(a) Dependence between continuous and
categorical knobs
(b) Performance limitation when
tuning less knobs
Fig. 2: Toy examples on the SYSBENCH benchmark. (a)
shows throughput corresponding to various combinations of
categorical and continuous knob values. (b) shows the through-
put and 95th %-tile latency comparisons between tuning all 60
knobs and the 20 key knobs.
independently. Therefore, it is crucial to account for inter-
space communication when designing a knob optimizer that
explores categorical and continuous spaces differently.
(C3) Tricky balance between efficiency and effective-
ness. Enhancing the efficiency of knob exploration in high-
dimensional space often involves reducing the configuration
space. Traditional methods achieve it by freezing the knobs
deemed unimportant [1], [3] or discretizing the search space
of continuous knobs with fixed step sizes [13]. Indeed, the
preference assumptions behind these behaviors are overly
restrictive thereby reducing the chances of finding a near-
optimal configuration, as all the input knobs may contribute
to the tuning. As shown in Fig. 2 (b), tuning all 60 knobs,
not just the seemingly important ones, will consistently lead
to a more optimal configuration. This implies that the solution
for finding the near-optimal configuration with fewer tuning
iterations should avoid preferring only the key knobs.
Our Approach. To tackle these challenges, we propose TOP-
TUNE, an innovative database knob optimizer using Bayesian
optimization (BO)-based methods for enhanced tuning effi-
ciency and superior performance. There are four highlights in
our optimizer. (1) We propose to decompose the configura-
tion space into categorical and continuous subspaces [6],
[7], addressing the C1 issue. Two tuning models which excel
at continuous optimization (e.g., GP) and categorical optimiza-
tion (e.g., SMAC), respectively, are employed to deal with
optimization problems within two subspaces, respectively.
This scheme can take full advantage of the different tuning
models, improving overall performance in the heterogeneous
configuration space. Furthermore, search space decomposition
aids in overcoming the curse of dimensionality. By assigning
fewer knobs to each tuning model after decomposition, it
enhances tuning efficiency. (2) We design a mechanism to
facilitate information communication between the two tun-
ing models [14], addressing the C2 issue. In this mechanism,
the two models alternate in conducting the tuning process.
Each model shares its tuning decisions, specifically the current
optimal knob values, with the other model via a context
feature. This ensures that TOPTUNE can achieve the global
optimum based on dependence between knobs. (3) We intro-
614
文档被以下合辑收录
评论