
database performance with minimal cost of workload runs. Given a tuning task, the construction of search space and the choice of search optimizer are the main factors that affect the tuning efficiency. While prior research has tried to enhance the tuning efficiency by selecting important knobs [5, 17, 23] and developing advanced search optimizers [22, 24, 50, 53, 56], when applying these tuning systems in practice, we find the following issues and challenges.
Ineffective Huge Search Space. In the literature, it is commonly assumed that the number of evaluations required to find an optimum is proportional to the size of the search space [48]. Popular databases such as PostgreSQL or MySQL have hundreds of configurable knobs [20, 24]. Studies have shown that selecting a few important knobs can expedite the subsequent tuning process [23, 51]. However, a static selection of globally important knobs is not applicable across workloads, since the important knobs may vary from one workload to another [23]. To select workload-specific important knobs, existing practices conduct many target workload runs on different configurations before proceeding to tune the selected knobs. Unfortunately, the number of workload runs required to select the important knobs well can reach into the hundreds, which even surpasses the workload runs required for the actual tuning process itself, as discussed in Section 3.1. As a result, the high cost of this approach poses a significant challenge for enhancing overall tuning efficiency in practice.
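To make this cost concrete, workload-specific knob selection is typically done by benchmarking many sampled configurations and fitting an importance model over the observed performance. The following is a minimal sketch of this common practice, assuming a hypothetical run_workload function that executes the target workload under one configuration; the random-forest importance scoring merely stands in for whichever ranking method a particular tuner uses.

```
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical knob ranges; real systems expose hundreds of knobs.
KNOB_RANGES = {
    "innodb_buffer_pool_size": (5 * 2**20, 8 * 2**30),  # bytes
    "innodb_io_capacity": (100, 20000),
    "max_connections": (10, 5000),
}

def sample_config(rng):
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in KNOB_RANGES.items()}

def rank_knobs(run_workload, n_runs=200, seed=0):
    """Each of the n_runs samples costs one full target workload run,
    which is exactly the overhead criticized above."""
    rng = np.random.default_rng(seed)
    configs = [sample_config(rng) for _ in range(n_runs)]
    X = np.array([[c[k] for k in KNOB_RANGES] for c in configs])
    y = np.array([run_workload(c) for c in configs])  # expensive step
    model = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X, y)
    importance = dict(zip(KNOB_RANGES, model.feature_importances_))
    return sorted(importance, key=importance.get, reverse=True)
```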
We also find that a vital factor in speeding up the tuning process has long been ignored: how to define an appropriate value range for each tuning knob. Existing tuning approaches usually adopt the default value ranges provided by the database manual, which are excessively broad and may contain theoretical values that are infeasible for a specific workload. For instance, the default range for innodb_buffer_pool_size in MySQL is 5 MB to 16 EB, with the upper bound far exceeding the available instance memory. As a result, tuning efforts would be wasted in unproductive areas [49], leading to out-of-memory errors [21]. Furthermore, even when the upper bound of memory-related knobs is set below the instance capacity, there is still a significant amount of superfluous space, which also holds true for other types of knobs. This excessive space arises because the default ranges are designed to cover all possible workload scenarios rather than being customized to a specific workload, so they are excessively large for performance tuning, as shown in Section 3.2. Given this fact, we propose to build compact ranges tailored to each workload: each range should be as small as possible while still including the optimal regions. In this way, the search space can be substantially reduced, which further improves tuning efficiency. Unfortunately, identifying compact knob ranges without many target observations is not easy. To this end, we face the challenge of "how to define a compact space for a given tuning task without the need for a large number of workload runs".
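As a simple illustration of why the default ranges are wasteful, the sketch below first clips memory-related knob bounds to the instance capacity and then shrinks a knob's range around previously observed good configurations. The knob, the 80% memory cap, and the quantile-based shrinking are illustrative assumptions, not the construction method proposed in this paper.

```
import numpy as np

INSTANCE_MEMORY = 16 * 2**30  # assume a 16 GB instance

def clip_memory_knob(default_lo, default_hi, mem_fraction=0.8):
    """Cap a memory-related knob at a fraction of instance memory,
    removing the physically infeasible part of the default range."""
    return default_lo, min(default_hi, int(mem_fraction * INSTANCE_MEMORY))

def compact_range(observed_values, observed_perf, top_fraction=0.2):
    """Shrink a knob's range around the values seen in the
    best-performing observations (higher perf is better)."""
    order = np.argsort(observed_perf)[::-1]
    k = max(1, int(top_fraction * len(order)))
    best = np.asarray(observed_values)[order[:k]]
    return best.min(), best.max()

# Default MySQL range for innodb_buffer_pool_size: 5 MB to 16 EB.
lo, hi = clip_memory_knob(5 * 2**20, 2**64 - 1)
```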
Fixed Search Optimizer for Different Tuning Tasks. Many advanced search optimizers have been proposed to navigate the search space for database tuning, which can be classified into two categories: search-based and learning-based. Search-based optimizers employ heuristics or meta-heuristics to explore optimal configurations, such as BestConfig [56] and the genetic algorithm (GA) [9]. Learning-based optimizers aim to improve exploration by modeling the performance function and can be further classified into reinforcement-learning-based [9, 31, 50] and Bayesian-optimization-based [5, 10, 15, 17, 24, 27, 53, 55] approaches.
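For readers unfamiliar with the learning-based family, the following is a minimal sketch of a Bayesian-optimization tuning loop, again assuming a hypothetical run_workload cost function. A Gaussian-process surrogate with an upper-confidence-bound acquisition stands in for the more elaborate models used by the cited systems.

```
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def bo_tune(run_workload, bounds, n_init=5, n_iter=20, seed=0):
    """bounds: array of shape (n_knobs, 2) giving each knob's range.
    run_workload returns a performance score to maximize."""
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, dim))
    y = np.array([run_workload(x) for x in X])
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        # Score a pool of random candidates with the UCB acquisition.
        cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(1000, dim))
        mean, std = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(mean + 2.0 * std)]
        y_next = run_workload(x_next)  # one costly workload run
        X = np.vstack([X, x_next])
        y = np.append(y, y_next)
    best = np.argmax(y)
    return X[best], y[best]
```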
Despite the availability of various search optimizers, the applicability of different optimizers remains unclear, since no single optimizer can dominate all tuning tasks, as indicated by the no-free-lunch theorems for optimization [48] and the empirical studies [6, 51] on database tuning. Using an inferior, sub-optimal optimizer can result in a performance loss of several orders of magnitude [6, 51]. A recent study [9] on database tuning has suggested that GA is suitable for the early tuning phase, where it focuses more on sampling configurations with high short-term gains, while DDPG performs better in the later stages. Accordingly, it proposes to adopt GA during the early tuning phase and then switch to DDPG [35] for higher performance. However, real-world scenarios are more complex due to the ever-growing pool of candidate search optimizers and the diversity of tuning tasks, e.g., various workloads and different shapes of the search space. Simple heuristics fail to recommend the optimal search optimizer since they cannot capture the relationship between tuning tasks and the performance of disparate optimizers. Therefore, to further enhance the efficiency of database tuning, we face the challenge of "how to identify an appropriate search optimizer for a specific tuning task". The decision is hard to make since we cannot exhaustively test all candidate optimizers on potential tuning tasks, as doing so is prohibitively expensive.
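One way to sidestep exhaustive testing, developed in the remainder of this paper, is to learn from historical tuning tasks which optimizer tends to win under which conditions. The sketch below trains a classifier on hypothetical per-task meta-features (number of knobs, read ratio, data size) labeled with the best-performing optimizer; the feature set, the toy history, and the classifier are illustrative assumptions rather than OpAdviser's actual model.

```
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical history: meta-features per task (n_knobs, read_ratio,
# dataset_gb) and the optimizer that performed best on that task.
history_features = np.array([
    [197, 0.95, 10],
    [40, 0.50, 100],
    [12, 0.05, 50],
])
history_best = np.array(["GP-BO", "SMAC", "GA"])

recommender = RandomForestClassifier(n_estimators=200, random_state=0)
recommender.fit(history_features, history_best)

# Recommend an optimizer for a new task without any online testing.
new_task = np.array([[60, 0.80, 20]])
print(recommender.predict(new_task)[0])
```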
Our Approach. The aim of this study is to expedite database tuning by simultaneously addressing the aforementioned two challenges through the automation of search space construction and optimizer selection. To achieve this, we propose OpAdviser, a data-driven approach that acts as an Optimization Adviser for database configuration tuning. Specifically, tuning services accumulate a wealth of historical data as they perform tuning for different clients and applications. Valuable knowledge can be extracted from this historical data to guide the setup of the target tuning task without conducting extensive experiments. Thus, OpAdviser leverages the data collected from previous tuning tasks, together with constructed benchmark data, to automatically build a compact search space and select an appropriate search optimizer for a given task.
First, OpAdviser constructs a compact search space, which can significantly reduce the number of target workload runs needed to finish knob tuning. It learns the geometry of the search space from the tuning data collected in previous tuning tasks. Although different tasks share common knowledge, their respective important knobs and effective ranges can differ considerably. Consequently, the geometry derived from one previous task may not be entirely suitable for the target. To address this issue, OpAdviser extracts the promising region from different source tasks based on their similarity to the target task. It adjusts the task similarity during tuning based on the augmented observations and adapts the target search space according to the on-the-fly task similarity through weighted voting. The transfer process is carefully designed so that negative transfer is avoided when source tasks are less similar, and the common geometries are extracted to tailor the search space.
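The weighted-voting idea can be illustrated with a small sketch: each source task votes on per-knob bounds with a weight proportional to its current similarity to the target, and the target range keeps the values where enough weighted votes agree. The similarity weights and the 0.5 vote threshold below are illustrative assumptions, not the exact mechanism of OpAdviser.

```
import numpy as np

def vote_target_range(source_regions, similarities, grid=256):
    """source_regions: list of (lo, hi) promising ranges for one knob,
    one per source task; similarities: non-negative voting weights.
    Returns the target range where the weighted vote exceeds 0.5."""
    los = np.array([lo for lo, _ in source_regions], dtype=float)
    his = np.array([hi for _, hi in source_regions], dtype=float)
    w = np.asarray(similarities, dtype=float)
    w = w / w.sum()                       # normalize the voting weights
    xs = np.linspace(los.min(), his.max(), grid)
    # Weighted fraction of source tasks whose promising range covers x.
    votes = ((xs[:, None] >= los) & (xs[:, None] <= his)) @ w
    covered = xs[votes > 0.5]
    if covered.size == 0:                 # sources disagree: keep the union
        return los.min(), his.max()
    return covered.min(), covered.max()

# Example: three source tasks vote on innodb_buffer_pool_size (in GB).
print(vote_target_range([(1, 6), (2, 8), (5, 14)], [0.6, 0.3, 0.1]))
```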
Second, OpAdviser selects a suitable search optimizer by capturing the mapping from task characteristics to the performance ranking of each optimizer. It takes an arbitrary task as input and predicts the most promising optimizer without online testing via a