An Eicient Transfer Learning Based Configuration Adviser for
Database Tuning
Xinyi Zhang
Peking University
zhang_xinyi@pku.edu.cn
Hong Wu
Alibaba Group
hong.wu@alibaba-inc.com
Yang Li
Peking University
liyang.cs@pku.edu.cn
Zhengju Tang
Peking University
tzj_jwlfp_fxz@stu.pku.edu.cn
Jian Tan
Alibaba Group
j.tan@alibaba-inc.com
Feifei Li
Alibaba Group
lifeifei@alibaba-inc.com
Bin Cui
Peking University
bin.cui@pku.edu.cn
ABSTRACT
In recent years, a wide spectrum of database tuning systems have emerged to automatically optimize database performance. However, these systems require a significant number of workload runs to deliver a satisfactory level of database performance, which is time-consuming and resource-intensive. While many attempts have been made to address this issue by using advanced search optimizers, empirical studies have shown that no single optimizer can dominate the rest across tuning tasks with different characteristics. Choosing an inferior optimizer may significantly increase the tuning cost. Unfortunately, current practices typically adopt a single optimizer or follow simple heuristics without considering the task characteristics. Consequently, they fail to choose the most suitable optimizer for a specific task. Furthermore, constructing a compact search space can significantly improve the tuning efficiency. However, current practices neglect the setting of the value range for each knob and rely on a large number of workload runs to select important knobs, resulting in a considerable amount of unnecessary exploration in ineffective regions.
To pursue efficient database tuning, in this paper, we argue that it is imperative to have an approach that can judiciously determine a precise space and search optimizer for an arbitrary tuning task. To this end, we propose OpAdviser, which exploits the information learned from historical tuning tasks to guide the search space construction and search optimizer selection. Our design can greatly accelerate the tuning process and further reduce the required workload runs. Given a tuning task, OpAdviser learns the geometries of search space, including important knobs and their effective regions, from relevant previous tasks. It then constructs the target search space from the geometries according to the on-the-fly task similarity, which allows for adaptive adjustment of the target space. OpAdviser also employs a pairwise ranking model to capture the relationship from task characteristics to optimizer rankings. This ranking model is invoked during tuning and predicts the best optimizer to be used for the current iteration. We conduct extensive evaluations across a diverse set of workloads, where OpAdviser achieves 9.2% higher throughput and significantly reduces the number of workload runs with an average speedup of 3.4× compared to state-of-the-art tuning systems.

School of CS & Key Laboratory of High Confidence Software Technologies, Peking University
Database and Storage Laboratory, Damo Academy, Alibaba Group
National Engineering Laboratory for Big Data Analysis and Applications, Peking University

This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 17, No. 3 ISSN 2150-8097.
doi:10.14778/3632093.3632114

PVLDB Reference Format:
Xinyi Zhang, Hong Wu, Yang Li, Zhengju Tang, Jian Tan, Feifei Li, and Bin Cui. An Efficient Transfer Learning Based Configuration Adviser for Database Tuning. PVLDB, 17(3): 539 - 552, 2023.
doi:10.14778/3632093.3632114

PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at https://github.com/Blairruc-pku/OpAdviser.
1 INTRODUCTION
Optimizing the configuration of a database management system (DBMS) is a critical aspect to achieve high system performance. Conventionally, the tuning task has been performed manually by database administrators (DBAs), involving workload and hardware configuration analysis, selection of appropriate configurations, and performance testing. However, the manual tuning process is time-consuming and expertise-consuming when dealing with complex modern workloads and various hardware environments, especially in the cloud environment. Consequently, automatic configuration tuning has attracted lots of attention in the research community [4, 11, 16, 26, 32, 33, 37, 42–44, 46, 47, 52, 54].

The current systems for configuration tuning share a generic workflow. Initially, the tuning system defines a search space that encompasses all possible configurations, given a target workload and a performance metric. Subsequently, the system iteratively explores configurations within the search space according to the suggestion from a search optimizer, which aims to find the configuration that maximizes the database performance. Notably, evaluating a configuration necessitates resources and time to run the workload, which dominates as the major cost when tuning databases. Minimizing the number of workload runs for finding a good configuration is a crucial requirement to adopt the tuning systems in practical scenarios, such as production services with numerous databases [24]. A sophisticated tuning system should deliver a satisfactory level of database performance with minimal cost of workload runs. Given a tuning task, the construction of search space and the choice of search optimizer are the main factors that affect the tuning efficiency.
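The generic workflow above can be sketched as a simple loop in which every evaluation costs one workload run. The stand-in search space, random-search optimizer, and toy objective below are illustrative assumptions, not part of any real tuning system:

```python
import random

def tune(search_space, suggest, evaluate, budget):
    """Generic configuration-tuning loop: the search optimizer suggests a
    configuration, and evaluating it requires one (costly) workload run."""
    best_conf, best_perf = None, float("-inf")
    history = []
    for _ in range(budget):
        conf = suggest(search_space, history)   # search optimizer's suggestion
        perf = evaluate(conf)                   # one workload run
        history.append((conf, perf))
        if perf > best_perf:
            best_conf, best_perf = conf, perf
    return best_conf, best_perf

# Stand-in pieces: a two-knob space, random search, and a toy objective.
space = {"buffer_pool_mb": (128, 4096), "max_connections": (10, 500)}

def random_suggest(space, history):
    return {k: random.uniform(lo, hi) for k, (lo, hi) in space.items()}

def toy_evaluate(conf):  # hypothetical throughput surrogate
    return conf["buffer_pool_mb"] + 2 * conf["max_connections"]

random.seed(0)
conf, perf = tune(space, random_suggest, toy_evaluate, budget=20)
```

The `budget` parameter makes the central cost explicit: each iteration of the loop is one workload run, which is why minimizing the number of iterations matters.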
While prior research has tried to enhance the tuning efficiency by selecting important knobs [5, 17, 23] and developing advanced search optimizers [22, 24, 50, 53, 56], when applying these tuning systems in practice, we find the following issues and challenges.
Ineffective Huge Search Space. In the literature, it is commonly assumed that the number of evaluations required to find an optimum is proportional to the size of the search space [48]. Popular databases such as PostgreSQL or MySQL have hundreds of configurable knobs [20, 24]. Studies have shown that selecting a few important knobs can expedite the subsequent tuning process [23, 51]. However, a static selection of global important knobs is not applicable to various workloads, since the important knobs may vary across workloads [23]. To select workload-specific important knobs, existing practices conduct many target workload runs on different configurations before proceeding to tune the selected knobs. Unfortunately, the number of workload runs required to select the important knobs well can reach hundreds, which even surpasses the workload runs required for the actual tuning process itself, as discussed in Section 3.1. As a result, the high cost associated with this approach poses a significant challenge for enhancing the overall tuning efficiency in practice.
We also nd that a vital factor to speed up the tuning process has
been ignored for long. That is how to dene an appropriate value
range for each tuning knob. Existing tuning approaches usually
adopt the default value ranges provided by the database manual,
which are excessively broad and may contain theoretical values that
are infeasible for a specic workload. For instance, the default range
for
innodb_buffer_pool_size
is 5 MegaByte to 16 ExaByte in
MySQL, with the upper bound far exceeding the available instance
memory. As a result, tuning eorts would be wasted in unproduc-
tive areas [
49
], leading to out of memory errors [
21
]. Furthermore,
even when the upper bound of memory-related knobs is set below
the instance capacity, there is still a signicant amount of super-
uous space, which also holds true for other types of knobs. This
excessive space arises because the default ranges are designed to
cover all possible workload scenarios, rather than being customized
to a specic workload, so the default ranges are excessively large
for performance tuning, as shown in Section 3.2. Given this fact,
we propose to build compact ranges that are tailored to each work-
load. It should be as small as possible while still including optimal
regions. In such a way, the search space could be substantially re-
duced, which further improves tuning eciency. Unfortunately,
identifying the compact knob ranges without lots of target obser-
vations is not easy. To this end, we face the challenge that “how to
dene a compact space for a given tuning task without the
need for a large number of workload runs”.
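To make the idea of compact ranges concrete, the following is a minimal sketch of shrinking one knob's default range, not OpAdviser's actual algorithm; the margin rule, the instance capacity, and the sample values from similar tasks are all illustrative assumptions:

```python
def compact_range(default_range, top_values, instance_cap=None, margin=0.1):
    """Shrink a knob's default range to the region covered by
    well-performing values observed on similar tasks, padded by a
    margin, and clamped to the instance capacity if one applies."""
    lo, hi = default_range
    if instance_cap is not None:
        hi = min(hi, instance_cap)  # e.g. cap the buffer pool at RAM size
    if top_values:
        span = max(max(top_values) - min(top_values), 1e-9)
        lo = max(lo, min(top_values) - margin * span)
        hi = min(hi, max(top_values) + margin * span)
    return lo, hi

# innodb_buffer_pool_size in MB: the default upper bound (16 EB) dwarfs
# any realistic instance memory, here assumed to be 64 GB.
default = (5, 16 * 2**40)            # 5 MB to 16 EB, expressed in MB
top_values = [8192, 12288, 16384]    # hypothetical good values from similar tasks
compact = compact_range(default, top_values, instance_cap=64 * 1024)
```

The resulting range is orders of magnitude tighter than the manual's default, so a search optimizer spends its workload runs inside a plausible region instead of infeasible territory.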
Fixed Search Optimizer for Different Tuning Tasks. Many advanced search optimizers have been proposed to navigate the search space for database tuning, which can be classified into two categories: search-based and learning-based. Search-based optimizers employ heuristics or meta-heuristics to explore optimal configurations, such as BestConfig [56] and genetic algorithm (GA) [9]. Learning-based optimizers aim to improve exploration by modeling the performance function and can be further classified into reinforcement-learning based [9, 31, 50] and Bayesian-optimization based [5, 10, 15, 17, 24, 27, 53, 55] approaches.
Despite the availability of various search optimizers, the applicability of different optimizers remains unclear since no single optimizer can dominate all tuning tasks, as indicated by the no free lunch theorems for optimization [48] and the empirical studies [6, 51] on database tuning. Using an inferior and sub-optimal optimizer could result in a significant performance loss of several orders of magnitude [6, 51]. A recent study [9] on database tuning has suggested that GA is suitable for the early tuning phase, where it focuses more on sampling configurations with high short-term gains, while DDPG performs better in the later stages. Accordingly, it proposes to adopt GA during the early tuning phase and then switch to DDPG [35] for higher performance. However, real-world scenarios are more complex due to the ever-increasing candidates of search optimizers and distinct tuning tasks, e.g., various workloads and different shapes of search space. Simple heuristics fail to recommend the optimal search optimizer since they cannot capture the relationship between the tuning tasks and the performance of disparate optimizers. Therefore, to further enhance the efficiency of database tuning, we face the challenge of "how to identify an appropriate search optimizer for a specific tuning task". The decision is hard to make since we cannot perform exhaustive testing of all candidate optimizers on potential tuning tasks, as it is prohibitively expensive.
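To illustrate the kind of mapping such an approach must learn, here is a deliberately minimal stub that picks the optimizer that won on the most similar historical task; the task features, optimizer names, and performance numbers are hypothetical, and a learned pairwise ranker would replace this nearest-neighbour rule:

```python
def predict_best_optimizer(task_features, history):
    """Return the optimizer that performed best on the historical task
    most similar to the given one (squared Euclidean distance over
    feature vectors). A learned pairwise ranking model would replace
    this nearest-neighbour stub."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    feats, perfs = min(history, key=lambda rec: dist(rec[0], task_features))
    return max(perfs, key=perfs.get)

# Hypothetical tasks described by (read_ratio, search_space_size) features,
# each with per-optimizer throughput observed offline.
history = [
    ((0.9, 20),  {"GA": 1200, "DDPG": 900,  "SMAC": 1100}),
    ((0.2, 200), {"GA": 700,  "DDPG": 1300, "SMAC": 1000}),
]
best = predict_best_optimizer((0.85, 30), history)  # resembles the first task
```

Even this toy version shows why no online testing is needed: the choice is made from offline observations on other tasks, not by running every candidate optimizer on the target.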
Our Approach. The aim of this study is to expedite database tuning by simultaneously addressing the aforementioned two challenges through the automation of search space construction and optimizer selection. To achieve this, we propose OpAdviser, a data-driven approach that acts as an Optimization Adviser for database configuration tuning. Specifically, tuning services can accumulate a wealth of historical data as they perform tuning for different clients and applications. Valuable knowledge can be extracted from the historical data to guide the setting of target tuning without conducting extensive experiments. Thus, OpAdviser leverages the data collected from previous tuning tasks and constructed benchmark data to automatically build a compact search space and select an appropriate search optimizer for a given task.
First, OpAdviser constructs a compact search space, which can significantly reduce the number of target workload runs needed to finish knob tuning. It learns the geometries of search space from the tuning data collected in previous tuning tasks. Although different tasks share common knowledge, their respective important knobs and effective ranges are quite different. Consequently, the geometries derived from one previous task may not be entirely suitable for the target. To address this issue, OpAdviser extracts the promising region from different source tasks based on their similarity with the target task. It adjusts the task similarity during tuning based on the augmented observations and adapts the target search space according to the on-the-fly task similarity through weighted voting. The transfer process is carefully designed so that negative transfer is avoided when source tasks are less similar, and the common geometries are extracted to tailor the search space.
Second, OpAdviser selects a suitable search optimizer by capturing the mapping from task characteristics to the performance ranking of each optimizer. It takes an arbitrary task as input and predicts the most promising optimizer without online testing via a pairwise ranking model.