
database performance with minimal cost of workload runs. Given a tuning task, the construction of search space and the choice of search optimizer are the main factors that affect the tuning efficiency. While prior research has tried to enhance the tuning efficiency by selecting important knobs [5, 17, 23] and developing advanced search optimizers [22, 24, 50, 53, 56], when applying these tuning systems in practice, we find the following issues and challenges.
Ineffective Huge Search Space. In the literature, it is commonly assumed that the number of evaluations required to find an optimum is proportional to the size of the search space [48]. Popular databases such as PostgreSQL or MySQL have hundreds of configurable knobs [20, 24]. Studies have shown that selecting a few important knobs can expedite the subsequent tuning process [23, 51]. However, a static selection of globally important knobs is not applicable across workloads, since the important knobs may vary from one workload to another [23]. To select workload-specific important knobs, existing practices conduct many target workload runs on different configurations before proceeding to tune the selected knobs. Unfortunately, the number of workload runs required to select the important knobs well can reach into the hundreds, which even surpasses the workload runs required for the actual tuning process itself, as discussed in Section 3.1. As a result, the high cost of this approach poses a significant challenge for enhancing overall tuning efficiency in practice.
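To make this cost concrete, workload-specific knob selection is typically done by benchmarking many sampled configurations and fitting an importance model over the observed performance. The following is a minimal sketch of this common practice, assuming a hypothetical run_workload function that executes the target workload under one configuration; the random-forest importance scoring merely stands in for whichever ranking method a particular tuner uses.

```
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical knob ranges; real systems expose hundreds of knobs.
KNOB_RANGES = {
    "innodb_buffer_pool_size": (5 * 2**20, 8 * 2**30),  # bytes
    "innodb_io_capacity": (100, 20000),
    "max_connections": (10, 5000),
}

def sample_config(rng):
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in KNOB_RANGES.items()}

def rank_knobs(run_workload, n_runs=200, seed=0):
    """Each of the n_runs samples costs one full target workload run,
    which is exactly the overhead criticized above."""
    rng = np.random.default_rng(seed)
    configs = [sample_config(rng) for _ in range(n_runs)]
    X = np.array([[c[k] for k in KNOB_RANGES] for c in configs])
    y = np.array([run_workload(c) for c in configs])  # expensive step
    model = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X, y)
    importance = dict(zip(KNOB_RANGES, model.feature_importances_))
    return sorted(importance, key=importance.get, reverse=True)
```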
We also find that a vital factor in speeding up the tuning process has long been ignored: how to define an appropriate value range for each tuning knob. Existing tuning approaches usually adopt the default value ranges provided by the database manual, which are excessively broad and may contain theoretical values that are infeasible for a specific workload. For instance, the default range for innodb_buffer_pool_size in MySQL is 5 MB to 16 EB, with the upper bound far exceeding the available instance memory. As a result, tuning efforts would be wasted in unproductive areas [49], leading to out-of-memory errors [21]. Furthermore, even when the upper bound of memory-related knobs is set below the instance capacity, there is still a significant amount of superfluous space, which also holds true for other types of knobs. This excessive space arises because the default ranges are designed to cover all possible workload scenarios rather than being customized to a specific workload, so they are excessively large for performance tuning, as shown in Section 3.2. Given this fact, we propose to build compact ranges tailored to each workload: each range should be as small as possible while still including the optimal regions. In this way, the search space can be substantially reduced, which further improves tuning efficiency. Unfortunately, identifying compact knob ranges without many target observations is not easy. To this end, we face the challenge of "how to define a compact space for a given tuning task without the need for a large number of workload runs".
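As a simple illustration of why the default ranges are wasteful, the sketch below first clips memory-related knob bounds to the instance capacity and then shrinks a knob's range around previously observed good configurations. The knob, the 80% memory cap, and the quantile-based shrinking are illustrative assumptions, not the construction method proposed in this paper.

```
import numpy as np

INSTANCE_MEMORY = 16 * 2**30  # assume a 16 GB instance

def clip_memory_knob(default_lo, default_hi, mem_fraction=0.8):
    """Cap a memory-related knob at a fraction of instance memory,
    removing the physically infeasible part of the default range."""
    return default_lo, min(default_hi, int(mem_fraction * INSTANCE_MEMORY))

def compact_range(observed_values, observed_perf, top_fraction=0.2):
    """Shrink a knob's range around the values seen in the
    best-performing observations (higher perf is better)."""
    order = np.argsort(observed_perf)[::-1]
    k = max(1, int(top_fraction * len(order)))
    best = np.asarray(observed_values)[order[:k]]
    return best.min(), best.max()

# Default MySQL range for innodb_buffer_pool_size: 5 MB to 16 EB.
lo, hi = clip_memory_knob(5 * 2**20, 2**64 - 1)
```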
Fixed Search Optimizer for Different Tuning Tasks. Many advanced search optimizers have been proposed to navigate the search space for database tuning, which can be classified into two categories: search-based and learning-based. Search-based optimizers employ heuristics or meta-heuristics to explore optimal configurations, such as BestConfig [56] and the genetic algorithm (GA) [9]. Learning-based optimizers aim to improve exploration by modeling the performance function and can be further classified into reinforcement-learning-based [9, 31, 50] and Bayesian-optimization-based [5, 10, 15, 17, 24, 27, 53, 55] approaches.
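For readers unfamiliar with the learning-based family, the following is a minimal sketch of a Bayesian-optimization tuning loop, again assuming a hypothetical run_workload cost function. A Gaussian-process surrogate with an upper-confidence-bound acquisition stands in for the more elaborate models used by the cited systems.

```
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def bo_tune(run_workload, bounds, n_init=5, n_iter=20, seed=0):
    """bounds: array of shape (n_knobs, 2) giving each knob's range.
    run_workload returns a performance score to maximize."""
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, dim))
    y = np.array([run_workload(x) for x in X])
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        # Score a pool of random candidates with the UCB acquisition.
        cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(1000, dim))
        mean, std = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(mean + 2.0 * std)]
        y_next = run_workload(x_next)  # one costly workload run
        X = np.vstack([X, x_next])
        y = np.append(y, y_next)
    best = np.argmax(y)
    return X[best], y[best]
```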
Despite the availability of various search optimizers, the applicability of different optimizers remains unclear, since no single optimizer can dominate all tuning tasks, as indicated by the no-free-lunch theorems for optimization [48] and the empirical studies [6, 51] on database tuning. Using an inferior, sub-optimal optimizer can result in a performance loss of several orders of magnitude [6, 51]. A recent study [9] on database tuning has suggested that GA is suitable for the early tuning phase, where it focuses more on sampling configurations with high short-term gains, while DDPG performs better in the later stages. Accordingly, it proposes to adopt GA during the early tuning phase and then switch to DDPG [35] for higher performance. However, real-world scenarios are more complex due to the ever-growing pool of candidate search optimizers and the diversity of tuning tasks, e.g., various workloads and different shapes of the search space. Simple heuristics fail to recommend the optimal search optimizer since they cannot capture the relationship between tuning tasks and the performance of disparate optimizers. Therefore, to further enhance the efficiency of database tuning, we face the challenge of "how to identify an appropriate search optimizer for a specific tuning task". The decision is hard to make since we cannot exhaustively test all candidate optimizers on potential tuning tasks, as doing so is prohibitively expensive.
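One way to sidestep exhaustive testing, developed in the remainder of this paper, is to learn from historical tuning tasks which optimizer tends to win under which conditions. The sketch below trains a classifier on hypothetical per-task meta-features (number of knobs, read ratio, data size) labeled with the best-performing optimizer; the feature set, the toy history, and the classifier are illustrative assumptions rather than OpAdviser's actual model.

```
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical history: meta-features per task (n_knobs, read_ratio,
# dataset_gb) and the optimizer that performed best on that task.
history_features = np.array([
    [197, 0.95, 10],
    [40, 0.50, 100],
    [12, 0.05, 50],
])
history_best = np.array(["GP-BO", "SMAC", "GA"])

recommender = RandomForestClassifier(n_estimators=200, random_state=0)
recommender.fit(history_features, history_best)

# Recommend an optimizer for a new task without any online testing.
new_task = np.array([[60, 0.80, 20]])
print(recommender.predict(new_task)[0])
```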
Our Approach. The aim of this study is to expedite database tuning by simultaneously addressing the aforementioned two challenges through the automation of search space construction and optimizer selection. To achieve this, we propose OpAdviser, a data-driven approach that acts as an Optimization Adviser for database configuration tuning. Specifically, tuning services accumulate a wealth of historical data as they perform tuning for different clients and applications. Valuable knowledge can be extracted from this historical data to guide the setup of the target tuning task without conducting extensive experiments. Thus, OpAdviser leverages the data collected from previous tuning tasks, together with constructed benchmark data, to automatically build a compact search space and select an appropriate search optimizer for a given task.
First, OpAdviser constructs a compact search space, which can significantly reduce the number of target workload runs needed to finish knob tuning. It learns the geometry of the search space from the tuning data collected in previous tuning tasks. Although different tasks share common knowledge, their respective important knobs and effective ranges can differ considerably. Consequently, the geometry derived from one previous task may not be entirely suitable for the target. To address this issue, OpAdviser extracts the promising region from different source tasks based on their similarity to the target task. It adjusts the task similarity during tuning based on the augmented observations and adapts the target search space according to the on-the-fly task similarity through weighted voting. The transfer process is carefully designed so that negative transfer is avoided when source tasks are less similar, and the common geometries are extracted to tailor the search space.
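The weighted-voting idea can be illustrated with a small sketch: each source task votes on per-knob bounds with a weight proportional to its current similarity to the target, and the target range keeps the values where enough weighted votes agree. The similarity weights and the 0.5 vote threshold below are illustrative assumptions, not the exact mechanism of OpAdviser.

```
import numpy as np

def vote_target_range(source_regions, similarities, grid=256):
    """source_regions: list of (lo, hi) promising ranges for one knob,
    one per source task; similarities: non-negative voting weights.
    Returns the target range where the weighted vote exceeds 0.5."""
    los = np.array([lo for lo, _ in source_regions], dtype=float)
    his = np.array([hi for _, hi in source_regions], dtype=float)
    w = np.asarray(similarities, dtype=float)
    w = w / w.sum()                       # normalize the voting weights
    xs = np.linspace(los.min(), his.max(), grid)
    # Weighted fraction of source tasks whose promising range covers x.
    votes = ((xs[:, None] >= los) & (xs[:, None] <= his)) @ w
    covered = xs[votes > 0.5]
    if covered.size == 0:                 # sources disagree: keep the union
        return los.min(), his.max()
    return covered.min(), covered.max()

# Example: three source tasks vote on innodb_buffer_pool_size (in GB).
print(vote_target_range([(1, 6), (2, 8), (5, 14)], [0.6, 0.3, 0.1]))
```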
Second, OpAdviser selects a suitable search optimizer by capturing the mapping from task characteristics to the performance ranking of each optimizer. It takes an arbitrary task as input and predicts the most promising optimizer without online testing via a