ResTune: Resource Oriented Tuning Boosted by Meta-Learning
for Cloud Databases
Xinyi Zhang
Peking University &
Alibaba Group
zhang_xinyi@pku.edu.cn
Hong Wu
Alibaba Group
hong.wu@alibaba-inc.com
Zhuo Chang
§
Alibaba Group & Peking
University
z.chang@pku.edu.cn
Shuowei Jin
Alibaba Group
shuowei.jsw@alibaba-inc.com
Jian Tan
Alibaba Group
j.tan@alibaba-inc.com
Feifei Li
Alibaba Group
lifeifei@alibaba-inc.com
Tieying Zhang
Alibaba Group
tieying.zhang@alibaba-inc.com
Bin Cui
§
Peking University
bin.cui@pku.edu.cn
ABSTRACT
Modern database management systems (DBMS) contain tens to
hundreds of critical performance tuning knobs that determine the
system's runtime behavior. To reduce the total cost of ownership,
cloud database providers put considerable effort into automatically
optimizing resource utilization by tuning these knobs. There are two
challenges. First, the tuning system should always abide by the
service level agreement (SLA) while optimizing the resource utilization,
which imposes strict constraints on the tuning process. Second, the
tuning time should be reasonably short, since time-consuming
tuning is not practical for production and online troubleshooting.
In this paper, we design ResTune to automatically optimize
the resource utilization without violating SLA constraints on the
throughput and latency requirements. ResTune leverages the tuning
experience from historical tasks and transfers the accumulated
knowledge to accelerate the tuning process of new tasks. The
prior knowledge is extracted from historical tuning tasks and
represented through an ensemble model. The model learns the similarity
between the historical workloads and the target, which significantly
reduces the tuning time through a meta-learning based approach.
ResTune can efficiently handle different workloads and various hardware
environments. We perform evaluations using benchmarks and real-world
workloads on different types of resources. The results show
that, compared with the manually tuned configurations, ResTune
Xinyi Zhang and Hong Wu contribute equally to this paper.
Center for Data Science, Peking University & National Engineering Laboratory for
Big Data Analysis and Applications
Database and Storage Laboratory, Damo Academy, Alibaba Group
§
School of EECS & Key Laboratory of High Confidence Software Technologies, Peking
University
Institute of Computational Social Science, Peking University (Qingdao)
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
SIGMOD ’21, June 20–25, 2021, Virtual Event, China
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-8343-1/21/06...$15.00
https://doi.org/10.1145/3448016.3457291
reduces CPU utilization, I/O, and memory by 65%, 87%, and 39% on
average, respectively. Compared with the state-of-the-art methods,
ResTune finds better configurations with up to 18× speedups.
CCS CONCEPTS
• Information systems → Autonomous database administration;
• Computing methodologies → Machine learning.
KEYWORDS
resource; tuning; cloud database; service level agreement
ACM Reference Format:
Xinyi Zhang, Hong Wu, Zhuo Chang, Shuowei Jin, Jian Tan, Feifei Li,
Tieying Zhang, and Bin Cui. 2021. ResTune: Resource Oriented Tuning
Boosted by Meta-Learning for Cloud Databases. In Proceedings of the 2021
International Conference on Management of Data (SIGMOD ’21), June 20–25,
2021, Virtual Event, China. ACM, New York, NY, USA, 13 pages.
https://doi.org/10.1145/3448016.3457291
1 INTRODUCTION
Tuning conguration knobs of modern database management sys-
tems (DBMS) is critical for system performance, albeit challenging.
Dierent knobs directly aect the running database performance
and jointly determine the quality of service and the resource uti-
lization of DBMS. As a common practice, to apply an appropriate
conguration for a given workload, database administrators (DBAs)
are responsible for tuning these knobs based on experience. How-
ever, in a cloud environment, manually tuning possibly tens to
hundreds of controlling knobs do not guarantee the performance
across various workloads and could not scale. Therefore, automatic
tuning becomes an appealing feature for cloud providers.
On one hand, optimizing the system performance (e.g., through-
put, latency) is critical to improving users’ experience. On the other
hand, controlling the resource utilization is a necessity from the
cloud provider’s perspective, due to the following reasons. First,
one of the goals of using cloud databases is to reduce the Total Cost
of Ownership (TCO). Maintaining a low cost is an important eco-
nomic factor to attract users, which urges to more eciently utilize
the available computing resources. Second, optimizing computing
resources such as CPU, memory, and I/O helps troubleshoot perfor-
mance bugs that cause unnecessary high utilization. High resource
Research Data Management Track Paper
[Figure 1: two panels, Throughput (txn/sec) and CPU Utilization (%), each plotted over the knobs sync_spin_loops and table_open_cache.]
Figure 1: TPS and CPU Usage for Real Workload with 2 Knobs
utilization often leads to unpredictable system hangs and resource
contention in a shared or multi-tenant environment [9, 20]. For
example, high CPU utilization is a frequent issue that affects the
availability of cloud databases [2]. Third, the throughput of real
workloads is often bounded by the request rate determined by
the clients. Thus, the request rates do not necessarily reach the
processing capacity of the DBMS. For these common application
scenarios, squeezing more throughput from the capacity is not the
goal. Meanwhile, controlling resource utilization is more valuable
for end-users, as it helps them choose appropriate cloud
instance types and avoid over-provisioning.
One challenge of tuning configuration knobs is to reduce resource
utilization while still guaranteeing the Service Level Agreement
(SLA), e.g., without violating the throughput and latency
requirements. Figure 1 plots the throughput along with CPU usage
on a real workload with 2 knobs, i.e., the number of
open tables¹ and the number of times a thread waits for the mutex
to be freed before suspending². The result shows that a wide range
of configurations with different CPU usages achieve the same
throughput. As mentioned earlier, the
throughput of real workloads is often bounded by the user request
rate. Therefore, there are opportunities to optimize resource
utilization without sacrificing the SLA. Most existing database tuning
methods [11, 19, 27, 49] mainly focus on improving the throughput
and latency without optimizing the resource usage and SLA
simultaneously. For example, iTuned [11] and OtterTune [6] use
Gaussian Processes to tune knobs to achieve only high throughput.
CDBTune [49] and QTune [27] use reinforcement learning
to train a policy model to recommend good knobs, which,
however, takes a long time to learn the model [23].
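The observation drawn from Figure 1 can be illustrated with a small sketch: among measured configurations, keep only those that meet a throughput floor and pick the one with the lowest CPU usage. The `Observation` type, the `best_feasible` helper, and every number below are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    """One replayed configuration (all values here are made up)."""
    table_open_cache: int
    sync_spin_loops: int
    tps: float       # measured throughput, txn/sec
    cpu_pct: float   # measured CPU utilization, %


def best_feasible(observations: List[Observation],
                  min_tps: float) -> Optional[Observation]:
    """Return the lowest-CPU observation whose throughput meets the SLA floor."""
    feasible = [o for o in observations if o.tps >= min_tps]
    if not feasible:
        return None
    return min(feasible, key=lambda o: o.cpu_pct)


# Hypothetical samples: three configurations reach the same request-bounded
# throughput at very different CPU cost; the fourth is cheaper but too slow.
samples = [
    Observation(1, 0, 9000.0, 80.0),
    Observation(4237, 1724, 9000.0, 45.0),
    Observation(9886, 8620, 9000.0, 30.0),
    Observation(1413, 5172, 6000.0, 20.0),
]
choice = best_feasible(samples, min_tps=9000.0)
```

The configuration with the lowest CPU usage overall is rejected because it misses the throughput floor, which is the sense in which the SLA acts as a constraint rather than as the objective.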
The other challenge is to satisfy the natural constraint imposed
by real applications, which often limit the acceptable tuning time.
Tuning systems replay the workload repeatedly to learn the model
iteratively, and the replay time dominates the tuning process. The
state-of-the-art systems [6, 49] take hundreds to thousands of
iterations to find an ideal configuration. For typical benchmarks that
assume the transaction statistics do not change over time, the
replay time can be set to 3-5 minutes [49]. But for real workloads,
we observe that the replay for each iteration takes at least
5 minutes to adapt to different types of transactions. This could
cause the total tuning time for real workloads to last for a few days.
This issue is more pronounced when considering that tuning itself
requires computing resources such as DBMS copies to replay on the
¹ MySQL knob: table_open_cache
² MySQL knob: innodb_sync_spin_loops
user side (Section 4). Thus, the tuning time should be minimized.
In addition, tuning a DBMS, e.g., to reduce high resource
utilization, can be used for online performance troubleshooting.
High utilization could have a severe impact on system availability.
From this point of view, the tuning time should match the typical
system recovery time, which is often from a few minutes to 1
hour [1]. To accelerate the tuning process by reducing the budget
to tens of iterations, ResTune utilizes the historical data collected
from tuning other tasks and transfers the experience to tuning
new tasks. This requires the tuning algorithm to efficiently and
effectively represent useful knowledge from historical tuning data.
Our Approach. Different from previous works that only consider
the throughput and latency, in this paper we define the
resource-oriented tuning problem, which aims to find configurations
that minimize the resource usage without sacrificing the throughput
and latency. We formulate it as a constrained optimization problem
and propose ResTune, a constraint-aware database tuning system
boosted by meta-learning. ResTune is a tool provided by cloud
providers that aims to reduce the Total Cost of Ownership for
their end users. It optimizes the resource utilization for a given
workload by imposing constraints on the performance requirements.
ResTune models both the objective function and the constraints
using Gaussian processes to recommend configurations with
optimized resource utilization while guaranteeing the SLA. To improve
the efficiency of ResTune, we use meta-learning, which is the
method of systematically learning from meta-data to accomplish
new tasks [44]. A novel meta-learning pipeline is proposed that uses
multiple models (base-learners) to represent prior knowledge and
an ensemble model (meta-learner) to combine and effectively utilize
the experiences. The meta-learner measures the usefulness of
base-learners to the target workload through meta-features and model
predictions. In this way, ResTune can make use of
existing data and accelerate the tuning process. Furthermore, our
approach can transfer the knowledge across different workloads and
heterogeneous hardware environments.
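To make the idea of modeling both the objective and the constraints concrete, the sketch below scores a candidate configuration with a common constrained acquisition: expected improvement on resource usage, weighted by the probability that throughput and latency stay within their limits. This is a minimal illustration, not a description of ResTune's exact acquisition function, which this section does not specify. It assumes the surrogate models return a Gaussian posterior given as a hypothetical `(mean, std)` tuple for each metric, and it treats the three metrics as independent.

```python
import math
from typing import Tuple

Posterior = Tuple[float, float]  # (mean, std) from some surrogate model


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """EI for minimization: E[max(best - f, 0)] with f ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * norm_cdf(z) + sigma * norm_pdf(z)


def prob_at_least(mu: float, sigma: float, threshold: float) -> float:
    """P(f >= threshold) for f ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return 1.0 if mu >= threshold else 0.0
    return norm_cdf((mu - threshold) / sigma)


def prob_at_most(mu: float, sigma: float, threshold: float) -> float:
    """P(f <= threshold) for f ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return 1.0 if mu <= threshold else 0.0
    return norm_cdf((threshold - mu) / sigma)


def constrained_ei(res: Posterior, tps: Posterior, lat: Posterior,
                   best_res: float, min_tps: float, max_lat: float) -> float:
    """EI on resource usage times the probability of meeting both constraints."""
    ei = expected_improvement(res[0], res[1], best_res)
    p_tps = prob_at_least(tps[0], tps[1], min_tps)
    p_lat = prob_at_most(lat[0], lat[1], max_lat)
    return ei * p_tps * p_lat
```

A candidate that is predicted to save resources but is unlikely to sustain the required throughput receives a score close to zero, so the search is steered toward the feasible region.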
Specically, we make the following contributions:
To deal with the challenges in real DBMS scenarios, we formu-
late the resource-oriented conguration tuning problem as a
constrained Bayesian Optimization problem.
To accelerate the tuning process within an acceptable time inter-
val, a meta-learning strategy is proposed to extract experience
from past tasks. Unlike previous studies, our approach uses rel-
ative rankings rather than absolute distances to measure the
similarity between workloads. It can better transfer knowledge
across dierent hardware environments and achieve fast tuning
and ecient adaptation. To the best of our knowledge, this is the
rst attempt to boost constrained Bayesian Optimization with
meta-learning for tuning DBMS.
We implement the proposed method and evaluate on standard
benchmarks and real workloads. Compared with the manual con-
gurations provided by the DBAs, ResTune reduces 65% of CPU
utilization, 87% of I/O, and 39% of memory on average. Compared
with the state-of-the-art DBMS tuning systems, ResTune nds
better congurations with up to 18× speedups.
The remainder of the paper is organized as follows. Section
2 provides the related work and Section 3 formally defines the