
Figure 1: TPS and CPU Usage for Real Workload with 2 Knobs
[Two panels plotted over the knobs sync_spin_loops (0 to 8620) and table_open_cache (1 to 9886): Throughput (txn/sec, roughly 5K to 10K) and CPU Utilization (%, roughly 15 to 90).]
utilization often leads to unpredictable system hangs and resource contention in shared or multi-tenant environments [9, 20]. For example, high CPU utilization is a frequent issue that affects the availability of cloud databases [2]. Third, the throughput of real workloads is often bounded by the request rate determined by the clients, so the request rate does not necessarily reach the processing capacity of the DBMS. For these common application scenarios, squeezing more throughput out of the available capacity is not the goal. Instead, controlling resource utilization is more valuable for end users, since it helps them choose appropriate cloud instance types and avoid over-provisioning.
One challenge of tuning configuration knobs is to reduce resource utilization while still guaranteeing the Service Level Agreement (SLA), e.g., without violating the throughput and latency requirements. Figure 1 plots the throughput along with the CPU usage of a real workload under 2 knobs: the number of open tables¹ and the number of times a thread waits for a mutex to be freed before suspending². The result shows that, although a wide range of configurations yields different CPU usage, they all achieve the same throughput. As mentioned earlier, the throughput of real workloads is often bounded by the user request rate. Therefore, there are opportunities to optimize resource utilization without sacrificing the SLA. Most existing database tuning methods [11, 19, 27, 49] mainly focus on improving throughput and latency, without optimizing resource usage and the SLA simultaneously. For example, iTuned [11] and OtterTune [6] use Gaussian Processes to tune knobs for high throughput only. CDBTune [49] and QTune [27] use reinforcement learning to train a policy model that recommends good knobs, which, however, takes a long time to learn [23].
The other challenge is to satisfy the natural constraint imposed by real applications, which often limit the allowed tuning time. Tuning systems replay the workload repeatedly to learn the model iteratively, and the replay time dominates the tuning process. State-of-the-art systems [6, 49] take hundreds to thousands of iterations to find an ideal configuration. For typical benchmarks, which assume that the transaction statistics do not change over time, the replay time can be set to 3-5 minutes [49]. For real workloads, however, we observe that each replay takes at least 5 minutes to account for the different types of transactions. This can stretch the total tuning time for real workloads to a few days. The issue is more pronounced given that tuning itself requires computing resources, such as DBMS copies for replay, on the user side (Section 4). Thus, the tuning time should be minimized.

In addition, tuning a DBMS, e.g., to reduce high resource utilization, can be used for online performance troubleshooting, since high utilization can severely impact system availability. From this point of view, the tuning time should match the typical system recovery time, which often ranges from a few minutes to 1 hour [1]. To accelerate tuning by reducing the budget to tens of iterations, ResTune utilizes historical data collected from tuning other tasks and transfers that experience to new tasks. This requires the tuning algorithm to represent useful knowledge from historical tuning data efficiently and effectively.

¹ MySQL knob: table_open_cache
² MySQL knob: innodb_sync_spin_loops
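To make the scale of the problem concrete, the arithmetic below multiplies an iteration budget by the per-replay cost. It is a rough sketch that assumes each iteration costs exactly one sequential 5-minute replay and ignores model-fitting overhead; the iteration counts are illustrative, not measurements.

```python
# Back-of-the-envelope tuning-time estimate (illustrative numbers only).
# Assumption: each iteration costs one full, sequential workload replay;
# model-fitting time is ignored because replay dominates.

REPLAY_MINUTES = 5  # lower bound per iteration observed for real workloads


def tuning_hours(iterations: int, replay_minutes: float = REPLAY_MINUTES) -> float:
    """Total wall-clock hours for `iterations` sequential replays."""
    return iterations * replay_minutes / 60.0


for iters in (30, 100, 500, 1000):
    hours = tuning_hours(iters)
    print(f"{iters:>5} iterations -> {hours:6.1f} h ({hours / 24:.1f} days)")
```

Under these assumptions, 1,000 iterations take about 83 hours (roughly 3.5 days), whereas a 1-hour recovery window leaves room for only 12 replays, which is why a budget of tens of iterations matters.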
Our Approach. Different from previous works that only consider throughput and latency, in this paper we define the resource-oriented tuning problem, which aims to find configurations that minimize resource usage without sacrificing throughput and latency. We formulate it as a constrained optimization problem and propose ResTune, a constraint-aware database tuning system boosted by meta-learning. ResTune is a tool offered by cloud providers to reduce the Total Cost of Ownership for their end users. It optimizes the resource utilization of a given workload while imposing constraints on the performance requirements.
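For reference, the resource-oriented tuning problem described above can be written as a constrained minimization. The notation is a hypothetical sketch chosen for illustration: θ is a knob configuration in the search space Θ, f_res(θ) its resource utilization, f_tps(θ) and f_lat(θ) its throughput and latency, and λ_tps, λ_lat the SLA thresholds. The formal definition in Section 3 may use different symbols.

```latex
\begin{aligned}
\theta^{*} = \arg\min_{\theta \in \Theta} \quad & f_{\mathrm{res}}(\theta) \\
\text{s.t.} \quad & f_{\mathrm{tps}}(\theta) \ge \lambda_{\mathrm{tps}}, \\
                  & f_{\mathrm{lat}}(\theta) \le \lambda_{\mathrm{lat}}.
\end{aligned}
```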
ResTune models both the objective function and the constraints using Gaussian processes to recommend configurations with optimized resource utilization while guaranteeing the SLA. To improve the efficiency of ResTune, we use meta-learning, i.e., systematically learning from meta-data to accomplish new tasks [44]. We propose a novel meta-learning pipeline that uses multiple models (base-learners) to represent prior knowledge and an ensemble model (meta-learner) to combine and effectively utilize these experiences. The meta-learner measures the usefulness of each base-learner for the target workload through meta-features and model predictions. In this way, ResTune can make use of existing data and accelerate the tuning process. Furthermore, our approach can transfer knowledge across different workloads and heterogeneous hardware environments.
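One standard way to turn Gaussian-process models of an objective and its constraints into a recommendation is the constrained expected improvement (CEI) acquisition function: the expected improvement of the objective multiplied by the probability that each constraint holds. The sketch below assumes the GP posteriors (a mean and standard deviation per metric at a candidate configuration) are already computed and mutually independent; the names `Posterior` and `constrained_ei` and all numbers are hypothetical. It illustrates the general technique rather than ResTune's exact acquisition function.

```python
import math
from dataclasses import dataclass


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


@dataclass
class Posterior:
    """GP posterior mean and standard deviation at one candidate configuration."""
    mean: float
    std: float


def expected_improvement(res: Posterior, best_feasible: float) -> float:
    """EI for *minimizing* resource usage below the best feasible value so far."""
    if res.std <= 0.0:
        return max(best_feasible - res.mean, 0.0)
    z = (best_feasible - res.mean) / res.std
    return (best_feasible - res.mean) * norm_cdf(z) + res.std * norm_pdf(z)


def prob_at_least(post: Posterior, threshold: float) -> float:
    """P[metric >= threshold], e.g. throughput meets the SLA floor."""
    if post.std <= 0.0:
        return 1.0 if post.mean >= threshold else 0.0
    return norm_cdf((post.mean - threshold) / post.std)


def prob_at_most(post: Posterior, threshold: float) -> float:
    """P[metric <= threshold], e.g. latency stays under the SLA ceiling."""
    if post.std <= 0.0:
        return 1.0 if post.mean <= threshold else 0.0
    return norm_cdf((threshold - post.mean) / post.std)


def constrained_ei(res, tps, lat, best_feasible, tps_floor, lat_ceiling):
    """CEI = EI(resource) * P(throughput OK) * P(latency OK), assuming independent GPs."""
    return (expected_improvement(res, best_feasible)
            * prob_at_least(tps, tps_floor)
            * prob_at_most(lat, lat_ceiling))


# Hypothetical posteriors: (CPU %, throughput txn/sec, latency ms) per candidate.
sla = dict(best_feasible=50.0, tps_floor=7900.0, lat_ceiling=25.0)
cand_a = (Posterior(40.0, 5.0), Posterior(8000.0, 100.0), Posterior(20.0, 2.0))
cand_b = (Posterior(30.0, 5.0), Posterior(7000.0, 300.0), Posterior(20.0, 2.0))
score_a = constrained_ei(*cand_a, **sla)
score_b = constrained_ei(*cand_b, **sla)
print(f"A: {score_a:.3f}  B: {score_b:.3f}")
```

Candidate A scores higher even though B has a lower predicted CPU usage, because B is very likely to miss the throughput floor; this is how the constraint models steer the search away from SLA violations.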
Specifically, we make the following contributions:
• To deal with the challenges in real DBMS scenarios, we formulate the resource-oriented configuration tuning problem as a constrained Bayesian Optimization problem.
• To accelerate the tuning process within an acceptable time interval, we propose a meta-learning strategy that extracts experience from past tasks. Unlike previous studies, our approach uses relative rankings rather than absolute distances to measure the similarity between workloads. It can better transfer knowledge across different hardware environments and achieve fast tuning and efficient adaptation. To the best of our knowledge, this is the first attempt to boost constrained Bayesian Optimization with meta-learning for tuning DBMSs.
• We implement the proposed method and evaluate it on standard benchmarks and real workloads. Compared with the manual configurations provided by DBAs, ResTune reduces CPU utilization by 65%, I/O by 87%, and memory by 39% on average. Compared with state-of-the-art DBMS tuning systems, ResTune finds better configurations with up to ∼18× speedup.
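The difference between ranking-based and distance-based similarity can be illustrated with a small example in the spirit of rank-weighted ensembles such as the ranking-weighted Gaussian process ensemble (RGPE): a base-learner is scored by how many pairs of target observations it orders incorrectly. This is a simplified, deterministic sketch; the helper names, the 4-configuration CPU data, and the normalized-concordance weighting are assumptions for illustration, not ResTune's actual meta-feature or weighting scheme.

```python
from itertools import combinations
from typing import Dict, Sequence


def misranked_pairs(pred: Sequence[float], observed: Sequence[float]) -> int:
    """Number of pairs (i, j) whose order under `pred` disagrees with `observed`."""
    count = 0
    for i, j in combinations(range(len(observed)), 2):
        # Misranked if prediction and observation order the pair differently.
        if (pred[i] < pred[j]) != (observed[i] < observed[j]):
            count += 1
    return count


def mean_abs_distance(pred: Sequence[float], observed: Sequence[float]) -> float:
    """Distance-based alternative: average absolute prediction error."""
    return sum(abs(p - o) for p, o in zip(pred, observed)) / len(observed)


def ranking_weights(base_preds: Dict[str, Sequence[float]],
                    observed: Sequence[float]) -> Dict[str, float]:
    """Weight each base-learner by its share of correctly ordered pairs."""
    n = len(observed)
    total_pairs = n * (n - 1) // 2
    scores = {name: total_pairs - misranked_pairs(p, observed)
              for name, p in base_preds.items()}
    norm = sum(scores.values())
    if norm == 0:  # no usable pairs yet: fall back to uniform weights
        return {name: 1.0 / len(scores) for name in scores}
    return {name: s / norm for name, s in scores.items()}


# Hypothetical CPU utilization (%) of 4 configurations on the target workload.
observed = [30.0, 45.0, 60.0, 50.0]
base_preds = {
    "other_hardware": [60.0, 90.0, 120.0, 100.0],      # 2x scale, same order
    "close_but_misranked": [32.0, 44.0, 48.0, 58.0],   # small errors, one swap
}
for name, pred in base_preds.items():
    print(name, misranked_pairs(pred, observed), mean_abs_distance(pred, observed))
print(ranking_weights(base_preds, observed))
```

The base-learner from other hardware predicts twice the target's CPU usage, so its absolute distance is large (46.25 vs. 5.75), yet it orders every pair of configurations correctly and therefore receives the larger weight. This is the intuition behind preferring relative rankings when hardware differs.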
The remainder of the paper is organized as follows. Section 2 provides the related work and Section 3 formally defines the
Research Data Management Track Paper
SIGMOD ’21, June 20–25, 2021, Virtual Event, China