
DBMind: A Self-Driving Platform in openGauss
Xuanhe Zhou§∗, Lianyuan Jin§∗, Ji Sun§, Xinyang Zhao§, Xiang Yu§, Jianhua Feng§,
Shifu Li♣, Tianqing Wang♣, Kun Li♣, Luyang Liu♣
§ Department of Computer Science, Tsinghua University, Beijing, China
♣ Gauss Department, Huawei Company, Beijing, China
{zhouxuan19,jinly20,sun-j16,xy-zhao20,x-yu17}@mails.tsinghua.edu.cn, fengjh@tsinghua.edu.cn,
{lishifu,wangtianqing2,likun75,liuluyang2}@huawei.com
ABSTRACT
We demonstrate a self-driving system, DBMind, which provides three
autonomous capabilities in the database: self-monitoring, self-diagnosis,
and self-optimization. First, self-monitoring judiciously collects
database metrics and detects anomalies (e.g., slow queries and IO
contention), profiling database status while only slightly affecting
system performance (<5%). Then, self-diagnosis utilizes an LSTM model to
analyze the anomalies and automatically identifies their root causes from
a pre-defined failure hierarchy. Next, self-optimization automatically
optimizes database performance using learning-based techniques, including
deep reinforcement learning based knob tuning, reinforcement learning
based index selection, and encoder-decoder based view selection. We have
implemented DBMind in the open source database openGauss and demonstrated
real scenarios.
PVLDB Reference Format:
Xuanhe Zhou, Lianyuan Jin, Ji Sun, Xinyang Zhao, Xiang Yu, Jianhua Feng,
Shifu Li, Tianqing Wang, Kun Li, Luyang Liu. DBMind: A Self-Driving
Platform in openGauss. PVLDB, 14(12): 2743 - 2746, 2021.
doi:10.14778/3476311.3476334
1 INTRODUCTION
Traditional databases rely on DBAs to diagnose and optimize the
databases in order to meet high-performance requirements. However,
these manual methods cannot keep up with the rapid growth of users,
data, and workloads, which calls for a self-driving database management
platform that automatically monitors, diagnoses, and optimizes
databases. For example, suppose a cloud database provider maintains
100,000 database instances and one DBA can manage 100 database
instances. The provider then needs one thousand DBAs to maintain these
instances. To make things worse, some tricky problems (e.g., disk crash)
require DBAs to spend hours tracing the problem and recovering the
database.
Existing databases mainly have three limitations [3, 13]. First, there
are hundreds of system metrics, and current databases cannot efficiently
detect anomalies (e.g., slow queries, IO contention) and potential risks
(e.g., insufficient disk space) with basic statistical methods. Besides,
it is expensive to rely on DBAs to analyze large-scale statistical data,
especially for cloud databases with millions of instances. Second,
existing databases cannot automatically diagnose the root causes of the
detected anomalies, because there are numerous highly correlated
database modules and it is laborious to rely on experts to label the
anomaly cases. Third, the optimization techniques (e.g., query rewrite,
index suggestion) in current databases are mainly heuristic and may find
sub-optimal solutions in complex scenarios. For example, for a nested
query, openGauss creates a temporary table for the uncorrelated subquery
but cannot consider the optimization within the subquery.
This work is licensed under the Creative Commons BY-NC-ND 4.0 International
License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of
this license. For any use beyond those covered by this license, obtain permission by
emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights
licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 14, No. 12 ISSN 2150-8097.
To address these challenges, we propose learning-based techniques, build
a self-driving database platform, DBMind, and demonstrate the following
features (Figure 1).
(1) Self-monitoring monitors and collects information about database
instances, including (i) OS resource metrics, (ii) database status
metrics, and (iii) log alarm metrics. It monitors each database instance
and stores the collected information in a storage system on the user
side or in time-series databases integrated on the server side.
Moreover, self-monitoring detects anomalies from the time-series data by
(i) identifying abnormal indicators with the spectral residual
algorithm [5] and (ii) utilizing prediction algorithms (e.g., graph
neural networks [14]) to predict future risks (e.g., slow queries,
resource anomalies, performance degradation, and security anomalies).
(2) Self-diagnosis trains an LSTM model to learn root causes from both
normal and abnormal data. Besides, it constructs a failure hierarchy
(which organizes failure categories and subcategories into a hierarchy)
to store representative metrics and root causes. For any abnormal data,
we compute an abnormal vector with the Kolmogorov-Smirnov test and match
it against the failure hierarchy to identify the root cause.
(3) Self-optimization proposes learning-based techniques to optimize the
databases, including reinforcement learning techniques for index
recommendation, deep reinforcement learning techniques for knob
tuning [4, 12], and an encoder-decoder model for materialized view
recommendation [2].
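To make the anomaly detection step of feature (1) more concrete, the following is a minimal sketch of the spectral residual idea applied to a single metric series. It is not the DBMind implementation: the function names, the trailing-mean smoothing, the window size, and the threshold `tau` are all assumptions chosen for illustration, and a naive DFT is used only to keep the example free of dependencies.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stdlib only, fine for short windows)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            v * cmath.exp(sign * 2j * math.pi * k * t / n)
            for t, v in enumerate(values)
        )
        out.append(total / n if inverse else total)
    return out


def trailing_mean(values, window):
    """Mean of each value and up to window-1 values before it."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def saliency_map(series, window=3):
    """Transform a metric series so that unexpected points stand out."""
    spectrum = dft(series)
    amplitude = [abs(c) for c in spectrum]
    log_amp = [math.log(a + 1e-8) for a in amplitude]
    # Spectral residual: log amplitude minus its smoothed version.
    residual = [l - m for l, m in zip(log_amp, trailing_mean(log_amp, window))]
    # Keep the original phase, but use exp(residual) as the new amplitude.
    rebuilt = [
        math.exp(r) * (c / a) if a > 0 else 0j
        for r, c, a in zip(residual, spectrum, amplitude)
    ]
    return [abs(c) for c in dft(rebuilt, inverse=True)]


def detect_anomalies(series, window=3, tau=3.0):
    """Flag indices whose saliency exceeds the mean saliency by a factor tau."""
    saliency = saliency_map(series, window)
    mean = sum(saliency) / len(saliency)
    return [i for i, s in enumerate(saliency) if (s - mean) / mean > tau]


# Hypothetical metric: flat CPU usage with a single spike at index 20.
cpu_usage = [10.0] * 32
cpu_usage[20] = 50.0
print(detect_anomalies(cpu_usage))
```

In practice the threshold and window would need tuning per metric, and a production system would use an FFT rather than the quadratic transform shown here.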
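The matching step of feature (2) can be sketched as follows. This is only an illustration under stated assumptions: the metric names, the binary encoding of the abnormal vector, the threshold on the Kolmogorov-Smirnov statistic, the contents of `FAILURE_HIERARCHY`, and the Hamming-distance matching are all hypothetical, and the LSTM model mentioned above is not covered.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for point in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, point) / len(a)
        cdf_b = bisect_right(b, point) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def abnormal_vector(normal, abnormal, metrics, threshold=0.5):
    """One entry per metric: 1 if its distribution shifted, else 0 (assumed encoding)."""
    return [
        1 if ks_statistic(normal[m], abnormal[m]) > threshold else 0
        for m in metrics
    ]


# Hypothetical hierarchy: category -> subcategory -> representative vector.
METRICS = ["cpu_usage", "io_wait", "lock_waits"]
FAILURE_HIERARCHY = {
    "resource": {"cpu_saturation": [1, 0, 0], "io_contention": [0, 1, 0]},
    "workload": {"lock_contention": [0, 0, 1]},
}


def match_root_cause(vector, hierarchy):
    """Return the (category, subcategory) whose vector is closest in Hamming distance."""
    best = None
    for category, subcategories in hierarchy.items():
        for subcategory, signature in subcategories.items():
            distance = sum(x != y for x, y in zip(vector, signature))
            if best is None or distance < best[0]:
                best = (distance, category, subcategory)
    return best[1], best[2]


normal = {
    "cpu_usage": [20, 22, 21, 19, 23],
    "io_wait": [1, 2, 1, 2, 1],
    "lock_waits": [0, 1, 0, 1, 0],
}
abnormal = {
    "cpu_usage": [21, 20, 22, 23, 19],
    "io_wait": [30, 35, 40, 32, 38],
    "lock_waits": [0, 1, 0, 0, 1],
}
vector = abnormal_vector(normal, abnormal, METRICS)
print(vector, match_root_cause(vector, FAILURE_HIERARCHY))
```

Only `io_wait` changes its distribution in this toy data, so the vector singles out that metric and the closest entry in the hypothetical hierarchy is an IO-related subcategory.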
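As a rough illustration of reinforcement learning for index recommendation in feature (3), the sketch below uses tabular Q-learning over a toy cost model. This is a simplified stand-in, not the technique used in DBMind: the candidate indexes, the workload, the cost constants, and all hyperparameters are hypothetical, and a real system would obtain costs from the optimizer or from execution feedback.

```python
import random

# Hypothetical candidate indexes and workload: (column the query filters on, base cost).
CANDIDATES = ["orders.customer_id", "orders.created_at", "items.order_id"]
WORKLOAD = [
    ("orders.customer_id", 100.0),
    ("orders.customer_id", 80.0),
    ("items.order_id", 60.0),
    ("orders.created_at", 10.0),
]
INDEX_SPEEDUP = 0.1       # assumed: an indexed query costs 10% of its base cost
INDEX_MAINTENANCE = 5.0   # assumed: fixed upkeep cost per index


def workload_cost(indexes):
    """Toy cost model: query costs under the given indexes plus index upkeep."""
    query_cost = sum(
        cost * (INDEX_SPEEDUP if column in indexes else 1.0)
        for column, cost in WORKLOAD
    )
    return query_cost + INDEX_MAINTENANCE * len(indexes)


def recommend_indexes(budget=2, episodes=200, alpha=0.5, gamma=0.9,
                      epsilon=0.2, seed=0):
    """Q-learning where a state is the set of chosen indexes and an action adds one."""
    rng = random.Random(seed)
    q_table = {}
    best = frozenset()
    for _ in range(episodes):
        state = frozenset()
        while len(state) < budget:
            actions = [c for c in CANDIDATES if c not in state]
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q_table.get((state, a), 0.0))
            next_state = state | {action}
            # Reward is the cost reduction achieved by adding this index.
            reward = workload_cost(state) - workload_cost(next_state)
            future = 0.0
            if len(next_state) < budget:
                future = max(
                    q_table.get((next_state, a), 0.0)
                    for a in CANDIDATES if a not in next_state
                )
            old = q_table.get((state, action), 0.0)
            q_table[(state, action)] = old + alpha * (reward + gamma * future - old)
            state = next_state
            # Remember the cheapest configuration seen so far.
            if workload_cost(state) < workload_cost(best):
                best = state
    return best


chosen = recommend_indexes()
print(sorted(chosen), workload_cost(chosen))
```

The state space here is small enough for a lookup table; with many candidate indexes or continuous knobs, a learned value or policy network would replace `q_table`, which is the motivation for the deep reinforcement learning variants mentioned above.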
DBMind differs from existing database systems in two main aspects:
(1) DBMind designs effective learned methods to realize self-monitoring,
self-diagnosis, and self-optimization; (2) DBMind is integrated into the
open source database openGauss and achieves both high usability and
robustness. Experiments on real datasets have verified that DBMind can
quickly discover slow SQL statements, give optimization suggestions in
real time, save over 80% of DBA time, and identify and resolve potential
risks (e.g., disk crash).
∗ These authors contributed equally to this work.