Dike: A Benchmark Suite for Distributed Transactional Databases
Huidong Zhang
hdzhang@stu.ecnu.edu.cn
East China Normal University
Shanghai, China
Luyi Qu
luyiqu@stu.ecnu.edu.cn
East China Normal University
Shanghai, China
Qingshuai Wang
qswang@stu.ecnu.edu.cn
East China Normal University
Shanghai, China
Rong Zhang
rzhang@dase.ecnu.edu.cn
East China Normal University
Shanghai, China
Peng Cai
pcai@dase.ecnu.edu.cn
East China Normal University
Shanghai, China
Quanqing Xu
xuquanqing.xqq@oceanbase.com
OceanBase, AntGroup
Hangzhou, China
Zhifeng Yang
zhuweng.yzf@oceanbase.com
OceanBase, AntGroup
Hangzhou, China
Chuanhui Yang
rizhao.ych@oceanbase.com
OceanBase, AntGroup
Hangzhou, China
ABSTRACT
Distributed relational database management systems (abbr. DDBMSs) for online transaction processing (abbr. OLTP) have been gradually adopted in production environments. With many relevant products vying for the market, an unbiased benchmark is urgently needed to promote the development of transactional DDBMSs. Current benchmarks for OLTP applications have not taken into consideration the challenges encountered during the design and implementation of a transactional DDBMS, which is expected to provide high elasticity and availability as well as high throughput. We propose a benchmark suite Dike to evaluate the efforts to tackle these challenges. Dike is designed mainly from three aspects: quantitative control to evaluate scalability, imbalanced distribution to evaluate schedulability, and comprehensive fault injections to evaluate availability. It also provides dynamic load control to simulate real-world scenarios. In this demonstration, users can experience the core features of Dike through user-friendly interfaces.
CCS CONCEPTS
• Information systems → Data management systems.
KEYWORDS
Distributed Transactional Database, Benchmark Suite
ACM Reference Format:
Huidong Zhang, Luyi Qu, Qingshuai Wang, Rong Zhang, Peng Cai, Quanqing Xu, Zhifeng Yang, and Chuanhui Yang. 2023. Dike: A Benchmark Suite for Distributed Transactional Databases. In Companion of the 2023 International Conference on Management of Data (SIGMOD-Companion ’23), June 18–23, 2023, Seattle, WA, USA. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3555041.3589710

Rong Zhang is the corresponding author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SIGMOD-Companion ’23, June 18–23, 2023, Seattle, WA, USA
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9507-6/23/06...$15.00
https://doi.org/10.1145/3555041.3589710
1 INTRODUCTION
DDBMSs provide elastic and available services to handle continuing business expansion. They have devoted a lot of effort to meeting the requirements of scalability, schedulability and availability. Specifically, the database is partitioned among multiple servers for the horizontal scaling of storage and computing resources, i.e., scalability. The database migrates partitions between servers to balance storage usage and avoid performance bottlenecks caused by a single overloaded server, i.e., schedulability. The database stores multiple replicas for each partition [2] to guarantee no data loss or service interruption in the event of any system failure, i.e., availability.
Challenges arise when the database scales to distributed scenarios. Firstly, transactions involving partitions from different servers, a.k.a. distributed transactions, suffer a lot from network communication for coordination on transaction statuses [3]. Complex business logic may lead to a large number of distributed transactions. Secondly, competitive accesses to hotspot data cost more in global transaction ordering, global deadlock detection and cascading rollback, since DDBMSs have longer transaction processing paths than standalone databases. Thirdly, when the data and workload distributions among servers are imbalanced, the online database service suffers from performance degradation and service level agreement violations. Fourthly, various system failures occur frequently in the cluster, challenging DDBMSs to guarantee service quality.
Existing classic transactional benchmarks for relational databases, such as TPC-C and TPC-E, are meticulously designed in database schemas and workloads [4], but are not sufficient to cover the advanced designs of DDBMSs and evaluate the pros and cons of each DDBMS, as has been thoroughly discussed in work [1]. They cannot (1) quantify the generation of distributed transactions or contentions, (2) generate skewed data distributions or workloads to trigger data migration for DDBMSs, or (3) provide comprehensive fault injections. We propose a benchmark suite Dike, which follows the practical application scenario of TPC-C but enriches its capabilities to benchmark transactional DDBMSs. Dike is endowed with the following four representative characteristics:
• Quantitative Control: The first two challenges imply that distributed transactions and contentions block transaction execution and restrict the scalability of DDBMSs. Quantitative control of workload generation is necessary to facilitate impartial comparisons among DDBMSs.
• Imbalanced Distribution: The third challenge requires a benchmark to generate non-uniform data distributions and skewed access patterns to partitions. These features can evaluate the schedulability of DDBMSs to support data migration and balance the system resource usage of servers.
• Comprehensive Fault Injection: The fourth challenge mentions that frequent and various exceptions plague DDBMSs. Chaos testing with comprehensive fault injections can verify the robustness and availability of DDBMSs.
• Dynamic Load Variation: In real application scenarios, load distributions vary over time in both volumes and types. DDBMSs are expected to scale out (resp. down) to handle heavy (resp. light) loads to guarantee performance at low cost.
In this demonstration, we present Dike from two aspects: (1) special designs to benchmark the scalability, availability and schedulability of transactional DDBMSs, and (2) interactive user interfaces to operate Dike, with benchmark results to verify the effectiveness of Dike.
2 DESIGN OVERVIEW OF DIKE
Figure 1: Dike Architecture
The system architecture of Dike is depicted in Figure 1. The control flow is drawn in double solid lines, which mainly includes JDBC commands and the quantitative, imbalanced, and dynamic control of the benchmark process. The data flow is drawn in double dashed lines, mainly including statistical results.
Dike is a standalone benchmark suite deployed on multiple clients for scalable workload generation. Three common components, grayed out in the figure, provide support for the benchmark process. Configuration Context parses control parameters for the user-preferred scenario. Probabilistic Lib implements probabilistic distribution models and quantitative control algorithms for database and workload generation. JDBC Connection Pool maintains connections to DDBMSs and works in a MySQL- or PostgreSQL-compatible way.
From the perspective of the benchmark process, the workflow can be divided into three stages, i.e., the preparation stage (marked in blue), the execution stage (marked in red) and the report stage (marked in green). In the preparation stage, Schema Generator creates TPC-C style database schemas based on partitioning and data placement strategies. Database Generator populates the database and controls data volume. In the execution stage, Workload Executer controls transaction proportions, instantiates transaction templates and interacts with DDBMSs via JDBC connections. Workload Controller is responsible for the variation of load patterns and sends control messages to Workload Executer, e.g., dynamic load volume. In the report stage, Statistics Collector receives system resource metrics from Resource Monitor and transaction traces from Workload Executer, and finally generates benchmark reports. The database and workload are generated in a multi-threaded mode for efficiency.
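The parameters flowing through these three stages can be pictured with a small configuration sketch. All key names below are illustrative assumptions for exposition, not Dike's actual configuration format:

```python
# Hypothetical sketch of the control parameters the three stages consume.
# Every key name here is an assumption, not Dike's real configuration schema.
dike_config = {
    "preparation": {
        "warehouses": 1000,       # data volume scales with Warehouse cardinality
        "partition_key": "wid",   # partition by warehouse identifier
        "placement": "uniform",   # or a skewed data placement strategy
    },
    "execution": {
        "txn_mix": {"NewOrder": 0.45, "Payment": 0.43, "Others": 0.12},
        "cross_servers": 2,       # target n for quantitative distributed txns
        "load_pattern": "dynamic",  # load volume/type variation over time
    },
    "report": {
        "collect_resource_metrics": True,
        "collect_txn_traces": True,
    },
}

def validate(cfg):
    """Minimal sanity check: transaction proportions must sum to 1."""
    mix = cfg["execution"]["txn_mix"]
    assert abs(sum(mix.values()) - 1.0) < 1e-9
    return True

assert validate(dike_config)
```

A real deployment would pass such a file to Configuration Context, which then drives Database Generator and Workload Controller accordingly.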
The system under test (abbr. SUT) is a DDBMS, which may be deployed with a Load Balancer, such as OBProxy for OceanBase [5] and HAProxy for CockroachDB. Dike deploys Resource Monitor on each server to record the utilization of system resources for performance diagnosis.
We illustrate our designs in the following four aspects, i.e., how to generate (1) quantitative distributed transactions, (2) quantitative contentions, (3) non-uniform data distributions and skewed partition access patterns, and (4) various types of exceptions.
2.1 Quantitative Distributed Transaction
Distributed transactions suer from high latency for acquiring data
from remote servers, and adopting atomic commit protocols to
achieve a consensus on all updates among servers.
The classic OLTP benchmark TPC-C generates distributed transactions by accessing data from different warehouses with a low, unquantifiable probability [4]. Quantitative control of distributed transactions, which fixes the number of cross-servers, can quantify the impact on database throughput and enable fair comparisons among DDBMSs.
Most tables in TPC-C, except for Item, are closely associated with Warehouse. Data scales with the cardinality of Warehouse [4] and is partitioned by the unique identifier of each warehouse, i.e., wid, which is the same for Dike. Specifically, transaction NewOrder in TPC-C simulates the behavior of product ordering, which mainly involves products supplied by the local warehouse. Dike extends the logic of NewOrder by controlling the products from remote warehouses. It generates distributed transactions by updating table Stock of each warehouse in different servers.
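The extension above can be sketched as follows. The warehouse-to-server mapping and helper names are illustrative assumptions, not Dike's actual implementation:

```python
import random

def pick_warehouses(server_of, home_wid, c, seed=42):
    """Select c distinct warehouses (including the home one) whose Stock
    rows a single extended NewOrder will update. Warehouses residing on
    servers other than the home server make the transaction distributed."""
    rng = random.Random(seed)
    remote = [w for w in server_of if w != home_wid]
    return [home_wid] + rng.sample(remote, c - 1)

# Illustrative mapping of warehouse id -> server id (8 warehouses, 4 servers).
server_of = {wid: wid % 4 for wid in range(8)}

wids = pick_warehouses(server_of, home_wid=0, c=3)
touched_servers = {server_of[w] for w in wids}
# The NewOrder is distributed iff it touches more than one server.
assert len(wids) == 3 and len(set(wids)) == 3
```

With a real partitioning scheme, `server_of` would come from the SUT's data placement metadata rather than a modulo rule.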
Based on the access probability, we formalize the transaction distribution as follows. Suppose the number of servers in the cluster is $N$, the target number of cross-servers is $n$, and the number of distinct warehouses is $c$. The probability that the data of a warehouse resides on the $i$-th server is $p_i$, with $\sum_{i=1}^{N} p_i = 1$. The probabilities that NewOrder does not visit and does visit the $i$-th server are denoted as $P(x_i = 0)$ and $P(x_i = 1)$ respectively, which are calculated in Equation 1.
$$P(x_i = 0) = (1 - p_i)^c;\qquad P(x_i = 1) = 1 - (1 - p_i)^c \tag{1}$$
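Equation 1 can be sanity-checked with a small simulation: assign each of the $c$ selected warehouses to a server drawn with probabilities $p_i$ and estimate how often server $i$ is visited. This sketch assumes independent warehouse placement, which the closed form also assumes:

```python
import random

def visit_probability(p, i, c, trials=200_000, seed=7):
    """Monte Carlo estimate of P(x_i = 1): the chance that at least one
    of the c chosen warehouses resides on server i."""
    rng = random.Random(seed)
    servers = list(range(len(p)))
    hits = 0
    for _ in range(trials):
        placement = rng.choices(servers, weights=p, k=c)
        if i in placement:
            hits += 1
    return hits / trials

p = [0.4, 0.3, 0.2, 0.1]   # example placement probabilities, summing to 1
c = 3                       # warehouses touched by one NewOrder
est = visit_probability(p, 0, c)
closed_form = 1 - (1 - p[0]) ** c   # Equation 1 for server 0
assert abs(est - closed_form) < 0.01
```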
To make the expectation of distinct servers visited by NewOrder close to the target $n$, we formalize the quantitative distribution control problem by Equation 2. Given $N$ and $n$, $c$ can be calculated through Probabilistic Lib. Workload Executer selects $c$ distinct warehouses for NewOrder to instantiate transaction templates to create distributed transactions.
$$E(x) = \sum_{i=1}^{N} E(x_i) = N - \sum_{i=1}^{N} (1 - p_i)^c = n \tag{2}$$
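The $c$ that such a library would compute can be sketched by solving Equation 2 numerically. Since $E(x)$ grows monotonically with $c$, bisection over a real-valued $c$ suffices; rounding it to an integer warehouse count is an implementation choice we assume here:

```python
import math

def expected_servers(p, c):
    """E(x) from Equation 2 for a (possibly real-valued) warehouse count c."""
    return len(p) - sum((1 - pi) ** c for pi in p)

def solve_c(p, n, hi=1e6, eps=1e-9):
    """Bisection for the c satisfying E(x) = n; E(x) is monotone in c."""
    lo_c, hi_c = 0.0, hi
    while hi_c - lo_c > eps:
        mid = (lo_c + hi_c) / 2
        if expected_servers(p, mid) < n:
            lo_c = mid
        else:
            hi_c = mid
    return lo_c

# Uniform placement on N = 4 servers, target n = 2 cross-servers:
p = [0.25] * 4
c = solve_c(p, 2.0)
assert abs(expected_servers(p, c) - 2.0) < 1e-6
# Closed form in the uniform case: 4 - 4*(3/4)^c = 2, so c = log(1/2)/log(3/4)
assert abs(c - math.log(0.5) / math.log(0.75)) < 1e-6
```

Skewed $p_i$ simply changes the sum in `expected_servers`; the monotonicity, and hence the bisection, still holds.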