• Quantitative Control: The first two challenges imply that distributed transactions and contentions block transaction execution and restrict the scalability of DDBMSs. Quantitative control of workload generation is necessary to facilitate impartial comparisons among DDBMSs.
• Imbalanced Distribution: The third challenge requires a benchmark to generate non-uniform data distributions and skewed access patterns to partitions. These features can evaluate the schedulability of DDBMSs to support data migration and balance the system resource usage of servers.
• Comprehensive Fault Injection: The fourth challenge notes that frequent and varied exceptions plague DDBMSs. Chaos testing with comprehensive fault injection can verify the robustness and availability of DDBMSs.
• Dynamic Load Variation: In real application scenarios, load distributions vary over time in both volume and type. DDBMSs are expected to scale out (resp. down) to handle heavy (resp. light) loads, guaranteeing performance at low cost.
In this demonstration, we present Dike from two aspects: (1) special designs to benchmark the scalability, availability and schedulability of transactional DDBMSs, and (2) interactive user interfaces to operate Dike, together with benchmark results that verify the effectiveness of Dike.
2 DESIGN OVERVIEW OF DIKE
Figure 1: Dike Architecture
The system architecture of Dike is depicted in Figure 1. The control flow, drawn in double solid lines, mainly includes JDBC commands and the quantitative, imbalanced, and dynamic control of the benchmark process. The data flow, drawn in double dashed lines, mainly includes statistical results.
Dike is a standalone benchmark suite deployed on multiple clients for scalable workload generation. Three common components (grayed out) support the benchmark process. Configuration Context parses control parameters for the user-preferred scenario. Probabilistic Lib implements probabilistic distribution models and quantitative control algorithms for database and workload generation. JDBC Connection Pool maintains connections to DDBMSs and works in a MySQL- or PostgreSQL-compatible way.
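To make the role of Configuration Context concrete, the sketch below shows the kind of control parameters a Dike-style run might accept; all key names and values here are hypothetical illustrations, not Dike's actual configuration format.

```python
# Hypothetical control parameters for a Dike-style benchmark run.
# Key names and values are illustrative assumptions, not Dike's real configuration keys.
benchmark_config = {
    "jdbc_url": "jdbc:mysql://sut-proxy:3306/dike",  # MySQL- or PostgreSQL-compatible endpoint
    "warehouses": 1000,                              # data volume (TPC-C style scaling)
    "target_cross_servers": 3,                       # target number of servers a distributed transaction spans
    "contention_level": 0.2,                         # knob for quantitative contention
    "partition_skew": "zipfian",                     # non-uniform data / access distribution
    "fault_injections": ["server_crash", "network_partition"],
    "load_phases": [("low", 300), ("peak", 600)],    # dynamic load variation: (phase, duration in seconds)
}
```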
From the perspective of the benchmark process, the workflow can be divided into three stages, i.e., the preparation stage (marked in blue), the execution stage (marked in red) and the report stage (marked in green). In the preparation stage, Schema Generator creates TPC-C style database schemas based on partitioning and data placement strategies. Database Generator populates the database and controls the data volume. In the execution stage, Workload Executer controls transaction proportions, instantiates transaction templates and interacts with DDBMSs via JDBC connections. Workload Controller is responsible for the variation of load patterns and sends control messages to Workload Executer, e.g., the dynamic load volume. In the report stage, Statistics Collector receives system resource metrics from Resource Monitor and transaction traces from Workload Executer, and finally generates benchmark reports. The database and workload are generated in a multi-threaded mode for efficiency.
The system under test (abbr. SUT) is a DDBMS, which may be
deployed with Load Balancer, such as OBProxy for OceanBase [5] and HAProxy for CockroachDB. Dike deploys Resource Monitor
on each server to record the utilization of system resources for
performance diagnosis.
We illustrate our designs in the following four aspects: how to generate (1) quantitative distributed transactions, (2) quantitative contentions, (3) non-uniform data distributions and skewed partition access patterns, and (4) various types of exceptions.
2.1 Quantitative Distributed Transaction
Distributed transactions suffer from high latency because they acquire data from remote servers and adopt atomic commit protocols to reach consensus on all updates among servers.
The classic OLTP benchmark TPC-C generates distributed transactions by accessing data from different warehouses with a low, unquantifiable probability [4]. Quantitative control of distributed transactions, i.e., fixing the number of servers each transaction crosses, makes it possible to quantify the impact on database throughput and enables fair comparisons among DDBMSs.
Most tables in TPC-C, except for Item, are closely associated with Warehouse. The data scales with the cardinality of Warehouse [4] and is partitioned by the unique identifier of each warehouse, i.e., wid; Dike follows the same scheme. Specifically, transaction NewOrder in TPC-C simulates the behavior of product ordering and mainly involves products supplied by the local warehouse. Dike extends the logic of NewOrder by controlling the products drawn from remote warehouses. It generates distributed transactions by updating table Stock of warehouses located on different servers, as sketched below.
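The following minimal sketch illustrates how such a distributed NewOrder could be instantiated once the number of distinct warehouses c has been chosen (its derivation is formalized next); the function name, signature and line-assignment strategy are our own assumptions, not Dike's actual implementation.

```python
import random

def pick_supplying_warehouses(all_wids, home_wid, c, num_order_lines=10):
    """Choose a supplying warehouse for each NewOrder line so that exactly c
    distinct warehouses appear (the home warehouse plus c-1 remote ones), which
    makes the transaction update table Stock on several servers.
    Assumes c <= num_order_lines."""
    remote_wids = random.sample([w for w in all_wids if w != home_wid], c - 1)
    distinct = [home_wid] + remote_wids
    # Use every chosen warehouse at least once, then fill the remaining
    # order lines with arbitrary picks from the same set.
    lines = distinct + [random.choice(distinct) for _ in range(num_order_lines - c)]
    random.shuffle(lines)
    return lines

# Example: 1,000 warehouses, home warehouse 7, a transaction spanning 3 distinct warehouses.
print(pick_supplying_warehouses(list(range(1, 1001)), home_wid=7, c=3))
```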
Based on the access probability, we formalize the transaction distribution as follows. Suppose the number of servers in the cluster is $N$, the target number of cross-servers is $n$ and the number of distinct warehouses is $c$. The probability that the data of a warehouse resides on the $i$-th server is $p_i$, with $\sum_{i=1}^{N} p_i = 1$. Whether NewOrder visits the $i$-th server or not is denoted as $P(x_i = 0)$ and $P(x_i = 1)$ respectively, which are calculated in Equation 1.

$$P(x_i = 0) = (1 - p_i)^c; \qquad P(x_i = 1) = 1 - (1 - p_i)^c \tag{1}$$
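For instance, under a uniform placement ($p_i = 1/N$) with $N = 4$ servers and $c = 3$ warehouses, Equation 1 gives $P(x_i = 1) = 1 - (3/4)^3 \approx 0.58$ for each server.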
To make the expected number of distinct servers visited by NewOrder close to the target $n$, we formalize the quantitative distribution control problem by Equation 2. Given $N$ and $n$, $c$ can be calculated through Probabilistic Lib. Workload Executer then selects $c$ distinct warehouses for NewOrder to instantiate transaction templates and create distributed transactions.

$$E(x) = \sum_{i=1}^{N} E(x_i) = N - \sum_{i=1}^{N} (1 - p_i)^c = n \tag{2}$$
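Since $E(x)$ increases monotonically with $c$, Equation 2 can be solved by a simple search. The sketch below illustrates one way Probabilistic Lib could derive $c$; the function name and search strategy are our own assumptions rather than Dike's actual algorithm.

```python
from typing import Sequence

def solve_warehouse_count(N: int, n: float, p: Sequence[float], c_max: int = 10_000) -> int:
    """Find the warehouse count c whose expected number of distinct servers,
    E(x) = N - sum_i (1 - p_i)^c, is closest to the target n (Equation 2).
    E(x) grows monotonically with c, so a linear scan suffices."""
    assert abs(sum(p) - 1.0) < 1e-9 and 0 < n <= N
    best_c, best_gap = 1, float("inf")
    for c in range(1, c_max + 1):
        expected = N - sum((1.0 - p_i) ** c for p_i in p)
        gap = abs(expected - n)
        if gap < best_gap:
            best_c, best_gap = c, gap
        if expected >= n:  # E(x) only keeps increasing beyond this point
            break
    return best_c

# Example: 4 servers with uniform placement and a target of 3 distinct servers.
print(solve_warehouse_count(N=4, n=3, p=[0.25, 0.25, 0.25, 0.25]))  # -> 5
```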