• Quantitative Control: The first two challenges imply that distributed transactions and contentions block transaction execution and restrict the scalability of DDBMSs. Quantitative control of workload generation is necessary to facilitate impartial comparisons among DDBMSs.
• Imbalanced Distribution: The third challenge requires a benchmark to generate non-uniform data distributions and skewed access patterns to partitions. These features can evaluate the schedulability of DDBMSs to support data migration and balance the system resource usage of servers.
• Comprehensive Fault Injection: The fourth challenge notes that frequent and varied exceptions plague DDBMSs. Chaos testing with comprehensive fault injection can verify the robustness and availability of DDBMSs.
• Dynamic Load Variation: In real application scenarios, load distributions vary over time in both volume and type. DDBMSs are expected to scale out (resp. down) to handle heavy (resp. light) loads, guaranteeing performance at low cost.
In this demonstration, we present Dike from two aspects: (1) special designs to benchmark the scalability, availability and schedulability of transactional DDBMSs, and (2) interactive user interfaces to operate Dike, together with benchmark results that verify the effectiveness of Dike.
2 DESIGN OVERVIEW OF DIKE
Figure 1: Dike Architecture
The system architecture of Dike is depicted in Figure 1. The control flow, drawn in double solid lines, mainly includes JDBC commands and the quantitative, imbalanced, and dynamic control of the benchmark process. The data flow, drawn in double dashed lines, mainly includes statistical results.
Dike is a standalone benchmark suite deployed on multiple clients for scalable workload generation. Three common components (grayed out) support the benchmark process. Configuration Context parses control parameters for the user-preferred scenario. Probabilistic Lib implements probabilistic distribution models and quantitative control algorithms for database and workload generation. JDBC Connection Pool maintains connections to DDBMSs and works in a MySQL- or PostgreSQL-compatible way.
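To make the role of Configuration Context concrete, the sketch below shows the kind of control parameters a Dike-style run might accept; all key names and values here are hypothetical illustrations, not Dike's actual configuration format.

```python
# Hypothetical control parameters for a Dike-style benchmark run.
# Key names and values are illustrative assumptions, not Dike's real configuration keys.
benchmark_config = {
    "jdbc_url": "jdbc:mysql://sut-proxy:3306/dike",  # MySQL- or PostgreSQL-compatible endpoint
    "warehouses": 1000,                              # data volume (TPC-C style scaling)
    "target_cross_servers": 3,                       # target number of servers a distributed transaction spans
    "contention_level": 0.2,                         # knob for quantitative contention
    "partition_skew": "zipfian",                     # non-uniform data / access distribution
    "fault_injections": ["server_crash", "network_partition"],
    "load_phases": [("low", 300), ("peak", 600)],    # dynamic load variation: (phase, duration in seconds)
}
```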
From the perspective of the benchmark process, the workflow can be divided into three stages, i.e., the preparation stage (marked in blue), the execution stage (marked in red) and the report stage (marked in green). In the preparation stage, Schema Generator creates TPC-C style database schemas based on partitioning and data placement strategies. Database Generator populates the database and controls the data volume. In the execution stage, Workload Executer controls transaction proportions, instantiates transaction templates and interacts with DDBMSs via JDBC connections. Workload Controller is responsible for the variation of load patterns and sends control messages to Workload Executer, e.g., the dynamic load volume. In the report stage, Statistics Collector receives system resource metrics from Resource Monitor and transaction traces from Workload Executer, and finally generates benchmark reports. The database and workload are generated in a multi-threaded mode for efficiency.
The system under test (abbr. SUT) is a DDBMS, which may be
deployed with Load Balancer, such as OBProxy for OceanBase [5] and HAProxy for CockroachDB. Dike deploys Resource Monitor
on each server to record the utilization of system resources for
performance diagnosis.
We illustrate our designs in the following four aspects: how to generate (1) quantitative distributed transactions, (2) quantitative contentions, (3) non-uniform data distributions and skewed partition access patterns, and (4) various types of exceptions.
2.1 Quantitative Distributed Transaction
Distributed transactions suffer from high latency because they acquire data from remote servers and adopt atomic commit protocols to reach consensus on all updates among servers.
The classic OLTP benchmark TPC-C generates distributed transactions by accessing data from different warehouses with a low, unquantifiable probability [4]. Quantitative control of distributed transactions, i.e., fixing the number of servers each transaction crosses, makes it possible to quantify the impact on database throughput and enables fair comparisons among DDBMSs.
Most tables in TPC-C, except for Item, are closely associated with Warehouse. The data scales with the cardinality of Warehouse [4] and is partitioned by the unique identifier of each warehouse, i.e., wid; Dike follows the same scheme. Specifically, transaction NewOrder in TPC-C simulates the behavior of product ordering and mainly involves products supplied by the local warehouse. Dike extends the logic of NewOrder by controlling the products drawn from remote warehouses. It generates distributed transactions by updating table Stock of warehouses located on different servers, as sketched below.
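The following minimal sketch illustrates how such a distributed NewOrder could be instantiated once the number of distinct warehouses c has been chosen (its derivation is formalized next); the function name, signature and line-assignment strategy are our own assumptions, not Dike's actual implementation.

```python
import random

def pick_supplying_warehouses(all_wids, home_wid, c, num_order_lines=10):
    """Choose a supplying warehouse for each NewOrder line so that exactly c
    distinct warehouses appear (the home warehouse plus c-1 remote ones), which
    makes the transaction update table Stock on several servers.
    Assumes c <= num_order_lines."""
    remote_wids = random.sample([w for w in all_wids if w != home_wid], c - 1)
    distinct = [home_wid] + remote_wids
    # Use every chosen warehouse at least once, then fill the remaining
    # order lines with arbitrary picks from the same set.
    lines = distinct + [random.choice(distinct) for _ in range(num_order_lines - c)]
    random.shuffle(lines)
    return lines

# Example: 1,000 warehouses, home warehouse 7, a transaction spanning 3 distinct warehouses.
print(pick_supplying_warehouses(list(range(1, 1001)), home_wid=7, c=3))
```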
Based on the access probability, we formalize the transaction distribution as follows. Suppose the number of servers in the cluster is $N$, the target number of cross-servers is $n$ and the number of distinct warehouses is $c$. The probability that the data of a warehouse resides on the $i$-th server is $p_i$, with $\sum_{i=1}^{N} p_i = 1$. Whether NewOrder visits the $i$-th server or not is denoted as $P(x_i = 0)$ and $P(x_i = 1)$ respectively, which are calculated in Equation 1.

$$P(x_i = 0) = (1 - p_i)^c; \qquad P(x_i = 1) = 1 - (1 - p_i)^c \tag{1}$$
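For instance, under a uniform placement ($p_i = 1/N$) with $N = 4$ servers and $c = 3$ warehouses, Equation 1 gives $P(x_i = 1) = 1 - (3/4)^3 \approx 0.58$ for each server.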
To make the expected number of distinct servers visited by NewOrder close to the target $n$, we formalize the quantitative distribution control problem by Equation 2. Given $N$ and $n$, $c$ can be calculated through Probabilistic Lib. Workload Executer then selects $c$ distinct warehouses for NewOrder to instantiate transaction templates and create distributed transactions.

$$E(x) = \sum_{i=1}^{N} E(x_i) = N - \sum_{i=1}^{N} (1 - p_i)^c = n \tag{2}$$
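Since $E(x)$ increases monotonically with $c$, Equation 2 can be solved by a simple search. The sketch below illustrates one way Probabilistic Lib could derive $c$; the function name and search strategy are our own assumptions rather than Dike's actual algorithm.

```python
from typing import Sequence

def solve_warehouse_count(N: int, n: float, p: Sequence[float], c_max: int = 10_000) -> int:
    """Find the warehouse count c whose expected number of distinct servers,
    E(x) = N - sum_i (1 - p_i)^c, is closest to the target n (Equation 2).
    E(x) grows monotonically with c, so a linear scan suffices."""
    assert abs(sum(p) - 1.0) < 1e-9 and 0 < n <= N
    best_c, best_gap = 1, float("inf")
    for c in range(1, c_max + 1):
        expected = N - sum((1.0 - p_i) ** c for p_i in p)
        gap = abs(expected - n)
        if gap < best_gap:
            best_c, best_gap = c, gap
        if expected >= n:  # E(x) only keeps increasing beyond this point
            break
    return best_c

# Example: 4 servers with uniform placement and a target of 3 distinct servers.
print(solve_warehouse_count(N=4, n=3, p=[0.25, 0.25, 0.25, 0.25]))  # -> 5
```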