HTAP Databases: What is New and What is Next
Guoliang Li
Department of Computer Science, Tsinghua University
liguoliang@tsinghua.edu.cn
Chao Zhang
Department of Computer Science, Tsinghua University
cycchao@mail.tsinghua.edu.cn
ABSTRACT
Processing the mixed workloads of transactions and analytical queries in a
single database system can eliminate the ETL process and enable real-time
data analysis on the transaction data. However, there is no free lunch. Such
systems must balance the trade-off between workload isolation and data
freshness, because OLTP and OLAP workloads are interwoven. Since Gartner
coined the term Hybrid Transactional/Analytical Processing (HTAP), we have
witnessed the emergence of various database systems that support HTAP. One
common feature is that they leverage the best of row store and column store
to achieve high-quality HTAP. As they have disparate storage strategies and
processing techniques to satisfy the requirements of various HTAP
applications, it is essential to understand, compare, and evaluate their key
techniques. In this tutorial, we offer a comprehensive survey of HTAP
databases. We introduce a taxonomy of state-of-the-art HTAP databases
according to their storage strategies and architectures. We then take a deep
dive into their key techniques regarding transaction processing, analytical
processing, data synchronization, query optimization, and resource
scheduling. We also introduce existing HTAP benchmarks. Finally, we discuss
the research challenges and open problems for HTAP.
CCS CONCEPTS
• Information systems → Database transaction processing; Database query processing.
KEYWORDS
HTAP Databases; Transaction Processing; Query Processing
ACM Reference Format:
Guoliang Li and Chao Zhang. 2022. HTAP Databases: What is New and What is
Next. In Proceedings of the 2022 International Conference on Management of
Data (SIGMOD ’22), June 12–17, 2022, Philadelphia, PA, USA. ACM,
Philadelphia, PA, USA, 6 pages. https://doi.org/10.1145/3514221.3522565
1 INTRODUCTION
Background. Organizations are processing more data than ever, and data keeps
arriving with high velocity, volume, and variety [26, 30, 53, 55]. For
businesses with data-intensive applications, it is beneficial to have a
single HTAP system that can not only handle on-line transactional processing
(OLTP) efficiently, but also perform on-line analytical processing (OLAP)
for prompt decision-making. For instance, when equipped with an HTAP system,
entrepreneurs in retail applications can analyze the latest transaction data
in real time, identify the sales trend, and then take timely actions, e.g.,
roll out advertising campaigns for promising products [35]. In finance
applications, vendors can leverage an HTAP system to process customer
transactions efficiently while simultaneously detecting fraudulent
transactions [16, 36, 47].

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear
this notice and the full citation on the first page. Copyrights for
components of this work owned by others than ACM must be honored. Abstracting
with credit is permitted. To copy otherwise, or republish, to post on servers
or to redistribute to lists, requires prior specific permission and/or a fee.
Request permissions from permissions@acm.org.
SIGMOD ’22, June 12–17, 2022, Philadelphia, PA, USA.
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9249-5/22/06...$15.00
https://doi.org/10.1145/3514221.3522565
HTAP Definition. Hybrid Transactional/Analytical Processing (HTAP) is an
application architecture proposed in a 2014 Gartner report [35], which
utilizes in-memory computing technologies to enable concurrent analytical and
transaction processing on the same in-memory data store. Such an architecture
should eliminate the need for an Extract-Transform-Load (ETL) process,
thereby accelerating data analytics and bringing dramatic business
innovation. In 2018, Gartner extended the HTAP concept to "In-Process HTAP"
[15], an application architecture that supports weaving analytical and
transaction processing techniques together as needed to accomplish the
business task. This new definition indicates that HTAP is no longer limited
to in-memory computing techniques.
Motivation. Over the last few years, numerous database systems
[18–22, 29, 31, 42, 44] have been developed to enable HTAP. One common
feature is that they utilize the best of row store and column store to
achieve high-quality HTAP. Nevertheless, despite this shared dual-store
feature, they have disparate storage strategies and processing techniques.
The main reason for such diversity is that different classes of HTAP systems
target different applications. For instance, it depends on whether OLTP or
OLAP is the first-class citizen of the application, or whether both are
equally important. It also depends on the requirements of availability,
scalability, system performance, and data freshness [9] specified in the
service level agreements (SLAs) [17]. Consequently, HTAP systems must balance
the trade-off between workload isolation and data freshness, because OLTP and
OLAP workloads are interwoven. To better harness these HTAP systems for
various applications, it is of paramount importance to study, understand, and
compare their key techniques. In this tutorial, we study HTAP databases that
utilize row store and column store together to efficiently handle the mixed
workloads of OLTP and OLAP in a single database system.
Tutorial Overview. We will provide a comprehensive tutorial on HTAP
databases. The intended length of the tutorial is 1.5 hours (the four
sections below add up to 90 minutes). The tutorial consists of four sections
as follows.
(1) HTAP Databases (30 min). This section starts with an introduction to the
background of HTAP databases. It provides a classification according to their
storage architectures, then introduces the main approaches in each category.
As shown in Figure 1, it classifies HTAP databases into four categories:
(a) Primary Row Store + In-Memory Column Store; (b) Distributed Row Store +
Column Store Replica; (c) Disk Row Store + Distributed Column Store; and
(d) Primary Column Store + Delta Row Store. Then, it presents the main HTAP
techniques and representatives for each architecture.
[Figure 1 omitted. Panels: (a) Primary Row Store + In-Memory Column Store;
(b) Distributed Row Store + Column Store Replica; (c) Disk Row Store +
Distributed Column Store; (d) Primary Column Store + Delta Row Store.]
Figure 1: Storage Architectures of State-Of-The-Art HTAP Databases
Table 1: A Classification of State-Of-The-Art HTAP Databases Based on the Storage Architecture
Category | HTAP databases | TP Throughput | AP Throughput | TP Scalability | AP Scalability | Isolation | Freshness
Primary Row Store + In-Memory Column Store | Oracle Dual-Format [19], SQL Server [20], DB2 BLU [39] | High | High | Medium | Low | Low | High
Distributed Row Store + Column Store Replica | TiDB [18], SingleStore [44] | Medium | Medium | High | High | High | Low
Disk Row Store + Distributed Column Store | MySQL Heatwave [31] | Medium | Medium | Medium | High | High | Medium
Primary Column Store + Delta Row Store | SAP HANA [43] | Medium | High | Low | Medium | Low | High
In particular, it summarizes the pros and cons of the different HTAP
solutions regarding performance, scalability, workload isolation, and data
freshness (see Table 1).
(2) HTAP Techniques (40 min). This section takes a deep dive into the key
techniques of HTAP databases, paying particular attention to transaction
processing, analytical processing, data synchronization, query optimization,
and resource scheduling. The detailed key techniques in each module are shown
in Table 2. Overall, it focuses on the following five task types for HTAP.
Transaction processing (TP) techniques. This part will introduce two types of
TP techniques: (i) MVCC + logging [19, 20, 31, 39, 43], which relies on
multi-version concurrency control (MVCC) protocols and logging techniques for
transaction processing; and (ii) 2PC + Raft + logging [18], which processes
transactions in a distributed architecture based on a two-phase commit (2PC)
protocol, a Raft-based consensus algorithm, and logging techniques.
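
To make the first TP style more concrete, the following is a minimal sketch of
MVCC combined with an append-only log. It is not the implementation of any
system cited above: the class name `MVCCStore`, its methods, and the integer
timestamp scheme are all assumptions made for illustration, and it handles a
single writer only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class MVCCStore:
    """Toy multi-version store: each key maps to a list of (commit_ts, value)."""
    versions: Dict[str, List[Tuple[int, Any]]] = field(default_factory=dict)
    log: List[Tuple[int, str, Any]] = field(default_factory=list)  # append-only redo log
    clock: int = 0  # last commit timestamp handed out

    def begin(self) -> int:
        """A reader's snapshot is the latest commit timestamp at start time."""
        return self.clock

    def commit_write(self, key: str, value: Any) -> int:
        """Append to the log first, then install a new version of the key."""
        self.clock += 1
        self.log.append((self.clock, key, value))
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key: str, snapshot_ts: int) -> Optional[Any]:
        """Return the newest version committed at or before the snapshot."""
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def recover(self) -> "MVCCStore":
        """Rebuild the version chains by replaying the log from the start."""
        fresh = MVCCStore()
        for commit_ts, key, value in self.log:
            fresh.versions.setdefault(key, []).append((commit_ts, value))
            fresh.log.append((commit_ts, key, value))
            fresh.clock = commit_ts
        return fresh
```

In this sketch an analytical reader that called `begin()` before a later
update keeps seeing the older version, which is the property that lets OLTP
writes and OLAP reads proceed without blocking each other.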
Analytical processing (AP) techniques. This part will introduce three kinds
of AP techniques. The first type is (i) in-memory delta and column scan
[19, 20, 31, 39, 43], which answers an analytical query by scanning the
in-memory columnar data together with the visible delta tuples that have not
yet been merged. The second type is (ii) disk-based delta and column scan
[18], which scans the log-based delta files and the column store together for
an incoming query. The third type is (iii) column scan [44], which executes
the query purely in the column store.
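
The difference between a delta-plus-column scan and a pure column scan can be
sketched as follows. This is an illustrative toy, not code from any cited
system: the type aliases `ColumnStore` and `Delta`, the function `scan_sum`,
and the timestamp-based visibility rule are assumptions.

```python
from typing import Dict, List, Tuple

ColumnStore = Dict[str, List[float]]        # column name -> merged values
Delta = List[Tuple[int, Dict[str, float]]]  # (commit_ts, row) not yet merged


def scan_sum(columns: ColumnStore, delta: Delta, column: str, snapshot_ts: int) -> float:
    """Sum one column over the merged columnar data plus the visible delta rows."""
    total = sum(columns.get(column, []))
    for commit_ts, row in delta:
        if commit_ts <= snapshot_ts:  # only deltas committed before the snapshot
            total += row[column]
    return total
```

Passing an empty delta corresponds to the pure column scan: the query is
cheaper but only sees data that has already been merged.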
Data synchronization (DS) techniques. This part will introduce three types of
DS techniques for synchronizing data between OLTP and OLAP: (i) in-memory
delta merge [19, 20, 31, 39, 43], which merges the newly inserted in-memory
delta data into the main column store; (ii) disk-based delta merge [18],
which periodically merges the disk-based delta files into the main column
store; and (iii) rebuild from primary row store [19, 39], which rebuilds the
in-memory column store from the primary row store.
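
A minimal sketch of two of these synchronization styles, delta merge and
rebuild from the row store, is given below. The function names `merge_delta`
and `rebuild_from_rows` and the dictionary-based layout are assumptions for
illustration. Only inserts are handled; updates and deletes, which real
systems must also reconcile, are left out.

```python
from typing import Dict, List


def merge_delta(columns: Dict[str, List], delta: List[Dict]) -> None:
    """Append the buffered delta rows to each column, then empty the delta."""
    for row in delta:
        for name, values in columns.items():
            values.append(row.get(name))
    delta.clear()


def rebuild_from_rows(rows: List[Dict], names: List[str]) -> Dict[str, List]:
    """Rebuild a column store from scratch out of the primary row store."""
    return {name: [row.get(name) for row in rows] for name in names}
```

The merge touches only the new rows, so it suits frequent synchronization,
whereas the rebuild rereads the whole row store and makes more sense when the
columnar copy covers only a few selected columns.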
Query optimization techniques. This part will introduce three aspects of
query optimization: (i) column selection for HTAP [19, 31], which
automatically selects the columns to load from the primary store into main
memory based on the historical workload; (ii) hybrid row/column scan
[18, 20], which relies on cost-based functions to determine whether to run a
query over the row store or over the column store; and (iii) CPU/GPU
acceleration for HTAP [5, 22], which leverages heterogeneous hardware, i.e.,
a CPU/GPU architecture, to accelerate HTAP workloads.
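
The idea behind a cost-based hybrid row/column scan can be illustrated with a
deliberately simple cost comparison. The function `choose_store` and both cost
formulas are invented for this sketch and are not taken from the optimizers of
the cited systems, which use far richer statistics.

```python
def choose_store(selectivity: float, cols_read: int, total_cols: int,
                 has_index: bool, n_rows: int) -> str:
    """Pick 'row' or 'column' by comparing two toy cost estimates."""
    # Row store: an index lookup touches only matching rows, but each row is read whole.
    rows_touched = n_rows * selectivity if has_index else n_rows
    row_cost = rows_touched * total_cols
    # Column store: every row is scanned, but only the referenced columns are read.
    column_cost = n_rows * cols_read
    return "row" if row_cost <= column_cost else "column"
```

Under these assumptions, an indexed point lookup that reads every column is
routed to the row store, while a full-table aggregate over two of twenty
columns is routed to the column store.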
Resource scheduling techniques. This part will introduce the resource
scheduling techniques that aim to improve resource utilization by dynamically
allocating resources, e.g., CPU and memory, for HTAP. It mainly introduces
two types of techniques. The first is workload-driven scheduling [43, 45],
which adaptively adjusts the resources given to the OLTP and OLAP workloads
based on their execution status. The second is freshness-driven scheduling
[40], which controls the execution modes of HTAP workloads based on a
freshness metric.
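
The two scheduling styles can be contrasted with a small sketch. Everything
here is an assumption made for illustration: the function names, the use of
queue lengths as the "execution status", the lag threshold, and the mode
labels "isolated" and "shared" are not defined in the cited work.

```python
from typing import Tuple


def workload_driven_split(total_threads: int, oltp_queue: int, olap_queue: int) -> Tuple[int, int]:
    """Divide threads in proportion to pending work, keeping at least one per side.

    Assumes total_threads >= 2.
    """
    pending = oltp_queue + olap_queue
    if pending == 0:
        oltp = total_threads // 2
    else:
        oltp = round(total_threads * oltp_queue / pending)
    oltp = max(1, min(total_threads - 1, oltp))
    return oltp, total_threads - oltp


def freshness_driven_mode(lag_seconds: float, max_lag_seconds: float) -> str:
    """Leave isolated execution once the analytical copy is too stale."""
    return "shared" if lag_seconds > max_lag_seconds else "isolated"
```

The first function reacts to load, the second to staleness. The sketch makes
the trade-off from the abstract visible: staying isolated protects OLTP
performance, and switching to shared execution buys freshness at the cost of
interference.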
(3) HTAP Benchmarks (10 min). This section introduces the existing benchmarks
and evaluation practices for HTAP databases. It will introduce several
end-to-end HTAP benchmarks, including TPC-C [48], TPC-H [49], HTAPbench [10],
and CH-benchmark [11]. Specifically, it will walk through the key aspects of
the benchmarks, including data generation, execution rules, and performance
metrics. In addition, it will introduce two HTAP micro-benchmarks: ADAPT [6]
and HAP [7]. After that, it summarizes the key insights from existing
evaluation practices [13, 38, 40, 42, 45].
(4) Challenges and Open Problems (10 min). The final section concludes the
tutorial and discusses the research challenges and open problems for HTAP
techniques. It summarizes the tutorial topics, then presents several
challenges and open problems. First, it presents the limitations of existing
methods for column selection under HTAP workloads, then discusses the
possibility of learning-based methods for this task. Second, it discusses the
challenges of HTAP query optimization and calls for a learned query optimizer
for HTAP. Third, it discusses the limitations of current approaches to HTAP
resource scheduling, then calls for new adaptive methods. Finally, it
discusses the limitations of existing benchmarks and envisions a new HTAP
benchmark suite.