HTAP Databases: What is New and What is Next
Guoliang Li
Department of Computer Science, Tsinghua University
liguoliang@tsinghua.edu.cn
Chao Zhang
Department of Computer Science, Tsinghua University
cycchao@mail.tsinghua.edu.cn
ABSTRACT
Processing the mixed workloads of transactions and analytical
queries in a single database system can eliminate the ETL process
and enable real-time data analysis on the transaction data. However,
there is no free lunch. Such systems must balance the trade-off
between workload isolation and data freshness due to interweaving
workloads of OLTP and OLAP. Since Gartner coined the term,
Hybrid Transactional/Analytical Processing (HTAP), we have witnessed
the emergence of various database systems to support HTAP.
One common feature is that they leverage the best of row store
and column store to achieve high quality of HTAP. As they have
disparate storage strategies and processing techniques to satisfy the
requirements of various HTAP applications, it is essential to understand,
compare, and evaluate their key techniques. In this tutorial,
we offer a comprehensive survey of HTAP databases. We introduce
a taxonomy of state-of-the-art HTAP databases according to their
storage strategies and architectures. We then take a deep dive into
their key techniques regarding transaction processing, analytical
processing, data synchronization, query optimization, and resource
scheduling. We also introduce existing HTAP benchmarks. Finally,
we discuss the research challenges and open problems for HTAP.
CCS CONCEPTS
• Information systems → Database transaction processing;
Database query processing.
KEYWORDS
HTAP Databases; Transaction Processing; Query Processing
ACM Reference Format:
Guoliang Li and Chao Zhang. 2022. HTAP Databases: What is New and
What is Next. In Proceedings of the 2022 International Conference on
Management of Data (SIGMOD ’22), June 12–17, 2022, Philadelphia, PA, USA.
ACM, Philadelphia, PA, USA, 6 pages. https://doi.org/10.1145/3514221.3522565
1 INTRODUCTION
Background. Organizations now have more data at their disposal than
ever, and data keeps arriving with high velocity, volume, and
variety [26, 30, 53, 55]. For businesses with data-intensive
applications, it is beneficial to have a single HTAP system that
can not only handle on-line transactional processing (OLTP)
efficiently, but also perform on-line analytical processing (OLAP)
for prompt decision-making. For instance, when equipped with an
HTAP system, entrepreneurs in retail applications can analyze the
latest transaction data in real time, identify sales trends, and then
take timely actions, e.g., roll out advertising campaigns for
promising products [35]. In finance applications, vendors can leverage
an HTAP system to process customer transactions efficiently while
simultaneously detecting fraudulent transactions [16, 36, 47].
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
SIGMOD ’22, June 12–17, 2022, Philadelphia, PA, USA.
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9249-5/22/06...$15.00
https://doi.org/10.1145/3514221.3522565
HTAP Definition. Hybrid Transactional/Analytical Processing
(HTAP) is an application architecture proposed in a 2014 Gartner
report [35], which utilizes in-memory computing technologies
to enable concurrent analytical and transaction processing on the
same in-memory data store. Such an architecture should eliminate
the need for an Extract-Transform-Load (ETL) process, thereby
accelerating data analytics and bringing dramatic business
innovation. In 2018, Gartner extended the HTAP concept to "In-Process
HTAP" [15], an application architecture that supports weaving
analytical and transaction processing techniques together as needed
to accomplish the business task. This new definition indicates that
HTAP is no longer limited to in-memory computing techniques.
Motivation. Over the last few years, numerous database systems
[18–22, 29, 31, 42, 44] have been developed to enable HTAP. One
common feature is that they utilize the best of row store and
column store to achieve high quality of HTAP. Nevertheless, they have
disparate storage strategies and processing techniques despite
sharing the dual-store feature. The main reason for such diversity is
that different classes of HTAP systems target different applications.
For instance, it depends on whether OLTP or OLAP is the first-class
citizen of the applications, or both are important. It also depends on
the requirements of availability, scalability, system performance, and
data freshness [9] specified in the service level agreements (SLAs) [17].
Consequently, HTAP systems must balance the trade-off between
workload isolation and data freshness due to interweaving
workloads of OLTP and OLAP. To better harness HTAP systems for
various applications, it is of paramount importance to study,
understand, and compare their key techniques. In this tutorial, we study
HTAP databases that utilize row store and column store together
to efficiently handle the mixed workloads of OLTP and OLAP in a
single database system.
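To make the trade-off between data freshness and workload isolation
concrete, the following is a minimal sketch of a dual-store design. It
is a toy model under stated assumptions, not the design of any surveyed
system: the class ToyDualStore, its methods upsert, sync, and
column_sum, and the fresh flag are all hypothetical names introduced
here for illustration. Transactions write a row store and append to a
delta log; analytical scans read a column store that only reflects
writes once the delta has been merged.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDualStore:
    """Hypothetical toy model: a row store for OLTP plus a column store for OLAP."""
    rows: dict = field(default_factory=dict)     # row store: key -> {column: value}
    columns: dict = field(default_factory=dict)  # column store: column -> {key: value}
    delta: list = field(default_factory=list)    # committed writes not yet merged

    def upsert(self, key, row):
        """OLTP path: write the row store and log the change for later merging."""
        self.rows[key] = dict(row)
        self.delta.append((key, dict(row)))

    def sync(self):
        """Merge pending deltas into the column store; return how many were applied."""
        applied = len(self.delta)
        for key, row in self.delta:
            for col, value in row.items():
                self.columns.setdefault(col, {})[key] = value
        self.delta.clear()
        return applied

    def column_sum(self, col, fresh=False):
        """OLAP path: scan one column; fresh=True pays for a merge before scanning."""
        if fresh:
            self.sync()
        return sum(self.columns.get(col, {}).values())


store = ToyDualStore()
store.upsert(1, {"amount": 10})
store.upsert(2, {"amount": 5})
stale_total = store.column_sum("amount")              # scan before any merge
fresh_total = store.column_sum("amount", fresh=True)  # merge first, then scan
```

In this sketch, a scan without fresh=True never touches the delta log,
so it is isolated from the transactional writes but may return stale
results. A scan with fresh=True sees the latest writes at the cost of
doing the merge work on the analytical path.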
Tutorial Overview. We will provide a comprehensive tutorial on
HTAP databases. The intended length of the tutorial is 3 hours. The
tutorial consists of four sections as follows.
(1) HTAP Databases (30 min). This section starts with an introduction
to the background of HTAP databases. It provides a classification
according to their storage architectures, then introduces
the main approaches in each category. As shown in Figure 1, it
classifies HTAP databases into four categories: (a) Primary Row Store
+ In-Memory Column Store; (b) Distributed Row Store + Column
Store Replica; (c) Disk Row Store + Distributed Column Store; and
(d) Primary Column Store + Delta Row Store. Then, it presents the
main HTAP techniques and representatives for each architecture.