Patrick Perez, Aleksandrs Santars, Michael Chen, Matthew Olan, Daniel
C. Zilio, Imran Sayyid, Humphrey Li, Ketan Rampurkar, Krishna K.
Ramachandran, Yiren Shen. 2024. Native Cloud Object Storage in Db2
Warehouse: Implementing a Fast and Cost-Efficient Storage Architecture.
In Companion of the 2024 International Conference on Management of Data
(SIGMOD-Companion '24), June 9–15, 2024, Santiago, AA, Chile, 12 pages.
https://doi.org/10.1145/3626246.3653393
1 INTRODUCTION
The advent of hyperscale cloud infrastructure from public cloud
vendors such as AWS, Azure, and Google has fundamentally
altered the economics of analytic data systems, with the most
prominent impacts seen in the area of storage. In the domain of
SQL Data Warehousing, the availability of low-cost, effectively
unlimited-capacity cloud object storage (COS) [1, 2] has given
rise to modern cloud architectures that exploit this independently
scalable storage in combination with compute servers with
ephemeral local NVMe caches to deliver the high performance and
advanced SQL capabilities of traditional data warehousing
systems with the low storage cost and scalability more typically
associated with data lakes. The price/performance advantages of
these cloud architectures have made it difficult for traditional
data warehouse architectures, built on higher-performance,
higher-cost storage that is tightly coupled to compute, to compete
in the public cloud sphere.
While this shift has seen the rise of new "born in the cloud"
data warehousing systems (most notably Snowflake [3]), for
vendors with established non-cloud data warehousing
technologies, significant advantages can be achieved if they can
capitalize on the depth of innovation in their existing systems by
modernizing them to a cloud native storage architecture. The
promise is the substantial savings in time and effort by avoiding
reinventing sophisticated, high-performance, battle-tested
capabilities. The challenge is whether, and how, such an uplift can
be performed in a manner that is cost effective and does not
compromise the capabilities and performance of the original
technology in the process.
In this paper we describe the effort we undertook to reinvent
IBM’s Db2 Warehouse [4] to adopt a modern storage architecture
and achieve cost and performance competitiveness on the public
cloud, while retaining all its substantial SQL and transactional
capabilities.
1.1 Challenges of Cloud Object Storage
When moving database storage that was designed for a traditional
storage subsystem, such as network-attached block storage [5, 6], to
COS, a key challenge is the differences in I/O
characteristics. These differences concern both throughput and
latency: COS is a throughput-optimized storage solution, whereas
block storage is balanced for both throughput and latency [7]. In the
case of throughput, assuming adequate server-side resources for the
COS implementation, the throughput limit for COS is imposed by the
network bandwidth available to the compute nodes performing the I/O
and by the parallelism used to saturate that bandwidth. For block
storage, the limit is imposed by the bandwidth each device attached
to a compute node can accept, and throughput scalability is achieved
by attaching more devices to each compute node, increasing per-device
bandwidth, and accessing those devices in parallel.
In the case of latency, COS is known throughout the industry to have
a significantly higher fixed latency per request than block storage
(~100-300ms vs. ~10-30ms, a 10X difference). As a result, I/O
operations must be performed in significantly larger block sizes (on
the order of tens of MBs for individual objects vs. on the order of
KBs for blocks in block storage) to better amortize the higher
latency cost per operation, as we will show in Section 4 of this
paper. Further complicating the picture is the fact that COS writes
operate at the object granularity, meaning that modifying an existing
object requires rewriting it in its entirety.
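
To make the latency amortization argument concrete, the short Python sketch below models the effective throughput of a single synchronous I/O stream as a function of request size. The ~200ms and ~20ms fixed per-request latencies are taken from the ranges quoted above; the 1 GB/s per-stream transfer rate is an assumed illustrative value, not a measured Db2 Warehouse or COS figure.

```python
# Back-of-the-envelope model: how request size amortizes fixed per-request
# latency. Latency values follow the ranges quoted in the text; the 1 GB/s
# transfer rate is an assumption for illustration only.

def effective_throughput_mb_s(request_size_mb, fixed_latency_s, transfer_mb_s=1000.0):
    """Throughput of one synchronous stream issuing requests of a given size."""
    transfer_time_s = request_size_mb / transfer_mb_s
    return request_size_mb / (fixed_latency_s + transfer_time_s)

for size_mb in (0.032, 1, 8, 32, 64):  # from a 32KB page up to a 64MB object
    cos = effective_throughput_mb_s(size_mb, fixed_latency_s=0.200)  # ~200ms COS
    blk = effective_throughput_mb_s(size_mb, fixed_latency_s=0.020)  # ~20ms block
    print(f"{size_mb:>7} MB requests: COS ~{cos:7.2f} MB/s, block ~{blk:7.2f} MB/s")
```

Under these assumptions, 32KB requests against COS achieve well under 1 MB/s per stream, while requests in the tens of MBs approach the transfer-limited rate, which is why both larger object sizes and parallel requests are needed to exploit COS bandwidth.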
Because of these I/O and performance differences, a direct adaptation
of the existing block-storage-optimized page storage to target COS
would perform very poorly due to the latency impact of small page
I/O. A naive improvement would be to group adjacent data pages into
larger blocks that match the larger object sizes ideal for COS. In
Db2, for example, data pages within a table space are organized into
logical blocks of contiguous data pages called "extents", and the
extent size could be configured as the object size stored in COS.
This would also align with existing data organization optimizations
such as Db2's BLU Acceleration (Db2's column store engine), where
pages from individual column groups are grouped into extents to
improve data locality and maximize read performance for analytic
workloads [8]. With such an approach, each extent would be stored in
COS as an independent object and would retain the existing data
clustering characteristics. This approach, however, suffers from
multiple issues. First, the larger object sizes required for COS I/O
efficiency would require a significant increase in the size of these
extents, making it difficult to maintain data locality and space
efficiency across varied insert-update-delete (IUD) patterns (in Db2,
this would mean enlarging extents from the current default size of
128KB to at least 32MB, that is, moving from 4 to 1,024 32KB data
pages per extent, potentially incurring substantial space wastage
from partially filled column extents). This space wastage would
manifest not only as added write amplification, but also as read
amplification and a significant loss of read cache efficiency.
Furthermore, random data page modification patterns would require
synchronously rewriting entire multi-MB objects, resulting in high
write amplification. For these reasons, it was evident from the early
stages that in order to retain the performance characteristics of Db2
Warehouse we would need a fundamentally different page storage model:
one that could efficiently translate the pattern of small random page
I/Os inherent in the existing Db2 page model into the larger
sequential object writes required for efficient use of COS.
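
To quantify the write amplification of the naive extent-per-object mapping, the sketch below applies the page and extent sizes quoted above (32KB pages, 32MB extents). It is a simplified illustrative model under those assumptions, not Db2 Warehouse code.

```python
# Illustrative model of the read-modify-write cost of mapping one extent to
# one COS object. Sizes follow the Db2 figures in the text (32KB pages,
# 32MB extents); the model is a simplification, not actual Db2 code.

PAGE_SIZE_BYTES = 32 * 1024
EXTENT_SIZE_BYTES = 32 * 1024 * 1024
PAGES_PER_EXTENT = EXTENT_SIZE_BYTES // PAGE_SIZE_BYTES  # 1024 pages

def bytes_written_to_cos(dirty_pages_in_extent: int) -> int:
    """Bytes rewritten to COS to persist the dirty pages of one extent.

    Because COS objects cannot be updated in place, even a single dirty page
    forces the entire extent-sized object to be rewritten.
    """
    return EXTENT_SIZE_BYTES if dirty_pages_in_extent > 0 else 0

for dirty in (1, 16, PAGES_PER_EXTENT):
    amplification = bytes_written_to_cos(dirty) / (dirty * PAGE_SIZE_BYTES)
    print(f"{dirty:>4} dirty page(s): write amplification ~{amplification:.0f}x")
```

Under this mapping, persisting a single dirty 32KB page rewrites 1,024 pages' worth of data, and only fully rewritten extents avoid amplification; this is the small-random-write pattern that the page storage model described next aims to translate into large sequential object writes.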
1.2 LSM-Tree-Based Page Storage Model
One of the main challenges for the development of a new storage
layer for Db2 Warehouse was the need to preserve the existing
MPP engine capabilities, optimizations, and behaviors, which