Data Morphing: An Adaptive, Cache-Conscious Storage
Technique
Richard A. Hankins Jignesh M. Patel
University of Michigan
1301 Beal Avenue; Ann Arbor, MI 48109-2122; USA
{hankinsr, jignesh}@eecs.umich.edu
Abstract
The number of processor cache misses has a crit-
ical impact on the performance of DBMSs run-
ning on servers with large main-memory configu-
rations. In turn, the cache utilization of database
systems is highly dependent on the physical or-
ganization of the records in main-memory. A re-
cently proposed storage model, called PAX, was
shown to greatly improve the performance of se-
quential file-scan operations when compared to
the commonly implemented N-ary storage model.
However, the PAX storage model can also demon-
strate poor cache utilization for other common op-
erations, such as index scans. Under a workload
of heterogeneous database operations, neither the
PAX storage model nor the N-ary storage model
is optimal.
In this paper, we propose a flexible data stor-
age technique called Data Morphing. Using
Data Morphing, a cache-efficient attribute layout,
called a partition, is first determined through an
analysis of the query workload. This partition is
then used as a template for storing data in a cache-
efficient way. We present two algorithms for com-
puting partitions, and also present a versatile stor-
age model that accommodates the dynamic reor-
ganization of the attributes in a file. Finally, we
experimentally demonstrate that the Data Morph-
ing technique provides a significant performance
improvement over both the traditional N-ary stor-
age model and the PAX model.
Permission to copy without fee all or part of this material is granted pro-
vided that the copies are not made or distributed for direct commercial
advantage, the VLDB copyright notice and the title of the publication and
its date appear, and notice is given that copying is by permission of the
Very Large Data Base Endowment. To copy otherwise, or to republish,
requires a fee and/or special permission from the Endowment.
Proceedings of the 29th VLDB Conference,
Berlin, Germany, 2003
1 Introduction
Database systems have traditionally focused on improving
the overall system performance by minimizing the num-
ber of disk I/O operations. Disk I/O has traditionally been
the most important component of the memory access hi-
erarchy, but the current trend in Random Access Memory
(RAM) cost and capacity now makes it necessary to re-
evaluate this focus. The cost of RAM is decreasing rapidly,
while the capacity of RAM is increasing exponentially.
This trend was recognized by the authors of the Asilomar
report where they predicted that, in the future, all but the
largest data sets will reside entirely in main-memory [5].
As increasing amounts of data become main-memory res-
ident, the performance bottleneck becomes the accesses to
main-memory rather than the accesses to disk [7]. Because
of this shift, intelligently storing and accessing the data in
main-memory is becoming critical to the performance of
database systems [2, 7, 17, 22].
Unfortunately, the main-memory bottleneck is becom-
ing more severe due to the growing disparity between
processor speeds and the latency in accessing the main-
memory. To alleviate this disparity, modern processors em-
ploy a hierarchy of low-latency memory, called caches, that
reside between the processor and the RAM. Each cache
stores the most frequently accessed data blocks (includ-
ing instructions), reducing the amount of data that must be
retrieved directly from the main-memory. Starting at the
first level of cache memory, each subsequent level of cache
stores larger amounts of data, but each subsequent level of
cache is also more expensive to access. Modern processors
typically contain two, or even three, levels of processor
cache, with each level storing both instructions and data.
Research has shown that database systems incur a signif-
icant amount of second level (L2) data cache misses, and
consequently, the L2 cache utilization is critical to overall
performance [3]. In this paper we only focus on techniques
for reducing the L2 cache misses, and simply refer to them
as cache misses for the remainder of this paper.
Critical to system performance is the efficient retrieval
and update of data records inside the database. Database
systems typically store records of data in files, where each
file consists of a collection of slotted-pages, and each
slotted-page contains a collection of records. A page is
a fixed-size block of allocated memory. A slotted-page is
a page that contains an array of byte-offsets that reference
the start of each record allocated within the page. Typi-
cally, each attribute of the record is stored in consecutive
memory addresses on the slotted page, in what is called the
N-ary storage model. Common implementations of the N-
ary storage model place variable length attributes at the end
of the fixed-length attributes, and use fixed length place-
holders to reference the relocated attributes.
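To make the N-ary layout concrete, the following sketch (our illustration, not code from the paper) packs one Client record with the fixed-length attributes and fixed-size offset placeholders first, and the variable-length string data at the end of the record. All field sizes, orderings, and offsets here are assumptions.

```python
import struct

def pack_nsm_record(id_, priority, usage, location, name, address):
    """Pack one Client record in N-ary (NSM) form: the fixed-length
    attributes and two 2-byte placeholder offsets come first, then the
    variable-length string data follows at the end of the record."""
    name_b = name.encode()
    addr_b = address.encode()
    fixed_len = 4 * 4 + 2 * 2            # four 4-byte fields + two 2-byte offsets
    name_off = fixed_len                 # varchar data follows the fixed part
    addr_off = fixed_len + len(name_b)
    return (struct.pack("<iifiHH", id_, priority, usage, location,
                        name_off, addr_off)
            + name_b + addr_b)

rec = pack_nsm_record(1, 5, 3.5, 42, "Alice", "1301 Beal Ave")
print(len(rec))  # → 38: all attributes of one record are contiguous
```

The key property for what follows is that every attribute of a record sits at consecutive addresses, so one cache line holds several attributes of the *same* record.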
A recently proposed storage model, called PAX, ver-
tically decomposes each record, storing each attribute in
subdivisions of a page, called mini-pages [2]. The PAX
storage model improves the cache utilization for queries
that access only a few attributes from a high percentage
of records, such as during a sequential file scan. This im-
provement in cache utilization is a direct result of being
able to pack attributes of many different records into the
same cache line. However, for access plans where only a
fraction of the records are accessed, such as during an index
scan, the PAX storage model can result in a greater number
of cache misses. In such cases, N-ary may actually perform
better, especially as the number of attributes accessed per
record increases.
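The vertical decomposition that PAX performs within a page can be sketched as follows; this is a simplified illustration (fixed-length attributes only, lists standing in for mini-pages), not the paper's implementation.

```python
def to_pax(records, n_attrs):
    """Vertically decompose a page of records: one 'mini-page' (here a
    list) per attribute, so values of the same attribute from many
    records are adjacent and pack densely into cache lines."""
    minipages = [[] for _ in range(n_attrs)]
    for rec in records:
        for i, val in enumerate(rec):
            minipages[i].append(val)
    return minipages

# Eight Client-like records: a 32-byte cache line can hold eight 4-byte
# priority values under PAX, versus the attributes of a single record
# under the N-ary layout.
records = [(i, i % 4, 1.5 * i, i * 10) for i in range(8)]
pax_page = to_pax(records, 4)
print(pax_page[1])  # the priority mini-page
```

This dense per-attribute packing is what helps sequential scans; conversely, fetching all attributes of one record now touches one mini-page (and potentially one cache line) per attribute.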
As previously demonstrated, both the N-ary and the
PAX storage models work well for different families of ac-
cess plans. A choice between the two storage models is dif-
ficult as the query workload may be large or may vary over
time. A better storage model should incorporate the best
characteristics of both models; namely, vertical decompo-
sition and groups of sequentially allocated attributes. This
storage model should also provide the flexibility to adapt to
changing workloads. In this paper, we propose a novel data
storage technique, called Data Morphing (DM), that meets
these goals.
This paper makes the following contributions:
• We present a flexible page architecture that is a generalization of the previously proposed PAX page architecture. In the Data Morphing page architecture, attributes of the same tuple can be stored in non-contiguous groups, which increases the spatial locality of memory accesses. As a result, fewer cache misses are incurred during query processing.
• We present two algorithms for calculating the attribute groups that are stored on each page. The Naive algorithm performs an exhaustive search of the possible attribute layouts, but the algorithm is expensive to compute for relations with a large number of attributes. The Hill-Climb algorithm performs a greedy search of the layout space, trading optimality for faster time complexity.
• We present an empirical evaluation of the Data Morphing technique. We show that the partitioning algorithms calculate attribute groupings that reduce the number of cache misses incurred during query execution. As a direct result of improving the cache utilization, our prototype DBMS implementing the DM technique can evaluate queries up to 45% faster than the N-ary storage model and up to 25% faster than the PAX model.
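To give a feel for the layout space the Naive algorithm must search, the sketch below (our illustration; the paper's actual algorithms and cost model are presented later) enumerates all set partitions of an attribute list. Their count grows as the Bell numbers, which is why exhaustive search becomes expensive for relations with many attributes.

```python
def partitions(attrs):
    """Enumerate every way to split a list of attributes into groups
    (set partitions). An exhaustive, Naive-style search must score
    each of these candidate layouts."""
    if not attrs:
        yield []
        return
    first, rest = attrs[0], attrs[1:]
    for smaller in partitions(rest):
        # place `first` into each existing group in turn...
        for i, group in enumerate(smaller):
            yield smaller[:i] + [[first] + group] + smaller[i + 1:]
        # ...or give it a group of its own
        yield [[first]] + smaller

# Bell numbers: even 4 attributes already yield 15 candidate layouts.
print(sum(1 for _ in partitions(["id", "priority", "usage", "location"])))  # → 15
```

A greedy search such as Hill-Climb examines only a small, locally improving subset of these partitions, trading optimality for speed.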
The remainder of this paper is organized as follows:
Section 2 presents an example motivating the need for Data
Morphing. Section 3 provides definitions for the common
terms used in the subsequent sections. In Section 4, we
present the Data Morphing technique. A detailed experi-
mental evaluation is presented in Section 5. Related work
is discussed in Section 6, and Section 7 contains our con-
clusions.
2 Motivating Example
As described in the introduction, neither the N-ary storage
model nor the PAX storage model provides an optimal orga-
nization for the data. To illustrate this fact, we will use the
relation and query shown in Figure 1 in the following ex-
ample.
Client (id:Integer,priority:Integer,name: Varchar(32),
usage: Real, address: Varchar(32),
location: Integer) key (id);
SELECT location, usage
FROM Client WHERE priority < 12
Figure 1: Client Relation and Query
In Figure 1, the Client relation consists of six at-
tributes of types Integer, Real, and Varchar. The Integer and
Real data types are each four bytes in size. The Varchar(32)
data type is a variable length string with a maximum length
of 32 bytes. The selectivity of the predicate on priority
is 12.5%, and the processor cache line size is 32 bytes. Fig-
ure 2 shows a single record in the Client relation as it is
stored on a page using the N-ary storage model; as in typ-
ical implementations, the variable-length strings are stored
at the end of the fixed-length attributes. To visualize the at-
tributes that are read into the processor cache as the result
of a cache miss, the width of the diagram is shown to be the
same size as a cache line.
Assuming that a sequential file scan is used to answer
this query, the records are retrieved as follows. The first
priority attribute is requested from memory. This
memory access incurs a cache miss to read the contigu-
ous block of memory containing the attribute into the processor cache. If the value of the priority attribute is less than 12, the location and usage attributes are accessed. Because the location and usage attributes are
contained in the processor cache, retrieving them incurs no
additional cache misses. The select operator then continues
with the next record, until all of the records have been read.
The cost of answering this query totals approximately one
cache miss per record.
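The one-miss-per-record arithmetic can be checked with a small cache-line simulation. The record size, field offsets, and the deterministic stand-in for the 12.5% selectivity below are our assumptions for illustration, not details taken from the paper.

```python
CACHE_LINE = 32    # bytes, as assumed in the example
RECORD_SIZE = 64   # assumed: 24-byte fixed part plus inline varchar data
PRIORITY_OFF, USAGE_OFF, LOCATION_OFF = 4, 8, 12  # assumed field offsets

def scan_misses(n_records):
    """Count cold-cache misses for the example select under the N-ary
    layout: priority is touched for every record, and usage/location
    only for the records that satisfy the predicate."""
    seen = set()
    misses = 0
    for r in range(n_records):
        base = r * RECORD_SIZE
        touched = [base + PRIORITY_OFF]
        if r % 8 == 0:  # deterministic stand-in for the 12.5% selectivity
            touched += [base + USAGE_OFF, base + LOCATION_OFF]
        for addr in touched:
            line = addr // CACHE_LINE
            if line not in seen:    # a line not yet in cache is a miss
                seen.add(line)
                misses += 1
    return misses

print(scan_misses(1000))  # → 1000: one miss per record
```

Because usage and location share a cache line with priority in this layout, the selected records incur no additional misses, matching the estimate in the text.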
The PAX storage model vertically decomposes the
records into zones, called mini-pages, as shown in Fig-
ure 3; for simplicity, we do not show all of the details.
In the figure, the attributes are presented as belonging