暂无图片
暂无图片
暂无图片
暂无图片
暂无图片
论文-SQLite的过去、现在和未来 - SQLite Past, Present, and Future.pdf
338
13页
7次
2022-09-02
免费下载
SQLite: Past, Present, and Future
Kevin P. Ganey
University of Wisconsin-Madison
kpganey@wisc.edu
Martin Prammer
University of Wisconsin-Madison
prammer@cs.wisc.edu
Larry Braseld
SQLite
larrybr@sqlite.org
D. Richard Hipp
SQLite
drh@sqlite.org
Dan Kennedy
SQLite
dan@sqlite.org
Jignesh M. Patel
University of Wisconsin-Madison
jignesh@cs.wisc.edu
ABSTRACT
In the two decades following its initial release, SQLite has become
the most widely deployed database engine in existence. Today,
SQLite is found in nearly every smartphone, computer, web browser,
television, and automobile. Several factors are likely responsible for
its ubiquity, including its in-process design, standalone codebase,
extensive test suite, and cross-platform le format. While it sup-
ports complex analytical queries, SQLite is primarily designed for
fast online transaction processing (OLTP), employing row-oriented
execution and a B-tree storage format. However, fueled by the rise
of edge computing and data science, there is a growing need for
ecient in-process online analytical processing (OLAP). DuckDB,
a database engine nicknamed “the SQLite for analytics”, has re-
cently emerged to meet this demand. While DuckDB has shown
strong performance on OLAP benchmarks, it is unclear how SQLite
compares. Furthermore, we are aware of no work that attempts to
identify root causes for SQLite’s performance behavior on OLAP
workloads. In this paper, we discuss SQLite in the context of this
changing workload landscape. We describe how SQLite evolved
from its humble beginnings to the full-featured database engine it
is today. We evaluate the performance of modern SQLite on three
benchmarks, each representing a dierent avor of in-process data
management, including transactional, analytical, and blob process-
ing. We delve into analytical data processing on SQLite, identifying
key bottlenecks and weighing potential solutions. As a result of our
optimizations, SQLite is now up to 4.2X faster on SSB. Finally, we
discuss the future of SQLite, envisioning how it will evolve to meet
new demands and challenges.
PVLDB Reference Format:
Kevin P. Ganey, Martin Prammer, Larry Braseld, D. Richard Hipp, Dan
Kennedy, and Jignesh M. Patel. SQLite: Past, Present, and Future. PVLDB,
15(12): 3535 - 3547, 2022.
doi:10.14778/3554821.3554842
PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at
https://github.com/UWHustle/sqlite-past-present-future.
This work is licensed under the Creative Commons BY-NC-ND 4.0 International
License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of
this license. For any use beyond those covered by this license, obtain permission by
emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights
licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 15, No. 12 ISSN 2150-8097.
doi:10.14778/3554821.3554842
1 INTRODUCTION
SQLite was initially released in August 2000 as a small library of data
management functions [
29
]. Originally packaged as an extension
to the Tcl programming language, SQLite was born out of the
frustration of debugging a database server running in a separate
process [
53
]. Unlike client-server database systems, which typically
occupy dedicated processes and communicate with applications via
shared memory primitives, SQLite is embedded in the process of
the host application [
33
]. Instead of communicating with a database
server across process boundaries, applications manage a SQLite
database by calling SQLite library functions.
In the decades that followed its initial release, SQLite grew to
become the most widely deployed database engine in existence [
27
].
SQLite is embedded in major web browsers, personal computers,
smart televisions, automotive media systems, and the PHP and
Python programming languages. Furthermore, SQLite is found in
every iOS and Android device, which currently number in the
billions. There are likely over one trillion SQLite databases in active
use. It is estimated that SQLite is one of the most widely deployed
software libraries of any type.
No single factor is likely responsible for SQLite’s popularity.
Instead, in addition to its fundamentally embeddable design, several
characteristics combine to make SQLite useful in a broad range of
scenarios. In particular, SQLite strives to be:
Cross-platform.
A SQLite database is stored in a single
le, which can be freely copied between 32-bit and 64-bit
machines and little-endian and big-endian architectures [
30
].
SQLite can run on any platform with an 8-bit byte, two’s
complement 32-bit and 64-bit integers, and a C compiler.
Due to its stability and portability, SQLite’s le format is a
US Library of Congress recommended storage format for the
preservation of digital content [1].
Compact and self-contained.
The SQLite library is avail-
able as a single C le, consisting about 150 thousand lines
of source code [
31
]. With all features enabled, the compiled
library size can be less than 750 KiB [
17
]. SQLite has no exter-
nal dependencies and requires only a handful of C standard
library functions to operate. SQLite requires no installation
or conguration.
Reliable.
There are over 600 lines of test code for every
line of code in SQLite [
25
]. Tests cover 100% of branches
in the library. The test suite is extremely diverse, including
fuzz tests, boundary value tests, regression tests, and tests
that simulate operating system crashes, power losses, I/O er-
rors, and out-of-memory errors. Due to its reliability, SQLite
3535
is often used in mission-critical applications such as ight
software [36].
Fast.
SQLite can support tens of thousands of transactions
per second. In some cases, SQLite reads and writes blob
data 35% faster and uses 20% less storage space than the
lesystem [
16
]. SQLite’s query planner produces ecient
plans for complex analytical queries [28].
While distinct in many ways, SQLite shares several characteris-
tics with traditional database systems designed for online transac-
tion processing (OLTP). SQLite uses a row-oriented storage format,
where all the columns of a given record are stored in a contiguous
memory region [
23
]. SQLite’s operators act on individual rows,
rather than batches of rows as in vectorized query execution [
32
].
Finally, SQLite provides full ACID guarantees: transactions are
atomic, consistent, isolated, and durable [34].
However, SQLite is used in scenarios that are well outside the
boundaries of conventional OLTP. Well-known uses of SQLite in-
clude processing data on embedded devices and the internet of
things, managing application state as an application le format,
serving website requests, analyzing large datasets, caching enter-
prise data, and transferring data from one system to another [
19
].
A previous study, which traced SQLite activity on mobile phones,
found that SQLite was used for a diverse range of tasks [
39
]. During
the study, the majority of operations performed by SQLite were
single-table scans and key-value lookups. However, the trace in-
cluded much more complex analytical queries that involved joins
between several tables. In addition, a small but signicant portion
of the workloads consisted of OLTP operations.
Recently, the explosive growth of edge computing and interac-
tive data analysis has created a need for ecient in-process online
analytical processing (OLAP). Several database systems have al-
ready been created or adapted for OLAP, including MonetDB [
37
],
Oracle OLAP [
10
], and SAP HANA [
11
]. However, these eorts have
largely focused on client-server architectures. Despite this focus,
ecient OLAP is often desired in situations where a client-server
database system is unwieldy or even impossible to use.
In-process OLAP is an important component of edge computing.
In the internet of things, data analysis is increasingly being pushed
to the edge in order to reduce network trac and server load [
48
]. In
addition, there are often privacy concerns associated with sending
sensitive data across a network. While OLAP database systems
are well-suited for the task of data analysis, edge devices may not
possess the energy or computational resources necessary to host a
client-server database system.
As another example, data scientists frequently perform in-process
OLAP to interactively explore a dataset prior to building a model.
Signicant portions of data science workows involve relational
algebraic operations, such as selection, projection, join, and aggrega-
tion. These operations can be concisely scripted using a dataframe
library such as pandas [
52
]. However, despite their ease of use,
dataframe libraries provide limited query optimization and often
materialize large intermediate results. Furthermore, datasets often
exceed the capacity of memory, which may force the data scientist
to implement hand-rolled buer management and use inecient
storage representations, such as CSV or JSON. An embeddable data-
base engine is better equipped to handle these workloads. For this
reason, SQLite is already a popular tool in data science. Due to its
stability, portability, and space-eciency, the SQLite database le
format is commonly used for sharing datasets. For example, SQLite
is one of the primary le formats used by the popular Kaggle data
science platform [
38
]. The Python
sqlite3
module [
15
] is often
used to carry out SQL operations in data science notebooks. While
SQLite produces ecient query plans and handles datasets much
larger than memory, it is less optimized for analytics compared to
OLAP-specic database systems such as those mentioned above.
These areas are prime targets for a powerful, embeddable OLAP
database engine. DuckDB [
47
] recently emerged to meet this de-
mand. Nicknamed “the SQLite for analytics”, DuckDB is built from
the ground up for in-process OLAP, employing columnar storage,
parallel and vectorized query processing, and multi-version con-
currency control optimized for extract-transform-load (ETL) opera-
tions. While still in a pre-release development phase, DuckDB has
already produced competitive performance on OLAP benchmarks
[
8
]. We believe that DuckDB lls a much needed gap in embeddable
data processing.
While SQLite and DuckDB have both been evaluated separately,
a well-rounded comparison of the two systems is missing from
the literature. As described above, SQLite shares many design el-
ements with OLTP database systems, so one might expect it to
excel on OLTP benchmarks. However, SQLite aims to be as general-
purpose as possible, so competitive performance could reasonably
be expected on a variety of workloads. In contrast, DuckDB is
purpose-built for analytics, so one might expect it to outperform
SQLite on OLAP benchmarks.
In this paper, we provide experimental support for these expec-
tations and quantify their magnitude. For OLAP, we go one level
deeper—we identify specic characteristics of SQLite responsible
for its OLAP performance and then present optimizations that sub-
stantially increase its speed.
The key contributions of this paper are:
We present a historical perspective of SQLite.
We also
present a concise description of its architecture.
We provide a thorough evaluation of SQLite on char-
acteristic workloads, using DuckDB as a baseline.
Our
evaluation includes benchmarks that represent diverse a-
vors of in-process data management.
We optimize SQLite for analytical data processing.
We
identify key bottlenecks in SQLite and discuss the advantages
and disadvantages of potential solutions. We integrate our
optimizations into SQLite, resulting in overall 4.2X speedup
on SSB.
We identify several performance measures specic to
embeddable database engines
, including library footprint
and blob processing performance.
We provide a high-level description of potential fu-
ture directions for further performance improvement
in SQLite.
The rest of this paper is organized as follows. Section 2 provides
an architectural overview of SQLite and rationale for its design.
Section 3 discusses SQLite in the context of evolving workloads
and hardware. Section 4 evaluates SQLite and DuckDB on a variety
3536
of 13
免费下载
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文档的来源(墨天轮),文档链接,文档作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

关注
最新上传
暂无内容,敬请期待...
下载排行榜
Top250 周榜 月榜