论文-SQLite的过去、现在和未来 - SQLite Past, Present, and Future.pdf

章芋文

338

13页

7次

2022-09-02

免费下载

SQLite: Past, Present, and Future

Kevin P. Ganey

University of Wisconsin-Madison

kpganey@wisc.edu

Martin Prammer

University of Wisconsin-Madison

prammer@cs.wisc.edu

Larry Braseld

SQLite

larrybr@sqlite.org

D. Richard Hipp

SQLite

drh@sqlite.org

Dan Kennedy

SQLite

dan@sqlite.org

Jignesh M. Patel

University of Wisconsin-Madison

jignesh@cs.wisc.edu

ABSTRACT

In the two decades following its initial release, SQLite has become

the most widely deployed database engine in existence. Today,

SQLite is found in nearly every smartphone, computer, web browser,

television, and automobile. Several factors are likely responsible for

its ubiquity, including its in-process design, standalone codebase,

extensive test suite, and cross-platform le format. While it sup-

ports complex analytical queries, SQLite is primarily designed for

fast online transaction processing (OLTP), employing row-oriented

execution and a B-tree storage format. However, fueled by the rise

of edge computing and data science, there is a growing need for

ecient in-process online analytical processing (OLAP). DuckDB,

a database engine nicknamed “the SQLite for analytics”, has re-

cently emerged to meet this demand. While DuckDB has shown

strong performance on OLAP benchmarks, it is unclear how SQLite

compares. Furthermore, we are aware of no work that attempts to

identify root causes for SQLite’s performance behavior on OLAP

workloads. In this paper, we discuss SQLite in the context of this

changing workload landscape. We describe how SQLite evolved

from its humble beginnings to the full-featured database engine it

is today. We evaluate the performance of modern SQLite on three

benchmarks, each representing a dierent avor of in-process data

management, including transactional, analytical, and blob process-

ing. We delve into analytical data processing on SQLite, identifying

key bottlenecks and weighing potential solutions. As a result of our

optimizations, SQLite is now up to 4.2X faster on SSB. Finally, we

discuss the future of SQLite, envisioning how it will evolve to meet

new demands and challenges.

PVLDB Reference Format:

Kevin P. Ganey, Martin Prammer, Larry Braseld, D. Richard Hipp, Dan

Kennedy, and Jignesh M. Patel. SQLite: Past, Present, and Future. PVLDB,

15(12): 3535 - 3547, 2022.

doi:10.14778/3554821.3554842

PVLDB Artifact Availability:

The source code, data, and/or other artifacts have been made available at

https://github.com/UWHustle/sqlite-past-present-future.

This work is licensed under the Creative Commons BY-NC-ND 4.0 International

License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of

this license. For any use beyond those covered by this license, obtain permission by

emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights

licensed to the VLDB Endowment.

Proceedings of the VLDB Endowment, Vol. 15, No. 12 ISSN 2150-8097.

doi:10.14778/3554821.3554842

1 INTRODUCTION

SQLite was initially released in August 2000 as a small library of data

management functions [

]. Originally packaged as an extension

to the Tcl programming language, SQLite was born out of the

frustration of debugging a database server running in a separate

process [

]. Unlike client-server database systems, which typically

occupy dedicated processes and communicate with applications via

shared memory primitives, SQLite is embedded in the process of

the host application [

]. Instead of communicating with a database

server across process boundaries, applications manage a SQLite

database by calling SQLite library functions.

In the decades that followed its initial release, SQLite grew to

become the most widely deployed database engine in existence [

SQLite is embedded in major web browsers, personal computers,

smart televisions, automotive media systems, and the PHP and

Python programming languages. Furthermore, SQLite is found in

every iOS and Android device, which currently number in the

billions. There are likely over one trillion SQLite databases in active

use. It is estimated that SQLite is one of the most widely deployed

software libraries of any type.

No single factor is likely responsible for SQLite’s popularity.

Instead, in addition to its fundamentally embeddable design, several

characteristics combine to make SQLite useful in a broad range of

scenarios. In particular, SQLite strives to be:

• Cross-platform.

A SQLite database is stored in a single

le, which can be freely copied between 32-bit and 64-bit

machines and little-endian and big-endian architectures [

SQLite can run on any platform with an 8-bit byte, two’s

complement 32-bit and 64-bit integers, and a C compiler.

Due to its stability and portability, SQLite’s le format is a

US Library of Congress recommended storage format for the

preservation of digital content [1].

• Compact and self-contained.

The SQLite library is avail-

able as a single C le, consisting about 150 thousand lines

of source code [

]. With all features enabled, the compiled

library size can be less than 750 KiB [

]. SQLite has no exter-

nal dependencies and requires only a handful of C standard

library functions to operate. SQLite requires no installation

or conguration.

• Reliable.

There are over 600 lines of test code for every

line of code in SQLite [

]. Tests cover 100% of branches

in the library. The test suite is extremely diverse, including

fuzz tests, boundary value tests, regression tests, and tests

that simulate operating system crashes, power losses, I/O er-

rors, and out-of-memory errors. Due to its reliability, SQLite

3535

is often used in mission-critical applications such as ight

software [36].

• Fast.

SQLite can support tens of thousands of transactions

per second. In some cases, SQLite reads and writes blob

data 35% faster and uses 20% less storage space than the

lesystem [

]. SQLite’s query planner produces ecient

plans for complex analytical queries [28].

While distinct in many ways, SQLite shares several characteris-

tics with traditional database systems designed for online transac-

tion processing (OLTP). SQLite uses a row-oriented storage format,

where all the columns of a given record are stored in a contiguous

memory region [

]. SQLite’s operators act on individual rows,

rather than batches of rows as in vectorized query execution [

Finally, SQLite provides full ACID guarantees: transactions are

atomic, consistent, isolated, and durable [34].

However, SQLite is used in scenarios that are well outside the

boundaries of conventional OLTP. Well-known uses of SQLite in-

clude processing data on embedded devices and the internet of

things, managing application state as an application le format,

serving website requests, analyzing large datasets, caching enter-

prise data, and transferring data from one system to another [

A previous study, which traced SQLite activity on mobile phones,

found that SQLite was used for a diverse range of tasks [

]. During

the study, the majority of operations performed by SQLite were

single-table scans and key-value lookups. However, the trace in-

cluded much more complex analytical queries that involved joins

between several tables. In addition, a small but signicant portion

of the workloads consisted of OLTP operations.

Recently, the explosive growth of edge computing and interac-

tive data analysis has created a need for ecient in-process online

analytical processing (OLAP). Several database systems have al-

ready been created or adapted for OLAP, including MonetDB [

Oracle OLAP [

], and SAP HANA [

]. However, these eorts have

largely focused on client-server architectures. Despite this focus,

ecient OLAP is often desired in situations where a client-server

database system is unwieldy or even impossible to use.

In-process OLAP is an important component of edge computing.

In the internet of things, data analysis is increasingly being pushed

to the edge in order to reduce network trac and server load [

]. In

addition, there are often privacy concerns associated with sending

sensitive data across a network. While OLAP database systems

are well-suited for the task of data analysis, edge devices may not

possess the energy or computational resources necessary to host a

client-server database system.

As another example, data scientists frequently perform in-process

OLAP to interactively explore a dataset prior to building a model.

Signicant portions of data science workows involve relational

algebraic operations, such as selection, projection, join, and aggrega-

tion. These operations can be concisely scripted using a dataframe

library such as pandas [

]. However, despite their ease of use,

dataframe libraries provide limited query optimization and often

materialize large intermediate results. Furthermore, datasets often

exceed the capacity of memory, which may force the data scientist

to implement hand-rolled buer management and use inecient

storage representations, such as CSV or JSON. An embeddable data-

base engine is better equipped to handle these workloads. For this

reason, SQLite is already a popular tool in data science. Due to its

stability, portability, and space-eciency, the SQLite database le

format is commonly used for sharing datasets. For example, SQLite

is one of the primary le formats used by the popular Kaggle data

science platform [

]. The Python

sqlite3

module [

] is often

used to carry out SQL operations in data science notebooks. While

SQLite produces ecient query plans and handles datasets much

larger than memory, it is less optimized for analytics compared to

OLAP-specic database systems such as those mentioned above.

These areas are prime targets for a powerful, embeddable OLAP

database engine. DuckDB [

] recently emerged to meet this de-

mand. Nicknamed “the SQLite for analytics”, DuckDB is built from

the ground up for in-process OLAP, employing columnar storage,

parallel and vectorized query processing, and multi-version con-

currency control optimized for extract-transform-load (ETL) opera-

tions. While still in a pre-release development phase, DuckDB has

already produced competitive performance on OLAP benchmarks

[

]. We believe that DuckDB lls a much needed gap in embeddable

data processing.

While SQLite and DuckDB have both been evaluated separately,

a well-rounded comparison of the two systems is missing from

the literature. As described above, SQLite shares many design el-

ements with OLTP database systems, so one might expect it to

excel on OLTP benchmarks. However, SQLite aims to be as general-

purpose as possible, so competitive performance could reasonably

be expected on a variety of workloads. In contrast, DuckDB is

purpose-built for analytics, so one might expect it to outperform

SQLite on OLAP benchmarks.

In this paper, we provide experimental support for these expec-

tations and quantify their magnitude. For OLAP, we go one level

deeper—we identify specic characteristics of SQLite responsible

for its OLAP performance and then present optimizations that sub-

stantially increase its speed.

The key contributions of this paper are:

• We present a historical perspective of SQLite.

We also

present a concise description of its architecture.

• We provide a thorough evaluation of SQLite on char-

acteristic workloads, using DuckDB as a baseline.

Our

evaluation includes benchmarks that represent diverse a-

vors of in-process data management.

• We optimize SQLite for analytical data processing.

identify key bottlenecks in SQLite and discuss the advantages

and disadvantages of potential solutions. We integrate our

optimizations into SQLite, resulting in overall 4.2X speedup

on SSB.

• We identify several performance measures specic to

embeddable database engines

, including library footprint

and blob processing performance.

• We provide a high-level description of potential fu-

ture directions for further performance improvement

in SQLite.

The rest of this paper is organized as follows. Section 2 provides

an architectural overview of SQLite and rationale for its design.

Section 3 discusses SQLite in the context of evolving workloads

and hardware. Section 4 evaluates SQLite and DuckDB on a variety

3536

of 13

免费下载

sqlite paper

关注

评论