
Y. Yang et al.: EdgeDB: Efficient Time-Series Database for Edge Computing
FIGURE 1. A comparison of cloud-centralized framework vs edge-database based framework for IoT. Cloud-centralized framework needs to
transfer all time-series data to remote cloud for data management with significant network overhead. Edge-database based framework can utilize
the edge node to locally process the collected data, and transfers important data and aggregated values to cloud, then collaborates with the
centralized databases to perform detailed queries.
Nevertheless, it is desirable for the edge nodes to accommo-
date not only high-throughput data ingestion but also real-
time data retrieval from both fresh and historical data in local
storage to satisfy the requirements of IoT applications.
In this paper, we present EdgeDB, an efficient time-series
database with edge servers (shown in Figure 1b) designed to
manage thousands of high-sampling-frequency sensors while
providing higher insert performance, query performance, and
write speed, with lower resource requirements than existing
databases. The key idea behind EdgeDB is to design an opti-
mal organization and process flow by efficiently combining
multiple online streams in both query-friendly and store-
friendly ways, enabled by the three key mechanisms of multi-
stream merging, indexing, and storing. The multi-stream
merging mechanism is designed to compactly re-organize
multiple correlated data streams within a time window into
a tablet. Then, the multi-stream indexing mechanism, called
Time Partitioned Elastic Index (TPEI), is proposed to index
these tablets efficiently, enabling real-time queries. Finally,
a write-optimized strategy is developed to combine multiple
tablets into a group to allow fast inter-stream accesses.
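As a rough illustration of the merging and indexing steps described above, the following minimal Python sketch groups samples from several streams into per-window tablets and then locates the tablets that overlap a time range. It is not the EdgeDB implementation: the in-memory layout, the fixed window length, and the names `merge_streams_into_tablets` and `tablets_in_range` are assumptions made only for this example, and the sorted-list lookup is a simple stand-in rather than the TPEI structure itself.

```python
import bisect
from collections import defaultdict


def merge_streams_into_tablets(streams, window):
    """Group samples from several streams into per-window tablets.

    streams: dict mapping a stream id to a list of (timestamp, value) pairs.
    window:  window length, in the same unit as the timestamps (assumed fixed).
    Returns a dict: window start -> {stream id -> [(timestamp, value), ...]}.
    """
    tablets = defaultdict(lambda: defaultdict(list))
    for stream_id, samples in streams.items():
        for ts, value in samples:
            window_start = (ts // window) * window
            tablets[window_start][stream_id].append((ts, value))
    # Convert to plain dicts and keep each stream's samples in time order.
    return {
        start: {sid: sorted(rows) for sid, rows in by_stream.items()}
        for start, by_stream in sorted(tablets.items())
    }


def tablets_in_range(tablets, window, t1, t2):
    """Return the start times of tablets whose window may overlap [t1, t2]."""
    starts = sorted(tablets)
    lo = bisect.bisect_left(starts, (t1 // window) * window)
    hi = bisect.bisect_right(starts, t2)
    return starts[lo:hi]


# Hypothetical sample data: two correlated streams with integer timestamps.
streams = {
    "meter_a": [(0, 1.0), (5, 1.1), (12, 1.3)],
    "meter_b": [(1, 7.0), (11, 7.2)],
}
tablets = merge_streams_into_tablets(streams, window=10)
hits = tablets_in_range(tablets, window=10, t1=5, t2=11)
```

Because every tablet holds the samples of all merged streams for one window, a time-range or join query in this sketch touches one entry per window instead of one entry per stream per window.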
In summary, the contributions of this paper are as follows:
1) We propose a multi-stream merging mechanism
to compactly organize multiple correlated streams
together at runtime, supporting highly efficient insertion
and join query operations; we also introduce the Time
Partitioned Elastic Index to accelerate time-range
queries with small memory overheads.
2) We present the Time Merged Tree to merge multiple tablets
into a large group to flush to the storage devices with a
single write operation, to improve the write throughput
and to speed up inter-tablet join query operations.
3) We implement and evaluate the EdgeDB prototype.
The experimental results driven by real-world datasets
demonstrate that EdgeDB outperforms a state-of-the-
art time-series database BTrDB by up to 3.6× in write
throughput, 2.2× in insert throughput, and up to 67%
in query latency, with lower memory consumption.
The remainder of this paper is organized as follows:
Section II presents background and motivation. The design
details of EdgeDB are described in Section III. In Section IV,
we evaluate EdgeDB with experimental results. Section V
concludes this paper.
II. BACKGROUND AND MOTIVATION
A. EDGE COMPUTING OF IoT
The Internet of Things (IoT) is poised to fundamentally
change how we interact with our surrounding environments
[31], [35]. By deploying an increasing number and variety of
high-sampling-frequency sensors, we are able to perceive
the surrounding environment quickly and clearly. More
importantly, we can make informed and timely decisions
in emergency situations using the sensor-generated data.
However, the considerable volumes of data make the long-
distance networks a severe performance bottleneck, as illustrated
in Figure 1a, because of the long time required for data
transfer to remote servers. For example, a small electricity
grid with 1,000 smart meters will produce 22.4 MB of data
per second, and the data must be transferred to datacenters
over intermittent LTE with high transmission delays [2]. Even
though these servers have powerful hardware, the high trans-
fer latency makes it difficult, if not impossible, for the cloud
to make real-time decisions based on these data. To overcome
this problem, edge computing is introduced to provide near-
data processing [15], [33].
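To put the smart-meter figure above in perspective, the short calculation below converts the stated aggregate rate into a per-meter rate, a sustained uplink bit rate, and a daily volume. Only the 22.4 MB/s and 1,000-meter figures come from the text; decimal units (1 MB = 10^6 bytes) and a constant data rate are assumptions of this sketch.

```python
AGGREGATE_MB_PER_S = 22.4   # from the text: aggregate rate of the grid
NUM_METERS = 1_000          # from the text: number of smart meters
SECONDS_PER_DAY = 86_400

# Assumption: decimal units and a constant rate throughout the day.
per_meter_kb_per_s = AGGREGATE_MB_PER_S * 1_000 / NUM_METERS
uplink_mbit_per_s = AGGREGATE_MB_PER_S * 8
daily_tb = AGGREGATE_MB_PER_S * SECONDS_PER_DAY / 1_000_000

print(f"per meter:        {per_meter_kb_per_s:.1f} KB/s")    # 22.4 KB/s
print(f"sustained uplink: {uplink_mbit_per_s:.1f} Mbit/s")   # 179.2 Mbit/s
print(f"daily volume:     {daily_tb:.2f} TB")                # 1.94 TB
```

Under these assumptions, even this small grid would need roughly 179 Mbit/s of sustained uplink and would ship close to 2 TB per day if all raw data were sent to the cloud.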
With the help of edge computing, we have the poten-
tial to store and process all collected data in a timely
manner on the edge nodes, instead of remote servers. For
example, an autonomous vehicle generates gigabytes of
data every second [6] that require real-time processing to
make correct decisions under various circumstances. The
vehicle-mounted computer is a typical edge node, which can
manage and analyze these data locally, and provide quick
responses or advance warnings for the driver.
Therefore, edge nodes should be able to support extremely
high insertion throughput, as well as real-time responses to
various types of queries. For instance, users not only need to
query the raw data tuples of any time-series stream to analyze
detailed phenomena, but also need to get time-range-based
aggregated values to see the holistic view, or perform a join
query to get the correlated data tuples from a number of
streams within a period of time to further comprehensively
analyze these streams. As a concrete example, in order to calculate
the Pearson Correlation Coefficients among all streams
within a (T1, T2) time period, we need to first perform
"Select * from all streams where Time > T1 & Time < T2"
to get the data of these streams for further processing.
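The Pearson example above can be sketched in a few lines of Python: first select the tuples of every stream that fall inside (T1, T2), then compute the coefficient for each pair of streams. This is only an illustration under stated assumptions: streams are held in memory as lists of (timestamp, value) pairs, all streams are sampled at the same timestamps so that values align pairwise, no stream is constant within the window, and the helper names `select_range`, `pearson`, and `pairwise_correlations` are hypothetical.

```python
import math
from itertools import combinations


def select_range(stream, t1, t2):
    """Return values whose timestamps satisfy t1 < time < t2 (exclusive)."""
    return [value for ts, value in stream if t1 < ts < t2]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Assumption: neither stream is constant, so the denominator is non-zero.
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(streams, t1, t2):
    """Select the (t1, t2) slice of every stream, then correlate each pair."""
    selected = {sid: select_range(rows, t1, t2) for sid, rows in streams.items()}
    return {
        (a, b): pearson(selected[a], selected[b])
        for a, b in combinations(sorted(selected), 2)
    }


# Hypothetical sample data: three streams sampled at the same timestamps.
streams = {
    "s1": [(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0), (4, 5.0)],
    "s2": [(0, 2.0), (1, 4.0), (2, 6.0), (3, 8.0), (4, 10.0)],
    "s3": [(0, 5.0), (1, 4.0), (2, 3.0), (3, 2.0), (4, 1.0)],
}
corr = pairwise_correlations(streams, t1=0, t2=4)
```

Note that the selection step reads the same time range from every stream, which is the access pattern that benefits from storing correlated streams together.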
142296 VOLUME 7, 2019