of IoV workload patterns, which can guide us to design well-suited
database systems for IoV scenarios.
To the best of our knowledge, no prior work extensively studies
IoV workloads. In this paper, we provide the first empirical study of
IoV workloads to investigate their unique requirements on underlying
database systems. We explore both data read and write patterns
from three major automakers and TSPs in China (detailed in §2),
and summarize three key challenges for IoV data management:
C1. Extremely high data ingestion rate. IoV systems are write-
intensive: data traffic can easily exceed 1 GB per second and
80 TB per day. This demands that the underlying database
sustain extremely high write throughput. In addition, the huge
data volumes resulting from the high ingestion rate put
enormous pressure on storage costs.
C2. A huge number of metrics per vehicle with update-style
writing. A single vehicle can generate over 2,500 distinct time-
series metrics simultaneously, nearly an order of magnitude
more than what a traditional time-series database can handle.
Moreover, different components in a vehicle upload their own
metric data independently, so a single write request to the
database does not involve all metrics of the vehicle. Hence, the
underlying database has to handle many small writes with an
incomplete metric format.
C3. Diverse patterns for querying a small or large number
of metrics. To support numerous downstream services, the
database has to handle different query types with high
concurrency and low latency. Different queries may involve
different metrics from the massive dataset. Some retrieve a
large number of metrics (e.g., applications that fetch data for
comprehensive analysis), while others need only a small subset
(e.g., engineers who query related metrics for remote failure
diagnosis).
We note that neither existing time-series databases nor other
general-purpose databases can fully address the above three IoV
workload challenges. Wide-column databases, e.g., HBase [8], and
column-oriented time-series databases, e.g., InfluxDB [13], have to
consume unaffordable amounts of computation and space resources
to index massive metrics, resulting in unacceptably poor write
throughput (challenges C1 & C2). When querying massive metrics,
time-series databases need to perform independent retrievals for
each metric, introducing a vast amount of I/O; HBase, which treats
each data point as a key-value entry, has to spend substantial CPU
time on key comparisons (challenge C3). To address the issues of
writing and querying massive metrics, document-oriented databases
like MongoDB [14], which model all metric values generated by a
vehicle at the same timestamp as a flexible schema-free document,
seem to be a suitable solution. However, due to their document-oriented
storage layout, queries involving a small subset of metrics have to
fetch all metrics (i.e., the entire document), leading to prohibitive
read amplification (challenge C3). Moreover, they also struggle when
metric values in one document arrive in multiple rounds (challenge
C2), since each round is treated as an extra update that must first
read the target document out.
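To make the read-amplification argument concrete, the following back-of-the-envelope calculation uses the metric count from our study; the per-metric encoded size and query width are illustrative assumptions, not measured values:

```python
# Read amplification of a document-per-timestamp layout when a query
# needs only a few metrics but must fetch the whole document.
metrics_per_vehicle = 2500   # distinct time-series metrics per vehicle (C2)
bytes_per_metric = 16        # assumed average encoded size of one metric value
queried_metrics = 5          # e.g., a remote-diagnosis query touching 5 metrics

doc_size = metrics_per_vehicle * bytes_per_metric   # bytes actually read
useful = queried_metrics * bytes_per_metric         # bytes actually needed
print(f"read amplification: {doc_size / useful:.0f}x")  # -> 500x
```

Under these assumptions, a diagnosis query reads 500 times more data than it returns, which is exactly the read-amplification problem a column-aware layout avoids.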
After spotting the gap between IoV workload challenges and
existing database solutions, we propose Lindorm-UWC (ultra-wide-
column) for data management in IoV systems. To accommodate
massive amounts of IoV data, Lindorm-UWC has a distributed
architecture that partitions data by vehicle and time and supports
automated load balancing. To efficiently handle multi-metric IoV
data, we design an ultra-wide-column storage engine based on the
Log-Structured Merge tree (LSM-tree) [38]; each partition in
Lindorm-UWC employs an independent instance of this engine. The
storage engine implements two mechanisms to accelerate the
ingestion of multi-metric data: first, the multiple metric values
contained in one write request are consolidated into a single column,
eliminating the indexing and grouping of different metrics on the
write path; second, each write request is processed in an append-only
manner instead of updating the vehicle's existing data in place. To
efficiently handle various query patterns, Lindorm-UWC organizes
on-disk data in both row-oriented and column-oriented storage
formats, allowing it to choose and read from the file format that
minimizes read amplification. To lower storage costs, we employ a
tiered cold-hot data storage layout, offloading less frequently
accessed cold data to slower but cheaper storage media. Lindorm-UWC
has been serving IoV enterprise customers on Alibaba Cloud since
2019. It manages tens of petabytes of IoV data and handles more than
10 million requests per second.
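The two write-path mechanisms above can be sketched in a few lines. The following is an illustrative toy model only, not Lindorm-UWC's actual implementation; all class and method names are hypothetical. It shows (1) packing all metric values of one write request into a single consolidated column value, so no per-metric indexing happens on the write path, and (2) appending partial writes for the same (vehicle, timestamp) key and merging the rounds only at read time:

```python
import json
from collections import defaultdict

class UltraWideColumnMemtable:
    """Toy append-only buffer keyed by (vehicle_id, timestamp)."""

    def __init__(self):
        # Each key maps to a list of packed write "rounds";
        # nothing is ever updated in place.
        self._rows = defaultdict(list)

    def write(self, vehicle_id, timestamp, metrics):
        # Pack the (possibly incomplete) metric map into one opaque blob:
        # a single consolidated column, not one column per metric.
        self._rows[(vehicle_id, timestamp)].append(json.dumps(metrics))

    def read(self, vehicle_id, timestamp, wanted=None):
        # Merge all append rounds; later rounds win on conflicting metrics.
        merged = {}
        for packed in self._rows[(vehicle_id, timestamp)]:
            merged.update(json.loads(packed))
        if wanted is not None:
            merged = {k: v for k, v in merged.items() if k in wanted}
        return merged

mt = UltraWideColumnMemtable()
# Different vehicle components upload their metrics independently:
mt.write("vin-001", 1700000000, {"engine_rpm": 2100, "oil_temp": 92.5})
mt.write("vin-001", 1700000000, {"battery_soc": 0.81})  # second round
print(mt.read("vin-001", 1700000000))
print(mt.read("vin-001", 1700000000, wanted={"battery_soc"}))
```

The append-only design sidesteps the read-modify-write cycle that document stores pay for each partial round, at the cost of deferring the merge to reads and background compaction.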
Our major contributions are summarized as follows:
• We conduct the first empirical study to highlight the workload
characteristics in real-world IoV systems. By analyzing data
ingestion and query patterns, we obtain a series of valuable findings
that can drive the database design for IoV workloads.
• We propose Lindorm-UWC, a database designed for IoV systems.
To manage the vast amount of IoV data, Lindorm-UWC employs
a tailored distributed architecture and a cold-hot data separation
mechanism. To efficiently handle the writing and querying of
multi-metric data, we innovatively design an ultra-wide-column
storage engine, which supports high write throughput and provides
efficient queries of diverse patterns.
• To evaluate the effectiveness of Lindorm-UWC's design, we
develop a benchmark suite based on the characteristics of real-world
IoV workloads, and then conduct comprehensive experiments with it.
We compare Lindorm-UWC with three typical databases as baselines:
MongoDB, HBase, and InfluxDB. The experimental results indicate
that Lindorm-UWC significantly outperforms the baselines. Across a
variety of workloads, Lindorm-UWC's write performance is from 79%
to an order of magnitude higher. For query efficiency, Lindorm-UWC
consistently sustains high concurrency for queries retrieving either
a small or a large number of metrics.
2 EMPIRICAL STUDY
2.1 IoV System Background
We first introduce how vehicles generate data in the IoV system,
and how the data is used by different application services. The IoV
system typically has a three-tier structure [20, 32], consisting of the
physical layer, the connectivity layer, and the cloud layer.
The physical layer refers to vehicles equipped with network
devices and a large number of sensors. Running vehicles continuously
collect and process information about the road environment,
vehicle components, and vehicle running status through these
sensors, and then upload it to the connectivity layer through the