VLDB2024_Lindorm-UWC: An Ultra-Wide-Column Database for Internet of Vehicles_Alibaba Cloud.pdf, 13 pages, 2024-09-09
Lindorm-UWC: An Ultra-Wide-Column Database for Internet of Vehicles
Qianyu Ouyang, Chunhui Shen, Wenlong Yang, Peng Yu, Qiang Xiao, Jianhui Lei, Yadong Chen, Qilu Zhong, Xiang Wang, Yong Lin, Qingyi Meng, Zhicheng Ji, Wei Meng, Cen Zheng, Sheng Wang, Dan Pei, Wei Zhang, Feifei Li, Jingren Zhou
Tsinghua University; Zhejiang University; Alibaba Cloud
{ouyangqianyu.oyqy,tianwu.sch,zhengyan.ywl,tianyu.yp,xiaoqiang.xiao,leijianhui.ljh,yadong.cyd,qilu.zql,wangxiang}
{linyong.ly,qingyi.mqy,jizhicheng.jzc,mw371030,mingyan.zc,sh.wang,zwei,lifeifei,jingren.zhou}@alibaba-inc.com
ABSTRACT
In Internet of Vehicles (IoV) systems, intelligent vehicles generate huge amounts of data that support diverse services and applications. In practice, database systems are deployed in the cloud to manage data uploaded from the vehicle side and to provide real-time query capabilities. However, existing database systems are ill-suited to this task because IoV data contains a large number of metrics and is written at an extremely high throughput. To better understand IoV data and the corresponding challenges for underlying database systems, we conduct the first extensive empirical study of real-world IoV workloads. Based on the findings of this study, we design Lindorm-UWC as a superior database for IoV systems. It implements a distributed architecture and a cold/hot data separation mechanism to accommodate massive amounts of IoV data. In each data partition, it deploys an ultra-wide-column storage engine to efficiently handle the querying and ingestion of multi-metric data. We evaluate Lindorm-UWC under different data scales and various types of queries. Our experimental results show that it consistently achieves higher write throughput (an increase of over 79%) and competitive query performance compared to various alternative solutions. Lindorm-UWC has been serving IoV enterprise customers on Alibaba Cloud since 2019, managing tens of petabytes of IoV data.
PVLDB Reference Format:
Qianyu Ouyang, Chunhui Shen, Wenlong Yang, Peng Yu, Qiang Xiao,
Jianhui Lei, Yadong Chen, Qilu Zhong, Xiang Wang, Yong Lin, Qingyi
Meng, Zhicheng Ji, Wei Meng, Cen Zheng, Sheng Wang, Dan Pei, Wei
Zhang, Feifei Li, Jingren Zhou. Lindorm-UWC: An Ultra-Wide-Column
Database for Internet of Vehicles. PVLDB, 17(12): 4117 - 4129, 2024.
doi:10.14778/3685800.3685831
1 INTRODUCTION
With the development of information and communication technology (ICT) and in-vehicle sensing technology, the automotive industry is undergoing a significant transformation: intelligence and digitalization have become the new standards for modern automobiles [32, 37], and the number of intelligent vehicles is rapidly increasing [1]. Vehicles are now equipped with numerous sensors that continuously collect data on vehicle operations, driving behaviors, and road conditions [20, 21, 23]. Through the network, vehicles can upload data to gateways in the cloud in a timely manner, enabling data-driven services and applications and connecting vehicles to the external world.

This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 17, No. 12 ISSN 2150-8097.
doi:10.14778/3685800.3685831
This trend has given rise to the emerging Internet of Vehicles (IoV) [47]. Within an IoV system, data is collected and used by multiple stakeholders, including vehicle manufacturers, telematics service providers (TSPs), autonomous driving vendors, and government regulation platforms. They extract valuable information and insights from vast amounts of data to provide various services. For instance, when a vehicle encounters a malfunction, a TSP can remotely diagnose the issue by examining the vehicle's operational data [21]. As the scale of an IoV system (i.e., the number of vehicles, sensors, and services) grows, efficiently managing and using IoV data becomes critically important.
In practice, IoV data is naturally time-series data, similar in form to Internet of Things (IoT) data and DevOps metric data [26]. The data is generated on a per-vehicle basis. A single vehicle is equipped with a multitude of sensors that constantly generate a vast number of metrics, forming multi-dimensional time series. Many time-series database systems on the market handle IoT and DevOps scenarios well, such as IoTDB [44], InfluxDB [13], and Prometheus [15]. These database systems are employed to support high-rate data ingestion and low-latency real-time queries. However, due to the unique characteristics of the IoV scenario, these time-series databases are ill-suited for handling IoV data. One major difference lies in the number of metrics. In DevOps systems, a single server or compute node typically generates no more than dozens or hundreds of monitoring metrics [16, 19] (e.g., CPU usage, disk I/O time, free memory); in IoT systems, a single device contains no more than a few hundred sensors [25, 40, 44]. In contrast, in an IoV system, a vehicle can easily generate thousands of metrics [31, 44], since it consists of many complex subcomponents, e.g., Advanced Driver Assistance Systems (ADAS), the Battery Management System (BMS), and the Domain Control Unit (DCU). Note that such a large number of metrics is collected and retrieved on a per-vehicle basis, which poses significant challenges for existing time-series databases that manage data on a per-metric basis. This motivates us to develop a comprehensive understanding of IoV workload patterns, which can guide the design of database systems well suited to IoV scenarios.
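To make the per-metric versus per-vehicle contrast concrete, the following minimal Python sketch counts how many storage units a single vehicle upload touches under each layout. Both layouts are simplified, hypothetical models (a dictionary of series keyed by (vehicle, metric) versus one wide row keyed by (vehicle, timestamp)); they are not the internals of IoTDB, InfluxDB, Prometheus, or Lindorm-UWC, and the metric names are illustrative.

```python
from collections import defaultdict

def per_metric_write(store, vehicle_id, ts, metrics):
    # Hypothetical per-metric layout: one series per (vehicle, metric), so every
    # metric value in the upload is appended to a different series.
    for name, value in metrics.items():
        store[(vehicle_id, name)].append((ts, value))
    return len(metrics)  # number of series touched by this upload

def per_vehicle_write(store, vehicle_id, ts, metrics):
    # Hypothetical per-vehicle layout: the whole upload lands in one wide row
    # keyed by (vehicle, timestamp), regardless of how many metrics it carries.
    store.setdefault((vehicle_id, ts), {}).update(metrics)
    return 1  # number of rows touched by this upload

# A single upload carrying 2,500 metrics (illustrative names m0..m2499).
upload = {f"m{i}": float(i) for i in range(2500)}

metric_store = defaultdict(list)
vehicle_store = {}
print(per_metric_write(metric_store, "v1", 1000, upload))    # 2500
print(per_vehicle_write(vehicle_store, "v1", 1000, upload))  # 1
```

The dictionary operations are trivial; the point is the fan-out. In the per-metric model, each vehicle upload multiplies series and index maintenance by the number of metrics it carries.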
To the best of our knowledge, no prior work has extensively studied IoV workloads. In this paper, we provide the first empirical study of IoV workloads to investigate the unique requirements they place on underlying database systems. We explore both data read and write patterns from three major automakers and TSPs in China (detailed in §2), and summarize three key challenges for IoV data management:
C1. Extremely high data ingestion rate. IoV systems are write-intensive, and data traffic can easily exceed 1 GB per second and 80 TB per day. This demands that the underlying database sustain extremely high write throughput. In addition, the huge volumes of data resulting from the high ingestion rate put enormous pressure on storage costs.
C2. A huge number of metrics per vehicle with update-style writes. A single vehicle can generate over 2500 distinct time-series metrics simultaneously, nearly an order of magnitude more than a traditional time-series database can handle. Moreover, different components in a vehicle upload their own metric data independently, so a single write request to the database does not cover all metrics of the vehicle. Hence, the underlying database has to handle many small writes, each carrying only a subset of the vehicle's metrics.
C3. Diverse patterns for querying a small or large number of metrics. To support numerous downstream services, the database must handle different query types with high concurrency and low latency. Different queries may involve different metrics from the massive dataset. Some retrieve a large number of metrics (e.g., applications that fetch data for comprehensive analysis), while others need only a small subset (e.g., engineers querying related metrics for remote failure diagnosis).
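The two figures in C1 are mutually consistent. Assuming decimal units (1 TB = 1000 GB), spreading 80 TB over the 86,400 seconds of a day gives an average of roughly 0.93 GB/s, so reaching that daily volume requires ingestion close to 1 GB/s throughout the day, not just during short peaks:

```latex
\frac{80\,\text{TB/day}}{86{,}400\,\text{s/day}}
  = \frac{80{,}000\,\text{GB}}{86{,}400\,\text{s}}
  \approx 0.93\,\text{GB/s},
\qquad
1\,\text{GB/s} \times 86{,}400\,\text{s/day} = 86.4\,\text{TB/day}.
```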
We note that neither existing time-series databases nor other types of databases can fully address the above three IoV workload challenges. Wide-column databases, e.g., HBase [8], and column-oriented time-series databases, e.g., InfluxDB [13], consume unaffordable amounts of computation and space resources to index massive numbers of metrics, resulting in unacceptably poor write throughput (challenges C1 and C2). When querying massive numbers of metrics, time-series databases need to perform an independent retrieval for each metric, introducing a vast number of I/Os; HBase, which treats each data point as a key-value entry, has to spend a lot of CPU time on key comparisons (challenge C3). To address the issues of writing and querying massive numbers of metrics, document-oriented databases like MongoDB [14], which model all metric values generated by a vehicle at the same timestamp as a flexible schema-free document, seem to be a suitable solution. However, due to their document-oriented storage layout, queries involving only a small subset of metrics have to fetch all metrics (i.e., the entire document), leading to prohibitive read amplification (challenge C3). Moreover, document stores also struggle when the metric values of one document arrive in multiple rounds (challenge C2), as each round is treated as an extra update that must first read the target document.
After identifying the gap between IoV workload challenges and existing database solutions, we propose Lindorm-UWC (ultra-wide-column) for data management in IoV systems. To accommodate massive amounts of IoV data, Lindorm-UWC has a distributed architecture that partitions data by vehicle and time and supports automated load balancing. To efficiently handle multi-metric IoV data, we design an ultra-wide-column storage engine based on the Log-Structured Merge tree (LSM-tree) [38], with each partition in Lindorm-UWC running its own independent instance. The storage engine implements two mechanisms to accelerate the ingestion of multi-metric data: first, the multiple metric values contained in one write request are consolidated into a single column, eliminating the indexing and grouping of different metrics on the write path; second, each write request is processed in an append-only manner rather than by updating the vehicle's existing data in place. To efficiently handle various query patterns, Lindorm-UWC organizes on-disk data in both row-oriented and column-oriented storage formats, allowing it to choose and read from the file format that reduces read amplification. To lower storage costs, we employ a tiered cold-hot data storage layout, offloading less frequently accessed cold data to slower but cheaper storage media. Lindorm-UWC has been serving IoV enterprise customers on Alibaba Cloud since 2019. It manages tens of petabytes of IoV data and handles more than 10 million requests per second.
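The two write-path mechanisms described above (consolidating all metric values of one request into a single column, and appending each request instead of updating the vehicle's existing data in place) can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions: the `WideColumnMemtable` class, its methods, the newest-wins merge rule, and the metric names are all hypothetical, and the sketch does not reproduce Lindorm-UWC's actual LSM-tree implementation.

```python
from collections import defaultdict

class WideColumnMemtable:
    """Toy, hypothetical model of an append-only ultra-wide-column write path.

    Each write request is stored as one consolidated entry (a single "column"
    holding all of that request's metric values) under the key (vehicle, ts).
    Writes never read or rewrite existing entries; partial uploads for the same
    key are merged only when the key is read.
    """

    def __init__(self):
        # (vehicle, ts) -> list of metric dicts, kept in arrival order
        self._entries = defaultdict(list)

    def write(self, vehicle, ts, metrics):
        # Append-only: no lookup of existing data and no per-metric indexing.
        self._entries[(vehicle, ts)].append(dict(metrics))

    def read(self, vehicle, ts, wanted=None):
        # Merge partial writes in arrival order; a later value for the same
        # metric overrides an earlier one (an assumed newest-wins rule).
        merged = {}
        for metrics in self._entries.get((vehicle, ts), []):
            merged.update(metrics)
        if wanted is not None:
            # Project only the requested metrics.
            merged = {m: merged[m] for m in wanted if m in merged}
        return merged

mt = WideColumnMemtable()
# Two vehicle components independently upload different metric subsets for the
# same timestamp (metric names are illustrative).
mt.write("v1", 1000, {"bms.soc": 81.5, "bms.temp": 31.0})
mt.write("v1", 1000, {"adas.speed": 62.0})
print(mt.read("v1", 1000))                  # {'bms.soc': 81.5, 'bms.temp': 31.0, 'adas.speed': 62.0}
print(mt.read("v1", 1000, ["adas.speed"]))  # {'adas.speed': 62.0}
```

Deferring the merge to read time (and, in an LSM-tree, to compaction) is what lets this model's write path avoid the read-modify-write that a document store would need for the partial uploads described in C2.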
Our major contributions are summarized as follows:
• We conduct the first empirical study highlighting the workload characteristics of real-world IoV systems. By analyzing data ingestion and query patterns, we obtain a series of valuable findings that can drive database design for IoV workloads.
• We propose Lindorm-UWC, a database designed for IoV systems. To manage the vast amount of IoV data, Lindorm-UWC employs a tailored distributed architecture and a cold-hot data separation mechanism. To efficiently handle the writing and querying of multi-metric data, we design a novel ultra-wide-column storage engine, which supports high write throughput and provides efficient queries of diverse patterns.
• To evaluate the effectiveness of Lindorm-UWC's design, we develop a benchmark suite based on the characteristics of real-world IoV workloads and conduct comprehensive experiments with it. We compare Lindorm-UWC with three typical databases as baselines: MongoDB, HBase, and InfluxDB. The experimental results indicate that Lindorm-UWC significantly outperforms these baselines. Across a variety of workloads, Lindorm-UWC's write performance is from 79% to an order of magnitude higher. In terms of query efficiency, Lindorm-UWC consistently sustains high concurrency for queries retrieving either a small or a large number of metrics.
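The distributed architecture partitions data by vehicle and time, and the cold-hot mechanism moves less frequently accessed data to cheaper media. A minimal sketch of how such routing could look, assuming a hash of the vehicle ID modulo a fixed partition count, fixed-length time buckets, and data age as a proxy for access frequency. The constants and function names below are hypothetical illustrations, not parameters or APIs from Lindorm-UWC.

```python
import zlib

NUM_PARTITIONS = 64                    # hypothetical partition count
BUCKET_SECONDS = 24 * 3600             # hypothetical time-bucket length: one day
HOT_RETENTION_SECONDS = 7 * 24 * 3600  # hypothetical: buckets older than 7 days are cold

def partition_of(vehicle_id: str, ts: int) -> tuple[int, int]:
    """Route a data point to (vehicle shard, time bucket).

    zlib.crc32 is stable across processes, unlike Python's salted str hash().
    """
    shard = zlib.crc32(vehicle_id.encode("utf-8")) % NUM_PARTITIONS
    bucket = ts // BUCKET_SECONDS
    return shard, bucket

def storage_tier(bucket: int, now: int) -> str:
    """Place a whole time bucket on hot or cold media according to its age."""
    bucket_end = (bucket + 1) * BUCKET_SECONDS
    return "cold" if now - bucket_end >= HOT_RETENTION_SECONDS else "hot"

now = 1_700_000_000
shard, bucket = partition_of("VIN0001", now - 3600)
print(shard == partition_of("VIN0001", now - 7200)[0])  # True: same vehicle, same shard
print(storage_tier(bucket, now))                        # hot: recent bucket
print(storage_tier(partition_of("VIN0001", now - 30 * 24 * 3600)[1], now))  # cold
```

Keying on the vehicle keeps all of a vehicle's metrics together, which suits per-vehicle writes and reads, while the time component lets whole buckets age out to cold storage as a unit.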
2 EMPIRICAL STUDY
2.1 IoV System Background
We rst introduce how vehicles generate data in the IoV system,
and how the data is used by dierent application services. The IoV
system is typically a three-tier structure [
20
,
32
], consisting of the
physical layer, the connectivity layer, and the cloud layer.
The physical layer refers to vehicles equipped with network
devices and a large number of sensors. The running vehicles contin-
uously collect and process information about the road environment,
vehicle components, and vehicle running status through the sen-
sors, and then upload them to the connectivity layer through the