of IoV workload patterns, which can guide us to design well-suited
database systems for IoV scenarios.
To the best of our knowledge, no prior work extensively studies
IoV workloads. In this paper, we provide the first empirical study of
IoV workloads to investigate their unique requirements on underlying
database systems. We explore both data read and write patterns
from three major automakers and TSPs in China (detailed in §2),
and summarize three key challenges for IoV data management:
C1. Extremely high data ingestion rate. IoV systems are write-
intensive: data traffic can easily exceed 1 GB per second and
80 TB per day. This demands that the underlying database
sustain extremely high write throughput. In addition, the huge
data volumes resulting from the high ingestion rate put
enormous pressure on storage costs.
C2. A huge number of metrics per vehicle with update-style
writing. A single vehicle can generate over 2,500 distinct time-
series metrics simultaneously, nearly an order of magnitude
more than what a traditional time-series database can handle.
Moreover, different components in a vehicle upload their own
metric data independently, so a single write request to the
database does not involve all metrics of the vehicle. Hence, the
underlying database has to handle many small writes with an
incomplete metric format.
C3. Diverse patterns for querying a small or large number
of metrics. To support numerous downstream services, the
database has to handle different query types with high
concurrency and low latency. Different queries may involve
different metrics from the massive dataset. Some retrieve a
large number of metrics (e.g., applications that fetch data for
comprehensive analysis), while others need only a small subset
(e.g., engineers who query related metrics for remote failure
diagnosis).
We note that neither existing time-series databases nor other
general-purpose databases can fully address the above three IoV
workload challenges. Wide-column databases, e.g., HBase [8], and
column-oriented time-series databases, e.g., InfluxDB [13], have to
consume unaffordable amounts of computation and space resources
to index massive metrics, resulting in unacceptably poor write
throughput (challenges C1 & C2). When querying massive metrics,
time-series databases need to perform independent retrievals for
each metric, introducing a vast amount of I/O; HBase, which treats
each data point as a key-value entry, has to spend substantial CPU
time on key comparisons (challenge C3). To address the issues of
writing and querying massive metrics, document-oriented databases
like MongoDB [14], which model all metric values generated by a
vehicle at the same timestamp as a flexible schema-free document,
seem to be a suitable solution. However, due to their document-oriented
storage layout, queries involving a small subset of metrics have to
fetch all metrics (i.e., the entire document), leading to prohibitive
read amplification (challenge C3). Moreover, they also struggle when
metric values in one document arrive in multiple rounds (challenge
C2), since each round is treated as an extra update that must first
read the target document out.
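To make the read-amplification argument concrete, the following back-of-the-envelope calculation uses the metric count from our study; the per-metric encoded size and query width are illustrative assumptions, not measured values:

```python
# Read amplification of a document-per-timestamp layout when a query
# needs only a few metrics but must fetch the whole document.
metrics_per_vehicle = 2500   # distinct time-series metrics per vehicle (C2)
bytes_per_metric = 16        # assumed average encoded size of one metric value
queried_metrics = 5          # e.g., a remote-diagnosis query touching 5 metrics

doc_size = metrics_per_vehicle * bytes_per_metric   # bytes actually read
useful = queried_metrics * bytes_per_metric         # bytes actually needed
print(f"read amplification: {doc_size / useful:.0f}x")  # -> 500x
```

Under these assumptions, a diagnosis query reads 500 times more data than it returns, which is exactly the read-amplification problem a column-aware layout avoids.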
After spotting the gap between IoV workload challenges and
existing database solutions, we propose Lindorm-UWC (ultra-wide-
column) for data management in IoV systems. To accommodate
massive amounts of IoV data, Lindorm-UWC has a distributed
architecture that partitions data by vehicle and time and supports
automated load balancing. To efficiently handle multi-metric IoV
data, we design an ultra-wide-column storage engine based on the
Log-Structured Merge tree (LSM-tree) [38]; each partition in
Lindorm-UWC employs an independent instance of this engine. The
storage engine implements two mechanisms to accelerate the
ingestion of multi-metric data: first, the multiple metric values
contained in one write request are consolidated into a single column,
eliminating the indexing and grouping of different metrics on the
write path; second, each write request is processed in an append-only
manner instead of updating the vehicle's existing data in place. To
efficiently handle various query patterns, Lindorm-UWC organizes
on-disk data in both row-oriented and column-oriented storage
formats, allowing it to choose and read from the file format that
minimizes read amplification. To lower storage costs, we employ a
tiered cold-hot data storage layout, offloading less frequently
accessed cold data to slower but cheaper storage media. Lindorm-UWC
has been serving IoV enterprise customers on Alibaba Cloud since
2019. It manages tens of petabytes of IoV data and handles more than
10 million requests per second.
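The two write-path mechanisms above can be sketched in a few lines. The following is an illustrative toy model only, not Lindorm-UWC's actual implementation; all class and method names are hypothetical. It shows (1) packing all metric values of one write request into a single consolidated column value, so no per-metric indexing happens on the write path, and (2) appending partial writes for the same (vehicle, timestamp) key and merging the rounds only at read time:

```python
import json
from collections import defaultdict

class UltraWideColumnMemtable:
    """Toy append-only buffer keyed by (vehicle_id, timestamp)."""

    def __init__(self):
        # Each key maps to a list of packed write "rounds";
        # nothing is ever updated in place.
        self._rows = defaultdict(list)

    def write(self, vehicle_id, timestamp, metrics):
        # Pack the (possibly incomplete) metric map into one opaque blob:
        # a single consolidated column, not one column per metric.
        self._rows[(vehicle_id, timestamp)].append(json.dumps(metrics))

    def read(self, vehicle_id, timestamp, wanted=None):
        # Merge all append rounds; later rounds win on conflicting metrics.
        merged = {}
        for packed in self._rows[(vehicle_id, timestamp)]:
            merged.update(json.loads(packed))
        if wanted is not None:
            merged = {k: v for k, v in merged.items() if k in wanted}
        return merged

mt = UltraWideColumnMemtable()
# Different vehicle components upload their metrics independently:
mt.write("vin-001", 1700000000, {"engine_rpm": 2100, "oil_temp": 92.5})
mt.write("vin-001", 1700000000, {"battery_soc": 0.81})  # second round
print(mt.read("vin-001", 1700000000))
print(mt.read("vin-001", 1700000000, wanted={"battery_soc"}))
```

The append-only design sidesteps the read-modify-write cycle that document stores pay for each partial round, at the cost of deferring the merge to reads and background compaction.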
Our major contributions are summarized as follows:
• We conduct the first empirical study to highlight the workload
characteristics in real-world IoV systems. By analyzing data
ingestion and query patterns, we obtain a series of valuable findings
that can drive the database design for IoV workloads.
• We propose Lindorm-UWC, a database designed for IoV systems.
To manage the vast amount of IoV data, Lindorm-UWC employs
a tailored distributed architecture and a cold-hot data separation
mechanism. To efficiently handle the writing and querying of
multi-metric data, we innovatively design an ultra-wide-column
storage engine, which supports high write throughput and provides
efficient queries of diverse patterns.
• To evaluate the effectiveness of Lindorm-UWC's design, we
develop a benchmark suite based on the characteristics of real-world
IoV workloads, and then conduct comprehensive experiments with it.
We compare Lindorm-UWC with three typical databases as baselines:
MongoDB, HBase, and InfluxDB. The experimental results indicate
that Lindorm-UWC significantly outperforms the baselines. Across a
variety of workloads, Lindorm-UWC's write performance is from 79%
to an order of magnitude higher. For query efficiency, Lindorm-UWC
consistently sustains high concurrency for queries retrieving either
a small or a large number of metrics.
2 EMPIRICAL STUDY
2.1 IoV System Background
We first introduce how vehicles generate data in the IoV system,
and how the data is used by different application services. The IoV
system typically has a three-tier structure [20, 32], consisting of the
physical layer, the connectivity layer, and the cloud layer.
The physical layer refers to vehicles equipped with network
devices and a large number of sensors. Running vehicles continuously
collect and process information about the road environment,
vehicle components, and vehicle running status through these
sensors, and then upload it to the connectivity layer through the