
CnosDB: A Flexible Distributed Time-Series Database for Large-Scale Data
Yu Yan 1, Bo Zheng 2, Hongzhi Wang 1 (✉), Jinkai Zhang 1, and Yutong Wang 1
1 Harbin Institute of Technology, Harbin, China
{yuyan,wangzh}@hit.edu.cn, harbour.zheng@cnosdb.com
2 Cnosdb Inc., Beijing, China
Abstract. With the development of the Internet of Things, the volume of
time series data generated by industrial monitors, analyzers, and detection
instruments has surged, and managing very large-scale time series data
poses great challenges. However, current distributed time series databases
still perform poorly in terms of storage efficiency and write speed. To
achieve fast writing and highly efficient storage of billions or even tens
of billions of data points, we propose CnosDB, a cloud-native distributed
time series database. Our system integrates various data compression
algorithms to achieve a high compression rate for each data type. We also
propose a three-layer storage policy that achieves fast writes while still
ensuring rapid time-based batch operations. In this paper, we introduce the
architecture and key techniques of CnosDB and describe three key demo
scenarios of our system.
1 Introduction
With the advent of big data, the scale of time series data has surged in
industry, produced by monitors, analyzers, and detection instruments in
sectors such as the electric power industry [10] and the chemical industry.
Industrial data has three typical features. Fast Generation Speed [9]: each
monitoring point can generate a large amount of data every second. Unique
Timestamp [8]: each piece of data has its own unique timestamp. Wide
Collection Range [13]: a conventional real-time monitoring system has
thousands of monitoring points, each of which generates data every second.
Faced with large volumes of real-time time series data, traditional
databases such as MySQL can no longer meet the requirements for massive
data storage and management, and various types of time series databases have
emerged.
Recently, several time series database management systems have been
developed to manage large-scale time series data efficiently. Early systems
used other databases as the backend and built middleware for time series
data management on top of them, such as TimescaleDB [3], [6]. Without their
own storage engines, middleware-based methods cannot effectively
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
X. Wang et al. (Eds.): DASFAA 2023, LNCS 13946, pp. 696–700, 2023.
https://doi.org/10.1007/978-3-031-30678-5_58