
TimeCSL: Unsupervised Contrastive Learning of General
Shapelets for Explorable Time Series Analysis
Zhiyu Liang
Harbin Institute of Technology
Harbin, China
zyliang@hit.edu.cn
Chen Liang
Harbin Institute of Technology
Harbin, China
23B903050@stu.hit.edu.cn
Zheng Liang
Harbin Institute of Technology
Harbin, China
lz20@hit.edu.cn
Hongzhi Wang∗
Harbin Institute of Technology
Harbin, China
wangzh@hit.edu.cn
Bo Zheng
CnosDB Inc.
Beijing, China
harbour.zheng@cnosdb.com
ABSTRACT
Unsupervised (a.k.a. self-supervised) representation learning (URL) has emerged as a new paradigm for time series analysis, because it can learn generalizable time series representations that benefit many downstream tasks without using labels, which are usually difficult to obtain. Considering that existing approaches have limitations in the design of the representation encoder and the learning objective, we have proposed Contrastive Shapelet Learning (CSL), the first URL method that learns a general-purpose shapelet-based representation through unsupervised contrastive learning, and shown its superior performance in several analysis tasks, such as time series classification, clustering, and anomaly detection. In this paper, we develop TimeCSL, an end-to-end system that makes full use of the general and interpretable shapelets learned by CSL to achieve explorable time series analysis in a unified pipeline. We introduce the system components and demonstrate how users interact with TimeCSL to solve different analysis tasks in the unified pipeline, and gain insight into their time series by exploring the learned shapelets and representation.
PVLDB Reference Format:
Zhiyu Liang, Chen Liang, Zheng Liang, Hongzhi Wang, and Bo Zheng.
TimeCSL: Unsupervised Contrastive Learning of General Shapelets for
Explorable Time Series Analysis. PVLDB, 17(12): 4489-4492, 2024.
doi:10.14778/3685800.3685907
PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at
https://github.com/real2sh/CSL.
1 INTRODUCTION
A time series is one (a.k.a. univariate) or a group (a.k.a. multivariate) of variables observed over time. By discovering dependencies among the time-evolving variables, time series analysis can offer important insights into the underlying phenomena, which is useful for real-world applications in various scenarios, such as manufacturing [6], medicine [11], and finance [2].

∗Corresponding author.

This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 17, No. 12 ISSN 2150-8097.
doi:10.14778/3685800.3685907
A major challenge in modeling time series is the lack of labels, because the underlying states required for labeling these time-dependent data are difficult to understand, even for domain specialists [15]. For this reason, recent studies focus on unsupervised (a.k.a. self-supervised) representation learning (URL) of time series [3, 4, 15, 20, 21], which aims to train a neural network (called the encoder) without accessing the ground-truth labels to embed the data into feature vectors. The learned features (a.k.a. representation) can then be used to train models for a downstream analysis task, using little annotated data compared to traditional supervised methods [21]. Moreover, the features are general-purpose, since they can benefit several tasks.
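To make this workflow concrete, the following minimal sketch (not taken from the CSL code base) illustrates the URL pipeline under simplifying assumptions: the encode function below is a trivial, hypothetical stand-in for a learned encoder, and the data are random placeholders. The point is only the two-stage structure: unsupervised feature extraction, followed by a cheap supervised model on a small labeled subset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def encode(batch):
    """Stand-in encoder: maps (n_series, n_vars, length) to feature vectors.
    A real URL method would learn this mapping from unlabeled data,
    e.g. with a contrastive objective; here we use summary statistics."""
    return np.concatenate(
        [batch.mean(-1), batch.std(-1), batch.min(-1), batch.max(-1)], axis=1)

rng = np.random.default_rng(0)
unlabeled = rng.normal(size=(1000, 3, 128))   # abundant unlabeled series
labeled_x = rng.normal(size=(20, 3, 128))     # only a few labeled examples
labeled_y = rng.integers(0, 2, size=20)

# Step 1 (unsupervised): fit the encoder on `unlabeled`; skipped for the
# statistical stand-in above.
# Step 2 (supervised, cheap): train a small downstream model on the features.
clf = LogisticRegression(max_iter=1000).fit(encode(labeled_x), labeled_y)
print(clf.score(encode(labeled_x), labeled_y))
```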
Unfortunately, existing URL approaches for time series have two limitations. First, these methods build their representation encoders on the convolutional neural network (CNN) [5, 16] and the Transformer [18]. However, these architectures were originally designed for domains such as computer vision and natural language processing, and have been shown to face many difficulties in modeling time series, because they lack the capability to deal with the characteristics specific to time series [14, 23]. Second, some existing approaches rely on domain-specific assumptions. For example, Franceschi et al. [4] and Tonekaboni et al. [15] assume that subseries distant in time are dissimilar, but this assumption is easily violated in periodic time series [13]. As a result, these methods do not generalize well to different scenarios.
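As a toy illustration (ours, not from the cited papers) of why this assumption fails on periodic data: two windows of a sine wave that are far apart in time but separated by a whole number of periods are essentially identical, so "distant in time" does not imply "dissimilar".

```python
import numpy as np

t = np.arange(2000)
series = np.sin(2 * np.pi * t / 50)        # purely periodic, period = 50 samples

a, b = series[0:100], series[10:110]       # close in time, phase-shifted
c, d = series[0:100], series[1500:1600]    # far apart, exactly 30 periods later

print(np.linalg.norm(a - b))   # clearly non-zero: nearby windows differ
print(np.linalg.norm(c - d))   # ~0: distant windows coincide, violating the assumption
```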
To address the above issues, we have proposed Contrastive Shapelet Learning (CSL) [10], a brand-new unsupervised representation learning framework for multivariate (and also univariate) time series. To the best of our knowledge, CSL is the first general-purpose URL method based on shapelets, interpretable patterns specifically designed for time series that represent discriminative subsequences. Unlike traditional approaches that learn shapelets for a specific analysis task, such as time series classification [8, 9, 19] or clustering [22], the shapelets of the proposed CSL are learned with unsupervised contrastive learning, which has shown superiority in many downstream analysis tasks [10], including classification, clustering, segment-level anomaly detection, and long time series representation. We summarize the performance of CSL against the competing methods in Figure 1, which is extensively