TimeCSL: Unsupervised Contrastive Learning of General
Shapelets for Explorable Time Series Analysis
Zhiyu Liang
Harbin Institute of Technology
Harbin, China
zyliang@hit.edu.cn
Chen Liang
Harbin Institute of Technology
Harbin, China
23B903050@stu.hit.edu.cn
Zheng Liang
Harbin Institute of Technology
Harbin, China
lz20@hit.edu.cn
Hongzhi Wang
Harbin Institute of Technology
Harbin, China
wangzh@hit.edu.cn
Bo Zheng
CnosDB Inc.
Beijing, China
harbour.zheng@cnosdb.com
ABSTRACT
Unsupervised (a.k.a. self-supervised) representation learning (URL)
has emerged as a new paradigm for time series analysis, because
it has the ability to learn generalizable time series representations
beneficial for many downstream tasks, without using labels that
are usually difficult to obtain. Considering that existing approaches
have limitations in the design of the representation encoder and the
learning objective, we have proposed Contrastive Shapelet Learning
(CSL), the first URL method that learns the general-purpose
shapelet-based representation through unsupervised contrastive
learning, and shown its superior performance in several analysis
tasks, such as time series classification, clustering, and anomaly
detection. In this paper, we develop TimeCSL, an end-to-end
system that makes full use of the general and interpretable shapelets
learned by CSL to achieve explorable time series analysis in a unified
pipeline. We introduce the system components and demonstrate
how users interact with TimeCSL to solve different analysis tasks
in the unified pipeline, and gain insight into their time series by
exploring the learned shapelets and representation.
PVLDB Reference Format:
Zhiyu Liang, Chen Liang, Zheng Liang, Hongzhi Wang, and Bo Zheng.
TimeCSL: Unsupervised Contrastive Learning of General Shapelets for
Explorable Time Series Analysis. PVLDB, 17(12): 4489-4492, 2024.
doi:10.14778/3685800.3685907
PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at
https://github.com/real2sh/CSL.
1 INTRODUCTION
A time series is one (a.k.a. univariate) or a group (multivariate) of
variables observed over time. Time series analysis, by discovering
dependencies of the time-evolving variables, can offer important
insights into the underlying phenomena, which is useful for
real-world applications in various scenarios, such as manufacturing [6],
medicine [11] and finance [2].

*Corresponding author.

This work is licensed under the Creative Commons BY-NC-ND 4.0 International
License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of
this license. For any use beyond those covered by this license, obtain permission by
emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights
licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 17, No. 12 ISSN 2150-8097.
doi:10.14778/3685800.3685907
A major challenge in modeling time series is the lack of labels,
because the underlying states required for labeling these
time-dependent data are difficult to understand, even for domain
specialists [15]. For this reason, recent studies focus on unsupervised
(a.k.a. self-supervised) representation learning (URL) of time
series [3, 4, 15, 20, 21], which aims to train a neural network (called
encoder) without accessing the ground-truth labels to embed the
data into feature vectors. The learned features (a.k.a. representation)
can then be used for training models to solve a downstream analysis
task, using little annotated data compared to traditional supervised
methods [21]. Moreover, the features are more general-purpose
since they can benefit several tasks.
Unfortunately, existing URL approaches for time series have
two limitations. First, these methods focus on representation
encoders based on the convolutional neural network (CNN) [5, 16]
and the Transformer [18]. However, these architectures were
originally designed for domains such as computer vision and natural
language processing, and have been shown to face many difficulties
in modeling time series, due to the lack of capability to deal with
the characteristics specific to time series [14, 23]. Second, some
existing approaches are based on domain-specific assumptions. For
example, Franceschi et al. [4] and Tonekaboni et al. [15] assume that
subseries distant in time are dissimilar, but this assumption is easily
violated in periodic time series [13]. As a result, these methods
cannot be well generalized to different scenarios.
To address the above issues, we have proposed Contrastive
Shapelet Learning (CSL) [10], a brand new unsupervised
representation learning framework for multivariate (and also univariate)
time series. To the best of our knowledge, CSL is the first
general-purpose URL method based on the shapelet, an interpretable
pattern specifically designed for time series that represents a
discriminative subsequence. Unlike traditional approaches that learn
shapelets for a specific analysis task, such as time series
classification [8, 9, 19] or clustering [22], the shapelets of the proposed
CSL are learned with unsupervised contrastive learning, which has
shown superiority in many downstream analysis tasks [10],
including classification, clustering, segment-level anomaly detection,
and long time series representation. We summarize the performance
of CSL against the competing methods in Figure 1, which is extensively
Figure 1: Overall performance of CSL against the competitors
(TS2Vec, T-Loss, TNC, TS-TCC, TST; smaller is better) regarding
classification (mean ranking of accuracy), clustering (mean ranking
of RI and NMI), anomaly detection (mean ranking of F1-score),
long sequence representation (mean ranking of accuracy) and
training efficiency (mean ranking of training time per epoch).
See §5.2, §5.7 and §5.8 in [10] for the details.
evaluated using 38 datasets from various real-world scenarios [10].
The results can also be reproduced using the UniTS system [7].
In this paper, we demonstrate TimeCSL, a novel system that
makes full use of CSL to achieve explorable time series analysis
for various tasks in a unified process. TimeCSL includes an
end-to-end unified pipeline that first learns the general shapelets of
multiple scales and (dis)similarity metrics without any annotation
by running the CSL algorithm [10]. Then, it addresses different time
series analysis tasks by building arbitrary task-oriented analyzers
(e.g., SVM for classification and K-Means for clustering) on top
of the general-purpose shapelet-based features. The pipeline has
shown superior performance compared to that of complex
task-specific approaches, and significantly outperforms traditional
supervised methods when there are few available labels. We refer
interested readers to our research paper [10] for more details.

TimeCSL provides flexible and intuitive visual exploration of
the raw time series, the learned shapelets, and the shapelet-based
time series representation, offering a useful tool for interpreting the
analysis results. Users can experiment with the system using their
own data to explore the learned shapelet-based features, which
are usually more insightful and intuitive to understand than the
complex raw time series. This "explorable" analysis can help to
explain the decisions made by the task-oriented analyzer.
2 THE TIMECSL SYSTEM
As depicted in Figure 2, TimeCSL comprises two components:
Unsupervised Contrastive Shapelet Learning and Explorable
Time Series Analysis. These components work as follows.
2.1 Unsupervised Contrastive Shapelet Learning
The goal of this component is to learn general shapelets from
the training time series, which are used to transform the raw time
series into the shapelet-based representation that facilitates
different downstream analysis tasks. This is achieved using our
proposed CSL method [10].
Figure 2: The TimeCSL pipeline. A (uni- or multi-variate) time
series dataset is fed into Unsupervised Contrastive Shapelet
Learning (CSL), which produces the general shapelets; Explorable
Time Series Analysis then applies a task-oriented analyzer (e.g.
SVM / K-Means) to produce the analysis results (e.g. classes /
clusters), and offers visual exploration of the shapelets and
representations.
Given a dataset containing N time series, X = {x_1, x_2, ..., x_N} ∈
R^(N×D×T), where each time series x_i ∈ R^(D×T) has D variables
(D ≥ 1) and T observations ordered by time, CSL embeds x_i into
the shapelet-based representation z_i ∈ R^(D_repr) using the
proposed Shapelet Transformer f, i.e. z_i = f(x_i), where f contains
the learnable shapelets of various (dis)similarity measures and
lengths (a.k.a. scales). CSL learns f using an unsupervised
contrastive learning algorithm, which iteratively optimizes the proposed
Multi-Grained Contrasting and Multi-Scale Alignment objectives in
an end-to-end manner with stochastic gradient descent.
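The exact Multi-Grained Contrasting and Multi-Scale Alignment objectives are defined in [10] and are not reproduced here. As a rough, simplified illustration of the contrastive principle they build on, the sketch below computes a generic InfoNCE-style loss over two augmented views of a batch of representations; the function name, shapes, and temperature are our own illustrative assumptions, not the paper's API.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Generic InfoNCE contrastive loss between two views (illustrative).

    z1, z2 : (batch, dim) representations of two views of the same series.
    Matching rows are positive pairs; all other rows act as negatives.
    """
    # L2-normalize so dot products become cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal; minimize their negative log-likelihood
    return -np.mean(np.diag(log_prob))
```

Minimizing such a loss pulls the two views of the same series together in representation space while pushing different series apart, which is what allows the shapelets to be learned without any labels.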
Using the Shapelet Transformer f (i.e. all the shapelets) learned
by CSL, the TimeCSL system transforms all input time series into
the shapelet-based features as z_i = f(x_i), and performs the
downstream analysis tasks on top of the representation z_i. It is
noteworthy that z_i represents the (dis)similarity (e.g., the minimum
Euclidean norm or the maximum cosine similarity) between the
subsequences of x_i and each of the shapelets, and therefore is fully
interpretable and explainable.
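To picture what such a feature vector looks like, the following minimal sketch computes the minimum-Euclidean-distance variant for a univariate series. This is our own simplified illustration, not the system's actual Shapelet Transformer, which learns the shapelets, handles multivariate input, and supports several (dis)similarity measures and scales.

```python
import numpy as np

def shapelet_features(x, shapelets):
    """Minimum-Euclidean-distance shapelet transform (simplified sketch).

    x         : (T,) univariate time series.
    shapelets : list of 1-D arrays, possibly of different lengths (scales).
    Returns z : (len(shapelets),) feature vector, where z[k] is the
                smallest distance between shapelet k and any subsequence
                of x of the same length.
    """
    z = np.empty(len(shapelets))
    for k, s in enumerate(shapelets):
        L = len(s)
        # slide the shapelet over every length-L subsequence of x
        dists = [np.linalg.norm(x[i:i + L] - s) for i in range(len(x) - L + 1)]
        z[k] = min(dists)
    return z
```

Each coordinate of z is directly traceable to one shapelet and one location in x (the best-matching subsequence), which is the source of the interpretability discussed above.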
2.2 Explorable Time Series Analysis
By making full use of the general-purpose and explainable
shapelet-based features learned by CSL, this component not only offers a
unified and flexible way to perform different time series analysis
tasks (e.g. classification, clustering, and anomaly detection), but
also intuitive visual exploration of the raw time series, the learned
shapelets, and the shapelet-based representation, so that users
can gain useful insights into their data and understand the decision
basis of the analysis results (e.g. the predicted classes or clusters).
Task solving. As mentioned above, TimeCSL solves all the different
time series analysis tasks using the shapelet-based representation
learned by CSL. This is achieved by building a task-oriented
analyzer (e.g., SVM for classification or K-Means for clustering) that
takes the shapelet-based feature vector z_i as input and outputs the
corresponding analysis results (e.g., classes or clusters). TimeCSL
provides two modes to build the analyzer, the freezing mode and
the fine-tuning mode, which differ in whether to fine-tune the
parameters of the Shapelet Transformer f that is pre-trained by
the Unsupervised Contrastive Shapelet Learning component.
Freezing mode. In this basic mode, the task-oriented analyzer
is built by directly using the pre-trained Shapelet Transformer f
to extract the general-purpose shapelet-based features, while the
parameters of f are kept frozen throughout. Therefore, any
standard analyzer (e.g. popular classifiers such as SVM, logistic
regression, GBDT, etc.) can be seamlessly integrated to facilitate
the users' different application scenarios.
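The freezing mode can be sketched as follows. The frozen feature extractor here is a mock stand-in (a fixed random projection) for the pre-trained Shapelet Transformer, purely so the pipeline shape is visible; the data and shapes are likewise our own toy assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Mock stand-in for the pre-trained Shapelet Transformer f: in TimeCSL,
# f is a network learned by CSL; here we use a fixed random projection.
W = rng.normal(size=(50, 8))           # frozen parameters

def f(x):
    """Frozen feature extractor: raw series (T=50,) -> representation (8,)."""
    return x @ W                        # W is never updated in freezing mode

# Toy labeled dataset: class 0 = flat noise, class 1 = rising trend + noise
X_raw = np.vstack([rng.normal(size=(20, 50)),
                   np.linspace(0, 5, 50) + rng.normal(size=(20, 50))])
y = np.array([0] * 20 + [1] * 20)

Z = np.array([f(x) for x in X_raw])    # extract features without fine-tuning
clf = SVC().fit(Z, y)                  # any standard analyzer plugs in here
```

Because f never changes, swapping the SVM for logistic regression, GBDT, or K-Means requires no retraining of the representation, which is what makes this mode flexible.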
Fine-tuning mode. This is an advanced mode that allows users
to fine-tune the parameters of the pre-trained Shapelet Transformer