TimeCSL: Unsupervised Contrastive Learning of General
Shapelets for Explorable Time Series Analysis
Zhiyu Liang
Harbin Institute of Technology
Harbin, China
zyliang@hit.edu.cn
Chen Liang
Harbin Institute of Technology
Harbin, China
23B903050@stu.hit.edu.cn
Zheng Liang
Harbin Institute of Technology
Harbin, China
lz20@hit.edu.cn
Hongzhi Wang
Harbin Institute of Technology
Harbin, China
wangzh@hit.edu.cn
Bo Zheng
CnosDB Inc.
Beijing, China
harbour.zheng@cnosdb.com
ABSTRACT
Unsupervised (a.k.a. self-supervised) representation learning (URL)
has emerged as a new paradigm for time series analysis, because
it has the ability to learn generalizable time series representations
beneficial for many downstream tasks, without using labels that
are usually difficult to obtain. Considering that existing approaches
have limitations in the design of the representation encoder and the
learning objective, we have proposed Contrastive Shapelet Learning
(CSL), the first URL method that learns the general-purpose
shapelet-based representation through unsupervised contrastive
learning, and shown its superior performance in several analysis
tasks, such as time series classification, clustering, and anomaly
detection. In this paper, we develop TimeCSL, an end-to-end
system that makes full use of the general and interpretable shapelets
learned by CSL to achieve explorable time series analysis in a unified
pipeline. We introduce the system components and demonstrate
how users interact with TimeCSL to solve different analysis tasks
in the unified pipeline, and gain insight into their time series by
exploring the learned shapelets and representation.
PVLDB Reference Format:
Zhiyu Liang, Chen Liang, Zheng Liang, Hongzhi Wang, and Bo Zheng.
TimeCSL: Unsupervised Contrastive Learning of General Shapelets for
Explorable Time Series Analysis. PVLDB, 17(12): 4489-4492, 2024.
doi:10.14778/3685800.3685907
PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at
https://github.com/real2sh/CSL.
1 INTRODUCTION
A time series is one (a.k.a. univariate) or a group (multivariate) of
variables observed over time. Time series analysis, by discovering
dependencies of the time-evolving variables, can offer important
insights into the underlying phenomena, which is useful for
real-world applications in various scenarios, such as manufacturing [6],
medicine [11] and finance [2].

*Corresponding author.

This work is licensed under the Creative Commons BY-NC-ND 4.0 International
License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of
this license. For any use beyond those covered by this license, obtain permission by
emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights
licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 17, No. 12 ISSN 2150-8097.
doi:10.14778/3685800.3685907
A major challenge in modeling time series is the lack of labels,
because the underlying states required for labeling these
time-dependent data are difficult to understand, even for domain
specialists [15]. For this reason, recent studies focus on unsupervised
(a.k.a. self-supervised) representation learning (URL) of time
series [3, 4, 15, 20, 21], which aims to train a neural network (called
encoder) without accessing the ground-truth labels to embed the
data into feature vectors. The learned features (a.k.a. representation)
can then be used for training models to solve a downstream analysis
task, using little annotated data compared to traditional supervised
methods [21]. Moreover, the features are more general-purpose
since they can benefit several tasks.
Unfortunately, existing URL approaches for time series have
two limitations. First, these methods focus on representation
encoders based on the convolutional neural network (CNN) [5, 16]
and the Transformer [18]. However, these architectures were
originally designed for domains such as computer vision and natural
language processing, and have been shown to face many difficulties
in modeling time series, due to the lack of capability to deal with
the characteristics specific to time series [14, 23]. Second, some
existing approaches are based on domain-specific assumptions. For
example, Franceschi et al. [4] and Tonekaboni et al. [15] assume that
subseries distant in time are dissimilar, but this assumption is easily
violated in periodic time series [13]. As a result, these methods
cannot be well generalized to different scenarios.
To address the above issues, we have proposed Contrastive
Shapelet Learning (CSL) [10], a brand new unsupervised
representation learning framework for multivariate (and also univariate)
time series. To the best of our knowledge, CSL is the first
general-purpose URL method based on the shapelet, an interpretable
pattern specifically designed for time series that represents a
discriminative subsequence. Unlike traditional approaches that learn
shapelets for a specific analysis task, such as time series
classification [8, 9, 19] or clustering [22], the shapelets of the proposed
CSL are learned with unsupervised contrastive learning, which has
shown superiority in many downstream analysis tasks [10],
including classification, clustering, segment-level anomaly detection,
and long time series representation. We summarize the performance
of CSL against the competing methods in Figure 1, which is extensively
Figure 1: Overall performance of CSL against the competitors
(TS2Vec, T-Loss, TNC, TS-TCC, TST; smaller is better) regarding
classification (mean ranking of accuracy), clustering (mean ranking
of RI and NMI), anomaly detection (mean ranking of F1-score),
long sequence representation (mean ranking of accuracy) and
training efficiency (mean ranking of training time per epoch).
See §5.2, §5.7 and §5.8 in [10] for the details.
evaluated using 38 datasets from various real-world scenarios [10].
The results can also be reproduced using the UniTS system [7].
In this paper, we demonstrate TimeCSL, a novel system that
makes full use of CSL to achieve explorable time series analysis
for various tasks in a unified process. TimeCSL includes an
end-to-end unified pipeline that first learns the general shapelets of
multiple scales and (dis)similarity metrics without any annotation
by running the CSL algorithm [10]. Then, it addresses different time
series analysis tasks by building arbitrary task-oriented analyzers
(e.g., SVM for classification and K-Means for clustering) on top
of the general-purpose shapelet-based features. The pipeline has
shown superior performance compared to that of complex
task-specific approaches, and significantly outperforms traditional
supervised methods when there are few available labels. We refer
interested readers to our research paper [10] for more details.

TimeCSL provides flexible and intuitive visual exploration of
the raw time series, the learned shapelets, and the shapelet-based
time series representation, offering a useful tool for interpreting the
analysis results. Users can experiment with the system using their
own data to explore the learned shapelet-based features, which
are usually more insightful and intuitive to understand than the
complex raw time series. This "explorable" analysis can help to
explain the decisions made by the task-oriented analyzer.
2 THE TIMECSL SYSTEM
As depicted in Figure 2, TimeCSL comprises two components:
Unsupervised Contrastive Shapelet Learning and Explorable
Time Series Analysis. These components work as follows.
2.1 Unsupervised Contrastive Shapelet Learning
The goal of this component is to learn general shapelets from
the training time series, which are used to transform the raw time
series into the shapelet-based representation that facilitates
different downstream analysis tasks. This is achieved using our
proposed CSL method [10].
Figure 2: The TimeCSL pipeline. A (uni- or multi-variate) time
series dataset is fed into Unsupervised Contrastive Shapelet
Learning (CSL), which produces the general shapelets; Explorable
Time Series Analysis then applies a task-oriented analyzer (e.g.
SVM / K-Means) to produce the analysis results (e.g. classes /
clusters), and offers visual exploration of the shapelets and
representations.
Given a dataset containing N time series, X = {x_1, x_2, ..., x_N} ∈
R^(N×D×T), where each time series x_i ∈ R^(D×T) has D variables
(D ≥ 1) and T observations ordered by time, CSL embeds x_i into
the shapelet-based representation z_i ∈ R^(D_repr) using the
proposed Shapelet Transformer f, i.e. z_i = f(x_i), where f contains
the learnable shapelets of various (dis)similarity measures and
lengths (a.k.a. scales). CSL learns f using an unsupervised
contrastive learning algorithm, which iteratively optimizes the proposed
Multi-Grained Contrasting and Multi-Scale Alignment objectives in
an end-to-end manner with stochastic gradient descent.
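The exact Multi-Grained Contrasting and Multi-Scale Alignment objectives are defined in [10] and are not reproduced here. As a rough, simplified illustration of the contrastive principle they build on, the sketch below computes a generic InfoNCE-style loss over two augmented views of a batch of representations; the function name, shapes, and temperature are our own illustrative assumptions, not the paper's API.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Generic InfoNCE contrastive loss between two views (illustrative).

    z1, z2 : (batch, dim) representations of two views of the same series.
    Matching rows are positive pairs; all other rows act as negatives.
    """
    # L2-normalize so dot products become cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal; minimize their negative log-likelihood
    return -np.mean(np.diag(log_prob))
```

Minimizing such a loss pulls the two views of the same series together in representation space while pushing different series apart, which is what allows the shapelets to be learned without any labels.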
Using the Shapelet Transformer f (i.e. all the shapelets) learned
by CSL, the TimeCSL system transforms all input time series into
the shapelet-based features as z_i = f(x_i), and performs the
downstream analysis tasks on top of the representation z_i. It is
noteworthy that z_i represents the (dis)similarity (e.g., the minimum
Euclidean norm or the maximum cosine similarity) between the
subsequences of x_i and each of the shapelets, and therefore is fully
interpretable and explainable.
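To picture what such a feature vector looks like, the following minimal sketch computes the minimum-Euclidean-distance variant for a univariate series. This is our own simplified illustration, not the system's actual Shapelet Transformer, which learns the shapelets, handles multivariate input, and supports several (dis)similarity measures and scales.

```python
import numpy as np

def shapelet_features(x, shapelets):
    """Minimum-Euclidean-distance shapelet transform (simplified sketch).

    x         : (T,) univariate time series.
    shapelets : list of 1-D arrays, possibly of different lengths (scales).
    Returns z : (len(shapelets),) feature vector, where z[k] is the
                smallest distance between shapelet k and any subsequence
                of x of the same length.
    """
    z = np.empty(len(shapelets))
    for k, s in enumerate(shapelets):
        L = len(s)
        # slide the shapelet over every length-L subsequence of x
        dists = [np.linalg.norm(x[i:i + L] - s) for i in range(len(x) - L + 1)]
        z[k] = min(dists)
    return z
```

Each coordinate of z is directly traceable to one shapelet and one location in x (the best-matching subsequence), which is the source of the interpretability discussed above.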
2.2 Explorable Time Series Analysis
By making full use of the general-purpose and explainable
shapelet-based features learned by CSL, this component not only offers a
unified and flexible way to perform different time series analysis
tasks (e.g. classification, clustering, and anomaly detection), but
also intuitive visual exploration of the raw time series, the learned
shapelets, and the shapelet-based representation, so that users
can gain useful insights into their data and understand the decision
basis of the analysis results (e.g. the predicted classes or clusters).
Task solving. As mentioned above, TimeCSL solves all the different
time series analysis tasks using the shapelet-based representation
learned by CSL. This is achieved by building a task-oriented
analyzer (e.g., SVM for classification or K-Means for clustering) that
takes the shapelet-based feature vector z_i as input and outputs the
corresponding analysis results (e.g., classes or clusters). TimeCSL
provides two modes to build the analyzer, the freezing mode and
the fine-tuning mode, which differ in whether to fine-tune the
parameters of the Shapelet Transformer f that is pre-trained by
the Unsupervised Contrastive Shapelet Learning component.
Freezing mode. In this basic mode, the task-oriented analyzer
is built by directly using the pre-trained Shapelet Transformer f
to extract the general-purpose shapelet-based features, while the
parameters of f are kept frozen throughout. Therefore, any
standard analyzer (e.g. popular classifiers such as SVM, logistic
regression, GBDT, etc.) can be seamlessly integrated to facilitate
the users' different application scenarios.
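The freezing mode can be sketched as follows. The frozen feature extractor here is a mock stand-in (a fixed random projection) for the pre-trained Shapelet Transformer, purely so the pipeline shape is visible; the data and shapes are likewise our own toy assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Mock stand-in for the pre-trained Shapelet Transformer f: in TimeCSL,
# f is a network learned by CSL; here we use a fixed random projection.
W = rng.normal(size=(50, 8))           # frozen parameters

def f(x):
    """Frozen feature extractor: raw series (T=50,) -> representation (8,)."""
    return x @ W                        # W is never updated in freezing mode

# Toy labeled dataset: class 0 = flat noise, class 1 = rising trend + noise
X_raw = np.vstack([rng.normal(size=(20, 50)),
                   np.linspace(0, 5, 50) + rng.normal(size=(20, 50))])
y = np.array([0] * 20 + [1] * 20)

Z = np.array([f(x) for x in X_raw])    # extract features without fine-tuning
clf = SVC().fit(Z, y)                  # any standard analyzer plugs in here
```

Because f never changes, swapping the SVM for logistic regression, GBDT, or K-Means requires no retraining of the representation, which is what makes this mode flexible.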
Fine-tuning mode. This is an advanced mode that allows users
to fine-tune the parameters of the pre-trained Shapelet Transformer