A Shapelet-based Framework for Unsupervised Multivariate Time Series Representation Learning
Zhiyu Liang
Harbin Institute of Technology
Harbin, China
zyliang@hit.edu.cn
Jianfeng Zhang
Huawei Noah’s Ark Lab
Shenzhen, China
zhangjianfeng3@huawei.com
Chen Liang
Harbin Institute of Technology
Harbin, China
1190201818@stu.hit.edu.cn
Hongzhi Wang
Harbin Institute of Technology
Harbin, China
wangzh@hit.edu.cn
Zheng Liang
Harbin Institute of Technology
Harbin, China
lz20@hit.edu.cn
Lujia Pan
Huawei Noah’s Ark Lab
Shenzhen, China
panlujia@huawei.com
ABSTRACT
Recent studies have shown great promise in unsupervised representation learning (URL) for multivariate time series, because URL has the capability of learning generalizable representations for many downstream tasks without using inaccessible labels. However, existing approaches usually adopt models originally designed for other domains (e.g., computer vision) to encode the time series data, and rely on strong assumptions to design learning objectives, which limits their ability to perform well. To deal with these problems, we propose a novel URL framework for multivariate time series that learns time-series-specific shapelet-based representations through a popular contrastive learning paradigm. To the best of our knowledge, this is the first work that explores shapelet-based embedding for unsupervised general-purpose representation learning. A unified shapelet-based encoder and a novel learning objective with multi-grained contrasting and multi-scale alignment are particularly designed to achieve our goal, and a data augmentation library is employed to improve generalization. We conduct extensive experiments using tens of real-world datasets to assess the representation quality on many downstream tasks, including classification, clustering, and anomaly detection. The results demonstrate the superiority of our method not only over URL competitors, but also over techniques specially designed for the downstream tasks. Our code has been made publicly available at https://github.com/real2sh/CSL.
PVLDB Reference Format:
Zhiyu Liang, Jianfeng Zhang, Chen Liang, Hongzhi Wang, Zheng Liang,
and Lujia Pan. A Shapelet-based Framework for Unsupervised Multivariate
Time Series Representation Learning. PVLDB, 17(3): 386-399, 2023.
doi:10.14778/3632093.3632103
PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at
https://github.com/real2sh/CSL.
Corresponding author.
This work is licensed under the Creative Commons BY-NC-ND 4.0 International
License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of
this license. For any use beyond those covered by this license, obtain permission by
emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights
licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 17, No. 3 ISSN 2150-8097.
doi:10.14778/3632093.3632103
1 INTRODUCTION
Multivariate time series (MTS) generally describe a group of dependent variables evolving over time, each of which represents a monitoring metric (e.g., temperature or CPU utilization) of an entity (e.g., a system or service). MTS data play a vital role in many practical scenarios, such as manufacturing, medicine, and finance [4, 24, 42].
While MTS are being increasingly collected from various applications, a particular challenge in modeling them is the lack of labels. Unlike images or text, which usually contain human-recognizable patterns, label acquisition for time series is much more difficult, because the underlying state of these time-evolving signals can be too complicated even for domain specialists [46]. For this reason, it has recently become a research focus to explore unsupervised (a.k.a. self-supervised) representation learning (URL) for MTS [10, 56, 59, 60]. URL aims to train a neural network (called an encoder) without accessing the labels to embed the data into feature vectors, using carefully designed learning objectives that leverage the inherent structure of the raw data. The learned representations (a.k.a. features or embeddings) can then be used to train models that solve a downstream analysis task with little annotated data compared to traditional supervised methods [60]. Moreover, the features are more general-purpose, since they can facilitate several tasks.
Unfortunately, unlike in domains such as computer vision (CV) [8, 27, 29] or natural language processing (NLP) [12, 65], URL in the context of time series is still under-explored. MTS are typically continuous-valued data with high noise, diverse temporal patterns, and varying semantic meanings [3]. These unique complexities make it difficult for advanced URL methods from the aforementioned domains to perform well [10, 46]. Although several studies have attempted to fill this gap by considering the characteristics of time series, such as their time-evolving nature [46] and multi-scale semantics [11, 59], existing approaches can still be weak at learning well-performing representations, partly for the following reasons.
First, the existing representation encoder designs are highly inspired by experience in the CV and NLP domains, which may not be well-suited for MTS. Specifically, convolutional neural networks (CNN) [16, 49] and Transformers [51] are commonly used encoders in recent studies [10, 11, 46, 56, 59, 60]. However, these encoders still face many difficulties when applied to MTS, due to their lack of capability to deal with the characteristics of time series [32, 45, 66].
Second, some existing approaches rely on domain-specific assumptions, such as neighbor similarity [11, 46] and contextual consistency [59], and are thus difficult to generalize to various scenarios. For instance, Franceschi et al. [11] and Tonekaboni et al. [46] assume that subsequences distant in time should be dissimilar, which can easily be violated in periodic time series [43].
To tackle the issues mentioned above, we explore a time-series-specific representation encoder without strong assumptions for URL. In particular, we consider an encoder based on a non-parametric time series analysis concept named the shapelet [58], i.e., a salient subsequence, which is tailored to extract time series features from only the important time windows so as to avoid the noise outside them. The main reason is that shapelet-based representations have shown superior performance in specific tasks such as classification [25, 33, 54] and clustering [61]. Besides, compared to features extracted by other neural networks such as CNNs, shapelet-based features can be more intuitive to understand [58]. However, shapelets have never been explored in the recently rising topic of URL for general-purpose representation. To fill this gap, we take the first step and propose to learn a shapelet-based encoder by employing contrastive learning, a popular paradigm that has shown success in URL [8, 10, 59, 64].
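The basic shapelet feature is conceptually simple: the dissimilarity between a shapelet and a time series is the minimum distance between the shapelet and all sliding windows of the same length. The following minimal sketch illustrates this idea for a univariate series under Euclidean distance (function and variable names are ours for illustration, not the paper's API; CSL generalizes this with multiple measures and learnable shapelets):

```python
import numpy as np

def shapelet_feature(series, shapelet):
    """Minimum Euclidean distance between the shapelet and every
    sliding window of the series with the shapelet's length."""
    n, m = len(series), len(shapelet)
    dists = [
        np.linalg.norm(series[i:i + m] - shapelet)
        for i in range(n - m + 1)
    ]
    return min(dists)

# A shapelet that exactly matches a bump in the series yields distance 0.
series = np.array([0., 0., 1., 2., 1., 0., 0.])
shapelet = np.array([1., 2., 1.])
print(shapelet_feature(series, shapelet))  # 0.0
```

Encoding a sample with a set of shapelets produces one such distance per shapelet, giving a fixed-length feature vector regardless of the series length.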
We highlight three challenges in learning high-quality and general-purpose shapelet-based representations. The first is how to design a shapelet-based encoder that captures diverse temporal patterns over various time ranges, considering that the shapelet was originally proposed to represent only a single shape feature, and that exhaustive search or prior knowledge is needed to determine the encoding scale [5, 25, 61]. The second is how to design a URL objective that learns general information for downstream tasks through this shapelet-based encoder, which has never been studied. Last, while contrastive learning leverages the representation similarity of the augmentations of one sample [8] to learn the encoder, it remains an open problem how to properly augment a time series while preserving that similarity [46, 59].
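Augmentations commonly used for time series contrastive learning include jittering (additive noise) and scaling (per-variable amplitude change). The sketch below shows these two transforms for an MTS sample of shape (timestamps, variables); it is illustrative only, and the paper's augmentation library may contain different transforms and parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter(x, sigma=0.03):
    """Add independent Gaussian noise to every timestamp and variable."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def scaling(x, sigma=0.1):
    """Multiply each variable by a random factor drawn near 1."""
    factors = rng.normal(1.0, sigma, size=(1, x.shape[1]))
    return x * factors

# Two augmented "views" of the same sample, used as a positive pair.
x = rng.standard_normal((100, 3))
view1, view2 = jitter(x), scaling(x)
```

The open problem noted above is that an augmentation must perturb the sample enough to be informative, yet not so much that the two views stop sharing the semantics the encoder should capture.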
To cope with these challenges, we propose a novel unsupervised MTS representation learning framework named Contrastive Shapelet Learning (CSL). Specifically, we design a unified architecture that uses multiple shapelets with various (dis)similarity measures and lengths to jointly encode a sample, so as to capture diverse temporal patterns from short to long term. As shapelets of different lengths separately embed one sample into different, mutually complementary representation spaces, we propose a multi-grained contrasting objective that simultaneously considers the joint embedding and the representations at each time scale. In parallel, we design a multi-scale alignment loss to encourage the representations of different scales to reach consensus. The basic idea is to automatically capture the varying semantics by leveraging the intra-scale and inter-scale dependencies of the shapelet-based embedding. Besides, we develop an augmentation library using diverse types of data augmentation methods to further improve the representation quality. To the best of our knowledge, CSL is the first general-purpose URL framework based on shapelets.
The main contributions are summarized as follows:
• This paper studies how to improve URL performance using a time-series-specific shapelet-based representation, which has achieved success in specific tasks but has never been explored for general-purpose URL.
• A novel framework is proposed that adopts contrastive learning to learn shapelet-based representations. A unified shapelet-based encoder architecture and a learning objective with multi-grained contrasting and multi-scale alignment are particularly designed to capture diverse patterns over various time ranges. A library containing various types of data augmentation methods is constructed to improve the representation quality.
• Experiments on tens of real-world datasets from various domains show that i) our learned representations generalize to many downstream tasks, such as classification, clustering, and anomaly detection; ii) the proposed method outperforms existing URL competitors and is comparable to (or even better than) techniques tailored for classification and clustering. Additionally, we study the effectiveness of the key components proposed in CSL and the model's sensitivity to the key parameters, demonstrate the superiority of CSL over fully-supervised competitors on partially labeled data, and explain the shapelets learned by CSL. We also study our method on long time series representation and assess its running time.
2 RELATED WORK
There are two lines of research closely related to this paper:
Unsupervised MTS representation learning. Unlike in domains such as CV [8, 27, 29, 55] and NLP [12, 65], the study of URL for time series is still in its infancy.
Inspired by word representation [36], Franceschi et al. [11] adapt the triplet loss to time series to achieve URL. Similarly, Zerveas et al. [60] explore the utility of the Transformer [51] for URL, motivated by its success in modeling natural language. Oord et al. [39] propose to learn the representation by predicting the future in latent space. Eldele et al. [10] extend this idea by conducting both temporal and contextual contrasting to improve the representation quality. Instead of using prediction, Yue et al. [59] combine timestamp-level contrasting with contextual contrasting to achieve hierarchical representation. Tonekaboni et al. [46] assume consistency between overlapping temporal neighborhoods to model dynamic latent states, while Yang and Hong [56] utilize the consistency between the temporal and spectral domains to enrich the representation. Although these methods have achieved improvements in representation quality, they still have limitations, such as the lack of intuition in encoder design and the dependency on specific assumptions, as discussed in Section 1.
Time series shapelet. The concept of the shapelet was first proposed by Ye and Keogh [58] for supervised time series classification. It focuses on extracting features from a notable time range to reduce the interference of noise, which is prevalent in time series.
In early studies, shapelets were selected by enumerating subsequences of the training time series [5, 17, 38, 58], which suffers from non-optimal representation and high computational overhead [13]. To address these problems, a shapelet learning method was first proposed by Grabocka et al. [13], which directly learns the optimal shapelets through a supervised objective. Following this study, many approaches [30, 33, 34, 54] have been proposed to improve the effectiveness and efficiency of classification. Beyond the supervised classification task, some works [47, 61, 62] employ shapelets for time series clustering and also show competitive performance.