VLDB2024_TFB:Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods_Huawei.pdf
TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods
Xiangfei Qiu, East China Normal University, China
Jilin Hu, East China Normal University, China
Lekui Zhou, Huawei Cloud Algorithm Innovation Lab, China
Xingjian Wu, East China Normal University, China
Junyang Du, East China Normal University, China
Buang Zhang, East China Normal University, China
Chenjuan Guo, East China Normal University, China
Aoying Zhou, East China Normal University, China
Christian S. Jensen, Aalborg University, Denmark
Zhenli Sheng, Huawei Cloud Algorithm Innovation Lab, China
Bin Yang, East China Normal University, China
ABSTRACT
Time series are generated in diverse domains such as economics, traffic, health, and energy, where forecasting of future values has numerous important applications. Not surprisingly, many forecasting methods are being proposed. To ensure progress, it is essential to be able to study and compare such methods empirically in a comprehensive and reliable manner. To achieve this, we propose TFB, an automated benchmark for Time Series Forecasting (TSF) methods. TFB advances the state-of-the-art by addressing shortcomings related to datasets, comparison methods, and evaluation pipelines: 1) insufficient coverage of data domains, 2) stereotype bias against traditional methods, and 3) inconsistent and inflexible pipelines. To achieve better domain coverage, we include datasets from 10 different domains: traffic, electricity, energy, the environment, nature, economics, stock markets, banking, health, and the web. We also provide a time series characterization to ensure that the selected datasets are comprehensive. To remove biases against some methods, we include a diverse range of methods, including statistical learning, machine learning, and deep learning methods, and we also support a variety of evaluation strategies and metrics to ensure more comprehensive evaluations of different methods. To support the integration of different methods into the benchmark and enable fair comparisons, TFB features a flexible and scalable pipeline that eliminates biases. Next, we employ TFB to perform a thorough evaluation of 21 Univariate Time Series Forecasting (UTSF) methods on 8,068 univariate time series and 14 Multivariate Time Series Forecasting (MTSF) methods on 25 datasets. The results offer a deeper understanding of the forecasting methods, allowing us to better select the ones that are most suitable for particular datasets and settings. Overall, TFB and this evaluation provide researchers with improved means of designing new TSF methods.
This work is licensed under the Creative Commons BY-NC-ND 4.0 International
License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of
this license. For any use beyond those covered by this license, obtain permission by
emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights
licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 17, No. 9 ISSN 2150-8097.
doi:10.14778/3665844.3665863
PVLDB Reference Format:
Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang
Zhang, Chenjuan Guo, Aoying Zhou, Christian S. Jensen, Zhenli Sheng and
Bin Yang. TFB: Towards Comprehensive and Fair Benchmarking of Time
Series Forecasting Methods. PVLDB, 17(9): 2363 - 2377, 2024.
doi:10.14778/3665844.3665863
PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at
https://github.com/decisionintelligence/TFB.
1 INTRODUCTION
As part of the ongoing digitalization, time series are generated in a variety of domains, such as economics [36, 75], traffic [30, 33–35, 49, 51, 52, 62, 79, 85, 93, 95], health [44, 61, 83, 88], energy [1, 29], and AIOps [7, 8, 41, 72, 87, 103]. Time Series Forecasting (TSF) is essential in key applications in these domains [28, 67, 94, 97]. Given historical observations, it is valuable to know future values ahead of time. Correspondingly, TSF has been firmly established as an active research field, witnessing the proposal of numerous methods.
Time series organize data points chronologically and are either univariate or multivariate depending on the number of variables in each data point. Accordingly, TSF methods can be classified as either Univariate Time Series Forecasting (UTSF) or Multivariate Time Series Forecasting (MTSF) methods. Among early methods, Autoregressive Integrated Moving Average (ARIMA) [4] and Vector Autoregression (VAR) [82] are arguably the most popular univariate and multivariate forecasting methods, respectively. Subsequent methods that exploit machine learning, e.g., XGBoost [11, 99] and Random Forest [5, 59], offer better performance than the early methods. Most recently, methods based on deep learning have demonstrated state-of-the-art (SOTA) forecasting performance on a variety of datasets [10, 12, 14–16, 50, 60, 64, 70, 89, 91, 92, 96, 101, 102, 104].
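To make the univariate/multivariate distinction and the classic VAR baseline concrete, here is a minimal sketch, not TFB's implementation: a first-order VAR fitted by ordinary least squares with NumPy. The function names `fit_var1` and `forecast_var1` and the restriction to order 1 are illustrative assumptions. With a single variable (N = 1) the same code reduces to a univariate AR(1) model, which is the special case of ARIMA with no differencing and no moving-average term.

```python
import numpy as np


def fit_var1(series):
    """Fit x_t = c + A @ x_{t-1} + noise by ordinary least squares.

    `series` has shape (T, N): T time steps, N variables.
    N = 1 is the univariate case (an AR(1) model).
    Returns the intercept c with shape (N,) and the coefficient matrix A with shape (N, N).
    """
    series = np.asarray(series, dtype=float)
    past, present = series[:-1], series[1:]
    # Prepend a column of ones so the intercept is estimated jointly with A.
    design = np.hstack([np.ones((past.shape[0], 1)), past])
    coef, *_ = np.linalg.lstsq(design, present, rcond=None)
    # Row form: present = c + past @ coef[1:], so in column form A = coef[1:].T.
    return coef[0], coef[1:].T


def forecast_var1(c, A, last, horizon):
    """Roll the fitted model forward `horizon` steps from the last observation."""
    preds, x = [], np.asarray(last, dtype=float)
    for _ in range(horizon):
        x = c + A @ x
        preds.append(x)
    return np.stack(preds)  # shape (horizon, N)


# Usage on simulated data (synthetic, for illustration only).
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.2], [0.0, 0.3]])
c_true = np.array([1.0, -0.5])
x = np.zeros((5000, 2))
for t in range(1, len(x)):
    x[t] = c_true + A_true @ x[t - 1] + rng.normal(size=2)

c_hat, A_hat = fit_var1(x)  # multivariate case: N = 2
preds = forecast_var1(c_hat, A_hat, x[-1], horizon=24)
```

The forecast loop feeds each prediction back in as the next input (an iterated multi-step forecast); `horizon=24` mirrors the horizon used in Table 1.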
As more and more methods are being proposed for dierent
datasets and settings, there is an increasing need for fair and com-
prehensive empirical evaluations. To achieve this, we identify and
address three issues in existing evaluation frameworks, thereby
advancing our evaluation capabilities.
Figure 1: Visualization of data with dierent characteristics.
TFB (ours)
BasicTS+
BasicTS
TSlib &
LTSF-Linear
Datasets
Traffic
Electricity
Energy
Environment
Nature
Economic
Stock
Banking
Health
Web
Figure 2: Statistics of data domains covered by existing mul-
tivariate time series benchmarks.
Figure 3: Box plot of the variations in normalized values of
characteristics across the multivariate datasets in the TFB
and TSlib.
Issue 1. Insucient Coverage of Data Domains. Time series
from dierent domains may exhibit diverse characteristics. Fig-
ure 1a depicts a time series from the environment domain called
AQShunyi [
100
] that records temperature information at hourly
intervals, exhibiting a distinct seasonal pattern. This pattern is rea-
sonable in this scenario because temperatures in nature often cycle
around the year. Figure 1b shows a time series from FRED-MD [
58
]
belongs to economic domain that describes the monthly macroe-
conomic from 114 regional, national, and international sources
with a clear increasing tendency. This may be attributed to overall
Table 1: VAR, LR versus other methods, using MAE as the
evaluation metric and a forecasting horizon of 24 steps.
Datasets VAR LR
PatchTST NLinear FEDformer Crossformer
NASDAQ 0.462
0.616
0.567 0.522 0.547 0.745
Wind 0.620
0.583
0.652 0.640 0.697 0.590
ILI 1.012
4.856
0.835 0.919 1.020 1.096
economic stability with minimal uctuations, reecting sustained
growth in the macroeconomic indicators. Figure 1c depicts a se-
ries among Electricity [
84
] which comes from electricity domain
and has a signicant change in the data at a certain point in time,
which might indicate an abrupt event, etc. However, these sim-
ple patterns are only the tip of the iceberg, and time series from
dierent domains may exhibit much more complex patterns that
either combine the above characteristics or are entirely dierent.
Therefore, using only limited domains results in limited coverage
of time series characteristics, which cannot oer a full picture.
However, few empirical studies and benchmarks cover a wide variety of data domains. Figure 2 summarizes the multivariate data domains used in existing forecasting benchmarks that include MTSF. We observe that TSlib [89], LTSF-Linear [98], BasicTS [48], and BasicTS+ [76] only include around 10 datasets, covering at most 5 domains, and that these datasets are concentrated mainly in two domains, namely traffic and electricity. Since the multivariate time series datasets in TSlib are the most used, we investigate the variations in the values of the characteristics of datasets in TSlib and TFB (see Figure 3). We observe that the TFB datasets exhibit more diverse distributions than those of TSlib across the six characteristics. We argue that it is beneficial to broaden the coverage of domains, thereby enabling a more extensive assessment of method performance.
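Figure 3 compares how much the normalized characteristic values vary across the datasets of TFB and TSlib. The normalization is not specified in this section; a minimal sketch, assuming min-max scaling over the pooled datasets of both benchmarks and using the interquartile range (the box height in a box plot) as the spread measure, could look like the following. The function names are hypothetical.

```python
import numpy as np


def minmax_normalize(values):
    """Scale each column (characteristic) of a (datasets x characteristics) array to [0, 1]."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(axis=0), values.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return (values - lo) / span


def iqr_per_characteristic(normalized):
    """Interquartile range of each column, i.e., the box height in a box plot."""
    q1, q3 = np.percentile(normalized, [25, 75], axis=0)
    return q3 - q1


def compare_spread(bench_a, bench_b):
    """Normalize two benchmarks on a shared scale, then return each one's per-column IQR."""
    a, b = np.asarray(bench_a, dtype=float), np.asarray(bench_b, dtype=float)
    joint = minmax_normalize(np.vstack([a, b]))
    return iqr_per_characteristic(joint[: len(a)]), iqr_per_characteristic(joint[len(a):])
```

Normalizing on the pooled data keeps both benchmarks on one scale, so a taller box for a characteristic means that benchmark's datasets span a wider range of it, which is the sense in which one set of datasets is more diverse than another.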
Issue 2. Stereotype bias against traditional methods. It is difficult for a single method to exhibit the best performance across all datasets; methods exhibit varying performance across different datasets. To illustrate the issue, we conduct experiments on three datasets (NASDAQ [23], Wind [46], and ILI [90]) from different domains (stock markets, energy, and health) with the methods VAR [82], PatchTST [64], LinearRegression (LR) [32, 40], NLinear [98], FEDformer [106], and Crossformer [101]. Results are shown in Table 1.
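Table 1 reports MAE, the mean absolute difference between forecasts and ground truth over the forecast horizon. The sketch below computes MAE and, using the Table 1 numbers as given, picks the lowest-MAE method per dataset, which makes the point of Issue 2 explicit: no single method wins everywhere. The helper names are illustrative, and whether TFB computes MAE on normalized or raw values is not stated in this section.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, averaged over all horizon steps (and variables, if any)."""
    return float(np.mean(np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))))


# MAE values copied from Table 1 (forecasting horizon of 24 steps); lower is better.
METHODS = ["VAR", "LR", "PatchTST", "NLinear", "FEDformer", "Crossformer"]
TABLE_1 = {
    "NASDAQ": [0.462, 0.616, 0.567, 0.522, 0.547, 0.745],
    "Wind": [0.620, 0.583, 0.652, 0.640, 0.697, 0.590],
    "ILI": [1.012, 4.856, 0.835, 0.919, 1.020, 1.096],
}


def best_method_per_dataset(table, methods):
    """Return the method with the lowest MAE for each dataset."""
    return {name: methods[int(np.argmin(scores))] for name, scores in table.items()}


print(best_method_per_dataset(TABLE_1, METHODS))
# prints {'NASDAQ': 'VAR', 'Wind': 'LR', 'ILI': 'PatchTST'}
```

Two of the three winners (VAR on NASDAQ and LR on Wind) are traditional methods, while LR is also by far the worst method on ILI, consistent with the observation that performance varies across datasets.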