
TFB: Towards Comprehensive and Fair Benchmarking of Time
Series Forecasting Methods
Xiangfei Qiu
East China Normal
University, China
Jilin Hu
East China Normal
University, China
Lekui Zhou
Huawei Cloud Algorithm
Innovation Lab, China
Xingjian Wu
East China Normal
University, China
Junyang Du
East China Normal
University, China
Buang Zhang
East China Normal
University, China
Chenjuan Guo
East China Normal
University, China
Aoying Zhou
East China Normal
University, China
Christian S. Jensen
Aalborg University,
Denmark
Zhenli Sheng
Huawei Cloud Algorithm
Innovation Lab, China
Bin Yang
East China Normal
University, China
ABSTRACT
Time series are generated in diverse domains such as economics,
traffic, health, and energy, where forecasting of future values has
numerous important applications. Not surprisingly, many forecasting
methods are being proposed. To ensure progress, it is essential to be
able to study and compare such methods empirically in a comprehensive
and reliable manner. To achieve this, we propose TFB, an automated
benchmark for Time Series Forecasting (TSF) methods. TFB advances the
state-of-the-art by addressing shortcomings related to datasets,
comparison methods, and evaluation pipelines: 1) insufficient coverage
of data domains, 2) stereotype bias against traditional methods, and
3) inconsistent and inflexible pipelines. To achieve better domain
coverage, we include datasets from 10 different domains: traffic,
electricity, energy, the environment, nature, economics, stock markets,
banking, health, and the web. We also provide a time series
characterization to ensure that the selected datasets are
comprehensive. To remove biases against some methods, we include a
diverse range of methods, including statistical learning, machine
learning, and deep learning methods, and we also support a variety of
evaluation strategies and metrics to ensure a more comprehensive
evaluation of different methods. To support the integration of
different methods into the benchmark and enable fair comparisons, TFB
features a flexible and scalable pipeline that eliminates biases. Next,
we employ TFB to perform a thorough evaluation of 21 Univariate Time
Series Forecasting (UTSF) methods on 8,068 univariate time series and
14 Multivariate Time Series Forecasting (MTSF) methods on 25 datasets.
The results offer a deeper understanding of the forecasting methods,
allowing us to better select the ones that are most suitable for
particular datasets and settings. Overall, TFB and this evaluation
provide researchers with improved means of designing new TSF methods.
This work is licensed under the Creative Commons BY-NC-ND 4.0 International
License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of
this license. For any use beyond those covered by this license, obtain permission by
emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights
licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 17, No. 9 ISSN 2150-8097.
doi:10.14778/3665844.3665863
PVLDB Reference Format:
Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang
Zhang, Chenjuan Guo, Aoying Zhou, Christian S. Jensen, Zhenli Sheng and
Bin Yang. TFB: Towards Comprehensive and Fair Benchmarking of Time
Series Forecasting Methods. PVLDB, 17(9): 2363 - 2377, 2024.
doi:10.14778/3665844.3665863
PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at
https://github.com/decisionintelligence/TFB.
1 INTRODUCTION
As part of the ongoing digitalization, time series are generated
in a variety of domains, such as economics [36, 75], traffic [30, 33–35,
49, 51, 52, 62, 79, 85, 93, 95], health [44, 61, 83, 88], energy [1, 29],
and AIOps [7, 8, 41, 72, 87, 103]. Time Series Forecasting (TSF)
is essential in key applications in these domains [28, 67, 94, 97].
Given historical observations, knowing future values ahead of time
is valuable. Correspondingly, TSF has been firmly established as an
active research field, witnessing the proposal of numerous methods.
Time series organize data points chronologically and are either
univariate or multivariate depending on the number of variables
in each data point. Accordingly, TSF methods can be classified as
either Univariate Time Series Forecasting (UTSF) or Multivariate
Time Series Forecasting (MTSF) methods. Among early methods,
Autoregressive Integrated Moving Average (ARIMA) [4] and Vector
Autoregression (VAR) [82] are arguably the most popular univariate
and multivariate forecasting methods, respectively. Subsequent
methods that exploit machine learning, e.g., XGBoost [11, 99] and
Random Forest [5, 59], offer better performance than the early
methods. Most recently, methods based on deep learning have
demonstrated state-of-the-art (SOTA) forecasting performance on a
variety of datasets [10, 12, 14–16, 50, 60, 64, 70, 89, 91, 92, 96,
101, 102, 104].
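To make the univariate/multivariate distinction above concrete, the following is a minimal sketch, not part of TFB. It assumes a time series is stored as a chronologically ordered list of data points, each a list of variable values, and uses a naive last-value forecaster as a stand-in for a real method. The name `naive_forecast` and the sample values are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def naive_forecast(history: Sequence[Sequence[float]], horizon: int) -> List[List[float]]:
    """Forecast `horizon` future data points by repeating the last observation.

    Works for both univariate series (one variable per data point) and
    multivariate series (several variables per data point).
    """
    if not history:
        raise ValueError("history must contain at least one data point")
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    last = list(history[-1])
    # Copy the last data point once per forecast step.
    return [last[:] for _ in range(horizon)]


# Univariate: one variable per data point (illustrative values).
univariate = [[1.0], [2.0], [3.0]]

# Multivariate: two variables per data point (illustrative values).
multivariate = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]

uni_pred = naive_forecast(univariate, horizon=2)
multi_pred = naive_forecast(multivariate, horizon=2)
```

Here `uni_pred` holds two data points with one variable each, while `multi_pred` holds two data points with two variables each. A UTSF method only needs to handle the first shape, whereas an MTSF method must produce forecasts for all variables jointly.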
As more and more methods are being proposed for different
datasets and settings, there is an increasing need for fair and
comprehensive empirical evaluations. To achieve this, we identify and
address three issues in existing evaluation frameworks, thereby
advancing our evaluation capabilities.