
Journal of Software ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
Journal of Software, 2022,33(3):985−1004 [doi: 10.13328/j.cnki.jos.006447] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563
Dynamic Resource Allocation Strategy for Flink Iterative Jobs∗
YUE Xiao-Fei¹, SHI Lan¹, ZHAO Yu-Hai¹, JI Hang-Xu¹, WANG Guo-Ren²
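¹(School of Computer Science and Engineering, Northeastern University, Shenyang 110169, Liaoning, China)
²(School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China)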
Corresponding author: ZHAO Yu-Hai, E-mail: zhaoyuhai@mail.neu.edu.cn
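Abstract: The emerging distributed computing framework Apache Flink supports executing large-scale iterative programs on a cluster,
but its default static resource allocation mechanism cannot configure resources appropriately so that iterative jobs finish on time.
To address this problem, users should actively express performance constraints instead of passively reserving resources. This paper
therefore proposes RABORP (resource allocation based on runtime prediction), a dynamic resource allocation strategy that formulates and
enforces dynamic resource allocation plans for Flink iterative jobs with explicit runtime deadlines. The main idea is to predict the
runtime of each iteration superstep and then, based on the predictions, perform the initial resource allocation when the iterative job
is submitted and dynamic adjustments at the synchronization barriers between supersteps, so that the job completes within the
user-specified deadline while using the minimum set of resources. Comparative experiments were conducted by running several typical
Flink iterative jobs on different datasets. The results show that the proposed runtime prediction model accurately predicts the runtime
of each superstep, and that, in both single-job and multi-job scenarios, the proposed dynamic resource allocation strategy improves on
the current state-of-the-art algorithms in all performance metrics.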
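The core loop described in the abstract (predict superstep runtimes, allocate at submission, re-plan at every superstep barrier) can be illustrated with a minimal Python sketch. It is not RABORP itself: the runtime model `2 + 40/p` (fixed overhead plus perfectly parallel work), the helper names `min_parallelism` and `run_with_deadline`, and the fallback to the maximum parallelism when the deadline is unreachable are all hypothetical assumptions, chosen only to show how a deadline-driven minimum allocation is recomputed as actual superstep times become known. Parallelism is treated abstractly (for example, a number of task slots).

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: (superstep index, parallelism) -> seconds.
RuntimeFn = Callable[[int, int], float]


def min_parallelism(predict: RuntimeFn, start: int, total: int,
                    remaining: float, max_p: int) -> Optional[int]:
    """Return the smallest parallelism p in [1, max_p] whose predicted time
    for supersteps start..total-1 fits within `remaining` seconds, or None."""
    for p in range(1, max_p + 1):
        if sum(predict(i, p) for i in range(start, total)) <= remaining:
            return p
    return None


def run_with_deadline(predict: RuntimeFn, actual: RuntimeFn, total: int,
                      deadline: float, max_p: int) -> Tuple[List[int], float]:
    """Simulate the allocation at submission (i == 0) and the re-planning at
    each superstep barrier, using the actual elapsed time observed so far."""
    elapsed = 0.0
    plan: List[int] = []
    for i in range(total):
        p = min_parallelism(predict, i, total, deadline - elapsed, max_p)
        if p is None:
            # Hypothetical policy: deadline unreachable under the model,
            # so fall back to the maximum available parallelism.
            p = max_p
        plan.append(p)
        # Barrier reached: the real runtime of superstep i is now known.
        elapsed += actual(i, p)
    return plan, elapsed


def model(i: int, p: int) -> float:
    """Hypothetical runtime model: 2 s overhead plus 40 s of parallel work."""
    return 2.0 + 40.0 / p


if __name__ == "__main__":
    plan, elapsed = run_with_deadline(model, model, total=5,
                                      deadline=100.0, max_p=16)
    print(plan, round(elapsed, 2))
```

With exact predictions this prints `[3, 3, 2, 2, 2] 96.67`: rounding the parallelism up to an integer leaves slack in the early supersteps, and re-planning at the barriers reclaims it by running later supersteps on fewer resources. Passing an `actual` function that is slower than `predict` would instead tend to push later supersteps to higher parallelism. The linear scan simply returns the smallest parallelism that fits, so it does not require predicted time to decrease monotonically with p.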
Key words: iterative job; runtime prediction; resource allocation; runtime deadline; Apache Flink
CLC number: TP311
Chinese citation format: 岳晓飞, 史岚, 赵宇海, 季航旭, 王国仁. 面向 Flink 迭代作业的动态资源分配策略. 软件学报, 2022, 33(3): 985–1004.
http://www.jos.org.cn/1000-9825/6447.htm
English citation format: Yue XF, Shi L, Zhao YH, Ji HX, Wang GR. Dynamic Resource Allocation Strategy for Flink Iterative Jobs. Ruan Jian Xue Bao/Journal of Software, 2022, 33(3): 985–1004 (in Chinese). http://www.jos.org.cn/1000-9825/6447.htm
Dynamic Resource Allocation Strategy for Flink Iterative Jobs
YUE Xiao-Fei¹, SHI Lan¹, ZHAO Yu-Hai¹, JI Hang-Xu¹, WANG Guo-Ren²
¹(School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China)
²(School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China)
Abstract: Apache Flink, an emerging distributed computing framework, supports the execution of large-scale iterative programs on a
cluster, but its default static resource allocation mechanism cannot configure resources so that iterative jobs complete on time. To
address this problem, this study holds that users should actively express performance constraints rather than passively reserve
resources, and proposes RABORP (resource allocation based on runtime prediction), a dynamic resource allocation strategy that formulates
and enforces dynamic resource allocation plans for Flink iterative jobs with explicit runtime deadlines. The main idea is to predict the
runtime of each iteration superstep and then, according to the predictions, perform the initial allocation of resources when the
iterative job is submitted and dynamic adjustments at the synchronization barriers between supersteps, so that the job completes within
the user-specified deadline while using the minimum set of resources. Comparative experiments were carried out by executing a variety of
typical Flink iterative jobs on different datasets. Experimental results show that the established runtime prediction model accurately
predicts the runtime of each superstep, and that, in both single-job and multi-job scenarios, the proposed dynamic resource allocation
strategy outperforms the current state-of-the-art algorithms in various
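∗ Foundation item: National Key Research and Development Program of China (2018YFB1004402); National Natural Science Foundation of China (61772124)
This paper is recommended by Prof. LI Guo-Liang, Prof. YU Ge, Prof. YANG Jun, and Prof. FAN Ju, guest editors of the special issue on Novel Technologies of Database Systems.
Received 2021-06-30; Revised 2021-07-31; Accepted 2021-09-13; JOS online 2021-10-21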