
Journal of Software (Ruan Jian Xue Bao) ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn

Journal of Software, 2022,33(4):1451−1476 [doi: 10.13328/j.cnki.jos.006461] http://www.jos.org.cn

Shared-bike Demand Prediction Model Based on Data Field Clustering∗

QIAO Shao-Jie, HAN Nan, YUE Kun, YI Yu-Gen, HUANG Fa-Liang, YUAN Chang-An, DING Peng, Louis Alberto GUTIERREZ

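(School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, Sichuan, China)

(School of Management, Chengdu University of Information Technology, Chengdu 610225, Sichuan, China)

(School of Information Science and Engineering, Yunnan University, Kunming 650504, Yunnan, China)

(School of Software, Jiangxi Normal University, Nanchang 330022, Jiangxi, China)

(School of Computer and Information Engineering, Nanning Normal University, Nanning 530023, Guangxi, China)

(Guangxi College of Education, Nanning 530023, Guangxi, China)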

(Department of Computer Science, Rensselaer Polytechnic Institute, New York, USA)

Corresponding author: HAN Nan, E-mail: hannan@cuit.edu.cn

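Abstract: Shared-bike systems are becoming increasingly popular and have accumulated massive amounts of trip trajectory data. In a shared-bike system, users' borrowing and returning behavior is random and is affected by dynamic factors such as weather and time, which leaves bike dispatching unbalanced, degrades the user experience, and causes huge economic losses to operators. A novel station-clustering-based shared-bike demand prediction algorithm is proposed. It constructs a bike transfer network to compute station activeness and, taking full account of station geographic locations and bike transfer patterns, applies the idea of data field clustering to group stations that are close in distance and similar in usage patterns into the same cluster; a method for finding the optimal number of cluster centers is also given. The influence of time and weather factors on station bike demand is analyzed in depth: the Pearson correlation coefficient is used to select the most correlated weather features from real weather data, and these features are combined with historical in-cluster bike demand and transformed into three-dimensional vectors. A multi-feature long short-term memory (LSTM) deep neural network then learns from the feature information in these vectors and predicts the bike demand in each cluster at 30-minute intervals. Compared with traditional machine learning algorithms and current mainstream methods, the experimental results show that the prediction performance of the proposed bike demand model is significantly improved.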
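The abstract names three clustering ingredients (station activeness, a data-field potential, and a rule for the number of cluster centers) without giving formulas in this section. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm: it uses the Gaussian potential function common in the data field literature, φ(x_j) = Σ_i m_i·exp(−(‖x_j − x_i‖/σ)²), treats each station's activeness as its mass m_i (an assumption), greedily picks high-potential stations as centers while skipping any station closer than a hypothetical separation `min_sep`, and scores each candidate number of centers with the silhouette coefficient, which contribution (2) in the introduction says is used for this purpose. It clusters on coordinates only, whereas the paper's two-level clustering also uses transfer patterns. All function names and the toy data are hypothetical.

```python
import numpy as np


def pairwise_dist(coords):
    """Euclidean distance matrix between all stations."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def data_field_potential(dist, mass, sigma):
    """Gaussian data-field potential of each station (assumed form):
    phi_j = sum_i m_i * exp(-(d_ij / sigma)^2)."""
    return (mass[None, :] * np.exp(-(dist / sigma) ** 2)).sum(axis=1)


def pick_centers(dist, phi, k, min_sep):
    """Greedily take the highest-potential stations as centers,
    skipping any station closer than min_sep to an already chosen center."""
    centers = []
    for j in np.argsort(-phi, kind="stable"):
        if all(dist[j, c] >= min_sep for c in centers):
            centers.append(int(j))
        if len(centers) == k:
            break
    return centers


def silhouette(dist, labels):
    """Mean silhouette coefficient; singleton clusters score 0."""
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            scores.append(0.0)
            continue
        a = dist[i, same].mean()  # mean distance inside own cluster
        b = min(dist[i, labels == c].mean()  # nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


def cluster_stations(coords, mass, sigma, min_sep, k_values):
    """Try each candidate k and keep the one with the best silhouette."""
    dist = pairwise_dist(coords)
    phi = data_field_potential(dist, mass, sigma)
    best = None
    for k in k_values:
        if k < 2:                 # silhouette needs at least two clusters
            continue
        centers = pick_centers(dist, phi, k, min_sep)
        if len(centers) < k:      # too few well-separated potential peaks
            continue
        labels = dist[:, centers].argmin(axis=1)  # nearest-center assignment
        score = silhouette(dist, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best


# Toy layout: three well-separated groups of four stations each.
coords = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                   [10, 0], [10, 1], [11, 0], [11, 1],
                   [0, 10], [0, 11], [1, 10], [1, 11]], dtype=float)
activeness = np.ones(len(coords))  # hypothetical station masses
score, k, labels = cluster_stations(coords, activeness, sigma=1.0,
                                    min_sep=2.0, k_values=range(2, 6))
```

On this toy layout, k = 2 forces two groups to share a cluster and scores lower, so the search settles on k = 3; candidates k ≥ 4 are skipped because no fourth station lies at least `min_sep` away from the existing centers.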

Key words: shared-bike system; bike transfer network; station clustering; data field; LSTM network

CLC number: TP18

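Citation format (Chinese): Qiao SJ, Han N, Yue K, Yi YG, Huang FL, Yuan CA, Ding P, Gutierrez LA. Shared-bike demand prediction model based on data field clustering. Journal of Software, 2022, 33(4): 1451–1476. http://www.jos.org.cn/1000-9825/6461.htm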

Citation format (English): Qiao SJ, Han N, Yue K, Yi YG, Huang FL, Yuan CA, Ding P, Gutierrez LA. Shared-bike Demand Prediction Model Based on Station Clustering. Ruan Jian Xue Bao/Journal of Software, 2022, 33(4): 1451–1476 (in Chinese). http://www.jos.org.cn/1000-9825/6461.htm

Shared-bike Demand Prediction Model Based on Station Clustering

QIAO Shao-Jie, HAN Nan, YUE Kun, YI Yu-Gen, HUANG Fa-Liang, YUAN Chang-An, DING Peng, Louis Alberto GUTIERREZ

(School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China)

(School of Management, Chengdu University of Information Technology, Chengdu 610225, China)

(School of Information Science and Engineering, Yunnan University, Kunming 650504, China)

(School of Software, Jiangxi Normal University, Nanchang 330022, China)

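∗ Foundation items: National Natural Science Foundation of China (61772091, 61802035, 61962006, 62072311, U1802271, U2001212); Sichuan Provincial Science and Technology Program (2021JDJQ0021, 2020YFG0153, 20YYJC2785, 2019YFS0067, 2020YJ0481, 2020YFS0466, 2020YJ0430, 2020YDR0164); CCF-Huawei Database Innovation Research Program (CCF-HuaweiDBIR2020004A); Natural Science Foundation of Guangxi (2018GXNSFDA138005)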

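This paper is recommended by Prof. CHEN Enhong, Associate Prof. LI Yufeng, and Prof. ZOU Quan, guest editors of the special issue on "Robust Machine Learning for Open Scenarios".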

Received 2021-01-17; Revised 2021-07-16; Accepted 2021-08-27; JOS Online 2021-10-26


(School of Computer and Information Engineering, Nanning Normal University, Nanning 530023, China)

(Guangxi College of Education, Nanning 530023, China)

(Department of Computer Science, Rensselaer Polytechnic Institute, New York, USA)

Abstract: Bike-sharing systems are becoming more and more popular and have accumulated a large volume of trajectory data. In a bike-sharing system, users' borrowing and returning behavior is random. In addition, bike-sharing systems are affected by weather, time period, and other dynamic factors, which makes shared-bike scheduling unbalanced, degrades the user experience, and causes huge economic losses to operators. A novel shared-bike demand prediction model based on station clustering is proposed, in which the activeness of stations is calculated by constructing a bike transfer network. The geographical locations of stations and the bike transfer patterns are taken into full consideration, and stations that are close in distance and similar in transfer patterns are aggregated into a cluster based on the idea of data field clustering. In addition, a method for computing the optimal number of cluster centers is presented. The influence of time and weather factors on bike demand is fully analyzed, and the Pearson correlation coefficient is used to choose the most relevant weather features from real weather data; these features are combined with the historical bike demand in each cluster and transformed into three-dimensional vectors. A long short-term memory (LSTM) neural network with multiple features is then employed to learn from the feature information in the vectors, and the bike demand in each cluster is predicted and analyzed every thirty minutes. Compared with traditional machine learning algorithms and state-of-the-art methods, the experimental results show that the prediction performance of the proposed model is significantly improved.
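The feature-selection and input-construction steps described in the abstract can be sketched as follows. This is a hedged illustration, not the authors' code: it ranks weather columns by the absolute Pearson correlation r between each column and in-cluster demand, keeps the top `top_k` (a hypothetical parameter), and assumes that the "three-dimensional vector" refers to the (samples, time steps, features) array that LSTM layers typically consume, built with a sliding window of `steps` 30-minute slots. The function names, weather values, and demand counts are made up for illustration, and the LSTM itself is omitted.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient r between two 1-D series."""
    return float(np.corrcoef(x, y)[0, 1])


def select_weather_features(weather, names, demand, top_k):
    """Rank weather columns by |r| with demand and keep the top_k names."""
    r = {name: pearson(weather[:, j], demand) for j, name in enumerate(names)}
    ranked = sorted(names, key=lambda n: abs(r[n]), reverse=True)
    return ranked[:top_k], r


def build_windows(demand, features, steps):
    """Stack demand with the selected features and slide a window over time.

    Returns X of shape (samples, steps, 1 + n_features) and y, the demand
    in the slot immediately after each window (the prediction target)."""
    series = np.column_stack([demand, features])
    X = np.stack([series[t:t + steps] for t in range(len(series) - steps)])
    y = demand[steps:]
    return X, y


# Toy data: eight consecutive 30-minute slots for one cluster (made up).
demand = np.array([10, 12, 15, 20, 18, 14, 9, 7], dtype=float)
names = ["temperature", "humidity", "wind"]
weather = np.array([
    [14, 15, 17, 21, 20, 17, 13, 12],   # temperature
    [80, 78, 70, 55, 60, 68, 85, 88],   # humidity
    [3, 1, 4, 1, 5, 9, 2, 6],           # wind speed
], dtype=float).T                        # shape (slots, features)

selected, r = select_weather_features(weather, names, demand, top_k=2)
cols = [names.index(n) for n in selected]
X, y = build_windows(demand, weather[:, cols], steps=3)
```

Ranking by |r| keeps strongly negative correlations too: on these toy numbers temperature rises with demand and humidity falls with it, while wind shows almost no linear relationship, so only the first two are kept. A three-slot window over eight slots yields five training samples.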

Key words: bike-sharing system; bike transfer network; station clustering; data field; long short-term memory (LSTM) network

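In recent years, shared bikes have become a major means of travel and an indispensable mode of transportation in smart cities. As environmental awareness grows, more and more people value green, environmentally friendly ways of travelling. Moreover, shared bikes genuinely solve the "last mile" problem and have changed people's lifestyles. As of the end of October 2020, Hellobike users had ridden a cumulative 24 billion kilometers, reducing carbon emissions by nearly 2.8 million tons[1]. A sufficient number of shared bikes gives users point-to-point travel options and can effectively relieve traffic congestion.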

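Although shared bikes bring great convenience and have become a mainstream mode of travel, our analysis of current research in China and abroad shows that operating a shared-bike system efficiently remains challenging. The limitations of existing work can be summarized as follows.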

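(1) Many existing studies predict demand for individual stations only and do not consider how correlations between stations affect bike usage. To resolve the city-wide imbalance between bike supply and demand, studying the demand of single stations alone is not enough to improve the service quality of a shared-bike system;

(2) Although several bike demand prediction models already exist, they generally suffer from regional limitations: their prediction accuracy is usually high in one city, but for reasons such as differing user travel habits or large differences in weather, they perform poorly when predicting bike demand in other cities;

(3) Existing bike demand prediction studies consider only static conditions chosen empirically and ignore the influence of dynamic factors over arbitrarily long preceding time spans on bike demand. Users' demand for bikes over a future period is affected by the state of the stations both now and over the preceding period.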

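The motivation of this paper rests on the following considerations: 1) Existing work does not consider the influence of weather on bike demand at stations and analyzes only historical bike trip records and station distributions. Extensive preliminary experiments show that in a shared-bike system, weather information is an important factor affecting demand, and considering weather features can greatly improve the accuracy of the algorithm; 2) An analysis of bike usage at stations across different time periods reveals that the usage patterns of most stations are diverse, while stations within a certain range share more similar usage patterns. However, existing research mainly predicts demand for single stations, with relatively low accuracy; 3) Deep learning methods can take into account the influence of dynamic factors over arbitrarily long preceding time spans on bike demand, whereas traditional bike demand prediction algorithms consider only empirically chosen static conditions and therefore cannot meet the need for real-time demand prediction at stations.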

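To overcome the shortcomings of existing bike demand prediction methods, this paper presents a novel intelligent shared-bike demand prediction model based on data field clustering and long short-term memory deep neural networks. The main contributions include: (1) A bike transfer network is constructed that separately considers the influence of a borrowing station on its own activeness and the influence of return stations on the borrowing station's activeness, yielding the activeness of all stations; (2) Taking both station locations and bike transfer patterns into account, the idea of data field clustering is used to perform two-level clustering of the stations in the shared-bike system, and the silhouette coefficient is used to determine the optimal number of cluster centers; (3) The influence of time and weather factors on station bike demand is analyzed in order to select the key features affecting prediction accuracy; (4) Three-dimensional vectors are constructed, and a multi-feature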