International Journal of Automation and Computing
DOI: 10.1007/s11633-018-1126-y
A Survey of Scene Understanding by Event Reasoning in Autonomous Driving

Jian-Ru Xue 1, Jian-Wu Fang 1,2, Pu Zhang 1

1 Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China
2 Chang'an University, Xi'an 710064, China
Abstract: Realizing autonomy has been a hot research topic for automatic vehicles in recent years. For a long time, most of the efforts toward this goal have concentrated on understanding the scenes surrounding the ego-vehicle (the autonomous vehicle itself). By completing low-level vision tasks, such as detection, tracking and segmentation of the surrounding traffic participants, e.g., pedestrians, cyclists and vehicles, the scenes can be interpreted. However, for an autonomous vehicle, low-level vision tasks alone are largely insufficient for comprehensive scene understanding. What have the scene participants done, what are they doing, and what will they do next? This deeper question is what actually steers vehicles toward truly full automation, just like human beings. Based on this consideration, this paper attempts to investigate the interpretation of traffic scenes in autonomous driving from an event reasoning view. To reach this goal, we study the most relevant literature and the state of the art on scene representation, event detection and intention prediction in autonomous driving. In addition, we also discuss the open challenges and problems in this field and endeavor to provide possible solutions.
Keywords: Autonomous vehicle, scene understanding, event reasoning, intention prediction, scene representation.

Review. Manuscript received October 16, 2017; accepted March 9, 2018. This work was supported by National Key R&D Program Project of China (No. 2016YFB1001004), Natural Science Foundation of China (Nos. 61751308, 61603057, 61773311), China Postdoctoral Science Foundation (No. 2017M613152), and Collaborative Research with MSRA. Recommended by Associate Editor Matjaz Gams.
© Institute of Automation, Chinese Academy of Sciences and Springer-Verlag GmbH Germany, part of Springer Nature 2018
1 Introduction
“Automation is one of the hottest topics in transportation research and could yield completely driverless cars in less than a decade.” (Nature, 2015[1])

Can completely driverless cars really be delivered in less than a decade? Manifestly, full autonomy is still decades away, judging by current progress and the remaining challenges in autonomous vehicles. So far, no one is close to developing a fully autonomous vehicle: the fleet testing by Uber and Google operates under tightly controlled conditions[2].
The reasons come from four aspects: 1) Existing methods of environment perception, e.g., detection[3], tracking[4, 5] and segmentation[6] of participants in traffic scenes, still produce inevitable errors in real environments; 2) The driving environment is rather complex, unpredictable, dynamic, and uncertain; 3) Deep traffic scene understanding, such as understanding the geometry/topology structure of the scene and the spatio-temporal evolution of participants (pedestrians, vehicles, etc.), whose ultimate goal is to semantically reason about the scene's evolution so as to provide clues for behavior decision and autonomous vehicle control, is still studied far from sufficiently. It is actually difficult to study because these elements are implicitly contained in the driving environment and cannot be directly observed; 4) The deployment of autonomous vehicles faces a social dilemma and involves moral issues[7].

Complementary to our survey, Janai et al.[8] exhaustively reviewed traffic participant recognition, detection and tracking, scene reconstruction, motion estimation, semantic segmentation, and many other vision-based tasks. Xue et al.[9] gave an overview of autonomous vehicle systems from the perspectives of self-localization and multi-sensor fusion for obstacle detection and tracking, and emphasized vision-centered fusion of multiple sensors. Zhu et al.[10] studied the latest progress on lane detection and traffic sign/light recognition in the perception of intelligent vehicles. These surveys, to a great extent, give a comprehensive and detailed investigation concerning the first aspect mentioned above. In this paper, we focus on the third aspect: a survey on the deep understanding of traffic scenes for autonomous vehicles.
This paper aims to explore the evolution of the traffic scene from an event reasoning view, because events can reflect the dynamic evolution process of a scene with a tractable reasoning strategy[11]. In order to provide a clear and logical investigation, this paper reasons about events through their representation, detection and prediction stages. In the representation stage, the main goal is to obtain high-level cues for the following stages; here, we expound the saliency, the contextual layout, and the topology rules for autonomous driving. As for the detection stage, we review event detection with respect to different participants, such as pedestrians and vehicles. For the prediction stage, this paper elaborates intention prediction with regard to the expected time span of the prediction, and classifies it into long-term intention prediction and short-term prediction. Fig. 1 demonstrates the surveying flowchart of this paper. At each stage, we discuss open problems and challenges, and endeavour to provide possible solutions.
Fig. 1 The flowchart of scene understanding by event reasoning for autonomous driving
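To make the three-stage flow of Fig. 1 concrete, the following minimal Python sketch wires representation, event detection and intention prediction together. All type and function names here are illustrative assumptions for exposition only; the survey does not prescribe any concrete interface.

from dataclasses import dataclass

# Hypothetical placeholder types; the survey does not prescribe
# concrete data structures, so these names are assumptions.
@dataclass
class SceneRepresentation:
    saliency: object           # attention cues in the scene
    contextual_layout: object  # road/scene layout context
    topology_rules: object     # lane topology and traffic rules

@dataclass
class Event:
    participant: str           # e.g., "pedestrian", "vehicle"
    description: str           # e.g., "crossing", "overtaking"

@dataclass
class Intention:
    participant: str
    horizon: str               # "short-term" or "long-term"
    behavior: str              # predicted future behavior

def represent(frame) -> SceneRepresentation:
    """Stage 1: extract high-level cues (saliency, layout, topology)."""
    return SceneRepresentation(None, None, None)  # placeholder

def detect_events(rep: SceneRepresentation) -> list[Event]:
    """Stage 2: detect participant-specific events from the representation."""
    return []  # placeholder

def predict_intentions(rep: SceneRepresentation, events: list[Event]) -> list[Intention]:
    """Stage 3: predict long-/short-term intentions from cues and events."""
    return []  # placeholder

def event_reasoning(frame) -> list[Intention]:
    """Representation -> detection -> prediction, as in Fig. 1."""
    rep = represent(frame)
    return predict_intentions(rep, detect_events(rep))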
Actually, beyond those stages, some end-to-end approaches have emerged recently for scene understanding in autonomous driving[12-14]. They rely on a large-scale data-driven mechanism, and map the scene to decisions with deep layers or recursive perception, such as fast recurrent fully convolutional networks (FCN) for direct perception in autonomous driving[12] and FCN-LSTM[13] for predicting a feasibility distribution over future motion actions. We devote a separate section to this category. We hope that our survey can sweep away some entry barriers to deep scene understanding for autonomous driving, and draw forth meaningful insights and solutions for this field.
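As a rough sketch of what such an end-to-end model can look like, the PyTorch snippet below pairs a small fully convolutional encoder with an LSTM over frames and outputs a distribution over a discrete action set, in the spirit of FCN-LSTM[13]. The layer sizes, the four-action vocabulary and all names are illustrative assumptions, not the published architecture.

import torch
import torch.nn as nn

class FCNLSTM(nn.Module):
    """Minimal FCN-LSTM-style sketch: per-frame convolutional encoder,
    LSTM over time, softmax over future motion actions. Sizes and the
    action set are assumptions for illustration."""
    def __init__(self, num_actions: int = 4, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(             # fully convolutional encoder
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),              # collapse spatial dimensions
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W) video clips from the ego camera
        b, t = clips.shape[:2]
        feats = self.encoder(clips.flatten(0, 1)).flatten(1)  # (b*t, 32)
        out, _ = self.lstm(feats.view(b, t, -1))              # (b, t, hidden)
        # Feasibility distribution over actions at the last time step
        return self.head(out[:, -1]).softmax(dim=-1)

# Usage: probabilities over, e.g., {straight, stop, left, right}
model = FCNLSTM()
probs = model(torch.randn(2, 8, 3, 96, 96))
print(probs.shape)  # torch.Size([2, 4])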
1.1 Autonomy pursuit in driving
Developing autonomous systems aims to assist humans in handling everyday tasks. The autonomous driving system, being closely related to humans' everyday trips, has become one of people's most typical pursuits. It can free the hands from the steering wheel, and spare time for tackling many other things. Meanwhile, the sensors equipped on an autonomous vehicle can recognize the surrounding conditions immediately and ensure safe driving, thus decreasing traffic accidents. Encouraged by those merits, researchers have been diligently pursuing autonomous driving all along.

There are two kinds of driving force in the development of autonomous driving. One is the projects launched and challenges posed by governments, research institutes and vehicle manufacturers. The other, which we want to emphasize, is the publicly available benchmarks.
Projects and launched challenges. In 1986, Europe started an intelligent transportation system project named PROMETHEUS, involving more than 13 vehicle manufacturers and research institutions from around 19 countries. Thorpe et al.[15] at Carnegie Mellon University launched the first autonomous driving project in the United States. This project made a breakthrough in 1995, autonomously driving a car from Pittsburgh, Pennsylvania to San Diego, California. Supported by many related studies, the US government established the National Automated Highway System Consortium (NAHSC) in 1995. Motivated by these projects, highway scenarios were intensively studied for a long time, while the urban scene, despite being closely related to humans' daily lives, remained an uncultivated area. Later, the famous DARPA Grand Challenge series launched by the Defense Advanced Research Projects Agency (DARPA) largely accelerated the progress of autonomous vehicles. Among them, the "Urban Challenge"[16], the third challenge launched by DARPA (the first two had been held in 2004 and 2005 respectively, aiming to test self-driving performance in the Mojave Desert of the United States[17, 18]), took place on November 3, 2007 at the now-closed George Air Force Base in Victorville, California. Rules included obeying all traffic regulations while negotiating with other vehicles and obstacles and merging into traffic. Four teams completed the route within 6 hours. In 2009, the National Natural Science Foundation of China launched the China Intelligent Vehicle Future Challenge (iVFC); the ninth contest was held in November 2017. Google started their self-driving car project in 2009, and had completed over 5 million miles of driving tests as of March 2018¹. In 2016, the project evolved into an independent self-driving technology company, Waymo. With Tesla Autopilot², all Tesla vehicles built since October 2016 are equipped with cameras, twelve ultrasonic sensors and a forward-facing radar to support self-driving capability. As a matter of fact, more and more vehicle manufacturers, such as Audi, BMW and Benz, have begun projects to develop their own self-driving vehicles.
Benchmarks. In 2012, Geiger et al.[19] introduced the KITTI vision benchmark, which contained six different urban scenes and 156 video sequences with time spans from 2 minutes to 8 minutes. Within this benchmark, they launched several typical vision tasks, such as pedestrian/vehicle detection, optical flow, stereo matching, road detection, lane detection, etc. The benchmark was collected by an ego-vehicle equipped with color and gray-scale cameras, a Velodyne 3D laser scanner, and a high-precision GPS/IMU inertial navigation system. Cambridge University released the CamVid dataset[20], which provided a semantic segmentation evaluation benchmark containing only four video sequences of urban scenes. Another popular benchmark is the Cityscapes dataset[21] released in 2016: urban scenes were collected in 50 cities, with 5 000 finely annotated images and 20 000 coarsely annotated images. Cityscapes has become the most challenging dataset for the semantic segmentation task. Annotation, however, is time and labor consuming. Based on that, Gaidon et al.[22] constructed a large-scale KITTI-like virtual dataset³ by computer graphics technology. The benefit of a virtual dataset is that it can generate every wanted task, for those that
¹ https://www.theverge.com/2018/2/28/17058030/waymo-self-driving-car-360-degree-video
² https://www.tesla.com/autopilot
³ http://www.europe.naverlabs.com/Research/Computer-Vision/Proxy-Virtual-Worlds