CNN-based multi-frame IMO detection from a monocular camera

Nolang Fanani¹, Matthias Ochs¹, Alina Stürck¹, Rudolf Mester¹,²
Abstract— This paper presents a method for detecting independently moving objects (IMOs) from a monocular camera mounted on a moving car. A CNN-based classifier is employed to generate IMO candidate patches; independent motion is detected by geometric criteria on keypoint trajectories in these patches. Instead of looking only at two consecutive frames, we analyze keypoints inside the IMO candidate patches through multi-frame epipolar consistency checks. The obtained motion labels (IMO/static) are then propagated over time using the combination of motion cues and appearance-based information of the IMO candidate patches. We evaluate the performance of our method on the KITTI dataset, focusing on sub-sequences containing IMOs.
I. INTRODUCTION
Classical visual odometry methods rely strongly on the
rigidity of the depicted environment through which a robot
(or car) moves. Deviations from this rigidity (e.g. other
robots, cars, pedestrians, etc.) are traditionally excluded from
egomotion analysis by using variants of RANSAC. Recently,
methods have been proposed which are robust against other
moving objects without RANSAC [5], [6]. However, these
methods just ’blank out’ all areas which are not conformant
to the epipolar geometry induced by the ego-motion, but they
do not analyze these areas further.
In the present paper, we propose an approach that is
'piggy-backed' on the recently developed propagation-based
tracking (PbT) method [6], and employs
a CNN to produce candidate patches that correspond to
single vehicle instances and thus potential independently moving
objects (IMOs). In the presented scheme, these patches are
subsequently associated with each other over time using a
dynamic motion model and simple appearance descriptors,
and the conformity of these associations with the epipolar
geometry computed by PbT is checked. As a result, those
IMOs that are moving differently from the motion of the
ego-car can be detected.
There are two main contributions of our work. First, we
propose a monocular IMO detection scheme which relies on
multi-frame epipolar consistency checks. Hence, the case of
epipolar-conformant IMOs is not in the scope of our work.
Second, we propose a way to propagate and associate IMO
candidates over time by combining the motion model with
appearance information.
In this paper, after having presented related work, we
present the detection and association principles used for this
approach. Perspectives for additionally detecting and track-
ing ’epipolar-conformant’ cars are provided in section VI-C.
¹ Visual Sensorics & Information Processing Lab, Goethe University Frankfurt am Main, Germany
² Computer Vision Laboratory, ISY, Linköping University, Sweden
Fig. 1. Top: The scheme of the proposed IMO detection. Bottom: An example of car classifications into static (green), IMO (red), and undetermined (yellow).
We conclude with an evaluation of the method on the KITTI
dataset.
II. RELATED WORK
The detection of independently moving objects (IMOs)
from visual sensors is a vitally important part of many
computer vision systems. A traditional application can be
found in visual surveillance, where the camera is static, but
recently, object detection from moving cameras has become
increasingly important.
In this work, we focus on the detection of moving
cars from a vehicle-mounted camera. This scenario differs
markedly from others such as handheld cameras [12] or
general robot vision [11] in that the motion is severely
restricted.
In the development of advanced driver assistance
systems (ADAS), several approaches have been proposed.
Many of those choose to work with additional information
such as color images ([2], [17]) or a stereo system [24], [13].
In contrast to these we want to show that it is possible to
reliably detect IMOs from a simple monocular camera.
Previously published monocular algorithms can be differ-
entiated into two categories. Appearance-based approaches
([11], [15]) are often based on patch matching or learning
techniques to determine the movement of any visible object.
2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, Suzhou, China, June 26-30, 2018. 978-1-5386-4451-5/18/$31.00 ©2018 IEEE
Among such approaches, [27] uses features combined
with a classification process for vehicle description, while [28]
uses an attention-inspired model that suppresses less important
image regions to obtain the moving foreground region.
On the other hand, motion-based approaches ([19], [26],
[10]) aim to work with optical flow and other geometric
constraints to determine varying motion patterns from IMOs.
We aim to provide an approach that combines cues from
both the appearance of a car by employing a CNN-based
detection as well as motion cues from optical flow based
on the epipolar geometry to determine the presence of an
independently moving object and track it. In this aspect our
approach shares some similarities with [16] who use two
separate CNNs to determine visual odometry and object lo-
calisation and fuse their results to obtain object localisations
and also with [25] who use CNNs to obtain a rigidity score
for each object and combine this with motion cues from
optical flow. In contrast to their approach, we additionally
apply a series of hypothesis tests to keypoints lying on patches
that have previously been identified as belonging to a vehicle,
in a manner similar to [23], in order to track the regarded
vehicle through consecutive frames. Bai et al. [1] identify IMOs by
estimating the dense optical flows from each IMO candidate.
In contrast, our approach utilizes sparse keypoints to distinguish
static objects from IMOs.
III. FRAMEWORK OVERVIEW
The overall flow of the proposed method is presented
in figure 1 (top image). It builds on a monocular visual
odometry framework, the propagation based tracking (PbT)
scheme [7]. An important principle of PbT is that the
relative pose for a new frame n + 1 is predicted using the car
ego-dynamics, and that a refined relative pose is computed
only on the basis of keypoints that have been tracked at least
twice, that is: keypoints which have already passed a stringent
test of belonging to the static environment. All keypoints,
including the new ones generated in sparsely covered areas
of a new frame, are tracked in an epipolar-guided manner, as
discussed in more detail in section III-A.
In the present scheme, we detect IMO candidate patches
for each new frame using the instance segmentation scheme
of van den Brand et al. [22]. This results in M new IMO
candidate patches for each new frame n + 1. The generation
of the CNN-based IMO candidate patches is discussed in
section III-B.
All IMO candidate patches in image n are classified into
one of three states: static, IMO, or undetermined.
Keypoints that are located in IMO candidate patches are con-
sidered for pose estimation only if they have been classified
as static.
When a new frame n + 1 comes in, it is processed by the
CNN in order to determine the IMO candidate patches. At
the same moment, we have a set of old tracked keypoints
in previous frame n. Some of those are already confirmed
as static as they have been tracked at least twice and
have shown 3D consistency (see section IV-A). Others have
been tracked just once (from frame n − 1 to n) and are only
candidates for being considered as static.
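The per-keypoint epipolar consistency check can be sketched as follows. The fundamental matrix F between the two frames follows from the PbT pose estimate; the Sampson distance is a standard first-order approximation of the geometric epipolar error, and the threshold value here is an illustrative assumption, not a value from the paper:

```python
import numpy as np

def sampson_distance(F, x1, x2):
    """First-order geometric (Sampson) distance of a correspondence
    x1 <-> x2 to the epipolar constraint x2^T F x1 = 0."""
    x1h = np.array([x1[0], x1[1], 1.0])
    x2h = np.array([x2[0], x2[1], 1.0])
    Fx1 = F @ x1h          # epipolar line of x1 in image 2
    Ftx2 = F.T @ x2h       # epipolar line of x2 in image 1
    err = x2h @ F @ x1h    # algebraic epipolar error
    denom = Fx1[0]**2 + Fx1[1]**2 + Ftx2[0]**2 + Ftx2[1]**2
    return err**2 / denom

def is_epipolar_conformant(F, x1, x2, thresh=1.0):
    # A keypoint remains a 'static' candidate only while its Sampson
    # distance stays below the threshold (squared pixels; assumed value).
    return sampson_distance(F, x1, x2) < thresh
```

Accumulating this check over a multi-frame trajectory (rather than a single frame pair) is what makes the consistency test stringent.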
Subsequently, the relative pose between frame n and
n + 1 is determined from all keypoints which are considered
to be static in frame n in the standard manner used in
propagation based tracking. With the resulting relative pose
n → n + 1 being computed, individual keypoints can
then be checked for being conformant with the epipolar
geometry. By accumulating the checks from all keypoints
on a patch, this leads to the classification of that patch into
IMO, static, or undetermined. This IMO detection
procedure is described in detail in section IV-A and section
IV-B.
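The accumulation of per-keypoint checks into a patch-level label can be sketched as follows; the minimum keypoint count and the vote fractions are illustrative assumptions, not the thresholds used in the paper:

```python
def classify_patch(keypoint_checks, min_keypoints=5,
                   static_frac=0.8, imo_frac=0.2):
    """Aggregate per-keypoint epipolar conformity checks (True =
    conformant with the egomotion epipolar geometry) into a
    patch-level motion label. All thresholds are assumed values."""
    n = len(keypoint_checks)
    if n < min_keypoints:
        # Too few tracked keypoints to decide reliably.
        return "undetermined"
    conformant = sum(keypoint_checks) / n
    if conformant >= static_frac:
        return "static"
    if conformant <= imo_frac:
        return "IMO"
    return "undetermined"
```

The undetermined state absorbs patches with mixed or sparse evidence, so that their keypoints are not fed into the pose estimation.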
For each of the new IMO candidate patches in frame
n + 1, an association with the existing patches in frame n
must be performed in order to propagate the motion state
(static/IMO) over time. The association can be made
appearance-based, that is: by comparing size and texture of
the patches, or tracking based, that is: by checking matching
residuals between keypoints inside of the patches. The inter-
frame car patch association is discussed in section V.
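The appearance-based branch of this association can be sketched as follows. The combination of a relative-size score with a grayscale-histogram intersection is an illustrative stand-in for the paper's descriptor, and the score weights and acceptance threshold are assumed values:

```python
import numpy as np

def patch_similarity(patch_a, patch_b, bins=16):
    """Appearance similarity in [0, 1] between two candidate patches,
    combining relative size agreement with a histogram comparison
    (illustrative descriptor, not the paper's exact one)."""
    ha, wa = patch_a.shape
    hb, wb = patch_b.shape
    size_score = min(ha * wa, hb * wb) / max(ha * wa, hb * wb)
    hist_a, _ = np.histogram(patch_a, bins=bins, range=(0, 256), density=True)
    hist_b, _ = np.histogram(patch_b, bins=bins, range=(0, 256), density=True)
    # Histogram intersection, rescaled by the bin width to lie in [0, 1].
    tex_score = np.minimum(hist_a, hist_b).sum() * (256 / bins)
    return 0.5 * size_score + 0.5 * tex_score

def associate(new_patches, old_patches, min_score=0.6):
    """Greedy appearance-based association of frame n+1 patches
    to frame n patches; unmatched patches start a new track."""
    matches = {}
    for i, p_new in enumerate(new_patches):
        scores = [patch_similarity(p_new, p_old) for p_old in old_patches]
        j = int(np.argmax(scores))
        if scores[j] >= min_score:
            matches[i] = j
    return matches
```

In the full scheme, this appearance score would be fused with the tracking-based criterion (matching residuals of keypoints inside the patches) before a motion state is propagated.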
A. PbT framework
As mentioned above, we build our scheme on the monocular visual
odometry framework using propagation-based tracking (PbT)
proposed in [5], [6]. Principles of keypoint generation and
tracking are used in a similar way here, thus we give some
details in the following. The egomotion of the ego-car is
estimated using keypoints which have been confirmed to
be static. These keypoints are the combination of keypoints
outside CNN-based car masks and keypoints from car masks
that have been classified as static cars. In addition, PbT with
its epipolar constraint is able to propagate the static label of
a car mask to subsequent frames as long as the keypoints
inside that car mask are successfully tracked.
We find keypoints inside each IMO candidate patch by
choosing both corner and edgel points, as proposed in [18].
This approach is an extension of the good-feature-to-track
(GFTT) detector initially proposed by Shi and Tomasi [20].
As the matching and tracking processes used in the present
paper are guided by the epipolar geometry, patches which
have a local structure with only one dominant orientation
(e.g. lines and straight edges) can be matched as long as the
dominant orientation is sufficiently well inclined relative to
the epipolar line under consideration.
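This inclination test can be sketched as follows; the minimum angle of 20 degrees is an assumed value, chosen only for illustration:

```python
import numpy as np

def is_matchable(dominant_orientation, epiline_dir, min_angle_deg=20.0):
    """Accept an edgel-type keypoint only if its dominant structure
    orientation is sufficiently inclined against the epipolar line,
    so that matching along the line is well-conditioned.
    The 20-degree threshold is an illustrative assumption."""
    d = np.asarray(dominant_orientation, dtype=float)
    e = np.asarray(epiline_dir, dtype=float)
    d /= np.linalg.norm(d)
    e /= np.linalg.norm(e)
    # Undirected angle between the structure orientation and the line.
    angle = np.degrees(np.arccos(np.clip(abs(d @ e), 0.0, 1.0)))
    return angle >= min_angle_deg
```

An edge running parallel to the epipolar line is rejected, since the aperture problem then leaves the match position along the line unconstrained.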
Each keypoint is represented by a 15 × 15 patch centered
on the keypoint, and we use a 2D Gaussian filter of
the same size as the patch as a masking weight W for
each patch. In order to track the keypoint on subsequent
frames, we employ an iterative matching which minimizes
the photometric error between the patch correspondences.
When a new keypoint is matched for the first
time, we initialize the matching with motion prior informa-
tion as proposed in [4] followed by a Lucas-Kanade like
optimization to find the final match. The matching results of
a keypoint on consecutive frames form a keypoint trajectory.
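The Gaussian-weighted photometric error that this iterative matching minimizes can be sketched as follows; the paper specifies the 15 × 15 patch size, while the σ of the weight is an assumed value:

```python
import numpy as np

def gaussian_weight(size=15, sigma=3.0):
    """2D Gaussian masking weight W over a size x size patch,
    normalized to sum to one (sigma is an assumed value)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    w = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return w / w.sum()

def weighted_photometric_error(patch_ref, patch_cur, W):
    """Weighted sum of squared intensity differences between the
    reference patch and a candidate patch: the cost that the
    Lucas-Kanade-like refinement drives to a minimum."""
    diff = patch_ref.astype(float) - patch_cur.astype(float)
    return float(np.sum(W * diff**2))
```

The Gaussian weight emphasizes the patch center, so small localization errors at the patch border contribute less to the cost.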
A keypoint is finally accepted and used for pose estimation