Planecell: Representing Structural Space with Plane Elements
Lei Fan^{1,2}, Long Chen^{2}, Kai Huang^{2} and Dongpu Cao^{3}
Abstract— Reconstruction based on stereo cameras has received considerable attention recently, but two particular challenges remain. The first is the need to represent and compress the data effectively; the second is to retain as much of the available information as possible while ensuring sufficient accuracy. To overcome these issues, we propose a new 3D representation method, namely planecell, which extracts planarity from a depth-assisted image segmentation and then projects these depth planes directly into the 3D world. The proposed method is especially advantageous for large-scale structural environments, such as autonomous driving scenes. Its reconstruction results match the accuracy of dense point clouds while compressing the output file by a factor of 200. To further obtain global surfaces, an energy function formulated as a Conditional Random Field that generalizes the planar relationships is optimized. We evaluate our method against reconstruction baselines on the KITTI outdoor scene dataset, and the results demonstrate its advantages over other 3D space representation methods in accuracy, memory requirements, and scope of application.
I. INTRODUCTION
3D reconstruction has been an active research area in the computer vision community and serves numerous tasks, such as the perception and navigation of intelligent robots, high-precision mapping, and online modeling. Among the various sensors that can be used for reconstruction, stereo cameras are popular for being low-cost and for supplying color information.
Many researchers have improved the precision and speed of
self-positioning and depth calculation algorithms to enable
better reconstruction. However, the basic map representa-
tion method determines the upper bound of reconstruction
performance to some extent. Current approaches, including point clouds and voxel-based or piece-wise planar methods, are confronted with problems when dealing with massive stereo image sequences, such as significant redundancy, ambiguity, and high memory requirements. To overcome these limitations, we propose a new representation method named planecell, which uses plane elements to deliver geometric information in 3D space.
This work was supported in part by the National Natural Science Foundation of China under Grant 61773414.
^1 Lei Fan is with the Vehicle Intelligence Pioneers Inc., Qingdao, Shandong 266109, P.R. China. chenl46@mail.sysu.edu.cn
^2 Lei Fan, Long Chen and Kai Huang are with the School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong, P.R. China. chenl46@mail.sysu.edu.cn
^3 Dongpu Cao is with the Department of Mechanical and Mechatronics Engineering, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada. dongpu@uwaterloo.ca

It is a classical approach to represent the 3D space with a preliminary point-level map. Point-based representations usually suffer from a tradeoff between density and efficiency.
Many approaches [17], [1], [14] have been developed to
address this issue, i.e., to merge similar points in the 3D
reconstruction results for both indoor and outdoor scenes.
The current leading representation method, the voxel map [2], [22], [17], [20], assigns each voxel an occupancy probability and aggregates all points that fall within a fixed range. However, dense reconstructions using regular voxel grids are limited to small volumes because of their memory requirements.
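To make the voxel-map idea above concrete, here is a minimal sketch of a sparse occupancy grid. The log-odds update rule, the 0.1 m resolution, and all names below are our illustrative assumptions, not details of the cited systems:

import numpy as np

class VoxelOccupancyMap:
    """Sparse occupancy grid: every observed 3D point raises the
    log-odds occupancy of the voxel that contains it (a common
    scheme, assumed here rather than taken from [2], [22], [17], [20])."""

    def __init__(self, voxel_size=0.1, hit=0.85, cap=3.5):
        self.voxel_size = voxel_size   # voxel edge length in meters
        self.hit = hit                 # log-odds increment per observation
        self.cap = cap                 # clamp so the map stays updatable
        self.grid = {}                 # (i, j, k) -> accumulated log-odds

    def insert_points(self, points):
        """points: (N, 3) array of 3D points in map coordinates."""
        keys = np.floor(points / self.voxel_size).astype(int)
        for key in map(tuple, keys):
            self.grid[key] = min(self.grid.get(key, 0.0) + self.hit, self.cap)

    def occupied_centers(self, p_threshold=0.5):
        """Centers of voxels whose occupancy probability exceeds p_threshold."""
        logit = np.log(p_threshold / (1.0 - p_threshold))
        keys = [k for k, v in self.grid.items() if v > logit]
        return (np.array(keys) + 0.5) * self.voxel_size

Even this sparse variant stores one entry per observed voxel, and a dense grid grows cubically with the mapped extent, which is exactly the memory limitation noted above.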
Previous studies have adopted the plane prior both in
stereo matching [23] and reconstruction [17], [18], [10],
[3]. Deriving geometric primitives in the model raises its complexity, which restricts further applications. The structure-from-motion method of [3] represents urban scenes with planes fitted to sparse data. Superpixels and other image segmentation methods have also been applied as basic components of the representation [4]; combined with meshes and smoothing terms, this approach achieves good results on large-scale scenes. Although these methods can reconstruct the scene in a dense and lightweight manner, their accuracy and running time remain unsatisfactory.
In this paper, we propose a novel approach, called planecell because its units resemble the cells of a living organism, that maps the 3D space with basic plane units extracted directly from 2D images. The proposed method uses a depth-aware superpixel segmentation to represent each group of points with similar geometric information, i.e., points belonging to the same plane, by a single general function; after plane-fitting with the depth values, these planes are projected into real-world coordinates. This standardized representation improves memory efficiency and simplifies subsequent computations, such as large-surface segmentation and distance calculation. Our method extracts planecells from images by superpixelizing the input image following the hierarchical strategy of SEEDS [19] and converts them into a 3D map. Further aggregation of planecells into larger surfaces is modeled by a Conditional Random Field (CRF) formulation. The proposed representation is motivated by the planar nature of structural environments. The input to our method is stereo image pairs, and the output is a plane-based 3D map with decent pixel-wise precision and a high compression rate. Note that the source of depth is not restricted: stereo matching algorithms, LiDAR, or RGB-D sensors can all be used.
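For intuition, a minimal sketch of the per-cell plane-fitting step follows. The pinhole back-projection, the total-least-squares fit, and every name below are our illustrative assumptions rather than the paper's exact formulation (in particular, how the SEEDS segmentation is made depth-aware is not reproduced here):

import numpy as np

def backproject(us, vs, depths, fx, fy, cx, cy):
    """Pinhole back-projection of pixels (u, v) with depth into camera coordinates."""
    xs = (us - cx) * depths / fx
    ys = (vs - cy) * depths / fy
    return np.stack([xs, ys, depths], axis=1)  # (N, 3) points

def fit_planecell(points):
    """Total-least-squares plane n.x = d through one superpixel's 3D points
    (at least three non-collinear points assumed).
    Returns the unit normal n, the offset d, and the RMS point-to-plane residual."""
    centroid = points.mean(axis=0)
    # The normal is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    d = float(normal @ centroid)
    rms = float(np.sqrt(np.mean((points @ normal - d) ** 2)))
    return normal, d, rms

Each planecell then needs to store only the plane parameters (n, d) and its 2D boundary in the image rather than one 3D point per pixel, which is where the compression reported in the abstract comes from. The CRF aggregation step can similarly be pictured as a labeling problem over the adjacency graph of planecells; a generic pairwise energy of the usual form (our notation, not necessarily the paper's actual potentials) is

E(l) = \sum_{i \in V} \psi_u(l_i) + \lambda \sum_{(i,j) \in E} \psi_p(l_i, l_j),

where V is the set of planecells, E connects adjacent cells, the unary term \psi_u measures how well cell i's plane parameters agree with surface label l_i, and the pairwise term \psi_p encourages neighboring cells with similar normals and offsets to share a label.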
The detailed contributions of this paper are as follows: (a) We propose a novel plane-based 3D map representation method that demonstrates remarkable accuracy and enhances space perception abilities. (b) The proposed method reduces the memory required to present large-scale scenes.