parse and navigate the vehicle. In either scenario, it is useful to have context about the world (as it relates to transport), such as road blockages, traffic speeds, and accidents, since these are vital to ensuring timely and optimized routing plans.
1.3 Related Work
There has been a resurgence of papers on imagery-based learning for autonomous vehicles [9]. However, these efforts focus largely on mapping imagery to actions (steering commands and so on) using deep learning approaches. Little research emphasis has been placed on an integrated solution that ties together the different elements of transport.
On problems related specifically to traffic density, Shanhang et al. [4] report an approach to understanding traffic density from web data. Their effort focuses on webcam data rather than on estimating real-time traffic congestion from real-time vehicle data (ride sharing or autonomous). On the deep learning side, solutions have typically addressed object counting as applied to crowd density estimation [6, 7].
In contrast to this related work, which targets people-counting problems, our work focuses on vehicular congestion estimated from real traffic imagery. Additionally, we present a transfer-learning approach to this problem to accelerate training. Furthermore, decomposing transport into components is a scalable and cohesive way to train models and reuse them across multiple transport problems.
2 Unified Transport Engine (UTE)
As described earlier, a number of ad hoc solutions currently power existing mapping and ride-sharing marketplaces. Deep learning has seen considerable success in combining different functional units into a more comprehensive system. The goal of our work is to investigate a unified approach to transport using deep learning. Such an approach allows knowledge transfer across different components, as well as sharing signals across the whole system. A unified framework ensures consistency, common data sets for training, and the opportunity to perform joint optimization across systems. In particular, we show different components that are critical to transport, such as routing, ETA, and dispatch. The ability to train jointly across functional boundaries is an important step towards a unified transport engine. Furthermore, it provides more flexibility in incorporating new sensory data, such as transitioning to leveraging imagery across different systems.
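The sharing of signals across components can be pictured as a shared encoder feeding several task-specific heads. The sketch below is purely illustrative and is not the paper's implementation; the function names, dimensions, and task names ("eta", "routing", "dispatch") are assumptions chosen to mirror the components listed above.

```python
import numpy as np

def shared_encoder(features, W_shared):
    # Map raw transport features into a shared representation
    # reused by every downstream task (hypothetical trunk).
    return np.tanh(features @ W_shared)

def task_head(shared_repr, W_task):
    # A lightweight per-task head (e.g. ETA regression, routing
    # score, dispatch score) on top of the shared trunk.
    return shared_repr @ W_task

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))     # batch of 4 examples, 8 input signals
W_shared = rng.normal(size=(8, 16))    # shared parameters, trained jointly
heads = {task: rng.normal(size=(16, 1))
         for task in ("eta", "routing", "dispatch")}

shared = shared_encoder(features, W_shared)
outputs = {task: task_head(shared, W) for task, W in heads.items()}
```

Because all heads read the same representation, a gradient update driven by one task (say, ETA) also refines the features available to routing and dispatch, which is the knowledge-transfer property argued for above.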
The overall architecture of the Unified Transport Engine is shown in Figure 1. At a high level, the key
components are summarized below:
2.1 Input Streams
Fundamentally, the input stream consists of a set of different sources of sensory and historical model data:
● Mobile Data: This refers to the set of sensory information collected from the device, e.g. GPS data collected from mobile phones. This is often processed/filtered through a variety of algorithms before being aggregated into the feature set.
● Camera Data: Imagery is an integral part of the mapping and autonomous stacks. Data can be captured via special cameras mounted on the car or through mobile phones, as in the case of crowdsourcing. There are a number of variants depending on the objective/use of the data: sequences of image frames are required for extracting motion/structure primitives, single images for perception, and so on.
● Lidar/Radar Data: Autonomous vehicles today typically collect additional sensory data using lidar, radar, or sonar systems. These are generally more robust for estimating depth maps of the world but often come at significantly higher cost.
● Priors/Historical: Another important ingredient is data collected from prior experience - billions of rides, usage patterns, and hotspots. This data is run through separate systems and machine-learned models.
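The four input sources above could be gathered into a single container prior to feature aggregation. The sketch below is illustrative only; every field name and type is an assumption, not part of the paper's system.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence

@dataclass
class InputStreams:
    """Hypothetical container mirroring the four input sources above."""
    gps_traces: Sequence[tuple] = field(default_factory=list)    # mobile: (lat, lon, t)
    camera_frames: Sequence[bytes] = field(default_factory=list) # raw or encoded imagery
    depth_maps: Optional[Sequence[list]] = None                  # lidar/radar-derived depth
    historical_priors: dict = field(default_factory=dict)        # e.g. hotspot statistics

# Example: a mobile-only observation, where camera and lidar data are absent.
streams = InputStreams(
    gps_traces=[(37.77, -122.42, 1.0), (37.78, -122.41, 2.0)],
    historical_priors={"hotspot_density": 0.7},
)
```

Keeping the sources in one typed container makes it explicit which modalities are present for a given observation, so downstream components can degrade gracefully when, for example, lidar depth maps are unavailable.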