parse and navigate the vehicle. In either scenario, it is useful to have context about the world (as it relates to transport), such as road blockages, traffic speeds, and accidents, since these are vital to ensuring timely and optimized routing plans.
1.3 Related Work
There has been a resurgence of papers on imagery-based learning for autonomous vehicles [9]. However, these efforts focus largely on mapping imagery to actions (steering commands and so on) using deep learning approaches. Little research emphasis has been placed on an integrated solution that ties together the different elements of transport.
On problems related specifically to traffic density, Shanhang et al. [4] report an approach to understanding traffic density from web data. Their effort focuses on webcam data rather than on estimating real-time traffic congestion from real-time vehicle data (ride sharing or autonomous). On the deep learning side, solutions have typically addressed object counting as applied to crowd density estimation [6, 7].
In contrast to this related work, which targets people-counting problems, our work focuses on vehicular congestion estimated from real traffic imagery. Additionally, we present a transfer-learning approach to this problem to accelerate training. Furthermore, decomposing transport into components is a scalable and cohesive way to train models and reuse them across multiple transport problems.
2 Unified Transport Engine (UTE)
As described earlier, a number of ad hoc solutions currently power existing mapping and ride-sharing marketplaces. Deep learning has seen considerable success in combining different functional units into a more comprehensive system. The goal of our work is to investigate a unified approach to transport using deep learning. Such an approach allows knowledge transfer across different components, as well as sharing signals across the whole system. A unified framework ensures consistency, common data sets for training, and the opportunity to perform joint optimization across systems. In particular, we show different components that are critical to transport, such as routing, ETA, and dispatch. The ability to train jointly across functional boundaries is an important step towards a unified transport engine. Furthermore, it provides more flexibility in incorporating new sensory data, such as transitioning to leveraging imagery across different systems.
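The sharing of signals across components can be pictured as a shared encoder feeding several task-specific heads. The sketch below is purely illustrative and is not the paper's implementation; the function names, dimensions, and task names ("eta", "routing", "dispatch") are assumptions chosen to mirror the components listed above.

```python
import numpy as np

def shared_encoder(features, W_shared):
    # Map raw transport features into a shared representation
    # reused by every downstream task (hypothetical trunk).
    return np.tanh(features @ W_shared)

def task_head(shared_repr, W_task):
    # A lightweight per-task head (e.g. ETA regression, routing
    # score, dispatch score) on top of the shared trunk.
    return shared_repr @ W_task

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))     # batch of 4 examples, 8 input signals
W_shared = rng.normal(size=(8, 16))    # shared parameters, trained jointly
heads = {task: rng.normal(size=(16, 1))
         for task in ("eta", "routing", "dispatch")}

shared = shared_encoder(features, W_shared)
outputs = {task: task_head(shared, W) for task, W in heads.items()}
```

Because all heads read the same representation, a gradient update driven by one task (say, ETA) also refines the features available to routing and dispatch, which is the knowledge-transfer property argued for above.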
The overall architecture of the Unified Transport Engine is shown in Figure 1. At a high level, the key
components are summarized below:
2.1 Input Streams
Fundamentally, the input stream consists of a set of different sources of sensory and historical model data:
● Mobile Data: This refers to the set of sensory information collected from the device, e.g. GPS data collected from mobile phones. This is often processed/filtered through a variety of algorithms before being aggregated into the feature set.
● Camera Data: Imagery is an integral part of the mapping and autonomous stacks. Data can be captured via special cameras mounted on the car or through mobile phones, as in the case of crowdsourcing. There are a number of variants depending on the objective/use of the data: sequences of image frames are required for extracting motion/structure primitives, single images for perception, and so on.
● Lidar/Radar Data: Autonomous vehicles today typically collect additional sensory data using lidar, radar, or sonar systems. These are generally more robust for estimating depth maps of the world but often come at significantly higher cost.
● Priors/Historical: Another important ingredient is data collected from prior experience - billions of rides, usage patterns, and hotspots. This data is run through separate systems and machine-learned models.
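The four input sources above could be gathered into a single container prior to feature aggregation. The sketch below is illustrative only; every field name and type is an assumption, not part of the paper's system.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence

@dataclass
class InputStreams:
    """Hypothetical container mirroring the four input sources above."""
    gps_traces: Sequence[tuple] = field(default_factory=list)    # mobile: (lat, lon, t)
    camera_frames: Sequence[bytes] = field(default_factory=list) # raw or encoded imagery
    depth_maps: Optional[Sequence[list]] = None                  # lidar/radar-derived depth
    historical_priors: dict = field(default_factory=dict)        # e.g. hotspot statistics

# Example: a mobile-only observation, where camera and lidar data are absent.
streams = InputStreams(
    gps_traces=[(37.77, -122.42, 1.0), (37.78, -122.41, 2.0)],
    historical_priors={"hotspot_density": 0.7},
)
```

Keeping the sources in one typed container makes it explicit which modalities are present for a given observation, so downstream components can degrade gracefully when, for example, lidar depth maps are unavailable.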