暂无图片
暂无图片
暂无图片
暂无图片
暂无图片
数据库顶会VLDB 2021最佳可扩展数据科学论文奖《Optimizing Bipartite Matching in Real-World Applications by Incremental Cost Computation》- Tenindra Abeywickrama.pdf
730
9页
0次
2021-08-19
免费下载
Optimizing Bipartite Matching in Real-World Applications by
Incremental Cost Computation
Tenindra Abeywickrama
Grab-NUS AI Lab
tenindra.a@grab.com
Victor Liang
Grab-NUS AI Lab
victor.liang@grab.com
Kian-Lee Tan
School of Computing
National University of Singapore
tankl@comp.nus.edu.sg
ABSTRACT
The Kuhn-Munkres (KM) algorithm is a classical combinatorial op-
timization algorithm that is widely used for minimum cost bipartite
matching in many real-world applications, such as transportation.
For example, a ride-hailing service may use it to nd the optimal
assignment of drivers to passengers to minimize the overall wait
time. Typically, given two bipartite sets, this process involves com-
puting the edge costs between all bipartite pairs and nding an
optimal matching. However, existing works overlook the impact of
edge cost computation on the overall running time. In reality, edge
computation often signicantly outweighs the computation of the
optimal assignment itself, as in the case of assigning drivers to pas-
sengers which involves computation of expensive graph shortest
paths. Following on from this observation, we observe common
real-world settings exhibit a useful property that allows us to incre-
mentally compute edge costs only as required using an inexpensive
lower-bound heuristic. This technique signicantly reduces the
overall cost of assignment compared to the original KM algorithm,
as we demonstrate experimentally on multiple real-world data sets,
workloads, and problems. Moreover, our algorithm is not limited
to this domain and is potentially applicable in other settings where
lower-bounding heuristics are available.
PVLDB Reference Format:
Tenindra Abeywickrama, Victor Liang, and Kian-Lee Tan. Optimizing
Bipartite Matching in Real-World Applications by Incremental Cost
Computation. PVLDB, 14(7): 1150 - 1158, 2021.
doi:10.14778/3450980.3450983
1 INTRODUCTION
The Kuhn-Munkres (KM) algorithm [
14
,
16
], also known as the
Hungarian Method, is a combinatorial optimization algorithm that
is widely utilized to solve many real-world problems, particularly
in transportation. The KM algorithm solves the assignment problem,
also known as the minimum-weight bipartite matching problem,
which involves nding an optimal pair-wise assignment of a set
of agents to a set of jobs. Assigning an agent to a job is associated
with some cost, thus the goal is to nd an optimal assignment or
This work was primarily conducted while the rst author was with the Grab-NUS AI
Lab in the Institute of Data Science at the National University of Singapore
This work is licensed under the Creative Commons BY-NC-ND 4.0 International
License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of
this license. For any use beyond those covered by this license, obtain permission by
emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights
licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 14, No. 7 ISSN 2150-8097.
doi:10.14778/3450980.3450983
matching of agent-job pairs, such that the overall cost is minimized
(or maximized depending on the problem and desired outcome).
Assignment tasks are of particular importance in transportation
problems, and the KM algorithm is widely used as a subroutine
in many existing works [
8
,
12
,
26
,
28
]. For example, it is used in
ride-hailing services to optimally match drivers to passengers for
maximum utilization of available vehicles. Other examples include
computing mail delivery routes using Route Inspection, where
minimum-weight bipartite matching is used to compute the min-
imum T-join [
4
] or the order picking problem solved by using
an approximate Traveling Salesman algorithm utilizing bipartite
matching. The KM algorithm takes the assignment costs as input,
hence these costs must be computed for each assignment task. How-
ever, we nd that existing works overlook the signicance of this
step. Moreover, all of the aforementioned examples involve com-
puting assignment costs based on computationally expensive graph
shortest paths. For example, the cost to assign a car to a passenger
is the wait time, which is commonly modeled by the travel time of
the shortest path in a road network graph. As we discuss next, cost
computation has signicant implications for algorithm eciency
with increasingly expensive assignment cost metrics.
1.1 Motivation
Let
𝑈
be a set of agents and
𝑉
be a set of jobs, both with size
𝑚 = |𝑈 | = |𝑉 |
, for which an optimal assignment is required. Also,
let
𝑐 (𝑢, 𝑣)
be the cost of assigning agent
𝑢 𝑈
to job
𝑣 𝑉
. Costs
𝑐 (𝑢, 𝑣)𝑢 𝑈 , 𝑣 𝑉
are often conceptualized as an
𝑚 × 𝑚
matrix.
To the best of our knowledge, all previous work utilizing KM to
solve transport problems like ride-hailing assumes this matrix is
provided to the KM algorithm or the cost of computing the ma-
trix is not a bottleneck. However, in many real-world applications,
computing the matrix is not only a non-trivial cost but also more
computationally expensive than the assignment itself. Moreover,
the matrix may need to be re-computed each time an assignment
is required. Given the real-time nature of transportation problems,
this may be quite frequent, which serves to only exacerbate the
non-trivial cost of computing
𝑐 (𝑢, 𝑣)
. For example, in a ride-hailing
service, a new assignment is required as new cars become available
and new passenger requests are received continuously in real-time.
According to Fortune
1
, popular ride-hailing services like Grab are
reported to process 6 million ride requests a day, highlighting the
scale of throughput required.
Our observation can be demonstrated using a simple ride-hailing
assignment framework. Let us represent the cost of assigning a
passenger (job) to a ride-hailing car (agent) as the travel-time of
1
https://fortune.com/longform/grab-gojek-super-apps/
1150
the shortest route from the car to the passenger. All costs for one
car (agent) can be computed by performing a single Dijkstra’s
single-source multiple-destination (SSMD) shortest path query. The
entire cost matrix can be populated by performing
𝑚
such searches.
Simple worst-case analysis suggests that this will cost
𝑂 (𝑚|𝐸| +
𝑚|𝑁 | log |𝑁 |)
time
2
where
|𝑁 |
and
|𝐸|
are the number of vertices
and edges in the road network graph and
𝑚 = |𝑈 | = |𝑉 |
. Typical
real-world scenarios would see this dominate the KM algorithm
time complexity of
𝑂 (𝑚
3
)
. For example, in the Singapore road
network
|𝑁 |
is over 280
,
000 while
𝑚
might be 100 representing
nding a matching for 100 ride-hailing cars to 100 passengers. We
verify this intuition in practice for the Singapore road network for
varying values of
𝑚
in Figure 1a using Dijkstra’s search as above.
As expected, the time to compute the matrix dominates the time
to compute the optimal assignment for increasing
𝑚
, only being
overtaken when
𝑚
reaches 5000. In Figure 1b we show this is still
true even if a fast modern point-to-point shortest path technique
like Contraction Hierarchies is used
3
.
0
20
40
60
80
100
50 250 1000 2500 5000
% of Query Time
Bipartite Set Size (m)
Matrix
Assignment
(a) Dijkstra
0
20
40
60
80
100
50 250 1000 2500 5000
% of Query Time
Bipartite Set Size (m)
Matrix
Assignment
(b) Contraction Hierarchies
Figure 1: Proportion of running time spent on matrix com-
putation and nding optimal assignment on Singapore road
network for varying 𝑚
1.2 Contributions
We have seen that computing the cost matrix often dominates com-
puting the optimal assignment itself. Moreover, the cost matrix
must be computed from scratch for each assignment problem and
may be required to be performed frequently in real-world applica-
tions such as ride-hailing. In attempting to address the scalability
and throughput concerns that arise as a result, it begs the question
of whether all assignment costs are even necessary to compute an
optimal solution as rst observed by [
15
]. We observe that this is
also not necessarily the case due to a property exhibited by opti-
mal assignments in typical real-world scenarios. For example, in
a ride-hailing service for a particular geographic region, such as
Singapore, typically passengers and drivers will be distributed in
various parts of the region. It is unlikely that a driver
𝑢 𝑈
will be
assigned to some passenger 𝑣 𝑉 a signicant distance away. We
say that such problems exhibit high spatial locality of matching. Us-
ing this intuition we propose a minimum-weight bipartite matching
algorithm based on the KM algorithm that incrementally computes
costs that are most likely to be in the optimal matching. We develop
2
Dijkstra’s complexity using Fibonacci heaps. Note that the number of edges
|𝐸 |
on
road network graphs observes |𝐸 | = 𝑂 ( |𝑁 |)
3
While faster techniques for point-to-point shortest path search are available, Dijkstra
is typically faster for SSMD search when many destinations are involved. This is
because for increasing
𝑚
the number of point-to-point queries increases by its square,
explaining why the matrix computation cost percentage is still high for large values of
𝑚 in Figure 1b for CH.
novel renement rules using inexpensive lower-bounding heuris-
tics to only compute costs when necessary. Notably, our technique
still computes the optimal matching, but does so while computing
far fewer expensive pair-wise exact assignment costs, signicantly
reducing the overall running time. Moreover, our technique is a
drop-in replacement for the KM algorithm in any technique or
framework that uses the KM as a subroutine. Our contributions can
be summarized as follows:
We identify that computing assignment costs such as graph
shortest paths are more computationally expensive than
nding the optimal assignment itself in typical workloads
for real-world problems such as ride-hailing.
We present a minimum-weight bipartite matching algorithm
based on the Kuhn-Munkres algorithm that incrementally
computes the exact assignment costs required for an assign-
ment only when it is necessary according to novel pruning
rules utilizing inexpensive lower-bounding heuristics.
We implement a specialized lower-bounding heuristic for
use in ride-hailing services, where the assignment cost is
represented by the travel-time of the shortest path in a road
network graph, adapting landmark-based lower-bounds and
graph search techniques.
Our extensive experimental investigation using large-scale
real-world data sets and workloads demonstrates the signi-
cant improvement achieved by our proposed solutions with
highly favorable implications for real-world scalability and
throughput.
2 PRELIMINARIES
The assignment problem is often formulated as the minimum weight
bipartite matching problem. In this formulation, we are given a
bipartite graph
𝐵 = (𝑈 𝑉 , 𝐸
𝐵
)
where
𝑈
and
𝑉
are the bipartite sets
of vertices.
𝐸
𝐵
is the set of edges, and contains an edge
(𝑢, 𝑣)𝑢
𝑈 , 𝑣 𝑉
. The weight
𝑐 (𝑢, 𝑣)
of an edge represents the cost of
assigning
𝑢
to
𝑣
. The assignment problem nds a perfect matching,
where every object in
𝑈
is assigned to exactly one object in
𝑉
(and
vice versa), such that the sum of the weights over all assigned pairs
is minimized. For simpler exposition, we consider the size of sets
to be equal, i.e.,
𝑚 = |𝑈 | = |𝑉 |
, which in practice can be simulated
by adding dummy vertices to the smaller set. Next, we describe the
preliminaries for the applied setting for which our techniques are
designed to be deployed.
Road Network:
In the case of a ride-hailing service, the bipar-
tite sets consist of the locations of passengers and drivers to be
matched. The cost of assigning a passenger to a driver is commonly
considered as the minimum travel-time for the driver to reach the
passenger’s location. These costs can be computed by rst consid-
ering the road network
𝐺 = (𝑉
𝐺
, 𝐸)
, where
𝑉
𝐺
is the set of vertices
and
𝐸
is the set of edges. Each edge
(𝑥, 𝑦) 𝐸
represents the road
segments connecting junction vertices
𝑥
and
𝑦
with weight
𝑤 (𝑥, 𝑦)
representing the travel-time to traverse the edge. Note that other
real positive metrics, such as physical length, can also be consid-
ered. In our context travel-time, and hence the waiting time for
passengers, is most relevant. The network distance 𝑑 (𝑠, 𝑡) between
a source vertex
𝑠
and destination vertex
𝑡
is the minimum sum
of weights connecting vertices
𝑠
and
𝑡
, i.e., by the shortest path
1151
of 9
免费下载
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文档的来源(墨天轮),文档链接,文档作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

关注
最新上传
暂无内容,敬请期待...
下载排行榜
Top250 周榜 月榜