数据库顶会VLDB 2021最佳可扩展数据科学论文奖《Optimizing Bipartite Matching in Real-World Applications by Incremental Cost Computation》- Tenindra Abeywickrama.pdf

小小亮

773

9页

0次

2021-08-19

免费下载

Optimizing Bipartite Matching in Real-World Applications by

Incremental Cost Computation

Tenindra Abeywickrama

∗

Grab-NUS AI Lab

tenindra.a@grab.com

Victor Liang

Grab-NUS AI Lab

victor.liang@grab.com

Kian-Lee Tan

School of Computing

National University of Singapore

tankl@comp.nus.edu.sg

ABSTRACT

The Kuhn-Munkres (KM) algorithm is a classical combinatorial op-

timization algorithm that is widely used for minimum cost bipartite

matching in many real-world applications, such as transportation.

For example, a ride-hailing service may use it to nd the optimal

assignment of drivers to passengers to minimize the overall wait

time. Typically, given two bipartite sets, this process involves com-

puting the edge costs between all bipartite pairs and nding an

optimal matching. However, existing works overlook the impact of

edge cost computation on the overall running time. In reality, edge

computation often signicantly outweighs the computation of the

optimal assignment itself, as in the case of assigning drivers to pas-

sengers which involves computation of expensive graph shortest

paths. Following on from this observation, we observe common

real-world settings exhibit a useful property that allows us to incre-

mentally compute edge costs only as required using an inexpensive

lower-bound heuristic. This technique signicantly reduces the

overall cost of assignment compared to the original KM algorithm,

as we demonstrate experimentally on multiple real-world data sets,

workloads, and problems. Moreover, our algorithm is not limited

to this domain and is potentially applicable in other settings where

lower-bounding heuristics are available.

PVLDB Reference Format:

Tenindra Abeywickrama, Victor Liang, and Kian-Lee Tan. Optimizing

Bipartite Matching in Real-World Applications by Incremental Cost

Computation. PVLDB, 14(7): 1150 - 1158, 2021.

doi:10.14778/3450980.3450983

1 INTRODUCTION

The Kuhn-Munkres (KM) algorithm [

], also known as the

Hungarian Method, is a combinatorial optimization algorithm that

is widely utilized to solve many real-world problems, particularly

in transportation. The KM algorithm solves the assignment problem,

also known as the minimum-weight bipartite matching problem,

which involves nding an optimal pair-wise assignment of a set

of agents to a set of jobs. Assigning an agent to a job is associated

with some cost, thus the goal is to nd an optimal assignment or

∗

This work was primarily conducted while the rst author was with the Grab-NUS AI

Lab in the Institute of Data Science at the National University of Singapore

This work is licensed under the Creative Commons BY-NC-ND 4.0 International

License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of

this license. For any use beyond those covered by this license, obtain permission by

emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights

licensed to the VLDB Endowment.

Proceedings of the VLDB Endowment, Vol. 14, No. 7 ISSN 2150-8097.

doi:10.14778/3450980.3450983

matching of agent-job pairs, such that the overall cost is minimized

(or maximized depending on the problem and desired outcome).

Assignment tasks are of particular importance in transportation

problems, and the KM algorithm is widely used as a subroutine

in many existing works [

]. For example, it is used in

ride-hailing services to optimally match drivers to passengers for

maximum utilization of available vehicles. Other examples include

computing mail delivery routes using Route Inspection, where

minimum-weight bipartite matching is used to compute the min-

imum T-join [

] or the order picking problem solved by using

an approximate Traveling Salesman algorithm utilizing bipartite

matching. The KM algorithm takes the assignment costs as input,

hence these costs must be computed for each assignment task. How-

ever, we nd that existing works overlook the signicance of this

step. Moreover, all of the aforementioned examples involve com-

puting assignment costs based on computationally expensive graph

shortest paths. For example, the cost to assign a car to a passenger

is the wait time, which is commonly modeled by the travel time of

the shortest path in a road network graph. As we discuss next, cost

computation has signicant implications for algorithm eciency

with increasingly expensive assignment cost metrics.

1.1 Motivation

Let

𝑈

be a set of agents and

𝑉

be a set of jobs, both with size

𝑚 = |𝑈 | = |𝑉 |

, for which an optimal assignment is required. Also,

let

𝑐 (𝑢, 𝑣)

be the cost of assigning agent

𝑢 ∈ 𝑈

to job

𝑣 ∈ 𝑉

. Costs

𝑐 (𝑢, 𝑣)∀𝑢 ∈ 𝑈 , 𝑣 ∈ 𝑉

are often conceptualized as an

𝑚 × 𝑚

matrix.

To the best of our knowledge, all previous work utilizing KM to

solve transport problems like ride-hailing assumes this matrix is

provided to the KM algorithm or the cost of computing the ma-

trix is not a bottleneck. However, in many real-world applications,

computing the matrix is not only a non-trivial cost but also more

computationally expensive than the assignment itself. Moreover,

the matrix may need to be re-computed each time an assignment

is required. Given the real-time nature of transportation problems,

this may be quite frequent, which serves to only exacerbate the

non-trivial cost of computing

𝑐 (𝑢, 𝑣)

. For example, in a ride-hailing

service, a new assignment is required as new cars become available

and new passenger requests are received continuously in real-time.

According to Fortune

, popular ride-hailing services like Grab are

reported to process 6 million ride requests a day, highlighting the

scale of throughput required.

Our observation can be demonstrated using a simple ride-hailing

assignment framework. Let us represent the cost of assigning a

passenger (job) to a ride-hailing car (agent) as the travel-time of

https://fortune.com/longform/grab-gojek-super-apps/

1150

the shortest route from the car to the passenger. All costs for one

car (agent) can be computed by performing a single Dijkstra’s

single-source multiple-destination (SSMD) shortest path query. The

entire cost matrix can be populated by performing

𝑚

such searches.

Simple worst-case analysis suggests that this will cost

𝑂 (𝑚|𝐸| +

𝑚|𝑁 | log |𝑁 |)

time

where

|𝑁 |

and

|𝐸|

are the number of vertices

and edges in the road network graph and

𝑚 = |𝑈 | = |𝑉 |

. Typical

real-world scenarios would see this dominate the KM algorithm

time complexity of

𝑂 (𝑚

)

. For example, in the Singapore road

network

|𝑁 |

is over 280

000 while

𝑚

might be 100 representing

nding a matching for 100 ride-hailing cars to 100 passengers. We

verify this intuition in practice for the Singapore road network for

varying values of

𝑚

in Figure 1a using Dijkstra’s search as above.

As expected, the time to compute the matrix dominates the time

to compute the optimal assignment for increasing

𝑚

, only being

overtaken when

𝑚

reaches 5000. In Figure 1b we show this is still

true even if a fast modern point-to-point shortest path technique

like Contraction Hierarchies is used

100

50 250 1000 2500 5000

% of Query Time

Bipartite Set Size (m)

Matrix

Assignment

(a) Dijkstra

100

50 250 1000 2500 5000

% of Query Time

Bipartite Set Size (m)

Matrix

Assignment

(b) Contraction Hierarchies

Figure 1: Proportion of running time spent on matrix com-

putation and nding optimal assignment on Singapore road

network for varying 𝑚

1.2 Contributions

We have seen that computing the cost matrix often dominates com-

puting the optimal assignment itself. Moreover, the cost matrix

must be computed from scratch for each assignment problem and

may be required to be performed frequently in real-world applica-

tions such as ride-hailing. In attempting to address the scalability

and throughput concerns that arise as a result, it begs the question

of whether all assignment costs are even necessary to compute an

optimal solution as rst observed by [

]. We observe that this is

also not necessarily the case due to a property exhibited by opti-

mal assignments in typical real-world scenarios. For example, in

a ride-hailing service for a particular geographic region, such as

Singapore, typically passengers and drivers will be distributed in

various parts of the region. It is unlikely that a driver

𝑢 ∈ 𝑈

will be

assigned to some passenger 𝑣 ∈ 𝑉 a signicant distance away. We

say that such problems exhibit high spatial locality of matching. Us-

ing this intuition we propose a minimum-weight bipartite matching

algorithm based on the KM algorithm that incrementally computes

costs that are most likely to be in the optimal matching. We develop

Dijkstra’s complexity using Fibonacci heaps. Note that the number of edges

|𝐸 |

road network graphs observes |𝐸 | = 𝑂 ( |𝑁 |)

While faster techniques for point-to-point shortest path search are available, Dijkstra

is typically faster for SSMD search when many destinations are involved. This is

because for increasing

𝑚

the number of point-to-point queries increases by its square,

explaining why the matrix computation cost percentage is still high for large values of

𝑚 in Figure 1b for CH.

novel renement rules using inexpensive lower-bounding heuris-

tics to only compute costs when necessary. Notably, our technique

still computes the optimal matching, but does so while computing

far fewer expensive pair-wise exact assignment costs, signicantly

reducing the overall running time. Moreover, our technique is a

drop-in replacement for the KM algorithm in any technique or

framework that uses the KM as a subroutine. Our contributions can

be summarized as follows:

•

We identify that computing assignment costs such as graph

shortest paths are more computationally expensive than

nding the optimal assignment itself in typical workloads

for real-world problems such as ride-hailing.

•

We present a minimum-weight bipartite matching algorithm

based on the Kuhn-Munkres algorithm that incrementally

computes the exact assignment costs required for an assign-

ment only when it is necessary according to novel pruning

rules utilizing inexpensive lower-bounding heuristics.

•

We implement a specialized lower-bounding heuristic for

use in ride-hailing services, where the assignment cost is

represented by the travel-time of the shortest path in a road

network graph, adapting landmark-based lower-bounds and

graph search techniques.

•

Our extensive experimental investigation using large-scale

real-world data sets and workloads demonstrates the signi-

cant improvement achieved by our proposed solutions with

highly favorable implications for real-world scalability and

throughput.

2 PRELIMINARIES

The assignment problem is often formulated as the minimum weight

bipartite matching problem. In this formulation, we are given a

bipartite graph

𝐵 = (𝑈 ∪𝑉 , 𝐸

𝐵

)

where

𝑈

and

𝑉

are the bipartite sets

of vertices.

𝐸

𝐵

is the set of edges, and contains an edge

(𝑢, 𝑣)∀𝑢 ∈

𝑈 , 𝑣 ∈ 𝑉

. The weight

𝑐 (𝑢, 𝑣)

of an edge represents the cost of

assigning

𝑢

𝑣

. The assignment problem nds a perfect matching,

where every object in

𝑈

is assigned to exactly one object in

𝑉

(and

vice versa), such that the sum of the weights over all assigned pairs

is minimized. For simpler exposition, we consider the size of sets

to be equal, i.e.,

𝑚 = |𝑈 | = |𝑉 |

, which in practice can be simulated

by adding dummy vertices to the smaller set. Next, we describe the

preliminaries for the applied setting for which our techniques are

designed to be deployed.

Road Network:

In the case of a ride-hailing service, the bipar-

tite sets consist of the locations of passengers and drivers to be

matched. The cost of assigning a passenger to a driver is commonly

considered as the minimum travel-time for the driver to reach the

passenger’s location. These costs can be computed by rst consid-

ering the road network

𝐺 = (𝑉

𝐺

, 𝐸)

, where

𝑉

𝐺

is the set of vertices

and

𝐸

is the set of edges. Each edge

(𝑥, 𝑦) ∈ 𝐸

represents the road

segments connecting junction vertices

𝑥

and

𝑦

with weight

𝑤 (𝑥, 𝑦)

representing the travel-time to traverse the edge. Note that other

real positive metrics, such as physical length, can also be consid-

ered. In our context travel-time, and hence the waiting time for

passengers, is most relevant. The network distance 𝑑 (𝑠, 𝑡) between

a source vertex

𝑠

and destination vertex

𝑡

is the minimum sum

of weights connecting vertices

𝑠

and

𝑡

, i.e., by the shortest path

1151