From EPI:
Write a program which takes an existing highway network (specified as a set of highway sections between pairs of cities) and proposals for new highway sections, and returns the proposed highway section which leads to the most improvement in the total driving distance.
The total driving distance is defined to be sum of the shortest path distances between all pairs of cities. All sections, existing and proposed, allow for bi-directional traffic, and the original network is connected.
Once again, we’re confronted with another graph-related problem.
Before we can determine the effectiveness of a new highway section, we’ll need to first compute the current driving distances between all pairs of cities. In this article, we’ll briefly review common shortest path algorithms to solidify our understanding of them.
A Review of Shortest Path Algorithms
Typically, when attempting to determine the shortest path between a pair of cities, we’ll use one of two methods:
Djikstra’s Algorithm
Djikstra’s algorithm involves greedily selecting the node that’s closest to our source node. We used a modified version of Djikstra’s in the Connecting Cities at Minimum Cost problem, where we worked with raw edge costs instead of cumulative distances.
class Node:
def __init__(self, label):
self.label = label
self.d = math.inf # distance
# Initialise nodes and priority queue
# ...
def djikstra():
while pq:
dist, u = heapq.heappop(pq)
if dist != nodes[u].d:
continue
visited.add(u)
for v, cost in nodes[u].edges.items():
if v not in visited:
nodes[v].d = min(nodes[v].d, nodes[u].d + cost)
heapq.heappush(pq, (nodes[v].d, v))
- Due to its greedy approach, Djikstra cannot account for negative weight edges that might show up later in unvisited parts of the graph.
- The algorithm runs in
O(|V| log |V|)
/O(|E| log |E|)
time.
Bellman-Ford Algorithm
The Bellman-Ford algorithm involves going through |E|
edges |V|
number of times.
def bellman_ford(edges, n):
for i in range(n):
for u, v, cost in edges:
nodes[v].d = min(nodes[v].d, nodes[u].d + cost)
# Check for -ve cycles
for u, v, cost in edges:
if nodes[v].d < nodes[u].d + cost:
return False
- We know that the shortest path between the source and the destination node cannot be more than
n
. When we cycle through all the edges for eachi
, one of the edges we relax will be part of this shortest path. - Bellman-Ford computes the shortest distances for all possible destination nodes.
- Its non-greedy approach allows it to handle negative weight edges.
- Bellman-Ford can be used to detect negative cycles.
- The algorithm runs in
O(|V||E|)
time.
Both of these algorithms are single-source shortest path algorithms. You could imagine repeating them multiple times to cover the entire road network described in the problem, but the recommended strategy is to use the Floyd-Warshall algorithm, which is both time efficient and easy to understand.
The Floyd-Warshall Algorithm
The intuition behind Floyd-Warshall is quite simple. We want to consider increasing subsets of intermediate nodes and evaluate every possible pair (i, j)
for these subsets.
def floyd_warshall(edges, n):
M = [
[math.inf for _ in range(n)]
for _ in range(n)
]
for u, v, cost in edges:
M[u][v] = M[v][u] = cost
for i in range(n):
M[i][i] = 0
for k in range(n):
for i in range(n):
for j in range(n):
M[i][j] = min(
M[i][j],
M[i][k] + M[k][j]
)
At k = 0
, we’re considering all pairs of (i, j)
that can be improved with the inclusion of city 0
as a pit stop. For subsequent values k = 1, 2...
, we’re considering all pairs of (i, j)
that can be improved with the inclusion of the set of cities k = { 0, 1, 2... }
as possible pit stops.
Does this sort of remind you of Dynamic Programming? If it doesn’t, don’t fret. Think of it this way –– the shortest distance we can achieve for (i, j)
with the intermediate nodes { 0 ... k }
is either:
- The shortest distance between
(i, j)
with intermediate nodes{ 0 ... k-1 }
- The shortest distance between
(i, k)
with{ 0 ... k-1 }
plus the shortest distance between(k, j)
with{ 0 ... k-1 }
Ordering in Floyd-Warshall
Note that the order of your loops matter –– k
has to remain at the othermost loop.
There aren’t many good explanations online for this particular aspect of Floyd-Warshall, but let’s attempt to see why this property is vital with an example:
# Suppose there are `n = 4` cities in our network
for x in itertools.product(range(4), repeat=3):
print(x)
"""
Loop: First Second Third
(0, 0, 0)
(0, 0, 1)
(0, 0, 2)
...
(0, 3, 2)
(0, 3, 3)
(1, 0, 0)
(1, 0, 1)
(1, 0, 2)
(1, 0, 3)
(1, 1, 0)
(1, 1, 1)
(1, 1, 2)
(1, 1, 3)
(1, 2, 0)
(1, 2, 1)
(1, 2, 2)
(1, 2, 3)
(1, 3, 0)
(1, 3, 1)
(1, 3, 2)
(1, 3, 3)
(2, 0, 0)
(2, 0, 1)
...
"""
To reiterate: Every k
value represents a subset. If k = 2
, that means we’ve fully considered { 0, 1 }
as intermediate nodes and are now considering k = 2
for a given (i, j)
.
Consider the problem of i = 1
, j = 2
, and k = 3
:
M[1][2] = min(M[1][2], M[1][3] + M[3][2])
In order for this computation to be valid, it must be the case that we’ve computed the following subproblems:
i = 1
,j = 2
, andk = 2
i = 1
,j = 3
, andk = 2
i = 3
,j = 2
, andk = 2
If we order our loop as (i, j, k)
, we can’t fully satisfy this requirement. In our output above, (1, 2, 3)
is printed before (1, 3, 2)
and (3, 2, 2)
. This means we’ll be iterating on a problem whose subproblems haven’t been resolved.
What if we ordered our loop in the manner of (k, i, j)
as we’ve done in our code? By the time we reach (3, 1, 2)
, the results for (2, 1, 2)
, (2, 1, 3)
, and (2, 3, 2)
would have all been determined –– which is exactly what we want.
In essence, the order the loops matter because they describe the order in which we address the subproblems.
Computing Pairwise Distances Between Cities
Let’s now apply Floyd-Warshall to our current problem.
# Predefined
Highway = collections.namedtuple('Highway', ('x', 'y', 'dist'))
def find_best_proposal(H, P, n):
# Matrix to keep track of pairwise distances
M = [[math.inf for _ in range(n)] for _ in range(n)]
# Every city's distance to itself should be 0
for i in range(n):
M[i][i] = 0
# Insert provided distance information
for h in H:
M[h.x][h.y] = M[h.y][h.x] = h.dist
# Run Floyd-Warshall
for k, i, j in itertools.product(range(n), repeat=3):
M[i][j] = min(M[i][j], M[i][k] + M[k][j])
# Consider best proposal
# ...
Once we’ve computed the distances between cities, our next step would be to evaluate each of the proposals and the improvements they offer to the entire system.
Evaluating Possible New Highways
We can think of each proposed highway p
in P
as a new intermediate node that we’re introducing. The total distance savings obtained from introducing a new highway will be measured as the sum of the savings earned for each (i, j)
pair.
Highway = collections.namedtuple('Highway', ('x', 'y', 'dist'))
def find_best_proposal(H, P, n):
M = [[math.inf for _ in range(n)] for _ in range(n)]
# Run Floyd-Warshall
# ...
# Consider best proposal
best_proposal, best_savings = None, None
for p in P:
total_savings = 0
for i, j in itertools.product(range(n), repeat=2):
# Update savings for every (i, j) pair
savings = M[i][j] - (M[i][p.x] + p.dist + M[i][p.y])
total_savings += max(savings, 0)
if not best_savings or total_savings > best_savings:
best_proposal, best_savings = p, total_savings
return best_proposal
Once we’re done iterating through all proposals, we can simply return the proposal that provides our network with the highest amount of savings.
Given |V|
nodes, our algorithm runs in O(|V|^2)
space and O(|V|^3)
time.
With that, we’ve solved the Road Network Optimisation problem 🛣.
Full Solution
Highway = collections.namedtuple('Highway', ('x', 'y', 'dist'))
def find_best_proposal(H, P, n):
# Computing existing shortest distances
M = [[math.inf for _ in range(n)] for _ in range(n)]
for i in range(n):
M[i][i] = 0
for h in H:
M[h.x][h.y] = M[h.y][h.x] = h.dist
for k, i, j in itertools.product(range(n), repeat=3):
M[i][j] = min(M[i][j], M[i][k] + M[k][j])
# Evaluating proposals
best_proposal, best_savings = None, None
for p in P:
total_savings = 0
for i, j in itertools.product(range(n), repeat=2):
savings = M[i][j] - (M[i][p.x] + p.dist + M[p.y][j])
total_savings += max(savings, 0)
if not best_savings or total_savings > best_savings:
best_proposal, best_savings = p, total_savings
return best_proposal