Minimum Spanning Tree Algorithm

Given a list of connections, each represented by the Connection class (the names of the two cities at the ends of an edge and the cost of that edge), find a subset of edges that connects all the cities at the lowest total cost.
Return those connections if all the cities can be connected; otherwise return an empty list.

Return the connections sorted by cost. Break ties by city1 name, and if city1 is also the same, by city2 name.
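The ordering rule above maps directly onto a chained Comparator (Java 8+). This is a minimal sketch that assumes a Connection class with the fields from the problem's definition; the ConnectionOrder class and its ORDER field are illustrative names. Comparator.comparingInt compares costs with Integer.compare, so unlike subtracting costs it cannot overflow.

```java
import java.util.*;

class ConnectionOrder {
    // Primary key: cost; ties broken by city1, then city2 (String natural order).
    static final Comparator<Connection> ORDER =
            Comparator.comparingInt((Connection c) -> c.cost)
                      .thenComparing(c -> c.city1)
                      .thenComparing(c -> c.city2);

    public static void main(String[] args) {
        List<Connection> list = new ArrayList<>(Arrays.asList(
                new Connection("Bcity", "Ccity", 1),
                new Connection("Acity", "Dcity", 1),
                new Connection("Acity", "Ccity", 1),
                new Connection("Acity", "Bcity", 0)));
        list.sort(ORDER);
        // prints [[Acity,Bcity,0], [Acity,Ccity,1], [Acity,Dcity,1], [Bcity,Ccity,1]]
        System.out.println(list);
    }
}

// Mirrors the Connection definition given with the problem.
class Connection {
    String city1, city2;
    int cost;

    Connection(String city1, String city2, int cost) {
        this.city1 = city1;
        this.city2 = city2;
        this.cost = cost;
    }

    @Override
    public String toString() {
        return "[" + city1 + "," + city2 + "," + cost + "]";
    }
}
```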

Example

Given the connections = ["Acity","Bcity",1], ["Acity","Ccity",2], ["Bcity","Ccity",3]

Return ["Acity","Bcity",1], ["Acity","Ccity",2]

This problem is the classical minimum spanning tree (MST) problem. There are two greedy algorithms available: Prim's and Kruskal's.

Solution 1. Kruskal's algorithm with Union Find.

The core idea: as long as we have not spanned all vertices (cities), we keep picking the cheapest remaining edge e = (u, v) whose ends lie in different components, i.e. u is in some connected component X and v is NOT in X. A union find data structure keeps track of these components. When we pick an edge (connection), there are 2 cases.

a. city1 and city2 are already connected: the edge does not satisfy the condition that city1 is in X and city2 is NOT in X (adding it would create a cycle), so this connection is ignored.

b. city1 and city2 are not connected: the edge satisfies the above condition. Because connections are examined in increasing order of cost, it is the cheapest edge that meets the condition, so we select this connection, add it to the final result, and merge the two components.
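To make the two cases concrete, the sketch below runs the example connections (already in cost order) through a tiny union find keyed directly by city name and labels each edge with the case it falls into. Keying by name is a simplification for illustration only; the solution's code maps names to integer indices instead. CaseTrace and classify are hypothetical names.

```java
import java.util.*;

class CaseTrace {
    // Root of x in a union find keyed by city name, with path compression.
    static String find(Map<String, String> parent, String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        if (p.equals(x)) {
            return x;
        }
        String root = find(parent, p);
        parent.put(x, root); // point x straight at its root
        return root;
    }

    // Labels each edge "a" (ends already connected: skip) or "b" (take it and merge).
    // Assumes the edges are already sorted by cost.
    static List<String> classify(String[][] sortedEdges) {
        Map<String, String> parent = new HashMap<>();
        List<String> cases = new ArrayList<>();
        for (String[] e : sortedEdges) {
            String r1 = find(parent, e[0]);
            String r2 = find(parent, e[1]);
            if (r1.equals(r2)) {
                cases.add("a");
            } else {
                parent.put(r1, r2); // merge the two components
                cases.add("b");
            }
        }
        return cases;
    }

    public static void main(String[] args) {
        // The example connections, in cost order 1, 2, 3.
        String[][] sorted = {{"Acity", "Bcity"}, {"Acity", "Ccity"}, {"Bcity", "Ccity"}};
        System.out.println(classify(sorted)); // [b, b, a]
    }
}
```

The third connection (cost 3) lands in case a because Bcity and Ccity are already joined through Acity, which is why it is absent from the expected result.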

Based on the above analysis, we have the following algorithm.

1. Sort the connections so that cheaper connections come first.

2. Create a mapping between city names and union find indices, since union find works best with integer indices. Assuming there are n different cities, use the numbers 0 to n - 1. While the mapping is being built, the next available index also equals the number of cities mapped so far, so once the mapping is done this idx variable holds the total number of nodes (cities). This count is what lets us detect a city that is disconnected from all the others. If all n cities are connected, the MST contains exactly n - 1 edges, none of which introduces a cycle. If some part of the graph is disconnected from the rest, we must have accepted fewer than n - 1 connections, because n - 1 cycle-free edges over n cities always connect all of them.

3. Iterate over all connections. For each connection whose ends are not yet connected in the union find, add it to the final result and connect both ends in the union find.

4. Check whether the final result contains n - 1 connections. If it doesn't, return an empty list to indicate that the given connections leave some cities disconnected.
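Steps 2 and 4 can be sketched in isolation. In this minimal sketch the hypothetical helper buildIndex assigns indices in first-seen order, using the map's current size as the next free index (equivalent to the idx counter), and spansAll counts the cycle-free edges and compares the count with n - 1. Sorting is left out because it only affects which edges are chosen, not whether a spanning tree exists. The find here uses path halving, a variant of path compression; all names are illustrative.

```java
import java.util.*;

class CityIndex {
    // Step 2: give each city an index 0..n-1 in first-seen order. The map's current
    // size plays the role of the idx counter, so afterwards size() == n.
    static Map<String, Integer> buildIndex(String[][] edges) {
        Map<String, Integer> index = new HashMap<>();
        for (String[] e : edges) {
            index.putIfAbsent(e[0], index.size());
            index.putIfAbsent(e[1], index.size());
        }
        return index;
    }

    // Union find root lookup with path halving.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Steps 3-4: count cycle-free edges; all cities are connected iff count == n - 1.
    static boolean spansAll(String[][] edges) {
        Map<String, Integer> index = buildIndex(edges);
        int n = index.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        int accepted = 0;
        for (String[] e : edges) {
            int r1 = find(parent, index.get(e[0]));
            int r2 = find(parent, index.get(e[1]));
            if (r1 != r2) {
                parent[r1] = r2;
                accepted++;
            }
        }
        return accepted == n - 1;
    }

    public static void main(String[] args) {
        String[][] example = {{"Acity", "Bcity"}, {"Acity", "Ccity"}, {"Bcity", "Ccity"}};
        String[][] split = {{"Acity", "Bcity"}, {"Ccity", "Dcity"}};
        System.out.println(spansAll(example)); // true: 3 cities, 2 accepted edges
        System.out.println(spansAll(split));   // false: 4 cities, only 2 accepted edges
    }
}
```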

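Runtime/Space complexity: O(m * log m + 2 * m + n) ~ O(m * log m) runtime; O(n) space for the mapping and the union find, plus the sort's auxiliary space.

Assuming there are n different cities and m different edges,

1. sorting: O(m * log m) runtime. An in-place quick sort would need only O(log m) extra space, but Java's Collections.sort is a stable merge sort (TimSort) that can use up to O(m) temporary space.

2. mapping: O(m) expected runtime (hash map lookups and inserts), O(n) space.

3. union find creation: O(n) runtime, O(n) space.

4. connections iteration: O(m) runtime if find and connect are treated as constant time. With path compression alone, as in the code below, each operation costs O(log n) amortized; adding union by rank lowers this to O(α(n)), where α is the inverse Ackermann function, which is effectively constant. Either way this step is dominated by the sort.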
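The UnionFind in the code below uses path compression only. For comparison, here is a sketch of the variant that also applies union by rank, which is the version the near-constant O(α(n)) bound refers to. RankedUnionFind, union and count are hypothetical names. Returning a boolean from union lets a Kruskal loop test and merge in a single call.

```java
class RankedUnionFind {
    private final int[] parent;
    private final int[] rank;   // upper bound on each root's tree height
    private int components;

    RankedUnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        components = n;
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
    }

    // Finds the root, then points every node on the path directly at it.
    int find(int x) {
        int root = x;
        while (parent[root] != root) {
            root = parent[root];
        }
        while (parent[x] != root) {
            int next = parent[x];
            parent[x] = root;
            x = next;
        }
        return root;
    }

    // Returns true if a and b were in different sets (the edge is accepted).
    boolean union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) {
            return false;
        }
        if (rank[ra] < rank[rb]) {   // make ra the taller tree
            int t = ra;
            ra = rb;
            rb = t;
        }
        parent[rb] = ra;             // attach the shorter tree under the taller one
        if (rank[ra] == rank[rb]) {
            rank[ra]++;
        }
        components--;
        return true;
    }

    int count() {
        return components;
    }

    public static void main(String[] args) {
        RankedUnionFind uf = new RankedUnionFind(3); // Acity=0, Bcity=1, Ccity=2
        System.out.println(uf.union(0, 1)); // true
        System.out.println(uf.union(0, 2)); // true
        System.out.println(uf.union(1, 2)); // false: would close a cycle
        System.out.println(uf.count());     // 1
    }
}
```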

import java.util.*;

/**
 * Definition for a Connection.
 * public class Connection {
 *   public String city1, city2;
 *   public int cost;
 *   public Connection(String city1, String city2, int cost) {
 *       this.city1 = city1;
 *       this.city2 = city2;
 *       this.cost = cost;
 *   }
 * }
 */
class UnionFind {
    int[] father;
    UnionFind(int n) {
        father = new int[n];
        for (int i = 0; i < n; i++) {
            father[i] = i; // every city starts as its own component
        }
    }
    // Returns the root of x, pointing every node on the path directly at the root.
    int find(int x) {
        if (father[x] == x) {
            return x;
        }
        return father[x] = find(father[x]);
    }
    // Merges the components containing a and b (no-op if already merged).
    void connect(int a, int b) {
        int root_a = find(a);
        int root_b = find(b);
        if (root_a != root_b) {
            father[root_a] = root_b;
        }
    }
}
public class Solution {
    // Order by cost, then city1, then city2. Integer.compare avoids the overflow
    // that c1.cost - c2.cost could hit with extreme cost values.
    private Comparator<Connection> comp = new Comparator<Connection>() {
        public int compare(Connection c1, Connection c2) {
            if (c1.cost != c2.cost) {
                return Integer.compare(c1.cost, c2.cost);
            } else if (!c1.city1.equals(c2.city1)) {
                return c1.city1.compareTo(c2.city1);
            }
            return c1.city2.compareTo(c2.city2);
        }
    };
    public List<Connection> lowestCost(List<Connection> connections) {
        List<Connection> mst = new ArrayList<Connection>();
        if (connections == null || connections.isEmpty()) {
            return mst;
        }
        // Step 1: cheapest connections first (sorts the input list in place).
        Collections.sort(connections, comp);
        // Step 2: map each city name to an index 0..n-1; afterwards idx == n.
        int idx = 0;
        Map<String, Integer> strToIdxMap = new HashMap<String, Integer>();
        for (Connection c : connections) {
            if (!strToIdxMap.containsKey(c.city1)) {
                strToIdxMap.put(c.city1, idx++);
            }
            if (!strToIdxMap.containsKey(c.city2)) {
                strToIdxMap.put(c.city2, idx++);
            }
        }
        // Step 3: accept a connection only if it joins two different components.
        // Accepted edges arrive in sorted order, so mst is already in output order.
        UnionFind uf = new UnionFind(idx);
        for (Connection c : connections) {
            int city1Root = uf.find(strToIdxMap.get(c.city1));
            int city2Root = uf.find(strToIdxMap.get(c.city2));
            if (city1Root != city2Root) {
                mst.add(c);
                uf.connect(city1Root, city2Root);
            }
        }
        // Step 4: a spanning tree over n cities has exactly n - 1 edges.
        if (mst.size() < idx - 1) {
            return new ArrayList<Connection>();
        }
        return mst;
    }
}

Solution 2. Prim's greedy MST algorithm with an adjacency list graph representation and a priority queue.

This solution runs in O(m * log n) time and uses O(n + m) space for the graph representation built from the input connections and for the heap. However, it is less convenient than Solution 1 because it needs extra preprocessing (converting the list of connections to a graph) and postprocessing (converting the MST into a list of connections sorted in the required order).
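java.util.PriorityQueue has no decrease-key operation, and its remove(Object) and contains(Object) take linear time. A common workaround, sketched below, is lazy deletion: push candidate edges freely and discard a polled edge whose two ends are already in the tree. This is a swapped-in technique rather than the heap-based implementation the text describes; it costs O(m * log m) time and O(m) heap space instead of O(m * log n). The sketch assumes a Connection class with the problem's fields; LazyPrim, mst and OUTPUT_ORDER are hypothetical names. It includes the pre- and postprocessing mentioned above.

```java
import java.util.*;

class LazyPrim {
    // Ordering required by the problem: cost, then city1, then city2.
    static final Comparator<Connection> OUTPUT_ORDER =
            Comparator.comparingInt((Connection c) -> c.cost)
                      .thenComparing(c -> c.city1)
                      .thenComparing(c -> c.city2);

    static List<Connection> mst(List<Connection> connections) {
        // Preprocessing: adjacency list keyed by city name.
        Map<String, List<Connection>> adj = new HashMap<>();
        for (Connection c : connections) {
            adj.computeIfAbsent(c.city1, k -> new ArrayList<>()).add(c);
            adj.computeIfAbsent(c.city2, k -> new ArrayList<>()).add(c);
        }
        List<Connection> result = new ArrayList<>();
        if (adj.isEmpty()) {
            return result;
        }
        Set<String> inTree = new HashSet<>();
        // Min-heap of candidate edges; every queued edge has at least one end in the tree.
        PriorityQueue<Connection> pq =
                new PriorityQueue<>(Comparator.comparingInt((Connection c) -> c.cost));
        String start = connections.get(0).city1;
        inTree.add(start);
        pq.addAll(adj.get(start));
        while (!pq.isEmpty() && inTree.size() < adj.size()) {
            Connection c = pq.poll();
            boolean in1 = inTree.contains(c.city1);
            boolean in2 = inTree.contains(c.city2);
            if (in1 && in2) {
                continue; // stale entry: discarded lazily instead of removed from the heap
            }
            String next = in1 ? c.city2 : c.city1;
            inTree.add(next);
            result.add(c);
            for (Connection e : adj.get(next)) {
                String other = e.city1.equals(next) ? e.city2 : e.city1;
                if (!inTree.contains(other)) {
                    pq.add(e);
                }
            }
        }
        if (inTree.size() < adj.size()) {
            return new ArrayList<>(); // some city is unreachable
        }
        // Postprocessing: return the edges in the required order.
        result.sort(OUTPUT_ORDER);
        return result;
    }

    public static void main(String[] args) {
        List<Connection> example = Arrays.asList(
                new Connection("Acity", "Bcity", 1),
                new Connection("Acity", "Ccity", 2),
                new Connection("Bcity", "Ccity", 3));
        for (Connection c : mst(example)) {
            System.out.println(c.city1 + " " + c.city2 + " " + c.cost);
        }
    }
}

// Same fields as the Connection class described in the problem.
class Connection {
    String city1, city2;
    int cost;

    Connection(String city1, String city2, int cost) {
        this.city1 = city1;
        this.city2 = city2;
        this.cost = cost;
    }
}
```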

This implementation uses a heap that supports O(log n) removal of an arbitrary node, given that it already knows the reference to (or position of) the node to be removed. The heap also needs to support O(1) lookup for checking whether a vertex is in the heap or not.
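A minimal sketch of such a heap is an indexed binary min-heap: a position array pos[], indexed by vertex, gives the O(1) membership test and lets decreaseKey and remove locate a node's slot without searching, so both run in O(log n). It assumes vertices are hypothetical integer ids 0..n-1 (for example, produced by a city-to-index mapping like the one in Solution 1) with int priorities; IndexedMinHeap and its method names are illustrative, not the author's implementation. In Prim's algorithm, key[v] would hold the cheapest known edge cost from the tree to v, and finding a cheaper edge triggers decreaseKey.

```java
import java.util.Arrays;

class IndexedMinHeap {
    private final int[] heap; // heap[i] = vertex stored in slot i
    private final int[] pos;  // pos[v] = slot of vertex v, or -1 if absent
    private final int[] key;  // key[v] = current priority of v
    private int size;

    IndexedMinHeap(int capacity) {
        heap = new int[capacity];
        pos = new int[capacity];
        key = new int[capacity];
        Arrays.fill(pos, -1);
    }

    boolean isEmpty() { return size == 0; }
    boolean contains(int v) { return pos[v] != -1; } // O(1) membership test

    void insert(int v, int k) {
        key[v] = k;
        heap[size] = v;
        pos[v] = size;
        siftUp(size++);
    }

    // Lowers v's priority in place; pos[] finds v's slot directly: O(log n).
    void decreaseKey(int v, int k) {
        key[v] = k;
        siftUp(pos[v]);
    }

    int extractMin() {
        int min = heap[0];
        swap(0, --size);
        pos[min] = -1;
        if (size > 0) siftDown(0);
        return min;
    }

    // Arbitrary removal: move the last element into v's slot, then restore order.
    void remove(int v) {
        int i = pos[v];
        swap(i, --size);
        pos[v] = -1;
        if (i < size) {
            siftUp(i);
            siftDown(i);
        }
    }

    private void siftUp(int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (key[heap[i]] >= key[heap[parent]]) break;
            swap(i, parent);
            i = parent;
        }
    }

    private void siftDown(int i) {
        while (true) {
            int l = 2 * i + 1, r = l + 1, smallest = i;
            if (l < size && key[heap[l]] < key[heap[smallest]]) smallest = l;
            if (r < size && key[heap[r]] < key[heap[smallest]]) smallest = r;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    private void swap(int i, int j) {
        int a = heap[i], b = heap[j];
        heap[i] = b;
        heap[j] = a;
        pos[b] = i;
        pos[a] = j;
    }

    public static void main(String[] args) {
        IndexedMinHeap h = new IndexedMinHeap(4);
        h.insert(0, 5);
        h.insert(1, 3);
        h.insert(2, 8);
        h.insert(3, 1);
        h.decreaseKey(2, 0);
        System.out.println(h.extractMin()); // 2
        h.remove(3);
        System.out.println(h.extractMin()); // 1
        System.out.println(h.contains(0));  // true
    }
}
```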