Disjoint Set Union

The Disjoint Set Union (DSU) is a data structure used to store disjoint sets. Two sets are considered disjoint if they don’t have any elements in common. 

The DSU supports mainly the following operations:

  • Create a new set with one element in it
  • Merge two disjoint sets
  • Find the representative of the set that contains a certain element x. (Each set will have a representative, or leader of the set) 

This data structure will allow you to do such operations in around O(1) time complexity, and also allow you to store data on these sets such as the sizes. 

The disjoint sets inside of the data structure can be maintained as trees. Each tree will correspond to a set with each tree having a root (which is also the representative of the set) 

A new set will start as a single vertex as shown: 

To begin efficiently merging trees, we need to maintain for each node (or element in a set), its parent in the corresponding tree. If a node is the root of a tree, then its parent is itself. Recall that every node in a tree has exactly one parent. 

Now using the parent representation, we can easily merge naively. Imagine we merge the set containing {2} to the set containing {1}. We can simply set the parent of 2 to be 1 and create a new tree as shown. 

To find the representative of a tree, we keep traversing to the parent until we reach the root (in which case its parent will be itself). 

void newNode(int v){
    parent[v] = v;
}

int getRepresentative(int v){
    if(v == parent[v]){
        return v;
    }
    return getRepresentative(parent[v]);
}

void merge(int u, int v){
    u = getRepresentative(u);
    v = getRepresentative(v);
    if(u != v){ //don’t merge if they’re already in the same set
        parent[u] = v;
    }
}

Notice that we only change the parents of the root of each tree. 

However, this implementation is inefficient as it can degenerate to O(N) worst case per call. (Imagine a tree that is a long chain). 

Path Compression 

When we call getRepresentative(v), we are technically finding the root of the tree for all nodes on the path from v to the root. 

If 1 is the root and we call getRepresentative(4), we will traverse 4 -> 3 -> 2 -> 1. 

To speed up each call of getRepresentative(v) on average, we can change the parents of 4, 3, and 2 all to be 1 as we recursively move up the tree. If we do that we will end up with a tree like so:

Notice how the tree got flatter. By doing so, we can reduce our time complexity to around logN per call. 

int getRepresentative(int v){
    if(v == parent[v]){
        return v;
    }
    return parent[v] = getRepresentative(parent[v]);
}

Union by Size

We can further make optimizations to our DSU by now changing the merge function. Before, we arbitrarily selected which tree to merge into the other. Now we will strategically choose the smaller one to merge into the larger one (by size). If we do this correctly, the time complexity should reduce down to on average O(a(N)) where a(x) is the inverse Ackermann function, which grows so slow that a(x) < 4 for x < 10^500. 

void newNode(int v){
    parent[v] = v;
    compSize[v] = 1;
}

int getRepresentative(int v){
    if(v == parent[v]){
        return v;
    }
    return parent[v] = getRepresentative(parent[v]);
}

void merge(int u, int v){
    u = getRepresentative(u);
    v = getRepresentative(v);
    if(u == v){ //don’t merge if they’re already in the same set
        return;
    }
    if(compSize[u] > compSize[v])swap(u, v); //ensure u is smaller size
    compSize[v] += compSize[u]; //add all of u into v
    parent[u] = v; //set new parent
}

Example Problem: https://dmoj.ca/problem/ds2

This problem asks us to find a spanning tree in a graph, or to tell us if the graph is disconnected.

To solve, we simply iterate over the edges while maintaining a DSU to check if two nodes are in the same component already or not. If they aren’t, then we add the current edge to the graph and merge the two nodes. At the end, check if all nodes are in the same component (or we can check if we added N-1 edges as a tree will always have N-1 edges). 

#include<bits/stdc++.h>
using namespace std;

int parent[100001];
int compSize[100001];

void newNode(int v){
    parent[v] = v;
    compSize[v] = 1;
}

int getRepresentative(int v){
    if(v == parent[v]){
        return v;
    }
    return parent[v] = getRepresentative(parent[v]);
}

void merge(int u, int v){
    u = getRepresentative(u);
    v = getRepresentative(v);
    if(u == v){ //don’t merge if they’re already in the same set
        return;
    }
    if(compSize[u] > compSize[v])swap(u, v); //ensure u is smaller size
    compSize[v] += compSize[u]; //add all of u into v
    parent[u] = v; //set new parent
}

int main(){
    int N, M; cin >> N >> M;
    for(int i = 1; i <= N; i++){
        newNode(i);
    }
    vector<int> answer;
    for(int i = 0; i < M; i++){
        int u, v; cin >> u >> v;
        if(getRepresentative(u) == getRepresentative(v)){
            continue; //already in same set
        }
        merge(u, v);
        answer.push_back(i+1);
    }
    if(answer.size() != N – 1){
        cout << “Disconnected Graph” << endl;
    }else{
        for(int i : answer){
            cout << i << ‘\n’;
        }
    }

}

Extensions:

  • Checking bipartiteness online