Small to Large

Let’s assume we have the following problem: You are given a rooted tree where each node has a color. You want to count, for each subtree, the number of distinct colors that are within that subtree. 

A naive solution is to DFS through each subtree, at each subtree you store a set containing all the distinct colors. Then you can merge the subtrees of your children together. The code resembles something like this: 

const int N = 1000;
int ans[N+1];
set<int> colors[N+1];

int color[N+1];

void dfs(int v, int p){
    colors[v].insert(color[v]);
    for(int u : adj[v]){
        if(u == p)continue;
        dfs(u, v);
        for(int c : colors[u]){
            colors[v].insert(c);
        }
    }
    ans[v] = colors[v].size();
}

int main(){

    /*
    code to read input
    */
    dfs(1, -1);
    for(int i = 1; i <= N; i++){
        cout << ans[i] << ‘\n’;
    }
}

This code will have a worst case of O(N^2logN). An example of this is if the tree is a long chain. 

It turns out we can speed this code up significantly to reach O(Nlog^2N) worst case. The name small to large implies we should always merge the smaller set into the larger one. It turns out if we only merge smaller sets into the larger one, we will achieve worst case O(Nlog^2N) time complexity. The proof involves proving that each node color will only move at most O(logN) times. 

const int N = 1000;
int ans[N+1];
set<int> colors[N+1];

int color[N+1];

void dfs(int v, int p){
    colors[v].insert(color[v]);
    for(int u : adj[v]){
        if(u == p)continue;
        dfs(u, v);
        if(colors[u].size() > colors[v].size()){
            swap(colors[u], colors[v]); //ensure that colors[u].size() is smaller hence small to large
        }
        for(int c : colors[u]){
            colors[v].insert(c);
        }
    }
    ans[v] = colors[v].size();
}

int main(){

    /*
    code to read input
    */
    dfs(1, -1);
    for(int i = 1; i <= N; i++){
        cout << ans[i] << ‘\n’;
    }
}