Segment Tree -

The segment tree is the most commonly used data structure in competition programming. It is also extremely versatile and able to do many different operations that can hugely simplify many data structure problems. It is mainly used to do range update and range query.

Let’s start with the most basic version of the segment tree: range sum query

The segment tree employs a divide and conquer idea: Let’s repeatedly split the array into half and recursively repeat. The segment tree is essentially a binary tree.

Assume we have an array A of length 8. The bottom row of the binary tree is the original array (A[1], A[2] … ). The ancestors of these nodes show the ranges of which we store answers: [1, 2] stores A[1] + A[2], [5, 6] stores A[5] + A[6], [5,8] stores A[5] + A[6] + A[7] + A[8] … etc.

Of course if the array isn’t a power of 2 we won’t have a perfect binary tree, but we can technically just pad null elements to the end of the array until we reach a power of 2.

Building this tree is simple enough, we can use recursion.

Now how do we answer queries?

We will traverse the segment tree. Assume the current interval that the segment tree covers is [lo, hi]. We have a few cases. If the queried range [l, r] matches [lo, hi], we can simply return the current sum and terminate.

Otherwise either the queried range is completely contained in the left/right interval or its split between both children.

Assume we query [3, 7]. We can see that at the end, only the nodes we circled will end up matter ([3, 4] [5, 6] and [7]). Remember that each interval is split in half, so we will have at most depth logN. In addition, we will visit at most 4 nodes per level (this can be proved with induction). Hence the time complexity of such a query is O(logN).

An update is much more straightforward. Say we want to update index 3, notice that only

the highlighted path needs to be modified (logN nodes).

To update, we start from node 3 and traverse towards the root, each time updating the current node and merging the left and right children.

const int DEFAULT = 0;
vector<int> segtree;

int combine(int a, int b){return a+b;} //combine 2 children function

void build(int v, int l, int r, vector<int> &arr){
if(l == r){
segtree[v] = arr[l]; //bottom layer of segment tree
return;
}
int mid = (l + r) / 2;
build(v * 2, l, mid, arr); //build left subtree
build(v * 2 + 1, mid + 1, r, arr); //build right subtree
segtree[v] = combine(segtree[v * 2], segtree[v * 2 + 1]); //merge
}

void update(int v, int l, int r, int pos, int val){
if(l == r){
segtree[v] = val; //pos == l == r update bottom layer
return;
}
int mid = (l + r) / 2;
if(mid <= pos){
update(v * 2, l, mid, pos, val); //if our update position on left
}else{
update(v * 2 + 1, mid + 1, r, pos, val); //update on the right
}
segtree[v] = combine(segtree[v * 2], segtree[v * 2 + 1]); //combine
}

int query(int v, int l, int r, int lo, int hi){
if(hi < l || r < lo)return DEFAULT; //out of query range
if(l >= lo && r <= hi)return segtree[v]; //completely inside query range
int mid = (l + r) / 2;
return combine(query(v*2, l, mid, lo, hi), query(v * 2 + 1, mid + 1, r, lo, hi));
}

Note that a segment tree supports any associative function, it could be range GCD, range min/max. DEFAULT is an initial value that has no effect on our answer.

Walking a Segment Tree

It is also entirely possible to solve the following problem: Find the first element of the subarray [L,R] that has value >= x.

We can do this by storing a segment tree that supports maximum range queries. Then we can “walk” down the nodes of the segment tree as following:

//first index such that a[i] >= x
int walk(int v, int l, int r, int lo, int hi, int x){
if(l > hi || r < lo)return -1; //out of range doesn’t exist
if(segtree[v] < x)return -1; //the maximum element in our current range < x, so there isn’t possibly an element >= x in this range so return impossible
if(l == r)return l; //reached the bottom layer of the segtree
int mid = (l + r) / 2;
int left = walk(v * 2, l, mid, lo, hi, x);
if(left != -1)return left; //found in our left child, return left answer
return walk(v * 2 + 1, mid + 1, r, lo, hi, x); //answer must be in right child
}

This can easily be adapted to finding the first element <=, finding the right most element, etc.

Practice:

https://dmoj.ca/problem/cco13p2

https://dmoj.ca/problem/stp1

https://dmoj.ca/problem/ds3