## MO’s algorithm

The name doesn’t tell anything about this algorithm, but what we need to learn is the beautiful thing that it can do. There are two versions of this algorithm.

First one is pretty simple, you’re given an array and queries. Each query asks to give the sum in the range l to r. This could be sum or mode or min or anything depending on the skills we have after practicing this first. I am taking sum.

The naïve approach would be simple to iterate over all l to r which has a worst-case complexity O(n*q) where n is the length of the query and q is the number of queries.

This isn’t feasible when the constraints are high.

Can we think of an O(n*logn) solution? Yes, that is what segment trees can do.

Then why are we studying this algorithm? Because it is little easier to implement and understand as well. And in this version, it is possible to implement almost all the problems using segment trees.

So, the idea is to use our array into some blocks, the number of blocks would be ceil(sqrt(n)).

Now suppose we have the answer for each of the block and also suppose that the queries are of the form which have l and r as multiples of sqrt(n). That’s it! The answer for these kinds of queries would be just prefixSum[r]-prefixSum[l-1], (prefixSum[i] is the sum of elements of 0 to i blocks)

Now what is left behind. If we are given queries in the form of l and r where l and r are not multiples of sqrt(n), Will finding the maximum part of it through the blocks and then doing the remaining part naively would work? Yes it would because naively it can go atmax sqrt(n) steps because otherwise it would be in the maximum part.

So we get a complexity of O(q*sqrt(n))

Note: if there is an update operation, we can perform it and process the queries online, an update operation would take sqrt(n) time.

C++ code:

The second version is bit tricky. Suppose we are given an array and again queries, and each query asks the number of values which are repeated at least 3 times in the range l to r.

This isn’t easy to implement with segment trees and would require a lot of thinking if u are familiar with seg trees. But in this version of Mo’s algorithm it provides a beautiful solution to this.

Let’s see how it goes.

Note: this time we won’t be doing update operation and we will also realise why its not good.

Also, since this time no updation is happening, we can store all the queries with their indices.

Now we call these queries offline and we will be reordering them in a special sorting order and that’s it. This completes the Mo’s algo.

That sounds a little awkward, but now let’s see what kind of sorting it is.

First of all we will divide the blocks again into sqrt(n) sizes, and in each block we will put all those queries which have their L value lying in that block. The next step is to again sort those queries but this time based on their R value.

Now lets just find the answer for this block.
1. Start iterating over the queries, take two pointers curLeft and curRight, fix these two variables as L and R respectively for the current query.
2. And then move from curLeft to curRight and find the answer(I’m not explaining how to find the answer because that’s obvious), Since we know the answer in that range, we will now move to the next query.
3. Now, set the curRight to current right of our query. And the curLeft to the current left of our query.
4. Since the curRight will always be on right, it will keep on moving to the for that particular block. And Since curLeft can go in any arbitrary direction but not go outside this block because we had queries which have L belonging to this block.
5. So curLeft would atmost vary sqrt(n) in each step and the right point may go upto length N but in a single run only.

So for each block the time goes like O(n+sqrt(n)).

And we have sqrt(n) number of blocks, so the overall complexity would be like O(N*sqrt(N)) depending on the implementation and the question.

It won’t be a difficult task to implement the same as we only need to sort the queries and then process it. I’ll leave that to the reader’s task.

Lets dry run it over a test case.

Suppose we have array A=[1,1,2,1,3,2,6,3,1,6,2,6,6,3,1,1]

1. First decompose it in blocks: [1,1,2,1][3,2,6,3][1,6,2,6][6,3,1,1]

2. Now we have some queries like: Assuming 1 based indexing, (1,14),(10,13),(3,11),(7,12),(5,16),(9,16)

3. Now sort these queries based on their starting points and divide them in their corresponding blocks.

4. Sorted queries will look like [(1,14),(3,11)] , [(5,16),(7,12)] , [(9,16),(10,13)]

5. Now they are to be sorted by their ending points in their blocks only.

6. It will look like, [(3,11),(1,14)] , [(7,12),(5,16)] , [(10,13),(9,16)]

7. Now we will start our algorithm on above uniquely sorted array that will result a time complexity of just n*sqrt(n).

8. Take first query, we set our curLeft and curRight at 3, now curRight will go till 11 in our actual array and we mark our answer as 1 since ‘2’ occurred one time in that range.

9. For second query, we move our curLeft to 1, then curRight to 14 and then update our answer to 4 as ‘1’ occurred 4 times, ‘2’ occurred 3 times, ‘3’ occurred 3 times, ‘6’ occurred 4 times which is atleast 3 times.

10. As we can see, for each block curLeft will only move in sqrt(n) region and curRight will move in n range. So for each block the time is sqrt(n)+n.

Similarly for all the sqrt(n) blocks, we have our complexity as of order sqrt(n)*((sqrt(n)+n)+nlogn.