Quick Median
Written by Mike James   
Friday, 11 August 2017

You have probably heard of Quicksort but what about Quick Median? This is another of the many partitioning algorithms that work in clever ways to do things faster. Quick Median is a useful and instructive algorithm and it was invented by C.A.R. Hoare, who also invented Quicksort.

Algorithms that take a "divide and conquer" approach tend to reduce run times from something polynomial to something involving log2n. These algorithms are incredibly useful but they can be hard to invent, so the more of them you know the better. Quick Median has the advantage of being easier to understand than Quicksort.

Finding the middle

A friend of mine was writing a program to find the median.

What's the median?

The simple answer is that it is the value in the list that is just bigger than half of the values in the list. In a sense it is a middle value, and it is important not only in statistics but also in lots of everyday programming tasks.

To be more precise, it is a value in the list that is bigger than or equal to half of the values and smaller than or equal to the other half.

If we use integer arithmetic with truncation, i.e. 3/2 is 1, then for a list with an odd number of elements the median is the n/2 element of the sorted list. That is, for 11 elements the median is in L[5], assuming the first element is L[0].

Of course this definition only works if there is an odd number of elements in the list. If there is an even number, the standard solution is to take the average of the two middle elements of the sorted list, i.e. those at n/2-1 and n/2, counting from zero.

For simplicity we can assume that the number of elements in the list is odd.

The most obvious way of finding the median of a set of numbers is to sort the list into order and then look at the one half way down the list. In other words, find the value that divides the list into two equal portions, one bigger than or equal to it and one smaller than or equal to it.

The only trouble is that sorting isn't a very fast operation but at least the algorithm is simple and stable.
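If speed doesn't matter, the sort-based approach takes only a few lines of C#. The following is a minimal sketch, assuming an odd number of elements; the method name FindMedianBySort is purely illustrative and not part of the article's code:

    using System;

    // A minimal sketch of the sort-based median, assuming an odd number of elements.
    static float FindMedianBySort(float[] data)
    {
        float[] sorted = (float[])data.Clone(); // work on a copy so the caller's array is untouched
        Array.Sort(sorted);                     // sorting costs O(n log n)
        return sorted[sorted.Length / 2];       // the element half way down the sorted list
    }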

Quick Median - A Partition

There is another way to find a median that is both fast and fascinating. If you are a collector of algorithms this is one you should have pinned on the wall. It was invented by C.A.R. Hoare and is closely related to Quicksort, another of his mind-boggling algorithms and the one he is best known for.

It is based on a partitioning method that moves elements of the array around to produce two regions - a left portion that is smaller than or equal to a given value x in the array, and a right portion that is larger than or equal to x.

 

  +------------------+------------------+
  |       <= x       |       >= x       |
  +------------------+------------------+

You can think up lots of methods of partitioning an array in this way but the simplest is:

  1. start a scan from the left of the array and stop when you find a value that is greater than or equal to x.
     You know it is in the wrong place, so remember where you got to,

     and

  2. start a scan from the right of the array and stop when you find a value that is smaller than or equal to x.
     You know it is in the wrong place, so swap it with the value that you found in step 1.

Then continue both scans from where they left off and repeat until the two scans meet.

That is:

            // i and j start at the two ends of the array: i = 0 and j = a.Length - 1;
            // x is a value picked from the array. This simple form assumes that
            // the value x occurs only once in the array.
            do
            {
                while (a[i] < x) i++;   // scan right for a value >= x
                while (a[j] > x) j--;   // scan left for a value <= x
                float t = a[i];         // swap the two out-of-place values
                a[i] = a[j];
                a[j] = t;
            } while (i < j);

 

When the two scans meet the array is correctly partitioned, because every element to the left of where they met is smaller than or equal to x - it would have been moved if it wasn't - and everything to the right is larger than or equal to x for the same reason. The element that both scans stop on is equal to x.

The only slightly mysterious part is why the scans eventually stop on an element that is equal to x.

This is difficult to understand.

One of the two scans will be the first to reach the element that currently holds the value x.

Suppose it is the first scan; the argument is the same, with the obvious adjustments, if it is the second scan.

Then the second scan will stop either on that same element or on an element a[j] that is smaller than x. Either way, a[i] and a[j] are swapped, and after the swap the second scan cannot make any progress because a[j] is now equal to x. Thus the x value acts as a sentinel and the second scan cannot pass it. Nor should it pass it, because it is looking for values smaller than x to move, and these belong to the left of x, which is where they already are.

The first scan can, however, continue, and if it finds a value greater than x it will stop and the two values will be swapped. After this the two scans can start moving again, but eventually we reach a position where one of the scans hits the x element again.

This process stops when both scans halt on the x element, which is the only element they can both stop on.

Notice that our diagram of the partitioned array is more accurately drawn:

  +---------------+-----+---------------+
  |      < x      | = x |      > x      |
  +---------------+-----+---------------+
                     ^
                    i, j

This element a[k]=x at the position where the scans meet has the property that all of the elements to its left are less than it and all of the elements to its right are greater than it. Put another way, counting positions from zero, there are k elements less than it to its left and n-k-1 elements greater than it to its right.
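To see the partition in action, here is a small self-contained sketch in C#. It assumes all the values are distinct and simply picks x from the middle of the array; the names PartitionDemo and PartitionAboutMiddle are illustrative only, not part of the article's code:

    using System;

    class PartitionDemo
    {
        // Returns the index where the two scans meet; a[result] == x afterwards.
        static int PartitionAboutMiddle(float[] a)
        {
            int i = 0;
            int j = a.Length - 1;
            float x = a[a.Length / 2];    // the value to partition about
            do
            {
                while (a[i] < x) i++;     // left scan stops on a value >= x
                while (a[j] > x) j--;     // right scan stops on a value <= x
                float t = a[i];           // swap the two out-of-place values
                a[i] = a[j];
                a[j] = t;
            } while (i < j);
            return i;                     // the scans meet on the element equal to x
        }

        static void Main()
        {
            float[] a = { 7, 2, 9, 4, 1, 8, 3 };
            int k = PartitionAboutMiddle(a);
            // Everything to the left of a[k] is smaller and everything to the right is larger
            Console.WriteLine("a[" + k + "] = " + a[k]);   // prints a[3] = 4
        }
    }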

Median Finding

This partitioning by scanning is a nice algorithm but what has it got to do with the median?

Well, imagine that you choose the value in the middle of the array and perform the scan. If by some chance the two portions produced were of equal size, x would be the median.

That is, if x is the median the partition is:

  +---------------+-----+---------------+
  |      < x      | = x |      > x      |
  +---------------+-----+---------------+
                     ^
                  half way

Of course you would be very lucky to have found the median in the middle of the array and in most cases this wouldn't work. If x was smaller than the median then the two scans would cross before the middle of the array:

 

  +-----+-----+---------------------------+
  | < x | = x |            > x            |
  +-----+-----+---------------------------+
                          ^
                       half way

In this case you can pick the element that is now in the middle of the array again.

This value must have changed from the previous time you picked it, otherwise the scans would have met in the middle and it would be the median.

This new value is used as x and the partitioning is repeated, but only on the right-hand portion.

You only have to re-partition the portion to the right because the portion to the left is already smaller than the new value of x - but can you see why?

It is because the new value of x is taken from the portion of the list that is larger than the old value of x, and so it must be larger than everything in the left-hand portion of the list.

If the initial value of x was bigger than the median the picture would be:

  +---------------------------+-----+-----+
  |            < x            | = x | > x |
  +---------------------------+-----+-----+
              ^
           half way

and by the same argument you can pick the new element in the middle and repeat the partitioning but only on the left-hand portion.

If you repeat this partitioning process you will eventually find a pair of scans that do meet in the middle, and the value of x there is the median.

I don't know about you, but I get an uneasy feeling about this algorithm even though I think I understand it.

If you really want to make sure that you understand the algorithm follow it to the point that you can understand how and why the value in the middle changes at each scan and why you only have to re-partition one of the portions.

It is enough to keep you awake at night!

Eventually the value stored at the middle of the array is the median, with the values it is bigger than or equal to on its left and the values it is smaller than or equal to on its right.
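Putting all of this together, here is a minimal sketch of the whole procedure in C#. It assumes the values are distinct and that there is an odd number of them; QuickMedianDemo and QuickMedian are illustrative names and this is a sketch of the idea rather than a polished implementation:

    using System;

    class QuickMedianDemo
    {
        static float QuickMedian(float[] a)
        {
            int L = 0;
            int R = a.Length - 1;
            int k = a.Length / 2;            // the median has to end up at position k
            while (L < R)
            {
                float x = a[k];              // pick the value currently in the middle
                int i = L;
                int j = R;
                do
                {
                    while (a[i] < x) i++;    // scan right for a value >= x
                    while (a[j] > x) j--;    // scan left for a value <= x
                    float t = a[i];          // swap the two out-of-place values
                    a[i] = a[j];
                    a[j] = t;
                } while (i < j);
                // The scans meet at position i with a[i] == x
                if (i == k) break;           // x landed exactly half way - it is the median
                if (i < k) L = i + 1;        // the median is in the right-hand portion
                else R = i - 1;              // the median is in the left-hand portion
            }
            return a[k];
        }

        static void Main()
        {
            float[] a = { 7, 2, 9, 4, 1, 8, 3 };
            Console.WriteLine(QuickMedian(a));   // prints 4
        }
    }

Notice that each pass only re-partitions one of the two portions, which is what makes this faster than sorting the whole list.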


