Quick Median
Written by Mike James   
Article Index
Quick Median
Divide and conquer in C#

You have probably heard of Quicksort but this is just one of many partitioning algorithms that work in clever ways to do things faster. Here we look at another of the algorithms devised by C.A.R. Hoare, the Quick Median.

 

Finding the middle

A friend of mine was writing a program to find the median.

What's the median?

The simple answer is that it is the value that is bigger than half of the values in a list. In a sense it is a middle value and it is important not only in statistics but also in lots of everyday programming tasks.

For example, if you want to choose a value to split a set of elements into two equal groups then the median is the break point.

The most obvious way of finding the median of a set of numbers is to sort the list into order and then look at the one half way down the list. In other words, find the value that divides the list into two equal portions one bigger and one small that it.  The only trouble is that sorting isn't a very fast operation but at least the algorithm is simple and stable.

There is another way to find a median that is both fast and fascinating. If you are a collector of algorithms this is one you should have pinned on the wall. It was invented by C.A.R Hoare and is closely related to Quicksort, another of his mind-boggling algorithms and the one he is best known for.

It is based on a partitioning method that moves elements of the array around to produce two regions - a left portion that is smaller than or equal to a given value x, and a right portion that is larger than or equal to x.

 

      <=x     
        =>x        

 

You can think up lots of methods of partitioning an array in this way but the simplest is:

  1. start a scan from the left of the array and stop when you find a value that isn't smaller than x.
    You know it is in the wrong place so remember where you got to and -
  2. start a scan from the right of the array and stop when you find a value that isn't bigger than x.
    You know it is in the wrong place so swap it with the value that you found in step 1.

Continue the scans from where they left off. When the two scans meet the array is correctly partitioned - because every element to the left of the where they met is smaller than or equal to x, it would have been moved if it wasn't and every thing to the right is larger than or equal to x for the same reason.

This partitioning by scanning is a nice algorithm but what has it got to do with the median?

Well imagine that you choose a value in the middle of the array and perform the scan. If by some chance the two portions produced were of equal size x would be the median.

That is, if x is the median the partition is:


        <=x        
        =>x        
                    ^
half way

Of course you would be very lucky to have found the median in the middle of the array and in most cases this wouldn't work. If x was smaller than the median then the two scans would cross before the middle of the array:

 

   <=x    
            =>x             
                    ^
half way

 

In this case you can pick the element that is now in the middle of the array (it must have changed but can you see why?) as a new value of x and repeat the procedure again but only on the right-hand portion.

You only have to re-partition the portion to the right because the portion to the left is already smaller than the new value of x - but can you see why?

If the initial value of x was bigger than the median the picture would be:

 

             <=x                
  =>x  
                    ^
half way

and by the same argument you can pick the new element in the middle and repeat the partitioning but only on the left-hand portion.

If you repeat this partitioning process you will eventually find a pair of scans the do meet in the middle and the value of x is the median.

I don't know about you, but I get an uneasy feeling about this algorithm. If you really want to make sure that you understand the algorithm follow it to the point that you can understand how and why the value in the middle changes at each scan and why you only have to re-partition one of the portions.

It is enough to keep you awake at night!

Eventually the value stored at the middle of the array becomes the median and the values that it is bigger than or equal to are to the left and the values that it is smaller than or equal to are to the right.



 
 

   
RSS feed of all content
I Programmer - full contents
Copyright © 2014 i-programmer.info. All Rights Reserved.
Joomla! is Free Software released under the GNU/GPL License.