A US district court has given its decision on the meaning of "strictly random" and has come up with a mathematical ruling.
If you were asked to implement a random algorithm I hope you would know how to do it - although the fine details can get tricky even if you think you understand.
Earlier this year there was a problem with the drawing of the US Green Cards and two weeks after the drawing the results were voided. The Green Card system is a lottery that allows people to win permanent residency of the US and the word "lottery" should give you a clue as to the requirements of the program that implements it.
The problem was that the draw wasn't random in any reasonable sense.
Over 19 million people applied and when the results were announced in May 2011 it became clear that more than 90% of the winners had made their entries in the first two days of the 30-day submission period. This cannot be random in the usual sense of the word and so the lottery was declared void.
Of course the people who had won didn't like the idea that they had at first won and then hadn't and so they started a lawsuit in the federal court. Their argument was that although the sample was biased the pattern wasn't perfect as some 2% were picked from other entry times. Also the selection process was in a sense "arbitrary" and so again followed the principle of "random".
Another wonderful line of argument was that as the list had been produced by a "computer error" it was indeed "random". To be more accurate they claimed that:
"a computer error . . . cannot be equated with non-randomness... if a computer error produces a list of names – any names – then the list must be random since it was “without purpose” and “haphazard.”
The court appears to have thought long and hard over the meaning of "random" and concluded that any process that resulted in 90% being picked from the first two days of entries couldn't be random and so denied the claims.
"Moreover, the Court does not find that it was arbitrary or capricious for the Department to decide to rescind a lottery that did not meet the single most important criterion for a drawing: a random selection."
But how did this all happen. According to the State Department:
"The State Department used a new randomizer program for the 2012 DV lottery, which turned out to include an error in the process that “rendered the Randomizer Program ineffective". Instead of directing the computer to select the winners as they had been re-ordered and randomized in step two, the computer simply selected the entries in the order in which they were originally numbered."
That is, quite an error!
The next question that occurs to any programmer is why then didn't the sample consist of all of the people who applied first. That is, why wasn't it a strictly first come first served selection? The answer is a little strange:
"The database program then stored the petitions to a physical location on the hard drives, for the most part in the order in which they were received. However, because of the high volume of petitions received for the lottery – more than one petition per second – the database could not store the petitions in the sequential order in which they were submitted with 100% precision. In some instances, the program “would record the petition in a distant location on the hard drive and leave, temporarily, an empty spot or gap on the hard drive adjacent to where it had recorded the immediately preceding petition.
The database program later filled in the gaps with petitions submitted later in time than their numerical placement in the database would otherwise suggest."
Nope that really doesn't make a lot of sense. Where things are stored in a database doesn't usually alter the outcome of any process --- unless that is the State Department is using a punch card system...
So a fiasco was caused by some unspecified bug that somehow lost the random function in the selection process.
More cartoons at xkcd
What is interesting is that there is an issue - surely random or at least pseudo-random is so obvious that no court ruling is needed?
If you think so than try to decide if the following algorithm would be legal.
- Index number all of the entries sequentially as they arrive.
- Shuffle the entire database so that it is in random order using:
scan through the entire database and swap each record with a random record selected from the entire database.
- Now take the first N entries as your selected set.
That is, in code - assuming there are M entries:
where Rand(M) generates a random integer in the range 0 to M-1.
The random swap is repeated on each record in the database so producing a completely random order ... or does it?
If you think this isnt' random try working out why.
Not as easy as it looks is it?
For a full explanation of why this naive shuffle isn't random see:
How not to shuffle - the Knuth-Fisher-Yates algorithm
If you would like to be informed about new articles on I Programmer you can either follow us on Twitter or Facebook or you can subscribe to our weekly newsletter.