|Inside Random Numbers|
|Written by Mike James|
|Tuesday, 18 November 2014|
Page 2 of 3
Before we go much further you should try to construct something that will make a program appear random.
If you know how to program then you might well be thinking of using the function that generates random numbers in a programming language you know called RND, RAND or something similar, but this is simply cheating. Some other poor programmer had to solve the problem to give you the RND function and to see how hard a task it was you have to avoid simply using it!
One thought that often occurs is why not mimic the coin toss by simply getting together a jumble of programming instructions to produce a “random program”.
After all if you run a random program who knows what it does! This reasoning is only partly correct because in practice you can be fairly sure of knowing what a random program does – nothing!
Most of the arrangements of even correct programming commands don't do anything useful let alone interesting. What is more how do you construct a random program anyway? This surely needs a sources or randomness and so you are about to go round in a circle.
The key point here is that most simple methods or methods that aim to mimic randomness in the real world just don't work. The problem needs a lot of cleverness to solve.
A second thought might be to try and copy the toss of a coin by sampling something that is rapidly changing – i.e. mimicking the spin of the coin.
For example, every PC has a timer which counts in thousands of a second how long it has been switched on. Why not simply read the last two digits of the time in thousands of a second and use these as a random event – i.e. a random number in the range 00 to 99.
This does indeed provide you with a random number, after all how long the machine has been on measured in milliseconds changes far too quickly for you to be able to predict the value of the last two digits. However, when you start looking too carefully at this method you discover that it has some problems.
For example, if you sample it to get a random number 23 say then if you sample it soon after you are going to get 24. Unless the sampling rate is very small compared to the counting rate you are going to get “random” numbers that tend to increase.
This might not matter too much for some applications and indeed the PC timer plays a role in generating random numbers but not quite the one you might expect.
In many ways this is just another attempt to find a source of entropy within the standard hardware - a timer sampled non-periodically can provide some entropy, an empty memory location might also do. Some CPUs even have a special physical random number generator built in just to provide some physical entropy but these too often have their problems.
A mathematical approach
If physical processes are generally too ordered to be used to generate a good approximation to random numbers why not try some mathematical ones.
At this point it looks as if we have really gone mad – after all what is more predictable than mathematics?
However, if you think about the way values of individual figures in a calculation are determined it can be very difficult to predict what they are going to be without actually doing the sums.
One of the first random number generators of this sort was Von Neumann’s mid square method. If you take a number and square it then predicting what the middle two digits of the result are going to be given only the middle two digits of the starting value is difficult.
For example, 1234, i.e. middle digits23, 12342=1522756 i.e. middle digits 27.
You can see that given the numbers 23 it is difficult to predict what the next two numbers are because 23 could be the middle two digits or a lot of different numbers and the connection even if you know the full number isn't that obvious.
The power method was used for some years until it was discovered that the numbers it generated weren't as random as you might expect – too many of the form 00AB or AB00 are produced. The importance of the power method is that it establishes that mathematical methods can produce numbers that are difficult to predict if you don't know the exact computation that produced them.
Good Random Numbers
At the moment we are using the vague term “unpredictable” as a test for random numbers but in practice we need to be a bit more precise.
What do good random numbers look like?
The first obvious thing is that they should cover the range evenly.
That is, if you have random numbers in the range 1 to 6 then each value should turn up equally often. That is they should be uniform with no values occurring more often than any other.
How does this relate to “predictability”?
Suppose the number 6 came up more often than any other number then it would be reasonable to suppose that you would make a long term profit by betting on it given that you can predict that it comes up more often.
In this sense “predictable” means statistically predictable.
You might not know the algorithm that created the numbers but you can still make good guesses at what comes next by simply observing the patterns in the generated sequence.
A sequence of numbers can be treated as random for many purposes if it isn’t possible to make a profit by betting on them using the best plan you can invent!
So in this sense random numbers have to occur equally often and, taking the same argument on a stage, all possible pairs should occur equally often.
Well, for example, if the number 4 was followed more often by 6 than any other digit then you could use this irregularity to predict the next number whenever you saw a 4 appear. In the same way that any inequality in the frequency of the single digits could be used to make a profit on betting so could any inequality in the frequency of digit pairs – and of course digit triples, quadruples and so on…
What this means is that to test a stream of numbers for good randomness you need to look at the frequencies of single digits, pairs, triples, quadruples, whatever five digits are called and so on forever, clearly an impossible task!
In practice we generally only ask random numbers to be random up to pairings on the basis that humans and even machines are very poor at noticing patterns beyond pairings.
In fact for most applications we even relax the condition on the number pairs to the simpler requirement that subsequent random number pairs are “uncorrelated”. Roughly speaking this means that if there is any association between number pairs it is so complicated that humans can’t make use of it to their advantage.
When you test the random numbers that the square method produces in this way they fail miserably. They may be unpredictable in a weak deterministic sense but they contain a statistical pattern that can be used to guess the next number in such a way that you would be correct more often than blind guessing.
There are a range of statistical tests that a random number generator can be subject to but one of the simplest checks is to generate pairs of a numbers and treat them as x and y coordinates in a scatter plot. When you look at the plot a poor random number generator will often be revealed by bands and gaps in the plotted points.
|Last Updated ( Tuesday, 18 November 2014 )|