|Programmer's Guide To Theory - Kolmogorov Complexity|
|Written by Mike James|
|Monday, 22 June 2020|
Page 1 of 2
Randomness and regularity are two sides of the same coin but what connects them? Kolmogorov complexity, related to both, is one of the strangest ideas of all. This is what this extract from Chapter 7 of my recent book is all about.
A Programmers Guide To Theory
Now available as a paperback and ebook from Amazon.
At the end of the previous chapter we met the idea that if you want to generate an infinite anything then you have to use a finite program and this seems to imply that the whatever it is that is being computed has to have some regularity. This idea is difficult to make precise, but that doesn’t mean it isn’t an important one. It might be the most important idea in the whole of computer science because it explains the relationship between the finite and the infinite and it makes clear what randomness is all about.
Suppose I give you a string like 111111... which goes on for one hundred ones in the same way. The length of the string is 100 characters, but you can write a short program that generates it very easily:
Now consider the string "231048322087232.." and so on for one hundred digits. This is supposed to be a random string, it isn't because I typed it in by hitting number keys as best I could, but even so you would be hard pressed to create a program that could print it that was shorter than it is. In other words, there is no way to specify this random-looking string other than to quote it.
This observation of the difference between these two strings is what leads to the idea of Kolmogorov, or algorithmic, complexity. The first string can be generated by a program with roughly 30 characters, and so you could say it has 30 bytes of information, but the second string needs a program of at least the hundred characters to quote the number as a literal and so it has 100 bytes of information.
You can already see that this is a nice idea, but it has problems. Clearly the number of bytes needed for a program that prints one hundred ones isn't a well-defined number - it depends on the programming language you use. However, in any programming language we can define the Kolmogorov complexity as the length of the smallest program that generates the string in question.
Andrey Kolmogorov was a Russian mathematician credited with developing this theory of information but it was based on a theorem of Ray Solomonoff which was later rediscovered by Gregory Chaitin - hence the theory is often called Solomonoff-Kolmogorov-Chatin complexity theory.
Obviously one way around this problem that the measure of this complexity is to use the size of a Turing machine that generates the sequence, but even this can result in slightly different answers depending on the exact definition of the Turing machine. However, in practice the Turing machine description is the one preferred.
So complexity is defined with respect to a given description language – often a Turing machine. The fact that you cannot get an exact absolute measure of Kolmogorov complexity is irritating but not a real problem as any two measures can be shown to differ by a constant.
The Kolmogorov complexity of a string is just the smallest program that generates it.
For infinite strings things are a little more interesting because, if you don't have a program that will generate the string, you essentially don't have the string in any practical sense. That is, without a program that generates the digits of an infinite sequence you can't actually define the string. This is also the connection between irrational numbers and non-computable numbers.
As explained in the previous chapter, an irrational number is an infinite sequence of digits. For example:
where the ... means carry on forever.
Some irrationals have programs that generate them and as such their Kolmogorov complexity is a lot less than infinity.
However, as there are only a countable number of programs and there are an uncountable number of irrationals – see the previous chapter - there has to be a lot of irrational numbers that don't have programs that generate them and hence that have an infinite Kolmogorov complexity. Put simply, there aren't enough programs to compute all of the irrationals and hence most irrationals have an infinite Kolmogorov complexity. To be precise, there is an aleph-zero, or a countable infinity, of irrational numbers that have Kolmogorov complexity less than infinity and an aleph-one, or an uncountable infinity of them, that have a Kolmogorov complexity of infinity.
A key theorem in algorithmic complexity is:
If this were not so we could generate the aleph-one set of infinite strings using an aleph-zero set of programs.
The irrationals that have a smaller than infinite Kolmogorov complexity are very special, but there are an infinity of these too. In a sense these are the "nice" irrationals - numbers like π and e - that have interesting programs that compute them to any desired precision. How would you count the numbers that had a less than infinite Kolmogorov complexity?
Simple just enumerate all of the programs you can create by listing their machine code as a binary number. Not all of these programs would generate a number, indeed most of them wouldn't do anything useful, but among this aleph-zero of programs you would find the aleph-zero of "nice" irrational numbers.
Notice that included among the nice irrationals are some transcendentals. A transcendental is a number that isn't the root of any finite polynomial equation. Any number that is the root of a finite polynomial is called algebraic. Clearly, for a number that is the root of a finite polynomial, i.e. not transcendental but algebraic, you can specify it by writing a program that solves the polynomial. For example, √2 is an irrational, but it is algebraic and so it has a program that generates it and hence it’s a nice irrational.
|Last Updated ( Monday, 22 June 2020 )|