If you think that the work described so far is enough for any lifetime, then you will be surprised to hear that we haven't even touched on Shannon's real claim to fame as "Father of the Information Age".
At the time that he was musing on using the binary system for computations, we knew very little about what information was. It was possible for very knowledgeable engineers to propose schemes that today we would recognize as just plain silly.
For example, it wasn't at all clearly understood why radio signals spilled over from the frequency that they were transmitted on to occupy a band of frequencies. It was thought that with improved technology you might be able to reduce the bandwidth needed to almost nothing. Engineers were mystified why, when they transmitted a radio signal on a single frequency, say 100kHz, the signal actually spread out to occupy a range of frequencies, say 80kHz to 120kHz. This is where the term "bandwidth" comes from, and it limited how close together you could pack radio stations. If the bandwidth could be reduced, you could get more radio stations on the air.
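Signal theory now explains the spreading: as soon as you vary a carrier so that it carries information, new frequencies appear on either side of it. The sketch below is a minimal illustration. It assumes amplitude modulation of a 100kHz carrier by a single 20kHz tone. The tone, the 1MHz sample rate and the helper names `am_signal` and `amplitude_at` are our own choices for illustration. The code measures the strength of the resulting signal at a few frequencies using a single-bin discrete Fourier transform.

```python
import cmath
import math

FS = 1_000_000     # sample rate in Hz (assumed for the sketch)
N = 1_000          # 1 ms of samples, so every whole kHz falls exactly on a DFT bin
CARRIER = 100_000  # carrier frequency in Hz, as in the text
TONE = 20_000      # modulating tone in Hz (an illustrative assumption)


def am_signal(n: int) -> float:
    """Sample n of the carrier, amplitude-modulated by the tone at 100% depth."""
    t = n / FS
    return (1 + math.cos(2 * math.pi * TONE * t)) * math.cos(2 * math.pi * CARRIER * t)


def amplitude_at(freq: float, samples: list[float]) -> float:
    """Amplitude of the cosine component at freq (a whole number of kHz), via a single-bin DFT."""
    total = sum(s * cmath.exp(-2j * math.pi * freq * n / FS) for n, s in enumerate(samples))
    return 2 * abs(total) / len(samples)


samples = [am_signal(n) for n in range(N)]
for f in (80_000, 90_000, 100_000, 110_000, 120_000):
    print(f"{f // 1000:3d} kHz: amplitude {amplitude_at(f, samples):.3f}")
```

The carrier shows up at 100kHz with amplitude 1. Two sidebands appear at 80kHz and 120kHz with amplitude 0.5 each, and there is essentially nothing at 90kHz or 110kHz. So even this simplest of messages occupies the whole 80kHz to 120kHz band. A faster-changing tone would push the sidebands further out.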
Today we know that you need a certain amount of bandwidth to transmit a given amount of information and this is a law of nature. For example, you can't transmit data down a telephone line faster than a given speed because the phone line has a very limited bandwidth - just enough to transmit the human voice. To do better you need a cable with a wider bandwidth such as a coaxial cable or a fiber optic cable.
In the same way, we know that you can take a 1MByte file and compress it down to, say, 0.5MByte, but after you have used the best compression algorithm on it you can't get it into any less space.
The reason is that once the file's true information content is reached you cannot compress it any more. All of these ideas and many more are due to Shannon and his theory of information.
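You can see this limit with any general-purpose compressor. The sketch below uses Python's standard `zlib` module as a stand-in for "the best compression algorithm". It is nowhere near the best, so treat this as an illustration only. Highly repetitive text shrinks to a tiny fraction of its size. Pseudo-random bytes have no redundancy for zlib to exploit, so they don't shrink at all. The input sizes and the seed are arbitrary assumptions.

```python
import random
import zlib


def compressed_ratio(data: bytes) -> float:
    """Size after zlib compression at its highest level, as a fraction of the original."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly redundant input: one short phrase repeated 10,000 times (120,000 bytes).
redundant = b"information " * 10_000

# Pseudo-random input of the same length; the fixed seed just makes runs repeatable.
noise = random.Random(42).randbytes(len(redundant))

print(f"repeated text: {compressed_ratio(redundant):.4f}")
print(f"random bytes:  {compressed_ratio(noise):.4f}")
```

The random data actually comes out a few bytes larger than it went in. That is because zlib adds its own header and checksum even to data it cannot compress.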
Although Shannon cast his theory in terms of communication it is just as applicable to information storage and retrieval. The general ideas are difficult to describe because they involve probability theory and considerations of how surprised you are to receive any particular information.
It may sound strange to say that the amount of information in a message depends on how surprising its content is, but that's the key to a coherent theory of information. If you can receive a total of M possible messages and they are all equally likely, then the amount of information in any message is $\log_2 M$, where $\log_2$ is the logarithm to the base 2.
To prove this would take us into probability theory, but from our computer-oriented standpoint it seems obvious enough because you need exactly $\log_2 M$ bits to represent M messages. If you have 1 bit you can represent two messages as 0 or 1; 2 bits can code 4 messages as 00, 01, 10 and 11, and so on.
It should be clear that given b bits you can represent $2^b$ messages. That is, $M = 2^b$. Since $\log_2 M$ is simply the power to which you have to raise 2 to get M, you should also be able to see that $\log_2 M = b$.
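The counting argument is easy to check directly. This minimal sketch does it in both directions. It lists every b-bit code word, and it works out how many bits a fixed-length code needs for M messages. When M is not a power of 2, a fixed-length code has to round $\log_2 M$ up to a whole number of bits. The function names are just illustrative.

```python
import math
from itertools import product


def all_codes(b: int) -> list[str]:
    """Every b-bit code word, e.g. 00, 01, 10, 11 for b = 2."""
    return ["".join(bits) for bits in product("01", repeat=b)]


def bits_needed(m: int) -> int:
    """Whole bits a fixed-length code needs to give each of m messages its own code word."""
    return math.ceil(math.log2(m))


for b in (1, 2, 3):
    codes = all_codes(b)
    print(b, "bits:", len(codes), "codes", codes)  # b bits give 2**b distinct codes

print(bits_needed(8), bits_needed(1024))  # exact powers of 2: log2 M is already whole
print(math.log2(10), bits_needed(10))     # 10 messages: about 3.32 bits, so 4 whole bits
```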
If the messages are not all equally likely then you can use a more sophisticated code to reduce the average number of bits needed to represent the messages.
This is the principle of data compression. You can prove that if each message $M_i$ occurs with probability $P_i$, then the smallest average number of bits needed per message is the sum of $-P_i \log_2 P_i$ over all possible messages. This is the information content of the messages in bits:

$$\text{Information in bits} = -\sum_i P_i \log_2 P_i$$
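To see the formula at work, the sketch below computes Shannon's measure for four messages with unequal probabilities. It then compares that figure with the average length of a variable-length code. The code is built with Huffman's algorithm, a method David Huffman published in 1952; it is not one of Shannon's own constructions. The message names and probabilities are made-up assumptions.

```python
import heapq
import math
from itertools import count


def information_bits(probs: dict[str, float]) -> float:
    """Shannon's measure: the sum of -P_i * log2(P_i) over all messages."""
    return sum(-p * math.log2(p) for p in probs.values() if p > 0)


def huffman_code(probs: dict[str, float]) -> dict[str, str]:
    """Huffman code: repeatedly merge the two least likely groups of messages."""
    if len(probs) == 1:
        return {msg: "0" for msg in probs}  # a lone message still needs one bit
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {msg: ""}) for msg, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)  # least likely group gets a leading 0
        p2, _, group2 = heapq.heappop(heap)  # next least likely gets a leading 1
        merged = {m: "0" + c for m, c in group1.items()}
        merged.update({m: "1" + c for m, c in group2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]


def average_length(probs: dict[str, float], code: dict[str, str]) -> float:
    """Expected number of bits per message when using the given code."""
    return sum(p * len(code[m]) for m, p in probs.items())


probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
code = huffman_code(probs)
print(code)
print(information_bits(probs), average_length(probs, code))
```

For these probabilities the information content is 1.75 bits per message, compared with 2 bits for a plain fixed-length code. The Huffman code (A=0, B=10, C=110, D=111) hits 1.75 exactly, but only because every probability is a power of 1/2. In general a Huffman code's average length is never below Shannon's figure and lies within 1 bit of it.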
This is the measure of information that Shannon invented in 1948. He then went on to publish a series of papers, two in 1948 and one in 1949, that presented information theory at a surprising level of completeness, which was an amazing achievement. Other people had been trying to find a measure of information for some time. Shannon not only found one but also wrote down its complete theory in only a couple of years.
Kickstarting the Information Age
Shannon started the study of coding theory covering the effect of noise, bandwidth and power, optimal codes, error correcting codes, data compression and all things binary.
He also solved the radio transmission problem described above by stating what is now known as the Shannon-Hartley law. It says that given a bandwidth W (in Hz) and a signal-to-noise ratio R (as a power ratio), the fastest that you can reliably transmit is

$$C = W \log_2(1 + R)$$

bits per second. In other words, if you reduce the bandwidth or try to send data too quickly, then you get an increasing number of errors. Anyone who has used a high-speed modem over a telephone line, or a digital phone over a weak connection, will tell you that!
As the noise on the line gets worse, the modem drops back to a slower speed to reduce the number of errors.
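To put numbers on the law, here is a minimal sketch that evaluates $C = W \log_2(1 + R)$ for a channel about 3kHz wide. That width is an assumption, roughly what an analogue phone line passes. Signal-to-noise ratios are usually quoted in decibels, so a helper converts them to the plain power ratio R that the formula needs.

```python
import math


def capacity_bps(bandwidth_hz: float, snr: float) -> float:
    """Shannon-Hartley limit C = W * log2(1 + R), with R a plain power ratio."""
    return bandwidth_hz * math.log2(1 + snr)


def db_to_ratio(db: float) -> float:
    """Convert a signal-to-noise ratio in decibels to a power ratio."""
    return 10 ** (db / 10)


W = 3000.0  # assumed bandwidth of a phone line, in Hz
for snr_db in (40, 30, 20, 10):
    c = capacity_bps(W, db_to_ratio(snr_db))
    print(f"SNR {snr_db:2d} dB -> at most {c / 1000:5.1f} kbit/s")
```

On this channel each 10dB loss of signal-to-noise ratio costs roughly 10kbit/s of capacity. This is the same squeeze that forces a modem to fall back to a slower speed as the line gets noisier.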
You can go back to the original papers, conveniently collected in the book The Mathematical Theory of Communication. They read like an authoritative textbook on information theory, not research documents feeling their way towards an idea.
The bit, the nit, the dit and the Hartley
The unit of information that Shannon invented is universally called the "bit", short for "binary digit". However, there are other units of information.
If you take the log in Shannon's equation to the base e, i.e. a natural logarithm, then you get a measure of information called the nat or nit - I am completely serious here!
If you take the log to the base 10, then the unit is called the Hartley, after R.V.L. Hartley, who tried to work it all out before Shannon but didn't have the benefit of thinking about binary numbers. The Hartley is also known as the dit.
You might also like to know that 1 bit is 0.69 of a nat and 0.3 of a dit. Fancy describing a megabit of storage as 0.69MegaNats or 0.3MegaDits!
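Changing the base of the logarithm only rescales the answer, as this minimal sketch shows. The same fair-coin message comes out as 1 bit, about 0.69 nats or about 0.3 hartleys. The conversion constants are just $\ln 2$ and $\log_{10} 2$. The function name `information` is illustrative.

```python
import math

NATS_PER_BIT = math.log(2)        # about 0.693
HARTLEYS_PER_BIT = math.log10(2)  # about 0.301


def information(probs: list[float], base: float) -> float:
    """Shannon's sum of -P_i * log(P_i), with the logarithm taken to the given base."""
    return sum(-p * math.log(p, base) for p in probs if p > 0)


fair_coin = [0.5, 0.5]
print(information(fair_coin, 2))       # in bits
print(information(fair_coin, math.e))  # in nats
print(information(fair_coin, 10))      # in hartleys (dits)
print(f"1 Mbit = {1e6 * NATS_PER_BIT:,.0f} nats = {1e6 * HARTLEYS_PER_BIT:,.0f} hartleys")
```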
All I can say is that I'm glad Shannon got there first!
Even without Shannon, it is doubtful that we would still all be using decimal computers or be ignorant of the basic laws of data transmission and compression. Practical engineering has a way of dealing with issues as they arise, so someone would have come up with all the right ideas and theories, if only in retrospect. However, when it comes to information theory, he created the whole subject in one go, and it is difficult to think that anyone else would have managed the feat in quite such a neat and complete way.