Programmer's Guide To Theory - Splitting the Bit
Written by Mike James
Monday, 22 July 2024
Article Index
Programmer's Guide To Theory - Splitting the Bit
Make It Equal
Huffman And Zip

Information theory – perhaps one of the most remarkable inventions of the twentieth century - naturally leads on to the consideration of how information can be coded and hence coding theory.

## A Programmers Guide To Theory

#### Contents

1. What Is Computer Science?
Part I What Is Computable?
2. What Is Computation?
3. The Halting Problem
4. Finite State Machines
Extract 1: Finite State Machines
5. Practical Grammar
6. Numbers, Infinity and Computation
Extract 1: Numbers
Extract 2: Aleph Zero The First Transfinite
Extract 3: In Search Of Aleph-One
Extract 4: Transcendental Numbers
7. Kolmogorov Complexity and Randomness
Extract 1:Kolmogorov Complexity
8. The Algorithm of Choice
9. Gödel’s Incompleteness Theorem
10. Lambda Calculus
Part II Bits, Codes and Logic
11. Information Theory
12. Splitting the Bit ***NEW!
13. Error Correction
14. Boolean Logic
Part III Computational Complexity
15. How Hard Can It Be?
Extract 1: Where Do The Big Os Come From
16. Recursion
Extract 1: What Is Recursion
Extract 2: Why Recursion
17. NP Versus P Algorithms
Extract 1: NP & Co-NP
Extract 2: NP Complete

<ASIN:1871962439>

<ASIN:1871962587>

In the previous chapter,Information Theory, we learned that if a message or symbol occurs with probability p then the information contained in that symbol or message is -log2(p) bits. For example, consider the amount of information contained in a single letter from an alphabet. Assuming there are twenty-six letters in an alphabet and assuming they are all used equally often the probability of receiving any one particular letter is 1/26 which gives -log2(1/26)=4.7 bits per letter.

It is also obvious from other elementary considerations that five bits are more than enough to represent 26 symbols. Quite simply, five bits allow you to count up to more than 26 and so you can assign one letter of the alphabet to each number. In fact, five bits is enough to represent 32 symbols because you can count up to 31, i.e. 11111.

Using all five bits you can associate A with zero and Z with 25 and the values from 26 to 31 are just wasted. It seems a shame to have wasted bits because we can’t quite find a way to use 4.7 bits – or can we? Can we split the bit?

## Average information

Before we go on to consider “splitting the bit”, we need to look more carefully at the calculation of the number of bits of information in one symbol from an alphabet of twenty-six. Clearly the general case is that a symbol from an alphabet of n symbols has information equal to –log2(n) bits, assuming that every symbol has equal probability.

Of course what is wrong with this analysis is that the letters of the alphabet are not all equally likely to occur. For example, if you don’t receive a letter “e” you would be surprised – count the number in this sentence. However, your surprise at seeing the letter “z” should be higher – again count the number in some other sentence than this one!

Empirical studies can provide us with the actual rates that letters occur – just count them in a book for example. If you add the “space” character to the set of symbols, now a total of 27, you will discover that it is by far the most probable character with a probability of 0.185 of occurring, followed by “e” at 0.100, “t” at 0.080 and so on down to “z” which has a probability of only 0.0005. It makes you wonder why we bother with “z” at all!

But, given that “z” is so unlikely, its information content is very high –log2(0.0005)=10.96 bits. So, “z” contains nearly eleven bits of information compared to just over three bits for “e”.

If you find this argument difficult to follow just imagine that you are placing bets on the next symbol to appear. You expect the next symbol to be a space character so you are very surprised when it turns out to be a “z”. Your betting instincts tell you what the expected behavior of the world is and when something happens that isn't expected you gain some information.

We can define an average information that we expect to get from a single character taken from the alphabet.

If the ith symbol occurs with probability pi then the information it contains is –log2(pi) bits and the average information contained in one symbol, averaged over all symbols is:

–p1log2(p1) –p2log2(p2) … –pnlog2(pn)

You can see that this is reasonable because each symbol occurs with probability pi and each time it occurs it provides –log2(pi) bits of information. Notice that while “z” carries 11 bits of information, the fact that it doesn’t occur very often means that its contribution to the average information is only about 0.0055 bits.

Applying this formula to the letters of the alphabet and their probabilities gives an average information of 4.08 bits per symbol which should be compared to 4.76 bits per symbol for an alphabet of 27 equally likely characters. Notice that the average information in 27 equally likely characters is also 4.76 bits – to see why try working it out.

No coding method can represent messages using fewer than the number of bits given by their average information. Inefficient coding methods can, and do, use more bits than the theoretical minimum. This waste of bits is usually called redundancy because the extra bits really are unnecessary, i.e. redundant. Redundancy isn't always a waste as it can be used to detect and even correct errors in the messages, as we will see the next chapter.

There is also a nice theorem which says that the average information per symbol is largest if all of the symbols are equally likely. This means that if any of the symbols of an alphabet are used more or less often than others, the average information per symbol is lower. It is this observation that is the key to data compression techniques.

In general the average information, also known as entropy is given by:

E = ∑ipilog2pi

where the sum is over all the symbols that are in use.

Last Updated ( Monday, 22 July 2024 )