|How Error Correcting Codes Work|
|Written by Harry Fairhead|
|Thursday, 08 March 2018|
Page 1 of 2
Error correcting codes are essential to computing and all sorts of communications. At first they seem a bit like magic. How can you possibly not only detect an error but correct it as well? How do they work? In fact it turns out to be very easy to understand the deeper principles.
A fundamental application of coding theory is the detection and correction of errors.
Without modern error correcting codes the audio CD would never have worked. It would have so many clicks, pops and missing bits due to the inevitable errors in reading the disc that you just wouldn't listen to it. Servers often make use of ECC memory modules - and ECC stands for Error Correcting Code. Photos sent back from space craft wouldn't be valuable without error correcting codes. Secure data storage wouldn't be secure without the use of ECC.
Finally the much loved QR Code has four possible levels of error correct with the highest level allowing up to 30% of the code to be damaged:
This QR code can still be read - its a link to Wikipedia - due to error correcting Reed Solomon codes.
So how does error detection and correction work?
There is an old programmer's joke that we might as well get out of the way at the start:
Programmer buys parrot.
Parrot sits on programmer’s shoulder and says
“pieces of nine, pieces of nine,..”.
When asked why it isn’t saying the traditional “pieces of eight” the programmer replies,
“It’s a parroty error!”
Parity error checking was the first error detection code and it is still used today. It has the advantage of being simple to understand and simple to implement.
It can also be extended to more advanced error detection and correction codes.
To understand how parity checking works consider an eight-bit item of data -
If you imagine that this was stored in a less than secure form then if a single bit was changed from a zero to a one or vice versa you would never know.
If instead of storing eight bits we store nine with the ninth – the parity bit – set to make the total number of ones odd or even then you can see that a single bit change is detectable.
If you select odd parity then the nine bits are
i.e. the parity bit is a 1, and if any single bit changes then the parity changes from odd to even and you know there has been a bit error.
Parity checking detects an error in a single bit but misses any errors that flip two bits – because after any even number of bit changes the parity is still the same.
Parity error checking is used when there is a fairly small probability of a single bit being changed and hence an even smaller probability of two bits being changed.
When you first meet parity error detection it all seems very simple but it seems like a “one-off” rather than a general principle.
How, for example, do you extend it to detect a two-bit or three-bit error?
In fact parity checking is the simplest case of a very general principle but you have to think about it all in a slightly different way to see this.
When a bit is changed at random by noise you can think of the data word as being moved a small distance away from its true location.
A single-bit change moves it a tiny amount, a two-bit change moves it further and so on. The more bits that are changed the further away the data word is from its original true location.
A simple measure of this distance between two data words is to count the number of bits that they differ by – this is called the Hamming distance after R. W. Hamming who did much of the early work into error detection and correction.
Richard Hamming 1915-1998
For example, the two data words 011 and 110 are two units apart because they differ in two places – the first and last bits.
What has Hamming distance got to do with parity checking?
Simple - if you take a valid data word which has a parity bit associated with it and change a single bit then you have a data word which is one Hamming unit of distance away from the original.
Every valid code word has an invalid code word one unit away from it.
To imagine this it is easier to think of a three-bit code. In this case you can draw a cube to represent the location of each possible code word.
A code cube
If we treat all even parity words as valid and odd parity words as invalid then you can see at once that a code such as 000 is surrounded by invalid codes one unit away, e.g. 001, 010 and 100. If you move two units away then you reach valid codes again.
What is more, every valid code is surrounded by a cluster of invalid codes one unit away. In other words, a single-bit error always moves a valid code to an invalid code and hence we detect the error.
Red corners are valid codes – black invalid
|Last Updated ( Thursday, 08 March 2018 )|