LZ Compression And The One-Bit Catastrophe

Written by Mike James

Monday, 17 July 2017

We use compression algorithms extensively to save both storage and time. The most popular algorithms are based on LZ dictionary compression and are used in GIF, Deflate, Zip, PNG. It was even named an IEEE Milestone. But we know little about them. One long standing question is, can adding a single bit to a file dramatically change the compression achieved? Surely not!

LZ compression, specifically LZ 78, is of both practical and theoretical interest. It is one of a class of dictionary compression methods invented in 1978 by Abraham Lempel and Jacob Ziv.

A dictionary compression method is lossless and works by building up a dictionary of short sequences that can then be used to code the file in question. For example, if you had a file that was composed of nothing but 10,000 "A"s then the single dictionary entry "A" and the coding 10,000 would provide a very effective compression. In the real world, of course, files are made up of complex bit sequences but unless they are random some sequences repeat. The difficulty is finding them.

An optimal dictionary encoding gives an upper bound to the Kolmogorov complexity of a string. The Kolmogorov complexity is just the shortest program that will generate the string and clearly a compression algorithm (plus the dictionary) qualifies as a program and hence any other program that does the job has to be shorter.

With all this in mind, the question arises "can adding a single bit significantly change the compression ratio?". This is sometimes referred to as the "one-bit catastrophe".

"Suppose you compressed a file using your favorite compression algorithm, but you realize there were a typo that makes you add a single bit to the original file. Compress it again and you get a much larger compressed file, for a one-bit difference only between the original files. Most compression algorithms fortunately do not have this strange behaviour; but if your favorite compression algorithm is called LZ’78, one of the most famous and studied of them, then this surprising scenario might well happen. . ."

The question is as much about the nature of order and entropy as it is compression methods. You could rephrase this as "can a single bit significantly change the order in a bit sequence or string?"

The answer, revealed by Guillaume Lagarde and Sylvain Perifel at the Sorbonne, is that for infinite-bit sequences adding a single 0 to the start turns a compressible sequence into an incompressible one.

I guess you could say that a single bit can render order into disorder but that might be a bit theatrical.

For finite words the situation isn't quite as bad as the compressibility of a word with a bit added has to be close to the original word. But there are words that are easily compressible that are converted into words that are much more difficult to compress by adding a single bit.

This theoretical musing does have a practical consequence:

"This “catastrophe” shows that LZ’78 is not robust with respect to the addition or deletion of bits. Since a usual good behaviour of functions used in data representation is a kind of “continuity”, our results show that, in this respect, LZ’78 is not a good choice, as two words that differ in a single bit can have images very far apart."

More Information

Lempel-Ziv: a "one-bit catastrophe" but not a tragedy

Data Compression The Dictionary Way

Kolmogorov Complexity

Information Theory

Non-computable numbers

Computability

Axiom Of Choice - The Programmer's Guide

The Programmer's Guide To The Transfinite

Claude Shannon - Information Theory And More

Fractal Image Compression

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Edera Announces Hardened Runtime Security
24/07/2025

Edera has set out a new software security inititative, Hardened Runtime Security, that avoids the problems caused by the industry tendency to 'move fast and break things.' The company has also j [ ... ]

+ Full Story

CISA and NSA - Use Rust Or Perhaps Java
02/07/2025

The CISA and the NSA are urging us to adopt memory-safe languages (MSLs) for the sake of cybersecurity. You probably think they mean Rust but things aren't as clear cut as you might expect.

+ Full Story

More News

Comments

or email your comment to: comments@i-programmer.info

Last Updated ( Monday, 17 July 2017 )

Recent Articles

Recent Book Reviews

Popular Articles

More Information

Related Articles

Comments