New Hutter Prize Milestone For Lossless Compression
Written by Mike James
Friday, 06 August 2021
A new milestone has been achieved in the endeavour to develop a lossless compression algorithm. Artemiy Margaritov, a researcher at the University of Edinburgh, has been awarded a prize of 9,000 Euros ($10,632) for beating the previous Hutter Prize benchmark by 1.13%.
The Hutter Prize for Lossless Compression of Human Knowledge was launched in 2006 with its stated aim being:
to encourage the development of intelligent compressors/programs as a path to AGI [Artificial General Intelligence].
This challenge was initiated by Marcus Hutter, author of the seminal 2005 book "Universal Artificial Intelligence". Formerly a professor at the Research School of Computer Science at the Australian National University, Hutter is now a Senior Scientist at DeepMind researching the mathematical foundations of artificial general intelligence (AGI). Explaining why he is funding a data compression contest that furthers research into AGI, he states:
This compression contest is motivated by the fact that being able to compress well is closely related to acting intelligently, thus reducing the slippery concept of intelligence to hard file size.
As we reported at the time, last year Hutter increased the size of both the task and the reward by a factor of ten. So there is now 500,000 Euros to be won, paid out for incremental improvements in data compression of an excerpt from Wikipedia.
Originally the task was to losslessly compress the 100 MB file enwik8, with a baseline compressed size of 18,324,887 bytes. The new challenge is to compress the 1 GB file enwik9 to less than 116 MB.
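To put those figures in perspective, a quick calculation (a sketch only; enwik8 and enwik9 are exactly 10^8 and 10^9 bytes, and 116 MB is the stated target, not a winning entry's size) shows the compression ratios involved:

```python
# Rough compression ratios implied by the figures above.
enwik8_size = 10**8            # enwik8 is exactly 100,000,000 bytes
enwik8_baseline = 18_324_887   # original baseline compressed size in bytes
enwik9_size = 10**9            # enwik9 is exactly 1,000,000,000 bytes
enwik9_target = 116_000_000    # "less than 116 MB"

ratio8 = enwik8_baseline / enwik8_size   # fraction of original size
ratio9 = enwik9_target / enwik9_size

print(f"enwik8 baseline ratio: {ratio8:.1%}")   # about 18.3%
print(f"enwik9 target ratio:   {ratio9:.1%}")   # 11.6%
```

So the revised challenge demands not just ten times the data, but a noticeably tighter ratio than the old baseline.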
The first winner since the revised challenge was launched is Artemiy Margaritov. In May 2021 his STARLIT algorithm beat the previous benchmark, set by Alexander Rhatushnyak in July 2019. Because the award carries a bonus in proportion to the time elapsed since the last record, his prize was raised by 60% to €9,000.
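The figures above are consistent with a back-of-the-envelope check. Assuming the base award is the prize fund scaled by the relative improvement (an assumption about the prize formula, not stated in this article), the numbers line up with the €9,000 reported:

```python
prize_fund = 500_000    # Euros in the prize pool
improvement = 0.0113    # 1.13% relative improvement over the old record
time_bonus = 0.60       # 60% bonus for the time since the last record

base_award = prize_fund * improvement    # scaled by relative improvement
total = base_award * (1 + time_bonus)
print(round(total))                      # close to the 9,000 Euros awarded
```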
STARLIT compresses enwik9 by first reordering its articles to maximize the mutual information between consecutive articles. It then applies a dictionary preprocessor and compresses with a reduced version of cmix, a neural network based program, modified to cut memory usage from 32 GB to 10 GB and to run faster. The dictionary and the article-order list (both text files) are themselves compressed with the reduced cmix and appended to the executable, whose size grows from 124,984 bytes to 401,505 bytes. To decompress, archive9 is run; it extracts the dictionary, the article-order list and 17 GB of temporary files and, roughly two days later, produces the output as a file named enwik9_uncompressed. The original article order is restored by sorting the titles alphabetically. The improvements seem to be due more to tuning than to any breakthrough.
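The reordering idea can be illustrated with a toy sketch. This is not STARLIT's actual method (which the article does not detail); it is a greedy pass, with shared-word overlap as a crude stand-in for mutual information, that places each article next to the one it most resembles so that a context-modelling compressor sees related text together:

```python
# Toy illustration only: greedily order articles so each one shares as
# many words as possible with its predecessor, a rough proxy for
# maximizing mutual information between consecutive articles.

def greedy_order(articles):
    """Return a list of indices ordering articles by word overlap."""
    words = [set(a.lower().split()) for a in articles]
    remaining = set(range(len(articles)))
    order = [0]                    # arbitrary starting article
    remaining.discard(0)
    while remaining:
        prev = words[order[-1]]
        # pick the remaining article sharing the most words with the last pick
        nxt = max(remaining, key=lambda i: len(prev & words[i]))
        order.append(nxt)
        remaining.discard(nxt)
    return order

articles = [
    "the cat sat on the mat",
    "quantum field theory lectures",
    "the dog sat on the cat",
    "lectures on quantum mechanics",
]
print(greedy_order(articles))  # similar articles end up adjacent
```

Note that, as in STARLIT, the ordering itself must be stored (here, the index list) so the decompressor can restore the original sequence.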