Hutter Prize Awarded Again
Tuesday, 01 August 2023

A tweet from Marcus Hutter last month announced a new winner of the €500,000 Prize for Compressing Human Knowledge. Saurabh Kumar was awarded €5,187 for setting a new world record in compressing a 1GB excerpt of Wikipedia. Why the odd number? It reflects the margin by which he beat the previous record: 1.04%.

Hutter Tweet

The Hutter Prize is an ongoing competition that rewards incremental improvements in data compression of an excerpt from Wikipedia. It was inaugurated in 2006 by Marcus Hutter, who is currently a Senior Researcher at Google DeepMind in London and Honorary Professor in the Research School of Computer Science (RSCS) at the Australian National University, researching the mathematical foundations of artificial general intelligence (AGI), the topic of his book Universal Artificial Intelligence.

Originally the task was to find better compression for a 100MB sample of Wikipedia, with a prize of €50,000, but both the size of the target and the prize money were raised by a factor of 10 in 2020; see Hutter Prize Now 500,000 Euros. The revised challenge, referred to as 10xHKCP, uses as its target enwik9, 1GB of Wikipedia. According to Matt Mahoney, who runs the compression competition:

Assuming you spend several hours a day reading, writing, talking, or listening, you process about a gigabyte of language in your lifetime.

Explaining the rationale of the Hutter Prize to Lex Fridman in an interview in 2020, Hutter stated:

Being able to compress well is closely related to intelligence. Wikipedia is an extensive snapshot of Human Knowledge. If you can compress the first 1GB of Wikipedia better than your predecessors, your (de)compressor likely has to be smart(er). The intention of this prize is to encourage development of intelligent compressors/programs as a path to AGI.

The contest is open to everyone. To enter, a competitor must submit a compression program and a decompressor that reproduces the file enwik9. The total size of the compressed file and the decompressor (as a Win32 or Linux executable) must be less than or equal to 99% of the total for the previous prize-winning entry. For each one percent improvement, the competitor wins 5,000 euros. The decompression program must also meet execution time and memory constraints. Another stipulation is that the compression program must be open source.
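To make the size rule and the payout concrete, here is a minimal Python sketch. It assumes the award scales proportionally with the percentage improvement (which would explain the odd amount paid for a 1.04% gain) and it ignores the time and memory constraints. The function names and the figures in the usage note are illustrative, not taken from the official rules.

```python
EUROS_PER_PERCENT = 5_000  # from the article: 5,000 euros per one percent improvement


def improvement_percent(previous_total: int, new_total: int) -> float:
    """Percentage by which new_total undercuts previous_total.

    Both totals are sizes in bytes of compressed file plus decompressor.
    """
    return 100.0 * (previous_total - new_total) / previous_total


def qualifies(previous_total: int, new_total: int) -> bool:
    """True if the new total is at most 99% of the previous winning total.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return 100 * new_total <= 99 * previous_total


def prize_euros(previous_total: int, new_total: int) -> float:
    """Assumed proportional payout; zero if the entry misses the 99% threshold."""
    if not qualifies(previous_total, new_total):
        return 0.0
    return EUROS_PER_PERCENT * improvement_percent(previous_total, new_total)
```

With made-up totals, `prize_euros(1_000_000, 980_000)` gives 10,000 euros for a 2% improvement, while `prize_euros(1_000_000, 995_000)` gives nothing because a 0.5% gain falls short of the threshold.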

The first winner since the revised challenge was launched was Artemiy Margaritov, a researcher at the University of Edinburgh. In May 2021 his STARLIT algorithm beat the previous benchmark, which had been set by Alexander Rhatushnyak in July 2019. Now Saurabh Kumar, a graduate of IIT (Indian Institutes of Technology), has compressed the file to 114,156,155 bytes, surpassing the previous record by 1.04%. He did this using fast-cmix-hp, a speed optimization of STARLIT and cmix-hp, an open-source compressor that has gone through many versions over nearly a decade.
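For a sense of scale, the record size can be turned into a compression ratio. The short Python sketch below assumes that "1GB" means exactly 10^9 bytes; the function name is illustrative.

```python
ENWIK9_BYTES = 10**9        # assumption: 1GB taken as exactly 10^9 bytes
RECORD_BYTES = 114_156_155  # record size quoted in the article


def compression_stats(original_bytes: int, compressed_bytes: int) -> dict:
    """Return simple ratio metrics for a compressed file."""
    fraction = compressed_bytes / original_bytes
    return {
        "fraction_of_original": fraction,            # compressed / original
        "bits_per_byte": 8 * fraction,               # output bits per input byte
        "factor": original_bytes / compressed_bytes, # how many times smaller
    }


stats = compression_stats(ENWIK9_BYTES, RECORD_BYTES)
print(f"{stats['fraction_of_original']:.1%} of original, "
      f"{stats['bits_per_byte']:.3f} bits per byte")
```

Under that assumption the record is about 11.4% of the original size, or roughly 0.91 bits for every byte of input.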




