New Hutter Prize Milestone For Lossless Compression
Written by Mike James
Friday, 06 August 2021
A new milestone has been achieved in the endeavour to develop a lossless compression algorithm. Artemiy Margaritov, a researcher at the University of Edinburgh, has been awarded a prize of 9,000 Euros ($10,632) for beating the previous Hutter Prize benchmark by 1.13%.
The Hutter Prize for Lossless Compression of Human Knowledge was launched in 2006 with its stated aim being:
to encourage the development of intelligent compressors/programs as a path to AGI [Artificial General Intelligence].
This challenge was initiated by Marcus Hutter, author of the seminal 2005 book "Universal Artificial Intelligence". Formerly a professor at the Research School of Computer Science at the Australian National University, Hutter is now a Senior Scientist at DeepMind researching the mathematical foundations of artificial general intelligence (AGI). Explaining why he is funding a data compression contest that furthers research into AGI, he states:
This compression contest is motivated by the fact that being able to compress well is closely related to acting intelligently, thus reducing the slippery concept of intelligence to hard file size.
As we reported at the time, last year Hutter increased the size of both the task and the reward by a factor of ten. So there is now 500,000 Euros to be won, paid out for incremental improvements in data compression of an excerpt from Wikipedia.
Originally the task was to losslessly compress the 100 MB file enwik8, with a baseline compressed size of 18,324,887 bytes. The new challenge is to compress the 1 GB file enwik9 to less than 116 MB.
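To put those figures in perspective, a quick calculation (a sketch only; enwik8 and enwik9 are exactly 10^8 and 10^9 bytes, and 116 MB is the stated target, not a winning entry's size) shows the compression ratios involved:

```python
# Rough compression ratios implied by the figures above.
enwik8_size = 10**8            # enwik8 is exactly 100,000,000 bytes
enwik8_baseline = 18_324_887   # original baseline compressed size in bytes
enwik9_size = 10**9            # enwik9 is exactly 1,000,000,000 bytes
enwik9_target = 116_000_000    # "less than 116 MB"

ratio8 = enwik8_baseline / enwik8_size   # fraction of original size
ratio9 = enwik9_target / enwik9_size

print(f"enwik8 baseline ratio: {ratio8:.1%}")   # about 18.3%
print(f"enwik9 target ratio:   {ratio9:.1%}")   # 11.6%
```

So the revised challenge demands not just ten times the data, but a noticeably tighter ratio than the old baseline.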
The first winner since the revised challenge was launched is Artemiy Margaritov. In May 2021 his STARLIT algorithm beat the previous benchmark, set by Alexander Rhatushnyak in July 2019. Because the award carries a bonus in proportion to the time elapsed since the last record, his prize was raised by 60% to €9,000.
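The figures above are consistent with a back-of-the-envelope check. Assuming the base award is the prize fund scaled by the relative improvement (an assumption about the prize formula, not stated in this article), the numbers line up with the €9,000 reported:

```python
prize_fund = 500_000    # Euros in the prize pool
improvement = 0.0113    # 1.13% relative improvement over the old record
time_bonus = 0.60       # 60% bonus for the time since the last record

base_award = prize_fund * improvement    # scaled by relative improvement
total = base_award * (1 + time_bonus)
print(round(total))                      # close to the 9,000 Euros awarded
```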
STARLIT compresses enwik9 by first reordering its articles to maximize the mutual information between consecutive articles. It then applies a dictionary preprocessor and compresses with a reduced version of cmix, a neural network based program, modified to cut memory usage from 32 GB to 10 GB and to run faster. The dictionary and the article-order list (both text files) are themselves compressed with the reduced cmix and appended to the executable, whose size grows from 124,984 bytes to 401,505 bytes. To decompress, archive9 is run; it extracts the dictionary, the article-order list and 17 GB of temporary files and, roughly two days later, produces the output as a file named enwik9_uncompressed. The original article order is restored by sorting the titles alphabetically. The improvements seem to be due more to tuning than to any breakthrough.
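The reordering idea can be illustrated with a toy sketch. This is not STARLIT's actual method (which the article does not detail); it is a greedy pass, with shared-word overlap as a crude stand-in for mutual information, that places each article next to the one it most resembles so that a context-modelling compressor sees related text together:

```python
# Toy illustration only: greedily order articles so each one shares as
# many words as possible with its predecessor, a rough proxy for
# maximizing mutual information between consecutive articles.

def greedy_order(articles):
    """Return a list of indices ordering articles by word overlap."""
    words = [set(a.lower().split()) for a in articles]
    remaining = set(range(len(articles)))
    order = [0]                    # arbitrary starting article
    remaining.discard(0)
    while remaining:
        prev = words[order[-1]]
        # pick the remaining article sharing the most words with the last pick
        nxt = max(remaining, key=lambda i: len(prev & words[i]))
        order.append(nxt)
        remaining.discard(nxt)
    return order

articles = [
    "the cat sat on the mat",
    "quantum field theory lectures",
    "the dog sat on the cat",
    "lectures on quantum mechanics",
]
print(greedy_order(articles))  # similar articles end up adjacent
```

Note that, as in STARLIT, the ordering itself must be stored (here, the index list) so the decompressor can restore the original sequence.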