Google Open Sources Three New Hash Functions
Written by Alex Armstrong   
Thursday, 24 March 2016

OK, a hash function may not be a huge piece of software, but when you know how hard it is to find a good hash function having three more open source is to be welcomed. 



The first is a parallel implementation of SipHash, a fast cryptographically strong pseudo-random function. The Google version produces the same output as the standard SipHash but 1.5 times quicker due to the use of AVX-2 instructions. AVX instructions work with 128-bit vectors. SipHash is important because it is used to implement hash tables in a lot of modern languages, for example Python, Haskell, Ruby and Rust. It is also used in systems packages such as systemd and OpenDNS. The point really is that hash functions are everywhere. 

Fast random hash functions are needed in servers and language implementations because of the possibility of hash flooding attacks. A hash table works fast as long as the table has plenty of space and the hash function is good. However, if you present a hash table with a list of data that all have the same hash then they all want to be stored in the same location and every update after the first is more costly. This can be so costly as to bring a server to a halt.

If you think this is unlikely then in 2011 an attack on a PHP server with 500KB of carefully constructed POST data took a minute of CPU time to deal with. The solution is to use a random hash function which can be set to give a different set of hash values each time it is used on the same data, e.g. SipHash.


SipTreeHash is a variant on SipHash and doesn't give the same outputs. It works by slicing the input into 8-byte chunks and computes the SipHash on each chunk in parallel, again using AVX-2 instructions, and then combines the result using a final SipHash. The only problem is that AVX-2 instructions are only available on Intel's Haswell and later processors or on the soon-to-be-released AMD processors. 

As there is an overhead in performing the chunking, this is only faster than SipHash if the input is larger than 96 bytes. For large enough inputs SipTreeHash is up to three times faster than SipHash. 


The final new hash function is HighwayHash - where do they get the names from? In this case the hash function is completely new and designed to make best use of AVX-2 multiply and permute instructions. The basic idea is that the 32x32 bit multiplications used produce 64-bit results and so are not easy to reverse and the permutations even out the distribution of bytes. The payoff is that it is two to three times faster than SipTreeHash and hence up to six times faster than SipHash. 

The big problem is that this new hash function is not well studied and this is one of the reasons for open sourcing the code. It is hopped that HighwayHash will be examined and proved cryptographically strong. 

The code is available on GitHub and, as you might expect, it is all in C++ and complied using GCC. 


More Information

HighwayHash On GitHub

SipHash: a fast short-input PRF 


Related Articles

Hashing - The Greatest Idea In Programming 

Advanced Hashing 

The Reason For The Weird PHP Function Names

The Lost Art Of The Storage Mapping Function 

Universal Hashing 

The Invertible Bloom Filter 

The Bloom Filter 



To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on, Twitter, FacebookGoogle+ or Linkedin



Semantic Kernel for Java Now GA

The Microsoft Semantic Kernel (SK) is an open source lightweight framework that lets you easily mix conventional programming languages with AI "prompts". After a year of work since the initial re [ ... ]

Pg_lakehouse Makes PostgreSQL Quack

Pg_Lakehouse from ParadeDB is an extension that turns PostgreSQL into the analytical engine of DuckDB. Why is that useful? How do you use it?

More News


kotlin book



or email your comment to:

Last Updated ( Saturday, 26 March 2016 )