Universal Hashing

### New Book Reviews!

 Universal Hashing
Written by Mike James
Monday, 27 June 2011
Article Index
Universal Hashing
Implementing hashing
Using columns

## Second Method

With a fast parity function the method described above can return a hash in m*b operations where b is the number of bits set in the intermediate result - say m^2/2 on average.

The second method can be more efficient  but it makes use of the columns in the matrices rather than the  rows. So in this case we need to generate m bit values to represent the columns - but why bother? It costs the same to generate to generate 32 bit positive values.

We can simply truncate the result at the end to m bits and get the same result as if we had truncated the columns to m bits before the calculation.

To avoid interacting with the previous code we can change the constructor to create a second random matrix in column order:

`Int32[] hashmatrix2 =new Int32[31];`
`for (int i = 0; i < 31; i++){ hashmatrix2[i] = R.Next();}`

Now the elements of the array, or at least the first m bits are regarded as the columns of the matrix.

Now two work out the hash we select the columns that correspond to the bits set in the data and exclusive or them together:

`public Int32 Hash2(int x){ Int32 sum = 0; for (int i = 0; i < 31; i++) {  if ((x & 0x01) == 0x01)  {   sum ^=hashmatrix2[i];  }  x >>= 1; } return sum>>(31-M);}`

At the end we truncate the result to m bits by right shifting the result down to leave only m bits.

With this in place you can now try out the alternative method of calculation.

`Int32 h2 = hashobj.Hash2(x);`

Notice that you get a different result because they are using two different matrices.

Can these routines be improved?

The answer is yes but how exactly depends on the language and architecture you are working with.

The most obvious improvement is to unroll any fixed sized loop to eliminate the loop structure. A more complex optimization would be to use a very long binary word representation or to use a GPU to parallelize the algorithms. Notice that either algorithm lends itself to being implemented in hardware quite easily.

If you need a universal family of hash functions to try out another algorithm then either of the two methods works well.

Random Matrix Hashing

You can download the code for both the Windows Forms version and the WPF version in a single ZIP file from the CodeBin (note you have to register first).

If you would like to be informed about new articles on I Programmer you can either follow us on Twitter or Facebook or you can subscribe to our weekly newsletter.

#### Related articles:

Hashing

Hashing solves one of the basic problems of computing - finding something that you have stored somewhere.

Extensible hashing and perfect hashing are ideas that are worth exploring.

The Bloom Filter

The Bloom Filter is an ingenious hashing algorithm used in Google's BigTable database

Last Updated ( Tuesday, 22 April 2014 )