Advanced Hashing

Written by Mike James

Thursday, 22 April 2021

Article Index
Advanced Hashing
Extensible Hashing
Perfect Hashing

Page 3 of 3

Perfect hashing

This use of a table to construct a hash function produces excellent hash function behaviour but it also opens up another possibility.

As the table determines where any particular key will be hashed to and the table is something that we create why not try to create tables with advantageous properties. For example, why not test the quality of the hashing function by trying it out on a random selection of keys and see where they are hashed to. If the hash function produces a lot of collisions then you can scrap it and try again.

Of course being programmers the obvious thing to do is to write a program that generates hash functions and tests them automatically - a sort of designer hash function factory!

Now take this search for a good hashing function one tiny step further. Let's look for a perfect hash function! If you know that there is a particular set of data that you want to work with you could use this as your test set of keys and leave the hash function factory searching until it found a table based hash function that produces zero collisions! There is no guarantee that such a table will be found in a reasonable amount of time but this is a gamble worth taking.

A perfect hash function can be used to store the test set of keys without collision and so you can find them again with a single lookup.

Of course as soon as you move beyond the test set of keys collisions will happen but in some applications this might not be important.

For example, if you were writing a compiler then you could search for a perfect hash function for the keywords of the language as part of the tokenisation pass. In this case it would be even better if the hash function hashed each keyword onto consecutive values without gaps i.e. the first keyword hashed to 1, the second to 2 and so on.

Minimal perfect hash function

Such a hash function is called a minimal perfect hash function for the set of keys.

The uses of a minimal perfect hash function aren't that wide but they are important.

For example, in the case of the compiler tokenisation routine each identifier could be hashed in turn. The hash values of the keywords could be arranged to lie in a known range by using a minimal perfect hash function constructed using them. The non-keyword identifiers would be stored in the table in the usual way with collision handling provided.

Consider the problem of detecting particular key words within an input stream of characters. A minimal perfect hash function constructed for the words that you are trying to detect in the input stream would work very well as an alarm system. Simply hash the data stream and look out for the particular range of values that the keywords occupy. Notice that using this method you are testing for the presence of all the keywords with a single hash function!

Perfect hash functions sound useful but the difficulty is in constructing them. The crude trial and error method is very slow and to do the job properly you need to invent a few heuristics to guide the search by modifying the existing table, rather than starting with a completely fresh table each time.

hash3

Hashing - The Greatest Idea In Programming

The Bloom Filter

The Invertible Bloom Filter

Universal Hashing

Storage Mapping Functions

Inside Bitcoin - virtual currency

Assemblers and assembly language

What Programmers Know

knowcover

* Recently revised

Comments

or email your comment to: comments@i-programmer.info

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Virtual Memory

Virtual memory is a way of pretending that your computer has more memory than it really has. But like all good things it comes at a cost. Virtual memory is an example of trading speed for storage.

+ Full Article

Inside Random Numbers

We often refer to things that are unpredictable as being "random" but this is not the same as truly random behavior - which is something we have to work hard to achieve. Put another way - how can a lo [ ... ]

+ Full Article

Recent Articles

Recent Book Reviews

Popular Articles

Perfect hashing

Minimal perfect hash function

Related Articles

What Programmers Know

Contents

Comments