Hashing - The Greatest Idea In Programming
Written by Mike James   
Thursday, 03 September 2020
Article Index
Hashing - The Greatest Idea In Programming
Collisions
Passwords

Although it is a matter of opinion, you can't help but admire the idea of the hash function. It not only solves one of the basic problems of computing - finding something that you have stored somewhere - but it helps with detecting file tampering, password security and more.

 

Killer Algorithms

killercover

Contents

  1. Algorithms - clever ways of doing things
  2. Finding Things With Binary Search
  3. Hashing - The Greatest Idea In Programming 
  4. Advanced Hashing*
  5. Universal Hashing*
  6. QuickSort Exposed*
  7. Quick Median*
  8. The Fast Fourier Transform
  9. Backprop
  10. The Bloom Filter 
  11. The Invertible Bloom Filter !!!NEW
  12. Kalman Filter
  13. Public Key Encryption*
  14. Linear Programming

* First Draft

One of the basic problems of computing is finding something that you have stored somewhere.

For example, if you are keeping a table of names and telephone numbers how can you organise the list so that you can find the phone number given the name?

The most obvious solution is to keep the list sorted into alphabetical order and use some sort of search, usually binary, to locate the name. This is efficient but only as long as the list is kept in strict order.

If you only have to add names and number infrequently then you could use an overflow table that isn't sorted into order. Then to find a name you would use a binary search on the sorted table and if you didn't find the name there a linear search on the overflow table would either find the name or confirm that the name wasn't known.

Using an overflow table in this way works as long as the overflow table is keep short. If you are adding a lot of names then the time to perform the linear search in the overflow table will quickly become too much. In this case it is time to think of hashing.

Hashing is one of the great ideas of computing and every programmer should know something about it. The basic idea is remarkably simple, in fact it is so simple you can spend time worrying about where the magic comes from.

Suppose you have a function,

hash(name)

that will compute a number in the range 1 to N depending in some way on the value of name then why not store the name, and phone number at hash(name) in the table.

Notice that the hash function is a bit strange. It has to be completely deterministic in the sense that you give it a name and it always gives you the same number. However it also has to provide you with a different number for each name and as we will see this isn't always possible. 

Using this scheme finding a name in the table is just a matter of computing hash(name) and looking at this location in the table to see if name is stored there!

Yes, that's all there is to it. Well there are a few practical details but using hash(name) to give the location where name is stored is the key idea.

An Example

To make sure that there can be no confusion about how simple hashing really is let's look at a small example.

If the hashing function is the sum of the character codes of the first two letters of the name minus 128 i.e. 

hash(ABCDE)=ASC(A)+ASC(B)-128

where ASC returns the character code, then

hash("JAMES")

would be stored at

ASC("J")+ASC("A")-128

or, in plain English, at location 11 in the table.

If you want to know if you have a phone number for JAMES you simply compute hash("JAMES") and look in location 11 to see if it's there.

No sorting and no searching required. When you compute the hash function you know where to store the data and you know where to find the data.

The key idea is that a hash function takes in text or any sort of data and outputs a set of numbers based on that data.

hash3

 

This, or something similar, is the way most computer languages implement advanced data structures such as dictionaries are implemented using hashing. For example in Python a dictionary object is implemented using a hash table. So even if you don't explicitly build a hash table you could well be using one. 



Last Updated ( Thursday, 03 September 2020 )