Hashing - The Greatest Idea In Programming
Written by Mike James   
Thursday, 03 September 2020
Article Index
Hashing - The Greatest Idea In Programming

The Password Mechanism

The fact that a hash can be used to label data is also the basis of most password systems in use. The basic idea is that the user invents a password and the system stores a hash of the password. When the user logs on the system applies the hash function to the password that the user has supplied and if it matches the stored hash then the user is allowed to proceed.

Notice that we have the usual problem with collisions - if the user gives the wrong password that just happens to give the same hash then they get in anyway. 

The big advantage of this scheme is that even if an attacker gets the hash value they cannot use it to log in. They need the password that generates the hash and this is difficult to find. The reason is that while it is easy to generate the hash from a password, it is hard to generate a password from the hash. In other words, hash(password) is easy to compute, but hash-1(value) is hard. 

Just because it is hard doesn't mean it is impossible and so called password crackers use big computers and GPUs to compute the inverse.  

Proof Of Work

One of the more unusual uses of the hash idea is in Bitcoin's proof of work algorithm. In this case a block of data is published and before a server can claim to have validated the transaction it has to find an item of data that when added to the block gives a hash with a given number of zeros. As computing the inverse hash function isn't easy the only way to do this is to add trial data and compute the hash until you find the result with the required number of zeros. This takes time and means that there is a high probability that just one server will find the answer first - and so avoid the problem of multiple servers validating the block. 

A nice touch it that the algorithm anticipated the fact that hardware would get better. It includes feedback by measuring the average time taken to solve the hash problem and adds additional zeros to the requirement. You can prove that the time taken to find some data to add to the block to give n zeros in the hash goes up exponentially with n. In other words the difficulty of the problem can be varied to keep the time to solve about the same. 

Once the problem is solved and the block declared valid the data is appended to the block so that the data it contains cannot be changed from that point on without invalidating the hash making the transaction tamper proof. 

Hash Functions In Algorithms

There are also lots of uses for hash functions to speed up algorithms. For example, if you need to weed out duplicate records in a set of size N you might end up comparing every possible pair, i.e. roughly N2/2 comparisons, or better sorting the table. An alternative is to compute hash values for all the records and store pointers to each record in a hash table. Duplicate records are then just collisions in the table.  

You can also use hash functions with additional properties to implement matching algorithms. If you can invent a hash function that maps data that you consider to be similar to hash values that are close, you can use hashing to search not just for duplicates, i.e. identical values, but values that are close. This approach is important in string matching, acoustic matching and even pattern recognition. 

There are lots of very clever algorithms that make use of the quick and easy way you can check for inequality between data. The basic principle is always that if two objects have different hash values then they can't be the same thing.

Make sure you check out the Bloom filter for example.  



Killer Algorithms



  1. Algorithms - clever ways of doing things
  2. Finding Things With Binary Search
  3. Hashing - The Greatest Idea In Programming 
  4. Advanced Hashing*
  5. Universal Hashing*
  6. QuickSort Exposed*
  7. Quick Median*
  8. The Fast Fourier Transform
  9. Backprop
  10. The Bloom Filter 
  11. The Invertible Bloom Filter !!!NEW
  12. Kalman Filter
  13. Public Key Encryption*
  14. Linear Programming

* First Draft

Related Articles

Storage Mapping Functions

Inside Bitcoin - virtual currency

Assemblers and assembly language       


To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.


kotlin book



or email your comment to: comments@i-programmer.info



We take multitasking for granted now but it was a difficult technology to get right - and still is. We take a look at how it all developed and at the variations on the basic idea.

Codd and His Twelve Database Rules

Theories of how we should organize databases are thin on the ground. The one exception is the work of E.F. Codd, the originator of the commandment-like “Codd’s Rules”. This approach to database  [ ... ]

Other Articles




Last Updated ( Thursday, 03 September 2020 )