Hashing - The Greatest Idea In Programming

Written by Mike James

Thursday, 03 September 2020

Article Index
Hashing - The Greatest Idea In Programming
Collisions
Passwords

Page 3 of 3

The Password Mechanism

The fact that a hash can be used to label data is also the basis of most password systems in use. The basic idea is that the user invents a password and the system stores a hash of the password. When the user logs on the system applies the hash function to the password that the user has supplied and if it matches the stored hash then the user is allowed to proceed.

Notice that we have the usual problem with collisions - if the user gives the wrong password that just happens to give the same hash then they get in anyway.

The big advantage of this scheme is that even if an attacker gets the hash value they cannot use it to log in. They need the password that generates the hash and this is difficult to find. The reason is that while it is easy to generate the hash from a password, it is hard to generate a password from the hash. In other words, hash(password) is easy to compute, but hash^-1(value) is hard.

Just because it is hard doesn't mean it is impossible and so called password crackers use big computers and GPUs to compute the inverse.

Proof Of Work

One of the more unusual uses of the hash idea is in Bitcoin's proof of work algorithm. In this case a block of data is published and before a server can claim to have validated the transaction it has to find an item of data that when added to the block gives a hash with a given number of zeros. As computing the inverse hash function isn't easy the only way to do this is to add trial data and compute the hash until you find the result with the required number of zeros. This takes time and means that there is a high probability that just one server will find the answer first - and so avoid the problem of multiple servers validating the block.

A nice touch it that the algorithm anticipated the fact that hardware would get better. It includes feedback by measuring the average time taken to solve the hash problem and adds additional zeros to the requirement. You can prove that the time taken to find some data to add to the block to give n zeros in the hash goes up exponentially with n. In other words the difficulty of the problem can be varied to keep the time to solve about the same.

Once the problem is solved and the block declared valid the data is appended to the block so that the data it contains cannot be changed from that point on without invalidating the hash making the transaction tamper proof.

Hash Functions In Algorithms

There are also lots of uses for hash functions to speed up algorithms. For example, if you need to weed out duplicate records in a set of size N you might end up comparing every possible pair, i.e. roughly N²/2 comparisons, or better sorting the table. An alternative is to compute hash values for all the records and store pointers to each record in a hash table. Duplicate records are then just collisions in the table.

You can also use hash functions with additional properties to implement matching algorithms. If you can invent a hash function that maps data that you consider to be similar to hash values that are close, you can use hashing to search not just for duplicates, i.e. identical values, but values that are close. This approach is important in string matching, acoustic matching and even pattern recognition.

There are lots of very clever algorithms that make use of the quick and easy way you can check for inequality between data. The basic principle is always that if two objects have different hash values then they can't be the same thing.

Make sure you check out the Bloom filter for example.

hash3

Killer Algorithms

killercover

Algorithms - clever ways of doing things
Finding Things With Binary Search
Hashing - The Greatest Idea In Programming
Advanced Hashing*
Universal Hashing*
QuickSort Exposed*
Quick Median*
The Fast Fourier Transform
Backprop
The Bloom Filter
The Invertible Bloom Filter !!!NEW
Kalman Filter
Public Key Encryption*
Linear Programming

* First Draft

Article Index
Hashing - The Greatest Idea In Programming

Storage Mapping Functions

Inside Bitcoin - virtual currency

Assemblers and assembly language

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Facebook or Linkedin.

Comments

or email your comment to: comments@i-programmer.info

Programmer's Guide To Theory - Splitting the Bit

Information theory – perhaps one of the most remarkable inventions of the twentieth century - naturally leads on to the consideration of how information can be coded and hence coding theory.

+ Full Article

Inside Random Numbers

We often refer to things that are unpredictable as being "random" but this is not the same as truly random behavior - which is something we have to work hard to achieve. Put another way - how can a lo [ ... ]