|Hashing - The Greatest Idea In Programming|
|Written by Mike James|
|Thursday, 03 September 2020|
Page 3 of 3
The Password Mechanism
The fact that a hash can be used to label data is also the basis of most password systems in use. The basic idea is that the user invents a password and the system stores a hash of the password. When the user logs on the system applies the hash function to the password that the user has supplied and if it matches the stored hash then the user is allowed to proceed.
Notice that we have the usual problem with collisions - if the user gives the wrong password that just happens to give the same hash then they get in anyway.
The big advantage of this scheme is that even if an attacker gets the hash value they cannot use it to log in. They need the password that generates the hash and this is difficult to find. The reason is that while it is easy to generate the hash from a password, it is hard to generate a password from the hash. In other words, hash(password) is easy to compute, but hash-1(value) is hard.
Just because it is hard doesn't mean it is impossible and so called password crackers use big computers and GPUs to compute the inverse.
Proof Of Work
One of the more unusual uses of the hash idea is in Bitcoin's proof of work algorithm. In this case a block of data is published and before a server can claim to have validated the transaction it has to find an item of data that when added to the block gives a hash with a given number of zeros. As computing the inverse hash function isn't easy the only way to do this is to add trial data and compute the hash until you find the result with the required number of zeros. This takes time and means that there is a high probability that just one server will find the answer first - and so avoid the problem of multiple servers validating the block.
A nice touch it that the algorithm anticipated the fact that hardware would get better. It includes feedback by measuring the average time taken to solve the hash problem and adds additional zeros to the requirement. You can prove that the time taken to find some data to add to the block to give n zeros in the hash goes up exponentially with n. In other words the difficulty of the problem can be varied to keep the time to solve about the same.
Once the problem is solved and the block declared valid the data is appended to the block so that the data it contains cannot be changed from that point on without invalidating the hash making the transaction tamper proof.
Hash Functions In Algorithms
There are also lots of uses for hash functions to speed up algorithms. For example, if you need to weed out duplicate records in a set of size N you might end up comparing every possible pair, i.e. roughly N2/2 comparisons, or better sorting the table. An alternative is to compute hash values for all the records and store pointers to each record in a hash table. Duplicate records are then just collisions in the table.
You can also use hash functions with additional properties to implement matching algorithms. If you can invent a hash function that maps data that you consider to be similar to hash values that are close, you can use hashing to search not just for duplicates, i.e. identical values, but values that are close. This approach is important in string matching, acoustic matching and even pattern recognition.
There are lots of very clever algorithms that make use of the quick and easy way you can check for inequality between data. The basic principle is always that if two objects have different hash values then they can't be the same thing.
Make sure you check out the Bloom filter for example.
* First Draft
or email your comment to: email@example.com
|Last Updated ( Thursday, 03 September 2020 )|