A Practical Application - A Database Array
If binary trees seem too unlikely a topic to spark your interest in SMFs, then perhaps it is worth pointing out their use with disk files.
For example, suppose you need a 2D array bigger than RAM will allow. As long as you have a fast hard disk available, what could be easier than implementing a virtual array using a random access file or a database table?
Simply set up a random access file such that each record can hold exactly one value. Then, when you want to access the value of a[i,j], seek to record number i + n*j, where n is the size of the first dimension.
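As a minimal sketch of the idea, the following Python class keeps a 2D array of floating point values in a random access file and uses the SMF i + n*j to find each record. The class name DiskArray2D, the choice of an 8-byte float as the record, and the use of a temporary file are assumptions made for illustration only.

```python
import struct
import tempfile

class DiskArray2D:
    """A 2D array stored in a random access file, one value per record."""
    RECORD = struct.Struct("<d")  # assumed record format: one 8-byte float

    def __init__(self, file, n, m):
        self.file = file
        self.n = n  # size of the first dimension
        self.m = m  # size of the second dimension
        # Pre-size the file with n*m zeroed records
        file.seek(0)
        file.write(b"\x00" * (self.RECORD.size * n * m))

    def _seek(self, i, j):
        if not (0 <= i < self.n and 0 <= j < self.m):
            raise IndexError((i, j))
        record = i + self.n * j  # the SMF: record number for a[i,j]
        self.file.seek(record * self.RECORD.size)

    def __getitem__(self, ij):
        self._seek(*ij)
        return self.RECORD.unpack(self.file.read(self.RECORD.size))[0]

    def __setitem__(self, ij, value):
        self._seek(*ij)
        self.file.write(self.RECORD.pack(value))

with tempfile.TemporaryFile() as f:
    a = DiskArray2D(f, n=3, m=4)
    a[2, 1] = 42.5
    print(a[2, 1])  # 42.5
    print(a[0, 0])  # 0.0
```

Every read or write costs one seek, which is why storing a whole row per record, as suggested next, is usually worthwhile.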
You can probably work out for yourself various ways of making the process more efficient - for example by defining a record large enough to hold a complete row of the array etc. The same principles apply to creating any virtual data structure on disk - store each element of the data structure in a record and use an SMF to find which record corresponds to each element.
It is also worth pointing out that this is essentially how databases are implemented. In this case you set up a file of a given size and consider it to consist of fixed-size records. If each record uses b bytes of storage then, counting records from zero, the location of the ith record is:
location = i*b
and this is where you perform a random access seek to, before reading in b bytes to get the ith record.
In many cases you don't look up the database using the record number but using a more general key. In this case there is an extra step - look up the record number you need in a key table. In other words, if you want to find the record containing the data corresponding to "key", you first look "key" up in the index, which gives the record number.
This works in all cases except when the record size is variable, which needs a different approach.
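The two steps, key to record number and record number to byte offset, can be sketched as follows. This is only an illustration: the record size of 16 bytes, the function names, the space padding and the use of an in-memory dictionary as the key table are all assumptions, and io.BytesIO stands in for a real disk file.

```python
import io

RECORD_SIZE = 16  # b, the bytes per record (an illustrative choice)

def write_record(f, i, data):
    """Write data into record i, padding it with spaces to the fixed size."""
    if len(data) > RECORD_SIZE:
        raise ValueError("data does not fit in one record")
    f.seek(i * RECORD_SIZE)  # record i starts at byte offset i*b
    f.write(data.ljust(RECORD_SIZE, b" "))

def read_record(f, i):
    """Seek to record i and read exactly b bytes (trailing padding removed)."""
    f.seek(i * RECORD_SIZE)
    return f.read(RECORD_SIZE).rstrip(b" ")

def store(f, index, key, data):
    """Give a new key the next free record number, then write its record."""
    i = index.setdefault(key, len(index))
    write_record(f, i, data)

def lookup(f, index, key):
    """Extra step: key -> record number via the key table, then seek."""
    return read_record(f, index[key])

db = io.BytesIO()  # stand-in for a random access disk file
index = {}         # the key table: key -> record number
store(db, index, "smith", b"Smith, J")
store(db, index, "jones", b"Jones, A")
print(lookup(db, index, "jones"))  # b'Jones, A'
```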
Hashing - Just Another SMF
If you are still unimpressed by the idea of an SMF then perhaps my last example will please you. The whole idea of an SMF can be generalised to include a function that maps some elements into the same storage location.
This may seem like a crazy idea but you might have come across it before under the names "scatter storage" or "hash functions". Whatever you call it, it has to be one of the nicest ideas in the whole of computing - see Hashing - The Greatest Idea In Programming.
The principle of an SMF is that given one data value, the key, you can find the location of an associated data value using f(key).
All of the SMFs we have looked at so far have been very regular. They make use of the regularity of the data to map it to one-after-the-other sequential storage - but sometimes this isn't necessary.
Suppose you can find a function, any old function, f(key) that gives you a location for all possible values of the key and in most but not all cases gives you different locations for different keys - then why not use it?
For example, suppose you want to store words in an array. You could use the SMF given by adding together the ASCII codes of the first two letters and subtracting 128.
For example, f(CAT) would be 67+65-128 (ASCII codes of C and A minus 128) or 4. This means that you could store CAT in location 4. In the same way f(DOG) is 68+79-128 = 19 and so DOG would be stored in location 19.
This works just as well as a regular SMF but with one problem: sometimes two different keys will be mapped to the same location. For example, as only the first two letters are used, f(CAR) is the same as f(CAT) and we would attempt to store the two at the same place.
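The example function is short enough to write out directly. This sketch assumes the words are plain ASCII strings of at least two letters; the Python name f simply mirrors the f(key) used in the text.

```python
def f(word):
    """Sum of the ASCII codes of the first two letters, minus 128."""
    return ord(word[0]) + ord(word[1]) - 128

print(f("CAT"))  # 4
print(f("DOG"))  # 19
print(f("CAR"))  # 4, the same location as CAT
```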
This is called a collision, and different scatter storage or hashing schemes deal with the problem in different ways. The easiest thing to do is to check whether the location given by f(key) has already been used and, if it has, check f(key)+1, f(key)+2 and so on until you find a free location. There are lots of variations on collision management, but this linear search, usually called linear probing, is simple and fairly efficient as long as the table is not close to full.
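A minimal sketch of this collision scheme is shown below. The table size of 64, the wrap-around at the end of the table and the function names insert and find are assumptions added for illustration; the text only specifies trying f(key)+1, f(key)+2 and so on.

```python
TABLE_SIZE = 64  # assumed size; f of two uppercase letters is at most 52

def f(word):
    """Sum of the ASCII codes of the first two letters, minus 128."""
    return ord(word[0]) + ord(word[1]) - 128

def insert(table, word):
    """Store word at f(word), or at the next free location after it."""
    start = f(word) % len(table)
    for step in range(len(table)):
        slot = (start + step) % len(table)  # wrap around (an assumption)
        if table[slot] is None or table[slot] == word:
            table[slot] = word
            return slot
    raise RuntimeError("table is full")

def find(table, word):
    """Return the location of word, or -1 if it is not in the table."""
    start = f(word) % len(table)
    for step in range(len(table)):
        slot = (start + step) % len(table)
        if table[slot] is None:  # an empty slot ends the search
            return -1
        if table[slot] == word:
            return slot
    return -1

table = [None] * TABLE_SIZE
print(insert(table, "CAT"))  # 4
print(insert(table, "CAR"))  # 5, because location 4 is already taken
print(insert(table, "DOG"))  # 19
print(find(table, "CAR"))    # 5
print(find(table, "COW"))    # -1
```

Note that this sketch never deletes entries. Removing a word would leave a gap that could end a later search too early, which is one reason real schemes need more care.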
If you don't see the point of hashing functions then try the problem of storing and subsequently finding names in an array. Without hash functions you either have to perform a sequential search of the array, or sort the array and perform a binary search. The former is inefficient, and the latter is more complex and carries the overhead of a complete sort. A hashing function gives you the location of any word in one evaluation, and even if there is a collision you should find the word after a short linear search.
There is a lot more to be said about hash functions but the main thing is that you see them as nothing more than slightly odd SMFs.
You can think of hashing as using a chaotic or pseudo-random SMF.
SMFs Hidden Rather Than Forgotten
So it looks as if Storage Mapping Functions are alive and well after all. Perhaps the sub-title of this article should have been the "hidden" rather than "lost" art.
Hashing - The Greatest Idea In Programming
The Bloom Filter
The Invertible Bloom Filter
Inside Bitcoin - virtual currency
Assemblers and assembly language