The Lost Art Of The Storage Mapping Function
The Lost Art Of The Storage Mapping Function
Written by Harry Fairhead   
Tuesday, 05 November 2013
Article Index
The Lost Art Of The Storage Mapping Function
Practical SMF

You may not have heard of SMFs, Storage Mapping Functions, but you are likely to have used them. They tend to be overlooked because there are more exciting methods of implementing storage, such as hashing schemes, but really it all started right here with an SMF. 

A Necessary Skill

Back in the bad old days when languages were primitive and programmers had to punch cards, most knew about the academic sounding Storage Mapping Functions or SMFs.

In these days of data abstraction and encapsulation SMFs are less well known, but just as useful. In fact they are so important that you probably use the technique often without even realizing it and without knowing that it has a name. 

A storage mapping function is simply a function that gives the location in memory where something is stored. 

If you want to be abstract, then a SMF is a function F that takes a key i.e. something determines the data you want and returns the location of the data:


The nature of a storage mapping function changes according to the type of key used and, of course with the type of storage in use. 

Storage mapping functions were probably invented when programmers first needed to implement the one dimensional array. 

For example, an array of two byte integers can be implemented using the simple SMF:

location = base + 2*index

where base is the starting address of the array.

This works simply because the first array element is stored at base, the next is at base+2, the next at base+4 and in general the ith element is stored at base+2*i.

In general a one-dimensional array composed of elements that each occupy b bytes can be implemented using the SMF:

location = base +b*index

Obviously to find the location of a[i], assuming that you have decided to call the array a, all you have to do is work out base+b*i and then retrieve or store the value associated with a[i] there.

If the first array element is considered to a[0] then you have to work with i-1 making the SMF

location = base +b*(index-1)

You can argue that this is the reason that most arrays start from an index of zero - it avoids the messy subtract 1. 

This really is how assembly language programmers did the job. They didn't have arrays - they have to structure the storage they had using SMFs. Given the address of a block of memory then the SMF was used to work out where to store a value and where to retrieve it. 

If you know C you might just recognize the way pointer arithmetic works with arrays as being just a formalization of the array SMF. 

Into 2D

SMFs for one-dimensional arrays are a bit too simple to be interesting but at least they make good examples.

The same principle however applies to two-dimensional arrays.

A two-dimensional array a[i,j] of b byte elements with dimension n by m can be implemented using the SMF:

location = base + b*i + b*n*j

It is also assumed that the array indices all start at 0, if not then simply change i to i-1, and j to j-1 etc. 

The same idea can be extended to as many dimensions as you like but its when you get to 2D arrays that the meaning of a SMF starts to become clear. What you are doing is finding a function that maps the structure of the data onto a linear storage system. In the case of the 2D array you have a table and its rows are mapped onto linear storage one after the other:



Bitmaps And Stride

This is also the SMF used to map pixel data into linear storage be it memory or a file. If you have a bitmap stored in a byte array with nxm pixels and each pixel takes bpp bytes to store the pixel at x,y is stored starting at:


The only complication in the case of a bitmap image is that sometimes the length of a row has to be a multiple of four or some other value. In this case the length of a row isn't just the size given by the bitmap's dimensions. That is the row is padded beyond the m pixels to make it a multiple of what ever number is specified. The physical storage size of a row of a bitmap image is usually referred to as its "stride". This makes the SMF:


There are lots of other cases where you need to map a 2d structure onto a line and the same SMF is used with minor changes. If you need to map a n dimensional structure onto a line then the same sort of form works. For example suppose you have a cube of data n by m by p. Then you are going to store the data in row order and add the number of columns and the number of pages. So to find C[i,j,k] the SMF is:

location = i + n * j + n*m*k

If it takes b bytes to store an element then you multiply the SMF by b. 

The n is just the number of elements in a row and the n*m is the number of elements in a page. The same general method works for any dimension.  

Key Mapping and Trees

The two-dimensional array SMF is beginning to look a little more interesting but it still nothing more than an application of the idea of using a function f(key) to locate the data corresponding a key.

In the case of the one-dimensional array the key is simply the index i and the location of a[i] is given by f(i) and in the two-dimensional case the keys are i and j and the location of a[i,j] is given by f(i,j). Higher dimensional arrays can be treated in the same way.

At this point my guess is that many a high-level language programmer will be thinking how could an SMF be at all useful to me?

After all, high-level languages keep programmers well away from real memory locations. The answer is that while most languages provide simple data structures such as arrays, they often don't provide the rarer data structures such as trees. Even when they do there are times when an SMF implementation of a simple or exotic data structure can be useful. 

The trick is to use an array as if it was a chunk of old-fashioned linear memory and a suitable SMF to map the new data structure onto the array.


Well let's try a simple binary tree implementation.

A binary tree is a tree structure such that each node, apart from the final or terminal nodes, has exactly two children - hence the binary in binary tree.

Each node is associated with a value that takes n bytes to store then a suitable SMF is:

location = base + n*i + n*(2^j-1)

which gives the location of the value stored at ith node at the jth level. That is we are using i and j to "index" the tree by node number i at level j.

To see why this works consider the following small binary tree:




and so on.

Notice that going from one level to the next doubles the number of nodes and you should be able to work out that the number of nodes at level j is simply 2^j.

With this insight you should now also be able to see how the SMF works. The tree is stored as lines of nodes one level at a time. That is:

location    0   1 2   3 4 5 6
node        0 | 0 1 | 0 1 2 3
level       0 | 1   | 2

To make use of this SMF in a high-level language all you have to do is declare an array of elements that will hold the node values and use the SMF with n=1 to determine which element node i at level j is stored in.

For example, if the array is a[N] then node i at level j is stored in a[i+2^j-1].

You can see the general idea. If you have a data structure which has a regular pattern then you need to find a function that maps the regular pattern to a linear sequence of storage locations.

Basically you need a function which enumerates the elements of the data structure one after another. Inventing a storage mapping function can be tricky but its always possible - unless the data structure can't be enumerated.



Last Updated ( Tuesday, 05 November 2013 )

RSS feed of all content
I Programmer - full contents
Copyright © 2018 All Rights Reserved.
Joomla! is Free Software released under the GNU/GPL License.