An implementation
To demonstrate how the idea works let's create a small class that implements both approaches to the random binary matrix hash.
First we need to create the random matrix. Instead of working with a bit array it makes more sense to use an Int32 for each row of the matrix and assume that the input data is also an Int32. For simplicity we use only the lower 31 bits, which keeps both the rows and the data in the positive number range.
So start a new C# project and add a new class complete with random number generator and Int32 array ready to hold the rows of the bit matrix:
class Mhash
{
    Random R = new Random();
    Int32[] hashmatrix;
    Int32 M = 0;
The constructor creates the random matrix with m rows and stores m for other methods to make use of:
public Mhash(Int32 m)
{
    M = m;
    hashmatrix = new Int32[M];
    for (int i = 0; i < M; i++)
    {
        hashmatrix[i] = R.Next();
    }
}
Once the constructor is finished we can use it to create a hash object capable of taking a positive Int32 and returning an m bit hash.
The next step is to create the method that does this job:
public Int32 Hash1(Int32 x)
{
    Int32 hash = 0;
    for (int i = 0; i < M; i++)
    {
        hash = hash << 1;
        hash = hash | parity(x & hashmatrix[i]);
    }
    return hash;
}
The method takes each row of the matrix and ands it with the data. The result is then converted by the parity function into a 0, for even parity or a 1, for odd parity. Don't worry how parity works for the moment. At the end of the routine we have an m bit hash to return.
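To see the row-by-row calculation with concrete numbers, here is a minimal sketch that uses a fixed three-row matrix instead of a random one. The class name MatrixHashDemo and its methods are hypothetical helpers invented for this illustration, and the parity used here is a plain shift-and-test loop, not the faster version developed later.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Mhash class.
static class MatrixHashDemo
{
    // Parity of the set bits in p: 1 if the count is odd, 0 if it is even.
    // A plain shift-and-test loop, used here only to keep the sketch self-contained.
    public static int Parity(int p)
    {
        uint u = (uint)p;          // unsigned so the right shift always reaches zero
        int flag = 0;
        while (u != 0)
        {
            flag ^= (int)(u & 1);  // fold in the lowest bit
            u >>= 1;
        }
        return flag;
    }

    // Same loop as Hash1, but the matrix rows are passed in so the result is repeatable.
    public static int Hash(int[] rows, int x)
    {
        int hash = 0;
        foreach (int row in rows)
        {
            hash <<= 1;                  // make room for the next hash bit
            hash |= Parity(x & row);     // parity of the masked data becomes that bit
        }
        return hash;
    }
}
```

With the rows 5, 3 and 6 (binary 101, 011 and 110) and the data value 3 (binary 011), the three masked values are 001, 011 and 010, with parities 1, 0 and 1, so MatrixHashDemo.Hash(new[] { 5, 3, 6 }, 3) returns binary 101, which is 5.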
If you want to write the method a little more compactly you could write the body of the for loop as:
for (int i = 0; i < M; i++)
{
    hash <<= 1;
    hash |= parity(x & hashmatrix[i]);
}
Now to try it out all you have to do is:
Mhash hashobj = new Mhash(8);
Int32 h1 = hashobj.Hash1(x);
and you have an 8-bit hash for any positive Int32 you care to supply.
Parity
This looks like a very easy and very efficient way of creating an m bit hash needing only m ands and m shifts. The problem is that there is still the parity function to write.
As this needs to examine each bit to count the number of ones in the data, it seems to need one operation per bit of the value. In fact the job can be done in a number of operations equal to the number of bits set. This algorithm is attributed to Brian Kernighan, and it's based on a technique well worth knowing: how do you zero the least significant set bit?
At first it seems impossibly difficult to do without using shifts and tests to find the first bit set. However, if you start out with a value and subtract 1, this always zeros the least significant bit that was set. Think about it...
For example:
101110 - 1 = 101101
or
101100 - 1 = 101011
and so on. At this point you might think that no progress has been made because, while you have zeroed the least significant set bit, you have also set all of the bits below it. Of course the trick is that the original data already had these lower bits zeroed. So if you AND the new value with the original, the result has all of those bits zeroed in addition to the least significant set bit.
Thus to zero the least significant set bit in x all you have to do is:
x = x & (x - 1);
If you keep doing this then eventually you will zero all of the bits and x will be zero.
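As a quick check of this claim, the following sketch records the value of x after each bit-clearing step. The class LowBitDemo and its method ClearSteps are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
static class LowBitDemo
{
    // Returns the successive values of x as its lowest set bit is cleared, ending at zero.
    public static List<int> ClearSteps(int x)
    {
        var steps = new List<int>();
        while (x != 0)
        {
            x &= x - 1;      // clear the least significant set bit
            steps.Add(x);
        }
        return steps;
    }
}
```

Starting from 46 (binary 101110) the recorded values are 44, 40, 32 and 0, that is 101100, 101000, 100000 and 000000. There are four steps because the starting value has four bits set.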
Now you can put this to use in a parity function that only iterates the number of set bits times:
Int32 parity(Int32 p)
{
    int flag = 0;
    while (p != 0)
    {
        flag = flag ^ 1;
        p = p & (p - 1);
    }
    return flag;
}
Once again this can be written more compactly as:
Int32 parity(Int32 p)
{
    int flag = 0;
    while (p != 0)
    {
        flag ^= 1;
        p &= (p - 1);
    }
    return flag;
}
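If you want some reassurance that the bit-clearing parity gives the same answer as examining every bit, one possible check is sketched below. The class ParityCheck and the methods NaiveParity and Agree are hypothetical names made up for this test, and the range tested is an arbitrary choice.

```csharp
using System;

// Hypothetical test harness for illustration only.
static class ParityCheck
{
    // The bit-clearing parity from the text: one loop iteration per set bit.
    public static int Parity(int p)
    {
        int flag = 0;
        while (p != 0)
        {
            flag ^= 1;       // toggle once for each set bit found
            p &= p - 1;      // clear the least significant set bit
        }
        return flag;
    }

    // Reference version: inspects all 32 bits one at a time.
    public static int NaiveParity(int p)
    {
        int flag = 0;
        for (int i = 0; i < 32; i++)
        {
            flag ^= (p >> i) & 1;
        }
        return flag;
    }

    // Returns true if both versions agree on every value from 0 up to limit - 1.
    public static bool Agree(int limit)
    {
        for (int p = 0; p < limit; p++)
        {
            if (Parity(p) != NaiveParity(p)) return false;
        }
        return true;
    }
}
```

Calling ParityCheck.Agree(1 << 16) compares the two versions on the first 65536 non-negative values. The reference version always loops 32 times, whereas the bit-clearing version loops only once per set bit.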
