| The Invertible Bloom Filter |
| Written by Mike James | ||||||||
| Friday, 15 July 2022 | ||||||||
Page 2 of 2
RetrievalThis is all fairly simple but notice that due to hash collisions some elements of the array may be used to store more than one data element. You can tell which these are by the value of the count field. Any array element that has a count field greater than one cannot be used to retrieve the data as the value and key are storing something that results from XORing with multiple data values. However, any item of data is stored in k different locations and for there to be no location with a count of one would mean that all of the hash functions had been subject to a collision - and that's not very likely. So what this means is we can implement a probabilistic retrieval operation:
You can see the overall idea search each of the locations indicated by each has function for an entry that has a count of 1. Check that the keys match and return the value. If you can't find such an entry or if the keys don't match then return a null to indicated that that the item isn't in the filter. Why do we need to check the key value? The answer is that just as in the case of the standard Bloom filter is is possible that due to collisions all of the entries could have been generated by other data items. So even if all of the entries you check have non-zero counts they could have been generated by chance due to collisions and the only way to rule this out is to store and check the key. Of course there is a price to pay for this fast retrieval. It is possible that the data item is stored in the array but hash collisions have stored other data items in the same locations. In this case the method returns a not_found to indicate that the value may be in the filter even though it cannot be retrieved. This corresponds to the following operation: GET(x): return the value y such that there is a key-value pair, (x; y), in B. If null is returned, then (x; y) is not in B for any value of y. With low (but constant) probability, this operation may fail, returning a “not found” error condition. In this case there may or may not be a key-value pair (x; y) in B. DeleteThe next operation to consider is how to delete or remove an entry. This is surprisingly easy as it corresponds to an insert operation but one that decrements the count. All you have to do is compute the hash functions and XOR the values again but subtract one from the count:
This works because the XOR undoes the previous XOR operation - recall that it is its own inverse. Notice that deleting a data value might reduce an element's count back to one and so it undoes any previous collisions. This is, in fact, also the way you can list as many values as possible from those that have been stored in the filter. The operation corresponds to: DELETE(x; y): delete the key-value pair, (x; y), from B. This operation always succeeds, provided (x; y) is in B.
|
| Article Index |
|---|
| Hashing - The Greatest Idea In Programming |
Comments
or email your comment to: comments@i-programmer.info
To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.
The Lost Art Of The Storage Mapping Function You may not have heard of SMFs, Storage Mapping Functions, but you are likely to have used them. They tend to be overlooked because there are more exciting methods of implementing storage, such as has [ ... ] |
Bus Basics Buses are everywhere and yes when you are looking for one they tend to come in threes! With that joke out of the way, let’s take a look at what a bus is in general and in particular. |
| Other Articles |




