Bloom filters are an essential component of an LSM-based database engine like MyRocks. This post will illustrate through a simple example how bloom filters work in MyRocks.
Why?
With MyRocks/RocksDB, data is stored in a set of large SST files. When MyRocks needs to find the value associated with a given key, it uses a bloom filter to guess if the key could potentially be in an SST file.
How?
A bloom filter is a space-efficient way of storing information about a list of keys. At its base, there is a bitmap and a hash function. The hash value of the keys stored in an SST is computed, and the results are used to set some bits to “1” in the bitmap. When you want to know if a key is present or not in the list, you run it through the hash function and check if the corresponding bits in the bitmap are …
[Read more]