chg: [blog] fixed

pull/99/head
Alexandre Dulaunoy 2024-03-25 11:16:31 +01:00
parent 983fd8969c
commit e2b045d05e
No known key found for this signature in database
GPG Key ID: 09E2CD4944E6CBCD
1 changed files with 1 additions and 1 deletions


@@ -52,7 +52,7 @@ Another surprising thing is the very high value of `m` which is `uint64::MAX - 5
Everything was going fine, until we looked for further optimizations to apply.
Digging into the implementations of other data structures (like hash tables), we noticed that a nice optimization can be applied to the bit **index generation algorithm**. This optimization consists of using a bitset (holding all the bits of the filter) whose size is a **power of two**. If we do so, instead of computing the bit index with a modulo, we can compute it with a bitwise `and` operation. This comes from the following nice property: for any `m` that is a power of two, `i % m == i & (m - 1)`. It is important to understand that this particular optimization saves CPU cycles, as a bitwise `and` is cheaper than the division a modulo requires.
Digging into the implementations of other data structures (like hash tables), we noticed that a nice optimization can be applied to the **bit index generation algorithm**. This optimization consists of using a bitset (holding all the bits of the filter) whose size is a **power of two**. If we do so, instead of computing the bit index with a modulo, we can compute it with a bitwise `and` operation. This comes from the following nice property: for any `m` that is a power of two, `i % m == i & (m - 1)`. It is important to understand that this particular optimization saves CPU cycles, as a bitwise `and` is cheaper than the division a modulo requires.
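The property above can be sketched in Rust as follows. This is only an illustration: the function names `index_mod` and `index_mask` are hypothetical and are not taken from the bloom implementation discussed here.

```rust
/// Bit index computed with a modulo; valid for any non-zero `m`.
fn index_mod(i: u64, m: u64) -> u64 {
    i % m
}

/// Bit index computed with a mask; only valid when `m` is a power of two.
fn index_mask(i: u64, m: u64) -> u64 {
    // Guard the assumption in debug builds: the mask trick is wrong otherwise.
    debug_assert!(m.is_power_of_two());
    i & (m - 1)
}
```

For `m = 1024`, both functions return the same index for every `u64` input. For a size that is not a power of two, such as `m = 6`, the two expressions disagree (`10 % 6` is `4`, while `10 & 5` is `0`), which is why the filter size has to be rounded to a power of two first.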
Applying this to a bloom filter is fairly trivial: when computing the desired size (in bits) of the bloom filter, we just need to take **the next power of two**. When implementing this with our current Rust port of the [bloom](https://github.com/DCSO/bloom) library, we realized the implementation was broken. When the bit size of the filter becomes a power of two, the **real false positive probability** of the filter literally explodes (**x20** in some cases). Without going further into the details, we believe the weakness comes from the **bit index generation** algorithm described above. A [GitHub issue](https://github.com/DCSO/bloom/issues/19) has been opened in the original project to track this issue.
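A minimal sketch of the sizing step, assuming the textbook formula `m = ceil(-n * ln(p) / ln(2)^2)` for the optimal number of bits. Both helper names (`optimal_bits`, `pow2_bits`) are hypothetical, and the formula is an assumption rather than something confirmed from the Rust port. The sketch only covers rounding the size up; it does not address the index generation weakness mentioned above.

```rust
/// Hypothetical helper: textbook optimal bit count for `n` elements
/// at a target false positive probability `p` (0 < p < 1).
fn optimal_bits(n: u64, p: f64) -> u64 {
    let ln2 = std::f64::consts::LN_2;
    (-(n as f64) * p.ln() / (ln2 * ln2)).ceil() as u64
}

/// Hypothetical helper: round the optimal size up to the next power of two,
/// so that bit indexes can be computed with `i & (m - 1)`.
fn pow2_bits(n: u64, p: f64) -> u64 {
    // `max(1)` keeps the result a valid power of two even when `n` is 0.
    optimal_bits(n, p).max(1).next_power_of_two()
}
```

With `n = 1000` and `p = 0.01`, the optimal size is a little under 10,000 bits and is rounded up to 16,384. The extra bits cost memory, which is the trade-off for the cheaper index computation.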