From e2b045d05ea9f0bf6656c9c23a84bd1b9eb52c21 Mon Sep 17 00:00:00 2001
From: Alexandre Dulaunoy
Date: Mon, 25 Mar 2024 11:16:31 +0100
Subject: [PATCH] chg: [blog] fixed

---
 content/blog/Poppy-a-new-bloom-filter-format-and-project.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/blog/Poppy-a-new-bloom-filter-format-and-project.md b/content/blog/Poppy-a-new-bloom-filter-format-and-project.md
index 7ad2428..79be3ac 100644
--- a/content/blog/Poppy-a-new-bloom-filter-format-and-project.md
+++ b/content/blog/Poppy-a-new-bloom-filter-format-and-project.md
@@ -52,7 +52,7 @@ Another surprising thing is the very high value of `m` which is `uint64::MAX - 5
 Everything was going fine, until we looked for further optimization to apply.

-Digging into other datastructures (like hash tables) implementations, we noticed a nice optimization can be done on the bit **index generation algorithm**. This optimization consists of taking a bitset (holding all the bits of the filter) with a size being a **power of two**. If we do so, instead of computing bit index with a modulo, we can compute bit index with a `bit and` operation. This comes from the following nice property, for any `m` being a power of two `i % m == i & (m - 1)`. It is important to understand that this particular optimization will allow us to win on CPU cycles as less instructions are needed to compute the bit indexes.
+Digging into the implementations of other data structures (like hash tables), we noticed that a nice optimization can be applied to the **bit index generation algorithm**. This optimization consists of using a bitset (holding all the bits of the filter) whose size is a **power of two**. If we do so, instead of computing the bit index with a modulo, we can compute it with a bitwise AND (`&`) operation. This comes from the following nice property: for any `m` that is a power of two, `i % m == i & (m - 1)`. It is important to understand that this optimization saves CPU cycles when computing the bit indexes, as a bitwise AND is far cheaper than the integer division a modulo requires.

 Applying this to a bloom filter is fairly trivial: when computing the desired size (in bits) of the filter, we just need to round it up to **the next power of two**. When implementing this in our current Rust port of [bloom](https://github.com/DCSO/bloom), we realized the implementation was broken: when the bit size of the filter becomes a power of two, the **real false positive probability** of the filter literally explodes (**x20** in some cases). Without going further into the details, we believe the weakness comes from the **bit index generation** algorithm described above. A [GitHub issue](https://github.com/DCSO/bloom/issues/19) has been opened in the original project to track it.
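A minimal Rust sketch of the power-of-two trick discussed in the patch, assuming 64-bit hash values. The function names `bitset_size`, `index_mod` and `index_mask` are hypothetical and not taken from the Poppy or bloom code bases; the sketch only shows the sizing step and that the modulo and mask indexes agree.

```rust
/// Hypothetical helper: round the requested number of bits up to the next
/// power of two (a power of two is returned unchanged).
fn bitset_size(requested_bits: u64) -> u64 {
    requested_bits.next_power_of_two()
}

/// Bit index via modulo: valid for any non-zero `m`, but costs a division.
fn index_mod(hash: u64, m: u64) -> u64 {
    hash % m
}

/// Bit index via mask: only valid when `m` is a power of two.
/// The result keeps only the lowest log2(m) bits of `hash`.
fn index_mask(hash: u64, m: u64) -> u64 {
    debug_assert!(m.is_power_of_two());
    hash & (m - 1)
}

fn main() {
    // 1_000_000 bits requested, rounded up to 2^20 = 1_048_576 bits.
    let m = bitset_size(1_000_000);

    // Both index computations agree for a power-of-two size.
    for hash in [0u64, 1, 12_345, u64::MAX, 0xDEAD_BEEF_CAFE_F00D] {
        assert_eq!(index_mod(hash, m), index_mask(hash, m));
    }

    println!("m = {m}"); // prints "m = 1048576"
}
```

Because the mask discards all but the low bits of the hash, a power-of-two size makes the filter depend entirely on how well those low bits are distributed. This sketch does not reproduce or explain the false positive blow-up observed with the bloom port.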