
Commit

Add class-level docs to bloom filter (and builder) and include the serialization format in the impl file
jmalkin committed Aug 14, 2024
1 parent 533b6b9 commit ecc856b
Showing 2 changed files with 56 additions and 0 deletions.
33 changes: 33 additions & 0 deletions filters/include/bloom_filter.hpp
@@ -37,6 +37,14 @@ template<typename A> class bloom_filter_builder_alloc;
using bloom_filter = bloom_filter_alloc<std::allocator<uint8_t>>;
using bloom_filter_builder = bloom_filter_builder_alloc<std::allocator<uint8_t>>;

/**
* <p>This class provides methods to help estimate the correct parameters when
* creating a Bloom filter, and methods to create the filter using those values.</p>
*
* <p>The underlying math is described in the
* <a href='https://en.wikipedia.org/wiki/Bloom_filter#Optimal_number_of_hash_functions'>
* Wikipedia article on Bloom filters</a>.</p>
*/
template<typename Allocator = std::allocator<uint8_t>>
class bloom_filter_builder_alloc {
using A = Allocator;
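
Review note: the builder doc points at the standard sizing formulas from the linked Wikipedia article. The sketch below shows that math in isolation. The helper names `sketch_num_bits` and `sketch_num_hashes` are hypothetical and are not part of this library, and rounding both results up is an assumption rather than something the diff confirms.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helpers for illustration only; not the library's builder API.

// Number of bits m for n distinct items at false positive probability p:
//   m = ceil(-n * ln(p) / (ln 2)^2)
inline uint64_t sketch_num_bits(uint64_t max_distinct_items, double target_fpp) {
  assert(max_distinct_items > 0);
  assert(target_fpp > 0.0 && target_fpp < 1.0);
  const double ln2 = std::log(2.0);
  const double n = static_cast<double>(max_distinct_items);
  return static_cast<uint64_t>(std::ceil(-n * std::log(target_fpp) / (ln2 * ln2)));
}

// Number of hash functions k for m bits and n distinct items:
//   k = ceil((m / n) * ln 2), with a floor of 1 (assumed rounding policy)
inline uint16_t sketch_num_hashes(uint64_t max_distinct_items, uint64_t num_bits) {
  assert(max_distinct_items > 0);
  const double ratio = static_cast<double>(num_bits) / static_cast<double>(max_distinct_items);
  const double k = std::ceil(ratio * std::log(2.0));
  return static_cast<uint16_t>(std::max(1.0, k));
}
```

Under these assumptions, 1000 distinct items at a 1% target false positive probability work out to 9586 bits and 7 hash functions.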
@@ -149,6 +157,31 @@ class bloom_filter_builder_alloc {
static void validate_accuracy_inputs(uint64_t max_distinct_items, double target_false_positive_prob);
};

/**
* <p>A Bloom filter is a data structure that can be used for probabilistic
* set membership.</p>
*
* <p>When querying a Bloom filter, there are no false negatives. Specifically:
* When querying an item that has already been inserted into the filter, the filter will
* always indicate that the item is present. There is a chance of false positives, where
* querying an item that has never been presented to the filter will indicate that the
* item has already been seen. Consequently, any positive query result should be
* interpreted as "might have seen."</p>
*
* <p>A standard Bloom filter is unlike typical sketches in that it is not sub-linear
* in size and does not resize itself. A Bloom filter will work up to a target number of
* distinct items, beyond which it will saturate and the false positive rate will start to
* increase. The size of a Bloom filter will be linear in the expected number of
* distinct items.</p>
*
* <p>See the bloom_filter_builder_alloc class for methods to create a filter, especially
* one sized correctly for a target number of distinct elements and a target
* false positive probability.</p>
*
* <p>This implementation uses xxHash64 and follows the approach in Kirsch and Mitzenmacher,
* "Less Hashing, Same Performance: Building a Better Bloom Filter," Wiley Interscience, 2008, pp. 187-218.</p>
*/

template<typename Allocator = std::allocator<uint8_t>>
class bloom_filter_alloc {
using A = Allocator;
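
Review note: the class doc describes two properties that a small example makes concrete: no false negatives, and index derivation in the Kirsch and Mitzenmacher style, where the i-th probe is `h1 + i * h2` modulo the number of bits. The toy class below is a sketch under stated assumptions, not this library's implementation. `toy_bloom_filter` is a hypothetical name, and `std::hash` plus a splitmix64-style mixer stand in for xxHash64 so that the example stays self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Illustrative toy filter; NOT the library's bloom_filter_alloc.
class toy_bloom_filter {
public:
  toy_bloom_filter(uint64_t num_bits, uint16_t num_hashes)
    : num_hashes_(num_hashes), bits_(num_bits, false) {
    assert(num_bits > 0 && num_hashes > 0);
  }

  // Sets the k bits selected by the double-hashing scheme.
  void update(const std::string& item) {
    const uint64_t h1 = hash1(item);
    const uint64_t h2 = hash2(h1);
    for (uint64_t i = 0; i < num_hashes_; ++i) {
      bits_[(h1 + i * h2) % bits_.size()] = true;
    }
  }

  // Returns false only if the item was definitely never inserted;
  // true means "might have seen".
  bool query(const std::string& item) const {
    const uint64_t h1 = hash1(item);
    const uint64_t h2 = hash2(h1);
    for (uint64_t i = 0; i < num_hashes_; ++i) {
      if (!bits_[(h1 + i * h2) % bits_.size()]) return false;
    }
    return true;
  }

private:
  // Stand-in for the first hash (assumption: the real code uses xxHash64).
  static uint64_t hash1(const std::string& s) {
    return static_cast<uint64_t>(std::hash<std::string>{}(s));
  }

  // Stand-in for the second hash: a splitmix64-style mix of the first value.
  static uint64_t hash2(uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
  }

  uint16_t num_hashes_;
  std::vector<bool> bits_;
};
```

Because `update` and `query` visit the same bit positions for the same item, an inserted item can never be reported as absent. A positive answer for an item that was never inserted remains possible once enough bits are set.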
23 changes: 23 additions & 0 deletions filters/include/bloom_filter_impl.hpp
@@ -244,6 +244,29 @@ bloom_filter_alloc<A> bloom_filter_alloc<A>::deserialize(const void* bytes, size
return internal_deserialize_or_wrap(const_cast<void*>(bytes), length_bytes, false, false, allocator);
}

/*
* A Bloom filter's serialized image uses 3 longs (24 bytes) of preamble when empty,
* otherwise 4 longs (32 bytes):
*
* <pre>
* Long || Start Byte Adr:
* Adr:
*      ||       0        |    1   |    2   |    3   |    4   |    5   |    6   |    7   |
*  0   || Preamble_Longs | SerVer | FamID  |  Flags |----Num Hashes---|-----Unused------|
*
*      ||       8        |    9   |   10   |   11   |   12   |   13   |   14   |   15   |
*  1   ||---------------------------------Hash Seed-------------------------------------|
*
*      ||      16        |   17   |   18   |   19   |   20   |   21   |   22   |   23   |
*  2   ||-------BitArray Length (in longs)----------|-----------Unused------------------|
*
*      ||      24        |   25   |   26   |   27   |   28   |   29   |   30   |   31   |
*  3   ||---------------------------------NumBitsSet------------------------------------|
* </pre>
*
* The raw BitArray bits, if non-empty, start at byte 32.
*/

template<typename A>
bloom_filter_alloc<A> bloom_filter_alloc<A>::deserialize(std::istream& is, const A& allocator) {
const uint8_t prelongs = read<uint8_t>(is);
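
Review note: the byte offsets in the layout comment can be checked against a small parser. The sketch below reads the preamble fields from a raw buffer at the offsets given in the table. `preamble_sketch`, `read_at`, and `parse_preamble` are hypothetical names, not the library's internals, and reading multi-byte fields in the host's native byte order is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical view of the preamble described in the layout comment.
struct preamble_sketch {
  uint8_t preamble_longs;      // byte 0: 3 when empty, otherwise 4
  uint8_t serial_version;      // byte 1
  uint8_t family_id;           // byte 2
  uint8_t flags;               // byte 3
  uint16_t num_hashes;         // bytes 4-5 (bytes 6-7 unused)
  uint64_t hash_seed;          // bytes 8-15
  uint32_t bit_array_longs;    // bytes 16-19 (bytes 20-23 unused)
  uint64_t num_bits_set;       // bytes 24-31, present only when non-empty
};

// Copies a T out of the buffer without alignment assumptions.
// Assumes the image uses the host's byte order.
template<typename T>
T read_at(const uint8_t* bytes, size_t offset) {
  T value;
  std::memcpy(&value, bytes + offset, sizeof(T));
  return value;
}

inline preamble_sketch parse_preamble(const uint8_t* bytes, size_t length_bytes) {
  assert(length_bytes >= 24);  // an empty filter still has 3 longs of preamble
  preamble_sketch p{};
  p.preamble_longs = bytes[0];
  p.serial_version = bytes[1];
  p.family_id = bytes[2];
  p.flags = bytes[3];
  p.num_hashes = read_at<uint16_t>(bytes, 4);
  p.hash_seed = read_at<uint64_t>(bytes, 8);
  p.bit_array_longs = read_at<uint32_t>(bytes, 16);
  if (p.preamble_longs == 4) {
    assert(length_bytes >= 32);  // the fourth long holds NumBitsSet
    p.num_bits_set = read_at<uint64_t>(bytes, 24);
  }
  return p;
}
```

With a 4-long preamble the bit array data would begin at byte 32, matching the note at the end of the layout comment. With a 3-long preamble there is no NumBitsSet field and the sketch leaves it at zero.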
