-
What do you think of xor filters? They might also be a good option if we don't need to rebuild the index frequently or dynamically.
Ribbon filters look great too.
-
Hi all,
This discussion is about adding a bloom filter for each data block. It is very useful for high-cardinality columns such as email or account name. Different products take different design approaches to this.
Snowflake
I went through some Snowflake tech talks and docs and only found this one: https://nattaylor.com/blog/2019/snowflake-internals/
It says:
Per file min/max values, #distinct values, #nulls, bloom filters etc.
That is to say, Snowflake creates a bloom filter file for every data file by default.
Databricks
Databricks takes the same approach as Snowflake; see this doc: https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/bloom-filters
There are many ClickHouse experts in the community, so correct me if I am wrong: ClickHouse doesn't create a bloom filter by default and requires the user to create one.
In summary, Snowflake and Databricks create a bloom-filter index file for every data-block file. It keeps per-column indexing information, which benefits predicates like 't.number=1' by pruning data blocks that cannot contain a match. ClickHouse, by contrast, requires users to create the index themselves.
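To make the pruning idea concrete, here is a minimal sketch in Python of a per-block bloom filter index for a single column. Everything in it is an assumption made for illustration: the names `BloomFilter`, `build_block_index`, and `blocks_to_scan` are hypothetical, the hashing scheme is a simple double hash over SHA-256, and none of it reflects how Snowflake, Databricks, or ClickHouse actually implement their indexes.

```python
import hashlib
import math


class BloomFilter:
    """Tiny illustrative Bloom filter (hypothetical, not any product's format)."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes
        n = max(1, expected_items)
        self.num_bits = max(8, math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value):
        # Double hashing: derive k bit positions from two halves of one digest
        digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present"
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


def build_block_index(blocks):
    """Build one filter per data block from a dict of block name -> column values."""
    index = {}
    for name, values in blocks.items():
        bf = BloomFilter(expected_items=len(values))
        for v in values:
            bf.add(v)
        index[name] = bf
    return index


def blocks_to_scan(index, value):
    """Return blocks that may contain `value`; all others can be skipped for `col = value`."""
    return [name for name, bf in index.items() if bf.might_contain(value)]


# Usage: three hypothetical blocks of the column t.number
blocks = {
    "block_0": list(range(1000, 2000)),
    "block_1": [1, 2, 3],
    "block_2": list(range(5000, 6000)),
}
index = build_block_index(blocks)
candidates = blocks_to_scan(index, 1)  # always includes "block_1"
```

A bloom filter never reports a false negative, so the block that really holds the value is always scanned. Other blocks are usually skipped, with an occasional false positive that costs an unnecessary read but never a wrong result.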
Which approach do you think is better?