Now that Hudi 0.11 supports bloom indexes on multiple columns through
`hoodie.metadata.index.bloom.filter.column.list`, the question is whether
those blooms are used by the query planner for a predicate such as id=19.
The Spark built-in blooms are used in this case; maybe that is also the
purpose of the Hudi multi-column blooms?
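To make the question concrete, here is a minimal sketch of what "using a
bloom for id=19" would mean at planning time: one filter per file, and only
files whose filter may contain the value are scanned. This is a toy
illustration, not Hudi's or Parquet's actual implementation; the class
BloomFilter, the helpers build_file_filters and candidate_files, and the file
names are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


def build_file_filters(files):
    """Build one filter per file from the values of the indexed column."""
    filters = {}
    for name, ids in files.items():
        bloom = BloomFilter()
        for value in ids:
            bloom.add(value)
        filters[name] = bloom
    return filters


def candidate_files(filters, value):
    """Files that still have to be scanned for an equality predicate."""
    return [name for name, bloom in filters.items() if bloom.might_contain(value)]


# Hypothetical layout: three files, each holding a disjoint range of ids.
files = {
    "file_a.parquet": range(0, 10),
    "file_b.parquet": range(10, 20),
    "file_c.parquet": range(20, 30),
}
filters = build_file_filters(files)
print(candidate_files(filters, 19))
```

The file that really holds id=19 is always returned; other files can
occasionally show up as false positives, which costs an extra scan but never
a wrong result.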
By all means. That would be great.
We are always looking for a helping hand in improving the docs.
On Sat, Apr 2, 2022 at 6:18 AM Nicolas Paris
wrote:
Hi Vinoth,
Thanks for your in-depth explanations. I think those details could be
of interest in the documentation. I can work on this if you agree.
On Wed, 2022-03-30 at 14:36 -0700, Vinoth Chandar wrote:
Hi,
I noticed that it finally landed. We actually began tracking that JIRA
while initially writing Hudi at Uber. Parquet + Bloom Filters has taken
just a few years :)
I think we could switch to reading the built-in bloom filters as well.
It could potentially make the footer reading lighter.
Hi,
Spark 3.2 ships Parquet 1.12, which provides built-in bloom filters on
arbitrary columns. I wonder:
- can Hudi benefit from them? (likely in 0.11, but not with MOR tables)
- would it make sense to replace the Hudi blooms with them?
- what would be the advantage of storing our blooms in hfil