Hi Community,
  The bloom datamap has been available at blocklet level for a while.
One problem with the bloom datamap is that pruning is done on the
driver side, and caching the bloom index data there is expensive.
So we propose building a bloom filter inside the carbon data file
at page level, so that page skipping can be done on the executor
side. This does not conflict with the existing bloom datamap.


A draft of the implementation could look like this:
1. Setting: when creating the table, specify in the table properties which
   columns need a page-level bloom filter (similar to local dictionary)
2. Building: during loading, build the bloom bitmap for these columns in
   the same way MIN_MAX is collected
3. Writing: write the bitmaps into DataChunk3. As learned from the bloom datamap,
   we also compress the bitmaps to improve storage and IO performance.
4. Usage: (IncludeFilterExecuterImpl.prunePages) if MIN_MAX does not prune all
   pages, prune with the bloom filter, then intersect the bitset for selected pages
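
To make steps 2 to 4 concrete, here is a minimal sketch in Python. It is
not CarbonData code: PageBloom, build_page_meta and prune_pages are
hypothetical names, the bitmap size, hash count and zlib compression are
assumptions chosen for illustration, and the per-page tuple only stands
in for what would be stored in DataChunk3.

```python
import hashlib
import zlib


class PageBloom:
    """Tiny Bloom filter for one page (hypothetical, for illustration only)."""

    def __init__(self, num_bits=1024, num_hashes=3, bits=None):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray(bits) if bits is not None else bytearray(num_bits // 8)

    def _positions(self, value):
        # Derive num_hashes bit positions by salting one hash function.
        data = str(value).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(data, digest_size=8, salt=bytes([seed])).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))

    def to_bytes(self):
        # Step 3: compress the bitmap before it is written.
        return zlib.compress(bytes(self.bits))

    @classmethod
    def from_bytes(cls, blob, num_bits=1024, num_hashes=3):
        return cls(num_bits, num_hashes, zlib.decompress(blob))


def build_page_meta(pages):
    """Step 2: collect min, max and a compressed bloom bitmap per page."""
    meta = []
    for values in pages:
        bloom = PageBloom()
        for v in values:
            bloom.add(v)
        meta.append((min(values), max(values), bloom.to_bytes()))
    return meta


def prune_pages(meta, filter_value):
    """Step 4: check MIN_MAX first, then the bloom filter on surviving pages."""
    selected = []
    for page_id, (lo, hi, blob) in enumerate(meta):
        if not (lo <= filter_value <= hi):
            continue  # already pruned by MIN_MAX, no need to read the bloom
        if PageBloom.from_bytes(blob).might_contain(filter_value):
            selected.append(page_id)
    return selected
```

With pages whose value ranges overlap, MIN_MAX alone keeps every page
whose range covers the filter value, while the bloom check can drop
those that do not actually contain it. A page that does contain the
value is never dropped, since a bloom filter has no false negatives.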


A rough test shows this can skip many pages for queries in some scenarios
(such as a phone number filter query, which easily hits a blocklet/page).

Please share your input if you have any ideas or suggestions.

Thanks & Regards,
Manhua
