Good stuff, Nicolas. I'll add this to the book.

On 2/16/12 3:52 PM, "Nicolas Spiegelberg" <nspiegelb...@fb.com> wrote:

>Bryan,
>
>Currently, ROW & ROWCOL Bloom Filters are only checked for explicit,
>single-row 'Get' scans. ROWCOL BFs are only checked when you're querying
>for explicit column qualifiers (vs getting the entire row). This is
>because multi-row scans & full-row scans are implicit queries. To
>clarify:
>
>With a multi-row scan, the next row after 0x0001 is NOT 0x0002. HBase only
>knows that the next row is > 0x0001. The next row could be 0x00010 or
>0x0003. However, when you call HTable.get(row=0x0001), HBase knows that
>you explicitly want that row and don't want 0x00010.
>
>Nicolas
>
>On 2/15/12 9:18 PM, "Bryan Beaudreault" <bbeaudrea...@hubspot.com> wrote:
>
>>Hello,
>>
>>We are looking at Bloom Filters and wondering if they are helpful when
>>doing a sequential read (multi-row scan) or only when doing a Get for a
>>single row. It logically makes sense that they would only affect (or
>>have a greater effect on) getting a single row, since a Bloom Filter is
>>a way of determining whether you have to read a whole store file when
>>fetching a key. But we are told that Scan and Get are essentially the
>>same code on the backend, so I imagine both will check the Blooms if
>>they exist.
>>
>>Also, would a ROWCOL bloom be more effective if you are often doing
>>multi-row scans but always specifying only a subset of columns in
>>those rows?
>>
>>Thanks,
>>
>>Bryan
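
To make Nicolas's point concrete, here is a minimal, self-contained sketch in Python. It is not HBase code and does not use the HBase API: `ToyBloom`, `ToyStoreFile`, `point_get`, and `range_scan` are hypothetical names invented for illustration, and the byte string `b"\x00\x01\x00"` is only an assumed stand-in for the row written as 0x00010 above. The idea it tries to show is that a Bloom filter can answer "might this exact key be in this file?" but cannot answer "what is the next key after this one?", so only the explicit lookup can skip files.

```python
import hashlib


class ToyBloom:
    """A tiny Bloom filter over byte-string keys (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToyStoreFile:
    """Hypothetical stand-in for a store file: sorted rows plus a row Bloom."""

    def __init__(self, rows):
        self.rows = dict(sorted(rows.items()))
        self.bloom = ToyBloom()
        for key in self.rows:
            self.bloom.add(key)


def point_get(files, row):
    """Explicit single-row lookup: the exact key is known, so ask each Bloom."""
    files_read = 0
    value = None
    for f in files:
        if not f.bloom.might_contain(row):
            continue  # Bloom says "definitely not here": skip the file
        files_read += 1
        if row in f.rows:
            value = f.rows[row]
    return value, files_read


def range_scan(files, start_row, limit):
    """Scan from start_row: the next key is unknown, so no Bloom can be asked."""
    files_read = 0
    found = []
    for f in files:
        files_read += 1  # every file must be consulted
        found.extend(k for k in f.rows if k >= start_row)
    return sorted(found)[:limit], files_read


files = [
    ToyStoreFile({b"\x00\x01": "a"}),
    ToyStoreFile({b"\x00\x01\x00": "b"}),  # assumed encoding of "0x00010"
    ToyStoreFile({b"\x00\x03": "c"}),
]

print(point_get(files, b"\x00\x01"))      # value plus number of files opened
print(range_scan(files, b"\x00\x01", 2))  # first two rows plus files opened
```

In this sketch the scan always opens all three files, because the row that follows 0x0001 could live in any of them, while the point lookup opens only the files whose Bloom reports a possible match (usually just one, barring false positives). A ROWCOL filter follows the same logic with a row-plus-qualifier key, which is why it only helps when the qualifiers are named explicitly.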