Bryan,

Currently, ROW & ROWCOL Bloom Filters are only checked for explicit, single-row 'Get' scans. ROWCOL BFs are only checked when you're querying for explicit column qualifiers (vs getting the entire row). This is because multi-row scans & full-row scans are implicit queries. To clarify:

With a multi-row scan, the next row after 0x0001 is NOT 0x0002. HBase only knows that the next row is > 0x0001. The next row could be 0x00010 or 0x0003. However, when you call HTable.get(row=0x0001), HBase knows that you explicitly want that row and don't want 0x00010.

Nicolas

On 2/15/12 9:18 PM, "Bryan Beaudreault" <[email protected]> wrote:

>Hello,
>
>We are looking at Bloom Filters and wondering if they are helpful when
>doing a sequential read (multi-row scan) or only when doing a Get for a
>single row. It logically makes sense that it would only affect (or to
>greater affect) getting a single row since it is a way for determining if
>you have to read a whole store file when fetching a key. But, we are told
>that Scan and Get are essentially the same code on the backend, so I
>imagine both will check the Blooms if they exist.
>
>Also, would a ROWCOL bloom be more effective if you are often doing
>multi-row scans but always with specifying only a subset of columns in
>those rows?
>
>Thanks,
>
>Bryan
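
To make the Get vs. scan distinction concrete, here is a rough toy model in Python. It is not HBase code: the StoreFile class and the get / next_row helpers are hypothetical names made up for this sketch, a plain Python set stands in for the Bloom filter (so it has no false positives, unlike a real one), and row keys are written as hex-digit strings that sort lexicographically.

```python
from bisect import bisect_right


class StoreFile:
    """Toy stand-in for one store file: sorted row keys plus a row filter."""

    def __init__(self, rows):
        self.rows = sorted(rows)
        # Assumption: an exact set plays the role of the Bloom filter.
        self.bloom = set(rows)
        self.reads = 0  # number of times this file had to be read


def get(files, row):
    """Point lookup: the exact key is known, so the filter can rule files out."""
    found = False
    for f in files:
        if row not in f.bloom:
            continue  # filter says "definitely not here": file is skipped
        f.reads += 1
        found = found or row in f.rows
    return found


def next_row(files, after):
    """Scan step: only 'smallest key > after' is known, so the filter
    cannot be asked anything useful and every file is consulted."""
    candidates = []
    for f in files:
        f.reads += 1
        i = bisect_right(f.rows, after)
        if i < len(f.rows):
            candidates.append(f.rows[i])
    return min(candidates) if candidates else None


files = [StoreFile(["0001", "0003"]), StoreFile(["00010"]), StoreFile(["0007"])]

found = get(files, "0001")
reads_after_get = [f.reads for f in files]   # only the first file was read

nxt = next_row(files, "0001")
reads_after_scan = [f.reads for f in files]  # every file was read this time
```

In this sketch the Get touches only the one file whose filter matches, while the scan step has to look in all three files and comes back with "00010" rather than "0002" or "0003", which is the situation described above.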
