Tej Meka created HBASE-27691:
--------------------------------

             Summary: Fake Cell is being passed to Filters and comparators during StoreFileScans
                 Key: HBASE-27691
                 URL: https://issues.apache.org/jira/browse/HBASE-27691
             Project: HBase
          Issue Type: Bug
          Components: scan, Scanners
    Affects Versions: 2.2.7
            Reporter: Tej Meka
          Attachments: image-2023-03-07-15-46-01-182.png, image-2023-03-07-15-50-59-696.png

I am upgrading HBase (client and server) from 1.2.0 to 2.2.6 and have started
seeing unexpected behavior: an ambiguous row is discovered in the filter during
StoreFileScans.

*Is it a valid case for filters and comparators to be passed a fake cell when that
row is set as the start row (inclusive by default) used to skip preceding rows
during store file scans in client-side execution?*

When rows are persisted or updated in a table through bulk load, a scan with a
specific column appears to behave differently from a scan without columns, which
does not trigger this behavior.

From what I have troubleshot so far, this appears to be triggered during the
[lazy scan|https://github.com/apache/hbase/blob/rel/2.2.6/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L251-L256]
inside [StoreScanner|https://github.com/apache/hbase/blob/rel/2.2.6/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L406-L410]
with the [StoreFileScanner|https://github.com/apache/hbase/blob/rel/2.2.6/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L388-L448]
implementation, which eventually returns a fake cell as the current row on the store heap
([StoreFileScanner|https://github.com/apache/hbase/blob/rel/2.2.6/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L437-L444]).
That cell is therefore passed to the filter, even though it is filtered out later
and never returned to the client.

This was not the case with HBase 1.7.2. I have created a couple of simple tests
using HBase 1.7.2 and HBase 2.2.6 that bulk load some sample rows into a table
and create a column-specific Scan to reproduce the behavior described above.

I simply copied KeyOnlyFilter and added a few loggers to capture the row keys
passed to the filter, plus a few loggers to capture the row keys returned as
results on the client side.
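Comparing the filter-side log with the client-side results boils down to a set difference. The helper below is a hypothetical sketch, not part of the hbase-scans repo: the class and method names are made up, and it assumes the row keys from both loggers have already been collected as plain strings. It lists the rows the filter saw that never reached the client.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical diagnostic helper; not an HBase or hbase-scans API. */
final class FilterLogDiff {

    private FilterLogDiff() {}

    /**
     * Returns the rows present in filterLog (row keys logged inside the filter)
     * but absent from clientRows (row keys returned to the client), in the order
     * they were first seen by the filter and without duplicates.
     */
    static List<String> rowsSeenButNotReturned(List<String> filterLog, List<String> clientRows) {
        Set<String> returned = new HashSet<>(clientRows);
        Set<String> unexpected = new LinkedHashSet<>();
        for (String row : filterLog) {
            if (!returned.contains(row)) {
                unexpected.add(row);
            }
        }
        return new ArrayList<>(unexpected);
    }
}
```

A start row that does not exist in the table but still reaches the filter would show up here. A fake cell whose row equals a real, returned row would not, so counting duplicate entries in the filter log is a useful complementary check.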

Here is my working repo that demonstrates this divergent behavior:
[hbase-scans|https://github.com/tejkiran/hbase-scans]

I have a mapper that creates Puts with row keys 0, 2 and 3 and bulk loads those
rows into the table. When a scan is issued with the HBase 2.2.6 API, the start
row of the Scan is passed to the filter during server-side execution.

Screenshot of row keys discovered in the filter during server-side execution (HBase 2.2.6):
 !image-2023-03-07-15-46-01-182.png! 

Screenshot of row keys discovered in the filter with HBase 1.7.2:

 !image-2023-03-07-15-50-59-696.png! 
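The lazy-seek mechanism described above can be illustrated with a toy model in plain Java. This is NOT HBase code: all classes are hypothetical, the method names only loosely echo the lazy-seek methods in the linked StoreFileScanner source, and a "cell" is reduced to a row key. The model assumes the rows 0, 2 and 3 from the mapper and a hypothetical start row of 1 that does not exist in the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Toy model, NOT HBase code. A scanner asked to seek does not read the file yet:
 * it exposes a placeholder ("fake") cell at the seek key and performs the real
 * seek only when that placeholder is consumed.
 */
final class LazySeekModel {

    private LazySeekModel() {}

    /** Stand-in for a cell: a row key plus a flag marking placeholders. */
    static final class ToyCell {
        final String row;
        final boolean fake;

        ToyCell(String row, boolean fake) {
            this.row = row;
            this.fake = fake;
        }
    }

    /** Stand-in for a scanner over the sorted row keys of one store file. */
    static final class ToyStoreFileScanner {
        private final TreeSet<String> rows;
        private ToyCell current;
        private boolean realSeekDone = true;

        ToyStoreFileScanner(TreeSet<String> rows) {
            this.rows = rows;
        }

        /** Lazy seek: record the target and expose a fake cell at exactly that key. */
        void requestSeek(String key) {
            current = new ToyCell(key, true);
            realSeekDone = false;
        }

        /** Real seek: move to the first stored row >= the requested key. */
        void enforceSeek() {
            if (!realSeekDone) {
                String row = rows.ceiling(current.row);
                current = (row == null) ? null : new ToyCell(row, false);
                realSeekDone = true;
            }
        }

        ToyCell peek() {
            return current;
        }

        /** Advance to the next stored row (only called on real cells). */
        void next() {
            String row = rows.higher(current.row);
            current = (row == null) ? null : new ToyCell(row, false);
        }
    }

    /**
     * Returns every row key the filter is shown when scanning from startRow.
     * With enforceSeekFirst == false the fake cell can reach the filter; with
     * true the real seek is always completed before the filter is consulted.
     */
    static List<String> rowsSeenByFilter(TreeSet<String> rows, String startRow,
                                         boolean enforceSeekFirst) {
        ToyStoreFileScanner scanner = new ToyStoreFileScanner(rows);
        scanner.requestSeek(startRow);
        List<String> seen = new ArrayList<>();
        while (true) {
            if (enforceSeekFirst) {
                scanner.enforceSeek();
            }
            ToyCell cell = scanner.peek();
            if (cell == null) {
                break;
            }
            seen.add(cell.row);          // the filter is invoked on this cell
            if (cell.fake) {
                scanner.enforceSeek();   // placeholder consumed: do the real seek now
            } else {
                scanner.next();
            }
        }
        return seen;
    }
}
```

With rows {0, 2, 3} and start row "1", rowsSeenByFilter(rows, "1", false) returns [1, 2, 3], while rowsSeenByFilter(rows, "1", true) returns [2, 3]. Only the lazy variant shows the non-existent start row to the filter, although in both cases only rows 2 and 3 are real data. If the start row does exist (e.g. "2"), the lazy variant shows that row to the filter twice.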

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
