Feng Honghua created HBASE-10679:
------------------------------------

             Summary: Both clients operating on a same region will get wrong 
scan results if the first scanner expires and the second scanner is created 
with the same scannerId
                 Key: HBASE-10679
                 URL: https://issues.apache.org/jira/browse/HBASE-10679
             Project: HBase
          Issue Type: Bug
          Components: regionserver
            Reporter: Feng Honghua
            Assignee: Feng Honghua
            Priority: Critical


The scenario is as follows (both Client A and Client B scan Region R):
# A opens a scanner SA on R with scannerId N and successfully gets its first 
row, "a".
# SA's lease expires and SA is removed from scanners.
# B opens a scanner SB on R, and its scannerId is also N. B successfully gets 
its first row, "m".
# A issues its second scan request with scannerId N. The regionserver finds 
that N is a valid scannerId and that the region matches too (the region has 
stayed online on this regionserver the whole time, and both scanners are 
against it), so it executes the request on SB and returns "n" to A -- wrong! 
(A gets data from another client's scanner; it expects a row such as "b" that 
follows "a".)
# B issues its second scan request with scannerId N. The regionserver again 
considers it valid, executes the scan on SB, and returns "o" to B -- wrong! 
(It should return "n", but "n" has just been consumed by A.)

The consequence is that both clients get wrong scan results:
# A gets data from a scanner created by another client, because its own 
scanner has expired and been removed.
# B misses data that it should have received, because A has already wrongly 
consumed it.
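
The scenario and its consequences can be replayed with a small toy model. This 
is only a sketch, not HBase code: the class ScannerIdReuseDemo, its methods, 
and the id value 42 are all hypothetical, and the "region" is reduced to a 
list of row keys. The one property it is meant to show is that a lookup keyed 
by scannerId alone cannot tell SA from SB once the id has been reused.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a regionserver's scanner table; not HBase code.
class ScannerIdReuseDemo {
    // Open scanners keyed by scannerId only.
    private final Map<Long, Iterator<String>> scanners = new HashMap<>();

    // Registers a scanner under a caller-chosen id so the collision can be forced.
    void open(long scannerId, List<String> rows) {
        scanners.put(scannerId, rows.iterator());
    }

    // Models lease expiry: the scanner is dropped and its id becomes reusable.
    void expire(long scannerId) {
        scanners.remove(scannerId);
    }

    // Serves a follow-up scan request; the id is the only thing checked.
    String next(long scannerId) {
        Iterator<String> it = scanners.get(scannerId);
        if (it == null) {
            throw new IllegalStateException("unknown scanner " + scannerId);
        }
        return it.next();
    }

    // Replays the five steps and records which client received which row.
    static List<String> replay() {
        ScannerIdReuseDemo rs = new ScannerIdReuseDemo();
        long n = 42L; // assumed id value, chosen arbitrarily
        List<String> seen = new ArrayList<>();
        rs.open(n, Arrays.asList("a", "b", "c")); // A opens SA with id N
        seen.add("A:" + rs.next(n));              // A gets "a"
        rs.expire(n);                             // SA's lease expires
        rs.open(n, Arrays.asList("m", "n", "o")); // B opens SB, also with id N
        seen.add("B:" + rs.next(n));              // B gets "m"
        seen.add("A:" + rs.next(n));              // A is served from SB
        seen.add("B:" + rs.next(n));              // B skips the row A consumed
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [A:a, B:m, A:n, B:o]
    }
}
```

In this model A never receives "b" and B never receives "n", which matches 
the two consequences listed above.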

The root cause is that a scannerId generated by the regionserver is not 
guaranteed to be unique over the regionserver's whole lifecycle. *The only 
guarantee is that the scannerIds of scanners that are currently still valid 
(not expired) are unique*, so the same scannerId can appear in scanners again 
after an earlier scanner with that scannerId has expired and been removed from 
scanners. If the second scanner is against the same region, the bug arises.
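
The difference between the two guarantees can be sketched as follows. Both 
allocators are hypothetical and are not taken from the regionserver code. 
LiveOnlyAllocator deliberately hands out the smallest free id so that the 
reuse is deterministic; whatever the real generator does, the point is only 
that it checks against live ids and nothing else. MonotonicAllocator shows one 
possible way to get uniqueness for the lifetime of the process, and is not 
necessarily the fix this issue will adopt.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical id allocators for illustration; not HBase code.
class ScannerIdAllocators {

    // Unique only among ids that are currently live.
    static class LiveOnlyAllocator {
        private final Set<Long> live = new HashSet<>();

        // Picks the smallest id not in use (an assumption made here so that
        // the collision is deterministic rather than a matter of chance).
        synchronized long allocate() {
            long id = 0;
            while (!live.add(id)) {
                id++;
            }
            return id;
        }

        // After release, the id is free to be handed out again.
        synchronized void release(long id) {
            live.remove(id);
        }
    }

    // Unique for the lifetime of this allocator instance.
    static class MonotonicAllocator {
        private final AtomicLong nextId = new AtomicLong();

        long allocate() {
            return nextId.getAndIncrement();
        }

        // Nothing to do: released ids are never reissued.
        void release(long id) {
        }
    }
}
```

With LiveOnlyAllocator, allocating, releasing, and allocating again returns 
the same id twice. With MonotonicAllocator the second id always differs from 
the first, so a stale request from an expired scanner cannot match a newer 
scanner.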

In theory, the above scenario should be very rare (two consecutive scans on 
the same region from two different clients must get the same scannerId, and 
the first scanner must expire before the second is created), but it can 
happen. Once it does, the consequence is severe (all clients involved get 
wrong data), and the problem is likely to be extremely hard to diagnose and 
debug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
