Hole in split transaction rollback; edits to .META. need to be rolled back even 
if it seems like they didn't make it
--------------------------------------------------------------------------------------------------------------------

                 Key: HBASE-3872
                 URL: https://issues.apache.org/jira/browse/HBASE-3872
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.90.3
            Reporter: stack
            Assignee: stack
            Priority: Critical
             Fix For: 0.90.4


Saw this interesting one on a cluster of ours.  The cluster was configured with 
too few handlers, so we saw a lot of the phenomenon where actions were queued 
but, by the time they got into the server and tried to respond to the client, 
the client had disconnected because of the 60 second timeout.  Well, the meta 
edits for a split were queued at the regionserver carrying .META. and by the 
time it went to write back, the client had gone (this was the first insert: 
parent offline with daughter regions added as info:splitA and info:splitB).  
The client presumed the edits had failed and 'successfully' rolled back the 
transaction (failing to undo the .META. edits, thinking they didn't go through).
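
The failure mode above can be modeled in a few lines. This is only a toy 
sketch: FakeMeta, put and splitThenNaiveRollback are hypothetical names, not 
HBase APIs, and the "clientGone" flag stands in for the 60 second timeout. The 
point it shows is that an error seen by the caller does not prove the edit 
was not applied.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these names are HBase APIs.
class SplitTimeoutDemo {
    /** Stand-in for the .META. table: region name -> "online" or "offline". */
    static final class FakeMeta {
        final Map<String, String> rows = new HashMap<>();

        /** Applies the edit first, then fails if the caller has already given up. */
        void put(String region, String state, boolean clientGone) throws IOException {
            rows.put(region, state); // the edit is applied on the server side
            if (clientGone) {
                // The response cannot be delivered, so the caller only sees an error.
                throw new IOException("client timed out before response");
            }
        }
    }

    /** Returns the parent's state in meta after a split whose meta edit times out. */
    static String splitThenNaiveRollback(FakeMeta meta, String parent) {
        meta.rows.put(parent, "online");
        try {
            meta.put(parent, "offline", true);
        } catch (IOException e) {
            // Naive rollback: assumes the edit never landed, so meta is left alone.
        }
        return meta.rows.get(parent);
    }

    public static void main(String[] args) {
        // The parent is still marked offline even though the split "rolled back".
        System.out.println(splitThenNaiveRollback(new FakeMeta(), "parent"));
    }
}
```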

A few minutes later the .META. scanner on master runs.  It sees 'no references' 
in daughters -- the daughters had been cleaned up as part of the split 
transaction rollback -- so it thinks it's safe to delete the parent.

Two things:

+ Tighten up the check in master... it needs to check that the daughter region 
at least exists, and possibly that the daughter region has an entry in .META.
+ Depending on which edit fails, schedule rollback edits even though it will 
seem like they didn't go through.
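
A rough sketch of what the two items could look like, under assumptions: 
Daughter, canDeleteParent, Step and mustUndoMeta are hypothetical names made 
up for illustration, not the actual master or split transaction code. It 
also takes the stricter reading of the first item and requires both that the 
daughter exists and that it has a .META. entry.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names throughout; a sketch, not HBase code.
class SplitFixSketch {
    /** Minimal view of what the master can see about one daughter region. */
    static final class Daughter {
        final boolean existsOnDisk;
        final boolean hasMetaRow;
        final boolean hasReferences;

        Daughter(boolean existsOnDisk, boolean hasMetaRow, boolean hasReferences) {
            this.existsOnDisk = existsOnDisk;
            this.hasMetaRow = hasMetaRow;
            this.hasReferences = hasReferences;
        }
    }

    /** "No references" only counts if the daughter is really there. */
    static boolean daughterDone(Daughter d) {
        return d.existsOnDisk && d.hasMetaRow && !d.hasReferences;
    }

    /** Tightened master-side check before deleting a split parent. */
    static boolean canDeleteParent(Daughter a, Daughter b) {
        return daughterDone(a) && daughterDone(b);
    }

    /** Steps a split might journal before attempting them (assumed, simplified). */
    enum Step { STARTED_META_EDIT, CREATED_DAUGHTERS }

    /**
     * Rollback rule: if the .META. edit was attempted at all, undo it,
     * whatever outcome was reported back to the caller.
     */
    static boolean mustUndoMeta(Set<Step> journal, boolean editReportedFailure) {
        return journal.contains(Step.STARTED_META_EDIT);
    }

    public static void main(String[] args) {
        Daughter cleanedUp = new Daughter(false, false, false);
        System.out.println(canDeleteParent(cleanedUp, cleanedUp));
        System.out.println(mustUndoMeta(EnumSet.of(Step.STARTED_META_EDIT), true));
    }
}
```

The design choice in mustUndoMeta is to key the rollback off what was 
attempted rather than what was acknowledged, so a timed-out edit is treated 
as possibly applied.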

This is a pretty critical one.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
