Viraj Jasani created HBASE-26433:
------------------------------------

             Summary: Rollback from ZK-less to ZK-based assignment could produce inconsistent state - doubly assigned regions
                 Key: HBASE-26433
                 URL: https://issues.apache.org/jira/browse/HBASE-26433
             Project: HBase
          Issue Type: Bug
    Affects Versions: 1.7.1
            Reporter: Viraj Jasani
            Assignee: Viraj Jasani
             Fix For: 1.7.2


By enabling the config {_}hbase.assignment.usezk.migrating{_}, we start migrating 
an HBase 1.x cluster from the default ZK-based region assignment to ZK-less 
region assignment. Once the migration is enabled, every subsequent region 
transition adds two additional CQs (column qualifiers) to meta: info:sn and 
info:state. Ideally, only the workflow that writes these new CQs should read them 
(unless it requires coordination among multiple workflows); however, that is not 
the case here. Reading info:sn and info:state to rebuild user region states in 
the RegionStateStore data structure is a latent bug, because the read is not 
restricted to ZK-less region assignment.
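To make the missing condition concrete, here is a minimal Java sketch of the read path, assuming a simplified meta row held as a map of column name to String value (real meta stores bytes). MetaRow, RegionLocationRebuilder and their methods are hypothetical names used only for illustration, not HBase APIs; the sketch just contrasts an unguarded read of info:sn with one guarded by the assignment mode.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of one meta row: column name -> value.
// Real meta cells are bytes; Strings keep the sketch readable.
final class MetaRow {
    final Map<String, String> columns = new HashMap<>();

    MetaRow put(String column, String value) {
        columns.put(column, value);
        return this;
    }

    String get(String column) {
        return columns.get(column);
    }
}

final class RegionLocationRebuilder {
    // Location as maintained by ZK-based assignment: info:server ("host:port")
    // plus info:serverstartcode, rendered as "host,port,startcode".
    static String fromServerColumns(MetaRow row) {
        String hostPort = row.get("info:server");
        String startCode = row.get("info:serverstartcode");
        if (hostPort == null || startCode == null) {
            return null;
        }
        return hostPort.replace(':', ',') + "," + startCode;
    }

    // Behavior described in this issue: info:sn is trusted whenever it is present,
    // regardless of which assignment mode is active.
    static String unguarded(MetaRow row) {
        String sn = row.get("info:sn");
        return sn != null ? sn : fromServerColumns(row);
    }

    // Guarded read: info:sn is consulted only when ZK-less assignment is in use.
    // Falling back to the server columns when info:sn is absent is an assumption
    // of this sketch.
    static String guarded(MetaRow row, boolean zkLessAssignmentInUse) {
        if (zkLessAssignmentInUse) {
            String sn = row.get("info:sn");
            if (sn != null) {
                return sn;
            }
        }
        return fromServerColumns(row);
    }
}
```

For a row whose info:sn still points at an old server, unguarded(row) returns that old server, while guarded(row, false) returns the location derived from info:server and info:serverstartcode.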

What are the effects?

After enabling the ZK-less migration, if we revert it, the info:state and info:sn 
values already written to meta are not reverted. Moreover, a new active master 
rebuilds region states in memory using these values. As long as every region's 
info:sn value is consistent with its info:server and info:serverstartcode, 
nothing goes wrong, and that is the likely state right after the config is 
reverted with a rolling restart of masters. However, if any region moves after 
the revert, only info:server and info:serverstartcode are updated, while info:sn 
and info:state keep their old values. Because of the missing condition, the next 
active master restart rebuilds region states and assigns regions according to 
the stale info:sn, even though those regions are already OPEN on the server 
recorded in info:server. The result is doubly assigned regions.
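The sequence above can be replayed with a small, hypothetical Java simulation of a single region's meta row. DoubleAssignmentScenario and its methods are invented for illustration and greatly simplify what the master and region servers actually do; "would doubly assign" here only means that the location rebuilt from meta differs from the server where the region is actually OPEN.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of one region's meta row across the revert scenario.
final class DoubleAssignmentScenario {
    final Map<String, String> meta = new HashMap<>();
    // Server ("host,port,startcode") where the region is actually OPEN.
    String actualLocation;

    // While the migration is enabled, a transition writes all four columns.
    void openZkLess(String host, int port, long startCode) {
        String sn = host + "," + port + "," + startCode;
        meta.put("info:server", host + ":" + port);
        meta.put("info:serverstartcode", Long.toString(startCode));
        meta.put("info:sn", sn);
        meta.put("info:state", "OPEN");
        actualLocation = sn;
    }

    // After the revert, ZK-based assignment updates only info:server and
    // info:serverstartcode; info:sn and info:state keep their old values.
    void openZkBased(String host, int port, long startCode) {
        meta.put("info:server", host + ":" + port);
        meta.put("info:serverstartcode", Long.toString(startCode));
        actualLocation = host + "," + port + "," + startCode;
    }

    // Unguarded rebuild on master startup: info:sn wins whenever it is present.
    String rebuildUnguarded() {
        String sn = meta.get("info:sn");
        if (sn != null) {
            return sn;
        }
        return meta.get("info:server").replace(':', ',') + "," + meta.get("info:serverstartcode");
    }

    // If the rebuilt location differs from where the region is actually OPEN,
    // assigning per the rebuilt state would open a second copy of the region.
    boolean wouldDoublyAssign() {
        return !rebuildUnguarded().equals(actualLocation);
    }
}
```

Right after openZkLess(...), wouldDoublyAssign() is false because info:sn is consistent; a later openZkBased(...) onto a different server makes it true, since info:sn still names the old server.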

The fix has two parts:
 # Guard the reading of info:sn and info:state with proper conditions, so that they are read only when ZK-less region assignment is in use.
 # Once active master initialization is complete, if ZK-based region assignment is enabled and the redundant CQs (info:sn and info:state) are present in meta, delete them.
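Part 2 could look roughly like the following Java sketch, again over a hypothetical in-memory map of region name to meta row rather than the real meta table (where the cleanup would presumably be issued as Delete mutations). RedundantColumnCleanup and cleanUp(...) are hypothetical names, and the two booleans stand in for whatever master-initialization and assignment-mode checks the actual fix uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class RedundantColumnCleanup {
    // Columns written only by ZK-less assignment.
    static final List<String> ZK_LESS_ONLY_COLUMNS = Arrays.asList("info:sn", "info:state");

    // After active master initialization completes, and only if ZK-based assignment
    // is enabled, remove info:sn and info:state from every row that still has them.
    // Other columns (info:server, info:serverstartcode, ...) are left untouched.
    // Returns the number of cells removed.
    static int cleanUp(Map<String, Map<String, String>> meta,
                       boolean masterInitialized,
                       boolean zkBasedAssignment) {
        if (!masterInitialized || !zkBasedAssignment) {
            return 0;
        }
        int deleted = 0;
        for (Map<String, String> row : meta.values()) {
            for (String column : ZK_LESS_ONLY_COLUMNS) {
                if (row.remove(column) != null) {
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

Running the cleanup only after initialization means the guarded read from part 1 still protects the startup rebuild; the cleanup then removes the stale CQs so a later restart cannot trip over them even if a code path is missed.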



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
