Josh Elser created HBASE-20706:
----------------------------------

             Summary: [hack] Don't add known not-OPEN regions in reopen phase of MTP
                 Key: HBASE-20706
                 URL: https://issues.apache.org/jira/browse/HBASE-20706
             Project: HBase
          Issue Type: Sub-task
          Components: amv2
            Reporter: Josh Elser
            Assignee: Josh Elser
             Fix For: 3.0.0, 2.1.0, 2.0.1


Found during a shake-down of ModifyTableProcedure; talked this one out with 
Stack. The "proper" fix is likely pending in HBASE-20682. Using 
MoveRegionProcedure is likely the wrong construct; we would want something 
specific to reopen (e.g. a ReopenProcedure).

However, we're in a really bad state right now. If there are non-open regions 
for a table which has a modify submitted against it, the entire system locks up 
in a fast-spin while holding the table's lock. This fills up HDFS with PV2 
WALs and prevents you from doing anything in the HBase shell to try to fix 
those unassigned regions. You'll see spam in the master log like:
{noformat}
2018-06-07 03:21:29,448 WARN  [PEWorker-1] procedure.ModifyTableProcedure: Retriable error trying to modify table=METRIC_RECORD_HOURLY_UUID (in state=MODIFY_TABLE_REOPEN_ALL_REGIONS)
org.apache.hadoop.hbase.client.DoNotRetryRegionException: a3dc333606d38aeb6e2ab4b94233cfbc is not OPEN
        at org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:193)
        at org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.<init>(MoveRegionProcedure.java:67)
        at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:767)
        at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createReopenProcedures(AssignmentManager.java:705)
        at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:128)
        at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:50)
        at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
        at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
{noformat}
We unstuck our internal test cluster by applying the following change on top of 
Sergey's HBASE-20657: when choosing the regions to reopen, we filter a table's 
regions down to only those which are currently OPEN. There may be some 
transient failures here as well (e.g. a region leaving OPEN after it was 
selected), but a subsequent retry of the reopen step should filter out that 
change.
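
For illustration only, the filtering described above might look roughly like 
the sketch below. This is not the actual patch: the Region and State types and 
the regionsToReopen helper are hypothetical, simplified stand-ins rather than 
the real HBase classes, and the region names other than the one from the log 
above are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ReopenFilterSketch {
    // Hypothetical, simplified region states; NOT the real HBase enum.
    enum State { OPEN, OPENING, CLOSING, CLOSED, OFFLINE }

    // Hypothetical stand-in for a region plus its current state.
    static final class Region {
        final String encodedName;
        final State state;

        Region(String encodedName, State state) {
            this.encodedName = encodedName;
            this.state = state;
        }
    }

    // Keep only the regions that are currently OPEN. Regions in any other
    // state are skipped for this attempt instead of failing the reopen step.
    static List<Region> regionsToReopen(List<Region> tableRegions) {
        return tableRegions.stream()
            .filter(r -> r.state == State.OPEN)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("a3dc333606d38aeb6e2ab4b94233cfbc", State.CLOSED),
            new Region("hypothetical-region-1", State.OPEN),
            new Region("hypothetical-region-2", State.OPENING));

        // Only regions that passed the OPEN filter are printed.
        for (Region r : regionsToReopen(regions)) {
            System.out.println(r.encodedName);
        }
    }
}
```

Because the filter is re-evaluated on every retry of the reopen step, a region 
that leaves OPEN between selection and reopen would simply be dropped on the 
next pass in this sketch.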



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
