[ https://issues.apache.org/jira/browse/HBASE-23904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Stack resolved HBASE-23904.
-----------------------------------
    Fix Version/s: 2.3.0
                   3.0.0
       Resolution: Fixed

Pushed to branch-2 and master.

> Procedure updating meta and Master shutdown are incompatible: CODE-BUG
> ----------------------------------------------------------------------
>
>                 Key: HBASE-23904
>                 URL: https://issues.apache.org/jira/browse/HBASE-23904
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>            Reporter: Michael Stack
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0
>
>
> Chasing flakies, studying TestMasterAbortWhileMergingTable, I noticed a
> failure because
> {code:java}
> 2020-02-27 00:57:51,702 ERROR [PEWorker-6] procedure2.ProcedureExecutor(1688): CODE-BUG: Uncaught runtime exception: pid=14, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META, locked=true; MergeTableRegionsProcedure table=test, regions=[48c9be922fa4356bfc7fc61b5b0785f3, ef196d5377c5c1d143e9a2a2ea056a9c], force=false
> java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@28b956c7 rejected from java.util.concurrent.ThreadPoolExecutor@639f20e5[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5]
>   at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
>   at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
>   at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
>   at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
>   at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:974)
>   at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:953)
>   at org.apache.hadoop.hbase.MetaTableAccessor.multiMutate(MetaTableAccessor.java:1771)
>   at org.apache.hadoop.hbase.MetaTableAccessor.mergeRegions(MetaTableAccessor.java:1637)
>   at org.apache.hadoop.hbase.master.assignment.RegionStateStore.mergeRegions(RegionStateStore.java:268)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsMerged(AssignmentManager.java:1854)
>   at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.updateMetaForMergedRegions(MergeTableRegionsProcedure.java:687)
>   at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:229)
>   at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:77)
>   at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:194)
>   at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1669)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1416)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:79)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1986)
> {code}
> Just above that in the log (about 80 ms earlier), as part of the test, we'd
> stopped Master
> {code:java}
> 2020-02-27 00:57:51,620 INFO [Time-limited test] regionserver.HRegionServer(2212): ***** STOPPING region server 'rn-hbased-lapp01.rno.exampl.com,36587,1582765058324' *****
> 2020-02-27 00:57:51,620 INFO [Time-limited test] regionserver.HRegionServer(2226): STOPPED: Stopping master 0
> {code}
> The rejected execution damages the merge procedure. It shows as an unhandled
> CODE-BUG.
> Why we let a runtime exception out when trying to update meta is mildly
> interesting. We use Throwables.propagateIfPossible(e, IOException.class)
> from guava, which at first blush would seem to throw the exception if it is
> an IOE and otherwise return. In the code, if it returns, we wrap whatever
> makes it through in an IOE.
> But propagateIfPossible is a little sneaky in that if the passed Exception
> is a RuntimeException, as the RejectedExecutionException is, it will go
> ahead and throw and NOT return. Not sure if this was the author's
> understanding ([~zhangduo] ? HBASE-21789 for hbase-2.2.0). Looking at the
> old code, which called makeIOExceptionOfException from ProtobufUtil, if I
> read it right, this would wrap the exception in an IOE regardless of whether
> it was a RuntimeException or not.
> A little digging shows that the likely root of the problem is that the
> Master is stopping. Its connection, which is used by the merge procedure
> when updating meta, is being shut down too. The rejected execution is
> probably because the pool has been shut down. Hard to tell for sure as
> Master doesn't log the minutiae of services closed.
> The propagateIfPossible facility is used in a few places. Its addition to
> MetaTableAccessor is in one place only, by HBASE-21789. I could restore the
> old behavior easily enough (I was afraid we had to deal with this issue
> around ALL meta table accesses via MTA).
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
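
To make the propagateIfPossible behavior described in the report concrete, here is a minimal JDK-only sketch. It is not the HBase code. The class PropagateSketch and the methods propagateIfPossibleLike, propagateThenWrap and alwaysWrap are hypothetical names; propagateIfPossibleLike is a hand-written stand-in that follows the documented contract of guava's Throwables.propagateIfPossible(Throwable, Class), and alwaysWrap only approximates the "wrap everything in an IOE" behavior the report attributes to the old code. A shut-down executor stands in for the Master's connection pool.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

class PropagateSketch {

  private static final Runnable NOOP = () -> { };

  // Hypothetical stand-in for guava's Throwables.propagateIfPossible(t, declared):
  // rethrows t as-is if it is an instance of the declared type, a RuntimeException,
  // or an Error; otherwise it simply returns.
  static <X extends Throwable> void propagateIfPossibleLike(Throwable t, Class<X> declared)
      throws X {
    if (declared.isInstance(t)) {
      throw declared.cast(t);
    }
    if (t instanceof RuntimeException) {
      throw (RuntimeException) t;
    }
    if (t instanceof Error) {
      throw (Error) t;
    }
  }

  // Assumed shape of the call site: propagate if possible, else wrap in an IOException.
  // A RuntimeException never reaches the wrapping line.
  static void propagateThenWrap(ExecutorService pool) throws IOException {
    try {
      pool.submit(NOOP);
    } catch (Throwable e) {
      propagateIfPossibleLike(e, IOException.class);
      throw new IOException(e);
    }
  }

  // Assumed shape of the older behavior: everything that is not already an
  // IOException is wrapped in one, runtime exceptions included.
  static void alwaysWrap(ExecutorService pool) throws IOException {
    try {
      pool.submit(NOOP);
    } catch (Throwable e) {
      throw (e instanceof IOException) ? (IOException) e : new IOException(e);
    }
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    pool.shutdown(); // stand-in for the connection pool closed during Master stop

    try {
      propagateThenWrap(pool);
    } catch (RejectedExecutionException e) {
      System.out.println("propagateThenWrap leaked: " + e.getClass().getSimpleName());
    } catch (IOException e) {
      System.out.println("propagateThenWrap wrapped: " + e.getCause());
    }

    try {
      alwaysWrap(pool);
    } catch (IOException e) {
      System.out.println("alwaysWrap wrapped: " + e.getCause().getClass().getSimpleName());
    }
  }
}
```

In this sketch the first call lets the RejectedExecutionException escape unchecked, which is the kind of exception a procedure executor would report as an uncaught runtime exception, while the second surfaces it as an IOException that a caller already expecting IOExceptions can handle.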