Sergey Shelukhin created HBASE-22334:
----------------------------------------
             Summary: handle blocking RPC threads better (time out calls?)
                 Key: HBASE-22334
                 URL: https://issues.apache.org/jira/browse/HBASE-22334
             Project: HBase
          Issue Type: Bug
            Reporter: Sergey Shelukhin

Combined with HBASE-22333, we had a case where a user sent lots of create table requests with pre-split for the same table (because the tasks of some job would try to create the table opportunistically if it didn't exist, and there were many such tasks). These requests took up all the RPC threads and caused a large call queue to form; then the first call got stuck because the RS calls to report an opened region were stuck in the queue.

All the other calls were stuck here:
{noformat}
submitProcedure(
    new CreateTableProcedure(procedureExecutor.getEnvironment(), desc, newRegions, latch));
latch.await();
{noformat}

The procedures in this case were stuck for hours. Even if the other issue were resolved, assigning thousands of regions can take a long time and cause a lot of delay before it unblocks the other procedures and allows them to release the latch.

In general, waiting on an RPC thread is not a good idea. I wonder if it would make sense to fail client requests that are taking up an RPC thread based on a timeout, or when they are not making progress (e.g. in this case, the procedure is not getting updated; this might need to be handled on a case-by-case basis).
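As a rough illustration of the timeout idea, the sketch below replaces an unbounded `await()` with a bounded one that fails the request when the latch is not released in time. It is not HBase code: `java.util.concurrent.CountDownLatch` stands in for whatever latch type the master actually uses, and `BoundedLatchWait`, `awaitOrTimeout`, and the 100 ms timeout are hypothetical names and values chosen for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedLatchWait {

    /**
     * Waits for the latch for at most timeoutMs milliseconds.
     * Throws TimeoutException instead of holding the calling
     * (RPC handler) thread indefinitely.
     */
    public static void awaitOrTimeout(CountDownLatch latch, long timeoutMs)
            throws InterruptedException, TimeoutException {
        // await(timeout, unit) returns false if the count did not reach zero in time.
        if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException(
                "latch not released within " + timeoutMs + " ms");
        }
    }

    public static void main(String[] args) throws Exception {
        // A latch that nobody releases models a procedure that is stuck.
        CountDownLatch stuck = new CountDownLatch(1);
        try {
            awaitOrTimeout(stuck, 100);
        } catch (TimeoutException e) {
            // The handler thread is freed and the client gets an error.
            System.out.println("request failed: " + e.getMessage());
        }

        // A latch that is already released models a procedure that finished.
        CountDownLatch done = new CountDownLatch(1);
        done.countDown();
        awaitOrTimeout(done, 100);
        System.out.println("request completed");
    }
}
```

A timeout like this only frees the thread; the submitted procedure would keep running, so a real fix would also have to decide what the client is told about the still-pending operation.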