Sergey Shelukhin created HBASE-22334:
----------------------------------------

             Summary: handle blocking RPC threads better (time out calls? )
                 Key: HBASE-22334
                 URL: https://issues.apache.org/jira/browse/HBASE-22334
             Project: HBase
          Issue Type: Bug
            Reporter: Sergey Shelukhin


Combined with HBASE-22333, we had a case where a user sent many create-table 
requests with pre-split for the same table (the tasks of some job each tried to 
create the table opportunistically if it did not exist, and there were many 
such tasks). These requests took up all the RPC threads and caused a large call 
queue to form; then the first call got stuck because the RS calls reporting an 
opened region were stuck in the queue. All the other calls were stuck here:
{noformat}
          submitProcedure(
            new CreateTableProcedure(procedureExecutor.getEnvironment(), desc,
              newRegions, latch));
          latch.await();
{noformat}

The procedures in this case were stuck for hours; even if the other issue were 
resolved, assigning thousands of regions can take a long time and cause a long 
delay before it unblocks the other procedures and allows them to release 
the latch.

In general, waiting on an RPC thread is not a good idea. I wonder if it would 
make sense to fail client requests that are holding an RPC thread, either based 
on a timeout or when they are not making progress (e.g. in this case, the 
procedure is not getting updated; this might need to be handled on a 
case-by-case basis).
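
To illustrate the timeout idea, here is a minimal sketch of a bounded wait. 
It is not HBase code: it uses a plain java.util.concurrent.CountDownLatch as a 
stand-in for the procedure latch, and the class name BoundedLatchWait, the 
method awaitBounded, and the timeout value are hypothetical. Whether the real 
latch type can support a timed wait like this is an assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of HBase: waits on a latch for a bounded
// time so the calling (RPC handler) thread is released even if the work
// behind the latch never completes.
class BoundedLatchWait {

  // Returns true if the latch was released within timeoutMs,
  // false if the wait timed out or the thread was interrupted.
  static boolean awaitBounded(CountDownLatch latch, long timeoutMs) {
    try {
      // CountDownLatch.await(long, TimeUnit) returns false on timeout
      // instead of blocking forever like the no-argument await().
      return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      // Preserve the interrupt status for the caller and give up waiting.
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    // Latch that has already been released: the wait succeeds immediately.
    CountDownLatch finished = new CountDownLatch(1);
    finished.countDown();
    System.out.println("finished latch released: " + awaitBounded(finished, 100));

    // Latch that is never released: the wait gives up after the timeout,
    // at which point the caller could fail the client request.
    CountDownLatch stuck = new CountDownLatch(1);
    System.out.println("stuck latch released: " + awaitBounded(stuck, 100));
  }
}
```

On a false return the handler could fail the client call (or tell the client 
to poll for the procedure result) rather than continuing to hold the RPC 
thread. The procedure itself would still be running, so this only frees the 
thread; it does not address the progress check mentioned above.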





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
