Michael Stack created HBASE-24545:
-------------------------------------

             Summary: Add backoff to SCP check on WAL split completion
                 Key: HBASE-24545
                 URL: https://issues.apache.org/jira/browse/HBASE-24545
             Project: HBase
          Issue Type: Bug
            Reporter: Michael Stack


A cluster crashed with lots of backed-up WALs. On startup, hundreds of servers
had to be recovered; each has a running SCP (ServerCrashProcedure). Taking a
thread dump during recovery, I noticed that there were 160 threads, each in an
SCP waiting on split WAL completion. Each thread was scanning the zk splitWAL
directory every 100ms. The directory had thousands of entries in it, so each
check was pulling down MBs from zk, times 160 threads (the max configured PE
threads (16) * 10, the KeepAlive factor that sets the PE worker pool max at
10 * the configured PE count).

If many WALs remain to be split, have the SCP back off on its wait so it
checks less frequently.
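
A minimal sketch of the kind of backoff proposed here, not the actual patch.
It assumes a capped exponential backoff on the number of checks already made;
the class name SplitWalBackoff, the method waitMillis and the 30-second cap
are all hypothetical and do not come from HBase. Only the 100ms starting
interval is taken from the description above.

```java
/** Hypothetical helper for illustration only; not part of HBase. */
final class SplitWalBackoff {
    private SplitWalBackoff() {}

    /**
     * Returns how long to wait before the next split-WAL completion check.
     * The wait starts at baseMillis and doubles with each attempt already
     * made, never exceeding maxMillis.
     */
    static long waitMillis(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 0 || baseMillis <= 0 || maxMillis < baseMillis) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        long wait = baseMillis;
        // Double once per prior attempt; stop early once the cap is reached.
        for (int i = 0; i < attempt && wait < maxMillis; i++) {
            // Jump straight to the cap if doubling would pass it (this also
            // avoids overflowing a long).
            wait = (wait > maxMillis / 2) ? maxMillis : wait * 2;
        }
        return wait;
    }
}
```

With a 100ms base and an assumed 30s cap, the intervals run 100, 200, 400,
800ms and so on, then hold at 30s, so a long-running split is polled far less
often than every 100ms.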



