Hi all,

  One of the RegionServers in our company’s cluster crashed. When this
happened, I observed:

1.       All the RegionServers stopped handling requests from clients
(requestsPerSecond=0 on the master-status UI page).

2.       It took about 12-15 minutes to recover.

3.       I have set hbase.regionserver.restart.on.zk.expire to true, but it
did not work.

  For 1, I know the cluster began splitting logs and recovering the data
from the crashed RegionServer. Does the recovery operation block all
requests from clients?

  For 2, is there any way to reduce the recovery time?

  For 3, I checked the log and found a “session is timeout” exception;
perhaps a full GC caused the session to time out. But why does
hbase.regionserver.restart.on.zk.expire not work? My HBase version is
0.94.0.

 

  Thanks for any suggestions and feedback!

 

Fowler Zhang

 
