[ https://issues.apache.org/jira/browse/HBASE-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1111:
-------------------------

    Fix Version/s:     (was: 0.90.0)
       Issue Type: Umbrella  (was: Improvement)

Moving out this nebulous umbrella issue.

> [performance] Crash recovery takes way too long
> -----------------------------------------------
>
>                 Key: HBASE-1111
>                 URL: https://issues.apache.org/jira/browse/HBASE-1111
>             Project: HBase
>          Issue Type: Umbrella
>            Reporter: stack
>
> Watching hbase recover from crashes, it's taking way too long:
> 1. Must wait first on the lease to expire (if the server is rebooted, it
> should cancel the old server's lease but make sure the lease expiration
> code runs)
> 2. Master splits logs. This is single-threaded. There is a maximum of 64
> logs, but it seems to run slow anyway.
> 3. Assign out the regions that were on the dead server (minutes or even
> tens of minutes could have elapsed at this stage)
> 4. Wait on the regionservers to open. On a small cluster, because
> regionservers open regions in series, it could take a long time to open a
> bunch of regions. Meantime the regions are not available and clients will
> likely time out.
> 5. To make things worse, I've seen the load balancer cut in to 'help out',
> telling a regionserver to close some of its regions though it's busy
> opening a bunch.
> Andrew Purtell notes that HBASE-1110 will change a bunch of the above.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
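
Step 2 of the issue description says the master splits logs on a single thread. As an illustration only, here is a minimal sketch of fanning that work out over a fixed thread pool. None of this is HBase code: the class name, `splitOneLog`, `splitAll`, and the log names are hypothetical placeholders, and the "split" merely returns the length of the log name so that the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelLogSplitSketch {

    // Hypothetical stand-in for splitting one log file (assumption, not HBase API).
    // It only returns the length of the name so the sketch stays runnable.
    static int splitOneLog(String logName) {
        return logName.length();
    }

    // Submits one task per log to a fixed-size pool, then collects the
    // results in the original order. Checked exceptions are wrapped so
    // callers do not have to declare them.
    static Map<String, Integer> splitAll(List<String> logNames, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String name : logNames) {
                Callable<Integer> task = () -> splitOneLog(name);
                pending.add(pool.submit(task));
            }
            Map<String, Integer> results = new LinkedHashMap<>();
            for (int i = 0; i < logNames.size(); i++) {
                // get() blocks until that log's task has finished.
                results.put(logNames.get(i), pending.get(i).get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while splitting logs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("log split failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Made-up log names, for illustration only.
        List<String> logs = Arrays.asList("hlog.1", "hlog.22", "hlog.333");
        System.out.println(splitAll(logs, 4));
    }
}
```

A bounded pool is used here so that the number of concurrent splits stays fixed no matter how many logs the dead server left behind. Whether this actually shortens recovery would depend on where the time goes (disk, network, or CPU), which the issue does not say.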