[ https://issues.apache.org/jira/browse/HBASE-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-2998.
--------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Thanks for the review, Jon. I did as you suggested (and that test passes). I also tried it on a cluster with a 5-node ensemble. Committing.

> rolling-restart.sh shouldn't rely on zoo.cfg
> --------------------------------------------
>
>                 Key: HBASE-2998
>                 URL: https://issues.apache.org/jira/browse/HBASE-2998
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.90.0
>
>         Attachments: 2998.txt
>
>
> I tried the rolling-restart script on our dev environment, which is
> configured with zoo.cfg for ZooKeeper, and it worked pretty well. Then I
> tried it on our MR cluster, which doesn't have a zoo.cfg, and we suffered
> some downtime (no biggie though, nothing critical was running). When the
> script calls this line:
> {code}
> bin/hbase zkcli stat $zmaster
> {code}
> it directly runs a ZooKeeperMain, which isn't modified to read from the HBase
> configuration files. If ZK isn't running on the master node, the script
> receives a ConnectionRefused, ignores it, proceeds to restart the master
> (which waits on the znode), and then starts restarting the region servers.
> They can't shut down properly within 60 seconds, since they need a master,
> so they get killed. What follows is pretty ugly and pretty much requires a
> whole restart.
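
The failure described above comes from the script ignoring a failed ZooKeeper check and carrying on with the restart. A minimal sketch of the defensive pattern follows. It is not the change in 2998.txt, which is not reproduced in this message. The helper name `require_zk_check` is hypothetical, and the sketch assumes the check command exits non-zero when it cannot connect, which has not been verified for `zkcli`.

```shell
#!/usr/bin/env bash
# Sketch only: stop a rolling restart when the ZooKeeper check fails,
# instead of ignoring the error and restarting the master anyway.

# Hypothetical helper (not part of HBase): runs the given command quietly
# and returns non-zero, with a message on stderr, if the command fails.
require_zk_check() {
  if ! "$@" >/dev/null 2>&1; then
    echo "ZooKeeper check failed: $*; aborting rolling restart" >&2
    return 1
  fi
  return 0
}

# Intended use inside a restart script. ASSUMPTION: the check command
# exits non-zero on a refused connection.
#   require_zk_check bin/hbase zkcli stat "$zmaster" || exit 1
```

The point of the guard is ordering: the master and region servers are only touched once the check has succeeded, so a refused connection ends the script early rather than leaving region servers waiting on a master that never comes up.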