How many servers are there in the ZooKeeper quorum? Have you checked the log of the ZooKeeper leader around the time the master crashed?
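
In case it helps, here is a rough sketch of one way to find out which ensemble member is currently the leader, so you know whose log to read. It assumes the four-letter-word admin command `srvr` is reachable on the client port (2200) of each server listed in the master log quoted below. The helper names (`four_letter_word`, `parse_mode`, `find_modes`) are made up for this example and are not part of any ZooKeeper or HBase API.

```python
import socket

# Ensemble members taken from the "connecting to ZooKeeper ensemble=" line
# in the master log quoted in this thread (client port 2200).
ENSEMBLE = [
    ("10.240.131.14", 2200),
    ("10.240.131.15", 2200),
    ("10.240.131.16", 2200),
    ("10.240.131.17", 2200),
    ("10.240.131.18", 2200),
]


def four_letter_word(host, port, cmd="srvr", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line (e.g. leader, follower), or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def find_modes(ensemble=ENSEMBLE):
    """Map each host to its reported mode, or to an 'unreachable' note."""
    modes = {}
    for host, port in ensemble:
        try:
            modes[host] = parse_mode(four_letter_word(host, port))
        except OSError as exc:  # refused, timed out, no route, ...
            modes[host] = "unreachable ({})".format(exc)
    return modes


if __name__ == "__main__":
    for host, mode in sorted(find_modes().items()):
        print(host, mode)
```

Run it from a machine that can reach port 2200 on those hosts. The server reporting `leader` is the one whose log is worth reading around 15:00, and any host that shows up as unreachable is worth a look as well.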

Cheers

On Wed, Jul 15, 2015 at 7:14 PM, Jo Young Zhang <joyoungzh...@gmail.com> wrote:

> I found that the HBase cluster crashes on the hour.
> The HBase master log is as follows:
>
> "2015-07-14 14:41:49,832 DEBUG [master:10.240.131.18:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: 10-241-125-46%2C60020%2C1436841063572.1436851865226
> 2015-07-14 14:45:49,822 DEBUG [master:10.240.131.18:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: 10-241-85-137%2C60020%2C1436841341086.1436852143141
> 2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: HBase 0.96.2-hadoop2
> 2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: Subversion https://svn.apache.org/repos/asf/hbase/tags/0.96.2RC2 -r 1581096
> 2015-07-14 15:00:03,481 INFO [main] util.VersionInfo: Compiled by stack on Mon Mar 24 16:03:18 PDT 2014
> 2015-07-14 15:00:03,729 INFO [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
> 2015-07-14 15:00:03,730 INFO [main] zookeeper.ZooKeeper: Client environment:host.name=10-240-131-18
> 2015-07-14 15:00:03,730 INFO [main] zookeeper.ZooKeeper: Client environment:java.version=1.7.0_72
>
> ...
>
> 2015-07-14 15:00:03,749 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=clean znode for master connecting to ZooKeeper ensemble=10.240.131.17:2200,10.240.131.16:2200,10.240.131.15:2200,10.240.131.14:2200,10.240.131.18:2200
> 2015-07-14 15:00:03,751 INFO [main-SendThread(10-240-131-18:2200)] zookeeper.ClientCnxn: Opening socket connection to server 10-240-131-18/10.240.131.18:2200. Will not attempt to authenticate using SASL (unknown error)
> 2015-07-14 15:00:03,757 INFO [main-SendThread(10-240-131-18:2200)] zookeeper.ClientCnxn: Socket connection established to 10-240-131-18/10.240.131.18:2200, initiating session
> 2015-07-14 15:00:03,764 INFO [main-SendThread(10-240-131-18:2200)] zookeeper.ClientCnxn: Session establishment complete on server 10-240-131-18/10.240.131.18:2200, sessionid = 0x34e8a64b453024a, negotiated timeout = 40000
> 2015-07-14 15:00:04,835 INFO [main] zookeeper.ZooKeeper: Session: 0x34e8a64b453024a closed
> 2015-07-14 15:00:04,835 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down"
>
> Every hour, after printing "Didn't find this log in ZK...", the master dies.
>
> The ZooKeeper log is as follows:
>
> "2015-07-14 15:00:03,756 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2200:NIOServerCnxnFactory@197] - Accepted socket connection from /10.240.131.18:52733
> 2015-07-14 15:00:03,761 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2200:ZooKeeperServer@868] - Client attempting to establish new session at /10.240.131.18:52733
> 2015-07-14 15:00:03,762 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer@617] - Established session 0x34e8a64b453024a with negotiated timeout 40000 for client /10.240.131.18:52733
> 2015-07-14 15:00:04,836 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2200:NIOServerCnxn@1007] - Closed socket connection for client /10.240.131.18:52733 which had sessionid 0x34e8a64b453024a"