Hi,

I'm testing HA automatic failover in hadoop-2.2.0. Manual failover works on the cluster, but automatic failover fails. I set up HA according to
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html

To test automatic failover, I killed the active NN with kill -9 <Pid-nn>, but the standby NameNode did not change to the active state. The resulting log from my DFSZKFailoverController is given in [1].

Please help; any suggestion will be appreciated.

Regards.

zkfc log [1]
----------------------------------------------------------------------------------------------------
2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to hadoop3...
2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
2013-12-02 19:49:28,592 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_5.3
2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.42
2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
2013-12-02 19:49:28,609 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-md5 none
2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-md5 none
2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
2013-12-02 19:49:28,634 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
2013-12-02 19:49:28,635 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop3' (RSA) to the list of known hosts.
2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
2013-12-02 19:49:28,636 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
2013-12-02 19:49:28,637 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
2013-12-02 19:49:28,638 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
2013-12-02 19:49:28,639 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
2013-12-02 19:49:28,644 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop3 port 22
2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to hadoop3 as user hadoop
com.jcraft.jsch.JSchException: Auth fail
    at com.jcraft.jsch.Session.connect(Session.java:452)
    at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
    at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
    at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
    at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
    at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
    at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
    at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2013-12-02 19:49:28,645 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hadoop2/10.7.23.125:8020 entered state: SERVICE_NOT_RESPONDING
2013-12-02 19:49:28,646 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at hadoop3/10.7.23.124:8020
    at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
    at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
    at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
    at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
    at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2013-12-02 19:49:28,646 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session: 0x2429313c808025b closed
2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop3/10.7.23.124:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop3/10.7.23.124:2181, initiating session
2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop3/10.7.23.124:2181, sessionid = 0x3429312ba330262, negotiated timeout = 5000
2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ZKFailoverController: Quitting master election for NameNode at hadoop2/10.7.23.125:8020 and marking that fencing is necessary
2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session: 0x3429312ba330262 closed
2013-12-02 19:49:29,728 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x3429312ba330262
2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down