Please post your config files and say which method you are following for automatic failover.
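Judging from the log quoted below, the standby's ZKFC wins the election but cannot fence the old active: sshfence connects to hadoop3 and then fails with "Auth fail" at the publickey step, so the standby never becomes active. It may therefore be worth checking the fencing section of your hdfs-site.xml. The fragment below is only a sketch of what that section usually looks like; the key path /home/hadoop/.ssh/id_rsa is an assumption on my side (the log only tells me that the connecting user is hadoop), so replace it with the private key that really allows passwordless SSH between the two NameNode hosts.

```xml
<!-- hdfs-site.xml (fragment): fencing settings used by the ZKFC -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <!-- Assumed path: use the key of the user that runs the ZKFC -->
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
```

The matching public key would need to be in authorized_keys on both NameNode hosts, since either ZKFC may have to fence the other side.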
On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <yypvsxf19870...@gmail.com> wrote:

> Hi,
> I'm testing HA auto-failover with hadoop-2.2.0.
>
> The cluster can be failed over manually; however, automatic failover fails.
> I set up HA according to this URL:
>
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>
> When I tested automatic failover, I killed my active NN with kill -9 <Pid-nn>, but the standby NameNode did not change to the active state.
> The log from my DFSZKFailoverController is shown in [1].
>
> Please help me; any suggestions will be appreciated.
>
> Regards.
>
> zkfc log [1] ----------------------------------------------------------------------------------------------------
>
> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to hadoop3...
> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
> 2013-12-02 19:49:28,592 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_5.3
> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.42
> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
> 2013-12-02 19:49:28,609 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-md5 none
> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-md5 none
> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
> 2013-12-02 19:49:28,634 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
> 2013-12-02 19:49:28,635 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop3' (RSA) to the list of known hosts.
> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
> 2013-12-02 19:49:28,636 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
> 2013-12-02 19:49:28,637 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
> 2013-12-02 19:49:28,638 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
> 2013-12-02 19:49:28,639 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
> 2013-12-02 19:49:28,644 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop3 port 22
> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to hadoop3 as user hadoop
> com.jcraft.jsch.JSchException: Auth fail
>     at com.jcraft.jsch.Session.connect(Session.java:452)
>     at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
> 2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
> 2013-12-02 19:49:28,645 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hadoop2/10.7.23.125:8020 entered state: SERVICE_NOT_RESPONDING
> 2013-12-02 19:49:28,646 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/10.7.23.124:8020
>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2013-12-02 19:49:28,646 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
> 2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session: 0x2429313c808025b closed
> 2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop3/10.7.23.124:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop3/10.7.23.124:2181, initiating session
> 2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop3/10.7.23.124:2181, sessionid = 0x3429312ba330262, negotiated timeout = 5000
> 2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ZKFailoverController: Quitting master election for NameNode at hadoop2/10.7.23.125:8020 and marking that fencing is necessary
> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session: 0x3429312ba330262 closed
> 2013-12-02 19:49:29,728 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x3429312ba330262
> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down