Hi,

Another auto-failover testing problem: my HA setup fails over automatically when I kill the active NN. But when I unplug the network interface to simulate a hardware failure, auto-failover does not seem to work, even after waiting for some time. The zkfc logs are in [1].
I'm using the default sshfence.

[1] zkfc logs
----------------------------------------------------------------------------------------
2013-12-03 10:05:56,650 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2013-12-03 10:05:56,650 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2013-12-03 10:05:56,651 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to hadoop3...
2013-12-03 10:05:56,651 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
2013-12-03 10:05:59,648 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to hadoop3 as user hadoop
com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host
    at com.jcraft.jsch.Util.createSocket(Util.java:386)
    at com.jcraft.jsch.Session.connect(Session.java:182)
    at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
    at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
    at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
    at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
    at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
    at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
    at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2013-12-03 10:05:59,649 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2013-12-03 10:05:59,649 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2013-12-03 10:05:59,650 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at hadoop3/10.7.23.124:8020
    at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
    at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
    at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
    at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
    at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2013-12-03 10:05:59,650 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2013-12-03 10:05:59,676 INFO org.apache.zookeeper.ZooKeeper: Session: 0x142931031810260 closed
2013-12-03 10:06:00,678 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5ce2acea
2013-12-03 10:06:00,681 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop1/10.7.23.122:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-12-03 10:06:00,681 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop1/10.7.23.122:2181, initiating session
2013-12-03 10:06:00,709 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop1/10.7.23.122:2181, sessionid = 0x142931031810261, negotiated timeout = 5000
2013-12-03 10:06:00,711 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
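
As far as I can tell from the log, the standby wins the election but cannot become active because sshfence never reaches hadoop3 on port 22 once the interface is unplugged. To confirm that independently of the zkfc, a small reachability probe like the one below could be run from the standby host. This is only a rough sketch: the function name `can_reach` and the 3-second timeout are my own choices, not anything from Hadoop; only the host hadoop3 and port 22 come from the log.

```python
import socket


def can_reach(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for checking whether the SSH port used by sshfence
    is reachable; it does not perform any fencing itself.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", connection refused, timeouts
        # and name-resolution failures.
        return False
```

Calling `can_reach("hadoop3")` with the cable unplugged should return False if the situation matches the NoRouteToHostException in the log, which would suggest the failover is stuck at the fencing step rather than at the election.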