Hi Yu

   I think that when the NIC is unplugged, SSH cannot get through, because
it cannot reach the failed active NN at all.
   If that is the case, sshfence is bound to fail.
   Am I right?
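
   If so, a common mitigation (suggested in the Hadoop HA-with-QJM docs) is
to configure a fallback fencing method after sshfence, so failover can
still proceed when the old active machine is unreachable over the network.
A minimal sketch for hdfs-site.xml - since the shared edits go through the
JournalNodes, which only ever accept a single writer, a trivial fallback is
generally considered safe here:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>

   The methods are tried in order: when the SSH connection fails, as in the
zkfc log below, shell(/bin/true) always reports success, which lets the
standby go ahead and become active.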


2013/12/3 YouPeng Yang <yypvsxf19870...@gmail.com>

> Hi Yu
>
>   Thanks for your response.
>   I'm sure my SSH setup is good. SSH from the active NN to the standby NN
> needs no password.
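>
> (A quick way to verify, using the fencing key from the config below: from
> the active NN, run
>
>   ssh -i /home/hadoop/.ssh/id_rsa hadoop@hadoop3 hostname
>
> It should print the standby's hostname without prompting for a password.)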
>
> I attached my config
> ------core-site.xml-----------------
>
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://lklcluster</value>
>     <final>true</final>
>   </property>
>
>   <property>
>     <name>hadoop.tmp.dir</name>
>     <value>/home/hadoop/tmp2</value>
>   </property>
> </configuration>
>
>
> -------hdfs-site.xml-------------
>
> <configuration>
>   <property>
>     <name>dfs.namenode.name.dir</name>
>     <value>/home/hadoop/namedir2</value>
>   </property>
>
>   <property>
>     <name>dfs.datanode.data.dir</name>
>     <value>/home/hadoop/datadir2</value>
>   </property>
>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>lklcluster</value>
>   </property>
>
>   <property>
>     <name>dfs.ha.namenodes.lklcluster</name>
>     <value>nn1,nn2</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.rpc-address.lklcluster.nn1</name>
>     <value>hadoop2:8020</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.rpc-address.lklcluster.nn2</name>
>     <value>hadoop3:8020</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.http-address.lklcluster.nn1</name>
>     <value>hadoop2:50070</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.http-address.lklcluster.nn2</name>
>     <value>hadoop3:50070</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/lklcluster</value>
>   </property>
>
>   <property>
>     <name>dfs.client.failover.proxy.provider.lklcluster</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hadoop/.ssh/id_rsa</value>
>   </property>
>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>5000</value>
>   </property>
>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hadoop/journal/data</value>
>   </property>
>
>   <property>
>     <name>dfs.ha.automatic-failover.enabled</name>
>     <value>true</value>
>   </property>
>
>   <property>
>     <name>ha.zookeeper.quorum</name>
>     <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
>   </property>
>
> </configuration>
>
>
>
> 2013/12/3 Azuryy Yu <azury...@gmail.com>
>
>> This is still because your fence method is configured improperly.
>> Please paste your fence configuration, and double-check that you can SSH
>> from the active NN to the standby NN without a password.
>>
>>
>> On Tue, Dec 3, 2013 at 10:23 AM, YouPeng Yang
>> <yypvsxf19870...@gmail.com> wrote:
>>
>>> Hi
>>>    Another auto-failover testing problem:
>>>
>>>    My HA setup can auto-failover after I kill the active NN. But when I
>>> unplug the network interface to simulate a hardware failure, the
>>> auto-failover seems not to work even after waiting for a while - the zkfc
>>> log is at [1].
>>>
>>>    I'm using the default sshfence.
>>>
>>> [1] zkfc logs ----------------------------------------------------------------------------------------
>>> 2013-12-03 10:05:56,650 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
>>> 2013-12-03 10:05:56,650 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
>>> 2013-12-03 10:05:56,651 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to hadoop3...
>>> 2013-12-03 10:05:56,651 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
>>> 2013-12-03 10:05:59,648 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to hadoop3 as user hadoop
>>> com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host
>>>     at com.jcraft.jsch.Util.createSocket(Util.java:386)
>>>     at com.jcraft.jsch.Session.connect(Session.java:182)
>>>     at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>>>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>>>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>>>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>> 2013-12-03 10:05:59,649 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
>>> 2013-12-03 10:05:59,649 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
>>> 2013-12-03 10:05:59,650 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>>> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/10.7.23.124:8020
>>>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>>>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>> 2013-12-03 10:05:59,650 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
>>> 2013-12-03 10:05:59,676 INFO org.apache.zookeeper.ZooKeeper: Session: 0x142931031810260 closed
>>> 2013-12-03 10:06:00,678 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5ce2acea
>>> 2013-12-03 10:06:00,681 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop1/10.7.23.122:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
>>> 2013-12-03 10:06:00,681 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop1/10.7.23.122:2181, initiating session
>>> 2013-12-03 10:06:00,709 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop1/10.7.23.122:2181, sessionid = 0x142931031810261, negotiated timeout = 5000
>>> 2013-12-03 10:06:00,711 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>>>
>>
>>
>
