Are you able to connect to both NN hosts over SSH without a password?
Make sure the correct SSH public keys are in the authorized_keys file on each host.
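For example, the key setup could look roughly like this (a sketch only, assuming the `hadoop` user and the hadoop2/hadoop3 hostnames from your config; the key path matches your dfs.ha.fencing.ssh.private-key-files setting, and step 2 is run on hadoop2 with the hosts swapped on hadoop3):

```shell
# On each NameNode host, as the 'hadoop' user:

# 1. Generate a key pair if one does not exist yet
#    (path matches dfs.ha.fencing.ssh.private-key-files).
test -f ~/.ssh/id_rsa || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# 2. Install the public key on the *other* NameNode, so the ZKFC on
#    this host can fence the remote NN (here: hadoop2 -> hadoop3).
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop3

# 3. sshd silently rejects keys when permissions are too open.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys ~/.ssh/id_rsa

# 4. Verify: this must succeed without any password prompt, exactly
#    as the fencer's JSch client will attempt it.
ssh -o BatchMode=yes hadoop@hadoop3 true && echo "passwordless SSH OK"
```

If step 4 prompts for a password or fails, the ZKFC will hit the same "Auth fail" your log shows, fencing will be reported unsuccessful, and the standby will refuse to become active.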

Regards
Jitendra


On Mon, Dec 2, 2013 at 5:50 PM, YouPeng Yang <yypvsxf19870...@gmail.com>wrote:

> Hi Pavan
>
>
>   I'm using sshfence
>
> ------core-site.xml-----------------
>
> <configuration>
>  <property>
>      <name>fs.defaultFS</name>
>      <value>hdfs://lklcluster</value>
>      <final>true</final>
>  </property>
>
>  <property>
>      <name>hadoop.tmp.dir</name>
>      <value>/home/hadoop/tmp2</value>
>  </property>
>
>
> </configuration>
>
>
> -------hdfs-site.xml-------------
>
> <configuration>
>  <property>
>      <name>dfs.namenode.name.dir</name>
>     <value>/home/hadoop/namedir2</value>
>  </property>
>
>  <property>
>      <name>dfs.datanode.data.dir</name>
>      <value>/home/hadoop/datadir2</value>
>  </property>
>
>  <property>
>    <name>dfs.nameservices</name>
>    <value>lklcluster</value>
> </property>
>
> <property>
>     <name>dfs.ha.namenodes.lklcluster</name>
>     <value>nn1,nn2</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.lklcluster.nn1</name>
>   <value>hadoop2:8020</value>
> </property>
> <property>
>     <name>dfs.namenode.rpc-address.lklcluster.nn2</name>
>     <value>hadoop3:8020</value>
> </property>
>
> <property>
>   <name>dfs.namenode.http-address.lklcluster.nn1</name>
>     <value>hadoop2:50070</value>
> </property>
>
> <property>
>     <name>dfs.namenode.http-address.lklcluster.nn2</name>
>     <value>hadoop3:50070</value>
> </property>
>
> <property>
>   <name>dfs.namenode.shared.edits.dir</name>
>
> <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/lklcluster</value>
> </property>
> <property>
>   <name>dfs.client.failover.proxy.provider.lklcluster</name>
>
> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> </property>
> <property>
>   <name>dfs.ha.fencing.methods</name>
>   <value>sshfence</value>
> </property>
>
> <property>
>   <name>dfs.ha.fencing.ssh.private-key-files</name>
>    <value>/home/hadoop/.ssh/id_rsa</value>
> </property>
>
> <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>      <value>5000</value>
> </property>
>
> <property>
>   <name>dfs.journalnode.edits.dir</name>
>    <value>/home/hadoop/journal/data</value>
> </property>
>
> <property>
>    <name>dfs.ha.automatic-failover.enabled</name>
>       <value>true</value>
> </property>
>
> <property>
>      <name>ha.zookeeper.quorum</name>
>      <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
> </property>
>
> </configuration>
>
>
> 2013/12/2 Pavan Kumar Polineni <smartsunny...@gmail.com>
>
>> Post your config files and say which fencing method you are using for
>> automatic failover.
>>
>>
>> On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang 
>> <yypvsxf19870...@gmail.com>wrote:
>>
>>> Hi
>>>   I'm testing HA auto-failover with hadoop-2.2.0.
>>>
>>>   The cluster can fail over manually; however, automatic failover
>>> fails.
>>> I set up HA according to
>>>
>>> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>>>
>>>   When I tested automatic failover, I killed the active NN with kill -9
>>> <Pid-nn>, but the standby NameNode did not change to the active state.
>>>   The log from my DFSZKFailoverController is shown in [1].
>>>
>>>  Please help; any suggestion will be appreciated.
>>>
>>>
>>> Regards.
>>>
>>>
>>> zkfc
>>> log[1]----------------------------------------------------------------------------------------------------
>>>
>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ======
>>> Beginning Service Fencing Process... ======
>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying
>>> method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
>>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort:
>>> Connecting to hadoop3...
>>> 2013-12-02 19:49:28,590 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
>>> 2013-12-02 19:49:28,592 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
>>> 2013-12-02 19:49:28,603 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string:
>>> SSH-2.0-OpenSSH_5.3
>>> 2013-12-02 19:49:28,603 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string:
>>> SSH-2.0-JSCH-0.1.42
>>> 2013-12-02 19:49:28,603 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers:
>>> aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
>>> 2013-12-02 19:49:28,608 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
>>> 2013-12-02 19:49:28,608 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
>>> 2013-12-02 19:49:28,608 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
>>> 2013-12-02 19:49:28,608 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
>>> 2013-12-02 19:49:28,609 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
>>> 2013-12-02 19:49:28,610 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
>>> 2013-12-02 19:49:28,610 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
>>> 2013-12-02 19:49:28,610 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr
>>> hmac-md5 none
>>> 2013-12-02 19:49:28,610 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr
>>> hmac-md5 none
>>> 2013-12-02 19:49:28,617 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
>>> 2013-12-02 19:49:28,617 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
>>> 2013-12-02 19:49:28,634 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
>>> 2013-12-02 19:49:28,635 WARN
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop3'
>>> (RSA) to the list of known hosts.
>>> 2013-12-02 19:49:28,635 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
>>> 2013-12-02 19:49:28,635 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
>>> 2013-12-02 19:49:28,636 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
>>> 2013-12-02 19:49:28,637 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
>>> 2013-12-02 19:49:28,638 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can
>>> continue: gssapi-with-mic,publickey,keyboard-interactive,password
>>> 2013-12-02 19:49:28,639 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method:
>>> gssapi-with-mic
>>> 2013-12-02 19:49:28,642 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can
>>> continue: publickey,keyboard-interactive,password
>>> 2013-12-02 19:49:28,642 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method:
>>> publickey
>>> 2013-12-02 19:49:28,644 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop3
>>> port 22
>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort:
>>> Unable to connect to hadoop3 as user hadoop
>>> com.jcraft.jsch.JSchException: Auth fail
>>>     at com.jcraft.jsch.Session.connect(Session.java:452)
>>>     at
>>> org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>>>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>     at
>>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>     at
>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>     at
>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>     at
>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>     at
>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing
>>> method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
>>> 2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable to
>>> fence service by any configured method.
>>> 2013-12-02 19:49:28,645 INFO org.apache.hadoop.ha.ZKFailoverController:
>>> Local service NameNode at hadoop2/10.7.23.125:8020 entered state:
>>> SERVICE_NOT_RESPONDING
>>> 2013-12-02 19:49:28,646 WARN org.apache.hadoop.ha.ActiveStandbyElector:
>>> Exception handling the winning of election
>>> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/
>>> 10.7.23.124:8020
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>     at
>>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>     at
>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>     at
>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>     at
>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>     at
>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>> 2013-12-02 19:49:28,646 INFO org.apache.hadoop.ha.ActiveStandbyElector:
>>> Trying to re-establish ZK session
>>> 2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session:
>>> 0x2429313c808025b closed
>>> 2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper: Initiating
>>> client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181
>>> sessionTimeout=5000
>>> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening
>>> socket connection to server hadoop3/10.7.23.124:2181. Will not attempt
>>> to authenticate using SASL (Unable to locate a login configuration)
>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket
>>> connection established to hadoop3/10.7.23.124:2181, initiating session
>>> 2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session
>>> establishment complete on server hadoop3/10.7.23.124:2181, sessionid =
>>> 0x3429312ba330262, negotiated timeout = 5000
>>> 2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn:
>>> EventThread shut down
>>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector:
>>> Session connected.
>>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ZKFailoverController:
>>> Quitting master election for NameNode at hadoop2/10.7.23.125:8020 and
>>> marking that fencing is necessary
>>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector:
>>> Yielding from election
>>> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session:
>>> 0x3429312ba330262 closed
>>> 2013-12-02 19:49:29,728 WARN org.apache.hadoop.ha.ActiveStandbyElector:
>>> Ignoring stale result from old client with sessionId 0x3429312ba330262
>>> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn:
>>> EventThread shut down
>>>
>>
>>
>
