Hi,
Thanks for your reply; it works. Previously I had set up SSH with a passphrase on the key, so before running start-dfs.sh or stop-dfs.sh I had to enter the passphrase once via ssh-agent bash and ssh-add. I have now recreated the RSA key without a passphrase, and it finally works: HA performs the automatic failover.
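A minimal sketch of that passphrase-less key setup (assuming the hadoop user and the hadoop2/hadoop3 hosts that appear later in this thread):

```shell
# Recreate the key pair with an empty passphrase (-N "") so that neither
# start-dfs.sh nor the ZKFC's sshfence needs an interactive unlock.
KEY=/home/hadoop/.ssh/id_rsa
ssh-keygen -t rsa -N "" -f "$KEY"

# Authorize the new public key on both NameNode hosts (including the local one):
for h in hadoop2 hadoop3; do
  ssh-copy-id -i "$KEY.pub" hadoop@"$h"
done
```

After this, ssh hadoop@hadoop3 should log in with no prompt at all, which is what non-interactive daemons require.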
Still, I think it is safer to create the RSA key with a passphrase. Can I achieve HA automatic failover with an SSH setup that uses a passphrase-protected key?

Regards

2013/12/2 Jitendra Yadav <jeetuyadav200...@gmail.com>

> If you are using the hadoop user and your SSH configuration is correct,
> then the commands below should work without a password.
>
> Execute from NN2 & NN1:
> # ssh hadoop@NN1_host
>
> &
>
> Execute from NN2 & NN1:
> # ssh hadoop@NN2_host
>
> Regards
> Jitendra
>
> On Mon, Dec 2, 2013 at 6:10 PM, YouPeng Yang <yypvsxf19870...@gmail.com> wrote:
>
>> Hi Jitendra
>>
>> Yes. My doubt is that I have to run ssh-agent bash and ssh-add before
>> the NNs can ssh to each other. Is that a problem?
>>
>> Regards
>>
>> 2013/12/2 Jitendra Yadav <jeetuyadav200...@gmail.com>
>>
>>> Are you able to connect to both NN hosts over SSH without a password?
>>> Make sure you have the correct SSH keys in the authorized keys file.
>>>
>>> Regards
>>> Jitendra
>>>
>>> On Mon, Dec 2, 2013 at 5:50 PM, YouPeng Yang <yypvsxf19870...@gmail.com> wrote:
>>>
>>>> Hi Pavan
>>>>
>>>> I'm using sshfence.
>>>>
>>>> ------core-site.xml-----------------
>>>>
>>>> <configuration>
>>>>   <property>
>>>>     <name>fs.defaultFS</name>
>>>>     <value>hdfs://lklcluster</value>
>>>>     <final>true</final>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>hadoop.tmp.dir</name>
>>>>     <value>/home/hadoop/tmp2</value>
>>>>   </property>
>>>> </configuration>
>>>>
>>>> -------hdfs-site.xml-------------
>>>>
>>>> <configuration>
>>>>   <property>
>>>>     <name>dfs.namenode.name.dir</name>
>>>>     <value>/home/hadoop/namedir2</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.datanode.data.dir</name>
>>>>     <value>/home/hadoop/datadir2</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.nameservices</name>
>>>>     <value>lklcluster</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.ha.namenodes.lklcluster</name>
>>>>     <value>nn1,nn2</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.namenode.rpc-address.lklcluster.nn1</name>
>>>>     <value>hadoop2:8020</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.namenode.rpc-address.lklcluster.nn2</name>
>>>>     <value>hadoop3:8020</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.namenode.http-address.lklcluster.nn1</name>
>>>>     <value>hadoop2:50070</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.namenode.http-address.lklcluster.nn2</name>
>>>>     <value>hadoop3:50070</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.namenode.shared.edits.dir</name>
>>>>     <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/lklcluster</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.client.failover.proxy.provider.lklcluster</name>
>>>>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.ha.fencing.methods</name>
>>>>     <value>sshfence</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>>>>     <value>/home/hadoop/.ssh/id_rsa</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>>>>     <value>5000</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.journalnode.edits.dir</name>
>>>>     <value>/home/hadoop/journal/data</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.ha.automatic-failover.enabled</name>
>>>>     <value>true</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>ha.zookeeper.quorum</name>
>>>>     <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
>>>>   </property>
>>>> </configuration>
>>>>
>>>> 2013/12/2 Pavan Kumar Polineni <smartsunny...@gmail.com>
>>>>
>>>>> Please post your config files and tell us which method you are
>>>>> following for automatic failover.
>>>>>
>>>>> On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <yypvsxf19870...@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I'm testing HA automatic failover with hadoop-2.2.0. The
cluster can be failed over manually; automatic failover, however, fails.
>>>>>> I set up HA according to
>>>>>> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>>>>>>
>>>>>> To test automatic failover, I killed the active NN with kill -9 <Pid-nn>,
>>>>>> but the standby NameNode did not transition to the active state.
>>>>>> The DFSZKFailoverController log is shown in [1].
>>>>>>
>>>>>> Please help me; any suggestion will be appreciated.
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>> zkfc log[1]----------------------------------------------------------------------------------------------------
>>>>>>
>>>>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
>>>>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
>>>>>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to hadoop3...
>>>>>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
>>>>>> 2013-12-02 19:49:28,592 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
>>>>>> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_5.3
>>>>>> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.42
>>>>>> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
>>>>>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
>>>>>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
>>>>>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
>>>>>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
>>>>>> 2013-12-02 19:49:28,609 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
>>>>>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
>>>>>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
>>>>>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-md5 none
>>>>>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-md5 none
>>>>>> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
>>>>>> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
>>>>>> 2013-12-02 19:49:28,634 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
>>>>>> 2013-12-02 19:49:28,635 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop3' (RSA) to the list of known hosts.
>>>>>> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
>>>>>> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
>>>>>> 2013-12-02 19:49:28,636 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
>>>>>> 2013-12-02 19:49:28,637 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
>>>>>> 2013-12-02 19:49:28,638 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
>>>>>> 2013-12-02 19:49:28,639 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
>>>>>> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
>>>>>> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
>>>>>> 2013-12-02 19:49:28,644 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop3 port 22
>>>>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to hadoop3 as user hadoop
>>>>>> com.jcraft.jsch.JSchException: Auth fail
>>>>>>     at com.jcraft.jsch.Session.connect(Session.java:452)
>>>>>>     at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>>>>>>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>>>>>>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>>>>>>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>>>>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>>>>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>>>>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>>>>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>>>>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>>>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>>>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
>>>>>> 2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
>>>>>> 2013-12-02 19:49:28,645 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hadoop2/10.7.23.125:8020 entered state: SERVICE_NOT_RESPONDING
>>>>>> 2013-12-02 19:49:28,646 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>>>>>> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/10.7.23.124:8020
>>>>>>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>>>>>>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>>>>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>>>>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>>>>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>>>>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>>>>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>>>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>>>>> 2013-12-02 19:49:28,646 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
>>>>>> 2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session: 0x2429313c808025b closed
>>>>>> 2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
>>>>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop3/10.7.23.124:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
>>>>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop3/10.7.23.124:2181, initiating session
>>>>>> 2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop3/10.7.23.124:2181, sessionid = 0x3429312ba330262, negotiated timeout = 5000
>>>>>> 2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>>>>>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
>>>>>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ZKFailoverController: Quitting master election for NameNode at hadoop2/10.7.23.125:8020 and marking that fencing is necessary
>>>>>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
>>>>>> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session: 0x3429312ba330262 closed
>>>>>> 2013-12-02 19:49:29,728 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x3429312ba330262
>>>>>> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
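For anyone hitting the same "Auth fail" in the log above: the fencing failure can be reproduced outside the ZKFC. As far as I can tell from SshFenceByTcpPort, the key file named in dfs.ha.fencing.ssh.private-key-files is handed to JSch without any passphrase, so a passphrase-protected key cannot work for sshfence even if ssh-agent makes interactive logins succeed. A quick check, run from each NN host against the other (the hadoop user, the hadoop3 host, and the key path are the ones from this thread):

```shell
# BatchMode=yes forbids interactive prompts, which approximates how the
# ZKFC's sshfence uses the key: non-interactively, with no agent and no
# passphrase. If this fails, sshfence will fail the same way.
if ssh -o BatchMode=yes -o ConnectTimeout=5 \
       -i /home/hadoop/.ssh/id_rsa hadoop@hadoop3 true; then
  echo "fencing ssh OK"
else
  echo "fencing ssh FAILED: key not authorized or passphrase-protected"
fi
```

If this prints FAILED while a plain ssh login works, the key is almost certainly passphrase-protected or missing from the remote authorized_keys file.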