+ dev group.
This is what I found in /var/lib/ambari-agent/data/command-#.json on one of the master hosts. As you can see, the active namenode placeholder has been substituted with the FQDN, but the standby namenode placeholder has not. Is this a bug in Ambari? I am using *Ambari 2.1*.

hadoop-env{
    "dfs_ha_initial_namenode_active": "usw2ha3dpma01.local",
    "hadoop_root_logger": "INFO,RFA",
    "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2%",
    "namenode_opt_permsize": "128m"
}

Thanks
Anand

On Tue, Aug 25, 2015 at 11:23 AM Anandha L Ranganathan <analog.s...@gmail.com> wrote:

> Hi
>
> I am trying to install Active Namenode HA using blueprints.
> During cluster creation through scripts, it does the following and completes:
>
> 1) Journal nodes start and are initialized (formats the journal nodes).
> 2) Initializes the HA state in ZooKeeper or ZKFC (on both the active and standby namenode).
>
> After 96% it fails. I logged into the cluster using the UI and restarted the standby namenode, but it threw an exception saying that the Namenode is not formatted.
> I had to manually copy the fsimage logs using this command, "hdfs namenode -bootstrapStandby -force", on the standby NN server.
> After that, restarting the namenode works fine and it goes into standby mode.
>
> Is there something I am missing in the configuration?
> My Namenode HA blueprint looks like this.
>
> hadoop-env{
>     "dfs_ha_initial_namenode_active": "%HOSTGROUP::host_group_master_1%"
>     "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2"
> }
>
>
> hadoop-ev{
>
>     "dfs_ha_initial_namenode_active": "%HOSTGROUP::host_group_master_1%"
>     "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2"
> }
>
> hdfs-site{
>     "dfs.client.failover.proxy.provider.dfs-nameservices": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
>     "dfs.ha.automatic-failover.enabled": "true",
>     "dfs.ha.fencing.methods": "shell(/bin/true)",
>     "dfs.ha.namenodes.dfs-nameservices": "nn1,nn2",
>     "dfs.namenode.http-address.dfs-nameservices.nn1": "%HOSTGROUP::host_group_master_1%:50070",
>     "dfs.namenode.http-address.dfs-nameservices.nn2": "%HOSTGROUP::host_group_master_2%:50070",
>     "dfs.namenode.https-address.dfs-nameservices.nn1": "%HOSTGROUP::host_group_master_1%:50470",
>     "dfs.namenode.https-address.dfs-nameservices.nn2": "%HOSTGROUP::host_group_master_2%:50470",
>     "dfs.namenode.rpc-address.dfs-nameservices.nn1": "%HOSTGROUP::host_group_master_1%:8020",
>     "dfs.namenode.rpc-address.dfs-nameservices.nn2": "%HOSTGROUP::host_group_master_2%:8020",
>     "dfs.namenode.shared.edits.dir": "qjournal://%HOSTGROUP::host_group_master_1%:8485;%HOSTGROUP::host_group_master_2%:8485;%HOSTGROUP::host_group_master_3%:8485/dfs-nameservices",
>     "dfs.nameservices": "dfs-nameservices"
>
> }
>
>
> core-site{
>     "fs.defaultFS": "hdfs://dfs-nameservices",
>     "ha.zookeeper.quorum": "%HOSTGROUP::host_group_master_1%:2181,%HOSTGROUP::host_group_master_2%:2181,%HOSTGROUP::host_group_master_3%:2181"
>
> }
>
>
> This is the log message of the standby Namenode server.
>
> 2015-08-25 08:26:26,373 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/usr/hdp/2.2.6.0-2800/hadoop
> 2015-08-25 08:26:26,380 INFO zookeeper.ZooKeeper (ZooKeeper.java:<init>(438)) - Initiating client connection, connectString=usw2ha2dpma01.local:2181,usw2ha2dpma02.local:2181,usw2ha2dpma03.local:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5b7a5baa
> 2015-08-25 08:26:26,399 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server usw2ha2dpma02.local/172.17.213.51:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-08-25 08:26:26,405 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to usw2ha2dpma02.local/172.17.213.51:2181, initiating session
> 2015-08-25 08:26:26,413 INFO zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1235)) - Session establishment complete on server usw2ha2dpma02.local/172.17.213.51:2181, sessionid = 0x24f63f6f3050001, negotiated timeout = 5000
> 2015-08-25 08:26:26,416 INFO ha.ActiveStandbyElector (ActiveStandbyElector.java:processWatchEvent(547)) - Session connected.
> 2015-08-25 08:26:26,441 INFO ipc.CallQueueManager (CallQueueManager.java:<init>(53)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue
> 2015-08-25 08:26:26,472 INFO ipc.Server (Server.java:run(605)) - Starting Socket Reader #1 for port 8019
> 2015-08-25 08:26:26,520 INFO ipc.Server (Server.java:run(827)) - IPC Server Responder: starting
> 2015-08-25 08:26:26,526 INFO ipc.Server (Server.java:run(674)) - IPC Server listener on 8019: starting
> 2015-08-25 08:26:27,596 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
> 2015-08-25 08:26:27,615 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020: Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> 2015-08-25 08:26:27,616 INFO ha.HealthMonitor (HealthMonitor.java:enterState(238)) - Entering state SERVICE_NOT_RESPONDING
> 2015-08-25 08:26:27,616 INFO ha.ZKFailoverController (ZKFailoverController.java:setLastHealthState(850)) - Local service NameNode at usw2ha2dpma02.local/172.17.213.51:8020 entered state: SERVICE_NOT_RESPONDING
> 2015-08-25 08:26:27,616 INFO ha.ZKFailoverController (ZKFailoverController.java:recheckElectability(766)) - Quitting master election for NameNode at usw2ha2dpma02.local/172.17.213.51:8020 and marking that fencing is necessary
> 2015-08-25 08:26:27,617 INFO ha.ActiveStandbyElector (ActiveStandbyElector.java:quitElection(354)) - Yielding from election
> 2015-08-25 08:26:27,621 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
> 2015-08-25 08:26:27,621 INFO zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x24f63f6f3050001 closed
> 2015-08-25 08:26:29,623 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
> 2015-08-25 08:26:29,624 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020: Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> 2015-08-25 08:26:31,626 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
> 2015-08-25 08:26:31,627 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020: Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> 2015-08-25 08:26:33,629 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
> 2015-08-25 08:26:33,630 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to
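
For anyone who wants to check their own files for the same symptom, here is a rough sketch (not part of Ambari) that scans a config section for %HOSTGROUP::name% tokens. It sorts keys into those whose token is well-formed but still present, which is what the command-#.json excerpt in this thread shows, and those whose token lacks the closing %, which is how the standby entry appears in the quoted hadoop-env block. The function name and the character set allowed in host group names are my assumptions, and I am not claiming either case explains the failure described in this thread.

```python
import re

# Token shape as written in this thread: %HOSTGROUP::<name>%
# Assumption: host group names use only letters, digits, '_' and '-'.
WELL_FORMED = re.compile(r"%HOSTGROUP::[A-Za-z0-9_-]+%")
TOKEN_START = "%HOSTGROUP::"


def scan_hostgroup_tokens(config):
    """Return (well_formed_keys, malformed_keys) for a dict of config properties.

    well_formed_keys: values that contain only complete %HOSTGROUP::name% tokens.
    malformed_keys:   values where a token start remains after complete tokens
                      are stripped, e.g. the closing '%' is missing.
    """
    well_formed, malformed = [], []
    for key, value in config.items():
        if not isinstance(value, str) or TOKEN_START not in value:
            continue
        remainder = WELL_FORMED.sub("", value)
        if TOKEN_START in remainder:
            malformed.append(key)
        else:
            well_formed.append(key)
    return well_formed, malformed


# hadoop-env as quoted in the blueprint in this thread.
blueprint_env = {
    "dfs_ha_initial_namenode_active": "%HOSTGROUP::host_group_master_1%",
    "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2",
}

# hadoop-env as found in command-#.json on the master host.
command_env = {
    "dfs_ha_initial_namenode_active": "usw2ha3dpma01.local",
    "hadoop_root_logger": "INFO,RFA",
    "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2%",
    "namenode_opt_permsize": "128m",
}

blueprint_ok, blueprint_bad = scan_hostgroup_tokens(blueprint_env)
command_unresolved, command_bad = scan_hostgroup_tokens(command_env)

print("blueprint, malformed tokens:", blueprint_bad)
print("command file, unresolved tokens:", command_unresolved)
```

In a blueprint, well-formed tokens are expected and only the malformed list matters. In a command-#.json file, any token that is still present means the substitution did not happen for that key.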