Thanks Bob. That works fine, and I was able to resolve the issue.
I filed the bug https://issues.apache.org/jira/browse/AMBARI-12893. I can fix this and provide a patch. Could you point me to the build instructions wiki for Ambari?

On Wed, Aug 26, 2015 at 6:35 AM, Robert Nettleton <[email protected]> wrote:

> Hi Anand,
>
> I just tried out a simple HDFS HA deployment (with Ambari 2.1.0), using
> the HOSTGROUP syntax for these two properties, and it failed as I expected.
>
> I’m not sure why “dfs_ha_initial_namenode_active” includes the FQDN. I
> suspect that some other problem is causing this.
>
> As I mentioned before, these two properties are not currently meant for
> %HOSTGROUP% substitution, so the fix is to specify the FQDNs within these
> properties.
>
> If you are concerned about including hostnames in your Blueprint, for
> portability reasons, you can always set these properties in the cluster
> creation template instead.
>
> If you don’t need to select the initial state of the namenodes in your
> cluster, you can simply remove these properties from your Blueprint, and
> the Blueprint processor will select an “active” and a “standby” namenode.
>
> If it still appears to you that the property is being set by the
> Blueprints processor, please feel free to file a JIRA to track the
> investigation.
>
> Hope this helps!
>
> Thanks,
> Bob
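For readers following along, here is a minimal sketch of the workaround Bob describes, assuming a separate cluster creation template file: the Blueprint stays host-agnostic, and the two initial-state properties are pinned to FQDNs in the template. The blueprint name, host groups, and FQDNs below are placeholders, not values from this thread.

    # cluster_template.json: the two hadoop-env properties are set to concrete
    # FQDNs here (no %HOSTGROUP% tokens), keeping the Blueprint itself portable
    cat > cluster_template.json <<'EOF'
    {
      "blueprint": "hdfs-ha-blueprint",
      "default_password": "changeme",
      "configurations": [
        {
          "hadoop-env": {
            "dfs_ha_initial_namenode_active": "nn1.example.com",
            "dfs_ha_initial_namenode_standby": "nn2.example.com"
          }
        }
      ],
      "host_groups": [
        { "name": "host_group_master_1", "hosts": [ { "fqdn": "nn1.example.com" } ] },
        { "name": "host_group_master_2", "hosts": [ { "fqdn": "nn2.example.com" } ] }
      ]
    }
    EOF

This keeps hostnames out of the reusable Blueprint while still controlling which NameNode starts out active.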
> On Aug 26, 2015, at 2:29 AM, Anandha L Ranganathan <[email protected]> wrote:
>
> > + dev group.
> >
> > This is what I found in /var/lib/ambari-agent/data/command-#.json on one
> > of the master hosts. Here you can see that the active namenode has been
> > substituted with the FQDN, but the standby has not. Is this a bug in this
> > Ambari version?
> >
> > I am using *Ambari 2.1*.
> >
> > hadoop-env{
> >     "dfs_ha_initial_namenode_active": "usw2ha3dpma01.local",
> >     "hadoop_root_logger": "INFO,RFA",
> >     "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2%",
> >     "namenode_opt_permsize": "128m"
> > }
> >
> > Thanks
> > Anand
> >
> > On Tue, Aug 25, 2015 at 11:23 AM Anandha L Ranganathan <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I am trying to install NameNode HA using blueprints.
> >> During cluster creation through the scripts, the following steps complete:
> >>
> >> 1) The journal nodes are started and initialized (the journal nodes are
> >> formatted).
> >> 2) The HA state is initialized in ZooKeeper (ZKFC), on both the active
> >> and standby namenodes.
> >>
> >> At 96% it fails. I logged into the cluster through the UI and restarted
> >> the standby namenode, but it threw an exception saying that the NameNode
> >> was not formatted.
> >> I had to manually copy the fsimage by running "hdfs namenode
> >> -bootstrapStandby -force" on the standby NN server; after restarting,
> >> the namenode comes up fine and goes into standby mode.
> >>
> >> Is there something I am missing in the configuration?
> >> My NameNode HA blueprint looks like this:
> >>
> >> hadoop-env{
> >>     "dfs_ha_initial_namenode_active": "%HOSTGROUP::host_group_master_1%"
> >>     "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2"
> >> }
> >>
> >> hdfs-site{
> >>     "dfs.client.failover.proxy.provider.dfs-nameservices": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
> >>     "dfs.ha.automatic-failover.enabled": "true",
> >>     "dfs.ha.fencing.methods": "shell(/bin/true)",
> >>     "dfs.ha.namenodes.dfs-nameservices": "nn1,nn2",
> >>     "dfs.namenode.http-address.dfs-nameservices.nn1": "%HOSTGROUP::host_group_master_1%:50070",
> >>     "dfs.namenode.http-address.dfs-nameservices.nn2": "%HOSTGROUP::host_group_master_2%:50070",
> >>     "dfs.namenode.https-address.dfs-nameservices.nn1": "%HOSTGROUP::host_group_master_1%:50470",
> >>     "dfs.namenode.https-address.dfs-nameservices.nn2": "%HOSTGROUP::host_group_master_2%:50470",
> >>     "dfs.namenode.rpc-address.dfs-nameservices.nn1": "%HOSTGROUP::host_group_master_1%:8020",
> >>     "dfs.namenode.rpc-address.dfs-nameservices.nn2": "%HOSTGROUP::host_group_master_2%:8020",
> >>     "dfs.namenode.shared.edits.dir": "qjournal://%HOSTGROUP::host_group_master_1%:8485;%HOSTGROUP::host_group_master_2%:8485;%HOSTGROUP::host_group_master_3%:8485/dfs-nameservices",
> >>     "dfs.nameservices": "dfs-nameservices"
> >> }
> >>
> >> core-site{
> >>     "fs.defaultFS": "hdfs://dfs-nameservices",
> >>     "ha.zookeeper.quorum": "%HOSTGROUP::host_group_master_1%:2181,%HOSTGROUP::host_group_master_2%:2181,%HOSTGROUP::host_group_master_3%:2181"
> >> }
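For completeness, a sketch of how a Blueprint and cluster creation template like the ones above are usually submitted through Ambari's REST API. The server address, credentials, and file/cluster names are placeholders; the X-Requested-By header is required by Ambari.

    # Register the blueprint, then instantiate a cluster from it
    curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
         -d @blueprint.json \
         http://ambari-server.example.com:8080/api/v1/blueprints/hdfs-ha-blueprint
    curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
         -d @cluster_template.json \
         http://ambari-server.example.com:8080/api/v1/clusters/mycluster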
> >>
> >> This is the log from the standby NameNode server:
> >>
> >> 2015-08-25 08:26:26,373 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/usr/hdp/2.2.6.0-2800/hadoop
> >> 2015-08-25 08:26:26,380 INFO zookeeper.ZooKeeper (ZooKeeper.java:<init>(438)) - Initiating client connection, connectString=usw2ha2dpma01.local:2181,usw2ha2dpma02.local:2181,usw2ha2dpma03.local:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5b7a5baa
> >> 2015-08-25 08:26:26,399 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server usw2ha2dpma02.local/172.17.213.51:2181. Will not attempt to authenticate using SASL (unknown error)
> >> 2015-08-25 08:26:26,405 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to usw2ha2dpma02.local/172.17.213.51:2181, initiating session
> >> 2015-08-25 08:26:26,413 INFO zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1235)) - Session establishment complete on server usw2ha2dpma02.local/172.17.213.51:2181, sessionid = 0x24f63f6f3050001, negotiated timeout = 5000
> >> 2015-08-25 08:26:26,416 INFO ha.ActiveStandbyElector (ActiveStandbyElector.java:processWatchEvent(547)) - Session connected.
> >> 2015-08-25 08:26:26,441 INFO ipc.CallQueueManager (CallQueueManager.java:<init>(53)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue
> >> 2015-08-25 08:26:26,472 INFO ipc.Server (Server.java:run(605)) - Starting Socket Reader #1 for port 8019
> >> 2015-08-25 08:26:26,520 INFO ipc.Server (Server.java:run(827)) - IPC Server Responder: starting
> >> 2015-08-25 08:26:26,526 INFO ipc.Server (Server.java:run(674)) - IPC Server listener on 8019: starting
> >> 2015-08-25 08:26:27,596 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
> >> 2015-08-25 08:26:27,615 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020: Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> >> 2015-08-25 08:26:27,616 INFO ha.HealthMonitor (HealthMonitor.java:enterState(238)) - Entering state SERVICE_NOT_RESPONDING
> >> 2015-08-25 08:26:27,616 INFO ha.ZKFailoverController (ZKFailoverController.java:setLastHealthState(850)) - Local service NameNode at usw2ha2dpma02.local/172.17.213.51:8020 entered state: SERVICE_NOT_RESPONDING
> >> 2015-08-25 08:26:27,616 INFO ha.ZKFailoverController (ZKFailoverController.java:recheckElectability(766)) - Quitting master election for NameNode at usw2ha2dpma02.local/172.17.213.51:8020 and marking that fencing is necessary
> >> 2015-08-25 08:26:27,617 INFO ha.ActiveStandbyElector (ActiveStandbyElector.java:quitElection(354)) - Yielding from election
> >> 2015-08-25 08:26:27,621 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
> >> 2015-08-25 08:26:27,621 INFO zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x24f63f6f3050001 closed
> >> 2015-08-25 08:26:29,623 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
> >> 2015-08-25 08:26:29,624 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020: Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> >> 2015-08-25 08:26:31,626 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
> >> 2015-08-25 08:26:31,627 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020: Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> >> 2015-08-25 08:26:33,629 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
> >> 2015-08-25 08:26:33,630 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to
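For anyone who hits the same "NameNode not formatted" failure, a minimal sketch of the manual recovery sequence Anand describes above, assuming the standby NameNode is then restarted from the Ambari UI:

    # Run on the standby NameNode host as the hdfs service user.
    # -bootstrapStandby copies the latest fsimage from the active NameNode;
    # -force overwrites any existing, partially initialized metadata directory.
    sudo -u hdfs hdfs namenode -bootstrapStandby -force

    # Then restart the standby NameNode (e.g., from the Ambari UI); it should
    # come up cleanly and settle into standby state.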
