Thanks, Bob. It works fine and I was able to resolve the issue.

I filed the bug https://issues.apache.org/jira/browse/AMBARI-12893. I can fix this and provide a patch. Could you point me to the build instructions wiki for Ambari?

On Wed, Aug 26, 2015 at 6:35 AM, Robert Nettleton <[email protected]> wrote:

Hi Anand,

I just tried out a simple HDFS HA deployment (with Ambari 2.1.0), using the HOSTGROUP syntax for these two properties, and it failed as I expected.

I'm not sure why "dfs_ha_initial_namenode_active" includes the FQDN. I suspect that there is some other problem that is causing this.

As I mentioned before, these two properties are not currently meant for %HOSTGROUP% substitution, so the fix is to specify the FQDNs within these properties.

If you are concerned about including hostnames in your Blueprint for portability reasons, you can always set these properties in the cluster creation template instead.

If you don't need to select the initial state of the namenodes in your cluster, you can simply remove these properties from your Blueprint, and the Blueprint processor will select an "active" and a "standby" namenode for you.

If it still appears to you that the property is being set by the Blueprints processor, please feel free to file a JIRA to track the investigation into this.

Hope this helps!

Thanks,
Bob
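For example, Bob's suggestion of keeping the FQDNs out of the Blueprint would look roughly like the sketch below: the two hadoop-env properties move into the "configurations" section of the cluster creation template, which already carries the per-deployment host names. This is a minimal, illustrative sketch; the blueprint name and the FQDNs are placeholders, not values taken from the thread.

    {
      "blueprint": "hdfs-ha-blueprint",
      "configurations": [
        {
          "hadoop-env": {
            "dfs_ha_initial_namenode_active": "master1.example.com",
            "dfs_ha_initial_namenode_standby": "master2.example.com"
          }
        }
      ],
      "host_groups": [
        { "name": "host_group_master_1", "hosts": [ { "fqdn": "master1.example.com" } ] },
        { "name": "host_group_master_2", "hosts": [ { "fqdn": "master2.example.com" } ] }
      ]
    }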
On Aug 26, 2015, at 2:29 AM, Anandha L Ranganathan <[email protected]> wrote:

+ dev group.

This is what I found in /var/lib/ambari-agent/data/command-#.json on one of the master hosts. You can see that the active namenode has been substituted with the FQDN, but the standby namenode has not. Is this a bug in this Ambari version?

I am using Ambari 2.1.

    hadoop-env{
        "dfs_ha_initial_namenode_active": "usw2ha3dpma01.local",
        "hadoop_root_logger": "INFO,RFA",
        "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2%",
        "namenode_opt_permsize": "128m"
    }

Thanks
Anand

On Tue, Aug 25, 2015 at 11:23 AM, Anandha L Ranganathan <[email protected]> wrote:

Hi,

I am trying to install NameNode HA using Blueprints. During cluster creation through scripts, it does the following:

1) The journal nodes start and are initialized (the journal nodes are formatted).
2) The HA state is initialized in ZooKeeper by ZKFC (on both the active and the standby namenode).

At 96% it fails. I logged into the cluster using the UI and restarted the standby namenode, but it threw an exception saying that the namenode is not formatted. I had to manually copy the fsimage by running "hdfs namenode -bootstrapStandby -force" on the standby NN server; after that, restarting the namenode works fine and it goes into standby mode.
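For reference, the manual recovery described above amounts to roughly the following on the standby NameNode host. Running the command as the hdfs service user is an assumption on my part; the thread gives only the bare command, and the restart was done through the Ambari UI.

    # Copy the current fsimage from the active NameNode and mark this
    # NameNode as formatted for standby duty (assumes the hdfs service user):
    su - hdfs -c "hdfs namenode -bootstrapStandby -force"

    # Then restart the NameNode, e.g. via the Ambari UI as described above.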
Is there something I am missing in the configuration? My NameNode HA Blueprint looks like this:

    hadoop-env{
        "dfs_ha_initial_namenode_active": "%HOSTGROUP::host_group_master_1%",
        "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2%"
    }

    hdfs-site{
        "dfs.client.failover.proxy.provider.dfs-nameservices": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
        "dfs.ha.automatic-failover.enabled": "true",
        "dfs.ha.fencing.methods": "shell(/bin/true)",
        "dfs.ha.namenodes.dfs-nameservices": "nn1,nn2",
        "dfs.namenode.http-address.dfs-nameservices.nn1": "%HOSTGROUP::host_group_master_1%:50070",
        "dfs.namenode.http-address.dfs-nameservices.nn2": "%HOSTGROUP::host_group_master_2%:50070",
        "dfs.namenode.https-address.dfs-nameservices.nn1": "%HOSTGROUP::host_group_master_1%:50470",
        "dfs.namenode.https-address.dfs-nameservices.nn2": "%HOSTGROUP::host_group_master_2%:50470",
        "dfs.namenode.rpc-address.dfs-nameservices.nn1": "%HOSTGROUP::host_group_master_1%:8020",
        "dfs.namenode.rpc-address.dfs-nameservices.nn2": "%HOSTGROUP::host_group_master_2%:8020",
        "dfs.namenode.shared.edits.dir": "qjournal://%HOSTGROUP::host_group_master_1%:8485;%HOSTGROUP::host_group_master_2%:8485;%HOSTGROUP::host_group_master_3%:8485/dfs-nameservices",
        "dfs.nameservices": "dfs-nameservices"
    }

    core-site{
        "fs.defaultFS": "hdfs://dfs-nameservices",
        "ha.zookeeper.quorum": "%HOSTGROUP::host_group_master_1%:2181,%HOSTGROUP::host_group_master_2%:2181,%HOSTGROUP::host_group_master_3%:2181"
    }

This is the log from the standby NameNode server:

    2015-08-25 08:26:26,373 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/usr/hdp/2.2.6.0-2800/hadoop
    2015-08-25 08:26:26,380 INFO zookeeper.ZooKeeper (ZooKeeper.java:<init>(438)) - Initiating client connection, connectString=usw2ha2dpma01.local:2181,usw2ha2dpma02.local:2181,usw2ha2dpma03.local:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5b7a5baa
    2015-08-25 08:26:26,399 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server usw2ha2dpma02.local/172.17.213.51:2181. Will not attempt to authenticate using SASL (unknown error)
    2015-08-25 08:26:26,405 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to usw2ha2dpma02.local/172.17.213.51:2181, initiating session
    2015-08-25 08:26:26,413 INFO zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1235)) - Session establishment complete on server usw2ha2dpma02.local/172.17.213.51:2181, sessionid = 0x24f63f6f3050001, negotiated timeout = 5000
    2015-08-25 08:26:26,416 INFO ha.ActiveStandbyElector (ActiveStandbyElector.java:processWatchEvent(547)) - Session connected.
    2015-08-25 08:26:26,441 INFO ipc.CallQueueManager (CallQueueManager.java:<init>(53)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue
    2015-08-25 08:26:26,472 INFO ipc.Server (Server.java:run(605)) - Starting Socket Reader #1 for port 8019
    2015-08-25 08:26:26,520 INFO ipc.Server (Server.java:run(827)) - IPC Server Responder: starting
    2015-08-25 08:26:26,526 INFO ipc.Server (Server.java:run(674)) - IPC Server listener on 8019: starting
    2015-08-25 08:26:27,596 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
    2015-08-25 08:26:27,615 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020: Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    2015-08-25 08:26:27,616 INFO ha.HealthMonitor (HealthMonitor.java:enterState(238)) - Entering state SERVICE_NOT_RESPONDING
    2015-08-25 08:26:27,616 INFO ha.ZKFailoverController (ZKFailoverController.java:setLastHealthState(850)) - Local service NameNode at usw2ha2dpma02.local/172.17.213.51:8020 entered state: SERVICE_NOT_RESPONDING
    2015-08-25 08:26:27,616 INFO ha.ZKFailoverController (ZKFailoverController.java:recheckElectability(766)) - Quitting master election for NameNode at usw2ha2dpma02.local/172.17.213.51:8020 and marking that fencing is necessary
    2015-08-25 08:26:27,617 INFO ha.ActiveStandbyElector (ActiveStandbyElector.java:quitElection(354)) - Yielding from election
    2015-08-25 08:26:27,621 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
    2015-08-25 08:26:27,621 INFO zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x24f63f6f3050001 closed
    2015-08-25 08:26:29,623 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
    2015-08-25 08:26:29,624 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020: Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    2015-08-25 08:26:31,626 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
    2015-08-25 08:26:31,627 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020: Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    2015-08-25 08:26:33,629 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
    2015-08-25 08:26:33,630 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to
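Since the thread mentions driving cluster creation through scripts, here is a sketch of the two REST calls a Blueprint deployment like this is typically submitted with. The server host, credentials, and blueprint/cluster names are placeholders; blueprint.json and cluster-template.json stand for the documents discussed above.

    # Register the Blueprint with the Ambari server
    curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
        -d @blueprint.json \
        http://ambari-server:8080/api/v1/blueprints/hdfs-ha-blueprint

    # Create the cluster from the Blueprint using the cluster creation template
    curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
        -d @cluster-template.json \
        http://ambari-server:8080/api/v1/clusters/mycluster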
