+ dev group.
This is what I found in /var/lib/ambari-agent/data/command-#.json on
one of the master hosts.
Here you can see that the active NameNode is substituted by its FQDN but
the standby node is not. Is this a bug in this Ambari version?
I am using *Ambari 2.1*.
hadoop-env{
"dfs_ha_initial_namenode_active": "usw2ha3dpma01.local",
"hadoop_root_logger": "INFO,RFA",
"dfs_ha_initial_namenode_standby":
"%HOSTGROUP::host_group_master_2%",
"namenode_opt_permsize": "128m"
}
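
A rough sketch of how the command file could be checked for leftover
placeholders, in case it helps to reproduce this. It walks a parsed JSON
document and reports every string that still contains a %HOSTGROUP::...%
token, and also flags tokens that are missing the closing "%". The
function name and the "configurations" wrapper in the sample are my own
assumptions for illustration; this is not how Ambari itself resolves the
tokens.

```python
import re

# Well-formed token, e.g. %HOSTGROUP::host_group_master_2%
TOKEN = re.compile(r"%HOSTGROUP::[^%\s]+%")
# Any token opening, used to spot tokens that lack the closing '%'
TOKEN_START = re.compile(r"%HOSTGROUP::")


def find_hostgroup_tokens(node, path=""):
    """Yield (path, value, status) for each string that still holds a token.

    status is "unresolved" when every token in the string is well formed,
    and "malformed" when at least one token has no closing '%'.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_hostgroup_tokens(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_hostgroup_tokens(value, f"{path}/{index}")
    elif isinstance(node, str):
        starts = len(TOKEN_START.findall(node))
        complete = len(TOKEN.findall(node))
        if starts:
            status = "unresolved" if starts == complete else "malformed"
            yield path, node, status


# Sample data copied from the snippet above; the "configurations" key is
# an assumed wrapper, adjust it to the real layout of the command file.
sample = {
    "configurations": {
        "hadoop-env": {
            "dfs_ha_initial_namenode_active": "usw2ha3dpma01.local",
            "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2%",
        }
    }
}

for path, value, status in find_hostgroup_tokens(sample):
    print(status, path, value)
```

To run it against a real file, load the file with json.load() and pass
the result to find_hostgroup_tokens() instead of the sample dictionary.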
Thanks
Anand
On Tue, Aug 25, 2015 at 11:23 AM Anandha L Ranganathan <
[email protected]> wrote:
>
> Hi
>
> I am trying to install NameNode HA using blueprints.
> During cluster creation through scripts, it completes the following
> steps:
>
> 1) The JournalNodes start and are initialized (the JournalNodes are
> formatted).
> 2) The HA state is initialized in ZooKeeper (ZKFC) on both the active and
> standby NameNodes.
>
> After 96% it fails. I logged into the cluster through the UI and restarted
> the standby NameNode, but it threw an exception saying that the NameNode is
> not formatted.
> I have to manually copy the fsimage by running the command "hdfs
> namenode -bootstrapStandby -force" on the standby NN server.
> After that, restarting the NameNode works fine and it goes into standby mode.
>
> Is there something I am missing in the configuration?
> My NameNode HA blueprint looks like this.
>
> hadoop-env{
> "dfs_ha_initial_namenode_active": "%HOSTGROUP::host_group_master_1%"
> "dfs_ha_initial_namenode_standby": "%HOSTGROUP::host_group_master_2"
> }
>
>
> hdfs-site{
> "dfs.client.failover.proxy.provider.dfs-nameservices":
> "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
> "dfs.ha.automatic-failover.enabled": "true",
> "dfs.ha.fencing.methods": "shell(/bin/true)",
> "dfs.ha.namenodes.dfs-nameservices": "nn1,nn2",
> "dfs.namenode.http-address.dfs-nameservices.nn1":
> "%HOSTGROUP::host_group_master_1%:50070",
> "dfs.namenode.http-address.dfs-nameservices.nn2":
> "%HOSTGROUP::host_group_master_2%:50070",
> "dfs.namenode.https-address.dfs-nameservices.nn1":
> "%HOSTGROUP::host_group_master_1%:50470",
> "dfs.namenode.https-address.dfs-nameservices.nn2":
> "%HOSTGROUP::host_group_master_2%:50470",
> "dfs.namenode.rpc-address.dfs-nameservices.nn1":
> "%HOSTGROUP::host_group_master_1%:8020",
> "dfs.namenode.rpc-address.dfs-nameservices.nn2":
> "%HOSTGROUP::host_group_master_2%:8020",
> "dfs.namenode.shared.edits.dir":
> "qjournal://%HOSTGROUP::host_group_master_1%:8485;%HOSTGROUP::host_group_master_2%:8485;%HOSTGROUP::host_group_master_3%:8485/dfs-nameservices",
> "dfs.nameservices": "dfs-nameservices"
>
> }
>
>
> core-site{
> "fs.defaultFS": "hdfs://dfs-nameservices",
> "ha.zookeeper.quorum":
> "%HOSTGROUP::host_group_master_1%:2181,%HOSTGROUP::host_group_master_2%:2181,%HOSTGROUP::host_group_master_3%:2181"
>
> }
>
>
>
> This is the log output from the standby NameNode server.
>
> 2015-08-25 08:26:26,373 INFO zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:user.dir=/usr/hdp/2.2.6.0-2800/hadoop
> 2015-08-25 08:26:26,380 INFO zookeeper.ZooKeeper
> (ZooKeeper.java:<init>(438)) - Initiating client connection,
> connectString=usw2ha2dpma01.local:2181,usw2ha2dpma02.local:2181,usw2ha2dpma03.local:2181
> sessionTimeout=5000
> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5b7a5baa
> 2015-08-25 08:26:26,399 INFO zookeeper.ClientCnxn
> (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to
> server usw2ha2dpma02.local/172.17.213.51:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 2015-08-25 08:26:26,405 INFO zookeeper.ClientCnxn
> (ClientCnxn.java:primeConnection(852)) - Socket connection established to
> usw2ha2dpma02.local/172.17.213.51:2181, initiating session
> 2015-08-25 08:26:26,413 INFO zookeeper.ClientCnxn
> (ClientCnxn.java:onConnected(1235)) - Session establishment complete on
> server usw2ha2dpma02.local/172.17.213.51:2181, sessionid =
> 0x24f63f6f3050001, negotiated timeout = 5000
> 2015-08-25 08:26:26,416 INFO ha.ActiveStandbyElector
> (ActiveStandbyElector.java:processWatchEvent(547)) - Session connected.
> 2015-08-25 08:26:26,441 INFO ipc.CallQueueManager
> (CallQueueManager.java:<init>(53)) - Using callQueue class
> java.util.concurrent.LinkedBlockingQueue
> 2015-08-25 08:26:26,472 INFO ipc.Server (Server.java:run(605)) - Starting
> Socket Reader #1 for port 8019
> 2015-08-25 08:26:26,520 INFO ipc.Server (Server.java:run(827)) - IPC
> Server Responder: starting
> 2015-08-25 08:26:26,526 INFO ipc.Server (Server.java:run(674)) - IPC
> Server listener on 8019: starting
> 2015-08-25 08:26:27,596 INFO ipc.Client
> (Client.java:handleConnectionFailure(859)) - Retrying connect to server:
> usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000
> MILLISECONDS)
> 2015-08-25 08:26:27,615 WARN ha.HealthMonitor
> (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying
> to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020:
> Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020
> failed on connection exception: java.net.ConnectException: Connection
> refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
> 2015-08-25 08:26:27,616 INFO ha.HealthMonitor
> (HealthMonitor.java:enterState(238)) - Entering state SERVICE_NOT_RESPONDING
> 2015-08-25 08:26:27,616 INFO ha.ZKFailoverController
> (ZKFailoverController.java:setLastHealthState(850)) - Local service
> NameNode at usw2ha2dpma02.local/172.17.213.51:8020 entered state:
> SERVICE_NOT_RESPONDING
> 2015-08-25 08:26:27,616 INFO ha.ZKFailoverController
> (ZKFailoverController.java:recheckElectability(766)) - Quitting master
> election for NameNode at usw2ha2dpma02.local/172.17.213.51:8020 and
> marking that fencing is necessary
> 2015-08-25 08:26:27,617 INFO ha.ActiveStandbyElector
> (ActiveStandbyElector.java:quitElection(354)) - Yielding from election
> 2015-08-25 08:26:27,621 INFO zookeeper.ClientCnxn
> (ClientCnxn.java:run(512)) - EventThread shut down
> 2015-08-25 08:26:27,621 INFO zookeeper.ZooKeeper
> (ZooKeeper.java:close(684)) - Session: 0x24f63f6f3050001 closed
> 2015-08-25 08:26:29,623 INFO ipc.Client
> (Client.java:handleConnectionFailure(859)) - Retrying connect to server:
> usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000
> MILLISECONDS)
> 2015-08-25 08:26:29,624 WARN ha.HealthMonitor
> (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying
> to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020:
> Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020
> failed on connection exception: java.net.ConnectException: Connection
> refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
> 2015-08-25 08:26:31,626 INFO ipc.Client
> (Client.java:handleConnectionFailure(859)) - Retrying connect to server:
> usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000
> MILLISECONDS)
> 2015-08-25 08:26:31,627 WARN ha.HealthMonitor
> (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying
> to monitor health of NameNode at usw2ha2dpma02.local/172.17.213.51:8020:
> Call From usw2ha2dpma02.local/172.17.213.51 to usw2ha2dpma02.local:8020
> failed on connection exception: java.net.ConnectException: Connection
> refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
> 2015-08-25 08:26:33,629 INFO ipc.Client
> (Client.java:handleConnectionFailure(859)) - Retrying connect to server:
> usw2ha2dpma02.local/172.17.213.51:8020. Already tried 0 time(s); retry
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000
> MILLISECONDS)
> 2015-08-25 08:26:33,630 WARN ha.HealthMonitor
> (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying
> to
>
>