Thank you very much!
I use ZooKeeper to do automatic failover. Even though HAAdmin still cannot determine 
the four namenodes, the cluster launched successfully.
I think I should do more research on it. :)


------------------ Original ------------------
From:  "Brahma Reddy Battula";<brahmareddy.batt...@huawei.com>;
Send time: Tuesday, Sep 30, 2014 12:04 PM
To: "user@hadoop.apache.org"<user@hadoop.apache.org>; 

Subject:  RE:  Failed to active namenode when config HA



 You need to start the ZKFC process, which will monitor and manage the state of the 
NameNode.
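 For example, something like the following (a sketch, assuming the standard bin/sbin 
scripts and that the ZooKeeper ensemble is already running):

   # one-time: initialize the HA state znode in ZooKeeper (run on one NameNode host)
   hdfs zkfc -formatZK

   # start a ZKFC daemon on every machine that runs a NameNode
   hadoop-daemon.sh start zkfc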
 
 
 
 
  
Automatic failover adds two new components to an HDFS deployment: a ZooKeeper 
quorum, and the ZKFailoverController process (abbreviated as ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of 
coordination data, notifying clients of changes in that data, and monitoring 
clients for failures. The implementation of automatic HDFS failover relies on 
ZooKeeper for the following things:

Failure detection - each of the NameNode machines in the cluster maintains a 
persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session 
will expire, notifying the other NameNode that a failover should be triggered.

Active NameNode election - ZooKeeper provides a simple mechanism to 
exclusively elect a node as active. If the current active NameNode crashes, 
another node may take a special exclusive lock in ZooKeeper indicating that it 
should become the next active.

The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client 
which also monitors and manages the state of the NameNode. Each of the machines 
which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

Health monitoring - the ZKFC pings its local NameNode on a periodic basis with 
a health-check command. So long as the NameNode responds in a timely fashion 
with a healthy status, the ZKFC considers the node healthy. If the node has 
crashed, frozen, or otherwise entered an unhealthy state, the health monitor 
will mark it as unhealthy.

ZooKeeper session management - when the local NameNode is healthy, the ZKFC 
holds a session open in ZooKeeper. If the local NameNode is active, it also 
holds a special "lock" znode. This lock uses ZooKeeper's support for 
"ephemeral" nodes; if the session expires, the lock node will be automatically 
deleted.

ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees 
that no other node currently holds the lock znode, it will itself try to 
acquire the lock. If it succeeds, then it has "won the election", and is 
responsible for running a failover to make its local NameNode active. The 
failover process is similar to the manual failover described above: first, the 
previous active is fenced if necessary, and then the local NameNode transitions 
to active state.
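
Enabling automatic failover typically needs two more settings (a minimal sketch; 
zk1/zk2/zk3 are placeholder ZooKeeper hosts):

In hdfs-site.xml:
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>

In core-site.xml:
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value>
  </property>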
 
 
 
 Please go through the following link for more details:
 
 
 
http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
 
 
 
  
Thanks & Regards
 
 
 
Brahma Reddy Battula
 
 
 
 From: ?????? [475053...@qq.com]
 Sent: Tuesday, September 30, 2014 8:54 AM
 To: user
 Subject: Re: Failed to active namenode when config HA
 
 
 Hi, Matt
 
 Thank you very much for your response!
 
 There were some mistakes in my description as I wrote this mail in a hurry. Those 
properties are in hdfs-site.xml, not core-site.xml.
 
 There are four namenodes because I am also using HDFS federation, so there are 
two nameservices in the property
 <name>dfs.nameservices</name>
 and each nameservice has two namenodes.
 
 If I configure only HA (only one nameservice), everything is OK, and HAAdmin 
can determine the namenodes nn1, nn3.
 
 But if I configure two nameservices and set namenodes nn1,nn3 for nameservice1 
and nn2,nn4 for nameservice2, I can start these namenodes successfully and the 
namenodes are all in standby state at the beginning. But if I want to change one 
namenode to the active state with the command
 hdfs haadmin -transitionToActive nn1
 HAAdmin throws an exception as it cannot determine the four 
namenodes (nn1,nn2,nn3,nn4) at all.
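 Should I maybe pass the nameservice to haadmin explicitly in a federated setup? 
Just a guess on my side, something like (assuming the -ns option exists in this 
version):

 hdfs haadmin -ns ns1 -transitionToActive nn1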
 
 Have you configured HA & Federation before, and do you know what may cause this problem?
 
 Thanks,
 Lucy
  
 
 ------------------ Original ------------------
  From:  "Matt Narrell";<matt.narr...@gmail.com>;
 Send time: Monday, Sep 29, 2014 6:28 AM
 To: "user"<user@hadoop.apache.org>; 
 Subject:  Re: Failed to active namenode when config HA
 
 
 
 I'm pretty sure HDFS HA is relegated to two name nodes (not four), designated 
active and standby.  Secondly, I believe these properties should be in 
hdfs-site.xml NOT core-site.xml.
 
 Furthermore, I think your HDFS nameservices are misconfigured.  Consider the 
following:
 
 <?xml version="1.0"?>
 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>3</value>
   </property>
   <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:/var/data/hadoop/hdfs/nn</value>
   </property>
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:/var/data/hadoop/hdfs/dn</value>
   </property>
 
     <property>
       <name>dfs.ha.automatic-failover.enabled</name>
       <value>true</value>
     </property>
     <property>
       <name>dfs.nameservices</name>
       <value>hdfs-cluster</value>
     </property>
 
     <property>
       <name>dfs.ha.namenodes.hdfs-cluster</name>
       <value>nn1,nn2</value>
     </property>
       <property>
         <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
         <value>namenode1:8020</value>
       </property>
       <property>
         <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
         <value>namenode1:50070</value>
       </property>
       <property>
         <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
         <value>namenode2:8020</value>
       </property>
       <property>
         <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
         <value>namenode2:50070</value>
       </property>
 
     <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
     </property>
 
     <property>
        <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
     </property>
 
     <property>
       <name>dfs.ha.fencing.methods</name>
       <value>sshfence</value>
     </property>
     <property>
       <name>dfs.ha.fencing.ssh.private-key-files</name>
       <value>/home/hadoop/.ssh/id_rsa</value>
     </property>
 </configuration>
 
 mn
 
 On Sep 28, 2014, at 12:56 PM, ?????? <475053...@qq.com> wrote:
 
 > Hi,
 > 
 > I'm new to Hadoop and met some problems when configuring HA.
 > Below are some important configuration properties in core-site.xml
 > 
 >   <property>
 >     <name>dfs.nameservices</name>
 >     <value>ns1,ns2</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.namenodes.ns1</name>
 >     <value>nn1,nn3</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.namenodes.ns2</name>
 >     <value>nn2,nn4</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns1.nn1</name>
 >     <value>namenode1:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns1.nn3</name>
 >     <value>namenode3:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns2.nn2</name>
 >     <value>namenode2:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns2.nn4</name>
 >     <value>namenode4:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.shared.edits.dir</name>
 >     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
 >   </property>
 >   <property>
 >     <name>dfs.client.failover.proxy.provider.ns1</name>
 >     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.fencing.methods</name>
 >     <value>sshfence</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.fencing.ssh.private-key-files</name>
 >     <value>/home/hduser/.ssh/id_rsa</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.fencing.ssh.connect-timeout</name>
 >     <value>30000</value>
 >   </property>
 >   <property>
 >     <name>dfs.journalnode.edits.dir</name>
 >     <value>/home/hduser/mydata/hdfs/journalnode</value>
 >   </property>
 > 
 > (the two nameservices ns1,ns2 are for configuring federation later. In this step, 
 > I only want to launch ns1 on namenode1 and namenode3)
 > 
 > After configuration, I did the following steps
 > firstly, I start journalnode on datanode2, datanode3, datanode4
 > secondly, I format namenode1 and start the namenode on it
 > then I run 'hdfs namenode -bootstrapStandby' on the other namenode and start the 
 > namenode on it
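 > 
 > In terms of commands, that was roughly (a sketch, using the standard scripts):
 > 
 >   # on datanode2, datanode3, datanode4
 >   hadoop-daemon.sh start journalnode
 > 
 >   # on namenode1
 >   hdfs namenode -format
 >   hadoop-daemon.sh start namenode
 > 
 >   # on namenode3
 >   hdfs namenode -bootstrapStandby
 >   hadoop-daemon.sh start namenode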
 > 
 > Everything seems fine except that no namenode is active now, so I tried to 
 > activate one by running 
 > hdfs haadmin -transitionToActive nn1 on namenode1
 > but strangely it says "Illegal argument: Unable to determine the nameservice 
 > id."
 > 
 > Could anyone tell me why it cannot determine nn1 from my configuration?
 > Is there something wrong in my configuration?
 > 
 > Thanks a lot!!!
 > 
 >
