FYI, I am researching NN HA with HBase and wanted to know whether any graceful
behavior exists in the following scenario. Got the answer, thanks.

 

-----Original Message-----
From: Michael Segel [mailto:michael_se...@hotmail.com] 
Sent: Monday, March 07, 2011 6:18 PM
To: user@hbase.apache.org
Subject: RE: will HBase detect NN failure?

 

 

Silly question...

 

If you lose the name node (see your thread below)... 

Why do you not restart your data nodes as well?

 

Your NN is a 'single point of failure' and when you lose the name node, you
pretty much have a DOA system.

Since HBase sits on top of HDFS, you're bound to have inconsistencies and
issues.  If you lose your NN, you should be bringing down the entire cluster
and restarting *everything*.

Yes this is a pain, but it would solve a lot of your problems...

 

 

JMHO

 

-Mike

 

 

> Date: Mon, 7 Mar 2011 15:48:27 +0530

> From: gok...@huawei.com

> Subject: RE: will HBase detect NN failure?

> To: user@hbase.apache.org

> 

> I hit this issue in a build taken from the latest append trunk itself.

> 

>  

> 

> These are the steps to reproduce.

> 

> 1. Write a file and do some syncs, but do not close it

> 

> 2. Restart NN

> 

> 3. Run the following while loop for the above file

> 

>  

> 

>   _____  

> 

> From: Ryan Rawson [mailto:ryano...@gmail.com] 

> Sent: Monday, March 07, 2011 3:26 PM

> To: gok...@huawei.com; user@hbase.apache.org

> Subject: Re: will HBase detect NN failure?

> 

>  

> 

> There is a series of patches that address this; check the recent commit
> history of the append branch.

> 

> On Mar 7, 2011 1:52 AM, "Gokulakannan M" <gok...@huawei.com> wrote:

> > Hi All,

> > 

> > 

> > 

> > In HBase 0.90 I have seen that it has a fault-tolerant behavior
> > of triggering lease recovery and closing the file when the writer dies in
> > the middle. Does HBase have any workaround/recovery when the NameNode is
> > restarted in the middle of a file write (possibly the HLog file, after
> > some syncs)?

> > 

> > I faced a problem in the above scenario. When the NN is restarted
> > (but not the DN), the following code goes into an infinite loop because
> > lease recovery never happens. Once the DN is restarted, the file can
> > be recovered successfully (I think the DN does not send the partial
> > blocks in blocksBeingWritten to the NN when only the NN is restarted).

> > 

> > 

> > 

> > // Recover the file's lease if necessary
> > boolean recovered = false;
> > while (!recovered) {
> >   try {
> >     // Opening for append triggers lease recovery; it fails until recovery completes
> >     FSDataOutputStream out = fs.append(logfiles[i].getPath());
> >     out.close();
> >     recovered = true;
> >   } catch (IOException e) {
> >     if (LOG.isDebugEnabled()) {
> >       LOG.debug("Triggering lease recovery.");
> >     }
> >     try {
> >       Thread.sleep(leaseRecoveryPeriod);
> >     } catch (InterruptedException ex) {
> >       // ignore it and try again
> >     }
> >   }
> > }

> > 

> > 

> > 

> > 

> > 

> > Thanks,

> > 

> > Gokul
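
[Editor's sketch] The recovery loop quoted above retries forever when
fs.append() keeps failing, which is what happens in the NN-restart scenario
described. One possible way to avoid the hang is to bound the number of
attempts. The following is only a minimal sketch under stated assumptions:
RecoveryAttempt, recoverWithLimit, maxAttempts and sleepMillis are
hypothetical names that do not come from HBase or HDFS, and the attempt is a
stand-in for the fs.append(...) / out.close() pair so that the example runs
without a cluster.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedLeaseRecovery {

    /** Hypothetical stand-in for the fs.append(path) + out.close() attempt. */
    @FunctionalInterface
    public interface RecoveryAttempt {
        void run() throws IOException;
    }

    /**
     * Runs the attempt until it succeeds or maxAttempts is reached.
     * Returns true on success, false if every attempt failed or the
     * thread was interrupted while waiting between attempts.
     */
    public static boolean recoverWithLimit(RecoveryAttempt attempt,
                                           int maxAttempts,
                                           long sleepMillis) {
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                attempt.run();
                return true; // the attempt went through, so stop retrying
            } catch (IOException e) {
                System.err.println("Attempt " + i + " of " + maxAttempts
                        + " failed: " + e.getMessage());
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ex) {
                    // Restore the interrupt flag and give up instead of looping on
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false; // caller decides what to do (abort, alert, escalate)
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated attempt that fails twice and then succeeds
        boolean ok = recoverWithLimit(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IOException("lease not yet recovered");
            }
        }, 5, 10L);
        System.out.println("recovered=" + ok + " after " + calls.get() + " attempts");
    }
}
```

In the real code the lambda body would be the append-and-close pair from the
quoted snippet. Whether giving up is acceptable during log splitting is a
separate design question that this sketch does not answer.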


                                
