Sorry to join late...
A SPoF is a real problem if you're planning to serve data in real time from
your cluster.
(Yes, you can do this with HBase...)

Then, regardless of data loss, you have to bring the cluster back up.
Downtime can be significant enough to kill your business, depending on your
use case.
Sure, there are ways to make the NN more fault tolerant, but then you increase
the complexity of your solution and still have to worry about automatic
failover.

MapR did a nice little trick that I would expect to show up in some fashion in 
Apache some time down the road. 

(My bet is that someone will be clever enough to reverse-engineer this.)

Sent from a remote device. Please excuse any typos...

Mike Segel

On Dec 14, 2011, at 1:28 AM, "M. C. Srivas" <mcsri...@gmail.com> wrote:

> On Sat, Oct 29, 2011 at 1:34 PM, lars hofhansl <lhofha...@yahoo.com> wrote:
> 
>> This is more of a "theoretical problem," really.
>> Yahoo and others claim they lost far more data due to human error than to
>> any HDFS problems (including Namenode failures).
>> 
> 
> Actually it is not theoretical at all.
> 
> SPOF  !=  data-loss.
> 
> Data loss can occur even if you don't have any SPOFs. Conversely, many
> SPOF systems do not have data loss (e.g., a single NetApp).
> 
> SPOF == lack of high-availability.
> 
> Which is indeed the case with HDFS, even at Y!  For example, when a cluster
> is upgraded, it becomes unavailable.
> 
> @Mark:
> the AvatarNode is not for the faint-hearted. AFAIK, only FB runs it.
> Konstantin Shvachko and co. at eBay have a much better NN-SPOF solution in
> 0.22, which was just released. I recommend you try that.
> 
>> You can prevent data loss by having the namenode write its metadata to
>> another machine (via NFS, DRBD, or a SAN if you have one).
>> You'll still have an outage while switching over to a different machine,
>> but at least you won't lose any data.
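
As a minimal sketch of that approach on the 0.20.x/1.x line: dfs.name.dir in
hdfs-site.xml takes a comma-separated list of directories, and the namenode
writes its fsimage and edit log to every one of them. The paths below are
placeholders; the second directory is assumed to be an NFS mount exported
from a second machine.

    <property>
      <name>dfs.name.dir</name>
      <!-- local disk first, then an NFS mount from another box (example paths) -->
      <value>/data/dfs/name,/mnt/remote-nn/dfs/name</value>
    </property>

Losing the local disk then still leaves a current copy of the metadata on the
remote side, which is what makes a manual switch-over possible without data
loss.
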
>> 
>> 
>> Facebook has a partial solution (AvatarNode) and the HDFS folks are
>> working on a solution (which, like AvatarNode, mainly involves keeping
>> a hot copy of the Namenode so that failover is "instantaneous" - 1 or 2
>> minutes at most).
>> 
>> 
>> ----- Original Message -----
>> From: Mark <static.void....@gmail.com>
>> To: user@hbase.apache.org
>> Cc:
>> Sent: Saturday, October 29, 2011 11:46 AM
>> Subject: Dealing with single point of failure
>> 
>> How does one deal with the fact that HBase has a single point of failure,
>> namely the namenode? What steps can be taken to eliminate and/or minimize
>> the impact of a namenode failure? In a situation where reliability is of
>> the utmost importance, should one choose an alternative technology, e.g.
>> Cassandra?
>> 
>> Thanks
>> 
>> 
