Hey Baggio,

Looks like you've done some good analysis.  Much of what you've mentioned under
HBase is in the works (multi-threaded compactions, distributed log splitting,
the HBCK tool).

I would definitely recommend upgrading to 0.90 when it is released; there are
some good fixes related to exception handling and DFS errors.  The
corresponding HDFS releases (CDH3 or 0.20-append) provide true durability.
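
If you do go that route, remember that durable sync also has to be switched
on. A minimal sketch (assuming the CDH3 / 0.20-append property name; adjust
for your release):

    <!-- hdfs-site.xml, and mirrored in hbase-site.xml -->
    <property>
      <name>dfs.support.append</name>
      <value>true</value>
    </property>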

Thanks for sharing!

JG

> -----Original Message-----
> From: baggio liu [mailto:baggi...@gmail.com]
> Sent: Monday, December 13, 2010 8:45 AM
> To: user@hbase.apache.org
> Subject: Re: HBase stability
> 
> Hi  Anze,
>    Our production cluster runs HBase 0.20.6 and HDFS (CDH3b2), and we have
> spent about a month working on stability. We have hit some issues that may
> be helpful to you.
> 
> HDFS:
>     1. HBase files have a shorter life cycle than MapReduce output; at times
> there are many blocks to delete, so the speed at which HDFS invalidates
> blocks should be tuned.
>     2. The Hadoop 0.20 branch cannot handle disk failures; HDFS-630 will be
> helpful here.
>     3. The region server does not handle IOExceptions correctly. When the
> DFSClient hits a network error it throws an IOException that may not be
> fatal to the region server, so these IOExceptions MUST be reviewed.
>     4. In large-scale scans there are many concurrent readers within a short
> time. The datanode dataxceiver count must be raised to a large value, and the
> open file handle limit should be tuned (see the config sketch below). In
> addition, connections between the DFSClient and the datanodes should be
> reused.
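> 
>     For example, a minimal sketch; the exact values are assumptions to adapt
>     to your own cluster, not recommendations:
> 
>         <!-- hdfs-site.xml on each datanode -->
>         <property>
>           <name>dfs.datanode.max.xcievers</name>
>           <value>4096</value>
>         </property>
> 
>         # /etc/security/limits.conf: raise the open-file limit for the
>         # user(s) running the datanode and region server
>         hdfs   -   nofile   32768
>         hbase  -   nofile   32768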
> 
> HBase
>     1. Single-threaded compaction limits compaction speed; it should be made
> multi-threaded (and during multi-threaded compaction the network bandwidth
> used by compaction should be limited).
>     2. Single-threaded HLog splitting (reading the HLog) makes HBase
> downtime longer; making it multi-threaded can reduce the downtime.
>     3. Additionally, some tools should be built, such as a meta region
> checker, a fixer, and so on.
>     4. The ZooKeeper session timeout should be tuned according to the load
> on your HBase cluster (example below).
>     5. The GC strategy on your region servers and HMaster should be tuned.
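> 
>     A rough sketch of items 4 and 5; the values are assumptions, not
>     recommendations, so tune them for your own load:
> 
>         <!-- hbase-site.xml -->
>         <property>
>           <name>zookeeper.session.timeout</name>
>           <value>60000</value>  <!-- milliseconds -->
>         </property>
> 
>         # hbase-env.sh: example GC settings for region server / HMaster
>         export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"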
> 
>     Besides the above, in a production cluster the data loss issue should be
> fixed as well (currently the Hadoop 0.20-append branch or CDH3b2 Hadoop can
> be used). Because HDFS is heavily optimized for throughput, applications
> like HBase (with many random reads/writes) need a lot of tuning and changes
> on the HDFS side.
>     Hope this experience can be helpful to you.
> 
> 
> Thanks & best regards
> Baggio
> 
> 
> 2010/12/14 Todd Lipcon <t...@cloudera.com>
> 
> > Hi Anze,
> >
> > In a word, yes - 0.20.4 is not that stable in my experience, and
> > upgrading to the latest CDH3 beta (which includes HBase 0.89.20100924)
> > should give you a huge improvement in stability.
> >
> > You'll still need to do a bit of tuning of settings, but once it's
> > well tuned it should be able to hold up under load without crashing.
> >
> > -Todd
> >
> > On Mon, Dec 13, 2010 at 2:41 AM, Anze <anzen...@volja.net> wrote:
> > > Hi all!
> > >
> > > We have been using HBase 0.20.4 (cdh3b1) in production on 2 nodes for a
> > > few months now and we are having constant issues with it. We fell over
> > > all standard traps (like "Too many open files", network configuration
> > > problems,...). All in all, we had about one crash every week or so.
> > > Fortunately we are still using it just for background processing so
> > > our service didn't suffer directly, but we have lost huge amounts of time
> > > just fixing the data errors that resulted from data not being written to
> > > permanent storage. Not to mention fixing the issues.
> > > As you can probably understand, we are very frustrated with this and
> > > are seriously considering moving to another bigtable.
> > >
> > > Right now, HBase crashes whenever we run a very intensive rebuild of a
> > > secondary index (a normal table, but we use it as a secondary index) on
> > > a huge table. I have found this:
> > > http://wiki.apache.org/hadoop/Hbase/Troubleshooting
> > > (see problem 9)
> > > One of the lines reads:
> > > "Make sure you give plenty of RAM (in hbase-env.sh), the default of 1GB
> > > won't be able to sustain long running imports."
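> > >
> > > (For reference, the heap is set via HBASE_HEAPSIZE in hbase-env.sh; the
> > > value below is only an illustrative assumption, not a recommendation.)
> > >
> > >     # hbase-env.sh: heap size in MB
> > >     export HBASE_HEAPSIZE=4000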
> > >
> > > So, if I understand correctly, no matter how HBase is set up, if I
> > > run an intensive enough application, it will choke? I would expect
> > > it to be slower when under (too much) pressure, but not to crash.
> > >
> > > Of course, we will somehow solve this issue (working on it), but...
> > > :(
> > >
> > > What are your experiences with HBase? Is it stable? Is it just us
> > > and the way we set it up?
> > >
> > > Also, would upgrading to 0.89 (cdh3b3) help?
> > >
> > > Thanks,
> > >
> > > Anze
> > >
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
