Martin,

The trouble was due to a defect in how HDFS partitioned deletion work among 
the datanodes. HBase can post a lot of deletes due to compactions, especially 
under high write load. Running the balancer just makes it worse: additional 
replications in the face of uneven deletion just bring the end faster when a 
datanode fills. 
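
To make the reasoning concrete, here is a toy model, not HDFS code. The 
simulate() helper, the node count, and every rate in it are hypothetical. It 
assumes one datanode reclaims deleted blocks far more slowly than the others, 
and that a balancer move leaves the source replica waiting for deletion on 
the node it came from.

```python
# Toy model only: not HDFS code. All names and numbers are hypothetical.

def simulate(run_balancer, steps=50):
    """Return (unreclaimed, total_used) for a 4-node toy cluster.

    unreclaimed: space held by blocks marked for deletion but not yet freed.
    total_used:  live data plus unreclaimed space, summed over all nodes.
    """
    n = 4
    live = [50.0] * n       # space used by live blocks, per datanode
    pending = [0.0] * n     # space held by blocks awaiting deletion
    # Assumed uneven deletion: node 0 frees far less per step than the rest.
    delete_rate = [1.0, 10.0, 10.0, 10.0]
    write = 5.0             # new data written per node per step
    compacted = 5.0         # live data invalidated by compactions per step
    move = 4.0              # data the balancer tries to move per step

    for _ in range(steps):
        for i in range(n):
            live[i] += write            # incoming writes
            live[i] -= compacted        # compaction replaces old files...
            pending[i] += compacted     # ...which now wait to be deleted
        if run_balancer:
            disk = [live[i] + pending[i] for i in range(n)]
            src = disk.index(max(disk))     # fullest node
            dst = disk.index(min(disk))     # emptiest node
            if src != dst:
                amount = min(move, live[src])
                live[dst] += amount     # new replica lands on the emptier node
                live[src] -= amount     # old replica is only marked for deletion
                pending[src] += amount
        for i in range(n):
            # Each node frees at most delete_rate[i] of its pending space.
            pending[i] -= min(pending[i], delete_rate[i])

    total_used = sum(live) + sum(pending)
    return sum(pending), total_used


if __name__ == "__main__":
    print("without balancer:", simulate(False))
    print("with balancer:   ", simulate(True))
```

In this sketch the balancer keeps picking the lagging node as its source, so 
the space it "moves" never comes back there, while the copies consume space 
elsewhere. The cluster ends up with less free space than if the balancer had 
not run.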

This is fixed in CDH3 via HDFS-630: 
https://issues.apache.org/jira/browse/HDFS-630

This is fixed in HDFS 0.21+ via HADOOP-5124: 
https://issues.apache.org/jira/browse/HADOOP-5124

All,

It might be a good idea to apply one of these fixes to the ASF 0.20-append 
branch.

Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back.
  - Piet Hein (via Tom White)


--- On Mon, 1/24/11, Martin Fiala <fial...@gmail.com> wrote:

> From: Martin Fiala <fial...@gmail.com>
> Subject: DFS rebalancing with running HBase
> To: user@hbase.apache.org
> Date: Monday, January 24, 2011, 4:21 AM
> Hello,
> 
> in one old thread regarding hadoop/hbase 0.19.x Andrew
> Purtell wrote, that running DFS balancer while HBase is
> running, is not recommended. I didn't find any remarks about
> this in Hadoop or HBase documentation.
> 
> http://mail-archives.apache.org/mod_mbox/hbase-user/200905.mbox/%3c812604.43615...@web65510.mail.ac4.yahoo.com%3e
> 
> Is it still the case? What bad things can happen?
> 
> It is quite clear, that with writing heavily to HBase and
> running balancer simultaneously, the cluster is not going to
> be balanced. It can become even more unbalanced.
> What about running balancer when we are only reading from
> HBase or writing small amounts of records?
> 
> Regards,
> Martin Fiala
> 


      
