HDFS handles the data rebalancing. Over time, as major compactions run
and new data comes in, files are written first to the local node and
then to remote nodes.
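
To make that concrete, here is a toy model in Python. It is not HBase or HDFS code: the function names and the "node1"/"node2" labels are made up for illustration, and the only behavior it assumes is that the first replica of a new file lands on the node that writes it and that a major compaction rewrites a region's files into one.

```python
# Toy model only: each list entry is the node holding the first replica
# of one store file belonging to a single region.

def local_fraction(store_files, serving_node):
    """Fraction of the region's files whose first replica is on serving_node."""
    if not store_files:
        return 1.0
    local = sum(1 for node in store_files if node == serving_node)
    return local / len(store_files)

def flush(store_files, serving_node):
    """A flush adds one new file, written locally by the serving node."""
    return store_files + [serving_node]

def major_compact(store_files, serving_node):
    """A major compaction rewrites all files into one, written locally."""
    return [serving_node]

# The region was filled while node1 served it.
files = ["node1", "node1", "node1"]

# The balancer reassigns the region to node2. The files do not move.
before = local_fraction(files, "node2")

# New writes are flushed by node2, so the new file is local to node2.
files = flush(files, "node2")
after_flush = local_fraction(files, "node2")

# A major compaction on node2 rewrites everything locally.
files = major_compact(files, "node2")
after_compaction = local_fraction(files, "node2")

print(before, after_flush, after_compaction)
```

In this sketch the reassignment alone changes nothing on disk; locality on the new node only recovers as flushes and compactions rewrite the files there.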

What's the replication factor you are running?  HDFS on 2 nodes is
tricky, since you can choose either r=1 (no data protection) or r=2
(all writes go to both nodes).
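
A rough way to see the trade-off, as plain arithmetic in Python. The helper name and the dictionary keys are invented for this sketch, and it assumes each block is stored on exactly r distinct nodes.

```python
def replication_tradeoff(nodes, r):
    """Summarize what replication factor r means on a cluster of `nodes` datanodes.

    Hypothetical helper for illustration; assumes every block is stored
    on exactly r distinct nodes.
    """
    if r < 1 or r > nodes:
        raise ValueError("need 1 <= r <= nodes")
    return {
        # Copies that can be lost before a block becomes unavailable.
        "node_failures_tolerated": r - 1,
        # Share of the cluster that every block write has to touch.
        "fraction_of_nodes_written": r / nodes,
    }

for nodes, r in [(2, 1), (2, 2), (6, 3)]:
    print(nodes, r, replication_tradeoff(nodes, r))
```

On 2 nodes, r=1 tolerates no failures and r=2 sends every write to the whole cluster. The last case uses r=3, the usual HDFS default, on 6 nodes: each write touches half the cluster and two node failures are survivable.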

The sweet spot is above 6 nodes, alas.

-ryan

On Mon, Mar 14, 2011 at 3:12 PM, Weiwei Xiong <xion...@gmail.com> wrote:
> Sorry I forgot to mention. I am using HBase 0.90.1 over HDFS 0.20.append
> Thanks,
> -- Weiwei
>
> On Mon, Mar 14, 2011 at 3:10 PM, Weiwei Xiong <xion...@gmail.com> wrote:
>>
>> Thanks very much for your replies.
>> Something was unclear in my previous emails. I started one node first
>> and added another later, and some regions had already been created on
>> the first node. Then I started to import more data into the same
>> table and found that it's always the first node that keeps serving the data
>> writes.
>> Actually I was expecting that the region data would be re-balanced to
>> the other data node. And I did see in the master log that the HBase master is
>> trying to unassign some regions from the overloaded node and re-assign
>> them to the less-loaded node. But the real data was never migrated.
>> I think I observed the region index and cache rebalancing from the master
>> log (correct me if I am wrong).  Does anyone know how frequently this
>> happens?
>> Another question is, does HBase support data and I/O rebalancing? Or
>> should I rely on HDFS to do data rebalancing? I guess HBase should also
>> support data rebalancing, otherwise every time I restart HBase the regions
>> will have to be rebalanced again. Will someone tell me how to configure or
>> program HBase to do data rebalancing?
>> Thanks,
>> -- Weiwei
>> On Mon, Mar 14, 2011 at 2:43 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>
>>> What version of HBase are you testing?
>>>
>>> Is it literally 0 vs N assignments?
>>>
>>> On Mon, Mar 14, 2011 at 1:18 PM, Weiwei Xiong <xion...@gmail.com> wrote:
>>> > Thanks!
>>> >
>>> > I checked the master log and found some info like this:
>>> > " timestamp ***, INFO org.apache.hadoop.hbase.master.HMaster: balance
>>> > hri=***, src=***, dst=*** "
>>> >
>>> > So I assume the balancer is running. There's no failure reported there, but
>>> > I didn't see the regions actually being balanced as the log states.
>>> >
>>> > Is it possible that the balancing won't work because I keep dumping
>>> > data into the table?
>>> >
>>> > Thanks,
>>> > -- Weiwei
>>> >
>>> > On Mon, Mar 14, 2011 at 12:15 PM, Stack <st...@duboce.net> wrote:
>>> >
>>> >> Check the master log.  See if the load balancer is running or not.  It
>>> >> usually runs every 5 minutes by default.  It may not run if regions
>>> >> are transitioning.  It'll log regardless.
>>> >>
>>> >> St.Ack
>>> >>
>>> >> On Mon, Mar 14, 2011 at 10:50 AM, Weiwei Xiong <xion...@gmail.com> wrote:
>>> >> > Hi,
>>> >> >
>>> >> > I recently set up a 2-node Hadoop and HBase cluster and am trying to
>>> >> > load data into my HBase table using the HBase client.
>>> >> >
>>> >> > The issue that bothers me is that the data is always written to one
>>> >> > node of the cluster, i.e., all the regions of the HBase table are on
>>> >> > one node.
>>> >> >
>>> >> > Is there any configuration I need to change to make the load
>>> >> > balanced?
>>> >> >
>>> >> > Thanks,
>>> >> > -- w
>>> >> >
>>> >>
>>> >
>>
>
>
