I hope I'm not hijacking the thread, but I'm seeing what I think is a
similar issue. About a week ago I loaded a bunch of data into a newly
created table. It took about an hour and resulted in 12 regions being
created on a single node. (Afterwards I remembered a conversation with
JD where he described this behavior and how you could pre-create at
least N regions, where N is your number of nodes, to get better
distribution off the bat.)
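
For what it's worth, here is a rough sketch of how split points for
pre-created regions could be computed. It is only an illustration in
Python: the function name split_keys is made up, and it assumes row keys
are fixed-width hex strings spread uniformly over the key space (e.g. a
hash prefix), which may not match a real schema. The resulting keys
would then be passed to whatever table-creation call accepts split keys.

```python
def split_keys(num_regions, key_bits=32):
    """Return num_regions - 1 evenly spaced split points as hex strings.

    Assumption (not from the thread): row keys are fixed-width hex
    strings distributed uniformly over a key space of key_bits bits.
    """
    if num_regions < 2:
        # A single region needs no split points.
        return []
    width = key_bits // 4          # hex digits needed for key_bits bits
    space = 1 << key_bits          # size of the key space
    return [
        format(space * i // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


# One region per node on a 4-node cluster needs 3 split points.
print(split_keys(4))
```

If the real keys are skewed (timestamps, sequential IDs), evenly spaced
split points will not give an even load, so the split points would have
to come from a sample of the actual data instead.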

Anyway, it's been about a week and all regions for the table are still
on 1 node. I see messages like this in the logs every 5 minutes:

2011-03-14 15:59:03,148 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing. servers=4 regions=62 average=15.5 mostloaded=16 leastloaded=16

It seems the total region count is evenly balanced across servers, but
individual tables are not. Where should I look to troubleshoot why this
table's regions (as well as others) aren't evenly distributed? I'd guess
that I can major compact all tables to fix it, but I'd like to figure
out why it hasn't happened automatically.
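
To make the "balanced in total, unbalanced per table" point concrete,
here is a small Python sketch. It is a simplified model of my
understanding of the check, not the actual LoadBalancer code: it assumes
the balancer only looks at the total number of regions on each server
and skips when every server is within one region of the average. The
node names, table names, and per-table counts are hypothetical; only
servers=4 and regions=62 come from the log line above.

```python
from collections import Counter


def would_skip_balancing(regions_per_server):
    """Return True if every server's total is within [floor, ceil] of the average.

    Simplified model (an assumption): only per-server totals are
    considered, never which table a region belongs to.
    """
    total = sum(regions_per_server.values())
    servers = len(regions_per_server)
    low = total // servers
    high = low if total % servers == 0 else low + 1
    counts = regions_per_server.values()
    return min(counts) >= low and max(counts) <= high


# Hypothetical layout: all 12 regions of one table sit on node1, while
# regions of the other tables fill up the remaining servers.
assignments = (
    [("bulk_table", "node1")] * 12
    + [("other", "node1")] * 4
    + [("other", "node2")] * 16
    + [("other", "node3")] * 15
    + [("other", "node4")] * 15
)

totals = Counter(server for _table, server in assignments)
bulk_only = Counter(
    server for table, server in assignments if table == "bulk_table"
)

print("per-server totals:", dict(totals))
print("bulk_table regions:", dict(bulk_only))
print("balancer would skip:", would_skip_balancing(totals))
```

Under that model the cluster looks balanced even though one table lives
entirely on one node, which matches what I'm seeing.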

HBase 0.90.0
CDH3b2

thanks,
Bill

On Mon, Mar 14, 2011 at 3:31 PM, Weiwei Xiong <xion...@gmail.com> wrote:
> I see.  Thanks Ryan.
>
> -- Weiwei
>
> On Mon, Mar 14, 2011 at 3:28 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>
>> By default, major compaction runs 1x/day. You can do it manually in
>> the hbase shell by typing:
>>
>> hbase(main):001:0> major_compact "table_name"
>>
>> -ryan
>>
>>
>> On Mon, Mar 14, 2011 at 3:25 PM, Weiwei Xiong <xion...@gmail.com> wrote:
>> > Thanks for your info Ryan.
>> > Does HBase do major compaction regularly or do I need to do this
>> > manually?
>> > If it's automatic, how frequently is it performed?
>> > I am running with a replication factor of 1.
>> > Thanks,
>> > -- Weiwei
>> >
>> > On Mon, Mar 14, 2011 at 3:18 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>> >>
>> >> HDFS does the data rebalancing. Over time, as major compactions run
>> >> and new data comes in, files are written first to the local node and
>> >> then to remote nodes.
>> >>
>> >> What's the replication factor you are running?  HDFS on 2 nodes is
>> >> tricky, since you can either choose r=1 (no data protection) or r=2
>> >> (all writes go to both nodes).
>> >>
>> >> The sweet spot is above 6 nodes alas.
>> >>
>> >> -ryan
>> >>
>> >> On Mon, Mar 14, 2011 at 3:12 PM, Weiwei Xiong <xion...@gmail.com>
>> wrote:
>> >> > Sorry I forgot to mention. I am using HBase 0.90.1 over HDFS
>> 0.20.append
>> >> > Thanks,
>> >> > -- Weiwei
>> >> >
>> >> > On Mon, Mar 14, 2011 at 3:10 PM, Weiwei Xiong <xion...@gmail.com>
>> wrote:
>> >> >>
>> >> >> Thanks very much for your replies.
>> >> >> Something was unclear in my previous emails. I started one node
>> >> >> first and added another later. Some regions had already been
>> >> >> created on the first node. Then I started to import more data into
>> >> >> the same table and found that it's always the first node that keeps
>> >> >> serving the data writes.
>> >> >> I was expecting that the region data would be rebalanced to the
>> >> >> other data node. I did see in the master log that the HBase master
>> >> >> is trying to unassign some regions from the overloaded node and
>> >> >> reassign them to the less-loaded node, but the real data was never
>> >> >> migrated.
>> >> >> I think I observed the region index and cache rebalancing in the
>> >> >> master log (correct me if I am wrong).  Does anyone know how
>> >> >> frequently this happens?
>> >> >> Another question: does HBase support data and I/O rebalancing, or
>> >> >> should I rely on HDFS to do data rebalancing? I guess HBase should
>> >> >> also support data rebalancing; otherwise, every time I restart
>> >> >> HBase the regions will have to be rebalanced again. Can someone
>> >> >> tell me how to configure or program HBase to do data rebalancing?
>> >> >> Thanks,
>> >> >> -- Weiwei
>> >> >> On Mon, Mar 14, 2011 at 2:43 PM, Ryan Rawson <ryano...@gmail.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> What version of HBase are you testing?
>> >> >>>
>> >> >>> Is it literally 0 vs N assignments?
>> >> >>>
>> >> >>> On Mon, Mar 14, 2011 at 1:18 PM, Weiwei Xiong <xion...@gmail.com>
>> >> >>> wrote:
>> >> >>> > Thanks!
>> >> >>> >
>> >> >>> > I checked the master log and found some info like this:
>> >> >>> > " timestamp ***, INFO org.apache.hadoop.hbase.master.HMaster:
>> >> >>> > balance
>> >> >>> > hri=***, src=***, dst=*** "
>> >> >>> >
>> >> >>> > So I assume the balancer is running. There's no failure info
>> >> >>> > there, but I didn't see that the regions were actually balanced
>> >> >>> > as the log states.
>> >> >>> >
>> >> >>> > Is it possible that the balancing won't work because I keep
>> >> >>> > dumping data into the table?
>> >> >>> >
>> >> >>> > Thanks,
>> >> >>> > -- Weiwei
>> >> >>> >
>> >> >>> > On Mon, Mar 14, 2011 at 12:15 PM, Stack <st...@duboce.net> wrote:
>> >> >>> >
>> >> >>> >> Check the master log.  See if the load balancer is running or
>> >> >>> >> not.  It usually runs every 5 minutes by default.  It may not
>> >> >>> >> run if regions are transitioning.  It'll log regardless.
>> >> >>> >>
>> >> >>> >> St.Ack
>> >> >>> >>
>> >> >>> >> On Mon, Mar 14, 2011 at 10:50 AM, Weiwei Xiong <
>> xion...@gmail.com>
>> >> >>> >> wrote:
>> >> >>> >> > Hi,
>> >> >>> >> >
>> >> >>> >> > I recently set up a 2-node Hadoop and HBase cluster and am
>> >> >>> >> > trying to load data into my HBase table using the HBase
>> >> >>> >> > client.
>> >> >>> >> >
>> >> >>> >> > The issue that bothers me is that the data is always written
>> >> >>> >> > to one node of the cluster, i.e., all the regions of the
>> >> >>> >> > HBase table are on one node.
>> >> >>> >> >
>> >> >>> >> > Is there any configuration I need to change to make the load
>> >> >>> >> > balanced?
>> >> >>> >> >
>> >> >>> >> > Thanks,
>> >> >>> >> > -- w
>> >> >>> >> >
>> >> >>> >>
>> >> >>> >
>> >> >>
>> >> >
>> >> >
>> >
>> >
>>
>
