Thanks for the info, Ryan. Does HBase do major compactions regularly, or do I need to do this manually? If it's automatic, how frequently is it performed?
I am running with a replication factor of 1.

Thanks,
-- Weiwei

On Mon, Mar 14, 2011 at 3:18 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> HDFS does the data rebalancing. Over time, as major compactions run and new
> data comes in, files are written first to the local node, then to
> remote nodes.
>
> What's the replication factor you are running? HDFS on 2 nodes is
> tricky, since you can either choose r=1 (no data protection) or r=2
> (all writes go to both nodes).
>
> The sweet spot is above 6 nodes, alas.
>
> -ryan
>
> On Mon, Mar 14, 2011 at 3:12 PM, Weiwei Xiong <xion...@gmail.com> wrote:
> > Sorry, I forgot to mention: I am using HBase 0.90.1 over HDFS 0.20.append.
> > Thanks,
> > -- Weiwei
> >
> > On Mon, Mar 14, 2011 at 3:10 PM, Weiwei Xiong <xion...@gmail.com> wrote:
> >>
> >> Thanks very much for your replies.
> >> Something was unclear in my previous emails. I had one node started first,
> >> and the other was added later. There were already some regions created on
> >> the first node. Then I started to import more data into the same
> >> table and found that it's always the first node that keeps serving the
> >> data writes.
> >> Actually, I was expecting that the region data would be rebalanced to
> >> the other data node. And I did see in the master log that the HBase master
> >> was trying to unassign some regions from the overloaded node and reassign
> >> them to the less-loaded node. But the real data was never migrated.
> >> I think I observed the region index and cache rebalancing in the master
> >> log (correct me if I am wrong). Does anyone know how frequently this
> >> happens?
> >> Another question: does HBase support data and I/O rebalancing, or
> >> should I rely on HDFS to do data rebalancing? I guess HBase should also
> >> support data rebalancing; otherwise, every time I restart HBase, the
> >> regions will have to be rebalanced again. Could someone tell me how to
> >> configure or program HBase to do data rebalancing?
> >> Thanks,
> >> -- Weiwei
> >>
> >> On Mon, Mar 14, 2011 at 2:43 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> >>>
> >>> What version of HBase are you testing?
> >>>
> >>> Is it literally 0 vs N assignments?
> >>>
> >>> On Mon, Mar 14, 2011 at 1:18 PM, Weiwei Xiong <xion...@gmail.com> wrote:
> >>> > Thanks!
> >>> >
> >>> > I checked the master log and found some info like this:
> >>> > " timestamp ***, INFO org.apache.hadoop.hbase.master.HMaster: balance
> >>> > hri=***, src=***, dst=*** "
> >>> >
> >>> > So I assume the balancer is running. There's no failure info there, but
> >>> > I didn't see the regions actually being balanced as the log states.
> >>> >
> >>> > Is it possible that the balancing doesn't work because I keep
> >>> > dumping data into the table?
> >>> >
> >>> > Thanks,
> >>> > -- Weiwei
> >>> >
> >>> > On Mon, Mar 14, 2011 at 12:15 PM, Stack <st...@duboce.net> wrote:
> >>> >
> >>> >> Check the master log. See if the load balancer is running or not. It
> >>> >> usually runs every 5 minutes by default. It may not run if regions
> >>> >> are transitioning. It'll log regardless.
> >>> >>
> >>> >> St.Ack
> >>> >>
> >>> >> On Mon, Mar 14, 2011 at 10:50 AM, Weiwei Xiong <xion...@gmail.com>
> >>> >> wrote:
> >>> >> > Hi,
> >>> >> >
> >>> >> > I recently set up a 2-node Hadoop and HBase cluster and am trying to
> >>> >> > load data into my HBase table using the HBase client.
> >>> >> >
> >>> >> > The issue that bothers me is that the data is always written to one
> >>> >> > node of the cluster, i.e., all the regions of the HBase table are on
> >>> >> > one node.
> >>> >> >
> >>> >> > Is there any configuration I need to change to make the load
> >>> >> > balanced?
> >>> >> >
> >>> >> > Thanks,
> >>> >> > -- w
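The `balance hri=***, src=***, dst=***` lines in the master log record region moves planned by the master's load balancer. The idea of such a plan can be sketched in Python as a toy, count-only model: each server should end up holding between the floor and the ceiling of the average region count. This is a simplified illustration, not HBase 0.90's actual balancer code; `plan_moves` and the server/region names are hypothetical, and region size, locality, and regions in transition are all ignored.

```python
from math import ceil, floor


def plan_moves(assignments):
    """Return (region, src, dst) moves that even out region counts.

    `assignments` maps a (hypothetical) server name to its list of region
    names. Toy model: only region counts matter, not size or locality.
    """
    servers = sorted(assignments)
    if not servers:
        return []
    total = sum(len(regions) for regions in assignments.values())
    avg = total / len(servers)
    low, high = floor(avg), ceil(avg)
    # Work on copies so the caller's dict is not mutated.
    load = {s: list(assignments[s]) for s in servers}
    moves = []
    while True:
        src = max(servers, key=lambda s: len(load[s]))
        dst = min(servers, key=lambda s: len(load[s]))
        # Stop once every server holds between floor(avg) and ceil(avg) regions.
        if len(load[src]) <= high and len(load[dst]) >= low:
            break
        region = load[src].pop()
        load[dst].append(region)
        moves.append((region, src, dst))
    return moves


# One server holding everything, one empty server (the situation in this thread).
print(plan_moves({"rs1": ["r1", "r2", "r3", "r4"], "rs2": []}))
```

Note that a plan like this only changes which region server *serves* each region. Reassigning a region does not copy its HDFS files; the new server reads the same files, wherever HDFS originally placed them.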
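Ryan's point (files are written first to the local node, then to remote nodes, and major compactions gradually fix placement) can be illustrated with a toy Python model of a 2-node cluster. It assumes each region server runs alongside a DataNode, places replicas per file rather than per block, and ignores rack awareness and randomness; `place_replicas`, `local_fraction`, and `simulate_move` are hypothetical helpers, not HDFS or HBase APIs.

```python
def place_replicas(writer, nodes, replication):
    """Toy placement: first replica on the writer's node, the rest on other nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    others = [n for n in nodes if n != writer]
    return {writer, *others[: replication - 1]}


def local_fraction(files, server):
    """Fraction of a region's store files that have a replica on `server`."""
    if not files:
        return 0.0
    return sum(server in replicas for replicas in files) / len(files)


def simulate_move(replication, nodes=("node1", "node2"), n_files=4):
    """A region written on node1 is reassigned to node2, then major-compacted there.

    Returns node2's data locality (before, after) the major compaction.
    """
    nodes = list(nodes)
    files = [place_replicas("node1", nodes, replication) for _ in range(n_files)]
    before = local_fraction(files, "node2")
    # A major compaction rewrites all store files into one new file, written
    # by the new server, so its first replica lands on node2.
    files = [place_replicas("node2", nodes, replication)]
    after = local_fraction(files, "node2")
    return before, after


print(simulate_move(1))  # r=1: region data stays remote until the compaction
print(simulate_move(2))  # r=2: every write already goes to both nodes
```

In this model, with r=1 the moved region's data stays entirely on the original node until a major compaction rewrites it. With r=2 on two nodes, both nodes already hold every file, which matches the "all writes go to both nodes" trade-off Ryan describes.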