By default it runs once a day. You can also run it manually in the HBase shell by typing:
hbase(main):001:0> major_compact "table_name"

-ryan

On Mon, Mar 14, 2011 at 3:25 PM, Weiwei Xiong <xion...@gmail.com> wrote:
> Thanks for your info Ryan.
> Does HBase do major compaction regularly or do I need to manually do this?
> If it's automatic, how frequently is it performed?
> I am running 1 replication.
> Thanks,
> -- Weiwei
>
> On Mon, Mar 14, 2011 at 3:18 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>
>> HDFS does the data rebalancing, over time as major compactions and new
>> data comes in, files are written first to the local node then to
>> remote nodes.
>>
>> Whats the replication factor you are running? HDFS on 2 nodes is
>> tricky, since you can either choose r=1 (no data protection) or r=2
>> (all writes go to both nodes).
>>
>> The sweet spot is above 6 nodes alas.
>>
>> -ryan
>>
>> On Mon, Mar 14, 2011 at 3:12 PM, Weiwei Xiong <xion...@gmail.com> wrote:
>> > Sorry I forgot to mention. I am using HBase 0.90.1 over HDFS 0.20.append
>> > Thanks,
>> > -- Weiwei
>> >
>> > On Mon, Mar 14, 2011 at 3:10 PM, Weiwei Xiong <xion...@gmail.com> wrote:
>> >>
>> >> Thanks very much for your replies.
>> >> Something was unclear in my previous emails. I had one node started first
>> >> and another was added in later. And there're already some regions created in
>> >> the first started node. Then I started to import more data into the same
>> >> table and found that it's always the first node that keeps serving the data
>> >> writes.
>> >> Actually I was expecting that the region data would be re-balanced to
>> >> another data node. And I did see in the master log that HBase master is
>> >> trying to unassigning some regions from the overloaded node and re-assign
>> >> them to the less-loaded node. But the real data was never migrated.
>> >> I think I observed the region index and cache rebalancing from the master
>> >> log (correct me if I were wrong). Does anyone know how frequently this
>> >> happens?
>> >> Another question is, does HBase support data and I/O rebalancing? Or I
>> >> should rely on HDFS to do data rebalancing? I guess HBase should also
>> >> support data rebalancing otherwise every time I restart HBase the regions
>> >> will have to be rebalanced again. Will someone tell me how to configure or
>> >> program HBase to do data rebalancing?
>> >> Thanks,
>> >> -- Weiwei
>> >> On Mon, Mar 14, 2011 at 2:43 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>> >>>
>> >>> What version of HBase are you testing?
>> >>>
>> >>> Is it literally 0 vs N assignments?
>> >>>
>> >>> On Mon, Mar 14, 2011 at 1:18 PM, Weiwei Xiong <xion...@gmail.com> wrote:
>> >>> > Thanks!
>> >>> >
>> >>> > I checked the master log and found some info like this:
>> >>> > " timestamp ***, INFO org.apache.hadoop.hbase.master.HMaster: balance
>> >>> > hri=***, src=***, dst=*** "
>> >>> >
>> >>> > So I assume the balancer is running. There's no failing info there, but I
>> >>> > didn't see the regions were actually balanced as the log states.
>> >>> >
>> >>> > Is it possible that I have been keeping dumping data into the table thus the
>> >>> > balancing won't work?
>> >>> >
>> >>> > Thanks,
>> >>> > -- Weiwei
>> >>> >
>> >>> > On Mon, Mar 14, 2011 at 12:15 PM, Stack <st...@duboce.net> wrote:
>> >>> >
>> >>> >> Check the master log. See if the load balancer is running or not. It
>> >>> >> usually runs every 5 minutes by default. It may not run if regions
>> >>> >> are transitioning. It'll log regardless.
>> >>> >>
>> >>> >> St.Ack
>> >>> >>
>> >>> >> On Mon, Mar 14, 2011 at 10:50 AM, Weiwei Xiong <xion...@gmail.com> wrote:
>> >>> >> > Hi,
>> >>> >> >
>> >>> >> > I recently set up a 2-node Hadoop and HBase cluster and am trying to load
>> >>> >> > data into my HBase table using HBase client.
>> >>> >> >
>> >>> >> > The issue bothers me is that the data are always written into one node of
>> >>> >> > the cluster, i.e., all the regions of the hbase table are on one node.
>> >>> >> >
>> >>> >> > Is there any configuration I need to change for make the load balanced?
>> >>> >> >
>> >>> >> > Thanks,
>> >>> >> > -- w
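
For anyone reading this thread later, the distinction discussed above (the balancer moves region assignments, while the data files only become local again once a major compaction rewrites them) can be illustrated with a small toy model. This is a sketch in Python, not HBase code: the Region class, the round-robin balance() and the locality() helper are all hypothetical names invented for illustration, and it assumes replication factor 1 so that each rewritten file lands only on the writer's local node.

```python
# Toy model (NOT HBase code): region assignment vs. HDFS block placement.
# All names here are hypothetical and exist only for illustration.
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    server: str  # region server currently serving this region
    block_nodes: set = field(default_factory=set)  # datanodes holding its files


def balance(regions, servers):
    """Spread regions round-robin over the servers.

    Only the serving assignment changes; no data files are moved.
    """
    for i, region in enumerate(sorted(regions, key=lambda r: r.name)):
        region.server = servers[i % len(servers)]


def major_compact(region):
    """Rewrite the region's files.

    Assumption: replication factor 1, so the single replica of each new
    file is written to the node that is doing the writing.
    """
    region.block_nodes = {region.server}


def locality(regions):
    """Fraction of regions whose files live on the node serving them."""
    local = sum(1 for r in regions if r.server in r.block_nodes)
    return local / len(regions)


# Four regions created while only node1 existed.
regions = [Region(f"r{i}", "node1", {"node1"}) for i in range(4)]

# node2 joins and the balancer reassigns half of the regions to it.
balance(regions, ["node1", "node2"])
after_balance = locality(regions)

# A major compaction rewrites each region's files on its current server.
for r in regions:
    major_compact(r)
after_compaction = locality(regions)

print(after_balance, after_compaction)  # prints: 0.5 1.0
```

In this model the regions are evenly assigned right after balancing, but half of them still read their files from the other node until the compaction has run, which matches the behaviour described in the replies above.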