Thanks Jean, but why do only a couple of RSs get loaded with data? Out of 5 datanodes, only 2 are at around 90% disk usage, whereas the rest are at around 40%.
We have run the HBase balancer. On average we have around 500 regions per regionserver, with 5 RSs in total. We have also disabled a number of tables that are not required, and the current count is around 120 regions per RS.

Another question that comes to mind: somewhere down the line the Hadoop cluster tends to become imbalanced, leading to 100% disk utilization, and the balancer has to be triggered. How do you guys handle this problem in your HBase clusters?

Just a thought: could we run the DFS balancer and, once the balancing has finished, trigger a major compaction for each table?

Thanks
Divye Sheth

On Tue, Mar 4, 2014 at 6:45 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
> Hi Divye,
>
> the DFS balancer is the last thing you want to run in your HBase
> cluster. That will break all the data locality for the compacted regions.
>
> On compaction, a region writes its files on the local server first, then the
> 2 other replicas go to different datanodes, so on read, HBase can
> guarantee that data is read from the local datanode and not from another
> datanode over the network.
>
> Have you run the HBase balancer? How many regions do you have per region
> server?
>
> JM
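
For reference, the "major compaction for each table" step asked about above could be sketched roughly as follows. This is only an illustration: the table names are hypothetical placeholders, and the helper `major_compact_script` is made up for this example. It only builds the text of an HBase shell script, one `major_compact` command per table, and does not contact the cluster. Whether to run the DFS balancer beforehand at all is exactly the open question in this thread.

```python
def major_compact_script(tables):
    """Build HBase shell script text with one major_compact command per table.

    Hypothetical helper for illustration; it only produces text and
    never talks to HBase or HDFS.
    """
    lines = ["major_compact '%s'" % name for name in tables]
    return "\n".join(lines) + "\n"


# Placeholder table names (assumption): replace with the real tables.
tables = ["table_a", "table_b"]

script = major_compact_script(tables)
print(script, end="")
```

The generated text can be saved to a file and passed to `hbase shell` as a script. Note that `major_compact` only queues the compaction, so the commands return before the data has actually been rewritten on the local datanodes.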