Thanks Jean, but why do only a couple of RSs get loaded with data? Out of
our 5 datanodes we are seeing 2 at around 90% disk usage, whereas the rest
are at around 40%.

We have run the HBase balancer, and on average we had around 500 regions
per RegionServer across a total of 5 RSs. We have since disabled a number
of tables which are not required, and currently the count of regions per
RS is around 120.
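
For reference, this is roughly how we check the distribution and kick the
balancer from a small client program (just a sketch against the HBase 1.x
Admin API, so the exact calls may differ on other releases):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RegionDistribution {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Print one line per RegionServer with its online region count.
      for (ServerName sn : admin.getClusterStatus().getServers()) {
        int regions = admin.getOnlineRegions(sn).size();
        System.out.println(sn.getHostname() + ": " + regions + " regions");
      }
      // Ask the master to run one balancer pass (a no-op if already balanced).
      admin.balancer();
    }
  }
}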

Another question that comes to my mind: somewhere down the line the Hadoop
cluster tends to become imbalanced, leading to 100% disk utilization on
some datanodes, and the balancer has to be triggered. How do you guys
handle such a problem in your HBase cluster?

Just a thought: could we execute the DFS balancer and, after the balancing
activity completes, trigger a major compaction for each table?
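
Concretely, I was thinking of something like the following once the DFS
balancer finishes (again only a sketch against the 1.x Admin API, not
something we have run yet):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CompactAllTables {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      for (TableName table : admin.listTableNames()) {
        // Asynchronous request; the RegionServers compact in the background.
        // Each region rewrites its files on its local server, which should
        // restore the locality the DFS balancer broke.
        admin.majorCompact(table);
        System.out.println("Requested major compaction for " + table);
      }
    }
  }
}

The idea being that, as you describe below, compaction writes the new files
on the local server first, so locality should recover after the HDFS-level
move.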

Thanks
Divye Sheth


On Tue, Mar 4, 2014 at 6:45 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org
> wrote:

> Hi Divye,
>
> the DFS balancer is the last thing you want to run in your HBase
> cluster. It will break all the data locality for the compacted regions.
>
> On compaction, a region writes its files on the local server first, and the
> 2 other replicas go to different datanodes. So on read, HBase can
> guarantee that data is read from the local datanode and not from another
> datanode over the network.
>
> Have you run the HBase balancer? How many regions do you have per region
> server?
>
> JM
>
