Hi all,

I would guess this topic has been raised umpteen times in the past, so I
apologise in advance if some of you are miffed.  I am not a 'big data'
database person, so all this tuning mucky-muck has me a bit confused.

We currently have two clusters:

Cluster 1
76 regionservers, each with 7TB of available HDFS and 64GB of RAM
Approx. 424 regions per RS

Config:
hbase.client.write.buffer = 4MB
hbase.regionserver.handler.count = 30
hbase.hregion.memstore.flush.size = 128MB
hbase.hregion.memstore.block.multiplier = 2
hbase.hregion.max.filesize = 10GB

Cluster 2
10 regionservers, each with 7TB of available HDFS and at least 128GB of RAM
Approx. 483 regions per RS

Config:

hbase.client.write.buffer = 2MB
hbase.regionserver.handler.count = 100
hbase.hregion.memstore.flush.size = 1GB
hbase.hregion.memstore.block.multiplier = 32
hbase.hregion.max.filesize = 6GB
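
(For reference, my understanding is that these properties are set in bytes in
hbase-site.xml; on our Cloudera clusters they get pushed out via Cloudera
Manager, if I've understood that correctly.  E.g. for cluster 2:)

  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>1073741824</value>  <!-- 1GB, expressed in bytes -->
  </property>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>6442450944</value>  <!-- 6GB, expressed in bytes -->
  </property>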

The number of regions per regionserver is roughly the same on both clusters, 
despite the large difference in node count and in configuration.  Both 
clusters (Cloudera) were originally set up with defaults, but cluster 2 was 
recently retuned because its main user complained of "too many regions".

My query centres on how two HBase setups with such a big discrepancy in the 
number of nodes can both end up in the 400-500 regions-per-regionserver 
range.
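
To make my confusion concrete, here is the back-of-envelope sketch I've been
staring at.  The assumptions are all mine: 3x HDFS replication, the 7TB figure
being raw disk per node, and the disks being more or less full:

  # Rough sketch: is regions-per-RS driven by disk-per-RS rather than
  # by how many nodes the cluster has?
  # Assumptions (mine): 3x HDFS replication, 7TB raw HDFS per
  # regionserver, disks roughly full.
  DISK_PER_RS_GB = 7 * 1024   # 7TB per regionserver, as GB
  REPLICATION = 3             # assuming the HDFS default

  for cluster, regions in (("cluster 1", 424), ("cluster 2", 483)):
      raw_gb = DISK_PER_RS_GB / regions   # raw HDFS footprint per region
      data_gb = raw_gb / REPLICATION      # actual HBase data per region
      print(f"{cluster}: ~{raw_gb:.1f} GB raw, ~{data_gb:.1f} GB data per region")

If those assumptions hold, both clusters carry roughly 5GB of data per region,
comfortably under either max.filesize setting, so the per-RS region count
would track disk per node rather than the number of nodes.  I'd be glad to
have that reasoning confirmed or corrected.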

Yours, in ignorance

Marc
