Hi all, I would guess this topic has been raised umpteen times in the past, so I apologise in advance if it leaves some of you miffed. I am not a 'big data' database person, so all this tuning mucky-muck has me a bit confused.
We currently have two clusters:

Cluster 1
  76 regionservers, each with 7TB of available HDFS and 64GB of RAM
  Approx. 424 regions per RS
  Config:
    hbase.client.write.buffer = 4MB
    hbase.regionserver.handler.count = 30
    hbase.hregion.memstore.flush.size = 128MB
    hbase.hregion.memstore.block.multiplier = 2
    hbase.hregion.max.filesize = 10GB

Cluster 2
  10 regionservers, each with 7TB of available HDFS and (min) 128GB of RAM
  Approx. 483 regions per RS
  Config:
    hbase.client.write.buffer = 2MB
    hbase.regionserver.handler.count = 100
    hbase.hregion.memstore.flush.size = 1GB
    hbase.hregion.memstore.block.multiplier = 32
    hbase.hregion.max.filesize = 6GB

The number of regions per regionserver is roughly the same on both clusters, despite the large difference in node count and in configuration. Both clusters (Cloudera) were originally set up with defaults, but cluster 2 was recently altered because its main user complained of "too many regions".

My question is this: how is it that two HBase setups with such a big discrepancy in the number of nodes both end up with regions in the 400-500-per-RS range?

Yours, in ignorance,
Marc
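P.S. In case it helps frame an answer, here is the back-of-envelope arithmetic I tried (in Python). The per-RS data volumes are my own invented guesses, and it assumes every region fills to hbase.hregion.max.filesize before splitting and that data is spread evenly across regionservers, which may well be the wrong model:

    # Rough regions-per-regionserver estimate: stored table data per RS
    # divided by the region split size. Pure guesswork on my part.
    def regions_per_rs(stored_tb, max_filesize_gb):
        # Convert TB to GB, then divide by the configured split size.
        return stored_tb * 1024 / max_filesize_gb

    # Invented volume: if each RS on cluster 1 held ~4.1TB of table data
    # at a 10GB split size...
    print(regions_per_rs(4.1, 10))   # ~420 regions/RS

    # ...and each RS on cluster 2 held ~2.8TB at a 6GB split size:
    print(regions_per_rs(2.8, 6))    # ~478 regions/RS

If that arithmetic is even roughly right, the similar region counts could just be a coincidence of data volume and split size rather than something the clusters converge to, but I don't know enough to say.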
