Hey Marc,

Possibly that happens because of your split policy, which relies on the memstore flush size and the number of regions of the table hosted by the particular region server. That would lead to the first cluster having more regions of a smaller size.
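For reference, here is a rough sketch of the split threshold that kind of policy computes. I'm assuming IncreasingToUpperBoundRegionSplitPolicy (the 1.x default); the exact constants differ between versions, and splitSize() here is just a made-up helper plugging in your two configs, not the actual HBase source:

  // Rough sketch, not the real HBase code: how
  // IncreasingToUpperBoundRegionSplitPolicy picks the size at which
  // a region gets split, as a function of flush size and the number
  // of regions of the table on that region server.
  public class SplitSizeSketch {

    // flushSize    = hbase.hregion.memstore.flush.size
    // maxFileSize  = hbase.hregion.max.filesize
    // tableRegions = regions of the table on this region server
    static long splitSize(long flushSize, long maxFileSize, int tableRegions) {
      // hbase.increasing.policy.initial.size defaults to 2 * flush size
      long initialSize = 2 * flushSize;
      if (tableRegions == 0 || tableRegions > 100) {
        return maxFileSize; // safety cap, as in the real policy
      }
      long n = tableRegions;
      return Math.min(maxFileSize, initialSize * n * n * n);
    }

    public static void main(String[] args) {
      long MB = 1024L * 1024, GB = 1024 * MB;
      // Cluster 1: flush 128MB, max 10GB -> 256MB, 2GB, 6.75GB, then capped at 10GB
      for (int n = 1; n <= 4; n++) {
        System.out.println("cluster1, " + n + " region(s): " + splitSize(128 * MB, 10 * GB, n) / MB + " MB");
      }
      // Cluster 2: flush 1GB, max 6GB -> 2GB, then capped at 6GB
      for (int n = 1; n <= 2; n++) {
        System.out.println("cluster2, " + n + " region(s): " + splitSize(1 * GB, 6 * GB, n) / MB + " MB");
      }
    }
  }

With the 128MB flush size, the first cluster's young regions split much earlier (256MB, 2GB, ...), so its tables tend to fan out into more, smaller regions before ever reaching hbase.hregion.max.filesize.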
Thanks,
Sergey

On Wed, Sep 29, 2021 at 8:41 AM Marc Hoppins <marc.hopp...@eset.com> wrote:

> Hi all,
>
> I would guess that this topic has probably been raised umpteen times in
> the past, so I apologise in advance if some of you are miffed. I am not a
> 'big data' database person so all this tuning mucky-muck has me a bit
> confused.
>
> We currently have two clusters:
>
> Cluster 1
> 76 regionservers, each with 7TB of available HDFS and 64GB of RAM
> Approx. 424 regions per RS
>
> Config:
> hbase.client.write.buffer = 4MB
> hbase.regionserver.handler.count = 30
> hbase.hregion.memstore.flush.size = 128MB
> hbase.hregion.memstore.block.multiplier = 2
> hbase.hregion.max.filesize = 10GB
>
> Cluster 2
> 10 regionservers, each with 7TB of available HDFS and (min) 128GB of RAM
> Approx. 483 regions per RS
>
> Config:
> hbase.client.write.buffer = 2MB
> hbase.regionserver.handler.count = 100
> hbase.hregion.memstore.flush.size = 1GB
> hbase.hregion.memstore.block.multiplier = 32
> hbase.hregion.max.filesize = 6GB
>
> The number of regions per region server seems to be approximately
> consistent given the number and configuration of regionservers, and also
> despite the difference in configuration. To begin with, the two clusters
> (Cloudera) were set up using defaults, but cluster 2 has recently been
> altered as the main entity using it complained of "too many regions".
>
> My query and interest centres around how it happens that two HBASE setups
> with a big discrepancy in the number of nodes can end up with regions in
> the 400-500 range.
>
> Yours, in ignorance
>
> Marc
>