Hey all,

We run a number of large HBase clusters used by hundreds of product teams for a variety of real-time workloads. We're a B2B company, so most data has a customerId somewhere in the rowkey. As the team that owns the HBase infrastructure, we try to help product teams design schemas that avoid hotspotting, but inevitably it happens. And it isn't always hotspotting in the strict sense; for example, request volume may simply not be evenly distributed across all regions of a table.
This hotspotting/distribution issue makes it hard for the balancer to keep the cluster balanced from a load perspective: sure, all RegionServers host the same number of regions, but those regions are not all created equal in terms of load. This results in cases where one RS sits consistently at 70% CPU, another at 30%, and the rest land in a band in between.

We already have a normalizer job that works similarly to the SimpleRegionNormalizer, keeping regions approximately the same size from a data-size perspective. I'm considering updating our normalizer to also take region load into account. My general plan is to follow a strategy similar to the balancer's: keep a configurable number of RegionLoad objects in memory per region and compute an average of readRequestsCount from them. If a region's average load is above some threshold relative to other regions in the same table, split it; if it's below some threshold relative to other regions in the same table, merge it.

I'm writing because I'm wondering whether anyone else has had this problem and whether there is prior art here. Is there a reason HBase does not provide a configurable load-based normalizer (beyond the typical OSS reason that no one has contributed it yet)?

Thanks!
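For concreteness, here's a rough sketch of the decision logic I have in mind. This is not real HBase code -- the class, method names, and threshold values are all made up for illustration; it just models the "rolling window of readRequestsCount samples per region, compare each region's average against the table-wide average, split hot / merge cold" idea:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of load-based normalization. In a real implementation
// the samples would come from RegionLoad/RegionMetrics reported by each RS.
public class LoadBasedNormalizerSketch {

    enum Plan { SPLIT, MERGE, NONE }

    // Rolling window of readRequestsCount samples per region, mirroring how
    // the balancer keeps a configurable number of RegionLoad objects in memory.
    private final int windowSize;
    private final Map<String, Deque<Long>> samples = new HashMap<>();

    LoadBasedNormalizerSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    // Record one readRequestsCount observation for a region, evicting the
    // oldest sample once the window is full.
    void record(String regionName, long readRequestsCount) {
        Deque<Long> window = samples.computeIfAbsent(regionName, k -> new ArrayDeque<>());
        if (window.size() == windowSize) {
            window.removeFirst();
        }
        window.addLast(readRequestsCount);
    }

    private double average(String regionName) {
        Deque<Long> window = samples.getOrDefault(regionName, new ArrayDeque<>());
        return window.stream().mapToLong(Long::longValue).average().orElse(0.0);
    }

    // Compare a region's average load to the mean of all regions' averages
    // (standing in for "other regions in the same table"). splitFactor and
    // mergeFactor are the configurable thresholds; values are illustrative.
    Plan planFor(String regionName, double splitFactor, double mergeFactor) {
        double tableMean = samples.keySet().stream()
                .mapToDouble(this::average).average().orElse(0.0);
        if (tableMean == 0.0) {
            return Plan.NONE;
        }
        double regionAvg = average(regionName);
        if (regionAvg > splitFactor * tableMean) {
            return Plan.SPLIT;
        }
        if (regionAvg < mergeFactor * tableMean) {
            return Plan.MERGE;
        }
        return Plan.NONE;
    }
}
```

One open design question with this shape is that split/merge decisions based on a short window can flap (split a region during a traffic spike, then merge it back), so the window size and thresholds would probably need to be tuned conservatively.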