Hey all,

We run a number of large HBase clusters used by hundreds of product teams for a variety of real-time workloads. We're a B2B company, so most data has a customerId somewhere in the rowkey. As the team that owns the HBase infrastructure, we try to help product teams design schemas that avoid hotspotting, but inevitably it happens. And it isn't always hotspotting in the strict sense; for example, request volume may simply not be evenly distributed across all regions of a table.
This hotspotting/distribution issue makes it hard for the balancer to keep the cluster balanced from a load perspective: sure, all RegionServers host the same number of regions, but those regions are not all created equal in terms of load. This results in cases where one RS sits consistently at 70% CPU, another at 30%, and the rest land in a band in between.

We already have a normalizer job that works similarly to the SimpleRegionNormalizer, keeping regions approximately the same size from a data-size perspective. I'm considering updating our normalizer to also take region load into account. My general plan is to follow a strategy similar to the balancer's: keep a configurable number of RegionLoad objects in memory per region and compute an average of readRequestsCount from them. If a region's average load is above some threshold relative to other regions in the same table, split it; if it's below some threshold relative to other regions in the same table, merge it.

I'm writing because I'm wondering whether anyone else has had this problem and whether there is prior art here. Is there a reason HBase does not provide a configurable load-based normalizer (beyond the typical OSS reason that no one has contributed it yet)?

Thanks!
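For concreteness, here's a rough sketch of the decision logic I have in mind. This is not real HBase code -- the class, method names, and threshold values are all made up for illustration; it just models the "rolling window of readRequestsCount samples per region, compare each region's average against the table-wide average, split hot / merge cold" idea:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of load-based normalization. In a real implementation
// the samples would come from RegionLoad/RegionMetrics reported by each RS.
public class LoadBasedNormalizerSketch {

    enum Plan { SPLIT, MERGE, NONE }

    // Rolling window of readRequestsCount samples per region, mirroring how
    // the balancer keeps a configurable number of RegionLoad objects in memory.
    private final int windowSize;
    private final Map<String, Deque<Long>> samples = new HashMap<>();

    LoadBasedNormalizerSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    // Record one readRequestsCount observation for a region, evicting the
    // oldest sample once the window is full.
    void record(String regionName, long readRequestsCount) {
        Deque<Long> window = samples.computeIfAbsent(regionName, k -> new ArrayDeque<>());
        if (window.size() == windowSize) {
            window.removeFirst();
        }
        window.addLast(readRequestsCount);
    }

    private double average(String regionName) {
        Deque<Long> window = samples.getOrDefault(regionName, new ArrayDeque<>());
        return window.stream().mapToLong(Long::longValue).average().orElse(0.0);
    }

    // Compare a region's average load to the mean of all regions' averages
    // (standing in for "other regions in the same table"). splitFactor and
    // mergeFactor are the configurable thresholds; values are illustrative.
    Plan planFor(String regionName, double splitFactor, double mergeFactor) {
        double tableMean = samples.keySet().stream()
                .mapToDouble(this::average).average().orElse(0.0);
        if (tableMean == 0.0) {
            return Plan.NONE;
        }
        double regionAvg = average(regionName);
        if (regionAvg > splitFactor * tableMean) {
            return Plan.SPLIT;
        }
        if (regionAvg < mergeFactor * tableMean) {
            return Plan.MERGE;
        }
        return Plan.NONE;
    }
}
```

One open design question with this shape is that split/merge decisions based on a short window can flap (split a region during a traffic spike, then merge it back), so the window size and thresholds would probably need to be tuned conservatively.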