Re: Number of Regions with small Tables

Christian Pfarr Mon, 12 Jul 2021 04:45:13 -0700

Ah, OK... I thought this was done by the balancer.

The normalizer is enabled (checked via the hbase shell), but with no special
configuration beyond the defaults in hbase-default.xml.

We run HBase 1.5.0 at the moment.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

Mallikarjun <[email protected]> wrote on Monday, 12 July 2021 at 13:16:

> Do you have any configuration for Region Normalizer
> (https://hbase.apache.org/book.html#normalizer) or something?
>
> Balancer does not split or merge regions. AFAIK, split policy controlled by
> `hbase.regionserver.region.split.policy` does the splitting and there is
> nothing similar for merges.
>
> Mallikarjun
>
> On Mon, Jul 12, 2021 at 2:48 PM Christian Pfarr [email protected] wrote:
>
> > Hello @all,
> >
> > I have a question about controlling the number of regions on small tables
> > in HBase.
> >
> > But first I have to give you some background on our use case.
> >
> > We've built a lambda architecture with HDFS (batch), HBase (speed) and
> > Drill as the serving layer, where we combine Parquet files from HDFS with
> > HBase rows that are newer than the most recent row in HDFS.
> >
> > The HBase table is filled in real time via NiFi and cleaned up with
> > every (nightly) batch, so that Drill can put most of the workload on HDFS.
> >
> > Unfortunately the HBase table is very small, so we have only one region,
> > and because of that Drill cannot parallelize the query, which leads to
> > long query times.
> >
> > If I pre-split the HBase table everything is fine, until the balancer
> > comes and merges the small regions. So after a few hours everything is slow
> > again :-/
> >
> > So my question is: what's the best way to handle this parallelization
> > issue?
> >
> > I thought about setting hbase.hregion.max.filesize to a very small
> > number, for example the HDFS block size of 128 MB, but I'm not sure whether
> > this leads to new problems.
> >
> > What do you think? Is there a better way to handle this?
> >
> > Regards,
> > z0ltrix
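
For reference, the pre-splitting mentioned in the original message could be sketched roughly as below. This is only an illustration and not something taken from the thread: it assumes the row keys start with a hex-encoded prefix (for example a hashed salt), and the function name `hex_split_keys` is hypothetical.

```python
from typing import List


def hex_split_keys(num_regions: int, prefix_len: int = 2) -> List[str]:
    """Return num_regions - 1 evenly spaced split keys over a hex key prefix.

    Assumption: row keys begin with `prefix_len` lowercase hex characters,
    so the key space covered by the prefix has 16 ** prefix_len values.
    """
    if num_regions < 2:
        # A single region needs no split points.
        return []
    keyspace = 16 ** prefix_len
    if num_regions > keyspace:
        raise ValueError("more regions requested than distinct prefixes")
    # The i-th split point sits at i/num_regions of the prefix key space.
    return [
        format(i * keyspace // num_regions, "0{}x".format(prefix_len))
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    print(hex_split_keys(4))  # ['40', '80', 'c0']
```

The resulting keys could then be passed to the hbase shell when creating the table, e.g. `create 'speed_table', 'cf', SPLITS => ['40', '80', 'c0']`, where the table and column family names are placeholders. Whether the regions then stay split depends on the normalizer settings discussed above.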
