Number of Regions with small Tables

Christian Pfarr Mon, 12 Jul 2021 02:18:07 -0700

Hello @all,

I have a question regarding controlling the number of regions of small tables in 
HBase.
But first, let me give you some background on our use case.


We've built a lambda architecture with HDFS (batch layer), HBase (speed layer) 
and Drill as the serving layer, where we combine Parquet files from HDFS with 
HBase rows that are newer than the most recent row in HDFS.
The HBase table is filled in real time via NiFi, and it is cleaned up after 
every (nightly) batch run so that Drill can put most of the workload on HDFS.
Unfortunately, because the HBase table is very small, it has only one region, 
so Drill cannot parallelize the query, which leads to long query times.

If I pre-split the HBase table, everything is fine until the balancer comes 
along and merges the small regions. So after a few hours everything is slow 
again :-/
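The post doesn't describe the row-key design, so the sketch below assumes a hypothetical scheme where each row key is prefixed with a one-byte salt (CRC32 of the key modulo a bucket count). Salting spreads even a small table's rows across all pre-split regions, so each region has data for Drill to scan in parallel. The helper names (`salted_key`, `split_keys`, `region_index`) are illustrative only. The resulting split points would be passed to HBase at table creation, for example via the HBase shell's `SPLITS` option or `Admin#createTable(TableDescriptor, byte[][])` in the Java client.

```python
import bisect
import zlib


def salted_key(row_key: bytes, salt_buckets: int = 256) -> bytes:
    """Hypothetical scheme: prefix row_key with a one-byte salt from its CRC32."""
    if not 1 <= salt_buckets <= 256:
        raise ValueError("a one-byte salt supports 1..256 buckets")
    salt = zlib.crc32(row_key) % salt_buckets
    return bytes([salt]) + row_key


def split_keys(num_regions: int, salt_buckets: int = 256) -> list[bytes]:
    """Return num_regions - 1 split points dividing the salt range evenly.

    Each split key is the (inclusive) start key of every region except the
    first, which implicitly starts at the empty key.
    """
    if not 1 <= num_regions <= salt_buckets <= 256:
        raise ValueError("need 1 <= num_regions <= salt_buckets <= 256")
    return [bytes([i * salt_buckets // num_regions]) for i in range(1, num_regions)]


def region_index(key: bytes, splits: list[bytes]) -> int:
    """Index of the region holding key; region start keys are inclusive."""
    return bisect.bisect_right(splits, key)


if __name__ == "__main__":
    splits = split_keys(4)
    print(splits)  # [b'@', b'\x80', b'\xc0']
    # Count how many sample rows land in each of the four regions.
    counts = [0] * (len(splits) + 1)
    for i in range(1000):
        counts[region_index(salted_key(f"row-{i}".encode()), splits)] += 1
    print(counts)
```

The trade-off of salting is that any range scan must fan out over every salt bucket. Note also that split points only fix the initial layout: if the merges come from HBase's region normalizer rather than the balancer proper, that is controlled separately (per table via the `NORMALIZATION_ENABLED` attribute).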

So... my question is: what's the best way to handle this parallelization 
issue?
I thought about setting hbase.hregion.max.filesize to a very small value, for 
example the HDFS block size of 128 MB, but I'm not sure whether this leads to 
new problems.

What do you think? Is there a better way to handle this?

Regards,
z0ltrix
