Hello @all,

I have a question about controlling the number of regions for small tables in
HBase.
But first I have to give you some background on our use case.

We've built a lambda architecture with HDFS (batch), HBase (speed) and Drill as
the serving layer, where we combine Parquet files from HDFS with HBase rows
that are newer than the most recent row in HDFS.
The HBase table is filled in real time via NiFi and is cleaned up after every
(nightly) batch run, so that Drill can put most of the workload on HDFS.
Unfortunately the HBase table is very small, so it has only one region. Because
of that, Drill cannot parallelize the query, which leads to long query times.

If I pre-split the HBase table everything is fine, until the balancer comes and
merges the small regions. So after a few hours everything is slow again :-/
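
To make the pre-split concrete, here is a minimal sketch of how evenly spaced
split points could be computed. It is only an illustration: it assumes row keys
start with a uniformly distributed two-character hex prefix, and the table name
`speed_table` and column family `cf` are made up.

```python
def split_points(num_regions: int, key_space: int = 256) -> list[str]:
    """Return the num_regions - 1 boundaries that cut a one-byte hex prefix
    space (00..ff) into equally sized ranges."""
    if not 1 <= num_regions <= key_space:
        raise ValueError("num_regions must be between 1 and key_space")
    return [format(i * key_space // num_regions, "02x")
            for i in range(1, num_regions)]


def create_statement(table: str, family: str, num_regions: int) -> str:
    """Build an HBase shell `create` statement that uses the SPLITS option."""
    quoted = ", ".join(f"'{p}'" for p in split_points(num_regions))
    return f"create '{table}', '{family}', SPLITS => [{quoted}]"


if __name__ == "__main__":
    # Hypothetical names; 4 regions gives 3 split points.
    print(create_statement("speed_table", "cf", 4))
```

The printed statement can be pasted into the HBase shell. With real row keys
the split points would have to follow the actual key distribution instead.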

So... my question is: what's the best way to handle this parallelization
issue?
I thought about setting hbase.hregion.max.filesize to a very small value, for
example the HDFS block size (128 MB), but I'm not sure whether this leads to
new problems.
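
As a rough back-of-the-envelope check of that idea, the sketch below estimates
a lower bound on the number of regions for a given table size and size limit.
The table sizes are hypothetical, and the estimate deliberately ignores the
configured split policy, compression, memstore contents and multiple column
families, so it is only a first approximation.

```python
import math

MB = 1024 * 1024


def min_regions(table_bytes: int, max_filesize_bytes: int) -> int:
    """Lower bound on the region count if no region may stay above the limit."""
    if table_bytes < 0 or max_filesize_bytes <= 0:
        raise ValueError("sizes must be non-negative and the limit positive")
    return max(1, math.ceil(table_bytes / max_filesize_bytes))


if __name__ == "__main__":
    limit = 128 * MB  # the value considered for hbase.hregion.max.filesize
    for size_mb in (50, 300, 1000):  # hypothetical table sizes
        print(f"{size_mb} MB -> at least {min_regions(size_mb * MB, limit)} region(s)")
```

Under these assumptions, a table that stays below 128 MB would still end up
with a single region, so the limit alone would not help in that case.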

What do you think? Is there a better way to handle this?

Regards,
z0ltrix
