Do you have any configuration for the Region Normalizer (https://hbase.apache.org/book.html#normalizer) or something?
Balancer does not split or merge regions. AFAIK, the split policy controlled by `hbase.regionserver.region.split.policy` does the splitting, and there is nothing similar for merges.

---
Mallikarjun

On Mon, Jul 12, 2021 at 2:48 PM Christian Pfarr <[email protected]> wrote:

> Hello @all,
>
> I have a question regarding controlling the number of regions on small tables
> in HBase.
> But first I have to give you some hints about our use case.
>
> We've built a lambda architecture with HDFS (Batch), HBase (Speed) and
> Drill as Serving Layer, where we are combining Parquet files from HDFS with
> HBase rows that are newer than the most recent row in HDFS.
> The HBase table is filled in real time via NiFi, while it is cleaned up
> every batch (nightly) so that Drill can put most of the workload on HDFS.
> Unfortunately the HBase table is very small, and because of this we have
> only one region, and because of that Drill cannot parallelize the query,
> which leads to long query times.
>
> If I pre-split the HBase table everything is fine, until the balancer
> comes and merges the small regions. So after a few hours everything is slow
> again :-/
>
> So... my question is now, what's the best way to handle this parallelization
> issue?
> I thought about setting hbase.hregion.max.filesize to a very small
> number, for example HDFS block size = 128 MB, but I'm not sure if this leads
> to new problems.
>
> What do you think? Is there a better way to handle this?
>
> Regards,
> z0ltrix
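
For the pre-split itself, here is a minimal sketch of how evenly spaced split keys could be computed. It assumes row keys start with a two-character lowercase hex prefix (`00` to `ff`), for example a salt or hash prefix; nothing in the thread says the table is keyed that way, so treat it as an illustration only. The class name `SplitKeys` and its methods are hypothetical helpers, not part of HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes split points for a table whose row keys
// are ASSUMED to begin with a two-character lowercase hex prefix (00..ff).
final class SplitKeys {

    // Returns regions-1 boundaries that divide the 00..ff prefix space evenly.
    static List<String> hexSplits(int regions) {
        if (regions < 1 || regions > 256) {
            throw new IllegalArgumentException("regions must be between 1 and 256");
        }
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            int boundary = i * 256 / regions; // strictly increasing for regions <= 256
            splits.add(String.format("%02x", boundary));
        }
        return splits;
    }

    // Converts the boundaries to the byte[][] form that split keys are
    // usually passed in, e.g. as the splitKeys argument of
    // Admin.createTable(TableDescriptor, byte[][]) in the HBase 2.x client.
    static byte[][] toBytes(List<String> splits) {
        byte[][] keys = new byte[splits.size()][];
        for (int i = 0; i < splits.size(); i++) {
            keys[i] = splits.get(i).getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }

    public static void main(String[] args) {
        // Four regions need three split points.
        System.out.println(hexSplits(4)); // prints [40, 80, c0]
    }
}
```

Pre-splitting only helps as long as nothing merges the regions again. If the normalizer turns out to be what merges them, it would also need to be disabled for this table.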
