Hi, we are using CDH 5.7 with HBase 1.2.

We are doing performance testing on HBase, which has 4 Region Servers,
through a regular load.

The input data is compressed binary files, around 2 TB, which we process
and write as key-value pairs to HBase.
The output data size in HBase is almost 4 times larger, around 8 TB,
because we are writing it as text.
This process is a MapReduce job.

During the load, we observed a lot of GC happening on the Region Servers,
so we changed a couple of parameters to decrease the GC time.

We increased the flush size from 128 MB to 1 GB, compactionThreshold to 50,
and regionserver.maxlogs to 42.
The following are the configuration values we changed from the defaults:


hbase.hregion.memstore.flush.size = 1 GB
hbase.hstore.max.filesize = 10 GB
hbase.hregion.preclose.flush.size = 50 MB

hbase.hstore.compactionThreshold = 50
hbase.regionserver.maxlogs = 42

After the load, we observed that the HBase table has only 4 regions, each
around 2.5 TB in size.
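
For comparison, here is a rough back-of-the-envelope sketch of how many
regions we expected. It assumes the 10 GB limit we configured acts as a
per-region upper bound and that the roughly 8 TB of output is spread
evenly. The function name is only illustrative, and binary units are
assumed.

```python
import math

GB = 1024 ** 3
TB = 1024 ** 4

def expected_min_regions(total_bytes, max_region_bytes):
    # Smallest number of regions needed if no region may exceed the limit.
    return math.ceil(total_bytes / max_region_bytes)

# About 8 TB of output against a 10 GB per-region limit (both approximate).
print(expected_min_regions(8 * TB, 10 * GB))
```

Under these assumptions we would have expected several hundred regions
rather than 4.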

I am trying to understand which configuration parameter caused this issue.

I was going through this article:
http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/

The region split policy in our HBase is
org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy.
According to this policy, a Region Server should split a region once the
region size limit is exceeded.
Can someone explain the root cause?
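
To make the question concrete, below is a minimal sketch of the split
threshold as I understand this policy in HBase 1.x: the smaller of the
maximum region size and (initial size x cube of the number of regions of
the table on that Region Server), where the initial size defaults to
twice the memstore flush size. The function and parameter names are my
own, and the sketch ignores any jitter applied to the maximum size, so
please treat it as an assumption rather than the actual implementation.

```python
GB = 1024 ** 3

def split_size_to_check(region_count, flush_size, max_file_size,
                        initial_size=None):
    # Assumed default: initial size is twice the memstore flush size.
    if initial_size is None:
        initial_size = 2 * flush_size
    # With no regions, or more than 100, only the upper bound is used.
    if region_count == 0 or region_count > 100:
        return max_file_size
    return min(max_file_size, initial_size * region_count ** 3)

# Values from this post: 1 GB flush size, 10 GB intended maximum.
for n in (1, 2, 4):
    print(n, split_size_to_check(n, 1 * GB, 10 * GB) / GB)
```

If this reading is right, a region should become a split candidate at
2 GB while it is the table's only region on a server, and at 10 GB
after that, which is far below the 2.5 TB we are seeing.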


Thanks,
Yeshwanth
