I have a table in HBase that is around 96 GB, and I pre-split it into 4 regions of about 30 GB each. After some time, the table started to split on its own because the maximum region size is 1 GB (I only just realized that; I'm going to change it or create more pre-splits).
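For the record, this is roughly what I plan to run from the HBase shell (the table name `filters` and family `d` come from my logs below; the 10 GB value and the split keys are just placeholders I still need to choose):

```
# Raise the per-region split threshold (hbase.hregion.max.filesize, in bytes)
alter 'filters', MAX_FILESIZE => '10737418240'

# Or, if recreating the table, pre-split it at chosen row-key boundaries
create 'filters', 'd', SPLITS => ['25', '50', '75']
```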
There are two things I don't understand.

First, how is HBase creating the splits? Right now I have 130 regions and growing. The problem is the size of the new regions:

    1.7 M    /hbase/filters/4ddbc34a2242e44c03121ae4608788a2
    1.6 G    /hbase/filters/548bdcec79cfe9a99fa57cb18f801be2
    3.1 G    /hbase/filters/58b50df089bd9d4d1f079f53238e060d
    2.5 M    /hbase/filters/5a0d6d5b3b8faf67889ac5f5c2947c4f
    1.9 G    /hbase/filters/5b0a35b5735a473b7e804c4b045ce374
    883.4 M  /hbase/filters/5b49c68e305b90d87b3c64a0eee60b8c
    1.7 M    /hbase/filters/5d43fd7ea9808ab7d2f2134e80fbfae7
    632.4 M  /hbase/filters/5f04c7cd450d144f88fb4c7cff0796a2

Some of the new regions are only a few megabytes! Why are they so small? And when does HBase decide to split? It only started splitting about two hours after I created the table. I created the table and inserted the data once; I don't insert new data or modify it afterwards.

The other interesting point is why there are major compactions:

    2014-04-15 11:33:47,400 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at hdfs://m01.cluster:8020/hbase/filters/ef994715505054299ede8c48c600cea4/.tmp/df90c260cb4e4256a153dd178244f04c to hdfs://m01.cluster:8020/hbase/filters/ef994715505054299ede8c48c600cea4/d/df90c260cb4e4256a153dd178244f04c
    2014-04-15 11:33:47,407 INFO org.apache.hadoop.hbase.regionserver.StoreFile$Reader: Loaded ROWCOL (CompoundBloomFilter) metadata for df90c260cb4e4256a153dd178244f04c
    2014-04-15 11:33:47,416 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 1 file(s) in d of filters,51,1397554175140.ef994715505054299ede8c48c600cea4. into df90c260cb4e4256a153dd178244f04c, size=789.1 M; total size for store is 789.1 M
    2014-04-15 11:33:47,416 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: completed compaction: regionName=filters,51,1397554175140.ef994715505054299ede8c48c600cea4., storeName=d, fileCount=1, fileSize=1.5 G, priority=6, time=414761474510060; duration=7sec

I thought major compactions happened only once a day and compacted many files per region. The data is always the same here; I don't inject new data.

I'm working with HBase 0.94.6 on CDH 4.4. I'm going to change the region size, but I'd like to understand why these things happen. Thank you.
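For reference, this is the back-of-envelope arithmetic behind my surprise at the region count (a rough sketch using the numbers from my cluster, not HBase code):

```python
import math

TABLE_SIZE_GB = 96    # total table size, as measured above
MAX_FILESIZE_GB = 1   # my cluster's hbase.hregion.max.filesize

# Lower bound on the region count once splitting settles down:
# every region must end up below the split threshold.
min_regions = math.ceil(TABLE_SIZE_GB / MAX_FILESIZE_GB)
print(min_regions)  # 96

# Note: a split divides a region at a row key near the middle of its
# largest store file, not at the byte midpoint, so the two daughter
# regions can be very unequal in size -- which may be why some regions
# end up only a few MB while their siblings are gigabytes.
```

So with a 1 GB threshold, ~96 GB of data was never going to stay in 4 regions; the 130+ regions are at least in the expected order of magnitude, even if the tiny ones still puzzle me.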