This is very informative and helpful; I will try changing the settings and report back.
On Mon, Mar 14, 2011 at 11:54 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:

> Alright, so here's a preliminary report:
>
> - No compression is stable for me too, short pauses.
> - LZO gave me no problems either, generally faster than no compression.
> - GZ initially gave me weird results, but I quickly saw that I forgot
>   to copy over the native libs from the hadoop folder, so my logs were
>   full of:
>
> 2011-03-14 10:20:29,624 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
> 2011-03-14 10:20:29,626 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
> 2011-03-14 10:20:29,628 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
> 2011-03-14 10:20:29,630 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
> 2011-03-14 10:20:29,632 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
> 2011-03-14 10:20:29,634 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
> 2011-03-14 10:20:29,636 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
>
> I copied the libs over, bounced the region servers, and the
> performance was much more stable until a point where I got a
> 20-second pause. Looking at the logs, I see:
>
> 2011-03-14 10:31:17,625 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region test,,1300127266461.9d0eb095b77716c22cd5c78bb503c744.
> has too many store files; delaying flush up to 90000ms
>
> (our config sets the block at 20 store files instead of the default,
> which is around 12 IIRC)
>
> Quickly followed by a bunch of:
>
> 2011-03-14 10:31:26,757 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 20 on 60020' on region test,,1300127266461.9d0eb095b77716c22cd5c78bb503c744.: memstore size 285.6m is >= than blocking 256.0m size
>
> (our settings mean that we won't block on memstores until 4x their
> size; in your case you may see a 2x blocking factor, so 128MB, which
> is the default)
>
> The reason is that our memstores, once flushed, occupy a very small
> space. Consider this:
>
> 2011-03-14 10:31:16,606 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://sv2borg169:9000/hbase/test/9d0eb095b77716c22cd5c78bb503c744/test/420552941380451032, entries=216000, sequenceid=70556635737, memsize=64.3m, filesize=6.0m
>
> It means that it will create tiny files of ~6MB, and the compactor will
> spend all its time merging those files until a point where HBase must
> stop inserting in order to not blow its available memory. Thus, the
> same data will get rewritten a couple of times.
>
> Normally, and by that I mean a system where you're not just trying to
> insert data ASAP but where most of your workload is made up of reads,
> this works well, as the memstores are filled much more slowly and
> compactions happen at a normal pace.
>
> If you search around the interwebs for tips on speeding up HBase
> inserts, you'll often see the configs I referred to earlier:
>
> <name>hbase.hstore.blockingStoreFiles</name>
> <value>20</value>
>
> and
>
> <name>hbase.hregion.memstore.block.multiplier</name>
> <value>4</value>
>
> They should work pretty well for most use cases that are made of heavy
> writes, given that the region servers have enough heap (e.g. more than 3
> or 4GB).
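(For reference, the two properties above go in hbase-site.xml on the region servers and need a region server restart to take effect. A minimal fragment with the values quoted in this thread; the comments are editorial, and the defaults mentioned are only as stated in the thread:)

```xml
<!-- hbase-site.xml fragment: write-heavy tuning values discussed above -->
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <!-- block flushes at 20 store files instead of the default (~12 per J-D) -->
  <value>20</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <!-- block updates only at 4x the memstore flush size, instead of 2x -->
  <value>4</value>
</property>
```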
> You should also consider setting MAX_FILESIZE to >1GB to
> limit the number of regions, and MEMSTORE_FLUSHSIZE to >128MB to
> flush bigger files.
>
> Hope this helps,
>
> J-D
>
> On Mon, Mar 14, 2011 at 10:29 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> > Thanks for the report Bryan, I'll try your little program against one
> > of our 0.90.1 clusters that has similar hardware.
> >
> > J-D
> >
> > On Sun, Mar 13, 2011 at 1:48 PM, Bryan Keller <brya...@gmail.com> wrote:
> >> If interested, I wrote a small program that demonstrates the problem
> >> (http://vancameron.net/HBaseInsert.zip). It uses Gradle, so you'll need
> >> that. To run, enter "gradle run".
> >>
> >> On Mar 13, 2011, at 12:14 AM, Bryan Keller wrote:
> >>
> >>> I am using the Java client API to write 10,000 rows with about 6000
> >>> columns each, via 8 threads making multiple calls to the
> >>> HTable.put(List<Put>) method. I start with an empty table with one
> >>> column family and no regions pre-created.
> >>>
> >>> With compression turned off, I am seeing very stable performance. At
> >>> the start there are a couple of 10-20 sec pauses where all insert
> >>> threads are blocked during a region split. Subsequent splits do not
> >>> cause all of the threads to block, presumably because there are more
> >>> regions, so no one region split blocks all inserts. GCs for HBase
> >>> during the insert are not a major problem (6k/55sec).
> >>>
> >>> When using either LZO or gzip compression, however, I am seeing
> >>> frequent and long pauses, sometimes around 20 sec but often over 80
> >>> seconds in my test. During these pauses all 8 of the threads writing
> >>> to HBase are blocked. The pauses happen throughout the insert
> >>> process. GCs are higher in HBase when using compression (60k, 4min),
> >>> but that doesn't seem enough to explain these pauses. Overall
> >>> performance obviously suffers dramatically as a result (about 2x
> >>> slower).
> >>>
> >>> I have tested this in different configurations (single node, 4
> >>> nodes) with the same result.
> >>> I'm using HBase 0.90.1 (CDH3B4), Sun/Oracle Java 1.6.0_24, CentOS
> >>> 5.5, and Hadoop LZO 0.4.10 from Cloudera. Machines have 12 cores and
> >>> 24 GB of RAM. Settings are pretty much default, nothing out of the
> >>> ordinary. I tried playing around with region handler count and
> >>> memstore settings, but these had no effect.
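For readers who don't want to pull the linked zip: the workload Bryan describes (8 threads batching Puts into a fresh one-family table), combined with the table-level settings J-D suggests, can be sketched roughly as below against the 0.90 client API. This is an illustrative sketch, not Bryan's actual program: the table name, family name, batch size, and value payload are made up, and it assumes a running cluster with the HBase 0.90 client jars on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class InsertSketch {

    // 10,000 rows split evenly across 8 writer threads, per Bryan's description.
    static int rowsPerThread(int totalRows, int threads) {
        return totalRows / threads;
    }

    public static void main(String[] args) throws Exception {
        final Configuration conf = HBaseConfiguration.create();

        // Table-level tuning J-D suggests: bigger regions, bigger flushes.
        HTableDescriptor desc = new HTableDescriptor("test");
        desc.setMaxFileSize(1024L * 1024 * 1024);       // MAX_FILESIZE: 1 GB
        desc.setMemStoreFlushSize(256L * 1024 * 1024);  // MEMSTORE_FLUSHSIZE: 256 MB
        desc.addFamily(new HColumnDescriptor("f"));     // one family, no pre-split regions
        new HBaseAdmin(conf).createTable(desc);

        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int t = 0; t < 8; t++) {
            final int thread = t;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        // HTable is not thread-safe; each thread gets its own instance.
                        HTable table = new HTable(conf, "test");
                        List<Put> batch = new ArrayList<Put>();
                        for (int r = 0; r < rowsPerThread(10000, 8); r++) {
                            Put put = new Put(Bytes.toBytes(thread + "-" + r));
                            for (int c = 0; c < 6000; c++) {    // ~6000 columns per row
                                put.add(Bytes.toBytes("f"), Bytes.toBytes("c" + c),
                                        Bytes.toBytes("v"));
                            }
                            batch.add(put);
                            if (batch.size() == 100) {          // arbitrary batch size
                                table.put(batch);               // HTable.put(List<Put>)
                                batch.clear();
                            }
                        }
                        if (!batch.isEmpty()) table.put(batch);
                        table.flushCommits();
                        table.close();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
    }
}
```

With the default write buffer, each `put(List<Put>)` call is a round trip, so the batch size mainly trades client memory against RPC count; the pauses discussed in this thread are server-side (flush/compaction blocking) and show up regardless of how the client batches.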