Thanks, but I've already read that article and was trying to set up my
configuration accordingly. Any help is welcome, though. Here are the logs
(compressed with tar/gzip):  hbase_logs.tar.gz
<http://apache-hbase.679495.n3.nabble.com/file/n4037960/hbase_logs.tar.gz>  
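
In case it helps to see them in one place, the settings in question map to
roughly these hbase-site.xml entries (flush size 256m and block multiplier 6
as mentioned below; the last entry is just the 0.92 default, listed for
context, so please treat this as a sketch rather than an exact copy of my
config):

  <!-- memstore flush threshold per region: 256 MB -->
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>268435456</value>
  </property>

  <!-- block updates once a memstore reaches flush.size * multiplier
       (6 * 256 MB = 1.5 GB, which matches the blocking size in the logs) -->
  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>6</value>
  </property>

  <!-- 0 disables time-based major compactions -->
  <property>
    <name>hbase.hregion.majorcompaction</name>
    <value>0</value>
  </property>

  <!-- 0.92 default; flushes are delayed once a store has this many files -->
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>7</value>
  </property>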


Kevin O'Dell wrote
> kzurek,
> 
>   Okay, that is a fair flush size (I have not seen larger than 256MB).  Do
> you think you could send me an RS log from yesterday while you were seeing
> this issue?  I would just like to thumb through it and make some tuning
> recommendations.  I have a pretty good idea of what you need to do; if you
> look at http://gbif.blogspot.com/2012/07/optimizing-writes-in-hbase.html,
> the article has some good ideas for write tuning.
> 
> On Tue, Feb 5, 2013 at 3:39 AM, kzurek <kzurek@> wrote:
> 
>> Thanks for the reply, although I should clear up some misunderstandings.
>> I do know the general behavior of, and the difference between, minor and
>> major compactions, as well as when a minor compaction may be promoted to
>> a major one. I just wanted to verify the influence of compaction (mostly
>> major) on our cluster's performance. So I created a test in which I put
>> data into only one region (out of 71 in total) from a single-threaded
>> client that uses the built-in client-side write buffer (as described in
>> the "HBase: The Definitive Guide" book), and I triggered major compaction
>> by hand via HBaseAdmin. After a few tests I noticed that major compaction
>> ('Large Compaction') was being triggered anyway (requested by the
>> cacheFlusher, visible in the compaction queue), so I left it as it was
>> and stopped triggering it manually. That brought me to this situation:
>> I keep putting data, and after a while the client gets timeouts; in the
>> meantime I see that the memstore is being flushed but cannot create a new
>> store file (because there are already too many of them) and is frequently
>> blocked by the compaction process. A minimal sketch of the client side of
>> the test is included below, after my answers. I hope this short
>> description gives a closer look at the issue. In addition, here are some
>> answers to your questions:
>>
>> 1. How often are you flushing?
>> I'm not triggering flushes by hand, but I've noticed that data is being
>> flushed every ~4s (275m flushes) or every ~1m 30-40s (1.5g flushes).
>> 2. How often are you force flushing from HLog rolls?
>> Default settings are: blocksize=64 MB, rollsize=60.8 MB, enabled=true,
>> optionallogflushinternal=1000ms. It seems a roll is made about every hour.
>> 3. What size are your flushes?
>> It depends: from 275m up to 1.5g. I've set my memstore flush size to 256m
>> and the memstore block multiplier to 6 (so 1.5g is exactly the blocking
>> threshold: 256m x 6). Should I increase the flush size?
>> 4. What does your region count look like, as that can affect your flush
>> size?
>> The initial split was 37 regions across 6 RegionServers, but at the
>> moment there are 71 regions.
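>>
>> For completeness, the client side of the test looks roughly like the
>> following (0.92 client API; the table and column family names are taken
>> from the logs, but the key layout, value sizes and buffer size here are
>> placeholders rather than my exact code):
>>
>>   import org.apache.hadoop.conf.Configuration;
>>   import org.apache.hadoop.hbase.HBaseConfiguration;
>>   import org.apache.hadoop.hbase.client.HBaseAdmin;
>>   import org.apache.hadoop.hbase.client.HTable;
>>   import org.apache.hadoop.hbase.client.Put;
>>   import org.apache.hadoop.hbase.util.Bytes;
>>
>>   public class WriteTest {
>>     public static void main(String[] args) throws Exception {
>>       Configuration conf = HBaseConfiguration.create();
>>       HTable table = new HTable(conf, "test");
>>       // built-in client-side write buffer ("caching" as described in the
>>       // book): buffer Puts locally and send them to the RS in batches
>>       table.setAutoFlush(false);
>>       table.setWriteBufferSize(12 * 1024 * 1024); // placeholder size
>>
>>       byte[] family = Bytes.toBytes("data");
>>       for (long i = 0; i < 100000000L; i++) {
>>         // row keys are generated so that every Put lands in one region
>>         Put put = new Put(Bytes.toBytes(i));
>>         put.add(family, Bytes.toBytes("q"), new byte[200]); // placeholder value
>>         table.put(put);
>>       }
>>       table.flushCommits();
>>       table.close();
>>
>>       // in the first runs I also triggered major compaction by hand:
>>       HBaseAdmin admin = new HBaseAdmin(conf);
>>       admin.majorCompact("test");
>>     }
>>   }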
>>
>>
>>
>> Kevin O'Dell wrote
>> > Kzurek,
>> >
>> > Just because you turn off time-based major compactions, it does not mean
>> > that you have turned major compaction off.  Compactions can still be
>> > promoted to majors.  Also, the only real difference between a major and a
>> > minor compaction is that a major processes deletes.  You should really
>> > schedule at least daily major compactions.  As for your blocking issue,
>> > there are quite a few things you would want to look at:
>> >
>> > How often are you flushing?
>> >
>> > How often are you force flushing from HLog rolls?
>> >
>> > What size are your flushes?
>> >
>> > What does your region count look like, as that can affect your flush
>> > size?
>> >
>> > etc.
>> >
>> > When I see HBase blocking constantly, it is usually a sign that you need
>> > to do some fine-grained tuning.
>> >
>> > On Mon, Feb 4, 2013 at 7:45 AM, kzurek <kzurek@> wrote:
>> >
>> >> I'm facing some issues related to major compaction. I've disabled major
>> >> compaction and I am not triggering it manually, but when I load data into
>> >> a selected region, I see that the compaction queue is growing and major
>> >> compaction ('Large Compaction' in the logs) is being triggered quite
>> >> often (mainly because of the cacheFlusher). Moreover, I've noticed that
>> >> my client app gets timeouts while putting data into the cluster (this
>> >> happens when the memstore flusher tries to dump memstore contents to a
>> >> store file but cannot, because there are too many store files); a drop in
>> >> the data rate, which in this case is to be expected, is also noticeable.
>> >> To me it looks like the compaction process cannot keep up with the
>> >> incoming data rate (or maybe something else?) and ends up blocking
>> >> updates.
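>> >>
>> >> For reference, these are the settings that govern this blocking
>> >> behaviour, with their 0.92 defaults as far as I can tell:
>> >>
>> >>   hbase.hstore.compactionThreshold = 3
>> >>       (a minor compaction is considered once a store has this many files)
>> >>   hbase.hstore.blockingStoreFiles = 7
>> >>       (flushes are delayed once a store has more files than this)
>> >>   hbase.hstore.blockingWaitTime = 90000
>> >>       (how long, in ms, a flush waits for a compaction; matches the
>> >>       'delaying flush up to 90000ms' in the logs below)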
>> >>
>> >> Basic information:
>> >> HBase Version: 0.92.1, r1298924
>> >> Hadoop Version: 1.0.3, r1335192
>> >>
>> >> Logs:
>> >> 2013-02-01 15:43:14,627 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Large Compaction requested: regionName=test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., storeName=data, fileCount=3, fileSize=478.3m (249.8m, 113.7m, 114.7m), priority=-3, time=1051078047102762; Because: regionserver60020.cacheFlusher; compaction_queue=(48:0), split_queue=0
>> >> 2013-02-01 15:43:14,627 DEBUG org.apache.hadoop.hbase.regionserver.Store: 3b710693d6314c2a987b07dd82451158 - tag: no store files to compact
>> >> 2013-02-01 15:43:14,709 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":61908,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@57a56081), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"192.168.1.68:49419","starttimems":1359729732799,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
>> >> 2013-02-01 15:43:14,710 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@57a56081), rpc version=1, client version=29, methodsFingerPrint=54742778 from 192.168.1.68:49419: output error
>> >> 2013-02-01 15:43:14,710 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020 caught: java.nio.channels.ClosedChannelException
>> >>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>> >>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1710)
>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1653)
>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:924)
>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1003)
>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:409)
>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1346)
>> >>
>> >> 2013-02-01 15:43:19,397 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.
>> >> 2013-02-01 15:43:19,397 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. has too many store files; delaying flush up to 90000ms
>> >> 2013-02-01 15:43:19,397 DEBUG org.apache.hadoop.hbase.regionserver.Store: 3b710693d6314c2a987b07dd82451158 - data: no store files to compact
>> >> 2013-02-01 15:43:19,397 DEBUG org.apache.hadoop.hbase.regionserver.Store: 3b710693d6314c2a987b07dd82451158 - tag: no store files to compact
>> >> 2013-02-01 15:43:55,693 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 10 on 60020' on region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.: memstore size 1.5g is >= than blocking 1.5g size
>> >> 2013-02-01 15:44:49,452 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Waited 90055ms on a compaction to clean up 'too many store files'; waited long enough... proceeding with flush of test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.
>> >> 2013-02-01 15:44:49,452 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., current region memstore size 1.5g
>> >> 2013-02-01 15:44:49,453 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., commencing wait for mvcc, flushsize=1624094576
>> >> 2013-02-01 15:44:50,014 DEBUG org.apache.hadoop.hbase.io.hfile.HFileWriterV2: Initialized with CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false]
>> >> 2013-02-01 15:44:55,082 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/.tmp/7ddee02376664ac387eb3c786c541ed0 to hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/data/7ddee02376664ac387eb3c786c541ed0
>> >> 2013-02-01 15:44:55,086 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/data/7ddee02376664ac387eb3c786c541ed0, entries=7157100, sequenceid=58642, memsize=1.5g, filesize=113.8m
>> >> 2013-02-01 15:44:55,087 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. 'IPC Server handler 10 on 60020'
>> >> 2013-02-01 15:44:55,087 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~1.5g/1624094576, currentsize=0.0/0 for region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. in 5635ms, sequenceid=58642, compaction requested=false
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Kevin O'Dell
>> > Customer Operations Engineer, Cloudera
>>
>>
>>
>>
>>
> 
> 
> 
> -- 
> Kevin O'Dell
> Customer Operations Engineer, Cloudera




