Thanks, but I've already read that article and have been trying to adjust my settings accordingly. Any further help would still be useful, though. Here are the logs (compressed with tar/gzip): hbase_logs.tar.gz <http://apache-hbase.679495.n3.nabble.com/file/n4037960/hbase_logs.tar.gz>

For completeness, I've also included below a sketch of how the test client writes and the settings we currently run.
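The test client writes through the built-in client-side write buffer (as described in "HBase: The Definitive Guide"), roughly like the sketch below. It is only a sketch: the buffer size, row keys, and column layout are illustrative, not our exact code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test");

    // Built-in caching: buffer Puts on the client and ship them in batches.
    table.setAutoFlush(false);
    table.setWriteBufferSize(12 * 1024 * 1024); // illustrative buffer size

    for (long i = 0; i < 10000000L; i++) {
      // Single-threaded writer; row keys are chosen so all Puts land in one region.
      Put put = new Put(Bytes.toBytes(i));
      put.add(Bytes.toBytes("data"), Bytes.toBytes("q"), Bytes.toBytes(i));
      table.put(put); // the buffer is flushed automatically once it fills up
    }

    table.flushCommits(); // push whatever is still buffered
    table.close();
  }
}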
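The memstore settings I mention in the answers below are set in hbase-site.xml; the sketch here only restates them as comments for readability, and shows how I was triggering major compaction by hand (via HBaseAdmin) in the earlier tests. Note that the blocking threshold in the logs ("memstore size 1.5g is >= than blocking 1.5g size") is exactly flush size * multiplier = 256m * 6 = 1.5g.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CompactByHand {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Relevant settings as deployed (they live in hbase-site.xml on the cluster, not in client code):
    //   hbase.hregion.memstore.flush.size       = 268435456  -> 256m memstore flush size
    //   hbase.hregion.memstore.block.multiplier = 6          -> updates block at 6 * 256m = 1.5g
    //   hbase.hregion.majorcompaction           = 0          -> time-based major compactions disabled
    //   hbase.hstore.blockingWaitTime           = 90000      -> default; the "delaying flush up to 90000ms" in the logs

    // Manual major compaction of the test table, as used in the earlier tests:
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.majorCompact("test");
  }
}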
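On the HLog question (force flushes from rolls): the numbers I quote below come, I believe, from the HLog startup line, and as far as I can tell they map to the settings in this sketch, all at their defaults here. The explicit conf.set* calls are only for illustration; we have not overridden any of them.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HLogSettings {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Reported at RegionServer startup:
    //   "blocksize=64 MB, rollsize=60.8 MB, enabled=true, optionallogflushinternal=1000ms"
    conf.setLong("hbase.regionserver.hlog.blocksize", 64L * 1024 * 1024);  // WAL block size (follows the HDFS block size)
    conf.setFloat("hbase.regionserver.logroll.multiplier", 0.95f);         // roll at 0.95 * blocksize = 60.8 MB
    conf.setLong("hbase.regionserver.optionallogflushinterval", 1000L);    // periodic WAL sync interval in ms
    conf.setInt("hbase.regionserver.maxlogs", 32);                         // too many WAL files forces memstore flushes
  }
}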
Kevin O'Dell wrote:
> kzurek,
>
> Okay, that is a fair flush size (I have not seen larger than 256MB). Do
> you think you could send me a RS log from yesterday while you were seeing
> this issue? I would just like to thumb through it and make some tuning
> recommendations. I have a pretty good idea of what you need to do if you
> look at:
>
> http://gbif.blogspot.com/2012/07/optimizing-writes-in-hbase.html
>
> The article has some good ideas for write tuning.
>
> On Tue, Feb 5, 2013 at 3:39 AM, kzurek <kzurek@> wrote:
>
>> Thanks for the reply, although I should clear up some misunderstandings.
>> In general, I do understand the behavior of and difference between minor
>> and major compactions, as well as when a minor compaction might be
>> promoted to a major one. I just wanted to verify the influence of
>> compaction (mostly major) on our cluster's performance. So I created a
>> test in which I put data into only one region (out of 71 in total) from a
>> single-threaded client that uses the built-in caching mechanism (as
>> described in the "HBase: The Definitive Guide" book), and triggered major
>> compaction by hand (via HBaseAdmin). However, after a few tests I noticed
>> that major compaction ('Large Compaction') was being triggered anyway (by
>> the cache flusher, with a growing compaction queue), so I left it as it
>> was (no longer triggering it manually). That brought me to the current
>> situation: I'm putting data in and after a while the client gets
>> timeouts; meanwhile I see that the memstore is being flushed but cannot
>> create a new store file (because there are too many of them) and is
>> frequently blocked by the compaction process. I hope this short
>> description gives a closer look at the issue. In addition, here are some
>> answers to your questions:
>>
>> 1. How often are you flushing?
>> I'm not triggering flushes by hand, but I've noticed that data is being
>> flushed either every ~4s (275m flushes) or every ~1m 30-40s (1.5g
>> flushes).
>>
>> 2. How often are you force flushing from HLog rolls?
>> The defaults are in place: blocksize=64 MB, rollsize=60.8 MB,
>> enabled=true, optionallogflushinternal=1000ms. It seems a roll is made
>> every hour.
>>
>> 3. What size are your flushes?
>> It depends: from 275m up to 1.5g. I've set my memstore flush size to 256m
>> and the memstore block multiplier to 6. Should I increase the flush size?
>>
>> 4. What does your region count look like, as that can affect your flush
>> size?
>> The initial split was 37 regions on 6 RegionServers, but at the moment
>> there are 71 regions.
>>
>>
>> Kevin O'Dell wrote:
>>> Kzurek,
>>>
>>> Just because you turn off time-based major compactions, it does not mean
>>> that you have turned major compaction off. Compactions can still be
>>> promoted to majors. Also, the only real difference between a major and a
>>> minor compaction is that a major one processes deletes. You should
>>> really schedule at least daily major compactions. As for your blocking
>>> issue, there are quite a few things you would want to look at:
>>>
>>> How often are you flushing?
>>>
>>> How often are you force flushing from HLog rolls?
>>>
>>> What size are your flushes?
>>>
>>> What does your region count look like, as that can affect your flush
>>> size?
>>>
>>> etc.
>>>
>>> When I see HBase blocking constantly, it is usually a sign that you need
>>> to do some fine-grained tuning.
>>>
>>> On Mon, Feb 4, 2013 at 7:45 AM, kzurek <kzurek@> wrote:
>>>
>>>> I'm facing some issues related to major compaction.
>>>> I've disabled major compaction and I am not triggering it manually, but
>>>> when I load data into a selected region, I see that the compaction
>>>> queue is growing and that major compaction ('Large Compaction' in the
>>>> logs) is being triggered quite often (mainly due to the cacheFlusher).
>>>> Moreover, I've noticed that my client app gets timeouts while putting
>>>> data into the cluster (this happens when the memstore flusher tries to
>>>> dump the memstore contents to a store file but cannot, because there
>>>> are too many store files); a drop in the data rate, which in this case
>>>> is expected, is also noticeable. To me it looks like the compaction
>>>> process is not fast enough compared to the incoming data rate (or maybe
>>>> something else?) and ends up blocking the update process.
>>>>
>>>> Basic information:
>>>> HBase Version: 0.92.1, r1298924
>>>> Hadoop Version: 1.0.3, r1335192
>>>>
>>>> Logs:
>>>> 2013-02-01 15:43:14,627 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Large Compaction requested: regionName=test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., storeName=data, fileCount=3, fileSize=478.3m (249.8m, 113.7m, 114.7m), priority=-3, time=1051078047102762; Because: regionserver60020.cacheFlusher; compaction_queue=(48:0), split_queue=0
>>>> 2013-02-01 15:43:14,627 DEBUG org.apache.hadoop.hbase.regionserver.Store: 3b710693d6314c2a987b07dd82451158 - tag: no store files to compact
>>>> 2013-02-01 15:43:14,709 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":61908,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@57a56081), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"192.168.1.68:49419","starttimems":1359729732799,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
>>>> 2013-02-01 15:43:14,710 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@57a56081), rpc version=1, client version=29, methodsFingerPrint=54742778 from 192.168.1.68:49419: output error
>>>> 2013-02-01 15:43:14,710 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020 caught: java.nio.channels.ClosedChannelException
>>>>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>>>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>>>>         at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1710)
>>>>         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1653)
>>>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:924)
>>>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1003)
>>>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:409)
>>>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1346)
>>>>
>>>> 2013-02-01 15:43:19,397 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.
>>>> 2013-02-01 15:43:19,397 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. has too many store files; delaying flush up to 90000ms
>>>> 2013-02-01 15:43:19,397 DEBUG org.apache.hadoop.hbase.regionserver.Store: 3b710693d6314c2a987b07dd82451158 - data: no store files to compact
>>>> 2013-02-01 15:43:19,397 DEBUG org.apache.hadoop.hbase.regionserver.Store: 3b710693d6314c2a987b07dd82451158 - tag: no store files to compact
>>>> 2013-02-01 15:43:55,693 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 10 on 60020' on region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.: memstore size 1.5g is >= than blocking 1.5g size
>>>> 2013-02-01 15:44:49,452 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Waited 90055ms on a compaction to clean up 'too many store files'; waited long enough... proceeding with flush of test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.
>>>> 2013-02-01 15:44:49,452 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., current region memstore size 1.5g
>>>> 2013-02-01 15:44:49,453 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., commencing wait for mvcc, flushsize=1624094576
>>>> 2013-02-01 15:44:50,014 DEBUG org.apache.hadoop.hbase.io.hfile.HFileWriterV2: Initialized with CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false]
>>>> 2013-02-01 15:44:55,082 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/.tmp/7ddee02376664ac387eb3c786c541ed0 to hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/data/7ddee02376664ac387eb3c786c541ed0
>>>> 2013-02-01 15:44:55,086 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/data/7ddee02376664ac387eb3c786c541ed0, entries=7157100, sequenceid=58642, memsize=1.5g, filesize=113.8m
>>>> 2013-02-01 15:44:55,087 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. 'IPC Server handler 10 on 60020'
>>>> 2013-02-01 15:44:55,087 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~1.5g/1624094576, currentsize=0.0/0 for region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. in 5635ms, sequenceid=58642, compaction requested=false
>>>
>>> --
>>> Kevin O'Dell
>>> Customer Operations Engineer, Cloudera
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera

--
View this message in context: http://apache-hbase.679495.n3.nabble.com/MemStoreFlusher-region-has-too-many-store-files-client-timeout-tp4037887p4037960.html
Sent from the HBase User mailing list archive at Nabble.com.