I'm facing some issues regarding to major compaction. I've disabled major compaction and it is not triggered manually, but when I'm loading data to selected region, I saw that major compaction queue is growing and it is being triggered ('Large Compaction' in logs) quite often (mainly due to cacheFlusher). Moreover, I've noticed that my clients app gets timeout while putting data into the cluster (happens when memory store flusher is trying to dump memory content to store file, but it cannot due to too many store files), also drop in data rate, which in this case is obvious, is noticeable. For me, it looks like compaction process is not fast enough comparing to incoming rate of data or ... maybe something else?? and finally is blocking the update process.
Basic information: HBase Version: 0.92.1, r1298924 Hadoop Version: 1.0.3, r1335192 Logs: 2013-02-01 15:43:14,627 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Large Compaction requested: regionName=test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., storeName=data, fileCount=3, fileSize=478.3m (249.8m, 113.7m, 114.7m), priority=-3, time=1051078047102762; Because: regionserver60020.cacheFlusher; compaction_queue=(48:0), split_queue=0 2013-02-01 15:43:14,627 DEBUG org.apache.hadoop.hbase.regionserver.Store: 3b710693d6314c2a987b07dd82451158 - tag: no store files to compact 2013-02-01 15:43:14,709 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":61908,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@57a56081), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"192.168.1.68:49419","starttimems":1359729732799,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"} 2013-02-01 15:43:14,710 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@57a56081), rpc version=1, client version=29, methodsFingerPrint=54742778 from 192.168.1.68:49419: output error 2013-02-01 15:43:14,710 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020 caught: java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324) at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1710) at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1653) at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:924) at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1003) at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:409) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1346) 2013-02-01 15:43:19,397 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. 2013-02-01 15:43:19,397 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. has too many store files; delaying flush up to 90000ms 2013-02-01 15:43:19,397 DEBUG org.apache.hadoop.hbase.regionserver.Store: 3b710693d6314c2a987b07dd82451158 - data: no store files to compact 2013-02-01 15:43:19,397 DEBUG org.apache.hadoop.hbase.regionserver.Store: 3b710693d6314c2a987b07dd82451158 - tag: no store files to compact 2013-02-01 15:43:55,693 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 10 on 60020' on region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.: memstore size 1.5g is >= than blocking 1.5g size 2013-02-01 15:44:49,452 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Waited 90055ms on a compaction to clean up 'too many store files'; waited long enough... proceeding with flush of test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. 2013-02-01 15:44:49,452 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., current region memstore size 1.5g 2013-02-01 15:44:49,453 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., commencing wait for mvcc, flushsize=1624094576 2013-02-01 15:44:50,014 DEBUG org.apache.hadoop.hbase.io.hfile.HFileWriterV2: Initialized with CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false] 2013-02-01 15:44:55,082 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/.tmp/7ddee02376664ac387eb3c786c541ed0 to hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/data/7ddee02376664ac387eb3c786c541ed0 2013-02-01 15:44:55,086 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/data/7ddee02376664ac387eb3c786c541ed0, entries=7157100, sequenceid=58642, memsize=1.5g, filesize=113.8m 2013-02-01 15:44:55,087 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. 'IPC Server handler 10 on 60020' 2013-02-01 15:44:55,087 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~1.5g/1624094576, currentsize=0.0/0 for region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. in 5635ms, sequenceid=58642, compaction requested=false -- View this message in context: http://apache-hbase.679495.n3.nabble.com/MemStoreFlusher-region-has-too-many-store-files-client-timeout-tp4037887.html Sent from the HBase User mailing list archive at Nabble.com.