Is something wrong in the snappy library? Have you tried running without compression?
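If you want to rule Snappy out, you could temporarily switch the family to no compression and see whether the flush completes. A rough, untested sketch against the 1.0.x Java client API (the table name 'qtrace' and family 't' are read off the store file paths in your log, so please double-check them):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.io.compress.Compression;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DisableSnappyOnQtrace {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          TableName table = TableName.valueOf("qtrace");
          // Fetch the current family descriptor so other settings are preserved.
          HTableDescriptor desc = admin.getTableDescriptor(table);
          HColumnDescriptor family = desc.getFamily(Bytes.toBytes("t"));
          // Turn compression off; store files written from now on skip the
          // native Snappy codec entirely.
          family.setCompressionType(Compression.Algorithm.NONE);
          admin.modifyColumn(table, family);
        }
      }
    }

The one-line equivalent from the hbase shell is: alter 'qtrace', {NAME => 't', COMPRESSION => 'NONE'}. Existing store files stay as they are until a compaction rewrites them; only newly flushed files are affected.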
2016-06-03 11:13 GMT+08:00 吴国泉wgq <wgq...@qunar.com>:
> HI STACK:
>
> 1. The log is very large, so I picked out some of it, but it does not seem to provide much valuable information. The region that can't flush is qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
>
> When the flush thread works well, the log looks like this:
> 2016-05-24 12:38:27,071 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 16681
> 2016-05-24 12:38:37,071 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 8684
> 2016-05-24 12:38:43,753 INFO [MemStoreFlusher.2] regionserver.HRegion: Started memstore flush for qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., current region memstore size 305.3 K
> 2016-05-24 12:38:43,753 WARN [MemStoreFlusher.2] wal.FSHLog: Couldn't find oldest seqNum for the region we are about to flush: [dd8f92e3c161a8534b30ab17c28ae8be]
> 2016-05-24 12:38:43,816 INFO [MemStoreFlusher.2] regionserver.DefaultStoreFlusher: Flushed, sequenceid=54259, memsize=305.3 K, hasBloomFilter=true, into tmp file hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/.tmp/fddbb05945354d5cbdae4afd24e5bb9d
> 2016-05-24 12:38:43,822 DEBUG [MemStoreFlusher.2] regionserver.HRegionFileSystem: Committing store file hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/.tmp/fddbb05945354d5cbdae4afd24e5bb9d as hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/t/fddbb05945354d5cbdae4afd24e5bb9d
> 2016-05-24 12:38:43,837 INFO [MemStoreFlusher.2] regionserver.HStore: Added hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/t/fddbb05945354d5cbdae4afd24e5bb9d, entries=108, sequenceid=54259, filesize=68.3 K
> 2016-05-24 12:38:43,837 INFO [MemStoreFlusher.2] regionserver.HRegion: Finished memstore flush of ~305.3 K/312664, currentsize=0/0 for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. in 84ms, sequenceid=54259, compaction requested=true
>
> And when the flush thread does not work well, the log just shows, over and over:
> 2016-05-25 14:57:02,588 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 18068
> 2016-05-25 14:57:12,587 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 13165
> 2016-05-25 14:57:20,656 DEBUG [MemStoreFlusher.36] regionserver.HRegion: NOT flushing memstore for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true, writesEnabled=true
> 2016-05-25 14:57:22,587 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 5526
> 2016-05-25 14:57:28,113 DEBUG [MemStoreFlusher.34] regionserver.HRegion: NOT flushing memstore for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true, writesEnabled=true
> 2016-05-25 14:57:32,587 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 8178
> 2016-05-25 14:57:40,767 DEBUG [MemStoreFlusher.9] regionserver.HRegion: NOT flushing memstore for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true, writesEnabled=true
> 2016-05-25 14:57:42,587 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 22068
> 2016-05-25 14:57:52,587 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 5492
> 2016-05-25 14:58:02,587 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 10472
> 2016-05-25 14:58:04,655 DEBUG [MemStoreFlusher.23] regionserver.HRegion: NOT flushing memstore for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true, writesEnabled=true
> 2016-05-25 14:58:12,587 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 8435
>
> I checked the log from before the region stopped flushing, but found nothing out of the ordinary.
> By the way, this table has a heavy write load, so I turned off the WAL.
>
> 2. It is that the disk is filling faster than on other machines, not what iostat shows.
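Just to be sure we are talking about the same thing: by "turn off the WAL" above I assume you mean your writers skip the WAL per edit, roughly like the sketch below (untested, against the 1.0.x Java client API; the row key and qualifier are made-up placeholders, and the family 't' is taken from your store file paths). A table-level DURABILITY attribute set to SKIP_WAL would have the same effect. Either way, with flushes wedged, those edits exist only in the memstore until a flush succeeds.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Durability;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SkipWalWrite {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("qtrace"))) {
          // Hypothetical row key and qualifier, just for illustration.
          Put put = new Put(Bytes.toBytes("example-row"));
          put.addColumn(Bytes.toBytes("t"), Bytes.toBytes("q"), Bytes.toBytes("value"));
          // Skip the WAL for this edit: nothing is appended to the FSHLog,
          // so the edit lives only in the memstore until a flush persists it.
          put.setDurability(Durability.SKIP_WAL);
          table.put(put);
        }
      }
    }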
> 3. This is the jstack output for the memstore flusher threads:
>
> "MemStoreFlusher.14" prio=10 tid=0x00007fe69487d800 nid=0x387d waiting on condition [0x00007fce7118c000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>      at sun.misc.Unsafe.park(Native Method)
>      - parking to wait for <0x00007fd09e031470> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>      at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>      at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>      at java.util.concurrent.DelayQueue.poll(DelayQueue.java:264)
>      at java.util.concurrent.DelayQueue.poll(DelayQueue.java:68)
>      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:233)
>      at java.lang.Thread.run(Thread.java:744)
>
> "MemStoreFlusher.13" prio=10 tid=0x00007fe69487b800 nid=0x387c runnable [0x00007fce7128c000]
>    java.lang.Thread.State: RUNNABLE
>      at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressBytesDirect(Native Method)
>      at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:232)
>      - locked <0x00007fcfe3ba1190> (a org.apache.hadoop.io.compress.snappy.SnappyCompressor)
>      at org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149)
>      at org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142)
>      at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncoding(HFileBlockDefaultEncodingContext.java:238)
>      at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncodingWithBlockType(HFileBlockDefaultEncodingContext.java:149)
>      at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlock(HFileBlock.java:783)
>      at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.ensureBlockReady(HFileBlock.java:761)
>      at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlockAndWriteHeaderAndData(HFileBlock.java:878)
>      at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:864)
>      at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateBlock(HFileBlockIndex.java:948)
>      at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateLevel(HFileBlockIndex.java:931)
>      at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIndexBlocks(HFileBlockIndex.java:840)
>      at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:367)
>      at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
>      at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
>      at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:70)
>      - locked <0x00007fd8d1778be8> (a java.lang.Object)
>      at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:829)
>      at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1989)
>      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1779)
>      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1662)
>      at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1575)
>      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:461)
>      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:435)
>      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:66)
>      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:253)
>      at java.lang.Thread.run(Thread.java:744)
>
> "MemStoreFlusher.12" prio=10 tid=0x00007fe694879800 nid=0x387b waiting on condition [0x00007fce7138e000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>      at sun.misc.Unsafe.park(Native Method)
>      - parking to wait for <0x00007fd09e031470> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>      at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>      at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>      at java.util.concurrent.DelayQueue.poll(DelayQueue.java:264)
>      at java.util.concurrent.DelayQueue.poll(DelayQueue.java:68)
>      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:233)
>      at java.lang.Thread.run(Thread.java:744)
>
> On Jun 3, 2016, at 12:08 AM, Stack <st...@duboce.net> wrote:
>
> On Wed, Jun 1, 2016 at 3:03 AM, 吴国泉wgq <wgq...@qunar.com> wrote:
>
> hi all:
>
> 1. Is region always on same machine or do you see this phenomenon on more than one machine?
>    Not always on the same machine, but always on the machine which holds the first region of the table (the only table whose first region cannot flush; when I restart the regionserver, that first region moves to another machine).
>
> 2. The RS is ABORTED? Because it can't flush? Is that what it says in the log? Can we see the log message around the ABORT?
>    Sorry, I did not express that clearly. It is the MemStore of the first region that can't flush, not the RS.
>    The RS log looks like this:
>    INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a delay of 9452
>    DEBUG [MemStoreFlusher.39] regionserver.HRegion: NOT flushing memstore for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true, writesEnabled=true
>
>    And the web UI shows:
>    Aborted flushing. (Not flushing since already flushing.) But the flusher thread never finishes.
>
> Can you back up in the log for this region? It looks like the first flush attempt failed or never completed. Perhaps there is a clue in the log message from the first attempt as to why the failure. The above log seems to be from a later stage, when we have gotten into the unhealthy state where we can't flush the first region memstore.
>
> 3. 100% disk only? The CPU does not go up too? Can we see a thread dump? Do jstack -l PID if you can
>    Only the disk usage (command: df -h) increases faster than on the other machines, not the IO usage. CPU usage is very low.
>
> Sorry. Your disk is filling faster than on other machines? Or the iostat shows the disk is doing more seeks than on other machines?
>
> 4. Any other special configurations going on on this install? Phoenix or use of Coprocessors?
>    No, no Phoenix. Only the AccessController coprocessor.
>
> 5. If you thread dump a few times, it is always stuck here?
>    Yes, always stuck here. Here is the jstack log (in this dump, it is MemStoreFlusher.13 that can't flush).
> PS: As I see it, I think it is the first region failing to flush that causes the problem. But I do not know why it can't flush, or why only the first region of this one table has the problem.
>
> Please include pointer to the stack trace. You can't have attachments on mail to this list.
>
> Thank you,
> St.Ack
>
> On Jun 1, 2016, at 3:10 AM, Stack <st...@duboce.net> wrote:
>
> On Mon, May 30, 2016 at 7:03 PM, 聪聪 <175998...@qq.com> wrote:
>
> HI ALL: Recently I met a strange problem: the first region's memstore of one table (the only one) is often blocked when flushing. (Both versions: hbase-0.98.6-cdh5.2.0 and 1.0.1.1; I upgraded from 0.98 to 1.0.1.1 hoping to solve the problem, but it did not help.)
>
> Is region always on same machine or do you see this phenomenon on more than one machine?
>
> On the web UI, I can see the status shows: ABORTED (since XX sec ago), Not flushing since already flushing.
>
> The RS is ABORTED? Because it can't flush? Is that what it says in the log? Can we see the log message around the ABORT?
>
> But it never flushes successfully, and the disk usage grows very high. Other regionservers use just 30% of their disk capacity, while the problematic region server climbs to 100% unless I kill the process.
>
> 100% disk only? The CPU does not go up too?
> Can we see a thread dump? Do jstack -l PID if you can.
>
> What's more, the region server process cannot be shut down normally; every time I have to use the kill -9 command.
> I checked the log; the reason it cannot flush is that one of the MemStoreFlusher threads cannot exit.
> The log is like below:
> 2016-05-29 19:54:11,982 INFO [MemStoreFlusher.13] regionserver.MemStoreFlusher: MemStoreFlusher.13 exiting
> 2016-05-29 19:54:13,016 INFO [MemStoreFlusher.6] regionserver.MemStoreFlusher: MemStoreFlusher.6 exiting
> 2016-05-29 19:54:13,260 INFO [MemStoreFlusher.16] regionserver.MemStoreFlusher: MemStoreFlusher.16 exiting
> 2016-05-29 19:54:16,032 INFO [MemStoreFlusher.33] regionserver.MemStoreFlusher: MemStoreFlusher.33 exiting
> 2016-05-29 19:54:16,341 INFO [MemStoreFlusher.25] regionserver.MemStoreFlusher: MemStoreFlusher.25 exiting
> 2016-05-29 19:54:16,620 INFO [MemStoreFlusher.31] regionserver.MemStoreFlusher: MemStoreFlusher.31 exiting
> 2016-05-29 19:54:16,621 INFO [MemStoreFlusher.29] regionserver.MemStoreFlusher: MemStoreFlusher.29 exiting
> 2016-05-29 19:54:16,621 INFO [MemStoreFlusher.23] regionserver.MemStoreFlusher: MemStoreFlusher.23 exiting
> 2016-05-29 19:54:16,621 INFO [MemStoreFlusher.32] regionserver.MemStoreFlusher: MemStoreFlusher.32 exiting
> 2016-05-29 19:54:16,621 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: MemStoreFlusher.1 exiting
> 2016-05-29 19:54:16,621 INFO [MemStoreFlusher.38] regionserver.MemStoreFlusher: MemStoreFlusher.38 exiting
> 2016-05-29 19:54:16,621 INFO [MemStoreFlusher.10] regionserver.MemStoreFlusher: MemStoreFlusher.10 exiting
> 2016-05-29 19:54:16,620 INFO [MemStoreFlusher.7] regionserver.MemStoreFlusher: MemStoreFlusher.7 exiting
> 2016-05-29 19:54:16,620 INFO [MemStoreFlusher.12] regionserver.MemStoreFlusher: MemStoreFlusher.12 exiting
> 2016-05-29 19:54:16,620 INFO [MemStoreFlusher.21] regionserver.MemStoreFlusher: MemStoreFlusher.21 exiting
> 2016-05-29 19:54:16,622 INFO [MemStoreFlusher.37] regionserver.MemStoreFlusher: MemStoreFlusher.37 exiting
> 2016-05-29 19:54:16,622 INFO [MemStoreFlusher.24] regionserver.MemStoreFlusher: MemStoreFlusher.24 exiting
> 2016-05-29 19:54:16,806 INFO [MemStoreFlusher.39] regionserver.MemStoreFlusher: MemStoreFlusher.39 exiting
> 2016-05-29 19:54:16,908 INFO [MemStoreFlusher.17] regionserver.MemStoreFlusher: MemStoreFlusher.17 exiting
> 2016-05-29 19:54:16,952 INFO [MemStoreFlusher.3] regionserver.MemStoreFlusher: MemStoreFlusher.3 exiting
> 2016-05-29 19:54:16,952 INFO [MemStoreFlusher.27] regionserver.MemStoreFlusher: MemStoreFlusher.27 exiting
> 2016-05-29 19:54:16,952 INFO [MemStoreFlusher.18] regionserver.MemStoreFlusher: MemStoreFlusher.18 exiting
> 2016-05-29 19:54:16,952 INFO [MemStoreFlusher.4] regionserver.MemStoreFlusher: MemStoreFlusher.4 exiting
> 2016-05-29 19:54:16,952 INFO [MemStoreFlusher.19] regionserver.MemStoreFlusher: MemStoreFlusher.19 exiting
> 2016-05-29 19:54:16,952 INFO [MemStoreFlusher.22] regionserver.MemStoreFlusher: MemStoreFlusher.22 exiting
> 2016-05-29 19:54:16,952 INFO [MemStoreFlusher.9] regionserver.MemStoreFlusher: MemStoreFlusher.9 exiting
> 2016-05-29 19:54:17,137 INFO [MemStoreFlusher.36] regionserver.MemStoreFlusher: MemStoreFlusher.36 exiting
> 2016-05-29 19:54:17,263 INFO [MemStoreFlusher.11] regionserver.MemStoreFlusher: MemStoreFlusher.11 exiting
> 2016-05-29 19:54:17,499 INFO [MemStoreFlusher.14] regionserver.MemStoreFlusher: MemStoreFlusher.14 exiting
> 2016-05-29 19:54:17,499 INFO [MemStoreFlusher.2] regionserver.MemStoreFlusher: MemStoreFlusher.2 exiting
> 2016-05-29 19:54:17,499 INFO [MemStoreFlusher.26] regionserver.MemStoreFlusher: MemStoreFlusher.26 exiting
> 2016-05-29 19:54:17,499 INFO [MemStoreFlusher.0] regionserver.MemStoreFlusher: MemStoreFlusher.0 exiting
> 2016-05-29 19:54:17,499 INFO [MemStoreFlusher.30] regionserver.MemStoreFlusher: MemStoreFlusher.30 exiting
> 2016-05-29 19:54:17,499 INFO [MemStoreFlusher.28] regionserver.MemStoreFlusher: MemStoreFlusher.28 exiting
>
> Here is the jstack output from when I try to shut down the region server process normally.
> We can see that MemStoreFlusher.34 has a problem and cannot exit normally.
> (By the way, we can see it is locked in snappy, but it is not always locked in snappy; sometimes it is locked somewhere else, so I think snappy is OK. Here I just want to show that the MemStoreFlusher has a problem.)
>
> Any other special configurations going on on this install? Phoenix or use of Coprocessors?
> "MemStoreFlusher.34" prio=10 tid=0x00007f20b8939000 nid=0x511b runnable [0x00007f08941c3000]
>    java.lang.Thread.State: RUNNABLE
>      at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressBytesDirect(Native Method)
>      at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:232)
>      - locked <0x00007f0a5f85c430> (a org.apache.hadoop.io.compress.snappy.SnappyCompressor)
>      at org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149)
>      at org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142)
>      at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncoding(HFileBlockDefaultEncodingContext.java:238)
>      at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncodingWithBlockType(HFileBlockDefaultEncodingContext.java:149)
>      at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlock(HFileBlock.java:783)
>      at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.ensureBlockReady(HFileBlock.java:761)
>      at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlockAndWriteHeaderAndData(HFileBlock.java:878)
>      at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:864)
>      at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateBlock(HFileBlockIndex.java:948)
>      at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateLevel(HFileBlockIndex.java:931)
>      at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIndexBlocks(HFileBlockIndex.java:840)
>      at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:367)
>      at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
>      at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
>      at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:70)
>      - locked <0x00007f09fdcccda0> (a java.lang.Object)
>      at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:829)
>      at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1989)
>      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1779)
>      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1662)
>      at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1575)
>      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:461)
>      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:435)
>      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:66)
>      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:253)
>      at java.lang.Thread.run(Thread.java:744)
>
> If you thread dump a few times, it is always stuck here?
> Thanks,
> St.Ack
>
> "MemStoreFlusher.33" prio=10 tid=0x00007f20b8937000 nid=0x511a waiting on condition [0x00007f08942c5000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>      at sun.misc.Unsafe.park(Native Method)
>      - parking to wait for <0x00007f09b8e20f80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>      at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>      at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>      at java.util.concurrent.DelayQueue.poll(DelayQueue.java:264)
>      at java.util.concurrent.DelayQueue.poll(DelayQueue.java:68)
>      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:233)
>      at java.lang.Thread.run(Thread.java:744)
>
> I do not know why the memstore cannot flush.
> Is this a bug?