HI STACK:

   1. The log is very large, so I picked out the relevant parts, but they do not
seem to provide much useful information. The region that can't flush is
qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.

      When the flush thread works correctly, the log looks like this:
      2016-05-24 12:38:27,071 INFO  [regionserver60020.periodicFlusher] 
regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush 
for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a 
delay of 16681
2016-05-24 12:38:37,071 INFO  [regionserver60020.periodicFlusher] 
regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush 
for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a 
delay of 8684
2016-05-24 12:38:43,753 INFO  [MemStoreFlusher.2] regionserver.HRegion: Started 
memstore flush for qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., 
current region memstore size 305.3 K
2016-05-24 12:38:43,753 WARN  [MemStoreFlusher.2] wal.FSHLog: Couldn't find 
oldest seqNum for the region we are about to flush: 
[dd8f92e3c161a8534b30ab17c28ae8be]
2016-05-24 12:38:43,816 INFO  [MemStoreFlusher.2] 
regionserver.DefaultStoreFlusher: Flushed, sequenceid=54259, memsize=305.3 K, 
hasBloomFilter=true, into tmp file 
hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/.tmp/fddbb05945354d5cbdae4afd24e5bb9d
2016-05-24 12:38:43,822 DEBUG [MemStoreFlusher.2] 
regionserver.HRegionFileSystem: Committing store file 
hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/.tmp/fddbb05945354d5cbdae4afd24e5bb9d
 as 
hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/t/fddbb05945354d5cbdae4afd24e5bb9d
2016-05-24 12:38:43,837 INFO  [MemStoreFlusher.2] regionserver.HStore: Added 
hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/t/fddbb05945354d5cbdae4afd24e5bb9d,
 entries=108, sequenceid=54259, filesize=68.3 K
2016-05-24 12:38:43,837 INFO  [MemStoreFlusher.2] regionserver.HRegion: 
Finished memstore flush of ~305.3 K/312664, currentsize=0/0 for region 
qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. in 84ms, 
sequenceid=54259, compaction requested=true
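The varying "after a delay of N" values come from the periodic flusher adding a random delay to each request, so that many regions that become flush-eligible at the same moment do not all flush at once. A simplified, self-contained sketch of that scheduling idea (the 20-second range here is an assumption inferred from the delays seen in this log, not a confirmed HBase constant):

```java
import java.util.Random;

public class PeriodicFlushJitter {
    // Models the idea behind HBase's periodic flusher: each flush request is
    // scheduled with a random delay so flushes of many idle regions spread out.
    static final int RANGE_OF_DELAY_MS = 20_000; // assumed jitter upper bound

    static int randomDelay(Random rng) {
        // Produces the "after a delay of <n>" milliseconds seen in the log.
        return rng.nextInt(RANGE_OF_DELAY_MS);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int i = 0; i < 5; i++) {
            System.out.println("requesting flush after a delay of " + randomDelay(rng));
        }
    }
}
```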

      And when the flush thread is not working, the log just repeats:
2016-05-25 14:57:02,588 INFO  [regionserver60020.periodicFlusher] 
regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush 
for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a 
delay of 18068
2016-05-25 14:57:12,587 INFO  [regionserver60020.periodicFlusher] 
regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush 
for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a 
delay of 13165
2016-05-25 14:57:20,656 DEBUG [MemStoreFlusher.36] regionserver.HRegion: NOT 
flushing memstore for region 
qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true, 
writesEnabled=true
2016-05-25 14:57:22,587 INFO  [regionserver60020.periodicFlusher] 
regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush 
for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a 
delay of 5526
2016-05-25 14:57:28,113 DEBUG [MemStoreFlusher.34] regionserver.HRegion: NOT 
flushing memstore for region 
qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true, 
writesEnabled=true
2016-05-25 14:57:32,587 INFO  [regionserver60020.periodicFlusher] 
regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush 
for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a 
delay of 8178
2016-05-25 14:57:40,767 DEBUG [MemStoreFlusher.9] regionserver.HRegion: NOT 
flushing memstore for region 
qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true, 
writesEnabled=true
2016-05-25 14:57:42,587 INFO  [regionserver60020.periodicFlusher] 
regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush 
for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a 
delay of 22068
2016-05-25 14:57:52,587 INFO  [regionserver60020.periodicFlusher] 
regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush 
for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a 
delay of 5492
2016-05-25 14:58:02,587 INFO  [regionserver60020.periodicFlusher] 
regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush 
for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a 
delay of 10472
2016-05-25 14:58:04,655 DEBUG [MemStoreFlusher.23] regionserver.HRegion: NOT 
flushing memstore for region 
qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true, 
writesEnabled=true
2016-05-25 14:58:12,587 INFO  [regionserver60020.periodicFlusher] 
regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush 
for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. after a 
delay of 8435

     I checked the log from before the region stopped flushing, but found
nothing out of the ordinary.
     By the way, this table has a heavy write load, so I turned off the WAL.
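The "NOT flushing memstore for region ..., flushing=true" line suggests a guard in the region's flush path: a per-region flushing flag is set when a flush starts and cleared when it finishes, and any new flush request arriving while the flag is set is refused. If the in-flight flush never completes, or an error path skips the reset, the flag stays true and the region can never flush again. A minimal plain-Java model of that kind of guard (illustrative only, not HBase's actual code; names are made up):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class FlushGuard {
    // Models a per-region "flushing" flag: only one flush may run at a time.
    private final AtomicBoolean flushing = new AtomicBoolean(false);

    /** Returns true if this caller won the right to flush, false if one is in flight. */
    public boolean tryStartFlush() {
        if (!flushing.compareAndSet(false, true)) {
            System.out.println("NOT flushing memstore for region, flushing=true");
            return false;
        }
        return true;
    }

    /** Must run even on failure; if it is ever skipped, the region is stuck forever. */
    public void finishFlush() {
        flushing.set(false);
    }
}
```

The symptom in the log matches the case where tryStartFlush keeps returning false because the first, still-running flush never reached finishFlush.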


2. It is that the disk is filling faster than on other machines, not what
iostat shows.

3. Here is the jstack output for the MemStoreFlusher threads:
"MemStoreFlusher.14" prio=10 tid=0x00007fe69487d800 nid=0x387d waiting on condition [0x00007fce7118c000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00007fd09e031470> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.DelayQueue.poll(DelayQueue.java:264)
    at java.util.concurrent.DelayQueue.poll(DelayQueue.java:68)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:233)
    at java.lang.Thread.run(Thread.java:744)

"MemStoreFlusher.13" prio=10 tid=0x00007fe69487b800 nid=0x387c runnable [0x00007fce7128c000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressBytesDirect(Native Method)
    at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:232)
    - locked <0x00007fcfe3ba1190> (a org.apache.hadoop.io.compress.snappy.SnappyCompressor)
    at org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149)
    at org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142)
    at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncoding(HFileBlockDefaultEncodingContext.java:238)
    at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncodingWithBlockType(HFileBlockDefaultEncodingContext.java:149)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlock(HFileBlock.java:783)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.ensureBlockReady(HFileBlock.java:761)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlockAndWriteHeaderAndData(HFileBlock.java:878)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:864)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateBlock(HFileBlockIndex.java:948)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateLevel(HFileBlockIndex.java:931)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIndexBlocks(HFileBlockIndex.java:840)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:367)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
    at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:70)
    - locked <0x00007fd8d1778be8> (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:829)
    at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1989)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1779)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1662)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1575)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:461)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:435)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:66)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:253)
    at java.lang.Thread.run(Thread.java:744)

"MemStoreFlusher.12" prio=10 tid=0x00007fe694879800 nid=0x387b waiting on condition [0x00007fce7138e000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00007fd09e031470> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.DelayQueue.poll(DelayQueue.java:264)
    at java.util.concurrent.DelayQueue.poll(DelayQueue.java:68)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:233)
    at java.lang.Thread.run(Thread.java:744)
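The TIMED_WAITING stacks above are the idle flusher threads: as the dump itself shows, each FlushHandler loops polling a shared DelayQueue of flush requests and parks inside DelayQueue.poll(timeout) when nothing is ready. A self-contained sketch of that consumer pattern (simplified; the real handler also wakes up periodically to check memstore pressure):

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

public class FlushQueueDemo {
    /** A flush request that becomes ready after a delay, like a flush queue entry. */
    static class FlushRequest implements Delayed {
        final String region;
        final long readyAtNanos;

        FlushRequest(String region, long delayMs) {
            this.region = region;
            this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
        }

        @Override public long getDelay(TimeUnit unit) {
            return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS),
                                other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        DelayQueue<FlushRequest> queue = new DelayQueue<>();
        queue.put(new FlushRequest("qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.", 50));
        // The handler parks here (TIMED_WAITING) until a request is ready or the timeout fires.
        FlushRequest req = queue.poll(1, TimeUnit.SECONDS);
        System.out.println(req == null ? "no request ready" : "flushing " + req.region);
    }
}
```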

On Jun 3, 2016, at 12:08 AM, Stack <st...@duboce.net> wrote:

On Wed, Jun 1, 2016 at 3:03 AM, 吴国泉wgq <wgq...@qunar.com> wrote:

hi all:

1.Is region always on same machine or do you see this phenomenon on more
than one machine?
      Not always on the same machine, but always on the machine which holds
the first region of the table (the only table whose first region cannot
flush; when I restart the regionserver, the first region moves to another
machine).

2.The RS is ABORTED? Because it can't flush? Is that what it says in the
log? Can we see the log message around the ABORT?
      Sorry, I did not express that clearly. It is the MemStore of the first
region that can't flush, not the RS.
      The RS log looks like this:
      INFO  [regionserver60020.periodicFlusher]
regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
after a delay of 9452
      DEBUG [MemStoreFlusher.39] regionserver.HRegion: NOT flushing
memstore for region
qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true,
writesEnabled=true

      And the web UI shows:
      Aborted flushing (Not flushing since already flushing). But the
flusher thread never finishes.




Can you back up in the log for this region? It looks like the first flush
attempt failed or never completed. Perhaps there is a clue in the log
message from the first attempt as to why the failure. The above log seems
to be from a later stage, when we have gotten into the unhealthy state
where we can't flush the first region memstore.



3.100% disk only? The CPU does not go up too?  Can we see a thread dump?
Do jstack -l  PID if you can
      Only the disk usage (df -h) increases faster than on the other
machines, not the I/O usage. CPU usage is very low.



Sorry. Your disk is filling faster than on other machines? Or the iostat
shows the disk is doing more seeks than on other machines?



4.Any other special configurations going on on this install? Phoenix or
use of Coprocessors?
      No, no Phoenix. Only the AccessController coprocessor.

5.If you thread dump a few times, it is always stuck here?
      Yes, it is always stuck here. Here is the jstack log (in this dump it
is MemStoreFlusher.13 that can't flush).


      PS: As far as I can see, the first region's inability to flush is what
causes the problem. But I do not know why it can't flush, or why only the
first region of this one table is affected.



Please include a pointer to the stack trace. You can't have attachments on
mail to this list.

Thank you,
St.Ack





On Jun 1, 2016, at 3:10 AM, Stack <st...@duboce.net> wrote:

On Mon, May 30, 2016 at 7:03 PM, 聪聪 <175998...@qq.com> wrote:

HI ALL:
     Recently I met a strange problem: the MemStore of the first region of
one table (the only such table) often blocks when flushing. (This happens on
both hbase-0.98.6-cdh5.2.0 and 1.0.1.1; I upgraded from 0.98 to 1.0.1.1
hoping to solve the problem, but it did not help.)



Is region always on same machine or do you see this phenomenon on more than
one machine?



     On the web UI, I can see the status: ABORTED (since XX sec ago),
"Not flushing since already flushing."



The RS is ABORTED? Because it can't flush? Is that what it says in the log?
Can we see the log message around the ABORT?



     But the flush never succeeds, and disk usage climbs very high. Other
regionservers use just 30% of their disk capacity, while the problematic
regionserver's usage rises to 100% unless I kill the process.



100% disk only? The CPU does not go up too?

Can we see a thread dump? Do jstack -l  PID if you can.



     What's more, the regionserver process cannot be shut down normally;
every time I have to use kill -9.
     I checked the log; the reason is that one of the MemStoreFlusher
threads cannot exit.
     The log looks like this:
     2016-05-29 19:54:11,982 INFO  [MemStoreFlusher.13]
regionserver.MemStoreFlusher: MemStoreFlusher.13 exiting
2016-05-29 19:54:13,016 INFO  [MemStoreFlusher.6]
regionserver.MemStoreFlusher: MemStoreFlusher.6 exiting
2016-05-29 19:54:13,260 INFO  [MemStoreFlusher.16]
regionserver.MemStoreFlusher: MemStoreFlusher.16 exiting
2016-05-29 19:54:16,032 INFO  [MemStoreFlusher.33]
regionserver.MemStoreFlusher: MemStoreFlusher.33 exiting
2016-05-29 19:54:16,341 INFO  [MemStoreFlusher.25]
regionserver.MemStoreFlusher: MemStoreFlusher.25 exiting
2016-05-29 19:54:16,620 INFO  [MemStoreFlusher.31]
regionserver.MemStoreFlusher: MemStoreFlusher.31 exiting
2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.29]
regionserver.MemStoreFlusher: MemStoreFlusher.29 exiting
2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.23]
regionserver.MemStoreFlusher: MemStoreFlusher.23 exiting
2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.32]
regionserver.MemStoreFlusher: MemStoreFlusher.32 exiting
2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.1]
regionserver.MemStoreFlusher: MemStoreFlusher.1 exiting
2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.38]
regionserver.MemStoreFlusher: MemStoreFlusher.38 exiting
2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.10]
regionserver.MemStoreFlusher: MemStoreFlusher.10 exiting
2016-05-29 19:54:16,620 INFO  [MemStoreFlusher.7]
regionserver.MemStoreFlusher: MemStoreFlusher.7 exiting
2016-05-29 19:54:16,620 INFO  [MemStoreFlusher.12]
regionserver.MemStoreFlusher: MemStoreFlusher.12 exiting
2016-05-29 19:54:16,620 INFO  [MemStoreFlusher.21]
regionserver.MemStoreFlusher: MemStoreFlusher.21 exiting
2016-05-29 19:54:16,622 INFO  [MemStoreFlusher.37]
regionserver.MemStoreFlusher: MemStoreFlusher.37 exiting
2016-05-29 19:54:16,622 INFO  [MemStoreFlusher.24]
regionserver.MemStoreFlusher: MemStoreFlusher.24 exiting
2016-05-29 19:54:16,806 INFO  [MemStoreFlusher.39]
regionserver.MemStoreFlusher: MemStoreFlusher.39 exiting
2016-05-29 19:54:16,908 INFO  [MemStoreFlusher.17]
regionserver.MemStoreFlusher: MemStoreFlusher.17 exiting
2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.3]
regionserver.MemStoreFlusher: MemStoreFlusher.3 exiting
2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.27]
regionserver.MemStoreFlusher: MemStoreFlusher.27 exiting
2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.18]
regionserver.MemStoreFlusher: MemStoreFlusher.18 exiting
2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.4]
regionserver.MemStoreFlusher: MemStoreFlusher.4 exiting
2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.19]
regionserver.MemStoreFlusher: MemStoreFlusher.19 exiting
2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.22]
regionserver.MemStoreFlusher: MemStoreFlusher.22 exiting
2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.9]
regionserver.MemStoreFlusher: MemStoreFlusher.9 exiting
2016-05-29 19:54:17,137 INFO  [MemStoreFlusher.36]
regionserver.MemStoreFlusher: MemStoreFlusher.36 exiting
2016-05-29 19:54:17,263 INFO  [MemStoreFlusher.11]
regionserver.MemStoreFlusher: MemStoreFlusher.11 exiting
2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.14]
regionserver.MemStoreFlusher: MemStoreFlusher.14 exiting
2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.2]
regionserver.MemStoreFlusher: MemStoreFlusher.2 exiting
2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.26]
regionserver.MemStoreFlusher: MemStoreFlusher.26 exiting
2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.0]
regionserver.MemStoreFlusher: MemStoreFlusher.0 exiting
2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.30]
regionserver.MemStoreFlusher: MemStoreFlusher.30 exiting
2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.28]
regionserver.MemStoreFlusher: MemStoreFlusher.28 exiting


Here is the jstack output from when I tried to shut down the regionserver
process normally. We can see that MemStoreFlusher.34 has a problem and
cannot exit.
(By the way, it shows "locked" inside Snappy here, but it is not always
stuck in Snappy; sometimes it is stuck elsewhere, so I think Snappy itself
is fine. I just want to show that the MemStoreFlusher is stuck.)
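One plausible reading of why kill -9 is needed: on shutdown the regionserver asks each flusher thread to stop (interrupt and join), but a thread busy deep inside a native compression call neither checks the interrupt flag nor returns, so the join waits forever. A tiny self-contained demonstration of that general effect (nothing HBase-specific; the busy loop stands in for the stuck native call):

```java
public class StuckThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a thread stuck in a native call: it never checks interruption.
        Thread stuck = new Thread(() -> { long x = 0; while (true) x++; });
        stuck.setDaemon(true); // so this demo JVM can still exit
        stuck.start();

        stuck.interrupt(); // has no effect: the loop never checks the flag
        stuck.join(200);   // like the shutdown wait: it simply times out
        System.out.println("still alive after interrupt: " + stuck.isAlive());
    }
}
```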


Any other special configurations going on on this install? Phoenix or use
of Coprocessors?



"MemStoreFlusher.34" prio=10 tid=0x00007f20b8939000 nid=0x511b runnable [0x00007f08941c3000]
  java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressBytesDirect(Native Method)
    at org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:232)
    - locked <0x00007f0a5f85c430> (a org.apache.hadoop.io.compress.snappy.SnappyCompressor)
    at org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149)
    at org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142)
    at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncoding(HFileBlockDefaultEncodingContext.java:238)
    at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncodingWithBlockType(HFileBlockDefaultEncodingContext.java:149)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlock(HFileBlock.java:783)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.ensureBlockReady(HFileBlock.java:761)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlockAndWriteHeaderAndData(HFileBlock.java:878)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:864)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateBlock(HFileBlockIndex.java:948)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateLevel(HFileBlockIndex.java:931)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIndexBlocks(HFileBlockIndex.java:840)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:367)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
    at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:70)
    - locked <0x00007f09fdcccda0> (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:829)
    at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1989)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1779)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1662)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1575)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:461)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:435)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:66)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:253)
    at java.lang.Thread.run(Thread.java:744)



If you thread dump a few times, it is always stuck here?

Thanks,
St.Ack




"MemStoreFlusher.33" prio=10 tid=0x00007f20b8937000 nid=0x511a waiting on condition [0x00007f08942c5000]
  java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00007f09b8e20f80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.DelayQueue.poll(DelayQueue.java:264)
    at java.util.concurrent.DelayQueue.poll(DelayQueue.java:68)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:233)
    at java.lang.Thread.run(Thread.java:744)





  I do not know why the memstore cannot flush.
  Is this a bug?



 Begin forwarded message:

From: quan <329853...@qq.com>

Subject: memstore flush blocked

Date: May 30, 2016, 10:47:43 AM GMT+8

To: user@hbase.apache.org

Cc: 蒲聪 <cong...@qunar.com>





Security notice: this email is not an internal QUNAR email. Please protect personal and company information; if anyone asks for account passwords or anything else suspicious, email secteam.





