Dear Team, In one of the Hbase Cluster, occasionally the some of the region has been stuck in PENDING_CLOSE state for long time. After that I need to restart particular region server holding region and the only it resolves.
The cluster has write loaded as this cluster which receives replication from another cluster. The HBase version is 1.2.6. Please help to solve this issue Below is the thread dump regionserver//10.x.x.x:16020-shortCompactions-1691219096144" daemon prio=10 tid=0x00007f07b06a8000 nid=0x38ff9 runnable [0x00007f0741f21000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87) - locked <0x00000005418009f8> (a sun.nio.ch.Util$2) - locked <0x0000000541800a08> (a java.util.Collections$UnmodifiableSet) - locked <0x00000005418009b0> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98) at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102) at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:186) at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:146) - locked <0x00000004c986b3c0> (a org.apache.hadoop.hdfs.RemoteBlockReader2) at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:686) at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:742) - eliminated <0x00000004c986b358> (a org.apache.hadoop.hdfs.DFSInputStream) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:799) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840) - locked <0x00000004c986b358> (a org.apache.hadoop.hdfs.DFSInputStream) at java.io.DataInputStream.read(DataInputStream.java:149) at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:709) at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1440) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1648) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1532) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:452) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.readNextDataBlock(HFileReaderV2.java:729) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.isNextBlock(HFileReaderV2.java:854) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.positionForNextBlock(HFileReaderV2.java:849) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2._next(HFileReaderV2.java:866) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:886) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:154) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:111) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:588) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:318) at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:111) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:119) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1223) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1856) at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:525) at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:562) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) "RS_OPEN_REGION-10.x.x.x:16020-1" prio=10 tid=0x00007f07bb222800 nid=0x38fd8 waiting on condition [0x00007f074181b000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000018501e9d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Regards, Manimekalai K