We are using Hadoop 2.6.0, and recently we found that the datanode can sometimes hang (this is not HDFS-7489; we had already patched that).
This one was caused by pipeline recovery of blk_1150368526_76663248.

Datanode log:

2015-04-28 17:51:55,297 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1553162417-10.0.67.25-1418714798720:blk_1150368526_76663248 src: /10.0.46.4:33257 dest: /10.0.49.6:50010
2015-04-28 17:53:01,923 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1553162417-10.0.67.25-1418714798720:blk_1150368526_76663248 src: /10.0.50.13:63069 dest: /10.0.49.6:50010
2015-04-28 17:53:10,790 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-1553162417-10.0.67.25-1418714798720:blk_1150368526_76663248
2015-04-28 17:53:10,790 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_1150368526_76663248, RBW
2015-04-28 17:54:10,791 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Join on writer thread Thread[DataXceiver for client DFSClient_attempt_1428562732216_171256_r_000000_0_670237631_1 at /10.0.46.4:33257 [Receiving block BP-1553162417-10.0.67.25-1418714798720:blk_1150368526_76663248],5,dataXceiverServer] timed out
2015-04-28 17:54:10,791 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1553162417-10.0.67.25-1418714798720:blk_1150368526_76663248 received exception java.io.IOException: Join on writer thread Thread[DataXceiver for client DFSClient_attempt_1428562732216_171256_r_000000_0_670237631_1 at /10.0.46.4:33257 [Receiving block BP-1553162417-10.0.67.25-1418714798720:blk_1150368526_76663248],5,dataXceiverServer] timed out
java.io.IOException: Join on writer thread Thread[DataXceiver for client DFSClient_attempt_1428562732216_171256_r_000000_0_670237631_1 at /10.0.46.4:33257 [Receiving block BP-1553162417-10.0.67.25-1418714798720:blk_1150368526_76663248],5,dataXceiverServer] timed out

And this is the jstack result when the datanode hangs:

"DataXceiver for client DFSClient_attempt_1428562732216_171256_r_000000_0_670237631_1 at /10.0.50.13:63069 [Receiving block BP-1553162417-10.0.67.25-1418714798720:blk_1150368526_76663248]" daemon prio=10 tid=0x00007f0dd052e000 nid=0x66e9 in Object.wait() [0x00007f0d9e271000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1289)
        - locked <0x00000006efad2f60> (a org.apache.hadoop.util.Daemon)
        at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverRbw(FsDatasetImpl.java:1123)
        - locked <0x00000006c82026c8> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverRbw(FsDatasetImpl.java:114)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:188)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
        at java.lang.Thread.run(Thread.java:745)

"DataXceiver for client DFSClient_attempt_1428562732216_171256_r_000000_0_670237631_1 at /10.0.46.4:33257 [Receiving block BP-1553162417-10.0.67.25-1418714798720:blk_1150368526_76663248]" daemon prio=10 tid=0x00007f0dd02a1800 nid=0x64ca runnable [0x00007f0dae602000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:613)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:781)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:730)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
        at java.lang.Thread.run(Thread.java:745)

"PacketResponder: BP-1553162417-10.0.67.25-1418714798720:blk_1150368526_76663248, type=LAST_IN_PIPELINE, downstreams=0:[]" daemon prio=10 tid=0x00007f0dd0a2b800 nid=0x6560 in Object.wait() [0x00007f0dad0ed000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.waitForAckHead(BlockReceiver.java:1111)
        - locked <0x00000006efae49e8> (a java.util.LinkedList)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1181)
        at java.lang.Thread.run(Thread.java:745)

This is the code:

  public void stopWriter(long xceiverStopTimeout) throws IOException {
    if (writer != null && writer != Thread.currentThread() && writer.isAlive()) {
      writer.interrupt();
      try {
        writer.join(xceiverStopTimeout);
        if (writer.isAlive()) {
          final String msg = "Join on writer thread " + writer + " timed out";
          DataNode.LOG.warn(msg + "\n" + StringUtils.getStackTrace(writer));
          throw new IOException(msg);
        }
      } catch (InterruptedException e) {
        throw new IOException("Waiting for writer thread is interrupted.");
      }
    }
  }

You can see that the writer is still alive after being interrupted, and we join on it for one minute before timing out, so we hold the big FsDataset lock for a full minute. Every other DataXceiver that needs that lock is blocked behind us, which is why the whole datanode hangs for one minute.

My question is: does interrupt() really work here? The datanode uses blocking socket I/O, which does not respond to an interruption signal on most platforms, and the same is true for filesystem I/O (the jstack above shows the writer stuck in FileOutputStream.writeBytes). Why not just close the socket instead? (A small standalone demo of this behavior is attached below.) Of course, in most cases the writer should already have stopped by the time we call recoverRbw: the original pipeline must already be broken (otherwise there would be nothing to recover), and the writer may discover the broken socket by itself and quit.

And one more thing: why always take one big lock on FsDataset? Could we use fine-grained locking instead, and avoid doing time-consuming work (or sleeping...) while holding a lock? Thanks.
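To make the interruption point concrete, here is a small standalone demo (plain JDK code, not HDFS code; class and thread names are made up). On the platforms we tried, a thread blocked in a blocking socket read is not woken by Thread.interrupt(), but closing the socket from another thread unblocks it immediately with a SocketException:

  import java.io.IOException;
  import java.io.InputStream;
  import java.net.ServerSocket;
  import java.net.Socket;

  public class InterruptVsClose {
    public static void main(String[] args) throws Exception {
      ServerSocket server = new ServerSocket(0);         // ephemeral port
      final Socket client = new Socket("localhost", server.getLocalPort());
      Socket accepted = server.accept();                 // peer never writes anything

      Thread reader = new Thread(new Runnable() {
        public void run() {
          try {
            InputStream in = client.getInputStream();
            in.read();                                   // blocks: nothing to read
          } catch (IOException e) {
            // Reached only once the socket is closed, not on interrupt.
            System.out.println("unblocked by: " + e);
          }
        }
      }, "blocked-reader");
      reader.start();

      Thread.sleep(500);
      reader.interrupt();                                // ignored by blocking socket I/O
      reader.join(2000);
      System.out.println("alive after interrupt: " + reader.isAlive()); // true

      client.close();                                    // this is what actually stops it
      reader.join(2000);
      System.out.println("alive after close: " + reader.isAlive());     // false

      accepted.close();
      server.close();
    }
  }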
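Even if the interrupt-and-join approach is kept, the slow join does not have to happen under the dataset lock. The following is only a rough sketch of the pattern, with made-up names (WriterRegistry, stopWriterThenRecover), not a patch against FsDatasetImpl: look the writer up under the lock, interrupt and join it with no lock held, then re-take the lock and re-check that nothing raced with us in the meantime:

  import java.util.HashMap;
  import java.util.Map;

  // Sketch only: the map stands in for the real replica map.
  public class WriterRegistry {
    private final Map<Long, Thread> writers = new HashMap<Long, Thread>();

    public void stopWriterThenRecover(long blockId, long joinTimeoutMs)
        throws InterruptedException {
      while (true) {
        Thread w;
        synchronized (this) {              // quick lookup only
          w = writers.get(blockId);
        }
        if (w != null) {
          w.interrupt();                   // the slow part runs with no lock held,
          w.join(joinTimeoutMs);           // so other threads are not stalled
        }
        synchronized (this) {
          if (writers.get(blockId) == w) { // nobody raced us while we joined
            writers.remove(blockId);
            // ... proceed with the actual recovery under the lock ...
            return;
          }
          // A new writer appeared meanwhile; loop and stop that one too.
        }
      }
    }
  }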
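On the big-lock question, a lock per block is one possible shape for finer-grained locking. Again a made-up sketch (PerBlockLocks is not a real HDFS class), just to illustrate the direction: recovering one replica would then only block operations on that same block, while cross-block invariants such as volume usage accounting would still need separate coordination.

  import java.util.concurrent.ConcurrentHashMap;
  import java.util.concurrent.ConcurrentMap;

  // Sketch only: one lock per block instead of one lock for the whole dataset.
  public class PerBlockLocks {
    private final ConcurrentMap<Long, Object> locks =
        new ConcurrentHashMap<Long, Object>();

    private Object lockFor(long blockId) {
      Object lock = locks.get(blockId);
      if (lock == null) {
        Object candidate = new Object();
        Object existing = locks.putIfAbsent(blockId, candidate);
        lock = (existing == null) ? candidate : existing;
      }
      return lock;
    }

    public void recoverBlock(long blockId) {
      synchronized (lockFor(blockId)) {
        // Only operations on this particular block wait here; reads and
        // writes of every other block on the datanode keep going.
      }
    }
  }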