Hello, users. When our Hadoop cluster was under a heavy write workload, the DFS client sometimes received a ClosedByInterruptException.
- similar issue: https://community.cloudera.com/t5/Community-Articles/Write-or-Append-failures-in-very-small-Clusters-under-heavy/ta-p/245446

As a result, the DFS client attempted pipeline recovery. However, the block state on the last node was always TEMPORARY.

[Error Message]
```
ReplicaNotFoundException: Cannot recover a non-RBW replica ReplicaInPipeline, blk_2355764371_1282088800, TEMPORARY
  getNumBytes()     = 49243526
  getBytesOnDisk()  = 49152000
  getVisibleLength()= -1
  getVolume()       = /hdfs/datanode/current
  getBlockFile()    = /hdfs/datanode/current/BP-199986352--/tmp/blk_2355764371
  bytesAcked=0
  bytesOnDisk=49152000
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverRbw(FsDatasetImpl.java:1347)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:207)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:676)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
        at java.lang.Thread.run(Thread.java:748)
```

This repeated until the lease expired. After the lease expired, lease recovery started and then everything was fine.

I think a TEMPORARY replica should not take part in pipeline setup; it seems to be a replica that is still being received from the upstream datanode.

Hadoop version: 2.7.7

Best regards,
Minwoo Kang
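To illustrate my understanding of the failure loop, here is a minimal, hypothetical sketch of the state check that the stack trace points at (FsDatasetImpl.recoverRbw rejecting a non-RBW replica). This is not the actual Hadoop code; the enum values mirror HDFS replica states, but the class and method here are simplified for illustration only.

```java
// Hypothetical model of the replica-state check behind the
// "Cannot recover a non-RBW replica" error above. Assumption:
// pipeline recovery only accepts replicas already in RBW state,
// so a TEMPORARY replica (still being copied from the upstream DN)
// is rejected on every retry until the lease finally expires.
public class RecoverRbwSketch {
    enum ReplicaState { TEMPORARY, RBW, FINALIZED }

    // Returns true only for replicas eligible for RBW recovery.
    static boolean canRecoverRbw(ReplicaState state) {
        return state == ReplicaState.RBW;
    }

    public static void main(String[] args) {
        // A TEMPORARY replica is never eligible, so each recovery
        // attempt fails and the client retries, matching the loop
        // observed in the logs.
        System.out.println(canRecoverRbw(ReplicaState.RBW));       // true
        System.out.println(canRecoverRbw(ReplicaState.TEMPORARY)); // false
    }
}
```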