Hi Experts,

I am decommissioning one of the nodes in my cluster. All blocks get replicated properly to the other nodes to maintain the replication factor, except for one block, which repeatedly fails with the exceptions shown below.
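For context, the decommission was initiated the standard way for this version, via the NameNode's exclude file (a minimal sketch; the file path and hostname below are placeholders for my actual config):

    <!-- hdfs-site.xml on the NameNode -->
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/conf/dfs.exclude</value>
    </property>

    # add the node to the exclude file, then tell the NameNode to re-read it
    echo "dn-to-remove.example.com" >> /etc/hadoop/conf/dfs.exclude
    hadoop dfsadmin -refreshNodes

Here is what the logs show on each side of the failing transfer.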
*Source Datanode (the one being decommissioned):*

2014-04-29 07:08:31,619 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1X.X.X.XX:50010, storageID=DS-567173478-1X.X.X.XX-50010-1366295899368, infoPort=50075, ipcPort=50020):Failed to transfer blk_-8120977448166465461_891134 to 1X.X.X.YYY:50010 got java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:323)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:435)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1177)
        at java.lang.Thread.run(Thread.java:662)

*Destination Datanode (where the block is supposed to be replicated):*

2014-04-29 07:07:24,179 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-8120977448166465461_891134 received exception org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_-8120977448166465461_891134 has already been started (though not completed), and thus cannot be created.
2014-04-29 07:07:24,179 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1X.X.X.YYY:50010, storageID=DS-1396119779-1X.X.X.YYY-50010-1388728482530, infoPort=50075, ipcPort=50020):DataXceiver
org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_-8120977448166465461_891134 has already been started (though not completed), and thus cannot be created.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1229)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:99)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:259)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:662)
2014-04-29 07:07:34,329 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_2476742220921569826_901106
2014-04-29 07:07:43,929 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-8387585272893559369_854112
2014-04-29 07:07:52,329 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5961493296385433904_858037
2014-04-29 07:08:50,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-8120977448166465461_891134 src: /1X.X.X.XX:37100 dest: /1X.X.X.YYY:50010
2014-04-29 07:08:50,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-8120977448166465461_891134 received exception org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_-8120977448166465461_891134 has already been started (though not completed), and thus cannot be created.
2014-04-29 07:08:50,305 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1X.X.X.YYY:50010, storageID=DS-1396119779-1X.X.X.YYY-50010-1388728482530, infoPort=50075, ipcPort=50020):DataXceiver
org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_-8120977448166465461_891134 has already been started (though not completed), and thus cannot be created.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:1229)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:99)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:259)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:662)

How do I get past these errors? The block is available at its other locations, and fsck reports the cluster as healthy (the command I used is in the P.S. below). I am using Hadoop-0.20-append-r1056497; we are upgrading to the latest release, but until that upgrade happens I would really appreciate any pointers on resolving this.

Thanks,
Divye Sheth
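P.S. In case it helps, this is roughly how I verified the block's replicas (a minimal sketch; the exact flags are from memory, adjust the path to your namespace):

    # list files, their blocks, and replica locations, then pick out the failing block
    hadoop fsck / -files -blocks -locations | grep "blk_-8120977448166465461"

The fsck output reports the filesystem as HEALTHY, and the block shows live replicas on the remaining datanodes.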