Hadoop 2.4.1, DataNode disk failure: 'Number of Under-Replicated Blocks' stays at zero, so the replicas lost with the failed disk are never re-replicated, and a second disk failure will lose files outright (CORRUPT). How do I fix this?

1. dfshealth.html (before the failure):

Configured Capacity: 42.91 TB
DFS Used: 1.86 GB
Non DFS Used: 29.63 TB
DFS Remaining: 13.28 TB
DFS Used%: 0%
DFS Remaining%: 30.94%
Block Pool Used: 1.86 GB
Block Pool Used%: 0%
DataNodes usages% (Min/Median/Max/stdDev): 0.00% / 0.01% / 0.01% / 0.00%
Live Nodes: 2 (Decommissioned: 0)
Dead Nodes: 0 (Decommissioned: 0)
Decommissioning Nodes: 0
Number of Under-Replicated Blocks: 0

2. chmod 444 /raid0/data01 (simulate a volume failure)

3. bin/hdfs dfs -get /t.mp4 /tmp/t4.mp4 (read a file, which triggers the disk check)

4. NameNode log (after the volume failure):

2014-10-10 14:55:21,027 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: Disk error on DatanodeRegistration(192.168.55.151, datanodeUuid=b565d54d-0817-4aa5-884e-1e060179f43f, infoPort=40075, ipcPort=40020, storageInfo=lv=-55;cid=CID-TEST-ZONE;nsid=326408948;c=0): DataNode failed volumes:/raid0/data01/dfs/data/current;
2014-10-10 14:55:25,400 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block blk_1073741848_1024 on 192.168.55.151:40010 size 49940112 replicaState = FINALIZED
2014-10-10 14:55:25,400 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: In memory blockUCState = COMPLETE
2014-10-10 14:55:25,400 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block blk_1073741842_1018 on 192.168.55.151:40010 size 134217728 replicaState = FINALIZED
2014-10-10 14:55:25,400 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: In memory blockUCState = COMPLETE
2014-10-10 14:55:25,400 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block blk_1073741844_1020 on 192.168.55.151:40010 size 134217728 replicaState = FINALIZED
2014-10-10 14:55:25,400 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: In memory blockUCState = COMPLETE
2014-10-10 14:55:25,400 DEBUG
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block blk_1073741846_1022 on 192.168.55.151:40010 size 134217728 replicaState = FINALIZED
2014-10-10 14:55:25,400 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: In memory blockUCState = COMPLETE
2014-10-10 14:55:25,431 INFO BlockStateChange: BLOCK* processReport: from storage DS-4de98631-ddec-4118-8654-2961b1815230 node DatanodeRegistration(192.168.55.151, datanodeUuid=b565d54d-0817-4aa5-884e-1e060179f43f, infoPort=40075, ipcPort=40020, storageInfo=lv=-55;cid=CID-TEST-ZONE;nsid=326408948;c=0), blocks: 4, processing time: 32 msecs

5. DataNode log (after the volume failure):

2014-10-10 14:55:21,473 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing failed volume /raid0/data01/dfs/data/current:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Can not create directory: /raid0/data01/dfs/data/current/BP-1269062812-127.0.0.1-1412645127175/current/finalized
    at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:91)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LDir.checkDirTree(LDir.java:160)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:255)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:209)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:168)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1317)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:1421)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.validateBlockFile(FsDatasetImpl.java:1117)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:350)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:343)
    at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:150)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:265)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:493)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:662)
2014-10-10 14:55:21,491 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Failed to write dfsUsed to /raid0/data01/dfs/data/current/BP-1269062812-127.0.0.1-1412645127175/current/dfsUsed
java.io.FileNotFoundException: /raid0/data01/dfs/data/current/BP-1269062812-127.0.0.1-1412645127175/current/dfsUsed (Permission denied)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
    at java.io.FileWriter.<init>(FileWriter.java:73)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.saveDfsUsed(BlockPoolSlice.java:213)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.shutdown(BlockPoolSlice.java:424)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:252)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:175)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1317)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:1421)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.validateBlockFile(FsDatasetImpl.java:1117)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:350)
    at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:343)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:150)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:265)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:493)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:662)
2014-10-10 14:55:21,494 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Completed checkDirs. Removed 1 volumes. Current volumes: [/raid0/data02/dfs/data/current]
2014-10-10 14:55:21,494 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing replica BP-1269062812-127.0.0.1-1412645127175:1073741841 on failed volume /raid0/data01/dfs/data/current
2014-10-10 14:55:21,494 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing replica BP-1269062812-127.0.0.1-1412645127175:1073741843 on failed volume /raid0/data01/dfs/data/current
2014-10-10 14:55:21,494 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing replica BP-1269062812-127.0.0.1-1412645127175:1073741845 on failed volume /raid0/data01/dfs/data/current
2014-10-10 14:55:21,494 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing replica BP-1269062812-127.0.0.1-1412645127175:1073741847 on failed volume /raid0/data01/dfs/data/current
2014-10-10 14:55:21,495 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removed 4 out of 8(took 0 millisecs)
2014-10-10 14:55:21,495 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode.handleDiskError: Keep Running: true
2014-10-10
14:55:22,414 DEBUG org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: b=blk_1073741841_1017, f=/raid0/data01/dfs/data/current/BP-1269062812-127.0.0.1-1412645127175/current/finalized/blk_1073741841
2014-10-10 14:55:22,414 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opReadBlock BP-1269062812-127.0.0.1-1412645127175:blk_1073741841_1017 received exception java.io.IOException: Block blk_1073741841_1017 is not valid.
2014-10-10 14:55:22,449 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.55.151, datanodeUuid=b565d54d-0817-4aa5-884e-1e060179f43f, infoPort=40075, ipcPort=40020, storageInfo=lv=-55;cid=CID-TEST-ZONE;nsid=326408948;c=0):Got exception while serving BP-1269062812-127.0.0.1-1412645127175:blk_1073741841_1017 to /192.168.55.151:53669
java.io.IOException: Block blk_1073741841_1017 is not valid.
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:352)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:343)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:150)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:265)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:493)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:662)
2014-10-10 14:55:22,449 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: namenode02:40010:DataXceiver error processing READ_BLOCK operation src: /192.168.55.151:53669 dst: /192.168.55.151:40010
java.io.IOException: Block blk_1073741841_1017 is not valid.
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:352)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:343)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:150)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:265)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:493)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:662)

6. dfshealth.html (after the volume failure):

Configured Capacity: 42.91 TB
DFS Used: 1.36 GB
Non DFS Used: 29.62 TB
DFS Remaining: 13.28 TB
DFS Used%: 0%
DFS Remaining%: 30.96%
Block Pool Used: 1.36 GB
Block Pool Used%: 0%
DataNodes usages% (Min/Median/Max/stdDev): 0.00% / 0.00% / 0.00% / 0.00%
Live Nodes: 2 (Decommissioned: 0)
Dead Nodes: 0 (Decommissioned: 0)
Decommissioning Nodes: 0
Number of Under-Replicated Blocks: 0
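Since dfshealth.html keeps reporting zero under-replicated blocks, it may help to ask the NameNode directly for replication state from the command line. A minimal sketch, assuming a running 2.4.x cluster like the one in the steps above (these need a live HDFS, so they are illustrative only):

```shell
# Cluster-wide block health: per-file replica counts, plus totals for
# under-replicated, mis-replicated and corrupt blocks.
bin/hdfs fsck / -files -blocks -locations

# Only the files with corrupt (unrecoverable) blocks, if any.
bin/hdfs fsck / -list-corruptfileblocks

# Per-DataNode view: after the DataNode drops the failed volume, the
# node's configured capacity reported here should shrink accordingly.
bin/hdfs dfsadmin -report
```

Also, the "Keep Running: true" line in step 5 suggests dfs.datanode.failed.volumes.tolerated is set above its default of 0 on this DataNode; with the default, a single volume failure shuts the DataNode down, the NameNode eventually marks it dead, and re-replication of its blocks is scheduled.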