I ran into an issue yesterday where one of the blocks on HDFS seems to have gone away. I would appreciate any help that you can provide.
I am running Hadoop on Amazon's Elastic MapReduce (EMR), with Hadoop version 0.20.205 and Hive version 0.8.1. I have a Hive table that is written out in the reduce step of a map-reduce job created by Hive. This step completed with no errors, but the next map-reduce job that tried to read the table failed with the following error message:

"Caused by: java.io.IOException: No live nodes contain current block"

I ran hadoop fs -cat on the same file and got the same error.

Looking more closely at the data node and name node logs, I see the following errors for the same problem block. They appear in the data node log when a client tries to read the data:

2012-09-03 11:56:05,054 WARN org.apache.hadoop.hdfs.server.datanode.DataNode (org.apache.hadoop.hdfs.server.datanode.DataXceiver@4a7cdff0): DatanodeRegistration(10.193.39.159:9200, storageID=DS-2147477684-10.193.39.159-9200-1346659207926, infoPort=9102, ipcPort=9201):sendBlock() : Offset 134217727 and length 1 don't match block blk_-7100869813617535842_5426 ( blockLen 120152064 )

2012-09-03 11:56:05,054 WARN org.apache.hadoop.hdfs.server.datanode.DataNode (org.apache.hadoop.hdfs.server.datanode.DataXceiver@4a7cdff0): DatanodeRegistration(10.193.39.159:9200, storageID=DS-2147477684-10.193.39.159-9200-1346659207926, infoPort=9102, ipcPort=9201):Got exception while serving blk_-7100869813617535842_5426 to /10.96.57.112:
java.io.IOException: Offset 134217727 and length 1 don't match block blk_-7100869813617535842_5426 ( blockLen 120152064 )
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:141)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
    at java.lang.Thread.run(Thread.java:662)

2012-09-03 11:56:05,054 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode (org.apache.hadoop.hdfs.server.datanode.DataXceiver@4a7cdff0): DatanodeRegistration(10.193.39.159:9200, storageID=DS-2147477684-10.193.39.159-9200-1346659207926, infoPort=9102, ipcPort=9201):DataXceiver
java.io.IOException: Offset 134217727 and length 1 don't match block blk_-7100869813617535842_5426 ( blockLen 120152064 )
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:141)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
    at java.lang.Thread.run(Thread.java:662)

Unfortunately, the EMR cluster that had the data on it has since been terminated. I have access to the logs, but I can't run an fsck. I can provide more detailed stack traces etc. if you think it would be helpful. Rerunning my process to regenerate the corrupted block resolved the issue.

I would really appreciate it if anyone has a reasonable explanation of what happened and how to avoid it in the future.

Max
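
P.S. A quick sanity check on the numbers in the log, in case it helps. The requested offset 134217727 is 2^27 - 1, which would be the last byte of a full 128 MiB block (I have not confirmed the configured block size, so treat that as an assumption), while the replica the data node holds is only 120152064 bytes. Below is a minimal Python sketch of the kind of bounds check that would reject such a read. The function name read_in_bounds is hypothetical and this is my guess at the logic, not code taken from BlockSender.

```python
def read_in_bounds(offset, length, block_len):
    """Return True if the byte range [offset, offset + length) fits in the replica.

    Hypothetical helper: an assumed approximation of the check behind the
    "Offset ... and length ... don't match block" message, not Hadoop's code.
    """
    return offset >= 0 and length >= 0 and offset + length <= block_len


# Values copied from the data node log above.
offset, length, block_len = 134217727, 1, 120152064

# The offset is one byte short of 2**27, i.e. the last byte of a 128 MiB range.
print(offset + 1 == 2 ** 27)

# The requested range runs past the end of the replica on disk.
print(read_in_bounds(offset, length, block_len))

# Number of bytes by which the replica falls short of 2**27.
print(2 ** 27 - block_len)
```

If that reading is right, the reader expected a full-length block while the data node only had a shorter replica, which would fit with the "No live nodes contain current block" error seen on the client side.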