[ https://issues.apache.org/jira/browse/HADOOP-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066974#comment-14066974 ]
Wang Peipei commented on HADOOP-2890:
-------------------------------------

Hi dhruba borthakur, isn't this problem caused by the NN incorrectly using the block object received in the RPC when queueing to the neededReplication queue, instead of using its internal block object? Because 134217728 is actually 128 MB. I got this message from HADOOP-5605.

> HDFS should recover when replicas of block have different sizes (due to corrupted block)
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2890
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2890
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.16.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>             Fix For: 0.17.0
>
>         Attachments: inconsistentSize.patch, inconsistentSize.patch, inconsistentSize.patch, inconsistentSize.patch
>
>
> We had a case where reading a file caused an IOException.
>
> 08/02/25 17:23:02 INFO fs.DFSClient: Could not obtain block blk_-8333897631311887285 from any node: java.io.IOException: No live nodes contain current block
>
> hadoop fsck said the block was healthy.
>
> [lohit]$ hadoop fsck part-04344 -files -blocks -locations | grep 8333897631311887285
> 21. -8333897631311887285 len=134217728 repl=3 [74.6.129.238:50010, 74.6.133.231:50010, 74.6.128.158:50010]
>
> Searching the logs for this block showed the following message in the namenode log:
>
> 17:26:23,543 WARN org.apache.hadoop.fs.FSNamesystem: Inconsistent size for block blk_-8333897631311887285 reported from 74.6.133.231:50010 current size is 134217728 reported size is 134205440
>
> So the namenode was expecting 134217728 bytes while the actual block size was 134205440 bytes.
> Dhruba took a further look at the logs and we found out this is what had happened:
> 1. While the file was being created, this block was replicated to three nodes, of which two had the correct-sized block but the third had a partial/truncated block (the metadata was the same on all nodes).
> 2. About three days later the namenode was restarted, at which point the third node's report triggered the warning about the incorrect block size (the namenode logged this).
> 3. After a few days the first two nodes went down, and the third node replicated the partial/truncated block to two new nodes.
> 4. Now when we tried to read this block, we hit the IOException.
> 5. On all the nodes the metadata corresponded to the original valid block, while the block itself was missing around 12K of data.
> Two problems could be fixed here:
> 1. When the namenode identifies replicas with different block sizes (point 2 above), it could choose the biggest replica and discard the smaller ones. If the block is not the last block of the file, its size has to equal the block size; anything less than that could be considered a bad block.
> 2. The datanode's periodic block verifier could also verify that the size recorded in the block metadata matches that of the actual block present. Any mismatch should be reported/recovered considering what would be done in the step above.
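To make proposed fix 1 more concrete, here is a minimal, hypothetical Java sketch of how a namenode could reconcile replicas that report different sizes: keep the largest reported size (or the full configured block size for a non-final block) and mark shorter replicas as corrupt so they are re-replicated from an intact copy. The class and method names (ReplicaSizeReconciler, findCorruptReplicas) are illustrative assumptions, not the actual FSNamesystem API.

import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative only: not actual FSNamesystem code. Sketches the policy from
 * proposed fix 1: when replicas of the same block report different sizes,
 * prefer the largest replica and treat shorter ones as corrupt; a block that
 * is not the last block of its file must be exactly one full block long.
 */
public class ReplicaSizeReconciler {

  /** Hypothetical view of what the namenode knows about one replica. */
  public static class ReplicaReport {
    final String datanode;    // e.g. "74.6.133.231:50010"
    final long reportedSize;  // size the datanode reported for this replica
    public ReplicaReport(String datanode, long reportedSize) {
      this.datanode = datanode;
      this.reportedSize = reportedSize;
    }
  }

  /**
   * Returns the datanodes whose replicas should be discarded and then
   * re-replicated from a larger, intact copy.
   */
  public static List<String> findCorruptReplicas(List<ReplicaReport> reports,
                                                 long configuredBlockSize,
                                                 boolean isLastBlock) {
    // The authoritative size is the largest size any datanode reported...
    long best = 0;
    for (ReplicaReport r : reports) {
      best = Math.max(best, r.reportedSize);
    }
    // ...except that a non-final block must be a full block: anything shorter
    // is bad no matter what the other replicas say.
    if (!isLastBlock) {
      best = configuredBlockSize;
    }
    List<String> corrupt = new ArrayList<String>();
    for (ReplicaReport r : reports) {
      if (r.reportedSize < best) {
        corrupt.add(r.datanode);  // candidate for invalidation + re-replication
      }
    }
    return corrupt;
  }
}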
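And for proposed fix 2, a similar hedged sketch of the size check a periodic datanode verifier could run. The types here (BlockSizeVerifier, CorruptReplicaReporter) are assumptions for illustration, not the real block scanner code; the only point is comparing the length recorded in the metadata with the on-disk block file length and reporting any mismatch.

import java.io.File;
import java.io.IOException;

/**
 * Illustrative only: not the real datanode block scanner. Sketches the check
 * from proposed fix 2: during periodic verification, compare the block length
 * claimed by the stored metadata against the length of the block file on disk
 * and report any mismatch so it can be recovered (e.g. by re-replication).
 */
public class BlockSizeVerifier {

  /** Hypothetical callback for reporting a bad replica to the namenode. */
  public interface CorruptReplicaReporter {
    void reportCorrupt(String blockId, long expectedLen, long actualLen);
  }

  /**
   * @param blockFile   the block data file on local disk
   * @param expectedLen the length recorded for this block in its metadata
   * @return true if the on-disk length matches the metadata
   */
  public static boolean verify(String blockId, File blockFile, long expectedLen,
                               CorruptReplicaReporter reporter) throws IOException {
    if (!blockFile.exists()) {
      throw new IOException("Missing block file " + blockFile);
    }
    long actualLen = blockFile.length();
    if (actualLen != expectedLen) {
      // In the scenario above, this would have flagged the replica that was
      // about 12K short of the 134217728 bytes the metadata claimed.
      reporter.reportCorrupt(blockId, expectedLen, actualLen);
      return false;
    }
    return true;
  }
}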