[ https://issues.apache.org/jira/browse/HDFS-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133717#comment-14133717 ]
Vinayakumar B edited comment on HDFS-2932 at 9/15/14 8:56 AM:
--------------------------------------------------------------
Hi [~usrikanth], thanks for looking into this issue.

1. Case 1 is solved in HDFS-3493.
2. Case 2: I understand your point, and I agree that it would be better to detect failed replicas as early as possible and delete them. But we cannot do that on the fly while the write is still in progress: if the report from a datanode is slightly delayed, which is quite possible on a huge cluster, we may end up deleting the working copy itself. It is better to keep the corrupt replica for some time than to lose the valid replica. Similar cases have been observed, and code was added to ignore such variations; see this comment in BlockManager.java:

{code}
// If it's a RBW report for a COMPLETE block, it may just be that
// the block report got a little bit delayed after the pipeline
// closed. So, ignore this report, assuming we will get a
// FINALIZED replica later. See HDFS-2791
{code}

{quote} Solution is to be able to detect and capture a write-pipeline-failed replica as early as possible. First fix may be to change the check from 'isCompleted' to 'isCommitted'. This will capture write-pipeline-failed replicas reported just after commit and before 'complete' and mark them as corrupt. {quote}

The time gap between 'isCommitted' and 'isCompleted' is not that large, so ideally this will not change much.

{quote} Then to capture write-pipeline-failed replicas reported before commit, I am investigating if this can be solved by marking them as corrupt as part of commit. There already exists a check to find any mis-stamped replicas during commit but we only remove them from the blocksMap. In addition can we not mark such replicas as corrupt? {quote}

"setGenerationStampAndVerifyReplicas" only updates the in-memory states of the replicas being written on the Namenode; it does not change the blocksMap. I don't think this is the right place to decide about corrupt replicas.
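The trade-off discussed above can be sketched as a tiny state model. This is a simplified illustration, not the actual BlockManager code: the enum values mirror the HDFS block construction states, but the method names here are hypothetical.

```java
// Toy model of the decision discussed above: an RBW replica reported for a
// block that is already past the pipeline may be a delayed report (ignore it)
// or a genuinely failed replica (mark it corrupt). Names are hypothetical.
public class ReplicaReportSketch {

    // Mirrors the HDFS block construction states (simplified).
    enum BlockUCState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    enum Action { ACCEPT, IGNORE_ASSUME_DELAYED, MARK_CORRUPT }

    // Current behaviour per the BlockManager comment quoted above: an RBW
    // report for a COMPLETE block is assumed to be a delayed report and
    // ignored, since deleting eagerly could discard the only valid replica.
    static Action onRbwReport(BlockUCState state) {
        if (state == BlockUCState.COMPLETE) {
            return Action.IGNORE_ASSUME_DELAYED;
        }
        return Action.ACCEPT; // block still being written: a normal RBW report
    }

    // The variant proposed in the discussion: trigger the check at COMMITTED
    // already, marking a stale RBW replica corrupt in the (short)
    // commit-to-complete window.
    static Action onRbwReportProposed(BlockUCState state) {
        if (state == BlockUCState.COMMITTED) {
            return Action.MARK_CORRUPT;
        }
        if (state == BlockUCState.COMPLETE) {
            return Action.IGNORE_ASSUME_DELAYED;
        }
        return Action.ACCEPT;
    }
}
```

The only behavioural difference between the two functions is the COMMITTED case, which is why the change is expected to matter little in practice: the window is small, and reports landing in it are rare.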
I think it is always better to handle the block validations when they are reported from the datanode. Yes, it takes time :(
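The commit-time stamp check mentioned above can also be sketched. Again a hedged toy model, not the real setGenerationStampAndVerifyReplicas: the Replica record and method names are made up for illustration, and the stamp values follow the HDFS-2932 scenario below (pipeline recovery bumped the stamp from 1005 to 1006 on DN1 and DN2 while DN3 was down).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the mis-stamp check at commit time discussed above: replicas
// whose generation stamp predates the pipeline recovery are stale leftovers
// of the failed pipeline, and are the candidates the discussion proposes
// marking corrupt (rather than only dropping them from the expected set).
public class CommitStampSketch {

    // Minimal stand-in for a reported replica (hypothetical shape).
    record Replica(String datanode, long genStamp) {}

    // Returns the replicas whose stamp does not match the committed stamp.
    static List<Replica> staleReplicas(List<Replica> replicas, long committedGenStamp) {
        List<Replica> stale = new ArrayList<>();
        for (Replica r : replicas) {
            if (r.genStamp() != committedGenStamp) {
                stale.add(r); // wrote with a pre-recovery stamp, e.g. DN3 here
            }
        }
        return stale;
    }
}
```

Under this model the open question in the thread is only what to do with the returned list: remove the entries from the in-memory expected replicas (current behaviour) or additionally mark them corrupt so they are cleaned up before the datanode's next block report.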
> Under replicated block after the pipeline recovery.
> ---------------------------------------------------
>
>                 Key: HDFS-2932
>                 URL: https://issues.apache.org/jira/browse/HDFS-2932
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 0.24.0
>            Reporter: J.Andreina
>             Fix For: 0.24.0
>
>
> Started 1 NN, DN1, DN2, DN3 on the same machine.
> Wrote a huge file of size 2 GB.
> While the write for block id 1005 was in progress, brought down DN3.
> After the pipeline recovery, the block stamp changed to block_id_1006 on DN1 and DN2.
> After the write was over, DN3 was brought up and the fsck command was issued.
> The following message is displayed:
> "block-id_1006 is under-replicated. Target replicas is 3 but found 2 replicas".

-- This message was sent by Atlassian JIRA (v6.3.4#6332)