[ https://issues.apache.org/jira/browse/HDFS-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133717#comment-14133717 ]

Vinayakumar B edited comment on HDFS-2932 at 9/15/14 8:56 AM:
--------------------------------------------------------------

Hi [~usrikanth],
Thanks for looking into this issue.

1. Case 1 is solved in HDFS-3493.

2. Case 2: I understand your point.
    I agree that it would be better to detect the failed replicas as early as possible and delete them, but we cannot do that on the fly while the write itself is in progress. If the report from the datanode is a little delayed, we may end up deleting the working copy itself, and this is quite possible in a huge cluster. It is better to keep the corrupt replica for some time instead of losing the valid replica. Similar cases have been observed, and code has been added to ignore such variations; see the comment below in BlockManager.java.
{code}
          // If it's a RBW report for a COMPLETE block, it may just be that
          // the block report got a little bit delayed after the pipeline
          // closed. So, ignore this report, assuming we will get a
          // FINALIZED replica later. See HDFS-2791
{code}
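
To make that concrete, here is a minimal, self-contained sketch of the kind of decision that comment describes. The enum values mirror HDFS replica/block states, but the class and method names are illustrative stand-ins, not the actual BlockManager code:
{code}
// Minimal, self-contained sketch of the decision described above. Enum values
// mirror HDFS replica/block states; the class and method names are
// illustrative stand-ins, not the actual BlockManager code.
public class StaleReportCheckSketch {

  // State reported by the datanode for the replica.
  enum ReplicaState { FINALIZED, RBW, RWR }
  // State the Namenode holds for the block.
  enum BlockUCState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

  // Return true only when the reported replica should be marked corrupt.
  static boolean shouldMarkCorrupt(BlockUCState storedState,
                                   ReplicaState reportedState,
                                   long storedGenStamp,
                                   long reportedGenStamp) {
    if (storedState != BlockUCState.COMPLETE) {
      // Block is still being written; don't judge the replica yet.
      return false;
    }
    if (reportedGenStamp != storedGenStamp) {
      // Stale generation stamp on a finished block: clearly a left-over
      // replica from a failed pipeline.
      return true;
    }
    // Same genstamp, block COMPLETE: an RBW report is assumed to be a block
    // report that got delayed past pipeline close, so it is ignored and a
    // FINALIZED replica is expected later (HDFS-2791).
    return reportedState == ReplicaState.RWR;
  }

  public static void main(String[] args) {
    // Delayed RBW report for a COMPLETE block: ignored, not marked corrupt.
    System.out.println(shouldMarkCorrupt(
        BlockUCState.COMPLETE, ReplicaState.RBW, 1006L, 1006L));   // false
    // Replica left behind by the failed pipeline (old genstamp): corrupt.
    System.out.println(shouldMarkCorrupt(
        BlockUCState.COMPLETE, ReplicaState.RBW, 1006L, 1005L));   // true
  }
}
{code}
The point is only that an RBW report with a matching generation stamp for a COMPLETE block is treated as a possibly delayed report and ignored, rather than immediately marked corrupt.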
{quote}
Solution is to be able to detect and capture a write-pipeline-failed replica as 
early as possible. First fix may be to change the check from 'isCompleted' to 
'isCommitted'. This will capture write-pipeline-failed replicas reported just 
after commit and before 'complete' and mark them as corrupt.
{quote}
The time gap between 'isCommitted' and 'isCompleted' is not that big, so ideally this will not change much.
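
To make that window concrete, here is a rough, hypothetical sketch of what moving the check from 'isCompleted' to 'isCommitted' would change. State names follow HDFS block states; the methods themselves are illustrative, not the actual Namenode code:
{code}
// Hypothetical sketch only: shows what moving the check from COMPLETE to
// COMMITTED would change. State names follow HDFS; the rest is illustrative.
public class CommitVsCompleteSketch {

  enum BlockUCState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

  // Current behaviour as discussed above: a stale replica reported while the
  // block is only COMMITTED is not yet judged.
  static boolean corruptIfComplete(BlockUCState s, long storedGS, long reportedGS) {
    return s == BlockUCState.COMPLETE && reportedGS != storedGS;
  }

  // Proposed behaviour: also judge it once the block is COMMITTED, closing
  // the (small) commit-to-complete window.
  static boolean corruptIfCommitted(BlockUCState s, long storedGS, long reportedGS) {
    return (s == BlockUCState.COMMITTED || s == BlockUCState.COMPLETE)
        && reportedGS != storedGS;
  }

  public static void main(String[] args) {
    // A replica with the pre-recovery genstamp (1005) reported while the
    // block (now genstamp 1006) is COMMITTED but not yet COMPLETE:
    System.out.println(corruptIfComplete(BlockUCState.COMMITTED, 1006L, 1005L));  // false
    System.out.println(corruptIfCommitted(BlockUCState.COMMITTED, 1006L, 1005L)); // true
  }
}
{code}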

{quote}
Then to capture write-pipeline-failed replicas reported before commit, I am 
investigating if this can be solved by marking them as corrupt as part of 
commit. There already exists a check to find any mis-stamped replicas during 
commit but we only remove them from the blocksMap. In addition can we not mark 
such replicas as corrupt?
{quote}
"setGenerationStampAndVerifyReplicas" is just updating the inmemory states of 
the replicas being written in Namenode. its not changing the blocksMap. I dont 
think this is the right place to decide about the corrupt replicas.
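
Just to illustrate that distinction, a simplified sketch with stand-in types (this is not the real BlockInfoUnderConstruction code): verifying replicas while setting the new generation stamp only trims the Namenode's in-memory list of expected replica locations, whereas marking a replica corrupt is a separate decision that today happens when the datanode reports it.
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for the Namenode's in-memory view of a block being
// written; not the real BlockInfoUnderConstruction code.
public class CommitTimeVerifySketch {

  static class ExpectedReplica {
    final String datanode;
    final long genStamp;
    ExpectedReplica(String datanode, long genStamp) {
      this.datanode = datanode;
      this.genStamp = genStamp;
    }
  }

  long blockGenStamp = 1005L;
  final List<ExpectedReplica> expectedReplicas = new ArrayList<>();

  // Roughly what "set generation stamp and verify replicas" amounts to here:
  // record the new genstamp and forget expected locations whose replicas
  // carry an older stamp. Nothing touches the blocksMap and nothing is marked
  // corrupt; it is purely in-memory bookkeeping on the Namenode.
  void setGenerationStampAndVerifyReplicas(long newGenStamp) {
    blockGenStamp = newGenStamp;
    for (Iterator<ExpectedReplica> it = expectedReplicas.iterator(); it.hasNext();) {
      if (it.next().genStamp != newGenStamp) {
        it.remove();  // mis-stamped replica: drop the location, don't mark corrupt
      }
    }
  }

  public static void main(String[] args) {
    CommitTimeVerifySketch block = new CommitTimeVerifySketch();
    block.expectedReplicas.add(new ExpectedReplica("DN1", 1006L));
    block.expectedReplicas.add(new ExpectedReplica("DN2", 1006L));
    block.expectedReplicas.add(new ExpectedReplica("DN3", 1005L)); // failed pipeline member
    block.setGenerationStampAndVerifyReplicas(1006L);
    System.out.println(block.expectedReplicas.size()); // 2: DN3's stale location is forgotten
  }
}
{code}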

I think it's always better to handle block validation when the replica is reported from the datanode. Yes, it takes time :(



> Under replicated block after the pipeline recovery.
> ---------------------------------------------------
>
>                 Key: HDFS-2932
>                 URL: https://issues.apache.org/jira/browse/HDFS-2932
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 0.24.0
>            Reporter: J.Andreina
>             Fix For: 0.24.0
>
>
> Started 1 NN, DN1, DN2, DN3 on the same machine.
> Wrote a huge file of size 2 GB.
> While the write for block-id-1005 was in progress, brought down DN3.
> After the pipeline recovery happened, the block stamp changed to block_id_1006 on DN1 and DN2.
> After the write was over, DN3 was brought up and the fsck command was issued.
> The following message is displayed:
> "block-id_1006 is under-replicated. Target replicas is 3 but found 2 replicas".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
