Rakesh R created HDFS-9256:
------------------------------

             Summary: Erasure Coding: Improve failure handling of ECWorker 
striped block reconstruction
                 Key: HDFS-9256
                 URL: https://issues.apache.org/jira/browse/HDFS-9256
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: Rakesh R
            Assignee: Rakesh R


As we know, reconstructing a missing striped block is a costly operation. It 
involves the following steps:

step-1) read the data from the minimum number of sources (remote reads)
step-2) decode the data for the targets (CPU cycles)
step-3) transfer the data to the targets (remote writes)

Assume there is a failure in step-3 because a target DN is disconnected, dead, 
etc. Presently {{ECWorker}} skips the failed DN and continues transferring data 
to the other targets. In the next round, the reconstruction operation has to 
start again from the first step. Considering the cost of reconstruction, it 
would be good to give the failed operation another chance by retrying it. The 
idea of this jira is to discuss the possible approaches and implement one.
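
One possible approach, sketched below only to make the discussion concrete, is 
to keep the already decoded buffers and retry just the failed transfers 
(step-3) a bounded number of times, so that step-1 and step-2 are not repeated. 
This is not the existing {{ECWorker}} code: {{TransferRetrySketch}}, 
{{Target}}, {{FlakyTarget}}, {{transferWithRetry}} and {{maxAttempts}} are all 
hypothetical names, and a real target DN connection is replaced by a simple 
in-memory stand-in.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; none of these types exist in HDFS. */
public class TransferRetrySketch {

    /** Hypothetical stand-in for a connection to one target DN. */
    interface Target {
        String name();
        void write(byte[] decoded) throws IOException;
    }

    /** Test double: fails the first 'failuresLeft' writes, then succeeds. */
    static class FlakyTarget implements Target {
        private final String name;
        private int failuresLeft;
        int attempts = 0;

        FlakyTarget(String name, int failuresLeft) {
            this.name = name;
            this.failuresLeft = failuresLeft;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public void write(byte[] decoded) throws IOException {
            attempts++;
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new IOException("simulated failure writing to " + name);
            }
        }
    }

    /**
     * Step-3 only: sends each decoded buffer to its target. Targets whose
     * write fails are retried in later passes, up to maxAttempts passes in
     * total. The decoded buffers are reused, so nothing is re-read or
     * re-decoded. Returns the names of the targets that still failed.
     */
    static List<String> transferWithRetry(Map<Target, byte[]> decodedByTarget,
                                          int maxAttempts) {
        List<Target> pending = new ArrayList<>(decodedByTarget.keySet());
        for (int attempt = 1;
             attempt <= maxAttempts && !pending.isEmpty();
             attempt++) {
            List<Target> failed = new ArrayList<>();
            for (Target t : pending) {
                try {
                    t.write(decodedByTarget.get(t));
                } catch (IOException e) {
                    failed.add(t); // keep it for the next pass
                }
            }
            pending = failed;
        }
        List<String> stillFailed = new ArrayList<>();
        for (Target t : pending) {
            stillFailed.add(t.name());
        }
        return stillFailed;
    }
}
```

Only the targets returned by {{transferWithRetry}} would need a full 
reconstruction in a later round. How long the decoded buffers may be held, and 
whether a replacement target should be chosen instead of retrying the same DN, 
are open questions for the discussion.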



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
