Rakesh R created HDFS-9256:
------------------------------

Summary: Erasure Coding: Improve failure handling of ECWorker striped block reconstruction
Key: HDFS-9256
URL: https://issues.apache.org/jira/browse/HDFS-9256
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R
As we know, reconstructing a missing striped block is a costly operation. It involves the following steps:

step-1) read the data from the minimum number of sources (remotely reading the data)
step-2) decode the data for the targets (CPU cycles)
step-3) transfer the data to the targets (remotely writing the data)

Assume there is a failure in step-3 because a target DN has disconnected, died, etc. Presently {{ECWorker}} skips the failed DN and continues transferring data to the other targets. In the next round, it has to start the reconstruction again from step-1. Considering the cost of reconstruction, it would be good to give the failed operation (the step-3 transfer) another chance by retrying it. The idea of this jira is to discuss the possible approaches and implement one.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
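One possible approach to the retry idea, sketched in Java under stated assumptions: after step-1 and step-2 have produced one decoded buffer per target, a failed step-3 transfer is retried a bounded number of times using the same decoded buffer, so the expensive read and decode are not repeated. The names `TargetWriter`, `FlakyTarget`, `transferWithRetry` and `maxRetries` are hypothetical illustrations, not part of the actual {{ECWorker}} code or any HDFS API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: retry step-3 (transfer) per target, reusing the
 * already-decoded data instead of restarting reconstruction from step-1.
 * None of these names come from ECWorker or the HDFS API.
 */
public class StripedTransferRetrySketch {

    /** Hypothetical abstraction of the connection to one target DN. */
    interface TargetWriter {
        String name();
        void write(ByteBuffer decoded) throws IOException;
    }

    /**
     * Sends decoded.get(i) to targets.get(i). A target that throws is retried
     * up to maxRetries extra times. Returns the names of targets that still
     * failed; those would be left for a later reconstruction round.
     */
    static List<String> transferWithRetry(List<TargetWriter> targets,
                                          List<ByteBuffer> decoded,
                                          int maxRetries) {
        List<String> failed = new ArrayList<>();
        for (int i = 0; i < targets.size(); i++) {
            TargetWriter target = targets.get(i);
            boolean ok = false;
            for (int attempt = 0; attempt <= maxRetries && !ok; attempt++) {
                try {
                    // duplicate() gives each attempt its own position/limit, so a
                    // partial write cannot consume the shared decoded buffer
                    target.write(decoded.get(i).duplicate());
                    ok = true;
                } catch (IOException e) {
                    // only the transfer is retried; no re-read, no re-decode
                }
            }
            if (!ok) {
                failed.add(target.name());
            }
        }
        return failed;
    }

    /** Test double (hypothetical): fails a fixed number of times, then succeeds. */
    static final class FlakyTarget implements TargetWriter {
        private final String name;
        private int failuresLeft;
        int attempts = 0;
        int bytesReceived = 0;

        FlakyTarget(String name, int failuresBeforeSuccess) {
            this.name = name;
            this.failuresLeft = failuresBeforeSuccess;
        }

        @Override public String name() { return name; }

        @Override public void write(ByteBuffer decoded) throws IOException {
            attempts++;
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new IOException(name + " disconnected");
            }
            bytesReceived = decoded.remaining();
        }
    }

    public static void main(String[] args) {
        List<TargetWriter> targets = List.of(
            new FlakyTarget("dn1", 0),   // healthy
            new FlakyTarget("dn2", 1),   // fails once, then recovers
            new FlakyTarget("dn3", 5));  // stays down
        List<ByteBuffer> decoded = List.of(
            ByteBuffer.allocate(64), ByteBuffer.allocate(64), ByteBuffer.allocate(64));
        List<String> failed = transferWithRetry(targets, decoded, 2);
        System.out.println("failed targets: " + failed); // prints "failed targets: [dn3]"
    }
}
```

The retry count is bounded so that a DN which is really dead cannot stall the worker. Anything still in the returned list would fall back to the present behaviour of being handled in a later round. A real implementation might also add a backoff between attempts or pick a replacement target, which this sketch does not attempt.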