Linxiao Jin created HDFS-8878:
---------------------------------

             Summary: An HDFS built-in DistCp 
                 Key: HDFS-8878
                 URL: https://issues.apache.org/jira/browse/HDFS-8878
             Project: Hadoop HDFS
          Issue Type: New Feature
            Reporter: Linxiao Jin
            Assignee: Linxiao Jin


For now, we use DistCp to copy directories, which works quite well. However, 
it would be better to have an efficient directory copy tool built into HDFS. 
It could be faster by cutting out the redundant communication between HDFS, 
YARN and MapReduce. It could also free the resources DistCp consumes in the job 
tracker and YARN, and it would be easier to debug.

We need more discussion on a new protocol between NNs and DNs of different 
clusters to support HDFS-level command sending and data transfer. One available 
hacky solution could be: the source NN gets the block distribution of the file 
to be copied and asks each datanode to start a DFSClient and copy its local 
block, read via short-circuit, as a file in the dst cluster. After all the 
block files in the dst cluster are complete, a DFSClient concats them together 
to form the destination file. A more optimized solution might implement a 
newly designed protocol for communication across clusters instead of going 
through DFSClient, and use lower-layer methods.
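
To make the hacky solution above more concrete, here is a minimal sketch of 
the planning step only: given the block distribution of the source file, it 
produces one copy task per block (to be run on the datanode that holds it) and 
the order in which the resulting part files would be concatenated. All names 
here (BlockCopyPlan, Block, CopyTask, the "._part-NNNNN" naming) are 
hypothetical and not part of HDFS. The sketch assumes one chosen replica host 
per block, and it does not model the actual data transfer or any 
preconditions the real concat operation may impose.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical planner for a block-wise, cross-cluster file copy. */
public class BlockCopyPlan {

    /** One block of the source file and a datanode that holds a local replica. */
    static final class Block {
        final long offset;
        final long length;
        final String host;

        Block(long offset, long length, String host) {
            this.offset = offset;
            this.length = length;
            this.host = host;
        }
    }

    /** One unit of work: a datanode writes its local block as a part file in dst. */
    static final class CopyTask {
        final String host;
        final long offset;
        final long length;
        final String partPath;

        CopyTask(String host, long offset, long length, String partPath) {
            this.host = host;
            this.offset = offset;
            this.length = length;
            this.partPath = partPath;
        }
    }

    /** Builds one copy task per block, ordered by the block's offset in the file. */
    static List<CopyTask> plan(List<Block> blocks, String dstFile) {
        List<Block> sorted = new ArrayList<>(blocks);
        sorted.sort(Comparator.comparingLong((Block b) -> b.offset));

        List<CopyTask> tasks = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Block b = sorted.get(i);
            // Part files are numbered so that name order matches offset order.
            String part = String.format(Locale.ROOT, "%s._part-%05d", dstFile, i);
            tasks.add(new CopyTask(b.host, b.offset, b.length, part));
        }
        return tasks;
    }

    /** Returns the part files in the order they must be concatenated. */
    static List<String> concatOrder(List<CopyTask> tasks) {
        List<String> parts = new ArrayList<>();
        for (CopyTask t : tasks) {
            parts.add(t.partPath);
        }
        return parts;
    }
}
```

In this sketch the source NN would call plan() with the block list, hand each 
CopyTask to the datanode named in it, and, once every part file is complete, 
pass concatOrder() to a single client that concats the parts into dstFile.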



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
