[ https://issues.apache.org/jira/browse/HDFS-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HDFS-8878.
----------------------------------
    Resolution: Duplicate

> An HDFS built-in DistCp 
> ------------------------
>
>                 Key: HDFS-8878
>                 URL: https://issues.apache.org/jira/browse/HDFS-8878
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Linxiao Jin
>            Assignee: Linxiao Jin
>            Priority: Major
>
> For now, we use DistCp to do directory copies, which works quite well. However,
> it would be better to have an efficient directory copy tool built into HDFS.
> It could be faster by cutting out the redundant communication between HDFS,
> YARN and MapReduce. It would also free the resources DistCp consumes in the
> job tracker and YARN, and it would be easier to debug.
> We need more discussion of a new protocol between the NN and DNs of different
> clusters to achieve HDFS-level command sending and data transfer. One
> available hacky solution: the srcNN gets the block distribution of the file
> to be copied and asks each datanode to start a DFSClient and copy its local
> short-circuited blocks as files into the dst cluster. After all the block
> files in the dst cluster are complete, a DFSClient concatenates them to form
> the target destination file. A more optimized solution might implement a
> newly designed protocol to communicate across clusters instead of going
> through DFSClient, using methods from lower layers.
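The "hacky" plan in the description splits into a planning step on the source side and a final concat on the destination side. Below is a minimal Python sketch of only the planning step, under a simplified model. It picks one replica-holding source datanode per block, names a per-block part file in the dst cluster, and returns the ordered part list that the final concat would consume. `Block`, `plan_block_copy`, the `.part-NNNNN` naming, and the "least bytes assigned so far" replica choice are hypothetical illustrations. They are not HDFS APIs or the reporter's design.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for one block of the file being copied."""
    index: int                  # position of the block within the file
    length: int                 # bytes in this block
    replicas: Tuple[str, ...]   # source datanodes holding a local replica


def plan_block_copy(
    dst_file: str, blocks: List[Block]
) -> Tuple[Dict[str, List[Tuple[int, str]]], List[str]]:
    """Assign each block to one source datanode and name its dst part file.

    Returns (per-datanode copy tasks, part files in block order for concat).
    """
    load: Dict[str, int] = {}                     # bytes assigned per datanode
    tasks: Dict[str, List[Tuple[int, str]]] = {}  # datanode -> [(block, part)]
    parts: List[str] = []
    for block in sorted(blocks, key=lambda b: b.index):
        if not block.replicas:
            raise ValueError(f"block {block.index} has no live replica")
        # Choose the replica holder with the fewest bytes assigned so far
        # (ties broken by name) to spread the copy work across datanodes.
        dn = min(block.replicas, key=lambda d: (load.get(d, 0), d))
        load[dn] = load.get(dn, 0) + block.length
        part = f"{dst_file}.part-{block.index:05d}"
        tasks.setdefault(dn, []).append((block.index, part))
        parts.append(part)
    return tasks, parts


if __name__ == "__main__":
    blocks = [
        Block(0, 128, ("dn1", "dn2")),
        Block(1, 128, ("dn1", "dn3")),
        Block(2, 64, ("dn1",)),
    ]
    tasks, parts = plan_block_copy("/dst/data.bin", blocks)
    for dn, work in sorted(tasks.items()):
        print(dn, work)
    print("concat order:", parts)
```

Each datanode would then write its assigned blocks as the named part files. Because the part list is kept in block order, the concat step reassembles the file correctly no matter which datanode copied which block.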



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
