[ https://issues.apache.org/jira/browse/HDFS-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran resolved HDFS-8878.
----------------------------------
    Resolution: Duplicate

> An HDFS built-in DistCp
> ------------------------
>
>                 Key: HDFS-8878
>                 URL: https://issues.apache.org/jira/browse/HDFS-8878
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Linxiao Jin
>            Assignee: Linxiao Jin
>            Priority: Major
>
> For now, we use DistCp to do directory copy, which works quite well. However,
> it would be better if there were a built-in, efficient directory copy tool in
> HDFS. It could be faster by cutting out the redundant communication between
> HDFS, YARN and MapReduce. It could also free the resources DistCp consumes in
> the job tracker and YARN, and be easier to debug.
> We need more discussion on the new protocol between NN and DN from different
> clusters to achieve HDFS-level command sending and data transfer. One
> available hacky solution could be for the srcNN to get the block distribution
> of the target file and ask each datanode to start a DFSClient and copy its
> local short-circuited blocks as files in the dst cluster. After all the block
> files in the dst cluster are complete, use a DFSClient to concat them together
> to form the destination file. A more optimized solution might implement a
> newly designed protocol for communicating across clusters, rather than going
> through DFSClient, and use methods from a lower layer.
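
The "hacky solution" in the description above (split the file by block, copy each block as its own file, then concat) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses ordinary local files as a stand-in for HDFS, a thread pool as a stand-in for the datanodes, and the names `block_ranges`, `copy_block`, `concat_parts` and `block_copy` are hypothetical, not part of any Hadoop API.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def block_ranges(file_size, block_size):
    """Return (offset, length) for each block of a file, in block order."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def copy_block(src_path, dst_dir, index, offset, length):
    """Copy one block of the source into its own part file.

    Stand-in for the step where a datanode copies a local block
    as a file in the destination cluster.
    """
    part_path = os.path.join(dst_dir, f"part-{index:05d}")
    with open(src_path, "rb") as src, open(part_path, "wb") as part:
        src.seek(offset)
        part.write(src.read(length))
    return part_path


def concat_parts(part_paths, target_path):
    """Join the part files in block order, deleting each part afterwards.

    Stand-in for the final concat step on the destination side.
    """
    with open(target_path, "wb") as target:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                shutil.copyfileobj(part, target)
            os.remove(part_path)


def block_copy(src_path, dst_dir, target_name, block_size=4):
    """Copy src_path to dst_dir/target_name block by block, in parallel."""
    ranges = block_ranges(os.path.getsize(src_path), block_size)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(copy_block, src_path, dst_dir, i, off, length)
                   for i, (off, length) in enumerate(ranges)]
        # Collect results in submission order so block order is preserved.
        part_paths = [f.result() for f in futures]
    target_path = os.path.join(dst_dir, target_name)
    concat_parts(part_paths, target_path)
    return target_path
```

The sketch leaves out everything that makes the real problem hard (cross-cluster protocol, failures, replication); it only shows that the block copies are independent of one another and that ordering matters only in the concat step.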