Hey team,

We're planning to migrate some of our data from an obsolete Hadoop 2.7 cluster to a more recent Hadoop 3 cluster.

There are approximately 60 DataNodes on the old cluster and approximately 10 on the new one. The new cluster will grow over the coming months, but since some of the use cases are migrating out of Hadoop, we'll need to downsize overall.

Anyway, we are planning to use a distributed copy (DistCp) to move the data, but I have a couple of questions:

- Can you confirm that DistCp has to be run on the new cluster? Since the HDFS client on Hadoop 2.x won't be able to write to Hadoop 3, DistCp has to run on Hadoop 3. We wanted to launch DistCp on the "old" cluster: since it is bigger, the copy should have been faster, but I do not think that is technically possible. I also expect the network to become the bottleneck at some point.

- The doc confuses me a bit (https://hadoop.apache.org/docs/stable/hadoop-distcp/DistCp.html#Copying_Between_Versions_of_HDFS). It looks like using webhdfs is required. Is that still relevant?
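
For reference, here is a rough sketch of the kind of command we have in mind, launched from the Hadoop 3 side. It reads from the old cluster over webhdfs and writes to the new one over hdfs, which is my reading of the doc linked above. The hostnames, ports, path, and map count are all placeholders I made up for illustration, not our real values.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with the real NameNode hosts, ports, and paths.
SRC_NN="old-nn.example.com"     # Hadoop 2.7 NameNode (assumed hostname)
SRC_HTTP_PORT="50070"           # NameNode HTTP port, 50070 is the Hadoop 2.x default
DST_NN="new-nn.example.com"     # Hadoop 3 NameNode (assumed hostname)
DST_RPC_PORT="8020"             # NameNode RPC port (assumed, check fs.defaultFS)
DATA_PATH="/data/to/migrate"    # assumed path, same on both clusters
NUM_MAPS="20"                   # -m caps the number of simultaneous copy tasks

# Source is read over webhdfs, destination is written over native hdfs.
DISTCP_CMD="hadoop distcp -m ${NUM_MAPS} webhdfs://${SRC_NN}:${SRC_HTTP_PORT}${DATA_PATH} hdfs://${DST_NN}:${DST_RPC_PORT}${DATA_PATH}"

# Print the command for review; set RUN=1 to actually launch it.
echo "$DISTCP_CMD"
if [ "${RUN:-0}" = "1" ]; then
    $DISTCP_CMD
fi
```

By default the script only prints the command, so it can be reviewed before anything is copied.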

Thanks a lot

PA

