DistCp is a full-blown MapReduce job (mapper only, where the mappers do a "fully" parallel copy to the destination).
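To make the map-only idea concrete, here is a rough Python sketch (not Hadoop code). Threads stand in for mappers and local directories stand in for HDFS paths. The names `split_round_robin`, `copy_share` and `parallel_copy` are made up for illustration, and the round-robin split is an assumption; the real job has its own split strategy.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_round_robin(files, num_mappers):
    """Deal the file list into num_mappers shares (a stand-in for input splits)."""
    return [files[i::num_mappers] for i in range(num_mappers)]


def copy_share(share, dest_dir):
    """One 'mapper': copy every file in its share and return the names copied."""
    copied = []
    for src in share:
        shutil.copy2(src, dest_dir / src.name)
        copied.append(src.name)
    return copied


def parallel_copy(files, dest_dir, num_mappers=4):
    """Map-only copy: shares are copied independently and there is no reduce step."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Drop empty shares so we never start a worker with nothing to do.
    shares = [s for s in split_round_robin(files, num_mappers) if s]
    with ThreadPoolExecutor(max_workers=num_mappers) as pool:
        results = pool.map(copy_share, shares, [dest_dir] * len(shares))
        # Collect the per-mapper results into one sorted list of file names.
        return sorted(name for share in results for name in share)
```

Calling `parallel_copy(list_of_paths, Path("dst"), num_mappers=4)` copies the files with up to four workers at once. A plain loop over the same list would copy them one at a time through a single process.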
cp appears (correct me if I'm wrong) to simply invoke the FileSystem API and issue a copy command for every source file.

I have an additional question: how is cp, which is internal to a cluster, optimized (if at all)?

On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 <shurong....@qunar.com> wrote:

> Hi,
>
> I think it's better to use cp within the same cluster and DistCp
> between clusters. The cp command is a Hadoop-internal parallel process and
> will not copy files locally.
>
> ------------------------------
> 麦树荣
>
> *From:* KayVajj <vajjalak...@gmail.com>
> *Date:* 2013-04-11 06:20
> *To:* user@hadoop.apache.org
> *Subject:* Copy Vs DistCP
>
> I have a few questions regarding the usage of DistCp for copying
> files in the same cluster.
>
> 1) Which one is better within the same cluster, and what factors (like file
> size etc.) would influence the usage of one over the other?
>
> 2) When we run a cp command like the one below from a client node of the
> cluster (not a data node), how does the cp command work?
> i) like an MR job
> ii) copy files locally and then copy them back to the new location
>
> Example of the copy command:
>
> hdfs dfs -cp /<some_location>/file /<new_location>/
>
> Thanks, your responses are appreciated.
>
> -- Kay

--
Jay Vyas
http://jayunit100.blogspot.com