DistCp is a full-blown MapReduce job (map-only, where the mappers do a
"fully" parallel copy to the destination).

cp appears (correct me if I'm wrong) to simply invoke the FileSystem API and
issue a copy command for every source file.
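
To make the contrast concrete, here is a rough local-filesystem analogy in
Python. It is not Hadoop code and does not call any Hadoop API: serial_copy
and parallel_copy are hypothetical names, threads stand in for DistCp's map
tasks, and the round-robin split is only an assumption made for illustration.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def serial_copy(sources, dest_dir):
    """Analogy for cp: a single client copies each source file in turn."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    for src in sources:
        shutil.copy(src, dest_dir / src.name)


def parallel_copy(sources, dest_dir, num_workers=4):
    """Analogy for DistCp: the file list is split across independent workers."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Round-robin split (an assumption; DistCp divides its listing its own way).
    chunks = [sources[i::num_workers] for i in range(num_workers)]

    def copy_chunk(chunk):
        for src in chunk:
            shutil.copy(src, dest_dir / src.name)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # list() forces evaluation so any worker exception is raised here.
        list(pool.map(copy_chunk, chunks))


def demo():
    """Copy ten small files both ways and return the resulting file names."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src_dir = root / "src"
        src_dir.mkdir()
        sources = []
        for i in range(10):
            path = src_dir / f"part-{i:05d}"
            path.write_text(f"record {i}\n")
            sources.append(path)
        serial_copy(sources, root / "serial")
        parallel_copy(sources, root / "parallel")
        serial = sorted(p.name for p in (root / "serial").iterdir())
        parallel = sorted(p.name for p in (root / "parallel").iterdir())
        return serial, parallel
```

Both functions end up with the same set of files; the only difference is
whether one process walks the list or several workers share it, which is why
the number and size of files matter when choosing between the two.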

I have an additional question: how is cp, when used within a single cluster,
optimized (if at all)?



On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 <shurong....@qunar.com> wrote:

> Hi,
>
> I think it's better to use cp within the same cluster and DistCp
> between clusters. The cp command is an internal, parallel Hadoop process
> and will not copy files locally.
>
> ------------------------------
>  麦树荣
>
>  *From:* KayVajj <vajjalak...@gmail.com>
> *Date:* 2013-04-11 06:20
> *To:* user@hadoop.apache.org
> *Subject:* Copy Vs DistCP
> I have a few questions regarding the use of DistCp for copying
> files within the same cluster.
>
>
> 1) Which one is better within the same cluster, and what factors (like file
> size etc.) would influence the choice of one over the other?
>
>  2) When we run a cp command like the one below from a client node of the
> cluster (not a data node), how does the cp command work?
>       i) Like an MR job, or
>      ii) by copying the files locally and then copying them to the new location?
>
>  Example of the copy command
>
>  hdfs dfs -cp /<some_location>/file /<new_location>/
>
>  Thanks, your responses are appreciated.
>
>  -- Kay
>



-- 
Jay Vyas
http://jayunit100.blogspot.com
