Re: Copy Vs DistCP

KayVajj Wed, 10 Apr 2013 17:17:57 -0700

The File System Copy utility copies files byte by byte, if I'm not wrong.
Could the cp command instead work on whole blocks and move them, which
would be significantly more efficient?



Also, how does the cp command work if the file is distributed across
different data nodes?
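
To make the "byte by byte" picture concrete: as far as I understand it, cp
opens the source as an input stream, creates the destination as a new file,
and pumps the bytes through the client in buffer-sized chunks. The read side
fetches each block from whichever data node holds it, so the caller only
ever sees one continuous stream. The Python below is just a sketch of that
loop using in-memory streams, not Hadoop code; `copy_bytes` and the buffer
size are names and values I made up for illustration.

```python
import io


def copy_bytes(src, dst, buf_size=4096):
    """Pump everything from src into dst through a fixed-size buffer.

    Hypothetical helper: it only models a client-side streaming copy,
    it does not talk to HDFS. Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(buf_size)  # next chunk, at most buf_size bytes
        if not chunk:               # empty read means end of stream
            break
        dst.write(chunk)            # every byte passes through the caller
        total += len(chunk)
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 10000)   # stand-in for the source file
    target = io.BytesIO()               # stand-in for the destination file
    copied = copy_bytes(source, target)
    print(copied, target.getvalue() == source.getvalue())
```

If that picture is right, a single cp is limited by one client's read and
write throughput, which is the part DistCP spreads across mappers.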

Thanks
Kay


On Wed, Apr 10, 2013 at 4:48 PM, Jay Vyas <jayunit...@gmail.com> wrote:

> DistCP is a full-blown MapReduce job (mapper only, where the mappers do a
> "fully" parallel copy to the destination).
>
> CP appears (correct me if I'm wrong) to simply invoke the FileSystem and
> issue a copy command for every source file.
>
> I have an additional question: how is CP, which is internal to a cluster,
> optimized (if at all)?
>
>
>
> On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 <shurong....@qunar.com> wrote:
>
>> Hi,
>>
>> I think it's better to use Copy within the same cluster and DistCP
>> between clusters. The cp command is a Hadoop-internal parallel process and
>> will not copy files locally.
>>
>> ------------------------------
>>  麦树荣
>>
>>  *From:* KayVajj <vajjalak...@gmail.com>
>> *Date:* 2013-04-11 06:20
>> *To:* user@hadoop.apache.org
>> *Subject:* Copy Vs DistCP
>> I have a few questions regarding the use of DistCP for copying
>> files within the same cluster.
>>
>>
>> 1) Which one is better within the same cluster, and what factors (like file
>> size etc.) would influence the choice of one over the other?
>>
>>  2) When we run a cp command like the one below from a client node of the
>> cluster (not a data node), how does the cp command work?
>>       i) like an MR job, or
>>      ii) by copying the files locally and then copying them back to the new location?
>>
>>  Example of the copy command
>>
>>  hdfs dfs -cp /<some_location>/file /<new_location>/
>>
>>  Thanks, your responses are appreciated.
>>
>>  -- Kay
>>
>
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>
