If the cp command is not parallel, how does it work for a file whose blocks are spread across various data nodes?
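To make the question concrete, here is a rough toy model, not Hadoop code. It assumes that a non-parallel copy means the client reads the file's blocks one after another, from whichever node holds each block, and writes the bytes back out as a new file. Every name in it (`DataNode`, `NameNode`, `write_file`, `read_file`, `copy_file`) is made up for illustration, the round-robin placement is an assumption, and replication is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # tiny block size for illustration only; real HDFS blocks are far larger


@dataclass
class DataNode:
    """Hypothetical stand-in for a data node: maps block id -> bytes."""
    name: str
    blocks: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class NameNode:
    """Hypothetical stand-in for the name node: maps path -> ordered block locations."""
    files: Dict[str, List[Tuple[str, DataNode]]] = field(default_factory=dict)


def write_file(nn: NameNode, nodes: List[DataNode], path: str, data: bytes) -> None:
    """Split data into blocks and place them round-robin on the nodes (an assumption)."""
    locations = []
    for offset in range(0, len(data), BLOCK_SIZE):
        index = offset // BLOCK_SIZE
        node = nodes[index % len(nodes)]
        block_id = f"{path}#{index}"
        node.blocks[block_id] = data[offset:offset + BLOCK_SIZE]
        locations.append((block_id, node))
    nn.files[path] = locations


def read_file(nn: NameNode, path: str) -> bytes:
    """Read the blocks one at a time, in order, from whichever node holds each."""
    return b"".join(node.blocks[block_id] for block_id, node in nn.files[path])


def copy_file(nn: NameNode, nodes: List[DataNode], src: str, dst: str) -> None:
    """Single-threaded copy: pull the source through the client, then write a new file."""
    write_file(nn, nodes, dst, read_file(nn, src))


# Usage: a 12-byte file becomes three blocks on three nodes, then is copied.
nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
nn = NameNode()
write_file(nn, nodes, "/a/file", b"hello hadoop")
copy_file(nn, nodes, "/a/file", "/b/file")
assert read_file(nn, "/b/file") == b"hello hadoop"
```

In this model the source being spread over several nodes does not require parallelism: the reader just follows the block list in order. Whether the real cp behaves this way end to end is exactly what I am asking.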

On Wed, Apr 10, 2013 at 6:30 PM, Azuryy Yu <azury...@gmail.com> wrote:

> The cp command is not parallel. It just calls FileSystem, even though
> DFSClient has multiple threads.
>
> DistCp can work well on the same cluster.
>
> On Thu, Apr 11, 2013 at 8:17 AM, KayVajj <vajjalak...@gmail.com> wrote:
>
>> The File System Copy utility copies files byte by byte, if I'm not wrong.
>> Could the cp command instead work with blocks and move them, which
>> could be significantly more efficient?
>>
>> Also, how does the cp command work if the file is distributed across
>> different data nodes?
>>
>> Thanks
>> Kay
>>
>> On Wed, Apr 10, 2013 at 4:48 PM, Jay Vyas <jayunit...@gmail.com> wrote:
>>
>>> DistCP is a full-blown MapReduce job (mapper only, where the mappers do
>>> a "fully" parallel copy to the destination).
>>>
>>> CP appears (correct me if I'm wrong) to simply invoke the FileSystem and
>>> issue a copy command for every source file.
>>>
>>> I have an additional question: how is CP, which is internal to a cluster,
>>> optimized (if at all)?
>>>
>>> On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 <shurong....@qunar.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I think it's better to use Copy within the same cluster and distCP
>>>> between clusters. The cp command is a Hadoop-internal parallel process
>>>> and will not copy files locally.
>>>>
>>>> ------------------------------
>>>> 麦树荣
>>>>
>>>> From: KayVajj <vajjalak...@gmail.com>
>>>> Date: 2013-04-11 06:20
>>>> To: user@hadoop.apache.org
>>>> Subject: Copy Vs DistCP
>>>>
>>>> I have a few questions regarding the usage of DistCP for copying
>>>> files in the same cluster.
>>>>
>>>> 1) Which one is better within the same cluster, and what factors (like
>>>> file size etc.) would influence the usage of one over the other?
>>>>
>>>> 2) When we run a cp command like the one below from a client node of
>>>> the cluster (not a data node), how does the cp command work:
>>>>    i) like an MR job, or
>>>>    ii) by copying the files locally and then copying them back to the
>>>>        new location?
>>>>
>>>> Example of the copy command:
>>>>
>>>> hdfs dfs -cp /<some_location>/file /<new_location>/
>>>>
>>>> Thanks, your responses are appreciated.
>>>>
>>>> -- Kay
>>>
>>> --
>>> Jay Vyas
>>> http://jayunit100.blogspot.com