Are you calling one command per file? That's bound to be slow, since every "hadoop fs" invocation starts a new JVM.

On Jan 29, 2014 7:15 AM, "Jay Vyas" <jayunit...@gmail.com> wrote:
> I'm finding that "hadoop fs -put" on a cluster is quite slow for me when I
> have large numbers of small files... much slower than native file ops.
> Note that I'm using the RawLocalFileSystem as the underlying backing
> filesystem that is being written to in this case, so HDFS isn't the issue.
>
> I see that the Put class creates a linked list with as many elements as
> there are in the path.
>
> 1) Is there a more performant way to run "fs -put"?
>
> 2) Has anyone else noted that "fs -put" has extra overhead?
>
> I'm going to trace some more, but just wanted to bounce this off the
> mailing list... maybe others have also run into this issue.
>
> ** Is "hadoop fs -put" inherently slower than a Unix "cp" action,
> regardless of filesystem -- and if so, why? **
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
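
To illustrate the JVM-per-command point from the reply: "hadoop fs -put" accepts several local sources followed by one destination, so many small files can share a single JVM start-up. Below is a rough sketch of that batching, not a tuned tool. The helper names (build_put_commands, run_put) and the batch size of 500 are made up for illustration, and it assumes the "hadoop" launcher is on the PATH.

```python
import subprocess
from typing import List, Sequence


def build_put_commands(local_files: Sequence[str], dest_dir: str,
                       batch_size: int = 500) -> List[List[str]]:
    """Group local files into a few `hadoop fs -put` invocations.

    Each returned argv list copies up to `batch_size` files, so the JVM
    start-up cost is paid once per batch rather than once per file.
    The default of 500 is an arbitrary illustrative value.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    commands = []
    for start in range(0, len(local_files), batch_size):
        batch = list(local_files[start:start + batch_size])
        # Form: hadoop fs -put <localsrc> ... <dst>
        commands.append(["hadoop", "fs", "-put", *batch, dest_dir])
    return commands


def run_put(local_files: Sequence[str], dest_dir: str,
            batch_size: int = 500) -> None:
    """Run the batched commands; assumes `hadoop` is on the PATH."""
    for argv in build_put_commands(local_files, dest_dir, batch_size):
        subprocess.run(argv, check=True)
```

For example, run_put(sorted(glob.glob("data/*")), "/dest") would launch one JVM per 500 files. Batching keeps each command line below the operating system's argument-length limit; if everything fits in one command, a single invocation is simpler still.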