The input files are provided as arguments to a binary that is executed by the map process. This binary cannot read from HDFS, and I can't rewrite it.

On 25 Jan 2014 19:47, "John Lilley" <john.lil...@redpoint.net> wrote:
> There are no short-circuit writes, only reads, AFAIK.
>
> Is it necessary to transfer from HDFS to local disk? Can you read from
> HDFS directly using the FileSystem interface?
>
> john
>
> *From:* Shekhar Sharma [mailto:shekhar2...@gmail.com]
> *Sent:* Saturday, January 25, 2014 3:44 AM
> *To:* user@hadoop.apache.org
> *Subject:* Re: HDFS data transfer is faster than SCP based transfer?
>
> We have the concept of short-circuit reads, which read directly from the
> datanode and improve read performance. Do we have a similar concept for
> writes, i.e. short-circuit writes?
>
> On 25 Jan 2014 16:10, "Harsh J" <ha...@cloudera.com> wrote:
>
> There's a lot of difference here. Although both do use TCP underneath,
> note that SCP securely encrypts data, but the stock HDFS configuration
> does not.
>
> You can also ask SCP to compress the data transfer via the "-C" argument,
> btw (unsure if you already applied that pre-test); it may help show up
> some difference. Also, if security is not a concern during the transfer,
> the encryption algorithm can be changed to a weaker one via "-c arcfour".
>
> On Fri, Jan 24, 2014 at 10:55 AM, rab ra <rab...@gmail.com> wrote:
> > Hello
> >
> > I have a use case that requires transferring input files from remote
> > storage using the SCP protocol (using the JSch jar). To optimize this use
> > case, I have pre-loaded all my input files into HDFS and modified my use
> > case so that it copies the required files from HDFS. So, when the
> > tasktrackers work, they copy the required number of input files to their
> > local directories from HDFS. All my tasktrackers are also datanodes.
> > I could see my use case has run faster. The only modification in my
> > application is that files are copied from HDFS instead of transferred
> > using SCP. Also, my use case involves parallel operations (run in the
> > tasktrackers) and they do a lot of file transfers. Now all these
> > transfers are replaced with HDFS copies.
> >
> > Can anyone tell me why HDFS transfer is faster, as I witnessed? Is it
> > because it uses TCP/IP? Can anyone give me reasonable explanations for
> > the decrease in time?
> >
> > with thanks and regards
> > rab
>
> --
> Harsh J
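Because the binary only accepts local file paths, the inputs have to be staged on the task's local disk before it runs, which is what the use case already does. A minimal Python sketch of that staging step follows. It assumes the `hdfs` command-line client is on the task node's PATH (`hdfs dfs -get <src> <localdst>` copies an HDFS file to the local filesystem). The names `stage_and_run` and `fetch_cmd` are hypothetical; `fetch_cmd` is a parameter only so the sketch can be exercised with plain `cp` where no cluster is available.

```python
import os
import subprocess
import tempfile

# Assumption: the Hadoop CLI is installed on the task node; `hdfs dfs -get SRC DST`
# copies one HDFS file to a local path. Swappable for testing (e.g. ("cp",)).
DEFAULT_FETCH_CMD = ("hdfs", "dfs", "-get")


def stage_and_run(remote_paths, binary_cmd, fetch_cmd=DEFAULT_FETCH_CMD):
    """Hypothetical helper: copy each remote file into a private temporary
    directory, then run the unmodifiable binary with the local copies as its
    file arguments. Returns the binary's CompletedProcess (stdout captured)."""
    with tempfile.TemporaryDirectory() as workdir:
        local_paths = []
        for i, remote in enumerate(remote_paths):
            # Prefix with the index so inputs that share a basename don't collide.
            local = os.path.join(workdir, f"{i}_{os.path.basename(remote)}")
            subprocess.run([*fetch_cmd, remote, local], check=True)
            local_paths.append(local)
        # The binary only ever sees ordinary local files.
        result = subprocess.run(
            [*binary_cmd, *local_paths], check=True, capture_output=True, text=True
        )
    # Leaving the `with` block deletes the temporary directory and staged copies.
    return result
```

For example, `stage_and_run(["/user/rab/input/part-0001.dat"], ["/opt/tool/run"])` would fetch one (hypothetical) input and pass its local path to a (hypothetical) tool. Inside a Java map task, the in-process equivalent of the fetch step is `FileSystem.copyToLocalFile(Path src, Path dst)`, which avoids starting a separate `hdfs` client JVM for every file.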
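To make the SCP baseline a fairer comparison, the "-C" and "-c" options from Harsh's reply can be tried as separate variants and timed. A minimal Python sketch, assuming the OpenSSH `scp` client; `scp_command`, `best_time`, and the host and paths are hypothetical. Newer OpenSSH releases may no longer accept `arcfour`; `ssh -Q cipher` lists the ciphers the local client supports. Compression only helps if the input files are compressible.

```python
import subprocess
import time


def scp_command(src, dst, compress=False, cipher=None):
    """Build an OpenSSH scp argv: -C turns on compression, -c picks the cipher."""
    cmd = ["scp"]
    if compress:
        cmd.append("-C")
    if cipher is not None:
        cmd += ["-c", cipher]
    return cmd + [src, dst]


def best_time(cmd, repeats=3):
    """Run cmd `repeats` times and return the fastest wall-clock time in seconds.
    Taking the minimum reduces noise from other load on the node."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


SRC = "storage-host:/data/input.bin"  # hypothetical remote file
DST = "/tmp/input.bin"                # hypothetical local destination
variants = {
    "default": scp_command(SRC, DST),
    "compressed": scp_command(SRC, DST, compress=True),
    # arcfour as suggested in the thread; newer OpenSSH clients may reject it.
    "compressed+arcfour": scp_command(SRC, DST, compress=True, cipher="arcfour"),
}
```

Looping over `variants` and calling `best_time(cmd)` on each gives one number per SCP setting; the same helper can time `["hdfs", "dfs", "-get", ...]` for the HDFS side, so both paths are measured the same way. JSch has its own cipher and compression settings, which this sketch does not cover.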