Dave
From: Himanish Kushary [mailto:himan...@gmail.com]
Sent: Friday, March 29, 2013 9:57 PM
To: user@hadoop.apache.org
Subject: Re: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput
Yes, you are right, CDH4 is the 2.x line, but I even checked the javadocs
for the 1.0.4 branch
Hi Himanish,
2013/3/29 Himanish Kushary
> [...]
>
>
> But the real issue is the throughput. You mentioned that you had
> transferred 1.5 TB in 45 mins which comes to around 583 MB/s. I am barely
> getting 4 MB/s upload speed
>
How large is your outgoing link? Can you expect 500 MB/s with it?
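For reference, the arithmetic behind the quoted figures can be checked with a small sketch. It assumes binary units (1 TB = 1024 * 1024 MB), which is what reproduces the 583 MB/s figure; the class and method names are made up for illustration.

```java
class ThroughputEstimate {
    // Assumption: binary units, 1 TB = 1024 * 1024 MB.
    static final double MB_PER_TB = 1024.0 * 1024.0;

    // Average rate in MB/s needed to move `terabytes` in `minutes`.
    static double megabytesPerSecond(double terabytes, double minutes) {
        return terabytes * MB_PER_TB / (minutes * 60.0);
    }

    // Hours needed to move `terabytes` at a sustained `mbPerSecond`.
    static double hoursToTransfer(double terabytes, double mbPerSecond) {
        return terabytes * MB_PER_TB / mbPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        // 1.5 TB in 45 minutes, as quoted in the thread.
        System.out.printf("1.5 TB in 45 min: %.0f MB/s%n", megabytesPerSecond(1.5, 45));
        // The same 1.5 TB at the 4 MB/s upload speed reported above.
        System.out.printf("1.5 TB at 4 MB/s: %.0f hours%n", hoursToTransfer(1.5, 4));
    }
}
```

At 4 MB/s the same volume would take roughly four and a half days, so the capacity of the outgoing link is the first thing to rule out.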
".*part.*",
>
> "--dest", "s3n://fruggmapreduce/results-" + env + "/" + JobUtils.isoDate + "/output/itemtable/",
> "--s3Endpoint", "s3.amazonaws.com" });
>
> Watch the “srcPattern”, make sure you have that leading `.*`, that one
> threw me for a loop once.
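A plausible reason the leading `.*` matters is sketched below. It assumes the pattern has to match the entire source path rather than a substring of it; that behaviour is an assumption here, not something confirmed in this thread, and the path and class name are hypothetical.

```java
import java.util.regex.Pattern;

class SrcPatternCheck {
    // Assumption: the pattern is applied with whole-string semantics
    // (Matcher.matches()), not substring search (Matcher.find()).
    static boolean matchesWholePath(String regex, String path) {
        return Pattern.compile(regex).matcher(path).matches();
    }

    public static void main(String[] args) {
        // Hypothetical source path, for illustration only.
        String path = "hdfs:///example/output/itemtable/part-00000";

        // With the leading ".*" the directory prefix is consumed.
        System.out.println(matchesWholePath(".*part.*", path)); // true

        // Without it, the pattern cannot match a path that starts with "hdfs".
        System.out.println(matchesWholePath("part.*", path));   // false
    }
}
```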
>
> Dave
From: Himanish Kushary [mailto:himan...@gmail.com]
Sent: Thursday, March 28, 2013 5:51 PM
To: user@hadoop.apache.org
Subject: Re: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput
Hi Dave,
Thanks for your reply. Our Hadoop instance is inside our corporate
LAN. Could you please provide some details on how I could use s3distcp
from Amazon to transfer data from our on-premises Hadoop to Amazon S3?
Wouldn't some kind of VPN be needed between the Amazon EMR instance and our
o
Have you tried using s3distcp from Amazon? I used it many times to transfer
1.5 TB between S3 and Hadoop instances. The process took 45 min, well over
the 10-minute timeout period you're running into.
Dave
From: Himanish Kushary [mailto:himan...@gmail.com]
Sent: Thursday, March
The EMR distributions have special versions of the S3 file system. They
might be helpful here.
Of course, you likely aren't running those if you are seeing 5MB/s.
An extreme alternative would be to light up an EMR cluster, copy to it,
then to S3.
On Thu, Mar 28, 2013 at 4:54 AM, Himanish Kusha