Re: HDFS to S3 copy problems

2009-05-13 Thread Ken Krugler
--- From: Tom White [mailto:t...@cloudera.com] Sent: Friday, May 08, 2009 1:36 AM To: core-user@hadoop.apache.org Subject: Re: HDFS to S3 copy problems Perhaps we should revisit the implementation of NativeS3FileSystem so that it doesn't always buffer the file on the client. We could h

Re: HDFS to S3 copy problems

2009-05-12 Thread Tom White
hese two > distributed reads vs a distributed read and a local write then local read. > > What do you think? > > Cheers, > Ian Nowland > Amazon.com > > -----Original Message----- > From: Tom White [mailto:t...@cloudera.com] > Sent: Friday, May 08, 2009 1:36 AM > To: co

RE: HDFS to S3 copy problems

2009-05-08 Thread Nowland, Ian
read and a local write then local read. What do you think? Cheers, Ian Nowland Amazon.com -----Original Message----- From: Tom White [mailto:t...@cloudera.com] Sent: Friday, May 08, 2009 1:36 AM To: core-user@hadoop.apache.org Subject: Re: HDFS to S3 copy problems Perhaps we should r

Re: HDFS to S3 copy problems

2009-05-08 Thread Ken Krugler
Perhaps we should revisit the implementation of NativeS3FileSystem so that it doesn't always buffer the file on the client. We could have an option to make it write directly to S3. Thoughts? Regarding the problem with HADOOP-3733, you can work around it by setting fs.s3.awsAccessKeyId and fs.s3.a

Re: HDFS to S3 copy problems

2009-05-08 Thread Tom White
Perhaps we should revisit the implementation of NativeS3FileSystem so that it doesn't always buffer the file on the client. We could have an option to make it write directly to S3. Thoughts? Regarding the problem with HADOOP-3733, you can work around it by setting fs.s3.awsAccessKeyId and fs.s3.aw

Re: HDFS to S3 copy problems

2009-05-07 Thread Andrew Hitchcock
Hi Ken, S3N doesn't work that well with large files. When uploading a file to S3, S3N saves it to local disk during write() and then uploads it to S3 during close(). The close() call can take a long time for large files and it doesn't report progress, so the call can time out. As a workaround, I'd recomme

HDFS to S3 copy problems

2009-05-07 Thread Ken Krugler
Hi all, I have a few large files (4 that are 1.8GB+) I'm trying to copy from HDFS to S3. My micro EC2 cluster is running Hadoop 0.19.1, and has one master/two slaves. I first tried using the hadoop fs -cp command, as in: hadoop fs -cp output// s3n: This seemed to be working, as I could