use S3 as input to MR job

2012-07-19 Thread Dan Yi
I have an MR job that reads files on Amazon S3 and processes the data on local HDFS. The files are gzipped text files (.gz). I tried to set up the job as below but it won't work; does anyone know what might be wrong? Do I need an extra step to unzip the files first? Thanks. String S3_LOCATION = "s3n://ac
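[A minimal sketch of the kind of setup Dan describes, assuming the s3n filesystem with placeholder bucket names and credentials (none of these values are from the thread). Note that the default TextInputFormat decompresses plain .gz input transparently, so no separate unzip step is needed:]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class S3InputJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // s3n:// paths need AWS credentials; placeholders below.
    conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
    conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

    Job job = new Job(conf, "s3-gz-input");
    job.setJarByClass(S3InputJob.class);

    // The default TextInputFormat recognizes the .gz extension and
    // decompresses each file on the fly, so gzipped text needs no
    // explicit unzip step.
    FileInputFormat.addInputPath(job, new Path("s3n://my-bucket/input/"));
    FileOutputFormat.setOutputPath(job, new Path("/user/hadoop/output"));

    // With the default identity mapper/reducer, keys are byte offsets
    // and values are the text lines.
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}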

Re: use S3 as input to MR job

2012-07-19 Thread Harsh J
Dan, Can you share your error? Plain .gz files (not .tar.gz) are natively supported by Hadoop via its GzipCodec, so if you are facing an error, I believe it is caused by something other than compression. On Fri, Jul 20, 2012 at 6:14 AM, Dan Yi wrote: > I have an MR job that reads files on Amazon S
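[To make the codec point concrete, a small check assuming nothing beyond stock Hadoop: CompressionCodecFactory maps the .gz extension to GzipCodec, which is the mechanism TextInputFormat uses to decompress such files. One side effect worth knowing: GzipCodec is not splittable, so each .gz file is processed by a single mapper.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CodecCheck {
  public static void main(String[] args) {
    CompressionCodecFactory factory =
        new CompressionCodecFactory(new Configuration());
    // The factory resolves a codec purely from the file extension.
    CompressionCodec codec = factory.getCodec(new Path("part-0000.gz"));
    // Prints org.apache.hadoop.io.compress.GzipCodec
    System.out.println(codec == null ? "no codec" : codec.getClass().getName());
  }
}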

Re: use S3 as input to MR job

2012-10-02 Thread Ben Kim
I'm having a similar issue. I'm running a wordcount MR job as follows: hadoop jar WordCount.jar wordcount.WordCountDriver s3n://bucket/wordcount/input s3n://bucket/wordcount/output. Here s3n://bucket/wordcount/input is an S3 object that contains the other input files. However, I get the following NPE error: 12/10
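[The stack trace is truncated above, so the root cause is uncertain. One way to narrow it down is to list the input path through the same FileSystem the job would use: a failure here points at filesystem configuration (for example, missing fs.s3n.* credentials) rather than the MR code itself. A sketch with placeholder credentials:]

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3PathCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");     // placeholder
    conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY"); // placeholder

    Path input = new Path("s3n://bucket/wordcount/input");
    FileSystem fs = FileSystem.get(URI.create(input.toString()), conf);
    // List what the job would actually see under the input path.
    for (FileStatus status : fs.listStatus(input)) {
      System.out.println(status.getPath() + "\t" + status.getLen());
    }
  }
}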

Re: use S3 as input to MR job

2012-10-02 Thread Marcos Ortiz
Are you sure that your MR code is prepared to work with multiple files? This example (WordCount) works with a single input. You should take a look at the MultipleInputs API for this. Best wishes. On 02/10/2012 6:05, Ben Kim wrote: I'm having a simil
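[For reference, a sketch of the MultipleInputs API Marcos mentions, which binds an InputFormat and Mapper to each input path. The mapper class below is a stand-in, not the code from the thread; MultipleInputs is mainly needed when different paths require different formats or mappers.]

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MultipleInputsExample {
  // Stand-in word-count mapper (not the mapper from the thread).
  public static class WordCountMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          ctx.write(word, ONE);
        }
      }
    }
  }

  public static void configureInputs(Job job) {
    // Each path is bound to its own InputFormat and Mapper. This is
    // useful when inputs need different handling; several files under
    // a single directory can also be read with plain FileInputFormat.
    MultipleInputs.addInputPath(job, new Path("s3n://bucket/wordcount/input/a"),
        TextInputFormat.class, WordCountMapper.class);
    MultipleInputs.addInputPath(job, new Path("s3n://bucket/wordcount/input/b"),
        TextInputFormat.class, WordCountMapper.class);
  }
}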