Re: How to set SequenceFile.Metadata from within SequenceFileOutputFormat?

2010-08-09 Thread David Rosenstrauch
On 08/09/2010 09:14 PM, Harsh J wrote: Another solution would be to create a custom named output using mapred.lib.MultipleOutputs and collecting to that instead of the job-set output format (which one can set to NullOutputFormat so it doesn't complain about existing paths, etc.). So if you'd wan

Re: How to set SequenceFile.Metadata from within SequenceFileOutputFormat?

2010-08-09 Thread Harsh J
Another solution would be to create a custom named output using mapred.lib.MultipleOutputs and collecting to that instead of the job-set output format (which one can set to NullOutputFormat so it doesn't complain about existing paths, etc.). So if you'd want 'foo' prefix to your 0-N number

Re: Map output in the map side is 10 bytes bigger than on the reduce?

2010-08-09 Thread Allen Wittenauer
On Aug 9, 2010, at 1:27 PM, Pedro Costa wrote: > > 2 - If I'm deducing correctly, the reduce will always fetch 10 bytes > less than the saved map output? Why do you care?

Re: How to set SequenceFile.Metadata from within SequenceFileOutputFormat?

2010-08-09 Thread David Rosenstrauch
On 08/09/2010 05:45 PM, David Rosenstrauch wrote: On 08/09/2010 04:01 PM, David Rosenstrauch wrote: On a similar note, it looks like if I want to customize the name/path of the generated SequenceFile my only option currently is to override FileOutputFormat.getDefaultWorkFile(). a) Again, have I

Re: How to set SequenceFile.Metadata from within SequenceFileOutputFormat?

2010-08-09 Thread David Rosenstrauch
On 08/09/2010 04:01 PM, David Rosenstrauch wrote: On a similar note, it looks like if I want to customize the name/path of the generated SequenceFile my only option currently is to override FileOutputFormat.getDefaultWorkFile(). a) Again, have I got this correct, or am I overlooking something? b

Map output in the map side is 10 bytes bigger than on the reduce?

2010-08-09 Thread Pedro Costa
Hi, 1 - I'm trying to compare the size of one map output on the map side and on the reduce side. So I made some code modifications in MR to see what happens when the map saves its outputs and the reduce fetches them, and I've noticed that the map output fetched by the reducer is 10 bytes smaller than t

Re: How to set SequenceFile.Metadata from within SequenceFileOutputFormat?

2010-08-09 Thread David Rosenstrauch
On 08/07/2010 02:06 AM, Harsh J wrote: On Sat, Aug 7, 2010 at 11:20 AM, David Rosenstrauch wrote: I'm using a SequenceFileOutputFormat. But I'd like to be able to set some SequenceFile.Metadata on the SequenceFile.Writer that's getting created. Doesn't look like there's any easy way to do th

Re: DBInputFormat / DBWritable question

2010-08-09 Thread David Rosenstrauch
Tnx much for the info, and the additional tips. Unfortunately we're doing a lot of transforming of the DB data as we're bringing it into Hadoop, so I don't think Sqoop's an option. Thanks again, DR On 08/06/2010 12:50 AM, Aaron Kimball wrote: The InputFormat instantiates a RecordReader (DBR

Re: How to set SequenceFile.Metadata from within SequenceFileOutputFormat?

2010-08-09 Thread David Rosenstrauch
Not sure if a change like this is something the devs would want to implement, but it couldn't hurt to at least file it and make them aware. Done: https://issues.apache.org/jira/browse/MAPREDUCE-2001 Thanks, DR On 08/09/2010 12:16 PM, Harsh J wrote: You may also propose to extend the ex

Re: How to set SequenceFile.Metadata from within SequenceFileOutputFormat?

2010-08-09 Thread Harsh J
You may also propose to extend the existing SFOP to allow this on JIRA or the dev mailing list :) On Mon, Aug 9, 2010 at 8:09 PM, David Rosenstrauch wrote: > On 08/07/2010 02:06 AM, Harsh J wrote: >> >> On Sat, Aug 7, 2010 at 11:20 AM, David Rosenstrauch >>  wrote: >>> >>> I'm using a SequenceFil

Re: How to set SequenceFile.Metadata from within SequenceFileOutputFormat?

2010-08-09 Thread David Rosenstrauch
On 08/07/2010 02:06 AM, Harsh J wrote: On Sat, Aug 7, 2010 at 11:20 AM, David Rosenstrauch wrote: I'm using a SequenceFileOutputFormat. But I'd like to be able to set some SequenceFile.Metadata on the SequenceFile.Writer that's getting created. Doesn't look like there's any easy way to do th

How read map outputs?

2010-08-09 Thread Pedro Costa
Hi, 1 - I would like to programmatically compare the map output and the reduce input to see if they're equal in MR. So I'm trying to compute a hash of the output generated by the map and of the input on the reduce side, and compare them. The problem is that I'm hashing the whole file and not t
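
The preview above is cut off, so it is not clear what the poster wanted to hash instead of the whole file. As a minimal sketch, assuming the goal is to hash only a byte range of a stream (for example, one slice of a larger map output file) and compare it with a hash taken elsewhere, something like the following could work. The class `StreamDigest` and its methods are hypothetical names for illustration, not Hadoop APIs, and MD5 is just an arbitrary choice of digest.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of Hadoop.
final class StreamDigest {

    // Returns the MD5 digest of exactly `length` bytes, starting `offset` bytes into the stream.
    static byte[] md5OfRange(InputStream in, long offset, long length) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");

            // Advance to the start of the range; skip() may skip fewer bytes than asked.
            long toSkip = offset;
            while (toSkip > 0) {
                long skipped = in.skip(toSkip);
                if (skipped <= 0) {
                    // Fall back to reading one byte so we can detect end of stream.
                    if (in.read() < 0) {
                        throw new EOFException("stream ended before offset " + offset);
                    }
                    skipped = 1;
                }
                toSkip -= skipped;
            }

            // Feed only the requested range into the digest.
            byte[] buf = new byte[8192];
            long remaining = length;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    throw new EOFException("stream ended before the range was fully read");
                }
                md.update(buf, 0, n);
                remaining -= n;
            }
            return md.digest();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // True when the two digests are byte-for-byte equal.
    static boolean sameContent(byte[] digestA, byte[] digestB) {
        return MessageDigest.isEqual(digestA, digestB);
    }
}
```

On the map side one would call `md5OfRange` with the offset and length of the slice of interest, on the reduce side with offset 0 and the number of bytes fetched, and then pass both digests to `sameContent`. How to obtain the right offset and length inside MR is not covered by this sketch.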