Amit,

Have you explored the ChainMapper class?
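For context on the suggestion above: ChainMapper lets you run several map stages in sequence within one job, so key manipulation could happen in a second map stage rather than in the combiner. A minimal driver sketch, assuming hypothetical mapper classes `AMapper` and `BMapper` (this is a configuration fragment, not a complete job):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;

// Driver fragment: chain two map stages before the shuffle/reduce.
// AMapper and BMapper are hypothetical; the second stage could split
// oversized keys, since each record passes through it exactly once.
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "chained-maps");
ChainMapper.addMapper(job, AMapper.class,
        LongWritable.class, Text.class,   // input key/value of stage 1
        Text.class, Text.class,           // output key/value of stage 1
        new Configuration(false));
ChainMapper.addMapper(job, BMapper.class,
        Text.class, Text.class,           // must match stage 1's output types
        Text.class, Text.class,
        new Configuration(false));
```

Unlike the combiner, a chained mapper is applied exactly once per record, so derived keys are deterministic.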
*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com

On Sun, Jan 12, 2014 at 7:28 PM, John Lilley <john.lil...@redpoint.net> wrote:
> Isn’t this what you’d normally do in the Mapper?
>
> My understanding of the combiner is that it is like a “mapper-side
> pre-reducer” and operates on blocks of data that have already been sorted
> by key, so mucking with the keys doesn’t **seem** like a good idea.
>
> john
>
> *From:* Amit Sela [mailto:am...@infolinks.com]
> *Sent:* Sunday, January 12, 2014 9:26 AM
> *To:* user@hadoop.apache.org
> *Subject:* manipulating key in combine phase
>
> Hi all,
>
> I was wondering if it is possible to manipulate the key during the combine phase:
>
> Say I have a MapReduce job where the key has many qualifiers.
> I would like to "split" the key into two (or more) keys if it has more
> than, say, 100 qualifiers.
> In the combiner class I would do something like:
>
>     int count = 0;
>     for (Writable value : values) {
>         if (++count >= 100) {
>             context.write(newKey, value);
>         } else {
>             context.write(key, value);
>         }
>     }
>
> where newKey is something like key + randomUUID.
>
> I know that the combiner can be called "zero, once or more..." and I'm
> getting strange results (the same key written more than once), so I would be
> glad to get some deeper insight into how the combiner works.
>
> Thanks,
>
> Amit.
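The duplicated keys Amit reports follow directly from the combiner contract John describes: the framework may apply the combiner zero, one, or several times, over arbitrary subsets of a key's values (e.g. once per spill), and any key derived from a per-invocation counter plus a random UUID will differ across invocations. A plain-Java sketch of this effect (no Hadoop dependencies; `combineOnce` and the 100-value threshold are illustrative stand-ins for the pseudocode above):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class CombinerSplitDemo {
    static final int THRESHOLD = 100;

    // Mirrors the pseudocode: from the THRESHOLD-th value on,
    // emit under a freshly derived key (key + random UUID).
    static Map<String, Integer> combineOnce(String key, List<Integer> values) {
        Map<String, Integer> out = new LinkedHashMap<>();
        String newKey = key + "-" + UUID.randomUUID(); // fresh per invocation!
        int count = 0;
        for (int v : values) {
            String k = (++count >= THRESHOLD) ? newKey : key;
            out.merge(k, v, Integer::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < 250; i++) values.add(1);

        // Combiner applied once over all 250 values:
        // the original key plus ONE split key.
        Map<String, Integer> single = combineOnce("k", values);
        System.out.println(single.size()); // 2

        // The framework may instead combine two spills separately.
        // Each invocation draws its own UUID, so "k" is split twice
        // into different keys, and the split is not reproducible.
        Map<String, Integer> first = combineOnce("k", values.subList(0, 125));
        Map<String, Integer> second = combineOnce("k", values.subList(125, 250));
        System.out.println(first.size() + second.size()); // 4
    }
}
```

Because the split depends on how many times and over which chunks the combiner happens to run, the reducer sees a nondeterministic set of keys; this is why key rewriting is normally done in the mapper (applied exactly once per record) rather than in the combiner.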