Hi all,
I was wondering if it is possible to manipulate the key during combine:
Say I have a MapReduce job where the key has many qualifiers.
I would like to "split" the key into two (or more) keys if it has more
than, say, 100 qualifiers.
In the combiner class I would do something like:
    int count = 0;
    for (Writable value : values) {
        if (++count >= 100) {
            context.write(newKey, value);
        } else {
            context.write(key, value);
        }
    }
where newKey is something like key + randomUUID.
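To make that concrete, here is a rough sketch of how I picture building
newKey. It uses plain Strings instead of the real Writable key type so it
stands alone; the KeySplitter class, the splitKey method and the '#'
separator are names I made up for illustration, not anything from Hadoop.

```java
import java.util.UUID;

// Hypothetical helper (not part of Hadoop): derives a "split" key from the original key.
class KeySplitter {
    // Appends a random UUID to the key. The '#' separator is an assumption;
    // any character that cannot appear in a real key would do.
    static String splitKey(String key) {
        return key + "#" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // Prints the original key followed by '#' and a random UUID.
        System.out.println(splitKey("row1"));
    }
}
```

Each call produces a different suffix, so if the combiner runs more than
once for the same key, each run would come up with its own newKey.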
I know that the combiner can be called "zero, once or more..." times, and
I'm getting strange results (the same key written more than once), so I
would be glad to get some deeper insight into how the combiner works.
Thanks,
Amit.