I've been trying to figure out how to do a set difference in Hadoop. I
would like to take two files and remove the values they have in
common. Let's say I have two bags, 'students' and 'employees'. I
want to find which students are just students, and which employees are
just emplo
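
For reference, the usual reduce-side way to get this kind of difference is to tag every value with the input it came from, group by value, and keep only the values whose group contains a single tag. The sketch below simulates those three steps in plain Python rather than on a cluster. The function names (map_phase, shuffle, reduce_phase, set_difference) and the sample data are hypothetical, chosen only for illustration, and are not taken from the job discussed in this thread.

```python
from collections import defaultdict


def map_phase(source, values):
    # Tag each value with the name of the input it came from.
    for value in values:
        yield value, source


def shuffle(pairs):
    # Group the tags by value, roughly what sort and shuffle would do.
    grouped = defaultdict(set)
    for key, tag in pairs:
        grouped[key].add(tag)
    return grouped


def reduce_phase(grouped):
    # Keep only values seen in exactly one input, bucketed by that input.
    only = defaultdict(list)
    for key in sorted(grouped):
        tags = grouped[key]
        if len(tags) == 1:
            only[next(iter(tags))].append(key)
    return dict(only)


def set_difference(students, employees):
    pairs = list(map_phase("students", students))
    pairs += list(map_phase("employees", employees))
    return reduce_phase(shuffle(pairs))


result = set_difference(["alice", "bob", "carol"], ["bob", "dave"])
print(result)
```

In a real job the two buckets would go to two separate outputs, which is where something like MultipleOutputs comes in.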
Hi,
Please try removing the combiner and running.
I know that if you use multiple outputs from within a mapper, those pairs
are not part of the sort and shuffle phase. Your combiner is the same class
as your reducer, which uses mos, and that might be an issue on the map side.
If I had to guess, mos
writes to a diffe
Thanks, but removing the combiner doesn't seem to have done anything.
This is what confuses me, though: the only strange thing I'm doing is the
MultipleOutputs stuff. Why is the problem in the mapper, then? The
Reducer is where I'm using it...
Jim
Amogh Vasekar wrote:
Hi,
Please try removing
Apparently I just can't use MultipleOutputs with the new 0.20
interfaces. I have to go back to the old ones.
Jim