Hadoop with Multiple Inputs and Outputs

2009-12-02 Thread James R. Leek
I've been trying to figure out how to do a set difference in Hadoop. I would like to take two files and remove the values they have in common. Let's say I have two bags, 'students' and 'employees'. I want to find which students are just students, and which employees are just emplo
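A minimal sketch of the usual reduce-side approach to this kind of set difference, written as plain Java so it runs without a cluster. It is not the code from this thread: the class and method names (SetDifferenceSketch, mapAndShuffle, reduce) and the sample names are hypothetical, and the in-memory maps only stand in for Hadoop's shuffle and for named outputs. The idea is to tag each name with the bag it came from, group by name, and keep only names that carry a single tag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java simulation of a reduce-side set difference; no Hadoop classes are used.
class SetDifferenceSketch {

    // "Map" phase: tag every name with its source bag and group the tags by name,
    // the way the shuffle groups values by key.
    static Map<String, Set<String>> mapAndShuffle(List<String> students, List<String> employees) {
        Map<String, Set<String>> grouped = new TreeMap<>();
        for (String name : students) {
            grouped.computeIfAbsent(name, k -> new TreeSet<>()).add("students");
        }
        for (String name : employees) {
            grouped.computeIfAbsent(name, k -> new TreeSet<>()).add("employees");
        }
        return grouped;
    }

    // "Reduce" phase: a name with exactly one tag appears in only one bag,
    // so route it to the output named after that tag (a stand-in for a named output).
    static Map<String, List<String>> reduce(Map<String, Set<String>> grouped) {
        Map<String, List<String>> outputs = new TreeMap<>();
        outputs.put("students", new ArrayList<>());
        outputs.put("employees", new ArrayList<>());
        for (Map.Entry<String, Set<String>> entry : grouped.entrySet()) {
            Set<String> tags = entry.getValue();
            if (tags.size() == 1) {
                outputs.get(tags.iterator().next()).add(entry.getKey());
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String> students = Arrays.asList("ann", "bob", "cat");   // hypothetical sample data
        List<String> employees = Arrays.asList("bob", "dan");         // hypothetical sample data
        // prints {employees=[dan], students=[ann, cat]}
        System.out.println(reduce(mapAndShuffle(students, employees)));
    }
}
```

Note that a reduce step like this one is not safe to reuse as a combiner: a single map task sees only part of the data, so a name with one tag locally may still appear in the other bag elsewhere.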

Re: Hadoop with Multiple Inputs and Outputs

2009-12-03 Thread Amogh Vasekar
Hi, Please try removing the combiner and running again. I know that if you use multiple outputs from within a mapper, those pairs are not part of the sort and shuffle phase. Your combiner is the same as your reducer, which uses mos, and that might be an issue on the map side. If I'm to take a guess, mos writes to a diffe

Re: Hadoop with Multiple Inputs and Outputs

2009-12-03 Thread James R. Leek
Thanks, but removing the combiner doesn't seem to have done anything. This is what confuses me, though: the only strange thing I'm doing is the MultipleOutputs stuff. Why is the problem in the mapper, then? The reducer is where I'm using it... Jim Amogh Vasekar wrote: Hi, Please try removing

Re: Hadoop with Multiple Inputs and Outputs

2009-12-03 Thread James R. Leek
Apparently I just can't use MultipleOutputs with the new 0.20 interfaces. I have to go back to the old ones. Jim