Debugging Partitioner problems

2010-01-20 Thread Erik Forsberg
Hi! I have a problem with one of my reducers getting 3 times as much data as the other 15 reducers, causing longer total runtime per job. What would be the best way to debug this? I'm guessing I'm outputting keys that somehow fool the partitioner. Can I tell hadoop to save the map outputs per reducer to be able to inspect what's in them?
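One quick way to check for partitioner-induced skew offline is to replay the key distribution against the same formula Hadoop's default HashPartitioner uses, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The sketch below is plain Java (no Hadoop dependency) so it can be run against a sample of real keys; the sample keys shown are hypothetical placeholders.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionSkew {
    // Same arithmetic as Hadoop's default HashPartitioner:
    // mask off the sign bit, then modulo the number of reducers.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Count how many sample keys land in each partition.
    static int[] histogram(Iterable<String> keys, int numReducers) {
        int[] counts = new int[numReducers];
        for (String k : keys) {
            counts[partitionFor(k, numReducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample -- feed a dump of your job's real map
        // output keys here to see whether one partition dominates.
        List<String> keys = Arrays.asList(
            "user_1", "user_2", "user_3", "user_1", "user_1");
        int[] counts = histogram(keys, 16);
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                System.out.println("partition " + i + ": " + counts[i]);
            }
        }
    }
}
```

If a handful of very frequent keys (or keys whose hashCodes collide modulo 16) all map to the same partition, that one reducer will see a disproportionate share of the data.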

Re: Debugging Partitioner problems

2010-01-20 Thread Amogh Vasekar
>> Can I tell hadoop to save the map outputs per reducer to be able to inspect what's in them

You can set keep.task.files.pattern, which will save the mapper output; set this regex to match your job/task as need be. But this will eat up a lot of local disk space. The problem most likely is your data ( o
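For reference, a minimal configuration fragment for keeping intermediate task files might look like the following (the task ID in the pattern is a hypothetical placeholder; adjust the regex to match your own job's task attempts, and remember the retained files accumulate on each node's local disk):

```xml
<!-- mapred-site.xml, or merged into the per-job configuration:
     intermediate files of tasks whose IDs match this regex are kept
     instead of being cleaned up after the task finishes. -->
<property>
  <name>keep.task.files.pattern</name>
  <value>.*_m_000123_.*</value>
</property>
```

The kept map outputs can then be inspected under the tasktracker's mapred local directory on the node that ran the matching task.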