I am developing a MapReduce application with Hadoop that generates a very large number of output keys during its map phase, and its performance is abysmal.
Just reading the data takes 20 minutes, and processing it without emitting anything from the map takes around 30 minutes, but running the full application takes around 4 hours. Is this a known or expected issue?

Cheers.
Tiago Alves Macambira
--
"I may be drunk, but in the morning I will be sober, while you will still be stupid and ugly." -Winston Churchill