Hadoop ought to choose a reasonable number of reduce tasks on its own, and you can override it with mapred.reduce.tasks if you want. I don't think any fixed number is right: what works for 2 machines isn't right for 200.
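
If you do want to force it, here is a minimal sketch of setting it from your own driver code (the job name and the count of 32 are just placeholders, assuming the org.apache.hadoop.mapreduce API that shipped with Hadoop 0.20/1.x):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReducerCountExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Equivalent to passing -Dmapred.reduce.tasks=32 on the command line.
        conf.setInt("mapred.reduce.tasks", 32);

        Job job = new Job(conf, "example-job");
        // Setting it explicitly on the Job takes precedence over the property above.
        job.setNumReduceTasks(32);
        // ... set mapper/reducer classes, input/output paths, then submit ...
      }
    }

Pick a number based on the reduce-slot capacity of your actual cluster, not a constant baked into the code.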
2011/9/13 myn <m...@163.com>:
> private static void startDFCounting(Path input, Path output, Configuration baseConf, int numReducers)
>
> private static void makePartialVectors(Path input,
>
> meanshift cluster
>
> and in so many other places. Why? Hadoop's default is 2 reducers, but my data has 3 billion records, so 2 reducers is very slow.