why do so many places not set job.setNumReduceTasks

2011-09-13 Thread myn
Methods like private static void startDFCounting(Path input, Path output, Configuration baseConf, int numReducers), private static void makePartialVectors(Path input, ...), the mean shift clustering driver, and many other places never set the number of reducers — why? Hadoop defaults to 2 reducers, but my data is 3 billion records, so 2 reducers are far too slow.

RE: why do so many places not set job.setNumReduceTasks

2011-09-13 Thread Jeff Eastman
You can use -Dmapred.reduce.tasks=n to set the number of reducers for most Mahout CLI jobs. Just be sure it is the first argument.
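A minimal sketch of the suggested invocation (the paths and reducer count here are hypothetical placeholders, not from the thread); the key point is that the generic -D option must appear before the job-specific flags so Hadoop's generic-options parsing picks it up:

```shell
# Hypothetical paths and reducer count, for illustration only.
# -Dmapred.reduce.tasks must come before the job's own options,
# otherwise it is not parsed as a generic Hadoop option.
mahout seq2sparse \
  -Dmapred.reduce.tasks=64 \
  -i /path/to/sequence-files \
  -o /path/to/tfidf-vectors \
  -wt tfidf -ow
```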

Re: why do so many places not set job.setNumReduceTasks

2011-09-13 Thread Sean Owen
MapReduce ought to control the number of workers reasonably well, and you can override with mapred.reduce.tasks if you want. I don't think any fixed number works: what's right for 2 machines isn't right for 200.

Re: Re: why do so many places not set job.setNumReduceTasks

2011-09-13 Thread myn
Thank you, the -D parameter works quite well:

/data1/cug-dw/yannian.mu/kmeans_improve/bin/mahout seq2sparse -i /user/cug-dw/yannian.mu/consume/cleanup_day/201109*_* -o /user/cug-dw/yannian.mu/consume/vector_month/20110913_201137 -mindoc 0 -maxdoc 10 -ow -nr 1000 -wt tfidf -Dmapred.job.shuffle