Thank you, the -D params are quite good:

/data1/cug-dw/yannian.mu/kmeans_improve/bin/mahout seq2sparse -i /user/cug-dw/yannian.mu/consume/cleanup_day/201109*_* -o /user/cug-dw/yannian.mu/consume/vector_month/20110913_201137 -mindoc 0 -maxdoc 1000000000 -ow -nr 1000 -wt tfidf -Dmapred.job.shuffle.input.buffer.percent=0.5 -Dio.sort.mb=60 -Dmapred.job.priority=VERY_LOW -Dmapred.reduce.slowstart.completed.maps=0.85

/data1/cug-dw/yannian.mu/kmeans_improve/bin/mahout meanshift -i /user/cug-dw/yannian.mu/consume/vector_month/20110913_201137/tfidf-vectors -o /user/cug-dw/yannian.mu/consume/meanshift_month/20110913_201137 -dm org.apache.mahout.common.distance.TanimotoDistanceMeasure -x 30 -t1 0.1 -t2 0.1 -cl -ow -ovdrop -ovnl 1000 -ovsize 1000 -ovbounds 500 -Dmapred.job.shuffle.input.buffer.percent=0.5 -Dio.sort.mb=60 -Dmapred.job.priority=VERY_LOW -Dmapred.reduce.slowstart.completed.maps=0.85
At 2011-09-14 01:05:22, "Sean Owen" <sro...@gmail.com> wrote:
>MapReduce ought to control the number of workers reasonably well, and
>you can override it with mapred.reduce.tasks if you want. I don't think
>any fixed number works: what's right for 2 machines isn't right for
>200.
>
>2011/9/13 myn <m...@163.com>:
>> private static void startDFCounting(Path input, Path output, Configuration
>> baseConf, int numReducers)
>>
>> private static void makePartialVectors(Path input,
>>
>> meanshift cluster
>>
>> and in so many places, why? The Hadoop default is 2 reducers, but my data is
>> 3 billion records; with 2 reducers it is too slow.
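Sean's override can be passed through the same -D mechanism already used above. A minimal sketch (the value 40 and the input/output paths here are illustrative placeholders, not taken from this thread; a common rule of thumb from the Hadoop MapReduce tutorial is roughly 0.95 x nodes x reduce slots per node):

```shell
# Hedged sketch: override the default reducer count for one Mahout job.
# mapred.reduce.tasks is a standard Hadoop generic option; 40 is only an
# example value -- size it to your cluster's reduce-slot capacity.
bin/mahout seq2sparse \
  -Dmapred.reduce.tasks=40 \
  -i /path/to/input-seqfiles \
  -o /path/to/output-vectors \
  -wt tfidf -ow
```

Note that seq2sparse also accepts -nr (as used in the commands above) to set the reducer count for its vectorization stages; the -D form applies at the Hadoop configuration level.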