Thank you, the -D params work quite well:
 
/data1/cug-dw/yannian.mu/kmeans_improve/bin/mahout seq2sparse \
  -i /user/cug-dw/yannian.mu/consume/cleanup_day/201109*_* \
  -o /user/cug-dw/yannian.mu/consume/vector_month/20110913_201137 \
  -mindoc 0 -maxdoc 1000000000 -ow -nr 1000 -wt tfidf \
  -Dmapred.job.shuffle.input.buffer.percent=0.5 \
  -Dio.sort.mb=60 -Dmapred.job.priority=VERY_LOW \
  -Dmapred.reduce.slowstart.completed.maps=0.85

/data1/cug-dw/yannian.mu/kmeans_improve/bin/mahout meanshift \
  -i /user/cug-dw/yannian.mu/consume/vector_month/20110913_201137/tfidf-vectors \
  -o /user/cug-dw/yannian.mu/consume/meanshift_month/20110913_201137 \
  -dm org.apache.mahout.common.distance.TanimotoDistanceMeasure \
  -x 30 -t1 0.1 -t2 0.1 -cl -ow -ovdrop -ovnl 1000 -ovsize 1000 -ovbounds 500 \
  -Dmapred.job.shuffle.input.buffer.percent=0.5 -Dio.sort.mb=60 \
  -Dmapred.job.priority=VERY_LOW -Dmapred.reduce.slowstart.completed.maps=0.85
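To illustrate what the -D params above do conceptually, here is a minimal sketch (the class DashDOptions and its extractProperties helper are hypothetical, not Mahout's or Hadoop's actual parser). It separates "-Dkey=value" tokens into a property map that would be applied to the job configuration, and leaves the tool's own options (-i, -o, -nr, ...) untouched. It only handles the attached "-Dkey=value" form used in the commands above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DashDOptions {

    // Hypothetical helper: moves "-Dkey=value" tokens into a map and copies
    // every other token into 'remaining', preserving order.
    public static Map<String, String> extractProperties(String[] args, List<String> remaining) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                // key is between "-D" and the first '=', value is the rest
                props.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                // not a -Dkey=value token: keep it as an ordinary argument
                remaining.add(arg);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        String[] cmd = {
            "-i", "input", "-o", "output", "-nr", "1000",
            "-Dio.sort.mb=60", "-Dmapred.job.priority=VERY_LOW",
            "-Dmapred.reduce.slowstart.completed.maps=0.85"
        };
        List<String> rest = new ArrayList<>();
        Map<String, String> props = extractProperties(cmd, rest);
        System.out.println("job properties: " + props);
        System.out.println("tool arguments: " + rest);
    }
}
```

In real Hadoop jobs this separation is normally done by GenericOptionsParser (via ToolRunner), which writes the properties into the job Configuration before the job starts.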





At 2011-09-14 01:05:22, "Sean Owen" <sro...@gmail.com> wrote:
>MapReduce ought to control the number of workers reasonably well, and
>you can override with mapred.reduce.tasks if you want. I don't think
>any fixed number works: what's right for 2 machines isn't right for
>200.
>
>2011/9/13 myn <m...@163.com>:
>>  private static void startDFCounting(Path input, Path output, Configuration baseConf, int numReducers)
>>
>>  private static void makePartialVectors(Path input,
>>
>> meanshift cluster
>>
>> and in so many other places. Why? The Hadoop default is 2 reducers, but my data
>> is 3 billion, and with 2 reducers it is very slow.
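As a rough illustration of Sean's point that no fixed reducer count fits both 2 and 200 machines, the Hadoop MapReduce tutorial suggests 0.95 or 1.75 multiplied by (number of nodes * mapred.tasktracker.reduce.tasks.maximum). The sketch below just applies that rule of thumb; the class name ReducerCount and the assumed 2 reduce slots per node are hypothetical. The result could then be passed as -Dmapred.reduce.tasks=N, or as -nr N for seq2sparse as in the command above.

```java
public class ReducerCount {

    // Rule of thumb from the Hadoop MapReduce tutorial:
    // factor * (nodes * reduce slots per node), with factor 0.95 or 1.75.
    public static int suggestReducers(int nodes, int reduceSlotsPerNode, double factor) {
        if (nodes <= 0 || reduceSlotsPerNode <= 0 || factor <= 0) {
            throw new IllegalArgumentException("all inputs must be positive");
        }
        long totalSlots = (long) nodes * reduceSlotsPerNode;
        // Round down, but never suggest fewer than one reducer.
        return (int) Math.max(1, Math.floor(factor * totalSlots));
    }

    public static void main(String[] args) {
        int slotsPerNode = 2; // assumed value of mapred.tasktracker.reduce.tasks.maximum
        for (int nodes : new int[] {2, 200}) {
            System.out.printf("%d nodes: %d (0.95) or %d (1.75) reducers%n",
                nodes,
                suggestReducers(nodes, slotsPerNode, 0.95),
                suggestReducers(nodes, slotsPerNode, 1.75));
        }
    }
}
```

With 0.95, all reducers can start in a single wave as the maps finish; with 1.75, faster nodes run a second wave, which gives better load balancing.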
