Yes, I mean the block size of HDFS. Since there is a combiner in the picture in buildClusters, there might not be enough rows left for the reduce tasks to process. Just a wild guess. You can also try with a larger input data set.
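As a sketch of the suggestion above, the HDFS block size can be overridden per file when the input data is copied into the cluster (paths and the 128 MB value are illustrative; on Hadoop 0.20/1.x the property is `dfs.block.size`, in bytes):

```shell
# Copy the K-Means input with a non-default block size (128 MB here).
# Path and size are examples only -- adjust for your cluster.
BLOCK_SIZE=$((128 * 1024 * 1024))
hadoop fs -D dfs.block.size=$BLOCK_SIZE -put input/points.seq /user/ramon/input/
```

Re-running the job against copies made with different block sizes is a cheap way to test the hypothesis without reconfiguring the whole cluster.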
On 11-03-2012 16:49, WangRamon wrote:
> Hi Paritosh, I think the block size may be the problem too. By the way, do
> you mean the block size of HDFS? I know its default size is 64 MB, but I
> haven't tried any other size. Thanks, Ramon
>
>> Date: Sun, 11 Mar 2012 13:18:52 +0530
>> From: pran...@xebia.com
>> To: user@mahout.apache.org
>> Subject: Re: Not all Mapper/Reducer slots are taken when running K-Means cluster
>>
>> Can you try reducing/increasing your block size and see the impact?
>> I am suspecting the block size to be the problem.
>>
>> I have faced the same problem once (for a different Hadoop job, and it
>> was very hard to debug). In that case, CompositeInputFormat was being
>> used as input, which fixed the block size to 64 MB, and hence only a few
>> reducers were activated. So trying different block sizes might give some clue.
>>
>> On 11-03-2012 11:04, WangRamon wrote:
>>> Here is the configuration:
>>>
>>> <property>
>>>   <name>mapred.tasktracker.map.tasks.maximum</name>
>>>   <value>14</value>
>>> </property>
>>> <property>
>>>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>>   <value>14</value>
>>> </property>
>>> <property>
>>>   <name>mapred.reduce.tasks</name>
>>>   <value>73</value>
>>> </property>
>>>
>>> Each node has 32 GB of RAM, so I think the above configuration should be fine.
>>>
>>>> Date: Sat, 10 Mar 2012 22:31:44 -0700
>>>> From: j...@windwardsolutions.com
>>>> To: user@mahout.apache.org
>>>> Subject: Re: Not all Mapper/Reducer slots are taken when running K-Means cluster
>>>>
>>>> What's your Hadoop config in terms of the maximum number of reducers?
>>>> It's a function of your available RAM on each node and the number of nodes.
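For reference, the 73 in `mapred.reduce.tasks` above matches the usual Hadoop rule of thumb of 0.95 or 1.75 times the total reduce slots (1.75 gives two "waves" of reducers for better load balancing). Assuming 3 nodes at 14 reduce slots each, which is what the 42-slot figure in this thread implies, the arithmetic is:

```shell
# Rule-of-thumb reducer count: total reduce slots times 1.75 (two waves).
# 3 nodes and 14 slots/node are inferred from the figures in this thread.
NODES=3
SLOTS_PER_NODE=14
TOTAL=$((NODES * SLOTS_PER_NODE))      # 42 total reduce slots
REDUCERS=$((TOTAL * 175 / 100))        # 42 * 1.75 = 73 (integer-floored from 73.5)
echo "$REDUCERS"
```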
>>>> On 3/10/12 8:55 PM, WangRamon wrote:
>>>>> Hi Paritosh, I ran the tests with 1 job and with 5 jobs; they all have
>>>>> the same problem. The job I'm running is the buildClusters one. I can
>>>>> see 73 reduce tasks created in the monitoring GUI, but only 12 of them
>>>>> are running at any time (the rest are in the pending state). The tasks
>>>>> finish very quickly, no more than 18 seconds for each reduce task, so
>>>>> maybe that's the cause? Thanks. Cheers, Ramon
>>>>>
>>>>>> Date: Sun, 11 Mar 2012 09:14:15 +0530
>>>>>> From: pran...@xebia.com
>>>>>> To: user@mahout.apache.org
>>>>>> Subject: Re: Not all Mapper/Reducer slots are taken when running K-Means cluster
>>>>>>
>>>>>> And to answer the question about the K-Means configuration:
>>>>>>
>>>>>> K-Means has two jobs:
>>>>>> 1) buildClusters: has a reducer and no limitation on the number of
>>>>>> reducer tasks
>>>>>> 2) clusterData: executes if runClustering = true; has no reducer tasks
>>>>>>
>>>>>> On 11-03-2012 09:10, Paritosh Ranjan wrote:
>>>>>>> Can you run the K-Means jobs again (all with the same block size) and
>>>>>>> give the same statistics for:
>>>>>>>
>>>>>>> a) only 1 job running
>>>>>>> b) 2 jobs running simultaneously
>>>>>>> c) 5 jobs running simultaneously
>>>>>>>
>>>>>>> On 10-03-2012 21:08, WangRamon wrote:
>>>>>>>> Hi All, I submit 5 K-Means jobs simultaneously. My Hadoop cluster
>>>>>>>> has 42 map and 42 reduce slots configured, and I set the default
>>>>>>>> number of reduce tasks per job to 73 (42 * 1.75). I find that only
>>>>>>>> about 12 of the reduce tasks are running at any time, although 73
>>>>>>>> reduce tasks are created for each K-Means job and I do have 42
>>>>>>>> reduce slots; this means at any time about 30 reduce slots are free.
>>>>>>>> So I tried RecommenderJob from Mahout again; I remember that job
>>>>>>>> used all my slots in a previous test, and yes, this time
>>>>>>>> "RowSimilarityJob-CooccurrencesMapper-Reducer" did use all the
>>>>>>>> slots, 42 reduce and 42 map. So I'm wondering: is there something
>>>>>>>> configured in Mahout which causes this strange behavior? Any
>>>>>>>> suggestions? Thanks in advance. By the way, I'm using the mahout-0.6
>>>>>>>> release. Cheers, Ramon
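One thing worth ruling out when comparing the two jobs is whether the reducer count is actually reaching the K-Means job. The Mahout 0.6 command-line driver forwards Hadoop's generic `-D` options, so the value can be pinned explicitly per job rather than relying on the cluster default. A sketch (paths and flag values are illustrative; confirm the exact options with `bin/mahout kmeans --help` on your install):

```shell
# Force the reducer count for one K-Means run; all paths/values are examples.
bin/mahout kmeans \
  -Dmapred.reduce.tasks=73 \
  -i /user/ramon/vectors \
  -c /user/ramon/initial-clusters \
  -o /user/ramon/kmeans-output \
  -x 10 -k 20 -cl
```

If the job still schedules only 12 reducers with the count pinned this way, the bottleneck is more likely the input split/block size discussed earlier in the thread than a Mahout configuration.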