Yes. In my cluster's conf file, mapred.tasktracker.reduce.tasks.maximum is 8, and for this job I want it to be 4. I set it through the conf, build the job with that conf, and submit it. But Hadoop still launches 8 reducers per datanode...
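The per-job concurrency cap asked for here is usually handled by the scheduler rather than by the TaskTracker property. As a sketch, assuming a Hadoop 1.x cluster with the Fair Scheduler enabled (the pool name "limited" is a made-up example, not from this thread), an allocations file can cap how many reduce tasks from a pool run at the same time:

```xml
<?xml version="1.0"?>
<!-- Fair Scheduler allocations file; its location is given by
     mapred.fairscheduler.allocation.file in mapred-site.xml.
     The pool name "limited" is a hypothetical example. -->
<allocations>
  <pool name="limited">
    <!-- At most 4 reduce tasks from jobs in this pool run
         concurrently, regardless of total reduce slots. -->
    <maxReduces>4</maxReduces>
  </pool>
</allocations>
```

A job would then opt in by being submitted to that pool, e.g. via conf.set("pool.name", "limited"), assuming mapred.fairscheduler.poolnameproperty is set to pool.name. This is a sketch under those assumptions, not something verified on the thread's cluster.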
2013/4/30 Nitin Pawar <nitinpawar...@gmail.com>

> So basically, if I understand correctly, you want to limit the number of
> reducers executing in parallel, only for this job?
>
> On Tue, Apr 30, 2013 at 4:02 PM, Han JU <ju.han.fe...@gmail.com> wrote:
>
>> Thanks.
>>
>> In fact I don't want to set the reducer or mapper numbers; they are fine.
>> I want to set the reduce slot capacity of my cluster while it executes my
>> specific job. Say I have 100 reduce tasks for this job: I want my cluster
>> to execute 4 of them at the same time, not 8, only for this specific job.
>> So I set mapred.tasktracker.reduce.tasks.maximum to 4 and submit the job.
>> This conf is well received by the job, but ignored by Hadoop...
>>
>> Any idea why this is?
>>
>> 2013/4/30 Nitin Pawar <nitinpawar...@gmail.com>
>>
>>> The mapred.tasktracker.reduce.tasks.maximum parameter sets the maximum
>>> number of reduce tasks that may be run by an individual TaskTracker at
>>> one time. This is not a per-job configuration.
>>>
>>> The number of map tasks for a given job is driven by the number of
>>> input splits, not by the mapred.map.tasks parameter. For each input
>>> split a map task is spawned, so over the lifetime of a MapReduce job
>>> the number of map tasks equals the number of input splits.
>>> mapred.map.tasks is just a hint to the InputFormat about the number of
>>> maps.
>>>
>>> If you want to set the number of maps or reducers per job, you can do
>>> it on the job object you created: job.setNumReduceTasks() for reducers
>>> (the map count is a hint only, via JobConf.setNumMapTasks()).
>>>
>>> Note that for maps this is just a hint, and again the number will be
>>> decided by the input splits.
>>>
>>> On Tue, Apr 30, 2013 at 3:39 PM, Han JU <ju.han.fe...@gmail.com> wrote:
>>>
>>>> Thanks Nitin.
>>>>
>>>> What I need is to set the slots only for a specific job, not for the
>>>> whole cluster conf. But what I did does NOT work... Have I done
>>>> something wrong?
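To illustrate why setting the property in the job conf is ignored: each TaskTracker daemon reads mapred.tasktracker.reduce.tasks.maximum from its own mapred-site.xml once at startup, so it only takes effect cluster-side and for every job. A minimal sketch of the relevant stanza:

```xml
<!-- mapred-site.xml on each TaskTracker node. Read once at daemon
     startup, so changing it requires restarting the TaskTracker.
     Setting it in a submitted job's conf has no effect on the
     daemons, which is why the web UI shows 4 but 8 reducers run. -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```

This is a config fragment for the cluster-wide knob the thread discusses; it cannot express a per-job limit.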
>>>> 2013/4/30 Nitin Pawar <nitinpawar...@gmail.com>
>>>>
>>>>> The config you are setting is for the job only.
>>>>>
>>>>> But if you want to reduce the slots on the TaskTrackers, then you
>>>>> will need to edit the TaskTracker conf and restart the TaskTrackers.
>>>>>
>>>>> On Apr 30, 2013 3:30 PM, "Han JU" <ju.han.fe...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I want to change the cluster's capacity of reduce slots on a per-job
>>>>>> basis. Originally I have 8 reduce slots per TaskTracker. I did:
>>>>>>
>>>>>> conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");
>>>>>> ...
>>>>>> Job job = new Job(conf, ...);
>>>>>>
>>>>>> And in the web UI I can see that for this job the max reduce tasks
>>>>>> is exactly 4, as I set it. However, Hadoop still launches 8 reducers
>>>>>> per datanode... Why is this?
>>>>>>
>>>>>> How could I achieve this?
>>>>>>
>>>>>> --
>>>>>> *JU Han*
>>>>>>
>>>>>> Software Engineer Intern @ KXEN Inc.
>>>>>> UTC - Université de Technologie de Compiègne
>>>>>> *GI06 - Fouille de Données et Décisionnel*
>>>>>>
>>>>>> +33 0619608888
>>>
>>> --
>>> Nitin Pawar

--
*JU Han*

Software Engineer Intern @ KXEN Inc.
UTC - Université de Technologie de Compiègne
*GI06 - Fouille de Données et Décisionnel*

+33 0619608888