Yes. In my cluster's conf file, mapred.tasktracker.reduce.tasks.maximum is 8, and for this job I want it to be 4. I set it through the conf, build the job with that conf, and submit it. But Hadoop still launches 8 reducers per datanode...
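The per-job concurrency cap asked for here is usually handled by the scheduler rather than by the TaskTracker property. As a sketch, assuming a Hadoop 1.x cluster with the Fair Scheduler enabled (the pool name "limited" is a made-up example, not from this thread), an allocations file can cap how many reduce tasks from a pool run at the same time:

```xml
<?xml version="1.0"?>
<!-- Fair Scheduler allocations file; its location is given by
     mapred.fairscheduler.allocation.file in mapred-site.xml.
     The pool name "limited" is a hypothetical example. -->
<allocations>
  <pool name="limited">
    <!-- At most 4 reduce tasks from jobs in this pool run
         concurrently, regardless of total reduce slots. -->
    <maxReduces>4</maxReduces>
  </pool>
</allocations>
```

A job would then opt in by being submitted to that pool, e.g. via conf.set("pool.name", "limited"), assuming mapred.fairscheduler.poolnameproperty is set to pool.name. This is a sketch under those assumptions, not something verified on the thread's cluster.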
2013/4/30 Nitin Pawar <nitinpawar...@gmail.com>

> So basically, if I understand correctly, you want to limit the number of
> reducers executing in parallel, only for this job?
>
> On Tue, Apr 30, 2013 at 4:02 PM, Han JU <ju.han.fe...@gmail.com> wrote:
>
>> Thanks.
>>
>> In fact I don't want to set the reducer or mapper numbers; they are fine.
>> I want to set the reduce slot capacity of my cluster while it executes my
>> specific job. Say I have 100 reduce tasks for this job: I want my cluster
>> to execute 4 of them at the same time, not 8, only for this specific job.
>> So I set mapred.tasktracker.reduce.tasks.maximum to 4 and submit the job.
>> This conf is well received by the job, but ignored by Hadoop...
>>
>> Any idea why this is?
>>
>> 2013/4/30 Nitin Pawar <nitinpawar...@gmail.com>
>>
>>> The mapred.tasktracker.reduce.tasks.maximum parameter sets the maximum
>>> number of reduce tasks that may be run by an individual TaskTracker at
>>> one time. This is not a per-job configuration.
>>>
>>> The number of map tasks for a given job is driven by the number of
>>> input splits, not by the mapred.map.tasks parameter. For each input
>>> split a map task is spawned, so over the lifetime of a MapReduce job
>>> the number of map tasks equals the number of input splits.
>>> mapred.map.tasks is just a hint to the InputFormat about the number of
>>> maps.
>>>
>>> If you want to set the number of maps or reducers per job, you can do
>>> it on the job object you created: job.setNumReduceTasks() for reducers
>>> (the map count is a hint only, via JobConf.setNumMapTasks()).
>>>
>>> Note that for maps this is just a hint, and again the number will be
>>> decided by the input splits.
>>>
>>> On Tue, Apr 30, 2013 at 3:39 PM, Han JU <ju.han.fe...@gmail.com> wrote:
>>>
>>>> Thanks Nitin.
>>>>
>>>> What I need is to set the slots only for a specific job, not for the
>>>> whole cluster conf. But what I did does NOT work... Have I done
>>>> something wrong?
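To illustrate why setting the property in the job conf is ignored: each TaskTracker daemon reads mapred.tasktracker.reduce.tasks.maximum from its own mapred-site.xml once at startup, so it only takes effect cluster-side and for every job. A minimal sketch of the relevant stanza:

```xml
<!-- mapred-site.xml on each TaskTracker node. Read once at daemon
     startup, so changing it requires restarting the TaskTracker.
     Setting it in a submitted job's conf has no effect on the
     daemons, which is why the web UI shows 4 but 8 reducers run. -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```

This is a config fragment for the cluster-wide knob the thread discusses; it cannot express a per-job limit.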
>>>> 2013/4/30 Nitin Pawar <nitinpawar...@gmail.com>
>>>>
>>>>> The config you are setting is for the job only.
>>>>>
>>>>> But if you want to reduce the slots on the TaskTrackers, then you
>>>>> will need to edit the TaskTracker conf and restart the TaskTrackers.
>>>>>
>>>>> On Apr 30, 2013 3:30 PM, "Han JU" <ju.han.fe...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I want to change the cluster's capacity of reduce slots on a per-job
>>>>>> basis. Originally I have 8 reduce slots per TaskTracker. I did:
>>>>>>
>>>>>> conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");
>>>>>> ...
>>>>>> Job job = new Job(conf, ...);
>>>>>>
>>>>>> And in the web UI I can see that for this job the max reduce tasks
>>>>>> is exactly 4, as I set it. However, Hadoop still launches 8 reducers
>>>>>> per datanode... Why is this?
>>>>>>
>>>>>> How could I achieve this?
>>>>>>
>>>>>> --
>>>>>> *JU Han*
>>>>>>
>>>>>> Software Engineer Intern @ KXEN Inc.
>>>>>> UTC - Université de Technologie de Compiègne
>>>>>> *GI06 - Fouille de Données et Décisionnel*
>>>>>>
>>>>>> +33 0619608888
>>>
>>> --
>>> Nitin Pawar

--
*JU Han*

Software Engineer Intern @ KXEN Inc.
UTC - Université de Technologie de Compiègne
*GI06 - Fouille de Données et Décisionnel*

+33 0619608888