Hi,

You are right that a change to mapred.tasktracker.reduce.tasks.maximum requires a restart of the tasktrackers. AFAIK, there is no way to modify this property without restarting.
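For reference, in MRv1 this is a per-node setting in mapred-site.xml on each tasktracker; a change like the one below only takes effect after that tasktracker is restarted:

```xml
<!-- mapred-site.xml on a node that should run no reduce tasks -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>0</value>
</property>
```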
On a different note, could you see if the amount of intermediate data can be reduced using a combiner, or some other form of local aggregation?

Thanks
hemanth

On Mon, Sep 3, 2012 at 9:06 PM, Abhay Ratnaparkhi <abhay.ratnapar...@gmail.com> wrote:
> How can I set 'mapred.tasktracker.reduce.tasks.maximum' to "0" on a running tasktracker?
> It seems that I need to restart the tasktracker, and in that case I'll lose the output of the map tasks run by that tasktracker.
>
> Can I change 'mapred.tasktracker.reduce.tasks.maximum' to "0" without restarting the tasktracker?
>
> ~Abhay
>
> On Mon, Sep 3, 2012 at 8:53 PM, Bejoy Ks <bejoy.had...@gmail.com> wrote:
>> Hi Abhay,
>>
>> The TaskTrackers on which the reduce tasks are triggered are chosen at random, based on reduce slot availability. So if you don't want reduce tasks to be scheduled on particular nodes, you need to set 'mapred.tasktracker.reduce.tasks.maximum' on those nodes to 0. The bottleneck here is that this is not a job-level property; you have to set it at the cluster level.
>>
>> A cleaner approach would be to configure each of your nodes with the right number of map and reduce slots, based on the resources available on each machine.
>>
>> On Mon, Sep 3, 2012 at 7:49 PM, Abhay Ratnaparkhi <abhay.ratnapar...@gmail.com> wrote:
>>> Hello,
>>>
>>> How can one find out which nodes the reduce tasks will run on?
>>>
>>> One of my jobs is running and has completed all of its map tasks.
>>> My map tasks write lots of intermediate data, and the intermediate directory is getting full on all the nodes.
>>> If a reduce task is assigned to any of those nodes, it'll try to copy the data onto the same disk and will eventually fail with disk-space-related exceptions.
>>>
>>> I have added a few more tasktracker nodes to the cluster and now want to run the reducers on the new nodes only.
>>> Is it possible to choose the node on which a reducer will run? What algorithm does Hadoop use to pick a node for a reducer?
>>>
>>> Thanks in advance.
>>>
>>> Bye
>>> Abhay
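To illustrate the local-aggregation suggestion above: a combiner folds map output per key on the map side, so far fewer records reach the intermediate directories and the shuffle. This is a minimal plain-Python sketch of the idea (not the Hadoop API); the data and function names are illustrative only:

```python
from collections import defaultdict

# Map output for a word-count-style job: one (key, 1) pair per occurrence.
map_output = [("apple", 1), ("banana", 1), ("apple", 1),
              ("apple", 1), ("banana", 1)]

def combine(pairs):
    """Sum the values per key locally, mimicking what a Hadoop combiner
    does to map output before it is spilled to disk and shuffled."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

combined = combine(map_output)
print(combined)  # [('apple', 3), ('banana', 2)]
print(len(map_output), "records shrink to", len(combined))
```

In a real MRv1 job, the equivalent would be setting a combiner class on the job (often the reducer itself, when the reduce function is associative and commutative, as a sum is).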