Hi Abhay,
The NameNode has the addresses of all the DataNodes. MapReduce does
all the data processing. First the data set is put into the HDFS filesystem
and then the hadoop jar file is run. The map tasks handle the input files, and
their output is shuffled, sorted and grouped together. The map task is completed and then
Hello,
How can one get to know the nodes on which reduce tasks will run?
One of my jobs is running and it's completing all the map tasks.
My map tasks write lots of intermediate data. The intermediate directory is
getting full on all the nodes.
If the reduce task takes any node from the cluster then
Hi Abhay
The TaskTrackers on which the reduce tasks are triggered are chosen at
random based on reduce slot availability. So if you don't want the
reduce tasks to be scheduled on some particular nodes, you need to set
'mapred.tasktracker.reduce.tasks.maximum' on those nodes to 0. The
bottleneck
Hi,
The reducer runs wherever a slot is available. The location is not
related to where the data is located, and it is not possible to choose where
the reducer will run (except by tweaking the tasktracker...).
Regards
Bertrand
On Mon, Sep 3, 2012 at 4:19 PM, Abhay Ratnaparkhi
How can I set 'mapred.tasktracker.reduce.tasks.maximum' to 0 in a
running tasktracker?
It seems that I need to restart the tasktracker, and in that case I'll lose
the output of the map tasks run by that particular tasktracker.
Can I change 'mapred.tasktracker.reduce.tasks.maximum' to 0 without
restarting
Hi Abhay
You need to change this value and restart the TT before you submit your
job. Modifying this value midway won't affect the running jobs.
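
For reference, here is a minimal sketch of how this setting might look on
the nodes that should not run reducers. It assumes a Hadoop 1.x (MRv1)
setup where each TaskTracker reads mapred-site.xml from its conf directory
at startup. The file name and location are assumptions, since the thread
does not name them. Only the property name and the value 0 come from the
discussion above.

```xml
<!-- mapred-site.xml on each node that should NOT run reduce tasks.
     Assumed location: the TaskTracker's conf directory (not stated in the thread). -->
<configuration>
  <property>
    <!-- Number of reduce slots this TaskTracker offers; 0 means no reducers here. -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>0</value>
  </property>
</configuration>
```

The TaskTracker has to be restarted to pick this up, so it only helps
for jobs submitted after the restart.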
On Mon, Sep 3, 2012 at 9:06 PM, Abhay Ratnaparkhi
abhay.ratnapar...@gmail.com wrote:
How can I set 'mapred.tasktracker.reduce.tasks.maximum'
Hi,
You are right that a change to mapred.tasktracker.reduce.tasks.maximum will
require a restart of the tasktrackers. AFAIK, there is no way of modifying
this property without restarting.
On a different note, could you see if the amount of intermediate data can
be reduced using a combiner, or
The short answer is no.
The longer answer is that you can attempt to force data locality, however even
then if an open slot becomes available, it's used regardless of what you want to
do...
On Sep 3, 2012, at 9:19 AM, Abhay Ratnaparkhi abhay.ratnapar...@gmail.com
wrote:
Hello,
How can
All of my map tasks are about to complete and there is not much processing
to be done in the reducer.
The job has been running for a week, so I don't want it to fail. Any other
suggestions to tackle this are welcome.
~Abhay
On Mon, Sep 3, 2012 at 9:26 PM, Hemanth Yamijala