What is the downside of increasing both mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to the same value?

I read on this link <http://developer.yahoo.com/hadoop/tutorial/module7.html> that:

mapred.tasktracker.map.tasks.maximum
    1/2 * (cores/node) to 2 * (cores/node)
    Number of map tasks to deploy on each machine.

mapred.tasktracker.reduce.tasks.maximum
    1/2 * (cores/node) to 2 * (cores/node)
    Number of reduce tasks to deploy on each machine.

Each node has 8 cores. So according to the above guidance I should set both configs to a value between 4 and 16. The ratio of mappers to reducers doesn't really matter as far as these two properties are concerned.

On Mon, Sep 30, 2013 at 12:52 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

> Hi Himanshu,
>
> Changing the ratio is definitely a reasonable thing to do. The capacities
> come from the mapred.tasktracker.map.tasks.maximum
> and mapred.tasktracker.reduce.tasks.maximum tasktracker configurations.
> You can tweak these on your nodes to get your desired ratio.
>
> -Sandy
>
>
> On Mon, Sep 30, 2013 at 12:39 PM, Himanshu Vijay <himansh...@gmail.com> wrote:
>
>> Hi,
>>
>> Our Hadoop cluster is running 0.20.203. The cluster currently has a 'Map
>> Task Capacity' of 8900+ and a 'Reduce Task Capacity' of 3300+, resulting
>> in a ratio of 2.7. We have a wide variety of jobs running and we want to
>> increase the throughput.
>>
>> My manual observation was that we hit the Mapper capacity, and hence many
>> jobs have to wait even though a lot of room is left in Reduce capacity. I
>> mined the jobtracker logs for the jobs that completed and saw that on an
>> hourly basis as well as a daily basis the mapper:reducer ratio was 4-5.
>>
>> To increase the throughput I was thinking of experimenting with changing
>> the Map and Reduce Task Capacity such that the ratio is increased from
>> 2.7 to ~4.
>>
>> Does this sound like a correct approach? Is this something that I can
>> control, or is it determined automatically by Hadoop?
>>
>> Have any of you done this kind of exercise? If yes, can you please advise
>> how to go about changing this ratio? I am not finding much literature on
>> it.
>>
>> Note: Mapper and Reducer Task Capacity is the max total no. of
>> mappers/reducers you can run on the cluster at any point.
>>
>> Regards,
>> -Himanshu Vijay
>>
>
>

--
-Himanshu Vijay
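
For reference, the arithmetic behind the per-node guidance quoted above can be sketched in a few lines of Python. This is only an illustration: the function names, the 16-slot per-node budget, and the target ratio of 4 are assumptions made for the example, not anything defined by Hadoop or stated in the thread as a chosen setting.

```python
def slot_range(cores_per_node):
    """Return (low, high) slots per the tutorial's 1/2x to 2x cores guidance."""
    return cores_per_node // 2, cores_per_node * 2


def split_slots(total_slots, map_to_reduce_ratio):
    """Split a hypothetical per-node slot budget into (map, reduce) slots
    so that map:reduce is close to the requested ratio."""
    reduce_slots = max(1, round(total_slots / (map_to_reduce_ratio + 1)))
    map_slots = total_slots - reduce_slots
    return map_slots, reduce_slots


# 8 cores per node, as in the message above.
low, high = slot_range(8)
print(low, high)  # 4 16

# Assumed budget of 16 slots per node and an assumed target ratio of ~4.
map_slots, reduce_slots = split_slots(16, 4.0)
print(map_slots, reduce_slots)  # 13 3
```

The two resulting numbers would correspond to mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum on each tasktracker. Whether a given total is safe for a node's memory and I/O is a separate question that this sketch does not address.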