Re: cluster under-utilization with Hadoop Fair Scheduler

2010-04-13 Thread abhishek sharma
Hi Ted, Were you referring to the Hadoop 0.20.2 distribution or the CDH version? I just looked at the FairScheduler assignTasks function in the Hadoop 0.20.2 distribution and it is the same as in version 0.20.0: it will assign only one map and one reduce task to a tasktracker per heartbeat (as far as I can tell b

Re: cluster under-utilization with Hadoop Fair Scheduler

2010-04-11 Thread abhishek sharma
Hi Ted, I was referring to version 0.20.0. As Todd pointed out, the issue I described was fixed in version 0.20.2. I have only looked at the Cloudera version 0.20.2+228 (http://archive.cloudera.com/cdh/3/), which is currently in beta. I guess Hadoop 0.20.2 also has the fix. I will take a look at that too. T

Re: cluster under-utilization with Hadoop Fair Scheduler

2010-04-11 Thread Ted Yu
Reading assignTasks() in 0.20.2 reveals that the number of map tasks assigned is not limited to 1 per heartbeat.

Cheers

On Sun, Apr 11, 2010 at 12:30 PM, Todd Lipcon wrote:
> Hi Abhishek,
>
> This behavior is improved by MAPREDUCE-706 I believe (not certain that
> that's the JIRA, but I know it

Re: cluster under-utilization with Hadoop Fair Scheduler

2010-04-11 Thread Todd Lipcon
Hi Abhishek,

This behavior is improved by MAPREDUCE-706 I believe (not certain that that's the JIRA, but I know it's fixed in trunk fairscheduler). These patches are included in CDH3 (currently in beta): http://archive.cloudera.com/cdh/3/

In general, though, map tasks that are so short are not goi

cluster under-utilization with Hadoop Fair Scheduler

2010-04-11 Thread abhishek sharma
Hi all, I have been using the Hadoop Fair Scheduler for some experiments on a 100-node cluster with 2 map slots per node (hence, a total of 200 map slots). In one of my experiments, all the map tasks finish within a heartbeat interval of 3 seconds. I noticed that the maximum number of concurrentl
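
The arithmetic behind this thread can be sketched with a small back-of-the-envelope model. It is only an illustration, not the scheduler's actual logic: the function name `max_concurrent_maps` and its parameters are hypothetical, and the model assumes that each tasktracker is handed a fixed number of new map tasks per heartbeat (one per heartbeat is what the thread reports for 0.20.0) and that nothing else delays task launch.

```python
import math


def max_concurrent_maps(nodes, slots_per_node, assigned_per_heartbeat, task_heartbeats):
    """Upper bound on concurrently running map tasks in a simplified model.

    Assumptions (not taken from Hadoop source code):
      - every tasktracker receives at most `assigned_per_heartbeat` new map
        tasks on each heartbeat;
      - each task runs for `task_heartbeats` heartbeat intervals (> 0, may be
        fractional), so tasks from at most ceil(task_heartbeats) consecutive
        heartbeats can overlap on one node;
      - a node never runs more tasks than it has slots.
    """
    if task_heartbeats <= 0:
        raise ValueError("task_heartbeats must be positive")
    overlapping_batches = math.ceil(task_heartbeats)
    per_node = min(slots_per_node, assigned_per_heartbeat * overlapping_batches)
    return nodes * per_node


# Cluster from the original post: 100 nodes, 2 map slots each.
# Tasks shorter than one heartbeat, one map assigned per heartbeat:
print(max_concurrent_maps(100, 2, 1, 0.9))  # 100 of 200 slots
# Same short tasks, but two maps assigned per heartbeat:
print(max_concurrent_maps(100, 2, 2, 0.9))  # 200
# One map per heartbeat, but tasks lasting 1.5 heartbeats:
print(max_concurrent_maps(100, 2, 1, 1.5))  # 200
```

Under these assumptions, short tasks combined with one assignment per heartbeat leave half of the map slots idle, while either assigning multiple tasks per heartbeat or running longer tasks lets the model fill every slot.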