Allow a load difference in fairshare scheduler
----------------------------------------------

                 Key: MAPREDUCE-936
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-936
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: contrib/fair-share
            Reporter: Zheng Shao


The problem we are facing: It takes a long time for all tasks of a job to get 
scheduled on the cluster, even if the cluster is almost empty.

There are two reasons that together lead to this situation:
1. The load factor makes sure each TaskTracker (TT) runs the same number of 
tasks (this is the part this patch tries to change).

2. The scheduler tries to schedule map tasks locally (first node-local, then 
rack-local). There is a wait time (mapred.fairscheduler.localitywait.node and 
mapred.fairscheduler.localitywait.rack, both around 10 seconds in our 
configuration) and an accumulated wait time (JobInfo.localityWait). The 
accumulated wait time is reset to 0 whenever a non-local map task is 
scheduled, so scheduling N non-local map tasks takes N * wait_time.
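
To make the N * wait_time cost concrete, here is a toy simulation of the reset 
behaviour described in point 2. It is only a sketch, not the scheduler's code: 
the function name, the 1-second heartbeat, and the assumption that no local 
task ever becomes available are all made up for illustration, and the node and 
rack waits are collapsed into a single threshold.

```python
def time_to_schedule_nonlocal(num_tasks, locality_wait_sec, heartbeat_sec=1):
    """Return simulated seconds until num_tasks non-local maps have launched.

    Assumes (hypothetically) that no local task is ever available, so every
    launch has to wait for the accumulated wait to reach the threshold.
    """
    clock = 0
    accumulated_wait = 0  # stands in for JobInfo.localityWait
    launched = 0
    while launched < num_tasks:
        clock += heartbeat_sec
        accumulated_wait += heartbeat_sec
        if accumulated_wait >= locality_wait_sec:
            launched += 1
            accumulated_wait = 0  # reset after each non-local launch
    return clock


# 12 non-local maps with a 10-second wait take 12 * 10 = 120 seconds.
print(time_to_schedule_nonlocal(12, 10))
```

Under these assumptions, a dozen non-local maps are enough to reach the 
2-minute range, because the wait is paid once per task rather than once per 
job.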

Because of 1, many TTs cannot take more tasks even though they have free 
slots. As a result, many of the map tasks cannot be scheduled locally.
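
As a rough illustration of how a load-factor cap leaves slots idle, the sketch 
below models the cap as the TT's slot count scaled by the cluster-wide load. 
This is a simplified model, not the scheduler's actual code; the function name 
and the cluster sizes are hypothetical.

```python
import math


def per_tracker_cap(total_runnable_tasks, local_max_slots, total_slots):
    """Max tasks one TT may run when its slots are scaled by cluster load."""
    load = total_runnable_tasks / total_slots
    return math.ceil(local_max_slots * min(1.0, load))


# Hypothetical cluster: 100 TTs with 4 map slots each (400 slots in total)
# and a single job with 100 runnable maps. Load is 0.25, so each TT is capped
# at 1 task even though it has 4 free slots.
print(per_tracker_cap(100, 4, 400))
```

In this model, a TT that holds the input data for several of the job's maps 
can still run only one of them, so the rest have to go elsewhere as non-local 
tasks.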

Because of 2, non-local tasks are scheduled very slowly.

As a result, we sometimes see it take more than 2 minutes to schedule all the 
mappers of a job.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
