Pin reduces with consecutive IDs to nodes and have a single shuffle task per 
job per node
-----------------------------------------------------------------------------------------

                 Key: HADOOP-2568
                 URL: https://issues.apache.org/jira/browse/HADOOP-2568
             Project: Hadoop
          Issue Type: Improvement
          Components: mapred
            Reporter: Devaraj Das
            Assignee: Devaraj Das
             Fix For: 0.17.0


The idea is to reduce disk seeks on the map nodes while the map outputs are 
being fetched. If we opportunistically pin reduces with consecutive IDs (like 
5, 6, 7, ... up to max-reduce-tasks on that node) to a node, and run a single 
shuffle task for them, then on every fetch that shuffle task can pull the 
outputs for all the reduces it is shuffling for. With 2 reduces per node, this 
cuts the number of seeks in the map output files on the map nodes by 50%. 
Memory usage by the shuffle task would be proportional to the number of 
reduces it is shuffling for (one ramfs instance per reduce). But overall it 
should help.
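
To make the seek arithmetic concrete, here is a rough sketch of the model 
behind the 50% figure. It assumes that each map output file stores its 
partitions contiguously in reduce-ID order and that one fetch of a contiguous 
range costs one seek; neither assumption is stated in this issue. The class 
and method names (ShuffleSeekModel, contiguousRange, and so on) are 
hypothetical and are not Hadoop APIs.

```java
// Illustrative model only; not part of Hadoop.
public class ShuffleSeekModel {

    // Seeks on the map side when every reduce on a node fetches its own
    // partition separately: one seek per (map output, reduce) pair.
    public static long seeksPerReduceFetch(long numMaps, int reducesPerNode) {
        return numMaps * reducesPerNode;
    }

    // Seeks when a single shuffle task fetches the partitions of all its
    // reduces in one contiguous read: one seek per map output.
    public static long seeksGroupedFetch(long numMaps) {
        return numMaps;
    }

    // Fraction of seeks saved by grouping, under the assumptions above.
    public static double seekReduction(int reducesPerNode) {
        return 1.0 - 1.0 / reducesPerNode;
    }

    // Byte range covering partitions firstReduce..lastReduce (inclusive).
    // offsets[i] is the assumed start of partition i; the array has one
    // extra trailing entry marking the end of the last partition.
    // Returns {start, length}.
    public static long[] contiguousRange(long[] offsets, int firstReduce,
                                         int lastReduce) {
        long start = offsets[firstReduce];
        long length = offsets[lastReduce + 1] - start;
        return new long[] {start, length};
    }

    public static void main(String[] args) {
        long numMaps = 1000;      // made-up job size
        int reducesPerNode = 2;   // the case discussed in the issue

        System.out.println("separate fetches: "
            + seeksPerReduceFetch(numMaps, reducesPerNode));
        System.out.println("grouped fetches:  " + seeksGroupedFetch(numMaps));
        System.out.println("fraction saved:   "
            + seekReduction(reducesPerNode));

        // Reduces 1 and 2 pinned to the same node: one read covers both.
        long[] offsets = {0, 100, 250, 400, 900};  // made-up partition offsets
        long[] range = contiguousRange(offsets, 1, 2);
        System.out.println("read " + range[1] + " bytes at offset "
            + range[0]);
    }
}
```

Under this model the saving is 1 - 1/R for R pinned reduces per node, so it 
grows with R, while the shuffle task's memory footprint grows linearly with R 
as noted above.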

Thoughts?
