ledion bitincka created MAPREDUCE-5085:
------------------------------------------

             Summary: JobClient reorders splits 
                 Key: MAPREDUCE-5085
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5085
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: ledion bitincka


The JobClient hard-codes the ordering of splits by descending size. While this 
could be fine for traditional/batch MR jobs, it is not well suited for map-only 
jobs where a client is interested in the order of map executions. Moreover, by 
always running the more expensive mappers early in the job, the cluster is taxed 
more heavily at the start and is not uniformly/smoothly utilized over time. 

{code}
...JobClient.java
  private <T extends InputSplit>
  int writeNewSplits(JobContext job, Path jobSubmitDir) throws IOException,
      InterruptedException, ClassNotFoundException {
....
    // sort the splits into order based on size, so that the biggest
    // go first
    Arrays.sort(array, new SplitComparator());
    JobSplitWriter.createSplitFiles(jobSubmitDir, conf,
        jobSubmitDir.getFileSystem(conf), array);
    return array.length;
  }
{code}

It should be straightforward to make the SplitComparator an instance variable 
of the JobClient and allow consumers to set it if they care about the order in 
which splits are attempted.
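
A minimal, self-contained sketch of the idea follows. It does not use the 
Hadoop classes: Split, Submitter, setSplitComparator and orderSplits are 
hypothetical stand-ins invented for illustration, and LARGEST_FIRST only 
approximates what the existing SplitComparator does. Treating a null 
comparator as "keep the InputFormat's order" is likewise an assumption, not 
part of the proposal above.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SplitOrderingSketch {
    // Hypothetical stand-in for an InputSplit: just a name and a length.
    public static final class Split {
        public final String name;
        public final long length;
        public Split(String name, long length) {
            this.name = name;
            this.length = length;
        }
    }

    // Approximates the current hard-coded behavior: biggest splits first.
    public static final Comparator<Split> LARGEST_FIRST =
        (a, b) -> Long.compare(b.length, a.length);

    // Hypothetical stand-in for the JobClient with a settable comparator.
    public static final class Submitter {
        private Comparator<Split> splitComparator = LARGEST_FIRST;

        // Assumption: null means "do not reorder the splits at all".
        public void setSplitComparator(Comparator<Split> comparator) {
            this.splitComparator = comparator;
        }

        // Returns a sorted copy; the caller's array is left untouched.
        public Split[] orderSplits(Split[] splits) {
            Split[] copy = splits.clone();
            if (splitComparator != null) {
                Arrays.sort(copy, splitComparator);
            }
            return copy;
        }
    }

    // Joins split names with commas so the resulting order is easy to inspect.
    public static String names(Split[] splits) {
        StringBuilder sb = new StringBuilder();
        for (Split s : splits) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(s.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Split[] splits = {
            new Split("a", 10), new Split("b", 30), new Split("c", 20)
        };
        Submitter submitter = new Submitter();
        System.out.println(names(submitter.orderSplits(splits)));

        // A map-only job that cares about execution order opts out of sorting.
        submitter.setSplitComparator(null);
        System.out.println(names(submitter.orderSplits(splits)));
    }
}
```

Keeping largest-first as the default would leave existing batch jobs 
unaffected; only clients that set a different comparator would see a change 
in split order.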

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
