Johannes Zillmann created MAPREDUCE-5369:
--------------------------------------------

             Summary: Progress for jobs with multiple splits in local mode is wrong
                 Key: MAPREDUCE-5369
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5369
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.20.2
            Reporter: Johannes Zillmann


When a job with multiple splits runs in local mode (LocalJobRunner), its progress calculation is wrong. After the first split is processed, progress jumps to 100%, then drops back to 50%, and so on.

The reason lies in the progress calculation in LocalJobRunner:
{code}
float taskIndex = mapIds.indexOf(taskId);
if (taskIndex >= 0) {                       // mapping
  float numTasks = mapIds.size();
  status.setMapProgress(taskIndex/numTasks + taskStatus.getProgress()/numTasks);
} else {
  status.setReduceProgress(taskStatus.getProgress());
}
{code}

The problem is that {{mapIds}} is filled lazily in run(). There is a loop over all splits. In the loop, the split's task ID is added to {{mapIds}}, then the split is processed. That means {{numTasks}} is 1 while the first split is processed, 2 while the second split is processed, and so on.

I tried Hadoop 0.20.2, 1.0.3, 1.1.2 and cdh-4.1. All show the same behaviour.
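
The effect described above can be reproduced outside Hadoop with a small standalone sketch. It is not the LocalJobRunner code: the class name {{LocalProgressSketch}}, the helper methods and the {{map_N}} task ID format are hypothetical, and only the progress formula is taken from the snippet in this report. The eager variant shows one possible fix (registering all task IDs before any split is processed), which is an assumption and not a committed patch.

```java
import java.util.ArrayList;
import java.util.List;

class LocalProgressSketch {

    // Same arithmetic as the snippet in the report: the share of the tasks
    // before this one, plus this task's own share of the total.
    static float mapProgress(int taskIndex, int numTasks, float taskProgress) {
        return (float) taskIndex / numTasks + taskProgress / numTasks;
    }

    // Lazy registration, as described in the report: each task ID is added
    // to mapIds immediately before its split is processed, so the list size
    // grows while the job runs.
    static List<Float> lazyProgress(int numSplits) {
        List<String> mapIds = new ArrayList<>();
        List<Float> reported = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            String taskId = "map_" + i; // hypothetical ID format
            mapIds.add(taskId);
            int index = mapIds.indexOf(taskId);
            reported.add(mapProgress(index, mapIds.size(), 0.0f)); // split starts
            reported.add(mapProgress(index, mapIds.size(), 1.0f)); // split finishes
        }
        return reported;
    }

    // Eager registration (assumed fix): all task IDs are known before the
    // first split is processed, so numTasks stays constant.
    static List<Float> eagerProgress(int numSplits) {
        List<String> mapIds = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            mapIds.add("map_" + i);
        }
        List<Float> reported = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            int index = mapIds.indexOf("map_" + i);
            reported.add(mapProgress(index, mapIds.size(), 0.0f)); // split starts
            reported.add(mapProgress(index, mapIds.size(), 1.0f)); // split finishes
        }
        return reported;
    }

    public static void main(String[] args) {
        System.out.println("lazy:  " + lazyProgress(2));  // [0.0, 1.0, 0.5, 1.0]
        System.out.println("eager: " + eagerProgress(2)); // [0.0, 0.5, 0.5, 1.0]
    }
}
```

With two splits, the lazy variant reports 100% after the first split and falls back to 50% when the second split starts, matching the behaviour in this report. The eager variant never decreases.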