[ https://issues.apache.org/jira/browse/MAPREDUCE-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anupam Seth updated MAPREDUCE-2177:
-----------------------------------
    Summary: Lacking progress update in combiner phase can cause map task timeouts  (was: The wait for spill completion should call Condition.awaitNanos(long nanosTimeout))

> Lacking progress update in combiner phase can cause map task timeouts
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2177
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2177
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.20.2
>            Reporter: Ted Yu
>            Assignee: Anupam Seth
>
> We sometimes saw map task timeouts in cdh3b2. Here is the log from one of the map tasks:
>
> 2010-11-04 10:34:23,820 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
> 2010-11-04 10:34:23,820 INFO org.apache.hadoop.mapred.MapTask: bufstart = 119534169; bufend = 59763857; bufvoid = 298844160
> 2010-11-04 10:34:23,820 INFO org.apache.hadoop.mapred.MapTask: kvstart = 438913; kvend = 585320; length = 983040
> 2010-11-04 10:34:41,615 INFO org.apache.hadoop.mapred.MapTask: Finished spill 3
> 2010-11-04 10:35:45,352 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
> 2010-11-04 10:35:45,547 INFO org.apache.hadoop.mapred.MapTask: bufstart = 59763857; bufend = 298837899; bufvoid = 298844160
> 2010-11-04 10:35:45,547 INFO org.apache.hadoop.mapred.MapTask: kvstart = 585320; kvend = 731585; length = 983040
> 2010-11-04 10:45:41,289 INFO org.apache.hadoop.mapred.MapTask: Finished spill 4
>
> Note how long the last spill took: nearly ten minutes (10:35:45 to 10:45:41), compared with about 18 seconds for spill 3.
>
> In MapTask.java, the following code waits for the spill to finish:
>
>     while (kvstart != kvend) {
>       reporter.progress();
>       spillDone.await();
>     }
>
> The code in trunk is similar.
>
> There is no timeout mechanism for Condition.await(). If the SpillThread takes a long time before calling spillDone.signal(), the waiting thread stays blocked without calling reporter.progress() again, and the task times out.
> Condition.awaitNanos(long nanosTimeout) should be called instead.
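For illustration, the awaitNanos-based wait proposed in the description could look roughly like the sketch below. This is a standalone toy, not MapTask code and not necessarily the fix that was committed for this issue. The class SpillWaitSketch, the method waitForSpill, the reportProgress() helper, and the spillInProgress flag (standing in for kvstart != kvend) are all hypothetical names; the slow spill is simulated with Thread.sleep.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class SpillWaitSketch {
    private final ReentrantLock spillLock = new ReentrantLock();
    private final Condition spillDone = spillLock.newCondition();
    // Guarded by spillLock; hypothetical stand-in for "kvstart != kvend".
    private boolean spillInProgress = true;
    private final AtomicInteger progressReports = new AtomicInteger();

    // Hypothetical stand-in for reporter.progress().
    private void reportProgress() {
        progressReports.incrementAndGet();
    }

    // Simulates a spill lasting spillMillis and waits for it, waking up
    // every pollMillis to report progress. Returns the number of reports.
    static int waitForSpill(long spillMillis, long pollMillis) {
        SpillWaitSketch s = new SpillWaitSketch();

        Thread spillThread = new Thread(() -> {
            try {
                Thread.sleep(spillMillis); // simulated slow spill/combine
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            s.spillLock.lock();
            try {
                s.spillInProgress = false;
                s.spillDone.signal();
            } finally {
                s.spillLock.unlock();
            }
        });
        spillThread.start();

        s.spillLock.lock();
        try {
            while (s.spillInProgress) {
                s.reportProgress();
                // Bounded wait: returns on signal or after the timeout,
                // so the loop reports progress again either way.
                s.spillDone.awaitNanos(TimeUnit.MILLISECONDS.toNanos(pollMillis));
            }
            spillThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            s.spillLock.unlock();
        }
        return s.progressReports.get();
    }

    public static void main(String[] args) {
        System.out.println("progress reports: " + waitForSpill(300, 50));
    }
}
```

With an unbounded await() the loop body would run once per signal, so a single long spill produces a single progress report. The bounded wait trades a few extra wakeups for regular reports. Note that this only keeps the waiting thread reporting; per the updated summary, the underlying gap is the lack of progress updates during the combiner phase itself.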
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira