I'm trying to figure out why, when I use a single input file that is relatively 
small (< 3 MB, ~95K records), my Reducer hangs at 33% and the attempt never 
completes, with the output below. It looks like there's a problem when it tries 
to write to the log file, for some reason it gets Interrupted which is not 
handled, and the Thread then terminates without ever catching that Exception 
and so the Reducer Task sits there until the JobTracker kills it and starts up 
another one. Which inevitably hits the same one.

When I take that same data, and split it up into multiple files (5, 10, 
whichever, just >1), then it runs fine, no issues. Very strange, anyone have 
any ideas?


--Aaron


stderr logs

log4j:ERROR Failed to flush writer,
java.io.InterruptedIOException
            at java.io.FileOutputStream.writeBytes(Native Method)
            at java.io.FileOutputStream.write(FileOutputStream.java:260)
            at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
            at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)
            at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:276)
            at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
            at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
            at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:58)
            at 
org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:316)
            at org.apache.log4j.WriterAppender.append(WriterAppender.java:160)
            at 
org.apache.hadoop.mapred.TaskLogAppender.append(TaskLogAppender.java:58)
            at 
org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
            at 
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
            at org.apache.log4j.Category.callAppenders(Category.java:206)
            at org.apache.log4j.Category.forcedLog(Category.java:391)
            at org.apache.log4j.Category.log(Category.java:856)
            at 
org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:199)
            at 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.freeHost(ShuffleScheduler.java:345)
            at 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:152)



syslog logs

2011-03-15 17:02:32,258 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2011-03-15 17:02:32,265 WARN org.apache.hadoop.conf.Configuration: user.name is 
deprecated. Instead, use mapreduce.job.user.name
2011-03-15 17:02:32,360 WARN org.apache.hadoop.conf.Configuration: 
mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2011-03-15 17:02:32,413 INFO 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: 
memoryLimit=130514944, maxSingleShuffleLimit=32628736, mergeThreshold=86139864, 
ioSortFactor=10, memToMemMergeOutputsThreshold=10
2011-03-15 17:02:32,416 INFO 
org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
attempt_201103141158_0104_r_000000_0 Thread started: EventFetcher for fetching 
Map Completion Events
2011-03-15 17:02:32,423 INFO 
org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
attempt_201103141158_0104_r_000000_0: Got 1 new map-outputs
2011-03-15 17:02:32,423 INFO 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
10.10.11.50:50060 with 1 to fetcher#5
2011-03-15 17:02:32,423 INFO 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
10.10.11.50:50060 to fetcher#5
2011-03-15 17:02:32,730 INFO org.apache.hadoop.mapreduce.task.reduce.Fetcher: 
for 
url=50060/mapOutput?job=job_201103141158_0104&reduce=0&map=attempt_201103141158_0104_m_000000_0
 sent hash and receievd reply
2011-03-15 17:02:32,735 INFO org.apache.hadoop.mapreduce.task.reduce.Fetcher: 
fetcher#5 about to shuffle output of map attempt_201103141158_0104_m_000000_0 
decomp: 2050216 len: 2050220 to MEMORY
2011-03-15 17:02:32,790 INFO org.apache.hadoop.mapreduce.task.reduce.Fetcher: 
Read 2050216 bytes from map-output for attempt_201103141158_0104_m_000000_0
2011-03-15 17:02:32,790 INFO 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
map-output of size: 2050216, inMemoryMapOutputs.size() -> 1


Reply via email to