Re: Combiner timing out

2011-11-08 Thread Robert Evans
Chris, I have filed MAPREDUCE-3376 for this issue. I have no idea when or if I will get around to fixing it. It looks like a fairly simple change, perhaps even a one or two line change, but reproducing the issue and testing that I actua

Re: Combiner timing out

2011-11-07 Thread Robert Evans
OK, I found the problem: line 1148 of Task.java, in the OldCombinerRunner class. If your combiner is part of the old mapred API, then the reporter is always the NULL reporter, and there is nothing we can do about it without a code update. However, if you use the new mapreduce API (Your combin
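
To illustrate why a NULL reporter defeats progress reporting, here is a minimal sketch in plain Java. It does not use Hadoop's classes: ProgressSink, CountingSink, and combine are hypothetical names invented for this example, and the real Reporter interface has more methods. The only point is that calls made on a no-op implementation never reach anything that could reset a timeout.

```java
/** Illustrative sketch only; all names here are hypothetical, not Hadoop's API. */
final class NullReporterSketch {

    /** Minimal stand-in for a progress-reporting interface. */
    interface ProgressSink {
        void progress();

        /** No-op sink, playing the role the NULL reporter plays in the message above. */
        ProgressSink NULL = new ProgressSink() {
            @Override
            public void progress() {
                // Intentionally does nothing: the call is silently dropped.
            }
        };
    }

    /** Sink that records how many progress calls actually arrived. */
    static final class CountingSink implements ProgressSink {
        private int calls = 0;

        @Override
        public void progress() {
            calls++;
        }

        int calls() {
            return calls;
        }
    }

    /** Hypothetical combiner loop: sums the values and reports progress once per value. */
    static int combine(int[] values, ProgressSink sink) {
        int sum = 0;
        for (int v : values) {
            sum += v;
            sink.progress();
        }
        return sum;
    }
}
```

With a CountingSink the three progress calls for a three-element input are observable; with ProgressSink.NULL the combiner computes the same result but nothing downstream ever learns that it is alive.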

Re: Combiner timing out

2011-11-04 Thread Christopher Egner
I'm using CDH3u0 and streaming, so this is hadoop-0.20.2 at patch level 923.21 (cf. https://ccp.cloudera.com/display/DOC/Downloading+CDH+Releases). I modified the streaming code to confirm that it calls progress when I ask it to, and to check which Reporter class is actually being used. It's the T

Re: Combiner timing out

2011-11-04 Thread Robert Evans
A change went into 0.20.205 (https://issues.apache.org/jira/browse/MAPREDUCE-2187) so that progress is reported automatically after a certain number of inputs to the combiner. I looked through the code for 0.20.205 and from what I can see the CombineOutputCollector should be getting an instance
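
The idea of reporting progress automatically after a fixed number of records can be sketched as follows. This is not the code from MAPREDUCE-2187: CountingProgressSketch, ProgressCallback, and the recordsBeforeProgress parameter are hypothetical names chosen for the example, and the threshold value is whatever the caller passes in.

```java
/** Illustrative sketch only; not the implementation from MAPREDUCE-2187. */
final class CountingProgressSketch {

    /** Hypothetical stand-in for a progress callback; not Hadoop's Reporter interface. */
    interface ProgressCallback {
        void progress();
    }

    private final ProgressCallback callback;
    private final long recordsBeforeProgress;
    private long count = 0;

    CountingProgressSketch(ProgressCallback callback, long recordsBeforeProgress) {
        if (recordsBeforeProgress <= 0) {
            throw new IllegalArgumentException("recordsBeforeProgress must be positive");
        }
        this.callback = callback;
        this.recordsBeforeProgress = recordsBeforeProgress;
    }

    /** Call once per record handled; fires the callback on every N-th record. */
    void recordSeen() {
        count++;
        if (count % recordsBeforeProgress == 0) {
            callback.progress();
        }
    }
}
```

Note that a scheme like this only helps if the callback it is handed is a live one; wrapping a no-op callback still produces no visible progress.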

Combiner timing out

2011-11-03 Thread Christopher Egner
Hi all, let me preface this with my understanding of how tasks work. If a task runs for a long time (default 10 min) without demonstrating progress, the task tracker decides the process is hung, kills it, and starts a new attempt. Normally, one uses a Reporter instance's progress method to provide
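
The timeout behaviour described above can be modelled in a few lines of plain Java. This is a simplified sketch of the rule as stated in the message, not the task tracker's actual code: TaskLivenessModel and its methods are hypothetical names, and timestamps are passed in explicitly so the logic can be checked without waiting ten minutes.

```java
/** Simplified model of the liveness check described above; not Hadoop's actual code. */
final class TaskLivenessModel {

    /** Ten minutes in milliseconds, matching the default mentioned in the message. */
    static final long DEFAULT_TIMEOUT_MS = 10L * 60L * 1000L;

    private final long timeoutMs;
    private long lastProgressMs;

    TaskLivenessModel(long timeoutMs, long startMs) {
        this.timeoutMs = timeoutMs;
        this.lastProgressMs = startMs;
    }

    /** Called whenever the task reports progress; resets the timeout window. */
    void progress(long nowMs) {
        lastProgressMs = nowMs;
    }

    /** True once more than timeoutMs has elapsed since the last reported progress. */
    boolean isConsideredHung(long nowMs) {
        return nowMs - lastProgressMs > timeoutMs;
    }
}
```

In this model a task that started at time 0 and never reports is considered hung just after the 600000 ms mark, while a single progress call restarts the window from that moment.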