Streaming ignoring stderr output

2009-10-25 Thread Ryan Rosario
I am using a Python script as a mapper for a Hadoop Streaming (hadoop 0.20.0) job, with reducer NONE. My jobs keep getting killed with "task failed to respond after 600 seconds." I tried sending a heartbeat every minute to stderr using sys.stderr.write in my mapper, but nothing is being output to stderr.
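A minimal sketch of a heartbeat in a Streaming mapper, assuming the problem is buffering: when stderr is redirected to a pipe (as it is under the Streaming task runner), Python may buffer writes, so an explicit flush() after each heartbeat is the safest bet. Streaming also recognizes stderr lines of the form `reporter:status:<message>` as task status updates. The function and parameter names here are illustrative, not from the original thread.

```python
import sys
import time

def log_progress(message):
    """Write a status line to stderr and flush immediately.
    Without the flush, a block-buffered stderr (common when the
    fd is a pipe) can hold the line so it never reaches the logs."""
    sys.stderr.write(message + "\n")
    sys.stderr.flush()

def run_mapper(instream=sys.stdin, outstream=sys.stdout, heartbeat_secs=60):
    """Identity mapper (reducer NONE) with a periodic heartbeat."""
    last = time.time()
    n = 0
    for n, line in enumerate(instream, 1):
        outstream.write(line)  # pass records through unchanged
        if time.time() - last >= heartbeat_secs:
            log_progress("reporter:status:processed %d lines" % n)
            last = time.time()
```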

Re: Streaming ignoring stderr output

2009-10-26 Thread Koji Noguchi
This doesn't solve your stderr/stdout problem, but you can always set the timeout to a bigger value if necessary: -Dmapred.task.timeout=__ (in milliseconds). Koji On 10/25/09 12:00 PM, "Ryan Rosario" wrote: > I am using a Python script as a mapper for a Hadoop Streaming (hadoop 0.20.0) job [...]
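A hypothetical invocation showing where the -D flag goes (it must precede the Streaming options); the jar path, input/output paths, and script name are placeholders, and the 30-minute value is only an example. Setting mapred.task.timeout to 0 disables the timeout entirely.

```shell
# mapred.task.timeout is in milliseconds; 1800000 ms = 30 minutes.
hadoop jar hadoop-streaming.jar \
    -D mapred.task.timeout=1800000 \
    -input /user/ryan/input \
    -output /user/ryan/output \
    -mapper mapper.py \
    -reducer NONE
```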

Re: Streaming ignoring stderr output

2009-10-26 Thread Ryan Rosario
Thanks. I think that I may have tripped on some sort of bug. Unfortunately, I do not know how to reproduce it, and am a bit scared to try. I got this to work: I changed the following things, and now my job completes successfully, with stderr written to the logs as output occurs. [...]

Re: Streaming ignoring stderr output

2009-10-27 Thread Jason Venner
Most likely one stream gets block-buffered when its file descriptor is a pipe, while the other is at most line-buffered, which is the situation when the code is run by the streaming mapper task. On Mon, Oct 26, 2009 at 11:06 AM, Ryan Rosario wrote: > Thanks. I think that I may have tripped on some sort of bug. > Unfortunately, [...]
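The buffering distinction above can be illustrated with a toy example (not Hadoop-specific, and the names are illustrative): a stream opened unbuffered makes each write visible on the other end of a pipe immediately, whereas a block-buffered stream would hold the bytes until the buffer fills or is flushed.

```python
import os

def demo_pipe_buffering():
    """Write through an unbuffered stream whose fd is a pipe and show
    that the bytes arrive on the read end with no explicit flush()."""
    read_fd, write_fd = os.pipe()
    unbuffered = os.fdopen(write_fd, "wb", buffering=0)
    unbuffered.write(b"heartbeat\n")       # reaches the pipe at once
    received = os.read(read_fd, 10)        # no flush was needed
    unbuffered.close()
    os.close(read_fd)
    return received

# demo_pipe_buffering() returns b"heartbeat\n"
```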