So simple I was hoping to avoid admitting to it. ;-)
I had set the task's Java options to -Xmx1.5g; that needed to be -Xmx1500m. The telltale output of a mistake like that is rather tricky to find: I had to dig into the task tracker UI/logs, because it doesn't show up in the job tracker's normal logs.

The timing perfectly coincided with a DNS change, and Google's first hit on the error that I *could* see in the jobtracker logs suggested DNS, so I went down that rabbit hole for quite a while.

Dave

From: Shahab Yunus [mailto:shahab.yu...@gmail.com]
Sent: Tuesday, May 14, 2013 6:56 PM
To: user@hadoop.apache.org
Subject: Re: JobClient: Error reading task output - after instituting a DNS server

Hi David, can you explain in a bit more detail what the issue was?

Thanks.
Shahab

On Tue, May 14, 2013 at 2:29 AM, David Parks <davidpark...@yahoo.com> wrote:

I just hate it when I figure out a problem right after asking for help.

Finding the task logs via the task tracker website identified the problem, which didn't show up elsewhere. It was a simple misconfiguration that I made at the same time as the DNS update, and that threw me off track.

Dave

From: David Parks [mailto:davidpark...@yahoo.com]
Sent: Tuesday, May 14, 2013 1:20 PM
To: user@hadoop.apache.org
Subject: JobClient: Error reading task output - after instituting a DNS server

So we just configured a local DNS server for hostname resolution and stopped using a hosts file, and now jobs fail on us. But I can't figure out why.

You can see the error below, but if I run curl against any of those URLs they come back "Failed to retrieve stdout log", which doesn't look much like a DNS issue.

I can ping and do nslookup from any host to any other host. This is a CDH4 cluster and the host inspector is happy as could be; Cloudera Manager also indicates all is well.

When I open the task tracker website I see the first task attempt show up there for maybe 10 seconds or so before it fails.

Any idea what I need to look at here?

Job:
====
13/05/14 05:13:40 INFO input.FileInputFormat: Total input paths to process : 131
13/05/14 05:13:41 INFO input.FileInputFormat: Total input paths to process : 1
13/05/14 05:13:42 INFO mapred.JobClient: Running job: job_201305131758_0003
13/05/14 05:13:43 INFO mapred.JobClient: map 0% reduce 0%
13/05/14 05:13:47 INFO mapred.JobClient: Task Id : attempt_201305131758_0003_m_000353_0, Status : FAILED
java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:237)

13/05/14 05:13:47 WARN mapred.JobClient: Error reading task outputhttp://hadoop-fullslot2:50060/tasklog?plaintext=true&attemptid=attempt_201305131758_0003_m_000353_0&filter=stdout
13/05/14 05:13:47 WARN mapred.JobClient: Error reading task outputhttp://hadoop-fullslot2:50060/tasklog?plaintext=true&attemptid=attempt_201305131758_0003_m_000353_0&filter=stderr
13/05/14 05:13:50 INFO mapred.JobClient: Task Id : attempt_201305131758_0003_r_000521_0, Status : FAILED
java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:237)

13/05/14 05:13:50 WARN mapred.JobClient: Error reading task outputhttp://hadoop-fullslot2:50060/tasklog?plaintext=true&attemptid=attempt_201305131758_0003_r_000521_0&filter=stdout
13/05/14 05:13:50 WARN mapred.JobClient: Error reading task outputhttp://hadoop-fullslot2:50060/tasklog?plaintext=true&attemptid=attempt_201305131758_0003_r_000521_0&filter=stderr

curl of above URL:
====================
davidparks21@hadoop-meta1:~$ curl 'http://hadoop-fullslot2:50060/tasklog?plaintext=true&attemptid=attempt_201305131758_0003_m_000353_0&filter=stdout'
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 410 Failed to retrieve stdout log for task: attempt_201305131758_0003_m_000353_0</title>
</head>
<body><h2>HTTP ERROR 410</h2>
<p>Problem accessing /tasklog. Reason:
<pre>    Failed to retrieve stdout log for task: attempt_201305131758_0003_m_000353_0</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>
<br/>
<br/>
<br/>
<br/>
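
The root cause reported at the top of this thread was a fractional heap size (-Xmx1.5g where -Xmx1500m was needed). The JVM expects a whole number followed by an optional unit letter, so a value like 1.5g makes the child process exit immediately. Below is a minimal sketch of a pre-flight check that could catch this before a job is submitted. The function names are hypothetical and not part of Hadoop, and the pattern is deliberately conservative: it only accepts the k/m/g suffixes listed in the Java launcher documentation.

```python
import re

# Whole number plus an optional k/m/g unit (either case).
# Conservative assumption: other suffixes a given JVM may accept are flagged too.
_HEAP_FLAG = re.compile(r"^-Xm[sx]\d+[kKmMgG]?$")


def is_valid_heap_flag(flag):
    """Return True if flag looks like a heap-size option the JVM can parse."""
    return _HEAP_FLAG.match(flag) is not None


def find_bad_heap_flags(java_opts):
    """Return the -Xmx/-Xms tokens in a Java options string that look malformed."""
    return [
        token
        for token in java_opts.split()
        if token.startswith(("-Xmx", "-Xms")) and not is_valid_heap_flag(token)
    ]


if __name__ == "__main__":
    # The value from this thread is reported as malformed; the corrected one is not.
    print(find_bad_heap_flags("-Xmx1.5g"))
    print(find_bad_heap_flags("-Xmx1500m"))
```

Running such a check against the task Java options string before submitting would point straight at the bad flag, rather than leaving it to be found in the task tracker's per-attempt logs.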