Hi Matt,

You are most probably seeing this: https://issues.apache.org/jira/browse/MAPREDUCE-2374
There is a single-line fix for this issue; see the latest patch attached to the JIRA entry above.

-Shrinivas

-----Original Message-----
From: Matt Kennedy [mailto:stinkym...@gmail.com]
Sent: Tuesday, August 21, 2012 2:15 PM
To: user@hadoop.apache.org
Subject: Map Reduce "Child Error" task failure

I'm encountering a sporadic error while running MapReduce jobs. It shows up in the console output as follows:

12/08/21 14:56:05 INFO mapred.JobClient: Task Id : attempt_201208211430_0001_m_003538_0, Status : FAILED
java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 126.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
12/08/21 14:56:05 WARN mapred.JobClient: Error reading task output http://<hostname_removed>:50060/tasklog?plaintext=true&attemptid=attempt_201208211430_0001_m_003538_0&filter=stdout
12/08/21 14:56:05 WARN mapred.JobClient: Error reading task output http://<hostname_removed>:50060/tasklog?plaintext=true&attemptid=attempt_201208211430_0001_m_003538_0&filter=stderr

The conditions look exactly like those described in https://issues.apache.org/jira/browse/MAPREDUCE-4003. Unfortunately, that issue is marked as closed for Apache Hadoop version 1.0.3, but that is the version I'm hitting this with.

There does seem to be a correlation between the frequency of these errors and the number of concurrent map tasks, but the hardware resources on the cluster do not appear to be near their limits. I'm assuming there is a knob somewhere that is maladjusted and causing this error, but I haven't found it.

I did find a discussion on the CDH users list describing the exact same problem (https://groups.google.com/a/cloudera.org/d/topic/cdh-user/NlhvHapf3pk/discussion), and the advice there was to increase the value of the mapred.child.ulimit setting.
However, I initially had this value unset, which should mean it is unlimited, if my research is correct. I then set the value to 3 GB (3x my setting for mapred.map.child.java.opts), and it still did not resolve the problem. Finally, out of frustration, I added a zero at the end, so the value is now 31457280 (the unit for the setting is KB), which is 30 GB. I'm still having the problem.

Is anybody else seeing this issue, or does anyone have an idea for a workaround? Right now my workaround is to set the allowed failures very high before a TaskTracker is blacklisted, but this has the unintended side effect of taking a very long time to evict legitimately broken TaskTrackers. If this error is indicative of some other configuration problem, I'd like to resolve it. Ideas? Or should I re-open the JIRA?

Thank you for your time,
Matt
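[Editor's note: for anyone checking the numbers in the message above, mapred.child.ulimit is expressed in KB, so the values Matt tried work out as follows. This is just a unit-conversion sanity check, not Hadoop code:]

```python
# mapred.child.ulimit is specified in kilobytes (KB).
GB_IN_KB = 1024 * 1024

three_gb = 3 * GB_IN_KB          # the first value Matt tried: 3145728 KB = 3 GB
with_extra_zero = three_gb * 10  # "added a zero at the end": 31457280 KB

print(three_gb)                   # 3145728
print(with_extra_zero)            # 31457280
print(with_extra_zero // GB_IN_KB)  # 30 -> i.e. 30 GB, matching the message
```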
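[Editor's note: as background on the failure itself, exit status 126 is the POSIX shell convention for "command found but could not be executed" (typically a permission or resource problem), which is why resource limits like ulimit are a plausible line of inquiry. A minimal sketch reproducing that status on a POSIX system, with an illustrative throwaway script rather than anything Hadoop-specific:]

```python
import os
import stat
import subprocess
import tempfile

# Create a valid shell script, then strip its execute permission so the
# shell finds it but cannot run it -- POSIX shells report this as 126.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write("#!/bin/sh\necho hello\n")
    path = f.name
os.chmod(path, stat.S_IRUSR)  # readable, not executable

rc = subprocess.call(["sh", "-c", path])
print(rc)  # 126 on POSIX shells ("found but cannot execute")

os.unlink(path)
```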