Hi,

I am running an application against the hadoop-2.1.0-beta RC. My app requires 117 containers, and all of them were allocated. However, while those containers were being started, the node manager went down at around the 99th container with an error like the following in its log. I could also reproduce this error by running a "sleep 200; date" command with the Distributed Shell example, in which case the error appeared at around the 66th container.

2013-07-25 06:07:17,743 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process reaper,5,main] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830, errno 11
        at java.lang.Thread.startImpl(Native Method)
        at java.lang.Thread.start(Thread.java:887)
        at java.lang.ProcessInputStream.<init>(UNIXProcess.java:472)
        at java.lang.UNIXProcess$1$1$1.run(UNIXProcess.java:157)
        at java.security.AccessController.doPrivileged(AccessController.java:202)
        at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:137)
2013-07-25 06:07:17,745 INFO org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException

Thanks,
Kishore