I have submitted a JIRA issue for this problem: https://issues.apache.org/jira/browse/YARN-5449#
Environment:
OS version: 2.6.32-573.8.1.el6.x86_64 GNU/Linux
Java version: jdk1.7.0_45
Hadoop version: hadoop-2.2.0

Description of the problem:
Some NodeManager processes hang and are lost from the ResourceManager. When this happens:
- the NodeManager's log stops being written;
- the NodeManager process's CPU usage is very low (nearly 0%);
- garbage collection in the NodeManager JVM stops.

The output of jstat (jstat -gccause pid 1000 100) is as follows:

    S0     S1      E      O      P      YGC   YGCT     FGC  FGCT   GCT      LGCC   GCC
    0.00   100.00  95.06  24.08  30.46  3274  623.437  7    5.899  629.335  No GC  G1 Evacuation Pause
    0.00   100.00  95.06  24.08  30.46  3274  623.437  7    5.899  629.335  No GC  G1 Evacuation Pause
    0.00   100.00  95.06  24.08  30.46  3274  623.437  7    5.899  629.335  No GC  G1 Evacuation Pause
    0.00   100.00  95.06  24.08  30.46  3274  623.437  7    5.899  629.335  No GC  G1 Evacuation Pause
    0.00   100.00  95.06  24.08  30.46  3274  623.437  7    5.899  629.335  No GC  G1 Evacuation Pause
    0.00   100.00  95.06  24.08  30.46  3274  623.437  7    5.899  629.335  No GC  G1 Evacuation Pause

The problem occurs in the NodeManager JVM with both the CMS garbage collector and the G1 garbage collector.

The CMS garbage collector parameters are as follows:

    -Xmx4096m -Xmn1024m -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70

The G1 garbage collector parameters are as follows:

    -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4 -XX:+PrintAdaptiveSizePolicy
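To spot this frozen-GC state automatically, the check can be sketched in Java as shown below. It is only a minimal sketch under my own assumptions. The class name GcStallCheck and the 90% Eden threshold are hypothetical and do not come from Hadoop or the JDK. The code also assumes the jstat -gccause column order shown above (S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC). It treats a series of samples as "stalled" when the YGC and FGC counts never change while Eden stays above the threshold. This only detects the symptom, not its cause: an idle JVM that is not allocating could look similar.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: flags jstat -gccause samples whose GC counters never advance. */
public class GcStallCheck {

    /** The subset of one jstat -gccause data row that this check looks at. */
    static final class Sample {
        final double edenPct; // E column: Eden occupancy in percent
        final long youngGcs;  // YGC column: cumulative young GC count
        final long fullGcs;   // FGC column: cumulative full GC count

        Sample(double edenPct, long youngGcs, long fullGcs) {
            this.edenPct = edenPct;
            this.youngGcs = youngGcs;
            this.fullGcs = fullGcs;
        }
    }

    /**
     * Parses one data row (not the header). Assumes the column order
     * S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC; the trailing cause
     * columns can contain spaces, so they are ignored.
     */
    static Sample parse(String row) {
        String[] f = row.trim().split("\\s+");
        return new Sample(Double.parseDouble(f[2]), Long.parseLong(f[5]), Long.parseLong(f[7]));
    }

    /**
     * Returns true when there are at least two samples, every sample has the same
     * YGC and FGC counts as the first one, and Eden never drops below edenThreshold.
     */
    static boolean looksStalled(List<String> rows, double edenThreshold) {
        if (rows.size() < 2) {
            return false;
        }
        Sample first = parse(rows.get(0));
        for (String row : rows) {
            Sample s = parse(row);
            if (s.youngGcs != first.youngGcs || s.fullGcs != first.fullGcs
                    || s.edenPct < edenThreshold) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The six identical rows from the jstat output above.
        String row = "0.00 100.00 95.06 24.08 30.46 3274 623.437 7 5.899 629.335 No GC G1 Evacuation Pause";
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            rows.add(row);
        }
        System.out.println("stalled=" + looksStalled(rows, 90.0));
    }
}
```

Run against the six rows above, main prints stalled=true. To check a live process, you would feed it the data rows of jstat -gccause pid 1000 100 and skip the header line.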