I have filed a JIRA issue for this problem: 
https://issues.apache.org/jira/browse/YARN-5449#



The OS kernel version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux

The Java version is jdk1.7.0_45

The Hadoop version is hadoop-2.2.0



The following is the description of this problem:


Some NodeManager processes hang and are marked as lost by the ResourceManager.
The NodeManager's log stops being written.
The CPU usage of the NodeManager process is very low (nearly 0%).
GC in the NodeManager JVM process also stops. The output of jstat 
(jstat -gccause pid 1000 100) is as follows:
    S0      S1      E      O      P   YGC     YGCT  FGC   FGCT      GCT  LGCC   GCC
  0.00  100.00  95.06  24.08  30.46  3274  623.437    7  5.899  629.335  No GC  G1 Evacuation Pause
  0.00  100.00  95.06  24.08  30.46  3274  623.437    7  5.899  629.335  No GC  G1 Evacuation Pause
  0.00  100.00  95.06  24.08  30.46  3274  623.437    7  5.899  629.335  No GC  G1 Evacuation Pause
  0.00  100.00  95.06  24.08  30.46  3274  623.437    7  5.899  629.335  No GC  G1 Evacuation Pause
  0.00  100.00  95.06  24.08  30.46  3274  623.437    7  5.899  629.335  No GC  G1 Evacuation Pause
  0.00  100.00  95.06  24.08  30.46  3274  623.437    7  5.899  629.335  No GC  G1 Evacuation Pause
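
For anyone who wants to check other nodes for the same symptom, here is a 
minimal Python sketch of how the jstat samples above could be tested 
automatically. It is only an illustration, not something taken from the JIRA 
issue: the names parse_gccause_row and gc_looks_stalled are hypothetical, the 
column positions assume the JDK 7 layout shown above (S0 S1 E O P YGC YGCT FGC 
FGCT GCT, then the two cause columns), and the rule "counters frozen while the 
GCC column does not say No GC" is my own heuristic for "a collection started 
and never finished".

```python
from typing import List, NamedTuple


class GcSample(NamedTuple):
    ygc: int      # number of young-generation GC events
    fgc: int      # number of full GC events
    gct: float    # total GC time in seconds
    causes: str   # LGCC and GCC text joined together, e.g. "No GC G1 Evacuation Pause"


def parse_gccause_row(row: str) -> GcSample:
    """Parse one data row of `jstat -gccause` (JDK 7 column layout assumed)."""
    fields = row.split()
    # The first ten fields are numeric; everything after them is cause text.
    return GcSample(
        ygc=int(fields[5]),
        fgc=int(fields[7]),
        gct=float(fields[9]),
        causes=" ".join(fields[10:]),
    )


def gc_looks_stalled(rows: List[str], min_samples: int = 3) -> bool:
    """Heuristic: GC counters never change while a GC is reported as in progress."""
    samples = [parse_gccause_row(r) for r in rows if r.strip()]
    if len(samples) < min_samples:
        return False
    first = samples[0]
    counters_frozen = all(
        (s.ygc, s.fgc, s.gct) == (first.ygc, first.fgc, first.gct) for s in samples
    )
    # When no collection is running, the last column (GCC) reads "No GC".
    gc_in_progress = all(not s.causes.endswith("No GC") for s in samples)
    return counters_frozen and gc_in_progress


if __name__ == "__main__":
    row = ("0.00 100.00 95.06 24.08 30.46 3274 623.437 7 5.899 629.335 "
           "No GC G1 Evacuation Pause")
    print(gc_looks_stalled([row] * 6))
```

The sampling interval matters: counters that stay frozen over a few hundred 
milliseconds can simply be a long pause, so the same check should hold over a 
much longer window before a node is treated as hung.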

The problem occurs in the NodeManager JVM process with either the CMS garbage 
collector or the G1 garbage collector.

The parameters for the CMS garbage collector are as follows:
-Xmx4096m -Xmn1024m -XX:PermSize=128m -XX:MaxPermSize=128m 
-XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 
-XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 
-XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70

The parameters for the G1 garbage collector are as follows:
-Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC 
-XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 
-XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4 
-XX:+PrintAdaptiveSizePolicy
