[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager
    [ https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827893#comment-16827893 ]

Shurong Mai edited comment on YARN-5449 at 4/29/19 3:37 AM:
------------------------------------------------------------

[~rohithsharma], thank you for your attention and advice. Before I created this issue, we had been making analysis it for a long time from jvm process thread stack, jvm process heap memory, different java version, os log, different os version, different os file system and so on. But we could not get the reason for sure. As a result of we analysed, we guessed the most probable reason of nodemanager process hung was that disk hanging when reading/writing disk, but we have not proved that yet.

was (Author: shurong.mai):
[~rohithsharma], thank you for your attention and advices. Before I created this issue, we had been making analysis it for a long time from jvm process thread stack, jvm process heap memory, different java version, os log, different os version, different os file system and so on. But we can't get the reason for sure. As a result of we analysed, the most probable reason is that nodemanager process is hung.

> nodemanager process is hung, and lost from resourcemanager
> ----------------------------------------------------------
>
>                 Key: YARN-5449
>                 URL: https://issues.apache.org/jira/browse/YARN-5449
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.2.0
>         Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux
> The java version is jdk1.7.0_45
> The hadoop version is hadoop-2.2.0
>            Reporter: Shurong Mai
>            Priority: Major
>
> The nodemanager process is hung (it is not dead) and is lost from the resourcemanager.
> The nodemanager's log has stopped printing.
> The CPU usage of the nodemanager process is very low (nearly 0%).
> GC of the nodemanager jvm process has stopped, and the output of jstat (jstat -gccause pid 1000 100) is as follows:
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC    GCC
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC   G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC   G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC   G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC   G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC   G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC   G1 Evacuation Pause
> The problem occurs in the nodemanager jvm process with either the CMS garbage collector or the g1 garbage collector.
> The parameters of the CMS garbage collector are as follows:
> -Xmx4096m -Xmn1024m -XX:PermSize=128m -XX:MaxPermSize=128m
> -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4
> -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8
> -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70
> The parameters of the g1 garbage collector are as follows:
> -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC
> -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30
> -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4
> -XX:+PrintAdaptiveSizePolicy

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
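The jstat samples quoted in the issue description are identical from row to row, which is what the reporter reads as "GC is stopped". A minimal Python sketch of that check follows. It is not part of Hadoop or the JDK; the names `GcSample`, `parse_row`, `looks_frozen` and `gc_time_consistent` are hypothetical. The sample row assumes that the fused value in the archived output splits into FGC = 7 and FGCT = 5.899, which is the split that makes YGCT + FGCT agree with GCT. Identical rows are only a hint: an idle but healthy JVM can also report unchanged counters.

```python
from dataclasses import dataclass

# Numeric columns of `jstat -gccause` on a JDK 7 JVM, in the order quoted in the issue.
NUMERIC_COLUMNS = ["S0", "S1", "E", "O", "P", "YGC", "YGCT", "FGC", "FGCT", "GCT"]


@dataclass(frozen=True)
class GcSample:
    counters: tuple  # the ten numeric columns, as floats
    causes: str      # trailing LGCC/GCC text, kept unparsed


def parse_row(line: str) -> GcSample:
    """Parse one data row (not the header) of jstat -gccause output."""
    fields = line.split()
    count = len(NUMERIC_COLUMNS)
    numbers = tuple(float(value) for value in fields[:count])
    return GcSample(numbers, " ".join(fields[count:]))


def looks_frozen(samples, min_samples=3):
    """True when there are at least min_samples rows and all of them are identical."""
    return len(samples) >= min_samples and len(set(samples)) == 1


def gc_time_consistent(sample, tolerance=0.002):
    """Check that YGCT + FGCT matches GCT up to jstat's rounding."""
    values = dict(zip(NUMERIC_COLUMNS, sample.counters))
    return abs(values["YGCT"] + values["FGCT"] - values["GCT"]) <= tolerance


if __name__ == "__main__":
    # Assumed reconstruction of the row quoted in the issue (FGC = 7, FGCT = 5.899).
    row = ("0.00 100.00 95.06 24.08 30.46 3274 623.437 7 5.899 629.335 "
           "No GC G1 Evacuation Pause")
    samples = [parse_row(row) for _ in range(6)]
    print("frozen:", looks_frozen(samples))
    print("GC time columns consistent:", gc_time_consistent(samples[0]))
```

In practice the rows would come from running `jstat -gccause <pid> 1000 100` against the nodemanager process and skipping the header line; the threshold of three identical rows is an arbitrary choice for the sketch.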
[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager
    [ https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827893#comment-16827893 ]

Shurong Mai edited comment on YARN-5449 at 4/29/19 2:59 AM:
------------------------------------------------------------

[~rohithsharma], thank you for your attention and advices. Before I created this issue, we had been making analysis it for a long time from jvm process thread stack, jvm process heap memory, different java version, os log, different os version, different os file system and so on. But we can't get the reason for sure. As a result of we analysed, the most probable reason is that nodemanager process is hung.

was (Author: shurong.mai):
[~rohithsharma], thank you for your attention and advices. Before I created this issue, we had been making analysis it for a long time from jvm process thread stack, jvm process heap memory, different java version, os log, different os version, different os file system and so on. But we can't get the reason for sure. As a result of we analysed, the most probable reason is that nodemanager process is hung.
[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager
    [ https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401469#comment-15401469 ]

mai shurong edited comment on YARN-5449 at 8/4/16 2:57 AM:
-----------------------------------------------------------

Sorry, I had added my description of this issue when I created it, but was not submitted to jira by some problems. I would add description as soon as possible.

was (Author: shurong.mai):
Sorry, I had added my description of this issue when I created it, bu was not submitted to jira by some problems. I would add description as soon as possible.
[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager
    [ https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401469#comment-15401469 ]

mai shurong edited comment on YARN-5449 at 8/2/16 9:40 AM:
-----------------------------------------------------------

Sorry, I had added my description of this issue when I created it, bu was not submitted to jira by some problems. I would add description as soon as possible.

was (Author: shurong.mai):
Sorry, I had added my description of this issue when I created this jira, bu was not submitted to jira by some problems. I would add description as soon as possible.
[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager
    [ https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401469#comment-15401469 ]

mai shurong edited comment on YARN-5449 at 8/2/16 9:38 AM:
-----------------------------------------------------------

Sorry, I had added my description of this issue when I created this jira, bu was not submitted to jira by some problems. I would add description as soon as possible.

was (Author: shurong.mai):
Sorry, I had added my description of this issue when I created, bu was not submitted to jira by some problems. I would add description as soon as possible.
[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager
    [ https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401469#comment-15401469 ]

mai shurong edited comment on YARN-5449 at 8/2/16 9:38 AM:
-----------------------------------------------------------

Sorry, I had added my description of this issue when I created, bu was not submitted to jira by some problems. I would add description as soon as possible.

was (Author: shurong.mai):
Sorry, I had added my description, bu was not submitted to jira by some problems. I would add description as soon as possible.