[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager

2019-04-28 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827893#comment-16827893
 ] 

Shurong Mai edited comment on YARN-5449 at 4/29/19 3:37 AM:


[~rohithsharma], thank you for your attention and advice. Before I created 
this issue, we had spent a long time analysing the problem: JVM thread stacks, 
JVM heap memory, different Java versions, OS logs, different OS versions, 
different OS file systems, and so on. But we could not determine the cause for 
certain. Based on our analysis, we guessed that the most probable cause of the 
nodemanager process hang was a disk hanging during reads or writes, but we have 
not proved that yet.
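
One way to test the disk-hang hypothesis above is to check whether any thread of the hung process sits in uninterruptible sleep (state `D` in `/proc/<pid>/task/<tid>/stat` on Linux), which is where a thread blocked on disk I/O usually shows up. The following is a minimal sketch under that assumption, not something taken from this issue; the function names `parse_thread_state` and `threads_in_state` are hypothetical.

```python
import os
import sys


def parse_thread_state(stat_line):
    """Return (comm, state) from the contents of a /proc/<pid>/task/<tid>/stat file.

    The command name is wrapped in parentheses and may itself contain spaces
    or parentheses, so split on the LAST ')' instead of on whitespace.
    """
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return comm, state


def threads_in_state(pid, wanted="D"):
    """List (tid, comm) for every thread of `pid` whose state equals `wanted`."""
    task_dir = "/proc/%d/task" % pid
    hits = []
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                comm, state = parse_thread_state(f.read())
        except OSError:
            continue  # thread exited between listdir() and open()
        if state == wanted:
            hits.append((int(tid), comm))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (Linux only): python check_dstate.py <nodemanager pid>
    for tid, comm in threads_in_state(int(sys.argv[1])):
        print("tid %d (%s) is in uninterruptible sleep" % (tid, comm))
```

A thread that stays in state `D` across repeated runs would support the disk-hang guess; a single observation does not, because short `D` states are normal during ordinary I/O.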


was (Author: shurong.mai):
[~rohithsharma] , thank you for your attention and advices . Before I created 
this issue, we had been making analysis it for a long time from  jvm process 
thread stack, jvm process  heap memory, different java version, os log, 
different os version,  different os file system and so on. But we can't get the 
reason for sure. As a result of we analysed, the most  probable reason is that 
nodemanager process is hung.

> nodemanager process is hung, and lost from resourcemanager
> --
>
> Key: YARN-5449
> URL: https://issues.apache.org/jira/browse/YARN-5449
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.2.0
> Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux
> The java version is jdk1.7.0_45
> The hadoop version is hadoop-2.2.0
> Reporter: Shurong Mai
> Priority: Major
>
> The nodemanager process is hung (not dead) and is lost from the resourcemanager.
> The nodemanager's log has stopped printing.
> The CPU usage of the nodemanager process is very low (nearly 0%).
> GC of the nodemanager JVM process has stopped; the output of jstat (jstat 
> -gccause pid 1000 100) is as follows:
>     S0     S1      E      O      P    YGC     YGCT  FGC     FGCT      GCT LGCC   GCC
>   0.00 100.00  95.06  24.08  30.46   3274  623.437    7    5.899  629.335 No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437    7    5.899  629.335 No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437    7    5.899  629.335 No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437    7    5.899  629.335 No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437    7    5.899  629.335 No GC  G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437    7    5.899  629.335 No GC  G1 Evacuation Pause
> The nodemanager JVM process also hits this problem with either the CMS garbage 
> collector or the G1 garbage collector.
> The parameters of the CMS garbage collector are as follows:
> -Xmx4096m -Xmn1024m -XX:PermSize=128m -XX:MaxPermSize=128m 
> -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 
> -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 
> -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70 
> The parameters of the G1 garbage collector are as follows:
> -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC 
> -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 
> -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4 
> -XX:+PrintAdaptiveSizePolicy
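
The frozen jstat rows in the description above can be checked mechanically. In `jstat -gccause pid 1000 100` the trailing arguments are the sampling interval in milliseconds and the sample count, so each row is one second apart; if YGC, FGC, and GCT are identical in every row, no collection completed during the sampling window. Below is a minimal sketch, assuming the twelve-column JDK 7 layout (S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC) and assuming that the fused `75.899` in the archived output stands for FGC = 7 and FGCT = 5.899 (which is consistent with YGCT + FGCT ≈ GCT). The function names are hypothetical.

```python
NUMERIC_COLUMNS = ["S0", "S1", "E", "O", "P", "YGC", "YGCT", "FGC", "FGCT", "GCT"]


def parse_gccause_row(line):
    """Parse one data row of `jstat -gccause` output into a dict.

    The first ten fields are numeric; the remainder is free text holding the
    last and current GC causes, kept unsplit because causes contain spaces.
    """
    fields = line.split()
    row = {name: float(value) for name, value in zip(NUMERIC_COLUMNS, fields)}
    row["causes"] = " ".join(fields[len(NUMERIC_COLUMNS):])
    return row


def gc_is_stalled(rows):
    """True if the GC counters never change across two or more samples."""
    if len(rows) < 2:
        return False  # a single sample cannot show lack of progress
    first = rows[0]
    return all(
        row[key] == first[key] for row in rows[1:] for key in ("YGC", "FGC", "GCT")
    )


# One row as reported in the issue, with the assumed FGC/FGCT split.
sample = (
    "0.00 100.00 95.06 24.08 30.46 3274 623.437 7 5.899 629.335 "
    "No GC G1 Evacuation Pause"
)
rows = [parse_gccause_row(sample) for _ in range(6)]
print(gc_is_stalled(rows))
```

Unchanged counters alone do not prove a hang, since an idle JVM also performs no collections. The signal is more suggestive here because eden is reported at about 95% full while nothing moves.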



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager

2019-04-28 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827893#comment-16827893
 ] 

Shurong Mai edited comment on YARN-5449 at 4/29/19 2:59 AM:


[~rohithsharma], thank you for your attention and advice. Before I created 
this issue, we had spent a long time analysing the problem: JVM thread stacks, 
JVM heap memory, different Java versions, OS logs, different OS versions, 
different OS file systems, and so on. But we cannot determine the cause for 
certain. Based on our analysis, the most probable explanation is that the 
nodemanager process is hung.


was (Author: shurong.mai):
[~rohithsharma] , thank you for your attention and advices . Before I created 
this issue, we had been making analysis it for a long time from  jvm process 
thread stack, jvm process  heap memory, different java version, os log, 
different os version,  different os file system and so on. But we can't get the 
reason for sure. As a result of we analysed, the most  probable reason is that 
nodemanager process is hung.




[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager

2016-08-03 Thread mai shurong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401469#comment-15401469
 ] 

mai shurong edited comment on YARN-5449 at 8/4/16 2:57 AM:
---

Sorry, I added my description of this issue when I created it, but it was not 
submitted to JIRA because of some problems. I will add the description as soon 
as possible.


was (Author: shurong.mai):
Sorry, I had added my description of this issue when I created it, bu was not 
submitted to jira by some problems. I would add description as soon as possible.




[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager

2016-08-02 Thread mai shurong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401469#comment-15401469
 ] 

mai shurong edited comment on YARN-5449 at 8/2/16 9:40 AM:
---

Sorry, I added my description of this issue when I created it, but it was not 
submitted to JIRA because of some problems. I will add the description as soon 
as possible.


was (Author: shurong.mai):
Sorry, I had added my description of this issue when I created this jira, bu 
was not submitted to jira by some problems. I would add description as soon as 
possible.




[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager

2016-08-02 Thread mai shurong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401469#comment-15401469
 ] 

mai shurong edited comment on YARN-5449 at 8/2/16 9:38 AM:
---

Sorry, I added my description of this issue when I created this JIRA, but it 
was not submitted to JIRA because of some problems. I will add the description 
as soon as possible.


was (Author: shurong.mai):
Sorry, I had added my description of this issue when I created, bu was not 
submitted to jira by some problems. I would add description as soon as possible.




[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager

2016-08-02 Thread mai shurong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401469#comment-15401469
 ] 

mai shurong edited comment on YARN-5449 at 8/2/16 9:38 AM:
---

Sorry, I added my description of this issue when I created it, but it was not 
submitted to JIRA because of some problems. I will add the description as soon 
as possible.


was (Author: shurong.mai):
Sorry, I had added my description, bu was not submitted to jira by some 
problems. I would add description as soon as possible.
