[jira] [Updated] (YARN-5403) yarn top command does not execute correct

2016-07-19 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gu-chi updated YARN-5403: - Attachment: YARN-5403.patch > yarn top command does not execute correct >

[jira] [Updated] (YARN-5403) yarn top command does not execute correct

2016-07-19 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gu-chi updated YARN-5403: - Attachment: (was: YARN-5403.patch) > yarn top command does not execute correct >

[jira] [Updated] (YARN-5403) yarn top command does not execute correct

2016-07-19 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gu-chi updated YARN-5403: - Attachment: YARN-5403.patch > yarn top command does not execute correct >

[jira] [Created] (YARN-5403) yarn top command does not execute correct

2016-07-19 Thread gu-chi (JIRA)
gu-chi created YARN-5403: Summary: yarn top command does not execute correct Key: YARN-5403 URL: https://issues.apache.org/jira/browse/YARN-5403 Project: Hadoop YARN Issue Type: Bug

[jira] [Resolved] (YARN-3678) DelayedProcessKiller may kill other process other than container

2016-05-10 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gu-chi resolved YARN-3678. -- Resolution: Duplicate > DelayedProcessKiller may kill other process other than container >

[jira] [Created] (YARN-4536) DelayedProcessKiller may not work under heavy workload

2016-01-04 Thread gu-chi (JIRA)
gu-chi created YARN-4536: Summary: DelayedProcessKiller may not work under heavy workload Key: YARN-4536 URL: https://issues.apache.org/jira/browse/YARN-4536 Project: Hadoop YARN Issue Type: Bug

[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container

2016-01-04 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082201#comment-15082201 ] gu-chi commented on YARN-3678: -- same issue as confirmed with [~hex108] > DelayedProcessKiller may kill other

[jira] [Resolved] (YARN-4536) DelayedProcessKiller may not work under heavy workload

2016-01-04 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gu-chi resolved YARN-4536. -- Resolution: Not A Problem As analyzed further, this is introduced by some custom modification, sorry if bother.

[jira] [Commented] (YARN-4536) DelayedProcessKiller may not work under heavy workload

2016-01-04 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081159#comment-15081159 ] gu-chi commented on YARN-4536: -- Thanks for reply, the process group I not realize, this seems introduced by

[jira] [Created] (YARN-4481) negative pending resource of queues lead to applications in accepted status inifnitly

2015-12-18 Thread gu-chi (JIRA)
gu-chi created YARN-4481: Summary: negative pending resource of queues lead to applications in accepted status inifnitly Key: YARN-4481 URL: https://issues.apache.org/jira/browse/YARN-4481 Project: Hadoop

[jira] [Updated] (YARN-4481) negative pending resource of queues lead to applications in accepted status inifnitly

2015-12-18 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gu-chi updated YARN-4481: - Attachment: jmx.txt > negative pending resource of queues lead to applications in accepted status > inifnitly >

[jira] [Commented] (YARN-4481) negative pending resource of queues lead to applications in accepted status inifnitly

2015-12-18 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065175#comment-15065175 ] gu-chi commented on YARN-4481: -- Same using DRC. :( Debug Log was only enabled after I saw the issue, so before

[jira] [Commented] (YARN-4481) negative pending resource of queues lead to applications in accepted status inifnitly

2015-12-18 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065181#comment-15065181 ] gu-chi commented on YARN-4481: -- I added some extra log to trace, do you have any idea how can probably

[jira] [Commented] (YARN-4427) NPE on handleNMContainerStatus when NM is registering to RM

2015-12-07 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044871#comment-15044871 ] gu-chi commented on YARN-4427: -- NM recovery is enabled, this is the precondition > NPE on

[jira] [Commented] (YARN-3730) scheduler reserve more resource than required

2015-05-31 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566919#comment-14566919 ] gu-chi commented on YARN-3730: -- Thx Naga, as improvements r not merged to my current using

[jira] [Updated] (YARN-3678) DelayedProcessKiller may kill other process other than container

2015-05-27 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gu-chi updated YARN-3678: - Attachment: YARN-3678.patch DelayedProcessKiller may kill other process other than container

[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container

2015-05-27 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562180#comment-14562180 ] gu-chi commented on YARN-3678: -- I made this https://github.com/apache/hadoop/pull/20/

[jira] [Created] (YARN-3730) scheduler reserve more resource than required

2015-05-27 Thread gu-chi (JIRA)
gu-chi created YARN-3730: Summary: scheduler reserve more resource than required Key: YARN-3730 URL: https://issues.apache.org/jira/browse/YARN-3730 Project: Hadoop YARN Issue Type: Bug

[jira] [Updated] (YARN-3678) DelayedProcessKiller may kill other process other than container

2015-05-27 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gu-chi updated YARN-3678: - Attachment: (was: YARN-3678.patch) DelayedProcessKiller may kill other process other than container

[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container

2015-05-19 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551681#comment-14551681 ] gu-chi commented on YARN-3678: -- I see the possibility is low, but with heavy task load, it

[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container

2015-05-19 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551756#comment-14551756 ] gu-chi commented on YARN-3678: -- The PID number may be not use as a process, also can be a

[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container

2015-05-19 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550390#comment-14550390 ] gu-chi commented on YARN-3678: -- I think if decrease the max_pid setting in OS can enlarge the

[jira] [Created] (YARN-3678) DelayedProcessKiller may kill other process other than container

2015-05-19 Thread gu-chi (JIRA)
gu-chi created YARN-3678: Summary: DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue

[jira] [Commented] (YARN-1922) Process group remains alive after container process is killed externally

2015-05-18 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547997#comment-14547997 ] gu-chi commented on YARN-1922: -- Hi, I see you comment here to check in YARN-1922.5.patch, but

[jira] [Commented] (YARN-3536) ZK exception occur when updating AppAttempt status, then NPE thrown when RM do recover

2015-04-23 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510320#comment-14510320 ] gu-chi commented on YARN-3536: -- Thx, as the exception trace stack is almost, I once looked

[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2015-04-23 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508843#comment-14508843 ] gu-chi commented on YARN-2308: -- Thx, I saw this and think not a same issue. YARN-2340 is

[jira] [Commented] (YARN-3536) ZK exception occur when updating AppAttempt status, then NPE thrown when RM do recover

2015-04-23 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508852#comment-14508852 ] gu-chi commented on YARN-3536: -- 2015-04-21 03:52:31,395 | INFO | AsyncDispatcher event

[jira] [Created] (YARN-3536) ZK exception occur when updating AppAttempt status, then NPE thrown when RM do recover

2015-04-23 Thread gu-chi (JIRA)
gu-chi created YARN-3536: Summary: ZK exception occur when updating AppAttempt status, then NPE thrown when RM do recover Key: YARN-3536 URL: https://issues.apache.org/jira/browse/YARN-3536 Project: Hadoop

[jira] [Commented] (YARN-3536) ZK exception occur when updating AppAttempt status, then NPE thrown when RM do recover

2015-04-23 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508855#comment-14508855 ] gu-chi commented on YARN-3536: -- 2015-04-21 04:22:33,923 | INFO | main-EventThread |

[jira] [Updated] (YARN-3536) ZK exception occur when updating AppAttempt status, then NPE thrown when RM do recover

2015-04-23 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gu-chi updated YARN-3536: - Description: Here is a scenario that Application status is FAILED/FINISHED but AppAttempt status is null, this

[jira] [Updated] (YARN-3536) ZK exception occur when updating AppAttempt status, then NPE thrown when RM do recover

2015-04-23 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gu-chi updated YARN-3536: - Description: Here is a scenario that Application status is FAILED/FINISHED but AppAttempt status is null, this

[jira] [Commented] (YARN-3536) ZK exception occur when updating AppAttempt status, then NPE thrown when RM do recover

2015-04-23 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508900#comment-14508900 ] gu-chi commented on YARN-3536: -- Please assign this to me for fixing ZK exception occur when

[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2015-04-22 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506991#comment-14506991 ] gu-chi commented on YARN-2308: -- Hi, Chang Li, as I went through the patches that you attached,

[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2015-04-22 Thread gu-chi (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506992#comment-14506992 ] gu-chi commented on YARN-2308: -- Hi, Chang Li, as I went through the patches that you attached,