[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters
[ https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940746#comment-13940746 ] Hadoop QA commented on YARN-1849: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12635573/yarn-1849-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3397//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3397//console This message is automatically generated. NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters Key: YARN-1849 URL: https://issues.apache.org/jira/browse/YARN-1849 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-1849-1.patch While running an UnmanagedAM on secure cluster, ran into an NPE on failover/restart. This is similar to YARN-1821. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters
[ https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940640#comment-13940640 ] Karthik Kambatla commented on YARN-1849: This time around, it turns out the master container is null:
{code}
if (rmAppAttempt != null) {
  if (rmAppAttempt.getMasterContainer().getId()
      .equals(containerStatus.getContainerId())
      && containerStatus.getState() == ContainerState.COMPLETE)
{code}
It looks like an UnmanagedAM does not necessarily have a master container.
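To illustrate the failure mode described in the comment above, here is a minimal, self-contained sketch of a null-tolerant version of that condition. The types below (`Container`, `ContainerStatus`, `AppAttempt`) are simplified stand-ins invented for this example, not the real Hadoop classes, and the helper name `isCompletedMasterContainer` is hypothetical; this is not the content of any attached patch.

```java
// Simplified stand-ins for the YARN types named in the snippet above.
// These are NOT the real Hadoop classes; they exist only for illustration.
final class MasterContainerSketch {
    enum ContainerState { NEW, RUNNING, COMPLETE }

    static final class Container {
        private final String id;
        Container(String id) { this.id = id; }
        String getId() { return id; }
    }

    static final class ContainerStatus {
        private final String containerId;
        private final ContainerState state;
        ContainerStatus(String containerId, ContainerState state) {
            this.containerId = containerId;
            this.state = state;
        }
        String getContainerId() { return containerId; }
        ContainerState getState() { return state; }
    }

    static final class AppAttempt {
        // Assumed to be null when the attempt belongs to an unmanaged AM.
        private final Container masterContainer;
        AppAttempt(Container masterContainer) { this.masterContainer = masterContainer; }
        Container getMasterContainer() { return masterContainer; }
    }

    /**
     * Hypothetical helper: true only when the reported container is the
     * attempt's master container and it has completed. A missing attempt or
     * a missing master container yields false instead of an NPE.
     */
    static boolean isCompletedMasterContainer(AppAttempt attempt, ContainerStatus status) {
        if (attempt == null) {
            return false;
        }
        Container master = attempt.getMasterContainer();
        if (master == null) {
            return false; // nothing to compare against
        }
        return master.getId().equals(status.getContainerId())
            && status.getState() == ContainerState.COMPLETE;
    }

    public static void main(String[] args) {
        ContainerStatus done = new ContainerStatus("container_1", ContainerState.COMPLETE);
        System.out.println(isCompletedMasterContainer(new AppAttempt(null), done));
        System.out.println(isCompletedMasterContainer(new AppAttempt(new Container("container_1")), done));
    }
}
```

Chaining `getMasterContainer().getId()` directly, as in the quoted snippet, dereferences the null result and throws; reading the master container into a local first makes the null case explicit.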
[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters
[ https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940936#comment-13940936 ] Hadoop QA commented on YARN-1849: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12635622/yarn-1849-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1492 javac compiler warnings (more than the trunk's current 1491 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3398//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3398//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3398//console This message is automatically generated. 
[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters
[ https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941008#comment-13941008 ] Hadoop QA commented on YARN-1849: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12635635/yarn-1849-3.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3400//console This message is automatically generated.
[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters
[ https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941037#comment-13941037 ] Jian He commented on YARN-1849: --- Hi, I'd like to take a look at the patch. Can you wait a bit? I'll do it today.
[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters
[ https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941007#comment-13941007 ] Alejandro Abdelnur commented on YARN-1849: -- +1 pending Jenkins.
[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters
[ https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941004#comment-13941004 ] Karthik Kambatla commented on YARN-1849: Tested the newest patch on a secure cluster with UAM and RM HA. Failover works fine.
[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters
[ https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941062#comment-13941062 ] Hadoop QA commented on YARN-1849: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12635635/yarn-1849-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3399//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3399//console This message is automatically generated.
[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters
[ https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941275#comment-13941275 ] Jian He commented on YARN-1849: --- Those null checks should only be valid for a UAM. For a normal AM these nulls should not happen; if they do, it’s a bug. Instead of null checks that may hide a bug, I suggest checking whether the AM is a UAM and, if it is, not sending the container-finished events.
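The suggestion above can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the actual patch: `AppAttempt`, its `unmanagedAM` flag, and `shouldSendContainerFinished` are hypothetical names invented here, and the choice to throw `IllegalStateException` for a managed AM without a master container is just one way to keep that case from being silently hidden.

```java
// Standalone illustration; AppAttempt and its fields are hypothetical
// stand-ins, not the real ResourceManager classes.
final class UnmanagedAmCheckSketch {
    enum ContainerState { RUNNING, COMPLETE }

    static final class AppAttempt {
        final boolean unmanagedAM;      // assumed to be known for every attempt
        final String masterContainerId; // null when there is no master container
        AppAttempt(boolean unmanagedAM, String masterContainerId) {
            this.unmanagedAM = unmanagedAM;
            this.masterContainerId = masterContainerId;
        }
    }

    /**
     * Decides whether a container-finished event should be sent for the
     * reported container. Unmanaged AMs are skipped explicitly; a managed AM
     * without a master container is treated as a bug and surfaces loudly.
     */
    static boolean shouldSendContainerFinished(AppAttempt attempt,
                                               String reportedContainerId,
                                               ContainerState state) {
        if (attempt == null || attempt.unmanagedAM) {
            return false; // UAM: no master container is expected, send nothing
        }
        if (attempt.masterContainerId == null) {
            throw new IllegalStateException(
                "Managed AM attempt has no master container");
        }
        return attempt.masterContainerId.equals(reportedContainerId)
            && state == ContainerState.COMPLETE;
    }

    public static void main(String[] args) {
        AppAttempt uam = new AppAttempt(true, null);
        AppAttempt managed = new AppAttempt(false, "container_1");
        System.out.println(shouldSendContainerFinished(uam, "container_1", ContainerState.COMPLETE));
        System.out.println(shouldSendContainerFinished(managed, "container_1", ContainerState.COMPLETE));
    }
}
```

The point of branching on the unmanaged flag rather than on null is that the expected case (UAM) and the unexpected case (managed AM with no master container) stay distinguishable.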
[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters
[ https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941283#comment-13941283 ] Karthik Kambatla commented on YARN-1849: Thanks [~jianhe]. I agree with you partially; in fact, I was thinking of doing that initially. However, if we do run into these nulls for managed AMs, not handling them leads to the NM going down. Logging the errors lets us know that something is wrong without taking the nodes down.
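A rough sketch of the log-and-continue alternative described above, for comparison. Everything here is an assumption made for illustration: the `Outcome` enum, the `handle` method and its parameters are hypothetical, and `java.util.logging` is used only to keep the example dependency-free; it does not reflect how the ResourceManager actually logs.

```java
import java.util.logging.Logger;

// Hypothetical, dependency-free illustration of "log the error, keep the node up".
final class LogAndContinueSketch {
    private static final Logger LOG =
        Logger.getLogger(LogAndContinueSketch.class.getName());

    enum Outcome { SEND_FINISHED_EVENT, IGNORE, LOGGED_ERROR }

    /**
     * Handles one reported container status. A missing master container is
     * expected for an unmanaged AM and ignored; for a managed AM it is logged
     * as an error, but no exception escapes to the caller.
     */
    static Outcome handle(boolean unmanagedAM, String masterContainerId,
                          String reportedContainerId, boolean completed) {
        if (masterContainerId == null) {
            if (unmanagedAM) {
                return Outcome.IGNORE;
            }
            LOG.severe("Managed AM attempt has no master container; "
                + "skipping report for " + reportedContainerId);
            return Outcome.LOGGED_ERROR;
        }
        if (completed && masterContainerId.equals(reportedContainerId)) {
            return Outcome.SEND_FINISHED_EVENT;
        }
        return Outcome.IGNORE;
    }

    public static void main(String[] args) {
        System.out.println(handle(false, null, "container_1", true));
        System.out.println(handle(false, "container_1", "container_1", true));
    }
}
```

The trade-off is the one debated in this thread: the error is visible in the logs and node registration continues, but the underlying bug no longer stops anything.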
[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters
[ https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941310#comment-13941310 ] Vinod Kumar Vavilapalli commented on YARN-1849: --- Haven't looked at the patch, but in general there is a constant tussle between keeping things up and failing fast so that bugs can be found and fixed. I would generally avoid null checks unless I am sure: failing the RM/NM at least uncovers the bug, instead of limping along with it and then breaking somewhere else, at which point it becomes hard to root-cause. If possible, let's fix what is actually broken here instead of putting in a lot of null checks (if that is what the above comments are talking about). Sure, we may run into one more issue that we haven't foreseen, but we can at least take comfort in knowing that we are addressing the right corner cases.