[jira] [Commented] (YARN-1815) RM should recover only Managed AMs
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941073#comment-13941073 ]

Karthik Kambatla commented on YARN-1815:

Even if the UAM finishes successfully, there is no way for the RM to know, at least not until YARN-556. Today, the RM tries to recover the app but cannot recover the UAM, so the corresponding RMApp transitions to FAILED after a while. This JIRA only avoids those recovery attempts and marks the app as FAILED early.

RM should recover only Managed AMs
----------------------------------

Key: YARN-1815
URL: https://issues.apache.org/jira/browse/YARN-1815
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, yarn-1815-2.patch, yarn-1815-2.patch

RM should not recover unmanaged AMs until YARN-1823 is fixed.

--
This message was sent by Atlassian JIRA (v6.2#6252)
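The behaviour Karthik describes in the comment above (skip the recovery attempt for an unmanaged AM and mark the app FAILED right away) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: `SketchAppState`, `StoredApp`, `RecoveredApp`, and `UnmanagedRecoverySketch` are hypothetical names invented for this example and do not correspond to real YARN classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the real YARN classes.
enum SketchAppState { ACCEPTED, FAILED }

final class StoredApp {
    final String appId;
    final boolean unmanagedAM; // true if the app was submitted with an unmanaged AM

    StoredApp(String appId, boolean unmanagedAM) {
        this.appId = appId;
        this.unmanagedAM = unmanagedAM;
    }
}

final class RecoveredApp {
    final String appId;
    final SketchAppState state;
    final String diagnostics;

    RecoveredApp(String appId, SketchAppState state, String diagnostics) {
        this.appId = appId;
        this.state = state;
        this.diagnostics = diagnostics;
    }
}

final class UnmanagedRecoverySketch {
    // Decide the post-restart state of a single stored application.
    static RecoveredApp recover(StoredApp stored) {
        if (stored.unmanagedAM) {
            // Assumption: fail early instead of attempting a recovery that cannot succeed.
            return new RecoveredApp(stored.appId, SketchAppState.FAILED,
                    "Unmanaged AM cannot be recovered after RM restart");
        }
        // Managed AMs continue on the normal recovery path (heavily simplified here).
        return new RecoveredApp(stored.appId, SketchAppState.ACCEPTED, "");
    }

    static List<RecoveredApp> recoverAll(List<StoredApp> storedApps) {
        List<RecoveredApp> recovered = new ArrayList<>();
        for (StoredApp stored : storedApps) {
            recovered.add(recover(stored));
        }
        return recovered;
    }

    public static void main(String[] args) {
        List<StoredApp> stored = new ArrayList<>();
        stored.add(new StoredApp("app_managed", false));
        stored.add(new StoredApp("app_unmanaged", true));
        for (RecoveredApp app : recoverAll(stored)) {
            System.out.println(app.appId + " -> " + app.state + " " + app.diagnostics);
        }
    }
}
```

The sketch only models the decision; it says nothing about how the real RM stores or replays application state.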
[jira] [Commented] (YARN-1815) RM should recover only Managed AMs
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933580#comment-13933580 ]

Karthik Kambatla commented on YARN-1815:

The tests pass locally. Filed YARN-1830 for the TestRMRestart failure; YARN-1591 covers the TestResourceTrackerService failure.

[~vinodkv] - mind taking a look at the updated patch?
[jira] [Commented] (YARN-1815) RM should recover only Managed AMs
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933639#comment-13933639 ]

Jian He commented on YARN-1815:

Thanks Karthik for the patch. For now, it should be fine to move the UAM to the FAILED state, as the UAM is not saving its final state and RM restart doesn't support UAMs. The core change looks good.

Test case: we need a more thorough test that verifies the UAM is moved to the FAILED state after the RM restarts, using two MockRMs like the ones in TestRMRestart.

The bigger problem is that if the unmanaged application is not added back to completedApps in RMAppManager after RM restart via the FinalTransition, it will never be removed from the state store. We remove applications from the state store when completedApps in RMAppManager goes beyond the max-app-limit.
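The cleanup concern in the comment above can be illustrated with a small model. This is a hedged sketch, not the real RMAppManager: `CompletedAppsSketch` and all of its methods are hypothetical names, and the only behaviour taken from the thread is that apps leave the state store when the completed-apps list grows past a limit. An app that never reaches that list is therefore never evicted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the bookkeeping described above; NOT the real RMAppManager.
final class CompletedAppsSketch {
    private final int maxCompletedApps;                       // stands in for the max-app-limit
    private final Deque<String> completedApps = new ArrayDeque<>();
    private final Set<String> stateStore = new HashSet<>();

    CompletedAppsSketch(int maxCompletedApps) {
        this.maxCompletedApps = maxCompletedApps;
    }

    // An app is written to the state store when it is submitted or recovered.
    void storeApp(String appId) {
        stateStore.add(appId);
    }

    // Called when an app reaches a final state; the oldest completed apps are
    // evicted from the state store once the list exceeds the limit.
    void finishApp(String appId) {
        completedApps.addLast(appId);
        while (completedApps.size() > maxCompletedApps) {
            String evicted = completedApps.removeFirst();
            stateStore.remove(evicted);
        }
    }

    boolean isInStateStore(String appId) {
        return stateStore.contains(appId);
    }

    public static void main(String[] args) {
        CompletedAppsSketch manager = new CompletedAppsSketch(1);
        manager.storeApp("app_a");
        manager.storeApp("app_b");
        manager.storeApp("app_unmanaged");

        // Only the managed apps are reported as completed in this scenario.
        manager.finishApp("app_a");
        manager.finishApp("app_b");

        System.out.println("app_a stored: " + manager.isInStateStore("app_a"));
        System.out.println("app_unmanaged stored: " + manager.isInStateStore("app_unmanaged"));
    }
}
```

In this model the unmanaged app stays in the store no matter how many other apps finish, which is why the comment asks for it to be added back to completedApps when it is moved to FAILED.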
[jira] [Commented] (YARN-1815) RM should recover only Managed AMs
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933938#comment-13933938 ]

Jian He commented on YARN-1815:

bq. it should be fine to move UAM to FAILED state as UAM is not saving the final state

On second thought, if the UAM has just finished successfully, will it also be moved to the FAILED state after RM restart? This doesn't seem right.
[jira] [Commented] (YARN-1815) RM should recover only Managed AMs
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932232#comment-13932232 ]

Karthik Kambatla commented on YARN-1815:

bq. If so, we need to make sure the App state moves to FAILED for apps with unmanaged AMs after RM restart.

Good catch, Vinod. Looking into this.

RM should recover only Managed AMs
----------------------------------

Key: YARN-1815
URL: https://issues.apache.org/jira/browse/YARN-1815
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
Attachments: yarn-1815-1.patch

RM should not recover unmanaged AMs until YARN-1823 is fixed.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1815) RM should recover only Managed AMs
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932419#comment-13932419 ]

Hadoop QA commented on YARN-1815:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12634246/yarn-1815-2.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3335//console

This message is automatically generated.

RM should recover only Managed AMs
----------------------------------

Key: YARN-1815
URL: https://issues.apache.org/jira/browse/YARN-1815
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
Attachments: yarn-1815-1.patch, yarn-1815-2.patch

RM should not recover unmanaged AMs until YARN-1823 is fixed.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1815) RM should recover only Managed AMs
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932630#comment-13932630 ]

Hadoop QA commented on YARN-1815:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12634273/yarn-1815-2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

    org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
    org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3338//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3338//console

This message is automatically generated.
[jira] [Commented] (YARN-1815) RM should recover only Managed AMs
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931015#comment-13931015 ]

Sandy Ryza commented on YARN-1815:

+1

RM should recover only Managed AMs
----------------------------------

Key: YARN-1815
URL: https://issues.apache.org/jira/browse/YARN-1815
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
Attachments: yarn-1815-1.patch

RM should not recover unmanaged AMs until YARN-1823 is fixed.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1815) RM should recover only Managed AMs
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931020#comment-13931020 ]

Jian He commented on YARN-1815:

looks good, +1
[jira] [Commented] (YARN-1815) RM should recover only Managed AMs
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931064#comment-13931064 ]

Hadoop QA commented on YARN-1815:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12634014/yarn-1815-1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3320//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3320//console

This message is automatically generated.
[jira] [Commented] (YARN-1815) RM should recover only Managed AMs
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931065#comment-13931065 ]

Vinod Kumar Vavilapalli commented on YARN-1815:

Wait, that doesn't make sense. Do we store unmanaged AMs? If so, we need to make sure the App state moves to FAILED for apps with unmanaged AMs after RM restart.