[jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.
[ https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930224#comment-13930224 ]

Hudson commented on YARN-1764:

SUCCESS: Integrated in Hadoop-Yarn-trunk #506 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/506/])
YARN-1764. Modified YarnClient to correctly handle failover of ResourceManager after the submitApplication call goes through. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576160)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestSubmitApplicationWithRMHA.java

Handle RM fail overs after the submitApplication call.
Key: YARN-1764
URL: https://issues.apache.org/jira/browse/YARN-1764
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Fix For: 2.4.0
Attachments: YARN-1764.1.patch, YARN-1764.2.patch, YARN-1764.3.patch
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.
[ https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930347#comment-13930347 ]

Hudson commented on YARN-1764:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1698 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1698/])
YARN-1764. Modified YarnClient to correctly handle failover of ResourceManager after the submitApplication call goes through. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576160)
[jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.
[ https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930418#comment-13930418 ]

Hudson commented on YARN-1764:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1723 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1723/])
YARN-1764. Modified YarnClient to correctly handle failover of ResourceManager after the submitApplication call goes through. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576160)
[jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.
[ https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13929790#comment-13929790 ]

Hadoop QA commented on YARN-1764:

+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12633802/YARN-1764.3.patch against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3314//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3314//console

This message is automatically generated.
[jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.
[ https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13929795#comment-13929795 ]

Hadoop QA commented on YARN-1764:

+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12633802/YARN-1764.3.patch against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3315//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3315//console

This message is automatically generated.
[jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.
[ https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13929855#comment-13929855 ]

Hudson commented on YARN-1764:

SUCCESS: Integrated in Hadoop-trunk-Commit #5302 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5302/])
YARN-1764. Modified YarnClient to correctly handle failover of ResourceManager after the submitApplication call goes through. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576160)
[jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.
[ https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925427#comment-13925427 ]

Xuan Gong commented on YARN-1764:

bq. Can you add a log in YarnClientImpl when we retry the submission?
DONE
bq. Can you improve the documentation of submitApp() API in ApplicationClientProtocol about the clients needing to retry when the specified exception happens?
ADDED
bq. Also add the exception to the documentation to the base protocol.
ADDED
bq. Document YarnClient's submit API that we automatically retry when this issue happens.
ADDED
bq. All the new files added in the patch have some formatting issues.
FIXED
bq. In both the test-cases, after the fail-over, we assert for the states that are not expected (assertFalse). Can we explicitly test for the cases that we expect (assertTrue)?
changed
bq. I think we should also mark getApplicationReport() to be idempotent in this patch itself as RM can fail-over after submitApplication() returned but during a getApplicationReport(). We will need to add some tests for this too.
ADDED
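The checklist above mentions YarnClientImpl retrying the submission, with a log line, when the application has been lost in a failover (the "case 2" from Xuan Gong's analysis of the failover cases). A minimal, self-contained Java sketch of that client-side pattern follows. It is not the committed YarnClientImpl code: `RmClient`, `AppState`, `AppNotFoundException`, `SubmitWithRetry` and `FakeFailoverRm` are hypothetical stand-ins, and the client is assumed to already be talking to whichever RM is currently active.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for YARN's ApplicationNotFoundException.
class AppNotFoundException extends Exception {
  AppNotFoundException(String appId) {
    super("Unknown application " + appId);
  }
}

// Hypothetical subset of application states.
enum AppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING }

// Hypothetical client view of the currently active RM.
interface RmClient {
  void submitApplication(String appId);
  AppState getApplicationReport(String appId) throws AppNotFoundException;
}

class SubmitWithRetry {
  static final List<String> LOG = new ArrayList<>();

  // Submits once, then polls until the RM reports a state past NEW/NEW_SAVING.
  // AppNotFoundException means the RM failed over before the app was saved,
  // so the same submission is sent again (and logged). Returns resubmissions.
  static int submit(RmClient rm, String appId, int maxPolls, long pollMillis) {
    rm.submitApplication(appId);
    int resubmissions = 0;
    for (int poll = 0; poll < maxPolls; poll++) {
      try {
        AppState state = rm.getApplicationReport(appId);
        if (state != AppState.NEW && state != AppState.NEW_SAVING) {
          LOG.add("Submitted application " + appId);
          return resubmissions;
        }
      } catch (AppNotFoundException e) {
        LOG.add("Re-submitting application " + appId + " with the same context");
        rm.submitApplication(appId);
        resubmissions++;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for " + appId, ie);
      }
    }
    throw new IllegalStateException(appId + " not accepted after " + maxPolls + " polls");
  }
}

// Fake RM: with lostFirstSubmission=true the first submission is "lost",
// as if the failover happened before the state store saved it.
class FakeFailoverRm implements RmClient {
  final boolean lostFirstSubmission;
  int submissions = 0;

  FakeFailoverRm(boolean lostFirstSubmission) {
    this.lostFirstSubmission = lostFirstSubmission;
  }

  @Override
  public void submitApplication(String appId) {
    submissions++;
  }

  @Override
  public AppState getApplicationReport(String appId) throws AppNotFoundException {
    if (lostFirstSubmission && submissions == 1) {
      throw new AppNotFoundException(appId);
    }
    return AppState.ACCEPTED;
  }
}
```

With `new FakeFailoverRm(true)`, `SubmitWithRetry.submit(rm, "app_1", 10, 1L)` resubmits exactly once. Resubmitting the same application ID only makes sense if the RM treats a repeated submission of that ID as safe, which this sketch simply assumes.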
[jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.
[ https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925445#comment-13925445 ]

Hadoop QA commented on YARN-1764:

+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12633623/YARN-1764.2.patch against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3307//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3307//console

This message is automatically generated.
[jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.
[ https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923566#comment-13923566 ]

Vinod Kumar Vavilapalli commented on YARN-1764:

I think we should also mark getApplicationReport() to be idempotent in this patch itself as RM can fail-over after submitApplication() returned but *during* a getApplicationReport(). We will need to add some tests for this too.
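Marking a method idempotent matters because a client that fails over transparently should, as a general rule, only replay calls that are safe to execute twice. The following is a rough Java sketch of that rule using `java.lang.reflect.Proxy`. It is not Hadoop's RPC retry machinery, which has its own annotations and retry policies. `RetryableOnFailover`, `RmFailoverException`, `ReportProtocol`, `FailoverProxy` and `FakeRm` are hypothetical names, and `killApplication` is left unmarked only to show the no-retry path.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.List;

// Hypothetical marker for methods that are safe to replay on another RM.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RetryableOnFailover {}

// Hypothetical signal that the RM being called is not (or no longer) active.
class RmFailoverException extends RuntimeException {
  RmFailoverException(String message) {
    super(message);
  }
}

// Hypothetical two-method protocol.
interface ReportProtocol {
  @RetryableOnFailover
  String getApplicationReport(String appId);

  // Left unmarked in this sketch to show the no-retry path.
  void killApplication(String appId);
}

final class FailoverProxy {
  private FailoverProxy() {}

  // Calls the current RM; only for @RetryableOnFailover methods does it move
  // to the next RM and replay the call after an RmFailoverException.
  // Not thread-safe: the active index is plain mutable state.
  static <T> T create(Class<T> iface, List<? extends T> rms) {
    InvocationHandler handler = new InvocationHandler() {
      private int active = 0;

      @Override
      public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        boolean retryable = method.isAnnotationPresent(RetryableOnFailover.class);
        for (int attempt = 1; ; attempt++) {
          try {
            return method.invoke(rms.get(active), args);
          } catch (InvocationTargetException e) {
            Throwable cause = e.getCause();
            boolean failover = cause instanceof RmFailoverException;
            if (!failover || !retryable || attempt >= rms.size()) {
              throw cause;  // not a failover, not safe to replay, or no RMs left
            }
            active = (active + 1) % rms.size();  // fail over to the next RM
          }
        }
      }
    };
    return iface.cast(Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface}, handler));
  }
}

// Fake RM: a standby instance rejects every call with RmFailoverException.
class FakeRm implements ReportProtocol {
  final String name;
  final boolean active;
  int calls = 0;

  FakeRm(String name, boolean active) {
    this.name = name;
    this.active = active;
  }

  private void checkActive() {
    calls++;
    if (!active) {
      throw new RmFailoverException(name + " is not active");
    }
  }

  @Override
  public String getApplicationReport(String appId) {
    checkActive();
    return appId + "@" + name;
  }

  @Override
  public void killApplication(String appId) {
    checkActive();
  }
}
```

With a standby `rm1` followed by an active `rm2`, a report call is transparently replayed on `rm2`, while the unmarked kill call fails immediately.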
[jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.
[ https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920564#comment-13920564 ]

Hadoop QA commented on YARN-1764:

+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632776/YARN-1764.1.patch against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3258//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3258//console

This message is automatically generated.
[jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.
[ https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918499#comment-13918499 ]

Xuan Gong commented on YARN-1764:

Let us continue our discussion of case 3: Handle RM fail overs after the submitApplication call.

Reply to [~kkambatl]'s comment: "I don't see 3 to be as straight-forward, and suspect would require revisiting the state machine."

We will only consider the case where the failover happens after the submitApplication call. This means that when the failover happens, we have already received the SubmitApplicationResponse, so we will *not re-enter* clientRMService#submitApplication(). What happens next is that getApplicationReport() starts executing: YarnClient retries until it finds the next active RM, and then continues executing getApplicationReport().

Now we have two cases to handle:
* RMStateStore had already saved the ApplicationState when the failover happened.
* RMStateStore had not yet saved the ApplicationState when the failover happened.

For case 1, we do not need to make any changes. For case 2, when we execute getApplicationReport() after the failover, we will get an ApplicationNotFoundException, because the newly active RM cannot recover an application that was never saved. I think this is the only case we should handle here.
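The two cases can be made concrete with a toy model. The following Java sketch shows only why case 2 surfaces as a not-found error on the newly active RM. `SharedStateStore`, `RmModel`, `UnknownAppException` and `FailoverCases` are hypothetical names, and real recovery details (application states other than a single placeholder, attempts, tokens) are deliberately left out. In this model, saving happens after submitApplication() has returned, which is the window that case 2 describes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified model of the scenario; not RM code.
class SharedStateStore {
  private final Set<String> savedApps = new HashSet<>();

  void saveApplication(String appId) {
    savedApps.add(appId);
  }

  Set<String> loadApplications() {
    return new HashSet<>(savedApps);
  }
}

class UnknownAppException extends Exception {
  UnknownAppException(String appId) {
    super("Unknown application " + appId);
  }
}

class RmModel {
  private final SharedStateStore store;
  private final Map<String, String> apps = new HashMap<>();

  RmModel(SharedStateStore store) {
    this.store = store;
  }

  // On becoming active, an RM knows only what it recovers from the store.
  void becomeActive() {
    apps.clear();
    for (String appId : store.loadApplications()) {
      apps.put(appId, "ACCEPTED");  // simplification: recovered apps look accepted
    }
  }

  // The client has its response once this returns; saving happens later.
  void submitApplication(String appId) {
    apps.put(appId, "NEW_SAVING");
  }

  void finishSaving(String appId) {
    store.saveApplication(appId);
    apps.put(appId, "ACCEPTED");
  }

  String getApplicationReport(String appId) throws UnknownAppException {
    String state = apps.get(appId);
    if (state == null) {
      throw new UnknownAppException(appId);
    }
    return state;
  }
}

class FailoverCases {
  // Returns what the client sees from the new active RM after the failover.
  static String afterFailover(boolean savedBeforeFailover) {
    SharedStateStore store = new SharedStateStore();
    RmModel rm1 = new RmModel(store);
    RmModel rm2 = new RmModel(store);
    rm1.becomeActive();
    rm1.submitApplication("app_1");
    if (savedBeforeFailover) {
      rm1.finishSaving("app_1");  // case 1: state saved before the failover
    }                             // otherwise case 2: rm1 is lost before saving
    rm2.becomeActive();           // failover: rm2 recovers from the store
    try {
      return rm2.getApplicationReport("app_1");
    } catch (UnknownAppException e) {
      return "NOT_FOUND";  // case 2: what the client has to handle
    }
  }
}
```

`FailoverCases.afterFailover(true)` yields the recovered state, while `afterFailover(false)` yields `"NOT_FOUND"`, matching the observation that only case 2 needs client-side handling.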