[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626583#comment-13626583 ]

Hudson commented on YARN-479:

Integrated in Hadoop-Mapreduce-trunk #1394 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1394/])
YARN-479. NM retry behavior for connection to RM should be similar for lost heartbeats (Jian He via bikas) (Revision 1465731)

Result = FAILURE
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465731
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java

> NM retry behavior for connection to RM should be similar for lost heartbeats
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Jian He
> Fix For: 2.0.5-beta
> Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, YARN-479.4.patch, YARN-479.5.patch, YARN-479.6.patch, YARN-479.7.patch, YARN-479.8.patch, YARN-479.9.patch
>
> Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626559#comment-13626559 ]

Hudson commented on YARN-479:

Integrated in Hadoop-Hdfs-trunk #1367 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1367/])
YARN-479. NM retry behavior for connection to RM should be similar for lost heartbeats (Jian He via bikas) (Revision 1465731)

Result = FAILURE
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465731
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626468#comment-13626468 ]

Hudson commented on YARN-479:

Integrated in Hadoop-Yarn-trunk #178 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/178/])
YARN-479. NM retry behavior for connection to RM should be similar for lost heartbeats (Jian He via bikas) (Revision 1465731)

Result = SUCCESS
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465731
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625701#comment-13625701 ]

Hudson commented on YARN-479:

Integrated in Hadoop-trunk-Commit #3575 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3575/])
YARN-479. NM retry behavior for connection to RM should be similar for lost heartbeats (Jian He via bikas) (Revision 1465731)

Result = SUCCESS
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465731
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625688#comment-13625688 ]

Bikas Saha commented on YARN-479:

Thanks! +1.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625588#comment-13625588 ]

Hadoop QA commented on YARN-479:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12577571/YARN-479.9.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/679//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/679//console

This message is automatically generated.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625577#comment-13625577 ]

Jian He commented on YARN-479:

Uploaded a new patch: fixed the bug in testNMRegistration in TestNodeStatusUpdater, and updated it based on the last comment.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623979#comment-13623979 ]

Bikas Saha commented on YARN-479:

Patch looks good overall!

Why is this line showing as a diff?
{code}
-List containersStatuses = new ArrayList();
...
+List containersStatuses = new ArrayList();
{code}

Why is this change needed?
{code}
 public void testNMRegistration() throws InterruptedException {
+final long connectionWaitSecs = 5;
+final long connectionRetryIntervalSecs = 1;
+YarnConfiguration conf = createNMConfig();
+conf.setLong(YarnConfiguration.RESOURCEMANAGER_CONNECT_WAIT_SECS,
+connectionWaitSecs);
+conf.setLong(YarnConfiguration
+.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_SECS,
+connectionRetryIntervalSecs);
+
 nm = new NodeManager() {
{code}

And why is this change needed?
{code}
@@ -527,7 +599,6 @@ protected NodeStatusUpdater createNodeStatusUpdater(Context context,
 }
 };
-YarnConfiguration conf = createNMConfig();
 nm.init(conf);
{code}

The message in the comment can be made part of the Assert itself.
{code}
+//calculate heartBeatCount based on connectionWaitSecs and RetryIntervalSecs
+Assert.assertTrue(heartBeatCount == 2);
{code}

Can we pass the barrier etc. into the custom derived classes of NodeManager, the RM service etc., so that we can avoid global variables?
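One way to read the last review point, passing the barrier into the derived classes instead of sharing it through globals, is sketched below. This is only an illustration using `java.util.concurrent`; `BarrierAwareUpdater` and `BarrierInjectionSketch` are hypothetical names and are not taken from the patch or from TestNodeStatusUpdater.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a test subclass of the status updater (assumption, not patch code).
class BarrierAwareUpdater implements Runnable {
  private final CyclicBarrier syncBarrier;     // injected through the constructor, not a static field
  private final AtomicInteger heartBeatCount;  // injected counter that the test can inspect afterwards

  BarrierAwareUpdater(CyclicBarrier syncBarrier, AtomicInteger heartBeatCount) {
    this.syncBarrier = syncBarrier;
    this.heartBeatCount = heartBeatCount;
  }

  @Override
  public void run() {
    heartBeatCount.incrementAndGet();          // simulate a single heartbeat
    try {
      syncBarrier.await(5, TimeUnit.SECONDS);  // signal the test thread that the heartbeat happened
    } catch (Exception e) {
      throw new IllegalStateException("updater never met the test thread", e);
    }
  }
}

class BarrierInjectionSketch {
  // Runs one simulated heartbeat and returns the count observed by the test thread.
  static int runOneHeartbeat() {
    CyclicBarrier barrier = new CyclicBarrier(2);  // two parties: updater thread and test thread
    AtomicInteger count = new AtomicInteger();
    Thread updater = new Thread(new BarrierAwareUpdater(barrier, count));
    updater.start();
    try {
      barrier.await(5, TimeUnit.SECONDS);          // wait until the updater has sent its heartbeat
      updater.join();
    } catch (Exception e) {
      throw new IllegalStateException("test thread never met the updater", e);
    }
    return count.get();
  }
}
```

Because the counter is owned by the test method, an assertion on it can also carry its own explanation, for example through JUnit 4's `Assert.assertEquals(String message, long expected, long actual)` overload, which covers the earlier point about moving the comment into the Assert.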
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623294#comment-13623294 ]

Hadoop QA commented on YARN-479:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12577136/YARN-479.8.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/675//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/675//console

This message is automatically generated.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623102#comment-13623102 ]

Hadoop QA commented on YARN-479:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12577113/YARN-479.7.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/674//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/674//console

This message is automatically generated.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622967#comment-13622967 ]

Hadoop QA commented on YARN-479:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12577107/YARN-479.6.patch
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/673//console

This message is automatically generated.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622783#comment-13622783 ]

Bikas Saha commented on YARN-479:

I don't see the value of waitForever if we can specify a large value for the retry interval (1 day or so). I'm not sure what retryCounts is buying us.

What is the intention of catching and rethrowing the exception without doing anything else?
{code}
+ } catch (YarnException e) {
+//catch and throw the exception if tried MAX wait time to connect RM
+throw e;
{code}

There is a finally block that makes the code sleep longer than necessary before exiting. This matters because admins might kill the NM after waiting a few seconds for it to exit. In that time the NM has to do a bunch of cleanup tasks, and the extra sleep does not help.

Unrelated to this change, but does the NM really shut down when the heartbeat fails right now? It looks like the thread just keeps running. After this change it looks like the heartbeat thread will just exit. That does not mean the NM itself will shut down, does it?
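The concerns in this review (a single wait budget plus a retry interval, no catch-and-rethrow that adds nothing, and no sleep on the way out) can be illustrated with a small retry loop. The sketch below is an assumption-laden illustration, not the NodeStatusUpdaterImpl code: `BoundedRetrySketch`, `RmCall` and `Sleeper` are hypothetical names, it uses `IOException` rather than `YarnException` to stay self-contained, and it tracks the time waited as the sum of the sleeps rather than reading a clock.

```java
import java.io.IOException;

// Hypothetical helper for illustration only; not taken from the patch.
class BoundedRetrySketch {
  interface RmCall<T> { T invoke() throws IOException; }

  // The sleeper is injected so the loop can be exercised without really waiting.
  interface Sleeper { void sleep(long millis) throws InterruptedException; }

  static <T> T callWithRetry(RmCall<T> call, long maxWaitMs, long retryIntervalMs,
      Sleeper sleeper) throws IOException, InterruptedException {
    long waitedMs = 0;  // simplification: counts only time spent sleeping between attempts
    int attempts = 0;
    while (true) {
      attempts++;
      try {
        return call.invoke();
      } catch (IOException e) {
        if (waitedMs >= maxWaitMs) {
          // Budget exhausted: fail immediately and add context instead of rethrowing as-is.
          throw new IOException("Giving up on RM after " + attempts
              + " attempts and " + waitedMs + " ms of waiting", e);
        }
      }
      // Sleep only between attempts, never in a finally block on the exit path.
      sleeper.sleep(retryIntervalMs);
      waitedMs += retryIntervalMs;
    }
  }

  // Counts attempts against a call that always fails; sleeping is a no-op here.
  static int attemptsWhenRmNeverAnswers(long maxWaitMs, long retryIntervalMs) {
    int[] calls = {0};
    try {
      BoundedRetrySketch.<String>callWithRetry(() -> {
        calls[0]++;
        throw new IOException("connection refused");
      }, maxWaitMs, retryIntervalMs, millis -> { });
    } catch (IOException | InterruptedException expected) {
      // Expected: the wait budget ran out.
    }
    return calls[0];
  }
}
```

With a 5000 ms budget and a 1000 ms interval, this loop makes six attempts (one immediately, then one after each of five sleeps) and fails right after the last attempt without a trailing sleep. An effectively unbounded wait needs no separate flag in this shape; a very large `maxWaitMs` has the same effect.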
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620291#comment-13620291 ]

Hadoop QA commented on YARN-479:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12576654/YARN-479.5.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/650//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/650//console

This message is automatically generated.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612779#comment-13612779 ]

Xuan Gong commented on YARN-479:

Could you rephrase this error message, please? (String errorMessage = "Failed to Connect to RM," + "no. of failed attempts is "+rmRetryCount;)

The patch looks good. +1
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612396#comment-13612396 ]

Hadoop QA commented on YARN-479:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12575276/YARN-479.4.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/584//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/584//console

This message is automatically generated.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609818#comment-13609818 ]

Hadoop QA commented on YARN-479:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12574940/YARN-479.3.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 tests included appear to have a timeout.{color}
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/568//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/568//console

This message is automatically generated.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609642#comment-13609642 ]

jian he commented on YARN-479:

Thanks, Xuan. I'll update the patch.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609633#comment-13609633 ]

Xuan Gong commented on YARN-479:

Oh, I got it. Do you think we still need a test case, since testNMRegistration only covers part of it? For example, a test of what happens if the NM never gets a response back, etc. This behavior is almost the same as the NM retry for connection to the RM, and that retry behavior is already covered by another test case. So I am not sure whether we still need a new test case just for handling lost heartbeats.

Other than that, I think the patch looks good. Some minor formatting issues need to be fixed, such as extra spaces. The comment "//Waiting for rmStartIntervalMS, RM will be started" in testNMRegistration() can be removed. Please also rephrase the error message and the warning message: here we are waiting for the heartbeat response to come back.
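For reference, the two scenarios discussed here (heartbeats that are lost for a while and then recover, and an RM that never answers) could be exercised against a fake RM along the following lines. This is a rough sketch under stated assumptions: `FlakyRm`, `heartbeatWithRetry` and the `"NORMAL"` response string are hypothetical placeholders, not the classes used in TestNodeStatusUpdater, and the sketch retries by attempt count for brevity.

```java
import java.io.IOException;

// Hypothetical test double and retry helper; assumptions for illustration only.
class LostHeartbeatSketch {
  // Fake RM that loses the first `failures` heartbeats and answers afterwards.
  static final class FlakyRm {
    private int remainingFailures;
    int calls;  // number of heartbeats the fake RM has seen

    FlakyRm(int failures) {
      this.remainingFailures = failures;
    }

    String heartbeat() throws IOException {
      calls++;
      if (remainingFailures > 0) {
        remainingFailures--;
        throw new IOException("heartbeat lost");
      }
      return "NORMAL";  // placeholder for a real heartbeat response
    }
  }

  // Retries up to maxAttempts times; returns the response, or null if every heartbeat was lost.
  static String heartbeatWithRetry(FlakyRm rm, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return rm.heartbeat();
      } catch (IOException e) {
        // Lost heartbeat: fall through and try again.
      }
    }
    return null;
  }
}
```

A test built this way can assert both the outcome and the number of calls the fake RM received, which is what distinguishes "recovered after two lost heartbeats" from "gave up after the allowed attempts".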
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609158#comment-13609158 ]

jian he commented on YARN-479:
------------------------------

1. While it is looping, nodeStatus may change, though that is unlikely, so I put it in the loop.
2. I'm not testing with testNMRegistration. I modified testNMRegistration because it would otherwise fail: by default, RESOURCEMANAGER_CONNECT_WAIT_SECS is too long for this test.

> NM retry behavior for connection to RM should be similar for lost heartbeats
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: jian he
> Attachments: YARN-479.1.patch, YARN-479.2.patch
>
> Regardless of connection loss at the start or at an intermediate point, NM's
> retry behavior to the RM should follow the same flow.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608634#comment-13608634 ]

Xuan Gong commented on YARN-479:
--------------------------------

Minor question: why add the assert response != null? Are you trying to test a post-condition here? What happens if response == null? I mean, if response == null, the following call to response.getNodeAction() will fail anyway.

> NM retry behavior for connection to RM should be similar for lost heartbeats
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: jian he
> Attachments: YARN-479.1.patch, YARN-479.2.patch
>
> Regardless of connection loss at the start or at an intermediate point, NM's
> retry behavior to the RM should follow the same flow.
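The question above turns on how a null response fails. As a rough illustration only (this is not code from the patch), the sketch below contrasts an explicit null check, which fails with a descriptive message, with relying on the later dereference. The `Response` class and `actionOf` method are hypothetical stand-ins, not the real YARN types; `Objects.requireNonNull` is the standard JDK call.

```java
import java.util.Objects;

public class NullResponseSketch {
    /** Hypothetical stand-in for a heartbeat response; not the real YARN class. */
    static class Response {
        private final String nodeAction;

        Response(String nodeAction) {
            this.nodeAction = nodeAction;
        }

        String getNodeAction() {
            return nodeAction;
        }
    }

    /**
     * Checks the response explicitly so that a null fails with a descriptive
     * message, rather than with a bare NullPointerException at the dereference.
     */
    static String actionOf(Response response) {
        Objects.requireNonNull(response, "heartbeat response was null");
        return response.getNodeAction();
    }

    public static void main(String[] args) {
        System.out.println(actionOf(new Response("NORMAL")));
        try {
            actionOf(null);
        } catch (NullPointerException e) {
            // The message names the problem instead of leaving it to the stack trace.
            System.out.println(e.getMessage());
        }
    }
}
```

Either way the call fails on null; the explicit check only changes how clearly the failure is reported.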
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608632#comment-13608632 ]

Xuan Gong commented on YARN-479:
--------------------------------

A couple of comments on the latest patch (479-2):

1. In the while(true) loop in NodeStatusUpdaterImpl.startStatusUpdater(), rmRetryCount++ and response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse() can stay in the try block. The others, such as NodeStatus nodeStatus = getNodeStatus(), can, I think, be moved out of the while(true) loop. We only need to consider a lost heartbeat response.
2. Please rephrase the warning message and error message for more clarity, something along the lines of "did not get the heartbeat response ...".
3. testNMRegistration may not be a good place to test the changes. You can write your own ResourceTracker and NodeStatusUpdater to mimic the loss of a heartbeat response, and test whether your code handles it properly. Take a look at the MyNodeStatusUpdater and MyResourceTracker classes; they show how to do that.

> NM retry behavior for connection to RM should be similar for lost heartbeats
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: jian he
> Attachments: YARN-479.1.patch, YARN-479.2.patch
>
> Regardless of connection loss at the start or at an intermediate point, NM's
> retry behavior to the RM should follow the same flow.
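The shape suggested in points 1 and 3 above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual patch: the `Tracker` interface, `FlakyTracker` class, `heartbeatWithRetry` method, and the `maxRetries` and `retryIntervalMs` parameters are all hypothetical names introduced here, and none of them is the real YARN API. Only the RPC call and the retry counter sit inside the try block, and a fake tracker simulates lost heartbeat responses.

```java
import java.io.IOException;

public class HeartbeatRetrySketch {
    /** Hypothetical stand-in for the RM-facing tracker; not the real ResourceTracker API. */
    interface Tracker {
        String nodeHeartbeat(String nodeStatus) throws IOException;
    }

    /**
     * Sends one heartbeat and retries on IOException until maxRetries attempts
     * have been made. The node status is built once by the caller, outside the loop.
     */
    static String heartbeatWithRetry(Tracker tracker, String nodeStatus,
                                     int maxRetries, long retryIntervalMs)
            throws IOException, InterruptedException {
        int rmRetryCount = 0;
        while (true) {
            try {
                // Only the counter and the RPC call live inside the try block.
                rmRetryCount++;
                return tracker.nodeHeartbeat(nodeStatus);
            } catch (IOException e) {
                if (rmRetryCount >= maxRetries) {
                    throw new IOException("Did not get the heartbeat response after "
                            + rmRetryCount + " attempts", e);
                }
                System.err.println("Did not get the heartbeat response; retrying (attempt "
                        + rmRetryCount + ")");
                Thread.sleep(retryIntervalMs);
            }
        }
    }

    /** Fake tracker that loses the first few heartbeats, used as a test double. */
    static class FlakyTracker implements Tracker {
        private int remainingFailures;
        int calls = 0;

        FlakyTracker(int failures) {
            this.remainingFailures = failures;
        }

        @Override
        public String nodeHeartbeat(String nodeStatus) throws IOException {
            calls++;
            if (remainingFailures-- > 0) {
                throw new IOException("simulated lost heartbeat");
            }
            return "NORMAL";
        }
    }

    public static void main(String[] args) throws Exception {
        FlakyTracker tracker = new FlakyTracker(2);
        String action = heartbeatWithRetry(tracker, "node-status", 5, 0L);
        System.out.println(action + " after " + tracker.calls + " calls");
    }
}
```

With two simulated losses and a limit of five attempts, the third call succeeds. A test for the "never gets a response" case would use a tracker that always throws and assert that the loop gives up after `maxRetries` attempts.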
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607910#comment-13607910 ]

Hadoop QA commented on YARN-479:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12574579/YARN-479.1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/549//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/549//console

This message is automatically generated.

> NM retry behavior for connection to RM should be similar for lost heartbeats
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: jian he
> Attachments: YARN-479.1.patch
>
> Regardless of connection loss at the start or at an intermediate point, NM's
> retry behavior to the RM should follow the same flow.