[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626583#comment-13626583
 ] 

Hudson commented on YARN-479:
-

Integrated in Hadoop-Mapreduce-trunk #1394 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1394/])
YARN-479. NM retry behavior for connection to RM should be similar for lost 
heartbeats (Jian He via bikas) (Revision 1465731)

 Result = FAILURE
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465731
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java


> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jian He
> Fix For: 2.0.5-beta
>
> Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, 
> YARN-479.4.patch, YARN-479.5.patch, YARN-479.6.patch, YARN-479.7.patch, 
> YARN-479.8.patch, YARN-479.9.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 
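For illustration only: one way to get "the same flow" is to send both the initial registration call and every heartbeat through a single retry helper. A lost RM is then handled the same way at startup and mid-run. The sketch below is hypothetical. RmRetrySketch, callWithRetry and RmUnreachableException are made-up names, not the NodeStatusUpdaterImpl API, and the wait/interval parameters only loosely mirror the connect-wait and retry-interval settings discussed in this issue.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch: one retry helper that NM registration and NM
 * heartbeats could share. Names are illustrative, not the Hadoop API.
 */
final class RmRetrySketch {

    /** Thrown once the RM has been unreachable for the whole wait period. */
    static final class RmUnreachableException extends Exception {
        RmUnreachableException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Invokes call until it succeeds or maxWaitMs elapses, sleeping
     * retryIntervalMs between attempts. There is no sleep after the final
     * failed attempt, so the caller can exit promptly.
     */
    static <T> T callWithRetry(Callable<T> call, long maxWaitMs, long retryIntervalMs)
            throws RmUnreachableException, InterruptedException {
        final long deadline = System.nanoTime() + maxWaitMs * 1_000_000L;
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                return call.call();
            } catch (InterruptedException ie) {
                throw ie; // let a shutdown interrupt the retry loop
            } catch (Exception e) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    throw new RmUnreachableException(
                        "Failed to connect to the RM after " + attempts + " attempt(s)", e);
                }
                Thread.sleep(Math.min(retryIntervalMs, remainingMs));
            }
        }
    }
}
```

Under this assumption, the registration call and the heartbeat call would both be wrapped in callWithRetry with the same wait and interval values, so the NM gives up under a single policy whether the RM is lost at startup or later.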

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626559#comment-13626559
 ] 

Hudson commented on YARN-479:
-

Integrated in Hadoop-Hdfs-trunk #1367 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1367/])
YARN-479. NM retry behavior for connection to RM should be similar for lost 
heartbeats (Jian He via bikas) (Revision 1465731)

 Result = FAILURE
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465731
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java





[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626468#comment-13626468
 ] 

Hudson commented on YARN-479:
-

Integrated in Hadoop-Yarn-trunk #178 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/178/])
YARN-479. NM retry behavior for connection to RM should be similar for lost 
heartbeats (Jian He via bikas) (Revision 1465731)

 Result = SUCCESS
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465731
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java





[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625701#comment-13625701
 ] 

Hudson commented on YARN-479:
-

Integrated in Hadoop-trunk-Commit #3575 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3575/])
YARN-479. NM retry behavior for connection to RM should be similar for lost 
heartbeats (Jian He via bikas) (Revision 1465731)

 Result = SUCCESS
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465731
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java





[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-08 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625688#comment-13625688
 ] 

Bikas Saha commented on YARN-479:
-

Thanks! +1.

> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jian He
> Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, 
> YARN-479.4.patch, YARN-479.5.patch, YARN-479.6.patch, YARN-479.7.patch, 
> YARN-479.8.patch, YARN-479.9.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 



[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625588#comment-13625588
 ] 

Hadoop QA commented on YARN-479:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577571/YARN-479.9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/679//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/679//console

This message is automatically generated.




[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625577#comment-13625577
 ] 

Jian He commented on YARN-479:
--

Updated the patch: fixed the bug in testNMRegistration in TestNodeStatusUpdater
and addressed the last review comments.




[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-05 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623979#comment-13623979
 ] 

Bikas Saha commented on YARN-479:
-

Patch looks good overall!

Why is this line showing up in the diff?
{code}
-List containersStatuses = new ArrayList();
...
+List containersStatuses = new ArrayList();
{code}


Why is this change needed?
{code}
   public void testNMRegistration() throws InterruptedException {
+final long connectionWaitSecs = 5;
+final long connectionRetryIntervalSecs = 1;
+YarnConfiguration conf = createNMConfig();
+conf.setLong(YarnConfiguration.RESOURCEMANAGER_CONNECT_WAIT_SECS,
+connectionWaitSecs);
+conf.setLong(YarnConfiguration
+.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_SECS,
+connectionRetryIntervalSecs);
+
 nm = new NodeManager() {
{code}

And why is this change needed?
{code}
@@ -527,7 +599,6 @@ protected NodeStatusUpdater createNodeStatusUpdater(Context context,
   }
 };
 
-YarnConfiguration conf = createNMConfig();
 nm.init(conf);
{code}

The comment can be turned into the assertion message:
{code}
+//calculate heartBeatCount based on connectionWaitSecs and RetryIntervalSecs
+Assert.assertTrue(heartBeatCount == 2);
{code}

Can we pass the barrier, etc., into the custom derived classes of NodeManager,
the RM service, etc., so that we can avoid global variables?
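A minimal sketch of the constructor-injection idea, assuming a java.util.concurrent.CyclicBarrier is the synchronization object in question. FakeResourceTracker and FakeNodeManager are hypothetical stand-ins for the test's derived RM-service and NodeManager classes. Each test builds its own barrier and hands it down, so no static field is shared between tests.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: test doubles get their synchronization objects through
// constructors instead of reading static (global) test fields.
final class BarrierInjectionSketch {

    /** Stand-in for the test's custom RM-side service. */
    static class FakeResourceTracker {
        private final CyclicBarrier heartbeatBarrier;
        private int heartbeats;

        FakeResourceTracker(CyclicBarrier heartbeatBarrier) {
            this.heartbeatBarrier = heartbeatBarrier;
        }

        /** Counts a heartbeat, then waits until the test thread reaches the barrier. */
        void recordHeartbeat()
                throws InterruptedException, BrokenBarrierException, TimeoutException {
            synchronized (this) {
                heartbeats++;
            }
            heartbeatBarrier.await(5, TimeUnit.SECONDS);
        }

        synchronized int heartbeatCount() {
            return heartbeats;
        }
    }

    /** Stand-in for the test's custom NodeManager subclass. */
    static class FakeNodeManager {
        private final FakeResourceTracker rm;

        FakeNodeManager(FakeResourceTracker rm) {
            this.rm = rm;
        }

        void sendHeartbeat() throws Exception {
            rm.recordHeartbeat();
        }
    }
}
```

A test would create the barrier, construct the fakes with it, and await the same barrier from the test thread to know that a heartbeat has arrived.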

> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jian He
> Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, 
> YARN-479.4.patch, YARN-479.5.patch, YARN-479.6.patch, YARN-479.7.patch, 
> YARN-479.8.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 



[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623294#comment-13623294
 ] 

Hadoop QA commented on YARN-479:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577136/YARN-479.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/675//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/675//console

This message is automatically generated.




[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623102#comment-13623102
 ] 

Hadoop QA commented on YARN-479:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577113/YARN-479.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/674//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/674//console

This message is automatically generated.

> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jian He
> Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, 
> YARN-479.4.patch, YARN-479.5.patch, YARN-479.6.patch, YARN-479.7.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 



[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622967#comment-13622967
 ] 

Hadoop QA commented on YARN-479:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577107/YARN-479.6.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/673//console

This message is automatically generated.

> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jian He
> Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, 
> YARN-479.4.patch, YARN-479.5.patch, YARN-479.6.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 



[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622783#comment-13622783
 ] 

Bikas Saha commented on YARN-479:
-

I don't see the value of waitForever if we can specify a large value for the
retry interval (1 day or so).

Not sure what retryCounts is buying us.

What's the intention of catching and rethrowing the exception without doing
anything else?
{code}
+  } catch (YarnException e) {
+//catch and throw the exception if tried MAX wait time to connect RM
+throw e;
{code}

There is a finally block that will make the code sleep for longer than
necessary before exiting. This matters because admins might kill the NM after
waiting only a few seconds for it to exit. In that time the NM has to do a
bunch of cleanup tasks, and this extra sleep does not help.

Unrelated to this change, but does the NM really shut down when the heartbeat
fails right now? It looks like the thread just keeps running. After this
change, it looks like the heartbeat thread will just exit. That does not mean
that the NM will shut down, does it?
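One hypothetical way to make a fatal heartbeat failure actually stop the NM is for the heartbeat thread to call back into its owner instead of just returning. In the sketch below, HeartbeatShutdownSketch, Heartbeat and onFatalFailure are illustrative names. In a real NM the callback would presumably trigger the service's stop path; that is an assumption here, not what the patch does.

```java
// Hypothetical sketch: the heartbeat thread reports a fatal RM connection
// failure to its owner instead of silently exiting.
final class HeartbeatShutdownSketch {

    /** One heartbeat, including any retries; throws once the RM is considered lost. */
    interface Heartbeat {
        void send() throws Exception;
    }

    static Thread start(Heartbeat heartbeat, long intervalMs, Runnable onFatalFailure) {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    heartbeat.send();
                    Thread.sleep(intervalMs);
                }
            } catch (InterruptedException e) {
                // Normal stop requested by the owner: do not trigger a shutdown.
            } catch (Exception e) {
                // RM unreachable for the whole retry window: ask the owner to
                // stop the NM rather than letting only this thread die.
                onFatalFailure.run();
            }
        }, "nm-heartbeat-sketch");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

The design choice is simply that the thread never decides on its own that the process should keep running without an RM; the owner receives an explicit signal and can shut down.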

> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jian He
> Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, 
> YARN-479.4.patch, YARN-479.5.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 



[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620291#comment-13620291
 ] 

Hadoop QA commented on YARN-479:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576654/YARN-479.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/650//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/650//console

This message is automatically generated.




[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-03-25 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612779#comment-13612779
 ] 

Xuan Gong commented on YARN-479:


Could you rephrase this error message, please?
(String errorMessage = "Failed to Connect to RM," + "no. of failed attempts is "+rmRetryCount;)
The patch looks good.
+1


> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: jian he
> Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, 
> YARN-479.4.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 



[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612396#comment-13612396
 ] 

Hadoop QA commented on YARN-479:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12575276/YARN-479.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/584//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/584//console

This message is automatically generated.




[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609818#comment-13609818
 ] 

Hadoop QA commented on YARN-479:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12574940/YARN-479.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/568//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/568//console

This message is automatically generated.

> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: jian he
> Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 



[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-03-21 Thread jian he (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609642#comment-13609642
 ] 

jian he commented on YARN-479:
--

Thanks, Xuan. I'll update the patch.

> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: jian he
> Attachments: YARN-479.1.patch, YARN-479.2.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 



[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-03-21 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609633#comment-13609633
 ] 

Xuan Gong commented on YARN-479:


Oh, I get it.
Do you think we still need a test case, since testNMRegistration only covers
part of it? For example, a test of what happens if the NM never gets a response
back. This behavior is almost the same as the NM retry for connecting to the RM,
and the retry behavior for connecting to the RM is already covered by other test
cases. So I am not sure whether we still need a new test case just for handling
lost heartbeats.
Other than that, I think the patch looks good.
Some minor formatting issues need to be fixed, such as extra spaces.
Also, the comment "//Waiting for rmStartIntervalMS, RM will be started" in
testNMRegistration() can be removed.
Please rephrase the error and warning messages: we are waiting for the
heartbeat response here.
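The "NM never gets a response back" case can be sketched as follows. The sketch assumes a hypothetical `HeartbeatTracker` interface in place of the real ResourceTracker, plus a total wait budget in milliseconds analogous to the RESOURCEMANAGER_CONNECT_WAIT_SECS setting mentioned in this thread. The real setting's semantics are not reproduced here.

```java
import java.io.IOException;

public class NeverRespondingRm {

  // Hypothetical stand-in for the RM side of the heartbeat RPC.
  interface HeartbeatTracker {
    String nodeHeartbeat() throws IOException;
  }

  // Hypothetical NM-side loop: keep retrying until a response arrives or the
  // total wait budget is exhausted, then fail with a message that says what
  // was missing.
  static String heartbeatWithDeadline(HeartbeatTracker rm, long waitBudgetMs,
      long intervalMs) {
    long deadline = System.nanoTime() + waitBudgetMs * 1_000_000L;
    int attempts = 0;
    while (true) {
      attempts++;
      try {
        return rm.nodeHeartbeat();
      } catch (IOException e) {
        // Overflow-safe nanoTime comparison.
        if (System.nanoTime() - deadline >= 0) {
          throw new IllegalStateException("Did not get a heartbeat response "
              + "from the RM after " + attempts + " attempts", e);
        }
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for the RM", ie);
      }
    }
  }

  public static void main(String[] args) {
    // A fake RM that never answers.
    HeartbeatTracker silentRm = () -> { throw new IOException("no response"); };
    try {
      heartbeatWithDeadline(silentRm, 50, 10);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```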

> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: jian he
> Attachments: YARN-479.1.patch, YARN-479.2.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 



[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-03-21 Thread jian he (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609158#comment-13609158
 ] 

jian he commented on YARN-479:
--

1. While it's looping, nodeStatus may (though it's unlikely) change, so I put it
in the loop.
2. I'm not testing with testNMRegistration. I modified testNMRegistration
because otherwise it fails, since the default RESOURCEMANAGER_CONNECT_WAIT_SECS
is too long for this test.
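A sketch of point 1. It assumes a hypothetical `heartbeat` loop and a `Supplier<String>` standing in for getNodeStatus(). Because the status is rebuilt inside the loop, a retried heartbeat carries whatever changed since the failed attempt. Passing the retry limit and interval as parameters (point 2) lets a test use small values instead of a long default.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class RebuildStatusPerAttempt {

  // Hypothetical stand-in for the RM heartbeat RPC.
  interface HeartbeatTracker {
    String nodeHeartbeat(String nodeStatus) throws IOException;
  }

  // Hypothetical loop: the node status is rebuilt on every attempt, so a
  // retried heartbeat carries the latest state rather than a snapshot taken
  // before the first failure.
  static String heartbeat(HeartbeatTracker rm, Supplier<String> statusSource,
      int maxAttempts, long intervalMs) {
    for (int attempt = 1; ; attempt++) {
      String status = statusSource.get(); // re-read inside the loop
      try {
        return rm.nodeHeartbeat(status);
      } catch (IOException e) {
        if (attempt >= maxAttempts) {
          throw new IllegalStateException("Did not get a heartbeat response "
              + "after " + attempt + " attempts", e);
        }
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(ie);
      }
    }
  }

  public static void main(String[] args) {
    List<String> sent = new ArrayList<>();
    int[] version = {0};
    // The first heartbeat response is "lost"; the retry succeeds.
    HeartbeatTracker flakyRm = status -> {
      sent.add(status);
      if (sent.size() == 1) {
        throw new IOException("heartbeat lost");
      }
      return "NORMAL";
    };
    String action = heartbeat(flakyRm, () -> "status-v" + (++version[0]), 3, 1);
    System.out.println(action + " " + sent);
  }
}
```

Running `main` prints `NORMAL [status-v1, status-v2]`, which shows that the retry sent a freshly built status.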

> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: jian he
> Attachments: YARN-479.1.patch, YARN-479.2.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 



[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-03-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608634#comment-13608634
 ] 

Xuan Gong commented on YARN-479:


Minor question: why add Assert response != null? Are you trying to test a
post-condition here?
If response == null, what will happen? I mean, if response == null, the
following code response.getNodeAction() will fail anyway.
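One way to see the trade-off Xuan raises, assuming a hypothetical `HeartbeatResponse` class in place of the real response type. Without a check, a null response only shows up as a NullPointerException at the first dereference. An explicit check such as `Objects.requireNonNull` fails at the same point but with a message that says what was missing.

```java
import java.util.Objects;

public class NullResponseCheck {

  // Hypothetical response type standing in for the heartbeat response.
  static final class HeartbeatResponse {
    private final String nodeAction;

    HeartbeatResponse(String nodeAction) {
      this.nodeAction = nodeAction;
    }

    String getNodeAction() {
      return nodeAction;
    }
  }

  // Without a check, a null response surfaces as a NullPointerException at
  // the first dereference, with no application-specific message.
  static String actionUnchecked(HeartbeatResponse response) {
    return response.getNodeAction();
  }

  // With an explicit check, the failure names what was missing.
  static String actionChecked(HeartbeatResponse response) {
    Objects.requireNonNull(response, "Did not get a heartbeat response from the RM");
    return response.getNodeAction();
  }

  public static void main(String[] args) {
    try {
      actionChecked(null);
    } catch (NullPointerException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Running `main` prints `Did not get a heartbeat response from the RM`.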

> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: jian he
> Attachments: YARN-479.1.patch, YARN-479.2.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 



[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-03-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608632#comment-13608632
 ] 

Xuan Gong commented on YARN-479:


A couple of comments on the latest patch (479-2):
1. In the while(true) loop in NodeStatusUpdaterImpl.startStatusUpdater():
rmRetryCount++ and response =
resourceTracker.nodeHeartbeat(request).getHeartbeatResponse() can be in the try
block; the others, such as NodeStatus nodeStatus = getNodeStatus(), I think we
can move out of the while(true) loop. We only consider losing the
heartbeat response.
2. Please rephrase the warning and error messages for more clarity,
something along the lines of "did not get the heartbeat response ...".
3. testNMRegistration may not be a good place to test the changes. You can
write your own ResourceTracker and NodeStatusUpdater to mimic a lost heartbeat
response, and test whether your code handles it properly. Take a look at the
MyNodeStatusUpdater and MyResourceTracker classes; they show how to do
that.
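Point 3 could look roughly like this. The sketch assumes a hypothetical `LosingTracker` test double in the spirit of MyResourceTracker (it is not that class's actual code). It also uses a simplified retry loop that, as in point 1, keeps only the retry counter and the heartbeat call inside the try block. Sleeping between retries is omitted to keep the sketch short.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class LostHeartbeatTestSketch {

  // Hypothetical stand-in for the RM heartbeat RPC.
  interface HeartbeatTracker {
    String nodeHeartbeat() throws IOException;
  }

  // Hypothetical test double: it "loses" the first responsesToDrop heartbeat
  // responses, then answers normally, and counts every call it receives.
  static final class LosingTracker implements HeartbeatTracker {
    private final int responsesToDrop;
    final AtomicInteger calls = new AtomicInteger();

    LosingTracker(int responsesToDrop) {
      this.responsesToDrop = responsesToDrop;
    }

    @Override
    public String nodeHeartbeat() throws IOException {
      if (calls.incrementAndGet() <= responsesToDrop) {
        throw new IOException("heartbeat response lost");
      }
      return "NORMAL";
    }
  }

  // Simplified NM-side loop under test: only the counter and the heartbeat
  // call sit in the try block. rmRetryCount counts attempts, so at most
  // maxRetries + 1 heartbeats are sent before giving up.
  static String heartbeatWithRetry(HeartbeatTracker rm, int maxRetries) {
    int rmRetryCount = 0;
    while (true) {
      try {
        rmRetryCount++;
        return rm.nodeHeartbeat();
      } catch (IOException e) {
        if (rmRetryCount > maxRetries) {
          throw new IllegalStateException("Did not get a heartbeat response "
              + "after " + rmRetryCount + " attempts", e);
        }
      }
    }
  }

  public static void main(String[] args) {
    LosingTracker rm = new LosingTracker(2);
    String action = heartbeatWithRetry(rm, 3);
    System.out.println(action + " after " + rm.calls.get() + " calls");
  }
}
```

Running `main` prints `NORMAL after 3 calls`. A tracker that drops more responses than the retry budget allows makes `heartbeatWithRetry` throw, which covers the "never gets a response back" case from the same test double.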

> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: jian he
> Attachments: YARN-479.1.patch, YARN-479.2.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 



[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607910#comment-13607910
 ] 

Hadoop QA commented on YARN-479:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12574579/YARN-479.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/549//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/549//console

This message is automatically generated.

> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: jian he
> Attachments: YARN-479.1.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 
