[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066251#comment-14066251
 ] 

Hudson commented on YARN-1341:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #616 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/616/])
YARN-1341. Recover NMTokens upon nodemanager restart. (Contributed by Jason 
Lowe) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611512)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/BaseNMTokenSecretManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMTokenSecretManagerInNM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMTokenSecretManagerInNM.java


 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 2.6.0

 Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, 
 YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch, 
 YARN-1341v7.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066342#comment-14066342
 ] 

Hudson commented on YARN-1341:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1835 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1835/])
YARN-1341. Recover NMTokens upon nodemanager restart. (Contributed by Jason 
Lowe) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611512)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/BaseNMTokenSecretManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMTokenSecretManagerInNM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMTokenSecretManagerInNM.java




[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066365#comment-14066365
 ] 

Hudson commented on YARN-1341:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1808 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1808/])
YARN-1341. Recover NMTokens upon nodemanager restart. (Contributed by Jason 
Lowe) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611512)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/BaseNMTokenSecretManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMTokenSecretManagerInNM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMTokenSecretManagerInNM.java




[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064870#comment-14064870
 ] 

Hadoop QA commented on YARN-1341:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12656064/YARN-1341v7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4344//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4344//console

This message is automatically generated.



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-17 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065716#comment-14065716
 ] 

Junping Du commented on YARN-1341:
--

+1. Patch looks good. Will commit it shortly.



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-17 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065715#comment-14065715
 ] 

Junping Du commented on YARN-1341:
--

I can confirm the test failure is not related to the patch, as it also shows
up in YARN-2045. A similar issue also happens in AM WebServices
(MAPREDUCE-5973) and RM WebServices (YARN-2304). I have already filed
YARN-2316 to track these failures.



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065795#comment-14065795
 ] 

Hudson commented on YARN-1341:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5906 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5906/])
YARN-1341. Recover NMTokens upon nodemanager restart. (Contributed by Jason 
Lowe) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611512)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/BaseNMTokenSecretManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMTokenSecretManagerInNM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMTokenSecretManagerInNM.java




[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063381#comment-14063381
 ] 

Junping Du commented on YARN-1341:
--

Thanks [~jlowe] for the reply above and the documentation work on the umbrella
JIRA. I think we can add an error handling section later for all cases,
according to the discussion above, can't we? I also agree that retry
(configurable or not) may not be necessary at this point. We may add it in the
future if we really need it. [~devaraj.k], what do you think?
The latest patch looks good to me overall. Some comments on tiny issues:
In NMTokenSecretManagerInNM.java
{code}
+// if there was no master key, try the previous key
+if (super.currentMasterKey == null) {
+  super.currentMasterKey = previousMasterKey;
+}
{code}
Is the above code still necessary, given that currentMasterKey will be updated
soon from RM registration, as we discussed above?
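
To make the question concrete, the fallback in the quoted snippet can be
sketched in isolation as below. This is an illustration only: RecoveredKeys
and KeyRecoverySketch are hypothetical names, key ids stand in for the real
master key objects, and none of this is taken from the patch itself.

```java
// Illustrative sketch only; RecoveredKeys and KeyRecoverySketch are
// hypothetical names, not classes from the YARN-1341 patch.
final class RecoveredKeys {
    final Integer currentKeyId;   // null if no current master key was recovered
    final Integer previousKeyId;  // null if no previous master key was recovered

    RecoveredKeys(Integer currentKeyId, Integer previousKeyId) {
        this.currentKeyId = currentKeyId;
        this.previousKeyId = previousKeyId;
    }
}

final class KeyRecoverySketch {
    // Mirrors the quoted fallback: if there was no current key, try the
    // previous key. The result may still be null if neither was recovered.
    static Integer effectiveCurrentKey(RecoveredKeys state) {
        return state.currentKeyId != null ? state.currentKeyId : state.previousKeyId;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCurrentKey(new RecoveredKeys(null, 7)));
        System.out.println(effectiveCurrentKey(new RecoveredKeys(8, 7)));
    }
}
```

In this sketch the fallback only matters between recovery and the first key
received from the RM, which is the window the question is about.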



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063704#comment-14063704
 ] 

Hadoop QA commented on YARN-1341:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12656064/YARN-1341v7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4323//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4323//console

This message is automatically generated.



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063981#comment-14063981
 ] 

Jason Lowe commented on YARN-1341:
--

I believe the test failures are unrelated.  All of the nodemanager tests pass 
for me locally, and the build report with the test failures shows they're all 
of the java.net.BindException: Address already in use variety.  Other web 
services tests have been failing in MAPREDUCE in a similar way -- I suspect a 
process from a previous test run got stuck on a popular web service port.



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064427#comment-14064427
 ] 

Junping Du commented on YARN-1341:
--

Agreed. The test failure shouldn't be related to the latest patch. Kicking off
the Jenkins test again.



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-15 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061891#comment-14061891
 ] 

Junping Du commented on YARN-1341:
--

Hey [~jlowe], I also agree it is better to discuss the inconsistent scenarios
for each case in separate JIRAs. However, for now, the conclusions from these
discussions may hold only in theory and may still have bugs/issues in
practice. Thus, I also suggest we have a central place to document these
assumptions/conclusions from the discussions; it would help us and others in
the community identify potential issues when coming up with UTs or other
integration tests for negative cases later. What do you think? If you also
agree on this, we can move this documentation effort to another JIRA (the
umbrella or a dedicated one, whichever you like) and continue the discussion
on this particular case.
On this particular one, the assumptions from the discussion above seem to be:
if the NM restarts with stale keys,
a. if currentMasterKey is stale, it will soon be updated and overridden when
the NM registers with the RM. Nothing is affected.
b. if previousMasterKey is stale, then the real previous master key is lost, so
the effect is: AMs with the real master key cannot connect to the NM to launch
containers.
c. if applicationMasterKeys are stale, then the old keys tracked in
applicationMasterKeys are lost after restart. The effect is: AMs with old
keys cannot connect to the NM to launch containers.
I would prefer option 1 too, given all the effects listed here. Anything I am
missing here?


[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062861#comment-14062861
 ] 

Jason Lowe commented on YARN-1341:
--

Thanks for commenting, Devaraj!  My apologies for the late reply, as I was on 
vacation and am still catching up.

bq. In addition to option 1), I'd think of bringing the NM down if the NM
fails to store RM keys a certain (configurable) number of times consecutively.

As for retries, I mentioned earlier that if retries are likely to help then the 
state store implementation should do so rather than have the common code do so. 
 For the leveldb implementation it is very unlikely that a retry is going to do 
anything other than just make the operation take longer to ultimately fail.  
The firmware of the drive is already going to implement a large number of 
retries to attempt to recover from hardware errors, and non-hardware local 
filesystem errors are highly unlikely to be fixed by simply retrying 
immediately.  If that were the case then I'd expect retries to be implemented 
in many other places where the local filesystem is used by Hadoop code.
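
For a store where retries did help, keeping them inside the store
implementation rather than in the common code might look like the wrapper
below. This is a minimal sketch under stated assumptions: TokenStateStore and
RetryingTokenStateStore are hypothetical names, not the NMStateStoreService
API, and the retry is immediate with no backoff.

```java
import java.io.IOException;

// Hypothetical, simplified store interface; not the real NMStateStoreService.
interface TokenStateStore {
    void storeKey(int keyId) throws IOException;
}

// Wraps another store and retries failed writes, so callers in the common
// code see a single storeKey() call and need no retry logic of their own.
final class RetryingTokenStateStore implements TokenStateStore {
    private final TokenStateStore delegate;
    private final int maxAttempts;

    RetryingTokenStateStore(TokenStateStore delegate, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public void storeKey(int keyId) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                delegate.storeKey(keyId);
                return; // stored successfully, stop retrying
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed; surface the last error
    }
}
```

With a store backed by the local filesystem, the point above still applies:
the wrapper would mostly make a failing operation take longer to fail.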

bq. We can also make this (i.e. whether to tear down the NM or not) configurable

I'd like to avoid adding yet more config options unless we think we really need 
them, but if people agree this needs to be configurable then we can do so.  
Also I assume in that scenario you would want the NM to shutdown while also 
tearing down containers, cleaning up, etc. as if it didn't support recovery.  
Tearing down the NM on a state store error just to have it start up again and 
try to recover with stale state seems pointless -- might as well have just kept 
running which is a better outcome.  Or am I missing a use case for that?


And thanks, Junping, for the recent comments!

bq. If you also agree on this, we can move this documentation effort to another
JIRA (the umbrella or a dedicated one, whichever you like) and continue the
discussion on this particular case.

Sure, we can discuss general error handling or an overall document for it 
either on YARN-1336 or a new JIRA.

bq. a. if currentMasterKey is stale, it will soon be updated and overridden
when the NM registers with the RM. Nothing is affected.
Correct, the NM should receive the current master key upon re-registration with 
the RM after it restarts.

bq. b. if previousMasterKey is stale, then the real previous master key is 
lost, so the effect is: AMs with the real master key cannot connect to the NM
to launch containers.
AMs that have the current master key will still be able to connect because the 
NM just got the current master key as described in a).  AMs that have the 
previous master key will not be able to connect to the NM unless that 
particular master key also happened to be successfully associated with the 
attempt in the state store (related to case c).

bq. c. if applicationMasterKeys are stale, then the old keys tracked in
applicationMasterKeys are lost after restart. The effect is: AMs with old
keys cannot connect to the NM to launch containers.
AMs that use an old key (i.e.: not the current or previous master key) would be 
unable to connect to the NM.

bq. Anything I am missing here?
I don't believe so.  The bottom line is that an AM may not be able to 
successfully connect to an NM after a restart with stale NM token state.
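
The three cases can be summarized with a small model of the NM-side check.
This is a simplified sketch, not the NMTokenSecretManagerInNM code:
NmTokenStateSketch and its fields are hypothetical, and key ids stand in for
the real master keys and token verification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the NM token state after a restart.
final class NmTokenStateSketch {
    Integer currentKeyId;   // refreshed on re-registration with the RM (case a)
    Integer previousKeyId;  // may be stale after recovery (case b)
    // Key id recorded per application attempt; may be incomplete (case c).
    final Map<String, Integer> attemptKeyIds = new HashMap<>();

    // An AM can connect if its token was issued under the current key, the
    // previous key, or the key recorded for its own attempt.
    boolean canConnect(String attemptId, int tokenKeyId) {
        Integer key = tokenKeyId;
        return key.equals(currentKeyId)
            || key.equals(previousKeyId)
            || key.equals(attemptKeyIds.get(attemptId));
    }

    public static void main(String[] args) {
        NmTokenStateSketch nm = new NmTokenStateSketch();
        nm.currentKeyId = 3;                  // received from the RM after restart
        nm.previousKeyId = 1;                 // stale: the real previous key was 2
        nm.attemptKeyIds.put("attempt_A", 2); // this association did reach the store

        System.out.println(nm.canConnect("attempt_B", 3)); // current key
        System.out.println(nm.canConnect("attempt_A", 2)); // saved by the attempt record
        System.out.println(nm.canConnect("attempt_B", 2)); // real previous key was lost
    }
}
```

In this model only the last call is rejected, which corresponds to the AM
described in case b whose key was not also recorded for its attempt.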



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063031#comment-14063031
 ] 

Hadoop QA commented on YARN-1341:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12651342/YARN-1341v6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4319//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4319//console

This message is automatically generated.



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-30 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047910#comment-14047910
 ] 

Devaraj K commented on YARN-1341:
-

Sorry for coming late here. 

+1 for limiting the implementation/discussion as per Jira title and handling 
other cases in the respected Jira’s.

In addition to option 1), I'd think of making the NM down if NM fails to store 
RM keys for certain number of times(configurable) consecutively. And also we 
can make it(i.e. tear down NM or not) as configurable and let the users choose 
whether to enable or disable the config to make the NM down for RM keys  state 
store failures.

Similarly, for Container/Application state store failures, the NM can mark that 
Container/Application as failed and report it to the RM. These can be 
discussed in more detail in the corresponding JIRAs, YARN-1337 and YARN-1354. 
However, for all these NM state store operations, we could consider retrying 
before throwing the IOException. 
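
A rough sketch of retrying a store operation before surfacing the IOException. 
StoreOp and RetryingStore are hypothetical names, and the retry is immediate 
with no backoff, which is a simplifying assumption:

```java
import java.io.IOException;

/** Hypothetical stand-in for a single state store write. */
interface StoreOp {
  void run() throws IOException;
}

final class RetryingStore {
  /**
   * Runs the operation up to maxAttempts times and rethrows the last
   * IOException if every attempt fails.
   */
  static void withRetries(StoreOp op, int maxAttempts) throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        op.run();
        return; // success, nothing more to do
      } catch (IOException e) {
        last = e; // remember the failure and try again
      }
    }
    throw last; // non-null here because maxAttempts >= 1
  }
}
```

Whether the retry belongs in the caller or inside the state store 
implementation is a separate design question.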

Thoughts?




[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-27 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045959#comment-14045959
 ] 

Jason Lowe commented on YARN-1341:
--

Agree it's not ideal to discuss handling state store errors for all NM 
components in this JIRA.  In general I'd prefer to discuss and address each 
case with the corresponding JIRA, e.g.: application state store errors 
discussed and addressed in YARN-1354, container state store errors in 
YARN-1337, etc.  If we feel there's significant utility to committing a JIRA 
before all the issues are addressed then we can file one or more followup JIRAs 
to track those outstanding issues.  That's the normal process we follow with 
other features/fixes as well.  

So if we follow that process then we're back to the discussion about RM master 
keys not being able to be stored in the state store.  The choices we've 
discussed are:

1) Log an error, update the master key in memory, and continue
2) Log an error, _not_ update the master key in memory, and continue
3) Log an error and tear down the NM

I'd prefer 1) since that is the option that preserves the most work in all 
scenarios I can think of, and I don't know of a scenario where 2) would handle 
it better.  However I could be convinced given the right scenario.  I'd really 
rather avoid 3) since that seems like a severe way to handle the error and 
guarantees work is lost.
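
The three choices can be illustrated with a small sketch. All names here 
(StoreFailureAction, KeyStore, MasterKeyHolder) are hypothetical, and the 
master key is reduced to an int key id purely for illustration:

```java
import java.io.IOException;

/** The three choices discussed above, as a hypothetical enum. */
enum StoreFailureAction { UPDATE_AND_CONTINUE, SKIP_UPDATE_AND_CONTINUE, SHUTDOWN }

/** Hypothetical stand-in for the NM state store. */
interface KeyStore {
  void storeCurrentMasterKey(int keyId) throws IOException;
}

final class MasterKeyHolder {
  private final KeyStore store;
  private final StoreFailureAction action;
  private int currentKeyId;
  private boolean shutdownRequested;

  MasterKeyHolder(KeyStore store, StoreFailureAction action, int initialKeyId) {
    this.store = store;
    this.action = action;
    this.currentKeyId = initialKeyId;
  }

  /** Persists the new key, then applies the configured action on failure. */
  void update(int newKeyId) {
    try {
      store.storeCurrentMasterKey(newKeyId);
      currentKeyId = newKeyId;
    } catch (IOException e) {
      System.err.println("Unable to store master key " + newKeyId + ": "
          + e.getMessage());
      switch (action) {
        case UPDATE_AND_CONTINUE:       // choice 1: memory moves ahead of the store
          currentKeyId = newKeyId;
          break;
        case SKIP_UPDATE_AND_CONTINUE:  // choice 2: keep the old key in memory
          break;
        case SHUTDOWN:                  // choice 3: ask the caller to stop the NM
          shutdownRequested = true;
          break;
      }
    }
  }

  int currentKeyId() { return currentKeyId; }

  boolean shutdownRequested() { return shutdownRequested; }
}
```

With choice 1 the in-memory key always tracks the RM, so the node keeps 
authenticating even while the store is failing.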

Oh there is one more handling scenario we briefly discussed where we flag the 
NM as undesirable.  When that occurs we don't shoot the containers that are 
running, but we avoid adding new containers since the node is having issues 
(i.e.: a drain-decommission).  I feel that would be a separate JIRA since it 
needs YARN-914, and we'd still need to decide how to handle the error until the 
decommission is complete (i.e.: choice 1 or 2 above).
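
As a rough illustration of the "undesirable node" behaviour, the sketch below 
keeps running containers alive but refuses new ones once the node is flagged. 
DrainingNode and its methods are hypothetical and do not reflect the YARN-914 
design:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a node that drains instead of shutting down. */
final class DrainingNode {
  private final Set<String> running = new HashSet<>();
  private boolean draining;

  /** Flags the node, e.g. after a state store error. */
  synchronized void markUndesirable() {
    draining = true;
  }

  /** Starts a container unless the node is draining. */
  synchronized boolean tryStart(String containerId) {
    if (draining) {
      return false; // no new work on a node that is having issues
    }
    return running.add(containerId);
  }

  /** Records that a running container has finished. */
  synchronized void complete(String containerId) {
    running.remove(containerId);
  }

  /** True once the node is flagged and no containers remain. */
  synchronized boolean isDrained() {
    return draining && running.isEmpty();
  }

  synchronized int runningCount() {
    return running.size();
  }
}
```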



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045522#comment-14045522
 ] 

Junping Du commented on YARN-1341:
--

[~jlowe], I agree we should treat each inconsistent-state case specifically, 
and it is good that we have already had many discussions covering cases that 
may go beyond the work in this JIRA. I'd prefer to continue these discussions 
in a separate JIRA until we are sure all scenarios are covered properly 
(without blocking this JIRA's work). What do you think?   



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042559#comment-14042559
 ] 

Jason Lowe commented on YARN-1341:
--

bq. As far as I know, RM restart doesn't track this because these metrics will 
be recovered during event recovery in RM restart. In the current NM restart, 
some metrics could be lost, e.g. allocatedContainers. I think we should either 
count them back as part of events during recovery or persist them. Thoughts?

Not all of the RM metrics will be recovered, correct?  RPC metrics will be 
zeroed since those aren't persisted (nor should they be, IMHO).  Aggregate 
containers allocated/released in the queue metrics will be wrong since the RM 
restart work, by design, doesn't store per-container state.  If the cluster 
stays up too long then apps submitted/completed/failed/killed will not be 
correct, as I believe it will only count the applications that haven't been 
reaped due to retention policies.  Anyway this is outside the scope of this 
JIRA, and I'll file a separate JIRA underneath the YARN-1336 umbrella to 
discuss what we should do about NM metrics and restart.

bq. If so, how about we don't apply these changes until they can be 
persisted? That way the state store and the NM's current state stay 
consistent. Even if we choose to fail the NM, we can still load the state and 
recover the work. 

Again I think this is a case-by-case thing.  For the RM master key, I'd rather 
keep going with the current master key and hope the next key update is able to 
persist (e.g.: a full disk where the state is stored that is later cleared up) 
rather than ditch the new key update and risk bringing down the NM because it 
can no longer keep talking to the RM or AMs.  As I mentioned earlier, the 
consequence of failing to persist the RM master key or the master key used by 
an AM is that _if_ the NM happens to restart then some AMs _might_ not be able 
to authenticate with the NM until they get updated to the new master key.  If we 
take down the NM or keep going but fail to update the master key in memory then 
this seems purely worse.  The opportunity for error has widened, but I don't 
see any advantage gained by doing so.

bq. Do we expect some operations to fail while other operations succeed? If 
this means the store is only briefly unavailable, we can just handle it by 
adding retries. If not, we should expect the other, fatal operations to fail 
soon enough, in which case logging the error and moving on in non-fatal 
operations doesn't make much difference. No? 

I don't expect immediate retry to help, and if the state store implementation 
is such that immediate retry is likely to help then the state store 
implementation should do that directly before throwing the error rather than 
relying on the upper-layer code to do so.  However I do expect there to be 
common failure modes where the error state is temporary but not in the 
immediate sense (e.g.: the full disk scenario).  And although an NM can't 
launch containers without a working state store, there's still a lot of useful 
stuff an NM can do with a broken state store -- report status of active 
containers, serve up shuffle data, etc.  So far I don't think any of the state 
store updates should result in a teardown of the NM if there is a failure, 
although please let me know if you have a scenario where we should.



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-23 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041077#comment-14041077
 ] 

Junping Du commented on YARN-1341:
--

bq. Yes, applications should be like containers. If we fail to store an 
application start in the state store then we should fail the container launch 
that triggered the application to be added. This already happens in the current 
patch for YARN-1354. If we fail to store the completion of an application then 
worst-case we will report an application to the RM on restart that isn't 
active, and the RM will correct the NM when it re-registers.
That makes sense. I guess we should do additional work to check that the 
behavior is as expected.

bq. I wasn't planning on persisting metrics during restart, as there are quite 
a few (e.g.: RPC metrics, etc.), and I'm not sure it's critical that they be 
preserved across a restart. Does RM restart do this or are there plans to do so?
I think these metrics are important, especially for users' monitoring tools, 
and we should keep this info consistent across restarts. As far as I know, RM 
restart doesn't track this because these metrics will be recovered during event 
recovery in RM restart. In the current NM restart, some metrics could be lost, 
e.g. allocatedContainers. I think we should either count them back as part of 
events during recovery or persist them. Thoughts?

bq. Therefore I don't believe the effort to maintain a stale tag is going to be 
worth it. Also if we refuse to load a state store that's stale then we are 
going to leak containers because we won't try to recover anything from a stale 
state store.
If so, how about we don't apply these changes until they can be 
persisted? That way the state store and the NM's current state stay 
consistent. Even if we choose to fail the NM, we can still load the state and 
recover the work.  

bq. Instead I think we should decide in the various store failure cases whether 
the error should be fatal to the operation (which may lead to it being fatal to 
the NM overall) or if we feel the recovery with stale information is a better 
outcome than taking the NM down. In the latter case we should just log the 
error and move on.
Do we expect some operations to fail while other operations succeed? If 
this means the store is only briefly unavailable, we can just handle it by 
adding retries. If not, we should expect the other, fatal operations to fail 
soon enough, in which case logging the error and moving on in non-fatal 
operations doesn't make much difference. No? 





[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039138#comment-14039138
 ] 

Jason Lowe commented on YARN-1341:
--

bq. The worst case, it seems to me, is: the NM restarts with partial state 
recovered, and the running containers are not aware of this inconsistent 
state, which could cause some weird bugs.

Yes, you're correct.  The worst-case is likely where we come up, fail to 
realize a container is running, and therefore the container leaks.

I think we should handle store errors on a case-by-case basis, based on the 
ramifications of how the system will recover without that information.  For 
containers, a container should fail to launch if state store errors occur to 
mark the container request and/or mark the container launch.  The YARN-1336 
prototype patch already does this when containers are requested and in 
ContainerLaunch.  That way the worst-case scenario for a container is that we 
throw an error for the container request or the container fails before launch 
due to a state store error.  We failed to launch a container, but the whole NM 
doesn't go down.  If we fail to mark the container completed in the store then 
worst-case scenario is that we try to recover a container that isn't there, 
which again will mark the container as failed and we'll report that to the RM.  
If the RM doesn't know about the failed container (because the container/app is 
long gone) then it will just ignore it.
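
A simplified sketch of the "fail the container, not the NM" behaviour described 
above. ContainerStore and Launcher are hypothetical stand-ins and do not mirror 
the YARN-1336 prototype code:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the part of the state store used at launch. */
interface ContainerStore {
  void storeContainerLaunched(String containerId) throws IOException;
}

final class Launcher {
  private final ContainerStore store;
  private final List<String> running = new ArrayList<>();
  private final List<String> failed = new ArrayList<>();

  Launcher(ContainerStore store) {
    this.store = store;
  }

  /**
   * Records the launch in the store first. If that fails, only this
   * container is marked failed; the launcher itself keeps working.
   */
  boolean launch(String containerId) {
    try {
      store.storeContainerLaunched(containerId);
    } catch (IOException e) {
      System.err.println("Unable to record launch of " + containerId + ": "
          + e.getMessage());
      failed.add(containerId); // would be reported to the RM as failed
      return false;
    }
    running.add(containerId);
    return true;
  }

  List<String> running() { return running; }

  List<String> failed() { return failed; }
}
```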

For deletion service, if we fail to update the store then we may fail to delete 
something when we recover if we happened to restart in-between.  If we ignore 
the error then it's very likely the NM will _not_ restart before the deletion 
time expires and the file is deleted.  However if we tear down the NM on a 
store error then we will also fail to delete it when the NM restarts later 
since we failed to record it, meaning we made things purely worse -- we lost 
work _and_ leaked the thing we were supposed to delete.  Therefore for deletion 
tasks I think the current behavior is appropriate.

For localized resources, failing to update the store means we could end up 
leaking a resource or thinking a resource is there when it's really not.  The 
latter isn't a huge problem because when we try to reference the resource again 
it checks if it's there, and if it isn't it re-localizes it.  Not knowing 
a resource is there is a bigger issue, and there are a couple of ways to tackle 
that one -- either fail the localization of the resource when the state store 
error occurs or have the NM scan the local resource directories for unknown 
resources when it recovers.
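
The directory-scan option could look roughly like the sketch below. 
LocalResourceScanner is a hypothetical helper that assumes a flat directory 
layout, which is a simplification of the real NM local directories:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

final class LocalResourceScanner {
  /**
   * Lists the entries directly under localDir that the recovered state does
   * not know about. The known paths must be expressed relative to the same
   * localDir (e.g. localDir.resolve("name")) for the comparison to match.
   */
  static Set<Path> findUnknown(Path localDir, Set<Path> knownResources)
      throws IOException {
    Set<Path> unknown = new TreeSet<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(localDir)) {
      for (Path entry : entries) {
        if (!knownResources.contains(entry)) {
          unknown.add(entry); // candidate for cleanup after recovery
        }
      }
    }
    return unknown;
  }
}
```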

For the RM master key, I see it very similar to the deletion task case.  If we 
fail to store it then the NM will update it in memory, and can keep going.  If 
we restart without recovering an older key (the current key will be obtained 
when the NM re-registers with the RM) then we may fail to let AMs connect that 
only have an older key.  Containers that were still on the NM will still 
continue.  If we take down the NM when the store hiccups then we lose work 
which seems worse than a possibility the AM could fail to connect to the NM 
(which can and does already happen today due to network cuts, etc.) 

bq.  Maybe we could add a stale tag to the NMStateStore, set it when a store 
failure happens, and never load a stale store.

If we had an error storing then we're likely to have the same error trying to 
store a stale tag, or am I misunderstanding the proposal?  Also as I mentioned 
above, there are many cases where a partial recovery isn't a bad thing as the 
system can recover via other means (e.g.: trying to recover a container that 
already completed should be benign, trying to delete a container directory 
that's already deleted is benign, etc.).



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039206#comment-14039206
 ] 

Junping Du commented on YARN-1341:
--

Thanks [~jlowe] for the detailed explanation here! I totally agree that we 
should deal with this case by case, and I appreciate your analysis of the above 
cases. I think there are still other cases we should double-check, some of 
which may suffer more from inconsistency.
- Application state - If we fail to store an application update, i.e. from 
init to finish, then we get the wrong application state after recovery. 
- NodeManagerMetrics - The NM metrics will get messed up if only partially 
updated. (We don't have a JIRA to store/recover these yet, do we?)
On the side effects of bringing the NM down, as in the deletion service case: I 
think we can just clean up these directories (as we want to do in node 
decommission cases).
About the stale tag on NMStateStore - I don't mean to put it in the 
NMStateStore, but I haven't thought clearly about where it should go - maybe we 
can persist it on local disk directly, or send it to the RM and retrieve it at 
NM registration? 



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039235#comment-14039235
 ] 

Jason Lowe commented on YARN-1341:
--

bq. Application state - If we fail to store an application update, i.e. from 
init to finish, then we get the wrong application state after recovery.

Yes, applications should be like containers.  If we fail to store an 
application start in the state store then we should fail the container launch 
that triggered the application to be added.  This already happens in the 
current patch for YARN-1354.  If we fail to store the completion of an 
application then worst-case we will report an application to the RM on restart 
that isn't active, and the RM will correct the NM when it re-registers.

bq. NodeManagerMetrics - The NM metrics will get messed up if only partially 
updated.

I wasn't planning on persisting metrics during restart, as there are quite a 
few (e.g.: RPC metrics, etc.), and I'm not sure it's critical that they be 
preserved across a restart.  Does RM restart do this or are there plans to do 
so?

bq. About the stale tag on NMStateStore - I don't mean to put it in the 
NMStateStore, but I haven't thought clearly about where it should go - maybe we 
can persist it on local disk directly, or send it to the RM and retrieve it at 
NM registration?

I think in most cases the attempt to update the stale tag, even if it's 
separate from the NMStateStore, will often fail in a similar way when the state 
store fails (e.g.: full local disk, read-only filesystem, etc.).  Therefore I 
don't believe the effort to maintain a stale tag is going to be worth it.  Also 
if we refuse to load a state store that's stale then we are going to leak 
containers because we won't try to recover anything from a stale state store.

Instead I think we should decide in the various store failure cases whether the 
error should be fatal to the operation (which may lead to it being fatal to the 
NM overall) or if we feel the recovery with stale information is a better 
outcome than taking the NM down.  In the latter case we should just log the 
error and move on.



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037656#comment-14037656
 ] 

Junping Du commented on YARN-1341:
--


bq. I'm not sure I understand what you're requesting. Recovering the NM tokens 
is one line of code (3 if we count the if canRecover part), and recovering 
the container tokens in YARN-1342 will add one more line for that (inside the 
same if canRecover block). I went ahead and factored this into a separate 
method, however I'm not sure it matches what you were expecting as I don't see 
where we're saving duplicated code. If what's in the updated patch isn't what 
you expected, please provide some sample pseudo-code to demonstrate how we can 
avoid duplication of code.
I think it is fine for now. However, I would like to refactor 
NodeManager#serviceInit() a bit once we finish all this recovery work, to avoid 
some duplication, e.g. around createNMContext(), where we set some handlers 
more than once. Anyway, we can do this later.

bq. The problem with throwing an exception is what to do with the exception – 
do we take down the NM? That seems like a drastic answer since the NM will 
likely chug along just fine without the key stored. It only becomes a problem 
when the NM restarts and restores an old key. However if we rollback the old 
key here then we take that only-breaks-if-we-happened-to-restart case and make 
it an always-breaks scenario. Eventually the old key will no longer be valid to 
the RM, and none of the AMs will be able to authenticate to the NM. Therefore I 
thought it would be better to log the error, press onward, and hope we don't 
restart before we store a valid key again (maybe store error was transient) 
rather than either take down the NM or have things start failing even without a 
restart
We already have a similar tradeoff on the RM side: if any exception happens in 
RMStore then it will bring down the RM. In the NM case, if levelDB stops 
working, I think we should bring the NM down to avoid any inconsistency after 
NM restart. Although I am not sure what weird things could happen in case of 
inconsistency here, considering it is cheaper to bring down an NM, we should 
play it safer in our case than in the RM's. Actually, I brought up some 
thoughts on taking more risk on the RM side in YARN-2019, which aims to reduce 
RM service downtime. But here, I prefer to be safer. Jason, what do you think? 
 



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037868#comment-14037868
 ] 

Junping Du commented on YARN-1341:
--

bq.  Restarts should be rare, and I'd rather not force a loss of work by taking 
the NM down instantly when the state store hiccups.
Yes. But considering the rolling upgrade case, restarts should be much more 
frequent than state store failures (correct me if I am wrong, as I am not a 
levelDB expert). In this case, we should always expect some work loss: even if 
we don't bring the NM down now, we will suffer after the NM restarts during the 
upgrade.

bq.  If the state store is missing some things, we might not be able to recover 
a localized resource, a token, a container, or possibly anything at all.
I am not worried about losing them all, but if we can only partially recover 
them, would that become a problem and break some assumptions we have? I don't 
know. But this seems to make things more complicated.

bq.  in the worst-case, the state store is so corrupted on startup that we 
don't even survive the NM restart and the NM crashes, which would have an end 
result just like if we took it down when the state store failed.
I am not sure this is the worst case. The worst case, it seems to me, is: the 
NM restarts with partial state recovered, and the running containers are not 
aware of this inconsistent state, which could cause some weird bugs. I am not 
sure how likely that is to happen here, please 

bq.  Therefore I'd rather not guarantee that we'll lose work by crashing the NM 
on any store error and instead try to preserve the work we have. The NM could 
theoretically recover (e.g.: if the error is transient then the next RM key 
store could succeed). If we take the NM down immediately then we're 
guaranteeing the work is lost. Is that really better?
I think it is better to guarantee the work gets lost, as the user's expectation 
is then consistent. We don't know when a new token from the RM will arrive to 
replace the stale one, so preserving the work would only succeed by luck. Users 
shouldn't expect work to still be preserved after an NM restart if the state 
store has failed at some point.

bq. Maybe a better approach is to have errors like this trigger an unhealthy 
state for the NM when we have the ability to do a graceful decommission. 
I agree. This could be a better approach.

Overall, I agree that we can just log the error here without bringing the NM 
down (otherwise we would have to change the previous code that updates 
localizedResources/deletionServices), for the reasons you specified above. 
However, to avoid loading inconsistent state and to manage users' expectations, 
I think we shouldn't allow the state to be loaded again if a store failure has 
occurred. Maybe we could add a stale tag to the NMStateStore, set it when a 
store failure happens, and never load a stale store. [~jlowe], what do you think?
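
One way to picture the stale tag is a marker file next to the state store, as 
in the sketch below. StaleMarker and the file name "STALE" are hypothetical, 
and writing the marker can of course fail for the same reason the store write 
failed (e.g. a full disk):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical marker-file implementation of the stale tag idea. */
final class StaleMarker {
  private final Path marker;

  StaleMarker(Path stateDir) {
    // "STALE" is an illustrative file name, not an existing convention.
    this.marker = stateDir.resolve("STALE");
  }

  /** Records that at least one store operation has failed. */
  void markStale() throws IOException {
    if (!Files.exists(marker)) {
      Files.createFile(marker);
    }
  }

  /** Recovery would skip loading the store when this returns true. */
  boolean isStale() {
    return Files.exists(marker);
  }
}
```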



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-18 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036592#comment-14036592
 ] 

Junping Du commented on YARN-1341:
--

Thanks for updating the patch, [~jlowe]! 
Some minor comments:
The change in BaseContainerTokenSecretManager.java is not necessary and I 
believe that belongs to YARN-1342. Let’s remove it for this patch.

In NodeManager.java,
{code}
     NMTokenSecretManagerInNM nmTokenSecretManager =
-        new NMTokenSecretManagerInNM();
+        new NMTokenSecretManagerInNM(nmStore);
+
+    if (nmStore.canRecover()) {
+      nmTokenSecretManager.recover(nmStore.loadNMTokenState());
+    }
{code}
Can we consolidate this code into a separate method together with 
NMContainerTokenSecretManager? We will do a similar thing to recover the 
ContainerToken stuff, which would otherwise duplicate code.
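
A rough sketch of what such a consolidated method might look like, using 
simplified stand-in types rather than the real Hadoop classes. TokenStore, 
TokenManager, NodeManagerSketch and loadContainerTokenState() are all 
hypothetical, and the recovered state is reduced to a String:

```java
import java.io.IOException;

/** Hypothetical stand-in for the NM state store. */
interface TokenStore {
  boolean canRecover();
  String loadNMTokenState() throws IOException;
  String loadContainerTokenState() throws IOException; // assumed, per YARN-1342
}

/** Hypothetical stand-in for a token secret manager. */
final class TokenManager {
  private String recovered;

  void recover(String state) {
    recovered = state;
  }

  String recoveredState() {
    return recovered;
  }
}

final class NodeManagerSketch {
  /** Recovers both token managers behind a single canRecover() check. */
  static void recoverTokens(TokenStore store, TokenManager nmTokens,
      TokenManager containerTokens) throws IOException {
    if (store.canRecover()) {
      nmTokens.recover(store.loadNMTokenState());
      containerTokens.recover(store.loadContainerTokenState());
    }
  }
}
```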

In NMTokenSecretManagerInNM.java, 
{code}
+  private void updateCurrentMasterKey(MasterKeyData key) {
+    super.currentMasterKey = key;
+    try {
+      stateStore.storeNMTokenCurrentMasterKey(key.getMasterKey());
+    } catch (IOException e) {
+      LOG.error("Unable to update current master key in state store", e);
+    }
+  }
+
+  private void updatePreviousMasterKey(MasterKeyData key) {
+    previousMasterKey = key;
+    try {
+      stateStore.storeNMTokenPreviousMasterKey(key.getMasterKey());
+    } catch (IOException e) {
+      LOG.error("Unable to update previous master key in state store", e);
+    }
+  }
{code}
Is logging an error here enough when the store fails? If the master key 
is updated but not persisted, it could cause inconsistency on recovery. 
I think we should throw an exception here if the store fails and 
roll back the key just set. Thoughts?

 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, 
 YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch








[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036853#comment-14036853
 ] 

Hadoop QA commented on YARN-1341:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12651342/YARN-1341v6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4022//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4022//console

This message is automatically generated.



[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-17 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034021#comment-14034021
 ] 

Junping Du commented on YARN-1341:
--

[~jlowe], thanks for the patch here. I am currently reviewing it, and it looks 
like some code, e.g. LeveldbIterator and NMStateStoreService, has already been 
committed in other patches. Would you resync the patch against trunk? Thanks!

 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, 
 YARN-1341v4-and-YARN-1987.patch








[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034588#comment-14034588
 ] 

Hadoop QA commented on YARN-1341:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650914/YARN-1341v5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4017//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4017//console

This message is automatically generated.

 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, 
 YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch








[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985928#comment-13985928
 ] 

Hadoop QA commented on YARN-1341:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12642696/YARN-1341v4-and-YARN-1987.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3667//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3667//console

This message is automatically generated.

 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, 
 YARN-1341v4-and-YARN-1987.patch








[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-04-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963581#comment-13963581
 ] 

Hadoop QA commented on YARN-1341:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12639280/YARN-1341v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3534//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3534//console

This message is automatically generated.

 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch








[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923276#comment-13923276
 ] 

Hadoop QA commented on YARN-1341:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633251/YARN-1341.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3283//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3283//console

This message is automatically generated.

 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1341.patch








[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923372#comment-13923372
 ] 

Hadoop QA commented on YARN-1341:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633265/YARN-1341v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3285//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3285//console

This message is automatically generated.

 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1341.patch, YARN-1341v2.patch





