[jira] [Commented] (YARN-1815) RM should recover only Managed AMs

2014-03-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941073#comment-13941073
 ] 

Karthik Kambatla commented on YARN-1815:


Even if the UAM finishes successfully, there is no way for the RM to know. At 
least, not until YARN-556. Today, the RM tries to recover the app, but can't 
recover the UAM, and the corresponding RMApp transitions to FAILED after a while. 

This JIRA only avoids those recovery attempts and marks the app as FAILED 
early. 
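
To illustrate the behaviour described above, here is a minimal sketch of the 
decision on RM restart. It is not the actual patch: the class, enum and method 
names are hypothetical, and the only input assumed is whether the stored app 
uses an unmanaged AM.

```java
/** Hypothetical sketch; these are not the real ResourceManager classes. */
final class UamRecoverySketch {

    /** What the RM does with a stored app after a restart (assumed names). */
    enum RecoveryAction { ATTEMPT_RECOVERY, MARK_FAILED }

    /**
     * Apps with a managed AM go through normal recovery. Apps with an
     * unmanaged AM cannot be recovered by the RM, so they are marked
     * FAILED right away instead of after a failed recovery attempt.
     */
    static RecoveryAction onRestart(boolean unmanagedAm) {
        return unmanagedAm
            ? RecoveryAction.MARK_FAILED
            : RecoveryAction.ATTEMPT_RECOVERY;
    }

    public static void main(String[] args) {
        System.out.println("managed AM   -> " + onRestart(false));
        System.out.println("unmanaged AM -> " + onRestart(true));
    }
}
```

The end state for an unmanaged AM is the same as today (FAILED); the sketch 
only shows that the recovery attempt is skipped.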

 RM should recover only Managed AMs
 --

 Key: YARN-1815
 URL: https://issues.apache.org/jira/browse/YARN-1815
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, 
 yarn-1815-2.patch, yarn-1815-2.patch


 RM should not recover unmanaged AMs until YARN-1823 is fixed. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1815) RM should recover only Managed AMs

2014-03-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933580#comment-13933580
 ] 

Karthik Kambatla commented on YARN-1815:


The tests pass locally. Filed YARN-1830 for the TestRMRestart failure; YARN-1591 
covers the TestResourceTrackerService failure.

[~vinodkv] - mind taking a look at the updated patch? 



[jira] [Commented] (YARN-1815) RM should recover only Managed AMs

2014-03-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933639#comment-13933639
 ] 

Jian He commented on YARN-1815:
---

Thanks Karthik for the patch.
For now, it should be fine to move the UAM to the FAILED state, as the UAM does 
not save its final state and RM restart doesn't support UAMs. The core change 
looks good.

Test case: we need a more thorough test case, using two MockRMs like the ones 
in TestRMRestart, to verify that the UAM is moved to the FAILED state after the 
RM restarts. The bigger problem is that if an unmanaged application is not added 
back to completedApps in RMAppManager after RM restart via the FinalTransition, 
it will never be removed from the state store. We remove applications from the 
state store when completedApps in RMAppManager grows beyond the max-app-limit.
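
The cleanup concern above can be sketched as follows. This is a simplified 
model, not the RMAppManager code: the class, its fields and its methods are 
hypothetical stand-ins, and the state store is reduced to a set of app ids. It 
shows that an app which never enters the completed list is never evicted from 
the store.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a bounded completed-apps list backed by a state store. */
final class CompletedAppsSketch {
    private final int maxCompletedApps;                   // stand-in for max-app-limit
    private final Deque<String> completedApps = new ArrayDeque<>();
    private final Set<String> stateStore = new HashSet<>();

    CompletedAppsSketch(int maxCompletedApps) {
        this.maxCompletedApps = maxCompletedApps;
    }

    /** Record an app in the stand-in state store, as on submission. */
    void storeApp(String appId) {
        stateStore.add(appId);
    }

    /** Track a finished app, then evict the oldest ones beyond the limit. */
    void appCompleted(String appId) {
        completedApps.addLast(appId);
        while (completedApps.size() > maxCompletedApps) {
            String evicted = completedApps.removeFirst();
            stateStore.remove(evicted);                   // the only removal path
        }
    }

    boolean isStored(String appId) {
        return stateStore.contains(appId);
    }

    public static void main(String[] args) {
        CompletedAppsSketch rm = new CompletedAppsSketch(1);
        rm.storeApp("app_1");
        rm.storeApp("app_2");
        rm.storeApp("uam_app");
        rm.appCompleted("app_1");
        rm.appCompleted("app_2");   // pushes app_1 past the limit
        // uam_app was never reported as completed, so it stays in the store.
        System.out.println("app_1 stored:   " + rm.isStored("app_1"));
        System.out.println("uam_app stored: " + rm.isStored("uam_app"));
    }
}
```

In this model the fix amounts to making sure the failed unmanaged app also 
reaches appCompleted, which is what going through the FinalTransition would 
achieve.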



[jira] [Commented] (YARN-1815) RM should recover only Managed AMs

2014-03-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933938#comment-13933938
 ] 

Jian He commented on YARN-1815:
---

bq. it should be fine to move UAM to Failed state as UAM is not saving the 
final state
On second thought, if the UAM has just finished successfully, will it also be 
moved to the FAILED state after RM restart? That doesn't seem right.



[jira] [Commented] (YARN-1815) RM should recover only Managed AMs

2014-03-12 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932232#comment-13932232
 ] 

Karthik Kambatla commented on YARN-1815:


bq. If so, we need to make sure the App state moves to FAILED for apps with 
unmanaged AMs after RM restart.
Good catch, Vinod. Looking into this. 

 RM should recover only Managed AMs
 --

 Key: YARN-1815
 URL: https://issues.apache.org/jira/browse/YARN-1815
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-1815-1.patch


 RM should not recover unmanaged AMs until YARN-1823 is fixed. 





[jira] [Commented] (YARN-1815) RM should recover only Managed AMs

2014-03-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932419#comment-13932419
 ] 

Hadoop QA commented on YARN-1815:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634246/yarn-1815-2.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3335//console

This message is automatically generated.

 RM should recover only Managed AMs
 --

 Key: YARN-1815
 URL: https://issues.apache.org/jira/browse/YARN-1815
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-1815-1.patch, yarn-1815-2.patch


 RM should not recover unmanaged AMs until YARN-1823 is fixed. 





[jira] [Commented] (YARN-1815) RM should recover only Managed AMs

2014-03-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932630#comment-13932630
 ] 

Hadoop QA commented on YARN-1815:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634273/yarn-1815-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3338//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3338//console

This message is automatically generated.



[jira] [Commented] (YARN-1815) RM should recover only Managed AMs

2014-03-11 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931015#comment-13931015
 ] 

Sandy Ryza commented on YARN-1815:
--

+1

 RM should recover only Managed AMs
 --

 Key: YARN-1815
 URL: https://issues.apache.org/jira/browse/YARN-1815
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1815-1.patch


 RM should not recover unmanaged AMs until YARN-1823 is fixed. 





[jira] [Commented] (YARN-1815) RM should recover only Managed AMs

2014-03-11 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931020#comment-13931020
 ] 

Jian He commented on YARN-1815:
---

looks good, +1



[jira] [Commented] (YARN-1815) RM should recover only Managed AMs

2014-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931064#comment-13931064
 ] 

Hadoop QA commented on YARN-1815:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634014/yarn-1815-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3320//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3320//console

This message is automatically generated.



[jira] [Commented] (YARN-1815) RM should recover only Managed AMs

2014-03-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931065#comment-13931065
 ] 

Vinod Kumar Vavilapalli commented on YARN-1815:
---

Wait, that doesn't make sense. Do we store unmanaged AMs? If so, we need to 
make sure the App state moves to FAILED for apps with unmanaged AMs after RM 
restart.
