[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946414#comment-13946414
 ] 

Hudson commented on YARN-1852:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #520 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/520/])
YARN-1852. Fixed RMAppAttempt to not resend AttemptFailed/AttemptKilled events 
to already recovered Failed/Killed RMApps. Contributed by Rohith Sharmaks 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580997)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java


 Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs
 -------------------------------------------------------------------------------------

              Key: YARN-1852
              URL: https://issues.apache.org/jira/browse/YARN-1852
          Project: Hadoop YARN
       Issue Type: Bug
       Components: resourcemanager
 Affects Versions: 2.3.0, 2.4.0
         Reporter: Rohith
         Assignee: Rohith
          Fix For: 2.4.0

      Attachments: YARN-1852.2.patch, YARN-1852.3.patch, YARN-1852.patch


 Recovering a failed or killed application throws InvalidStateTransitonException.
 The exceptions are logged during application recovery.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946546#comment-13946546
 ] 

Hudson commented on YARN-1852:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1737 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1737/])
YARN-1852. Fixed RMAppAttempt to not resend AttemptFailed/AttemptKilled events 
to already recovered Failed/Killed RMApps. Contributed by Rohith Sharmaks 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580997)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java




[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946574#comment-13946574
 ] 

Hudson commented on YARN-1852:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1712 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1712/])
YARN-1852. Fixed RMAppAttempt to not resend AttemptFailed/AttemptKilled events 
to already recovered Failed/Killed RMApps. Contributed by Rohith Sharmaks 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580997)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java




[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944881#comment-13944881
 ] 

Hadoop QA commented on YARN-1852:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636314/YARN-1852.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3441//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3441//console

This message is automatically generated.



[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945464#comment-13945464
 ] 

Jian He commented on YARN-1852:
---

LGTM, +1



[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945557#comment-13945557
 ] 

Hudson commented on YARN-1852:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5390 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5390/])
YARN-1852. Fixed RMAppAttempt to not resend AttemptFailed/AttemptKilled events 
to already recovered Failed/Killed RMApps. Contributed by Rohith Sharmaks 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580997)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java




[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-23 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944353#comment-13944353
 ] 

Jian He commented on YARN-1852:
---

Thanks for the patch, Rohith!
The patch looks good. I made a minor modification myself to remove some 
duplicate asserts.



[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-23 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944539#comment-13944539
 ] 

Jian He commented on YARN-1852:
---

Hi [~rohithsharma], I just walked through the code again. We should not send the 
ATTEMPT_FAILED/ATTEMPT_KILLED events if the app is supposed to recover to its 
final state. We should send the events only if the app was not able to recover 
by itself. I think the RMAppImpl.isAppInFinalState check below has a problem: 
it checks against the state the app will move to, but by the time this method 
is called, the app has not yet moved to that state. Could we check against 
RMApp.recoveredFinalState instead?
{code}
// We will replay the final attempt only if last attempt is in final
// state but application is not in final state.
if (rmApp.getCurrentAppAttempt() == appAttempt
    && !RMAppImpl.isAppInFinalState(rmApp)
{code}
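
One way to read this suggestion is sketched below. It is a simplified model 
under stated assumptions, not the actual patch: ReplayGuardSketch, its enum, 
and both methods are hypothetical names, and a null recoveredFinalState stands 
for "the state store held no final state for this app".

```java
// Hypothetical sketch of the two guard variants discussed above.
// None of these names come from the Hadoop code base.
class ReplayGuardSketch {
    enum AppState { NEW, RUNNING, FINISHED, FAILED, KILLED }

    static boolean isFinal(AppState state) {
        return state == AppState.FINISHED
            || state == AppState.FAILED
            || state == AppState.KILLED;
    }

    // Guard based on the app's current state. During recovery the app is
    // still NEW when this runs, so the guard always allows the replay.
    static boolean shouldReplayByCurrentState(boolean isCurrentAttempt,
                                              AppState currentState) {
        return isCurrentAttempt && !isFinal(currentState);
    }

    // Guard based on the final state read back from the store (assumed to be
    // null when no final state was stored). Replay only when the app has no
    // final state of its own to recover to.
    static boolean shouldReplayByRecoveredState(boolean isCurrentAttempt,
                                                AppState recoveredFinalState) {
        return isCurrentAttempt && recoveredFinalState == null;
    }

    public static void main(String[] args) {
        // App stored as KILLED, still NEW while its attempts are recovered.
        System.out.println(shouldReplayByCurrentState(true, AppState.NEW));
        System.out.println(shouldReplayByRecoveredState(true, AppState.KILLED));
    }
}
```

In this model the current-state guard lets the replay through for an app that 
was stored as KILLED, while the recovered-state guard suppresses it and still 
allows the replay when no final state was stored.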



[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941839#comment-13941839
 ] 

Hadoop QA commented on YARN-1852:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12635773/YARN-1852.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3407//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3407//console

This message is automatically generated.



[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-19 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940394#comment-13940394
 ] 

Rohith commented on YARN-1852:
--

Here are the exception stack traces.

For a killed application (state=KILLED):
{noformat}
2014-03-19 14:26:11,618 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: application_1394526371652_0004 with 1 attempts and final state = KILLED
2014-03-19 14:26:11,618 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=Application Finished - Killed TARGET=RMAppManager RESULT=SUCCESS  APPID=application_1394526371652_0003
2014-03-19 14:26:11,618 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Recovering attempt: appattempt_1394526371652_0004_01 with final state: KILLED
2014-03-19 14:26:11,618 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1394526371652_0003,name=Sleep job,user=root,queue=default,state=KILLED,trackingUrl=host-10-18-40-77:45020/cluster/app/application_1394526371652_0003,appMasterHost=N/A,startTime=1394526759247,finishTime=1394527194947,finalStatus=KILLED
2014-03-19 14:26:11,619 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1394526371652_0004_01
2014-03-19 14:26:11,619 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1394526371652_0004_01 State change from NEW to KILLED
2014-03-19 14:26:11,619 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1394526371652_0004 State change from NEW to KILLED
2014-03-19 14:26:11,619 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at KILLED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:632)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:82)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:690)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:674)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:662)
{noformat}

For a failed application (state=FAILED):
{noformat}
2014-03-19 14:26:11,614 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: application_1394528000856_0003 with 2 attempts and final state = FAILED
2014-03-19 14:26:11,614 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1395139734891_0003,name=Sleep job,user=root,queue=d,state=FINISHED,trackingUrl=http://host-10-18-40-77:45020/proxy/application_1395139734891_0003/jobhistory/job/job_1395139734891_0003,appMasterHost=N/A,startTime=1395141914653,finishTime=1395141933121,finalStatus=SUCCEEDED
2014-03-19 14:26:11,614 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Recovering attempt: appattempt_1394528000856_0003_01 with final state: FAILED
2014-03-19 14:26:11,615 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Recovering attempt: appattempt_1394528000856_0003_02 with final state: FAILED
2014-03-19 14:26:11,615 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1394528000856_0003_01 State change from NEW to FAILED
2014-03-19 14:26:11,615 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1394528000856_0003_02
2014-03-19 14:26:11,615 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1394528000856_0003_02 State change from NEW to FAILED
2014-03-19 14:26:11,616 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1394528000856_0003 State change from NEW to FAILED
2014-03-19 14:26:11,616 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_FAILED at FAILED
{noformat}

[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940822#comment-13940822
 ] 

Jian He commented on YARN-1852:
---

This is most likely because we are replaying the attempt's 
BaseFinalTransition logic, which sends a new FAILED/KILLED event when the 
RMApp has already moved to the FAILED/KILLED state. We covered this case for 
the FINISHED state, but it seems we missed it for FAILED and KILLED.
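
The failure mode described here can be reproduced in a few lines. The sketch 
below is a toy model, not the real RMAppImpl or StateMachineFactory: the class 
RecoveryReplaySketch, its enums, and the RECOVER_* events are hypothetical 
names, and the rule that a final state accepts no further events is a 
simplifying assumption made for illustration.

```java
// Toy model of the replay problem; all names here are hypothetical.
class RecoveryReplaySketch {
    enum AppState { NEW, FAILED, KILLED }
    enum AppEvent { RECOVER_FAILED, RECOVER_KILLED, ATTEMPT_FAILED, ATTEMPT_KILLED }

    // Stand-in for InvalidStateTransitonException in this sketch.
    static class InvalidTransition extends RuntimeException {
        InvalidTransition(AppEvent event, AppState state) {
            super("Invalid event: " + event + " at " + state);
        }
    }

    static class App {
        AppState state = AppState.NEW;

        void handle(AppEvent event) {
            // Simplifying assumption: only NEW accepts events.
            if (state != AppState.NEW) {
                throw new InvalidTransition(event, state);
            }
            switch (event) {
                case RECOVER_FAILED:
                case ATTEMPT_FAILED:
                    state = AppState.FAILED;
                    break;
                case RECOVER_KILLED:
                case ATTEMPT_KILLED:
                    state = AppState.KILLED;
                    break;
                default:
                    throw new InvalidTransition(event, state);
            }
        }
    }

    // Recover the app to its stored final state, then replay the final
    // event of its last attempt, as the recovery path in this model does.
    static String recoverAndReplay(AppEvent recoverEvent, AppEvent replayedEvent) {
        App app = new App();
        app.handle(recoverEvent);
        try {
            app.handle(replayedEvent);
            return "ok";
        } catch (InvalidTransition e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(recoverAndReplay(AppEvent.RECOVER_KILLED, AppEvent.ATTEMPT_KILLED));
        System.out.println(recoverAndReplay(AppEvent.RECOVER_FAILED, AppEvent.ATTEMPT_FAILED));
    }
}
```

In this model the replayed event arrives after the app has already reached its 
final state, so the second handle call is rejected with the same "Invalid 
event: ATTEMPT_KILLED at KILLED" text that appears in the reported log.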
