[jira] [Commented] (YARN-1373) Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps
[ https://issues.apache.org/jira/browse/YARN-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046328#comment-14046328 ]

Vinod Kumar Vavilapalli commented on YARN-1373:
---

Since YARN-1210, we have always had the app and app attempt move to the RUNNING state after an RM restart. That's why this is a dup.

Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps
---

Key: YARN-1373
URL: https://issues.apache.org/jira/browse/YARN-1373
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Bikas Saha
Assignee: Omkar Vinit Joshi

Currently the RM moves recovered app attempts to a terminal recovered state and starts a new attempt. Instead, it should transition the last attempt to a RUNNING state so that the attempt can proceed normally once the running AM has resynced with the ApplicationMasterService (YARN-1365 and YARN-1366). If the RM started the AM container before it died, the AM will still be up and trying to contact the RM. Alternatively, the RM may have died before launching the container. In that case, the RM should wait for the AM liveness period and then issue a kill for the stored master container. It should then transition this attempt to some RECOVER_ERROR state and proceed to start a new attempt.

--
This message was sent by Atlassian JIRA (v6.2#6252)
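The proposal in the issue description amounts to a small timeout-driven state machine. The following is a minimal Java sketch of that proposal as written, not of the code that eventually landed via YARN-1210. Every class, method and field name here (`RecoveredAttemptSketch`, `recover`, `onAmResync`, `checkLiveness`, `amLivenessMs`, `livenessStartMs`) is hypothetical and is not a Hadoop API; `RECOVER_ERROR` is the placeholder state name from the description, and the AM resync is modeled as a single callback rather than the real ApplicationMasterService protocol.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the recovery flow proposed in YARN-1373.
// None of these types exist in Hadoop; the real RM drives RMAppAttemptImpl
// through its own state machine. RECOVER_ERROR is the issue's placeholder name.
public class RecoveredAttemptSketch {

    enum AttemptState { NEW, RUNNING, RECOVER_ERROR }

    static final class Attempt {
        final int attemptId;
        final String masterContainerId; // AM container recorded in the state store (may be null)
        final long livenessStartMs;     // start of the window in which the AM must resync
        AttemptState state;
        boolean resynced;               // set once the AM contacts the restarted RM

        Attempt(int attemptId, String masterContainerId, long livenessStartMs, AttemptState state) {
            this.attemptId = attemptId;
            this.masterContainerId = masterContainerId;
            this.livenessStartMs = livenessStartMs;
            this.state = state;
        }
    }

    private final long amLivenessMs;                       // assumed AM liveness period
    final List<String> killedContainers = new ArrayList<>(); // kill requests issued so far

    RecoveredAttemptSketch(long amLivenessMs) {
        this.amLivenessMs = amLivenessMs;
    }

    // On restart, put the last stored attempt straight into RUNNING instead of
    // a terminal "recovered" state, so a live AM can resync and carry on.
    Attempt recover(int attemptId, String masterContainerId, long nowMs) {
        return new Attempt(attemptId, masterContainerId, nowMs, AttemptState.RUNNING);
    }

    // Called when the AM of a recovered attempt contacts the RM again.
    void onAmResync(Attempt attempt) {
        attempt.resynced = true;
    }

    // Periodic check. If a recovered RUNNING attempt has not resynced within the
    // liveness period, treat it as the case where the RM died before launching
    // the AM container: kill the stored master container, mark the attempt
    // RECOVER_ERROR and return a fresh attempt. Otherwise return it unchanged.
    Attempt checkLiveness(Attempt attempt, long nowMs) {
        if (attempt.state != AttemptState.RUNNING || attempt.resynced
                || nowMs - attempt.livenessStartMs < amLivenessMs) {
            return attempt;
        }
        if (attempt.masterContainerId != null) {
            killedContainers.add(attempt.masterContainerId);
        }
        attempt.state = AttemptState.RECOVER_ERROR;
        return new Attempt(attempt.attemptId + 1, null, nowMs, AttemptState.NEW);
    }
}
```

In this sketch the stored master container is killed whenever the window expires without a resync, since after a restart the RM presumably cannot distinguish an AM container that was never launched from one whose AM failed to come back.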
[jira] [Commented] (YARN-1373) Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps
[ https://issues.apache.org/jira/browse/YARN-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034700#comment-14034700 ]

Bikas Saha commented on YARN-1373:
---

Sorry, I am not clear how this is a dup. This JIRA tracks new RM behavior: a recovered RMAppImpl/RMAppAttemptImpl whose app is actually still running should transition to a RUNNING state instead of a terminal recovered state. This ensures that the state machines are in the correct state for the running AM to resync and continue running. It is not related to killing the app master process on the NM.