[ https://issues.apache.org/jira/browse/YARN-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625802#comment-13625802 ]
Bikas Saha commented on YARN-534:
---------------------------------

Looks good overall.

Can the message be improved a bit? e.g. "because maxAppAttempts have already been started". Secondly, there needs to be a comment here mentioning YARN-556, because this code needs to change when work-preserving restart happens, right?
{code}
+        LOG.info("Not recovering application " + appState.getAppId() +
+            " due to hit maxAppAttempts limit");
{code}

I don't think this is needed anymore because the capacity scheduler bug is fixed. Earlier, the capacity scheduler had a bug that made applications remain in the pending state.
{code}
+    conf.set(YarnConfiguration.RM_SCHEDULER,
+      "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");
{code}

In the test, can we also start another MockRM that uses the saved app submission context to determine that max attempts has been reached? The global limit will have to be increased at that time.

Can the test stop all started RMs?

Minor suggestion on the code structure in RMAppManager.recover(): IMO, we can use a boolean flag shouldRecover, which will be set to false if the app is unmanaged or max retries is reached. That way we don't have to duplicate the remove logic.

> AM max attempts is not checked when RM restart and try to recover attempts
> --------------------------------------------------------------------------
>
>                 Key: YARN-534
>                 URL: https://issues.apache.org/jira/browse/YARN-534
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-534.1.patch
>
>
> Currently, AM max attempts is only checked when the current attempt fails, to decide whether to create a new attempt. If the RM restarts before the max attempt fails, it does not clean the state store; when the RM comes back, it retries the attempt again.

--
This message is automatically generated by JIRA.
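
The shouldRecover suggestion in the comment could be sketched roughly as follows. This is only an illustration of the control flow, not the actual RMAppManager.recover() code: the StoredApp class, its fields, and the removedFromStore list are hypothetical stand-ins for the real application state and state-store removal, and the exact max-attempts comparison is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RecoverSketch {

    // Hypothetical stand-in for a stored application state; not a YARN class.
    static final class StoredApp {
        final String appId;
        final boolean unmanagedAM;
        final int attemptCount;
        final int maxAppAttempts;

        StoredApp(String appId, boolean unmanagedAM, int attemptCount, int maxAppAttempts) {
            this.appId = appId;
            this.unmanagedAM = unmanagedAM;
            this.attemptCount = attemptCount;
            this.maxAppAttempts = maxAppAttempts;
        }
    }

    // Returns the ids of apps that would be recovered. Ids of skipped apps are
    // appended to removedFromStore, which stands in for the single shared
    // "remove from state store" step.
    static List<String> recover(List<StoredApp> stored, List<String> removedFromStore) {
        List<String> recovered = new ArrayList<>();
        for (StoredApp app : stored) {
            boolean shouldRecover = true;

            if (app.unmanagedAM) {
                // Unmanaged AMs are not recovered.
                shouldRecover = false;
            }
            if (app.attemptCount >= app.maxAppAttempts) {
                // Assumed check: all allowed attempts have already been started.
                // Per the review, this would need revisiting for
                // work-preserving restart (YARN-556).
                shouldRecover = false;
            }

            if (shouldRecover) {
                recovered.add(app.appId);
            } else {
                // The remove logic lives in one place instead of once per condition.
                removedFromStore.add(app.appId);
            }
        }
        return recovered;
    }

    public static void main(String[] args) {
        List<String> removed = new ArrayList<>();
        List<String> recovered = recover(Arrays.asList(
                new StoredApp("app_1", false, 1, 2),
                new StoredApp("app_2", true, 0, 2),
                new StoredApp("app_3", false, 2, 2)), removed);
        System.out.println("recovered=" + recovered + " removed=" + removed);
    }
}
```

With a single flag, each additional "do not recover" condition only sets shouldRecover to false, and the removal path stays in one branch.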