[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhijie Shen updated YARN-514:
-----------------------------

    Attachment: YARN-514.7.patch

This patch contains the following updates:
1. Instead of setting a flag to differentiate a new RMApp from a recovered RMApp, a new event type, RECOVER, is defined. NEW->SUBMITTED happens on RECOVER, while NEW->NEW_SAVING happens on START. StartAppAttemptTransition is invoked on either NEW->SUBMITTED or NEW_SAVING->SUBMITTED. Within this transition, the stored exception is checked only when the event type is APP_SAVED.
2. With this modification, the attempt is started in one place instead of two.
3. The wrong comments in testAppFailedFailed() are fixed.
4. Test cases are simplified. Only the transition from NEW to SUBMITTED is tested, without the redundant subsequent steps.

In addition, I've verified that the delayed application store works correctly on a single-node cluster.

> Delayed store operations should not result in RM unavailability for app
> submission
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-514
>                 URL: https://issues.apache.org/jira/browse/YARN-514
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Zhijie Shen
>         Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch,
> YARN-514.4.patch, YARN-514.5.patch, YARN-514.6.patch, YARN-514.7.patch
>
>
> Currently, app submission is the only store operation performed synchronously
> because the app must be stored before the request returns with success. This
> makes the RM susceptible to blocking all client threads on slow store
> operations, resulting in RM being perceived as unavailable by clients.
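The transition scheme in update 1 can be illustrated with a small, self-contained Java sketch. It is not the ResourceManager code: the real RMAppImpl is built on Hadoop's state machine framework and has many more states and events. The names SketchRMApp, AppState, AppEventType and AppEvent are hypothetical, the store write is assumed to happen asynchronously outside this class, and reacting to a failed store by throwing is an assumption made only for illustration.

```java
import java.util.Objects;

// Hypothetical, heavily reduced state set (not the real RMAppState enum).
enum AppState { NEW, NEW_SAVING, SUBMITTED }

// Hypothetical event types mirroring the ones named in the patch comment.
enum AppEventType { START, RECOVER, APP_SAVED }

final class AppEvent {
    final AppEventType type;
    // Only meaningful for APP_SAVED: non-null if the (assumed async) store failed.
    final Exception storedException;

    AppEvent(AppEventType type, Exception storedException) {
        this.type = Objects.requireNonNull(type);
        this.storedException = storedException;
    }

    AppEvent(AppEventType type) {
        this(type, null);
    }
}

final class SketchRMApp {
    private AppState state = AppState.NEW;
    private int attemptsStarted = 0;

    AppState getState() { return state; }
    int getAttemptsStarted() { return attemptsStarted; }

    void handle(AppEvent event) {
        switch (state) {
            case NEW:
                if (event.type == AppEventType.START) {
                    // New submission: the app waits in NEW_SAVING while the store
                    // write (assumed to run elsewhere, asynchronously) completes,
                    // so the submitting client thread is not blocked on the store.
                    state = AppState.NEW_SAVING;
                } else if (event.type == AppEventType.RECOVER) {
                    // Recovered app is already in the store: go straight to SUBMITTED.
                    startAppAttempt(event);
                } else {
                    throw new IllegalStateException("Unexpected " + event.type + " in " + state);
                }
                break;
            case NEW_SAVING:
                if (event.type == AppEventType.APP_SAVED) {
                    startAppAttempt(event);
                } else {
                    throw new IllegalStateException("Unexpected " + event.type + " in " + state);
                }
                break;
            default:
                throw new IllegalStateException("Unexpected " + event.type + " in " + state);
        }
    }

    // The single place where an attempt is started, in the spirit of
    // StartAppAttemptTransition: reached from both NEW and NEW_SAVING.
    private void startAppAttempt(AppEvent event) {
        // The stored exception is only inspected for APP_SAVED events.
        if (event.type == AppEventType.APP_SAVED && event.storedException != null) {
            // Assumption for this sketch: surface the store failure to the caller.
            throw new RuntimeException("Failed to store application", event.storedException);
        }
        attemptsStarted++;
        state = AppState.SUBMITTED;
    }
}
```

A fresh app needs START followed by APP_SAVED to reach SUBMITTED, while a recovered app reaches SUBMITTED with a single RECOVER event; in both paths startAppAttempt() is the only method that starts an attempt, which is the consolidation described in update 2.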