[ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766926#comment-13766926 ]
Bikas Saha commented on YARN-540:
---------------------------------

bq. Because I added a check in RMAppRemovingTransition instead of FinalTransition

The check in RMAppRemovingTransition will pass in the normal case, because the app has unregistered and this is the first call to remove the app. Later, when the app container exits, FinalTransition is called, and there is no check at that point. So removeapp will be called a second time and the delete will throw an exception. Is that not the flow?

> Race condition causing RM to potentially relaunch already unregistered AMs on
> RM restart
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-540
>                 URL: https://issues.apache.org/jira/browse/YARN-540
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch,
> YARN-540.4.patch, YARN-540.5.patch, YARN-540.6.patch, YARN-540.7.patch,
> YARN-540.7.patch, YARN-540.8.patch, YARN-540.patch, YARN-540.patch
>
>
> When a job succeeds and successfully calls finishApplicationMaster, and the RM
> then shuts down and restarts, the dispatcher is stopped before it can process
> the REMOVE_APP event. The next time the RM comes back, it reloads the existing
> state files even though the job has succeeded.
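
The double-removal flow raised in the comment above can be illustrated with a small, self-contained sketch. This is not the YARN code or any of the attached patches: the StateStore and App classes, their method names, and the "guarded" variant are all hypothetical, made up only to model a check that lives in the removing step while the final step deletes unconditionally.

```java
import java.util.HashSet;
import java.util.Set;

public class DoubleRemoveSketch {

    /** Hypothetical stand-in for the RM state store (not the real YARN API). */
    static class StateStore {
        private final Set<String> apps = new HashSet<>();

        void storeApp(String appId) {
            apps.add(appId);
        }

        /** Strict delete: throws if there is nothing left to delete. */
        void removeApp(String appId) {
            if (!apps.remove(appId)) {
                throw new IllegalStateException("no stored state for " + appId);
            }
        }

        boolean contains(String appId) {
            return apps.contains(appId);
        }
    }

    /** Hypothetical app with two removal points, as described in the comment. */
    static class App {
        final String id;
        final StateStore store;
        boolean unregistered;

        App(String id, StateStore store) {
            this.id = id;
            this.store = store;
        }

        /** Models the removing step: the check passes once the app has unregistered. */
        void onRemoving() {
            if (unregistered) {
                store.removeApp(id);
            }
        }

        /** Models a final step with no check: deletes unconditionally. */
        void onFinalUnguarded() {
            store.removeApp(id);
        }

        /** One possible guard (an assumption, not the patch): delete only if present. */
        void onFinalGuarded() {
            if (store.contains(id)) {
                store.removeApp(id);
            }
        }
    }

    public static void main(String[] args) {
        StateStore store = new StateStore();
        store.storeApp("app_1");

        App app = new App("app_1", store);
        app.unregistered = true;

        app.onRemoving(); // first removal succeeds
        try {
            app.onFinalUnguarded(); // second removal hits the strict delete
        } catch (IllegalStateException e) {
            System.out.println("second remove failed: " + e.getMessage());
        }
        app.onFinalGuarded(); // guarded variant is a no-op when state is already gone
    }
}
```

Whether the real fix should guard the final step, make the delete idempotent, or remove only from one place is a design question for the patch; the sketch only shows why a check in the first step alone does not prevent the second delete.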