[ 
https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766926#comment-13766926
 ] 

Bikas Saha commented on YARN-540:
---------------------------------

bq. Because I added a check in RMAppRemovingTransition instead of 
FinalTransition
The check in RMAppRemovingTransition will pass in the normal case because the 
app has unregistered and this is the first call to remove the app. Later, when 
the app container exits, FinalTransition is called, and there is no check at 
that point. So removeapp will be called a second time and the delete will 
throw an exception. Is that not the flow?
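
To make the flow in question concrete, here is a minimal sketch of a double removal against a store that rejects deleting missing state. Everything in it is an assumption for illustration: InMemoryStateStore, storeApp, removeApp and the app id are hypothetical names and are not the actual RMStateStore API or the code in the patch. It only models the sequence described above (a guarded first removal, then an unguarded second one).

```java
import java.util.HashSet;
import java.util.Set;

public class DoubleRemoveSketch {

    /** Hypothetical stand-in for the RM state store; not the real RMStateStore. */
    static class InMemoryStateStore {
        private final Set<String> apps = new HashSet<>();

        void storeApp(String appId) {
            apps.add(appId);
        }

        /** Assumption: deleting state that is no longer stored is an error. */
        void removeApp(String appId) {
            if (!apps.remove(appId)) {
                throw new IllegalStateException("No stored state for " + appId);
            }
        }
    }

    /** Returns true if the second, unguarded removal throws. */
    static boolean secondRemoveThrows() {
        InMemoryStateStore store = new InMemoryStateStore();
        store.storeApp("app_1");

        // First removal: models the unregister path, where the check passes
        // and the stored state is deleted.
        store.removeApp("app_1");

        try {
            // Second removal: models the container-exit path with no check,
            // so the delete runs again against state that is already gone.
            store.removeApp("app_1");
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("second remove throws: " + secondRemoveThrows());
    }
}
```

If the real flow matches this sketch, the second transition would need the same kind of guard as the first, or the removal would need to tolerate missing state.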
                
> Race condition causing RM to potentially relaunch already unregistered AMs on 
> RM restart
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-540
>                 URL: https://issues.apache.org/jira/browse/YARN-540
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, 
> YARN-540.4.patch, YARN-540.5.patch, YARN-540.6.patch, YARN-540.7.patch, 
> YARN-540.7.patch, YARN-540.8.patch, YARN-540.patch, YARN-540.patch
>
>
> When a job succeeds and successfully calls finishApplicationMaster, and the RM 
> then shuts down and restarts, the dispatcher can be stopped before it processes 
> the REMOVE_APP event. The next time the RM comes back, it will reload the 
> existing state files even though the job has succeeded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
