[ 
https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762533#comment-13762533
 ] 

Jian He commented on YARN-540:
------------------------------

bq. Why is the default false?
I thought it was safer to default to false and tell the user that this flag is false.

bq. So we check for FINISHING, FINISHED, FAILED, KILLED. This will also allow 
us to not special case the unmanaged AM in the latter half of the same function
Added a separate method that checks for the FINISHING, FINISHED, FAILED, and 
KILLED states. We still need to special-case the unmanaged AM at the end, 
though. Otherwise, when the second unregister call comes in, the following 
check throws an exception, because the unmanaged AM attempt is removed from 
the responseMap immediately:
{code}
if (lastResponse == null) {
  String message = "Application doesn't exist in cache "
      + applicationAttemptId;
  LOG.error(message);
  throw RPCUtil.getRemoteException(message);
}
{code}
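
To make the ordering concrete, here is a minimal, self-contained sketch of the flow described above. It is not the code from the patch: {{AttemptState}}, {{UnregisterSketch}}, {{hasUnregistered}}, and {{finish}} are hypothetical names, a plain map stands in for the responseMap, and an {{IllegalStateException}} stands in for the remote exception. It assumes the unmanaged AM is told "unregistered" on its first call so that it never needs to make a second one.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the attempt states; not the real RM enum.
enum AttemptState { RUNNING, FINISHING, FINISHED, FAILED, KILLED }

final class UnregisterSketch {
    // The four states that mean the attempt has already unregistered.
    private static final Set<AttemptState> DONE_STATES = EnumSet.of(
        AttemptState.FINISHING, AttemptState.FINISHED,
        AttemptState.FAILED, AttemptState.KILLED);

    // attemptId -> last response; a simplified stand-in for responseMap.
    private final Map<String, Object> responseMap = new ConcurrentHashMap<>();

    void register(String attemptId) {
        responseMap.put(attemptId, new Object());
    }

    // The separate state check: true if the attempt is already done.
    static boolean hasUnregistered(AttemptState state) {
        return DONE_STATES.contains(state);
    }

    // Returns true when the caller may stop retrying the unregister call.
    boolean finish(String attemptId, AttemptState state, boolean unmanagedAM) {
        if (hasUnregistered(state)) {
            return true; // repeated call for an attempt that is already done
        }
        if (responseMap.get(attemptId) == null) {
            // Mirrors the cache check quoted above.
            throw new IllegalStateException(
                "Application doesn't exist in cache " + attemptId);
        }
        if (unmanagedAM) {
            // Assumption: the unmanaged attempt leaves the map right away,
            // so report "done" now and avoid a second call hitting the check.
            responseMap.remove(attemptId);
            return true;
        }
        return false; // managed AM: retry until the state check passes
    }
}
```

In this sketch a managed AM keeps retrying until the state check passes, while a second call from an unmanaged AM whose state has not yet advanced would fail the cache lookup, which is why the first call has to answer true.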

bq. Is there a version of delete that will not fail if the file does not exist? 
OR we can have a boolean in RMApp to show that the removal request has already 
been sent.
Added a boolean to check if the removal request has already been sent.
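
A rough illustration of that guard, under stated assumptions: {{RemovalGuardSketch}}, {{requestRemoval}}, and the counter are hypothetical names, and an {{AtomicBoolean}} is used here for a self-contained example, whereas the patch may simply use a plain boolean field under existing synchronization.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class RemovalGuardSketch {
    // Set once the removal request has been sent.
    private final AtomicBoolean removalRequestSent = new AtomicBoolean(false);
    // Counts requests actually issued; stands in for the state-store call.
    private final AtomicInteger removalsIssued = new AtomicInteger();

    // Returns true only for the caller that actually issues the request.
    boolean requestRemoval() {
        if (!removalRequestSent.compareAndSet(false, true)) {
            return false; // already sent; skip the duplicate delete
        }
        removalsIssued.incrementAndGet();
        return true;
    }

    int removalsIssued() {
        return removalsIssued.get();
    }
}
```

With a guard like this, a repeated unregister never triggers a second delete, so it does not matter whether deleting a missing file would fail.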

Also addressed the other comments.
                
> Race condition causing RM to potentially relaunch already unregistered AMs on 
> RM restart
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-540
>                 URL: https://issues.apache.org/jira/browse/YARN-540
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, 
> YARN-540.4.patch, YARN-540.5.patch, YARN-540.6.patch, YARN-540.patch, 
> YARN-540.patch
>
>
> When a job succeeds and successfully calls finishApplicationMaster, and the 
> RM then shuts down and restarts, the dispatcher is stopped before it can 
> process the REMOVE_APP event. The next time the RM comes back, it reloads 
> the existing state files even though the job has succeeded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
