[ 
https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764689#comment-13764689
 ] 

Jason Lowe commented on YARN-540:
---------------------------------

JobClient is the "standard" API.  I don't mean to imply we shouldn't try to 
improve that situation, rather that there are many out-of-band notifications in 
use and therefore fixing JobClient doesn't solve the problem in the general 
sense.

Job end notification (see mapreduce.job.end-notification.url) is another 
mechanism used to notify clients of job completion.  Currently this is done 
before unregistering, but we could move it to after unregistering.  The failure 
mode then changes such that an AM that crashes after unregistering but before 
notifying could end up never notifying a client because the RM would not retry.
However, job end notification is currently best-effort and not guaranteed, and 
most frameworks I'm familiar with that are using it have a polling fallback 
(via something like JobClient) in case the notification fails to arrive.
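
To make the polling-fallback pattern concrete, here is a minimal client-side sketch.  It is not taken from any particular framework: the class name, method name, and parameters are hypothetical.  The notification is modeled as a CompletableFuture that some HTTP callback handler would complete, and the poller is a Supplier standing in for a status check through something like JobClient, assumed to return null while the job is still running.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical helper: prefer the pushed notification, fall back to polling. */
public class JobCompletionWaiter {

    /**
     * Waits up to notifyTimeoutMs for the job end notification. If it does not
     * arrive (or the callback failed), polls up to maxPolls times.
     *
     * @param notification completed by the (assumed) callback handler with the final status
     * @param poller       assumed status check; returns null while the job is still running
     * @return the final status, or null if it could not be determined
     */
    public static String awaitFinalStatus(
            CompletableFuture<String> notification,
            Supplier<String> poller,
            long notifyTimeoutMs,
            long pollIntervalMs,
            int maxPolls) throws InterruptedException {
        try {
            // Fast path: the best-effort notification actually arrived.
            return notification.get(notifyTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // Notification lost or failed; fall through to polling.
        }
        for (int i = 0; i < maxPolls; i++) {
            String status = poller.get();
            if (status != null) {
                return status;
            }
            Thread.sleep(pollIntervalMs);
        }
        return null; // caller decides how to treat "unknown"
    }
}
```

With a fallback like this in place, an AM that crashes between unregistering and notifying only delays the client until the polling path picks up the final status.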
                
> Race condition causing RM to potentially relaunch already unregistered AMs on 
> RM restart
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-540
>                 URL: https://issues.apache.org/jira/browse/YARN-540
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, 
> YARN-540.4.patch, YARN-540.5.patch, YARN-540.6.patch, YARN-540.patch, 
> YARN-540.patch
>
>
> When a job succeeds and successfully calls finishApplicationMaster, and the RM 
> then shuts down and restarts, the dispatcher is stopped before it can process 
> the REMOVE_APP event. The next time the RM comes back, it will reload the 
> existing state files even though the job has succeeded.
