[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908585#comment-13908585
 ] 

Bikas Saha commented on YARN-1410:
----------------------------------

We are not going to save the retry cache anywhere. To support the retry cache 
after failover, the NN stores some retry information (possibly 
client-id/call-id, though I am not sure) in the edit log entry for that 
operation. When that edit log entry is recovered upon failover, the retry 
information can be re-built. Similarly, in our case, for the example in this 
jira, that information can be stored in AppSubmissionContextData (the object 
that gets stored). Since we are piggy-backing on the existing storage flow, 
there should be no new async/sync issues. After recovery, if we manage to 
recover that context, then we get back to where we were before. If we do not 
recover it, then we will accept the new submission (as done in this jira).
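
A minimal sketch of the idea, not the actual patch. All class and method names 
here (RetryKey, StoredSubmission, ResourceManagerSketch, SubmitResult) are 
hypothetical, and the in-memory map only stands in for the real state store. 
It assumes the retry information is a client-id/call-id pair, which is a guess 
as noted above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical retry identifier: one (clientId, callId) pair per RPC attempt. */
final class RetryKey {
    final String clientId;
    final int callId;

    RetryKey(String clientId, int callId) {
        this.clientId = clientId;
        this.callId = callId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RetryKey)) {
            return false;
        }
        RetryKey other = (RetryKey) o;
        return callId == other.callId && clientId.equals(other.clientId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(clientId, callId);
    }
}

/** Hypothetical stand-in for the stored object; carries the retry info with it. */
final class StoredSubmission {
    final String appId;
    final RetryKey retryKey;

    StoredSubmission(String appId, RetryKey retryKey) {
        this.appId = appId;
        this.retryKey = retryKey;
    }
}

enum SubmitResult { ACCEPTED, ALREADY_SUBMITTED }

/** Hypothetical RM: the retry cache is never persisted, only rebuilt on recovery. */
final class ResourceManagerSketch {
    private final Map<String, StoredSubmission> stateStore;
    private final Map<RetryKey, String> retryCache = new HashMap<>();

    ResourceManagerSketch(Map<String, StoredSubmission> stateStore) {
        this.stateStore = stateStore;
        // Recovery: rebuild the in-memory retry cache from whatever was stored.
        for (StoredSubmission s : stateStore.values()) {
            retryCache.put(s.retryKey, s.appId);
        }
    }

    SubmitResult submit(String appId, RetryKey key) {
        if (retryCache.containsKey(key)) {
            // Same call seen before (possibly by the previous RM): treat as a retry.
            return SubmitResult.ALREADY_SUBMITTED;
        }
        // Retry info piggy-backs on the existing store write; no separate flow.
        stateStore.put(appId, new StoredSubmission(appId, key));
        retryCache.put(key, appId);
        return SubmitResult.ACCEPTED;
    }
}
```

If the new RM recovers the stored entry, a retried submission is recognized as 
a duplicate. If the entry was never stored, the cache has nothing for that 
call and the submission is simply accepted as new.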

> Handle client failover during 2 step client API's like app submission
> ---------------------------------------------------------------------
>
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
> YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the RM.
> The client may have obtained an appId from an RM, the RM may have failed 
> over, and the client may submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create 
> app id) the new RM may reject the app submission resulting in unexpected 
> failure on the client side.
> The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
