[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880797#comment-13880797 ]

Jian He commented on YARN-1618:
-------------------------------

bq. Is it still the case that RPC servers are started after recovery is 
complete?
It is.
bq. The START should come almost immediately after the RMAppImpl object is 
created in a NEW state during regular app submission. Karthik, are we sure that 
this happened?
Yes, it did.
bq. There is no need for history for an app that was never submitted 
successfully to the RM.
I agree. We don't need to save the final state of the app if the app is not 
even accepted by the RM.
bq. If we don't want the store to be touched until the app is SUBMITTED/ACCEPTED 
(X), we should probably replace the existing NEW_SAVING state with a 
corresponding X_SAVING state, and re-jig the transitions to directly go to 
KILLED/FAILED from any of the states before this X_SAVING state.
Of the two approaches Karthik proposed, I'm in favor of the first one.
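
The point agreed above is that an app the RM never accepted has no entry in the store, so it has no final state to save. A toy Python model can illustrate it. This is a hypothetical sketch and not the RMAppImpl code. `ToyStore`, `ToyApp`, the `persisted` flag, and the reduced state set are assumptions made for illustration. The sketch is not meant to reproduce either of Karthik's two approaches exactly.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical, simplified subset of the app states named in this issue.
    NEW = auto()
    NEW_SAVING = auto()
    SUBMITTED = auto()
    FINAL_SAVING = auto()
    KILLED = auto()


class ToyStore:
    """In-memory stand-in for the state store; rejects updates to missing entries."""

    def __init__(self):
        self.final_states = {}

    def store_app(self, app_id):
        # Initial entry, written while the app is in NEW_SAVING.
        self.final_states[app_id] = None

    def update_final_state(self, app_id, final_state):
        if app_id not in self.final_states:
            # The failure mode from the issue: updating an entry that does not exist.
            raise KeyError(f"no stored entry for {app_id}")
        self.final_states[app_id] = final_state


class ToyApp:
    def __init__(self, app_id, store):
        self.app_id = app_id
        self.store = store
        self.state = State.NEW
        self.persisted = False  # True once the store holds an entry for this app

    def submit(self):
        # NEW -> NEW_SAVING -> SUBMITTED; the store write happens synchronously here.
        self.state = State.NEW_SAVING
        self.store.store_app(self.app_id)
        self.persisted = True
        self.state = State.SUBMITTED

    def kill(self):
        if not self.persisted:
            # Never stored, so there is no history to update: skip FINAL_SAVING.
            self.state = State.KILLED
            return
        # Stored apps record their final state by going through FINAL_SAVING.
        self.state = State.FINAL_SAVING
        self.store.update_final_state(self.app_id, State.KILLED.name)
        self.state = State.KILLED
```

Killing an app while it is still in NEW leaves the store untouched. Killing it after `submit()` updates the entry that already exists. The store therefore never receives an update for an entry it does not hold.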

> Applications transition from NEW to FINAL_SAVING, and try to update 
> non-existing entries in the state-store
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1618
>                 URL: https://issues.apache.org/jira/browse/YARN-1618
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>         Attachments: yarn-1618-1.patch
>
>
> YARN-891 augments the RMStateStore to store information on completed 
> applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
> This leads to the RM trying to update entries in the state-store that do not 
> exist. On ZKRMStateStore, this leads to the RM crashing. 
> Previous description:
> ZKRMStateStore fails to handle updates to znodes that don't exist. For 
> instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
> In these cases, the store should create the missing znode and handle the 
> update.
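
The earlier proposal in the quoted description was that the store should create the missing znode instead of failing the update. That amounts to update-or-create semantics. Below is a minimal sketch under one assumption: a toy in-memory tree stands in for a real ZooKeeper client. `ToyZnodeTree`, `NoNodeError`, `update_or_create`, and the `/rmstore/...` path are hypothetical names.

```python
class NoNodeError(Exception):
    """Raised by the toy tree when a path does not exist."""


class ToyZnodeTree:
    def __init__(self):
        self.nodes = {}

    def create(self, path, data):
        self.nodes[path] = data

    def set_data(self, path, data):
        if path not in self.nodes:
            raise NoNodeError(path)
        self.nodes[path] = data


def update_or_create(tree, path, data):
    """Update `path`, creating it first if it is missing."""
    try:
        tree.set_data(path, data)
    except NoNodeError:
        tree.create(path, data)
```

The toy is single-threaded. With a real ZooKeeper ensemble, another client could create the node between the failed update and the create, so a production version would also have to handle that race. The issue was later re-scoped away from this fix and toward avoiding the NEW to FINAL_SAVING update altogether.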



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
