[ 
https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801504#comment-13801504
 ] 

Bikas Saha commented on YARN-1121:
----------------------------------

The state store has its own dispatch queue, so in RMStateStore.stop() we need 
to block until the pending events get stored. Ideally, before stopping the 
state store, the RM will shut down all RPC services so that no new externally 
generated store events get added to the system. 
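
To make the idea concrete, here is a minimal sketch of a store with its own 
dispatch queue whose stop() blocks until everything already queued has been 
handled. It is not the actual RMStateStore code: the class DrainingStore, its 
methods, and the sentinel approach are all hypothetical names and choices made 
for illustration, and "storing" an event is reduced to incrementing a counter.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a state store with its own dispatch queue. */
class DrainingStore {
    // Unique sentinel object; compared by identity so no real event can match it.
    private static final String STOP_MARKER = new String("stop");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger stored = new AtomicInteger();
    private final Object lock = new Object();
    private boolean accepting = false; // guarded by lock
    private Thread dispatcher;

    void start() {
        synchronized (lock) {
            accepting = true;
        }
        dispatcher = new Thread(() -> {
            try {
                while (true) {
                    String event = queue.take();
                    if (event == STOP_MARKER) {
                        return; // everything queued before stop() has been handled
                    }
                    stored.incrementAndGet(); // placeholder for the real write
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "store-dispatcher");
        dispatcher.start();
    }

    /** Queues an event; returns false if the store is not accepting events. */
    boolean handle(String event) {
        synchronized (lock) {
            if (!accepting) {
                return false;
            }
            queue.add(event);
            return true;
        }
    }

    /** Rejects new events, then blocks until the pending ones are stored. */
    void stop() {
        synchronized (lock) {
            if (!accepting) {
                return; // never started or already stopped
            }
            accepting = false;
            queue.add(STOP_MARKER); // lands behind every accepted event
        }
        try {
            dispatcher.join(); // wait for the queue to drain
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int storedCount() {
        return stored.get();
    }
}
```

In this sketch, a caller would invoke start(), pass events to handle(), and 
then call stop(), which returns only after every accepted event has been 
counted. The lock around handle() and stop() plays the role of shutting down 
the RPC services first: once stop() begins, no new event can be queued behind 
the sentinel. A real shutdown path would probably also want a bound on how long 
the join may wait.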

One of the main use cases for this jira is a user who has submitted a job and 
is currently polling for the app to be accepted. If the RM does not save this 
request then such users will get an unknown app error when the RM comes back 
up. If we are willing to live with this error, then we may choose not to fix 
this jira because, like you say, the RM may not have time to actually flush 
pending events, so we cannot technically solve this completely. However, during 
graceful RM failover or admin-controlled shutdown, we should have time to flush 
events and prevent such user-facing errors.

> RMStateStore should flush all pending store events before closing
> -----------------------------------------------------------------
>
>                 Key: YARN-1121
>                 URL: https://issues.apache.org/jira/browse/YARN-1121
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Bikas Saha
>            Assignee: Omkar Vinit Joshi
>             Fix For: 2.2.1
>
>
> On serviceStop, it should wait for all internal pending events to drain 
> before stopping.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
