[ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815280#comment-13815280 ]
Bikas Saha commented on YARN-1222:
----------------------------------

bq. Post YARN-1318, I think RMStateStore constructor should take RMContext. Then, we should be able to replace the RPC approach with rmContext.getHAService.transitionToStandby()

Great, let's track that and put a comment. Doing a self-RPC is good to avoid.

bq. A completely different approach might be to keep handleStoreFencedException() in ResourceManager and the store implementation to call it when it realizes it got fenced. Thoughts?

That's what I was suggesting. The store reports this exception/error to the RM, and the RM then does the right thing (in this case, transitionToStandby). notifyDoneStoringApplicationAttempt() etc. should not be sent when there is a fenced exception. Extending that, we should probably send the notifyDone* calls only upon success. That way, those callees need to be bothered only with the normal/success code path. Any exception should be reported to the RM. The RM can examine the exception to see whether it is a fenced exception and, if so, call transitionToStandby(). If it is some other exception, the RM should die (as we currently do in multiple different places; we will now do it in one place).

> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>
>                 Key: YARN-1222
>                 URL: https://issues.apache.org/jira/browse/YARN-1222
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, yarn-1222-4.patch, yarn-1222-5.patch
>
>
> Using multi-operations for every ZK interaction.
> In every operation, automatically creating/deleting a lock znode that is the child of the root znode. This is to achieve fencing by modifying the create/delete permissions on the root znode.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
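
To make the error-handling flow proposed in the comment concrete, here is a minimal sketch, not the actual patch. All type and method names (StoreFencedException, StoreErrorHandler, StoreOp, SketchResourceManager, SketchStateStore) are hypothetical stand-ins rather than real YARN classes. It assumes the store sends the "done storing" notification only on success and hands every exception to the RM, which decides in one place between transitioning to standby and treating the failure as fatal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker exception: the store found out it was fenced by another RM.
class StoreFencedException extends Exception {
    StoreFencedException(String msg) { super(msg); }
}

// Hypothetical callback through which the store reports any failure to the RM.
interface StoreErrorHandler {
    void handleStoreError(Exception e);
}

// Hypothetical store operation that may fail with a checked exception.
interface StoreOp {
    void run() throws Exception;
}

// Stand-in for the ResourceManager: the single place that interprets store failures.
class SketchResourceManager implements StoreErrorHandler {
    // Records the decisions taken, so the behavior can be inspected.
    final List<String> actions = new ArrayList<>();

    @Override
    public void handleStoreError(Exception e) {
        if (e instanceof StoreFencedException) {
            // Fenced: another RM is active, so step down instead of dying.
            actions.add("transitionToStandby");
        } else {
            // Any other store failure is treated as fatal (a real RM would exit here).
            actions.add("fatal");
        }
    }
}

// Stand-in for the state store: notifies on success only, reports all errors to the RM.
class SketchStateStore {
    private final StoreErrorHandler rm;
    // Records the success notifications that were sent.
    final List<String> notifications = new ArrayList<>();

    SketchStateStore(StoreErrorHandler rm) { this.rm = rm; }

    void storeApplicationAttempt(String attemptId, StoreOp op) {
        try {
            op.run();
        } catch (Exception e) {
            // No notifyDone* on failure; the RM examines the exception instead.
            rm.handleStoreError(e);
            return;
        }
        // Success path: the only case in which callees are notified.
        notifications.add("doneStoring:" + attemptId);
    }
}
```

With this shape, the callees of the notification never see an exception, and the fenced/fatal decision lives in a single method on the RM side.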