[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765167#comment-13765167 ]
Bikas Saha commented on YARN-1027:
----------------------------------

Could you please share the different scenarios that have been tried out? This will help everyone else following the jira.

Stopped instead of Stopping?
{code}
+  STANDBY("standby"),
+  STOPPING("stopping");
{code}
Since this is a change in common, it has to be in its own jira filed under common, and should probably be reviewed by someone from HDFS to make sure we will not inadvertently break HDFS HA somewhere because of it. We can commit YARN-1027 independently of that jira, with state==Initializing for now, so that we are not blocked by it.

We would like to be resilient to future changes in the transitionToStandby() logic that may get missed in serviceStop(), so it might be better to call transitionToStandby() inside serviceStop(). Can we modify transitionToStandby() to accept a stop flag, such that if the flag is true it does not init the services again and changes the state to Stopped? Or something along those lines.
{code}
  public synchronized void serviceStop() throws Exception {
+    // Stop all services
+    rm.stopActiveServices();
+    haState = HAServiceState.STOPPING;
{code}

Create a startActiveServices() method similar to stopActiveServices()?
{code}
+      LOG.info("Transitioning to active");
+      rm.activeServices.start();
{code}

Creating a new cluster timestamp should happen when the RM transitions to active, right? Not when it transitions to standby.
{code}
+  void createAndInitActiveServices() throws Exception {
+    // reset cluster timestamp
+    clusterTimeStamp = System.currentTimeMillis();
{code}

Should the createAndInit/Start/Stop methods in RM be synchronized? Can they race with other activity in the RM happening on the dispatcher thread?

Was the getClusterTimeStamp() addition necessary? It's good to keep refactorings separate.

Incomplete comment:
{code}
+    // 6. Stop the RM. All services should
{code}

We do need some e2e tests that test the changes in more detail. It's fine to do that in a separate jira.
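For illustration, here is a minimal sketch of how the suggestions above might fit together. It is self-contained and assumes simplified stand-ins: RMHASketch, HAState and ActiveServices are hypothetical names, not the real ResourceManager, HAServiceState or CompositeService APIs, and the "services" only record their lifecycle calls. It shows transitionToStandby() taking a stop flag that serviceStop() reuses, a startActiveServices() paired with stopActiveServices(), the cluster timestamp reset moved to the transition to active, and the lifecycle helpers synchronized on the same lock as the HA transitions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: not the real YARN ResourceManager or Hadoop HA classes.
class RMHASketch {

  // Stand-in for HAServiceState, using STOPPED rather than STOPPING.
  enum HAState { INITIALIZING, ACTIVE, STANDBY, STOPPED }

  // Stand-in for the RM's active services; it only records lifecycle calls.
  static class ActiveServices {
    final List<String> events = new ArrayList<>();
    void init()  { events.add("init"); }
    void start() { events.add("start"); }
    void stop()  { events.add("stop"); }
  }

  private HAState haState = HAState.INITIALIZING;
  private ActiveServices activeServices;
  private long clusterTimeStamp;
  // Every ActiveServices instance created so far, oldest first (for inspection).
  final List<ActiveServices> history = new ArrayList<>();

  // These helpers lock the same monitor as the transition methods below,
  // so create/start/stop cannot interleave with an HA transition.
  synchronized void createAndInitActiveServices() {
    activeServices = new ActiveServices();
    history.add(activeServices);
    activeServices.init();
  }

  // Counterpart to stopActiveServices(), as suggested in the review.
  synchronized void startActiveServices() {
    activeServices.start();
  }

  synchronized void stopActiveServices() {
    if (activeServices != null) {
      activeServices.stop();
    }
  }

  // Stand-in for serviceInit(): services are created and inited, not started.
  public synchronized void serviceInit() {
    createAndInitActiveServices();
  }

  public synchronized void transitionToActive() {
    if (haState == HAState.STOPPED) {
      throw new IllegalStateException("RM has been stopped");
    }
    if (haState == HAState.ACTIVE) {
      return; // already active
    }
    // The new cluster timestamp is taken on becoming active, not standby.
    clusterTimeStamp = System.currentTimeMillis();
    startActiveServices();
    haState = HAState.ACTIVE;
  }

  // stop == true: tear down the services without re-creating them; end in STOPPED.
  // stop == false: the normal transition to STANDBY.
  public synchronized void transitionToStandby(boolean stop) {
    if (stop) {
      stopActiveServices();
      haState = HAState.STOPPED;
      return;
    }
    if (haState == HAState.STOPPED) {
      throw new IllegalStateException("RM has been stopped");
    }
    if (haState == HAState.STANDBY) {
      return; // already standby
    }
    if (haState == HAState.ACTIVE) {
      // As in Hadoop's service model, a stopped service is not restarted,
      // so a fresh set is created and inited for the next activation.
      stopActiveServices();
      createAndInitActiveServices();
    }
    haState = HAState.STANDBY;
  }

  // serviceStop() reuses the standby path, so later changes to
  // transitionToStandby() are not missed on shutdown.
  public synchronized void serviceStop() {
    transitionToStandby(true);
  }

  synchronized HAState getHAState() { return haState; }

  synchronized long getClusterTimeStamp() { return clusterTimeStamp; }
}
```

Routing shutdown through the flag keeps a single code path for leaving the active state; the cost is that transitionToStandby() now has two possible end states, which the flag makes explicit at each call site.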
The new unit tests in this jira are sufficient for the purposes of this jira IMO.

> Implement RMHAProtocolService
> -----------------------------
>
>                 Key: YARN-1027
>                 URL: https://issues.apache.org/jira/browse/YARN-1027
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: test-yarn-1027.patch, yarn-1027-1.patch,
> yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, yarn-1027-5.patch,
> yarn-1027-6.patch, yarn-1027-including-yarn-1098-3.patch,
> yarn-1027-in-rm-poc.patch
>
>
> Implement existing HAServiceProtocol from Hadoop common. This protocol is the
> single point of interaction between the RM and HA clients/services.