[ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765167#comment-13765167 ]

Bikas Saha commented on YARN-1027:
----------------------------------

Could you please share the different scenarios that have been tried out? This 
will help everyone else following the jira.

Stopped instead of Stopping?
{code}
+    STANDBY("standby"),
+    STOPPING("stopping");
{code}
Since this is a change in common, it has to go in its own jira filed under 
common, and should probably be reviewed by someone from HDFS to make sure we 
do not inadvertently break HDFS HA because of it. We can commit YARN-1027 
independently of that jira, with state==Initializing for now, so that we are 
not blocked by it.

We would like to be resilient to future changes in the transitionToStandby() 
logic that might get missed in serviceStop(), so it might be better to call 
transitionToStandby() inside serviceStop(). Can we modify transitionToStandby() 
to accept a stop flag, such that if the flag is true it does not init the 
services again and changes the state to Stopped? Or something along those lines.
{code}
 public synchronized void serviceStop() throws Exception {
+    // Stop all services
+    rm.stopActiveServices();
+    haState = HAServiceState.STOPPING;
{code}
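
To make the stop-flag suggestion concrete, here is a minimal sketch of what I 
have in mind. It is not the patch's code: the class HaStopSketch, its nested 
State enum, and the initCount counter are hypothetical stand-ins for the RM, 
HAServiceState, and createAndInitActiveServices(), and the real method would 
stop and re-create actual services rather than flip a boolean.

```java
// Illustrative sketch only. All names are hypothetical stand-ins, not the
// actual ResourceManager / RMHAProtocolService code.
class HaStopSketch {
  enum State { INITIALIZING, ACTIVE, STANDBY, STOPPED }

  private State haState = State.INITIALIZING;
  private boolean activeServicesRunning = false;
  // Number of times the active services were created and initialized.
  // Starts at 1 to model the initial init done in serviceInit().
  private int initCount = 1;

  synchronized void transitionToActive() {
    if (haState == State.STOPPED) {
      throw new IllegalStateException("Cannot become active after stop");
    }
    if (haState == State.ACTIVE) {
      return; // already active, nothing to do
    }
    activeServicesRunning = true;
    haState = State.ACTIVE;
  }

  synchronized void transitionToStandby() {
    transitionToStandby(false);
  }

  // Single code path shared by the standby transition and by stop.
  private synchronized void transitionToStandby(boolean stopping) {
    if (haState == State.STOPPED) {
      return; // already stopped
    }
    if (haState == State.ACTIVE) {
      activeServicesRunning = false; // stop the active services
      if (!stopping) {
        initCount++; // re-create and init them for the next failover
      }
    }
    haState = stopping ? State.STOPPED : State.STANDBY;
  }

  // serviceStop() reuses the standby logic instead of duplicating it.
  synchronized void serviceStop() {
    transitionToStandby(true);
  }

  synchronized State getState() { return haState; }
  synchronized boolean isActiveServicesRunning() { return activeServicesRunning; }
  synchronized int getInitCount() { return initCount; }
}
```

With this shape, anything added to the standby path later is picked up by 
serviceStop() automatically, and the only difference between the two is 
whether the services are re-initialized and which final state is set.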

Create a startActiveServices() method similar to stopActiveServices()?
{code}
+    LOG.info("Transitioning to active");
+    rm.activeServices.start();
{code}

Creating a new cluster timestamp should happen when the RM transitions to 
active, right? Not when it transitions to standby.
{code}
+  void createAndInitActiveServices() throws Exception {
+    // reset cluster timestamp
+    clusterTimeStamp = System.currentTimeMillis();
{code}
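
Roughly, I would expect the timestamp reset to live on the start path rather 
than the create/init path. The following is only a sketch under assumptions: 
ClusterTimestampSketch is a hypothetical stand-in for the RM, and the 
injectable clock is something I added so the behavior can be checked 
deterministically; the real code would just call System.currentTimeMillis().

```java
// Illustrative sketch only. The class and the injectable clock are
// hypothetical; they are not part of the patch.
class ClusterTimestampSketch {
  private final java.util.function.LongSupplier clock;
  private long clusterTimeStamp; // 0 until the first transition to active
  private boolean active;

  ClusterTimestampSketch() {
    this(System::currentTimeMillis);
  }

  ClusterTimestampSketch(java.util.function.LongSupplier clock) {
    this.clock = clock;
  }

  // Called on init and on every transition to standby.
  // Deliberately leaves the cluster timestamp alone.
  synchronized void createAndInitActiveServices() {
    active = false;
  }

  // Called on transition to active: this is where the new timestamp is taken.
  synchronized void startActiveServices() {
    clusterTimeStamp = clock.getAsLong();
    active = true;
  }

  synchronized void stopActiveServices() {
    active = false;
  }

  synchronized long getClusterTimeStamp() { return clusterTimeStamp; }
  synchronized boolean isActive() { return active; }
}
```

This way a standby RM that re-creates its services does not change the 
timestamp; it changes only when the RM actually becomes active.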

Should createAndInit/Start/Stop methods in RM be synchronized? Can they race 
with other activity in the RM happening on the dispatcher thread?

Was the getClusterTimeStamp() addition necessary? It's good to keep 
refactorings separate.

Incomplete comment
{code}
+    // 6. Stop the RM. All services should
{code}

We do need some e2e tests that exercise the changes in more detail. It's fine 
to do that in a separate jira. The new unit tests here are sufficient for the 
purposes of this jira, IMO.
                
> Implement RMHAProtocolService
> -----------------------------
>
>                 Key: YARN-1027
>                 URL: https://issues.apache.org/jira/browse/YARN-1027
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: test-yarn-1027.patch, yarn-1027-1.patch, 
> yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, yarn-1027-5.patch, 
> yarn-1027-6.patch, yarn-1027-including-yarn-1098-3.patch, 
> yarn-1027-in-rm-poc.patch
>
>
> Implement existing HAServiceProtocol from Hadoop common. This protocol is the 
> single point of interaction between the RM and HA clients/services.
