[ 
https://issues.apache.org/jira/browse/YARN-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868231#comment-13868231
 ] 

Bikas Saha commented on YARN-1584:
----------------------------------

The duration of failover depends on how long ZK needs to figure out that the 
leader is gone, then to notify the new leader, and then for the new leader to 
read state. It's not clear to me how any of these steps are faster with an 
admin failover option.

Not quite. When the RM is asked to transition to standby via the AdminService 
with the FORCE_USER flag, the AdminService can transition to standby and then 
notify the elector to quitElection(). That API is present on the elector for 
this specific purpose. The elector gives up participation in the leader 
election process. This RM will remain in standby (because the elector is not 
going to notify it anymore) until the admin asks it to 
transitionToActive(FORCE_USER). Later, when the AdminService is asked to 
transitionToActive(), it can call the joinElection API on the elector to rejoin 
the leader election and stay in the Standby state. The elector will join the 
election and notify the RM to transitionToActive if it wins the election.
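
A minimal sketch of the flow described above, to make the ordering of calls 
concrete. Everything here is hypothetical: the Elector interface is a 
simplified stand-in for the real elector (whose quitElection and joinElection 
signatures may differ), and the AdminService class, its method names, and 
RecordingElector are invented for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualFailoverSketch {
    public enum HAState { ACTIVE, STANDBY }

    /** Hypothetical, simplified stand-in for the leader elector. */
    public interface Elector {
        void joinElection();
        void quitElection();
    }

    /** Hypothetical RM-side admin service that owns the HA state. */
    public static class AdminService {
        private HAState state = HAState.ACTIVE;
        private final Elector elector;

        public AdminService(Elector elector) { this.elector = elector; }

        /** Admin-forced standby: step down, then leave the election. */
        public void transitionToStandbyForced() {
            state = HAState.STANDBY;
            elector.quitElection();
        }

        /** Admin asks for active: rejoin the election, stay standby for now. */
        public void transitionToActiveRequested() {
            elector.joinElection();
        }

        /** Callback the elector would invoke if this RM wins the election. */
        public void becomeActive() { state = HAState.ACTIVE; }

        public HAState getState() { return state; }
    }

    /** Test double that records calls instead of talking to ZooKeeper. */
    public static class RecordingElector implements Elector {
        public final List<String> calls = new ArrayList<>();
        public boolean participating = true;

        @Override public void joinElection() {
            participating = true;
            calls.add("joinElection");
        }

        @Override public void quitElection() {
            participating = false;
            calls.add("quitElection");
        }
    }

    public static void main(String[] args) {
        RecordingElector elector = new RecordingElector();
        AdminService admin = new AdminService(elector);

        admin.transitionToStandbyForced();   // RM steps down and quits election
        System.out.println(admin.getState() + " participating=" + elector.participating);

        admin.transitionToActiveRequested(); // RM rejoins but is still standby
        System.out.println(admin.getState() + " participating=" + elector.participating);

        admin.becomeActive();                // simulated "you won" notification
        System.out.println(admin.getState() + " calls=" + elector.calls);
    }
}
```

The point of the sketch is that the RM never makes itself active directly: 
it only leaves or rejoins the election, and the elector decides when it 
becomes active.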

> Support explicit failover when automatic failover is enabled
> ------------------------------------------------------------
>
>                 Key: YARN-1584
>                 URL: https://issues.apache.org/jira/browse/YARN-1584
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>
> YARN-1029 adds automatic failover support. However, users can't explicitly 
> ask for a failover from one RM to the other without stopping the other RM. 
> Stopping the RM until the other RM takes over and then restarting the first 
> RM is more involved and exposes the RM-ensemble to SPOF for a longer 
> duration. 
> It would be nice to allow explicit failover through the yarn rmadmin 
> -failover command.
> PS: HDFS supports a -failover option. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
