[ 
https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708732#comment-13708732
 ] 

Karthik Kambatla commented on YARN-149:
---------------------------------------

Thanks Bikas.

bq. 1) extra daemon to manage because in fail-over scenarios each extra actor 
increases the combinatorics
The wrapper is not an extra daemon. There will be a single daemon for the 
wrapper/RM. In the cold standby case, the wrapper starts the RM instance when 
it becomes active. 

bq. 2) the wrapper functionality seems to overlap the ZKFC and RM
The wrapper *interacts* with the ZKFC and RM. 

bq. 3) RM will need to be changed to interact with the wrapper and the changes 
IMO should not be much different than those needed for direct ZKFC interaction
Mostly agree with you here. 

I believe it boils down to the following: which state machine should incorporate 
the HA logic. The wrapper approach essentially proposes two state machines - 
one for the core RM and one for the HA logic. Integrating the HA logic into the 
current RM would add more states to its existing state machine. There are 
(dis)advantages to both: the wrapper approach shouldn't affect non-HA 
instances, and might help with earlier adoption by major YARN users like Yahoo!
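
To make the two-state-machine idea concrete, here is a rough sketch of the 
cold standby case. It is only an illustration under assumptions: HaState, 
CoreRm and RmHaWrapper are hypothetical names, not existing YARN classes, and 
the real RM lifecycle and the ZKFC interaction are far richer than this.

```java
import java.util.function.Supplier;

// Hypothetical sketch: none of these types are real YARN classes.
enum HaState { STANDBY, ACTIVE }

/** Stand-in for the core RM, which keeps its own (here trivial) lifecycle. */
class CoreRm {
    private boolean running = false;

    void start() { running = true; }
    void stop() { running = false; }
    boolean isRunning() { return running; }
}

/** HA state machine, kept separate from the core RM's state machine. */
class RmHaWrapper {
    private final Supplier<CoreRm> rmFactory;
    private HaState state = HaState.STANDBY;
    private CoreRm rm; // null while in standby (cold standby: no RM instance)

    RmHaWrapper(Supplier<CoreRm> rmFactory) {
        this.rmFactory = rmFactory;
    }

    /** Assumed to be triggered by the failover controller on winning election. */
    synchronized void transitionToActive() {
        if (state == HaState.ACTIVE) {
            return; // already active, nothing to do
        }
        rm = rmFactory.get(); // the RM instance is created only now
        rm.start();
        state = HaState.ACTIVE;
    }

    /** Assumed to be triggered when this node loses the active role. */
    synchronized void transitionToStandby() {
        if (state == HaState.STANDBY) {
            return; // already standby, nothing to do
        }
        rm.stop();
        rm = null; // drop the instance; a fresh one is built on the next activation
        state = HaState.STANDBY;
    }

    synchronized HaState getState() { return state; }

    synchronized boolean isRmRunning() { return rm != null && rm.isRunning(); }
}
```

In this sketch a non-HA deployment would simply construct and start CoreRm 
directly and never touch RmHaWrapper, which is the sense in which the wrapper 
approach shouldn't affect non-HA instances.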

bq. In fact, what is being called as a wrapper is something that probably does 
wrap around core RM functionality but remains inside the RM. From what I see, 
it will be an impl of the HAProtocol interface around the core RM startup 
functionality.
Looks like a promising approach. Let me take a closer look at the code and 
comment.
                
> ResourceManager (RM) High-Availability (HA)
> -------------------------------------------
>
>                 Key: YARN-149
>                 URL: https://issues.apache.org/jira/browse/YARN-149
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Harsh J
>            Assignee: Bikas Saha
>         Attachments: rm-ha-phase1-approach-draft1.pdf, rm-ha-phase1-draft2.pdf
>
>
> This jira tracks work needed to be done to support one RM instance failing 
> over to another RM instance so that we can have RM HA. Work includes leader 
> election, transfer of control to leader and client re-direction to new leader.

