[ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708732#comment-13708732 ]
Karthik Kambatla commented on YARN-149:
---------------------------------------

Thanks Bikas.

bq. 1) extra daemon to manage because in fail-over scenarios each extra actor increases the combinatorics

The wrapper is not an extra daemon. There will be a single daemon for the wrapper/RM. In the cold standby case, the wrapper starts the RM instance when it becomes active.

bq. 2) the wrapper functionality seems to overlap the ZKFC and RM

The wrapper *interacts* with the ZKFC and RM.

bq. 3) RM will need to be changed to interact with the wrapper and the changes IMO should not be much different than those needed for direct ZKFC interaction

Mostly agree with you here. I believe it boils down to the following: what state machine to incorporate the HA logic into. The wrapper approach essentially proposes two state machines - one for the core RM and one for the HA logic. Integrating the HA logic into the current RM will be adding more states to the current RM. There are (dis)advantages to both: the wrapper approach shouldn't affect non-HA instances, and might help with earlier adoption by major YARN users like Yahoo!

bq. In fact, what is being called as a wrapper is something that probably does wrap around core RM functionality but remains inside the RM.

From what I see, it will be an impl of the HAProtocol interface around the core RM startup functionality. Looks like a promising approach. Let me take a closer look at the code and comment.

> ResourceManager (RM) High-Availability (HA)
> -------------------------------------------
>
>                 Key: YARN-149
>                 URL: https://issues.apache.org/jira/browse/YARN-149
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Harsh J
>            Assignee: Bikas Saha
>         Attachments: rm-ha-phase1-approach-draft1.pdf, rm-ha-phase1-draft2.pdf
>
>
> This jira tracks work needed to be done to support one RM instance failing
> over to another RM instance so that we can have RM HA.
> Work includes leader election, transfer of control to leader and client re-direction to new leader.
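
For readers following the wrapper discussion in the comment, here is a minimal sketch of the "two state machines" idea in the cold standby case: a wrapper that owns only the HA state (standby/active) and creates the core RM when it becomes active. All names here (`CoreRm`, `HaWrapper`, `HaState`, `transitionToActive`, `transitionToStandby`) are hypothetical stand-ins chosen for illustration. They are not the actual YARN classes, the design from the attached drafts, or the real HA protocol interface, and the ZKFC interaction is left out entirely.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the core RM; it knows nothing about HA. */
class CoreRm {
    private boolean running;

    void start() { running = true; }
    void stop() { running = false; }
    boolean isRunning() { return running; }
}

/** Hypothetical wrapper that holds only the HA state machine. */
class HaWrapper {
    enum HaState { STANDBY, ACTIVE }

    private final Supplier<CoreRm> rmFactory;
    private HaState state = HaState.STANDBY;
    // Cold standby assumption: no RM instance exists until we become active.
    private CoreRm rm;

    HaWrapper(Supplier<CoreRm> rmFactory) {
        this.rmFactory = rmFactory;
    }

    /** Would be driven by the failover controller; idempotent if already active. */
    synchronized void transitionToActive() {
        if (state == HaState.ACTIVE) {
            return;
        }
        rm = rmFactory.get();
        rm.start();
        state = HaState.ACTIVE;
    }

    /** Stops and discards the RM instance; idempotent if already standby. */
    synchronized void transitionToStandby() {
        if (state == HaState.STANDBY) {
            return;
        }
        rm.stop();
        rm = null;
        state = HaState.STANDBY;
    }

    synchronized HaState getState() { return state; }

    synchronized boolean isRmRunning() {
        return rm != null && rm.isRunning();
    }
}
```

In this sketch the core RM's own lifecycle is untouched, which mirrors the point that the wrapper approach shouldn't affect non-HA instances. Folding the HA logic into the RM would instead add the standby/active states to the RM's existing state machine.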