[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079329#comment-13079329 ]
Florian Haas commented on HDFS-2185:
------------------------------------

Allow me to point out that the [Pacemaker|http://www.clusterlabs.org] stack already fulfills all of the above.

* Initiates leader election (via ZK) when necessary -- Pacemaker calls this a _Designated Coordinator_, which is elected automatically.
* Performs health monitoring (aka failure detection) -- Pacemaker does this via the _monitor_ action of _resource agents_, which follow the Open Cluster Framework (OCF) standard.
* Performs fail-over (standby to active and active to standby transitions) -- Pacemaker does this automatically, including fencing, quorum and other vital concepts.
* Heartbeats to ensure the liveness -- Pacemaker does this over either of the two cluster communication layers it supports, Heartbeat and Corosync.

Why reinvent the wheel?

> HA: ZK-based FailoverController
> -------------------------------
>
>                 Key: HDFS-2185
>                 URL: https://issues.apache.org/jira/browse/HDFS-2185
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>
> This jira is for a ZK-based FailoverController daemon. The FailoverController
> is a separate daemon from the NN that does the following:
> * Initiates leader election (via ZK) when necessary
> * Performs health monitoring (aka failure detection)
> * Performs fail-over (standby to active and active to standby transitions)
> * Heartbeats to ensure the liveness
> It should have the same/similar interface as the Linux HA RM to aid
> pluggability.
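To make the _monitor_ action mentioned in the comment concrete, here is a minimal sketch of the monitor part of an OCF-style resource agent, written in Python. The exit codes are the standard OCF ones (0 = running, 7 = cleanly not running, 3 = unimplemented, 6 = not configured), and parameters arrive as `OCF_RESKEY_<name>` environment variables. Everything else is an assumption for illustration: the `pidfile` parameter name is hypothetical, the liveness check is a simple PID-existence probe (not how any real NameNode agent decides health), it is POSIX-only, and the mandatory `start`, `stop` and `meta-data` actions are left out.

```python
import os
import sys

# Standard OCF resource agent exit codes (subset used here).
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_ERR_CONFIGURED = 6
OCF_NOT_RUNNING = 7


def read_pid(pidfile):
    """Return the positive PID stored in pidfile, or None if missing or invalid."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def monitor(pidfile):
    """Report whether the daemon recorded in pidfile is alive (POSIX only).

    Simplification: a reused PID would be reported as running.
    """
    pid = read_pid(pidfile)
    if pid is None:
        return OCF_NOT_RUNNING
    try:
        os.kill(pid, 0)  # signal 0: existence check, no signal is delivered
    except ProcessLookupError:
        return OCF_NOT_RUNNING
    except PermissionError:
        return OCF_SUCCESS  # process exists but is owned by another user
    return OCF_SUCCESS


def main(argv, env):
    action = argv[1] if len(argv) > 1 else ""
    if action != "monitor":
        # start, stop, meta-data, etc. are omitted from this sketch.
        return OCF_ERR_UNIMPLEMENTED
    # Hypothetical parameter; OCF passes parameters as OCF_RESKEY_<name>.
    pidfile = env.get("OCF_RESKEY_pidfile")
    if not pidfile:
        return OCF_ERR_CONFIGURED
    return monitor(pidfile)


if __name__ == "__main__":
    sys.exit(main(sys.argv, os.environ))
```

The cluster manager would invoke such an agent periodically (e.g. as `agent monitor`) and act on the exit code, which is the failure-detection role the issue assigns to the FailoverController.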