[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079329#comment-13079329
 ] 

Florian Haas commented on HDFS-2185:
------------------------------------

Allow me to comment that the [Pacemaker|http://www.clusterlabs.org] stack 
already fulfills all of the above.

* Initiates leader election (via ZK) when necessary -- Pacemaker elects its 
leader, which it calls the _Designated Coordinator_ (DC), automatically.

* Performs health monitoring (aka failure detection) -- Pacemaker does this via 
the _monitor_ action of _resource agents_, which follow the Open Cluster 
Framework (OCF) standard.

* Performs fail-over (standby to active and active to standby transitions) -- 
Pacemaker does this automatically, and handles fencing, quorum and other vital 
concerns along the way.

* Heartbeats to ensure the liveness -- Pacemaker does this over either of the 
two cluster communication layers it supports: Heartbeat and Corosync.
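
To make the _monitor_ point concrete, here is a rough sketch of what the 
monitor action of an OCF-style resource agent could look like. It is written 
in Python only for illustration (agents are usually shell scripts). The exit 
codes and the OCF_RESKEY_ environment prefix come from the OCF resource agent 
API; the _pidfile_ parameter, its default path and the pidfile-based liveness 
check are assumptions of mine, not anything an actual NameNode agent defines.

```python
import os

# Exit codes defined by the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


def monitor(pidfile):
    """Return an OCF exit code saying whether the process in pidfile is alive."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # No readable pidfile: treat the resource as cleanly stopped.
        return OCF_NOT_RUNNING
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except ProcessLookupError:
        return OCF_NOT_RUNNING
    except PermissionError:
        pass  # the process exists but belongs to another user
    return OCF_SUCCESS


def main(argv):
    """Dispatch on the action name that the cluster manager passes as argv[1]."""
    action = argv[1] if len(argv) > 1 else ""
    # Resource parameters arrive as OCF_RESKEY_<name> environment variables.
    # "pidfile" and the default path are hypothetical.
    pidfile = os.environ.get("OCF_RESKEY_pidfile", "/var/run/example-nn.pid")
    if action == "monitor":
        return monitor(pidfile)
    # start, stop, meta-data and friends are omitted from this sketch.
    return OCF_ERR_UNIMPLEMENTED

# A real agent would finish with: sys.exit(main(sys.argv))
```

Pacemaker invokes the agent with the action name as its first argument and 
decides what to do next from the exit code alone; a complete agent would also 
have to implement at least start, stop and meta-data.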

Why reinvent the wheel?

> HA: ZK-based FailoverController
> -------------------------------
>
>                 Key: HDFS-2185
>                 URL: https://issues.apache.org/jira/browse/HDFS-2185
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>
> This jira is for a ZK-based FailoverController daemon. The FailoverController 
> is a separate daemon from the NN that does the following:
> * Initiates leader election (via ZK) when necessary
> * Performs health monitoring (aka failure detection)
> * Performs fail-over (standby to active and active to standby transitions)
> * Heartbeats to ensure the liveness
> It should have the same/similar interface as the Linux HA RM to aid 
> pluggability.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
