[ClusterLabs] Stonith

Alexander Markov Tue, 21 Mar 2017 00:09:07 -0700

Hello guys,

It looks like I'm missing something obvious, but I just don't get what has happened.

I've got a number of stonith-enabled clusters within my big POWER boxes. My stonith devices are two HMCs (hardware management consoles) - separate servers from IBM that can reboot individual LPARs (logical partitions) within the POWER boxes - one per datacenter.


So my definition for stonith devices was pretty straightforward:

primitive st_dc2_hmc stonith:ibmhmc \
        params ipaddr=10.1.2.9
primitive st_dc1_hmc stonith:ibmhmc \
        params ipaddr=10.1.2.8
clone cl_st_dc2_hmc st_dc2_hmc
clone cl_st_dc1_hmc st_dc1_hmc

Everything was OK when we tested failover. But today, after a power outage, we lost one DC completely. Shortly after that, the cluster literally hung while trying to reboot the nonexistent node. No failover occurred. The nonexistent node was marked OFFLINE UNCLEAN, and its resources were marked "Started UNCLEAN" on the nonexistent node.

UNCLEAN seems to flag a problem with the stonith configuration. So my question is: how can I avoid such behaviour?


Thank you!

--
Regards,
Alexander

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
