On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>
> Sometimes IPMI fence devices share the power supply of the node, and
> that cannot always be avoided.
>
> In such scenarios the HA cluster is NOT able to handle the power
> failure of a node, since the power is shared with its own fence device.
>
> IPMI-based fencing can also fail for other reasons.
>
> A failure to fence the failed node will cause the cluster to be marked
> UNCLEAN.
>
> To get past that, the following command needs to be invoked on the
> surviving node:
>
> pcs stonith confirm <failed_node_name> --force
>
> This can be automated by hooking a recovery script into the Stonith
> resource 'Timed Out' event.
>
> To be more specific, Pacemaker alerts can be used to watch for Stonith
> timeouts and failures.
>
> In that script, all that essentially needs to be executed is the
> aforementioned command.
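[For reference, a minimal sketch of such an alert agent. The script name, log comment, and sudoers setup below are assumptions for illustration, not something from this thread; what is documented is that Pacemaker passes alert context to alert agents via `CRM_alert_*` environment variables.]

```shell
#!/bin/sh
# fence-confirm.sh - hypothetical Pacemaker alert agent (sketch only).
# Pacemaker exports alert context via CRM_alert_* environment variables;
# for fencing events CRM_alert_kind is "fencing", CRM_alert_node is the
# fence target, and CRM_alert_rc is the result code (0 = success).

# Return success (exit 0) only for a fencing event that did not succeed.
fence_failed() {
    kind=$1
    rc=$2
    [ "$kind" = "fencing" ] && [ "$rc" -ne 0 ]
}

if fence_failed "${CRM_alert_kind:-}" "${CRM_alert_rc:-0}"; then
    # WARNING: this is only safe if the node really is powered off;
    # confirming a node that is still running risks data corruption.
    sudo pcs stonith confirm "${CRM_alert_node}" --force
fi
```

[The agent would be registered with something like `pcs alert create path=/usr/local/bin/fence-confirm.sh`. Since alerts run as 'hacluster', a sudoers entry along the lines of `hacluster ALL=(root) NOPASSWD: /usr/sbin/pcs stonith confirm *` (illustrative, path may differ) is needed, as noted below. Note this trades safety for availability: it tells the cluster the node is down without proof.]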
If I get you right here, you could then disable fencing in the first
place. Actually, quorum-based watchdog fencing is the way to do this in
a safe manner. This of course assumes you have a proper source of
quorum in your 2-node setup, e.g. with qdevice, or using a shared disk
with sbd (not directly Pacemaker quorum here, but a similar thing
handled inside sbd).

> Since the alerts are issued from the 'hacluster' login, sudo
> permissions for 'hacluster' need to be configured.
>
> Thanx.
>
> *From:* Klaus Wenninger [mailto:kwenn...@redhat.com]
> *Sent:* Monday, July 24, 2017 9:24 PM
> *To:* Kristián Feldsam; Cluster Labs - All topics related to
> open-source clustering welcomed
> *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>
> On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
>
> > I personally think that powering off a node via a switched PDU is
> > safer, or not?
>
> True, if that works in your environment. If you can't do a physical
> setup where you aren't simultaneously losing connection to both your
> node and the switch device (or you just want to cover cases where
> that happens), you have to come up with something else.
>
> > Best regards, Kristián Feldsam
> > Tel.: +420 773 303 353, +421 944 137 535
> > E-mail: supp...@feldhost.cz <mailto:supp...@feldhost.cz>
> >
> > www.feldhost.cz <http://www.feldhost.cz> - *Feld*Host™ -
> > professional hosting and server services at fair prices.
> >
> > FELDSAM s.r.o.
> > V rohu 434/3
> > Praha 4 - Libuš, PSČ 142 00
> > Company ID: 290 60 958, VAT ID: CZ290 60 958
> > File C 200350, kept at the Municipal Court in Prague
> >
> > Bank: Fio banka a.s.
> > Account number: 2400330446/2010
> > BIC: FIOBCZPPXX
> > IBAN: CZ82 2010 0000 0024 0033 0446
>
> > On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenn...@redhat.com
> > <mailto:kwenn...@redhat.com>> wrote:
>
> On 07/24/2017 05:15 PM, Tomer Azran wrote:
>
> > I still don't understand why the qdevice concept doesn't help
> > in this situation.
> > Since the master node is down, I would expect quorum to declare it
> > dead. Why doesn't that happen?
>
> That is not how quorum works. It just limits decision-making to the
> quorate subset of the cluster. The unknown nodes are still not
> guaranteed to be down. That is why I suggested quorum-based watchdog
> fencing with sbd. That would assure that, within a certain time, all
> nodes of the non-quorate part of the cluster are down.
>
> > On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk"
> > <dmitri.maz...@gmail.com <mailto:dmitri.maz...@gmail.com>> wrote:
> >
> > On 2017-07-24 07:51, Tomer Azran wrote:
> > > We don't have the ability to use it.
> > > Is that the only solution?
> >
> > No, but I'd recommend thinking about it first. Are you sure you will
> > care about your cluster working when your server room is on fire?
> > 'Cause unless you have halon suppression, your server room is a
> > complete write-off anyway. (Think water from sprinklers hitting rich
> > chunky volts in the servers.)
> > Dima
>
> --
> Klaus Wenninger
> Senior Software Engineer, EMEA ENG Openstack Infrastructure
> Red Hat
> kwenn...@redhat.com <mailto:kwenn...@redhat.com>
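[For readers landing on this thread: the quorum-based watchdog fencing Klaus describes can be sketched roughly as below. This is an outline under assumptions, not a tested recipe: the arbiter host name `qnetd-host` is hypothetical, the timeout values are only illustrative, and file paths vary by distribution.]

```shell
# 1. Add a third quorum vote via qdevice (corosync-qnetd must already be
#    running on an independent machine, here called qnetd-host).
#    "ffsplit" guarantees exactly one side keeps quorum on an even split.
pcs quorum device add model net host=qnetd-host algorithm=ffsplit

# 2. Enable diskless sbd so every node arms a hardware watchdog.
#    In /etc/sysconfig/sbd (path varies by distribution):
#      SBD_WATCHDOG_DEV=/dev/watchdog
#      SBD_WATCHDOG_TIMEOUT=5
systemctl enable sbd

# 3. Tell Pacemaker that a node losing quorum self-fences within this
#    window, so the quorate side may safely take over once it expires.
pcs property set stonith-watchdog-timeout=10s
```

[With this in place, a non-quorate node reboots itself via the watchdog, which is the guarantee discussed above: within a certain time, all nodes of the non-quorate part of the cluster are down.]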
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org