On 22/04/17 03:05 AM, Andrei Borzenkov wrote:
> 18.04.2017 10:47, Ulrich Windl wrote:
> ...
>>>
>>> Now let me come back to quorum vs. stonith;
>>>
>>> Said simply; Quorum is a tool for when everything is working. Fencing is
>>> a tool for when things go wrong.
>>
>> I'd say: Quorum is the tool to decide who'll be alive and who's going to die,
>> and STONITH is the tool to make nodes die.
>
> If I had PROD, QA and DEV in a cluster and PROD were separated from
> QA+DEV I'd be very sad if PROD were shut down.
>
> The notion of simple node majority as kill policy is not appropriate as
> well as simple node based delays. I wish pacemaker supported scoring
> system for resources so that we could base stonith delays on them (the
> most important sub-cluster starts fencing first).
>
>
>> If everything is working you need
>> neither quorum nor STONITH.
>>
>
> I wonder how SBD fits into this discussion. It is marketed as stonith
> agent, but it is based on committing suicide so relies on well-behaving
> nodes. Which we by definition cannot trust to behave well, otherwise
> we'd not need stonith in the first place.
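
To make the PROD/QA/DEV scenario quoted above concrete, here is a minimal sketch of plain node-majority quorum. It is a toy model only: the function name `has_quorum`, the one-vote-per-node assumption and the partition layout are mine, not anything taken from corosync or pacemaker.

```python
def has_quorum(partition, votes):
    """Return True if `partition` holds a strict majority of all votes.

    `votes` maps node name -> vote count (assumed: one vote per node).
    """
    total = sum(votes.values())
    held = sum(votes[node] for node in partition)
    # Strict majority: more than half of all votes, so a tie means no quorum.
    return 2 * held > total


# Hypothetical three-node cluster from the quoted example.
votes = {"PROD": 1, "QA": 1, "DEV": 1}

# Network split: PROD on one side, QA+DEV on the other.
prod_side = {"PROD"}
other_side = {"QA", "DEV"}

prod_quorate = has_quorum(prod_side, votes)    # 1 of 3 votes
other_quorate = has_quorum(other_side, votes)  # 2 of 3 votes
```

In this model the QA+DEV side keeps quorum and PROD does not, which is exactly the outcome the quoted mail objects to: node count alone says nothing about which side matters most.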

On SBD: the logic, when using a watchdog timer, is that if the node is alive enough to kick the watchdog, it's alive enough not to do something dumb to the cluster. If it can't kick the timer, the watchdog timer will reset the machine. This works *if* all resources hang when messages stop coming back from the peer (a side effect of corosync's virtual synchrony).

So as I understand it, for SBD to be safe, it requires a hardware watchdog timer and a properly configured cluster.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
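
P.S. A rough sketch of the watchdog logic described above, as a pure simulation on a logical clock. This is not how SBD or any real watchdog driver is implemented; the function name `first_reset_time`, the timeout and the kick times are all made up for illustration.

```python
def first_reset_time(kick_times, timeout, end):
    """Return the time at which the watchdog would reset the node,
    or None if the node kept kicking it in time until `end`.

    The timer is armed at t=0 and every kick pushes the deadline
    out by `timeout`. A kick arriving exactly on the deadline is
    treated as being in time (an assumption of this sketch).
    """
    deadline = timeout
    for t in sorted(kick_times):
        if t > deadline:
            # The node was silent for too long: the timer fired first.
            return deadline
        deadline = t + timeout
    # No more kicks: the timer fires if its deadline is within the run.
    return deadline if deadline <= end else None


# Healthy node: kicks every 2 time units, timeout of 5.
healthy = first_reset_time(range(0, 20, 2), timeout=5, end=20)

# Hung node: stops kicking after t=4, so the timer fires at 4 + 5.
hung = first_reset_time([0, 2, 4], timeout=5, end=20)
```

The point is that the reset does not depend on the hung node cooperating: once it stops kicking, the timer fires on its own. The sketch says nothing about the other half of the requirement, which is that resources stop doing work in the meantime.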