On 22.04.2017 11:31, Klaus Wenninger wrote:
>>>>
>>> I wonder how SBD fits into this discussion. It is marketed as stonith
>>> agent, but it is based on committing suicide so relies on well-behaving
>>> nodes. Which we by definition cannot trust to behave well, otherwise
>>> we'd not need stonith in the first place.
>> The logic, when using a watchdog timer, is that if the node is alive
>> enough to kick the watchdog, it's alive enough to not do something dumb
>> to the cluster. If it's not able to kick the timer, the watchdog timer
>> will reset the machine. This works *if* all resources hang when messages
>> stop coming back from the peer (a side effect of corosync's virtual
>> synchrony).
>
> In fact watchdog-implementations (meaning the software that
> kicks the hardware-watchdog) are a little bit smarter - and
> so is SBD.
> By having the watchdog-kicking and observation-code in a
> simple loop that is executed periodically you don't need the
> 'if it is alive enough to do the kicking it will behave well'
> paradigm.
> This burns down to making the critical part of the code very
> small and on top hard to control failures that result in any
> kind of hanging don't bother us.
>
>>
>> So as I understand it, for SBD to be safe, it requires a hardware
>> watchdog timer and a properly configured cluster.
>
> Yes, yes and yes ... as important as fencing I would say ;-)
>
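
As a rough illustration of the loop described in the quoted text
(periodic observation followed by a watchdog kick), here is a minimal
Python sketch. It is a simulation only, not SBD's code: the names
`SimulatedWatchdog` and `simulate`, the three state strings, and the
abstract "ticks" used for time are all assumptions made for the example.
On a real Linux node the kick would presumably be a write to the
watchdog device rather than a method call.

```python
class SimulatedWatchdog:
    """Toy stand-in for a hardware watchdog timer (hypothetical, not SBD code)."""

    def __init__(self, timeout):
        self.timeout = timeout  # ticks allowed to pass without a kick
        self.last_kick = 0      # tick of the most recent kick
        self.fired = False      # True once the timer has expired

    def kick(self, now):
        """Restart the countdown."""
        self.last_kick = now

    def tick(self, now):
        """Advance the timer; it runs independently of the node's software."""
        if now - self.last_kick > self.timeout:
            self.fired = True   # a real device would reset the machine here
        return self.fired


def simulate(states, timeout=3):
    """Return the tick at which the watchdog fires, or None if it never does.

    Each entry in `states` describes one loop period:
      "ok"        - checks pass, so the loop kicks the watchdog
      "unhealthy" - the loop runs but deliberately withholds the kick
      "hung"      - the loop never runs, so no kick can happen
    """
    wd = SimulatedWatchdog(timeout)
    for now, state in enumerate(states, start=1):
        if state == "ok":
            wd.kick(now)
        # In the other two states nothing kicks the timer.
        if wd.tick(now):
            return now
    return None


if __name__ == "__main__":
    print(simulate(["ok"] * 10))                 # None
    print(simulate(["ok", "ok"] + ["hung"] * 5)) # 6
    print(simulate(["unhealthy"] * 5))           # 4
```

The point of the sketch is that a hang and a failed check look the same
to the timer: in both cases the kick is missing and the reset follows
after `timeout` ticks, without the failing software having to cooperate.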

So I gather that for SBD to be reasonably safe, it needs a real hardware
watchdog. I often see SBD recommended as a stonith agent inside a VM,
where by definition we do not have a "hardware watchdog". I still wonder
whether it can be trusted in this case.

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org