On 06/13/2018 10:58 AM, 井上 和徳 wrote:
> Thanks for the response.
>
> As of v1.3.1 and later, I understand that real quorum is necessary.
> I also read this:
> https://wiki.clusterlabs.org/wiki/Using_SBD_with_Pacemaker#Watchdog-based_self-fencing_with_resource_recovery
>
> Related to this specification, in order to use pacemaker-2.0 we are
> verifying the following known issue:
>
> * When SIGSTOP is sent to a pacemaker process, no failure of the
>   resource is detected.
>   https://lists.clusterlabs.org/pipermail/users/2016-September/011146.html
>   https://lists.clusterlabs.org/pipermail/users/2016-October/011429.html
>
> I expected this to be handled by SBD, but nothing detected that the
> following processes were frozen, so no failure of the resource was
> detected either:
> - pacemaker-based
> - pacemaker-execd
> - pacemaker-attrd
> - pacemaker-schedulerd
> - pacemaker-controld
>
> I confirmed this behavior, but I could not find the current status of
> how it is being addressed in:
>
> https://wiki.clusterlabs.org/w/images/1/1a/Recent_Work_and_Future_Plans_for_SBD_1.1.pdf

You are right. The issue was already known when I created those slides,
so a plan for improving the observation of the pacemaker daemons
probably should have gone in there.
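For reference, the freeze you describe can be reproduced with something
as simple as the following (illustrative commands only; the daemon names
are those used by pacemaker-2.0, and any of the five sub-daemons will do):

  # freeze the local executor so it can no longer run or report monitors
  pkill -STOP pacemaker-execd

  # while it is frozen, resource failures on this node go unnoticed and,
  # as things stand today, neither pacemaker nor sbd reacts to it

  # thaw the daemon again once the test is done
  pkill -CONT pacemaker-execd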
Thanks for bringing this to the table. I guess the issue got a little
bit neglected recently.

> As a result of our discussion, we want SBD to detect this and reset
> the machine.

Implementation-wise I would go for some kind of split solution between
pacemaker and SBD: pacemaker observing its sub-daemons by itself, while
some kind of heartbeat (implicit via corosync, or explicit) between
pacemaker and SBD assures that this internal observation is doing its
job properly.

> Also, for users who do not have a shared disk or qdevice, we need an
> option that works even without real quorum.
> (Fence races would be avoided with the delay attribute:
> https://access.redhat.com/solutions/91653
> https://access.redhat.com/solutions/1293523)

I'm not sure I get your point here. Watchdog-fencing on a 2-node
cluster without an additional qdevice or shared disk is like denying
the laws of physics, in my mind. And at the moment I don't see why
auto_tie_breaker wouldn't work on a 4-node-and-up cluster here (rough
corosync.conf sketch at the end of this mail).

Regards,
Klaus

> Best Regards,
> Kazunori INOUE
>
>> -----Original Message-----
>> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Klaus
>> Wenninger
>> Sent: Friday, May 25, 2018 4:08 PM
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] Questions about SBD behavior
>>
>> On 05/25/2018 07:31 AM, 井上 和徳 wrote:
>>> Hi,
>>>
>>> I am checking the watchdog function of SBD (without a shared block device).
>>> In a two-node cluster, if the cluster is stopped on one node, the
>>> watchdog is triggered on the remaining node.
>>> Is this the designed behavior?
>> SBD without a shared block device doesn't really make sense on
>> a two-node cluster.
>> The basic idea is - e.g. in the case of a networking problem -
>> that the cluster splits up into a quorate and a non-quorate partition.
>> The quorate partition stays up, while SBD guarantees reliable
>> watchdog-based self-fencing of the non-quorate partition within a
>> defined timeout.
>> This idea of course doesn't work with just 2 nodes.
>> Taking quorum info from the two_node feature of corosync (which
>> automatically switches on wait_for_all) doesn't help in this case but
>> would instead lead to split-brain.
>> What you can do - and what e.g. pcs does automatically - is enable
>> auto_tie_breaker instead of two_node in corosync. But that still
>> doesn't give you higher availability than that of the winner of
>> auto_tie_breaker. (Maybe interesting if you are going for a
>> load-balancing scenario that doesn't affect availability, or for a
>> transient state while setting up a cluster node by node ...)
>> What you can do, though, is use qdevice to still have 'real quorum'
>> info with just 2 full cluster nodes.
>>
>> There was quite a lot of discussion around this topic on this list
>> previously, if you search the history.
>>
>> Regards,
>> Klaus
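P.S.: Since the two corosync variants keep coming up in this thread,
here is roughly what they look like in the quorum section of
corosync.conf. This is only a sketch based on votequorum(5) and
corosync-qdevice(8); the qnetd host name is a placeholder, and the rest
of corosync.conf (totem, nodelist, logging) is omitted:

  # Variant 1: 4 nodes or more, no shared disk or qdevice.
  # auto_tie_breaker decides which partition survives an even split.
  quorum {
      provider: corosync_votequorum
      auto_tie_breaker: 1
      # default; the partition containing the lowest nodeid wins
      auto_tie_breaker_node: lowest
  }

  # Variant 2: 2 full cluster nodes plus corosync-qdevice/qnetd
  # for 'real quorum'.
  quorum {
      provider: corosync_votequorum
      device {
          model: net
          net {
              host: qnetd-host.example.com   # placeholder for the qnetd server
              algorithm: ffsplit
          }
      }
  }

With variant 2 the qnetd server gives its vote to exactly one of the two
partitions after a split, so the other side loses quorum and
watchdog-based self-fencing via SBD works the same way it does in a
larger cluster.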