> -----Original Message-----
> From: Klaus Wenninger [mailto:kwenn...@redhat.com]
> Sent: Wednesday, June 13, 2018 6:40 PM
> To: Cluster Labs - All topics related to open-source clustering welcomed; 井上 和徳
> Subject: Re: [ClusterLabs] Questions about SBD behavior
>
> On 06/13/2018 10:58 AM, 井上 和徳 wrote:
> > Thanks for the response.
> >
> > I understand that, as of v1.3.1 and later, real quorum is necessary.
> > I also read this:
> > https://wiki.clusterlabs.org/wiki/Using_SBD_with_Pacemaker#Watchdog-based_self-fencing_with_resource_recovery
> >
> > Related to this specification, in order to use pacemaker-2.0,
> > we are verifying the following known issue:
> >
> > * When SIGSTOP is sent to a pacemaker process, no failure of the
> >   resource is detected.
> >   https://lists.clusterlabs.org/pipermail/users/2016-September/011146.html
> >   https://lists.clusterlabs.org/pipermail/users/2016-October/011429.html
> >
> > I expected that this was being handled by SBD, but nothing detected
> > that any of the following processes was frozen, so no failure of the
> > resource was detected either:
> >  - pacemaker-based
> >  - pacemaker-execd
> >  - pacemaker-attrd
> >  - pacemaker-schedulerd
> >  - pacemaker-controld
> >
> > I confirmed this issue, but I could not find in the slides below how
> > it is being addressed:
> > https://wiki.clusterlabs.org/w/images/1/1a/Recent_Work_and_Future_Plans_for_SBD_1.1.pdf
> You are right. The issue was already known when I created these slides,
> so a plan for improving the observation of the pacemaker daemons
> should probably have gone into them.
>
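For reference, this is roughly how we reproduce it on our side (a rough
sketch of the test only, not a formal procedure; any of the five daemons
listed above shows the same symptom):

    # freeze one of the pacemaker daemons, e.g. the executor
    kill -STOP $(pidof pacemaker-execd)

    # resource monitors can no longer run on this node, but neither
    # pacemaker nor sbd notices anything and the node is not reset
    crm_mon -1
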
It's good news that there is a plan to improve this, so I registered it
as a memorandum in CLBZ:
https://bugs.clusterlabs.org/show_bug.cgi?id=5356

Best Regards

> Thanks for bringing this to the table.
> Guess the issue got a little bit neglected recently.
>
> >
> > As a result of our discussion, we want SBD to detect it and reset the
> > machine.
> Implementation-wise I would go for some kind of a split solution between
> pacemaker & SBD: Pacemaker observing the sub-daemons by itself, while
> there would be some kind of a heartbeat (implicitly via corosync or
> explicitly) between pacemaker & SBD that assures this internal
> observation is doing its job properly.
> >
> > Also, for users who do not have a shared disk or qdevice,
> > we need an option to work even without real quorum.
> > (Fence races are to be avoided with the delay attribute:
> > https://access.redhat.com/solutions/91653
> > https://access.redhat.com/solutions/1293523)
> I'm not sure if I get your point here.
> Watchdog-fencing on a 2-node cluster without an additional qdevice or
> shared disk is like denying the laws of physics in my mind.
> At the moment I don't see why auto_tie_breaker wouldn't work on a
> cluster of 4 nodes and up here.
>
> Regards,
> Klaus
>
> >
> > Best Regards,
> > Kazunori INOUE
> >
> >> -----Original Message-----
> >> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Klaus Wenninger
> >> Sent: Friday, May 25, 2018 4:08 PM
> >> To: users@clusterlabs.org
> >> Subject: Re: [ClusterLabs] Questions about SBD behavior
> >>
> >> On 05/25/2018 07:31 AM, 井上 和徳 wrote:
> >>> Hi,
> >>>
> >>> I am checking the watchdog function of SBD (without a shared block device).
> >>> In a two-node cluster, if one node is stopped, the watchdog is
> >>> triggered on the remaining node.
> >>> Is this the designed behavior?
> >> SBD without a shared block device doesn't really make sense on a
> >> two-node cluster.
> >> The basic idea is - e.g. in the case of a networking problem - that a
> >> cluster splits up into a quorate and a non-quorate partition. The
> >> quorate partition stays up, while SBD guarantees a reliable
> >> watchdog-based self-fencing of the non-quorate partition within a
> >> defined timeout.
> >> This idea of course doesn't work with just 2 nodes.
> >> Taking quorum info from the 2-node feature of corosync (automatically
> >> switching on wait-for-all) doesn't help in this case but would instead
> >> lead to split-brain.
> >> What you can do - and what e.g. pcs does automatically - is enable the
> >> auto-tie-breaker instead of two-node in corosync. But that still
> >> doesn't give you a higher availability than that of the winner of
> >> auto-tie-breaker. (Maybe interesting if you are going for a
> >> load-balancing scenario that doesn't affect availability, or for a
> >> transient state while setting up a cluster node-by-node ...)
> >> What you can do, though, is use qdevice to still have 'real-quorum'
> >> info with just 2 full cluster-nodes.
> >>
> >> There was quite a lot of discussion around this topic on this thread
> >> previously if you search the history.
> >>
> >> Regards,
> >> Klaus
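
For anyone following along, a minimal corosync.conf quorum-section sketch
of the alternatives described above (untested here; the qdevice host name
and the commented values are placeholders only):

    quorum {
        provider: corosync_votequorum

        # plain 2-node setup; two_node implies wait_for_all, which is
        # exactly what does not help for watchdog-only SBD on 2 nodes
        # two_node: 1

        # alternative: let a predetermined side win when the cluster splits
        auto_tie_breaker: 1
        auto_tie_breaker_node: lowest

        # or keep 'real quorum' with just 2 full nodes by adding a qdevice
        # device {
        #     model: net
        #     votes: 1
        #     net {
        #         host: qnetd.example.com
        #         algorithm: ffsplit
        #     }
        # }
    }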
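
And regarding the fence races mentioned further up, a sketch of the usual
delay-based workaround (the stonith device names and values are only
examples; the fence agent's own 'delay' parameter from the Red Hat
articles works the same way as pacemaker's generic pcmk_delay_base /
pcmk_delay_max used here):

    # delay fencing of node1 so that node1 wins a simultaneous fence race
    pcs stonith update fence-node1 pcmk_delay_base=10s

    # or add a random delay on both devices to break the symmetry
    pcs stonith update fence-node1 pcmk_delay_max=10s
    pcs stonith update fence-node2 pcmk_delay_max=10s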

_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org