On 06/25/2018 12:01 PM, 井上 和徳 wrote:
>> -----Original Message-----
>> From: Klaus Wenninger [mailto:kwenn...@redhat.com]
>> Sent: Wednesday, June 13, 2018 6:40 PM
>> To: Cluster Labs - All topics related to open-source clustering welcomed; 井上 和徳
>> Subject: Re: [ClusterLabs] Questions about SBD behavior
>>
>> On 06/13/2018 10:58 AM, 井上 和徳 wrote:
>>> Thanks for the response.
>>>
>>> As of v1.3.1 and later, I recognize that real quorum is necessary.
>>> I also read this:
>>> https://wiki.clusterlabs.org/wiki/Using_SBD_with_Pacemaker#Watchdog-based_self-fencing_with_resource_recovery
>>>
>>> Related to this, in order to use pacemaker-2.0 we are
>>> confirming the following known issue:
>>>
>>> * When SIGSTOP is sent to a pacemaker process, no failure of the
>>>   resource is detected.
>>>   https://lists.clusterlabs.org/pipermail/users/2016-September/011146.html
>>>   https://lists.clusterlabs.org/pipermail/users/2016-October/011429.html
>>>
>>> I expected SBD to handle this, but nothing detected that any of the
>>> following processes was frozen, so no failure of the resource was
>>> detected either:
>>> - pacemaker-based
>>> - pacemaker-execd
>>> - pacemaker-attrd
>>> - pacemaker-schedulerd
>>> - pacemaker-controld
>>>
>>> I checked the slides below, but I could not find the current status
>>> of this issue.
>>> https://wiki.clusterlabs.org/w/images/1/1a/Recent_Work_and_Future_Plans_for_SBD_1.1.pdf
>> You are right. The issue was already known when I created these slides,
>> so a plan for improving the observation of the pacemaker daemons
>> should probably have gone into them.
>>
> It's good news that there is a plan to improve this,
> so I registered it as a memorandum in CLBZ:
> https://bugs.clusterlabs.org/show_bug.cgi?id=5356
>
> Best Regards

Wasn't there a bug filed before?

Klaus
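For anyone who wants to reproduce the frozen-daemon scenario, here is a
minimal sketch, assuming the Pacemaker 2.0 daemon names listed above and a
cluster with at least one resource that has a recurring monitor:

    # freeze the local executor; recurring monitor operations
    # silently stop being executed from this point on
    kill -STOP $(pidof pacemaker-execd)

    # any resource failure now goes unnoticed: crm_mon keeps showing
    # the last known (healthy) state, no recovery is scheduled, and
    # neither pacemakerd nor sbd reacts to the frozen daemon

    # thaw the daemon again when done
    kill -CONT $(pidof pacemaker-execd)

The same applies to the other daemons in the list.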
>
>> Thanks for bringing this to the table.
>> Guess the issue got a little bit neglected recently.
>>
>>> As a result of our discussion, we want SBD to detect it and reset the
>>> machine.
>> Implementation-wise I would go for some kind of a split
>> solution between pacemaker & SBD: Pacemaker
>> observing the sub-daemons by itself, while there would be
>> some kind of a heartbeat (implicitly via corosync or explicitly)
>> between pacemaker & SBD that assures this internal
>> observation is doing its job properly.
>>
>>> Also, for users who do not have a shared disk or qdevice,
>>> we need an option to work even without real quorum.
>>> (We are going to avoid fence races with the delay attribute:
>>> https://access.redhat.com/solutions/91653
>>> https://access.redhat.com/solutions/1293523)
>> I'm not sure if I get your point here.
>> Watchdog-fencing on a 2-node cluster without an
>> additional qdevice or shared disk is, in my mind, like
>> denying the laws of physics.
>> At the moment I don't see why auto_tie_breaker
>> wouldn't work on a 4-node-and-up cluster here.
>>
>> Regards,
>> Klaus
>>> Best Regards,
>>> Kazunori INOUE
>>>
>>>> -----Original Message-----
>>>> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Klaus Wenninger
>>>> Sent: Friday, May 25, 2018 4:08 PM
>>>> To: users@clusterlabs.org
>>>> Subject: Re: [ClusterLabs] Questions about SBD behavior
>>>>
>>>> On 05/25/2018 07:31 AM, 井上 和徳 wrote:
>>>>> Hi,
>>>>>
>>>>> I am checking the watchdog function of SBD (without a shared block device).
>>>>> In a two-node cluster, if one node is stopped, the watchdog is
>>>>> triggered on the remaining node.
>>>>> Is this the designed behavior?
>>>> SBD without a shared block device doesn't really make sense on
>>>> a two-node cluster.
>>>> The basic idea is - e.g. in the case of a networking problem -
>>>> that a cluster splits up into a quorate and a non-quorate partition.
>>>> The quorate partition stays up, while SBD guarantees a
>>>> reliable watchdog-based self-fencing of the non-quorate partition
>>>> within a defined timeout.
>>>> This idea of course doesn't work with just 2 nodes.
>>>> Taking quorum info from the 2-node feature of corosync (which
>>>> automatically switches on wait-for-all) doesn't help in this case;
>>>> it would instead lead to split-brain.
>>>> What you can do - and what e.g. pcs does automatically - is enable
>>>> auto-tie-breaker instead of two-node in corosync. But that
>>>> still doesn't give you higher availability than that of the
>>>> winner of auto-tie-breaker. (Maybe interesting if you are going
>>>> for a load-balancing scenario that doesn't affect availability, or
>>>> for a transient state while setting up a cluster node-by-node ...)
>>>> What you can do, though, is use qdevice to still have 'real quorum'
>>>> info with just 2 full cluster nodes.
>>>>
>>>> There was quite a lot of discussion around this topic on this
>>>> list previously, if you search the history.
>>>>
>>>> Regards,
>>>> Klaus
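On the fence-race point: the approach in the Red Hat articles above is to
give the fence device of the node you want to survive a static delay, so
that in an even split the two nodes don't shoot each other simultaneously.
A sketch only - hostnames, addresses and credentials are made up, and
fence_ipmilan is just an example agent (parameter names vary a bit between
fence-agents versions):

    # fencing OF node1 is delayed by 10s, so in a 50:50 split
    # node1 wins the shoot-out and survives
    pcs stonith create fence-node1 fence_ipmilan pcmk_host_list=node1 \
        ip=192.0.2.1 username=admin password=secret lanplus=1 delay=10
    pcs stonith create fence-node2 fence_ipmilan pcmk_host_list=node2 \
        ip=192.0.2.2 username=admin password=secret lanplus=1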
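For reference, disk-less (watchdog-only) SBD as described here boils down
to something like the following; the timeout values are only examples:

    # /etc/sysconfig/sbd - note: no SBD_DEVICE, i.e. disk-less mode
    SBD_WATCHDOG_DEV=/dev/watchdog
    SBD_WATCHDOG_TIMEOUT=5

    # let Pacemaker assume a lost, non-quorate node has self-fenced
    # once its watchdog must have fired
    pcs property set stonith-watchdog-timeout=10

stonith-watchdog-timeout should be comfortably larger than
SBD_WATCHDOG_TIMEOUT; twice the value is a common rule of thumb.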
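To illustrate the two-node vs. auto-tie-breaker point, the relevant
corosync.conf section would look roughly like this:

    quorum {
        provider: corosync_votequorum

        # two_node: 1 would keep both nodes quorate after a link loss,
        # which with watchdog-only fencing means split-brain -
        # don't combine the two
        auto_tie_breaker: 1
        auto_tie_breaker_node: lowest   # lowest node id wins an even split
    }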
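And a sketch of the qdevice setup for 'real quorum' with 2 full nodes,
assuming pcs and a third machine (called qnetd-host here) outside the
cluster:

    # on qnetd-host
    yum install corosync-qnetd pcs
    pcs qdevice setup model net --enable --start

    # on one of the cluster nodes
    pcs quorum device add model net host=qnetd-host algorithm=ffsplit

The ffsplit algorithm makes qnetd grant its vote to exactly one partition
of an even split, so the other partition reliably loses quorum and
self-fences via the watchdog.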
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org