On Fri, May 25, 2018 at 10:08 AM, Klaus Wenninger <kwenn...@redhat.com> wrote:
> On 05/25/2018 07:31 AM, 井上 和徳 wrote:
>> Hi,
>>
>> I am checking the watchdog function of SBD (without a shared block device).
>> In a two-node cluster, if one node is stopped, the watchdog is triggered
>> on the remaining node.
>> Is this the designed behavior?
>
> SBD without a shared block-device doesn't really make sense on
> a two-node cluster.
> The basic idea is - e.g. in the case of a networking problem -
> that the cluster splits up into a quorate and a non-quorate partition.
> The quorate partition stays up, while SBD guarantees a
> reliable watchdog-based self-fencing of the non-quorate partition
> within a defined timeout.
Does it require no-quorum-policy=suicide, or does SBD decide completely
independently? I.e., would it also fire with no-quorum-policy=ignore?

> This idea of course doesn't work with just 2 nodes.
> Taking quorum info from the 2-node feature of corosync (which automatically
> switches on wait-for-all) doesn't help in this case; instead it
> would lead to split-brain.

So what you are saying is that SBD ignores quorum information from
corosync and makes its own decision based on a pure count of nodes.
Do I understand that correctly?

> What you can do - and what e.g. pcs does automatically - is enable
> the auto-tie-breaker instead of two_node in corosync. But that
> still doesn't give you higher availability than that of the
> winner of the auto-tie-breaker. (Maybe interesting if you are going
> for a load-balancing scenario that doesn't affect availability, or
> for a transient state while setting up a cluster node-by-node ...)
> What you can do, though, is use qdevice to still have 'real' quorum
> info with just 2 full cluster nodes.
>
> There was quite a lot of discussion around this topic on this
> list previously, if you search the history.
>
> Regards,
> Klaus
>
>>
>> [vmrh75b]# cat /etc/corosync/corosync.conf
>> (snip)
>> quorum {
>>     provider: corosync_votequorum
>>     two_node: 1
>> }
>>
>> [vmrh75b]# cat /etc/sysconfig/sbd
>> # This file has been generated by pcs.
>> SBD_DELAY_START=no
>> ## SBD_DEVICE="/dev/vdb1"
>> SBD_OPTS="-vvv"
>> SBD_PACEMAKER=yes
>> SBD_STARTMODE=always
>> SBD_WATCHDOG_DEV=/dev/watchdog
>> SBD_WATCHDOG_TIMEOUT=5
>>
>> [vmrh75b]# crm_mon -r1
>> Stack: corosync
>> Current DC: vmrh75a (version 2.0.0-0.1.rc4.el7-2.0.0-rc4) - partition with quorum
>> Last updated: Fri May 25 13:36:07 2018
>> Last change: Fri May 25 13:35:22 2018 by root via cibadmin on vmrh75a
>>
>> 2 nodes configured
>> 0 resources configured
>>
>> Online: [ vmrh75a vmrh75b ]
>>
>> No resources
>>
>> [vmrh75b]# pcs property show
>> Cluster Properties:
>>  cluster-infrastructure: corosync
>>  cluster-name: my_cluster
>>  dc-version: 2.0.0-0.1.rc4.el7-2.0.0-rc4
>>  have-watchdog: true
>>  stonith-enabled: false
>>
>> [vmrh75b]# ps -ef | egrep "sbd|coro|pace"
>> root     2169    1  0 13:34 ?  00:00:00 sbd: inquisitor
>> root     2170 2169  0 13:34 ?  00:00:00 sbd: watcher: Pacemaker
>> root     2171 2169  0 13:34 ?  00:00:00 sbd: watcher: Cluster
>> root     2172    1  0 13:34 ?  00:00:00 corosync
>> root     2179    1  0 13:34 ?  00:00:00 /usr/sbin/pacemakerd -f
>> haclust+ 2180 2179  0 13:34 ?  00:00:00 /usr/libexec/pacemaker/pacemaker-based
>> root     2181 2179  0 13:34 ?  00:00:00 /usr/libexec/pacemaker/pacemaker-fenced
>> root     2182 2179  0 13:34 ?  00:00:00 /usr/libexec/pacemaker/pacemaker-execd
>> haclust+ 2183 2179  0 13:34 ?  00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
>> haclust+ 2184 2179  0 13:34 ?  00:00:00 /usr/libexec/pacemaker/pacemaker-schedulerd
>> haclust+ 2185 2179  0 13:34 ?  00:00:00 /usr/libexec/pacemaker/pacemaker-controld
>>
>> [vmrh75b]# pcs cluster stop vmrh75a
>> vmrh75a: Stopping Cluster (pacemaker)...
>> vmrh75a: Stopping Cluster (corosync)...
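(The `two_node: 1` setting shown above is exactly what the reply warns about. The auto-tie-breaker alternative Klaus mentions would replace it with a quorum section along these lines; this is a sketch only, and the tie-breaker choice shown is an illustrative value:)

```
quorum {
    provider: corosync_votequorum
    # Instead of two_node: 1 - on a 50:50 split, one predetermined
    # side keeps quorum and the other loses it (and self-fences).
    auto_tie_breaker: 1
    # 'lowest' makes the partition containing the lowest node ID win;
    # a specific node ID can be given instead (illustrative choice).
    auto_tie_breaker_node: lowest
}
```

As Klaus notes, this trades split-brain for a fixed winner: availability is never better than that of the tie-breaker node itself.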
>>
>> [vmrh75b]# tail -F /var/log/messages
>> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: Our peer on the DC (vmrh75a) is dead
>> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: State transition S_NOT_DC -> S_ELECTION
>> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: State transition S_ELECTION -> S_INTEGRATION
>> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Node vmrh75a state is now lost
>> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Removing all vmrh75a attributes for peer loss
>> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Lost attribute writer vmrh75a
>> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Purged 1 peer with id=1 and/or uname=vmrh75a from the membership cache
>> May 25 13:37:00 vmrh75b pacemaker-fenced[2181]: notice: Node vmrh75a state is now lost
>> May 25 13:37:00 vmrh75b pacemaker-fenced[2181]: notice: Purged 1 peer with id=1 and/or uname=vmrh75a from the membership cache
>> May 25 13:37:00 vmrh75b pacemaker-based[2180]: notice: Node vmrh75a state is now lost
>> May 25 13:37:00 vmrh75b pacemaker-based[2180]: notice: Purged 1 peer with id=1 and/or uname=vmrh75a from the membership cache
>> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
>> May 25 13:37:01 vmrh75b sbd[2171]: cluster: warning: set_servant_health: Connected to corosync but requires both nodes present
>> May 25 13:37:01 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
>> May 25 13:37:01 vmrh75b sbd[2169]: warning: inquisitor_child: cluster health check: UNHEALTHY
>> May 25 13:37:01 vmrh75b sbd[2169]: warning: inquisitor_child: Servant cluster is outdated (age: 226)
>> May 25 13:37:01 vmrh75b sbd[2170]: pcmk: notice: unpack_config: Watchdog will be used via SBD if fencing is required
>> May 25 13:37:01 vmrh75b sbd[2170]: pcmk: info: determine_online_status: Node vmrh75b is online
>> May 25 13:37:01 vmrh75b sbd[2170]: pcmk: info: unpack_node_loop: Node 2 is already processed
>> May 25 13:37:01 vmrh75b sbd[2170]: pcmk: info: unpack_node_loop: Node 2 is already processed
>> May 25 13:37:01 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
>> May 25 13:37:01 vmrh75b corosync[2172]: [TOTEM ] A new membership (192.168.28.132:5712) was formed. Members left: 1
>> May 25 13:37:01 vmrh75b corosync[2172]: [QUORUM] Members[1]: 2
>> May 25 13:37:01 vmrh75b corosync[2172]: [MAIN  ] Completed service synchronization, ready to provide service.
>> May 25 13:37:01 vmrh75b pacemakerd[2179]: notice: Node vmrh75a state is now lost
>> May 25 13:37:01 vmrh75b pacemaker-controld[2185]: notice: Node vmrh75a state is now lost
>> May 25 13:37:01 vmrh75b pacemaker-controld[2185]: warning: Stonith/shutdown of node vmrh75a was not expected
>> May 25 13:37:02 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
>> May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: notice: Watchdog will be used via SBD if fencing is required
>> May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: warning: Blind faith: not fencing unseen nodes
>> May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: notice: Delaying fencing operations until there are resources to manage
>> May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: notice: Calculated transition 0, saving inputs in /var/lib/pacemaker/pengine/pe-input-1410.bz2
>> May 25 13:37:02 vmrh75b pacemaker-controld[2185]: notice: Transition 0 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1410.bz2): Complete
>> May 25 13:37:02 vmrh75b pacemaker-controld[2185]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
>> May 25 13:37:03 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
>> May 25 13:37:03 vmrh75b sbd[2170]: pcmk: notice: unpack_config: Watchdog will be used via SBD if fencing is required
>> May 25 13:37:03 vmrh75b sbd[2170]: pcmk: info: determine_online_status: Node vmrh75b is online
>> May 25 13:37:03 vmrh75b sbd[2170]: pcmk: info: unpack_node_loop: Node 2 is already processed
>> May 25 13:37:03 vmrh75b sbd[2170]: pcmk: info: unpack_node_loop: Node 2 is already processed
>> May 25 13:37:04 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
>> May 25 13:37:05 vmrh75b sbd[2169]: warning: inquisitor_child: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
>> May 25 13:37:05 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
>> May 25 13:37:05 vmrh75b sbd[2169]: warning: inquisitor_child: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
>>
>> Best Regards,
>> Kazunori INOUE
>> _______________________________________________
>> Users mailing list: Users@clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
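(The qdevice option Klaus recommends corresponds, roughly, to a quorum section like the following on both cluster nodes, plus corosync-qnetd running on a third host. This is a sketch; the host name is illustrative, and pcs can generate an equivalent configuration with something like `pcs quorum device add model net host=... algorithm=ffsplit`:)

```
quorum {
    provider: corosync_votequorum
    device {
        model: net
        # extra vote contributed by the quorum device
        votes: 1
        net {
            # third host running corosync-qnetd (illustrative name)
            host: qnetd.example.com
            # ffsplit: in a 50:50 split, exactly one partition
            # receives the device's vote and stays quorate
            algorithm: ffsplit
        }
    }
}
```

Unlike auto_tie_breaker, this gives the two full nodes real quorum arbitration, so either node can survive a split depending on which one the qnetd host can still reach.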