On 05/25/2018 07:31 AM, 井上 和徳 wrote:
> Hi,
>
> I am checking the watchdog function of SBD (without a shared block device).
> In a two-node cluster, if the cluster is stopped on one node, the watchdog
> is triggered on the remaining node.
> Is this the designed behavior?
SBD without a shared block device doesn't really make sense on a two-node
cluster. The basic idea is that - e.g. in the case of a networking problem -
a cluster splits into a quorate and a non-quorate partition. The quorate
partition stays up, while SBD guarantees reliable watchdog-based self-fencing
of the non-quorate partition within a defined timeout. This idea of course
doesn't work with just 2 nodes.

Taking quorum info from the two_node feature of corosync (which automatically
switches on wait_for_all) doesn't help in this case but would instead lead to
split-brain.

What you can do - and what e.g. pcs does automatically - is enable
auto_tie_breaker instead of two_node in corosync. But that still doesn't give
you higher availability than that of the auto_tie_breaker winner. (Maybe
interesting if you are going for a load-balancing scenario that doesn't
affect availability, or for a transient state while setting up a cluster
node-by-node ...)

What you can do, though, is use qdevice to still have 'real' quorum info with
just 2 full cluster nodes.

There was quite a lot of discussion around this topic on this list
previously, if you search the history.

Regards,
Klaus

> [vmrh75b]# cat /etc/corosync/corosync.conf
> (snip)
> quorum {
>     provider: corosync_votequorum
>     two_node: 1
> }
>
> [vmrh75b]# cat /etc/sysconfig/sbd
> # This file has been generated by pcs.
> SBD_DELAY_START=no
> ## SBD_DEVICE="/dev/vdb1"
> SBD_OPTS="-vvv"
> SBD_PACEMAKER=yes
> SBD_STARTMODE=always
> SBD_WATCHDOG_DEV=/dev/watchdog
> SBD_WATCHDOG_TIMEOUT=5
>
> [vmrh75b]# crm_mon -r1
> Stack: corosync
> Current DC: vmrh75a (version 2.0.0-0.1.rc4.el7-2.0.0-rc4) - partition with quorum
> Last updated: Fri May 25 13:36:07 2018
> Last change: Fri May 25 13:35:22 2018 by root via cibadmin on vmrh75a
>
> 2 nodes configured
> 0 resources configured
>
> Online: [ vmrh75a vmrh75b ]
>
> No resources
>
> [vmrh75b]# pcs property show
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: my_cluster
>  dc-version: 2.0.0-0.1.rc4.el7-2.0.0-rc4
>  have-watchdog: true
>  stonith-enabled: false
>
> [vmrh75b]# ps -ef | egrep "sbd|coro|pace"
> root      2169     1  0 13:34 ?  00:00:00 sbd: inquisitor
> root      2170  2169  0 13:34 ?  00:00:00 sbd: watcher: Pacemaker
> root      2171  2169  0 13:34 ?  00:00:00 sbd: watcher: Cluster
> root      2172     1  0 13:34 ?  00:00:00 corosync
> root      2179     1  0 13:34 ?  00:00:00 /usr/sbin/pacemakerd -f
> haclust+  2180  2179  0 13:34 ?  00:00:00 /usr/libexec/pacemaker/pacemaker-based
> root      2181  2179  0 13:34 ?  00:00:00 /usr/libexec/pacemaker/pacemaker-fenced
> root      2182  2179  0 13:34 ?  00:00:00 /usr/libexec/pacemaker/pacemaker-execd
> haclust+  2183  2179  0 13:34 ?  00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
> haclust+  2184  2179  0 13:34 ?  00:00:00 /usr/libexec/pacemaker/pacemaker-schedulerd
> haclust+  2185  2179  0 13:34 ?  00:00:00 /usr/libexec/pacemaker/pacemaker-controld
>
> [vmrh75b]# pcs cluster stop vmrh75a
> vmrh75a: Stopping Cluster (pacemaker)...
> vmrh75a: Stopping Cluster (corosync)...
>
> [vmrh75b]# tail -F /var/log/messages
> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: Our peer on the DC (vmrh75a) is dead
> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: State transition S_NOT_DC -> S_ELECTION
> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: State transition S_ELECTION -> S_INTEGRATION
> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Node vmrh75a state is now lost
> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Removing all vmrh75a attributes for peer loss
> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Lost attribute writer vmrh75a
> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Purged 1 peer with id=1 and/or uname=vmrh75a from the membership cache
> May 25 13:37:00 vmrh75b pacemaker-fenced[2181]: notice: Node vmrh75a state is now lost
> May 25 13:37:00 vmrh75b pacemaker-fenced[2181]: notice: Purged 1 peer with id=1 and/or uname=vmrh75a from the membership cache
> May 25 13:37:00 vmrh75b pacemaker-based[2180]: notice: Node vmrh75a state is now lost
> May 25 13:37:00 vmrh75b pacemaker-based[2180]: notice: Purged 1 peer with id=1 and/or uname=vmrh75a from the membership cache
> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
> May 25 13:37:01 vmrh75b sbd[2171]: cluster: warning: set_servant_health: Connected to corosync but requires both nodes present
> May 25 13:37:01 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
> May 25 13:37:01 vmrh75b sbd[2169]: warning: inquisitor_child: cluster health check: UNHEALTHY
> May 25 13:37:01 vmrh75b sbd[2169]: warning: inquisitor_child: Servant cluster is outdated (age: 226)
> May 25 13:37:01 vmrh75b sbd[2170]: pcmk: notice: unpack_config: Watchdog will be used via SBD if fencing is required
> May 25 13:37:01 vmrh75b sbd[2170]: pcmk: info: determine_online_status: Node vmrh75b is online
> May 25 13:37:01 vmrh75b sbd[2170]: pcmk: info: unpack_node_loop: Node 2 is already processed
> May 25 13:37:01 vmrh75b sbd[2170]: pcmk: info: unpack_node_loop: Node 2 is already processed
> May 25 13:37:01 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
> May 25 13:37:01 vmrh75b corosync[2172]: [TOTEM ] A new membership (192.168.28.132:5712) was formed. Members left: 1
> May 25 13:37:01 vmrh75b corosync[2172]: [QUORUM] Members[1]: 2
> May 25 13:37:01 vmrh75b corosync[2172]: [MAIN  ] Completed service synchronization, ready to provide service.
> May 25 13:37:01 vmrh75b pacemakerd[2179]: notice: Node vmrh75a state is now lost
> May 25 13:37:01 vmrh75b pacemaker-controld[2185]: notice: Node vmrh75a state is now lost
> May 25 13:37:01 vmrh75b pacemaker-controld[2185]: warning: Stonith/shutdown of node vmrh75a was not expected
> May 25 13:37:02 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
> May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: notice: Watchdog will be used via SBD if fencing is required
> May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: warning: Blind faith: not fencing unseen nodes
> May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: notice: Delaying fencing operations until there are resources to manage
> May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: notice: Calculated transition 0, saving inputs in /var/lib/pacemaker/pengine/pe-input-1410.bz2
> May 25 13:37:02 vmrh75b pacemaker-controld[2185]: notice: Transition 0 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1410.bz2): Complete
> May 25 13:37:02 vmrh75b pacemaker-controld[2185]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
> May 25 13:37:03 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
> May 25 13:37:03 vmrh75b sbd[2170]: pcmk: notice: unpack_config: Watchdog will be used via SBD if fencing is required
> May 25 13:37:03 vmrh75b sbd[2170]: pcmk: info: determine_online_status: Node vmrh75b is online
> May 25 13:37:03 vmrh75b sbd[2170]: pcmk: info: unpack_node_loop: Node 2 is already processed
> May 25 13:37:03 vmrh75b sbd[2170]: pcmk: info: unpack_node_loop: Node 2 is already processed
> May 25 13:37:04 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
> May 25 13:37:05 vmrh75b sbd[2169]: warning: inquisitor_child: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
> May 25 13:37:05 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
> May 25 13:37:05 vmrh75b sbd[2169]: warning: inquisitor_child: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
>
> Best Regards,
> Kazunori INOUE
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
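For reference, the two alternatives to two_node discussed above - auto_tie_breaker and qdevice - might look roughly like this in the quorum section of corosync.conf. This is a hedged sketch, not a configuration from the thread: the option names are standard corosync/votequorum options, but the qnetd host name is a placeholder and the two blocks are mutually exclusive alternatives, not meant to appear in one file.

```ini
# Alternative 1: auto_tie_breaker instead of two_node.
# In a 50:50 split, the partition containing the node with the
# lowest node ID (by default) keeps quorum; the other loses it.
quorum {
    provider: corosync_votequorum
    auto_tie_breaker: 1
}

# Alternative 2: a corosync-qdevice arbitrator supplies a real
# third vote, so either node can survive alone if the qnetd
# server agrees. The host below is a placeholder.
quorum {
    provider: corosync_votequorum
    device {
        model: net
        votes: 1
        net {
            host: qnetd.example.com   # placeholder qnetd server
            algorithm: ffsplit
        }
    }
}
```

With either variant, SBD's Pacemaker/cluster watchers would see a quorate single-node partition as healthy instead of declaring UNHEALTHY as in the log above.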