> On 11 Apr 2019, at 20:00, Klaus Wenninger <kwenn...@redhat.com> wrote:
>
> On 4/11/19 5:27 PM, Олег Самойлов wrote:
>> Hi all.
>> I am developing an HA PostgreSQL cluster for 2 or 3 datacenters. In
>> case of a datacenter failure (blackout) fencing will not work and
>> will prevent switching to the working DC, so I disabled fencing. The
>> cluster relies on quorum, and for the 2-DC case I added a quorum
>> device in a third DC. But I still somehow need to handle
> Why would you disable fencing? SBD with watchdog-fencing (no shared
> disk) is made for exactly that use-case but you need fencing to
> be enabled and stonith-watchdog-timeout to be set to roughly 2x the
> watchdog-timeout.

Interesting. There is a lot in the documentation about using sbd with
1, 2, or 3 block devices, but nothing about using it without block
devices, except one sentence saying that it is possible. :)
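If I understand the suggestion correctly, the diskless variant would
look roughly like this (a sketch on my side; the 5-second value is just
the sbd package default and /dev/watchdog is my softdog device, so both
are assumptions to adjust):

    # /etc/sysconfig/sbd -- SBD_DEVICE stays unset (watchdog-only mode)
    SBD_WATCHDOG_DEV=/dev/watchdog
    SBD_WATCHDOG_TIMEOUT=5

    # cluster side: stonith-watchdog-timeout roughly 2x the watchdog-timeout
    pcs property set stonith-enabled=true
    pcs property set stonith-watchdog-timeout=10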
> Regarding a node restart to be triggered that shouldn't make much
> difference but if you disable fencing you won't get the remaining
> cluster to wait for the missing node to be reset and proceed
> afterwards (regardless if the lost node shows up again or not).

Yep, in my case this will be good for floating IPs.

>> cases where corosync or pacemaker freezes. For this I use a hardware
>> watchdog or softdog and SBD as the watchdog daemon (without shared
>> devices). With that, if I kill corosync or pacemakerd, all is fine:
>> the node is restarted. And if I freeze sbd with `killall -s STOP
>> sbd`, all is fine, it reboots. But if I freeze corosync or pacemakerd
>> with `killall -s STOP` or with `ifdown eth0` (corosync freezes in
>> this case), nothing happens. The question is: is this fixed in the
>> master branch or in 1.4.0? (I use the CentOS RPMs: sbd v1.3.1.) Or
>> where should I look (which file, which function) to fix this?
> Referring to the above I'm not sure how you did configure sbd.

Just:

    pcs stonith sbd enable
    pcs property set stonith-enabled=false

Now I have changed it to:

    pcs stonith sbd enable
    pcs property set stonith-enabled=true
    pcs property set stonith-watchdog-timeout=12

> ifdown of the corosync-interface definitely gives me a reboot on a
> cluster with corosync-3 and current sbd from master.

I tested with corosync 2.4.3 (the default for CentOS 7). Or maybe in
your case the reboot was actually triggered by fencing. But it does not
matter, as long as the watchdog works as expected.

> But iirc there was an improvement regarding this in corosync.
> Freezing corosync or pacemakerd on the other hand doesn't trigger
> anything. For doing a regular ping to corosync via cpg there is an
> outstanding PR that should help here - unfortunately needs to be
> rebased to current sbd (don't find it atm - strange)
>
> Regarding pacemakerd that should be a little bit more complicated as
> pacemakerd is just the main control daemon.
> So if you freeze that it shouldn't be harmful for the first but of
> course as pacemakerd is doing the observation of the rest of the
> pacemaker-daemons it should be somehow watchdog-observed. iirc there
> were some tests by hideo using corosync-watchdog-device-integration.
> But these attempts unfortunately slept in as well. You should find
> some discussion in the mailinglist-archives about it. Unfortunately
> having corosync open a watchdog-device makes it fight with sbd for
> that resource. But a generic solution isn't that simple as not every
> setup is using sbd.

Well, I see: freezing of the pacemaker daemons is not monitored by the
watchdog daemon (sbd). That is strange; I see two «watcher» processes
from sbd, one for corosync and one for pacemakerd, so they must be
doing something useful. :) I want the behaviour to be at least the
same as with normal fencing: with fencing, if corosync or pacemaker
freezes, the failed node gets fenced.
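To recap my test matrix in one place (the same commands quoted above,
run as root on one cluster node):

    killall corosync             # node is restarted -- fine
    killall pacemakerd           # node is restarted -- fine
    killall -s STOP sbd          # watchdog no longer fed, node reboots -- fine
    killall -s STOP corosync     # nothing happens -- the problem
    killall -s STOP pacemakerd   # nothing happens -- the problem
    ifdown eth0                  # corosync freezes, nothing happens (corosync 2.4.3)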