On 02/01/19 15:43 +0100, Jan Pokorný wrote:
> On 28/12/18 05:51 +0900, renayama19661...@ybb.ne.jp wrote:
>> As a result, Pacemaker will stop without stopping the resource.
>
> This might have serious consequences in some scenarios, perhaps
> unless some watchdog-based solution (SBD?) was used as a fencing
> of choice since it would not get defused just as the resource
> wasn't stopped, I think...
Just very recently, I realized that pacemaker is likely not rigorous enough (partly by design, for simplicity, partly through neglect of these constraints) to prevent such "stray started resource" leaks, which verge on resource-level split-brain, at least in theory.

Take, for example, an OCF/LSB resource (hence with only approximate monitoring capabilities by design) that takes unusually long to start. What if pacemaker-execd (lrmd) crashes midway through starting it, leaving the original resource process reparented to PID 1? pacemakerd will restart this child daemon anew and resources will get probed. But because the OCF/LSB resource in question has not finished starting yet (e.g., it double-forks, performs a lengthy initialization between the forks, and only near the finish line creates a pid file, which is also the only indicator for the respective monitor operation), pacemaker on this node reports to its peers that this particular resource is _not_ running locally, leaving them free to run it if the DC decides so. That is, unless the start operation overrides the "on-fail" default, and only if this start/monitor pair would be evaluated as a failed start at all (I don't know whether it would).

What we are observing, then, is an opportunity for a resource-level split-brain to emerge: remember, the resource on the original node, now under PID 1's supervision, may finish its initialization at any moment, and no further probe/monitor is coming there (unless explicitly configured so) to detect this disaster any time soon.
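To make that window concrete, here is a minimal Python sketch of a monitor that, like the hypothetical agent described above, treats the pid file as its sole indicator. The function name and file paths are illustrative assumptions, not taken from any real agent; only the exit-code values 0 (OCF_SUCCESS) and 7 (OCF_NOT_RUNNING) come from the OCF resource agent API.

```python
import os
import tempfile

# Exit codes from the OCF resource agent API.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def pidfile_only_monitor(pidfile: str) -> int:
    """Hypothetical monitor: the pid file is the only sign of life."""
    if os.path.exists(pidfile):
        return OCF_SUCCESS
    return OCF_NOT_RUNNING


with tempfile.TemporaryDirectory() as rundir:
    pidfile = os.path.join(rundir, "daemon.pid")

    # The reparented daemon is still initializing between its forks,
    # so the post-restart probe sees no pid file and says "not running".
    during_init = pidfile_only_monitor(pidfile)

    # Only near the finish line does the daemon write its pid file,
    # but by then nobody on this node is probing any more.
    with open(pidfile, "w") as f:
        f.write(str(os.getpid()))
    after_init = pidfile_only_monitor(pidfile)

print(during_init, after_init)  # prints "7 0"
```

The first result is exactly what the peers get told after the execd restart, even though the resource is well on its way to running.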
This theoretical observation makes the systemd class of resources the only one that universally and relatively safely survives an isolated pacemaker-execd restart. (I am leaving nagios and upstart aside for not having looked at them, and, perhaps naively, assuming that things like double fencing are relatively harmless: it is meant to be downright idempotent when the action is "off", unless it collides with a parallel manual intervention, of course.) Even then, it might be advisable to have systemd sitting on a ticking watchdog just in case, since once it internally "asserts", no further actions are possible until the machine is restarted (unless, indeed, pacemaker can detect this circumstance and panic on its own).

Alternatively, one needs to make sure the OCF/LSB agent's start operation begins by creating what is usually called a lock file. In such a scenario, the after-restart probe will then spot, in combination with the missing pid file, that the resource is still starting, give the pid file some time to actually appear, and, if it does not appear in time, preferably trigger a panic/self-fencing. Any approach of getting hold of a process by scanning procfs is broken (POSIX imposes no snapshot semantics there), especially when containers can be running on that host.

The other alternative, in the current state of affairs and without the OCF/LSB resources in use having been properly scrutinized (the fact that they start promptly may be sufficient), is declaring PCMK_fail_fast=yes in /etc/sysconfig/pacemaker or equivalent.

I apologize in advance for not having verified these scenarios by hand; I wish I had the throughput for that. Sadly, the failure modes are far from being documented. That is best done while creating and implementing the design (with a very desirable feedback loop when running into particular corner cases), rather than requiring reverse engineering (reverse grasping of the intentions, which is prone to misunderstanding) afterwards.
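The lock-file idea for OCF/LSB agents can be sketched in Python as follows, assuming an agent whose start action creates the lock file first and the pid file only once initialization is complete (and whose stop action removes both). The probe function, file names and grace period are hypothetical; only the exit-code values come from the OCF resource agent API. Rather than panicking itself, the sketch reports a hard failure instead of "not running" when the start seems stuck, leaving escalation such as self-fencing to the caller.

```python
import os
import tempfile
import threading
import time

# Exit codes from the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def lock_aware_probe(lockfile: str, pidfile: str, grace_s: float,
                     poll_s: float = 0.05) -> int:
    """Hypothetical probe distinguishing "not started" from "still starting"."""
    deadline = time.monotonic() + grace_s
    while True:
        if os.path.exists(pidfile):
            return OCF_SUCCESS          # initialization has completed
        if not os.path.exists(lockfile):
            return OCF_NOT_RUNNING      # start never began
        if time.monotonic() >= deadline:
            # Still "starting" after the grace period: do not claim
            # "not running"; the caller should escalate instead.
            return OCF_ERR_GENERIC
        time.sleep(poll_s)              # start in progress, keep waiting


def touch(path: str) -> None:
    with open(path, "w"):
        pass


with tempfile.TemporaryDirectory() as rundir:
    lock = os.path.join(rundir, "daemon.lock")
    pid = os.path.join(rundir, "daemon.pid")

    # Nothing on disk: genuinely not running.
    r_idle = lock_aware_probe(lock, pid, grace_s=0.2)

    # Lock file present, pid file never shows up: stuck start.
    touch(lock)
    r_stuck = lock_aware_probe(lock, pid, grace_s=0.2)

    # Lock file present, pid file appears within the grace period.
    writer = threading.Timer(0.1, touch, args=(pid,))
    writer.start()
    r_late = lock_aware_probe(lock, pid, grace_s=2.0)
    writer.join()

print(r_idle, r_stuck, r_late)  # prints "7 1 0"
```

Note that only file existence is checked, in line with the pid file being the agent's sole indicator; no procfs scanning is involved. Keeping both files on a tmpfs such as /run (as is typical on systemd-based systems) would make a stale lock file vanish on reboot rather than outlive the machine.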
Keep calm, things have always been this way :-)

-- 
Nazdar,
Jan (Poki)
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org