On Thu, Aug 4, 2022 at 6:07 AM Lentes, Bernd
<bernd.len...@helmholtz-muenchen.de> wrote:
>
> ----- On 4 Aug, 2022, at 00:27, Reid Wahl nw...@redhat.com wrote:
>
> > Such constraints are unnecessary.
> >
> > Let's say we have two stonith devices called "fence_dev1" and
> > "fence_dev2" that fence nodes 1 and 2, respectively. If node 2 needs
> > to be fenced, and fence_dev2 is running on node 2, node 1 will still
> > use fence_dev2 to fence node 2. The current location of the stonith
> > device only tells us which node is running the recurring monitor
> > operation for that stonith device. The device is available to ALL
> > nodes, unless it's disabled or it's banned from a given node. So
> > these constraints serve no purpose in most cases.
>
> What do you mean by "banned"? "crm resource ban ..."?

Yes. If you run `pcs resource ban fence_dev1 node-1` (I presume `crm
resource ban` does the same thing), then:
- fence_dev1 is not allowed to run on node-1
- node-1 is not allowed to use fence_dev1 to fence a node

If you disable fence_dev1 (the pcs command would be `pcs resource
disable`, which sets the target-role meta attribute to Stopped), then
**no** node can use fence_dev1 to fence a node.

> Is that something different than a location constraint?

It creates a -INFINITY location constraint. The same might also apply
when a stonith device has a finite negative preference for a given
node -- not sure without testing.
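For illustration, this is roughly how it looks with pcs. A sketch
only, using the hypothetical fence_dev1/node-1 names from above; the
crm shell has equivalent commands with slightly different syntax:

    # Ban the device from node-1. Under the hood this creates a
    # -INFINITY location constraint (see `pcs constraint location`).
    pcs resource ban fence_dev1 node-1

    # The same effect, expressed directly as a location constraint:
    pcs constraint location fence_dev1 avoids node-1

    # Disable the device entirely (sets target-role=Stopped). After
    # this, NO node can use fence_dev1 to fence anything.
    pcs resource disable fence_dev1

    # Remove the ban again later:
    pcs resource clear fence_dev1 node-1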
> > If you ban fence_dev2 from node 1, then node 1 won't be able to use
> > fence_dev2 to fence node 2. Likewise, if you ban fence_dev1 from
> > node 1, then node 1 won't be able to use fence_dev1 to fence itself.
> > Usually that's unnecessary anyway, but it may be preferable to power
> > ourselves off if we're the last remaining node and a stop operation
> > fails.
>
> So banning a fencing device from a node means that this node can't
> use the fencing device?
>
> > If ha-idg-2 is in standby, it can still fence ha-idg-1. Since it
> > sounds like you've banned fence_ilo_ha-idg-1 from ha-idg-1, so that
> > it can't run anywhere when ha-idg-2 is in standby, I'm not sure off
> > the top of my head whether fence_ilo_ha-idg-1 is available in this
> > situation. It may not be.
>
> ha-idg-2 was not only in standby, I also stopped pacemaker on that
> node. Then ha-idg-2 can't fence ha-idg-1, I assume.

Correct, ha-idg-2 can't fence ha-idg-1 if ha-idg-2 is stopped.

> > A solution would be to stop banning the stonith devices from their
> > respective nodes. Surely if fence_ilo_ha-idg-1 had been running on
> > ha-idg-1, ha-idg-2 would have been able to use it to fence ha-idg-1.
> > (Again, I'm not sure if that's still true if ha-idg-2 is in standby
> > **and** fence_ilo_ha-idg-1 is banned from ha-idg-1.)
> >
> >> Aug 03 01:19:58 [19364] ha-idg-1 stonith-ng: notice: log_operation:
> >> Operation 'Off' [20705] (call 2 from crmd.19368) for host 'ha-idg-1'
> >> with device 'fence_ilo_ha-idg-2' returned: 0 (OK)
> >> So the cluster starts the resource running on ha-idg-1 and cuts off
> >> ha-idg-2, which isn't necessary.
> >
> > Here, it sounds like the pcmk_host_list setting is either missing or
> > misconfigured for fence_ilo_ha-idg-2. fence_ilo_ha-idg-2 should NOT
> > be usable for fencing ha-idg-1.
> >
> > fence_ilo_ha-idg-1 should be configured with pcmk_host_list=ha-idg-1,
> > and fence_ilo_ha-idg-2 should be configured with
> > pcmk_host_list=ha-idg-2.
>
> I will check that.

> > What happened is that ha-idg-1 used fence_ilo_ha-idg-2 to fence
> > itself. Of course, this only rebooted ha-idg-2. But based on the
> > stonith device configuration, pacemaker on ha-idg-1 believed that
> > ha-idg-1 had been fenced. Hence the "allegedly just fenced" message.
> >
> >> Finally the cluster seems to realize that something went wrong:
> >> Aug 03 01:19:58 [19368] ha-idg-1 crmd: crit: tengine_stonith_notify:
> >> We were allegedly just fenced by ha-idg-1 for ha-idg-1!
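In case it helps, here's roughly how to verify and fix that with pcs.
A sketch only -- the device names are taken from your logs, so check
the output of the first command before changing anything (crm can
edit the same pcmk_host_list attribute):

    # Show the device's current settings, including pcmk_host_list:
    pcs stonith config fence_ilo_ha-idg-2

    # Each device should only claim the host it actually powers off:
    pcs stonith update fence_ilo_ha-idg-1 pcmk_host_list=ha-idg-1
    pcs stonith update fence_ilo_ha-idg-2 pcmk_host_list=ha-idg-2

With that in place, ha-idg-1 can no longer "fence itself" through
fence_ilo_ha-idg-2.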
>
> Bernd

-- 
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/