----- On 4 Aug, 2022, at 00:27, Reid Wahl nw...@redhat.com wrote:
>
> Such constraints are unnecessary.
>
> Let's say we have two stonith devices called "fence_dev1" and
> "fence_dev2" that fence nodes 1 and 2, respectively. If node 2 needs
> to be fenced, and fence_dev2 is running on node 2, node 1 will still
> use fence_dev2 to fence node 2. The current location of the stonith
> device only tells us which node is running the recurring monitor
> operation for that stonith device. The device is available to ALL
> nodes, unless it's disabled or it's banned from a given node. So these
> constraints serve no purpose in most cases.
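For context, the kind of constraint being discussed pins a stonith
device to, or away from, a particular node. A minimal crmsh sketch,
using the generic names from Reid's example (the constraint IDs are
made up):

    # Prefer node2 for fence_dev1's recurring monitor. As explained
    # above, this does NOT decide which node may use the device for
    # fencing; the device stays available to all nodes.
    crm configure location fence_dev1-prefers-node2 fence_dev1 inf: node2

    # A -INFINITY score, however, is a ban: besides keeping the
    # monitor off node1, it also makes fence_dev1 unavailable TO
    # node1, as Reid describes below.
    crm configure location fence_dev1-ban-node1 fence_dev1 -inf: node1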
What do you mean by "banned"? "crm resource ban ..."? Is that
something different from a location constraint?

> If you ban fence_dev2 from node 1, then node 1 won't be able to use
> fence_dev2 to fence node 2. Likewise, if you ban fence_dev1 from node
> 1, then node 1 won't be able to use fence_dev1 to fence itself.
> Usually that's unnecessary anyway, but it may be preferable to power
> ourselves off if we're the last remaining node and a stop operation
> fails.

So banning a fencing device from a node means that this node can't
use the fencing device?

> If ha-idg-2 is in standby, it can still fence ha-idg-1. Since it
> sounds like you've banned fence_ilo_ha-idg-1 from ha-idg-1, so that it
> can't run anywhere when ha-idg-2 is in standby, I'm not sure off the
> top of my head whether fence_ilo_ha-idg-1 is available in this
> situation. It may not be.

ha-idg-2 was not only in standby, I also stopped pacemaker on that
node. Then ha-idg-2 can't fence ha-idg-1, I assume.

> A solution would be to stop banning the stonith devices from their
> respective nodes. Surely if fence_ilo_ha-idg-1 had been running on
> ha-idg-1, ha-idg-2 would have been able to use it to fence ha-idg-1.
> (Again, I'm not sure if that's still true if ha-idg-2 is in standby
> **and** fence_ilo_ha-idg-1 is banned from ha-idg-1.)
>
>> Aug 03 01:19:58 [19364] ha-idg-1 stonith-ng: notice: log_operation:
>> Operation 'Off' [20705] (call 2 from crmd.19368) for host 'ha-idg-1' with
>> device 'fence_ilo_ha-idg-2' returned: 0 (OK)
>> So the cluster starts the resource running on ha-idg-1 and cuts off ha-idg-2,
>> which isn't necessary.
>
> Here, it sounds like the pcmk_host_list setting is either missing or
> misconfigured for fence_ilo_ha-idg-2. fence_ilo_ha-idg-2 should NOT be
> usable for fencing ha-idg-1.
>
> fence_ilo_ha-idg-1 should be configured with pcmk_host_list=ha-idg-1,
> and fence_ilo_ha-idg-2 should be configured with
> pcmk_host_list=ha-idg-2.

I will check that.

> What happened is that ha-idg-1 used fence_ilo_ha-idg-2 to fence
> itself. Of course, this only rebooted ha-idg-2. But based on the
> stonith device configuration, pacemaker on ha-idg-1 believed that
> ha-idg-1 had been fenced. Hence the "allegedly just fenced" message.
>
>> Finally the cluster seems to realize that something went wrong:
>> Aug 03 01:19:58 [19368] ha-idg-1 crmd: crit: tengine_stonith_notify:
>> We were allegedly just fenced by ha-idg-1 for ha-idg-1!

Bernd
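For reference, the per-device registration Reid recommends might look
like this in crmsh. This is only a sketch: the agent name
stonith:fence_ilo4 and the ipaddr/login/passwd placeholders are
assumptions, and only the resource names and pcmk_host_list values
come from the thread:

    # Each iLO can power off exactly one host; pcmk_host_list tells
    # the cluster which one, so fence_ilo_ha-idg-2 can never be
    # selected to fence ha-idg-1 (and vice versa).
    crm configure primitive fence_ilo_ha-idg-1 stonith:fence_ilo4 \
        params ipaddr=... login=... passwd=... \
        pcmk_host_list=ha-idg-1 \
        op monitor interval=60s
    crm configure primitive fence_ilo_ha-idg-2 stonith:fence_ilo4 \
        params ipaddr=... login=... passwd=... \
        pcmk_host_list=ha-idg-2 \
        op monitor interval=60s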
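And to follow the suggestion to stop banning the stonith devices from
their own nodes, the -INFINITY location constraints can simply be
deleted (the constraint IDs below are hypothetical; "crm configure
show type:location" lists the real ones):

    # Remove the bans so that, for example, fence_ilo_ha-idg-1 can
    # still run on, and be used by, ha-idg-1 when ha-idg-2 is in
    # standby or offline.
    crm configure show type:location
    crm configure delete ban-fence_ilo_ha-idg-1 ban-fence_ilo_ha-idg-2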