>>> Digimer <li...@alteeve.ca> wrote on 07.10.2020 at 23:27 in message
<d8e826da-f7cc-a8c6-9793-ea73e8280...@alteeve.ca>:
> On 2020-10-07 2:35 a.m., Ulrich Windl wrote:
>>>>> Digimer <li...@alteeve.ca> wrote on 07.10.2020 at 05:42 in message
>> <b1b2c412-1cc4-e77a-230e-a5d442370...@alteeve.ca>:
>>> Hi all,
>>>
>>> While developing our program (and not being a production cluster), I
>>> find that when I push broken code to a node, causing the RA to fail to
>>> perform an operation, the node gets fenced. (example below).
>>
>> (I see others have replied, too, but anyway)
>> Specifically it's the "stop" operation that may not fail.
>>
>>>
>>> This brings up a question;
>>>
>>> If a single resource fails for any reason and can't be recovered, but
>>> other resources on the node are still operational, how can I suppress a
>>> self-fence? I'd rather one failed resource than having all resources get
>>> killed (they're VMs, so restarting on the peer is ... disruptive).
>>
>> I think you can (on-fail=block, AFAIR).
>> Note: This is not a political statement for any near elections ;-)
>
> Indeed, and this works. I misunderstood the pcs syntax and applied the
> 'on-fail="stop"' to the monitor operation... Woops.
>
>>> If this is a bad approach (sufficiently bad to justify hard-rebooting
>>> other VMs that had been running on the same node), why is that? Are
>>> there any less-bad options for this scenario?
>>>
>>> Obviously, I would never push untested code to a production system,
>>> but knowing now that this is possible (losing a node with its other VMs
>>> on an RA / code fault), I'm worried about some unintended "oops" causing
>>> the loss of a node.
>>>
>>> For example, would it be possible to have the node try to live migrate
>>> services to the other peer, before self-fencing in a scenario like this?
>>
>> As there is guarantee that migration will succeed without fencing the node

s/there is/there is no/ # sorry

>> it could only be done with a timeout; otherwise the node will be hanging
>> while waiting for migration to succeed.
>
> I figured as much.
> ...

Regards,
Ulrich