On Mon, 02 Apr 2018 09:02:24 -0500
Ken Gaillot <kgail...@redhat.com> wrote:

> On Mon, 2018-04-02 at 10:54 +0200, Jehan-Guillaume de Rorthais wrote:
> > On Sun, 1 Apr 2018 09:01:15 +0300
> > Andrei Borzenkov <arvidj...@gmail.com> wrote:
[...]
> > > In a two-node cluster you can set pcmk_delay_max so that both
> > > nodes do not attempt fencing simultaneously.
> >
> > I'm not sure I understand the doc correctly with regard to this
> > property. Does pcmk_delay_max delay the request itself or the
> > execution of the request?
> >
> > In other words, is it:
> >
> >   delay -> fence query -> fencing action
> >
> > or
> >
> >   fence query -> delay -> fencing action
> >
> > ?
> >
> > The first definition would solve this issue, but not the second.
> > As I understand it, as soon as the fence query has been sent, the
> > node status is "UNCLEAN (online)".
>
> The latter -- you're correct, the node is already unclean by that
> time. Since the stop did not succeed, the node must be fenced to
> continue safely.

Thank you for this clarification. Would you like a patch adding this
clarification to the documentation?
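For the record, as far as I understand it, pcmk_delay_max is set as an
instance attribute of the fence device itself, and the cluster then
waits a random delay of up to that maximum before executing the fencing
action through that device. A rough sketch with pcs (the device name,
agent parameters and credentials below are placeholders, not taken from
our setup):

  # random delay of up to 15s before this device fences, so both nodes
  # of a two-node cluster do not shoot each other simultaneously
  pcs stonith create fence-vm1 fence_vmware_soap \
      ipaddr=vcenter.example.com login=fenceuser passwd=fencepass \
      ssl=1 pcmk_host_list=vm1 pcmk_delay_max=15

  # or, on an existing fence device:
  pcs stonith update fence-vm1 pcmk_delay_max=15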
> > > > The first node did, but no FA was then able to fence the second
> > > > one. So the node stayed DC and was reported as "UNCLEAN
> > > > (online)".
> > > >
> > > > We were able to fix the original resource problem, but not to
> > > > avoid the useless fencing of the second node.
> > > >
> > > > My questions are:
> > > >
> > > > 1. is it possible to cancel the fencing request?
> > > > 2. is it possible to reset the node status to "online"?
> > >
> > > Not that I'm aware of.
> >
> > Argh!
> >
> > ++
>
> You could fix the problem with the stopped service manually, then run
> "stonith_admin --confirm=<NODENAME>" (or higher-level tool
> equivalent). That tells the cluster that you took care of the issue
> yourself, so fencing can be considered complete.

Oh, OK. I was wondering whether it could help.

For the complete story: while I was working on this cluster, we first
tried to "unfence" the node using "stonith_admin --unfence <nodename>"
... and it actually rebooted the node (using fence_vmware_soap) without
cleaning its status?? ... So we eventually cleaned the status using
"--confirm" after the complete reboot.

Thank you for this clarification again.

> The catch there is that the cluster will assume you stopped the node,
> and all services on it are stopped. That could potentially cause some
> headaches if it's not true. I'm guessing that if you unmanaged all
> the resources on it first, then confirmed fencing, the cluster would
> detect everything properly, then you could re-manage.

Good to know. Thanks again.
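For the archives, if we hit this again, I suppose the sequence Ken
describes would look roughly like this (the resource and node names
below are only placeholders, using pcs as the higher-level tool):

  # take the resources on the affected node out of cluster management
  pcs resource unmanage pgsql-ha
  pcs resource unmanage cluster-vip

  # fix the underlying problem by hand, then tell the cluster the node
  # has been dealt with, so the pending fencing is considered complete
  stonith_admin --confirm=node2

  # once the cluster has detected everything properly, hand control back
  pcs resource manage pgsql-ha
  pcs resource manage cluster-vip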