On 04/03/2018 05:43 PM, Ken Gaillot wrote:
> On Tue, 2018-04-03 at 07:36 +0200, Klaus Wenninger wrote:
>> On 04/02/2018 04:02 PM, Ken Gaillot wrote:
>>> On Mon, 2018-04-02 at 10:54 +0200, Jehan-Guillaume de Rorthais wrote:
>>>> On Sun, 1 Apr 2018 09:01:15 +0300 Andrei Borzenkov <arvidj...@gmail.com> wrote:
>>>>> On 31.03.2018 23:29, Jehan-Guillaume de Rorthais wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I experienced a problem in a two-node cluster. It has one FA per
>>>>>> node, and location constraints to keep each of them off the node it
>>>>>> is supposed to interrupt.
>>>>> If you mean a stonith resource - as far as I know, a location
>>>>> constraint does not affect stonith operations; it only changes where
>>>>> the monitor action is performed.
>>>> Sure.
>>>>
>>>>> You can create two stonith resources and declare that each can fence
>>>>> only a single node, but that is not a location constraint, it is
>>>>> resource configuration. Showing your configuration would be helpful
>>>>> to avoid guessing.
>>>> True, I should have done that. A conf is worth a thousand words :)
>>>>
>>>> crm conf<<EOC
>>>>
>>>> primitive fence_vm_srv1 stonith:fence_virsh \
>>>>     params pcmk_host_check="static-list" pcmk_host_list="srv1" \
>>>>     ipaddr="192.168.2.1" login="<user>" \
>>>>     identity_file="/root/.ssh/id_rsa" \
>>>>     port="srv1-d8" action="off" \
>>>>     op monitor interval=10s
>>>>
>>>> location fence_vm_srv1-avoids-srv1 fence_vm_srv1 -inf: srv1
>>>>
>>>> primitive fence_vm_srv2 stonith:fence_virsh \
>>>>     params pcmk_host_check="static-list" pcmk_host_list="srv2" \
>>>>     ipaddr="192.168.2.1" login="<user>" \
>>>>     identity_file="/root/.ssh/id_rsa" \
>>>>     port="srv2-d8" action="off" \
>>>>     op monitor interval=10s
>>>>
>>>> location fence_vm_srv2-avoids-srv2 fence_vm_srv2 -inf: srv2
>>>>
>>>> EOC
>>>>
>> -inf constraints like that should effectively prevent stonith actions
>> from being executed on those nodes.
> It shouldn't ...
>
> Pacemaker respects target-role=Started/Stopped for controlling
> execution of fence devices, but location (or even whether the device is
> "running" at all) only affects monitors, not execution.
>
>> Though there are a few issues with location constraints
>> and stonith devices.
>>
>> When stonithd brings up the devices from the cib, it runs the parts of
>> pengine that fully evaluate these constraints, and it would disable the
>> stonith device if the resource is unrunnable on that node.
> That should be true only for target-role, not everything that affects
> runnability.
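For reference, target-role is the one mechanism Ken says Pacemaker honours for fence-device execution. A minimal sketch (assuming crmsh and the fence_vm_srv1 device from the configuration earlier in the thread) of taking a device out of service that way:

```shell
# Sketch, assuming crmsh and the fence_vm_srv1 device defined earlier.
# "crm resource stop" sets target-role=Stopped, which actually removes
# the device from fence execution (unlike a -inf location constraint,
# which only moves the monitor):
crm resource stop fence_vm_srv1

# Setting target-role=Started makes it usable for fencing again:
crm resource start fence_vm_srv1
```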
cib_device_update bails out via a removal of the device if

- role == stopped
- the node is not in the allowed_nodes list of the stonith resource
- the weight is negative

Wouldn't that include a -inf rule for a node?

It is of course clear that no pengine decision to start a stonith
resource is required for it to be used for fencing.

Regards,
Klaus

>
>> But this part is not retriggered for location constraints with
>> attributes or other content that would change dynamically. So one has
>> to stick with constraints as simple and static as those in the example
>> above.
>>
>> Regarding adding/removing location constraints dynamically, I remember
>> a bug that should have been fixed around 1.1.18 that led to improper
>> handling, and actual usage, of stonith devices disabled or banned from
>> certain nodes.
>>
>> Regards,
>> Klaus
>>
>>>>>> During some tests, a ms resource raised an error during the stop
>>>>>> action on both nodes. So both nodes were supposed to be fenced.
>>>>> In a two-node cluster you can set pcmk_delay_max so that both nodes
>>>>> do not attempt fencing simultaneously.
>>>> I'm not sure I understand the doc correctly with regard to this
>>>> property. Does pcmk_delay_max delay the request itself or the
>>>> execution of the request?
>>>>
>>>> In other words, is it:
>>>>
>>>> delay -> fence query -> fencing action
>>>>
>>>> or
>>>>
>>>> fence query -> delay -> fencing action
>>>>
>>>> ?
>>>>
>>>> The first definition would solve this issue, but not the second. As I
>>>> understand it, as soon as the fence query has been sent, the node
>>>> status is "UNCLEAN (online)".
>>> The latter -- you're correct, the node is already unclean by that
>>> time. Since the stop did not succeed, the node must be fenced to
>>> continue safely.
>> Well, pcmk_delay_base/max are made for the case where both nodes in a
>> two-node cluster lose contact and each sees the other as unclean.
>> If the loser gets fenced, its view of the partner node becomes
>> irrelevant.
>>
>>>>>> The first node did, but no FA was then able to fence the second
>>>>>> one. So that node stayed DC and was reported as "UNCLEAN (online)".
>>>>>>
>>>>>> We were able to fix the original resource problem, but not to avoid
>>>>>> the useless fencing of the second node.
>>>>>>
>>>>>> My questions are:
>>>>>>
>>>>>> 1. is it possible to cancel the fencing request?
>>>>>> 2. is it possible to reset the node status to "online"?
>>>>> Not that I'm aware of.
>>>> Argh!
>>>>
>>>> ++
>>> You could fix the problem with the stopped service manually, then run
>>> "stonith_admin --confirm=<NODENAME>" (or a higher-level tool
>>> equivalent). That tells the cluster that you took care of the issue
>>> yourself, so fencing can be considered complete.
>>>
>>> The catch there is that the cluster will assume you stopped the node,
>>> and that all services on it are stopped. That could potentially cause
>>> some headaches if it's not true. I'm guessing that if you unmanaged
>>> all the resources on it first, then confirmed fencing, the cluster
>>> would detect everything properly, then you could re-manage.
>> _______________________________________________
>> Users mailing list: Users@clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
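The manual-recovery sequence Ken outlines can be sketched as follows. This is a sketch only: "srv2" and "my_ms_resource" are placeholder names, and it assumes the underlying failure has already been repaired by hand on the stuck node.

```shell
# Sketch of the recovery Ken describes; node and resource names are
# placeholders. Prerequisite: the original problem (here, the failed
# stop) has already been fixed manually on srv2.

# 1. Unmanage the resources on the node, so the cluster does not act on
#    its assumption that everything there is stopped:
crm resource unmanage my_ms_resource        # repeat per resource

# 2. Tell the cluster the node was dealt with manually; the pending
#    fencing is then considered complete:
stonith_admin --confirm=srv2

# 3. Once crm_mon shows a consistent view, hand control back:
crm resource manage my_ms_resource
```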