On 3/3/20 11:22 PM, wf...@niif.hu wrote:
Hi,

I suffered unexpected fencing under Pacemaker 2.0.1.  I set a resource
to unmanaged (crm_resource -r vm-invtest -m -p is-managed -v false),
then played with ocf-tester, which left the resource stopped.  Finally I
deleted the resource (crm_resource -r vm-invtest --delete -t primitive),
which led to:

pacemaker-controld[11670]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
pacemaker-schedulerd[11669]:  notice: Clearing failure of vm-invtest on inv1 because resource parameters have changed
pacemaker-schedulerd[11669]:  warning: Processing failed monitor of vm-invtest on inv1: not running
pacemaker-schedulerd[11669]:  warning: Detected active orphan vm-invtest running on inv1
pacemaker-schedulerd[11669]:  notice: Clearing failure of vm-invtest on inv1 because it is orphaned
pacemaker-schedulerd[11669]:  notice:  * Stop       vm-invtest       (  inv1 )   due to node availability
pacemaker-schedulerd[11669]:  notice: Calculated transition 959, saving inputs in /var/lib/pacemaker/pengine/pe-input-87.bz2
pacemaker-controld[11670]:  notice: Initiating stop operation vm-invtest_stop_0 on inv1
pacemaker-controld[11670]:  notice: Transition 959 aborted by deletion of lrm_rsc_op[@id='vm-invtest_last_failure_0']: Resource operation removal
pacemaker-controld[11670]:  warning: Action 6 (vm-invtest_stop_0) on inv1 failed (target: 0 vs. rc: 6): Error
pacemaker-controld[11670]:  notice: Transition 959 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-87.bz2): Complete
pacemaker-schedulerd[11669]:  warning: Processing failed stop of vm-invtest on inv1: not configured
pacemaker-schedulerd[11669]:  error: Preventing vm-invtest from re-starting anywhere: operation stop failed 'not configured' (6)
pacemaker-schedulerd[11669]:  warning: Processing failed stop of vm-invtest on inv1: not configured
pacemaker-schedulerd[11669]:  error: Preventing vm-invtest from re-starting anywhere: operation stop failed 'not configured' (6)
pacemaker-schedulerd[11669]:  warning: Cluster node inv1 will be fenced: vm-invtest failed there
pacemaker-schedulerd[11669]:  warning: Detected active orphan vm-invtest running on inv1
pacemaker-schedulerd[11669]:  warning: Scheduling Node inv1 for STONITH
pacemaker-schedulerd[11669]:  notice: Stop of failed resource vm-invtest is implicit after inv1 is fenced
pacemaker-schedulerd[11669]:  notice:  * Fence (reboot) inv1 'vm-invtest failed there'
pacemaker-schedulerd[11669]:  notice:  * Move       fencing-inv3     ( inv1 -> inv2 )
pacemaker-schedulerd[11669]:  notice:  * Stop       vm-invtest       (         inv1 )   due to node availability

The OCF resource agent (on inv1) reported that it failed to validate one
of the attributes passed to it for the stop operation, hence the "not
configured" error, which caused the fencing.  Is there a way to find out
what attributes were passed to the OCF agent in that fateful invocation?
I've got pe-input files, Pacemaker detail logs and a hard time wading
through them.  I failed to reproduce the issue till now (but I haven't
rewound the CIB yet).


Hi Feri,

> Is there a way to find out what attributes were passed to the OCF agent in that fateful invocation?

Essentially the same attributes as for any other operation run while the resource was configured. The only difference is the ACTION, which is 'stop' when the resource is being stopped.

Since you have the pe-input files, which contain the attributes of the resource, you can read the attributes and their values from there.
==
For example, when I tried deleting a test resource of the same name, the following could be found in the pe-input file:

...
      <primitive class="ocf" id="vm-invtest" provider="pacemaker" type="Dummy">
        <meta_attributes id="vm-invtest-meta_attributes">
          <nvpair id="vm-invtest-meta_attributes-target-role" name="target-role" value="Stopped"/>
        </meta_attributes>
        <instance_attributes id="vm-invtest-instance_attributes">
          <nvpair id="vm-invtest-instance_attributes-fake" name="fake" value="some_value"/>
        </instance_attributes>
        <operations>
          <op id="vm-invtest-migrate_from-interval-0s" interval="0s" name="migrate_from" timeout="20s"/>
          <op id="vm-invtest-migrate_to-interval-0s" interval="0s" name="migrate_to" timeout="20s"/>
          <op id="vm-invtest-monitor-interval-10s" interval="10s" name="monitor" timeout="20s"/>
          <op id="vm-invtest-reload-interval-0s" interval="0s" name="reload" timeout="20s"/>
          <op id="vm-invtest-start-interval-0s" interval="0s" name="start" timeout="20s"/>
          <op id="vm-invtest-stop-interval-0s" interval="0s" name="stop" timeout="20s"/>
        </operations>
      </primitive>
...
From the above you can see that the cluster will stop the resource because of 'name="target-role" value="Stopped"'. You can also see that this resource has one attribute (nvpair): 'name="fake" value="some_value"'.

Judging by /usr/lib/ocf/resource.d/pacemaker/Dummy, the resource agent will be called as "/usr/lib/ocf/resource.d/pacemaker/Dummy stop", with at minimum the $OCF_RESKEY_fake variable passed to it in its environment. If you can reproduce the issue, you can try dumping all variables to a file when validation fails (take inspiration from the 'dump_env()' function of the Dummy resource agent).
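To make the mapping from nvpairs to agent variables concrete, here is a minimal Python sketch. It assumes each instance attribute is exported as OCF_RESKEY_<name>, as with $OCF_RESKEY_fake above. The function name ocf_reskey_env and the trimmed XML sample are mine, for illustration only. Pacemaker also passes further variables (for example OCF_RESKEY_CRM_meta_* ones), which this sketch does not try to reconstruct.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example primitive from the pe-input file.
PRIMITIVE_XML = """
<primitive class="ocf" id="vm-invtest" provider="pacemaker" type="Dummy">
  <meta_attributes id="vm-invtest-meta_attributes">
    <nvpair id="vm-invtest-meta_attributes-target-role" name="target-role" value="Stopped"/>
  </meta_attributes>
  <instance_attributes id="vm-invtest-instance_attributes">
    <nvpair id="vm-invtest-instance_attributes-fake" name="fake" value="some_value"/>
  </instance_attributes>
</primitive>
"""


def ocf_reskey_env(primitive_xml):
    """Return the OCF_RESKEY_* variables implied by the instance attributes.

    Hypothetical helper: only <instance_attributes> nvpairs are considered;
    meta attributes and operation parameters are deliberately ignored.
    """
    primitive = ET.fromstring(primitive_xml)
    env = {}
    for nvpair in primitive.findall("./instance_attributes/nvpair"):
        env["OCF_RESKEY_" + nvpair.get("name")] = nvpair.get("value", "")
    return env


if __name__ == "__main__":
    for name, value in sorted(ocf_reskey_env(PRIMITIVE_XML).items()):
        print(f"{name}={value}")
```

Comparing such a list with what the agent's validation code expects may help narrow down which parameter it rejected.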

So if you want to check which attributes were set around the time of the deletion, have a look at /var/lib/pacemaker/pengine/pe-input-87.bz2 or possibly /var/lib/pacemaker/pengine/pe-input-86.bz2.
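If you would rather not read the files by hand, something like the following Python sketch could pull the instance attributes of one resource out of several pe-input files. It assumes the files are bzip2-compressed CIB XML with the resource defined as a <primitive> element, as in the fragment above. The script name and the function instance_attributes are hypothetical.

```python
import bz2
import sys
import xml.etree.ElementTree as ET


def instance_attributes(pe_input_path, rsc_id):
    """Return {name: value} for the resource's instance attributes.

    Returns None if no <primitive> with that id exists in the file, which
    can happen if the file was saved after the resource was deleted.
    """
    with bz2.open(pe_input_path, "rb") as fh:
        root = ET.parse(fh).getroot()
    for primitive in root.iter("primitive"):
        if primitive.get("id") == rsc_id:
            return {
                nv.get("name"): nv.get("value", "")
                for nv in primitive.findall("./instance_attributes/nvpair")
            }
    return None


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical script name):
    #   python3 pe_attrs.py vm-invtest pe-input-86.bz2 pe-input-87.bz2
    rsc_id = sys.argv[1]
    for path in sys.argv[2:]:
        print(path, instance_attributes(path, rsc_id))
```

Running it over consecutive files should show whether the attribute set changed between transitions.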

--
Ondrej Famera