Hi,

I finally found my mistake: I had set up failure-timeout with the value PT1M, as in the lifetime example in the Red Hat documentation. If I set failure-timeout to 60 instead, it works as it should.
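For reference, a minimal sketch of the change being described (the resource name ACTIVATION_KX is taken from later in this thread; the pcs syntax assumes the pcs version shipped with CentOS 7):

```shell
# Setting failure-timeout as a plain number of seconds -- this is the
# form that worked:
pcs resource update ACTIVATION_KX meta failure-timeout=60

# The ISO 8601 duration form used in the Red Hat lifetime example,
# which did not work here:
# pcs resource update ACTIVATION_KX meta failure-timeout=PT1M
```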
Just one last question: shouldn't there be something in the log saying that the value is not in the right format?

Pierre-Yves

From: LE COQUIL Pierre-Yves
Sent: Wednesday, 27 September 2017 19:37
To: 'users@clusterlabs.org' <users@clusterlabs.org>
Subject: RE: monitor failed actions not cleared

From: LE COQUIL Pierre-Yves
Sent: Monday, 25 September 2017 16:58
To: 'users@clusterlabs.org' <users@clusterlabs.org>
Subject: monitor failed actions not cleared

Hi,

I'm using Pacemaker 1.1.15-11.el7_3.4 / Corosync 2.4.0-4.el7 under CentOS 7.3.1611.

=> Is this configuration too old? (yum indicates these versions are up to date)
=> Should I install more recent versions of Pacemaker and Corosync?

My subject is very close to the post "clearing failed actions" started by Attila Megyeri in May 2017, but that issue doesn't fit my case.

What I want to do:
- 2 systemd resources running on 1 of the 2 nodes of my cluster,
- When 1 resource fails (by killing it or by moving the resource), I want it to be restarted on the other node, but I want the other resource to keep running on the same node.

=> Is this possible with Pacemaker?

What I have done in addition to the default parameters:
- For my resources:
  - migration-threshold=1
  - failure-timeout=PT1M
- For the cluster:
  - cluster-recheck-interval=120

I have also added on-fail=restart (which is the default) to my resources' monitor operation. I do not use fencing (stonith-enabled=false).

=> Is fencing compatible with my goal?
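The setup described above could be sketched roughly as follows (the resource names ACTIVATION_KX and RESOURCE_2 and the systemd unit names are assumptions for illustration; pcs syntax as on CentOS 7):

```shell
# Two systemd resources; migration-threshold=1 makes a single failure
# move the resource, failure-timeout lets the failcount expire:
pcs resource create ACTIVATION_KX systemd:activation-kx \
    meta migration-threshold=1 failure-timeout=60 \
    op monitor interval=30s on-fail=restart
pcs resource create RESOURCE_2 systemd:resource-2 \
    meta migration-threshold=1 failure-timeout=60

# Cluster-wide settings from the post: re-evaluate the cluster state
# every 120 seconds, and run without fencing:
pcs property set cluster-recheck-interval=120
pcs property set stonith-enabled=false
```

Note that nothing here ties the two resources together: with no colocation constraint between them, one resource failing over should not drag the other along, which matches the stated goal.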
What happens:
- When I kill or move 1 resource, it is restarted on the other node => OK
- The failcount is incremented to 1 for this resource => OK
- The failcount is never cleared => NOK

=> I get a warning in the log: "pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for ACTIVATION_KX on metro.cas-n1: not running (7)" when my resource ACTIVATION_KX has been killed on node metro.cas-n1, but pcs status shows ACTIVATION_KX is started on the other node.

=> Is it a bad monitor operation configuration for my resource? (I have added "requires=nothing")

I know that my English and my Pacemaker knowledge are not very good, but could you please give me some explanation of this behavior that I misunderstand?

=> If something is wrong with my post, just tell me (this is my first).

Thank you
Pierre-Yves Le Coquil
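While waiting for the failure-timeout to expire, the failcount can be inspected and cleared by hand; a sketch using standard Pacemaker/pcs commands (resource and node names taken from the message above):

```shell
# Show the current failcount for the resource on a given node:
crm_failcount -r ACTIVATION_KX -N metro.cas-n1

# Clear the failure history (and the "Failed Actions" entry shown by
# pcs status) for this resource on all nodes:
pcs resource cleanup ACTIVATION_KX
```

With failure-timeout set correctly (see the fix at the top of the thread), the cleanup should happen automatically at the next cluster-recheck-interval after the timeout expires.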
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org