Hi,

I finally found my mistake: I had set up failure-timeout with the value PT1M, as in the lifetime example in the Red Hat documentation. If I set failure-timeout to 60 instead, it works as it should.
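For reference, a minimal sketch of the change being described (the resource name ACTIVATION_KX is taken from later in this thread; the pcs syntax assumes the pcs version shipped with CentOS 7):

```shell
# Setting failure-timeout as a plain number of seconds -- this is the
# form that worked:
pcs resource update ACTIVATION_KX meta failure-timeout=60

# The ISO 8601 duration form used in the Red Hat lifetime example,
# which did not work here:
# pcs resource update ACTIVATION_KX meta failure-timeout=PT1M
```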
Just one last question: shouldn't there be something in the log saying that the value is not in the right format?

Pierre-Yves

From: LE COQUIL Pierre-Yves
Sent: Wednesday, 27 September 2017 19:37
To: 'users@clusterlabs.org' <users@clusterlabs.org>
Subject: RE: monitor failed actions not cleared

From: LE COQUIL Pierre-Yves
Sent: Monday, 25 September 2017 16:58
To: 'users@clusterlabs.org' <users@clusterlabs.org>
Subject: monitor failed actions not cleared

Hi,

I'm using Pacemaker 1.1.15-11.el7_3.4 / Corosync 2.4.0-4.el7 under CentOS 7.3.1611.

=> Is this configuration too old? (yum indicates these versions are up to date)
=> Should I install more recent versions of Pacemaker and Corosync?

My subject is very close to the post "clearing failed actions" started by Attila Megyeri in May 2017, but that issue doesn't fit my case.

What I want to do:
- 2 systemd resources running on 1 of the 2 nodes of my cluster,
- When 1 resource fails (by killing it or by moving the resource), I want it to be restarted on the other node, but I want the other resource to keep running on the same node.

=> Is this possible with Pacemaker?

What I have done in addition to the default parameters:
- For my resources:
  - migration-threshold=1
  - failure-timeout=PT1M
- For the cluster:
  - cluster-recheck-interval=120

I have also added on-fail=restart (which is the default) to my resources' monitor operation. I do not use fencing (stonith-enabled=false).

=> Is fencing compatible with my goal?
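The setup described above could be sketched roughly as follows (the resource names ACTIVATION_KX and RESOURCE_2 and the systemd unit names are assumptions for illustration; pcs syntax as on CentOS 7):

```shell
# Two systemd resources; migration-threshold=1 makes a single failure
# move the resource, failure-timeout lets the failcount expire:
pcs resource create ACTIVATION_KX systemd:activation-kx \
    meta migration-threshold=1 failure-timeout=60 \
    op monitor interval=30s on-fail=restart
pcs resource create RESOURCE_2 systemd:resource-2 \
    meta migration-threshold=1 failure-timeout=60

# Cluster-wide settings from the post: re-evaluate the cluster state
# every 120 seconds, and run without fencing:
pcs property set cluster-recheck-interval=120
pcs property set stonith-enabled=false
```

Note that nothing here ties the two resources together: with no colocation constraint between them, one resource failing over should not drag the other along, which matches the stated goal.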
What happens:
- When I kill or move 1 resource, it is restarted on the other node => OK
- The failcount is incremented to 1 for this resource => OK
- The failcount is never cleared => NOK

=> I get a warning in the log: "pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for ACTIVATION_KX on metro.cas-n1: not running (7)" when my resource ACTIVATION_KX has been killed on node metro.cas-n1, but pcs status shows ACTIVATION_KX is started on the other node.

=> Is it a bad monitor operation configuration for my resource? (I have added "requires=nothing")

I know that my English and my Pacemaker knowledge are not very good, but could you please give me some explanation of this behavior that I misunderstand?

=> If something is wrong with my post, just tell me (this is my first).

Thank you
Pierre-Yves Le Coquil
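While waiting for the failure-timeout to expire, the failcount can be inspected and cleared by hand; a sketch using standard Pacemaker/pcs commands (resource and node names taken from the message above):

```shell
# Show the current failcount for the resource on a given node:
crm_failcount -r ACTIVATION_KX -N metro.cas-n1

# Clear the failure history (and the "Failed Actions" entry shown by
# pcs status) for this resource on all nodes:
pcs resource cleanup ACTIVATION_KX
```

With failure-timeout set correctly (see the fix at the top of the thread), the cleanup should happen automatically at the next cluster-recheck-interval after the timeout expires.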
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org