On Wednesday 31 March 2021 at 15:48:15, Ken Gaillot wrote:

> On Wed, 2021-03-31 at 14:32 +0200, Antony Stone wrote:
>
> > So, what am I misunderstanding about "failure-timeout", and what
> > configuration setting do I need to use to tell pacemaker that "provided
> > the resource hasn't failed within the past X seconds, forget the fact
> > that it failed more than X seconds ago"?
>
> Unfortunately, there is no way. failure-timeout expires *all* failures
> once the *most recent* is that old. It's a bit counter-intuitive but
> currently, Pacemaker only remembers a resource's most recent failure
> and the total count of failures, and changing that would be a big
> project.
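[For context, a sketch of the setting under discussion - not from the thread itself. failure-timeout is a resource meta attribute; in crm shell syntax it might look like the following, where the resource name "my_ip" and its parameters are purely illustrative:]

```shell
# Hypothetical resource definition ("my_ip" and its params are placeholders).
# Per Ken's explanation: with this config, *all* recorded failures expire
# together once the *most recent* failure is more than 600 seconds old -
# older failures are not aged out individually.
primitive my_ip ocf:heartbeat:IPaddr2 \
    params ip=192.0.2.10 cidr_netmask=24 \
    meta migration-threshold=3 failure-timeout=600s
```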
So, are you saying that if a resource failed last Friday, and then again on
Saturday, but has been running perfectly happily ever since, a failure today
will trigger "that's it, we're moving it, it doesn't work here"?

That seems bizarre. Surely the length of time a resource has been running
without problems should be taken into account when deciding whether the node
it's running on is fit to handle it or not?

My problem is also bigger than that - and I can't believe there isn't a way
round the following, otherwise people couldn't use Pacemaker:

I have "migration-threshold=3" on most of my resources, and I have three
nodes.

If a resource fails for the third time (in any period of time) on a node, it
gets moved (along with the rest of its group) to another node. The cluster
does not forget that it failed and was moved away from the first node, though.
"crm status -f" confirms that to me.

If it then fails three times (in an hour, or a fortnight, whatever) on the
second node, it gets moved to node 3, and from that point on the cluster
thinks there's nowhere else to move it to, so another failure means a total
failure of the cluster.

There must be _something_ I'm doing wrong for the cluster to behave in this
way? I can't believe it's by design.

Regards,

Antony.

-- 
Anyone that's normal doesn't really achieve much.

 - Mark Blair, Australian rocket engineer

Please reply to the list;
                          please *don't* CC me.
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
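[Editorial note, not part of the thread: for anyone hitting the same wall, the accumulated fail counts that migration-threshold compares against can be inspected and cleared manually with the standard Pacemaker tools. A sketch, with "my_resource" as a placeholder name:]

```shell
# Query the per-node fail count that "crm status -f" also reports.
crm_failcount --query --resource my_resource

# Clear the fail count and failure history for the resource on all nodes,
# so migration-threshold starts counting from zero again.
crm_resource --cleanup --resource my_resource
```

[Clearing the history this way is a manual workaround, not a substitute for the per-failure aging the thread is asking for.]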