Hi. I'm trying to understand what looks to me like an incorrect interaction between cluster-recheck-interval and failure-timeout under Pacemaker 2.0.1.

I have three machines in a corosync (3.0.1, if it matters) cluster, managing 12 resources in a single group.

I'm following the documentation at:

https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html

and

https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-resource-options.html

I have set a cluster property:

    cluster-recheck-interval=60s

I have set a resource property:

    failure-timeout=180

The docs say failure-timeout is "How many seconds to wait before acting as if the failure had not occurred, and potentially allowing the resource back to the node on which it failed."

I think this should mean that if the resource fails and gets restarted, the fact that it failed will be "forgotten" after 180 seconds (or maybe a little longer, depending on exactly when the next cluster recheck is done).

However, what I'm seeing is that if the resource fails and gets restarted, and then the same thing happens again an hour later, it's counted as two failures. If it fails and gets restarted another hour after that, it's recorded as three failures and (because I have "migration-threshold=3") it gets moved to another node, and therefore all the other resources in the group are moved as well.

So, what am I misunderstanding about "failure-timeout", and what configuration setting do I need to use to tell pacemaker "provided the resource hasn't failed within the past X seconds, forget the fact that it failed more than X seconds ago"?

Thanks,

Antony.

-- 
The first fifty percent of an engineering project takes ninety percent of the time, and the remaining fifty percent takes another ninety percent of the time.

Please reply to the list; please *don't* CC me.
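
P.S. In case it makes the question clearer, here is a small Python sketch of the behaviour I expected compared with what I observe. It is only a model of my reading of the docs, not how Pacemaker is actually implemented: the function names are made up for illustration, times are seconds since the first failure, and it ignores the extra delay of up to one cluster-recheck-interval.

```python
FAILURE_TIMEOUT = 180      # my failure-timeout, in seconds
MIGRATION_THRESHOLD = 3    # my migration-threshold


def expected_fail_count(failure_times, now, timeout=FAILURE_TIMEOUT):
    """Fail count as I expected it (hypothetical model, not Pacemaker code).

    Earlier failures are forgotten once `timeout` seconds pass without
    a new failure.
    """
    count = 0
    last = None
    for t in sorted(failure_times):
        if t > now:
            break
        # A quiet period of at least `timeout` wipes the earlier failures.
        if last is not None and t - last >= timeout:
            count = 0
        count += 1
        last = t
    # The most recent failure also expires once it is old enough.
    if last is not None and now - last >= timeout:
        count = 0
    return count


def observed_fail_count(failure_times, now):
    """Fail count as I actually see it: failures are never forgotten."""
    return sum(1 for t in failure_times if t <= now)


if __name__ == "__main__":
    failures = [0, 3600, 7200]   # one failure per hour
    now = 7201                   # just after the third failure
    expected = expected_fail_count(failures, now)
    observed = observed_fail_count(failures, now)
    print("expected fail count:", expected)
    print("observed fail count:", observed)
    print("expected to migrate:", expected >= MIGRATION_THRESHOLD)
    print("observed to migrate:", observed >= MIGRATION_THRESHOLD)
```

With one failure per hour, the model I expected never gets past a fail count of 1, so the resource would stay where it is; the counting I observe reaches 3 and triggers the move.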