On 05/17/2017 04:56 AM, Klaus Wenninger wrote:
> On 05/17/2017 11:28 AM, 井上 和徳 wrote:
>> Hi,
>> I'm testing Pacemaker-1.1.17-rc1.
>> The number of failures in "Too many failures (10) to fence" log does not
>> match the number of actual failures.
>
> Well it kind of does as after 10 failures it doesn't try fencing again
> so that is what failures stay at ;-)
> Of course it still sees the need to fence but doesn't actually try.
>
> Regards,
> Klaus

This feature can be a little confusing: it doesn't prevent all further
fence attempts of the target, just *immediate* fence attempts. Whenever
the next transition is started for some other reason (a configuration or
state change, cluster-recheck-interval, node failure, etc.), it will try
to fence again.

Also, it only checks this threshold if it's aborting a transition
*because* of this fence failure. If it's aborting the transition for
some other reason, the number can go higher than the threshold. That's
what I'm guessing happened here.

>> After the 11th time fence failure, "Too many failures (10) to fence" is
>> output.
>> Incidentally, stonith-max-attempts has not been set, so it is 10 by default..
>>
>> [root@x3650f log]# egrep "Requesting fencing|error: Operation reboot|Stonith failed|Too many failures"
>> ##Requesting fencing : 1st time
>> May 12 05:51:47 rhel73-1 crmd[5269]: notice: Requesting fencing (reboot) of node rhel73-2
>> May 12 05:52:52 rhel73-1 stonith-ng[5265]: error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269@rhel73-1.8415167d: No data available
>> May 12 05:52:52 rhel73-1 crmd[5269]: notice: Transition aborted: Stonith failed
>> ## 2nd time
>> May 12 05:52:52 rhel73-1 crmd[5269]: notice: Requesting fencing (reboot) of node rhel73-2
>> May 12 05:53:56 rhel73-1 stonith-ng[5265]: error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269@rhel73-1.53d3592a: No data available
>> May 12 05:53:56 rhel73-1 crmd[5269]: notice: Transition aborted: Stonith failed
>> ## 3rd time
>> May 12 05:53:56 rhel73-1 crmd[5269]: notice: Requesting fencing (reboot) of node rhel73-2
>> May 12 05:55:01 rhel73-1 stonith-ng[5265]: error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269@rhel73-1.9177cb76: No data available
>> May 12 05:55:01 rhel73-1 crmd[5269]: notice: Transition aborted: Stonith failed
>> ## 4th time
>> May 12 05:55:01 rhel73-1 crmd[5269]: notice: Requesting fencing (reboot) of node rhel73-2
>> May 12 05:56:05 rhel73-1 stonith-ng[5265]: error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269@rhel73-1.946531cb: No data available
>> May 12 05:56:05 rhel73-1 crmd[5269]: notice: Transition aborted: Stonith failed
>> ## 5th time
>> May 12 05:56:05 rhel73-1 crmd[5269]: notice: Requesting fencing (reboot) of node rhel73-2
>> May 12 05:57:10 rhel73-1 stonith-ng[5265]: error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269@rhel73-1.278b3c4b: No data available
>> May 12 05:57:10 rhel73-1 crmd[5269]: notice: Transition aborted: Stonith failed
>> ## 6th time
>> May 12 05:57:10 rhel73-1 crmd[5269]: notice: Requesting fencing (reboot) of node rhel73-2
>> May 12 05:58:14 rhel73-1 stonith-ng[5265]: error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269@rhel73-1.7a49aebb: No data available
>> May 12 05:58:14 rhel73-1 crmd[5269]: notice: Transition aborted: Stonith failed
>> ## 7th time
>> May 12 05:58:14 rhel73-1 crmd[5269]: notice: Requesting fencing (reboot) of node rhel73-2
>> May 12 05:59:19 rhel73-1 stonith-ng[5265]: error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269@rhel73-1.83421862: No data available
>> May 12 05:59:19 rhel73-1 crmd[5269]: notice: Transition aborted: Stonith failed
>> ## 8th time
>> May 12 05:59:19 rhel73-1 crmd[5269]: notice: Requesting fencing (reboot) of node rhel73-2
>> May 12 06:00:24 rhel73-1 stonith-ng[5265]: error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269@rhel73-1.afd7ef98: No data available
>> May 12 06:00:24 rhel73-1 crmd[5269]: notice: Transition aborted: Stonith failed
>> ## 9th time
>> May 12 06:00:24 rhel73-1 crmd[5269]: notice: Requesting fencing (reboot) of node rhel73-2
>> May 12 06:01:28 rhel73-1 stonith-ng[5265]: error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269@rhel73-1.3b033dbe: No data available
>> May 12 06:01:28 rhel73-1 crmd[5269]: notice: Transition aborted: Stonith failed
>> ## 10th time
>> May 12 06:01:28 rhel73-1 crmd[5269]: notice: Requesting fencing (reboot) of node rhel73-2
>> May 12 06:02:33 rhel73-1 stonith-ng[5265]: error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269@rhel73-1.5447a345: No data available
>> May 12 06:02:33 rhel73-1 crmd[5269]: notice: Transition aborted: Stonith failed
>> ## 11th time
>> May 12 06:02:33 rhel73-1 crmd[5269]: notice: Requesting fencing (reboot) of node rhel73-2
>> May 12 06:03:37 rhel73-1 stonith-ng[5265]: error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269@rhel73-1.db50c21a: No data available
>> May 12 06:03:37 rhel73-1 crmd[5269]: warning: Too many failures (10) to fence rhel73-2, giving up
>> May 12 06:03:37 rhel73-1 crmd[5269]: notice: Transition aborted: Stonith failed
>>
>> Regards,
>> Kazunori INOUE
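
To make the threshold behavior described in the reply a bit more concrete, here is a toy Python model. It is only a sketch of that explanation, not Pacemaker code: the function name, the abort-reason strings, and the ">=" comparison are all assumptions made for illustration. It counts every failed fence attempt but consults the threshold only when the transition abort is attributed to that fence failure.

```python
# Toy model only: names and structure are hypothetical, not Pacemaker internals.
STONITH_MAX_ATTEMPTS = 10  # default value mentioned in the thread


def simulate(abort_reasons, max_attempts=STONITH_MAX_ATTEMPTS):
    """Replay a series of failed fence attempts against one target.

    abort_reasons[i] is the (hypothetical) reason the transition was aborted
    after failed attempt i+1: "fence-failure" or anything else, e.g. "other".
    Returns (failure_count, gave_up) at the point the replay stops.
    """
    failures = 0
    for reason in abort_reasons:
        failures += 1  # every failed attempt is counted
        # Assumption: the threshold is consulted only when the abort is
        # attributed to this fence failure, as the reply describes.
        if reason == "fence-failure" and failures >= max_attempts:
            return failures, True  # no immediate retry
    return failures, False


# Every abort attributed to the fence failure: stops at the threshold.
print(simulate(["fence-failure"] * 10))
# The 10th abort attributed to something else: the count passes the threshold.
print(simulate(["fence-failure"] * 9 + ["other", "fence-failure"]))
```

In this model the second run gives up only after the 11th failure, which is the kind of overshoot the reply guesses at. Whether that is what actually happened in the log above cannot be told from the model.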