On Fri, 2018-08-31 at 08:37 +0200, Cesar Hernandez wrote:
> Hi
>
> >
> > Do you mean you have a custom fencing agent configured? If so, check
> > the return value of each attempt. Pacemaker should request fencing only
> > once as long as it succeeds (returns 0), but if the agent fails
> > (returns nonzero or times out), it will retry, even if the reboot
> > worked in reality.
> >
>
> Yes, custom fencing agent, and it always returns 0
>
> >
> > FYI, corosync 2 has a "two_node" setting that includes "wait_for_all"
> > -- with that, you don't need to ignore quorum in pacemaker, and the
> > cluster won't start until both nodes have seen each other at least
> > once.
>
> Well I'm ok with the quorum behaviour but I want to know why it
> reboots 3 times on startup.
> When both nodes are up and running, and if one node stops responding,
> the other node fences it only 1 time, not 3
>
> >
> > > Do you know why it happens?
>
> Thanks
> Cesar

Check the pacemaker logs on both nodes around the time it happens. One
of the nodes will be the DC, and will have "pengine:" logs with "saving
inputs".

The first thing I'd look for is who requested fencing. The DC will have
stonith logs with "Client ... wants to fence ...". The client will
either be crmd (i.e. the cluster itself) or some external program.

If it's the cluster, I'd look at the "pengine:" logs on the DC before
that, to see if there are any hints (node unclean, etc.). Then keep
going backward until the ultimate cause is found.

-- 
Ken Gaillot <kgail...@redhat.com>
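
P.S. If it helps, the log search described above could be scripted roughly like this. It is only a sketch: the function names are made up for illustration, the log path is whatever your distribution uses (pass it as an argument), and it matches just the fragments quoted above ("pengine:", "saving inputs", "wants to fence"), since the rest of each log line differs between versions.

```python
import sys

# Fragments taken from the advice above. Only these substrings are matched;
# nothing is assumed about the rest of the log line.
MARKERS = ("wants to fence", "saving inputs", "pengine:")


def find_fencing_hints(lines):
    """Return (line_number, text) pairs for lines containing any marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in MARKERS):
            hits.append((number, line.rstrip("\n")))
    return hits


def lines_before_first_request(lines, context=20):
    """Return up to `context` lines preceding the first fencing request."""
    lines = list(lines)
    for index, line in enumerate(lines):
        if "wants to fence" in line:
            start = max(0, index - context)
            return [text.rstrip("\n") for text in lines[start:index]]
    return []


if __name__ == "__main__":
    # Usage (hypothetical script name): python fence_hints.py <path-to-log>
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as handle:
            for number, text in find_fencing_hints(handle):
                print(f"{number}: {text}")
```

Run it against the log on the DC first. The lines returned by lines_before_first_request() are where I'd start reading backward for the cause.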