I am running it on CentOS 6.6, and I am killing the "pacemakerd" process using kill -9.
Hmm, stonith is used for detection as well? I thought it was used to disable malfunctioning nodes.

On Fri, Apr 7, 2017 at 7:58 AM, Ken Gaillot <kgail...@redhat.com> wrote:

> On 04/05/2017 05:16 PM, neeraj ch wrote:
> > Hello All,
> >
> > I noticed something on our pacemaker test cluster. The cluster is
> > configured to manage an underlying database using master slave primitive.
> >
> > I ran a kill on the pacemaker process, all the other nodes kept showing
> > the node online. I went on to kill the underlying database on the same
> > node which would have been detected had the pacemaker on the node been
> > online. The cluster did not detect that the database on the node has
> > failed, the failover never occurred.
> >
> > I went on to kill corosync on the same node and the cluster now marked
> > the node as stopped and proceeded to elect a new master.
> >
> >
> > In a separate test. I killed the pacemaker process on the cluster DC,
> > the cluster showed no change. I went on to change CIB on a different
> > node. The CIB modify command timed out. Once that occurred, the node
> > didn't failover even when I turned off corosync on cluster DC. The
> > cluster didn't recover after this mishap.
> >
> > Is this expected behavior? Is there a solution for when OOM decides to
> > kill the pacemaker process?
> >
> > I run pacemaker 1.1.14, with corosync 1.4. I have stonith disabled and
> > quorum enabled.
> >
> > Thank you,
> >
> > nwarriorch
>
> What exactly are you doing to kill pacemaker? There are multiple
> pacemaker processes, and they have different recovery methods.
>
> Also, what OS/version are you running? If it has systemd, that can play
> a role in recovery as well.
>
> Having stonith disabled is a big part of what you're seeing. When a node
> fails, stonith is the only way the rest of the cluster can be sure the
> node is unable to cause trouble, so it can recover services elsewhere.
>
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>