On 09/03/2016 08:42 PM, Shermal Fernando wrote:
> Hi,
>
> Currently our system has 99.96% uptime, but our goal is to increase
> it beyond 99.999%. We are now studying the
> reliability/performance/features of Pacemaker to replace the existing
> clustering solution.
>
> While testing Pacemaker, I have encountered a problem. If the DC (crm
> daemon) is frozen by sending it the SIGSTOP signal, the crmds on the
> other machines never start an election to elect a new DC. Therefore
> fail-overs, resource restarts and other cluster decisions are delayed
> until the DC is unfrozen.
>
> Is this the default behavior of Pacemaker, or is it due to a
> misconfiguration? Is there any way to avoid this single point of failure?
>
> For the testing, we use Pacemaker 1.1.12 with Corosync 2.3.3 on the
> SLES 12 SP1 operating system.
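For anyone wanting to repeat the test described above, the freeze can be scripted roughly as follows. This is only a sketch: the helper names `find_pids` and `freeze_for` and the 60-second hold in the usage comment are made up for illustration, and it assumes the daemon shows up as `crmd` in the process list, as it does with Pacemaker 1.1.

```python
import os
import signal
import subprocess
import time


def find_pids(name):
    """Return the PIDs of processes whose name matches exactly (via pgrep -x)."""
    result = subprocess.run(["pgrep", "-x", name], capture_output=True, text=True)
    return [int(pid) for pid in result.stdout.split()]


def freeze_for(pid, seconds):
    """Suspend process `pid` with SIGSTOP, wait, then resume it with SIGCONT."""
    os.kill(pid, signal.SIGSTOP)
    try:
        time.sleep(seconds)
    finally:
        # Always resume the process, even if the wait is interrupted.
        os.kill(pid, signal.SIGCONT)


# Hypothetical usage, run as root on the node that currently hosts the DC:
#     for pid in find_pids("crmd"):
#         freeze_for(pid, 60)
```

While the daemon is stopped, something like `crm_mon -1` on one of the other nodes should show whether a new DC gets elected.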
I think I can reproduce that with Pacemaker 1.1.15 and Corosync 2.3.6.
I have sbd with the pacemaker-watcher running on the nodes as well.
Since the node health is not updated and the CIB can still be read,
sbd is happy, as is to be expected.

Maybe we could at least add something to the sbd pacemaker-watcher to
detect the issue ... thinking ...

Regards,
Klaus

> Regards,
> Shermal Fernando
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org