Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

Gao,Yan Mon, 04 Dec 2017 03:52:14 -0800

On 12/02/2017 07:19 PM, Andrei Borzenkov wrote:

30.11.2017 13:48, Gao,Yan wrote:

On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:

SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
VM on VSphere using shared VMDK as SBD. During basic tests by killing
corosync and forcing STONITH pacemaker was not started after reboot.
In logs I see during boot


Nov 22 16:04:56 sapprod01s crmd[3151]:     crit: We were allegedly just fenced by sapprod01p for sapprod01p
Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd process (3151) can no longer be respawned,
Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down Pacemaker

SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
stonith with SBD always takes msgwait (at least, visually the host is not
declared OFFLINE until 120s have passed). But the VM reboots lightning fast
and is up and running long before the timeout expires.

I think I have seen similar report already. Is it something that can
be fixed by SBD/pacemaker tuning?

SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.


I tried it (on openSUSE Tumbleweed which is what I have at hand, it has
SBD 1.3.0) and with SBD_DELAY_START=yes sbd does not appear to watch
disk at all.

It simply waits that long on startup before starting the rest of the
cluster stack to make sure the fencing that targeted it has returned. It
intentionally doesn't watch anything during this period of time.


Regards,
  Yan

First, at startup no slot is allocated for a node at all
(confirmed with "sbd list"). I manually allocated slots for both nodes,
then I see that stonith agent does post "reboot" message (confirmed with
"sbd list" again) and sbd never reacts to it. Even after system reboot
message on disk is not cleared.

Removing SBD_DELAY_START and restarting pacemaker (with implicit SBD
restart) immediately cleared pending messages.
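
Editorial illustration of the timing discussed in this thread: a minimal
sketch, not taken from sbd or pacemaker code. It assumes that the fencing
node treats the fence as complete only after msgwait (120s in the report
above), that the target starts rebooting as soon as fencing is requested,
and that SBD_DELAY_START=yes delays cluster startup by msgwait. The 30s
boot time and both function names are hypothetical.

```python
def rejoin_time(boot_seconds, msgwait, delay_start):
    """Seconds after the fence request at which the node starts the cluster stack.

    Assumption: with delay_start enabled, startup is held back by msgwait.
    """
    return boot_seconds + (msgwait if delay_start else 0)


def rejoins_too_early(boot_seconds, msgwait, delay_start):
    """True if the node is back before the fencing node considers the fence done."""
    return rejoin_time(boot_seconds, msgwait, delay_start) < msgwait


MSGWAIT = 120  # msgwait timeout from the report above
BOOT = 30      # hypothetical boot time of a fast-rebooting VM

# Without the delay the node is back while fencing is still pending.
print(rejoins_too_early(BOOT, MSGWAIT, delay_start=False))
# With the delay it only starts the cluster stack after msgwait has elapsed.
print(rejoins_too_early(BOOT, MSGWAIT, delay_start=True))
```

Under these assumptions, the first case corresponds to the "We were
allegedly just fenced" situation in the logs, and the second to the
behaviour described for SBD_DELAY_START=yes.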

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

