Hi, I've read about how important the relationship is between the SBD device parameters (msgwait and watchdog timeout) and Pacemaker's stonith timeout. However, I've just run into something I had never considered: how the time a fenced node takes to come fully back up compares with msgwait.
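To make the comparison concrete, here is a rough Python sketch of the timing arithmetic, using the values from the lab test described in this message (msgwait of 40 seconds, node back online roughly 25 seconds after the reset was written). The class and field names are hypothetical and exist only for illustration; this is not how sbd or Pacemaker compute anything internally.

```python
from dataclasses import dataclass


@dataclass
class FenceTimeline:
    """Hypothetical helper: compares msgwait with the observed rejoin time."""

    msgwait: int       # seconds sbd waits before reporting the message as delivered
    rejoin_after: int  # seconds from "Writing reset" until the node is online again

    def rejoins_before_delivery(self) -> bool:
        # True when the fenced node is back online while the surviving
        # node is still waiting for msgwait to expire.
        return self.rejoin_after < self.msgwait

    def overlap(self) -> int:
        # Seconds during which the node is already online but the
        # fencing operation has not yet been reported as complete.
        return max(0, self.msgwait - self.rejoin_after)


# Values taken from the log excerpt in this post (the 25 s figure is approximate).
lab = FenceTimeline(msgwait=40, rejoin_after=25)
print(lab.rejoins_before_delivery())
print(lab.overlap())
```

With these numbers the node rejoins about 15 seconds before the fencing message is reported as delivered, which is the window in which it then gets marked offline.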
I have two nodes: sles11a and sles11b. I fenced sles11a (via Hawk's interface, which triggers the sbd resource agent) and carefully watched /var/log/messages on sles11b:

Sept 8 11:27:00 sles11b sbd: Writing reset to node slot sles11a
Sept 8 11:27:00 sles11b sbd: Messaging delay: 40
[sles11a is rebooting and it comes up in about 12 seconds]
[see a bunch of messages about it joining the cluster]
[finally node sles11a is online at about 11:27:25]
Sept 8 11:27:40 sles11b sbd: Message successfully delivered
[sles11a is put offline!]
Sept 8 11:27:41 pengine[4358]: warning: custom_action: Action p_stonith-sdb_monitor_0 on sles11a is unrunnable (pending)

I've repeated this about 5 times and it happens every time. My values are 20 (watchdog timeout) and 40 (msgwait). I know, I know, those are too high for my lab environment, but I'm curious whether something is wrong or whether msgwait really does ALWAYS need to be less than the reboot time.

Thanks,
Jorge

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org