[ClusterLabs] New status reporting for starting/stopping resources in 1.1.19-8.el7

2019-08-30 Thread Chris Walker
Hello, The 1.1.19-8 EL7 version of Pacemaker contains a commit ‘Feature: crmd: default record-pending to TRUE’ that is not in the ClusterLabs Github repo. This commit changes the reporting for resources that are in the process of starting and stopping for (at least) crm_mon and crm_resource cr

Re: [ClusterLabs] why is node fenced ?

2019-08-12 Thread Chris Walker
When ha-idg-1 started Pacemaker around 17:43, it did not see ha-idg-2, for example, Aug 09 17:43:05 [6318] ha-idg-1 pacemakerd: info: pcmk_quorum_notification: Quorum retained | membership=1320 members=1 after ~20s (dc-deadtime parameter), ha-idg-2 is marked 'unclean' and STONITHed as part

Re: [ClusterLabs] [EXTERNAL] Re: "node is unclean" leads to gratuitous reboot

2019-07-11 Thread Chris Walker
On 7/11/19 6:52 AM, Users wrote: On Thu, Jul 11, 2019 at 12:58 PM Lars Ellenberg wrote: On Wed, Jul 10, 2019 at 06:15:56PM +, Michael Powell wrote: Thanks to you and Andrei for your responses. In our particular situation, we want to be able to operate w

Re: [ClusterLabs] HA domain controller fences newly joined node after fence_ipmilan delay even if transition was aborted.

2018-12-18 Thread Chris Walker
Looks like rhino66-left was scheduled for fencing because it was not present 20 seconds (the dc-deadtime parameter) after rhino66-right started Pacemaker (startup fencing). I can think of a couple of ways to allow all nodes to survive if they come up far apart in time (i.e., farther apart than d

[ClusterLabs] short circuiting the corosync token timeout

2018-08-10 Thread Chris Walker
Hello, Before Pacemaker can declare a node as 'offline', the Corosync layer must first declare that the node is no longer part of the cluster after waiting a full token timeout.  For example, if I manually STONITH a node with 'crm -F node fence node2', even if the fence operation happens imme

Re: [ClusterLabs] STONITH not communicated back to initiator until token expires

2017-04-26 Thread Chris Walker
: crmd requests STONITH stonith-ng successfully STONITHs node corosync communicates membership change to stonith-ng stonith-ng communicates successful STONITH to crmd cluster reacts to down node Thanks, Chris On Wed, Apr 5, 2017 at 5:07 PM, Chris Walker wrote: > Thanks very much for your reply

Re: [ClusterLabs] STONITH not communicated back to initiator until token expires

2017-04-05 Thread Chris Walker
ue, Apr 4, 2017 at 12:47 PM, Ken Gaillot wrote: > On 03/13/2017 10:43 PM, Chris Walker wrote: >> Thanks for your reply Digimer. >> >> On Mon, Mar 13, 2017 at 1:35 PM, Digimer wrote: >> >> On 13/03/17 12:07 PM, Chris Wal

Re: [ClusterLabs] STONITH not communicated back to initiator until token expires

2017-03-13 Thread Chris Walker
Thanks for your reply Digimer. On Mon, Mar 13, 2017 at 1:35 PM, Digimer wrote: > On 13/03/17 12:07 PM, Chris Walker wrote: > > Hello, > > > > On our two-node EL7 cluster (pacemaker: 1.1.15-11.el7_3.4; corosync: > > 2.4.0-4; libqb: 1.0-1), > > it looks like succ

[ClusterLabs] STONITH not communicated back to initiator until token expires

2017-03-13 Thread Chris Walker
Hello, On our two-node EL7 cluster (pacemaker: 1.1.15-11.el7_3.4; corosync: 2.4.0-4; libqb: 1.0-1), it looks like successful STONITH operations are not communicated from stonith-ng back to the initiator (in this case, crmd) until the STONITHed node is removed from the cluster when Corosync notices

Re: [ClusterLabs] question about dc-deadtime

2017-01-10 Thread Chris Walker
On Mon, Jan 9, 2017 at 6:55 PM, Andrew Beekhof wrote: > On Fri, Dec 16, 2016 at 8:52 AM, Chris Walker > wrote: > > Thanks for your response Ken. I'm puzzled ... in my case nodes remain > > UNCLEAN (offline) until dc-deadtime expires, even when both nodes are up > a

Re: [ClusterLabs] question about dc-deadtime

2016-12-15 Thread Chris Walker
, Dec 15, 2016 at 3:26 PM, Ken Gaillot wrote: > On 12/15/2016 02:00 PM, Chris Walker wrote: > > Hello, > > > > I have a quick question about dc-deadtime. I believe that Digimer and > > others on this list might have already addressed this, but I want to > > mak

[ClusterLabs] question about dc-deadtime

2016-12-15 Thread Chris Walker
Hello, I have a quick question about dc-deadtime. I believe that Digimer and others on this list might have already addressed this, but I want to make sure I'm not missing something. If my understanding is correct, dc-deadtime sets the amount of time that must elapse before a cluster is formed (

[ClusterLabs] Node lost early in HA startup --> no STONITH

2015-08-02 Thread Chris Walker
Hello, We recently had an unfortunate sequence on our two-node cluster (nodes n02 and n03) that can be summarized as: 1. n03 became pathologically busy and was STONITHed by n02 2. The heavy load migrated to n02, which also became pathologically busy 3. n03 was rebooted 4. During the startup of