Re: [Pacemaker] unknown third node added to a 2 node cluster?

2014-10-22 Thread Brian J. Murrell (brian)
On Mon, 2014-10-13 at 12:51 +1100, Andrew Beekhof wrote: Even the same address can be a problem. That brief window where things were getting renewed can screw up corosync. But as I proved, there was no renewal at all during the period of this entire pacemaker run, so the use of DHCP here is
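[Editor's note: the usual way to sidestep DHCP-renewal problems with corosync is a fully static configuration. A minimal sketch for corosync 1.x with the udpu transport, as shipped on RHEL6; all addresses and the member list are illustrative, not taken from the thread, and the syntax differs in corosync 2.x, which uses a nodelist section instead:]

```
totem {
    version: 2
    transport: udpu             # unicast UDP: no multicast, explicit members
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0   # the network, not a DHCP-assigned host address
        mcastport: 5405
        member { memberaddr: 10.0.0.5 }   # node5 (static address)
        member { memberaddr: 10.0.0.6 }   # node6 (static address)
    }
}
```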

Re: [Pacemaker] unknown third node added to a 2 node cluster?

2014-10-10 Thread Brian J. Murrell (brian)
On Wed, 2014-10-08 at 12:39 +1100, Andrew Beekhof wrote: On 8 Oct 2014, at 2:09 am, Brian J. Murrell (brian) brian-squohqy54cvwr29bmmi...@public.gmane.org wrote: Given a 2 node pacemaker-1.1.10-14.el6_5.3 cluster with nodes node5 and node6 I saw an unknown third node being added

[Pacemaker] unknown third node added to a 2 node cluster?

2014-10-07 Thread Brian J. Murrell (brian)
Given a 2 node pacemaker-1.1.10-14.el6_5.3 cluster with nodes node5 and node6 I saw an unknown third node being added to the cluster, but only on node5: Sep 18 22:52:16 node5 corosync[17321]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 12: memb=2, new=0, lost=0 Sep
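[Editor's note: when a phantom peer appears, a typical first step is to compare what the membership layer and the CIB each believe, then purge the stale entry. A command sketch against a live cluster; the node name is hypothetical and not from the thread:]

```shell
crm_node -l                    # list peers the cluster layer knows about
cibadmin -Q -o nodes           # compare with the nodes recorded in the CIB
crm_node --force -R bogus-node # remove the unknown entry if it is stale
```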

Re: [Pacemaker] error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

2014-02-06 Thread Brian J. Murrell (brian)
On Wed, 2014-01-08 at 13:30 +1100, Andrew Beekhof wrote: What version of pacemaker? Most recently I have been seeing this in 1.1.10 as shipped by RHEL6.5. On 10 Dec 2013, at 4:40 am, Brian J. Murrell brian-squohqy54cvwr29bmmi...@public.gmane.org wrote: I didn't seem to get a response

Re: [Pacemaker] error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

2014-02-06 Thread Brian J. Murrell (brian)
On Thu, 2014-02-06 at 10:42 -0500, Brian J. Murrell (brian) wrote: On Wed, 2014-01-08 at 13:30 +1100, Andrew Beekhof wrote: What version of pacemaker? Most recently I have been seeing this in 1.1.10 as shipped by RHEL6.5. Doh! Somebody did a test run that had not been updated to use

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-15 Thread Brian J. Murrell (brian)
On Wed, 2014-01-15 at 17:11 +1100, Andrew Beekhof wrote: Consider any long running action, such as starting a database. We do not update the CIB until after actions have completed, so there can and will be times when the status section is out of date to one degree or another. But that is

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-15 Thread Brian J. Murrell (brian)
On Thu, 2014-01-16 at 08:35 +1100, Andrew Beekhof wrote: I know, I was giving you another example of when the cib is not completely up-to-date with reality. Yeah, I understood that. I was just countering with why that example is actually more acceptable. It may very well be partially

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-14 Thread Brian J. Murrell (brian)
On Tue, 2014-01-14 at 16:01 +1100, Andrew Beekhof wrote: On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote: The local cib hasn't caught up yet by the looks of it. I should have asked in my previous message: is this entirely an artifact of having just restarted or are there any

[Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Thread Brian J. Murrell (brian)
Hi, I found a situation using pacemaker 1.1.10 on RHEL6.5 where the output of crm_resource -L is not trustable shortly after a node is booted. Here is the output from crm_resource -L on one of the nodes in a two node cluster (the one that was not rebooted): st-fencing
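[Editor's note: one workaround discussed for a just-booted node is to poll until the local CIB has something to report rather than trusting the first answer. A minimal sketch; the crm_resource call is illustrative, and checking `crmadmin -S` for S_IDLE is arguably a stronger readiness test:]

```shell
# wait_for_output CMD...: poll CMD until it prints something, up to 10 tries.
wait_for_output() {
    for i in $(seq 1 10); do
        out=$("$@" 2>/dev/null)
        if [ -n "$out" ]; then
            printf '%s\n' "$out"
            return 0
        fi
        sleep 1
    done
    return 1
}

# On a cluster node you would call: wait_for_output crm_resource -L
# Stand-in demonstration with echo:
wait_for_output echo "demo"
```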

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Thread Brian J. Murrell (brian)
On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote: The local cib hasn't caught up yet by the looks of it. Should crm_resource actually be [mis-]reporting as if it were knowledgeable when it's not though? IOW is this expected behaviour or should it be considered a bug? Should I open a

[Pacemaker] prevent starting resources on failed node

2013-12-06 Thread Brian J. Murrell (brian)
[ Hopefully this doesn't cause a duplicate post but my first attempt returned an error. ] Using pacemaker 1.1.10 (but I think this issue is more general than that release), I want to enforce a policy that once a node fails, no resources can be started/run on it until the user permits it. I have
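[Editor's note: the policy described here, once a node fails its resources stay off it until an operator says otherwise, is commonly approximated by banning a resource after its first failure and never expiring that ban automatically. A crm shell sketch; the defaults shown are illustrative:]

```
# Ban a resource from a node after one failure there:
crm configure rsc_defaults migration-threshold=1
# Deliberately leave failure-timeout unset, so the ban persists until an
# operator clears it by hand, e.g.:
#   crm_resource --cleanup --resource <rsc> --node <node>
```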

[Pacemaker] error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

2013-12-06 Thread Brian J. Murrell (brian)
I seem to have another instance where pacemaker fails to exit at the end of a shutdown. Here's the log from the start of the service pacemaker stop: Dec 3 13:00:39 wtm-60vm8 crmd[14076]: notice: do_state_transition: State transition S_POLICY_ENGINE - S_TRANSITION_ENGINE [ input=I_PE_SUCCESS