[Pacemaker] Broken links and broken bugzilla

2014-10-17 Thread Andrew Widdersheim
I noticed a few broken links and the bugzilla seems broken as well: http://bugs.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker http://oss.clusterlabs.org/mailman/options/pacemaker Pretty much a lot of stuff on the following seems to need some love: http://clusterlabs.org/w

Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2013-05-23 Thread Andrew Widdersheim
After setting the crmd-transition-delay to 2 * my ping monitor interval the issues I was seeing before in testing have not re-occurred. Thanks again for the help. ___ Pacemaker mailing list: Pacemaker@oss.clusterl
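A minimal sketch of the setting described above, assuming the crm shell is in use. The 10-second monitor interval is a hypothetical value, not one taken from the thread; only the "2 * ping monitor interval" rule comes from the message. The snippet builds and prints the command rather than running it, so it can be reviewed first.

```shell
# Hypothetical value: monitor interval (in seconds) of the ocf:pacemaker:ping resource.
PING_INTERVAL=10

# The message sets crmd-transition-delay to twice the ping monitor interval.
DELAY=$((PING_INTERVAL * 2))

# Build the crm shell command; printed here, not executed.
CMD="crm configure property crmd-transition-delay=${DELAY}s"
echo "$CMD"
```

Running the printed command on a cluster node would apply the delay cluster-wide.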

Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2013-05-21 Thread Andrew Widdersheim
Well that sounds like exactly what I was looking for and should hopefully solve my problem nicely. Thanks for the info. I'll give it a shot and hopefully let you know how it goes.

Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2013-05-20 Thread Andrew Widdersheim
Have I just run into a shortcoming with pacemaker? Should I file a bug or RFE somewhere? Seems like there should be another parameter when setting up a pingd resource to tell the DC/policy engine to wait x amount of seconds so that all nodes have shared their connection state before it makes a d

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-16 Thread Andrew Widdersheim
I'm attaching 3 patches I made fairly quickly to fix the installation issues and also an issue I noticed with the ping ocf from the latest pacemaker.  One is for cluster-glue to prevent lrmd from building and later installing. May also want to modify this patch to take lrmd out of both spec file

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-16 Thread Andrew Widdersheim
akers. > Subject: Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the > LRM 7 > From: and...@beekhof.net > Date: Thu, 16 May 2013 15:20:59 +1000 > CC: pacemaker@oss.clusterlabs.org > To: awiddersh...@hotmail.com > > > On 16/05/2013, at 3:16 PM, Andrew Widders

Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2013-05-16 Thread Andrew Widdersheim
Thanks for the help. Adding another node to the ping host_list may help in some situations but the root issue doesn't really get solved. Also, the location constraint you posted is very different than mine. Your constraint requires connectivity whereas the one I am trying to use looks for best
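The two constraint styles contrasted above can be sketched in crm shell syntax roughly as follows. This is an illustration only: the actual constraints from the thread are not shown in this listing, the resource name g_services and the constraint ids are hypothetical, and pingd is assumed to be the attribute name written by the ping resource (its default). The snippet prints the constraints instead of applying them.

```shell
# "Requires connectivity": forbid nodes where the ping attribute is undefined or zero.
REQUIRE='location loc_require_ping g_services rule -inf: not_defined pingd or pingd lte 0'

# "Best connectivity": use the ping attribute's value as the score, so the node
# that reaches the most ping targets is preferred.
PREFER='location loc_prefer_ping g_services rule pingd: defined pingd'

# Print both; either line could be passed to "crm configure" after review.
printf '%s\n' "$REQUIRE" "$PREFER"
```

With the second form, a node that loses its ping targets gets a lower score rather than being ruled out entirely.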

Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2013-05-16 Thread Andrew Widdersheim
The cluster has 3 connections total. The first connection is the outside interface where services can communicate and is also used for cluster communication using mcast. The second interface is a cross-over that is solely for cluster communication. The third connection is another cross-over sole

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-15 Thread Andrew Widdersheim
I'll look into moving over to the cman option since that is preferred for RHEL6.4 now if I'm not mistaken. I'll also try out the patch provided and see how that goes. So was LRMD not a part of pacemaker previously and later added? Was it originally a part of heartbeat/cluster-glue? I'm just trying

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-15 Thread Andrew Widdersheim
There are quite a few symlinks of heartbeat pieces back to pacemaker pieces like crmd as an example but lrmd was not one of them:
[root@node1 ~]# ls -lha /usr/lib64/heartbeat/crmd
lrwxrwxrwx 1 root root 27 May 14 17:31 /usr/lib64/heartbeat/crmd -> /usr/libexec/pacemaker/crmd
[root@node1 ~]# ls -lh

Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2013-05-15 Thread Andrew Widdersheim
I attached logs from both nodes. Yes, we compiled 1.1.6 with heartbeat support for RHEL6.4. I tried 1.1.10 but had issues. I have another thread open on the mailing list for that issue as well. I'm not opposed to doing CMAN or corosync if those fix the problem. We have been using this setup or

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-15 Thread Andrew Widdersheim
These are the libqb versions:
libqb-devel-0.14.2-3.el6.x86_64
libqb-0.14.2-3.el6.x86_64
Here is a process listing where lrmd is running:
[root@node1 ~]# ps auxwww | egrep "heartbeat|pacemaker"
root 9553 0.1 0.7 52420 7424 ? SLs May14 1:39 heartbeat: master control process
root

Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2013-05-15 Thread Andrew Widdersheim
Sorry to bring up old issues but I am having the exact same problem as the original poster. A simultaneous disconnect on my two-node cluster causes the resources to start to transition to the other node but mid-flight the transition is aborted and resources are started again on the original node

[Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-15 Thread Andrew Widdersheim
I am running the following versions: pacemaker-1.1.10-rc2 cluster-glue-1.0.11 heartbeat-3.0.5 I was running pacemaker-1.1.6 and things were working fine but after updating to the latest I could not get pacemaker to start with the following message repeated in the logs: crmd[8456]:  warning: do

Re: [Pacemaker] Resource fails to stop

2012-07-26 Thread Andrew Widdersheim
Ah, that makes sense. Thanks for helping me wrap my head around it. Working on setting up STONITH now to avoid this in the future.

[Pacemaker] Resource fails to stop

2012-07-26 Thread Andrew Widdersheim
One of my resources failed to stop due to it hitting the timeout setting. The resource went into a failed state and froze the cluster until I manually fixed the problem. My question is what is pacemaker's default action when it encounters a stop failure and STONITH is not enabled? Is it what I

Re: [Pacemaker] Resource active on both nodes

2012-07-06 Thread Andrew Widdersheim
Update... I did a "crm resource cleanup" of everything and these messages started to look more sane. The logs were now saying these were active only on what is currently the active node. From: awiddersh...@hotmail.com To: pacemaker@oss.clusterlabs.org Date: Thu, 5 Jul 2012 14:47:57 -0400 Subje

[Pacemaker] Resource active on both nodes

2012-07-05 Thread Andrew Widdersheim
I'm seeing messages similar to the following: Jul 5 14:34:06 server1 pengine: [423]: notice: unpack_rsc_op: Operation p_syslog-ng_monitor_0 found resource p_syslog-ng active on server2 Jul 5 14:34:06 server1 pengine: [423]: notice: unpack_rsc_op: Operation p_bacula_monitor_0 found resource p_