[Pacemaker] heartbeat:anything resource not stop/monitoring after reboot

2013-09-05 Thread David Coulson
We patched and rebooted one of our clusters this morning - I verified that pacemaker is the same as previous, plus it matches another similar cluster. There is a resource in the cluster defined as:
    primitive re-named-reload ocf:heartbeat:anything \
        params binfile=/usr/sbin/rndc

Re: [Pacemaker] named

2013-06-05 Thread David Coulson
On 6/5/13 2:30 PM, paul wrote: Hi. I have followed the Clusters from Scratch PDF and have a working two-node active/passive cluster with ClusterIP, WebDataClone, WebFS and WebSite working. I am using BIND DNS to direct my websites to the cluster address. When I perform a failover which works ok

[Pacemaker] Group attributes ordered/collocated no longer valid?

2013-05-31 Thread David Coulson
Trying to commit a change to a group that looks like this:
    group gr-ns-auth-ip re-auth6-ns1-ip re-auth6-ns2-ip re-auth-ns1-ip re-auth-ns2-ip re-ns1auth-ip re-ns2auth-ip re-ns3auth-ip \
        meta ordered=false collocated=false
/tmp/tmpBQyloG.pcmk 118L, 4752C written

Re: [Pacemaker] why so long to stonith?

2013-04-25 Thread David Coulson
On 4/25/13 7:43 PM, Andrew Beekhof wrote: I certainly hope so :) So I should complain to our sales people about this BZ before we upgrade our clusters to 6.4? ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org

Re: [Pacemaker] Routing-Ressources on a 2-Node-Cluster

2013-04-15 Thread David Coulson
On Apr 15, 2013, at 1:59 PM, T. nos...@godawa.de wrote: For the access-network I use a different NIC, the nodes are in different networks, NodeA has 10.20.11.70, NodeB has 10.20.12.70 and I have configured a cluster-ip, the active node gets, (10.20.10.70). Are they really on different

Re: [Pacemaker] Wrong system send arp reply when using IPaddr

2013-04-09 Thread David Coulson
On 4/7/13 10:29 PM, Andrew Beekhof wrote: Really really weird, I've got nothing :( We've added SPANs on the switches for the two boxes in the cluster, so we can hopefully at least identify that the ARP frame didn't come from them. Of course, we've not had an occurrence of it in almost a

Re: [Pacemaker] Wrong system send arp reply when using IPaddr

2013-04-09 Thread David Coulson
On 4/9/13 7:18 PM, Andrew Beekhof wrote: Pacemaker is not supported in 6.3 and all I am allowed to say at this point[1] is that your configuration isn't supportable for 6.4 Not because you've configured anything wrong/badly, but because a specific application is not present. However, if I

Re: [Pacemaker] rhel6/cman+pacemaker - how to use clvm?

2013-04-08 Thread David Coulson
On 4/8/13 6:42 AM, Yuriy Demchenko wrote: The purpose of my cluster is to provide HA VM and routing/gateway (thus RHCS isn't an option for me - no IPaddr2 and Route resources). But I cannot find any documentation how to use cLVM in cman+pacemaker cluster, everything I found requires use of

Re: [Pacemaker] rhel6/cman+pacemaker - how to use clvm?

2013-04-08 Thread David Coulson
On 4/8/13 7:37 AM, Vadym Chepkov wrote: What if a clustered volume group appears only when pacemaker establishes iSCSI connection? just make sure you activate the VG before trying to mount anything.

Re: [Pacemaker] Wrong system send arp reply when using IPaddr

2013-03-18 Thread David Coulson
On 3/18/13 5:24 PM, Andrew Beekhof wrote: So: 1. the IP moved from 01 to 02 2. 01 was then rebooted 3. a long time passes 4. 01 starts arping for the IP Is that what you're saying? Is the problem transient or does it persist? Went like this - IP movements are all by Pacemaker/IPaddr resource

[Pacemaker] Wrong system send arp reply when using IPaddr

2013-03-17 Thread David Coulson
First off, I'm going to preface this with the realization that what I am explaining makes no sense, doesn't follow normal logic and I'm not a complete idiot. I've beaten my head against a wall with this issue for two days, and have made no progress, yet we've had a couple of production system

Re: [Pacemaker] apache problems when moving an Ipaddr2 resource

2013-03-17 Thread David Coulson
What is the specific error you get from Apache? Does it not start, or does it just not work properly? How are you ensuring your two nodes have the same apache configuration? David On Mar 17, 2013, at 8:13 PM, Luis Daniel Lucio Quiroz wrote: strange, i have 2 hosts in a cluster cluster

Re: [Pacemaker] RHEL/CentOS 6.4: corosync - CMAN migration

2013-03-17 Thread David Coulson
On Mar 11, 2013, at 7:32 PM, Andrew Beekhof wrote: In fact prior to 6.4, Pacemaker only had Tech Preview status - using the CMAN plugin instead of our home grown one was key to that changing. Is Pacemaker not tech preview in 6.4 anymore? What is the support status of Pacemaker on 6.4?

Re: [Pacemaker] Wrong system send arp reply when using IPaddr

2013-03-17 Thread David Coulson
and...@beekhof.net wrote: On Mon, Mar 18, 2013 at 3:17 AM, David Coulson da...@davidcoulson.net wrote: First off, I'm going to preface this with the realization that what I am explaining makes no sense, doesn't follow normal logic and I'm not a complete idiot. I've beaten my head against a wall

Re: [Pacemaker] Correct order of meta/params in crm

2013-03-03 Thread David Coulson
On 3/2/13 8:22 AM, Lars Marowsky-Bree wrote: Unless it annoys you, this is actually harmless. Otherwise, params first is what I tend to use. Regards, Lars We've seen instances where failure-timeout is set, but Pacemaker never seems to clean up the failure. First thought was it didn't

Re: [Pacemaker] Correct order of meta/params in crm

2013-03-03 Thread David Coulson
On 3/3/13 1:00 PM, Lars Marowsky-Bree wrote: My memory may be very faulty, but I thought this didn't lead to the failure actually being cleaned up automatically, but merely ignored post-timeout. Perhaps 'clean up' is the wrong phrase. But I've absolutely seen it remove the failure out of 'crm_mon

[Pacemaker] Correct order of meta/params in crm

2013-03-02 Thread David Coulson
Running Pacemaker 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14 I noticed we have inconsistent ordering of meta/params in our configuration - For some resources, meta comes before params, in some cases after. In the case below, both. I am assuming meta before params is the correct way

Re: [Pacemaker] time synchronisation

2012-12-19 Thread David Coulson
On 12/19/12 5:06 AM, James Harper wrote: What is the best way on bootup in the above situation to ensure time synchronisation? Is it as simple as having a cron job to reset the hardware clock every so often so that on reboot things are reasonable? At least RHEL and SuSE can do an explicit

Re: [Pacemaker] Configure simple Network Load-balancing

2012-11-30 Thread David Coulson
All the connections, from how many clients? You might be better off using LVS for this. David On 11/30/12 7:15 AM, Ratko Dodevski wrote: Hi guys, I need some help on configuring NLB for my application servers. I've installed tomcat on 4 servers and I've decided to use Linux HA for Network

Re: [Pacemaker] Configure simple Network Load-balancing

2012-11-30 Thread David Coulson
up till 3000 in the next two months. Servers have Tomcat installed on them, so basically I need to load balance connections from outside to the Tomcat. Regards On Fri, Nov 30, 2012 at 1:19 PM, David Coulson da...@davidcoulson.net wrote: All the connections

Re: [Pacemaker] Configure simple Network Load-balancing

2012-11-30 Thread David Coulson
server etc... Is it something wrong with my configuration or that's the way it's working? Thanks and regards On Fri, Nov 30, 2012 at 1:36 PM, David Coulson da...@davidcoulson.net wrote: I would add HA to your existing HA config - The primary issue you have right now is that if tomcat

Re: [Pacemaker] Error while mount gfs2 filesystem in active/active clustering

2012-09-13 Thread David Coulson
On 9/13/12 7:33 AM, ecfgijn wrote: Hi All, I have configured active/active clustering in CentOS 6.2, but when I try to mount a gfs2 file system I get the error shown below:
    [root@node1 ~]# mount /dev/sdb1 /mnt/
    gfs_controld join connect error: Connection refused
    error

Re: [Pacemaker] Expired fail-count doesn't get cleaned up.

2012-08-14 Thread David Coulson
On 8/13/12 8:01 PM, Andrew Beekhof wrote: You might be experiencing: + David Vossel (5 months ago) 9263480: Low: pengine: cl#5025 - Automatically clear failures when resource configuration changes. But if you send us a crm_report tarball covering the period during which you had problems, we can

Re: [Pacemaker] None of the standard agents in ocf:heartbeat are working in centos 6

2012-07-23 Thread David Coulson
We run many RHEL6 clusters using cman/corosync/pacemaker with SELinux enabled. Doubt that is the problem. The original poster wasn't using cman, but I'm not sure that makes a substantial difference. On 7/23/12 7:15 AM, Vladislav Bogdanov wrote: 23.07.2012 08:06, David Barchas wrote: Hello.

Re: [Pacemaker] Pull the plug on one node, the resource doesn't promote the other node to master

2012-07-20 Thread David Coulson
Use ping to set an attribute, then add a location.
    primitive re-ping-core ocf:pacemaker:ping \
        meta failure-timeout=60 \
        params name=at-ping-core host_list=10.250.52.1 multiplier=100 attempts=5 \
        op monitor interval=20 timeout=15 \
        op start interval=0 timeout=5
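The second half of that advice, a location constraint keyed on the ping attribute, might look roughly like the sketch below in crm shell syntax. The attribute name at-ping-core is taken from the primitive in the message; the clone id cl-ping-core, the constraint id lo-example-ping and the resource name re-example are hypothetical, and cloning the ping primitive is an assumption (the attribute is only set on nodes where the ping resource runs).

```
# Assumption: run the ping primitive on every node so each one sets at-ping-core
clone cl-ping-core re-ping-core
# Hypothetical constraint: keep re-example off any node where the attribute
# is missing or has dropped to 0 (no ping target reachable)
location lo-example-ping re-example \
    rule -inf: not_defined at-ping-core or at-ping-core lte 0
```

With multiplier=100 and a single host in host_list, the attribute would be 100 while the target answers and 0 when it does not, so the rule above bans only nodes that have lost connectivity.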

[Pacemaker] Restart clone service on only one node

2012-06-25 Thread David Coulson
I've a couple of cloned resources which need to be restarted one at a time as part of a batch process. If I do a 'crm -w resource restart cl-whatever', it restarts the whole lot at once. I can do a 'service appname stop' on each box, wait for pacemaker to notice it is down, then let it

Re: [Pacemaker] 2-node clusters, who's the master now.

2012-06-07 Thread David Coulson
If you are running two nodes, you need to tell pacemaker you don't care if it can't get quorum when only 1 of 2 nodes is available. Neither node that takes over in this event knows whether there is split brain or not, so you will need to make sure you have sufficient infrastructure between the
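One common way to express "don't care about quorum" on Pacemaker 1.1-era two-node clusters is the no-quorum-policy cluster property. The fragment below is a minimal sketch in crm shell syntax; whether it matches what the message went on to recommend is an assumption, since the preview is cut off.

```
# Keep managing resources on the surviving node even without quorum.
# Assumption: working fencing (STONITH) is configured, because this
# setting alone does nothing to prevent split brain.
property no-quorum-policy=ignore
```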

Re: [Pacemaker] KVM DRBD and Pacemaker

2012-06-04 Thread David Coulson
Can you post your pacemaker config on pastebin? David On Jun 4, 2012, at 3:51 PM, Cliff Massey wrote: I am trying to setup a cluster consisting of KVM DRBD and pacemaker. Without pacemaker DRBD and KVM are working. I can even stop everything on one node, promote the other to drbd

Re: [Pacemaker] VIP on Active/Active cluster

2012-05-14 Thread David Coulson
Cloning IPaddr2 resources utilizes the iptables CLUSTERIP rule. Probably a good idea to start looking at it w/ tcpdump and seeing if either box gets the icmp echo-request packet (from a ping) and determining if it just doesn't respond properly, doesn't get it at all, or something else. I'd say

Re: [Pacemaker] lvm resource doesn't start

2012-05-12 Thread David Coulson
If clvmd hangs, you probably don't have fencing configured properly - It will block IO until a node is fenced correctly. On May 12, 2012, at 12:16 PM, Frank Van Damme wrote: Hi list, I'm assembling a cluster for A/P nfs on Debian Squeeze (6.0). For flexibility I want with LVM. So the

Re: [Pacemaker] VIP on Active/Active cluster

2012-05-09 Thread David Coulson
What application is running on the nodes? Sent from my iPad On May 9, 2012, at 3:10 PM, Paul Damken zen.su...@gmail.com wrote: Hello, I wonder if someone can light me on how to handle the following cluster scene: 2 Nodes Cluster (Active/Active) 1 Cluster managed VIP - RoundRobin ?

Re: [Pacemaker] Failed Actions

2012-05-04 Thread David Coulson
Why not run two separate clusters - One for VMs, one for DRBD. You can create a group containing the resources and have the location constraint reference the group - You probably want to set the group to 'ordered=false' and 'collocated=false'. That said, if you split your environment into two
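The group-plus-location suggestion could be sketched as follows in crm shell syntax. The group form mirrors the gr-ns-auth-ip definition quoted in the 2013-05-31 thread in this listing; the resource names, the constraint id, the node name node1 and the score of 100 are all hypothetical. That later thread also asks whether these group attributes are still accepted, so treat this as version-dependent.

```
# Hypothetical group: members are started and stopped together, but are
# neither ordered relative to each other nor forced onto the same node
group gr-example re-example-a re-example-b \
    meta ordered=false collocated=false
# Hypothetical constraint referencing the group rather than each member
location lo-example gr-example 100: node1
```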

Re: [Pacemaker] Nodes will not promote DRBD resources to master on failover

2012-03-24 Thread David Coulson
Shut down pacemaker and fix your drbd disk first. Get them both uptodate/uptodate and make sure you can manually switch them to primary on each node. Node2 can't become primary when it's not connected to something with an uptodate disk. On 3/24/12 3:15 PM, Andrew Martin wrote: Hi Andreas,

Re: [Pacemaker] Dnsmasq

2012-03-23 Thread David Coulson
Did dnsmasq log that it is listening on the cluster address? You could try adding an iptables nat rule to the box and see if that works. Nat the cluster address for port 53 to the local server ip. Sent from my iPad On Mar 23, 2012, at 9:35 PM, Gregg Stock gr...@damagecontrolusa.com wrote:

Re: [Pacemaker] Deleting the resource while it's running

2012-03-22 Thread David Coulson
On 3/22/12 5:09 AM, Ante Karamatic wrote: Hi I've come across an odd behavior, which might be considered inconsistent. As we know, pacemaker doesn't allow deleting a resource that's running, but this doesn't produce the same behavior every time. Let's take a VM with a default stop timeout (90

Re: [Pacemaker] pacemaker + rhel6 and clvmd

2012-03-14 Thread David Coulson
Are you running 'real' RHEL6? If so, cman + clvmd + gfs2 is the way to go. You can run pacemaker on top of all of that (without openais) to manage your resources if you don't want to use rgmanager. I've never tried to run clvmd out of pacemaker, but there is an init.d script for it in RHEL6,

[Pacemaker] Resource inter-dependency without being a 'group'

2012-02-18 Thread David Coulson
I have an active/active LVS cluster, which uses pacemaker for managing IP resources. Currently I have one environment running on it which utilizes ~30 IP addresses, so a group was created so all resources could be stopped/started together. Downside of that is that all resources have to run on

Re: [Pacemaker] Resource inter-dependency without being a 'group'

2012-02-18 Thread David Coulson
On 2/18/12 4:33 PM, Florian Haas wrote: Is setting meta collocated=false not working for your group? Yep, I found that option shortly after posting my email question. Need to try it in production tomorrow morning, but it worked in my dev environment with dummy resources. Thanks- David

Re: [Pacemaker] Resource inter-dependency without being a 'group'

2012-02-18 Thread David Coulson
On 2/18/12 4:33 PM, Florian Haas wrote: Is setting meta collocated=false not working for your group? Along similar lines, if I have default-resource-stickiness=200 set, what is the best way to 'rebalance' resources following a node failure? In general, if I lose a node I don't want resources

Re: [Pacemaker] Pacemaker in RHEL6.

2011-08-10 Thread David Coulson
On 8/10/11 11:43 AM, Marco van Putten wrote: Thanks Andreas. But our managers persist on using Redhat. I think the idea would be to take the HA packages distributed with Scientific Linux 6.x and run them on RHEL. Note that even when you do subscribe to the HA add-on in RHEL6, pacemaker is