Re: [Linux-HA] Storing arbitrary metadata in the CIB

2013-08-22 Thread Andrew Beekhof
On 22/08/2013, at 10:08 PM, Ferenc Wagner wrote: > Hi, > > Our setup uses some cluster wide pieces of meta information. Think > access control lists for resource instances used by some utilities or > some common configuration data used by the resource agents. Currently > this info is stored i

Re: [Linux-HA] Q: groups of groups

2013-08-22 Thread Andrew Beekhof
On 22/08/2013, at 7:31 PM, Ulrich Windl wrote: > Hi! > > Suppose you have an application A that needs two filesystems F1 and F2. The > filesystems are on separate LVM VGs VG1 and VG2 with LVs L1 and L2, > respectively. The RAID R1 and R2 provide the LVM PVs. > > (Actually we have one group

Re: [Linux-HA] establishing a new resource-agent package provider

2013-08-13 Thread Andrew Beekhof
On 13/08/2013, at 7:41 PM, Lars Marowsky-Bree wrote: > On 2013-08-07T19:16:24, Lars Ellenberg wrote: > > Hi all, > > sorry for being a bit late to the game. I was on vacation for 2,5 weeks > with no internet-enabled equipment. I can highly recommend the > experience ;-) > These are the

Re: [Linux-HA] Moving CIB xml to new servers - UUID values and tags

2013-08-12 Thread Andrew Beekhof
On 12/08/2013, at 7:33 PM, John M wrote: > Hi All, > > I am using heartbeat 2.1.4 and OS is RHEL 5.8. I have few queries > related to CIB xml. > > 1. I want to use existing CIB xml. But whenever I import the file using > cibadmin command, four nodes are added in the new cluster because of un

Re: [Linux-HA] {SPAM 04.2} Re: Many location on ping resources and best practice for connectivity monitoring

2013-08-09 Thread Andrew Beekhof
On 09/08/2013, at 6:42 PM, RaSca wrote: > On Fri 09 Aug 2013 04:42:28 CEST, Andrew Beekhof wrote: > [...] >> That sounds like something playing with the virt bridge when the vm starts. >> Is the host trying to ping through the bridge too? > > Yes. Is this no

Re: [Linux-HA] Many location on ping resources and best practice for connectivity monitoring

2013-08-08 Thread Andrew Beekhof
On 09/08/2013, at 12:48 AM, RaSca wrote: > On Thu 08 Aug 2013 01:07:06 CEST, Andrew Beekhof wrote: >> On 08/08/2013, at 12:37 AM, RaSca wrote: > [...] >>> The problem I got is that when I clone a VM (using virt-clone) >>> everything works fine until

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-08 Thread Andrew Beekhof
On 08/08/2013, at 3:49 PM, Thomas Glanzmann wrote: > Hello Andrew, > >> It really helps to read the output of the commands you're running: > >> Did you not see these messages the first time? > >> apache-03: WARN: Unknown cluster type: any >> apache-03: ERROR: Could not determine the locatio

Re: [Linux-HA] Many location on ping resources and best practice for connectivity monitoring

2013-08-07 Thread Andrew Beekhof
On 08/08/2013, at 12:37 AM, RaSca wrote: > Hi all, > I have a big Pacemaker (1.1.9-1512) cluster with 9 nodes and almost 200 > virtual machines (with the same storage on the bottom). Everything is > based upon KVM and libvirt. > Each VM has got a location, based upon a cloned ping resource on ea
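RaSca bases each VM's location on a cloned ping resource. For readers new to the pattern: the `ocf:pacemaker:ping` agent publishes a node attribute (named `pingd` unless its `name` parameter says otherwise) whose value is `multiplier` times the number of reachable hosts, and a location rule such as `not_defined pingd or pingd lte 0` with a -INFINITY score keeps resources off nodes without connectivity. The Python below is only a toy model of that rule's logic, not Pacemaker code; the function names are hypothetical.

```python
import math
from typing import Mapping, Optional

NEG_INFINITY = -math.inf  # stands in for Pacemaker's -INFINITY score


def ping_attribute(reachable_hosts: int, multiplier: int = 1) -> int:
    """Value the ping agent would publish: multiplier x number of reachable hosts."""
    return multiplier * reachable_hosts


def connectivity_score(node_attrs: Mapping[str, int], name: str = "pingd") -> float:
    """Toy version of the rule 'not_defined pingd or pingd lte 0' scored -INFINITY.

    Returns -inf (node banned) when the attribute is missing or <= 0,
    otherwise 0.0 (the rule does not match, so it contributes nothing).
    """
    value: Optional[int] = node_attrs.get(name)
    if value is None or value <= 0:
        return NEG_INFINITY
    return 0.0


if __name__ == "__main__":
    nodes = {
        "node1": {"pingd": ping_attribute(2, multiplier=1000)},  # both targets reachable
        "node2": {"pingd": ping_attribute(0, multiplier=1000)},  # no targets reachable
        "node3": {},                                             # ping clone not running here
    }
    for node, attrs in nodes.items():
        print(node, connectivity_score(attrs))
```

Because every VM's location rule reads the same node attribute, the ping clone itself only needs to run once per node.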

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-07 Thread Andrew Beekhof
On 07/08/2013, at 5:42 PM, Thomas Glanzmann wrote: > Hello Andrew, > >> I can try and fix that if you re-run with -x and paste the output. > > (apache-03) [~] crm_report -l /var/adm/syslog/2013/08/05 -f "2013-08-04 > 18:30:00" -t "2013-08-04 19:15" -x > + shift > + true > + [ ! -z ] > + break

Re: [Linux-HA] establishing a new resource-agent package provider

2013-08-07 Thread Andrew Beekhof
On 08/08/2013, at 3:16 AM, Lars Ellenberg wrote: > On Wed, Jul 31, 2013 at 09:47:04AM +1000, Andrew Beekhof wrote: >>>> On 30/07/2013, at 4:21 PM, Ulrich Windl >>>> wrote: >>>> >>>>>>>> David Vossel wrote on 30.07.2013 at 01:20

Re: [Linux-HA] Antw: resource agent iSCSILogicalUnit failing unexpectedly on occasion

2013-08-06 Thread Andrew Beekhof
On 07/08/2013, at 4:03 PM, "Ulrich Windl" wrote: >>>> Andrew Beekhof wrote on 06.08.2013 at 22:08 in >>>> message > <1037f5af-5bee-44a4-8f71-5e72b940b...@beekhof.net>: > >> On 06/08/2013, at 5:27 PM, "Ulrich Windl"

Re: [Linux-HA] Antw: Re: pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-06 Thread Andrew Beekhof
On 07/08/2013, at 4:06 PM, "Ulrich Windl" wrote: >>>> Andrew Beekhof wrote on 06.08.2013 at 22:10 in >>>> message > : > >> On 06/08/2013, at 5:24 PM, "Ulrich Windl" >> >> wrote: >> >>>>

Re: [Linux-HA] Wheezy / heartbeat / pacemaker: Howto make persistent configuration changes

2013-08-06 Thread Andrew Beekhof
On 05/08/2013, at 9:12 PM, Thomas Glanzmann wrote: > Hello Andrew, > >> did they ensure everything was flushed to disk first? > > (apache-03) [/var] cat /proc/sys/vm/dirty_expire_centisecs > 3000 > > So dirty data should be flushed within 3 seconds. But I lost at least 24 > hours maybe even
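The arithmetic in the quoted message deserves a check: `vm.dirty_expire_centisecs` is expressed in hundredths of a second, so 3000 means 30 seconds, not 3. Per the kernel's vm sysctl documentation it is also an age threshold (data dirty for longer than this becomes eligible for writeback the next time a flusher thread wakes, every `dirty_writeback_centisecs`) rather than a hard flush deadline. A minimal Python sketch of the conversion, assuming a Linux host for the `/proc` reads:

```python
from pathlib import Path

DIRTY_EXPIRE = Path("/proc/sys/vm/dirty_expire_centisecs")
DIRTY_WRITEBACK = Path("/proc/sys/vm/dirty_writeback_centisecs")


def centisecs_to_seconds(centisecs: int) -> float:
    """vm.dirty_*_centisecs values are in hundredths of a second."""
    return centisecs / 100.0


def read_seconds(path: Path) -> float:
    """Read one of the sysctl files above (Linux only) and convert it to seconds."""
    return centisecs_to_seconds(int(path.read_text().strip()))


if __name__ == "__main__":
    print("3000 centisecs =", centisecs_to_seconds(3000), "s")
    if DIRTY_EXPIRE.exists() and DIRTY_WRITEBACK.exists():
        print("expire age:", read_seconds(DIRTY_EXPIRE), "s")
        print("flusher interval:", read_seconds(DIRTY_WRITEBACK), "s")
```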

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-06 Thread Andrew Beekhof
On 06/08/2013, at 2:29 AM, Thomas Glanzmann wrote: > Hello Andrew, > >> You will need to run crm_report and email us the resulting tarball. >> This will include the version of the software you're running and log >> files (both system and cluster) - without which we can't do anything. > > Find

Re: [Linux-HA] Antw: Re: pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-06 Thread Andrew Beekhof
On 06/08/2013, at 5:24 PM, "Ulrich Windl" wrote: Thomas Glanzmann wrote on 05.08.2013 at 19:03 in > message <20130805170345.gb...@glanzmann.de>: >> Hello Ulrich, >> >>> Did it happen when you put the cluster into maintenance-mode, or did >>> it happen after someone fiddled with the r

Re: [Linux-HA] Antw: resource agent iSCSILogicalUnit failing unexpectedly on occasion

2013-08-06 Thread Andrew Beekhof
On 06/08/2013, at 5:27 PM, "Ulrich Windl" wrote: > Hi! > > I always wanted to know what "Detected action XXX from a different transition > ..." really means: > Does it indicate a programming error in the cluster stack? To me it sounds as > if at least two parties try to control a thing witho

Re: [Linux-HA] Wheezy / heartbeat / pacemaker: Howto make persistent configuration changes

2013-08-05 Thread Andrew Beekhof
On 05/08/2013, at 5:20 PM, Thomas Glanzmann wrote: > Hello Andrew, > >> Any change to the configuration section is automatically written to >> disk. The cluster only stops doing this if writing to disk fails at >> some point - but there would have been an error in your logs if that >> were the

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-04 Thread Andrew Beekhof
On 05/08/2013, at 3:11 AM, Thomas Glanzmann wrote: > Hello Andrew, > I just got another crash when putting a node into unmanaged node, this > time it hit me hard: > >- Both nodes sucided or snothined each other >- One out of four md devices where detected on both nodes after >

Re: [Linux-HA] Wheezy / heartbeat / pacemaker: Howto make persistent configuration changes

2013-08-04 Thread Andrew Beekhof
On 05/08/2013, at 4:40 AM, Thomas Glanzmann wrote: > Hello, > both nodes of my ha cluster just paniced, afterwards the config was > gone. Is there a command to force heartbeat / pacemaker to write the > config to the disk or do I need to restart heartbeat for persistant > changes. Any change to

Re: [Linux-HA] ping resource colocation question

2013-07-30 Thread Andrew Beekhof
On 31/07/2013, at 11:47 AM, Jeff Frost wrote: > > On Jul 30, 2013, at 6:34 PM, Andrew Beekhof wrote: > >> >> On 27/07/2013, at 3:37 AM, Jeff Frost wrote: >> >>> Is it possible when colocating a ping resource to avoid having the colocated >>> r

Re: [Linux-HA] ping resource colocation question

2013-07-30 Thread Andrew Beekhof
On 27/07/2013, at 3:37 AM, Jeff Frost wrote: > Is it possible when colocating a ping resource to avoid having the colocated > resource stopped if there is nowhere better to start it? Why would you colocate the ping resource with anything? Or do you mean location constraints that look for the at

Re: [Linux-HA] Simple cluster - backup wont start

2013-07-30 Thread Andrew Beekhof
On 30/07/2013, at 1:40 AM, mike wrote: > Hi guys, > > I've got a rather odd issue. We have a simple two node cluster running one > VIP and mysql. Pretty sure I could create this cluster in my sleep. Anyway, > the cluster has been up and running for months with no issues at all. Last > night

Re: [Linux-HA] Antw: Re: establishing a new resource-agent package provider

2013-07-30 Thread Andrew Beekhof
On 31/07/2013, at 5:11 AM, David Vossel wrote: > > > > > - Original Message - >> From: "Andrew Beekhof" >> To: "General Linux-HA mailing list" >> Sent: Tuesday, July 30, 2013 7:46:25 AM >> Subject: Re: [Linux-HA] Antw:

Re: [Linux-HA] Antw: Re: establishing a new resource-agent package provider

2013-07-30 Thread Andrew Beekhof
On 30/07/2013, at 4:21 PM, Ulrich Windl wrote: David Vossel wrote on 30.07.2013 at 01:20 in message > <1719265415.18819216.1375140025306.javamail.r...@redhat.com>: > > [...] >>> How does this compare to the Red Hat fence/resource-agent packages? I'm >>> very happy to see "heart

Re: [Linux-HA] does migration-threshold act differently on master/slave clones?

2013-07-16 Thread Andrew Beekhof
On 17/07/2013, at 11:38 AM, Jeff Frost wrote: > I understand that it can't be demoted in the normal sense, I just expected it > to demote after migration-threshold failures were reached and not always on > the first failure. It's a bit more subtle than you might imagine... We don't just demot

Re: [Linux-HA] Antw: crm resource restart is broken (d4de3af6dd33)

2013-07-12 Thread Andrew Beekhof
On 12/07/2013, at 8:23 PM, Lars Marowsky-Bree wrote: > On 2013-07-12T12:19:40, Ulrich Windl > wrote: > >> BTW: The way "resource restart" is implemented (i.e.: stop & wait, then >> start) has a major problem: If stop causes to fence the node where the crm >> command is running, the resource

Re: [Linux-HA] crm node delete

2013-07-12 Thread Andrew Beekhof
On 12/07/2013, at 8:23 PM, Vladislav Bogdanov wrote: > 01.07.2013 17:29, Vladislav Bogdanov wrote: >> Hi, >> >> I'm trying to look if it is now safe to delete non-running nodes >> (corosync 2.3, pacemaker HEAD, crmsh tip). >> >> # crm node delete v02-d >> WARNING: 2: crm_node bad format: 7 v02

Re: [Linux-HA] crm node delete

2013-07-09 Thread Andrew Beekhof
On 10/07/2013, at 3:42 PM, Vladislav Bogdanov wrote: > 10.07.2013 08:38, Andrew Beekhof wrote: >> >> On 10/07/2013, at 3:37 PM, Vladislav Bogdanov wrote: >> >>> 10.07.2013 08:13, Andrew Beekhof wrote: >>>> >>>> On 10/07/2013, at 2:15 PM

Re: [Linux-HA] crm node delete

2013-07-09 Thread Andrew Beekhof
On 10/07/2013, at 3:37 PM, Vladislav Bogdanov wrote: > 10.07.2013 08:13, Andrew Beekhof wrote: >> >> On 10/07/2013, at 2:15 PM, Vladislav Bogdanov wrote: >> >>> 10.07.2013 07:05, Andrew Beekhof wrote: >>>> >>>> On 10/07/2013, at 2:04 PM

Re: [Linux-HA] crm node delete

2013-07-09 Thread Andrew Beekhof
On 10/07/2013, at 2:15 PM, Vladislav Bogdanov wrote: > 10.07.2013 07:05, Andrew Beekhof wrote: >> >> On 10/07/2013, at 2:04 PM, Vladislav Bogdanov wrote: >> >>> 10.07.2013 03:39, Andrew Beekhof wrote: >>>> >>>> On 10/07/2013, at 1:51 AM

Re: [Linux-HA] crm node delete

2013-07-09 Thread Andrew Beekhof
On 10/07/2013, at 2:04 PM, Vladislav Bogdanov wrote: > 10.07.2013 03:39, Andrew Beekhof wrote: >> >> On 10/07/2013, at 1:51 AM, Vladislav Bogdanov wrote: >> >>> 03.07.2013 19:31, Dejan Muhamedagic wrote: >>>> On Tue, Jul 02, 2013 at

Re: [Linux-HA] crm node delete

2013-07-09 Thread Andrew Beekhof
On 10/07/2013, at 1:51 AM, Vladislav Bogdanov wrote: > 03.07.2013 19:31, Dejan Muhamedagic wrote: >> On Tue, Jul 02, 2013 at 07:53:52AM +0300, Vladislav Bogdanov wrote: >>> 01.07.2013 18:29, Dejan Muhamedagic wrote: Hi, On Mon, Jul 01, 2013 at 05:29:31PM +0300, Vladislav Bogdanov

Re: [Linux-HA] Centos 6.4 KVM+ DRBD 8.4.2 + Pacemaker 1.1.10 - Ddbd Monitor Failing!

2013-07-08 Thread Andrew Beekhof
On 04/07/2013, at 9:49 AM, Jimmy Magee wrote: > Hi all, > > I'm endeavouring to setup a drbb pacemaker cluster on two Centos 6.4 kvm's. > The kvm are running on a Centos 6.4 host operating, vms installed on separate > logical volumes, with 1 GB ram allocated to each vm. > Drbb starts manual

Re: [Linux-HA] Preventing MS from starting on certain nodes in a 4-node cluster

2013-07-08 Thread Andrew Beekhof
On 04/07/2013, at 11:16 AM, Leland wrote: > I'm setting up a 4-node cluster where 2 of the nodes will run an > active/active DRBD and the other two nodes will be available in case one of > the "active" nodes fails. > > I can almost get this working fine. But, what I can't seem to figure > out

Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-02 Thread Andrew Beekhof
On 03/07/2013, at 7:11 AM, Vladislav Bogdanov wrote: > 02.07.2013 14:55, Andrew Beekhof wrote: >> >> On 02/07/2013, at 8:14 PM, Vladislav Bogdanov wrote: >> >>> 02.07.2013 12:27, Lars Marowsky-Bree wrote: >>>> On 2013-07-02T11:05:01, Vladislav Bo

Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-02 Thread Andrew Beekhof
On 02/07/2013, at 8:14 PM, Vladislav Bogdanov wrote: > 02.07.2013 12:27, Lars Marowsky-Bree wrote: >> On 2013-07-02T11:05:01, Vladislav Bogdanov wrote: >> >>> One thing I see immediately, is that node utilization attributes are >>> deleted after I do 'load update' with empty node utilization s

Re: [Linux-HA] Resource Collocation v/s Resource Groups

2013-07-01 Thread Andrew Beekhof
On 21/06/2013, at 7:01 PM, Parkirat wrote: > Hi Andrew, > > Thanks for the reply. I did that experiment again, with Apache and Dummy > Resource. > > Below is my configuration: > > > [root@prod-hb-nmn-002 ~]# crm configure show > node $i

Re: [Linux-HA] Backing out of HA

2013-07-01 Thread Andrew Beekhof
I know you're not looking for fixes, but there are a couple of points I would make: On 02/07/2013, at 6:31 AM, William Seligman wrote: > I'm about to write a transition plan for getting rid of high-availability on > our > lab's cluster. Before I do that, I thought I'd put my reasons before thi

Re: [Linux-HA] corosync binding to 127.0.0.1 instead of correct interface

2013-06-24 Thread Andrew Beekhof
On 24/06/2013, at 8:56 PM, Athanasios Kostopoulos wrote: > Dear list, You might have more luck on the corosync mailing list. Not all their developers are on this one IIRC. > > I have the following problem when trying to implement a two-node failover > cluster, using Hetzner as the hosting pr

Re: [Linux-HA] Resource Collocation v/s Resource Groups

2013-06-19 Thread Andrew Beekhof
On 18/06/2013, at 7:18 PM, Parkirat wrote: > Hi Sven, > > Thanks for replying back. > > By manually stopping, I mean, I am stopping the resource by running the > below command: > /etc/init.d/httpd stop, which is outside the control of pacemaker. > > Also, the entries for target-role is not d

Re: [Linux-HA] Pacemaker & gfs2 on RHEL6.2 x64

2013-06-13 Thread Andrew Beekhof
On 14/06/2013, at 3:57 AM, Guglielmo Abbruzzese wrote: > Hi, > I'm working on a issue from a while but it seems to me I've got to a dead > end :( > > Till now, I've been able to setup and configure several cluster solutions on > RHEL6.2x64 using the RH's Corosync (1.4.1-4) and Pacemaker (1.1.6-

Re: [Linux-HA] ocf HA_RSCTMP directory location

2013-06-13 Thread Andrew Beekhof
On 14/06/2013, at 7:59 AM, David Vossel wrote: > Hey, > > Andrew and I have been running into some inconsistencies between > resource-agent packages that we need to get cleared up. > > There's an ocf variable, HA_RSCTMP, used in many of the resource agents that > represents a place the agent

Re: [Linux-HA] Master/Slave status check using crm_mon

2013-06-12 Thread Andrew Beekhof
On 13/06/2013, at 7:45 AM, Dimitri Maziuk wrote: > On 06/12/2013 04:35 PM, Andrew Beekhof wrote: >> >> On 13/06/2013, at 2:49 AM, John M wrote: >> >>> Dear All, >>> >>> I will try to setup pacemaker cluster in the coming weeks. Before tha

Re: [Linux-HA] Master/Slave status check using crm_mon

2013-06-12 Thread Andrew Beekhof
On 13/06/2013, at 2:49 AM, John M wrote: > Dear All, > > I will try to setup pacemaker cluster in the coming weeks. Before that I > have to complete the configuration using heartbeat 2.1.4. > I would really appreciate if you could suggest the configuration for > Master/Slave scenario mentione

Re: [Linux-HA] Pacemaker: Only the first DRBD is promoted in a group having multiple filesystems which promote individual drbds

2013-06-11 Thread Andrew Beekhof
If you include a crm_report for the scenario you're describing, I can take a look. The config alone does not contain enough information. On 06/06/2013, at 6:57 PM, Thomas Glanzmann wrote: > Hello, > on Debian Wheezy (7.0) I installed pacemaker with heartbeat. When > putting multiple filesystems

Re: [Linux-HA] How to analyze node failure

2013-06-11 Thread Andrew Beekhof
On 12/06/2013, at 1:04 AM, Stefan Schloesser wrote: > Hi, > > I have a setup with 2 nodes, drbd, mysql and apache. Rather too often for my > liking (1 per month) one node is killed (fenced) by the other. Each time I am > unable to find out what actually caused this behaviour. > I can see in

Re: [Linux-HA] Master/Slave status check using crm_mon

2013-06-11 Thread Andrew Beekhof
On 11/06/2013, at 3:13 PM, John M wrote: > Hi All, > >I am using heartbeat 2.1.3 and I have configured a Master/Slave setup > using Stateful RA. > But when I run crm_mon -r -1, it shows the status as "Started". How can > i identify which node is Master or Slave? In this case, none of the

Re: [Linux-HA] Antw: Q: ERROR: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped in state S_RELEASE_DC! (120000ms)

2013-06-06 Thread Andrew Beekhof
On 07/06/2013, at 10:11 AM, Andrew Beekhof wrote: >> >> [Crazy things go on, until it changes to:] >> crmd: [7285]: ERROR: verify_stopped: Resource prm_ping_gw1-v582:1 was active >> at shutdown. You may ignore this error if it is unmanaged. >> >> Hey fo

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-06-06 Thread Andrew Beekhof
On 06/06/2013, at 7:11 PM, Thomas Glanzmann wrote: > Jun 6 10:17:37 astorage1 crmd: [2947]: ERROR: crm_abort: > abort_transition_graph: Triggered assert at te_utils.c:339 : transition_graph > != NULL This is the cause of the coredump. What version of pacemaker is this? Installing pacemaker'

Re: [Linux-HA] Antw: Q: ERROR: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped in state S_RELEASE_DC! (120000ms)

2013-06-06 Thread Andrew Beekhof
On 05/06/2013, at 11:22 PM, Ulrich Windl wrote: > Hi again! > > I haven't fully understood the problem, but it looks as if pacemaker likes to > shoot himself in the foot, and then go crazy when it feels the pain: > > Shortly after maintenance mode was turned on, there was a communication >

Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-06-05 Thread Andrew Beekhof
On 06/06/2013, at 3:45 PM, Vladislav Bogdanov wrote: > 06.06.2013 08:14, Andrew Beekhof wrote: >> >> On 06/06/2013, at 2:50 PM, Vladislav Bogdanov wrote: >> >>> 06.06.2013 07:31, Andrew Beekhof wrote: >>>> >>>> On 06/06/2013, at 2:27 PM

Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-06-05 Thread Andrew Beekhof
On 06/06/2013, at 2:50 PM, Vladislav Bogdanov wrote: > 06.06.2013 07:31, Andrew Beekhof wrote: >> >> On 06/06/2013, at 2:27 PM, Vladislav Bogdanov wrote: >> >>> 05.06.2013 02:04, Andrew Beekhof wrote: >>>> >>>> On 05/06/2013, at 5:08 A

Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-06-05 Thread Andrew Beekhof
On 06/06/2013, at 2:27 PM, Vladislav Bogdanov wrote: > 05.06.2013 02:04, Andrew Beekhof wrote: >> >> On 05/06/2013, at 5:08 AM, Ferenc Wagner wrote: >> >>> Dejan Muhamedagic writes: >>> >>>> On Mon, Jun 03, 2013 at 06:19:06PM +0200, Fer

Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-06-04 Thread Andrew Beekhof
On 05/06/2013, at 5:08 AM, Ferenc Wagner wrote: > Dejan Muhamedagic writes: > >> On Mon, Jun 03, 2013 at 06:19:06PM +0200, Ferenc Wagner wrote: >> >>> I've got a script for resource creation, which puts the new resource in >>> a shadow CIB together with the necessary constraints, runs a simul

Re: [Linux-HA] heartbeat 'ERROR' messages

2013-05-28 Thread Andrew Beekhof
On 29/05/2013, at 8:05 AM, Greg Woods wrote: > On Wed, 2013-05-29 at 07:50 +1000, Andrew Beekhof wrote: > >>> respawn hacluster /usr/lib64/heartbeat/ipfail >>> crm respawn >> >> I don't know about the rest, but definitely do not use both ipfail and crm

Re: [Linux-HA] heartbeat 'ERROR' messages

2013-05-28 Thread Andrew Beekhof
On 29/05/2013, at 2:37 AM, Greg Woods wrote: > I have two clusters that are both running CentOS 5.6 and > heartbeat-3.0.3-2.3.el5 (from the clusterlabs repo). THey are running > slightly different pacemaker versions (pacemaker-1.0.9.1-1.15.el5 on the > first one and pacemaker-1.0.12-1.el5 on the

Re: [Linux-HA] Confirmation about resource group ordering

2013-05-22 Thread Andrew Beekhof
On 22/05/2013, at 10:15 PM, Tony Stocker wrote: > > I just want to make sure that my understanding of resource groups is > correct before I proceed too far. Are the following true statements > regarding resource groups: > > 1. The order in which resource primitives are listed in a res
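Tony's list is truncated here. For reference, the documented semantics of a Pacemaker group (Pacemaker Explained) are: members start in the order listed and stop in reverse order, each member is colocated with the one listed before it, and if a member cannot run anywhere, nothing listed after it runs either. A toy Python model of those three rules, with hypothetical member names; it is not Pacemaker's implementation:

```python
from typing import Callable, List, Tuple


def start_stop_order(members: List[str]) -> Tuple[List[str], List[str]]:
    """Group members start in the order listed and stop in the reverse order."""
    return list(members), list(reversed(members))


def implied_colocations(members: List[str]) -> List[Tuple[str, str]]:
    """Each member is colocated with the one listed before it, as (rsc, with_rsc) pairs."""
    return [(members[i], members[i - 1]) for i in range(1, len(members))]


def runnable_prefix(members: List[str], can_run: Callable[[str], bool]) -> List[str]:
    """If a member cannot run anywhere, no member listed after it may run either."""
    runnable: List[str] = []
    for member in members:
        if not can_run(member):
            break
        runnable.append(member)
    return runnable


if __name__ == "__main__":
    group = ["vip", "fs", "db"]  # hypothetical member names
    print(start_stop_order(group))
    print(implied_colocations(group))
    print(runnable_prefix(group, lambda m: m != "fs"))
```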

Re: [Linux-HA] Unable to Configure Heartbeat on RHEL-6.4 64 Bit

2013-05-21 Thread Andrew Beekhof
On 22/05/2013, at 2:23 PM, Digimer wrote: > On 05/21/2013 11:59 PM, Deepak Yadav wrote: >> Dear Team >> >> >> I want to configure the DRBD and Heartbeat on RHEL-6 OS but I am not able >> to find out exact package of heartbeat for the installation. >> Please help me out to resolve this issue !

Re: [Linux-HA] Virtual interface (eth0:0) disappeared

2013-05-20 Thread Andrew Beekhof
On 21/05/2013, at 4:19 PM, Nikita Michalko wrote: > > > On Tuesday, 21 May 2013 00:00:03, DaveW wrote: >> We are running heartbeat 2.1.3 on CentOS 5.4. Last Monday AM, I > > - Man, so OLD! Any chance to update to the latest version ? In haresources mode, there is very little difference

Re: [Linux-HA] rsync command in a OCF

2013-05-19 Thread Andrew Beekhof
On 16/05/2013, at 7:57 PM, Guglielmo Abbruzzese wrote: > Hi, > I'd like to hear from you if someone has already experienced someting > similar or -in case- to get how to do better. > I need to sync a few files from time to time. Non need for a storage or a > DRBD solution. Active/Passive cluster

Re: [Linux-HA] is delayed fencing possible?

2013-05-15 Thread Andrew Beekhof
On 15/05/2013, at 6:44 PM, Ferenc Wagner wrote: > Andrew Beekhof writes: > >> On 15/05/2013, at 5:23 PM, Ferenc Wagner wrote: >> >>> If a resource fails to stop, the node hosting it is fenced immediately. >>> Isn't it possible to move the other reso

Re: [Linux-HA] Antw: Re: adding a monitor operation online

2013-05-15 Thread Andrew Beekhof
are simply started/stopped as they are added/removed. The resource itself is unaffected (unless the monitor operations report a problem). > I doubt it... > >>>> Ferenc Wagner wrote on 15.05.2013 at 11:29 in message > <87li7g6160@lant.ki.iif.hu>: >> And

Re: [Linux-HA] is delayed fencing possible?

2013-05-15 Thread Andrew Beekhof
On 15/05/2013, at 5:23 PM, Ferenc Wagner wrote: > Hi, > > If a resource fails to stop, the node hosting it is fenced immediately. > Isn't it possible to move the other resources off the node before it's > fenced? Sometimes, but you're in trouble if another resource can't stop until the failed

Re: [Linux-HA] adding a monitor operation online

2013-05-15 Thread Andrew Beekhof
On 15/05/2013, at 5:19 PM, Ferenc Wagner wrote: > Hi, > > I learnt that it's possible to change non-unique parameters of a > resource without restarting it if the agent implements the reload > action. On the other hand, adding a monitor operation does not seem to > have an effect until the res

Re: [Linux-HA] shutdowns from the blue

2013-05-13 Thread Andrew Beekhof
On 13/05/2013, at 7:48 PM, Ferenc Wagner wrote: > Andrew Beekhof writes: > >> On 10/05/2013, at 11:37 PM, Ferenc Wagner wrote: >> >>> An hour ago one node (n02) of our 4-node cluster started to shutdown. >> >> Someone, probably the init script, sen

Re: [Linux-HA] shutdowns from the blue

2013-05-12 Thread Andrew Beekhof
On 10/05/2013, at 11:37 PM, Ferenc Wagner wrote: > Hi, > > An hour ago one node (n02) of our 4-node cluster started to shutdown. Someone, probably the init script, sent SIGTERM to pacemakerd. > No idea why. But during shutdown, it asked another node (n01) to shut > down as well: No it didn'

Re: [Linux-HA] Retransmit list and window_size

2013-05-05 Thread Andrew Beekhof
On 03/05/2013, at 5:17 PM, RaSca wrote: > On Fri 05 Apr 2013 15:29:36 CEST, RaSca wrote: > [...] >> It seem that when a configuration message has to run over the ring, in >> some particular cases, everything collapse. Following Florian's article >> I've tried setting up a window_size

Re: [Linux-HA] Network failover and communication channel survival

2013-04-30 Thread Andrew Beekhof
On 30/04/2013, at 9:58 PM, Richard Comblen wrote: > Hi all, > > I have a two node setup with replicated PostgreSQL DB (master/slave > setup). Focus is on keeping the system up and running, not on > capacity. > All good, that works fine. > > Now, a new requirement shows up: the two nodes should

Re: [Linux-HA] Resource moves

2013-04-29 Thread Andrew Beekhof
On 19/04/2013, at 11:22 PM, Marcus Bointon wrote: > On 19 Apr 2013, at 14:48, Florian Crouzat wrote: > >> Well, you kinda answered this when you mentioned "crm_resource -U". >> You should use "unmove" instead of "move". Unmove will remove the >> location constraint where move will create a ne

Re: [Linux-HA] Resource move not moving

2013-04-29 Thread Andrew Beekhof
On 16/04/2013, at 11:50 PM, Marcus Bointon wrote: > I'm running crm using heartbeat 3.0.5 pacemaker 1.1.6 on Ubuntu Lucid 64. > > I have a small resource group containing an IP, ARP and email notifier on a > cluster containing two nodes called proxy1 and proxy2. I asked it to move > nodes, an

Re: [Linux-HA] Antw: Question about forbidden colocation -inf

2013-04-25 Thread Andrew Beekhof
On 24/04/2013, at 8:49 PM, Ulrich Windl wrote: > Hi! > > I remember a complaint from my side that colocation should be symmetrical. Unfortunately many things are much easier to ask for than to implement. > I > guess you'll find the responses via Google. Maybe the other effects can be > deriv

Re: [Linux-HA] LSB Managed Resources - XML Error / Core Dump

2013-04-25 Thread Andrew Beekhof
I have followed up on the equivalent pacemaker mailing list thread. Essentially I asked if the latest http://www.clusterlabs.org/rpm-next packages helped and if someone could open up the core file and print the contents of the input passed to string2xml() On 26/04/2013, at 2:00 AM, Jimmy Magee

Re: [Linux-HA] Question about forbidden colocation -inf

2013-04-25 Thread Andrew Beekhof
On 25/04/2013, at 6:52 PM, fabian.herschel wrote: > So, this seems to mean that the order of the resources for a -inf: > collocation is important and has an impact on the behavior. Absolutely. > > I wonder if it is a normal behavior ? and so we have to really take in > account the order on
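Andrew's "Absolutely" follows from how colocation is evaluated: Pacemaker Explained states that the cluster decides where to put the `with-rsc` resource first and only then places `rsc` relative to it. The toy Python model below uses hypothetical resources A and B, nodes n1 and n2, and made-up preference scores (the real scheduler also weighs stickiness, other constraints and more); it shows how swapping the two sides of a -INFINITY colocation changes which resource gets the preferred node.

```python
from typing import Dict, Optional

Prefs = Dict[str, Dict[str, int]]  # resource -> {node: preference score}


def best_node(scores: Dict[str, int], excluded: Optional[str] = None) -> Optional[str]:
    """Highest-scoring node other than `excluded`; ties broken by node name."""
    candidates = sorted(n for n in scores if n != excluded)
    if not candidates:
        return None  # nowhere to run: the resource would stay stopped
    return max(candidates, key=lambda n: scores[n])


def place_anti_colocated(rsc: str, with_rsc: str, prefs: Prefs) -> Dict[str, Optional[str]]:
    """Toy model of a -INFINITY colocation of `rsc` with `with_rsc`."""
    placement: Dict[str, Optional[str]] = {}
    placement[with_rsc] = best_node(prefs[with_rsc])  # with-rsc is placed first
    placement[rsc] = best_node(prefs[rsc], excluded=placement[with_rsc])  # rsc then avoids it
    return placement


if __name__ == "__main__":
    prefs = {"A": {"n1": 100, "n2": 50}, "B": {"n1": 100, "n2": 50}}
    print(place_anti_colocated("A", "B", prefs))  # B gets n1, A is pushed to n2
    print(place_anti_colocated("B", "A", prefs))  # A gets n1, B is pushed to n2
```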

Re: [Linux-HA] Best Corosync and Pacemaker Versions for New Cluster

2013-04-25 Thread Andrew Beekhof
On 26/04/2013, at 8:08 AM, "Robinson, Eric" wrote: > We are installing corosync and pacemaker on a brand new RHEL 6.3 cluster > today. When we installed using yum, here are the versions that pulled down > from the repos. > > > pacemaker-libs-1.1.9-1512.el6.x86_64 > pacemaker-1.1.9-1512.el6.x

Re: [Linux-HA] Is CLVM really needed in an active/passive cluster?

2013-04-23 Thread Andrew Beekhof
On 24/04/2013, at 1:57 AM, Angel L. Mateo wrote: > On 22/04/13 13:01, Lars Marowsky-Bree wrote: >> On 2013-04-22T11:14:14, "Angel L. Mateo" wrote: >> >>> The problem I have is that I have firstly configured the cluster with >>> CLVM, but with this I can't create snapshots of my volumes,

Re: [Linux-HA] clean shutdown procedure?

2013-04-23 Thread Andrew Beekhof
On 23/04/2013, at 4:56 PM, Ferenc Wagner wrote: > Dejan Muhamedagic writes: > >> On Fri, Apr 19, 2013 at 08:01:04AM -0600, Greg Woods wrote: >> >>> Is there a recommended method for taking a cluster out of service >>> cleanly? >> >> If you don't want you resources to failover, just stop them

Re: [Linux-HA] clean shutdown procedure?

2013-04-22 Thread Andrew Beekhof
On 23/04/2013, at 1:50 AM, Greg Woods wrote: > On Mon, 2013-04-22 at 10:12 +1000, Andrew Beekhof wrote: >> On Saturday, April 20, 2013, Greg Woods wrote: >> Often one of the >>> nodes gets stuck at "Stopping HA Services" >> >> >> That mea

Re: [Linux-HA] clean shutdown procedure?

2013-04-21 Thread Andrew Beekhof
On Saturday, April 20, 2013, Greg Woods wrote: > I've got two-node clusters running Heartbeat (3.0.3-2.3.el5) and > Pacemaker (1.0.9.1-1.15.el5) from the clusterlabs repo on CentOS 5.9 > (yes, I know my Pacemaker is a little old, but I don't want to upgrade > unless there is some reason to believe

Re: [Linux-HA] Looking parameters for stop the node if service fails

2013-04-17 Thread Andrew Beekhof
On 18/04/2013, at 2:13 AM, Ahmed Munir wrote: > Date: Wed, 17 Apr 2013 15:43:22 +1000 > >> From: Andrew Beekhof >> Subject: Re: [Linux-HA] Looking parameters for stop the node if >>service fails >> To: General Linux-HA mailing list >> Message-ID

Re: [Linux-HA] Looking parameters for stop the node if service fails

2013-04-16 Thread Andrew Beekhof
On 17/04/2013, at 5:28 AM, Ahmed Munir wrote: > Hi, > > I configured Linux HA for Asterisk service where I'm using asterisk RA from > link; > https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/asterisk and > it is working fine. > > As per current configuration when I stop hear

Re: [Linux-HA] Clusterlabs repo broken? Can't install pacemaker in RH5

2013-04-16 Thread Andrew Beekhof
On 17/04/2013, at 3:21 AM, Tomàs Núnez wrote: > Hi > > I'm getting errors when I try to download and install pacemaker and > corosync in my RedHat 5. I'm doing the same I've done a dozen times (and > the same I find in http://clusterlabs.org/rpm-next/): > > # wget -O /etc/yum.repos.d/pacemaker

Re: [Linux-HA] Migrating from heartbeat Fedora 17 to Fedora 18 pacemaker

2013-04-15 Thread Andrew Beekhof
On 16/04/2013, at 12:05 AM, Guilsson Guilsson wrote: > Is there a way to continue using my cfg files in Fedora 18 pacemaker ? No. Sorry. > If not, Is there a straightforward (and simple) conversion between my cfg > files to new format ? There was one, but it's honestly much simpler to start fr

Re: [Linux-HA] How to

2013-04-15 Thread Andrew Beekhof
On 16/04/2013, at 1:11 AM, Moullé Alain wrote: > Hi, > > I wonder if there is documentation somewhere to know how to exploit such > file for example : /var/lib/pengine/pe-input-890 from the original > zipped file : > /var/lib/pengine/pe-input-890.bz2 Pass it to crm_simulate (-x), you'll be ab
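Andrew's pointer is to feed the file to `crm_simulate` with `-x`. A minimal Python sketch of that workflow, which decompresses the `.bz2` copy first so it does not depend on whether the installed `crm_simulate` reads compressed input itself. The paths and helper names are illustrative, and any options beyond `-x` (see `crm_simulate --help`) are left to the reader.

```python
import bz2
import shutil
import subprocess
from pathlib import Path
from typing import List


def decompress_pe_input(src: str, dest_dir: str) -> Path:
    """Decompress e.g. pe-input-890.bz2 into dest_dir; returns the path of the plain file."""
    src_path = Path(src)
    out_path = Path(dest_dir) / src_path.with_suffix("").name  # strip the .bz2 suffix
    with bz2.open(src_path, "rb") as fin, open(out_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return out_path


def crm_simulate_cmd(xml_file: Path) -> List[str]:
    """Command line from the thread: crm_simulate -x <file>."""
    return ["crm_simulate", "-x", str(xml_file)]


def replay(src: str, dest_dir: str) -> subprocess.CompletedProcess:
    """Decompress the saved policy engine input and run crm_simulate on it."""
    xml_file = decompress_pe_input(src, dest_dir)
    return subprocess.run(crm_simulate_cmd(xml_file), capture_output=True, text=True, check=False)


if __name__ == "__main__":
    if shutil.which("crm_simulate"):  # only attempt the replay where Pacemaker is installed
        result = replay("/var/lib/pengine/pe-input-890.bz2", "/tmp")
        print(result.stdout or result.stderr)
```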

Re: [Linux-HA] Failed to sign on to the LRM error on Corosync Startup

2013-04-15 Thread Andrew Beekhof
ces on each node: > service cman start > service pacemaker start > also ensure they restart: > chkconfig --add cman > chkconfig --add pacemaker > > > Best of luck, > Jimmy. > > > > > On 12 Apr 2013, at 02:11,

Re: [Linux-HA] Behaviour of fence/stonith device fence_imm

2013-04-14 Thread Andrew Beekhof
Marek: Could you possibly comment on this please? On 13/04/2013, at 12:21 AM, Andreas Mock wrote: > Hi all, > > just played with the fence/stonith device fence_imm. > (as part of pacemaker on RHEL6.x and clones) > > It is configured to use the action 'reboot'. > This action seems to cause a g

Re: [Linux-HA] Failed to sign on to the LRM error on Corosync Startup

2013-04-11 Thread Andrew Beekhof
emaker-1.2" >> crm_feature_set="3.0.6" update-origin="node03" update-client="crmd" >> cib-last-written="Tue Apr 9 06:48:33 2013" have-quorum="1" > >> 06:59:20 [14932]node03 cib:debug: readCibXmlFile: [on

Re: [Linux-HA] Antw: Re: Q: "crmd: [13080]: ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported" anybody else?

2013-04-11 Thread Andrew Beekhof
On 11/04/2013, at 7:57 PM, Ulrich Windl wrote: >>>> Andrew Beekhof schrieb am 11.04.2013 um 01:05 in >>>> Nachricht > <0550d2cf-e56d-4693-97cf-43c46df64...@beekhof.net>: > >> On 10/04/2013, at 11:54 PM, Ulrich Windl >> wrote: >>

Re: [Linux-HA] Q: "crmd: [13080]: ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported" anybody else?

2013-04-10 Thread Andrew Beekhof
On 10/04/2013, at 11:54 PM, Ulrich Windl wrote: > Hi! > > I had a situation when one node was periodically fenced when there was a busy > network. The node bing fenced tried to restart crmd after some problem, and > shortly after rejoining the cluster, it was fenced. The message "Apr 5 14

Re: [Linux-HA] Failed to sign on to the LRM error on Corosync Startup

2013-04-08 Thread Andrew Beekhof
lrmd > root 3052 0.0 0.0 76464 2528 ? S 07:10 0:00 > /usr/lib64/heartbeat/lrmd > > > Not sure why this is the case? Appreciate any help.. > Have you perhaps specified "ver: 0" for the pacemaker plugin and run "service pacemaker start" ? > Che

Re: [Linux-HA] Failed actions

2013-04-07 Thread Andrew Beekhof
On 23/03/2013, at 9:09 AM, Thomas Glanzmann wrote: > Hello, > I have an openais installation on centos which has logged failed > actions, but the services appear to be 'started'. As I know > heartbeat/pacemaker > if an action fails the service should not be started. Not really. In this case, i

Re: [Linux-HA] Failed to sign on to the LRM error on Corosync Startup

2013-04-07 Thread Andrew Beekhof
This doesn't look promising: lrmd: [4939]: info: G_main_add_SignalHandler: Added signal handler for signal 15 lrmd: [4946]: info: Signal sent to pid=4939, waiting for process to exit lrmd: [4939]: info: G_main_add_SignalHandler: Added signal handler for signal 17 lrmd: [4939]: info: enabling cored

Re: [Linux-HA] Getting Unknown Error for HA + Asterisk

2013-04-07 Thread Andrew Beekhof
On 04/04/2013, at 2:41 AM, Ahmed Munir wrote: > Hi, > > From: David Vossel >> Subject: Re: [Linux-HA] Getting Unknown Error for HA + Asterisk >> To: General Linux-HA mailing list >> Message-ID: <1360103377.672633.1364939425712.javamail.r...@redhat.com> >> Content-Type: text/plain; charset=utf

Re: [Linux-HA] Antw: Re: manage/umanage

2013-03-27 Thread Andrew Beekhof
On Wed, Mar 27, 2013 at 6:30 PM, Moullé Alain wrote: > Hi > Thanks but I never asked "to run monitoring on an unmanaged resource " > ... ? ! > I ask for the opposite : a way to set one resource in a state near to > "umanage", > meaning "umanaged and wo monitoring", and wo to be forced to set all t

Re: [Linux-HA] manage/umanage

2013-03-26 Thread Andrew Beekhof
On Wed, Mar 27, 2013 at 9:13 AM, Arnold Krille wrote: > Hi, > > On Mon, 25 Mar 2013 16:25:54 +0100 Moullé Alain wrote: >> I've tested two things : >> >> 1/ if we set maintenance-mode=true : >> >> all the configured ressources become 'unmanaged' , as displayed >> with crm_mon >> ok start

Re: [Linux-HA] Many Resources Dependent on One Resource Group

2013-03-26 Thread Andrew Beekhof
On Wed, Mar 27, 2013 at 8:23 AM, Robinson, Eric wrote: >> On Wed, Mar 27, 2013 at 6:12 AM, Robinson, Eric >> wrote: >> >> >> > In the simplest terms, we currently have resources: >> >> >> > >> >> >> > A = drbd >> >> >> > B = filesystem >> >> >> > C = cluster IP >> >> >> > D thru J = mysql instanc

Re: [Linux-HA] Many Resources Dependent on One Resource Group

2013-03-26 Thread Andrew Beekhof
On Wed, Mar 27, 2013 at 6:12 AM, Robinson, Eric wrote: >> >> > In the simplest terms, we currently have resources: >> >> > >> >> > A = drbd >> >> > B = filesystem >> >> > C = cluster IP >> >> > D thru J = mysql instances. >> >> > >> >> > Resource group G1 consists of resources B through J, in >> >

Re: [Linux-HA] Problem with migration of OpenVZ CTs

2013-03-26 Thread Andrew Beekhof
On Fri, Mar 22, 2013 at 7:13 PM, Roman Haefeli wrote: > Hi, > > I encountered a problem when performing a live migration of some OpenVZ > CTs. Altough the migration didn't trigger any messages in 'crm_mon' and > was initially performed without any troubles, the resource was restarted > on the targ

Re: [Linux-HA] Many Resources Dependent on One Resource Group

2013-03-26 Thread Andrew Beekhof
On Wed, Mar 27, 2013 at 2:38 AM, Robinson, Eric wrote: >> > In the simplest terms, we currently have resources: >> > >> > A = drbd >> > B = filesystem >> > C = cluster IP >> > D thru J = mysql instances. >> > >> > Resource group G1 consists of resources B through J, in >> that order, and is depend

Re: [Linux-HA] Problem promoting Slave to Master

2013-03-17 Thread Andrew Beekhof
On Thu, Mar 14, 2013 at 9:47 PM, Fredrik Hudner wrote: > Hi all, > I have a problem after I removed a node with the force command from my crm > config. > Originally I had 2 nodes running HA cluster (corosync 1.4.1-7.el6, pacemaker > 1.1.7-6.el6) > > Then I wanted to add a third node acting as qu

Re: [Linux-HA] Problem promoting Slave to Master

2013-03-17 Thread Andrew Beekhof
On Fri, Mar 15, 2013 at 8:49 PM, emmanuel segura wrote: > Hello Fedrik > > Why you have a clone of cl_exportfs_root and you have ext4 filesystem, and > i think this order is not correct > > order o_drbd_before_nfs inf: ms_drbd_nfs:promote g_nfs:start > order o_root_before_nfs inf: cl_exportfs_root
