Re: [Linux-HA] Antw: What about start-delay attribute status ?

2011-12-06 Thread Andrew Beekhof
On Tue, Nov 29, 2011 at 12:40 AM, Dejan Muhamedagic deja...@fastmail.fm wrote: On Mon, Nov 28, 2011 at 10:36:06AM +1100, Andrew Beekhof wrote: On Thu, Nov 24, 2011 at 8:52 PM, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Wed, Nov 23, 2011 at 08:52:43AM +1100, Andrew Beekhof wrote

Re: [Linux-HA] Antw: Re: Q: unmanaged MD-RAID auto-recovery

2011-11-28 Thread Andrew Beekhof
On Mon, Nov 28, 2011 at 7:16 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Andrew Beekhof and...@beekhof.net schrieb am 28.11.2011 um 00:26 in Nachricht CAEDLWG0LxjrvRd0mOQEpe0NrY+-X=pslkxrn0lhpceady6q...@mail.gmail.com: On Fri, Nov 25, 2011 at 11:54 PM, Florian Haas flor

Re: [Linux-HA] is it good to create order constraint for sbd resource

2011-11-28 Thread Andrew Beekhof
In general, no. Pacemaker handles this internally when necessary. On Tue, Nov 29, 2011 at 4:54 AM, Muhammad Sharfuddin m.sharfud...@nds.com.pk wrote: is it good/required to create order constraint for sbd resource I am using following fencing resource: primitive sbd_stonith

Re: [Linux-HA] Antw: Re: Q: unmanaged MD-RAID auto-recovery

2011-11-28 Thread Andrew Beekhof
On Tue, Nov 29, 2011 at 7:54 AM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: On 11/28/2011 02:37 PM, Andrew Beekhof wrote: On Mon, Nov 28, 2011 at 7:16 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: And therefore you need to monitor the _unmanaged_ resource? Strange. Now

Re: [Linux-HA] Pacemaker : how to modify configuration ?

2011-11-28 Thread Andrew Beekhof
You could probably do something with cibadmin, grep and sed. On Tue, Nov 29, 2011 at 1:04 AM, alain.mou...@bull.net wrote: Hi sorry but I forgot if there is another way than crm configure edit to modify all the value of on-fail= for all resources in the configuration ? Thanks Alain
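A minimal sketch of the cibadmin/sed approach suggested in the reply above, not a tested procedure. The function names rewrite_on_fail and apply_on_fail are hypothetical, and the sketch assumes that only on-fail attributes already present in the resources section should change. Try it against a shadow CIB or a test cluster first.

```shell
# Rewrite every existing on-fail="..." attribute in a CIB XML stream
# (stdin -> stdout). $1 is the new value, e.g. restart, block, or fence.
# Hypothetical helper; it does not add on-fail where none is set.
rewrite_on_fail() {
    sed -E "s/on-fail=\"[^\"]*\"/on-fail=\"$1\"/g"
}

# Hypothetical wrapper: dump the resources section of the live CIB,
# rewrite it, and load it back with cibadmin.
apply_on_fail() {
    orig=$(mktemp) || return 1
    new=$(mktemp) || return 1
    cibadmin -Q -o resources > "$orig" || return 1
    rewrite_on_fail "$1" < "$orig" > "$new" || return 1
    grep -c 'on-fail=' "$new"          # number of lines carrying on-fail
    cibadmin -R -o resources -x "$new"
    rc=$?
    rm -f "$orig" "$new"
    return "$rc"
}
```

Calling `apply_on_fail restart` would then set every existing on-fail to restart. The pure-text helper is kept separate from the cibadmin calls so the substitution can be checked on a saved copy of the CIB before anything is written back.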

Re: [Linux-HA] Antw: Re: Q: unmanaged MD-RAID auto-recovery

2011-11-27 Thread Andrew Beekhof
On Fri, Nov 25, 2011 at 11:54 PM, Florian Haas flor...@hastexo.com wrote: On 11/25/11 13:29, Lars Ellenberg wrote: From the log snippet it's not entirely clear whether that's a recurring monitor (interval == whatever you configured, or 20 if default), or a probe (interval == 0). A recurring

Re: [Linux-HA] Antw: What about start-delay attribute status ?

2011-11-27 Thread Andrew Beekhof
export PCMK_trace_functions=function1,function2,function3 This results in /all/ messages from those three functions being displayed. On Thu, Nov 24, 2011 at 10:12 PM, alain.mou...@bull.net wrote: Hi Sorry Andrew, but I don't understand your answer ... ? Regards Alain

Re: [Linux-HA] Antw: What about start-delay attribute status ?

2011-11-27 Thread Andrew Beekhof
On Thu, Nov 24, 2011 at 8:52 PM, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Wed, Nov 23, 2011 at 08:52:43AM +1100, Andrew Beekhof wrote: On Wed, Nov 23, 2011 at 4:05 AM, Dejan Muhamedagic deja...@fastmail.fm wrote: It could be that the init script on your platform doesn't

Re: [Linux-HA] The active trap of the SNMP is delayed.

2011-11-27 Thread Andrew Beekhof
On Thu, Nov 24, 2011 at 7:50 PM, Gao,Yan y...@suse.com wrote: Hi Hideo, On 11/24/11 15:48, renayama19661...@ybb.ne.jp wrote: Hi Yan, About this matter, were you selected? Best Regards, Hideo Yamauchi. --- On Wed, 2011/9/21, Gao,Yan y...@novell.com wrote: On 09/19/11 12:19,

Re: [Linux-HA] Stonith SBD not fencing nodes

2011-11-23 Thread Andrew Beekhof
On Thu, Nov 24, 2011 at 1:42 AM, Hal Martin hal.mar...@gmail.com wrote: After much Googling and reading past linux-ha discussions it seems like this could be an issue with my configuration. However, I can't find any issues with my configuration after running: # stonith -t external/sbd

Re: [Linux-HA] Stonith SBD not fencing nodes

2011-11-23 Thread Andrew Beekhof
On Thu, Nov 24, 2011 at 1:29 PM, Lars Marowsky-Bree l...@suse.com wrote: On 2011-11-24T11:14:05, Andrew Beekhof and...@beekhof.net wrote: Relevant portions of crm config: primitive stonith-sbd stonith:external/sbd \        meta is-managed=true target-role=Started Looks like you forgot

Re: [Linux-HA] Error running corosync

2011-11-23 Thread Andrew Beekhof
On Mon, Nov 21, 2011 at 10:54 AM, Nick Khamis sym...@gmail.com wrote: Hello Andrew, Thank you so much for your response. I did manage to get an active/active cluster working using cman+pacemaker. Everything works fine except for the occasional error from fenced, and a kernel crash from

Re: [Linux-HA] The master server of HA system suddently roll over

2011-11-22 Thread Andrew Beekhof
On Wed, Nov 23, 2011 at 8:31 AM, Andreas Kurz andr...@hastexo.com wrote: On 11/22/2011 01:27 AM, tyo...@globalchoice.us wrote: Dear sirs:     This is Yang. I set up 2 database server using heartbeat (heartbeat-2.1.3-3.el5.centos.rpm). They are running for over 40 days /me falls over

Re: [Linux-HA] Antw: What about start-delay attribute status ?

2011-11-22 Thread Andrew Beekhof
On Wed, Nov 23, 2011 at 4:05 AM, Dejan Muhamedagic deja...@fastmail.fm wrote: On Tue, Nov 22, 2011 at 04:44:50PM +0100, alain.mou...@bull.net wrote: Hi again, that's strange because I did tests around this parameter LRMD_MAX_CHILDREN, with 24 Dummy resources, therefore resources which do

Re: [Linux-HA] Antw: What about start-delay attribute status ?

2011-11-22 Thread Andrew Beekhof
On Tue, Nov 22, 2011 at 11:36 PM, alain.mou...@bull.net wrote: Hi, on RHEL6, I have :  cat /etc/sysconfig/pacemaker # Variables for running child daemons under valgrind and/or checking for memory problems #export G_SLICE=always-malloc #export MALLOC_PERTURB_=221 # or 0 #export

Re: [Linux-HA] Question about groups

2011-11-21 Thread Andrew Beekhof
On Mon, Nov 21, 2011 at 6:49 PM, Florian Haas flor...@hastexo.com wrote: On 11/21/11 08:35, alain.mou...@bull.net wrote: Hi It seems that my last email a week ago has been lost, don't know why ... so : does migration-threshold parameter in the meta of group is allowed/efficient ? I've

Re: [Linux-HA] Error running corosync

2011-11-20 Thread Andrew Beekhof
On Sat, Nov 19, 2011 at 1:02 AM, Nick Khamis sym...@gmail.com wrote: Hello Andrew, Thank you so much for your response. My concern was elimination as much of cman as possible, Then don't use it at all. since the goal was to run pacemaker on top of corosync/openais however, from

Re: [Linux-ha-dev] VirtualDomain issue

2011-11-17 Thread Andrew Beekhof
On Mon, Nov 14, 2011 at 9:58 PM, Dejan Muhamedagic de...@suse.de wrote: Hi, On Thu, Jun 23, 2011 at 07:51:48AM +0200, Dominik Klein wrote: Hi code snippet from http://hg.linux-ha.org/agents/raw-file/7a11934b142d/heartbeat/VirtualDomain (which I believe is the current version)

Re: [Linux-HA] [ha-wg-technical] Problems in XML CIB

2011-11-17 Thread Andrew Beekhof
On Thu, Nov 17, 2011 at 8:18 PM, Lars Marowsky-Bree l...@suse.com wrote: On 2011-11-17T09:09:29, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hello, I tried to understand the structure of the XML CIB a little bit better, and I found at least two major problems: That's something

Re: [Linux-HA] Error running corosync

2011-11-17 Thread Andrew Beekhof
: Cluster connection established.  Local node id: 1 1321287982 setup_stack@174: Added Pacemaker as client 1 with fd -1 Thanks in Advance, Nick. On Sun, Nov 13, 2011 at 7:44 PM, Andrew Beekhof and...@beekhof.net wrote: On Mon, Nov 14, 2011 at 11:12 AM, Nick Khamis sym...@gmail.com wrote: Hello

Re: [Linux-HA] AP7920 stonith-ng problems (pcmk/cman)

2011-11-17 Thread Andrew Beekhof
On Tue, Nov 15, 2011 at 8:36 AM, dyna h...@dyna.nu wrote: Hi, I have a 2 node cluster using cman and pacemaker. It was working great with meatware, so i decided to get myself an AP7920 hardware fencing device, however i can't seem to get it working from pacemaker using apcmastersnmp. (Both

Re: [Linux-HA] Error running corosync

2011-11-13 Thread Andrew Beekhof
On Sat, Nov 12, 2011 at 12:06 AM, Nick Khamis sym...@gmail.com wrote: Hello Andrew, I do apologize for this, and really appreciate how far I have got into this project thanks to everyone's help. Just as a quick summary: the patch that you suggested did in fact fix the following (ais.c:346):

Re: [Linux-HA] rhel 6.1 gfs2 pacemaker

2011-11-13 Thread Andrew Beekhof
On Tue, Nov 8, 2011 at 3:02 AM, Eric Mueller mcss_...@yahoo.com wrote: I have followed the clusters from scratch tutorial, however i have some discrepancies that i want to make sure are correct. everything seems to be functioning properly such as gfs2 and dlm. although, i am unable to clone

Re: [Linux-HA] Error running corosync

2011-11-13 Thread Andrew Beekhof
ocfs2_controld? If you're running cman, use the cman one Thanks for Everything! Nick. If it's cman On Sun, Nov 13, 2011 at 6:49 PM, Andrew Beekhof and...@beekhof.net wrote: On Sat, Nov 12, 2011 at 12:06 AM, Nick Khamis sym...@gmail.com wrote: Hello Andrew, I do apologize

Re: [Linux-HA] Documentation issue: broken html anchors

2011-11-13 Thread Andrew Beekhof
Looks like a Publican bug unfortunately (that's what we use to build the docs from the DocBook sources) On Thu, Oct 20, 2011 at 8:24 PM, Florian Crouzat gen...@floriancrouzat.net wrote: Hi, There is a small issue in the HTML documentation for Pacemaker explained. I only looked in the

Re: [Linux-HA] Error running corosync

2011-11-10 Thread Andrew Beekhof
On Tue, Nov 8, 2011 at 1:08 PM, Tim Serong tser...@suse.com wrote: On 11/07/2011 11:34 PM, Nick Khamis wrote: Hello Everyone, After being unsuccessful trying to get cman+pacemaker working, I decided to try the latest committed version of pacemaker git clone

Re: [Linux-HA] [Pacemaker] pcmk + corosync + cman for dlm support?

2011-11-03 Thread Andrew Beekhof
On Thu, Nov 3, 2011 at 4:32 PM, Tim Serong tser...@suse.com wrote: On 11/03/2011 04:11 PM, Vladislav Bogdanov wrote: 02.11.2011 16:36, Nick Khamis wrote: Vladislav, Thank you so much for your response. Just to make sure, all I need is to: * Apply the three patches to cman. Found here

Re: [Linux-ha-dev] attrd and repeated changes

2011-11-02 Thread Andrew Beekhof
On Sat, Oct 22, 2011 at 7:14 AM, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Thu, Oct 20, 2011 at 08:48:36AM -0600, Alan Robertson wrote: On 10/20/2011 03:41 AM, Philipp Marek wrote: Hello, when constantly sending new data via attrd the changes are never used. Example:    

Re: [Linux-HA] [corosync] crm not connecting with cman/corosync instance

2011-11-02 Thread Andrew Beekhof
On Wed, Nov 2, 2011 at 6:38 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 01.11.2011 11:01, Andrew Beekhof wrote: On Tue, Nov 1, 2011 at 12:02 PM, Nick Khamis sym...@gmail.com wrote: I included /etc/corosync/service.d/pcmk: service {        # Load the Pacemaker Cluster Resource Manager

Re: [Linux-HA] [Pacemaker] pcmk + corosync + cman for dlm support?

2011-11-02 Thread Andrew Beekhof
? Maybe fedora 17 Thanks in Advance, Nick. On Mon, Oct 31, 2011 at 4:52 AM, Andrew Beekhof and...@beekhof.net wrote: On Sat, Oct 29, 2011 at 3:09 AM, Nick Khamis sym...@gmail.com wrote: Hello Gents, Thank you so much for your response. That being said, what are the plans once the next

Re: [Linux-HA] [corosync] crm not connecting with cman/corosync instance

2011-11-02 Thread Andrew Beekhof
On Thu, Nov 3, 2011 at 12:18 AM, Nick Khamis sym...@gmail.com wrote: I start the cman init script (which also starts corosync), and the start pcmk. Is there anyway to get cman to pcmk as well? I assume you forgot the word "start" in there. The answer, though, is no

Re: [Linux-HA] [corosync] Trouble with active/active

2011-11-02 Thread Andrew Beekhof
Please avoid starting a new thread every 5s. It's way too much noise. Pick one software stack and concentrate on getting that working rather than attempting them all at once. On Tue, Nov 1, 2011 at 8:15 AM, Nick Khamis sym...@gmail.com wrote: Hello Everyone, I have the following built from

Re: [Linux-HA] Does ANYTHING Work on RHEL6?

2011-11-01 Thread Andrew Beekhof
On Tue, Nov 1, 2011 at 9:15 AM, Robinson, Eric eric.robin...@psmnv.com wrote: Florian's suggestion sounds like a good start for you.  After that, try firewalls and selinux. Well, sheesh, it was selinux. Write that one down, folks. Selinux causes error 6 problem when initializing the ring.

Re: [Linux-HA] [corosync] crm not connecting with cman/corosync instance

2011-11-01 Thread Andrew Beekhof
not support the 'cman' cluster infrastructure.  Terminating. pacemakerd -$ Pacemaker 1.1.6 Written by Andrew Beekhof Thanks in Advance, Nick. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also

Re: [Linux-HA] Could not get the ring status, the error is: 6

2011-11-01 Thread Andrew Beekhof
I think we pointed the finger at selinux in another thread On Mon, Oct 31, 2011 at 10:48 AM, Robinson, Eric eric.robin...@psmnv.com wrote: I just installed and configured corosync-1.2.3-21.el6_0.1.x86_64 on RHEL6. At startup, the corosync log appears to be complete except for the line, A

Re: [Linux-HA] partial recovery - odd status - help

2011-11-01 Thread Andrew Beekhof
On Sun, Oct 30, 2011 at 11:35 AM, Miles Fidelman mfidel...@meetinghouse.net wrote: Hi, I have a basic 2-node HA cluster - Xen over DRBD, pacemaker, etc. I've seen this happen several times: - reboot a node: VM goes down on one node, fails over to the other node cleanly - restart the

Re: [Linux-HA] PCMK + OCFS2

2011-10-31 Thread Andrew Beekhof
On Thu, Oct 27, 2011 at 12:19 AM, alain.mou...@bull.net wrote: And last thing I forgot to write : as far as I remind, the problem with controld.pcml/ocfs2.pmck and Pacemaker was not on the pacemaker side but in the controld.pcml/ocfs2.pmck stack . You shouldn't need the .pcmk controld

Re: [Linux-HA] [Pacemaker] pcmk + corosync + cman for dlm support?

2011-10-31 Thread Andrew Beekhof
On Sat, Oct 29, 2011 at 3:09 AM, Nick Khamis sym...@gmail.com wrote: Hello Gents, Thank you so much for your response. That being said, what are the plans once the next release of CMAN does not include PCMK and DLM related implementation? From what I can see, libdlm will be separated from

Re: [Linux-HA] Error with pacemaker + cman

2011-10-31 Thread Andrew Beekhof
On Sat, Oct 29, 2011 at 4:57 AM, Nick Khamis sym...@gmail.com wrote: Hello Everyone, I am trying to set up an active/active using: Pacemaker 1.1.6 Cluster3.1.7 When trying to check the cluster using ccs_config_validate, I am receiving the following error:

Re: [Linux-HA] [corosync] crm not connecting with cman/corosync instance

2011-10-31 Thread Andrew Beekhof
On Sat, Oct 29, 2011 at 7:02 AM, Nick Khamis sym...@gmail.com wrote: Hello Everyone, I am trying to configure an active/active cluster. Built from source are: Pacemaker 1.1.6 Cluster3 Corosync 1.4.2 Starting corosync instead of cman, crm works, however when starting cman, I am not able

Re: [Linux-HA] Does ANYTHING Work on RHEL6?

2011-10-31 Thread Andrew Beekhof
On Mon, Oct 31, 2011 at 9:56 PM, Robinson, Eric eric.robin...@psmnv.com wrote: I can't get a cluster up on RHEL6. First I tried pacemaker+corosync, but corosync complains...    Could not get the ring status, the error is: 6 ..and I cannot connect to the cluster. So then I tried

Re: [Linux-HA] pcmk + corosync + cman for dlm support?

2011-10-28 Thread Andrew Beekhof
Does http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch08.html help answer your question? On Fri, Oct 28, 2011 at 12:04 PM, Nick Khamis sym...@gmail.com wrote: Hello Everyone, I just want to make sure this is still the case before I go through with it. I am

Re: [Linux-ha-dev] How to use reload action of RA agent?

2011-10-11 Thread Andrew Beekhof
On Fri, Oct 7, 2011 at 2:42 PM, Serge Dubrouski serge...@gmail.com wrote: Hello - How is one supposed to use reload action of RA agent if it's supported by RA? When I try to set up an order like this: order Reload_After_Start +inf: res1:start res2:reload neither crm nor cibadmin allow me to

Re: [Linux-HA] Two node cluster monitoring configuration to ignore failing on restart

2011-10-11 Thread Andrew Beekhof
On Thu, Sep 29, 2011 at 7:48 PM, Florian Crouzat gen...@floriancrouzat.net wrote: Hi, I'm running a two node cluster where all the resources have to run on the same node and failed resources must not trigger anything. Why have a cluster then? I'm having trouble configuring the following

Re: [Linux-HA] Q: lost vote while network seems up

2011-10-11 Thread Andrew Beekhof
On Thu, Sep 29, 2011 at 6:09 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hello! I'm examining a case where both nodes of a two node cluster were fenced at the same time. The cluster is running SLES11 SP1 with a corosync 1.4.1 Update to make the rrp stable. I found strange

Re: [Linux-HA] [Pacemaker] rpm repo down?

2011-10-10 Thread Andrew Beekhof
It should be back now. Someone, or some people, were trying to download every rpm we've ever built which was both wasteful and creating enough load to cripple the server. I've put it back and enabled mod_bw and although the load is now reasonable it's not clear if that's because the mirroring has

[Linux-HA] New Pacemaker Issue Tracker

2011-10-10 Thread Andrew Beekhof
Since it's clearly not acceptable for our issue tracker to be offline for months at a time, it is time to replace the Bugzilla instance hosted by the Linux Foundation with something else. Some candidates included the github issue tracker (no attachments) and the Red Hat bugzilla (BZ components

Re: [Linux-ha-dev] [Linux-HA] [ha-wg] CFP: HA Mini-Conference in Prague on Oct 25th

2011-10-09 Thread Andrew Beekhof
On Sat, Oct 8, 2011 at 6:03 AM, Digimer li...@alteeve.com wrote: On 10/07/2011 02:58 PM, Florian Haas wrote: Vienna before the early afternoon of Saturday the 29th, so if anyone has plans to do something interesting that Saturday morning I'd be more than happy to join. Cheers, Florian I'm

Re: [Linux-HA] [Linux-ha-dev] [ha-wg] CFP: HA Mini-Conference in Prague on Oct 25th

2011-10-09 Thread Andrew Beekhof
On Sat, Oct 8, 2011 at 6:03 AM, Digimer li...@alteeve.com wrote: On 10/07/2011 02:58 PM, Florian Haas wrote: Vienna before the early afternoon of Saturday the 29th, so if anyone has plans to do something interesting that Saturday morning I'd be more than happy to join. Cheers, Florian I'm

Re: [Linux-ha-dev] [Linux-HA] [ha-wg] CFP: HA Mini-Conference in Prague on Oct 25th

2011-10-06 Thread Andrew Beekhof
On Thu, Oct 6, 2011 at 1:53 AM, Lars Marowsky-Bree l...@suse.com wrote: On 2011-10-03T11:10:13, Andrew Beekhof and...@beekhof.net wrote: Based on Boston last year, I imagine the conversations will last right up until Lars starts presenting his talk on Friday afternoon. People came and went

Re: [Linux-HA] [Linux-ha-dev] [ha-wg] CFP: HA Mini-Conference in Prague on Oct 25th

2011-10-06 Thread Andrew Beekhof
On Thu, Oct 6, 2011 at 1:53 AM, Lars Marowsky-Bree l...@suse.com wrote: On 2011-10-03T11:10:13, Andrew Beekhof and...@beekhof.net wrote: Based on Boston last year, I imagine the conversations will last right up until Lars starts presenting his talk on Friday afternoon. People came and went

Re: [Linux-HA] CentOS 6: pacemaker and corosync wrong stop order (Patch)

2011-10-03 Thread Andrew Beekhof
On Thu, Sep 29, 2011 at 6:46 PM, Gianluca Cecchi gianluca.cec...@gmail.com wrote: On Thu, Sep 29, 2011 at 9:41 AM, Florian Crouzat wrote: Andrew Beekhof wrote on 2011-09-29: On Wed, Sep 28, 2011 at 12:36 AM, Florian CROUZAT gen...@floriancrouzat.net wrote: Hi, I cannot 'halt' my CentOS 6

Re: [Linux-ha-dev] [ha-wg] CFP: HA Mini-Conference in Prague on Oct 25th

2011-10-02 Thread Andrew Beekhof
On Sat, Oct 1, 2011 at 12:55 AM, Digimer li...@alteeve.com wrote: On 09/27/2011 07:58 AM, Lars Marowsky-Bree wrote: Hi all, it turns out that there was zero feedback about people wanting to present, only some about travel budget being too tight to come. So we had some discussions about

Re: [Linux-HA] [Linux-ha-dev] [ha-wg] CFP: HA Mini-Conference in Prague on Oct 25th

2011-10-02 Thread Andrew Beekhof
On Sat, Oct 1, 2011 at 12:55 AM, Digimer li...@alteeve.com wrote: On 09/27/2011 07:58 AM, Lars Marowsky-Bree wrote: Hi all, it turns out that there was zero feedback about people wanting to present, only some about travel budget being too tight to come. So we had some discussions about

Re: [Linux-HA] Antwort: Re: Escaping Depenencies in Resource Groups

2011-10-02 Thread Andrew Beekhof
On Sat, Oct 1, 2011 at 5:22 AM, Robinson, Eric eric.robin...@psmnv.com wrote: Thanks for your thoughts. However, these are production servers so I have to be quite certain of the approach before I start. I don't really have an opportunity to try out configs. Hopefully someone will chime in

Re: [Linux-HA] wipe out cib on 2 node that is not connected

2011-09-29 Thread Andrew Beekhof
Could you send me a crm_report of this situation? It will have everything we need to investigate. On Thu, Sep 29, 2011 at 7:32 AM, Nick Khamis sym...@gmail.com wrote: Hello Everyone, This is a fresh install for a two node (astdrbd1, astdrbd2), that I rushed and ended up with:

Re: [Linux-HA] Cluster with 4 nodes 2 services

2011-09-29 Thread Andrew Beekhof
On Thu, Sep 29, 2011 at 12:56 PM, Nick Khamis sym...@gmail.com wrote: Hello Everyone, We have the following: * astdrbd1 astdrbd2 - Run the asterisk servers node astdrbd1 \       attributes standby=off node astdrbd2 \       attributes standby=off primitive failover-ip

Re: [Linux-HA] CentOS 6: pacemaker and corosync wrong stop order (Patch)

2011-09-28 Thread Andrew Beekhof
On Wed, Sep 28, 2011 at 12:36 AM, Florian CROUZAT gen...@floriancrouzat.net wrote: Hi, I cannot 'halt' my CentOS 6 servers while running corosync+pacemaker. I believe the runlevels used to stop corosync and pacemaker are not in the correct order and create the infinite Waiting for corosync

Re: [Linux-HA] Install problems with ha resource

2011-09-28 Thread Andrew Beekhof
On Thu, Sep 29, 2011 at 12:07 AM, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi all, On Wed, Sep 28, 2011 at 01:18:27PM +0200, Andreas Mock wrote: Hi Fabio, just my thoughts from the perspective of a user: a) If I install from source choosing a non-standard installation environment, I

Re: [Linux-HA] Understanding why a host fence (was: Resource fail and node fence)

2011-09-26 Thread Andrew Beekhof
On Tue, Sep 20, 2011 at 5:59 PM, RaSca ra...@miamammausalinux.org wrote: Hi all, I start a new thread because I've got more debug details to analyze my situation, and starting from the beginning might be better. My environment is composed by two machine connected to a network and one to each

Re: [Linux-HA] Problem with creating constraints

2011-09-25 Thread Andrew Beekhof
On Fri, Sep 23, 2011 at 11:00 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Fri, Sep 23, 2011 at 12:39:13PM +1000, Andrew Beekhof wrote: On Wed, Sep 21, 2011 at 7:15 PM, Uwe Weiss u.we...@netz-objekte.de wrote: Hello List, I have a problem in creating a constraint. I hope

Re: [Linux-HA] Start resource depending on resource(s) outside Pacemakers scope

2011-09-25 Thread Andrew Beekhof
On Sat, Sep 24, 2011 at 12:20 AM, Uwe Weiss u.we...@netz-objekte.de wrote: Hello list, I would like to start a Resource depending on a mount point which is not handled by pacemaker. I have some resources handled by pacemaker. This resources are stored on Primary/Primary drbd drive. Due to

Re: [Linux-HA] Resource fail and node fence

2011-09-22 Thread Andrew Beekhof
On Wed, Sep 21, 2011 at 4:59 PM, RaSca ra...@miamammausalinux.org wrote: Il giorno Mar 20 Set 2011 17:54:58 CEST, Dejan Muhamedagic ha scritto: [...] And I completely agree with this, but in an environment like mine, where a single resource failure might involve all the others (with fence) is

Re: [Linux-HA] What's wrong in my configuration for GFS2 under Pacemaker ?

2011-09-22 Thread Andrew Beekhof
On Sat, Sep 17, 2011 at 12:19 AM, alain.mou...@bull.net wrote: Hi , no nothing more with crm_mon -1r but I trace in Filesystem script, in fact I see that if we configure a clone for fsGS2 (Filesystem) it seems that when you ask to start a clone resource, Pacemaker at first call the

Re: [Linux-HA] What's wrong in my configuration for GFS2 under Pacemaker ?

2011-09-22 Thread Andrew Beekhof
On Mon, Sep 19, 2011 at 5:08 PM, alain.mou...@bull.net wrote: Hi, sorry to ask that , but is there a problem with my questions messages ? No, just busy working through a large backlog because I can't see my questions anymore in the digest emails (bounce) ... for example this one ... So I

Re: [Linux-HA] What's wrong in my configuration for GFS2 under Pacemaker ?

2011-09-22 Thread Andrew Beekhof
On Tue, Sep 20, 2011 at 12:40 AM, alain.mou...@bull.net wrote: Hi Ok it was due to the parameter globally-unique which is true by default, and that 's lead to the stop of the clone on both sides, because with globally-unique=true, X:0 on node1 is the same as X:0 on node2 (but different from

Re: [Linux-HA] Invalid recurring action when trying to op start...

2011-09-22 Thread Andrew Beekhof
On Fri, Sep 23, 2011 at 11:59 AM, Nick Khamis sym...@gmail.com wrote: Hello Everyone, When trying to load the following RA configuration: node mydrbd1 \       attributes standby=off node mydrbd2 \       attributes standby=off primitive ip1 ocf:heartbeat:IPaddr2 \        params

Re: [Linux-HA] Prob with VirtualDomain RA, Res is active on two nodes

2011-09-22 Thread Andrew Beekhof
On Tue, Sep 20, 2011 at 7:37 PM, Uwe Weiss u.we...@netz-objekte.de wrote: Hello list I am using libvirt 0.94 pacemaker 1.1.5 corosync 1.3.0 kvm 0.15.0 openSuse 11.4 on a two node Cluster. The VMs are stored on glusterfs shared and replicated device.  It seems that there is no

Re: [Linux-HA] Problem with creating constraints

2011-09-22 Thread Andrew Beekhof
On Wed, Sep 21, 2011 at 7:15 PM, Uwe Weiss u.we...@netz-objekte.de wrote: Hello List, I have a problem in creating a constraint. I hope that someone could help me and give me a hint. I have three resources (A,B,C) and two cluster nodes (node0,node1). Resource A can run only on node0 and

Re: [Linux-HA] [DRBD-user] Invalid recurring action when trying to op start...

2011-09-22 Thread Andrew Beekhof
, 2011 at 10:29 PM, Andrew Beekhof and...@beekhof.net wrote: On Fri, Sep 23, 2011 at 11:59 AM, Nick Khamis sym...@gmail.com wrote: Hello Everyone, When trying to load the following RA configuration: node mydrbd1 \       attributes standby=off node mydrbd2 \       attributes standby=off

Re: [Linux-HA] two node cluster: clvm depending resources restart/stuck when failing node joins cluster

2011-09-22 Thread Andrew Beekhof
On Mon, Sep 5, 2011 at 6:38 PM, Oualid Nouri o.no...@computer-lan.de wrote: Hi to all, i have setup a drbd-based dual primary two node cluster with Pacemaker on opensuse 11.4  for testing. I have also setup drbd=controld=clvm=lvm=ocfs2 resources (all clones)   and a samba+IP resource

Re: [Linux-HA] Cluster corosync issues, crmd terminating

2011-09-22 Thread Andrew Beekhof
On Wed, Sep 21, 2011 at 3:43 AM, kevins7189 kevin.sm...@dtn.com wrote: Having an issue with my cluster testing.  I have a simple 2 node drbd/nfs/mysql cluster.  Working on configuring stonith (which is not working for me), but while testing failover scenarios, running into a issue where crmd

Re: [Linux-HA] Pacemaker : Pb on stop on a resource while the monitoring is performed

2011-09-22 Thread Andrew Beekhof
On Thu, Sep 1, 2011 at 10:00 PM, alain.mou...@bull.net wrote: Hi My release is : pacemaker-1.1.2-7 (on RHEL6) and I have checked that the patch : High: PE: Bug lf#2433 - No services should be stopped until probes finish is effectively integrated in this release. Nevertheless, it seems

Re: [Linux-HA] remove resource WITHOUT moving the other resources

2011-09-22 Thread Andrew Beekhof
On Sun, Aug 14, 2011 at 11:04 PM, Julian D. Seifert ala...@julian-seifert.de wrote: Hi, Thank you for your response, I have some follow-up questions. Now what I am looking for is a way to completely delete/remove openvzve_itv without affecting the other resources. is-managed-default=false

Re: [Linux-HA] Antw: Why 'crm resource cleanup' cannot work

2011-09-19 Thread Andrew Beekhof
On Thu, Sep 1, 2011 at 1:04 PM, robin robin@163.com wrote: Hi Gent, It seems some errors in syslog when I run crm_resource -C -r linkmon [root@master ~]# crm status Last updated: Thu Sep  1 10:58:18 2011 Stack: Heartbeat Current DC: master

Re: [Linux-HA] Antw: Re: Apache error on all nodes

2011-09-18 Thread Andrew Beekhof
On Fri, Sep 16, 2011 at 11:22 PM, Guillaume Bettayeb guillaume1...@gmail.com wrote: Hi all, I have been through my Apache configuration again and I confirm Apache works fine. I assume you're testing by running /etc/init.d/apache2 start or something similar? This is not what the cluster

Re: [Linux-HA] A little confused

2011-09-18 Thread Andrew Beekhof
On Fri, Sep 16, 2011 at 6:03 AM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: On 09/15/2011 01:33 PM, Charles Richard wrote: Hi, Just looking to see if i could get a little help understanding more about heartbeat. I've got a Drbd and Heartbeat setup on a CentOS 6 server.  When i start

Re: [Linux-HA] The active trap of the SNMP is delayed.

2011-09-18 Thread Andrew Beekhof
On Mon, Aug 22, 2011 at 5:05 PM, renayama19661...@ybb.ne.jp wrote: Hi Yan, Hi Andrew, Thank you for comment. Hi Hideo, On 08/04/11 08:13, renayama19661...@ybb.ne.jp wrote: Hi Yan, Pushed. Since we don't have a separate branch, you might need to back-port this patch to

Re: [Linux-HA] Forcing primitive_nfslock away from node

2011-09-18 Thread Andrew Beekhof
On Fri, Aug 19, 2011 at 7:31 AM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: WTH does this mean (from node2): pengine: [16069]: notice: clone_print:  Master/Slave Set: master_drbd pengine: [16069]: notice: short_print:      Masters: [ node1 ] pengine: [16069]: notice: short_print:      

Re: [Linux-HA] Forcing primitive_nfslock away from node

2011-09-18 Thread Andrew Beekhof
On Mon, Sep 19, 2011 at 1:19 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: On 9/18/2011 7:22 PM, Andrew Beekhof wrote: On Fri, Aug 19, 2011 at 7:31 AM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: ... No, it means one or more filesystem_drbd and primitive_nfslock operations failed really

Re: [Linux-HA] Q: ERROR: is_op_dup: Operation .. is a duplicate of ..

2011-08-14 Thread Andrew Beekhof
On Fri, Aug 12, 2011 at 10:13 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi, regarding these error messages: Aug  5 13:47:13 h02 pengine: [11473]: ERROR: is_op_dup: Operation prm_t11_ascs_ers-op-monitor-Slave-30 is a duplicate of prm_t11_ascs_ers-op-monitor-Master-30 Aug  

Re: [Linux-HA] Problem with kvm virtual machine and cluster

2011-08-11 Thread Andrew Beekhof
On Wed, Aug 10, 2011 at 11:15 PM, Maloja01 maloj...@arcor.de wrote: The order constraints do work as I assume, but I guess that you run into a pitfall: A clone is marked as up, if one instance in the cluster is started successfully. The order does not say, that the clone on the same node must

Re: [Linux-HA] remove resource WITHOUT moving the other resources

2011-08-11 Thread Andrew Beekhof
On Thu, Aug 11, 2011 at 5:28 AM, ala...@julian-seifert.de wrote: Hi List, I have a little problem with my 2 node pacemaker cluster. (Active/Passive Setup). vpsnode01-rz (current active node) and vpsnode01-nk (passive) It's a bunch of OpenVZ containers grouped together and colocated to where

Re: [Linux-HA] about STONITH in HA

2011-08-11 Thread Andrew Beekhof
On Thu, Aug 11, 2011 at 9:29 PM, Sam Sun sam@ericsson.com wrote: Hi All, This is Sam for Ericsson IPWorks product maintenance team. We have an urgent problem on the Linux HA solution. I am not sure if this is the right mail box, however it is very appreciated if any one can help us.

Re: [Linux-HA] Antw: Re: Q: default vs. default (e.g. exportfs)

2011-08-11 Thread Andrew Beekhof
On Thu, Aug 11, 2011 at 5:08 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Andrew Beekhof and...@beekhof.net schrieb am 11.08.2011 um 07:57 in Nachricht CAEDLWG3UfkJsYf3x9CUu45K9vdO1rce7FF9V1sooHkdp_X=x...@mail.gmail.com: On Sat, Aug 6, 2011 at 12:01 AM, Ulrich Windl ulrich.wi

Re: [Linux-HA] Renaming a running resource: to do, or not to do?

2011-08-11 Thread Andrew Beekhof
On Thu, Aug 11, 2011 at 5:37 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! Using crm shell, you cannot rename a running resource. However I managed to do it via a shadow cib: I renamed the resource in the shadow cib, then committed the shadow cib. From the XML changes, I

Re: [Linux-HA] Q: default vs. default (e.g. exportfs)

2011-08-10 Thread Andrew Beekhof
On Sat, Aug 6, 2011 at 12:01 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! I frequently see problems I don't understand: When configuring an exportfs resource using crm shell without explicitly specifying operations or timeouts, I get warnings like these: WARNING:

Re: [Linux-HA] Question about max_child_count

2011-08-10 Thread Andrew Beekhof
That would be an lrmd property, so unaffected by the pacemaker version. On Mon, Aug 8, 2011 at 8:04 PM, alain.mou...@bull.net wrote: Hi I wonder if the default value of max_child_count (4) has been increased on last Pacemaker releases ? or if it is now possible to tune it ? Thanks Alain

Re: [Linux-HA] Antw: Re: [ha-wg-technical] The mess with OCF_CHECK_LEVEL (crm aborts during commit)

2011-08-07 Thread Andrew Beekhof
On Fri, Aug 5, 2011 at 5:15 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Dejan Muhamedagic de...@suse.de schrieb am 05.08.2011 um 08:39 in Nachricht 20110805063900.GB31749@rondo.homenet: Hi, On Fri, Aug 05, 2011 at 08:23:43AM +0200, Ulrich Windl wrote: Dejan Muhamedagic

Re: [Linux-HA] Antw: Re: location and orders : Question about a behavior ...

2011-08-04 Thread Andrew Beekhof
On Thu, Aug 4, 2011 at 4:28 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Dan Frincu df.clus...@gmail.com schrieb am 03.08.2011 um 13:28 in Nachricht CADQRkwiFCEUnq-i9Dtv6AbjQz4Z_e792=3is81zv1eqdrnj...@mail.gmail.com: Hi, On Wed, Aug 3, 2011 at 2:22 PM,  alain.mou...@bull.net

Re: [Linux-ha-dev] [Pacemaker] ping RA question

2011-07-31 Thread Andrew Beekhof
Dan - any objections if I incorporate the fping parts into the ping RA? On Fri, Jul 29, 2011 at 12:47 AM, Dan Urist dur...@ucar.edu wrote: Here's my fping RA, for anyone who's interested. Note that some of the parameters are different than ping/pingd, since fping works differently. The major

Re: [Linux-HA] logged messages

2011-07-28 Thread Andrew Beekhof
On Mon, Jul 25, 2011 at 9:10 PM, Léon Keijser keij...@stone-it.com wrote: On Fri, 2011-07-22 at 17:18 +, Léon Keijser wrote: 2011-07-22T19:15:21+02:00 nfs01 attrd: [19717]: info: attrd_trigger_update: Sending flush op to all hosts for: ping (0) 2011-07-22T19:15:21+02:00 nfs01

Re: [Linux-HA] stonith with external/vcenter

2011-07-18 Thread Andrew Beekhof
On Mon, Jul 18, 2011 at 3:25 PM, lowshoe lows...@gmail.com wrote: hi guys, help for this problem is still greatly appreciated! i can give more info, logmessages or configs if needed. It might be best to contact SUSE support. They'll be able to give this a higher priority. regards,

Re: [Linux-HA] split brain problem

2011-07-18 Thread Andrew Beekhof
On Sat, Jul 16, 2011 at 7:31 PM, Willi Fehler willi.feh...@t-online.de wrote: Hi, I've installed a Pacemaker/OpenAIS/Corosync/DRBD/MySQL Cluster on CentOS6. (VirtualBox) If I start both nodes at the same time, I always get a split brain Split brain as in, corosync on the two nodes can't talk

Re: [Linux-HA] Always Get a Billion Failed Actions

2011-07-14 Thread Andrew Beekhof
On Thu, Jul 14, 2011 at 7:31 PM, Robinson, Eric eric.robin...@psmnv.com wrote: On Thu, Jun 16, 2011 at 8:38 PM, Robinson, Eric eric.robin...@psmnv.com wrote: crm_mon on my system displays a lot of failed actions, I guess because the init script for the resource is not fully lsb compliant?

Re: [Linux-ha-dev] [PATCH] pacemaker-1.1.5 : fix autotools build system

2011-07-13 Thread Andrew Beekhof
applied. thanks! On Tue, Jul 12, 2011 at 6:54 PM, Ultrabug ultra...@gentoo.org wrote: Hello mates, I would like you to consider having the attached patch committed in order to fix and improve the build system of pacemaker. We Gentoo compilation lovers have to apply this patch in order to

Re: [Linux-ha-dev] pacemaker - migrate RA, based on the state of other RA, w/o clone?

2011-07-13 Thread Andrew Beekhof
On Thu, Jul 14, 2011 at 1:51 AM, RNZ renoi...@gmail.com wrote: I make next resource agent - https://github.com/rnz/resource-agents/blob/master/heartbeat/couchdb At end of file exist next example configuration:  node vub001  node vub002  primitive couchdb-1 ocf:heartbeat:couchdb

Re: [Linux-HA] colocation of three resources

2011-07-13 Thread Andrew Beekhof
On Wed, Jul 13, 2011 at 10:00 PM, Trujillo Carmona, Antonio antonio.trujillo.s...@juntadeandalucia.es wrote: I need three resources collocated. I use: colocation res_Nagios-res_centcore +inf: res_Nagios:Started res_centstorage:Started colocation res_centcore-res_centstorage +inf:

Re: [Linux-HA] stonith-ng reboot returned 1

2011-07-07 Thread Andrew Beekhof
On Thu, Jul 7, 2011 at 5:40 PM, Lars Marowsky-Bree l...@suse.de wrote: On 2011-07-06T15:06:01, Craig Lesle craig.le...@bruden.com wrote: Interesting that st_timeout does not show 75 seconds on any try and looks rather random, like it's calculated. ... right. I hadn't noticed that before.

Re: [Linux-HA] ERROR: glib: ucast: error binding socket. Retrying: Address already in use

2011-07-07 Thread Andrew Beekhof
There is some way to tell the system not to hand out 696 for use by other daemons. It's been a long time since I did it though so I forget the details (even who is handing it out, possibly rpc). On Thu, Jul 7, 2011 at 10:16 AM, Hai Tao taoh...@hotmail.com wrote: I got this error (ERROR: glib:
