Re: [Pacemaker] Can't failover Master/Slave with group(primitive x3) setting

2011-09-29 Thread Junko IKEDA
Hi, sorry for the confusion. Pacemaker 1.0.10: OK (group resource can fail over). Pacemaker 1.0.11: NG (group resource just stops, cannot fail over). Pacemaker 1.1 (the latest hg): group resource just stops, cannot fail over. By the way, your simulation showed dummy01 restart on bl460g1n13 again, but du

Re: [Pacemaker] load balancing in a 3-node cluster

2011-09-29 Thread Mark Smith
> Try no-quorum-policy=freeze instead. I considered this option, but that puts us into a situation where if node X and Y fail, then resources from them won't be started up on Z. I would like to (if possible) avoid that -- I want one node to be able to take on everything. I realize this may be a p

Re: [Pacemaker] Configuring Pacemaker & DRBD

2011-09-29 Thread Cliff Massey
After digging around in the messages I found the following: notice: unpack_rsc_op: Operation convirt-drbd:0_monitor_0 found resource convirt-drbd:0 active on admin01 notice: unpack_rsc_op: Operation convirt-drbd:0_monitor_0 found resource convirt-drbd:0 active on admin02 admin01 pengine: [3178]:

[Pacemaker] concurrent uses of cibadmin: Signon to CIB failed: connection failed

2011-09-29 Thread Brian J. Murrell
So, in another thread there was a discussion of using cibadmin to mitigate possible concurrency issues with the crm shell. I have written a test program to test that theory, and unfortunately cibadmin also falls down under heavy concurrency, with errors such as: Signon to CIB failed: connection f

Re: [Pacemaker] Reloading a resource after a failover

2011-09-29 Thread Serge Dubrouski
Here: https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/named Let me know how it works for you. On Sep 29, 2011 8:25 AM, "Max Williams" wrote: > Yes I was using the LSB RA. Can you give me a link to the OCF RA on github? > Thanks, > Max > > From: Serge Dubrouski [mailto:serge..

[Pacemaker] 1) attrd, crmd, cib, stonithd going to 100% CPU after standby 2) monitoring bug 3) meta failure-timeout issue

2011-09-29 Thread Proskurin Kirill
Hello all. corosync-1.4.1, pacemaker-1.1.5; pacemaker runs with "ver: 1". I ran into some problems this week. I'm not sure if I should have sent 3 separate emails; sorry if so. 1) I set a node to standby and then back to online, and after this I get this: 2643 root RT 0 11424 2052 1744 R 100.9 0.0 657502:53

Re: [Pacemaker] Dual-Primary DRBD with OCFS2 on SLES 11 SP1

2011-09-29 Thread Nick Khamis
Darren, Please keep us updated on your progress, I am still in the stage of setting up services and primitives. This will all be done by the end of this week. Cheers, Nick. On Thu, Sep 29, 2011 at 11:06 AM, wrote: > Sorry for top-posting, I'm Outlook-afflicted. > > This is also my problem; In

Re: [Pacemaker] Dual-Primary DRBD with OCFS2 on SLES 11 SP1

2011-09-29 Thread Darren.Mansell
Sorry for top-posting, I'm Outlook-afflicted. This is also my problem; In the full production environment there will be low-level hardware fencing by means of IBM RSA/ASM but this is a VMware test environment. The vmware STONITH plugin is dated and doesn't seem to work correctly (I gave up quic

Re: [Pacemaker] Dual-Primary DRBD with OCFS2 on SLES 11 SP1

2011-09-29 Thread Nick Khamis
Hello Dejan, Sorry to hijack, I am also working on the same type of setup as a prototype. What is the best way to get stonith included for VM setups? Maybe an SSH stonith? Again, this is just for the prototype. Cheers, Nick. On Thu, Sep 29, 2011 at 9:28 AM, Dejan Muhamedagic wrote: > Hi Darren

Re: [Pacemaker] Reloading a resource after a failover

2011-09-29 Thread Max Williams
Yes I was using the LSB RA. Can you give me a link to the OCF RA on github? Thanks, Max From: Serge Dubrouski [mailto:serge...@gmail.com] Sent: 29 September 2011 13:19 To: The Pacemaker cluster resource manager Subject: Re: [Pacemaker] Reloading a resource after a failover What kind of RA you do

Re: [Pacemaker] Pacemaker 1.0 and compiler optimization

2011-09-29 Thread Rainer Weikusat
Andrew Beekhof writes: > On Tue, Sep 27, 2011 at 1:47 AM, Rainer Weikusat > wrote: >> Is there a specific reason why compiler optimization is disabled (line >> 1323 in configure.ac, '-ggdb3 -O0') when building pacemaker? > > Something we inherited from heartbeat I guess - possibly so that stack >

Re: [Pacemaker] Dual-Primary DRBD with OCFS2 on SLES 11 SP1

2011-09-29 Thread Dejan Muhamedagic
Hi Darren, On Thu, Sep 29, 2011 at 02:15:34PM +0100, darren.mans...@opengi.co.uk wrote: > (Originally sent to DRBD-user, reposted here as it may be more relevant) > > > > > Hello all. > > > > I'm implementing a 2-node cluster using Corosync/Pacemaker/DRBD/OCFS2 > for dual-primary shared F

Re: [Pacemaker] Master won't get promoted

2011-09-29 Thread Dejan Muhamedagic
Hi, On Thu, Sep 29, 2011 at 09:30:55AM -0300, Charles Richard wrote: > Here it is attached. > > I also see the following 2 errors in the node 2 logs which I assume mean the > problem is really that node1 is not getting demoted and I'm not sure why: > > Error 1: > Sep 28 19:53:20 staging2 drbd[85

[Pacemaker] Dual-Primary DRBD with OCFS2 on SLES 11 SP1

2011-09-29 Thread Darren.Mansell
(Originally sent to DRBD-user, reposted here as it may be more relevant) Hello all. I'm implementing a 2-node cluster using Corosync/Pacemaker/DRBD/OCFS2 for dual-primary shared FS. I've followed the instructions on the DRBD applications site and it works really well. However, if I

Re: [Pacemaker] Master won't get promoted

2011-09-29 Thread Charles Richard
Here it is attached. I also see the following 2 errors in the node 2 logs which I assume mean the problem is really that node1 is not getting demoted and I'm not sure why: Error 1: Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Called drbdadm -c /etc/drbd.conf primary mysqld Sep 28 19:53:20

Re: [Pacemaker] Reloading a resource after a failover

2011-09-29 Thread Serge Dubrouski
What kind of RA do you use? The LSB one doesn't support reload; the OCF one does. You need to get the OCF RA from GitHub. On Thu, Sep 29, 2011 at 3:04 AM, Max Williams wrote: > Yes this is what I would like to do. Ideally have named as a clone and > then have an order like this: > > crm(live)configur

Re: [Pacemaker] Reloading a resource after a failover

2011-09-29 Thread Max Williams
Yes this is what I would like to do. Ideally have named as a clone and then have an order like this: crm(live)configure# order named-service-clone-after-Cluster_IP inf: Cluster_IP:start Named_Service:reload But it gives this error: ERROR: bad resource action/instance definition: Named_Service:rel

Re: [Pacemaker] Stonith / Fencing

2011-09-29 Thread Andrew Beekhof
On Wed, Sep 28, 2011 at 5:52 PM, Fiorenza Meini wrote: > Hi there, > I'm working on stonith on my test cluster. It shows what is, to me, a strange > behaviour: when the condition to fence the other node occurs, is it normal > that both the primary and secondary nodes fence each other? I thought that the > prima

Re: [Pacemaker] Odd colocation behaviour with master/slave resource

2011-09-29 Thread Andrew Beekhof
On Sat, Aug 27, 2011 at 9:07 AM, Chris Redekop wrote: > I'm attempting to set up a master/slave database cluster where the master is > R/W and the slave is R/O. The master failure scenario works fine (slave > becomes master, master vip moves over); however, when the slave resource > goes down I

Re: [Pacemaker] Master won't get promoted

2011-09-29 Thread Andrew Beekhof
Could you attach /var/lib/pengine/pe-input-3802.bz2 from staging1? That would tell us why. On Mon, Sep 26, 2011 at 10:28 PM, Charles Richard wrote: > Hi, > > I'm making some headway finally with my pacemaker install but now that > crm_mon doesn't return errors any more and crm_verify is clear, I

Re: [Pacemaker] load balancing in a 3-node cluster

2011-09-29 Thread Andrew Beekhof
On Wed, Sep 28, 2011 at 8:52 AM, Mark Smith wrote: > Hi all, > > Here at Bump we currently have our handset traffic routed through a > single server.  For obvious reasons, we want to expand this to > multiple nodes for redundancy.  The load balancer is doing two tasks: > TLS termination and then d