Re: [Linux-ha-dev] possible deadlock in lrmd?

2010-11-26 Thread Dejan Muhamedagic
Hi, On Thu, Nov 25, 2010 at 10:51:56PM +, Dave Williams wrote: To follow up it appears that lrmd is aborting with the same error message after executing both crm configure verify AND lrmadmin -C Strace yields the following: lrmd: [336]: debug: on_receive_cmd: the IPC to client

Re: [Linux-ha-dev] OCF Resource Agent Developer's Guide (Draft)

2010-11-26 Thread Andrew Beekhof
On Mon, Nov 22, 2010 at 2:40 PM, Florian Haas florian.h...@linbit.com wrote: On 2010-11-22 11:02, alexander.kra...@basf.com wrote: Hi Florian, I did shortly read about your guide. Very good aggregation and definitely a good starting point to begin with own development. Properly you could go

Re: [Linux-ha-dev] OCF Resource Agent Developer's Guide (Draft)

2010-11-26 Thread Florian Haas
On 2010-11-26 10:51, Andrew Beekhof wrote: And also something about the monitoring function in a M/S case. E.g. should a M/S resource return OCF_NOT_RUNNING or OCF_FAILED_MASTER, if there are no more processes of the resource on the node ? Well, to be honest I don't know how $OCF_NOT_RUNNING
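
For reference on the two return codes under discussion, here is a minimal sketch. The numeric values are those defined by the OCF resource agent API and normally come from the resource-agents shell includes. The helper name describe_rc is hypothetical, and the sketch deliberately takes no position on which code a master/slave monitor should return when no processes are left.

```sh
#!/bin/sh
# Numeric values as defined by the OCF resource agent API.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7
OCF_RUNNING_MASTER=8
OCF_FAILED_MASTER=9

# Hypothetical helper: translate a monitor exit code into a short description.
describe_rc() {
    case "$1" in
        "$OCF_SUCCESS")        echo "running" ;;
        "$OCF_NOT_RUNNING")    echo "not running" ;;
        "$OCF_RUNNING_MASTER") echo "running as master" ;;
        "$OCF_FAILED_MASTER")  echo "failed master" ;;
        *)                     echo "other" ;;
    esac
}

describe_rc "$OCF_NOT_RUNNING"
describe_rc "$OCF_FAILED_MASTER"
```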

Re: [Linux-ha-dev] OCF Resource Agent Developer's Guide (Draft)

2010-11-26 Thread Tim Serong
On 11/26/2010 at 08:51 PM, Andrew Beekhof and...@beekhof.net wrote: On Mon, Nov 22, 2010 at 2:40 PM, Florian Haas florian.h...@linbit.com wrote: On 2010-11-22 11:02, alexander.kra...@basf.com wrote: Hi Florian, I did shortly read about your guide. Very good aggregation and

Re: [Linux-ha-dev] possible deadlock in lrmd?

2010-11-26 Thread Dave Williams
But this is what Senko's patch (2444.diff) fixes - so with that added it cures the abort in both situations above. Now time to look at his potential ref leak. That patch doesn't cure the cause, just works around it. lrmd would just keep accumulating open IPC sockets. Thanks,

Re: [Linux-HA] Pacemaker AWS elastic IPs

2010-11-26 Thread Andrew Miklas
Hi, On 25-Nov-10, at 11:37 AM, Andrew Beekhof wrote: Given what you've described, you could probably remove the while loop during stop. It should be safe because Amazon is ensuring that it will only run in exactly one location. I'll give that a try -- thanks. I noticed something else

Re: [Linux-HA] Pacemaker AWS elastic IPs

2010-11-26 Thread Andrew Beekhof
On Fri, Nov 26, 2010 at 9:36 AM, Andrew Miklas and...@pagerduty.com wrote: Hi, On 25-Nov-10, at 11:37 AM, Andrew Beekhof wrote: Given what you've described, you could probably remove the while loop during stop. It should be safe because Amazon is ensuring that it will only run in exactly

Re: [Linux-HA] PATCH: Sysinfo RA

2010-11-26 Thread Andrew Beekhof
On Fri, Nov 19, 2010 at 4:29 PM, Matthew Richardson m.richard...@ed.ac.uk wrote: Please find attached a patch to the pacemaker SysInfo RA. I like the idea, but we probably need to keep the use of expr ( instead of $(()) ) for compatibility with non-bash systems. This patch adds 2 new features:
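
The portability point raised here can be illustrated with a minimal sketch, assuming a plain /bin/sh. The variable names and values are hypothetical and are not taken from the SysInfo RA. Because expr is an external utility, the arithmetic below does not rely on the shell supporting $(( )).

```sh
#!/bin/sh
# Hypothetical values for illustration only; not from the SysInfo RA.
total_kb=4096
used_kb=3072

# Arithmetic through the external expr utility instead of $(( )).
free_kb=`expr $total_kb - $used_kb`
free_mb=`expr $free_kb / 1024`

echo "free: ${free_mb} MB"
```

Note that expr requires its operands and operators as separate arguments, so the spaces around - and / are significant.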

Re: [Linux-HA] HealthSMART RA re-written

2010-11-26 Thread Andrew Beekhof
On Thu, Nov 18, 2010 at 4:06 PM, Matthew Richardson m.richard...@ed.ac.uk wrote: I've been playing with the existing HealthSMART RA in Pacemaker and have discovered a number of fundamental bugs and errors with it that mean it will never have worked for anyone. I've done a big overhaul of this

Re: [Linux-HA] 3 Node Cluster

2010-11-26 Thread Andrew Beekhof
On Fri, Nov 19, 2010 at 3:16 PM, Frank Lazzarini flazzar...@gmail.com wrote: Hi all, so I've been playing around a little bit with setting up a 3 node cluster, and all seems to work fine when I do the start of the drbd resources manually. So basically here is my setup, in this current setup

Re: [Linux-HA] PATCH: Sysinfo RA

2010-11-26 Thread Dejan Muhamedagic
Hi, On Fri, Nov 26, 2010 at 11:13:35AM +0100, Andrew Beekhof wrote: On Fri, Nov 19, 2010 at 4:29 PM, Matthew Richardson m.richard...@ed.ac.uk wrote: Please find attached a patch to the pacemaker SysInfo RA. I like the idea, but we probably need to keep the use of expr ( instead of $(()) )

Re: [Linux-HA] PATCH: Sysinfo RA

2010-11-26 Thread Andrew Beekhof
On Fri, Nov 26, 2010 at 11:53 AM, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Fri, Nov 26, 2010 at 11:13:35AM +0100, Andrew Beekhof wrote: On Fri, Nov 19, 2010 at 4:29 PM, Matthew Richardson m.richard...@ed.ac.uk wrote: Please find attached a patch to the pacemaker SysInfo RA. I

[Linux-HA] migration-threshold global vs resource specific

2010-11-26 Thread Kris Buytaert
Imagine that I have 3 resources. d_A, d_B and d_C. They all have to run together on the same node however I want the behaviour upon failure of a service to be different. If primitive d_A and d_B fail I want the whole group to fail to another node. If primitive d_C fails I just want it to

[Linux-HA] hb_gui

2010-11-26 Thread benjamin fernandis
Hi, how can I get an RPM of hb_gui for CentOS 5.5? Regards, Benjo

Re: [Linux-HA] migration-threshold global vs resource specific

2010-11-26 Thread Pavlos Parissis
On 26 November 2010 14:52, Kris Buytaert m...@inuits.be wrote: Imagine that I have 3 resources. d_A, d_B and d_C. They all have to run together on the same node however I want the behaviour upon failure of a service to be different. Define failure? Failure to stop or start or monitor? If

[Linux-HA] ip failover

2010-11-26 Thread bharat khandelwal
On Fri, Nov 26, 2010 at 12:58 PM, Andrew Beekhof and...@beekhof.net wrote: On Fri, Nov 26, 2010 at 12:46 PM, Bharath Khandelwal bharath.khandel...@opticalfusion.net wrote: In Heartbeat 2 cluster of four nodes how to stop auto failback: as in cib i am having default resource stickiness —-

[Linux-HA] Fw: ip failover

2010-11-26 Thread bharat khandelwal
On Fri, Nov 26, 2010 at 12:58 PM, Andrew Beekhof and...@beekhof.net wrote: On Fri, Nov 26, 2010 at 12:46 PM, Bharath Khandelwal bharath.khandel...@opticalfusion.net wrote: In Heartbeat 2 cluster of four nodes how to stop auto failback: as in cib i am having default resource stickiness —-

[Linux-HA] auto failback problem

2010-11-26 Thread bharat khandelwal
Hi all, I need help regarding IP failback. In a two-node cluster, how do I stop IP failback? I have set default-resource-stickiness to INFINITY, but it still fails back. I have no constraints in my cib.xml file. Please check my CIB and help, as I have already passed my deadline. Here is my CIB: cib