[Pacemaker] Corosync IPAddr problems(?)

2011-02-07 Thread Stephan-Frank Henry
Hello again, I am having some possible problems with Corosync and IPAddr. To be more specific, when I do a /etc/init.d/corosync stop, while everything shuts down more or less gracefully, the virtual IP is never released (still visible with ifconfig). If I do a 'sudo ifdown --force eth0:0' it

Re: [Pacemaker] Corosync IPAddr problems(?)

2011-02-07 Thread Shravan Mishra
Try using IPAddr2 -Shravan On Mon, Feb 7, 2011 at 8:01 AM, Stephan-Frank Henry frank.he...@gmx.net wrote: Hello again, I am having some possible problems with Corosync and IPAddr. To be more specific, when I do a /etc/init.d/corosync stop, while everything shuts down more or less
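For readers following the suggestion above: a minimal crm-shell sketch of a virtual IP defined with the IPAddr2 agent. The resource name, address, netmask, and interface are placeholders, not values from the original thread.

```
primitive vip ocf:heartbeat:IPAddr2 \
    params ip=192.168.1.100 cidr_netmask=24 nic=eth0 \
    op monitor interval=10s
```

IPAddr2 manages the address as a secondary IP via the `ip` command rather than an `eth0:0`-style alias, which is commonly reported to clean up more reliably on stop than the legacy IPAddr agent.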

Re: [Pacemaker] Corosync IPAddr problems(?)

2011-02-07 Thread Dejan Muhamedagic
Hi, On Mon, Feb 07, 2011 at 02:01:11PM +0100, Stephan-Frank Henry wrote: Hello again, I am having some possible problems with Corosync and IPAddr. To be more specific, when I do a /etc/init.d/corosync stop, while everything shuts down more or less gracefully, the virtual IP never is

Re: [Pacemaker] The effects of /var being full on failure detection

2011-02-07 Thread Ryan Thomson
Hi Brett, My question is this: Would /var being full on the passive node have played a role in the cluster's inability to failover during the soft lockup condition on the active node? Or perhaps we hit a condition in which our configuration of pacemaker was unable to detect this type of

Re: [Pacemaker] The effects of /var being full on failure detection

2011-02-07 Thread Brett Delle Grazie
Hi Ryan, On 7 February 2011 17:24, Ryan Thomson r...@pet.ubc.ca wrote: snip We have /var mounted separately, but not /var/log. Interesting idea. Part of our /var problem was twofold: We had enabled debug logging and iptables logging to diagnose a previous problem and neglected to turn them
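One common guard against debug logging filling /var, as described above, is size-based rotation. A hedged sketch of a logrotate fragment; the log file path and limits are illustrative, not from the thread:

```
# /etc/logrotate.d/cluster-debug (illustrative path and limits)
/var/log/cluster-debug.log {
    daily
    rotate 7
    size 50M
    compress
    missingok
    notifempty
}
```

With `size 50M`, the log is rotated as soon as it exceeds 50 MB at the next logrotate run, bounding worst-case disk usage to roughly (rotate + 1) x size per log.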

[Pacemaker] Return value from promote function

2011-02-07 Thread Bob Schatz
I am running Pacemaker 1.0.9.1 and Heartbeat 3.0.3. I have a master/slave resource with an agent. When the resource hangs while doing a promote, the resource returns OCF_ERR_GENERIC. However, all this does is call demote on the resource, restart the resource on the same node and then retry
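The behavior above hinges on the OCF return code the agent's promote action reports. A minimal sketch in shell, assuming a hypothetical `promote_with_timeout` helper (not from the original agent): a hung promote is cut off by a deadline and reported as OCF_ERR_GENERIC, which Pacemaker treats as a soft failure (demote, then retry on the same node) unless failure handling such as migration-threshold is configured to force the resource elsewhere.

```shell
# OCF return codes relevant here (values per the OCF spec)
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Illustrative helper: run the real promote step with a deadline so a
# hang becomes a detectable failure instead of blocking forever.
# Usage: promote_with_timeout <seconds> <command...>
promote_with_timeout() {
    local deadline=$1; shift
    if timeout "$deadline" "$@"; then
        return $OCF_SUCCESS
    else
        # Hung or failed promote: report a generic (soft) error.
        return $OCF_ERR_GENERIC
    fi
}
```

Returning OCF_ERR_GENERIC deliberately signals a recoverable error; agents that want an immediate failover on promote failure typically rely on cluster-level settings (e.g. migration-threshold) rather than a different return code.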