Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-20 Thread Alan Robertson
Bjorn Oglefjorn wrote: > Oh my. I feel embarrassed. I owe you a drink, Dejan. Stonith seems to be > working now. I'll go hang my head in shame now. Many people owe Dejan a drink! Thanks for finishing this off. I just didn't seem to have the concentration to follow through on the details.

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-20 Thread Dejan Muhamedagic
On Fri, Apr 20, 2007 at 02:18:48PM -0400, Bjorn Oglefjorn wrote: > Oh my. I feel embarrassed. I owe you a drink, Dejan. Stonith seems to be > working now. I'll go hang my head in shame now. No need to be embarrassed. Worse things happen :) I'm glad that we finally managed to find the problem.

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-20 Thread Bjorn Oglefjorn
Oh my. I feel embarrassed. I owe you a drink, Dejan. Stonith seems to be working now. I'll go hang my head in shame now. --BO On 4/20/07, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote: > Once again: > > [EMAIL PROTECTED] ~]# stonith -t external/drac4 > DRAC_ADDR=test-2.drac.domain DRAC_LOGIN=r

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-20 Thread Dejan Muhamedagic
On Fri, Apr 20, 2007 at 10:25:04AM -0400, Bjorn Oglefjorn wrote: > If it seems counter intuitive, think of it like this: >* test-1_DRAC is the DRAC installed in the chassis of > test-1.domain which has an address of > test-1.drac.domain No, actually it's not counter intuitive. > Then look here

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-20 Thread Bjorn Oglefjorn
Thirty seconds _should_ be enough time, but I'm curious as to why my five minute timeout isn't in effect here. --BO On 4/19/07, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote: On Tue, Apr 17, 2007 at 03:53:41PM -0400, Bjorn Oglefjorn wrote: > Here they are again. It looks like that this Apr 4 1

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-20 Thread Bjorn Oglefjorn
If it seems counter intuitive, think of it like this: * test-1_DRAC is the DRAC installed in the chassis of test-1.domain which has an address of test-1.drac.domain Then look here: In other words, test-1_DRAC

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-20 Thread Andrew Beekhof
On 4/19/07, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote: Anyway, in CIB I found only this (crm_verify doesn't complain) I find these two timeouts: ... 1. transition_timeout is not in the annotated CIB. 2. Should user specify this timeout in the crm_config section and calculate the maximum v

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-19 Thread Dejan Muhamedagic
On Tue, Apr 17, 2007 at 03:55:07PM -0400, Bjorn Oglefjorn wrote: > Alan, what is the list operation? The node names are always FQDNs and > always match. If you apply this patch: http://hg.linux-ha.org/dev/rev/944d240b728a and recompile, we should see a list of hosts as reported by your stonith ag

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-19 Thread Dejan Muhamedagic
On Tue, Apr 17, 2007 at 03:55:07PM -0400, Bjorn Oglefjorn wrote: > Alan, what is the list operation? The node names are always FQDNs and > always match. Do they? From your CIB:

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-19 Thread Dejan Muhamedagic
Hate replying to myself... There's more and somewhere here is the real problem: Apr 4 11:27:50 test-2 tengine: [13668]: info: te_fence_node:actions.c Executing reboot fencing operation (16) on test-1.domain (timeout=3) Apr 4 11:27:50 test-2 stonithd: [13658]: info: Broadcasting the message

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-19 Thread Dejan Muhamedagic
On Tue, Apr 17, 2007 at 03:53:41PM -0400, Bjorn Oglefjorn wrote: > Here they are again. It looks like that this Apr 4 11:28:20 test-2 stonithd: [13658]: info: Failed to STONITH the node test-1.domain: optype=1, op_result=2 means that the stonith operation timed out. I'll fix the code to raise
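The stonithd failure line quoted above has a regular shape, so a short script can pull the node name and result code out of a log when checking how often fencing fails. What follows is a minimal sketch in Python, assuming the message is worded exactly as in the line quoted in this thread. The function name `parse_stonith_failure` is made up for illustration, and the reading of `op_result=2` as a timeout comes from Dejan's message, not from any table of result codes.

```python
import re

# Pattern modeled on the single log line quoted in this thread; other
# heartbeat versions may word the message differently (assumption).
FAILED_RE = re.compile(
    r"stonithd: \[(?P<pid>\d+)\]: info: Failed to STONITH the node "
    r"(?P<node>\S+): optype=(?P<optype>\d+), op_result=(?P<op_result>\d+)"
)


def parse_stonith_failure(line):
    """Return a dict with pid, node, optype and op_result, or None if no match."""
    match = FAILED_RE.search(line)
    if match is None:
        return None
    return {
        "pid": int(match.group("pid")),
        "node": match.group("node"),
        "optype": int(match.group("optype")),
        "op_result": int(match.group("op_result")),
    }


if __name__ == "__main__":
    sample = (
        "Apr 4 11:28:20 test-2 stonithd: [13658]: info: Failed to STONITH "
        "the node test-1.domain: optype=1, op_result=2"
    )
    print(parse_stonith_failure(sample))
```

Running the function over every line of a node's log and keeping the non-None results gives a quick list of failed fencing attempts with their timestamps still available in the original lines.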

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-17 Thread Bjorn Oglefjorn
Alan, what is the list operation? The node names are always FQDNs and always match. --BO On 4/17/07, Alan Robertson <[EMAIL PROTECTED]> wrote: Andrew Beekhof wrote: > On 4/17/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote: >> I know that my plugin is getting called because of the logging that t

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-17 Thread Alan Robertson
Andrew Beekhof wrote: > On 4/17/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote: >> I know that my plugin is getting called because of the logging that the >> plugin does. > > do we get to see that logging at all? preferably in the context of > the other log messages > >> That said, I also know my

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-17 Thread Andrew Beekhof
On 4/17/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote: I know that my plugin is getting called because of the logging that the plugin does. do we get to see that logging at all? preferably in the context of the other log messages That said, I also know my plugin is not receiving any 'reset'

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-17 Thread Bjorn Oglefjorn
I know that my plugin is getting called because of the logging that the plugin does. That said, I also know my plugin is not receiving any 'reset' operation request from heartbeat. If you see below, request actions are logged. The only actions logged when node failure is simulated are: getconfi

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-17 Thread Andrew Beekhof
On 4/17/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote: Yes, I most certainly have. The stonith command-line tool has no problem at all with the plugin. The following was run from test-1.domain. The indented log entries are from the debug log of the stonith plugin: I'm no stonith expert, but

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-17 Thread Bjorn Oglefjorn
Yes, I most certainly have. The stonith command-line tool has no problem at all with the plugin. The following was run from test-1.domain. The indented log entries are from the debug log of the stonith plugin: root:~ # stonith -t external/drac4 DRAC_ADDR=test-2.drac.domain DRAC_LOGIN=root DRAC_

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-17 Thread Andrew Beekhof
On 4/16/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote: No ideas? none at all - have you tried calling it manually using the stonith command-line tool to make sure it works? On 4/9/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote: > > I quickly put together a STONITH plugin for testing this. It

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-09 Thread Bjorn Oglefjorn
I quickly put together a STONITH plugin for testing this. It conforms to the heartbeat spec and always lies to heartbeat returning success no matter what. With this plugin in place I'm still getting this error: Apr 9 15:40:47 test-2 stonithd: [8791]: info: Failed to STONITH the node test-1.dom
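For readers who want to repeat this kind of test, a stub along the following lines might do. It is a sketch in Python, not Bjorn's script (which is not shown in this archive). It assumes the external-plugin convention of passing the requested action as the first argument and the configuration as environment variables. The action names other than `reset`, and the parameter name `hostlist`, are assumptions here and should be checked against the external plugins shipped with the heartbeat version in use.

```python
#!/usr/bin/env python3
"""Stub external STONITH plugin that reports success for every power action."""
import os
import sys

# Hypothetical parameter name; a real plugin defines its own (e.g. DRAC_ADDR).
CONFIG_NAMES = ["hostlist"]


def handle(action, env):
    """Return (exit_code, stdout_text) for one plugin invocation."""
    if action == "getconfignames":
        return 0, "\n".join(CONFIG_NAMES) + "\n"
    if action == "gethosts":
        # One host per line, taken from the space-separated hostlist variable.
        hosts = env.get("hostlist", "").split()
        return 0, "".join(h + "\n" for h in hosts)
    if action in ("on", "off", "reset", "status"):
        # Always claim success: useful for tracing, never for real fencing.
        return 0, ""
    # Anything else (for example informational queries) is left unimplemented.
    return 1, ""


# Only act when an action was actually passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    code, text = handle(sys.argv[1], os.environ)
    sys.stdout.write(text)
    sys.exit(code)
```

Because such a stub never touches real hardware, a failure reported by stonithd while it is in place points away from the plugin and toward configuration or timing, which is the direction this thread eventually took.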

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-09 Thread Bjorn Oglefjorn
Any ideas? --BO On 4/4/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote: I do not know what op_result=2 means. I can only say that the drac4 RA will never have exit code 2. I am sure that the drac4 RA works as expected in all use cases and also when called via the stonith command from the comman

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-04 Thread Dejan Muhamedagic
On Tue, Apr 03, 2007 at 03:52:37PM -0400, Bjorn Oglefjorn wrote: > Sorry Alan, I realize that this post is getting quite long. Here is a run > down of where I am currently. > > STONITH is failing and I'm still not sure why. Me neither. There's nothing in the logs apart from: Mar 30 09:38:20 tes

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-04 Thread Dejan Muhamedagic
On Tue, Apr 03, 2007 at 03:38:44PM -0400, Bjorn Oglefjorn wrote: > Maybe I have too much logging now. Here is the log from test-1 with the > debugging information removed. I've also trimmed it from where heartbeat > notices the node is dead until STONITH fails the second time. I hope this > is a

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-03 Thread Alan Robertson
Bjorn Oglefjorn wrote: > Anyone? Help? > --BO > > On 4/2/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote: >> >> Any ideas as to what's going wrong here? there is so much send/reply/try/fail/fix stuff in the email that I had trouble following what was going on. Could you try reposting this cleanly

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-03 Thread Bjorn Oglefjorn
Anyone? Help? --BO On 4/2/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote: Any ideas as to what's going wrong here? --BO On 3/30/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote: > > I've made the OCF apache RA work by editing the script's parameters for > now. This is just testing anyway. Attach

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-02 Thread Bjorn Oglefjorn
Any ideas as to what's going wrong here? --BO On 3/30/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote: I've made the OCF apache RA work by editing the script's parameters for now. This is just testing anyway. Attached are my configs and a tar ball of the logs from the two nodes in question. Th

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-04-02 Thread Bjorn Oglefjorn
Thanks Alan. That makes more sense now. --BO On 3/30/07, Alan Robertson <[EMAIL PROTECTED]> wrote: Bjorn Oglefjorn wrote: > I took a look at the apache RA, but it makes a lot of assumptions about the > environment which are mostly untrue in Red Hat. How can I configure > this RA > short of ma

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-03-30 Thread Alan Robertson
Bjorn Oglefjorn wrote: > I took a look at the apache RA, but it makes a lot of assumptions about the > environment which are mostly untrue in Red Hat. How can I configure > this RA > short of making changes to the script? Can I set environmental variables? > I tried setting what's shown in the 'm

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-03-30 Thread Bjorn Oglefjorn
Correct. I am running CentOS 4.4 and httpd-2.0.52-28.ent.centos4. The default location for httpd.conf is in /etc/httpd/conf/httpd.conf for this package. The script looks at /etc/httpd/httpd.conf and fails when it doesn't find it. Also, the default LISTEN directive in the apache config does NOT
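OCF resource agents receive their instance parameters as environment variables named `OCF_RESKEY_<name>`, so a different config path can normally be supplied without editing the script. The sketch below, in Python, only builds such an environment. The install paths and the parameter name `configfile` are assumptions and should be confirmed against the agent's own meta-data output; `build_ra_env` is a hypothetical helper, not part of heartbeat.

```python
import os

# Assumed install locations; adjust to match the heartbeat package in use.
OCF_ROOT = "/usr/lib/ocf"
APACHE_RA = OCF_ROOT + "/resource.d/heartbeat/apache"


def build_ra_env(params, base_env=None):
    """Copy base_env and add each instance parameter as OCF_RESKEY_<name>."""
    env = dict(os.environ if base_env is None else base_env)
    env["OCF_ROOT"] = OCF_ROOT
    for name, value in params.items():
        env["OCF_RESKEY_" + name] = str(value)
    return env


if __name__ == "__main__":
    # "configfile" is assumed to be the RA's parameter for the httpd.conf path;
    # confirm it against the agent's own meta-data output.
    env = build_ra_env({"configfile": "/etc/httpd/conf/httpd.conf"})
    print("OCF_RESKEY_configfile=" + env["OCF_RESKEY_configfile"])
    # To try the agent by hand, one could then run, for example:
    #   subprocess.call([APACHE_RA, "monitor"], env=env)
```

In a running cluster the same name/value pairs would go into the resource's instance attributes in the CIB, and the cluster would export them to the agent in this form.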

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-03-30 Thread Dejan Muhamedagic
On Fri, Mar 30, 2007 at 09:22:44AM -0400, Bjorn Oglefjorn wrote: > I took a look at the apache RA, but it makes a lot of assumptions about the > environment which are mostly untrue in Red Hat. How can I configure this RA > short of making changes to the script? Can I set environmental variables?

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-03-30 Thread Bjorn Oglefjorn
I took a look at the apache RA, but it makes a lot of assumptions about the environment which are mostly untrue in Red Hat. How can I configure this RA short of making changes to the script? Can I set environmental variables? I tried setting what's shown in the 'meta-data' output, but with no lu

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-03-29 Thread Alan Robertson
Bjorn Oglefjorn wrote: > Thanks for the reply Dejan. My responses are inline. > --BO > > On 3/28/07, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote: >> >> On Wed, Mar 28, 2007 at 11:29:35AM -0400, Bjorn Oglefjorn wrote: >> > I believe I've corrected some issues, but now I'm getting more of this: >>

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-03-29 Thread Alan Robertson
Dejan Muhamedagic wrote: > On Wed, Mar 28, 2007 at 02:33:28PM -0400, Bjorn Oglefjorn wrote: >> Thanks for the reply Dejan. My responses are inline. >> --BO >> >> On 3/28/07, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote: >>> On Wed, Mar 28, 2007 at 11:29:35AM -0400, Bjorn Oglefjorn wrote: I bel

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-03-28 Thread Dejan Muhamedagic
On Wed, Mar 28, 2007 at 02:33:28PM -0400, Bjorn Oglefjorn wrote: > Thanks for the reply Dejan. My responses are inline. > --BO > > On 3/28/07, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote: > > > >On Wed, Mar 28, 2007 at 11:29:35AM -0400, Bjorn Oglefjorn wrote: > >> I believe I've corrected some is

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-03-28 Thread Dejan Muhamedagic
On Wed, Mar 28, 2007 at 11:29:35AM -0400, Bjorn Oglefjorn wrote: > I believe I've corrected some issues, but now I'm getting more of this: > Mar 28 11:02:37 test-1 lrmd: [22008]: ERROR: RA lsb:httpd:monitor (process > 24472) failed to redirect stdout for its background child (daemon) > processes. T

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-03-28 Thread Bjorn Oglefjorn
Here is the script I forgot to attach. On 3/28/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote: Thanks for the reply Dejan. My responses are inline. --BO On 3/28/07, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote: > > On Wed, Mar 28, 2007 at 11:29:35AM -0400, Bjorn Oglefjorn wrote: > > I believe I

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-03-28 Thread Bjorn Oglefjorn
Thanks for the reply Dejan. My responses are inline. --BO On 3/28/07, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote: On Wed, Mar 28, 2007 at 11:29:35AM -0400, Bjorn Oglefjorn wrote: > I believe I've corrected some issues, but now I'm getting more of this: > Mar 28 11:02:37 test-1 lrmd: [22008]:

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-03-28 Thread Bjorn Oglefjorn
I believe I've corrected some issues, but now I'm getting more of this: Mar 28 11:02:37 test-1 lrmd: [22008]: ERROR: RA lsb:httpd:monitor (process 24472) failed to redirect stdout for its background child (daemon) processes. This will likely cause those processes to die mysteriously at some later

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-03-21 Thread Bjorn Oglefjorn
Does this make more sense? I've changed the constraints to the way you've suggested and I've also changed the scores to INFINITY. I have since added debug logging and made some changes to my STONITH RA. It kind of works at this point, but eventually both nodes get shot in the head if I shut dow

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-03-21 Thread Alan Robertson
Dejan Muhamedagic wrote: > On Wed, Mar 21, 2007 at 05:45:14AM -0600, Alan Robertson wrote: >> Dejan Muhamedagic wrote: >>> On Tue, Mar 20, 2007 at 10:59:06PM -0600, Alan Robertson wrote: Dejan Muhamedagic wrote: > On Tue, Mar 20, 2007 at 01:11:21PM -0400, Bjorn Oglefjorn wrote: >> Odd.

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-03-21 Thread Dejan Muhamedagic
On Wed, Mar 21, 2007 at 05:45:14AM -0600, Alan Robertson wrote: > Dejan Muhamedagic wrote: > > On Tue, Mar 20, 2007 at 10:59:06PM -0600, Alan Robertson wrote: > >> Dejan Muhamedagic wrote: > >>> On Tue, Mar 20, 2007 at 01:11:21PM -0400, Bjorn Oglefjorn wrote: > Odd. I've changed that op to be

Re: [Linux-HA] R2 Two-node apache cluster with STONITH

2007-03-21 Thread Bjorn Oglefjorn
CentOS extras to be more specific. On 3/21/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote: I am running: Version: 2.0.7 Release: 1.c4 Thanks, --BO On 3/21/07, Alan Robertson <[EMAIL PROTECTED]> wrote: > > Dejan Muhamedagic wrote: > > On Tue, Mar 20, 2007 at 10:59:06PM -0600, Alan Robertson wro