Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-11 Thread Andrew Beekhof
On 4/11/07, Alan Robertson [EMAIL PROTECTED] wrote: Lars Marowsky-Bree wrote: On 2007-04-10T07:09:44, Alan Robertson [EMAIL PROTECTED] wrote: As even calling crm_master and having it do a compare and update-if-modified, or filtering it in the CIB directly requires to at least contact and

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-11 Thread Lars Marowsky-Bree
On 2007-04-10T16:47:52, Alan Robertson [EMAIL PROTECTED] wrote: Such a daemon would obviously easily keep internal state and only issue CIB updates or async (failure) notifications as needed. It'd avoid a lot of forking for various operations. You can already do that. Resources often

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-11 Thread Lars Marowsky-Bree
On 2007-04-11T09:40:52, Andrew Beekhof [EMAIL PROTECTED] wrote: Actually if you look at CTS, you'll find that ResourceRecover uses a-sync notifications (albeit with a couple of tricks to work around deficiencies in the lrm) Have bugs been filed for those deficiencies so that Dejan doesn't

Re: crm_master patch to eliminate do-nothing attribute updates - WAS Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-10 Thread Andrew Beekhof
On Apr 5, 2007, at 4:48 PM, Alan Robertson wrote: Alan Robertson wrote: Lars Marowsky-Bree wrote: On 2007-04-04T11:41:44, Doug Knight [EMAIL PROTECTED] wrote: The key word in my question was thinks. It would be useful to the RA if it could know what state the CRM thought it was in, so in

Re: crm_master patch to eliminate do-nothing attribute updates - WAS Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-10 Thread Alan Robertson
Andrew Beekhof wrote: On Apr 5, 2007, at 4:48 PM, Alan Robertson wrote: Alan Robertson wrote: Lars Marowsky-Bree wrote: On 2007-04-04T11:41:44, Doug Knight [EMAIL PROTECTED] wrote: The key word in my question was thinks. It would be useful to the RA if it could know what state the CRM

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-10 Thread Alan Robertson
Lars Marowsky-Bree wrote: On 2007-04-05T07:40:34, Alan Robertson [EMAIL PROTECTED] wrote: That is why I'd suggest to only call it in start or post-notify; calling it in post-notify basically implies it'll be called after every state change. But, for DRBD for example, the ability to become

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-10 Thread Lars Marowsky-Bree
On 2007-04-10T07:09:44, Alan Robertson [EMAIL PROTECTED] wrote: As even calling crm_master and having it do a compare and update-if-modified, or filtering it in the CIB directly requires to at least contact and query the CIB, I'd probably still track the state in the RA somewhere. (As to

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-10 Thread Alan Robertson
Lars Marowsky-Bree wrote: On 2007-04-10T07:09:44, Alan Robertson [EMAIL PROTECTED] wrote: As even calling crm_master and having it do a compare and update-if-modified, or filtering it in the CIB directly requires to at least contact and query the CIB, I'd probably still track the state in

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-07 Thread Lars Marowsky-Bree
On 2007-04-05T07:40:34, Alan Robertson [EMAIL PROTECTED] wrote: That is why I'd suggest to only call it in start or post-notify; calling it in post-notify basically implies it'll be called after every state change. But, for DRBD for example, the ability to become master can change without

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-06 Thread Doug Knight
Hi Alan, Interesting. This entry in the cib was completely created from the GUI. I modified the Transition Timeout from the GUI to get around the timeout of my notify function call. So if I understand what you're saying, I should remove the tags from the op definition and create nvpairs to replace

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-06 Thread Alan Robertson
Doug Knight wrote: Hi Alan, Interesting. This entry in the cib was completely created from the GUI. I modified the Transition Timeout from the GUI to get around the timeout of my notify function call. So if I understand what you're saying, I should remove the tags from the op definition and

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-06 Thread Alan Robertson
Alan Robertson wrote: Doug Knight wrote: Hi Alan, Interesting. This entry in the cib was completely created from the GUI. I modified the Transition Timeout from the GUI to get around the timeout of my notify function call. So if I understand what you're saying, I should remove the tags from

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-06 Thread Doug Knight
I've asked this question before, but have not gotten an answer: Can you have a location constraint to set the resource to prefer running the master on one node vs the other (and still have the slave running too)? On Fri, 2007-04-06 at 09:46 -0400, Doug Knight wrote: Not sure how to determine

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-06 Thread Knight, Doug
OK, here's another strange happening, I had an IPaddr resource with a separate (non-WAL file forwarding instance) database co-located to it. I created this early on to get up to speed on the HA basics. It uses the heartbeat ocf script, as does IPaddr. I used a places constraint with a uname eq to

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-06 Thread Alan Robertson
Doug Knight wrote: Not sure how to determine the version of the GUI. It came with the 2.0.8 release. The prolog says copyright 2005. I've looked at the DTD and it's not always clear what it's saying, but it helps. What's really helped is stealing the do_cmd function from the drbd OCF script, and

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-05 Thread Andrew Beekhof
On 4/4/07, Alan Robertson [EMAIL PROTECTED] wrote: Doug Knight wrote: I think I ran into this exact issue. I was calling crm_master -v 100 to upgrade to master status too frequently. The postgres database needs to stay in master state once it's there, and not transition frequently. Are there

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-05 Thread Lars Marowsky-Bree
On 2007-04-03T22:05:24, Alan Robertson [EMAIL PROTECTED] wrote: Yes. I'd avoid needless calls to it though as every call triggers a transition - so don't just call it in _every_ monitor operation ;-) That's a good point. Unfortunately that means the RA has to keep state somewhere, which is
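The point raised above (avoid calling crm_master when nothing has changed, at the cost of keeping state in the RA) could be sketched roughly as follows. This is only an illustration under stated assumptions: `set_master_pref`, `STATE_FILE`, and the `CRM_MASTER` override are hypothetical names invented for this sketch, not part of any OCF library, and the state-file path is a placeholder. Only the `crm_master -v <value>` invocation comes from the thread itself.

```shell
#!/bin/sh
# Hypothetical helper for a resource agent: remember the last master
# preference we pushed and skip the crm_master call when it is unchanged.
# STATE_FILE, CRM_MASTER and set_master_pref are illustrative names only.
STATE_FILE="${STATE_FILE:-/tmp/ra-master-pref.state}"   # placeholder path
CRM_MASTER="${CRM_MASTER:-crm_master}"                  # overridable for testing

set_master_pref() {
    new_value="$1"
    old_value=""

    # Read the previously recorded value, if any.
    if [ -f "$STATE_FILE" ]; then
        old_value=$(cat "$STATE_FILE")
    fi

    # Nothing changed: do not call crm_master at all.
    if [ "$new_value" = "$old_value" ]; then
        return 0
    fi

    # Value changed: update the cluster, then record it only on success.
    "$CRM_MASTER" -v "$new_value" || return 1
    printf '%s\n' "$new_value" > "$STATE_FILE"
}
```

A monitor action could then call `set_master_pref 100` on every run, and crm_master would only be invoked when the value differs from the last one recorded. Whether a local state file is an acceptable place to keep this information is exactly the design question the thread debates; the file would also need to be cleared on stop or reboot in a real agent.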

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-05 Thread Lars Marowsky-Bree
On 2007-04-04T11:41:44, Doug Knight [EMAIL PROTECTED] wrote: The key word in my question was "thinks". It would be useful to the RA if it could know what state the CRM thought it was in, so in case the RA determines on its own that it's already in that state, it doesn't have to do anything. But,

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-05 Thread Alan Robertson
Lars Marowsky-Bree wrote: On 2007-04-04T11:41:44, Doug Knight [EMAIL PROTECTED] wrote: The key word in my question was "thinks". It would be useful to the RA if it could know what state the CRM thought it was in, so in case the RA determines on its own that it's already in that state, it

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-05 Thread Alan Robertson
Lars Marowsky-Bree wrote: On 2007-04-03T22:05:24, Alan Robertson [EMAIL PROTECTED] wrote: Yes. I'd avoid needless calls to it though as every call triggers a transition - so don't just call it in _every_ monitor operation ;-) That's a good point. Unfortunately that means the RA has to keep

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-04 Thread Doug Knight
I think I ran into this exact issue. I was calling crm_master -v 100 to upgrade to master status too frequently. The postgres database needs to stay in master state once it's there, and not transition frequently. Are there any other command line tools that could be used to retrieve what state CRM

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-04 Thread Andrew Beekhof
On 4/3/07, Doug Knight [EMAIL PROTECTED] wrote: Thanks. Yes, I'm concerned that I'm seeing the spinning and you're not. However, I need to push through to getting my OCF script working, and using my version of crm_master will allow me to do that. I'll have to take the added demote into

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-04 Thread Doug Knight
The key word in my question was "thinks". It would be useful to the RA if it could know what state the CRM thought it was in, so in case the RA determines on its own that it's already in that state, it doesn't have to do anything. But, if the RA finds that the CRM thinks it's in a different state,

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-04 Thread Alan Robertson
Doug Knight wrote: I think I ran into this exact issue. I was calling crm_master -v 100 to upgrade to master status too frequently. The postgres database needs to stay in master state once it's there, and not transition frequently. Are there any other command line tools that could be used to

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-03 Thread Andrew Beekhof
Hi Doug, I just tried to reproduce this (loaded your cib and ran crm_master the same way you did) but it worked fine. Can you open a bug in bugzilla and include the result of cibadmin -Q please? On 4/2/07, Doug Knight [EMAIL PROTECTED] wrote: Andrew, Alan, Lars, et al, Any updates on the

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-03 Thread Doug Knight
I've never done a bugzilla report, maybe you can point me in the right direction. In the meantime I'll continue to test with my modified version of crm_master that doesn't spin. I've managed to get the master side of the process configuration up, but am having trouble getting the slave side up.

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-03 Thread Andrew Beekhof
On 4/3/07, Doug Knight [EMAIL PROTECTED] wrote: I've never done a bugzilla report, maybe you can point me in the right direction. http://old.linux-foundation.org/developer_bugzilla/enter_bug.cgi In the meantime I'll continue to test with my modified version of crm_master that doesn't

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-03 Thread Alan Robertson
Lars Marowsky-Bree wrote: On 2007-03-29T21:31:32, Alan Robertson [EMAIL PROTECTED] wrote: The docs are more specific: They state that it is mandatory to call it in the start action. That doesn't seem to be true. That's a correct observation. You could call it in a monitor action quite

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-02 Thread Doug Knight
Andrew, Alan, Lars, et al, Any updates on the spinning crm_master? I and an associate of mine here are looking into compiler flags and settings to see if we can find anything there. If there is more info you need, just let me know. Doug On Fri, 2007-03-30 at 11:15 -0400, Doug Knight wrote:

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-30 Thread Doug Knight
Hi Andrew, I'll get to this today. I've been digging through some of the source, putting in some additional logging, etc, just to see where the problem occurs. Thanks, Doug On Fri, 2007-03-30 at 10:32 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: Results:

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-30 Thread Doug Knight
OK, let me try this again. My last email was too big by a little and got held up for review. I've bzip2'ed this one. Doug On Fri, 2007-03-30 at 08:20 -0400, Doug Knight wrote: Andrew, FYI, the one I ran has some debug statements in it that I put in there. Let me know if you want cleaner

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-29 Thread Andrew Beekhof
On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote: One additional question that has come up as I've developed my notify function: When the promote of the slave completes, and the post-promote notify is sent out, is this post-promote notify sent to both the master and slave nodes, or just to the

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-29 Thread Doug Knight
I went ahead and installed via RPM automake, autoconf, and libtool, even though they were not needed for the 2.0.8 baseline build (I believe libtool at least may have been installed previously). ConfigureMe bootstrap ran fine. From the output I gather it includes the equivalent of the ConfigureMe

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-29 Thread Doug Knight
OK, ConfigureMe make is complete. As when I built the baseline 2.0.8, I had two compile errors, and had to make some minor source code mods (see attached error message and file diffs). Not sure if this is related or not. I've done a make install in the crm/admin directory to replace the crm_master

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-29 Thread Alan Robertson
Lars Marowsky-Bree wrote: On 2007-03-27T22:38:50, Alan Robertson [EMAIL PROTECTED] wrote: mandatory in the sense that nothing will get promoted until someone, somewhere runs it. but the exact timing is completely up to the user/admin/RA... it is even possible to run it manually if you have

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-28 Thread Doug Knight
I'm working through the various emails I've received on what to try for the issue I'm having with crm_master. Here's my first response. I pulled down the bz2 version of the Mercurial version per the URL given in a previous email (hg.linux-ha.org/dev), and attempted to build on my server. Using the

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-28 Thread Doug Knight
On Wed, 2007-03-28 at 09:14 +0200, Andrew Beekhof wrote: On 3/28/07, Alan Robertson [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On Mar 22, 2007, at 2:13 AM, Alan Robertson wrote: Doug Knight wrote: Hi Andrew, I had just started reviewing both of these scripts, and reviewed

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-28 Thread Doug Knight
Hi Lars, I've gone through your comments below, and I think I understand this a bit better. Let me state what I think I need to do, and see if I've got it now (starting with Node A master and Node B slave): Node A: PostgreSQL master gets demoted - completes (basically just stops PostgreSQL) Node

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-28 Thread Doug Knight
One additional question that has come up as I've developed my notify function: When the promote of the slave completes, and the post-promote notify is sent out, is this post-promote notify sent to both the master and slave nodes, or just to the slave? Doug On Wed, 2007-03-28 at 12:34 -0400, Doug

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-27 Thread Alan Robertson
Andrew Beekhof wrote: On Mar 22, 2007, at 2:13 AM, Alan Robertson wrote: Doug Knight wrote: Hi Andrew, I had just started reviewing both of these scripts, and reviewed the Multistate and clone resource pages on the web site. It looks like multistate is how I need to handle it, but a

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-25 Thread Alan Robertson
Doug Knight wrote: Got it. The attached file contains the strace from the second attempt by heartbeat to start the resource up as master, right up until it was killed. The resource already showed failed on the gui. I zipped it up using gzip. Lars asked a good question as well... Could you

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-25 Thread Alan Robertson
Doug Knight wrote: Got it. The attached file contains the strace from the second attempt by heartbeat to start the resource up as master, right up until it was killed. The resource already showed failed on the gui. I zipped it up using gzip. By the way, from the system call perspective, what

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-24 Thread Lars Marowsky-Bree
On 2007-03-23T12:56:12, Doug Knight [EMAIL PROTECTED] wrote: I figured this one out, please ignore, it's because I didn't give it a value. If I run crm_master -v 100 at the command line, it spins right up to 100% cpu with no error. Can you retest that with the hg.linux-ha.org/dev version

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-23 Thread Doug Knight
Hi Alan, I've started testing my OCF script, and I'm seeing something unusual during initial startup. I've placed a crm_master call in my stateful_start function, after the function has determined that it is running on what should be the master, and postgresql has successfully started: crm_master

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-23 Thread Alan Robertson
Doug Knight wrote: Hi Alan, I've started testing my OCF script, and I'm seeing something unusual during initial startup. I've placed a crm_master call in my stateful_start function, after the function has determined that it is running on what should be the master, and postgresql has

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-23 Thread Alan Robertson
Doug Knight wrote: Current 2.0.8 tarball from 1/18/07. Process in top looks like: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 24591 root 18 0 1663m 1.5g 1028 R 83 77.8 1:19.42 /usr/sbin/crm_master -v 100 It dies and restarts about every 120 seconds, which happens

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-23 Thread Doug Knight
On Fri, 2007-03-23 at 09:25 -0600, Alan Robertson wrote: Doug Knight wrote: Current 2.0.8 tarball from 1/18/07. Process in top looks like: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 24591 root 18 0 1663m 1.5g 1028 R 83 77.8 1:19.42 /usr/sbin/crm_master -v 100

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-23 Thread Doug Knight
I figured this one out, please ignore, it's because I didn't give it a value. If I run crm_master -v 100 at the command line, it spins right up to 100% cpu with no error. Doug On Fri, 2007-03-23 at 12:52 -0400, Doug Knight wrote: This might help. With the resource in a failed mode, but

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-22 Thread Doug Knight
Hi Alan, I took a look at the drbd OCF script's notify function, and the online documentation. I believe there is one circumstance where I need to make use of the pre/post notify. The last step in my development/testing has to do with several steps I take to prepare the server that was primary and

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-21 Thread Andrew Beekhof
On 3/20/07, Doug Knight [EMAIL PROTECTED] wrote: Hi Alan, I've had some time to try to implement the OCF script for the PostgreSQL WAL file forwarding configuration, and I still am having some issues. The main issue is how do I set up an OCF script that allows the process to run on both

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-21 Thread Doug Knight
Hi Andrew, I had just started reviewing both of these scripts, and reviewed the Multistate and clone resource pages on the web site. It looks like multistate is how I need to handle it, but a couple of questions first. 1. I noticed that the write-up says the resource must come up on each of the

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-21 Thread Alan Robertson
Doug Knight wrote: Hi Andrew, I had just started reviewing both of these scripts, and reviewed the Multistate and clone resource pages on the web site. It looks like multistate is how I need to handle it, but a couple of questions first. 1. I noticed that the write-up says the resource must

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-20 Thread Doug Knight
Hi Alan, I've had some time to try to implement the OCF script for the PostgreSQL WAL file forwarding configuration, and I still am having some issues. The main issue is how do I set up an OCF script that allows the process to run on both servers, one process in a starting mode ingesting

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-20 Thread Serge Dubrouski
On 3/20/07, Doug Knight [EMAIL PROTECTED] wrote: Hi Alan, I've had some time to try to implement the OCF script for the PostgreSQL WAL file forwarding configuration, and I still am having some issues. The main issue is how do I set up an OCF script that allows the process to run on both

[Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-14 Thread Doug Knight
Hi All, I currently am running a two node cluster (host1 and host2) with version 2.0.8. I have a resource defined with a place constraint of #uname eq host1, so that it will start on host1 (using an OCF RA script, including all of the required methods). The resource itself has target_role set to

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-14 Thread Alan Robertson
Doug Knight wrote: Hi All, I currently am running a two node cluster (host1 and host2) with version 2.0.8. I have a resource defined with a place constraint of #uname eq host1, so that it will start on host1 (using an OCF RA script, including all of the required methods). The resource itself

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-14 Thread Doug Knight
Yes, Thanks Alan. Let me digest it, and walk through my OCF script. I'll see if I have any other questions after that. Thanks for getting back to me. Doug On Wed, 2007-03-14 at 11:41 -0600, Alan Robertson wrote: Doug Knight wrote: Hi All, I currently am running a two node cluster (host1

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-14 Thread Alan Robertson
Doug Knight wrote: Yes, Thanks Alan. Let me digest it, and walk through my OCF script. I'll see if I have any other questions after that. Thanks for getting back to me. Doug On Wed, 2007-03-14 at 11:41 -0600, Alan Robertson wrote: Doug Knight wrote: Hi All, I currently am running a