Re: [Linux-HA] Score Calculation

2007-09-13 Thread Dominik Klein
Yes I meant the resource is running first and crashes later on, so that monitor reports "not running". generally, one shouldn't report "not running" in such cases Okay, maybe I should have read this more precisely. http://www.linux-ha.org/OCFResourceAgent "monitor - monitor the health of a res

[Linux-HA] Repository of ocf scripts?

2007-09-13 Thread Dave Augustus
Hello all, I am making some of my own ocf scripts and was thinking that certainly I am not the only one doing this. Is there a repository out there somewhere? Dave

[Linux-HA] EchoNoNl: command not found ?

2007-09-13 Thread Dave Augustus
/etc/init.d/heartbeat: line 261: EchoNoNl: command not found What is it missing? heartbeat-2.1.2-3.el5.centos heartbeat-stonith-2.1.2-3.el5.centos heartbeat-gui-2.1.2-3.el5.centos heartbeat-pils-2.1.2-3.el5.centos Dave

Re: [Linux-HA] Score Calculation

2007-09-13 Thread Max Hofer
On Thursday 13 September 2007, Dominik Klein wrote: > > http://www.linux-ha.org/v2/faq > > > > That's where you'll find one example of calculating scores. Check > > also the list archives---quite a few times this issue came up. > > Of course I read that. > > What I want to know is: Does the cluster

Re: [Linux-HA] Compile error building 2.1.2 on RHEL5 64 bit

2007-09-13 Thread Doug Knight
We recently upgraded from RHEL5 Beta to the full RHEL5. I did a quick check against the glib-related RPMs installed, and found that the full RHEL5 did not have glib installed (it had glib2 though). Comparing back to a system which still had the beta release I saw the glib-1.2.10-19.el5 RPM installe

Re: [Linux-HA] Score Calculation

2007-09-13 Thread Andrew Beekhof
On 9/13/07, Dominik Klein <[EMAIL PROTECTED]> wrote: > > Afraid I can't offer an answer here. Though, in case the "not > > running" part was unexpected, i.e. the resource had been started, > > Yes I meant the resource is running first and crashes later on, so that > monitor reports "not running".

Re: [Linux-HA] one dead ping node caused partially group restart

2007-09-13 Thread Andreas Kurz
On 9/13/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote: > On 8/30/07, Andreas Kurz <[EMAIL PROTECTED]> wrote: > > On 8/24/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote: > > > On 8/21/07, Andreas Kurz <[EMAIL PROTECTED]> wrote: > > > > Hello all, > > > > > > > > I have a Heartbeat 2.1.2 two-node cluste

Re: [Linux-HA] Score Calculation

2007-09-13 Thread Dominik Klein
Afraid I can't offer an answer here. Though, in case the "not running" part was unexpected, i.e. the resource had been started, Yes I meant the resource is running first and crashes later on, so that monitor reports "not running". then it seems to be wrong to calculate scores differently.

Re: Re: [Linux-HA] CPU time consumption of lrmd

2007-09-13 Thread matilda matilda
>>> Dejan Muhamedagic <[EMAIL PROTECTED]> 13.09.2007 16:24 >>> >No, that's definitely not normal. This is a strong contender on >the todo list here. However, so far seems to be tough to shake it >out and I guess it'll take some time. There's a bug opened and >some discussion on the matter here: > >

Re: [Linux-HA] CPU time consumption of lrmd

2007-09-13 Thread Dejan Muhamedagic
Hi, On Wed, Sep 12, 2007 at 12:58:59PM +0200, matilda matilda wrote: > Hi all, > > after running HAv2 (2.1.2) for a while now (thanks for all the > necessary advice and help) I found out that lrmd is consuming > relatively much CPU time as stated by 'ps'. > > I just want to ask the technical reas

Re: [Linux-HA] Score Calculation

2007-09-13 Thread Dejan Muhamedagic
On Thu, Sep 13, 2007 at 03:49:11PM +0200, Dominik Klein wrote: > >http://www.linux-ha.org/v2/faq > > > >That's where you'll find one example of calculating scores. Check > >also the list archives---quite a few times this issue came up. > > Of course I read that. > > What I want to know is: Does th

Re: [Linux-HA] one dead ping node caused partially group restart

2007-09-13 Thread Andrew Beekhof
On 8/30/07, Andreas Kurz <[EMAIL PROTECTED]> wrote: > On 8/24/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote: > > On 8/21/07, Andreas Kurz <[EMAIL PROTECTED]> wrote: > > > Hello all, > > > > > > I have a Heartbeat 2.1.2 two-node cluster installation using three > > > ping nodes to check network conne

Re: [Linux-HA] Compile error building 2.1.2 on RHEL5 64 bit

2007-09-13 Thread Dejan Muhamedagic
Hi, On Thu, Sep 13, 2007 at 09:32:06AM -0400, Doug Knight wrote: > I could do that, but isn't there a reason why warnings are set to fatal > for the build? True. > Also, I successfully built the 2.0.8 on the same system > some time ago. I was building the 2.1.2 to try it out as an upgrade to > o

Re: [Linux-HA] Score Calculation

2007-09-13 Thread Dominik Klein
http://www.linux-ha.org/v2/faq That's where you'll find one example of calculating scores. Check also the list archives---quite a few times this issue came up. Of course I read that. What I want to know is: Does the cluster make a difference between monitor result "error" and monitor result "n

Re: [Linux-HA] Score Calculation

2007-09-13 Thread Dejan Muhamedagic
Hi, On Thu, Sep 13, 2007 at 11:53:46AM +0200, Dominik Klein wrote: > Hi > > I did some tests on score calculation as I could not find a good web > resource on this. Now I would like to get this confirmed or corrected. > > Suppose I have a resource with a resource location constraint with score

Re: [Linux-HA] Compile error building 2.1.2 on RHEL5 64 bit

2007-09-13 Thread Doug Knight
I could do that, but isn't there a reason why warnings are set to fatal for the build? Also, I successfully built the 2.0.8 on the same system some time ago. I was building the 2.1.2 to try it out as an upgrade to our existing configuration. Doug On Thu, 2007-09-13 at 14:55 +0200, Christian Frank

Re: [Linux-HA] Compile error building 2.1.2 on RHEL5 64 bit

2007-09-13 Thread Christian Frank
Hi, add --disable-fatal-warnings to the configure invocation, because "cc1: warnings being treated as errors" stops the compile every time there is a compiler warning. Regards, Christian Doug Knight wrote: All, I just downloaded the 2.1.2 tarball to my RHEL5 64-bit system, and got the following error

[Linux-HA] Compile error building 2.1.2 on RHEL5 64 bit

2007-09-13 Thread Doug Knight
All, I just downloaded the 2.1.2 tar ball to my RHEL5 64 bit system, and got the following error during the ConfigureMe make phase: gcc -DHAVE_CONFIG_H -I. -I. -I../../include -I../../include -I../../include -I../../include -I../../linux-ha -I../../linux-ha -I../../libltdl -I../../libltdl -I/usr/

Re: Re: [Linux-HA] R1 to R2 testing: cib.xml & ldirectord questions for 2 node cluster

2007-09-13 Thread Peter Farrell
Hi Andreas - Everything seems fine. I responded to Andrew in relation to what I'd asked Dejan. (Regarding failover w/ no network connectivity) We use the R1 version of heartbeat / ldirectord as follows: On our DMZ segment: 2 nodes (running heartbeat w/ ldirectord) connected via serial 2 webserver

Re: [Linux-HA] R1 to R2 testing: cib.xml & ldirectord questions for 2 node cluster

2007-09-13 Thread Andrew Beekhof
On 9/13/07, Peter Farrell <[EMAIL PROTECTED]> wrote: > On 13/09/2007, Andrew Beekhof <[EMAIL PROTECTED]> wrote: > > On 9/13/07, Peter Farrell <[EMAIL PROTECTED]> wrote: > > > Thanks Dejan - that was very helpful. > > > > > > Last question :-) > > > > > > I've modified this setup, taking into accoun

Re: [Linux-HA] Mounting and Unmount a volume

2007-09-13 Thread Andrew Beekhof
On 8/26/07, Jake Conk <[EMAIL PROTECTED]> wrote: > Hello, > > I installed OCFS2 and got it up and running on my computers and network but > now I want to have heartbeat configured to be in charge of mounting and > unmounting the disk I'm sharing over the network when its host machine is up > and goe

Re: [Linux-HA] Getting two drbd resources to run as master on same host

2007-09-13 Thread Andrew Beekhof
On 8/24/07, Martin Bene <[EMAIL PROTECTED]> wrote: > Hi, > > I've got a heartbeat 2.1.2 cluster with two nodes and two drbd resources > set up as described on > http://wiki.linux-ha.org/DRBD/HowTov2?highlight=%28drbd%29: > > Drbd0 prefers to run as master on node1, drbd1 prefers to run as master >

Re: [Linux-HA] R1 to R2 testing: cib.xml & ldirectord questions for 2 node cluster

2007-09-13 Thread Andrew Beekhof
On 9/13/07, Peter Farrell <[EMAIL PROTECTED]> wrote: > Thanks Dejan - that was very helpful. > > Last question :-) > > I've modified this setup, taking into account your advice about > removing the ldirectord attributes (where are these things > documented!?). http://linux-ha.org/ResourceAgents >

Re: Re: [Linux-HA] R1 to R2 testing: cib.xml & ldirectord questions for 2 node cluster

2007-09-13 Thread matilda matilda
>>> "Peter Farrell" <[EMAIL PROTECTED]> 12.09.2007 17:24 >>> > Andreas - > > A follow up if you will... Hi Peter, do you need further assistance? I saw that Dejan helped you. Is everything fine and running? In which mode do you use ldirectord? (NAT?) Best regards Andreas Mock

[Linux-HA] Helper scripts: showfailcount resetfailcount showscores

2007-09-13 Thread Dominik Klein
Hi I would like to share these scripts I wrote. showfailcount.sh shows - who would have guessed it - the failcount of all resources on all nodes showscores.sh shows the current scores of all resources on all nodes resetfailcount.sh resets the failcount on all nodes for all resources or the

Re: [Linux-HA] R1 to R2 testing: cib.xml & ldirectord questions for 2 node cluster

2007-09-13 Thread Peter Farrell
Thanks Dejan - that was very helpful. Last question :-) I've modified this setup, taking into account your advice about removing the ldirectord attributes (where are these things documented!?). Everything is running fine and I've got no errors in cluster.log - only a few WARN's and they seem ha

Re: [Linux-HA] fail count was initialized after recovering fromSplitBrain

2007-09-13 Thread Yan Fitterer
>> there is no 'cib' process. > > actually there is :-) Oops - Thanks for the correction Andrew! > >> If I understand things right, the crmd >> process handles all core CIB maintenance operations. > > nope, all done by the CIB process Makes sense

[Linux-HA] Score Calculation

2007-09-13 Thread Dominik Klein
Hi I did some tests on score calculation as I could not find a good web resource on this. Now I would like to get this confirmed or corrected. Suppose I have a resource with a resource location constraint with score 300 for node A, score 250 for node B, resource stickiness = 200, failure stic
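The kind of calculation asked about here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the cluster's actual implementation: it assumes the additive model usually described for Heartbeat v2 (location score, plus resource stickiness on the node where the resource currently runs, plus failcount times failure stickiness). The location scores 300 and 250 and the stickiness of 200 come from the message above; the failure stickiness is cut off in the excerpt, so the value -100 below is purely hypothetical, as is the function name `node_score`.

```python
def node_score(location_score, running_here, stickiness,
               failcount, failure_stickiness):
    """Score of one node for one resource, under the assumed additive model."""
    score = location_score
    if running_here:
        # Stickiness only counts on the node currently running the resource.
        score += stickiness
    # failure_stickiness is normally negative, so each failure lowers the score.
    score += failcount * failure_stickiness
    return score


STICKINESS = 200
FAILURE_STICKINESS = -100  # hypothetical: the real value is truncated in the post


def scores(failcount_on_a):
    """Scores for nodes A and B while the resource runs on A."""
    a = node_score(300, True, STICKINESS, failcount_on_a, FAILURE_STICKINESS)
    b = node_score(250, False, STICKINESS, 0, FAILURE_STICKINESS)
    return a, b


for fails in range(4):
    a, b = scores(fails)
    print(f"failcount on A = {fails}: A = {a}, B = {b}")
```

With these assumed numbers the resource would stay on node A for as long as A's score remains above B's 250, and would become eligible to move once enough failures have accumulated on A. Whether "not running" and "error" monitor results are counted the same way is exactly the open question in the thread, and this sketch does not answer it.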

Re: [Linux-HA] How to reorder resources in a group?

2007-09-13 Thread Yan Fitterer
I'd like to reorder the primitives in that group: Resource Group: web resource_web_ip (heartbeat::ocf:IPaddr2) resource_web_ip_27 (heartbeat::ocf:IPaddr2) resource_web_fs_ww (heartbeat::ocf:Filesystem) resource_web_fs_cache (heartbeat::ocf:Filesystem) resource_web

Re: [Linux-HA] Resource monitoring question

2007-09-13 Thread Andrew Beekhof
On 8/24/07, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote: > On Fri, Aug 24, 2007 at 10:17:17AM +0200, peter panda wrote: > > Hi, > > > > i'm trying to use the resource monitoring feature of Heartbeat V2. i have > > read the documentation, but still have some questions please. > > 1. does my script n

Re: [Linux-HA] Could heartbeat work correctly when the "keepalive" time is very short?

2007-09-13 Thread Andrew Beekhof
On 8/24/07, mingdao lu <[EMAIL PROTECTED]> wrote: > hi, all > I have a cluster which includes 12 nodes. > For each node, I set the ha.cf as: > keepalive 100ms > warntime 300ms > deadtime 500ms > > This means that heartbeat should handle about 120 udp packets every second. > Could heartbeat work cor
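The figure of about 120 packets per second quoted above follows from simple arithmetic, sketched here in Python. The sketch assumes one heartbeat packet per node per keepalive interval on a single communication path; additional media or retransmissions would raise the number. The function names are illustrative only and are not part of heartbeat.

```python
def heartbeats_per_second(nodes: int, keepalive_ms: int) -> float:
    """Aggregate heartbeat packets per second across the whole cluster,
    assuming each node sends one packet per keepalive interval."""
    if nodes <= 0 or keepalive_ms <= 0:
        raise ValueError("nodes and keepalive_ms must be positive")
    return nodes * (1000.0 / keepalive_ms)


def intervals_before_dead(deadtime_ms: int, keepalive_ms: int) -> int:
    """How many whole keepalive intervals fit into deadtime, i.e. roughly how
    many consecutive heartbeats can be missed before a node is declared dead."""
    return deadtime_ms // keepalive_ms


# Values from the ha.cf settings in the post: 12 nodes, keepalive 100ms,
# warntime 300ms, deadtime 500ms.
print(heartbeats_per_second(12, 100))
print(intervals_before_dead(300, 100))
print(intervals_before_dead(500, 100))
```

Under these settings a node is warned about after roughly three silent intervals and declared dead after roughly five, which leaves very little slack for scheduling or network jitter.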

Re: [Linux-HA] fail count was initialized after recovering fromSplitBrain

2007-09-13 Thread Andrew Beekhof
On 9/13/07, Yan Fitterer <[EMAIL PROTECTED]> wrote: > > > Junko IKEDA wrote: > >>> once again something about SplitBrain... > >>> During SplitBrain, I wrecked the resource on the both nodes. > >>> fail count was increased at this time. > >>> But after recovering from SplitBrain, fail count returned

Re: [Linux-HA] How to reorder resources in a group?

2007-09-13 Thread Andrew Beekhof
On 8/27/07, Martin Bene <[EMAIL PROTECTED]> wrote: > Hi > > I've got an ordered resource group "web" (on heartbeat 2.1.2) > > Resource Group: web > resource_web_ip (heartbeat::ocf:IPaddr2) > resource_web_fs_ww (heartbeat::ocf:Filesystem) > resource_web_fs_cache (heartbeat::oc

Re: [Linux-HA] How to disable crm debug info?

2007-09-13 Thread Andrew Beekhof
On 8/31/07, mingdao lu <[EMAIL PROTECTED]> wrote: > hi, all > I have a strange problem that it always output debug info when I used > command "crm_resource". > For example: > ipmux2:~ # crm_resource -L > crm_resource[19729]: 1999/12/10_21:56:01 info: Invoked: crm_resource -L > Resource Group: group

Re: [Linux-HA] patch: bug in the xen resource agent

2007-09-13 Thread Andrew Beekhof
On 8/27/07, Per Andreas Buer <[EMAIL PROTECTED]> wrote: > There is a bug in the Xen resource agent. monitor calls fail if monitor > is called during migrations. To trigger the bug - just set up HB2 to use > live migrations and flip one of the nodes into and out of standby mode. > After a short whil

Re: [Linux-HA] STONITH without special hardware

2007-09-13 Thread Philip Gwyn
On 11-Sep-2007 Departamento Técnico de El Norte de Castilla wrote: > Yes, but I heard about some magical key combinations that restart systems > even in a kernel panic (Something like Alt + Sys Req + some key) and I > thought that maybe some kernel module waiting for a serial com signal could > d

Re: [Linux-HA] fail count was initialized after recovering fromSplitBrain

2007-09-13 Thread Yan Fitterer
Junko IKEDA wrote: once again something about SplitBrain... During SplitBrain, I wrecked the resource on the both nodes. fail count was increased at this time. But after recovering from SplitBrain, fail count returned to zero on both! Is this due to the restart of crmd or pengine/tengine? Mo

Re: [Linux-HA] fail count was initialized after recovering fromSplitBrain

2007-09-13 Thread Andrew Beekhof
On 9/13/07, Junko IKEDA <[EMAIL PROTECTED]> wrote: > > > once again something about SplitBrain... > > > During SplitBrain, I wrecked the resource on the both nodes. > > > fail count was increased at this time. > > > But after recovering from SplitBrain, fail count returned to zero on > both! > > >

RE: [Linux-HA] fail count was initialized after recovering fromSplitBrain

2007-09-13 Thread Junko IKEDA
> > once again something about SplitBrain... > > During SplitBrain, I wrecked the resource on the both nodes. > > fail count was increased at this time. > > But after recovering from SplitBrain, fail count returned to zero on both! > > Is this due to the restart of crmd or pengine/tengine? > > Mos