[Linux-HA] Resource ldirectord and monitoring

2007-06-04 Thread matilda matilda
Hi all, another question for HAv2: I only have an ldirectord resource with type heartbeat and provider heartbeat. Is it possible to have a monitor operation for that resource? Or is this not valid as this resource agent has no monitor function? How would I monitor this resource? Best regards Andr

[Linux-HA] Need explanation for colocation constraint

2007-06-04 Thread matilda matilda
Hi all, after reading almost all stuff on linux-ha.org, digging around in the mailing list and running many tests on a test cluster installation (HAv2, version 2.0.7 as of SuSE SLES 10) I have run into a phenomenon I can't explain. Perhaps someone can help. What I want: * 3 IP-resources (wor

Antw: Re: [Linux-HA] When removing a resource the crm_mon is not updated

2007-06-04 Thread matilda matilda
Hi Andrew, hi Claes , I'm very new here, but experimented with HAv2 on SLES10 the last days. I found something similar. Is it possible that crm_mon says that there is a resource because it finds one in the LRM tree of the CIB? When I delete a resource with crm_resource, the LRM part is there unti

Antw: Re: Re: [Linux-HA] When removing a resource the crm_mon is not updated

2007-06-04 Thread matilda matilda
Hi Andrew, I have to check. Thanks for the advice. Best regards Andreas Mock >>> "Andrew Beekhof" <[EMAIL PROTECTED]> 04.06.2007 15:26 >>> On 6/4/07, matilda matilda <[EMAIL PROTECTED]> wrote: > Hi Andrew, hi Claes , > > I'm very new

Antw: Re: [Linux-HA] Need explanation for colocation constraint

2007-06-04 Thread matilda matilda
leased SP1, with a vastly improved Heartbeat package. Try reproducing the issue with the upgraded version. matilda matilda wrote: > Hi all, > > after reading almost all stuff on linux-ha.org, digging around in the mailing > list > and doing many tries on a test cluster installation (HAv2, v

Antw: Re: Re: Re: [Linux-HA] When removing a resource the crm_mon is not updated

2007-06-04 Thread matilda matilda
01 >>> btw. there is an option that tells the crm what to do with deleted resources... whether to stop them or ignore them. so what it does is up to the admin On 6/4/07, matilda matilda <[EMAIL PROTECTED]> wrote: > Hi Andrew, > > I have to check. Thanks for the advice

Re: Antw: Re: [Linux-HA] When removing a resource the crm_mon is not updated

2007-06-04 Thread matilda matilda
Beekhof it could be a little bug here. Kindly Regards Claes Lindvall matilda matilda wrote: > Hi Andrew, hi Claes , > > I'm very new here, but experimented with HAv2 on SLES10 the last > days. > I found something similar. Is it possible that crm_mon says > that there is a re

Re: Antw: Re: [Linux-HA] When removing a resource the crm_mon is not updated

2007-06-05 Thread matilda matilda
shows me that the resource was added - now I have "one" > >> > resource in my cluster. > >> > I stop it with "crm_resource -r -p target_role -v > >> > stopped. > >> > It goes down and then I remove the resource with "cibadmin -D -o

[Linux-HA] OCF compliant Resource Agent for ldirectord

2007-06-06 Thread matilda matilda
Hi all, I'm pretty new here. Therefore I hope my offer is not completely meaningless: I want to use ldirectord with the new HAv2 and found out, that there is no OCF compliant Resource Agent for this daemon. (SLES 10, HA 2.0.7) Is this true at the moment or didn't I find it? If it's true, I just

Re: Re: [Linux-HA] OCF compliant Resource Agent for ldirectord

2007-06-11 Thread matilda matilda
Hi Horms, as requested I send the script attached. Best regards Andreas Mock >>> Horms <[EMAIL PROTECTED]> 07.06.2007 02:58 >>> > Hi Andreas, > > In the past there was some work done on ldirectord to make it > an OCF resource, but unfortunately this work is not complete. > In the long term I t

[Linux-HA] Need explanation of several cluster options

2007-07-10 Thread matilda matilda
Hi all, just installed version 2.1.0 of heartbeat and found some messages in log files about using defaults for several cluster options. These are: dc_deadtime cluster_recheck_interval election_timeout shutdown_escalation crmd-integration-timeout crmd-finalization-timeout Where do I find explana

Re: Re: [Linux-HA] Need explanation of several cluster options

2007-07-10 Thread matilda matilda
Hi Lars, thank you for the fast answer. Best regards Andreas Mock >>> Lars Marowsky-Bree <[EMAIL PROTECTED]> 10.07.2007 11:49 >>> On 2007-07-10T11:04:56, matilda matilda <[EMAIL PROTECTED]> wrote: [...] ___ Linux-HA mai

[Linux-HA] Cluster options

2007-07-10 Thread matilda matilda
Hi all, another question to make sure: I found documentation of cluster options in /usr/lib/heartbeat/crm.dtd. All cluster options there are written with underscores, e.g. 'no_quorum_policy'. If I look at the /var/log/ha-log (installed log sink) I do see that the name taken for this option is: 'no-qu

[Linux-HA] Problems with stonith device version 2

2007-07-10 Thread matilda matilda
Hi all, I just configured the stonith device external/ssh as described in http://www.linux-ha.org/ConfiguringStonithPlugins I use version 2.1.0. But I get a warning in /var/log/ha-debug saying that I should run crm_verify. I did it. The output is: =

[Linux-HA] Question concerning certain log entry

2007-07-11 Thread matilda matilda
Hi all, I often find a log entry of the following kind: - WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 240 ms - Can anyone explain what that means? Best regards Andreas Mock

[Linux-HA] Stopping single instances of a clone

2007-07-11 Thread matilda matilda
Hi all, can I stop a single instance of a clone on a certain node? If yes, how is it done? Best regards Andreas Mock ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.or

[Linux-HA] HA 2.1.0 crm_failcount does not work as expected

2007-07-11 Thread matilda matilda
Hi folks, sorry for sending so many questions. You see I'm testing wildly with heartbeat 2.1.0 and I struggle at almost every step. As discussed with the announcement of the www.linux-ha.org/Education it's very hard to help myself because I really don't find the information needed. So probably a

Re: Re: [Linux-HA] HA 2.1.0 crm_failcount does not work as expected

2007-07-11 Thread matilda matilda
Hi Andrew, hi folks, >>> "Andrew Beekhof" <[EMAIL PROTECTED]> 11.07.2007 15:04 >>> > > first of all, run: crm_failcount --help Ok, ok :-) I did it, but I'm sure I missed something. It's time to take a coffee break. I'm very sorry for bothering. > then run: crm_failcount -r 'cs_ocfs-01-clone:

[Linux-HA] Reasonable values for timeouts

2007-07-12 Thread matilda matilda
Hi all, how do I get reasonable values for timeout attributes for certain operations? How can I tune them? Or shall I use the values provided in the RA metadata? Best regards Andreas Mock

Antw: Re: [Linux-HA] Reasonable values for timeouts

2007-07-12 Thread matilda matilda
Hi all, sorry, I forgot this information: I'm working with HAv2 (cib) 2.1.0. Best regards Andreas Mock >>> "Andrew Beekhof" <[EMAIL PROTECTED]> 12.07.2007 13:53 >>> On 7/12/07, matilda matilda <[EMAIL PROTECTED]> wrote: > Hi all, > > how

Re: Re: Re: [Linux-HA] Reasonable values for timeouts

2007-07-12 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 12.07.2007 15:40 >>> > >>> "Andrew Beekhof" <[EMAIL PROTECTED]> 12.07.2007 13:53 >>> > On 7/12/07, matilda matilda <[EMAIL PROTECTED]> wrote: > > Hi all, > > >

Antw: Re: [Linux-HA] Reasonable values for timeouts

2007-07-13 Thread matilda matilda
t fail over ~60 seconds is good. If you > go too low the state machine mechanics can start getting tricky. > > > > On 7/12/07, matilda matilda <[EMAIL PROTECTED]> wrote: > > > > >>> "Andrew Beekhof" <[EMAIL PROTECTED]> 12.07.200

Antw: Re: [Linux-HA] Reasonable values for timeouts

2007-07-13 Thread matilda matilda
Hi Andrew, you wrote: >>> "Andrew Beekhof" <[EMAIL PROTECTED]> 13.07.2007 09:43 >>> > > >i _think_ that the interval is the time between one action ending and the >next one starting (rather than between both starting) > >at least i hope that This is a big difference. So, I would be interested if

[Linux-HA] Tool for querying current resource score

2007-07-14 Thread matilda matilda
Hi Andrew, hi all, I just followed the thread about resource failover because I have difficulty understanding every aspect even after reading the available doc and the answers to related questions on the list. Even if Maxim starts to document this stuff as the new volunteer ;-) it would be g

Antw: Re: [Linux-HA] Tool for querying current resource score

2007-07-15 Thread matilda matilda
Hi Lars, >>> Lars Marowsky-Bree <[EMAIL PROTECTED]> 14.07.07 21.17 Uhr >>> >>On 2007-07-14T17:49:53, matilda matilda <[EMAIL PROTECTED]> wrote: >> So, is there a tool to read the current scores of resources? > >ptest can do what you want. If you ru

Antw: Re: Re: [Linux-HA] Tool for querying current resource score

2007-07-16 Thread matilda matilda
> "Andrew Beekhof" <[EMAIL PROTECTED]> 16.07.2007 08:16 >>> On 7/15/07, matilda matilda <[EMAIL PROTECTED]> wrote: > > Hi Lars, > > >>> Lars Marowsky-Bree <[EMAIL PROTECTED]> 14.07.07 21.17 Uhr >>> > >>On 2007-07-14T17:49:53, ma

Re: Re: [Linux-HA] Resource stickiness count is incremental ?

2007-07-16 Thread matilda matilda
"Andrew Beekhof" <[EMAIL PROTECTED]> 16.07.2007 08:19 >>> >On 7/16/07, Maxim Veksler <[EMAIL PROTECTED]> wrote: >> to 15 times before it migrates to a second node. I would to have a >> fixed "stickiness" value for each resource - that is I would like each >> of them to a maxim fail count of 5.

Re: Re: Re: Re: [Linux-HA] Tool for querying current resource score

2007-07-16 Thread matilda matilda
>>> Lars Marowsky-Bree <[EMAIL PROTECTED]> 16.07.2007 09:44 >>> > On 2007-07-16T09:10:32, matilda matilda <[EMAIL PROTECTED]> wrote: > >(Andreas, is that From: line intentional? ;-) I was forced to take a generic account. As I knew that the mailer would t

Re: [Linux-HA] Confusion about MailTo RA and monitoring

2007-07-16 Thread matilda matilda
>>> Peter Kruse <[EMAIL PROTECTED]> 16.07.2007 10:58 >>> >According to http://www.linux-ha.org/OCFResourceAgent >a Resource Agent is required to support the monitor >action. But in the MailTo agent I find: > >ocf_log warn "Don't stat/monitor me! MailTo is a pseudo resource agent, >so the status r

Re: Re: [Linux-HA] Confusion about MailTo RA and monitoring

2007-07-16 Thread matilda matilda
>>> Lars Marowsky-Bree <[EMAIL PROTECTED]> 16.07.2007 11:31 >>> > >Even if no "monitor" operation is configured, the cluster will do >startup probes to find out whether the resource is running somewhere and >in what state. > >If those calls blindly return "no error" (ie, 0, success), the cluster >

[Linux-HA] Probably metadata missing

2007-07-16 Thread matilda matilda
Hi all, I'm using HAv2 2.1.0. When I'm doing a 'crm_verify -VV -L' I get the following output: == crm_verify[8659]: 2007/07/16_15:40:14 notice: main: Required feature set: 1.1 crm_verify[8659]: 2007/07/16_15:40:14 notice: cluster_opti

Re: Re: Re: [Linux-HA] Resource stickiness count is incremental ?

2007-07-16 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 16.07.2007 18:09 >>> >On 7/16/07, matilda matilda <[EMAIL PROTECTED]> wrote: > >The failover stuff isn't always the easiest to figure out, trial-and-error >can be a valid form of exploration too :-)

Re: [Linux-HA] WARN: unpack_rsc_op:

2007-07-18 Thread matilda matilda
>>> "Taldevkar, Chetan" <[EMAIL PROTECTED]> 18.07.2007 07:23 >>> >Hi all, > >When I start cluster linux-ha is able to invoke start call on both the >nodes. On first node start fails as script calls echo "stopped" >followed by exit 1 as this resource needs to be running on the second >node. After tha

Re: Re: [Linux-HA] Don't get back resource again

2007-07-18 Thread matilda matilda
>>> "Maxim Veksler" <[EMAIL PROTECTED]> 18.07.2007 18:13 >>> > I myself would appreciate a more experienced answer from the gurus of the > list. Hi all, I'm happy (or actually not) to see that there are others who struggle with the whole stickiness thing and score calculation. Therefore I can o

Re: Re: [Linux-HA] Cluster Monitoring tool

2007-07-23 Thread matilda matilda
>>> Michael Schwartzkopff <[EMAIL PROTECTED]> 24.07.2007 06:27 >>> >Am Dienstag, 24. Juli 2007 05:08 schrieb Gokak, Arun Madhukar: >Hi, > >A SNMP Subagent is included in the project. No need to invent the wheel again. >Look for SNMP or hbagent in the documentation of the sources. Works quite >goo

Re: [Linux-HA] Re: Spurious failover?

2007-07-23 Thread matilda matilda
Hi Carlo, just a hint. We discussed here how low someone should set the timeout value for the monitor action. And there were many voices saying that it's not a good idea to set this value too low. So, in your case probably a timeout of 5s is too low. I got the same things when I thought a timeo

[Linux-HA] Problems with colocation rule and group

2007-07-23 Thread matilda matilda
Hi all, I do have a strange behaviour with HAv2 2.1.0 in the following situation: * 2 nodes in cluster, all fine * 1 IP resource, works on its own * 1 group consisting of 4 Filesystem resources * default-resource-stickiness = 50 * default-resource-failure-stickiness = -101 * there are two constr

[Linux-HA] OCF-RA db2

2007-07-23 Thread matilda matilda
Hi all, hi Alan, I wanted to use the OCF-RA db2 as found in HAv2 2.1.0. What I found out: The whole RA assumes that the home directory of the OS user holding the runtime environment of the DB2 instance is available on every node. Almost any action/operation of the RA relies on that. This is a pro

[Linux-HA] Need a rule

2007-07-25 Thread matilda matilda
Hi all, I want to accomplish the following behaviour for two resources A and B: * If A is running and B will be started, B should run on the same node as A * If A is NOT started and B will be started, B can be run wherever it wants. * If A is not running, B is running on node X, A is started on n

Re: Re: [Linux-HA] Need a rule

2007-07-25 Thread matilda matilda
>>> Max Hofer <[EMAIL PROTECTED]> 25.07.2007 12:12 >>> > This is a simple asymmetric co-location rule (since 2.0.8 all co-location > rules are asymmetric). > > This means write a rule that B follows A: > Hi Max, that is exactly what I thought, too. BUT: With this constraint, if A is NOT starte

Re: Re: [Linux-HA] onboard NIC looses IP on reboot

2007-07-25 Thread matilda matilda
>>> Rudi Ahlers <[EMAIL PROTECTED]> 25.07.2007 13:29 >>> ><6>eth1: SiS 900 PCI Fast Ethernet at 0xa800, IRQ 225, 00:0f:ea:c3:3c:55. ><6>eth0 renamed to ethxx0 ><6>eth1 renamed to eth0 ><6>ethxx0 renamed to eth1 Hi Rudi, only my guess: 1) Delete all entries in the file /etc/udev/rules.d/??-net_per

Re: Re: [Linux-HA] Need a rule

2007-07-25 Thread matilda matilda
Hi Max, >>> Max Hofer <[EMAIL PROTECTED]> 25.07.2007 14:33 >>> > It really would help if you just attached the CIB before and after you > started > resource A. ;-) I thought it would be easier this way. :-)) > [...many many valuable information...] Thank you for the information and the ti

Re: Re: [Linux-HA] Need a rule

2007-07-25 Thread matilda matilda
Hi Max, >>> Max Hofer <[EMAIL PROTECTED]> 25.07.2007 15:55 >>> > [...] (not to mention that my english is not the best). [...] You can write it in Austrian if you like... ;-)) Best regards Andreas Mock

Re: Re: [Linux-HA] Need a rule (added)

2007-07-25 Thread matilda matilda
Hi Max, hi all, for all who want to follow up, attached the needed files: * cluster-options.xml : all options explicit * r-A.xml (resource definition and constraints for resource A) * r-B.xml (resource definition and constraints for resource B) * WARNING: My nodes are called db01 and db02. You ha

[Linux-HA] Question concerning current HA versions

2007-07-27 Thread matilda matilda
Hi all, the new "official" HA version 2.1.1 is out. I tried to install it on SLES 10 SP1 to find out that there are missing dependencies (mostly perl stuff) which can't be resolved by standard package sources. So, my question is: 1) Is 2.1.1 better/newer/more bug free than the version 2.1.0 I can

Re: [Linux-HA] Question concerning current HA versions

2007-07-27 Thread matilda matilda
>>> Jan Kalcic <[EMAIL PROTECTED]> 07/27/07 2:28 PM >>> > Just a quick note. Novell no longer supports your system if it is > updated through an unofficial channel. Hi Jan, thank you for that hint. I do know this. Your comment can be a good starting point to discuss that in general. ;-) a

Antw: Re: [Linux-HA] Question concerning current HA versions

2007-07-30 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 30.07.2007 10:05 >>> > > I'll be refreshing the packages there in the next week or so once > things calm down. Hi Andrew, thank you for your answer and for updating the packages. ;-) Best regards Andreas Mock

[Linux-HA] Correct recovering from state: RS stopped because couldn't be started

2007-07-31 Thread matilda matilda
Hi all, environment: HAv2 (2.1.0) Can anybody tell how to recover correctly from the following situation: * HA comes up * One resource couldn't be started on every node * This resource is shown as stopped in crm_mon * Failcount is 0 (Yes, I read that this should change) How do I tell HA to retr

Re: Re: [Linux-HA] Correct recovering from state: RS stopped because couldn't be started

2007-07-31 Thread matilda matilda
>>> "Andreas Kurz" <[EMAIL PROTECTED]> 31.07.2007 12:22 >>> > I think "crm_resource -C -r yourresource" should do this job. Hi Andreas, thank you for your reply. I thought this too, but: 1) the online usage screen of crm_resource says:

Re: Re: Re: [Linux-HA] Correct recovering from state: RS stopped because couldn't be started

2007-08-01 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 01.08.2007 16:17 >>> > > you should use --force until i refresh the packages next week Hi Andrew, thank you for answering. Best regards Andreas Mock

[Linux-HA] Certain log message filling the log files

2007-08-06 Thread matilda matilda
Hi all, I'm using HAv2 (2.1.0) and log messages of the kind: lrmd: [4715]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x517bb8) are filling up my log files. At the moment I do have more than 3 lines of that. The log file started
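To judge how often warnings like the one quoted above occur, and how far over the limit they go, a small script can tally them. The following is only a sketch: the regular expression is modeled on the single log line shown in this message, the function name `summarize_slow_dispatch` is hypothetical, and other Heartbeat versions may format the line differently.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in the message above.
# Assumption: other occurrences follow the same field layout.
SLOW_DISPATCH = re.compile(
    r"(?P<proc>\w+): \[(?P<pid>\d+)\]: WARN: G_SIG_dispatch: "
    r"Dispatch function for (?P<signal>SIG\w+) took too long to execute: "
    r"(?P<took>\d+) ms \(> (?P<limit>\d+) ms\)"
)


def summarize_slow_dispatch(lines):
    """Count matching warnings per (process, signal) and record the worst delay in ms."""
    counts = Counter()
    worst = {}
    for line in lines:
        match = SLOW_DISPATCH.search(line)
        if match is None:
            continue  # not a "took too long" warning
        key = (match.group("proc"), match.group("signal"))
        counts[key] += 1
        worst[key] = max(worst.get(key, 0), int(match.group("took")))
    return counts, worst


# Sample input: the quoted warning plus one made-up unrelated line.
sample = [
    "lrmd: [4715]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD "
    "took too long to execute: 50 ms (> 30 ms) (GSource: 0x517bb8)",
    "crmd: [9975]: info: some unrelated message",
]
counts, worst = summarize_slow_dispatch(sample)
print(counts, worst)
```

To run it against a real log, pass an open file object instead of `sample`, for example `summarize_slow_dispatch(open("/var/log/ha-log"))`; that path is the log sink mentioned elsewhere in this archive and may differ on other installations.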

Antw: [Linux-HA] Certain log message filling the log files

2007-08-07 Thread matilda matilda
>>> "matilda matilda" <[EMAIL PROTECTED]> 06.08.2007 19:45 >>> > Hi all, > > I'm using HAv2 (2.1.0) and log messages of the kind: > lrmd: [4715]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too > long to execute: 50 ms (> 30

Re: Re: [Linux-HA] Certain log message filling the log files

2007-08-07 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 07.08.2007 14:52 >>> > either the lrmd took too long to do something or the clplumbing > library is being overly sensitive... but i'm no lrmd expert so i dont > know which Hi Andrew, thank you for trying to help me. I don't understand why I do get these

[Linux-HA] Self fencing with HAv2 + OCFS2

2007-08-07 Thread matilda matilda
Hi all, I did some tests with HAv2 (2.1.0) and OCFS2 on SLES SP1. Sometimes it happens that - if I turn off one node of the 2-node-cluster immediately - the other node does self fencing initiated by the OCFS stack. So, now I'm wondering. I thought, that HAv2 takes the administrative role of the O

Re: Re: Re: [Linux-HA] Certain log message filling the log files

2007-08-07 Thread matilda matilda
>>> "Andreas Kurz" <[EMAIL PROTECTED]> 07.08.07 20.07 Uhr >>> > I am running a two-node test system with heartbeat 2.1.2 which logged > the same lrmd messages on one node (not DC). Because it is not a live > system I simply killed the local lrmd to test the automatic respawn of > heartbeat ... it w

[Linux-HA] Shutdown with rcheartbeat stop hangs

2007-08-08 Thread matilda matilda
Hi all, it's just me again. :-) Using HAv2 (2.1.1), two nodes configured db01, db02. DC was db02, both nodes had nothing to do besides monitoring the resources. Trying to bring down one node with rcheartbeat stop. This action didn't move on. When I looked at the logs I found the following entry:

Re: [Linux-HA] problems with lrmd after kill & respawn

2007-08-09 Thread matilda matilda
>>> "Andreas Kurz" <[EMAIL PROTECTED]> 08/09/07 4:34 PM >>> > The only way to resolve this issue was a restart of heartbeat on the > node with the stuck lrmd and a simple "/etc/init.d/heartbeat stop" was > running more than 20 minutes without success. Has someone an idea > whats wrong? Hi all, I

[Linux-HA] Don't know how to interpret log entries

2007-08-10 Thread matilda matilda
Hi all, using HAv2 (2.1.0) I found the following log entries: crmd[9975]: 2007/08/10_04:51:59 info: do_lrm_rsc_op: Performing op=cs_stonith-db02-clone:0_notify_0 key=161:23:ef743132-acab-4f9f-a051-b7d16503994b) lrmd[28686]: 2007/08/10_04:51:59 ERROR: sending stonithRA op to stonithd failed. sto

[Linux-HA] Urgent question about clone ordering

2007-08-10 Thread matilda matilda
Hi all, an urgent question before weekend (HAv2, 2.1.0): In the crm.dtd I read for clones: --- * ordered Start (or stop) each clone only after the operation on the previous clone completed. --- Is this also valid for the monitor operation? And other operations? Background i

Re: Re: [Linux-HA] Output of crm_mon in different format

2007-08-10 Thread matilda matilda
>>> Dejan Muhamedagic <[EMAIL PROTECTED]> 10.08.2007 14:16 >>> > Output of crm_mon is an end product, so to speak. It's nice to > get a quick overview of the cluster status, but otherwise it was > not meant to be parsed. That's the reason why I asked to make it parseable. :-) > You can get most o

Antw: Re: Re: [Linux-HA] Don't know how to interpret log entries

2007-08-10 Thread matilda matilda
>>> Dejan Muhamedagic <[EMAIL PROTECTED]> 10.08.2007 14:04 >>> > Isn't that the other way around: one actually has to specify a > notify operation in the CIB, i.e. nobody's notified by default. Hi Dejan, you're absolutely right in that. A copy and paste brought 'notify=true' in. You can be sure I

[Linux-HA] Output of crm_mon in different format

2007-08-10 Thread matilda matilda
Hi Andrew, hi all, I'm interested in getting the information I can get by invoking 'crm_mon -1 -r' in a different format or in a different way. Background is, that I want to script some little helpers to make life easier with HAv2. I started to parse the output of the command above, which is not

Antw: Re: Re: [Linux-HA] Oh my god, I got a core dump

2007-08-10 Thread matilda matilda
>>> "matilda matilda" <[EMAIL PROTECTED]> 10.08.2007 12:52 >>> >>> "Andrew Beekhof" <[EMAIL PROTECTED]> 10.08.2007 12:37 >>> > always Ok, you wanted it. :-)) Hi Andrew, ha ha, I also found one on the other node at a tota

[Linux-HA] Oh my god, I got a core dump

2007-08-10 Thread matilda matilda
Hi all, me again. Using HAv2 (2.1.0). Have a nice story for you: As I looked at my heartbeat cluster with crm_mon -1 -r I was surprised to see 2 nodes up and NO, really NO resources. I had 20 up and running. Uppps, probably a self-cleaning feature I didn't know before. ;-)) I recognized that I

Re: Re: [Linux-HA] Oh my god, I got a core dump

2007-08-10 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 10.08.2007 12:37 >>> > always Ok, you wanted it. :-)) > no files, we need a stack trace (which can be extracted from the core > file using gdb but only from the machine that generated it) > > which process? It was crmd. Here is the output of command b

[Linux-HA] Bug filed

2007-08-10 Thread matilda matilda
Hi all, as promised, I filed a bug at bugzilla concerning core dumping. http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1682 Best regards Andreas Mock

Re: Re: [Linux-HA] problems with lrmd after kill & respawn

2007-08-10 Thread matilda matilda
>>> Dejan Muhamedagic <[EMAIL PROTECTED]> 10.08.2007 13:50 >>> > I'd appreciate if you could turn on debugging and create a > bugzilla and attach the resulting log. It is rather hard to > figure out why a signal handler is invoked late. Perhaps the > debug info could help. Hi Dejan, I will take t

Re: Re: [Linux-HA] Shutdown with rcheartbeat stop hangs

2007-08-10 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 10.08.2007 12:55 >>> >> How can I find out why it doesn't happen? > > mostly, by reading the logs > did db02 respond? > do you see the text "stalling the FSA" anywhere (towards the end)? Hi Andrew, what do you think about a nice screencast with the title:

Re: Re: [Linux-HA] Don't know how to interpret log entries

2007-08-10 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 10.08.2007 13:30 >>> > i think it means stonith agents dont support the notify action. > just turn it off in the CIB for that clone. I think you're right. Document http://linux-ha.org/ExternalStonithPlugins doesn't say anything about notify operation. So t

Re: Re: [Linux-HA] Output of crm_mon in different format

2007-08-10 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 10.08.2007 14:55 >>> >>> What do you think besides having no time, Andrew? > > I'm happy to include other output modes in crm_mon.. just send me a patch :-) Ok, ok, ok. Let's make a deal: * You include the code fragment needed to query different resource a

Re: Re: [Linux-HA] Urgent question about clone ordering

2007-08-13 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 13.08.2007 11:38 >>> >> Is this also valid for the monitor operation? And other operations? > >nope I feared that. > >> >> Background is: Every clone of a stonith RA is trying to get the status of >> the stonith device. >> Most of them can be contacted ON

Re: Re: Re: [Linux-HA] Urgent question about clone ordering

2007-08-13 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 13.08.2007 13:45 >>> > as of the version i'm in the middle of packaging for the opensuse > build service, this wont be an issue. these types of stonith > configurations (non-clone) will be just as reliable as any other. Hi Andrew, in which way? Can you gi

Re: Re: Re: Re: [Linux-HA] Urgent question about clone ordering

2007-08-13 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 13.08.2007 15:14 >>> > We will ensure that: > a) all STONITH resources that can be active, are active before we > issue a STONITH request I really don't understand this, sorry. I thought, that on every node of the cluster the stonithd is started which start

[Linux-HA] Logging semantics changed from HA 2.1.0 to HA 2.1.2 (opensuse build)

2007-08-13 Thread matilda matilda
Hi Andrew, hi all, I updated from HA 2.1.0 to HA 2.1.2 (opensuse build) on SLES10 SP1. What I found first is, that without changing logd.cf and entry in /etc/syslog-ng/syslog-ng.conf I only get logging messages from logd and heartbeat where I got messages from all related processes before: My l

[Linux-HA] Need hint for rule HAv2

2007-08-14 Thread matilda matilda
Hi all, I'm using HAv2 and after trying several things I'm stuck and I don't know if it's even possible. I can only achieve a subset of my requirements but not all together. As I think that my requirement is really common I hope that you can help me or give me the right hints. Probably someone has

Re: Re: Re: Re: Re: [Linux-HA] Urgent question about clone ordering

2007-08-14 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 14.08.2007 12:23 >>> > does this help? Hi Andrew, thank you for the explanation. It clarifies some aspects. But back to my initial question. If I configure a plugin 1 which is able to shoot node 2 and a plugin 2 which is able to shoot node 1, is it meaningf

Re: Re: [Linux-HA] WARN: status_from_rc: Action monitor on node failed (target: 7 vs. rc: 0): Error

2007-08-14 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 14.08.2007 12:16 >>> > if you're writing an OCF agent, try ocf-tester > if you're not writing an OCF agent... you should be :-) A little bit long, but a nice slogan for a T-shirt. :-))

Re: Re: [Linux-HA] Logging semantics changed from HA 2.1.0 to HA 2.1.2 (opensuse build)

2007-08-14 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 14.08.2007 12:29 >>> > Alan changed the default log facility to LOG_DAEMON (instead of LOG_LOCAL7). > That is likely to be the change that is affecting you (assuming you > set "use_logd true" in ha.cf). Hi Andrew, thank you very much. This is exactly the p

Antw: Re: [Linux-HA] Need hint for rule HAv2

2007-08-14 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 14.08.2007 15:02 >>> >On 8/14/07, matilda matilda <[EMAIL PROTECTED]> wrote: >odd - that should work already >oh, try setting the priority of Y so that its greater than X >(resources are processed in or

Re: Re: [Linux-HA] Logging semantics changed from HA 2.1.0 to HA 2.1.2 (opensuse build)

2007-08-16 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 14.08.2007 12:29 >>> > Alan changed the default log facility to LOG_DAEMON (instead of LOG_LOCAL7). > That is likely to be the change that is affecting you (assuming you > set "use_logd true" in ha.cf). Hi Andrew, I just checked my configuration. I use a /

Re: Re: Re: [Linux-HA] Logging semantics changed from HA 2.1.0 to HA 2.1.2 (opensuse build)

2007-08-16 Thread matilda matilda
>>> "Andrew Beekhof" <[EMAIL PROTECTED]> 16.08.2007 13:45 >>> > On 8/16/07, matilda matilda <[EMAIL PROTECTED]> wrote: > Yes. Alan broke it in http://hg.beekhof.net/lha/crm-dev/rev/1f454f857ee8 > Fixed in: >http://hg.beekhof.net/lha/crm-de

Re: Re: [Linux-HA] Need a rule (added)

2007-08-16 Thread matilda matilda
>>> "Christian Rishøj" <[EMAIL PROTECTED]> 08/16/07 11:11 PM >>> > Just to further my understanding of colocation constraints: Does this > also imply that B will stop/relocate if A fails, or does it take an >extra rule to achieve that? If you have resource A and B and an asymmetrical colocation con

[Linux-HA] Documentation

2007-08-17 Thread matilda matilda
Hi all, it's not a secret that the documentation of HAv2 is not the best. As some guys here on the list were very helpful to me the last days/weeks (especially Andrew, Lars, Max, Eddie) I would like to add some documentation/snippets. So, I really don't know where to start. The structure of the

[Linux-HA] Manual failback

2007-08-17 Thread matilda matilda
Hi all, now I reached a new level in the HAv2 (2.1.2) adventure game. :-) One question here: After failover my resources stick at the second node, which is really what I wanted. If the first and primary node is back up again, how can I simply switch the cluster and all resources manually back t

Antw: Re: [Linux-HA] Documentation

2007-08-17 Thread matilda matilda
>>> Sander van Vugt <[EMAIL PROTECTED]> 17.08.2007 10:56 >>> > If anyone can explain HOW I can edit this page, we will have the start > of a decent documentation soon! Andreas, keep an eye on this mail thread > and it will become clear soon enough how to add things to this page. Hi Sander, do you

Re: Antw: Re: [Linux-HA] Documentation

2007-08-17 Thread matilda matilda
Hi all, thank you for your feedback on my initial question. Now I had the time to look at the Novell documentation. Here's my personal conclusion to the question about wiki (WD) vs. Novell documentation (ND). All my experiences are related to HAv2: * ND is well structured. You just can start to

Antw: [Linux-HA] Resources in UNMANAGED state

2007-08-20 Thread matilda matilda
>>> "Christian Rishøj" <[EMAIL PROTECTED]> 20.08.2007 15:00 >>> > I suspect it happens when a monitor action times out. What other > likely causes can there be? Hi Christian, there was once a discussion about proper timeout values. And the experts here came to the conclusion that it's much better

Antw: Re: [Linux-HA] Documentation

2007-08-20 Thread matilda matilda
>>> Max Hofer <[EMAIL PROTECTED]> 20.08.2007 14:27 >>> > Current State: a PDF version is currently 30 pages long containing 20% of the > content I would like to put in ---> I intended to write an article but it > starts to become a book. Hi Max, I would like to see the current state. Is it pos

Re: [Linux-HA] How to keep the "takeover time" in 1 second?

2007-08-23 Thread matilda matilda
>>> "mingdao lu" <[EMAIL PROTECTED]> 23.08.2007 11:23 >>> > [...] the deadtime, warntime and keepalive time > to keep the "takeover time" in one second? U. This is a real challenge. I'm also interested in an answer. :-) Best regards Andreas

Re: Re: [Linux-HA] How to keep the "takeover time" in 1 second?

2007-08-23 Thread matilda matilda
>>> "Maxim Veksler" <[EMAIL PROTECTED]> 23.08.2007 13:50 >>> > > Clones + Active replication between all nodes ? Hi Maxim, do you have a scenario up and running (with such short failover times)? If yes, could you tell us more about the config? It sounds really interesting. Best regards Andreas

Re: [Linux-HA] Filesystem ocf problem

2007-08-23 Thread matilda matilda
>>> Ben Clewett <[EMAIL PROTECTED]> 23.08.2007 17:06 >>> > If any kind linux-ha member can let me know how this is done, it would > be very useful. Hi Ben, you can increase the timeout by setting an explicit timeout value for the start operation (probably stop and monitor also). Look at the cur

Re: Re: [Linux-HA] Filesystem ocf problem

2007-08-23 Thread matilda matilda
>>> Ben Clewett <[EMAIL PROTECTED]> 23.08.2007 17:38 >>> > The reason for this email is just to note to the group that a large file > system under drbd mounted by the latest linux-ha will cause a problem > which can only be sorted by a complete re-boot. > > I hope this will be useful, and somebo

Re: Re: [Linux-HA] Filesystem ocf problem

2007-08-23 Thread matilda matilda
>>> Hannes Dorbath <[EMAIL PROTECTED]> 24.08.2007 00:20 >>> > # time mount /raid/ > real 0m0.023s > user 0m0.001s > sys 0m0.007s > > # df -h|grep raid > /dev/mapper/raid-data 2.8T 1021G 1.8T 37% /raid > > > No offense, but as you can see, a 3 TB file system is mounted in 0.023s. > Ma

Re: Re: [Linux-HA] Resource monitoring question

2007-08-24 Thread matilda matilda
>>> Ben Clewett <[EMAIL PROTECTED]> 24.08.2007 10:33 >>> > I hope this is a useful suggestion. I based my own custom script on this > simple and well written example: > > /etc/ha.d/resource.d/drbddisk > > This shows the expected input and expected outputs, and is so short it > makes an excellent

Re: [Linux-HA] Filesystem ocf problem

2007-08-28 Thread matilda matilda
>>> Ben Clewett <[EMAIL PROTECTED]> 08/28/07 4:31 PM >>> >> http://www.linux-ha.org/ClusterResourceManager/DTD1.0/Annotated >> >> or, actually: >> >> http://hg.linux-ha.org/dev/file/tip/crm/crm-1.0.dtd > > Thanks, I'm printing this out now :) Hi Ben, I couldn't help smiling, because I sent y

Re: [Linux-HA] Trying to understand: resource cant run anywhere

2007-09-07 Thread matilda matilda
>>> Dave Augustus <[EMAIL PROTECTED]> 07.09.2007 18:28 >>> > What log tells me more details about WHY the sshd resource failed? I hope you have enabled logd and you're logging to some log destination. Look at the node where the resources initially ran. I guess that the monitoring action had an err

Re: [Linux-HA] R1 to R2 testing: cib.xml & ldirectord questions for 2 node cluster

2007-09-11 Thread matilda matilda
>>> "Peter Farrell" <[EMAIL PROTECTED]> 09/11/07 6:07 PM >>> >The only odd thing I didn't find anywhere grep'ing through the lists >and Google was the 'WARN: There is something wrong' message in the log >files. >If you recall the crm_mon said that 'ldirectord start' was failing - >but it wasn't tr

Re: [Linux-HA] R1 to R2 testing: cib.xml & ldirectord questions for 2 node cluster

2007-09-11 Thread matilda matilda
>>> "Peter Farrell" <[EMAIL PROTECTED]> 09/11/07 10:11 PM >>> Hi Peter, 1) In version 2.1.2 the mentioned script, i.e. the OCF-compliant RA, should be part of the distribution. You can also use the one posted. 2) Yes, you have to create another resource definition for that ldirectord resource. I'm pre

[Linux-HA] CPU time consumption of lrmd

2007-09-12 Thread matilda matilda
Hi all, after running HAv2 (2.1.2) for a while now (thanks for all the necessary advice and help) I found that lrmd is consuming a relatively large amount of CPU time, as reported by 'ps'. I just want to ask what the technical reason is and whether this is plausible. Facts: * 14 resources are in started state, that means for

Re: Re: [Linux-HA] R1 to R2 testing: cib.xml & ldirectord questions for 2 node cluster

2007-09-13 Thread matilda matilda
>>> "Peter Farrell" <[EMAIL PROTECTED]> 12.09.2007 17:24 >>> > Andreas - > > A follow up if you will... Hi Peter, do you need further assistance? I saw that Dejan helped you. Is everything fine and running? In which mode do you use ldirectord? (NAT?) Best regards Andreas Mock
