Re: [Linux-HA] The suicide stonith plugin doesn't work with 2.1.3 (?)

2008-02-03 Thread Lars Marowsky-Bree
On 2008-02-01T15:29:55, Dejan Muhamedagic [EMAIL PROTECTED] wrote: It turns out that the suicide stonith plugin doesn't work with crm in v2.1.3. The reason is crm stopping all managed resources on the node before it is fenced. However, when the suicide stonith resource is moved

Re: [Linux-ha-dev] Re: RFC: Roadmap for 2.2.0

2008-01-31 Thread Lars Marowsky-Bree
On 2008-01-31T12:10:09, Andrew Beekhof [EMAIL PROTECTED] wrote: The fact that we have non-SUSE packages for OpenAIS, Heartbeat and Pacemaker on the build service should underline our commitment to supporting those parts of the community that do not run SUSE products. This cannot be

Re: [Linux-HA] SLES 10 SP1 and Heartbeat

2008-01-31 Thread Lars Marowsky-Bree
On 2008-01-30T19:43:48, Bryan Manzeck [EMAIL PROTECTED] wrote: I need to pass this along as I worked on this a long time and finally got it working. I have two HP DL385G2 Servers running SLES 10 SP1 and will NOT configure STONITH and HP iLO card with the software included. The iLO hardware

Re: [Linux-ha-dev] RFC: Roadmap for 2.2.0

2008-01-30 Thread Lars Marowsky-Bree
On 2008-01-29T09:23:35, Serge Dubrouski [EMAIL PROTECTED] wrote: How will be *.spec files organized and who will support them? Before this release there was just one heartbeat.spec file that one could use to build RPMs, how is it going to be handled in the future? Well, in what I outlined,

Re: [Linux-ha-dev] Re: RFC: Roadmap for 2.2.0

2008-01-30 Thread Lars Marowsky-Bree
On 2008-01-30T20:30:44, Tadashiro Yoshida [EMAIL PROTECTED] wrote: I understand that each project, PaceMaker and narrowly-defined Heartbeat, maintains own package. We then need an integrator to package those into broadly-defined Heartbeat. No, this is not correct. You do not need an

Re: [Linux-ha-dev] Re: RFC: Roadmap for 2.2.0

2008-01-30 Thread Lars Marowsky-Bree
On 2008-01-31T08:24:08, Tadashiro Yoshida [EMAIL PROTECTED] wrote: But from our experiences, it is not enough for enterprise use from the view of quality. Someone should integrate plural packages and test it intensively. It is efficient if all of community member can test one integrated

[Linux-ha-dev] RFC: Roadmap for 2.2.0

2008-01-29 Thread Lars Marowsky-Bree
Hi all, I'd like to propose the following changes to happen in the next heartbeat release, which I'd name 2.2.0 because of them. As the formerly-known-as-CRM component is now developed as the PaceMaker project, the corresponding code should be removed from the heartbeat project itself, as it

Re: [Linux-ha-dev] Stonith device development

2008-01-29 Thread Lars Marowsky-Bree
On 2008-01-28T09:04:45, chris barry [EMAIL PROTECTED] wrote: Thanks Andrew. I am aware of this agent. It does not however work using the viperltoolkit APIs, nor can it use VirtualCenter, so it's not a workable option for me - hence the fenced script ;). In a clustered ESX environment that

Re: [Linux-HA] URGENT: Problem with configuration of STONITH device

2008-01-27 Thread Lars Marowsky-Bree
On 2008-01-27T14:06:15, Andreas Mock [EMAIL PROTECTED] wrote: just want to add another issue to think about: Some people want to use the whole HA infrastructure to have single node clusters. In this case HA is not able to fence itself when a resource gets crazy: You will find I explicitly

[Linux-HA] Re: [Pacemaker] Ordered attribute of clone resources

2008-01-26 Thread Lars Marowsky-Bree
On 2008-01-26T21:29:39, Andreas Mock [EMAIL PROTECTED] wrote: the crm.dtd says about the ordered-attribute: -8-- ordered * Start (or stop) each clone only after the operation on the previous clone completed.

Re: [Linux-HA] STONITH agent only on DC?

2008-01-26 Thread Lars Marowsky-Bree
On 2008-01-26T20:50:12, Michael Schwartzkopff [EMAIL PROTECTED] wrote: Hi, as I understand it, the STONITH operation is always initiated by the DC. Is it possible to run the STONITH agent only on the DC instead of having a clone resource and running the STONITH agent on every node? Does it

Re: [Linux-HA] Re: [Pacemaker] Ordered attribute of clone resources

2008-01-26 Thread Lars Marowsky-Bree
On 2008-01-26T22:32:21, Andreas Mock [EMAIL PROTECTED] wrote: this can have an impact on monitoring stonith resource agents if you want to use clones. There are many stonith devices which allow exactly one connection at a time. The monitor action is often implemented to connect to

Re: [Linux-HA] STONITH agent only on DC?

2008-01-26 Thread Lars Marowsky-Bree
On 2008-01-26T22:26:30, Andreas Mock [EMAIL PROTECTED] wrote: I think a piece of information is valuable at this point. As far as Dejan told me, stonithd would never trigger a stonith resource agent to shoot the node on which the stonith resource agent is running. That would appear to be a

Re: [Linux-HA] URGENT: Problem with configuration of STONITH device

2008-01-26 Thread Lars Marowsky-Bree
On 2008-01-15T12:55:50, Dejan Muhamedagic [EMAIL PROTECTED] wrote: Yes. One serious problem in this case is that the cluster can never know if the stonith operation was successful. Which would basically render the cluster unusable. So, the upper question is probably NO by design?

Re: [Linux-HA] Explanation of node's standby status

2008-01-25 Thread Lars Marowsky-Bree
On 2008-01-25T12:02:05, matilda matilda [EMAIL PROTECTED] wrote: Andrew Beekhof [EMAIL PROTECTED] 25.01.2008 11:10 command 1 tells the cluster that the node is not available to run resources command 2 says that it can (again) run resources I just wanted to know, what standby means. I

Re: [Linux-ha-dev] Explain of some undocumented parameters to RA ?

2008-01-22 Thread Lars Marowsky-Bree
On 2008-01-22T17:38:14, Xinwei Hu [EMAIL PROTECTED] wrote: Hi all, I'm drafting a document on parameters passed to RA. Besides those mentioned on opencf.org and linux-ha.org, I found a lot of CRM_meta_xxx parameters which aren't explained anywhere (or I just missed them ;-/) Here's a list of

Re: [Linux-ha-dev] [PATCH] Process monitor daemon

2008-01-22 Thread Lars Marowsky-Bree
On 2008-01-22T17:37:15, Keisuke MORI [EMAIL PROTECTED] wrote: The background of why we developed this tool is that: 1) We want to detect a process failure asynchronously, not only by the periodic monitor operations, to cause a failover faster to minimize the service downtime. Right,

Re: [Linux-HA] log output confusion

2008-01-21 Thread Lars Marowsky-Bree
On 2008-01-21T15:37:17, DAIKI MATSUDA [EMAIL PROTECTED] wrote: Hi, All. I encountered odd behaviour for log output. I wrote the following config in ha.cf: logfile /var/log/ha-log debugfile /var/log/ha-debug #logfacility local0 ... But the logs are written to the system log file. In ha.cf

Re: [Linux-ha-dev] [PATCH] Process monitor daemon

2008-01-18 Thread Lars Marowsky-Bree
On 2008-01-16T18:48:06, Keisuke MORI [EMAIL PROTECTED] wrote: Hello all, We have developed a new feature that detects a process failure directly to reduce the failover time. If you're interested, please try this and give me your comments. See attached README for details about how to

Re: [Linux-HA] ERROR: clone_unpack: fencing has too many children. Only the first (apache-01-fencing) will be cloned.

2008-01-18 Thread Lars Marowsky-Bree
On 2008-01-14T20:17:53, Thomas Glanzmann [EMAIL PROTECTED] wrote: - Don't use clone or group - One primitive per ipmi device - Location constraints You don't need location constraints. Regards, Lars -- Teamlead Kernel, SuSE Labs, Research and Development SUSE

Re: [Linux-HA] ERROR: clone_unpack: fencing has too many children. Only the first (apache-01-fencing) will be cloned.

2008-01-18 Thread Lars Marowsky-Bree
On 2008-01-18T13:18:21, Thomas Glanzmann [EMAIL PROTECTED] wrote: You don't need location constraints. okay. Could elaborate please? Does the stonith subsystem automatically know where to put them? Assuming that the fencing device can be reached from all nodes, it doesn't matter where they

Re: [Linux-HA] ERROR: clone_unpack: fencing has too many children. Only the first (apache-01-fencing) will be cloned.

2008-01-18 Thread Lars Marowsky-Bree
On 2008-01-18T13:26:47, Thomas Glanzmann [EMAIL PROTECTED] wrote: I have a two node cluster. I use external/ipmi which needs one instance per node. A node that is misbehaving can't stonith itself, can it? If the node fails, and the other side needs STONITH, the resource will be started in that

Re: [Linux-ha-dev] [mgmt][Patch]Implement common classes for adding and viewing kinds of objects.

2008-01-13 Thread Lars Marowsky-Bree
On 2008-01-11T15:20:29, Yan Gao [EMAIL PROTECTED] wrote: 1. Get crm.dtd file from server end 2. Parse DTD in haclient 3. Dynamically render appropriate gtk widgets according to the DTD element 4. Add enumeration values to drop-down list. List and mark up the default values if have.

Re: [Linux-ha-dev] Status o2cb RA

2008-01-11 Thread Lars Marowsky-Bree
On 2008-01-10T08:27:07, Serge Dubrouski [EMAIL PROTECTED] wrote: It's definitely broken in 2.1.3. It doesn't have a working monitor function, it has a syntax error in ip command (at least for Fedora distro), it doesn't offline a resource when it's stopped and so on, it even doesn't set a path

Re: [Linux-ha-dev] Status o2cb RA

2008-01-11 Thread Lars Marowsky-Bree
On 2008-01-11T08:40:23, Serge Dubrouski [EMAIL PROTECTED] wrote: The monitor function is actually implemented as-intended at this stage. There's nothing to monitor, and it shouldn't be run with a periodic monitor. It looks like that: o2cb_monitor() { # o2cb_init exit

Re: [Linux-HA] 2.1.3 suse rpm's?

2008-01-11 Thread Lars Marowsky-Bree
On 2008-01-10T20:00:12, Sebastian Reitenbach [EMAIL PROTECTED] wrote: they're pretty close to what ended up in 2.1.3 i'll update them shortly when I do the first pacemaker release ah, ok, that's fine. There was something wrongish with the daily builds during my vacation (last '07 and first

Re: [Linux-ha-dev] Status o2cb RA

2008-01-10 Thread Lars Marowsky-Bree
On 2008-01-04T13:18:57, Serge Dubrouski [EMAIL PROTECTED] wrote: can you please supply your patches? i would like to take a look and test them too :) I'd like to know what is the status of that RA first. It definitely requires some work done and I can take but it looks like Lars is working

Re: [Linux-HA] dev can not up a fail count for monitor timeout

2007-12-20 Thread Lars Marowsky-Bree
On 2007-12-19T21:19:48, Alan Robertson [EMAIL PROTECTED] wrote: Dave, Dejan and (if possible) Lars: I have put this patch into 'test'. PLEASE begin testing it at your earliest convenience. Is there any specific reason why you did not push it into dev? Regards, Lars -- Teamlead

Re: [Linux-HA] Heartbeat Service fails in the first start.

2007-12-19 Thread Lars Marowsky-Bree
On 2007-12-19T11:32:12, Andrew Beekhof [EMAIL PROTECTED] wrote: i prefer to use the crm respawn directive which disables the fast-fail logic^. when a non-transient problem like this occurs and heartbeat is started at boot time (which is the normal thing to do), you have about 2s to identify

Re: [Linux-ha-dev] heartbeat 2.x on IPv6?

2007-12-18 Thread Lars Marowsky-Bree
On 2007-12-18T22:06:07, Tomokazu Omura [EMAIL PROTECTED] wrote: Has anyone developed a ping6 plugin for heartbeat 2.x ? openAIS has full IPv6 support. I hope that making pingd handle the pinging internally (instead of relying on the cluster infrastructure) should be a simple change with

Re: [Linux-HA] IPaddr: netmask or cidr_netmask?

2007-12-18 Thread Lars Marowsky-Bree
On 2007-12-14T14:54:38, Keisuke MORI [EMAIL PROTECTED] wrote: IPaddr RA has two kinds of parameter to specify the netmask: netmask and cidr_netmask. Which one is officially supported and recommended to use? The fact that only the cidr_netmask is in the metadata is a pretty big clue. ;-)

Re: [Linux-HA] Cluster aware LVM.

2007-12-18 Thread Lars Marowsky-Bree
On 2007-12-18T13:19:50, Andrew Beekhof [EMAIL PROTECTED] wrote: Now that the crm can run on openAIS (which i believe is what cman uses), the crm and clvmd can use the same membership information so in theory it should be possible. I don't know much about clvmd, but if you write an RA

Re: [Linux-ha-dev] [mgmt] [Patch3] Remove the default settings for master_slave from attlist

2007-12-12 Thread Lars Marowsky-Bree
On 2007-12-12T19:31:08, Yan Gao [EMAIL PROTECTED] wrote: Patch3: Default settings of notify and globally_unique for master_slave have been moved into meta_attributes. Remove them from attlist. Attached the patch. Signed-off-by: Yan Gao [EMAIL PROTECTED] Thanks, merged too.

Re: [Linux-ha-dev] [mgmt] [Patch2] Bugfix: Error caused by blank description of parameter.

2007-12-12 Thread Lars Marowsky-Bree
On 2007-12-12T19:30:46, Yan Gao [EMAIL PROTECTED] wrote: Patch2: When getting description of a parameter from metadata, if the description is blank and environment variable of LANG has been set to POSIX , haclient will fall into an error. Attached the patch. Signed-off-by: Yan Gao

Re: [Linux-ha-dev] [PATCH]The change of the required parameter of FileSystem resources for GUI.

2007-12-11 Thread Lars Marowsky-Bree
On 2007-12-11T16:58:08, HIDEO YAMAUCHI [EMAIL PROTECTED] wrote: When an operator configures a Filesystem resource in the GUI, the operator may forget to set the directory. In addition, the Filesystem resource itself describes the directory setting as required. Thanks, this is good. I've

Re: pgsql ra (WAS: Re: [Linux-ha-dev] ids ra)

2007-12-10 Thread Lars Marowsky-Bree
On 2007-12-10T12:38:34, Serge Dubrouski [EMAIL PROTECTED] wrote: Both operations include calling pg_ctl or psql. validate_all checks that they are set in the right way. /usr/sbin/ocf-tester can very quickly show whether everything works as it should ;-) I can see an option to ocf-tester:

Re: [Linux-ha-dev] [mgmt]Rewriting order and colocation configurations

2007-12-09 Thread Lars Marowsky-Bree
On 2007-12-08T15:12:47, Alan Robertson [EMAIL PROTECTED] wrote: This may be preferable to needing to duplicate this in home-grown fashion. A lint-like tool is still a good idea, but it should be build on top of this, IMHO. /me redirects this whining into /dev/null I'm not whining. I have

Re: [Linux-HA] handling clone instances

2007-12-09 Thread Lars Marowsky-Bree
On 2007-12-06T10:07:25, Andrew Beekhof [EMAIL PROTECTED] wrote: I mean, if clone:0 fails on node_a, I want to up a fail count for clone:1/node_a at the same time. or, is there any good idea to work out the above behavior without clone? not sure if thats possible yet. a good idea though let

Re: [Linux-HA] Re: [Linux-ha-dev] ANNOUNCE: Project Organization - CRM to become its own project

2007-12-08 Thread Lars Marowsky-Bree
On 2007-12-07T14:55:31, Alan Robertson [EMAIL PROTECTED] wrote: Because of the surprise timing of this announcement, right in the last phases of a release, and during time when I'm supposed to be on vacation, I'm postponing discussion on this until at least Monday to give me a chance to

Re: [Linux-ha-dev] [mgmt]Rewriting order and colocation configurations

2007-12-08 Thread Lars Marowsky-Bree
On 2007-12-09T02:50:52, Yan Gao [EMAIL PROTECTED] wrote: Thanks! It's a good tool. By now, haclient doesn't generate a xml file. Ideally, haclient should generate a valid xml, and then transfer to mgmtd. Xinwei and I think that the current protocol is too complicated and has many

Re: [Linux-HA] Re: [Linux-ha-dev] ANNOUNCE: Project Organization - CRM to become its own project

2007-12-08 Thread Lars Marowsky-Bree
On 2007-12-07T10:41:47, Alan Robertson [EMAIL PROTECTED] wrote: Andrew's contributions to the Linux-HA community will be missed. I am sad that he has unilaterally decided to leave Linux-HA and fork his code into in a separate project. It is not a fork; very little redundant development

Re: [Linux-HA] Re: [Linux-ha-dev] ANNOUNCE: Project Organization - CRM to become its own project

2007-12-08 Thread Lars Marowsky-Bree
On 2007-12-07T14:57:25, Alan Robertson [EMAIL PROTECTED] wrote: Moderation was already removed. It was a rather childish thing to do, and an active abuse of power and control. It was possibly the best thing you could do to convince us that we are heading down the right path. Regards,

Re: [Linux-HA] Re: [Linux-ha-dev] ANNOUNCE: Project Organization - CRM to become its own project

2007-12-08 Thread Lars Marowsky-Bree
On 2007-12-07T14:55:31, Alan Robertson [EMAIL PROTECTED] wrote: Because of the surprise timing of this announcement, right in the last phases of a release, and during time when I'm supposed to be on vacation, I'm postponing discussion on this until at least Monday to give me a chance to

[Linux-ha-dev] GPL (no version) vs GPLv2 vs GPLv2+

2007-12-07 Thread Lars Marowsky-Bree
Hi all, we have a quite inconvenient mix of licenses in the code. Some are # License: GNU General Public License (GPL) others # This program is free software; you can redistribute it and/or modify # it under the terms of version 2 of the GNU General Public License as # published by the

Re: [Linux-HA] ANNOUNCE: Project Organization - CRM to become its own project

2007-12-07 Thread Lars Marowsky-Bree
On 2007-12-07T15:24:41, matilda matilda [EMAIL PROTECTED] wrote: Hi Andrew, can you give some explanation to us why this decision was made? What is the vision/idea behind that? I'm not Andrew, but the primary motivator is that the CRM will in the future be a dual-stacked effort, and this

Re: [Linux-ha-dev] ANNOUNCE: Heartbeat Cluster Resource Manager Ported to OpenAIS

2007-12-06 Thread Lars Marowsky-Bree
On 2007-12-05T21:06:38, Andrew Beekhof [EMAIL PROTECTED] wrote: Over the last few months, Red Hat and SUSE engineers have been working together to port Heartbeat's powerful Cluster Resource Manager (CRM) to run natively on top of OpenAIS. Credit where credit is due: this means you, Andrew.

[Linux-HA] Re: [Linux-ha-dev] ANNOUNCE: Heartbeat Cluster Resource Manager Ported to OpenAIS

2007-12-06 Thread Lars Marowsky-Bree
On 2007-12-05T21:06:38, Andrew Beekhof [EMAIL PROTECTED] wrote: Over the last few months, Red Hat and SUSE engineers have been working together to port Heartbeat's powerful Cluster Resource Manager (CRM) to run natively on top of OpenAIS. Credit where credit is due: this means you, Andrew.

Re: [Linux-ha-dev] [RFC] Change the behavior of cibadmin on dangerous options

2007-12-04 Thread Lars Marowsky-Bree
On 2007-12-04T00:20:15, Xinwei Hu [EMAIL PROTECTED] wrote: Hi all, We had an incident with cibadmin recently. A typo of 'cibadmin -r blahblah' forced the HA into RO mode without any warning, and the field engineer almost panicked. ;) I like the direction. The more dangerous commands

Re: [Linux-ha-dev] [RFC] Change the behavior of cibadmin on dangerous options

2007-12-04 Thread Lars Marowsky-Bree
On 2007-12-04T21:29:35, Xinwei Hu [EMAIL PROTECTED] wrote: The more dangerous commands usually require a --force option on other tools. (fsck, mkfs, rpm, drbdadm, ...) The reason that I don't go this way is concerning the portability. getopt_long is not a POSIX standard AFAIK. Then make it

Re: [Linux-HA] Fencing prevents resource from failing over

2007-11-26 Thread Lars Marowsky-Bree
On 2007-11-26T10:55:25, [EMAIL PROTECTED] wrote: Hi, I've a 2 node active/passive cluster ( active node=active , passive node=standby) using heartbeat 2.0.8 . I recently enabled stonith . The stonith device is an rsh device that tries to restart the cluster node. What is an rsh stonith

Re: [Linux-HA] How to control a resource with an environment variable?

2007-11-25 Thread Lars Marowsky-Bree
On 2007-11-25T13:54:34, Atanas Dyulgerov [EMAIL PROTECTED] wrote: Then I export the variable my_variable on the node where I want to move the resource service_ip. # export my_variable=true Nothing happens. # export OCF_RESKEY_my_variable=true Again nothing happens. That's a rather

Re: [Linux-HA] ECCN classification for Linux-HA Heartbeat

2007-11-13 Thread Lars Marowsky-Bree
On 2007-11-13T14:18:50, Henriques, Tiago [EMAIL PROTECTED] wrote: We are using Linux-HA Heartbeat in one of our products, and are now in the process of collecting the information needed to export it to other countries. In order to do this, can you tell me whether any citizens of the United

Re: [Linux-HA] pengine: increment_clone erros with clones on 10 nodes

2007-11-09 Thread Lars Marowsky-Bree
On 2007-11-08T15:05:17, Iain Arnell [EMAIL PROTECTED] wrote: I've been happily running a cluster of eight SLES10 machines using the standard SLES10 service pack 1 heartbeat-2.0.8-0.19 RPMs. But after adding 2 more machines, I'm now running into problems with the clone resources. (And I

Re: [Linux-ha-dev] If suicide is the answer, you're asking the wrong question

2007-11-08 Thread Lars Marowsky-Bree
On 2007-11-08T16:04:25, Andrew Beekhof [EMAIL PROTECTED] wrote: The attached table^ attempts to explain why node suicide, at least the so simple it can't possibly have a single bug kind being proposed, is no substitute for enabling stonith (even when no plugins are configured!). Agreed.

Re: [Linux-ha-dev] cl_log dropping messages

2007-11-08 Thread Lars Marowsky-Bree
On 2007-11-06T09:53:13, Alan Robertson [EMAIL PROTECTED] wrote: Cutting out that debug should be OK - or raising it to happen if debug is 1 would probably also be OK. If you're seeing this happen a lot, that's not a good thing. Getting behind 200 messages seems like a lot to me - off

Re: [Linux-HA] OS process priority of monitor action

2007-11-08 Thread Lars Marowsky-Bree
On 2007-11-06T15:20:06, Alan Robertson [EMAIL PROTECTED] wrote: I believe that you have hit on the only really good general solution. Raising the priority won't raise I/O priority or make the monitor and stop actions stay locked in memory so that they don't get paged in or out behind a

[Linux-HA] Re: [Linux-ha-dev] If suicide is the answer, you're asking the wrong question

2007-11-08 Thread Lars Marowsky-Bree
On 2007-11-08T16:04:25, Andrew Beekhof [EMAIL PROTECTED] wrote: The attached table^ attempts to explain why node suicide, at least the so simple it can't possibly have a single bug kind being proposed, is no substitute for enabling stonith (even when no plugins are configured!). Agreed.

Re: [Linux-HA] Recovering from unexpected bad things - is STONITH the answer?

2007-11-08 Thread Lars Marowsky-Bree
On 2007-11-08T14:49:44, Andrew Beekhof [EMAIL PROTECTED] wrote: I understand (and so far as that particular logic goes, I agree), but my concern is with the proposal of having some official recommendation to use the SSH plugin in production systems. It's simply (at present) just not

Re: [Linux-HA] Recovering from unexpected bad things - is STONITH the answer?

2007-11-08 Thread Lars Marowsky-Bree
On 2007-11-08T01:25:05, Yan Fitterer [EMAIL PROTECTED] wrote: If your software cannot withstand a crash, then it cannot be made highly-available - end of story. Crashes will happen. Be prepared. This is a fine argument from an engineering perspective, but not much use from a sysadmin POV.

[Linux-ha-dev] Commit messages

2007-11-06 Thread Lars Marowsky-Bree
Hi, how about commit messages which have some resemblance to what the change actually is about - preferably from a user's point of view, but I'd even take a developer PoV, but with bug impact: major (if you use cl_respawn), risk: low-to-moderate LF bug 1706 (finishing up associated issues) not

Re: [Linux-ha-dev] Recovering from unexpected bad things - is STONITH the answer?

2007-11-06 Thread Lars Marowsky-Bree
On 2007-11-06T10:25:05, Alan Robertson [EMAIL PROTECTED] wrote: For problems that should never happen like death of one of our core/key processes, is an immediate reboot of the machine the right recovery technique? The advantages of such a choice include: It is fast It will invoke

Re: Antw: Re: [Linux-HA] [ANNOUNCE] Interim heartbeat packages refreshed (2.1.2-15)

2007-11-06 Thread Lars Marowsky-Bree
On 2007-10-29T11:48:58, matilda matilda [EMAIL PROTECTED] wrote: Andrew Beekhof [EMAIL PROTECTED] 29.10.2007 11:37 is its removal causing problems? There seems to be a dependency between 'heartbeat' and 'heartbeat-devel' (if heartbeat-devel is installed) which gets broken and YaST is

Re: [Linux-HA] crm_mon confused?

2007-11-06 Thread Lars Marowsky-Bree
On 2007-11-01T02:12:49, Christian Rishøj [EMAIL PROTECTED] wrote: Additionally, it would be nice if I could configure Heartbeat to be more eager in trying to recover failed resources. Suppose my database times out when stopping. Now it's banned, with a failed stop action. I'd like Heartbeat

Re: [Linux-ha-dev] bug in failcount handling?

2007-10-30 Thread Lars Marowsky-Bree
On 2007-10-29T19:44:48, Alan Robertson [EMAIL PROTECTED] wrote: Off hand, this sounds a bit like a bug to me. I've attached the relevant files - the output of cibadmin -Q, a spreadsheet with the output of the various ptest runs, and the logs from both machines in the clusters. If it's a

Re: [Linux-ha-dev] [Bug 1722] First item in a group is not stopped when the second fails (and can't be migrated)

2007-10-29 Thread Lars Marowsky-Bree
On 2007-10-29T21:16:17, Keisuke MORI [EMAIL PROTECTED] wrote: In the HA database cluster, the database service is typically provided by the group like: Filesystem + MySQL + IP If any of the resources failed then the database service is no longer available. Running only Filesystem does

[Linux-ha-dev] cl_log dropping messages

2007-10-25 Thread Lars Marowsky-Bree
Hi all, on my 7 node cluster, I see the occasional - every 5-10 tests - bunch of messages dropped during a burst; usually on the DC (what a surprise), on the order of ~200 messages dropped per incident. This occurs only with debug 1, and only above 5 nodes or so. So yes, my cluster is fully

Re: [Linux-ha-dev] cl_log dropping messages

2007-10-25 Thread Lars Marowsky-Bree
On 2007-10-25T13:11:56, Dejan Muhamedagic [EMAIL PROTECTED] wrote: The network is fully virtual, so I can't be hitting that limit. Probably your xen is better than mine. Here I have a transfer rate (guest to host) at times around 10mbit. Paravirtualized is quite fast. I also don't connect my

Re: [Linux-ha-dev] cl_log dropping messages

2007-10-25 Thread Lars Marowsky-Bree
On 2007-10-25T16:25:30, Lars Marowsky-Bree [EMAIL PROTECTED] wrote: http://hg.linux-ha.org/dev/rev/69f0395c2ead seems to fix some of this for me. BTW, I was able to conclude a 100 cycle run with that patch applied on 7 nodes, and absolutely not a single BadNews, which is a first. Regards

Re: [Linux-ha-dev] cl_log dropping messages

2007-10-25 Thread Lars Marowsky-Bree
On 2007-10-25T19:39:23, Dejan Muhamedagic [EMAIL PROTECTED] wrote: Probably this means that MAXMISSING and FLOWCONTROL_LIMIT might require tuning. Since both directly depend on MAXMSGHIST, I guess that it should be OK as it is. Exactly why. Those thresholds depend on a compile-time choice,

Re: [Linux-HA] [ANNOUNCE] Interim heartbeat packages refreshed (2.1.2-15)

2007-10-25 Thread Lars Marowsky-Bree
On 2007-10-25T10:23:42, Andrew Beekhof [EMAIL PROTECTED] wrote: Just a quick note to say that the packages at http://software.opensuse.org/download/server:/ha-clustering were refreshed today after sufficiently (see pending bugs below) passing automated testing. Thanks for this! Great

Re: [Linux-HA] Grouping clone resources?!

2007-10-25 Thread Lars Marowsky-Bree
On 2007-10-25T12:09:58, Andrew Beekhof [EMAIL PROTECTED] wrote: Sorry for the dumb question (if any) but is it possible? I have 2 evms and 1 ocfs clones+ an extra stonith but that's not really my concern. It would be good to start those above in 1 group so my Xen ordering would become

Re: [Linux-HA] Is it possible to create group of multi-state DRBD resources?

2007-10-24 Thread Lars Marowsky-Bree
On 2007-10-24T12:06:22, Andrew Beekhof [EMAIL PROTECTED] wrote: I have a two-node symmetric cluster that has four multi-state drbd resources per node, each of which contains a DRBD primitive. All these resources must be promoted to master at the same time on the same node. if

Re: [Linux-ha-dev] [GUI][patch3]Excludes a problem resource in a metadata function

2007-10-23 Thread Lars Marowsky-Bree
On 2007-10-23T10:34:44, HIDEO YAMAUCHI [EMAIL PROTECTED] wrote: 1) When a resource has a problem in its metadata, the resource addition dialog is not displayed. With this patch, only the problem resource is excluded, and the dialog is displayed. 2) Even if there is a

Re: [Linux-ha-dev] build infrastructure

2007-10-23 Thread Lars Marowsky-Bree
On 2007-10-23T15:42:03, David Lee [EMAIL PROTECTED] wrote: 1. of our bugs - of the Linux/32bit subset of bugs. I agree with most of your points, but I need to make a distinction here. ;-) At least x86-32 and x86-64 are considered, and across a considerable range of distributions. (Internally,

Re: [Linux-ha-dev] RFC: pkg/ and port/ directory location

2007-10-23 Thread Lars Marowsky-Bree
On 2007-10-23T16:06:58, Alan Robertson [EMAIL PROTECTED] wrote: Getting to a single RPM spec file is not a stupid idea. I've taken some of the code from your specfile, and some from the CentOS and Fedora specfiles and combined them into one specfile. I we still disagree about the basic part

Re: [Linux-ha-dev] RFC: pkg/ and port/ directory location

2007-10-22 Thread Lars Marowsky-Bree
On 2007-10-18T13:07:45, Andrew Beekhof [EMAIL PROTECTED] wrote: Quick question, does anyone know if the pkg and port directories need to live in their current location? If not, I'm considering moving them to contrib/build/(pkg|port) where they'd also be joined by the openBSD

Re: [Linux-ha-dev] Starting heartbeat when interfaces are down

2007-10-22 Thread Lars Marowsky-Bree
On 2007-10-19T21:57:17, Dejan Muhamedagic [EMAIL PROTECTED] wrote: http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1732 for some discussion on communication interfaces. discussion means the current deficits are by design ;-) This seems somewhat counter to the idea of

Re: [Linux-ha-dev] mysql ocf patch - update documentation and fix a typo

2007-10-22 Thread Lars Marowsky-Bree
On 2007-10-18T08:10:04, Alan Robertson [EMAIL PROTECTED] wrote: please find another ocf::heartbeat::mysql patch attached. When you attach patches, it would be nice if you're able to make them text/plain MIME types. Just out of interest - his attachment _was_ a text/plain attachment according

Re: [Linux-ha-dev] RFC: pkg/ and port/ directory location

2007-10-22 Thread Lars Marowsky-Bree
On 2007-10-22T12:03:32, Andrew Beekhof [EMAIL PROTECTED] wrote: actually debian does need to be in its current location - which is why i thought to ask :-) Why? And maybe a symlink would suffice, if debian insists? (Not that it matters, it just might be more tidy.) We could fold the Build

Re: [Linux-ha-dev] Interim heartbeat version (obs-) 2.1.2-4

2007-10-22 Thread Lars Marowsky-Bree
On 2007-10-22T15:22:46, Michael Kapp [EMAIL PROTECTED] wrote: Is it possible to provide the standard install mechanism for the obs-* package like ./configure, make, make install, such as available with the stable heartbeat-2.1.2.tar.gz package? The gentoo people would be very happy for

Re: [Linux-ha-dev] Interim heartbeat version (obs-) 2.1.2-4

2007-10-22 Thread Lars Marowsky-Bree
On 2007-10-22T16:29:43, Michael Kapp [EMAIL PROTECTED] wrote: That would mean that automake autoconf would have to be run first. I think this would be done by bootstrap make dist, but I _think_ that this would also run configure already (which is superfluous and lengthy ...) ..yes it

Re: [Linux-HA] No local heartbeat. Forcing restart

2007-10-22 Thread Lars Marowsky-Bree
On 2007-10-17T16:24:03, Bernd Schubert [EMAIL PROTECTED] wrote: I really think this No local heartbeat. Forcing restart is just ridiculous. Either the system is dead and then it also can't restart itself or the system is in operating state, but then it also doesn't need to reset itself.

Re: [Linux-HA] DRBD v8

2007-10-22 Thread Lars Marowsky-Bree
On 2007-10-18T10:38:25, Raoul Bhatia [IPAX] [EMAIL PROTECTED] wrote: in the HOWTO of Linux-HA2 it is mentioned that the ocf RA of drbd does not support version 8 of drbd. Is that still true? as far as i know, yes :) Yes, for drbd8, I recommend to use the drbddisk script shipped with drbd for

Re: [Linux-HA] how to start/stop a resource and wait until success/fail by PROGRAM?

2007-10-22 Thread Lars Marowsky-Bree
On 2007-10-22T14:10:53, Andrew Beekhof [EMAIL PROTECTED] wrote: once you make your update, look for an lrm_rsc_op for (your resource + action=start + rc=0) once you see that, then you know it has started Might be an interesting commandline option for crm_resource -W to block until the

Re: [Linux-ha-dev] Release 2.1.3 planned for 10 December, 2007

2007-10-20 Thread Lars Marowsky-Bree
On 2007-10-15T15:33:29, Lars Marowsky-Bree [EMAIL PROTECTED] wrote: On 2007-10-15T06:53:36, Alan Robertson [EMAIL PROTECTED] wrote: Uhm. Ping? Regards, Lars -- Teamlead Kernel, SuSE Labs, Research and Development SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg

Re: [Linux-HA] Resource Stickiness

2007-10-17 Thread Lars Marowsky-Bree
On 2007-10-17T20:36:27, Ivan [EMAIL PROTECTED] wrote: Hi, Today I was testing my 2 node Xen cluster and noticed a funny thing. I do have resource stickiness set to infinity however 1 of my VMs (out of 2) called cups was moved back to the rebooted node after joining back to the cluster. Why

Re: [Linux-HA] Other open source of heartbeat available?

2007-10-16 Thread Lars Marowsky-Bree
On 2007-10-16T12:11:24, Ian Jiang [EMAIL PROTECTED] wrote: I want to use the heartbeat idea in an embedded environment. The current Linux heartbeat is too complicated, because an embedded system differs a lot from a general Linux cluster, and is usually much simpler and

Re: [Linux-HA] Add multiple IPs in V1 style config

2007-10-16 Thread Lars Marowsky-Bree
On 2007-10-16T11:00:54, Hannes Dorbath [EMAIL PROTECTED] wrote: I need to add multiple IPs to a machine that runs HB 2.0.8 with V1 style config. Is there a limit of IPaddr statements I can have? Is there a better way to define multiple IPs? Can I get the changes to take effect without doing a

Re: [Linux-HA] Negative value in resource-failure-stickiness

2007-10-16 Thread Lars Marowsky-Bree
On 2007-10-15T22:45:53, Yan Fitterer [EMAIL PROTECTED] wrote: Why? i.e. why does the value have to be negative? Because it was a bad choice, relying more on internal details than on user perception ;-) (And was noticed too late for it to be changed easily.) The multiplied stickiness is

Re: [Linux-ha-dev] Re: [mgmt][patch1]Add boolean_op in mgmt

2007-10-15 Thread Lars Marowsky-Bree
On 2007-09-13T14:14:46, Yan Gao [EMAIL PROTECTED] wrote: Sorry for the delay. I've been a little busy the last two weeks. ;-) Same here! But I've just pushed the first 5 patches out to the dev repo. Yes. I think the first 5 patches should be ok. I'm rewriting the pengine and crmd metadata

Re: [Linux-ha-dev] translated Linux-ha.org into Japanese

2007-10-15 Thread Lars Marowsky-Bree
On 2007-10-04T14:38:52, Takayuki Tanaka [EMAIL PROTECTED] wrote: I translated Linux-ha.org into Japanese. Hereafter, I will upload 123 pages of the translated contents. The update notifications will reach some members of this mailing list. Please excuse them if you receive them. URL

Re: [Linux-ha-dev] Release 2.1.3 planned for 10 December, 2007

2007-10-15 Thread Lars Marowsky-Bree
On 2007-10-15T06:53:36, Alan Robertson [EMAIL PROTECTED] wrote: I have talked to Tadashiro Yoshida about his test team in NTT doing this. For their own reasons his test team needs to test our releases anyway. I thought it would be good to just let them be the test team, since they

Re: [Linux-HA] LZO requirement in interim RPMs

2007-10-15 Thread Lars Marowsky-Bree
On 2007-10-11T20:37:39, Carson Gaspar [EMAIL PROTECTED] wrote: It has LZO dependencies for _everyone_. And if you undef it, it fails because the macro isn't conditionalized when referenced later (%{foo} instead of %{?foo}). I had to whack it with a machete to get it to build. I can provide

Re: [Linux-HA] lrmd: G_SIG_dispatch ... dispatch function took to long

2007-10-15 Thread Lars Marowsky-Bree
On 2007-10-14T17:23:28, Raoul Bhatia [IPAX] [EMAIL PROTECTED] wrote: for all of you who cannot wait for new interim builds, you can always get a special and/or the latest revision [1] and build your own interim release. http://software.opensuse.org/download/home:/LarsMB/ has daily builds.

Re: [Linux-HA] External Stonith Plugin

2007-10-15 Thread Lars Marowsky-Bree
On 2007-10-15T11:43:41, Andreas Mock [EMAIL PROTECTED] wrote: what do you mean with legal stuff? I've put it under GPLv2. Careful distinction: the heartbeat code base is licensed using the GPLv2 or later clause. GPLv2-only code would not fit right in. Is this o.k. with HAv2? Yes, it should

Re: [Linux-HA] is_managed option for clone resource

2007-10-15 Thread Lars Marowsky-Bree
On 2007-10-15T16:22:07, Junko IKEDA [EMAIL PROTECTED] wrote: can't clone + group resource take is_managed option? I attached the logs on DC. They can, but again, you've found a bug ;-) Please file a bugzilla entry, and include the backtraces from the coredumps. The mailing list is not the

Re: [Linux-HA] RE: Linux-HA Digest, Vol 47, Issue 35

2007-10-15 Thread Lars Marowsky-Bree
On 2007-10-15T11:08:12, Stefano Colombo [EMAIL PROTECTED] wrote: Hi, I tried but got the following error /usr/sbin/ocf-tester -n TEST /DS1/ha.d/resource.d/ocf_vmware Beginning tests for /DS1/ha.d/resource.d/ocf_vmware... * rc=7: Your agent was active and could not be stopped Aborting

Re: [Linux-HA] lrmd: G_SIG_dispatch ... dispatch function took to long

2007-10-15 Thread Lars Marowsky-Bree
On 2007-10-15T14:17:02, Raoul Bhatia [IPAX] [EMAIL PROTECTED] wrote: the latest build i see is heartbeat v2.1.2.200710-1. the changes i needed are from 20071014 in andrews repository - don't know if a daily build would allow me to pull a specific version :) Hrm, weird; updates have

Re: [Linux-HA] ERROR: Message hist queue is filling up (200 messages in queue)

2007-10-15 Thread Lars Marowsky-Bree
On 2007-10-15T16:43:52, alexus [EMAIL PROTECTED] wrote: ERROR: Message hist queue is filling up (200 messages in queue) how would i read those messages? i'm starting my heartbeat and after 10 seconds thats the only thing i see in my syslog/messages anyone? That sounds as if

Re: [Linux-HA] patch for mysql ocf script

2007-10-15 Thread Lars Marowsky-Bree
On 2007-10-15T18:09:04, Raoul Bhatia [IPAX] [EMAIL PROTECTED] wrote: dear dejan, thank you for applying the patches. and thank you for mentioning my name ;) i have another - hopefully for all - handy patch for mysql. it adds additional_parameters so that one can specify additional mysqld
