RE: [Linux-HA] Two-node clusters in split-sites

2009-01-22 Thread Alex Strachan
-Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha- boun...@lists.linux-ha.org] On Behalf Of Hell, Robert Sent: Thursday, 22 January 2009 9:33 PM To: linux-ha@lists.linux-ha.org Subject: [Linux-HA] Two-node clusters in split-sites Hi, we are

RE: [Linux-HA] Frequency of RPM builds on the opensuse website

2009-01-13 Thread Alex Strachan
On Mon, Jan 12, 2009 at 07:28, Alex Strachan astrac...@inter-systems.com.au wrote: How often are the RPMs built on this website? http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_4/x86_64/ For pacemaker: usually once a month but we skipped December due to vacation

[Linux-HA] Frequency of RPM builds on the opensuse website

2009-01-11 Thread Alex Strachan
How often are the RPMs built on this website? http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_4/x86_64/ When is the next update planned for? The files currently available are from 21-Nov-2008. -- Alex

RE: [Linux-HA] make heartbeat startup when node boots up

2008-12-17 Thread Alex Strachan
Normally done by the OS start/stop scripts. e.g. in Red Hat Linux [r...@dtbaims ~]# ls -l /etc/init.d/heartbeat -rwxr-xr-x 1 root root 9528 Nov 15 04:50 /etc/init.d/heartbeat [r...@dtbaims ~]# chkconfig --list | grep heartbeat heartbeat 0:off 1:off 2:on 3:on 4:on 5:on
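
The runlevel check quoted in the post above could be scripted roughly as follows. This is only a sketch: it assumes the `chkconfig --list` line format shown in the post, and the function name `enabled_in_runlevels` and the sample line are illustrative, not taken from the thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeed if a service line from
# `chkconfig --list` shows "on" for every requested runlevel.
# Usage: enabled_in_runlevels "<chkconfig line>" 3 5
enabled_in_runlevels() {
    # chkconfig separates fields with tabs; normalise them to spaces.
    line=$(printf '%s' "$1" | tr '\t' ' ')
    shift
    for level in "$@"; do
        case " $line " in
            *" $level:on "*) ;;   # this runlevel is enabled, keep checking
            *) return 1 ;;        # runlevel missing or "off"
        esac
    done
    return 0
}

# Sample line in the format quoted in the post (whitespace normalised).
sample="heartbeat 0:off 1:off 2:on 3:on 4:on 5:on"
if enabled_in_runlevels "$sample" 3 5; then
    echo "heartbeat starts at boot in runlevels 3 and 5"
fi
```

On a live Red Hat style system the sample line would be replaced by `"$(chkconfig --list heartbeat)"`, and `chkconfig heartbeat on` is the usual way to enable the service if the check fails.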

RE: [Linux-HA] make heartbeat startup when node boots up

2008-12-17 Thread Alex Strachan
Did you install from an RPM or from source? If from source, you may need to make the links yourself, e.g. [r...@dtbaims ~]# find /etc -type f -name *heartbeat* /etc/rc.d/init.d/heartbeat /etc/logrotate.d/heartbeat [r...@dtbaims ~]# find /etc -type l -name *heartbeat* /etc/rc.d/rc5.d/S75heartbeat

RE: [Linux-HA] stonith - ibmrsa-telnet TIMEOUT, how to enable python debug

2008-12-09 Thread Alex Strachan
- From: Alex Strachan [EMAIL PROTECTED] Sent: 08.12.08 03:20:07 To: 'General Linux-HA mailing list' linux-ha@lists.linux-ha.org Subject: RE: [Linux-HA] stonith - ibmrsa-telnet TIMEOUT, how to enable python debug Hi Andreas, Thank you for responding. See inline for comments

RE: [Linux-HA] stonith - ibmrsa-telnet TIMEOUT, how to enable python debug

2008-12-07 Thread Alex Strachan
Hi Andreas, Thank you for responding. See inline for comments. -Original Message- From: [EMAIL PROTECTED] [mailto:linux-ha- [EMAIL PROTECTED] On Behalf Of Andreas Mock Sent: Saturday, 6 December 2008 10:17 AM To: General Linux-HA mailing list Subject: Re: [Linux-HA] stonith -

RE: [Linux-HA] stonith - ibmrsa-telnet TIMEOUT, how to enable python debug

2008-12-02 Thread Alex Strachan
] On Behalf Of Alex Strachan Sent: Tuesday, 2 December 2008 12:36 PM To: 'General Linux-HA mailing list' Subject: RE: [Linux-HA] stonith - ibmrsa-telnet TIMEOUT,how to enablepython debug The stonith debug file from heartbeat startup. Note - heartbeat tries to do fencing on dtbaims since heartbeat

RE: [Linux-HA] Problem with mailman

2008-12-02 Thread Alex Strachan
Any output in /var/log/messages for when HA tries to start masterMailman? e.g. Dec 2 17:55:12 itbaims lrmd: [3790]: info: RA output: (resource_its_fild:start:stdout) Warning: no access to tty (Bad file descriptor). Thus no job control in this shell. This is the output from a script. Enable

[Linux-HA] stonith - ibmrsa-telnet TIMEOUT, how to enable python debug

2008-12-01 Thread Alex Strachan
Two nodes Node - dtbaims does not have heartbeat software running. [EMAIL PROTECTED] ~]# crm_mon -1 . Node: dtbaims (4f1614ac-d465-49db-b847-bac60f9dac6c): OFFLINE Node: itbaims (96595e56-e3db-42da-b13b-1e2d3a956529): online r_stonith-dtbaims (stonith:external/ibmrsa-telnet):

RE: [Linux-HA] stonith - ibmrsa-telnet TIMEOUT, how to enable python debug

2008-12-01 Thread Alex Strachan
/ /meta_attributes /primitive -Original Message- From: [EMAIL PROTECTED] [mailto:linux-ha- [EMAIL PROTECTED] On Behalf Of Alex Strachan Sent: Monday, 1 December 2008 9:13 PM To: 'General Linux-HA mailing list' Subject: [Linux-HA] stonith - ibmrsa-telnet TIMEOUT,how to enable

RE: [Linux-HA] stonith - ibmrsa-telnet TIMEOUT, how to enable python debug

2008-12-01 Thread Alex Strachan
: [Linux-HA] stonith - ibmrsa-telnet TIMEOUT, how to enablepython debug Hi again, On Mon, Dec 01, 2008 at 09:34:40PM +1000, Alex Strachan wrote: Stonith primitive defn. primitive id=r_stonith-dtbaims class=stonith type=external/ibmrsa-telnet operations

RE: [Linux-HA] stonith - ibmrsa-telnet TIMEOUT, how to enable python debug

2008-12-01 Thread Alex Strachan
December 2008 11:30 PM To: General Linux-HA mailing list Subject: Re: [Linux-HA] stonith - ibmrsa-telnet TIMEOUT, how to enablepython debug Hi again, On Mon, Dec 01, 2008 at 09:34:40PM +1000, Alex Strachan wrote: Stonith primitive defn. primitive id=r_stonith-dtbaims class

RE: [Linux-HA] stonith - ibmrsa-telnet TIMEOUT, how to enable python debug

2008-12-01 Thread Alex Strachan
. -Original Message- From: [EMAIL PROTECTED] [mailto:linux-ha- [EMAIL PROTECTED] On Behalf Of Alex Strachan Sent: Tuesday, 2 December 2008 11:05 AM To: 'General Linux-HA mailing list' Subject: RE: [Linux-HA] stonith - ibmrsa-telnet TIMEOUT,how to enablepython debug Please ignore that last

RE: [Linux-HA] stonith - ibmrsa-telnet TIMEOUT, how to enable python debug

2008-12-01 Thread Alex Strachan
When running from the command line I can see the 'power cycle'. How do I get heartbeat to use the reset command? [EMAIL PROTECTED] hb]# stonith -v -t external/ibmrsa-telnet -p dtbaimsilo 192.168.201.37 stonith -T reset dtbaims From the log file... 2008-12-02 11:36:10,153:

RE: [Linux-HA] stonith - ibmrsa-telnet TIMEOUT, how to enable python debug

2008-12-01 Thread Alex Strachan
The stonith debug file from heartbeat startup. Note - heartbeat tries to do fencing on dtbaims since heartbeat software is not running on dtbaims. Dec 2 12:29:23 itbaims crmd: [3793]: info: te_fence_node: Executing reboot fencing operation (32) on dtbaims (timeout=6) Dec 2 12:30:24 itbaims

RE: [Linux-HA] pingd - clones, non-symmetrical cluster, rsc_location rules - HA 2.99.1, pacemaker 1.0

2008-11-19 Thread Alex Strachan
Non-symmetrical cluster. All I get is Resource Group: group_its resource_its_drbd (heartbeat:its_drbddisk): Started itbaims resource_its_fs (ocf::heartbeat:its_Filesystem): Started itbaims resource_its_vip (ocf::heartbeat:IPaddr): Started itbaims

RE: [Linux-HA] Missing dependency on new RPM's

2008-11-18 Thread Alex Strachan
Similar issue, did eventually manage to upgrade. RHEL4 64bit, 2 node cluster Current software... heartbeat-2.99.2-3.1 heartbeat-common-2.99.2-3.1 heartbeat-resources-2.99.2-3.1 libheartbeat2-2.99.2-3.1 libopenais2-0.80.3-10.1 libpacemaker3-1.0.0-4.1 openais-0.80.3-10.1

RE: [Linux-HA] pingd - clones, non-symmetrical cluster, rsc_location rules - HA 2.99.1, pacemaker 1.0

2008-11-18 Thread Alex Strachan
- HA 2.99.1, pacemaker 1.0 On Tue, Nov 4, 2008 at 11:31, Adrian Chapela [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On Thu, Oct 30, 2008 at 13:00, Adrian Chapela [EMAIL PROTECTED] wrote: Alex Strachan wrote: Hi All, HA non-symmetrical cluster with two nodes

RE: [Linux-HA] resources unmanaged/managed - unexpected behavior

2008-11-05 Thread Alex Strachan
Ok, then the reason for the restart is that the resource definition changed (the value of the is-managed attribute changed requiring a restart). By using --meta, you're telling the cluster that this is an option for the PE - not the resource. This puts it in a different namespace which

RE: [Linux-HA] how to use pingd for stopping services on a node when/if it loses network connectivity?

2008-11-03 Thread Alex Strachan
Hi Allan, From an earlier post I did a few days ago - response from Andrew: How do I configure a location rule on a non-symmetric cluster so that pingd:0,1 will run? <rsc_location id="foo" rsc="pingd" node="dtbaims" score="1"/> <rsc_location id="foo" rsc="pingd" node="itbaims" score="1"/> The constraints below

RE: [Linux-HA] crm_mon - ha 2.99.1, pacemaker 1.0 - not showing unmanaged status

2008-11-03 Thread Alex Strachan
Yep - that made a difference! Thanks -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dominik Klein Sent: Monday, 3 November 2008 7:21 PM To: General Linux-HA mailing list Subject: Re: [Linux-HA] crm_mon - ha 2.99.1,pacemaker 1.0 - not showing unmanaged

RE: [Linux-HA] crm_mon - ha 2.99.1, pacemaker 1.0 - not showing unmanaged status

2008-11-03 Thread Alex Strachan
, Nov 3, 2008 at 03:06, Alex Strachan [EMAIL PROTECTED] wrote: In releases prior to HA 2.99.1, pacemaker 1.0, the monitoring command crm_mon visibly flagged a resource that was currently unmanaged. e.g. Resource Group: group_itsapaims1 resource_itsapaims1_drbd

RE: [Linux-HA] What to do when file-system becomes read-only due to harddisk error and the heartbeat still alive?

2008-11-02 Thread Alex Strachan
Hi Jonas, If the kernel detects a problem with a filesystem, you can configure ext2/3 filesystems to panic. See man tune2fs, option -e panic -- Alex -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nikita Michalko Sent: Friday, 31 October 2008 8:49 PM

RE: [Linux-HA] Suggestions needed, SuSE 10

2008-11-02 Thread Alex Strachan
Hi Landon, I just recently installed 2.99.1. I got the rpm's from http://www.clusterlabs.org/mw/Install In the future they hope to simplify this and fully separate the heartbeat/openais files. These are the packages that I installed onto my RHES4 hosts. [EMAIL PROTECTED] ~]# rpm -qa | egrep

[Linux-HA] crm_mon - ha 2.99.1, pacemaker 1.0 - not showing unmanaged status

2008-11-02 Thread Alex Strachan
In releases prior to HA 2.99.1, pacemaker 1.0, the monitoring command crm_mon visibly flagged a resource that was currently unmanaged. e.g. Resource Group: group_itsapaims1 resource_itsapaims1_drbd (heartbeat:itsapaims1_drbddisk): Started canopus (unmanaged) resource_itsapaims1_fs

[Linux-HA] pingd - clones, non-symmetrical cluster, rsc_location rules - HA 2.99.1, pacemaker 1.0

2008-10-30 Thread Alex Strachan
Hi All, HA non-symmetrical cluster with two nodes; dtbaims, itbaims. HA 2.99.1, pacemaker 1.0 I have been reviewing the pacemaker_configuration guide but no luck with this config. Lots of variations but no joy. Error: Oct 30 16:58:35 dtbaims pengine: [27989]: WARN: native_color: Resource

RE: [Linux-HA] Stonith, 2 node cluster - on loss of power to primary node; failure to secondary didn't happen.

2008-10-30 Thread Alex Strachan
The first test my boss likes to apply to an HA setup is to remove the power cords from the back of the running primary server. Because the stonith device (IBM RSA) runs from the same power as the host, the failover no longer happens. :-( We could power the RSA independently - maybe there is a

RE: [Linux-HA] Stonith, 2 node cluster - on loss of power to primary node; failure to secondary didn't happen.

2008-10-29 Thread Alex Strachan
=dtbaims/ /rule /rsc_location -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alex Strachan Sent: Wednesday, 29 October 2008 3:26 PM To: 'General Linux-HA mailing list' Subject: [Linux-HA] Stonith, 2 node cluster - on loss of power

RE: [Linux-HA] Stonith, 2 node cluster - on loss of power to primary node; failure to secondary didn't happen.

2008-10-29 Thread Alex Strachan
- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alex Strachan Sent: Wednesday, 29 October 2008 4:07 PM To: 'General Linux-HA mailing list' Subject: RE: [Linux-HA] Stonith, 2 node cluster - on loss of power toprimarynode; failure to secondary didn't happen. When power

RE: [Linux-HA] /var/lib/heartbeat/cores/pvm directory, ownership and mode? usage? HA 2.99.1, pacemaker 1.0

2008-10-29 Thread Alex Strachan
PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Beekhof Sent: Wednesday, 29 October 2008 5:48 PM To: General Linux-HA mailing list Subject: Re: [Linux-HA] /var/lib/heartbeat/cores/pvm directory,ownership and mode? usage? HA 2.99.1, pacemaker 1.0 On Wed, Oct 29, 2008 at 01:27, Alex Strachan

RE: [Linux-HA] /var/lib/heartbeat/cores/pvm directory, ownership and mode? usage? HA 2.99.1, pacemaker 1.0

2008-10-28 Thread Alex Strachan
PROTECTED] On Behalf Of Andrew Beekhof Sent: Tuesday, 28 October 2008 9:08 PM To: General Linux-HA mailing list Subject: Re: [Linux-HA] /var/lib/heartbeat/cores/pvm directory,ownership and mode? usage? HA 2.99.1, pacemaker 1.0 On Tue, Oct 28, 2008 at 01:11, Alex Strachan [EMAIL PROTECTED] wrote: Yes

[Linux-HA] Stonith, 2 node cluster - on loss of power to primary node; failure to secondary didn't happen.

2008-10-28 Thread Alex Strachan
Finally configured Stonith for an HA cluster - believe me doing this made me happy! Versions - heartbeat 2.99.1, pacemaker 1.0, redhat 4 x86_64 I have two nodes, dtbaims, itbaims. Stonith device ibmrsa-telnet is being used; failover is fine when doing a reset via the RSA card. Complete

RE: [Linux-HA] /var/lib/heartbeat/cores/pvm directory, ownership and mode? usage? HA 2.99.1, pacemaker 1.0

2008-10-27 Thread Alex Strachan
/cores/pvm directory,ownership and mode? usage? HA 2.99.1, pacemaker 1.0 Are these packages from the build service? On Mon, Oct 27, 2008 at 03:44, Alex Strachan [EMAIL PROTECTED] wrote: Hi, I have just recently installed HA 2.99.1, pacemaker 1.0 on RHAS4 x86_64 On startup I am

RE: [Linux-HA] Updated from 2.1.3 to 2.99.x w/ Pacemaker 1.x and CIB no longer conforms to DTD

2008-10-26 Thread Alex Strachan
See http://www.clusterlabs.org/wiki/images/f/fb/Configuration_Explained.pdf I have just installed the same revisions as you and I found the above document very useful. The Appendix - Upgrading Cluster Software should help. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL

[Linux-HA] /var/lib/heartbeat/cores/pvm directory, ownership and mode? usage? HA 2.99.1, pacemaker 1.0

2008-10-26 Thread Alex Strachan
Hi, I have just recently installed HA 2.99.1, pacemaker 1.0 on RHAS4 x86_64 On startup I am getting the following warnings. Oct 27 11:13:00 dtbaims cib: [13164]: ERROR: Cannot chdir to [/var/lib/heartbeat/cores/pvm]: No such file or directory Oct 27 11:13:00 dtbaims attrd:

[Linux-HA] How to create a rsc_order 'OR' condition

2008-06-02 Thread Alex Strachan
State1 == nodeA resourceA --- resourceC nodeB resourceB State2 == nodeA resourceA nodeB resourceB --- resourceC Rule: ResourceC can only start AFTER (resourceA or resourceB), a preference for resourceA is needed. Attempted config

RE: [Linux-HA] Not started heartbeat!

2008-06-02 Thread Alex Strachan
Confirm the output from 'uname -n' on both nodes matches what is in ha.cf -Original Message- From: [EMAIL PROTECTED] [mailto:linux-ha- [EMAIL PROTECTED] On Behalf Of Nguyen Quang Huy Sent: Tuesday, 3 June 2008 11:59 AM To: linux-ha@lists.linux-ha.org Subject: [Linux-HA] Not
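
The check suggested in the reply above can be sketched as a small script. This is an illustration under stated assumptions, not something from the thread: the function name `node_listed_in_ha_cf` and the `HA_CF` variable are hypothetical, and `/etc/ha.d/ha.cf` is assumed as the usual location of the file.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeed if the given node name
# appears on a "node" directive line of the given ha.cf file.
# Usage: node_listed_in_ha_cf <node-name> <path-to-ha.cf>
node_listed_in_ha_cf() {
    awk -v name="$1" '
        $1 == "node" {
            # A node directive may list several names on one line.
            for (i = 2; i <= NF; i++) if ($i == name) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$2"
}

# Assumed default path; override with HA_CF=/path/to/ha.cf if yours differs.
ha_cf=${HA_CF:-/etc/ha.d/ha.cf}
if [ -r "$ha_cf" ]; then
    if node_listed_in_ha_cf "$(uname -n)" "$ha_cf"; then
        echo "OK: $(uname -n) is listed in $ha_cf"
    else
        echo "MISMATCH: $(uname -n) is not listed in $ha_cf"
    fi
fi
```

Run it on both nodes; the comparison is an exact string match, so a short hostname on one side and a fully qualified name on the other is reported as a mismatch.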

[Linux-HA] failure of resync of secondary after successful connection to primary - resulted in data loss

2007-09-21 Thread Alex Strachan
Hi All, This error has resulted in data loss and I need to fully understand why and hopefully stop it from happening again. Any help in this would be warmly received. Thanks, Alex. I have experienced a failure of DRBD to reconnect after a system forced power off/on due to a hung

RE: [Linux-HA] Resource Becoming Unmanaged

2007-05-21 Thread Alex Strachan
+Cleanup resource # crm_resource -C -r resource_rokfids_dhcpd +Set resource to managed # crm_resource -p is_managed -r resource_rokfids_aims -t primitive -v true If the resource goes back to unmanaged then there is probably something wrong with the scripts. Run # crm_verify -L -VVV and review

RE: [Linux-HA] ERROR: parse_xml: Expected: action - HB 2.0.8

2007-04-30 Thread Alex Strachan
To: General Linux-HA mailing list Subject: Re: [Linux-HA] ERROR: parse_xml: Expected: action - HB 2.0.8 On Fri, Apr 27, 2007 at 11:22:44AM +1000, Alex Strachan wrote: Error in /var/log/messages Apr 27 11:07:26 deneb crmd: [3038]: info: process_lrm_event: LRM operation

RE: [Linux-HA] ERROR: parse_xml: Expected: action - HB 2.0.8

2007-04-30 Thread Alex Strachan
PROTECTED] On Behalf Of Alex Strachan Sent: Tuesday, 1 May 2007 10:36 AM To: 'General Linux-HA mailing list' Subject: RE: [Linux-HA] ERROR: parse_xml: Expected: action - HB 2.0.8 The conclusion for this error is that there is something wrong with the meta-data of the resource script

[Linux-HA] ERROR: parse_xml: Expected: action - HB 2.0.8

2007-04-26 Thread Alex Strachan
Error in /var/log/messages Apr 27 11:07:26 deneb crmd: [3038]: info: process_lrm_event: LRM operation resource_itsapaims_skel1_start_0 (call=134, rc=0) complete Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Expected: action Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Error