Re: [Linux-HA] stand_alone_ping: Node xx.yy.zz.ww is unreachable (read)

2009-07-23 Thread Andrew Beekhof
version? On Wed, Jul 22, 2009 at 4:16 PM, wrote: > Hi All, > > I'm using pingd in clone mode (pacemaker). I observe lots of messages saying > that "Node xx.yy.zz.ww is unreachable" (see at the end of this mail). > "xx.yy.zz.ww" is IP of main router. I checked the connection using the system >

Re: [Linux-HA] all or none failover

2009-07-23 Thread Andrew Beekhof
On Thu, Jul 23, 2009 at 8:03 PM, Cantwell, Bryan wrote: > I have set up a new 2 node environment that right now only runs httpd. > > I use heartbeat 2.0.8. yuk > If I stop the heartbeat on master then slave takes over, and if I power off > master then slave takes over, but if I kill the httpd se

Re: [Linux-HA] Adding a node to HA-Cluster without service interruption

2009-07-23 Thread Andrew Beekhof
On Thu, Jul 23, 2009 at 6:22 AM, Michael Schwartzkopff wrote: > On Wednesday, 22 July 2009 23:38:26, Alexander Födisch wrote: >> Hi, >> >> I have a samba cluster w/ three nodes (heartbeat 2.1.3 / crm-enabled). Now >> I need to add a fourth one. What will be the best way to do this w/o any >> servi

Re: [Linux-HA] migrate_to and migrate_from

2009-07-21 Thread Andrew Beekhof
On Tue, Jul 21, 2009 at 9:32 AM, Malte Geierhos wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > hi >>> meta allow-migrate=true >>> >>> hth, >>> Florian >> >> thanks. Any doc about this feature? >> > Just don't ask about docs ... > We can read source. > > It's a miracle. Oh and somewher

Re: [Linux-HA] migrate_to and migrate_from

2009-07-21 Thread Andrew Beekhof
I'll write it up in the next couple of days. I have a few other changes to the pdf that need to be pushed out too. On Tue, Jul 21, 2009 at 10:56 AM, Michael Schwartzkopff wrote: > On Tuesday, 21 July 2009 10:35:57, Lars Marowsky-Bree wrote: >> On 2009-07-21T09:10:03, Michael Schwartzkopff wrote

Re: [Linux-HA] Failover 2 servers multiple IP's and SSL

2009-07-20 Thread Andrew Beekhof
On Sat, Jul 18, 2009 at 4:53 PM, Tom Potwin wrote: > Hi > > I've searched all over, and just gotten more confused. I need to set up an > auto failover for my primary server. I have the same system backed up on to > my second server in real time already. Both servers have two NICs; one for > the net

Re: [Linux-HA] hb_gui // Pacemaker 1.0.4 & Hb 2.99 / pb to add constraints is identified)

2009-07-17 Thread Andrew Beekhof
On Fri, Jul 17, 2009 at 11:29 AM, Xinwei Hu wrote: > It's required by relax-ng actually. So the _encrypted_ error messages > are spitted directly from libxml :( > > Yan's working on it to make it more verbose & understandable. > Thanks.. if you guys find a way, please let me know. i tried and fail

Re: [Linux-HA] Pacemaker 1.0.4 & Hb 2.99 / question about RA stop target

2009-07-16 Thread Andrew Beekhof
On Wed, Jul 15, 2009 at 4:31 PM, Alain.Moulle wrote: > Hi, > > I've declared an OCF RA and tested start and stop using hb_gui. > When I start the resource, it starts immediately. > Then if I ask it to stop, it takes 120s to begin the stop, but the stop is > executed and successful. > Except i

Re: [Linux-HA] Next question regarding constraints sets

2009-07-13 Thread Andrew Beekhof
On Tue, Jul 7, 2009 at 9:32 PM, Michael Schwartzkopff wrote: > Hi, > > I defined a order and a colocation constraints according to the doc: > > >     >       >       >       >     >   >   >     >       >       >       >     >   > > So I thought that the resources should run on the same node, but >

Re: [Linux-HA] Question regarding constraint sets

2009-07-13 Thread Andrew Beekhof
On Mon, Jul 13, 2009 at 4:29 PM, Michael Schwartzkopff wrote: > On Monday, 13 July 2009 16:23:31, Andrew Beekhof wrote: >> On Tue, Jul 7, 2009 at 9:14 PM, Michael Schwartzkopff > wrote: >> > Hi, >> > >> > In the "configuration explained" I read ab

Re: [Linux-HA] Question regarding constraint sets

2009-07-13 Thread Andrew Beekhof
On Tue, Jul 7, 2009 at 9:14 PM, Michael Schwartzkopff wrote: > Hi, > > In the "configuration explained" I read about the new resource sets within > constraints. I wanted to try it and configured three dummy resources: > > > > > > According to the doc I created the following colocation constratin

Re: [Linux-HA] cleanup resource or failcount

2009-07-13 Thread Andrew Beekhof
On Mon, Jul 13, 2009 at 4:12 PM, Cristina Bulfon wrote: > Thanks :-) > > so it's better ( or faster) to cleanup the resource directly, is that true ? usually ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo

Re: [Linux-HA] crm_mon | 2 Nodes configured, unknown expected votes

2009-07-13 Thread Andrew Beekhof
You can safely ignore that; it shouldn't be printed for heartbeat-based clusters. On Fri, Jul 3, 2009 at 10:33 AM, Thomas Baumann wrote: > Hello list, > > I recently installed a mini-cluster without resources etc. > only the nodes. > What does "unknown expected votes" mean when using crm_mon? > >

Re: [Linux-HA] cleanup resource or failcount

2009-07-13 Thread Andrew Beekhof
On Mon, Jul 13, 2009 at 3:16 PM, Cristina Bulfon wrote: > Ciao, > > which is the difference between  clear the failcount  (crm_failcount -G -U -G does a query, -D will remove it but this only removes the record of how many times a resource failed > ... ) and cleanup the resource  (crm_resource -

Re: [Linux-HA] Resource set question

2009-07-13 Thread Andrew Beekhof
On Mon, Jul 13, 2009 at 11:29 AM, Steinhauer Juergen wrote: > Hi, > > it was my fault. I didn't set "ordered" as meta-attribute. > Nevertheless, the wanted behaviour is still not achieved. > > I created a colocation like this: > > sequential="true"> >           >           > > > I would expect, t

Re: [Linux-HA] Resource set question

2009-07-13 Thread Andrew Beekhof
On Fri, Jul 10, 2009 at 12:53 PM, Gawith wrote: > Hi Dominik, > >> Set ordered=false for the ip group. That will start them in parallel. I >> think. Then specify a resource order constraint to start your app group >> after the ip group and a colocation constraint to have the apps on the >> same nod

Re: [Linux-HA] Heartbeat-v2 or Pacemaker/ Question about first configuration

2009-06-30 Thread Andrew Beekhof
On Tue, Jun 30, 2009 at 2:18 PM, Alain.Moulle wrote: > Hi, > it seems that, for example, on a four-node cluster, the basic first cib.xml > is created only if we start heartbeat on all four nodes. Nope, it should exist after one. (you don't need to care about the contents of the section if that's what

Re: [Linux-HA] how to configure a resource to run on active node only

2009-06-30 Thread Andrew Beekhof
On Tue, Jun 30, 2009 at 7:28 AM, MAHESH, SIDDACHETTY M (SIDDACHETTY M) wrote: > Hi, > >  I have a setup with two nodes in the HA cluster. Both nodes share a virtual > IP (bound to the currently active node). If the cluster software is running, then the node is active. >There are two resources 'A

Re: [Linux-HA] Pacemaker without Heartbeat-v2 ?

2009-06-29 Thread Andrew Beekhof
3.1.x86_64 >        heartbeat-common is needed by pacemaker-1.0.4-23.1.x86_64 > > So ... it seems it is not possible to configure an HA cluster with Pacemaker > without Heartbeat-v2? > Unless someone gives me a way to do it? > > Thanks > Regards > Alain Moullé > >> Fro

Re: [Linux-HA] stonith riloe - nodes kill each other

2009-06-26 Thread Andrew Beekhof
On Fri, Jun 26, 2009 at 3:07 PM, Jan Kalcic wrote: > Andrew Beekhof wrote: >> On Fri, Jun 26, 2009 at 10:55 AM, Jan wrote: >> >>> Hi, >>> >>> a very boring issue with stonith using the plugin external/riloe (never used >>> it). Whenever I try to s

Re: [Linux-HA] Heartbeat-v2 in the future ?

2009-06-26 Thread Andrew Beekhof
'it ? right > Thanks > Alain Moullé >> From: Andrew Beekhof >> Subject: Re: [Linux-HA] Heartbeat-v2 in the future ? >> To: General Linux-HA mailing list >> Message-ID: >>       >> Content-Type: text/plain; charset=ISO-8859-1 >> >> On F

Re: [Linux-HA] Heartbeat-v2 in the future ?

2009-06-26 Thread Andrew Beekhof
On Fri, Jun 26, 2009 at 11:34 AM, Alain.Moulle wrote: > Hi, > could anybody here confirm or deny for sure the information that the > Heartbeat-v2 project is really stopped and definitively replaced by > Pacemaker? Pacemaker doesn't replace all of Heartbeat*, just the CRM part which is abso

Re: [Linux-HA] Heart-beat-v2 : question about configuration

2009-06-26 Thread Andrew Beekhof
On Thu, Jun 25, 2009 at 3:53 PM, Alain.Moulle wrote: > Hi > > I set the 3 node names of my cluster in /etc/ha.d/ha.cf, > I wonder if there is a command to add a new node in the cluster dynamically, > whereas heartbeat is started on previous 3 nodes ? check out the autojoin directive for ha.cf if y

Re: [Linux-HA] 2.0.4 / question about cib.xml configuration

2009-06-26 Thread Andrew Beekhof
On Tue, Jun 16, 2009 at 12:50 PM, Alain.Moulle wrote: > Hi > I would like to pre-configure several HA cluster cib.xml files from a > remote node, > and push them to each node of the HA clusters with Heartbeat-v2. But > when I put > a cib.xml under the directory /var/lib/heartbeat/crm on target nod

Re: [Linux-HA] 2.0.8 : two questions (contd.)

2009-06-26 Thread Andrew Beekhof
On Tue, Jun 16, 2009 at 2:26 PM, Michael Schwartzkopff wrote: > On Tuesday, 16 June 2009 14:19:51, Alain.Moulle wrote: >> Hi >> >> Thanks for your answers, Dejan, but about your remark on 2.0.8, is there >> any big issue with this release? Because on RH or CentOS, it seems to >> be the very last re

Re: [Linux-HA] Corosync / OpenAIS

2009-06-26 Thread Andrew Beekhof
On Wed, Jun 17, 2009 at 6:18 PM, wrote: > Sorry if this has been asked before, how does Corosync relate to > OpenAIS? Are they one and the same now? Will the next major update of > OpenAIS included in SLES likely be called Corosync? Or am I completely > wide of the mark? :-) > The openais developer

Re: [Linux-HA] Strange monitoring experience of a LSB resource

2009-06-26 Thread Andrew Beekhof
On Tue, Jun 23, 2009 at 1:42 PM, Michael Schwartzkopff wrote: > Hi, > > I have a squid LSB resource on my pacemaker-1.0.4 cluster. Ok. I know that > there is an OCF resource, but this is for historical reasons. I checked the > failcounters and the resource had two errors. So I checked the logfiles a

Re: [Linux-HA] stonith riloe - nodes kill each other

2009-06-26 Thread Andrew Beekhof
On Fri, Jun 26, 2009 at 10:55 AM, Jan wrote: > Hi, > > a very boring issue with stonith using the plugin external/riloe (never used > it). Whenever I try to simulate a split-brain condition (using iptables) in > order to test stonith, both nodes kill each other. Not exactly what > expected. Sure i

Re: [Linux-HA] CRM documentation all gone?

2009-06-26 Thread Andrew Beekhof
On Mon, Jun 15, 2009 at 7:55 PM, Michael Schwartzkopff wrote: > On Monday, 15 June 2009 10:59:09, Mark Hunting wrote: >> Hi, >> >> I am trying to use Heartbeat 2 in CRM configuration. I'm using Debian >> Lenny, which hasn't got pacemaker yet (but heartbeat 2.1.3 instead) >> The problem is that on

Re: [Linux-HA] Fedora 11 install issues

2009-06-26 Thread Andrew Beekhof
On Tue, Jun 23, 2009 at 12:27 PM, Michael Schwartzkopff wrote: > On Tuesday, 23 June 2009 12:22:41, jayfitzpatr...@gmail.com wrote: >> Morning all >> >> I have just been going through trying to install heartbeat / pacemaker onto >> a test system and have noticed the following error during instal

Re: [Linux-HA] active/standby setup: Can one run a resource on standby only?

2009-06-26 Thread Andrew Beekhof
On Fri, Jun 26, 2009 at 12:34 AM, Christoph Lechner wrote: > Dejan Muhamedagic wrote: >> Hi, >> >> On Sun, Jun 21, 2009 at 11:53:52PM +0200, Christoph Lechner wrote: >>> Hi all, >>> >>> I want my active/standby heartbeat setup (Version 2.1.4, CRM enabled) to >>> run a specific resource on the stand

Re: [Linux-HA] drbd pacemaker heartbeat oh my

2009-06-25 Thread Andrew Beekhof
On Thu, Jun 25, 2009 at 1:57 AM, Michael Hutchins wrote: > Well, now I am in a pickle. > > So when I enable crm, I get this " socket_wait_conn_new: trying to create in > /var/run/crm/cib_callback bind:: No such file or directory" and never-ending > reboots. > > Googled it and came back with a bug

Re: [Linux-HA] pingd vs. ipfail: how to compare the ping nodes??

2009-06-09 Thread Andrew Beekhof
On Mon, Jun 8, 2009 at 8:51 AM, Patrick Roßbach wrote: > Hi, > > we are migrating from heartbeat v1 to v2 and also want to use pingd > instead of ipfail. Our two cluster nodes are connected to redundant > networks (two switches etc.) to communicate to the rest of the world. > With v1 ipfail compar

Re: [Linux-HA] Resource doesn't come online after failure-timeout has expired

2009-06-04 Thread Andrew Beekhof
> > dc-uuid="79298950-c48f-4415-b4b2-0abba4a7ac8d"> >   >     >       >         value="1.0.2-c02b459053bfa44d509a2a0e0247b291d93662b7"/> >         name="last-lrm-refresh" value="1243528412"/> >         name="cluster-recheck-interval" value

Re: [Linux-HA] pacemaker 1.0.4 errs

2009-06-04 Thread Andrew Beekhof
On Thu, Jun 4, 2009 at 2:42 PM, Michael Schwartzkopff wrote: > Hi, > > I wanted to give 1.0.4 a try. Here is the output in syslog after install on a > completely new node: > > Jun  4 14:31:52 xen14 lrmd: [5343]: ERROR: on_msg_del_rsc: no rsc with id > U8�E^H�E�e�^T. > > Are theses ERRORs norma

Re: [Linux-HA] pacemaker 1.0.2 - can't migrate resources

2009-06-02 Thread Andrew Beekhof
Ok, so problem solved? On Sun, May 31, 2009 at 5:46 AM, wrote: >> I'd need the logs from both nodes (in particular the one acting as DC). >> Try using hb_report - it gathers all the relevant information > >> On Thu, May 28, 2009 at 4:21 PM,   wrote: >> > Hello All, >> > >> > I'm running a two-no

Re: [Linux-HA] Duplicate standby values

2009-06-02 Thread Andrew Beekhof
Sorry for the delay... Were you using the GUI at all? It looks like somehow one of the tools ended up creating the two items with the same ID - one in the nodes section and the other in the status section. The easiest way to deal with it is remove one with cibadmin: cibadmin --xml-text '' O

Re: [Linux-HA] DRBD Master fails back un-intentionally

2009-06-02 Thread Andrew Beekhof
On Wed, May 27, 2009 at 9:02 PM, Michael Schwartzkopff wrote: > On Wednesday, 27 May 2009 15:37:06, Andrew Beekhof wrote: > (...) >> Stickiness doesn't control promotion/demotion (perhaps it should but >> that's another story). >> The only thing that matters is

Re: [Linux-HA] Re sources wont run anywhere when PingNode is unreachable

2009-06-02 Thread Andrew Beekhof
On Thu, May 28, 2009 at 4:28 PM, firewall wrote: > > Hi all, > > I'm having problems getting a split-brain to work (for demo, yes that's > right). Basically, I have an active/passive cluster both configured with > pingd like below. When the ping node is not reachable on the primary/active > (i.e.

Re: [Linux-HA] Resource doesn't come online after failure-timeout has expired

2009-06-02 Thread Andrew Beekhof
On Fri, May 29, 2009 at 10:27 AM, Koen Verwimp wrote: > > Does anyone have an idea how to get an automatic migration after the > failure-timeout expires? The section called "Ensuring Time Based Rules Take Effect" in the documentation also applies here. The version you have defaulted the value of clu

Re: [Linux-HA] questions about using HA

2009-06-02 Thread Andrew Beekhof
On Tue, Jun 2, 2009 at 4:44 PM, Dimitri Maziuk wrote: > blue_hmq wrote: >> hi, i am sorry to disturb you because i have some questions about using  HA >> >> i had configured the HA on two computer(HA01 as master,HA02),using apache2 >> service. >> if i stop apache2 service on HA01,but the Ethernet

Re: [Linux-HA] Node Selection on Failover

2009-05-29 Thread Andrew Beekhof
On Tue, May 26, 2009 at 10:07 PM, Kevin Harms wrote: > >   I have setup an 8 node cluster. The cluster has 15 resources. I > setup the system such that all 15 resources are  distributed on the 7 > primary nodes when the cluster starts up. I would like it such that > when a node fails, the resource

Re: [Linux-HA] Recovering a Fragile CIB after Debian Lenny upgrade

2009-05-29 Thread Andrew Beekhof
On Wed, May 27, 2009 at 7:57 PM, Imran Chaudhry wrote: > One out-standing question I have is that if I reboot foo, then the > resources will migrate to bar but when foo comes back up the resources > migrate back to foo. I did not expect this to happen since I have > "auto_failback off" in ha.cf.

Re: [Linux-HA] Resources get restarted when a node joins the cluster

2009-05-29 Thread Andrew Beekhof
> If anyone can shed any light on this matter please do. This is > essential for me. > > Regards, > Tobi > > > Andrew Beekhof wrote: >> On Tue, May 26, 2009 at 2:56 PM, Tobias Appel wrote: >>> Hi, >>> >>> In the past sometimes the following ha

Re: [Linux-HA] pacemaker 1.0.2 - can't migrate resources

2009-05-28 Thread Andrew Beekhof
I'd need the logs from both nodes (in particular the one acting as DC). Try using hb_report - it gathers all the relevant information On Thu, May 28, 2009 at 4:21 PM, wrote: > Hello All, > > I'm running a two-node cluster with pacemaker 1.0.2 and heartbeat 2.99 on > SLES 10 SP2 and I'm having tr

Re: [Linux-HA] Resources get restarted when a node joins the cluster

2009-05-27 Thread Andrew Beekhof
On Tue, May 26, 2009 at 2:56 PM, Tobias Appel wrote: > Hi, > > In the past sometimes the following happened on my Heartbeat 2.1.14 cluster: > > 2-Node Cluster, all resources run one node - no location constraints > Now I restarted the "standby" node (which had no resources running but > was still

Re: [Linux-HA] DRBD Master fails back un-intentionally

2009-05-27 Thread Andrew Beekhof
On Wed, May 27, 2009 at 9:26 AM, Michael Schwartzkopff wrote: > hi, > > I have a DRBD Multistate resource. When I set the node on which DRBD runs > as master to "standby", the DRBD instance on the secondary gets promoted to the > master state. So far so good. > > When I switch the first node  on

Re: [Linux-HA] New cluster behaves VERY slow

2009-05-26 Thread Andrew Beekhof
On Tue, May 26, 2009 at 12:06 PM, Michael Schwartzkopff wrote: > On Tuesday, 26 May 2009 10:26:46, Andrew Beekhof wrote: >> On Tue, May 26, 2009 at 10:19 AM, Michael Schwartzkopff >> >> wrote: >> > On Tuesday, 26 May 2009 09:42:53, Andrew Beekhof wrote: >

Re: [Linux-HA] OpenAIS test gives strange error messages

2009-05-26 Thread Andrew Beekhof
Sorry, I meant of Pacemaker (the errors are coming from the pacemaker plugin). On Tue, May 26, 2009 at 10:15 AM, Michael Schwartzkopff wrote: > Debian package fresh from OSBS, version 0.80.5-1 as far as I remember. > > On Tuesday, 26 May 2009 10:04:16, Andrew Beekhof wrote: >>

Re: [Linux-HA] New cluster behaves VERY slow

2009-05-26 Thread Andrew Beekhof
On Tue, May 26, 2009 at 10:19 AM, Michael Schwartzkopff wrote: > On Tuesday, 26 May 2009 09:42:53, Andrew Beekhof wrote: [snip] >> The cluster can't react to the current event until all the actions it >> took in order to react to the previous event have finished. >>

Re: [Linux-HA] OpenAIS test gives strange error messages

2009-05-26 Thread Andrew Beekhof
Which version was this? On Mon, May 25, 2009 at 9:14 PM, Michael Schwartzkopff wrote: > hi, > > I have an identical openais.conf on both nodes.  When I enter some changes on > the GUI I see the following entries in the log file: > > openais[17461]: [crm  ] ERROR: route_ais_message: Child 17892 spa

Re: [Linux-HA] New cluster behaves VERY slow

2009-05-26 Thread Andrew Beekhof
On Tue, May 26, 2009 at 8:26 AM, Michael Schwartzkopff wrote: > On Tuesday, 26 May 2009 08:16:53, Andrew Beekhof wrote: >> On Mon, May 25, 2009 at 9:49 PM, Michael Schwartzkopff >> >> wrote: >> > Hi, >> > >> > I am just setting up a new cluster

Re: [Linux-HA] New cluster behaves VERY slow

2009-05-25 Thread Andrew Beekhof
On Mon, May 25, 2009 at 9:49 PM, Michael Schwartzkopff wrote: > Hi, > > I am just setting up a new cluster. It behaves VERY slow. An example from the > logs: > > I switched on node offline: > May 25 21:43:10 mom2 cib: [2804]: info: cib_process_request: Operation > complete: op cib_modify for secti

Re: [Linux-HA] Dual-primary DRBD on SLES11 not supported?

2009-05-25 Thread Andrew Beekhof
On Mon, May 25, 2009 at 12:58 PM, Jan Kalcic wrote: > Hi all, > > I was going through several documents and I came across the pacemaker > document on clusterlabs saying "DRBD version 8's Primary/Primary mode is > not supported (yet)" > > Does it mean I should not create any OCF RA for it (pacemake

Re: [Linux-HA] limit to one automatic failover, no more

2009-05-22 Thread Andrew Beekhof
On Fri, May 22, 2009 at 4:16 PM, Mikael Kermorgant wrote: > On Fri, May 22, 2009 at 1:35 PM, Andrew Beekhof wrote: > >> On Wed, May 20, 2009 at 7:24 PM, Mikael Kermorgant >> wrote: >> > On Wed, May 20, 2009 at 6:18 PM, Mikael Kermorgant < >> > mikael.kermo

Re: [Linux-HA] default ressource stickness

2009-05-22 Thread Andrew Beekhof
On Fri, May 22, 2009 at 1:54 PM, Michael Schwartzkopff wrote: > Hi, > > I have pacemaker 1.0.2-1 installed from OSBS. I can adjust the default > resource stickiness with two (!) attributes: > # cibadmin -Q | grep "resource.*stick" > (...) >         name="default-resource-stickiness" value="1"/> >

Re: [Linux-HA] is-ordered in group?

2009-05-22 Thread Andrew Beekhof
On Wed, May 20, 2009 at 10:43 AM, Michael Schwartzkopff wrote: > On Wednesday, 20 May 2009 10:00:29, Michael Schwartzkopff wrote: >> Hi, >> >> In the latest version of pacemaker I cannot find the is-order meta-attribute >> for groups any more. I used this attribute to start grouped resources all >

Re: [Linux-HA] limit to one automatic failover, no more

2009-05-22 Thread Andrew Beekhof
On Wed, May 20, 2009 at 7:24 PM, Mikael Kermorgant wrote: > On Wed, May 20, 2009 at 6:18 PM, Mikael Kermorgant < > mikael.kermorg...@gmail.com> wrote: > >> Hello, >> >> I have a drbd + service group (filesystem + zope + ip) configured with the >> latest pacemaker on two nodes (zeo1 and zeo2). >> >

Re: [Linux-HA] SegFault with two symmetrical colocations

2009-05-20 Thread Andrew Beekhof
On Tue, May 19, 2009 at 4:07 PM, Steinhauer Juergen wrote: > Andrew Beekhof wrote: >> yep. >> that's a very strange place to crash. >> can you try printing the values of lpc and *node please? > > Sorry, no idea how to do that same way you got the stack trace. e

Re: [Linux-HA] SegFault with two symmetrical colocations

2009-05-19 Thread Andrew Beekhof
- > #11 0x00784896 in G_CH_dispatch_int (source=0x8247f30, callback=0, >     user_data=0x0) at GSource.c:625 > #12 0x007f91a2 in g_main_context_dispatch () from /lib/libglib-2.0.so.0 > #13 0x007fc196 in ?? () from /lib/libglib-2.0.so.0 > #14 0x007fc557 in g_main_loop_run () from /lib/libglib-2.0.so

Re: [Linux-HA] SegFault with two symmetrical colocations

2009-05-19 Thread Andrew Beekhof
On Tue, May 19, 2009 at 1:11 PM, Steinhauer Juergen wrote: > Hi guys! > > I'm running heartbeat 2.1.3 with pacemaker 0.6. > I have a resource "ip" and two resources "app1, app2", depending on "ip". > I want to achieve, that switching "app1" or "app2" to the other node, > also moves the "ip" and th

Re: [Linux-HA] Documentation for shutdown_escalation

2009-05-19 Thread Andrew Beekhof
ource that won't stop (and stonith is either not configured or not working). On Tue, May 19, 2009 at 11:31 AM, Michael Schwartzkopff wrote: > On Tuesday, 19 May 2009 09:58:26, Andrew Beekhof wrote: > >> On Mon, May 18, 2009 at 10:18 PM, Michael Schwartzkopff >> >> wrote: >

Re: [Linux-HA] Documentation for shutdown_escalation

2009-05-19 Thread Andrew Beekhof
/usr/lib/heartbeat/crmd metadata is your friend in this instance On Mon, May 18, 2009 at 10:18 PM, Michael Schwartzkopff wrote: > Hi, > > is there any documentation for the shutdown_escalation timeout? What is it > exactly? > > Thanks. > -- > Dr. Michael Schwartzkopff > MultiNET Services GmbH > A

Re: [Linux-HA] URGENT: Problem configureing a rule inside a instacne_attribure

2009-05-18 Thread Andrew Beekhof
I answered on irc, rule needs a score too. The existing score determines which order multiple attribute sets are processed in. On Mon, May 18, 2009 at 12:56 PM, Michael Schwartzkopff wrote: > Hi, > > >  I got a problem with a rule inside a instance_attribute. I wanted to > configure > something

Re: [Linux-HA] Resoure config wizzard, part 2.

2009-05-18 Thread Andrew Beekhof
Personally, I think it's a great idea. On Fri, May 15, 2009 at 9:54 PM, Michael Schwartzkopff wrote: > Hi, > > thanks to feedback from floyd and some others I developed my script a little > bit further. > > This script is intended to detect running resources automatically and to > configure pacema

Re: [Linux-HA] no nodes in pacemaker + openais

2009-05-15 Thread Andrew Beekhof
On Fri, May 15, 2009 at 6:49 AM, Michael Schwartzkopff wrote: > Hi, > > For testing I started openais and pacemaker on one machine. I configured > openais and started it. All seems to work but there are no nodes added to the > cluster. > > Even setting no-quorum-policy to ignore doesn't help. > >

Re: [Linux-HA] Problems With SLES11 + DRBD

2009-05-12 Thread Andrew Beekhof
On Mon, May 11, 2009 at 4:38 PM, wrote: > Woo-hoo I can finally give an answer on this list! :) > > You need to set the no-quorum-policy to ignore: > > # crm configure property no-quorum-policy=ignore The reason being that unlike heartbeat, openais doesn't pretend it has quorum when it doesn't h
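A rough illustration of why a two-node cluster runs into this: the sketch below assumes a plain strict-majority quorum rule and is not taken from Pacemaker or OpenAIS. The function name `has_quorum` is hypothetical and exists only for this example.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Strict-majority rule (an assumption for this sketch):
    more than half of all configured votes must be present."""
    return 2 * votes_present > total_votes


# Two-node cluster with one node down: 1 of 2 votes is not a majority.
print(has_quorum(1, 2))

# Three-node cluster with one node down: 2 of 3 votes is still a majority.
print(has_quorum(2, 3))
```

Under that assumption the surviving node of a two-node cluster never holds a majority on its own, which is the situation no-quorum-policy=ignore works around.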

Re: [Linux-HA] standby node doesnot takeover

2009-05-12 Thread Andrew Beekhof
On Mon, May 11, 2009 at 10:18 AM, Kaushal Shriyan wrote: > Hi > > I am using Heartbeat - 2.1.3-8 on Ubuntu OS. the issue is that when i stop > primary node. the standby node does not take over. > All the resources fail. > > Any help would be really appreciated. So would some log files :-)

Re: [Linux-HA] Recovering a Fragile CIB after Debian Lenny upgrade

2009-05-12 Thread Andrew Beekhof
Have you tried crm_resource -C yet? That will clear away any errors and tell the cluster it's ok to try again. On Wed, May 6, 2009 at 5:46 PM, Imran Chaudhry wrote: > Hi Guys, > > I need some advice regarding my cluster config. It's in production so > I'm treating it as fragile. > > In summary it

Re: [Linux-HA] A Node cannot rejoin the cluster after a rebuild

2009-05-12 Thread Andrew Beekhof
o that it would reflect the > new one from the rebuilt Nomen?  If so, which file(s) do I need to modify? > > Thank you. > > jerome > > -Original Message- > From: linux-ha-boun...@lists.linux-ha.org > [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Andrew

Re: [Linux-HA] [Pacemaker] new doc about stonith/fencing

2009-05-11 Thread Andrew Beekhof
On Mon, May 11, 2009 at 11:52 AM, Peter Kruse wrote: > Hi Andrew, > > Andrew Beekhof wrote: >> Any switch that shares power with the host(s) it controls clearly has a SPoF. >> You don't need me to tell you that. > > But that does not have to be a SPoF for the en

Re: [Linux-HA] [Pacemaker] new doc about stonith/fencing

2009-05-11 Thread Andrew Beekhof
On Mon, May 11, 2009 at 11:04 AM, Peter Kruse wrote: > Hi Andrew, > > Andrew Beekhof wrote: >> On Wed, May 6, 2009 at 10:13 AM, Peter Kruse wrote: >>> You are saying that it is okay that a single failure can bring the cluster >>> in a unsolvable situation?  I tho

Re: [Linux-HA] A Node cannot rejoin the cluster after a rebuild

2009-05-11 Thread Andrew Beekhof
On Sat, May 9, 2009 at 12:56 AM, Jerome Yanga wrote: > Here is the scenario. > > 01)  There are two nodes in the Active-Passive cluster--Nomen and Rubric. > 02)  Nomen had a hardware and software failure. > 03)  Rubric took over the resources as expected. > 04)  Due to the failures, Nomen's operat

Re: [Linux-HA] Resource location on failover

2009-05-11 Thread Andrew Beekhof
s the available nodes. It seems > like this scenario will work correctly if I could give the different > resources some type of weighting factor. > > thanks, > kevin > > On May 8, 2009, at 9:10 AM, Andrew Beekhof wrote: > >> On Tue, May 5, 2009 at 11:32 PM, Kevin Harm

Re: [Linux-HA] Resource location on failover

2009-05-08 Thread Andrew Beekhof
On Tue, May 5, 2009 at 11:32 PM, Kevin Harms wrote: > >   In the following documentation: > http://clusterlabs.org/mediawiki/images/7/7d/Configuration_Explained_0.6.pdf >  there is a sub-heading titled "What if Two Nodes Have the Same > Score" that contains the statement "It would likely have pla

Re: [Linux-HA] [Pacemaker] new doc about stonith/fencing

2009-05-08 Thread Andrew Beekhof
On Wed, May 6, 2009 at 10:13 AM, Peter Kruse wrote: > Hello, > > thanks for your replies, > > Andreas Mock wrote: >>> If the PDUs become unavailable and shortly after the host is unavailable as >>> well, then assume the host is down and fenced successfully. >> >> 'assume' is the bad word here. St

Re: [Linux-HA] Heartbeat 2.1.4 and 2.9.9 together?

2009-05-05 Thread Andrew Beekhof
n use the detach-reattach method though. http://clusterlabs.org/wiki/Upgrade > > TIA > Nikita Michalko > > On Monday, 4 May 2009 08:58, Andrew Beekhof wrote: >> haresources clusters should be fine. >> for crm clusters it depends if you go for 1.0 or 0.6 >> >>

Re: [Linux-HA] [Pacemaker] new doc about stonith/fencing

2009-05-04 Thread Andrew Beekhof
On Mon, May 4, 2009 at 4:17 PM, Andreas Mock wrote: >> -Original Message- >> From: "Peter Kruse" >> Sent: 04.05.09 15:19:06 >> To:   pacema...@oss.clusterlabs.org >> Subject: Re: [Linux-HA] [Pacemaker] new doc about stonith/fencing > > Hi Peter, > >> If the PDUs become unavaila

Re: [Linux-HA] Heartbeat 2.1.4 and 2.9.9 together?

2009-05-03 Thread Andrew Beekhof
haresources clusters should be fine. for crm clusters it depends if you go for 1.0 or 0.6 On Fri, May 1, 2009 at 10:32 PM, Mike Sweetser - Adhost wrote: > Hello: > > I'm looking to migrate an existing Heartbeat 2.1.4 installation to > 2.9.9.  Would it be possible to upgrade the servers one at a t

Re: [Linux-HA] Broken web site documentation

2009-05-01 Thread Andrew Beekhof
On Thu, Apr 30, 2009 at 12:44 PM, Andrew Beekhof wrote: > > Putting <= 2.0.8 anywhere near a production environment is bordering on > irresponsible. It occurred to me later that it would be reasonable to ask in response: If it was such a pile of garbage, wasn't releasing it i

Re: [Linux-HA] Broken web site documentation

2009-04-30 Thread Andrew Beekhof
Putting <= 2.0.8 anywhere near a production environment is bordering on irresponsible. 2.0.8 in particular was simply broken and the release prior to that is nearly 3 years old. It's two years since the project supported 2.0.x. If you want support for ancient releases, you need to contact your distr

Re: [Linux-HA] crm CLI

2009-04-29 Thread Andrew Beekhof
_sort_cli(self.obj_list)) >      ^ > SyntaxError: invalid syntax looks like a typo. add a colon to the end of that line for obj in processing_sort_cli(self.obj_list)): <--- here > > cristina > > On Apr 29, 2009, at 8:45 AM, Cristina Bulfon wrote: > >

Re: [Linux-HA] crm CLI

2009-04-28 Thread Andrew Beekhof
ssage- >>> From: linux-ha-boun...@lists.linux-ha.org >>> [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Andrew Beekhof >>> Sent: Tuesday, April 28, 2009 5:42 PM >>> To: General Linux-HA mailing list >>> Subject: Re: [Linux-HA] crm CLI >>> >

Re: [Linux-HA] Broken web site documentation

2009-04-28 Thread Andrew Beekhof
On Tue, Apr 28, 2009 at 16:30, Tobias Appel wrote: > I don't know why it's offline but why don't you use pacemaker? pacemaker being the new project containing the "v2" crm from heartbeat > It's got a really good documentation as well: > > http://www.clusterlabs.org/wiki/Documentation > > > Phili

Re: [Linux-HA] Resource Location/Scores

2009-04-28 Thread Andrew Beekhof
On Thu, Apr 23, 2009 at 21:35, Kevin Harms wrote: > >  I have the following configuration: >  n hosts >  n stonith resources >  n-1 resources > >  This gives 1 "backup" node that doesn't need to have a resource. (although > it may have stonith resources) When a node fails, in most cases, the > res

Re: [Linux-HA] crm CLI

2009-04-28 Thread Andrew Beekhof
On Tue, Apr 28, 2009 at 11:39, Junko IKEDA wrote: > lm_sensors and lm_sensors-devel might be needed. > RHEL5.3 includes net-snmp-5.3.2.2-5, but Pacemaker needs 5.4, Oh, has the API changed significantly? I just coded for the version I had. Is there an easy way to check the installed version?

Re: [Linux-HA] crm CLI

2009-04-28 Thread Andrew Beekhof
On Tue, Apr 28, 2009 at 09:39, Cristina Bulfon wrote: > Ciao, > > not good news :-(( > On Apr 24, 2009, at 3:52 PM, Dejan Muhamedagic wrote: > >> Ciao, >> >> On Fri, Apr 24, 2009 at 09:29:12AM +0200, Cristina Bulfon wrote: >>> >>> Ciao, >>> >>> I tried to build pacemaker rpm w/o success :-(( >>> I

Re: [Linux-HA] SLES 11 HA

2009-04-27 Thread Andrew Beekhof
On Mon, Apr 27, 2009 at 15:03, wrote: > Hi, not sure anyone can help me but I'm running down a few blind alleys with > the current state of HA in Suse, and I notice there are a few Suse employees > on this list ;) > > We are currently using SLES 10 SP2 for our clusters, doing a nice job with >

Re: [Linux-HA] Assymetric Clustering

2009-04-24 Thread Andrew Beekhof
On Tue, Apr 21, 2009 at 21:54, fsalas wrote: > > First of all, thanks for prompt answers > > > > >> On Mon, 20 Apr 2009 14:11:22 -0700: >> >> 8.10 has 2.1.3 which is not a good choice. use at least 2.1.4 or >> heartbeat 2.99.2 / pacemaker 1.0.3 . maybe you have to compile it >> yourself. >> >> > >

Re: [Linux-HA] New pacemaker feature

2009-04-22 Thread Andrew Beekhof
I'd be interested in taking such a feature but this isn't the list to discuss it. Pacemaker is not a Linux-HA project. Try http://oss.clusterlabs.org/mailman/listinfo/pacemaker On Wed, Apr 22, 2009 at 23:41, Mark Hamzy wrote: > > > Hello, > > I am working on a feature to add system health metric

Re: [Linux-HA] restoring the web site

2009-04-22 Thread Andrew Beekhof
Alan, You don't work here anymore, you gave up the right to dictate what happens to the project quite some time ago. As I have already communicated to the actual people running the project, the old wiki site (minus the horribly out-of-date and misleading version 2 information) can be restored

Re: [Linux-HA] Failover is not working as expected

2009-04-22 Thread Andrew Beekhof
On Tue, Apr 21, 2009 at 22:55, sachin patel wrote: > > I have finally setup four system A B C D > I have six resource group each is lvm, filesystem, nfs and ip > > when I stop heartbeat on a system whole group fails over to some other system. > Problem is that with this resource group other resour

Re: Re: [Linux-HA] Re: node dc

2009-04-21 Thread Andrew Beekhof
2009/4/21 grikxd : > Another trouble is also caused by the DC host, Incorrect. The location of the DC has zero connection to where resources are placed. > If I have two resource groups ,A(oss0) and B (oss1) > > if DC host is oss1,when I start resource A and B ,I find all resource > ,running on

Re: [Linux-HA] crm CLI

2009-04-17 Thread Andrew Beekhof
On Fri, Apr 17, 2009 at 10:00, Cristina Bulfon wrote: > Ciao, > > I installed heartbeat 2.1.4 and I'd like to know: to use the crm CLI suite, > do I have to install pacemaker? Correct. You need heartbeat > 2.99.0 and pacemaker > 1.0.0 > Follows the rpm list > > heartbeat-2.1.4-4.1.x86_64 > hear

Re: [Linux-HA] Need help with cib.xml configuration

2009-04-16 Thread Andrew Beekhof
On Thu, Apr 16, 2009 at 09:51, MAHESH, SIDDACHETTY M (SIDDACHETTY M) wrote: > Hi Andrew, > > -Original Message- > From: linux-ha-boun...@lists.linux-ha.org > [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Andrew Beekhof > Sent: Thursday, April 16, 2009 12:5

Re: [Linux-HA] Need help with cib.xml configuration

2009-04-16 Thread Andrew Beekhof
On Thu, Apr 16, 2009 at 07:37, MAHESH, SIDDACHETTY M (SIDDACHETTY M) wrote: > Hi List, > > >   I am new to linux-ha and this is my first attempt at it. > > My configuration: > 1. OS = Redhat Enterprise Linux 5.x > 2. HA = v2.1.3-3 RPM install (using CentoS repository rpms) > > > I am having troubl

Re: [Linux-HA] Cancel a STONITH

2009-04-15 Thread Andrew Beekhof
On Thu, Apr 16, 2009 at 03:41, Junko IKEDA wrote: > Hi, > > I think I can use "hb_delnode" when I want to remove one node from the > cluster, > Should I do "hb_delnode" on DC? > Is there any distinction between DC or not to do that? Nope.

Re: [Linux-HA] can not shutdown heartbeat.

2009-04-15 Thread Andrew Beekhof
Set the shutdown_escalation cluster option. On Wed, Apr 15, 2009 at 16:05, Juha Heinanen wrote: > Andrew Beekhof writes: > >  > > You can't. There should be a shutdown escalation after 5 minutes >  > > if something hangs resourcewise. >  > >  > Actua

Re: [Linux-HA] can not shutdown heartbeat.

2009-04-15 Thread Andrew Beekhof
On Fri, Apr 10, 2009 at 14:03, Dejan Muhamedagic wrote: > Hi, > > On Fri, Apr 10, 2009 at 01:59:29AM -0400, Martin Suehowicz wrote: >> I ran in to an issue tonight where I could not delete or shutdown the >> R_192.168.1.2  the resource below. I was running service heartbeat stop. >> It would just

Re: [Linux-HA] mgmtd on Ubuntu

2009-04-15 Thread Andrew Beekhof
On Wed, Apr 15, 2009 at 15:20, David Dumortier wrote: > Hi and thanks for the response, > > Michael Schwartzkopff a écrit : >> >> Upgrade to a recent version. heartbeat 2.99 and pacemaker 1.0.3. > > Target servers are production servers. Beta is not an option for me. Neither are beta versions
