Re: [Linux-HA] How can heartbeat notify me, if one node dead?

2009-04-15 Thread Andrew Beekhof
Just one email is fine thanks... no need to send it 3 times. You can probably achieve this with a couple of MailTo resources 2009/4/14 : > How can heartbeat notify me, if one node dead? Can I config it to send an > email to me? If can, how to do? > ___

Re: [Linux-HA] difference in cib.xml

2009-04-14 Thread Andrew Beekhof
On Wed, Apr 15, 2009 at 06:13, devi wrote: > This is because, my resources are virtual machines on 2 nodes. hence ip > will be different, so inorder to specify them as resources in > haresources.  Hence the cib.xml differs No, it doesn't. See the section "Using Rules to Control Resource Optio

Re: [Linux-HA] www.linux-ha.org lost DTD

2009-04-14 Thread Andrew Beekhof
On Tue, Apr 14, 2009 at 14:25, devi wrote: > Hi Andrew, > >        Can we have different cib.xml file between 2 nodes(which belongs to > same cluster), as per our requirement. No. > If no, then is there any > alternative to achieve this feature. I mean, can we achieve it using > cibadmin?  . No

Re: [Linux-HA] how to check HBA with heartbeat

2009-04-14 Thread Andrew Beekhof
On Fri, Apr 10, 2009 at 12:25, Cristina Bulfon wrote: > Dejan, > > I've followed your advice and I've moved to V2, first the software has been > updated to version 2.1.4. >  I just modified the following files > > - ha.cf, added the line >         crm yes > > - cib.xml has been produced using the

Re: [Linux-HA] difference in cib.xml

2009-04-09 Thread Andrew Beekhof
On Thu, Apr 9, 2009 at 07:27, devi wrote: > hi > > my cluster conatins 2 nodes, when I run the heartbeat , I am getting an > error message > > "cib_process_diff: Diff 0.1.1 -> 0.1.2 not applied to 0.1.1: Failed > application of a global update.  Requesting full refresh." > > I dont understand why

Re: [Linux-HA] Options of crm_mon

2009-04-08 Thread Andrew Beekhof
On Wed, Apr 8, 2009 at 08:52, Michael Schwartzkopff wrote: > Hi, > > if I am calling crm_mon with the option -? it gives me the help page. There > are the possible options "xtNTFHP" listed, but I did not find any > documentation > about these options. Any help here? Where can I find the missing d

Re: [Linux-HA] How to determine why resources aren't started?

2009-04-03 Thread Andrew Beekhof
I'd not be using master/slave resources with 2.1.4 Try getting the latest version of Pacemaker (which also lists failed operations in the crm_mon output) On Tue, Feb 10, 2009 at 16:21, Michael Rendell wrote: > Hi, > >  Am having problems determining why some resources are not started > by linux-h

Re: [Linux-HA] node ignored after reboot

2009-04-03 Thread Andrew Beekhof
Sorry, I've had to ignore Heartbeat based clusters for the last few weeks... There may have been a problem with 1.0.2, I never tested it with Heartbeat, but my testing this week indicates the current code should work. So you might want to consider updating... This looks suspicious though: heart

Re: AW: [Linux-HA] Heartbeat v2 stickiness, score and more

2009-04-02 Thread Andrew Beekhof
On Thu, Apr 2, 2009 at 16:14, Michael Schwartzkopff wrote: > Am Donnerstag, 2. April 2009 15:35:44 schrieb florian.engelm...@bt.com: >> I was now reading the great links Michael gave me and I learned a lot. >> But I am still a little confused about openAIS / Heartbeat / Pacemaker. >> >> I understo

Re: [Linux-HA] rule move resource if mysql status fails

2009-04-02 Thread Andrew Beekhof
Create a resource and a monitor action for mysql but set is-managed=false. This tells the cluster not to stop or start it, but only to check if it's running. Then simply add a colocation constraint from the other resource to the mysql one. Though I have to wonder why you'd not want the cluste
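The arrangement described in this reply could be sketched as a CIB fragment. The snippet below is only an illustration and assumes the Pacemaker 1.0 XML syntax (`meta_attributes`, and `rsc_colocation` with `rsc`/`with-rsc`); Heartbeat 2.1.x uses an older syntax. The resource names `mysql` and `app_ip`, the generated ids, the `ocf:heartbeat` agent choice and the 30s interval are all hypothetical.

```python
import xml.etree.ElementTree as ET


def unmanaged_monitored_primitive(rsc_id, ra_type, interval="30s"):
    """Build a <primitive> that the cluster monitors but never starts or stops."""
    prim = ET.Element("primitive", {
        "id": rsc_id,
        "class": "ocf",
        "provider": "heartbeat",  # assumption: an ocf:heartbeat agent is used
        "type": ra_type,
    })
    # is-managed=false: the cluster only observes the resource.
    meta = ET.SubElement(prim, "meta_attributes", {"id": f"{rsc_id}-meta"})
    ET.SubElement(meta, "nvpair", {
        "id": f"{rsc_id}-meta-is-managed",
        "name": "is-managed",
        "value": "false",
    })
    # A recurring monitor operation so that failures are noticed.
    ops = ET.SubElement(prim, "operations")
    ET.SubElement(ops, "op", {
        "id": f"{rsc_id}-monitor",
        "name": "monitor",
        "interval": interval,
    })
    return prim


def colocate_with(rsc_id, with_rsc_id):
    """Build a constraint that keeps rsc_id on the node where with_rsc_id runs."""
    return ET.Element("rsc_colocation", {
        "id": f"{rsc_id}-with-{with_rsc_id}",
        "rsc": rsc_id,
        "with-rsc": with_rsc_id,
        "score": "INFINITY",
    })


if __name__ == "__main__":
    for element in (unmanaged_monitored_primitive("mysql", "mysql"),
                    colocate_with("app_ip", "mysql")):
        print(ET.tostring(element, encoding="unicode"))
```

The printed fragments would still have to be placed in the resources and constraints sections of the CIB by hand or with cibadmin; the script itself does not touch the cluster.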

Re: [Linux-HA] Determining whether CIB is accessible

2009-04-02 Thread Andrew Beekhof
On Wed, Apr 1, 2009 at 18:21, Nicholas Dronen wrote: > On Wed, Mar 18, 2009 at 3:44 AM, Andrew Beekhof wrote: >> >> On Tue, Mar 17, 2009 at 16:10, Nicholas Dronen wrote: >> > Hi: >> > >> > I'm writing scripts to dynamically add, remove, manage

Re: [Linux-HA] Programmatic interface to CRM?

2009-04-02 Thread Andrew Beekhof
On Wed, Apr 1, 2009 at 17:24, wrote: > I am delighted to hear there is subscription mechanism! > I apologize if I have not read through the documentation > thoroughly. Could you please point me to the documentation > on CIB events and the subscription mechanism? > > What is the protocol by which

Re: [Linux-HA] 1 ip resource fail between master master mysql instance help

2009-04-01 Thread Andrew Beekhof
add a colocation constraint between the ip and mysql. On Wed, Apr 1, 2009 at 08:43, Martin Suehowicz wrote: > I am setting up a master/master mysql instance with 2 arrays. I would > like to setup a address to fail between the two of them. > I can setup a ip resource ok and I can get to fail over

Re: [Linux-HA] Programmatic interface to CRM?

2009-04-01 Thread Andrew Beekhof
On Tue, Mar 31, 2009 at 22:36, wrote: > Is there a programmatic interface to CRM available, by which > an application could query CIB and perhaps be notified (based > on callbacks or some such) about events such as transitions? you can subscribe to cib events. crm_mon does this to generate snmp

Re: [Linux-HA] Re:Two sets of Heartbeat HTTPD clusters on same subnet

2009-03-31 Thread Andrew Beekhof
On Tue, Mar 31, 2009 at 06:41, Arun G wrote: > Try assigning different udp port for broadcast in both the clusters. Won't be enough. You'd also need them to be running in different chroot environments. > > Default port used is #udpport        694 > > Regards, > Arun. > >> From: Devraj Mukherjee

Re: [Linux-HA] Newbie: Always reboots if I start heartbeat ...

2009-03-31 Thread Andrew Beekhof
Try "crm respawn" instead of "crm on" This should leave the node up long enough to allow you to figure out what the problem is. On Mon, Mar 30, 2009 at 16:34, Lothar Behrens wrote: > Hi, > > I am new to Linux-HA and have problems to proper configure my two machines > (ha1 and ha2). > > I have put

Re: [Linux-HA] could heartbeat notify by email about heartbeat status?

2009-03-27 Thread Andrew Beekhof
2009/3/27 可可熊 : > 2009/3/26 Andrew Beekhof : >> newer versions of pacemaker support this as part of the crm_mon daemon. >> the advantage of this method is that the daemon monitors _all_ resource >> events. >> >> it can also send snmp alerts. >> > &g

Re: [Linux-HA] OpenAIS, Heartbeat and Pacemaker: what exactly are they now?

2009-03-27 Thread Andrew Beekhof
On Thu, Mar 26, 2009 at 18:10, Jose Perez wrote: > Just to finish with questions. Do you have any technical comparision > between OpenAIS and Heartbeat? There was one, but I don't have it handy. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http

Re: [Linux-HA] maintenance-mode of pengine

2009-03-26 Thread Andrew Beekhof
On Mon, Mar 23, 2009 at 11:36, Dominik Klein wrote: > Michael Schwartzkopff wrote: >> Hi, >> >> In the metadata of the pengine I found the attribute maintenance-mode. I did >> not find any documentation about it. The long description also says: "Should >> the cluster ...". Anybody knows what this

Re: [Linux-HA] why crm_resource -C works, but gives error message?

2009-03-26 Thread Andrew Beekhof
On Wed, Mar 25, 2009 at 20:52, Dejan Muhamedagic wrote: > Hi, > > On Wed, Mar 25, 2009 at 07:38:12PM +0200, Juha Heinanen wrote: >> i have: >> >> j...@lenny1:~$ crm_mon -1 >> >> >> Last updated: Wed Mar 25 19:33:01 2009 >> Current DC: lenny2 (f13aff7b-6c94-43ac-9a24-b118e62d5325) >> V

Re: [Linux-HA] suicide in no-quorum-policy

2009-03-26 Thread Andrew Beekhof
On Wed, Mar 25, 2009 at 12:20, Dejan Muhamedagic wrote: > I guess that under some circumstances people would want to have a > node try to commit suicide in case of quorum lost. Perhaps the > user doesn't want to trust the node's ability to stop resources, > in which case this could serve as a quic

Re: [Linux-HA] could heartbeat notify by email about heartbeat status?

2009-03-26 Thread Andrew Beekhof
newer versions of pacemaker support this as part of the crm_mon daemon. the advantage of this method is that the daemon monitors _all_ resource events. it can also send snmp alerts. On Thu, Mar 26, 2009 at 03:18, 可可熊 wrote: > for example, there are two nodes, when one node down, another node >

Re: [Linux-HA] OpenAIS, Heartbeat and Pacemaker: what exactly are they now?

2009-03-26 Thread Andrew Beekhof
Best place to start is http://www.clusterlabs.org/wiki/Main_Page#Project_History followed soon after by http://www.clusterlabs.org/mediawiki/images/f/fb/Configuration_Explained.pdf On Thu, Mar 26, 2009 at 01:47, Jose Perez wrote: > Hi people: > > I'm new in HA Clustering world. I started googlin

Re: [Linux-HA] Score calculations ignored.

2009-03-20 Thread Andrew Beekhof
I'd highly recommend getting pacemaker 1.0 which (finally!) sorted out the scoring mess that failure_stickiness created. http://clusterlabs.org oh, and the new version also allows: ptest -L -s which will show you the current scores. On Fri, Mar 20, 2009 at 22:01, adam wrote: > Hi list- > > He

Re: [Linux-HA] Latest DTD?

2009-03-19 Thread Andrew Beekhof
On Wed, Mar 18, 2009 at 22:30, Michael Schwartzkopff wrote: > Am Mittwoch, 18. März 2009 10:56:49 schrieb Andrew Beekhof: >> On Wed, Mar 18, 2009 at 07:27, Michael Schwartzkopff > wrote: >> > Hi, >> > >> > where can I find information about the latest

Re: [Linux-HA] Running multiple instances of heartbeat

2009-03-18 Thread Andrew Beekhof
On Wed, Mar 18, 2009 at 00:18, Patrick LeBoutillier wrote: > Hi, > > I'd like to run multiple independant instances of heartbeat on some hosts. > After looking at the source, it seems the path to the config dir is hardcoded. > > Is there are way to do this? possibly in a chroot... but it is gener

Re: [Linux-HA] Latest DTD?

2009-03-18 Thread Andrew Beekhof
On Wed, Mar 18, 2009 at 07:27, Michael Schwartzkopff wrote: > Hi, > > where can I find information about the latest DTD. I am especially interested > in information about the crm_config section. > > The two sources of information I found were: > http://hg.clusterlabs.org/pacemaker/dev/file/tip/xml

Re: [Linux-HA] STONITH to the node which active(have some resources) and DC

2009-03-18 Thread Andrew Beekhof
On Wed, Mar 18, 2009 at 02:33, Junko IKEDA wrote: >> > I found the following stonithd behavior. >> > It might be an expected one, but I'm just wondering. >> > >> > My operation is here; >> > (1) start Heartbeat 2.1.4 on two nodes(dom-d1, dom-2). >> > (2) start the resource on active node(dom-d2),

Re: [Linux-HA] H.A. on SLES 11?

2009-03-18 Thread Andrew Beekhof
On Tue, Mar 17, 2009 at 21:30, lllact...@gmx.net wrote: > Andrew Beekhof wrote: > And where is the SLE-11-HAE to be found? Looked at opensues.org, > linux-ha.org, clusterlabs.org and Novell.com; or has it something > to do with Novell's "PlateSpin Orchestrate 2.0 H

Re: [Linux-HA] Determining whether CIB is accessible

2009-03-18 Thread Andrew Beekhof
On Tue, Mar 17, 2009 at 16:10, Nicholas Dronen wrote: > Hi: > > I'm writing scripts to dynamically add, remove, manage, and unmanage > resources from the CIB.  We already have scripts that start and stop > heartbeat.  What I'm doing is updating our code so we can use V2.  To > add resources to a V

Re: [Linux-HA] H.A. on SLES 11?

2009-03-17 Thread Andrew Beekhof
On Tue, Mar 17, 2009 at 18:17, Ciro Iriarte wrote: > 2009/3/17 lllact...@gmx.net : >> I have installed SLES 11 RC4. There are no packages for openAIS on the >> DVD, neither any sign of pacemaker. Do I have to install packages >> mentioned on the clusterlabs.org site? Are there advantages of these

Re: [Linux-HA] Re: Configurating STONITH device (how to avoid reset each other)

2009-03-17 Thread Andrew Beekhof
On Sun, Mar 15, 2009 at 19:54, Fabian Herschel wrote: > Hi, > > for heatbeat 2.1.4 there is IMHO no "out-of-the-box" solution for that > problem. > > I dont know, if the following method would be a valid method: > > Edit(!) the stonith script and add a sleep XX to the one of the nodes > stonith sc

Re: [Linux-HA] STONITH to the node which active(have some resources) and DC

2009-03-17 Thread Andrew Beekhof
On Mon, Mar 16, 2009 at 10:46, Junko IKEDA wrote: > Hi, > > I found the following stonithd behavior. > It might be an expected one, but I'm just wondering. > > My operation is here; > (1) start Heartbeat 2.1.4 on two nodes(dom-d1, dom-2). > (2) start the resource on active node(dom-d2), and dom-d2

Re: [Linux-HA] clvmd and SuSE

2009-03-17 Thread Andrew Beekhof
On Mon, Mar 16, 2009 at 17:37, Jan Kalcic wrote: > Hi All, > > will be clvmd available for SuSE correct > and integrated with hearbeat2? no > What I have > heard is EVMS will not longer be part of  SLES starting from SLE 11 instead > OpenAIS will. Does this answer my initial question? yes :)

Re: [Linux-HA] Manual Resource Migration creates a Constraint

2009-03-14 Thread Andrew Beekhof
On Sat, Mar 14, 2009 at 00:05, Jerome Yanga wrote: > When I manually migrate a group resource, a constraint is automatically > created.  Is there a way to avoid this? No. That's how migration is supposed to work. Once the resource has been moved, use crm_resource -U > > Here is the automaticall
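A rough sketch of the move-then-clear sequence mentioned in this reply. Only `crm_resource -M -r` and `crm_resource -U` appear in these threads; passing `-r` to `-U` as well, the helper names and the resource name `my_group` are assumptions made for illustration. The helpers only build argument lists and run nothing on their own.

```python
def migrate_cmd(resource):
    """Command that moves a resource; the cluster records a location constraint."""
    return ["crm_resource", "-M", "-r", resource]


def unmigrate_cmd(resource):
    """Command that removes the constraint left behind by an earlier move."""
    return ["crm_resource", "-U", "-r", resource]


if __name__ == "__main__":
    # Print the two commands in the order they would be issued by hand.
    for cmd in (migrate_cmd("my_group"), unmigrate_cmd("my_group")):
        print(" ".join(cmd))
```

The second command should be issued only after the resource has actually finished moving, which is why the sketch does not execute the two back to back.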

Re: [Linux-HA] Crm_mon and 'crm resource state' disagree

2009-03-13 Thread Andrew Beekhof
Sorry for the delay... if you happen to have the cluster in this state (still/again), can you please attach the result of cibadmin -Ql Both tools, I think, use the same underlying library for calculating the resource state so this really shouldn't be possible. On Thu, Mar 12, 2009 at 16:18, Nichol

Re: [Linux-HA] Meanings of node states

2009-03-12 Thread Andrew Beekhof
On Wed, Mar 11, 2009 at 00:05, Nicholas Dronen wrote: > Hi: > > Looking at crm_mon, I sometimes see a node listed as UNCLEAN (online) or > UNCLEAN (offline).  It looks like UNCLEAN (offline) means that the node > disappeared unexpectedly from the cluster.  How about UNCLEAN (online)? It usually me

Re: [Linux-HA] pacemaker 1.0.2 memory leak

2009-03-11 Thread Andrew Beekhof
On Wed, Mar 11, 2009 at 19:39, Pavel Georgiev wrote: >> Also yes :) >> You can either grab the latest sources or wait for 1.0.3 > > Any estimates when will that be out later this month > (I`m guessing the centos rpms will > be available shortly after the release)? same time as everyone else :)

Re: [Linux-HA] pacemaker 1.0.2 memory leak

2009-03-11 Thread Andrew Beekhof
On Wed, Mar 11, 2009 at 18:17, Pavel Georgiev wrote: > I've noticed that pacemaker`s /usr/lib/heartbeat/cib leaks ~200kb > every time a resource is migrated. I`ve setup a resource to fail ~ 2 > minutes after it is started and the cib proc quickly grows in size. > I`ve upgraded pacemaker to 1.0.2-1

Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

2009-03-11 Thread Andrew Beekhof
On Wed, Mar 11, 2009 at 18:41, Jerome Yanga wrote: > Thank you all.  I have been happy with the functionality of the setup that > you guys helped build. > > For reference, here are the versions that I am running. > > drbd-8.2.7-3 > drbd-debuginfo-8.2.7-3 > drbd-km-2.6.18_128.1.1.el5-8.2.7-3 > hea

Re: [Linux-HA] live migrate

2009-03-10 Thread Andrew Beekhof
fail counts. > > > > > > -Original Message- > From: linux-ha-boun...@lists.linux-ha.org > [mailto:linux-ha-boun...@lists.linux-ha.org]on Behalf Of Andrew Beekhof > Sent: Monday, 9 March 2009 7:03 PM > To: General Linux-HA mailing list > Subject: Re: [Linu

Re: [Linux-HA] live migrate

2009-03-09 Thread Andrew Beekhof
On Thu, Mar 5, 2009 at 21:09, David Pinkerton H wrote: > CIB as follows Looks sane enough. There's no colocation or ordering rules that would prevent a migration from occurring... Did any resource actions fail? (Look for ERROR: in the logs). Actually, re-reading the original email, what do you

Re: [Linux-HA] crm_mon vs cl_status

2009-03-08 Thread Andrew Beekhof
On Thu, Mar 5, 2009 at 15:24, Harakiri wrote: > > > > > --- On Thu, 3/5/09, Andrew Beekhof wrote: > >> From: Andrew Beekhof >> Subject: Re: [Linux-HA] crm_mon vs cl_status >> To: harakiri...@yahoo.com >> Cc: "Linux-HA mailing list" >>

Re: [Linux-HA] Generell STONITH resource configuration question

2009-03-06 Thread Andrew Beekhof
On Fri, Mar 6, 2009 at 19:00, Dejan Muhamedagic wrote: > Hi, > > On Fri, Mar 06, 2009 at 06:53:37PM +0100, Andrew Beekhof wrote: >> On Fri, Mar 6, 2009 at 13:27, Dejan Muhamedagic wrote: >> > Hi, >> >> Another option for such devices might be to use a Master/

Re: [Linux-HA] Generell STONITH resource configuration question

2009-03-06 Thread Andrew Beekhof
On Fri, Mar 6, 2009 at 13:27, Dejan Muhamedagic wrote: > Hi, >> Another option for such devices might be to use a Master/Slave and >> only have the master do monitoring. >> I wonder if the lrm hooks for stonith can handle this. > > Don't see any reason why they shouldn't. Does raexecstonith suppo

Re: [Linux-HA] crm_mon vs cl_status

2009-03-05 Thread Andrew Beekhof
On Mar 5, 2009, at 12:39 PM, Harakiri wrote: YES it _is_. The log messages above indicate the order heartbeat starts them in - anything after that is up to the scheduler of your OS. Regardless, the crmd and cib both have loops that retry opening connections to the services they require - with

Re: [Linux-HA] crm_mon vs cl_status

2009-03-05 Thread Andrew Beekhof
On Wed, Mar 4, 2009 at 17:15, Harakiri wrote: > > Thanks for answering, > > > --- On Wed, 3/4/09, Andrew Beekhof wrote: > >> >> crm_mon takes other things into account. >> but without logs or the current cib its impossible to say >> for sure why >

Re: [Linux-HA] live migrate

2009-03-05 Thread Andrew Beekhof
On Thu, Mar 5, 2009 at 01:35, David Pinkerton H wrote: > > Having an issue with live migrates: > > When I migrate a single domU (ie. crm_resource -M -r domU) the source dom0 > calls "migrate_to" and the target dom0 calls "migrate_from" - as expected. > If I execute several migrates at once, the s

Re: [Linux-HA] Re: New experimental debian repository

2009-03-04 Thread Andrew Beekhof
On Thu, Mar 5, 2009 at 06:33, Michael Schwartzkopff wrote: > Simon Horman schrieb: >> >> (...) >> I agree that it would be good to have a good repository for >> hb2.99/pacemaker on on Debian Stable/Lenny (as opposed to the efforts >> to get  hb2.99/pacemaker into Debian experimental and subsequent

Re: [Linux-HA] Re: New experimental debian repository

2009-03-04 Thread Andrew Beekhof
On Wed, Mar 4, 2009 at 09:24, Michael Schwartzkopff wrote: > Hi, > > these packets are compiled on lenny. I wrote a script that gets the latest > sources (see http://www.clusterlabs.org/wiki/Install#From_Source), patches > them to get rid of openais, compiles and builds the packages. > > Sorry for

Re: [Linux-HA] crm_mon vs cl_status

2009-03-04 Thread Andrew Beekhof
On Wed, Mar 4, 2009 at 01:18, Harakiri wrote: > > Hi, > > i got 2.1.4 to work on sparc solaris 10, the only issue left is that crm_mon > reports wrong node status (node as offline). Whereas cl_status works more > reliably in indicating that the local node is online. crm_mon takes other things i

Re: [Linux-HA] Strange problem with pingd

2009-03-02 Thread Andrew Beekhof
ent. On Mon, Mar 2, 2009 at 11:38, Michael Schwartzkopff wrote: > Am Montag, 2. März 2009 08:35:51 schrieb Andrew Beekhof: >> On Fri, Feb 27, 2009 at 16:04, Michael Schwartzkopff > wrote: >> > Am Freitag, 27. Februar 2009 15:21:34 schrieb Michael Schwartzkopff: >> >

Re: [Linux-HA] Strange problem with pingd

2009-03-01 Thread Andrew Beekhof
On Fri, Feb 27, 2009 at 16:04, Michael Schwartzkopff wrote: > Am Freitag, 27. Februar 2009 15:21:34 schrieb Michael Schwartzkopff: >> Hi, >> >> my system: debian lenny, heartbeat 2.99.2-1,  pacemaker 1.0.1-1. >> >> In ha.cf I have 2 ping nodes: >> ping 82.135.103.97 192.168.188.19 >> >> From the c

Re: [Linux-HA] Trigger a migrate in an OCF script

2009-03-01 Thread Andrew Beekhof
On Mon, Mar 2, 2009 at 03:09, David Pinkerton H wrote: > > How does heartbeat know a script is capable of doing a migrate instead of > stop/start.  I've been playing with the dummy script has it supports a > migrate_to/migrate_from but heartbeat never calls it. > > The only other migrate script

Re: [Linux-HA] Generell STONITH resource configuration question

2009-02-28 Thread Andrew Beekhof
On Fri, Feb 27, 2009 at 22:32, Andreas Mock wrote: >> -Ursprüngliche Nachricht- >> Von: "Andrew Beekhof" >> Gesendet: 16.02.09 11:23:53 >> An:  General Linux-HA mailing list >> Betreff: Re: [Linux-HA] Generell STONITH resource configuration ques

Re: [Linux-HA] showscores for pacamaker-1.0

2009-02-27 Thread Andrew Beekhof
On Fri, Feb 27, 2009 at 16:23, Michael Schwartzkopff wrote: > Hi, > > anybody knows where I can find Dominik's showscores script? Is there a more > actual version of that one of May 2008? > The may 2008 version results in a kind of broken output. > > Or is there a native pacemaker command that sho

Re: [Linux-HA] Pacemaker's pengine 1.0 crash.

2009-02-25 Thread Andrew Beekhof
See the following page for submitting bugs: http://clusterlabs.org/wiki/Help:Contents Be sure to include the backtrace. On Wed, Feb 25, 2009 at 18:03, Brice Figureau wrote: > Hi, > > I'm currently running Pacemaker 0.6.5 and I'm preparing an upgrade > toward 1.0-tip (currently 0de73ec89e02) with

Re: [Linux-HA] Heartbeat quorum question

2009-02-24 Thread Andrew Beekhof
On Mon, Feb 23, 2009 at 22:15, Pavel Georgiev wrote: > OK, having spent few days on this, I tried few things that did not work out > the way I hoped, so I`m going to give it another try here. It won't work. You can't do this.

Re: [Linux-HA] [Announce] DRBD Management Console

2009-02-23 Thread Andrew Beekhof
On Thu, Feb 19, 2009 at 17:01, Rasto Levrinc wrote: > Thursday 19 February 2009 12:42:02 pm Michael Schwartzkopff wrote: > >> >> There exists a CLI shell to CRM. Perhaps this one helps: >> >> http://hg.clusterlabs.org/pacemaker/dev/raw-file/tip/doc/crm_cli.txt > > It could be used to configure the

Re: [Linux-HA] Upper time limit for "start" method?

2009-02-19 Thread Andrew Beekhof
On Wed, Feb 18, 2009 at 14:51, Alexander Timofeev wrote: > I need to write RA for the service that takes 10 minutes to initialize. > OCF spec says that start method should not return until service is > completely ready to process clients requests. > > Questions: > > 1. Could the pengine process se

Re: [Linux-HA] Heartbeat score calculation

2009-02-18 Thread Andrew Beekhof
On Wed, Feb 18, 2009 at 22:10, Pavel Georgiev wrote: > Thanks, I`ll upgrade. > > Any input on issue (2) and (3) 2) later versions have the start-failure-is-fatal option (run: pengine metadata) 3) you need to erase the resource's operation history with crm_resource -C the -INFINITY score is comin

Re: [Linux-HA] [Announce] DRBD Management Console

2009-02-17 Thread Andrew Beekhof
Nice work. I particularly liked (what I presumed to be) the hierarchical view. Being able to be run out of a browser is a nice addition too. Is it just mgmtd protocol issues preventing it from being used with Pacemaker? (Obviously there is a new syntax available in 1.0 but it's still possible to run

Re: [Linux-HA] Heartbeat quorum question

2009-02-16 Thread Andrew Beekhof
On Tue, Feb 17, 2009 at 02:14, Pavel Georgiev wrote: > I have a setup with 4 nodes, 3 of them may run a resource in active/passive > mode and that resource should never run on the 4th node. I also have a > second resource which may run on either one of the nodes (again > active/passive). > > Since

Re: [Linux-HA] Generell STONITH resource configuration question

2009-02-16 Thread Andrew Beekhof
On Tue, Feb 10, 2009 at 16:06, Tobias Appel wrote: > Hi, > > sorry for so many posts about STONITH but the documentation (at least on > linux-ha.org) is somewhat lacking in this department. > > Just to get it right, I'm using IPMI as STONITH device. I have to set up > two resources in a 2-Node clu

Re: [Linux-HA] xen live migrate

2009-02-16 Thread Andrew Beekhof
On Thu, Feb 12, 2009 at 00:14, David Pinkerton H wrote: > > > Is there away to trigger a live migrate from within the xen OCF script? > > I'm running heartbeat 2.1.4 /xen 3.2.0 /drbd 8.2.6 and want to trigger a live > migrate (works manually) if drbd goes diskless (ie. access to san lost) crm_re

Re: [Linux-HA] Nodes shooting each other over and over (STONITH)

2009-02-16 Thread Andrew Beekhof
On Mon, Feb 9, 2009 at 11:36, Tobias Appel wrote: > Hi, > > I've configured stonith/ssh on my 2-node cluster (Heartbeat 2.1.4). This > is still in testing since I haven't happen to find a good STONITH device > yet. > Anyway, I ran some tests this morning and pulled the cross-over cable. > After a

Re: [Linux-HA] Anybody succeded in creating a debian package of pacemaker?

2009-02-16 Thread Andrew Beekhof
On Sun, Feb 15, 2009 at 13:51, Michael Schwartzkopff wrote: > Hi, > > I am trying to create a debian lenny package from pacemaker. When I use the > way advertised on clusterlabs.org > > dpkg-buildpackage -rfakeroot -uc -us > > I get an error about a missing file "service_crm.a". > > When I install

Re: [Linux-HA] Pingd stops working after a certain time

2009-02-13 Thread Andrew Beekhof
You hit a pingd bug. The counter wraps around and pingd wasn't able to handle it. This and the logging are fixed for the next version. On Thu, Feb 12, 2009 at 11:33, Tim Verhoeven wrote: > Hi, > > I had a strange problem with one of my clusters last night. As far as > I can see it it seems that the p

Re: [Linux-HA] Write requires Quorum, Setting up LVM resource fails

2009-02-06 Thread Andrew Beekhof
On Thu, Feb 5, 2009 at 17:49, sachin patel wrote: > > I am setting up heartbeat 2.1.3 on RHEL4.7 > > I have all four node up and running. IP resources fails over fine but when I > try to put LVM resources it says "write requires Quorum" The cluster is under the impression it doesn't have quorum.

Re: [Linux-HA] HA attempts to start the resource after "monitor" has returned an error.

2009-02-02 Thread Andrew Beekhof
On Thu, Jan 29, 2009 at 20:42, Alexander Timofeev wrote: > All, > > I have recently faced with strange behavior of the CRM. I have OCF compliant > RA ( ocf-tester considers it to be such ) . > It is supposed to fail over to other node on the very first failure and it > does. I noticed that my reso

Re: [Linux-HA] Resource restarting constantly while setting up heartbeat2

2009-02-02 Thread Andrew Beekhof
On Mon, Feb 2, 2009 at 18:52, akshat kansal wrote: > Hi all, > > > > I am facing a issue while setting up heartbeat version 2.0 using cib.xml > I am using two resources postgre and PCS. > Postgres is running fine,But the PCS resource is starting and stopping > continuously. > > *Issue: The hearbea

Re: [Linux-HA] The upper limit of cib.xml for cibadmin

2009-02-02 Thread Andrew Beekhof
One day we'll switch to the IPC code from openais... IIRC, it doesn't have a limit. Maybe for Pacemaker 1.2 On Tue, Feb 3, 2009 at 08:12, Junko IKEDA wrote: > Hi, > >> > > > We have 16 nodes, and the size of cib.xml is now about 150kbyte. >> > > > Heartbeat is 2.1.4. >> > > > >> > > > When I call

Re: [Linux-HA] The upper limit of cib.xml for cibadmin

2009-02-02 Thread Andrew Beekhof
On Tue, Feb 3, 2009 at 05:00, Junko IKEDA wrote: > Hi, > >> > > We have 16 nodes, and the size of cib.xml is now about 150kbyte. >> > > Heartbeat is 2.1.4. >> > > >> > > When I call cibadmin command, the following message comes. >> > > # cibadmin -U -x cib.xml >> > > No messages received in 30 sec

Re: [Linux-HA] The upper limit of cib.xml for cibadmin

2009-02-02 Thread Andrew Beekhof
On Mon, Feb 2, 2009 at 10:57, Junko IKEDA wrote: > Hi, > > We have 16 nodes, and the size of cib.xml is now about 150kbyte. > Heartbeat is 2.1.4. > > When I call cibadmin command, the following message comes. > # cibadmin -U -x cib.xml > No messages received in 30 seconds.. aborting > > Is the siz

Re: [Linux-HA] Failover not working as I expected

2009-02-02 Thread Andrew Beekhof
On Mon, Feb 2, 2009 at 08:21, Dominik Klein wrote: >> Moreover, even hb_gui shows that the services are bounced/restarted when a >> node joins the cluster. The status of the resources changes to "failed" for >> a second and changes back to "running on". > > Sounds like a bug in your RA to me.

Re: [Linux-HA] Failover troubles

2009-02-02 Thread Andrew Beekhof
On Fri, Jan 30, 2009 at 15:54, Alexander Timofeev wrote: > I have a linkup resource that can only run on node with external eth > connection present. > Linkup resource fails as soon as node looses external connectivity and fails > over to another node. > Then it tries to start there. It starts suc

Re: [Linux-HA] Can't get the ping_group to work - problem with location constraints and DRBD

2009-02-02 Thread Andrew Beekhof
On Fri, Jan 30, 2009 at 18:45, Dejan Muhamedagic wrote: > Hi, > > On Fri, Jan 30, 2009 at 01:12:09PM +0100, Tobias Appel wrote: >> On Fri, 2009-01-30 at 12:16 +0100, Dejan Muhamedagic wrote: >> > Hi, >> > >> > On Fri, Jan 30, 2009 at 11:35:54AM +0100, Tobias Appel wrote: >> > > Hi, >> > > >> > > I

Re: [Linux-HA] Linux-HA configuration on SLES 10.2 problem

2009-01-29 Thread Andrew Beekhof
Please update to something more recent On Thu, Jan 29, 2009 at 14:46, peteridah wrote: > > Hello, > > I have set up a 2-node heartbeat cluster on Suse Linux 10.2.I am using IBM > RSA slimlime II adapter cards on ibm x3655 servers connected to a SAN > device.So far I have set up an ext3 filesystem

Re: [Linux-HA] Re: Resource is started but gets never promoted

2009-01-29 Thread Andrew Beekhof
On Wed, Jan 28, 2009 at 16:43, Antonio wrote: > Hi, > > I have installed pacemaker. Ater I started it for the first time, I got > error messages that there was an error promoting my resource. From the log I > read that the promotion action was killed by lrmd. In the logs it said there > were some

Re: [Linux-HA] Failover not working as I expected

2009-01-29 Thread Andrew Beekhof
On Tue, Jan 27, 2009 at 22:04, Jerome Yanga wrote: > Dominik, > > Here is the status of the two concerns I needed help on. > > 01) When a node comes back up after a restart of heartbeat, resources gets > bounced when it rejoins the cluster. > STATUS: The resources still gets bounced when a node

Re: [Linux-HA] Remove Resources from CRM maintaining 'started' state

2009-01-29 Thread Andrew Beekhof
Probably time to create a bug and attach a hb_report archive (which will have the logs and saved configurations we'd need to figure out what happened). On Mon, Jan 26, 2009 at 21:49, Jordi Guijarro Olivares wrote: > On Mon, Jan 26, 2009 at 9:22 AM, Andrew Beekhof wrote: > >&g

Re: [Linux-HA] Some basic questions regarding heartbeat

2009-01-29 Thread Andrew Beekhof
On Wed, Jan 28, 2009 at 20:44, Michael Schwartzkopff wrote: > Am Mittwoch, 28. Januar 2009 18:22:31 schrieb Christian Schoepplein: >> - Is heartbeat-2 the software I'm looking for, can I build a HA cluster >> with apache, mysql and a central file storage with heartbeat-2 only? > > Definitely YE

Re: [Linux-HA] OCF RA and BASH arrays

2009-01-29 Thread Andrew Beekhof
On Mon, Jan 26, 2009 at 22:53, Ethan Bannister wrote: > > Can anyone tell me if heartbeat can handle a multiple elements within a > specific parameter in a OCF Agent using a BASH array? no - the parameters are name=value environment variables of course the RA can then parse them into whatever for
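Since parameters reach the agent as plain name=value environment variables, one way to get list-like input is to pack the elements into a single string and split it inside the agent. The sketch below shows the idea in Python rather than BASH. The `OCF_RESKEY_` prefix is the usual OCF convention for resource parameters; the parameter name `ip_list`, the helper name and the choice of separators are hypothetical.

```python
import os


def list_param(name, environ=None, sep=None):
    """Split the OCF parameter `name` into a list of non-empty items.

    With sep=None the value is split on any whitespace; otherwise on `sep`.
    A missing parameter yields an empty list.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(f"OCF_RESKEY_{name}", "")
    return [item.strip() for item in raw.split(sep) if item.strip()]


if __name__ == "__main__":
    # Simulated agent environment; a real agent would read os.environ.
    fake_env = {"OCF_RESKEY_ip_list": "10.0.0.1 10.0.0.2"}
    print(list_param("ip_list", fake_env))
```

Passing the environment as an argument keeps the helper easy to exercise outside a cluster; inside a real agent it would fall back to `os.environ`.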

Re: [Linux-HA] Linux-HA on solaris

2009-01-29 Thread Andrew Beekhof
On Tue, Jan 27, 2009 at 11:44, David Lee wrote: > >>> I've no idea whether there is any compatibility overlap between pacemaker >>> and OHAC. I suspect, sadly, that there might not be (i.e. that the >>> clustering world has split into two (or more) parts). >> >> Exactly what kind of split are yo

Re: [Linux-HA] Can't find cib.xml

2009-01-29 Thread Andrew Beekhof
On Fri, Jan 23, 2009 at 17:14, sachin patel wrote: > > I have installed redhat and ha packages. started hb_gui and configure simple > IPaddr2 resources. > gui says it is running fine > I can use cibadmin -Q and can see completed XML format output. but in > /var/lib/heartbeat/crm/cib.xml file is

Re: [Linux-HA] Linux-HA on solaris

2009-01-26 Thread Andrew Beekhof
On Mon, Jan 26, 2009 at 18:11, David Lee wrote: > On Tue, 20 Jan 2009, Michael Schwartzkopff wrote: > >> as far as I understood, Linux-HA / pacemaker should also compile on a non- >> Linux OS. Does it compile under OpenSolaris? Any experience? Is this code >> still working? > > (Apologies for the

Re: [Linux-HA] VM as HA resource going unmanaged in SLES10 SP2, XEN kernel, iSCSI SAN , OCFS2 , LinuxHA

2009-01-26 Thread Andrew Beekhof
There are no logs in here... hard to comment without them. On Tue, Jan 20, 2009 at 15:13, Lazer, Joshey wrote: > Hi, > > > Not sure of the cause for the below behaviour; any help or pointer is > appreciated. > Do I need to modify some parameters, or change the way I did the > setup? > >

Re: [Linux-HA] Monitor Operation should restart but resource goes into failed state?

2009-01-26 Thread Andrew Beekhof
We'd need at least the logs to be able to help you. On Mon, Jan 26, 2009 at 12:59, Tobias Appel wrote: > Hi, > > I've added Monitor Operations to most of my resources and status > operations to the ones I only have an LSB script for. > > I then stopped the resource not via the cluster but just via

Re: [Linux-HA] resource_stickiness and groups - how it is calculated?

2009-01-26 Thread Andrew Beekhof
On Mon, Jan 26, 2009 at 12:10, Tobias Appel wrote: > Well I've got a lot of questions today as you can see :) > > I have a group of resources which is ordered and colocated (due to drbd > master / slave constraint). I added a monitor operation to nearly all > members of this group with a on_fail r

Re: [Linux-HA] cibadmin won't parse input file

2009-01-26 Thread Andrew Beekhof
What version of the software are you running, what does the rest of the config look like, and what was the error? On Mon, Jan 26, 2009 at 11:59, Tobias Appel wrote: > Hi, > > cibadmin just won't parse my input file, I've rewritten it twice now and > can't spot the error - maybe I haven't had enough

Re: [Linux-HA] Problem in swichover DRBD disk

2009-01-26 Thread Andrew Beekhof
On Wed, Jan 21, 2009 at 14:50, Stefano Bossi wrote: > Hi, > > this is my first attempt to set up an HA cluster. I see strange behavior on > my > drbd disks. > When I fail over the cluster the DRBD Master node moves correctly but the > Slave > one doesn't start. > To simulate the failover I change

Re: [Linux-HA] Remove Resources from CRM maintaining 'started' state

2009-01-26 Thread Andrew Beekhof
On Sat, Jan 24, 2009 at 10:04, Jordi Guijarro Olivares wrote: > Hi, > > To minimize downtime and maximize availability in a mixed Xen HA environment > (not all resources managed by heartbeat) it's interesting to me remove > resources maintaining the state of the resource. The cluster can also do

Re: [Linux-HA] Resource is started but gets never promoted

2009-01-24 Thread Andrew Beekhof
On Thu, Jan 22, 2009 at 14:31, Darren Mansell wrote: > On Thu, 2009-01-22 at 13:51 +0100, Andrew Beekhof wrote: >> >> > If I wanted >> > the most stable version on my SLES 10 SP2 cluster should I install >> > packages from outside of the Suse official packag

Re: [Linux-HA] Crash with hertbeat 2.99.3

2009-01-23 Thread Andrew Beekhof
On Fri, Jan 23, 2009 at 12:03, Michael Schwartzkopff wrote: > On Friday, 23 January 2009 11:41:18, Andrew Beekhof wrote: >> On Fri, Jan 23, 2009 at 11:35, Michael Schwartzkopff > wrote: >> > It seems that cib process cannot start. What might be the reason? >> >>

Re: [Linux-HA] Two-node clusters in split-sites

2009-01-23 Thread Andrew Beekhof
? Silly me, I should have thought of that. Sure. We don't care what "hardware" your cluster runs on :-) > -Original Message- > From: linux-ha-boun...@lists.linux-ha.org > [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Andrew Beekhof > Sent: Freit

Re: [Linux-HA] Crash with hertbeat 2.99.3

2009-01-23 Thread Andrew Beekhof
On Fri, Jan 23, 2009 at 11:59, Michael Schwartzkopff wrote: > On Friday, 23 January 2009 11:41:18, Andrew Beekhof wrote: >> On Fri, Jan 23, 2009 at 11:35, Michael Schwartzkopff > wrote: >> > It seems that cib process cannot start. What might be the reason? >> >>

Re: [Linux-HA] Crash with hertbeat 2.99.3

2009-01-23 Thread Andrew Beekhof
On Fri, Jan 23, 2009 at 11:35, Michael Schwartzkopff wrote: > It seems that cib process cannot start. What might be the reason? Were there any logs at all from the cib? You didn't include any for the latest post. Also, use "crm respawn" instead of "crm yes" to avoid heartbeat rebooting the node

Re: [Linux-HA] Two-node clusters in split-sites

2009-01-23 Thread Andrew Beekhof
On Fri, Jan 23, 2009 at 09:09, Hell, Robert wrote: > Hi, > > I'm wondering that quorum server doesn't work, because Alan Robertson > mentioned it in his LinuxWorld 08 talk. Can someone prove that it isn't > working? I think Lars has done this in the past. > Third node solution: I wouldn't be a

Re: [Linux-HA] cib.xml is in sync, yet it won't failover resources

2009-01-22 Thread Andrew Beekhof
On Thu, Jan 15, 2009 at 23:56, alexus wrote: > Hello, > > Would someone help me try to understand the problem i'm having > I'm using: > > [r...@mail2 ~]# cat /etc/redhat-release > Red Hat Enterprise Linux Server release 5.2 (Tikanga) > [r...@mail2 ~]# uname -a > Linux uftwfmail2 2.6.18-92.el5xen #

Re: [Linux-HA] Pacemaker-GUI despair

2009-01-22 Thread Andrew Beekhof
On Thu, Jan 22, 2009 at 21:03, Michael Schwartzkopff wrote: > Hi, > > I am quite desperate already. I compiled my Pacemaker-GUI. Compile and install > are OK, besides a small error with gv.py. (I finally removed gv from the > import > line) > > When I call /usr/heartbeat-gui/haclient.py I get the
