Re: [Pacemaker] Patches for VirtualDomain RA

2011-08-08 Thread Dominik Klein
1) During the stop operation, libvirt occasionally returns an error because the state cannot be determined at the very moment the machine shuts down. This patch makes the RA query the state one more time. If the machine is down by then, everything is OK. 2) The next problem is that a graceful shutd
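The retry described in point 1 could look roughly like the sketch below. It is not the patch itself: `query_state_with_retry`, the `RETRY_DELAY` variable and the command passed as the first argument are hypothetical names, standing in for whatever the RA really uses to ask libvirt for the domain state.

```shell
# Hypothetical helper, not taken from the VirtualDomain RA.
# $1 is the command that prints the domain state (an assumption for this sketch).
# If the first query fails, wait briefly and ask exactly one more time.
query_state_with_retry() {
    state_cmd=$1
    if ! out=$($state_cmd 2>/dev/null); then
        sleep "${RETRY_DELAY:-1}"
        out=$($state_cmd 2>/dev/null) || return 1
    fi
    printf '%s\n' "$out"
}
```

A caller would treat a non-zero return as a real error only after the second attempt has failed as well.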

Re: [Pacemaker] Location issue

2011-06-08 Thread Dominik Klein
e role="slave" -inf: #uname ne drbd3 > > result is identical, pacemaker try launch slave role on other nodes:-((( > > > 2011/6/8 Dominik Klein mailto:d...@in-telegence.net>> > > >> but when i shutdown drbd3 host Pacemaker try start slave role on >

Re: [Pacemaker] Location issue

2011-06-08 Thread Dominik Klein
>> but when i shutdown drbd3 host Pacemaker try start slave role on >> other host. How can i prevent this behavior? > > try > s/inf/-inf > s/eq/neq "ne" actually, sorry ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.or

Re: [Pacemaker] Location issue

2011-06-08 Thread Dominik Klein
On 06/08/2011 10:39 AM, ruslan usifov wrote: > Hello > > I have follow constraint: > > location ms_drbd_web-U_slave_on_drbd3 ms_drbd_web-U \ > rule role="slave" inf: #uname eq drbd3 > > > Which as i think it prevents slave role from launch on all hosts except > drbd3, nope it says "pu

Re: [Pacemaker] init script VS pacemaker to start a service

2011-06-08 Thread Dominik Klein
On 06/07/2011 07:09 PM, CeR wrote: > Hi there! > > I have some doubts, hope you folks can help me. > > In a system I have two (or more) ways to start a daemon: > A) /etc/init.d/ script. The service could be started by the system > (/etc/rcX) or by me manually. > B) The daemon has an executable

Re: [Pacemaker] Statefull firewall cluster Active/Pasive with conntrackd issues

2011-05-11 Thread Dominik Klein
netfilter is smarter than you think it is. It can distinguish between packet flows forming an "allowed flow" and actually invalid packets. That's default behaviour. This only works if there's no helper module needed. So with the likes of NAT or FTP connections, this will not work without conntrack

Re: [Pacemaker] Reboot node with stonith after killing a corosync-process?

2011-04-15 Thread Dominik Klein
Hi On 04/15/2011 09:05 AM, Tom Tux wrote: > I can reproduce this behavior: > > - On node02, which had no resources online, I killed all corosync > processes with "killall -9 corosync". > - Node02 was rebootet through stonith > - On node01, I can see the following lines in the message-log (line 6

[Pacemaker] [patch] low: ping RA: Make timeouts configured with unit work

2011-04-13 Thread Dominik Klein
Hi when the "ping" RA is configured as primitive ping ocf:pacemaker:ping timeout="5s" it throws [: 5s: integer expression expected This patch fixes configurations where timeout is configured with a unit following the number. hth Dominik exporting patch: # HG changeset
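The error comes from handing a value such as 5s to an integer test in the shell. One way to normalise such a value is sketched below. This is an illustration only and not the patch from this mail; `timeout_to_seconds` is a hypothetical name, and the set of accepted suffixes is an assumption.

```shell
# Hypothetical helper: turn a timeout like "5s", "2000ms" or "2min"
# into whole seconds, so it can be used in integer comparisons.
timeout_to_seconds() {
    case "$1" in
        *ms)  echo $(( ${1%ms} / 1000 )) ;;   # milliseconds, rounded down
        *sec) echo "${1%sec}" ;;
        *s)   echo "${1%s}" ;;
        *min) echo $(( ${1%min} * 60 )) ;;
        *m)   echo $(( ${1%m} * 60 )) ;;
        *)    echo "$1" ;;                    # plain number, taken as seconds
    esac
}

# With the unit stripped, the integer test no longer fails:
if [ "$(timeout_to_seconds 5s)" -gt 0 ]; then
    echo "timeout ok"
fi
```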

Re: [Pacemaker] operative tasks for a pacemaker cluster

2011-04-13 Thread Dominik Klein
> Can you file a bug for that? http://developerbugs.linux-foundation.org/show_bug.cgi?id=2582

Re: [Pacemaker] operative tasks for a pacemaker cluster

2011-04-12 Thread Dominik Klein
> Were those 7000 pe-inputs all created over that 7 day period? Because > that's a transition every 1.44 minutes. Might be the recheck-interval? In a cluster of mine I have a recheck-interval of 5 minutes and see a new pe-input.bz2 in /var/lib/pengine every 5 minutes. cibadmin -Q|grep recheck
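The figures quoted here can be checked with a little arithmetic. The sketch below uses two hypothetical helper names and assumes one pe-input file per transition; it reproduces the 1.44 minutes from the quoted question and shows how many files a 5 minute recheck-interval alone would produce in a week.

```shell
# Average minutes between transitions: $1 = days of runtime, $2 = number of pe-input files.
minutes_per_transition() {
    awk -v days="$1" -v count="$2" 'BEGIN { printf "%.2f\n", days * 24 * 60 / count }'
}

# Files expected in $1 days if one is written every $2 minutes.
files_per_period() {
    echo $(( $1 * 24 * 60 / $2 ))
}

minutes_per_transition 7 7000   # 7000 files over 7 days
files_per_period 7 5            # recheck-interval of 5 minutes
```

Under these assumptions, a 5 minute recheck-interval accounts for about 2016 files per week, so it would explain only part of 7000.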

Re: [Pacemaker] pacemaker keeps crashing

2011-03-16 Thread Dominik Klein
On 03/15/2011 03:51 PM, Andrew Beekhof wrote: > On Tue, Mar 15, 2011 at 2:35 PM, Dominik Klein wrote: >> Hi >> >> I installed a new 3 node cluster today. I used the instructions on the >> install page from the wiki and up to "corosync start" everythin

[Pacemaker] pacemaker keeps crashing

2011-03-15 Thread Dominik Klein
Hi I installed a new 3 node cluster today. I used the instructions on the install page from the wiki and up to "corosync start" everything went smoothly. At that point, apparently the following loop of corosync spawning pacemaker and pacemaker crashing starts. See logs on http://pastebin.com/VayyqZ

Re: [Pacemaker] Resource-Monitoring with an "On Fail"-Action

2010-03-17 Thread Dominik Klein
-100 node1 10003 > MySQL_MonitorAgent_Resource 100 node2 10003 > > I also saw, that the "last-run"-entry (crm_mon -fort1) for this > resource is not up-to-date. For me it seems, that the monitor-action > does not occu

Re: [Pacemaker] Which Linux to use for cluster

2010-03-17 Thread Dominik Klein
Hi Norbert I don't know what you did in 11.2, but I'll try to tell you what I do. I'm mostly still on 11.1 and use the clusterlabs repo. After installing the operating system from scratch, pretty much all I do is follow the install page from the wiki http://clusterlabs.org/wiki/Install ie zy

Re: [Pacemaker] Resource-Monitoring with an "On Fail"-Action

2010-03-16 Thread Dominik Klein
Tom Tux wrote: > Hi > > I've have a question about the resource-monitoring: > I'm monitoring an ip-resource every 20 seconds. I have configured the > "On Fail"-action with "restart". This works fine. If the > "monitor"-operation fails, then the resource will be restartet. > > But how can I define

Re: [Pacemaker] [Patch showscores.sh]

2010-03-15 Thread Dominik Klein
err, yeah. That wasn't right. Use this one. Regards Dominik Dominik Klein wrote: > Minor Update. Just noticed it doesn't display stickiness=0 if stickiness > is unset. So failcount and migration-threshold columns were mixed up. > > Patch against stable-1.0 > >

[Pacemaker] [Patch showscores.sh]

2010-03-15 Thread Dominik Klein
Minor Update. Just noticed it doesn't display stickiness=0 if stickiness is unset. So failcount and migration-threshold columns were mixed up. Patch against stable-1.0 Regards Dominik exporting patch: # HG changeset patch # User Dominik Klein # Date 1268639542 -3600 # Branch stable-1.0 #

Re: [Pacemaker] Breaking pacemaker

2010-02-16 Thread Dominik Klein
jimbob palmer wrote: > Hello, > > I have a cluster that is all working perfectly. Time to break it. > > This is a two node master/slave cluster with drbd. Failover between > the nodes works backwards and forwards. Everything is happier than a > well fed cat. > > I wanted to see what would happen

Re: [Pacemaker] High load issues

2010-02-05 Thread Dominik Klein
Just for the record: heartbeat (3.0.2) was not able to recover either. It also manages to see a failure on the dead node but fails to recover. Regards Dominik

Re: [Pacemaker] High load issues

2010-02-05 Thread Dominik Klein
> But generally I believe this test case is invalid. I might agree here that this test case does not necessarily reproduce what happened on my production system (unfortunately I do not know for sure what happened there, the dev who caused this just tells me he used some stupid sql statement and ev

Re: [Pacemaker] Fwd: [Cluster-devel] Organizing Bug Squash PartyforCluster 3.x, GFS2 and more

2010-02-03 Thread Dominik Klein
Koch, Sebastian wrote: > Ahh great, that's good news. I've never been to australia hehe. If it would > be in germany or maybe austria i will participate and try my best to help > squash bugs. But i am no dveeloper i am more a technician. I may be wrong here, but I think this "party" will have no

Re: [Pacemaker] Fwd: [Cluster-devel] Organizing Bug Squash Partyfor Cluster 3.x, GFS2 and more

2010-02-02 Thread Dominik Klein
Koch, Sebastian wrote: > Hi, > > i am kind of new in the whole cluster stuff but i would like to > participateand contribute. But the main questions in which country ;-) I'd guess in #linux-cluster country, no? :)

Re: [Pacemaker] APC Master Stonith

2010-01-21 Thread Dominik Klein
Sander van Vugt wrote: > Hi, > > On Wed, 2010-01-20 at 07:56 +0100, Dominik Klein wrote: >> Errol Neal wrote: >>> On Tue, Jan 19, 2010 04:19 PM, Sander van Vugt >>> wrote: >>>> Hi, >>>> >>>> I hope someone has configu

Re: [Pacemaker] APC Master Stonith

2010-01-19 Thread Dominik Klein
Errol Neal wrote: > On Tue, Jan 19, 2010 04:19 PM, Sander van Vugt wrote: >> Hi, >> >> I hope someone has configured the APC Master Stonith resource (which you >> would use to have pacemaker to a device like the APC switched rack PDU), >> as I have a - probably extremely stupid - conceptual quest

Re: [Pacemaker] corosync init script broken

2009-12-28 Thread Dominik Klein
Dejan, thanks for your quick answer. > There has been recently an update to the corosync init script. I > think that it was actually written from scratch. It should be > included in release 1.2.0. Do you have that version? # rpm -qa|grep coro libcorosync-1.1.2-1 corosync-1.1.2-1 That's what cam

[Pacemaker] corosync init script broken

2009-12-28 Thread Dominik Klein
Hi cluster people been a while, couldn't really follow things. Today I was tasked to install a new cluster, went for 1.0.6 and corosync as described on the wiki and hit this: New cluster with pacemaker 1.0.6 and latest available corosync from the clusterlabs.org/rpm opensuse 11.1 repo. This instal

Re: [Pacemaker] crm_mon not refreshing

2009-11-30 Thread Dominik Klein
crm_mon is event-driven now. For a pretty long time actually. So unless something changes, you won't see a change in crm_mon. Regards Dominik Joseph, Lester wrote: > Hi, > > I have pacemaker 1.0.6 running with heartbeat 3.0.1. > Noticed that crm_mon is not refreshing anymore, even when I specif

Re: [Pacemaker] Looking for correct constraints

2009-10-30 Thread Dominik Klein
> Maybe set a cluster-wide attribute, which, when set, does not allow res2 > to run. Ie rule with score -infinity. > > res1 could remove this attribute while starting and set this attribute > when stopping. This does not make any sense. Sorry, let me try again. res1 start = set attribute res1 st

Re: [Pacemaker] Looking for correct constraints

2009-10-30 Thread Dominik Klein
Michael Schwartzkopff wrote: > Am Freitag, 30. Oktober 2009 13:26:35 schrieb Lars Marowsky-Bree: >> On 2009-10-30T13:19:52, Michael Schwartzkopff wrote: >>> I have a three node cluster. I have two resources that are not allowed to >>> run together in the cluster. Basically resource2 is a failover

Re: [Pacemaker] strange behaviour for ssh and eth0

2009-10-28 Thread Dominik Klein
gilberto migliavacca wrote: > Hi Dominik > > How can I configure the node's ips as cluster resources? > > sorry for the silly question but I'm a newbye in this field > > thanks in advance > > gilberto > > Dominik Klein wrote: >> gilberto mig

Re: [Pacemaker] strange behaviour for ssh and eth0

2009-10-28 Thread Dominik Klein
gilberto migliavacca wrote: > Hi > > > I have 2 nodes and 1 node that I'm using just > to manage the cluster. > > I started up the nodes and created the following > configuration : > > > node custdevc03.funambol.com > node custdevc04.funambol.com > node custdevc05.funambol.com > primitive res.

Re: [Pacemaker] bug: multi state and target-role=started results in promote

2009-10-15 Thread Dominik Klein
> i thought that for multistate resources, Started == Slave. > am i mistaken? did this change some time ago? Afaik, that was only true for status display in crm_mon. But also, that was fixed quite a while ago. Regards Dominik

Re: [Pacemaker] Migration, constraints and failback off

2009-09-01 Thread Dominik Klein
Diego Woitasen wrote: > HI > I'm building a two node cluster with Xen, DRBD and > Pacemaker+Heartbeat. I've set default_resouce_stickiness to INFINITY > to disable failback (I want to handle it manually). When I want to > migrate a resource I execute > > crm resource migrate gw-piso-lab > > and >

Re: [Pacemaker] "rc=-2" when executing monitors

2009-08-26 Thread Dominik Klein
Roberto Suarez Soto wrote: > El día Wed, 26 Aug 2009 21:38:19 +1000, Tim Serong > escribía: > >>> we've recently deployed a two-node cluster using pacemaker, and >>> we're seeing a strange thing in the logs: from time to time, the monitor >>> operation fails with "rc=-2". This is an example:

Re: [Pacemaker] Slave does not get become Master after unplugging power cable at master

2009-08-18 Thread Dominik Klein
hj lee wrote: > Thank very much for the reply. > > I tested it both stonith-enabled and no-quorum-policy. As Dejan pointed, > this is related to stonith-enabled. With stonith-enabled true (which is > default), > if I kill the master node, the slave stays as a slave, it seems expecting > something

Re: [Pacemaker] Problem with ocf:heartbeat:mysql

2009-08-17 Thread Dominik Klein
Dominik Klein wrote: > Michal wrote: >> Hi, >> When I try to start mysql with config: >> primitive drbd1 ocf:heartbeat:drbd \ >> params drbd_resource=db \ >> op monitor role=Master interval=59s timeout=30s \ >> op monitor role=Slave interval=60s timeout=30s

Re: [Pacemaker] Problem with ocf:heartbeat:mysql

2009-08-17 Thread Dominik Klein
Michal wrote: > Hi, > When I try to start mysql with config: > primitive drbd1 ocf:heartbeat:drbd \ > params drbd_resource=db \ > op monitor role=Master interval=59s timeout=30s \ > op monitor role=Slave interval=60s timeout=30s > > ms ms-drbd1 drbd1 \ > meta clone-max=2 master-max="1" master-node

Re: [Pacemaker] RFC: Better error reporting for RAs.

2009-08-03 Thread Dominik Klein
> Though I don't see the point, grepping for the resource id is usually > just as effective. I totally agree here. I have helped quite a few people understand their problems on IRC and grepping the resource id usually works well. > I'd suggest focusing on improving the error logging that most RAs

Re: [Pacemaker] cibadmin vs. crm

2009-07-08 Thread Dominik Klein
Both work, in fact crm uses cibadmin in the background for some commands. crm uses readable, easier to remember syntax and commands, whereas cibadmin needs xml input (at least most of the time). So it's basically a question of preference. Regards Dominik Ryan Steele wrote: > My apologies if thi

Re: [Pacemaker] stonith reboot behavior

2009-06-22 Thread Dominik Klein
Hi Dan Dan Urist wrote: > My apologies if this is documented somewhere-- I've looked and haven't > found it. > > What happens if a stonith reboot fails? Does it retry, and if so how > many times and with what timeout and is that configureable? > > I have some hardware that has a buggy raid card

Re: [Pacemaker] Some showscores.sh questions

2009-05-18 Thread Dominik Klein
>> Whether it's in an RPM or not, could the author add a license header to it? > > dk: what license do you want? Just use what you use for all the cluster code. Regards Dominik

Re: [Pacemaker] Newbie question

2009-05-14 Thread Dominik Klein
Sorry, I misunderstood your question. When you said "pull the plug" I thought of the network connection and that is what pingd could help you with. If you pull the power plug, you should probably look into what beekhof told you. Sorry again, Dominik Dominik Klein wrote: > Hi

Re: [Pacemaker] Newbie question

2009-05-14 Thread Dominik Klein
Hi Mark The keyword you're looking for is "pingd". This example should get you going: http://www.clusterlabs.org/wiki/Example_configurations#Failover_IP__Service_in_a_Group_running_on_a_connected_node Regards Dominik Mark Schenk wrote: > Hello All, > > I'm new to pacemaker so please forgive m

Re: [Pacemaker] How to create OCF Resource Agents

2009-04-27 Thread Dominik Klein
Paul Osier wrote: > I'm trying to create an OCF resource agent that will start/stop/monitor > SER. I've read through the opencf.org resource agent api doc and the > wiki.linux-ha.org OCF resource agent doc and all those documents talk > about is what is needed in the resource agent, not necessari

Re: [Pacemaker] Resource won't switch from one node to another during power shutdown

2009-04-24 Thread Dominik Klein
Andrew Beekhof wrote: > On Fri, Apr 24, 2009 at 09:49, Juha Heinanen wrote: >> Andrew Beekhof writes: >> >> > +crm_config_err("NOTE: Clusters with shared data need STONITH to >> > ensure data integrity"); >> >> is special stonith hardware a must or is there some poor man's stonith >> sol

Re: [Pacemaker] drbd : returned 8 (master) instead of the expected value: 7 (not running)

2009-04-19 Thread Dominik Klein
Thomas Mueller wrote: > hi > > i'm using pacemaker 1.0.3 / hb 2.9.2-sle11rc9 on debian etch. > > altough everything is working like expected, these warnings pop up every > 5 minutes (the reckeck interval): > > Apr 17 07:04:52 ib002 crmd: [31460]: WARN: do_state_transition: > Progressed to st

Re: [Pacemaker] [Linux-HA] showscores.sh for pacemaker 1.0.2

2009-04-01 Thread Dominik Klein
Bruno Voigt wrote: > Hi Dominik, > > I use your script occasionally, > together with Pacemaker packaged for Debian by martin.loschw...@linbit.com. > > When running the new version I get as first output line: > tail: cannot open `+2' for reading: No such file or directory > and then the resource

Re: [Pacemaker] [Linux-HA] showscores.sh for pacemaker 1.0.2

2009-04-01 Thread Dominik Klein
So here's an update. Michael Schwartzkopf pointed out a bug regarding groups. That has been fixed now and the appropriate values should be shown. Thanks! There's not been a lot of feedback, is it because nobody uses the script or does it just work for you? Regards Dominik Dominik K

Re: [Pacemaker] couple of comments/questions on DRBD HowTo 1.0

2009-03-30 Thread Dominik Klein
Juha Heinanen wrote: > Dominik Klein writes: > > > The bug has been reported to Dejan (the crm shell dev) and he will > > fix it. > > are all bugs fixed also in OpenAIS 0.80.x branch (whitetank), which is > labelled on openais.com site as the stable release?

Re: [Pacemaker] couple of comments/questions on DRBD HowTo 1.0

2009-03-30 Thread Dominik Klein
Dominik Klein wrote: > Juha Heinanen wrote: >> Lars Ellenberg writes: >> >> > If that "Lars?" meant me, yes, please, >> > go ahead an delete outdated examples. >> > Replace with a reference to the drbd users guide >> > http://www.d

Re: [Pacemaker] restart of resource is not attempted

2009-03-29 Thread Dominik Klein
Juha Heinanen wrote: > i moved all my resources to the standby node. on this node, mysql > resource had a problem that prevented it from starting. i fixed the > problem and assumed that pacemaker would now automatically start mysql, > but it does not even try. it gave up after the first error ev

Re: [Pacemaker] couple of comments/questions on DRBD HowTo 1.0

2009-03-29 Thread Dominik Klein
Juha Heinanen wrote: > Lars Ellenberg writes: > > > If that "Lars?" meant me, yes, please, > > go ahead an delete outdated examples. > > Replace with a reference to the drbd users guide > > http://www.drbd.org/docs/about/ or > > http://www.drbd.org/docs/install/ > > how about the webserver

Re: [Pacemaker] couple of comments/questions on DRBD HowTo 1.0

2009-03-27 Thread Dominik Klein
Lars Ellenberg wrote: > On Wed, Mar 18, 2009 at 10:17:24AM +0100, Dominik Klein wrote: >> Juha Heinanen wrote: >>> "Prerequisites" section says that "DRBD must not be started by init.". In >>> Debian lenny at least, drbd init script load drbd mod

Re: [Pacemaker] Need your help in debugging

2009-03-26 Thread Dominik Klein
Priyanka Ranjan wrote: > Hi All, > > i am facing issue in ilo stonith. i have configured ilo stonith in my > cluster. it is running fine but it is not stonithing the errant node. in > case of failure , the syslong message on DC says that "we can't manage this > node" > > with same parameters va

Re: [Pacemaker] Monitor a resource without the cluster reacting to the result...

2009-03-25 Thread Dominik Klein
Joe Bill wrote: > Hi Dominik! > > dk at in-telegence wrote: >>> I'd love to see something like: >>> >>> # crm_resource -m check_level resource_id >>> .. >> This should be possible: >> >> export OCF_ROOT=/usr/lib/ocf >> export OCF_RESKEY_= >> export OCF_RESKEY_= >> $OCF_ROOT/resource.d// monitor >>
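The manual check quoted above sets OCF_ROOT, exports the resource parameters as OCF_RESKEY_ variables and calls the agent with "monitor". A minimal sketch of that idea follows. `run_monitor` is a hypothetical wrapper, not a Pacemaker tool. The provider, agent and parameter names are whatever the caller supplies. Exit code 0 means the resource is running, 7 means it is not running.

```shell
# Hypothetical wrapper: run a resource agent's monitor action by hand,
# outside the cluster, so the cluster does not react to the result.
# Usage: run_monitor <provider> <agent> [param=value ...]
run_monitor() {
    provider=$1
    agent=$2
    shift 2
    (
        # Subshell keeps the exported variables out of the caller's environment.
        export OCF_ROOT="${OCF_ROOT:-/usr/lib/ocf}"
        for kv in "$@"; do
            export "OCF_RESKEY_$kv"
        done
        "$OCF_ROOT/resource.d/$provider/$agent" monitor
    )
}
```

The return code of the wrapper is the return code of the agent, so it can be inspected with $? right after the call.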

Re: [Pacemaker] Monitor a resource without the cluster reacting to the result...

2009-03-24 Thread Dominik Klein
foxyc...@yahoo.com wrote: > I've been wanting this for some time now and expecting pacemaker would > include it in it's newer versions. But I've checked the latest pacemaker 1.0 > distribution fresh of the day, and unfortunately have found nothing in it > indicating if this is possible. > > - R

Re: [Pacemaker] question related to resource starting

2009-03-24 Thread Dominik Klein
Glory Smith wrote: > On Tue, Mar 24, 2009 at 12:16 PM, Dominik Klein wrote: > >> Glory Smith wrote: >>> Hi All, >>> when we create a resource , how pacemaker choose a node to start >> resource >>> on it. >>> >>> To be mo

Re: [Pacemaker] question related to resource starting

2009-03-23 Thread Dominik Klein
Glory Smith wrote: > Hi All, > when we create a resource , how pacemaker choose a node to start resource > on it. > > To be more clear , suppose we have four node cluster , we configure any > resource xx and we see that it is started on say , node C . so my > question is why node C is choosen

Re: [Pacemaker] Colocation advice seeked

2009-03-20 Thread Dominik Klein
> By default, m1 and m1-ip are on xen-03, m2 and m2-ip are on xen-04. > Scores for the ips are > m1 xen-03 175 (100 node preference + 75 colocation with m1) > m1 xen-04 125 (50 node preference + 75 colocation with m2) > m2 xen-03 125 (50 node preference + 75 colocation with m1) > m2 xen-04 175 (100

Re: [Pacemaker] Colocation advice seeked

2009-03-20 Thread Dominik Klein
Hi Actually, I built a system just like that for presentation purposes (so just using Dummy resources, but that doesn't matter) to replace a system that is currently using keepalived. We seem to want to achieve just the same thing. Here's how I did it: # m1 = mysql 1 primitive m1 ocf:heartbeat:Dumm

Re: [Pacemaker] how to prevent auto relocation of recources to old primary?

2009-03-18 Thread Dominik Klein
> i wonder why the line > > location ms-drbd0-master-on-xen-1 ms-drbd0 rule role=master 100: #uname eq > xen-1 > > is in the example config, because heartbeat seems to be doing what the > line says even without it. The section states that "If you want to prefer a node to run the master role (xe

Re: [Pacemaker] how to prevent auto relocation of recources to old primary?

2009-03-18 Thread Dominik Klein
Juha Heinanen wrote: > Dominik Klein writes: > > > Sounds like you missed the order and colocation constraints. Please post > > your configuration. > > i have "order" and "colocation", but removed "location", because i > thought

Re: [Pacemaker] how to prevent auto relocation of recources to old primary?

2009-03-18 Thread Dominik Klein
Juha Heinanen wrote: > i tried the apache web server example of DRBD HowTo 1.0 with small changes: > > 1) replaced "webserver" primitive with "mysqlserver" primitive > 2) removed "location" primitive, since i don't care which node the resources >run. > > when i shutdown the current primary, t

Re: [Pacemaker] couple of comments/questions on DRBD HowTo 1.0

2009-03-18 Thread Dominik Klein
Juha Heinanen wrote: > "Prerequisites" section says that "DRBD must not be started by init.". In > Debian lenny at least, drbd init script load drbd module. if drbs init > is not run, drbd modules needs to be loaded by some other means, for > example, by adding "drbd" line to /etc/modules. The R

Re: [Pacemaker] A very basic question

2009-03-12 Thread Dominik Klein
>> Hi All, >> i have a quesion regarding stonith on 4 nodes cluster( suse 11 openais + >> pacemaker). i >> suppose i am using ilo or any other stonith where one stonith cant shoot >> more than one node , so i guess , i will have to create 4 stoniths for 4 >> node. assume i am using 4 nodes cluste

[Pacemaker] patch: pingd RA

2009-03-10 Thread Dominik Klein
High: RA pingd: Set default ping interval to 1 instead of 0 seconds. Produced high load and traffic. xen-03:~ # cat /proc/loadavg 1.53 1.54 1.47 4/213 6733 xen-03:~ # ps aux|grep pingd root 6735 0.0 0.0 5284 808 pts/1S+ 09:52 0:00 grep pingd root 17399 40.7 0.0 65316 1620

Re: [Pacemaker] Dont want to start openais cluster service at startup

2009-03-06 Thread Dominik Klein
Glory Smith wrote: > Thanks for reply Andrew, > i am using suse 11. man insserv Regards Dominik

[Pacemaker] showscores.sh for pacemaker 1.0.2

2009-03-03 Thread Dominik Klein
Hi I made the necessary changes to the showscores script to work with pacemaker 1.0.2. Please test and report problems. Has been reported to work by some people and should go into the repository soon. Still, I'd like more people to test and confirm. Important changes: * correctly fetch stickines

[Pacemaker] Patch: RA pingd

2009-02-18 Thread Dominik Klein
. This is because the RA stops pingd with kill -9, which does not let it execute the normal pingd shutdown procedure (which includes setting the attribute to 0). Patch is attached. Regards Dominik exporting patch: # HG changeset patch # User Dominik Klein # Date 1234950829 -3600 # Branch stable

Re: [Pacemaker] very urgent

2009-02-16 Thread Dominik Klein
Glory Smith wrote: >> >> >> >> we kill the node with STONITH. >> very hard for a machine to write to shared media when its powered off. >> >> >> we can kill nodes when: >> - nodes become unresponsive - nodes are not part of the cluster that has >> quorum >> - resources fail to stop when instructed

Re: [Pacemaker] on_fail

2009-02-13 Thread Dominik Klein
Romi Verma wrote: >> setting on_fail=fence for a monitor op will cause the cluster to shoot the >> node immediately instead of trying to stop the resource and recover it >> without fencing. >> > > u said "instead of trying to stop the resource and recover it without > fencing." > > do you mean i

Re: [Pacemaker] Problem in drbd wiki page?

2009-02-13 Thread Dominik Klein
Andrew Beekhof wrote: > > On Feb 13, 2009, at 8:15 AM, Dominik Klein wrote: > >> Neil Katin wrote: >>> >>> I've been trying to upgrade to pacemaker 1.0.1, and have been >>> running the examples in a test environment. I've been trying >>&

Re: [Pacemaker] Problem in drbd wiki page?

2009-02-12 Thread Dominik Klein
Neil Katin wrote: > > I've been trying to upgrade to pacemaker 1.0.1, and have been > running the examples in a test environment. I've been trying > to get the example on the DRBD page to work: > > http://www.clusterlabs.org/wiki/DRBD_HowTo_1.0 > > This line seems to have two problems with it:

Re: [Pacemaker] spilit brain situation

2009-02-06 Thread Dominik Klein
Romi Verma wrote: > On Fri, Feb 6, 2009 at 3:09 PM, Andrew Beekhof wrote: > >> On Feb 6, 2009, at 10:29 AM, Romi Verma wrote: >> >> > i want the partition without quorum to reset the nodes instead of >>> killing . is it possible. >>> define the difference between reset node and kill node? >

Re: [Pacemaker] spilit brain situation

2009-02-06 Thread Dominik Klein
lure that led to loss of communication, the node reboots, restarts the cluster software and everything should be fine again. If there's a network problem, you would of course have to fix that ;) Regards Dominik >> This is not happening in my case. >> i dont have any stonith configu

Re: [Pacemaker] spilit brain situation

2009-02-06 Thread Dominik Klein
Romi Verma wrote: > Thanks Dominic, > i have two questions now. > > 1) what does no-quorum-policy= suicide means then?? does it remove the > resource completely. That's not documented and I don't know it. Guess we need Andrew to shed some light here. > 2) why each node is thinking itsef as DC a

Re: [Pacemaker] spilit brain situation

2009-02-06 Thread Dominik Klein
Romi Verma wrote: > Thanks for fast reply , > Ok, Let me explain the situation. i have two nodes cluster . i pulled out > the network cable of one > node which produced spilit brain situation. this time both nodes are > thinking that other one is dead. each node is thinking itself as DC and on > e