1) During the stop operation, libvirt occasionally returns an error because the
state cannot be determined at the very moment the machine is shut down. This
patch makes the RA try to get the state one more time. If the machine is
down, then everything is OK.
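The retry described above might be sketched like this in shell (a minimal illustration, not the actual RA code; get_state is a hypothetical stand-in for the real libvirt state query, rigged here to fail once the way libvirt transiently does right after shutdown):

```shell
# Hypothetical stand-in for the real state query (e.g. via virsh):
# fails on the first call, succeeds on the second, mimicking libvirt's
# transient error just after the machine is shut down.
flagfile=$(mktemp)
get_state() {
    if [ -s "$flagfile" ]; then
        echo "shut off"
        return 0
    fi
    echo tried > "$flagfile"
    return 1
}

# Query the state once; if that fails, try one more time before giving up.
state=$(get_state) || state=$(get_state)
echo "state: $state"
rm -f "$flagfile"
```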
2) The next problem is that a graceful shutd
e role="slave" -inf: #uname ne drbd3
>
> result is identical, pacemaker tries to launch the slave role on other nodes :-(((
>
>
> 2011/6/8 Dominik Klein mailto:d...@in-telegence.net>>
>
>> but when i shutdown the drbd3 host Pacemaker tries to start the slave role
>> on the other host. How can I prevent this behavior?
>
> try
> s/inf/-inf
> s/eq/neq
"ne" actually, sorry
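Applying both substitutions to the constraint quoted earlier in the thread, the corrected rule would read in full (a sketch based on the resource and node names used above):

```
location ms_drbd_web-U_slave_on_drbd3 ms_drbd_web-U \
        rule role="slave" -inf: #uname ne drbd3
```

i.e. the slave role gets -infinity on every node whose uname is not drbd3, so it can only run there.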
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
On 06/08/2011 10:39 AM, ruslan usifov wrote:
> Hello
>
> I have follow constraint:
>
> location ms_drbd_web-U_slave_on_drbd3 ms_drbd_web-U \
> rule role="slave" inf: #uname eq drbd3
>
>
> Which, as I think, prevents the slave role from launching on all hosts except
> drbd3,
nope
it says "pu
On 06/07/2011 07:09 PM, CeR wrote:
> Hi there!
>
> I have some doubts, hope you folks can help me.
>
> In a system I have two (or more) ways to start a daemon:
> A) /etc/init.d/ script. The service could be started by the system
> (/etc/rcX) or by me manually.
> B) The daemon has an executable
netfilter is smarter than you think it is. It can distinguish between
packet flows forming an "allowed flow" and actually invalid packets.
That's default behaviour.
This only works if there's no helper module needed. So with the likes of
NAT or FTP connections, this will not work without conntrack
Hi
On 04/15/2011 09:05 AM, Tom Tux wrote:
> I can reproduce this behavior:
>
> - On node02, which had no resources online, I killed all corosync
> processes with "killall -9 corosync".
> - Node02 was rebooted through stonith
> - On node01, I can see the following lines in the message-log (line 6
Hi
when the "ping" RA configured as
primitive ping ocf:pacemaker:ping timeout="5s"
it throws
[: 5s: integer expression expected
This patch fixes configurations where timeout is configured with a unit
following the number.
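A minimal sketch of the kind of fix involved: strip a trailing unit from the configured value before using it in a shell integer comparison (illustrative only, not the actual patch):

```shell
# A timeout configured with a unit, as in: primitive ... timeout="5s"
timeout="5s"

# Strip any trailing non-digits (the time unit) so the shell's
# [ ... -gt ... ] integer comparison gets a plain number instead of "5s".
timeout_int=$(echo "$timeout" | sed 's/[^0-9]*$//')

if [ "$timeout_int" -gt 4 ]; then
    echo "timeout ok: ${timeout_int}s"
fi
```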
hth
Dominik
exporting patch:
# HG changeset
> Can you file a bug for that?
http://developerbugs.linux-foundation.org/show_bug.cgi?id=2582
> Were those 7000 pe-inputs all created over that 7 day period? Because
> that's a transition every 1.44 minutes.
Might be the recheck-interval?
In a cluster of mine I have a recheck-interval of 5 minutes and see a
new pe-input.bz2 in /var/lib/pengine every 5 minutes.
cibadmin -Q|grep recheck
On 03/15/2011 03:51 PM, Andrew Beekhof wrote:
> On Tue, Mar 15, 2011 at 2:35 PM, Dominik Klein wrote:
>> Hi
>>
>> I installed a new 3 node cluster today. I used the instructions on the
>> install page from the wiki and up to "corosync start" everything
Hi
I installed a new 3 node cluster today. I used the instructions on the
install page from the wiki and up to "corosync start" everything went
smooth.
At that point, apparently the following loop of corosync spawning
pacemaker and pacemaker crashing starts. See logs on
http://pastebin.com/VayyqZ
-100 node1 10003
> MySQL_MonitorAgent_Resource 100 node2 10003
>
> I also saw, that the "last-run"-entry (crm_mon -fort1) for this
> resource is not up-to-date. To me it seems that the monitor-action
> does not occu
Hi Norbert
I don't know what you did in 11.2, but I'll try to tell you what I do.
I'm mostly still on 11.1 and use the clusterlabs repo. After installing
the operating system from scratch, pretty much all I do is following the
install page from the wiki http://clusterlabs.org/wiki/Install
ie
zy
Tom Tux wrote:
> Hi
>
> I've have a question about the resource-monitoring:
> I'm monitoring an ip-resource every 20 seconds. I have configured the
> "On Fail"-action with "restart". This works fine. If the
> "monitor"-operation fails, then the resource will be restarted.
>
> But how can I define
err, yeah. That wasn't right.
Use this one.
Regards
Dominik
Dominik Klein wrote:
> Minor Update. Just noticed it doesn't display stickiness=0 if stickiness
> is unset. So failcount and migration-threshold columns were mixed up.
>
> Patch against stable-1.0
>
>
Minor Update. Just noticed it doesn't display stickiness=0 if stickiness
is unset. So failcount and migration-threshold columns were mixed up.
Patch against stable-1.0
Regards
Dominik
exporting patch:
# HG changeset patch
# User Dominik Klein
# Date 1268639542 -3600
# Branch stable-1.0
#
jimbob palmer wrote:
> Hello,
>
> I have a cluster that is all working perfectly. Time to break it.
>
> This is a two node master/slave cluster with drbd. Failover between
> the nodes works backwards and forwards. Everything is happier than a
> well fed cat.
>
> I wanted to see what would happen
Just for the record: heartbeat (3.0.2) was not able to recover either.
It also manages to see a failure on the dead node but fails to recover.
Regards
Dominik
> But generally I believe this test case is invalid.
I might agree here that this test case does not necessarily reproduce
what happened on my production system (unfortunately I do not know for
sure what happened there, the dev who caused this just tells me he used
some stupid sql statement and ev
Koch, Sebastian wrote:
> Ahh great, that's good news. I've never been to Australia hehe. If it would
> be in Germany or maybe Austria I will participate and try my best to help
> squash bugs. But I am no developer, I am more a technician.
I may be wrong here, but I think this "party" will have no
Koch, Sebastian wrote:
> Hi,
>
> i am kind of new in the whole cluster stuff but i would like to
> participate and contribute. But the main question is in which country ;-)
I'd guess in #linux-cluster country, no? :)
Sander van Vugt wrote:
> Hi,
>
> On Wed, 2010-01-20 at 07:56 +0100, Dominik Klein wrote:
>> Errol Neal wrote:
>>> On Tue, Jan 19, 2010 04:19 PM, Sander van Vugt
>>> wrote:
>>>> Hi,
>>>>
>>>> I hope someone has configu
Errol Neal wrote:
> On Tue, Jan 19, 2010 04:19 PM, Sander van Vugt wrote:
>> Hi,
>>
>> I hope someone has configured the APC Master Stonith resource (which you
>> would use to have pacemaker to a device like the APC switched rack PDU),
>> as I have a - probably extremely stupid - conceptual quest
Dejan,
thanks for your quick answer.
> There has been recently an update to the corosync init script. I
> think that it was actually written from scratch. It should be
> included in release 1.2.0. Do you have that version?
# rpm -qa|grep coro
libcorosync-1.1.2-1
corosync-1.1.2-1
That's what cam
Hi cluster people
been a while, couldn't really follow things. Today I was tasked to
install a new cluster, went for 1.0.6 and corosync as described on the
wiki and hit this:
New cluster with pacemaker 106 and latest available corosync from the
clusterlabs.org/rpm opensuse 11.1 repo.
This instal
crm_mon is event-driven now. For a pretty long time actually.
So unless something changes, you won't see a change in crm_mon.
Regards
Dominik
Joseph, Lester wrote:
> Hi,
>
> I have pacemaker 1.0.6 running with heartbeat 3.0.1.
> Noticed that crm_mon is not refreshing anymore, even when I specif
> Maybe set a cluster-wide attribute, which, when set, does not allow res2
> to run. Ie rule with score -infinity.
>
> res1 could remove this attribute while starting and set this attribute
> when stopping.
This does not make any sense. Sorry, let me try again.
res1 start = set attribute
res1 st
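The attribute idea sketched above could look roughly like this in crm syntax (hypothetical attribute and resource names; a sketch, not a tested configuration):

```
# res2 may not run anywhere while the attribute "res1_active" is defined;
# res1's start action sets the attribute, its stop action removes it.
location res2-blocked-by-res1 res2 \
        rule -inf: defined res1_active
```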
Michael Schwartzkopff wrote:
> Am Freitag, 30. Oktober 2009 13:26:35 schrieb Lars Marowsky-Bree:
>> On 2009-10-30T13:19:52, Michael Schwartzkopff wrote:
>>> I have a three node cluster. I have two resources that are not allowed to
>>> run together in the cluster. Basically resource2 is a failover
gilberto migliavacca wrote:
> Hi Dominik
>
> How can I configure the node's ips as cluster resources?
>
> sorry for the silly question but I'm a newbie in this field
>
> thanks in advance
>
> gilberto
>
> Dominik Klein wrote:
>> gilberto mig
gilberto migliavacca wrote:
> Hi
>
>
> I have 2 nodes and 1 node that I'm using just
> to manage the cluster.
>
> I started up the nodes and created the following
> configuration :
>
>
> node custdevc03.funambol.com
> node custdevc04.funambol.com
> node custdevc05.funambol.com
> primitive res.
> i thought that for multistate resources, Started == Slave.
> am i mistaken? did this change some time ago?
Afaik, that was only true for status display in crm_mon. But also, that
was fixed quite a while ago.
Regards
Dominik
Diego Woitasen wrote:
> HI
> I'm building a two node cluster with Xen, DRBD and
> Pacemaker+Heartbeat. I've set default_resource_stickiness to INFINITY
> to disable failback (I want to handle it manually). When I want to
> migrate a resource I execute
>
> crm resource migrate gw-piso-lab
>
> and
>
Roberto Suarez Soto wrote:
> El día Wed, 26 Aug 2009 21:38:19 +1000, Tim Serong
> escribía:
>
>>> we've recently deployed a two-node cluster using pacemaker, and
>>> we're seeing a strange thing in the logs: from time to time, the monitor
>>> operation fails with "rc=-2". This is an example:
hj lee wrote:
> Thank very much for the reply.
>
> I tested it both stonith-enabled and no-quorum-policy. As Dejan pointed,
> this is related to stonith-enabled. With stonith-enabled true (which is
> default),
> if I kill the master node, the slave stays as a slave, it seems expecting
> something
Dominik Klein wrote:
> Michal wrote:
>> Hi,
>> When I try to start mysql with config:
>> primitive drbd1 ocf:heartbeat:drbd \
>> params drbd_resource=db \
>> op monitor role=Master interval=59s timeout=30s \
>> op monitor role=Slave interval=60s timeout=30s
Michal wrote:
> Hi,
> When I try to start mysql with config:
> primitive drbd1 ocf:heartbeat:drbd \
> params drbd_resource=db \
> op monitor role=Master interval=59s timeout=30s \
> op monitor role=Slave interval=60s timeout=30s
>
> ms ms-drbd1 drbd1 \
> meta clone-max=2 master-max="1" master-node
> Though I don't see the point, grepping for the resource id is usually
> just as effective.
I totally agree here. I have helped quite a few people understand their
problems on IRC and grepping the resource id usually works well.
> I'd suggest focusing on improving the error logging that most RAs
Both work, in fact crm uses cibadmin in the background for some commands.
crm uses readable, easier to remember syntax and commands, whereas
cibadmin needs xml input (at least most of the time).
So it's basically a question of preference.
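To illustrate the difference with a trivial change (a sketch; the nvpair id shown is the conventional bootstrap-options one, but verify against your own CIB):

```
# crm shell: short, readable syntax
crm configure property stonith-enabled=false

# cibadmin: the same change expressed as raw XML
cibadmin --modify --xml-text \
  '<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>'
```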
Regards
Dominik
Ryan Steele wrote:
> My apologies if thi
Hi Dan
Dan Urist wrote:
> My apologies if this is documented somewhere-- I've looked and haven't
> found it.
>
> What happens if a stonith reboot fails? Does it retry, and if so how
> many times and with what timeout and is that configurable?
>
> I have some hardware that has a buggy raid card
>> Whether it's in an RPM or not, could the author add a license header to it?
>
> dk: what license do you want?
Just use what you use for all the cluster code.
Regards
Dominik
Sorry, I misunderstood your question. When you said "pull the plug" i
thought of the network connection and that is what pingd could help you
with.
If you pull the power plug, you should probably look into what beekhof
told you.
Sorry again,
Dominik
Dominik Klein wrote:
> Hi
Hi Mark
The keyword you're looking for is "pingd".
This example should get you going:
http://www.clusterlabs.org/wiki/Example_configurations#Failover_IP__Service_in_a_Group_running_on_a_connected_node
Regards
Dominik
Mark Schenk wrote:
> Hello All,
>
> I'm new to pacemaker so please forgive m
Paul Osier wrote:
> I'm trying to create an OCF resource agent that will start/stop/monitor
> SER. I've read through the opencf.org resource agent api doc and the
> wiki.linux-ha.org OCF resource agent doc and all those documents talk
> about is what is needed in the resource agent, not necessari
Andrew Beekhof wrote:
> On Fri, Apr 24, 2009 at 09:49, Juha Heinanen wrote:
>> Andrew Beekhof writes:
>>
>> > +crm_config_err("NOTE: Clusters with shared data need STONITH to
>> > ensure data integrity");
>>
>> is special stonith hardware a must or is there some poor man's stonith
>> sol
Thomas Mueller wrote:
> hi
>
> i'm using pacemaker 1.0.3 / hb 2.9.2-sle11rc9 on debian etch.
>
> altough everything is working like expected, these warnings pop up every
> 5 minutes (the recheck interval):
>
> Apr 17 07:04:52 ib002 crmd: [31460]: WARN: do_state_transition:
> Progressed to st
Bruno Voigt wrote:
> Hi Dominik,
>
> I use your script occasionally,
> together with Pacemaker packaged for Debian by martin.loschw...@linbit.com.
>
> When running the new version I get as first output line:
> tail: cannot open `+2' for reading: No such file or directory
> and then the resource
So here's an update. Michael Schwartzkopf pointed out a bug regarding
groups. That has been fixed now and the appropriate values should be
shown. Thanks!
There's not been a lot of feedback. Is it because nobody uses the script,
or does it just work for you?
Regards
Dominik
Dominik K
Juha Heinanen wrote:
> Dominik Klein writes:
>
> > The bug has been reported to Dejan (the crm shell dev) and he will
> > fix it.
>
> are all bugs fixed also in OpenAIS 0.80.x branch (whitetank), which is
> labelled on openais.com site as the stable release?
Dominik Klein wrote:
> Juha Heinanen wrote:
>> Lars Ellenberg writes:
>>
>> > If that "Lars?" meant me, yes, please,
>> > go ahead an delete outdated examples.
>> > Replace with a reference to the drbd users guide
>> > http://www.d
Juha Heinanen wrote:
> i moved all my resources to the standby node. on this node, mysql
> resource had a problem that prevented it from starting. i fixed the
> problem and assumed that pacemaker would now automatically start mysql,
> but it does not even try. it gave up after the first error ev
Juha Heinanen wrote:
> Lars Ellenberg writes:
>
> > If that "Lars?" meant me, yes, please,
> > go ahead an delete outdated examples.
> > Replace with a reference to the drbd users guide
> > http://www.drbd.org/docs/about/ or
> > http://www.drbd.org/docs/install/
>
> how about the webserver
Lars Ellenberg wrote:
> On Wed, Mar 18, 2009 at 10:17:24AM +0100, Dominik Klein wrote:
>> Juha Heinanen wrote:
>>> "Prerequisites" section says that "DRBD must not be started by init.". In
>>> Debian lenny at least, drbd init script load drbd mod
Priyanka Ranjan wrote:
> Hi All,
>
> i am facing issue in ilo stonith. i have configured ilo stonith in my
> cluster. it is running fine but it is not stonithing the errant node. in
> case of failure, the syslog message on the DC says that "we can't manage this
> node"
>
> with same parameters va
Joe Bill wrote:
> Hi Dominik!
>
> dk at in-telegence wrote:
>>> I'd love to see something like:
>>>
>>> # crm_resource -m check_level resource_id
>>> ..
>> This should be possible:
>>
>> export OCF_ROOT=/usr/lib/ocf
>> export OCF_RESKEY_=
>> export OCF_RESKEY_=
>> $OCF_ROOT/resource.d// monitor
>>
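As a concrete (hypothetical) instance of the recipe quoted above, testing the IPaddr2 agent by hand might look like the following; the IP value is an example only, adjust to your setup:

```
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_ip=192.168.0.10    # hypothetical example value
$OCF_ROOT/resource.d/heartbeat/IPaddr2 monitor
echo "monitor exit code: $?"
```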
foxyc...@yahoo.com wrote:
> I've been wanting this for some time now and expecting pacemaker would
> include it in its newer versions. But I've checked the latest pacemaker 1.0
> distribution fresh of the day, and unfortunately have found nothing in it
> indicating if this is possible.
>
> - R
Glory Smith wrote:
> On Tue, Mar 24, 2009 at 12:16 PM, Dominik Klein wrote:
>
>> Glory Smith wrote:
>>> Hi All,
>>> when we create a resource , how pacemaker choose a node to start
>> resource
>>> on it.
>>>
>>> To be mo
Glory Smith wrote:
> Hi All,
> when we create a resource, how does pacemaker choose a node to start the
> resource on?
>
> To be more clear , suppose we have four node cluster , we configure any
> resource xx and we see that it is started on say , node C . so my
> question is why node C is chosen
> By default, m1 and m1-ip are on xen-03, m2 and m2-ip are on xen-04.
> Scores for the ips are
> m1 xen-03 175 (100 node preference + 75 colocation with m1)
> m1 xen-04 125 (50 node preference + 75 colocation with m2)
> m2 xen-03 125 (50 node preference + 75 colocation with m1)
> m2 xen-04 175 (100
Hi
Actually, I built a system just like that for presentation purpose (so
just using the Dummy resource, but that doesn't matter) to replace a system
that is currently using keepalived.
We seem to want to achieve just the same thing. Here's how I did it:
# m1 = mysql 1
primitive m1 ocf:heartbeat:Dummy
> i wonder why the line
>
> location ms-drbd0-master-on-xen-1 ms-drbd0 rule role=master 100: #uname eq
> xen-1
>
> is in the example config, because heartbeat seems to be doing what the
> line says even without it.
The section states that "If you want to prefer a node to run the master
role (xe
Juha Heinanen wrote:
> Dominik Klein writes:
>
> > Sounds like you missed the order and colocation constraints. Please post
> > your configuration.
>
> i have "order" and "colocation", but removed "location", because i
> thought
Juha Heinanen wrote:
> i tried the apache web server example of DRBD HowTo 1.0 with small changes:
>
> 1) replaced "webserver" primitive with "mysqlserver" primitive
> 2) removed "location" primitive, since i don't care which node the resources
>run.
>
> when i shutdown the current primary, t
Juha Heinanen wrote:
> "Prerequisites" section says that "DRBD must not be started by init.". In
> Debian lenny at least, the drbd init script loads the drbd module. If drbd init
> is not run, the drbd module needs to be loaded by some other means, for
> example, by adding "drbd" line to /etc/modules.
The R
>> Hi All,
>> i have a quesion regarding stonith on 4 nodes cluster( suse 11 openais +
>> pacemaker). i
>> suppose i am using ilo or any other stonith where one stonith cant shoot
>> more than one node , so i guess , i will have to create 4 stoniths for 4
>> node. assume i am using 4 nodes cluste
High: RA pingd: Set default ping interval to 1 instead of 0 seconds.
Produced high load and traffic.
xen-03:~ # cat /proc/loadavg
1.53 1.54 1.47 4/213 6733
xen-03:~ # ps aux|grep pingd
root 6735 0.0 0.0 5284 808 pts/1S+ 09:52 0:00 grep pingd
root 17399 40.7 0.0 65316 1620
Glory Smith wrote:
> Thanks for reply Andrew,
> i am using suse 11.
man insserv
Regards
Dominik
Hi
I made the necessary changes to the showscores script to work with
pacemaker 1.0.2.
Please test and report problems. Has been reported to work by some
people and should go into the repository soon. Still, I'd like more
people to test and confirm.
Important changes:
* correctly fetch stickiness
This is because the RA stops pingd with kill -9, which does not let it
execute the normal pingd shutdown procedure (which includes setting the
attribute to 0).
Patch is attached.
Regards
Dominik
exporting patch:
# HG changeset patch
# User Dominik Klein
# Date 1234950829 -3600
# Branch stable
Glory Smith wrote:
>>
>>
>>
>> we kill the node with STONITH.
>> very hard for a machine to write to shared media when its powered off.
>>
>>
>> we can kill nodes when:
>> - nodes become unresponsive - nodes are not part of the cluster that has
>> quorum
>> - resources fail to stop when instructed
Romi Verma wrote:
>> setting on_fail=fence for a monitor op will cause the cluster to shoot the
>> node immediately instead of trying to stop the resource and recover it
>> without fencing.
>>
>
> u said "instead of trying to stop the resource and recover it without
> fencing."
>
> do you mean i
Andrew Beekhof wrote:
>
> On Feb 13, 2009, at 8:15 AM, Dominik Klein wrote:
>
>> Neil Katin wrote:
>>>
>>> I've been trying to upgrade to pacemaker 1.0.1, and have been
>>> running the examples in a test environment. I've been trying
>>&
Neil Katin wrote:
>
> I've been trying to upgrade to pacemaker 1.0.1, and have been
> running the examples in a test environment. I've been trying
> to get the example on the DRBD page to work:
>
> http://www.clusterlabs.org/wiki/DRBD_HowTo_1.0
>
> This line seems to have two problems with it:
Romi Verma wrote:
> On Fri, Feb 6, 2009 at 3:09 PM, Andrew Beekhof wrote:
>
>> On Feb 6, 2009, at 10:29 AM, Romi Verma wrote:
>>
>> > i want the partition without quorum to reset the nodes instead of
>>> killing .
>>> is it possible.
>>> define the difference between reset node and kill node?
>
lure that led to loss of communication, the node reboots, restarts
the cluster software and everything should be fine again.
If there's a network problem, you would of course have to fix that ;)
Regards
Dominik
>> This is not happening in my case.
>> i dont have any stonith configu
Romi Verma wrote:
> Thanks Dominic,
> i have two questions now.
>
> 1) what does no-quorum-policy=suicide mean then? does it remove the
> resource completely.
That's not documented and I don't know it. Guess we need Andrew to shed
some light here.
> 2) why each node is thinking itsef as DC a
Romi Verma wrote:
> Thanks for fast reply ,
> Ok, let me explain the situation. I have a two node cluster. I pulled out
> the network cable of one
> node, which produced a split brain situation. This time both nodes are
> thinking that the other one is dead. Each node is thinking itself as DC and on
> e