On 2012-11-14T09:33:22, Digimer li...@alteeve.ca wrote:
As it was told to me, pcs was going to be what was used officially,
but that anyone and everyone was welcome to continue using and
developing crm or any other existing or new management tool. My
take-away was that the devs wanted pcs,
On 2012-11-14T09:24:45, Rasto Levrinc rasto.levr...@gmail.com wrote:
What doesn't work? I think that at this point in time, it'd be easier to
get crmsh going/fixed with pcmk 1.1.8. It's probably just some path
somewhere. If really nothing works, you *must* use LCMC, Pacemaker GUI. :)
crmsh's
On 2012-11-14T12:44:53, Digimer li...@alteeve.ca wrote:
Not really, to be honest. The way I see it is that Pacemaker is in tech
preview (on rhel, which is where I live). So almost by definition,
anything can change at any time. This is what happened here, so I don't
see a problem.
That is a
On 2012-11-12T10:07:50, Andrew Beekhof and...@beekhof.net wrote:
Um, are you setting a nodeid in corosync.conf?
Because I see this:
Nov 09 09:07:25 [2609] ha09a.mycharts.md crmd: crit:
crm_get_peer: Node ha09a.mycharts.md and ha09a share the same cluster
node id
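A couple of quick sanity checks for that symptom (the hostnames are the ones from the log; the exact commands are only suggestions):
uname -n                                # what the node believes its name is
getent hosts ha09a ha09a.mycharts.md    # how the short and the full name resolve
crm_node -l                             # node ids and names as the cluster sees them
The point is to see whether the node ends up known under both the short and the fully qualified name.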
On 2012-11-13T16:34:23, Robinson, Eric eric.robin...@psmnv.com wrote:
bump.
Could someone please review the logs in the links below and tell me what the
heck is going on with this cluster? I've never encountered anything like this
before. Basically, corosync thinks the cluster is healthy
On 2012-11-13T17:06:31, Robinson, Eric eric.robin...@psmnv.com wrote:
I'm not sure how to correct this. Here are the results of my name resolution
test on node ha09a...
I'd probably strip everything except the short names out of
/etc/HOSTNAME and /etc/hosts, though it may be sufficient to
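For illustration, a stripped-down /etc/hosts along those lines might look like this (the addresses are placeholders, not from the thread):
10.0.0.1   ha09a
10.0.0.2   ha09b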
On 2012-11-12T15:01:47, alain.mou...@bull.net wrote:
Thanks but no, in older releases, the failed monitor op led to a
fence, as required by on-fail=fence.
yes, that's what should happen.
You can file a crm_report with the PE inputs showing this for 1.1.7, or
directly retest with 1.1.8.
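For example, something along these lines (the time window is just a placeholder) collects the logs, the CIB and the PE inputs for the relevant period:
crm_report -f "2012-11-12 12:00" -t "2012-11-12 14:00" /tmp/onfail-fence-report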
On 2012-11-07T12:51:25, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
I agree that one shouldn't have to do it, but I've seen cases (two node
cluster with quorum-policy=ignore) where one node was down while the
cluster wanted to fence both nodes. So when the other node goes up, nodes
On 2012-11-05T17:05:35, Dejan Muhamedagic de...@suse.de wrote:
It's a debug instrumentation message. But it is only triggered when
someone runs crmadmin -S, -H to look up the DC or something, it isn't
triggered by the stack internally.
If it's a debug message, why is it then at severity
On 2012-11-05T15:31:25, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
I just experienced that the syslog message crmd: [12771]: info:
handle_request: Current ping state: S_TRANSITION_ENGINE is sent out several
times per second for an extended period of time.
So I wonder: Is it a
On 2012-10-31T15:59:05, Robinson, Eric eric.robin...@psmnv.com wrote:
Nobody has any thoughts on why my 2-node cluster has no DC? As I mentioned,
corosync-cfgtool -s shows the ring active with no faults.
That probably means that someone (i.e., you ;-) needs to dig more into
the logs of
On 2012-10-25T11:30:32, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
I just wonder: If the reason is some kind of resource shortage in the Xen
Host that causes Xen guests to fail booting, it would be nice if that
situation could be detected. I was just asking for an already known
On 2012-10-24T11:15:14, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
I'm happy you have something that works for you.
Although even if you're using it in haresources mode, your resource
agents are still years out of date.
It doesn't have resource agents (that's one of its pluses in my
On 2012-10-22T14:12:17, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
Interesting formula: I'd use something like number of CPUs * 4, not divided
by.
Reason: Today's workload is usually limited by I/O, not by CPU power.
However with something crazy like 32 CPUs, 32 tasks can
On 2012-10-24T13:17:57, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
I have e.g. mon script that greps 'lsof -i' to see if httpd is listening
on * or cluster ip. Which IMO is a way saner check than wget'ting
http://localhost/server-status -- and treating a [34]04 as a fail. Hence
the plus
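A rough sketch of that kind of check (the cluster IP and port are placeholders):
CIP=10.0.0.100
if lsof -nP -iTCP:80 -sTCP:LISTEN | grep -Eq "\*:80|$CIP:80"; then
    echo "httpd listening on wildcard or cluster IP"
else
    echo "httpd not listening where expected"; exit 1
fi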
On 2012-10-24T13:23:09, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
PS. but for the most part, like you said: you *have* people stuck on
2.1.4 and you keep supporting them much as you hate it.
Yes, but on SLES10, that was an actually shipping version with full
support.
EPEL has different
On 2012-09-27T17:32:58, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
Just a note: As it turned out, the Xen RA (SLES11 SP2,
resource-agents-3.9.3-0.7.1) is broken, because migrate will never look at
the node_ip_attribute you configured.
It's line 369:
On 2012-09-27T16:36:08, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
Hi Ulrich,
we always appreciate your friendly, constructive and non-condescending
feedback.
However if you specify a duration like P2, the duration is not added to the
current time; instead the current time is used
On 2012-09-24T08:45:39, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
So I select on unique attribute name for Xen migration, specify that
in the Xen resource, and then define that attribute per node, using
one of the node's own IP addresses?
Yes. The idea is that this allows you to
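A hedged sketch of what that could look like (resource name, attribute name and addresses are made up for the example):
crm configure primitive vm1 ocf:heartbeat:Xen \
    params xmfile="/etc/xen/vm1" node_ip_attribute="migration-ip" \
    op monitor interval="30s"
crm node attribute xen-a set migration-ip 192.168.100.1
crm node attribute xen-b set migration-ip 192.168.100.2
Live migration then goes to the address stored in the target node's migration-ip attribute instead of its uname.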
On 2012-09-20T08:47:59, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
---(resource-agents-3.9.3-0.7.1 of SLES 11 SP2)---
node_ip_attribute (string): Node attribute containing target IP address
^^
In case of a live migration, the system will
On 2012-09-11T15:04:55, Alan Robertson al...@unix.sh wrote:
Depends. Pacemaker may still care about the status of these agents.
If it can't start or stop them, what can it do with them?
The status from these agents may feed into operations on other
resources that are fully managed.
On 2012-09-12T09:01:05, Alan Robertson al...@unix.sh wrote:
The status from these agents may feed into operations on other
resources that are fully managed.
Understood.
I believe it will care about those other agents - not these. It
shouldn't know about these, AFAIK.
I guess then
On 2012-09-07T13:46:27, Alan Robertson al...@unix.sh wrote:
Well, I presume that one would not tell pacemaker about such agents, as
they would not be useful to pacemaker. From the point of view of the
crm command, you wouldn't consider them as valid resource agents to
put in a
On 2012-09-05T15:25:44, Dejan Muhamedagic de...@suse.de wrote:
BTW, FWIW -
monocf may be just like ocf, sans start and stop operations.
That would make all OCF RAs eligible for this use.
Thinking about this, not entirely. We'd have to fake the start/stop at
least. (In particular the start.)
On 2012-09-04T19:20:23, Alan Robertson al...@unix.sh wrote:
I will likely write a monitor-only resource agent for web servers. What
would you think about calling it from the other web resource agents?
Sharing code - in this case, the monitor-via-network of the http agents
- seems to make
On 2012-09-05T15:25:44, Dejan Muhamedagic de...@suse.de wrote:
How about a new element. Something like
primitive vm1 ocf:heartbeat:VirtualDomain
require vm1 web-test dns-test
How we map this into Pacemaker's dependency scheme is obviously open to
discussion.
The require would imply that
On 2012-09-05T07:54:46, Andrew Beekhof and...@beekhof.net wrote:
(Or rather, obscure enough to configure that it might well be a
bug.) It'd be trivial to just append the role to the operation key
too. (It'd cause a few monitors to be recreated on update, but
that'd be harmless.)
Not
On 2012-09-05T06:26:50, Stefan Schloesser sschloes...@enomic.com wrote:
Hi Lars,
my problem with the rolling upgrade is the drbd partition. If you migrate the
service its data will move too. If you then restart the cluster and migrate
back the data will not be in an upgraded state and
On 2012-09-04T10:50:11, EXTERNAL Konold Martin (erfrakon, RtP2/TEF72)
external.martin.kon...@de.bosch.com wrote:
I was reporting a serious bug in _your_ product and instead of
thanking for the bugreport you simply closed it as invalid
The bug was reported without a support contract. A
On 2012-09-04T15:56:14, Stefan Schloesser sschloes...@enomic.com wrote:
Hi,
I would like to know what the recommended way is to update a cluster. Every
week or so, bug fixes and security patches are released for various parts of
the used software.
I prefer rolling upgrades; migrate
On 2012-08-31T14:56:22, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
Hi!
By chance I realized that every node of my 5-node test-cluster had at least
one corosync-coredump. Unfortunately they even seem to have different
signatures. I can provide a rough backtrace to get you warmed
On 2012-08-31T13:41:14, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
Hi!
There are things I don't understand: Even after
# /usr/lib64/heartbeat/send_arp -i 200 -r 5 br0 172.20.3.59 f1e991b1b951
not_used not_used
neither the local arp table (arp) nor the software bridge (brctl
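A few ways to check whether the gratuitous ARP actually makes it out (interface and address taken from the post above):
arp -n | grep 172.20.3.59      # local ARP table
brctl showmacs br0             # bridge forwarding table
tcpdump -ni br0 arp            # watch the wire for the gratuitous ARPs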
On 2012-08-30T12:53:45, Stefan Schloesser sschloes...@enomic.com wrote:
I would like to configure the resource-stickiness to 0 tuesdays between 2
and 2:20 am local time.
I could not find any examples on how to do this using crm configure ... but
only the XML snippets to accomplish this.
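For reference, the raw XML for such a rule is roughly along these lines (the ids are arbitrary, and weekdays="2" / minutes="0-20" should be double-checked against the date_spec documentation):
<rsc_defaults>
  <meta_attributes id="rsc-options-maintenance">
    <rule id="tuesday-night" score="INFINITY">
      <date_expression id="tuesday-night-expr" operation="date_spec">
        <date_spec id="tuesday-night-spec" weekdays="2" hours="2" minutes="0-20"/>
      </date_expression>
    </rule>
    <nvpair id="maintenance-stickiness" name="resource-stickiness" value="0"/>
  </meta_attributes>
</rsc_defaults>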
On 2012-08-20T11:31:07, Lars Marowsky-Bree l...@suse.com wrote:
Okay, so there's a bug in the NFS agent, point taken. I'll investigate
why it took so long to release as a real maintenance update; you're
right, that shouldn't happen. (I can already see it in the update queue
though
On 2012-08-23T09:35:51, Francis SOUYRI francis.sou...@apec.fr wrote:
Hello Dejan,
With FC 16, heartbeat is 3.0.4, not a v1.
I do not use crm because I was able to implement ipfail.
Dejan was referring to the v1 mode, namely the one that uses
haresources. haresources can't drive
On 2012-08-21T15:39:06, Carlos Pedro carlos_pe...@yahoo.com wrote:
Dear Sirs,
I'm working on a project
and I was proposed to build three clusters using a common node, that
is:
Nodes cannot be shared between clusters like this.
You can either build a 2 node cluster (with all nodes in one),
On 2012-08-22T10:08:14, RaSca ra...@miamammausalinux.org wrote:
Thank you Ulrich,
As far as you know, Is there a way to override the ID for each cloned
instance of the mysql resource? How can I resolve the problem?
Just make the intervals slightly different - 31s, 30s, 29s ...
Regards,
On 2012-08-21T00:22:00, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
CLUSTERIP, which you presumably mean by "fun with iptables", is basically
"Jack gets all calls from even area codes and Jill from odd area
codes". Yeah, you could do that, I just can't imagine why.
Because the commonly given
On 2012-08-21T14:32:53, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
Maybe I'm expecting too much, but isn't it possible to simply log "Telling
other nodes that PV blabla is being created"?
The problem is the error case, in which we want more logs. There is
progress (libqb with the
On 2012-08-21T13:16:29, David Lang david_l...@intuit.com wrote:
with ldirectord you have an extra network hop, and you have all your
traffic going through one system. This is a scalability bottleneck as
well as being a separate system to configure.
CLUSTERIP isn't the solution to every
On 2012-08-17T16:38:01, EXTERNAL Konold Martin (erfrakon, RtP2/TEF72)
external.martin.kon...@de.bosch.com wrote:
I don't see an open bug for something like this right now.
Are you serious?
It was you who resolved this bug as INVALID in bugzilla
On 2012-08-17T16:42:42, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
obviously not, because I have the latest updates installed. It happens
frequently enough to care about it:
# zgrep sscan /var/log/messages-201208*.bz2 |wc -l
76
Here are some:
/var/log/messages-20120816.bz2:Aug
On 2012-08-17T18:14:18, EXTERNAL Konold Martin (erfrakon, RtP2/TEF72)
external.martin.kon...@de.bosch.com wrote:
On the other hand, you so far did not provide any case where SLES11 SP2 runs
reliably unmodified in a mission critical environment (e.g. a HA NFS server)
without local bugfixes.
On 2012-08-13T15:39:22, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
Hi!
In pacemaker-1.1.6-1.29.1 (SLES11 SP2 x86_64) I see this for an idle cluster
with just one stonith resource running, when doing some unrelated change:
What is the unrelated change you are doing?
Does
On 2012-08-16T17:54:06, EXTERNAL Konold Martin (erfrakon, RtP2/TEF72)
external.martin.kon...@de.bosch.com wrote:
Hi Martin,
From my experience with SLES11 SP2 (with all current updates) I conclude that
actually nobody is seriously running SP2 without local bugfixes.
That isn't quite true.
On 2012-08-17T08:41:15, Nikita Michalko michalko.sys...@a-i-p.com wrote:
I am also testing SP2 - and yes, it's true: not yet ready for production ;-(
What problems did you find?
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
On 2012-08-17T11:43:13, Nikita Michalko michalko.sys...@a-i-p.com wrote:
- e.g. the problem with SLES 11 SP2 kernel crashes - the same as described by
Martin:
SP2 kernels crash seriously (when a node rejoins the cluster) when using
SCTP as
recommended in the SLES HA documentation and
On 2012-08-14T12:44:43, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
The messages are coming from the stonith plugin (it's actually
in pacemaker). But I think that that got fixed in the meantime.
^
Do you have the latest maintenance update?
Yes, latest on SLES is
On 2012-08-14T16:59:02, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
While starting a clone resource (mount OCFS2 filesystem), I see this message
in syslog:
crmd: [31942]: notice: do_lrm_invoke: Not creating resource for a delete
event: (null)
info: notify_deleted: Notifying
On 2012-07-18T20:01:35, Arnold Krille arn...@arnoldarts.de wrote:
That would mean that your system runs the same whether one or two links are
present.
That's not what I said. What I said (or at least meant ;-) is that, even
in the degraded state, the performance must still be within
On 2012-07-16T11:53:55, Volker Poplawski volker.poplaw...@atrics.de wrote:
Hello everyone.
Could you please tell me the recommended mode for a bonded network
interface, which is used as the direct link in a two machine cluster?
There are 'balance-rr', 'active-backup', 'balance-xor' etc
On 2012-07-17T23:44:13, Arnold Krille arn...@arnoldarts.de wrote:
Additionally: If it's two direct links dedicated to your storage network,
there is no reason to go active/backup and discard half of the
available bandwidth.
Since the system must be designed for one link to have adequate
On 2012-07-12T10:31:53, Caspar Smit c.s...@truebit.nl wrote:
Now the interesting part. I would like to create a software raid6 set
(or multiple) with the disks in the JBOD and have the possibility to
use
the raid6 in an active/passive cluster.
Sure. md RAID in a fail-over configuration is
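A minimal sketch of the fail-over side, using the Raid1 agent (paths and names are placeholders):
crm configure primitive p_raid6 ocf:heartbeat:Raid1 \
    params raidconf="/etc/mdadm.conf" raiddev="/dev/md0" \
    op monitor interval="60s"
plus a group (or colocation/order constraints) tying the filesystem and the service on top of it to the same node.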
On 2012-07-03T11:26:11, darren.mans...@opengi.co.uk wrote:
I'd like to second Lars' comments here. I was strong-armed into doing a
dual-primary DRBD + OCFS2 cluster and it's a nightmare to manage. There's no
reason for us to do it other than 'we could'. It just needed something simple
like
On 2012-07-02T10:42:33, EXTERNAL Konold Martin (erfrakon, RtP2/TEF72)
external.martin.kon...@de.bosch.com wrote:
when a split brain (drbd) happens mount.ocfs2 remains hanging unkillable in
D-state.
Unsurprising, since all IO is frozen during that time (depending on your
drbd setup, but I'm
On 2012-07-02T12:05:33, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
Unfortunately unless there's a real cluster filesystem that supports
mirroring with shared devices also, DRBD on some locally mirrored device on
each node seems to be the only alternative. (Talking about disasters)
On 2012-07-02T12:37:52, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
I've seen very few scenarios where OCFS2 was worth it over just using a
regular file system like XFS in a fail-over configuration in this kind
of environment.
How would you fail over if your shared storage went
On 2012-06-29T08:19:41, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
For SLE HA 11 SP1, please report these issues to NTS and SUSE support.
As I'm sure they won't fix it in SP1 (that PTF is one year old now),
SP1 is still supported by SUSE, and no one but our support folks know
what
On 2012-06-28T11:37:37, Heitor Lessa heitor.le...@hotmail.com wrote:
Such an issue happens because OCFS does not support changing (modifying/deleting) nodes
in a running cluster; such tasks require the cluster to be down, though.
If driven by Pacemaker, OCFS2 does support adding/removing nodes at
runtime.
On 2012-06-27T14:18:26, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
Hello,
I see problems with applying configuration diffs so frequently that I suspect
there's a bug in the code.
This is for SLES11 SP1 on x86_64 with corosync-1.4.1-0.3.3.3518.1.PTF.712037
and
On 2012-06-21T08:02:25, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
See, it's simple: any partially completed operation or state is not
successful, ergo a failure must be reported.
Is it correct that the standard recovery procedure for this failure is node
fencing then? If so it
On 2012-06-20T08:44:33, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
The problem is: What to do if 1 out of n exports fails: Is the resource
started or stopped then? Likewise for unexporting and monitoring.
If the operation partially failed, it is failed.
But to have a clean
On 2012-06-20T16:37:35, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
so what exit code is failed? Then: With the standard logic of stop
only performing when the resource is up (i.e. monitor reports
stopped), a partially started resource that the monitor considers
stopped may fail to
On 2012-06-20T17:46:19, Andreas Kurz andr...@hastexo.com wrote:
hb_report does not work.
How do I create a report tarball?
It has been renamed to crm_report
There's still both around. Just that different distributions ship
different implementations.
Because. Well. Because. /rant
Regards,
On 2012-06-19T08:38:11, alain.mou...@bull.net wrote:
So that means that my modifications via crm configure edit, even if they
are correct (I've re-checked them),
have potentially corrupted the Pacemaker configuration?
No. The CIB automatically recovers from this by doing a full sync. The
On 2012-06-19T14:13:06, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
The problem is: What to do if 1 out of n exports fails: Is the resource
started or stopped then? Likewise for unexporting and monitoring.
If the operation partially failed, it is failed.
Regards,
Lars
--
On 2012-05-25T17:31:52, Florian Haas flor...@hastexo.com wrote:
Um, right now I have no opinion. Your commit messages are pretty
terse, and there's no README in the repo. Mind adding one?
FWIW, there is now a manual page as well. That might help with
understanding what it is supposed to do.
On 2012-06-06T17:26:41, RaSca ra...@miamammausalinux.org wrote:
Thank you Florian, but how can one declare an anonymous clone? Is it
implicit with the globally-unique=false?
You don't need to explicitly declare that. It is the default.
(But yes, the default is globally-unique=false.)
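In other words, something as plain as this already gives you an anonymous clone (names made up for the example):
crm configure clone cl_mysql p_mysql meta clone-max="2" clone-node-max="1"
globally-unique only needs to be set explicitly when you actually want a globally unique clone.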
On 2012-06-01T13:10:17, alain.mou...@bull.net wrote:
- does that mean that it will be this Pacemaker/cman on RH and SLES?
- or will RH and SLES require a different stack under Pacemaker?
Right now, SLE HA is on the plugin version of pacemaker, and SLE HA 11
will likely remain on it - that's
On 2012-06-01T16:16:20, Florian Haas flor...@hastexo.com wrote:
Dejan, Lars,
is it confirmed from your end that sbd is moving out of cluster-glue?
If so, it would be nice if we could get a cluster-glue release with
sbd removed, and a release of standalone sbd, so packagers can fix the
On 2012-05-29T08:39:06, Florian Haas flor...@hastexo.com wrote:
Should be packageable on every platform, though I admit that I've not
tried building the pacemaker module against anything but the
corosync+pacemaker+openais stuff we ship on SLE HA 11 so far.
Are you expecting this to build
On 2012-05-29T14:31:20, Florian Haas flor...@hastexo.com wrote:
Forgot to mention this in the original cover message, for those who
haven't been following the discussion: this is for sbd which is just
spinning off from cluster-glue.
Thanks, I've merged them both!
Regards,
Lars
--
On 2012-05-29T17:56:59, Florian Haas flor...@hastexo.com wrote:
In case you're wondering why I didn't use PKG_CHECK_MODULES for the PE
libraries: their pkg-config file is currently broken; Andrew has a
pull request for Pacemaker for that.
I was wondering more about how to build this against
On 2012-05-29T18:34:15, Florian Haas flor...@hastexo.com wrote:
Yeah, it seems you just broke the build by including cluster/stack.h
and not bothering to add an AC_CHECK_HEADERS to configure.ac. Where
does that come from, is that new to Pacemaker?
Uh? It builds here on the 1.1.7 pacemaker
On 2012-05-29T18:57:30, Florian Haas flor...@hastexo.com wrote:
The integration with the cluster stack is rather specific to whatever
pacemaker/corosync version + configuration you build against.
Unfortunately.
Well that's what #ifdef HAVE_CLUSTER_STACK_H and friends are good for, no?
I
On 2012-05-25T17:31:52, Florian Haas flor...@hastexo.com wrote:
That aside, what do you think of the idea/approach?
Um, right now I have no opinion. Your commit messages are pretty
terse, and there's no README in the repo. Mind adding one?
Good point. I wasn't aware the commit messages were
On 2012-05-25T21:44:25, Florian Haas flor...@hastexo.com wrote:
If so, the master thread will not self-fence even if the majority of
devices is currently unavailable.
That's it, nothing more. Does that help?
It does. One naive question: what's the rationale of tying in with
On 2012-05-24T14:34:59, Florian Haas flor...@hastexo.com wrote:
To give you a glance of the extended sbd code, you can check out
http://hg.linux-ha.org/sbd - the new Pacemaker integration is activated
using the -P option in /etc/sysconfig/sbd, otherwise sbd remains a
drop-in replacement
On 2012-05-15T13:17:11, William Seligman selig...@nevis.columbia.edu wrote:
I can post details and logs and whatnot, but I don't think I need to do
detailed
debugging. My question is:
I don't think your rationale holds true, though. Like Andrew said, this
is only ever just written, not read.
On 2012-05-08T12:08:27, Dejan Muhamedagic de...@suse.de wrote:
In the default (without OCF_CHECK_LEVEL), it's enough to try unmounting
the file system, isn't it?
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/Filesystem#L774
I don't see a need to remove the STATUSFILE
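For context, the deeper checks are selected per operation, e.g. (device and paths are placeholders):
crm configure primitive p_fs ocf:heartbeat:Filesystem \
    params device="/dev/vg1/lv1" directory="/srv/data" fstype="xfs" \
    op monitor interval="20s" OCF_CHECK_LEVEL="10"
Higher check levels make the agent do deeper read (and, at the deepest level, write) checks instead of only looking at the mount table.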
On 2012-04-04T01:52:12, Christian Franke nob...@nowhere.ws wrote:
Hello Florian,
Your question is fully justified - I sincerely apologize for ignoring
that comprehensive documentation.
I rewrote the patch trying to adhere to the requirements given in the
documentation.
Hi Christian,
On 2012-04-04T11:28:31, Rainer Krienke krie...@uni-koblenz.de wrote:
There is one basic thing however I do not understand: My setup involves
only a clustered filesystem. What I do not understand is why a stonith
resource is needed at all in this case which after all causes freezes
of the
On 2012-04-03T10:32:48, Rainer Krienke krie...@uni-koblenz.de wrote:
Hi Rainer,
I am new to HA setup and my first try was to set up a HA cluster (using
SLES 11 SP2 and the SLES11 SP2 HA extension) that simply offers an
OCFS2 filesystem. I did the setup according to the SLES 11 SP2 HA
On 2012-04-03T15:50:29, Rainer Krienke krie...@uni-koblenz.de wrote:
rzinstal4:~ # sbd -d /dev/disk/by-id/scsi-259316a7265713551-part1 dump
==Dumping header on disk /dev/disk/by-id/scsi-259316a7265713551-part1
Header version : 2
Number of slots: 255
Sector size: 512
Timeout
On 2012-04-03T15:59:00, Rainer Krienke krie...@uni-koblenz.de wrote:
Hi Lars,
this was something I detected already. And I changed the timeout in the
cluster configuration to 200sec. So the log I posted was the result of
the configuration below (200sec). Is this still too small?
$ crm
On 2012-03-29T11:31:38, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
pengine: [17043]: WARN: pe_fence_node: Node h07 will be fenced because it is
un-expectedly down
The software being used is basically SLES11 SP1 with a newer corosync
(corosync-1.4.1-0.3.3.3518.1.PTF.712037). Were
On 2012-03-19T11:09:16, Dejan Muhamedagic de...@suse.de wrote:
--- a/heartbeat/pgsql
+++ b/heartbeat/pgsql
@@ -1,12 +1,13 @@
-#!/bin/sh
+#!/bin/bash
Our policy is not to change shell. Is that absolutely necessary?
He sends in many patches. bash is a 1MB install. I can't believe that
On 2012-03-15T15:59:21, William Seligman selig...@nevis.columbia.edu wrote:
Could this be an issue? I've noticed that my fencing agent always seems to be
called with action=reboot when a node is fenced. Why is it using 'reboot'
and
not 'off'? Is this the standard, or am I missing a
On 2012-03-14T18:22:42, William Seligman selig...@nevis.columbia.edu wrote:
Now consider a primary-primary cluster. Both run the same resource.
One fails. There's no failover here; the other box still runs the
resource. In my case, the only thing that has to work is cloned
cluster IP
On 2012-02-06T09:05:13, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
but like with CPU affinity there should be no needless change of the DC. I
also wondered why after each configuration change the DC is newly elected (it
seems).
It isn't (or shouldn't be). Still, the DC election
On 2012-02-06T22:13:20, Mayank mayank.mittal.1...@hotmail.com wrote:
<rsc_colocation id="pgsql_vip_colocation" rsc="virtua_ip" score="INFINITY"
  with-rsc="pgsql9" with-rsc-role="Master"/>
The intention behind defining such constraints is to make sure that the
postgre should always run in the master role
On 2011-12-08T12:08:06, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
Dejan Muhamedagic deja...@fastmail.fm wrote on 08.12.2011 at 11:28 in
message 20111208102833.GA12338@walrus.homenet:
Hi,
On Wed, Dec 07, 2011 at 02:26:52PM +0100, Ulrich Windl wrote:
Hi!
While
On 2011-12-04T00:57:05, Andreas Kurz andr...@hastexo.com wrote:
the concept of an arbitrator for split-site cluster is already
implemented and should be available with Pacemaker 1.1.6, though it seems
not to be directly documented ... beside source code and this draft
document:
Documentation
On 2011-12-05T22:37:03, Andreas Kurz andr...@hastexo.com wrote:
Did you clone the sbd resource? If yes, don't do that. Start it as a
primitive, so in case of a split brain at least one node needs to start
the stonith resource which should give the other node an advantage ...
adding a
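I.e. something along these lines rather than a clone (the device path is a placeholder):
crm configure primitive stonith-sbd stonith:external/sbd \
    params sbd_device="/dev/disk/by-id/my-sbd-partition" \
    op monitor interval="3600s"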
On 2011-12-01T13:48:56, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
I wonder about the usefulness of that value, especially as any configuration
change seems to increase the epoch anyway. I never saw that CRM cares about
the cib-last-written string.
It is for easy inspection by
On 2011-11-28T21:14:22, Florian Haas flor...@hastexo.com wrote:
Seems to make sense. Of course, an alternative would be to add a
Conflicts: lvm2 x.y.z to the package on the respective versions to
make sure it's only installed with a fixed lvm2 package ...?
Surely you're joking.
On 2011-11-28T15:04:45, alain.mou...@bull.net wrote:
sorry but I forgot if there is another way than crm configure edit to
modify
all the values of on-fail= for all resources in the configuration?
If they're explicitly set, you have to modify them all.
Otherwise, look at op_defaults or
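For the latter, a one-liner such as
crm configure op_defaults on-fail="restart"
sets the default for every operation that does not set on-fail explicitly (the value "restart" is only an example).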
On 2011-11-29T08:33:01, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
The state of an unmanaged resource is the state when it left the managed
meta-state.
That is not correct. An unmanaged resource is not *managed*, but its
state is still relevant to other resources that possibly
On 2011-11-29T22:10:10, Andreas Kurz andr...@hastexo.com wrote:
IIRC stonith resources are always started first and stopped last anyways
... without extra constraints ... implicitly. Please someone correct me
if I'm wrong.
Yes, but they are not mandatory. The configuration that was discussed
On 2011-11-29T12:36:39, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
If you repeatedly try to re-sync with a dying disk, with each resync
interrupted by i/o error, you will get data corruption sooner or later.
No, you shouldn't. (Unless the drive returns faulty data on read, which
is actually a