Re: [Pacemaker] asymmetric clusters, remote nodes, and monitor operations

2013-11-12 Thread Andrew Beekhof
run on. Previously I'd noticed that LSB resources probed on nodes that don't have the associated init script would fail; looks like that is also getting reported as OCF_NOT_INSTALLED, so perhaps is the same problem. On Wed, Sep 4, 2013 at 12:49 AM, Andrew Beekhof and...@beekhof.net wrote

Re: [Pacemaker] Network outage debugging

2013-11-12 Thread Andrew Beekhof
On 13 Nov 2013, at 11:22 am, Sean Lutner s...@rentul.net wrote: On Nov 12, 2013, at 6:01 PM, Andrew Beekhof and...@beekhof.net wrote: On 13 Nov 2013, at 6:10 am, Sean Lutner s...@rentul.net wrote: The folks testing the cluster I've been building have run a script which blocks all

Re: [Pacemaker] Network outage debugging

2013-11-12 Thread Andrew Beekhof
On 13 Nov 2013, at 11:49 am, Sean Lutner s...@rentul.net wrote: On Nov 12, 2013, at 7:33 PM, Andrew Beekhof and...@beekhof.net wrote: On 13 Nov 2013, at 11:22 am, Sean Lutner s...@rentul.net wrote: On Nov 12, 2013, at 6:01 PM, Andrew Beekhof and...@beekhof.net wrote: On 13

Re: [Pacemaker] Question about the resource to fence a node

2013-11-12 Thread Andrew Beekhof
On 16 Oct 2013, at 8:51 am, Andrew Beekhof and...@beekhof.net wrote: On 15/10/2013, at 8:24 PM, Kazunori INOUE kazunori.ino...@gmail.com wrote: Hi, I'm using pacemaker-1.1 (the latest devel). I started resource (f1 and f2) which fence vm3 on vm1. $ crm_mon -1 Last updated: Tue Oct

Re: [Pacemaker] The larger cluster is tested.

2013-11-11 Thread Andrew Beekhof
://drive.google.com/file/d/0BwMFJItoO-fVdlIwTVdFOGRkQ0U/edit?usp=sharing Regards, Yusuke 2013/11/8 Andrew Beekhof and...@beekhof.net: On 8 Nov 2013, at 12:10 am, yusuke iida yusk.i...@gmail.com wrote: Hi, Andrew The shown code seems not to process correctly. I wrote correction. Please check

Re: [Pacemaker] recover cib from raw file

2013-11-11 Thread Andrew Beekhof
On 11 Nov 2013, at 9:41 pm, s.oreilly s.orei...@linnovations.co.uk wrote: Hi, Is it possible to recover/replace cib.xml from one of the raw files in /var/lib/pacemaker/cib? I would like to reset the cib to the configuration referenced in cib.last. In the case cib-89.raw I haven't

Re: [Pacemaker] crm_mon segment fault con fedora 20

2013-11-11 Thread Andrew Beekhof
. Thanks 2013/11/11 Andrew Beekhof and...@beekhof.net On 9 Nov 2013, at 8:56 am, emmanuel segura emi2f...@gmail.com wrote: Hello Andrew, You can find the file in the attachment. It would be very useful to know what is NULL at: 1196 node_name = g_strdup_printf(%s:%s

Re: [Pacemaker] stonith_admin does not work as expected

2013-11-11 Thread Andrew Beekhof
Impossible to comment without knowing the pacemaker version, full config, and how fence_ifmib works (I assume it's a custom agent?) On 12 Nov 2013, at 1:21 am, andreas graeper agrae...@googlemail.com wrote: hi, two nodes. n1 (slave) fence_2:stonith:fence_ifmib n2 (master)

Re: [Pacemaker] The larger cluster is tested.

2013-11-11 Thread Andrew Beekhof
On 11 Nov 2013, at 11:48 pm, yusuke iida yusk.i...@gmail.com wrote: Execution of the graph was also checked. Since the number of pending(s) is restricted to 16 from the middle, it is judged that batch-limit is effective. Observing here, even if a job is restricted by batch-limit, two or

Re: [Pacemaker] DRBD promotion timeout after pacemaker stop on other node

2013-11-11 Thread Andrew Beekhof
On 12 Nov 2013, at 10:29 am, Andrew Beekhof and...@beekhof.net wrote: On 12 Nov 2013, at 2:46 am, Vladislav Bogdanov bub...@hoster-ok.com wrote: 11.11.2013 09:00, Vladislav Bogdanov wrote: ... Looking at crm-fence-peer.sh script, it would determine peer state as offline immediately

Re: [Pacemaker] DRBD promotion timeout after pacemaker stop on other node

2013-11-11 Thread Andrew Beekhof
Can you try with these two patches please? + Andrew Beekhof (4 seconds ago) fec946a: Fix: crmd: When the DC gracefully shuts down, record the new expected state into the cib (HEAD, master) + Andrew Beekhof (10 seconds ago) 740122a: Fix: crmd: When a peer expectedly shuts down, record the new

Re: [Pacemaker] Stonith question

2013-11-10 Thread Andrew Beekhof
On 9 Nov 2013, at 1:55 am, s.oreilly s.orei...@linnovations.co.uk wrote: Hi Chrissie, thanks I did try that and it didn't work, but then, neither has adding the location constraints so maybe (and this is very possible) I am doing something else wrong!! Quite probably. But we can't say for

Re: [Pacemaker] crm_mon segment fault con fedora 20

2013-11-10 Thread Andrew Beekhof
. p *node p *node->details p *node->details->remote_rsc p *node->details->remote_rsc->container Also, if you have the actual XML, that may be useful. The tools can't read the crmsh syntax. 2013/11/8 Andrew Beekhof and...@beekhof.net On 6 Nov 2013, at 9:36 am, emmanuel segura emi2f...@gmail.com

Re: [Pacemaker] Remove a ghost node

2013-11-10 Thread Andrew Beekhof
On 8 Nov 2013, at 12:59 pm, Sean Lutner s...@rentul.net wrote: On Nov 7, 2013, at 8:34 PM, Andrew Beekhof and...@beekhof.net wrote: On 8 Nov 2013, at 4:45 am, Sean Lutner s...@rentul.net wrote: I have a confusing situation that I'm hoping to get help with. Last night after

Re: [Pacemaker] DRBD promotion timeout after pacemaker stop on other node

2013-11-10 Thread Andrew Beekhof
On 5 Nov 2013, at 2:22 am, Vladislav Bogdanov bub...@hoster-ok.com wrote: Hi Andrew, David, all, Just found interesting fact, don't know is it a bug or not. When doing service pacemaker stop on a node which has drbd resource promoted, that resource does not promote on another node, and

Re: [Pacemaker] pacemaker - ClusterMon

2013-11-10 Thread Andrew Beekhof
On 18 Oct 2013, at 7:49 am, Denise Cosso guanae...@yahoo.com.br wrote: Hello, I configured in Pacemaker ClusterMon but not receive email. Already tested the email from the command line and working. I think it's the version of crm_mon Could anyone help me?? You have an MTA

Re: [Pacemaker] why pacemaker does not control the resources

2013-11-10 Thread Andrew Beekhof
On 8 Nov 2013, at 7:49 am, Andrey Groshev gre...@yandex.ru wrote: Hi, PPL! I need help. I do not understand... Why has stopped working. This configuration work on other cluster, but on corosync1. So... cluster postgres with master/slave. Classic config as in wiki. I build cluster, start,

Re: [Pacemaker] What value should be in the $OCF_RESKEY_CRM_meta_notify_slave_uname when a quorum is lost?

2013-11-10 Thread Andrew Beekhof
On 6 Nov 2013, at 12:42 am, Andrey Groshev gre...@yandex.ru wrote: Hi All! I am interested in this subject, because happens is the following situation. I build cluster on four nodes with postgres master/slave configuration. Set quorum-policy=stop Run the cluster and conducted an experiment

Re: [Pacemaker] Pacemaker-corosync update attribute issue

2013-11-10 Thread Andrew Beekhof
On 22 Oct 2013, at 3:43 am, A66A a6ap...@gmail.com wrote: Hello, I have a problem with my 2-node cluster. In some reasons one of my nodes can't update attributes due to error - warning: attrd_cib_callback: Update PostgreSQL-status=HS:async failed: Transport endpoint is not connected.

Re: [Pacemaker] Monitoring on master node not running after standby is connected

2013-11-10 Thread Andrew Beekhof
On 16 Oct 2013, at 12:21 am, Juraj Fabo juraj.f...@gmail.com wrote: Juraj Fabo juraj.fabo@... writes: Hello Andrew thank you for the response. I've patched crmd, cleaned the cluster, done the scenario steps and created crm_report which is attached. After loading the cluster

Re: [Pacemaker] Remove a ghost node

2013-11-10 Thread Andrew Beekhof
On 11 Nov 2013, at 11:44 am, Sean Lutner s...@rentul.net wrote: On Nov 10, 2013, at 6:27 PM, Andrew Beekhof and...@beekhof.net wrote: On 8 Nov 2013, at 12:59 pm, Sean Lutner s...@rentul.net wrote: On Nov 7, 2013, at 8:34 PM, Andrew Beekhof and...@beekhof.net wrote: On 8 Nov

Re: [Pacemaker] Remove a ghost node

2013-11-10 Thread Andrew Beekhof
On 11 Nov 2013, at 12:03 pm, Sean Lutner s...@rentul.net wrote: On Nov 10, 2013, at 7:54 PM, Andrew Beekhof and...@beekhof.net wrote: On 11 Nov 2013, at 11:44 am, Sean Lutner s...@rentul.net wrote: On Nov 10, 2013, at 6:27 PM, Andrew Beekhof and...@beekhof.net wrote: On 8 Nov

Re: [Pacemaker] qb_ipcs_us_connection_acceptor: Could not accept client connection: Too many open files (24)

2013-11-08 Thread Andrew Beekhof
On 9 Nov 2013, at 7:19 am, Moturi Upendra moturi.upen...@gmail.com wrote: Hello Andrew, I have installed pacemaker on RHEL6.4 Please try the updated 6.4 packages from Red Hat (pacemaker 1.1.10 and libqb 0.16) but now and then i get the following error and need to restart pacemaker

Re: [Pacemaker] The larger cluster is tested.

2013-11-07 Thread Andrew Beekhof
at all? Regards, Yusuke 2013/11/7 Andrew Beekhof and...@beekhof.net: On 7 Nov 2013, at 12:43 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, Andrew 2013/11/7 Andrew Beekhof and...@beekhof.net: On 6 Nov 2013, at 4:48 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, Andrew I tested

Re: [Pacemaker] /var/lib/pacemaker/cores cleanup

2013-11-07 Thread Andrew Beekhof
On 7 Oct 2013, at 5:52 pm, Mailing List SVR li...@svrinformatica.it wrote: On 07/10/2013 04:16, Andrew Beekhof wrote: On 05/10/2013, at 7:11 AM, Mailing List SVR li...@svrinformatica.it wrote: Hi, I have a pacemaker cluster running fine since 2 months, I noticed

Re: [Pacemaker] /var/lib/pacemaker/cores cleanup

2013-11-07 Thread Andrew Beekhof
On 8 Nov 2013, at 10:27 am, Andrew Beekhof and...@beekhof.net wrote: On 7 Oct 2013, at 5:52 pm, Mailing List SVR li...@svrinformatica.it wrote: On 07/10/2013 04:16, Andrew Beekhof wrote: On 05/10/2013, at 7:11 AM, Mailing List SVR li...@svrinformatica.it wrote: Hi, I have

Re: [Pacemaker] two design questions about active/active mode?

2013-11-07 Thread Andrew Beekhof
On 21 Aug 2013, at 1:50 pm, Wen Wen (NCS) w...@ncs.com.sg wrote: Hi all, I am doing dual nodes practice. I use CentOS 6.3 x86_64 pacemaker DRBD and GFS2 for my cluster. I already test many times I have a design question. Here is my crm status on one node after I set this node from

Re: [Pacemaker] Simple installation Pacemaker + CMAN + fence-agents

2013-11-07 Thread Andrew Beekhof
Something seems very wrong with this at the corosync level. Even fenced and the dlm are having issues. Jan: Could this be firewall related? On 27 Sep 2013, at 10:44 pm, Bartłomiej Wójcik bartlomiej.woj...@turbineam.com wrote: On 2013-09-27 04:26, Andrew Beekhof wrote: On 26/09/2013, at 8

Re: [Pacemaker] location ping rules

2013-11-07 Thread Andrew Beekhof
On 7 Nov 2013, at 9:34 pm, s.oreilly s.orei...@linnovations.co.uk wrote: Having some trouble getting a location rule to work. Here is my current config: Resources: Master: master_drbd Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true Resource:

Re: [Pacemaker] Remove a ghost node

2013-11-07 Thread Andrew Beekhof
On 8 Nov 2013, at 4:45 am, Sean Lutner s...@rentul.net wrote: I have a confusing situation that I'm hoping to get help with. Last night after configuring STONITH on my two node cluster, I suddenly have a ghost node in my cluster. I'm looking to understand the best way to remove this node

Re: [Pacemaker] Upgrade of Pacemaker on CentOS 6.4 to 1.1.10 - Delay RA missing ...

2013-11-07 Thread Andrew Beekhof
On 7 Nov 2013, at 9:30 pm, Robert H. pacema...@elconas.de wrote: This does a reasonable job of explaining: http://blog.clusterlabs.org/blog/2013/pacemaker-and-rhel-6-dot-4/ I see, thanks for the hint ... small step for man, huge step for mankind .. (or something like this :)) I would

Re: [Pacemaker] crm_mon segment fault con fedora 20

2013-11-07 Thread Andrew Beekhof
On 6 Nov 2013, at 9:36 am, emmanuel segura emi2f...@gmail.com wrote: Hello everybody, On Fedora 20 i got a crm_mon segment fault with the following configuration http://ur1.ca/fzndq maybe my configuration is wrong, but in any case the is what i saw with gdb http://ur1.ca/fznf2 Best to

Re: [Pacemaker] Upgrade of Pacemaker on CentOS 6.4 to 1.1.10 - Delay RA missing ...

2013-11-06 Thread Andrew Beekhof
On 7 Nov 2013, at 12:15 am, Robert Heinzmann (pacemaker) pacema...@elconas.de wrote: Hello, I just upgraded to the latest CentOS release of Pacemaker (released on 1.11.2013) on our test environment and we get "not installed" messages for ocf:heartbeat:Delay. It seems that the resource

Re: [Pacemaker] nofile limit with pacemaker

2013-11-06 Thread Andrew Beekhof
On 6 Nov 2013, at 6:08 pm, Daniel Jung mimianddan...@gmail.com wrote: Hi all, I came across a problem with no file limit for one of the primitives(slapd) running in pacemaker. To verify that this is result of running in pacemaker, i put the node in standby and start up Slapd manually,

Re: [Pacemaker] The larger cluster is tested.

2013-11-06 Thread Andrew Beekhof
https://drive.google.com/file/d/0BwMFJItoO-fVV1BoTjVQMk52WEU/edit?usp=sharing 2013/11/6 Andrew Beekhof and...@beekhof.net: On 5 Nov 2013, at 12:48 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, Andrew I tested by this commitment. https://github.com/beekhof/pacemaker/commit

Re: [Pacemaker] The larger cluster is tested.

2013-11-06 Thread Andrew Beekhof
On 7 Nov 2013, at 12:43 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, Andrew 2013/11/7 Andrew Beekhof and...@beekhof.net: On 6 Nov 2013, at 4:48 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, Andrew I tested by the following versions. https://github.com/ClusterLabs/pacemaker

Re: [Pacemaker] The larger cluster is tested.

2013-11-05 Thread Andrew Beekhof
/0BwMFJItoO-fVWDg1Sjc2WXltUjQ/edit?usp=sharing Regards, Yusuke 2013/10/31 Andrew Beekhof and...@beekhof.net: On 29 Oct 2013, at 12:12 am, yusuke iida yusk.i...@gmail.com wrote: Hi, Andrew I tested using following commit. https://github.com/beekhof/pacemaker/commit

Re: [Pacemaker] Resource only failsover in one direction

2013-11-03 Thread Andrew Beekhof
On 24 Oct 2013, at 10:45 am, Lucas Brown lu...@locatrix.com wrote: Date: Tue, 22 Oct 2013 09:32:59 +0200 From: emmanuel segura emi2f...@gmail.com To: cae7pj3av7tbgcfjya5leayqwew4sfecsrlo55wqrfjv7joo...@mail.gmail.com

Re: [Pacemaker] Pacemaker crash on node unstandby/standby.

2013-11-03 Thread Andrew Beekhof
On 19 Oct 2013, at 12:45 pm, Justin Burnham jburnha...@gmail.com wrote: Hi, I am having an issue with Pacemaker on the cman stack where I can reliably cause pacemaker to crash and coredump when I put the node in standby or unstandby. Here is my messages log from when I do a unstandby

[Pacemaker] findif.sh

2013-10-31 Thread Andrew Beekhof
Howdy, Could we get some documentation for findif.sh? Specifically what should and shouldn't work. I'm seeing various behaviour and it's not clear to me whether the results are intended or not. My eth0: inet 192.168.122.101/24 scope global eth0 inet6 fe80::5054:ff:fee8:c8a7/64 scope

Re: [Pacemaker] findif.sh

2013-10-31 Thread Andrew Beekhof
On 1 Nov 2013, at 12:19 pm, Andrew Beekhof and...@beekhof.net wrote: Howdy, Could we get some documentation for findif.sh? Specifically what should and shouldn't work. I'm seeing various behaviour and it's not clear to me whether the results are intended or not. My eth0: inet

Re: [Pacemaker] crm_attribute: temporary (lifetime 'reboot') attribute

2013-10-31 Thread Andrew Beekhof
On 1 Nov 2013, at 6:50 am, Jason Harley jhar...@redmind.ca wrote: Hello — I’ve got Pacemaker 1.1.10 (1.1.10+git20130802-1ubuntu1) and Corosync 2.3.0 (2.3.0-1ubuntu4) running in a cluster managing a simple resource group. I would like to create an (arbitrary) attribute that goes away on

Re: [Pacemaker] updating corosync without interrupting resources

2013-10-31 Thread Andrew Beekhof
On 29 Oct 2013, at 12:39 am, Karl Rößmann k.roessm...@fkf.mpg.de wrote: Hi, we want to increase some corosync token-parameters slightly: token, token_retransmits_before_loss_const,

Re: [Pacemaker] Asymmetric cluster, clones, and location constraints

2013-10-30 Thread Andrew Beekhof
On 25 Oct 2013, at 9:40 am, David Vossel dvos...@redhat.com wrote: - Original Message - From: Lindsay Todd rltodd@gmail.com To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org Sent: Wednesday, October 23, 2013 2:38:17 PM Subject: Re: [Pacemaker]

Re: [Pacemaker] Could not initialize corosync configuration API error 2

2013-10-30 Thread Andrew Beekhof
Jan: not sure if you're on the pacemaker list On 29 Oct 2013, at 6:43 pm, Bauer, Stefan (IZLBW Extern) stefan.ba...@iz.bwl.de wrote: Dear Developers/Users, we’re using Pacemaker 1.1.7 and Corosync Cluster Engine 1.4.2 with Debian 6 and a recent vanilla Kernel (3.10). On quite a lot

Re: [Pacemaker] Corosync hanging during stop

2013-10-30 Thread Andrew Beekhof
On 17 Oct 2013, at 8:05 pm, D.Gossrau dgoss...@andtek.com wrote: Hi Lars, On 10/12/2013 02:14 AM, Lars Ellenberg wrote: On Thu, Oct 10, 2013 at 04:06:46PM +0200, Detlef Gossrau wrote: Hi, I created a cluster installation with two nodes. Everything is running smoothly most of the time.

Re: [Pacemaker] pacemaker shutdown under high load

2013-10-30 Thread Andrew Beekhof
On 17 Oct 2013, at 1:37 am, Alessandro Bono alessandro.b...@gmail.com wrote: On 16/10/2013 00:11, Andrew Beekhof wrote: On 09/10/2013, at 10:53 PM, Alessandro Bono alessandro.b...@gmail.com wrote: Hi this week end my pacemaker shutdown on primary node during machine backup

Re: [Pacemaker] The larger cluster is tested.

2013-10-30 Thread Andrew Beekhof
. Regards, Yusuke 2013/10/20 Andrew Beekhof and...@beekhof.net: On 18/10/2013, at 10:12 PM, yusuke iida yusk.i...@gmail.com wrote: Hi, Andrew Now, I am testing the configuration of one standby node and active node of 15. About 10 Dummy resources are started per node. If all the nodes

Re: [Pacemaker] An internal error occurred in crmd

2013-10-30 Thread Andrew Beekhof
I think this should be fixed by: https://github.com/beekhof/pacemaker/commit/ea7991f The underlying issue though, is that the lrmd command timed out, which _should_ have been fixed by: https://github.com/beekhof/pacemaker/commit/d65b270 What are you doing to this poor cluster? :) On 21

Re: [Pacemaker] Stonith issue with fence_virsh

2013-10-30 Thread Andrew Beekhof
Personally I use fence_xvm. IIRC, it's the supported equivalent of fence_virsh. On 24 Oct 2013, at 6:38 pm, Beo Banks beo.ba...@googlemail.com wrote: hi, i have enable the debug option and i use the ip instead of hostname primitive stonith-zarafa02 stonith:fence_virsh \ params

Re: [Pacemaker] resources does not start on survied node after reboot

2013-10-30 Thread Andrew Beekhof
On 30 Oct 2013, at 1:12 am, Саша Александров shurr...@gmail.com wrote: Hi! I have a 2-node cluster with shared storage and SBD-fencing. One node was down for maintenance. Due to external reasons, second node was rebooted. After reboot service never got up: Oct 29 13:04:21 wcs2

Re: [Pacemaker] libqb-0.16 instability with standby/unstandby ?

2013-10-22 Thread Andrew Beekhof
On 23 Oct 2013, at 8:39 am, David Vossel dvos...@redhat.com wrote: - Original Message - From: Mike Pomraning m...@pilcrow.madison.wi.us To: pacemaker@oss.clusterlabs.org Sent: Tuesday, October 22, 2013 10:49:28 AM Subject: [Pacemaker] libqb-0.16 instability with standby/unstandby ?

Re: [Pacemaker] The larger cluster is tested.

2013-10-20 Thread Andrew Beekhof
On 18/10/2013, at 10:12 PM, yusuke iida yusk.i...@gmail.com wrote: Hi, Andrew Now, I am testing the configuration of one standby node and active node of 15. About 10 Dummy resources are started per node. If all the nodes are started with this composition, before all the resources start,

Re: [Pacemaker] Question about the resource to fence a node

2013-10-15 Thread Andrew Beekhof
On 15/10/2013, at 8:24 PM, Kazunori INOUE kazunori.ino...@gmail.com wrote: Hi, I'm using pacemaker-1.1 (the latest devel). I started resource (f1 and f2) which fence vm3 on vm1. $ crm_mon -1 Last updated: Tue Oct 15 15:16:37 2013 Last change: Tue Oct 15 15:16:21 2013 via crmd on vm1

Re: [Pacemaker] ping monitor

2013-10-15 Thread Andrew Beekhof
On 14/10/2013, at 7:51 PM, s.oreilly s.orei...@linnovations.co.uk wrote: Hi, I am setting up a 2 node mysql cluster using pcs instead of crm for the first time and have everything working nicely except my ping monitor. Ping is running on both nodes but I can't figure out how to configure

Re: [Pacemaker] Rule constraint monitoring interval

2013-10-15 Thread Andrew Beekhof
On 10/10/2013, at 3:22 AM, Sam Gardner lwnex...@gmail.com wrote: As I understand it, there are two ways to monitor the status of a resource. 1) Use the monitor action on the resource agent script - this is equivalent to polling the resource at every monitor interval 2) Write a value

Re: [Pacemaker] pacemaker shutdown under high load

2013-10-15 Thread Andrew Beekhof
On 09/10/2013, at 10:53 PM, Alessandro Bono alessandro.b...@gmail.com wrote: Hi this week end my pacemaker shutdown on primary node during machine backup attached compressed log of primary node, logs of secondary node is too big, if needed I can provide as external link inspecting logs

Re: [Pacemaker] Bug? failed to stonith with fence_ipmilan on CentOS6.2

2013-10-15 Thread Andrew Beekhof
On 09/10/2013, at 1:53 PM, Xiaomin Zhang zhangxiao...@gmail.com wrote: I think I know why this happened after I enabled 'verbose' for fence_ipmilan. When I firstly configure stonith, I set lanplus as true, however, my machine is not HP one so lanplus is not supported. When I notice this, I

Re: [Pacemaker] Service restoration in clone resource group

2013-10-15 Thread Andrew Beekhof
On 10/10/2013, at 12:52 PM, Sean Lutner s...@rentul.net wrote: On Oct 8, 2013, at 9:45 AM, Sean Lutner s...@rentul.net wrote: On Oct 8, 2013, at 9:33 AM, Lars Marowsky-Bree l...@suse.com wrote: On 2013-10-08T09:29:14, Sean Lutner s...@rentul.net wrote: The clone was created using

Re: [Pacemaker] Fedora 20 Alpath with pcs and crm

2013-10-14 Thread Andrew Beekhof
On 13/10/2013, at 9:22 AM, emmanuel segura emi2f...@gmail.com wrote: Hello list I'm testing Fedora 20 Alpath with the new tool pcs, but i saw in pacemaker these two parameters rsc_defaults resource-stickiness and property default-resource-stickiness what is the difference between them?

Re: [Pacemaker] How to disallow resources to re-run after they were stopped

2013-10-07 Thread Andrew Beekhof
On 01/10/2013, at 8:09 PM, Mistina Michal michal.mist...@virte.sk wrote: Dear all. I am using ping resource agent to check if the resources can run on the node. I have 2 node cluster. If the destination is down resources are also down on that particular node. If the destination can be

Re: [Pacemaker] Failover IP + One service on multi-node cluster only works on two nodes.

2013-10-07 Thread Andrew Beekhof
On 07/10/2013, at 11:25 PM, Bright Dadson losintik...@yahoo.co.uk wrote: Hi All, Stack: pacemaker/heartbeat My setup consists of multiple nodes, at least three with below configs. ##Heartbeat config autojoin none bcast eth0 warntime 3 deadtime 6 initdead 60 keepalive 1 node

Re: [Pacemaker] DRBD Master/Slave in a 3 node cluster

2013-10-06 Thread Andrew Beekhof
On 06/10/2013, at 8:20 PM, Stefan Botter listrea...@jsj.dyndns.org wrote: Hi Dejan, On Tuesday 01 October 2013 14:04:54 Stefan Botter wrote: Dejan Muhamedagic deja...@fastmail.fm wrote: On Tue, Oct 01, 2013 at 09:26:14AM +0200, Stefan Botter wrote: I have a quite similar setup, currently

Re: [Pacemaker] How to do crm resource cleanup with the new pacemaker ?

2013-10-06 Thread Andrew Beekhof
On 07/10/2013, at 10:22 AM, Lev Sidorenko l...@securemedia.co.nz wrote: Hello All! On the good old pacemaker there was a crm shell which is substituted by pcs now. But I can't find how to cleanup a resource with pcs. How it can be done now? Do you have: pcs resource cleanup --help

Re: [Pacemaker] /var/lib/pacemaker/cores cleanup

2013-10-06 Thread Andrew Beekhof
On 05/10/2013, at 7:11 AM, Mailing List SVR li...@svrinformatica.it wrote: Hi, I have a pacemaker cluster running fine since 2 months, I noticed that in the folder /var/lib/pacemaker/cores/root I have about 1.5 GB of files core., who is responsible for cleaning up these files, Ideally

Re: [Pacemaker] Problems when quorum lost for a short period of time

2013-10-01 Thread Andrew Beekhof
On 02/10/2013, at 6:26 AM, Lev Sidorenko l...@securemedia.co.nz wrote: Hello All! I have a 4-nodes cluster setup. It is actually 2 nodes for main+stanby and another two nodes just for provide quorum. 1 extra would have been enough So, all resources run on the main node but only

Re: [Pacemaker] Corosync won't recover when a node fails

2013-10-01 Thread Andrew Beekhof
On 02/10/2013, at 5:24 AM, David Parker dpar...@utica.edu wrote: Thanks, I did a little Googling and found the git repository for pcs. pcs won't help you rebuild pacemaker with cman support (or corosync 2.x support) turned on though. Is there any way to make a two-node cluster work with

Re: [Pacemaker] cibadmin -Q: Call cib_query failed (-62): Timer expired

2013-10-01 Thread Andrew Beekhof
, but in meantime, if you had a moment, any hint would be welcomed. many thanks, On Thu, Sep 26, 2013 at 9:25 PM, Andrew Beekhof and...@beekhof.net wrote: On 27/09/2013, at 8:45 AM, Radoslaw Garbacz radoslaw.garb...@xtremedatainc.com wrote: Hi, I have a problem starting up

Re: [Pacemaker] Simple installation Pacemaker + CMAN + fence-agents

2013-09-26 Thread Andrew Beekhof
On 26/09/2013, at 8:35 PM, Bartłomiej Wójcik bartlomiej.woj...@turbineam.com wrote: Hello, I install Pacemaker in accordance with http://clusterlabs.org/quickstart-ubuntu.html on Ubuntu 13.04 two nodes changing only the IP addresses. /etc/cluster/cluster.conf: <?xml version="1.0"?>

Re: [Pacemaker] cibadmin -Q: Call cib_query failed (-62): Timer expired

2013-09-26 Thread Andrew Beekhof
On 27/09/2013, at 8:45 AM, Radoslaw Garbacz radoslaw.garb...@xtremedatainc.com wrote: Hi, I have a problem starting up a cluster after upgrading corosync from 1.4 to 2.3.2 and pacemaker from 1.8 to 1.9. All crm_node calls report well, but any CIB manipulation fails, i.e.: * crm_node

Re: [Pacemaker] Monitoring - pacemaker

2013-09-24 Thread Andrew Beekhof
/crm_mon.html extra_options=-T guanae...@yahoo.com.br \ op monitor interval=10s timeout=20s Do you have a mail server configured on that machine? Does 'crm_mon --help' show that -T is supported? I await the return Denise From: Andrew Beekhof and...@beekhof.net To: Denise Cosso guanae

Re: [Pacemaker] Resource is not started immediately regardless of require-all=false

2013-09-23 Thread Andrew Beekhof
On 12/09/2013, at 8:29 PM, Takatoshi MATSUO matsuo@gmail.com wrote: Hi all I use resource_set and require-all='false' as follows constraints rsc_order id=order1 kind=Mandatory resource_set id=order1-0 resource_ref id=dummy1/

Re: [Pacemaker] pacemaker dies with full log (was: Re: pacemaker dies without logs)

2013-09-23 Thread Andrew Beekhof
On 23/09/2013, at 8:27 PM, Alessandro Bono alessandro.b...@gmail.com wrote: On Mon, 23 Sep 2013 15:31:59 +1000, Andrew Beekhof wrote: I see: Sep 22 00:45:48 [4412] ga1-ext pacemakerd:error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2) Sep 22 00:45

Re: [Pacemaker] cib connection error

2013-09-23 Thread Andrew Beekhof
On 24/09/2013, at 2:09 AM, Халезов Иван i.khale...@rts.ru wrote: Hi all, I use pacemaker 1.1.9 with corosync 2.3 both built from source. My OS is CentOS 6.4 x86_64 I have about 30 resources of one type managed by my own resource agent. It is necessary for the resource agent to know

Re: [Pacemaker] Pacemaker basic installation in CentOS 6.4

2013-09-22 Thread Andrew Beekhof
On 21/09/2013, at 3:34 AM, Gopalakrishnan N gopalakrishnan...@gmail.com wrote: Thought of sharing my experience with basic installation http://gopalstech.blogspot.com/2013/09/pacemaker-basic-setup-with-cent-os-64.html But long way to go Since you're copying the structure and whole

Re: [Pacemaker] exit code crm_attibute

2013-09-22 Thread Andrew Beekhof
On 20/09/2013, at 5:53 PM, Andrey Groshev gre...@yandex.ru wrote: Hi again! Today again met a strange behavior. I asked for a non-existent attribute of an existing node. # crm_attribute --type nodes --node-uname exist.node.domain.com --attr-name notexistattibute --query ; echo $?

Re: [Pacemaker] Monitoring - pacemaker

2013-09-22 Thread Andrew Beekhof
On 20/09/2013, at 4:52 AM, Denise Cosso guanae...@yahoo.com.br wrote: Hi, I have a cluster (2 machines) email using the pacemaker / corosync as Active / Passive. Already configured filesystem (, ocf: heartbeat: Filesystem) SFEX, ping (ocf: pacemaker: ping) and start / stop

Re: [Pacemaker] pacemaker dies with full log (was: Re: pacemaker dies without logs)

2013-09-22 Thread Andrew Beekhof
I see: Sep 22 00:45:48 [4412] ga1-ext pacemakerd:error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2) Sep 22 00:45:48 [4419] ga1-ext stonith-ng:error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2) Sep 22 00:45:48

Re: [Pacemaker] Monitoring on master node not running after standby is connected

2013-09-22 Thread Andrew Beekhof
On 20/09/2013, at 1:39 AM, Juraj Fabo juraj.f...@gmail.com wrote: diff -urp pacemaker-Pacemaker-1.1.10.z0/crmd/lrm.c pacemaker-Pacemaker-1.1.10/crmd/lrm.c --- pacemaker-Pacemaker-1.1.10.z0/crmd/lrm.c 2013-07-26 00:02:31.0 + +++ pacemaker-Pacemaker-1.1.10/crmd/lrm.c

Re: [Pacemaker] Postgresql Replication

2013-09-19 Thread Andrew Beekhof
On 12/09/2013, at 11:58 PM, Takatoshi MATSUO matsuo@gmail.com wrote: Hi 2013/9/12 Eloy Coto Pereiro eloy.c...@gmail.com: Hi, Thanks for your help, I use the same example. In this case Kamailio need to start after postgresql. But this is not a problem I think, the replication work

Re: [Pacemaker] [Openais] very slow pacemaker/corosync shutdown

2013-09-19 Thread Andrew Beekhof
On 19/09/2013, at 8:25 AM, David Lang da...@lang.hm wrote: I have been using heartbeat for many years, but am now setting up some new clusters with pacemaker/corosync. I'm not sure which component is having problems so I'm sending to both lists. These are two machine clusters, configured

Re: [Pacemaker] Howto test/simulate the reaction of the cluster to node up and down

2013-09-19 Thread Andrew Beekhof
You might need -R On 19/09/2013, at 2:54 AM, Andreas Mock andreas.m...@web.de wrote: Hallo Andreas, thank you for your reply. I use 1.1.11-git. What I did: I put one node down (service pacemaker stop) and then execute a crm_simulate -Ls -u node and I only see the output as said before.

Re: [Pacemaker] Strange bad permissions issue when starting pacemaker

2013-09-19 Thread Andrew Beekhof
the situation (the name is that matters) That _might_ not be true for some older versions. I would definitely try making them the same 2013/9/17 Andrew Beekhof and...@beekhof.net On 17/09/2013, at 3:42 AM, Саша Александров shurr...@gmail.com wrote: Hi, everyone! I have a pretty

Re: [Pacemaker] monitor on disabled nodes

2013-09-19 Thread Andrew Beekhof
On 19/09/2013, at 2:13 AM, Radoslaw Garbacz radoslaw.garb...@xtremedatainc.com wrote: Hi, I have a question regarding the monitor operation on disabled nodes. I noticed that this operation is called even when an agent is disabled for a node. Is it an intended behavior Yes. We have

Re: [Pacemaker] corosync service start giving segmentation fault on centos

2013-09-19 Thread Andrew Beekhof
On 19/09/2013, at 8:17 PM, Aarti Sawant aartipsawan...@gmail.com wrote: hello, pacemaker starts but it corosync service is not started Are you using cman (cluster.conf) or the corosync plugin (corosync.conf) Pacemaker only starts corosync if cman (which is a particular way of running

Re: [Pacemaker] Strange bad permissions issue when starting pacemaker

2013-09-19 Thread Andrew Beekhof
/pacemaker-logging/ It turned out that on the problem node root user had GID not 0 but 501. Very odd. :-) Very :) 2013/9/19 Andrew Beekhof and...@beekhof.net On 17/09/2013, at 4:30 PM, Саша Александров shurr...@gmail.com wrote: Andrew, Sorry, here it is: [root@premium2 ~]# ls

Re: [Pacemaker] Howto test/simulate the reaction of the cluster to node up and down

2013-09-19 Thread Andrew Beekhof
On 19/09/2013, at 5:57 PM, Andreas Mock andreas.m...@web.de wrote: Hi Lars, hi Andrew, thank you for your answers. But I'm still stuck. When I do have both nodes online and the resources are spread over these nodes and I do a crm_simulate -Ls -R -d node1 I do see nicely what would

Re: [Pacemaker] Mysql multiple slaves, slaves restarting occasionally without a reason

2013-09-19 Thread Andrew Beekhof
On 10/09/2013, at 4:07 PM, Attila Megyeri amegy...@minerva-soft.com wrote: Hi, We have a Mysql cluster which works fine when I have a single master (“A”) and slave (“B”). Failover is almost immediate and I am happy with this approach. When we configured two additional slaves, strange

Re: [Pacemaker] [Openais] very slow pacemaker/corosync shutdown

2013-09-19 Thread Andrew Beekhof
On 20/09/2013, at 8:19 AM, Lists li...@benjamindsmith.com wrote: On 09/18/2013 06:49 PM, Andrew Beekhof wrote: On 19/09/2013, at 8:25 AM, David Lang da...@lang.hm wrote: What's the best way to see what it's getting stuck doing? Log files. Is there a good way to tell

Re: [Pacemaker] very slow pacemaker/corosync shutdown

2013-09-19 Thread Andrew Beekhof
On 19/09/2013, at 7:45 PM, David Lang da...@lang.hm wrote: On Thu, 19 Sep 2013, Florian Crouzat wrote: Le 19/09/2013 00:25, David Lang a écrit : I'm frequently running into a problem that shutting down pacemaker/corosync takes a very long time (several minutes) Just to be 100% sure, you

Re: [Pacemaker] corosync service start giving segmentation fault on centos

2013-09-19 Thread Andrew Beekhof
On 19/09/2013, at 11:07 PM, Aarti Sawant aartipsawan...@gmail.com wrote: hello, i am running corosync plugin and not using cman. then contrary to what digimer said, you do need to start corosync yourself. On Thu, Sep 19, 2013 at 4:22 PM, Andrew Beekhof and...@beekhof.net wrote

Re: [Pacemaker] [Openais] very slow pacemaker/corosync shutdown

2013-09-19 Thread Andrew Beekhof
On 20/09/2013, at 10:46 AM, Lists li...@benjamindsmith.com wrote: On 09/19/2013 04:50 PM, Andrew Beekhof wrote: From this we can infer that corosync has gotten horribly confused and, as a consequence, pacemaker can't talk to its peers anymore. this is a test cluster and not being

Re: [Pacemaker] Confused at the state of Linux Clustering..

2013-09-16 Thread Andrew Beekhof
On 17/09/2013, at 4:01 AM, Errol Neal en...@businessgrade.com wrote: Hi. I'm trying to figure out EXACTLY what the process is for getting a cluster working on Ubuntu Raring. Most of the clusters I implemented required dlm-controld.pcmk, but I don't see this being shipped anymore in most

Re: [Pacemaker] Strange bad permissions issue when starting pacemaker

2013-09-16 Thread Andrew Beekhof
On 17/09/2013, at 3:42 AM, Саша Александров shurr...@gmail.com wrote: Hi, everyone! I have a pretty strange issue. When starting pacemaker, I get Sep 16 21:21:03 premium2 cib[27510]: notice: main: Using new config location: /var/lib/pacemaker/cib Sep 16 21:21:03 premium2 cib[27510]:

Re: [Pacemaker] samba inside xen-vm; device held open, migration fails

2013-09-16 Thread Andrew Beekhof
On 16/09/2013, at 12:11 AM, ge...@riseup.net wrote: Hi all, I'm in the process of deploying a pacemaker cluster, running several xen vms, storage is done with drbd. Everything works like a charm, and now I just found the root cause (at least I believe) for the issue device is still held

Re: [Pacemaker] DRBD resources show as running when they don't exist

2013-09-16 Thread Andrew Beekhof
On 15/09/2013, at 3:14 AM, Stephen Marsh step...@serverforce.net wrote: Hi all, I'm using Corosync 2.3.1 with Pacemaker 1.1.10 (final release) and DRBD 8.4.3. I've got a strange problem with the way Pacemaker handles DRBD resources that don't exist. I'm testing with this config:

Re: [Pacemaker] (LRMD|PCMK)_MAX_CHILDREN?

2013-09-12 Thread Andrew Beekhof
On 12/09/2013, at 4:44 PM, Lars Marowsky-Bree l...@suse.com wrote: On 2013-09-12T14:34:02, Andrew Beekhof and...@beekhof.net wrote: Well, they're all doing something completely different. No, they're all crude approximations designed to stop the cluster as a whole from using up so much

Re: [Pacemaker] different behavior cibadmin -Ql with cman and corosync2

2013-09-12 Thread Andrew Beekhof
, Chrissie is looking at making the combined data set available in a different namespace for pacemaker to use. 05.09.2013, 15:49, Christine Caulfield ccaul...@redhat.com: On 05/09/13 11:33, Andrew Beekhof wrote: On 05/09/2013, at 6:37 PM, Christine Caulfield ccaul...@redhat.com wrote

Re: [Pacemaker] Resource ordering/colocating question (DRBD + LVM + FS)

2013-09-12 Thread Andrew Beekhof
On 09/09/2013, at 6:46 PM, Heikki Manninen h...@iki.fi wrote: Hello Andreas, thanks for your input, much appreciated. On 5.9.2013, at 16.39, Andreas Mock andreas.m...@web.de wrote: 1) The second output of crm_mon show a resource IP_database which is not shown in the initial crm_mon

Re: [Pacemaker] (LRMD|PCMK)_MAX_CHILDREN?

2013-09-11 Thread Andrew Beekhof
On 11/09/2013, at 5:54 PM, Lars Marowsky-Bree l...@suse.com wrote: On 2013-04-02T17:02:01, David Vossel dvos...@redhat.com wrote: I'm convinced this is useful. I'll add PCMK_MAX_CHILDREN to the sysconfig documentation. To be backwards compatible I'll have the lrmd internally interpret
