Re: [Pacemaker] possible node status

2014-01-08 Thread Andrew Beekhof
On 9 Jan 2014, at 8:30 am, Michael Schwartzkopff wrote: > Hi, > > what are from the pacemaker point of view all possible node status'? > > online|standby|offline|... failed too iirc. crm_mon.c would know for sure > > Any future extensions planned? not that i know of > > Thanks for any fe

Re: [Pacemaker] lrmd segfault at pacemaker 1.1.11-rc1

2014-01-08 Thread Andrew Beekhof
On 8 Jan 2014, at 9:15 pm, Kazunori INOUE wrote: > 2014/1/8 Andrew Beekhof : >> >> On 18 Dec 2013, at 9:50 pm, Kazunori INOUE wrote: >> >>> Hi David, >>> >>> 2013/12/18 David Vossel : >>>> >>>> That's a really

Re: [Pacemaker] Question about node-action-limit and migration-limit

2014-01-07 Thread Andrew Beekhof
On 18 Dec 2013, at 9:51 pm, Kazunori INOUE wrote: > Hi, > > When I set only migration-limit without setting node-action-limit in > pacemaker-1.1, > the number of 'operation' other than migrate_to/from was limited to > the value of migration-limit. > (The node that I used has 8 cores.) > > [cib

Re: [Pacemaker] host came online but it was ignored

2014-01-07 Thread Andrew Beekhof
test pacemaker with heartbeat anymore). If you can reproduce with something more recent, I'd be happy to take a look at the logs. > > Thanks > Eswar > > > On Wed, Dec 11, 2013 at 9:35 AM, ESWAR RAO wrote: > Hi Andrew, > > # pacemakerd --version > Pacemaker

Re: [Pacemaker] Reg. trigger when node failure occurs

2014-01-07 Thread Andrew Beekhof
On 11 Dec 2013, at 3:45 pm, ESWAR RAO wrote: > Hi Micheal, > > I am configuring the ClusterMon as below on the 3 node setup: > I am following > http://floriancrouzat.net/2013/01/monitor-a-pacemaker-cluster-with-ocfpacemakerclustermon-andor-external-agent/ > > # crm configure primitive Cluster

Re: [Pacemaker] error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

2014-01-07 Thread Andrew Beekhof
What version of pacemaker? There were some improvements to how we handle sending messages via CPG recently. On 10 Dec 2013, at 4:40 am, Brian J. Murrell wrote: > On Mon, 2013-12-09 at 09:28 +0100, Jan Friesse wrote: >> >> Error 6 means "try again". This is happening either if corosync is >

Re: [Pacemaker] prevent starting resources on failed node

2014-01-07 Thread Andrew Beekhof
On 7 Dec 2013, at 2:17 am, Brian J. Murrell (brian) wrote: > [ Hopefully this doesn't cause a duplicate post but my first attempt > returned an error. ] > > Using pacemaker 1.1.10 (but I think this issue is more general than that > release), I want to enforce a policy that once a node fails, n

Re: [Pacemaker] hangs pending

2014-01-07 Thread Andrew Beekhof
On 29 Nov 2013, at 7:17 pm, Andrey Groshev wrote: > Hi, ALL. > > I'm still trying to cope with the fact that after the fence - node hangs in > "pending". Please define "pending". Where did you see this? > At this time, there are constant re-election. > Also, I noticed the difference when yo

Re: [Pacemaker] Weird behavior of PCS command while defining DRBD resources

2014-01-07 Thread Andrew Beekhof
On 27 Nov 2013, at 10:21 pm, Muhammad Kamran Azeem wrote: > Apologies for double post. In my initial post, I forgot to set the subject > properly. > > > Hello List, > > I am new here. I worked with Linux HA during 2006-2008, went in HPC > direction, and came back to HA a month ago. Realize

Re: [Pacemaker] Breaking dependency loop && stonith

2014-01-07 Thread Andrew Beekhof
On 27 Nov 2013, at 12:26 am, Andrey Groshev wrote: > Hi, ALL. > > I want to clarify two more questions. > After stonith reboot - this node hangs with status "pending". > The logs found string . > >info: rsc_merge_weights:pgsql:1: Breaking dependency loop at > msPostgresql >inf

Re: [Pacemaker] disable migration after the faicount

2014-01-07 Thread Andrew Beekhof
On 26 Nov 2013, at 11:55 pm, ESWAR RAO wrote: > Hi All, > > Even I tried with meta allow-migrate="false", but still the resource is > migrating to another node. I think you mean "move" - aka. "stopped here and started there". In which case you need to use a location constraint (with score=-IN

Re: [Pacemaker] some questions about STONITH

2014-01-07 Thread Andrew Beekhof
On 26 Nov 2013, at 12:39 am, Andrey Groshev wrote: >> ...snip... >>> Make next test: >>> #stonith_admin --reboot=dev-cluster2-node2 >>> Node reboot, but resource don't start. >>> In crm_mon status - Node dev-cluster2-node2 (172793105): pending. >>> And it will be hung. >> >> That is *proba

Re: [Pacemaker] Starting Pacemaker Cluster Manager [FAILED]

2014-01-07 Thread Andrew Beekhof
On 21 Nov 2013, at 9:56 pm, Miha wrote: > HI, > > how can i delete/reset all config, so that I could do again: "pcs cluster destroy" on all nodes looks about right > > 'pcs cluster setup mycluster pcmk-1 pcmk-2' and begin again at the beginning? > tnx! > > p.s.: below is a log > > Nov 2

Re: [Pacemaker] Minor buffer overflow..

2014-01-07 Thread Andrew Beekhof
On 5 Dec 2013, at 3:20 pm, Rob Thomas wrote: > I was idly wondering why the SMTP and SNMP modules were disabled by > default on the RHEL builds, and was in the middle of writing a shell > script to duplicate them when I noticed there was a tiny buffer > overflow in crm_mon. > > This may be why

Re: [Pacemaker] reboot of non-vm host results in VM restart -- of chickens and eggs and VMs

2014-01-07 Thread Andrew Beekhof
On 20 Dec 2013, at 5:30 am, Bob Haxo wrote: > Hello, > > Earlier emails related to this topic: > [pacemaker] chicken-egg-problem with libvirtd and a VM within cluster > [pacemaker] VirtualDomain problem after reboot of one node > > > My configuration: > > RHEL6.5/CMAN/gfs2/Pacemaker/crmsh >

Re: [Pacemaker] lrmd segfault at pacemaker 1.1.11-rc1

2014-01-07 Thread Andrew Beekhof
On 18 Dec 2013, at 9:50 pm, Kazunori INOUE wrote: > Hi David, > > 2013/12/18 David Vossel : >> >> That's a really weird one... I don't see how it is possible for op->id to be >> NULL there. You might need to give valgrind a shot to detect whatever is >> really going on here. >> >> -- Voss

Re: [Pacemaker] Manual fence confirmation by stonith_admin doesn't work again.

2014-01-07 Thread Andrew Beekhof
On 19 Dec 2013, at 6:54 pm, Nikita Staroverov wrote: > >> Please see: >> >> https://access.redhat.com/site/articles/36302 >> >> If you don't have an account, the relevant part is: >> >> "Usage of fence_manual is not supported in any production cluster. You may >> use this fence agent for de

Re: [Pacemaker] CentOS 6.5 Pacemaker Oracle Active/Failover cluster setup on SAN

2014-01-07 Thread Andrew Beekhof
On 6 Jan 2014, at 4:15 pm, Pui Edylie wrote: > Good Day members, > > I am wondering if anyone has set this up successfully? > > I noticed that there is a lack of Oracle script to initiate this. > > I would willing to pay someone for this effort and hopefully we could create > a howto to bene

Re: [Pacemaker] Pacemaker and RHEL/CENTOS 5.x compatibility ?

2013-12-19 Thread Andrew Beekhof
On 20 Dec 2013, at 1:36 am, Stephane Robin wrote: > Hi, > > This is a follow up on my previous post 'Trouble building Pacemaker from > source on CentOS 5.10' > Andrew: Thanks for your pointers. > > It turns out Pacemaker 1.1.10 needed more changes to build on CentOS 5.x. > • revert of

Re: [Pacemaker] Time to get ready for 1.1.11

2013-12-19 Thread Andrew Beekhof
> Andrew > > - Original Message - >> From: "David Vossel" >> To: "The Pacemaker cluster resource manager" >> Sent: Wednesday, December 11, 2013 3:33:46 PM >> Subject: Re: [Pacemaker] Time to get ready for 1.1.11 >> >>

Re: [Pacemaker] Trouble building Pacemaker from source on CentOS 5.10

2013-12-18 Thread Andrew Beekhof
On 14 Dec 2013, at 7:51 am, Stephane Robin wrote: > Hi, > > I'm trying to build Pacemaker-1.1.10 (from git), with corosync 2.3.2 and > libqb 0.16.0 on a CentOS 5.10 64b system. > I have latest auto tools (automake 1.14, autoconf 2.69, lib tool 2.4, > pkg-config 0.27.1) > > For Pacemaker, I'

Re: [Pacemaker] question on "on-fail=restart"

2013-12-18 Thread Andrew Beekhof
On 19 Dec 2013, at 4:03 am, Brusq, Jerome wrote: > Dear all, > > I have a custom lsb script that launch a custom process. > > primitive myscript lsb:ha_swift \ >op start interval="0" timeout="30s" \ >op stop interval="0" timeout="30s" \ >op monitor interval="15s" on-fail="restart

Re: [Pacemaker] is ccs as racy as it feels?

2013-12-10 Thread Andrew Beekhof
On 10 Dec 2013, at 11:31 pm, Brian J. Murrell wrote: > On Tue, 2013-12-10 at 10:27 +, Christine Caulfield wrote: >> >> Sadly you're not wrong. > > That's what I was afraid of. > >> But it's actually no worse than updating >> corosync.conf manually, > > I think it is... > >> in fact it

Re: [Pacemaker] host came online but it was ignored

2013-12-10 Thread Andrew Beekhof
version of pacemaker? On 10 Dec 2013, at 10:41 pm, ESWAR RAO wrote: > Hi Micheal, > > There are no firewall rules. > > I could only see below messages in logs: > > Dec 10 14:13:48 nvp-common crmd: [9220]: WARN: crmd_ha_msg_callback: Ignoring > HA message (op=join_announce) from nvsd-1: not i

Re: [Pacemaker] no-quorum-policy="freeze"

2013-12-01 Thread Andrew Beekhof
On Wed, Nov 27, 2013, at 04:50 AM, Olivier Nicaise wrote: Hello all, I have an issue with the no quorum policy freeze (stonith disabled). I'm using an old version of pacemaker (1.1.6), the one distributed by Ubuntu 12.04. I have a cluster with 3 nodes running various resources, including drbd

Re: [Pacemaker] pacemaker 1.1.11 rc1 compilation error on RHEL6.5

2013-12-01 Thread Andrew Beekhof
On Sun, Nov 24, 2013, at 06:20 PM, T.J. Yang wrote: Hi Any pointer for the following compilation errors ? [tjyang@pm1 services]$ make CC libcrmservice_la-upstart.lo cc1: warnings being treated as errors upstart.c: In function ‘upstart_job_property’: upstart.c:264: error: implicit dec

Re: [Pacemaker] Where the heck is Beekhof?

2013-12-01 Thread Andrew Beekhof
On Thu, 28 Nov 2013 12:04:01 +1100 Andrew Beekhof <and...@beekhof.net> wrote: > If you find yourself asking $subject at some point in the next couple > of months, the answer is that I'm taking leave to look after our new > son (Lawson Tiberius Beekhof) who was born on Tuesd

[Pacemaker] Where the heck is Beekhof?

2013-11-27 Thread Andrew Beekhof
If you find yourself asking $subject at some point in the next couple of months, the answer is that I'm taking leave to look after our new son (Lawson Tiberius Beekhof) who was born on Tuesday. I will be dropping in occasionally to see how things are travelling and attempt to get 1.1.11 finalis

Re: [Pacemaker] exit code crm_attibute

2013-11-21 Thread Andrew Beekhof
> Hello Andrew! >> >> I'm sorry, forgot about this thread, and now again came across the same >> problem. >> # crm_attribute --type nodes --node-uname fackename.node.org --attr-name >> notexistattibute --query > /dev/null; echo $? >>

Re: [Pacemaker] p_mysql peration monitor failed 'not installed'

2013-11-21 Thread Andrew Beekhof
On 22 Nov 2013, at 7:32 am, Miha wrote: > HI, > > what could be a reason for this error: > > notice: unpack_rsc_op: Preventing p_mysql from re-starting > on sip2: operation monitor failed 'not installed' (rc=5) the agent, or something the agent needs is not available. how did you configure p_

Re: [Pacemaker] CentOS 6.4 last update - Failed to create cluster resources with pcs command

2013-11-21 Thread Andrew Beekhof
On 22 Nov 2013, at 4:15 am, Dmitry Bron wrote: > Hi All, > > We have two fresh installed boxes with CentOS 6.4 and with last updates which > we want to configure as Active - Standby in HA cluster. > We copied all configuration files from another worked well HA cluster. We > already have anoth

[Pacemaker] Time to get ready for 1.1.11

2013-11-20 Thread Andrew Beekhof
With over 400 updates since the release of 1.1.10, it's time to start thinking about a new release. Today I have tagged release candidate 1[1]. The most notable fixes include: + attrd: Implementation of a truly atomic attrd for use with corosync 2.x + cib: Allow values to be added/updated and

Re: [Pacemaker] pacemaker update crash my config (cannot be represented in the CLI notation)

2013-11-20 Thread Andrew Beekhof
On 21 Nov 2013, at 6:08 am, Lars Marowsky-Bree wrote: > On 2013-11-20T16:43:51, Beo Banks wrote: > >> INFO: object cli-prefer-mysql cannot be represented in the CLI notation >> >> >> crm configure show | grep xml >> INFO: object cli-prefer-mysql cannot be represented in the CLI notation >> x

Re: [Pacemaker] stonith ra class missing

2013-11-19 Thread Andrew Beekhof
On 19 Nov 2013, at 4:19 pm, Michael Schwartzkopff wrote: > > > > > Andrew Beekhof schrieb: >> >> On 19 Nov 2013, at 1:23 am, Michael Schwartzkopff wrote: >> >>> Hi, >>> >>> I installed pacemaker on a RHEL 6.4 machine. Now crm

Re: [Pacemaker] Remove a "ghost" node

2013-11-18 Thread Andrew Beekhof
On 19 Nov 2013, at 3:21 am, Sean Lutner wrote: > > On Nov 17, 2013, at 7:40 PM, Andrew Beekhof wrote: > >> >> On 15 Nov 2013, at 2:28 pm, Sean Lutner wrote: >> >>>>>> Yes the varnish resources are in a group which is then cloned. >>>

Re: [Pacemaker] Finally. A REAL question.

2013-11-18 Thread Andrew Beekhof
On 19 Nov 2013, at 3:09 pm, Andrew Beekhof wrote: > > On 19 Nov 2013, at 2:50 pm, Rob Thomas wrote: > >>>>> On 19 Nov 2013, at 6:00 am, Rob Thomas wrote: >>>>>> So.

Re: [Pacemaker] Finally. A REAL question.

2013-11-18 Thread Andrew Beekhof
On 19 Nov 2013, at 2:50 pm, Rob Thomas wrote: > >>> On 19 Nov 2013, at 6:00 am, Rob Thomas wrote: > So... What's the -right- way to do it then? 8) > >>> > >>> > >>> > >>> > >>> > >>> > >>> > > ... > > >> I'll have a try with the setoptions

Re: [Pacemaker] No such device, problem with setting pacemaker

2013-11-18 Thread Andrew Beekhof
On 18 Nov 2013, at 11:59 pm, Miha wrote: > HI, > > I am for the first time setting cluster with pacemaker & corosync. > Server A and server B can ping each other, I have disabled selinux and > iptables but I can not get this going. I did step by step as is written in > tutorial. Have you conf

Re: [Pacemaker] The larger cluster is tested.

2013-11-18 Thread Andrew Beekhof
> https://drive.google.com/file/d/0BwMFJItoO-fVYl9Gbks2VlJMR0k/edit?usp=sharing > batch-limit=4 > https://drive.google.com/file/d/0BwMFJItoO-fVZnJIazd5MFQ1aGs/edit?usp=sharing > > The report at the time of making it operate by my test code is the following. > https://drive.google.c

Re: [Pacemaker] Finally. A REAL question.

2013-11-18 Thread Andrew Beekhof
On 19 Nov 2013, at 10:30 am, Rob Thomas wrote: > On Tue, Nov 19, 2013 at 8:55 AM, Andrew Beekhof wrote: >> >> On 19 Nov 2013, at 6:00 am, Rob Thomas wrote: >>> So... What's the -right- way to do it then? 8) >> >> >> >>

Re: [Pacemaker] Finally. A REAL question.

2013-11-18 Thread Andrew Beekhof
On 19 Nov 2013, at 6:00 am, Rob Thomas wrote: > On Mon, Nov 18, 2013 at 9:17 PM, Andrew Beekhof wrote: > >> my eyes! my eyes! > > So... What's the -right- way to do it then? 8) http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resourc

Re: [Pacemaker] stonith ra class missing

2013-11-18 Thread Andrew Beekhof
On 19 Nov 2013, at 1:23 am, Michael Schwartzkopff wrote: > Hi, > > I installed pacemaker on a RHEL 6.4 machine. Now crm tells me that there is > no > stonith ra class, only lsb, ocf and service. > > What did I miss? thanks for any valuable comments. did you install the fencing-agents packag

Re: [Pacemaker] Finally. A REAL question.

2013-11-18 Thread Andrew Beekhof
On 18 Nov 2013, at 3:30 pm, Rob Thomas wrote: >> I've been browsing through the cluster.log, and it's not even trying >> to move httpd. I'm almost certain that it used to work fine with >> resource sets. Hmm. > > OK. I went and -actually looked- at the CIB I was previously generating. > > Thi

Re: [Pacemaker] Finally. A REAL question.

2013-11-17 Thread Andrew Beekhof
On 18 Nov 2013, at 12:43 pm, Rob Thomas wrote: > Previously, using crm, it was reasonably painless to ensure that > resource groups ran on the same node. > > I'm having difficulties figuring out what the 'right' way to do this is with > pcs You tried: pcs constraint colocation add asteris

Re: [Pacemaker] Remove a "ghost" node

2013-11-17 Thread Andrew Beekhof
On 15 Nov 2013, at 2:28 pm, Sean Lutner wrote: Yes the varnish resources are in a group which is then cloned. >>> >>> -EDONTDOTHAT >>> >>> You can't refer to the things inside a clone. >>> 1.1.8 will have just been ignoring those constraints. >> >> So the implicit order and colocation con

Re: [Pacemaker] CentOS 6.4 and CFS.

2013-11-17 Thread Andrew Beekhof
On 16 Nov 2013, at 9:42 am, Rob Thomas wrote: >> Line 363 of /usr/lib/python2.6/site-packages/pcs/cluster.py has this: >> >>nodes = utils.getNodesFromCorosyncConf() > > Ahha. Look what I just spotted. > > https://github.com/feist/pcs/commit/8b888080c37ddea88b92dfd95aadd78b9db68b55 Are yo

Re: [Pacemaker] CentOS 6.4 and CFS.

2013-11-15 Thread Andrew Beekhof
On 15 Nov 2013, at 5:56 pm, Rob Thomas wrote: > So I'm a long time corosync fan, and I've recently come back into the > fold to change everything I've previously written to pcs, because > that's the new cool thing. > > Sadly, things seem to be a bit broken. > > Here's how things have gone toda

Re: [Pacemaker] Question about the resource to fence a node

2013-11-14 Thread Andrew Beekhof
On 14 Nov 2013, at 5:53 pm, Kazunori INOUE wrote: > Hi, Andrew > > 2013/11/13 Kazunori INOUE : >> 2013/11/13 Andrew Beekhof : >>> >>> On 16 Oct 2013, at 8:51 am, Andrew Beekhof wrote: >>> >>>> >>>> On 15/10/2013, at 8:24

Re: [Pacemaker] Remove a "ghost" node

2013-11-14 Thread Andrew Beekhof
On 15 Nov 2013, at 10:24 am, Sean Lutner wrote: > > On Nov 14, 2013, at 6:14 PM, Andrew Beekhof wrote: > >> >> On 14 Nov 2013, at 2:55 pm, Sean Lutner wrote: >> >>> >>> On Nov 13, 2013, at 10:51 PM, Andrew Beekhof wrote: >>> >

Re: [Pacemaker] why pacemaker does not control the resources

2013-11-14 Thread Andrew Beekhof
On 14 Nov 2013, at 5:06 pm, Andrey Groshev wrote: > > > 14.11.2013, 02:22, "Andrew Beekhof" : >> On 14 Nov 2013, at 6:13 am, Andrey Groshev wrote: >> >>> 13.11.2013, 03:22, "Andrew Beekhof" : >>>> On 12 Nov 2013, at 4:42

Re: [Pacemaker] Remove a "ghost" node

2013-11-14 Thread Andrew Beekhof
On 14 Nov 2013, at 2:55 pm, Sean Lutner wrote: > > On Nov 13, 2013, at 10:51 PM, Andrew Beekhof wrote: > >> >> On 14 Nov 2013, at 1:12 pm, Sean Lutner wrote: >> >>> >>> On Nov 10, 2013, at 8:03 PM, Sean Lutner wrote: >>> >>

Re: [Pacemaker] Remove a "ghost" node

2013-11-13 Thread Andrew Beekhof
On 14 Nov 2013, at 1:12 pm, Sean Lutner wrote: > > On Nov 10, 2013, at 8:03 PM, Sean Lutner wrote: > >> >> On Nov 10, 2013, at 7:54 PM, Andrew Beekhof wrote: >> >>> >>> On 11 Nov 2013, at 11:44 am, Sean Lutner wrote: >>> >>

Re: [Pacemaker] crmd Segmentation fault at pacemaker 1.0.12

2013-11-13 Thread Andrew Beekhof
On 13 Nov 2013, at 7:36 pm, TAKATSUKA Haruka wrote: > Hello, pacemaker hackers > > I report crmd's crash at pacemaker 1.0.12 . > > We are going to upgrade pacemaker 1.0.12 to 1.0.13 . > But I was not able to find a fix for this problem from ChangeLog. > tengine.c:do_te_invoke() is not seem to

Re: [Pacemaker] why pacemaker does not control the resources

2013-11-13 Thread Andrew Beekhof
On 14 Nov 2013, at 6:13 am, Andrey Groshev wrote: > > > 13.11.2013, 03:22, "Andrew Beekhof" : >> On 12 Nov 2013, at 4:42 pm, Andrey Groshev wrote: >> >>> 11.11.2013, 03:44, "Andrew Beekhof" : >>>> On 8 Nov 2013, at 7:49 am, And

Re: [Pacemaker] stonith_admin does not work as expected

2013-11-13 Thread Andrew Beekhof
> name="action" value="off"/> > id="fence_2-instance_attributes-pcmk_poweroff_action" > name="pcmk_poweroff_action" value="off"/> > name="pcmk_host_list" value="lisel2"/> >

Re: [Pacemaker] Question about the resource to fence a node

2013-11-12 Thread Andrew Beekhof
On 16 Oct 2013, at 8:51 am, Andrew Beekhof wrote: > > On 15/10/2013, at 8:24 PM, Kazunori INOUE wrote: > >> Hi, >> >> I'm using pacemaker-1.1 (the latest devel). >> I started resource (f1 and f2) which fence vm3 on vm1. >> >> $ crm_mon -1

Re: [Pacemaker] Network outage debugging

2013-11-12 Thread Andrew Beekhof
On 13 Nov 2013, at 11:49 am, Sean Lutner wrote: > > >> On Nov 12, 2013, at 7:33 PM, Andrew Beekhof wrote: >> >> >>> On 13 Nov 2013, at 11:22 am, Sean Lutner wrote: >>> >>> >>> >>>> On Nov 12, 2013, at 6:01 PM, A

Re: [Pacemaker] Network outage debugging

2013-11-12 Thread Andrew Beekhof
On 13 Nov 2013, at 11:22 am, Sean Lutner wrote: > > >> On Nov 12, 2013, at 6:01 PM, Andrew Beekhof wrote: >> >> >>> On 13 Nov 2013, at 6:10 am, Sean Lutner wrote: >>> >>> The folks testing the cluster I've been building have run

Re: [Pacemaker] asymmetric clusters, remote nodes, and monitor operations

2013-11-12 Thread Andrew Beekhof
s > p-mysql_monitor_0 on cvmh01 'not installed' (5): call=319, > status=complete, last-rc-change='Wed Sep 11 13:20:08 2013', queued=42ms, > exec=0ms > > Almost all of these are instances of resources being probed on nodes that > they shouldn't be runnin

Re: [Pacemaker] The larger cluster is tested.

2013-11-12 Thread Andrew Beekhof
11:10:32 [2387] vm13 crmd: ( throttle.c:632 ) trace: > throttle_get_total_job_limit:Using batch-limit=16 > > The above shows that it is not solved even if it restricts the whole > number of jobs by batch-limit. > Are there any other methods of reducing a synchronous

Re: [Pacemaker] why pacemaker does not control the resources

2013-11-12 Thread Andrew Beekhof
On 12 Nov 2013, at 4:42 pm, Andrey Groshev wrote: > > > 11.11.2013, 03:44, "Andrew Beekhof" : >> On 8 Nov 2013, at 7:49 am, Andrey Groshev wrote: >> >>> Hi, PPL! >>> I need help. I do not understand... Why has stopped working. >

Re: [Pacemaker] Follow up: Colocation constraint to External Managed Resource (cluster-recheck-interval="5m" ignored after 1.1.10 update?)

2013-11-12 Thread Andrew Beekhof
On 13 Nov 2013, at 12:06 am, Robert H. wrote: > Hello, > > for PaceMaker 1.1.8 (CentOS Version) the thread > http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg18048.html was > solved with adding cluster-recheck-interval="5m", causing the LRM It's the policy engine btw. Not the lrm

Re: [Pacemaker] Network outage debugging

2013-11-12 Thread Andrew Beekhof
On 13 Nov 2013, at 6:10 am, Sean Lutner wrote: > The folks testing the cluster I've been building have run a script which > blocks all traffic except SSH on one node of the cluster for 15 seconds to > mimic a network failure. During this time, the network being "down" seems to > cause some od

Re: [Pacemaker] recover cib from raw file

2013-11-12 Thread Andrew Beekhof
> use > it :-) > > Regards > > > Sean O'Reilly > > On Mon 11/11/13 10:03 PM , "Andrew Beekhof" and...@beekhof.net sent: >> >> On 11 Nov 2013, at 9:41 pm, s.oreilly < >> s.orei...@linnovations.co.uk> wrote: >>> Hi, >>

Re: [Pacemaker] DRBD promotion timeout after pacemaker stop on other node

2013-11-11 Thread Andrew Beekhof
Can you try with these two patches please? + Andrew Beekhof (4 seconds ago) fec946a: Fix: crmd: When the DC gracefully shuts down, record the new expected state into the cib (HEAD, master) + Andrew Beekhof (10 seconds ago) 740122a: Fix: crmd: When a peer expectedly shuts down, record the new

Re: [Pacemaker] DRBD promotion timeout after pacemaker stop on other node

2013-11-11 Thread Andrew Beekhof
On 12 Nov 2013, at 10:29 am, Andrew Beekhof wrote: > > On 12 Nov 2013, at 2:46 am, Vladislav Bogdanov wrote: > >> 11.11.2013 09:00, Vladislav Bogdanov wrote: >> ... >>>>>> Looking at crm-fence-peer.sh script, it would determine peer state as >>&

Re: [Pacemaker] The larger cluster is tested.

2013-11-11 Thread Andrew Beekhof
On 11 Nov 2013, at 11:48 pm, yusuke iida wrote: > Execution of the graph was also checked. > Since the number of pending(s) is restricted to 16 from the middle, it > is judged that batch-limit is effective. > Observing here, even if a job is restricted by batch-limit, two or > more jobs are alwa

Re: [Pacemaker] stonith_admin does not work as expected

2013-11-11 Thread Andrew Beekhof
Impossible to comment without knowing the pacemaker version, full config, and how fence_ifmib works (I assume it's a custom agent?) On 12 Nov 2013, at 1:21 am, andreas graeper wrote: > hi, > two nodes. > n1 (slave) fence_2:stonith:fence_ifmib > n2 (master) fence_1:stonith:fence_ifmib > > n1 was

Re: [Pacemaker] crm_mon segment fault con fedora 20

2013-11-11 Thread Andrew Beekhof
> > Thanks > > > 2013/11/11 Andrew Beekhof > > On 9 Nov 2013, at 8:56 am, emmanuel segura wrote: > > > Hello Andrew, > > > > You can the file in the attachment. > > It would be very useful to know what is NULL at: > > 1196n

Re: [Pacemaker] recover cib from raw file

2013-11-11 Thread Andrew Beekhof
On 11 Nov 2013, at 9:41 pm, s.oreilly wrote: > Hi, > > Is it possible to recover/replace cib.xml from one of the raw files in > /var/lib/pacemaker/cib? > > I would like to reset the cib to the configuration referenced in cib.last. In > the > case cib-89.raw > > I haven't been able to find a

Re: [Pacemaker] The larger cluster is tested.

2013-11-11 Thread Andrew Beekhof
ned by > gio_poll_dispatch_add(). > > I attach report which tested. > https://drive.google.com/file/d/0BwMFJItoO-fVdlIwTVdFOGRkQ0U/edit?usp=sharing > > Regards, > Yusuke > > 2013/11/8 Andrew Beekhof : >> >> On 8 Nov 2013, at 12:10 am, yusuke iida wrote:

Re: [Pacemaker] Remove a "ghost" node

2013-11-10 Thread Andrew Beekhof
On 11 Nov 2013, at 12:03 pm, Sean Lutner wrote: > > On Nov 10, 2013, at 7:54 PM, Andrew Beekhof wrote: > >> >> On 11 Nov 2013, at 11:44 am, Sean Lutner wrote: >> >>> >>> On Nov 10, 2013, at 6:27 PM, Andrew Beekhof wrote: >>> >

Re: [Pacemaker] Remove a "ghost" node

2013-11-10 Thread Andrew Beekhof
On 11 Nov 2013, at 11:44 am, Sean Lutner wrote: > > On Nov 10, 2013, at 6:27 PM, Andrew Beekhof wrote: > >> >> On 8 Nov 2013, at 12:59 pm, Sean Lutner wrote: >> >>> >>> On Nov 7, 2013, at 8:34 PM, Andrew Beekhof wrote: >>> >

Re: [Pacemaker] Monitoring on master node not running after standby is connected

2013-11-10 Thread Andrew Beekhof
On 16 Oct 2013, at 12:21 am, Juraj Fabo wrote: > Juraj Fabo writes: >> >> Hello Andrew >> >> >> thank you for the response. >> >> I've patched crmd, cleaned the cluster, done the scenario steps and > created crm_report which is attached. >> >> After loading the cluster configuration both

Re: [Pacemaker] Pacemaker-corosync update attribute issue

2013-11-10 Thread Andrew Beekhof
On 22 Oct 2013, at 3:43 am, A66A wrote: > Hello, > I have a problem with my 2-node cluster. In some reasons one of my nodes > can't update attributes due to error - warning: attrd_cib_callback: > Update PostgreSQL-status=HS:async failed: Transport endpoint is not > connected. Where can be

Re: [Pacemaker] What value should be in the $OCF_RESKEY_CRM_meta_notify_slave_uname when a quorum is lost?

2013-11-10 Thread Andrew Beekhof
On 6 Nov 2013, at 12:42 am, Andrey Groshev wrote: > Hi All! > I am interested in this subject, because happens is the following situation. > I build cluster on four nodes with postgres master/slave configuration. > Set quorum-policy=stop > Run the cluster and conducted an experiment - turned off

Re: [Pacemaker] why pacemaker does not control the resources

2013-11-10 Thread Andrew Beekhof
On 8 Nov 2013, at 7:49 am, Andrey Groshev wrote: > Hi, PPL! > I need help. I do not understand... Why has stopped working. > This configuration work on other cluster, but on corosync1. > > So... cluster postgres with master/slave. > Classic config as in wiki. > I build cluster, start, he is wor

Re: [Pacemaker] pacemaker - ClusterMon

2013-11-10 Thread Andrew Beekhof
On 18 Oct 2013, at 7:49 am, Denise Cosso wrote: > Hello, > > >I configured in Pacemaker ClusterMon but not receive email. Already tested > the email from the command line and working. > >I think it's the version of crm_mon > >Could anyone help me?? You have an MTA configured on

Re: [Pacemaker] DRBD promotion timeout after pacemaker stop on other node

2013-11-10 Thread Andrew Beekhof
On 5 Nov 2013, at 2:22 am, Vladislav Bogdanov wrote: > Hi Andrew, David, all, > > Just found interesting fact, don't know is it a bug or not. > > When doing service pacemaker stop on a node which has drbd resource > promoted, that resource does not promote on another node, and promote > operat

Re: [Pacemaker] Remove a "ghost" node

2013-11-10 Thread Andrew Beekhof
On 8 Nov 2013, at 12:59 pm, Sean Lutner wrote: > > On Nov 7, 2013, at 8:34 PM, Andrew Beekhof wrote: > >> >> On 8 Nov 2013, at 4:45 am, Sean Lutner wrote: >> >>> I have a confusing situation that I'm hoping to get help with. Last night >>>

Re: [Pacemaker] crm_mon segment fault con fedora 20

2013-11-10 Thread Andrew Beekhof
>container->id); ie. p *node p *node->details p *node->details->remote_rsc p *node->details->remote_rsc->container Also, if you have the actual XML, that may be useful. The tools can't read the crmsh syntax. > > > 2013/11/8 Andrew Beekhof > >

Re: [Pacemaker] Stonith question

2013-11-10 Thread Andrew Beekhof
On 9 Nov 2013, at 1:55 am, s.oreilly wrote: > Hi Chrissie, thanks I did try that and it didn't work, but then, neither has > adding the location constraints so maybe (and this is very possible) I am > doing > something else wrong!! Quite probably. But we can't say for sure without logs. > > S

Re: [Pacemaker] qb_ipcs_us_connection_acceptor: Could not accept client connection: Too many open files (24)

2013-11-08 Thread Andrew Beekhof
On 9 Nov 2013, at 7:19 am, Moturi Upendra wrote: > Hello Andrew, > > I have installed pacemaker on RHEL6.4 Please try the updated 6.4 packages from Red Hat (pacemaker 1.1.10 and libqb 0.16) > but now and then i get the following error and need to restart pacemaker > > qb_ipcs_us_connection

Re: [Pacemaker] crm_mon segment fault con fedora 20

2013-11-07 Thread Andrew Beekhof
On 6 Nov 2013, at 9:36 am, emmanuel segura wrote: > Hello everybody, > > On Fedora 20 i got a crm_mon segment fault with the following configuration > http://ur1.ca/fzndq maybe my configuration is wrong, but in any case this is > what i saw with gdb http://ur1.ca/fznf2 Best to include it in t

Re: [Pacemaker] Upgrade of Pacemaker on CentOS 6.4 to 1.1.10 - Delay RA missing ...

2013-11-07 Thread Andrew Beekhof
On 7 Nov 2013, at 9:30 pm, Robert H. wrote: >> This does a reasonable job of explaining: >> http://blog.clusterlabs.org/blog/2013/pacemaker-and-rhel-6-dot-4/ > > I see, thanks for the hint ... small step for man, huge step for mankind .. > (or something like this :)) > >> I would be interes

Re: [Pacemaker] Remove a "ghost" node

2013-11-07 Thread Andrew Beekhof
On 8 Nov 2013, at 4:45 am, Sean Lutner wrote: > I have a confusing situation that I'm hoping to get help with. Last night > after configuring STONITH on my two node cluster, I suddenly have a "ghost" > node in my cluster. I'm looking to understand the best way to remove this > node from the c

Re: [Pacemaker] location ping rules

2013-11-07 Thread Andrew Beekhof
On 7 Nov 2013, at 9:34 pm, s.oreilly wrote: > Having some trouble getting a location rule to work. > > Here is my current config: > > > Resources: > Master: master_drbd > Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 > notify=true > Resource: drbd_mysql (class=ocf

Re: [Pacemaker] Simple installation Pacemaker + CMAN + fence-agents

2013-11-07 Thread Andrew Beekhof
Something seems very wrong with this at the corosync level. Even fenced and the dlm are having issues. Jan: Could this be firewall related? On 27 Sep 2013, at 10:44 pm, Bartłomiej Wójcik wrote: > W dniu 2013-09-27 04:26, Andrew Beekhof pisze: >> On 26/09/2013, at 8:35 PM, Bartłomi

Re: [Pacemaker] two design questions about active/active mode?

2013-11-07 Thread Andrew Beekhof
On 21 Aug 2013, at 1:50 pm, Wen Wen (NCS) wrote: > Hi all, > I am doing dual nodes practice. > I use CentOS 6.3 x86_64 pacemaker DRBD and GFS2 for my cluster. > I already test many times I have a design question. > > Here is my crm status on one node after I set this node from standby to onli

Re: [Pacemaker] /var/lib/pacemaker/cores cleanup

2013-11-07 Thread Andrew Beekhof
On 8 Nov 2013, at 10:27 am, Andrew Beekhof wrote: > > On 7 Oct 2013, at 5:52 pm, Mailing List SVR wrote: > >> Il 07/10/2013 04:16, Andrew Beekhof ha scritto: >>> On 05/10/2013, at 7:11 AM, Mailing List SVR >>> wrote: >>> >>> >>

Re: [Pacemaker] /var/lib/pacemaker/cores cleanup

2013-11-07 Thread Andrew Beekhof
On 7 Oct 2013, at 5:52 pm, Mailing List SVR wrote: > Il 07/10/2013 04:16, Andrew Beekhof ha scritto: >> On 05/10/2013, at 7:11 AM, Mailing List SVR >> wrote: >> >> >>> Hi, >>> >>> I have a pacemaker cluster running fine since 2 months

Re: [Pacemaker] The larger cluster is tested.

2013-11-07 Thread Andrew Beekhof
Did it help at all? > > Regards, > Yusuke > > 2013/11/7 Andrew Beekhof : >> >> On 7 Nov 2013, at 12:43 pm, yusuke iida wrote: >> >>> Hi, Andrew >>> >>> 2013/11/7 Andrew Beekhof : >>>> >>>> On 6 Nov 2013, at 4:48 pm, y

Re: [Pacemaker] The larger cluster is tested.

2013-11-06 Thread Andrew Beekhof
On 7 Nov 2013, at 12:43 pm, yusuke iida wrote: > Hi, Andrew > > 2013/11/7 Andrew Beekhof : >> >> On 6 Nov 2013, at 4:48 pm, yusuke iida wrote: >> >>> Hi, Andrew >>> >>> I tested by the following version

Re: [Pacemaker] The larger cluster is tested.

2013-11-06 Thread Andrew Beekhof
://github.com/yuusuke/pacemaker/commit/17a7cbe67c455f5f6d36a1e1bc255b4ab0039dd8 > > load-threshold 80% and CPG G_PRIORITY_DEFAULT test report > https://drive.google.com/file/d/0BwMFJItoO-fVV1BoTjVQMk52WEU/edit?usp=sharing > > 2013/11/6 Andrew Beekhof : >> >> On 5 Nov 2013,

Re: [Pacemaker] nofile limit with pacemaker

2013-11-06 Thread Andrew Beekhof
On 6 Nov 2013, at 6:08 pm, Daniel Jung wrote: > Hi all, > > I came across a problem with no file limit for one of the primitives(slapd) > running in pacemaker. To verify that this is result of running in pacemaker, > i put the node in standby and start up Slapd manually, the nofile limit is

Re: [Pacemaker] Upgrade of Pacemaker on CentOS 6.4 to 1.1.10 - Delay RA missing ...

2013-11-06 Thread Andrew Beekhof
On 7 Nov 2013, at 12:15 am, Robert Heinzmann (pacemaker) wrote: > Hello, > > I just upgraded to the latest CentOS release of Pacemaker (released on > 1.11.2013) on our test environment and we get "not installed" messages > for "ocf:heartbeat:Delay". > > It seems that the ressource agent for D

Re: [Pacemaker] The larger cluster is tested.

2013-11-05 Thread Andrew Beekhof
md/32) > > Since size is large, I want you to download from the following. > https://drive.google.com/file/d/0BwMFJItoO-fVWDg1Sjc2WXltUjQ/edit?usp=sharing > > Regards, > Yusuke > > 2013/10/31 Andrew Beekhof : >> >> On 29 Oct 2013, at 12:12 am, yusuke ii

Re: [Pacemaker] Pacemaker crash on node unstandby/standby.

2013-11-03 Thread Andrew Beekhof
On 19 Oct 2013, at 12:45 pm, Justin Burnham wrote: > Hi, > > I am having an issue with Pacemaker on the cman stack where I can reliably > cause pacemaker to crash and coredump when I put the node in standby or > unstandby. Here is my messages log from when I do a unstandby like so "pcs > clu

Re: [Pacemaker] Resource only failsover in one direction

2013-11-03 Thread Andrew Beekhof
On 24 Oct 2013, at 10:45 am, Lucas Brown wrote: >> Date: Tue, 22 Oct 2013 09:32:59 +0200 >> From: emmanuel segura >> To: >> "cae7pj3av7tbgcfjya5leayqwew4sfecsrlo55wqrfjv7joo...@mail.gmail.com" >> , >> The Pacemaker cluster resource manager >> Subject: Re: [Pacemaker] Resource on
