Re: [Pacemaker] Pacemaker Digest, Vol 66, Issue 58

2013-05-16 Thread Wolfgang Routschka
Hi Andreas, thanks for your answer, crm_simulate -s -L (node2 is offline - r_postfix is running on node1) native_color: r_haproxy allocation score on node1: -INFINITY native_color: r_haproxy allocation score on node2: -INFINITY crm_simulate -s -L (both nodes are online - r_postfix is running on

Re: [Pacemaker] Stonith: How to avoid deathmatch cluster partitioning

2013-05-16 Thread Lars Marowsky-Bree
On 2013-05-15T22:55:43, Andreas Kurz wrote: > start-delay is an option of the monitor operation ... in fact means > "don't trust that start was successful, wait for the initial monitor > some more time" It can be used on start here though to avoid exactly this situation; and it works fine for t

Re: [Pacemaker] Stonith: How to avoid deathmatch cluster partitioning

2013-05-16 Thread Klaus Darilion
Hi Andreas! On 15.05.2013 22:55, Andreas Kurz wrote: On 2013-05-15 15:34, Klaus Darilion wrote: On 15.05.2013 14:51, Digimer wrote: On 05/15/2013 08:37 AM, Klaus Darilion wrote: primitive st-pace1 stonith:external/xen0 \ params hostlist="pace1" dom0="xentest1" \ op start s

Re: [Pacemaker] error with cib synchronisation on disk

2013-05-16 Thread Халезов Иван
On 16.05.2013 07:14, Andrew Beekhof wrote: On 15/05/2013, at 9:53 PM, Халезов Иван wrote: Hello everyone! Some problems occurred while synchronising the CIB configuration to disk. I have these errors in pacemaker's logfile: What were the messages before this? Did it happen once or many times? At

Re: [Pacemaker] pacemaker colocation after one node is down

2013-05-16 Thread Wolfgang Routschka
Hi Andreas, thank you for your answer. The solution is a single colocation with a negative score: colocation cl_g_ip-address_not_on_r_postfix -1: g_ip-address r_postfix Greetings Wolfgang On 2013-05-15 21:30, Wolfgang Routschka wrote: > Hi everybody, > > one question today about a colocation rule on a 2-node clu

[Pacemaker] stonith-ng: error: remote_op_done: Operation reboot of node2 by node1 for stonith_admin: Timer expired

2013-05-16 Thread Brian J. Murrell
Using Pacemaker 1.1.8 on EL6.4 with the pacemaker plugin, I'm finding strange behavior with "stonith_admin -B node2". It seems to shut the node down but not start it back up, and ends up reporting that a timer expired: # stonith_admin -B node2 Command failed: Timer expired The pacemaker log for the op

Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2013-05-16 Thread Andrew Widdersheim
The cluster has 3 connections total. The first connection is the outside interface where services can communicate and is also used for cluster communication using mcast. The second interface is a cross-over that is solely for cluster communication. The third connection is another cross-over sole

Re: [Pacemaker] pacemaker colocation after one node is down

2013-05-16 Thread Andreas Kurz
On 2013-05-16 13:42, Wolfgang Routschka wrote: > Hi Andreas, > > thank you for your answer. > > The solution is a single colocation with a negative score Ah, yes, only _one_ of them with a non-negative value is needed. Scores of all constraints are added up. Regards, Andreas > > colocation cl_g_ip-address_n
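To illustrate the remark that the scores of all constraints are added up, here is a minimal Python sketch of score addition. It assumes the infinity rules given in the Pacemaker documentation (INFINITY is 1,000,000, any value plus INFINITY is INFINITY, and -INFINITY wins over everything). The function names and the example scores are hypothetical and are not taken from the thread or from Pacemaker's source.

```python
INFINITY = 1_000_000  # scores at or beyond this magnitude are treated as infinite


def add_scores(a: int, b: int) -> int:
    """Add two scores using the documented infinity rules."""
    # -INFINITY wins over everything, including +INFINITY.
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    # Finite sums are clamped to the infinite range.
    return max(-INFINITY, min(INFINITY, a + b))


def total_score(scores) -> int:
    """Fold a list of constraint scores into one allocation score."""
    result = 0
    for score in scores:
        result = add_scores(result, score)
    return result


# Hypothetical example: a -1 colocation combined with a +100 preference
# still leaves a usable (finite) score, while a single -INFINITY bans the node.
print(total_score([-1, 100]))
print(total_score([100, -INFINITY]))
```

A finite negative score such as -1 only expresses a preference, which is why it does not make a resource unplaceable the way a -INFINITY constraint does.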

Re: [Pacemaker] crm subshell 1.2.4 incompatible to pacemaker 1.1.9?

2013-05-16 Thread Rainer Brestan
The bug is in the function is_normal_node. This function checks the attribute "type" for state "normal". But this attribute is not used any more. CIB output from Pacemaker 1.1.8 CIB output from Pacemaker 1.1.7
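The kind of check described above can be sketched in Python as follows. This is not the crm shell's actual code: the function body and the two sample `<node>` elements are assumptions made for illustration, since the CIB output from the original message is not reproduced here. The sketch treats a missing "type" attribute as "normal" so that nodes from both CIB styles are accepted.

```python
import xml.etree.ElementTree as ET


def is_normal_node(node: ET.Element) -> bool:
    """Hypothetical check: a node without a "type" attribute counts as normal."""
    return node.get("type", "normal") == "normal"


# Assumed sample elements, one with the attribute and one without it.
with_type = ET.fromstring('<node id="1" uname="node1" type="normal"/>')
without_type = ET.fromstring('<node id="1" uname="node1"/>')

print(is_normal_node(with_type), is_normal_node(without_type))
```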

[Pacemaker] having problem with crm cib shadow

2013-05-16 Thread George Gibat
crm(live)cib# use gfs2 ERROR: gfs2: no such shadow CIB crm(live)cib# new gfs2 A shadow instance 'gfs2' already exists. To prevent accidental destruction of the cluster, the --force flag is required in order to proceed. crm(live)cib# list crm(live)cib# use gfs2 ERROR: gfs2: no such shadow CIB crm(

Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2013-05-16 Thread Andrew Martin
Andrew, I'd recommend adding more than one host to your p_ping resource and see if that improves the situation. When I had this problem, I observed better behavior after adding more than one IP to the list of hosts and changing the p_ping location constraint to be as follows: location loc_run_o

Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2013-05-16 Thread Andrew Widdersheim
Thanks for the help. Adding another node to the ping host_list may help in some situations, but the root issue doesn't really get solved. Also, the location constraint you posted is very different from mine. Your constraint requires connectivity, whereas the one I am trying to use looks for best
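The difference between a constraint that requires connectivity and one that prefers the best-connected node can be sketched in Python. The sketch assumes the two rule styles described in the Pacemaker documentation: a -INFINITY rule for nodes where the ping attribute is undefined or not positive, and a rule that uses the attribute value itself as the score. The attribute name `pingd`, the function names, and the node values are illustrative assumptions, not the constraints from this thread.

```python
from typing import Optional

INFINITY = 1_000_000


def score_require_connectivity(pingd: Optional[int]) -> int:
    """Ban the node (-INFINITY) when it has no connectivity, else no preference."""
    if pingd is None or pingd <= 0:
        return -INFINITY
    return 0


def score_prefer_best_connectivity(pingd: Optional[int]) -> int:
    """Use the ping attribute value as the score when it is defined."""
    return pingd if pingd is not None else 0


# Hypothetical attribute values: node2 has lost one of two ping targets.
nodes = {"node1": 2000, "node2": 1000}
preferred = max(nodes, key=lambda name: score_prefer_best_connectivity(nodes[name]))
print(preferred)
```

Under the first style, both nodes in this example stay eligible and nothing needs to move. Under the second, any change in the attribute value changes the relative scores, so how it interacts with resource stickiness decides whether resources move.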

Re: [Pacemaker] having problem with crm cib shadow

2013-05-16 Thread John McCabe
Which Linux distribution and version of pacemaker are you using? /John On Thursday, 16 May 2013, George Gibat wrote: > crm(live)cib# use gfs2 > ERROR: gfs2: no such shadow CIB > crm(live)cib# new gfs2 > A shadow instance 'gfs2' already exists. > To prevent accidental destruction of the cluster,

Re: [Pacemaker] having problem with crm cib shadow

2013-05-16 Thread George G. Gibat
centos 6.4, pacemaker 1.1.8-7.el6 On 2013-05-16 18:57, John McCabe wrote: > Which Linux distribution and version of pacemaker are you using? /John > > On Thursday, 16 May 2013, George Gibat wrote: > > crm(live)cib# use gfs2 ERROR: gfs2: no such sha

[Pacemaker] question about interface failover

2013-05-16 Thread christopher barry
Greetings, I've set up a new 2-node MySQL cluster using * drbd 8.3.1.3 * corosync 1.4.2 * pacemaker 117 on Debian Wheezy nodes. Failover seems to be working fine for everything except the IPs manually configured on the interfaces. See config here: http://pastebin.aquilenet.fr/?9eb51f6fb7d65fda#/Y

Re: [Pacemaker] having problem with crm cib shadow

2013-05-16 Thread John McCabe
Worth trying crm_shadow as described here - http://www.gossamer-threads.com/lists/linuxha/pacemaker/84969 I had the same problem and took it as a sign that I should just move to pcs (from the RHEL repo, not the latest source), which went pretty smoothly, only had a few problems with assigning para

[Pacemaker] pacemaker-remote tls handshaking

2013-05-16 Thread Lindsay Todd
I've built pacemaker 1.1.10rc2 and am trying to get the pacemaker-remote features working on my Scientific Linux 6.4 system. It almost works... The /etc/pacemaker/authkey file is on all the cluster nodes, as well as my test VM (readable to all users, and checksums are the same everywhere). I can

[Pacemaker] mysql ocf resource agent - resource stays unmanaged if binary unavailable

2013-05-16 Thread Vladimir
Hi, our pacemaker setup provides a mysql resource using the OCF resource agent. Today my colleagues and I tested forcing the mysql resource to fail. I don't understand the following behaviour. When I remove the mysqld_safe binary (whose path is specified in the crm config) from one server and move the mysql

Re: [Pacemaker] pacemaker-remote tls handshaking

2013-05-16 Thread David Vossel
- Original Message - > From: "Lindsay Todd" > To: "The Pacemaker cluster resource manager" > Sent: Thursday, May 16, 2013 3:44:09 PM > Subject: [Pacemaker] pacemaker-remote tls handshaking > > I've built pacemaker 1.1.10rc2 and am trying to get the pacemaker-remote > features working on

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-16 Thread renayama19661014
Hi Andrew, Hi Vladislav, I will test whether this fix is effective for this problem. * https://github.com/beekhof/pacemaker/commit/eb6264bf2db395779e65dadf1c626e050a388c59 Best Regards, Hideo Yamauchi. --- On Thu, 2013/5/16, Andrew Beekhof wrote: > > On 16/05/2013, at 3:49 PM, Vladislav Bo

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-16 Thread Andrew Widdersheim
Just tried the patch you gave and it worked fine. Any plans on putting this patch in officially, or was this a one-off? Aside from this patch, I guess the only thing needed to get things to work is to install things slightly differently and add a symlink from cluster-glue's lrmd to pacemaker's. > Subj

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-16 Thread Andrew Beekhof
On 17/05/2013, at 11:38 AM, Andrew Widdersheim wrote: > Just tried the patch you gave and it worked fine. Any plans on putting this > patch in officially or was this a one off? It will be in 1.1.10-rc3 "soon" > Aside from this patch I guess the only thing to get things to work is to > instal

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-16 Thread Andrew Widdersheim
I'm attaching 3 patches I made fairly quickly to fix the installation issues and also an issue I noticed with the ping OCF agent from the latest pacemaker. One is for cluster-glue to prevent lrmd from building and later installing. May also want to modify this patch to take lrmd out of both spec file

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-16 Thread Andrew Beekhof
On 17/05/2013, at 10:27 AM, renayama19661...@ybb.ne.jp wrote: > Hi Andrew, > Hi Vladislav, > > I try whether this correction is effective for this problem. > * > https://github.com/beekhof/pacemaker/commit/eb6264bf2db395779e65dadf1c626e050a388c59 > Doubtful, it just reduces code duplication.

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-16 Thread Vladislav Bogdanov
Hi Hideo-san, You may try the following patch (with trick below) From 2c4418d11c491658e33c149f63e6a2f2316ef310 Mon Sep 17 00:00:00 2001 From: Vladislav Bogdanov Date: Fri, 17 May 2013 05:58:34 + Subject: [PATCH] Feature: PE: Unlink pengine output files before writing. This should help guys