[Pacemaker] Various Problems with Pacemaker and/or related Software

2014-02-20 Thread Stephan Buchner
Hello everyone, we are having some problems with pacemaker and/or related software. I hope you can help shed some light on the issues we are facing. Our setup consists of 2 nodes and 5 services running on these nodes. Here comes the first problem; I will show you the output of crm_mon -1:

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-02-20 Thread Andrew Beekhof
On 20 Feb 2014, at 6:06 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, Andrew. I tested in the following environment: 16 KVM virtual machines (CPU: 1, memory: 2048MB), OS: RHEL6.4, Pacemaker-1.1.11 (709b36b), corosync-2.3.2, libqb-0.16.0. It looks like performance is much better on the

[Pacemaker] Migrating resources on custom conditions

2014-02-20 Thread Dan Markhasin
Hi, I am wondering if it is possible to configure complex/custom migration rules, so that resources would migrate when there is a problem with the node they are currently running on, e.g. if the node has a bad disk or high load, the resource should be migrated to a different node. I didn't find any

Re: [Pacemaker] possible regex error in pcs resource enable/disable

2014-02-20 Thread Lars Marowsky-Bree
On 2014-02-19T14:39:30, Bob Haxo bh...@sgi.com wrote: Chris, was easy to duplicate ... I thought that I had cleared the error, but that had not happened. Bob Haxo [root@mici-admin ~]# pcs resource disable virt [root@mici-admin ~]# pcs resource disable libvirtd-clone Error: Error

[Pacemaker] Question about log level at monitor

2014-02-20 Thread Kazunori INOUE
Hi, Is it by design that log levels differ between a stonith resource and other resources in Pacemaker-1.1.11? P1 is the id of an ocf:pacemaker:Dummy resource. F1 is the id of a stonith (e.g. stonith:external/ipmi) resource. * log at probe crmd[22860]: notice: process_lrm_event: LRM operation

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-02-20 Thread yusuke iida
Hi, Andrew 2014-02-20 17:28 GMT+09:00 Andrew Beekhof and...@beekhof.net: Who was pid 16243? Doesn't look like a pacemaker daemon. pid 16243 is crm_mon. On vm01, crm_mon was started and the state was checked. If any other information is needed for the analysis, I will collect it. Regards, Yusuke

Re: [Pacemaker] hangs pending

2014-02-20 Thread Andrew Beekhof
On 20 Feb 2014, at 5:33 pm, Andrey Groshev gre...@yandex.ru wrote: 20.02.2014, 01:22, Andrew Beekhof and...@beekhof.net: On 20 Feb 2014, at 4:18 am, Andrey Groshev gre...@yandex.ru wrote: 19.02.2014, 06:47, Andrew Beekhof and...@beekhof.net: On 18 Feb 2014, at 9:29 pm, Andrey Groshev

Re: [Pacemaker] Question about log level at monitor

2014-02-20 Thread Andrew Beekhof
On 20 Feb 2014, at 8:37 pm, Kazunori INOUE kazunori.ino...@gmail.com wrote: Hi, Is it by design that log levels differ between a stonith resource and other resources in Pacemaker-1.1.11? P1 is the id of an ocf:pacemaker:Dummy resource. F1 is the id of a stonith (e.g. stonith:external/ipmi)

Re: [Pacemaker] possible regex error in pcs resource enable/disable

2014-02-20 Thread Andrew Beekhof
On 20 Feb 2014, at 8:03 pm, Lars Marowsky-Bree l...@suse.com wrote: On 2014-02-19T14:39:30, Bob Haxo bh...@sgi.com wrote: Chris, was easy to duplicate ... I thought that I had cleared the error, but that had not happened. Bob Haxo [root@mici-admin ~]# pcs resource disable virt

Re: [Pacemaker] node1 fencing itself after node2 being fenced

2014-02-20 Thread Asgaroth
I would really love to see logs at this point. Both from pacemaker and the system in general (and clvmd if it produces any). Based on what you say below, there doesn't seem to be a good reason for the hang (i.e. no reason to be trying to fence anyone) I will try to get some logs to you

Re: [Pacemaker] hangs pending

2014-02-20 Thread Andrey Groshev
20.02.2014, 13:57, Andrew Beekhof and...@beekhof.net: On 20 Feb 2014, at 5:33 pm, Andrey Groshev gre...@yandex.ru wrote: 20.02.2014, 01:22, Andrew Beekhof and...@beekhof.net: On 20 Feb 2014, at 4:18 am, Andrey Groshev gre...@yandex.ru wrote: 19.02.2014, 06:47, Andrew Beekhof

Re: [Pacemaker] Load balancing with CLUSTERIP

2014-02-20 Thread Jefferson Carlos Machado
Hi, Everything works fine on the node, but not on the other PC. Please see the ping tests and configs below. [root@europanorte vs_files]# ping -c 3 10.10.2.97 PING 10.10.2.97 (10.10.2.97) 56(84) bytes of data. 64 bytes from 10.10.2.97: icmp_seq=1 ttl=64 time=5.44 ms 64 bytes from 10.10.2.97: icmp_seq=2 ttl=64

[Pacemaker] pacemaker/corosync on CentOS 6.4/6.5 node offline after update

2014-02-20 Thread fatcharly
Hi, I'm using a pacemaker/corosync 2-node cluster on CentOS 6.4 to provide a load-balancer service via pound. After an update with yum, the updated node is no longer able to work in the cluster. Here is the crm_mon output and some facts about the updated node powerpound: Linux powerpound

Re: [Pacemaker] possible regex error in pcs resource enable/disable

2014-02-20 Thread Bob Haxo
Andrew, Lars, Yes, I have determined that this error is the result of mixing crmsh stop/start with pcs disable/enable (or maybe pcs stop/start misuse) commands. I've started to respond with how this happens, but have been pulled off to a higher-priority task. I'll be back when I have that resolved.

Re: [Pacemaker] possible regex error in pcs resource enable/disable

2014-02-20 Thread Andrew Beekhof
On 21 Feb 2014, at 4:30 am, Bob Haxo bh...@sgi.com wrote: Andrew, Lars, Yes, I have determined that this error is the result of mixing crmsh stop/start with pcs disable/enable (or maybe pcs stop/start mis-usage) commands. Specifically it will be when you use pcs first and crmsh

Re: [Pacemaker] [PATCH] update Clusters-From-Scratch to latest pcs syntax

2014-02-20 Thread Andrew Beekhof
thanks! On 21 Feb 2014, at 3:01 am, Christine Caulfield ccaul...@redhat.com wrote: Ap-Configuration.txt |4 ++-- Ch-Active-Active.txt | 10 +- Ch-Active-Passive.txt |4 ++-- Ch-Apache.txt |4 ++-- Ch-Installation.txt |2 +- Ch-Shared-Storage.txt |

Re: [Pacemaker] pacemaker/corosync on CentOS 6.4/6.5 node offline after update

2014-02-20 Thread Andrew Beekhof
On 21 Feb 2014, at 3:11 am, fatcha...@gmx.de wrote: Hi, I'm using a pacemaker/corosync 2-node cluster on CentOS 6.4 to provide a load-balancer service via pound. After an update with yum, the updated node is no longer able to work in the cluster. Here is the crm_mon output and some facts

Re: [Pacemaker] Various Problems with Pacemaker and/or related Software

2014-02-20 Thread Andrew Beekhof
On 20 Feb 2014, at 7:08 pm, Stephan Buchner buch...@linux-systeme.de wrote: Hello everyone, we are having some problems with pacemaker and/or related software. I hope you can help shed some light on the issues we are facing. Our setup consists of 2 nodes and 5 services running on these

Re: [Pacemaker] possible regex error in pcs resource enable/disable

2014-02-20 Thread Bob Haxo
Andrew, From my vantage point, it will be most unfortunate if pcs and crm cannot both be used for routine start and stop of resources, and other routine tasks. I am using pcs (almost exclusively) for my RHEL6.5 Pacemaker port. But, I am **REALLY** hoping that I can present a relatively

Re: [Pacemaker] possible regex error in pcs resource enable/disable

2014-02-20 Thread Andrew Beekhof
On 21 Feb 2014, at 10:25 am, Bob Haxo bh...@sgi.com wrote: Andrew, From my vantage point, it will be most unfortunate if pcs and crm cannot both be used for routine start and stop of resources, and other routine tasks. Agreed. AFAICS, it's crmsh that's creating the duplicate entries

Re: [Pacemaker] hangs pending

2014-02-20 Thread Andrew Beekhof
On 20 Feb 2014, at 10:04 pm, Andrey Groshev gre...@yandex.ru wrote: 20.02.2014, 13:57, Andrew Beekhof and...@beekhof.net: On 20 Feb 2014, at 5:33 pm, Andrey Groshev gre...@yandex.ru wrote: 20.02.2014, 01:22, Andrew Beekhof and...@beekhof.net: On 20 Feb 2014, at 4:18 am, Andrey

Re: [Pacemaker] Migrating resources on custom conditions

2014-02-20 Thread Andrew Beekhof
On 20 Feb 2014, at 7:34 pm, Dan Markhasin minimi...@gmail.com wrote: Hi, I am wondering if it is possible to configure complex/custom migration rules, so that resources would migrate when there is a problem with the node they are currently running on, e.g. if the node has a bad disk, or

Re: [Pacemaker] possible regex error in pcs resource enable/disable

2014-02-20 Thread Bob Haxo
-20140220-0855.xml [root@mici-admin ~]# diff -u cib-config-initial-enabled-20140220-0854.xml cib-config-pcs-disable-20140220-0855.xml --- cib-config-initial-enabled-20140220-0854.xml 2014-02-20 10:54:18.0 -0600 +++ cib-config-pcs-disable-20140220-0855.xml 2014-02-20 10:54

Re: [Pacemaker] hangs pending

2014-02-20 Thread Andrew Beekhof
On 19 Feb 2014, at 7:53 pm, Andrey Groshev gre...@yandex.ru wrote: 19.02.2014, 09:49, Andrew Beekhof and...@beekhof.net: On 19 Feb 2014, at 4:18 pm, Andrey Groshev gre...@yandex.ru wrote: 19.02.2014, 09:08, Andrew Beekhof and...@beekhof.net: On 19 Feb 2014, at 4:00 pm, Andrey Groshev

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-02-20 Thread Andrew Beekhof
On 20 Feb 2014, at 8:39 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, Andrew 2014-02-20 17:28 GMT+09:00 Andrew Beekhof and...@beekhof.net: Who was pid 16243? Doesn't look like a pacemaker daemon. pid 16243 is crm_mon. That means that the state displayed by crm_mon was 500 updates

[Pacemaker] pcs cluster status options seems to not work

2014-02-20 Thread Bob Haxo
Per section 3.5 of the Configuring the RH HA Add-on with Pacemaker document, these should return different information. They do not. What I really need is something that gives this output, so that I can quickly script a check of whether a resource is running ... [root@mici-admin ~]# crm resource show

Re: [Pacemaker] pcs cluster status options seems to not work

2014-02-20 Thread Andrew Beekhof
On 21 Feb 2014, at 2:49 pm, Bob Haxo bh...@sgi.com wrote: Per section 3.5 of the Configuring the RH HA Add-on with Pacemaker document, these should return different information. They do not. What I really need is something that gives this output, so that I can quickly script a check of whether a

Re: [Pacemaker] hangs pending

2014-02-20 Thread Andrew Beekhof
btw. What's with all these entries: Feb 19 10:49:27 [1641] dev-cluster2-node2.unix.tensor.ru pacemakerd: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/root Feb 19 10:49:27 [1641] dev-cluster2-node2.unix.tensor.ru pacemakerd: info: crm_xml_cleanup:

Re: [Pacemaker] hangs pending

2014-02-20 Thread Andrey Groshev
21.02.2014, 10:18, Andrew Beekhof and...@beekhof.net: btw. What's with all these entries: Feb 19 10:49:27 [1641] dev-cluster2-node2.unix.tensor.ru pacemakerd: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/root Feb 19 10:49:27 [1641]

Re: [Pacemaker] pcs cluster status options seems to not work

2014-02-20 Thread Chris Feist
[root@mici-admin ~]# crm resource show libvirtd-clone resource libvirtd-clone is running on: mici-admin-ptp resource libvirtd-clone is running on: mici-admin2-ptp ... but does not use crmsh. [root@mici-admin ~]# pcs cluster status Can you try 'pcs status'? Does that give you better
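For the scripted "is this resource running?" check Bob asks about in this thread, a minimal sketch follows. It assumes the status text uses the same `resource NAME is running on: NODE` line format as the `crm resource show libvirtd-clone` output quoted in this message (other tools or versions may print something different), and it leaves out how that text is captured. The helper names `running_nodes` and `is_running` are hypothetical.

```python
import re
from typing import List

# Assumed line format, taken from the crmsh output quoted in this thread:
#   "resource libvirtd-clone is running on: mici-admin-ptp"
_RUNNING_RE = re.compile(r"^resource (\S+) is running on: (\S+)\s*$")


def running_nodes(output: str, resource: str) -> List[str]:
    """Return every node on which `resource` is reported as running."""
    nodes = []
    for line in output.splitlines():
        match = _RUNNING_RE.match(line.strip())
        # Only collect lines that name the resource we were asked about.
        if match and match.group(1) == resource:
            nodes.append(match.group(2))
    return nodes


def is_running(output: str, resource: str) -> bool:
    """True if at least one node reports `resource` as running."""
    return bool(running_nodes(output, resource))
```

Matching on the full line rather than a substring keeps a resource such as `virt` from being confused with `libvirtd-clone`. For a clone, the function returns one node per running instance.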

Re: [Pacemaker] Migrating resources on custom conditions

2014-02-20 Thread Dan Markhasin
Good idea, thanks. :-) On Fri, Feb 21, 2014 at 1:58 AM, Andrew Beekhof and...@beekhof.net wrote: On 20 Feb 2014, at 7:34 pm, Dan Markhasin minimi...@gmail.com wrote: Hi, I am wondering if it is possible to configure complex/custom migration rules, so resources would migrate in case

Re: [Pacemaker] possible regex error in pcs resource enable/disable

2014-02-20 Thread Lars Marowsky-Bree
On 2014-02-20T16:03:36, Bob Haxo bh...@sgi.com wrote: Sooo, seems that we need to kick this over the fence to the crmsh folks ... or the SUSE folks, if they are maintaining crmsh. Yes, please file a bug. This shouldn't happen (I thought we had exorcized this class of bugs.)