Re: [Pacemaker] mysql ocf resource agent - resource stays unmanaged if binary unavailable

2013-05-17 Thread Andrew Beekhof
On 18/05/2013, at 6:49 AM, Andreas Kurz wrote: > On 2013-05-17 00:24, Vladimir wrote: >> Hi, >> >> our pacemaker setup provides mysql resource using ocf resource agent. >> Today I tested with my colleagues forcing mysql resource to fail. I >> don't understand the following behaviour. When I rem

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-16 Thread Andrew Beekhof
uces code duplication. But it would also be a single place to put a deployment specific patch :) > Best Regards, > Hideo Yamauchi. > > --- On Thu, 2013/5/16, Andrew Beekhof wrote: > >> >> On 16/05/2013, at 3:49 PM, Vladislav Bogdanov wrote: >> >>> 1

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-16 Thread Andrew Beekhof
On 17/05/2013, at 11:38 AM, Andrew Widdersheim wrote: > Just tried the patch you gave and it worked fine. Any plans on putting this > patch in officially or was this a one off? It will be in 1.1.10-rc3 "soon" > Aside from this patch I guess the only thing to get things to work is to > instal

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-15 Thread Andrew Beekhof
On 16/05/2013, at 3:49 PM, Vladislav Bogdanov wrote: > 16.05.2013 02:46, Andrew Beekhof wrote: >> >> On 15/05/2013, at 6:44 PM, Vladislav Bogdanov wrote: >> >>> 15.05.2013 11:18, Andrew Beekhof wrote: >>>> >>>> On 15/05/2013, at 5:31 PM

[Pacemaker] pcs/crmsh Cheat sheet

2013-05-15 Thread Andrew Beekhof
By popular request, I've taken a stab at a cheat-sheet for those switching between pcs and crmsh. https://github.com/ClusterLabs/pacemaker/blob/master/doc/pcs-crmsh-quick-ref.md Any and all assistance expanding it and ensuring it is accurate will be gratefully received. -- Andrew

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-15 Thread Andrew Beekhof
On 16/05/2013, at 3:16 PM, Andrew Widdersheim wrote: > I'll look into moving over to the cman option since that is preferred for > RHEL6.4 now if I'm not mistaken. Correct > I'll also try out the patch provided and see how that goes. So was LRMD not > a part of pacemaker previously and later

Re: [Pacemaker] Announcing Pacemaker Remote - extending high availability outside the cluster stack

2013-05-15 Thread Andrew Beekhof
We've been tossing around ideas like this for many, many years; it's very cool that you've been able to make it a reality. I'm both excited and scared to see what people do with it :) Good work! On 16/05/2013, at 4:17 AM, David Vossel wrote: > Hi, > > I'm excited to announce the initial develop

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-15 Thread Andrew Beekhof
On 16/05/2013, at 2:52 PM, Andrew Beekhof wrote: > > On 16/05/2013, at 2:03 PM, Andrew Widdersheim > wrote: > >> There are quite a few symlinks of heartbeat pieces back to pacemaker pieces >> like crmd as an example but lrmd was not one of them: >> >>

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-15 Thread Andrew Beekhof
On 16/05/2013, at 2:03 PM, Andrew Widdersheim wrote: > There are quite a few symlinks of heartbeat pieces back to pacemaker pieces > like crmd as an example but lrmd was not one of them: > > [root@node1 ~]# ls -lha /usr/lib64/heartbeat/crmd > lrwxrwxrwx 1 root root 27 May 14 17:31 /usr/lib64/h

Re: [Pacemaker] error with cib synchronisation on disk

2013-05-15 Thread Andrew Beekhof
On 15/05/2013, at 9:53 PM, Халезов Иван wrote: > Hello everyone! > > Some problems occurred while synchronising the CIB configuration to disk. > I have these errors in pacemaker's logfile: What were the messages before this? Did it happen once or many times? At startup or while the cluster was ru

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-15 Thread Andrew Beekhof
On 16/05/2013, at 10:21 AM, Andrew Widdersheim wrote: > These are the libqb versions: > > libqb-devel-0.14.2-3.el6.x86_64 > libqb-0.14.2-3.el6.x86_64 > > Here is a process listing where lrmd is running: > [root@node1 ~]# ps auxwww | egrep "heartbeat|pacemaker" > root 9553 0.1 0.7 52420

Re: [Pacemaker] failure handling on a cloned resource

2013-05-15 Thread Andrew Beekhof
bugs are fixed :) we're at rc2 now, rc3 should be today/tomorrow > I will currently use the head build I created. This is ok for my testsetup > but I don't want to run this version in production > > Greetings, > Johan Huysmans > > On 2013-05-10 06:55, Andrew Beekh

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-15 Thread Andrew Beekhof
On 15/05/2013, at 6:44 PM, Vladislav Bogdanov wrote: > 15.05.2013 11:18, Andrew Beekhof wrote: >> >> On 15/05/2013, at 5:31 PM, Vladislav Bogdanov wrote: >> >>> 15.05.2013 10:25, Andrew Beekhof wrote: >>>> >>>> On 15/05/2013, at 3:50 PM

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-15 Thread Andrew Beekhof
On 15/05/2013, at 5:31 PM, Vladislav Bogdanov wrote: > 15.05.2013 10:25, Andrew Beekhof wrote: >> >> On 15/05/2013, at 3:50 PM, Vladislav Bogdanov wrote: >> >>> 15.05.2013 08:23, Andrew Beekhof wrote: >>>> >>>> On 15/05/2013, at 3:1

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-15 Thread Andrew Beekhof
On 15/05/2013, at 3:50 PM, Vladislav Bogdanov wrote: > 15.05.2013 08:23, Andrew Beekhof wrote: >> >> On 15/05/2013, at 3:11 PM, renayama19661...@ybb.ne.jp wrote: >> >>> Hi Andrew, >>> >>> Thank you for comments. >>> >>>>

Re: [Pacemaker] pcs group colocation and ping rules

2013-05-14 Thread Andrew Beekhof
On 14/05/2013, at 12:08 AM, Diego Remolina wrote: > On 05/13/2013 10:03 AM, Diego Remolina wrote: >> Hi, >> >> I was wondering if anybody can tell me what is the best way to replicate >> the following crm commands in pcs. It seems pcs cannot do any >> collocation rules using groups (or I just

Re: [Pacemaker] Pacemaker still may include memory leaks

2013-05-14 Thread Andrew Beekhof
On 15/05/2013, at 2:18 PM, Yuichi SEINO wrote: > 2013/5/15 Andrew Beekhof : >> >> On 15/05/2013, at 12:22 PM, Yuichi SEINO wrote: >> >>> Hi, >>> >>> I ran the test for about two days. >>> >>>

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-14 Thread Andrew Beekhof
be necessary to evade this problem to use Pacemaker in > vSphere5.1 environment. > > Best Regards, > Hideo Yamauchi. > > > --- On Wed, 2013/5/15, Andrew Beekhof wrote: > >> >> On 13/05/2013, at 4:14 PM, renayama19661...@ybb.ne.jp wrote: >> >>> Hi All,

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-14 Thread Andrew Beekhof
On 13/05/2013, at 4:14 PM, renayama19661...@ybb.ne.jp wrote: > Hi All, > > We constituted a simple cluster in environment of vSphere5.1. > > We composed it of two ESXi servers and shared disk. > > The guest located it to the shared disk. What is on the shared disk? The whole OS or app-specif

Re: [Pacemaker] Pacemaker still may include memory leaks

2013-05-14 Thread Andrew Beekhof
On 15/05/2013, at 12:22 PM, Yuichi SEINO wrote: > Hi, > > I ran the test for about two days. > > Environment > > OS:RHEL 6.3 > pacemaker-1.1.9-devel (commit 138556cb0b375a490a96f35e7fbeccc576a22011) > corosync-2.3.0 > cluster-glue > latest+patch(detail:http://www.gossamer-threads.com/lists/l

Re: [Pacemaker] crmd restart due to internal error - pacemaker 1.1.8

2013-05-12 Thread Andrew Beekhof
On 10/05/2013, at 6:16 PM, pavan tc wrote: > > > > > Well, and also Pacemaker's crmd process. > > My guess... the node is overloaded which is causing the cib queries to time > > out. > > > > > > Is there a cib query timeout value that I can set? > > No. You can set the batch-limit property t

Re: [Pacemaker] ClusterMon Resource starting multiple instances of crm_mon

2013-05-12 Thread Andrew Beekhof
On 10/05/2013, at 7:35 PM, Steven Bambling wrote: >>> If this is correct what is the best practice for monitoring additional >>> resource states? >> >> Define "additional"? >> If the resource fails we'll normally recover it automatically. > An example of an additional resource would be a vip u

Re: [Pacemaker] ClusterMon Resource starting multiple instances of crm_mon

2013-05-12 Thread Andrew Beekhof
On 10/05/2013, at 8:08 PM, Steven Bambling wrote: > > On May 10, 2013, at 5:35 AM, Steven Bambling wrote: > >> >> On May 9, 2013, at 8:05 PM, Andrew Beekhof wrote: >> >>> >>> On 10/05/2013, at 12:40 AM, Steven Bambling wrote: >>>

Re: [Pacemaker] detecting resource failures after maintenance

2013-05-12 Thread Andrew Beekhof
On 11/05/2013, at 1:53 AM, Jeffrey Lewis wrote: > It seems pacemaker is not properly detecting resource failures after > maintenance. Example follows. > > Pacemaker is managing two IPaddr2 resources. Both resources are > online, and all is well. > > jlewis@qa3db22:~$ sudo crm resource show >

Re: [Pacemaker] SmartOS / illumos

2013-05-12 Thread Andrew Beekhof
tible pointer type [-Werror] > cc1: all warnings being treated as errors > gmake[2]: *** [ipc.lo] Error 1 > gmake[2]: Leaving directory `/root/pacemaker/lib/common' > make[1]: *** [all-recursive] Error 1 > make[1]: Leaving directory `/root/pacemaker/lib' > make: *** [cor

Re: [Pacemaker] resource starts but then fails right away

2013-05-12 Thread Andrew Beekhof
On 10/05/2013, at 9:23 PM, Brian J. Murrell wrote: > On 13-05-09 09:53 PM, Andrew Beekhof wrote: >> >> May 7 02:36:16 node1 crmd[16836]: info: delete_resource: Removing >> resource testfs-resource1 for 18002_crm_resource (internal) on node1 >> May 7 02:36:1

Re: [Pacemaker] failure handling on a cloned resource

2013-05-09 Thread Andrew Beekhof
Fixed! https://github.com/beekhof/pacemaker/commit/d87de1b On 10/05/2013, at 11:59 AM, Andrew Beekhof wrote: > > On 07/05/2013, at 5:15 PM, Johan Huysmans wrote: > >> Hi, >> >> I only keep a couple of pe-input files, and that pe-input-1 version was >> a

Re: [Pacemaker] crmd restart due to internal error - pacemaker 1.1.8

2013-05-09 Thread Andrew Beekhof
On 10/05/2013, at 1:44 PM, pavan tc wrote: > > > > On Fri, May 10, 2013 at 6:21 AM, Andrew Beekhof wrote: > > On 08/05/2013, at 9:16 PM, pavan tc wrote: > > > Hi Andrew, > > Thanks much for looking into this. I have some queries inline. > >

Re: [Pacemaker] failure handling on a cloned resource

2013-05-09 Thread Andrew Beekhof
'll see what I can do... > > gr. > Johan > > On 2013-05-07 04:08, Andrew Beekhof wrote: >> I have a much clearer idea of the problem you're seeing now, thank you. >> >> Could you attach /var/lib/pacemaker/pengine/pe-input-1.bz2 from CSE-1? >> >>

Re: [Pacemaker] resource starts but then fails right away

2013-05-09 Thread Andrew Beekhof
On 10/05/2013, at 12:26 AM, Brian J. Murrell wrote: > I do see the: > > May 7 02:37:32 node1 crmd[16836]:error: print_elem: Aborting transition, > action lost: [Action 5]: In-flight (id: testfs-resource1_monitor_0, loc: > node1, priority: 0) > > in the log. Is that the root cause of th

Re: [Pacemaker] Using fence_sanlock with pacemaker 1.1.8-7.el6

2013-05-09 Thread Andrew Beekhof
On 10/05/2013, at 11:18 AM, John McCabe wrote: > > > > On Thu, May 9, 2013 at 3:12 AM, Andrew Beekhof wrote: > > On 08/05/2013, at 11:52 PM, John McCabe wrote: > > > Hi, > > I've been trying, unsuccessfully, to get fence_sanlock running as a fence

Re: [Pacemaker] SmartOS / illumos

2013-05-09 Thread Andrew Beekhof
Looks like you need https://github.com/beekhof/pacemaker/commit/629aa36 I recall making that change once before but it got lost somehow. On 10/05/2013, at 10:02 AM, Dalho PARK wrote: > Hello, > I’m trying to compile pacemaker on SmartOS and having error during make. > Does anyone has already su

Re: [Pacemaker] crm_mon failed with upgrade failed message

2013-05-09 Thread Andrew Beekhof
On 10/05/2013, at 9:42 AM, Andrew Beekhof wrote: > > On 10/05/2013, at 1:11 AM, Michal Fiala wrote: > >> On 05/08/2013 01:17 AM, Andrew Beekhof wrote: >>> >>> On 07/05/2013, at 11:42 PM, Michal Fiala wrote: >>> >>>> Hallo, >>>

Re: [Pacemaker] ClusterMon Resource starting multiple instances of crm_mon

2013-05-09 Thread Andrew Beekhof
On 10/05/2013, at 12:40 AM, Steven Bambling wrote: > I'm having some issues with getting some cluster monitoring setup and > configured on a 3 node multi-state cluster. I'm using Florian's blog as an > example > http://floriancrouzat.net/2013/01/monitor-a-pacemaker-cluster-with-ocfpacemake

Re: [Pacemaker] Pacemaker timeout problem

2013-05-09 Thread Andrew Beekhof
On 10/05/2013, at 2:29 AM, Pedro Sousa wrote: > Hi, > > can anybody explain to me why this happens? > > lrmd: [1551]: WARN: perform_ra_op: the operation operation monitor[52] on > lsb::named::res_named_Sip for client 1554, its parameters: > CRM_meta_name=[monitor] CRM_meta_start_delay=[15000]

Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-05-09 Thread Andrew Beekhof
src.rpm and Source RPM : corosync-2.3.0-1.fc18.src.rpm > Rainer > > Sent: Thursday, 09 May 2013 at 04:31 > From: "Andrew Beekhof" > To: "The Pacemaker cluster resource manager" > Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7? > > On

Re: [Pacemaker] crm_mon failed with upgrade failed message

2013-05-09 Thread Andrew Beekhof
On 10/05/2013, at 1:11 AM, Michal Fiala wrote: > On 05/08/2013 01:17 AM, Andrew Beekhof wrote: >> >> On 07/05/2013, at 11:42 PM, Michal Fiala wrote: >> >>> Hallo, >>> >>> I have updated corosync/pacemaker cluster, versions see below

Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-05-08 Thread Andrew Beekhof
On 08/05/2013, at 4:53 PM, Andrew Beekhof wrote: > > On 08/05/2013, at 4:08 PM, Andrew Beekhof wrote: > >> >> On 03/05/2013, at 8:46 PM, Rainer Brestan wrote: >> >>> Now i have all the logs for some combinations. >>> >>> Corosync: 1.4

Re: [Pacemaker] Using fence_sanlock with pacemaker 1.1.8-7.el6

2013-05-08 Thread Andrew Beekhof
On 08/05/2013, at 11:52 PM, John McCabe wrote: > Hi, > I've been trying, unsuccessfully, to get fence_sanlock running as a fence > device within pacemaker 1.1.8 in Centos64. > > I've set the pcmk_host_argument="host_id" You mean the literal string "host_id" or the true value? Might be better

Re: [Pacemaker] stonithd segfault

2013-05-08 Thread Andrew Beekhof
On 08/05/2013, at 10:33 PM, Pavel wrote: > Hello everyone > > Can anyone, please assist me with the following problem. In syslog I get the > following messages: > > kernel: stonithd[2029]: segfault at 0 ip 004047ed sp 7fffe886c8c0 > error 4 in stonithd[40+17000] We need the

Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-05-07 Thread Andrew Beekhof
On 08/05/2013, at 4:08 PM, Andrew Beekhof wrote: > > On 03/05/2013, at 8:46 PM, Rainer Brestan wrote: > >> Now i have all the logs for some combinations. >> >> Corosync: 1.4.1-7 for all the tests on all nodes >> Base is always fresh installation of

Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-05-07 Thread Andrew Beekhof
some testing with. I'll let you know what I discover. > > 3.) 1.1.9-2 attaches to running 1.1.8-4 cluster > https://www.dropbox.com/s/y9o4yo8g8ahwjga/attach_1.1.9-2_to_1.1.8-4.zip > Result: join successful > > Rainer > Sent: Friday, 03 May 2013 at 01:30 >

Re: [Pacemaker] crm_mon failed with upgrade failed message

2013-05-07 Thread Andrew Beekhof
On 07/05/2013, at 11:42 PM, Michal Fiala wrote: > Hallo, > > I have updated corosync/pacemaker cluster, versions see below. Cluster > is working fine, but when I change configuration via crm configure edit, > crm_mon exits with an error message: > > "Your current configuration could only be

Re: [Pacemaker] Pacemaker core dumps

2013-05-07 Thread Andrew Beekhof
Analyste de Systèmes | Systems Analyst > Service étudiants, service de l'informatique et des communications/Student > services, computing and communications services. > 1 Nicholas Street (810) > Ottawa ON K1N 7B7 > Tél. | Tel. 613-562-5800 (2120) > > > > > --

Re: [Pacemaker] pacemaker dev lead suggesting s/w upgrade

2013-05-07 Thread Andrew Beekhof
> > From: Babu Challa > Sent: 02 May 2013 10:27 > To: Vinod Prabhu; Michael van der Westhuizen; Dinesh Arney > Cc: Karthik Ganesan; Gavin Stevens; Mandar Magikar > Subject: pacemaker dev lead suggesting s/w upgrade > > Hi All, > > I have requested Andrew Beekho

Re: [Pacemaker] Corosync 2.3 dies randomly

2013-05-06 Thread Andrew Beekhof
On 06/05/2013, at 3:27 AM, Robert Parsons wrote: > > I'm trying to build out a web farm cluster using Corosync/Pacemaker. I > started with the stock versions in Ubuntu 12.04 but did not have a lot of > success. I removed the corosync (1.x) and pacemaker packages and built > Corosync 2.3 and

Re: [Pacemaker] failure handling on a cloned resource

2013-05-06 Thread Andrew Beekhof
_tomcat [d_tomcat] > d_tomcat:0(ocf::ntc:tomcat):Started (unmanaged) FAILED > Stopped: [ d_tomcat:1 ] > > => Fixing failure: Revert system so tomcat is running without failure (in > attachment step_3.log) > > # crm resource status > Resource Group: svc

Re: [Pacemaker] Pacemaker core dumps

2013-05-05 Thread Andrew Beekhof
"monitor\" > lrmd_rsc_userdata_str=\"4:664:0:596925c4-4bfa-46e2-9295-c3f9b6bd1ef9\" > lrmd_rsc_output=\"tomcat6 (pid 3199) is running...\033[60G[\033[0;32m "... > (gdb) print input+400 > $5 = 0x1e74788 "6925c4-4bfa-46e2-9295-c3f9b6bd1ef9\" > l

Re: [Pacemaker] Pacemaker core dumps

2013-05-05 Thread Andrew Beekhof
tomcat6 (pid 3199) is running...\033[60G[\033[0;32m "... > (gdb) print input+400 > $5 = 0x1e74788 "6925c4-4bfa-46e2-9295-c3f9b6bd1ef9\" > lrmd_rsc_output=\"tomcat6 (pid 3199) is running...\033[60G[\033[0;32m OK > \033[0;39m]\r\n\"> CRM_meta_name=\"monitor\"

[Pacemaker] Release candidate: 1.1.10-rc2

2013-05-02 Thread Andrew Beekhof
For those that may have missed it, we're trialling a release candidate approach for Pacemaker releases. See the original post at: http://blog.clusterlabs.org/blog/2013/release-candidate-1-dot-1-10-rc1/ Announcing the second release candidate for Pacemaker 1.1.10 > No major changes have been

Re: [Pacemaker] failure handling on a cloned resource

2013-05-02 Thread Andrew Beekhof
On 02/05/2013, at 5:45 PM, Johan Huysmans wrote: > > On 2013-05-01 05:48, Andrew Beekhof wrote: >> On 17/04/2013, at 9:54 PM, Johan Huysmans wrote: >> >>> Hi All, >>> >>> I'm trying to set up a specific configuration in our cluster,

Re: [Pacemaker] /usr/sbin/crm command not found in pacemaker-cli 1.1.8

2013-05-02 Thread Andrew Beekhof
On 02/05/2013, at 8:28 PM, Sounak Nandi wrote: > Hi, > > I am unable to find the crm shell command in pacemaker-cli 1.1.8 which is > required for running crm configure show command or similar commands like crm > configure load or replace command. Please help asap!!! > The replacement for crm fou

Re: [Pacemaker] monitor domain controller

2013-05-02 Thread Andrew Beekhof
On 02/05/2013, at 6:39 PM, Michael Schwartzkopff wrote: > On Thursday, 2 May 2013, 08:32:19, James Harper wrote: > > Currently I am using a ping resource to ensure that other Windows VMs don't > > start up until the domain controllers are started. This helps prevent > > things like Exchange

Re: [Pacemaker] Making Virtual Domain utilization dynamic

2013-05-02 Thread Andrew Beekhof
On 02/05/2013, at 5:25 PM, Michael Schwartzkopff wrote: > On Thursday, 2 May 2013, 17:08:25, Andrew Beekhof wrote: > > On 02/05/2013, at 5:03 PM, Michael Schwartzkopff > > wrote: > > > On Thursday, 2 May 2013, 16:56:19, Andrew Beekhof wrote: > > > >

Re: [Pacemaker] Pacemaker core dumps

2013-05-02 Thread Andrew Beekhof
0004055cc in main (argc=1, argv=0x7fffe77a4f88) at main.c:120 > > > Xavier Lashmar > Analyste de Systèmes | Systems Analyst > Service étudiants, service de l'informatique et des communications/Student > services, computing and communications services. > 1 Nicholas Stre

Re: [Pacemaker] Pcmk migration logic and Libvirt migration behavior

2013-05-02 Thread Andrew Beekhof
On 03/05/2013, at 12:32 AM, Andreas Hofmeister wrote: > On 05/01/2013 10:49 PM, David Vossel wrote: > >> >> Have you tested this with 1.1? There have been changes to how migration >> works, some of which have to do with properly handling partial migrations. >> I'd test this in 1.1.10.rc1 or

Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-05-02 Thread Andrew Beekhof
On 03/05/2013, at 4:46 AM, Rainer Brestan wrote: > Hi Lars, > i have tried 1.1.9-2 from download area at clusterlabs for RHEL6 with > corosync 1.4.1-17, also running with 1.1.7-6 at the other node. > I have to go deeper in details later on (with logs), but the first try was > worse than 1.1.8-

Re: [Pacemaker] Making Virtual Domain utilization dynamic

2013-05-02 Thread Andrew Beekhof
On 02/05/2013, at 5:03 PM, Michael Schwartzkopff wrote: > On Thursday, 2 May 2013, 16:56:19, Andrew Beekhof wrote: > > On 02/05/2013, at 4:43 PM, Michael Schwartzkopff > > wrote: > > > On Friday, 26 April 2013, 22:14:59, Michael Schwartzkopff wrote: > >

Re: [Pacemaker] Making Virtual Domain utilization dynamic

2013-05-02 Thread Andrew Beekhof
On 02/05/2013, at 4:43 PM, Michael Schwartzkopff wrote: > On Friday, 26 April 2013, 22:14:59, Michael Schwartzkopff wrote: > > Hi, > > > > I picked up my old idea to make resource utilizations dynamic and wrote the > > little patch for VirtualDomain to check CPU utilization every time the > >

Re: [Pacemaker] Clone Resources Individual Configuration per Node

2013-05-01 Thread Andrew Beekhof
On 10/04/2013, at 8:03 AM, Andrew Beekhof wrote: > > On 10/04/2013, at 4:33 AM, Felix Zachlod wrote: > >>> It seems no rule has been selected here and it has fallen back to the >> default. >>> It looks similar on the other node- but some times it seems that

Re: [Pacemaker] Pacemaker core dumps

2013-05-01 Thread Andrew Beekhof
uld be greatly appreciated. The libtool parts aren't so interesting. Were there no other frames? (lines starting with # and a number) > > Xavier Lashmar > Analyste de Systèmes | Systems Analyst > Service étudiants, service de l'informatique et des communications/Student >

Re: [Pacemaker] Pacemaker/Corosync on Ubuntu 12.04

2013-05-01 Thread Andrew Beekhof
On 02/05/2013, at 8:21 AM, Andrew Martin wrote: > - Original Message - >> From: "Robert Parsons" >> To: "The Pacemaker cluster resource manager" >> Sent: Wednesday, May 1, 2013 4:03:46 PM >> Subject: [Pacemaker] Pacemaker/Corosync on Ubuntu 12.04 >> >> >> >> We're wanting to build a

Re: [Pacemaker] corosync restarts service when slave node joins the cluster

2013-05-01 Thread Andrew Beekhof
eone else appreciates > it or not. "Excellence" is a drive from inside, not outside. Excellence is > not for someone else to notice but for your own satisfaction and efficiency... > > > -Original Message- > From: Andrew Beekhof [mailto:and...@beekhof.net]

Re: [Pacemaker] Two node KVM cluster

2013-04-30 Thread Andrew Beekhof
ould someone help me, please? > > Thanks, > Oriol > > On 16/04/13 06:10, Andrew Beekhof wrote: >> >> On 10/04/2013, at 3:20 PM, Oriol Mula-Valls wrote: >> >>> On 10/04/13 02:10, Andrew Beekhof wrote: >>>> >>>> On 09/04/2013, at 7:31

Re: [Pacemaker] failure handling on a cloned resource

2013-04-30 Thread Andrew Beekhof
On 17/04/2013, at 9:54 PM, Johan Huysmans wrote: > Hi All, > > I'm trying to set up a specific configuration in our cluster, however I'm > struggling with my configuration. > > This is what I'm trying to achieve: > On both nodes of the cluster a daemon must be running (tomcat). > Some failover

Re: [Pacemaker] Pacemaker configuration with different dependencies

2013-04-30 Thread Andrew Beekhof
On 17/04/2013, at 6:15 PM, Ivor Prebeg wrote: > Hi Andreas, thank you for your answer. > > Maybe my description was a little fuzzy, sorry for that. > > What I want is following: > > * if l3_ping fails on a particular node, all services should go to standby on > that node (which probably work

Re: [Pacemaker] Kernel WARN unpack_status in syslog

2013-04-30 Thread Andrew Beekhof
On 20/04/2013, at 3:07 AM, Ivor Prebeg wrote: > Guys, > > I can't get rid of following warnings: > > Apr 19 19:00:37 node2 crmd: [32230]: WARN: start_subsystem: Client pengine > already running as pid 32240 > Apr 19 19:00:44 node2 pengine: [32240]: WARN: unpack_status: Node node1 in > status

Re: [Pacemaker] Failed actions message

2013-04-30 Thread Andrew Beekhof
On 20/04/2013, at 9:12 PM, Tommy Cooper wrote: > Hi, > > I have a fully functional 2 node cluster of Asterisk servers. When one of the > servers fails the message below shows up in crm_mon and stays there even if I > fix the problem. I know that the message is supposed to show up. Is there a

Re: [Pacemaker] warning: unpack_rsc_op: Processing failed op monitor for my_resource on node1: unknown error (1)

2013-04-30 Thread Andrew Beekhof
On 01/05/2013, at 2:51 AM, Brian J. Murrell wrote: > Using 1.1.8 on EL6.4, I am seeing this sort of thing: > > pengine[1590]: warning: unpack_rsc_op: Processing failed op monitor for > my_resource on node1: unknown error (1) > > The full log from the point of adding the resource until the er

Re: [Pacemaker] will a stonith resource be moved from an AWOL node?

2013-04-30 Thread Andrew Beekhof
On 01/05/2013, at 1:28 AM, Brian J. Murrell wrote: > On 13-04-30 11:13 AM, Lars Marowsky-Bree wrote: >> >> Pacemaker 1.1.8's stonith/fencing subsystem directly ties into the CIB, >> and will complete the fencing request even if the fencing/stonith >> resource is not instantiated on the node yet

Re: [Pacemaker] Behavior when crm_mon is a daemon

2013-04-30 Thread Andrew Beekhof
On 19/04/2013, at 11:05 AM, Yuichi SEINO wrote: > HI, > > 2013/4/16 Andrew Beekhof : >> >> On 15/04/2013, at 7:42 PM, Yuichi SEINO wrote: >> >>> Hi All, >>> >>> I look at the daemon of tools to make a new daemon. So, I have a questi

Re: [Pacemaker] lrm monitor failure status lost during DC election

2013-04-30 Thread Andrew Beekhof
On 19/04/2013, at 6:36 AM, David Adair wrote: > Hello. > > I have an issue with pacemaker 1.1.6.1 but believe this may still be > present in the > latest git versions and would like to know if the fix makes sense. > > > What I see is the following: > Setup: > - 2 node cluster > - ocf:heartbea

Re: [Pacemaker] cannot register service of pacemaker_remote

2013-04-30 Thread Andrew Beekhof
Done. Thanks! On 30/04/2013, at 3:34 PM, nozawat wrote: > Hi > > Because there was a typo in pacemaker.spec.in, I was not able to register > service of pacemaker_remote. > > - > diff --git a/pacemaker.spec.in b/pacemaker.spec.in > index 10296a5..1e1fd6d 100644

Re: [Pacemaker] corosync restarts service when slave node joins the cluster

2013-04-30 Thread Andrew Beekhof
Please ask questions on the mailing lists. On 01/05/2013, at 12:30 AM, Babu Challa wrote: > Hi Andrew, > > Greetings, > > We are using corosync/pacemaker for high availability > > This is a 4 node HA cluster where each pair of nodes are configured for DB > and file system replication > W

Re: [Pacemaker] Two node KVM cluster

2013-04-29 Thread Andrew Beekhof
On 17/04/2013, at 4:02 PM, Oriol Mula-Valls wrote: > On 16/04/13 06:10, Andrew Beekhof wrote: >> >> On 10/04/2013, at 3:20 PM, Oriol Mula-Valls wrote: >> >>> On 10/04/13 02:10, Andrew Beekhof wrote: >>>> >>>> On 09/04/2013, at 7:31

Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-04-29 Thread Andrew Beekhof
On 24/04/2013, at 7:44 PM, Rainer Brestan wrote: > Pacemaker log of int2node2 with trace setting. > https://www.dropbox.com/s/04ciy2g6dfbauxy/pacemaker.log?n=165978094 > On int2node1 (1.1.7) the trace setting did not create the pacemaker.log file. > Ah, yes, 1.1.7 wasn't so smart yet. Can you

Re: [Pacemaker] Pacemaker installed to custom location

2013-04-29 Thread Andrew Beekhof
On 26/04/2013, at 9:12 PM, James Masson wrote: > > > On 26/04/13 01:29, Andrew Beekhof wrote: >> >> On 26/04/2013, at 12:12 AM, James Masson wrote: >> >>> >>> Hi list, >>> >>> I'm trying to build and ru

Re: [Pacemaker] Bind virtual IP resource to service resource running state

2013-04-29 Thread Andrew Beekhof
On 26/04/2013, at 7:34 PM, Forum Registrant wrote: > Hi! > > I have 2 node cluster. On each node I have mysql, nginx and php-fpm. Each > node has its own virtual IP. I need this virtual ip to migrate to other > node if one of services (mysql/nginx/php-fpm) is down/stopped. How can I do > i

Re: [Pacemaker] HA KVM over DRBD primary/secondary configuration

2013-04-29 Thread Andrew Beekhof
On 19/04/2013, at 5:44 PM, Rasto Levrinc wrote: > On Fri, Apr 19, 2013 at 9:11 AM, Alexandr A. Alexandrov > wrote: >> Hi Rasto, >> >> Note that on RHEL 6/CentOS 6, you should run the Pacemaker through CMAN and >> not a Corosync plugin > > I wonder if that's still true, Yes. More so since t

Re: [Pacemaker] crm_attribute not returning node attribute

2013-04-29 Thread Andrew Beekhof
On 20/04/2013, at 3:39 AM, Brian J. Murrell wrote: > Given: > > host1# crm node attribute host1 show foo > scope=nodes name=foo value=bar > > Why doesn't this return anything: > > host1# crm_attribute --node host1 --name foo --query This is looking up transient attributes. You need to add "

Re: [Pacemaker] Shared redundant machine

2013-04-29 Thread Andrew Beekhof
On 26/04/2013, at 7:05 PM, Grant Bagdasarian wrote: > Hello, > > Let’s say I have three physical servers available to configure in a redundant > way. I’m going to use two of the three servers to connect each to a different > network. The one that is left over will be the fallback server for

Re: [Pacemaker] How to display interface link status in corosync

2013-04-29 Thread Andrew Beekhof
On 18/04/2013, at 2:42 PM, Yuichi SEINO wrote: > Hi, > > 2013/4/15 Andrew Beekhof : >> >> On 15/04/2013, at 3:38 PM, Yuichi SEINO wrote: >> >>> Hi, >>> >>> 2013/4/8 Andrew Beekhof : >>>> I'm not 100% sure what the bes

Re: [Pacemaker] resource-stickness-issue

2013-04-29 Thread Andrew Beekhof
On 16/04/2013, at 6:37 PM, ravindra.raut...@wipro.com wrote: > Hi All, > I have created cluster with these versions in fedora 17. > pacemaker-1.1.7-2.fc17.x86_64 > corosync-2.0.0-1.fc17.x86_64 > > Everything is working fine for me except resource stickiness. > > Any idea on this ?

Re: [Pacemaker] Routing-Ressources on a 2-Node-Cluster

2013-04-29 Thread Andrew Beekhof
On 23/04/2013, at 6:05 PM, T. wrote: > Hi Devin, > > thank you very much for your answer. > >> If you insist on trying to do this with just the Linux-HA cluster, >> I don't have any suggestions as to how you should proceed. > I know that the "construct" we are building is quite complicated. >

Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

2013-04-29 Thread Andrew Beekhof
On 18/04/2013, at 5:58 PM, T. wrote: > Hi, > >> Seems appropriate :) >> >> >> http://blog.clusterlabs.org/blog/2009/configuring-heartbeat-v1-was-so-simple/ > "…because it couldn’t do anything." > > That might be true, but this was everything I needed the last years ... You didn't need to k

Re: [Pacemaker] cman + corosync + pacemaker + fence_scsi

2013-04-29 Thread Andrew Beekhof
On 26/04/2013, at 4:25 PM, Angel L. Mateo wrote: > On 26/04/13 02:01, Andrew Beekhof wrote: >> >> On 24/04/2013, at 10:48 PM, Angel L. Mateo wrote: >> >>> Hello, >>> >>> I'm trying to configure a 2 node cluster in ubuntu with

Re: [Pacemaker] Action "unknown exec error" and unmanaged/failed resources, how to migrate?

2013-04-29 Thread Andrew Beekhof
On 29/04/2013, at 2:48 PM, Mark Williams wrote: > Hi all, > > My two node cluster (qemu VMs, drbd) is now in quite a messy state. > > The problem started with an unresponsive qemu VM, which appeared to be > caused by a libvirtd problem/bug. > Others said the solution was to kill & restart libve

Re: [Pacemaker] Pacemaker core dumps

2013-04-29 Thread Andrew Beekhof
Lashmar > Analyste de Systèmes | Systems Analyst > Service étudiants, service de l'informatique et des communications/Student > services, computing and communications services. > 1 Nicholas Street (810) > Ottawa ON K1N 7B7 > Tél. | Tel. 613-562-5800 (2120) > > >

Re: [Pacemaker] why so long to stonith?

2013-04-25 Thread Andrew Beekhof
On 26/04/2013, at 10:24 AM, David Coulson wrote: > > On 4/25/13 7:43 PM, Andrew Beekhof wrote: >> I certainly hope so :) > So I should complain to our sales people about this BZ before we upgrade our > clusters to 6.4? Actually, I'm going to back-track on this. After f

Re: [Pacemaker] Pacemaker installed to custom location

2013-04-25 Thread Andrew Beekhof
On 26/04/2013, at 12:12 AM, James Masson wrote: > > Hi list, > > I'm trying to build and run pacemaker from a custom location. > > > # > # cluster-glue > tar xf pacemaker/cluster-glue-1.0.11+.tar.gz > ( > cd Reusable-Cluster-Components-glue--8347e8c9b94f > ./autogen.sh > .

Re: [Pacemaker] why so long to stonith?

2013-04-25 Thread Andrew Beekhof
On 26/04/2013, at 10:24 AM, David Coulson wrote: > > On 4/25/13 7:43 PM, Andrew Beekhof wrote: >> I certainly hope so :) > So I should complain to our sales people about this BZ before we upgrade our > clusters to 6.4? I don't think it would hurt to demonstrate how m

Re: [Pacemaker] Pacemaker core dumps

2013-04-25 Thread Andrew Beekhof
On 26/04/2013, at 10:06 AM, Andrew Beekhof wrote: > > On 25/04/2013, at 11:59 PM, Xavier Lashmar wrote: > >> Following further investigation, we were able to determine that upgrading >> both nodes (in a two node cluster) from Pacemaker 1.1.7-6 to Pacemaker >> 1.

Re: [Pacemaker] Pacemaker core dumps

2013-04-25 Thread Andrew Beekhof
On 25/04/2013, at 11:59 PM, Xavier Lashmar wrote: > Following further investigation, we were able to determine that upgrading > both nodes (in a two node cluster) from Pacemaker 1.1.7-6 to Pacemaker > 1.1.8-7 (CentOS 6.3 or CentOS 6.4) caused these errors to begin happening: Would you be able

Re: [Pacemaker] cman + corosync + pacemaker + fence_scsi

2013-04-25 Thread Andrew Beekhof
On 24/04/2013, at 10:48 PM, Angel L. Mateo wrote: > Hello, > > I'm trying to configure a 2 node cluster in ubuntu with cman + corosync > + pacemaker (the use of cman is because it is recommended at pacemaker > quickstart). In order to solve the split brain in the 2 node cluster I'm > u

Re: [Pacemaker] cman + corosync + pacemaker + fence_scsi

2013-04-25 Thread Andrew Beekhof
On 24/04/2013, at 11:44 PM, Andreas Mock wrote: > Hi Angel, > > two hints from my side. As you're working with ubuntu > ask in this list which setup is or will be the best > concerning corosync + pacemaker. I'm pretty sure > (but I really don't know) that you'll get the advice > to drop cman.

Re: [Pacemaker] best setup for corosync + pacemaker in ubuntu 12.04

2013-04-25 Thread Andrew Beekhof
On 25/04/2013, at 4:03 PM, Angel L. Mateo wrote: > Hello everybody, > > As suggested by Andreas Mock in a previous thread... what is the best > setup for corosync and pacemaker in a VM running ubuntu 12.04? > > In pacemaker's quickstart > (http://clusterlabs.org/quickstart-ubuntu.

Re: [Pacemaker] why so long to stonith?

2013-04-25 Thread Andrew Beekhof
On 25/04/2013, at 5:22 AM, Brian J. Murrell wrote: > On 13-04-24 01:16 AM, Andrew Beekhof wrote: >> >> Almost certainly you are hitting: >> >>https://bugzilla.redhat.com/show_bug.cgi?id=951340 > > Yup. The patch posted there fixed it. > >>

Re: [Pacemaker] Web farm question

2013-04-25 Thread Andrew Beekhof
On 25/04/2013, at 12:49 AM, Robert Parsons wrote: > > We are building a new web farm to replace our 7 year old system. The old > system used ipvs/ldirectord/heartbeat to implement redundant load balancers. > All web server nodes were physical boxes. > > The proposed new system will utilize a

Re: [Pacemaker] clear failcount when monitor is successful?

2013-04-23 Thread Andrew Beekhof
On 23/04/2013, at 11:24 PM, Johan Huysmans wrote: > Hi All, > > I have a cloned resource, running on both my nodes, my on-fail is set to > block. > So if the resource fails on a node the failcount increases, but whenever the > resource automatically recovers the failcount isn't reset. > > Is

Re: [Pacemaker] why so long to stonith?

2013-04-23 Thread Andrew Beekhof
On 24/04/2013, at 5:34 AM, Brian J. Murrell wrote: > Using pacemaker 1.1.8 on RHEL 6.4, I did a test where I just killed > (-KILL) corosync on a peer node. Pacemaker seemed to take a long time > to transition to stonithing it though after noticing it was AWOL: [snip] > As you can see, 3 minut
