Re: [Pacemaker] behavior when do fail start resource

2014-04-03 Thread Andrew Beekhof
On 26 Mar 2014, at 4:57 pm, Andrey Groshev wrote: > Hi, ALL! > Some time ago, I saw somewhere a description of behavior: > "When at the start of the resource fails, the fail-count is set to 100 > and the resource is no longer starts, > even if it is established "start on-fail = restart" >

Re: [Pacemaker] Enabling pacemaker debug logging while running

2014-04-03 Thread Andrew Beekhof
the blackbox route (which gets you even more detail). > > Or is the "only" option the blackbox feature? > > Best regards > Andreas Mock > > > > -----Original Message----- > From: Andrew Beekhof [mailto:and...@beekhof.net] > Sent: Monday, 24. M

Re: [Pacemaker] Enabling pacemaker debug logging while running

2014-04-03 Thread Andrew Beekhof
On 24 Mar 2014, at 10:07 pm, emmanuel segura wrote: > but it will be implemented? no plans to > > > 2014-03-24 2:22 GMT+01:00 Andrew Beekhof : > > On 24 Mar 2014, at 11:04 am, emmanuel segura wrote: > > > how can i turn off the debug without reboot t

Re: [Pacemaker] unix socket corosync.ipc not created by corosync 2.3.3 for pacemakerd 1.1

2014-04-03 Thread Andrew Beekhof
You're running a version of pacemaker that was built against a different version of corosync (1.x) than you're running (2.x). On 2 Apr 2014, at 8:14 pm, Stefan Bauer wrote: > It seems that pacemakerd 1.1 is trying to connect to unix socket > corosync.ipc but that is not provided by corosync 2.3

Re: [Pacemaker] heartbeat keep alive latency questions

2014-04-03 Thread Andrew Beekhof
On 3 Apr 2014, at 4:41 pm, Lev G wrote: > Hi, > Can you please advise what the default latency of heartbeat keep alive > packets is? For Corosync or Heartbeat? And I'd have thought the latency would be a function of your network infrastructure, not the software. > Another question, whether t

[Pacemaker] Fwd: crmsh 2.0 released, and moving to Github

2014-04-03 Thread Andrew Beekhof
FYI for those that prefer crmsh Begin forwarded message: > From: Kristoffer Grönlund > Subject: [Linux-HA] crmsh 2.0 released, and moving to Github > Date: 4 April 2014 3:03:33 am AEDT > To: Linux-HA > Cc: Dejan Muhamedagic , linux-ha-dev > > Reply-To: General Linux-HA mailing list > > Hell

Re: [Pacemaker] Errors while compiling

2014-03-25 Thread Andrew Beekhof
variable ‘retries’ [-Werror=unused-variable] > cc1: all warnings being treated as errors > make[1]: *** [corosync.o] Fehler 1 > make[1]: Leaving directory `/opt/srccluster/pacemaker-Pacemaker-1.1.11/mcp' > make: *** [core] Fehler 1 > > Any ideas? Perhaps re-run ./configure so t

Re: [Pacemaker] Enabling pacemaker debug logging while running

2014-03-23 Thread Andrew Beekhof
On 24 Mar 2014, at 11:04 am, emmanuel segura wrote: > how can i turn off the debug without reboot the pacemaker? You can't. > > > 2014-03-24 0:36 GMT+01:00 Andrew Beekhof : > > On 20 Mar 2014, at 11:24 pm, Andreas Mock wrote: > > > Hi all, > > > > t

Re: [Pacemaker] Enabling pacemaker debug logging while running

2014-03-23 Thread Andrew Beekhof
On 20 Mar 2014, at 11:24 pm, Andreas Mock wrote: > Hi all, > > today I faced a problem which I couldn't solve reading > several man pages and other found hint on the web. > > I have a clone of RHEL 6.5, cman based cluster and > pacemaker 1.1.10+. I was able to change the value > debug="on" in

Re: [Pacemaker] crmd internal error during failover

2014-03-23 Thread Andrew Beekhof
On 21 Mar 2014, at 3:57 am, Drapeau, Mathieu wrote: > Hello, > With pacemaker 1.1.8-7 from EL6, crmd died unexpectedly, generating these logs > during a failover: Please update to 1.1.10 from the EL6 update channels: http://blog.clusterlabs.org/blog/2014/potential-for-data-corruption-in-pacema

Re: [Pacemaker] crm resource doesn´t move after hardware crash

2014-03-23 Thread Andrew Beekhof
On 21 Mar 2014, at 11:11 pm, Beo Banks wrote: > yep, and that's my issue. > > stonith is very powerful but how can the cluster handle hardware failure? By connecting to the switch that supplies power to said hardware. Exactly the reason devices like fence_virsh and external/ssh are not consider

Re: [Pacemaker] Stonithd segfaulting and causing unclean?

2014-03-20 Thread Andrew Beekhof
On 21 Mar 2014, at 12:40 am, Michał Margula wrote: > Hello, > > We had many unresolved issues some time ago with Pacemaker. I think > almost all of them got solved by fixing link between clusters (removed > media converters, replaced them with NIC with SFP+, upgraded to 10Gbps). > > Now it see

Re: [Pacemaker] hangs pending

2014-03-19 Thread Andrew Beekhof
On 19 Mar 2014, at 4:00 pm, Andrey Groshev wrote: > > > 19.03.2014, 03:29, "Andrew Beekhof" : >> On 19 Mar 2014, at 6:19 am, Andrey Groshev wrote: >> >>> 12.03.2014, 02:53, "Andrew Beekhof" : >>>> Sorry for the delay, som

Re: [Pacemaker] Remote node not responding error

2014-03-19 Thread Andrew Beekhof
On 20 Mar 2014, at 12:03 am, ESWAR RAO wrote: > Hi All, > > I have a 3 node setup where heartbeat+pacemaker runs on all the 3 machines. > While adding resources , I am observing below errors, but all resources are > configured properly. > > Call cib_replace failed (-41): Remote node did not r

Re: [Pacemaker] This node is within the non-primary component error

2014-03-19 Thread Andrew Beekhof
On 20 Mar 2014, at 4:35 am, K Mehta wrote: > What is meant by the following error ? > > This node is within the non-primary component and will NOT provide any > services. > > > > Are all resources expected to go in unmanaged state after this message is > seen ? No, it means you lost quoru

Re: [Pacemaker] Node in pending state, resources duplicated and data corruption

2014-03-18 Thread Andrew Beekhof
On 19 Mar 2014, at 10:15 am, Andrew Beekhof wrote: > > On 18 Mar 2014, at 10:04 pm, Gabriel Gomiz > wrote: > >> Maybe, this is significant : 'Our DC node >> (gandalf.san01.cooperativaobrera.coop) left the cluster' ... ? > > Very. I hadn

Re: [Pacemaker] hangs pending

2014-03-18 Thread Andrew Beekhof
On 19 Mar 2014, at 6:19 am, Andrey Groshev wrote: > > > 12.03.2014, 02:53, "Andrew Beekhof" : >> Sorry for the delay, sometimes it takes a while to rebuild the necessary >> context > > I'm sorry too for the answer delay. > I switched to

Re: [Pacemaker] help building 2 node config

2014-03-18 Thread Andrew Beekhof
On 18 Mar 2014, at 3:17 pm, Alex Samad - Yieldbroker wrote: > > >> -Original Message----- >> From: Andrew Beekhof [mailto:and...@beekhof.net] >> Sent: Tuesday, 18 March 2014 2:02 PM >> To: The Pacemaker cluster resource manager >> Subject: Re: [Pa

Re: [Pacemaker] Node in pending state, resources duplicated and data corruption

2014-03-18 Thread Andrew Beekhof
On 18 Mar 2014, at 10:04 pm, Gabriel Gomiz wrote: > Maybe, this is significant : 'Our DC node > (gandalf.san01.cooperativaobrera.coop) left the cluster' ... ? Very. I hadn't noticed it was the DC at the time it died. > > Please tell me if you need more details: Can I get the file logs from

Re: [Pacemaker] Pacemaker/corosync freeze

2014-03-18 Thread Andrew Beekhof
On 18 Mar 2014, at 6:03 pm, Attila Megyeri wrote: > Hello, > >> -----Original Message----- >> From: Andrew Beekhof [mailto:and...@beekhof.net] >> Sent: Tuesday, March 18, 2014 2:43 AM >> To: Attila Megyeri >> Cc: The Pacemaker cluster resource manager

Re: [Pacemaker] pacemaker RHEL6 with cman

2014-03-18 Thread Andrew Beekhof
On 19 Mar 2014, at 1:18 am, Leon Fauster wrote: > On 18.03.2014 at 00:02, Andrew Beekhof wrote: >> >> On 17 Mar 2014, at 11:26 pm, Gianluca Cecchi >> wrote: >> >>> On Mon, Mar 17, 2014 at 12:19 AM, Alex Samad - Yieldbroker wrote: >>>>

Re: [Pacemaker] Don't want to stop lsb resource on migration

2014-03-18 Thread Andrew Beekhof
On 19 Mar 2014, at 6:56 am, Bingham wrote: > > My problem is that I need to have rabbitmq running on both node1 and node2. > I also need the IP to fail over if rabbitmq were to fail on the current node. > > The 2 rabbitmq services are communicating with each other. > Data is pushed to the cli

Re: [Pacemaker] help building 2 node config

2014-03-17 Thread Andrew Beekhof
On 18 Mar 2014, at 1:36 pm, Alex Samad - Yieldbroker wrote: > Hi > > > >> -Original Message----- >> From: Andrew Beekhof [mailto:and...@beekhof.net] >> Sent: Tuesday, 18 March 2014 11:51 AM >> To: The Pacemaker cluster resource manager >> Sub

Re: [Pacemaker] Pacemaker/corosync freeze

2014-03-17 Thread Andrew Beekhof
On 13 Mar 2014, at 11:44 pm, Attila Megyeri wrote: > Hello, > >> -Original Message- >> From: Jan Friesse [mailto:jfrie...@redhat.com] >> Sent: Thursday, March 13, 2014 10:03 AM >> To: The Pacemaker cluster resource manager >> Subject: Re: [Pacemaker] Pacemaker/corosync freeze >> >> ...

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

2014-03-17 Thread Andrew Beekhof
On 12 Mar 2014, at 1:45 pm, Yusuke Iida wrote: > Hi, Andrew > 2014-03-12 6:37 GMT+09:00 Andrew Beekhof : >>> Mar 07 13:24:14 [2528] vm01 crmd: (te_callbacks:493 ) error: >>> te_update_diff: Ingoring create operation for /cib 0xf91c10, >>> configurati

Re: [Pacemaker] help building 2 node config

2014-03-17 Thread Andrew Beekhof
On 13 Mar 2014, at 4:13 pm, Alex Samad - Yieldbroker wrote: > Well I think I have worked it out > > > # Create ybrp ip address > pcs resource create ybrpip ocf:heartbeat:IPaddr2 params ip=10.172.214.50 > cidr_netmask=24 nic=eth0 clusterip_hash=sourceip-sourceport \ >op start interval=

Re: [Pacemaker] Don't want to stop lsb resource on migration

2014-03-17 Thread Andrew Beekhof
On 14 Mar 2014, at 1:00 am, Bingham wrote: > Hello, > > My setup: > I have a 2 node cluster using pacemaker and heartbeat. I have 2 > resources, ocf::heartbeat:IPaddr and lsb:rabbitmq-server. > I have these 2 resources grouped together and they will fail over to > the

Re: [Pacemaker] Node in pending state, resources duplicated and data corruption

2014-03-17 Thread Andrew Beekhof
On 14 Mar 2014, at 10:32 pm, Gabriel Gomiz wrote: > Hi to all! > > We've a 4 node cluster and recently experienced a weird issue with Pacemaker > that resulted in three > database instance resources duplicated (running simultaneously in 2 nodes) > and subsequent data > corruption. > > I've

Re: [Pacemaker] constraints not working

2014-03-17 Thread Andrew Beekhof
On 14 Mar 2014, at 2:23 pm, Alex Samad - Yieldbroker wrote: > I have setup constraints to have orig and clone resources on separate boxes, > but after rebooting devrp1 all the resources are staying on devrp2 Can you attach the result of 'cibadmin -Ql' when the cluster is in this state please

Re: [Pacemaker] Question

2014-03-17 Thread Andrew Beekhof
On 14 Mar 2014, at 7:11 am, Digimer wrote: > On 13/03/14 04:08 PM, Andreas Sinn wrote: >> stonith-enabled="false" \ >> no-quorum-policy="ignore" > > I can't speak to your main question, but this is a split-brain waiting to > happen. Please configure and test stonith. Agree. A

Re: [Pacemaker] fencing question

2014-03-17 Thread Andrew Beekhof
On 14 Mar 2014, at 1:18 am, Karl Rößmann wrote: > Hi, > > I changed the running resource by > crm / configure / edit / commit. It seemed to work. > > I stopped the resource, and changed some details, > Whenever I commit again I get this warning: > warning: do_log: FSA: Input I_ELECTION_DC from

Re: [Pacemaker] process/service watcher

2014-03-17 Thread Andrew Beekhof
On 14 Mar 2014, at 7:01 am, David Vossel wrote: > - Original Message - >> From: "Yair Ogen (yaogen)" >> To: "The Pacemaker cluster resource manager" >> Sent: Thursday, March 13, 2014 9:22:44 AM >> Subject: Re: [Pacemaker] process/service watcher >> >> >> >> Thanks Frank, so you conf

Re: [Pacemaker] How to delay first monitor op upon resource start?

2014-03-17 Thread Andrew Beekhof
On 14 Mar 2014, at 7:14 am, David Vossel wrote: > - Original Message - >> From: "Gianluca Cecchi" >> To: "The Pacemaker cluster resource manager" >> Sent: Thursday, March 13, 2014 12:00:16 PM >> Subject: [Pacemaker] How to delay first monitor op upon resource start? >> >> Hello, >> I

Re: [Pacemaker] pacemaker RHEL6 with cman

2014-03-17 Thread Andrew Beekhof
On 18 Mar 2014, at 9:46 am, Alex Samad - Yieldbroker wrote: > Oh > > Well sort of it was to align with RHEL6.x, I guess I will have to re eval > when it comes time to move to RHEL7. If you're using pcs, then you should hardly notice any difference. > > Alex > >> -Original Message-

Re: [Pacemaker] pacemaker RHEL6 with cman

2014-03-17 Thread Andrew Beekhof
On 17 Mar 2014, at 11:26 pm, Gianluca Cecchi wrote: > On Mon, Mar 17, 2014 at 12:19 AM, Alex Samad - Yieldbroker wrote: >> Hi >> >> >> >> I am in the process of migrating away from the pcmk plugin for corosync and >> converting to cman. >> >> So from what I gather its >> >> Pacemaker -> cm

Re: [Pacemaker] Pacemaker and ldirectord with centos 6.5

2014-03-17 Thread Andrew Beekhof
I don't believe RHEL (or its clones) includes ldirectord; I think piranha is the equivalent. Ryan? On 18 Mar 2014, at 4:04 am, Luc Paulin wrote: > Hi, > Has anyone set up a CentOS 6.5 LVS with ldirectord and pacemaker? Looks like I > can't find the ldirectord package > > [root@fwpci-01 ~]# yum

Re: [Pacemaker] Errors while compiling

2014-03-16 Thread Andrew Beekhof
It's looking for cmap_handle_t which will be in one of the corosync headers. What version of corosync have you got installed? On 15 Mar 2014, at 12:18 am, Stephan Buchner wrote: > Hm, i installed "libcrmcluster1-dev" and "libcrmcommon2-dev" on my debian > system, still the same error :/ > > Am

Re: [Pacemaker] help migrating over cluster config from pacemaker plugin into corosync to pcs

2014-03-12 Thread Andrew Beekhof
On 13 Mar 2014, at 11:56 am, Alex Samad - Yieldbroker wrote: > Hi > > So this is what I used to do to setup my cluster > crm configure property stonith-enabled=false > crm configure property no-quorum-policy=ignore > crm configure rsc_defaults resource-stickiness=100 > crm configure primitive

Re: [Pacemaker] missing init scripts for corosync and pacemaker

2014-03-12 Thread Andrew Beekhof
On 13 Mar 2014, at 9:29 am, Jay G. Scott wrote: > > OS = RHEL 6 > > because my machines are behind a firewall, i can't install > via yum. i had to bring down the rpms and install them. > here are the rpms i installed. yeah, it bothers me that > they say fc20 but that's what i got when i used

Re: [Pacemaker] pacemaker with cman and dbrd when primary node panics or poweroff

2014-03-11 Thread Andrew Beekhof
On 12 Mar 2014, at 10:56 am, Gianluca Cecchi wrote: > On Wed, Mar 12, 2014 at 12:37 AM, Andrew Beekhof wrote: > >> It was put in when drbd called: >> >> fence-peer "/usr/lib/drbd/crm-fence-peer.sh"; >> >> When and why it called that is

Re: [Pacemaker] pacemaker with cman and dbrd when primary node panics or poweroff

2014-03-11 Thread Andrew Beekhof
On 12 Mar 2014, at 10:32 am, Gianluca Cecchi wrote: > On Tue, Mar 11, 2014 at 11:52 PM, Andrew Beekhof wrote: >> >> On 8 Mar 2014, at 11:31 am, Gianluca Cecchi >> wrote: >> >>> I provoke power off of ovirteng01. Fencing agent works ok on >>>

Re: [Pacemaker] pacemaker with cman and dbrd when primary node panics or poweroff

2014-03-11 Thread Andrew Beekhof
On 8 Mar 2014, at 11:31 am, Gianluca Cecchi wrote: > I provoke power off of ovirteng01. Fencing agent works ok on > ovirteng02 and reboots it. > I stop boot of ovirteng01 at grub prompt to simulate problem in boot > (for example system put in console mode due to filesystem problem) > In the mean

Re: [Pacemaker] hangs pending

2014-03-11 Thread Andrew Beekhof
Sorry for the delay, sometimes it takes a while to rebuild the necessary context On 5 Mar 2014, at 4:42 pm, Andrey Groshev wrote: > > > 05.03.2014, 04:04, "Andrew Beekhof" : >> On 25 Feb 2014, at 8:30 pm, Andrey Groshev wrote: >> >>> 21.02.2014, 12

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

2014-03-11 Thread Andrew Beekhof
On 12 Mar 2014, at 8:40 am, Andrew Beekhof wrote: > > On 11 Mar 2014, at 6:23 pm, Vladislav Bogdanov wrote: > >> 07.03.2014 10:30, Vladislav Bogdanov wrote: >>> 07.03.2014 05:43, Andrew Beekhof wrote: >>>> >>>> On 6 Mar 20

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

2014-03-11 Thread Andrew Beekhof
On 11 Mar 2014, at 6:23 pm, Vladislav Bogdanov wrote: > 07.03.2014 10:30, Vladislav Bogdanov wrote: >> 07.03.2014 05:43, Andrew Beekhof wrote: >>> >>> On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov wrote: >>> >>>> 18.02.2014 03:49, Andrew Bee

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

2014-03-11 Thread Andrew Beekhof
On 11 Mar 2014, at 6:51 pm, Yusuke Iida wrote: > Hi, Andrew > > 2014-03-11 14:21 GMT+09:00 Andrew Beekhof : >> >> On 11 Mar 2014, at 4:14 pm, Andrew Beekhof wrote: >> >> [snip] >> >>> If I do this however: >>> >>> # cp star

Re: [Pacemaker] Pacemaker/corosync freeze

2014-03-11 Thread Andrew Beekhof
On 12 Mar 2014, at 1:54 am, Attila Megyeri wrote: >> >> -Original Message----- >> From: Andrew Beekhof [mailto:and...@beekhof.net] >> Sent: Tuesday, March 11, 2014 12:48 AM >> To: The Pacemaker cluster resource manager >> Subject: Re: [Pacemaker] Pacem

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

2014-03-10 Thread Andrew Beekhof
On 11 Mar 2014, at 4:14 pm, Andrew Beekhof wrote: [snip] > If I do this however: > > # cp start.xml 1.xml; tools/cibadmin --replace -o configuration --xml-file > replace.some -V > > I start to see what you see: > > ( xml.c:4985 )info: validate_with_r

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

2014-03-10 Thread Andrew Beekhof
2523] vm01cib: ( xml.c:1394 )info: > cib_perform_op: ++ > Mar 07 13:24:14 [2523] vm01cib: ( xml.c:1394 )info: > cib_perform_op: ++ > Mar 07 13:24:14 [2523] vm01cib: ( xml.c:1394 )info: > cib

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

2014-03-10 Thread Andrew Beekhof
On 7 Mar 2014, at 5:35 pm, Yusuke Iida wrote: > Hi, Andrew > 2014-03-07 11:43 GMT+09:00 Andrew Beekhof : >> I don't understand... crm_mon doesn't look for changes to resources or >> constraints and it should already be using the new faster diff format. >>

Re: [Pacemaker] ordering cloned resources

2014-03-10 Thread Andrew Beekhof
>> I will try to simplify the resources by getting rid of the conditional >> instance attribute and try again. In the mean time I'd be delighted to >> hear about what you guys think about that. >> >> Regards, Alex. >> >> 2014-03-07 4:21 GMT+01:00 A

Re: [Pacemaker] Pacemaker/corosync freeze

2014-03-10 Thread Andrew Beekhof
On 7 Mar 2014, at 5:54 pm, Attila Megyeri wrote: > Thanks for the quick response! > >> -Original Message----- >> From: Andrew Beekhof [mailto:and...@beekhof.net] >> Sent: Friday, March 07, 2014 3:48 AM >> To: The Pacemaker cluster resource manager >>

Re: [Pacemaker] Newbie question

2014-03-06 Thread Andrew Beekhof
pacemaker plugin itself complaining :) > > A > >> -----Original Message- >> From: Andrew Beekhof [mailto:and...@beekhof.net] >> Sent: Friday, 7 March 2014 1:45 PM >> To: The Pacemaker cluster resource manager >> Subject: Re: [Pacemaker] Newbie question

Re: [Pacemaker] ordering cloned resources

2014-03-06 Thread Andrew Beekhof
On 3 Mar 2014, at 3:56 am, Alexandre wrote: > Hi, > > I am setting up a cluster on debian wheezy. > I have installed pacemaker using the debian provided packages (so am > runing 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff). > > I have roughly 10 nodes, among which some nodes are acting as

Re: [Pacemaker] Get group behaviour with Master slave or clones envolved

2014-03-06 Thread Andrew Beekhof
On 27 Feb 2014, at 12:32 am, Néstor C. wrote: > > > > 2014-02-18 0:47 GMT+01:00 Andrew Beekhof : > > On 17 Feb 2014, at 10:34 pm, Néstor C. wrote: > > > > > > > > > 2014-02-17 1:22 GMT+01:00 Andrew Beekhof : > > > > On 21

Re: [Pacemaker] warning log is outputted after pacemaker stopped

2014-03-06 Thread Andrew Beekhof
On 25 Feb 2014, at 7:23 pm, Kazunori INOUE wrote: > 2014-02-24 11:09 GMT+09:00 Andrew Beekhof : >> >> On 24 Feb 2014, at 12:59 pm, Andrew Beekhof wrote: >> >>> >>> On 21 Feb 2014, at 9:36 pm, Kazunori INOUE >>> wrote: >>> >>

Re: [Pacemaker] master-slave set staggered restarts

2014-03-06 Thread Andrew Beekhof
On 7 Mar 2014, at 2:39 am, Jay Janssen wrote: > > primitive p_service ... \ >op monitor interval="2s" role="Master" \ >op monitor interval="5s" role="Slave" \ >op start timeout="1s" interval="0" > ms ms_service p_service \ >meta master-max="3" clone-max="3" t

Re: [Pacemaker] Pacemaker/corosync freeze

2014-03-06 Thread Andrew Beekhof
On 7 Mar 2014, at 5:31 am, Attila Megyeri wrote: > Hello, > > We have a strange issue with Corosync/Pacemaker. > From time to time, something unexpected happens and suddenly the crm_mon > output remains static. > When I check the cpu usage, I see that one of the cores uses 100% cpu, but > ca

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

2014-03-06 Thread Andrew Beekhof
On 26 Feb 2014, at 5:25 pm, yusuke iida wrote: > Hi, Andrew > > 2014-02-21 10:47 GMT+09:00 Andrew Beekhof : >> >> On 20 Feb 2014, at 8:39 pm, yusuke iida wrote: >> >>> Hi, Andrew >>> >>> 2014-02-20 17:28 GMT+09:00 Andrew Beekhof : >

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

2014-03-06 Thread Andrew Beekhof
On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov wrote: > 18.02.2014 03:49, Andrew Beekhof wrote: >> >> On 31 Jan 2014, at 6:20 pm, yusuke iida wrote: >> >>> Hi, all >>> >>> I measure the performance of Pacemaker in the following combina

Re: [Pacemaker] Newbie question

2014-03-06 Thread Andrew Beekhof
On 7 Mar 2014, at 10:23 am, Alex Samad - Yieldbroker wrote: > Hi > > I have been using pacemaker and corosync for a while. > > Went to upgrade and now the latest corosync tells me the pacemake plugin > doesn't work ?? > > What is the recommended replacement for this It somewhat depends on

Re: [Pacemaker] Stopping resource using pcs

2014-03-04 Thread Andrew Beekhof
On 3 Mar 2014, at 10:40 pm, K Mehta wrote: > Has no one ever faced this issue ? > > > On Fri, Feb 28, 2014 at 11:51 PM, K Mehta wrote: > Yes, the issue is seen only with multi state resource. Non multi state > resource work fine. Looks like is_resource_started function in utils.py does >

Re: [Pacemaker] hangs pending

2014-03-04 Thread Andrew Beekhof
On 25 Feb 2014, at 8:30 pm, Andrey Groshev wrote: > > > 21.02.2014, 12:04, "Andrey Groshev" : >> 21.02.2014, 05:53, "Andrew Beekhof" : >> >>> On 19 Feb 2014, at 7:53 pm, Andrey Groshev wrote: >>>> 19.02.2014, 09:49, "A

Re: [Pacemaker] configuration lost or cib not in sync?

2014-02-25 Thread Andrew Beekhof
On 25 Feb 2014, at 8:46 pm, Michael Böhm wrote: > Hello everybody, > > today I had to stop a 2-node cluster and bring it back up, but something > did not go as expected. Apparently one of the nodes didn't have an > up-to-date configuration, which was surprising for me as I thought this wo

Re: [Pacemaker] getting started with development

2014-02-25 Thread Andrew Beekhof
On 26 Feb 2014, at 10:10 am, Tasim Noor wrote: > Hi All, > > I would be interested in contributing to the pacemaker/linux HA codebase. I > did look through the TODO but it doesn't say which of topics are currently > worked on and which ones are open to be taken up. i would appreciate if > s

Re: [Pacemaker] moving from corosync.conf to cluster.conf

2014-02-25 Thread Andrew Beekhof
On 25 Feb 2014, at 7:56 pm, Parveen Jain wrote: > Hi Andrew, > Thanks for responding so quickly for this. > Actually I lost all of my logs as I re Installed my machine for trying some > other combination. > > Is it possible that I just upgrade the O/S to RHEL6.5 and its clustering > software

Re: [Pacemaker] Strange error message with the ocf:pacemaker:ping resource

2014-02-25 Thread Andrew Beekhof
On 26 Feb 2014, at 1:22 am, Michael Schwartzkopff wrote: > Hi, > > When I set up a ocf:pacemaker:ping resource I get the error message: > > crm_glib_handler: Cannot wait on forked child 9252: No child processes (10) Need more logs for context > > System: pacemaker 1.1.10 on gentoo. > > Mit

Re: [Pacemaker] pacemaker/corosync on CentOS 6.4/6.5 node offline after update

2014-02-24 Thread Andrew Beekhof
On 25 Feb 2014, at 2:18 am, Leon Fauster wrote: > On 24.02.2014 at 12:53, fatcha...@gmx.de wrote: >> >> so the only solution is to migrate from openais to cman ? Is there a less >> painful way? > > the "supported" solution is with cman, it does not mean that other stacks do > not > work, (

Re: [Pacemaker] moving from corosync.conf to cluster.conf

2014-02-24 Thread Andrew Beekhof
On 24 Feb 2014, at 7:18 pm, Parveen Jain wrote: > Hi All, > Following was my problem: > 1) My RHEL 6.3 is using CRM shell and corosync.conf. > 2) Wanted to move to RHEL6.5 and hence the underlying cluster. Also > wanted to move to recommended way of using cluster with CMAN using >

Re: [Pacemaker] Question about log level at monitor

2014-02-23 Thread Andrew Beekhof
On 21 Feb 2014, at 9:35 pm, Kazunori INOUE wrote: > 2014-02-20 18:59 GMT+09:00 Andrew Beekhof : >> >> On 20 Feb 2014, at 8:37 pm, Kazunori INOUE wrote: >> >>> Hi, >>> >>> Is this by design although log levels differ with a stonith reso

Re: [Pacemaker] warning log is outputted after pacemaker stopped

2014-02-23 Thread Andrew Beekhof
On 24 Feb 2014, at 12:59 pm, Andrew Beekhof wrote: > > On 21 Feb 2014, at 9:36 pm, Kazunori INOUE wrote: > >> Hi, >> >> WARNING of the following is outputted after pacemaker stopped in >> Pacemaker-1.1.11. >> >> Feb 21 18:22:57 bl460g1n6 ping

Re: [Pacemaker] warning log is outputted after pacemaker stopped

2014-02-23 Thread Andrew Beekhof
On 21 Feb 2014, at 9:36 pm, Kazunori INOUE wrote: > Hi, > > WARNING of the following is outputted after pacemaker stopped in > Pacemaker-1.1.11. > > Feb 21 18:22:57 bl460g1n6 ping(prmPing)[9195]: WARNING: Could not > update default_ping_set = 100: rc=141 > > > This is because pacemaker does

Re: [Pacemaker] Need help with quickstart of pacemaker on redhat

2014-02-23 Thread Andrew Beekhof
On 22 Feb 2014, at 1:26 am, Ivan wrote: > Andrew Beekhof writes: > >> >> >> On 29/08/2013, at 7:41 PM, Moturi Upendra >> wrote: >> >>> Please find the attachment >> >> The problem is that the migration-threshold has been set as

Re: [Pacemaker] pre_notify_demote is issued twice

2014-02-23 Thread Andrew Beekhof
On 21 Feb 2014, at 2:19 pm, Andrew Beekhof wrote: > > On 18 Feb 2014, at 1:23 pm, Andrew Beekhof wrote: > >> >> On 6 Feb 2014, at 7:45 pm, Keisuke MORI wrote: >> >>> Hi, >>> >>> I observed that pre_notify_demote is issued twice when

Re: [Pacemaker] hangs pending

2014-02-23 Thread Andrew Beekhof
On 22 Feb 2014, at 7:07 pm, Andrey Groshev wrote: > > > 21.02.2014, 04:00, "Andrew Beekhof" : >> On 20 Feb 2014, at 10:04 pm, Andrey Groshev wrote: >> >>> 20.02.2014, 13:57, "Andrew Beekhof" : >>>> On 20 Feb 2014, at 5:33

Re: [Pacemaker] Various Problems with Pacemaker and/or related Software

2014-02-21 Thread Andrew Beekhof
On 21 Feb 2014, at 7:35 pm, Andrew Beekhof wrote: > > On 21 Feb 2014, at 7:05 pm, Stephan Buchner wrote: > >> On 21.02.2014 00:01, Andrew Beekhof wrote: >>> ^^^ please don't use the plugin on rhel6. there is a noisy error message >>> saying it will be go

Re: [Pacemaker] Various Problems with Pacemaker and/or related Software

2014-02-21 Thread Andrew Beekhof
On 21 Feb 2014, at 7:05 pm, Stephan Buchner wrote: > On 21.02.2014 00:01, Andrew Beekhof wrote: >> ^^^ please don't use the plugin on rhel6. there is a noisy error message >> saying it will be going away very soon. > Hey Andrew, what exactly do you mean by that? Do not use

Re: [Pacemaker] hangs pending

2014-02-20 Thread Andrew Beekhof
pacemakerd for some reason? On 19 Feb 2014, at 7:53 pm, Andrey Groshev wrote: > > > 19.02.2014, 09:49, "Andrew Beekhof" : >> On 19 Feb 2014, at 4:18 pm, Andrey Groshev wrote: >> >>> 19.02.2014, 09:08, "Andrew Beekhof" : >>>> On 19 F

Re: [Pacemaker] "pcs cluster status" options seems to not work

2014-02-20 Thread Andrew Beekhof
On 21 Feb 2014, at 2:49 pm, Bob Haxo wrote: > Per 3.5 of the Configuring the RH HA Add-on with Pacemaker document, these > should return > different information. They do not. > > What I really need is something that gives this output so that > I can quickly script a check whether a resource is

Re: [Pacemaker] pre_notify_demote is issued twice

2014-02-20 Thread Andrew Beekhof
On 18 Feb 2014, at 1:23 pm, Andrew Beekhof wrote: > > On 6 Feb 2014, at 7:45 pm, Keisuke MORI wrote: > >> Hi, >> >> I observed that pre_notify_demote is issued twice when a master >> resource is migrating. >> I'm wondering if this is the

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

2014-02-20 Thread Andrew Beekhof
On 20 Feb 2014, at 8:39 pm, yusuke iida wrote: > Hi, Andrew > > 2014-02-20 17:28 GMT+09:00 Andrew Beekhof : >> Who was pid 16243? >> Doesn't look like a pacemaker daemon. > pid 16243 is crm_mon. That means that the state displayed by crm_mon was > 500 updates

Re: [Pacemaker] hangs pending

2014-02-20 Thread Andrew Beekhof
On 19 Feb 2014, at 7:53 pm, Andrey Groshev wrote: > > > 19.02.2014, 09:49, "Andrew Beekhof" : >> On 19 Feb 2014, at 4:18 pm, Andrey Groshev wrote: >> >>> 19.02.2014, 09:08, "Andrew Beekhof" : >>>> On 19 Feb 2014, at 4:00

Re: [Pacemaker] Migrating resources on custom conditions

2014-02-20 Thread Andrew Beekhof
On 20 Feb 2014, at 7:34 pm, Dan Markhasin wrote: > Hi, > > I am wondering if it is possible to configure complex/custom migration rules, > so resources would migrate in case there is a problem with the current node > it is running on. > > i.e. > > If the node has a bad disk, or high load, t

Re: [Pacemaker] hangs pending

2014-02-20 Thread Andrew Beekhof
On 20 Feb 2014, at 10:04 pm, Andrey Groshev wrote: > > > 20.02.2014, 13:57, "Andrew Beekhof" : >> On 20 Feb 2014, at 5:33 pm, Andrey Groshev wrote: >> >>> 20.02.2014, 01:22, "Andrew Beekhof" : >>>> On 20 Feb 2014, at 4:18

Re: [Pacemaker] possible regex error in "pcs resource enable/disable"

2014-02-20 Thread Andrew Beekhof
ers) with RHEL and SLES installations. At least the in-house > engineers find the HA very confusing. This sort of a break will make it > impossible to ship with crmsh, thus continuing the confusion of two > distro specific interfaces. > > Regards, > Bob Haxo > > > On Fri

Re: [Pacemaker] Various Problems with Pacemaker and/or related Software

2014-02-20 Thread Andrew Beekhof
On 20 Feb 2014, at 7:08 pm, Stephan Buchner wrote: > Hello everyone, > we are having some problems with pacemaker and/or related software. I hope > you can help to shed some light on the issues we are facing. > > Our Setup consists of 2 nodes and 5 services running on these nodes. > > Here co

Re: [Pacemaker] pacemaker/corosync on CentOS 6.4/6.5 node offline after update

2014-02-20 Thread Andrew Beekhof
On 21 Feb 2014, at 3:11 am, fatcha...@gmx.de wrote: > Hi, > > I'm using a pacemaker/corosync 2-node cluster on CentOS 6.4 to provide a > loadbalancer service via pound. > After an update with yum, the updated node is not able to work in the cluster > again. > > Here is the crm_mon and some

Re: [Pacemaker] [PATCH] update Clusters-From-Scratch to latest pcs syntax

2014-02-20 Thread Andrew Beekhof
thanks! On 21 Feb 2014, at 3:01 am, Christine Caulfield wrote: > Ap-Configuration.txt |4 ++-- > Ch-Active-Active.txt | 10 +- > Ch-Active-Passive.txt |4 ++-- > Ch-Apache.txt |4 ++-- > Ch-Installation.txt |2 +- > Ch-Shared-Storage.txt |4 ++-- > Ch-S

Re: [Pacemaker] possible regex error in "pcs resource enable/disable"

2014-02-20 Thread Andrew Beekhof
>> classify >> this as a bug that I have not reported. >> >> >> >> So, yes, how the duplicate entry got there is probably the crux of the >> issue. And >> I have no answer. I have not used crmsh to create resources (the creates >> are

Re: [Pacemaker] possible regex error in "pcs resource enable/disable"

2014-02-20 Thread Andrew Beekhof
On 20 Feb 2014, at 8:03 pm, Lars Marowsky-Bree wrote: > On 2014-02-19T14:39:30, Bob Haxo wrote: > >> Chris, was easy to duplicate ... I thought that I had cleared >> the error, but that had not happened. >> >> Bob Haxo >> >> [root@mici-admin ~]# pcs resource disable virt >> [root@mici-admin

Re: [Pacemaker] Question about log level at monitor

2014-02-20 Thread Andrew Beekhof
On 20 Feb 2014, at 8:37 pm, Kazunori INOUE wrote: > Hi, > > Is this by design although log levels differ with a stonith resource > and other resources in Pacemaker-1.1.11 ? > > "P1" is id of ocf:pacemaker:Dummy resource. > "F1" is id of stonith (ex. stonith:external/ipmi) resource. > > * log

Re: [Pacemaker] hangs pending

2014-02-20 Thread Andrew Beekhof
On 20 Feb 2014, at 5:33 pm, Andrey Groshev wrote: > > > 20.02.2014, 01:22, "Andrew Beekhof" : >> On 20 Feb 2014, at 4:18 am, Andrey Groshev wrote: >> >>> 19.02.2014, 06:47, "Andrew Beekhof" : >>>> On 18 Feb 2014,

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out "lost"?

2014-02-20 Thread Andrew Beekhof
ated to the newest kernel now. >> kernel-2.6.32-431.5.1.el6.x86_64.rpm >> >> The following parameters are set to bridge which is letting >> communication of corosync pass now. >> As a result, "Retransmit List" no longer occur almost. >> # echo 1 > /sy

Re: [Pacemaker] resource is too active problem in a 2-node cluster

2014-02-19 Thread Andrew Beekhof
: > Feb 04 11:27:38 [45168] gol-5-7-0 crmd: warning: status_from_rc: > Action 8 (GOL-HA_monitor_0) on gol-5-7-6 failed (target: 7 vs. rc: 1): Error This indicates the agent returned an error (1). > ________ > From: Andrew Beekhof [and...@b

Re: [Pacemaker] possible regex error in "pcs resource enable/disable"

2014-02-19 Thread Andrew Beekhof
Have you been mixing pcs and crmsh again? :-) The interesting part is how the dup got in there in the first place. Can you remove both settings and try to recreate that step? On 20 Feb 2014, at 9:39 am, Bob Haxo wrote: > Chris, was easy to duplicate ... I thought that I had cleared > the error

Re: [Pacemaker] hangs pending

2014-02-19 Thread Andrew Beekhof
On 20 Feb 2014, at 4:18 am, Andrey Groshev wrote: > > > 19.02.2014, 06:47, "Andrew Beekhof" : >> On 18 Feb 2014, at 9:29 pm, Andrey Groshev wrote: >> >>> Hi, ALL and Andrew! >>> >>> Today is a good day - I killed a lot, and a l

Re: [Pacemaker] [Patch]Information of "Connectivity is lost" is not displayed

2014-02-18 Thread Andrew Beekhof
> * Node srv01: >+ default_ping_set : 0 : Connectivity is lost Ah! https://github.com/beekhof/pacemaker/commit/5d51930 > > Best Regards, > Hideo Yamauchi. > --- On Wed, 2014/2/19, Andrew Beekhof wrote: > >> >> On 18 Feb 2014, at 2:38 pm,

Re: [Pacemaker] hangs pending

2014-02-18 Thread Andrew Beekhof
On 19 Feb 2014, at 4:18 pm, Andrey Groshev wrote: > > > 19.02.2014, 09:08, "Andrew Beekhof" : >> On 19 Feb 2014, at 4:00 pm, Andrey Groshev wrote: >> >>> 19.02.2014, 06:48, "Andrew Beekhof" : >>>> On 18 Feb 2014,

Re: [Pacemaker] hangs pending

2014-02-18 Thread Andrew Beekhof
On 19 Feb 2014, at 4:00 pm, Andrey Groshev wrote: > > > 19.02.2014, 06:48, "Andrew Beekhof" : >> On 18 Feb 2014, at 11:05 pm, Andrey Groshev wrote: >> >>> Hi, ALL and Andrew! >>> >>> Today is a good day - I killed a lot, and a l

Re: [Pacemaker] [Patch]Information of "Connectivity is lost" is not displayed

2014-02-18 Thread Andrew Beekhof
g crm_mon to show? > > Best Regards, > Hideo Yamauchi. > > > --- On Tue, 2014/2/18, Andrew Beekhof wrote: > >> >> On 18 Feb 2014, at 1:45 pm, renayama19661...@ybb.ne.jp wrote: >> >>> Hi Andrew, >>> >>> Thank you for comments. >
