[Pacemaker] why does pacemaker execute fence action immediately when the target node becomes UNCLEAN?

2012-12-19 Thread bin chen
Hi all, I have defined a fence resource and cloned it. But when a node becomes UNCLEAN (I disconnected its network), the fence action is executed immediately. Is there a way to avoid this (for example, a tolerance time for brief network outages)? If the network is not stable, I don't want

[Pacemaker] when is 'not installed' rechecked?

2012-12-19 Thread James Harper
I have a resource that returned 'not installed' because (I think) I had forgotten to install the required package. I've installed the package now but I still see the following every 15 minutes: Preventing ocfs2mgmt from re-starting on node1: operation monitor failed 'not installed' (rc=5) As f

Re: [Pacemaker] Split-brain on DRBD + Corosync/Pacemaker

2012-12-19 Thread Soni Maula Harriz
On Thu, Dec 20, 2012 at 12:25 AM, Felipe Gutierrez < felipe.o.gutier...@gmail.com> wrote: > Hi Soni, > > I did these configurations on my DRBD and the correct recovery of > split-brain worked well. > > http://www.drbd.org/users-guide-8.3/s-configure-split-brain-behavior.html#s-split-brain-notifica

Re: [Pacemaker] why does pacemaker migrate a vm by stopping and starting instead of migrating action?

2012-12-19 Thread bin chen
> Hi Cherish, > > On Wed, Dec 19, 2012 at 1:11 AM, bin chen wrote: > >> Hi all, >> My cluster is pacemaker 1.1.7 + corosync 2.0. I have written a >> resource agent to manage the virtual machine. The RA supports >> start, stop, migrate_from, migrate_to, monitor. >> But when I try to migrate

[Pacemaker] Stop failed resource instead of migrating

2012-12-19 Thread Jan Škoda
Hi! Is it possible to set something like a started-weight score for a resource? Or to make a resource stop instead of migrating elsewhere (and take all the resources it's colocated with)? (That would be the same as setting resource stickiness = started-weight.) I have a spam filter that sometimes fails

Re: [Pacemaker] Split-brain on DRBD + Corosync/Pacemaker

2012-12-19 Thread Digimer
Is fencing/stonith configured in pacemaker? Can you call a fence against a peer in pacemaker and trigger a reboot of the target node? If that doesn't work, then you don't have proper fencing in pacemaker and the crm-fence-peer.sh hook script won't work. So yes, you need stonith and you need to mak

Re: [Pacemaker] Split-brain on DRBD + Corosync/Pacemaker

2012-12-19 Thread Felipe Gutierrez
Hi Digimer, I am already using crm-fence-peer.sh: resource r8 { handlers { fence-peer "/usr/lib/drbd/crm-fence-peer.sh"; after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh"; split-brain "/usr/lib/drbd/notify-split-brain.sh root"; } Is Stonith still necessary? How do I configure it

Re: [Pacemaker] Split-brain on DRBD + Corosync/Pacemaker

2012-12-19 Thread Digimer
On 12/19/2012 06:21 AM, Felipe Gutierrez wrote: > Hi everyone, > > I have a scenario in which I disconnect my primary from the network and the > secondary takes over, becoming primary. After this, I reconnect the younger > primary, and both nodes become secondary (DRBD), or Slave in Pacemaker. > It is becaus

Re: [Pacemaker] Split-brain on DRBD + Corosync/Pacemaker

2012-12-19 Thread Felipe Gutierrez
Hi Soni, I did these configurations on my DRBD and the correct recovery of split-brain worked well. http://www.drbd.org/users-guide-8.3/s-configure-split-brain-behavior.html#s-split-brain-notification I believe that the article you sent to me is about split-brain on Corosync and it is different o

Re: [Pacemaker] Split-brain on DRBD + Corosync/Pacemaker

2012-12-19 Thread Felipe Gutierrez
Hi Soni, thanks for the reply. I understand that it is not possible if I don't have a dedicated back-to-back connection. But I am thinking of creating a script that does that for me. The commands are described here: http://www.hastexo.com/resources/hints-and-kinks/solve-drbd-split-brain-4-steps And these co

Re: [Pacemaker] reloading crm changes

2012-12-19 Thread Paul Shannon - NOAA Federal
Well, not accurate unless they consider the crm interface as making manual changes. I do not see those errors anymore after rebooting the machines. Things are (almost) all working now. Paul - Speak the truth, but leave immediately after. - Slovenian proverb Paul Shannon ITO, WFO Juneau

Re: [Pacemaker] ocf:heartbeat:apache fails to start

2012-12-19 Thread Paul Shannon - NOAA Federal
Thank you, thank you, thank you. I was chasing that problem for several days. I did not see anything in any of the logs that pointed to where the problem might be. Now I see that there is something about that requirement on the linux-ha ocf_heartbeat_apache page. Thanks again. Paul Shannon

Re: [Pacemaker] why does pacemaker migrate a vm by stopping and starting instead of migrating action?

2012-12-19 Thread mark - pacemaker list
Hi Cherish, On Wed, Dec 19, 2012 at 1:11 AM, bin chen wrote: > Hi all, > My cluster is pacemaker 1.1.7 + corosync 2.0. I have written a > resource agent to manage the virtual machine. The RA supports > start, stop, migrate_from, migrate_to, monitor. > But when I try to migrate a running

Re: [Pacemaker] why does pacemaker migrate a vm by stopping and starting instead of migrating action?

2012-12-19 Thread mark - pacemaker list
Oops, I haven't had my coffee yet this morning... I see you've written your own RA rather than using the existing ones, my apologies for the noise on the list. Mark On Wed, Dec 19, 2012 at 9:08 AM, mark - pacemaker list < m+pacema...@nerdish.us> wrote: > Hi Cherish, > > On Wed, Dec 19, 2012 at

Re: [Pacemaker] time synchronisation

2012-12-19 Thread Nikita Michalko
On Wednesday, 19 December 2012 13:28:40, Lars Marowsky-Bree wrote: > On 2012-12-19T13:22:54, Nikita Michalko wrote: > > > They should all read Lamport ;-) > > > > Interesting - what/who is Lamport though? > > LMGTFY: > http://en.wikipedia.org/wiki/Lamport_timestamps#Lamport.27s_logical_clock_i

Re: [Pacemaker] time synchronisation

2012-12-19 Thread Lars Marowsky-Bree
On 2012-12-19T13:22:54, Nikita Michalko wrote: > > They should all read Lamport ;-) > Interesting - what/who is Lamport though? LMGTFY: http://en.wikipedia.org/wiki/Lamport_timestamps#Lamport.27s_logical_clock_in_distributed_systems -- Architect Storage/HA SUSE LINUX Products GmbH, GF: Jeff
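For readers who do not want to follow the link: the reference is to Lamport's logical clocks, which order events in a distributed system without relying on synchronised wall-clock time. The following is only a minimal illustrative sketch in Python, not anything Pacemaker or Corosync actually runs; the class and method names (`LamportClock`, `tick`, `send`, `receive`) are assumptions made up for this example.

```python
class LamportClock:
    """Hypothetical minimal Lamport logical clock (illustration only)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # A local event advances the counter by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending a message counts as an event; the returned value
        # is the timestamp attached to the outgoing message.
        return self.tick()

    def receive(self, msg_time):
        # On receipt, jump past both the local counter and the
        # timestamp carried by the message.
        self.time = max(self.time, msg_time) + 1
        return self.time


# Two nodes whose wall clocks could be arbitrarily far apart.
node_a = LamportClock()
node_b = LamportClock()

node_a.tick()                 # local event on node A
stamp = node_a.send()         # node A sends a message
node_b.receive(stamp)         # node B receives it

# The receive event is ordered after the send event,
# regardless of what either hardware clock says.
print(stamp, node_b.time)
```

The point of the sketch is that ordering comes from the counters exchanged in messages, so a node whose hardware clock is minutes off can still agree with its peers on which event happened first.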

Re: [Pacemaker] time synchronisation

2012-12-19 Thread Nikita Michalko
Hi Lars! | v On Wednesday, 19 December 2012 13:06:25, Lars Marowsky-Bree wrote: > On 2012-12-19T10:06:25, James Harper wrote: > > What is the behaviour of a cluster when the nodes are up to 10 minutes > > out of sync with each other, because they've just been booted up after > > a crash and th

Re: [Pacemaker] time synchronisation

2012-12-19 Thread Lars Marowsky-Bree
On 2012-12-19T10:06:25, James Harper wrote: > What is the behaviour of a cluster when the nodes are up to 10 minutes > out of sync with each other, because they've just been booted up after > a crash and the hwclocks are out of date and there is no ntp time > source reachable? Could it cause lots

Re: [Pacemaker] time synchronisation

2012-12-19 Thread David Coulson
On 12/19/12 5:06 AM, James Harper wrote: What is the best way on bootup in the above situation to ensure time synchronisation? Is it as simple as having a cron job to reset the hardware clock every so often so that on reboot things are reasonable? At least RHEL and SuSE can do an explicit ntp

Re: [Pacemaker] Split-brain on DRBD + Corosync/Pacemaker

2012-12-19 Thread Soni Maula Harriz
Cutting the communication link between the two nodes is not a valid failover scenario. Both sides will think that the other node is offline and become primary, and when you reconnect them, split-brain will occur. You can make the communication link between the two nodes redundant. Maybe these articles c

Re: [Pacemaker] wrong device in stonith_admin -l

2012-12-19 Thread laurent+pacemaker
laurent+pacema...@u-picardie.fr writes: > In the end I'm going to file a bug. Just for information: http://bugs.clusterlabs.org/show_bug.cgi?id=5127 "stonith_agent status" was always returning rc=0 despite being called with the port and nodename env vars, my mistake. To work around the issue w

[Pacemaker] Split-brain on DRBD + Corosync/Pacemaker

2012-12-19 Thread Felipe Gutierrez
Hi everyone, I have a scenario in which I disconnect my primary from the network and the secondary takes over, becoming primary. After this, I reconnect the younger primary, and both nodes become secondary (DRBD), or Slave in Pacemaker. It is because DRBD on the younger Primary is Standalone and Outdated. It is

Re: [Pacemaker] time synchronisation

2012-12-19 Thread James Harper
> > What is the behaviour of a cluster when the nodes are up to 10 minutes out > of sync with each other, because they've just been booted up after a crash > and the hwclocks are out of date and there is no ntp time source reachable? > Could it cause lots of sig11's and constant re-elections becau

[Pacemaker] time synchronisation

2012-12-19 Thread James Harper
What is the behaviour of a cluster when the nodes are up to 10 minutes out of sync with each other, because they've just been booted up after a crash and the hwclocks are out of date and there is no ntp time source reachable? Could it cause lots of sig11's and constant re-elections because that'

Re: [Pacemaker] node status does not change even if pacemakerd dies

2012-12-19 Thread Kazunori INOUE
(12.12.13 08:26), Andrew Beekhof wrote: On Wed, Dec 12, 2012 at 8:02 PM, Kazunori INOUE wrote: Hi, I recognize that pacemakerd is much less likely to crash. However, a possibility of being killed by OOM_Killer etc. is not 0%. True. Although we just established in another thread that we don

[Pacemaker] Clone resource as a dependency

2012-12-19 Thread Attila Megyeri
Hi, How can I configure a resource (e.g. an apache) to depend on the start of a clone resource (e.g. a filesystem resource) for the given node? I know how to arrange a primitive into a group, but in this particular case, the primitive must run on the passive node as well (performing some async

[Pacemaker] unable to move resource in pacemaker 1.0.8

2012-12-19 Thread Piotr Jewiec
Hi, I am not able to move NFS to the second node of my cluster. Some time ago, crmd on the node that NFS currently runs on was jammed (used all file descriptors) and was kill -9'ed: Last updated: Wed Dec 19 03:39:59 2012 Stack: openais Current DC: filer-1 - partition with quorum Version:

[Pacemaker] [RFC] working selinux policy module for pacemaker

2012-12-19 Thread Vladislav Bogdanov
Hi all, I'd like to share my successful attempt to confine pacemaker. I took the bare-bones pacemaker module found in the latest Fedora selinux-policy (3.11.1-64.fc18) and extended it a bit, so now I have pacemaker and some pacemaker-managed services running confined. Everything runs on EL6 with corosy