Re: [ClusterLabs] DRBD fencing issue on failover causes resource failure

2016-03-19 Thread Digimer
On 16/03/16 01:17 PM, Tim Walberg wrote: > Having an issue on a newly built CentOS 7.2.1511 NFS cluster with DRBD > (drbd84-utils-8.9.5-1 with kmod-drbd84-8.4.7-1_1). At this point, the > resources consist of a cluster address, a DRBD device mirroring between > the two cluster nodes, the file syste

Re: [ClusterLabs] Moving resources and implicit bans - please explain?

2016-03-19 Thread Ken Gaillot
On 03/16/2016 02:38 PM, Matthew Mucker wrote: > I have set up my first three-node Pacemaker cluster and was doing some > testing by using "crm resource move" commands. I found that once I moved a > resource off a particular node, it would not come back up on that node. I > spent a while troubles

Re: [ClusterLabs] pacemaker remote configuration on ubuntu 14.04

2016-03-19 Thread Сергей Филатов
I’m fairly new to pacemaker, could you tell me what the blocker could be? root@controller-1:~# pcs constraint Location Constraints: Resource: clone_p_dns Enabled on: controller-1.domain.com (score:100) Resource: clone_p_haproxy Enabled on: controller-1.domain.com (score:100) Resource: cl

Re: [ClusterLabs] Pacemaker startup-fencing

2016-03-19 Thread Ferenc Wágner
Andrei Borzenkov writes: > On Wed, Mar 16, 2016 at 2:22 PM, Ferenc Wágner wrote: > >> Pacemaker explained says about this cluster option: >> >> Advanced Use Only: Should the cluster shoot unseen nodes? Not using >> the default is very unsafe! >> >> 1. What are those "unseen" nodes? > > N

Re: [ClusterLabs] reproducible split brain

2016-03-19 Thread Ken Gaillot
On 03/16/2016 03:04 PM, Christopher Harvey wrote: > On Wed, Mar 16, 2016, at 04:00 PM, Digimer wrote: >> On 16/03/16 03:59 PM, Christopher Harvey wrote: >>> I am able to create a split brain situation in corosync 1.1.13 using >>> iptables in a 3 node cluster. >>> >>> I have 3 nodes, vmr-132-3, vmr-

Re: [ClusterLabs] Antw: Re: reproducible split brain

2016-03-19 Thread Digimer
On 17/03/16 07:30 PM, Christopher Harvey wrote: > On Thu, Mar 17, 2016, at 06:24 PM, Ken Gaillot wrote: >> On 03/17/2016 05:10 PM, Christopher Harvey wrote: >>> If I ignore pacemaker's existence, and just run corosync, corosync >>> disagrees about node membership in the situation presented in the f

Re: [ClusterLabs] Pacemaker startup-fencing

2016-03-19 Thread Andrei Borzenkov
On Wed, Mar 16, 2016 at 4:18 PM, Lars Ellenberg wrote: > On Wed, Mar 16, 2016 at 01:47:52PM +0100, Ferenc Wágner wrote: >> >> And some more about fencing: >> >> >> >> 3. What's the difference in cluster behavior between >> >>- stonith-enabled=FALSE (9.3.2: how often will the stop operation be

Re: [ClusterLabs] attrd: Fix sigsegv on exit if initialization failed

2016-03-19 Thread Ken Gaillot
On 10/12/2015 06:08 AM, Vladislav Bogdanov wrote: > Hi, > > This was caught with 0.17.1 libqb, which didn't play well with long pids. > > commit 180a943846b6d94c27b9b984b039ac0465df64da > Author: Vladislav Bogdanov > Date: Mon Oct 12 11:05:29 2015 + > > attrd: Fix sigsegv on exit if i

Re: [ClusterLabs] PCS, Corosync, Pacemaker, and Bind (Ken Gaillot)

2016-03-19 Thread Andrei Borzenkov
On Wed, Mar 16, 2016 at 9:35 PM, Mike Bernhardt wrote: > I guess I have to say "never mind!" I don't know what the problem was > yesterday, but it loads just fine today, even when the named config and the > virtual ip don't match! But for your edamacation, ifconfig does NOT show the > address alth

Re: [ClusterLabs] Antw: Re: reproducible split brain

2016-03-19 Thread Bogdan Dobrelya
On 03/17/2016 11:24 PM, Ken Gaillot wrote: > On 03/17/2016 05:10 PM, Christopher Harvey wrote: >> If I ignore pacemaker's existence, and just run corosync, corosync >> disagrees about node membership in the situation presented in the first >> email. While it's true that stonith just happens to quic

[ClusterLabs] Moving resources and implicit bans - please explain?

2016-03-19 Thread Matthew Mucker
I have set up my first three-node Pacemaker cluster and was doing some testing by using "crm resource move" commands. I found that once I moved a resource off a particular node, it would not come back up on that node. I spent a while troubleshooting and eventually gave up and rebuilt the node.

[ClusterLabs] Pacemaker startup-fencing

2016-03-19 Thread Ferenc Wágner
Hi, Pacemaker explained says about this cluster option: Advanced Use Only: Should the cluster shoot unseen nodes? Not using the default is very unsafe! 1. What are those "unseen" nodes? And a possibly related question: 2. If I've got UNCLEAN (offline) nodes, is there a way to clean the

[ClusterLabs] Antw: Re: reproducible split brain

2016-03-19 Thread Ulrich Windl
>>> Christopher Harvey wrote on 16.03.2016 at 21:04 in message <1458158684.122207.551267810.11f73...@webmail.messagingengine.com>: [...] >> > Would stonith solve this problem, or does this look like a bug? >> >> It should, that is its job. > > is there some log I can enable that would say >

Re: [ClusterLabs] Pacemaker startup-fencing

2016-03-19 Thread Andrei Borzenkov
On Wed, Mar 16, 2016 at 2:22 PM, Ferenc Wágner wrote: > Hi, > > Pacemaker explained says about this cluster option: > > Advanced Use Only: Should the cluster shoot unseen nodes? Not using > the default is very unsafe! > > 1. What are those "unseen" nodes? > Nodes that lost communication w

Re: [ClusterLabs] Help required for N+1 redundancy setup

2016-03-19 Thread Nikhil Utane
Thanks Ken for the detailed response. I suppose I could even use some of the pcs/crm CLI commands then. Cheers. On Wed, Mar 16, 2016 at 8:27 PM, Ken Gaillot wrote: > On 03/16/2016 05:22 AM, Nikhil Utane wrote: > > I see following info gets updated in CIB. Can I use this or there is > better > >

Re: [ClusterLabs] Help required for N+1 redundancy setup

2016-03-19 Thread Ken Gaillot
On 03/16/2016 05:22 AM, Nikhil Utane wrote: > I see following info gets updated in CIB. Can I use this or there is better > way? > > crm-debug-origin="peer_update_callback" join="*down*" expected="member"> in_ccm/crmd/join reflect the current state of the node (as known by the partition that you

Re: [ClusterLabs] Antw: Re: reproducible split brain

2016-03-19 Thread vija ar
root file system is fine ... but fencing is not a necessity a cluster shld function without it .. i see the issue with corosync which has all been .. a inherent way of not working neatly or smoothly .. for e.g. take an issue where the live node is hung in db cluster .. now db perspective transact

Re: [ClusterLabs] Security with Corosync

2016-03-19 Thread Jan Friesse
Nikhil Utane wrote: Honza, In my CIB I see the infrastructure being set to cman. pcs status is reporting the same. [root@node3 corosync]# pcs status Cluster name: mycluster Last updated: Wed Mar 16 16:57:46 2016 Last change: Wed Mar 16 16:56:23 2016 Stack: *cman* But corosync also is run

[ClusterLabs] Antw: Installed Galera, now HAProxy won't start

2016-03-19 Thread Ulrich Windl
>>> Matthew Mucker wrote on 16.03.2016 at 23:10 in >>> message [...] > So thinking this through logically, it seems to me that the Openstack > docs were wrong in telling me to configure MariaDB server to bind to all > available ports In a cluster environment with virtual IP addresse

Re: [ClusterLabs] Antw: Re: reproducible split brain

2016-03-19 Thread Digimer
On 19/03/16 10:10 AM, Dennis Jacobfeuerborn wrote: > On 18.03.2016 00:50, Digimer wrote: >> On 17/03/16 07:30 PM, Christopher Harvey wrote: >>> On Thu, Mar 17, 2016, at 06:24 PM, Ken Gaillot wrote: On 03/17/2016 05:10 PM, Christopher Harvey wrote: > If I ignore pacemaker's existence, and j

[ClusterLabs] reproducible split brain

2016-03-19 Thread Christopher Harvey
I am able to create a split brain situation in corosync 1.1.13 using iptables in a 3 node cluster. I have 3 nodes, vmr-132-3, vmr-132-4, and vmr-132-5 All nodes are operational and form a 3 node cluster with all nodes are members of that ring. vmr-132-3 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-

[ClusterLabs] [Announce] clufter-0.56.2 released

2016-03-19 Thread Jan Pokorný
I am happy to announce that clufter-0.56.2, a tool/library for transforming/analyzing cluster configuration formats, has been released and published (incl. signature using my 60BCBB4F5CD7F9EF key, expiration of which was prolonged just a few days back so you may want to consult key servers first):

Re: [ClusterLabs] Pacemaker startup-fencing

2016-03-19 Thread Lars Ellenberg
On Wed, Mar 16, 2016 at 01:47:52PM +0100, Ferenc Wágner wrote: > >> And some more about fencing: > >> > >> 3. What's the difference in cluster behavior between > >>- stonith-enabled=FALSE (9.3.2: how often will the stop operation be > >> retried?) > >>- having no configured STONITH devices

Re: [ClusterLabs] Antw: Re: reproducible split brain

2016-03-19 Thread Dennis Jacobfeuerborn
On 18.03.2016 00:50, Digimer wrote: > On 17/03/16 07:30 PM, Christopher Harvey wrote: >> On Thu, Mar 17, 2016, at 06:24 PM, Ken Gaillot wrote: >>> On 03/17/2016 05:10 PM, Christopher Harvey wrote: If I ignore pacemaker's existence, and just run corosync, corosync disagrees about node memb

Re: [ClusterLabs] PCS, Corosync, Pacemaker, and Bind (Ken Gaillot)

2016-03-19 Thread Dennis Jacobfeuerborn
On 17.03.2016 08:45, Andrei Borzenkov wrote: > On Wed, Mar 16, 2016 at 9:35 PM, Mike Bernhardt wrote: >> I guess I have to say "never mind!" I don't know what the problem was >> yesterday, but it loads just fine today, even when the named config and the >> virtual ip don't match! But for your edam

Re: [ClusterLabs] reproducible split brain

2016-03-19 Thread Digimer
On 16/03/16 03:59 PM, Christopher Harvey wrote: > I am able to create a split brain situation in corosync 1.1.13 using > iptables in a 3 node cluster. > > I have 3 nodes, vmr-132-3, vmr-132-4, and vmr-132-5 > > All nodes are operational and form a 3 node cluster with all nodes are > members of th

Re: [ClusterLabs] Antw: Re: reproducible split brain

2016-03-19 Thread Jan Friesse
Christopher, If I ignore pacemaker's existence, and just run corosync, corosync disagrees about node membership in the situation presented in the first email. While it's true that stonith just happens to quickly correct the situation after it occurs it still smells like a bug in the case where

[ClusterLabs] Antw: Re: Pacemaker startup-fencing

2016-03-19 Thread Ulrich Windl
>>> Ferenc Wágner wrote on 16.03.2016 at 13:47 in message <87k2l2zj0n@lant.ki.iif.hu>: [...] > Then I wonder why I hear the "must have working fencing if you value > your data" mantra so often (and always without explanation). After all, > it does not risk the data, only the automatic clu

Re: [ClusterLabs] PCS, Corosync, Pacemaker, and Bind

2016-03-19 Thread Ken Gaillot
On 03/15/2016 06:47 PM, Mike Bernhardt wrote: > Not sure if this is a BIND question or a PCS/Corosync question, but > hopefully someone has done this before: > > > > I'm setting up a new CentOS 7 DNS server cluster to replace our very old > CentOS 4 cluster. The old one uses heartbeat which is

[ClusterLabs] Cluster goes to unusable state if fencing resource is down

2016-03-19 Thread Arjun Pandey
Hi, I am running a 2-node cluster with this config on CentOS 6.6, where I have a multi-state resource foo running in master/slave mode and a bunch of floating IP addresses configured. Additionally, I have a colocation constraint for the IP addresses to be colocated with the master. When I configure

[ClusterLabs] DRBD fencing issue on failover causes resource failure

2016-03-19 Thread Tim Walberg
Having an issue on a newly built CentOS 7.2.1511 NFS cluster with DRBD (drbd84-utils-8.9.5-1 with kmod-drbd84-8.4.7-1_1). At this point, the resources consist of a cluster address, a DRBD device mirroring between the two cluster nodes, the file system, and the nfs-server resource. The resources all

[ClusterLabs] Installed Galera, now HAProxy won't start

2016-03-19 Thread Matthew Mucker
Sorry, folks, for being a pest here, but I'm finding the learning curve on this clustering stuff to be pretty steep. I'm following the docs to set up a three-node Openstack Controller cluster. I got Pacemaker running and I had two resources, the virtual IP and HAProxy, up and running and I cou

Re: [ClusterLabs] Security with Corosync

2016-03-19 Thread Jan Friesse
Nikhil Utane wrote: [root@node3 corosync]# corosync -v Corosync Cluster Engine, version '1.4.7' Copyright (c) 2006-2009 Red Hat, Inc. So it is 1.x :( When I began I was following multiple tutorials and ended up installing multiple packages. Let me try moving to corosync 2.0. I suppose it sho

Re: [ClusterLabs] Security with Corosync

2016-03-19 Thread Nikhil Utane
Honza, In my CIB I see the infrastructure being set to cman. pcs status is reporting the same. [root@node3 corosync]# pcs status Cluster name: mycluster Last updated: Wed Mar 16 16:57:46 2016 Last change: Wed Mar 16 16:56:23 2016 Stack: *cman* But corosync also is running fine. [root@node2 ni

Re: [ClusterLabs] Installed Galera, now HAProxy won't start

2016-03-19 Thread Ian
> configure MariaDB server to bind to all available ports ( http://docs.openstack.org/ha-guide/controller-ha-galera-config.html, scroll to "Database Configuration," note that bind-address is 0.0.0.0.). If MariaDB binds to the virtual IP address, then HAProxy can't bind to that address and therefore

[ClusterLabs] Reload operation for multi-state resource agent

2016-03-19 Thread Michael Lychkov
Hello everyone, Is there a way to initiate a reload operation call on the master instance of a multi-state resource agent? I have an OCF multi-state resource agent for a daemon service, and I added a reload op to this resource agent: * two parameters of resource agent: ... ...

Re: [ClusterLabs] [DRBD-user] DRBD fencing issue on failover causes resource failure

2016-03-19 Thread Digimer
On 16/03/16 01:51 PM, Tim Walberg wrote: > Is there a way to make this work properly without STONITH? I forgot to mention > that both nodes are virtual machines (QEMU/KVM), which makes STONITH a minor > challenge. Also, since these symptoms occur even under "pcs cluster standby", > where STONITH *s

Re: [ClusterLabs] Security with Corosync

2016-03-19 Thread Nikhil Utane
Honza, Actually this is only for a PoC (Proof of Concept) setup. Next step is to move it to a different platform where we are cross-compiling from the sources. I'd like the PoC setup to have the same version as the final one. Thanks. On Thu, Mar 17, 2016 at 1:07 PM, Jan Friesse wrote: > Nikhil

Re: [ClusterLabs] Antw: Re: reproducible split brain

2016-03-19 Thread Digimer
On 17/03/16 01:57 PM, vija ar wrote: > root file system is fine ... > > but fencing is not a necessity a cluster shld function without it .. i > see the issue with corosync which has all been .. a inherent way of not > working neatly or smoothly .. Absolutely wrong. If you have a service that ca

Re: [ClusterLabs] Antw: Re: reproducible split brain

2016-03-19 Thread Ken Gaillot
On 03/17/2016 05:10 PM, Christopher Harvey wrote: > If I ignore pacemaker's existence, and just run corosync, corosync > disagrees about node membership in the situation presented in the first > email. While it's true that stonith just happens to quickly correct the > situation after it occurs it s

[ClusterLabs] [Announce] libqb 10.rc4 release

2016-03-19 Thread Christine Caulfield
This is a bugfix release and a potential 1.0 candidate. There are no actual code changes in this release, most of the patches are to the build system. Thanks to Jan Pokorný for, er, all of them. I've bumped the library soname to 0.18.0 which should really have happened last time. Changes from 1.0

Re: [ClusterLabs] Cluster failover failure with Unresolved dependency

2016-03-19 Thread Ken Gaillot
On 03/16/2016 11:20 AM, Lorand Kelemen wrote: > Dear Ken, > > I already modified the startup as suggested during testing, thanks! I > swapped the postfix ocf resource to the amavisd systemd resource, as latter > controls postfix startup also as it turns out and having both resouces in > the mail-s

Re: [ClusterLabs] [Announce] libqb 10.rc4 release

2016-03-19 Thread Jan Pokorný
On 17/03/16 16:37 +, Christine Caulfield wrote: > This is a bugfix release and a potential 1.0 candidate. Primarily serving for building some of the components in the common cluster stack nowadays, libqb releases should likely be announced (also) in developers ML, release candidates in particu

Re: [ClusterLabs] Antw: Re: reproducible split brain

2016-03-19 Thread Christopher Harvey
If I ignore pacemaker's existence, and just run corosync, corosync disagrees about node membership in the situation presented in the first email. While it's true that stonith just happens to quickly correct the situation after it occurs it still smells like a bug in the case where corosync is used

Re: [ClusterLabs] [DRBD-user] DRBD fencing issue on failover causes resource failure

2016-03-19 Thread Thomas Lamprecht
On 16.03.2016 18:51, Tim Walberg wrote: > Is there a way to make this work properly without STONITH? I forgot to mention > that both nodes are virtual machines (QEMU/KVM), which makes STONITH a minor > challenge. Also, since these symptoms occur even under "pcs cluster standby", > where STONITH *

Re: [ClusterLabs] Security with Corosync

2016-03-19 Thread Nikhil Utane
[root@node3 corosync]# corosync -v Corosync Cluster Engine, version '1.4.7' Copyright (c) 2006-2009 Red Hat, Inc. So it is 1.x :( When I began I was following multiple tutorials and ended up installing multiple packages. Let me try moving to corosync 2.0. I suppose it should be as easy as doing yu

Re: [ClusterLabs] [DRBD-user] DRBD fencing issue on failover causes resource failure

2016-03-19 Thread Tim Walberg
Is there a way to make this work properly without STONITH? I forgot to mention that both nodes are virtual machines (QEMU/KVM), which makes STONITH a minor challenge. Also, since these symptoms occur even under "pcs cluster standby", where STONITH *shouldn't* be invoked, I'm not sure if that's the