[ClusterLabs] Antw: Re: Establishing Timeouts

2016-10-10 Thread Ulrich Windl
>>> Klaus Wenninger wrote on 10.10.2016 at 20:42 in message <0713ae34-7606-a82b-47f8-5cc64bfca...@redhat.com>: > On 10/10/2016 08:35 PM, Eric Robinson wrote: >> Basically, when we turn off a switch, I want to keep the cluster from > failing over before Linux bonding has had a chance to recove

[ClusterLabs] Antw: Re: Establishing Timeouts

2016-10-10 Thread Ulrich Windl
>>> Klaus Wenninger wrote on 10.10.2016 at 20:04 in message <936e4d4b-df5c-246d-4552-5678653b3...@redhat.com>: > On 10/10/2016 06:58 PM, Eric Robinson wrote: >> Thanks for the clarification. So what's the easiest way to ensure that the > cluster waits a desired timeout before deciding that a

[ClusterLabs] Antw: Re: Pacemaker and Oracle ASM

2016-10-10 Thread Ulrich Windl
>>> Chad Cravens wrote on 10.10.2016 at 19:18 in message : > The client has specifically said that they have purposefully chosen not to > implement Oracle RAC (I'm not sure why). It's a large program and the Most of the time Oracle is a question of how many $$$ you have ;-) > budget and technolog

Re: [ClusterLabs] Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

2016-10-10 Thread Andrei Borzenkov
On Tue, Oct 11, 2016 at 9:18 AM, Ulrich Windl wrote: > > My point is this: For a resource that can only exclusively run on one node, > it's important that the other node is down before taking action. But for cLVM > and OCFS2 the resources can run concurrently on each node, Both require coordina

[ClusterLabs] Antw: Re: OCFS2 on cLVM with node waiting for fencing timeout

2016-10-10 Thread Ulrich Windl
>>> emmanuel segura wrote on 10.10.2016 at 16:49 in > message > : > Node h01 (old DC) was fenced at Oct 10 10:06:33 Node h01 went down around Oct 10 10:06:37. DLM noticed that on node h05: Oct 10 10:06:44 h05 cluster-dlm[12063]: dlm_process_node: Removed inactive node 739512321: born-on=

Re: [ClusterLabs] I've been working on a split-brain prevention strategy for 2-node clusters.

2016-10-10 Thread Andrew Beekhof
On Mon, Oct 10, 2016 at 8:04 AM, Digimer wrote: > > The only geo-located/stretch cluster approach that I've seen that makes > any sense and seems genuinely safe is SUSE's 'pacemaker booth' project. Also arriving in RHEL 7.3. Might be tech preview though.

[ClusterLabs] Easy Linux Bonding Question?

2016-10-10 Thread Eric Robinson
Short version: How many missed arp_intervals does the bonding driver wait before removing the PASSIVE slave from the bond? Long version: I'm confused about this because I know the passive slave watches for the active slave's arp broadcast as a way of knowing that the passive slave's link is goo

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Pavel Levshin
Thanks for all the suggestions. It is really odd to me that this use case, which is very basic for a simple virtualization cluster, is not described in every FAQ out there... It appears that my setup is working correctly with non-symmetrical ordering constraints: Ordering Constraints: start d

Re: [ClusterLabs] Establishing Timeouts

2016-10-10 Thread Eric Robinson
> As you probably know when manual maintenance is being done you might > take advantage of that knowledge by e.g. unmanaging the cluster during that > time. We have a LOT of clusters to manage for a small team. I would prefer to just have them survive brief network outages that last for a know

Re: [ClusterLabs] Establishing Timeouts

2016-10-10 Thread Klaus Wenninger
On 10/10/2016 08:35 PM, Eric Robinson wrote: > Basically, when we turn off a switch, I want to keep the cluster from failing > over before Linux bonding has had a chance to recover. > > I'm mostly interested in preventing false-positive cluster failovers that > might occur during manual network m

Re: [ClusterLabs] Establishing Timeouts

2016-10-10 Thread Eric Robinson
Basically, when we turn off a switch, I want to keep the cluster from failing over before Linux bonding has had a chance to recover. I'm mostly interested in preventing false-positive cluster failovers that might occur during manual network maintenance (for example, testing switch and link outa

Re: [ClusterLabs] Establishing Timeouts

2016-10-10 Thread Eric Robinson
I'm mostly interested in preventing false-positive cluster failovers that might occur during manual network maintenance (for example, testing switch and link outages). >> Thanks for the clarification. So what's the easiest way to ensure that the >> cluster waits a >> desired timeout before de

Re: [ClusterLabs] Establishing Timeouts

2016-10-10 Thread Klaus Wenninger
On 10/10/2016 06:58 PM, Eric Robinson wrote: > Thanks for the clarification. So what's the easiest way to ensure that the > cluster waits a desired timeout before deciding that a re-convergence is > necessary? By raising the token (lost) timeout I would say. Please correct me (Chrissie) but I

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Klaus Wenninger
On 10/10/2016 06:56 PM, Ken Gaillot wrote: > On 10/10/2016 10:21 AM, Klaus Wenninger wrote: >> On 10/10/2016 04:54 PM, Ken Gaillot wrote: >>> On 10/10/2016 07:36 AM, Pavel Levshin wrote: 10.10.2016 15:11, Klaus Wenninger: > On 10/10/2016 02:00 PM, Pavel Levshin wrote: >> 10.10.2016 14:

Re: [ClusterLabs] Pacemaker and Oracle ASM

2016-10-10 Thread Chad Cravens
The client has specifically said that they have purposefully chosen not to implement Oracle RAC (I'm not sure why). It's a large program and the budget and technology platforms were already put in place before I was involved. In other words, I'm not sure why, not sure there's a reason, but RedHat HA Clu

Re: [ClusterLabs] Establishing Timeouts

2016-10-10 Thread Eric Robinson
Thanks for the clarification. So what's the easiest way to ensure that the cluster waits a desired timeout before deciding that a re-convergence is necessary? -- Eric Robinson -Original Message- From: Christine Caulfield [mailto:ccaul...@redhat.com] Sent: Monday, October 10, 2016

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Ken Gaillot
On 10/10/2016 10:21 AM, Klaus Wenninger wrote: > On 10/10/2016 04:54 PM, Ken Gaillot wrote: >> On 10/10/2016 07:36 AM, Pavel Levshin wrote: >>> 10.10.2016 15:11, Klaus Wenninger: On 10/10/2016 02:00 PM, Pavel Levshin wrote: > 10.10.2016 14:32, Klaus Wenninger: >> Why are the order-cons

Re: [ClusterLabs] Pacemaker and Oracle ASM

2016-10-10 Thread emmanuel segura
Why don't you use Oracle RAC with ASM? 2016-10-07 18:46 GMT+02:00 Chad Cravens : > Hello: > > I'm working on a project where the client is using Oracle ASM (volume > manager) for database storage. I have implemented a cluster before using > LVM with ext4 and understand there are resource agents (

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Klaus Wenninger
On 10/10/2016 04:54 PM, Ken Gaillot wrote: > On 10/10/2016 07:36 AM, Pavel Levshin wrote: >> 10.10.2016 15:11, Klaus Wenninger: >>> On 10/10/2016 02:00 PM, Pavel Levshin wrote: 10.10.2016 14:32, Klaus Wenninger: > Why are the order-constraints between libvirt & vms optional? If they w

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Ken Gaillot
On 10/10/2016 07:36 AM, Pavel Levshin wrote: > 10.10.2016 15:11, Klaus Wenninger: >> On 10/10/2016 02:00 PM, Pavel Levshin wrote: >>> 10.10.2016 14:32, Klaus Wenninger: Why are the order-constraints between libvirt & vms optional? >>> If they were mandatory, then all the virtual machines would

Re: [ClusterLabs] OCFS2 on cLVM with node waiting for fencing timeout

2016-10-10 Thread emmanuel segura
{ocfs2}->{dlm}->{fencing}->{timeout} 2016-10-10 16:46 GMT+02:00 Ulrich Windl : > Hi! > > I observed an interesting thing: In a three node cluster (SLES11 SP4) with > cLVM and OCFS2 on top, one node was fenced as the OCFS2 filesystem was > somehow busy on unmount. We have (for paranoid reasons ma

[ClusterLabs] OCFS2 on cLVM with node waiting for fencing timeout

2016-10-10 Thread Ulrich Windl
Hi! I observed an interesting thing: In a three-node cluster (SLES11 SP4) with cLVM and OCFS2 on top, one node was fenced as the OCFS2 filesystem was somehow busy on unmount. We have (for paranoid reasons mainly) an excessively long fencing timeout for SBD: 180 seconds. While one node was actually

Re: [ClusterLabs] I've been working on a split-brain prevention strategy for 2-node clusters.

2016-10-10 Thread Klaus Wenninger
On 10/10/2016 04:25 PM, Ken Gaillot wrote: > On 10/09/2016 11:02 PM, Digimer wrote: >> On 09/10/16 11:58 PM, Andrei Borzenkov wrote: >>> 10.10.2016 00:42, Eric Robinson wrote: Digimer, thanks for your thoughts. Booth is one of the solutions I looked at, but I don't like it because it is c

Re: [ClusterLabs] I've been working on a split-brain prevention strategy for 2-node clusters.

2016-10-10 Thread Ken Gaillot
On 10/09/2016 11:02 PM, Digimer wrote: > On 09/10/16 11:58 PM, Andrei Borzenkov wrote: >> 10.10.2016 00:42, Eric Robinson wrote: >>> Digimer, thanks for your thoughts. Booth is one of the solutions I >>> looked at, but I don't like it because it is complex and difficult to >>> implement >> >> HA is

Re: [ClusterLabs] Cluster active/active

2016-10-10 Thread Ken Gaillot
x] > Started: [ node01 ] > Stopped: [ node02 ] > root@node01:/usr/local/etc/log_zabbix# > > > zabbix log; > > root@node01:/usr/local/etc# tail -f log_zabbix/zabbix_server.log > 7360:20161010:011833.254 server #23 started [discoverer #5] > 7363:20

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Klaus Wenninger
On 10/10/2016 02:36 PM, Pavel Levshin wrote: > 10.10.2016 15:11, Klaus Wenninger: >> On 10/10/2016 02:00 PM, Pavel Levshin wrote: >>> 10.10.2016 14:32, Klaus Wenninger: Why are the order-constraints between libvirt & vms optional? >>> If they were mandatory, then all the virtual machines would

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Pavel Levshin
10.10.2016 15:11, Klaus Wenninger: On 10/10/2016 02:00 PM, Pavel Levshin wrote: 10.10.2016 14:32, Klaus Wenninger: Why are the order-constraints between libvirt & vms optional? If they were mandatory, then all the virtual machines would be restarted when libvirtd restarts. This is not desired

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Klaus Wenninger
On 10/10/2016 02:00 PM, Pavel Levshin wrote: > > 10.10.2016 14:32, Klaus Wenninger: >> Why are the order-constraints between libvirt & vms optional? > > If they were mandatory, then all the virtual machines would be > restarted when libvirtd restarts. This is not desired nor needed. When > this hap

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-10-10 Thread Klaus Wenninger
On 10/07/2016 11:10 PM, renayama19661...@ybb.ne.jp wrote: > Hi All, > > Our user may not necessarily use sbd. > > I confirmed that there was a method using WD service of corosync as one > method not to use sbd. > Pacemaker watches the process of pacemaker by WD service using CMAP and can > carry

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Pavel Levshin
10.10.2016 14:32, Klaus Wenninger: Why are the order-constraints between libvirt & vms optional? If they were mandatory, then all the virtual machines would be restarted when libvirtd restarts. This is not desired nor needed. When this happens, the node is fenced because it is unable to rest

Re: [ClusterLabs] Establishing Timeouts

2016-10-10 Thread Christine Caulfield
On 10/10/16 05:51, Eric Robinson wrote: > I have about a dozen corosync+pacemaker clusters and I am just now getting > around to understanding timeouts. > > Most of my corosync.conf files look something like this: > > version:2 > token: 5000 > token_retra

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Klaus Wenninger
On 10/10/2016 10:17 AM, Pavel Levshin wrote: > Hello. > > We are trying to migrate our services to a relatively fresh version of > cluster software. It is RHEL 7 with pacemaker 1.1.13-10. I’ve faced a > problem when live migration of virtual machines is allowed. In short, > I need to manage libvirtd,

Re: [ClusterLabs] Antw: Establishing Timeouts

2016-10-10 Thread Eric Robinson
> AFAIK, if _all_ ARP targets did not respond _once_ the link will be > considered down It would be great if someone could confirm that. > after "Down Delay". I guess you want to use multiple (and the correct ones) > ARP IP targets... Yes, I use multiple targets, and arp_all_targets=any. Do

Re: [ClusterLabs] Corosync ring shown faulty between healthy nodes & networks (rrp_mode: passive)

2016-10-10 Thread Jan Friesse
Thanks for all responses from Jan, Ulrich and Digimer ! We are already using bond'ed network interfaces, but we are also forced to go across IP-subnets. Certain routes between routers can go and have gone missing. This has happened for one of our node's public network, where it was inaccessible