Re: [ClusterLabs] Warning: handle_startup_fencing: Blind faith: not fencing unseen nodes

2016-12-14 Thread Ken Gaillot
On 12/14/2016 11:14 AM, Denis Gribkov wrote: > Hi Everyone, > > Our company has a 15-node asynchronous cluster without > FENCING/STONITH features actually configured (I think). > > The DC node log is getting tons of messages like the one in the subject: > > pengine: warning: handle_startup_fencing: Blind

Re: [ClusterLabs] Random failure with clone of IPaddr2

2016-12-15 Thread Ken Gaillot
On 12/15/2016 12:37 PM, al...@amisw.com wrote: > Hi, > > I have had some trouble for a week and can't find a solution by myself. Any > help will be really appreciated! > I have used corosync / pacemaker for 3 or 4 years and all works well, for > failover or load-balancing. > > I have shared ip between 3

Re: [ClusterLabs] Random failure with clone of IPaddr2

2016-12-15 Thread Ken Gaillot
On 12/15/2016 02:02 PM, al...@amisw.com wrote: >> >> Seeing your configuration might help. Did you set globally-unique=true >> and clone-node-max=3 on the clone? If not, the other nodes can't pick up >> the lost node's share of requests. > > Yes for both, I have globally-unique=true, and I change

Re: [ClusterLabs] question about dc-deadtime

2016-12-15 Thread Ken Gaillot
On 12/15/2016 02:00 PM, Chris Walker wrote: > Hello, > > I have a quick question about dc-deadtime. I believe that Digimer and > others on this list might have already addressed this, but I want to > make sure I'm not missing something. > > If my understanding is correct, dc-deadtime sets the am

Re: [ClusterLabs] Nodes see each other as OFFLINE - fence agent (fence_pcmk) may not be working properly on RHEL 6.5

2016-12-16 Thread Ken Gaillot
On 12/16/2016 07:46 AM, avinash shankar wrote: > > Hello team, > > I am a newbie in pacemaker and corosync cluster. > I am facing trouble with fence_agent on RHEL 6.5 > I have installed pcs, pacemaker, corosync, cman on RHEL 6.5 on two > virtual nodes (libvirt) cluster. > SELINUX and firewall is

Re: [ClusterLabs] Cluster failure

2016-12-20 Thread Ken Gaillot
On 12/20/2016 12:21 AM, Rodrick Brown wrote: > I'm fairly new to Pacemaker and have a few questions about > > the following log event and why resources were removed from my cluster. > Right before the resources were killed with SIGTERM I noticed the following > message. > Dec 18 19:18:18 clusternode38

Re: [ClusterLabs] Can packmaker launch haproxy from new network namespace automatically?

2016-12-21 Thread Ken Gaillot
On 12/17/2016 07:26 PM, Hao QingFeng wrote: > Hi Folks, > > I am installing packmaker to manage the cluster of haproxy within > openstack on ubuntu 16.04. > > I met the problem that haproxy can't start listening for some services > in vip because the related ports > > were occupied by those nati

[ClusterLabs] New ClusterLabs logo unveiled :-)

2016-12-22 Thread Ken Gaillot
need some tweaking to make the best use of it. You might not see it there immediately due to browser caching and DNS resolver caching (the wiki IP changed recently as part of an OS upgrade), but it's there. :-) Wishing everyone a happy holiday season, -- Ken Ga

Re: [ClusterLabs] Access denied when using Floating IP

2017-01-06 Thread Ken Gaillot
On 12/26/2016 12:03 AM, Kaushal Shriyan wrote: > Hi, > > I have set up Highly Available HAProxy Servers with Keepalived and > Floating IP. I have the below details > > *Master Node keepalived.conf* > > global_defs { > # Keepalived process identifier > #lvs_id haproxy_DH > } > # Script used to

Re: [ClusterLabs] centos 7 drbd fubar

2017-01-06 Thread Ken Gaillot
On 12/27/2016 03:08 PM, Dimitri Maziuk wrote: > I ran centos 7.3.1611 update over the holidays and my drbd + nfs + imap > active-passive pair locked up again. This has now been consistent for at > least 3 kernel updates. This time I had enough consoles open to run > fuser & lsof though. > > The pr

Re: [ClusterLabs] Status and help with pgsql RA

2017-01-06 Thread Ken Gaillot
On 12/28/2016 02:24 PM, Nils Carlson wrote: > Hi, > > I am looking to set up postgresql in high-availability and have been > comparing the guide at > http://wiki.clusterlabs.org/wiki/PgSQL_Replicated_Cluster with the > contents of the pgsql resource agent on github. It seems that there have > been

Re: [ClusterLabs] question about dc-deadtime

2017-01-10 Thread Ken Gaillot
> pe-input-series-max=1500 \ > > pe-error-series-max=1500 \ > > stonith-action=poweroff \ > > stonith-timeout=900 \ > > dc-deadtime=2min \ > > maintenance-mode=false \ > > h

Re: [ClusterLabs] No match for shutdown action on

2017-01-10 Thread Ken Gaillot
On 01/10/2017 11:38 AM, Denis Gribkov wrote: > Hi Everyone, > > When I run: > > # pcs resource cleanup resource_name > > I'm getting a block of messages in log on current DC node: > > Jan 10 18:12:13 node1 crmd[21635]: warning: No match for shutdown > action on node2 > Jan 10 18:12:13 node1 cr

Re: [ClusterLabs] New ClusterLabs logo unveiled :-)

2017-01-11 Thread Ken Gaillot
now if there's an "official" name, but I've been calling it the "ClusterLabs stack". > On Mon, Jan 2, 2017 at 11:35 AM, Kristoffer Grönlund <kgronl...@suse.com> wrote: > > Ken Gaillot <kgail...@redhat.com> writes: > >

Re: [ClusterLabs] simple setup and resources on different nodes??

2017-01-11 Thread Ken Gaillot
On 01/11/2017 10:10 AM, lejeczek wrote: > hi everyone, > I have a simple test setup, like this: > > $ pcs status > Cluster name: test_cluster > WARNING: corosync and pacemaker node names do not match (IPs used in > setup?) > Stack: corosync > Current DC: work2.whale.private (version 1.1.15-11.el7_

Re: [ClusterLabs] Log rotation for /var/log/debug-pcmk.log

2017-01-12 Thread Ken Gaillot
On 01/12/2017 05:33 AM, Jan Pokorný wrote: > On 11/01/17 18:46 +, Andrew Nagy wrote: >> I hope this is the right place >> We have pacemaker running (on at least one) RHEL 7.2 server with a >> huge (8+ GB) /var/log/debug-pcmk.log file. No one else here knows >> anything about pacemaker and t

Re: [ClusterLabs] just some basics

2017-01-12 Thread Ken Gaillot
On 01/12/2017 11:53 AM, lejeczek wrote: > hi everyone, > > I'm going through some introductory stuff, reading about it all and one, > ok, a few questions come to mind. > I was hoping, before I find answers somewhere at end of the docs someone > here could quickly clarify: > > It is one cluster pe

Re: [ClusterLabs] get status of each RG

2017-01-16 Thread Ken Gaillot
On 01/15/2017 10:02 AM, Florin Portase wrote: > Hello, > > We're about to create some HPOM (HP Operations Manager) monitoring > policies for a RHEL7 cluster environment. > > However, getting the status of the running defined RGs seems way too > challenging compared to rgmanager/cman > > SO,

Re: [ClusterLabs] Pacemaker cluster not working after switching from 1.0 to 1.1 (resend as plain text)

2017-01-16 Thread Ken Gaillot
A preliminary question -- what cluster layer are you running? Pacemaker 1.0 worked with heartbeat or corosync 1, while Ubuntu 14.04 ships with corosync 2 by default, IIRC. There were major incompatible changes between corosync 1 and 2, so it's important to get that right before looking at pacemake

Re: [ClusterLabs] Problems with corosync and pacemaker with error scenarios

2017-01-16 Thread Ken Gaillot
On 01/16/2017 08:56 AM, Gerhard Wiesinger wrote: > Hello, > > I'm new to corosync and pacemaker and I want to set up an nginx cluster > with quorum. > > Requirements: > - 3 Linux machines > - On 2 machines floating IP should be handled and nginx as a load > balancing proxy > - 3rd machine is for

Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-17 Thread Ken Gaillot
On 01/17/2017 08:52 AM, Ulrich Windl wrote: Oscar Segarra wrote on 17.01.2017 at 10:15 in > message > : >> Hi, >> >> Yes, I will try to explain myself better. >> >> *Initially* >> On node1 (vdicnode01-priv) >>> virsh list >> == >> vdicdb01 started >> >> On node2 (vdicnode0

Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-17 Thread Ken Gaillot
y running a one-time monitor. So it will detect anything running at that time, and start or stop services as needed to meet the configured requirements. > 2017-01-17 16:38 GMT+01:00 Ken Gaillot <kgail...@redhat.com>: > > On 01/17/2017 08

Re: [ClusterLabs] Live Guest Migration timeouts for VirtualDomain resources

2017-01-17 Thread Ken Gaillot
On 01/17/2017 10:19 AM, Scott Greenlese wrote: > Hi.. > > I've been testing live guest migration (LGM) with VirtualDomain > resources, which are guests running on Linux KVM / System Z > managed by pacemaker. > > I'm looking for documentation that explains how to configure my > VirtualDomain resou

Re: [ClusterLabs] Live Guest Migration timeouts for VirtualDomain resources

2017-01-18 Thread Ken Gaillot
e:20003ms queue-time:0ms > Jan 17 13:55:14 [27552] zs95kj lrmd: ( lrmd.c:1292 ) trace: > lrmd_rsc_execute: Nothing further to do for zs95kjg110061_res > Jan 17 13:55:14 [27555] zs95kj crmd: ( utils.c:1942 ) debug: > create_operation_update: do_update_resource: Updating resource > zs95kjg11

Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-18 Thread Ken Gaillot
On 01/18/2017 03:49 AM, Ferenc Wágner wrote: > Ken Gaillot writes: > >> * When you move the VM, the cluster detects that it is not running on >> the node you told it to keep it running on. Because there is no >> "Stopped" monitor, the cluster doesn't

Re: [ClusterLabs] Antw: Re: Live Guest Migration timeouts for VirtualDomain resources

2017-01-19 Thread Ken Gaillot
On 01/19/2017 01:36 AM, Ulrich Windl wrote: >>>> Ken Gaillot wrote on 18.01.2017 at 16:32 in >>>> message > <4b02d3fa-4693-473b-8bed-dc98f9e3f...@redhat.com>: >> On 01/17/2017 04:45 PM, Scott Greenlese wrote: >>> Ken a

Re: [ClusterLabs] how do I disable/negate resource option?

2017-01-19 Thread Ken Gaillot
On 01/19/2017 06:30 AM, lejeczek wrote: > hi all > > how can it be done? Is it possible? > many thanks, > L. Check the man page / documentation for whatever tool you're using (crm, pcs, etc.). Each one has its own syntax.

Re: [ClusterLabs] how do I disable/negate resource option?

2017-01-20 Thread Ken Gaillot
On 01/19/2017 09:52 AM, lejeczek wrote: > > > On 19/01/17 15:30, Ken Gaillot wrote: >> On 01/19/2017 06:30 AM, lejeczek wrote: >>> hi all >>> >>> how can it be done? Is it possible? >>> many thanks, >>> L. >> Check the man pag

Re: [ClusterLabs] [Question] About log collection of crm_report.

2017-01-23 Thread Ken Gaillot
On 01/23/2017 04:17 PM, renayama19661...@ybb.ne.jp wrote: > Hi All, > > When I run Pacemaker 1.1.15 and Pacemaker 1.1.16 on RHEL 7.3, the > Pacemaker-related logs are not included in the file collected by > sosreport. > > > This seems to be caused by the following change and

Re: [ClusterLabs] [Question] About log collection of crm_report.

2017-01-25 Thread Ken Gaillot
of sosreport contents? > If it is such a thing, we can understand. > > - And I test crm_report at the present, but seem to have some problems. > - I intend to report the problem by Bugzilla again. > > Best Regards, > Hideo Yamauchi. Hi Hideo, You are right, that is a p

Re: [ClusterLabs] Pacemaker kill does not cause node fault ???

2017-01-30 Thread Ken Gaillot
On 01/10/2017 04:24 AM, Stefan Schloesser wrote: > Hi, > > I am currently testing a 2 node cluster under Ubuntu 16.04. The setup seems > to be working ok including the STONITH. > For test purposes I issued a "pkill -f pace" killing all pacemaker processes > on one node. > > Result: > The node i

Re: [ClusterLabs] Antw: Resource Priority

2017-02-01 Thread Ken Gaillot
On 02/01/2017 09:07 AM, Ulrich Windl wrote: Chad Cravens wrote on 01.02.2017 at 15:52 in > message > : >> Hello Cluster Fans! >> >> I've had a great time working with the clustering software. Implementing a >> HUGE cluster solution now (100+ resources) and it's working great! >> >> I had

Re: [ClusterLabs] Live Guest Migration timeouts for VirtualDomain resources

2017-02-01 Thread Ken Gaillot
) <<< New op name / value >> >> >> Where does that original op name come from in the VirtualDomain resource >> definition? How can we get the initial meta value changed and shipped > with >> a valid operation name (i.e. migrate_to), and >> maybe a mo

Re: [ClusterLabs] Failover When Host is Up, Out of Order Logs

2017-02-01 Thread Ken Gaillot
On 01/31/2017 11:44 AM, Corey Moullas wrote: > I have been getting extremely strange behavior from a Corosync/Pacemaker > install on OVH Public Cloud servers. > > > > After hours of Googling, I thought I would try posting here to see if > somebody knows what to do. > > > > I see this in my

Re: [ClusterLabs] [Question] About a change of crm_failcount.

2017-02-02 Thread Ken Gaillot
On 02/02/2017 12:23 PM, renayama19661...@ybb.ne.jp wrote: > Hi All, > > Since the following change, the user cannot set a value other than zero with > crm_failcount. > > - [Fix: tools: implement crm_failcount command-line options correctly] >- > https://github.com/ClusterLabs/pacemaker/comm

Re: [ClusterLabs] A stop job is running for pacemaker high availability cluster manager

2017-02-02 Thread Ken Gaillot
On 02/02/2017 12:35 PM, Oscar Segarra wrote: > Hi, > > I have a two node cluster... when I try to shutdown the physical host I > get the following message in console: "a stop job is running for > pacemaker high availability cluster manager" and never stops... That would be a message from systemd

Re: [ClusterLabs] Huge amount of files in /var/lib/pacemaker/pengine

2017-02-02 Thread Ken Gaillot
On 02/02/2017 12:49 PM, Oscar Segarra wrote: > Hi, > > A lot of files appear in /var/lib/pacemaker/pengine and fill up my hard disk. > > Is there any way to avoid such a large number of files in that directory? > > Thanks in advance! Pacemaker saves the cluster state at each calculated transition. This

Re: [ClusterLabs] Failure to configure iface-bridge resource causes cluster node fence action.

2017-02-02 Thread Ken Gaillot
On 02/02/2017 02:14 PM, Scott Greenlese wrote: > Hi folks, > > I'm testing iface-bridge resource support on a Linux KVM on System Z > pacemaker cluster. > > pacemaker-1.1.13-10.el7_2.ibm.1.s390x > corosync-2.3.4-7.el7_2.ibm.1.s390x > > I created an iface-bridge resource, but specified a non-exis

Re: [ClusterLabs] A stop job is running for pacemaker high availability cluster manager

2017-02-02 Thread Ken Gaillot
g/messages sometimes has relevant messages from non-cluster components. You'd want to look for messages like "Caught 'Terminated' signal" and "Shutting down", as well as resources being stopped ("_stop_0"), then various "Disconnect" and "Stop

Re: [ClusterLabs] Pacemaker kill does not cause node fault ???

2017-02-03 Thread Ken Gaillot
On 02/03/2017 07:00 AM, RaSca wrote: > > On 03/02/2017 11:06, Ferenc Wágner wrote: >> Ken Gaillot writes: >> >>> On 01/10/2017 04:24 AM, Stefan Schloesser wrote: >>> >>>> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup &

Re: [ClusterLabs] Manage Docker service and containers with pacemaker

2017-02-03 Thread Ken Gaillot
On 02/03/2017 08:15 AM, Stephane Gaucher wrote: > Hello I am completing a proof of concept. > > Here are the facts: > An active / passive cluster: Done > A drbd partition for exchanging files for different services: Done > A shared VIP between the two nodes: Done > The docker/containers are

Re: [ClusterLabs] [Question] About a change of crm_failcount.

2017-02-03 Thread Ken Gaillot
On 02/02/2017 12:33 PM, Ken Gaillot wrote: > On 02/02/2017 12:23 PM, renayama19661...@ybb.ne.jp wrote: >> Hi All, >> >> Since the following change, the user cannot set a value other than zero with >> crm_failcount. >> >> - [Fix: tools: implement crm_f

Re: [ClusterLabs] Antw: Re: Pacemaker kill does not cause node fault ???

2017-02-06 Thread Ken Gaillot
On 02/06/2017 03:28 AM, Ulrich Windl wrote: >>>> RaSca wrote on 03.02.2017 at 14:00 in > message > <0de64981-904f-5bdb-c98f-9c59ee47b...@miamammausalinux.org>: > >> On 03/02/2017 11:06, Ferenc Wágner wrote: >>> Ken Gaillot writes: >>> &

Re: [ClusterLabs] Failure to configure iface-bridge resource causes cluster node fence action.

2017-02-06 Thread Ken Gaillot
ource and fenced the > node instead of disabling the resource. > Just checking with you to be sure. > > Thanks again.. > > Scott Greenlese ... IBM KVM on System Z Solutions Test, Poughkeepsie, N.Y. > INTERNET: swgre...@us.ibm.com > > > > Inactive hide details fo

Re: [ClusterLabs] lrmd segfault

2017-02-06 Thread Ken Gaillot
On 02/06/2017 05:47 AM, cys wrote: > Hi All. > > Recently we got an lrmd coredump. It occurred only once and we don't know how > to reproduce it. > The version we use is pacemaker-1.1.15-11. The OS is CentOS 7. > > Core was generated by `/usr/libexec/pacemaker/lrmd'. > Program terminated with sig

Re: [ClusterLabs] Antw: Re: Antw: Re: Pacemaker kill does not cause node fault ???

2017-02-07 Thread Ken Gaillot
On 02/07/2017 01:11 AM, Ulrich Windl wrote: >>>> Ken Gaillot wrote on 06.02.2017 at 16:13 in > message > <40eba339-2f46-28b8-4605-c7047e0ee...@redhat.com>: >> On 02/06/2017 03:28 AM, Ulrich Windl wrote: >>>>>> RaSca wrote on 03.02.2017 at 14:

Re: [ClusterLabs] Using "mandatory" startup order but avoiding depending clones from restart after member of parent clone fails

2017-02-08 Thread Ken Gaillot
On 02/06/2017 05:25 PM, Alejandro Comisario wrote: > guys, really happy to post my first doubt. > > i'm kinda having a "conceptual" issue that's bringing me lots of issues. > i need to ensure that the order of starting resources is mandatory, but > that is causing me a huge issue, that is if just one

Re: [ClusterLabs] Antw: Re: [Question] About a change of crm_failcount.

2017-02-09 Thread Ken Gaillot
mpute another transition paying attention to > your "failed" resource (will it try to recover it? retry the previous > transition again?). > > I would bet on crm_resource -B. Correct, crm_resource -F only simulates OCF_ERR_GENERIC, which is a soft error. It might be a ni

Re: [ClusterLabs] Using "mandatory" startup order but avoiding depending clones from restart after member of parent clone fails

2017-02-09 Thread Ken Gaillot
course, you can clean up the failure to start over (or set a failure-timeout to do that automatically). > On Thu, Feb 9, 2017 at 12:18 AM, Ken Gaillot <kgail...@redhat.com> wrote: > > On 02/06/2017 05:25 PM, Alejandro Comisario wrote: > > guys, really

Re: [ClusterLabs] Failed reload

2017-02-09 Thread Ken Gaillot
On 02/08/2017 02:15 AM, Ferenc Wágner wrote: > Hi, > > There was an interesting discussion on this list about "Doing reload > right" last July (which I still haven't digested entirely). Now I've > got a related question about the current and intended behavior: what > happens if a reload operation

Re: [ClusterLabs] Q (SLES11 SP4): Delay after node came up (info: throttle_send_command: New throttle mode: 0000 (was ffffffff))

2017-02-09 Thread Ken Gaillot
On 01/16/2017 04:25 AM, Ulrich Windl wrote: > Hi! > > I have a question: The following happened in our 3-node cluster (n1, n2, n3): > n3 was DC, n2 was offlined, n2 came online again, n1 rebooted (went > offline/online), then n2 rebooted (offline/online) > > I observed a significant delay after

Re: [ClusterLabs] Pacemaker cluster not working after switching from 1.0 to 1.1

2017-02-09 Thread Ken Gaillot
On 01/16/2017 01:16 PM, Rick Kint wrote: > >> Date: Mon, 16 Jan 2017 09:15:44 -0600 >> From: Ken Gaillot >> To: users@clusterlabs.org >> Subject: Re: [ClusterLabs] Pacemaker cluster not working after >> switching from 1.0 to 1.1 (resend as plain text) >

Re: [ClusterLabs] Problems with corosync and pacemaker with error scenarios

2017-02-09 Thread Ken Gaillot
On 01/16/2017 11:18 AM, Gerhard Wiesinger wrote: > Hello Ken, > > thank you for the answers. > > On 16.01.2017 16:43, Ken Gaillot wrote: >> On 01/16/2017 08:56 AM, Gerhard Wiesinger wrote: >>> Hello, >>> >>> I'm new to corosync and pacemake

Re: [ClusterLabs] Trouble setting up selfcompiled Apache in a pacemaker cluster on Oracle Linux 6.8

2017-02-09 Thread Ken Gaillot
On 01/16/2017 10:16 AM, Souvignier, Daniel wrote: > Hi List, > > > > I’ve got trouble getting Apache to work in a Pacemaker cluster I set up > between two Oracle Linux 6.8 hosts. The cluster itself works just fine, > but Apache won’t come up. Thing is here, this Apache is different from a > bas

Re: [ClusterLabs] two node cluster: vm starting - shutting down 15min later - starting again 15min later ... and so on

2017-02-09 Thread Ken Gaillot
On 02/09/2017 10:48 AM, Lentes, Bernd wrote: > Hi, > > i have a two node cluster with a vm as a resource. Currently i'm just testing > and playing. My vm boots and shuts down again in 15min gaps. > Surely this is related to "PEngine Recheck Timer (I_PE_CALC) just popped > (90ms)" found in th

Re: [ClusterLabs] two node cluster: vm starting - shutting down 15min later - starting again 15min later ... and so on

2017-02-10 Thread Ken Gaillot
On 02/10/2017 06:49 AM, Lentes, Bernd wrote: > > > - On Feb 10, 2017, at 1:10 AM, Ken Gaillot kgail...@redhat.com wrote: > >> On 02/09/2017 10:48 AM, Lentes, Bernd wrote: >>> Hi, >>> >>> i have a two node cluster with a vm as a resource. Curren

Re: [ClusterLabs] clone resource not get restarted on fail

2017-02-13 Thread Ken Gaillot
On 02/13/2017 07:57 AM, he.hailo...@zte.com.cn wrote: > Pacemaker 1.1.10 > > Corosync 2.3.3 > > > this is a 3-node cluster configured with 3 clone resources, each > attached with a vip resource of IPaddr2: > > > >crm status > > > Online: [ paas-controller-1 paas-controller-2 paas-controller-

Re: [ClusterLabs] Pacemaker kill does not cause node fault ???

2017-02-13 Thread Ken Gaillot
On 02/08/2017 02:45 AM, Ferenc Wágner wrote: > Ken Gaillot writes: > >> On 02/03/2017 07:00 AM, RaSca wrote: >>> >>> On 03/02/2017 11:06, Ferenc Wágner wrote: >>>> Ken Gaillot writes: >>>> >>>>> On 01/10/2017 04:24 AM,

Re: [ClusterLabs] Antw: Re: Antw: Re: Pacemaker kill does not cause node fault ???

2017-02-13 Thread Ken Gaillot
On 02/08/2017 02:49 AM, Ferenc Wágner wrote: > Ken Gaillot writes: > >> On 02/07/2017 01:11 AM, Ulrich Windl wrote: >> >>> Ken Gaillot writes: >>> >>>> On 02/06/2017 03:28 AM, Ulrich Windl wrote: >>>> >>>>> Isn

Re: [ClusterLabs] 答复: Re: clone resource not get restarted on fail

2017-02-14 Thread Ken Gaillot
On 02/13/2017 07:08 PM, he.hailo...@zte.com.cn wrote: > Hi, > > > > crm configure show > > + crm configure show > > node $id="336855579" paas-controller-1 > > node $id="336855580" paas-controller-2 > > node $id="336855581" paas-controller-3 > > primitive apigateway ocf:heartbeat:apigateway \

Re: [ClusterLabs] 答复: Re: 答复: Re: clone resource not get restarted on fail

2017-02-15 Thread Ken Gaillot
On 02/15/2017 03:57 AM, he.hailo...@zte.com.cn wrote: > I just tried using colocation, it doesn't work. > > > I failed the node paas-controller-3, but sdclient_vip didn't get moved: The colocation would work, but the problem you're having with router and apigateway is preventing it from getting

Re: [ClusterLabs] I question whether STONITH is working.

2017-02-15 Thread Ken Gaillot
On 02/15/2017 12:17 PM, dur...@mgtsciences.com wrote: > I have 2 Fedora VMs (node1, and node2) running on a Windows 10 machine > using Virtualbox. > > I began with this. > http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/ > > > When it came to fencing, I refer

Re: [ClusterLabs] MySQL Cluster: Strange behaviour when forcing movement of resources

2017-02-16 Thread Ken Gaillot
On 02/16/2017 02:26 AM, Félix Díaz de Rada wrote: > > Hi all, > > We are currently setting up a MySQL cluster (Master-Slave) over this > platform: > - Two nodes, on RHEL 7.0 > - pacemaker-1.1.10-29.el7.x86_64 > - corosync-2.3.3-2.el7.x86_64 > - pcs-0.9.115-32.el7.x86_64 > There is a IP address re

Re: [ClusterLabs] question about equal resource distribution

2017-02-17 Thread Ken Gaillot
On 02/17/2017 08:43 AM, Ilia Sokolinski wrote: > Thank you! > > What quantity does pacemaker tries to equalize - number of running resources > per node or total stickiness per node? > > Suppose I have a bunch of web server groups each with IPaddr and apache > resources, and a fewer number of da

Re: [ClusterLabs] Need help in setting up HA cluster for applications/services other than Apache tomcat.

2017-02-20 Thread Ken Gaillot
On 02/18/2017 10:55 AM, Chad Cravens wrote: > Hello Vijay: > > it seems you may want to consider developing custom Resource Agents. > Take a look at the following guide: > http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html > > I have created several, it is pretty straightforward and has alw

Re: [ClusterLabs] Adding a node to the cluster deployed with "Unicast configuration"

2017-02-22 Thread Ken Gaillot
On 02/22/2017 08:44 AM, Alejandro Comisario wrote: > Hi everyone, i have a problem when scaling a corosync/pacemaker > cluster deployed using unicast. > > eg on corosync.conf. > > nodelist { > node { > ring0_addr: 10.10.0.10 > nodeid: 1 > } > node { > ring0_addr: 10.10.0.11 > nodeid: 2 > } > } >

Re: [ClusterLabs] Adding a node to the cluster deployed with "Unicast configuration"

2017-02-22 Thread Ken Gaillot
On 02/22/2017 09:55 AM, Jan Friesse wrote: > Alejandro Comisario napsal(a): >> Hi everyone, i have a problem when scaling a corosync/pacemaker >> cluster deployed using unicast. >> >> eg on corosync.conf. >> >> nodelist { >> node { >> ring0_addr: 10.10.0.10 >> nodeid: 1 >> } >> node { >> ring0_addr

Re: [ClusterLabs] Insert delay between the statup of VirtualDomain

2017-02-23 Thread Ken Gaillot
On 02/23/2017 01:51 PM, Oscar Segarra wrote: > Hi, > > In my environment I have 5 guests that have to be started up in a > specified order starting with the MySQL database server. > > I have set the order constraints and VirtualDomains start in the right > order but, the problem I have, is that

Re: [ClusterLabs] Never join a list without a problem...

2017-02-24 Thread Ken Gaillot
On 02/24/2017 08:36 AM, Jeffrey Westgate wrote: > Greetings all. > > I have inherited a pair of Scientific Linux 6 boxes used as front-end load > balancers for our DNS cluster. (Yes, I inherited that, too.) > > It was time to update them so we pulled snapshots (they are VMWare VMs, very > small

Re: [ClusterLabs] Ordering Sets of Resources

2017-02-26 Thread Ken Gaillot
On 02/25/2017 03:35 PM, iva...@libero.it wrote: > Hi all, > i have configured a two node cluster on redhat 7. > > Because I need to manage resources stopping and starting singularly when > they are running I have configured cluster using order set constraints. > > Here the example > > Ordering C

Re: [ClusterLabs] R: Re: Ordering Sets of Resources

2017-02-27 Thread Ken Gaillot
to ensure that things are stopped in order. If your goal is to start and stop them in order *if* they're both starting or stopping, but not *require* it, then you want kind=Optional instead of Mandatory. > > >> ----Original message >> From: "Ken Gaillot" >

Re: [ClusterLabs] Never join a list without a problem...

2017-02-27 Thread Ken Gaillot
On 02/27/2017 01:48 PM, Jeffrey Westgate wrote: > I think I may be on to something. It seems that every time my boxes start > showing increased host load, the preceding change that takes place is: > > crmd: info: throttle_send_command: New throttle mode: 0100 (was > ) > > I'm at

Re: [ClusterLabs] Antw: Re: Ordering Sets of Resources

2017-03-01 Thread Ken Gaillot
On 03/01/2017 01:36 AM, Ulrich Windl wrote: >>>> Ken Gaillot wrote on 26.02.2017 at 20:04 in >>>> message > : >> On 02/25/2017 03:35 PM, iva...@libero.it wrote: >>> Hi all, >>> i have configured a two node cluster on redhat 7. >>>

Re: [ClusterLabs] Cannot clone clvmd resource

2017-03-01 Thread Ken Gaillot
On 03/01/2017 03:49 PM, Anne Nicolas wrote: > Hi there > > > I'm testing quite an easy configuration to work on clvm. I'm just > going crazy as it seems clvmd cannot be cloned on other nodes. > > clvmd starts fine on node1 but fails on both node2 and node3. Your config looks fine, so I'm going

Re: [ClusterLabs] R: Re: Antw: Re: Ordering Sets of Resources

2017-03-01 Thread Ken Gaillot
>> Original message >> From: "Ken Gaillot" >> Date: 01/03/2017 15.57 >> To: "Ulrich Windl", >> Subject: Re: [ClusterLabs] Antw: Re: Ordering Sets of Resources >> >> On 03/01/2017 01:36 AM, Ulrich Windl wrote: >>>>>

Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

2017-03-02 Thread Ken Gaillot
On 03/01/2017 05:28 PM, Andrew Beekhof wrote: > On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg > wrote: >> When I recently tried to make use of the DEGRADED monitoring results, >> I found out that it still does not work. >> >> Because LRMD chooses to filter them in ocf2uniform_rc(), >> and maps t

Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

2017-03-06 Thread Ken Gaillot
On 03/06/2017 10:55 AM, Lars Ellenberg wrote: > On Thu, Mar 02, 2017 at 05:31:33PM -0600, Ken Gaillot wrote: >> On 03/01/2017 05:28 PM, Andrew Beekhof wrote: >>> On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg >>> wrote: >>>> When I recently tried to make

Re: [ClusterLabs] resource was disabled automatically

2017-03-06 Thread Ken Gaillot
On 03/06/2017 03:49 AM, cys wrote: > Hi, > > Today I found one resource was disabled. I checked that nobody did it. > The logs showed crmd (or pengine?) stopped it. I don't know why. > So I want to know: will pacemaker disable a resource automatically? > If so, when and why? > > Thanks. Pacemaker

Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

2017-03-06 Thread Ken Gaillot
On 03/06/2017 04:15 PM, Lars Ellenberg wrote: > On Mon, Mar 06, 2017 at 12:35:18PM -0600, Ken Gaillot wrote: >>>>>> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c >>>>>> index 724edb7..39a7dd1 100644 >>>>>> --- a/lrmd/lrmd.c >>>>>&

Re: [ClusterLabs] resource was disabled automatically

2017-03-07 Thread Ken Gaillot
On 03/06/2017 08:29 PM, cys wrote: > At 2017-03-07 05:47:19, "Ken Gaillot" wrote: >> To figure out why a resource was stopped, you want to check the logs on >> the DC (which will be the node with the most "pengine:" messages around >> that time). When the

Re: [ClusterLabs] VirtualDomain as non-root / encrypted

2017-03-08 Thread Ken Gaillot
On 03/08/2017 04:19 AM, philipp.achmuel...@arz.at wrote: > hi, > > Any ideas how to run VirtualDomain Resource as non-root user with > encrypted transport to remote hypervisor(ssh)? > > i'm able to start/stop/migrate vm via libvirt as non-root, but it > doesn't work with pacemaker - pacemaker run

Re: [ClusterLabs] Antw: Re: Never join a list without a problem...

2017-03-08 Thread Ken Gaillot

Re: [ClusterLabs] Failover question

2017-03-15 Thread Ken Gaillot
Sure, just add a colocation constraint for virtual_ip with proxy. On 03/15/2017 05:06 AM, Frank Fiene wrote: > Hi, > > Another beginner question: > > I have configured a virtual IP resource on two hosts and an apache resource > cloned on both machines like this > > pcs resource create virtual_

Re: [ClusterLabs] Failover question

2017-03-16 Thread Ken Gaillot
onfigured apache as a clone, so it will run on all nodes, regardless of where the IP is -- but the IP would only be placed where apache is successfully running. >> On 15.03.2017 at 15:15, Ken Gaillot wrote: >> >> Sure, just add a colocation constraint for virtual_ip with proxy. &

Re: [ClusterLabs] CIB configuration: role with many expressions - error 203

2017-03-21 Thread Ken Gaillot
On 03/21/2017 11:20 AM, Radoslaw Garbacz wrote: > Hi, > > I have a problem when creating rules with many expressions: > > > boolean-op="and"> >id="on_nodes_dbx_first_head-expr" value="Active"/> >id="on_nodes_dbx_first_head-expr" value="AH"/> > >

Re: [ClusterLabs] Antw: Running two independent clusters

2017-03-22 Thread Ken Gaillot
On 03/22/2017 05:23 AM, Nikhil Utane wrote: > Hi Ulrich, > > It's not an option unfortunately. > Our product runs on a specialized hardware and provides both the > services (A & B) that I am referring to. Hence I cannot have service A > running on some nodes as cluster A and service B running on o

Re: [ClusterLabs] Antw: Re: CIB configuration: role with many expressions - error 203

2017-03-22 Thread Ken Gaillot
ving either "boolean-op" or "boolean_op" or no such phrase at all > with more than one "expression" - does not work > > > > I have found the reason: the expression IDs within the rule were the same; once > I made them unique it works. > > > Thanks,
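
The cause reported in this thread (two expressions in one rule sharing the same id) can be checked mechanically before loading a CIB fragment. The following is only a minimal sketch, not anything from Pacemaker itself: the rule id, the score, and the attribute names attr_a and attr_b are hypothetical placeholders, while the duplicated expression id and the two values come from the question quoted in the thread.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical rule fragment. Only the duplicated expression id and the
# values "Active" and "AH" come from the thread; the rule id, score and
# attribute names are placeholders for illustration.
RULE_XML = """
<rule id="on_nodes_dbx_first_head" score="INFINITY" boolean-op="and">
  <expression id="on_nodes_dbx_first_head-expr" attribute="attr_a" operation="eq" value="Active"/>
  <expression id="on_nodes_dbx_first_head-expr" attribute="attr_b" operation="eq" value="AH"/>
</rule>
"""


def duplicate_ids(xml_text):
    """Return a sorted list of id values that occur more than once."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and every descendant.
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(duplicate_ids(RULE_XML))
```

Giving the second expression a distinct id (for example by appending a suffix) makes the function return an empty list, which matches the fix described in the message.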

Re: [ClusterLabs] Antw: Running two independent clusters

2017-03-23 Thread Ken Gaillot
set up the VMs as guest nodes if you want to monitor and manage multiple services within them. If your services require hardware access that's not easily passed to a VM, containerizing the services might be a better option. > On Wed, Mar 22, 2017 at 8:06 PM, Ken Gaillot <mailto:kgail..

Re: [ClusterLabs] error: The cib process (17858) exited: Key has expired (127)

2017-03-24 Thread Ken Gaillot
On 03/24/2017 08:06 AM, Rens Houben wrote: > I recently upgraded a two-node cluster (named 'castor' and 'pollux' > because I should not be allowed to think up computer names before I've > had my morning caffeine) from Debian wheezy to Jessie after the > backports for corosync and pacemaker finally

Re: [ClusterLabs] stonith in dual HMC environment

2017-03-24 Thread Ken Gaillot
On 03/22/2017 09:42 AM, Alexander Markov wrote: > >> Please share your config along with the logs from the nodes that were >> affected. > > I'm starting to think it's not about how to define stonith resources. If > the whole box is down with all the logical partitions defined, then HMC > cannot d

Re: [ClusterLabs] error: The cib process (17858) exited: Key has expired (127)

2017-03-24 Thread Ken Gaillot
stemec > Facebook <https://www.facebook.com/systemecbv> Systemec Linkedin > <http://www.linkedin.com/company/systemec-b.v.> Systemec Youtube > <http://www.youtube.com/user/systemec1> > > > > From: Ken Gaillot >

Re: [ClusterLabs] Three node cluster becomes completely fenced if one node leaves

2017-03-24 Thread Ken Gaillot
On 03/24/2017 03:52 PM, Digimer wrote: > On 24/03/17 04:44 PM, Seth Reid wrote: >> I have a three node Pacemaker/GFS2 cluster on Ubuntu 16.04. It's not in >> production yet because I'm having a problem during fencing. When I >> disable the network interface of any one machine, the disabled machines

Re: [ClusterLabs] pending actions

2017-03-24 Thread Ken Gaillot
On 03/07/2017 04:13 PM, Jehan-Guillaume de Rorthais wrote: > Hi, > > Occasionally, I find my cluster with one pending action not being executed for > some minutes (I guess until the "PEngine Recheck Timer" elapse). > > Running "crm_simulate -SL" shows the pending actions. > > I'm still confused

Re: [ClusterLabs] Create ressource to monitor each IPSEC VPN

2017-03-24 Thread Ken Gaillot
On 03/09/2017 01:44 AM, Damien Bras wrote: > Hi, > > > > We have a 2 nodes cluster with ipsec (libreswan). > > Actually we have a resource to monitor the service ipsec (via system). > > > > But now I would like to monitor each VPN. Is there a way to do that ? > Which agent could I use for

Re: [ClusterLabs] Three node cluster becomes completely fenced if one node leaves

2017-03-27 Thread Ken Gaillot
On 03/27/2017 03:54 PM, Seth Reid wrote: > > > > On Fri, Mar 24, 2017 at 2:10 PM, Ken Gaillot <mailto:kgail...@redhat.com>> wrote: > > On 03/24/2017 03:52 PM, Digimer wrote: > > On 24/03/17 04:44 PM, Seth Reid wrote: > >> I have a three n

Re: [ClusterLabs] stonith in dual HMC environment

2017-03-28 Thread Ken Gaillot
On 03/28/2017 08:20 AM, Alexander Markov wrote: > Hello, Dejan, > >> Why? I don't have a test system right now, but for instance this >> should work: >> >> $ stonith -t ibmhmc ipaddr=10.1.2.9 -lS >> $ stonith -t ibmhmc ipaddr=10.1.2.9 -T reset {nodename} > > Ah, I see. Everything (including stoni

Re: [ClusterLabs] cloned resource not deployed on all matching nodes

2017-03-28 Thread Ken Gaillot
On 03/28/2017 01:26 PM, Radoslaw Garbacz wrote: > Hi, > > I have a situation when a cloned resource is being deployed only on some > of the nodes, even though this resource is similar to others, which are > being deployed according to location rules properly. > > Please take a look at the configu

Re: [ClusterLabs] Antw: Running two independent clusters

2017-03-30 Thread Ken Gaillot
Not yet, we've been tweaking the syntax a bit, so I wanted to have something more final first. But it's very close. > > On Thu, Mar 23, 2017 at 7:35 PM, Ken Gaillot <mailto:kgail...@redhat.com>> wrote: > > On 03/22/2017 11:08 PM, Nikhil Utane wrote: >

Re: [ClusterLabs] Syncing data and reducing CPU utilization of cib process

2017-03-31 Thread Ken Gaillot
On 03/31/2017 06:44 AM, Nikhil Utane wrote: > We are seeing this log in pacemaker.log continuously. > > Mar 31 17:13:01 [6372] 0005B932ED72cib: info: > crm_compress_string: Compressed 436756 bytes into 14635 (ratio 29:1) in > 284ms > > This looks to be the reason for high CPU. What d

[ClusterLabs] Coming in Pacemaker 1.1.17: container bundles

2017-03-31 Thread Ken Gaillot
resource inside the container. The feature is currently experimental and will likely get significant bugfixes throughout the coming release cycle, but the syntax is stable and likely what will be released. I intend to add a more detailed walk-through example to th
