Re: [ClusterLabs] to change resource id - how?

2017-04-03 Thread Ken Gaillot
On 04/03/2017 06:35 AM, lejeczek wrote: > hi > I'm sroogling and reading but cannot find any info - how to > (programmatically) change resources ids? In other words: how to rename > these entities? > many thanks > L As far as I know, higher-level tools don't support this directly -- you have to ed

Re: [ClusterLabs] Antw: Coming in Pacemaker 1.1.17: container bundles

2017-04-03 Thread Ken Gaillot
On 04/03/2017 02:12 AM, Ulrich Windl wrote: >>>> Ken Gaillot schrieb am 01.04.2017 um 00:43 in >>>> Nachricht > <981d420d-73b2-3f24-a67c-e9c66dafb...@redhat.com>: > > [...] >> Pacemaker 1.1.17 introduces a new type of resource: the "bundle

[ClusterLabs] Coming in Pacemaker 1.1.17: Per-operation fail counts

2017-04-03 Thread Ken Gaillot
cemaker 1.1.17. [1] http://lists.clusterlabs.org/pipermail/users/2016-September/004096.html -- Ken Gaillot ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting

Re: [ClusterLabs] Antw: Coming in Pacemaker 1.1.17: Per-operation fail counts

2017-04-04 Thread Ken Gaillot
On 04/04/2017 01:18 AM, Ulrich Windl wrote: >>>> Ken Gaillot schrieb am 03.04.2017 um 17:00 in >>>> Nachricht > : >> Hi all, >> >> Pacemaker 1.1.17 will have a significant change in how it tracks >> resource failures, though the change

Re: [ClusterLabs] STONITH not communicated back to initiator until token expires

2017-04-04 Thread Ken Gaillot
On 03/13/2017 10:43 PM, Chris Walker wrote: > Thanks for your reply Digimer. > > On Mon, Mar 13, 2017 at 1:35 PM, Digimer > wrote: > > On 13/03/17 12:07 PM, Chris Walker wrote: > > Hello, > > > > On our two-node EL7 cluster (pacemaker: 1.1.15-11.el7_3.4;

Re: [ClusterLabs] how start resources on the last running node

2017-04-05 Thread Ken Gaillot
On 04/04/2017 10:01 AM, Ján Poctavek wrote: > Hi, > > I came here to ask for some inspiration about my cluster setup. > > I have 3-node pcs+corosync+pacemaker cluster. When majority of nodes > exist in the cluster, everything is working fine. But what recovery > options do I have when I lose 2 of

Re: [ClusterLabs] cloned resources ordering and remote nodes problem

2017-04-06 Thread Ken Gaillot
On 04/06/2017 09:32 AM, Radoslaw Garbacz wrote: > Hi, > > > I have a question regarding resources order settings. > > Having cloned resources: "res_1-clone", "res_2-clone", > and defined order: first "res_1-clone" then "res_2-clone" > > When I have a monitoring failure on a remote node with "

Re: [ClusterLabs] cluster does not detect kill on pacemaker process ?

2017-04-07 Thread Ken Gaillot
On 04/05/2017 05:16 PM, neeraj ch wrote: > Hello All, > > I noticed something on our pacemaker test cluster. The cluster is > configured to manage an underlying database using master slave primitive. > > I ran a kill on the pacemaker process, all the other nodes kept showing > the node online.

Re: [ClusterLabs] Can't See Why This Cluster Failed Over

2017-04-07 Thread Ken Gaillot
On 04/07/2017 12:58 PM, Eric Robinson wrote: > Somebody want to look at this log and tell me why the cluster failed over? > All we did was add a new resource. We've done it many times before without > any problems. > > -- > > Apr 03 08:50:30 [22762] ha14acib: info: cib_process_reque

Re: [ClusterLabs] [Problem] The crmd causes an error of xml.

2017-04-07 Thread Ken Gaillot
On 04/06/2017 08:49 AM, renayama19661...@ybb.ne.jp wrote: > Hi All, > > I confirmed a development edition of Pacemaker. > - > https://github.com/ClusterLabs/pacemaker/tree/71dbd128c7b0a923c472c8e564d33a0ba1816cb5 > > > property no-quorum-policy="ignore" \ > stonith-enabled="true" \

Re: [ClusterLabs] cluster does not detect kill on pacemaker process ?

2017-04-07 Thread Ken Gaillot
lfunctioning node. The rest of the cluster would then use stonith to disable that node, so it could safely recover its services elsewhere. > On Fri, Apr 7, 2017 at 7:58 AM, Ken Gaillot <mailto:kgail...@redhat.com>> wrote: > > On 04/05/2017 05:16 PM, neeraj ch wrote: >

Re: [ClusterLabs] Antw: Re: Rename option group resource id with pcs

2017-04-11 Thread Ken Gaillot
On 04/11/2017 05:48 AM, Ulrich Windl wrote: Dejan Muhamedagic schrieb am 11.04.2017 um 11:43 in > Nachricht <20170411094352.GD8414@tuttle.homenet>: >> Hi, >> >> On Tue, Apr 11, 2017 at 10:50:56AM +0200, Tomas Jelinek wrote: >>> Dne 11.4.2017 v 08:53 SAYED, MAJID ALI SYED AMJAD ALI napsal(a):

Re: [ClusterLabs] Surprising semantics of location constraints with INFINITY score

2017-04-11 Thread Ken Gaillot
On 04/11/2017 08:30 AM, Kristoffer Grönlund wrote: > Hi all, > > I discovered today that a location constraint with score=INFINITY > doesn't actually restrict resources to running only on particular > nodes. From what I can tell, the constraint assigns the score to that > node, but doesn't change

Re: [ClusterLabs] Pacemaker for Embedded Systems

2017-04-11 Thread Ken Gaillot
On 04/10/2017 03:58 PM, Chad Cravens wrote: > Hello all: > > we have implemented large cluster solutions for complex server > environments that had databases, application servers, apache web servers > and implemented fencing with the IPMI fencing agent. > > However, we are considering if pacemake

Re: [ClusterLabs] Coming in Pacemaker 1.1.17: container bundles

2017-04-17 Thread Ken Gaillot
On 04/13/2017 07:04 AM, Jan Pokorný wrote: > On 03/04/17 09:47 -0500, Ken Gaillot wrote: >> On 04/03/2017 02:12 AM, Ulrich Windl wrote: >>>>>> Ken Gaillot schrieb am 01.04.2017 um 00:43 in >>>>>> Nachricht >>> <981d420d-73b2-3f24-a67c-e9c6

Re: [ClusterLabs] KVM virtualdomain - stopped

2017-04-17 Thread Ken Gaillot
On 04/13/2017 03:01 AM, Jaco van Niekerk wrote: > > Hi > > I am having endless problems with ocf::heartbeat:VirtualDomain when > failing over to second node. The virtualdomain goes into a stopped state > > virtdom_compact (ocf::heartbeat:VirtualDomain): Stopped > > * virtdom_compact_start_0 on

Re: [ClusterLabs] nodes ID assignment issue

2017-04-17 Thread Ken Gaillot
On 04/13/2017 10:40 AM, Radoslaw Garbacz wrote: > Hi, > > I have a question regarding building CIB nodes scope and specifically > assignment to node IDs. > It seems like the preexisting scope is not honored and nodes can get > replaced based on check-in order. > > I pre-create the nodes scope bec

Re: [ClusterLabs] Why shouldn't one store resource configuration in the CIB?

2017-04-17 Thread Ken Gaillot
On 04/13/2017 11:11 AM, Ferenc Wágner wrote: > Hi, > > I encountered several (old) statements on various forums along the lines > of: "the CIB is not a transactional database and shouldn't be used as > one" or "resource parameters should only uniquely identify a resource, > not configure it" and "

Re: [ClusterLabs] How to force remove a cluster node?

2017-04-17 Thread Ken Gaillot
On 04/13/2017 01:11 PM, Scott Greenlese wrote: > Hi, > > I need to remove some nodes from my existing pacemaker cluster which are > currently unbootable / unreachable. > > Referenced > https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-18 Thread Ken Gaillot
On 04/18/2017 02:47 AM, Ulrich Windl wrote: Digimer schrieb am 16.04.2017 um 20:17 in Nachricht > <12cde13f-8bad-a2f1-6834-960ff3afc...@alteeve.ca>: >> On 16/04/17 01:53 PM, Eric Robinson wrote: >>> I was reading in "Clusters from Scratch" where Beekhof states, "Some would > >> argue that tw

Re: [ClusterLabs] lvm on shared storage and a lot of...

2017-04-18 Thread Ken Gaillot
On 04/18/2017 09:14 AM, lejeczek wrote: > > > On 18/04/17 14:45, Digimer wrote: >> On 18/04/17 07:31 AM, lejeczek wrote: >>> .. device_block & device_unblock in dmesg. >>> >>> and I see that the LVM resource would fail. >>> This to me seems to happen randomly, or I fail to spot a pattern. >>> >>>

Re: [ClusterLabs] Why shouldn't one store resource configuration in the CIB?

2017-04-18 Thread Ken Gaillot
On 04/18/2017 11:46 AM, Ferenc Wágner wrote: > Ken Gaillot writes: > >> On 04/13/2017 11:11 AM, Ferenc Wágner wrote: >> >>> I encountered several (old) statements on various forums along the lines >>> of: "the CIB is not a transactional database and s

Re: [ClusterLabs] Wtrlt: Antw: Re: Antw: Re: how important would you consider to have two independent fencing device for each node ?

2017-04-20 Thread Ken Gaillot
On 04/20/2017 01:43 AM, Ulrich Windl wrote: > Should have gone to the list... > > Digimer schrieb am 19.04.2017 um 17:20 in Nachricht >> <600637f1-fef8-0a3d-821c-7aecfa398...@alteeve.ca>: >>> On 19/04/17 02:38 AM, Ulrich Windl wrote: >>> Digimer schrieb am 18.04.2017 um 19:08 in > Nachri

Re: [ClusterLabs] Colocation of a primitive resource with a clone with limited copies

2017-04-20 Thread Ken Gaillot
On 04/20/2017 10:52 AM, Jan Wrona wrote: > Hello, > > my problem is closely related to the thread [1], but I didn't find a > solution there. I have a resource that is set up as a clone C restricted > to two copies (using the clone-max=2 meta attribute||), because the > resource takes long time to

Re: [ClusterLabs] starting primitive resources of a group without starting the complete group - unclear behaviour

2017-04-20 Thread Ken Gaillot
On 04/20/2017 02:53 PM, Lentes, Bernd wrote: > Hi, > > just for the sake of completeness i'd like to figure out what happens if i > start one resource, which is a member of a group, but only this resource. > I'd like to see what the other resources of that group are doing. Also if it > maybe doe

Re: [ClusterLabs] starting primitive resources of a group without starting the complete group - unclear behaviour

2017-04-21 Thread Ken Gaillot
On 04/21/2017 04:38 AM, Lentes, Bernd wrote: > > > - On Apr 21, 2017, at 1:24 AM, Ken Gaillot kgail...@redhat.com wrote: > >> On 04/20/2017 02:53 PM, Lentes, Bernd wrote: > >> >> target-role=Stopped prevents a resource from being started. >> >>

Re: [ClusterLabs] starting primitive resources of a group without starting the complete group - unclear behaviour

2017-04-21 Thread Ken Gaillot
On 04/21/2017 07:52 AM, Lentes, Bernd wrote: > > > - On Apr 21, 2017, at 11:38 AM, Bernd Lentes > bernd.len...@helmholtz-muenchen.de wrote: > >> - On Apr 21, 2017, at 1:24 AM, Ken Gaillot kgail...@redhat.com wrote: >> >>> On 04/20/2017 02:53 PM, Le

Re: [ClusterLabs] Colocation of a primitive resource with a clone with limited copies

2017-04-21 Thread Ken Gaillot
On 04/21/2017 07:14 AM, Vladislav Bogdanov wrote: > 20.04.2017 23:16, Jan Wrona wrote: >> On 20.4.2017 19:33, Ken Gaillot wrote: >>> On 04/20/2017 10:52 AM, Jan Wrona wrote: >>>> Hello, >>>> >>>> my problem is closely related to the thread

Re: [ClusterLabs] [ClusterLabs Developers] checking all procs on system enough during stop action?

2017-04-24 Thread Ken Gaillot
On 04/24/2017 10:32 AM, Jehan-Guillaume de Rorthais wrote: > On Mon, 24 Apr 2017 17:08:15 +0200 > Lars Ellenberg wrote: > >> On Mon, Apr 24, 2017 at 04:34:07PM +0200, Jehan-Guillaume de Rorthais wrote: >>> Hi all, >>> >>> In the PostgreSQL Automatic Failover (PAF) project, one of most frequent >>

Re: [ClusterLabs] can't live migrate VirtualDomain which is part of a group

2017-04-24 Thread Ken Gaillot
On 04/24/2017 01:52 PM, Lentes, Bernd wrote: > > > - On Apr 24, 2017, at 8:26 PM, Bernd Lentes > bernd.len...@helmholtz-muenchen.de wrote: > >> Hi, >> >> i have a primitive VirtualDomain resource which i can live migrate without >> any >> problem. >> Additionally i have an IP as a resource

[ClusterLabs] Coming in Pacemaker 1.1.17: start a node in standby

2017-04-24 Thread Ken Gaillot
nently to "online", and any manual setting of standby mode would be overwritten at the next boot. Many thanks to developers Alexandra Zhuravleva and Sergey Mishin, who contributed this feature as part of a project with EMC. -- Ken Gaillot

Re: [ClusterLabs] can't live migrate VirtualDomain which is part of a group

2017-04-24 Thread Ken Gaillot
On 04/24/2017 02:33 PM, Lentes, Bernd wrote: > > - On Apr 24, 2017, at 9:11 PM, Ken Gaillot kgail...@redhat.com wrote: > >>>> primitive prim_vnc_ip_mausdb IPaddr \ >>>>params ip=146.107.235.161 nic=br0 cidr_netmask=24 \ >>>>meta is

Re: [ClusterLabs] can't live migrate VirtualDomain which is part of a group

2017-04-25 Thread Ken Gaillot
On 04/25/2017 09:14 AM, Lentes, Bernd wrote: > > > - On Apr 24, 2017, at 11:11 PM, Ken Gaillot kgail...@redhat.com wrote: > >> On 04/24/2017 02:33 PM, Lentes, Bernd wrote: >>> >>> ----- On Apr 24, 2017, at 9:11 PM, Ken Gaillot kgail...@re

Re: [ClusterLabs] Problem with clone ClusterIP

2017-04-25 Thread Ken Gaillot
On 04/25/2017 09:32 AM, Bratislav Petkovic wrote: > I want to make active/active cluster with two physical servers. > > On the servers are installed: oraclelinux-release-7.2-1.0.5.el7.x86_64, > > Pacemaker 1.1.13-10.el7, Corosync Cluster Engine, version '2.3.4', > > pcs 0.9.143. Cluster starts w

Re: [ClusterLabs] Problem with clone ClusterIP

2017-04-26 Thread Ken Gaillot
On 04/26/2017 02:45 AM, Bratislav Petkovic wrote: > Tahank you, > > > > We use the Cisco Nexus 7000 switches, they support Multicast MAC. > > It is possible that something is not configured correctly. > > In this environment working IBM PowerHA SystemMirror 7.1 (use Multicast) > without prob

[ClusterLabs] IPaddr2 cloning inside containers

2017-04-26 Thread Ken Gaillot
to anyone thinking about it. Pacemaker's new bundle feature doesn't support cloning the IPs it creates, but that might be an interesting future feature if this issue is resolved. -- Ken Gaillot

Re: [ClusterLabs] in standby but still running resources..

2017-04-27 Thread Ken Gaillot
On 04/27/2017 08:29 AM, lejeczek wrote: > .. is this ok? > > hi guys, > > pcs shows no errors after I did standby node, but pcs shows resources > still are being ran on the node I just stoodby. > Is this normal? > > 0.9.152 @C7.3 > thanks > P. That should happen only for as long as it takes to

Re: [ClusterLabs] resource group vs colocation

2017-04-27 Thread Ken Gaillot
On 04/27/2017 02:02 PM, lejeczek wrote: > hi everyone > > I have a group and I'm trying to colocate - sounds strange - order with > the group is not how I want it. > I was hoping that with colocation set I can reorder the resources - can > I? Because .. something, or my is not getting there. > I h

Re: [ClusterLabs] should such a resource set work?

2017-04-28 Thread Ken Gaillot
On 04/28/2017 08:17 AM, lejeczek wrote: > hi everybody > > I have a set: > > set IP2 IP2 IP2 LVM(exclusive) mountpoint smb smartd sequential=true ^^^ Is this a typo? > setoptions score=INFINITY > > it should work, right? > > yet when I standby a node and I see cluster jumps stra

Re: [ClusterLabs] Question about fence_mpath

2017-04-28 Thread Ken Gaillot
On 04/28/2017 03:37 PM, Chris Adams wrote: > Once upon a time, Seth Reid said: >> This confused me too when I set up my cluster. I found that everything >> worked better if I didn't specify a device path. I think there was >> documentation on Redhat that led me to try removing the "device" options

Re: [ClusterLabs] How to fence cluster node when SAN filesystem fail

2017-05-02 Thread Ken Gaillot
Hi, Upstream documentation on fencing in Pacemaker is available at: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm139683949958512 Higher-level tools such as crm shell and pcs make it easier; see their man pages and other documentation for detail

Re: [ClusterLabs] Resources still retains in primary node

2017-05-03 Thread Ken Gaillot
On 05/03/2017 02:30 AM, pillai bs wrote: > Hi Experts!!! > > Am having two node HA setup (Primary/Secondary) with > separate resources for Home/data/logs/Virtual IP.. As known the Expected > behavior should be , if Primary node went down, secondary has to take > in-charge (meaning in

Re: [ClusterLabs] Resources still retains in Primary Node even though its interface went down

2017-05-03 Thread Ken Gaillot
On 05/03/2017 02:43 AM, pillai bs wrote: > Hi Experts!!! > > Am having two node setup for HA (Primary/Secondary) with > separate resources for Home/data/logs/Virtual IP.. As known the Expected > behavior should be , if Primary node went down, secondary has to take > in-charge (meanin

Re: [ClusterLabs] stonith device locate on same host in active/passive cluster

2017-05-04 Thread Ken Gaillot
On 05/03/2017 09:04 PM, Albert Weng wrote: > Hi Marek, > > Thanks your reply. > > On Tue, May 2, 2017 at 5:15 PM, Marek Grac > wrote: > > > > On Tue, May 2, 2017 at 11:02 AM, Albert Weng > wrote: > > > Hi Marek, > >

Re: [ClusterLabs] crm_mon -h (writing to a html-file) not showing all desired information and having trouble with the -d option

2017-05-08 Thread Ken Gaillot
On 05/08/2017 11:13 AM, Lentes, Bernd wrote: > Hi, > > playing around with my cluster i always have a shell with crm_mon running > because it provides me a lot of useful and current information concerning > cluster, nodes, resources ... > Normally i have a "crm_mon -nrfRAL" running. > I'd like t

Re: [ClusterLabs] Instant service restart during failback

2017-05-08 Thread Ken Gaillot
If you look in the logs when the node comes back, there should be some "pengine:" messages noting that the restarts will be done, and then a "saving inputs in " message. If you can attach that file (both with and without the constraint changes would be ideal), I'll take a look at it. On 04/21/2017

Re: [ClusterLabs] pacemaker daemon shutdown time with lost remote node

2017-05-08 Thread Ken Gaillot
On 04/28/2017 02:22 PM, Radoslaw Garbacz wrote: > Hi, > > I have a question regarding pacemaker daemon shutdown > procedure/configuration. > > In my case, when a remote node is lost pacemaker needs exactly 10minutes > to shutdown, during which there is nothing logged. > So my questions: > 1. What

Re: [ClusterLabs] Antw: Antw: notice: throttle_handle_load: High CPU load detected

2017-05-08 Thread Ken Gaillot
> > break; > > } > > return jobs; > > > > > > The thing is, I know that there is "High CPU load" and this is normal > > state, but I wont Pacemaker to not saying it to me and treat this

Re: [ClusterLabs] Antw: Behavior after stop action failure with the failure-timeout set and STONITH disabled

2017-05-08 Thread Ken Gaillot
On 05/05/2017 07:49 AM, Jan Wrona wrote: > On 5.5.2017 08:15, Ulrich Windl wrote: > Jan Wrona schrieb am 04.05.2017 um 16:41 in > Nachricht >> : >>> I hope I'll be able to explain the problem clearly and correctly. >>> >>> My setup (simplified): I have two cloned resources, a filesystem mo

[ClusterLabs] Pacemaker 1.1.17-rc1 now available

2017-05-08 Thread Ken Gaillot
't cover all possible use cases, so your feedback is important and appreciated. Many thanks to all contributors of source code to this release, including Alexandra Zhuravleva, Andrew Beekhof, Aravind Kumar, Eric Marques, Ferenc Wágner, Yan Gao, Hayley Swimelar, Hideo Yamauchi, Igor Tsigly

Re: [ClusterLabs] Pacemaker 1.1.17-rc1 now available

2017-05-09 Thread Ken Gaillot
On 05/09/2017 03:51 AM, Lars Ellenberg wrote: > Yay! > > On Mon, May 08, 2017 at 07:50:49PM -0500, Ken Gaillot wrote: >> "crm_attribute --pattern" to update or delete all node >> attributes matching a regular expression > > Just a nit, but "pattern

Re: [ClusterLabs] Fwd: Unable to start cluster (Pacemaker/Corosync)

2017-05-09 Thread Ken Gaillot
On 05/09/2017 02:44 AM, Handra Cs wrote: > Hi there, > > I am currently trying to configure Pacemaker/Corosync. I managed to > install the required packages for the cluster configuration, however I > could not start the cluster service. Based on the log file, there was an > issue with the director

Re: [ClusterLabs] cloned resources ordering and remote nodes problem

2017-05-09 Thread Ken Gaillot
On 04/13/2017 08:49 AM, Radoslaw Garbacz wrote: > Thank you, however in my case this parameter does not change the > described behavior. > > I have a more detail example: > order: res_A-clone -> res_B-clone -> res_C > when "res_C" is not on the node, which had "res_A" instance failed, it > will no

Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-10 Thread Ken Gaillot
On 05/10/2017 12:20 AM, Kristoffer Grönlund wrote: > "Lentes, Bernd" writes: > >> - On May 8, 2017, at 9:20 PM, Bernd Lentes >> bernd.len...@helmholtz-muenchen.de wrote: >> >>> Hi, >>> >>> i remember that digimer often campaigns for a fence delay in a 2-node >>> cluster. >>> E.g. here: >>

Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-10 Thread Ken Gaillot
On 05/10/2017 12:26 PM, Dimitri Maziuk wrote: > > i remember that digimer often campaigns for a fence delay in a 2-node > cluster. > ... > But ... a random delay does not seem to > be a reliable solution. > >> Some fence agents implement a delay parameter of their own, to set a

Re: [ClusterLabs] newbie question

2017-05-11 Thread Ken Gaillot
On 05/05/2017 03:09 PM, Sergei Gerasenko wrote: > Hi, > > I have a very simple question. > > Pacemaker uses a dedicated "multicast" interface for the totem protocol. > I'm using pacemaker with LVS to provide HA load balancing. LVS uses > multicast interfaces to sync the status of TCP connections

Re: [ClusterLabs] How to check if a resource on a cluster node is really back on after a crash

2017-05-11 Thread Ken Gaillot
On 05/11/2017 03:00 PM, Ludovic Vaugeois-Pepin wrote: > Hi > I translated the a Postgresql multi state RA > (https://github.com/dalibo/PAF) in Python > (https://github.com/ulodciv/deploy_cluster), and I have been editing it > heavily. > > In parallel I am writing unit tests and functional tests. >

Re: [ClusterLabs] Pacemaker remote node ofgline after reboot

2017-05-11 Thread Ken Gaillot
On 05/11/2017 03:45 PM, Ignazio Cassano wrote: > Hello, I installed a pacemaker cluster with 3 nodes and 2 remote nodes > (pacemaker remote). All nodes are centos 7.3. The remote nodes are > online and pacemaker resources are running on them. When I reboot e > remote pacemaker node it does not ret

Re: [ClusterLabs] How to check if a resource on a cluster node is really back on after a crash

2017-05-12 Thread Ken Gaillot
version 1.1.15-11.el7_3.4-e174ec8) - > partition with quorum > Last updated: Fri May 12 10:45:41 2017 Last change: Fri > May 12 10:45:39 2017 by root via crm_attribute on test1 > > 3 nodes and 4 resources configured > > Online: [ test1 t

Re: [ClusterLabs] In N+1 cluster, add/delete of one resource result in other node resources to restart

2017-05-15 Thread Ken Gaillot
On 05/15/2017 06:59 AM, Klaus Wenninger wrote: > On 05/15/2017 12:25 PM, Anu Pillai wrote: >> Hi Klaus, >> >> Please find attached cib.xml as well as corosync.conf. Maybe you're only setting this while testing, but having stonith-enabled=false and no-quorum-policy=ignore is highly dangerous in any

Re: [ClusterLabs] Pacemaker's "stonith too many failures" log is not accurate

2017-05-17 Thread Ken Gaillot
On 05/17/2017 04:56 AM, Klaus Wenninger wrote: > On 05/17/2017 11:28 AM, 井上 和徳 wrote: >> Hi, >> I'm testing Pacemaker-1.1.17-rc1. >> The number of failures in "Too many failures (10) to fence" log does not >> match the number of actual failures. > > Well it kind of does as after 10 failures it do

Re: [ClusterLabs] CIB: op-status=4 ?

2017-05-18 Thread Ken Gaillot
On 05/17/2017 06:10 PM, Radoslaw Garbacz wrote: > Hi, > > I have a question regarding ' 'op-status > attribute getting value 4. > > In my case I have a strange behavior, when resources get those "monitor" > operation entries in the CIB with op-status=4, and they do not seem to > be called (exec-t

Re: [ClusterLabs] In N+1 cluster, add/delete of one resource result in other node resources to restart

2017-05-19 Thread Ken Gaillot
t DC: 0005B9423910 (version 1.1.14-5a6cdd1) - partition with > quorum > Last updated: Tue May 16 12:21:27 2017 Last change: Tue May > 16 12:21:26 2017 by root via cibadmin on 0005B94238BC > > 3 nodes and 2 resources configured > > Online: [ 00

Re: [ClusterLabs] Question about MySQL agent behaviour, in particular _mysql_master_IP

2017-05-19 Thread Ken Gaillot
On 05/08/2017 02:02 AM, Les Green wrote: > Hi All, > > I've recently had an issue when I've needed to add > _mysql_master_IP to hosts in a MySQL HA cluster as I > needed to move a running cluster onto a VPN. > > It seems _REPL_INFO is only set when a resource is > promoted, so as a test, I: > * P

Re: [ClusterLabs] question about fence-virsh

2017-05-19 Thread Ken Gaillot
On 05/19/2017 03:47 PM, Andrew Kerber wrote: > What I am trying to say here is when I get one of the virtual machines > in a bad state, I can still log in and reboot it with the reboot > command. But I need my fencing resource to handle that reboot. > > On Fri, May 19, 2017 at 1:32 PM, Andrew Kerb

Re: [ClusterLabs] both nodes OFFLINE

2017-05-22 Thread Ken Gaillot
On 05/13/2017 01:36 AM, 石井 俊直 wrote: > Hi. > > We have, sometimes, a problem in our two nodes cluster on CentOS7. Let node-2 > and node-3 > be the names of the nodes. When the problem happens, both nodes are > recognized OFFLINE > on node-3 and on node-2, only node-3 is recognized OFFLINE. > >

Re: [ClusterLabs] In N+1 cluster, add/delete of one resource result in other node resources to restart

2017-05-22 Thread Ken Gaillot
a6cdd1) - partition with quorum > Last updated: Tue May 16 12:21:27 2017 Last change: Tue May 16 > 12:21:26 2017 by root via cibadmin on 0005B94238BC > > 3 nodes and 2 resources configured > > Online: [ 0005B94238BC 0005B9423910 0005B9427C5A ] > > > Observat

Re: [ClusterLabs] CIB: op-status=4 ?

2017-05-22 Thread Ken Gaillot
0crm_mon:debug: > find_anonymous_clone:Internally renamed dbx_nfs_nodes on > olegdbx39-vm-0 to dbx_nfs_nodes:0 > May 19 13:15:42 [8114] olegdbx39-vm-0crm_mon:debug: > find_anonymous_clone:Internally renamed dbx_ready_primary on > olegdbx39-vm-0 to dbx_ready_primary:0 &

[ClusterLabs] Pacemaker 1.1.17 Release Candidate 2

2017-05-23 Thread Ken Gaillot
configuration for constraints related to fence devices, to know whether to enable or disable them on the local node. Previously, after reading the initial configuration, it could detect later changes or removals of constraints, but not additions. Now, it can. -- Ken Gaillot

Re: [ClusterLabs] failcount is not getiing reset after failure_timeout if monitoring is disabled

2017-05-23 Thread Ken Gaillot
On 05/23/2017 08:00 AM, ashutosh tiwari wrote: > Hi, > > We are running a two node cluster(Active(X)/passive(Y)) having muliple > resources of type IpAddr2. > Running monitor operations for multiple IPAddr2 resource is actually > hoging the cpu, > as we have configured very low value for monitor

Re: [ClusterLabs] Node attribute disappears when pacemaker is started

2017-05-25 Thread Ken Gaillot
On 05/24/2017 05:13 AM, 井上 和徳 wrote: > Hi, > > After loading the node attribute, when I start pacemaker of that node, the > attribute disappears. > > 1. Start pacemaker on node1. > 2. Load configure containing node attribute of node2. >(I use multicast addresses in corosync, so did not set "

Re: [ClusterLabs] ClusterIP won't return to recovered node

2017-05-25 Thread Ken Gaillot
On 05/24/2017 12:27 PM, Dan Ragle wrote: > I suspect this has been asked before and apologize if so, a google > search didn't seem to find anything that was helpful to me ... > > I'm setting up an active/active two-node cluster and am having an issue > where one of my two defined clusterIPs will n

Re: [ClusterLabs] resource monitor logging

2017-05-25 Thread Ken Gaillot
On 05/24/2017 03:44 PM, Christopher Pax wrote: > > I am running postgresql as a resource in corosync. and there is a > monitor process that kicks off every few seconds to see if postgresqlis > alive (it runs a select now()). My immediate concernis that it is > generating alotof logs in auth.log, a

Re: [ClusterLabs] clearing failed actions

2017-05-30 Thread Ken Gaillot
On 05/30/2017 09:13 AM, Attila Megyeri wrote: > Hi, > > > > Shouldn’t the > > > > cluster-recheck-interval="2m" > > > > property instruct pacemaker to recheck the cluster every 2 minutes and > clean the failcounts? It instructs pacemaker to recalculate whether any actions need to be t

[ClusterLabs] Pacemaker 1.1.17 Release Candidate 3

2017-05-31 Thread Ken Gaillot
maker Remote node unless necessary. Testing and feedback is welcome! -- Ken Gaillot

Re: [ClusterLabs] clearing failed actions

2017-05-31 Thread Ken Gaillot
On 05/30/2017 02:50 PM, Attila Megyeri wrote: > Hi Ken, > > >> -Original Message----- >> From: Ken Gaillot [mailto:kgail...@redhat.com] >> Sent: Tuesday, May 30, 2017 4:32 PM >> To: users@clusterlabs.org >> Subject: Re: [ClusterLabs] clearing fail

Re: [ClusterLabs] clearing failed actions

2017-05-31 Thread Ken Gaillot
On 05/31/2017 12:17 PM, Ken Gaillot wrote: > On 05/30/2017 02:50 PM, Attila Megyeri wrote: >> Hi Ken, >> >> >>> -Original Message- >>> From: Ken Gaillot [mailto:kgail...@redhat.com] >>> Sent: Tuesday, May 30, 2017 4:32 PM >>>

Re: [ClusterLabs] Pacemaker occasionally takes minutes to respond

2017-05-31 Thread Ken Gaillot
On 05/24/2017 08:04 AM, Attila Megyeri wrote: > Hi Klaus, > > Thank you for your response. > I tried many things, but no luck. > > We have many pacemaker clusters with 99% identical configurations, package > versions, and only this one causes issues. (BTW we use unicast for corosync, > but this

Re: [ClusterLabs] Pacemaker's "stonith too many failures" log is not accurate

2017-05-31 Thread Ken Gaillot
op->call_options & > st_opt_sync_call, FALSE); > + > /* mark this op as having notify's already sent */ > op->notify_sent = TRUE; > free_xml(reply); > > Regards, > Kazunori INOUE > >> -Original Message- >> From: Ken Gaillo

Re: [ClusterLabs] Node attribute disappears when pacemaker is started

2017-05-31 Thread Ken Gaillot
On 05/26/2017 03:21 AM, 井上 和徳 wrote: > Hi Ken, > > I got crm_report. > > Regards, > Kazunori INOUE I don't think it attached -- my mail client says it's 0 bytes. >> -----Original Message- >> From: Ken Gaillot [mailto:kgail...@redhat.com] >> S

Re: [ClusterLabs] crm_resource -c field

2017-06-07 Thread Ken Gaillot
On 06/05/2017 10:19 AM, iva...@libero.it wrote: > Hello, > could you explain the meaning of fields in "crm_resource -c" command (c > in lowercase)? > > I've tried to search on web but i didn't find anything. > > Thanks and regards > > Ivan It's used solely by pacemaker's cluster test suite (CTS

Re: [ClusterLabs] Cloned IP not moving back after node restart or standby

2017-06-07 Thread Ken Gaillot
On 05/30/2017 11:47 AM, Przemyslaw Kulczycki wrote: > Hi. > I'm trying to setup a 2-node corosync+pacemaker cluster to function as > an active-active setup for nginx with a shared IP. > > I've discovered (much to my disappointment) that every time I restart > one node or put it in standby, the sec

Re: [ClusterLabs] Cloned IP not moving back after node restart or standby

2017-06-07 Thread Ken Gaillot
On 06/02/2017 06:33 AM, Takehiro Matsushima wrote: > Hi, > > You should not clone IPaddr2 resource. > Clone means that to run the resource at same time on both nodes, so > these nodes will have same duplicated IP address on a network. > > Specifically, you need to configure a IPaddr2 resource run

Re: [ClusterLabs] clearing failed actions

2017-06-07 Thread Ken Gaillot
m: Attila Megyeri [mailto:amegy...@minerva-soft.com] >> Sent: Thursday, June 1, 2017 6:57 PM >> To: kgail...@redhat.com; Cluster Labs - All topics related to open-source >> clustering welcomed >> Subject: Re: [ClusterLabs] clearing failed actions >> >> thanks Ken, >>

Re: [ClusterLabs] Node attribute disappears when pacemaker is started

2017-06-08 Thread Ken Gaillot
ched it to GitHub, so look at it. > https://github.com/inouekazu/pcmk_report/blob/master/pcmk-Fri-26-May-2017.tar.bz2 > >> -----Original Message- >> From: Ken Gaillot [mailto:kgail...@redhat.com] >> Sent: Thursday, June 01, 2017 8:43 AM >> To: users@clusterlabs.or

Re: [ClusterLabs] ClusterIP won't return to recovered node

2017-06-12 Thread Ken Gaillot
On 06/10/2017 10:53 AM, Dan Ragle wrote: > So I guess my bottom line question is: How does one tell Pacemaker that > the individual legs of globally unique clones should *always* be spread > across the available nodes whenever possible, regardless of the number > of processes on any one of the node

Re: [ClusterLabs] ClusterIP won't return to recovered node

2017-06-12 Thread Ken Gaillot
On 06/12/2017 09:23 AM, Klaus Wenninger wrote: > On 06/12/2017 04:02 PM, Ken Gaillot wrote: >> On 06/10/2017 10:53 AM, Dan Ragle wrote: >>> So I guess my bottom line question is: How does one tell Pacemaker that >>> the individual legs of globally unique clones should *a

Re: [ClusterLabs] Does an order set include colocation of resources?

2017-06-14 Thread Ken Gaillot
On 06/14/2017 11:52 AM, Jeff Johnson wrote: > If I have a resource group: > > # pcs resource show > Resource Group: nfs-zfs > nfsnet(ocf::heartbeat:IPaddr2):Started server-a-hb.internal > nfs-daemon(ocf::heartbeat:nfsserver):Started server-a-hb.internal > samba(

Re: [ClusterLabs] Pacemaker shutting down peer node

2017-06-15 Thread Ken Gaillot
On 06/15/2017 12:38 AM, Jaz Khan wrote: > Hi, > > I have been encountering this serious issue from past couple of months. > I really have no idea that why pacemaker sends shutdown signal to peer > node and it goes down. This is very strange and I am too much worried . > > This is not happening d

Re: [ClusterLabs] Pacemaker shutting down peer node

2017-06-16 Thread Ken Gaillot
14 15:52:27 apex1 crmd[18733]: notice: do_shutdown of peer ha-apex2 > is complete > > > Best regards, > Jaz > > > > > > Message: 1 > Date: Thu, 15 Jun 2017 13:53:00 -0500 > From: Ken Gaillot mailto:kgail...@redhat.com>> >

Re: [ClusterLabs] ClusterIP won't return to recovered node

2017-06-16 Thread Ken Gaillot
On 06/16/2017 01:18 PM, Dan Ragle wrote: > > > On 6/12/2017 10:30 AM, Ken Gaillot wrote: >> On 06/12/2017 09:23 AM, Klaus Wenninger wrote: >>> On 06/12/2017 04:02 PM, Ken Gaillot wrote: >>>> On 06/10/2017 10:53 AM, Dan Ragle wrote: >>>>> So I

Re: [ClusterLabs] what is the best practice for removing a node temporary (e.g. for installing updates) ?

2017-06-19 Thread Ken Gaillot
On 06/19/2017 10:23 AM, Lentes, Bernd wrote: > Hi, > > what would you consider to be the best way for removing a node temporary from > the cluster, e.g. for installing updates ? > I thought "crm node maintenance node" would be the right way, but i was > astonished that the resources keep running

[ClusterLabs] Pacemaker 1.1.17 Release Candidate 4 (likely final)

2017-06-20 Thread Ken Gaillot
final release. -- Ken Gaillot

Re: [ClusterLabs] Pacemaker 1.1.17 Release Candidate 4 (likely final)

2017-06-21 Thread Ken Gaillot
On 06/21/2017 02:58 AM, Ferenc Wágner wrote: > Ken Gaillot writes: > >> The most significant change in this release is a new cluster option to >> improve scalability. >> >> As users start to create clusters with hundreds of resources and many >> nodes, one bot

Re: [ClusterLabs] clearing failed actions

2017-06-21 Thread Ken Gaillot
ctmgrpengine: info: LogActions: >>> Leave db- >>> mysql:1 (Slave ctdb2) >>> Jun 19 17:37:06 [18997] ctmgrpengine: notice: process_pe_message: >>> Calculated Transition 38: /var/lib/pacemaker/pengine/pe-input-16.bz2 >>> Jun

Re: [ClusterLabs] vip is not removed after node lost connection with the other two nodes

2017-06-23 Thread Ken Gaillot
On 06/22/2017 09:44 PM, Hui Xiang wrote: > Hi guys, > > I have setup 3 nodes(node-1, node-2, node-3) as controller nodes, an > vip is selected by pacemaker between them, after manually make the > management interface down in node-1 (used by corosync) but still have > connectivity to public or no

Re: [ClusterLabs] vip is not removed after node lost connection with the other two nodes

2017-06-23 Thread Ken Gaillot
On 06/23/2017 11:52 AM, Dimitri Maziuk wrote: > On 06/23/2017 11:24 AM, Jan Pokorný wrote: > >> People using ifdown or the iproute-based equivalent seem far >> too prevalent, even if for long time bystanders the idea looks >> continually disproved ad nauseam. > > Has anyone had a network card fai

[ClusterLabs] clusterlabs.org now supports https :-)

2017-06-26 Thread Ken Gaillot
/letsencrypt.org/ [2] https://wiki.clusterlabs.org/ [3] https://bugs.clusterlabs.org/ -- Ken Gaillot

Re: [ClusterLabs] Question about STONITH for VM HA cluster in shared hosts environment

2017-06-29 Thread Ken Gaillot
On 06/29/2017 12:08 PM, Digimer wrote: > On 29/06/17 12:39 PM, Andrés Pozo Muñoz wrote: >> Hi all, >> >> I am a newbie to Pacemaker and I can't find the perfect solution for my >> problem (probably I'm missing something), maybe someone can give me some >> hint :) >> >> My scenario is the following:

Re: [ClusterLabs] reboot node / cluster standby

2017-06-29 Thread Ken Gaillot
On 06/29/2017 04:42 AM, philipp.achmuel...@arz.at wrote: > Hi, > > In order to reboot a Clusternode i would like to set the node to standby > first, so a clean takeover for running resources can take in place. > Is there a default way i can set in pacemaker, or do i have to setup my > own systemd
