[ClusterLabs] query on pacemaker monitor timeout

2020-12-11 Thread S Sathish S
Hi Team,

Problem Statement: the pcs resource monitor timed out after 12ms, and Pacemaker tried to recover the resource (application) by stopping and starting it at the first occurrence. This resource restart caused a momentary traffic impact in their environment. We suspect the reason for the timeout…
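For what it's worth, the monitor timeout is tunable per operation, so transient slowness need not trigger recovery; a minimal pcs sketch, assuming a hypothetical resource named my_app:

    # Give the monitor more headroom before Pacemaker declares a timeout
    pcs resource update my_app op monitor interval=30s timeout=120s

    # Optionally skip recovery entirely on monitor failure
    # (on-fail defaults to restart)
    pcs resource update my_app op monitor interval=30s timeout=120s on-fail=ignore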

[ClusterLabs] Calling crm executables via effective uid

2020-12-11 Thread Alex Zarifoglu
Hello, I have a question regarding running crm commands with an effective uid. I am trying to create a tool to manage Pacemaker resources for multiple users. For security reasons, these users will only be able to create/delete/manage resources that can impact that specific user only. I cannot…
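Pacemaker's ACL feature is built for this kind of per-user scoping; a minimal pcs sketch, assuming a hypothetical user user1 (already a member of the haclient group) and a resource r_user1:

    # Grant user1 write access to its own resource only
    pcs acl role create user1-role description="manage r_user1 only" \
        write xpath "//primitive[@id='r_user1']"
    pcs acl user create user1 user1-role
    pcs acl enable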

Re: [ClusterLabs] Can't have 2 nodes as master with galera resource agent

2020-12-11 Thread Raphael Laguerre
> Try it like this:
> pcs resource create r_galera ocf:heartbeat:galera enable_creation=true \
>   wsrep_cluster_address="gcomm://192.168.0.1,192.168.0.2" \
>   cluster_host_map="node-01:192.168.0.1;node-02:192.168.0.2" \
>   promotable master-max=2 promoted-max=2
> i.e. drop "meta" after "promotable"

Yes, i…

Re: [ClusterLabs] Can't have 2 nodes as master with galera resource agent

2020-12-11 Thread Raphael Laguerre
You are right. Thank you very much! I was confused because, as mentioned in this issue report https://github.com/ClusterLabs/resource-agents/issues/1482, the example given in the agent's documentation doesn't work: it uses --master and master-max, while my version of pcs must use promotable a…
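For context, this is the syntax change between pcs generations; a rough sketch of both forms, with the galera parameters trimmed to the cluster address only:

    # pcs 0.9.x (Pacemaker 1.x): master/slave resources
    pcs resource create r_galera ocf:heartbeat:galera \
        wsrep_cluster_address="gcomm://192.168.0.1,192.168.0.2" \
        --master master-max=2

    # pcs 0.10+ (Pacemaker 2.x): promotable clones
    pcs resource create r_galera ocf:heartbeat:galera \
        wsrep_cluster_address="gcomm://192.168.0.1,192.168.0.2" \
        promotable promoted-max=2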

Re: [ClusterLabs] Antw: [EXT] Recovering from node failure

2020-12-11 Thread Gabriele Bulfon
I found I can do this temporarily:

crm config property cib-bootstrap-options: no-quorum-policy=ignore

then once node 2 is up again:

crm config property cib-bootstrap-options: no-quorum-policy=stop

so that I make sure nodes will not mount in another strange situation. Is there any better w…
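As an aside, the runnable crm shell form of that toggle is straightforward (a sketch; no-quorum-policy is a standard Pacemaker cluster property):

    # while node 2 is down: let the surviving node keep running resources
    crm configure property no-quorum-policy=ignore
    # once node 2 is back: restore the safer default
    crm configure property no-quorum-policy=stop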

Re: [ClusterLabs] Antw: [EXT] Recovering from node failure

2020-12-11 Thread Gabriele Bulfon
I cannot "use wait_for_all: 0", cause this would move automatically a powered off node from UNCLEAN to OFFLINE and mount the ZFS pool (total risk!): I want to manually move from UNCLEAN to OFFLINE, when I know that 2nd node is actually off!   Actually with wait_for_all to default (1) that was th

Re: [ClusterLabs] query on pacemaker monitor timeout

2020-12-11 Thread Ken Gaillot
On Thu, 2020-12-10 at 17:53 +0000, S Sathish S wrote:
> Hi Team,
>
> Problem Statement:
>
> pcs resource monitor got timed out after 12ms and tried to
> recover resource(application) by stopping and starting first
> occurrence itself. Due to this restart resource which caused traffic
> impac…

Re: [ClusterLabs] Antw: [EXT] Recovering from node failure

2020-12-11 Thread Andrei Borzenkov
11.12.2020 18:37, Gabriele Bulfon wrote:
> I found I can do this temporarily:
>
> crm config property cib-bootstrap-options: no-quorum-policy=ignore
>
All two-node clusters I remember run with this setting forever :)
> then once node 2 is up again:
>
> crm config property cib-bootstrap-options:…

Re: [ClusterLabs] Antw: [EXT] Recovering from node failure

2020-12-11 Thread Ken Gaillot
On Fri, 2020-12-11 at 16:37 +0100, Gabriele Bulfon wrote:
> I found I can do this temporarily:
>
> crm config property cib-bootstrap-options: no-quorum-policy=ignore
>
> then once node 2 is up again:
>
> crm config property cib-bootstrap-options: no-quorum-policy=stop
>
> so that I make sur…

Re: [ClusterLabs] Can't have 2 nodes as master with galera resource agent

2020-12-11 Thread Tomas Jelinek
On 11. 12. 20 at 15:10, Andrei Borzenkov wrote: 11.12.2020 16:13, Raphael Laguerre wrote: Hello, I'm trying to setup a 2 nodes cluster with 2 galera instances. I use the ocf:heartbeat:galera resource agent, however, after I create the resource, only one node appears to be in master role, t…

Re: [ClusterLabs] A word of warning regarding VirtualDomain and utilization

2020-12-11 Thread Ken Gaillot
On Fri, 2020-12-11 at 11:59 +0100, Ulrich Windl wrote:
> Hi!
>
> A word of warning (for SLES15 SP2):
> I learned that the VirtualDomain RA sets utilization parameters "cpu"
> and "hv_memory" for the resource's permanent configuration. That is:
> You'll see those parameters, even though you did not…

Re: [ClusterLabs] Recovering from node failure

2020-12-11 Thread Gabriele Bulfon
I tried setting wait_for_all: 0, but then when I start only the 1st node, it powers itself off after a few minutes! :O :O :O

Re: [ClusterLabs] Antw: [EXT] Recovering from node failure

2020-12-11 Thread Gabriele Bulfon
That's what I suspect:

sonicle@xstorage1:/sonicle/home$ pfexec crm_mon -1Arfj
Stack: corosync
Current DC: xstha1 (version 1.1.15-e174ec8) - partition WITHOUT quorum
Last updated: Fri Dec 11 11:49:50 2020
Last change: Fri Dec 11 11:00:38 2020 by hacluster via cibadmin on xstha1
2 nodes a…

[ClusterLabs] Recovering from node failure

2020-12-11 Thread Gabriele Bulfon
Hi, I finally managed to get stonith working with IPMI in my 2-node XStreamOS/illumos storage cluster. I have NFS IPs and a shared storage zpool moving from one node to the other, and stonith controlling IPMI power-off when something is not clear. What happens now is that if I shut down the 2nd node, I see t…
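For comparison, an IPMI stonith setup in the crm shell often looks roughly like this; a sketch assuming the external/ipmi agent and hypothetical addresses and credentials:

    # one fencing resource per node, banned from the node it fences
    crm configure primitive st_xstha2 stonith:external/ipmi \
        params hostname=xstha2 ipaddr=192.168.1.102 \
               userid=admin passwd=secret interface=lanplus \
        op monitor interval=3600s
    crm configure location l_st_xstha2 st_xstha2 -inf: xstha2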

Re: [ClusterLabs] Can't have 2 nodes as master with galera resource agent

2020-12-11 Thread Andrei Borzenkov
11.12.2020 16:13, Raphael Laguerre wrote:
> Hello,
>
> I'm trying to setup a 2 nodes cluster with 2 galera instances. I use the
> ocf:heartbeat:galera resource agent, however, after I create the resource,
> only one node appears to be in master role, the other one can't be promoted
> and stays…

[ClusterLabs] A word of warning regarding VirtualDomain and utilization

2020-12-11 Thread Ulrich Windl
Hi! A word of warning (for SLES15 SP2): I learned that the VirtualDomain RA sets utilization parameters "cpu" and "hv_memory" in the resource's permanent configuration. That is: you'll see those parameters even though you did not configure them. So far, so good, but when you defined a utilizat…
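Worth noting alongside this: utilization attributes only influence placement once a placement strategy is enabled, and nodes need declared capacities for the math to work; a crm shell sketch with hypothetical node and resource names:

    # utilization is ignored unless a placement strategy is set
    crm configure property placement-strategy=balanced
    # nodes advertise their capacity (values are illustrative)
    crm configure node node1 utilization cpu=16 hv_memory=65536
    # inspect what the RA wrote into a resource definition
    crm configure show vm1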

Re: [ClusterLabs] Recovering from node failure

2020-12-11 Thread Reid Wahl
Hi, Gabriele. It sounds like you don't have quorum on node 1. Resources won't start unless the node is part of a quorate cluster partition. You probably have "two_node: 1" configured by default in corosync.conf. This setting automatically enables wait_for_all. From the votequorum(5) man page:…
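For reference, the relevant corosync.conf stanza usually looks like this (a sketch; per votequorum(5), two_node: 1 turns wait_for_all on unless it is explicitly overridden):

    quorum {
        provider: corosync_votequorum
        two_node: 1
        # implied by two_node; set to 0 only if you accept the
        # split-brain risk of starting one node alone
        wait_for_all: 0
    }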

[ClusterLabs] Antw: [EXT] Recovering from node failure

2020-12-11 Thread Ulrich Windl
Hi! Did you take care of the special "two node" settings (quorum, I mean)? When I use "crm_mon -1Arfj", I see something like "* Current DC: h19 (version 2.0.4+20200616.2deceaa3a-3.3.1-2.0.4+20200616.2deceaa3a) - partition with quorum". What do you see? Regards, Ulrich >>> Gabriele Bulfon wrote…

Re: [ClusterLabs] Q: LVM-activate a shared LV

2020-12-11 Thread Gang He
Hi Ulrich, which Linux distribution/version do you use? Could you share the whole crm configuration? Here is a crm configuration demo for your reference.

primitive dlm ocf:pacemaker:controld \
    op start interval=0 timeout=90 \
    op stop interval=0 timeout=100 \
    op monitor interv…
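A plausible continuation of that demo for a shared VG, sketched here with illustrative resource names, timeouts, and VG name:

    # continuation sketch: lock manager daemon + shared VG activation
    primitive lvmlockd ocf:heartbeat:lvmlockd \
        op monitor interval=60 timeout=60
    primitive vg1 ocf:heartbeat:LVM-activate \
        params vgname=vg1 vg_access_mode=lvmlockd activation_mode=shared \
        op monitor interval=60 timeout=90
    group base-group dlm lvmlockd vg1
    clone base-clone base-group meta interleave=true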

[ClusterLabs] Antw: [EXT] Re: Q: LVM-activate a shared LV

2020-12-11 Thread Ulrich Windl
Hi! Several resources are unrelated, and I wanted to keep the complexity low, specifically because the problem seems to be related to LVM LV activation. So what could be related are the lvmlockd and DLM resources, but not more. Agree? I'm using SLES15 SP2. The DLM-related config is:

primitive prm_DL…