Re: [ClusterLabs] Beginner lost with promotable "group" design

2024-01-31 Thread Adam Cecile
On 1/17/24 16:33, Ken Gaillot wrote: On Wed, 2024-01-17 at 14:23 +0100, Adam Cécile wrote: Hello, I'm trying to achieve the following setup with 3 hosts: * One master gets a shared IP, then remove default gw, add another gw, start a service * Two slaves should have none of them but add a

Re: [ClusterLabs] controlling cluster behavior on startup

2024-01-30 Thread Klaus Wenninger
On Tue, Jan 30, 2024 at 2:21 PM Walker, Chris wrote: > >>> However, now it seems to wait that amount of time before it elects a > >>> DC, even when quorum is acquired earlier. In my log snippet below, > >>> with dc-deadtime 300s, > >> > >> The dc-deadtime is not waiting for quorum, but for

Re: [ClusterLabs] controlling cluster behavior on startup

2024-01-30 Thread Ken Gaillot
On Tue, 2024-01-30 at 13:20 +0000, Walker, Chris wrote: > >>> However, now it seems to wait that amount of time before it > elects a > >>> DC, even when quorum is acquired earlier. In my log snippet > below, > >>> with dc-deadtime 300s, > >> > >> The dc-deadtime is not waiting for quorum, but for

Re: [ClusterLabs] controlling cluster behavior on startup

2024-01-30 Thread Walker, Chris
>>> However, now it seems to wait that amount of time before it elects a >>> DC, even when quorum is acquired earlier. In my log snippet below, >>> with dc-deadtime 300s, >> >> The dc-deadtime is not waiting for quorum, but for another DC to show >> up. If all nodes show up, it can proceed, but

Re: [ClusterLabs] controlling cluster behavior on startup

2024-01-29 Thread Faaland, Olaf P. via Users
>> However, now it seems to wait that amount of time before it elects a >> DC, even when quorum is acquired earlier. In my log snippet below, >> with dc-deadtime 300s, > > The dc-deadtime is not waiting for quorum, but for another DC to show > up. If all nodes show up, it can proceed, but

Re: [ClusterLabs] controlling cluster behavior on startup

2024-01-29 Thread Ken Gaillot
On Mon, 2024-01-29 at 14:35 -0800, Reid Wahl wrote: > > > On Monday, January 29, 2024, Ken Gaillot wrote: > > On Mon, 2024-01-29 at 18:05 +0000, Faaland, Olaf P. via Users > wrote: > >> Hi, > >> > >> I have configured clusters of node pairs, so each cluster has 2 > >> nodes. The cluster

Re: [ClusterLabs] controlling cluster behavior on startup

2024-01-29 Thread Ken Gaillot
On Mon, 2024-01-29 at 22:48 +0000, Faaland, Olaf P. wrote: > Thank you, Ken. > > I changed my configuration management system to put an initial > cib.xml into /var/lib/pacemaker/cib/, which sets all the property > values I was setting via pcs commands, including dc-deadtime. I > removed those

Re: [ClusterLabs] controlling cluster behavior on startup

2024-01-29 Thread Faaland, Olaf P. via Users
Thank you, Ken. I changed my configuration management system to put an initial cib.xml into /var/lib/pacemaker/cib/, which sets all the property values I was setting via pcs commands, including dc-deadtime. I removed those "pcs property set" commands from the ones that are run at startup

Re: [ClusterLabs] controlling cluster behavior on startup

2024-01-29 Thread Reid Wahl
On Monday, January 29, 2024, Ken Gaillot wrote: > On Mon, 2024-01-29 at 18:05 +0000, Faaland, Olaf P. via Users wrote: >> Hi, >> >> I have configured clusters of node pairs, so each cluster has 2 >> nodes. The cluster members are statically defined in corosync.conf >> before corosync or

Re: [ClusterLabs] controlling cluster behavior on startup

2024-01-29 Thread Ken Gaillot
On Mon, 2024-01-29 at 18:05 +0000, Faaland, Olaf P. via Users wrote: > Hi, > > I have configured clusters of node pairs, so each cluster has 2 > nodes. The cluster members are statically defined in corosync.conf > before corosync or pacemaker is started, and quorum {two_node: 1} is > set. > >

[ClusterLabs] controlling cluster behavior on startup

2024-01-29 Thread Faaland, Olaf P. via Users
Hi, I have configured clusters of node pairs, so each cluster has 2 nodes. The cluster members are statically defined in corosync.conf before corosync or pacemaker is started, and quorum {two_node: 1} is set. When both nodes are powered off and I power them on, they do not start pacemaker at

Re: [ClusterLabs] trigger something at ?

2024-01-29 Thread Klaus Wenninger
On Mon, Jan 29, 2024 at 5:22 PM Ken Gaillot wrote: > On Fri, 2024-01-26 at 13:55 +0100, lejeczek via Users wrote: > > Hi guys. > > > > Is it possible to trigger some... action - I'm thinking specifically > > at shutdown/start. > > If not within the cluster then - if you do that - perhaps

Re: [ClusterLabs] trigger something at ?

2024-01-29 Thread Ken Gaillot
On Fri, 2024-01-26 at 13:55 +0100, lejeczek via Users wrote: > Hi guys. > > Is it possible to trigger some... action - I'm thinking specifically > at shutdown/start. > If not within the cluster then - if you do that - perhaps outside. > I would like to create/remove constraints, when cluster

[ClusterLabs] GCP and IP address question

2024-01-26 Thread Strahil Nikolov via Users
Hello All, I will soon build my first cluster in the cloud and I was wondering if I can still use IPAddr2 resource in GCP or I really have to use ocf:heartbeat:gcp-vpc-move-route & ocf:heartbeat:gcp-vpc-move-vip ? I'm still trying to find a guide, so I can understand the idea behind those

[ClusterLabs] trigger something at ?

2024-01-26 Thread lejeczek via Users
Hi guys. Is it possible to trigger some... action - I'm thinking specifically at shutdown/start. If not within the cluster then - if you do that - perhaps outside. I would like to create/remove constraints, when cluster starts & stops, respectively. many thanks,

Re: [ClusterLabs] Planning for Pacemaker 3

2024-01-25 Thread Ken Gaillot
On Thu, 2024-01-25 at 10:31 +0100, Jehan-Guillaume de Rorthais wrote: > On Wed, 24 Jan 2024 16:47:54 -0600 > Ken Gaillot wrote: > ... > > > Erm. Well, as this is a major upgrade where we can affect > > > people's > > > conf and > > > break old things & so on, I'll jump in this discussion with a >

Re: [ClusterLabs] Planning for Pacemaker 3

2024-01-25 Thread Jehan-Guillaume de Rorthais via Users
On Wed, 24 Jan 2024 16:47:54 -0600 Ken Gaillot wrote: ... > > Erm. Well, as this is a major upgrade where we can affect people's > > conf and > > break old things & so on, I'll jump in this discussion with a > > wishlist to > > discuss :) > > > > I made sure we're tracking all these (links

Re: [ClusterLabs] Planning for Pacemaker 3

2024-01-24 Thread Ken Gaillot
On Tue, 2024-01-23 at 18:49 +0100, Jehan-Guillaume de Rorthais wrote: > Hi there ! > > On Wed, 03 Jan 2024 11:06:27 -0600 > Ken Gaillot wrote: > > > Hi all, > > > > I'd like to release Pacemaker 3.0.0 around the middle of this > > year. > > I'm gathering proposed changes here: > > > > > >

[ClusterLabs] New ClusterLabs wiki

2024-01-23 Thread Ken Gaillot
Hi all, The ClusterLabs project manager is now publicly viewable, without needing a GitHub account: https://projects.clusterlabs.org/ Anyone can now follow issues tracked there. (Issues created before the site was public will still require an account unless someone updates their settings.)

Re: [ClusterLabs] Planning for Pacemaker 3

2024-01-23 Thread Jehan-Guillaume de Rorthais via Users
Hi there ! On Wed, 03 Jan 2024 11:06:27 -0600 Ken Gaillot wrote: > Hi all, > > I'd like to release Pacemaker 3.0.0 around the middle of this year. > I'm gathering proposed changes here: > > https://projects.clusterlabs.org/w/projects/pacemaker/pacemaker_3.0_changes/ > > Please review for

Re: [ClusterLabs] When did the CIB change how it reports in_ccm and crmd?

2024-01-20 Thread Reid Wahl
On Sat, Jan 20, 2024 at 1:18 PM Madison Kelly wrote: > > I'm sure it was announced and I missed it, but I just tripped over my > pants when an update changed 'in_ccm' and 'crmd' in the CIB from > 'true/false' to timestamps... > > When did that happen? Is there an announcement marking other

[ClusterLabs] When did the CIB change how it reports in_ccm and crmd?

2024-01-20 Thread Madison Kelly
I'm sure it was announced and I missed it, but I just tripped over my pants when an update changed 'in_ccm' and 'crmd' in the CIB from 'true/false' to timestamps... When did that happen? Is there an announcement marking other changes that happened at the same time? Cheers, Madi -- wiki -

[ClusterLabs] Corosync main process was not scheduled

2024-01-19 Thread Mr.R via Users
Hi all, In the HA cluster built by corosync+pacemaker, the following log appears on host01: Jan 03 03:38:41 [2095] host01 corosync warning [MAIN ] Corosync main process was not scheduled (@1704224321984) for 6552.6865 ms (threshold is 4800.0000 ms). Consider token timeout increase. Jan
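
The numbers in the warning above can be checked with a little arithmetic. The following is only a rough sketch, not corosync code: it assumes the warning threshold is 80% of the configured token timeout, which would make the 4800 ms threshold correspond to a 6000 ms token. That factor and all function names are assumptions made for illustration; the pause and threshold values are taken from the log line.

```python
# Toy arithmetic for the corosync scheduling warning; this is not corosync code.
# ASSUMPTION: the warning threshold is 0.8 * token timeout. Check the
# documentation of your corosync version before relying on this factor.
ASSUMED_THRESHOLD_FACTOR = 0.8


def pause_exceeds_threshold(pause_ms: float, threshold_ms: float) -> bool:
    """True when the scheduling pause is long enough to trigger the warning."""
    return pause_ms > threshold_ms


def implied_token_timeout_ms(threshold_ms: float,
                             factor: float = ASSUMED_THRESHOLD_FACTOR) -> float:
    """Token timeout that would produce the given threshold under the assumption."""
    return threshold_ms / factor


def min_token_timeout_to_cover_ms(pause_ms: float,
                                  factor: float = ASSUMED_THRESHOLD_FACTOR) -> float:
    """Smallest token timeout whose threshold would not be exceeded by this pause."""
    return pause_ms / factor


if __name__ == "__main__":
    pause_ms = 6552.6865      # from the log line
    threshold_ms = 4800.0     # from the log line
    print("warning triggered:", pause_exceeds_threshold(pause_ms, threshold_ms))
    print("implied token timeout (ms):", round(implied_token_timeout_ms(threshold_ms)))
    print("token timeout needed (ms):", round(min_token_timeout_to_cover_ms(pause_ms)))
```

Raising the token timeout only hides the symptom. A pause of several seconds usually means the host or hypervisor starved the corosync process, which is worth investigating separately.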

Re: [ClusterLabs] [EXT] Migrating off CentOS

2024-01-18 Thread Windl, Ulrich
Hi! I’m no expert, but my guess is that Debian is more conservative than OpenSUSE Leap, which is in turn more conservative than Fedora. So if you cannot use Fedora (frequent out of support releases), you might consider Leap (15.5) before considering Debian. I’m not saying anything against

[ClusterLabs] pcs 0.10.18 released

2024-01-17 Thread he / him
I am happy to announce the latest release of pcs, version 0.10.18. Source code is available at: https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.10.18.tar.gz or https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.10.18.zip This is the last planned release of pcs 0.10. It primarily

Re: [ClusterLabs] Beginner lost with promotable "group" design

2024-01-17 Thread Ken Gaillot
On Wed, 2024-01-17 at 14:23 +0100, Adam Cécile wrote: > Hello, > > > I'm trying to achieve the following setup with 3 hosts: > > * One master gets a shared IP, then remove default gw, add another > gw, > start a service > > * Two slaves should have none of them but add a different default gw

[ClusterLabs] replica vol when a peer is lost/offed & qcow2 ?

2024-01-17 Thread lejeczek via Users
Hi guys. I wonder if you might have any tips/tweaks for volume/cluster to make it more resilient? accommodating? to qcow2 files but! when a peer is lost or missing? I have 3-peer cluster/volume: 2 + 1 arbiter & my experience is such, that when all is good then.. well, all is good, but... when

[ClusterLabs] Beginner lost with promotable "group" design

2024-01-17 Thread Adam Cécile
Hello, I'm trying to achieve the following setup with 3 hosts: * One master gets a shared IP, then remove default gw, add another gw, start a service * Two slaves should have none of them but add a different default gw I managed quite easily to get the master workflow running with ordering

Re: [ClusterLabs] Migrating off CentOS

2024-01-15 Thread Ken Gaillot
On Sat, 2024-01-13 at 09:07 -0600, Billy Croan wrote: > I'm planning to migrate a two-node cluster off CentOS 7 this year. I > think I'm taking it to Debian Stable, but open for suggestions if any > distribution is better supported by pacemaker. Debian, RHEL, SUSE, Ubuntu, and compatible

Re: [ClusterLabs] Migrating off CentOS

2024-01-14 Thread Michele Baldessari
On Sat, Jan 13, 2024 at 09:07:52AM -0600, Billy Croan wrote: > I'm planning to migrate a two-node cluster off CentOS 7 this year. I think > I'm taking it to Debian Stable, but open for suggestions if any > distribution is better supported by pacemaker. > > Have any of you had success doing major

[ClusterLabs] Migrating off CentOS

2024-01-13 Thread Billy Croan
I'm planning to migrate a two-node cluster off CentOS 7 this year. I think I'm taking it to Debian Stable, but open for suggestions if any distribution is better supported by pacemaker. Have any of you had success doing major upgrades (bullseye to bookworm on Debian) of your physical nodes one

[ClusterLabs] pcs 0.11.7 released

2024-01-11 Thread he / him
I am happy to announce the latest release of pcs, version 0.11.7. Source code is available at: https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.11.7.tar.gz or https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.11.7.zip This release contains mostly bug fixes and user experience

[ClusterLabs] Release crmsh 4.6.0

2024-01-09 Thread Xin Liang via Users
Hello everyone! I am pleased to announce that crmsh 4.6.0 is now available for release! Changes since tag 4.5.0 Features: - bootstrap: Support ssh-agent and crmsh could no longer rely on the private key in the cluster nodes (#1261) - prun:

Re: [ClusterLabs] Planning for Pacemaker 3

2024-01-04 Thread Ken Gaillot
Thanks, I hadn't heard that! On Thu, 2024-01-04 at 01:13 +0100, Valentin Vidić via Users wrote: > On Wed, Jan 03, 2024 at 11:06:27AM -0600, Ken Gaillot wrote: > > I'd like to release Pacemaker 3.0.0 around the middle of this > > year. > > I'm gathering proposed changes here: > > > > > >

Re: [ClusterLabs] Planning for Pacemaker 3

2024-01-04 Thread Reid Wahl
On Wed, Jan 3, 2024 at 8:09 PM Madison Kelly wrote: > > On 2024-01-03 12:06, Ken Gaillot wrote: > > Hi all, > > I'd like to release Pacemaker 3.0.0 around the middle of this year. > I'm gathering proposed changes here: > >

Re: [ClusterLabs] Planning for Pacemaker 3

2024-01-03 Thread Madison Kelly
On 2024-01-03 12:06, Ken Gaillot wrote: Hi all, I'd like to release Pacemaker 3.0.0 around the middle of this year. I'm gathering proposed changes here: https://projects.clusterlabs.org/w/projects/pacemaker/pacemaker_3.0_changes/ Please review for anything that might affect you, and reply

Re: [ClusterLabs] Planning for Pacemaker 3

2024-01-03 Thread Valentin Vidić via Users
On Wed, Jan 03, 2024 at 11:06:27AM -0600, Ken Gaillot wrote: > I'd like to release Pacemaker 3.0.0 around the middle of this year. > I'm gathering proposed changes here: > > https://projects.clusterlabs.org/w/projects/pacemaker/pacemaker_3.0_changes/ > > Please review for anything that might

[ClusterLabs] Planning for Pacemaker 3

2024-01-03 Thread Ken Gaillot
Hi all, I'd like to release Pacemaker 3.0.0 around the middle of this year. I'm gathering proposed changes here: https://projects.clusterlabs.org/w/projects/pacemaker/pacemaker_3.0_changes/ Please review for anything that might affect you, and reply here if you have any concerns. Pacemaker

[ClusterLabs] Release crmsh 4.5.1

2024-01-02 Thread Xin Liang via Users
Hello everyone! I am pleased to announce that crmsh 4.5.1 is now available for release! Changes since tag 4.5.0 Features: - prun: replace parallax with crmsh.prun to support non-root sudoer (#1147) Major fixes: - Fix: ui_cluster: Improve the process of 'crm cluster stop' (bsc#1213889) - Fix:

Re: [ClusterLabs] colocate Redis - weird

2024-01-01 Thread Ken Gaillot
On Wed, 2023-12-20 at 11:16 +0100, lejeczek via Users wrote: > > > On 19/12/2023 19:13, lejeczek via Users wrote: > > hi guys, > > > > Is this below not the weirdest thing? > > > > -> $ pcs constraint ref PGSQL-PAF-5435 > > Resource: PGSQL-PAF-5435 > >

Re: [ClusterLabs] colocation constraint - do I get it all wrong?

2024-01-01 Thread Ken Gaillot
On Fri, 2023-12-22 at 17:02 +0100, lejeczek via Users wrote: > hi guys. > > I have a colocation constraint: > > -> $ pcs constraint ref DHCPD > Resource: DHCPD > colocation-DHCPD-GATEWAY-NM-link-INFINITY > > and the trouble is... I thought DHCPD is to follow GATEWAY-NM-link, > always! > If

[ClusterLabs] colocation constraint - do I get it all wrong?

2023-12-22 Thread lejeczek via Users
hi guys. I have a colocation constraint: -> $ pcs constraint ref DHCPD Resource: DHCPD   colocation-DHCPD-GATEWAY-NM-link-INFINITY and the trouble is... I thought DHCPD is to follow GATEWAY-NM-link, always! If that is true that I see very strange behavior, namely. When there is an issue with

[ClusterLabs] Release crmsh 4.6.0-rc2

2023-12-22 Thread Xin Liang via Users
Hello everyone! I am pleased to announce that crmsh 4.6.0-rc2 is now available for release! Changes since tag 4.6.0-rc1: * Dev: unify version string used in setup.py and autotools (#943) * Fix: ui_cluster: Improve the process of 'crm cluster stop' (bsc#1213889) * Dev: log: save

Re: [ClusterLabs] colocate Redis - weird

2023-12-20 Thread lejeczek via Users
On 19/12/2023 19:13, lejeczek via Users wrote: hi guys, Is this below not the weirdest thing? -> $ pcs constraint ref PGSQL-PAF-5435 Resource: PGSQL-PAF-5435   colocation-HA-10-1-1-84-PGSQL-PAF-5435-clone-INFINITY   colocation-REDIS-6385-clone-PGSQL-PAF-5435-clone-INFINITY  

[ClusterLabs] Pacemaker 2.1.7 final release now available

2023-12-19 Thread Ken Gaillot
Hi all, Source code for Pacemaker version 2.1.7 is available at: https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.7 This is primarily a bug fix release. See the ChangeLog or the link above for details. Many thanks to all contributors of source code to this release, including

Re: [ClusterLabs] Build cluster one node at a time

2023-12-19 Thread Ken Gaillot
Correct. You want to enable pcsd to start at boot. Also, after starting pcsd the first time on a node, authorize it from the first node with "pcs host auth -u hacluster". On Tue, 2023-12-19 at 22:42 +0200, Tiaan Wessels wrote: > So i run the pcs add command for every new node on the first

Re: [ClusterLabs] Build cluster one node at a time

2023-12-19 Thread Tiaan Wessels
So I run the pcs add command for every new node on the first original node, not on the node being added? Only corosync, pacemaker and pcsd need to run on the node to be added and the commands being run on the original node will speak to these on the new node? On Tue, 19 Dec 2023, 21:39 Ken

Re: [ClusterLabs] Build cluster one node at a time

2023-12-19 Thread Ken Gaillot
On Tue, 2023-12-19 at 17:03 +0200, Tiaan Wessels wrote: > Hi, > Is it possible to build a corosync pacemaker cluster on redhat9 one > node at a time? In other words, when I'm finished with the first node > and reboot it, all services are started on it. Then i build a second > node to integrate

Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-19 Thread Andrei Borzenkov
On 19.12.2023 21:42, Artem wrote: Andrei and Klaus thanks for prompt reply and clarification! As I understand, design and behavior of Pacemaker is tightly coupled with the stonith concept. But isn't it too rigid? If you insist on shooting yourself in the foot, pacemaker gives you the gun. It

Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-19 Thread Vladislav Bogdanov
What if node (especially vm) freezes for several minutes and then continues to write to a shared disk where other nodes already put their data? In my opinion, fencing, preferably two-level, is mandatory for lustre, trust me, I'd developed whole HA stack for both Exascaler and PangeaFS. We've

Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-19 Thread Artem
Andrei and Klaus thanks for prompt reply and clarification! As I understand, design and behavior of Pacemaker is tightly coupled with the stonith concept. But isn't it too rigid? Is there a way to leverage self-monitoring or pingd rules to trigger isolated node to umount its FS? Like vSphere High

[ClusterLabs] colocate Redis - weird

2023-12-19 Thread lejeczek via Users
hi guys, Is this below not the weirdest thing? -> $ pcs constraint ref PGSQL-PAF-5435 Resource: PGSQL-PAF-5435   colocation-HA-10-1-1-84-PGSQL-PAF-5435-clone-INFINITY   colocation-REDIS-6385-clone-PGSQL-PAF-5435-clone-INFINITY   order-PGSQL-PAF-5435-clone-HA-10-1-1-84-Mandatory  

[ClusterLabs] Build cluster one node at a time

2023-12-19 Thread Tiaan Wessels
Hi, Is it possible to build a corosync pacemaker cluster on redhat9 one node at a time? In other words, when I'm finished with the first node and reboot it, all services are started on it. Then i build a second node to integrate into the cluster and once done, pcs status shows two nodes on-line ?

Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-19 Thread Klaus Wenninger
On Tue, Dec 19, 2023 at 10:00 AM Andrei Borzenkov wrote: > On Tue, Dec 19, 2023 at 10:41 AM Artem wrote: > ... > > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] > (update_resource_action_runnable)warning: OST4_stop_0 on lustre4 is > unrunnable (node is offline) > > Dec

Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-19 Thread Andrei Borzenkov
On Tue, Dec 19, 2023 at 10:41 AM Artem wrote: ... > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] > (update_resource_action_runnable)warning: OST4_stop_0 on lustre4 is > unrunnable (node is offline) > Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] >

Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-18 Thread Artem
Hi Ken, I rolled back settings to 100:100 scores without ping and did the simulation again. I checked pacemaker.log and the only meaningful entry is the following, still it doesn't make sense to me. Actions: Stop OST4(lustre4 ) blocked crit: Cannot fence lustre4

Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-18 Thread Ken Gaillot
On Mon, 2023-12-18 at 23:39 +0300, Artem wrote: > Hello experts. > > I previously played with a dummy resource and it worked as expected. > Now I'm switching to a Lustre OST resource and cannot make it. > Neither can I understand. > > > ### Initial setup: > # pcs resource defaults update

[ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-18 Thread Artem
Hello experts. I previously played with a dummy resource and it worked as expected. Now I'm switching to a Lustre OST resource and cannot make it. Neither can I understand. ### Initial setup: # pcs resource defaults update resource-stickness=110 # for i in {1..4}; do pcs cluster node add-remote

Re: [ClusterLabs] resource-agents and VMs

2023-12-15 Thread Andrei Borzenkov
On Fri, Dec 15, 2023 at 2:23 PM lejeczek via Users wrote: > > Hi guys. > > my resources-agents depend like so: > > resource-agents-deps.target > ○ ├─00\\x2dVMsy.mount > ● └─virt-guest-shutdown.target > If this is output of "systemctl list-dependencies" - it has a lot of flags that completely

[ClusterLabs] resource-agents and VMs

2023-12-15 Thread lejeczek via Users
Hi guys. my resources-agents depend like so: resource-agents-deps.target ○ ├─00\\x2dVMsy.mount ● └─virt-guest-shutdown.target when I reboot a node, VMs seem to migrate off it live OK, but.. when the node comes back on after a reboot, VMs fail to migrate back to it, live. I see on such node

[ClusterLabs] Pacemaker 2.1.7-rc4 now available (likely final for real)

2023-12-12 Thread Ken Gaillot
Hi all, Source code for the fourth (and very likely final) release candidate for Pacemaker version 2.1.7 is available at: https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.7-rc4 This release candidate fixes a newly found regression that was introduced in rc1. This is probably

Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Ken Gaillot
On Tue, 2023-12-12 at 18:08 +0300, Artem wrote: > Hi Andrei. pingd==0 won't satisfy both statements. It would if I used > GTE, but I used GT. > pingd lt 1 --> [0] > pingd gt 0 --> [1,2,3,...] It's the "or defined pingd" part of the rule that will match pingd==0. A value of 0 is defined. I'm
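
The point about pingd==0 can be made concrete with a toy model. This is a minimal sketch and not Pacemaker's rule engine: it treats the attribute as an integer or as missing (None). The first rule is the one quoted in the thread ("pingd lt 1 or not_defined pingd"). The second rule is not shown in full in this listing, so "pingd gt 0 or defined pingd" is an assumption based on the reply above. The function names are made up for illustration.

```python
from typing import Optional

# Toy model of the two location-rule expressions discussed in the thread.
# None stands for "attribute not defined on this node".


def rule_lt1_or_not_defined(pingd: Optional[int]) -> bool:
    """Models 'pingd lt 1 or not_defined pingd' (quoted in the thread)."""
    lt_1 = pingd is not None and pingd < 1
    not_defined = pingd is None
    return lt_1 or not_defined


def rule_gt0_or_defined(pingd: Optional[int]) -> bool:
    """Models 'pingd gt 0 or defined pingd' (ASSUMED form of the second rule)."""
    gt_0 = pingd is not None and pingd > 0
    defined = pingd is not None
    return gt_0 or defined


if __name__ == "__main__":
    for value in (None, 0, 1, 2):
        print(value, rule_lt1_or_not_defined(value), rule_gt0_or_defined(value))
```

With pingd set to 0, the "gt 0" comparison alone is false, but the "defined" clause is true, so in this model both rules match at once. Dropping the "or defined pingd" clause would make the two rules mutually exclusive.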

Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Ken Gaillot
On Mon, 2023-12-11 at 21:05 +0300, Artem wrote: > Hi Ken, > > On Mon, 11 Dec 2023 at 19:00, Ken Gaillot > wrote: > > > Question #2) I shut lustre3 VM down and leave it like that > > How did you shut it down? Outside cluster control, or with > > something > > like pcs resource disable? > > > >

Re: [ClusterLabs] resource fails manual failover

2023-12-12 Thread Ken Gaillot
On Tue, 2023-12-12 at 16:50 +0300, Artem wrote: > Is there a detailed explanation for resource monitor and start > timeouts and intervals with examples, for dummies? No, though Pacemaker Explained has some reference information:

Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Artem
Hi Andrei. pingd==0 won't satisfy both statements. It would if I used GTE, but I used GT. pingd lt 1 --> [0] pingd gt 0 --> [1,2,3,...] On Tue, 12 Dec 2023 at 17:21, Andrei Borzenkov wrote: > On Tue, Dec 12, 2023 at 4:47 PM Artem wrote: > >> > pcs constraint location FAKE3 rule score=0 pingd

Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Andrei Borzenkov
On Tue, Dec 12, 2023 at 4:47 PM Artem wrote: > > > > On Tue, 12 Dec 2023 at 16:17, Andrei Borzenkov wrote: >> >> On Fri, Dec 8, 2023 at 5:44 PM Artem wrote: >> > pcs constraint location FAKE3 rule score=0 pingd lt 1 or not_defined pingd >> > pcs constraint location FAKE4 rule score=0 pingd lt 1

Re: [ClusterLabs] resource fails manual failover

2023-12-12 Thread Andrei Borzenkov
On Tue, Dec 12, 2023 at 4:50 PM Artem wrote: > > Is there a detailed explanation for resource monitor and start timeouts and > intervals with examples, for dummies? > > my resource configured s follows: > [root@lustre-mds1 ~]# pcs resource show MDT00 > Warning: This command is deprecated and

[ClusterLabs] resource fails manual failover

2023-12-12 Thread Artem
Is there a detailed explanation for resource monitor and start timeouts and intervals with examples, for dummies? my resource configured as follows: [root@lustre-mds1 ~]# pcs resource show MDT00 Warning: This command is deprecated and will be removed. Please use 'pcs resource config' instead.

Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Artem
On Tue, 12 Dec 2023 at 16:17, Andrei Borzenkov wrote: > On Fri, Dec 8, 2023 at 5:44 PM Artem wrote: > > pcs constraint location FAKE3 rule score=0 pingd lt 1 or not_defined > pingd > > pcs constraint location FAKE4 rule score=0 pingd lt 1 or not_defined > pingd > > pcs constraint location FAKE3

Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Andrei Borzenkov
On Fri, Dec 8, 2023 at 5:44 PM Artem wrote: > > Hello experts. > > I use pacemaker for a Lustre cluster. But for simplicity and exploration I > use a Dummy resource. I didn't like how resource performed failover and > failback. When I shut down VM with remote agent, pacemaker tries to restart

Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Artem
Dear Ken and other experts. How can I leverage pingd to speed up failover? Or maybe it is useless and we should leverage monitor/start timeouts and migration-threshold/failure-timeout? I have preference like this for normal operations: > pcs constraint location FAKE3 prefers lustre3=100 > pcs

Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-11 Thread Artem
Hi Ken, On Mon, 11 Dec 2023 at 19:00, Ken Gaillot wrote: > > Question #2) I shut lustre3 VM down and leave it like that > How did you shut it down? Outside cluster control, or with something > like pcs resource disable? > > I did it outside of the cluster to simulate a failure. I turned off

Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-11 Thread Ken Gaillot
On Fri, 2023-12-08 at 17:44 +0300, Artem wrote: > Hello experts. > > I use pacemaker for a Lustre cluster. But for simplicity and > exploration I use a Dummy resource. I didn't like how resource > performed failover and failback. When I shut down VM with remote > agent, pacemaker tries to restart

Re: [ClusterLabs] [EXT] Prevent cluster transition when resource unavailable on both nodes

2023-12-11 Thread Alexander Eastwood
Hi, Thanks Ken and Ulrich for your replies. With your suggestions I ended up finding out about ocf:heartbeat:ethmonitor and will try to set this up as an additional resource within our cluster. I can share more information once (if!) I have it working the way I want to. Cheers, Alex > On

Re: [ClusterLabs] how to colocate promoted resources ?

2023-12-08 Thread Jehan-Guillaume de Rorthais via Users
On Fri, 8 Dec 2023 17:11:58 +0100 lejeczek via Users wrote: ... > Apologies, perhaps I was quite vague. > I was thinking - having a 3-node HA cluster and 3-node > single-master->slaves pgSQL, now.. > say, I want pgSQL masters to spread across HA cluster so in > theory - having each HA node

Re: [ClusterLabs] how to colocate promoted resources ?

2023-12-08 Thread lejeczek via Users
On 08/12/2023 13:25, Jehan-Guillaume de Rorthais wrote: Hi, On Wed, 6 Dec 2023 10:36:39 +0100 lejeczek via Users wrote: How do you colocate your promoted resources with balancing underlying resources as priority? What do you mean? With a simple scenario, say 3 nodes and 3 pgSQL

[ClusterLabs] ocf:pacemaker:ping works strange

2023-12-08 Thread Artem
Hello experts. I use pacemaker for a Lustre cluster. But for simplicity and exploration I use a Dummy resource. I didn't like how resource performed failover and failback. When I shut down VM with remote agent, pacemaker tries to restart it. According to pcs status it marks the resource (not RA)

Re: [ClusterLabs] how to colocate promoted resources ?

2023-12-08 Thread Jehan-Guillaume de Rorthais via Users
Hi, On Wed, 6 Dec 2023 10:36:39 +0100 lejeczek via Users wrote: > How do you colocate your promoted resources with balancing > underlying resources as priority? What do you mean? > With a simple scenario, say > 3 nodes and 3 pgSQL clusters > what would be best possible way - I'm thinking

[ClusterLabs] Release crmsh 4.6.0-rc1

2023-12-07 Thread Xin Liang via Users
Hello everyone! I am pleased to announce that crmsh 4.6.0-rc1 is now available for release! Changes since tag 4.5.0 Features: - bootstrap: Support ssh-agent and crmsh could no longer rely on the private key in the cluster nodes (#1261) - prun:

Re: [ClusterLabs] ethernet link up/down - ?

2023-12-07 Thread Reid Wahl
On Thu, Dec 7, 2023 at 7:34 AM lejeczek via Users wrote: > > > > On 04/12/2023 20:58, Reid Wahl wrote: > > On Thu, Nov 30, 2023 at 10:30 AM lejeczek via Users > > wrote: > >> > >> > >> On 07/02/2022 20:09, lejeczek via Users wrote: > >>> Hi guys > >>> > >>> How do you guys go about doing link

Re: [ClusterLabs] How to achieve next order behavior for start/stop action for failover

2023-12-07 Thread Reid Wahl
On Thu, Dec 7, 2023 at 7:07 AM Novik Arthur wrote: > > Thank you Reid! > "Mandatory" with "symmetrical=false" did exactly what I wanted. > > Sincerely yours, > A Adding the list back to confirm this is resolved. Thanks for confirming and for correcting my typo! Glad it's working as desired. >

[ClusterLabs] Pacemaker 2.1.7-rc3 now available (likely final)

2023-12-07 Thread Ken Gaillot
Hi all, Source code for the third (and likely final) release candidate for Pacemaker version 2.1.7 is available at: https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.7-rc3 This release candidate fixes a couple issues introduced in rc1. See the ChangeLog or the link above for

Re: [ClusterLabs] ethernet link up/down - ?

2023-12-07 Thread lejeczek via Users
On 04/12/2023 20:58, Reid Wahl wrote: On Thu, Nov 30, 2023 at 10:30 AM lejeczek via Users wrote: On 07/02/2022 20:09, lejeczek via Users wrote: Hi guys How do you guys go about doing link up/down as a resource? many thanks, L. With simple tests I confirmed that indeed Linux - on my

Re: [ClusterLabs] [EXT] Prevent cluster transition when resource unavailable on both nodes

2023-12-07 Thread Windl, Ulrich
Hi! What about this: Run a ping node for a remote resource to set up some score value. If the remote is unreachable, the score will reflect that. Then add a rule checking that score, deciding whether to run the virtual IP or not. Regards, Ulrich -Original Message- From: Users On Behalf

Re: [ClusterLabs] [EXT] IPaddr2 Started (disabled) ?

2023-12-06 Thread Windl, Ulrich
The „disabled“ makes me wonder. From: Users On Behalf Of lejeczek via Users Sent: Monday, December 4, 2023 10:21 AM To: users@clusterlabs.org Cc: lejeczek Subject: [EXT] [ClusterLabs] IPaddr2 Started (disabled) ? hi guys. A cluster thinks the resource is up: ... * HA-10-1-1-80

Re: [ClusterLabs] Prevent cluster transition when resource unavailable on both nodes

2023-12-06 Thread Ken Gaillot
On Wed, 2023-12-06 at 17:55 +0100, Alexander Eastwood wrote: > Hello, > > I administrate a Pacemaker cluster consisting of 2 nodes, which are > connected to each other via ethernet cable to ensure that they are > always able to communicate with each other. A network switch is also > connected to

[ClusterLabs] Prevent cluster transition when resource unavailable on both nodes

2023-12-06 Thread Alexander Eastwood
Hello, I administrate a Pacemaker cluster consisting of 2 nodes, which are connected to each other via ethernet cable to ensure that they are always able to communicate with each other. A network switch is also connected to each node via ethernet cable and provides external access. One of

[ClusterLabs] how to colocate promoted resources ?

2023-12-06 Thread lejeczek via Users
Hi guys. How do you colocate your promoted resources with balancing underlying resources as priority? With a simple scenario, say 3 nodes and 3 pgSQL clusters what would be best possible way - I'm thinking most gentle at the same time, if that makes sense. many thanks,

Re: [ClusterLabs] make promoted follow promoted resource ?

2023-12-06 Thread lejeczek via Users
On 26/11/2023 12:20, Reid Wahl wrote: On Sun, Nov 26, 2023 at 1:32 AM lejeczek via Users wrote: Hi guys. With these: -> $ pcs resource status REDIS-6381-clone * Clone Set: REDIS-6381-clone [REDIS-6381] (promotable): * Promoted: [ ubusrv2 ] * Unpromoted: [ ubusrv1 ubusrv3 ] ->

Re: [ClusterLabs] Setting up an Active/Active Pacemaker cluster for a Postfix/Dovecot cluster, using a DRBD backend for the data storage

2023-12-05 Thread Raphael DUBOIS-LISKI
Hi, Thank you for your quick response, I am indeed using a diskless watchdog, I have already looked into setting up a device-dependent watchdog, but wouldn’t that create a single point of failure in case that common drive becomes unavailable? [SOGET] Raphael DUBOIS-LISKI Ingénieur

Re: [ClusterLabs] Redundant entries in log

2023-12-05 Thread Ken Gaillot
On Tue, 2023-12-05 at 17:21 +, Jean-Baptiste Skutnik wrote: > Hi, > > It was indeed a configuration of 1m on the recheck interval that > triggered the transitions. > > Could you elaborate on why this is not relevant anymore ? I am > training > on the HA stack and if there are mechanisms to

Re: [ClusterLabs] Setting up an Active/Active Pacemaker cluster for a Postfix/Dovecot cluster, using a DRBD backend for the data storage

2023-12-05 Thread Damiano Giuliani
It could be the watchdog? Are you using a diskless watchdog? Two nodes are not supported in diskless mode. On Tue, Dec 5, 2023, 5:40 PM Raphael DUBOIS-LISKI < raphael.dubois-li...@soget.fr> wrote: > Hello, > > > > I am seeking help for the setup of an Active/Active pacemaker cluster that > relies on

Re: [ClusterLabs] Redundant entries in log

2023-12-05 Thread Jean-Baptiste Skutnik via Users
Hi, It was indeed a configuration of 1m on the recheck interval that triggered the transitions. Could you elaborate on why this is not relevant anymore? I am training on the HA stack and if there are mechanisms to detect failure more advanced than a recheck I would be interested in what to look

Re: [ClusterLabs] RemoteOFFLINE status, permanently

2023-12-04 Thread Artem
Thank you very much Ken! I missed this step. Now I clearly see it in Morrone_LUG2017.pdf I added the constraint and RA became online. What bugs me is the following. I destroyed and recreated the cluster with the same settings on designated hosts and nothing worked - always RemoteOFFLINE. But when

Re: [ClusterLabs] RemoteOFFLINE status, permanently

2023-12-04 Thread Ken Gaillot
On Wed, 2023-11-29 at 12:56 +0300, Artem wrote: > Hello, > > I deployed a Lustre cluster with 3 nodes (metadata) as > pacemaker/corosync and 4 nodes as Remote Agents (for data). Initially > all went well, I've set up MGS and MDS resources, checked failover > and failback, remote agents were

Re: [ClusterLabs] ethernet link up/down - ?

2023-12-04 Thread Reid Wahl
On Thu, Nov 30, 2023 at 10:30 AM lejeczek via Users wrote: > > > > On 07/02/2022 20:09, lejeczek via Users wrote: > > Hi guys > > > > How do you guys go about doing link up/down as a resource? > > > > many thanks, L. > > > > With simple tests I confirmed that indeed Linux - on my > hardware at

Re: [ClusterLabs] IPaddr2 Started (disabled) ?

2023-12-04 Thread Reid Wahl
On Mon, Dec 4, 2023 at 1:21 AM lejeczek via Users wrote: > > hi guys. > > A cluster thinks the resource is up: > ... > * HA-10-1-1-80(ocf:heartbeat:IPaddr2): Started ubusrv3 (disabled) > .. > while it is not the case. What might it mean? > Config is simple: > -> $ pcs resource config

Re: [ClusterLabs] How to achieve next order behavior for start/stop action for failover

2023-12-04 Thread Reid Wahl
On Mon, Dec 4, 2023 at 8:48 AM Novik Arthur wrote: > > Hello community! > I'm not sure if pacemaker can do it or not with current logic (maybe it could > be a new feature), but it's worth asking before starting to "build my own > Luna-park ,with blackjack and " > > Right now I have

[ClusterLabs] How to achieve next order behavior for start/stop action for failover

2023-12-04 Thread Novik Arthur
Hello community! I'm not sure if pacemaker can do it or not with current logic (maybe it could be a new feature), but it's worth asking before starting to "build my own Luna-park, with blackjack and " Right now I have something like: MGS -> MDT -> OST order mdt-after-mgs Optional:

[ClusterLabs] IPaddr2 Started (disabled) ?

2023-12-04 Thread lejeczek via Users
hi guys. A cluster thinks the resource is up: ...   * HA-10-1-1-80    (ocf:heartbeat:IPaddr2):     Started ubusrv3 (disabled) .. while it is not the case. What might it mean? Config is simple: -> $ pcs resource config HA-10-1-1-80  Resource: HA-10-1-1-80 (class=ocf provider=heartbeat
