Hi, thank you for your quick response.
I am indeed using a diskless watchdog. I have already looked into setting up a device-dependent watchdog, but wouldn't that create a single point of failure, in case that shared drive becomes unavailable?
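For reference, the device-based setup I had looked into was roughly the following. This is only a sketch, not my actual configuration: /dev/vdb is a placeholder for a small dedicated LUN that both nodes can see.

    # /etc/sysconfig/sbd, on both nodes
    SBD_DEVICE="/dev/vdb"
    SBD_WATCHDOG_DEV="/dev/watchdog"
    SBD_WATCHDOG_TIMEOUT="5"

    # initialise the SBD metadata once, from either node, and verify it
    sbd -d /dev/vdb create
    sbd -d /dev/vdb dump

    # enable the sbd service (takes effect after a cluster restart)
    # and add a poison-pill fence device on top of the watchdog
    systemctl enable sbd
    pcs stonith create fence-sbd fence_sbd devices=/dev/vdb

From what I read, SBD_DEVICE can list up to three devices separated by semicolons, so sbd itself can tolerate the loss of one disk, but in our environment all three would still sit on the same storage backend.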
Raphael DUBOIS-LISKI
Systems and Network Engineer
+33 2 35 19 25 54
SOGET SA • 4, rue des Lamaneurs • 76600 Le Havre, FR

From: Damiano Giuliani <damianogiulian...@gmail.com>
Sent: Tuesday, 5 December 2023 18:30
To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
Subject: Re: [ClusterLabs] Setting up an Active/Active Pacemaker cluster for a Postfix/Dovecot cluster, using a DRBD backend for the data storage

Could it be the watchdog? Are you using a diskless watchdog? Two nodes are not supported in diskless mode.

On Tue, Dec 5, 2023, 5:40 PM Raphael DUBOIS-LISKI <raphael.dubois-li...@soget.fr> wrote:

Hello,

I am seeking help with the setup of an Active/Active Pacemaker cluster that relies on DRBD as the data storage backend. The solution runs on two RHEL 9 VMs, and the file system used is GFS2. Linked is a PDF of the infrastructure I am currently experimenting on.

For context, this is my Pacemaker cluster config:

    Cluster Name: mycluster
    Corosync Nodes:
      Node1 Node2
    Pacemaker Nodes:
      Node1 Node2

    Resources:
      Clone: Data-clone
        Meta Attributes: Data-clone-meta_attributes
          clone-max=2 clone-node-max=1 notify=true promotable=true promoted-max=2 promoted-node-max=1
        Resource: Data (class=ocf provider=linbit type=drbd)
          Attributes: Data-instance_attributes
            drbd_resource=drbd0
          Operations:
            demote: Data-demote-interval-0s interval=0s timeout=90
            monitor: Data-monitor-interval-60s interval=60s
            notify: Data-notify-interval-0s interval=0s timeout=90
            promote: Data-promote-interval-0s interval=0s timeout=90
            reload: Data-reload-interval-0s interval=0s timeout=30
            start: Data-start-interval-0s interval=0s timeout=240
            stop: Data-stop-interval-0s interval=0s timeout=100
      Clone: dlm-clone
        Meta Attributes: dlm-clone-meta_attributes
          clone-max=2 clone-node-max=1
        Resource: dlm (class=ocf provider=pacemaker type=controld)
          Operations:
            monitor: dlm-monitor-interval-60s interval=60s
            start: dlm-start-interval-0s interval=0s timeout=90s
            stop: dlm-stop-interval-0s interval=0s timeout=100s
      Clone: FS-clone
        Resource: FS (class=ocf provider=heartbeat type=Filesystem)
          Attributes: FS-instance_attributes
            device=/dev/drbd0 directory=/home/vusers fstype=gfs2
          Operations:
            monitor: FS-monitor-interval-20s interval=20s timeout=40s
            start: FS-start-interval-0s interval=0s timeout=60s
            stop: FS-stop-interval-0s interval=0s timeout=60s
      Clone: smtp_postfix-clone
        Meta Attributes: smtp_postfix-clone-meta_attributes
          clone-max=2 clone-node-max=1
        Resource: smtp_postfix (class=ocf provider=heartbeat type=postfix)
          Operations:
            monitor: smtp_postfix-monitor-interval-60s interval=60s timeout=20s
            reload: smtp_postfix-reload-interval-0s interval=0s timeout=20s
            start: smtp_postfix-start-interval-0s interval=0s timeout=20s
            stop: smtp_postfix-stop-interval-0s interval=0s timeout=20s
      Clone: WebSite-clone
        Resource: WebSite (class=ocf provider=heartbeat type=apache)
          Attributes: WebSite-instance_attributes
            configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status
          Operations:
            monitor: WebSite-monitor-interval-1min interval=1min
            start: WebSite-start-interval-0s interval=0s timeout=40s
            stop: WebSite-stop-interval-0s interval=0s timeout=60s

    Colocation Constraints:
      resource 'FS-clone' with Promoted resource 'Data-clone' (id: colocation-FS-Data-clone-INFINITY) score=INFINITY
      resource 'WebSite-clone' with resource 'FS-clone' (id: colocation-WebSite-FS-INFINITY) score=INFINITY
      resource 'FS-clone' with resource 'dlm-clone' (id: colocation-FS-dlm-clone-INFINITY) score=INFINITY
      resource 'FS-clone' with resource 'smtp_postfix-clone' (id: colocation-FS-clone-smtp_postfix-clone-INFINITY) score=INFINITY
    Order Constraints:
      promote resource 'Data-clone' then start resource 'FS-clone' (id: order-Data-clone-FS-mandatory)
      start resource 'FS-clone' then start resource 'WebSite-clone' (id: order-FS-WebSite-mandatory)
      start resource 'dlm-clone' then start resource 'FS-clone' (id: order-dlm-clone-FS-mandatory)
      start resource 'FS-clone' then start resource 'smtp_postfix-clone' (id: order-FS-clone-smtp_postfix-clone-mandatory)

    Resources Defaults:
      Meta Attrs: build-resource-defaults
        resource-stickiness=1 (id: build-resource-stickiness)
    Operations Defaults:
      Meta Attrs: op_defaults-meta_attributes
        timeout=240s (id: op_defaults-meta_attributes-timeout)

    Cluster Properties: cib-bootstrap-options
      cluster-infrastructure=corosync
      cluster-name=mycluster
      dc-version=2.1.6-9.el9-6fdc9deea29
      have-watchdog=true
      last-lrm-refresh=1701787695
      no-quorum-policy=ignore
      stonith-enabled=true
      stonith-watchdog-timeout=10

And this is my DRBD configuration:

    global {
        usage-count no;
    }

    common {
        disk {
            resync-rate 100M;
            al-extents 257;
        }
    }

    resource drbd0 {
        protocol C;

        handlers {
            pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger";
            pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger";
            local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger; halt";
            fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
            split-brain "/usr/lib/drbd/notify-split-brain.sh root";
            out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        }

        startup {
            wfc-timeout 1;
            degr-wfc-timeout 1;
            become-primary-on both;
        }

        net {
            # The following options handle split-brain situations
            # (e.g., if one of the nodes fails)
            after-sb-0pri discard-zero-changes;  # if both nodes are Secondary, just make one of them Primary
            after-sb-1pri discard-secondary;     # if one is Primary and one is not, trust the Primary node
            after-sb-2pri disconnect;
            allow-two-primaries yes;
            verify-alg sha1;
        }

        disk {
            on-io-error detach;
        }

        options {
            auto-promote yes;
        }

        on fradevtestmail1 {
            device /dev/drbd0;
            disk /dev/rootvg/drbdlv;
            address X.X.X.X:7788;
            flexible-meta-disk internal;
        }

        on fradevtestmail2 {
            device /dev/drbd0;
            disk /dev/rootvg/drbdlv;
            address X.X.X.X:7788;
            flexible-meta-disk internal;
        }
    }
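A note on the handlers above: from my reading of the DRBD 9 user's guide, the crm-fence-peer.9.sh / crm-unfence-peer.9.sh scripts are meant to be paired with a fencing policy, which I have not set explicitly. This is only a sketch of my understanding of what the guide recommends; the net-section placement and the unfence-peer handler (instead of after-resync-target) are my reading of the DRBD 9 documentation, not something I have tested:

    resource drbd0 {
        net {
            fencing resource-and-stonith;
        }
        handlers {
            fence-peer   "/usr/lib/drbd/crm-fence-peer.9.sh";
            unfence-peer "/usr/lib/drbd/crm-unfence-peer.9.sh";
        }
    }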
Knowing all this: the cluster works perfectly as expected when both nodes are up, but a problem arises when I put the cluster in a degraded state by killing one of the nodes improperly (to simulate an unexpected crash). This causes the remaining node to reboot and restart the cluster. Everything goes well in the resource start process until it is time to mount the file system, where it times out and fails.

Would you have any idea why this behaviour happens, and how I could fix it so that the cluster remains usable with one node down, until we can get the second node back up and running after an unexpected crash?
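In case it helps, this is roughly how I reproduce the crash and what I check afterwards. Again a sketch; the last command reflects my understanding that crm-fence-peer.9.sh adds a temporary location constraint that blocks promotion of the DRBD resource until the peer is unfenced.

    # on the node to be "crashed": hard kernel crash, no clean shutdown
    echo c > /proc/sysrq-trigger

    # on the surviving node, once it is back up
    pcs status
    drbdadm status drbd0

    # look for a leftover fencing constraint on Data-clone
    pcs constraint location config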
Many thanks for your help, and have a nice day.

BR,

Raphael DUBOIS-LISKI
Systems and Network Engineer
+33 2 35 19 25 54
SOGET SA • 4, rue des Lamaneurs • 76600 Le Havre, FR

_______________________________________________
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/