Dear List,
I have been running two different 3-node clusters for some time. I am having a fatal problem with corosync: after a node failure, the rebooted node does NOT start corosync.
Cluster details:
- All nodes are running Ubuntu Server 24.04
- corosync is 3.1.7
- corosync-qdevice is 3.0.3
- pacemaker is 2.1.6
- The third node in both clusters is a quorum device. The clusters use the ffsplit algorithm.
- All nodes are bare metal & attached to a dedicated kronosnet network.
- STONITH is enabled in one of the clusters and disabled for the other.
The corosync & pacemaker systemd services are disabled at boot. I start each cluster with the command pcs cluster start.
corosync NEVER starts AFTER a node failure (that is, once the node has been rebooted). There is nothing in /var/log/corosync/corosync.log, and the service freezes at:
Aug 01 12:54:56 [3193] charon corosync notice [MAIN ] Corosync Cluster Engine 3.1.7 starting up
Aug 01 12:54:56 [3193] charon corosync info [MAIN ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf vqsim nozzle snmp pie relro bindnow
corosync never starts kronosnet. I checked the kronosnet interfaces and they are all OK; there is IP connectivity between the nodes. If I run corosync -t, it freezes in the same way.
I could ONLY manage to start corosync by reinstalling it: apt reinstall corosync ; pcs cluster start.
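For reference, the workaround can be sketched as the small script below. The run helper, the recover_corosync function and the DRY_RUN variable are illustrative names I made up for this sketch, not part of corosync or pcs; with the default DRY_RUN=1 the script only prints the two commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the workaround described above.
# DRY_RUN, run and recover_corosync are hypothetical names for illustration.
# DRY_RUN defaults to 1, so the commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it as given.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# The two steps from this report: reinstall corosync, then start the cluster.
recover_corosync() {
    run apt reinstall corosync
    run pcs cluster start
}

recover_corosync
```

Setting DRY_RUN=0 (as root, on the affected node) would run the same two commands for real.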
The above issue has repeated itself at least 5-6 times. I do NOT see anything in syslog either. I would be grateful for any guidance on how to solve this.
Thanks,
Murat