Dear List,

I have been running two separate 3-node clusters for some time, and I am hitting a fatal problem with corosync: after a node failure, corosync does NOT start on the rebooted node.

Cluster details:

  • All nodes are running Ubuntu Server 24.04
  • corosync is 3.1.7
  • corosync-qdevice is 3.0.3
  • pacemaker is 2.1.6
  • The third node in both clusters is a quorum device. The clusters use the ffsplit algorithm.
  • All nodes are bare metal and attached to a dedicated kronosnet network.
  • STONITH is enabled in one of the clusters and disabled in the other.

The corosync and pacemaker systemd services are disabled at boot. I start each cluster manually with the command pcs cluster start.

corosync NEVER starts after a node failure (once the node has been rebooted). /var/log/corosync/corosync.log shows nothing beyond the following lines, where the service freezes:

Aug 01 12:54:56 [3193] charon corosync notice  [MAIN  ] Corosync Cluster Engine 3.1.7 starting up
Aug 01 12:54:56 [3193] charon corosync info    [MAIN  ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf vqsim nozzle snmp pie relro bindnow

corosync never gets as far as starting kronosnet. I checked the kronosnet interfaces and they are all OK, with IP connectivity between the nodes. Running corosync -t produces the same freeze.
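To make the "only MAIN, never kronosnet" observation easy to confirm on each occurrence, here is a minimal sketch of a helper that lists the subsystem tags present in a corosync log. The function name log_subsystems is hypothetical, and the sketch assumes that every log line carries a bracketed upper-case tag such as [MAIN  ], as in the two lines quoted above.

```shell
# Hypothetical helper (not part of corosync or pcs): read a corosync log on
# stdin and print each distinct subsystem tag once.
# Assumption: tags look like "[MAIN  ]", i.e. upper-case letters padded with
# spaces inside square brackets. The PID field "[3193]" is numeric, so it is
# not matched.
log_subsystems() {
    grep -oE '\[[A-Z]+ *\]' | tr -d '[] ' | sort -u
}
```

Fed the log of a frozen start (for example with log_subsystems < /var/log/corosync/corosync.log), this should print only MAIN if the freeze happens before any other subsystem logs anything.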

The ONLY way I have managed to start corosync is by reinstalling it: apt reinstall corosync ; pcs cluster start.

This issue has occurred at least 5-6 times. I do NOT see anything in syslog either. I would be grateful for any guidance on how to solve this.

Thanks,

Murat
