Additional information from the charm:
If cluster_count is not set to NUM_UNITS, a race occurs: the relation
to the last hacluster node is not yet set, leading to an attempt to
start corosync and pacemaker with only n-1 of n nodes.
The last node is aware of only one relation so far, when there should be
2 relations:
relation-list -r hanode:0
hacluster/0
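The condition that cluster_count is meant to enforce can be sketched as follows. This is a minimal illustration, not the charm's actual code: the function name enough_peers and its arguments are hypothetical, and it assumes that relation-list reports only the remote units, so the local unit is counted separately.

```python
def enough_peers(related_units, cluster_count):
    """Return True once this unit plus its related peers reach cluster_count.

    related_units: unit names as printed by relation-list (remote units only).
    cluster_count: expected total number of hacluster units (NUM_UNITS).
    """
    # Assumption: the local unit is not in related_units, so add 1 for it.
    return len(set(related_units)) + 1 >= cluster_count


# With the output above and 3 expected units, configuration should wait.
print(enough_peers(["hacluster/0"], 3))
print(enough_peers(["hacluster/0", "hacluster/1"], 3))
```

Under this sketch, the unit shown above would defer starting corosync until the second relation appears.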
corosync.conf looks like the following when there should be 3 nodes:
nodelist {
    node {
        ring0_addr: 10.5.35.235
        nodeid: 1000
    }
    node {
        ring0_addr: 10.5.35.237
        nodeid: 1001
    }
}
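A quick way to spot this state is to count the node entries in corosync.conf and compare the result with the expected unit count. The following is a rough sketch only: count_corosync_nodes is a hypothetical helper, and the regular expression assumes each node block opens with "node {" at the start of a line, as in the excerpt above.

```python
import re


def count_corosync_nodes(conf_text):
    """Count 'node { ... }' entries in the text of a corosync.conf file."""
    # Matches lines opening a node block; "nodelist {" does not match because
    # "node" must be followed directly by optional whitespace and "{".
    return len(re.findall(r"^\s*node\s*\{", conf_text, flags=re.MULTILINE))


sample = """
nodelist {
    node {
        ring0_addr: 10.5.35.235
        nodeid: 1000
    }
    node {
        ring0_addr: 10.5.35.237
        nodeid: 1001
    }
}
"""

expected = 3  # cluster_count / NUM_UNITS for this deployment
found = count_corosync_nodes(sample)
print(found, "of", expected, "nodes configured")
```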
The services themselves (not the charm) fail:
corosync logs thousands of RETRANSMIT errors.
pacemaker eventually times out after waiting on corosync.
I am adding more documentation to encourage setting cluster_count and
updating the amulet tests to set it.
--
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1654403
Title:
Race condition in hacluster charm that leaves pacemaker down
Status in corosync package in Ubuntu:
New
Status in hacluster package in Juju Charms Collection:
Triaged
Bug description:
Symptom: one or more hacluster nodes are left in an executing state.
In the process list on the affected nodes, the command 'crm node list'
is stuck in an infinite loop and pacemaker is not started. On nodes where
'crm node list' and the other crm commands complete, pacemaker is started.
See the artefacts from this run:
https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline/openstack/charm-percona-cluster/417131/1/1873/index.html
Hypothesis: There is a race that leads to crm node list being executed
before pacemaker is started. It is also possible that something causes
pacemaker to fail to start.
Suggest a check for pacemaker health before any crm commands are run.
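One possible shape for such a check is a bounded wait that gives up instead of looping forever. This is only a sketch under stated assumptions: the names wait_for and pacemaker_active are hypothetical, the timeout values are arbitrary, and it assumes pacemaker runs as a systemd unit named "pacemaker". It is not what the charm currently does.

```python
import subprocess
import time


def pacemaker_active():
    """Return True if systemd reports the pacemaker unit as active."""
    # Assumption: systemd manages a unit named "pacemaker" on this host.
    try:
        rc = subprocess.call(["systemctl", "is-active", "--quiet", "pacemaker"])
    except OSError:
        return False  # systemctl is not available
    return rc == 0


def wait_for(check, timeout=60.0, interval=2.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or timeout seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would run crm commands only if wait_for(pacemaker_active) returns True, and report an error otherwise rather than retrying indefinitely.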