For the fix to Bug #1654403, charm-hacluster sets TimeoutStartSec and
TimeoutStopSec to the same values for both corosync and pacemaker.

system-wide default (xenial, bionic): TimeoutStopSec=90s TimeoutStartSec=90s
corosync package default: system-wide default (no changes)
pacemaker package default: TimeoutStopSec=30min TimeoutStartSec=60s

charm-hacluster corosync+pacemaker override: TimeoutStopSec=60s TimeoutStartSec=180s

effective changes:
corosync  TimeoutStopSec=90s   -> 60s   TimeoutStartSec=90s -> 180s
pacemaker TimeoutStopSec=30min -> 60s   TimeoutStartSec=60s -> 180s
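
To double-check the arithmetic above, here is a rough sketch that layers
the three levels of settings (system-wide default, package default, charm
override) and prints the before/after values. The numbers come from this
comment; the dict and function names are only illustrative and are not
taken from the charm or from systemd.

```python
# Sketch only: resolve the effective timeouts by layering the settings.
# All values are in seconds and are the ones quoted in this comment.

SYSTEM_DEFAULT = {"TimeoutStopSec": 90, "TimeoutStartSec": 90}

PACKAGE_DEFAULT = {
    "corosync": {},  # no package-level settings, inherits the system-wide default
    "pacemaker": {"TimeoutStopSec": 30 * 60, "TimeoutStartSec": 60},
}

CHARM_OVERRIDE = {"TimeoutStopSec": 60, "TimeoutStartSec": 180}


def effective_changes(unit):
    """Return {setting: (before, after)} for one unit, in seconds."""
    # Package settings win over the system-wide default.
    before = {**SYSTEM_DEFAULT, **PACKAGE_DEFAULT[unit]}
    # The charm override wins over both.
    after = {**before, **CHARM_OVERRIDE}
    return {key: (before[key], after[key]) for key in before}


for unit in PACKAGE_DEFAULT:
    for key, (old, new) in effective_changes(unit).items():
        print(f"{unit} {key}: {old}s -> {new}s")
```

The pacemaker stop timeout is the only value that goes down by a large
factor (1800s to 60s); the others change by at most 2-3x.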

The original bug description was "On corosync restart, corosync may take
longer than a minute to come up. The systemd start script times out too
soon. Then pacemaker which is dependent on corosync is immediately
started and fails as corosync is still in the process of starting."

So the TimeoutStartSec increase from 60s/90s -> 180s was the only change
needed for that bug. I believe the TimeoutStopSec change for pacemaker
is in error, at least as the bug is described.

Having said that, I can imagine charm failures during deployment or
reconfiguration where the charm tries to stop pacemaker for various
reasons and pacemaker fails to stop fast enough because the resources
won't migrate away (possibly because all the nodes are trying to stop at
the same time, as charm-hacluster doesn't seem to stagger changes across
nodes). The charm also currently restarts corosync to effect changes to
the ring. So the shorter stop timeout may well have fixed other
charm-related problems that were not accurately described in the
previous bug. That bug does specifically mention cases where the
expected cluster_count is not set; in that case the charm tries to set
up corosync/pacemaker before all 3 nodes are up, which might lead to
this scenario. So before we go ahead and change the stop_timeout back to
30min, we probably need to validate various scenarios for that issue.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1903745

Title:
  pacemaker left stopped after unattended-upgrade of pacemaker
  (1.1.14-2ubuntu1.8 -> 1.1.14-2ubuntu1.9)

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-hacluster/+bug/1903745/+subscriptions
