Reviewed:  https://review.openstack.org/419204
Committed: https://git.openstack.org/cgit/openstack/charm-hacluster/commit/?id=fda5176bd53f17a69f3e22b6b363bff96ff565c0
Submitter: Jenkins
Branch:    master

commit fda5176bd53f17a69f3e22b6b363bff96ff565c0
Author: David Ames <[email protected]>
Date:   Wed Jan 11 16:00:39 2017 -0800

    Fix pacemaker down crm infinite loop
    
    On corosync restart, corosync may take longer than a minute to come
    up. The systemd start script times out too soon. Then pacemaker, which
    is dependent on corosync, is immediately started and fails as corosync
    is still in the process of starting.
    
    Subsequently the charm would run crm node list to validate pacemaker.
    This would become an infinite loop.
    
    This change adds longer timeout values for systemd scripts and adds
    better error handling and communication to the end user.
    
    Change-Id: I7c3d018a03fddfb1f6bfd91fd7aeed4b13879e45
    Partial-Bug: #1654403

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1654403

Title:
  Race condition in hacluster charm that leaves pacemaker down

Status in corosync package in Ubuntu:
  New
Status in hacluster package in Juju Charms Collection:
  Triaged

Bug description:
  Symptom: one or more hacluster nodes are left in an executing state.
  Observing the process list on the affected nodes, the command 'crm node
  list' is in an infinite loop and pacemaker is not started. On nodes that
  complete 'crm node list' and other crm commands, pacemaker is started.

  See the artefacts from this run:
  https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline/openstack/charm-percona-cluster/417131/1/1873/index.html

  Hypothesis: There is a race that leads to crm node list being executed
  before pacemaker is started. It is also possible that something causes
  pacemaker to fail to start.

  Suggest a check for pacemaker health before any crm commands are run.
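
  A minimal sketch of the kind of check suggested above, not the charm's
  actual code. The names wait_for_pacemaker, service_is_active,
  crm_node_list and PacemakerNotReady are hypothetical, as are the retry
  count and delay. It assumes pacemaker runs as a systemd unit named
  "pacemaker", and it bounds the wait so that a pacemaker which never
  starts produces an error instead of an endless loop.

```python
import subprocess
import time


class PacemakerNotReady(Exception):
    """Raised when pacemaker is still not active after all retries."""


def service_is_active(name, run=subprocess.call):
    # `systemctl is-active --quiet NAME` exits 0 only when the unit is active.
    return run(["systemctl", "is-active", "--quiet", name]) == 0


def wait_for_pacemaker(retries=12, delay=5,
                       is_active=service_is_active, sleep=time.sleep):
    # Poll a bounded number of times; the values here are illustrative only.
    for attempt in range(retries):
        if is_active("pacemaker"):
            return
        if attempt < retries - 1:
            sleep(delay)
    raise PacemakerNotReady(
        "pacemaker not active after %d checks" % retries)


def crm_node_list(run=subprocess.check_output, **wait_kwargs):
    # Refuse to call crm until pacemaker is confirmed to be running.
    wait_for_pacemaker(**wait_kwargs)
    return run(["crm", "node", "list"]).decode()
```

  A caller would catch PacemakerNotReady and report the failure to the
  user (for example through the unit's status) rather than retrying
  'crm node list' indefinitely.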

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1654403/+subscriptions

_______________________________________________
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp
