Created one public PPA so the SRU proposal can be tested before asking for sponsorship:
https://launchpad.net/~inaddy/+archive/ubuntu/lp1353473

# apt-add-repository ppa:inaddy/lp1353473
# apt-get update
# apt-get dist-upgrade

* Attention: this will replace the current trusty pacemaker version:
    pacemaker_1.1.10+git20130802-1ubuntu2
* with version:
    pacemaker_1.1.10+git20130802-1ubuntu3
* because the versioning is already prepared for the SRU proposal.
* To get back to the current trusty version you will have to remove
* pacemaker by hand and install it again (maybe ignoring
* dependencies if you don't want to reinstall the whole set of
* clustering packages).

After upgrading to version:

    pacemaker_1.1.10+git20130802-1ubuntu3

anyone who is suffering from this issue can try to run
"crm node standby <node>" again and check whether lrmd stops monitoring
resources on nodes put to standby.

Thanks

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1353473

Title:
  Trusty Pacemaker "crm node standby" stops resource successfully, but
  lrmd still monitors it and causes "Failed actions"

Status in “pacemaker” package in Ubuntu:
  Confirmed

Bug description:
  The following situation was brought to me (~inaddy):

  """"""
  * Environment
    Ubuntu 14.04 LTS
    Pacemaker 1.1.10+git20130802-1ubuntu2

  * Priority
    High

  * Issue
    I used "crm node standby" and the resource (haproxy) was stopped
    successfully. But lrmd still monitors it and causes "Failed actions".
  ---------------------------------------
  Node A1LB101 (167969461): standby
  Online: [ A1LB102 ]

   Resource Group: grpHaproxy
       vip-internal (ocf::heartbeat:IPaddr2): Started A1LB102
       vip-external (ocf::heartbeat:IPaddr2): Started A1LB102
       vip-nfs (ocf::heartbeat:IPaddr2): Started A1LB102
       vip-iscsi (ocf::heartbeat:IPaddr2): Started A1LB102
   Resource Group: grpStonith1
       prmStonith1-1 (stonith:external/stonith-helper): Started A1LB102
   Clone Set: clnHaproxy [haproxy]
       Started: [ A1LB102 ]
       Stopped: [ A1LB101 ]
   Clone Set: clnPing [ping]
       Started: [ A1LB102 ]
       Stopped: [ A1LB101 ]

  Node Attributes:
  * Node A1LB101:
  * Node A1LB102:
      + default_ping_set : 400

  Migration summary:
  * Node A1LB101:
     haproxy: migration-threshold=1 fail-count=18 last-failure='Mon Jul 7 20:28:58 2014'
  * Node A1LB102:

  Failed actions:
      haproxy_monitor_10000 (node=A1LB101, call=2332, rc=7, status=complete, last-rc-change=Mon Jul 7 20:28:58 2014, queued=0ms, exec=0ms): not running
  ---------------------------------------

  Excerpt from log (ha-log.node1)

  Jul 7 20:28:50 A1LB101 crmd[6364]: notice: te_rsc_command: Initiating action 42: stop haproxy_stop_0 on A1LB101 (local)
  Jul 7 20:28:50 A1LB101 crmd[6364]: info: match_graph_event: Action haproxy_stop_0 (42) confirmed on A1LB101 (rc=0)
  Jul 7 20:28:58 A1LB101 crmd[6364]: notice: process_lrm_event: A1LB101-haproxy_monitor_10000:1372 [ haproxy not running.\n ]
  """"""

  I haven't been able to reproduce this error so far, but the fix seems
  to be a straightforward cherry-pick of the upstream patch set:

  c72bfea664bd04656c306409381cef824679ea06
  [PATCH 1/3] Fix: services: Do not allow duplicate recurring op entries.

  7a02cd7745d56009ac65251c77d0fe052008224f
  [PATCH 2/3] High: lrmd: Merge duplicate recurring monitor operations.

  7e37f9bb35534102b83e2bc45941036361e33214
  [PATCH 3/3] Fix: lrmd: Cancel recurring operations before stop action is executed

  So I'm assuming (and testing right now) this will fix the issue...
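
For anyone testing the fix, the symptom in the status output above (a failed recurring monitor reported for a node that is in standby) can be checked mechanically. What follows is only a minimal Python sketch: the function name `failed_monitors_on_standby` and both regular expressions are hypothetical helpers written against the text format shown in this report, not part of Pacemaker, and other Pacemaker versions may print the status differently.

```python
import re

# One "Failed actions" entry as printed in the report above, e.g.
# "haproxy_monitor_10000 (node=A1LB101, call=2332, rc=7, ..."
# (assumption: the layout matches the report; other versions may differ).
FAILED_OP = re.compile(
    r"(?P<rsc>[\w-]+)_monitor_(?P<interval>\d+)\s*"
    r"\(node=(?P<node>[\w.-]+),\s*call=(?P<call>\d+),\s*rc=(?P<rc>\d+)"
)

# A standby node line, e.g. "Node A1LB101 (167969461): standby"
STANDBY = re.compile(r"Node\s+(?P<node>[\w.-]+)\s+\(\d+\):\s+standby")


def failed_monitors_on_standby(status_text):
    """Return (resource, node, rc) for failed monitors on standby nodes."""
    standby = {m.group("node") for m in STANDBY.finditer(status_text)}
    return [
        (m.group("rsc"), m.group("node"), int(m.group("rc")))
        for m in FAILED_OP.finditer(status_text)
        if m.group("node") in standby
    ]


if __name__ == "__main__":
    # Abridged copy of the status output quoted in the bug description.
    sample = (
        "Node A1LB101 (167969461): standby\n"
        "Online: [ A1LB102 ]\n"
        "Failed actions:\n"
        "    haproxy_monitor_10000 (node=A1LB101, call=2332, rc=7, "
        "status=complete): not running\n"
    )
    print(failed_monitors_on_standby(sample))
```

An empty list after putting a node into standby would suggest that no monitor failure was recorded for a standby node. It does not by itself prove that lrmd cancelled the recurring operation, so the logs are still worth checking.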
  Opening the public bug for the fix I'll provide after tests, and to
  ask others to test the fix also.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1353473/+subscriptions

_______________________________________________
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp

