Hi! Evidently you violated the most important cluster rule, which is "be patient". Perhaps the next most important is "Don't change the configuration while the cluster is not in the IDLE state" ;-)
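A minimal sketch of following both rules when lifting a ban, assuming your crm_resource build supports --wait (which blocks until the cluster has settled). The helper name clear_ban_when_idle is made up for illustration and is not part of Pacemaker:

```shell
#!/bin/sh
# Sketch: remove a "crm_resource --ban" constraint only once the cluster is idle.
# clear_ban_when_idle is a hypothetical helper, not a Pacemaker command.
clear_ban_when_idle() {
    rsc="$1"
    if [ -z "$rsc" ]; then
        echo "usage: clear_ban_when_idle RESOURCE" >&2
        return 2
    fi
    # Block until no transition is in progress
    # (assumption: this crm_resource version supports --wait).
    crm_resource --wait || return 1
    # Only now delete the constraint that --ban created.
    crm_resource --clear -r "$rsc"
}
```

It would be called as "clear_ban_when_idle vm-conv-4" after the --ban, instead of running --clear while the migration is still in flight.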
I feel these are issues that should be fixed, but the above rules make your life easier while these issues still exist.

Regards,
Ulrich

>>> Ferenc Wágner <wagner.fer...@kifu.gov.hu> wrote on 27.09.2018 at 08:37 in message <87tvmb5ttw....@lant.ki.iif.hu>:
> Hi,
>
> The current behavior of cancelled migration with Pacemaker 1.1.16 with a
> resource implementing push migration:
>
> # /usr/sbin/crm_resource --ban -r vm-conv-4
>
> vhbl03 crmd[10017]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> vhbl03 pengine[10016]: notice: Migrate vm-conv-4#011(Started vhbl07 -> vhbl04)
> vhbl03 crmd[10017]: notice: Initiating migrate_to operation
> vm-conv-4_migrate_to_0 on vhbl07
> vhbl03 pengine[10016]: notice: Calculated transition 4633, saving inputs
> in /var/lib/pacemaker/pengine/pe-input-1069.bz2
> [...]
>
> At this point, with the migration still ongoing, I wanted to get rid of
> the constraint:
>
> # /usr/sbin/crm_resource --clear -r vm-conv-4
>
> vhbl03 crmd[10017]: notice: Transition aborted by deletion of
> rsc_location[@id='cli-ban-vm-conv-4-on-vhbl07']: Configuration change
> vhbl07 crmd[10233]: notice: Result of migrate_to operation for vm-conv-4 on
> vhbl07: 0 (ok)
> vhbl03 crmd[10017]: notice: Transition 4633 (Complete=6, Pending=0,
> Fired=0, Skipped=1, Incomplete=6,
> Source=/var/lib/pacemaker/pengine/pe-input-1069.bz2): Stopped
> vhbl03 pengine[10016]: notice: Resource vm-conv-4 can no longer migrate to
> vhbl04. Stopping on vhbl07 too
> vhbl03 pengine[10016]: notice: Reload vm-conv-4#011(Started vhbl07)
> vhbl03 pengine[10016]: notice: Calculated transition 4634, saving inputs
> in /var/lib/pacemaker/pengine/pe-input-1070.bz2
> vhbl03 crmd[10017]: notice: Initiating stop operation vm-conv-4_stop_0 on
> vhbl07
> vhbl03 crmd[10017]: notice: Initiating stop operation vm-conv-4_stop_0 on
> vhbl04
> vhbl03 crmd[10017]: notice: Initiating reload operation vm-conv-4_reload_0
> on vhbl04
>
> This recovery was entirely unnecessary, as the resource successfully
> migrated to vhbl04 (the migrate_from operation does nothing). Pacemaker
> does not know this, but is there a way to educate it? I think in this
> special case it is possible to redesign the agent making migrate_to a
> no-op and doing everything in migrate_from, which would significantly
> reduce the window between the start points of the two "halves", but I'm
> not sure that would help in the end: Pacemaker could still decide to do
> an unnecessary stop+start recovery. Would it? I failed to find any
> documentation on recovery from aborted migration transitions. I don't
> expect on-fail (for migrate_* ops, not me) to apply here, does it?
>
> Side question: why initiate a reload in any case, like above?
>
> Even more side question: could you please consider using space instead
> of TAB in syslog messages? (Actually, I wouldn't mind getting rid of
> them altogether in any output.)
> --
> Thanks,
> Feri
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org