On Thu, 2018-09-27 at 09:36 +0200, Ulrich Windl wrote:
> Hi!
> 
> Obviously you violated the most important cluster rule that is "be
> patient".
> Maybe the next important is "Don't change the configuration while the
> cluster is not in IDLE state" ;-)

Agreed -- although even idle, removing a ban can result in a migration
back (if something like stickiness doesn't prevent it).

There's currently no way to tell pacemaker that an operation (i.e.
migrate_from) is a no-op and can be ignored. If a migration is only
partially completed, it has to be considered a failure and reverted.

I'm not sure why the reload was scheduled; I suspect it's a bug due to
a restart being needed but no parameters having changed. There should
be special handling for a partial migration to make the stop required.

> I feel these are issues that should be fixed, but the above rules
> make your life easier while these issues still exist.
> 
> Regards,
> Ulrich
> 
> >>> Ferenc Wágner <wagner.fer...@kifu.gov.hu> wrote on 27.09.2018 at 08:37
> in message <87tvmb5ttw....@lant.ki.iif.hu>:
> > Hi,
> > 
> > The current behavior of cancelled migration with Pacemaker 1.1.16 with a
> > resource implementing push migration:
> > 
> > # /usr/sbin/crm_resource --ban -r vm-conv-4
> > 
> > vhbl03 crmd[10017]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> > vhbl03 pengine[10016]: notice: Migrate vm-conv-4#011(Started vhbl07 -> vhbl04)
> > vhbl03 crmd[10017]: notice: Initiating migrate_to operation vm-conv-4_migrate_to_0 on vhbl07
> > vhbl03 pengine[10016]: notice: Calculated transition 4633, saving inputs in /var/lib/pacemaker/pengine/pe-input-1069.bz2
> > [...]
> > 
> > At this point, with the migration still ongoing, I wanted to get rid of
> > the constraint:
> > 
> > # /usr/sbin/crm_resource --clear -r vm-conv-4
> > 
> > vhbl03 crmd[10017]: notice: Transition aborted by deletion of rsc_location[@id='cli-ban-vm-conv-4-on-vhbl07']: Configuration change
> > vhbl07 crmd[10233]: notice: Result of migrate_to operation for vm-conv-4 on vhbl07: 0 (ok)
> > vhbl03 crmd[10017]: notice: Transition 4633 (Complete=6, Pending=0, Fired=0, Skipped=1, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-input-1069.bz2): Stopped
> > vhbl03 pengine[10016]: notice: Resource vm-conv-4 can no longer migrate to vhbl04. Stopping on vhbl07 too
> > vhbl03 pengine[10016]: notice: Reload vm-conv-4#011(Started vhbl07)
> > vhbl03 pengine[10016]: notice: Calculated transition 4634, saving inputs in /var/lib/pacemaker/pengine/pe-input-1070.bz2
> > vhbl03 crmd[10017]: notice: Initiating stop operation vm-conv-4_stop_0 on vhbl07
> > vhbl03 crmd[10017]: notice: Initiating stop operation vm-conv-4_stop_0 on vhbl04
> > vhbl03 crmd[10017]: notice: Initiating reload operation vm-conv-4_reload_0 on vhbl04
> > 
> > This recovery was entirely unnecessary, as the resource successfully
> > migrated to vhbl04 (the migrate_from operation does nothing). Pacemaker
> > does not know this, but is there a way to educate it? I think in this
> > special case it is possible to redesign the agent making migrate_to a
> > no-op and doing everything in migrate_from, which would significantly
> > reduce the window between the start points of the two "halves", but I'm
> > not sure that would help in the end: Pacemaker could still decide to do
> > an unnecessary stop+start recovery. Would it? I failed to find any
> > documentation on recovery from aborted migration transitions. I don't
> > expect on-fail (for migrate_* ops, not me) to apply here, does it?
> > 
> > Side question: why initiate a reload in any case, like above?
> > 
> > Even more side question: could you please consider using space instead
> > of TAB in syslog messages? (Actually, I wouldn't mind getting rid of
> > them altogether in any output.)
> > -- 
> > Thanks,
> > Feri
-- 
Ken Gaillot <kgail...@redhat.com>
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
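
For illustration only, the "don't change the configuration while the cluster is not in IDLE state" rule quoted at the top of this message could be scripted roughly as below. This is a minimal sketch, not something posted in the thread. It assumes the installed crm_resource supports --wait (block until the cluster has settled). The function name ban_wait_clear and the CRM_RESOURCE override variable are hypothetical helpers for this example, not Pacemaker settings.

```sh
#!/bin/sh
# Sketch only: ban a resource, let the cluster settle, then clear the ban.
# CRM_RESOURCE is a hypothetical override so the function can be dry-run
# (e.g. CRM_RESOURCE=echo); it is not a Pacemaker setting.
CRM_RESOURCE="${CRM_RESOURCE:-/usr/sbin/crm_resource}"

# ban_wait_clear is a hypothetical helper name used for this example.
ban_wait_clear() {
    rsc="$1"
    # Add the cli-ban constraint; this is what triggers the migration.
    "$CRM_RESOURCE" --ban -r "$rsc" || return 1
    # Assumption: --wait blocks until the cluster has no pending actions.
    "$CRM_RESOURCE" --wait || return 1
    # Only now remove the constraint again.
    "$CRM_RESOURCE" --clear -r "$rsc"
}
```

Usage would be something like "ban_wait_clear vm-conv-4". Note that this only avoids aborting the transition mid-migration; as said above, clearing the ban on an idle cluster can still cause a migration back unless something like stickiness prevents it.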