[ClusterLabs] Salvaging aborted resource migration

Ferenc Wágner Wed, 26 Sep 2018 23:38:23 -0700

Hi,

This is the current behavior of a cancelled migration under Pacemaker
1.1.16, with a resource implementing push migration:


# /usr/sbin/crm_resource --ban -r vm-conv-4

vhbl03 crmd[10017]:   notice: State transition S_IDLE -> S_POLICY_ENGINE
vhbl03 pengine[10016]:   notice: Migrate vm-conv-4#011(Started vhbl07 -> vhbl04)
vhbl03 crmd[10017]:   notice: Initiating migrate_to operation vm-conv-4_migrate_to_0 on vhbl07
vhbl03 pengine[10016]:   notice: Calculated transition 4633, saving inputs in /var/lib/pacemaker/pengine/pe-input-1069.bz2
[...]

At this point, with the migration still ongoing, I wanted to get rid of
the constraint:

# /usr/sbin/crm_resource --clear -r vm-conv-4

vhbl03 crmd[10017]:   notice: Transition aborted by deletion of rsc_location[@id='cli-ban-vm-conv-4-on-vhbl07']: Configuration change
vhbl07 crmd[10233]:   notice: Result of migrate_to operation for vm-conv-4 on vhbl07: 0 (ok)
vhbl03 crmd[10017]:   notice: Transition 4633 (Complete=6, Pending=0, Fired=0, Skipped=1, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-input-1069.bz2): Stopped
vhbl03 pengine[10016]:   notice: Resource vm-conv-4 can no longer migrate to vhbl04. Stopping on vhbl07 too
vhbl03 pengine[10016]:   notice: Reload  vm-conv-4#011(Started vhbl07)
vhbl03 pengine[10016]:   notice: Calculated transition 4634, saving inputs in /var/lib/pacemaker/pengine/pe-input-1070.bz2
vhbl03 crmd[10017]:   notice: Initiating stop operation vm-conv-4_stop_0 on vhbl07
vhbl03 crmd[10017]:   notice: Initiating stop operation vm-conv-4_stop_0 on vhbl04
vhbl03 crmd[10017]:   notice: Initiating reload operation vm-conv-4_reload_0 on vhbl04

This recovery was entirely unnecessary, as the resource had successfully
migrated to vhbl04 (the migrate_from operation does nothing).  Pacemaker
does not know this, but is there a way to educate it?  I think in this
special case it is possible to redesign the agent, making migrate_to a
no-op and doing everything in migrate_from, which would significantly
reduce the window between the start points of the two "halves", but I'm
not sure that would help in the end: Pacemaker could still decide to do
an unnecessary stop+start recovery.  Would it?  I failed to find any
documentation on recovery from aborted migration transitions.  I don't
expect on-fail (for migrate_* ops, not me) to apply here, or does it?
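
To make the redesign idea concrete, here is a minimal sketch of a
pull-style agent skeleton.  It is not the actual agent: pull_vm_from and
the agent_* function names are hypothetical placeholders, and the only
Pacemaker-provided piece assumed is the OCF_RESKEY_CRM_meta_migrate_source
environment variable, which names the source node during migrate_from.
It only shows where the work would move, not whether Pacemaker would
then skip the stop+start recovery.

```shell
#!/bin/sh
# Sketch only: migrate_to is a no-op, migrate_from does all the work.

# Standard OCF exit codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3

# HYPOTHETICAL helper: pull the running instance over from node "$1".
# A real agent would call its hypervisor tooling here instead of echo.
pull_vm_from() {
    echo "pulling from $1"
}

agent_migrate_to() {
    # Runs on the source node; nothing to do, the target will pull.
    return $OCF_SUCCESS
}

agent_migrate_from() {
    # Runs on the target node; the source node name comes from Pacemaker.
    src="${OCF_RESKEY_CRM_meta_migrate_source:-}"
    if [ -z "$src" ]; then
        return $OCF_ERR_GENERIC
    fi
    pull_vm_from "$src" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

agent_dispatch() {
    # Map the requested action to its handler; other actions are
    # deliberately left out of this sketch.
    case "$1" in
        migrate_to)   agent_migrate_to ;;
        migrate_from) agent_migrate_from ;;
        *)            return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}
```

A real agent would end with something like agent_dispatch "$1"; exit $?
and would also implement start, stop, monitor and meta-data.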

Side question: why is a reload initiated at all, as seen above?

Even more side question: could you please consider using space instead
of TAB in syslog messages?  (Actually, I wouldn't mind getting rid of
them altogether in any output.)
-- 
Thanks,
Feri
