** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Also affects: nova/queens
   Importance: Undecided
       Status: New
** Changed in: nova
   Importance: Undecided => Medium

** Changed in: nova/pike
       Status: New => In Progress

** Changed in: nova/queens
       Status: New => In Progress

** Changed in: nova/pike
   Importance: Undecided => Medium

** Changed in: nova/queens
   Importance: Undecided => Medium

** Changed in: nova/pike
     Assignee: (unassigned) => Lee Yarwood (lyarwood)

** Changed in: nova/queens
     Assignee: (unassigned) => Lee Yarwood (lyarwood)

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1764883

Title:
  Evacuation fails if the source host returns while the migration is still in progress

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  In Progress
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:

Description
===========

If the source host returns while the evacuation's migration is still in the 'pre-migrating' state, the source compute manager may not remove the evacuating instances in question during _destroy_evacuated_instances.
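The cleanup gap can be illustrated with a small Python model. This is a sketch under stated assumptions, not nova's code: the `Migration` dataclass, the `evacuations_to_destroy` helper, the `migration_type == "evacuation"` check and both status sets are hypothetical. Only the 'pre-migrating' status and the idea that the returning source host selects evacuations to clean up by migration status come from this report.

```python
from dataclasses import dataclass


# Hypothetical, simplified migration record; nova's real Migration object
# has many more fields.
@dataclass
class Migration:
    instance_uuid: str
    source_compute: str
    migration_type: str
    status: str


def evacuations_to_destroy(migrations, host, statuses):
    """Return the instance UUIDs a returning source host would clean up.

    In this model only evacuations away from `host` whose status is in
    `statuses` are selected; any other evacuation is simply not cleaned up.
    """
    return {
        m.instance_uuid
        for m in migrations
        if m.source_compute == host
        and m.migration_type == "evacuation"
        and m.status in statuses
    }


migrations = [
    Migration("inst-a", "compute-0", "evacuation", "done"),
    Migration("inst-b", "compute-0", "evacuation", "pre-migrating"),
]

# Hypothetical status filter that does not include 'pre-migrating'.
narrow = {"accepted", "done"}
# The same hypothetical filter with 'pre-migrating' added.
wide = narrow | {"pre-migrating"}

print(sorted(evacuations_to_destroy(migrations, "compute-0", narrow)))  # ['inst-a']
print(sorted(evacuations_to_destroy(migrations, "compute-0", wide)))    # ['inst-a', 'inst-b']
```

With the narrower filter, the instance whose evacuation is still 'pre-migrating' is left on the source host. That is the situation in which the source host then goes on to treat it as one of its own instances.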
More importantly, a source host that comes back online early allows _init_instance to set instance.vm_state to ERROR and instance.task_state to None because of the following failed-rebuild logic:

https://github.com/openstack/nova/blob/f106094e961c5ab430687d673063baee379f6bbd/nova/compute/manager.py#L810-L821

As a result, the in-progress rebuild fails when it attempts to save the instance while expecting a specific task_state:

https://github.com/openstack/nova/blob/f106094e961c5ab430687d673063baee379f6bbd/nova/compute/manager.py#L3050-L3052
https://github.com/openstack/nova/blob/f106094e961c5ab430687d673063baee379f6bbd/nova/compute/manager.py#L3123

This issue was originally reported downstream while testing an instance high-availability feature that uses a mixture of Pacemaker and instance evacuation to keep instances online:

Nova reports overcloud instance in error state after failed double compute failover instance-ha evacuation
https://bugzilla.redhat.com/show_bug.cgi?id=1567606

This report includes an example UnexpectedTaskStateError failure in c#8:

2018-04-17 11:11:12.999 1 ERROR nova.compute.manager [req-ac20c023-9abf-412f-987f-2981c7837c57 da4d95c480c343c5bf6abe3b789f4c17 d2c2437b7f6642b4a1d5907fa5f373a9 - default default] [instance: d9419b05-025e-4193-b3f7-7f0efc23593b] Setting instance vm_state to ERROR: UnexpectedTaskStateError_Remote: Conflict updating instance d9419b05-025e-4193-b3f7-7f0efc23593b. Expected: {'task_state': [u'rebuild_spawning']}. Actual: {'task_state': None}

The Rally-based tests for this feature happen to use the `b` sysrq-trigger, which immediately reboots the host. This lets the host come back just in time to hit this race.

Steps to reproduce
==================

- Evacuate an instance
- Restart the source compute service before the instance is fully rebuilt

Expected result
===============

The source compute removes the instance and does not attempt to update the instance's vm_state or task_state.
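The UnexpectedTaskStateError quoted from c#8 comes from a save guarded by an expected task_state. The following Python sketch models that guard and the interleaving from the steps to reproduce. `InstanceRecord` and its `save()` are hypothetical stand-ins, not nova's Instance object. Unlike a database-level check, this toy is not safe for concurrent use. Only the task_state and vm_state values and the shape of the error message are taken from the log quoted above.

```python
class UnexpectedTaskStateError(Exception):
    """Raised when a guarded save finds a different task_state."""


class InstanceRecord:
    """Hypothetical stand-in for the instance row both compute hosts update."""

    def __init__(self, uuid):
        self.uuid = uuid
        self.vm_state = "active"
        self.task_state = None

    def save(self, expected_task_state=None, **updates):
        # Compare-and-swap style guard: reject the update unless the stored
        # task_state is one of the values the caller expects.
        if (expected_task_state is not None
                and self.task_state not in expected_task_state):
            raise UnexpectedTaskStateError(
                "Conflict updating instance %s. Expected: %r. Actual: %r"
                % (self.uuid,
                   {"task_state": list(expected_task_state)},
                   {"task_state": self.task_state}))
        for field, value in updates.items():
            setattr(self, field, value)


inst = InstanceRecord("d9419b05-025e-4193-b3f7-7f0efc23593b")

# 1. The destination host is rebuilding the evacuated instance (simplified
#    to a single task_state transition).
inst.save(task_state="rebuild_spawning")

# 2. The source compute service restarts early; its failed-rebuild handling
#    moves the instance to ERROR and clears task_state.
inst.save(vm_state="error", task_state=None)

# 3. The destination finishes spawning and saves, expecting rebuild_spawning.
failure = None
try:
    inst.save(expected_task_state=["rebuild_spawning"],
              vm_state="active", task_state=None)
except UnexpectedTaskStateError as exc:
    failure = str(exc)

print(failure)
```

The source host clears task_state between steps 2 and 3, so the destination's guarded save is rejected. The instance stays in the error state, which mirrors the "Expected: rebuild_spawning, Actual: None" conflict in the log.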
Actual result
=============

The source compute does not attempt to remove the instance. Instead, it updates the instance's vm_state and task_state before the rebuild is complete.

Environment
===========

1. Exact version of OpenStack you are running. See the following
   88adde8bba393b8d08ce21e9e3334a76e853b2e0

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

   Libvirt + KVM

3. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

   Local; not yet tested with shared storage.

4. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)

   N/A

Logs & Configs
==============

See https://bugzilla.redhat.com/show_bug.cgi?id=1567606#c8