Public bug reported: During an unshelve of an offloaded instance, conductor calls the scheduler to pick a host. The scheduler makes allocations against the chosen node as part of that select_destinations() call. Then conductor casts to that compute host to unshelve the instance.
If the spawn on the hypervisor fails after we've made the instance claim: https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L4485 or even if the claim test fails, the allocations on the destination node aren't removed in Placement.

The RT aborts the claim here: https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/resource_tracker.py#L414 That calls _update_usage_from_instance but doesn't change the has_ocata_computes kwarg, so we get here: https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/resource_tracker.py#L1041 and we don't clean up the allocations for the instance.

The other case is when the claim itself fails: the instance_claim method raises ComputeResourcesUnavailable, which would be handled here: https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/claims.py#L161 https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L4491 But we don't remove allocations or do any other cleanup there.

** Affects: nova
   Importance: High
   Status: Triaged

** Affects: nova/pike
   Importance: High
   Status: Confirmed

** Tags: placement shelve unshelve

** Also affects: nova/pike
   Importance: Undecided
   Status: New

** Changed in: nova/pike
   Status: New => Confirmed

** Changed in: nova/pike
   Importance: Undecided => High

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1713796

Title:
  Failed unshelve does not remove allocations from destination node

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) pike series:
  Confirmed
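The missing cleanup on both failure paths can be sketched with a toy model. Note this is illustrative only: FakePlacement, unshelve, and the resource values here are made-up stand-ins, not nova code; the point is simply that the allocation created by the scheduler must be deleted whenever the claim or spawn on the destination fails.

```python
# Simplified, hypothetical model of the unshelve claim/spawn flow described
# in this bug. None of these names are real nova APIs.

class FakePlacement:
    """Stands in for the placement service's allocation records."""

    def __init__(self):
        self.allocations = {}  # instance_uuid -> resources

    def put_allocation(self, instance_uuid, resources):
        self.allocations[instance_uuid] = resources

    def delete_allocation(self, instance_uuid):
        self.allocations.pop(instance_uuid, None)


def unshelve(placement, instance_uuid, spawn_ok):
    # The scheduler creates the allocation during select_destinations().
    placement.put_allocation(instance_uuid, {'VCPU': 1, 'MEMORY_MB': 512})
    try:
        # Claim + spawn on the destination compute host.
        if not spawn_ok:
            raise RuntimeError('spawn failed')
    except RuntimeError:
        # The cleanup this bug says is missing: on claim abort or spawn
        # failure, remove the allocation the scheduler made, otherwise it
        # leaks against the destination node.
        placement.delete_allocation(instance_uuid)
        raise


placement = FakePlacement()
try:
    unshelve(placement, 'uuid-1', spawn_ok=False)
except RuntimeError:
    pass
assert 'uuid-1' not in placement.allocations  # nothing leaked
```

Without the delete_allocation() call in the except block, the failed instance's allocation would remain in placement and keep consuming the destination node's inventory.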