Public bug reported: During an unshelve of an offloaded instance, conductor calls the scheduler to pick a host. The scheduler makes allocations against the chosen node as part of that select_destinations() call. Then conductor casts to that compute host to unshelve the instance.
If the spawn on the hypervisor fails after we've made the instance claim: https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L4485 or even if the claim test fails, the allocations on the destination node aren't removed in Placement.

The RT aborts the claim here: https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/resource_tracker.py#L414 That calls _update_usage_from_instance but doesn't change the has_ocata_computes kwarg, so we get here: https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/resource_tracker.py#L1041 and we don't clean up the allocations for the instance.

The other case is when the claim itself fails: the instance_claim method raises ComputeResourcesUnavailable, which would be handled here: https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/claims.py#L161 https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L4491 But we don't remove allocations or do any other cleanup there.

** Affects: nova
   Importance: High
   Status: Triaged

** Affects: nova/pike
   Importance: High
   Status: Confirmed

** Tags: placement shelve unshelve

** Also affects: nova/pike
   Importance: Undecided
   Status: New

** Changed in: nova/pike
   Status: New => Confirmed

** Changed in: nova/pike
   Importance: Undecided => High

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1713796

Title:
  Failed unshelve does not remove allocations from destination node

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) pike series:
  Confirmed
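The missing cleanup on both failure paths can be sketched with a toy model. Note this is illustrative only: FakePlacement, unshelve, and the resource values here are made-up stand-ins, not nova code; the point is simply that the allocation created by the scheduler must be deleted whenever the claim or spawn on the destination fails.

```python
# Simplified, hypothetical model of the unshelve claim/spawn flow described
# in this bug. None of these names are real nova APIs.

class FakePlacement:
    """Stands in for the placement service's allocation records."""

    def __init__(self):
        self.allocations = {}  # instance_uuid -> resources

    def put_allocation(self, instance_uuid, resources):
        self.allocations[instance_uuid] = resources

    def delete_allocation(self, instance_uuid):
        self.allocations.pop(instance_uuid, None)


def unshelve(placement, instance_uuid, spawn_ok):
    # The scheduler creates the allocation during select_destinations().
    placement.put_allocation(instance_uuid, {'VCPU': 1, 'MEMORY_MB': 512})
    try:
        # Claim + spawn on the destination compute host.
        if not spawn_ok:
            raise RuntimeError('spawn failed')
    except RuntimeError:
        # The cleanup this bug says is missing: on claim abort or spawn
        # failure, remove the allocation the scheduler made, otherwise it
        # leaks against the destination node.
        placement.delete_allocation(instance_uuid)
        raise


placement = FakePlacement()
try:
    unshelve(placement, 'uuid-1', spawn_ok=False)
except RuntimeError:
    pass
assert 'uuid-1' not in placement.allocations  # nothing leaked
```

Without the delete_allocation() call in the except block, the failed instance's allocation would remain in placement and keep consuming the destination node's inventory.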