This goes back to Pike, as noted in the bug description. Before Pike, the ResourceTracker.update_available_resource code would at least correct the allocations based on instance.flavor and instance.host.
** Also affects: nova/rocky
   Importance: Undecided
       Status: New

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Changed in: nova/pike
       Status: New => Triaged

** Changed in: nova/rocky
       Status: New => Triaged

** Changed in: nova/stein
       Status: New => Triaged

** Changed in: nova/rocky
   Importance: Undecided => Medium

** Changed in: nova/pike
   Importance: Undecided => Medium

** Changed in: nova/queens
       Status: New => Triaged

** Changed in: nova/queens
   Importance: Undecided => Medium

** Changed in: nova/stein
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1821594

Title:
  Error in confirm_migration leaves stale allocations and 'confirming'
  migration state

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged
Status in OpenStack Compute (nova) stein series:
  Triaged

Bug description:

  Description:

  When performing a cold migration, if an exception is raised by the
  driver during confirm_migration (which runs on the source node), the
  migration record is stuck in the "confirming" state and the allocations
  against the source node are not removed. The instance is fine at the
  destination at this stage, but the source host has allocations that are
  not possible to clean without going to the database or invoking the
  Placement API via curl. After several migration attempts that fail at
  the same spot, the source node fills up with these allocations, which
  prevent new instances from being created on, or instances migrated to,
  this node.
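(For reference, the manual curl-style Placement cleanup mentioned above boils down to one documented Placement call, DELETE /allocations/{consumer_uuid}, where the consumer is the stale instance or migration UUID. A minimal sketch using only the standard library; the endpoint URL and token below are placeholders, not real values.)

```python
import urllib.request

# Hypothetical values for illustration; in a real deployment you would take
# the placement endpoint from the keystone service catalog and fetch a token.
PLACEMENT_URL = "http://placement.example.com"
TOKEN = "REPLACE_WITH_KEYSTONE_TOKEN"


def build_delete_allocations_request(consumer_uuid):
    """Build the Placement API call that removes all allocations held by a
    consumer (here, the stale instance/migration UUID on the source node):

        DELETE /allocations/{consumer_uuid}
    """
    return urllib.request.Request(
        url="%s/allocations/%s" % (PLACEMENT_URL, consumer_uuid),
        method="DELETE",
        headers={"X-Auth-Token": TOKEN},
    )


# To actually send it: urllib.request.urlopen(req); a 204 response means
# the stale allocations against the source node are gone.
```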
  When confirm_migration fails at this stage, the migrating instance can
  be recovered through a hard reboot or a reset state to active.

  Steps to reproduce:

  Unfortunately, I don't have logs of the real root cause of the problem
  inside driver.confirm_migration when running the libvirt driver.
  However, the stale allocations and stuck migration status can easily be
  reproduced by raising an exception in the libvirt driver's
  confirm_migration method, and any driver would be affected the same
  way.

  Expected results:

  Discussed this issue with efried and mriedem on #openstack-nova on
  March 25th, 2019. They confirmed that allocations not being cleaned up
  is a bug.

  Actual results:

  The instance is fine at the destination after a reset-state. The source
  node has stale allocations that prevent new instances from being
  created on, or migrated to, the source node. The migration record is
  stuck in the "confirming" state.

  Environment:

  I verified this bug on the pike, queens and stein branches, running the
  libvirt KVM driver.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1821594/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
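(Postscript: the failure mode described in the report can be illustrated with a toy simulation. This is NOT nova code; the classes and field names below are invented stand-ins. The point is only the ordering: the migration status is set to "confirming" before the driver call, and the allocation cleanup and final status update run after it, so a driver exception strands both.)

```python
# Toy fault-injection model of the reported bug, not actual nova code.

class FakeDriver:
    def confirm_migration(self, migration, instance):
        # Stand-in for the unexplained libvirt failure seen in the field.
        raise RuntimeError("driver blew up")


class FakeComputeManager:
    def __init__(self, driver):
        self.driver = driver
        # Allocations held against the source node; these become "stale"
        # when the cleanup below is never reached.
        self.source_allocations = {"VCPU": 4}

    def confirm_resize(self, migration, instance):
        migration["status"] = "confirming"
        # If the driver raises here, nothing below runs:
        self.driver.confirm_migration(migration, instance)
        self.source_allocations.clear()       # never reached
        migration["status"] = "confirmed"     # never reached


manager = FakeComputeManager(FakeDriver())
migration = {"status": "post-migrating"}
try:
    manager.confirm_resize(migration, instance=None)
except RuntimeError:
    pass

print(migration["status"])         # -> confirming (stuck)
print(manager.source_allocations)  # -> {'VCPU': 4} (stale)
```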