Public bug reported:

This came up in the cross-cell resize review:

https://review.opendev.org/#/c/627890/60/nova/conductor/tasks/cross_cell_migrate.py@495

And I was able to recreate it with a functional test here:

https://review.opendev.org/#/c/688832/

That test is doing a cross-cell cold migration, but looking at the code:

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L461

we can hit the same issue for a same-cell resize/cold migrate once we have swapped the allocations, so that the source node allocations are held by the migration consumer and the instance holds allocations on the target node (created by the scheduler):

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L328

If something fails between ^ and the cast to prep_resize, the task will roll back and revert the allocations, so the target node allocations are dropped and the source node allocations are moved back to the instance:

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/conductor/tasks/migrate.py#L91

Furthermore, if the instance has been deleted by the time we perform that swap, the move_allocations method will recreate the allocations on the source node for the now-deleted instance, since we don't assert consumer generations during the swap:

https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/scheduler/client/report.py#L1886

This results in leaking allocations for the source node since the instance is deleted.

** Affects: nova
     Importance: Undecided
         Status: Triaged

** Tags: cold-migrate placement resize

** Changed in: nova
       Status: New => Triaged

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1848343

Title:
  MigrationTask rollback can leak allocations for a deleted server

Status in OpenStack Compute (nova):
  Triaged
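The rollback race described in this bug can be illustrated with a small, self-contained sketch. This is a toy in-memory model, not nova's or placement's actual code: `ToyPlacement`, its `put`/`delete` methods, the `check` flag, the simplified `move_allocations` signature, and the cleanup performed on conflict are all assumptions made for illustration. It only shows why moving allocations back without asserting the consumer generation can recreate allocations for a server that was deleted in the meantime.

```python
class ConsumerGenerationConflict(Exception):
    """Raised when the caller's view of a consumer is stale (toy model)."""


class ToyPlacement:
    """Hypothetical stand-in for placement: consumer -> {provider: resources}."""

    def __init__(self):
        self.allocations = {}
        self.generations = {}  # a missing key means the consumer does not exist

    def put(self, consumer, allocs, expected_generation=None, check=False):
        current = self.generations.get(consumer)
        if check and current != expected_generation:
            raise ConsumerGenerationConflict(consumer)
        if allocs:
            self.allocations[consumer] = dict(allocs)
            self.generations[consumer] = 0 if current is None else current + 1
        else:
            # Empty allocations remove the consumer entirely.
            self.allocations.pop(consumer, None)
            self.generations.pop(consumer, None)

    def delete(self, consumer):
        self.put(consumer, {})


def move_allocations(placement, source, target, target_generation=None, check=False):
    """Replace target's allocations with source's, then drop source's."""
    allocs = placement.allocations.get(source, {})
    placement.put(target, allocs, expected_generation=target_generation, check=check)
    placement.delete(source)


def run(check):
    p = ToyPlacement()
    p.put("instance", {"source-node": {"VCPU": 1}})

    # Swap: the migration consumer takes over the source node allocations.
    move_allocations(p, "instance", "migration")
    # The scheduler creates target node allocations for the instance.
    p.put("instance", {"target-node": {"VCPU": 1}})
    seen_generation = p.generations["instance"]  # what the task last saw

    # The server is deleted concurrently; its allocations go away.
    p.delete("instance")

    # Something fails before the cast to prep_resize, so the task rolls back.
    try:
        move_allocations(p, "migration", "instance",
                         target_generation=seen_generation, check=check)
    except ConsumerGenerationConflict:
        # Assumed handling: the instance is gone, so just drop what the
        # migration consumer still holds instead of restoring it.
        p.delete("migration")
    return p.allocations


print(run(check=False))  # {'instance': {'source-node': {'VCPU': 1}}}  (leaked)
print(run(check=True))   # {}
```

Without the generation check, the rollback silently recreates the deleted instance as a consumer holding source node allocations, which is the leak reported here. With the check, the stale view is detected and nothing is recreated.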
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1848343/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp