Public bug reported:

While working on a fix for resize with PCI passthrough [1], I noticed the following issue in resize.

If you use a small image and run resize-confirm very quickly, the old resources are not freed. After debugging this, I found the root cause.

A good run of resize goes as follows. During resize, _update_usage_from_migration in the resource tracker is called twice:

1. The first call returns the instance type of the new flavor and enters this case:
   https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L718
2. It then stores the migration and the new instance_type in tracked_migrations:
   https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L763
3. The second call returns the old instance_type and enters this case:
   https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L725
4. It then overwrites the existing entry in tracked_migrations with the migration and the old instance_type:
   https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L763
5. On resize-confirm, drop_move_claim is called with the old instance_type:
   https://github.com/openstack/nova/blob/9a05d38f48ef0f630c5e49e332075b273cee38b9/nova/compute/manager.py#L3369
6. drop_move_claim compares instance_type['id'] from tracked_migrations with instance_type.id (which is the old one).
7. Because they are equal, it removes the old resource usage:
   https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L315-L328

But with a small image such as CirrOS, and with resize-confirm issued quickly, the second call of _update_usage_from_migration is not executed.
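The two orderings described above can be modeled with a small sketch. This is a toy model, not nova code: ToyTracker, Flavor, and run are hypothetical names, only a single vCPU counter is tracked, and the only behavior modeled is the flavor id comparison that gates the cleanup.

```python
# Toy model only: ToyTracker is hypothetical and is not nova's ResourceTracker.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flavor:
    id: int
    vcpus: int


@dataclass
class ToyTracker:
    vcpus_used: int = 0
    # migration uuid -> flavor recorded by the most recent usage update
    tracked_migrations: dict = field(default_factory=dict)

    def update_usage_from_migration(self, migration_uuid, flavor):
        """Record the flavor for a migration, as in steps 2 and 4 above."""
        self.tracked_migrations[migration_uuid] = flavor

    def drop_move_claim(self, migration_uuid, flavor):
        """Free usage only if the recorded flavor id matches the argument."""
        recorded = self.tracked_migrations.get(migration_uuid)
        if recorded is not None and recorded.id == flavor.id:
            self.vcpus_used -= flavor.vcpus
            del self.tracked_migrations[migration_uuid]
            return True
        return False


def run(second_update_happens):
    """Return (freed, vcpus_used) after a resize followed by resize-confirm."""
    old, new = Flavor(id=1, vcpus=1), Flavor(id=2, vcpus=2)
    tracker = ToyTracker(vcpus_used=old.vcpus)  # usage of the old flavor
    tracker.update_usage_from_migration("m1", new)      # first call
    if second_update_happens:
        tracker.update_usage_from_migration("m1", old)  # second call
    freed = tracker.drop_move_claim("m1", old)          # resize-confirm
    return freed, tracker.vcpus_used


print(run(True))   # good run: old usage is dropped
print(run(False))  # fast confirm: old usage stays claimed
```

In this model the fast-confirm run leaves the old flavor's usage in place because the entry recorded for the migration still holds the new flavor when drop_move_claim runs.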
The result is that drop_move_claim compares the old instance_type against the new instance_type that is still stored in tracked_migrations, so this expression is false:
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L314

This means the following code block is not executed, and therefore the old resources are not freed:
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L315-L326

** Affects: nova
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1590556

Title:
  race condition with resize causing old resources not to be free

Status in OpenStack Compute (nova):
  New