Public bug reported:

This came up while reviewing the fix for bug 1756179:

https://review.openstack.org/#/c/554920/6/nova/api/openstack/compute/services.py@226

Full IRC conversation is here:

http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-04-11.log.html#t2018-04-11T20:32:13

The summary is that it's possible to delete a compute service and its associated compute node record even if that compute node has instances on it.

Before placement, this wasn't a huge problem because you could evacuate the instances to another host, or, if you brought the host back up, it would recreate the service and compute node and the resource tracker would "heal" itself by finding instances running on that host and node combo:

https://github.com/openstack/nova/blob/2c5da2212c3fa3e589c4af171486a2097fd8c54e/nova/compute/resource_tracker.py#L714

The problem is that since we started requiring placement, and creating allocations in the scheduler in Pike, those allocations are made against the compute_nodes.uuid for the compute node resource provider. If the service and its related compute node record are deleted, restarting the service will create a new service and compute node record with a new UUID, which will result in a new resource provider in placement, and the instances running on that host will have allocations against the now orphaned resource provider. The new resource provider will be reporting incorrect consumption, so scheduling will also be affected.

So we should block deleting a compute service (and its node) here:

https://github.com/openstack/nova/blob/2c5da2212c3fa3e589c4af171486a2097fd8c54e/nova/api/openstack/compute/services.py#L213

if that host (node) has instances on it.

This problem goes back to Pike. Ocata is OK in that the resource tracker on Ocata computes will "heal" allocations during the update_available_resource periodic task (and when the compute service starts up), and in Ocata the FilterScheduler does not create allocations in Placement.
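To make the failure mode concrete, here is a minimal sketch of the orphaned-allocation problem and of the proposed guard. It is a toy in-memory model, not nova or placement code: FakeCloud, ServiceInUse, the check_instances flag and every method name below are hypothetical and exist only for illustration.

```python
import uuid


class ServiceInUse(Exception):
    """Hypothetical error: the compute service still hosts instances."""


class FakeCloud:
    """Toy stand-in for the nova DB plus placement (illustration only)."""

    def __init__(self):
        self.compute_nodes = {}  # host -> compute node UUID (resource provider)
        self.instances = {}      # instance name -> host
        self.allocations = {}    # instance name -> resource provider UUID

    def start_compute(self, host):
        # Starting the service creates the compute node record if it is
        # missing; a newly created record always gets a fresh UUID.
        if host not in self.compute_nodes:
            self.compute_nodes[host] = str(uuid.uuid4())
        return self.compute_nodes[host]

    def boot(self, name, host):
        # The scheduler allocates against the node's current UUID.
        self.instances[name] = host
        self.allocations[name] = self.compute_nodes[host]

    def delete_service(self, host, check_instances=True):
        # Proposed guard: refuse to delete while the host has instances.
        hosted = sorted(n for n, h in self.instances.items() if h == host)
        if check_instances and hosted:
            raise ServiceInUse(f"{host} still hosts instances: {hosted}")
        del self.compute_nodes[host]

    def orphaned(self):
        # Instances whose allocation points at a provider that no longer
        # corresponds to any compute node record.
        live = set(self.compute_nodes.values())
        return sorted(n for n, rp in self.allocations.items() if rp not in live)
```

With check_instances=False the model mimics the behaviour described above: deleting the service for a host that still runs an instance and then calling start_compute() again yields a new UUID, and orphaned() reports that instance. With the guard enabled, delete_service() raises instead, which is roughly what the API would need to do, presumably as an error response to the DELETE request.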
** Affects: nova
     Importance: High
     Assignee: Matt Riedemann (mriedem)
         Status: Triaged

** Affects: nova/pike
     Importance: Undecided
         Status: New

** Affects: nova/queens
     Importance: Undecided
         Status: New

** Tags: api placement

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Changed in: nova
     Assignee: (unassigned) => Matt Riedemann (mriedem)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1763183

Title:
  DELETE /os-services/{service_id} does not block for hosted instances

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) pike series:
  New
Status in OpenStack Compute (nova) queens series:
  New