We'll have to backport whatever the fix is to stable/pike: https://review.openstack.org/#/q/I0df401a7c91f012fdb25cb0e6b344ca51de8c309
** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Changed in: nova/pike
   Importance: Undecided => High

** Changed in: nova
   Importance: Undecided => High

** Changed in: nova/pike
       Status: New => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1721652

Title:
  Evacuate cleanup fails at _delete_allocation_for_moved_instance

Status in OpenStack Compute (nova):
  Confirmed
Status in OpenStack Compute (nova) pike series:
  Confirmed

Bug description:
  Description
  ===========

  After an evacuation, when nova-compute is restarted on the source
  host, the clean up of the old instance on the source host fails. The
  traceback in nova-compute.log ends with:

  2017-10-04 05:32:18.725 5575 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 679, in _destroy_evacuated_instances
  2017-10-04 05:32:18.725 5575 ERROR oslo_service.service     instance, migration.source_node)
  2017-10-04 05:32:18.725 5575 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 1216, in delete_allocation_for_evacuated_instance
  2017-10-04 05:32:18.725 5575 ERROR oslo_service.service     instance, node, 'evacuated', node_type)
  2017-10-04 05:32:18.725 5575 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 1227, in _delete_allocation_for_moved_instance
  2017-10-04 05:32:18.725 5575 ERROR oslo_service.service     cn_uuid = self.compute_nodes[node].uuid
  2017-10-04 05:32:18.725 5575 ERROR oslo_service.service KeyError: u'<SOURCE_HOST_NAME>'
  2017-10-04 05:32:18.725 5575 ERROR oslo_service.service

  Steps to reproduce
  ==================

  Deploy instance on Host A.
  Shut down Host A.
  Evacuate instance to Host B.
  Turn back on Host A.
  Wait for cleanup of old instance allocation to occur

  Expected result
  ===============

  Clean up of old instance from Host A is successful

  Actual result
  =============

  Old instance clean up appears to work but there's a traceback in the
  log and allocation is not cleaned up.

  Environment
  ===========

  (pike)nova-compute/now 10:16.0.0-201710030907

  Additional Info:
  ================

  Problem seems to come from this change:
  https://github.com/openstack/nova/commit/0de806684f5d670dd5f961f7adf212961da3ed87
  at:

      rt = self._get_resource_tracker()
      rt.delete_allocation_for_evacuated_instance

  That is called very early in init_host flow to clean up the
  allocations. The problem is that at this point in the startup the
  resource tracker's self.compute_node is still None. That makes
  delete_allocation_for_evacuated_instance blow up with a key error at:

      cn_uuid = self.compute_nodes[node].uuid

  The resource tracker's self.compute_node is actually initialized later
  on in the startup process via the update_available_resources() ->
  _update_available_resources() -> _init_compute_node(). It isn't
  initialized when the tracker is first created which appears to be the
  assumption made by the referenced commit.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1721652/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
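
The ordering problem described under "Additional Info" can be reproduced in isolation. What follows is only a minimal sketch, not nova's code: ResourceTrackerSketch, ComputeNodeSketch, init_compute_node, the two delete_allocation_* methods and simulate_startup are all hypothetical names made up for illustration. Following the traceback, it assumes the tracker keeps a dict called compute_nodes that starts out empty and is only filled in by a later startup step.

```python
import uuid


class ComputeNodeSketch:
    """Hypothetical stand-in for a compute node record; it only has a uuid."""

    def __init__(self, node_uuid):
        self.uuid = node_uuid


class ResourceTrackerSketch:
    """Hypothetical, heavily simplified tracker; not nova's real class."""

    def __init__(self):
        # Assumption: the mapping is empty when the tracker is created.
        self.compute_nodes = {}

    def init_compute_node(self, nodename):
        # Stands in for the later startup step that populates the mapping.
        self.compute_nodes[nodename] = ComputeNodeSketch(str(uuid.uuid4()))

    def delete_allocation_unguarded(self, nodename):
        # Same shape as the failing lookup shown in the traceback.
        return self.compute_nodes[nodename].uuid

    def delete_allocation_guarded(self, nodename):
        # Illustrative guard only: report a miss instead of raising.
        cn = self.compute_nodes.get(nodename)
        if cn is None:
            return None
        return cn.uuid


def simulate_startup(nodename="source-host"):
    """Call the cleanup before and after the node mapping is populated."""
    rt = ResourceTrackerSketch()
    results = {}

    # Early call, as in init_host: the mapping has no entry for the node.
    try:
        rt.delete_allocation_unguarded(nodename)
        results["early_unguarded"] = "ok"
    except KeyError:
        results["early_unguarded"] = "KeyError"
    results["early_guarded"] = rt.delete_allocation_guarded(nodename)

    # Later call, after the node has been initialized: the lookup succeeds.
    rt.init_compute_node(nodename)
    results["late_unguarded"] = rt.delete_allocation_unguarded(nodename)
    return results


if __name__ == "__main__":
    print(simulate_startup())
```

Note that the guarded variant only avoids the traceback. The allocation would still be left behind on the early call, so an actual fix presumably has to either obtain the node's uuid some other way or defer the cleanup until the tracker knows about the node.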