Public bug reported:

Originally reported in RH bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1584315

Reproduced on OSP12 (Pike).

After resizing an instance but before the resize is confirmed,
update_available_resource will fail on the source compute due to bug
1774249. If nova-compute is restarted at this point, before the resize
is confirmed, the update_available_resource periodic task will never
have succeeded, and therefore ResourceTracker's compute_nodes dict will
not be populated at all.

When confirm calls _delete_allocation_after_move() it will fail with
ComputeHostNotFound because there is no entry for the current node in
ResourceTracker. The error looks like:

2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [req-4f7d5d63-fc05-46ed-b505-41050d889752 09abbd4893bb45eea8fb1d5e40635339 d4483d13a6ef41b2ae575ddbd0c59141 - default default] [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] Setting instance vm_state to ERROR: ComputeHostNotFound: Compute host compute-1.localdomain could not be found.
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] Traceback (most recent call last):
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7445, in _error_out_instance_on_exception
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]     yield
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3757, in _confirm_resize
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]     migration.source_node)
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3790, in _delete_allocation_after_move
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]     cn_uuid = rt.get_node_uuid(nodename)
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 155, in get_node_uuid
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]     raise exception.ComputeHostNotFound(host=nodename)
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] ComputeHostNotFound: Compute host compute-1.localdomain could not be found.

** Affects: nova
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1774252

Title:
  Resize confirm fails if nova-compute is restarted after resize

Status in OpenStack Compute (nova):
  New
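The failure sequence described in the report can be modelled with a small, self-contained sketch. This is not nova's code: FakeResourceTracker, FakeComputeNode, the `fail` flag and the "hypothetical-uuid" value are all illustrative assumptions. Only the names compute_nodes, update_available_resource, get_node_uuid and ComputeHostNotFound are taken from the report and traceback. The sketch shows why a freshly restarted service whose periodic update has never succeeded cannot resolve its own node.

```python
class ComputeHostNotFound(Exception):
    """Stand-in for nova's exception of the same name (assumption)."""

    def __init__(self, host):
        super().__init__("Compute host %s could not be found." % host)
        self.host = host


class FakeComputeNode:
    """Hypothetical minimal compute node record holding only a UUID."""

    def __init__(self, uuid):
        self.uuid = uuid


class FakeResourceTracker:
    """Toy model: compute_nodes is filled only by a successful update."""

    def __init__(self):
        # Empty after every service (re)start.
        self.compute_nodes = {}

    def update_available_resource(self, nodename, fail=False):
        if fail:
            # Stand-in for the failure attributed to bug 1774249.
            raise RuntimeError("simulated periodic task failure")
        self.compute_nodes[nodename] = FakeComputeNode("hypothetical-uuid")

    def get_node_uuid(self, nodename):
        try:
            return self.compute_nodes[nodename].uuid
        except KeyError:
            raise ComputeHostNotFound(host=nodename)


def confirm_after_restart(nodename):
    """Model a confirm issued after a restart with a failing periodic task."""
    rt = FakeResourceTracker()  # restart: tracker state starts empty
    try:
        rt.update_available_resource(nodename, fail=True)
    except RuntimeError:
        pass  # the periodic task fails, so compute_nodes stays empty
    # Confirm now needs the node UUID, which was never recorded.
    return rt.get_node_uuid(nodename)
```

Calling confirm_after_restart("compute-1.localdomain") raises ComputeHostNotFound in this model, whereas a tracker whose update_available_resource has succeeded at least once returns the stored UUID.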