Just adding the previously filed downstream Red Hat bug: https://bugzilla.redhat.com/show_bug.cgi?id=1852110
For context, this can happen in Queens, so when we root-cause the issue and fix it, the fix should likely be backported to Queens. There are other, older bugs from Newton that look similar and are related to unshelve, so it is possible that the same issue affects multiple move operations.

** Bug watch added: Red Hat Bugzilla #1852110
   https://bugzilla.redhat.com/show_bug.cgi?id=1852110

** Also affects: nova/train
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/victoria
   Importance: Low
   Assignee: Balazs Gibizer (balazs-gibizer)
   Status: Confirmed

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Changed in: nova/ussuri
   Importance: Undecided => Low
   Status: New => Triaged

** Changed in: nova/train
   Importance: Undecided => Low
   Status: New => Triaged

** Changed in: nova/stein
   Importance: Undecided => Low
   Status: New => Triaged

** Changed in: nova/rocky
   Importance: Undecided => Low
   Status: New => Triaged

** Changed in: nova/queens
   Importance: Undecided => Low
   Status: New => Triaged

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896463

Title:
  evacuation failed: Port update failed : Unable to correlate PCI slot

Status in OpenStack Compute (nova):
  Confirmed
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged
Status in OpenStack Compute (nova) stein series:
  Triaged
Status in OpenStack Compute (nova) train series:
  Triaged
Status in OpenStack Compute (nova) ussuri series:
  Triaged
Status in OpenStack Compute (nova) victoria series:
  Confirmed

Bug description:

Description
===========
If _update_available_resource() of the resource tracker is called between
_do_rebuild_instance_with_claim() and instance.save() while evacuating a VM
instance on the destination host,

nova/compute/manager.py
2931     def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
2932 +-- 84 lines: injected_files, new_pass, orig_sys_metadata,-------------------------------------------------------------------
3016         claim_ctxt = rebuild_claim(
3017             context, instance, scheduled_node,
3018             limits=limits, image_meta=image_meta,
3019             migration=migration)
3020         self._do_rebuild_instance_with_claim(
3021 +-- 47 lines: claim_ctxt, context, instance, orig_image_ref,-----------------------------------------------------------------
3068         instance.apply_migration_context()
3069         # NOTE (ndipanov): This save will now update the host and node
3070         # attributes making sure that next RT pass is consistent since
3071         # it will be based on the instance and not the migration DB
3072         # entry.
3073         instance.host = self.host
3074         instance.node = scheduled_node
3075         instance.save()
3076         instance.drop_migration_context()

then the instance is not handled as a managed instance of the destination
host, because it has not been updated in the DB yet.
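To make the race window concrete, here is a minimal, self-contained model of the interaction (hypothetical Python; FakeInstance, DB, and the function names are illustrative only and are not the actual nova code). The point is that the periodic resource-tracker pass builds its "managed instances" list from the DB filtered by instance.host, so an evacuated instance whose host field has not been saved yet is invisible on the destination host:

    # Simplified, hypothetical model of the race; not nova's real code.
    from dataclasses import dataclass, field

    @dataclass
    class FakeInstance:
        uuid: str
        host: str                 # still points at the source host until instance.save()
        pci_devices: list = field(default_factory=list)

    # "Database": instance records keyed by uuid.
    DB = {
        "22f6ca0e": FakeInstance(uuid="22f6ca0e", host="com1",
                                 pci_devices=["0000:05:12.2"]),
    }

    def instances_on_host(host):
        """Roughly what the resource tracker asks the DB for on each periodic run."""
        return [inst for inst in DB.values() if inst.host == host]

    def periodic_update_available_resource(host):
        # Devices used by instances the DB says belong to this host are kept;
        # anything else becomes a candidate to be freed (clean_usage() in nova).
        managed = instances_on_host(host)
        print(f"{host}: managed instances = {[i.uuid for i in managed]}")
        return managed

    # Timeline of the bug:
    # 1. The rebuild claim succeeds on com2; the guest already uses VF 0000:05:12.2.
    # 2. The periodic task fires *before* instance.save() updates host to com2.
    periodic_update_available_resource("com2")   # -> managed instances = []
    #    The VF looks unused on com2, so it can be freed.
    # 3. Only now does rebuild_instance() persist the new host.
    DB["22f6ca0e"].host = "com2"
    periodic_update_available_resource("com2")   # -> managed instances = ['22f6ca0e']

This is only a sketch of the timing; the real code paths are the rebuild_instance() excerpt above and _update_available_resource() quoted below.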
2020-09-19 07:27:36.321 8 WARNING nova.compute.resource_tracker [req-b35d5b9a-0786-4809-bd81-ad306cdda8d5 - - - - -] Instance 22f6ca0e-f964-4467-83a3-f2bf12bb05ae is not being actively managed by this compute host but has allocations referencing this compute host: {u'resources': {u'MEMORY_MB': 12288, u'VCPU': 2, u'DISK_GB': 10}}. Skipping heal of allocation because we do not know what to do.

As a result, the SR-IOV port's PCI device (the VF) was freed by clean_usage(),
even though the VM still had the VF port attached.

 743     def _update_available_resource(self, context, resources):
 744 +-- 45 lines: # initialize the compute node object, creating it--------------------------------------------------------------
 789         self.pci_tracker.clean_usage(instances, migrations, orphans)
 790         dev_pools_obj = self.pci_tracker.stats.to_device_pools_obj()
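The effect of clean_usage() in this situation can be shown with a tiny sketch (hypothetical, simplified code; clean_usage_sketch and its arguments are illustrative and not the real PciDevTracker API): a claimed device whose owning instance is not in the list the resource tracker passes in is returned to the free pool.

    # Hypothetical simplification of what clean_usage() effectively does here.
    def clean_usage_sketch(claimed_devices, managed_instances):
        """claimed_devices: dict mapping pci_slot -> owning instance uuid.
        managed_instances: set of instance uuids the tracker currently knows about."""
        freed = []
        for pci_slot, owner_uuid in list(claimed_devices.items()):
            if owner_uuid not in managed_instances:
                # The owner is "unknown" on this host, so the VF goes back to
                # the free pool even though the guest is still attached to it.
                freed.append(pci_slot)
                del claimed_devices[pci_slot]
        return freed

    claimed = {"0000:05:12.2": "22f6ca0e-f964-4467-83a3-f2bf12bb05ae"}
    # The evacuated instance is missing from the managed list (see the model above):
    print(clean_usage_sketch(claimed, managed_instances=set()))
    # -> ['0000:05:12.2']

Once the VF is back in the pool, the next evacuation's port-binding update can no longer correlate PCI slot 0000:05:12.2 with a device held by the instance, which is what the traceback below shows.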
After that, when this VM was evacuated to another compute host again, we got
the error below.

Steps to reproduce
==================
1. create a VM on com1 with SRIOV VF ports.
2. stop and disable nova-compute service on com1
3. wait 60 sec (nova-compute reporting interval)
4. evacuate the VM to com2
5. wait until the VM is active on com2
6. enable and start nova-compute on com1
7. wait 60 sec (nova-compute reporting interval)
8. stop and disable nova-compute service on com2
9. wait 60 sec (nova-compute reporting interval)
10. evacuate the VM to com1
11. wait until the VM is active on com1
12. enable and start nova-compute on com2
13. wait 60 sec (nova-compute reporting interval)
14. go to step 2.

Expected result
===============
Evacuation should be done without errors.

Actual result
=============
Evacuation failed with "Port update failed".

Environment
===========
openstack-nova-compute-18.0.1-1 with SR-IOV ports is used.
libvirt is used.

Logs & Configs
==============
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [req-38dd0be2-7223-4a59-8073-dd1b072125c5 c424fbb3d41f444bb7a025266fda36da 6255a6910b9b4d3ba34a93624fe7fb22 - default default] [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] Setting instance vm_state to ERROR: PortUpdateFailed: Port update failed for port 76dc33dc-5b3b-4c45-b2cb-fd59025a4dbd: Unable to correlate PCI slot 0000:05:12.2
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] Traceback (most recent call last):
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7993, in _error_out_instance_on_exception
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     yield
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3025, in rebuild_instance
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     migration, request_spec)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3087, in _do_rebuild_instance_with_claim
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     self._do_rebuild_instance(*args, **kwargs)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3190, in _do_rebuild_instance
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     context, instance, self.host, migration)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 2953, in setup_instance_network_on_host
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     migration)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 3058, in _update_port_binding_for_instance
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     pci_slot)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] PortUpdateFailed: Port update failed for port 76dc33dc-5b3b-4c45-b2cb-fd59025a4dbd: Unable to correlate PCI slot 0000:05:12.2
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1896463/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp