Reviewed:  https://review.openstack.org/301859
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c469b8466fc5ff5514957a0fbd17d141761774c8
Submitter: Jenkins
Branch:    master

commit c469b8466fc5ff5514957a0fbd17d141761774c8
Author: Nikola Dipanov <ndipa...@redhat.com>
Date:   Tue Apr 5 18:09:53 2016 +0100

    pci: make sure device relationships are kept in memory

    `pci_devs` attribute of PciDevTracker class is the in-memory "master
    copy" of all devices on each compute host, and all data changes that
    happen when claiming/allocating/freeing devices HAVE TO be made
    against instances contained in `pci_devs` list, because they are
    periodically flushed to the DB when the save() method is called.

    Due to this we need to make sure all the relationships are available
    to the code using them (claiming/allocation/freeing methods). We do
    this by simply keeping a tree structure by referencing
    parent/children from objects themselves. This is done on every
    update of the state of PCI devices (on compute service start up, and
    on every resource tracker pass), so that this information is always
    as up to date as the in memory view of devices.

    This change adds the code to build up the tree, and subsequent
    changes will make sure the newly added relationships are used when
    needed.

    We also add 2 non-versioned fields added to PciDevice object to hold
    the references.

    Co-Authored-By: Sahid Ferdjaoui <sahid.ferdja...@redhat.com>
    Change-Id: Id6868b7839efb2cd53f5f7aaac2c55d169356ce4
    Partial-bug: #1565785

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
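
The parent/children tree described in the commit message can be sketched roughly as follows. This is a minimal illustration, not nova's actual code: the `Device` class, its `parent_addr`, `parent`, and `children` attributes, and the `build_device_tree` helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq=False keeps identity comparison, which avoids recursive equality
# checks through the parent/children cycle.
@dataclass(eq=False)
class Device:
    """Hypothetical stand-in for an in-memory PCI device record."""
    address: str
    dev_type: str                      # e.g. "type-PF" or "type-VF"
    parent_addr: Optional[str] = None  # address of the owning PF, if any
    # In-memory-only references (assumed names); never persisted.
    parent: Optional["Device"] = field(default=None, repr=False)
    children: List["Device"] = field(default_factory=list, repr=False)


def build_device_tree(devices: List[Device]) -> None:
    """Link each VF to its PF, and back, using the objects in `devices`."""
    by_address = {dev.address: dev for dev in devices}
    # Reset old links so repeated calls reflect the current device list.
    for dev in devices:
        dev.parent = None
        dev.children = []
    for dev in devices:
        parent = by_address.get(dev.parent_addr) if dev.parent_addr else None
        if parent is not None:
            dev.parent = parent
            parent.children.append(dev)


# One PF with seven VFs, all held in a single "master copy" list.
pf = Device("0000:05:00.0", "type-PF")
vfs = [Device(f"0000:05:10.{i}", "type-VF", parent_addr=pf.address)
       for i in range(7)]
pci_devs = [pf] + vfs
build_device_tree(pci_devs)
```

Because the links point at the same objects held in the list, a status change made through `pf.children` is visible to whatever later iterates over `pci_devs`. Rebuilding the links on every refresh mirrors the commit's stated intent of keeping the relationships as current as the device list itself.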
https://bugs.launchpad.net/bugs/1565785

Title:
  SR-IOV PF passthrough device claiming/allocation does not work for
  physical functions devices

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Enable PCI passthrough on a compute host (whitelisting devices is
  explained in more detail in the docs), and create a network, subnet
  and a port that represents an SR-IOV physical function passthrough:

  $ neutron net-create --provider:physical_network=phynet --provider:network_type=flat sriov-net
  $ neutron subnet-create sriov-net 192.168.2.0/24 --name sriov-subne
  $ neutron port-create sriov-net --binding:vnic_type=direct-physical --name pf

  After that, try to boot an instance using the created port (provided
  the pci_passthrough_whitelist was set up correctly); this should
  work:

  $ boot --image xxx --flavor 1 --nic port-id=$PORT_ABOVE testvm

  My test env has 2 PFs with 7 VFs each. After spawning an instance,
  the PF gets marked as allocated, but none of the VFs do, even though
  they are removed from the host (note that device_pools are correctly
  updated).

  So after the instance was successfully booted we get

  MariaDB [nova]> select count(*) from pci_devices where status="available" and deleted=0;
  +----------+
  | count(*) |
  +----------+
  |       15 |
  +----------+

  # This should be 8 - we are leaking 7 VFs belonging to the attached PF
  that never get updated.
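
As a quick check of the numbers in the description: 2 PFs with 7 VFs each give 16 devices in total. Allocating one PF should take that PF and its 7 VFs out of the available set, leaving 8. The observed 15 corresponds to only the PF being marked. The variable names below are illustrative only.

```python
pfs = 2
vfs_per_pf = 7

total = pfs + pfs * vfs_per_pf                 # all PCI devices on the host
expected_available = total - (1 + vfs_per_pf)  # one PF and its VFs are gone
observed_available = total - 1                 # only the PF was marked
leaked = observed_available - expected_available

print(total, expected_available, observed_available, leaked)
```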

  MariaDB [nova]> select pci_stats from compute_nodes;

  | pci_stats |
  | {"nova_object.version": "1.1", "nova_object.changes": ["objects"], "nova_object.name": "PciDevicePoolList", "nova_object.data": {"objects": [{"nova_object.version": "1.1", "nova_object.changes": ["count", "numa_node", "vendor_id", "product_id", "tags"], "nova_object.name": "PciDevicePool", "nova_object.data": {"count": 1, "numa_node": 0, "vendor_id": "8086", "product_id": "1521", "tags": {"dev_type": "type-PF", "physical_network": "phynet"}}, "nova_object.namespace": "nova"}, {"nova_object.version": "1.1", "nova_object.changes": ["count", "numa_node", "vendor_id", "product_id", "tags"], "nova_object.name": "PciDevicePool", "nova_object.data": {"count": 7, "numa_node": 0, "vendor_id": "8086", "product_id": "1520", "tags": {"dev_type": "type-VF", "physical_network": "phynet"}}, "nova_object.namespace": "nova"}]}, "nova_object.namespace": "nova"} |

  This is correct - it shows 8 available devices.

  Once a new resource_tracker run happens we hit
  https://bugs.launchpad.net/nova/+bug/1565721 so we stop updating
  based on what is found on the host.

  The root cause of this is (I believe) that we update PCI objects in
  the local scope, but never call save() on those particular
  instances. So we grab and update the status here:

  https://github.com/openstack/nova/blob/d57a4e8be9147bd79be12d3f5adccc9289a375b6/nova/objects/pci_device.py#L339-L349

  but never call save inside that method.

  The save is eventually called here, referencing completely different
  instances that never see the update:

  https://github.com/openstack/nova/blob/d57a4e8be9147bd79be12d3f5adccc9289a375b6/nova/compute/resource_tracker.py#L646

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1565785/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
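
The suspected root cause in the bug description (status updates made on objects other than the ones that are later saved) can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not nova code: `FakeDevice`, `Tracker`, and the dict standing in for the database are all hypothetical.

```python
import copy


class FakeDevice:
    """Hypothetical device record with a status and a save() method."""

    def __init__(self, address, status="available"):
        self.address = address
        self.status = status

    def save(self, db):
        # Persist the state of this particular instance only.
        db[self.address] = self.status


class Tracker:
    """Hypothetical tracker holding the in-memory "master copy"."""

    def __init__(self, devices):
        self.pci_devs = devices

    def save(self, db):
        # Only instances held in self.pci_devs are ever flushed.
        for dev in self.pci_devs:
            dev.save(db)


db = {}  # stands in for the pci_devices table
tracker = Tracker([FakeDevice("vf0"), FakeDevice("vf1")])

# Buggy pattern: the status change lands on a separate copy of the device,
# so the tracker's later save() never sees it.
detached = copy.copy(tracker.pci_devs[0])
detached.status = "unavailable"
tracker.save(db)
buggy_result = dict(db)

# Fixed pattern: change the instance that the tracker itself holds.
tracker.pci_devs[0].status = "unavailable"
tracker.save(db)
fixed_result = dict(db)

print(buggy_result)
print(fixed_result)
```

In the buggy case the stored status of `vf0` stays "available", which matches the leaked VFs seen in the `pci_devices` query. Once the update is applied to the tracked instance, the same `save()` call persists it.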