** Summary changed:

- Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance
+ [SRU] Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance
** Description changed:

+ [Impact]
+ This patch is required to prevent nova from accidentally marking
+ pci_device allocations as deleted when it incorrectly reads the
+ passthrough whitelist.
+
+ [Test Case]
+  * deploy openstack (any version that supports sriov)
+  * single compute configured for sriov with at least one device in
+    pci_passthrough_whitelist
+  * create a vm and attach an sriov port
+  * remove the device from pci_passthrough_whitelist and restart
+    nova-compute
+  * check that pci_devices allocations have not been marked as deleted
+
+ [Regression Potential]
+ None anticipated
+
+ ----------------------------------------------------------------------------
  Upon trying to create a VM instance (say A) with one QAT VF, it fails
  with the following error: "Requested operation is not valid: PCI
  device 0000:88:04.7 is in use by driver QEMU, domain
  instance-00000081". Note that PCI device 0000:88:04.7 is already
  assigned to another VM (say B).

  We have installed the openstack-mitaka release on a CentOS 7 system.
  It has two Intel QAT devices, with 32 VFs available per QAT/DH895xCC
  device. Of the 64 VFs, only 8 are allocated to VM instances; the rest
  should be available.

- But the nova scheduler tries to assign an already-in-use SRIOV VF to
- a new instance, and the instance fails. It appears that the nova
- database is not tracking which VFs have already been taken. If I shut
- down VM B, then VM A boots up, and vice versa; the two instances
- cannot run simultaneously because of this issue.
+ But the nova scheduler tries to assign an already-in-use SRIOV VF to
+ a new instance, and the instance fails. It appears that the nova
+ database is not tracking which VFs have already been taken. If I shut
+ down VM B, then VM A boots up, and vice versa; the two instances
+ cannot run simultaneously because of this issue. We should always be
+ able to create as many instances with the requested PCI devices as
+ there are available VFs.

  Please feel free to let me know if additional information is needed.
  Can anyone suggest why nova tries to assign a PCI device that has
  already been assigned? Is there any way to resolve this issue? Thank
  you in advance for your support and help.
** Tags added: sts-sru-needed

** Also affects: nova (Ubuntu)
   Importance: Undecided
       Status: New

** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/mitaka
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/rocky
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/ocata
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/stein
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/queens
   Importance: Undecided
       Status: New

** Also affects: nova (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: nova (Ubuntu Cosmic)
   Importance: Undecided
       Status: New

** Also affects: nova (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Also affects: nova (Ubuntu Eoan)
   Importance: Undecided
       Status: New

** Also affects: nova (Ubuntu Disco)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1633120

Title:
  [SRU] Nova scheduler tries to assign an already-in-use SRIOV QAT VF
  to a new instance

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive mitaka series:
  New
Status in Ubuntu Cloud Archive ocata series:
  New
Status in Ubuntu Cloud Archive queens series:
  New
Status in Ubuntu Cloud Archive rocky series:
  New
Status in Ubuntu Cloud Archive stein series:
  New
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Fix Committed
Status in OpenStack Compute (nova) pike series:
  Fix Committed
Status in OpenStack Compute (nova) queens series:
  Fix Committed
Status in OpenStack Compute (nova) rocky series:
  Fix Committed
Status in nova package in Ubuntu:
  New
Status in nova source package in Xenial:
  New
Status in nova source package in Bionic:
  New
Status in nova source package in Cosmic:
  New
Status in nova source package in Disco:
  New
Status in nova source package in Eoan:
  New

Bug description:
  [Impact]
  This patch is required to prevent nova from accidentally marking
  pci_device allocations as deleted when it incorrectly reads the
  passthrough whitelist.

  [Test Case]
   * deploy openstack (any version that supports sriov)
   * single compute configured for sriov with at least one device in
     pci_passthrough_whitelist
   * create a vm and attach an sriov port
   * remove the device from pci_passthrough_whitelist and restart
     nova-compute
   * check that pci_devices allocations have not been marked as deleted

  [Regression Potential]
  None anticipated

  ----------------------------------------------------------------------------

  Upon trying to create a VM instance (say A) with one QAT VF, it fails
  with the following error: "Requested operation is not valid: PCI
  device 0000:88:04.7 is in use by driver QEMU, domain
  instance-00000081". Note that PCI device 0000:88:04.7 is already
  assigned to another VM (say B).

  We have installed the openstack-mitaka release on a CentOS 7 system.
  It has two Intel QAT devices, with 32 VFs available per QAT/DH895xCC
  device. Of the 64 VFs, only 8 are allocated to VM instances; the rest
  should be available.

  But the nova scheduler tries to assign an already-in-use SRIOV VF to
  a new instance, and the instance fails. It appears that the nova
  database is not tracking which VFs have already been taken. If I shut
  down VM B, then VM A boots up, and vice versa; the two instances
  cannot run simultaneously because of this issue. We should always be
  able to create as many instances with the requested PCI devices as
  there are available VFs.

  Please feel free to let me know if additional information is needed.
  Can anyone suggest why nova tries to assign a PCI device that has
  already been assigned? Is there any way to resolve this issue? Thank
  you in advance for your support and help.
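  As a reference for the whitelist steps in the test case above, the
  device entry being added and removed is a nova.conf line of roughly
  the following shape. This is an illustrative sketch for a Mitaka-era
  configuration, not the reporter's actual config: vendor_id 8086 is
  Intel's PCI vendor ID, product_id 0443 matches the QAT VFs listed
  with "lspci -d:443" below, and the alias name "qat-vf" is an
  assumption.

    [DEFAULT]
    # Whitelist the DH895xCC QAT VFs for PCI passthrough
    pci_passthrough_whitelist = { "vendor_id": "8086", "product_id": "0443" }
    # Alias referenced from flavor extra specs, e.g. "pci_passthrough:alias"="qat-vf:1"
    pci_alias = { "vendor_id": "8086", "product_id": "0443", "name": "qat-vf" }

  Removing the pci_passthrough_whitelist line and restarting
  nova-compute is the trigger described in the test case; the
  allocations recorded in pci_devices should survive that restart.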
  [root@localhost ~(keystone_admin)]# lspci -d:435
  83:00.0 Co-processor: Intel Corporation DH895XCC Series QAT
  88:00.0 Co-processor: Intel Corporation DH895XCC Series QAT
  [root@localhost ~(keystone_admin)]#
  [root@localhost ~(keystone_admin)]# lspci -d:443 | grep "QAT Virtual Function" | wc -l
  64
  [root@localhost ~(keystone_admin)]#
  [root@localhost ~(keystone_admin)]# mysql -u root nova -e "SELECT hypervisor_hostname, address, instance_uuid, status FROM pci_devices JOIN compute_nodes on compute_nodes.id=compute_node_id" | grep 0000:88:04.7
  localhost                   0000:88:04.7  e10a76f3-e58e-4071-a4dd-7a545e8000de  allocated
  localhost                   0000:88:04.7  c3dbac90-198d-4150-ba0f-a80b912d8021  allocated
  localhost                   0000:88:04.7  c7f6adad-83f0-4881-b68f-6d154d565ce3  allocated
  localhost.nfv.benunets.com  0000:88:04.7  0c3c11a5-f9a4-4f0d-b120-40e4dde843d4  allocated
  [root@localhost ~(keystone_admin)]#

  [root@localhost ~(keystone_admin)]# grep -r e10a76f3-e58e-4071-a4dd-7a545e8000de /etc/libvirt/qemu
  /etc/libvirt/qemu/instance-00000081.xml:  <uuid>e10a76f3-e58e-4071-a4dd-7a545e8000de</uuid>
  /etc/libvirt/qemu/instance-00000081.xml:      <entry name='uuid'>e10a76f3-e58e-4071-a4dd-7a545e8000de</entry>
  /etc/libvirt/qemu/instance-00000081.xml:      <source file='/var/lib/nova/instances/e10a76f3-e58e-4071-a4dd-7a545e8000de/disk'/>
  /etc/libvirt/qemu/instance-00000081.xml:      <source path='/var/lib/nova/instances/e10a76f3-e58e-4071-a4dd-7a545e8000de/console.log'/>
  /etc/libvirt/qemu/instance-00000081.xml:      <source path='/var/lib/nova/instances/e10a76f3-e58e-4071-a4dd-7a545e8000de/console.log'/>
  [root@localhost ~(keystone_admin)]#
  [root@localhost ~(keystone_admin)]# grep -r 0c3c11a5-f9a4-4f0d-b120-40e4dde843d4 /etc/libvirt/qemu
  /etc/libvirt/qemu/instance-000000ab.xml:  <uuid>0c3c11a5-f9a4-4f0d-b120-40e4dde843d4</uuid>
  /etc/libvirt/qemu/instance-000000ab.xml:      <entry name='uuid'>0c3c11a5-f9a4-4f0d-b120-40e4dde843d4</entry>
  /etc/libvirt/qemu/instance-000000ab.xml:      <source file='/var/lib/nova/instances/0c3c11a5-f9a4-4f0d-b120-40e4dde843d4/disk'/>
  /etc/libvirt/qemu/instance-000000ab.xml:      <source path='/var/lib/nova/instances/0c3c11a5-f9a4-4f0d-b120-40e4dde843d4/console.log'/>
  /etc/libvirt/qemu/instance-000000ab.xml:      <source path='/var/lib/nova/instances/0c3c11a5-f9a4-4f0d-b120-40e4dde843d4/console.log'/>
  [root@localhost ~(keystone_admin)]#

  On the controller, it appears there are duplicate PCI device entries
  in the database:

  MariaDB [nova]> select hypervisor_hostname,address,count(*) from pci_devices JOIN compute_nodes on compute_nodes.id=compute_node_id group by hypervisor_hostname,address having count(*) > 1;
  +---------------------+--------------+----------+
  | hypervisor_hostname | address      | count(*) |
  +---------------------+--------------+----------+
  | localhost           | 0000:05:00.0 |        3 |
  | localhost           | 0000:05:00.1 |        3 |
  | localhost           | 0000:83:01.0 |        3 |
  | localhost           | 0000:83:01.1 |        3 |
  | localhost           | 0000:83:01.2 |        3 |
  | localhost           | 0000:83:01.3 |        3 |
  | localhost           | 0000:83:01.4 |        3 |
  | localhost           | 0000:83:01.5 |        3 |
  | localhost           | 0000:83:01.6 |        3 |
  | localhost           | 0000:83:01.7 |        3 |
  | localhost           | 0000:83:02.0 |        3 |
  | localhost           | 0000:83:02.1 |        3 |
  | localhost           | 0000:83:02.2 |        3 |
  | localhost           | 0000:83:02.3 |        3 |
  | localhost           | 0000:83:02.4 |        3 |
  | localhost           | 0000:83:02.5 |        3 |
  | localhost           | 0000:83:02.6 |        3 |
  | localhost           | 0000:83:02.7 |        3 |
  | localhost           | 0000:83:03.0 |        3 |
  | localhost           | 0000:83:03.1 |        3 |
  | localhost           | 0000:83:03.2 |        3 |
  | localhost           | 0000:83:03.3 |        3 |
  | localhost           | 0000:83:03.4 |        3 |
  | localhost           | 0000:83:03.5 |        3 |
  | localhost           | 0000:83:03.6 |        3 |
  | localhost           | 0000:83:03.7 |        3 |
  | localhost           | 0000:83:04.0 |        3 |
  | localhost           | 0000:83:04.1 |        3 |
  | localhost           | 0000:83:04.2 |        3 |
  | localhost           | 0000:83:04.3 |        3 |
  | localhost           | 0000:83:04.4 |        3 |
  | localhost           | 0000:83:04.5 |        3 |
  | localhost           | 0000:83:04.6 |        3 |
  | localhost           | 0000:83:04.7 |        3 |
  | localhost           | 0000:88:01.0 |        3 |
  | localhost           | 0000:88:01.1 |        3 |
  | localhost           | 0000:88:01.2 |        3 |
  | localhost           | 0000:88:01.3 |        3 |
  | localhost           | 0000:88:01.4 |        3 |
  | localhost           | 0000:88:01.5 |        3 |
  | localhost           | 0000:88:01.6 |        3 |
  | localhost           | 0000:88:01.7 |        3 |
  | localhost           | 0000:88:02.0 |        3 |
  | localhost           | 0000:88:02.1 |        3 |
  | localhost           | 0000:88:02.2 |        3 |
  | localhost           | 0000:88:02.3 |        3 |
  | localhost           | 0000:88:02.4 |        3 |
  | localhost           | 0000:88:02.5 |        3 |
  | localhost           | 0000:88:02.6 |        3 |
  | localhost           | 0000:88:02.7 |        3 |
  | localhost           | 0000:88:03.0 |        3 |
  | localhost           | 0000:88:03.1 |        3 |
  | localhost           | 0000:88:03.2 |        3 |
  | localhost           | 0000:88:03.3 |        3 |
  | localhost           | 0000:88:03.4 |        3 |
  | localhost           | 0000:88:03.5 |        3 |
  | localhost           | 0000:88:03.6 |        3 |
  | localhost           | 0000:88:03.7 |        3 |
  | localhost           | 0000:88:04.0 |        3 |
  | localhost           | 0000:88:04.1 |        3 |
  | localhost           | 0000:88:04.2 |        3 |
  | localhost           | 0000:88:04.3 |        3 |
  | localhost           | 0000:88:04.4 |        3 |
  | localhost           | 0000:88:04.5 |        3 |
  | localhost           | 0000:88:04.6 |        3 |
  | localhost           | 0000:88:04.7 |        3 |
  +---------------------+--------------+----------+
  66 rows in set (0.00 sec)

  MariaDB [nova]>
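  The final step of the test case (verifying that allocations have not
  been flipped to deleted) can be checked with a query against the same
  tables. This is an illustrative sketch that assumes the standard
  soft-delete "deleted" column of the nova schema; with the fix applied,
  it should return an empty set after nova-compute restarts with the
  device removed from the whitelist:

    MariaDB [nova]> SELECT hypervisor_hostname, address, instance_uuid, status, deleted
        ->   FROM pci_devices
        ->   JOIN compute_nodes ON compute_nodes.id = compute_node_id
        ->  WHERE instance_uuid IS NOT NULL
        ->    AND deleted != 0;

  Any rows returned would be devices that nova soft-deleted while an
  instance still holds them, which is the failure mode this SRU guards
  against.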
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1633120/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp