Reviewed:  https://review.opendev.org/c/openstack/nova/+/838555
Committed: https://opendev.org/openstack/nova/commit/3af2ecc13fa9334de8418accaed4fffefefb41da
Submitter: "Zuul (22348)"
Branch:    master

commit 3af2ecc13fa9334de8418accaed4fffefefb41da
Author: Balazs Gibizer <g...@redhat.com>
Date:   Tue Apr 19 18:36:50 2022 +0200

    Allow claiming PCI PF if child VF is unavailable

    As If9ab424cc7375a1f0d41b03f01c4a823216b3eb8 stated, there is a way
    for the pci_device table to become inconsistent. The parent PF can be
    in 'available' state while its children VFs are still in 'unavailable'
    state. In this situation the PF is schedulable, but the PCI claim
    fails when it tries to mark the dependent VFs unavailable.

    This patch changes the PCI claim logic to allow claiming the parent PF
    in the inconsistent situation, as we assume that it is safe to do so.
    The claim also fixes the inconsistency, so that when the parent PF is
    freed the children VFs become available again.

    Closes-Bug: #1969496
    Change-Id: I575ce06bcc913add7db0849f85728371da2032fc

** Changed in: nova
       Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1969496

Title:
  booting with PCI device fails: Attempt to consume PCI device xxx from
  empty pool

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  We saw in the field that the pci_devices table can end up in an
  inconsistent state after a compute node HW failure and re-deployment.
  There can be dependent devices where the parent PF is in available
  state while the children VFs are in unavailable state. (Before the HW
  fault the PF was allocated, hence the VFs were marked unavailable.)

  In this state the PF is still schedulable, but during the PCI claim
  the handling of dependent devices in the PCI tracker fails with the
  error: "Attempt to consume PCI device XXX from empty pool".

  The reason for the failure is that when the PF is claimed, all of its
  children VFs are marked unavailable. If a VF is already unavailable,
  this step fails.

  No reproducer that generates the inconsistent state has been found so
  far.
  (We tried whitelist reconfiguration, evacuation, and VM deletion while
  the compute was down.) However, recovering from the inconsistency
  should be possible.
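
The claim behaviour described in the commit message can be illustrated with a small Python sketch. This is not nova's code: the `Device` class, the `ClaimError` exception, the `claim_pf` and `free_pf` helpers, the `tolerate_unavailable_vfs` flag, and the PCI addresses are all hypothetical names chosen for illustration. The sketch only models the state transitions the message talks about (PF available, children VFs unavailable) and contrasts the failing claim with the tolerant one.

```python
from dataclasses import dataclass, field
from typing import List

# Simplified device states; the names mirror the ones used in the report.
AVAILABLE = "available"
UNAVAILABLE = "unavailable"
CLAIMED = "claimed"


class ClaimError(Exception):
    """Hypothetical error raised when a PF cannot be claimed."""


@dataclass
class Device:
    """Hypothetical stand-in for a row of the PCI device table."""
    address: str
    status: str = AVAILABLE
    children: List["Device"] = field(default_factory=list)


def claim_pf(pf: Device, tolerate_unavailable_vfs: bool) -> None:
    """Claim a parent PF and mark its children VFs unavailable.

    With tolerate_unavailable_vfs=False, any VF that is not 'available'
    blocks the claim (the failure mode from the bug description). With
    True, VFs that are already 'unavailable' are accepted as they are.
    """
    if pf.status != AVAILABLE:
        raise ClaimError(f"PF {pf.address} is {pf.status}")

    # Validate every child first so a failed claim leaves no partial changes.
    allowed = {AVAILABLE, UNAVAILABLE} if tolerate_unavailable_vfs else {AVAILABLE}
    for vf in pf.children:
        if vf.status not in allowed:
            raise ClaimError(f"VF {vf.address} is {vf.status}")

    for vf in pf.children:
        vf.status = UNAVAILABLE
    pf.status = CLAIMED


def free_pf(pf: Device) -> None:
    """Free the PF; its children VFs become available again."""
    pf.status = AVAILABLE
    for vf in pf.children:
        vf.status = AVAILABLE


if __name__ == "__main__":
    # Inconsistent state: PF available while both VFs are unavailable.
    pf = Device("0000:81:00.0", AVAILABLE, [
        Device("0000:81:00.1", UNAVAILABLE),
        Device("0000:81:00.2", UNAVAILABLE),
    ])
    try:
        claim_pf(pf, tolerate_unavailable_vfs=False)
    except ClaimError as exc:
        print("strict claim failed:", exc)

    claim_pf(pf, tolerate_unavailable_vfs=True)
    free_pf(pf)
    print([vf.status for vf in pf.children])
```

In this model, freeing the PF after the tolerant claim resets every child VF to available, which corresponds to the self-healing effect the commit message describes.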