** Description changed:

+ [ Impact ]
+ 
+ s390/pci: Fix immediate re-add of PCI function after remove
+ 
+ A PCI function may be reserved directly after being
+ deconfigured. If it subsequently returns back in the standby
+ state Linux may not be able to use the new instance generating
+ a kernel warning about trying to create an already existing
+ sysfs file for the IOMMU.
+ 
+ The problem occurs because the new instance of the same
+ underlying device is created before the prior instance is
+ completely torn down. This happens because the lifetime of the
+ PCI device representation in Linux is determined by reference
+ counts. A driver, the network stack, or even user-space
+ (including via vfio-pci) may be holding onto the device
+ representation even after the underlying device is gone.
+ 
+ The solution to this is twofold. Firstly allow re-using the
+ pre-existing struct zpci_dev and/or struct pci_dev for the newly
+ re-added instance of the underlying device up until the point
+ where the struct zpci_dev is fully removed. Secondly serialize
+ the addition and removal of PCI functions such that re-adding
+ a new instance, after the old one is already being removed, will
+ wait for the removal to finish before adding the new instance.
+ This fix also builds on prior upstream work of serializing state
+ transitions for PCI devices e.g. from configured to standby.
+ 
+ [ Fix ]
+ 
+ Backport from mainline:
+ - 0d48566d4b58 s390/pci: rename lock member in struct zpci_dev
+ - bcb5d6c76903 s390/pci: introduce lock to synchronize state of zpci_dev's
+ - 6ee600bfbe0f s390/pci: remove hotplug slot when releasing the device
+ - c4a585e952ca s390/pci: Fix potential double remove of hotplug slot
+ - 42420c50c68f s390/pci: Fix missing check for zpci_create_device() error return
+ - 05a2538f2b48 s390/pci: Fix duplicate pci_dev_put() in disable_slot() when PF has child VFs
+ - d76f96332967 s390/pci: Remove redundant bus removal and disable from zpci_release_device()
+ - 47c397844869 s390/pci: Prevent self deletion in disable_slot()
+ - 4b1815a52d7e s390/pci: Allow re-add of a reserved but not yet removed device
+ - 774a1fa880bc s390/pci: Serialize device addition and removal
+ 
+ [ Test Plan ]
+ 
+ The issue can be reproduced by looking at the behavior of the kernel
+ with respect to NETH PCI functions. IBM Z firmware temporarily reserves
+ NETH PCI functions to check for pending service when the last FID of a
+ PCHID is deconfigured. When nothing is pending the PCI function is
+ immediately returned in the standby state, thus triggering this issue
+ quite reliably.
+ 
+ [ Where Problems Could Occur ]
+ 
+ The fix affects the PCI function lifecycle management in the s390 PCI
+ hotplug infrastructure, specifically the serialization and reuse logic
+ of zpci_dev and pci_dev structures during rapid remove and re-add
+ cycles. An issue with this fix may introduce problems such as stale or
+ incorrectly reused device state, leading to improper reinitialization of
+ PCI functions.
+ 
+ 
+ ---
+ 
  Description: s390/pci: Fix immediate re-add of PCI function after remove

  Symptom: A PCI function may be reserved directly after being
-    deconfigured. If it subsequently returns back in the standby
-    state Linux may not be able to use the new instance generating
-    a kernel warning about trying to create an already existing
-    sysfs file for the IOMMU.
+    deconfigured. If it subsequently returns back in the standby
+    state Linux may not be able to use the new instance generating
+    a kernel warning about trying to create an already existing
+    sysfs file for the IOMMU.

  Problem: The problem occurs because the new instance of the same
-    underlying device is created before the prior instance is
-    completely torn down. This happens because the lifetime of the
-    PCI device representation in Linux is determined by reference
-    counts. A driver, the network stack, or even user-space
-    (including via vfio-pci) may be holding onto the device
-    representation even after the underlying device is gone.
+    underlying device is created before the prior instance is
+    completely torn down. This happens because the lifetime of the
+    PCI device representation in Linux is determined by reference
+    counts. A driver, the network stack, or even user-space
+    (including via vfio-pci) may be holding onto the device
+    representation even after the underlying device is gone.

  Solution: The solution to this is twofold. Firstly allow re-using the
-    pre-existing struct zpci_dev and/or struct pci_dev for the newly
-    re-added instance of the underlying device up until the point
-    where the struct zpci_dev is fully removed. Secondly serialize
-    the addition and removal of PCI functions such that re-adding
-    a new instance, after the old one is already being removed, will
-    wait for the removal to finish before adding the new instance.
-    This fix also builds on prior upstream work of serializing state
-    transitions for PCI devices e.g. from configured to standby.
+    pre-existing struct zpci_dev and/or struct pci_dev for the newly
+    re-added instance of the underlying device up until the point
+    where the struct zpci_dev is fully removed. Secondly serialize
+    the addition and removal of PCI functions such that re-adding
+    a new instance, after the old one is already being removed, will
+    wait for the removal to finish before adding the new instance.
+    This fix also builds on prior upstream work of serializing state
+    transitions for PCI devices e.g. from configured to standby.

  Reproduction: This problem was originally found with firmware which
-    temporarily reserves NETH PCI functions to check for pending
-    service when the last FID of a PCHID is deconfigured. When
-    nothing is pending the PCI function is immediately returned in
-    the standby state, thus triggering this issue quite reliably.
+    temporarily reserves NETH PCI functions to check for pending
+    service when the last FID of a PCHID is deconfigured. When
+    nothing is pending the PCI function is immediately returned in
+    the standby state, thus triggering this issue quite reliably.

  Upstream-ID: 0d48566d4b58946c8e1b0baac0347616060a81c9
-    bcb5d6c769039c8358a2359e7c3ea5d97ce93108
-    6ee600bfbe0f818ffb7748d99e9b0c89d0d9f02a
-    c4a585e952ca403a370586d3f16e8331a7564901
-    42420c50c68f3e95e90de2479464f420602229fc
-    05a2538f2b48500cf4e8a0a0ce76623cc5bafcf1
-    d76f9633296785343d45f85199f4138cb724b6d2
-    47c397844869ad0e6738afb5879c7492f4691122
-    4b1815a52d7eb03b3e0e6742c6728bc16a4b2d1d
-    774a1fa880bc949d88b5ddec9494a13be733dfa8
+    bcb5d6c769039c8358a2359e7c3ea5d97ce93108
+    6ee600bfbe0f818ffb7748d99e9b0c89d0d9f02a
+    c4a585e952ca403a370586d3f16e8331a7564901
+    42420c50c68f3e95e90de2479464f420602229fc
+    05a2538f2b48500cf4e8a0a0ce76623cc5bafcf1
+    d76f9633296785343d45f85199f4138cb724b6d2
+    47c397844869ad0e6738afb5879c7492f4691122
+    4b1815a52d7eb03b3e0e6742c6728bc16a4b2d1d
+    774a1fa880bc949d88b5ddec9494a13be733dfa8
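
For readers who want a concrete picture of the twofold solution in the description, here is a minimal user-space sketch in Python. It is a toy model only, not the kernel code from the listed commits. The names FunctionRecord, FunctionRegistry, reserve() and is_tracked() are hypothetical and were chosen for illustration. It assumes a single lock that serializes add and remove, and a record that stays reusable until its last reference is dropped.

```python
import threading


class FunctionRecord:
    """Toy stand-in for a per-function device record (hypothetical, not struct zpci_dev)."""

    def __init__(self, fid):
        self.fid = fid
        self.state = "standby"
        self.refcount = 1  # reference owned by the registry itself


class FunctionRegistry:
    """Tracks records by function ID; one lock serializes add and remove."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = {}

    def _put_locked(self, rec):
        # Caller must hold self._lock. Full teardown happens only at zero.
        rec.refcount -= 1
        if rec.refcount == 0:
            del self._records[rec.fid]

    def get(self, rec):
        # A driver or user-space holder takes an extra reference.
        with self._lock:
            rec.refcount += 1
            return rec

    def put(self, rec):
        with self._lock:
            self._put_locked(rec)

    def reserve(self, fid):
        # The underlying function went away: drop the registry's reference.
        with self._lock:
            rec = self._records[fid]
            if rec.state != "reserved":
                rec.state = "reserved"
                self._put_locked(rec)

    def add(self, fid):
        # Blocks while a removal holds the lock, so add never races teardown.
        with self._lock:
            rec = self._records.get(fid)
            if rec is None:
                rec = FunctionRecord(fid)
                self._records[fid] = rec
            elif rec.state == "reserved":
                # Old instance is still referenced: reuse it, do not duplicate.
                rec.state = "standby"
                rec.refcount += 1
            return rec

    def is_tracked(self, fid):
        with self._lock:
            return fid in self._records
```

In this model, a re-add that arrives while a driver still holds a reference gets the existing record back instead of a second one, which is the situation that produced the duplicate sysfs file warning. Once the last reference is gone the record is removed, and a later add creates a fresh one.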
-- 
https://bugs.launchpad.net/bugs/2114174

Title:
  [UBUNTU 24.04] s390/pci: Fix immediate re-add of PCI function after remove