Private bug reported:
PCIe hotplug enables dynamic insertion and removal of devices without
requiring a system reboot. In modern platforms, hotplug operations must
be coordinated with robust error handling mechanisms such as Advanced
Error Reporting (AER) and Downstream Port Containment (DPC) to ensure
safe device addition/removal and fault recovery.
In the OS-first AER model, the operating system takes primary
responsibility for handling PCIe errors, including detection, logging,
and recovery actions. This contrasts with firmware-first approaches
where firmware intercepts and processes errors before notifying the OS.
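
One way to check which model is in effect on a running system is to look at the result of the ACPI _OSC negotiation in the kernel log. The sketch below parses a line in the style Linux prints after that negotiation. The sample line, the host bridge name in it, and the helper names are illustrative assumptions, not output captured from a real system.

```python
import re

# Illustrative line in the style the kernel logs after _OSC negotiation.
# The bridge name and feature list are assumptions, not captured output.
SAMPLE = ("acpi PNP0A08:00: _OSC: OS now controls "
          "[PCIeHotplug PME AER PCIeCapability LTR DPC]")


def os_controlled_features(log_text):
    """Collect every feature name listed in '_OSC: OS now controls [...]'."""
    features = set()
    for match in re.finditer(r"_OSC: OS now controls \[([^\]]*)\]", log_text):
        features.update(match.group(1).split())
    return features


def is_os_first(log_text, required=("AER", "DPC", "PCIeHotplug")):
    """True if the OS was granted control of every feature in `required`."""
    granted = os_controlled_features(log_text)
    return all(name in granted for name in required)


if __name__ == "__main__":
    print(sorted(os_controlled_features(SAMPLE)))
    print("OS-first for AER/DPC/hotplug:", is_os_first(SAMPLE))
```

On a live system the input would typically come from `dmesg` or the journal instead of the hard-coded sample.
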
DPC is a PCIe feature that isolates faulty downstream devices by
automatically disabling the link below the affected port upon detecting
an uncorrectable error, preventing error propagation. In OS-first
mode, the OS is responsible for coordinating recovery actions after DPC
events, including device reset, link retraining, and re-enumeration.
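
The recovery order described above (containment, then device reset, link retraining, and re-enumeration) can be modeled as a small sequence that stops at the first failing step. This is only a toy sketch of the requested flow. The step names and the `run_recovery` helper are hypothetical and do not correspond to the kernel's actual implementation.

```python
from enum import Enum, auto


class Step(Enum):
    CONTAINED = auto()      # DPC has triggered and the port is contained
    DEVICE_RESET = auto()
    LINK_RETRAIN = auto()
    REENUMERATE = auto()
    RECOVERED = auto()
    FAILED = auto()


# Steps the OS performs after containment, in the order given in the report.
RECOVERY_STEPS = [Step.DEVICE_RESET, Step.LINK_RETRAIN, Step.REENUMERATE]


def run_recovery(actions):
    """Run each recovery step in order and return the trace of states.

    `actions` maps a Step to a callable returning True on success.
    The sequence stops at the first failure and ends in Step.FAILED.
    """
    trace = [Step.CONTAINED]
    for step in RECOVERY_STEPS:
        trace.append(step)
        if not actions[step]():
            trace.append(Step.FAILED)
            return trace
    trace.append(Step.RECOVERED)
    return trace


if __name__ == "__main__":
    always_ok = {step: (lambda: True) for step in RECOVERY_STEPS}
    print([s.name for s in run_recovery(always_ok)])
```

Stopping at the first failure keeps a device whose link did not retrain from being re-enumerated, which is the kind of ordering guarantee the request asks the OS to provide.
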
The Linux kernel already supports PCIe hotplug (pciehp), AER, and DPC.
However, integrating OS-first AER handling with firmware-assisted DPC
flows during hotplug scenarios requires closer coordination between
these subsystems, especially in complex topologies (switches, CXL
fabrics, Gen5/Gen6 links).
Feature request:
Capabilities requested in the OS:
- Enable OS-first AER handling for PCIe hotplug scenarios.
- Ensure proper integration between AER and DPC subsystems during hotplug
  events.
- Support coordinated recovery flows (link reset, device re-enumeration)
  post-DPC trigger.
- Enhance pciehp driver to handle DPC-triggered hotplug-like events.
- Provide clear separation and fallback between OS-first and firmware-first
  modes.
- Expose AER/DPC event details via sysfs/debugfs for observability.
- Improve logging and tracing of hotplug + error recovery sequences.
- Ensure compatibility with PCIe Gen5/Gen6 and CXL devices.
- Support hotplug across PCIe switches and multi-level topologies.
- Enable validation and fault-injection testing for AER/DPC hotplug flows.
- Document supported flows, configuration options, and best practices.
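
For the observability item, note that Linux already exposes per-device AER counters in sysfs through the files aer_dev_correctable, aer_dev_nonfatal, and aer_dev_fatal. The sketch below shows how a validation script might read them, assuming the "error name, then count" line layout described in the kernel ABI documentation. The helper names and the sample text are illustrative assumptions, and the files are present only on devices that have the AER capability.

```python
from pathlib import Path


def parse_aer_counters(text):
    """Parse 'Error Name <count>' lines into a {name: count} dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.strip().rsplit(None, 1)  # the count is the last token
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, value = parts
        counters[name] = int(value)
    return counters


def read_aer_counters(bdf, kind="correctable",
                      sysfs_root="/sys/bus/pci/devices"):
    """Read aer_dev_<kind> for a device such as '0000:00:1c.0'."""
    path = Path(sysfs_root) / bdf / f"aer_dev_{kind}"
    return parse_aer_counters(path.read_text())


if __name__ == "__main__":
    # Illustrative content only, not taken from a real device.
    sample = "Receiver Error 2\nBad TLP 0\nTOTAL_ERR_COR 2\n"
    print(parse_aer_counters(sample))
```

Comparing these counters before and after a fault-injection run is one simple way to confirm that an injected error was seen by the OS.
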
Business Justification:
- Improves reliability and stability of dynamic device insertion/removal.
- Enables faster and more flexible infrastructure management without downtime.
- Enhances fault containment and recovery in error scenarios.
- Supports modern data center use cases with composable and hot-pluggable
  resources.
- Reduces dependency on firmware for error handling, increasing transparency.
- Aligns OS capabilities with advanced PCIe/CXL RAS and hotplug requirements.
References:
- PCI-SIG PCIe Specification (AER, DPC, Hotplug)
- Linux Kernel PCIe AER, DPC, and Hotplug (pciehp) Documentation
- ACPI Specification (_OSC for OS-first control)
- Industry Whitepapers on PCIe Hotplug and Error Handling
** Affects: linux (Ubuntu)
Importance: Undecided
Status: New
** Information type changed from Public to Private
--
https://bugs.launchpad.net/bugs/2146709
Title:
Request for Hotplug Support – OS First AER and DPC Firmware Handling
Mode