Public bug reported:
This patch series adds VFIO CXL Type-2 device passthrough support to the
nvidia-6.17 kernel, enabling CXL-capable accelerator devices to be
assigned to virtual machines via VFIO. It includes:
1. VFIO get_region_info refactoring - Upstream series that splits
VFIO_DEVICE_GET_REGION_INFO into its own driver op and introduces
get_region_info_caps, which is a prerequisite for the CXL VFIO region
implementation
2. VFIO CXL Type-2 passthrough - Manish Honap's series adding CXL awareness to
vfio-pci-core, including HDM decoder register emulation, DPA region mapping
with demand-fault mmap, CXL DVSEC config virtualization, and CXL region
management
3. VFIO CXL guest-initiated reset - Manish Honap's RFC-v2 series
enabling guest-initiated CXL protocol reset with HDM decoder base
address preservation and DVSEC STATUS2 virtualization
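The base address preservation in item 3 can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not code
from the series: the ex_* names, the shadow struct layout, and the premise
that reset zeroes every other decoder field are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_DECODERS 4 /* hypothetical cap for this sketch */

/* Hypothetical shadow of one emulated HDM decoder; not the series' layout. */
struct ex_hdm_decoder {
	uint32_t base_lo;
	uint32_t base_hi;
	uint32_t size_lo;
	uint32_t size_hi;
	uint32_t ctrl;
};

struct ex_hdm_state {
	size_t nr_decoders;
	struct ex_hdm_decoder dec[EX_MAX_DECODERS];
};

/* Model of a reset that keeps only the base address of each decoder. */
static void ex_reset_preserve_base(struct ex_hdm_state *s)
{
	size_t i;

	assert(s->nr_decoders <= EX_MAX_DECODERS);

	for (i = 0; i < s->nr_decoders; i++) {
		struct ex_hdm_decoder saved = s->dec[i];

		/* Assumption: reset returns the whole decoder to zero. */
		s->dec[i] = (struct ex_hdm_decoder){ 0 };

		/* Put back only the base the guest had programmed. */
		s->dec[i].base_lo = saved.base_lo;
		s->dec[i].base_hi = saved.base_hi;
	}
}

/* Combine the two 32-bit halves into the 64-bit base address. */
static uint64_t ex_decoder_base(const struct ex_hdm_decoder *d)
{
	return ((uint64_t)d->base_hi << 32) | d->base_lo;
}
```

In this model the size and control fields come back cleared after reset,
while the base address survives.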
Key Features Added:
- CXL Type-2 device detection and initialization within vfio-pci-core
- HDM decoder register emulation framework for guest access
- DPA (Device Physical Address) VFIO region with demand-fault mmap and reset zap
- CXL DVSEC configuration space write virtualization
- CXL component BAR sparse mmap advertisement to userspace
- Guest-initiated CXL protocol reset (cxl_dev_reset)
- HDM decoder base address preservation across reset
- DVSEC STATUS2 register virtualization in vconfig shadow
- Module parameter disable_cxl for per-device opt-out
- UAPI header (include/uapi/cxl/cxl_regs.h) for CXL register defines
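For the sparse mmap advertisement listed above, userspace would typically
walk the capability chain returned by VFIO_DEVICE_GET_REGION_INFO. The
sketch below uses the existing definitions from <linux/vfio.h>; the ex_*
helpers are hypothetical, and whether the CXL component BAR reports its
areas exactly this way is an assumption based on how VFIO sparse mmap
works today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/vfio.h>

/*
 * Find the sparse mmap capability in a region info buffer that was
 * filled by VFIO_DEVICE_GET_REGION_INFO. Returns NULL if absent.
 * Capability offsets are relative to the start of the info struct.
 */
static const struct vfio_region_info_cap_sparse_mmap *
ex_find_sparse_mmap(const struct vfio_region_info *info)
{
	const char *base = (const char *)info;
	uint32_t off;

	assert(info != NULL);

	if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS))
		return NULL;

	for (off = info->cap_offset; off != 0; ) {
		const struct vfio_info_cap_header *hdr;

		if (off + sizeof(*hdr) > info->argsz)
			return NULL; /* header would overrun the buffer */

		hdr = (const void *)(base + off);
		if (hdr->id == VFIO_REGION_INFO_CAP_SPARSE_MMAP)
			return (const void *)hdr;

		if (hdr->next != 0 && hdr->next <= off)
			return NULL; /* refuse a chain that loops back */

		off = hdr->next;
	}
	return NULL;
}

/* Sum the sizes of all mmap-able areas; 0 if no sparse capability. */
static uint64_t ex_sparse_mmap_bytes(const struct vfio_region_info *info)
{
	const struct vfio_region_info_cap_sparse_mmap *cap;
	uint64_t total = 0;
	uint32_t i;

	cap = ex_find_sparse_mmap(info);
	if (!cap)
		return 0;

	for (i = 0; i < cap->nr_areas; i++)
		total += cap->areas[i].size;
	return total;
}
```

A caller would first issue the ioctl with argsz set to
sizeof(struct vfio_region_info), then retry with a buffer of the argsz the
kernel reports before passing it to these helpers.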
Justification
VFIO CXL passthrough is required for assigning CXL Type-2 accelerator
devices (GPUs, SmartNICs) to virtual machines:
- Enables VM guests to directly access CXL device memory (DPA regions) via VFIO
- Provides proper HDM decoder emulation so guests can manage device memory
- Supports guest-initiated CXL reset for device recovery within VMs
- Virtualizes DVSEC config writes to allow safe guest access to CXL
configuration
- Required for NVIDIA CXL-capable hardware in virtualized environments
Source
Patch Breakdown (51 patches):
----------------------------------------------------------------------------------
 #   Patches  Description
              Source
----------------------------------------------------------------------------------
 1.  03       Upstream VFIO prerequisites (Hisilicon + nvgrace-gpu fix)
              Upstream torvalds/master (merged)
----------------------------------------------------------------------------------
 2.  22       VFIO get_region_info series
              Upstream torvalds/master (merged in v6.19)
----------------------------------------------------------------------------------
 3.  19       Manish Honap's VFIO CXL Type-2 series v2 (19/20, selftest skipped)
              LKML (v2, not yet merged)
----------------------------------------------------------------------------------
 4.  06       Manish Honap's VFIO CXL reset series RFC-v2
              Internal (RFC-v2, not yet merged)
----------------------------------------------------------------------------------
 5.  01       Config annotations update
              OOT (build config)
----------------------------------------------------------------------------------
 TOTAL 51
Notes on upstream prerequisites (item 1):
Three upstream commits cherry-picked:
4868d2d52df6 — crypto: hisilicon - qm updates BAR configuration
2131c1517f30 — hisi_acc_vfio_pci: adapt to new migration configuration
767b1ed8b980 — vfio/nvgrace-gpu: fix grammatical error
The first two resolve a dependency for e238f147d517 ("vfio/hisi: Convert
to the get_region_info op"). The third fixes a pre-existing comment typo in
the nvgrace-gpu driver that would otherwise cause a patch-ID mismatch with
upstream 1b0ecb5baf4a ("vfio/pci: Convert all PCI drivers to
get_region_info_caps").
Notes on the VFIO get_region_info series (item 2):
22 upstream commits from Jason Gunthorpe's series, already merged in
v6.19:
https://lore.kernel.org/all/[email protected]/
These refactor the VFIO region info infrastructure that the CXL VFIO
passthrough series depends on.
Notes on Manish Honap's VFIO CXL series (item 3):
19 out of 20 patches ported from:
https://lore.kernel.org/linux-cxl/[email protected]/
Patch 20/20 (selftests) was skipped as the upstream VFIO selftest
infrastructure (tools/testing/selftests/vfio/) is not present in
the NV-Kernels base.
Conflict resolutions were required for 10 of 19 patches because the
NV-Kernels base diverges from upstream in two ways:
- CXL PCI function declarations (cxl_find_regblock,
cxl_probe_component_regs, cxl_await_range_active,
cxl_regblock_get_bar_info) are in include/cxl/pci.h
unconditionally (per Srirangan/Alejandro's series convention),
rather than in include/cxl/cxl.h with CONFIG_CXL_BUS guards
as Manish's patches expect.
- The base lacks the upstream xe driver, dmabuf, and p2pdma support,
which causes context mismatches in Kconfig, Makefiles, and VFIO headers.
Notes on Manish Honap's CXL reset series (item 4):
6 patches from internal RFC-v2 posting:
[RFC-v2 0/6] vfio/cxl: Guest-initiated CXL protocol reset
Patch 1/6 required the same conflict resolution as item 3 (declarations
added to include/cxl/pci.h instead of include/cxl/cxl.h).
Lore Links:
Jason Gunthorpe's VFIO get_region_info series (v2, merged in v6.19):
https://lore.kernel.org/all/[email protected]/
Manish Honap's VFIO CXL Type-2 passthrough series (v2):
https://lore.kernel.org/linux-cxl/[email protected]/
Testing
Build Validation:
- Built successfully for ARM64 4K page size kernel
- Built successfully for ARM64 64K page size kernel
Config Verification:
- CXL VFIO config enabled:
CONFIG_VFIO_CXL_CORE=y
Runtime Testing:
- Boot test on ARM64 system
- CXL Type-2 device enumeration via VFIO
- CXL guest-initiated reset test
Notes
- CONFIG_VFIO_CXL_CORE is a new bool config enabled for both amd64 and
arm64. It depends on VFIO_PCI_CORE (module), CXL_BUS (built-in), and
CXL_MEM (built-in). As a bool, it compiles into the vfio-pci-core module.
- This series depends on the CXL infrastructure established in PR #342,
"[linux-nvidia-6.17-next] Add CXL Type-2 device support, RAS error handling,
reset, state save/restore, and interleaving support"
(Alejandro's v23, Srirangan's save/restore and reset series).
- A new UAPI header include/uapi/cxl/cxl_regs.h is introduced for CXL
component and HDM register defines, using UAPI-safe macros (__GENMASK,
_BITUL) and raw hex sizes instead of kernel-internal SZ_* macros.
- Patch 20/20 of Manish's series (CXL Type-2 VFIO assignment selftest) was
intentionally skipped as the upstream VFIO selftest infrastructure is not
present in the NV-Kernels base.
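The UAPI-safe style mentioned in the notes above can be sketched as
follows. This is an illustration only: EX_BITUL and EX_GENMASK are local
stand-ins for the UAPI helpers _BITUL and __GENMASK so that the example
builds without kernel headers, and the EX_* register names and values are
hypothetical rather than copied from include/uapi/cxl/cxl_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for _BITUL() and __GENMASK(); hypothetical names. */
#define EX_BITUL(n)      (1UL << (n))
#define EX_GENMASK(h, l) \
	(((~0UL) << (l)) & (~0UL >> (sizeof(unsigned long) * 8 - 1 - (h))))

/* Hypothetical register defines written in the UAPI-safe style. */
#define EX_HDM_CTRL_IG_MASK  EX_GENMASK(3, 0)
#define EX_HDM_CTRL_IW_MASK  EX_GENMASK(7, 4)
#define EX_HDM_CTRL_COMMIT   EX_BITUL(9)

/* Raw hex size where kernel-internal code would write SZ_64K. */
#define EX_REG_BLOCK_SIZE    0x10000

/* Extract a field by dividing by the lowest set bit of its mask. */
static unsigned long ex_field_get(unsigned long mask, unsigned long val)
{
	assert(mask != 0);
	return (val & mask) / (mask & -mask);
}
```

Because nothing here relies on kernel-internal macros such as BIT(),
GENMASK(), or SZ_*, a header written this way can be included by userspace
VMMs as well as by the kernel.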
** Affects: linux-nvidia-6.17 (Ubuntu)
Importance: Undecided
Status: New
--
https://bugs.launchpad.net/bugs/2152222
Title:
CXL VFIO: Add CXL Type-2 device passthrough support