Public bug reported:

This patch series adds VFIO CXL Type-2 device passthrough support to the
nvidia-6.17 kernel, enabling CXL-capable accelerator devices to be
assigned to virtual machines via VFIO. It includes:

1. VFIO get_region_info refactoring - Upstream series that splits 
VFIO_DEVICE_GET_REGION_INFO into its own driver op and introduces 
get_region_info_caps, which is a prerequisite for the CXL VFIO region 
implementation
    
2. VFIO CXL Type-2 passthrough - Manish Honap's series adding CXL awareness to 
vfio-pci-core, including HDM decoder register emulation, DPA region mapping 
with demand-fault mmap, CXL DVSEC config virtualization, and CXL region 
management

3. VFIO CXL guest-initiated reset - Manish Honap's RFC-v2 series
enabling guest-initiated CXL protocol reset with HDM decoder base
address preservation and DVSEC STATUS2 virtualization

Key Features Added:

- CXL Type-2 device detection and initialization within vfio-pci-core
- HDM decoder register emulation framework for guest access
- DPA (Device Physical Address) VFIO region with demand-fault mmap and reset zap
- CXL DVSEC configuration space write virtualization
- CXL component BAR sparse mmap advertisement to userspace
- Guest-initiated CXL protocol reset (cxl_dev_reset)
- HDM decoder base address preservation across reset
- DVSEC STATUS2 register virtualization in vconfig shadow
- Module parameter disable_cxl for per-device opt-out
- UAPI header (include/uapi/cxl/cxl_regs.h) for CXL register defines

Justification

VFIO CXL passthrough is required for assigning CXL Type-2 accelerator
devices (GPUs, SmartNICs) to virtual machines:

- Enables VM guests to directly access CXL device memory (DPA regions) via VFIO
- Provides proper HDM decoder emulation so guests can manage device memory
- Supports guest-initiated CXL reset for device recovery within VMs
- Virtualizes DVSEC config writes to allow safe guest access to CXL 
configuration
- Required for NVIDIA CXL-capable hardware in virtualized environments

Source
Patch Breakdown (51 patches):
----------------------------------------------------------------------------------
1. 03 patches - Upstream VFIO prerequisites (HiSilicon + nvgrace-gpu fix)
   Upstream torvalds/master (merged)
----------------------------------------------------------------------------------
2. 22 patches - VFIO get_region_info series
   Upstream torvalds/master (merged in v6.19)
----------------------------------------------------------------------------------
3. 19 patches - Manish Honap's VFIO CXL Type-2 series v2 (19/20, selftest skipped)
   LKML (v2, not yet merged)
----------------------------------------------------------------------------------
4. 06 patches - Manish Honap's VFIO CXL reset series RFC-v2
   Internal (RFC-v2, not yet merged)
----------------------------------------------------------------------------------
5. 01 patch  - Config annotations update
   OOT (build config)
----------------------------------------------------------------------------------
TOTAL   51


Notes on upstream prerequisites (item 1):

Three upstream commits cherry-picked:

    4868d2d52df6 — crypto: hisilicon - qm updates BAR configuration
    2131c1517f30 — hisi_acc_vfio_pci: adapt to new migration configuration
    767b1ed8b980 — vfio/nvgrace-gpu: fix grammatical error

The first two resolve a dependency for e238f147d517 ("vfio/hisi: Convert
to the get_region_info op"). The third fixes a pre-existing comment typo in
the nvgrace-gpu driver that would otherwise cause a patch-ID mismatch with
upstream 1b0ecb5baf4a ("vfio/pci: Convert all PCI drivers to
get_region_info_caps").

Notes on the VFIO get_region_info series (item 2):

22 upstream commits from Jason Gunthorpe's series, already merged in
v6.19:

https://lore.kernel.org/all/[email protected]/

These refactor the VFIO region info infrastructure that the CXL VFIO
passthrough series depends on.
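
For orientation, the userspace side of region info is the long-standing
VFIO_DEVICE_GET_REGION_INFO ioctl from <linux/vfio.h>. The following is a
minimal sketch of querying one region, not code from the series. The helper
names (example_query_region, example_region_is_mmapable) are hypothetical,
and the region index is left as a parameter because this report does not
say which index a CXL region would use.

```c
#include <assert.h>
#include <linux/vfio.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Query one region of an already-open VFIO device fd (hypothetical helper).
 * Returns 0 on success, -1 on failure with errno set by ioctl().
 */
static int example_query_region(int device_fd, unsigned int index,
                                struct vfio_region_info *info)
{
    assert(info != NULL);

    /* argsz tells the kernel how much of the structure the caller provides. */
    memset(info, 0, sizeof(*info));
    info->argsz = sizeof(*info);
    info->index = index;

    if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info) < 0)
        return -1;
    return 0;
}

/* Non-zero when the kernel reports that the region may be mmap()ed. */
static int example_region_is_mmapable(const struct vfio_region_info *info)
{
    return (info->flags & VFIO_REGION_INFO_FLAG_MMAP) != 0;
}
```

On success, info.size and info.offset describe the region; a caller would
typically pass them to mmap() on the device fd when the MMAP flag is set.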

Notes on Manish Honap's VFIO CXL series (item 3):

19 out of 20 patches ported from:

https://lore.kernel.org/linux-cxl/[email protected]/

Patch 20/20 (selftests) was skipped as the upstream VFIO selftest
infrastructure (tools/testing/selftests/vfio/) is not present in
the NV-Kernels base.

Conflict resolutions were required for 10 of 19 patches due to the
NV-Kernels base diverging from upstream in two ways:

    - CXL PCI function declarations (cxl_find_regblock,
      cxl_probe_component_regs, cxl_await_range_active,
      cxl_regblock_get_bar_info) are in include/cxl/pci.h
      unconditionally (per Srirangan/Alejandro's series convention),
      rather than in include/cxl/cxl.h with CONFIG_CXL_BUS guards
      as Manish's patches expect.
    - Missing upstream xe driver, dmabuf, and p2pdma support causes
      context mismatches in Kconfig, Makefiles, and VFIO headers.

Notes on Manish Honap's CXL reset series (item 4):

6 patches from internal RFC-v2 posting:

    [RFC-v2 0/6] vfio/cxl: Guest-initiated CXL protocol reset

Patch 1/6 required the same conflict resolution as item 3 (declarations
added to include/cxl/pci.h instead of include/cxl/cxl.h).

Lore Links:

    Jason Gunthorpe's VFIO get_region_info series (v2, merged in v6.19):
    https://lore.kernel.org/all/[email protected]/

    Manish Honap's VFIO CXL Type-2 passthrough series (v2):
    https://lore.kernel.org/linux-cxl/[email protected]/


Testing
Build Validation:

    Built successfully for ARM64 4K page size kernel
    Built successfully for ARM64 64K page size kernel

Config Verification:

CXL VFIO config enabled:

CONFIG_VFIO_CXL_CORE=y

Runtime Testing:

    Boot test on ARM64 system
    CXL Type-2 device enumeration via VFIO
    CXL guest-initiated reset test

Notes

- CONFIG_VFIO_CXL_CORE is a new bool config enabled for both amd64 and
    arm64. It depends on VFIO_PCI_CORE (module), CXL_BUS (built-in), and
    CXL_MEM (built-in). As a bool, it compiles into the vfio-pci-core module.

- This series depends on the CXL infrastructure established in PR #342,
    "[linux-nvidia-6.17-next] Add CXL Type-2 device support, RAS error
    handling, reset, state save/restore, and interleaving support"
    (Alejandro's v23, Srirangan's save/restore and reset series).

- A new UAPI header include/uapi/cxl/cxl_regs.h is introduced for CXL
    component and HDM register defines, using UAPI-safe macros (__GENMASK,
    _BITUL) and raw hex sizes instead of kernel-internal SZ_* macros.

- Patch 20/20 of Manish's series (CXL Type-2 VFIO assignment selftest) was
    intentionally skipped as the upstream VFIO selftest infrastructure is not
    present in the NV-Kernels base.
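
To illustrate the UAPI-safe define style mentioned in the notes above, here
is a small sketch. It is not taken from include/uapi/cxl/cxl_regs.h: every
EX_* name and the register layout are hypothetical, and EX_BITUL/EX_GENMASK
are local stand-ins that only mirror the intent of _BITUL and __GENMASK so
the example builds without kernel headers.

```c
#include <assert.h>

/* Local stand-ins for the UAPI helpers (assumption, not the kernel macros). */
#define EX_BITS_PER_LONG   (8 * sizeof(unsigned long))
#define EX_BITUL(n)        (1UL << (n))
#define EX_GENMASK(h, l) \
    (((~0UL) << (l)) & (~0UL >> (EX_BITS_PER_LONG - 1 - (h))))

/* Hypothetical register layout, for illustration only. */
#define EX_REG_BLOCK_SIZE  0x10000            /* raw hex rather than SZ_64K */
#define EX_CTRL_ENABLE     EX_BITUL(0)        /* single-bit flag */
#define EX_CTRL_MODE_MASK  EX_GENMASK(7, 4)   /* 4-bit field in bits 7:4 */
#define EX_CTRL_MODE_SHIFT 4

/* Extract the hypothetical MODE field from a register value. */
static inline unsigned long ex_ctrl_mode(unsigned long reg)
{
    return (reg & EX_CTRL_MODE_MASK) >> EX_CTRL_MODE_SHIFT;
}

/* Compile-time check that the mask covers exactly bits 7:4. */
static_assert(EX_CTRL_MODE_MASK == 0xF0UL, "mask covers bits 7:4");
```

The point of this style is that a userspace build never needs
kernel-internal helpers such as SZ_* to consume the header.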

** Affects: linux-nvidia-6.17 (Ubuntu)
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/2152222

Title:
  CXL VFIO: Add CXL Type-2 device passthrough support
