On Wed, Feb 12, 2025 at 04:38:00PM +0100, Roger Pau Monne wrote:
> Xen currently prevents dom0 from creating CPU or IOMMU page-table mappings
> into the interrupt address range [0xfee00000, 0xfeefffff]. This range has
> two different purposes. For accesses from the CPU it contains the default
> position of the local APIC page at 0xfee00000. For accesses from devices
> it's the MSI address range, so the address field in the MSI entries
> (usually) points to an address in that range to trigger an interrupt.
>
> There are reports of Lenovo Thinkpad devices placing what seems to be the
> UCSI shared mailbox at address 0xfeec2000 in the interrupt address range.
> Attempting to use that device with a Linux PV dom0 leads to an error when
> the Linux kernel maps 0xfeec2000:
>
> RIP: e030:xen_mc_flush+0x1e8/0x2b0
> xen_leave_lazy_mmu+0x15/0x60
> vmap_range_noflush+0x408/0x6f0
> __ioremap_caller+0x20d/0x350
> acpi_os_map_iomem+0x1a3/0x1c0
> acpi_ex_system_memory_space_handler+0x229/0x3f0
> acpi_ev_address_space_dispatch+0x17e/0x4c0
> acpi_ex_access_region+0x28a/0x510
> acpi_ex_field_datum_io+0x95/0x5c0
> acpi_ex_extract_from_field+0x36b/0x4e0
> acpi_ex_read_data_from_field+0xcb/0x430
> acpi_ex_resolve_node_to_value+0x2e0/0x530
> acpi_ex_resolve_to_value+0x1e7/0x550
> acpi_ds_evaluate_name_path+0x107/0x170
> acpi_ds_exec_end_op+0x392/0x860
> acpi_ps_parse_loop+0x268/0xa30
> acpi_ps_parse_aml+0x221/0x5e0
> acpi_ps_execute_method+0x171/0x3e0
> acpi_ns_evaluate+0x174/0x5d0
> acpi_evaluate_object+0x167/0x440
> acpi_evaluate_dsm+0xb6/0x130
> ucsi_acpi_dsm+0x53/0x80
> ucsi_acpi_read+0x2e/0x60
> ucsi_register+0x24/0xa0
> ucsi_acpi_probe+0x162/0x1e3
> platform_probe+0x48/0x90
> really_probe+0xde/0x340
> __driver_probe_device+0x78/0x110
> driver_probe_device+0x1f/0x90
> __driver_attach+0xd2/0x1c0
> bus_for_each_dev+0x77/0xc0
> bus_add_driver+0x112/0x1f0
> driver_register+0x72/0xd0
> do_one_initcall+0x48/0x300
> do_init_module+0x60/0x220
> __do_sys_init_module+0x17f/0x1b0
> do_syscall_64+0x82/0x170
>
> Remove the restrictions on creating mappings in the interrupt address range
> for dom0. Note that the restriction on mapping the local APIC page is
> enforced separately, and it remains in place.
>
> For PVH dom0 it's important that the restriction is removed from
> arch_iommu_hwdom_init(), as the logic in that function creates mappings in
> both the CPU and the IOMMU page tables for reserved regions in the memory
> map. The expectation is that the page at 0xfeec2000 will be added to the
> host memory map using the EfiACPIMemoryNVS type, so arch_iommu_hwdom_init()
> will create a mapping for it.
>
> Note that even if the interrupt address range entries are populated in the
> IOMMU page-tables, no device access will reach those pages. Device accesses
> to the Interrupt Address Range will always be converted into Interrupt
> Messages and are not subject to DMA remapping.
>
> There's also the following restriction noted in the Intel VT-d specification:
>
> > Software must not program paging-structure entries to remap any address to
> > the interrupt address range. Untranslated requests and translation requests
> > that result in an address in the interrupt range will be blocked with
> > condition code LGN.4 or SGN.8. Translated requests with an address in the
> > interrupt address range are treated as Unsupported Request (UR).
>
> However, this restriction doesn't apply to the identity mappings possibly
> created for dom0, since the interrupt address range is never subject to DMA
> remapping.
>
> Reported-by: Jürgen Groß <[email protected]>
> Link: https://lore.kernel.org/xen-devel/[email protected]/
> Signed-off-by: Roger Pau Monné <[email protected]>
> ---
> xen/arch/x86/dom0_build.c | 4 ----
> xen/drivers/passthrough/x86/iommu.c | 5 -----
> 2 files changed, 9 deletions(-)
>
> diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
> index e8f5bf5447bc..d1b4ef83b2d0 100644
> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -555,10 +555,6 @@ int __init dom0_setup_permissions(struct domain *d)
>          if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
>              rc |= iomem_deny_access(d, mfn, mfn);
>      }
> -    /* MSI range. */
> -    rc |= iomem_deny_access(d, paddr_to_pfn(MSI_ADDR_BASE_LO),
> -                            paddr_to_pfn(MSI_ADDR_BASE_LO +
> -                                         MSI_ADDR_DEST_ID_MASK));
>      /* HyperTransport range. */
>      if ( boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) )
>      {
> diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
> index 8b1e0596b84a..ec17701c90dd 100644
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -475,11 +475,6 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
>      if ( rc )
>          panic("IOMMU failed to remove Xen ranges: %d\n", rc);
>
> -    /* Remove any overlap with the Interrupt Address Range. */
> -    rc = rangeset_remove_range(map, 0xfee00, 0xfeeff);
> -    if ( rc )
> -        panic("IOMMU failed to remove Interrupt Address Range: %d\n", rc);
> -
This last chunk is not correct: if the interrupt address range is no longer
removed, the local APIC page needs to be filtered out explicitly, so this
should instead be:
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 8b1e0596b84a..c53626dfc69d 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -475,10 +475,11 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
     if ( rc )
         panic("IOMMU failed to remove Xen ranges: %d\n", rc);
 
-    /* Remove any overlap with the Interrupt Address Range. */
-    rc = rangeset_remove_range(map, 0xfee00, 0xfeeff);
+    /* Remove any overlap with the local APIC page. */
+    rc = rangeset_remove_range(map, paddr_to_pfn(mp_lapic_addr),
+                               paddr_to_pfn(mp_lapic_addr));
     if ( rc )
-        panic("IOMMU failed to remove Interrupt Address Range: %d\n", rc);
+        panic("IOMMU failed to remove local APIC page: %d\n", rc);
 
     /* If emulating IO-APIC(s) make sure the base address is unmapped. */
     if ( has_vioapic(d) )