On 2025-09-09 4:45 pm, Shyam Saini wrote:
Individual PCI devices lack dedicated device tree nodes, but
their IOMMU configuration (including reserved IOVA regions) is often
defined at the PCI host controller level in the device tree. The
"iommu-addresses" property in reserved-memory nodes specifies IOVA
ranges that should be reserved for specific devices.
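For reference, here is a minimal sketch of how an "iommu-addresses" value breaks down: each entry is a device phandle followed by an IOVA base and a length. The sketch makes two simplifying assumptions. First, the cells are already converted to host byte order. Second, every entry uses the same address/size cell counts, which are passed in here as addr_cells/size_cells. In a real DT the counts depend on the node being referenced, so they can differ from entry to entry. The struct and function names are hypothetical, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded entry: target device phandle plus IOVA range. */
struct iova_resv_entry {
	uint32_t phandle; /* phandle of the device the range applies to */
	uint64_t iova;    /* start of the reserved IOVA range */
	uint64_t length;  /* size of the range in bytes */
};

/* Fold 1 or 2 32-bit cells (most significant first) into one value. */
static uint64_t read_cells(const uint32_t *cells, int count)
{
	uint64_t v = 0;

	assert(count >= 1 && count <= 2);
	for (int i = 0; i < count; i++)
		v = (v << 32) | cells[i];
	return v;
}

/*
 * Decode a flattened iommu-addresses value (host byte order, uniform cell
 * counts assumed) into out[]. Returns the number of entries, or -1 if the
 * cells don't form whole entries or out[] is too small.
 */
static int decode_iommu_addresses(const uint32_t *cells, size_t ncells,
				  int addr_cells, int size_cells,
				  struct iova_resv_entry *out, size_t max_out)
{
	size_t stride = 1 + (size_t)addr_cells + (size_t)size_cells;
	size_t n = 0;

	if (ncells % stride != 0)
		return -1;
	for (size_t i = 0; i < ncells; i += stride) {
		if (n == max_out)
			return -1;
		out[n].phandle = cells[i];
		out[n].iova = read_cells(&cells[i + 1], addr_cells);
		out[n].length = read_cells(&cells[i + 1 + addr_cells], size_cells);
		n++;
	}
	return (int)n;
}
```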

Currently, PCI devices cannot access these configurations because their
dev->of_node is NULL, preventing of_iommu_get_resv_regions() from
discovering reserved regions defined in the device tree.
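To make the NULL of_node failure concrete, here is a simplified model. It assumes, based on our reading of the current of_iommu_get_resv_regions(), that an iommu-addresses entry only applies when its phandle resolves to dev->of_node itself. The real code first walks the "memory-region" phandles of the device's own node. This model flattens that into a single entry list, and all the types here are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DT node; only its identity matters here. */
struct dt_node {
	const char *name;
};

/* Hypothetical iommu-addresses entry: resolved target node plus range. */
struct iommu_addr {
	const struct dt_node *target;
	uint64_t iova;
	uint64_t length;
};

/* Hypothetical device: of_node is NULL for a typical PCI endpoint. */
struct fake_device {
	const struct dt_node *of_node;
};

/*
 * Count the entries that would be reserved for dev. Only entries whose
 * target is dev's own node match, so a NULL of_node matches nothing.
 */
static size_t count_resv_regions(const struct fake_device *dev,
				 const struct iommu_addr *entries, size_t n)
{
	size_t matched = 0;

	assert(entries != NULL || n == 0);
	if (!dev->of_node)
		return 0; /* no node, so no memory-region to walk at all */
	for (size_t i = 0; i < n; i++)
		if (entries[i].target == dev->of_node)
			matched++;
	return matched;
}
```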

There are at least 3 ways to reserve iommu-addresses for individual PCI
devices:
  - 1) By dynamically adding DTS nodes for individual PCI devices using
    [2] CONFIG_PCI_DYNAMIC_OF_NODES; this requires hardcoding PCI device
    IDs in DECLARE_PCI_FIXUP_FINAL.

  - 2) By adding PCI device nodes either in the DTS or by modifying the
    FDT at boot time in the firmware, e.g. [3]. However, the of_iommu
    driver doesn't seem to handle individual PCI devices, and this
    approach doesn't seem very scalable for complex PCI hierarchies.

  - 3) By configuring the PCI host controller DTS node so that PCI
    devices can inherit the iommu-addresses defined in that parent node.

This commit addresses the problem using approach 3) by assigning the
PCI host controller's device tree node to PCI devices during IOMMU
configuration, enabling them to inherit the host controller's device
tree properties. This allows PCI devices to properly discover and
reserve IOVA regions specified in the device tree.

Signed-off-by: Shyam Saini <[email protected]>
---
  drivers/iommu/of_iommu.c | 11 +++++++++++
  1 file changed, 11 insertions(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 6b989a62def20..077482917e3e8 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -145,6 +145,17 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
                err = pci_for_each_dma_alias(to_pci_dev(dev),
                                             of_pci_iommu_init, &info);
                of_pci_check_device_ats(dev, master_np);
+
+               /*
+                * For PCI devices, ensure the device's of_node points to the
+                * PCI host controller's device tree node so that reserved regions
+                * and other DT-specific IOMMU configuration can be found.
+                * PCI devices typically don't have individual DT nodes, but
+                * their configuration (including reserved regions) is defined
+                * at the PCI host controller level.
+                */
+               if (!err && master_np && !dev->of_node)
+                       dev->of_node = of_node_get(master_np);

This is just wrong. Disregarding the fiddly aspects of node reuse that are completely ignored here, an endpoint device is not the host bridge/root complex device, so it is wildly inappropriate to associate one with the other's DT node and all its properties, resources, etc.
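Below is a small model of what the proposed hunk does when it runs for two endpoints behind the same host bridge. Both end up pointing at the bridge's node, so nothing DT-based can tell them apart any more. Every bridge property, including any iommu-addresses entry aimed at the bridge, then applies to all of them, and each endpoint takes another reference on the same node. The refcounted node type, the device names and the node name are hypothetical. Only the condition in model_patch_assign() is copied from the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted node, standing in for struct device_node. */
struct model_node {
	const char *name;
	int refcount;
};

/* Hypothetical device with an optional DT node. */
struct model_dev {
	const char *name;
	struct model_node *of_node;
};

/* Take a reference, in the spirit of of_node_get(). */
static struct model_node *model_node_get(struct model_node *np)
{
	if (np)
		np->refcount++;
	return np;
}

/* Same condition as the proposed hunk: borrow master_np if no node yet. */
static void model_patch_assign(struct model_dev *dev,
			       struct model_node *master_np, int err)
{
	if (!err && master_np && !dev->of_node)
		dev->of_node = model_node_get(master_np);
}

/* True when two distinct devices now share one DT identity. */
static bool share_dt_identity(const struct model_dev *a,
			      const struct model_dev *b)
{
	assert(a != b);
	return a->of_node != NULL && a->of_node == b->of_node;
}
```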

If it truly is the case that boot firmware has somehow "reserved" some small amount of *IOVA* address space for specific endpoints (but without any endpoint or SMMU configuration, given that those both get reset by VFIO?), then frankly it *should* populate the PCI hierarchy in DT so it can accurately and truthfully describe what it has done.
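When firmware does populate the PCI hierarchy in DT, each endpoint node is identified by a "reg" entry in PCI config space. Here is a sketch of that encoding. It assumes the standard PCI bus binding: 3 address cells and 2 size cells on the host bridge, with phys.hi laid out as npt000ss bbbbbbbb dddddfff rrrrrrrr. In a child's config-space entry, only the bus, device and function fields are set. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: phys.hi for a config-space reg entry.
 * Layout npt000ss bbbbbbbb dddddfff rrrrrrrr, with n/p/t/ss and the
 * register number all zero.
 */
static uint32_t pci_cfg_phys_hi(uint8_t bus, uint8_t dev, uint8_t fn)
{
	assert(dev < 32 && fn < 8);
	return ((uint32_t)bus << 16) | ((uint32_t)dev << 11) |
	       ((uint32_t)fn << 8);
}

/* Hypothetical helper: fill the 5-cell reg (3 address + 2 size cells). */
static void pci_child_reg(uint8_t bus, uint8_t dev, uint8_t fn,
			  uint32_t reg[5])
{
	reg[0] = pci_cfg_phys_hi(bus, dev, fn);
	reg[1] = reg[2] = 0; /* phys.mid, phys.lo */
	reg[3] = reg[4] = 0; /* size */
}
```

For example, a device at 01:00.0 gets reg = <0x10000 0 0 0 0>;, and an iommu-addresses entry can then reference that node by phandle.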

On the other hand, if, as I suspect, it is simply the case that the host bridge has limited windows into system *physical* address space, like plenty of other systems do, then, just like on those other systems, that should be described as standard "dma-ranges" instead of trying to wave silly hacks about.
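For the "dma-ranges" case, here is a minimal sketch of the translation each inbound window implies. A CPU physical range is DMA-able only if it fits inside one window, and the device-visible address is the window's bus address plus the offset into that window. The window struct is hypothetical and assumed to be already decoded. On a PCI host bridge, the child address in dma-ranges is 3 cells with the phys.hi flags in front, and this sketch ignores that. Our understanding is that dma-iommu also uses these inbound windows to reserve the IOVA gaps between them (see iova_reserve_pci_windows()). That is the usual way such host bridge limits get excluded from the IOVA space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded dma-ranges entry. */
struct dma_window {
	uint64_t bus_addr; /* child (device-visible) address */
	uint64_t cpu_addr; /* parent (CPU physical) address */
	uint64_t size;
};

/*
 * Translate [phys, phys + len) into a device-visible DMA address.
 * Returns false unless the whole range fits inside a single window.
 * The comparisons are arranged to avoid 64-bit overflow.
 */
static bool phys_to_dma(const struct dma_window *w, size_t n,
			uint64_t phys, uint64_t len, uint64_t *dma)
{
	assert(len > 0);
	for (size_t i = 0; i < n; i++) {
		if (phys >= w[i].cpu_addr &&
		    phys - w[i].cpu_addr < w[i].size &&
		    len <= w[i].size - (phys - w[i].cpu_addr)) {
			*dma = w[i].bus_addr + (phys - w[i].cpu_addr);
			return true;
		}
	}
	return false;
}
```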

Thanks,
Robin.

        } else {
                err = of_iommu_configure_device(master_np, dev, id);
        }

