On 5/12/23 03:25, Jan Beulich wrote:
> On 11.05.2023 21:16, Stewart Hildebrand wrote:
>> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>              pdev->domain = NULL;
>>              goto out;
>>          }
>> +#ifdef CONFIG_HAS_DEVICE_TREE
>> +        ret = iommu_add_dt_pci_device(pdev);
>> +        if ( ret < 0 )
>> +        {
>> +            printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
>> +            goto out;
>> +        }
>> +#endif
>>          ret = iommu_add_device(pdev);
> 
> Hmm, am I misremembering that in the earlier patch you had #else to
> invoke the alternative behavior?

You are remembering correctly. v1 had an #else, v2 does not.

> Now you end up calling both functions;
> if that's indeed intended,

Yes, this is intentional.

> this may still want doing differently.
> Looking at the earlier patch introducing the function, I can't infer
> though whether that's intended: iommu_add_dt_pci_device() checks that
> the add_device hook is present, but then I didn't find any use of this
> hook. The revlog there suggests the check might be stale.

Ah, right, the ops->add_device check is stale in the other patch. Good catch, 
I'll remove it there.

> If indeed the function does only preparatory work, I don't see why it
> would need naming "iommu_..."; I'd rather consider pci_add_dt_device()
> then.

The function has now been reduced to reading SMMU configuration data from DT 
and mapping RID/BDF -> AXI stream ID. However, it is still SMMU related, and it 
is still invoking another iommu_ops hook function, dt_xlate (which is yet 
another AXI stream ID translation, separate from what is being discussed here). 
Does this justify keeping "iommu_..." in the name? I'm not convinced 
pci_add_dt_device() is a good name for it either (more on this below).

> Plus in such a case #ifdef-ary here probably wants avoiding by
> introducing a suitable no-op stub for the !HAS_DEVICE_TREE case. Then
> ...
> 
>>          if ( ret )
>>          {
>> +#ifdef CONFIG_HAS_DEVICE_TREE
>> +            iommu_fwspec_free(pci_to_dev(pdev));
>> +#endif
> 
> ... this (which I understand is doing the corresponding cleanup) then
> also wants wrapping in a suitably named tiny helper function.

Sure, I'm on board with eliminating/reducing the #ifdef-ary where possible. 
Will do.

> But yet further I'm then no longer convinced this is the right place
> for the addition. pci_add_device() is backing physdev hypercalls. It
> would seem to me that the function may want invoking yet one layer
> further up, or it may even want invoking from a brand new DT-specific
> physdev-op. This would then leave at least the x86-only paths (invoking
> pci_add_device() from outside of pci_physdev_op()) entirely alone.

Let's establish that pci_add_device()/iommu_add_device() are already inherently 
performing tasks related to setting up a PCI device to work with an IOMMU.

The preparatory work in question needs to happen after:

  pci_add_device()
    -> alloc_pdev()

since we need to know all the possible RIDs (including those for phantom 
functions), but before the add_device iommu hook:

  pci_add_device()
    -> iommu_add_device()
      -> iommu_call(hd->platform_ops, add_device, ...)


The preparatory work (i.e. mapping RID/BDF -> AXI stream ID) is inherently 
associated with setting up a PCI device to work with an ARM SMMU (but not any 
particular variant of the SMMU). The SMMU distinguishes what PCI 
device/function DMA traffic is associated with based on the derived AXI stream 
ID (sideband data), not the RID/BDF directly. See [1].

Moving the preparatory work one layer up would mean duplicating what 
alloc_pdev() is already doing to set up pdev->phantom_stride (which we need to 
figure out all RIDs for that particular device). Moving it down into the 
individual SMMU drivers (smmu_ops/platform_ops) would mean duplicating special 
phantom function handling in each SMMU driver, and further deviating from the 
Linux SMMU driver(s) they are based on.

It still feels to me like pci_add_device() (or iommu_add_device()) is the right 
place to perform the RID/BDF -> AXI stream ID mapping.

Since there's nothing inherently DT specific (or ACPI specific) about deriving 
sideband data from RID/BDF, let me propose a new name for the function (instead 
of iommu_add_dt_pci_device):

  iommu_derive_pci_device_sideband_IDs()


Now, as far as DT and ACPI co-existing goes, I admit I haven't tested with 
CONFIG_ACPI=y yet (there seems to be some issues when both CONFIG_ARM_SMMU_V3=y 
and CONFIG_ACPI=y are enabled, even in staging). But I do recognize that we 
need a way support both CONFIG_HAS_DEVICE_TREE=y and CONFIG_ACPI=y 
simultaneously. Let me think on that for a bit...

[1] 
https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt

Reply via email to