On 2025-09-23 5:19 pm, Jason Gunthorpe wrote:
On Tue, Sep 23, 2025 at 08:56:47AM -0700, Shyam Saini wrote:
Hi Jason, Will,
On 19 Sep 2025 09:08, Jason Gunthorpe wrote:
On Fri, Sep 19, 2025 at 08:33:23AM +0100, Will Deacon wrote:
pieces and will need to work on the userspace side. It's not like
MSI_IOVA2 is magically going to work (and I bet it won't be tested).
It could: if someone checks the default memory map, a second constant
could be selected that works.
Nicolin has some patches on the iommufd side to let userspace select
the MSI address instead, but they are not done yet.
Maybe we should just wait for that? Carrying a temporary hack with ABI
implications to support broken hardware isn't particularly compelling
to me.
This patch would still be needed for kernel users.
Arguably the kernel users should just be using the iova allocator from
dma-iommu.c. This whole hard-coded constant/sneaky uAPI is just a hack
to make vfio work..
So maybe if the single constant doesn't work we could set some
indication that the caller must allocate the MSI iova, the kernel can
use the dma-iommu allocator and VFIO can just refuse to use the device
for now.
So, are we settling on having two predefined MSI IOVA base constants,
and if both of those conflict with reserved regions on a given platform,
falling back to dynamic allocation via the IOVA allocator? Just checking
if that's the consensus we're reaching.
I think Will is arguing against introducing a new constant..
Yesterday I was looking at the SW_MSI code again.. What specific
problem is it you have?
It looks to me like dma-iommu.c is already allocating MSI addresses
using its built-in IOVA allocator. So if your DT is marking that space
reserved then it should Just Work right now as dma-iommu.c already
processes the reserved ranges and will allocate MSI addresses around
them?
The base value of the SW_MSI is only used by VFIO - are you trying to
use VFIO with this device, or have I misunderstood the dma-iommu.c
logic?
Indeed the sole user of the entire
IOMMU_RESV_SW_MSI/iommu_dma_get_msi_cookie() mechanism is that one place
in vfio_iommu_type1. iommu-dma itself treats MSIs just like any other
DMA mapping, so if address space limitations are not correctly described
then any breakage will be to DMA in general.
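To illustrate what "allocating around the reserved ranges" means in the exchange above, here is a minimal userspace sketch in C. It is not the kernel's IOVA allocator: `struct resv_range`, `overlaps_reserved()` and `alloc_window()` are hypothetical names invented for this example, and the top-down linear scan is an assumed simplification. It only shows that a fixed base can collide with a platform's reserved range while a search that honours the reserved list still finds a usable window.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one reserved region (not a kernel structure). */
struct resv_range {
    uint64_t start; /* first reserved address, inclusive */
    uint64_t end;   /* last reserved address, inclusive */
};

/* Return true if [base, base + len - 1] intersects any reserved range. */
static bool overlaps_reserved(uint64_t base, uint64_t len,
                              const struct resv_range *resv, size_t n)
{
    uint64_t last = base + len - 1;

    for (size_t i = 0; i < n; i++) {
        if (base <= resv[i].end && last >= resv[i].start)
            return true;
    }
    return false;
}

/*
 * Search downwards from 'limit' (exclusive) for a window of 'len' bytes,
 * aligned to 'len', that avoids every reserved range. 'len' must be a
 * power of two. Returns UINT64_MAX if no such window exists.
 */
static uint64_t alloc_window(uint64_t limit, uint64_t len,
                             const struct resv_range *resv, size_t n)
{
    if (len == 0 || len > limit)
        return UINT64_MAX;

    /* Highest len-aligned base whose window still ends below 'limit'. */
    uint64_t base = (limit - len) & ~(len - 1);

    for (;;) {
        if (!overlaps_reserved(base, len, resv, n))
            return base;
        if (base < len)
            return UINT64_MAX; /* reached address zero without a fit */
        base -= len;
    }
}
```

With a reserved list that covers 0x08000000-0x080fffff, a fixed window at 0x08000000 is rejected by `overlaps_reserved()`, whereas `alloc_window()` simply returns some other free window below the limit. That is the difference between a hard-coded base and an allocator that is told about the reserved ranges.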
If it is only VFIO at issue then perhaps we should solve this by
completing the work Nicolin started to allow VFIO userspace to specify
the MSI Aperture?
+1 to that - the arbitrary fake MSI reserved region was only ever meant
to be a first step to get existing VMMs working with bare minimal
"squint and pretend it's like x86" changes; MSI_IOVA_BASE was literally
just picked to fit the standard QEMU virt machine memory map nicely. It
was always intended that we'd eventually have more
Arm-system-architecture-aware VMMs that would understand it's just a
notional hole that needs punching in VFIO address space _somewhere_, and
we'd figure out some interface for negotiating it. There has also always
been at least one platform where this MSI_IOVA_BASE knowingly could
never work, but that one (Arm Juno) also has sufficient other
impediments to realistic VFIO usage (I've had it working, but it's
definitely no more than a novelty) that it was never going to justify
any upstream investment itself.
If we do now have a "serious" VFIO-capable system where the basic bodge
no longer suffices, that surely does justify it finally being time to do
the right thing.
Thanks,
Robin.