Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-10-09 Thread Jason Gunthorpe
On Wed, Oct 09, 2024 at 03:20:57PM +0800, Yi Liu wrote:
> On 2024/10/1 05:59, Nicolin Chen wrote:
> > On Sun, Sep 29, 2024 at 03:16:55PM +0800, Yi Liu wrote:
> > > > > > I feel these two might act somehow similarly to the two DIDs
> > > > > > during nested translations?
> > > > > 
> > > > > not quite the same. Is it possible that the ASID is the same for 
> > > > > stage-1?
> > > > > Intel VT-d side can have the PASID be the same. Like the gIOVA, all
> > > > > devices use the same ridpasid. Like the scenario I replied to 
> > > > > Baolu[1],
> > > > > so we chose to use different DIDs to differentiate the caches for the
> > > > > two devices.
> > > > 
> > > > On ARM, each S1 domain (either a normal stage-1 PASID=0 domain or
> > > > an SVA PASID>0 domain) has a unique ASID.
> > > 
> > > I see. Looks like ASID is not the PASID.
> > 
> > It's not. PASID is called Substream ID in SMMU terms. It's used to
> > index the PASID table. For cache invalidations, a PASID (ssid) is
> > for ATC (dev cache) or PASID table entry invalidation only.
> 
> sure. Is there any relationship between PASID and ASID? Per the below
> link, ASID is used to tag the TLB entries of an application. So it's
> used in the SVA case, right?

Unlike Intel and AMD, the IOTLB tag is entirely controlled by
software. So the HW will look up the PASID and retrieve an ASID, then
use that as a cache tag.

Intel and AMD will use the PASID as the cache tag.

As we've talked about several times, using the PASID directly as a
cache tag robs the SW of optimization possibilities in some cases.

The extra ASID indirection allows the SW to always tag the same page
table top pointer with the same ASID regardless of what PASID it is
assigned to and guarantee IOTLB sharing.

Jason



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-10-09 Thread Yi Liu

On 2024/10/1 05:59, Nicolin Chen wrote:

On Sun, Sep 29, 2024 at 03:16:55PM +0800, Yi Liu wrote:

I feel these two might act somehow similarly to the two DIDs
during nested translations?


not quite the same. Is it possible that the ASID is the same for stage-1?
Intel VT-d side can have the PASID be the same. Like the gIOVA, all
devices use the same ridpasid. Like the scenario I replied to Baolu[1],
so we chose to use different DIDs to differentiate the caches for the
two devices.


On ARM, each S1 domain (either a normal stage-1 PASID=0 domain or
an SVA PASID>0 domain) has a unique ASID.


I see. Looks like ASID is not the PASID.


It's not. PASID is called Substream ID in SMMU terms. It's used to
index the PASID table. For cache invalidations, a PASID (ssid) is
for ATC (dev cache) or PASID table entry invalidation only.


sure. Is there any relationship between PASID and ASID? Per the below
link, ASID is used to tag the TLB entries of an application. So it's
used in the SVA case, right?

https://developer.arm.com/documentation/102142/0100/Stage-2-translation



So it is unlikely to have the
situation of two identical ASIDs if they are on the same vIOMMU,
because the ASID pool is per IOMMU instance (whether p or v).

With two vIOMMU instances, there might be the same ASIDs but they
will be tagged with different VMIDs.


[1]
https://lore.kernel.org/linux-iommu/4bc9bd20-5aae-440d-84fd-f530d0747...@intel.com/


Is "gIOVA" a type of invalidation that only uses "address" out of
"PASID, DID and address"? I.e. PASID and DID are not provided via
the invalidation request, so it's going to broadcast all viommus?


gIOVA is just a term used in contrast to vSVA, to differentiate the two. :)
PASID and DID are still provided in the invalidation.


I am still not getting this gIOVA. What does it do exactly vs. vSVA?
And should RIDPASID be IOMMU_NO_PASID?

gIOVA is the IOVA in the guest; vSVA is just SVA in the guest. Maybe the
confusion comes from why we don't use vIOVA instead of gIOVA, is it? I think
you are clear about IOVA vs. SVA. :)

yes, RIDPASID is IOMMU_NO_PASID, although the VT-d arch allows it to be a
non-IOMMU_NO_PASID value.

--
Regards,
Yi Liu



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-10-01 Thread Nicolin Chen
On Tue, Oct 01, 2024 at 10:48:15AM -0300, Jason Gunthorpe wrote:
> On Sun, Sep 29, 2024 at 03:19:42PM +0800, Yi Liu wrote:
> > > So their viommu HW concepts come along with a requirement that there
> > > be a fixed translation for the VM, which we model by attaching a S2
> > > HWPT to the VIOMMU object which gets linked into the IOMMU HW as
> > > the translation for the queue memory.
> > 
> > Is the mapping of the S2 static? Or can it be unmapped by userspace?
> 
> In principle it should be dynamic, but I think the vCMDQ stuff will
> struggle to do that

Yea. vCMDQ HW requires setting the physical base address of a
queue in the VM's RAM space. If the S2 mapping changes (resulting
in a different queue location in the physical memory), the VMM
should notify the kernel for a HW reconfiguration.

I wonder what all the use cases are that can cause a shifting
of S2 mappings? VM migration? Any others?

Thanks
Nicolin



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-10-01 Thread Jason Gunthorpe
On Tue, Oct 01, 2024 at 03:06:57PM +1000, Alexey Kardashevskiy wrote:
> I've just read in this thread that "it should be generally restricted to the
> number of pIOMMUs, although likely (not 100% sure) we could do multiple
> vIOMMUs on a single-pIOMMU system. Any reason for doing that?" and thought "we
> have every reason to do that, unless p means something different", so I
> decided to ask :) Thanks,

I think that was intended as "multiple vIOMMUs per pIOMMU within a
single VM".

There would always be multiple vIOMMUs per pIOMMU across VMs/etc.

Jason



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-10-01 Thread Jason Gunthorpe
On Sun, Sep 29, 2024 at 03:19:42PM +0800, Yi Liu wrote:
> > So their viommu HW concepts come along with a requirement that there
> > be a fixed translation for the VM, which we model by attaching a S2
> > HWPT to the VIOMMU object which gets linked into the IOMMU HW as
> > the translation for the queue memory.
> 
> Is the mapping of the S2 static? Or can it be unmapped by userspace?

In principle it should be dynamic, but I think the vCMDQ stuff will
struggle to do that

Jason
 



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-30 Thread Alexey Kardashevskiy




On 1/10/24 13:36, Nicolin Chen wrote:

On Tue, Oct 01, 2024 at 11:55:59AM +1000, Alexey Kardashevskiy wrote:

On 11/9/24 17:08, Nicolin Chen wrote:

On Wed, Sep 11, 2024 at 06:12:21AM +, Tian, Kevin wrote:

From: Nicolin Chen 
Sent: Wednesday, August 28, 2024 1:00 AM


[...]

On a multi-IOMMU system, the VIOMMU object can be instanced to the
number of vIOMMUs in a guest VM, while holding the same parent HWPT
to share the


Is there a restriction that multiple vIOMMU objects can only be created
on a multi-IOMMU system?


I think it should be generally restricted to the number of pIOMMUs,
although likely (not 100% sure) we could do multiple vIOMMUs on a
single-pIOMMU system. Any reason for doing that?



Just to clarify the terminology here - what are pIOMMU and vIOMMU exactly?

On AMD, IOMMU is a pretend-pcie device, one per root port, that manages a DT
- device table, one entry per BDFn, the entry owns a queue. A slice of
that can be passed to a VM (== queues mapped directly to the VM, and
such IOMMU appears in the VM as a pretend-pcie device too). So what is
[pv]IOMMU here? Thanks,
  
The "p" stands for physical: the entire IOMMU unit/instance. In
the IOMMU subsystem terminology, it's a struct iommu_device. It
sounds like AMD would register one iommu device per root port?


Yup, my test machine has 4 of these.



The "v" stands for virtual: a slice of the pIOMMU that could be
shared or passed through to a VM:
  - Intel IOMMU doesn't have passthrough queues, so it uses a
shared queue (for invalidation). In this case, vIOMMU will
be a pure SW structure for HW queue sharing (with the host
machine and other VMs). That said, I think the channel (or
the port) that Intel VT-d uses internally for a device to
do a two-stage translation can be seen as a "passthrough"
feature, held by a vIOMMU.
  - AMD IOMMU can assign passthrough queues to VMs, in which
case, vIOMMU will be a structure holding all passthrough
resources (of the pIOMMU) assigned to a VM. If there is a
shared resource, it can be packed into the vIOMMU struct
too. FYI, vQUEUE (future series) on the other hand will
represent each passthrough queue in a vIOMMU struct. The
VM then, per that specific pIOMMU (rootport?), will have
one vIOMMU holding a number of vQUEUEs.
  - ARM SMMU is sort of in the middle, depending on the impls.
vIOMMU will be a structure holding both passthrough and
shared resource. It can define vQUEUEs, if the impl has
passthrough queues like AMD does.

Allowing a vIOMMU to hold shared resource makes it a bit of an
upgraded model for IOMMU virtualization, from the existing HWPT
model that now looks like a subset of the vIOMMU model.


Thanks for confirming.

I've just read in this thread that "it should be generally restricted to 
the number of pIOMMUs, although likely (not 100% sure) we could do 
multiple vIOMMUs on a single-pIOMMU system. Any reason for doing that?" and
thought "we have every reason to do that, unless p means something 
different", so I decided to ask :) Thanks,





Thanks
Nicolin


--
Alexey




Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-30 Thread Nicolin Chen
On Tue, Oct 01, 2024 at 11:55:59AM +1000, Alexey Kardashevskiy wrote:
> On 11/9/24 17:08, Nicolin Chen wrote:
> > On Wed, Sep 11, 2024 at 06:12:21AM +, Tian, Kevin wrote:
> > > > From: Nicolin Chen 
> > > > Sent: Wednesday, August 28, 2024 1:00 AM
> > > > 
> > > [...]
> > > > On a multi-IOMMU system, the VIOMMU object can be instanced to the
> > > > number of vIOMMUs in a guest VM, while holding the same parent HWPT
> > > > to share the
> > > 
> > > Is there a restriction that multiple vIOMMU objects can only be created
> > > on a multi-IOMMU system?
> > 
> > I think it should be generally restricted to the number of pIOMMUs,
> > although likely (not 100% sure) we could do multiple vIOMMUs on a
> > single-pIOMMU system. Any reason for doing that?
> 
> 
> Just to clarify the terminology here - what are pIOMMU and vIOMMU exactly?
> 
> On AMD, IOMMU is a pretend-pcie device, one per root port, that manages a DT
> - device table, one entry per BDFn, the entry owns a queue. A slice of
> that can be passed to a VM (== queues mapped directly to the VM, and
> such IOMMU appears in the VM as a pretend-pcie device too). So what is
> [pv]IOMMU here? Thanks,
 
The "p" stands for physical: the entire IOMMU unit/instance. In
the IOMMU subsystem terminology, it's a struct iommu_device. It
sounds like AMD would register one iommu device per root port?

The "v" stands for virtual: a slice of the pIOMMU that could be
shared or passed through to a VM:
 - Intel IOMMU doesn't have passthrough queues, so it uses a
   shared queue (for invalidation). In this case, vIOMMU will
   be a pure SW structure for HW queue sharing (with the host
   machine and other VMs). That said, I think the channel (or
   the port) that Intel VT-d uses internally for a device to
   do a two-stage translation can be seen as a "passthrough"
   feature, held by a vIOMMU.
 - AMD IOMMU can assign passthrough queues to VMs, in which
   case, vIOMMU will be a structure holding all passthrough
    resources (of the pIOMMU) assigned to a VM. If there is a
   shared resource, it can be packed into the vIOMMU struct
   too. FYI, vQUEUE (future series) on the other hand will
   represent each passthrough queue in a vIOMMU struct. The
   VM then, per that specific pIOMMU (rootport?), will have
   one vIOMMU holding a number of vQUEUEs.
 - ARM SMMU is sort of in the middle, depending on the impls.
   vIOMMU will be a structure holding both passthrough and
   shared resource. It can define vQUEUEs, if the impl has
   passthrough queues like AMD does.

Allowing a vIOMMU to hold shared resource makes it a bit of an
upgraded model for IOMMU virtualization, from the existing HWPT
model that now looks like a subset of the vIOMMU model. 

Thanks
Nicolin



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-30 Thread Alexey Kardashevskiy

On 11/9/24 17:08, Nicolin Chen wrote:

On Wed, Sep 11, 2024 at 06:12:21AM +, Tian, Kevin wrote:

From: Nicolin Chen 
Sent: Wednesday, August 28, 2024 1:00 AM


[...]

On a multi-IOMMU system, the VIOMMU object can be instanced to the
number of vIOMMUs in a guest VM, while holding the same parent HWPT
to share the


Is there a restriction that multiple vIOMMU objects can only be created
on a multi-IOMMU system?


I think it should be generally restricted to the number of pIOMMUs,
although likely (not 100% sure) we could do multiple vIOMMUs on a
single-pIOMMU system. Any reason for doing that?



Just to clarify the terminology here - what are pIOMMU and vIOMMU exactly?

On AMD, IOMMU is a pretend-pcie device, one per root port, that manages a DT
- device table, one entry per BDFn, the entry owns a queue. A slice of 
that can be passed to a VM (== queues mapped directly to the VM, and 
such IOMMU appears in the VM as a pretend-pcie device too). So what is 
[pv]IOMMU here? Thanks,






stage-2 IO pagetable. Each VIOMMU then just needs to allocate its own
VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:


this reads like 'VMID' is a virtual ID allocated by vIOMMU. But from the
entire context it actually means the physical 'VMID' allocated on the
associated physical IOMMU, correct?


Quoting Jason's narrative, a VMID is a "Security namespace for
guest owned ID". The allocation, using SMMU as an example, should
be a part of vIOMMU instance allocation in the host SMMU driver.
Then, this VMID will be used to mark the cache tags. So, it is
still a software allocated ID, while HW would use it too.

Thanks
Nicolin


--
Alexey




Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-30 Thread Nicolin Chen
On Sun, Sep 29, 2024 at 03:16:55PM +0800, Yi Liu wrote:
> > > > I feel these two might act somehow similarly to the two DIDs
> > > > during nested translations?
> > > 
> > > not quite the same. Is it possible that the ASID is the same for stage-1?
> > > Intel VT-d side can have the PASID be the same. Like the gIOVA, all
> > > devices use the same ridpasid. Like the scenario I replied to Baolu[1],
> > > so we chose to use different DIDs to differentiate the caches for the
> > > two devices.
> > 
> > On ARM, each S1 domain (either a normal stage-1 PASID=0 domain or
> > an SVA PASID>0 domain) has a unique ASID.
> 
> I see. Looks like ASID is not the PASID.

It's not. PASID is called Substream ID in SMMU terms. It's used to
index the PASID table. For cache invalidations, a PASID (ssid) is
for ATC (dev cache) or PASID table entry invalidation only.

> > So it unlikely has the
> > situation of two identical ASIDs if they are on the same vIOMMU,
> > because the ASID pool is per IOMMU instance (whether p or v).
> > 
> > With two vIOMMU instances, there might be the same ASIDs but they
> > will be tagged with different VMIDs.
> > 
> > > [1]
> > > https://lore.kernel.org/linux-iommu/4bc9bd20-5aae-440d-84fd-f530d0747...@intel.com/
> > 
> > Is "gIOVA" a type of invalidation that only uses "address" out of
> > "PASID, DID and address"? I.e. PASID and DID are not provided via
> > the invalidation request, so it's going to broadcast all viommus?
> 
> gIOVA is just a term used in contrast to vSVA, to differentiate the two. :)
> PASID and DID are still provided in the invalidation.

I am still not getting this gIOVA. What does it do exactly vs. vSVA?
And should RIDPASID be IOMMU_NO_PASID?

Nicolin



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-29 Thread Yi Liu

On 2024/9/27 20:20, Jason Gunthorpe wrote:

On Fri, Sep 27, 2024 at 08:12:20PM +0800, Yi Liu wrote:

Perhaps calling it a slice sounds more accurate, as I guess all
the confusion comes from the name "vIOMMU" that might be thought
to be a user space object/instance that likely holds all virtual
stuff like stage-1 HWPT or so?


yeah. Maybe this confusion partly comes when you start it with the
cache invalidation as well. I failed to get why a S2 hwpt needs to
be part of the vIOMMU obj at first glance.


Both amd and arm have direct to VM queues for the iommu and these
queues have their DMA translated by the S2.


ok, this explains why the S2 should be part of the vIOMMU obj.



So their viommu HW concepts come along with a requirement that there
be a fixed translation for the VM, which we model by attaching a S2
HWPT to the VIOMMU object which gets linked into the IOMMU HW as
the translation for the queue memory.


Is the mapping of the S2 static? Or can it be unmapped by userspace?

--
Regards,
Yi Liu



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-29 Thread Yi Liu

On 2024/9/28 04:44, Nicolin Chen wrote:

On Fri, Sep 27, 2024 at 08:12:20PM +0800, Yi Liu wrote:

On 2024/9/27 14:32, Nicolin Chen wrote:

On Fri, Sep 27, 2024 at 01:54:45PM +0800, Yi Liu wrote:

Baolu told me that Intel may have the same: different domain IDs
on different IOMMUs; multiple IOMMU instances on one chip:
https://lore.kernel.org/linux-iommu/cf4fe15c-8bcb-4132-a1fd-b2c8ddf27...@linux.intel.com/
So, I think we are having the same situation here.


yes, it's called iommu unit or dmar. A typical Intel server can have
multiple iommu units. But like Baolu mentioned in that thread, the intel
iommu driver maintains separate domain ID spaces for iommu units, which
means a given iommu domain has different DIDs when associated with
different iommu units. So intel side is not suffering from this so far.


An ARM SMMU has its own VMID pool as well. The suffering comes
from associating VMIDs to one shared parent S2 domain.


Is this because the VMID is tied to a S2 domain?


On ARM, yes. VMID is a part of S2 domain stuff.


Is a DID per S1 nested domain or per parent S2? If it is per S2,
I think the same suffering applies when we share the S2 across
IOMMU instances?


per S1 I think. The iotlb efficiency is low as S2 caches would be
tagged with different DIDs even if the page table is the same. :)


On ARM, the stage-1 is tagged with an ASID (Address Space ID)
while the stage-2 is tagged with a VMID. Then an invalidation
for a nested S1 domain must require the VMID from the S2. The
ASID may be also required if the invalidation is specific to
that address space (otherwise, broadcast per VMID.)



Looks like the nested s1 caches are tagged with both ASID and VMID.


Yea, my understanding is similar. If both stages are enabled for
a nested translation, VMID is tagged for S1 cache too.


I feel these two might act somehow similarly to the two DIDs
during nested translations?


not quite the same. Is it possible that the ASID is the same for stage-1?
Intel VT-d side can have the PASID be the same. Like the gIOVA, all
devices use the same ridpasid. Like the scenario I replied to Baolu[1],
so we chose to use different DIDs to differentiate the caches for the
two devices.


On ARM, each S1 domain (either a normal stage-1 PASID=0 domain or
an SVA PASID>0 domain) has a unique ASID.


I see. Looks like ASID is not the PASID.


So it is unlikely to have the
situation of two identical ASIDs if they are on the same vIOMMU,
because the ASID pool is per IOMMU instance (whether p or v).

With two vIOMMU instances, there might be the same ASIDs but they
will be tagged with different VMIDs.


[1]
https://lore.kernel.org/linux-iommu/4bc9bd20-5aae-440d-84fd-f530d0747...@intel.com/


Is "gIOVA" a type of invalidation that only uses "address" out of
"PASID, DID and address"? I.e. PASID and DID are not provided via
the invalidation request, so it's going to broadcast all viommus?


gIOVA is just a term used in contrast to vSVA, to differentiate the two. :)
PASID and DID are still provided in the invalidation.

--
Regards,
Yi Liu



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-27 Thread Nicolin Chen
On Fri, Sep 27, 2024 at 08:12:20PM +0800, Yi Liu wrote:
> On 2024/9/27 14:32, Nicolin Chen wrote:
> > On Fri, Sep 27, 2024 at 01:54:45PM +0800, Yi Liu wrote:
> > > > > > Baolu told me that Intel may have the same: different domain IDs
> > > > > > on different IOMMUs; multiple IOMMU instances on one chip:
> > > > > > https://lore.kernel.org/linux-iommu/cf4fe15c-8bcb-4132-a1fd-b2c8ddf27...@linux.intel.com/
> > > > > > So, I think we are having the same situation here.
> > > > > 
> > > > > yes, it's called iommu unit or dmar. A typical Intel server can have
> > > > > multiple iommu units. But like Baolu mentioned in that thread, the 
> > > > > intel
> > > > > iommu driver maintains separate domain ID spaces for iommu units, 
> > > > > which
> > > > > means a given iommu domain has different DIDs when associated with
> > > > > different iommu units. So intel side is not suffering from this so 
> > > > > far.
> > > > 
> > > > An ARM SMMU has its own VMID pool as well. The suffering comes
> > > > from associating VMIDs to one shared parent S2 domain.
> > > 
> > > Is this because the VMID is tied to a S2 domain?
> > 
> > On ARM, yes. VMID is a part of S2 domain stuff.
> > 
> > > > Is a DID per S1 nested domain or per parent S2? If it is per S2,
> > > > I think the same suffering applies when we share the S2 across
> > > > IOMMU instances?
> > > 
> > > per S1 I think. The iotlb efficiency is low as S2 caches would be
> > > tagged with different DIDs even if the page table is the same. :)
> > 
> > On ARM, the stage-1 is tagged with an ASID (Address Space ID)
> > while the stage-2 is tagged with a VMID. Then an invalidation
> > for a nested S1 domain must require the VMID from the S2. The
> > ASID may be also required if the invalidation is specific to
> > that address space (otherwise, broadcast per VMID.)

> Looks like the nested s1 caches are tagged with both ASID and VMID.

Yea, my understanding is similar. If both stages are enabled for
a nested translation, VMID is tagged for S1 cache too.

> > I feel these two might act somehow similarly to the two DIDs
> > during nested translations?
> 
> not quite the same. Is it possible that the ASID is the same for stage-1?
> Intel VT-d side can have the PASID be the same. Like the gIOVA, all
> devices use the same ridpasid. Like the scenario I replied to Baolu[1],
> so we chose to use different DIDs to differentiate the caches for the
> two devices.

On ARM, each S1 domain (either a normal stage-1 PASID=0 domain or
an SVA PASID>0 domain) has a unique ASID. So it is unlikely to have
the situation of two identical ASIDs if they are on the same vIOMMU,
because the ASID pool is per IOMMU instance (whether p or v).

With two vIOMMU instances, there might be the same ASIDs but they
will be tagged with different VMIDs.

> [1]
> https://lore.kernel.org/linux-iommu/4bc9bd20-5aae-440d-84fd-f530d0747...@intel.com/

Is "gIOVA" a type of invalidation that only uses "address" out of
"PASID, DID and address"? I.e. PASID and DID are not provided via
the invalidation request, so it's going to broadcast all viommus?

Thanks
Nicolin



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-27 Thread Yi Liu

On 2024/9/27 14:32, Nicolin Chen wrote:

On Fri, Sep 27, 2024 at 01:54:45PM +0800, Yi Liu wrote:

Baolu told me that Intel may have the same: different domain IDs
on different IOMMUs; multiple IOMMU instances on one chip:
https://lore.kernel.org/linux-iommu/cf4fe15c-8bcb-4132-a1fd-b2c8ddf27...@linux.intel.com/
So, I think we are having the same situation here.


yes, it's called iommu unit or dmar. A typical Intel server can have
multiple iommu units. But like Baolu mentioned in that thread, the intel
iommu driver maintains separate domain ID spaces for iommu units, which
means a given iommu domain has different DIDs when associated with
different iommu units. So intel side is not suffering from this so far.


An ARM SMMU has its own VMID pool as well. The suffering comes
from associating VMIDs to one shared parent S2 domain.


Is this because the VMID is tied to a S2 domain?


On ARM, yes. VMID is a part of S2 domain stuff.


Is a DID per S1 nested domain or per parent S2? If it is per S2,
I think the same suffering applies when we share the S2 across
IOMMU instances?


per S1 I think. The iotlb efficiency is low as S2 caches would be
tagged with different DIDs even if the page table is the same. :)


On ARM, the stage-1 is tagged with an ASID (Address Space ID)
while the stage-2 is tagged with a VMID. Then an invalidation
for a nested S1 domain must require the VMID from the S2. The
ASID may be also required if the invalidation is specific to
that address space (otherwise, broadcast per VMID.)

Looks like the nested s1 caches are tagged with both ASID and VMID.


I feel these two might act somehow similarly to the two DIDs
during nested translations?


not quite the same. Is it possible that the ASID is the same for stage-1?
Intel VT-d side can have the PASID be the same. Like the gIOVA, all
devices use the same ridpasid. Like the scenario I replied to Baolu[1],
so we chose to use different DIDs to differentiate the caches for the
two devices.

[1] 
https://lore.kernel.org/linux-iommu/4bc9bd20-5aae-440d-84fd-f530d0747...@intel.com/



Adding another vIOMMU wrapper on the other hand can allow us to
allocate different VMIDs/DIDs for different IOMMUs.


that looks like generalizing the association of the iommu domain and the
iommu units?


A vIOMMU is a presentation/object of a physical IOMMU instance
in a VM.


a slice of a physical IOMMU, is it?


Yes. When multiple nested translations happen at the same time,
IOMMU (just like a CPU) is shared by these slices. And so is an
invalidation queue executing multiple requests.

Perhaps calling it a slice sounds more accurate, as I guess all
the confusion comes from the name "vIOMMU" that might be thought
to be a user space object/instance that likely holds all virtual
stuff like stage-1 HWPT or so?


yeah. Maybe this confusion partly comes when you start it with the
cache invalidation as well. I failed to get why a S2 hwpt needs to
be part of the vIOMMU obj at first glance.




and you treat S2 hwpt as a resource of the physical IOMMU as well.


Yes. A parent HWPT (in the old days, we called it "kernel-managed"
HWPT) is not a user space thing. This belongs to a kernel owned
object.


This presentation gives a VMM some capability to take
advantage of some of HW resource of the physical IOMMU:
- a VMID is a small HW resource to tag the cache;
- a vIOMMU invalidation allows access to device cache that's
not straightforwardly done via an S1 HWPT invalidation;
- a virtual device presentation of a physical device in a VM,
related to the vIOMMU in the VM, which contains some VM-level
info: virtual device ID, security level (ARM CCA), and etc;
- Non-PRI IRQ forwarding to the guest VM;
- HW-accelerated virtualization resource: vCMDQ, AMD VIOMMU;


might be helpful to draw a diagram to show what the vIOMMU obj contains.:)


That's what I plan to. Basically looks like:
   device--->stage1--->[ viommu [s2_hwpt, vmid, virq, HW-acc, etc.] ]


ok. let's see your new doc.

--
Regards,
Yi Liu



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-27 Thread Jason Gunthorpe
On Fri, Sep 27, 2024 at 08:12:20PM +0800, Yi Liu wrote:
> > Perhaps calling it a slice sounds more accurate, as I guess all
> > the confusion comes from the name "vIOMMU" that might be thought
> > to be a user space object/instance that likely holds all virtual
> > stuff like stage-1 HWPT or so?
> 
> yeah. Maybe this confusion partly comes when you start it with the
> cache invalidation as well. I failed to get why a S2 hwpt needs to
> be part of the vIOMMU obj at first glance.

Both amd and arm have direct to VM queues for the iommu and these
queues have their DMA translated by the S2.

So their viommu HW concepts come along with a requirement that there
be a fixed translation for the VM, which we model by attaching a S2
HWPT to the VIOMMU object which gets linked into the IOMMU HW as
the translation for the queue memory.

Jason



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-26 Thread Nicolin Chen
On Fri, Sep 27, 2024 at 01:54:45PM +0800, Yi Liu wrote:
> > > > Baolu told me that Intel may have the same: different domain IDs
> > > > on different IOMMUs; multiple IOMMU instances on one chip:
> > > > https://lore.kernel.org/linux-iommu/cf4fe15c-8bcb-4132-a1fd-b2c8ddf27...@linux.intel.com/
> > > > So, I think we are having the same situation here.
> > > 
> > > yes, it's called iommu unit or dmar. A typical Intel server can have
> > > multiple iommu units. But like Baolu mentioned in that thread, the intel
> > > iommu driver maintains separate domain ID spaces for iommu units, which
> > > means a given iommu domain has different DIDs when associated with
> > > different iommu units. So intel side is not suffering from this so far.
> > 
> > An ARM SMMU has its own VMID pool as well. The suffering comes
> > from associating VMIDs to one shared parent S2 domain.
> 
> Is this because the VMID is tied to a S2 domain?

On ARM, yes. VMID is a part of S2 domain stuff.

> > Is a DID per S1 nested domain or per parent S2? If it is per S2,
> > I think the same suffering applies when we share the S2 across
> > IOMMU instances?
> 
> per S1 I think. The iotlb efficiency is low as S2 caches would be
> tagged with different DIDs even if the page table is the same. :)

On ARM, the stage-1 is tagged with an ASID (Address Space ID)
while the stage-2 is tagged with a VMID. Then an invalidation
for a nested S1 domain must require the VMID from the S2. The
ASID may be also required if the invalidation is specific to
that address space (otherwise, broadcast per VMID.)

I feel these two might act somehow similarly to the two DIDs
during nested translations?

> > > > Adding another vIOMMU wrapper on the other hand can allow us to
> > > > allocate different VMIDs/DIDs for different IOMMUs.
> > > 
> > > that looks like generalizing the association of the iommu domain and the
> > > iommu units?
> > 
> > A vIOMMU is a presentation/object of a physical IOMMU instance
> > in a VM.
> 
> a slice of a physical IOMMU, is it?

Yes. When multiple nested translations happen at the same time,
IOMMU (just like a CPU) is shared by these slices. And so is an
invalidation queue executing multiple requests.

Perhaps calling it a slice sounds more accurate, as I guess all
the confusion comes from the name "vIOMMU" that might be thought
to be a user space object/instance that likely holds all virtual
stuff like stage-1 HWPT or so?

> and you treat S2 hwpt as a resource of the physical IOMMU as well.

Yes. A parent HWPT (in the old days, we called it "kernel-managed"
HWPT) is not a user space thing. This belongs to a kernel owned
object.

> > This presentation gives a VMM some capability to take
> > advantage of some of HW resource of the physical IOMMU:
> > - a VMID is a small HW resource to tag the cache;
> > - a vIOMMU invalidation allows access to device cache that's
> >not straightforwardly done via an S1 HWPT invalidation;
> > - a virtual device presentation of a physical device in a VM,
> >related to the vIOMMU in the VM, which contains some VM-level
> >info: virtual device ID, security level (ARM CCA), and etc;
> > - Non-PRI IRQ forwarding to the guest VM;
> > - HW-accelerated virtualization resource: vCMDQ, AMD VIOMMU;
> 
> might be helpful to draw a diagram to show what the vIOMMU obj contains.:)

That's what I plan to. Basically looks like:
  device--->stage1--->[ viommu [s2_hwpt, vmid, virq, HW-acc, etc.] ]

Thanks
Nic



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-26 Thread Yi Liu

On 2024/9/27 10:05, Baolu Lu wrote:

On 9/27/24 4:03 AM, Nicolin Chen wrote:

On Thu, Sep 26, 2024 at 04:47:02PM +0800, Yi Liu wrote:

On 2024/9/26 02:55, Nicolin Chen wrote:

On Wed, Sep 25, 2024 at 06:30:20PM +0800, Yi Liu wrote:

Hi Nic,

On 2024/8/28 00:59, Nicolin Chen wrote:

This series introduces a new VIOMMU infrastructure and related ioctls.

IOMMUFD has been using the HWPT infrastructure for all cases, including a
nested IO page table support. Yet, there're limitations for an HWPT-based
structure to support some advanced HW-accelerated features, such as CMDQV
on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU
environment, it is not straightforward for nested HWPTs to share the same
parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone.

could you elaborate a bit on the last sentence in the above paragraph?

Stage-2 HWPT/domain on ARM holds a VMID. If we share the parent
domain across IOMMU instances, we'd have to make sure that VMID
is available on all IOMMU instances. There comes the limitation
and potential resource starving, so not ideal.

got it.


Baolu told me that Intel may have the same: different domain IDs
on different IOMMUs; multiple IOMMU instances on one chip:
https://lore.kernel.org/linux-iommu/cf4fe15c-8bcb-4132-a1fd-b2c8ddf27...@linux.intel.com/
So, I think we are having the same situation here.

yes, it's called iommu unit or dmar. A typical Intel server can have
multiple iommu units. But like Baolu mentioned in that thread, the intel
iommu driver maintains separate domain ID spaces for iommu units, which
means a given iommu domain has different DIDs when associated with
different iommu units. So intel side is not suffering from this so far.

An ARM SMMU has its own VMID pool as well. The suffering comes
from associating VMIDs to one shared parent S2 domain.

Is the DID per S1 nested domain or per parent S2? If it is per S2,
I think the same suffering applies when we share the S2 across
IOMMU instances?


It's per S1 nested domain in current VT-d design. It's simple but lacks
sharing of DID within a VM. We probably will change this later.


Could you share a bit more about this? I hope it is not going to share the
DID if the S1 nested domains share the same S2 hwpt. For first-stage caches,
the tag is PASID, DID and address. If both PASID and DID are the same, then
there is a cache conflict. And the typical scenario is gIOVA, which uses
the RIDPASID. :)
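The conflict described above can be sketched by modeling a first-stage cache tag as the (DID, PASID, address) tuple; the values and helper name are illustrative assumptions:

```python
RIDPASID = 0  # the PASID used for gIOVA (requests without PASID)

def fs_cache_tag(did, pasid, iova):
    """First-stage cache entries are tagged by (DID, PASID, address)."""
    return (did, pasid, iova)

# Two devices using the same RIDPASID behind the same DID produce
# identical tags for the same IOVA -- their entries would collide.
assert fs_cache_tag(5, RIDPASID, 0x1000) == fs_cache_tag(5, RIDPASID, 0x1000)

# Giving the devices distinct DIDs keeps the entries distinguishable.
assert fs_cache_tag(5, RIDPASID, 0x1000) != fs_cache_tag(6, RIDPASID, 0x1000)
```

This is why sharing one DID across S1 nested domains that all use the RIDPASID would be problematic, while per-domain DIDs sidestep it.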

--
Regards,
Yi Liu



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-26 Thread Yi Liu

On 2024/9/27 04:03, Nicolin Chen wrote:

On Thu, Sep 26, 2024 at 04:47:02PM +0800, Yi Liu wrote:

On 2024/9/26 02:55, Nicolin Chen wrote:

On Wed, Sep 25, 2024 at 06:30:20PM +0800, Yi Liu wrote:

Hi Nic,

On 2024/8/28 00:59, Nicolin Chen wrote:

This series introduces a new VIOMMU infrastructure and related ioctls.

IOMMUFD has been using the HWPT infrastructure for all cases, including a
nested IO page table support. Yet, there're limitations for an HWPT-based
structure to support some advanced HW-accelerated features, such as CMDQV
on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU
environment, it is not straightforward for nested HWPTs to share the same
parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone.


could you elaborate a bit on the last sentence in the above paragraph?


Stage-2 HWPT/domain on ARM holds a VMID. If we share the parent
domain across IOMMU instances, we'd have to make sure that VMID
is available on all IOMMU instances. There comes the limitation
and potential resource starving, so not ideal.


got it.


Baolu told me that Intel may have the same: different domain IDs
on different IOMMUs; multiple IOMMU instances on one chip:
https://lore.kernel.org/linux-iommu/cf4fe15c-8bcb-4132-a1fd-b2c8ddf27...@linux.intel.com/
So, I think we are having the same situation here.


yes, it's called iommu unit or dmar. A typical Intel server can have
multiple iommu units. But like Baolu mentioned in that thread, the intel
iommu driver maintains separate domain ID spaces for iommu units, which
means a given iommu domain has different DIDs when associated with
different iommu units. So intel side is not suffering from this so far.


An ARM SMMU has its own VMID pool as well. The suffering comes
from associating VMIDs to one shared parent S2 domain.


Is this because the VMID is tied to an S2 domain?


Is the DID per S1 nested domain or per parent S2? If it is per S2,
I think the same suffering applies when we share the S2 across
IOMMU instances?


per S1 I think. The iotlb efficiency is low as S2 caches would be
tagged with different DIDs even though the page table is the same. :)


Adding another vIOMMU wrapper on the other hand can allow us to
allocate different VMIDs/DIDs for different IOMMUs.


that looks like generalizing the association between the iommu domain and
the iommu units?


A vIOMMU is a presentation/object of a physical IOMMU instance
in a VM.


a slice of a physical IOMMU, is it? And you treat the S2 hwpt as a resource
of the physical IOMMU as well.


This presentation gives a VMM some capability to take
advantage of some of the HW resources of the physical IOMMU:
- a VMID is a small HW resource to tag the cache;
- a vIOMMU invalidation allows access to a device cache that's
   not straightforwardly done via an S1 HWPT invalidation;
- a virtual device presentation of a physical device in a VM,
   related to the vIOMMU in the VM, which contains some VM-level
   info: virtual device ID, security level (ARM CCA), etc.;
- Non-PRI IRQ forwarding to the guest VM;
- HW-accelerated virtualization resources: vCMDQ, AMD VIOMMU;


might be helpful to draw a diagram to show what the vIOMMU obj contains. :)

--
Regards,
Yi Liu



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-26 Thread Baolu Lu

On 9/27/24 4:03 AM, Nicolin Chen wrote:

On Thu, Sep 26, 2024 at 04:47:02PM +0800, Yi Liu wrote:

On 2024/9/26 02:55, Nicolin Chen wrote:

On Wed, Sep 25, 2024 at 06:30:20PM +0800, Yi Liu wrote:

Hi Nic,

On 2024/8/28 00:59, Nicolin Chen wrote:

This series introduces a new VIOMMU infrastructure and related ioctls.

IOMMUFD has been using the HWPT infrastructure for all cases, including a
nested IO page table support. Yet, there're limitations for an HWPT-based
structure to support some advanced HW-accelerated features, such as CMDQV
on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU
environment, it is not straightforward for nested HWPTs to share the same
parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone.

could you elaborate a bit on the last sentence in the above paragraph?

Stage-2 HWPT/domain on ARM holds a VMID. If we share the parent
domain across IOMMU instances, we'd have to make sure that VMID
is available on all IOMMU instances. There comes the limitation
and potential resource starving, so not ideal.

got it.


Baolu told me that Intel may have the same: different domain IDs
on different IOMMUs; multiple IOMMU instances on one chip:
https://lore.kernel.org/linux-iommu/cf4fe15c-8bcb-4132-a1fd-b2c8ddf27...@linux.intel.com/
So, I think we are having the same situation here.

yes, it's called iommu unit or dmar. A typical Intel server can have
multiple iommu units. But like Baolu mentioned in that thread, the intel
iommu driver maintains separate domain ID spaces for iommu units, which
means a given iommu domain has different DIDs when associated with
different iommu units. So intel side is not suffering from this so far.

An ARM SMMU has its own VMID pool as well. The suffering comes
from associating VMIDs to one shared parent S2 domain.

Is the DID per S1 nested domain or per parent S2? If it is per S2,
I think the same suffering applies when we share the S2 across
IOMMU instances?


It's per S1 nested domain in current VT-d design. It's simple but lacks
sharing of DID within a VM. We probably will change this later.

Thanks,
baolu



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-26 Thread Nicolin Chen
On Thu, Sep 26, 2024 at 04:47:02PM +0800, Yi Liu wrote:
> On 2024/9/26 02:55, Nicolin Chen wrote:
> > On Wed, Sep 25, 2024 at 06:30:20PM +0800, Yi Liu wrote:
> > > Hi Nic,
> > > 
> > > On 2024/8/28 00:59, Nicolin Chen wrote:
> > > > This series introduces a new VIOMMU infrastructure and related ioctls.
> > > > 
> > > > IOMMUFD has been using the HWPT infrastructure for all cases, including a
> > > > nested IO page table support. Yet, there're limitations for an HWPT-based
> > > > structure to support some advanced HW-accelerated features, such as CMDQV
> > > > on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU
> > > > environment, it is not straightforward for nested HWPTs to share the same
> > > > parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone.
> > > 
> > > could you elaborate a bit on the last sentence in the above paragraph?
> > 
> > Stage-2 HWPT/domain on ARM holds a VMID. If we share the parent
> > domain across IOMMU instances, we'd have to make sure that VMID
> > is available on all IOMMU instances. There comes the limitation
> > and potential resource starving, so not ideal.
> 
> got it.
> 
> > Baolu told me that Intel may have the same: different domain IDs
> > on different IOMMUs; multiple IOMMU instances on one chip:
> > https://lore.kernel.org/linux-iommu/cf4fe15c-8bcb-4132-a1fd-b2c8ddf27...@linux.intel.com/
> > So, I think we are having the same situation here.
> 
> yes, it's called iommu unit or dmar. A typical Intel server can have
> multiple iommu units. But like Baolu mentioned in that thread, the intel
> iommu driver maintains separate domain ID spaces for iommu units, which
> means a given iommu domain has different DIDs when associated with
> different iommu units. So intel side is not suffering from this so far.

An ARM SMMU has its own VMID pool as well. The suffering comes
from associating VMIDs to one shared parent S2 domain.

Is the DID per S1 nested domain or per parent S2? If it is per S2,
I think the same suffering applies when we share the S2 across
IOMMU instances?

> > Adding another vIOMMU wrapper on the other hand can allow us to
> > allocate different VMIDs/DIDs for different IOMMUs.
> 
> that looks like generalizing the association between the iommu domain and
> the iommu units?

A vIOMMU is a presentation/object of a physical IOMMU instance
in a VM. This presentation gives a VMM some capability to take
advantage of some of the HW resources of the physical IOMMU:
- a VMID is a small HW resource to tag the cache;
- a vIOMMU invalidation allows access to a device cache that's
  not straightforwardly done via an S1 HWPT invalidation;
- a virtual device presentation of a physical device in a VM,
  related to the vIOMMU in the VM, which contains some VM-level
  info: virtual device ID, security level (ARM CCA), etc.;
- Non-PRI IRQ forwarding to the guest VM;
- HW-accelerated virtualization resources: vCMDQ, AMD VIOMMU;

Thanks
Nicolin



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-26 Thread Yi Liu

On 2024/9/26 02:55, Nicolin Chen wrote:

On Wed, Sep 25, 2024 at 06:30:20PM +0800, Yi Liu wrote:

Hi Nic,

On 2024/8/28 00:59, Nicolin Chen wrote:

This series introduces a new VIOMMU infrastructure and related ioctls.

IOMMUFD has been using the HWPT infrastructure for all cases, including a
nested IO page table support. Yet, there're limitations for an HWPT-based
structure to support some advanced HW-accelerated features, such as CMDQV
on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU
environment, it is not straightforward for nested HWPTs to share the same
parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone.


could you elaborate a bit on the last sentence in the above paragraph?


Stage-2 HWPT/domain on ARM holds a VMID. If we share the parent
domain across IOMMU instances, we'd have to make sure that VMID
is available on all IOMMU instances. There comes the limitation
and potential resource starving, so not ideal.


got it.


Baolu told me that Intel may have the same: different domain IDs
on different IOMMUs; multiple IOMMU instances on one chip:
https://lore.kernel.org/linux-iommu/cf4fe15c-8bcb-4132-a1fd-b2c8ddf27...@linux.intel.com/
So, I think we are having the same situation here.


yes, it's called iommu unit or dmar. A typical Intel server can have
multiple iommu units. But like Baolu mentioned in that thread, the intel
iommu driver maintains separate domain ID spaces for iommu units, which
means a given iommu domain has different DIDs when associated with
different iommu units. So intel side is not suffering from this so far.


Adding another vIOMMU wrapper on the other hand can allow us to
allocate different VMIDs/DIDs for different IOMMUs.


that looks like generalizing the association between the iommu domain and
the iommu units?

--
Regards,
Yi Liu



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-25 Thread Nicolin Chen
On Wed, Sep 25, 2024 at 06:30:20PM +0800, Yi Liu wrote:
> Hi Nic,
> 
> On 2024/8/28 00:59, Nicolin Chen wrote:
> > This series introduces a new VIOMMU infrastructure and related ioctls.
> > 
> > IOMMUFD has been using the HWPT infrastructure for all cases, including a
> > nested IO page table support. Yet, there're limitations for an HWPT-based
> > structure to support some advanced HW-accelerated features, such as CMDQV
> > on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU
> > environment, it is not straightforward for nested HWPTs to share the same
> > parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone.
> 
> could you elaborate a bit on the last sentence in the above paragraph?

Stage-2 HWPT/domain on ARM holds a VMID. If we share the parent
domain across IOMMU instances, we'd have to make sure that VMID
is available on all IOMMU instances. There comes the limitation
and potential resource starving, so not ideal.

Baolu told me that Intel may have the same: different domain IDs
on different IOMMUs; multiple IOMMU instances on one chip:
https://lore.kernel.org/linux-iommu/cf4fe15c-8bcb-4132-a1fd-b2c8ddf27...@linux.intel.com/
So, I think we are having the same situation here.

Adding another vIOMMU wrapper on the other hand can allow us to
allocate different VMIDs/DIDs for different IOMMUs.

Thanks
Nic



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-25 Thread Yi Liu

Hi Nic,

On 2024/8/28 00:59, Nicolin Chen wrote:

This series introduces a new VIOMMU infrastructure and related ioctls.

IOMMUFD has been using the HWPT infrastructure for all cases, including a
nested IO page table support. Yet, there're limitations for an HWPT-based
structure to support some advanced HW-accelerated features, such as CMDQV
on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU
environment, it is not straightforward for nested HWPTs to share the same
parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone.


could you elaborate a bit on the last sentence in the above paragraph?



The new VIOMMU object is an additional layer, between the nested HWPT and
its parent HWPT, giving both the IOMMUFD core and an IOMMU driver an
additional structure to support HW-accelerated features:
  
 ----------------      ----------------------------
 |              |      |         |  paging_hwpt0  |
 | hwpt_nested0 |----->| viommu0 |----------------|
 |              |      |         | HW-accel feats |
 ----------------      ----------------------------

On a multi-IOMMU system, the VIOMMU object can be instanced to the number
of vIOMMUs in a guest VM, while holding the same parent HWPT to share the
stage-2 IO pagetable. Each VIOMMU then just needs to allocate its own
VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:
  
 ----------------      ----------------------------
 |              |      |         |  paging_hwpt0  |
 | hwpt_nested0 |----->| viommu0 |----------------|
 |              |      |         |  VMID0         |
 ----------------      ----------------------------

 ----------------      ----------------------------
 |              |      |         |  paging_hwpt0  |
 | hwpt_nested1 |----->| viommu1 |----------------|
 |              |      |         |  VMID1         |
 ----------------      ----------------------------

As an initial part-1, add ioctls to support a VIOMMU-based invalidation:
 IOMMUFD_CMD_VIOMMU_ALLOC to allocate a VIOMMU object
 IOMMUFD_CMD_VIOMMU_SET/UNSET_VDEV_ID to set/clear device's virtual ID
 (Reuse IOMMUFD_CMD_HWPT_INVALIDATE for a VIOMMU object to flush caches
  with given driver data)

Worth noting that the VDEV_ID is for a per-VIOMMU device list for drivers
to look up the device's physical instance from its virtual ID in a VM. It
is essential for a VIOMMU-based invalidation where the request contains a
device's virtual ID for its device cache flush, e.g. ATC invalidation.
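A rough Python sketch of that lookup path (all names here are assumptions, not the actual iommufd API): the VMM registers a virtual device ID per device, and a vIOMMU-based invalidation translates the guest-provided virtual ID back to the physical device before flushing its ATC:

```python
class ViommuSketch:
    """Illustrative model of the per-vIOMMU virtual device list."""
    def __init__(self):
        self._vdev_ids = {}              # virtual ID -> physical device

    def set_vdev_id(self, vdev_id, dev):
        # Conceptually what IOMMUFD_CMD_VIOMMU_SET_VDEV_ID records.
        self._vdev_ids[vdev_id] = dev

    def atc_invalidate(self, vdev_id, iova):
        # A guest invalidation carries the *virtual* device ID; the
        # driver resolves it to the physical device before issuing
        # the device-cache (ATC) flush.
        dev = self._vdev_ids[vdev_id]
        return f"ATC_INV dev={dev} iova={hex(iova)}"


v = ViommuSketch()
v.set_vdev_id(3, "0000:01:00.0")         # guest vdev 3 -> physical BDF
assert v.atc_invalidate(3, 0x1000) == "ATC_INV dev=0000:01:00.0 iova=0x1000"
```

The key point is that the translation table is per-vIOMMU, so each guest vIOMMU has its own virtual device ID namespace.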

As for the implementation of the series, add an IOMMU_VIOMMU_TYPE_DEFAULT
type for a core-allocated-core-managed VIOMMU object, allowing drivers to
simply hook a default viommu ops for viommu-based invalidation alone. And
provide some viommu helpers to drivers for VDEV_ID translation and parent
domain lookup. Add VIOMMU invalidation support to ARM SMMUv3 driver for a
real world use case. This adds support for arm-smmu-v3's CMDQ_OP_ATC_INV
and CMDQ_OP_CFGI_CD/ALL commands, supplementing HWPT-based invalidations.

In the future, drivers will also be able to choose a driver-managed type
to hold its own structure by adding a new type to enum iommu_viommu_type.
More VIOMMU-based structures and ioctls will be introduced in part-2/3 to
support a driver-managed VIOMMU, e.g. VQUEUE object for a HW accelerated
queue, VIRQ (or VEVENT) object for IRQ injections. Although we repurposed
the VIOMMU object from an earlier RFC discussion, for a reference:
https://lore.kernel.org/all/cover.1712978212.git.nicol...@nvidia.com/

This series is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p1-v2
Pairing QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_viommu_p1-v2

Changelog
v2
  * Limited vdev_id to one per idev
  * Added a rw_sem to protect the vdev_id list
  * Reworked driver-level APIs with proper lockings
  * Added a new viommu_api file for IOMMUFD_DRIVER config
  * Dropped useless iommu_dev point from the viommu structure
  * Added missing index numbers to new types in the uAPI header
  * Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one
  * Reworked mock_viommu_cache_invalidate() using the new iommu helper
  * Reordered details of set/unset_vdev_id handlers for proper lockings
  * Added arm_smmu_cache_invalidate_user patch from Jason's nesting series
v1
  https://lore.kernel.org/all/cover.1723061377.git.nicol...@nvidia.com/

Thanks!
Nicolin

Jason Gunthorpe (3):
   iommu: Add iommu_copy_struct_from_full_user_array helper
   iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED
   iommu/arm-smmu-v3: Update comments about ATS and bypass

Nicolin Chen (16):
   iommufd: Reorder struct forward declarations
   iommufd/viommu: Add IOMMUFD_OBJ_VIOMMU and IOMMU_VIOMMU_ALLOC ioctl
   iommu: Pass in a viommu pointer to domain_alloc_user op
   iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC
   iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage
   iommufd/viommu: Add IOMMU_VIOMMU_SET/U

Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-11 Thread Nicolin Chen
On Wed, Sep 11, 2024 at 08:08:04AM +, Tian, Kevin wrote:
> > From: Nicolin Chen 
> > Sent: Wednesday, September 11, 2024 3:41 PM
> >
> > On Wed, Sep 11, 2024 at 07:18:10AM +, Tian, Kevin wrote:
> > > > From: Nicolin Chen 
> > > > Sent: Wednesday, September 11, 2024 3:08 PM
> > > >
> > > > On Wed, Sep 11, 2024 at 06:12:21AM +, Tian, Kevin wrote:
> > > > > > From: Nicolin Chen 
> > > > > > Sent: Wednesday, August 28, 2024 1:00 AM
> > > > > >
> > > > > > stage-2 IO pagetable. Each VIOMMU then just needs to allocate its own
> > > > > > VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:
> > > > >
> > > > > this reads like 'VMID' is a virtual ID allocated by vIOMMU. But from the
> > > > > entire context it actually means the physical 'VMID' allocated on the
> > > > > associated physical IOMMU, correct?
> > > >
> > > > Quoting Jason's narratives, a VMID is a "Security namespace for
> > > > guest owned ID". The allocation, using SMMU as an example, should
> > >
> > > the VMID alone is not a namespace. It's one ID to tag another namespace.
> > >
> > > > be a part of vIOMMU instance allocation in the host SMMU driver.
> > > > Then, this VMID will be used to mark the cache tags. So, it is
> > > > still a software allocated ID, while HW would use it too.
> > > >
> > >
> > > VMIDs are physical resource belonging to the host SMMU driver.
> >
> > Yes. Just the lifecycle of a VMID is controlled by a vIOMMU, i.e.
> > the guest.
> >
> > > but I got your original point that each vIOMMU gets a unique VMID
> > > from the host SMMU driver, not exactly that each vIOMMU maintains
> > > its own VMID namespace. that'd be a different concept.
> >
> > What's a VMID namespace actually? Please educate me :)
> >
> 
> I meant the 16bit VMID pool under each SMMU.

I see. Makes sense now.

Thanks
Nicolin



RE: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-11 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Wednesday, September 11, 2024 3:41 PM
> 
> On Wed, Sep 11, 2024 at 07:18:10AM +, Tian, Kevin wrote:
> > > From: Nicolin Chen 
> > > Sent: Wednesday, September 11, 2024 3:08 PM
> > >
> > > On Wed, Sep 11, 2024 at 06:12:21AM +, Tian, Kevin wrote:
> > > > > From: Nicolin Chen 
> > > > > Sent: Wednesday, August 28, 2024 1:00 AM
> > > > >
> > > > > stage-2 IO pagetable. Each VIOMMU then just needs to allocate its own
> > > > > VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:
> > > >
> > > > this reads like 'VMID' is a virtual ID allocated by vIOMMU. But from the
> > > > entire context it actually means the physical 'VMID' allocated on the
> > > > associated physical IOMMU, correct?
> > >
> > > Quoting Jason's narratives, a VMID is a "Security namespace for
> > > guest owned ID". The allocation, using SMMU as an example, should
> >
> > the VMID alone is not a namespace. It's one ID to tag another namespace.
> >
> > > be a part of vIOMMU instance allocation in the host SMMU driver.
> > > Then, this VMID will be used to mark the cache tags. So, it is
> > > still a software allocated ID, while HW would use it too.
> > >
> >
> > VMIDs are physical resource belonging to the host SMMU driver.
> 
> Yes. Just the lifecycle of a VMID is controlled by a vIOMMU, i.e.
> the guest.
> 
> > but I got your original point that each vIOMMU gets a unique VMID
> > from the host SMMU driver, not exactly that each vIOMMU maintains
> > its own VMID namespace. that'd be a different concept.
> 
> What's a VMID namespace actually? Please educate me :)
> 

I meant the 16bit VMID pool under each SMMU.



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-11 Thread Nicolin Chen
On Wed, Sep 11, 2024 at 07:18:10AM +, Tian, Kevin wrote:
> > From: Nicolin Chen 
> > Sent: Wednesday, September 11, 2024 3:08 PM
> >
> > On Wed, Sep 11, 2024 at 06:12:21AM +, Tian, Kevin wrote:
> > > > From: Nicolin Chen 
> > > > Sent: Wednesday, August 28, 2024 1:00 AM
> > > >
> > > [...]
> > > > On a multi-IOMMU system, the VIOMMU object can be instanced to the number
> > > > of vIOMMUs in a guest VM, while holding the same parent HWPT to share the
> > >
> > > Is there a restriction that multiple vIOMMU objects can only be created
> > > on a multi-IOMMU system?
> >
> > I think it should be generally restricted to the number of pIOMMUs,
> > although likely (not 100% sure) we could do multiple vIOMMUs on a
> > single-pIOMMU system. Any reason for doing that?
> 
> No idea. But if you stated so then there will be code to enforce it e.g.
> failing the attempt to create a vIOMMU object on a pIOMMU to which
> another vIOMMU object is already linked?

Yea, I can do that.

> > > > stage-2 IO pagetable. Each VIOMMU then just needs to allocate its own
> > > > VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:
> > >
> > > this reads like 'VMID' is a virtual ID allocated by vIOMMU. But from the
> > > entire context it actually means the physical 'VMID' allocated on the
> > > associated physical IOMMU, correct?
> >
> > Quoting Jason's narratives, a VMID is a "Security namespace for
> > guest owned ID". The allocation, using SMMU as an example, should
> 
> the VMID alone is not a namespace. It's one ID to tag another namespace.
> 
> > be a part of vIOMMU instance allocation in the host SMMU driver.
> > Then, this VMID will be used to mark the cache tags. So, it is
> > still a software allocated ID, while HW would use it too.
> >
> 
> VMIDs are physical resource belonging to the host SMMU driver.

Yes. Just the lifecycle of a VMID is controlled by a vIOMMU, i.e.
the guest.

> but I got your original point that each vIOMMU gets a unique VMID
> from the host SMMU driver, not exactly that each vIOMMU maintains
> its own VMID namespace. that'd be a different concept.

What's a VMID namespace actually? Please educate me :)

Thanks
Nicolin



RE: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-11 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Wednesday, September 11, 2024 3:08 PM
> 
> On Wed, Sep 11, 2024 at 06:12:21AM +, Tian, Kevin wrote:
> > > From: Nicolin Chen 
> > > Sent: Wednesday, August 28, 2024 1:00 AM
> > >
> > [...]
> > > On a multi-IOMMU system, the VIOMMU object can be instanced to the
> > > number
> > > of vIOMMUs in a guest VM, while holding the same parent HWPT to
> share
> > > the
> >
> > Is there a restriction that multiple vIOMMU objects can only be created
> > on a multi-IOMMU system?
> 
> I think it should be generally restricted to the number of pIOMMUs,
> although likely (not 100% sure) we could do multiple vIOMMUs on a
> single-pIOMMU system. Any reason for doing that?

No idea. But if you stated so then there will be code to enforce it e.g.
failing the attempt to create a vIOMMU object on a pIOMMU to which
another vIOMMU object is already linked?

> 
> > > stage-2 IO pagetable. Each VIOMMU then just needs to allocate its own
> > > VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:
> >
> > this reads like 'VMID' is a virtual ID allocated by vIOMMU. But from the
> > entire context it actually means the physical 'VMID' allocated on the
> > associated physical IOMMU, correct?
> 
> Quoting Jason's narratives, a VMID is a "Security namespace for
> guest owned ID". The allocation, using SMMU as an example, should

the VMID alone is not a namespace. It's one ID to tag another namespace.

> be a part of vIOMMU instance allocation in the host SMMU driver.
> Then, this VMID will be used to mark the cache tags. So, it is
> still a software allocated ID, while HW would use it too.
> 

VMIDs are physical resource belonging to the host SMMU driver.

but I got your original point that each vIOMMU gets a unique VMID
from the host SMMU driver, not exactly that each vIOMMU maintains
its own VMID namespace. that'd be a different concept.



Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-11 Thread Nicolin Chen
On Wed, Sep 11, 2024 at 06:12:21AM +, Tian, Kevin wrote:
> > From: Nicolin Chen 
> > Sent: Wednesday, August 28, 2024 1:00 AM
> >
> [...]
> > On a multi-IOMMU system, the VIOMMU object can be instanced to the number
> > of vIOMMUs in a guest VM, while holding the same parent HWPT to share the
> 
> Is there a restriction that multiple vIOMMU objects can only be created
> on a multi-IOMMU system?

I think it should be generally restricted to the number of pIOMMUs,
although likely (not 100% sure) we could do multiple vIOMMUs on a
single-pIOMMU system. Any reason for doing that?

> > stage-2 IO pagetable. Each VIOMMU then just needs to allocate its own
> > VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:
> 
> this reads like 'VMID' is a virtual ID allocated by vIOMMU. But from the
> entire context it actually means the physical 'VMID' allocated on the
> associated physical IOMMU, correct?

Quoting Jason's narratives, a VMID is a "Security namespace for
guest owned ID". The allocation, using SMMU as an example, should
be a part of vIOMMU instance allocation in the host SMMU driver.
Then, this VMID will be used to mark the cache tags. So, it is
still a software allocated ID, while HW would use it too.

Thanks
Nicolin



RE: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

2024-09-10 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Wednesday, August 28, 2024 1:00 AM
> 
[...]
> On a multi-IOMMU system, the VIOMMU object can be instanced to the number
> of vIOMMUs in a guest VM, while holding the same parent HWPT to share the

Is there a restriction that multiple vIOMMU objects can only be created
on a multi-IOMMU system?

> stage-2 IO pagetable. Each VIOMMU then just needs to allocate its own
> VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:

this reads like 'VMID' is a virtual ID allocated by vIOMMU. But from the
entire context it actually means the physical 'VMID' allocated on the
associated physical IOMMU, correct?