[PATCH for-4.19?] x86/IOMMU: Move allocation in iommu_identity_mapping

2024-07-17 Thread Teddy Astie
If for some reason xmalloc fails after having mapped the
reserved regions, an error is reported, but the regions
remain mapped in the p2m.

Move the allocation before trying to map the regions: if
the allocation fails, no mapping is done at all, which
allows this operation to be retried with the same regions
without failing due to already existing mappings.

Fixes: c0e19d7c6c ("IOMMU: generalize VT-d's tracking of mapped RMRR regions")
Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/x86/iommu.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index cc0062b027..b6bc6d71cb 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -267,18 +267,22 @@ int iommu_identity_mapping(struct domain *d, p2m_access_t p2ma,
     if ( p2ma == p2m_access_x )
         return -ENOENT;
 
+    map = xmalloc(struct identity_map);
+    if ( !map )
+        return -ENOMEM;
+
     while ( base_pfn < end_pfn )
     {
         int err = set_identity_p2m_entry(d, base_pfn, p2ma, flag);
 
         if ( err )
+        {
+            xfree(map);
             return err;
+        }
         base_pfn++;
     }
 
-    map = xmalloc(struct identity_map);
-    if ( !map )
-        return -ENOMEM;
     map->base = base;
     map->end = end;
     map->access = p2ma;
-- 
2.45.2



Teddy Astie | Vates XCP-ng Intern

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



[RFC XEN] xen/public: PV-IOMMU interface header proposal

2024-07-15 Thread Teddy Astie
Hello,

After some discussions with Alejandro and Jan on the PV-IOMMU design
patch [1], I've redesigned parts of the interface, with some proposals,
to be more similar to other hypercalls.

In addition to that, I've taken a step further and added a tentative
design for some new features (including optional ones) that could be
worth considering as well.

[1]: https://lists.xenproject.org/archives/html/xen-devel/2024-07/msg00641.html

A summary of the changes:
- I replaced the tagged-union style with a parameter in the hypercall
and various structures, in a similar fashion to other Xen hypercalls.
- There is no longer a page limit on map/unmap operations (they can be
implemented with continuations on the hypervisor side).
- Map/unmap now take a "pgsize" parameter (which needs to be supported).
- There is now a capability "bitmap" (cap_flags) that informs the guest
of which features are supported.
- Added cache, nested and pasid optional features.
- Added a flag indicating whether the "default context" is an identity
one; the idea is to allow creating domains where devices don't have
access to guest memory by default (e.g. to implement pre-boot DMA
protection). If supported and needed, the guest can create an
"identity mapped" context.
- Added a way to do operations on other domains [2]

[2]: The idea is to use this feature for VT-d/AMD-Vi emulation with
physical/passed-through devices, so that QEMU can reflect operations
made on the emulated IOMMU onto the IOMMU subsystem. It can also be
combined with nested to allow hardware-assisted IOMMU emulation.

## Pasid feature

The idea behind the PASID implementation is (if supported by the IOMMU)
to allow a single device (identified by its BDF) to access multiple
address spaces using PASID-tagged DMA (the VT-d specs describe this as
"Requests-with-PASID"). To do that, we can introduce the concept of a
"device-with-pasid", which is an IOMMU-subsystem device unit that we can
create and, at some point, move to another domain (similarly to PCI
passthrough, but without BARs).
This is part of introducing support for Scalable IOV and could be
useful for virtio-gpu as well. It could also be used to add support
for Linux SVA.

---

/* SPDX-License-Identifier: MIT */
/**
 * pv-iommu.h
 *
 * Paravirtualized IOMMU driver interface.
 *
 * Copyright (c) 2024 Teddy Astie 
 */

#ifndef __XEN_PUBLIC_PV_IOMMU_H__
#define __XEN_PUBLIC_PV_IOMMU_H__

#include "stdint.h"
#include "xen.h"
#include "physdev.h"

#define IOMMU_DEFAULT_CONTEXT (0)

enum {
    /* Basic cmd */
    IOMMU_noop = 0,
    IOMMU_query_capabilities,
    IOMMU_alloc_context,
    IOMMU_free_context,
    IOMMU_reattach_device,
    IOMMU_map_pages,
    IOMMU_unmap_pages,
    IOMMU_lookup_page,
    IOMMU_remote_cmd,

    /* Extended cmd */
    IOMMU_alloc_nested, /* if IOMMUCAP_nested */
    IOMMU_flush_nested, /* if IOMMUCAP_nested */
    IOMMU_attach_pasid, /* if IOMMUCAP_pasid */
    IOMMU_detach_pasid, /* if IOMMUCAP_pasid */
};

/**
 * Indicate whether the default context is an identity mapping to domain memory.
 * If not set, the default context blocks all DMA to domain memory.
 */
#define IOMMUCAP_default_identity  (1 << 0)

/**
 * IOMMU_MAP_cache support.
 */
#define IOMMUCAP_cache (1 << 1)

/**
 * Support for IOMMU_alloc_nested.
 */
#define IOMMUCAP_nested (1 << 2)

/**
 * Support for IOMMU_attach_pasid and IOMMU_detach_pasid, and the pasid
 * parameter in reattach_context.
 */
#define IOMMUCAP_pasid (1 << 3)

/**
 * Support for IOMMU_ALLOC_identity.
 */
#define IOMMUCAP_identity (1 << 4)

/**
 * IOMMU_query_capabilities
 * Query PV-IOMMU capabilities for this domain.
 */
struct pv_iommu_capabilities {
    /*
     * OUT: Maximum device address (iova) that the guest can use for mappings.
     */
    uint64_aligned_t max_iova_addr;

    /* OUT: IOMMU capabilities flags */
    uint32_t cap_flags;

    /* OUT: Mask of all supported page sizes. */
    uint32_t pgsize_mask;

    /* OUT: Maximum pasid (if IOMMUCAP_pasid) */
    uint32_t max_pasid;

    /* OUT: Maximum number of IOMMU contexts this domain can use. */
    uint16_t max_ctx_no;
};
typedef struct pv_iommu_capabilities pv_iommu_capabilities_t;
DEFINE_XEN_GUEST_HANDLE(pv_iommu_capabilities_t);

/**
 * Create a 1:1 identity mapped context to domain memory
 * (needs IOMMUCAP_identity).
 */
#define IOMMU_ALLOC_identity (1 << 0)

/**
 * IOMMU_alloc_context
 * Allocate an IOMMU context.
 * Fails with -ENOSPC if no context number is available.
 */
struct pv_iommu_alloc {
    /* OUT: allocated IOMMU context number */
    uint16_t ctx_no;

    /* IN: allocation flags */
    uint32_t alloc_flags;
};
typedef struct pv_iommu_alloc pv_iommu_alloc_t;
DEFINE_XEN_GUEST_HANDLE(pv_iommu_alloc_t);

/**
 * Move all devices to the default context before freeing the context.
 */
#define IOMMU_FREE_reattach

Re: [RFC XEN PATCH v3 5/5] xen/public: Introduce PV-IOMMU hypercall interface

2024-07-12 Thread Teddy Astie
Hello Jan,

Le 12/07/2024 à 12:46, Jan Beulich a écrit :
> On 11.07.2024 21:20, Alejandro Vallejo wrote:
>> On Thu Jul 11, 2024 at 3:04 PM BST, Teddy Astie wrote:
>>> --- /dev/null
>>> +++ b/xen/common/pv-iommu.c
>>> @@ -0,0 +1,328 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +/*
>>> + * xen/common/pv_iommu.c
>>> + *
>>> + * PV-IOMMU hypercall interface.
>>> + */
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +#define PVIOMMU_PREFIX "[PV-IOMMU] "
>>> +
>>> +#define PVIOMMU_MAX_PAGES 256 /* Move to Kconfig ? */
>>
>> It probably wants to be a cmdline argument, I think.
>
> For Dom0. For DomU-s it wants to be a guest config setting, I suppose. Then
> again I wonder if I understand the purpose of this correctly: The number looks
> surprisingly small if it was something the guest may use for arranging its
> mappings.
>
> Jan

Makes sense to be a guest setting for DomUs. I don't think this limit is
too small: it means we can map up to 1 MiB of contiguous memory in a
single hypercall, and in the guest case (e.g. Linux), it very rarely
goes beyond this limit.

I put this limit in place to prevent the guest from trying to map
millions of pages, which would take a while (and may cause stability
issues), and to actually give Xen a chance to preempt the guest (and
keep the ability to shut it down between two hypercalls).

Teddy






Re: [RFC XEN PATCH v3 5/5] xen/public: Introduce PV-IOMMU hypercall interface

2024-07-12 Thread Teddy Astie
ectures. I know this instance (and the others)
> happen to be aligned in both cases, but still.
> 

Makes sense

> That said, either these are 32bit frame numbers or otherwise we're better off
> using addresses instead.  It'd also be less confusing on systems with several
> page sizes, and it'd be specially less confusing if an additional parameter of
> "order" or "page size" was added here.
> 

It's something pretty similar to what Linux expects, and it is in fact
the guest's responsibility to ensure that addresses are aligned to the
page size and that the page size is actually supported.

While guest_addr (a replacement for gfn) could be a good fit for
phys_addr or something similar, I would keep dfn as uint64_t, as it's
perfectly valid to map to >4G areas on a 32-bit platform.
But we want to make sure that all guest memory can be represented as a
phys_addr (e.g. a 32-bit address may not be enough for an i686 guest
due to PAE).

>> +/* Number of pages to map */
>> +uint32_t nr_pages;
>> +/* Number of pages actually mapped after sub-op */
>> +uint32_t mapped;
> 
> A more helpful output might some kind of err_addr with the offending page. 
> Then
> the caller doesn't have to do arithmetic to print the error somewhere.
> 

This value is used because it is the kind of result Linux wants from an
IOMMU. Linux expects a return value which is the number of pages
actually mapped (if it's less than expected, it will retry, and if it
is zero, it assumes failure).

> Furthermore, seeing how this is something that should not happen I'd be 
> tempted
> to amend the spec to somehow roll back whatever we just did, but that could be
> tricky with big numbers in `nr_pages`.
> 

Being able to roll back may be tricky, as it requires keeping the
previous state. For instance, when doing unmap, you practically need to
back up parts of the pagetable, or do CoW on it, so you can revert in
case something goes wrong.
Otherwise, all modifications made directly on the pagetable may be
immediately effective if those entries are not cached by the IOTLB.

That's possible, but would first need more sophisticated pagetable
management logic.


---

I will try to make some changes to this interface, with some eventual
new features (even if unimplemented yet) and some of your proposals,
and publish it as an independent RFC.

Teddy





Re: [RFC XEN PATCH v3 1/5] docs/designs: Add a design document for PV-IOMMU

2024-07-12 Thread Teddy Astie
Hello Alejandro, thanks for the reply!

Le 11/07/2024 à 20:26, Alejandro Vallejo a écrit :
> Disclaimer: I haven't looked at the code yet.
>
> On Thu Jul 11, 2024 at 3:04 PM BST, Teddy Astie wrote:
>> Some operating systems want to use IOMMU to implement various features (e.g
>> VFIO) or DMA protection.
>> This patch introduce a proposal for IOMMU paravirtualization for Dom0.
>>
>> Signed-off-by Teddy Astie 
>> ---
>>   docs/designs/pv-iommu.md | 105 +++
>>   1 file changed, 105 insertions(+)
>>   create mode 100644 docs/designs/pv-iommu.md
>>
>> diff --git a/docs/designs/pv-iommu.md b/docs/designs/pv-iommu.md
>> new file mode 100644
>> index 00..c01062a3ad
>> --- /dev/null
>> +++ b/docs/designs/pv-iommu.md
>> @@ -0,0 +1,105 @@
>> +# IOMMU paravirtualization for Dom0
>> +
>> +Status: Experimental
>> +
>> +# Background
>> +
>> +By default, Xen only uses the IOMMU for itself, either to make device adress
>> +space coherent with guest adress space (x86 HVM/PVH) or to prevent devices
>> +from doing DMA outside it's expected memory regions including the hypervisor
>> +(x86 PV).
>
> "By default...": Do you mean "currently"?
>

Yes, that's what I mean by "default" here.

>> +
>> +[1] VFIO - "Virtual Function I/O" - 
>> https://www.kernel.org/doc/html/latest/driver-api/vfio.html
>> +
>> +# Design
>> +
>> +The operating system may want to have access to various IOMMU features such 
>> as
>> +context management and DMA remapping. We can create a new hypercall that 
>> allows
>> +the guest to have access to a new paravirtualized IOMMU interface.
>> +
>> +This feature is only meant to be available for the Dom0, as DomU have some
>> +emulated devices that can't be managed on Xen side and are not hardware, we
>> +can't rely on the hardware IOMMU to enforce DMA remapping.
>
> Is that the reason though? While it's true we can't mix emulated and real
> devices under the same emulated PCI bus covered by an IOMMU, nothing prevents 
> us
> from stating "the IOMMU(s) configured via PV-IOMMU cover from busN to busM".
>
> AFAIK, that already happens on systems with several IOMMUs, where they might
> affect partially disjoint devices. But I admit I'm no expert on this.
>
I am not an expert on how emulated devices are exposed, but the guest
will definitely need a way to know whether a device is hardware or not.

But I think the situation will differ between PV and HVM. In PV, there
are no emulated devices AFAIK, so no need for identification. In the
HVM case, there are, which we should consider.

There is still the question of interactions between eventual future
IOMMU emulation (VT-d with QEMU), which could be allowed to act on real
devices (e.g. by relying on the new IOMMU infrastructure), and PV-IOMMU.

> I can definitely see a lot of interesting use cases for a PV-IOMMU interface
> exposed to domUs (it'd be a subset of that of dom0, obviously); that'd
> allow them to use the IOMMU without resorting to 2-stage translation, which 
> has
> terrible IOTLB miss costs.
>

Makes sense; it could be useful e.g. for storage domains with SPDK
support. Do note that 2-stage IOMMU translation is only supported by
very modern hardware (e.g. 4th-gen Xeon Scalable).

>> +
>> +This interface is exposed under the `iommu_op` hypercall.
>> +
>> +In addition, Xen domains are modified in order to allow existence of several
>> +IOMMU context including a default one that implement default behavior (e.g
>> +hardware assisted paging) and can't be modified by guest. DomU cannot have
>> +contexts, and therefore act as if they only have the default domain.
>> +
>> +Each IOMMU context within a Xen domain is identified using a domain-specific
>> +context number that is used in the Xen IOMMU subsystem and the hypercall
>> +interface.
>> +
>> +The number of IOMMU context a domain can use is predetermined at domain 
>> creation
>> +and is configurable through `dom0-iommu=nb-ctx=N` xen cmdline.
>
> nit: I think it's more typical within Xen to see "nr" rather than "nb"
>

yes

>> +
>> +# IOMMU operations
>> +
>> +## Alloc context
>> +
>> +Create a new IOMMU context for the guest and return the context number to 
>> the
>> +guest.
>> +Fail if the IOMMU context limit of the guest is reached.
>
> or -ENOMEM, I guess.
>
> I'm guessing from this dom0 takes care of the contexts for guests? Or are 
> these
> contexts for use within dom0 exclusively?
>

Each domain has a set of "I

[RFC XEN PATCH v3 0/5] IOMMU subsystem redesign and PV-IOMMU interface

2024-07-11 Thread Teddy Astie
This work has been presented at Xen Summit 2024 during the
"IOMMU paravirtualization and Xen IOMMU subsystem rework" design session.

Operating systems may want to have access to an IOMMU in order to do DMA
protection or implement certain features (e.g. VFIO on Linux).

VFIO support is mandatory for frameworks such as SPDK, which can be useful to
implement an alternative storage backend for virtual machines [1].

In this patch series, we introduce in Xen the ability to manage several
contexts per domain and provide a new hypercall interface to allow guests
to manage IOMMU contexts.

The VT-d driver is updated to support these new features.

[1] Using SPDK with the Xen hypervisor - FOSDEM 2023

---
Changed in v2:
* fixed Xen crash when dumping IOMMU contexts (using the 'X' debug key)
  with DomUs without IOMMU
* s/dettach/detach/
* removed some unused includes
* fixed dangling devices in contexts on detach

Changed in v3:
* lock map/unmap entirely in the hypercall
* prevent IOMMU operations on dying contexts (fixes a race condition)
* iommu_check_context+iommu_get_context -> iommu_get_context and check for NULL

Teddy Astie (5):
  docs/designs: Add a design document for PV-IOMMU
  docs/designs: Add a design document for IOMMU subsystem redesign
  IOMMU: Introduce redesigned IOMMU subsystem
  VT-d: Port IOMMU driver to new subsystem
  xen/public: Introduce PV-IOMMU hypercall interface
---

 docs/designs/iommu-contexts.md   |  398 +++
 docs/designs/pv-iommu.md |  105 ++
 xen/arch/x86/domain.c|2 +-
 xen/arch/x86/include/asm/arena.h |   54 +
 xen/arch/x86/include/asm/iommu.h |   44 +-
 xen/arch/x86/include/asm/pci.h   |   17 -
 xen/arch/x86/mm/p2m-ept.c|2 +-
 xen/arch/x86/pv/dom0_build.c |4 +-
 xen/arch/x86/tboot.c |4 +-
 xen/common/Makefile  |1 +
 xen/common/memory.c  |4 +-
 xen/common/pv-iommu.c|  328 ++
 xen/drivers/passthrough/Kconfig  |   14 +
 xen/drivers/passthrough/Makefile |3 +
 xen/drivers/passthrough/context.c|  649 +++
 xen/drivers/passthrough/iommu.c  |  337 ++
 xen/drivers/passthrough/pci.c|   49 +-
 xen/drivers/passthrough/quarantine.c |   49 +
 xen/drivers/passthrough/vtd/Makefile |2 +-
 xen/drivers/passthrough/vtd/extern.h |   14 +-
 xen/drivers/passthrough/vtd/iommu.c  | 1557 +++---
 xen/drivers/passthrough/vtd/quirks.c |   21 +-
 xen/drivers/passthrough/x86/Makefile |1 +
 xen/drivers/passthrough/x86/arena.c  |  157 +++
 xen/drivers/passthrough/x86/iommu.c  |  104 +-
 xen/include/hypercall-defs.c |6 +
 xen/include/public/pv-iommu.h|  114 ++
 xen/include/public/xen.h |1 +
 xen/include/xen/iommu.h  |  120 +-
 xen/include/xen/pci.h|3 +
 30 files changed, 2855 insertions(+), 1309 deletions(-)
 create mode 100644 docs/designs/iommu-contexts.md
 create mode 100644 docs/designs/pv-iommu.md
 create mode 100644 xen/arch/x86/include/asm/arena.h
 create mode 100644 xen/common/pv-iommu.c
 create mode 100644 xen/drivers/passthrough/context.c
 create mode 100644 xen/drivers/passthrough/quarantine.c
 create mode 100644 xen/drivers/passthrough/x86/arena.c
 create mode 100644 xen/include/public/pv-iommu.h

-- 
2.45.2






[RFC XEN PATCH v3 5/5] xen/public: Introduce PV-IOMMU hypercall interface

2024-07-11 Thread Teddy Astie
Introduce a new PV interface to manage the underlying IOMMU, contexts and
devices. This interface allows creation of new contexts from Dom0 and
addition of IOMMU mappings from the guest's point of view.

This interface doesn't allow creation of mappings to other domains.

Signed-off-by: Teddy Astie 
---
Changed in V2:
* formatting

Changed in V3:
* prevent IOMMU operations on dying contexts
---
 xen/common/Makefile   |   1 +
 xen/common/pv-iommu.c | 328 ++
 xen/include/hypercall-defs.c  |   6 +
 xen/include/public/pv-iommu.h | 114 
 xen/include/public/xen.h  |   1 +
 5 files changed, 450 insertions(+)
 create mode 100644 xen/common/pv-iommu.c
 create mode 100644 xen/include/public/pv-iommu.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index f12a474d40..52ada89888 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -58,6 +58,7 @@ obj-y += wait.o
 obj-bin-y += warning.init.o
 obj-$(CONFIG_XENOPROF) += xenoprof.o
 obj-y += xmalloc_tlsf.o
+obj-y += pv-iommu.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma lzo unlzo unlz4 unzstd earlycpio,$(n).init.o)
 
diff --git a/xen/common/pv-iommu.c b/xen/common/pv-iommu.c
new file mode 100644
index 00..a94c0f1e1a
--- /dev/null
+++ b/xen/common/pv-iommu.c
@@ -0,0 +1,328 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * xen/common/pv_iommu.c
+ *
+ * PV-IOMMU hypercall interface.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PVIOMMU_PREFIX "[PV-IOMMU] "
+
+#define PVIOMMU_MAX_PAGES 256 /* Move to Kconfig ? */
+
+/* Allowed masks for each sub-operation */
+#define ALLOC_OP_FLAGS_MASK (0)
+#define FREE_OP_FLAGS_MASK (IOMMU_TEARDOWN_REATTACH_DEFAULT)
+
+static int get_paged_frame(struct domain *d, gfn_t gfn, mfn_t *mfn,
+   struct page_info **page, int readonly)
+{
+p2m_type_t p2mt;
+
+*page = get_page_from_gfn(d, gfn_x(gfn), &p2mt,
+ (readonly) ? P2M_ALLOC : P2M_UNSHARE);
+
+if ( !(*page) )
+{
+*mfn = INVALID_MFN;
+if ( p2m_is_shared(p2mt) )
+return -EINVAL;
+if ( p2m_is_paging(p2mt) )
+{
+p2m_mem_paging_populate(d, gfn);
+return -EIO;
+}
+
+return -EPERM;
+}
+
+*mfn = page_to_mfn(*page);
+
+return 0;
+}
+
+static int can_use_iommu_check(struct domain *d)
+{
+if ( !iommu_enabled )
+{
+printk(PVIOMMU_PREFIX "IOMMU is not enabled\n");
+return 0;
+}
+
+if ( !is_hardware_domain(d) )
+{
+printk(PVIOMMU_PREFIX "Non-hardware domain\n");
+return 0;
+}
+
+if ( !is_iommu_enabled(d) )
+{
+printk(PVIOMMU_PREFIX "IOMMU disabled for this domain\n");
+return 0;
+}
+
+return 1;
+}
+
+static long query_cap_op(struct pv_iommu_op *op, struct domain *d)
+{
+op->cap.max_ctx_no = d->iommu.other_contexts.count;
+op->cap.max_nr_pages = PVIOMMU_MAX_PAGES;
+op->cap.max_iova_addr = (1LLU << 39) - 1; /* TODO: hardcoded 39-bits */
+
+return 0;
+}
+
+static long alloc_context_op(struct pv_iommu_op *op, struct domain *d)
+{
+u16 ctx_no = 0;
+int status = 0;
+
+status = iommu_context_alloc(d, &ctx_no, op->flags & ALLOC_OP_FLAGS_MASK);
+
+if (status < 0)
+return status;
+
+printk("Created context %hu\n", ctx_no);
+
+op->ctx_no = ctx_no;
+return 0;
+}
+
+static long free_context_op(struct pv_iommu_op *op, struct domain *d)
+{
+return iommu_context_free(d, op->ctx_no,
+  IOMMU_TEARDOWN_PREEMPT | (op->flags & FREE_OP_FLAGS_MASK));
+}
+
+static long reattach_device_op(struct pv_iommu_op *op, struct domain *d)
+{
+struct physdev_pci_device dev = op->reattach_device.dev;
+device_t *pdev;
+
+pdev = pci_get_pdev(d, PCI_SBDF(dev.seg, dev.bus, dev.devfn));
+
+if ( !pdev )
+return -ENOENT;
+
+return iommu_reattach_context(d, d, pdev, op->ctx_no);
+}
+
+static long map_pages_op(struct pv_iommu_op *op, struct domain *d)
+{
+int ret = 0, flush_ret;
+struct page_info *page = NULL;
+mfn_t mfn;
+unsigned int flags;
+unsigned int flush_flags = 0;
+size_t i = 0;
+
+if ( op->map_pages.nr_pages > PVIOMMU_MAX_PAGES )
+return -E2BIG;
+
+if ( !iommu_check_context(d, op->ctx_no) )
+return -EINVAL;
+
+//printk("Mapping gfn:%lx-%lx to dfn:%lx-%lx on %hu\n",
+//   op->map_pages.gfn, op->map_pages.gfn + op->map_pages.nr_pages - 1,
+//   op->map_pages.dfn, op->map_pages.dfn + op->map_pages.nr_pages - 1,
+//   op->ctx_no);
+
+flags = 0;
+
+if ( op->flags & IOMMU_OP_readable )
+flags |= IOMMUF_readable;
+
+if ( op->flags & IOMMU_OP_writeable )
+flags |= IOM

[RFC XEN PATCH v3 2/5] docs/designs: Add a design document for IOMMU subsystem redesign

2024-07-11 Thread Teddy Astie
The current IOMMU subsystem has some limitations that make PV-IOMMU
practically impossible. One of them is the assumption that each domain
is bound to a single "IOMMU domain", which also causes complications
with the quarantine implementation.

Moreover, the current IOMMU subsystem is not entirely well-defined; for
instance, the behavior of map_page differs greatly between ARM SMMUv3
and x86 VT-d/AMD-Vi. On ARM, it can modify the domain page table, while
on x86, it may be forbidden (e.g. using HAP with PVH), or only modify
the device's PoV (e.g. using PV).

The goal of this redesign is to define the behavior and interface of
the IOMMU subsystem more explicitly, while allowing PV-IOMMU to be
effectively implemented.

Signed-off-by: Teddy Astie 
---
Changed in V2:
* nit s/dettach/detach/
---
 docs/designs/iommu-contexts.md | 398 +
 1 file changed, 398 insertions(+)
 create mode 100644 docs/designs/iommu-contexts.md

diff --git a/docs/designs/iommu-contexts.md b/docs/designs/iommu-contexts.md
new file mode 100644
index 00..8211f91692
--- /dev/null
+++ b/docs/designs/iommu-contexts.md
@@ -0,0 +1,398 @@
+# IOMMU context management in Xen
+
+Status: Experimental
+Revision: 0
+
+# Background
+
+The design for *IOMMU paravirtualization for Dom0* [1] explains that some
+guests may want to access IOMMU features. In order to implement this in
+Xen, several adjustments need to be made to the IOMMU subsystem.
+
+The "hardware IOMMU domain" is currently implemented on a per-domain basis,
+such that each domain has a specific *hardware IOMMU domain*. This design
+aims to allow a single Xen domain to manage several "IOMMU contexts", and
+to allow some domains (e.g. Dom0 [1]) to modify their IOMMU contexts.
+
+In addition to this, the quarantine feature can be refactored to use IOMMU
+contexts, reducing the complexity of platform-specific implementations and
+ensuring more consistency across platforms.
+
+# IOMMU context
+
+We define a "IOMMU context" as being a *hardware IOMMU domain*, but named as a 
context
+to avoid confusion with Xen domains.
+It represents some hardware-specific data structure that contains mappings 
from a device
+frame-number to a machine frame-number (e.g using a pagetable) that can be 
applied to
+a device using IOMMU hardware.
+
+This structure is bound to a Xen domain, but a Xen domain may have several 
IOMMU context.
+These contexts may be modifiable using the interface as defined in [1] aside 
some
+specific cases (e.g modifying default context).
+
+This is implemented in Xen as a new structure that will hold context-specific
+data.
+
+```c
+struct iommu_context {
+u16 id; /* Context id (0 means default context) */
+struct list_head devices;
+
+struct arch_iommu_context arch;
+
+bool opaque; /* context can't be modified nor accessed (e.g HAP) */
+};
+```
+
+A context is identified by a number that is domain-specific and may be
+used by IOMMU users such as PV-IOMMU in the guest.
+
+struct arch_iommu_context is split out of struct arch_iommu:
+
+```c
+struct arch_iommu_context
+{
+spinlock_t pgtables_lock;
+struct page_list_head pgtables;
+
+union {
+/* Intel VT-d */
+struct {
+uint64_t pgd_maddr; /* io page directory machine address */
+domid_t *didmap; /* per-iommu DID */
+unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the context uses */
+} vtd;
+/* AMD IOMMU */
+struct {
+struct page_info *root_table;
+} amd;
+};
+};
+
+struct arch_iommu
+{
+spinlock_t mapping_lock; /* io page table lock */
+struct {
+struct page_list_head list;
+spinlock_t lock;
+} pgtables;
+
+struct list_head identity_maps;
+
+union {
+/* Intel VT-d */
+struct {
+/* no more context-specific values */
+unsigned int agaw; /* adjusted guest address width, 0 is level 2 30-bit */
+} vtd;
+/* AMD IOMMU */
+struct {
+unsigned int paging_mode;
+struct guest_iommu *g_iommu;
+} amd;
+};
+};
+```
+
+IOMMU context information is now carried by iommu_context rather than
+being integrated into struct arch_iommu.
+
+# Xen domain IOMMU structure
+
+`struct domain_iommu` is modified to allow multiple contexts to exist
+within a single Xen domain:
+
+```c
+struct iommu_context_list {
+uint16_t count; /* Context count excluding default context */
+
+/* if count > 0 */
+
+uint64_t *bitmap; /* bitmap of context allocation */
+struct iommu_context *map; /* Map of contexts */
+};
+
+struct domain_iommu {
+/* ... */
+
+struct iommu_context default_ctx;
+struct iommu_context_list other_contexts;
+
+/* ... */
+}
+```
+
+default_ctx is a special context with id=0 that holds the page table
+mapping the entire domain, which basically preserves the p

[RFC XEN PATCH v3 4/5] VT-d: Port IOMMU driver to new subsystem

2024-07-11 Thread Teddy Astie
Port the driver following the guidance specified in iommu-contexts.md.

Add an arena-based allocator for allocating a fixed chunk of memory and
splitting it into 4k pages for use by the IOMMU contexts. This chunk size
is configurable with X86_ARENA_ORDER and dom0-iommu=arena-order=N.

Signed-off-by: Teddy Astie 
---
Changed in V2:
* cleanup some unneeded includes
* s/dettach/detach/
* don't dump IOMMU context of non-iommu domains (fix crash with DomUs)
---
 xen/arch/x86/include/asm/arena.h |   54 +
 xen/arch/x86/include/asm/iommu.h |   44 +-
 xen/arch/x86/include/asm/pci.h   |   17 -
 xen/drivers/passthrough/Kconfig  |   14 +
 xen/drivers/passthrough/vtd/Makefile |2 +-
 xen/drivers/passthrough/vtd/extern.h |   14 +-
 xen/drivers/passthrough/vtd/iommu.c  | 1557 +++---
 xen/drivers/passthrough/vtd/quirks.c |   21 +-
 xen/drivers/passthrough/x86/Makefile |1 +
 xen/drivers/passthrough/x86/arena.c  |  157 +++
 xen/drivers/passthrough/x86/iommu.c  |  104 +-
 11 files changed, 981 insertions(+), 1004 deletions(-)
 create mode 100644 xen/arch/x86/include/asm/arena.h
 create mode 100644 xen/drivers/passthrough/x86/arena.c

diff --git a/xen/arch/x86/include/asm/arena.h b/xen/arch/x86/include/asm/arena.h
new file mode 100644
index 00..7555b100e0
--- /dev/null
+++ b/xen/arch/x86/include/asm/arena.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/**
+ * Simple arena-based page allocator.
+ */
+
+#ifndef __XEN_IOMMU_ARENA_H__
+#define __XEN_IOMMU_ARENA_H__
+
+#include "xen/domain.h"
+#include "xen/atomic.h"
+#include "xen/mm-frame.h"
+#include "xen/types.h"
+
+/**
+ * struct page_arena: Page arena structure
+ */
+struct iommu_arena {
+/* mfn of the first page of the memory region */
+mfn_t region_start;
+/* bitmap of allocations */
+unsigned long *map;
+
+/* Order of the arena */
+unsigned int order;
+
+/* Used page count */
+atomic_t used_pages;
+};
+
+/**
+ * Initialize an arena using the domheap allocator.
+ * @param [out] arena Arena to initialize
+ * @param [in] domain domain that has ownership of the arena pages
+ * @param [in] order order of the arena (power of two of the size)
+ * @param [in] memflags Flags for domheap_alloc_pages()
+ * @return -ENOMEM on arena allocation error, 0 otherwise
+ */
+int iommu_arena_initialize(struct iommu_arena *arena, struct domain *domain,
+   unsigned int order, unsigned int memflags);
+
+/**
+ * Teardown an arena.
+ * @param [in] arena arena to teardown
+ * @param [in] check check for existing allocations
+ * @return -EBUSY if check is specified and pages are still allocated, 0 otherwise
+ */
+int iommu_arena_teardown(struct iommu_arena *arena, bool check);
+
+struct page_info *iommu_arena_allocate_page(struct iommu_arena *arena);
+bool iommu_arena_free_page(struct iommu_arena *arena, struct page_info *page);
+
+#define iommu_arena_size(arena) (1LLU << (arena)->order)
+
+#endif
diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/iommu.h
index 8dc464fbd3..8fb402f1ee 100644
--- a/xen/arch/x86/include/asm/iommu.h
+++ b/xen/arch/x86/include/asm/iommu.h
@@ -2,14 +2,18 @@
 #ifndef __ARCH_X86_IOMMU_H__
 #define __ARCH_X86_IOMMU_H__
 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 
+#include "arena.h"
+
 #define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
 
 struct g2m_ioport {
@@ -31,27 +35,48 @@ typedef uint64_t daddr_t;
 #define dfn_to_daddr(dfn) __dfn_to_daddr(dfn_x(dfn))
 #define daddr_to_dfn(daddr) _dfn(__daddr_to_dfn(daddr))
 
-struct arch_iommu
+struct arch_iommu_context
 {
-spinlock_t mapping_lock; /* io page table lock */
 struct {
 struct page_list_head list;
 spinlock_t lock;
 } pgtables;
 
-struct list_head identity_maps;
+/* Queue for freeing pages */
+struct page_list_head free_queue;
 
 union {
 /* Intel VT-d */
 struct {
 uint64_t pgd_maddr; /* io page directory machine address */
+domid_t *didmap; /* per-iommu DID */
+unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the context uses */
+bool duplicated_rmrr; /* tag indicating that duplicated rmrr mappings are mapped */
+uint32_t superpage_progress; /* superpage progress during teardown */
+} vtd;
+/* AMD IOMMU */
+struct {
+struct page_info *root_table;
+} amd;
+};
+};
+
+struct arch_iommu
+{
+spinlock_t lock; /* io page table lock */
+struct list_head identity_maps;
+
+struct iommu_arena pt_arena; /* allocator for non-default contexts */
+
+union {
+/* Intel VT-d */
+struct {
 unsigned int agaw; /* adjusted guest address width, 0 is level 2 30-bit */
-unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the domain 
uses */
 } vtd;
 /* AMD IOMMU */
 struct {
 unsigned

[RFC XEN PATCH v3 3/5] IOMMU: Introduce redesigned IOMMU subsystem

2024-07-11 Thread Teddy Astie
Based on docs/designs/iommu-contexts.md, implement the redesigned IOMMU 
subsystem.

Signed-off-by Teddy Astie 
---
Changed in V2:
* cleanup some unneeded includes
* fix dangling devices in context on detach

Changed in V3:
* add unlocked _iommu_lookup_page
* iommu_check_context+iommu_get_context -> iommu_get_context and check for NULL
* prevent IOMMU operations on dying contexts
---
 xen/arch/x86/domain.c|   2 +-
 xen/arch/x86/mm/p2m-ept.c|   2 +-
 xen/arch/x86/pv/dom0_build.c |   4 +-
 xen/arch/x86/tboot.c |   4 +-
 xen/common/memory.c  |   4 +-
 xen/drivers/passthrough/Makefile |   3 +
 xen/drivers/passthrough/context.c| 649 +++
 xen/drivers/passthrough/iommu.c  | 337 --
 xen/drivers/passthrough/pci.c|  49 +-
 xen/drivers/passthrough/quarantine.c |  49 ++
 xen/include/xen/iommu.h  | 120 -
 xen/include/xen/pci.h|   3 +
 12 files changed, 921 insertions(+), 305 deletions(-)
 create mode 100644 xen/drivers/passthrough/context.c
 create mode 100644 xen/drivers/passthrough/quarantine.c

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index ccadfe0c9e..1dd2453d71 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2405,7 +2405,7 @@ int domain_relinquish_resources(struct domain *d)
 
 PROGRESS(iommu_pagetables):
 
-ret = iommu_free_pgtables(d);
+ret = iommu_free_pgtables(d, iommu_default_context(d));
 if ( ret )
 return ret;
 
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 469e27ee93..80026a9cb9 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -975,7 +975,7 @@ out:
 rc = iommu_iotlb_flush(d, _dfn(gfn), 1ul << order,
(iommu_flags ? IOMMU_FLUSHF_added : 0) |
(vtd_pte_present ? IOMMU_FLUSHF_modified
-: 0));
+: 0), 0);
 else if ( need_iommu_pt_sync(d) )
 rc = iommu_flags ?
 iommu_legacy_map(d, _dfn(gfn), mfn, 1ul << order, iommu_flags) :
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index d8043fa58a..db7298737d 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -76,7 +76,7 @@ static __init void mark_pv_pt_pages_rdonly(struct domain *d,
  * iommu_memory_setup() ended up mapping them.
  */
 if ( need_iommu_pt_sync(d) &&
- iommu_unmap(d, _dfn(mfn_x(page_to_mfn(page))), 1, 0, flush_flags) )
+ iommu_unmap(d, _dfn(mfn_x(page_to_mfn(page))), 1, 0, flush_flags, 0) )
 BUG();
 
 /* Read-only mapping + PGC_allocated + page-table page. */
@@ -127,7 +127,7 @@ static void __init iommu_memory_setup(struct domain *d, const char *what,
 
 while ( (rc = iommu_map(d, _dfn(mfn_x(mfn)), mfn, nr,
 IOMMUF_readable | IOMMUF_writable | IOMMUF_preempt,
-flush_flags)) > 0 )
+flush_flags, 0)) > 0 )
 {
 mfn = mfn_add(mfn, rc);
 nr -= rc;
diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index ba0700d2d5..ca55306830 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -216,9 +216,9 @@ static void tboot_gen_domain_integrity(const uint8_t key[TB_KEY_SIZE],
 
 if ( is_iommu_enabled(d) && is_vtd )
 {
-const struct domain_iommu *dio = dom_iommu(d);
+struct domain_iommu *dio = dom_iommu(d);
 
-update_iommu_mac(, dio->arch.vtd.pgd_maddr,
+update_iommu_mac(, iommu_default_context(d)->arch.vtd.pgd_maddr,
  agaw_to_level(dio->arch.vtd.agaw));
 }
 }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index de2cc7ad92..0eb0f9da7b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -925,7 +925,7 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
 this_cpu(iommu_dont_flush_iotlb) = 0;
 
 ret = iommu_iotlb_flush(d, _dfn(xatp->idx - done), done,
-IOMMU_FLUSHF_modified);
+IOMMU_FLUSHF_modified, 0);
 if ( unlikely(ret) && rc >= 0 )
 rc = ret;
 
@@ -939,7 +939,7 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
 put_page(pages[i]);
 
 ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), done,
-IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified);
+IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified, 0);
 if ( unlikely(ret) && rc >= 0 )
 rc = ret;
 }
diff --git a/xen/drivers/passthroug

[RFC XEN PATCH v3 1/5] docs/designs: Add a design document for PV-IOMMU

2024-07-11 Thread Teddy Astie
Some operating systems want to use the IOMMU to implement various features (e.g.
VFIO) or DMA protection.
This patch introduces a proposal for IOMMU paravirtualization for Dom0.

Signed-off-by Teddy Astie 
---
 docs/designs/pv-iommu.md | 105 +++
 1 file changed, 105 insertions(+)
 create mode 100644 docs/designs/pv-iommu.md

diff --git a/docs/designs/pv-iommu.md b/docs/designs/pv-iommu.md
new file mode 100644
index 00..c01062a3ad
--- /dev/null
+++ b/docs/designs/pv-iommu.md
@@ -0,0 +1,105 @@
+# IOMMU paravirtualization for Dom0
+
+Status: Experimental
+
+# Background
+
+By default, Xen only uses the IOMMU for itself, either to make device address
+space coherent with guest address space (x86 HVM/PVH) or to prevent devices
+from doing DMA outside their expected memory regions, including the hypervisor
+(x86 PV).
+
+A limitation is that guests (especially privileged ones) may want to use
+IOMMU hardware in order to implement features such as DMA protection and
+VFIO [1], as IOMMU functionality is currently not available outside of the
+hypervisor.
+
+[1] VFIO - "Virtual Function I/O" - https://www.kernel.org/doc/html/latest/driver-api/vfio.html
+
+# Design
+
+The operating system may want to have access to various IOMMU features such as
+context management and DMA remapping. We can create a new hypercall that allows
+the guest to have access to a new paravirtualized IOMMU interface.
+
+This feature is only meant to be available for Dom0. As DomUs have some
+emulated devices that are not hardware and can't be managed on the Xen side,
+we can't rely on the hardware IOMMU to enforce DMA remapping for them.
+
+This interface is exposed under the `iommu_op` hypercall.
+
+In addition, Xen domains are modified in order to allow the existence of
+several IOMMU contexts, including a default one that implements the default
+behavior (e.g. hardware-assisted paging) and can't be modified by the guest.
+DomUs cannot have additional contexts, and therefore act as if they only have
+the default context.
+
+Each IOMMU context within a Xen domain is identified using a domain-specific
+context number that is used in the Xen IOMMU subsystem and the hypercall
+interface.
+
+The number of IOMMU contexts a domain can use is predetermined at domain
+creation and is configurable through the `dom0-iommu=nb-ctx=N` Xen command
+line option.
+
+# IOMMU operations
+
+## Alloc context
+
+Create a new IOMMU context for the guest and return the context number to the
+guest.
+Fail if the IOMMU context limit of the guest is reached.
+
+A flag can be specified to create an identity mapping.
+
+## Free context
+
+Destroy an IOMMU context created previously.
+It is not possible to free the default context.
+
+Reattach context devices to default context if specified by the guest.
+
+Fail if there is a device in the context and the reattach-to-default flag is
+not specified.
+
+## Reattach device
+
+Reattach a device to another IOMMU context (including the default one).
+The target IOMMU context number must be valid and the context allocated.
+
+The guest needs to specify a PCI SBDF of a device it has access to.
+
+## Map/unmap page
+
+Map/unmap a page on a context.
+The guest needs to specify a gfn and target dfn to map.
+
+Refuse to create the mapping if one already exists for the same dfn.
+
+## Lookup page
+
+Get the gfn mapped by a specific dfn.
+
+# Implementation considerations
+
+## Hypercall batching
+
+In order to prevent unneeded hypercalls and IOMMU flushing, it is advisable to
+be able to batch some critical IOMMU operations (e.g map/unmap multiple pages).
+
+## Hardware without IOMMU support
+
+The operating system needs to be aware of the PV-IOMMU capability, and of
+whether it is able to create contexts. However, some operating systems may
+critically fail if they are unable to create a new IOMMU context, which is
+supposed to happen when no IOMMU hardware is available.
+
+The hypercall interface needs a way to advertise the ability to create and
+manage IOMMU contexts, including the number of contexts the guest is able
+to use. Using this information, Dom0 may decide whether or not to use the
+PV-IOMMU interface.
+
+## Page pool for contexts
+
+In order to prevent a buggy Dom0 from unexpectedly starving the hypervisor
+of memory, we can preallocate the pages the contexts will use and make
+map/unmap use these pages instead of allocating them dynamically.
+
-- 
2.45.2



Teddy Astie | Vates XCP-ng Intern

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



Re: Disaggregated (Xoar) Dom0 Building

2024-06-28 Thread Teddy Astie
Hi Lonnie,

There are dedicated Matrix channels for chat
https://xenproject.org/help/matrix/

Teddy

Le 27/06/2024 à 15:38, Lonnie a écrit :
> Hi Teddy,
>
> You are actually on track with what I was thinking in this area which 
> initially gave me 2 main ideas:
>
> 1. Take the NOVA Microhypervisor (very small TCB at only 5K LOC) and try to 
> get QEMU or Bhyve integrated as the VMM which would require a huge amount of 
> development time.  The Genode framework has a configuration/compile approach 
> that uses NOVA with a custom VirtualBox, but I did not want to go that route.
>
> 2. Take the Alpine XEN distro as the base and then update the dated Xoar 
> patches which effectively breaks Dom0 into 9 Service and Driver Mini/Nano VMs 
> for which I was thinking about further setting them up as ultra-thin 
> Unikernels (MirageOS, IncludeOS, etc.) but still researching.
>
> My effort is to make a purely Ultra-Thin RAM-Based Xen Hypervisor that boots 
> UEFI for modern systems. Plus a number of other features if all goes well.
>
> Your ideas of QEMU as a Unikernel would probably really work for both XEN and 
> NOVA (with a bit of work on the NOVA side). I actually liked NOVA and 
> experimented with it a while back being able to produce a very lightweight 
> Microhypervisor ISO that would boot and do some simple things and even fire 
> up lightweight Linux instances but with very limited capabilities, of course, 
> just to see if it worked. Unfortunately, that direction although very 
> interesting, would definitely take too much development to make a viable and 
> more complete hypervisor.  I did like that you could easily start with no VM 
> and easily start one or more and then use Hot-Keys to flip between consoles. 
> That was pretty cool and is something that I would like to see about working 
> into this XEN effort as well maybe some config file in the Xen.efi directory 
> for that or something but am still thinking about it.
>
> I think that perhaps the Alpine-XEN-Xoar approach could be beneficial but 
> XEN, plus supporting libraries is still a bit larger than I would have hoped 
> although you get more capabilities and more of a solid hypervisor as well, I 
> think.  Maybe we can chat more about things if you like.
>
> Best,
> Lonnie
> On Thursday, June 27, 2024 14:38 CEST, Teddy Astie  
> wrote:
>
>> Hi Lonnie,
>>
>> Le 27/06/2024 à 11:33, Lonnie Cumberland a écrit :
>>> I am working towards is to have
>>> everything as a RAM-based ultra-lightweight thin hypervisor.   I looked
>>> over ACRN, the NOVA Microhypervisor (Headron, Beadrock Udo),
>>> Rust-Shyper, Bareflank-MicroV, and many other development efforts but it
>>> seems that Xen is the most advanced for my purposes here.
>>>
>>
>> You can have a disk-less (or ramdisk based) distro supporting Xen with
>> Alpine Linux (with Xen flavour). It does still use Dom0 with all its
>> responsibilities though.
>>
>>>>> Currently, I am investigating and researching the ideas of
>>>>> "Disaggregating" Dom0 and have the Xoar Xen patches ("Breaking Up is
>>>>> Hard to Do: Security and Functionality in a Commodity Hypervisor"
>>>>> 2011) available which were developed against version 22155 of
>>>>> xen-unstable. The Linux patches are against Linux with pvops
>>>>> 2.6.31.13 and developed on a standard Ubuntu 10.04 install. My effort
>>>>> would also be to update these patches.
>>>>>
>>>>> I have been able to locate the Xen "Dom0 Disaggregation"
>>>>> (https://wiki.xenproject.org/wiki/Dom0_Disaggregation) am reading up
>>>>> on things now but wanted to ask the developers list about any
>>>>> experience you may have had in this area since the research objective
>>>>> is to integrate Xoar with the latest Xen 4.20, if possible, and to
>>>>> take it further to basically eliminate Dom0 all together with
>>>>> individual Mini-OS or Unikernel "Service and Driver VM's" instead
>>>>> that are loaded at UEFI boot time.
>>
>> The latest stuff going on I have in mind regarding this idea of moving
>> stuff out of Dom0 is QEMU as Unikernel (using Unikraft), there were some
>> discussions on this in Matrix and at Xen Summit, and it's currently work
>> in progress from Unikraft side.
>>
>> Teddy
>>
>>
>> Teddy Astie | Vates XCP-ng Intern
>>
>> XCP-ng & Xen Orchestra - Vates solutions
>>
>> web: https://vates.tech
>>
>






Re: Disaggregated (Xoar) Dom0 Building

2024-06-27 Thread Teddy Astie
Hi Lonnie,

Le 27/06/2024 à 11:33, Lonnie Cumberland a écrit :
> I am working towards is to have
> everything as a RAM-based ultra-lightweight thin hypervisor.   I looked
> over ACRN, the NOVA Microhypervisor (Headron, Beadrock Udo),
> Rust-Shyper, Bareflank-MicroV, and many other development efforts but it
> seems that Xen is the most advanced for my purposes here.
>

You can have a disk-less (or ramdisk based) distro supporting Xen with
Alpine Linux (with Xen flavour). It does still use Dom0 with all its
responsibilities though.

>>> Currently, I am investigating and researching the ideas of
>>> "Disaggregating" Dom0 and have the Xoar Xen patches ("Breaking Up is
>>> Hard to Do: Security and Functionality in a Commodity Hypervisor"
>>> 2011) available which were developed against version 22155 of
>>> xen-unstable. The Linux patches are against Linux with pvops
>>> 2.6.31.13 and developed on a standard Ubuntu 10.04 install. My effort
>>> would also be to update these patches.
>>>
>>> I have been able to locate the Xen "Dom0 Disaggregation"
>>> (https://wiki.xenproject.org/wiki/Dom0_Disaggregation) am reading up
>>> on things now but wanted to ask the developers list about any
>>> experience you may have had in this area since the research objective
>>> is to integrate Xoar with the latest Xen 4.20, if possible, and to
>>> take it further to basically eliminate Dom0 all together with
>>> individual Mini-OS or Unikernel "Service and Driver VM's" instead
>>> that are loaded at UEFI boot time.

The latest stuff going on I have in mind regarding this idea of moving
stuff out of Dom0 is QEMU as Unikernel (using Unikraft), there were some
discussions on this in Matrix and at Xen Summit, and it's currently work
in progress from Unikraft side.

Teddy






Re: [RFC XEN PATCH v2 0/5] IOMMU subsystem redesign and PV-IOMMU interface

2024-06-26 Thread Teddy Astie
forgot to add that this patch series is rebased on top of staging

Teddy





Re: [RFC PATCH v2] iommu/xen: Add Xen PV-IOMMU driver

2024-06-26 Thread Teddy Astie
Hello Robin,

Le 26/06/2024 à 14:09, Robin Murphy a écrit :
> On 2024-06-24 3:36 pm, Teddy Astie wrote:
>> Hello Robin,
>> Thanks for the thorough review.
>>
>>>> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
>>>> index 0af39bbbe3a3..242cefac77c9 100644
>>>> --- a/drivers/iommu/Kconfig
>>>> +++ b/drivers/iommu/Kconfig
>>>> @@ -480,6 +480,15 @@ config VIRTIO_IOMMU
>>>>  Say Y here if you intend to run this kernel as a guest.
>>>> +config XEN_IOMMU
>>>> +    bool "Xen IOMMU driver"
>>>> +    depends on XEN_DOM0
>>>
>>> Clearly this depends on X86 as well.
>>>
>> Well, I don't intend this driver to be X86-only, even though the current
>> Xen RFC doesn't support ARM (yet). Unless there is a counter-indication
>> for it ?
>
> It's purely practical - even if you drop the asm/iommu.h stuff it would
> still break ARM DOM0 builds due to HYPERVISOR_iommu_op() only being
> defined for x86. And it's better to add a dependency here to make it
> clear what's *currently* supported, than to add dummy code to allow it
> to build for ARM if that's not actually tested or usable yet.
>

Ok, it does exist from the hypervisor side (even though such a Xen build
is not possible yet). I suppose I just need to add the relevant hypercall
interfaces for arm/arm64 in hypercall{.S,.h}.

>>>> +bool xen_iommu_capable(struct device *dev, enum iommu_cap cap)
>>>> +{
>>>> +    switch (cap) {
>>>> +    case IOMMU_CAP_CACHE_COHERENCY:
>>>> +    return true;
>>>
>>> Will the PV-IOMMU only ever be exposed on hardware where that really is
>>> always true?
>>>
>>
>> On the hypervisor side, the PV-IOMMU interface always implicitely flush
>> the IOMMU hardware on map/unmap operation, so at the end of the
>> hypercall, the cache should be always coherent IMO.
>
> As Jason already brought up, this is not about TLBs or anything cached
> by the IOMMU itself, it's about the memory type(s) it can create
> mappings with. Returning true here says Xen guarantees it can use a
> cacheable memory type which will let DMA snoop the CPU caches.
> Furthermore, not explicitly handling IOMMU_CACHE in the map_pages op
> then also implies that it will *always* do that, so you couldn't
> actually get an uncached mapping even if you wanted one.
>

Yes, this is a point we are currently discussing on the Xen side.

>>>> +    while (xen_pg_count) {
>>>> +    size_t to_unmap = min(xen_pg_count, max_nr_pages);
>>>> +
>>>> +    //pr_info("Unmapping %lx-%lx\n", dfn, dfn + to_unmap - 1);
>>>> +
>>>> +    op.unmap_pages.dfn = dfn;
>>>> +    op.unmap_pages.nr_pages = to_unmap;
>>>> +
>>>> +    ret = HYPERVISOR_iommu_op();
>>>> +
>>>> +    if (ret)
>>>> +    pr_warn("Unmap failure (%lx-%lx)\n", dfn, dfn + to_unmap
>>>> - 1);
>>>
>>> But then how
>>>> would it ever happen anyway? Unmap is a domain op, so a domain which
>>>> doesn't allow unmapping shouldn't offer it in the first place...
>>
>> Unmap failing should be exceptional, but is possible e.g. with
>> transparent superpages (like Xen IOMMU drivers do). Xen drivers fold
>> appropriate contiguous mappings into superpage entries to optimize
>> memory usage and iotlb. However, if you unmap in the middle of a region
>> covered by a superpage entry, this is no longer a valid superpage entry,
>> and you need to allocate and fill the lower levels, which is fallible
>> if lacking memory.
>
> OK, so in the worst case you could potentially have a partial unmap
> failure if the range crosses a superpage boundary and the end part
> happens to have been folded, and Xen doesn't detect and prepare that
> allocation until it's already unmapped up to the boundary. If that is
> so, does the hypercall interface give any information about partial
> failure, or can any error only be taken to mean that some or all of the
> given range may or may not have be unmapped now?

The hypercall interface has op.unmap_page.unmapped to indicate the real
number of pages actually unmapped, even in case of error. If it is less
than initially requested, the driver will retry with the remaining part,
and certainly fail afterward. Although, I have the impression that it is
going to fail silently.

>>> In this case I'd argue that you really *do* want to return short, in the
>>> hope of propagating the error back up and letting the caller know the
>>> addr

Re: [RFC PATCH v2] iommu/xen: Add Xen PV-IOMMU driver

2024-06-24 Thread Teddy Astie
Hello Robin,
Thanks for the thorough review.

>> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
>> index 0af39bbbe3a3..242cefac77c9 100644
>> --- a/drivers/iommu/Kconfig
>> +++ b/drivers/iommu/Kconfig
>> @@ -480,6 +480,15 @@ config VIRTIO_IOMMU
>>     Say Y here if you intend to run this kernel as a guest.
>> +config XEN_IOMMU
>> +    bool "Xen IOMMU driver"
>> +    depends on XEN_DOM0
>
> Clearly this depends on X86 as well.
>
Well, I don't intend this driver to be X86-only, even though the current
Xen RFC doesn't support ARM (yet). Unless there is a counter-indication
for it ?

>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>
> Please drop this; it's a driver, not a DMA ops implementation.
>
Sure, in addition to some other unneeded headers.

>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +MODULE_DESCRIPTION("Xen IOMMU driver");
>> +MODULE_AUTHOR("Teddy Astie ");
>> +MODULE_LICENSE("GPL");
>> +
>> +#define MSI_RANGE_START    (0xfee0)
>> +#define MSI_RANGE_END    (0xfeef)
>> +
>> +#define XEN_IOMMU_PGSIZES   (0x1000)
>> +
>> +struct xen_iommu_domain {
>> +    struct iommu_domain domain;
>> +
>> +    u16 ctx_no; /* Xen PV-IOMMU context number */
>> +};
>> +
>> +static struct iommu_device xen_iommu_device;
>> +
>> +static uint32_t max_nr_pages;
>> +static uint64_t max_iova_addr;
>> +
>> +static spinlock_t lock;
>
> Not a great name - usually it's good to name a lock after what it
> protects. Although perhaps it is already, since AFAICS this isn't
> actually used anywhere anyway.
>

Yes, that shouldn't be there.

>> +static inline struct xen_iommu_domain *to_xen_iommu_domain(struct
>> iommu_domain *dom)
>> +{
>> +    return container_of(dom, struct xen_iommu_domain, domain);
>> +}
>> +
>> +static inline u64 addr_to_pfn(u64 addr)
>> +{
>> +    return addr >> 12;
>> +}
>> +
>> +static inline u64 pfn_to_addr(u64 pfn)
>> +{
>> +    return pfn << 12;
>> +}
>> +
>> +bool xen_iommu_capable(struct device *dev, enum iommu_cap cap)
>> +{
>> +    switch (cap) {
>> +    case IOMMU_CAP_CACHE_COHERENCY:
>> +    return true;
>
> Will the PV-IOMMU only ever be exposed on hardware where that really is
> always true?
>

On the hypervisor side, the PV-IOMMU interface always implicitly flushes
the IOMMU hardware on map/unmap operations, so at the end of the
hypercall, the cache should always be coherent IMO.

>> +
>> +static struct iommu_device *xen_iommu_probe_device(struct device *dev)
>> +{
>> +    if (!dev_is_pci(dev))
>> +    return ERR_PTR(-ENODEV);
>> +
>> +    return _iommu_device;
>
> Even emulated PCI devices have to have an (emulated, presumably) IOMMU?
>

No and that's indeed one thing to check.

>> +}
>> +
>> +static void xen_iommu_probe_finalize(struct device *dev)
>> +{
>> +    set_dma_ops(dev, NULL);
>> +    iommu_setup_dma_ops(dev, 0, max_iova_addr);
>

That got changed in 6.10, good to know that this code is not required
anymore.

>
>> +}
>> +
>> +static int xen_iommu_map_pages(struct iommu_domain *domain, unsigned
>> long iova,
>> +   phys_addr_t paddr, size_t pgsize, size_t pgcount,
>> +   int prot, gfp_t gfp, size_t *mapped)
>> +{
>> +    size_t xen_pg_count = (pgsize / XEN_PAGE_SIZE) * pgcount;
>
> You only advertise the one page size, so you'll always get that back,
> and this seems a bit redundant.
>

Yes

>> +    struct xen_iommu_domain *dom = to_xen_iommu_domain(domain);
>> +    struct pv_iommu_op op = {
>> +    .subop_id = IOMMUOP_map_pages,
>> +    .flags = 0,
>> +    .ctx_no = dom->ctx_no
>> +    };
>> +    /* NOTE: paddr is actually bound to pfn, not gfn */
>> +    uint64_t pfn = addr_to_pfn(paddr);
>> +    uint64_t dfn = addr_to_pfn(iova);
>> +    int ret = 0;
>> +
>> +    //pr_info("Mapping to %lx %zu %zu paddr %x\n", iova, pgsize,
>> pgcount, paddr);
>
> Please try to clean up debugging leftovers before posting the patch (but
> also note that there are already t

Re: [RFC PATCH] iommu/xen: Add Xen PV-IOMMU driver

2024-06-21 Thread Teddy Astie
Hello Jason,

Le 19/06/2024 à 18:30, Jason Gunthorpe a écrit :
> On Thu, Jun 13, 2024 at 01:50:22PM +0000, Teddy Astie wrote:
>
>> +struct iommu_domain *xen_iommu_domain_alloc(unsigned type)
>> +{
>> +struct xen_iommu_domain *domain;
>> +u16 ctx_no;
>> +int ret;
>> +
>> +if (type & IOMMU_DOMAIN_IDENTITY) {
>> +/* use default domain */
>> +ctx_no = 0;
>
> Please use the new ops, domain_alloc_paging and the static identity domain.

Yes, in the v2, I will use this newer interface.

I have a question on this new interface: is it valid to not have an
identity domain (with the "default domain" being blocking)? Well, in the
current implementation it doesn't really matter, but at some point, we
may want to allow not having it (thus making this driver mandatory).

>
>> +static struct iommu_group *xen_iommu_device_group(struct device *dev)
>> +{
>> +if (!dev_is_pci(dev))
>> +return ERR_PTR(-ENODEV);
>> +
>
> device_group is only called after probe_device, since you already
> exclude !pci during probe there is no need for this wrapper, just set
> the op directly to pci_device_group.
>
>> +if (!dev_is_pci(dev))
>> +return;
>
> No op is ever called on a non-probed device, remove all these checks.
>
>
> A paging domain should be the only domain ops that have a populated
> map so this should be made impossible by construction
Makes sense, will remove these redundant checks in v2.

>
>> +static void xen_iommu_release_device(struct device *dev)
>> +{
>> +int ret;
>> +struct pci_dev *pdev;
>> +struct pv_iommu_op op = {
>> +.subop_id = IOMMUOP_reattach_device,
>> +.flags = 0,
>> +.ctx_no = 0 /* reattach device back to default context */
>> +};
>
> Consider if you can use release_domain for this, I think this is
> probably a BLOCKED domain behavior.

The goal is to put back all devices where they were at the beginning
(the default "context"), which is what release_domain looks like it is
doing. Will use it for v2.

>
> Jason

Teddy






Re: [RFC PATCH] iommu/xen: Add Xen PV-IOMMU driver

2024-06-17 Thread Teddy Astie
Le 13/06/2024 à 16:32, Jan Beulich a écrit :
> On 13.06.2024 15:50, Teddy Astie wrote:
>> @@ -214,6 +215,38 @@ struct xen_add_to_physmap_range {
>>   };
>>   DEFINE_GUEST_HANDLE_STRUCT(xen_add_to_physmap_range);
>>
>> +/*
>> + * With some legacy devices, certain guest-physical addresses cannot safely
>> + * be used for other purposes, e.g. to map guest RAM.  This hypercall
>> + * enumerates those regions so the toolstack can avoid using them.
>> + */
>> +#define XENMEM_reserved_device_memory_map   27
>> +struct xen_reserved_device_memory {
>> +xen_pfn_t start_pfn;
>> +xen_ulong_t nr_pages;
>> +};
>> +DEFINE_GUEST_HANDLE_STRUCT(xen_reserved_device_memory);
>> +
>> +struct xen_reserved_device_memory_map {
>> +#define XENMEM_RDM_ALL 1 /* Request all regions (ignore dev union). */
>> +/* IN */
>> +uint32_t flags;
>> +/*
>> + * IN/OUT
>> + *
>> + * Gets set to the required number of entries when too low,
>> + * signaled by error code -ERANGE.
>> + */
>> +unsigned int nr_entries;
>> +/* OUT */
>> +GUEST_HANDLE(xen_reserved_device_memory) buffer;
>> +/* IN */
>> +union {
>> +struct physdev_pci_device pci;
>> +} dev;
>> +};
>> +DEFINE_GUEST_HANDLE_STRUCT(xen_reserved_device_memory_map);
>
> This is a tools-only (i.e. unstable) sub-function in Xen; even the comment
> at the top says "toolstack". It is therefore not suitable for use in a
> kernel.
>
IMO this comment actually describes how the toolstack uses the
hypercall, but I don't think it is actually reserved for toolstack use.
Or maybe we should allow the kernel to use this hypercall as well.

>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-file-style: "BSD"
>> + * c-basic-offset: 4
>> + * tab-width: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>> \ No newline at end of file
>
> Nit: I'm pretty sure you want to avoid this.
>
Indeed.

Teddy






[RFC PATCH] iommu/xen: Add Xen PV-IOMMU driver

2024-06-13 Thread Teddy Astie
In the context of Xen, Linux runs as Dom0 and doesn't have access to the
machine IOMMU. However, an IOMMU is mandatory to use some kernel features
such as VFIO or DMA protection.

In Xen, we added a paravirtualized IOMMU with an iommu_op hypercall in order
to allow Dom0 to implement such features. This commit introduces a new
IOMMU driver that uses this new hypercall interface.

Signed-off-by Teddy Astie 
---
 arch/x86/include/asm/xen/hypercall.h |   6 +
 drivers/iommu/Kconfig|   9 +
 drivers/iommu/Makefile   |   1 +
 drivers/iommu/xen-iommu.c| 508 +++
 include/xen/interface/memory.h   |  33 ++
 include/xen/interface/pv-iommu.h | 114 ++
 include/xen/interface/xen.h  |   1 +
 7 files changed, 672 insertions(+)
 create mode 100644 drivers/iommu/xen-iommu.c
 create mode 100644 include/xen/interface/pv-iommu.h

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index a2dd24947eb8..6b1857f27c14 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -490,6 +490,12 @@ HYPERVISOR_xenpmu_op(unsigned int op, void *arg)
return _hypercall2(int, xenpmu_op, op, arg);
 }
 
+static inline int
+HYPERVISOR_iommu_op(void *arg)
+{
+   return _hypercall1(int, iommu_op, arg);
+}
+
 static inline int
 HYPERVISOR_dm_op(
domid_t dom, unsigned int nr_bufs, struct xen_dm_op_buf *bufs)
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 2b12b583ef4b..8d8a22b91e34 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -482,6 +482,15 @@ config VIRTIO_IOMMU
 
  Say Y here if you intend to run this kernel as a guest.
 
+config XEN_IOMMU
+   bool "Xen IOMMU driver"
+   depends on XEN_DOM0
+   select IOMMU_API
+   help
+   Xen PV-IOMMU driver for Dom0.
+
+   Say Y here if you intend to run this guest as Xen Dom0.
+
 config SPRD_IOMMU
tristate "Unisoc IOMMU Support"
depends on ARCH_SPRD || COMPILE_TEST
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 769e43d780ce..11fa258d3a04 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -30,3 +30,4 @@ obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
 obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o io-pgfault.o
 obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
 obj-$(CONFIG_APPLE_DART) += apple-dart.o
+obj-$(CONFIG_XEN_IOMMU) += xen-iommu.o
\ No newline at end of file
diff --git a/drivers/iommu/xen-iommu.c b/drivers/iommu/xen-iommu.c
new file mode 100644
index ..2c8e42240a6b
--- /dev/null
+++ b/drivers/iommu/xen-iommu.c
@@ -0,0 +1,508 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Xen PV-IOMMU driver.
+ *
+ * Copyright (C) 2024 Vates SAS
+ *
+ * Author: Teddy Astie 
+ *
+ */
+
+#define pr_fmt(fmt)"xen-iommu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_DESCRIPTION("Xen IOMMU driver");
+MODULE_AUTHOR("Teddy Astie ");
+MODULE_LICENSE("GPL");
+
+#define MSI_RANGE_START(0xfee0)
+#define MSI_RANGE_END  (0xfeef)
+
+#define XEN_IOMMU_PGSIZES   (0x1000)
+
+struct xen_iommu_domain {
+   struct iommu_domain domain;
+
+   u16 ctx_no; /* Xen PV-IOMMU context number */
+};
+
+static struct iommu_device xen_iommu_device;
+
+static uint32_t max_nr_pages;
+static uint64_t max_iova_addr;
+
+static spinlock_t lock;
+
+static inline struct xen_iommu_domain *to_xen_iommu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct xen_iommu_domain, domain);
+}
+
+static inline u64 addr_to_pfn(u64 addr)
+{
+   return addr >> 12;
+}
+
+static inline u64 pfn_to_addr(u64 pfn)
+{
+   return pfn << 12;
+}
+
+bool xen_iommu_capable(struct device *dev, enum iommu_cap cap)
+{
+   switch (cap) {
+   case IOMMU_CAP_CACHE_COHERENCY:
+   return true;
+
+   default:
+   return false;
+   }
+}
+
+struct iommu_domain *xen_iommu_domain_alloc(unsigned type)
+{
+   struct xen_iommu_domain *domain;
+   u16 ctx_no;
+   int ret;
+
+   if (type & IOMMU_DOMAIN_IDENTITY) {
+   /* use default domain */
+   ctx_no = 0;
+   } else {
+   struct pv_iommu_op op = {
+   .ctx_no = 0,
+   .flags = 0,
+   .subop_id = IOMMUOP_alloc_context
+   };
+
+   ret = HYPERVISOR_iommu_op(&op);
+
+   if (ret) {
+   pr_err("Unable to create Xen IOMMU context (%d)", ret);
+   return ERR_PTR(ret);
+   }
+
+   ctx_no = op.ctx_no;
+

[RFC XEN PATCH 4/5] VT-d: Port IOMMU driver to new subsystem

2024-06-13 Thread Teddy Astie
Port the driver following the guidance specified in iommu-contexts.md.

Add an arena-based allocator that allocates a fixed chunk of memory and
splits it into 4k pages for use by the IOMMU contexts. This chunk size
is configurable with X86_ARENA_ORDER and dom0-iommu=arena-order=N.

Signed-off-by: Teddy Astie 
---
Missing in this RFC

Preventing guest from mapping on top of reserved regions.
Reattach RMRR failure cleanup code is incomplete and can cause issues with a
subsequent teardown operation.
---
 xen/arch/x86/include/asm/arena.h |   54 +
 xen/arch/x86/include/asm/iommu.h |   44 +-
 xen/arch/x86/include/asm/pci.h   |   17 -
 xen/drivers/passthrough/Kconfig  |   14 +
 xen/drivers/passthrough/vtd/Makefile |2 +-
 xen/drivers/passthrough/vtd/extern.h |   14 +-
 xen/drivers/passthrough/vtd/iommu.c  | 1555 +++---
 xen/drivers/passthrough/vtd/quirks.c |   21 +-
 xen/drivers/passthrough/x86/Makefile |1 +
 xen/drivers/passthrough/x86/arena.c  |  157 +++
 xen/drivers/passthrough/x86/iommu.c  |  104 +-
 11 files changed, 980 insertions(+), 1003 deletions(-)
 create mode 100644 xen/arch/x86/include/asm/arena.h
 create mode 100644 xen/drivers/passthrough/x86/arena.c

diff --git a/xen/arch/x86/include/asm/arena.h b/xen/arch/x86/include/asm/arena.h
new file mode 100644
index 00..7555b100e0
--- /dev/null
+++ b/xen/arch/x86/include/asm/arena.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/**
+ * Simple arena-based page allocator.
+ */
+
+#ifndef __XEN_IOMMU_ARENA_H__
+#define __XEN_IOMMU_ARENA_H__
+
+#include "xen/domain.h"
+#include "xen/atomic.h"
+#include "xen/mm-frame.h"
+#include "xen/types.h"
+
+/**
+ * struct page_arena: Page arena structure
+ */
+struct iommu_arena {
+/* mfn of the first page of the memory region */
+mfn_t region_start;
+/* bitmap of allocations */
+unsigned long *map;
+
+/* Order of the arena */
+unsigned int order;
+
+/* Used page count */
+atomic_t used_pages;
+};
+
+/**
+ * Initialize an arena using the domheap allocator.
+ * @param [out] arena Arena to allocate
+ * @param [in] domain domain that has ownership of arena pages
+ * @param [in] order order of the arena (power of two of the size)
+ * @param [in] memflags Flags for domheap_alloc_pages()
+ * @return -ENOMEM on arena allocation error, 0 otherwise
+ */
+int iommu_arena_initialize(struct iommu_arena *arena, struct domain *domain,
+   unsigned int order, unsigned int memflags);
+
+/**
+ * Teardown an arena.
+ * @param [out] arena arena to teardown
+ * @param [in] check check for existing allocations
+ * @return -EBUSY if check is specified and pages are still allocated, 0 otherwise
+ */
+int iommu_arena_teardown(struct iommu_arena *arena, bool check);
+
+struct page_info *iommu_arena_allocate_page(struct iommu_arena *arena);
+bool iommu_arena_free_page(struct iommu_arena *arena, struct page_info *page);
+
+#define iommu_arena_size(arena) (1LLU << (arena)->order)
+
+#endif
diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/iommu.h
index 8dc464fbd3..8fb402f1ee 100644
--- a/xen/arch/x86/include/asm/iommu.h
+++ b/xen/arch/x86/include/asm/iommu.h
@@ -2,14 +2,18 @@
 #ifndef __ARCH_X86_IOMMU_H__
 #define __ARCH_X86_IOMMU_H__
 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 
+#include "arena.h"
+
 #define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
 
 struct g2m_ioport {
@@ -31,27 +35,48 @@ typedef uint64_t daddr_t;
 #define dfn_to_daddr(dfn) __dfn_to_daddr(dfn_x(dfn))
 #define daddr_to_dfn(daddr) _dfn(__daddr_to_dfn(daddr))
 
-struct arch_iommu
+struct arch_iommu_context
 {
-spinlock_t mapping_lock; /* io page table lock */
 struct {
 struct page_list_head list;
 spinlock_t lock;
 } pgtables;
 
-struct list_head identity_maps;
+/* Queue for freeing pages */
+struct page_list_head free_queue;
 
 union {
 /* Intel VT-d */
 struct {
 uint64_t pgd_maddr; /* io page directory machine address */
+domid_t *didmap; /* per-iommu DID */
+unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the context uses */
+bool duplicated_rmrr; /* tag indicating that duplicated rmrr mappings are mapped */
+uint32_t superpage_progress; /* superpage progress during teardown */
+} vtd;
+/* AMD IOMMU */
+struct {
+struct page_info *root_table;
+} amd;
+};
+};
+
+struct arch_iommu
+{
+spinlock_t lock; /* io page table lock */
+struct list_head identity_maps;
+
+struct iommu_arena pt_arena; /* allocator for non-default contexts */
+
+union {
+/* Intel VT-d */
+struct {
+unsigned int agaw; /* adjusted guest address width, 0 is level 2 30-bit */
-unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the domain uses */
 } vtd;
 /* AMD IOMMU

[RFC XEN PATCH 1/5] docs/designs: Add a design document for PV-IOMMU

2024-06-13 Thread Teddy Astie
Some operating systems want to use the IOMMU to implement various features (e.g.
VFIO) or DMA protection.
This patch introduces a proposal for IOMMU paravirtualization for Dom0.

Signed-off-by: Teddy Astie 
---
 docs/designs/pv-iommu.md | 105 +++
 1 file changed, 105 insertions(+)
 create mode 100644 docs/designs/pv-iommu.md

diff --git a/docs/designs/pv-iommu.md b/docs/designs/pv-iommu.md
new file mode 100644
index 00..c01062a3ad
--- /dev/null
+++ b/docs/designs/pv-iommu.md
@@ -0,0 +1,105 @@
+# IOMMU paravirtualization for Dom0
+
+Status: Experimental
+
+# Background
+
+By default, Xen only uses the IOMMU for itself, either to make the device address
+space coherent with the guest address space (x86 HVM/PVH) or to prevent devices
+from doing DMA outside their expected memory regions, including the hypervisor
+(x86 PV).
+
+A limitation is that guests (especially privileged ones) may want to use
+IOMMU hardware in order to implement features such as DMA protection and
+VFIO [1], as IOMMU functionality is currently not available outside of the
+hypervisor.
+
+[1] VFIO - "Virtual Function I/O" - https://www.kernel.org/doc/html/latest/driver-api/vfio.html
+
+# Design
+
+The operating system may want to have access to various IOMMU features such as
+context management and DMA remapping. We can create a new hypercall that allows
+the guest to have access to a new paravirtualized IOMMU interface.
+
+This feature is only meant to be available for Dom0: DomUs have some
+emulated devices that are not hardware and can't be managed on the Xen side,
+so we can't rely on the hardware IOMMU to enforce DMA remapping for them.
+
+This interface is exposed under the `iommu_op` hypercall.
+
+In addition, Xen domains are modified in order to allow the existence of several
+IOMMU contexts, including a default one that implements the default behavior (e.g.
+hardware-assisted paging) and can't be modified by the guest. DomUs cannot have
+extra contexts, and therefore act as if they only have the default context.
+
+Each IOMMU context within a Xen domain is identified using a domain-specific
+context number that is used in the Xen IOMMU subsystem and the hypercall
+interface.
+
+The number of IOMMU contexts a domain can use is predetermined at domain
+creation and is configurable through the `dom0-iommu=nb-ctx=N` Xen command
+line option.
+
+# IOMMU operations
+
+## Alloc context
+
+Create a new IOMMU context for the guest and return the context number to the
+guest.
+Fail if the IOMMU context limit of the guest is reached.
+
+A flag can be specified to create an identity mapping.
+
+## Free context
+
+Destroy an IOMMU context created previously.
+It is not possible to free the default context.
+
+Reattach the context's devices to the default context if specified by the guest.
+
+Fail if there is a device in the context and the reattach-to-default flag is not
+specified.
+
+## Reattach device
+
+Reattach a device to another IOMMU context (including the default one).
+The target IOMMU context number must be valid and the context allocated.
+
+The guest needs to specify a PCI SBDF of a device it has access to.
+
+## Map/unmap page
+
+Map/unmap a page on a context.
+The guest needs to specify a gfn and target dfn to map.
+
+Refuse to create the mapping if one already exists for the same dfn.
+
+## Lookup page
+
+Get the gfn mapped by a specific dfn.
+
+# Implementation considerations
+
+## Hypercall batching
+
+In order to prevent unneeded hypercalls and IOMMU flushing, it is advisable to
+be able to batch some critical IOMMU operations (e.g map/unmap multiple pages).
+
+## Hardware without IOMMU support
+
+Operating systems need to be aware of the PV-IOMMU capability, and whether they
+are able to create contexts. However, some operating systems may critically fail
+if they are unable to create a new IOMMU context, which is expected to happen
+when no IOMMU hardware is available.
+
+The hypercall interface needs a way to advertise the ability to create
+and manage IOMMU contexts, including the number of contexts the guest is able
+to use. Using this information, Dom0 may decide whether or not to use
+the PV-IOMMU interface.
+
+## Page pool for contexts
+
+In order to prevent unexpected starvation of hypervisor memory by a
+buggy Dom0, we can preallocate the pages the contexts will use and make
+map/unmap use these pages instead of allocating them dynamically.
+
-- 
2.45.2



Teddy Astie | Vates XCP-ng Intern

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



[RFC XEN PATCH 5/5] xen/public: Introduce PV-IOMMU hypercall interface

2024-06-13 Thread Teddy Astie
Introduce a new PV interface to manage the underlying IOMMU and to manage
contexts and devices. This interface allows creation of new contexts from Dom0
and addition of IOMMU mappings using the guest's PoV.

This interface doesn't allow creation of mappings to other domains.

Signed-off-by: Teddy Astie 
---
Missing in this RFC

Usage of PV-IOMMU inside DomU

Differences with Malcolm Crossley PV IOMMU RFC [1] :

The original PV-IOMMU interface exposes IOMMU subsystem operations to the guest,
so in some ways it still has the limitations of the Xen IOMMU subsystem. For
instance, all devices are bound to a single IOMMU domain, and this subsystem
can only modify a domain-wide one.
The main goal of the original implementation is to allow implementing vGPU by
mapping other guests into a device's address space (actually shared for all
devices of the domain).
This original implementation draft cannot work with PVH (due to the HAP P2M being
immutable from the IOMMU driver's point of view) and cannot be used to implement
the Linux IOMMU subsystem (due to the inability to create separate IOMMU domains).

This new proposal aims to support the Linux IOMMU subsystem from Dom0 (and DomU
in the future). It needs to allow creation and management of IOMMU domains (named
IOMMU contexts) separated from the "default context" on a per-domain basis.
There is no foreign mapping support (yet), and the emphasis is on uses of the
Linux IOMMU subsystem (e.g. DMA protection and VFIO).

[1] https://lore.kernel.org/all/1455099035-17649-2-git-send-email-malcolm.cross...@citrix.com/
---
 xen/common/Makefile   |   1 +
 xen/common/pv-iommu.c | 320 ++
 xen/include/hypercall-defs.c  |   6 +
 xen/include/public/pv-iommu.h | 114 
 xen/include/public/xen.h  |   1 +
 5 files changed, 442 insertions(+)
 create mode 100644 xen/common/pv-iommu.c
 create mode 100644 xen/include/public/pv-iommu.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index d512cad524..336c5ea143 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -57,6 +57,7 @@ obj-y += wait.o
 obj-bin-y += warning.init.o
 obj-$(CONFIG_XENOPROF) += xenoprof.o
 obj-y += xmalloc_tlsf.o
+obj-y += pv-iommu.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma lzo unlzo unlz4 unzstd earlycpio,$(n).init.o)
 
diff --git a/xen/common/pv-iommu.c b/xen/common/pv-iommu.c
new file mode 100644
index 00..844642ee54
--- /dev/null
+++ b/xen/common/pv-iommu.c
@@ -0,0 +1,320 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * xen/common/pv_iommu.c
+ *
+ * PV-IOMMU hypercall interface.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PVIOMMU_PREFIX "[PV-IOMMU] "
+
+#define PVIOMMU_MAX_PAGES 256 /* Move to Kconfig ? */
+
+/* Allowed masks for each sub-operation */
+#define ALLOC_OP_FLAGS_MASK (0)
+#define FREE_OP_FLAGS_MASK (IOMMU_TEARDOWN_REATTACH_DEFAULT)
+
+static int get_paged_frame(struct domain *d, gfn_t gfn, mfn_t *mfn,
+   struct page_info **page, int readonly)
+{
+p2m_type_t p2mt;
+
+*page = get_page_from_gfn(d, gfn_x(gfn), &p2mt,
+ (readonly) ? P2M_ALLOC : P2M_UNSHARE);
+
+if ( !(*page) )
+{
+*mfn = INVALID_MFN;
+if ( p2m_is_shared(p2mt) )
+return -EINVAL;
+if ( p2m_is_paging(p2mt) )
+{
+p2m_mem_paging_populate(d, gfn);
+return -EIO;
+}
+
+return -EPERM;
+}
+
+*mfn = page_to_mfn(*page);
+
+return 0;
+}
+
+static int can_use_iommu_check(struct domain *d)
+{
+if ( !iommu_enabled )
+{
+printk(PVIOMMU_PREFIX "IOMMU is not enabled\n");
+return 0;
+}
+
+if ( !is_hardware_domain(d) )
+{
+printk(PVIOMMU_PREFIX "Non-hardware domain\n");
+return 0;
+}
+
+if ( !is_iommu_enabled(d) )
+{
+printk(PVIOMMU_PREFIX "IOMMU disabled for this domain\n");
+return 0;
+}
+
+return 1;
+}
+
+static long query_cap_op(struct pv_iommu_op *op, struct domain *d)
+{
+op->cap.max_ctx_no = d->iommu.other_contexts.count;
+op->cap.max_nr_pages = PVIOMMU_MAX_PAGES;
+op->cap.max_iova_addr = (1LLU << 39) - 1; /* TODO: hardcoded 39-bits */
+
+return 0;
+}
+
+static long alloc_context_op(struct pv_iommu_op *op, struct domain *d)
+{
+u16 ctx_no = 0;
+int status = 0;
+
+status = iommu_context_alloc(d, &ctx_no, op->flags & ALLOC_OP_FLAGS_MASK);
+
+if (status < 0)
+return status;
+
+printk("Created context %hu\n", ctx_no);
+
+op->ctx_no = ctx_no;
+return 0;
+}
+
+static long free_context_op(struct pv_iommu_op *op, struct domain *d)
+{
+return iommu_context_free(d, op->ctx_no,
+  IOMMU_TEARDOWN_PREEMPT | (op->flags & FREE_OP_FLAGS_MASK));
+}
+
+static long reattach_device_op(stru

[RFC XEN PATCH 0/5] IOMMU subsystem redesign and PV-IOMMU interface

2024-06-13 Thread Teddy Astie
This work has been presented at Xen Summit 2024 during the
  IOMMU paravirtualization and Xen IOMMU subsystem rework
design session.

Operating systems may want to have access to an IOMMU in order to do DMA
protection or implement certain features (e.g. VFIO on Linux).

VFIO support is mandatory for frameworks such as SPDK, which can be useful to
implement an alternative storage backend for virtual machines [1].

In this patch series, we introduce in Xen the ability to manage several
contexts per domain and provide a new hypercall interface to allow guests
to manage IOMMU contexts.

The VT-d driver is updated to support these new features.

[1] Using SPDK with the Xen hypervisor - FOSDEM 2023

Teddy Astie (5):
  docs/designs: Add a design document for PV-IOMMU
  docs/designs: Add a design document for IOMMU subsystem redesign
  IOMMU: Introduce redesigned IOMMU subsystem
  VT-d: Port IOMMU driver to new subsystem
  xen/public: Introduce PV-IOMMU hypercall interface

 docs/designs/iommu-contexts.md   |  398 +++
 docs/designs/pv-iommu.md |  105 ++
 xen/arch/x86/domain.c|2 +-
 xen/arch/x86/include/asm/arena.h |   54 +
 xen/arch/x86/include/asm/iommu.h |   44 +-
 xen/arch/x86/include/asm/pci.h   |   17 -
 xen/arch/x86/mm/p2m-ept.c|2 +-
 xen/arch/x86/pv/dom0_build.c |4 +-
 xen/arch/x86/tboot.c |4 +-
 xen/common/Makefile  |1 +
 xen/common/memory.c  |4 +-
 xen/common/pv-iommu.c|  320 ++
 xen/drivers/passthrough/Kconfig  |   14 +
 xen/drivers/passthrough/Makefile |3 +
 xen/drivers/passthrough/context.c|  626 +++
 xen/drivers/passthrough/iommu.c  |  333 ++
 xen/drivers/passthrough/pci.c|   49 +-
 xen/drivers/passthrough/quarantine.c |   49 +
 xen/drivers/passthrough/vtd/Makefile |2 +-
 xen/drivers/passthrough/vtd/extern.h |   14 +-
 xen/drivers/passthrough/vtd/iommu.c  | 1555 +++---
 xen/drivers/passthrough/vtd/quirks.c |   21 +-
 xen/drivers/passthrough/x86/Makefile |1 +
 xen/drivers/passthrough/x86/arena.c  |  157 +++
 xen/drivers/passthrough/x86/iommu.c  |  104 +-
 xen/include/hypercall-defs.c |6 +
 xen/include/public/pv-iommu.h|  114 ++
 xen/include/public/xen.h |1 +
 xen/include/xen/iommu.h  |  118 +-
 xen/include/xen/pci.h|3 +
 30 files changed, 2822 insertions(+), 1303 deletions(-)
 create mode 100644 docs/designs/iommu-contexts.md
 create mode 100644 docs/designs/pv-iommu.md
 create mode 100644 xen/arch/x86/include/asm/arena.h
 create mode 100644 xen/common/pv-iommu.c
 create mode 100644 xen/drivers/passthrough/context.c
 create mode 100644 xen/drivers/passthrough/quarantine.c
 create mode 100644 xen/drivers/passthrough/x86/arena.c
 create mode 100644 xen/include/public/pv-iommu.h

-- 
2.45.2






[RFC XEN PATCH 3/5] IOMMU: Introduce redesigned IOMMU subsystem

2024-06-13 Thread Teddy Astie
Based on docs/designs/iommu-contexts.md, implement the redesigned IOMMU subsystem.

Signed-off-by: Teddy Astie 
---
Missing in this RFC

Quarantine implementation is incomplete
Automatic determination of max ctx_no (maximum IOMMU context count) based
on PCI device count.
Automatic determination of max ctx_no (for dom_io).
Empty/no default IOMMU context mode (UEFI IOMMU based boot).
Support for DomU (and configuration using e.g libxl).

---
 xen/arch/x86/domain.c|   2 +-
 xen/arch/x86/mm/p2m-ept.c|   2 +-
 xen/arch/x86/pv/dom0_build.c |   4 +-
 xen/arch/x86/tboot.c |   4 +-
 xen/common/memory.c  |   4 +-
 xen/drivers/passthrough/Makefile |   3 +
 xen/drivers/passthrough/context.c| 626 +++
 xen/drivers/passthrough/iommu.c  | 333 --
 xen/drivers/passthrough/pci.c|  49 ++-
 xen/drivers/passthrough/quarantine.c |  49 +++
 xen/include/xen/iommu.h  | 118 -
 xen/include/xen/pci.h|   3 +
 12 files changed, 897 insertions(+), 300 deletions(-)
 create mode 100644 xen/drivers/passthrough/context.c
 create mode 100644 xen/drivers/passthrough/quarantine.c

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 00a3aaa576..52de634c81 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2381,7 +2381,7 @@ int domain_relinquish_resources(struct domain *d)
 
 PROGRESS(iommu_pagetables):
 
-ret = iommu_free_pgtables(d);
+ret = iommu_free_pgtables(d, iommu_default_context(d));
 if ( ret )
 return ret;
 
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index f83610cb8c..94c3631818 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -970,7 +970,7 @@ out:
 rc = iommu_iotlb_flush(d, _dfn(gfn), 1ul << order,
(iommu_flags ? IOMMU_FLUSHF_added : 0) |
(vtd_pte_present ? IOMMU_FLUSHF_modified
-: 0));
+: 0), 0);
 else if ( need_iommu_pt_sync(d) )
 rc = iommu_flags ?
 iommu_legacy_map(d, _dfn(gfn), mfn, 1ul << order, iommu_flags) :
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index d8043fa58a..db7298737d 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -76,7 +76,7 @@ static __init void mark_pv_pt_pages_rdonly(struct domain *d,
  * iommu_memory_setup() ended up mapping them.
  */
 if ( need_iommu_pt_sync(d) &&
- iommu_unmap(d, _dfn(mfn_x(page_to_mfn(page))), 1, 0, flush_flags) )
+ iommu_unmap(d, _dfn(mfn_x(page_to_mfn(page))), 1, 0, flush_flags, 0) )
 BUG();
 
 /* Read-only mapping + PGC_allocated + page-table page. */
@@ -127,7 +127,7 @@ static void __init iommu_memory_setup(struct domain *d, const char *what,
 
 while ( (rc = iommu_map(d, _dfn(mfn_x(mfn)), mfn, nr,
 IOMMUF_readable | IOMMUF_writable | IOMMUF_preempt,
-flush_flags)) > 0 )
+flush_flags, 0)) > 0 )
 {
 mfn = mfn_add(mfn, rc);
 nr -= rc;
diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index ba0700d2d5..ca55306830 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -216,9 +216,9 @@ static void tboot_gen_domain_integrity(const uint8_t key[TB_KEY_SIZE],
 
 if ( is_iommu_enabled(d) && is_vtd )
 {
-const struct domain_iommu *dio = dom_iommu(d);
+struct domain_iommu *dio = dom_iommu(d);
 
-update_iommu_mac(&ctx, dio->arch.vtd.pgd_maddr,
+update_iommu_mac(&ctx, iommu_default_context(d)->arch.vtd.pgd_maddr,
  agaw_to_level(dio->arch.vtd.agaw));
 }
 }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index de2cc7ad92..0eb0f9da7b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -925,7 +925,7 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
 this_cpu(iommu_dont_flush_iotlb) = 0;
 
 ret = iommu_iotlb_flush(d, _dfn(xatp->idx - done), done,
-IOMMU_FLUSHF_modified);
+IOMMU_FLUSHF_modified, 0);
 if ( unlikely(ret) && rc >= 0 )
 rc = ret;
 
@@ -939,7 +939,7 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
 put_page(pages[i]);
 
 ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), done,
-IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified);
+IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified, 0);
 if ( unlikely(ret) && rc >= 0 )
  

[RFC XEN PATCH 2/5] docs/designs: Add a design document for IOMMU subsystem redesign

2024-06-13 Thread Teddy Astie
The current IOMMU subsystem has some limitations that make PV-IOMMU practically
impossible. One of them is the assumption that each domain is bound to a single
"IOMMU domain", which also causes complications with the quarantine
implementation.

Moreover, the current IOMMU subsystem is not entirely well-defined; for instance,
the behavior of map_page greatly differs between ARM SMMUv3 and x86 VT-d/AMD-Vi.
On ARM, it can modify the domain page table, while on x86 it may be forbidden
(e.g. using HAP with PVH), or it may only modify the devices' point of view
(e.g. using PV).

The goal of this redesign is to define the behavior and interface of the
IOMMU subsystem more explicitly, while allowing PV-IOMMU to be effectively
implemented.

Signed-off-by: Teddy Astie 
---
 docs/designs/iommu-contexts.md | 398 +
 1 file changed, 398 insertions(+)
 create mode 100644 docs/designs/iommu-contexts.md

diff --git a/docs/designs/iommu-contexts.md b/docs/designs/iommu-contexts.md
new file mode 100644
index 00..41bc701bf2
--- /dev/null
+++ b/docs/designs/iommu-contexts.md
@@ -0,0 +1,398 @@
+# IOMMU context management in Xen
+
+Status: Experimental
+Revision: 0
+
+# Background
+
+The design for *IOMMU paravirtualization for Dom0* [1] explains that some guests
+may want to access IOMMU features. In order to implement this in Xen, several
+adjustments need to be made to the IOMMU subsystem.
+
+The "hardware IOMMU domain" is currently implemented on a per-domain basis, such
+that each domain actually has a specific *hardware IOMMU domain*. This design
+aims to allow a single Xen domain to manage several "IOMMU contexts", and to
+allow some domains (e.g. Dom0 [1]) to modify their IOMMU contexts.
+
+In addition to this, the quarantine feature can be refactored to use IOMMU
+contexts, reducing the complexity of platform-specific implementations and
+ensuring more consistency across platforms.
+
+# IOMMU context
+
+We define an "IOMMU context" as being a *hardware IOMMU domain*, but name it a
+context to avoid confusion with Xen domains.
+It represents some hardware-specific data structure that contains mappings from a
+device frame-number to a machine frame-number (e.g. using a pagetable) that can
+be applied to a device using IOMMU hardware.
+
+This structure is bound to a Xen domain, but a Xen domain may have several IOMMU
+contexts. These contexts may be modifiable using the interface defined in [1],
+aside from some specific cases (e.g. modifying the default context).
+
+This is implemented in Xen as a new structure that will hold context-specific
+data.
+
+```c
+struct iommu_context {
+u16 id; /* Context id (0 means default context) */
+struct list_head devices;
+
+struct arch_iommu_context arch;
+
+bool opaque; /* context can't be modified nor accessed (e.g HAP) */
+};
+```
+
+A context is identified by a domain-specific number that may be used by IOMMU
+users such as the guest through PV-IOMMU.
+
+struct arch_iommu_context is split out from struct arch_iommu:
+
+```c
+struct arch_iommu_context
+{
+spinlock_t pgtables_lock;
+struct page_list_head pgtables;
+
+union {
+/* Intel VT-d */
+struct {
+uint64_t pgd_maddr; /* io page directory machine address */
+domid_t *didmap; /* per-iommu DID */
+unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the context uses */
+} vtd;
+/* AMD IOMMU */
+struct {
+struct page_info *root_table;
+} amd;
+};
+};
+
+struct arch_iommu
+{
+spinlock_t mapping_lock; /* io page table lock */
+struct {
+struct page_list_head list;
+spinlock_t lock;
+} pgtables;
+
+struct list_head identity_maps;
+
+union {
+/* Intel VT-d */
+struct {
+/* no more context-specific values */
+unsigned int agaw; /* adjusted guest address width, 0 is level 2 30-bit */
+} vtd;
+/* AMD IOMMU */
+struct {
+unsigned int paging_mode;
+struct guest_iommu *g_iommu;
+} amd;
+};
+};
+```
+
+IOMMU context information is now carried by iommu_context rather than being
+integrated into struct arch_iommu.
+
+# Xen domain IOMMU structure
+
+`struct domain_iommu` is modified to allow multiple contexts to exist within a
+single Xen domain:
+
+```c
+struct iommu_context_list {
+uint16_t count; /* Context count excluding default context */
+
+/* if count > 0 */
+
+uint64_t *bitmap; /* bitmap of context allocation */
+struct iommu_context *map; /* Map of contexts */
+};
+
+struct domain_iommu {
+/* ... */
+
+struct iommu_context default_ctx;
+struct iommu_context_list other_contexts;
+
+/* ... */
+}
+```
+
+default_ctx is a special context with id=0 that holds the page table mapping
+the entire domain, which basically preserves the previous behavior. All devices
+are expected to be
+

Re: [XEN PATCH] x86/iommu: Conditionally compile platform-specific union entries

2024-05-23 Thread Teddy Astie
Le 23/05/2024 à 11:52, Roger Pau Monné a écrit :
> The #ifdef and #endif processor directives shouldn't be indented.
>
> Would you mind adding /* CONFIG_{AMD,INTEL}_IOMMU */ comments in the
> #endif directives?
>

Sure, will change it for v2.

> I wonder if we could move the definitions of those structures to the
> vendor specific headers, but that's more convoluted, and would require
> including the iommu headers in pci.h

Do you mean moving the vtd/amd union entries to separate structures (e.g.
vtd_arch_iommu) and putting them into another file? (I don't see any
vendor-specific headers for this; perhaps create new ones?)

>
> Thanks, Roger.

Teddy






[XEN PATCH] x86/iommu: Conditionally compile platform-specific union entries

2024-05-23 Thread Teddy Astie
If some platform driver isn't compiled in, remove its related union
entries as they are not used.

Signed-off-by: Teddy Astie 
---
 xen/arch/x86/include/asm/iommu.h | 4 
 xen/arch/x86/include/asm/pci.h   | 4 
 2 files changed, 8 insertions(+)

diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/iommu.h
index 8dc464fbd3..99180940c4 100644
--- a/xen/arch/x86/include/asm/iommu.h
+++ b/xen/arch/x86/include/asm/iommu.h
@@ -42,17 +42,21 @@ struct arch_iommu
 struct list_head identity_maps;
 
 union {
+#ifdef CONFIG_INTEL_IOMMU
 /* Intel VT-d */
 struct {
 uint64_t pgd_maddr; /* io page directory machine address */
 unsigned int agaw; /* adjusted guest address width, 0 is level 2 30-bit */
 unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the domain uses */
 } vtd;
+#endif
+#ifdef CONFIG_AMD_IOMMU
 /* AMD IOMMU */
 struct {
 unsigned int paging_mode;
 struct page_info *root_table;
 } amd;
+#endif
 };
 };
 
diff --git a/xen/arch/x86/include/asm/pci.h b/xen/arch/x86/include/asm/pci.h
index fd5480d67d..842710f0dc 100644
--- a/xen/arch/x86/include/asm/pci.h
+++ b/xen/arch/x86/include/asm/pci.h
@@ -22,12 +22,16 @@ struct arch_pci_dev {
  */
 union {
 /* Subset of struct arch_iommu's fields, to be used in dom_io. */
+#ifdef CONFIG_INTEL_IOMMU
 struct {
 uint64_t pgd_maddr;
 } vtd;
+#endif
+#ifdef CONFIG_AMD_IOMMU
 struct {
 struct page_info *root_table;
 } amd;
+#endif
 };
 domid_t pseudo_domid;
 mfn_t leaf_mfn;
-- 
2.45.1






Re: [XEN PATCH 2/2] x86/msr: add suffix 'U' to MSR_AMD_CSTATE_CFG macro.

2024-04-24 Thread Teddy Astie
Le 24/04/2024 à 14:11, Alessandro Zucchelli a écrit :
> This addresses violations of MISRA C:2012 Rule 7.2 which states as
> following: A “u” or “U” suffix shall be applied to all integer constants
> that are represented in an unsigned type.
>
> No functional change.
>
> Signed-off-by: Alessandro Zucchelli 
> ---
>   xen/arch/x86/include/asm/msr-index.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/include/asm/msr-index.h 
> b/xen/arch/x86/include/asm/msr-index.h
> index 92dd9fa496..9cdb5b2625 100644
> --- a/xen/arch/x86/include/asm/msr-index.h
> +++ b/xen/arch/x86/include/asm/msr-index.h
> @@ -236,7 +236,7 @@
>
>   #define MSR_VIRT_SPEC_CTRL  _AC(0xc001011f, U) /* Layout 
> matches MSR_SPEC_CTRL */
>
> -#define MSR_AMD_CSTATE_CFG  0xc0010296
> +#define MSR_AMD_CSTATE_CFG  0xc0010296U
>
>   /*
>* Legacy MSR constants in need of cleanup.  No new MSRs below this comment.

Hello, thanks for the patches

I wonder if the same approach should also be taken for all the other similar
MSR constants in this file?

Teddy






[RFC XEN PATCH v1] xen/public: Add initial files for PV-IOMMU

2024-04-23 Thread Teddy Astie
g these informations, the Dom0 may decide whether to use or not
+the PV-IOMMU interface.
+
+## Page pool for contexts
+
+In order to prevent unexpected starvation of hypervisor memory by a
+buggy Dom0, we can preallocate the pages the contexts will use and make
+map/unmap use these pages instead of allocating them dynamically.
+
diff --git a/xen/include/public/pv-iommu.h b/xen/include/public/pv-iommu.h
new file mode 100644
index 00..45f9c44eb1
--- /dev/null
+++ b/xen/include/public/pv-iommu.h
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: MIT */
+/**
+ * pv-iommu.h
+ *
+ * Paravirtualized IOMMU driver interface.
+ *
+ * Copyright (c) 2024 Teddy Astie 
+ */
+
+#ifndef __XEN_PUBLIC_PV_IOMMU_H__
+#define __XEN_PUBLIC_PV_IOMMU_H__
+
+#include "xen.h"
+#include "physdev.h"
+
+#define IOMMU_DEFAULT_CONTEXT (0)
+
+/**
+ * Query PV-IOMMU capabilities for this domain.
+ */
+#define IOMMUOP_query_capabilities1
+
+/**
+ * Allocate an IOMMU context, the new context handle will be written to ctx_no.
+ */
+#define IOMMUOP_alloc_context 2
+
+/**
+ * Destroy a IOMMU context.
+ * All devices attached to this context are reattached to default context.
+ *
+ * The default context can't be destroyed (0).
+ */
+#define IOMMUOP_free_context  3
+
+/**
+ * Reattach the device to IOMMU context.
+ */
+#define IOMMUOP_reattach_device   4
+
+#define IOMMUOP_map_pages 5
+#define IOMMUOP_unmap_pages   6
+
+/**
+ * Get the GFN associated to a specific DFN.
+ */
+#define IOMMUOP_lookup_page   7
+
+struct pv_iommu_op {
+uint16_t subop_id;
+uint16_t ctx_no;
+
+/**
+ * Create a context that is cloned from default.
+ * The new context will be populated with 1:1 mappings covering the entire guest memory.
+ */
+#define IOMMU_CREATE_clone (1 << 0)
+
+#define IOMMU_OP_readable (1 << 0)
+#define IOMMU_OP_writeable (1 << 1)
+uint32_t flags;
+
+union {
+struct {
+uint64_t gfn;
+uint64_t dfn;
+/* Number of pages to map */
+uint32_t nr_pages;
+/* Number of pages actually mapped after sub-op */
+uint32_t mapped;
+} map_pages;
+
+struct {
+uint64_t dfn;
+/* Number of pages to unmap */
+uint32_t nr_pages;
+/* Number of pages actually unmapped after sub-op */
+uint32_t unmapped;
+} unmap_pages;
+
+struct {
+struct physdev_pci_device dev;
+} reattach_device;
+
+struct {
+uint64_t gfn;
+uint64_t dfn;
+} lookup_page;
+
+struct {
+/* Maximum number of IOMMU contexts this domain can use. */
+uint16_t max_ctx_no;
+/* Maximum number of pages that can be modified in a single map/unmap operation. */
+uint32_t max_nr_pages;
+/* Maximum device address (iova) that the guest can use for mappings. */
+uint64_t max_iova_addr;
+} cap;
+};
+};
+
+typedef struct pv_iommu_op pv_iommu_op_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_op_t);
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
\ No newline at end of file
--
2.44.0



Teddy Astie | Vates XCP-ng Intern

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech




[RFC XEN v1] docs/designs: Introduce IOMMU context management

2024-04-23 Thread Teddy Astie
+- on the default context, reuse the domain_id (the default context is unique
+  per domain)
+- on non-default contexts, use an id allocated in the pseudo_domid map
+  (actually used by quarantine), which is a DID outside of the Xen domain_id
+  range
+
+### AMD-Vi
+
+TODO
+
+## Device-tree platforms
+
+### SMMU and SMMUv3
+
+TODO
+
+* * *
+
+[1] See pv-iommu.md
+
+[2] pci: phantom functions assigned to incorrect contexts
+https://xenbits.xen.org/xsa/advisory-449.html
--
2.44.0







[XEN PATCH v5 3/3] x86/iommu: Disable interrupt remapping if cx16 is not supported

2024-04-18 Thread Teddy Astie
All hardware with VT-d/AMD-Vi has CMPXCHG16B support.  Check this at
initialisation time, and remove the effectively-dead logic for the non-cx16
case.

Suggested-by: Andrew Cooper 
Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/amd/iommu_intr.c |  6 ++
 xen/drivers/passthrough/vtd/intremap.c   | 70 +++-
 xen/drivers/passthrough/vtd/iommu.c  |  7 +--
 3 files changed, 27 insertions(+), 56 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_intr.c 
b/xen/drivers/passthrough/amd/iommu_intr.c
index 7fc796dec2..9ab7c68749 100644
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -649,6 +649,12 @@ bool __init cf_check iov_supports_xt(void)
 if ( !iommu_enable || !iommu_intremap )
 return false;
 
+if ( unlikely(!cpu_has_cx16) )
+{
+AMD_IOMMU_WARN("CPU doesn't support CMPXCHG16B, disable interrupt remapping\n");
+return false;
+}
+
 if ( amd_iommu_prepare(true) )
 return false;
 
diff --git a/xen/drivers/passthrough/vtd/intremap.c 
b/xen/drivers/passthrough/vtd/intremap.c
index c504852eb8..7d4d907b4f 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -150,6 +150,13 @@ bool __init cf_check intel_iommu_supports_eim(void)
 if ( !iommu_qinval || !iommu_intremap || list_empty(&acpi_drhd_units) )
 return false;
 
+if ( unlikely(!cpu_has_cx16) )
+{
+printk(XENLOG_WARNING VTDPREFIX
+   "CPU doesn't support CMPXCHG16B, disable interrupt remapping\n");
+return false;
+}
+
 /* We MUST have a DRHD unit for each IOAPIC. */
 for ( apic = 0; apic < nr_ioapics; apic++ )
 if ( !ioapic_to_drhd(IO_APIC_ID(apic)) )
@@ -173,47 +180,26 @@ bool __init cf_check intel_iommu_supports_eim(void)
  * Assume iremap_lock has been acquired. It is to make sure software will not
  * change the same IRTE behind us. With this assumption, if only high qword or
  * low qword in IRTE is to be updated, this function's atomic variant can
- * present an atomic update to VT-d hardware even when cmpxchg16b
- * instruction is not supported.
+ * present an atomic update to VT-d hardware.
  */
 static void update_irte(struct vtd_iommu *iommu, struct iremap_entry *entry,
 const struct iremap_entry *new_ire, bool atomic)
 {
-ASSERT(spin_is_locked(&iommu->intremap.lock));
+__uint128_t ret;
+struct iremap_entry old_ire;
 
-if ( cpu_has_cx16 )
-{
-__uint128_t ret;
-struct iremap_entry old_ire;
+ASSERT(spin_is_locked(&iommu->intremap.lock));
 
-old_ire = *entry;
-ret = cmpxchg16b(entry, &old_ire, new_ire);
+old_ire = *entry;
+ret = cmpxchg16b(entry, &old_ire, new_ire);
 
-/*
- * In the above, we use cmpxchg16 to atomically update the 128-bit
- * IRTE, and the hardware cannot update the IRTE behind us, so
- * the return value of cmpxchg16 should be the same as old_ire.
- * This ASSERT validate it.
- */
-ASSERT(ret == old_ire.val);
-}
-else
-{
-/*
- * VT-d hardware doesn't update IRTEs behind us, nor the software
- * since we hold iremap_lock. If the caller wants VT-d hardware to
- * always see a consistent entry, but we can't meet it, a bug will
- * be raised.
- */
-if ( entry->lo == new_ire->lo )
-write_atomic(&entry->hi, new_ire->hi);
-else if ( entry->hi == new_ire->hi )
-write_atomic(&entry->lo, new_ire->lo);
-else if ( !atomic )
-*entry = *new_ire;
-else
-BUG();
-}
+/*
+ * In the above, we use cmpxchg16 to atomically update the 128-bit
+ * IRTE, and the hardware cannot update the IRTE behind us, so
+ * the return value of cmpxchg16 should be the same as old_ire.
+ * This ASSERT validates it.
+ */
+ASSERT(ret == old_ire.val);
 }
 
 /* Mark specified intr remap entry as free */
@@ -395,7 +381,6 @@ static int ioapic_rte_to_remap_entry(struct vtd_iommu 
*iommu,
 /* Indicate remap format. */
 remap_rte->format = 1;
 
-/* If cmpxchg16b is not available the caller must mask the IO-APIC pin. */
 update_irte(iommu, iremap_entry, &new_ire, !init && !masked);
 iommu_sync_cache(iremap_entry, sizeof(*iremap_entry));
 iommu_flush_iec_index(iommu, 0, index);
@@ -437,21 +422,6 @@ void cf_check io_apic_write_remap_rte(
 bool masked = true;
 int rc;
 
-if ( !cpu_has_cx16 )
-{
-   /*
-* Cannot atomically update the IRTE entry: mask the IO-APIC pin to
-* avoid interrupts seeing an inconsistent IRTE entry.
-*/
-old_rte = __ioapic_read_entry(apic, pin, true);
-if ( !old_rte.mask )
-{
-masked = false;
-old_rte.mask = 1;
-__ioapic_write_entry(apic, pin, true, old_rte);
-   

[XEN PATCH v5 0/3] x86/iommu: Drop IOMMU support when cx16 isn't supported

2024-04-18 Thread Teddy Astie
All existing hardware that supports VT-d/AMD-Vi also supports cx16 (aside
from specifically crafted virtual machines).

Some IOMMU code paths in Xen consider cases where VT-d/AMD-Vi is supported
while cx16 isn't; those paths may be buggy, are barely tested, and are dead
code in practice.

Disable IOMMU in case we have IOMMU hardware but no cx16, then cleanup
no-cx16 handling logic from VT-d and AMD-Vi drivers. Also disable
interrupt remapping that also relies on cx16.

Teddy

---

Changed in v2:

  * Added cleanup no-cx16 code for x2APIC
  * Fixed commit and code formatting
  * Added missing Suggested-by note

Changed in v3:

  * Use -ENODEV instead of -ENOSYS.

Changed in v4:

  * Reworded "Disable IOMMU if cx16 isn't supported"
  * Moved interrupt remapping cleanup in separate patches
  * Check cx16 for interrupt remapping in driver's callbacks rather than 
in x2apic_bsp_setup

Changed in v5:

 * Folded VT-d/AMD-Vi cx16 cleanup patches

Teddy Astie (3):
  x86/iommu: Disable IOMMU if cx16 isn't supported
  VT-d: Cleanup MAP_SINGLE_DEVICE and related code
  x86/iommu: Disable interrupt remapping if cx16 is not supported

 xen/drivers/passthrough/amd/iommu_intr.c|  6 ++
 xen/drivers/passthrough/amd/iommu_map.c | 42 --
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  6 ++
 xen/drivers/passthrough/vtd/intremap.c  | 70 +---
 xen/drivers/passthrough/vtd/iommu.c | 92 +++--
 xen/drivers/passthrough/vtd/vtd.h   |  5 +-
 6 files changed, 77 insertions(+), 144 deletions(-)

-- 
2.44.0






[XEN PATCH v5 2/3] VT-d: Cleanup MAP_SINGLE_DEVICE and related code

2024-04-18 Thread Teddy Astie
This flag was only used when cx16 was not available; as those code paths no
longer exist, this flag now does basically nothing.

Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/vtd/iommu.c | 12 +++-
 xen/drivers/passthrough/vtd/vtd.h   |  5 ++---
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 9c787ba9eb..7c6bae0256 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1692,15 +1692,9 @@ static int domain_context_mapping(struct domain *domain, 
u8 devfn,
 break;
 }
 
-if ( domain != pdev->domain && pdev->domain != dom_io )
-{
-if ( pdev->domain->is_dying )
-mode |= MAP_OWNER_DYING;
-else if ( drhd &&
-  !any_pdev_behind_iommu(pdev->domain, pdev, drhd->iommu) &&
-  !pdev->phantom_stride )
-mode |= MAP_SINGLE_DEVICE;
-}
+if ( domain != pdev->domain && pdev->domain != dom_io &&
+ pdev->domain->is_dying )
+mode |= MAP_OWNER_DYING;
 
 switch ( pdev->type )
 {
diff --git a/xen/drivers/passthrough/vtd/vtd.h 
b/xen/drivers/passthrough/vtd/vtd.h
index cb2df76eed..43f06a353d 100644
--- a/xen/drivers/passthrough/vtd/vtd.h
+++ b/xen/drivers/passthrough/vtd/vtd.h
@@ -28,9 +28,8 @@
  */
 #define MAP_WITH_RMRR (1u << 0)
 #define MAP_OWNER_DYING   (1u << 1)
-#define MAP_SINGLE_DEVICE (1u << 2)
-#define MAP_ERROR_RECOVERY(1u << 3)
-#define UNMAP_ME_PHANTOM_FUNC (1u << 4)
+#define MAP_ERROR_RECOVERY(1u << 2)
+#define UNMAP_ME_PHANTOM_FUNC (1u << 3)
 
 /* Allow for both IOAPIC and IOSAPIC. */
 #define IO_xAPIC_route_entry IO_APIC_route_entry
-- 
2.44.0






[XEN PATCH v5 1/3] x86/iommu: Disable IOMMU if cx16 isn't supported

2024-04-18 Thread Teddy Astie
All hardware with VT-d/AMD-Vi has CMPXCHG16B support. Check this at
initialisation time, and remove the effectively-dead logic for the
non-cx16 case.

Suggested-by: Andrew Cooper 
Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/amd/iommu_map.c | 42 
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  6 ++
 xen/drivers/passthrough/vtd/iommu.c | 73 +++--
 3 files changed, 45 insertions(+), 76 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_map.c 
b/xen/drivers/passthrough/amd/iommu_map.c
index e0f4fe736a..f67975e700 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -167,15 +167,14 @@ int amd_iommu_set_root_page_table(struct amd_iommu_dte 
*dte,
 {
 bool valid = flags & SET_ROOT_VALID;
 
-if ( dte->v && dte->tv &&
- (cpu_has_cx16 || (flags & SET_ROOT_WITH_UNITY_MAP)) )
+if ( dte->v && dte->tv )
 {
 union {
 struct amd_iommu_dte dte;
 uint64_t raw64[4];
 __uint128_t raw128[2];
 } ldte = { .dte = *dte };
-__uint128_t old = ldte.raw128[0];
+__uint128_t res, old = ldte.raw128[0];
 int ret = 0;
 
 ldte.dte.domain_id = domain_id;
@@ -185,33 +184,20 @@ int amd_iommu_set_root_page_table(struct amd_iommu_dte 
*dte,
 ldte.dte.paging_mode = paging_mode;
 ldte.dte.v = valid;
 
-if ( cpu_has_cx16 )
-{
-__uint128_t res = cmpxchg16b(dte, &old, &ldte.raw128[0]);
+res = cmpxchg16b(dte, &old, &ldte.raw128[0]);
 
-/*
- * Hardware does not update the DTE behind our backs, so the
- * return value should match "old".
- */
-if ( res != old )
-{
-printk(XENLOG_ERR
-   "Dom%d: unexpected DTE %016lx_%016lx (expected %016lx_%016lx)\n",
-   domain_id,
-   (uint64_t)(res >> 64), (uint64_t)res,
-   (uint64_t)(old >> 64), (uint64_t)old);
-ret = -EILSEQ;
-}
-}
-else /* Best effort, updating domain_id last. */
+/*
+ * Hardware does not update the DTE behind our backs, so the
+ * return value should match "old".
+ */
+if ( res != old )
 {
-uint64_t *ptr = (void *)dte;
-
-write_atomic(ptr + 0, ldte.raw64[0]);
-/* No barrier should be needed between these two. */
-write_atomic(ptr + 1, ldte.raw64[1]);
-
-ret = 1;
+printk(XENLOG_ERR
+   "Dom%d: unexpected DTE %016lx_%016lx (expected %016lx_%016lx)\n",
+   domain_id,
+   (uint64_t)(res >> 64), (uint64_t)res,
+   (uint64_t)(old >> 64), (uint64_t)old);
+ret = -EILSEQ;
 }
 
 return ret;
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index f6efd88e36..3a6a23f706 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -305,6 +305,12 @@ static int __init cf_check iov_detect(void)
 if ( !iommu_enable && !iommu_intremap )
 return 0;
 
+if ( unlikely(!cpu_has_cx16) )
+{
+printk("AMD-Vi: CPU doesn't support CMPXCHG16B, disabling\n");
+return -ENODEV;
+}
+
 if ( (init_done ? amd_iommu_init_late()
 : amd_iommu_init(false)) != 0 )
 {
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index c7110af7c9..9c787ba9eb 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1482,7 +1482,7 @@ int domain_context_mapping_one(
 {
 struct domain_iommu *hd = dom_iommu(domain);
 struct context_entry *context, *context_entries, lctxt;
-__uint128_t old;
+__uint128_t res, old;
 uint64_t maddr;
 uint16_t seg = iommu->drhd->segment, prev_did = 0;
 struct domain *prev_dom = NULL;
@@ -1580,55 +1580,23 @@ int domain_context_mapping_one(
 ASSERT(!context_fault_disable(lctxt));
 }
 
-if ( cpu_has_cx16 )
-{
-__uint128_t res = cmpxchg16b(context, &old, &lctxt);
-
-/*
- * Hardware does not update the context entry behind our backs,
- * so the return value should match "old".
- */
-if ( res != old )
-{
-if ( pdev )
-check_cleanup_domid_map(domain, pdev, iommu);
-printk(XENLOG_ERR
-   "%pp: unexpected context entry %016lx_%016lx (expected %016lx_%016lx)\n",
-   &PCI_SBDF(seg, bus, devfn),
-   (uint64_t)(res >> 64), (uint64_t)res,
-   (uint64_t)(old >> 64), (uint64_t)old);
-  

Re: [XEN PATCH v1 15/15] x86/hvm: make AMD-V and Intel VT-x support configurable

2024-04-16 Thread Teddy Astie
Hello Sergiy,

> Also make INTEL_IOMMU/AMD_IOMMU options dependant on VMX/SVM options.

The discussion in the RFC series stated that the IOMMU may be used with PV
guests, and doesn't rely on VMX/SVM support (in fact, it can be used
without HVM support in Xen).

However, in the discussion, posted interrupts were supposed to depend on
VMX/SVM instead. I suppose this is a leftover from the original RFC?

> https://lore.kernel.org/xen-devel/e29e375f-3d30-0eb1-7e28-b93f2d831...@suse.com/

Teddy

---





Re: [XEN PATCH v4 0/5] x86/iommu: Drop IOMMU support when cx16 isn't supported

2024-04-15 Thread Teddy Astie
On 15/04/2024 at 14:15, Teddy Astie wrote:
> All hardware that supports VT-d/AMD-Vi that exists also supports cx16 (aside
> specifically crafted virtual machines).
>
> Some IOMMU code paths in Xen consider cases where VT-d/AMD-Vi is supported
> while cx16 isn't, those paths may be bugged and are barely tested, dead code
> in practice.
>
> Disable IOMMU in case we have IOMMU hardware but no cx16, then cleanup
> no-cx16 handling logic from VT-d and AMD-Vi drivers. Also disable
> interrupt remapping that also relies on cx16.
>
> Teddy Astie (5):
>VT-d: Disable IOMMU if cx16 isn't supported
>AMD-Vi: Disable IOMMU if cx16 isn't supported
>VT-d: Cleanup MAP_SINGLE_DEVICE and related code
>VT-d: Disable interrupt remapping if cx16 is not supported
>AMD-Vi: Disable interrupt remapping if cx16 is not supported
>
>   xen/drivers/passthrough/amd/iommu_intr.c|  6 ++
>   xen/drivers/passthrough/amd/iommu_map.c | 42 --
>   xen/drivers/passthrough/amd/pci_amd_iommu.c |  6 ++
>   xen/drivers/passthrough/vtd/intremap.c  | 70 +---
>   xen/drivers/passthrough/vtd/iommu.c | 92 +++--
>   xen/drivers/passthrough/vtd/vtd.h   |  5 +-
>   6 files changed, 77 insertions(+), 144 deletions(-)
>

Here is the patch history that, for some reason, got lost from this cover letter.

Changed in v2:

  * Added cleanup no-cx16 code for x2APIC
  * Fixed commit and code formatting
  * Added missing Suggested-by note

Changed in v3:

  * Use -ENODEV instead of -ENOSYS.

Changed in v4:

  * Reworded "Disable IOMMU if cx16 isn't supported"
  * Moved interrupt remapping cleanup in separate patches
  * Check cx16 for interrupt remapping in driver's callbacks rather than
in x2apic_bsp_setup

Teddy

---






[XEN PATCH v4 2/5] AMD-Vi: Disable IOMMU if cx16 isn't supported

2024-04-15 Thread Teddy Astie
All hardware with AMD-Vi has CMPXCHG16B support. Check this at initialisation
time, and remove the effectively-dead logic for the non-cx16 case.

Suggested-by: Andrew Cooper 
Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/amd/iommu_map.c | 42 +++--
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  6 +++
 2 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_map.c 
b/xen/drivers/passthrough/amd/iommu_map.c
index e0f4fe736a..f67975e700 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -167,15 +167,14 @@ int amd_iommu_set_root_page_table(struct amd_iommu_dte 
*dte,
 {
 bool valid = flags & SET_ROOT_VALID;
 
-if ( dte->v && dte->tv &&
- (cpu_has_cx16 || (flags & SET_ROOT_WITH_UNITY_MAP)) )
+if ( dte->v && dte->tv )
 {
 union {
 struct amd_iommu_dte dte;
 uint64_t raw64[4];
 __uint128_t raw128[2];
 } ldte = { .dte = *dte };
-__uint128_t old = ldte.raw128[0];
+__uint128_t res, old = ldte.raw128[0];
 int ret = 0;
 
 ldte.dte.domain_id = domain_id;
@@ -185,33 +184,20 @@ int amd_iommu_set_root_page_table(struct amd_iommu_dte 
*dte,
 ldte.dte.paging_mode = paging_mode;
 ldte.dte.v = valid;
 
-if ( cpu_has_cx16 )
-{
-__uint128_t res = cmpxchg16b(dte, &old, &ldte.raw128[0]);
+res = cmpxchg16b(dte, &old, &ldte.raw128[0]);
 
-/*
- * Hardware does not update the DTE behind our backs, so the
- * return value should match "old".
- */
-if ( res != old )
-{
-printk(XENLOG_ERR
-   "Dom%d: unexpected DTE %016lx_%016lx (expected %016lx_%016lx)\n",
-   domain_id,
-   (uint64_t)(res >> 64), (uint64_t)res,
-   (uint64_t)(old >> 64), (uint64_t)old);
-ret = -EILSEQ;
-}
-}
-else /* Best effort, updating domain_id last. */
+/*
+ * Hardware does not update the DTE behind our backs, so the
+ * return value should match "old".
+ */
+if ( res != old )
 {
-uint64_t *ptr = (void *)dte;
-
-write_atomic(ptr + 0, ldte.raw64[0]);
-/* No barrier should be needed between these two. */
-write_atomic(ptr + 1, ldte.raw64[1]);
-
-ret = 1;
+printk(XENLOG_ERR
+   "Dom%d: unexpected DTE %016lx_%016lx (expected %016lx_%016lx)\n",
+   domain_id,
+   (uint64_t)(res >> 64), (uint64_t)res,
+   (uint64_t)(old >> 64), (uint64_t)old);
+ret = -EILSEQ;
 }
 
 return ret;
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index f6efd88e36..3a6a23f706 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -305,6 +305,12 @@ static int __init cf_check iov_detect(void)
 if ( !iommu_enable && !iommu_intremap )
 return 0;
 
+if ( unlikely(!cpu_has_cx16) )
+{
+printk("AMD-Vi: CPU doesn't support CMPXCHG16B, disabling\n");
+return -ENODEV;
+    }
+
 if ( (init_done ? amd_iommu_init_late()
 : amd_iommu_init(false)) != 0 )
 {
-- 
2.44.0






[XEN PATCH v4 1/5] VT-d: Disable IOMMU if cx16 isn't supported

2024-04-15 Thread Teddy Astie
All hardware with VT-d has CMPXCHG16B support. Check this at initialisation
time, and remove the effectively-dead logic for the non-cx16 case.

Suggested-by: Andrew Cooper 
Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/vtd/iommu.c | 73 ++---
 1 file changed, 25 insertions(+), 48 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index c7110af7c9..9c787ba9eb 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1482,7 +1482,7 @@ int domain_context_mapping_one(
 {
 struct domain_iommu *hd = dom_iommu(domain);
 struct context_entry *context, *context_entries, lctxt;
-__uint128_t old;
+__uint128_t res, old;
 uint64_t maddr;
 uint16_t seg = iommu->drhd->segment, prev_did = 0;
 struct domain *prev_dom = NULL;
@@ -1580,55 +1580,23 @@ int domain_context_mapping_one(
 ASSERT(!context_fault_disable(lctxt));
 }
 
-if ( cpu_has_cx16 )
-{
-__uint128_t res = cmpxchg16b(context, &old, &lctxt);
-
-/*
- * Hardware does not update the context entry behind our backs,
- * so the return value should match "old".
- */
-if ( res != old )
-{
-if ( pdev )
-check_cleanup_domid_map(domain, pdev, iommu);
-printk(XENLOG_ERR
-   "%pp: unexpected context entry %016lx_%016lx (expected %016lx_%016lx)\n",
-   &PCI_SBDF(seg, bus, devfn),
-   (uint64_t)(res >> 64), (uint64_t)res,
-   (uint64_t)(old >> 64), (uint64_t)old);
-rc = -EILSEQ;
-goto unlock;
-}
-}
-else if ( !prev_dom || !(mode & MAP_WITH_RMRR) )
-{
-context_clear_present(*context);
-iommu_sync_cache(context, sizeof(*context));
+res = cmpxchg16b(context, &old, &lctxt);
 
-write_atomic(&context->hi, lctxt.hi);
-/* No barrier should be needed between these two. */
-write_atomic(&context->lo, lctxt.lo);
-}
-else /* Best effort, updating DID last. */
+/*
+ * Hardware does not update the context entry behind our backs,
+ * so the return value should match "old".
+ */
+if ( res != old )
 {
- /*
-  * By non-atomically updating the context entry's DID field last,
-  * during a short window in time TLB entries with the old domain ID
-  * but the new page tables may be inserted.  This could affect I/O
-  * of other devices using this same (old) domain ID.  Such updating
-  * therefore is not a problem if this was the only device associated
-  * with the old domain ID.  Diverting I/O of any of a dying domain's
-  * devices to the quarantine page tables is intended anyway.
-  */
-if ( !(mode & (MAP_OWNER_DYING | MAP_SINGLE_DEVICE)) )
-printk(XENLOG_WARNING VTDPREFIX
-   " %pp: reassignment may cause %pd data corruption\n",
-   _SBDF(seg, bus, devfn), prev_dom);
-
-write_atomic(&context->lo, lctxt.lo);
-/* No barrier should be needed between these two. */
-write_atomic(&context->hi, lctxt.hi);
+if ( pdev )
+check_cleanup_domid_map(domain, pdev, iommu);
+printk(XENLOG_ERR
+"%pp: unexpected context entry %016lx_%016lx (expected %016lx_%016lx)\n",
+&PCI_SBDF(seg, bus, devfn),
+(uint64_t)(res >> 64), (uint64_t)res,
+(uint64_t)(old >> 64), (uint64_t)old);
+rc = -EILSEQ;
+goto unlock;
 }
 
 iommu_sync_cache(context, sizeof(struct context_entry));
@@ -2630,6 +2598,15 @@ static int __init cf_check vtd_setup(void)
 int ret;
 bool reg_inval_supported = true;
 
+if ( unlikely(!cpu_has_cx16) )
+{
+printk(XENLOG_ERR VTDPREFIX
+   "IOMMU: CPU doesn't support CMPXCHG16B, disabling\n");
+
+ret = -ENODEV;
+    goto error;
+}
+
 if ( list_empty(&acpi_drhd_units) )
 {
 ret = -ENODEV;
-- 
2.44.0






[XEN PATCH v4 3/5] VT-d: Cleanup MAP_SINGLE_DEVICE and related code

2024-04-15 Thread Teddy Astie
This flag was only used when cx16 was not available; as those code paths no
longer exist, this flag now does basically nothing.

Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/vtd/iommu.c | 12 +++-
 xen/drivers/passthrough/vtd/vtd.h   |  5 ++---
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 9c787ba9eb..7c6bae0256 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1692,15 +1692,9 @@ static int domain_context_mapping(struct domain *domain, 
u8 devfn,
 break;
 }
 
-if ( domain != pdev->domain && pdev->domain != dom_io )
-{
-if ( pdev->domain->is_dying )
-mode |= MAP_OWNER_DYING;
-else if ( drhd &&
-  !any_pdev_behind_iommu(pdev->domain, pdev, drhd->iommu) &&
-  !pdev->phantom_stride )
-mode |= MAP_SINGLE_DEVICE;
-}
+if ( domain != pdev->domain && pdev->domain != dom_io &&
+ pdev->domain->is_dying )
+mode |= MAP_OWNER_DYING;
 
 switch ( pdev->type )
 {
diff --git a/xen/drivers/passthrough/vtd/vtd.h 
b/xen/drivers/passthrough/vtd/vtd.h
index cb2df76eed..43f06a353d 100644
--- a/xen/drivers/passthrough/vtd/vtd.h
+++ b/xen/drivers/passthrough/vtd/vtd.h
@@ -28,9 +28,8 @@
  */
 #define MAP_WITH_RMRR (1u << 0)
 #define MAP_OWNER_DYING   (1u << 1)
-#define MAP_SINGLE_DEVICE (1u << 2)
-#define MAP_ERROR_RECOVERY(1u << 3)
-#define UNMAP_ME_PHANTOM_FUNC (1u << 4)
+#define MAP_ERROR_RECOVERY(1u << 2)
+#define UNMAP_ME_PHANTOM_FUNC (1u << 3)
 
 /* Allow for both IOAPIC and IOSAPIC. */
 #define IO_xAPIC_route_entry IO_APIC_route_entry
-- 
2.44.0






[XEN PATCH v4 5/5] AMD-Vi: Disable interrupt remapping if cx16 is not supported

2024-04-15 Thread Teddy Astie
All hardware with AMD-Vi has CMPXCHG16B support.  Check this at initialisation
time, and remove the effectively-dead logic for the non-cx16 case.

Suggested-by: Andrew Cooper 
Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/amd/iommu_intr.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/xen/drivers/passthrough/amd/iommu_intr.c 
b/xen/drivers/passthrough/amd/iommu_intr.c
index 7fc796dec2..9ab7c68749 100644
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -649,6 +649,12 @@ bool __init cf_check iov_supports_xt(void)
 if ( !iommu_enable || !iommu_intremap )
 return false;
 
+if ( unlikely(!cpu_has_cx16) )
+{
+AMD_IOMMU_WARN("CPU doesn't support CMPXCHG16B, disable interrupt remapping\n");
+return false;
+}
+
 if ( amd_iommu_prepare(true) )
 return false;
 
-- 
2.44.0






[XEN PATCH v4 0/5] x86/iommu: Drop IOMMU support when cx16 isn't supported

2024-04-15 Thread Teddy Astie
All existing hardware that supports VT-d/AMD-Vi also supports cx16 (aside
from specifically crafted virtual machines).

Some IOMMU code paths in Xen consider cases where VT-d/AMD-Vi is supported
while cx16 isn't; those paths may be buggy, are barely tested, and are dead
code in practice.

Disable IOMMU in case we have IOMMU hardware but no cx16, then cleanup
no-cx16 handling logic from VT-d and AMD-Vi drivers. Also disable
interrupt remapping that also relies on cx16.

Teddy Astie (5):
  VT-d: Disable IOMMU if cx16 isn't supported
  AMD-Vi: Disable IOMMU if cx16 isn't supported
  VT-d: Cleanup MAP_SINGLE_DEVICE and related code
  VT-d: Disable interrupt remapping if cx16 is not supported
  AMD-Vi: Disable interrupt remapping if cx16 is not supported

 xen/drivers/passthrough/amd/iommu_intr.c|  6 ++
 xen/drivers/passthrough/amd/iommu_map.c | 42 --
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  6 ++
 xen/drivers/passthrough/vtd/intremap.c  | 70 +---
 xen/drivers/passthrough/vtd/iommu.c | 92 +++--
 xen/drivers/passthrough/vtd/vtd.h   |  5 +-
 6 files changed, 77 insertions(+), 144 deletions(-)

-- 
2.44.0






[XEN PATCH v4 4/5] VT-d: Disable interrupt remapping if cx16 is not supported

2024-04-15 Thread Teddy Astie
All hardware with VT-d has CMPXCHG16B support.  Check this at initialisation
time, and remove the effectively-dead logic for the non-cx16 case.

Suggested-by: Andrew Cooper 
Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/vtd/intremap.c | 70 --
 xen/drivers/passthrough/vtd/iommu.c|  7 +--
 2 files changed, 21 insertions(+), 56 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/intremap.c 
b/xen/drivers/passthrough/vtd/intremap.c
index c504852eb8..7d4d907b4f 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -150,6 +150,13 @@ bool __init cf_check intel_iommu_supports_eim(void)
 if ( !iommu_qinval || !iommu_intremap || list_empty(&acpi_drhd_units) )
 return false;
 
+if ( unlikely(!cpu_has_cx16) )
+{
+printk(XENLOG_WARNING VTDPREFIX
+   "CPU doesn't support CMPXCHG16B, disable interrupt remapping\n");
+return false;
+}
+
 /* We MUST have a DRHD unit for each IOAPIC. */
 for ( apic = 0; apic < nr_ioapics; apic++ )
 if ( !ioapic_to_drhd(IO_APIC_ID(apic)) )
@@ -173,47 +180,26 @@ bool __init cf_check intel_iommu_supports_eim(void)
  * Assume iremap_lock has been acquired. It is to make sure software will not
  * change the same IRTE behind us. With this assumption, if only high qword or
  * low qword in IRTE is to be updated, this function's atomic variant can
- * present an atomic update to VT-d hardware even when cmpxchg16b
- * instruction is not supported.
+ * present an atomic update to VT-d hardware.
  */
 static void update_irte(struct vtd_iommu *iommu, struct iremap_entry *entry,
 const struct iremap_entry *new_ire, bool atomic)
 {
-ASSERT(spin_is_locked(&iommu->intremap.lock));
+__uint128_t ret;
+struct iremap_entry old_ire;
 
-if ( cpu_has_cx16 )
-{
-__uint128_t ret;
-struct iremap_entry old_ire;
+ASSERT(spin_is_locked(&iommu->intremap.lock));
 
-old_ire = *entry;
-ret = cmpxchg16b(entry, &old_ire, new_ire);
+old_ire = *entry;
+ret = cmpxchg16b(entry, &old_ire, new_ire);
 
-/*
- * In the above, we use cmpxchg16 to atomically update the 128-bit
- * IRTE, and the hardware cannot update the IRTE behind us, so
- * the return value of cmpxchg16 should be the same as old_ire.
- * This ASSERT validate it.
- */
-ASSERT(ret == old_ire.val);
-}
-else
-{
-/*
- * VT-d hardware doesn't update IRTEs behind us, nor the software
- * since we hold iremap_lock. If the caller wants VT-d hardware to
- * always see a consistent entry, but we can't meet it, a bug will
- * be raised.
- */
-if ( entry->lo == new_ire->lo )
-write_atomic(&entry->hi, new_ire->hi);
-else if ( entry->hi == new_ire->hi )
-write_atomic(&entry->lo, new_ire->lo);
-else if ( !atomic )
-*entry = *new_ire;
-else
-BUG();
-}
+/*
+ * In the above, we use cmpxchg16 to atomically update the 128-bit
+ * IRTE, and the hardware cannot update the IRTE behind us, so
+ * the return value of cmpxchg16 should be the same as old_ire.
+ * This ASSERT validates it.
+ */
+ASSERT(ret == old_ire.val);
 }
 
 /* Mark specified intr remap entry as free */
@@ -395,7 +381,6 @@ static int ioapic_rte_to_remap_entry(struct vtd_iommu 
*iommu,
 /* Indicate remap format. */
 remap_rte->format = 1;
 
-/* If cmpxchg16b is not available the caller must mask the IO-APIC pin. */
 update_irte(iommu, iremap_entry, &new_ire, !init && !masked);
 iommu_sync_cache(iremap_entry, sizeof(*iremap_entry));
 iommu_flush_iec_index(iommu, 0, index);
@@ -437,21 +422,6 @@ void cf_check io_apic_write_remap_rte(
 bool masked = true;
 int rc;
 
-if ( !cpu_has_cx16 )
-{
-   /*
-* Cannot atomically update the IRTE entry: mask the IO-APIC pin to
-* avoid interrupts seeing an inconsistent IRTE entry.
-*/
-old_rte = __ioapic_read_entry(apic, pin, true);
-if ( !old_rte.mask )
-{
-masked = false;
-old_rte.mask = 1;
-__ioapic_write_entry(apic, pin, true, old_rte);
-}
-}
-
 /* Not the initializer, for old gcc to cope. */
 new_rte.raw = rte;
 
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 7c6bae0256..a1bd3c5ff6 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2663,12 +2663,7 @@ static int __init cf_check vtd_setup(void)
 iommu_intremap = iommu_intremap_off;
 
 #ifndef iommu_intpost
-/*
- * We cannot use posted interrupt if X86_FEATURE_CX16 is
- * not supported, since we count on this feature to
- * atomically update 16-byte IRTE in posted format

Re: [XEN PATCH v3 0/3] x86/iommu: Drop IOMMU support when cx16 isn't supported

2024-04-12 Thread Teddy Astie
On 12/04/2024 at 16:49, Andrew Cooper wrote:

> 1) You introduced trailing whitespace in patch 1 in the middle line here:
>
>> +ASSERT(spin_is_locked(&iommu->intremap.lock));
>> + 
>> +old_ire = *entry;

Good catch, will fix

> 2) In your commit messages, the grammar is a bit awkward.  It would be
> clearer to say:
>
> "All hardware with VT-d/AMD-Vi has CMPXCHG16 support.  Check this at
> initialisation time, and remove the effectively-dead logic for the
> non-cx16 case."
>
> just as you've done in the cover letter.

Yes

> 3) In patch 1, you shouldn't modify x2apic_bsp_setup() like that.
> x2APIC && no-IOMMU is a legal combination.
>
> Instead, you should put a cx16 check in both driver's supports_x2apic()
> hooks.

In this case, you mean both intel_iommu_supports_eim and iov_supports_xt
(AMD)?

>
> 4) In patch 3, you should drop the Suggested-by me.  You found that one
> all yourself.
>

Will change this.

Teddy

---


Teddy Astie | Vates XCP-ng Intern

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech




[XEN PATCH v3 3/3] VT-d: Cleanup MAP_SINGLE_DEVICE and related code

2024-04-12 Thread Teddy Astie
This flag was only used when cx16 was not available; as those code paths no
longer exist, this flag now does basically nothing.

Suggested-by: Andrew Cooper 
Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/vtd/iommu.c | 12 +++-
 xen/drivers/passthrough/vtd/vtd.h   |  5 ++---
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index ef9380ed6a..a1bd3c5ff6 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1692,15 +1692,9 @@ static int domain_context_mapping(struct domain *domain, 
u8 devfn,
 break;
 }
 
-if ( domain != pdev->domain && pdev->domain != dom_io )
-{
-if ( pdev->domain->is_dying )
-mode |= MAP_OWNER_DYING;
-else if ( drhd &&
-  !any_pdev_behind_iommu(pdev->domain, pdev, drhd->iommu) &&
-  !pdev->phantom_stride )
-mode |= MAP_SINGLE_DEVICE;
-}
+if ( domain != pdev->domain && pdev->domain != dom_io &&
+ pdev->domain->is_dying )
+mode |= MAP_OWNER_DYING;
 
 switch ( pdev->type )
 {
diff --git a/xen/drivers/passthrough/vtd/vtd.h 
b/xen/drivers/passthrough/vtd/vtd.h
index cb2df76eed..43f06a353d 100644
--- a/xen/drivers/passthrough/vtd/vtd.h
+++ b/xen/drivers/passthrough/vtd/vtd.h
@@ -28,9 +28,8 @@
  */
 #define MAP_WITH_RMRR (1u << 0)
 #define MAP_OWNER_DYING   (1u << 1)
-#define MAP_SINGLE_DEVICE (1u << 2)
-#define MAP_ERROR_RECOVERY(1u << 3)
-#define UNMAP_ME_PHANTOM_FUNC (1u << 4)
+#define MAP_ERROR_RECOVERY(1u << 2)
+#define UNMAP_ME_PHANTOM_FUNC (1u << 3)
 
 /* Allow for both IOAPIC and IOSAPIC. */
 #define IO_xAPIC_route_entry IO_APIC_route_entry
-- 
2.44.0






[XEN PATCH v3 1/3] VT-d: Disable IOMMU if cx16 isn't supported

2024-04-12 Thread Teddy Astie
No hardware has VT-d support while not having cx16 support; disable the IOMMU
in this case to avoid potentially buggy code.

Now that IOMMU is only enabled if cx16 is supported, drop dead code that
handles cases where cx16 isn't supported.

Suggested-by: Andrew Cooper 
Signed-off-by: Teddy Astie 
---
 xen/arch/x86/apic.c|  6 ++
 xen/drivers/passthrough/vtd/intremap.c | 65 +
 xen/drivers/passthrough/vtd/iommu.c| 80 +-
 3 files changed, 46 insertions(+), 105 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 592b78e11e..91d7f2b248 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -836,6 +836,12 @@ void __init x2apic_bsp_setup(void)
 if ( !cpu_has_x2apic )
 return;
 
+if ( unlikely(!cpu_has_cx16) )
+{
+printk("x2APIC: CPU doesn't support CMPXCHG16B, disabling\n");
+return;
+}
+
 if ( !opt_x2apic )
 {
 if ( !x2apic_enabled )
diff --git a/xen/drivers/passthrough/vtd/intremap.c 
b/xen/drivers/passthrough/vtd/intremap.c
index c504852eb8..b0a0dbdbc2 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -173,47 +173,26 @@ bool __init cf_check intel_iommu_supports_eim(void)
  * Assume iremap_lock has been acquired. It is to make sure software will not
  * change the same IRTE behind us. With this assumption, if only high qword or
  * low qword in IRTE is to be updated, this function's atomic variant can
- * present an atomic update to VT-d hardware even when cmpxchg16b
- * instruction is not supported.
+ * present an atomic update to VT-d hardware.
  */
 static void update_irte(struct vtd_iommu *iommu, struct iremap_entry *entry,
 const struct iremap_entry *new_ire, bool atomic)
 {
-ASSERT(spin_is_locked(&iommu->intremap.lock));
-
-if ( cpu_has_cx16 )
-{
-__uint128_t ret;
-struct iremap_entry old_ire;
+__uint128_t ret;
+struct iremap_entry old_ire;
 
-old_ire = *entry;
-ret = cmpxchg16b(entry, &old_ire, new_ire);
+ASSERT(spin_is_locked(&iommu->intremap.lock));
+
+old_ire = *entry;
+ret = cmpxchg16b(entry, &old_ire, new_ire);
 
-/*
- * In the above, we use cmpxchg16 to atomically update the 128-bit
- * IRTE, and the hardware cannot update the IRTE behind us, so
- * the return value of cmpxchg16 should be the same as old_ire.
- * This ASSERT validate it.
- */
-ASSERT(ret == old_ire.val);
-}
-else
-{
-/*
- * VT-d hardware doesn't update IRTEs behind us, nor the software
- * since we hold iremap_lock. If the caller wants VT-d hardware to
- * always see a consistent entry, but we can't meet it, a bug will
- * be raised.
- */
-if ( entry->lo == new_ire->lo )
-write_atomic(&entry->hi, new_ire->hi);
-else if ( entry->hi == new_ire->hi )
-write_atomic(&entry->lo, new_ire->lo);
-else if ( !atomic )
-*entry = *new_ire;
-else
-BUG();
-}
+/*
+ * In the above, we use cmpxchg16 to atomically update the 128-bit
+ * IRTE, and the hardware cannot update the IRTE behind us, so
+ * the return value of cmpxchg16 should be the same as old_ire.
+ * This ASSERT validate it.
+ */
+ASSERT(ret == old_ire.val);
 }
 
 /* Mark specified intr remap entry as free */
@@ -395,7 +374,6 @@ static int ioapic_rte_to_remap_entry(struct vtd_iommu 
*iommu,
 /* Indicate remap format. */
 remap_rte->format = 1;
 
-/* If cmpxchg16b is not available the caller must mask the IO-APIC pin. */
 update_irte(iommu, iremap_entry, &new_ire, !init && !masked);
 iommu_sync_cache(iremap_entry, sizeof(*iremap_entry));
 iommu_flush_iec_index(iommu, 0, index);
@@ -437,21 +415,6 @@ void cf_check io_apic_write_remap_rte(
 bool masked = true;
 int rc;
 
-if ( !cpu_has_cx16 )
-{
-   /*
-* Cannot atomically update the IRTE entry: mask the IO-APIC pin to
-* avoid interrupts seeing an inconsistent IRTE entry.
-*/
-old_rte = __ioapic_read_entry(apic, pin, true);
-if ( !old_rte.mask )
-{
-masked = false;
-old_rte.mask = 1;
-__ioapic_write_entry(apic, pin, true, old_rte);
-}
-}
-
 /* Not the initializer, for old gcc to cope. */
 new_rte.raw = rte;
 
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index c7110af7c9..ef9380ed6a 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1482,7 +1482,7 @@ int domain_context_mapping_one(
 {
 struct domain_iommu *hd = dom_iommu(domain);
 struct context_entry *context, *context_entries, lctxt;
-__uint128_t old;
+__uint128_t res, old;
 uint64_t maddr;
 uint16_t seg = iom

[XEN PATCH v3 2/3] AMD-Vi: Disable IOMMU if cx16 isn't supported

2024-04-12 Thread Teddy Astie
No hardware has AMD-Vi support while not having cx16 support; disable the
IOMMU in this case to avoid potentially buggy code.

Now that IOMMU is only enabled if cx16 is supported, drop dead code that
handles cases where cx16 isn't supported.

Suggested-by: Andrew Cooper 
Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/amd/iommu_map.c | 42 +++--
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  6 +++
 2 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_map.c 
b/xen/drivers/passthrough/amd/iommu_map.c
index e0f4fe736a..f67975e700 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -167,15 +167,14 @@ int amd_iommu_set_root_page_table(struct amd_iommu_dte 
*dte,
 {
 bool valid = flags & SET_ROOT_VALID;
 
-if ( dte->v && dte->tv &&
- (cpu_has_cx16 || (flags & SET_ROOT_WITH_UNITY_MAP)) )
+if ( dte->v && dte->tv )
 {
 union {
 struct amd_iommu_dte dte;
 uint64_t raw64[4];
 __uint128_t raw128[2];
 } ldte = { .dte = *dte };
-__uint128_t old = ldte.raw128[0];
+__uint128_t res, old = ldte.raw128[0];
 int ret = 0;
 
 ldte.dte.domain_id = domain_id;
@@ -185,33 +184,20 @@ int amd_iommu_set_root_page_table(struct amd_iommu_dte 
*dte,
 ldte.dte.paging_mode = paging_mode;
 ldte.dte.v = valid;
 
-if ( cpu_has_cx16 )
-{
-__uint128_t res = cmpxchg16b(dte, &old, &ldte.raw128[0]);
+res = cmpxchg16b(dte, &old, &ldte.raw128[0]);
 
-/*
- * Hardware does not update the DTE behind our backs, so the
- * return value should match "old".
- */
-if ( res != old )
-{
-printk(XENLOG_ERR
-   "Dom%d: unexpected DTE %016lx_%016lx (expected 
%016lx_%016lx)\n",
-   domain_id,
-   (uint64_t)(res >> 64), (uint64_t)res,
-   (uint64_t)(old >> 64), (uint64_t)old);
-ret = -EILSEQ;
-}
-}
-else /* Best effort, updating domain_id last. */
+/*
+ * Hardware does not update the DTE behind our backs, so the
+ * return value should match "old".
+ */
+if ( res != old )
 {
-uint64_t *ptr = (void *)dte;
-
-write_atomic(ptr + 0, ldte.raw64[0]);
-/* No barrier should be needed between these two. */
-write_atomic(ptr + 1, ldte.raw64[1]);
-
-ret = 1;
+printk(XENLOG_ERR
+   "Dom%d: unexpected DTE %016lx_%016lx (expected 
%016lx_%016lx)\n",
+   domain_id,
+   (uint64_t)(res >> 64), (uint64_t)res,
+   (uint64_t)(old >> 64), (uint64_t)old);
+ret = -EILSEQ;
 }
 
 return ret;
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index f6efd88e36..a213dbea0b 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -312,6 +312,12 @@ static int __init cf_check iov_detect(void)
 return -ENODEV;
 }
 
+if ( unlikely(!cpu_has_cx16) )
+{
+printk("AMD-Vi: CPU doesn't support CMPXCHG16B, disabling\n");
+    return -ENODEV;
+}
+
 init_done = 1;
 
 if ( !amd_iommu_perdev_intremap )
-- 
2.44.0






[XEN PATCH v3 0/3] x86/iommu: Drop IOMMU support when cx16 isn't supported

2024-04-12 Thread Teddy Astie
All existing hardware with VT-d/AMD-Vi support also supports cx16 (aside
from specifically crafted virtual machines).

Some IOMMU code paths in Xen consider cases where VT-d/AMD-Vi is supported
while cx16 isn't; those paths may be buggy and are barely tested, dead code
in practice.

Disable the IOMMU in case we have IOMMU hardware but no cx16, then clean up
the no-cx16 handling logic from the VT-d and AMD-Vi drivers.

Teddy

Changed in v2:

 * Added cleanup no-cx16 code for x2APIC
 * Fixed commit and code formatting
 * Added missing Suggested-by note

Changed in v3:

 * Use -ENODEV instead of -ENOSYS.

Teddy Astie (3):
  VT-d: Disable IOMMU if cx16 isn't supported
  AMD-Vi: Disable IOMMU if cx16 isn't supported
  VT-d: Cleanup MAP_SINGLE_DEVICE and related code

 xen/arch/x86/apic.c |  6 ++
 xen/drivers/passthrough/amd/iommu_map.c | 42 --
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  6 ++
 xen/drivers/passthrough/vtd/intremap.c  | 65 ---
 xen/drivers/passthrough/vtd/iommu.c | 92 +++--
 xen/drivers/passthrough/vtd/vtd.h   |  5 +-
 6 files changed, 71 insertions(+), 145 deletions(-)

-- 
2.44.0






Re: [XEN PATCH v2 0/3] x86/iommu: Drop IOMMU support when cx16 isn't supported

2024-04-12 Thread Teddy Astie
Le 11/04/2024 à 22:05, Andrew Cooper a écrit :
> On 08/04/2024 2:02 pm, Teddy Astie wrote:
>> All hardware that supports VT-d/AMD-Vi that exists also supports cx16 (aside
>> specifically crafted virtual machines).
>>
>> Some IOMMU code paths in Xen consider cases where VT-d/AMD-Vi is supported
>> while cx16 isn't, those paths may be bugged and are barely tested, dead code
>> in practice.
>>
>> Disable IOMMU in case we have IOMMU hardware but no cx16, then cleanup
>> no-cx16 handling logic from VT-d and AMD-Vi drivers.
>>
>> Teddy
>>
>> Changed in v2:
>>
>>   * Added cleanup no-cx16 code for x2APIC
>>   * Fixed commit and code formatting
>>   * Added missing Suggested-by note
>>
>> Teddy Astie (3):
>>VT-d: Disable IOMMU if cx16 isn't supported
>>AMD-Vi: Disable IOMMU if cx16 isn't supported
>>VT-d: Cleanup MAP_SINGLE_DEVICE and related code
>>
>>   xen/arch/x86/apic.c |  6 ++
>>   xen/drivers/passthrough/amd/iommu_map.c | 42 --
>>   xen/drivers/passthrough/amd/pci_amd_iommu.c |  6 ++
>>   xen/drivers/passthrough/vtd/intremap.c  | 65 ---
>>   xen/drivers/passthrough/vtd/iommu.c | 92 +++--
>>   xen/drivers/passthrough/vtd/vtd.h   |  5 +-
>>   6 files changed, 71 insertions(+), 145 deletions(-)
>>
>
> Sorry, but you've sent out two copies of each patch in this series, and
> it's not clear if they're identical or not.
>
> Please could you send out another version, making sure there's only one
> of each patch.
>
> Also, you need to swap ENOSYS with ENODEV, as per Jan's review on v1.
>
> Thanks,
>
> ~Andrew

Hello,

Not entirely sure why it got sent twice, as Marek said he only received
it once. I will double-check next time to avoid this issue in case I
sent it twice by mistake.

Will also swap ENOSYS with ENODEV in the next version.

Thanks,

Teddy






[XEN PATCH v2 3/3] VT-d: Cleanup MAP_SINGLE_DEVICE and related code

2024-04-08 Thread Teddy Astie
This flag was only used when cx16 was not available; as those code paths no
longer exist, this flag now does basically nothing.

Suggested-by: Andrew Cooper 
Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/vtd/iommu.c | 12 +++-
 xen/drivers/passthrough/vtd/vtd.h   |  5 ++---
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 47b56f37a9..4b15e6da79 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1692,15 +1692,9 @@ static int domain_context_mapping(struct domain *domain, 
u8 devfn,
 break;
 }
 
-if ( domain != pdev->domain && pdev->domain != dom_io )
-{
-if ( pdev->domain->is_dying )
-mode |= MAP_OWNER_DYING;
-else if ( drhd &&
-  !any_pdev_behind_iommu(pdev->domain, pdev, drhd->iommu) &&
-  !pdev->phantom_stride )
-mode |= MAP_SINGLE_DEVICE;
-}
+if ( domain != pdev->domain && pdev->domain != dom_io &&
+ pdev->domain->is_dying )
+mode |= MAP_OWNER_DYING;
 
 switch ( pdev->type )
 {
diff --git a/xen/drivers/passthrough/vtd/vtd.h 
b/xen/drivers/passthrough/vtd/vtd.h
index cb2df76eed..43f06a353d 100644
--- a/xen/drivers/passthrough/vtd/vtd.h
+++ b/xen/drivers/passthrough/vtd/vtd.h
@@ -28,9 +28,8 @@
  */
 #define MAP_WITH_RMRR (1u << 0)
 #define MAP_OWNER_DYING   (1u << 1)
-#define MAP_SINGLE_DEVICE (1u << 2)
-#define MAP_ERROR_RECOVERY(1u << 3)
-#define UNMAP_ME_PHANTOM_FUNC (1u << 4)
+#define MAP_ERROR_RECOVERY(1u << 2)
+#define UNMAP_ME_PHANTOM_FUNC (1u << 3)
 
 /* Allow for both IOAPIC and IOSAPIC. */
 #define IO_xAPIC_route_entry IO_APIC_route_entry
-- 
2.44.0






[XEN PATCH v2 1/3] VT-d: Disable IOMMU if cx16 isn't supported

2024-04-08 Thread Teddy Astie
No hardware has VT-d support while not having cx16 support; disable the IOMMU
in this case to avoid potentially buggy code.

Now that IOMMU is only enabled if cx16 is supported, drop dead code that
handles cases where cx16 isn't supported.

Suggested-by: Andrew Cooper 
Signed-off-by: Teddy Astie 
---
 xen/arch/x86/apic.c|  6 ++
 xen/drivers/passthrough/vtd/intremap.c | 65 +
 xen/drivers/passthrough/vtd/iommu.c| 80 +-
 3 files changed, 46 insertions(+), 105 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 592b78e11e..91d7f2b248 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -836,6 +836,12 @@ void __init x2apic_bsp_setup(void)
 if ( !cpu_has_x2apic )
 return;
 
+if ( unlikely(!cpu_has_cx16) )
+{
+printk("x2APIC: CPU doesn't support CMPXCHG16B, disabling\n");
+return;
+}
+
 if ( !opt_x2apic )
 {
 if ( !x2apic_enabled )
diff --git a/xen/drivers/passthrough/vtd/intremap.c 
b/xen/drivers/passthrough/vtd/intremap.c
index c504852eb8..b0a0dbdbc2 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -173,47 +173,26 @@ bool __init cf_check intel_iommu_supports_eim(void)
  * Assume iremap_lock has been acquired. It is to make sure software will not
  * change the same IRTE behind us. With this assumption, if only high qword or
  * low qword in IRTE is to be updated, this function's atomic variant can
- * present an atomic update to VT-d hardware even when cmpxchg16b
- * instruction is not supported.
+ * present an atomic update to VT-d hardware.
  */
 static void update_irte(struct vtd_iommu *iommu, struct iremap_entry *entry,
 const struct iremap_entry *new_ire, bool atomic)
 {
-ASSERT(spin_is_locked(&iommu->intremap.lock));
-
-if ( cpu_has_cx16 )
-{
-__uint128_t ret;
-struct iremap_entry old_ire;
+__uint128_t ret;
+struct iremap_entry old_ire;
 
-old_ire = *entry;
-ret = cmpxchg16b(entry, &old_ire, new_ire);
+ASSERT(spin_is_locked(&iommu->intremap.lock));
+
+old_ire = *entry;
+ret = cmpxchg16b(entry, &old_ire, new_ire);
 
-/*
- * In the above, we use cmpxchg16 to atomically update the 128-bit
- * IRTE, and the hardware cannot update the IRTE behind us, so
- * the return value of cmpxchg16 should be the same as old_ire.
- * This ASSERT validate it.
- */
-ASSERT(ret == old_ire.val);
-}
-else
-{
-/*
- * VT-d hardware doesn't update IRTEs behind us, nor the software
- * since we hold iremap_lock. If the caller wants VT-d hardware to
- * always see a consistent entry, but we can't meet it, a bug will
- * be raised.
- */
-if ( entry->lo == new_ire->lo )
-write_atomic(&entry->hi, new_ire->hi);
-else if ( entry->hi == new_ire->hi )
-write_atomic(&entry->lo, new_ire->lo);
-else if ( !atomic )
-*entry = *new_ire;
-else
-BUG();
-}
+/*
+ * In the above, we use cmpxchg16 to atomically update the 128-bit
+ * IRTE, and the hardware cannot update the IRTE behind us, so
+ * the return value of cmpxchg16 should be the same as old_ire.
+ * This ASSERT validate it.
+ */
+ASSERT(ret == old_ire.val);
 }
 
 /* Mark specified intr remap entry as free */
@@ -395,7 +374,6 @@ static int ioapic_rte_to_remap_entry(struct vtd_iommu 
*iommu,
 /* Indicate remap format. */
 remap_rte->format = 1;
 
-/* If cmpxchg16b is not available the caller must mask the IO-APIC pin. */
 update_irte(iommu, iremap_entry, &new_ire, !init && !masked);
 iommu_sync_cache(iremap_entry, sizeof(*iremap_entry));
 iommu_flush_iec_index(iommu, 0, index);
@@ -437,21 +415,6 @@ void cf_check io_apic_write_remap_rte(
 bool masked = true;
 int rc;
 
-if ( !cpu_has_cx16 )
-{
-   /*
-* Cannot atomically update the IRTE entry: mask the IO-APIC pin to
-* avoid interrupts seeing an inconsistent IRTE entry.
-*/
-old_rte = __ioapic_read_entry(apic, pin, true);
-if ( !old_rte.mask )
-{
-masked = false;
-old_rte.mask = 1;
-__ioapic_write_entry(apic, pin, true, old_rte);
-}
-}
-
 /* Not the initializer, for old gcc to cope. */
 new_rte.raw = rte;
 
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index c7110af7c9..47b56f37a9 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1482,7 +1482,7 @@ int domain_context_mapping_one(
 {
 struct domain_iommu *hd = dom_iommu(domain);
 struct context_entry *context, *context_entries, lctxt;
-__uint128_t old;
+__uint128_t res, old;
 uint64_t maddr;
 uint16_t seg = iom

[XEN PATCH v2 0/3] x86/iommu: Drop IOMMU support when cx16 isn't supported

2024-04-08 Thread Teddy Astie
All existing hardware with VT-d/AMD-Vi support also supports cx16 (aside
from specifically crafted virtual machines).

Some IOMMU code paths in Xen consider cases where VT-d/AMD-Vi is supported
while cx16 isn't; those paths may be buggy and are barely tested, dead code
in practice.

Disable the IOMMU in case we have IOMMU hardware but no cx16, then clean up
the no-cx16 handling logic from the VT-d and AMD-Vi drivers.

Teddy

Changed in v2:

 * Added cleanup no-cx16 code for x2APIC
 * Fixed commit and code formatting
 * Added missing Suggested-by note

Teddy Astie (3):
  VT-d: Disable IOMMU if cx16 isn't supported
  AMD-Vi: Disable IOMMU if cx16 isn't supported
  VT-d: Cleanup MAP_SINGLE_DEVICE and related code

 xen/arch/x86/apic.c |  6 ++
 xen/drivers/passthrough/amd/iommu_map.c | 42 --
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  6 ++
 xen/drivers/passthrough/vtd/intremap.c  | 65 ---
 xen/drivers/passthrough/vtd/iommu.c | 92 +++--
 xen/drivers/passthrough/vtd/vtd.h   |  5 +-
 6 files changed, 71 insertions(+), 145 deletions(-)

-- 
2.44.0






[XEN PATCH v2 2/3] AMD-Vi: Disable IOMMU if cx16 isn't supported

2024-04-08 Thread Teddy Astie
No hardware has AMD-Vi support while not having cx16 support; disable the
IOMMU in this case to avoid potentially buggy code.

Now that IOMMU is only enabled if cx16 is supported, drop dead code that
handles cases where cx16 isn't supported.

Suggested-by: Andrew Cooper 
Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/amd/iommu_map.c | 42 +++--
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  6 +++
 2 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_map.c 
b/xen/drivers/passthrough/amd/iommu_map.c
index e0f4fe736a..f67975e700 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -167,15 +167,14 @@ int amd_iommu_set_root_page_table(struct amd_iommu_dte 
*dte,
 {
 bool valid = flags & SET_ROOT_VALID;
 
-if ( dte->v && dte->tv &&
- (cpu_has_cx16 || (flags & SET_ROOT_WITH_UNITY_MAP)) )
+if ( dte->v && dte->tv )
 {
 union {
 struct amd_iommu_dte dte;
 uint64_t raw64[4];
 __uint128_t raw128[2];
 } ldte = { .dte = *dte };
-__uint128_t old = ldte.raw128[0];
+__uint128_t res, old = ldte.raw128[0];
 int ret = 0;
 
 ldte.dte.domain_id = domain_id;
@@ -185,33 +184,20 @@ int amd_iommu_set_root_page_table(struct amd_iommu_dte 
*dte,
 ldte.dte.paging_mode = paging_mode;
 ldte.dte.v = valid;
 
-if ( cpu_has_cx16 )
-{
-__uint128_t res = cmpxchg16b(dte, &old, &ldte.raw128[0]);
+res = cmpxchg16b(dte, &old, &ldte.raw128[0]);
 
-/*
- * Hardware does not update the DTE behind our backs, so the
- * return value should match "old".
- */
-if ( res != old )
-{
-printk(XENLOG_ERR
-   "Dom%d: unexpected DTE %016lx_%016lx (expected 
%016lx_%016lx)\n",
-   domain_id,
-   (uint64_t)(res >> 64), (uint64_t)res,
-   (uint64_t)(old >> 64), (uint64_t)old);
-ret = -EILSEQ;
-}
-}
-else /* Best effort, updating domain_id last. */
+/*
+ * Hardware does not update the DTE behind our backs, so the
+ * return value should match "old".
+ */
+if ( res != old )
 {
-uint64_t *ptr = (void *)dte;
-
-write_atomic(ptr + 0, ldte.raw64[0]);
-/* No barrier should be needed between these two. */
-write_atomic(ptr + 1, ldte.raw64[1]);
-
-ret = 1;
+printk(XENLOG_ERR
+   "Dom%d: unexpected DTE %016lx_%016lx (expected 
%016lx_%016lx)\n",
+   domain_id,
+   (uint64_t)(res >> 64), (uint64_t)res,
+   (uint64_t)(old >> 64), (uint64_t)old);
+ret = -EILSEQ;
 }
 
 return ret;
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index f6efd88e36..656c5eda5d 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -312,6 +312,12 @@ static int __init cf_check iov_detect(void)
 return -ENODEV;
 }
 
+if ( unlikely(!cpu_has_cx16) )
+{
+printk("AMD-Vi: CPU doesn't support CMPXCHG16B, disabling\n");
+    return -ENOSYS;
+}
+
 init_done = 1;
 
 if ( !amd_iommu_perdev_intremap )
-- 
2.44.0






[XEN PATCH 3/3] VT-d: Cleanup MAP_SINGLE_DEVICE and related dead code.

2024-03-21 Thread Teddy Astie
This flag was only used when cx16 was not available; as those code paths no
longer exist, this flag now does basically nothing.

Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/vtd/iommu.c | 12 +++-
 xen/drivers/passthrough/vtd/vtd.h   |  5 ++---
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 47b56f37a9..4b15e6da79 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1692,15 +1692,9 @@ static int domain_context_mapping(struct domain *domain, 
u8 devfn,
 break;
 }
 
-if ( domain != pdev->domain && pdev->domain != dom_io )
-{
-if ( pdev->domain->is_dying )
-mode |= MAP_OWNER_DYING;
-else if ( drhd &&
-  !any_pdev_behind_iommu(pdev->domain, pdev, drhd->iommu) &&
-  !pdev->phantom_stride )
-mode |= MAP_SINGLE_DEVICE;
-}
+if ( domain != pdev->domain && pdev->domain != dom_io &&
+ pdev->domain->is_dying )
+mode |= MAP_OWNER_DYING;
 
 switch ( pdev->type )
 {
diff --git a/xen/drivers/passthrough/vtd/vtd.h 
b/xen/drivers/passthrough/vtd/vtd.h
index cb2df76eed..43f06a353d 100644
--- a/xen/drivers/passthrough/vtd/vtd.h
+++ b/xen/drivers/passthrough/vtd/vtd.h
@@ -28,9 +28,8 @@
  */
 #define MAP_WITH_RMRR (1u << 0)
 #define MAP_OWNER_DYING   (1u << 1)
-#define MAP_SINGLE_DEVICE (1u << 2)
-#define MAP_ERROR_RECOVERY(1u << 3)
-#define UNMAP_ME_PHANTOM_FUNC (1u << 4)
+#define MAP_ERROR_RECOVERY(1u << 2)
+#define UNMAP_ME_PHANTOM_FUNC (1u << 3)
 
 /* Allow for both IOAPIC and IOSAPIC. */
 #define IO_xAPIC_route_entry IO_APIC_route_entry
-- 
2.44.0






[XEN PATCH 1/3] VT-d: Disable IOMMU if cx16 isn't supported

2024-03-21 Thread Teddy Astie
No hardware has VT-d support while not having cx16 support; consider disabling
the IOMMU in this case to avoid potentially buggy code.

Now that IOMMU is only enabled if cx16 is supported, drop dead code that 
handles cases where cx16 isn't supported.

Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/vtd/intremap.c | 67 +
 xen/drivers/passthrough/vtd/iommu.c| 80 +-
 2 files changed, 41 insertions(+), 106 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/intremap.c 
b/xen/drivers/passthrough/vtd/intremap.c
index c504852eb8..312b73e693 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -173,47 +173,26 @@ bool __init cf_check intel_iommu_supports_eim(void)
  * Assume iremap_lock has been acquired. It is to make sure software will not
  * change the same IRTE behind us. With this assumption, if only high qword or
  * low qword in IRTE is to be updated, this function's atomic variant can
- * present an atomic update to VT-d hardware even when cmpxchg16b
- * instruction is not supported.
+ * present an atomic update to VT-d hardware.
  */
 static void update_irte(struct vtd_iommu *iommu, struct iremap_entry *entry,
 const struct iremap_entry *new_ire, bool atomic)
 {
-ASSERT(spin_is_locked(&iommu->intremap.lock));
-
-if ( cpu_has_cx16 )
-{
-__uint128_t ret;
-struct iremap_entry old_ire;
+__uint128_t ret;
+struct iremap_entry old_ire;
 
-old_ire = *entry;
-ret = cmpxchg16b(entry, &old_ire, new_ire);
+ASSERT(spin_is_locked(&iommu->intremap.lock));
+
+old_ire = *entry;
+ret = cmpxchg16b(entry, &old_ire, new_ire);
 
-/*
- * In the above, we use cmpxchg16 to atomically update the 128-bit
- * IRTE, and the hardware cannot update the IRTE behind us, so
- * the return value of cmpxchg16 should be the same as old_ire.
- * This ASSERT validate it.
- */
-ASSERT(ret == old_ire.val);
-}
-else
-{
-/*
- * VT-d hardware doesn't update IRTEs behind us, nor the software
- * since we hold iremap_lock. If the caller wants VT-d hardware to
- * always see a consistent entry, but we can't meet it, a bug will
- * be raised.
- */
-if ( entry->lo == new_ire->lo )
-write_atomic(&entry->hi, new_ire->hi);
-else if ( entry->hi == new_ire->hi )
-write_atomic(&entry->lo, new_ire->lo);
-else if ( !atomic )
-*entry = *new_ire;
-else
-BUG();
-}
+/*
+ * In the above, we use cmpxchg16 to atomically update the 128-bit
+ * IRTE, and the hardware cannot update the IRTE behind us, so
+ * the return value of cmpxchg16 should be the same as old_ire.
+ * This ASSERT validate it.
+ */
+ASSERT(ret == old_ire.val);
 }
 
 /* Mark specified intr remap entry as free */
@@ -394,8 +373,7 @@ static int ioapic_rte_to_remap_entry(struct vtd_iommu 
*iommu,
 remap_rte->reserved = 0;
 /* Indicate remap format. */
 remap_rte->format = 1;
-
-/* If cmpxchg16b is not available the caller must mask the IO-APIC pin. */
+
 update_irte(iommu, iremap_entry, &new_ire, !init && !masked);
 iommu_sync_cache(iremap_entry, sizeof(*iremap_entry));
 iommu_flush_iec_index(iommu, 0, index);
@@ -437,21 +415,6 @@ void cf_check io_apic_write_remap_rte(
 bool masked = true;
 int rc;
 
-if ( !cpu_has_cx16 )
-{
-   /*
-* Cannot atomically update the IRTE entry: mask the IO-APIC pin to
-* avoid interrupts seeing an inconsistent IRTE entry.
-*/
-old_rte = __ioapic_read_entry(apic, pin, true);
-if ( !old_rte.mask )
-{
-masked = false;
-old_rte.mask = 1;
-__ioapic_write_entry(apic, pin, true, old_rte);
-}
-}
-
 /* Not the initializer, for old gcc to cope. */
 new_rte.raw = rte;
 
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index c7110af7c9..47b56f37a9 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1482,7 +1482,7 @@ int domain_context_mapping_one(
 {
 struct domain_iommu *hd = dom_iommu(domain);
 struct context_entry *context, *context_entries, lctxt;
-__uint128_t old;
+__uint128_t res, old;
 uint64_t maddr;
 uint16_t seg = iommu->drhd->segment, prev_did = 0;
 struct domain *prev_dom = NULL;
@@ -1580,55 +1580,23 @@ int domain_context_mapping_one(
 ASSERT(!context_fault_disable(lctxt));
 }
 
-if ( cpu_has_cx16 )
-{
-__uint128_t res = cmpxchg16b(context, &old, &lctxt.full);
+res = cmpxchg16b(context, &old, &lctxt.full);
 
-/*
- * Hardware does not update the context entry behind our backs,
- * so the return value should match "old".
- */

[XEN PATCH 2/3] AMD-Vi: Disable IOMMU if cx16 isn't supported

2024-03-21 Thread Teddy Astie
No hardware has AMD-Vi support while not having cx16 support; consider
disabling the IOMMU in this case to avoid potentially buggy code.

Now that IOMMU is only enabled if cx16 is supported, drop dead code that 
handles cases where cx16 isn't supported.

Signed-off-by: Teddy Astie 
---
 xen/drivers/passthrough/amd/iommu_map.c | 43 +++--
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  6 +++
 2 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_map.c 
b/xen/drivers/passthrough/amd/iommu_map.c
index e0f4fe736a..c8c1c0cfae 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -167,15 +167,14 @@ int amd_iommu_set_root_page_table(struct amd_iommu_dte 
*dte,
 {
 bool valid = flags & SET_ROOT_VALID;
 
-if ( dte->v && dte->tv &&
- (cpu_has_cx16 || (flags & SET_ROOT_WITH_UNITY_MAP)) )
+if ( dte->v && dte->tv )
 {
 union {
 struct amd_iommu_dte dte;
 uint64_t raw64[4];
 __uint128_t raw128[2];
 } ldte = { .dte = *dte };
-__uint128_t old = ldte.raw128[0];
+__uint128_t res, old = ldte.raw128[0];
 int ret = 0;
 
 ldte.dte.domain_id = domain_id;
@@ -185,33 +184,21 @@ int amd_iommu_set_root_page_table(struct amd_iommu_dte 
*dte,
 ldte.dte.paging_mode = paging_mode;
 ldte.dte.v = valid;
 
-if ( cpu_has_cx16 )
-{
-__uint128_t res = cmpxchg16b(dte, &old, &ldte.raw128[0]);
+
+res = cmpxchg16b(dte, &old, &ldte.raw128[0]);
 
-/*
- * Hardware does not update the DTE behind our backs, so the
- * return value should match "old".
- */
-if ( res != old )
-{
-printk(XENLOG_ERR
-   "Dom%d: unexpected DTE %016lx_%016lx (expected 
%016lx_%016lx)\n",
-   domain_id,
-   (uint64_t)(res >> 64), (uint64_t)res,
-   (uint64_t)(old >> 64), (uint64_t)old);
-ret = -EILSEQ;
-}
-}
-else /* Best effort, updating domain_id last. */
+/*
+ * Hardware does not update the DTE behind our backs, so the
+ * return value should match "old".
+ */
+if ( res != old )
 {
-uint64_t *ptr = (void *)dte;
-
-write_atomic(ptr + 0, ldte.raw64[0]);
-/* No barrier should be needed between these two. */
-write_atomic(ptr + 1, ldte.raw64[1]);
-
-ret = 1;
+printk(XENLOG_ERR
+   "Dom%d: unexpected DTE %016lx_%016lx (expected %016lx_%016lx)\n",
+   domain_id,
+   (uint64_t)(res >> 64), (uint64_t)res,
+   (uint64_t)(old >> 64), (uint64_t)old);
+ret = -EILSEQ;
 }
 
 return ret;
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index f6efd88e36..656c5eda5d 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -312,6 +312,12 @@ static int __init cf_check iov_detect(void)
 return -ENODEV;
 }
 
+if ( unlikely(!cpu_has_cx16) )
+{
+printk("AMD-Vi: CPU doesn't support CMPXCHG16B, disabling\n");
+    return -ENOSYS;
+}
+
 init_done = 1;
 
 if ( !amd_iommu_perdev_intremap )
-- 
2.44.0
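For illustration, the update pattern this hunk settles on (snapshot the 16-byte DTE, build a full local copy, publish it with one compare-and-exchange, and treat any mismatch between the returned value and the snapshot as -EILSEQ) can be sketched as a single-threaded simulation. The `desc128_t` type and `cmpxchg16b_sim()` helper below are hypothetical stand-ins, not Xen code; the real `cmpxchg16b` compiles to a locked CMPXCHG16B instruction and is atomic.

```c
#include <stdint.h>

/* Hypothetical 16-byte descriptor: a stand-in for the real DTE. */
typedef struct { uint64_t lo, hi; } desc128_t;

/*
 * Non-atomic simulation of cmpxchg16b(): if *dst still equals *old,
 * install *new_val; either way, return the value *dst held on entry.
 * (Xen's real helper performs this as one locked 16-byte instruction.)
 */
static desc128_t cmpxchg16b_sim(desc128_t *dst, const desc128_t *old,
                                const desc128_t *new_val)
{
    desc128_t prev = *dst;

    if ( prev.lo == old->lo && prev.hi == old->hi )
        *dst = *new_val;

    return prev;
}
```

A caller mirroring the hunk above snapshots the descriptor, prepares the updated local copy, calls the helper, and reports -EILSEQ if the returned value differs from the snapshot, since hardware never updates the DTE behind the driver's back.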






[XEN PATCH 0/3] x86/iommu: Drop IOMMU support when cpu doesn't support cx16.

2024-03-21 Thread Teddy Astie
All existing hardware that supports VT-d/AMD-Vi also supports cx16 (aside from
specifically crafted virtual machines).

Some IOMMU code paths in Xen consider cases where VT-d/AMD-Vi is supported while
cx16 isn't; those paths are barely tested and may be buggy, making them dead code
in practice.

Disable the IOMMU when IOMMU hardware is present but cx16 isn't supported, then
clean up the no-cx16 handling logic in the VT-d and AMD-Vi drivers.

Teddy Astie (3):
  VT-d: Disable IOMMU if cx16 isn't supported
  AMD-Vi: Disable IOMMU if cx16 isn't supported
  VT-d: Cleanup MAP_SINGLE_DEVICE and related dead code.

 xen/drivers/passthrough/amd/iommu_map.c | 43 --
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  6 ++
 xen/drivers/passthrough/vtd/intremap.c  | 67 ---
 xen/drivers/passthrough/vtd/iommu.c | 92 +++--
 xen/drivers/passthrough/vtd/vtd.h   |  5 +-
 5 files changed, 67 insertions(+), 146 deletions(-)

-- 
2.44.0
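The whole series keys off the cx16 CPU feature, which CPUID leaf 1 reports in ECX bit 13 (CMPXCHG16B). As a sketch of that feature test outside Xen (which uses its own `cpu_has_cx16` predicate), a pure function over the raw ECX value might look like the following; the macro and function names here are illustrative, not Xen's:

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 13 advertises CMPXCHG16B support (cx16). */
#define CPUID1_ECX_CX16 (UINT32_C(1) << 13)

/* Decide cx16 availability from the ECX value returned by CPUID leaf 1. */
static bool has_cx16(uint32_t cpuid1_ecx)
{
    return cpuid1_ecx & CPUID1_ECX_CX16;
}
```

On a real machine the ECX value would come from executing CPUID (e.g. GCC's `__get_cpuid()` in user space); Xen derives `cpu_has_cx16` from the same bit during boot-time feature detection.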


