Re: [PATCH v6 12/13] xen/arm: translate virtual PCI bus topology for guests

2022-02-03 Thread Jan Beulich
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -168,6 +168,35 @@ static void vpci_remove_virtual_device(struct domain *d,
>  pdev->vpci->guest_sbdf.sbdf = ~0;
>  }
>  
> +/*
> + * Find the physical device which is mapped to the virtual device
> + * and translate virtual SBDF to the physical one.
> + */
> +bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
> +{
> +struct pci_dev *pdev;
> +
> +ASSERT(!is_hardware_domain(d));

In addition to this, don't you also need to assert that pcidevs_lock is
held (or if it isn't, you'd need to acquire it) for ...

> +for_each_pdev( d, pdev )

... this to be race-free?

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-03 Thread Jan Beulich
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
> uint16_t cmd, bool rom_only)
>  continue;
>  }
>  
> +spin_lock(&tmp->vpci_lock);
> +if ( !tmp->vpci )
> +{
> +spin_unlock(&tmp->vpci_lock);
> +continue;
> +}
>  for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>  {
>  const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, 
> uint16_t cmd, bool rom_only)
>  rc = rangeset_remove_range(mem, start, end);
>  if ( rc )
>  {
> +spin_unlock(&tmp->vpci_lock);
>  printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
> start, end, rc);
>  rangeset_destroy(mem);
>  return rc;
>  }
>  }
> +spin_unlock(&tmp->vpci_lock);
>  }

At the first glance this simply looks like another unjustified (in the
description) change, as you're not converting anything here but you
actually add locking (and I realize this was there before, so I'm sorry
for not pointing this out earlier). But then I wonder whether you
actually tested this, since I can't help getting the impression that
you're introducing a live-lock: The function is called from cmd_write()
and rom_write(), which in turn are called out of vpci_write(). Yet that
function already holds the lock, and the lock is not (currently)
recursive. (For the 3rd caller of the function - init_bars() - otoh
the locking looks to be entirely unnecessary.)

Then again this was present already even in Roger's original patch, so
I guess I must be missing something ...

> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -138,7 +138,7 @@ static void control_write(const struct pci_dev *pdev, 
> unsigned int reg,
>  pci_conf_write16(pdev->sbdf, reg, val);
>  }
>  
> -static struct vpci_msix *msix_find(const struct domain *d, unsigned long 
> addr)
> +static struct vpci_msix *msix_get(const struct domain *d, unsigned long addr)
>  {
>  struct vpci_msix *msix;
>  
> @@ -150,15 +150,29 @@ static struct vpci_msix *msix_find(const struct domain 
> *d, unsigned long addr)
>  for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
>  if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled &&
>   VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) )
> +{
> +spin_lock(&msix->pdev->vpci_lock);
>  return msix;
> +}

I think deliberately returning with a lock held requires a respective
comment ahead of the function.

>  }
>  
>  return NULL;
>  }
>  
> +static void msix_put(struct vpci_msix *msix)
> +{
> +if ( !msix )
> +return;
> +
> +spin_unlock(&msix->pdev->vpci_lock);
> +}

Maybe shorter

if ( msix )
spin_unlock(&msix->pdev->vpci_lock);

? Yet there's only one case where you may pass NULL in here, so
maybe it's better anyway to move the conditional ...

>  static int msix_accept(struct vcpu *v, unsigned long addr)
>  {
> -return !!msix_find(v->domain, addr);
> +struct vpci_msix *msix = msix_get(v->domain, addr);
> +
> +msix_put(msix);
> +return !!msix;
>  }

... here?

> @@ -186,7 +200,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, 
> unsigned int len,
>   unsigned long *data)
>  {
>  const struct domain *d = v->domain;
> -struct vpci_msix *msix = msix_find(d, addr);
> +struct vpci_msix *msix = msix_get(d, addr);
>  const struct vpci_msix_entry *entry;
>  unsigned int offset;
>  
> @@ -196,7 +210,10 @@ static int msix_read(struct vcpu *v, unsigned long addr, 
> unsigned int len,
>  return X86EMUL_RETRY;
>  
>  if ( !access_allowed(msix->pdev, addr, len) )
> +{
> +msix_put(msix);
>  return X86EMUL_OKAY;
> +}
>  
>  if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
>  {
> @@ -222,10 +239,10 @@ static int msix_read(struct vcpu *v, unsigned long 
> addr, unsigned int len,
>  break;
>  }
>  
> +msix_put(msix);
>  return X86EMUL_OKAY;
>  }
>  
> -spin_lock(&msix->pdev->vpci->lock);
>  entry = get_entry(msix, addr);
>  offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);

You're increasing the locked region quite a bit here. If this is really
needed, it wants explaining. And if this is deemed acceptable as a
"side effect", it wants justifying or at least stating imo. Same for
msix_write() then, obviously. (I'm not sure Roger actually implied this
when suggesting to switch to the get/put pair.)

> @@ -327,7 +334,12 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, 
> unsigned int size)
>  if ( !pdev )
>  return vpci_read_hw(sbdf, reg, size);
>  
> -spin

[PATCH v6 13/13] xen/arm: account IO handlers for emulated PCI MSI-X

2022-02-03 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

At the moment, we always allocate an extra 16 slots for IO handlers
(see MAX_IO_HANDLER). So while adding IO trap handlers for the emulated
MSI-X registers we need to explicitly tell that we have additional IO
handlers, so those are accounted.

Signed-off-by: Oleksandr Andrushchenko 

---
Cc: Julien Grall 
Cc: Stefano Stabellini 
---
This actually moved here from the part 2 of the prep work for PCI
passthrough on Arm as it seems to be the proper place for it.

Since v5:
- optimize with IS_ENABLED(CONFIG_HAS_PCI_MSI) since VPCI_MAX_VIRT_DEV is
  defined unconditionally
New in v5
---
 xen/arch/arm/vpci.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 84b2b068a0fe..c5902cb9d34d 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -131,6 +131,8 @@ static int vpci_get_num_handlers_cb(struct domain *d,
 
 unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
 {
+unsigned int count;
+
 if ( !has_vpci(d) )
 return 0;
 
@@ -151,7 +153,17 @@ unsigned int domain_vpci_get_num_mmio_handlers(struct 
domain *d)
  * For guests each host bridge requires one region to cover the
  * configuration space. At the moment, we only expose a single host bridge.
  */
-return 1;
+count = 1;
+
+/*
+ * There's a single MSI-X MMIO handler that deals with both PBA
+ * and MSI-X tables per each PCI device being passed through.
+ * Maximum number of emulated virtual devices is VPCI_MAX_VIRT_DEV.
+ */
+if ( IS_ENABLED(CONFIG_HAS_PCI_MSI) )
+count += VPCI_MAX_VIRT_DEV;
+
+return count;
 }
 
 /*
-- 
2.25.1




[PATCH v6 12/13] xen/arm: translate virtual PCI bus topology for guests

2022-02-03 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

There are three  originators for the PCI configuration space access:
1. The domain that owns physical host bridge: MMIO handlers are
there so we can update vPCI register handlers with the values
written by the hardware domain, e.g. physical view of the registers
vs guest's view on the configuration space.
2. Guest access to the passed through PCI devices: we need to properly
map virtual bus topology to the physical one, e.g. pass the configuration
space access to the corresponding physical devices.
3. Emulated host PCI bridge access. It doesn't exist in the physical
topology, e.g. it can't be mapped to some physical host bridge.
So, all access to the host bridge itself needs to be trapped and
emulated.

Signed-off-by: Oleksandr Andrushchenko 
---
Since v5:
- add vpci_translate_virtual_device for #ifndef CONFIG_HAS_VPCI_GUEST_SUPPORT
  case to simplify ifdefery
- add ASSERT(!is_hardware_domain(d)); to vpci_translate_virtual_device
- reset output register on failed virtual SBDF translation
Since v4:
- indentation fixes
- constify struct domain
- updated commit message
- updates to the new locking scheme (pdev->vpci_lock)
Since v3:
- revisit locking
- move code to vpci.c
Since v2:
 - pass struct domain instead of struct vcpu
 - constify arguments where possible
 - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/arch/arm/vpci.c | 17 +
 xen/drivers/vpci/vpci.c | 29 +
 xen/include/xen/vpci.h  |  7 +++
 3 files changed, 53 insertions(+)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index a9fc5817f94e..84b2b068a0fe 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -41,6 +41,16 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
 /* data is needed to prevent a pointer cast on 32bit */
 unsigned long data;
 
+/*
+ * For the passed through devices we need to map their virtual SBDF
+ * to the physical PCI device being passed through.
+ */
+if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
+{
+*r = ~0ul;
+return 1;
+}
+
 if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
 1U << info->dabt.size, &data) )
 {
@@ -59,6 +69,13 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
 struct pci_host_bridge *bridge = p;
 pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
 
+/*
+ * For the passed through devices we need to map their virtual SBDF
+ * to the physical PCI device being passed through.
+ */
+if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
+return 1;
+
 return vpci_ecam_write(sbdf, ECAM_REG_OFFSET(info->gpa),
1U << info->dabt.size, r);
 }
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 7d422d11f83d..070db7391391 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -168,6 +168,35 @@ static void vpci_remove_virtual_device(struct domain *d,
 pdev->vpci->guest_sbdf.sbdf = ~0;
 }
 
+/*
+ * Find the physical device which is mapped to the virtual device
+ * and translate virtual SBDF to the physical one.
+ */
+bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
+{
+struct pci_dev *pdev;
+
+ASSERT(!is_hardware_domain(d));
+
+for_each_pdev( d, pdev )
+{
+bool found;
+
+spin_lock(&pdev->vpci_lock);
+found = pdev->vpci && (pdev->vpci->guest_sbdf.sbdf == sbdf->sbdf);
+spin_unlock(&pdev->vpci_lock);
+
+if ( found )
+{
+/* Replace guest SBDF with the physical one. */
+*sbdf = pdev->sbdf;
+return true;
+}
+}
+
+return false;
+}
+
 /* Notify vPCI that device is assigned to guest. */
 int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
 {
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 1f04d34a2369..f6eb9f2051af 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -271,6 +271,7 @@ static inline bool __must_check vpci_process_pending(struct 
vcpu *v)
 /* Notify vPCI that device is assigned/de-assigned to/from guest. */
 int vpci_assign_device(struct domain *d, struct pci_dev *pdev);
 void vpci_deassign_device(struct domain *d, struct pci_dev *pdev);
+bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf);
 #else
 static inline int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
 {
@@ -280,6 +281,12 @@ static inline int vpci_assign_device(struct domain *d, 
struct pci_dev *pdev)
 static inline void vpci_deassign_device(struct domain *d, struct pci_dev *pdev)
 {
 };
+
+static inline bool vpci_translate_virtual_device(const struct domain *d,
+ pci_sbdf_t *sbdf)
+{
+return false;
+}
 #endif
 
 #endif
-- 
2.25.1




[PATCH v6 11/13] vpci: add initial support for virtual PCI bus topology

2022-02-03 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

Assign SBDF to the PCI devices being passed through with bus 0.
The resulting topology is where PCIe devices reside on the bus 0 of the
root complex itself (embedded endpoints).
This implementation is limited to 32 devices which are allowed on
a single PCI bus.

Please note, that at the moment only function 0 of a multifunction
device can be passed through.

Signed-off-by: Oleksandr Andrushchenko 
---
Since v5:
- s/vpci_add_virtual_device/add_virtual_device and make it static
- call add_virtual_device from vpci_assign_device and do not use
  REGISTER_VPCI_INIT machinery
- add pcidevs_locked ASSERT
- use DECLARE_BITMAP for vpci_dev_assigned_map
Since v4:
- moved and re-worked guest sbdf initializers
- s/set_bit/__set_bit
- s/clear_bit/__clear_bit
- minor comment fix s/Virtual/Guest/
- added VPCI_MAX_VIRT_DEV constant (PCI_SLOT(~0) + 1) which will be used
  later for counting the number of MMIO handlers required for a guest
  (Julien)
Since v3:
 - make use of VPCI_INIT
 - moved all new code to vpci.c which belongs to it
 - changed open-coded 31 to PCI_SLOT(~0)
 - added comments and code to reject multifunction devices with
   functions other than 0
 - updated comment about vpci_dev_next and made it unsigned int
 - implement roll back in case of error while assigning/deassigning devices
 - s/dom%pd/%pd
Since v2:
 - remove casts that are (a) malformed and (b) unnecessary
 - add new line for better readability
 - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
functions are now completely gated with this config
 - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/drivers/vpci/vpci.c | 74 +++--
 xen/include/xen/sched.h |  8 +
 xen/include/xen/vpci.h  | 11 ++
 3 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 3177f13c1c22..7d422d11f83d 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -89,6 +89,9 @@ int vpci_add_handlers(struct pci_dev *pdev)
 return -ENOMEM;
 
 INIT_LIST_HEAD(&vpci->handlers);
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+vpci->guest_sbdf.sbdf = ~0;
+#endif
 
 spin_lock(&pdev->vpci_lock);
 pdev->vpci = vpci;
@@ -114,6 +117,57 @@ int vpci_add_handlers(struct pci_dev *pdev)
 }
 
 #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+static int add_virtual_device(struct pci_dev *pdev)
+{
+struct domain *d = pdev->domain;
+pci_sbdf_t sbdf = { 0 };
+unsigned long new_dev_number;
+
+if ( is_hardware_domain(d) )
+return 0;
+
+ASSERT(pcidevs_locked());
+
+/*
+ * Each PCI bus supports 32 devices/slots at max or up to 256 when
+ * there are multi-function ones which are not yet supported.
+ */
+if ( pdev->info.is_extfn )
+{
+gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
+ &pdev->sbdf);
+return -EOPNOTSUPP;
+}
+
+new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map,
+ VPCI_MAX_VIRT_DEV);
+if ( new_dev_number >= VPCI_MAX_VIRT_DEV )
+return -ENOSPC;
+
+__set_bit(new_dev_number, &d->vpci_dev_assigned_map);
+
+/*
+ * Both segment and bus number are 0:
+ *  - we emulate a single host bridge for the guest, e.g. segment 0
+ *  - with bus 0 the virtual devices are seen as embedded
+ *endpoints behind the root complex
+ *
+ * TODO: add support for multi-function devices.
+ */
+sbdf.devfn = PCI_DEVFN(new_dev_number, 0);
+pdev->vpci->guest_sbdf = sbdf;
+
+return 0;
+
+}
+
+static void vpci_remove_virtual_device(struct domain *d,
+   const struct pci_dev *pdev)
+{
+__clear_bit(pdev->vpci->guest_sbdf.dev, &d->vpci_dev_assigned_map);
+pdev->vpci->guest_sbdf.sbdf = ~0;
+}
+
 /* Notify vPCI that device is assigned to guest. */
 int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
 {
@@ -124,8 +178,16 @@ int vpci_assign_device(struct domain *d, struct pci_dev 
*pdev)
 
 rc = vpci_add_handlers(pdev);
 if ( rc )
-vpci_deassign_device(d, pdev);
+goto fail;
+
+rc = add_virtual_device(pdev);
+if ( rc )
+goto fail;
 
+return 0;
+
+ fail:
+vpci_deassign_device(d, pdev);
 return rc;
 }
 
@@ -135,7 +197,15 @@ void vpci_deassign_device(struct domain *d, struct pci_dev 
*pdev)
 if ( !has_vpci(d) )
 return;
 
-vpci_remove_device(pdev);
+spin_lock(&pdev->vpci_lock);
+if ( !pdev->vpci )
+goto done;
+
+vpci_remove_virtual_device(d, pdev);
+vpci_remove_device_locked(pdev);
+
+ done:
+spin_unlock(&pdev->vpci_lock);
 }
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 37f78cc4c4c9..3c25e265eaa8 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -444,6 +444,14 @@ struct domain
 
 #i

[PATCH v6 10/13] vpci/header: reset the command register when adding devices

2022-02-03 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

Reset the command register when assigning a PCI device to a guest:
according to the PCI spec the PCI_COMMAND register is typically all 0's
after reset.

Signed-off-by: Oleksandr Andrushchenko 
---
Since v5:
- updated commit message
Since v1:
 - do not write 0 to the command register, but respect host settings.
---
 xen/drivers/vpci/header.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 33d8c15ae6e8..407fa2fc4749 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -454,8 +454,7 @@ static void cmd_write(const struct pci_dev *pdev, unsigned 
int reg,
 pci_conf_write16(pdev->sbdf, reg, cmd);
 }
 
-static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
-uint32_t cmd, void *data)
+static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)
 {
 /* TODO: Add proper emulation for all bits of the command register. */
 
@@ -467,7 +466,13 @@ static void guest_cmd_write(const struct pci_dev *pdev, 
unsigned int reg,
 }
 #endif
 
-cmd_write(pdev, reg, cmd, data);
+return cmd;
+}
+
+static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
+uint32_t cmd, void *data)
+{
+cmd_write(pdev, reg, emulate_cmd_reg(pdev, cmd), data);
 }
 
 static void bar_write(const struct pci_dev *pdev, unsigned int reg,
@@ -676,6 +681,10 @@ static int init_bars(struct pci_dev *pdev)
 return -EOPNOTSUPP;
 }
 
+/* Reset the command register for the guest. */
+if ( !is_hwdom )
+pci_conf_write16(pdev->sbdf, PCI_COMMAND, emulate_cmd_reg(pdev, 0));
+
 /* Setup a handler for the command register. */
 rc = vpci_add_register(pdev->vpci, vpci_hw_read16,
is_hwdom ? cmd_write : guest_cmd_write,
-- 
2.25.1




[PATCH v6 09/13] vpci/header: emulate PCI_COMMAND register for guests

2022-02-03 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

Add basic emulation support for guests. At the moment only emulate
PCI_COMMAND_INTX_DISABLE bit, the rest is not emulated yet and left
as TODO.

Signed-off-by: Oleksandr Andrushchenko 
---
Since v5:
- add additional check for MSI-X enabled while altering INTX bit
- make sure INTx disabled while guests enable MSI/MSI-X
Since v3:
- gate more code on CONFIG_HAS_MSI
- removed logic for the case when MSI/MSI-X not enabled
---
 xen/drivers/vpci/header.c | 21 +++--
 xen/drivers/vpci/msi.c|  4 
 xen/drivers/vpci/msix.c   |  4 
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 88ca1ad8211d..33d8c15ae6e8 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -454,6 +454,22 @@ static void cmd_write(const struct pci_dev *pdev, unsigned 
int reg,
 pci_conf_write16(pdev->sbdf, reg, cmd);
 }
 
+static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
+uint32_t cmd, void *data)
+{
+/* TODO: Add proper emulation for all bits of the command register. */
+
+#ifdef CONFIG_HAS_PCI_MSI
+if ( pdev->vpci->msi->enabled || pdev->vpci->msix->enabled )
+{
+/* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X 
enabled. */
+cmd |= PCI_COMMAND_INTX_DISABLE;
+}
+#endif
+
+cmd_write(pdev, reg, cmd, data);
+}
+
 static void bar_write(const struct pci_dev *pdev, unsigned int reg,
   uint32_t val, void *data)
 {
@@ -661,8 +677,9 @@ static int init_bars(struct pci_dev *pdev)
 }
 
 /* Setup a handler for the command register. */
-rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
-   2, header);
+rc = vpci_add_register(pdev->vpci, vpci_hw_read16,
+   is_hwdom ? cmd_write : guest_cmd_write,
+   PCI_COMMAND, 2, header);
 if ( rc )
 return rc;
 
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index e3ce46869dad..90465dcb4831 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -70,6 +70,10 @@ static void control_write(const struct pci_dev *pdev, 
unsigned int reg,
 
 if ( vpci_msi_arch_enable(msi, pdev, vectors) )
 return;
+
+/* Make sure guest doesn't enable INTx while enabling MSI. */
+if ( !is_hardware_domain(pdev->domain) )
+pci_intx(pdev, false);
 }
 else
 vpci_msi_arch_disable(msi, pdev);
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index d1dbfc6e0ffd..4c0e1836b589 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -92,6 +92,10 @@ static void control_write(const struct pci_dev *pdev, 
unsigned int reg,
 for ( i = 0; i < msix->max_entries; i++ )
 if ( !msix->entries[i].masked && msix->entries[i].updated )
 update_entry(&msix->entries[i], pdev, i);
+
+/* Make sure guest doesn't enable INTx while enabling MSI-X. */
+if ( !is_hardware_domain(pdev->domain) )
+pci_intx(pdev, false);
 }
 else if ( !new_enabled && msix->enabled )
 {
-- 
2.25.1




[PATCH v6 08/13] vpci/header: program p2m with guest BAR view

2022-02-03 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

Take into account guest's BAR view and program its p2m accordingly:
gfn is guest's view of the BAR and mfn is the physical BAR value as set
up by the PCI bus driver in the hardware domain.
This way hardware domain sees physical BAR values and guest sees
emulated ones.

Signed-off-by: Oleksandr Andrushchenko 
---
Since v5:
- remove debug print in map_range callback
- remove "identity" from the debug print
Since v4:
- moved start_{gfn|mfn} calculation into map_range
- pass vpci_bar in the map_data instead of start_{gfn|mfn}
- s/guest_addr/guest_reg
Since v3:
- updated comment (Roger)
- removed gfn_add(map->start_gfn, rc); which is wrong
- use v->domain instead of v->vpci.pdev->domain
- removed odd e.g. in comment
- s/d%d/%pd in altered code
- use gdprintk for map/unmap logs
Since v2:
- improve readability for data.start_gfn and restructure ?: construct
Since v1:
 - s/MSI/MSI-X in comments
---
 xen/drivers/vpci/header.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 0c94504b87d8..88ca1ad8211d 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -30,6 +30,7 @@
 
 struct map_data {
 struct domain *d;
+const struct vpci_bar *bar;
 bool map;
 };
 
@@ -41,8 +42,21 @@ static int map_range(unsigned long s, unsigned long e, void 
*data,
 
 for ( ; ; )
 {
+/* Start address of the BAR as seen by the guest. */
+gfn_t start_gfn = _gfn(PFN_DOWN(is_hardware_domain(map->d)
+? map->bar->addr
+: map->bar->guest_reg));
+/* Physical start address of the BAR. */
+mfn_t start_mfn = _mfn(PFN_DOWN(map->bar->addr));
 unsigned long size = e - s + 1;
 
+/*
+ * Ranges to be mapped don't always start at the BAR start address, as
+ * there can be holes or partially consumed ranges. Account for the
+ * offset of the current address from the BAR start.
+ */
+start_gfn = gfn_add(start_gfn, s - mfn_x(start_mfn));
+
 /*
  * ARM TODOs:
  * - On ARM whether the memory is prefetchable or not should be passed
@@ -52,8 +66,8 @@ static int map_range(unsigned long s, unsigned long e, void 
*data,
  * - {un}map_mmio_regions doesn't support preemption.
  */
 
-rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
-  : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
+rc = map->map ? map_mmio_regions(map->d, start_gfn, size, _mfn(s))
+  : unmap_mmio_regions(map->d, start_gfn, size, _mfn(s));
 if ( rc == 0 )
 {
 *c += size;
@@ -62,8 +76,8 @@ static int map_range(unsigned long s, unsigned long e, void 
*data,
 if ( rc < 0 )
 {
 printk(XENLOG_G_WARNING
-   "Failed to identity %smap [%lx, %lx] for d%d: %d\n",
-   map->map ? "" : "un", s, e, map->d->domain_id, rc);
+   "Failed to %smap [%lx, %lx] for %pd: %d\n",
+   map->map ? "" : "un", s, e, map->d, rc);
 break;
 }
 ASSERT(rc < size);
@@ -154,6 +168,7 @@ bool vpci_process_pending(struct vcpu *v)
 if ( rangeset_is_empty(bar->mem) )
 continue;
 
+data.bar = bar;
 rc = rangeset_consume_ranges(bar->mem, map_range, &data);
 
 if ( rc == -ERESTART )
@@ -206,6 +221,7 @@ static int __init apply_map(struct domain *d, const struct 
pci_dev *pdev,
 if ( rangeset_is_empty(bar->mem) )
 continue;
 
+data.bar = bar;
 while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
   &data)) == -ERESTART )
 process_pending_softirqs();
-- 
2.25.1




[PATCH v6 06/13] vpci/header: implement guest BAR register handlers

2022-02-03 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

Add relevant vpci register handlers when assigning PCI device to a domain
and remove those when de-assigning. This allows having different
handlers for different domains, e.g. hwdom and other guests.

Emulate guest BAR register values: this allows creating a guest view
of the registers and emulates size and properties probe as it is done
during PCI device enumeration by the guest.

All empty, IO and ROM BARs for guests are emulated by returning 0 on
reads and ignoring writes: this BARs are special with this respect as
their lower bits have special meaning, so returning default ~0 on read
may confuse guest OS.

Memory decoding is initially disabled when used by guests in order to
prevent the BAR being placed on top of a RAM region.

Signed-off-by: Oleksandr Andrushchenko 
---
Since v5:
- make sure that the guest set address has the same page offset
  as the physical address on the host
- remove guest_rom_{read|write} as those just implement the default
  behaviour of the registers not being handled
- adjusted comment for struct vpci.addr field
- add guest handlers for BARs which are not handled and will otherwise
  return ~0 on read and ignore writes. The BARs are special with this
  respect as their lower bits have special meaning, so returning ~0
  doesn't seem to be right
Since v4:
- updated commit message
- s/guest_addr/guest_reg
Since v3:
- squashed two patches: dynamic add/remove handlers and guest BAR
  handler implementation
- fix guest BAR read of the high part of a 64bit BAR (Roger)
- add error handling to vpci_assign_device
- s/dom%pd/%pd
- blank line before return
Since v2:
- remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
  has been eliminated from being built on x86
Since v1:
 - constify struct pci_dev where possible
 - do not open code is_system_domain()
 - simplify some code3. simplify
 - use gdprintk + error code instead of gprintk
 - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
   so these do not get compiled for x86
 - removed unneeded is_system_domain check
 - re-work guest read/write to be much simpler and do more work on write
   than read which is expected to be called more frequently
 - removed one too obvious comment
---
 xen/drivers/vpci/header.c | 131 +-
 xen/include/xen/vpci.h|   3 +
 2 files changed, 118 insertions(+), 16 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index bd23c0274d48..2620a95ff35b 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -406,6 +406,81 @@ static void bar_write(const struct pci_dev *pdev, unsigned 
int reg,
 pci_conf_write32(pdev->sbdf, reg, val);
 }
 
+static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
+uint32_t val, void *data)
+{
+struct vpci_bar *bar = data;
+bool hi = false;
+uint64_t guest_reg = bar->guest_reg;
+
+if ( bar->type == VPCI_BAR_MEM64_HI )
+{
+ASSERT(reg > PCI_BASE_ADDRESS_0);
+bar--;
+hi = true;
+}
+else
+{
+val &= PCI_BASE_ADDRESS_MEM_MASK;
+val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32
+   : PCI_BASE_ADDRESS_MEM_TYPE_64;
+val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
+}
+
+guest_reg &= ~(0xull << (hi ? 32 : 0));
+guest_reg |= (uint64_t)val << (hi ? 32 : 0);
+
+guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;
+
+/*
+ * Make sure that the guest set address has the same page offset
+ * as the physical address on the host or otherwise things won't work as
+ * expected.
+ */
+if ( (guest_reg & (~PAGE_MASK & PCI_BASE_ADDRESS_MEM_MASK)) !=
+ (bar->addr & ~PAGE_MASK) )
+{
+gprintk(XENLOG_WARNING,
+"%pp: ignored BAR %zu write with wrong page offset\n",
+&pdev->sbdf, bar - pdev->vpci->header.bars + hi);
+return;
+}
+
+bar->guest_reg = guest_reg;
+}
+
+static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
+   void *data)
+{
+const struct vpci_bar *bar = data;
+bool hi = false;
+
+if ( bar->type == VPCI_BAR_MEM64_HI )
+{
+ASSERT(reg > PCI_BASE_ADDRESS_0);
+bar--;
+hi = true;
+}
+
+return bar->guest_reg >> (hi ? 32 : 0);
+}
+
+static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev,
+  unsigned int reg, void *data)
+{
+return 0;
+}
+
+static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg,
+ struct vpci_bar *bar)
+{
+if ( is_hardware_domain(pdev->domain) )
+return 0;
+
+return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL,
+ reg, 4, bar);
+}
+
 static void rom_write(const struct pci_dev *pd

[PATCH v6 04/13] vpci: restrict unhandled read/write operations for guests

2022-02-03 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

A guest can read and write those registers which are not emulated and
have no respective vPCI handlers, so it can access the HW directly.
In order to prevent a guest from reads and writes from/to the unhandled
registers make sure only hardware domain can access HW directly and restrict
guests from doing so.

Suggested-by: Roger Pau Monné 
Signed-off-by: Oleksandr Andrushchenko 

---
New in v6
---
 xen/drivers/vpci/vpci.c | 32 +---
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index cb2ababa28e3..f8a93e61c08f 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -215,11 +215,15 @@ int vpci_remove_register(struct vpci *vpci, unsigned int 
offset,
 }
 
 /* Wrappers for performing reads/writes to the underlying hardware. */
-static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned int reg,
+static uint32_t vpci_read_hw(bool is_hwdom, pci_sbdf_t sbdf, unsigned int reg,
  unsigned int size)
 {
 uint32_t data;
 
+/* Guest domains are not allowed to read real hardware. */
+if ( !is_hwdom )
+return ~(uint32_t)0;
+
 switch ( size )
 {
 case 4:
@@ -260,9 +264,13 @@ static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned int 
reg,
 return data;
 }
 
-static void vpci_write_hw(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
-  uint32_t data)
+static void vpci_write_hw(bool is_hwdom, pci_sbdf_t sbdf, unsigned int reg,
+  unsigned int size, uint32_t data)
 {
+/* Guest domains are not allowed to write real hardware. */
+if ( !is_hwdom )
+return;
+
 switch ( size )
 {
 case 4:
@@ -322,6 +330,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, 
unsigned int size)
 const struct vpci_register *r;
 unsigned int data_offset = 0;
 uint32_t data = ~(uint32_t)0;
+bool is_hwdom = is_hardware_domain(d);
 
 if ( !size )
 {
@@ -332,13 +341,13 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, 
unsigned int size)
 /* Find the PCI dev matching the address. */
 pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
 if ( !pdev )
-return vpci_read_hw(sbdf, reg, size);
+return vpci_read_hw(is_hwdom, sbdf, reg, size);
 
 spin_lock(&pdev->vpci_lock);
 if ( !pdev->vpci )
 {
 spin_unlock(&pdev->vpci_lock);
-return vpci_read_hw(sbdf, reg, size);
+return vpci_read_hw(is_hwdom, sbdf, reg, size);
 }
 
 /* Read from the hardware or the emulated register handlers. */
@@ -361,7 +370,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, 
unsigned int size)
 {
 /* Heading gap, read partial content from hardware. */
 read_size = r->offset - emu.offset;
-val = vpci_read_hw(sbdf, emu.offset, read_size);
+val = vpci_read_hw(is_hwdom, sbdf, emu.offset, read_size);
 data = merge_result(data, val, read_size, data_offset);
 data_offset += read_size;
 }
@@ -387,7 +396,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, 
unsigned int size)
 if ( data_offset < size )
 {
 /* Tailing gap, read the remaining. */
-uint32_t tmp_data = vpci_read_hw(sbdf, reg + data_offset,
+uint32_t tmp_data = vpci_read_hw(is_hwdom, sbdf, reg + data_offset,
  size - data_offset);
 
 data = merge_result(data, tmp_data, size - data_offset, data_offset);
@@ -430,6 +439,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned 
int size,
 const struct vpci_register *r;
 unsigned int data_offset = 0;
 const unsigned long *ro_map = pci_get_ro_map(sbdf.seg);
+bool is_hwdom = is_hardware_domain(d);
 
 if ( !size )
 {
@@ -448,7 +458,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned 
int size,
 pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
 if ( !pdev )
 {
-vpci_write_hw(sbdf, reg, size, data);
+vpci_write_hw(is_hwdom, sbdf, reg, size, data);
 return;
 }
 
@@ -456,7 +466,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned 
int size,
 if ( !pdev->vpci )
 {
 spin_unlock(&pdev->vpci_lock);
-vpci_write_hw(sbdf, reg, size, data);
+vpci_write_hw(is_hwdom, sbdf, reg, size, data);
 return;
 }
 
@@ -479,7 +489,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned 
int size,
 if ( emu.offset < r->offset )
 {
 /* Heading gap, write partial content to hardware. */
-vpci_write_hw(sbdf, emu.offset, r->offset - emu.offset,
+vpci_write_hw(is_hwdom, sbdf, emu.offset, r->offset - emu.offset,
   data >> (data_offset * 8));
 data_offset += r->offset - emu.offset;
 }
@@ -498

[PATCH v6 07/13] vpci/header: handle p2m range sets per BAR

2022-02-03 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

Instead of handling a single range set, that contains all the memory
regions of all the BARs and ROM, have them per BAR.
As the range sets are now created when a PCI device is added and destroyed
when it is removed so make them named and accounted.

Note that rangesets were chosen here despite there being only up to
3 separate ranges in each set (typically just 1). But rangeset per BAR
was chosen for the ease of implementation and existing code re-usability.

This is in preparation of making non-identity mappings in p2m for the MMIOs.

Signed-off-by: Oleksandr Andrushchenko 

---
Since v5:
- fix comments
- move rangeset allocation to init_bars and only allocate
  for MAPPABLE BARs
- check for overlap with the already setup BAR ranges
Since v4:
- use named range sets for BARs (Jan)
- changes required by the new locking scheme
- updated commit message (Jan)
Since v3:
- re-work vpci_cancel_pending accordingly to the per-BAR handling
- s/num_mem_ranges/map_pending and s/uint8_t/bool
- ASSERT(bar->mem) in modify_bars
- create and destroy the rangesets on add/remove
---
 xen/drivers/vpci/header.c | 213 --
 xen/drivers/vpci/vpci.c   |  17 ++-
 xen/include/xen/vpci.h|   4 +-
 3 files changed, 177 insertions(+), 57 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 2620a95ff35b..0c94504b87d8 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -131,50 +131,85 @@ static void modify_decoding(const struct pci_dev *pdev, 
uint16_t cmd,
 
 bool vpci_process_pending(struct vcpu *v)
 {
-if ( v->vpci.mem )
+struct pci_dev *pdev = v->vpci.pdev;
+
+if ( !pdev )
+return false;
+
+spin_lock(&pdev->vpci_lock);
+if ( v->vpci.map_pending )
 {
 struct map_data data = {
 .d = v->domain,
 .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
 };
-int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
+struct vpci_header *header = &pdev->vpci->header;
+unsigned int i;
+
+for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+{
+struct vpci_bar *bar = &header->bars[i];
+int rc;
+
+if ( rangeset_is_empty(bar->mem) )
+continue;
 
-if ( rc == -ERESTART )
-return true;
+rc = rangeset_consume_ranges(bar->mem, map_range, &data);
+
+if ( rc == -ERESTART )
+{
+spin_unlock(&pdev->vpci_lock);
+return true;
+}
 
-spin_lock(&v->vpci.pdev->vpci_lock);
-if ( v->vpci.pdev->vpci )
 /* Disable memory decoding unconditionally on failure. */
-modify_decoding(v->vpci.pdev,
-rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : 
v->vpci.cmd,
-!rc && v->vpci.rom_only);
-spin_unlock(&v->vpci.pdev->vpci_lock);
+modify_decoding(pdev, rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY :
+   v->vpci.cmd, !rc && v->vpci.rom_only);
+
+if ( rc )
+{
+/*
+ * FIXME: in case of failure remove the device from the domain.
+ * Note that there might still be leftover mappings. While this
+ * is safe for Dom0, for DomUs the domain needs to be killed in
+ * order to avoid leaking stale p2m mappings on failure.
+ */
+if ( is_hardware_domain(v->domain) )
+vpci_remove_device_locked(pdev);
+else
+domain_crash(v->domain);
+
+break;
+}
+}
+
+v->vpci.map_pending = false;
 
-rangeset_destroy(v->vpci.mem);
-v->vpci.mem = NULL;
-if ( rc )
-/*
- * FIXME: in case of failure remove the device from the domain.
- * Note that there might still be leftover mappings. While this is
- * safe for Dom0, for DomUs the domain will likely need to be
- * killed in order to avoid leaking stale p2m mappings on
- * failure.
- */
-vpci_remove_device(v->vpci.pdev);
 }
+spin_unlock(&pdev->vpci_lock);
 
 return false;
 }
 
 static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
-struct rangeset *mem, uint16_t cmd)
+uint16_t cmd)
 {
 struct map_data data = { .d = d, .map = true };
-int rc;
+struct vpci_header *header = &pdev->vpci->header;
+int rc = 0;
+unsigned int i;
+
+for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+{
+struct vpci_bar *bar = &header->bars[i];
 
-while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART 
)
-process_pending_softirqs();
-rangeset_destroy(mem);
+if ( rangeset_i

[PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole

2022-02-03 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

Add a stub for is_memory_hole which is required for PCI passthrough
on Arm.

Signed-off-by: Oleksandr Andrushchenko 

---
Cc: Julien Grall 
Cc: Stefano Stabellini 
---
New in v6
---
 xen/arch/arm/mm.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index b1eae767c27c..c32e34a182a2 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1640,6 +1640,12 @@ unsigned long get_upper_mfn_bound(void)
 return max_page - 1;
 }
 
+bool is_memory_hole(mfn_t start, mfn_t end)
+{
+/* TODO: this needs to be properly implemented. */
+return true;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.25.1




[PATCH v6 05/13] vpci: add hooks for PCI device assign/de-assign

2022-02-03 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

When a PCI device gets assigned/de-assigned some work on vPCI side needs
to be done for that device. Introduce a pair of hooks so vPCI can handle
that.

Signed-off-by: Oleksandr Andrushchenko 
---
Since v5:
- do not split code into run_vpci_init
- do not check for is_system_domain in vpci_{de}assign_device
- do not use vpci_remove_device_handlers_locked and re-allocate
  pdev->vpci completely
- make vpci_deassign_device void
Since v4:
 - de-assign vPCI from the previous domain on device assignment
 - do not remove handlers in vpci_assign_device as those must not
   exist at that point
Since v3:
 - remove toolstack roll-back description from the commit message
   as error are to be handled with proper cleanup in Xen itself
 - remove __must_check
 - remove redundant rc check while assigning devices
 - fix redundant CONFIG_HAS_VPCI check for CONFIG_HAS_VPCI_GUEST_SUPPORT
 - use REGISTER_VPCI_INIT machinery to run required steps on device
   init/assign: add run_vpci_init helper
Since v2:
- define CONFIG_HAS_VPCI_GUEST_SUPPORT so dead code is not compiled
  for x86
Since v1:
 - constify struct pci_dev where possible
 - do not open code is_system_domain()
 - extended the commit message
---
 xen/drivers/Kconfig   |  4 
 xen/drivers/passthrough/pci.c |  6 ++
 xen/drivers/vpci/vpci.c   | 27 +++
 xen/include/xen/vpci.h| 15 +++
 4 files changed, 52 insertions(+)

diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
index db94393f47a6..780490cf8e39 100644
--- a/xen/drivers/Kconfig
+++ b/xen/drivers/Kconfig
@@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
 config HAS_VPCI
bool
 
+config HAS_VPCI_GUEST_SUPPORT
+   bool
+   depends on HAS_VPCI
+
 endmenu
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 50dec3bb73d0..88836aab6baf 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -943,6 +943,8 @@ static int deassign_device(struct domain *d, uint16_t seg, 
uint8_t bus,
 if ( ret )
 goto out;
 
+vpci_deassign_device(d, pdev);
+
 if ( pdev->domain == hardware_domain  )
 pdev->quarantine = false;
 
@@ -1488,6 +1490,8 @@ static int assign_device(struct domain *d, u16 seg, u8 
bus, u8 devfn, u32 flag)
 ASSERT(pdev && (pdev->domain == hardware_domain ||
 pdev->domain == dom_io));
 
+vpci_deassign_device(pdev->domain, pdev);
+
 rc = pdev_msix_assign(d, pdev);
 if ( rc )
 goto done;
@@ -1507,6 +1511,8 @@ static int assign_device(struct domain *d, u16 seg, u8 
bus, u8 devfn, u32 flag)
 pci_to_dev(pdev), flag);
 }
 
+rc = vpci_assign_device(d, pdev);
+
  done:
 if ( rc )
 printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index f8a93e61c08f..4e774875fa04 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -99,6 +99,33 @@ int vpci_add_handlers(struct pci_dev *pdev)
 
 return rc;
 }
+
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+/* Notify vPCI that device is assigned to guest. */
+int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
+{
+int rc;
+
+if ( !has_vpci(d) )
+return 0;
+
+rc = vpci_add_handlers(pdev);
+if ( rc )
+vpci_deassign_device(d, pdev);
+
+return rc;
+}
+
+/* Notify vPCI that device is de-assigned from guest. */
+void vpci_deassign_device(struct domain *d, struct pci_dev *pdev)
+{
+if ( !has_vpci(d) )
+return;
+
+vpci_remove_device(pdev);
+}
+#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index f2a7d82ce77b..246307e6f5d5 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -251,6 +251,21 @@ static inline bool __must_check 
vpci_process_pending(struct vcpu *v)
 }
 #endif
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+/* Notify vPCI that device is assigned/de-assigned to/from guest. */
+int vpci_assign_device(struct domain *d, struct pci_dev *pdev);
+void vpci_deassign_device(struct domain *d, struct pci_dev *pdev);
+#else
+static inline int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
+{
+return 0;
+};
+
+static inline void vpci_deassign_device(struct domain *d, struct pci_dev *pdev)
+{
+};
+#endif
+
 #endif
 
 /*
-- 
2.25.1




[PATCH v6 00/13] PCI devices passthrough on Arm, part 3

2022-02-03 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

Hi, all!

1. This patch series is focusing on vPCI and adds support for non-identity
PCI BAR mappings which is required while passing through a PCI device to
a guest. The highlights are:

- Add relevant vpci register handlers when assigning PCI device to a domain
  and remove those when de-assigning. This allows having different
  handlers for different domains, e.g. hwdom and other guests.

- Emulate guest BAR register values based on physical BAR values.
  This allows creating a guest view of the registers and emulates
  size and properties probe as it is done during PCI device enumeration by
  the guest.

- Instead of handling a single range set, that contains all the memory
  regions of all the BARs and ROM, have them per BAR.

- Take into account guest's BAR view and program its p2m accordingly:
  gfn is guest's view of the BAR and mfn is the physical BAR value as set
  up by the host bridge in the hardware domain.
  This way hardware doamin sees physical BAR values and guest sees
  emulated ones.

2. The series also adds support for virtual PCI bus topology for guests:
 - We emulate a single host bridge for the guest, so segment is always 0.
 - The implementation is limited to 32 devices which are allowed on
   a single PCI bus.
 - The virtual bus number is set to 0, so virtual devices are seen
   as embedded endpoints behind the root complex.

3. The series has complete re-work of the locking scheme used/absent before with
the help of the work started by Roger [1]:
[PATCH v6 03/13] vpci: move lock outside of struct vpci

This way the lock can be used to check whether vpci is present, and
removal can be performed while holding the lock, in order to make
sure there are no accesses to the contents of the vpci struct.
Previously removal could race with vpci_read for example, since the
lock was dropped prior to freeing pdev->vpci.
This also solves synchronization issues between all vPCI code entities
which could run in parallel.

4. For unprivileged guests vpci_{read|write} has been re-worked
to not passthrough accesses to the registers not explicitly handled
by the corresponding vPCI handlers: without that passthrough
to guests is completely unsafe as Xen allows them full access to
the registers.
During development this can be reverted for debugging purposes.

5. The series was also tested on:
 - x86 PVH Dom0 and doesn't break it.
 - x86 HVM with PCI passthrough to DomU and doesn't break it.
 - Arm

Thank you,
Oleksandr

[1] 
https://lore.kernel.org/xen-devel/20180717094830.54806-2-roger@citrix.com/

Oleksandr Andrushchenko (12):
  xen/pci: arm: add stub for is_memory_hole
  rangeset: add RANGESETF_no_print flag
  vpci: restrict unhandled read/write operations for guests
  vpci: add hooks for PCI device assign/de-assign
  vpci/header: implement guest BAR register handlers
  vpci/header: handle p2m range sets per BAR
  vpci/header: program p2m with guest BAR view
  vpci/header: emulate PCI_COMMAND register for guests
  vpci/header: reset the command register when adding devices
  vpci: add initial support for virtual PCI bus topology
  xen/arm: translate virtual PCI bus topology for guests
  xen/arm: account IO handlers for emulated PCI MSI-X

Roger Pau Monné (1):
  vpci: move lock outside of struct vpci

 tools/tests/vpci/emul.h   |   5 +-
 tools/tests/vpci/main.c   |   3 +-
 xen/arch/arm/mm.c |   6 +
 xen/arch/arm/vpci.c   |  31 ++-
 xen/arch/x86/hvm/vmsi.c   |   8 +-
 xen/common/rangeset.c |   5 +-
 xen/drivers/Kconfig   |   4 +
 xen/drivers/passthrough/pci.c |   7 +
 xen/drivers/vpci/header.c | 407 +++---
 xen/drivers/vpci/msi.c|  15 +-
 xen/drivers/vpci/msix.c   |  43 +++-
 xen/drivers/vpci/vpci.c   | 232 ---
 xen/include/xen/pci.h |   1 +
 xen/include/xen/rangeset.h|   5 +-
 xen/include/xen/sched.h   |   8 +
 xen/include/xen/vpci.h|  43 +++-
 16 files changed, 688 insertions(+), 135 deletions(-)

-- 
2.25.1




[PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-03 Thread Oleksandr Andrushchenko
From: Roger Pau Monné 

This way the lock can be used (and in a few cases is used right away)
to check whether vpci is present, and removal can be performed while
holding the lock, in order to make sure there are no accesses to the
contents of the vpci struct.
Previously removal could race with vpci_read for example, since the
lock was dropped prior to freeing pdev->vpci.

Signed-off-by: Roger Pau Monné 
Signed-off-by: Oleksandr Andrushchenko 
---
Cc: Andrew Cooper 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Stefano Stabellini 
---
New in v5 of this series: this is an updated version of the patch published at
https://lore.kernel.org/xen-devel/20180717094830.54806-2-roger@citrix.com/

Changes since v5:
 - vpci_lock in test_pdev is already initialized to false by default
 - introduce msix_{get|put} to protect former msix_find's result
 - add comments to vpci_{add|remove}_registers about pdev->vpci_lock must
   be held.
 - do not split code into vpci_remove_device_handlers_locked yet
 - move INIT_LIST_HEAD outside the locked region (Jan)
 - stripped out locking optimizations for vpci_{read|write} into a
   dedicated patch
Changes since v2:
 - fixed pdev->vpci = xzalloc(struct vpci); under spin_lock (Jan)
Changes since v1:
 - Assert that vpci_lock is locked in vpci_remove_device_locked.
 - Remove double newline.
 - Shrink critical section in vpci_{read/write}.
---
 tools/tests/vpci/emul.h   |  5 ++-
 tools/tests/vpci/main.c   |  3 +-
 xen/arch/x86/hvm/vmsi.c   |  8 ++---
 xen/drivers/passthrough/pci.c |  1 +
 xen/drivers/vpci/header.c | 21 +++
 xen/drivers/vpci/msi.c| 11 --
 xen/drivers/vpci/msix.c   | 39 -
 xen/drivers/vpci/vpci.c   | 65 ++-
 xen/include/xen/pci.h |  1 +
 xen/include/xen/vpci.h|  3 +-
 10 files changed, 106 insertions(+), 51 deletions(-)

diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
index 2e1d3057c9d8..d018fb5eef21 100644
--- a/tools/tests/vpci/emul.h
+++ b/tools/tests/vpci/emul.h
@@ -44,6 +44,7 @@ struct domain {
 };
 
 struct pci_dev {
+bool vpci_lock;
 struct vpci *vpci;
 };
 
@@ -53,10 +54,8 @@ struct vcpu
 };
 
 extern const struct vcpu *current;
-extern const struct pci_dev test_pdev;
+extern struct pci_dev test_pdev;
 
-typedef bool spinlock_t;
-#define spin_lock_init(l) (*(l) = false)
 #define spin_lock(l) (*(l) = true)
 #define spin_unlock(l) (*(l) = false)
 
diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
index b9a0a6006bb9..3b86ed232eb1 100644
--- a/tools/tests/vpci/main.c
+++ b/tools/tests/vpci/main.c
@@ -23,7 +23,7 @@ static struct vpci vpci;
 
 const static struct domain d;
 
-const struct pci_dev test_pdev = {
+struct pci_dev test_pdev = {
 .vpci = &vpci,
 };
 
@@ -158,7 +158,6 @@ main(int argc, char **argv)
 int rc;
 
 INIT_LIST_HEAD(&vpci.handlers);
-spin_lock_init(&vpci.lock);
 
 VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
 VPCI_READ_CHECK(0, 4, r0);
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 13e2a190b439..1f7a37f78264 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -910,14 +910,14 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
 {
 struct pci_dev *pdev = msix->pdev;
 
-spin_unlock(&msix->pdev->vpci->lock);
+spin_unlock(&msix->pdev->vpci_lock);
 process_pending_softirqs();
 /* NB: we assume that pdev cannot go away for an alive domain. */
-if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
+if ( !spin_trylock(&pdev->vpci_lock) )
 return -EBUSY;
-if ( pdev->vpci->msix != msix )
+if ( !pdev->vpci || pdev->vpci->msix != msix )
 {
-spin_unlock(&pdev->vpci->lock);
+spin_unlock(&pdev->vpci_lock);
 return -EAGAIN;
 }
 }
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e8b09d77d880..50dec3bb73d0 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -397,6 +397,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 
bus, u8 devfn)
 *((u8*) &pdev->bus) = bus;
 *((u8*) &pdev->devfn) = devfn;
 pdev->domain = NULL;
+spin_lock_init(&pdev->vpci_lock);
 
 arch_pci_init_pdev(pdev);
 
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 40ff79c33f8f..bd23c0274d48 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -142,12 +142,13 @@ bool vpci_process_pending(struct vcpu *v)
 if ( rc == -ERESTART )
 return true;
 
-spin_lock(&v->vpci.pdev->vpci->lock);
-/* Disable memory decoding unconditionally on failure. */
-modify_decoding(v->vpci.pdev,
-rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
-!rc && v->vpci.rom_only);
-   

[PATCH v6 02/13] rangeset: add RANGESETF_no_print flag

2022-02-03 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

There are range sets which should not be printed, so introduce a flag
which allows marking those as such. Implement relevant logic to skip
such entries while printing.

While at it also simplify the definition of the flags by directly
defining those without helpers.

Suggested-by: Jan Beulich 
Signed-off-by: Oleksandr Andrushchenko 
Reviewed-by: Jan Beulich 
---
Since v5:
- comment indentation (Jan)
Since v1:
- update BUG_ON with new flag
- simplify the definition of the flags
---
 xen/common/rangeset.c  | 5 -
 xen/include/xen/rangeset.h | 5 +++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index 885b6b15c229..ea27d651723b 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -433,7 +433,7 @@ struct rangeset *rangeset_new(
 INIT_LIST_HEAD(&r->range_list);
 r->nr_ranges = -1;
 
-BUG_ON(flags & ~RANGESETF_prettyprint_hex);
+BUG_ON(flags & ~(RANGESETF_prettyprint_hex | RANGESETF_no_print));
 r->flags = flags;
 
 safe_strcpy(r->name, name ?: "(no name)");
@@ -575,6 +575,9 @@ void rangeset_domain_printk(
 
 list_for_each_entry ( r, &d->rangesets, rangeset_list )
 {
+if ( r->flags & RANGESETF_no_print )
+continue;
+
 printk("");
 rangeset_printk(r);
 printk("\n");
diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
index 135f33f6066f..f7c69394d66a 100644
--- a/xen/include/xen/rangeset.h
+++ b/xen/include/xen/rangeset.h
@@ -49,8 +49,9 @@ void rangeset_limit(
 
 /* Flags for passing to rangeset_new(). */
  /* Pretty-print range limits in hexadecimal. */
-#define _RANGESETF_prettyprint_hex 0
-#define RANGESETF_prettyprint_hex  (1U << _RANGESETF_prettyprint_hex)
+#define RANGESETF_prettyprint_hex   (1U << 0)
+ /* Do not print entries marked with this flag. */
+#define RANGESETF_no_print  (1U << 1)
 
 bool_t __must_check rangeset_is_empty(
 const struct rangeset *r);
-- 
2.25.1




[xen-4.16-testing test] 167997: tolerable FAIL - PUSHED

2022-02-03 Thread osstest service owner
flight 167997 xen-4.16-testing real [real]
flight 168005 xen-4.16-testing real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/167997/
http://logs.test-lab.xenproject.org/osstest/logs/168005/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass 
in 168005-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167894
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167894
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 167894
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167894
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167894
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167894
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 167894
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 167894
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167894
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 167894
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167894
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 167894
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-su

[xen-4.15-testing test] 167996: tolerable FAIL - PUSHED

2022-02-03 Thread osstest service owner
flight 167996 xen-4.15-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/167996/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail blocked in 
167965
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167965
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167965
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 167965
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167965
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167965
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167965
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 167965
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167965
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 167965
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167965
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 167965
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  3a9450fe5eb0fda8b7069f37d21ce2655bb59da6
baseline version:
 xen  32dcef0

[linux-linus test] 167993: regressions - FAIL

2022-02-03 Thread osstest service owner
flight 167993 linux-linus real [real]
flight 168002 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/167993/
http://logs.test-lab.xenproject.org/osstest/logs/168002/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. 
vs. 167988

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167988
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 167988
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167988
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167988
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167988
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167988
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 167988
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 167988
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 linux88808fbbead481aedb46640a5ace69c58287f56a
baseline version:
 linux27bb0b18c208ecd4c0deda6aad28616d73e4133d

Last test of basis   167988  2022-02-02 18:11:17 Z1 days
Testing same since   167993  2022-02-03 04:34:02 Z0 days1 attempts


People who touched revisions under test:
  Chuck Lever 
  Dai Ngo 
  Dan Carpenter 
  J. Bruce Fields 
  Jan Kara 
  Linus Torvalds 

jobs:
 build-amd64-xsm   

[xen-unstable test] 167994: tolerable FAIL

2022-02-03 Thread osstest service owner
flight 167994 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/167994/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat  fail pass in 167990

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167990
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 167990
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167990
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167990
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 167990
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 167990
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 167990
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167990
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167990
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 167990
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167990
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167990
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targe

[PATCH] tools/guest: Fix comment regarding CPUID compatibility

2022-02-03 Thread Andrew Cooper
It was Xen 4.14 where CPUID data was added to the migration stream, and 4.13
that we need to worry about with regards to compatibility.  Xen 4.12 isn't
relevant.

Expand and correct the commentary.

Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore")
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Anthony PERARD 
CC: Juergen Gross 

Roger, this also has a knock-on effect on your CPUID series.  The 4.12 had
been nagging me for a while before I figured out why.
---
 tools/libs/guest/xg_cpuid_x86.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
index b9e827ce7eb0..57f81eb0a082 100644
--- a/tools/libs/guest/xg_cpuid_x86.c
+++ b/tools/libs/guest/xg_cpuid_x86.c
@@ -497,9 +497,19 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t 
domid, bool restore,
 if ( restore )
 {
 /*
- * Account for feature which have been disabled by default since Xen 
4.13,
- * so migrated-in VM's don't risk seeing features disappearing.
+ * Xen 4.14 introduced support to move the guest's CPUID data in the
+ * migration stream.  Previously, the destination side would invent a
+ * policy out of thin air in the hopes that it was ok.
+ *
+ * This restore path is used for incoming VMs with no CPUID data
+ * i.e. originated on Xen 4.13 or earlier.  We must invent a policy
+ * compatible with what Xen 4.13 would have done on the same hardware.
+ *
+ * Specifically:
+ * - Clamp max leaves.
+ * - Re-enable features which have become (possibly) off by default.
  */
+
 p->basic.rdrand = test_bit(X86_FEATURE_RDRAND, host_featureset);
 p->feat.hle = test_bit(X86_FEATURE_HLE, host_featureset);
 p->feat.rtm = test_bit(X86_FEATURE_RTM, host_featureset);
@@ -509,7 +519,6 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t 
domid, bool restore,
 p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset);
 }
 
-/* Clamp maximum leaves to the ones supported on 4.12. */
 p->basic.max_leaf = min(p->basic.max_leaf, 0xdu);
 p->feat.max_subleaf = 0;
 p->extd.max_leaf = min(p->extd.max_leaf, 0x801c);
-- 
2.11.0




[xen-unstable-smoke test] 167999: tolerable all pass - PUSHED

2022-02-03 Thread osstest service owner
flight 167999 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/167999/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  75cc460a1b8cfd8e5d2c4302234ee194defe4872
baseline version:
 xen  b17e0ec72eded037297f34a233655aad23f64711

Last test of basis   167985  2022-02-02 10:00:30 Z1 days
Testing same since   167999  2022-02-03 13:01:51 Z0 days1 attempts


People who touched revisions under test:
  Oleksandr Andrushchenko 
  Roger Pau Monné 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   b17e0ec72e..75cc460a1b  75cc460a1b8cfd8e5d2c4302234ee194defe4872 -> smoke



Re: [PATCH] xen: Modify domain_crash() to take a print string

2022-02-03 Thread Jan Beulich
On 03.02.2022 15:41, Andrew Cooper wrote:
> On 03/02/2022 14:19, Jan Beulich wrote:
>> On 03.02.2022 15:11, Andrew Cooper wrote:
>>> On 03/02/2022 13:48, Julien Grall wrote:
 On 03/02/2022 13:38, Andrew Cooper wrote:
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 37f78cc4c4c9..38b390d20371 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v);
>    * from any processor.
>    */
>   void __domain_crash(struct domain *d);
> -#define domain_crash(d) do
> {  \
> -    printk("domain_crash called from %s:%d\n", __FILE__,
> __LINE__);   \
> -   
> __domain_crash(d);    \
> -} while (0)
> +#define domain_crash(d, ...)    \
> +    do {    \
> +    if ( count_args(__VA_ARGS__) == 0 ) \
> +    printk("domain_crash called from %s:%d\n",  \
> +   __FILE__, __LINE__); \
 I find a bit odd that here you are using a normal printk
>>> That's unmodified from before.  Only reformatted.
>>>
 but...


> +    else    \
> +    printk(XENLOG_G_ERR __VA_ARGS__);   \
 here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so,
 wouldn't it be better to only use XENLOG_ERR so they can always be
 seen? (A domain shouldn't be able to abuse it).
>>> Perhaps.  I suppose it is more important information than pretty much
>>> anything else about the guest.
>> Indeed, but then - is this really an error in all cases?
> 
> Yes.  It is always a fatal event for the VM.

Which may or may not be Xen's fault. If the guest put itself in a bad
state, I don't see why we shouldn't consider such just a warning. IOW
I continue to think a log level, if so wanted, should be supplied by
the user of the macro.

Jan




Re: [PATCH] xen: Modify domain_crash() to take a print string

2022-02-03 Thread Andrew Cooper
On 03/02/2022 14:19, Jan Beulich wrote:
> On 03.02.2022 15:11, Andrew Cooper wrote:
>> On 03/02/2022 13:48, Julien Grall wrote:
>>> On 03/02/2022 13:38, Andrew Cooper wrote:
 diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
 index 37f78cc4c4c9..38b390d20371 100644
 --- a/xen/include/xen/sched.h
 +++ b/xen/include/xen/sched.h
 @@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v);
    * from any processor.
    */
   void __domain_crash(struct domain *d);
 -#define domain_crash(d) do
 {  \
 -    printk("domain_crash called from %s:%d\n", __FILE__,
 __LINE__);   \
 -   
 __domain_crash(d);    \
 -} while (0)
 +#define domain_crash(d, ...)    \
 +    do {    \
 +    if ( count_args(__VA_ARGS__) == 0 ) \
 +    printk("domain_crash called from %s:%d\n",  \
 +   __FILE__, __LINE__); \
>>> I find a bit odd that here you are using a normal printk
>> That's unmodified from before.  Only reformatted.
>>
>>> but...
>>>
>>>
 +    else    \
 +    printk(XENLOG_G_ERR __VA_ARGS__);   \
>>> here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so,
>>> wouldn't it be better to only use XENLOG_ERR so they can always be
>>> seen? (A domain shouldn't be able to abuse it).
>> Perhaps.  I suppose it is more important information than pretty much
>> anything else about the guest.
> Indeed, but then - is this really an error in all cases?

Yes.  It is always a fatal event for the VM.

> The prior
> printk() simply ended up defaulting to a warning, and I would think
> that's what the new one should be doing too.

WARNING isn't exactly correct for a fatal event.

Ideally, we want a non-ratelimited G_ERR, but that doesn't exist.  If it
appears in the future, we can use it.

The set of people running with loglvl=error is almost certainly empty,
so it doesn't matter.

~Andrew


[PATCH v2] tools/libxl: don't allow IOMMU usage with PoD

2022-02-03 Thread Roger Pau Monne
Prevent libxl from creating guests that attempts to use PoD together
with an IOMMU, even if no devices are actually assigned.

While the hypervisor could support using PoD together with an IOMMU as
long as no devices are assigned, such usage seems doubtful. There's no
guarantee the guest has PoD no longer be active, and thus a later
assignment of a PCI device to such domain could fail.

Preventing the usage of PoD together with an IOMMU at guest creation
avoids having to add checks for active PoD entries in the device
assignment paths.

Signed-off-by: Roger Pau Monné 
---
Cc: Jan Beulich 
---
Changes since v1:
 - Reword commit message.
---
 tools/libs/light/libxl_create.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index d7a40d7550..7499922088 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
 pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) &&
 (d_config->b_info.target_memkb < d_config->b_info.max_memkb);
 
-/* We cannot have PoD and PCI device assignment at the same time
- * for HVM guest. It was reported that IOMMU cannot work with PoD
- * enabled because it needs to populated entire page table for
- * guest. To stay on the safe side, we disable PCI device
- * assignment when PoD is enabled.
+/* We don't support having PoD and an IOMMU at the same time for HVM
+ * guests. An active IOMMU cannot work with PoD because it needs a fully
+ * populated page-table. Prevent PoD usage if the domain has an IOMMU
+ * assigned, even if not active.
  */
 if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV &&
-d_config->num_pcidevs && pod_enabled) {
+d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED &&
+pod_enabled) {
 ret = ERROR_INVAL;
-LOGD(ERROR, domid,
- "PCI device assignment for HVM guest failed due to PoD enabled");
+LOGD(ERROR, domid, "IOMMU not supported together with PoD");
 goto error_out;
 }
 
-- 
2.34.1




Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers

2022-02-03 Thread Oleksandr Andrushchenko


On 03.02.22 16:05, Roger Pau Monné wrote:
> On Thu, Feb 03, 2022 at 01:30:26PM +, Oleksandr Andrushchenko wrote:
>>
>> On 03.02.22 14:54, Jan Beulich wrote:
>>> On 03.02.2022 13:45, Oleksandr Andrushchenko wrote:
>> Also memory decoding needs to be initially disabled when used by
>> guests, in order to prevent the BAR being placed on top of a RAM
>> region. The guest physmap will be different from the host one, so it's
>> possible for BARs to end up placed on top of RAM regions initially
>> until the firmware or OS places them at a suitable address.
> Agree, memory decoding must be disabled
 Isn't it already achieved by the toolstack resetting the PCI device
 while assigning  it to a guest?
>>> Iirc the tool stack would reset a device only after getting it back from
>>> a DomU. When coming straight from Dom0 or DomIO, no reset would be
>>> performed. Furthermore, (again iirc) there are cases where there's no
>>> known (standard) way to reset a device. Assigning such to a guest when
>>> it previously was owned by another one is risky (and hence needs an
>>> admin knowing what they're doing), but may be acceptable in particular
>>> when e.g. simply rebooting a guest.
>>>
>>> IOW - I don't think you can rely on the bit being in a particular state.
>> So, you mean something like:
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 7695158e6445..9ebd57472da8 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -808,6 +808,14 @@ static int init_bars(struct pci_dev *pdev)
>>    return rc;
>>    }
>>
>> +    /*
>> + * Memory decoding needs to be initially disabled when used by
>> + * guests, in order to prevent the BAR being placed on top of a RAM
>> + * region.
>> + */
>> +    if ( !is_hwdom )
>> +    pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & 
>> ~PCI_COMMAND_MEMORY);
> Memory decoding is already disabled here, so you just need to avoid
> enabling it, for example:
>
>      /*
>   * Memory decoding needs to be initially disabled when used by
>   * guests, in order to prevent the BARs being mapped at gfn 0 by
>   * default.
>   */
>      if ( !is_hwdom )
>  cmd &= ~PCI_COMMAND_MEMORY;
>
>>    return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;
> This is important here because guest_reg won't be set (ie: will be set
> to 0) so if for some reason memory decoding was enabled you would end
> up with BARs mappings overlapping at gfn 0.
Then the patch [1] will do what we need to be done for the guest I guess
I am thinking to still have it in the series which will solve exactly the 
problem
we are trying to solve
>
> Thanks, Roger.
[1] 
https://patchwork.kernel.org/project/xen-devel/patch/20211125110251.2877218-11-andr2...@gmail.com/

Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers

2022-02-03 Thread Oleksandr Andrushchenko


On 03.02.22 16:04, Jan Beulich wrote:
> On 03.02.2022 14:30, Oleksandr Andrushchenko wrote:
>>
>> On 03.02.22 14:54, Jan Beulich wrote:
>>> On 03.02.2022 13:45, Oleksandr Andrushchenko wrote:
>> Also memory decoding needs to be initially disabled when used by
>> guests, in order to prevent the BAR being placed on top of a RAM
>> region. The guest physmap will be different from the host one, so it's
>> possible for BARs to end up placed on top of RAM regions initially
>> until the firmware or OS places them at a suitable address.
> Agree, memory decoding must be disabled
 Isn't it already achieved by the toolstack resetting the PCI device
 while assigning  it to a guest?
>>> Iirc the tool stack would reset a device only after getting it back from
>>> a DomU. When coming straight from Dom0 or DomIO, no reset would be
>>> performed. Furthermore, (again iirc) there are cases where there's no
>>> known (standard) way to reset a device. Assigning such to a guest when
>>> it previously was owned by another one is risky (and hence needs an
>>> admin knowing what they're doing), but may be acceptable in particular
>>> when e.g. simply rebooting a guest.
>>>
>>> IOW - I don't think you can rely on the bit being in a particular state.
>> So, you mean something like:
> Perhaps, but then I think ...
>
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -808,6 +808,14 @@ static int init_bars(struct pci_dev *pdev)
>>    return rc;
>>    }
>>
>> +    /*
>> + * Memory decoding needs to be initially disabled when used by
>> + * guests, in order to prevent the BAR being placed on top of a RAM
>> + * region.
>> + */
>> +    if ( !is_hwdom )
>> +    pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & 
>> ~PCI_COMMAND_MEMORY);
>> +
>>    return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;
> ... you also want to update cmd, thus avoiding the call to modify_bars().
>
> And btw, from an abstract pov the same is true for I/O decoding: I
> realize that you mean to leave I/O port BARs aside for the moment, but I
> think the command register handling could very well take care of both.
>
> Which quickly gets us to the bus master enable bit: I think that one
> should initially be off too. Making me wonder: Doesn't the PCI spec
> define what the reset state of this register is? If so, that's what I
> think we want to put in place for DomU-s.
The spec I have says that all bits are typically 0 after reset.
So, it seems to be reasonable to just write 0 to PCI_COMMAND
> Jan
>
Thank you,
Oleksandr

Re: [PATCH] xen: Modify domain_crash() to take a print string

2022-02-03 Thread Jan Beulich
On 03.02.2022 15:11, Andrew Cooper wrote:
> On 03/02/2022 13:48, Julien Grall wrote:
>> On 03/02/2022 13:38, Andrew Cooper wrote:
>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>>> index 37f78cc4c4c9..38b390d20371 100644
>>> --- a/xen/include/xen/sched.h
>>> +++ b/xen/include/xen/sched.h
>>> @@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v);
>>>    * from any processor.
>>>    */
>>>   void __domain_crash(struct domain *d);
>>> -#define domain_crash(d) do
>>> {  \
>>> -    printk("domain_crash called from %s:%d\n", __FILE__,
>>> __LINE__);   \
>>> -   
>>> __domain_crash(d);    \
>>> -} while (0)
>>> +#define domain_crash(d, ...)    \
>>> +    do {    \
>>> +    if ( count_args(__VA_ARGS__) == 0 ) \
>>> +    printk("domain_crash called from %s:%d\n",  \
>>> +   __FILE__, __LINE__); \
>>
>> I find a bit odd that here you are using a normal printk
> 
> That's unmodified from before.  Only reformatted.
> 
>> but...
>>
>>
>>> +    else    \
>>> +    printk(XENLOG_G_ERR __VA_ARGS__);   \
>>
>> here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so,
>> wouldn't it be better to only use XENLOG_ERR so they can always be
>> seen? (A domain shouldn't be able to abuse it).
> 
> Perhaps.  I suppose it is more important information than pretty much
> anything else about the guest.

Indeed, but then - is this really an error in all cases? The prior
printk() simply ended up defaulting to a warning, and I would think
that's what the new one should be doing too. Or even leave the
setting of the log level to the invocation sites of the macro.

Jan




Re: [PATCH] xen: Modify domain_crash() to take a print string

2022-02-03 Thread Julien Grall

Hi Andrew,

On 03/02/2022 14:11, Andrew Cooper wrote:

On 03/02/2022 13:48, Julien Grall wrote:

Hi,

On 03/02/2022 13:38, Andrew Cooper wrote:

diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 37f78cc4c4c9..38b390d20371 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v);
    * from any processor.
    */
   void __domain_crash(struct domain *d);
-#define domain_crash(d) do
{  \
-    printk("domain_crash called from %s:%d\n", __FILE__,
__LINE__);   \
-
__domain_crash(d);    \
-} while (0)
+#define domain_crash(d, ...)    \
+    do {    \
+    if ( count_args(__VA_ARGS__) == 0 ) \
+    printk("domain_crash called from %s:%d\n",  \
+   __FILE__, __LINE__); \


I find a bit odd that here you are using a normal printk


That's unmodified from before.  Only reformatted.


but...



+    else    \
+    printk(XENLOG_G_ERR __VA_ARGS__);   \


here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so,
wouldn't it be better to only use XENLOG_ERR so they can always be
seen? (A domain shouldn't be able to abuse it).


Perhaps.  I suppose it is more important information than pretty much
anything else about the guest.

I've changed locally, but won't repost just for this.


Ok. With that:

Reviewed-by: Julien Grall 

Cheers,

--
Julien Grall



Xen 4.14.4 released

2022-02-03 Thread Jan Beulich
All,

we're pleased to announce the release of Xen 4.14.4. This is available
immediately from its git repository
http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.14
(tag RELEASE-4.14.4) or from the XenProject download page
https://xenproject.org/downloads/xen-project-archives/xen-project-4-14-series/xen-project-4-14-4/
(where a list of changes can also be found).

We recommend all users of the 4.14 stable series to update to this
last point release scheduled to be made by the Xen Project team from
this branch.

Regards, Jan




Xen 4.15.2 released

2022-02-03 Thread Jan Beulich
All,

we're pleased to announce the release of Xen 4.15.2. This is available
immediately from its git repository
http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.15
(tag RELEASE-4.15.2) or from the XenProject download page
https://xenproject.org/downloads/xen-project-archives/xen-project-4-15-series/xen-project-4-15-2/
(where a list of changes can also be found).

We recommend all users of the 4.15 stable series to update to this
latest point release.

Regards, Jan




Re: [PATCH] xen: Modify domain_crash() to take a print string

2022-02-03 Thread Andrew Cooper
On 03/02/2022 13:48, Julien Grall wrote:
> Hi,
>
> On 03/02/2022 13:38, Andrew Cooper wrote:
>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>> index 37f78cc4c4c9..38b390d20371 100644
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v);
>>    * from any processor.
>>    */
>>   void __domain_crash(struct domain *d);
>> -#define domain_crash(d) do
>> {  \
>> -    printk("domain_crash called from %s:%d\n", __FILE__,
>> __LINE__);   \
>> -   
>> __domain_crash(d);    \
>> -} while (0)
>> +#define domain_crash(d, ...)    \
>> +    do {    \
>> +    if ( count_args(__VA_ARGS__) == 0 ) \
>> +    printk("domain_crash called from %s:%d\n",  \
>> +   __FILE__, __LINE__); \
>
> I find a bit odd that here you are using a normal printk

That's unmodified from before.  Only reformatted.

> but...
>
>
>> +    else    \
>> +    printk(XENLOG_G_ERR __VA_ARGS__);   \
>
> here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so,
> wouldn't it be better to only use XENLOG_ERR so they can always be
> seen? (A domain shouldn't be able to abuse it).

Perhaps.  I suppose it is more important information than pretty much
anything else about the guest.

I've changed locally, but won't repost just for this.

~Andrew


Re: [PATCH] tools/libxl: don't allow IOMMU usage with PoD

2022-02-03 Thread Jan Beulich
On 03.02.2022 14:50, Roger Pau Monné wrote:
> On Thu, Feb 03, 2022 at 12:55:56PM +0100, Jan Beulich wrote:
>> On 03.02.2022 12:06, Roger Pau Monne wrote:
>>> Prevent libxl from creating guests that attempts to use PoD together
>>> with an IOMMU, even if no devices are actually assigned.
>>>
>>> While the hypervisor could support using PoD together with an IOMMU as
>>> long as no devices are assigned, such usage seems doubtful. There's no
>>> guarantee the guest has ballooned down enough memory for PoD to no
>>> longer be active, and thus a later assignment of a PCI device to such
>>> domain could fail.
>>
>> That's not a precise description of the constraint: The guest ballooning
>> down enough only means entries == cache, but for device assignment we
>> need entries == 0 (and a guarantee that no new entries can appear, but I
>> think this is already the case once a guest was launched).
> 
> Would you be OK with:
> 
> "While the hypervisor could support using PoD together with an IOMMU as
> long as no devices are assigned, such usage seems doubtful. There's no
> guarantee the guest has PoD no longer be active, and thus a later
> assignment of a PCI device to such domain could fail."

Yes, thanks, this sounds better (to me at least).

Jan




Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers

2022-02-03 Thread Roger Pau Monné
On Thu, Feb 03, 2022 at 01:30:26PM +, Oleksandr Andrushchenko wrote:
> 
> 
> On 03.02.22 14:54, Jan Beulich wrote:
> > On 03.02.2022 13:45, Oleksandr Andrushchenko wrote:
>  Also memory decoding needs to be initially disabled when used by
>  guests, in order to prevent the BAR being placed on top of a RAM
>  region. The guest physmap will be different from the host one, so it's
>  possible for BARs to end up placed on top of RAM regions initially
>  until the firmware or OS places them at a suitable address.
> >>> Agree, memory decoding must be disabled
> >> Isn't it already achieved by the toolstack resetting the PCI device
> >> while assigning  it to a guest?
> > Iirc the tool stack would reset a device only after getting it back from
> > a DomU. When coming straight from Dom0 or DomIO, no reset would be
> > performed. Furthermore, (again iirc) there are cases where there's no
> > known (standard) way to reset a device. Assigning such to a guest when
> > it previously was owned by another one is risky (and hence needs an
> > admin knowing what they're doing), but may be acceptable in particular
> > when e.g. simply rebooting a guest.
> >
> > IOW - I don't think you can rely on the bit being in a particular state.
> So, you mean something like:
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 7695158e6445..9ebd57472da8 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -808,6 +808,14 @@ static int init_bars(struct pci_dev *pdev)
>   return rc;
>   }
> 
> +    /*
> + * Memory decoding needs to be initially disabled when used by
> + * guests, in order to prevent the BAR being placed on top of a RAM
> + * region.
> + */
> +    if ( !is_hwdom )
> +    pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);

Memory decoding is already disabled here, so you just need to avoid
enabling it, for example:

    /*
 * Memory decoding needs to be initially disabled when used by
 * guests, in order to prevent the BARs being mapped at gfn 0 by
 * default.
 */
    if ( !is_hwdom )
cmd &= ~PCI_COMMAND_MEMORY;

>   return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;

This is important here because guest_reg won't be set (ie: will be set
to 0) so if for some reason memory decoding was enabled you would end
up with BARs mappings overlapping at gfn 0.

Thanks, Roger.



Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers

2022-02-03 Thread Jan Beulich
On 03.02.2022 14:30, Oleksandr Andrushchenko wrote:
> 
> 
> On 03.02.22 14:54, Jan Beulich wrote:
>> On 03.02.2022 13:45, Oleksandr Andrushchenko wrote:
> Also memory decoding needs to be initially disabled when used by
> guests, in order to prevent the BAR being placed on top of a RAM
> region. The guest physmap will be different from the host one, so it's
> possible for BARs to end up placed on top of RAM regions initially
> until the firmware or OS places them at a suitable address.
 Agree, memory decoding must be disabled
>>> Isn't it already achieved by the toolstack resetting the PCI device
>>> while assigning  it to a guest?
>> Iirc the tool stack would reset a device only after getting it back from
>> a DomU. When coming straight from Dom0 or DomIO, no reset would be
>> performed. Furthermore, (again iirc) there are cases where there's no
>> known (standard) way to reset a device. Assigning such to a guest when
>> it previously was owned by another one is risky (and hence needs an
>> admin knowing what they're doing), but may be acceptable in particular
>> when e.g. simply rebooting a guest.
>>
>> IOW - I don't think you can rely on the bit being in a particular state.
> So, you mean something like:

Perhaps, but then I think ...

> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -808,6 +808,14 @@ static int init_bars(struct pci_dev *pdev)
>   return rc;
>   }
> 
> +    /*
> + * Memory decoding needs to be initially disabled when used by
> + * guests, in order to prevent the BAR being placed on top of a RAM
> + * region.
> + */
> +    if ( !is_hwdom )
> +    pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
> +
>   return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;

... you also want to update cmd, thus avoiding the call to modify_bars().

And btw, from an abstract pov the same is true for I/O decoding: I
realize that you mean to leave I/O port BARs aside for the moment, but I
think the command register handling could very well take care of both.

Which quickly gets us to the bus master enable bit: I think that one
should initially be off too. Making me wonder: Doesn't the PCI spec
define what the reset state of this register is? If so, that's what I
think we want to put in place for DomU-s.

Jan




[PATCH v4] x86/altp2m: p2m_altp2m_propagate_change() should honor present page order

2022-02-03 Thread Jan Beulich
For higher order mappings the comparison against p2m->min_remapped_gfn
needs to take the upper bound of the covered GFN range into account, not
just the base GFN. Otherwise, i.e. when dropping a mapping overlapping
the remapped range but the base GFN outside of that range, an altp2m may
wrongly not get reset.

Note that there's no need to call get_gfn_type_access() ahead of the
check against the remapped range boundaries: None of its outputs are
needed earlier, and p2m_reset_altp2m() doesn't require the lock to be
held. In fact this avoids a latent lock order violation: With per-GFN
locking p2m_reset_altp2m() not only doesn't require the GFN lock to be
held, but holding such a lock would actually not be allowed, as the
function acquires a P2M lock.

Local variables are moved into the more narrow scope (one is deleted
altogether) to help see their actual life ranges.

Signed-off-by: Jan Beulich 
---
Note that this addresses only half of the problem: get_gfn_type_access()
would also need invoking for all of the involved GFNs, not just the 1st
one.
---
v4: Restore mistakenly dropped mfn_eq(mfn, INVALID_MFN).
v3: Don't pass the minimum of both orders to p2m_set_entry() (as was the
case in the original code). Restore get_gfn_type_access() return
value checking.

--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2534,9 +2534,6 @@ int p2m_altp2m_propagate_change(struct d
 p2m_type_t p2mt, p2m_access_t p2ma)
 {
 struct p2m_domain *p2m;
-p2m_access_t a;
-p2m_type_t t;
-mfn_t m;
 unsigned int i;
 unsigned int reset_count = 0;
 unsigned int last_reset_idx = ~0;
@@ -2549,15 +2546,17 @@ int p2m_altp2m_propagate_change(struct d
 
 for ( i = 0; i < MAX_ALTP2M; i++ )
 {
+p2m_type_t t;
+p2m_access_t a;
+
 if ( d->arch.altp2m_eptp[i] == mfn_x(INVALID_MFN) )
 continue;
 
 p2m = d->arch.altp2m_p2m[i];
-m = get_gfn_type_access(p2m, gfn_x(gfn), &t, &a, 0, NULL);
 
 /* Check for a dropped page that may impact this altp2m */
 if ( mfn_eq(mfn, INVALID_MFN) &&
- gfn_x(gfn) >= p2m->min_remapped_gfn &&
+ gfn_x(gfn) + (1UL << page_order) > p2m->min_remapped_gfn &&
  gfn_x(gfn) <= p2m->max_remapped_gfn )
 {
 if ( !reset_count++ )
@@ -2568,8 +2566,6 @@ int p2m_altp2m_propagate_change(struct d
 else
 {
 /* At least 2 altp2m's impacted, so reset everything */
-__put_gfn(p2m, gfn_x(gfn));
-
 for ( i = 0; i < MAX_ALTP2M; i++ )
 {
 if ( i == last_reset_idx ||
@@ -2583,16 +2579,19 @@ int p2m_altp2m_propagate_change(struct d
 break;
 }
 }
-else if ( !mfn_eq(m, INVALID_MFN) )
+else if ( !mfn_eq(get_gfn_type_access(p2m, gfn_x(gfn), &t, &a, 0,
+  NULL), INVALID_MFN) )
 {
 int rc = p2m_set_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
 
 /* Best effort: Don't bail on error. */
 if ( !ret )
 ret = rc;
-}
 
-__put_gfn(p2m, gfn_x(gfn));
+__put_gfn(p2m, gfn_x(gfn));
+}
+else
+__put_gfn(p2m, gfn_x(gfn));
 }
 
 altp2m_list_unlock(d);




Re: [PATCH] tools/libxl: don't allow IOMMU usage with PoD

2022-02-03 Thread Roger Pau Monné
On Thu, Feb 03, 2022 at 12:55:56PM +0100, Jan Beulich wrote:
> On 03.02.2022 12:06, Roger Pau Monne wrote:
> > Prevent libxl from creating guests that attempts to use PoD together
> > with an IOMMU, even if no devices are actually assigned.
> > 
> > While the hypervisor could support using PoD together with an IOMMU as
> > long as no devices are assigned, such usage seems doubtful. There's no
> > guarantee the guest has ballooned down enough memory for PoD to no
> > longer be active, and thus a later assignment of a PCI device to such
> > domain could fail.
> 
> That's not a precise description of the constraint: The guest ballooning
> down enough only means entries == cache, but for device assignment we
> need entries == 0 (and a guarantee that no new entries can appear, but I
> think this is already the case once a guest was launched).

Would you be OK with:

"While the hypervisor could support using PoD together with an IOMMU as
long as no devices are assigned, such usage seems doubtful. There's no
guarantee the guest has PoD no longer be active, and thus a later
assignment of a PCI device to such domain could fail."

By "PoD no longer be active" meaning cache == entries == 0.

Thanks, Roger.



Re: [PATCH] xen: Modify domain_crash() to take a print string

2022-02-03 Thread Julien Grall

Hi,

On 03/02/2022 13:38, Andrew Cooper wrote:

diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 37f78cc4c4c9..38b390d20371 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v);
   * from any processor.
   */
  void __domain_crash(struct domain *d);
-#define domain_crash(d) do {  \
-printk("domain_crash called from %s:%d\n", __FILE__, __LINE__);   \
-__domain_crash(d);\
-} while (0)
+#define domain_crash(d, ...)\
+do {\
+if ( count_args(__VA_ARGS__) == 0 ) \
+printk("domain_crash called from %s:%d\n",  \
+   __FILE__, __LINE__); \


I find a bit odd that here you are using a normal printk but...



+else\
+printk(XENLOG_G_ERR __VA_ARGS__);   \


here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so, wouldn't 
it be better to only use XENLOG_ERR so they can always be seen? (A 
domain shouldn't be able to abuse it).


Cheers,

--
Julien Grall



[PATCH] xen: Modify domain_crash() to take a print string

2022-02-03 Thread Andrew Cooper
There are two problems with domain_crash().

First, that it is frequently not preceded by a printk() at all, or that it is
only preceded by a dprintk().  Either way, critical diagnostic is omitted for
an event which is fatal to the guest.

Second, the embedded __LINE__ is an issue for livepatching, creating unwanted
churn in the binary diff.  This is the final __LINE__ remaining in
livepatching-relevant contexts.

The end goal is to have domain_crash() require a print string which gets fed
to printk(), making it far less easy to omit relevant diagnostic information.

However, modifying all callers at once is far too big and complicated, so use
some macro magic to tolerate the old API (no print string) in the short term.

Adjust two callers in load_segments() to demonstrate the new API.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 

Supersedes my previous attempt to update every caller in one go.  In due
course I'll split that mammoth patch up into a series.
---
 xen/arch/x86/domain.c   | 14 --
 xen/include/xen/sched.h | 13 +
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index ef1812dc1402..45be5e1cd7c9 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1693,11 +1693,8 @@ static void load_segments(struct vcpu *n)
  put_guest(uregs->fs,   esp - 5) |
  put_guest(uregs->es,   esp - 6) |
  put_guest(uregs->ds,   esp - 7) )
-{
-gprintk(XENLOG_ERR,
-"error while creating compat failsafe callback 
frame\n");
-domain_crash(n->domain);
-}
+domain_crash(n->domain,
+ "Error creating compat failsafe callback 
frame\n");
 
 if ( n->arch.pv.vgc_flags & VGCF_failsafe_disables_events )
 vcpu_info(n, evtchn_upcall_mask) = 1;
@@ -1732,11 +1729,8 @@ static void load_segments(struct vcpu *n)
  put_guest(uregs->ds,   rsp -  9) |
  put_guest(regs->r11,   rsp - 10) |
  put_guest(regs->rcx,   rsp - 11) )
-{
-gprintk(XENLOG_ERR,
-"error while creating failsafe callback frame\n");
-domain_crash(n->domain);
-}
+domain_crash(n->domain,
+ "Error creating failsafe callback frame\n");
 
 if ( n->arch.pv.vgc_flags & VGCF_failsafe_disables_events )
 vcpu_info(n, evtchn_upcall_mask) = 1;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 37f78cc4c4c9..38b390d20371 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v);
  * from any processor.
  */
 void __domain_crash(struct domain *d);
-#define domain_crash(d) do {  \
-printk("domain_crash called from %s:%d\n", __FILE__, __LINE__);   \
-__domain_crash(d);\
-} while (0)
+#define domain_crash(d, ...)\
+do {\
+if ( count_args(__VA_ARGS__) == 0 ) \
+printk("domain_crash called from %s:%d\n",  \
+   __FILE__, __LINE__); \
+else\
+printk(XENLOG_G_ERR __VA_ARGS__);   \
+__domain_crash(d);  \
+} while ( 0 )
 
 /*
  * Called from assembly code, with an optional address to help indicate why
-- 
2.11.0




Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers

2022-02-03 Thread Oleksandr Andrushchenko


On 03.02.22 14:54, Jan Beulich wrote:
> On 03.02.2022 13:45, Oleksandr Andrushchenko wrote:
 Also memory decoding needs to be initially disabled when used by
 guests, in order to prevent the BAR being placed on top of a RAM
 region. The guest physmap will be different from the host one, so it's
 possible for BARs to end up placed on top of RAM regions initially
 until the firmware or OS places them at a suitable address.
>>> Agree, memory decoding must be disabled
>> Isn't it already achieved by the toolstack resetting the PCI device
>> while assigning  it to a guest?
> Iirc the tool stack would reset a device only after getting it back from
> a DomU. When coming straight from Dom0 or DomIO, no reset would be
> performed. Furthermore, (again iirc) there are cases where there's no
> known (standard) way to reset a device. Assigning such to a guest when
> it previously was owned by another one is risky (and hence needs an
> admin knowing what they're doing), but may be acceptable in particular
> when e.g. simply rebooting a guest.
>
> IOW - I don't think you can rely on the bit being in a particular state.
So, you mean something like:

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 7695158e6445..9ebd57472da8 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -808,6 +808,14 @@ static int init_bars(struct pci_dev *pdev)
  return rc;
  }

+    /*
+ * Memory decoding needs to be initially disabled when used by
+ * guests, in order to prevent the BAR being placed on top of a RAM
+ * region.
+ */
+    if ( !is_hwdom )
+    pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
+
  return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;
  }
  REGISTER_VPCI_INIT(init_bars, VPCI_PRIORITY_MIDDLE);

> Jan
>
Thank you,
Oleksandr

Re: [PATCH v2 1/2] xen: add option to disable GNTTABOP_transfer

2022-02-03 Thread Jan Beulich
On 03.02.2022 14:14, Juergen Gross wrote:
> The grant table operation GNTTABOP_transfer is meant to be used in
> PV device backends, and it hasn't been used in Linux since the old
> Xen-o-Linux days.
> 
> Add a command line sub-option to the "gnttab" option for disabling the
> GNTTABOP_transfer functionality.
> 
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich 




[libvirt test] 167992: regressions - FAIL

2022-02-03 Thread osstest service owner
flight 167992 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/167992/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  41e878859ac7eb4059f2e9b338e585e205fd575c
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  573 days
Failing since151818  2020-07-11 04:18:52 Z  572 days  554 attempts
Testing same since   167992  2022-02-03 04:20:17 Z0 days1 attempts


People who touched revisions under test:
Adolfo Jayme Barrientos 
  Aleksandr Alekseev 
  Aleksei Zakharov 
  Andika Triwidada 
  Andrea Bolognani 
  Ani Sinha 
  Balázs Meskó 
  Barrett Schonefeld 
  Bastian Germann 
  Bastien Orivel 
  BiaoXiang Ye 
  Bihong Yu 
  Binfeng Wu 
  Bjoern Walk 
  Boris Fiuczynski 
  Brad Laue 
  Brian Turek 
  Bruno Haible 
  Chris Mayo 
  Christian Borntraeger 
  Christian Ehrhardt 
  Christian Kirbach 
  Christian Schoenebeck 
  Christophe Fergeau 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Cédric Bosdonnat 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Didik Supriadi 
  dinglimin 
  Divya Garg 
  Dmitrii Shcherbakov 
  Dmytro Linkin 
  Eiichi Tsukata 
  Emilio Herrera 
  Eric Farman 
  Erik Skultety 
  Fabian Affolter 
  Fabian Freyer 
  Fabiano Fidêncio 
  Fangge Jin 
  Farhan Ali 
  Fedora Weblate Translation 
  Franck Ridel 
  Gavi Teitz 
  gongwei 
  Guoyi Tu
  Göran Uddeborg 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Hela Basa 
  Helmut Grohne 
  Hiroki Narukawa 
  Hyman Huang(黄勇) 
  Ian Wienand 
  Ioanna Alifieraki 
  Ivan Teterevkov 
  Jakob Meng 
  Jamie Strandboge 
  Jamie Strandboge 
  Jan Kuparinen 
  jason lee 
  Jean-Baptiste Holcroft 
  Jia Zhou 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jinsheng Zhang 
  Jiri Denemark 
  Joachim Falk 
  John Ferlan 
  Jonathan Watt 
  Jonathon Jongsma 
  Julio Faracco 
  Justin Gatzen 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Koichi Murase 
  Kristina Hanicova 
  Laine Stump 
  Laszlo Ersek 
  Lee Yarwood 
  Lei Yang 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Lin Ma 
  Liu Yiding 
  Lubomir Rintel 
  Luke Yue 
  Luyao Zhong 
  Marc Hartmayer 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Markus Schade 
  Martin Kletzander 
  Masayoshi Mizuma 
  Matej Cepl 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Meina Li 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Moshe Levi 
  Muha Aliss 
  Nathan 
  Neal Gompa 
  Nick Chevsky 
  Nick Shyrokovskiy 
  Nickys Music Group 
  Nico Pache 
  Nicolas Lécureuil 
  Nicolas Lécureuil 
  Nikolay Shirokovskiy 
  Olaf Hering 
  Olesya Gerasimenko 
  Or Ozeri 
  Orion Poplawski 
  Pany 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peng Liang 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Praveen K Paladugu 
  Richard W.M. Jones 
  Ricky Tigg 
  Robin Lee 
  Rohit Kumar 
  Roman Bogorodskiy 
  Roman Bolshakov 
  Ryan Gahagan 
  Ryan Schmidt 
  Sam Hartman 
  Scott Shambarger 
  Sebastian Mitterle 
  SeongHyun Jo 
  Shalini Chellathurai Saroja 
  Shaojun Yang 
  shenjiat

[PATCH v2 2/2] xen/include/public: deprecate GNTTABOP_transfer

2022-02-03 Thread Juergen Gross
Add a comment to include/public/grant_table.h that GNTTABOP_transfer
is deprecated, in order to discourage new use cases.

Signed-off-by: Juergen Gross 
---
v2:
- new patch
---
 xen/include/public/grant_table.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/public/grant_table.h b/xen/include/public/grant_table.h
index 7934d7b718..7fbd1c6d10 100644
--- a/xen/include/public/grant_table.h
+++ b/xen/include/public/grant_table.h
@@ -417,6 +417,8 @@ DEFINE_XEN_GUEST_HANDLE(gnttab_dump_table_t);
  * GNTTABOP_transfer: Transfer  to a foreign domain. The foreign domain
  * has previously registered its interest in the transfer via .
  *
+ * This operation is deprecated! Please don't add new use cases!
+ *
  * Note that, even if the transfer fails, the specified page no longer belongs
  * to the calling domain *unless* the error is GNTST_bad_page.
  *
-- 
2.34.1




[PATCH v2 1/2] xen: add option to disable GNTTABOP_transfer

2022-02-03 Thread Juergen Gross
The grant table operation GNTTABOP_transfer is meant to be used in
PV device backends, and it hasn't been used in Linux since the old
Xen-o-Linux days.

Add a command line sub-option to the "gnttab" option for disabling the
GNTTABOP_transfer functionality.

Signed-off-by: Juergen Gross 
---
V2:
- make option available for CONFIG_PV only (Jan Beulich)
- return -EOPNOTSUPP instead of -ENOSYS (Jan Beulich)
---
 docs/misc/xen-command-line.pandoc |  8 ++--
 xen/common/grant_table.c  | 12 
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index 6b3da6ddc1..44232b94c5 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1167,9 +1167,9 @@ does not provide `VM_ENTRY_LOAD_GUEST_PAT`.
 Specify which console gdbstub should use. See **console**.
 
 ### gnttab
-> `= List of [ max-ver:, transitive= ]`
+> `= List of [ max-ver:, transitive=, transfer= ]`
 
-> Default: `gnttab=max-ver:2,transitive`
+> Default: `gnttab=max-ver:2,transitive,transfer`
 
 Control various aspects of the grant table behaviour available to guests.
 
@@ -1178,6 +1178,10 @@ version are 1 and 2.
 * `transitive` Permit or disallow the use of transitive grants.  Note that the
 use of grant table v2 without transitive grants is an ABI breakage from the
 guests point of view.
+* `transfer` Permit or disallow the GNTTABOP_transfer operation of the
+grant table hypercall.  Note that disallowing GNTTABOP_transfer is an ABI
+breakage from the guests point of view.  This option is only available on
+hypervisors configured to support PV guests.
 
 The usage of gnttab v2 is not security supported on ARM platforms.
 
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index ed1e2fabce..57dfc54994 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -181,6 +181,11 @@ static int parse_gnttab_max_maptrack_frames(const char 
*arg)
 
 unsigned int __read_mostly opt_gnttab_max_version = GNTTAB_MAX_VERSION;
 static bool __read_mostly opt_transitive_grants = true;
+#ifdef CONFIG_PV
+static bool __ro_after_init opt_grant_transfer = true;
+#else
+#define opt_grant_transfer false
+#endif
 
 static int __init parse_gnttab(const char *s)
 {
@@ -204,6 +209,10 @@ static int __init parse_gnttab(const char *s)
 }
 else if ( (val = parse_boolean("transitive", s, ss)) >= 0 )
 opt_transitive_grants = val;
+#ifndef opt_grant_transfer
+else if ( (val = parse_boolean("transfer", s, ss)) >= 0 )
+opt_grant_transfer = val;
+#endif
 else
 rc = -EINVAL;
 
@@ -2233,6 +2242,9 @@ gnttab_transfer(
 unsigned int max_bitsize;
 struct active_grant_entry *act;
 
+if ( !opt_grant_transfer )
+return -EOPNOTSUPP;
+
 for ( i = 0; i < count; i++ )
 {
 bool_t okay;
-- 
2.34.1




Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers

2022-02-03 Thread Jan Beulich
On 03.02.2022 13:45, Oleksandr Andrushchenko wrote:
>>> Also memory decoding needs to be initially disabled when used by
>>> guests, in order to prevent the BAR being placed on top of a RAM
>>> region. The guest physmap will be different from the host one, so it's
>>> possible for BARs to end up placed on top of RAM regions initially
>>> until the firmware or OS places them at a suitable address.
>> Agree, memory decoding must be disabled
> Isn't it already achieved by the toolstack resetting the PCI device
> while assigning  it to a guest?

Iirc the tool stack would reset a device only after getting it back from
a DomU. When coming straight from Dom0 or DomIO, no reset would be
performed. Furthermore, (again iirc) there are cases where there's no
known (standard) way to reset a device. Assigning such to a guest when
it previously was owned by another one is risky (and hence needs an
admin knowing what they're doing), but may be acceptable in particular
when e.g. simply rebooting a guest.

IOW - I don't think you can rely on the bit being in a particular state.

Jan




[PATCH v2] docs: document patch rules

2022-02-03 Thread Juergen Gross
Add a document to describe the rules for sending a proper patch.

As it contains all the information already being present in
docs/process/tags.pandoc remove that file.

The "Reviewed-by:" and "Acked-by:" tags are expanded to allow an
optional restriction of the tag.

A new tag "Origin:" is added to tag patches taken from another project.

Signed-off-by: Juergen Gross 
---
v2:
- expanded commit message (Roger Pau Monné)
- some rewordings (Roger Pau Monné, Jan Beulich)
- add "Requested-by:" description (Jan Beulich)
- rename "Taken-from:" to "Origin:" (Jan Beulich)
- add reviewers as recipients of patch (Jan Beulich)
- style fixes (Roger Pau Monné, Jan Beulich)

Signed-off-by: Juergen Gross 
---
 docs/process/sending-patches.pandoc | 298 
 docs/process/tags.pandoc|  55 -
 2 files changed, 298 insertions(+), 55 deletions(-)
 create mode 100644 docs/process/sending-patches.pandoc
 delete mode 100644 docs/process/tags.pandoc

diff --git a/docs/process/sending-patches.pandoc 
b/docs/process/sending-patches.pandoc
new file mode 100644
index 00..2091037901
--- /dev/null
+++ b/docs/process/sending-patches.pandoc
@@ -0,0 +1,298 @@
+# How a proper patch should look like
+
+This is a brief description how a proper patch for the Xen project should
+look like. Examples and tooling tips are not part of this document, those
+can be found in the
+[Xen Wiki](https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches).
+
+## The patch subject
+
+The first line at the top of the patch should contain a short description of
+what the patch does, and hints as to what code it touches. This line is used
+as the **Subject** line of the mail when sending the patch.
+
+The hint which code is touched is usually in form of an abstract entity
+(like e.g. `build` for the build system), or a component (like `tools` or
+`iommu`). Further specification is possible via adding a sub-component with
+a slash (e.g. `tools/xenstore`):
+
+: 
+
+E.g.:
+
+xen/arm: increase memory banks number define value
+tools/libxenevtchn: deduplicate xenevtchn_fd()
+MAINTAINERS: update my email address
+build: correct usage comments in Kbuild.include
+
+The description should give a rough hint *what* is done in the patch.
+
+The subject line should in general not exceed 80 characters. It must be
+followed by a blank line.
+
+## The commit message
+
+The commit message is free text describing *why* the patch is done and
+*how* the goal of the patch is achieved. A good commit message will describe
+the current situation, the desired goal, and the way this goal is being
+achieved. Parts of that can be omitted in obvious cases.
+
+In case additional changes are done in the patch (like e.g. cleanups), those
+should be mentioned.
+
+When referencing other patches (e.g. `similar to patch xy ...`) those
+patches should be referenced via their commit id (at least 12 digits)
+and the patch subject, if the very same patch isn't referenced by the
+`Fixes:` tag, too:
+
+Similar to commit 67d01cdb5518 ("x86: infrastructure to allow converting
+certain indirect calls to direct ones") add ...
+
+The following ``git config`` settings can be used to add a pretty format for
+outputting the above style in the ``git log`` or ``git show`` commands:
+
+[core]
+abbrev = 12
+[pretty]
+fixes = Fixes: %h (\"%s\")
+
+Lines in the commit message should not exceed 75 characters, except when
+copying error output directly into the commit message.
+
+## Tags
+
+Tags are entries in the form
+
+Tag: something
+
+In general tags are added in chronological order. So a `Reviewed-by:` tag
+should be added **after** the `Signed-off-by:` tag, as the review happened
+after the patch was written.
+
+Do not split a tag across multiple lines, tags are exempt from the
+"wrap at 75 columns" rule in order to simplify parsing scripts.
+
+### Origin:
+
+Xen has inherited some source files from other open source projects. In case
+a patch modifying such an inherited file is taken from that project (maybe in
+modified form), the `Origin:` tag specifies the source of the patch:
+
+Origin:  
+
+E.g.:
+
+Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
f093b08c47b3
+
+All tags **above** the `Origin:` tag are from the original patch (which
+should all be kept), while tags **after** `Origin:` are related to the
+normal Xen patch process as described here.
+
+### Fixes:
+
+If your patch fixes a bug in a specific commit, e.g. you found an issue using
+``git bisect``, please use the `Fixes:` tag with the first 12 characters of
+the commit id, and the one line summary.
+
+Fixes:  ("")
+
+E.g.:
+
+Fixes: 67d01cdb5518 ("x86: infrastructure to allow converting certain 
indirect calls to direct ones")
+
+### Backport:
+
+A backport tag is an optional tag in the commit message to request a
+given commit to be backported to the released trees:
+
+

Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers

2022-02-03 Thread Oleksandr Andrushchenko


On 03.02.22 14:50, Jan Beulich wrote:
> On 03.02.2022 13:48, Oleksandr Andrushchenko wrote:
>> Hi, Jan!
>>
>> On 03.02.22 14:44, Jan Beulich wrote:
>>> On 03.02.2022 13:36, Oleksandr Andrushchenko wrote:
 Hi, Bertrand!

 On 26.11.21 14:19, Oleksandr Andrushchenko wrote:
> Hi, Bertrand!
>
> On 25.11.21 18:28, Bertrand Marquis wrote:
>> Hi Oleksandr,
>>
>>> On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko  
>>> wrote:
>>>
>>> From: Oleksandr Andrushchenko 
>>>
>>> Add relevant vpci register handlers when assigning PCI device to a 
>>> domain
>>> and remove those when de-assigning. This allows having different
>>> handlers for different domains, e.g. hwdom and other guests.
>>>
>>> Emulate guest BAR register values: this allows creating a guest view
>>> of the registers and emulates size and properties probe as it is done
>>> during PCI device enumeration by the guest.
>>>
>>> ROM BAR is only handled for the hardware domain and for guest domains
>>> there is a stub: at the moment PCI expansion ROM handling is supported
>>> for x86 only and it might not be used by other architectures without
>>> emulating x86. Other use-cases may include using that expansion ROM 
>>> before
>>> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
>>> wants to use the ROM code which seems to be rare.
>> In the generic code, bars for ioports are actually skipped (check code 
>> before
>> in header.c, in case of ioports there is a continue) and no handler is 
>> registered for them.
>> The consequence will be that a guest will access hardware when reading 
>> those BARs.
> Yes, this seems to be a valid point
 So, with the approach we have developed these days we will ignore all 
 writes
 and return ~0 for reads for all unhandled ops, e.g. those which do not 
 have explicit
 register handlers employed. Thus, this case will fall into unhandled 
 clause.
>>> Except that I guess BARs are special in that reads may not return ~0,
>>> or else the low bits carry a meaning we don't want to convey. Unused
>>> BARs need to be hard-wired to 0, I think.
>> So, you mean we should have 2 sets of BAR handlers for guests:
>> 1. normal emulation (these are implemented in this patch)
>> 2. all other BARs: read 0/ignore write for all other BARs, including ROM, IO 
>> etc.
>>
>> Is this what you mean?
> I think that's what we're going to need, yes.
Ok, then I'll stuff that into this patch v6
> Jan
>
Thank you,
Oleksandr

Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers

2022-02-03 Thread Jan Beulich
On 03.02.2022 13:48, Oleksandr Andrushchenko wrote:
> Hi, Jan!
> 
> On 03.02.22 14:44, Jan Beulich wrote:
>> On 03.02.2022 13:36, Oleksandr Andrushchenko wrote:
>>> Hi, Bertrand!
>>>
>>> On 26.11.21 14:19, Oleksandr Andrushchenko wrote:
 Hi, Bertrand!

 On 25.11.21 18:28, Bertrand Marquis wrote:
> Hi Oleksandr,
>
>> On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko  
>> wrote:
>>
>> From: Oleksandr Andrushchenko 
>>
>> Add relevant vpci register handlers when assigning PCI device to a domain
>> and remove those when de-assigning. This allows having different
>> handlers for different domains, e.g. hwdom and other guests.
>>
>> Emulate guest BAR register values: this allows creating a guest view
>> of the registers and emulates size and properties probe as it is done
>> during PCI device enumeration by the guest.
>>
>> ROM BAR is only handled for the hardware domain and for guest domains
>> there is a stub: at the moment PCI expansion ROM handling is supported
>> for x86 only and it might not be used by other architectures without
>> emulating x86. Other use-cases may include using that expansion ROM 
>> before
>> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
>> wants to use the ROM code which seems to be rare.
> In the generic code, bars for ioports are actually skipped (check code 
> before
> in header.c, in case of ioports there is a continue) and no handler is 
> registered for them.
> The consequence will be that a guest will access hardware when reading 
> those BARs.
 Yes, this seems to be a valid point
>>> So, with the approach we have developed these days we will ignore all writes
>>> and return ~0 for reads for all unhandled ops, e.g. those which do not have 
>>> explicit
>>> register handlers employed. Thus, this case will fall into unhandled clause.
>> Except that I guess BARs are special in that reads may not return ~0,
>> or else the low bits carry a meaning we don't want to convey. Unused
>> BARs need to be hard-wired to 0, I think.
> So, you mean we should have 2 sets of BAR handlers for guests:
> 1. normal emulation (these are implemented in this patch)
> 2. all other BARs: read 0/ignore write for all other BARs, including ROM, IO 
> etc.
> 
> Is this what you mean?

I think that's what we're going to need, yes.

Jan




Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers

2022-02-03 Thread Oleksandr Andrushchenko
Hi, Jan!

On 03.02.22 14:44, Jan Beulich wrote:
> On 03.02.2022 13:36, Oleksandr Andrushchenko wrote:
>> Hi, Bertrand!
>>
>> On 26.11.21 14:19, Oleksandr Andrushchenko wrote:
>>> Hi, Bertrand!
>>>
>>> On 25.11.21 18:28, Bertrand Marquis wrote:
 Hi Oleksandr,

> On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko  
> wrote:
>
> From: Oleksandr Andrushchenko 
>
> Add relevant vpci register handlers when assigning PCI device to a domain
> and remove those when de-assigning. This allows having different
> handlers for different domains, e.g. hwdom and other guests.
>
> Emulate guest BAR register values: this allows creating a guest view
> of the registers and emulates size and properties probe as it is done
> during PCI device enumeration by the guest.
>
> ROM BAR is only handled for the hardware domain and for guest domains
> there is a stub: at the moment PCI expansion ROM handling is supported
> for x86 only and it might not be used by other architectures without
> emulating x86. Other use-cases may include using that expansion ROM before
> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
> wants to use the ROM code which seems to be rare.
 In the generic code, bars for ioports are actually skipped (check code 
 before
 in header.c, in case of ioports there is a continue) and no handler is 
 registered for them.
 The consequence will be that a guest will access hardware when reading 
 those BARs.
>>> Yes, this seems to be a valid point
>> So, with the approach we have developed these days we will ignore all writes
>> and return ~0 for reads for all unhandled ops, e.g. those which do not have 
>> explicit
>> register handlers employed. Thus, this case will fall into unhandled clause.
> Except that I guess BARs are special in that reads may not return ~0,
> or else the low bits carry a meaning we don't want to convey. Unused
> BARs need to be hard-wired to 0, I think.
So, you mean we should have 2 sets of BAR handlers for guests:
1. normal emulation (these are implemented in this patch)
2. all other BARs: read 0/ignore write for all other BARs, including ROM, IO 
etc.

Is this what you mean?
> Jan
>
Thank you,
Oleksandr

Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers

2022-02-03 Thread Oleksandr Andrushchenko
Hi, Roger!
>> Also memory decoding needs to be initially disabled when used by
>> guests, in order to prevent the BAR being placed on top of a RAM
>> region. The guest physmap will be different from the host one, so it's
>> possible for BARs to end up placed on top of RAM regions initially
>> until the firmware or OS places them at a suitable address.
> Agree, memory decoding must be disabled
Isn't it already achieved by the toolstack resetting the PCI device
while assigning  it to a guest?

Thank you,
Oleksandr

Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers

2022-02-03 Thread Jan Beulich
On 03.02.2022 13:36, Oleksandr Andrushchenko wrote:
> Hi, Bertrand!
> 
> On 26.11.21 14:19, Oleksandr Andrushchenko wrote:
>> Hi, Bertrand!
>>
>> On 25.11.21 18:28, Bertrand Marquis wrote:
>>> Hi Oleksandr,
>>>
 On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko  
 wrote:

 From: Oleksandr Andrushchenko 

 Add relevant vpci register handlers when assigning PCI device to a domain
 and remove those when de-assigning. This allows having different
 handlers for different domains, e.g. hwdom and other guests.

 Emulate guest BAR register values: this allows creating a guest view
 of the registers and emulates size and properties probe as it is done
 during PCI device enumeration by the guest.

 ROM BAR is only handled for the hardware domain and for guest domains
 there is a stub: at the moment PCI expansion ROM handling is supported
 for x86 only and it might not be used by other architectures without
 emulating x86. Other use-cases may include using that expansion ROM before
 Xen boots, hence no emulation is needed in Xen itself. Or when a guest
 wants to use the ROM code which seems to be rare.
>>> In the generic code, bars for ioports are actually skipped (check code 
>>> before
>>> in header.c, in case of ioports there is a continue) and no handler is 
>>> registered for them.
>>> The consequence will be that a guest will access hardware when reading 
>>> those BARs.
>> Yes, this seems to be a valid point
> So, with the approach we have developed these days we will ignore all writes
> and return ~0 for reads for all unhandled ops, e.g. those which do not have 
> explicit
> register handlers employed. Thus, this case will fall into unhandled clause.

Except that I guess BARs are special in that reads may not return ~0,
or else the low bits carry a meaning we don't want to convey. Unused
BARs need to be hard-wired to 0, I think.

Jan




Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers

2022-02-03 Thread Oleksandr Andrushchenko
Hi, Bertrand!

On 26.11.21 14:19, Oleksandr Andrushchenko wrote:
> Hi, Bertrand!
>
> On 25.11.21 18:28, Bertrand Marquis wrote:
>> Hi Oleksandr,
>>
>>> On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko  
>>> wrote:
>>>
>>> From: Oleksandr Andrushchenko 
>>>
>>> Add relevant vpci register handlers when assigning PCI device to a domain
>>> and remove those when de-assigning. This allows having different
>>> handlers for different domains, e.g. hwdom and other guests.
>>>
>>> Emulate guest BAR register values: this allows creating a guest view
>>> of the registers and emulates size and properties probe as it is done
>>> during PCI device enumeration by the guest.
>>>
>>> ROM BAR is only handled for the hardware domain and for guest domains
>>> there is a stub: at the moment PCI expansion ROM handling is supported
>>> for x86 only and it might not be used by other architectures without
>>> emulating x86. Other use-cases may include using that expansion ROM before
>>> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
>>> wants to use the ROM code which seems to be rare.
>> In the generic code, bars for ioports are actually skipped (check code before
>> in header.c, in case of ioports there is a continue) and no handler is 
>> registered for them.
>> The consequence will be that a guest will access hardware when reading those 
>> BARs.
> Yes, this seems to be a valid point
So, with the approach we have developed these days we will ignore all writes
and return ~0 for reads for all unhandled ops, e.g. those which do not have 
explicit
register handlers employed. Thus, this case will fall into unhandled clause.

Thank you,
Oleksandr

Re: [PATCH] xen: add option to disable GNTTABOP_transfer

2022-02-03 Thread Jan Beulich
On 03.02.2022 11:55, Juergen Gross wrote:
> On 03.02.22 10:10, Jan Beulich wrote:
>> On 01.02.2022 10:02, Juergen Gross wrote:
>>> The grant table operation GNTTABOP_transfer is meant to be used in
>>> PV device backends, and it hasn't been used in Linux since the old
>>> Xen-o-Linux days.
>>
>> Kind of unusual spelling of XenoLinux ;-)
>>
>>> --- a/xen/common/grant_table.c
>>> +++ b/xen/common/grant_table.c
>>> @@ -181,6 +181,7 @@ static int parse_gnttab_max_maptrack_frames(const char 
>>> *arg)
>>>   
>>>   unsigned int __read_mostly opt_gnttab_max_version = GNTTAB_MAX_VERSION;
>>>   static bool __read_mostly opt_transitive_grants = true;
>>> +static bool __read_mostly opt_grant_transfer = true;
>>
>> If this was conditional upon PV (with a #define to false in the
>> opposite case), it could be __ro_after_init right away, while at
>> the same time allowing the compiler to eliminate gnttab_transfer().
> 
> Nice idea. The other option would be to put all (or most) of
> gnttab_transfer() in a "#ifdef CONFIG_PV" section, allowing to
> remove the "#ifdef CONFIG_X86" parts in it, too.

Yes, sure. The downside being that then this code won't be compile-
tested anymore in !PV builds. Yet keeping code visible to compilers
is what we aim for elsewhere by preferring if(IS_ENABLED(...)) over
#ifdef where possible.

Jan




Re: [PATCH] tools/libxl: don't allow IOMMU usage with PoD

2022-02-03 Thread Jan Beulich
On 03.02.2022 12:06, Roger Pau Monne wrote:
> Prevent libxl from creating guests that attempts to use PoD together
> with an IOMMU, even if no devices are actually assigned.
> 
> While the hypervisor could support using PoD together with an IOMMU as
> long as no devices are assigned, such usage seems doubtful. There's no
> guarantee the guest has ballooned down enough memory for PoD to no
> longer be active, and thus a later assignment of a PCI device to such
> domain could fail.

That's not a precise description of the constraint: The guest ballooning
down enough only means entries == cache, but for device assignment we
need entries == 0 (and a guarantee that no new entries can appear, but I
think this is already the case once a guest was launched).

(FWIW the wording in code comment and log message read fine to me.)

Jan




Re: [PATCH] docs: document patch rules

2022-02-03 Thread Jan Beulich
On 03.02.2022 11:31, Juergen Gross wrote:
> On 03.02.22 11:07, Jan Beulich wrote:
>> On 02.02.2022 12:44, Juergen Gross wrote:
>>> +## The commit message
>>> +
>>> +The commit message is free text describing *why* the patch is done and
>>> +*how* the goal of the patch is achieved. A good commit message will 
>>> describe
>>> +the current situation, the desired goal, and the way this goal is being
>>> +achieved. Parts of that can be omitted in obvious cases.
>>> +
>>> +In case additional changes are done in the patch (like e.g. cleanups), 
>>> those
>>> +should be mentioned.
>>> +
>>> +When referencing other patches (e.g. `patch xy introduced a bug ...`) those
>>> +patches should be referenced via their commit id (at least 12 digits) and 
>>> the
>>> +patch subject:
>>> +
>>> +Commit 67d01cdb5518 ("x86: infrastructure to allow converting certain
>>> +indirect calls to direct ones") introduced a bug ...
>>
>> I think this should have a reference to the Fixes: tag, as generally it
>> makes the text less convoluted if it references such a tag rather than
>> spelling out hash and title a 2nd time.
> 
> I think this depends on the use case. If the cited patch is in the
> Fixes: tag I agree. Sometimes a patch is cited for other reasons, e.g.
> when adding a fix similar to the one in the cited patch. I always like
> to have the commit id in such a case.
> 
> Are you fine with me rephrasing the text to:
> 
> When referencing other patches (e.g. `similar to patch xy ...`) those
> patches should be referenced via their commit id (at least 12 digits)
> and the patch subject, if the very same patch isn't referenced by the
> `Fixes:` tag, too:

Sounds good to me.

>>> +### Reviewed-by:
>>> +
>>> +A `Reviewed-by:` tag can only be given by a reviewer of the patch. With
>>> +responding to a sent patch adding the `Reviewed-by:` tag the reviewer
>>> +(which can be anybody) confirms to have looked thoroughly at the patch and
>>> +didn't find any issue (being it technical, legal or formal ones). If the
>>> +review is covering only some parts of the patch, those parts can optionally
>>> +be specified (multiple areas can be covered with multiple `Reviewed-by:`
>>> +tags).
>>
>> I'd prefer if the comma separated form was also explicitly mentioned
>> (and hence permitted) here. I'd even go as far as suggesting that this
>> should be the preferred form as long as line length constraints permit.
> 
> OTOH this will make automated parsing harder.
> 
> I'm open for both variants, just wanted to mention why I've chosen the
> multiline form initially.

Unless commas are expected to be part of such "identifiers", I don't think
I see how parsing would become meaningfully harder. When the email address
is wanted, parsing would strip # and everything following anyway.

Jan




[qemu-mainline test] 167991: tolerable FAIL - PUSHED

2022-02-03 Thread osstest service owner
flight 167991 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/167991/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds18 guest-start/debian.repeat fail REGR. vs. 167987

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 167987
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167987
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167987
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167987
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 167987
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 167987
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167987
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167987
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass

version targeted for testing:
 qemuuf7c0e223acd5021d03736644cc0abf3501003820
baseline version:
 qemuu47cc1a3655135b89fa75c2824fbddd29df874612

Last test of basis   167987  2022-02-02 13:39:33 Z0 days
Testing same since   167991  2022-02-02 23:39:43 Z0 days1 attempts

---

[PATCH] tools/libxl: don't allow IOMMU usage with PoD

2022-02-03 Thread Roger Pau Monne
Prevent libxl from creating guests that attempts to use PoD together
with an IOMMU, even if no devices are actually assigned.

While the hypervisor could support using PoD together with an IOMMU as
long as no devices are assigned, such usage seems doubtful. There's no
guarantee the guest has ballooned down enough memory for PoD to no
longer be active, and thus a later assignment of a PCI device to such
domain could fail.

Preventing the usage of PoD together with an IOMMU at guest creation
avoids having to add checks for active PoD entries in the device
assignment paths.

Signed-off-by: Roger Pau Monné 
---
Cc: Jan Beulich 
---
 tools/libs/light/libxl_create.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index d7a40d7550..7499922088 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
 pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) &&
 (d_config->b_info.target_memkb < d_config->b_info.max_memkb);
 
-/* We cannot have PoD and PCI device assignment at the same time
- * for HVM guest. It was reported that IOMMU cannot work with PoD
- * enabled because it needs to populated entire page table for
- * guest. To stay on the safe side, we disable PCI device
- * assignment when PoD is enabled.
+/* We don't support having PoD and an IOMMU at the same time for HVM
+ * guests. An active IOMMU cannot work with PoD because it needs a fully
+ * populated page-table. Prevent PoD usage if the domain has an IOMMU
+ * assigned, even if not active.
  */
 if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV &&
-d_config->num_pcidevs && pod_enabled) {
+d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED &&
+pod_enabled) {
 ret = ERROR_INVAL;
-LOGD(ERROR, domid,
- "PCI device assignment for HVM guest failed due to PoD enabled");
+LOGD(ERROR, domid, "IOMMU not supported together with PoD");
 goto error_out;
 }
 
-- 
2.34.1




Re: [PATCH] xen: add option to disable GNTTABOP_transfer

2022-02-03 Thread Juergen Gross

On 03.02.22 10:10, Jan Beulich wrote:

On 01.02.2022 10:02, Juergen Gross wrote:

The grant table operation GNTTABOP_transfer is meant to be used in
PV device backends, and it hasn't been used in Linux since the old
Xen-o-Linux days.


Kind of unusual spelling of XenoLinux ;-)


--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -181,6 +181,7 @@ static int parse_gnttab_max_maptrack_frames(const char *arg)
  
  unsigned int __read_mostly opt_gnttab_max_version = GNTTAB_MAX_VERSION;

  static bool __read_mostly opt_transitive_grants = true;
+static bool __read_mostly opt_grant_transfer = true;


If this was conditional upon PV (with a #define to false in the
opposite case), it could be __ro_after_init right away, while at
the same time allowing the compiler to eliminate gnttab_transfer().


Nice idea. The other option would be to put all (or most) of
gnttab_transfer() in a "#ifdef CONFIG_PV" section, allowing to
remove the "#ifdef CONFIG_X86" parts in it, too.




@@ -204,6 +205,8 @@ static int __init parse_gnttab(const char *s)
  }
  else if ( (val = parse_boolean("transitive", s, ss)) >= 0 )
  opt_transitive_grants = val;
+else if ( (val = parse_boolean("transfer", s, ss)) >= 0 )
+opt_grant_transfer = val;
  else
  rc = -EINVAL;


To possibly save a further roundtrip: If the PV dependency was added
above, I'd like to ask to follow the model of parse_iommu_param()
here and use "#ifndef opt_grant_transfer" around the added code in
favor of "#ifdef CONFIG_PV".


Okay.




@@ -2233,6 +2236,9 @@ gnttab_transfer(
  unsigned int max_bitsize;
  struct active_grant_entry *act;
  
+if ( !opt_grant_transfer )

+return -ENOSYS;


-EOPNOTSUPP please.


Yes, that's better.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH] docs: document patch rules

2022-02-03 Thread Juergen Gross

On 03.02.22 11:07, Jan Beulich wrote:

On 02.02.2022 12:44, Juergen Gross wrote:

--- /dev/null
+++ b/docs/process/sending-patches.pandoc
@@ -0,0 +1,284 @@
+# How a proper patch should look like
+
+This is a brief description how a proper patch for the Xen project should
+look like. Examples and tooling tips are not part of this document, those
+can be found in the
+[Xen Wiki](https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches).
+
+## The patch subject
+
+The first line at the top of the patch should contain a short description of
+what the patch does, and hints as to what code it touches. This line is used
+as the **Subject** line of the mail when sending the patch.
+
+The hint which code is touched us usually in form of a relative path inside


Nit: s/ us / is /


+the Xen git repository, where obvious directories can be omitted or replaced
+by abbreviations, or it can be a single word describing the topic:
+
+: 
+
+E.g.:
+
+xen/arm: increase memory banks number define value
+tools/libs/evtchn: Deduplicate xenevtchn_fd()
+MAINTAINERS: update my email address
+build: correct usage comments in Kbuild.include


I realize there's "usually" in the wording, but I'm still uncertain in how
far we want to suggest paths here. I have to admit that I never really
liked overly long prefixes like the "tools/libs/evtchn:" you give as
example. The prefix should be sufficiently unambiguous, yes, but in this
particular case "libs/evtchn:" or "libxenevtchn:" would be enough to
achieve that.

I'd prefer if the tag was described as specifying a (sub-)component (or
other abstract entity, like is the case for your "build:" example).


Okay.




+The description should give a rough hint *what* is done in the patch.
+
+The subject line should in general not exceed 80 characters. It must be
+followed by a blank line.
+
+## The commit message
+
+The commit message is free text describing *why* the patch is done and
+*how* the goal of the patch is achieved. A good commit message will describe
+the current situation, the desired goal, and the way this goal is being
+achieved. Parts of that can be omitted in obvious cases.
+
+In case additional changes are done in the patch (like e.g. cleanups), those
+should be mentioned.
+
+When referencing other patches (e.g. `patch xy introduced a bug ...`) those
+patches should be referenced via their commit id (at least 12 digits) and the
+patch subject:
+
+Commit 67d01cdb5518 ("x86: infrastructure to allow converting certain
+indirect calls to direct ones") introduced a bug ...


I think this should have a reference to the Fixes: tag, as generally it
makes the text less convoluted if it references such a tag rather than
spelling out hash and title a 2nd time.


I think this depends on the use case. If the cited patch is in the
Fixes: tag I agree. Sometimes a patch is cited for other reasons, e.g.
when adding a fix similar to the one in the cited patch. I always like
to have the commit id in such a case.

Are you fine with me rephrasing the text to:

When referencing other patches (e.g. `similar to patch xy ...`) those
patches should be referenced via their commit id (at least 12 digits)
and the patch subject, if the very same patch isn't referenced by the
`Fixes:` tag, too:




+## Tags
+
+Tags are entries in the form
+
+Tag: something
+
+In general tags are added in chronological order. So a `Reviewed-by:` tag
+should be added **after** the `Signed-off-by:` tag, as the review happened
+after the patch was written.
+
+Do not split a tag across multiple lines, tags are exempt from the
+"wrap at 75 columns" rule in order to simplify parsing scripts.
+
+### Taken-from:
+
+Xen has inherited some source files from other open source projects. In case
+a patch modifying such an inherited file is taken from that project (maybe in
+modified form), the `Taken-from:` tag specifies the source of the patch:
+
+Taken-from:  
+
+E.g.:
+
+Taken-from: 
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f093b08c47b3
+
+All tags **above** the `Taken-from:` tag are from the original patch (which
+should all be kept), while tags **after** `Taken-from:` are related to the
+normal Xen patch process as described here.


While I don't mind it becoming "Taken-from:", I'd like to put up for
consideration the (slightly shorter) alternative of "Origin:".


Fine with me.




+### Reported-by:
+
+This optional tag can be used to give credit to someone reporting an issue.
+It is in the format:
+
+Reported-by: name 
+
+E.g.:
+
+Reported-by: Jane Doe 
+
+As the email address will be made public via git, the reporter of an issue
+should be asked whether he/she is fine with being mentioned in the patch.
+
+### Suggested-by:
+
+This optional tag can be used to give credit to someone having suggested the
+solution the patch is implementing. It is in the format:
+
+Suggested-by: name 
+
+E.g.:
+
+Suggested-by: Jane Doe 
+
+As the email address will 

Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests

2022-02-03 Thread Roger Pau Monné
On Thu, Feb 03, 2022 at 11:20:15AM +0100, Jan Beulich wrote:
> On 03.02.2022 10:52, Roger Pau Monné wrote:
> > On Thu, Feb 03, 2022 at 10:21:54AM +0100, Jan Beulich wrote:
> >> On 03.02.2022 10:04, Roger Pau Monné wrote:
> >>> On Thu, Feb 03, 2022 at 09:31:03AM +0100, Jan Beulich wrote:
>  On 02.02.2022 17:13, Roger Pau Monné wrote:
> > On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote:
> >> @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d,
> >>  
> >>  ASSERT( pod_target >= p2m->pod.count );
> >>  
> >> -ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/);
> >> +if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
> >
> > Is it possible to have cache flush allowed without any PCI device
> > assigned? AFAICT the iomem/ioport_caps would only get setup when there
> > are device passed through?
> 
>  One can assign MMIO or ports to a guest the raw way. That's not
>  secure, but functionally explicitly permitted.
> 
> > TBH I would be fine if we just say that PoD cannot be used in
> > conjunction with an IOMMU, and just check for is_iommu_enable(d) here.
> >
> > I understand it's technically possible for PoD to be used together
> > with a domain that will later get a device passed through once PoD is
> > no longer in use, but I doubt there's much value in supporting that
> > use case, and I fear we might be introducing corner cases that could
> > create issues in the future. Overall I think it would be safer to just
> > disable PoD in conjunction with an IOMMU.
> 
>  I consider it wrong to put in place such a restriction, but I could
>  perhaps accept you and Andrew thinking this way if this was the only
>  aspect playing into here. However, this would then want an equivalent
>  tools side check, and while hunting down where to make the change as
>  done here, I wasn't able to figure out where that alternative
>  adjustment would need doing. Hence I would possibly(!) buy into this
>  only if someone else took care of doing so properly in the tool stack
>  (including the emission of a sensible error message).
> >>>
> >>> What about the (completely untested) chunk below:
> >>>
> >>> diff --git a/tools/libs/light/libxl_create.c 
> >>> b/tools/libs/light/libxl_create.c
> >>> index d7a40d7550..e585ef4c5c 100644
> >>> --- a/tools/libs/light/libxl_create.c
> >>> +++ b/tools/libs/light/libxl_create.c
> >>> @@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
> >>>  pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) &&
> >>>  (d_config->b_info.target_memkb < d_config->b_info.max_memkb);
> >>>  
> >>> -/* We cannot have PoD and PCI device assignment at the same time
> >>> +/* We cannot have PoD and an active IOMMU at the same time
> >>>   * for HVM guest. It was reported that IOMMU cannot work with PoD
> >>>   * enabled because it needs to populated entire page table for
> >>> - * guest. To stay on the safe side, we disable PCI device
> >>> - * assignment when PoD is enabled.
> >>> + * guest.
> >>>   */
> >>>  if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV &&
> >>> -d_config->num_pcidevs && pod_enabled) {
> >>> +d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED &&
> >>> +pod_enabled) {
> >>>  ret = ERROR_INVAL;
> >>> -LOGD(ERROR, domid,
> >>> - "PCI device assignment for HVM guest failed due to PoD 
> >>> enabled");
> >>> +LOGD(ERROR, domid, "IOMMU cannot be enabled together with PoD");
> >>>  goto error_out;
> >>>  }
> >>
> >> Perhaps. Seeing this I actually recall coming across this check during
> >> my investigation. Not changing it along the lines of what you do was
> >> then really more because of me not being convinced of the extra
> >> restriction; I clearly misremembered when writing the earlier reply.
> >> If we were to do what you suggest, I'd like to ask that the comment be
> >> changed differently, though: "We cannot ..." then isn't really true
> >> anymore. We choose not to permit this mode; "cannot" only applies to
> >> actual device assignment (and of course only as long as there aren't
> >> restartable IOMMU faults).
> > 
> > I'm fine with an adjusted wording here. This was mostly a placement
> > suggestion, but I didn't gave much thought to the error message.
> 
> FTAOD: Are you going to transform this into a proper patch then? While
> I wouldn't object to such a behavioral change, I also wouldn't want to
> put my name under it. But if it went in, I think I might be able to
> then drop the libxl adjustment from my patch.

Oh, I somewhat assumed you would integrate this check into the patch.
I can send a standalone patch myself if that's your preference. Let me
do that now.

Roger.



Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests

2022-02-03 Thread Jan Beulich
On 03.02.2022 10:52, Roger Pau Monné wrote:
> On Thu, Feb 03, 2022 at 10:21:54AM +0100, Jan Beulich wrote:
>> On 03.02.2022 10:04, Roger Pau Monné wrote:
>>> On Thu, Feb 03, 2022 at 09:31:03AM +0100, Jan Beulich wrote:
 On 02.02.2022 17:13, Roger Pau Monné wrote:
> On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote:
>> @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d,
>>  
>>  ASSERT( pod_target >= p2m->pod.count );
>>  
>> -ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/);
>> +if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
>
> Is it possible to have cache flush allowed without any PCI device
> assigned? AFAICT the iomem/ioport_caps would only get setup when there
> are device passed through?

 One can assign MMIO or ports to a guest the raw way. That's not
 secure, but functionally explicitly permitted.

> TBH I would be fine if we just say that PoD cannot be used in
> conjunction with an IOMMU, and just check for is_iommu_enable(d) here.
>
> I understand it's technically possible for PoD to be used together
> with a domain that will later get a device passed through once PoD is
> no longer in use, but I doubt there's much value in supporting that
> use case, and I fear we might be introducing corner cases that could
> create issues in the future. Overall I think it would be safer to just
> disable PoD in conjunction with an IOMMU.

 I consider it wrong to put in place such a restriction, but I could
 perhaps accept you and Andrew thinking this way if this was the only
 aspect playing into here. However, this would then want an equivalent
 tools side check, and while hunting down where to make the change as
 done here, I wasn't able to figure out where that alternative
 adjustment would need doing. Hence I would possibly(!) buy into this
 only if someone else took care of doing so properly in the tool stack
 (including the emission of a sensible error message).
>>>
>>> What about the (completely untested) chunk below:
>>>
>>> diff --git a/tools/libs/light/libxl_create.c 
>>> b/tools/libs/light/libxl_create.c
>>> index d7a40d7550..e585ef4c5c 100644
>>> --- a/tools/libs/light/libxl_create.c
>>> +++ b/tools/libs/light/libxl_create.c
>>> @@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
>>>  pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) &&
>>>  (d_config->b_info.target_memkb < d_config->b_info.max_memkb);
>>>  
>>> -/* We cannot have PoD and PCI device assignment at the same time
>>> +/* We cannot have PoD and an active IOMMU at the same time
>>>   * for HVM guest. It was reported that IOMMU cannot work with PoD
>>>   * enabled because it needs to populated entire page table for
>>> - * guest. To stay on the safe side, we disable PCI device
>>> - * assignment when PoD is enabled.
>>> + * guest.
>>>   */
>>>  if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV &&
>>> -d_config->num_pcidevs && pod_enabled) {
>>> +d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED &&
>>> +pod_enabled) {
>>>  ret = ERROR_INVAL;
>>> -LOGD(ERROR, domid,
>>> - "PCI device assignment for HVM guest failed due to PoD 
>>> enabled");
>>> +LOGD(ERROR, domid, "IOMMU cannot be enabled together with PoD");
>>>  goto error_out;
>>>  }
>>
>> Perhaps. Seeing this I actually recall coming across this check during
>> my investigation. Not changing it along the lines of what you do was
>> then really more because of me not being convinced of the extra
>> restriction; I clearly misremembered when writing the earlier reply.
>> If we were to do what you suggest, I'd like to ask that the comment be
>> changed differently, though: "We cannot ..." then isn't really true
>> anymore. We choose not to permit this mode; "cannot" only applies to
>> actual device assignment (and of course only as long as there aren't
>> restartable IOMMU faults).
> 
> I'm fine with an adjusted wording here. This was mostly a placement
> suggestion, but I didn't gave much thought to the error message.

FTAOD: Are you going to transform this into a proper patch then? While
I wouldn't object to such a behavioral change, I also wouldn't want to
put my name under it. But if it went in, I think I might be able to
then drop the libxl adjustment from my patch.

Jan




Re: [PATCH] docs: document patch rules

2022-02-03 Thread Jan Beulich
On 02.02.2022 12:44, Juergen Gross wrote:
> --- /dev/null
> +++ b/docs/process/sending-patches.pandoc
> @@ -0,0 +1,284 @@
> +# How a proper patch should look like
> +
> +This is a brief description how a proper patch for the Xen project should
> +look like. Examples and tooling tips are not part of this document, those
> +can be found in the
> +[Xen Wiki](https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches).
> +
> +## The patch subject
> +
> +The first line at the top of the patch should contain a short description of
> +what the patch does, and hints as to what code it touches. This line is used
> +as the **Subject** line of the mail when sending the patch.
> +
> +The hint which code is touched us usually in form of a relative path inside

Nit: s/ us / is /

> +the Xen git repository, where obvious directories can be omitted or replaced
> +by abbreviations, or it can be a single word describing the topic:
> +
> +: 
> +
> +E.g.:
> +
> +xen/arm: increase memory banks number define value
> +tools/libs/evtchn: Deduplicate xenevtchn_fd()
> +MAINTAINERS: update my email address
> +build: correct usage comments in Kbuild.include

I realize there's "usually" in the wording, but I'm still uncertain in how
far we want to suggest paths here. I have to admit that I never really
liked overly long prefixes like the "tools/libs/evtchn:" you give as
example. The prefix should be sufficiently unambiguous, yes, but in this
particular case "libs/evtchn:" or "libxenevtchn:" would be enough to
achieve that.

I'd prefer if the tag was described as specifying a (sub-)component (or
other abstract entity, like is the case for your "build:" example).

> +The description should give a rough hint *what* is done in the patch.
> +
> +The subject line should in general not exceed 80 characters. It must be
> +followed by a blank line.
> +
> +## The commit message
> +
> +The commit message is free text describing *why* the patch is done and
> +*how* the goal of the patch is achieved. A good commit message will describe
> +the current situation, the desired goal, and the way this goal is being
> +achieved. Parts of that can be omitted in obvious cases.
> +
> +In case additional changes are done in the patch (like e.g. cleanups), those
> +should be mentioned.
> +
> +When referencing other patches (e.g. `patch xy introduced a bug ...`) those
> +patches should be referenced via their commit id (at least 12 digits) and the
> +patch subject:
> +
> +Commit 67d01cdb5518 ("x86: infrastructure to allow converting certain
> +indirect calls to direct ones") introduced a bug ...

I think this should have a reference to the Fixes: tag, as generally it
makes the text less convoluted if it references such a tag rather than
spelling out hash and title a 2nd time.

> +## Tags
> +
> +Tags are entries in the form
> +
> +Tag: something
> +
> +In general tags are added in chronological order. So a `Reviewed-by:` tag
> +should be added **after** the `Signed-off-by:` tag, as the review happened
> +after the patch was written.
> +
> +Do not split a tag across multiple lines, tags are exempt from the
> +"wrap at 75 columns" rule in order to simplify parsing scripts.
> +
> +### Taken-from:
> +
> +Xen has inherited some source files from other open source projects. In case
> +a patch modifying such an inherited file is taken from that project (maybe in
> +modified form), the `Taken-from:` tag specifies the source of the patch:
> +
> +Taken-from:  
> +
> +E.g.:
> +
> +Taken-from: 
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f093b08c47b3
> +
> +All tags **above** the `Taken-from:` tag are from the original patch (which
> +should all be kept), while tags **after** `Taken-from:` are related to the
> +normal Xen patch process as described here.

While I don't mind it becoming "Taken-from:", I'd like to put up for
consideration the (slightly shorter) alternative of "Origin:".

> +### Reported-by:
> +
> +This optional tag can be used to give credit to someone reporting an issue.
> +It is in the format:
> +
> +Reported-by: name 
> +
> +E.g.:
> +
> +Reported-by: Jane Doe 
> +
> +As the email address will be made public via git, the reporter of an issue
> +should be asked whether he/she is fine with being mentioned in the patch.
> +
> +### Suggested-by:
> +
> +This optional tag can be used to give credit to someone having suggested the
> +solution the patch is implementing. It is in the format:
> +
> +Suggested-by: name 
> +
> +E.g.:
> +
> +Suggested-by: Jane Doe 
> +
> +As the email address will be made public via git, the reporter of an issue
> +should be asked whether he/she is fine with being mentioned in the patch.

Besides these two we've also been using Requested-by:, which I think in
some cases conveys information more precisely than Suggested-by: (e.g.
when some result was to be achieved without a solution or path there
having been given).

> +### Reviewed-by:
> +
> +A `Review

Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests

2022-02-03 Thread Roger Pau Monné
On Thu, Feb 03, 2022 at 10:21:54AM +0100, Jan Beulich wrote:
> On 03.02.2022 10:04, Roger Pau Monné wrote:
> > On Thu, Feb 03, 2022 at 09:31:03AM +0100, Jan Beulich wrote:
> >> On 02.02.2022 17:13, Roger Pau Monné wrote:
> >>> On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote:
>  @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d,
>   
>   ASSERT( pod_target >= p2m->pod.count );
>   
>  -ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/);
>  +if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
> >>>
> >>> Is it possible to have cache flush allowed without any PCI device
> >>> assigned? AFAICT the iomem/ioport_caps would only get setup when there
> >>> are device passed through?
> >>
> >> One can assign MMIO or ports to a guest the raw way. That's not
> >> secure, but functionally explicitly permitted.
> >>
> >>> TBH I would be fine if we just say that PoD cannot be used in
> >>> conjunction with an IOMMU, and just check for is_iommu_enable(d) here.
> >>>
> >>> I understand it's technically possible for PoD to be used together
> >>> with a domain that will later get a device passed through once PoD is
> >>> no longer in use, but I doubt there's much value in supporting that
> >>> use case, and I fear we might be introducing corner cases that could
> >>> create issues in the future. Overall I think it would be safer to just
> >>> disable PoD in conjunction with an IOMMU.
> >>
> >> I consider it wrong to put in place such a restriction, but I could
> >> perhaps accept you and Andrew thinking this way if this was the only
> >> aspect playing into here. However, this would then want an equivalent
> >> tools side check, and while hunting down where to make the change as
> >> done here, I wasn't able to figure out where that alternative
> >> adjustment would need doing. Hence I would possibly(!) buy into this
> >> only if someone else took care of doing so properly in the tool stack
> >> (including the emission of a sensible error message).
> > 
> > What about the (completely untested) chunk below:
> > 
> > diff --git a/tools/libs/light/libxl_create.c 
> > b/tools/libs/light/libxl_create.c
> > index d7a40d7550..e585ef4c5c 100644
> > --- a/tools/libs/light/libxl_create.c
> > +++ b/tools/libs/light/libxl_create.c
> > @@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
> >  pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) &&
> >  (d_config->b_info.target_memkb < d_config->b_info.max_memkb);
> >  
> > -/* We cannot have PoD and PCI device assignment at the same time
> > +/* We cannot have PoD and an active IOMMU at the same time
> >   * for HVM guest. It was reported that IOMMU cannot work with PoD
> >   * enabled because it needs to populated entire page table for
> > - * guest. To stay on the safe side, we disable PCI device
> > - * assignment when PoD is enabled.
> > + * guest.
> >   */
> >  if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV &&
> > -d_config->num_pcidevs && pod_enabled) {
> > +d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED &&
> > +pod_enabled) {
> >  ret = ERROR_INVAL;
> > -LOGD(ERROR, domid,
> > - "PCI device assignment for HVM guest failed due to PoD 
> > enabled");
> > +LOGD(ERROR, domid, "IOMMU cannot be enabled together with PoD");
> >  goto error_out;
> >  }
> 
> Perhaps. Seeing this I actually recall coming across this check during
> my investigation. Not changing it along the lines of what you do was
> then really more because of me not being convinced of the extra
> restriction; I clearly misremembered when writing the earlier reply.
> If we were to do what you suggest, I'd like to ask that the comment be
> changed differently, though: "We cannot ..." then isn't really true
> anymore. We choose not to permit this mode; "cannot" only applies to
> actual device assignment (and of course only as long as there aren't
> restartable IOMMU faults).

I'm fine with an adjusted wording here. This was mostly a placement
suggestion, but I didn't gave much thought to the error message.

> >> Finally this still leaves out the "raw MMIO / ports" case mentioned
> >> above.
> > 
> > But the raw MMIO 'mode' doesn't care much about PoD, because if
> > there's no PCI device assigned there's no IOMMU setup, and thus such
> > raw MMIO regions (could?) belong to a device that's not constrained by
> > the guest p2m anyway?
> 
> Hmm, yes, true.
> 
>  --- a/xen/common/vm_event.c
>  +++ b/xen/common/vm_event.c
>  @@ -639,7 +639,7 @@ int vm_event_domctl(struct domain *d, st
>   
>   rc = -EXDEV;
>   /* Disallow paging in a PoD guest */
>  -if ( p2m_pod_entry_count(p2m_get_hostp2m(d)) )
>  +if ( p2m_pod_active(d) )
> >>>
> >>> Isn't it fine to just check for entry_count like you suggest in the
> >>> chan

Re: [PATCH] docs: document patch rules

2022-02-03 Thread Juergen Gross

On 03.02.22 10:36, Roger Pau Monné wrote:

On Wed, Feb 02, 2022 at 12:44:48PM +0100, Juergen Gross wrote:

Add a document to describe the rules for sending a proper patch.

As it contains all the information already being present in
docs/process/tags.pandoc remove that file.

Signed-off-by: Juergen Gross 
---
  docs/process/sending-patches.pandoc | 284 
  docs/process/tags.pandoc|  55 --
  2 files changed, 284 insertions(+), 55 deletions(-)
  create mode 100644 docs/process/sending-patches.pandoc
  delete mode 100644 docs/process/tags.pandoc

diff --git a/docs/process/sending-patches.pandoc 
b/docs/process/sending-patches.pandoc
new file mode 100644
index 00..4cfc6e1a5b
--- /dev/null
+++ b/docs/process/sending-patches.pandoc
@@ -0,0 +1,284 @@
+# How a proper patch should look like
+
+This is a brief description how a proper patch for the Xen project should
+look like. Examples and tooling tips are not part of this document, those
+can be found in the
+[Xen Wiki](https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches).
+
+## The patch subject
+
+The first line at the top of the patch should contain a short description of
+what the patch does, and hints as to what code it touches. This line is used
+as the **Subject** line of the mail when sending the patch.
+
+The hint which code is touched us usually in form of a relative path inside
+the Xen git repository, where obvious directories can be omitted or replaced
+by abbreviations, or it can be a single word describing the topic:
+
+: 


I would use  maybe instead of path, to explicitly note this
is not usually a real path inside the repo.


Good idea.




+
+E.g.:
+
+xen/arm: increase memory banks number define value
+tools/libs/evtchn: Deduplicate xenevtchn_fd()


Mostly a nit, but since this document is about style: I wouldn't
recommend using a capital letter after ':' by default. The above line
should instead be:

 tools/libs/evtchn: deduplicate xenevtchn_fd()



Yes.


+MAINTAINERS: update my email address
+build: correct usage comments in Kbuild.include
+
+The description should give a rough hint *what* is done in the patch.
+
+The subject line should in general not exceed 80 characters. It must be
+followed by a blank line.
+
+## The commit message
+
+The commit message is free text describing *why* the patch is done and
+*how* the goal of the patch is achieved. A good commit message will describe
+the current situation, the desired goal, and the way this goal is being
+achieved. Parts of that can be omitted in obvious cases.
+
+In case additional changes are done in the patch (like e.g. cleanups), those
+should be mentioned.
+
+When referencing other patches (e.g. `patch xy introduced a bug ...`) those
+patches should be referenced via their commit id (at least 12 digits) and the
+patch subject:
+
+Commit 67d01cdb5518 ("x86: infrastructure to allow converting certain
+indirect calls to direct ones") introduced a bug ...
+
+The following ``git config`` settings can be used to add a pretty format for
+outputting the above style in the ``git log`` or ``git show`` commands:
+
+[core]
+abbrev = 12
+[pretty]
+fixes = Fixes: %h (\"%s\")
+
+Lines in the commit message should not exceed 75 characters, except when
+copying error output directly into the commit message.
+
+## Tags
+
+Tags are entries in the form
+
+Tag: something
+
+In general tags are added in chronological order. So a `Reviewed-by:` tag
+should be added **after** the `Signed-off-by:` tag, as the review happened
+after the patch was written.
+
+Do not split a tag across multiple lines, tags are exempt from the
+"wrap at 75 columns" rule in order to simplify parsing scripts.
+
+### Taken-from:
+
+Xen has inherited some source files from other open source projects. In case
+a patch modifying such an inherited file is taken from that project (maybe in
+modified form), the `Taken-from:` tag specifies the source of the patch:
+
+Taken-from:  
+
+E.g.:
+
+Taken-from: 
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f093b08c47b3
+
+All tags **above** the `Taken-from:` tag are from the original patch (which
+should all be kept), while tags **after** `Taken-from:` are related to the
+normal Xen patch process as described here.
+
+### Fixes:
+
+If your patch fixes a bug in a specific commit, e.g. you found an issue using
+``git bisect``, please use the `Fixes:` tag with the first 12 characters of
+the commit id, and the one line summary.
+
+Fixes:  ("")
+
+E.g.:
+
+Fixes: 67d01cdb5518 ("x86: infrastructure to allow converting certain indirect 
calls to direct ones")
+
+### Backport:
+
+A backport tag is an optional tag in the commit message to request a
+given commit to be backported to the released trees:
+
+Backport:  [# ]


So we already had a documented usage of '#' in tags, which I think
should make it a better candidat

Re: [PATCH] docs: document patch rules

2022-02-03 Thread Roger Pau Monné
On Wed, Feb 02, 2022 at 12:44:48PM +0100, Juergen Gross wrote:
> Add a document to describe the rules for sending a proper patch.
> 
> As it contains all the information already being present in
> docs/process/tags.pandoc remove that file.
> 
> Signed-off-by: Juergen Gross 
> ---
>  docs/process/sending-patches.pandoc | 284 
>  docs/process/tags.pandoc|  55 --
>  2 files changed, 284 insertions(+), 55 deletions(-)
>  create mode 100644 docs/process/sending-patches.pandoc
>  delete mode 100644 docs/process/tags.pandoc
> 
> diff --git a/docs/process/sending-patches.pandoc 
> b/docs/process/sending-patches.pandoc
> new file mode 100644
> index 00..4cfc6e1a5b
> --- /dev/null
> +++ b/docs/process/sending-patches.pandoc
> @@ -0,0 +1,284 @@
> +# How a proper patch should look like
> +
> +This is a brief description how a proper patch for the Xen project should
> +look like. Examples and tooling tips are not part of this document, those
> +can be found in the
> +[Xen Wiki](https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches).
> +
> +## The patch subject
> +
> +The first line at the top of the patch should contain a short description of
> +what the patch does, and hints as to what code it touches. This line is used
> +as the **Subject** line of the mail when sending the patch.
> +
> +The hint which code is touched us usually in form of a relative path inside
> +the Xen git repository, where obvious directories can be omitted or replaced
> +by abbreviations, or it can be a single word describing the topic:
> +
> +: 

I would use  maybe instead of path, to explicitly note this
is not usually a real path inside the repo.

> +
> +E.g.:
> +
> +xen/arm: increase memory banks number define value
> +tools/libs/evtchn: Deduplicate xenevtchn_fd()

Mostly a nit, but since this document is about style: I wouldn't
recommend using a capital letter after ':' by default. The above line
should instead be:

tools/libs/evtchn: deduplicate xenevtchn_fd()

> +MAINTAINERS: update my email address
> +build: correct usage comments in Kbuild.include
> +
> +The description should give a rough hint *what* is done in the patch.
> +
> +The subject line should in general not exceed 80 characters. It must be
> +followed by a blank line.
> +
> +## The commit message
> +
> +The commit message is free text describing *why* the patch is done and
> +*how* the goal of the patch is achieved. A good commit message will describe
> +the current situation, the desired goal, and the way this goal is being
> +achieved. Parts of that can be omitted in obvious cases.
> +
> +In case additional changes are done in the patch (like e.g. cleanups), those
> +should be mentioned.
> +
> +When referencing other patches (e.g. `patch xy introduced a bug ...`) those
> +patches should be referenced via their commit id (at least 12 digits) and the
> +patch subject:
> +
> +Commit 67d01cdb5518 ("x86: infrastructure to allow converting certain
> +indirect calls to direct ones") introduced a bug ...
> +
> +The following ``git config`` settings can be used to add a pretty format for
> +outputting the above style in the ``git log`` or ``git show`` commands:
> +
> +[core]
> +abbrev = 12
> +[pretty]
> +fixes = Fixes: %h (\"%s\")
> +
> +Lines in the commit message should not exceed 75 characters, except when
> +copying error output directly into the commit message.
> +
> +## Tags
> +
> +Tags are entries in the form
> +
> +Tag: something
> +
> +In general tags are added in chronological order. So a `Reviewed-by:` tag
> +should be added **after** the `Signed-off-by:` tag, as the review happened
> +after the patch was written.
> +
> +Do not split a tag across multiple lines, tags are exempt from the
> +"wrap at 75 columns" rule in order to simplify parsing scripts.
> +
> +### Taken-from:
> +
> +Xen has inherited some source files from other open source projects. In case
> +a patch modifying such an inherited file is taken from that project (maybe in
> +modified form), the `Taken-from:` tag specifies the source of the patch:
> +
> +Taken-from:  
> +
> +E.g.:
> +
> +Taken-from: 
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f093b08c47b3
> +
> +All tags **above** the `Taken-from:` tag are from the original patch (which
> +should all be kept), while tags **after** `Taken-from:` are related to the
> +normal Xen patch process as described here.
> +
> +### Fixes:
> +
> +If your patch fixes a bug in a specific commit, e.g. you found an issue using
> +``git bisect``, please use the `Fixes:` tag with the first 12 characters of
> +the commit id, and the one line summary.
> +
> +Fixes:  ("")
> +
> +E.g.:
> +
> +Fixes: 67d01cdb5518 ("x86: infrastructure to allow converting certain 
> indirect calls to direct ones")
> +
> +### Backport:
> +
> +A backport tag is an optional tag in the commit message to request a
> +giv

Re: [PATCH 4.16 / 4.15] MAINTAINERS: Anthony is stable branch tools maintainer

2022-02-03 Thread Anthony PERARD
On Thu, Feb 03, 2022 at 09:56:08AM +0100, Jan Beulich wrote:
> Signed-off-by: Jan Beulich 
> 
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -60,7 +60,7 @@ The maintainer for this branch is:
>  
>  Tools backport requests should also be copied to:
>  
> - TODO - Loooking for new tools stable maintainer
> + Anthony Perard 

Acked-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests

2022-02-03 Thread Jan Beulich
On 03.02.2022 10:04, Roger Pau Monné wrote:
> On Thu, Feb 03, 2022 at 09:31:03AM +0100, Jan Beulich wrote:
>> On 02.02.2022 17:13, Roger Pau Monné wrote:
>>> On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote:
 @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d,
  
  ASSERT( pod_target >= p2m->pod.count );
  
 -ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/);
 +if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
>>>
>>> Is it possible to have cache flush allowed without any PCI device
>>> assigned? AFAICT the iomem/ioport_caps would only get setup when there
>>> are device passed through?
>>
>> One can assign MMIO or ports to a guest the raw way. That's not
>> secure, but functionally explicitly permitted.
>>
>>> TBH I would be fine if we just say that PoD cannot be used in
>>> conjunction with an IOMMU, and just check for is_iommu_enable(d) here.
>>>
>>> I understand it's technically possible for PoD to be used together
>>> with a domain that will later get a device passed through once PoD is
>>> no longer in use, but I doubt there's much value in supporting that
>>> use case, and I fear we might be introducing corner cases that could
>>> create issues in the future. Overall I think it would be safer to just
>>> disable PoD in conjunction with an IOMMU.
>>
>> I consider it wrong to put in place such a restriction, but I could
>> perhaps accept you and Andrew thinking this way if this was the only
>> aspect playing into here. However, this would then want an equivalent
>> tools side check, and while hunting down where to make the change as
>> done here, I wasn't able to figure out where that alternative
>> adjustment would need doing. Hence I would possibly(!) buy into this
>> only if someone else took care of doing so properly in the tool stack
>> (including the emission of a sensible error message).
> 
> What about the (completely untested) chunk below:
> 
> diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
> index d7a40d7550..e585ef4c5c 100644
> --- a/tools/libs/light/libxl_create.c
> +++ b/tools/libs/light/libxl_create.c
> @@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
>  pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) &&
>  (d_config->b_info.target_memkb < d_config->b_info.max_memkb);
>  
> -/* We cannot have PoD and PCI device assignment at the same time
> +/* We cannot have PoD and an active IOMMU at the same time
>   * for HVM guest. It was reported that IOMMU cannot work with PoD
>   * enabled because it needs to populated entire page table for
> - * guest. To stay on the safe side, we disable PCI device
> - * assignment when PoD is enabled.
> + * guest.
>   */
>  if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV &&
> -d_config->num_pcidevs && pod_enabled) {
> +d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED &&
> +pod_enabled) {
>  ret = ERROR_INVAL;
> -LOGD(ERROR, domid,
> - "PCI device assignment for HVM guest failed due to PoD 
> enabled");
> +LOGD(ERROR, domid, "IOMMU cannot be enabled together with PoD");
>  goto error_out;
>  }

Perhaps. Seeing this I actually recall coming across this check during
my investigation. Not changing it along the lines of what you do was
then really more because of me not being convinced of the extra
restriction; I clearly misremembered when writing the earlier reply.
If we were to do what you suggest, I'd like to ask that the comment be
changed differently, though: "We cannot ..." then isn't really true
anymore. We choose not to permit this mode; "cannot" only applies to
actual device assignment (and of course only as long as there aren't
restartable IOMMU faults).

>> Finally this still leaves out the "raw MMIO / ports" case mentioned
>> above.
> 
> But the raw MMIO 'mode' doesn't care much about PoD, because if
> there's no PCI device assigned there's no IOMMU setup, and thus such
> raw MMIO regions (could?) belong to a device that's not constrained by
> the guest p2m anyway?

Hmm, yes, true.

 --- a/xen/common/vm_event.c
 +++ b/xen/common/vm_event.c
 @@ -639,7 +639,7 @@ int vm_event_domctl(struct domain *d, st
  
  rc = -EXDEV;
  /* Disallow paging in a PoD guest */
 -if ( p2m_pod_entry_count(p2m_get_hostp2m(d)) )
 +if ( p2m_pod_active(d) )
>>>
>>> Isn't it fine to just check for entry_count like you suggest in the
>>> change to libxl?
>>
>> I didn't think it would be, but I'm not entirely sure: If paging was
>> enabled before a guest actually starts, it wouldn't have any entries
>> but still be a PoD guest if it has a non-empty cache. The VM event
>> folks may be able to clarify this either way. But ...
>>
>>> This is what p2m_pod_entry_count actually does (rather than entry_count | 
>>> count).
>>
>> ... you

Re: [PATCH] xen: add option to disable GNTTABOP_transfer

2022-02-03 Thread Jan Beulich
On 01.02.2022 10:02, Juergen Gross wrote:
> The grant table operation GNTTABOP_transfer is meant to be used in
> PV device backends, and it hasn't been used in Linux since the old
> Xen-o-Linux days.

Kind of unusual spelling of XenoLinux ;-)

> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -181,6 +181,7 @@ static int parse_gnttab_max_maptrack_frames(const char 
> *arg)
>  
>  unsigned int __read_mostly opt_gnttab_max_version = GNTTAB_MAX_VERSION;
>  static bool __read_mostly opt_transitive_grants = true;
> +static bool __read_mostly opt_grant_transfer = true;

If this was conditional upon PV (with a #define to false in the
opposite case), it could be __ro_after_init right away, while at
the same time allowing the compiler to eliminate gnttab_transfer().

> @@ -204,6 +205,8 @@ static int __init parse_gnttab(const char *s)
>  }
>  else if ( (val = parse_boolean("transitive", s, ss)) >= 0 )
>  opt_transitive_grants = val;
> +else if ( (val = parse_boolean("transfer", s, ss)) >= 0 )
> +opt_grant_transfer = val;
>  else
>  rc = -EINVAL;

To possibly save a further roundtrip: If the PV dependency was added
above, I'd like to ask to follow the model of parse_iommu_param()
here and use "#ifndef opt_grant_transfer" around the added code in
favor of "#ifdef CONFIG_PV".

> @@ -2233,6 +2236,9 @@ gnttab_transfer(
>  unsigned int max_bitsize;
>  struct active_grant_entry *act;
>  
> +if ( !opt_grant_transfer )
> +return -ENOSYS;

-EOPNOTSUPP please.

Jan




Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests

2022-02-03 Thread Roger Pau Monné
On Thu, Feb 03, 2022 at 09:31:03AM +0100, Jan Beulich wrote:
> On 02.02.2022 17:13, Roger Pau Monné wrote:
> > On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote:
> >> @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d,
> >>  
> >>  ASSERT( pod_target >= p2m->pod.count );
> >>  
> >> -ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/);
> >> +if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
> > 
> > Is it possible to have cache flush allowed without any PCI device
> > assigned? AFAICT the iomem/ioport_caps would only get setup when there
> > are device passed through?
> 
> One can assign MMIO or ports to a guest the raw way. That's not
> secure, but functionally explicitly permitted.
> 
> > TBH I would be fine if we just say that PoD cannot be used in
> > conjunction with an IOMMU, and just check for is_iommu_enable(d) here.
> > 
> > I understand it's technically possible for PoD to be used together
> > with a domain that will later get a device passed through once PoD is
> > no longer in use, but I doubt there's much value in supporting that
> > use case, and I fear we might be introducing corner cases that could
> > create issues in the future. Overall I think it would be safer to just
> > disable PoD in conjunction with an IOMMU.
> 
> I consider it wrong to put in place such a restriction, but I could
> perhaps accept you and Andrew thinking this way if this was the only
> aspect playing into here. However, this would then want an equivalent
> tools side check, and while hunting down where to make the change as
> done here, I wasn't able to figure out where that alternative
> adjustment would need doing. Hence I would possibly(!) buy into this
> only if someone else took care of doing so properly in the tool stack
> (including the emission of a sensible error message).

What about the (completely untested) chunk below:

diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index d7a40d7550..e585ef4c5c 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
 pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) &&
 (d_config->b_info.target_memkb < d_config->b_info.max_memkb);
 
-/* We cannot have PoD and PCI device assignment at the same time
+/* We cannot have PoD and an active IOMMU at the same time
  * for HVM guest. It was reported that IOMMU cannot work with PoD
  * enabled because it needs to populated entire page table for
- * guest. To stay on the safe side, we disable PCI device
- * assignment when PoD is enabled.
+ * guest.
  */
 if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV &&
-d_config->num_pcidevs && pod_enabled) {
+d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED &&
+pod_enabled) {
 ret = ERROR_INVAL;
-LOGD(ERROR, domid,
- "PCI device assignment for HVM guest failed due to PoD enabled");
+LOGD(ERROR, domid, "IOMMU cannot be enabled together with PoD");
 goto error_out;
 }
 


> Finally this still leaves out the "raw MMIO / ports" case mentioned
> above.

But the raw MMIO 'mode' doesn't care much about PoD, because if
there's no PCI device assigned there's no IOMMU setup, and thus such
raw MMIO regions (could?) belong to a device that's not constrained by
the guest p2m anyway?

> >> --- a/xen/common/vm_event.c
> >> +++ b/xen/common/vm_event.c
> >> @@ -639,7 +639,7 @@ int vm_event_domctl(struct domain *d, st
> >>  
> >>  rc = -EXDEV;
> >>  /* Disallow paging in a PoD guest */
> >> -if ( p2m_pod_entry_count(p2m_get_hostp2m(d)) )
> >> +if ( p2m_pod_active(d) )
> > 
> > Isn't it fine to just check for entry_count like you suggest in the
> > change to libxl?
> 
> I didn't think it would be, but I'm not entirely sure: If paging was
> enabled before a guest actually starts, it wouldn't have any entries
> but still be a PoD guest if it has a non-empty cache. The VM event
> folks may be able to clarify this either way. But ...
> 
> > This is what p2m_pod_entry_count actually does (rather than entry_count | 
> > count).
> 
> ... you really mean "did" here, as I'm removing p2m_pod_entry_count()
> in this patch. Of course locking could be added to it instead (or
> p2m_pod_get_mem_target() be used in its place), but I'd prefer if we
> could go with just the check which precisely matches what the comment
> says (IOW otherwise I'd need to additionally know what exactly the
> comment is to say).

Could you briefly mention this in the commit message? Ie: VM event
code is also adjusted to make sure PoD is not in use and cannot be
used during the guest lifetime?

Thanks, Roger.



[PATCH 4.16 / 4.15] MAINTAINERS: Anthony is stable branch tools maintainer

2022-02-03 Thread Jan Beulich
Signed-off-by: Jan Beulich 

--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -60,7 +60,7 @@ The maintainer for this branch is:
 
 Tools backport requests should also be copied to:
 
-   TODO - Loooking for new tools stable maintainer
+   Anthony Perard 
 
 
Unstable Subsystem Maintainers




Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests

2022-02-03 Thread Jan Beulich
On 02.02.2022 17:13, Roger Pau Monné wrote:
> On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote:
>> @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d,
>>  
>>  ASSERT( pod_target >= p2m->pod.count );
>>  
>> -ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/);
>> +if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
> 
> Is it possible to have cache flush allowed without any PCI device
> assigned? AFAICT the iomem/ioport_caps would only get setup when there
> are device passed through?

One can assign MMIO or ports to a guest the raw way. That's not
secure, but functionally explicitly permitted.

> TBH I would be fine if we just say that PoD cannot be used in
> conjunction with an IOMMU, and just check for is_iommu_enable(d) here.
> 
> I understand it's technically possible for PoD to be used together
> with a domain that will later get a device passed through once PoD is
> no longer in use, but I doubt there's much value in supporting that
> use case, and I fear we might be introducing corner cases that could
> create issues in the future. Overall I think it would be safer to just
> disable PoD in conjunction with an IOMMU.

I consider it wrong to put in place such a restriction, but I could
perhaps accept you and Andrew thinking this way if this was the only
aspect playing into here. However, this would then want an equivalent
tools side check, and while hunting down where to make the change as
done here, I wasn't able to figure out where that alternative
adjustment would need doing. Hence I would possibly(!) buy into this
only if someone else took care of doing so properly in the tool stack
(including the emission of a sensible error message).

Finally this still leaves out the "raw MMIO / ports" case mentioned
above.

>> --- a/xen/common/vm_event.c
>> +++ b/xen/common/vm_event.c
>> @@ -639,7 +639,7 @@ int vm_event_domctl(struct domain *d, st
>>  
>>  rc = -EXDEV;
>>  /* Disallow paging in a PoD guest */
>> -if ( p2m_pod_entry_count(p2m_get_hostp2m(d)) )
>> +if ( p2m_pod_active(d) )
> 
> Isn't it fine to just check for entry_count like you suggest in the
> change to libxl?

I didn't think it would be, but I'm not entirely sure: If paging was
enabled before a guest actually starts, it wouldn't have any entries
but still be a PoD guest if it has a non-empty cache. The VM event
folks may be able to clarify this either way. But ...

> This is what p2m_pod_entry_count actually does (rather than entry_count | 
> count).

... you really mean "did" here, as I'm removing p2m_pod_entry_count()
in this patch. Of course locking could be added to it instead (or
p2m_pod_get_mem_target() be used in its place), but I'd prefer if we
could go with just the check which precisely matches what the comment
says (IOW otherwise I'd need to additionally know what exactly the
comment is to say).

Jan




[xen-unstable test] 167990: tolerable FAIL - PUSHED

2022-02-03 Thread osstest service owner
flight 167990 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/167990/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167986
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 167986
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167986
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167986
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 167986
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 167986
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 167986
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167986
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167986
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 167986
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167986
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167986
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 xen  b17e0ec72eded037297f34a233655aad23f64711
baseline version:
 xen  9ce3ef20b4f085a7