Re: [PATCH v6 12/13] xen/arm: translate virtual PCI bus topology for guests
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: > --- a/xen/drivers/vpci/vpci.c > +++ b/xen/drivers/vpci/vpci.c > @@ -168,6 +168,35 @@ static void vpci_remove_virtual_device(struct domain *d, > pdev->vpci->guest_sbdf.sbdf = ~0; > } > > +/* > + * Find the physical device which is mapped to the virtual device > + * and translate virtual SBDF to the physical one. > + */ > +bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf) > +{ > +struct pci_dev *pdev; > + > +ASSERT(!is_hardware_domain(d)); In addition to this, don't you also need to assert that pcidevs_lock is held (or if it isn't, you'd need to acquire it) for ... > +for_each_pdev( d, pdev ) ... this to be race-free? Jan
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: > @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, > uint16_t cmd, bool rom_only) > continue; > } > > +spin_lock(&tmp->vpci_lock); > +if ( !tmp->vpci ) > +{ > +spin_unlock(&tmp->vpci_lock); > +continue; > +} > for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ ) > { > const struct vpci_bar *bar = &tmp->vpci->header.bars[i]; > @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, > uint16_t cmd, bool rom_only) > rc = rangeset_remove_range(mem, start, end); > if ( rc ) > { > +spin_unlock(&tmp->vpci_lock); > printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n", > start, end, rc); > rangeset_destroy(mem); > return rc; > } > } > +spin_unlock(&tmp->vpci_lock); > } At the first glance this simply looks like another unjustified (in the description) change, as you're not converting anything here but you actually add locking (and I realize this was there before, so I'm sorry for not pointing this out earlier). But then I wonder whether you actually tested this, since I can't help getting the impression that you're introducing a live-lock: The function is called from cmd_write() and rom_write(), which in turn are called out of vpci_write(). Yet that function already holds the lock, and the lock is not (currently) recursive. (For the 3rd caller of the function - init_bars() - otoh the locking looks to be entirely unnecessary.) Then again this was present already even in Roger's original patch, so I guess I must be missing something ... > --- a/xen/drivers/vpci/msix.c > +++ b/xen/drivers/vpci/msix.c > @@ -138,7 +138,7 @@ static void control_write(const struct pci_dev *pdev, > unsigned int reg, > pci_conf_write16(pdev->sbdf, reg, val); > } > > -static struct vpci_msix *msix_find(const struct domain *d, unsigned long > addr) > +static struct vpci_msix *msix_get(const struct domain *d, unsigned long addr) > { > struct vpci_msix *msix; > > @@ -150,15 +150,29 @@ static struct vpci_msix *msix_find(const struct domain > *d, unsigned long addr) > for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ ) > if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled && > VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) ) > +{ > +spin_lock(&msix->pdev->vpci_lock); > return msix; > +} I think deliberately returning with a lock held requires a respective comment ahead of the function. > } > > return NULL; > } > > +static void msix_put(struct vpci_msix *msix) > +{ > +if ( !msix ) > +return; > + > +spin_unlock(&msix->pdev->vpci_lock); > +} Maybe shorter if ( msix ) spin_unlock(&msix->pdev->vpci_lock); ? Yet there's only one case where you may pass NULL in here, so maybe it's better anyway to move the conditional ... > static int msix_accept(struct vcpu *v, unsigned long addr) > { > -return !!msix_find(v->domain, addr); > +struct vpci_msix *msix = msix_get(v->domain, addr); > + > +msix_put(msix); > +return !!msix; > } ... here? > @@ -186,7 +200,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, > unsigned int len, > unsigned long *data) > { > const struct domain *d = v->domain; > -struct vpci_msix *msix = msix_find(d, addr); > +struct vpci_msix *msix = msix_get(d, addr); > const struct vpci_msix_entry *entry; > unsigned int offset; > > @@ -196,7 +210,10 @@ static int msix_read(struct vcpu *v, unsigned long addr, > unsigned int len, > return X86EMUL_RETRY; > > if ( !access_allowed(msix->pdev, addr, len) ) > +{ > +msix_put(msix); > return X86EMUL_OKAY; > +} > > if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) ) > { > @@ -222,10 +239,10 @@ static int msix_read(struct vcpu *v, unsigned long > addr, unsigned int len, > break; > } > > +msix_put(msix); > return X86EMUL_OKAY; > } > > -spin_lock(&msix->pdev->vpci->lock); > entry = get_entry(msix, addr); > offset = addr & (PCI_MSIX_ENTRY_SIZE - 1); You're increasing the locked region quite a bit here. If this is really needed, it wants explaining. And if this is deemed acceptable as a "side effect", it wants justifying or at least stating imo. Same for msix_write() then, obviously. (I'm not sure Roger actually implied this when suggesting to switch to the get/put pair.) > @@ -327,7 +334,12 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, > unsigned int size) > if ( !pdev ) > return vpci_read_hw(sbdf, reg, size); > > -spin
[PATCH v6 13/13] xen/arm: account IO handlers for emulated PCI MSI-X
From: Oleksandr Andrushchenko At the moment, we always allocate an extra 16 slots for IO handlers (see MAX_IO_HANDLER). So while adding IO trap handlers for the emulated MSI-X registers we need to explicitly tell that we have additional IO handlers, so those are accounted. Signed-off-by: Oleksandr Andrushchenko --- Cc: Julien Grall Cc: Stefano Stabellini --- This actually moved here from the part 2 of the prep work for PCI passthrough on Arm as it seems to be the proper place for it. Since v5: - optimize with IS_ENABLED(CONFIG_HAS_PCI_MSI) since VPCI_MAX_VIRT_DEV is defined unconditionally New in v5 --- xen/arch/arm/vpci.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c index 84b2b068a0fe..c5902cb9d34d 100644 --- a/xen/arch/arm/vpci.c +++ b/xen/arch/arm/vpci.c @@ -131,6 +131,8 @@ static int vpci_get_num_handlers_cb(struct domain *d, unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d) { +unsigned int count; + if ( !has_vpci(d) ) return 0; @@ -151,7 +153,17 @@ unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d) * For guests each host bridge requires one region to cover the * configuration space. At the moment, we only expose a single host bridge. */ -return 1; +count = 1; + +/* + * There's a single MSI-X MMIO handler that deals with both PBA + * and MSI-X tables per each PCI device being passed through. + * Maximum number of emulated virtual devices is VPCI_MAX_VIRT_DEV. + */ +if ( IS_ENABLED(CONFIG_HAS_PCI_MSI) ) +count += VPCI_MAX_VIRT_DEV; + +return count; } /* -- 2.25.1
[PATCH v6 12/13] xen/arm: translate virtual PCI bus topology for guests
From: Oleksandr Andrushchenko There are three originators for the PCI configuration space access: 1. The domain that owns physical host bridge: MMIO handlers are there so we can update vPCI register handlers with the values written by the hardware domain, e.g. physical view of the registers vs guest's view on the configuration space. 2. Guest access to the passed through PCI devices: we need to properly map virtual bus topology to the physical one, e.g. pass the configuration space access to the corresponding physical devices. 3. Emulated host PCI bridge access. It doesn't exist in the physical topology, e.g. it can't be mapped to some physical host bridge. So, all access to the host bridge itself needs to be trapped and emulated. Signed-off-by: Oleksandr Andrushchenko --- Since v5: - add vpci_translate_virtual_device for #ifndef CONFIG_HAS_VPCI_GUEST_SUPPORT case to simplify ifdefery - add ASSERT(!is_hardware_domain(d)); to vpci_translate_virtual_device - reset output register on failed virtual SBDF translation Since v4: - indentation fixes - constify struct domain - updated commit message - updates to the new locking scheme (pdev->vpci_lock) Since v3: - revisit locking - move code to vpci.c Since v2: - pass struct domain instead of struct vcpu - constify arguments where possible - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT New in v2 --- xen/arch/arm/vpci.c | 17 + xen/drivers/vpci/vpci.c | 29 + xen/include/xen/vpci.h | 7 +++ 3 files changed, 53 insertions(+) diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c index a9fc5817f94e..84b2b068a0fe 100644 --- a/xen/arch/arm/vpci.c +++ b/xen/arch/arm/vpci.c @@ -41,6 +41,16 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info, /* data is needed to prevent a pointer cast on 32bit */ unsigned long data; +/* + * For the passed through devices we need to map their virtual SBDF + * to the physical PCI device being passed through. + */ +if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) ) +{ +*r = ~0ul; +return 1; +} + if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa), 1U << info->dabt.size, &data) ) { @@ -59,6 +69,13 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info, struct pci_host_bridge *bridge = p; pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa); +/* + * For the passed through devices we need to map their virtual SBDF + * to the physical PCI device being passed through. + */ +if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) ) +return 1; + return vpci_ecam_write(sbdf, ECAM_REG_OFFSET(info->gpa), 1U << info->dabt.size, r); } diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c index 7d422d11f83d..070db7391391 100644 --- a/xen/drivers/vpci/vpci.c +++ b/xen/drivers/vpci/vpci.c @@ -168,6 +168,35 @@ static void vpci_remove_virtual_device(struct domain *d, pdev->vpci->guest_sbdf.sbdf = ~0; } +/* + * Find the physical device which is mapped to the virtual device + * and translate virtual SBDF to the physical one. + */ +bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf) +{ +struct pci_dev *pdev; + +ASSERT(!is_hardware_domain(d)); + +for_each_pdev( d, pdev ) +{ +bool found; + +spin_lock(&pdev->vpci_lock); +found = pdev->vpci && (pdev->vpci->guest_sbdf.sbdf == sbdf->sbdf); +spin_unlock(&pdev->vpci_lock); + +if ( found ) +{ +/* Replace guest SBDF with the physical one. */ +*sbdf = pdev->sbdf; +return true; +} +} + +return false; +} + /* Notify vPCI that device is assigned to guest. */ int vpci_assign_device(struct domain *d, struct pci_dev *pdev) { diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h index 1f04d34a2369..f6eb9f2051af 100644 --- a/xen/include/xen/vpci.h +++ b/xen/include/xen/vpci.h @@ -271,6 +271,7 @@ static inline bool __must_check vpci_process_pending(struct vcpu *v) /* Notify vPCI that device is assigned/de-assigned to/from guest. */ int vpci_assign_device(struct domain *d, struct pci_dev *pdev); void vpci_deassign_device(struct domain *d, struct pci_dev *pdev); +bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf); #else static inline int vpci_assign_device(struct domain *d, struct pci_dev *pdev) { @@ -280,6 +281,12 @@ static inline int vpci_assign_device(struct domain *d, struct pci_dev *pdev) static inline void vpci_deassign_device(struct domain *d, struct pci_dev *pdev) { }; + +static inline bool vpci_translate_virtual_device(const struct domain *d, + pci_sbdf_t *sbdf) +{ +return false; +} #endif #endif -- 2.25.1
[PATCH v6 11/13] vpci: add initial support for virtual PCI bus topology
From: Oleksandr Andrushchenko Assign SBDF to the PCI devices being passed through with bus 0. The resulting topology is where PCIe devices reside on the bus 0 of the root complex itself (embedded endpoints). This implementation is limited to 32 devices which are allowed on a single PCI bus. Please note, that at the moment only function 0 of a multifunction device can be passed through. Signed-off-by: Oleksandr Andrushchenko --- Since v5: - s/vpci_add_virtual_device/add_virtual_device and make it static - call add_virtual_device from vpci_assign_device and do not use REGISTER_VPCI_INIT machinery - add pcidevs_locked ASSERT - use DECLARE_BITMAP for vpci_dev_assigned_map Since v4: - moved and re-worked guest sbdf initializers - s/set_bit/__set_bit - s/clear_bit/__clear_bit - minor comment fix s/Virtual/Guest/ - added VPCI_MAX_VIRT_DEV constant (PCI_SLOT(~0) + 1) which will be used later for counting the number of MMIO handlers required for a guest (Julien) Since v3: - make use of VPCI_INIT - moved all new code to vpci.c which belongs to it - changed open-coded 31 to PCI_SLOT(~0) - added comments and code to reject multifunction devices with functions other than 0 - updated comment about vpci_dev_next and made it unsigned int - implement roll back in case of error while assigning/deassigning devices - s/dom%pd/%pd Since v2: - remove casts that are (a) malformed and (b) unnecessary - add new line for better readability - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI functions are now completely gated with this config - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT New in v2 --- xen/drivers/vpci/vpci.c | 74 +++-- xen/include/xen/sched.h | 8 + xen/include/xen/vpci.h | 11 ++ 3 files changed, 91 insertions(+), 2 deletions(-) diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c index 3177f13c1c22..7d422d11f83d 100644 --- a/xen/drivers/vpci/vpci.c +++ b/xen/drivers/vpci/vpci.c @@ -89,6 +89,9 @@ int vpci_add_handlers(struct pci_dev *pdev) return -ENOMEM; INIT_LIST_HEAD(&vpci->handlers); +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT +vpci->guest_sbdf.sbdf = ~0; +#endif spin_lock(&pdev->vpci_lock); pdev->vpci = vpci; @@ -114,6 +117,57 @@ int vpci_add_handlers(struct pci_dev *pdev) } #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT +static int add_virtual_device(struct pci_dev *pdev) +{ +struct domain *d = pdev->domain; +pci_sbdf_t sbdf = { 0 }; +unsigned long new_dev_number; + +if ( is_hardware_domain(d) ) +return 0; + +ASSERT(pcidevs_locked()); + +/* + * Each PCI bus supports 32 devices/slots at max or up to 256 when + * there are multi-function ones which are not yet supported. + */ +if ( pdev->info.is_extfn ) +{ +gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n", + &pdev->sbdf); +return -EOPNOTSUPP; +} + +new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map, + VPCI_MAX_VIRT_DEV); +if ( new_dev_number >= VPCI_MAX_VIRT_DEV ) +return -ENOSPC; + +__set_bit(new_dev_number, &d->vpci_dev_assigned_map); + +/* + * Both segment and bus number are 0: + * - we emulate a single host bridge for the guest, e.g. segment 0 + * - with bus 0 the virtual devices are seen as embedded + *endpoints behind the root complex + * + * TODO: add support for multi-function devices. + */ +sbdf.devfn = PCI_DEVFN(new_dev_number, 0); +pdev->vpci->guest_sbdf = sbdf; + +return 0; + +} + +static void vpci_remove_virtual_device(struct domain *d, + const struct pci_dev *pdev) +{ +__clear_bit(pdev->vpci->guest_sbdf.dev, &d->vpci_dev_assigned_map); +pdev->vpci->guest_sbdf.sbdf = ~0; +} + /* Notify vPCI that device is assigned to guest. */ int vpci_assign_device(struct domain *d, struct pci_dev *pdev) { @@ -124,8 +178,16 @@ int vpci_assign_device(struct domain *d, struct pci_dev *pdev) rc = vpci_add_handlers(pdev); if ( rc ) -vpci_deassign_device(d, pdev); +goto fail; + +rc = add_virtual_device(pdev); +if ( rc ) +goto fail; +return 0; + + fail: +vpci_deassign_device(d, pdev); return rc; } @@ -135,7 +197,15 @@ void vpci_deassign_device(struct domain *d, struct pci_dev *pdev) if ( !has_vpci(d) ) return; -vpci_remove_device(pdev); +spin_lock(&pdev->vpci_lock); +if ( !pdev->vpci ) +goto done; + +vpci_remove_virtual_device(d, pdev); +vpci_remove_device_locked(pdev); + + done: +spin_unlock(&pdev->vpci_lock); } #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */ diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 37f78cc4c4c9..3c25e265eaa8 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -444,6 +444,14 @@ struct domain #i
[PATCH v6 10/13] vpci/header: reset the command register when adding devices
From: Oleksandr Andrushchenko Reset the command register when assigning a PCI device to a guest: according to the PCI spec the PCI_COMMAND register is typically all 0's after reset. Signed-off-by: Oleksandr Andrushchenko --- Since v5: - updated commit message Since v1: - do not write 0 to the command register, but respect host settings. --- xen/drivers/vpci/header.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c index 33d8c15ae6e8..407fa2fc4749 100644 --- a/xen/drivers/vpci/header.c +++ b/xen/drivers/vpci/header.c @@ -454,8 +454,7 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg, pci_conf_write16(pdev->sbdf, reg, cmd); } -static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg, -uint32_t cmd, void *data) +static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd) { /* TODO: Add proper emulation for all bits of the command register. */ @@ -467,7 +466,13 @@ static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg, } #endif -cmd_write(pdev, reg, cmd, data); +return cmd; +} + +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg, +uint32_t cmd, void *data) +{ +cmd_write(pdev, reg, emulate_cmd_reg(pdev, cmd), data); } static void bar_write(const struct pci_dev *pdev, unsigned int reg, @@ -676,6 +681,10 @@ static int init_bars(struct pci_dev *pdev) return -EOPNOTSUPP; } +/* Reset the command register for the guest. */ +if ( !is_hwdom ) +pci_conf_write16(pdev->sbdf, PCI_COMMAND, emulate_cmd_reg(pdev, 0)); + /* Setup a handler for the command register. */ rc = vpci_add_register(pdev->vpci, vpci_hw_read16, is_hwdom ? cmd_write : guest_cmd_write, -- 2.25.1
[PATCH v6 09/13] vpci/header: emulate PCI_COMMAND register for guests
From: Oleksandr Andrushchenko Add basic emulation support for guests. At the moment only emulate PCI_COMMAND_INTX_DISABLE bit, the rest is not emulated yet and left as TODO. Signed-off-by: Oleksandr Andrushchenko --- Since v5: - add additional check for MSI-X enabled while altering INTX bit - make sure INTx disabled while guests enable MSI/MSI-X Since v3: - gate more code on CONFIG_HAS_MSI - removed logic for the case when MSI/MSI-X not enabled --- xen/drivers/vpci/header.c | 21 +++-- xen/drivers/vpci/msi.c| 4 xen/drivers/vpci/msix.c | 4 3 files changed, 27 insertions(+), 2 deletions(-) diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c index 88ca1ad8211d..33d8c15ae6e8 100644 --- a/xen/drivers/vpci/header.c +++ b/xen/drivers/vpci/header.c @@ -454,6 +454,22 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg, pci_conf_write16(pdev->sbdf, reg, cmd); } +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg, +uint32_t cmd, void *data) +{ +/* TODO: Add proper emulation for all bits of the command register. */ + +#ifdef CONFIG_HAS_PCI_MSI +if ( pdev->vpci->msi->enabled || pdev->vpci->msix->enabled ) +{ +/* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */ +cmd |= PCI_COMMAND_INTX_DISABLE; +} +#endif + +cmd_write(pdev, reg, cmd, data); +} + static void bar_write(const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data) { @@ -661,8 +677,9 @@ static int init_bars(struct pci_dev *pdev) } /* Setup a handler for the command register. */ -rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND, - 2, header); +rc = vpci_add_register(pdev->vpci, vpci_hw_read16, + is_hwdom ? cmd_write : guest_cmd_write, + PCI_COMMAND, 2, header); if ( rc ) return rc; diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c index e3ce46869dad..90465dcb4831 100644 --- a/xen/drivers/vpci/msi.c +++ b/xen/drivers/vpci/msi.c @@ -70,6 +70,10 @@ static void control_write(const struct pci_dev *pdev, unsigned int reg, if ( vpci_msi_arch_enable(msi, pdev, vectors) ) return; + +/* Make sure guest doesn't enable INTx while enabling MSI. */ +if ( !is_hardware_domain(pdev->domain) ) +pci_intx(pdev, false); } else vpci_msi_arch_disable(msi, pdev); diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c index d1dbfc6e0ffd..4c0e1836b589 100644 --- a/xen/drivers/vpci/msix.c +++ b/xen/drivers/vpci/msix.c @@ -92,6 +92,10 @@ static void control_write(const struct pci_dev *pdev, unsigned int reg, for ( i = 0; i < msix->max_entries; i++ ) if ( !msix->entries[i].masked && msix->entries[i].updated ) update_entry(&msix->entries[i], pdev, i); + +/* Make sure guest doesn't enable INTx while enabling MSI-X. */ +if ( !is_hardware_domain(pdev->domain) ) +pci_intx(pdev, false); } else if ( !new_enabled && msix->enabled ) { -- 2.25.1
[PATCH v6 08/13] vpci/header: program p2m with guest BAR view
From: Oleksandr Andrushchenko Take into account guest's BAR view and program its p2m accordingly: gfn is guest's view of the BAR and mfn is the physical BAR value as set up by the PCI bus driver in the hardware domain. This way hardware domain sees physical BAR values and guest sees emulated ones. Signed-off-by: Oleksandr Andrushchenko --- Since v5: - remove debug print in map_range callback - remove "identity" from the debug print Since v4: - moved start_{gfn|mfn} calculation into map_range - pass vpci_bar in the map_data instead of start_{gfn|mfn} - s/guest_addr/guest_reg Since v3: - updated comment (Roger) - removed gfn_add(map->start_gfn, rc); which is wrong - use v->domain instead of v->vpci.pdev->domain - removed odd e.g. in comment - s/d%d/%pd in altered code - use gdprintk for map/unmap logs Since v2: - improve readability for data.start_gfn and restructure ?: construct Since v1: - s/MSI/MSI-X in comments --- xen/drivers/vpci/header.c | 24 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c index 0c94504b87d8..88ca1ad8211d 100644 --- a/xen/drivers/vpci/header.c +++ b/xen/drivers/vpci/header.c @@ -30,6 +30,7 @@ struct map_data { struct domain *d; +const struct vpci_bar *bar; bool map; }; @@ -41,8 +42,21 @@ static int map_range(unsigned long s, unsigned long e, void *data, for ( ; ; ) { +/* Start address of the BAR as seen by the guest. */ +gfn_t start_gfn = _gfn(PFN_DOWN(is_hardware_domain(map->d) +? map->bar->addr +: map->bar->guest_reg)); +/* Physical start address of the BAR. */ +mfn_t start_mfn = _mfn(PFN_DOWN(map->bar->addr)); unsigned long size = e - s + 1; +/* + * Ranges to be mapped don't always start at the BAR start address, as + * there can be holes or partially consumed ranges. Account for the + * offset of the current address from the BAR start. + */ +start_gfn = gfn_add(start_gfn, s - mfn_x(start_mfn)); + /* * ARM TODOs: * - On ARM whether the memory is prefetchable or not should be passed @@ -52,8 +66,8 @@ static int map_range(unsigned long s, unsigned long e, void *data, * - {un}map_mmio_regions doesn't support preemption. */ -rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s)) - : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s)); +rc = map->map ? map_mmio_regions(map->d, start_gfn, size, _mfn(s)) + : unmap_mmio_regions(map->d, start_gfn, size, _mfn(s)); if ( rc == 0 ) { *c += size; @@ -62,8 +76,8 @@ static int map_range(unsigned long s, unsigned long e, void *data, if ( rc < 0 ) { printk(XENLOG_G_WARNING - "Failed to identity %smap [%lx, %lx] for d%d: %d\n", - map->map ? "" : "un", s, e, map->d->domain_id, rc); + "Failed to %smap [%lx, %lx] for %pd: %d\n", + map->map ? "" : "un", s, e, map->d, rc); break; } ASSERT(rc < size); @@ -154,6 +168,7 @@ bool vpci_process_pending(struct vcpu *v) if ( rangeset_is_empty(bar->mem) ) continue; +data.bar = bar; rc = rangeset_consume_ranges(bar->mem, map_range, &data); if ( rc == -ERESTART ) @@ -206,6 +221,7 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev, if ( rangeset_is_empty(bar->mem) ) continue; +data.bar = bar; while ( (rc = rangeset_consume_ranges(bar->mem, map_range, &data)) == -ERESTART ) process_pending_softirqs(); -- 2.25.1
[PATCH v6 06/13] vpci/header: implement guest BAR register handlers
From: Oleksandr Andrushchenko Add relevant vpci register handlers when assigning PCI device to a domain and remove those when de-assigning. This allows having different handlers for different domains, e.g. hwdom and other guests. Emulate guest BAR register values: this allows creating a guest view of the registers and emulates size and properties probe as it is done during PCI device enumeration by the guest. All empty, IO and ROM BARs for guests are emulated by returning 0 on reads and ignoring writes: this BARs are special with this respect as their lower bits have special meaning, so returning default ~0 on read may confuse guest OS. Memory decoding is initially disabled when used by guests in order to prevent the BAR being placed on top of a RAM region. Signed-off-by: Oleksandr Andrushchenko --- Since v5: - make sure that the guest set address has the same page offset as the physical address on the host - remove guest_rom_{read|write} as those just implement the default behaviour of the registers not being handled - adjusted comment for struct vpci.addr field - add guest handlers for BARs which are not handled and will otherwise return ~0 on read and ignore writes. The BARs are special with this respect as their lower bits have special meaning, so returning ~0 doesn't seem to be right Since v4: - updated commit message - s/guest_addr/guest_reg Since v3: - squashed two patches: dynamic add/remove handlers and guest BAR handler implementation - fix guest BAR read of the high part of a 64bit BAR (Roger) - add error handling to vpci_assign_device - s/dom%pd/%pd - blank line before return Since v2: - remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code has been eliminated from being built on x86 Since v1: - constify struct pci_dev where possible - do not open code is_system_domain() - simplify some code3. simplify - use gdprintk + error code instead of gprintk - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT, so these do not get compiled for x86 - removed unneeded is_system_domain check - re-work guest read/write to be much simpler and do more work on write than read which is expected to be called more frequently - removed one too obvious comment --- xen/drivers/vpci/header.c | 131 +- xen/include/xen/vpci.h| 3 + 2 files changed, 118 insertions(+), 16 deletions(-) diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c index bd23c0274d48..2620a95ff35b 100644 --- a/xen/drivers/vpci/header.c +++ b/xen/drivers/vpci/header.c @@ -406,6 +406,81 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg, pci_conf_write32(pdev->sbdf, reg, val); } +static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg, +uint32_t val, void *data) +{ +struct vpci_bar *bar = data; +bool hi = false; +uint64_t guest_reg = bar->guest_reg; + +if ( bar->type == VPCI_BAR_MEM64_HI ) +{ +ASSERT(reg > PCI_BASE_ADDRESS_0); +bar--; +hi = true; +} +else +{ +val &= PCI_BASE_ADDRESS_MEM_MASK; +val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32 + : PCI_BASE_ADDRESS_MEM_TYPE_64; +val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0; +} + +guest_reg &= ~(0xull << (hi ? 32 : 0)); +guest_reg |= (uint64_t)val << (hi ? 32 : 0); + +guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK; + +/* + * Make sure that the guest set address has the same page offset + * as the physical address on the host or otherwise things won't work as + * expected. + */ +if ( (guest_reg & (~PAGE_MASK & PCI_BASE_ADDRESS_MEM_MASK)) != + (bar->addr & ~PAGE_MASK) ) +{ +gprintk(XENLOG_WARNING, +"%pp: ignored BAR %zu write with wrong page offset\n", +&pdev->sbdf, bar - pdev->vpci->header.bars + hi); +return; +} + +bar->guest_reg = guest_reg; +} + +static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg, + void *data) +{ +const struct vpci_bar *bar = data; +bool hi = false; + +if ( bar->type == VPCI_BAR_MEM64_HI ) +{ +ASSERT(reg > PCI_BASE_ADDRESS_0); +bar--; +hi = true; +} + +return bar->guest_reg >> (hi ? 32 : 0); +} + +static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev, + unsigned int reg, void *data) +{ +return 0; +} + +static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg, + struct vpci_bar *bar) +{ +if ( is_hardware_domain(pdev->domain) ) +return 0; + +return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL, + reg, 4, bar); +} + static void rom_write(const struct pci_dev *pd
[PATCH v6 04/13] vpci: restrict unhandled read/write operations for guests
From: Oleksandr Andrushchenko A guest can read and write those registers which are not emulated and have no respective vPCI handlers, so it can access the HW directly. In order to prevent a guest from reads and writes from/to the unhandled registers make sure only hardware domain can access HW directly and restrict guests from doing so. Suggested-by: Roger Pau Monné Signed-off-by: Oleksandr Andrushchenko --- New in v6 --- xen/drivers/vpci/vpci.c | 32 +--- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c index cb2ababa28e3..f8a93e61c08f 100644 --- a/xen/drivers/vpci/vpci.c +++ b/xen/drivers/vpci/vpci.c @@ -215,11 +215,15 @@ int vpci_remove_register(struct vpci *vpci, unsigned int offset, } /* Wrappers for performing reads/writes to the underlying hardware. */ -static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned int reg, +static uint32_t vpci_read_hw(bool is_hwdom, pci_sbdf_t sbdf, unsigned int reg, unsigned int size) { uint32_t data; +/* Guest domains are not allowed to read real hardware. */ +if ( !is_hwdom ) +return ~(uint32_t)0; + switch ( size ) { case 4: @@ -260,9 +264,13 @@ static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned int reg, return data; } -static void vpci_write_hw(pci_sbdf_t sbdf, unsigned int reg, unsigned int size, - uint32_t data) +static void vpci_write_hw(bool is_hwdom, pci_sbdf_t sbdf, unsigned int reg, + unsigned int size, uint32_t data) { +/* Guest domains are not allowed to write real hardware. */ +if ( !is_hwdom ) +return; + switch ( size ) { case 4: @@ -322,6 +330,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size) const struct vpci_register *r; unsigned int data_offset = 0; uint32_t data = ~(uint32_t)0; +bool is_hwdom = is_hardware_domain(d); if ( !size ) { @@ -332,13 +341,13 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size) /* Find the PCI dev matching the address. */ pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn); if ( !pdev ) -return vpci_read_hw(sbdf, reg, size); +return vpci_read_hw(is_hwdom, sbdf, reg, size); spin_lock(&pdev->vpci_lock); if ( !pdev->vpci ) { spin_unlock(&pdev->vpci_lock); -return vpci_read_hw(sbdf, reg, size); +return vpci_read_hw(is_hwdom, sbdf, reg, size); } /* Read from the hardware or the emulated register handlers. */ @@ -361,7 +370,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size) { /* Heading gap, read partial content from hardware. */ read_size = r->offset - emu.offset; -val = vpci_read_hw(sbdf, emu.offset, read_size); +val = vpci_read_hw(is_hwdom, sbdf, emu.offset, read_size); data = merge_result(data, val, read_size, data_offset); data_offset += read_size; } @@ -387,7 +396,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size) if ( data_offset < size ) { /* Tailing gap, read the remaining. */ -uint32_t tmp_data = vpci_read_hw(sbdf, reg + data_offset, +uint32_t tmp_data = vpci_read_hw(is_hwdom, sbdf, reg + data_offset, size - data_offset); data = merge_result(data, tmp_data, size - data_offset, data_offset); @@ -430,6 +439,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size, const struct vpci_register *r; unsigned int data_offset = 0; const unsigned long *ro_map = pci_get_ro_map(sbdf.seg); +bool is_hwdom = is_hardware_domain(d); if ( !size ) { @@ -448,7 +458,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size, pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn); if ( !pdev ) { -vpci_write_hw(sbdf, reg, size, data); +vpci_write_hw(is_hwdom, sbdf, reg, size, data); return; } @@ -456,7 +466,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size, if ( !pdev->vpci ) { spin_unlock(&pdev->vpci_lock); -vpci_write_hw(sbdf, reg, size, data); +vpci_write_hw(is_hwdom, sbdf, reg, size, data); return; } @@ -479,7 +489,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size, if ( emu.offset < r->offset ) { /* Heading gap, write partial content to hardware. */ -vpci_write_hw(sbdf, emu.offset, r->offset - emu.offset, +vpci_write_hw(is_hwdom, sbdf, emu.offset, r->offset - emu.offset, data >> (data_offset * 8)); data_offset += r->offset - emu.offset; } @@ -498
[PATCH v6 07/13] vpci/header: handle p2m range sets per BAR
From: Oleksandr Andrushchenko Instead of handling a single range set, that contains all the memory regions of all the BARs and ROM, have them per BAR. As the range sets are now created when a PCI device is added and destroyed when it is removed so make them named and accounted. Note that rangesets were chosen here despite there being only up to 3 separate ranges in each set (typically just 1). But rangeset per BAR was chosen for the ease of implementation and existing code re-usability. This is in preparation of making non-identity mappings in p2m for the MMIOs. Signed-off-by: Oleksandr Andrushchenko --- Since v5: - fix comments - move rangeset allocation to init_bars and only allocate for MAPPABLE BARs - check for overlap with the already setup BAR ranges Since v4: - use named range sets for BARs (Jan) - changes required by the new locking scheme - updated commit message (Jan) Since v3: - re-work vpci_cancel_pending accordingly to the per-BAR handling - s/num_mem_ranges/map_pending and s/uint8_t/bool - ASSERT(bar->mem) in modify_bars - create and destroy the rangesets on add/remove --- xen/drivers/vpci/header.c | 213 -- xen/drivers/vpci/vpci.c | 17 ++- xen/include/xen/vpci.h| 4 +- 3 files changed, 177 insertions(+), 57 deletions(-) diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c index 2620a95ff35b..0c94504b87d8 100644 --- a/xen/drivers/vpci/header.c +++ b/xen/drivers/vpci/header.c @@ -131,50 +131,85 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd, bool vpci_process_pending(struct vcpu *v) { -if ( v->vpci.mem ) +struct pci_dev *pdev = v->vpci.pdev; + +if ( !pdev ) +return false; + +spin_lock(&pdev->vpci_lock); +if ( v->vpci.map_pending ) { struct map_data data = { .d = v->domain, .map = v->vpci.cmd & PCI_COMMAND_MEMORY, }; -int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data); +struct vpci_header *header = &pdev->vpci->header; +unsigned int i; + +for ( i = 0; i < ARRAY_SIZE(header->bars); i++ ) +{ +struct vpci_bar *bar = &header->bars[i]; +int rc; + +if ( rangeset_is_empty(bar->mem) ) +continue; -if ( rc == -ERESTART ) -return true; +rc = rangeset_consume_ranges(bar->mem, map_range, &data); + +if ( rc == -ERESTART ) +{ +spin_unlock(&pdev->vpci_lock); +return true; +} -spin_lock(&v->vpci.pdev->vpci_lock); -if ( v->vpci.pdev->vpci ) /* Disable memory decoding unconditionally on failure. */ -modify_decoding(v->vpci.pdev, -rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd, -!rc && v->vpci.rom_only); -spin_unlock(&v->vpci.pdev->vpci_lock); +modify_decoding(pdev, rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : + v->vpci.cmd, !rc && v->vpci.rom_only); + +if ( rc ) +{ +/* + * FIXME: in case of failure remove the device from the domain. + * Note that there might still be leftover mappings. While this + * is safe for Dom0, for DomUs the domain needs to be killed in + * order to avoid leaking stale p2m mappings on failure. + */ +if ( is_hardware_domain(v->domain) ) +vpci_remove_device_locked(pdev); +else +domain_crash(v->domain); + +break; +} +} + +v->vpci.map_pending = false; -rangeset_destroy(v->vpci.mem); -v->vpci.mem = NULL; -if ( rc ) -/* - * FIXME: in case of failure remove the device from the domain. - * Note that there might still be leftover mappings. While this is - * safe for Dom0, for DomUs the domain will likely need to be - * killed in order to avoid leaking stale p2m mappings on - * failure. - */ -vpci_remove_device(v->vpci.pdev); } +spin_unlock(&pdev->vpci_lock); return false; } static int __init apply_map(struct domain *d, const struct pci_dev *pdev, -struct rangeset *mem, uint16_t cmd) +uint16_t cmd) { struct map_data data = { .d = d, .map = true }; -int rc; +struct vpci_header *header = &pdev->vpci->header; +int rc = 0; +unsigned int i; + +for ( i = 0; i < ARRAY_SIZE(header->bars); i++ ) +{ +struct vpci_bar *bar = &header->bars[i]; -while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART ) -process_pending_softirqs(); -rangeset_destroy(mem); +if ( rangeset_i
[PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole
From: Oleksandr Andrushchenko Add a stub for is_memory_hole which is required for PCI passthrough on Arm. Signed-off-by: Oleksandr Andrushchenko --- Cc: Julien Grall Cc: Stefano Stabellini --- New in v6 --- xen/arch/arm/mm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index b1eae767c27c..c32e34a182a2 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1640,6 +1640,12 @@ unsigned long get_upper_mfn_bound(void) return max_page - 1; } +bool is_memory_hole(mfn_t start, mfn_t end) +{ +/* TODO: this needs to be properly implemented. */ +return true; +} + /* * Local variables: * mode: C -- 2.25.1
[PATCH v6 05/13] vpci: add hooks for PCI device assign/de-assign
From: Oleksandr Andrushchenko When a PCI device gets assigned/de-assigned some work on vPCI side needs to be done for that device. Introduce a pair of hooks so vPCI can handle that. Signed-off-by: Oleksandr Andrushchenko --- Since v5: - do not split code into run_vpci_init - do not check for is_system_domain in vpci_{de}assign_device - do not use vpci_remove_device_handlers_locked and re-allocate pdev->vpci completely - make vpci_deassign_device void Since v4: - de-assign vPCI from the previous domain on device assignment - do not remove handlers in vpci_assign_device as those must not exist at that point Since v3: - remove toolstack roll-back description from the commit message as error are to be handled with proper cleanup in Xen itself - remove __must_check - remove redundant rc check while assigning devices - fix redundant CONFIG_HAS_VPCI check for CONFIG_HAS_VPCI_GUEST_SUPPORT - use REGISTER_VPCI_INIT machinery to run required steps on device init/assign: add run_vpci_init helper Since v2: - define CONFIG_HAS_VPCI_GUEST_SUPPORT so dead code is not compiled for x86 Since v1: - constify struct pci_dev where possible - do not open code is_system_domain() - extended the commit message --- xen/drivers/Kconfig | 4 xen/drivers/passthrough/pci.c | 6 ++ xen/drivers/vpci/vpci.c | 27 +++ xen/include/xen/vpci.h| 15 +++ 4 files changed, 52 insertions(+) diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig index db94393f47a6..780490cf8e39 100644 --- a/xen/drivers/Kconfig +++ b/xen/drivers/Kconfig @@ -15,4 +15,8 @@ source "drivers/video/Kconfig" config HAS_VPCI bool +config HAS_VPCI_GUEST_SUPPORT + bool + depends on HAS_VPCI + endmenu diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 50dec3bb73d0..88836aab6baf 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -943,6 +943,8 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus, if ( ret ) goto out; +vpci_deassign_device(d, pdev); + if ( pdev->domain == hardware_domain ) pdev->quarantine = false; @@ -1488,6 +1490,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) ASSERT(pdev && (pdev->domain == hardware_domain || pdev->domain == dom_io)); +vpci_deassign_device(pdev->domain, pdev); + rc = pdev_msix_assign(d, pdev); if ( rc ) goto done; @@ -1507,6 +1511,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) pci_to_dev(pdev), flag); } +rc = vpci_assign_device(d, pdev); + done: if ( rc ) printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c index f8a93e61c08f..4e774875fa04 100644 --- a/xen/drivers/vpci/vpci.c +++ b/xen/drivers/vpci/vpci.c @@ -99,6 +99,33 @@ int vpci_add_handlers(struct pci_dev *pdev) return rc; } + +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT +/* Notify vPCI that device is assigned to guest. */ +int vpci_assign_device(struct domain *d, struct pci_dev *pdev) +{ +int rc; + +if ( !has_vpci(d) ) +return 0; + +rc = vpci_add_handlers(pdev); +if ( rc ) +vpci_deassign_device(d, pdev); + +return rc; +} + +/* Notify vPCI that device is de-assigned from guest. */ +void vpci_deassign_device(struct domain *d, struct pci_dev *pdev) +{ +if ( !has_vpci(d) ) +return; + +vpci_remove_device(pdev); +} +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */ + #endif /* __XEN__ */ static int vpci_register_cmp(const struct vpci_register *r1, diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h index f2a7d82ce77b..246307e6f5d5 100644 --- a/xen/include/xen/vpci.h +++ b/xen/include/xen/vpci.h @@ -251,6 +251,21 @@ static inline bool __must_check vpci_process_pending(struct vcpu *v) } #endif +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT +/* Notify vPCI that device is assigned/de-assigned to/from guest. */ +int vpci_assign_device(struct domain *d, struct pci_dev *pdev); +void vpci_deassign_device(struct domain *d, struct pci_dev *pdev); +#else +static inline int vpci_assign_device(struct domain *d, struct pci_dev *pdev) +{ +return 0; +}; + +static inline void vpci_deassign_device(struct domain *d, struct pci_dev *pdev) +{ +}; +#endif + #endif /* -- 2.25.1
[PATCH v6 00/13] PCI devices passthrough on Arm, part 3
From: Oleksandr Andrushchenko Hi, all! 1. This patch series is focusing on vPCI and adds support for non-identity PCI BAR mappings which is required while passing through a PCI device to a guest. The highlights are: - Add relevant vpci register handlers when assigning PCI device to a domain and remove those when de-assigning. This allows having different handlers for different domains, e.g. hwdom and other guests. - Emulate guest BAR register values based on physical BAR values. This allows creating a guest view of the registers and emulates size and properties probe as it is done during PCI device enumeration by the guest. - Instead of handling a single range set, that contains all the memory regions of all the BARs and ROM, have them per BAR. - Take into account guest's BAR view and program its p2m accordingly: gfn is guest's view of the BAR and mfn is the physical BAR value as set up by the host bridge in the hardware domain. This way hardware doamin sees physical BAR values and guest sees emulated ones. 2. The series also adds support for virtual PCI bus topology for guests: - We emulate a single host bridge for the guest, so segment is always 0. - The implementation is limited to 32 devices which are allowed on a single PCI bus. - The virtual bus number is set to 0, so virtual devices are seen as embedded endpoints behind the root complex. 3. The series has complete re-work of the locking scheme used/absent before with the help of the work started by Roger [1]: [PATCH v6 03/13] vpci: move lock outside of struct vpci This way the lock can be used to check whether vpci is present, and removal can be performed while holding the lock, in order to make sure there are no accesses to the contents of the vpci struct. Previously removal could race with vpci_read for example, since the lock was dropped prior to freeing pdev->vpci. This also solves synchronization issues between all vPCI code entities which could run in parallel. 4. For unprivileged guests vpci_{read|write} has been re-worked to not passthrough accesses to the registers not explicitly handled by the corresponding vPCI handlers: without that passthrough to guests is completely unsafe as Xen allows them full access to the registers. During development this can be reverted for debugging purposes. 5. The series was also tested on: - x86 PVH Dom0 and doesn't break it. - x86 HVM with PCI passthrough to DomU and doesn't break it. - Arm Thank you, Oleksandr [1] https://lore.kernel.org/xen-devel/20180717094830.54806-2-roger@citrix.com/ Oleksandr Andrushchenko (12): xen/pci: arm: add stub for is_memory_hole rangeset: add RANGESETF_no_print flag vpci: restrict unhandled read/write operations for guests vpci: add hooks for PCI device assign/de-assign vpci/header: implement guest BAR register handlers vpci/header: handle p2m range sets per BAR vpci/header: program p2m with guest BAR view vpci/header: emulate PCI_COMMAND register for guests vpci/header: reset the command register when adding devices vpci: add initial support for virtual PCI bus topology xen/arm: translate virtual PCI bus topology for guests xen/arm: account IO handlers for emulated PCI MSI-X Roger Pau Monné (1): vpci: move lock outside of struct vpci tools/tests/vpci/emul.h | 5 +- tools/tests/vpci/main.c | 3 +- xen/arch/arm/mm.c | 6 + xen/arch/arm/vpci.c | 31 ++- xen/arch/x86/hvm/vmsi.c | 8 +- xen/common/rangeset.c | 5 +- xen/drivers/Kconfig | 4 + xen/drivers/passthrough/pci.c | 7 + xen/drivers/vpci/header.c | 407 +++--- xen/drivers/vpci/msi.c| 15 +- xen/drivers/vpci/msix.c | 43 +++- xen/drivers/vpci/vpci.c | 232 --- xen/include/xen/pci.h | 1 + xen/include/xen/rangeset.h| 5 +- xen/include/xen/sched.h | 8 + xen/include/xen/vpci.h| 43 +++- 16 files changed, 688 insertions(+), 135 deletions(-) -- 2.25.1
[PATCH v6 03/13] vpci: move lock outside of struct vpci
From: Roger Pau Monné This way the lock can be used (and in a few cases is used right away) to check whether vpci is present, and removal can be performed while holding the lock, in order to make sure there are no accesses to the contents of the vpci struct. Previously removal could race with vpci_read for example, since the lock was dropped prior to freeing pdev->vpci. Signed-off-by: Roger Pau Monné Signed-off-by: Oleksandr Andrushchenko --- Cc: Andrew Cooper Cc: Jan Beulich Cc: Julien Grall Cc: Stefano Stabellini --- New in v5 of this series: this is an updated version of the patch published at https://lore.kernel.org/xen-devel/20180717094830.54806-2-roger@citrix.com/ Changes since v5: - vpci_lock in test_pdev is already initialized to false by default - introduce msix_{get|put} to protect former msix_find's result - add comments to vpci_{add|remove}_registers about pdev->vpci_lock must be held. - do not split code into vpci_remove_device_handlers_locked yet - move INIT_LIST_HEAD outside the locked region (Jan) - stripped out locking optimizations for vpci_{read|write} into a dedicated patch Changes since v2: - fixed pdev->vpci = xzalloc(struct vpci); under spin_lock (Jan) Changes since v1: - Assert that vpci_lock is locked in vpci_remove_device_locked. - Remove double newline. - Shrink critical section in vpci_{read/write}. --- tools/tests/vpci/emul.h | 5 ++- tools/tests/vpci/main.c | 3 +- xen/arch/x86/hvm/vmsi.c | 8 ++--- xen/drivers/passthrough/pci.c | 1 + xen/drivers/vpci/header.c | 21 +++ xen/drivers/vpci/msi.c| 11 -- xen/drivers/vpci/msix.c | 39 - xen/drivers/vpci/vpci.c | 65 ++- xen/include/xen/pci.h | 1 + xen/include/xen/vpci.h| 3 +- 10 files changed, 106 insertions(+), 51 deletions(-) diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h index 2e1d3057c9d8..d018fb5eef21 100644 --- a/tools/tests/vpci/emul.h +++ b/tools/tests/vpci/emul.h @@ -44,6 +44,7 @@ struct domain { }; struct pci_dev { +bool vpci_lock; struct vpci *vpci; }; @@ -53,10 +54,8 @@ struct vcpu }; extern const struct vcpu *current; -extern const struct pci_dev test_pdev; +extern struct pci_dev test_pdev; -typedef bool spinlock_t; -#define spin_lock_init(l) (*(l) = false) #define spin_lock(l) (*(l) = true) #define spin_unlock(l) (*(l) = false) diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c index b9a0a6006bb9..3b86ed232eb1 100644 --- a/tools/tests/vpci/main.c +++ b/tools/tests/vpci/main.c @@ -23,7 +23,7 @@ static struct vpci vpci; const static struct domain d; -const struct pci_dev test_pdev = { +struct pci_dev test_pdev = { .vpci = &vpci, }; @@ -158,7 +158,6 @@ main(int argc, char **argv) int rc; INIT_LIST_HEAD(&vpci.handlers); -spin_lock_init(&vpci.lock); VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0); VPCI_READ_CHECK(0, 4, r0); diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c index 13e2a190b439..1f7a37f78264 100644 --- a/xen/arch/x86/hvm/vmsi.c +++ b/xen/arch/x86/hvm/vmsi.c @@ -910,14 +910,14 @@ int vpci_msix_arch_print(const struct vpci_msix *msix) { struct pci_dev *pdev = msix->pdev; -spin_unlock(&msix->pdev->vpci->lock); +spin_unlock(&msix->pdev->vpci_lock); process_pending_softirqs(); /* NB: we assume that pdev cannot go away for an alive domain. */ -if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) ) +if ( !spin_trylock(&pdev->vpci_lock) ) return -EBUSY; -if ( pdev->vpci->msix != msix ) +if ( !pdev->vpci || pdev->vpci->msix != msix ) { -spin_unlock(&pdev->vpci->lock); +spin_unlock(&pdev->vpci_lock); return -EAGAIN; } } diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index e8b09d77d880..50dec3bb73d0 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -397,6 +397,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn) *((u8*) &pdev->bus) = bus; *((u8*) &pdev->devfn) = devfn; pdev->domain = NULL; +spin_lock_init(&pdev->vpci_lock); arch_pci_init_pdev(pdev); diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c index 40ff79c33f8f..bd23c0274d48 100644 --- a/xen/drivers/vpci/header.c +++ b/xen/drivers/vpci/header.c @@ -142,12 +142,13 @@ bool vpci_process_pending(struct vcpu *v) if ( rc == -ERESTART ) return true; -spin_lock(&v->vpci.pdev->vpci->lock); -/* Disable memory decoding unconditionally on failure. */ -modify_decoding(v->vpci.pdev, -rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd, -!rc && v->vpci.rom_only); -
[PATCH v6 02/13] rangeset: add RANGESETF_no_print flag
From: Oleksandr Andrushchenko There are range sets which should not be printed, so introduce a flag which allows marking those as such. Implement relevant logic to skip such entries while printing. While at it also simplify the definition of the flags by directly defining those without helpers. Suggested-by: Jan Beulich Signed-off-by: Oleksandr Andrushchenko Reviewed-by: Jan Beulich --- Since v5: - comment indentation (Jan) Since v1: - update BUG_ON with new flag - simplify the definition of the flags --- xen/common/rangeset.c | 5 - xen/include/xen/rangeset.h | 5 +++-- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c index 885b6b15c229..ea27d651723b 100644 --- a/xen/common/rangeset.c +++ b/xen/common/rangeset.c @@ -433,7 +433,7 @@ struct rangeset *rangeset_new( INIT_LIST_HEAD(&r->range_list); r->nr_ranges = -1; -BUG_ON(flags & ~RANGESETF_prettyprint_hex); +BUG_ON(flags & ~(RANGESETF_prettyprint_hex | RANGESETF_no_print)); r->flags = flags; safe_strcpy(r->name, name ?: "(no name)"); @@ -575,6 +575,9 @@ void rangeset_domain_printk( list_for_each_entry ( r, &d->rangesets, rangeset_list ) { +if ( r->flags & RANGESETF_no_print ) +continue; + printk(""); rangeset_printk(r); printk("\n"); diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h index 135f33f6066f..f7c69394d66a 100644 --- a/xen/include/xen/rangeset.h +++ b/xen/include/xen/rangeset.h @@ -49,8 +49,9 @@ void rangeset_limit( /* Flags for passing to rangeset_new(). */ /* Pretty-print range limits in hexadecimal. */ -#define _RANGESETF_prettyprint_hex 0 -#define RANGESETF_prettyprint_hex (1U << _RANGESETF_prettyprint_hex) +#define RANGESETF_prettyprint_hex (1U << 0) + /* Do not print entries marked with this flag. */ +#define RANGESETF_no_print (1U << 1) bool_t __must_check rangeset_is_empty( const struct rangeset *r); -- 2.25.1
[xen-4.16-testing test] 167997: tolerable FAIL - PUSHED
flight 167997 xen-4.16-testing real [real] flight 168005 xen-4.16-testing real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/167997/ http://logs.test-lab.xenproject.org/osstest/logs/168005/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 168005-retest Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167894 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167894 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 167894 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167894 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167894 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167894 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 167894 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 167894 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167894 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 167894 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167894 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 167894 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-su
[xen-4.15-testing test] 167996: tolerable FAIL - PUSHED
flight 167996 xen-4.15-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/167996/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail blocked in 167965 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167965 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167965 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 167965 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167965 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167965 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167965 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 167965 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167965 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 167965 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167965 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 167965 test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass version targeted for testing: xen 3a9450fe5eb0fda8b7069f37d21ce2655bb59da6 baseline version: xen 32dcef0
[linux-linus test] 167993: regressions - FAIL
flight 167993 linux-linus real [real] flight 168002 linux-linus real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/167993/ http://logs.test-lab.xenproject.org/osstest/logs/168002/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 167988 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167988 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 167988 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167988 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167988 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167988 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167988 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 167988 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 167988 test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass version targeted for testing: linux88808fbbead481aedb46640a5ace69c58287f56a baseline version: linux27bb0b18c208ecd4c0deda6aad28616d73e4133d Last test of basis 167988 2022-02-02 18:11:17 Z1 days Testing same since 167993 2022-02-03 04:34:02 Z0 days1 attempts People who touched revisions under test: Chuck Lever Dai Ngo Dan Carpenter J. Bruce Fields Jan Kara Linus Torvalds jobs: build-amd64-xsm
[xen-unstable test] 167994: tolerable FAIL
flight 167994 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/167994/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat fail pass in 167990 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167990 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 167990 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167990 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167990 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 167990 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 167990 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 167990 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167990 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167990 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 167990 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167990 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167990 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-xl-pvshim14 guest-start fail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass version targe
[PATCH] tools/guest: Fix comment regarding CPUID compatibility
It was Xen 4.14 where CPUID data was added to the migration stream, and 4.13 that we need to worry about with regards to compatibility. Xen 4.12 isn't relevant. Expand and correct the commentary. Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore") Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Wei Liu CC: Anthony PERARD CC: Juergen Gross Roger, this also has a knock-on effect on your CPUID series. The 4.12 had been nagging me for a while before I figured out why. --- tools/libs/guest/xg_cpuid_x86.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c index b9e827ce7eb0..57f81eb0a082 100644 --- a/tools/libs/guest/xg_cpuid_x86.c +++ b/tools/libs/guest/xg_cpuid_x86.c @@ -497,9 +497,19 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, if ( restore ) { /* - * Account for feature which have been disabled by default since Xen 4.13, - * so migrated-in VM's don't risk seeing features disappearing. + * Xen 4.14 introduced support to move the guest's CPUID data in the + * migration stream. Previously, the destination side would invent a + * policy out of thin air in the hopes that it was ok. + * + * This restore path is used for incoming VMs with no CPUID data + * i.e. originated on Xen 4.13 or earlier. We must invent a policy + * compatible with what Xen 4.13 would have done on the same hardware. + * + * Specifically: + * - Clamp max leaves. + * - Re-enable features which have become (possibly) off by default. */ + p->basic.rdrand = test_bit(X86_FEATURE_RDRAND, host_featureset); p->feat.hle = test_bit(X86_FEATURE_HLE, host_featureset); p->feat.rtm = test_bit(X86_FEATURE_RTM, host_featureset); @@ -509,7 +519,6 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore, p->feat.mpx = test_bit(X86_FEATURE_MPX, host_featureset); } -/* Clamp maximum leaves to the ones supported on 4.12. */ p->basic.max_leaf = min(p->basic.max_leaf, 0xdu); p->feat.max_subleaf = 0; p->extd.max_leaf = min(p->extd.max_leaf, 0x801c); -- 2.11.0
[xen-unstable-smoke test] 167999: tolerable all pass - PUSHED
flight 167999 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/167999/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass version targeted for testing: xen 75cc460a1b8cfd8e5d2c4302234ee194defe4872 baseline version: xen b17e0ec72eded037297f34a233655aad23f64711 Last test of basis 167985 2022-02-02 10:00:30 Z1 days Testing same since 167999 2022-02-03 13:01:51 Z0 days1 attempts People who touched revisions under test: Oleksandr Andrushchenko Roger Pau Monné jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git b17e0ec72e..75cc460a1b 75cc460a1b8cfd8e5d2c4302234ee194defe4872 -> smoke
Re: [PATCH] xen: Modify domain_crash() to take a print string
On 03.02.2022 15:41, Andrew Cooper wrote: > On 03/02/2022 14:19, Jan Beulich wrote: >> On 03.02.2022 15:11, Andrew Cooper wrote: >>> On 03/02/2022 13:48, Julien Grall wrote: On 03/02/2022 13:38, Andrew Cooper wrote: > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h > index 37f78cc4c4c9..38b390d20371 100644 > --- a/xen/include/xen/sched.h > +++ b/xen/include/xen/sched.h > @@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v); > * from any processor. > */ > void __domain_crash(struct domain *d); > -#define domain_crash(d) do > { \ > - printk("domain_crash called from %s:%d\n", __FILE__, > __LINE__); \ > - > __domain_crash(d); \ > -} while (0) > +#define domain_crash(d, ...) \ > + do { \ > + if ( count_args(__VA_ARGS__) == 0 ) \ > + printk("domain_crash called from %s:%d\n", \ > + __FILE__, __LINE__); \ I find a bit odd that here you are using a normal printk >>> That's unmodified from before. Only reformatted. >>> but... > + else \ > + printk(XENLOG_G_ERR __VA_ARGS__); \ here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so, wouldn't it be better to only use XENLOG_ERR so they can always be seen? (A domain shouldn't be able to abuse it). >>> Perhaps. I suppose it is more important information than pretty much >>> anything else about the guest. >> Indeed, but then - is this really an error in all cases? > > Yes. It is always a fatal event for the VM. Which may or may not be Xen's fault. If the guest put itself in a bad state, I don't see why we shouldn't consider such just a warning. IOW I continue to think a log level, if so wanted, should be supplied by the user of the macro. Jan
Re: [PATCH] xen: Modify domain_crash() to take a print string
On 03/02/2022 14:19, Jan Beulich wrote: > On 03.02.2022 15:11, Andrew Cooper wrote: >> On 03/02/2022 13:48, Julien Grall wrote: >>> On 03/02/2022 13:38, Andrew Cooper wrote: diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 37f78cc4c4c9..38b390d20371 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v); * from any processor. */ void __domain_crash(struct domain *d); -#define domain_crash(d) do { \ - printk("domain_crash called from %s:%d\n", __FILE__, __LINE__); \ - __domain_crash(d); \ -} while (0) +#define domain_crash(d, ...) \ + do { \ + if ( count_args(__VA_ARGS__) == 0 ) \ + printk("domain_crash called from %s:%d\n", \ + __FILE__, __LINE__); \ >>> I find a bit odd that here you are using a normal printk >> That's unmodified from before. Only reformatted. >> >>> but... >>> >>> + else \ + printk(XENLOG_G_ERR __VA_ARGS__); \ >>> here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so, >>> wouldn't it be better to only use XENLOG_ERR so they can always be >>> seen? (A domain shouldn't be able to abuse it). >> Perhaps. I suppose it is more important information than pretty much >> anything else about the guest. > Indeed, but then - is this really an error in all cases? Yes. It is always a fatal event for the VM. > The prior > printk() simply ended up defaulting to a warning, and I would think > that's what the new one should be doing too. WARNING isn't exactly correct for a fatal event. Ideally, we want a non-ratelimited G_ERR, but that doesn't exist. If it appears in the future, we can use it. The set of people running with loglvl=error is almost certainly empty, so it doesn't matter. ~Andrew
[PATCH v2] tools/libxl: don't allow IOMMU usage with PoD
Prevent libxl from creating guests that attempts to use PoD together with an IOMMU, even if no devices are actually assigned. While the hypervisor could support using PoD together with an IOMMU as long as no devices are assigned, such usage seems doubtful. There's no guarantee the guest has PoD no longer be active, and thus a later assignment of a PCI device to such domain could fail. Preventing the usage of PoD together with an IOMMU at guest creation avoids having to add checks for active PoD entries in the device assignment paths. Signed-off-by: Roger Pau Monné --- Cc: Jan Beulich --- Changes since v1: - Reword commit message. --- tools/libs/light/libxl_create.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c index d7a40d7550..7499922088 100644 --- a/tools/libs/light/libxl_create.c +++ b/tools/libs/light/libxl_create.c @@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc, pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) && (d_config->b_info.target_memkb < d_config->b_info.max_memkb); -/* We cannot have PoD and PCI device assignment at the same time - * for HVM guest. It was reported that IOMMU cannot work with PoD - * enabled because it needs to populated entire page table for - * guest. To stay on the safe side, we disable PCI device - * assignment when PoD is enabled. +/* We don't support having PoD and an IOMMU at the same time for HVM + * guests. An active IOMMU cannot work with PoD because it needs a fully + * populated page-table. Prevent PoD usage if the domain has an IOMMU + * assigned, even if not active. */ if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV && -d_config->num_pcidevs && pod_enabled) { +d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED && +pod_enabled) { ret = ERROR_INVAL; -LOGD(ERROR, domid, - "PCI device assignment for HVM guest failed due to PoD enabled"); +LOGD(ERROR, domid, "IOMMU not supported together with PoD"); goto error_out; } -- 2.34.1
Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
On 03.02.22 16:05, Roger Pau Monné wrote: > On Thu, Feb 03, 2022 at 01:30:26PM +, Oleksandr Andrushchenko wrote: >> >> On 03.02.22 14:54, Jan Beulich wrote: >>> On 03.02.2022 13:45, Oleksandr Andrushchenko wrote: >> Also memory decoding needs to be initially disabled when used by >> guests, in order to prevent the BAR being placed on top of a RAM >> region. The guest physmap will be different from the host one, so it's >> possible for BARs to end up placed on top of RAM regions initially >> until the firmware or OS places them at a suitable address. > Agree, memory decoding must be disabled Isn't it already achieved by the toolstack resetting the PCI device while assigning it to a guest? >>> Iirc the tool stack would reset a device only after getting it back from >>> a DomU. When coming straight from Dom0 or DomIO, no reset would be >>> performed. Furthermore, (again iirc) there are cases where there's no >>> known (standard) way to reset a device. Assigning such to a guest when >>> it previously was owned by another one is risky (and hence needs an >>> admin knowing what they're doing), but may be acceptable in particular >>> when e.g. simply rebooting a guest. >>> >>> IOW - I don't think you can rely on the bit being in a particular state. >> So, you mean something like: >> >> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c >> index 7695158e6445..9ebd57472da8 100644 >> --- a/xen/drivers/vpci/header.c >> +++ b/xen/drivers/vpci/header.c >> @@ -808,6 +808,14 @@ static int init_bars(struct pci_dev *pdev) >> return rc; >> } >> >> + /* >> + * Memory decoding needs to be initially disabled when used by >> + * guests, in order to prevent the BAR being placed on top of a RAM >> + * region. >> + */ >> + if ( !is_hwdom ) >> + pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & >> ~PCI_COMMAND_MEMORY); > Memory decoding is already disabled here, so you just need to avoid > enabling it, for example: > > /* > * Memory decoding needs to be initially disabled when used by > * guests, in order to prevent the BARs being mapped at gfn 0 by > * default. > */ > if ( !is_hwdom ) > cmd &= ~PCI_COMMAND_MEMORY; > >> return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0; > This is important here because guest_reg won't be set (ie: will be set > to 0) so if for some reason memory decoding was enabled you would end > up with BARs mappings overlapping at gfn 0. Then the patch [1] will do what we need to be done for the guest I guess I am thinking to still have it in the series which will solve exactly the problem we are trying to solve > > Thanks, Roger. [1] https://patchwork.kernel.org/project/xen-devel/patch/20211125110251.2877218-11-andr2...@gmail.com/
Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
On 03.02.22 16:04, Jan Beulich wrote: > On 03.02.2022 14:30, Oleksandr Andrushchenko wrote: >> >> On 03.02.22 14:54, Jan Beulich wrote: >>> On 03.02.2022 13:45, Oleksandr Andrushchenko wrote: >> Also memory decoding needs to be initially disabled when used by >> guests, in order to prevent the BAR being placed on top of a RAM >> region. The guest physmap will be different from the host one, so it's >> possible for BARs to end up placed on top of RAM regions initially >> until the firmware or OS places them at a suitable address. > Agree, memory decoding must be disabled Isn't it already achieved by the toolstack resetting the PCI device while assigning it to a guest? >>> Iirc the tool stack would reset a device only after getting it back from >>> a DomU. When coming straight from Dom0 or DomIO, no reset would be >>> performed. Furthermore, (again iirc) there are cases where there's no >>> known (standard) way to reset a device. Assigning such to a guest when >>> it previously was owned by another one is risky (and hence needs an >>> admin knowing what they're doing), but may be acceptable in particular >>> when e.g. simply rebooting a guest. >>> >>> IOW - I don't think you can rely on the bit being in a particular state. >> So, you mean something like: > Perhaps, but then I think ... > >> --- a/xen/drivers/vpci/header.c >> +++ b/xen/drivers/vpci/header.c >> @@ -808,6 +808,14 @@ static int init_bars(struct pci_dev *pdev) >> return rc; >> } >> >> + /* >> + * Memory decoding needs to be initially disabled when used by >> + * guests, in order to prevent the BAR being placed on top of a RAM >> + * region. >> + */ >> + if ( !is_hwdom ) >> + pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & >> ~PCI_COMMAND_MEMORY); >> + >> return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0; > ... you also want to update cmd, thus avoiding the call to modify_bars(). > > And btw, from an abstract pov the same is true for I/O decoding: I > realize that you mean to leave I/O port BARs aside for the moment, but I > think the command register handling could very well take care of both. > > Which quickly gets us to the bus master enable bit: I think that one > should initially be off too. Making me wonder: Doesn't the PCI spec > define what the reset state of this register is? If so, that's what I > think we want to put in place for DomU-s. The spec I have says that all bits are typically 0 after reset. So, it seems to be reasonable to just write 0 to PCI_COMMAND > Jan > Thank you, Oleksandr
Re: [PATCH] xen: Modify domain_crash() to take a print string
On 03.02.2022 15:11, Andrew Cooper wrote: > On 03/02/2022 13:48, Julien Grall wrote: >> On 03/02/2022 13:38, Andrew Cooper wrote: >>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h >>> index 37f78cc4c4c9..38b390d20371 100644 >>> --- a/xen/include/xen/sched.h >>> +++ b/xen/include/xen/sched.h >>> @@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v); >>> * from any processor. >>> */ >>> void __domain_crash(struct domain *d); >>> -#define domain_crash(d) do >>> { \ >>> - printk("domain_crash called from %s:%d\n", __FILE__, >>> __LINE__); \ >>> - >>> __domain_crash(d); \ >>> -} while (0) >>> +#define domain_crash(d, ...) \ >>> + do { \ >>> + if ( count_args(__VA_ARGS__) == 0 ) \ >>> + printk("domain_crash called from %s:%d\n", \ >>> + __FILE__, __LINE__); \ >> >> I find a bit odd that here you are using a normal printk > > That's unmodified from before. Only reformatted. > >> but... >> >> >>> + else \ >>> + printk(XENLOG_G_ERR __VA_ARGS__); \ >> >> here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so, >> wouldn't it be better to only use XENLOG_ERR so they can always be >> seen? (A domain shouldn't be able to abuse it). > > Perhaps. I suppose it is more important information than pretty much > anything else about the guest. Indeed, but then - is this really an error in all cases? The prior printk() simply ended up defaulting to a warning, and I would think that's what the new one should be doing too. Or even leave the setting of the log level to the invocation sites of the macro. Jan
Re: [PATCH] xen: Modify domain_crash() to take a print string
Hi Andrew, On 03/02/2022 14:11, Andrew Cooper wrote: On 03/02/2022 13:48, Julien Grall wrote: Hi, On 03/02/2022 13:38, Andrew Cooper wrote: diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 37f78cc4c4c9..38b390d20371 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v); * from any processor. */ void __domain_crash(struct domain *d); -#define domain_crash(d) do { \ - printk("domain_crash called from %s:%d\n", __FILE__, __LINE__); \ - __domain_crash(d); \ -} while (0) +#define domain_crash(d, ...) \ + do { \ + if ( count_args(__VA_ARGS__) == 0 ) \ + printk("domain_crash called from %s:%d\n", \ + __FILE__, __LINE__); \ I find a bit odd that here you are using a normal printk That's unmodified from before. Only reformatted. but... + else \ + printk(XENLOG_G_ERR __VA_ARGS__); \ here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so, wouldn't it be better to only use XENLOG_ERR so they can always be seen? (A domain shouldn't be able to abuse it). Perhaps. I suppose it is more important information than pretty much anything else about the guest. I've changed locally, but won't repost just for this. Ok. With that: Reviewed-by: Julien Grall Cheers, -- Julien Grall
Xen 4.14.4 released
All, we're pleased to announce the release of Xen 4.14.4. This is available immediately from its git repository http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.14 (tag RELEASE-4.14.4) or from the XenProject download page https://xenproject.org/downloads/xen-project-archives/xen-project-4-14-series/xen-project-4-14-4/ (where a list of changes can also be found). We recommend all users of the 4.14 stable series to update to this last point release scheduled to be made by the Xen Project team from this branch. Regards, Jan
Xen 4.15.2 released
All, we're pleased to announce the release of Xen 4.15.2. This is available immediately from its git repository http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.15 (tag RELEASE-4.15.2) or from the XenProject download page https://xenproject.org/downloads/xen-project-archives/xen-project-4-15-series/xen-project-4-15-2/ (where a list of changes can also be found). We recommend all users of the 4.15 stable series to update to this latest point release. Regards, Jan
Re: [PATCH] xen: Modify domain_crash() to take a print string
On 03/02/2022 13:48, Julien Grall wrote: > Hi, > > On 03/02/2022 13:38, Andrew Cooper wrote: >> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h >> index 37f78cc4c4c9..38b390d20371 100644 >> --- a/xen/include/xen/sched.h >> +++ b/xen/include/xen/sched.h >> @@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v); >> * from any processor. >> */ >> void __domain_crash(struct domain *d); >> -#define domain_crash(d) do >> { \ >> - printk("domain_crash called from %s:%d\n", __FILE__, >> __LINE__); \ >> - >> __domain_crash(d); \ >> -} while (0) >> +#define domain_crash(d, ...) \ >> + do { \ >> + if ( count_args(__VA_ARGS__) == 0 ) \ >> + printk("domain_crash called from %s:%d\n", \ >> + __FILE__, __LINE__); \ > > I find a bit odd that here you are using a normal printk That's unmodified from before. Only reformatted. > but... > > >> + else \ >> + printk(XENLOG_G_ERR __VA_ARGS__); \ > > here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so, > wouldn't it be better to only use XENLOG_ERR so they can always be > seen? (A domain shouldn't be able to abuse it). Perhaps. I suppose it is more important information than pretty much anything else about the guest. I've changed locally, but won't repost just for this. ~Andrew
Re: [PATCH] tools/libxl: don't allow IOMMU usage with PoD
On 03.02.2022 14:50, Roger Pau Monné wrote: > On Thu, Feb 03, 2022 at 12:55:56PM +0100, Jan Beulich wrote: >> On 03.02.2022 12:06, Roger Pau Monne wrote: >>> Prevent libxl from creating guests that attempts to use PoD together >>> with an IOMMU, even if no devices are actually assigned. >>> >>> While the hypervisor could support using PoD together with an IOMMU as >>> long as no devices are assigned, such usage seems doubtful. There's no >>> guarantee the guest has ballooned down enough memory for PoD to no >>> longer be active, and thus a later assignment of a PCI device to such >>> domain could fail. >> >> That's not a precise description of the constraint: The guest ballooning >> down enough only means entries == cache, but for device assignment we >> need entries == 0 (and a guarantee that no new entries can appear, but I >> think this is already the case once a guest was launched). > > Would you be OK with: > > "While the hypervisor could support using PoD together with an IOMMU as > long as no devices are assigned, such usage seems doubtful. There's no > guarantee the guest has PoD no longer be active, and thus a later > assignment of a PCI device to such domain could fail." Yes, thanks, this sounds better (to me at least). Jan
Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
On Thu, Feb 03, 2022 at 01:30:26PM +, Oleksandr Andrushchenko wrote: > > > On 03.02.22 14:54, Jan Beulich wrote: > > On 03.02.2022 13:45, Oleksandr Andrushchenko wrote: > Also memory decoding needs to be initially disabled when used by > guests, in order to prevent the BAR being placed on top of a RAM > region. The guest physmap will be different from the host one, so it's > possible for BARs to end up placed on top of RAM regions initially > until the firmware or OS places them at a suitable address. > >>> Agree, memory decoding must be disabled > >> Isn't it already achieved by the toolstack resetting the PCI device > >> while assigning it to a guest? > > Iirc the tool stack would reset a device only after getting it back from > > a DomU. When coming straight from Dom0 or DomIO, no reset would be > > performed. Furthermore, (again iirc) there are cases where there's no > > known (standard) way to reset a device. Assigning such to a guest when > > it previously was owned by another one is risky (and hence needs an > > admin knowing what they're doing), but may be acceptable in particular > > when e.g. simply rebooting a guest. > > > > IOW - I don't think you can rely on the bit being in a particular state. > So, you mean something like: > > diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c > index 7695158e6445..9ebd57472da8 100644 > --- a/xen/drivers/vpci/header.c > +++ b/xen/drivers/vpci/header.c > @@ -808,6 +808,14 @@ static int init_bars(struct pci_dev *pdev) > return rc; > } > > + /* > + * Memory decoding needs to be initially disabled when used by > + * guests, in order to prevent the BAR being placed on top of a RAM > + * region. > + */ > + if ( !is_hwdom ) > + pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY); Memory decoding is already disabled here, so you just need to avoid enabling it, for example: /* * Memory decoding needs to be initially disabled when used by * guests, in order to prevent the BARs being mapped at gfn 0 by * default. */ if ( !is_hwdom ) cmd &= ~PCI_COMMAND_MEMORY; > return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0; This is important here because guest_reg won't be set (ie: will be set to 0) so if for some reason memory decoding was enabled you would end up with BARs mappings overlapping at gfn 0. Thanks, Roger.
Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
On 03.02.2022 14:30, Oleksandr Andrushchenko wrote: > > > On 03.02.22 14:54, Jan Beulich wrote: >> On 03.02.2022 13:45, Oleksandr Andrushchenko wrote: > Also memory decoding needs to be initially disabled when used by > guests, in order to prevent the BAR being placed on top of a RAM > region. The guest physmap will be different from the host one, so it's > possible for BARs to end up placed on top of RAM regions initially > until the firmware or OS places them at a suitable address. Agree, memory decoding must be disabled >>> Isn't it already achieved by the toolstack resetting the PCI device >>> while assigning it to a guest? >> Iirc the tool stack would reset a device only after getting it back from >> a DomU. When coming straight from Dom0 or DomIO, no reset would be >> performed. Furthermore, (again iirc) there are cases where there's no >> known (standard) way to reset a device. Assigning such to a guest when >> it previously was owned by another one is risky (and hence needs an >> admin knowing what they're doing), but may be acceptable in particular >> when e.g. simply rebooting a guest. >> >> IOW - I don't think you can rely on the bit being in a particular state. > So, you mean something like: Perhaps, but then I think ... > --- a/xen/drivers/vpci/header.c > +++ b/xen/drivers/vpci/header.c > @@ -808,6 +808,14 @@ static int init_bars(struct pci_dev *pdev) > return rc; > } > > + /* > + * Memory decoding needs to be initially disabled when used by > + * guests, in order to prevent the BAR being placed on top of a RAM > + * region. > + */ > + if ( !is_hwdom ) > + pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY); > + > return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0; ... you also want to update cmd, thus avoiding the call to modify_bars(). And btw, from an abstract pov the same is true for I/O decoding: I realize that you mean to leave I/O port BARs aside for the moment, but I think the command register handling could very well take care of both. Which quickly gets us to the bus master enable bit: I think that one should initially be off too. Making me wonder: Doesn't the PCI spec define what the reset state of this register is? If so, that's what I think we want to put in place for DomU-s. Jan
[PATCH v4] x86/altp2m: p2m_altp2m_propagate_change() should honor present page order
For higher order mappings the comparison against p2m->min_remapped_gfn needs to take the upper bound of the covered GFN range into account, not just the base GFN. Otherwise, i.e. when dropping a mapping overlapping the remapped range but the base GFN outside of that range, an altp2m may wrongly not get reset. Note that there's no need to call get_gfn_type_access() ahead of the check against the remapped range boundaries: None of its outputs are needed earlier, and p2m_reset_altp2m() doesn't require the lock to be held. In fact this avoids a latent lock order violation: With per-GFN locking p2m_reset_altp2m() not only doesn't require the GFN lock to be held, but holding such a lock would actually not be allowed, as the function acquires a P2M lock. Local variables are moved into the more narrow scope (one is deleted altogether) to help see their actual life ranges. Signed-off-by: Jan Beulich --- Note that this addresses only half of the problem: get_gfn_type_access() would also need invoking for all of the involved GFNs, not just the 1st one. --- v4: Restore mistakenly dropped mfn_eq(mfn, INVALID_MFN). v3: Don't pass the minimum of both orders to p2m_set_entry() (as was the case in the original code). Restore get_gfn_type_access() return value checking. --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -2534,9 +2534,6 @@ int p2m_altp2m_propagate_change(struct d p2m_type_t p2mt, p2m_access_t p2ma) { struct p2m_domain *p2m; -p2m_access_t a; -p2m_type_t t; -mfn_t m; unsigned int i; unsigned int reset_count = 0; unsigned int last_reset_idx = ~0; @@ -2549,15 +2546,17 @@ int p2m_altp2m_propagate_change(struct d for ( i = 0; i < MAX_ALTP2M; i++ ) { +p2m_type_t t; +p2m_access_t a; + if ( d->arch.altp2m_eptp[i] == mfn_x(INVALID_MFN) ) continue; p2m = d->arch.altp2m_p2m[i]; -m = get_gfn_type_access(p2m, gfn_x(gfn), &t, &a, 0, NULL); /* Check for a dropped page that may impact this altp2m */ if ( mfn_eq(mfn, INVALID_MFN) && - gfn_x(gfn) >= p2m->min_remapped_gfn && + gfn_x(gfn) + (1UL << page_order) > p2m->min_remapped_gfn && gfn_x(gfn) <= p2m->max_remapped_gfn ) { if ( !reset_count++ ) @@ -2568,8 +2566,6 @@ int p2m_altp2m_propagate_change(struct d else { /* At least 2 altp2m's impacted, so reset everything */ -__put_gfn(p2m, gfn_x(gfn)); - for ( i = 0; i < MAX_ALTP2M; i++ ) { if ( i == last_reset_idx || @@ -2583,16 +2579,19 @@ int p2m_altp2m_propagate_change(struct d break; } } -else if ( !mfn_eq(m, INVALID_MFN) ) +else if ( !mfn_eq(get_gfn_type_access(p2m, gfn_x(gfn), &t, &a, 0, + NULL), INVALID_MFN) ) { int rc = p2m_set_entry(p2m, gfn, mfn, page_order, p2mt, p2ma); /* Best effort: Don't bail on error. */ if ( !ret ) ret = rc; -} -__put_gfn(p2m, gfn_x(gfn)); +__put_gfn(p2m, gfn_x(gfn)); +} +else +__put_gfn(p2m, gfn_x(gfn)); } altp2m_list_unlock(d);
Re: [PATCH] tools/libxl: don't allow IOMMU usage with PoD
On Thu, Feb 03, 2022 at 12:55:56PM +0100, Jan Beulich wrote: > On 03.02.2022 12:06, Roger Pau Monne wrote: > > Prevent libxl from creating guests that attempts to use PoD together > > with an IOMMU, even if no devices are actually assigned. > > > > While the hypervisor could support using PoD together with an IOMMU as > > long as no devices are assigned, such usage seems doubtful. There's no > > guarantee the guest has ballooned down enough memory for PoD to no > > longer be active, and thus a later assignment of a PCI device to such > > domain could fail. > > That's not a precise description of the constraint: The guest ballooning > down enough only means entries == cache, but for device assignment we > need entries == 0 (and a guarantee that no new entries can appear, but I > think this is already the case once a guest was launched). Would you be OK with: "While the hypervisor could support using PoD together with an IOMMU as long as no devices are assigned, such usage seems doubtful. There's no guarantee the guest has PoD no longer be active, and thus a later assignment of a PCI device to such domain could fail." By "PoD no longer be active" meaning cache == entries == 0. Thanks, Roger.
Re: [PATCH] xen: Modify domain_crash() to take a print string
Hi, On 03/02/2022 13:38, Andrew Cooper wrote: diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 37f78cc4c4c9..38b390d20371 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v); * from any processor. */ void __domain_crash(struct domain *d); -#define domain_crash(d) do { \ -printk("domain_crash called from %s:%d\n", __FILE__, __LINE__); \ -__domain_crash(d);\ -} while (0) +#define domain_crash(d, ...)\ +do {\ +if ( count_args(__VA_ARGS__) == 0 ) \ +printk("domain_crash called from %s:%d\n", \ + __FILE__, __LINE__); \ I find a bit odd that here you are using a normal printk but... +else\ +printk(XENLOG_G_ERR __VA_ARGS__); \ here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so, wouldn't it be better to only use XENLOG_ERR so they can always be seen? (A domain shouldn't be able to abuse it). Cheers, -- Julien Grall
[PATCH] xen: Modify domain_crash() to take a print string
There are two problems with domain_crash(). First, that it is frequently not preceded by a printk() at all, or that it is only preceded by a dprintk(). Either way, critical diagnostic is omitted for an event which is fatal to the guest. Second, the embedded __LINE__ is an issue for livepatching, creating unwanted churn in the binary diff. This is the final __LINE__ remaining in livepatching-relevant contexts. The end goal is to have domain_crash() require a print string which gets fed to printk(), making it far less easy to omit relevant diagnostic information. However, modifying all callers at once is far too big and complicated, so use some macro magic to tolerate the old API (no print string) in the short term. Adjust two callers in load_segments() to demonstrate the new API. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Wei Liu CC: Stefano Stabellini CC: Julien Grall CC: Volodymyr Babchuk CC: Bertrand Marquis Supersedes my previous attempt to update every caller in one go. In due course I'll split that mammoth patch up into a series. --- xen/arch/x86/domain.c | 14 -- xen/include/xen/sched.h | 13 + 2 files changed, 13 insertions(+), 14 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index ef1812dc1402..45be5e1cd7c9 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1693,11 +1693,8 @@ static void load_segments(struct vcpu *n) put_guest(uregs->fs, esp - 5) | put_guest(uregs->es, esp - 6) | put_guest(uregs->ds, esp - 7) ) -{ -gprintk(XENLOG_ERR, -"error while creating compat failsafe callback frame\n"); -domain_crash(n->domain); -} +domain_crash(n->domain, + "Error creating compat failsafe callback frame\n"); if ( n->arch.pv.vgc_flags & VGCF_failsafe_disables_events ) vcpu_info(n, evtchn_upcall_mask) = 1; @@ -1732,11 +1729,8 @@ static void load_segments(struct vcpu *n) put_guest(uregs->ds, rsp - 9) | put_guest(regs->r11, rsp - 10) | put_guest(regs->rcx, rsp - 11) ) -{ -gprintk(XENLOG_ERR, -"error while creating failsafe callback frame\n"); -domain_crash(n->domain); -} +domain_crash(n->domain, + "Error creating failsafe callback frame\n"); if ( n->arch.pv.vgc_flags & VGCF_failsafe_disables_events ) vcpu_info(n, evtchn_upcall_mask) = 1; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 37f78cc4c4c9..38b390d20371 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v); * from any processor. */ void __domain_crash(struct domain *d); -#define domain_crash(d) do { \ -printk("domain_crash called from %s:%d\n", __FILE__, __LINE__); \ -__domain_crash(d);\ -} while (0) +#define domain_crash(d, ...)\ +do {\ +if ( count_args(__VA_ARGS__) == 0 ) \ +printk("domain_crash called from %s:%d\n", \ + __FILE__, __LINE__); \ +else\ +printk(XENLOG_G_ERR __VA_ARGS__); \ +__domain_crash(d); \ +} while ( 0 ) /* * Called from assembly code, with an optional address to help indicate why -- 2.11.0
Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
On 03.02.22 14:54, Jan Beulich wrote: > On 03.02.2022 13:45, Oleksandr Andrushchenko wrote: Also memory decoding needs to be initially disabled when used by guests, in order to prevent the BAR being placed on top of a RAM region. The guest physmap will be different from the host one, so it's possible for BARs to end up placed on top of RAM regions initially until the firmware or OS places them at a suitable address. >>> Agree, memory decoding must be disabled >> Isn't it already achieved by the toolstack resetting the PCI device >> while assigning it to a guest? > Iirc the tool stack would reset a device only after getting it back from > a DomU. When coming straight from Dom0 or DomIO, no reset would be > performed. Furthermore, (again iirc) there are cases where there's no > known (standard) way to reset a device. Assigning such to a guest when > it previously was owned by another one is risky (and hence needs an > admin knowing what they're doing), but may be acceptable in particular > when e.g. simply rebooting a guest. > > IOW - I don't think you can rely on the bit being in a particular state. So, you mean something like: diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c index 7695158e6445..9ebd57472da8 100644 --- a/xen/drivers/vpci/header.c +++ b/xen/drivers/vpci/header.c @@ -808,6 +808,14 @@ static int init_bars(struct pci_dev *pdev) return rc; } + /* + * Memory decoding needs to be initially disabled when used by + * guests, in order to prevent the BAR being placed on top of a RAM + * region. + */ + if ( !is_hwdom ) + pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY); + return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0; } REGISTER_VPCI_INIT(init_bars, VPCI_PRIORITY_MIDDLE); > Jan > Thank you, Oleksandr
Re: [PATCH v2 1/2] xen: add option to disable GNTTABOP_transfer
On 03.02.2022 14:14, Juergen Gross wrote: > The grant table operation GNTTABOP_transfer is meant to be used in > PV device backends, and it hasn't been used in Linux since the old > Xen-o-Linux days. > > Add a command line sub-option to the "gnttab" option for disabling the > GNTTABOP_transfer functionality. > > Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich
[libvirt test] 167992: regressions - FAIL
flight 167992 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/167992/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-armhf-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-amd64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-arm64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-pair 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-vhd 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-amd64-i386-libvirt-pair 1 build-check(1) blocked n/a test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt-raw 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-arm64-arm64-libvirt 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-qcow2 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-raw 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-raw 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-qcow2 1 build-check(1) blocked n/a version targeted for testing: libvirt 41e878859ac7eb4059f2e9b338e585e205fd575c baseline version: libvirt 2c846fa6bcc11929c9fb857a22430fb9945654ad Last test of basis 151777 2020-07-10 04:19:19 Z 573 days Failing since151818 2020-07-11 04:18:52 Z 572 days 554 attempts Testing same since 167992 2022-02-03 04:20:17 Z0 days1 attempts People who touched revisions under test: Adolfo Jayme Barrientos Aleksandr Alekseev Aleksei Zakharov Andika Triwidada Andrea Bolognani Ani Sinha Balázs Meskó Barrett Schonefeld Bastian Germann Bastien Orivel BiaoXiang Ye Bihong Yu Binfeng Wu Bjoern Walk Boris Fiuczynski Brad Laue Brian Turek Bruno Haible Chris Mayo Christian Borntraeger Christian Ehrhardt Christian Kirbach Christian Schoenebeck Christophe Fergeau Cole Robinson Collin Walling Cornelia Huck Cédric Bosdonnat Côme Borsoi Daniel Henrique Barboza Daniel Letai Daniel P. Berrange Daniel P. Berrangé Didik Supriadi dinglimin Divya Garg Dmitrii Shcherbakov Dmytro Linkin Eiichi Tsukata Emilio Herrera Eric Farman Erik Skultety Fabian Affolter Fabian Freyer Fabiano Fidêncio Fangge Jin Farhan Ali Fedora Weblate Translation Franck Ridel Gavi Teitz gongwei Guoyi Tu Göran Uddeborg Halil Pasic Han Han Hao Wang Hela Basa Helmut Grohne Hiroki Narukawa Hyman Huang(黄勇) Ian Wienand Ioanna Alifieraki Ivan Teterevkov Jakob Meng Jamie Strandboge Jamie Strandboge Jan Kuparinen jason lee Jean-Baptiste Holcroft Jia Zhou Jianan Gao Jim Fehlig Jin Yan Jinsheng Zhang Jiri Denemark Joachim Falk John Ferlan Jonathan Watt Jonathon Jongsma Julio Faracco Justin Gatzen Ján Tomko Kashyap Chamarthy Kevin Locke Koichi Murase Kristina Hanicova Laine Stump Laszlo Ersek Lee Yarwood Lei Yang Liao Pingfang Lin Ma Lin Ma Lin Ma Liu Yiding Lubomir Rintel Luke Yue Luyao Zhong Marc Hartmayer Marc-André Lureau Marek Marczykowski-Górecki Markus Schade Martin Kletzander Masayoshi Mizuma Matej Cepl Matt Coleman Matt Coleman Mauro Matteo Cascella Meina Li Michal Privoznik Michał Smyk Milo Casagrande Moshe Levi Muha Aliss Nathan Neal Gompa Nick Chevsky Nick Shyrokovskiy Nickys Music Group Nico Pache Nicolas Lécureuil Nicolas Lécureuil Nikolay Shirokovskiy Olaf Hering Olesya Gerasimenko Or Ozeri Orion Poplawski Pany Patrick Magauran Paulo de Rezende Pinatti Pavel Hrdina Peng Liang Peter Krempa Pino Toscano Pino Toscano Piotr Drąg Prathamesh Chavan Praveen K Paladugu Richard W.M. Jones Ricky Tigg Robin Lee Rohit Kumar Roman Bogorodskiy Roman Bolshakov Ryan Gahagan Ryan Schmidt Sam Hartman Scott Shambarger Sebastian Mitterle SeongHyun Jo Shalini Chellathurai Saroja Shaojun Yang shenjiat
[PATCH v2 2/2] xen/include/public: deprecate GNTTABOP_transfer
Add a comment to include/public/grant_table.h that GNTTABOP_transfer is deprecated, in order to discourage new use cases. Signed-off-by: Juergen Gross --- v2: - new patch --- xen/include/public/grant_table.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xen/include/public/grant_table.h b/xen/include/public/grant_table.h index 7934d7b718..7fbd1c6d10 100644 --- a/xen/include/public/grant_table.h +++ b/xen/include/public/grant_table.h @@ -417,6 +417,8 @@ DEFINE_XEN_GUEST_HANDLE(gnttab_dump_table_t); * GNTTABOP_transfer: Transfer to a foreign domain. The foreign domain * has previously registered its interest in the transfer via . * + * This operation is deprecated! Please don't add new use cases! + * * Note that, even if the transfer fails, the specified page no longer belongs * to the calling domain *unless* the error is GNTST_bad_page. * -- 2.34.1
[PATCH v2 1/2] xen: add option to disable GNTTABOP_transfer
The grant table operation GNTTABOP_transfer is meant to be used in PV device backends, and it hasn't been used in Linux since the old Xen-o-Linux days. Add a command line sub-option to the "gnttab" option for disabling the GNTTABOP_transfer functionality. Signed-off-by: Juergen Gross --- V2: - make option available for CONFIG_PV only (Jan Beulich) - return -EOPNOTSUPP instead of -ENOSYS (Jan Beulich) --- docs/misc/xen-command-line.pandoc | 8 ++-- xen/common/grant_table.c | 12 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc index 6b3da6ddc1..44232b94c5 100644 --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -1167,9 +1167,9 @@ does not provide `VM_ENTRY_LOAD_GUEST_PAT`. Specify which console gdbstub should use. See **console**. ### gnttab -> `= List of [ max-ver:, transitive= ]` +> `= List of [ max-ver:, transitive=, transfer= ]` -> Default: `gnttab=max-ver:2,transitive` +> Default: `gnttab=max-ver:2,transitive,transfer` Control various aspects of the grant table behaviour available to guests. @@ -1178,6 +1178,10 @@ version are 1 and 2. * `transitive` Permit or disallow the use of transitive grants. Note that the use of grant table v2 without transitive grants is an ABI breakage from the guests point of view. +* `transfer` Permit or disallow the GNTTABOP_transfer operation of the +grant table hypercall. Note that disallowing GNTTABOP_transfer is an ABI +breakage from the guests point of view. This option is only available on +hypervisors configured to support PV guests. The usage of gnttab v2 is not security supported on ARM platforms. diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c index ed1e2fabce..57dfc54994 100644 --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -181,6 +181,11 @@ static int parse_gnttab_max_maptrack_frames(const char *arg) unsigned int __read_mostly opt_gnttab_max_version = GNTTAB_MAX_VERSION; static bool __read_mostly opt_transitive_grants = true; +#ifdef CONFIG_PV +static bool __ro_after_init opt_grant_transfer = true; +#else +#define opt_grant_transfer false +#endif static int __init parse_gnttab(const char *s) { @@ -204,6 +209,10 @@ static int __init parse_gnttab(const char *s) } else if ( (val = parse_boolean("transitive", s, ss)) >= 0 ) opt_transitive_grants = val; +#ifndef opt_grant_transfer +else if ( (val = parse_boolean("transfer", s, ss)) >= 0 ) +opt_grant_transfer = val; +#endif else rc = -EINVAL; @@ -2233,6 +2242,9 @@ gnttab_transfer( unsigned int max_bitsize; struct active_grant_entry *act; +if ( !opt_grant_transfer ) +return -EOPNOTSUPP; + for ( i = 0; i < count; i++ ) { bool_t okay; -- 2.34.1
Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
On 03.02.2022 13:45, Oleksandr Andrushchenko wrote: >>> Also memory decoding needs to be initially disabled when used by >>> guests, in order to prevent the BAR being placed on top of a RAM >>> region. The guest physmap will be different from the host one, so it's >>> possible for BARs to end up placed on top of RAM regions initially >>> until the firmware or OS places them at a suitable address. >> Agree, memory decoding must be disabled > Isn't it already achieved by the toolstack resetting the PCI device > while assigning it to a guest? Iirc the tool stack would reset a device only after getting it back from a DomU. When coming straight from Dom0 or DomIO, no reset would be performed. Furthermore, (again iirc) there are cases where there's no known (standard) way to reset a device. Assigning such to a guest when it previously was owned by another one is risky (and hence needs an admin knowing what they're doing), but may be acceptable in particular when e.g. simply rebooting a guest. IOW - I don't think you can rely on the bit being in a particular state. Jan
[PATCH v2] docs: document patch rules
Add a document to describe the rules for sending a proper patch. As it contains all the information already being present in docs/process/tags.pandoc remove that file. The "Reviewed-by:" and "Acked-by:" tags are expanded to allow an optional restriction of the tag. A new tag "Origin:" is added to tag patches taken from another project. Signed-off-by: Juergen Gross --- v2: - expanded commit message (Roger Pau Monné) - some rewordings (Roger Pau Monné, Jan Beulich) - add "Requested-by:" description (Jan Beulich) - rename "Taken-from:" to "Origin:" (Jan Beulich) - add reviewers as recipients of patch (Jan Beulich) - style fixes (Roger Pau Monné, Jan Beulich) Signed-off-by: Juergen Gross --- docs/process/sending-patches.pandoc | 298 docs/process/tags.pandoc| 55 - 2 files changed, 298 insertions(+), 55 deletions(-) create mode 100644 docs/process/sending-patches.pandoc delete mode 100644 docs/process/tags.pandoc diff --git a/docs/process/sending-patches.pandoc b/docs/process/sending-patches.pandoc new file mode 100644 index 00..2091037901 --- /dev/null +++ b/docs/process/sending-patches.pandoc @@ -0,0 +1,298 @@ +# How a proper patch should look like + +This is a brief description how a proper patch for the Xen project should +look like. Examples and tooling tips are not part of this document, those +can be found in the +[Xen Wiki](https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches). + +## The patch subject + +The first line at the top of the patch should contain a short description of +what the patch does, and hints as to what code it touches. This line is used +as the **Subject** line of the mail when sending the patch. + +The hint which code is touched is usually in form of an abstract entity +(like e.g. `build` for the build system), or a component (like `tools` or +`iommu`). Further specification is possible via adding a sub-component with +a slash (e.g. `tools/xenstore`): + +: + +E.g.: + +xen/arm: increase memory banks number define value +tools/libxenevtchn: deduplicate xenevtchn_fd() +MAINTAINERS: update my email address +build: correct usage comments in Kbuild.include + +The description should give a rough hint *what* is done in the patch. + +The subject line should in general not exceed 80 characters. It must be +followed by a blank line. + +## The commit message + +The commit message is free text describing *why* the patch is done and +*how* the goal of the patch is achieved. A good commit message will describe +the current situation, the desired goal, and the way this goal is being +achieved. Parts of that can be omitted in obvious cases. + +In case additional changes are done in the patch (like e.g. cleanups), those +should be mentioned. + +When referencing other patches (e.g. `similar to patch xy ...`) those +patches should be referenced via their commit id (at least 12 digits) +and the patch subject, if the very same patch isn't referenced by the +`Fixes:` tag, too: + +Similar to commit 67d01cdb5518 ("x86: infrastructure to allow converting +certain indirect calls to direct ones") add ... + +The following ``git config`` settings can be used to add a pretty format for +outputting the above style in the ``git log`` or ``git show`` commands: + +[core] +abbrev = 12 +[pretty] +fixes = Fixes: %h (\"%s\") + +Lines in the commit message should not exceed 75 characters, except when +copying error output directly into the commit message. + +## Tags + +Tags are entries in the form + +Tag: something + +In general tags are added in chronological order. So a `Reviewed-by:` tag +should be added **after** the `Signed-off-by:` tag, as the review happened +after the patch was written. + +Do not split a tag across multiple lines, tags are exempt from the +"wrap at 75 columns" rule in order to simplify parsing scripts. + +### Origin: + +Xen has inherited some source files from other open source projects. In case +a patch modifying such an inherited file is taken from that project (maybe in +modified form), the `Origin:` tag specifies the source of the patch: + +Origin: + +E.g.: + +Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f093b08c47b3 + +All tags **above** the `Origin:` tag are from the original patch (which +should all be kept), while tags **after** `Origin:` are related to the +normal Xen patch process as described here. + +### Fixes: + +If your patch fixes a bug in a specific commit, e.g. you found an issue using +``git bisect``, please use the `Fixes:` tag with the first 12 characters of +the commit id, and the one line summary. + +Fixes: ("") + +E.g.: + +Fixes: 67d01cdb5518 ("x86: infrastructure to allow converting certain indirect calls to direct ones") + +### Backport: + +A backport tag is an optional tag in the commit message to request a +given commit to be backported to the released trees: + +
Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
On 03.02.22 14:50, Jan Beulich wrote: > On 03.02.2022 13:48, Oleksandr Andrushchenko wrote: >> Hi, Jan! >> >> On 03.02.22 14:44, Jan Beulich wrote: >>> On 03.02.2022 13:36, Oleksandr Andrushchenko wrote: Hi, Bertrand! On 26.11.21 14:19, Oleksandr Andrushchenko wrote: > Hi, Bertrand! > > On 25.11.21 18:28, Bertrand Marquis wrote: >> Hi Oleksandr, >> >>> On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko >>> wrote: >>> >>> From: Oleksandr Andrushchenko >>> >>> Add relevant vpci register handlers when assigning PCI device to a >>> domain >>> and remove those when de-assigning. This allows having different >>> handlers for different domains, e.g. hwdom and other guests. >>> >>> Emulate guest BAR register values: this allows creating a guest view >>> of the registers and emulates size and properties probe as it is done >>> during PCI device enumeration by the guest. >>> >>> ROM BAR is only handled for the hardware domain and for guest domains >>> there is a stub: at the moment PCI expansion ROM handling is supported >>> for x86 only and it might not be used by other architectures without >>> emulating x86. Other use-cases may include using that expansion ROM >>> before >>> Xen boots, hence no emulation is needed in Xen itself. Or when a guest >>> wants to use the ROM code which seems to be rare. >> In the generic code, bars for ioports are actually skipped (check code >> before >> in header.c, in case of ioports there is a continue) and no handler is >> registered for them. >> The consequence will be that a guest will access hardware when reading >> those BARs. > Yes, this seems to be a valid point So, with the approach we have developed these days we will ignore all writes and return ~0 for reads for all unhandled ops, e.g. those which do not have explicit register handlers employed. Thus, this case will fall into unhandled clause. >>> Except that I guess BARs are special in that reads may not return ~0, >>> or else the low bits carry a meaning we don't want to convey. Unused >>> BARs need to be hard-wired to 0, I think. >> So, you mean we should have 2 sets of BAR handlers for guests: >> 1. normal emulation (these are implemented in this patch) >> 2. all other BARs: read 0/ignore write for all other BARs, including ROM, IO >> etc. >> >> Is this what you mean? > I think that's what we're going to need, yes. Ok, then I'll stuff that into this patch v6 > Jan > Thank you, Oleksandr
Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
On 03.02.2022 13:48, Oleksandr Andrushchenko wrote: > Hi, Jan! > > On 03.02.22 14:44, Jan Beulich wrote: >> On 03.02.2022 13:36, Oleksandr Andrushchenko wrote: >>> Hi, Bertrand! >>> >>> On 26.11.21 14:19, Oleksandr Andrushchenko wrote: Hi, Bertrand! On 25.11.21 18:28, Bertrand Marquis wrote: > Hi Oleksandr, > >> On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko >> wrote: >> >> From: Oleksandr Andrushchenko >> >> Add relevant vpci register handlers when assigning PCI device to a domain >> and remove those when de-assigning. This allows having different >> handlers for different domains, e.g. hwdom and other guests. >> >> Emulate guest BAR register values: this allows creating a guest view >> of the registers and emulates size and properties probe as it is done >> during PCI device enumeration by the guest. >> >> ROM BAR is only handled for the hardware domain and for guest domains >> there is a stub: at the moment PCI expansion ROM handling is supported >> for x86 only and it might not be used by other architectures without >> emulating x86. Other use-cases may include using that expansion ROM >> before >> Xen boots, hence no emulation is needed in Xen itself. Or when a guest >> wants to use the ROM code which seems to be rare. > In the generic code, bars for ioports are actually skipped (check code > before > in header.c, in case of ioports there is a continue) and no handler is > registered for them. > The consequence will be that a guest will access hardware when reading > those BARs. Yes, this seems to be a valid point >>> So, with the approach we have developed these days we will ignore all writes >>> and return ~0 for reads for all unhandled ops, e.g. those which do not have >>> explicit >>> register handlers employed. Thus, this case will fall into unhandled clause. >> Except that I guess BARs are special in that reads may not return ~0, >> or else the low bits carry a meaning we don't want to convey. Unused >> BARs need to be hard-wired to 0, I think. > So, you mean we should have 2 sets of BAR handlers for guests: > 1. normal emulation (these are implemented in this patch) > 2. all other BARs: read 0/ignore write for all other BARs, including ROM, IO > etc. > > Is this what you mean? I think that's what we're going to need, yes. Jan
Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
Hi, Jan! On 03.02.22 14:44, Jan Beulich wrote: > On 03.02.2022 13:36, Oleksandr Andrushchenko wrote: >> Hi, Bertrand! >> >> On 26.11.21 14:19, Oleksandr Andrushchenko wrote: >>> Hi, Bertrand! >>> >>> On 25.11.21 18:28, Bertrand Marquis wrote: Hi Oleksandr, > On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko > wrote: > > From: Oleksandr Andrushchenko > > Add relevant vpci register handlers when assigning PCI device to a domain > and remove those when de-assigning. This allows having different > handlers for different domains, e.g. hwdom and other guests. > > Emulate guest BAR register values: this allows creating a guest view > of the registers and emulates size and properties probe as it is done > during PCI device enumeration by the guest. > > ROM BAR is only handled for the hardware domain and for guest domains > there is a stub: at the moment PCI expansion ROM handling is supported > for x86 only and it might not be used by other architectures without > emulating x86. Other use-cases may include using that expansion ROM before > Xen boots, hence no emulation is needed in Xen itself. Or when a guest > wants to use the ROM code which seems to be rare. In the generic code, bars for ioports are actually skipped (check code before in header.c, in case of ioports there is a continue) and no handler is registered for them. The consequence will be that a guest will access hardware when reading those BARs. >>> Yes, this seems to be a valid point >> So, with the approach we have developed these days we will ignore all writes >> and return ~0 for reads for all unhandled ops, e.g. those which do not have >> explicit >> register handlers employed. Thus, this case will fall into unhandled clause. > Except that I guess BARs are special in that reads may not return ~0, > or else the low bits carry a meaning we don't want to convey. Unused > BARs need to be hard-wired to 0, I think. So, you mean we should have 2 sets of BAR handlers for guests: 1. normal emulation (these are implemented in this patch) 2. all other BARs: read 0/ignore write for all other BARs, including ROM, IO etc. Is this what you mean? > Jan > Thank you, Oleksandr
Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
Hi, Roger! >> Also memory decoding needs to be initially disabled when used by >> guests, in order to prevent the BAR being placed on top of a RAM >> region. The guest physmap will be different from the host one, so it's >> possible for BARs to end up placed on top of RAM regions initially >> until the firmware or OS places them at a suitable address. > Agree, memory decoding must be disabled Isn't it already achieved by the toolstack resetting the PCI device while assigning it to a guest? Thank you, Oleksandr
Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
On 03.02.2022 13:36, Oleksandr Andrushchenko wrote: > Hi, Bertrand! > > On 26.11.21 14:19, Oleksandr Andrushchenko wrote: >> Hi, Bertrand! >> >> On 25.11.21 18:28, Bertrand Marquis wrote: >>> Hi Oleksandr, >>> On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko wrote: From: Oleksandr Andrushchenko Add relevant vpci register handlers when assigning PCI device to a domain and remove those when de-assigning. This allows having different handlers for different domains, e.g. hwdom and other guests. Emulate guest BAR register values: this allows creating a guest view of the registers and emulates size and properties probe as it is done during PCI device enumeration by the guest. ROM BAR is only handled for the hardware domain and for guest domains there is a stub: at the moment PCI expansion ROM handling is supported for x86 only and it might not be used by other architectures without emulating x86. Other use-cases may include using that expansion ROM before Xen boots, hence no emulation is needed in Xen itself. Or when a guest wants to use the ROM code which seems to be rare. >>> In the generic code, bars for ioports are actually skipped (check code >>> before >>> in header.c, in case of ioports there is a continue) and no handler is >>> registered for them. >>> The consequence will be that a guest will access hardware when reading >>> those BARs. >> Yes, this seems to be a valid point > So, with the approach we have developed these days we will ignore all writes > and return ~0 for reads for all unhandled ops, e.g. those which do not have > explicit > register handlers employed. Thus, this case will fall into unhandled clause. Except that I guess BARs are special in that reads may not return ~0, or else the low bits carry a meaning we don't want to convey. Unused BARs need to be hard-wired to 0, I think. Jan
Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
Hi, Bertrand! On 26.11.21 14:19, Oleksandr Andrushchenko wrote: > Hi, Bertrand! > > On 25.11.21 18:28, Bertrand Marquis wrote: >> Hi Oleksandr, >> >>> On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko >>> wrote: >>> >>> From: Oleksandr Andrushchenko >>> >>> Add relevant vpci register handlers when assigning PCI device to a domain >>> and remove those when de-assigning. This allows having different >>> handlers for different domains, e.g. hwdom and other guests. >>> >>> Emulate guest BAR register values: this allows creating a guest view >>> of the registers and emulates size and properties probe as it is done >>> during PCI device enumeration by the guest. >>> >>> ROM BAR is only handled for the hardware domain and for guest domains >>> there is a stub: at the moment PCI expansion ROM handling is supported >>> for x86 only and it might not be used by other architectures without >>> emulating x86. Other use-cases may include using that expansion ROM before >>> Xen boots, hence no emulation is needed in Xen itself. Or when a guest >>> wants to use the ROM code which seems to be rare. >> In the generic code, bars for ioports are actually skipped (check code before >> in header.c, in case of ioports there is a continue) and no handler is >> registered for them. >> The consequence will be that a guest will access hardware when reading those >> BARs. > Yes, this seems to be a valid point So, with the approach we have developed these days we will ignore all writes and return ~0 for reads for all unhandled ops, e.g. those which do not have explicit register handlers employed. Thus, this case will fall into unhandled clause. Thank you, Oleksandr
Re: [PATCH] xen: add option to disable GNTTABOP_transfer
On 03.02.2022 11:55, Juergen Gross wrote: > On 03.02.22 10:10, Jan Beulich wrote: >> On 01.02.2022 10:02, Juergen Gross wrote: >>> The grant table operation GNTTABOP_transfer is meant to be used in >>> PV device backends, and it hasn't been used in Linux since the old >>> Xen-o-Linux days. >> >> Kind of unusual spelling of XenoLinux ;-) >> >>> --- a/xen/common/grant_table.c >>> +++ b/xen/common/grant_table.c >>> @@ -181,6 +181,7 @@ static int parse_gnttab_max_maptrack_frames(const char >>> *arg) >>> >>> unsigned int __read_mostly opt_gnttab_max_version = GNTTAB_MAX_VERSION; >>> static bool __read_mostly opt_transitive_grants = true; >>> +static bool __read_mostly opt_grant_transfer = true; >> >> If this was conditional upon PV (with a #define to false in the >> opposite case), it could be __ro_after_init right away, while at >> the same time allowing the compiler to eliminate gnttab_transfer(). > > Nice idea. The other option would be to put all (or most) of > gnttab_transfer() in a "#ifdef CONFIG_PV" section, allowing to > remove the "#ifdef CONFIG_X86" parts in it, too. Yes, sure. The downside being that then this code won't be compile- tested anymore in !PV builds. Yet keeping code visible to compilers is what we aim for elsewhere by preferring if(IS_ENABLED(...)) over #ifdef where possible. Jan
Re: [PATCH] tools/libxl: don't allow IOMMU usage with PoD
On 03.02.2022 12:06, Roger Pau Monne wrote: > Prevent libxl from creating guests that attempts to use PoD together > with an IOMMU, even if no devices are actually assigned. > > While the hypervisor could support using PoD together with an IOMMU as > long as no devices are assigned, such usage seems doubtful. There's no > guarantee the guest has ballooned down enough memory for PoD to no > longer be active, and thus a later assignment of a PCI device to such > domain could fail. That's not a precise description of the constraint: The guest ballooning down enough only means entries == cache, but for device assignment we need entries == 0 (and a guarantee that no new entries can appear, but I think this is already the case once a guest was launched). (FWIW the wording in code comment and log message read fine to me.) Jan
Re: [PATCH] docs: document patch rules
On 03.02.2022 11:31, Juergen Gross wrote: > On 03.02.22 11:07, Jan Beulich wrote: >> On 02.02.2022 12:44, Juergen Gross wrote: >>> +## The commit message >>> + >>> +The commit message is free text describing *why* the patch is done and >>> +*how* the goal of the patch is achieved. A good commit message will >>> describe >>> +the current situation, the desired goal, and the way this goal is being >>> +achieved. Parts of that can be omitted in obvious cases. >>> + >>> +In case additional changes are done in the patch (like e.g. cleanups), >>> those >>> +should be mentioned. >>> + >>> +When referencing other patches (e.g. `patch xy introduced a bug ...`) those >>> +patches should be referenced via their commit id (at least 12 digits) and >>> the >>> +patch subject: >>> + >>> +Commit 67d01cdb5518 ("x86: infrastructure to allow converting certain >>> +indirect calls to direct ones") introduced a bug ... >> >> I think this should have a reference to the Fixes: tag, as generally it >> makes the text less convoluted if it references such a tag rather than >> spelling out hash and title a 2nd time. > > I think this depends on the use case. If the cited patch is in the > Fixes: tag I agree. Sometimes a patch is cited for other reasons, e.g. > when adding a fix similar to the one in the cited patch. I always like > to have the commit id in such a case. > > Are you fine with me rephrasing the text to: > > When referencing other patches (e.g. `similar to patch xy ...`) those > patches should be referenced via their commit id (at least 12 digits) > and the patch subject, if the very same patch isn't referenced by the > `Fixes:` tag, too: Sounds good to me. >>> +### Reviewed-by: >>> + >>> +A `Reviewed-by:` tag can only be given by a reviewer of the patch. With >>> +responding to a sent patch adding the `Reviewed-by:` tag the reviewer >>> +(which can be anybody) confirms to have looked thoroughly at the patch and >>> +didn't find any issue (being it technical, legal or formal ones). If the >>> +review is covering only some parts of the patch, those parts can optionally >>> +be specified (multiple areas can be covered with multiple `Reviewed-by:` >>> +tags). >> >> I'd prefer if the comma separated form was also explicitly mentioned >> (and hence permitted) here. I'd even go as far as suggesting that this >> should be the preferred form as long as line length constraints permit. > > OTOH this will make automated parsing harder. > > I'm open for both variants, just wanted to mention why I've chosen the > multiline form initially. Unless commas are expected to be part of such "identifiers", I don't think I see how parsing would become meaningfully harder. When the email address is wanted, parsing would strip # and everything following anyway. Jan
[qemu-mainline test] 167991: tolerable FAIL - PUSHED
flight 167991 qemu-mainline real [real] http://logs.test-lab.xenproject.org/osstest/logs/167991/ Failures :-/ but no regressions. Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-rtds18 guest-start/debian.repeat fail REGR. vs. 167987 Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 167987 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167987 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167987 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167987 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 167987 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 167987 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167987 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167987 test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-xl-pvshim14 guest-start fail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass version targeted for testing: qemuuf7c0e223acd5021d03736644cc0abf3501003820 baseline version: qemuu47cc1a3655135b89fa75c2824fbddd29df874612 Last test of basis 167987 2022-02-02 13:39:33 Z0 days Testing same since 167991 2022-02-02 23:39:43 Z0 days1 attempts ---
[PATCH] tools/libxl: don't allow IOMMU usage with PoD
Prevent libxl from creating guests that attempts to use PoD together with an IOMMU, even if no devices are actually assigned. While the hypervisor could support using PoD together with an IOMMU as long as no devices are assigned, such usage seems doubtful. There's no guarantee the guest has ballooned down enough memory for PoD to no longer be active, and thus a later assignment of a PCI device to such domain could fail. Preventing the usage of PoD together with an IOMMU at guest creation avoids having to add checks for active PoD entries in the device assignment paths. Signed-off-by: Roger Pau Monné --- Cc: Jan Beulich --- tools/libs/light/libxl_create.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c index d7a40d7550..7499922088 100644 --- a/tools/libs/light/libxl_create.c +++ b/tools/libs/light/libxl_create.c @@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc, pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) && (d_config->b_info.target_memkb < d_config->b_info.max_memkb); -/* We cannot have PoD and PCI device assignment at the same time - * for HVM guest. It was reported that IOMMU cannot work with PoD - * enabled because it needs to populated entire page table for - * guest. To stay on the safe side, we disable PCI device - * assignment when PoD is enabled. +/* We don't support having PoD and an IOMMU at the same time for HVM + * guests. An active IOMMU cannot work with PoD because it needs a fully + * populated page-table. Prevent PoD usage if the domain has an IOMMU + * assigned, even if not active. */ if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV && -d_config->num_pcidevs && pod_enabled) { +d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED && +pod_enabled) { ret = ERROR_INVAL; -LOGD(ERROR, domid, - "PCI device assignment for HVM guest failed due to PoD enabled"); +LOGD(ERROR, domid, "IOMMU not supported together with PoD"); goto error_out; } -- 2.34.1
Re: [PATCH] xen: add option to disable GNTTABOP_transfer
On 03.02.22 10:10, Jan Beulich wrote: On 01.02.2022 10:02, Juergen Gross wrote: The grant table operation GNTTABOP_transfer is meant to be used in PV device backends, and it hasn't been used in Linux since the old Xen-o-Linux days. Kind of unusual spelling of XenoLinux ;-) --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -181,6 +181,7 @@ static int parse_gnttab_max_maptrack_frames(const char *arg) unsigned int __read_mostly opt_gnttab_max_version = GNTTAB_MAX_VERSION; static bool __read_mostly opt_transitive_grants = true; +static bool __read_mostly opt_grant_transfer = true; If this was conditional upon PV (with a #define to false in the opposite case), it could be __ro_after_init right away, while at the same time allowing the compiler to eliminate gnttab_transfer(). Nice idea. The other option would be to put all (or most) of gnttab_transfer() in a "#ifdef CONFIG_PV" section, allowing to remove the "#ifdef CONFIG_X86" parts in it, too. @@ -204,6 +205,8 @@ static int __init parse_gnttab(const char *s) } else if ( (val = parse_boolean("transitive", s, ss)) >= 0 ) opt_transitive_grants = val; +else if ( (val = parse_boolean("transfer", s, ss)) >= 0 ) +opt_grant_transfer = val; else rc = -EINVAL; To possibly save a further roundtrip: If the PV dependency was added above, I'd like to ask to follow the model of parse_iommu_param() here and use "#ifndef opt_grant_transfer" around the added code in favor of "#ifdef CONFIG_PV". Okay. @@ -2233,6 +2236,9 @@ gnttab_transfer( unsigned int max_bitsize; struct active_grant_entry *act; +if ( !opt_grant_transfer ) +return -ENOSYS; -EOPNOTSUPP please. Yes, that's better. Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH] docs: document patch rules
On 03.02.22 11:07, Jan Beulich wrote: On 02.02.2022 12:44, Juergen Gross wrote: --- /dev/null +++ b/docs/process/sending-patches.pandoc @@ -0,0 +1,284 @@ +# How a proper patch should look like + +This is a brief description how a proper patch for the Xen project should +look like. Examples and tooling tips are not part of this document, those +can be found in the +[Xen Wiki](https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches). + +## The patch subject + +The first line at the top of the patch should contain a short description of +what the patch does, and hints as to what code it touches. This line is used +as the **Subject** line of the mail when sending the patch. + +The hint which code is touched us usually in form of a relative path inside Nit: s/ us / is / +the Xen git repository, where obvious directories can be omitted or replaced +by abbreviations, or it can be a single word describing the topic: + +: + +E.g.: + +xen/arm: increase memory banks number define value +tools/libs/evtchn: Deduplicate xenevtchn_fd() +MAINTAINERS: update my email address +build: correct usage comments in Kbuild.include I realize there's "usually" in the wording, but I'm still uncertain in how far we want to suggest paths here. I have to admit that I never really liked overly long prefixes like the "tools/libs/evtchn:" you give as example. The prefix should be sufficiently unambiguous, yes, but in this particular case "libs/evtchn:" or "libxenevtchn:" would be enough to achieve that. I'd prefer if the tag was described as specifying a (sub-)component (or other abstract entity, like is the case for your "build:" example). Okay. +The description should give a rough hint *what* is done in the patch. + +The subject line should in general not exceed 80 characters. It must be +followed by a blank line. + +## The commit message + +The commit message is free text describing *why* the patch is done and +*how* the goal of the patch is achieved. A good commit message will describe +the current situation, the desired goal, and the way this goal is being +achieved. Parts of that can be omitted in obvious cases. + +In case additional changes are done in the patch (like e.g. cleanups), those +should be mentioned. + +When referencing other patches (e.g. `patch xy introduced a bug ...`) those +patches should be referenced via their commit id (at least 12 digits) and the +patch subject: + +Commit 67d01cdb5518 ("x86: infrastructure to allow converting certain +indirect calls to direct ones") introduced a bug ... I think this should have a reference to the Fixes: tag, as generally it makes the text less convoluted if it references such a tag rather than spelling out hash and title a 2nd time. I think this depends on the use case. If the cited patch is in the Fixes: tag I agree. Sometimes a patch is cited for other reasons, e.g. when adding a fix similar to the one in the cited patch. I always like to have the commit id in such a case. Are you fine with me rephrasing the text to: When referencing other patches (e.g. `similar to patch xy ...`) those patches should be referenced via their commit id (at least 12 digits) and the patch subject, if the very same patch isn't referenced by the `Fixes:` tag, too: +## Tags + +Tags are entries in the form + +Tag: something + +In general tags are added in chronological order. So a `Reviewed-by:` tag +should be added **after** the `Signed-off-by:` tag, as the review happened +after the patch was written. + +Do not split a tag across multiple lines, tags are exempt from the +"wrap at 75 columns" rule in order to simplify parsing scripts. + +### Taken-from: + +Xen has inherited some source files from other open source projects. In case +a patch modifying such an inherited file is taken from that project (maybe in +modified form), the `Taken-from:` tag specifies the source of the patch: + +Taken-from: + +E.g.: + +Taken-from: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f093b08c47b3 + +All tags **above** the `Taken-from:` tag are from the original patch (which +should all be kept), while tags **after** `Taken-from:` are related to the +normal Xen patch process as described here. While I don't mind it becoming "Taken-from:", I'd like to put up for consideration the (slightly shorter) alternative of "Origin:". Fine with me. +### Reported-by: + +This optional tag can be used to give credit to someone reporting an issue. +It is in the format: + +Reported-by: name + +E.g.: + +Reported-by: Jane Doe + +As the email address will be made public via git, the reporter of an issue +should be asked whether he/she is fine with being mentioned in the patch. + +### Suggested-by: + +This optional tag can be used to give credit to someone having suggested the +solution the patch is implementing. It is in the format: + +Suggested-by: name + +E.g.: + +Suggested-by: Jane Doe + +As the email address will
Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests
On Thu, Feb 03, 2022 at 11:20:15AM +0100, Jan Beulich wrote: > On 03.02.2022 10:52, Roger Pau Monné wrote: > > On Thu, Feb 03, 2022 at 10:21:54AM +0100, Jan Beulich wrote: > >> On 03.02.2022 10:04, Roger Pau Monné wrote: > >>> On Thu, Feb 03, 2022 at 09:31:03AM +0100, Jan Beulich wrote: > On 02.02.2022 17:13, Roger Pau Monné wrote: > > On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote: > >> @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d, > >> > >> ASSERT( pod_target >= p2m->pod.count ); > >> > >> -ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/); > >> +if ( has_arch_pdevs(d) || cache_flush_permitted(d) ) > > > > Is it possible to have cache flush allowed without any PCI device > > assigned? AFAICT the iomem/ioport_caps would only get setup when there > > are device passed through? > > One can assign MMIO or ports to a guest the raw way. That's not > secure, but functionally explicitly permitted. > > > TBH I would be fine if we just say that PoD cannot be used in > > conjunction with an IOMMU, and just check for is_iommu_enable(d) here. > > > > I understand it's technically possible for PoD to be used together > > with a domain that will later get a device passed through once PoD is > > no longer in use, but I doubt there's much value in supporting that > > use case, and I fear we might be introducing corner cases that could > > create issues in the future. Overall I think it would be safer to just > > disable PoD in conjunction with an IOMMU. > > I consider it wrong to put in place such a restriction, but I could > perhaps accept you and Andrew thinking this way if this was the only > aspect playing into here. However, this would then want an equivalent > tools side check, and while hunting down where to make the change as > done here, I wasn't able to figure out where that alternative > adjustment would need doing. Hence I would possibly(!) buy into this > only if someone else took care of doing so properly in the tool stack > (including the emission of a sensible error message). > >>> > >>> What about the (completely untested) chunk below: > >>> > >>> diff --git a/tools/libs/light/libxl_create.c > >>> b/tools/libs/light/libxl_create.c > >>> index d7a40d7550..e585ef4c5c 100644 > >>> --- a/tools/libs/light/libxl_create.c > >>> +++ b/tools/libs/light/libxl_create.c > >>> @@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc, > >>> pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) && > >>> (d_config->b_info.target_memkb < d_config->b_info.max_memkb); > >>> > >>> -/* We cannot have PoD and PCI device assignment at the same time > >>> +/* We cannot have PoD and an active IOMMU at the same time > >>> * for HVM guest. It was reported that IOMMU cannot work with PoD > >>> * enabled because it needs to populated entire page table for > >>> - * guest. To stay on the safe side, we disable PCI device > >>> - * assignment when PoD is enabled. > >>> + * guest. > >>> */ > >>> if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV && > >>> -d_config->num_pcidevs && pod_enabled) { > >>> +d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED && > >>> +pod_enabled) { > >>> ret = ERROR_INVAL; > >>> -LOGD(ERROR, domid, > >>> - "PCI device assignment for HVM guest failed due to PoD > >>> enabled"); > >>> +LOGD(ERROR, domid, "IOMMU cannot be enabled together with PoD"); > >>> goto error_out; > >>> } > >> > >> Perhaps. Seeing this I actually recall coming across this check during > >> my investigation. Not changing it along the lines of what you do was > >> then really more because of me not being convinced of the extra > >> restriction; I clearly misremembered when writing the earlier reply. > >> If we were to do what you suggest, I'd like to ask that the comment be > >> changed differently, though: "We cannot ..." then isn't really true > >> anymore. We choose not to permit this mode; "cannot" only applies to > >> actual device assignment (and of course only as long as there aren't > >> restartable IOMMU faults). > > > > I'm fine with an adjusted wording here. This was mostly a placement > > suggestion, but I didn't gave much thought to the error message. > > FTAOD: Are you going to transform this into a proper patch then? While > I wouldn't object to such a behavioral change, I also wouldn't want to > put my name under it. But if it went in, I think I might be able to > then drop the libxl adjustment from my patch. Oh, I somewhat assumed you would integrate this check into the patch. I can send a standalone patch myself if that's your preference. Let me do that now. Roger.
Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests
On 03.02.2022 10:52, Roger Pau Monné wrote: > On Thu, Feb 03, 2022 at 10:21:54AM +0100, Jan Beulich wrote: >> On 03.02.2022 10:04, Roger Pau Monné wrote: >>> On Thu, Feb 03, 2022 at 09:31:03AM +0100, Jan Beulich wrote: On 02.02.2022 17:13, Roger Pau Monné wrote: > On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote: >> @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d, >> >> ASSERT( pod_target >= p2m->pod.count ); >> >> -ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/); >> +if ( has_arch_pdevs(d) || cache_flush_permitted(d) ) > > Is it possible to have cache flush allowed without any PCI device > assigned? AFAICT the iomem/ioport_caps would only get setup when there > are device passed through? One can assign MMIO or ports to a guest the raw way. That's not secure, but functionally explicitly permitted. > TBH I would be fine if we just say that PoD cannot be used in > conjunction with an IOMMU, and just check for is_iommu_enable(d) here. > > I understand it's technically possible for PoD to be used together > with a domain that will later get a device passed through once PoD is > no longer in use, but I doubt there's much value in supporting that > use case, and I fear we might be introducing corner cases that could > create issues in the future. Overall I think it would be safer to just > disable PoD in conjunction with an IOMMU. I consider it wrong to put in place such a restriction, but I could perhaps accept you and Andrew thinking this way if this was the only aspect playing into here. However, this would then want an equivalent tools side check, and while hunting down where to make the change as done here, I wasn't able to figure out where that alternative adjustment would need doing. Hence I would possibly(!) buy into this only if someone else took care of doing so properly in the tool stack (including the emission of a sensible error message). >>> >>> What about the (completely untested) chunk below: >>> >>> diff --git a/tools/libs/light/libxl_create.c >>> b/tools/libs/light/libxl_create.c >>> index d7a40d7550..e585ef4c5c 100644 >>> --- a/tools/libs/light/libxl_create.c >>> +++ b/tools/libs/light/libxl_create.c >>> @@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc, >>> pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) && >>> (d_config->b_info.target_memkb < d_config->b_info.max_memkb); >>> >>> -/* We cannot have PoD and PCI device assignment at the same time >>> +/* We cannot have PoD and an active IOMMU at the same time >>> * for HVM guest. It was reported that IOMMU cannot work with PoD >>> * enabled because it needs to populated entire page table for >>> - * guest. To stay on the safe side, we disable PCI device >>> - * assignment when PoD is enabled. >>> + * guest. >>> */ >>> if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV && >>> -d_config->num_pcidevs && pod_enabled) { >>> +d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED && >>> +pod_enabled) { >>> ret = ERROR_INVAL; >>> -LOGD(ERROR, domid, >>> - "PCI device assignment for HVM guest failed due to PoD >>> enabled"); >>> +LOGD(ERROR, domid, "IOMMU cannot be enabled together with PoD"); >>> goto error_out; >>> } >> >> Perhaps. Seeing this I actually recall coming across this check during >> my investigation. Not changing it along the lines of what you do was >> then really more because of me not being convinced of the extra >> restriction; I clearly misremembered when writing the earlier reply. >> If we were to do what you suggest, I'd like to ask that the comment be >> changed differently, though: "We cannot ..." then isn't really true >> anymore. We choose not to permit this mode; "cannot" only applies to >> actual device assignment (and of course only as long as there aren't >> restartable IOMMU faults). > > I'm fine with an adjusted wording here. This was mostly a placement > suggestion, but I didn't gave much thought to the error message. FTAOD: Are you going to transform this into a proper patch then? While I wouldn't object to such a behavioral change, I also wouldn't want to put my name under it. But if it went in, I think I might be able to then drop the libxl adjustment from my patch. Jan
Re: [PATCH] docs: document patch rules
On 02.02.2022 12:44, Juergen Gross wrote: > --- /dev/null > +++ b/docs/process/sending-patches.pandoc > @@ -0,0 +1,284 @@ > +# How a proper patch should look like > + > +This is a brief description how a proper patch for the Xen project should > +look like. Examples and tooling tips are not part of this document, those > +can be found in the > +[Xen Wiki](https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches). > + > +## The patch subject > + > +The first line at the top of the patch should contain a short description of > +what the patch does, and hints as to what code it touches. This line is used > +as the **Subject** line of the mail when sending the patch. > + > +The hint which code is touched us usually in form of a relative path inside Nit: s/ us / is / > +the Xen git repository, where obvious directories can be omitted or replaced > +by abbreviations, or it can be a single word describing the topic: > + > +: > + > +E.g.: > + > +xen/arm: increase memory banks number define value > +tools/libs/evtchn: Deduplicate xenevtchn_fd() > +MAINTAINERS: update my email address > +build: correct usage comments in Kbuild.include I realize there's "usually" in the wording, but I'm still uncertain in how far we want to suggest paths here. I have to admit that I never really liked overly long prefixes like the "tools/libs/evtchn:" you give as example. The prefix should be sufficiently unambiguous, yes, but in this particular case "libs/evtchn:" or "libxenevtchn:" would be enough to achieve that. I'd prefer if the tag was described as specifying a (sub-)component (or other abstract entity, like is the case for your "build:" example). > +The description should give a rough hint *what* is done in the patch. > + > +The subject line should in general not exceed 80 characters. It must be > +followed by a blank line. > + > +## The commit message > + > +The commit message is free text describing *why* the patch is done and > +*how* the goal of the patch is achieved. A good commit message will describe > +the current situation, the desired goal, and the way this goal is being > +achieved. Parts of that can be omitted in obvious cases. > + > +In case additional changes are done in the patch (like e.g. cleanups), those > +should be mentioned. > + > +When referencing other patches (e.g. `patch xy introduced a bug ...`) those > +patches should be referenced via their commit id (at least 12 digits) and the > +patch subject: > + > +Commit 67d01cdb5518 ("x86: infrastructure to allow converting certain > +indirect calls to direct ones") introduced a bug ... I think this should have a reference to the Fixes: tag, as generally it makes the text less convoluted if it references such a tag rather than spelling out hash and title a 2nd time. > +## Tags > + > +Tags are entries in the form > + > +Tag: something > + > +In general tags are added in chronological order. So a `Reviewed-by:` tag > +should be added **after** the `Signed-off-by:` tag, as the review happened > +after the patch was written. > + > +Do not split a tag across multiple lines, tags are exempt from the > +"wrap at 75 columns" rule in order to simplify parsing scripts. > + > +### Taken-from: > + > +Xen has inherited some source files from other open source projects. In case > +a patch modifying such an inherited file is taken from that project (maybe in > +modified form), the `Taken-from:` tag specifies the source of the patch: > + > +Taken-from: > + > +E.g.: > + > +Taken-from: > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f093b08c47b3 > + > +All tags **above** the `Taken-from:` tag are from the original patch (which > +should all be kept), while tags **after** `Taken-from:` are related to the > +normal Xen patch process as described here. While I don't mind it becoming "Taken-from:", I'd like to put up for consideration the (slightly shorter) alternative of "Origin:". > +### Reported-by: > + > +This optional tag can be used to give credit to someone reporting an issue. > +It is in the format: > + > +Reported-by: name > + > +E.g.: > + > +Reported-by: Jane Doe > + > +As the email address will be made public via git, the reporter of an issue > +should be asked whether he/she is fine with being mentioned in the patch. > + > +### Suggested-by: > + > +This optional tag can be used to give credit to someone having suggested the > +solution the patch is implementing. It is in the format: > + > +Suggested-by: name > + > +E.g.: > + > +Suggested-by: Jane Doe > + > +As the email address will be made public via git, the reporter of an issue > +should be asked whether he/she is fine with being mentioned in the patch. Besides these two we've also been using Requested-by:, which I think in some cases conveys information more precisely than Suggested-by: (e.g. when some result was to be achieved without a solution or path there having been given). > +### Reviewed-by: > + > +A `Review
Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests
On Thu, Feb 03, 2022 at 10:21:54AM +0100, Jan Beulich wrote: > On 03.02.2022 10:04, Roger Pau Monné wrote: > > On Thu, Feb 03, 2022 at 09:31:03AM +0100, Jan Beulich wrote: > >> On 02.02.2022 17:13, Roger Pau Monné wrote: > >>> On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote: > @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d, > > ASSERT( pod_target >= p2m->pod.count ); > > -ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/); > +if ( has_arch_pdevs(d) || cache_flush_permitted(d) ) > >>> > >>> Is it possible to have cache flush allowed without any PCI device > >>> assigned? AFAICT the iomem/ioport_caps would only get setup when there > >>> are device passed through? > >> > >> One can assign MMIO or ports to a guest the raw way. That's not > >> secure, but functionally explicitly permitted. > >> > >>> TBH I would be fine if we just say that PoD cannot be used in > >>> conjunction with an IOMMU, and just check for is_iommu_enable(d) here. > >>> > >>> I understand it's technically possible for PoD to be used together > >>> with a domain that will later get a device passed through once PoD is > >>> no longer in use, but I doubt there's much value in supporting that > >>> use case, and I fear we might be introducing corner cases that could > >>> create issues in the future. Overall I think it would be safer to just > >>> disable PoD in conjunction with an IOMMU. > >> > >> I consider it wrong to put in place such a restriction, but I could > >> perhaps accept you and Andrew thinking this way if this was the only > >> aspect playing into here. However, this would then want an equivalent > >> tools side check, and while hunting down where to make the change as > >> done here, I wasn't able to figure out where that alternative > >> adjustment would need doing. Hence I would possibly(!) buy into this > >> only if someone else took care of doing so properly in the tool stack > >> (including the emission of a sensible error message). > > > > What about the (completely untested) chunk below: > > > > diff --git a/tools/libs/light/libxl_create.c > > b/tools/libs/light/libxl_create.c > > index d7a40d7550..e585ef4c5c 100644 > > --- a/tools/libs/light/libxl_create.c > > +++ b/tools/libs/light/libxl_create.c > > @@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc, > > pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) && > > (d_config->b_info.target_memkb < d_config->b_info.max_memkb); > > > > -/* We cannot have PoD and PCI device assignment at the same time > > +/* We cannot have PoD and an active IOMMU at the same time > > * for HVM guest. It was reported that IOMMU cannot work with PoD > > * enabled because it needs to populated entire page table for > > - * guest. To stay on the safe side, we disable PCI device > > - * assignment when PoD is enabled. > > + * guest. > > */ > > if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV && > > -d_config->num_pcidevs && pod_enabled) { > > +d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED && > > +pod_enabled) { > > ret = ERROR_INVAL; > > -LOGD(ERROR, domid, > > - "PCI device assignment for HVM guest failed due to PoD > > enabled"); > > +LOGD(ERROR, domid, "IOMMU cannot be enabled together with PoD"); > > goto error_out; > > } > > Perhaps. Seeing this I actually recall coming across this check during > my investigation. Not changing it along the lines of what you do was > then really more because of me not being convinced of the extra > restriction; I clearly misremembered when writing the earlier reply. > If we were to do what you suggest, I'd like to ask that the comment be > changed differently, though: "We cannot ..." then isn't really true > anymore. We choose not to permit this mode; "cannot" only applies to > actual device assignment (and of course only as long as there aren't > restartable IOMMU faults). I'm fine with an adjusted wording here. This was mostly a placement suggestion, but I didn't gave much thought to the error message. > >> Finally this still leaves out the "raw MMIO / ports" case mentioned > >> above. > > > > But the raw MMIO 'mode' doesn't care much about PoD, because if > > there's no PCI device assigned there's no IOMMU setup, and thus such > > raw MMIO regions (could?) belong to a device that's not constrained by > > the guest p2m anyway? > > Hmm, yes, true. > > --- a/xen/common/vm_event.c > +++ b/xen/common/vm_event.c > @@ -639,7 +639,7 @@ int vm_event_domctl(struct domain *d, st > > rc = -EXDEV; > /* Disallow paging in a PoD guest */ > -if ( p2m_pod_entry_count(p2m_get_hostp2m(d)) ) > +if ( p2m_pod_active(d) ) > >>> > >>> Isn't it fine to just check for entry_count like you suggest in the > >>> chan
Re: [PATCH] docs: document patch rules
On 03.02.22 10:36, Roger Pau Monné wrote: On Wed, Feb 02, 2022 at 12:44:48PM +0100, Juergen Gross wrote: Add a document to describe the rules for sending a proper patch. As it contains all the information already being present in docs/process/tags.pandoc remove that file. Signed-off-by: Juergen Gross --- docs/process/sending-patches.pandoc | 284 docs/process/tags.pandoc| 55 -- 2 files changed, 284 insertions(+), 55 deletions(-) create mode 100644 docs/process/sending-patches.pandoc delete mode 100644 docs/process/tags.pandoc diff --git a/docs/process/sending-patches.pandoc b/docs/process/sending-patches.pandoc new file mode 100644 index 00..4cfc6e1a5b --- /dev/null +++ b/docs/process/sending-patches.pandoc @@ -0,0 +1,284 @@ +# How a proper patch should look like + +This is a brief description how a proper patch for the Xen project should +look like. Examples and tooling tips are not part of this document, those +can be found in the +[Xen Wiki](https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches). + +## The patch subject + +The first line at the top of the patch should contain a short description of +what the patch does, and hints as to what code it touches. This line is used +as the **Subject** line of the mail when sending the patch. + +The hint which code is touched us usually in form of a relative path inside +the Xen git repository, where obvious directories can be omitted or replaced +by abbreviations, or it can be a single word describing the topic: + +: I would use maybe instead of path, to explicitly note this is not usually a real path inside the repo. Good idea. + +E.g.: + +xen/arm: increase memory banks number define value +tools/libs/evtchn: Deduplicate xenevtchn_fd() Mostly a nit, but since this document is about style: I wouldn't recommend using a capital letter after ':' by default. The above line should instead be: tools/libs/evtchn: deduplicate xenevtchn_fd() Yes. +MAINTAINERS: update my email address +build: correct usage comments in Kbuild.include + +The description should give a rough hint *what* is done in the patch. + +The subject line should in general not exceed 80 characters. It must be +followed by a blank line. + +## The commit message + +The commit message is free text describing *why* the patch is done and +*how* the goal of the patch is achieved. A good commit message will describe +the current situation, the desired goal, and the way this goal is being +achieved. Parts of that can be omitted in obvious cases. + +In case additional changes are done in the patch (like e.g. cleanups), those +should be mentioned. + +When referencing other patches (e.g. `patch xy introduced a bug ...`) those +patches should be referenced via their commit id (at least 12 digits) and the +patch subject: + +Commit 67d01cdb5518 ("x86: infrastructure to allow converting certain +indirect calls to direct ones") introduced a bug ... + +The following ``git config`` settings can be used to add a pretty format for +outputting the above style in the ``git log`` or ``git show`` commands: + +[core] +abbrev = 12 +[pretty] +fixes = Fixes: %h (\"%s\") + +Lines in the commit message should not exceed 75 characters, except when +copying error output directly into the commit message. + +## Tags + +Tags are entries in the form + +Tag: something + +In general tags are added in chronological order. So a `Reviewed-by:` tag +should be added **after** the `Signed-off-by:` tag, as the review happened +after the patch was written. + +Do not split a tag across multiple lines, tags are exempt from the +"wrap at 75 columns" rule in order to simplify parsing scripts. + +### Taken-from: + +Xen has inherited some source files from other open source projects. In case +a patch modifying such an inherited file is taken from that project (maybe in +modified form), the `Taken-from:` tag specifies the source of the patch: + +Taken-from: + +E.g.: + +Taken-from: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f093b08c47b3 + +All tags **above** the `Taken-from:` tag are from the original patch (which +should all be kept), while tags **after** `Taken-from:` are related to the +normal Xen patch process as described here. + +### Fixes: + +If your patch fixes a bug in a specific commit, e.g. you found an issue using +``git bisect``, please use the `Fixes:` tag with the first 12 characters of +the commit id, and the one line summary. + +Fixes: ("") + +E.g.: + +Fixes: 67d01cdb5518 ("x86: infrastructure to allow converting certain indirect calls to direct ones") + +### Backport: + +A backport tag is an optional tag in the commit message to request a +given commit to be backported to the released trees: + +Backport: [# ] So we already had a documented usage of '#' in tags, which I think should make it a better candidat
Re: [PATCH] docs: document patch rules
On Wed, Feb 02, 2022 at 12:44:48PM +0100, Juergen Gross wrote: > Add a document to describe the rules for sending a proper patch. > > As it contains all the information already being present in > docs/process/tags.pandoc remove that file. > > Signed-off-by: Juergen Gross > --- > docs/process/sending-patches.pandoc | 284 > docs/process/tags.pandoc| 55 -- > 2 files changed, 284 insertions(+), 55 deletions(-) > create mode 100644 docs/process/sending-patches.pandoc > delete mode 100644 docs/process/tags.pandoc > > diff --git a/docs/process/sending-patches.pandoc > b/docs/process/sending-patches.pandoc > new file mode 100644 > index 00..4cfc6e1a5b > --- /dev/null > +++ b/docs/process/sending-patches.pandoc > @@ -0,0 +1,284 @@ > +# How a proper patch should look like > + > +This is a brief description how a proper patch for the Xen project should > +look like. Examples and tooling tips are not part of this document, those > +can be found in the > +[Xen Wiki](https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches). > + > +## The patch subject > + > +The first line at the top of the patch should contain a short description of > +what the patch does, and hints as to what code it touches. This line is used > +as the **Subject** line of the mail when sending the patch. > + > +The hint which code is touched us usually in form of a relative path inside > +the Xen git repository, where obvious directories can be omitted or replaced > +by abbreviations, or it can be a single word describing the topic: > + > +: I would use maybe instead of path, to explicitly note this is not usually a real path inside the repo. > + > +E.g.: > + > +xen/arm: increase memory banks number define value > +tools/libs/evtchn: Deduplicate xenevtchn_fd() Mostly a nit, but since this document is about style: I wouldn't recommend using a capital letter after ':' by default. The above line should instead be: tools/libs/evtchn: deduplicate xenevtchn_fd() > +MAINTAINERS: update my email address > +build: correct usage comments in Kbuild.include > + > +The description should give a rough hint *what* is done in the patch. > + > +The subject line should in general not exceed 80 characters. It must be > +followed by a blank line. > + > +## The commit message > + > +The commit message is free text describing *why* the patch is done and > +*how* the goal of the patch is achieved. A good commit message will describe > +the current situation, the desired goal, and the way this goal is being > +achieved. Parts of that can be omitted in obvious cases. > + > +In case additional changes are done in the patch (like e.g. cleanups), those > +should be mentioned. > + > +When referencing other patches (e.g. `patch xy introduced a bug ...`) those > +patches should be referenced via their commit id (at least 12 digits) and the > +patch subject: > + > +Commit 67d01cdb5518 ("x86: infrastructure to allow converting certain > +indirect calls to direct ones") introduced a bug ... > + > +The following ``git config`` settings can be used to add a pretty format for > +outputting the above style in the ``git log`` or ``git show`` commands: > + > +[core] > +abbrev = 12 > +[pretty] > +fixes = Fixes: %h (\"%s\") > + > +Lines in the commit message should not exceed 75 characters, except when > +copying error output directly into the commit message. > + > +## Tags > + > +Tags are entries in the form > + > +Tag: something > + > +In general tags are added in chronological order. So a `Reviewed-by:` tag > +should be added **after** the `Signed-off-by:` tag, as the review happened > +after the patch was written. > + > +Do not split a tag across multiple lines, tags are exempt from the > +"wrap at 75 columns" rule in order to simplify parsing scripts. > + > +### Taken-from: > + > +Xen has inherited some source files from other open source projects. In case > +a patch modifying such an inherited file is taken from that project (maybe in > +modified form), the `Taken-from:` tag specifies the source of the patch: > + > +Taken-from: > + > +E.g.: > + > +Taken-from: > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f093b08c47b3 > + > +All tags **above** the `Taken-from:` tag are from the original patch (which > +should all be kept), while tags **after** `Taken-from:` are related to the > +normal Xen patch process as described here. > + > +### Fixes: > + > +If your patch fixes a bug in a specific commit, e.g. you found an issue using > +``git bisect``, please use the `Fixes:` tag with the first 12 characters of > +the commit id, and the one line summary. > + > +Fixes: ("") > + > +E.g.: > + > +Fixes: 67d01cdb5518 ("x86: infrastructure to allow converting certain > indirect calls to direct ones") > + > +### Backport: > + > +A backport tag is an optional tag in the commit message to request a > +giv
Re: [PATCH 4.16 / 4.15] MAINTAINERS: Anthony is stable branch tools maintainer
On Thu, Feb 03, 2022 at 09:56:08AM +0100, Jan Beulich wrote: > Signed-off-by: Jan Beulich > > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -60,7 +60,7 @@ The maintainer for this branch is: > > Tools backport requests should also be copied to: > > - TODO - Loooking for new tools stable maintainer > + Anthony Perard Acked-by: Anthony PERARD Thanks, -- Anthony PERARD
Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests
On 03.02.2022 10:04, Roger Pau Monné wrote: > On Thu, Feb 03, 2022 at 09:31:03AM +0100, Jan Beulich wrote: >> On 02.02.2022 17:13, Roger Pau Monné wrote: >>> On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote: @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d, ASSERT( pod_target >= p2m->pod.count ); -ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/); +if ( has_arch_pdevs(d) || cache_flush_permitted(d) ) >>> >>> Is it possible to have cache flush allowed without any PCI device >>> assigned? AFAICT the iomem/ioport_caps would only get setup when there >>> are device passed through? >> >> One can assign MMIO or ports to a guest the raw way. That's not >> secure, but functionally explicitly permitted. >> >>> TBH I would be fine if we just say that PoD cannot be used in >>> conjunction with an IOMMU, and just check for is_iommu_enable(d) here. >>> >>> I understand it's technically possible for PoD to be used together >>> with a domain that will later get a device passed through once PoD is >>> no longer in use, but I doubt there's much value in supporting that >>> use case, and I fear we might be introducing corner cases that could >>> create issues in the future. Overall I think it would be safer to just >>> disable PoD in conjunction with an IOMMU. >> >> I consider it wrong to put in place such a restriction, but I could >> perhaps accept you and Andrew thinking this way if this was the only >> aspect playing into here. However, this would then want an equivalent >> tools side check, and while hunting down where to make the change as >> done here, I wasn't able to figure out where that alternative >> adjustment would need doing. Hence I would possibly(!) buy into this >> only if someone else took care of doing so properly in the tool stack >> (including the emission of a sensible error message). > > What about the (completely untested) chunk below: > > diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c > index d7a40d7550..e585ef4c5c 100644 > --- a/tools/libs/light/libxl_create.c > +++ b/tools/libs/light/libxl_create.c > @@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc, > pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) && > (d_config->b_info.target_memkb < d_config->b_info.max_memkb); > > -/* We cannot have PoD and PCI device assignment at the same time > +/* We cannot have PoD and an active IOMMU at the same time > * for HVM guest. It was reported that IOMMU cannot work with PoD > * enabled because it needs to populated entire page table for > - * guest. To stay on the safe side, we disable PCI device > - * assignment when PoD is enabled. > + * guest. > */ > if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV && > -d_config->num_pcidevs && pod_enabled) { > +d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED && > +pod_enabled) { > ret = ERROR_INVAL; > -LOGD(ERROR, domid, > - "PCI device assignment for HVM guest failed due to PoD > enabled"); > +LOGD(ERROR, domid, "IOMMU cannot be enabled together with PoD"); > goto error_out; > } Perhaps. Seeing this I actually recall coming across this check during my investigation. Not changing it along the lines of what you do was then really more because of me not being convinced of the extra restriction; I clearly misremembered when writing the earlier reply. If we were to do what you suggest, I'd like to ask that the comment be changed differently, though: "We cannot ..." then isn't really true anymore. We choose not to permit this mode; "cannot" only applies to actual device assignment (and of course only as long as there aren't restartable IOMMU faults). >> Finally this still leaves out the "raw MMIO / ports" case mentioned >> above. > > But the raw MMIO 'mode' doesn't care much about PoD, because if > there's no PCI device assigned there's no IOMMU setup, and thus such > raw MMIO regions (could?) belong to a device that's not constrained by > the guest p2m anyway? Hmm, yes, true. --- a/xen/common/vm_event.c +++ b/xen/common/vm_event.c @@ -639,7 +639,7 @@ int vm_event_domctl(struct domain *d, st rc = -EXDEV; /* Disallow paging in a PoD guest */ -if ( p2m_pod_entry_count(p2m_get_hostp2m(d)) ) +if ( p2m_pod_active(d) ) >>> >>> Isn't it fine to just check for entry_count like you suggest in the >>> change to libxl? >> >> I didn't think it would be, but I'm not entirely sure: If paging was >> enabled before a guest actually starts, it wouldn't have any entries >> but still be a PoD guest if it has a non-empty cache. The VM event >> folks may be able to clarify this either way. But ... >> >>> This is what p2m_pod_entry_count actually does (rather than entry_count | >>> count). >> >> ... you
Re: [PATCH] xen: add option to disable GNTTABOP_transfer
On 01.02.2022 10:02, Juergen Gross wrote: > The grant table operation GNTTABOP_transfer is meant to be used in > PV device backends, and it hasn't been used in Linux since the old > Xen-o-Linux days. Kind of unusual spelling of XenoLinux ;-) > --- a/xen/common/grant_table.c > +++ b/xen/common/grant_table.c > @@ -181,6 +181,7 @@ static int parse_gnttab_max_maptrack_frames(const char > *arg) > > unsigned int __read_mostly opt_gnttab_max_version = GNTTAB_MAX_VERSION; > static bool __read_mostly opt_transitive_grants = true; > +static bool __read_mostly opt_grant_transfer = true; If this was conditional upon PV (with a #define to false in the opposite case), it could be __ro_after_init right away, while at the same time allowing the compiler to eliminate gnttab_transfer(). > @@ -204,6 +205,8 @@ static int __init parse_gnttab(const char *s) > } > else if ( (val = parse_boolean("transitive", s, ss)) >= 0 ) > opt_transitive_grants = val; > +else if ( (val = parse_boolean("transfer", s, ss)) >= 0 ) > +opt_grant_transfer = val; > else > rc = -EINVAL; To possibly save a further roundtrip: If the PV dependency was added above, I'd like to ask to follow the model of parse_iommu_param() here and use "#ifndef opt_grant_transfer" around the added code in favor of "#ifdef CONFIG_PV". > @@ -2233,6 +2236,9 @@ gnttab_transfer( > unsigned int max_bitsize; > struct active_grant_entry *act; > > +if ( !opt_grant_transfer ) > +return -ENOSYS; -EOPNOTSUPP please. Jan
Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests
On Thu, Feb 03, 2022 at 09:31:03AM +0100, Jan Beulich wrote: > On 02.02.2022 17:13, Roger Pau Monné wrote: > > On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote: > >> @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d, > >> > >> ASSERT( pod_target >= p2m->pod.count ); > >> > >> -ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/); > >> +if ( has_arch_pdevs(d) || cache_flush_permitted(d) ) > > > > Is it possible to have cache flush allowed without any PCI device > > assigned? AFAICT the iomem/ioport_caps would only get setup when there > > are device passed through? > > One can assign MMIO or ports to a guest the raw way. That's not > secure, but functionally explicitly permitted. > > > TBH I would be fine if we just say that PoD cannot be used in > > conjunction with an IOMMU, and just check for is_iommu_enable(d) here. > > > > I understand it's technically possible for PoD to be used together > > with a domain that will later get a device passed through once PoD is > > no longer in use, but I doubt there's much value in supporting that > > use case, and I fear we might be introducing corner cases that could > > create issues in the future. Overall I think it would be safer to just > > disable PoD in conjunction with an IOMMU. > > I consider it wrong to put in place such a restriction, but I could > perhaps accept you and Andrew thinking this way if this was the only > aspect playing into here. However, this would then want an equivalent > tools side check, and while hunting down where to make the change as > done here, I wasn't able to figure out where that alternative > adjustment would need doing. Hence I would possibly(!) buy into this > only if someone else took care of doing so properly in the tool stack > (including the emission of a sensible error message). What about the (completely untested) chunk below: diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c index d7a40d7550..e585ef4c5c 100644 --- a/tools/libs/light/libxl_create.c +++ b/tools/libs/light/libxl_create.c @@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc, pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) && (d_config->b_info.target_memkb < d_config->b_info.max_memkb); -/* We cannot have PoD and PCI device assignment at the same time +/* We cannot have PoD and an active IOMMU at the same time * for HVM guest. It was reported that IOMMU cannot work with PoD * enabled because it needs to populated entire page table for - * guest. To stay on the safe side, we disable PCI device - * assignment when PoD is enabled. + * guest. */ if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV && -d_config->num_pcidevs && pod_enabled) { +d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED && +pod_enabled) { ret = ERROR_INVAL; -LOGD(ERROR, domid, - "PCI device assignment for HVM guest failed due to PoD enabled"); +LOGD(ERROR, domid, "IOMMU cannot be enabled together with PoD"); goto error_out; } > Finally this still leaves out the "raw MMIO / ports" case mentioned > above. But the raw MMIO 'mode' doesn't care much about PoD, because if there's no PCI device assigned there's no IOMMU setup, and thus such raw MMIO regions (could?) belong to a device that's not constrained by the guest p2m anyway? > >> --- a/xen/common/vm_event.c > >> +++ b/xen/common/vm_event.c > >> @@ -639,7 +639,7 @@ int vm_event_domctl(struct domain *d, st > >> > >> rc = -EXDEV; > >> /* Disallow paging in a PoD guest */ > >> -if ( p2m_pod_entry_count(p2m_get_hostp2m(d)) ) > >> +if ( p2m_pod_active(d) ) > > > > Isn't it fine to just check for entry_count like you suggest in the > > change to libxl? > > I didn't think it would be, but I'm not entirely sure: If paging was > enabled before a guest actually starts, it wouldn't have any entries > but still be a PoD guest if it has a non-empty cache. The VM event > folks may be able to clarify this either way. But ... > > > This is what p2m_pod_entry_count actually does (rather than entry_count | > > count). > > ... you really mean "did" here, as I'm removing p2m_pod_entry_count() > in this patch. Of course locking could be added to it instead (or > p2m_pod_get_mem_target() be used in its place), but I'd prefer if we > could go with just the check which precisely matches what the comment > says (IOW otherwise I'd need to additionally know what exactly the > comment is to say). Could you briefly mention this in the commit message? Ie: VM event code is also adjusted to make sure PoD is not in use and cannot be used during the guest lifetime? Thanks, Roger.
[PATCH 4.16 / 4.15] MAINTAINERS: Anthony is stable branch tools maintainer
Signed-off-by: Jan Beulich --- a/MAINTAINERS +++ b/MAINTAINERS @@ -60,7 +60,7 @@ The maintainer for this branch is: Tools backport requests should also be copied to: - TODO - Loooking for new tools stable maintainer + Anthony Perard Unstable Subsystem Maintainers
Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests
On 02.02.2022 17:13, Roger Pau Monné wrote: > On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote: >> @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d, >> >> ASSERT( pod_target >= p2m->pod.count ); >> >> -ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/); >> +if ( has_arch_pdevs(d) || cache_flush_permitted(d) ) > > Is it possible to have cache flush allowed without any PCI device > assigned? AFAICT the iomem/ioport_caps would only get setup when there > are device passed through? One can assign MMIO or ports to a guest the raw way. That's not secure, but functionally explicitly permitted. > TBH I would be fine if we just say that PoD cannot be used in > conjunction with an IOMMU, and just check for is_iommu_enable(d) here. > > I understand it's technically possible for PoD to be used together > with a domain that will later get a device passed through once PoD is > no longer in use, but I doubt there's much value in supporting that > use case, and I fear we might be introducing corner cases that could > create issues in the future. Overall I think it would be safer to just > disable PoD in conjunction with an IOMMU. I consider it wrong to put in place such a restriction, but I could perhaps accept you and Andrew thinking this way if this was the only aspect playing into here. However, this would then want an equivalent tools side check, and while hunting down where to make the change as done here, I wasn't able to figure out where that alternative adjustment would need doing. Hence I would possibly(!) buy into this only if someone else took care of doing so properly in the tool stack (including the emission of a sensible error message). Finally this still leaves out the "raw MMIO / ports" case mentioned above. >> --- a/xen/common/vm_event.c >> +++ b/xen/common/vm_event.c >> @@ -639,7 +639,7 @@ int vm_event_domctl(struct domain *d, st >> >> rc = -EXDEV; >> /* Disallow paging in a PoD guest */ >> -if ( p2m_pod_entry_count(p2m_get_hostp2m(d)) ) >> +if ( p2m_pod_active(d) ) > > Isn't it fine to just check for entry_count like you suggest in the > change to libxl? I didn't think it would be, but I'm not entirely sure: If paging was enabled before a guest actually starts, it wouldn't have any entries but still be a PoD guest if it has a non-empty cache. The VM event folks may be able to clarify this either way. But ... > This is what p2m_pod_entry_count actually does (rather than entry_count | > count). ... you really mean "did" here, as I'm removing p2m_pod_entry_count() in this patch. Of course locking could be added to it instead (or p2m_pod_get_mem_target() be used in its place), but I'd prefer if we could go with just the check which precisely matches what the comment says (IOW otherwise I'd need to additionally know what exactly the comment is to say). Jan
[xen-unstable test] 167990: tolerable FAIL - PUSHED
flight 167990 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/167990/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167986 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 167986 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167986 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167986 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 167986 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 167986 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 167986 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167986 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167986 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 167986 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167986 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167986 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-xl-pvshim14 guest-start fail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass version targeted for testing: xen b17e0ec72eded037297f34a233655aad23f64711 baseline version: xen 9ce3ef20b4f085a7