date:20211021

Re: [PATCH v2 1/3] x86/IOMMU: mark IOMMU / intremap not in use when ACPI tables are missing

2021-10-21 Thread Jan Beulich

On 21.10.2021 11:58, Jan Beulich wrote:
> x2apic_bsp_setup() gets called ahead of iommu_setup(), and since x2APIC
> mode (physical vs clustered) depends on iommu_intremap, that variable
> needs to be set to off as soon as we know we can't / won't enable
> interrupt remapping, i.e. in particular when parsing of the respective
> ACPI tables failed. Move the turning off of iommu_intremap from AMD
> specific code into acpi_iommu_init(), accompanying it by clearing of
> iommu_enable.
> 
> Take the opportunity and also fully skip ACPI table parsing logic on
> VT-d when both "iommu=off" and "iommu=no-intremap" are in effect anyway,
> like was already the case for AMD.
> 
> The tag below only references the commit uncovering a pre-existing
> anomaly.
> 
> Fixes: d8bd82327b0f ("AMD/IOMMU: obtain IVHD type to use earlier")
> Reported-by: Andrew Cooper 
> Signed-off-by: Jan Beulich 

Ouch, forgot to Cc Kevin; now added.

Jan

> ---
> While the change here deals with apic_x2apic_probe() as called from
> x2apic_bsp_setup(), the check_x2apic_preenabled() path looks to be
> similarly affected. That call occurs before acpi_boot_init(), which is
> what calls acpi_iommu_init(). The ordering in setup.c is in part
> relatively fragile, which is why for the moment I'm still hesitant to
> move the generic_apic_probe() call down. Plus I don't have easy access
> to a suitable system to test this case. Thoughts?
> ---
> v2: Treat iommu_enable and iommu_intremap as separate options.
> 
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -183,9 +183,6 @@ int __init acpi_ivrs_init(void)
>  {
>  int rc;
>  
> -if ( !iommu_enable && !iommu_intremap )
> -return 0;
> -
>  rc = amd_iommu_get_supported_ivhd_type();
>  if ( rc < 0 )
>  return rc;
> @@ -193,10 +190,7 @@ int __init acpi_ivrs_init(void)
>  ivhd_type = rc;
>  
>  if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
> -{
> -iommu_intremap = iommu_intremap_off;
>  return -ENODEV;
> -}
>  
>  iommu_init_ops = &_iommu_init_ops;
>  
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -777,11 +777,7 @@ static int __init acpi_parse_dmar(struct
>  dmar = (struct acpi_table_dmar *)table;
>  dmar_flags = dmar->flags;
>  
> -if ( !iommu_enable && !iommu_intremap )
> -{
> -ret = -EINVAL;
> -goto out;
> -}
> +ASSERT(iommu_enable || iommu_intremap);
>  
>  if ( !dmar->width )
>  {
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -41,6 +41,24 @@ enum iommu_intremap __read_mostly iommu_
>  bool __read_mostly iommu_intpost;
>  #endif
>  
> +void __init acpi_iommu_init(void)
> +{
> +int ret;
> +
> +if ( !iommu_enable && !iommu_intremap )
> +return;
> +
> +ret = acpi_dmar_init();
> +if ( ret == -ENODEV )
> +ret = acpi_ivrs_init();
> +
> +if ( ret )
> +{
> +iommu_enable = false;
> +iommu_intremap = iommu_intremap_off;
> +}
> +}
> +
>  int __init iommu_hardware_setup(void)
>  {
>  struct IO_APIC_route_entry **ioapic_entries = NULL;
> --- a/xen/include/asm-x86/acpi.h
> +++ b/xen/include/asm-x86/acpi.h
> @@ -141,16 +141,10 @@ extern u32 x86_acpiid_to_apicid[];
>  extern u32 pmtmr_ioport;
>  extern unsigned int pmtmr_width;
>  
> +void acpi_iommu_init(void);
>  int acpi_dmar_init(void);
>  int acpi_ivrs_init(void);
>  
> -static inline int acpi_iommu_init(void)
> -{
> -int ret = acpi_dmar_init();
> -
> -return ret == -ENODEV ? acpi_ivrs_init() : ret;
> -}
> -
>  void acpi_mmcfg_init(void);
>  
>  /* Incremented whenever we transition through S3. Value is 1 during boot. */
> 
>

[PATCH 1/2] drm/i915/gem: stop using PAGE_KERNEL_IO

2021-10-21 Thread Lucas De Marchi

PAGE_KERNEL_IO is only defined for x86 and nowadays is the same as
PAGE_KERNEL. It was different for some time, OR'ing a `_PAGE_IOMAP` flag
in commit be43d72835ba ("x86: add _PAGE_IOMAP pte flag for IO
mappings").  This got removed in commit f955371ca9d3 ("x86: remove the
Xen-specific _PAGE_IOMAP PTE flag"), so today they are just the same.

This is the same that was done in commit ac96b5566926 ("io-mapping.h:
s/PAGE_KERNEL_IO/PAGE_KERNEL/").

There is a subsequent commit with
'Fixes: ac96b5566926 ("io-mapping.h: s/PAGE_KERNEL_IO/PAGE_KERNEL/")' -
but that is not relevant here since is it's actually fixing the different
names for pgprot_writecombine(), which we also don't have today since
all archs expose pgprot_writecombine(). Microblaze, mentioned in that
discussion, gained pgprot_writecombine() in
commit 97ccedd793ac ("microblaze: Provide pgprot_device/writecombine
macros for nommu").

So, just use PAGE_KERNEL, and just use pgprot_writecombine().

Signed-off-by: Lucas De Marchi 
Reviewed-by: Daniel Vetter 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20211020090625.1037517-1-lucas.demar...@intel.com
---
 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 8eb1c3a6fc9c..68fe1837ef54 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -289,7 +289,7 @@ static void *i915_gem_object_map_page(struct 
drm_i915_gem_object *obj,
pgprot = PAGE_KERNEL;
break;
case I915_MAP_WC:
-   pgprot = pgprot_writecombine(PAGE_KERNEL_IO);
+   pgprot = pgprot_writecombine(PAGE_KERNEL);
break;
}
 
@@ -333,7 +333,7 @@ static void *i915_gem_object_map_pfn(struct 
drm_i915_gem_object *obj,
i = 0;
for_each_sgt_daddr(addr, iter, obj->mm.pages)
pfns[i++] = (iomap + addr) >> PAGE_SHIFT;
-   vaddr = vmap_pfn(pfns, n_pfn, pgprot_writecombine(PAGE_KERNEL_IO));
+   vaddr = vmap_pfn(pfns, n_pfn, pgprot_writecombine(PAGE_KERNEL));
if (pfns != stack)
kvfree(pfns);
 
-- 
2.33.1

[PATCH 0/2] Nuke PAGE_KERNEL_IO

2021-10-21 Thread Lucas De Marchi

Last user of PAGE_KERNEL_IO is the i915 driver. While removing it from
there as we seek to bring the driver to other architectures, Daniel
suggested that we could finish the cleanup and remove it altogether,
through the tip tree. So here I'm sending both commits needed for that.

Lucas De Marchi (2):
  drm/i915/gem: stop using PAGE_KERNEL_IO
  x86/mm: nuke PAGE_KERNEL_IO

 arch/x86/include/asm/fixmap.h | 2 +-
 arch/x86/include/asm/pgtable_types.h  | 7 ---
 arch/x86/mm/ioremap.c | 2 +-
 arch/x86/xen/setup.c  | 2 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 4 ++--
 include/asm-generic/fixmap.h  | 2 +-
 6 files changed, 6 insertions(+), 13 deletions(-)

-- 
2.33.1

[PATCH 2/2] x86/mm: nuke PAGE_KERNEL_IO

2021-10-21 Thread Lucas De Marchi

PAGE_KERNEL_IO is only defined for x86 and nowadays is the same as
PAGE_KERNEL. It was different for some time, OR'ing a `_PAGE_IOMAP` flag
in commit be43d72835ba ("x86: add _PAGE_IOMAP pte flag for IO
mappings").  This got removed in commit f955371ca9d3 ("x86: remove the
Xen-specific _PAGE_IOMAP PTE flag"), so today they are just the same.

With the last users outside arch/x86 being remove we can now remove
PAGE_KERNEL_IO.

Signed-off-by: Lucas De Marchi 
---
 arch/x86/include/asm/fixmap.h| 2 +-
 arch/x86/include/asm/pgtable_types.h | 7 ---
 arch/x86/mm/ioremap.c| 2 +-
 arch/x86/xen/setup.c | 2 +-
 include/asm-generic/fixmap.h | 2 +-
 5 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index d0dcefb5cc59..5e186a69db10 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -173,7 +173,7 @@ static inline void __set_fixmap(enum fixed_addresses idx,
  * supported for MMIO addresses, so make sure that the memory encryption
  * mask is not part of the page attributes.
  */
-#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_IO_NOCACHE
+#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_NOCACHE
 
 /*
  * Early memremap routines used for in-place encryption. The mappings created
diff --git a/arch/x86/include/asm/pgtable_types.h 
b/arch/x86/include/asm/pgtable_types.h
index 40497a9020c6..a87224767ff3 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -199,10 +199,6 @@ enum page_cache_mode {
 #define __PAGE_KERNEL_WP(__PP|__RW|   0|___A|__NX|___D|   0|___G| __WP)
 
 
-#define __PAGE_KERNEL_IO   __PAGE_KERNEL
-#define __PAGE_KERNEL_IO_NOCACHE   __PAGE_KERNEL_NOCACHE
-
-
 #ifndef __ASSEMBLY__
 
 #define __PAGE_KERNEL_ENC  (__PAGE_KERNEL| _ENC)
@@ -223,9 +219,6 @@ enum page_cache_mode {
 #define PAGE_KERNEL_LARGE_EXEC __pgprot_mask(__PAGE_KERNEL_LARGE_EXEC | _ENC)
 #define PAGE_KERNEL_VVAR   __pgprot_mask(__PAGE_KERNEL_VVAR   | _ENC)
 
-#define PAGE_KERNEL_IO __pgprot_mask(__PAGE_KERNEL_IO)
-#define PAGE_KERNEL_IO_NOCACHE __pgprot_mask(__PAGE_KERNEL_IO_NOCACHE)
-
 #endif /* __ASSEMBLY__ */
 
 /* xwr */
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 026031b3b782..3102dda4b152 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -243,7 +243,7 @@ __ioremap_caller(resource_size_t phys_addr, unsigned long 
size,
 * make sure the memory encryption attribute is enabled in the
 * resulting mapping.
 */
-   prot = PAGE_KERNEL_IO;
+   prot = PAGE_KERNEL;
if ((io_desc.flags & IORES_MAP_ENCRYPTED) || encrypted)
prot = pgprot_encrypted(prot);
 
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 8bfc10330107..5dc0771a50f3 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -435,7 +435,7 @@ static unsigned long __init 
xen_set_identity_and_remap_chunk(
for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++)
(void)HYPERVISOR_update_va_mapping(
(unsigned long)__va(pfn << PAGE_SHIFT),
-   mfn_pte(pfn, PAGE_KERNEL_IO), 0);
+   mfn_pte(pfn, PAGE_KERNEL), 0);
 
return remap_pfn;
 }
diff --git a/include/asm-generic/fixmap.h b/include/asm-generic/fixmap.h
index 8cc7b09c1bc7..f1b0c6f3d0be 100644
--- a/include/asm-generic/fixmap.h
+++ b/include/asm-generic/fixmap.h
@@ -54,7 +54,7 @@ static inline unsigned long virt_to_fix(const unsigned long 
vaddr)
 #define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_NOCACHE
 #endif
 #ifndef FIXMAP_PAGE_IO
-#define FIXMAP_PAGE_IO PAGE_KERNEL_IO
+#define FIXMAP_PAGE_IO PAGE_KERNEL
 #endif
 #ifndef FIXMAP_PAGE_CLEAR
 #define FIXMAP_PAGE_CLEAR __pgprot(0)
-- 
2.33.1

RE: [for-4.16] Re: [PATCH v4] xen/arm: vgic: Ignore write access to ICPENDR*

2021-10-21 Thread Hongda Deng

Hi,

> -Original Message-
> From: Julien Grall 
> Sent: 2021年10月22日 1:16
> To: Hongda Deng ; xen-devel@lists.xenproject.org;
> sstabell...@kernel.org
> Cc: Bertrand Marquis ; Wei Chen
> ; Ian Jackson 
> Subject: Re: [for-4.16] Re: [PATCH v4] xen/arm: vgic: Ignore write access to
> ICPENDR*
> 
> 
> 
> On 21/10/2021 16:14, Julien Grall wrote:
> > On the previous version, we discussed to include the patch for 4.16. So
> > please tag it with for-4.16 and CC the Release Manager (Ian). This will
> > help him to track what's outstanding for the release.
> >
> > On 21/10/2021 13:03, Hongda Deng wrote:
> >> Currently, Xen will return IO unhandled when guests write ICPENDR*
> >> virtual registers, which will raise a data abort inside the guest.
> >> For Linux guest, these virtual registers will not be accessed. But
> >> for Zephyr, these virtual registers will be accessed during the
> >> initialization. Zephyr guest will get an IO data abort and crash.
> >> Emulating ICPENDR is not easy with the existing vGIC, this patch
> >> reworks the emulation to ignore write access to ICPENDR* virtual
> >> registers and print a message about whether they are already pending
> >> instead of returning unhandled.
> >> More details can be found at [1].
> >>
> >> [1] https://github.com/zephyrproject-rtos/zephyr/blob/eaf6cf745df3807e6e
> >> cc941c3a30de6c179ae359/drivers/interrupt_controller/intc_gicv3.c#L274
> >>
> >> Signed-off-by: Hongda Deng 
> >
> > While I agree the Reviewed-by from Bertrand should be dropped, the
> > Release-acked-by from Ian is simply a way to say he is happy to include
> > the patch for 4.16. So this should have been retain.
> >
> > The patch looks good to me, so I can add Ian's tag on commit:
> >
> > Reviewed-by: Julien Grall 
> 
> Committed.
> 
> Cheers,
> 
> --
> Julien Grall

Thank you !

I just learned that I should add "Reviewed-by" and " Release-acked-by" tags 
based on previous
patches, sorry for that, I will keep it in mind.

Cheers,

---
Hongda

[xen-unstable test] 165712: regressions - FAIL

2021-10-21 Thread osstest service owner

flight 165712 xen-unstable real [real]
flight 165727 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/165712/
http://logs.test-lab.xenproject.org/osstest/logs/165727/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-examine  4 memdisk-try-append   fail REGR. vs. 165699

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10   fail  like 165699
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 165699
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 165699
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 165699
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 165699
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 165699
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 165699
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 165699
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 165699
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 165699
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 165699
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 165699
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 165699
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check

[qemu-mainline test] 165721: regressions - FAIL

2021-10-21 Thread osstest service owner

flight 165721 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/165721/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i3866 xen-buildfail REGR. vs. 165682
 build-i386-xsm6 xen-buildfail REGR. vs. 165682
 build-armhf   6 xen-buildfail REGR. vs. 165682

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-coresched-i386-xl  1 build-check(1)   blocked  n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-amd64-i386-xl1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-pvshim 1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked 
n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-ws16-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-shadow 1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-vhd1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 165682
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 165682
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 165682
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14

[PATCH 1/3] automation: add x86_64 alpine 3.12 test-artifact

2021-10-21 Thread Stefano Stabellini

From: Stefano Stabellini 

It is the same as the existing ARM64 alpine 3.12 test-artifact. It is
used to export an Alpine rootfs for Dom0 used for testing.

Also add the exporting job to build.yaml so that the binaries can be
used during gitlab-ci runs.

Signed-off-by: Stefano Stabellini 
---
 automation/gitlab-ci/build.yaml   | 13 
 .../tests-artifacts/alpine/3.12.dockerfile| 68 +++
 2 files changed, 81 insertions(+)
 create mode 100644 automation/tests-artifacts/alpine/3.12.dockerfile

diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
index d177da1710..76b73beead 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
@@ -596,3 +596,16 @@ qemu-system-aarch64-5.2.0-arm64-export:
   tags:
 - arm64
 
+
+# x86_64 test artifacts
+
+alpine-3.12-rootfs-export:
+  stage: build
+  image: registry.gitlab.com/xen-project/xen/tests-artifacts/alpine:3.12
+  script:
+- mkdir binaries && cp /initrd.tar.gz binaries/initrd.tar.gz
+  artifacts:
+paths:
+  - binaries/initrd.tar.gz
+  tags:
+- x86_64
diff --git a/automation/tests-artifacts/alpine/3.12.dockerfile 
b/automation/tests-artifacts/alpine/3.12.dockerfile
new file mode 100644
index 00..fbc26b9e36
--- /dev/null
+++ b/automation/tests-artifacts/alpine/3.12.dockerfile
@@ -0,0 +1,68 @@
+FROM alpine:3.12
+LABEL maintainer.name="The Xen Project" \
+  maintainer.email="xen-devel@lists.xenproject.org"
+
+ENV USER root
+
+RUN mkdir /build
+WORKDIR /build
+
+RUN \
+  # apk
+  apk update && \
+  \
+  # xen runtime deps
+  apk add musl && \
+  apk add openrc && \
+  apk add busybox && \
+  apk add sudo && \
+  apk add dbus && \
+  apk add bash && \
+  apk add python2 && \
+  # gettext for Xen < 4.13
+  apk add gettext && \
+  apk add zlib && \
+  apk add ncurses && \
+  apk add texinfo && \
+  apk add yajl && \
+  apk add libaio && \
+  apk add xz-dev && \
+  apk add util-linux && \
+  apk add argp-standalone && \
+  apk add libfdt && \
+  apk add glib && \
+  apk add pixman && \
+  apk add curl && \
+  apk add udev && \
+  \
+  # Xen
+  cd / && \
+  # Minimal ramdisk environment in case of cpio output
+  rc-update add udev && \
+  rc-update add udev-trigger && \
+  rc-update add udev-settle && \
+  rc-update add networking sysinit && \
+  rc-update add loopback sysinit && \
+  rc-update add bootmisc boot && \
+  rc-update add devfs sysinit && \
+  rc-update add dmesg sysinit && \
+  rc-update add hostname boot && \
+  rc-update add hwclock boot && \
+  rc-update add hwdrivers sysinit && \
+  rc-update add killprocs shutdown && \
+  rc-update add modloop sysinit && \
+  rc-update add modules boot && \
+  rc-update add mount-ro shutdown && \
+  rc-update add savecache shutdown && \
+  rc-update add sysctl boot && \
+  rc-update add local default && \
+  cp -a /sbin/init /init && \
+  echo "ttyS0" >> /etc/securetty && \
+  echo "hvc0" >> /etc/securetty && \
+  echo "ttyS0::respawn:/sbin/getty -L ttyS0 115200 vt100" >> /etc/inittab && \
+  echo "hvc0::respawn:/sbin/getty -L hvc0 115200 vt100" >> /etc/inittab && \
+  passwd -d "root" root && \
+  \
+  # Create rootfs
+  cd / && \
+  tar cvzf /initrd.tar.gz bin dev etc home init lib mnt opt root sbin usr var
-- 
2.17.1

[PATCH 3/3] automation: add a QEMU based x86_64 Dom0/DomU test

2021-10-21 Thread Stefano Stabellini

From: Stefano Stabellini 

Introduce a test based on QEMU to run Xen, Dom0 and start a DomU.
This is similar to the existing qemu-alpine-arm64.sh script and test.
The only differences are:
- use Debian's qemu-system-x86_64 (on ARM we build our own)
- use ipxe instead of u-boot and ImageBuilder

Signed-off-by: Stefano Stabellini 
---
 automation/gitlab-ci/test.yaml   | 24 +++
 automation/scripts/qemu-alpine-x86_64.sh | 92 
 2 files changed, 116 insertions(+)
 create mode 100644 automation/scripts/qemu-alpine-x86_64.sh

diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index 91a10febbf..c1d67ec4b5 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -47,6 +47,30 @@ qemu-alpine-arm64-gcc:
 - /^coverity-tested\/.*/
 - /^stable-.*/
 
+qemu-alpine-x86_64-gcc:
+  stage: test
+  image: registry.gitlab.com/xen-project/xen/${CONTAINER}
+  variables:
+CONTAINER: debian:unstable
+  script:
+- ./automation/scripts/qemu-alpine-x86_64.sh 2>&1 | tee 
qemu-smoke-arm64.log
+  dependencies:
+- alpine-3.12-gcc
+- alpine-3.12-rootfs-export
+- kernel-5.10.74-export
+  artifacts:
+paths:
+  - smoke.serial
+  - '*.log'
+when: always
+  tags:
+- x86_64
+  except:
+- master
+- smoke
+- /^coverity-tested\/.*/
+- /^stable-.*/
+
 qemu-smoke-arm64-gcc:
   stage: test
   image: registry.gitlab.com/xen-project/xen/${CONTAINER}
diff --git a/automation/scripts/qemu-alpine-x86_64.sh 
b/automation/scripts/qemu-alpine-x86_64.sh
new file mode 100644
index 00..41b05210d6
--- /dev/null
+++ b/automation/scripts/qemu-alpine-x86_64.sh
@@ -0,0 +1,92 @@
+#!/bin/bash
+
+set -ex
+
+apt-get -qy update
+apt-get -qy install --no-install-recommends qemu-system-x86 \
+cpio \
+curl \
+busybox-static
+
+# DomU Busybox
+cd binaries
+mkdir -p initrd
+mkdir -p initrd/bin
+mkdir -p initrd/sbin
+mkdir -p initrd/etc
+mkdir -p initrd/dev
+mkdir -p initrd/proc
+mkdir -p initrd/sys
+mkdir -p initrd/lib
+mkdir -p initrd/var
+mkdir -p initrd/mnt
+cp /bin/busybox initrd/bin/busybox
+initrd/bin/busybox --install initrd/bin
+echo "#!/bin/sh
+
+mount -t proc proc /proc
+mount -t sysfs sysfs /sys
+mount -t devtmpfs devtmpfs /dev
+/bin/sh" > initrd/init
+chmod +x initrd/init
+cd initrd
+find . | cpio --create --format='newc' | gzip > ../initrd.cpio.gz
+cd ..
+
+mkdir -p rootfs
+cd rootfs
+tar xvzf ../initrd.tar.gz
+mkdir proc
+mkdir run
+mkdir srv
+mkdir sys
+rm var/run
+cp -ar ../dist/install/* .
+mv ../initrd.cpio.gz ./root
+cp ../bzImage ./root
+echo "name=\"test\"
+memory=512
+vcpus=1
+kernel=\"/root/bzImage\"
+ramdisk=\"/root/initrd.cpio.gz\"
+extra=\"console=hvc0 root=/dev/ram0 rdinit=/bin/sh\"
+" > root/test.cfg
+echo "#!/bin/bash
+
+export LD_LIBRARY_PATH=/usr/local/lib
+bash /etc/init.d/xencommons start
+
+xl list
+
+xl create -c /root/test.cfg
+
+" > etc/local.d/xen.start
+chmod +x etc/local.d/xen.start
+echo "rc_verbose=yes" >> etc/rc.conf
+find . |cpio -H newc -o|gzip > ../xen-rootfs.cpio.gz
+cd ../..
+
+cat >> binaries/pxelinux.0 <<- EOF
+#!ipxe
+
+kernel xen console=com1
+module bzImage console=hvc0
+module xen-rootfs.cpio.gz
+boot
+EOF
+
+# Run the test
+rm -f smoke.serial
+set +e
+timeout -k 1 720 \
+qemu-system-x86_64 \
+-cpu qemu64,+svm \
+-m 2G -smp 2 \
+-monitor none -serial stdio \
+-nographic \
+-device virtio-net-pci,netdev=n0 \
+-netdev user,id=n0,tftp=binaries,bootfile=/pxelinux.0 |& tee smoke.serial
+
+set -e
+(grep -q "Domain-0" smoke.serial && grep -q "BusyBox" smoke.serial) || exit 1
+exit 0
-- 
2.17.1

[PATCH 2/3] automation: Linux 5.10.74 test-artifact

2021-10-21 Thread Stefano Stabellini

From: Stefano Stabellini 

Build a 5.10 kernel to be used as Dom0 and DomU kernel for testing. This
is almost the same as the existing ARM64 recipe for Linux 5.9, the
only differences are:
- upgrade to latest 5.10.x stable
- force Xen modules to built-in (on ARM it was already done by defconfig)

Also add the exporting job to build.yaml so that the binary can be used
during gitlab-ci runs.

Signed-off-by: Stefano Stabellini 
---
 automation/gitlab-ci/build.yaml   | 11 ++
 .../tests-artifacts/kernel/5.10.74.dockerfile | 38 +++
 2 files changed, 49 insertions(+)
 create mode 100644 automation/tests-artifacts/kernel/5.10.74.dockerfile

diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
index 76b73beead..0034c50950 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
@@ -609,3 +609,14 @@ alpine-3.12-rootfs-export:
   - binaries/initrd.tar.gz
   tags:
 - x86_64
+
+kernel-5.10.74-export:
+  stage: build
+  image: registry.gitlab.com/xen-project/xen/tests-artifacts/kernel:5.10.74
+  script:
+- mkdir binaries && cp /bzImage binaries/bzImage
+  artifacts:
+paths:
+  - binaries/bzImage
+  tags:
+- x86_64
diff --git a/automation/tests-artifacts/kernel/5.10.74.dockerfile 
b/automation/tests-artifacts/kernel/5.10.74.dockerfile
new file mode 100644
index 00..f2dbbecf74
--- /dev/null
+++ b/automation/tests-artifacts/kernel/5.10.74.dockerfile
@@ -0,0 +1,38 @@
+FROM debian:unstable
+LABEL maintainer.name="The Xen Project" \
+  maintainer.email="xen-devel@lists.xenproject.org"
+
+ENV DEBIAN_FRONTEND=noninteractive
+ENV LINUX_VERSION=5.10.74
+ENV USER root
+
+RUN mkdir /build
+WORKDIR /build
+
+# build depends
+RUN apt-get update && \
+apt-get --quiet --yes install \
+build-essential \
+libssl-dev \
+bc \
+curl \
+flex \
+bison \
+libelf-dev \
+&& \
+\
+# Build the kernel
+curl -fsSLO 
https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-"$LINUX_VERSION".tar.xz && \
+tar xvJf linux-"$LINUX_VERSION".tar.xz && \
+cd linux-"$LINUX_VERSION" && \
+make defconfig && \
+make xen.config && \
+cp .config .config.orig && \
+cat .config.orig | grep XEN | grep =m |sed 's/=m/=y/g' >> .config && \
+make -j$(nproc) bzImage && \
+cp arch/x86/boot/bzImage / && \
+cd /build && \
+rm -rf linux-"$LINUX_VERSION"* && \
+apt-get autoremove -y && \
+apt-get clean && \
+rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
-- 
2.17.1

[PATCH 0/3] automation: introduce an x86_64 Dom0/DomU test

2021-10-21 Thread Stefano Stabellini

Hi all,

This small patch series introduces a new QEMU-based test to Gitlab-CI.
It uses QEMU to emulate an x86_64 machine and run Xen, Dom0 and start a
DomU. It is very similar to the existing qemu-alpine-arm64-gcc test but
based on x86_64 rather than ARM64. I think it is important because all
the other x86_64 gitlab-ci tests we currently have are more narrow and
based on XTF. This would be the first end-to-end x86_64 test in
gitlab-ci.

To make it happen, we need an Alpine Linux rootfs and a Linux kernel.
The first two patches introduce them to gitlab-ci. Note that actually
nothing will get build during gitlab-ci runs, it has already been done
beforehand and uploaded as containers. They only import *existing*
containers and binaries into a gitlab-ci run, thus, they cannot fail.
The risk to the release of the first two patches is as close to zero as
possible.

The last patch is the one introducing a new test. This one can fail.
However, it is a new test at the end of the pipeline: it doesn't impact
the existing tests. In the worst case, the test will fail and the whole
pipeline will be marked as "failed" but looking at the jobs all the
other will continue to be marked as successful. In short, if it fails,
we can simply ignore it. Also, at the moment it is actually succeeding.

Cheers,

Stefano


Stefano Stabellini (3):
  automation: add x86_64 alpine 3.12 test-artifact
  automation: Linux 5.10.74 test-artifact
  automation: add a QEMU based x86_64 Dom0/DomU test

 automation/gitlab-ci/build.yaml| 24 ++
 automation/gitlab-ci/test.yaml | 24 ++
 automation/scripts/qemu-alpine-x86_64.sh   | 92 ++
 automation/tests-artifacts/alpine/3.12.dockerfile  | 68 
 .../tests-artifacts/kernel/5.10.74.dockerfile  | 38 +
 5 files changed, 246 insertions(+)
 create mode 100644 automation/scripts/qemu-alpine-x86_64.sh
 create mode 100644 automation/tests-artifacts/alpine/3.12.dockerfile
 create mode 100644 automation/tests-artifacts/kernel/5.10.74.dockerfile

[xen-unstable-smoke test] 165719: tolerable all pass - PUSHED

2021-10-21 Thread osstest service owner

flight 165719 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/165719/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  23ec1ebc8acbfd2bf06f6085a776f0db923f9fa9
baseline version:
 xen  98f60e5de00baf650c574c9352bb19aedb082dea

Last test of basis   165708  2021-10-21 11:01:40 Z0 days
Testing same since   165719  2021-10-21 18:00:27 Z0 days1 attempts


People who touched revisions under test:
  Hongda Deng 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   98f60e5de0..23ec1ebc8a  23ec1ebc8acbfd2bf06f6085a776f0db923f9fa9 -> smoke

[ovmf test] 165714: all pass - PUSHED

2021-10-21 Thread osstest service owner

flight 165714 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/165714/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 2f286930a8280f4d817460020110009938f695b6
baseline version:
 ovmf 305fd6bee0bfe1602163d1f8841954f84aa31b68

Last test of basis   165701  2021-10-21 03:40:20 Z0 days
Testing same since   165714  2021-10-21 15:40:13 Z0 days1 attempts


People who touched revisions under test:
  Abdul Lateef Attar 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   305fd6bee0..2f286930a8  2f286930a8280f4d817460020110009938f695b6 -> 
xen-tested-master

Re: [XEN][RFC PATCH 10/13] xen/arm: Implement device tree node addition functionalities

2021-10-21 Thread Julien Grall


Hi Vikram,

On 02/09/2021 07:06, Vikram Garhwal wrote:

Introduce domctl XEN_DOMCTL_addfpga to add a device-tree node through device


XEN_DOMCTL_* are hypercalls to manage a given domain. However, here you 
are modifying the system.


This is similar to managing PCI devices, except here we are dealing with 
platforms devices. In the case of PCI devices, we are using PHYSDEVOP_*.


If we only expect the toolstack to the call (e.g. the kernel won't use 
it directly), then it would be better to use SYSCTL_* as they are not 
stable (so we have more flexibility to modify the layout).


Furthermore, rather than introduciong 2 new sub-hypercalls for 
XEN_SYSCTL_* I would introduce a single sub-hypercall that takes a 
command (e.g. add/remove).


Regarding the name, in theory this feature is not FPGA specific. So I 
would prefer a more generic name, maybe XEN_DOMCTL_dt_overlay?



tree overlay. This works with a device tree overlay(.dtbo) as input.

Add check_pfdt() to do sanity check on the dtbo.


From my experience with libfdt, the library tends to consider the DTB 
to be sound. So we would have to trust what the toolstack is provided us.


So I think we need to make clear that this is basic sanity check and 
there are no expectation that we will be able to deal with a random blob.




Also, added overlay_get_node_info() to get the node's full name with path. This
comes handy when checking node for duplication.

Each time a overlay node is added, a new fdt(memcpy of device_tree_flattened) is
created and updated with overlay node. This updated fdt is further unflattened
to a dt_host_new. Next, it checks if overlay node already exists in the dt_host.
If overlay node doesn't exist then find the overlay node in dt_host_new, find
the overlay node's parent in dt_host and add the node as child under parent in
the dt_host. The node is attached as the last node under target parent.

Finally, add IRQs, add device to IOMMUs, set permissions and map MMIO for the
overlay node.

When a node is added using overlay, a new entry is allocated in the
overlay_track to keep the track of memory allocation due to addition of overlay
node. This is helpful for freeing the memory allocated when a device tree node
is removed with domctl XEN_DOMCTL_delfpga domctl.

Signed-off-by: Vikram Garhwal 
---
  xen/arch/arm/domctl.c | 262 ++
  xen/common/device_tree.c  |  54 +
  xen/include/public/domctl.h   |   7 ++
  xen/include/xen/device_tree.h |   1 +
  4 files changed, 324 insertions(+)

diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index 5986934..0ac635f 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -15,6 +15,8 @@
  #include 
  #include 
  #include 
+/* Included for FPGA dt add. */


We don't usually write done why an header is included. So I would remove 
this comment.



+#include 
  #include 
  #include 
  #include 
@@ -68,6 +70,61 @@ static int handle_vuart_init(struct domain *d,
  return rc;
  }
  
+static int check_pfdt(void *pfdt, uint32_t pfdt_size)

+{
+if ( fdt_totalsize(pfdt) != pfdt_size )
+{
+printk(XENLOG_ERR "Partial FDT is not a valid Flat Device Tree\n");
+return -EFAULT;


I think -EINVAL is more suitable.


+}
+
+if ( fdt_check_header(pfdt) )
+{
+printk(XENLOG_ERR "Partial FDT is not a valid Flat Device Tree\n");


The printk() is exactly the same as above. So how about combining the 
two check?


Furthemore, from the commit message, the pfdt is an overlay fdt. So 
shouldn't we have write "The overlay FDT" rather than "Partial FDT"?



+return -EFAULT;
+}
+
+return 0;
+}
+
+static void overlay_get_node_info(void *fdto, char *node_full_path)
+{
+int fragment;
+
+/*
+ * Handle overlay nodes. But for now we are just handling one node.


Is this a limitation you plan to handle before this series is going to 
be committed?



+ */
+fdt_for_each_subnode(fragment, fdto, 0)
+{
+int target;
+int overlay;
+int subnode;
+const char *target_path;
+
+target = overlay_get_target(device_tree_flattened, fdto, fragment,
+_path);
+overlay = fdt_subnode_offset(fdto, fragment, "__overlay__");
+
+fdt_for_each_subnode(subnode, fdto, overlay)
+{
+const char *node_name = fdt_get_name(fdto, subnode, NULL);
+int node_name_len = strlen(node_name);
+int target_path_len = strlen(target_path);
+
+memcpy(node_full_path, target_path, target_path_len);
Looking at the caller (handle_add_fpga_overlay()), the node_full_path is 
a fixed array. What does guarantee that the array is large enough to 
target_path?



+
+node_full_path[target_path_len] = '/';
+
+memcpy(node_full_path + target_path_len + 1, node_name,
+   node_name_len);
+
+node_full_path[target_path_len + 1 +

[qemu-mainline test] 165703: regressions - FAIL

2021-10-21 Thread osstest service owner

flight 165703 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/165703/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i3866 xen-buildfail REGR. vs. 165682
 build-i386-xsm6 xen-buildfail REGR. vs. 165682
 build-armhf   6 xen-buildfail REGR. vs. 165682

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-coresched-i386-xl  1 build-check(1)   blocked  n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-amd64-i386-xl1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-pvshim 1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked 
n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-ws16-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-shadow 1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-vhd1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 165682
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 165682
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 165682
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14

[linux-linus test] 165700: tolerable FAIL - PUSHED

2021-10-21 Thread osstest service owner

flight 165700 linux-linus real [real]
flight 165715 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/165700/
http://logs.test-lab.xenproject.org/osstest/logs/165715/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-arm64-arm64-xl-credit1   8 xen-bootfail pass in 165715-retest
 test-amd64-amd64-freebsd11-amd64 21 guest-start/freebsd.repeat fail pass in 
165715-retest

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail in 165715 never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail in 165715 never 
pass
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 165693
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 165693
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 165693
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 165693
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 165693
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 165693
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 165693
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 165693
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 linux2f111a6fd5b5297b4e92f53798ca086f7c7d33a4
baseline version:
 linux8e37395c3a5dceff62a5010ebbbc107f4145935c

Last test of basis   165693  2021-10-20 16:40:19 Z1 days
Testing same since   165700  2021-10-21 03:27:38 Z0 days1 attempts


People who touched revisions under test:
  Christoph Hellwig 
  Gerald Schaefer 
  Hamza

Re: [for-4.16] Re: [PATCH v4] xen/arm: vgic: Ignore write access to ICPENDR*

2021-10-21 Thread Julien Grall





On 21/10/2021 16:14, Julien Grall wrote:
On the previous version, we discussed to include the patch for 4.16. So 
please tag it with for-4.16 and CC the Release Manager (Ian). This will 
help him to track what's outstanding for the release.


On 21/10/2021 13:03, Hongda Deng wrote:

Currently, Xen will return IO unhandled when guests write ICPENDR*
virtual registers, which will raise a data abort inside the guest.
For Linux guest, these virtual registers will not be accessed. But
for Zephyr, these virtual registers will be accessed during the
initialization. Zephyr guest will get an IO data abort and crash.
Emulating ICPENDR is not easy with the existing vGIC, this patch
reworks the emulation to ignore write access to ICPENDR* virtual
registers and print a message about whether they are already pending
instead of returning unhandled.
More details can be found at [1].

[1] https://github.com/zephyrproject-rtos/zephyr/blob/eaf6cf745df3807e6e
cc941c3a30de6c179ae359/drivers/interrupt_controller/intc_gicv3.c#L274

Signed-off-by: Hongda Deng 


While I agree the Reviewed-by from Bertrand should be dropped, the 
Release-acked-by from Ian is simply a way to say he is happy to include 
the patch for 4.16. So this should have been retain.


The patch looks good to me, so I can add Ian's tag on commit:

Reviewed-by: Julien Grall 


Committed.

Cheers,

--
Julien Grall

[ovmf test] 165701: all pass - PUSHED

2021-10-21 Thread osstest service owner

flight 165701 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/165701/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 305fd6bee0bfe1602163d1f8841954f84aa31b68
baseline version:
 ovmf 6893865b3010bb6192f732643c4b8ba026726d07

Last test of basis   165690  2021-10-20 13:40:00 Z1 days
Testing same since   165701  2021-10-21 03:40:20 Z0 days1 attempts


People who touched revisions under test:
  IanX Kuo 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   6893865b30..305fd6bee0  305fd6bee0bfe1602163d1f8841954f84aa31b68 -> 
xen-tested-master

Re: [PATCH 12/12] xen/x86: add hypercall performance counters for hvm, correct pv

2021-10-21 Thread Jan Beulich

On 15.10.2021 14:51, Juergen Gross wrote:
> The HVM hypercall handler is missing incrementing the per hypercall
> counters. Add that.
> 
> The counters for PV are handled wrong, as they are not using
> perf_incra() with the number of the hypercall as index, but are
> incrementing the total number of hypercalls only. Fix that.

Why do you say "total number"? Isn't it that all accounting goes into
set_trap_table's slot, effectively making that slot a "total number"
despite not being labeled that way?

Also this fix renders largely redundant the calls_to_multicall counter.
Could I talk you into deleting that at the same time? (As to the "not
fully redundant": I consider it suspicious that this counter gets
incremented at the bottom of the function, not at the top.)

Finally I take it that with the Kconfig setting being under DEBUG, we
don't consider security supported builds with PERF_COUNTERS enabled.
Otherwise as a prereq I would think perfc_incra() would need teaching
of array_index_nospec().

In any event, preferably with at least the description slightly
adjusted,
Reviewed-by: Jan Beulich 

Jan

[for-4.16] Re: [PATCH v4] xen/arm: vgic: Ignore write access to ICPENDR*

2021-10-21 Thread Julien Grall


Hello Hongda,

On the previous version, we discussed to include the patch for 4.16. So 
please tag it with for-4.16 and CC the Release Manager (Ian). This will 
help him to track what's outstanding for the release.


On 21/10/2021 13:03, Hongda Deng wrote:

Currently, Xen will return IO unhandled when guests write ICPENDR*
virtual registers, which will raise a data abort inside the guest.
For Linux guest, these virtual registers will not be accessed. But
for Zephyr, these virtual registers will be accessed during the
initialization. Zephyr guest will get an IO data abort and crash.
Emulating ICPENDR is not easy with the existing vGIC, this patch
reworks the emulation to ignore write access to ICPENDR* virtual
registers and print a message about whether they are already pending
instead of returning unhandled.
More details can be found at [1].

[1] https://github.com/zephyrproject-rtos/zephyr/blob/eaf6cf745df3807e6e
cc941c3a30de6c179ae359/drivers/interrupt_controller/intc_gicv3.c#L274

Signed-off-by: Hongda Deng 


While I agree the Reviewed-by from Bertrand should be dropped, the 
Release-acked-by from Ian is simply a way to say he is happy to include 
the patch for 4.16. So this should have been retain.


The patch looks good to me, so I can add Ian's tag on commit:

Reviewed-by: Julien Grall 

Cheers,

--
Julien Grall

Re: (subset) [PATCH 0/9] block: reviewed add_disk() error handling set

2021-10-21 Thread Jens Axboe

On Fri, 15 Oct 2021 16:30:19 -0700, Luis Chamberlain wrote:
> Jens,
> 
> I had last split up patches into 7 groups, but at this point now
> most changes are merged except a few more drivers. Instead of creating
> a new patch set for each of the 7 groups I'm creating 3 new groups of
> patches now:
> 
> [...]

Applied, thanks!

[3/9] dm: add add_disk() error handling
  commit: e7089f65dd51afeda5eb760506b5950d95f9ec29
[4/9] bcache: add error handling support for add_disk()
  commit: 2961c3bbcaec0ed7fb7b9a465b3796f37f2294e5
[5/9] xen-blkfront: add error handling support for add_disk()
  commit: 293a7c528803321479593d42d0898bb5a9769db1
[6/9] m68k/emu/nfblock: add error handling support for add_disk()
  commit: 21fd880d3da7564bab68979417cab7408e4f9642
[7/9] um/drivers/ubd_kern: add error handling support for add_disk()
  commit: 66638f163a2b5c5b462ca38525129b14a20117eb
[8/9] rnbd: add error handling support for add_disk()
  commit: 2e9e31bea01997450397d64da43b6675e0adb9e3
[9/9] mtd: add add_disk() error handling
  commit: 83b863f4a3f0de4ece7802d9121fed0c3e64145f

Best regards,
-- 
Jens Axboe

Re: [RFC PATCH 02/10] accel: Use qemu_security_policy_taint(), mark KVM and Xen as safe

2021-10-21 Thread Markus Armbruster

It's been a while...

Daniel P. Berrangé  writes:

> On Thu, Sep 09, 2021 at 01:20:16AM +0200, Philippe Mathieu-Daudé wrote:
>> Add the AccelClass::secure_policy_supported field to classify
>> safe (within security boundary) vs unsafe accelerators.
>> 
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  include/qemu/accel.h | 5 +
>>  accel/kvm/kvm-all.c  | 1 +
>>  accel/xen/xen-all.c  | 1 +
>>  softmmu/vl.c | 3 +++
>>  4 files changed, 10 insertions(+)
>> 
>> diff --git a/include/qemu/accel.h b/include/qemu/accel.h
>> index 4f4c283f6fc..895e30be0de 100644
>> --- a/include/qemu/accel.h
>> +++ b/include/qemu/accel.h
>> @@ -44,6 +44,11 @@ typedef struct AccelClass {
>> hwaddr start_addr, hwaddr size);
>>  #endif
>>  bool *allowed;
>> +/*
>> + * Whether the accelerator is withing QEMU security policy boundary.
>> + * See: https://www.qemu.org/contribute/security-process/
>> + */
>> +bool secure_policy_supported;
>
> The security handling policy is a high level concept that is
> open to variation over time and also by downstream distro
> vendors.

Moreover, the concept of "tainting" isn't limited to "because
security".

> At a code level we should be dealing in a more fundamental
> concept. At an accelerator level we should really jsut
> declare whether or not the accelerator impl is considered
> to be secure against malicious guest code.
>
> eg
>
> /* Whether this accelerator is secure against execution
>  * of malciious guest machine code */
> bool secure;

What I'd like to see is a separation of "assertions", like "not meant to
serve as security boundary", and policy.  Yes, this is vague.  Take it
as food for thought.

>>  /*
>>   * Array of global properties that would be applied when specific
>>   * accelerator is chosen. It works like MachineClass.compat_props
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index 0125c17edb8..eb6b9e44df2 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -3623,6 +3623,7 @@ static void kvm_accel_class_init(ObjectClass *oc, void 
>> *data)
>>  ac->init_machine = kvm_init;
>>  ac->has_memory = kvm_accel_has_memory;
>>  ac->allowed = _allowed;
>> +ac->secure_policy_supported = true;
>>  
>>  object_class_property_add(oc, "kernel-irqchip", "on|off|split",
>>  NULL, kvm_set_kernel_irqchip,
>> diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c
>> index 69aa7d018b2..57867af5faf 100644
>> --- a/accel/xen/xen-all.c
>> +++ b/accel/xen/xen-all.c
>> @@ -198,6 +198,7 @@ static void xen_accel_class_init(ObjectClass *oc, void 
>> *data)
>>  ac->setup_post = xen_setup_post;
>>  ac->allowed = _allowed;
>>  ac->compat_props = g_ptr_array_new();
>> +ac->secure_policy_supported = true;
>>  
>>  compat_props_add(ac->compat_props, compat, G_N_ELEMENTS(compat));
>>  
>> diff --git a/softmmu/vl.c b/softmmu/vl.c
>> index 92c05ac97ee..e4f94e159c3 100644
>> --- a/softmmu/vl.c
>> +++ b/softmmu/vl.c
>> @@ -2388,6 +2388,9 @@ static int do_configure_accelerator(void *opaque, 
>> QemuOpts *opts, Error **errp)
>>  return 0;
>>  }
>>  
>> +qemu_security_policy_taint(!ac->secure_policy_supported,
>> +   "%s accelerator", acc);
>
> We need this information to be introspectable, becuase stuff printed
> to stderr is essentially opaque to libvirt and mgmt apps above.
>
> We don't have a convenient "query-accel" command but I think this
> could possibly fit into 'query-target'. ie the TargetInfo struct
> gain a field:
>  
>
>   ##
>   # @TargetInfo:
>   #
>   # Information describing the QEMU target.
>   #
>   # @arch: the target architecture
>   # @secure: Whether the currently active accelerator for this target
>   #  is secure against execution of malicous guest code
>   #
>   # Since: 1.2
>   ##
>   { 'struct': 'TargetInfo',
> 'data': { 'arch': 'SysEmuTarget',
>   'secure': 'bool'} }

My preferred means of introspection is QAPI schema introspection.  For
QMP, that's query-qmp-schema.  For CLI, it doesn't exist, yet.

If it did, then it would tell us that (QAPIfied) -accel takes an
argument @accel of a certain enumeration type.  We could then tack
suitable feature flags to the enumeration type's values.

If we make the feature flags "special", i.e. known to QAPI, we can then
tie them to policy, like special feature flag 'deprecated' is tied to
policy configured with -compat deprecated-{input,output}=...

Alternatively, leave policy to the management application.

QAPI schema feature flags plus policy are is not a *complete* solution,
just like feature flag 'deprecated' and -compat are not a complete
solution for handling use of deprecated interfaces: we can and do
deprecate usage that isn't tied to a syntactic element in the QAPI
schema.  Example: commit a9b305ba291 deprecated use of socket chardev
option wait together with server=true.

It is, however, a solution for a

Re: [PATCH 10/12] xen/x86: call hypercall handlers via switch statement

2021-10-21 Thread Jan Beulich

On 15.10.2021 14:51, Juergen Gross wrote:
> Instead of using a function table use the generated switch statement
> macros for calling the appropriate hypercall handlers.
> 
> This is beneficial to performance and avoids speculation issues.
> 
> With calling the handlers using the correct number of parameters now
> it is possible to do the parameter register clobbering in the NDEBUG
> case after returning from the handler. This in turn removes the only
> users of hypercall_args_table[] which can be removed now.

"removed" reads misleading to me: You really replace it by new tables,
using script-generated initializers. Also it looks like you're doubling
the data, as the same sets were previously used by pv64/hvm64 and
pv32/hvm32 respectively.

> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -108,56 +108,10 @@ long hvm_physdev_op(int cmd, 
> XEN_GUEST_HANDLE_PARAM(void) arg)
>  return compat_physdev_op(cmd, arg);
>  }
>  
> -#define HYPERCALL(x) \
> -[ __HYPERVISOR_ ## x ] = { (hypercall_fn_t *) do_ ## x,  \
> -   (hypercall_fn_t *) do_ ## x }
> -
> -#define HVM_CALL(x)  \
> -[ __HYPERVISOR_ ## x ] = { (hypercall_fn_t *) hvm_ ## x, \
> -   (hypercall_fn_t *) hvm_ ## x }
> -
> -#define COMPAT_CALL(x)   \
> -[ __HYPERVISOR_ ## x ] = { (hypercall_fn_t *) do_ ## x,  \
> -   (hypercall_fn_t *) compat_ ## x }
> -
> -static const struct {
> -hypercall_fn_t *native, *compat;
> -} hvm_hypercall_table[] = {
> -HVM_CALL(memory_op),
> -COMPAT_CALL(multicall),
> -#ifdef CONFIG_GRANT_TABLE
> -HVM_CALL(grant_table_op),
> -#endif
> -HYPERCALL(vm_assist),
> -COMPAT_CALL(vcpu_op),
> -HVM_CALL(physdev_op),
> -COMPAT_CALL(xen_version),
> -HYPERCALL(console_io),
> -HYPERCALL(event_channel_op),
> -COMPAT_CALL(sched_op),
> -COMPAT_CALL(set_timer_op),
> -COMPAT_CALL(xsm_op),
> -HYPERCALL(hvm_op),
> -HYPERCALL(sysctl),
> -HYPERCALL(domctl),
> -#ifdef CONFIG_ARGO
> -COMPAT_CALL(argo_op),
> -#endif
> -COMPAT_CALL(platform_op),
> -#ifdef CONFIG_PV
> -COMPAT_CALL(mmuext_op),
> -#endif
> -HYPERCALL(xenpmu_op),
> -COMPAT_CALL(dm_op),
> -#ifdef CONFIG_HYPFS
> -HYPERCALL(hypfs_op),
> +#ifndef NDEBUG
> +static unsigned char hypercall_args_64[] = hypercall_args_hvm64;
> +static unsigned char hypercall_args_32[] = hypercall_args_hvm32;

Irrespective of this being debugging-only: Const?

> @@ -239,33 +176,11 @@ int hvm_hypercall(struct cpu_user_regs *regs)
>  HVM_DBG_LOG(DBG_LEVEL_HCALL, "hcall%lu(%lx, %lx, %lx, %lx, %lx)",
>  eax, rdi, rsi, rdx, r10, r8);
>  
> -#ifndef NDEBUG
> -/* Deliberately corrupt parameter regs not used by this hypercall. */
> -switch ( hypercall_args_table[eax].native )
> -{
> -case 0: rdi = 0xdeadbeefdeadf00dUL; fallthrough;
> -case 1: rsi = 0xdeadbeefdeadf00dUL; fallthrough;
> -case 2: rdx = 0xdeadbeefdeadf00dUL; fallthrough;
> -case 3: r10 = 0xdeadbeefdeadf00dUL; fallthrough;
> -case 4: r8 = 0xdeadbeefdeadf00dUL;
> -}
> -#endif
> -
> -regs->rax = hvm_hypercall_table[eax].native(rdi, rsi, rdx, r10, r8);
> +call_handlers_hvm64(eax, regs->rax, rdi, rsi, rdx, r10, r8);
>  
>  #ifndef NDEBUG
> -if ( !curr->hcall_preempted )
> -{
> -/* Deliberately corrupt parameter regs used by this hypercall. */
> -switch ( hypercall_args_table[eax].native )
> -{
> -case 5: regs->r8  = 0xdeadbeefdeadf00dUL; fallthrough;
> -case 4: regs->r10 = 0xdeadbeefdeadf00dUL; fallthrough;
> -case 3: regs->rdx = 0xdeadbeefdeadf00dUL; fallthrough;
> -case 2: regs->rsi = 0xdeadbeefdeadf00dUL; fallthrough;
> -case 1: regs->rdi = 0xdeadbeefdeadf00dUL;
> -}
> -}
> +if ( !curr->hcall_preempted && regs->rax != -ENOSYS )
> +clobber_regs(regs, hypercall_args_64[eax]);

I'm not fundamentally opposed, but sadly -ENOSYS comes back also in undue
situations, e.g. various hypercalls still produce this for "unknown
sub-function". Hence the weakened clobbering wants at least mentioning,
perhaps also justifying, in the description.

> @@ -55,4 +42,34 @@ compat_common_vcpu_op(
>  
>  #endif /* CONFIG_COMPAT */
>  
> +#ifndef NDEBUG

Hmm, I was actuall hoping for the conditional to actually live ...

> +static inline void clobber_regs(struct cpu_user_regs *regs,
> +unsigned int nargs)
> +{

... here and ...

> +/* Deliberately corrupt used parameter regs. */
> +switch ( nargs )
> +{
> +case 5: regs->r8  = 0xdeadbeefdeadf00dUL; fallthrough;
> +case 4: regs->r10 = 0xdeadbeefdeadf00dUL; fallthrough;
> +case 3: regs->rdx =

[xen-unstable test] 165699: tolerable FAIL - PUSHED

2021-10-21 Thread osstest service owner

flight 165699 xen-unstable real [real]
flight 165710 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/165699/
http://logs.test-lab.xenproject.org/osstest/logs/165710/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 12 debian-hvm-install 
fail pass in 165710-retest

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10   fail REGR. vs. 165692

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 165692
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 165692
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 165692
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 165692
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 165692
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 165692
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 165692
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 165692
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 165692
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 165692
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 165692
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 165692
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check

Re: [XEN PATCH v7 42/51] build: grab common EFI source files in arch specific dir

2021-10-21 Thread Jan Beulich

On 21.10.2021 15:54, Anthony PERARD wrote:
> On Thu, Oct 21, 2021 at 01:24:27PM +0200, Jan Beulich wrote:
>> On 21.10.2021 13:03, Anthony PERARD wrote:
>>> On Mon, Oct 18, 2021 at 10:48:26AM +0200, Jan Beulich wrote:
 On 15.10.2021 18:29, Anthony PERARD wrote:
> On Thu, Oct 14, 2021 at 10:51:44AM +0200, Jan Beulich wrote:
>> On 24.08.2021 12:50, Anthony PERARD wrote:
>>>  obj-y += boot.init.o pe.init.o ebmalloc.o runtime.o
>>>  obj-$(CONFIG_ACPI) +=  efi-dom0.init.o
>>> +
>>> +$(obj)/%.c: common/efi/%.c
>>> +   $(Q)cp -f $< $@
>>
>> In case both trees are on the same file system, trying to hardlink first
>> would seem desirable. When copying, I think you should also pass -p.
>
> I don't know if doing an hardlink is a good thing to do, I'm not sure of
> the kind of issue this could bring. As for -p, I don't think it's a good
> idea to copy the mode, ownership, and timestamps of the source file, I'd
> rather have the timestamps that Make expect, e.i. "now".

 Why would "now" be correct (or expected) in any way? The cloned file is no
 different from the original. Nevertheless I agree that -p is not ideal;
 it's just that the more fine grained option to preserve just the timestamp
 is non-standard afaik. You could try that first and fall back to -p ...
 Otherwise, failing hard linking and using "cp -p", I'm afraid I'd prefer
 symlinking despite the arguments against it that you name in the
 description.
>>>
>>> I guess I'm missing something, is there a reason to keep/copy the
>>> timestamps of the original files?
>>
>> Avoidance of confusion is my main aim here. I certainly would be puzzled
>> to see what looks like a source file to have a time stamp much newer than
>> expected.
> 
> So, there isn't really anything to do with the timestamps :-). I guess
> we could keep using symbolic links, but force update the link at every
> build.
> 
> I've tried that:
> $(obj)/%.c: $(abs_srctree)/common/efi/%.c FORCE
> $(Q)ln -nsf $< $@
> 
> and make seems happy. The link command run every time (due to adding
> FORCE), but the `CC` command isn't, so that seems good. The recipe that
> would run the `CC` command check if the prerequisite are newer than the
> target using $? so it doesn't matter if the rule that update the source
> file as run or not.

Looks okay to me.

One additional consideration, though: Linux puts a "source" link in the
build tree. If we did so as well, then that could be the only absolute
symlink that's needed. Links like the one you suggest could be relative
ones into source/. But I guess this could as well be left as a future
exercise, in case anyone cares to limit the number of absolute symlinks.

Jan

Re: [XEN PATCH v7 49/51] build: adding out-of-tree support to the xen build

2021-10-21 Thread Anthony PERARD

On Mon, Oct 18, 2021 at 12:36:45PM +0200, Jan Beulich wrote:
> On 18.10.2021 12:28, Juergen Gross wrote:
> > On 18.10.21 11:51, Anthony PERARD wrote:
> >> On Mon, Oct 18, 2021 at 11:02:20AM +0200, Jan Beulich wrote:
> >>> Oh, now I'm curious as to the why here. I thought use of $(srctree)
> >>> ought to be the exception.
> >>
> >> In Linux, the use of $(srctree) is indeed the exception. This is because
> >> we have VPATH=$(srctree), so when `make` look for a prerequisite or a
> >> target it will look first in the current directory and then in
> >> $(srctree). That works fine as long as the source tree only have sources
> >> and no built files.
> >>
> >> But if we want to be able to build the pv-shim without the linkfarm and
> >> thus using out-of-tree build, we are going to need the ability to build
> >> from a non-clean source tree. I don't think another way is possible.
> > 
> > Is there any reason (apart from historical ones) to build the hypervisor
> > in $(srctree)?
> > 
> > I could see several advantages to build it in another directory as soon
> > as the build system has this capability:
> > 
> > - possibility to have a simple build target for building multiple archs
> >(assuming the cross-tools are available), leading to probably less
> >problems with breaking the build of "the other" architecture we are
> >normally not working with (and in future with e.g. Risc-V being added
> >this will be even more important)
> > 
> > - possibility to have a debug and a non-debug build in parallel (in fact
> >at least at SUSE we are working around that by building those with an
> >intermediate "make clean" for being able to package both variants)
> > 
> > - make clean for the hypervisor part would be just a "rm -r"
> 
> I fully agree, yet ...
> 
> > Yes, this would require us (the developers) to maybe change some habits,
> > but I think this would be better than working around the issues by
> > adding $(srctree) all over the build system.
> 
> ... developers' habits would only be my second concern here (and if that
> had been the only one, then I would not see this as a reason speaking
> against the change, but as said I've never been building from the root,
> and I've also been building sort of out-of-tree all the time). Yet while
> writing this reply I came to realize that my primary concern was wrong:
> People would not need to adjust their spec files (or alike), at least
> not as long as they consume only files living under dist/.
> 
> So, Anthony - thoughts about making the default in-tree Xen build
> actually build into, say, build/xen/?

I don't think I should be the one answering this question. It would
probably be fairly easy to do.

But, I've already got some unfinished patches which add the ability to
do in-tree and out-of-tree at the same time, so I'll propose them and we
can decide whether they are worth it or if they are too much. That is,
after we have at least the ability to do out-of-tree build.

Cheers,

-- 
Anthony PERARD

[xen-unstable-smoke test] 165708: tolerable all pass - PUSHED

2021-10-21 Thread osstest service owner

flight 165708 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/165708/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  98f60e5de00baf650c574c9352bb19aedb082dea
baseline version:
 xen  118da371d1ff5d8432fa299544b1ea5e7e3710f0

Last test of basis   165691  2021-10-20 16:01:40 Z0 days
Testing same since   165708  2021-10-21 11:01:40 Z0 days1 attempts


People who touched revisions under test:
  Julien Grall 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   118da371d1..98f60e5de0  98f60e5de00baf650c574c9352bb19aedb082dea -> smoke

Re: [XEN PATCH v7 42/51] build: grab common EFI source files in arch specific dir

2021-10-21 Thread Anthony PERARD

On Thu, Oct 21, 2021 at 01:24:27PM +0200, Jan Beulich wrote:
> On 21.10.2021 13:03, Anthony PERARD wrote:
> > On Mon, Oct 18, 2021 at 10:48:26AM +0200, Jan Beulich wrote:
> >> On 15.10.2021 18:29, Anthony PERARD wrote:
> >>> On Thu, Oct 14, 2021 at 10:51:44AM +0200, Jan Beulich wrote:
>  On 24.08.2021 12:50, Anthony PERARD wrote:
> >  obj-y += boot.init.o pe.init.o ebmalloc.o runtime.o
> >  obj-$(CONFIG_ACPI) +=  efi-dom0.init.o
> > +
> > +$(obj)/%.c: common/efi/%.c
> > +   $(Q)cp -f $< $@
> 
>  In case both trees are on the same file system, trying to hardlink first
>  would seem desirable. When copying, I think you should also pass -p.
> >>>
> >>> I don't know if doing an hardlink is a good thing to do, I'm not sure of
> >>> the kind of issue this could bring. As for -p, I don't think it's a good
> >>> idea to copy the mode, ownership, and timestamps of the source file, I'd
> >>> rather have the timestamps that Make expect, e.i. "now".
> >>
> >> Why would "now" be correct (or expected) in any way? The cloned file is no
> >> different from the original. Nevertheless I agree that -p is not ideal;
> >> it's just that the more fine grained option to preserve just the timestamp
> >> is non-standard afaik. You could try that first and fall back to -p ...
> >> Otherwise, failing hard linking and using "cp -p", I'm afraid I'd prefer
> >> symlinking despite the arguments against it that you name in the
> >> description.
> > 
> > I guess I'm missing something, is there a reason to keep/copy the
> > timestamps of the original files?
> 
> Avoidance of confusion is my main aim here. I certainly would be puzzled
> to see what looks like a source file to have a time stamp much newer than
> expected.

So, there isn't really anything to do with the timestamps :-). I guess
we could keep using symbolic links, but force update the link at every
build.

I've tried that:
$(obj)/%.c: $(abs_srctree)/common/efi/%.c FORCE
$(Q)ln -nsf $< $@

and make seems happy. The link command run every time (due to adding
FORCE), but the `CC` command isn't, so that seems good. The recipe that
would run the `CC` command check if the prerequisite are newer than the
target using $? so it doesn't matter if the rule that update the source
file as run or not.

Thanks,

-- 
Anthony PERARD

Re: Tentative fix for "out of PoD memory" issue

2021-10-21 Thread Marek Marczykowski-Górecki

On Thu, Oct 21, 2021 at 01:53:06PM +0200, Juergen Gross wrote:
> Marek,
> 
> could you please test whether the attached patch is fixing your
> problem?

Sure. In fact, I made a similar patch in the meantime (attached) to
experiment with this a bit.

> BTW, I don't think this couldn't happen before kernel 5.15. I guess
> my modification to use a kernel thread instead of a workqueue just
> made the issue more probable.

I think you are right here. But this all looks still a bit weird.

1. baseline: 5.10.61 (before using kernel thread - which was backported to
stable branches).

Here the startup completes successfully (no "out of PoD memory"
issue) with memory=270MB. 

2. 5.10.61 with added boot delay patch:
The delay is about 18s and the guest boot successfully.

3. 5.10.71 with "xen/balloon: fix cancelled balloon action" but without
delay patch:
The domain is killed during startup (in the middle of fsck, I think)

4. 5.10.74 with delay patch:
The delay is about 19s and the guest boot successfully.

Now the weird part: with memory=270MB with the delay patch, the balloon
delay _fails_ - state=BP_ECANCELED, and credit is -19712 at that time.
In both thread and workqueue balloon variants. Yet, it isn't killed (*).
But with 5.10.61, even without the delay patch, the guest starts
successfully in the end.

Also, I think there was some implicit wait for initial balloon down
before. That was the main motivation for 197ecb3802c0 "xen/balloon: add
runtime control for scrubbing ballooned out pages" - because that
initial balloon down held the system startup for some long time. Sadly,
I can't find my notes from debugging that (especially if I had written
down a stacktrace _where_ exactly it was waiting)...

> I couldn't reproduce the crash you are seeing, but the introduced
> wait was 4.2 seconds on my test system (a PVH guest with 2 GB of
> memory, maxmem 6 GB).

I'm testing it on a much more aggressive setting:
 - memory: 270 MB (the minimal that is sufficient to boot the system)
 - maxmem: 4 GB

The default settings in Qubes are:
 - memory: 400 MB
 - maxmem: 4 GB

That should explains why it happens on Qubes way more often than
elsewhere.

(*) At some point during system boot, qubes memory manager kicks in and
the VM gets more memory. But it's rather late, and definitely after it is
killed with "out of PoD memory" in other cases.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
From 947818c731094a952d4955e99a23ef336daf7ab9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=

Date: Thu, 21 Oct 2021 01:10:21 +0200
Subject: [PATCH] WIP: xen/balloon: wait for initial balloon down before
 starting userspace
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Organization: Invisible Things Lab
Cc: Marek Marczykowski-Górecki 

When HVM/PVH guest with maxmem > memory, a populate on demand feature is
used. This allows the guest to see up to 'maxmem' memory, but when it
tries to use more than 'memory', it is crashed. Balloon driver should
prevent that by ballooning down the guest before it tries to use too
much memory. Unfortunately, this was done asynchronously and it wasn't
really guaranteed to be quick enough. And indeed, with recent kernel
versions, the initial balloon down process is slower and guests with
small initial 'memory' are crashed frequently by Xen.

Fix this by adding late init call that waits for the initial balloon
down to complete, before allowing any userspace to run. If that initial
balloon down fails, it is very likely that guest will be killed (as soon
as it will really use all the memory that something has allocated) -
print a message about that to aid diagnosing issues.

Signed-off-by: Marek Marczykowski-Górecki 
(cherry picked from commit 9a226e669c918c98ee603ee30a1798da6434a423)
---
 drivers/xen/balloon.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index b57b2067ecbf..c2a4e25a14dc 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -56,6 +56,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -136,6 +137,8 @@ static DEFINE_MUTEX(balloon_mutex);
 struct balloon_stats balloon_stats;
 EXPORT_SYMBOL_GPL(balloon_stats);

+static DECLARE_COMPLETION(initial_balloon);
+
 /* We increase/decrease in batches which fit in a page */
 static xen_pfn_t frame_list[PAGE_SIZE / sizeof(xen_pfn_t)];

@@ -501,7 +504,6 @@ static void balloon_process(struct work_struct *work)
enum bp_state state = BP_DONE;
long credit;

-
do {
mutex_lock(_mutex);

@@ -526,6 +528,15 @@ static void balloon_process(struct work_struct *work)

state = update_schedule(state);

+   if (credit >= 0)
+   complete(_balloon);
+   else if (state == BP_ECANCELED) {
+   if (!completion_done(_balloon) &&

Re: xen/arm: Missing appropriate locking for the IOMMU (WAS Re: [PATCH v5 02/11] xen/arm: Add PHYSDEVOP_pci_device_(*add/remove) support for ARM)

2021-10-21 Thread Bertrand Marquis

Hi Julien,

> On 21 Oct 2021, at 14:47, Julien Grall  wrote:
> 
> 
> 
> On 21/10/2021 14:15, Bertrand Marquis wrote:
>> Hi Julien,
> 
> Hi Bertand,
> 
>>> On 21 Oct 2021, at 10:28, Julien Grall  wrote:
>>> 
>>> Hi all,
>>> 
>>> While going through the passthrough code. I noticed that we don't have a 
>>> common lock for the IOMMU between the PCI and DT code.
>>> 
>>> This is going to be an issue given it would technically be possible to add 
>>> a PCI device while assigning a DT.
>>> 
>>> Rahul, Bertrand, Oleksandr, can you have a look at the issue?
>> Yes we can have a look at this.
>> Right now pci device add is done by dom0 so I do not think we have an issue 
>> in practice unless I wrongly understood something
> This will depend on the XSM policy. With the default one, then yes I agree 
> that only dom0 (we don't support hardware domain) can add PCI device.
> 
> However, this restriction doesn't really matter here. You would be relying on 
> dom0 to do the locking and AFAIK this doesn't exist. Instead, the admin would 
> have to ensure that two don't happen at the same time.
> 
> Anyway, I think Xen should take care of preventing concurrent IOMMU 
> operations rather than relying on external subsystem (e.g. dom0) to do it. At 
> least the Arm SMMU driver will rely the generic locking to modify atomically 
> internal list.

Agree, was just trying to make sure I understood the problem correctly.
We will check that.

Cheers
Bertrand

> 
> Cheers,
> 
> -- 
> Julien Grall

Re: xen/arm: Missing appropriate locking for the IOMMU (WAS Re: [PATCH v5 02/11] xen/arm: Add PHYSDEVOP_pci_device_(*add/remove) support for ARM)

2021-10-21 Thread Julien Grall





On 21/10/2021 14:15, Bertrand Marquis wrote:

Hi Julien,


Hi Bertand,


On 21 Oct 2021, at 10:28, Julien Grall  wrote:

Hi all,

While going through the passthrough code. I noticed that we don't have a common 
lock for the IOMMU between the PCI and DT code.

This is going to be an issue given it would technically be possible to add a 
PCI device while assigning a DT.

Rahul, Bertrand, Oleksandr, can you have a look at the issue?


Yes we can have a look at this.

Right now pci device add is done by dom0 so I do not think we have an issue in 
practice unless I wrongly understood something
This will depend on the XSM policy. With the default one, then yes I 
agree that only dom0 (we don't support hardware domain) can add PCI device.


However, this restriction doesn't really matter here. You would be 
relying on dom0 to do the locking and AFAIK this doesn't exist. Instead, 
the admin would have to ensure that two don't happen at the same time.


Anyway, I think Xen should take care of preventing concurrent IOMMU 
operations rather than relying on external subsystem (e.g. dom0) to do 
it. At least the Arm SMMU driver will rely the generic locking to modify 
atomically internal list.


Cheers,

--
Julien Grall

Re: Ping: [PATCH] x86/xstate: reset cached register values on resume

2021-10-21 Thread Roger Pau Monné

On Mon, Oct 18, 2021 at 10:21:28AM +0200, Jan Beulich wrote:
> On 24.08.2021 23:11, Andrew Cooper wrote:
> > On 18/08/2021 13:44, Andrew Cooper wrote:
> >> On 18/08/2021 12:30, Marek Marczykowski-Górecki wrote:
> >>> set_xcr0() and set_msr_xss() use cached value to avoid setting the
> >>> register to the same value over and over. But suspend/resume implicitly
> >>> reset the registers and since percpu areas are not deallocated on
> >>> suspend anymore, the cache gets stale.
> >>> Reset the cache on resume, to ensure the next write will really hit the
> >>> hardware. Choose value 0, as it will never be a legitimate write to
> >>> those registers - and so, will force write (and cache update).
> >>>
> >>> Note the cache is used io get_xcr0() and get_msr_xss() too, but:
> >>> - set_xcr0() is called few lines below in xstate_init(), so it will
> >>>   update the cache with appropriate value
> >>> - get_msr_xss() is not used anywhere - and thus not before any
> >>>   set_msr_xss() that will fill the cache
> >>>
> >>> Fixes: aca2a985a55a "xen: don't free percpu areas during suspend"
> >>> Signed-off-by: Marek Marczykowski-Górecki 
> >>> 
> >> I'd prefer to do this differently.  As I said in the thread, there are
> >> other registers such as MSR_TSC_AUX which fall into the same category,
> >> and I'd like to make something which works systematically.
> > 
> > Ok - after some searching, I think we have problems with:
> > 
> > cpu/common.c:47:DEFINE_PER_CPU(struct cpuidmasks, cpuidmasks);
> > cpu/common.c:120:static DEFINE_PER_CPU(uint64_t, msr_misc_features);
> > msr.c:35:DEFINE_PER_CPU(uint32_t, tsc_aux);
> > xstate.c:36:static DEFINE_PER_CPU(uint64_t, xcr0);
> > xstate.c:79:static DEFINE_PER_CPU(uint64_t, xss);
> > 
> > There is also:
> > 
> > traps.c:100:DEFINE_PER_CPU(uint64_t, efer);
> > 
> > which we *almost* handle correctly, but fail to update the cache on the
> > BSP out of S3.
> > 
> > 
> > For the APIC, I think we have issues with:
> > 
> > irq.c:1083:static DEFINE_PER_CPU(struct pending_eoi,
> > pending_eoi[NR_DYNAMIC_VECTORS]);
> > 
> > because we don't defer S3 until all pending EOIs are complete.
> 
> As your planned more extensive rework appears to not have made much
> progress yet, may I suggest that we go with Marek's fix for 4.16,
> with the one adjustment I suggested alongside giving my R-b?

I think that's the only viable solution in order to avoid shipping a
broken 4.16 so we should go ahead with it.

Thanks, Roger.

Re: xen/arm: Missing appropriate locking for the IOMMU (WAS Re: [PATCH v5 02/11] xen/arm: Add PHYSDEVOP_pci_device_(*add/remove) support for ARM)

2021-10-21 Thread Bertrand Marquis

Hi Julien,

> On 21 Oct 2021, at 10:28, Julien Grall  wrote:
> 
> Hi all,
> 
> While going through the passthrough code. I noticed that we don't have a 
> common lock for the IOMMU between the PCI and DT code.
> 
> This is going to be an issue given it would technically be possible to add a 
> PCI device while assigning a DT.
> 
> Rahul, Bertrand, Oleksandr, can you have a look at the issue?

Yes we can have a look at this.

Right now pci device add is done by dom0 so I do not think we have an issue in 
practice unless I wrongly understood something.
But for sure in theory yes we need to look at this.

Cheers
Bertrand 

> 
> Cheers,
> 
> On 06/10/2021 18:40, Rahul Singh wrote:
>> Hardware domain is in charge of doing the PCI enumeration and will
>> discover the PCI devices and then will communicate to XEN via hyper
>> call PHYSDEVOP_pci_device_add(..) to add the PCI devices in XEN.
>> Also implement PHYSDEVOP_pci_device_remove(..) to remove the PCI device.
>> As most of the code for PHYSDEVOP_pci_device_* is the same between x86
>> and ARM, move the code to a common file to avoid duplication.
>> There are other PHYSDEVOP_pci_device_* operations to add PCI devices.
>> Currently implemented PHYSDEVOP_pci_device_remove(..) and
>> PHYSDEVOP_pci_device_add(..) only as those are minimum required to
>> support PCI passthrough on ARM.
>> Signed-off-by: Rahul Singh 
>> ---
>> Change in v5:
>> - Move the pci_physdev_op() stub to xen/arch/arm/physdev.c.
>> Change in v4:
>> - Move file commom/physdev.c to drivers/pci/physdev.c
>> - minor comments.
>> Change in v3: Fixed minor comment.
>> Change in v2:
>> - Add support for PHYSDEVOP_pci_device_remove()
>> - Move code to common code
>> ---
>> ---
>>  xen/arch/arm/physdev.c|  6 ++-
>>  xen/arch/x86/physdev.c| 52 +--
>>  xen/arch/x86/x86_64/physdev.c |  2 +-
>>  xen/drivers/pci/Makefile  |  1 +
>>  xen/drivers/pci/physdev.c | 80 +++
>>  xen/include/public/arch-arm.h |  4 +-
>>  xen/include/xen/hypercall.h   |  4 ++
>>  7 files changed, 96 insertions(+), 53 deletions(-)
>>  create mode 100644 xen/drivers/pci/physdev.c
>> diff --git a/xen/arch/arm/physdev.c b/xen/arch/arm/physdev.c
>> index e91355fe22..f9aa274dda 100644
>> --- a/xen/arch/arm/physdev.c
>> +++ b/xen/arch/arm/physdev.c
>> @@ -8,13 +8,17 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>> +#include 
>>  int do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  {
>> +#ifdef CONFIG_HAS_PCI
>> +return pci_physdev_op(cmd, arg);
>> +#else
>>  gdprintk(XENLOG_DEBUG, "PHYSDEVOP cmd=%d: not implemented\n", cmd);
>>  return -ENOSYS;
>> +#endif
>>  }
>>/*
>> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
>> index 23465bcd00..ea38be8b79 100644
>> --- a/xen/arch/x86/physdev.c
>> +++ b/xen/arch/x86/physdev.c
>> @@ -12,7 +12,7 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -480,54 +480,6 @@ ret_t do_physdev_op(int cmd, 
>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>  break;
>>  }
>>  -case PHYSDEVOP_pci_device_add: {
>> -struct physdev_pci_device_add add;
>> -struct pci_dev_info pdev_info;
>> -nodeid_t node;
>> -
>> -ret = -EFAULT;
>> -if ( copy_from_guest(, arg, 1) != 0 )
>> -break;
>> -
>> -pdev_info.is_extfn = !!(add.flags & XEN_PCI_DEV_EXTFN);
>> -if ( add.flags & XEN_PCI_DEV_VIRTFN )
>> -{
>> -pdev_info.is_virtfn = 1;
>> -pdev_info.physfn.bus = add.physfn.bus;
>> -pdev_info.physfn.devfn = add.physfn.devfn;
>> -}
>> -else
>> -pdev_info.is_virtfn = 0;
>> -
>> -if ( add.flags & XEN_PCI_DEV_PXM )
>> -{
>> -uint32_t pxm;
>> -size_t optarr_off = offsetof(struct physdev_pci_device_add, 
>> optarr) /
>> -sizeof(add.optarr[0]);
>> -
>> -if ( copy_from_guest_offset(, arg, optarr_off, 1) )
>> -break;
>> -
>> -node = pxm_to_node(pxm);
>> -}
>> -else
>> -node = NUMA_NO_NODE;
>> -
>> -ret = pci_add_device(add.seg, add.bus, add.devfn, _info, node);
>> -break;
>> -}
>> -
>> -case PHYSDEVOP_pci_device_remove: {
>> -struct physdev_pci_device dev;
>> -
>> -ret = -EFAULT;
>> -if ( copy_from_guest(, arg, 1) != 0 )
>> -break;
>> -
>> -ret = pci_remove_device(dev.seg, dev.bus, dev.devfn);
>> -break;
>> -}
>> -
>>  case PHYSDEVOP_prepare_msix:
>>  case PHYSDEVOP_release_msix: {
>>  struct physdev_pci_device dev;
>> @@ -663,7 +615,7 @@ ret_t do_physdev_op(int cmd, 
>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>  }
>>default:
>> -ret = -ENOSYS;
>> +ret = pci_physdev_op(cmd, arg);
>>  break;
>>  }
>>  diff --git

[PATCH net-next v2 01/12] net: xen: use eth_hw_addr_set()

2021-10-21 Thread Jakub Kicinski

Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.

Signed-off-by: Jakub Kicinski 
---
CC: wei@kernel.org
CC: p...@xen.org
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
CC: sstabell...@kernel.org
CC: xen-devel@lists.xenproject.org
---
 drivers/net/xen-netback/interface.c | 6 --
 drivers/net/xen-netfront.c  | 4 +++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c 
b/drivers/net/xen-netback/interface.c
index c58996c1e230..fe8e21ad8ed9 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -494,6 +494,9 @@ static const struct net_device_ops xenvif_netdev_ops = {
 struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
unsigned int handle)
 {
+   static const u8 dummy_addr[ETH_ALEN] = {
+   0xfe, 0xff, 0xff, 0xff, 0xff, 0xff,
+   };
int err;
struct net_device *dev;
struct xenvif *vif;
@@ -551,8 +554,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t 
domid,
 * stolen by an Ethernet bridge for STP purposes.
 * (FE:FF:FF:FF:FF:FF)
 */
-   eth_broadcast_addr(dev->dev_addr);
-   dev->dev_addr[0] &= ~0x01;
+   eth_hw_addr_set(dev, dummy_addr);
 
netif_carrier_off(dev);
 
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index e31b98403f31..57437e4b8a94 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -2157,6 +2157,7 @@ static int talk_to_netback(struct xenbus_device *dev,
unsigned int max_queues = 0;
struct netfront_queue *queue = NULL;
unsigned int num_queues = 1;
+   u8 addr[ETH_ALEN];
 
info->netdev->irq = 0;
 
@@ -2170,11 +2171,12 @@ static int talk_to_netback(struct xenbus_device *dev,
"feature-split-event-channels", 0);
 
/* Read mac addr. */
-   err = xen_net_read_mac(dev, info->netdev->dev_addr);
+   err = xen_net_read_mac(dev, addr);
if (err) {
xenbus_dev_fatal(dev, err, "parsing %s/mac", dev->nodename);
goto out_unlocked;
}
+   eth_hw_addr_set(info->netdev, addr);
 
info->netback_has_xdp_headroom = 
xenbus_read_unsigned(info->xbdev->otherend,
  
"feature-xdp-headroom", 0);
-- 
2.31.1

[PATCH v4] xen/arm: vgic: Ignore write access to ICPENDR*

2021-10-21 Thread Hongda Deng

Currently, Xen will return IO unhandled when guests write ICPENDR*
virtual registers, which will raise a data abort inside the guest.
For Linux guest, these virtual registers will not be accessed. But
for Zephyr, these virtual registers will be accessed during the
initialization. Zephyr guest will get an IO data abort and crash.
Emulating ICPENDR is not easy with the existing vGIC, this patch
reworks the emulation to ignore write access to ICPENDR* virtual
registers and print a message about whether they are already pending
instead of returning unhandled.
More details can be found at [1].

[1] https://github.com/zephyrproject-rtos/zephyr/blob/eaf6cf745df3807e6e
cc941c3a30de6c179ae359/drivers/interrupt_controller/intc_gicv3.c#L274

Signed-off-by: Hongda Deng 
---
Changes since v3:
 *  Commit message modification
 *  Change "goto write_ignore_32" to "goto write_ignore" to avoid double
checking dabt.size
 *  Delete data.size check in vgic_v3_rdistr_sgi_mmio_write to avoid
double check in __vgic_v3_distr_common_mmio_write for SGI
 *  Declare flags, p, v_target within the loop to reduce their scope
 *  Use the same vgic_get_target_vcpu(v, irq) to get v_target for SPI,
PPI and SGI
 *  Code principle modification
Changes since v2:
 *  Avoid to print messages when there is no pending interrupt
 *  Add helper vgic_check_inflight_irqs_pending to check pending status
 *  Print a message for each interrupt separately
Changes since v1:
 *  Check pending states by going through vcpu->arch.vgic.inflight_irqs
instead of checking hardware registers
---
 xen/arch/arm/vgic-v2.c | 10 ++
 xen/arch/arm/vgic-v3.c | 17 -
 xen/arch/arm/vgic.c| 28 
 xen/include/asm-arm/vgic.h |  2 ++
 4 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index b2da886adc..589c033eda 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -481,10 +481,12 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, 
mmio_info_t *info,
 
 case VRANGE32(GICD_ICPENDR, GICD_ICPENDRN):
 if ( dabt.size != DABT_WORD ) goto bad_width;
-printk(XENLOG_G_ERR
-   "%pv: vGICD: unhandled word write %#"PRIregister" to 
ICPENDR%d\n",
-   v, r, gicd_reg - GICD_ICPENDR);
-return 0;
+rank = vgic_rank_offset(v, 1, gicd_reg - GICD_ICPENDR, DABT_WORD);
+if ( rank == NULL ) goto write_ignore;
+
+vgic_check_inflight_irqs_pending(v->domain, v, rank->index, r);
+
+goto write_ignore;
 
 case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVERN):
 if ( dabt.size != DABT_WORD ) goto bad_width;
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index cb5a70c42e..65bb7991a6 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -817,10 +817,12 @@ static int __vgic_v3_distr_common_mmio_write(const char 
*name, struct vcpu *v,
 
 case VRANGE32(GICD_ICPENDR, GICD_ICPENDRN):
 if ( dabt.size != DABT_WORD ) goto bad_width;
-printk(XENLOG_G_ERR
-   "%pv: %s: unhandled word write %#"PRIregister" to ICPENDR%d\n",
-   v, name, r, reg - GICD_ICPENDR);
-return 0;
+rank = vgic_rank_offset(v, 1, reg - GICD_ICPENDR, DABT_WORD);
+if ( rank == NULL ) goto write_ignore;
+
+vgic_check_inflight_irqs_pending(v->domain, v, rank->index, r);
+
+goto write_ignore;
 
 case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVERN):
 if ( dabt.size != DABT_WORD ) goto bad_width;
@@ -986,11 +988,8 @@ static int vgic_v3_rdistr_sgi_mmio_write(struct vcpu *v, 
mmio_info_t *info,
  info, gicr_reg, r);
 
 case VREG32(GICR_ICPENDR0):
-if ( dabt.size != DABT_WORD ) goto bad_width;
-printk(XENLOG_G_ERR
-   "%pv: vGICR: SGI: unhandled word write %#"PRIregister" to 
ICPENDR0\n",
-   v, r);
-return 0;
+return __vgic_v3_distr_common_mmio_write("vGICR: SGI", v,
+ info, gicr_reg, r);
 
 case VREG32(GICR_IGRPMODR0):
 /* We do not implement security extensions for guests, write ignore */
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 8f9400a519..83386cf3d5 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -726,6 +726,34 @@ unsigned int vgic_max_vcpus(unsigned int 
domctl_vgic_version)
 }
 }
 
+void vgic_check_inflight_irqs_pending(struct domain *d, struct vcpu *v,
+  unsigned int rank, uint32_t r)
+{
+const unsigned long mask = r;
+unsigned int i;
+
+for_each_set_bit( i, , 32 )
+{
+struct pending_irq *p;
+struct vcpu *v_target;
+unsigned long flags;
+unsigned int irq = i + 32 * rank;
+
+v_target = vgic_get_target_vcpu(v, irq);
+
+spin_lock_irqsave(_target->arch.vgic.lock, flags);
+
+p =

Tentative fix for "out of PoD memory" issue

2021-10-21 Thread Juergen Gross


Marek,

could you please test whether the attached patch is fixing your
problem?

BTW, I don't think this couldn't happen before kernel 5.15. I guess
my modification to use a kernel thread instead of a workqueue just
made the issue more probable.

I couldn't reproduce the crash you are seeing, but the introduced
wait was 4.2 seconds on my test system (a PVH guest with 2 GB of
memory, maxmem 6 GB).


Juergen
From 3ee35f6f110e2258ec94f0d1397fac8c26b41761 Mon Sep 17 00:00:00 2001
From: Juergen Gross 
To: linux-ker...@vger.kernel.org
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: xen-devel@lists.xenproject.org
Date: Thu, 21 Oct 2021 12:51:06 +0200
Subject: [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning
 done
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When running as PVH or HVM guest with actual memory < max memory the
hypervisor is using "populate on demand" in order to allow the guest
to balloon down from its maximum memory size. For this to work
correctly the guest must not touch more memory pages than its target
memory size as otherwise the PoD cache will be exhausted and the guest
is crashed as a result of that.

In extreme cases ballooning down might not be finished today before
the init process is started, which can consume lots of memory.

In order to avoid random boot crashes in such cases, add a late init
call to wait for ballooning down having finished for PVH/HVM guests.

Reported-by: Marek Marczykowski-Górecki 
Signed-off-by: Juergen Gross 
---
 drivers/xen/balloon.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 3a50f097ed3e..d19b851c3d3b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -765,3 +765,23 @@ static int __init balloon_init(void)
 	return 0;
 }
 subsys_initcall(balloon_init);
+
+static int __init balloon_wait_finish(void)
+{
+	if (!xen_domain())
+		return -ENODEV;
+
+	/* PV guests don't need to wait. */
+	if (xen_pv_domain() || !current_credit())
+		return 0;
+
+	pr_info("Waiting for initial ballooning down having finished.\n");
+
+	while (current_credit())
+		schedule_timeout_interruptible(HZ / 10);
+
+	pr_info("Initial ballooning down finished.\n");
+
+	return 0;
+}
+late_initcall_sync(balloon_wait_finish);
-- 
2.26.2



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature

[libvirt test] 165702: regressions - FAIL

2021-10-21 Thread osstest service owner

flight 165702 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/165702/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  2772162316dc4938cd60d19e14dabe4371188c05
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  468 days
Failing since151818  2020-07-11 04:18:52 Z  467 days  453 attempts
Testing same since   165702  2021-10-21 04:18:54 Z0 days1 attempts


People who touched revisions under test:
Adolfo Jayme Barrientos 
  Aleksandr Alekseev 
  Aleksei Zakharov 
  Andika Triwidada 
  Andrea Bolognani 
  Ani Sinha 
  Balázs Meskó 
  Barrett Schonefeld 
  Bastian Germann 
  Bastien Orivel 
  BiaoXiang Ye 
  Bihong Yu 
  Binfeng Wu 
  Bjoern Walk 
  Boris Fiuczynski 
  Brian Turek 
  Bruno Haible 
  Chris Mayo 
  Christian Borntraeger 
  Christian Ehrhardt 
  Christian Kirbach 
  Christian Schoenebeck 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Cédric Bosdonnat 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Didik Supriadi 
  dinglimin 
  Dmytro Linkin 
  Eiichi Tsukata 
  Eric Farman 
  Erik Skultety 
  Fabian Affolter 
  Fabian Freyer 
  Fabiano Fidêncio 
  Fangge Jin 
  Farhan Ali 
  Fedora Weblate Translation 
  Gavi Teitz 
  gongwei 
  Guoyi Tu
  Göran Uddeborg 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Hela Basa 
  Helmut Grohne 
  Hiroki Narukawa 
  Ian Wienand 
  Jakob Meng 
  Jamie Strandboge 
  Jamie Strandboge 
  Jan Kuparinen 
  jason lee 
  Jean-Baptiste Holcroft 
  Jia Zhou 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jinsheng Zhang 
  Jiri Denemark 
  John Ferlan 
  Jonathan Watt 
  Jonathon Jongsma 
  Julio Faracco 
  Justin Gatzen 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Kristina Hanicova 
  Laine Stump 
  Laszlo Ersek 
  Lee Yarwood 
  Lei Yang 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Lin Ma 
  Liu Yiding 
  Luke Yue 
  Luyao Zhong 
  Marc Hartmayer 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Markus Schade 
  Martin Kletzander 
  Masayoshi Mizuma 
  Matej Cepl 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Meina Li 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Moshe Levi 
  Muha Aliss 
  Nathan 
  Neal Gompa 
  Nick Chevsky 
  Nick Shyrokovskiy 
  Nickys Music Group 
  Nico Pache 
  Nikolay Shirokovskiy 
  Olaf Hering 
  Olesya Gerasimenko 
  Orion Poplawski 
  Pany 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peng Liang 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Praveen K Paladugu 
  Richard W.M. Jones 
  Ricky Tigg 
  Robin Lee 
  Roman Bogorodskiy 
  Roman Bolshakov 
  Ryan Gahagan 
  Ryan Schmidt 
  Sam Hartman 
  Scott Shambarger 
  Sebastian Mitterle 
  SeongHyun Jo 
  Shalini Chellathurai Saroja 
  Shaojun Yang 
  Shi Lei 
  simmon 
  Simon Chopin 
  Simon Gaiser 
  Simon Rowe 
  Stefan Bader 
  Stefan Berger 
  Stefan Berger 
  Stefan Hajnoczi 
  Stefan Hajnoczi 
  Szymon Scholz 
  Thomas Huth 
  Tim Wiederhake 
  Tomáš Golembiovský 
  Tomáš Janoušek 
  Tuguoyi 
  Victor Toso 
  Ville Skyttä

Re: [XEN PATCH v7 42/51] build: grab common EFI source files in arch specific dir

2021-10-21 Thread Jan Beulich

On 21.10.2021 13:03, Anthony PERARD wrote:
> On Mon, Oct 18, 2021 at 10:48:26AM +0200, Jan Beulich wrote:
>> On 15.10.2021 18:29, Anthony PERARD wrote:
>>> On Thu, Oct 14, 2021 at 10:51:44AM +0200, Jan Beulich wrote:
 On 24.08.2021 12:50, Anthony PERARD wrote:
> --- a/xen/arch/arm/efi/Makefile
> +++ b/xen/arch/arm/efi/Makefile
> @@ -1,4 +1,10 @@
>  CFLAGS-y += -fshort-wchar
> +CFLAGS-y += -I$(srctree)/common/efi

 Perhaps another opportunity for -iquote?
>>>
>>> Yes.
>>>
>  obj-y += boot.init.o pe.init.o ebmalloc.o runtime.o
>  obj-$(CONFIG_ACPI) +=  efi-dom0.init.o
> +
> +$(obj)/%.c: common/efi/%.c
> + $(Q)cp -f $< $@

 In case both trees are on the same file system, trying to hardlink first
 would seem desirable. When copying, I think you should also pass -p.
>>>
>>> I don't know if doing an hardlink is a good thing to do, I'm not sure of
>>> the kind of issue this could bring. As for -p, I don't think it's a good
>>> idea to copy the mode, ownership, and timestamps of the source file, I'd
>>> rather have the timestamps that Make expect, e.i. "now".
>>
>> Why would "now" be correct (or expected) in any way? The cloned file is no
>> different from the original. Nevertheless I agree that -p is not ideal;
>> it's just that the more fine grained option to preserve just the timestamp
>> is non-standard afaik. You could try that first and fall back to -p ...
>> Otherwise, failing hard linking and using "cp -p", I'm afraid I'd prefer
>> symlinking despite the arguments against it that you name in the
>> description.
> 
> I guess I'm missing something, is there a reason to keep/copy the
> timestamps of the original files?

Avoidance of confusion is my main aim here. I certainly would be puzzled
to see what looks like a source file to have a time stamp much newer than
expected.

Jan

Re: [XEN PATCH v7 42/51] build: grab common EFI source files in arch specific dir

2021-10-21 Thread Anthony PERARD

On Mon, Oct 18, 2021 at 10:48:26AM +0200, Jan Beulich wrote:
> On 15.10.2021 18:29, Anthony PERARD wrote:
> > On Thu, Oct 14, 2021 at 10:51:44AM +0200, Jan Beulich wrote:
> >> On 24.08.2021 12:50, Anthony PERARD wrote:
> >>> --- a/xen/arch/arm/efi/Makefile
> >>> +++ b/xen/arch/arm/efi/Makefile
> >>> @@ -1,4 +1,10 @@
> >>>  CFLAGS-y += -fshort-wchar
> >>> +CFLAGS-y += -I$(srctree)/common/efi
> >>
> >> Perhaps another opportunity for -iquote?
> > 
> > Yes.
> > 
> >>>  obj-y += boot.init.o pe.init.o ebmalloc.o runtime.o
> >>>  obj-$(CONFIG_ACPI) +=  efi-dom0.init.o
> >>> +
> >>> +$(obj)/%.c: common/efi/%.c
> >>> + $(Q)cp -f $< $@
> >>
> >> In case both trees are on the same file system, trying to hardlink first
> >> would seem desirable. When copying, I think you should also pass -p.
> > 
> > I don't know if doing an hardlink is a good thing to do, I'm not sure of
> > the kind of issue this could bring. As for -p, I don't think it's a good
> > idea to copy the mode, ownership, and timestamps of the source file, I'd
> > rather have the timestamps that Make expect, e.i. "now".
> 
> Why would "now" be correct (or expected) in any way? The cloned file is no
> different from the original. Nevertheless I agree that -p is not ideal;
> it's just that the more fine grained option to preserve just the timestamp
> is non-standard afaik. You could try that first and fall back to -p ...
> Otherwise, failing hard linking and using "cp -p", I'm afraid I'd prefer
> symlinking despite the arguments against it that you name in the
> description.

I guess I'm missing something, is there a reason to keep/copy the
timestamps of the original files?

> Might be good to have someone else's view here as well.

Indeed.

-- 
Anthony PERARD

Re: [PATCH v4 1/6] x86/PVH: improve Dom0 memory size calculation

2021-10-21 Thread Jan Beulich

On 29.09.2021 15:13, Jan Beulich wrote:
> @@ -337,53 +336,65 @@ unsigned long __init dom0_compute_nr_pag
>  avail -= d->max_vcpus - 1;
>  
>  /* Reserve memory for iommu_dom0_init() (rough estimate). */
> -if ( is_iommu_enabled(d) )
> +if ( is_iommu_enabled(d) && !iommu_hwdom_passthrough )
>  {
>  unsigned int s;
>  
>  for ( s = 9; s < BITS_PER_LONG; s += 9 )
> -avail -= max_pdx >> s;
> +iommu_pages += max_pdx >> s;
> +
> +avail -= iommu_pages;
> +}
> +
> +nr_pages = get_memsize(_size, avail);
> +
> +/*
> + * If allocation isn't specified, reserve 1/16th of available memory for
> + * things like DMA buffers. This reservation is clamped to a maximum of
> + * 128MB.
> + */
> +if ( !nr_pages )
> +{
> +nr_pages = avail - (pv_shim ? pv_shim_mem(avail)
> +: min(avail / 16, 128UL << (20 - PAGE_SHIFT)));
> +if ( paging_mode_enabled(d) )
> +/*
> + * Temporary workaround message until internal (paging) memory
> + * accounting required to build a pvh dom0 is improved.
> + */
> +printk("WARNING: PVH dom0 without dom0_mem set is still 
> unstable. "
> +   "If you get crashes during boot, try adding a dom0_mem 
> parameter\n");
>  }
>  
> -need_paging = is_hvm_domain(d) &&
> -(!iommu_use_hap_pt(d) || !paging_mode_hap(d));
> -for ( ; ; need_paging = false )
> +if ( paging_mode_enabled(d) || opt_dom0_shadow )
>  {
> -nr_pages = get_memsize(_size, avail);
> -min_pages = get_memsize(_min_size, avail);
> -max_pages = get_memsize(_max_size, avail);
> +unsigned long cpu_pages;
>  
>  /*
> - * If allocation isn't specified, reserve 1/16th of available memory
> - * for things like DMA buffers. This reservation is clamped to a
> - * maximum of 128MB.
> + * Clamp according to min/max limits and available memory
> + * (preliminary).
>   */
> -if ( !nr_pages )
> -{
> -nr_pages = avail - (pv_shim ? pv_shim_mem(avail)
> - : min(avail / 16, 128UL << (20 - 
> PAGE_SHIFT)));

Just FYI that I've noticed only now that moving this only up is
not enough; the same also ...

> -if ( is_hvm_domain(d) && !need_paging )
> -/*
> - * Temporary workaround message until internal (paging) 
> memory
> - * accounting required to build a pvh dom0 is improved.
> - */
> -printk("WARNING: PVH dom0 without dom0_mem set is still 
> unstable. "
> -   "If you get crashes during boot, try adding a 
> dom0_mem parameter\n");
> -}
> -
> -
> -/* Clamp according to min/max limits and available memory. */
> -nr_pages = max(nr_pages, min_pages);
> -nr_pages = min(nr_pages, max_pages);
> +nr_pages = max(nr_pages, get_memsize(_min_size, avail));
> +nr_pages = min(nr_pages, get_memsize(_max_size, avail));
>  nr_pages = min(nr_pages, avail);
>  
> -if ( !need_paging )
> -break;
> +cpu_pages = dom0_paging_pages(d, nr_pages);
>  
> -/* Reserve memory for shadow or HAP. */
> -avail -= dom0_paging_pages(d, nr_pages);
> +if ( !iommu_use_hap_pt(d) )
> +avail -= cpu_pages;
> +else if ( cpu_pages > iommu_pages )
> +avail -= cpu_pages - iommu_pages;
>  }
>  
> +nr_pages = get_memsize(_size, avail);

... is needed here, or else things won't work e.g. without any "dom0_mem=".
I'll introduce a helper function ...

Jan

Re: [PATCH for-4.16] tools/xenstored: Ignore domain we were unable to restore

2021-10-21 Thread Ian Jackson

Thanks everyone, committed.

Ian.

[PATCH v2 3/3] AMD/IOMMU: iommu_enable vs iommu_intremap

2021-10-21 Thread Jan Beulich

The two are really meant to be independent settings; iov_supports_xt()
using || instead of && was simply wrong. The corrected check is,
however, redundant, just like the (correct) one in iov_detect(): These
hook functions are unreachable without acpi_ivrs_init() installing the
iommu_init_ops pointer, which it does only upon success. (Unlike for
VT-d there is no late clearing of iommu_enable due to quirks, and any
possible clearing of iommu_intremap happens only after iov_supports_xt()
has run.)

Signed-off-by: Jan Beulich 
---
In fact in iov_detect() it could be iommu_enable alone which gets
checked, but this felt overly aggressive to me. Instead I'm getting the
impression that the function may wrongly not get called when "iommu=off"
but interrupt remapping is in use: We'd not get the interrupt handler
installed, and hence interrupt remapping related events would never get
reported. (Same on VT-d, FTAOD.)

For iov_supports_xt() the question is whether, like VT-d's
intel_iommu_supports_eim(), it shouldn't rather check iommu_intremap
alone (in which case it would need to remain a check rather than getting
converted to ASSERT()).
---
v2: New.

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -731,8 +731,7 @@ bool __init iov_supports_xt(void)
 {
 unsigned int apic;
 
-if ( !iommu_enable || !iommu_intremap )
-return false;
+ASSERT(iommu_enable || iommu_intremap);
 
 if ( amd_iommu_prepare(true) )
 return false;
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -199,8 +199,7 @@ int __init acpi_ivrs_init(void)
 
 static int __init iov_detect(void)
 {
-if ( !iommu_enable && !iommu_intremap )
-return 0;
+ASSERT(iommu_enable || iommu_intremap);
 
 if ( (init_done ? amd_iommu_init_late()
 : amd_iommu_init(false)) != 0 )

[PATCH v2 2/3] x86/APIC: avoid iommu_supports_x2apic() on error path

2021-10-21 Thread Jan Beulich

The value it returns may change from true to false in case
iommu_enable_x2apic() fails and, as a side effect, clears iommu_intremap
(as can happen at least on AMD). Latch the return value from the first
invocation to replace the second one.

Signed-off-by: Jan Beulich 
---
v2: New.

--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -865,6 +865,7 @@ void x2apic_ap_setup(void)
 void __init x2apic_bsp_setup(void)
 {
 struct IO_APIC_route_entry **ioapic_entries = NULL;
+bool iommu_x2apic;
 const char *orig_name;
 
 if ( !cpu_has_x2apic )
@@ -880,7 +881,8 @@ void __init x2apic_bsp_setup(void)
 printk("x2APIC: Already enabled by BIOS: Ignoring cmdline disable.\n");
 }
 
-if ( iommu_supports_x2apic() )
+iommu_x2apic = iommu_supports_x2apic();
+if ( iommu_x2apic )
 {
 if ( (ioapic_entries = alloc_ioapic_entries()) == NULL )
 {
@@ -933,8 +935,11 @@ void __init x2apic_bsp_setup(void)
 printk("Switched to APIC driver %s\n", genapic.name);
 
 restore_out:
-/* iommu_x2apic_enabled cannot be used here in the error case. */
-if ( iommu_supports_x2apic() )
+/*
+ * iommu_x2apic_enabled and iommu_supports_x2apic() cannot be used here
+ * in the error case.
+ */
+if ( iommu_x2apic )
 {
 /*
  * NB: do not use raw mode when restoring entries if the iommu has

[PATCH v2 1/3] x86/IOMMU: mark IOMMU / intremap not in use when ACPI tables are missing

2021-10-21 Thread Jan Beulich

x2apic_bsp_setup() gets called ahead of iommu_setup(), and since x2APIC
mode (physical vs clustered) depends on iommu_intremap, that variable
needs to be set to off as soon as we know we can't / won't enable
interrupt remapping, i.e. in particular when parsing of the respective
ACPI tables failed. Move the turning off of iommu_intremap from AMD
specific code into acpi_iommu_init(), accompanying it by clearing of
iommu_enable.

Take the opportunity and also fully skip ACPI table parsing logic on
VT-d when both "iommu=off" and "iommu=no-intremap" are in effect anyway,
like was already the case for AMD.

The tag below only references the commit uncovering a pre-existing
anomaly.

Fixes: d8bd82327b0f ("AMD/IOMMU: obtain IVHD type to use earlier")
Reported-by: Andrew Cooper 
Signed-off-by: Jan Beulich 
---
While the change here deals with apic_x2apic_probe() as called from
x2apic_bsp_setup(), the check_x2apic_preenabled() path looks to be
similarly affected. That call occurs before acpi_boot_init(), which is
what calls acpi_iommu_init(). The ordering in setup.c is in part
relatively fragile, which is why for the moment I'm still hesitant to
move the generic_apic_probe() call down. Plus I don't have easy access
to a suitable system to test this case. Thoughts?
---
v2: Treat iommu_enable and iommu_intremap as separate options.

--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -183,9 +183,6 @@ int __init acpi_ivrs_init(void)
 {
 int rc;
 
-if ( !iommu_enable && !iommu_intremap )
-return 0;
-
 rc = amd_iommu_get_supported_ivhd_type();
 if ( rc < 0 )
 return rc;
@@ -193,10 +190,7 @@ int __init acpi_ivrs_init(void)
 ivhd_type = rc;
 
 if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
-{
-iommu_intremap = iommu_intremap_off;
 return -ENODEV;
-}
 
 iommu_init_ops = &_iommu_init_ops;
 
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -777,11 +777,7 @@ static int __init acpi_parse_dmar(struct
 dmar = (struct acpi_table_dmar *)table;
 dmar_flags = dmar->flags;
 
-if ( !iommu_enable && !iommu_intremap )
-{
-ret = -EINVAL;
-goto out;
-}
+ASSERT(iommu_enable || iommu_intremap);
 
 if ( !dmar->width )
 {
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -41,6 +41,24 @@ enum iommu_intremap __read_mostly iommu_
 bool __read_mostly iommu_intpost;
 #endif
 
+void __init acpi_iommu_init(void)
+{
+int ret;
+
+if ( !iommu_enable && !iommu_intremap )
+return;
+
+ret = acpi_dmar_init();
+if ( ret == -ENODEV )
+ret = acpi_ivrs_init();
+
+if ( ret )
+{
+iommu_enable = false;
+iommu_intremap = iommu_intremap_off;
+}
+}
+
 int __init iommu_hardware_setup(void)
 {
 struct IO_APIC_route_entry **ioapic_entries = NULL;
--- a/xen/include/asm-x86/acpi.h
+++ b/xen/include/asm-x86/acpi.h
@@ -141,16 +141,10 @@ extern u32 x86_acpiid_to_apicid[];
 extern u32 pmtmr_ioport;
 extern unsigned int pmtmr_width;
 
+void acpi_iommu_init(void);
 int acpi_dmar_init(void);
 int acpi_ivrs_init(void);
 
-static inline int acpi_iommu_init(void)
-{
-int ret = acpi_dmar_init();
-
-return ret == -ENODEV ? acpi_ivrs_init() : ret;
-}
-
 void acpi_mmcfg_init(void);
 
 /* Incremented whenever we transition through S3. Value is 1 during boot. */

[PATCH v2 0/3] x86/IOMMU: enabled / intremap handling

2021-10-21 Thread Jan Beulich

In the course of reading the response to v1 (patch 1 only) I realized
that not only that patch needs further adjustment, but that also
further changes are needed (and there's likely yet more amiss).

1: x86/IOMMU: mark IOMMU / intremap not in use when ACPI tables are missing
2: x86/APIC: avoid iommu_supports_x2apic() on error path
3: AMD/IOMMU: iommu_enable vs iommu_intremap

Jan

Re: [XEN][RFC PATCH 08/13] xen/iommu: Introduce iommu_remove_dt_devices function

2021-10-21 Thread Julien Grall


Hi Vikram,

On 02/09/2021 07:05, Vikram Garhwal wrote:

iommu_remove_dt_device function is introduced for supporting dynamic programming
i.e. adding and removing a node during runtime. When user removes the device
node, iommu_remove_dt_device() removes the device entry from smmu-masters too
using following steps:
 1. Find if SMMU master exists for the device node.
 2. Remove the SMMU master.


I would prefer if this patch is split in two:
  * Part 1: Add the generic helper
  * Part 2: Implement the callback for the SMMU



Signed-off-by: Vikram Garhwal 
---
  xen/drivers/passthrough/arm/smmu.c| 53 +++
  xen/drivers/passthrough/device_tree.c | 30 
  xen/include/xen/iommu.h   |  2 ++
  3 files changed, 85 insertions(+)

diff --git a/xen/drivers/passthrough/arm/smmu.c 
b/xen/drivers/passthrough/arm/smmu.c
index c9dfc4c..7b615bc 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -816,6 +816,17 @@ static int insert_smmu_master(struct arm_smmu_device *smmu,
return 0;
  }
  
+static int remove_smmu_master(struct arm_smmu_device *smmu,

+ struct arm_smmu_master *master)
+{
+   if (!(smmu->masters.rb_node))
+   return -ENOENT;
+
+   rb_erase(>node, >masters);
+
+   return 0;
+}
+
  static int arm_smmu_dt_add_device_legacy(struct arm_smmu_device *smmu,
 struct device *dev,
 struct iommu_fwspec *fwspec)
@@ -853,6 +864,31 @@ static int arm_smmu_dt_add_device_legacy(struct 
arm_smmu_device *smmu,
return insert_smmu_master(smmu, master);
  }
  
+static int arm_smmu_dt_remove_device_legacy(struct arm_smmu_device *smmu,

+struct device *dev)
+{
+   struct arm_smmu_master *master;
+   struct device_node *dev_node = dev_get_dev_node(dev);
+   int ret;
+
+   master = find_smmu_master(smmu, dev_node);
+   if (master == NULL) {
+   dev_err(dev,
+   "No registrations found for master device %s\n",
+   dev_node->name);
+   return -EINVAL;
+   }
+
+   ret = remove_smmu_master(smmu, master);


Can you add a comment why you are not clearing the dev_node->is_protected?


+
+   if (ret)
+   return ret;
+
+   master->of_node = NULL;


NIT: This is a bit pointless to do given that you are freeing master 
right after.



+   kfree(master);
+   return 0;
+}
+
  static int register_smmu_master(struct arm_smmu_device *smmu,
struct device *dev,
struct of_phandle_args *masterspec)
@@ -876,6 +912,22 @@ static int register_smmu_master(struct arm_smmu_device 
*smmu,
 fwspec);
  }
  
+static int arm_smmu_dt_remove_device_generic(u8 devfn, struct device *dev)

+{
+   struct arm_smmu_device *smmu;
+   struct iommu_fwspec *fwspec;
+
+   fwspec = dev_iommu_fwspec_get(dev);
+   if (fwspec == NULL)
+   return -ENXIO;
+
+   smmu = find_smmu(fwspec->iommu_dev);
+   if (smmu == NULL)
+   return -ENXIO;
+
+   return arm_smmu_dt_remove_device_legacy(smmu, dev);
+}
+
  static int arm_smmu_dt_add_device_generic(u8 devfn, struct device *dev)
  {
struct arm_smmu_device *smmu;
@@ -2876,6 +2928,7 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
  .init = arm_smmu_iommu_domain_init,
  .hwdom_init = arm_smmu_iommu_hwdom_init,
  .add_device = arm_smmu_dt_add_device_generic,
+.remove_device = arm_smmu_dt_remove_device_generic,
  .teardown = arm_smmu_iommu_domain_teardown,
  .iotlb_flush = arm_smmu_iotlb_flush,
  .iotlb_flush_all = arm_smmu_iotlb_flush_all,
diff --git a/xen/drivers/passthrough/device_tree.c 
b/xen/drivers/passthrough/device_tree.c
index 98f2aa0..37f4945 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -127,6 +127,36 @@ int iommu_release_dt_devices(struct domain *d)
  return 0;
  }
  
+int iommu_remove_dt_device(struct dt_device_node *np)

+{
+const struct iommu_ops *ops = iommu_get_ops();
+struct device *dev = dt_to_dev(np);
+int rc = 1;


You set rc to 1 but AFAICT the value is never used.


+
+if ( !ops )
+return -EINVAL;


It would be better to return -EOPNOSUPP.


+
+if ( iommu_dt_device_is_assigned(np) )
+return -EPERM;
+
+/*
+ * The driver which supports generic IOMMU DT bindings must have
+ * these callback implemented.
+ */


I think we should make ops->remove_device optional.


+if ( !ops->remove_device )
+return -EINVAL;

It would be better to return -EOPNOSUPP.


+
+/*
+ * Remove master device from the IOMMU if latter is present and available.
+ */
+rc = ops->remove_device(0, dev);


This

xen/arm: Missing appropriate locking for the IOMMU (WAS Re: [PATCH v5 02/11] xen/arm: Add PHYSDEVOP_pci_device_(*add/remove) support for ARM)

2021-10-21 Thread Julien Grall


Hi all,

While going through the passthrough code. I noticed that we don't have a 
common lock for the IOMMU between the PCI and DT code.


This is going to be an issue given it would technically be possible to 
add a PCI device while assigning a DT.


Rahul, Bertrand, Oleksandr, can you have a look at the issue?

Cheers,

On 06/10/2021 18:40, Rahul Singh wrote:

Hardware domain is in charge of doing the PCI enumeration and will
discover the PCI devices and then will communicate to XEN via hyper
call PHYSDEVOP_pci_device_add(..) to add the PCI devices in XEN.

Also implement PHYSDEVOP_pci_device_remove(..) to remove the PCI device.

As most of the code for PHYSDEVOP_pci_device_* is the same between x86
and ARM, move the code to a common file to avoid duplication.

There are other PHYSDEVOP_pci_device_* operations to add PCI devices.
Currently implemented PHYSDEVOP_pci_device_remove(..) and
PHYSDEVOP_pci_device_add(..) only as those are minimum required to
support PCI passthrough on ARM.

Signed-off-by: Rahul Singh 
---
Change in v5:
- Move the pci_physdev_op() stub to xen/arch/arm/physdev.c.
Change in v4:
- Move file commom/physdev.c to drivers/pci/physdev.c
- minor comments.
Change in v3: Fixed minor comment.
Change in v2:
- Add support for PHYSDEVOP_pci_device_remove()
- Move code to common code
---
---
  xen/arch/arm/physdev.c|  6 ++-
  xen/arch/x86/physdev.c| 52 +--
  xen/arch/x86/x86_64/physdev.c |  2 +-
  xen/drivers/pci/Makefile  |  1 +
  xen/drivers/pci/physdev.c | 80 +++
  xen/include/public/arch-arm.h |  4 +-
  xen/include/xen/hypercall.h   |  4 ++
  7 files changed, 96 insertions(+), 53 deletions(-)
  create mode 100644 xen/drivers/pci/physdev.c

diff --git a/xen/arch/arm/physdev.c b/xen/arch/arm/physdev.c
index e91355fe22..f9aa274dda 100644
--- a/xen/arch/arm/physdev.c
+++ b/xen/arch/arm/physdev.c
@@ -8,13 +8,17 @@
  #include 
  #include 
  #include 
-#include 
+#include 
  
  
  int do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)

  {
+#ifdef CONFIG_HAS_PCI
+return pci_physdev_op(cmd, arg);
+#else
  gdprintk(XENLOG_DEBUG, "PHYSDEVOP cmd=%d: not implemented\n", cmd);
  return -ENOSYS;
+#endif
  }
  
  /*

diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 23465bcd00..ea38be8b79 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -12,7 +12,7 @@
  #include 
  #include 
  #include 
-#include 
+#include 
  #include 
  #include 
  #include 
@@ -480,54 +480,6 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
  break;
  }
  
-case PHYSDEVOP_pci_device_add: {

-struct physdev_pci_device_add add;
-struct pci_dev_info pdev_info;
-nodeid_t node;
-
-ret = -EFAULT;
-if ( copy_from_guest(, arg, 1) != 0 )
-break;
-
-pdev_info.is_extfn = !!(add.flags & XEN_PCI_DEV_EXTFN);
-if ( add.flags & XEN_PCI_DEV_VIRTFN )
-{
-pdev_info.is_virtfn = 1;
-pdev_info.physfn.bus = add.physfn.bus;
-pdev_info.physfn.devfn = add.physfn.devfn;
-}
-else
-pdev_info.is_virtfn = 0;
-
-if ( add.flags & XEN_PCI_DEV_PXM )
-{
-uint32_t pxm;
-size_t optarr_off = offsetof(struct physdev_pci_device_add, 
optarr) /
-sizeof(add.optarr[0]);
-
-if ( copy_from_guest_offset(, arg, optarr_off, 1) )
-break;
-
-node = pxm_to_node(pxm);
-}
-else
-node = NUMA_NO_NODE;
-
-ret = pci_add_device(add.seg, add.bus, add.devfn, _info, node);
-break;
-}
-
-case PHYSDEVOP_pci_device_remove: {
-struct physdev_pci_device dev;
-
-ret = -EFAULT;
-if ( copy_from_guest(, arg, 1) != 0 )
-break;
-
-ret = pci_remove_device(dev.seg, dev.bus, dev.devfn);
-break;
-}
-
  case PHYSDEVOP_prepare_msix:
  case PHYSDEVOP_release_msix: {
  struct physdev_pci_device dev;
@@ -663,7 +615,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
  }
  
  default:

-ret = -ENOSYS;
+ret = pci_physdev_op(cmd, arg);
  break;
  }
  
diff --git a/xen/arch/x86/x86_64/physdev.c b/xen/arch/x86/x86_64/physdev.c

index 0a50cbd4d8..e3cbd5ebcb 100644
--- a/xen/arch/x86/x86_64/physdev.c
+++ b/xen/arch/x86/x86_64/physdev.c
@@ -9,7 +9,7 @@ EMIT_FILE;
  #include 
  #include 
  #include 
-#include 
+#include 
  
  #define do_physdev_op compat_physdev_op
  
diff --git a/xen/drivers/pci/Makefile b/xen/drivers/pci/Makefile

index a98035df4c..972c923db0 100644
--- a/xen/drivers/pci/Makefile
+++ b/xen/drivers/pci/Makefile
@@ -1 +1,2 @@
  obj-y += pci.o
+obj-y += physdev.o
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
new file mode 100644
index 00..4f3e1a96c0
--- /dev/null
+++ b/xen/drivers/pci/physdev.c

Re: [XEN][RFC PATCH 06/13] device tree: Add dt_print_node_names()

2021-10-21 Thread Julien Grall


Hi Vikram,

On 02/09/2021 07:05, Vikram Garhwal wrote:

Add dt_print_node_names() to print all nodes under a dt_device_node.
dt_print_node_names() takes a dt_device_node type input and prints the node name
of all the subsequent nodes. This is added for debugging purpose for device tree
overlays.


I can't find any use of this function in the rest of series. While I 
understand it was helpful for you, I am not entirely convinced that we 
should add it in Xen with no-use. The code is simple enough for anyone 
to re-implement it if needed.


Cheers,



Signed-off-by: Vikram Garhwal 
---
  xen/common/device_tree.c  | 10 ++
  xen/include/xen/device_tree.h |  5 +
  2 files changed, 15 insertions(+)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index cda21be..bfe3191 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -308,6 +308,16 @@ struct dt_device_node *dt_find_node_by_path(const char 
*path)
  return np;
  }
  
+void dt_print_node_names(struct dt_device_node *dt)

+{
+struct dt_device_node *np;
+
+dt_for_each_device_node(dt, np)
+dt_dprintk("Node name: %s Full name %s\n", np->name, np->full_name);
+
+return;
+}
+
  int dt_find_node_by_gpath(XEN_GUEST_HANDLE(char) u_path, uint32_t u_plen,
struct dt_device_node **node)
  {
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index a4e98a7..dcd96b4 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -483,6 +483,11 @@ struct dt_device_node *dt_find_node_by_path(const char 
*path);
  int dt_find_node_by_gpath(XEN_GUEST_HANDLE(char) u_path, uint32_t u_plen,
struct dt_device_node **node);
  
+/*

+ * Prints all node names.
+ */
+void dt_print_node_names(struct dt_device_node *dt);
+
  /**
   * dt_get_parent - Get a node's parent if any
   * @node: Node to get parent



--
Julien Grall

[linux-5.4 test] 165696: tolerable FAIL - PUSHED

2021-10-21 Thread osstest service owner

flight 165696 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/165696/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 165616
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 165616
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 165616
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 165616
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 165616
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 165616
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 165616
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 165616
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 165616
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 165616
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 165616
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 165616
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 linux4f508aa9dd3bde7a1c5e4e6de72abb8a03fd504a
baseline version:
 linux

Re: [PATCH v2] PCI/MSI: Re-add checks for skip masking MSI-X on Xen PV

2021-10-21 Thread Josef Johansson

On 10/20/21 16:03, Jason Andryuk wrote:
> Hi, Marc,
>
> Adding Juergen and Boris since this involves Xen.
>
> On Wed, Oct 20, 2021 at 8:51 AM Marc Zyngier  wrote:
>> On Tue, 19 Oct 2021 22:48:19 +0100,
>> Josef Johansson  wrote:
>>> From: Josef Johansson 
>>>
>>>
>>> PCI/MSI: Re-add checks for skip masking MSI-X on Xen PV
>>>
>>> commit fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask
>>> functions") introduce functions pci_msi_update_mask() and
>>> pci_msix_write_vector_ctrl() that is missing checks for
>>> pci_msi_ignore_mask that exists in commit 446a98b19fd6 ("PCI/MSI: Use
>>> new mask/unmask functions"). Add them back since it is
>>> causing severe lockups in amdgpu drivers under Xen during boot.
>>>
>>> As explained in commit 1a519dc7a73c ("PCI/MSI: Skip masking MSI-X
>>> on Xen PV"), when running as Xen PV guest, masking MSI-X is a
>>> responsibility of the hypervisor.
>>>
>>> Fixes: fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask
>>> functions")
>>> Suggested-by: Jason Andryuk 
>>> Signed-off-by: Josef Johansson 
>>>
>> [...]
>>
>>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>>> index 0099a00af361..355b791e382f 100644
>>> --- a/drivers/pci/msi.c
>>> +++ b/drivers/pci/msi.c
>>> @@ -148,6 +148,9 @@ static noinline void pci_msi_update_mask(struct 
>>> msi_desc *desc, u32 clear, u32 s
>>>   raw_spinlock_t *lock = >dev->msi_lock;
>>>   unsigned long flags;
>>>
>>> + if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
>>> + return;
>>> +
>> I'd rather be consistent, and keep the check outside of
>> pci_msi_update_mask(), just like we do in __pci_msi_mask_desc().
>> Something like this instead:
>>
>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>> index 0099a00af361..6c69eab304ce 100644
>> --- a/drivers/pci/msi.c
>> +++ b/drivers/pci/msi.c
>> @@ -420,7 +420,8 @@ static void __pci_restore_msi_state(struct pci_dev *dev)
>> arch_restore_msi_irqs(dev);
>>
>> pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, );
>> -   pci_msi_update_mask(entry, 0, 0);
>> +   if (!(pci_msi_ignore_mask || desc->msi_attrib.is_virtual))
>> +   pci_msi_update_mask(entry, 0, 0);
>> control &= ~PCI_MSI_FLAGS_QSIZE;
>> control |= (entry->msi_attrib.multiple << 4) | PCI_MSI_FLAGS_ENABLE;
>> pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);
>>
>> But the commit message talks about MSI-X, and the above is MSI
>> only. Is Xen messing with the former, the latter, or both?
> My understanding is pci_msi_ignore_mask covers both MSI and MSI-X for Xen.

Please let me know if I should go ahead and try it out and send in a v3
of the patch.

I'm watching for further discussion right now, just to be clear.
>>>   raw_spin_lock_irqsave(lock, flags);
>>>   desc->msi_mask &= ~clear;
>>>   desc->msi_mask |= set;
>>> @@ -181,6 +184,9 @@ static void pci_msix_write_vector_ctrl(struct msi_desc 
>>> *desc, u32 ctrl)
>>>  {
>>>   void __iomem *desc_addr = pci_msix_desc_addr(desc);
>>>
>>> + if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
>>> + return;
>>> +
>>>   writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
>>>  }
>> I have similar reservations for this one.
> The problem here is some of the changes in commit 446a98b19fd6
> ("PCI/MSI: Use new mask/unmask functions") bypass the checks in
> __pci_msi_mask_desc/__pci_msi_unmask_desc.  I've wondered if it would
> be cleaner to push all the `if (pci_msi_ignore_mask)` checks down to
> the place of the writes.  That keeps dropping the write local to the
> write and leaves the higher level code consistent between the regular
> and Xen PV cases.  I don't know where checking
> desc->msi_attrib.is_virtual is appropriate.
>
> Regards,
> Jason

Re: [PATCH] x86/IOMMU: mark IOMMU / intremap not in use when ACPI tables are missing

2021-10-21 Thread Jan Beulich

On 21.10.2021 09:21, Jan Beulich wrote:
> On 20.10.2021 22:01, Andrew Cooper wrote:
>> However, I don't think skipping parsing is a sensible move.  Intremap is
>> utterly mandatory if during boot, we discover that our APIC ID is >254,
>> and iommu=no-intremap must be ignored in this case, or if the MADT says
>> we have CPUs beyond that limit and the user hasn't specified nr_cpus=1
>> or equivalent.
> 
> Reading this made me realize that the change breaks other behavior.
> The conditional really needs to be iommu_enable || iommu_intremap -
> at least AMD code added in support for x2APIC already treats the
> latter to not be a sub-option of the former (iov_supports_xt(),
> acpi_ivrs_init()), and e.g. intel_iommu_supports_eim() also checks
> the latter alone.

Actually the check in iov_supports_xt() is wrong - it needs to use
&&, not ||. I'll make a 2nd patch ...

Jan

Re: [PATCH] x86/IOMMU: mark IOMMU / intremap not in use when ACPI tables are missing

2021-10-21 Thread Jan Beulich

On 20.10.2021 22:01, Andrew Cooper wrote:
> On 20/10/2021 11:36, Jan Beulich wrote:
>> x2apic_bsp_setup() gets called ahead of iommu_setup(), and since x2APIC
>> mode (physical vs clustered) depends on iommu_intremap, that variable
>> needs to be set to off as soon as we know we can't / won't enable the
>> IOMMU, i.e. in particular when
>> - parsing of the respective ACPI tables failed,
>> - "iommu=off" is in effect, but not "iommu=no-intremap".
>> Move the turning off of iommu_intremap from AMD specific code into
>> acpi_iommu_init(), accompanying it by clearing of iommu_enable.
>>
>> Take the opportunity and also skip ACPI table parsing altogether when
>> "iommu=off" is in effect anyway.
>>
>> Reported-by: Andrew Cooper 
>> Signed-off-by: Jan Beulich 
>> ---
>> I've deliberately not added a Fixes: tag here, as I'm of the opinion
>> that d8bd82327b0f ("AMD/IOMMU: obtain IVHD type to use earlier") only
>> uncovered a pre-existing anomaly.
> 
> I agree it uncovered a pre-existing issue, but that doesn't mean a Fixes
> tag should be omitted.  That change very concretely regressed booting on
> some systems in their pre-existing configuration.
> 
> The commit message needs to spell out a link to d8bd82327b0f, but it's
> fine to say "that commit broke it by violating an unexpected ordering
> dependency, but isn't really the source of the bug".
> 
>> This particularly applies to the "iommu=off" aspect.
> 
> There should be at least two Fixes tags, but I suspect trying to trace
> the history of this mess is not a productive use of time.
> 
>>  (This now allows me to remove an item from my TODO
>> list: I was meaning to figure out why one of my systems wouldn't come
>> up properly with "iommu=off", and I had never thought of combining this
>> with "no-intremap".)
>>
>> Arguably "iommu=off" should turn off subordinate features in common
>> IOMMU code, but doing so in parse_iommu_param() would be wrong (as
>> there might be multiple "iommu=" to parse). This could be placed in
>> iommu_supports_x2apic(), but see the next item.
> 
> I don't think we make any claim or implication that passing the same
> option twice works.  The problem here is the nested structure of
> options, and the variable doing double duty.
> 
>>
>> While the change here deals with apic_x2apic_probe() as called from
>> x2apic_bsp_setup(), the check_x2apic_preenabled() path looks to be
>> similarly affected. That call occurs before acpi_boot_init(), which is
>> what calls acpi_iommu_init(). The ordering in setup.c is in part
>> relatively fragile, which is why for the moment I'm still hesitant to
>> move the generic_apic_probe() call down. Plus I don't have easy access
>> to a suitable system to test this case. Thoughts?
> 
> I've written these thoughts before, but IOMMU handling it a catastrophic
> mess.  It needs burning to the ground and redoing from scratch.
> 
> We currently have two ways of turning on the IOMMU, depending on what
> firmware does, and plenty ways of crashing Xen with cmdline options
> which should work, and the legacy xAPIC startup routine is after
> interrupts have been enabled, leading to an attempt to rewrite
> interrupts in place to remap them.  This in particular has lead to XSAs
> due to trusting registers which can't be trusted, and the rewrite is
> impossible to do safely.
> 
> The correct order is:
> 1) Parse DMAR/IVRS (but do not configure anything), MADT, current APIC
> setting and cmdline arguments
> 2) Figure out whether we want to be in xAPIC or x2APIC mode, and whether
> we need intremap.  Change the LAPIC setting
> 3) Configure the IOMMUs.  In particular, their interrupt needs to be
> after step 2

Leaving aside check_x2apic_preenabled(), all of this is already the
case afaict, almost at least: We do acpi_boot_init(), later
x2apic_bsp_setup(), and yet later iommu_setup(). The only issue might
be inside x2apic_bsp_setup(), where we call iommu_enable_x2apic()
before switching to x2APIC mode. Yet we avoid setting up IOMMU
interrupts during this early stage.

Hence I think, as expressed, that the question really is whether we
can safely defer check_x2apic_preenabled() by a little bit.

> 4) Start up Xen's general IRQ infrastructure.
> 
> It's a fair chunk of work, but it will vastly simplify the boot logic
> and let us delete the infinite flushing loops out of the IOMMU logic,
> and we don't need any logic which has to second guess itself based on
> what happened earlier on boot.
> 
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -41,6 +41,23 @@ enum iommu_intremap __read_mostly iommu_
>>  bool __read_mostly iommu_intpost;
>>  #endif
>>  
>> +void __init acpi_iommu_init(void)
>> +{
>> +if ( iommu_enable )
>> +{
>> +int ret = acpi_dmar_init();
>> +
>> +if ( ret == -ENODEV )
>> +ret = acpi_ivrs_init();
>> +
>> +if ( ret )
>> +iommu_enable = false;
>> +}
>> +
>> +if ( !iommu_enable )
>> +

Re: [PATCH for-4.16] tools/xenstored: Ignore domain we were unable to restore

2021-10-21 Thread Luca Fancellu




> On 20 Oct 2021, at 15:45, Julien Grall  wrote:
> 
> From: Julien Grall 
> 
> Commit 939775cfd3 "handle dying domains in live update" was meant to
> handle gracefully dying domain. However, the @releaseDomain watch
> will end up to be sent as soon as we finished to restore Xenstored
> state.
> 
> This may be before Xen reports the domain to be dying (such as if
> the guest decided to revoke access to the xenstore page). Consequently
> daemon like xenconsoled will not clean-up the domain and it will be
> left as a zombie.
> 
> To avoid the problem, mark the connection as ignored. This also
> requires to tweak conn_can_write() and conn_can_read() to prevent
> dereferencing a NULL pointer (the interface will not mapped).
> 
> The check conn->is_ignored was originally added after the callbacks
> because the helpers for a socket connection may close the fd. However,
> ignore_connection() will close a socket connection directly. So it is
> fine to do the re-order.
> 
> Signed-off-by: Julien Grall 

Reviewed-by: Luca Fancellu 

> 
> ---
> 
> This issue was originally found when developping commit 939775cfd3.
> Unfortunately, due to some miscommunication the wrong patch was sent
> upstream. The approach is re-using the one we discussed back then.
> 
> This was tested by modifying Linux to revoke the Xenstore grant during
> boot. Without this patch, the domain will be left after Live-Update. Note
> that on a basic setup (only xenconsoled and xl watch @releaseDomain),
> the domain may be cleaned on the next domain shutdown/start.
> 
> Xenstore Live-Update is so far a tech preview feature. But I would still
> like to request this patch to be included in 4.16 as this was meant to
> be part of the original work.
> ---
> tools/xenstore/xenstored_core.c   |  8 
> tools/xenstore/xenstored_core.h   |  1 +
> tools/xenstore/xenstored_domain.c | 21 -
> 3 files changed, 17 insertions(+), 13 deletions(-)
> 
> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
> index 0d4c73d6e20c..91d093a12ea6 100644
> --- a/tools/xenstore/xenstored_core.c
> +++ b/tools/xenstore/xenstored_core.c
> @@ -338,10 +338,10 @@ static int destroy_conn(void *_conn)
> 
> static bool conn_can_read(struct connection *conn)
> {
> - if (!conn->funcs->can_read(conn))
> + if (conn->is_ignored)
>   return false;
> 
> - if (conn->is_ignored)
> + if (!conn->funcs->can_read(conn))
>   return false;
> 
>   /*
> @@ -356,7 +356,7 @@ static bool conn_can_read(struct connection *conn)
> 
> static bool conn_can_write(struct connection *conn)
> {
> - return conn->funcs->can_write(conn) && !conn->is_ignored;
> + return !conn->is_ignored && conn->funcs->can_write(conn);
> }
> 
> /* This function returns index inside the array if succeed, -1 if fail */
> @@ -1466,7 +1466,7 @@ static struct {
>  *
>  * All watches, transactions, buffers will be freed.
>  */
> -static void ignore_connection(struct connection *conn)
> +void ignore_connection(struct connection *conn)
> {
>   struct buffered_data *out, *tmp;
> 
> diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
> index 258f6ff38279..07d861d92499 100644
> --- a/tools/xenstore/xenstored_core.h
> +++ b/tools/xenstore/xenstored_core.h
> @@ -206,6 +206,7 @@ struct node *read_node(struct connection *conn, const 
> void *ctx,
> 
> struct connection *new_connection(const struct interface_funcs *funcs);
> struct connection *get_connection_by_id(unsigned int conn_id);
> +void ignore_connection(struct connection *conn);
> void check_store(void);
> void corrupt(struct connection *conn, const char *fmt, ...);
> 
> diff --git a/tools/xenstore/xenstored_domain.c 
> b/tools/xenstore/xenstored_domain.c
> index 47e9107c144e..d03c7d93a9e7 100644
> --- a/tools/xenstore/xenstored_domain.c
> +++ b/tools/xenstore/xenstored_domain.c
> @@ -268,14 +268,7 @@ void check_domains(void)
>   domain->shutdown = true;
>   notify = 1;
>   }
> - /*
> -  * On Restore, we may have been unable to remap the
> -  * interface and the port. As we don't know whether
> -  * this was because of a dying domain, we need to
> -  * check if the interface and port are still valid.
> -  */
> - if (!dominfo.dying && domain->port &&
> - domain->interface)
> + if (!dominfo.dying)
>   continue;
>   }
>   if (domain->conn) {
> @@ -1303,6 +1296,17 @@ void read_state_connection(const void *ctx, const void 
> *state)
>   if (!domain)
>   barf("domain allocation error");
> 
> + conn = domain->conn;
> +
> + /*
> +  * We may not have been able to restore the domain (for
> +

Re: [PATCH for-4.16] tools/xenstored: Ignore domain we were unable to restore

2021-10-21 Thread Juergen Gross


On 20.10.21 16:45, Julien Grall wrote:

From: Julien Grall 

Commit 939775cfd3 "handle dying domains in live update" was meant to
handle gracefully dying domain. However, the @releaseDomain watch
will end up to be sent as soon as we finished to restore Xenstored
state.

This may be before Xen reports the domain to be dying (such as if
the guest decided to revoke access to the xenstore page). Consequently
daemon like xenconsoled will not clean-up the domain and it will be
left as a zombie.

To avoid the problem, mark the connection as ignored. This also
requires to tweak conn_can_write() and conn_can_read() to prevent
dereferencing a NULL pointer (the interface will not mapped).

The check conn->is_ignored was originally added after the callbacks
because the helpers for a socket connection may close the fd. However,
ignore_connection() will close a socket connection directly. So it is
fine to do the re-order.

Signed-off-by: Julien Grall 


Reviewed-by: Juergen Gross 


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature

54 matches

Mail list logo