[PATCH] libxl/PCI: defer backend wait upon attaching to PV guest

2021-12-13 Thread Jan Beulich
Attempting to wait when the backend hasn't been created yet can't work:
the function will complain "Backend ... does not exist". Move the
waiting past the creation of the backend (and that of other related
nodes), hoping that there are no other dependencies that would now be
broken.

Fixes: 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are 
reflected in the config")
Signed-off-by: Jan Beulich 
---
Just to make it explicit: I have no idea why the waiting is needed in
the first place. It's been there from the very introduction of PCI
passthrough support (commit b0a1af61678b). I therefore can't exclude
that an even better fix would be to simply omit the 2nd hunk here.

--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -157,11 +157,6 @@ static int libxl__device_pci_add_xenstor
 if (domtype == LIBXL_DOMAIN_TYPE_INVALID)
 return ERROR_FAIL;
 
-if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
-if (libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", 
XenbusStateConnected)) < 0)
-return ERROR_FAIL;
-}
-
 back = flexarray_make(gc, 16, 1);
 
 LOGD(DEBUG, domid, "Adding new pci device to xenstore");
@@ -213,6 +208,9 @@ static int libxl__device_pci_add_xenstor
 if (rc < 0) goto out;
 }
 
+if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV)
+rc = libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", 
XenbusStateConnected));
+
 out:
 libxl__xs_transaction_abort(gc, &t);
 if (lock) libxl__unlock_file(lock);




Re: [PATCH 03/10] mini-os: don't assume contiguous RAM when initializing in PVH mode

2021-12-13 Thread Samuel Thibault
Juergen Gross, le mar. 14 déc. 2021 07:35:54 +0100, a ecrit:
> On 13.12.21 22:22, Samuel Thibault wrote:
> > Juergen Gross, le lun. 13 déc. 2021 15:58:58 +0100, a ecrit:
> > > On 12.12.21 01:15, Samuel Thibault wrote:
> > > > Juergen Gross, le lun. 06 déc. 2021 08:23:30 +0100, a ecrit:
> > > > > -unsigned long pfn, max = 0;
> > > > > +unsigned long pfns, max = 0;
> > > > 
> > > > I'd say rather rename max to start.
> > > > 
> > > > >e820_get_memmap();
> > > > > @@ -166,9 +166,12 @@ unsigned long e820_get_maxpfn(void)
> > > > >{
> > > > >if ( e820_map[i].type != E820_RAM )
> > > > >continue;
> > > > > -pfn = (e820_map[i].addr + e820_map[i].size) >> PAGE_SHIFT;
> > > > > -if ( pfn > max )
> > > > > -max = pfn;
> > > > > +pfns = e820_map[i].size >> PAGE_SHIFT;
> > > > > +max = e820_map[i].addr >> PAGE_SHIFT;
> > > > 
> > > > since it's it's always the start of the e820 entry.
> > > > 
> > > > > +if ( pages <= pfns )
> > > > > +return max + pages;
> > > > > +pages -= pfns;
> > > > > +max += pfns;
> > > > 
> > > > Here we don't need do change max, only pages.
> > > 
> > > It is needed in case the loop is finished.
> > > 
> > > And this was the reason for naming it max.
> > 
> > Ah, ok.
> > 
> > At first read the name was confusing me. Perhaps better use two
> > variables then: start and max, so that we have
> > 
> > start = e820_map[i].addr >> PAGE_SHIFT;
> > if ( pages <= pfns )
> >  return start + pages;
> > pages -= pfns;
> > max = start + pfns;
> 
> Hmm, or I can rename max to start, drop the "max += pfns;" and do a
> "return start + pfns;" at the end of the function.

That could do as well, yes.

Samuel



Re: [PATCH 03/10] mini-os: don't assume contiguous RAM when initializing in PVH mode

2021-12-13 Thread Juergen Gross

On 13.12.21 22:22, Samuel Thibault wrote:

Juergen Gross, le lun. 13 déc. 2021 15:58:58 +0100, a ecrit:

On 12.12.21 01:15, Samuel Thibault wrote:

Juergen Gross, le lun. 06 déc. 2021 08:23:30 +0100, a ecrit:

-unsigned long pfn, max = 0;
+unsigned long pfns, max = 0;


I'd say rather rename max to start.


   e820_get_memmap();
@@ -166,9 +166,12 @@ unsigned long e820_get_maxpfn(void)
   {
   if ( e820_map[i].type != E820_RAM )
   continue;
-pfn = (e820_map[i].addr + e820_map[i].size) >> PAGE_SHIFT;
-if ( pfn > max )
-max = pfn;
+pfns = e820_map[i].size >> PAGE_SHIFT;
+max = e820_map[i].addr >> PAGE_SHIFT;


since it's it's always the start of the e820 entry.


+if ( pages <= pfns )
+return max + pages;
+pages -= pfns;
+max += pfns;


Here we don't need do change max, only pages.


It is needed in case the loop is finished.

And this was the reason for naming it max.


Ah, ok.

At first read the name was confusing me. Perhaps better use two
variables then: start and max, so that we have

start = e820_map[i].addr >> PAGE_SHIFT;
if ( pages <= pfns )
 return start + pages;
pages -= pfns;
max = start + pfns;


Hmm, or I can rename max to start, drop the "max += pfns;" and do a
"return start + pfns;" at the end of the function.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 02/10] mini-os: sort and sanitize e820 memory map

2021-12-13 Thread Juergen Gross

On 13.12.21 22:19, Samuel Thibault wrote:

Juergen Gross, le lun. 13 déc. 2021 15:56:21 +0100, a ecrit:

On 12.12.21 01:05, Samuel Thibault wrote:

Hello,

Juergen Gross, le lun. 06 déc. 2021 08:23:29 +0100, a ecrit:

- align the entries to page boundaries



+/* Adjust map entries to page boundaries. */
+for ( i = 0; i < e820_entries; i++ )
+{
+end = (e820_map[i].addr + e820_map[i].size + PAGE_SIZE - 1) & 
PAGE_MASK;
+e820_map[i].addr &= PAGE_MASK;
+e820_map[i].size = end - e820_map[i].addr;
+}


Mmm, what if the previous entry ends after the aligned start?

On real machines that does happen, and you'd rather round up the start
address of usable areas, rather than rounding it down (and conversely
for the end).


I think you are partially right. :-)

Entries for resources managed by Mini-OS (RAM, maybe NVME?) should be
rounded to cover only complete pages (start rounded up, end rounded
down), but all other entries should be rounded to cover the complete
area (start rounded down, end rounded up) in order not to use any
partial used page for e.g. mapping foreign pages.


Right!


+/* Sort entries by start address. */
+for ( i = 0; i < e820_entries - 1; i++ )
+{
+if ( e820_map[i].addr > e820_map[i + 1].addr )
+{
+e820_swap_entries(i, i + 1);
+i = -1;
+}
+}


This looks O(n^3) to me? A bubble sort like this should be fine:

  /* Sort entries by start address. */
  for ( last = e820_entries; last > 1; last-- )
  {
  for ( i = 0; i < last - 1; i++ )
  {
  if ( e820_map[i].addr > e820_map[i + 1].addr )
  {
  e820_swap_entries(i, i + 1);
  }
  }
  }


Hmm, depends.

Assuming a rather well sorted map my version is O(n), while yours
is still O(n^2).


Right, I was a bit lazy :)

This should be fine:

/* Sort entries by start address. */
for ( i = 1; i < e820_entries; i++ )
 for ( j = i; j > 0 && e820_map[j-1].addr > e820_map[j].addr ) ; j-- )
 e820_swap_entries(j - 1, j);


I'm fine both ways, whatever you prefer.


I really prefer for loops which don't unexpectedly modify their loop
index, that's much less scary :)


Agreed, I'll take your version.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


[xen-4.16-testing test] 167401: tolerable FAIL - PUSHED

2021-12-13 Thread osstest service owner
flight 167401 xen-4.16-testing real [real]
flight 167406 xen-4.16-testing real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/167401/
http://logs.test-lab.xenproject.org/osstest/logs/167406/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-multivcpu  8 xen-boot   fail pass in 167406-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail in 167406 never 
pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail in 167406 
never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167218
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167218
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 167218
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167218
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167218
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167218
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 167218
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 167218
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 167218
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167218
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167218
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 167218
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-chec

Re: [patch V3 22/35] soc: ti: ti_sci_inta_msi: Use msi_desc::msi_index

2021-12-13 Thread Nishanth Menon
On 23:19-20211210, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> Use the common msi_index member and get rid of the pointless wrapper struct.
> 
> Signed-off-by: Thomas Gleixner 
> Reviewed-by: Greg Kroah-Hartman 
> Reviewed-by: Jason Gunthorpe 
> Cc: Nishanth Menon 
> Cc: Tero Kristo 
> Cc: Santosh Shilimkar 
> Cc: Thomas Gleixner 
> Cc: linux-arm-ker...@lists.infradead.org

Acked-by: Nishanth Menon 
-- 
Regards,
Nishanth Menon
Key (0xDDB5849D1736249D)/Fingerprint: F8A2 8693 54EB 8232 17A3  1A34 DDB5 849D 
1736 249D



Re: [patch V3 12/35] soc: ti: ti_sci_inta_msi: Allocate MSI device data on first use

2021-12-13 Thread Nishanth Menon
On 23:19-20211210, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> Allocate the MSI device data on first invocation of the allocation function.
> 
> Signed-off-by: Thomas Gleixner 
> Reviewed-by: Greg Kroah-Hartman 
> Reviewed-by: Jason Gunthorpe 
> Cc: Nishanth Menon 
> Cc: Tero Kristo 
> Cc: Santosh Shilimkar 
> Cc: linux-arm-ker...@lists.infradead.org


Acked-by: Nishanth Menon 

-- 
Regards,
Nishanth Menon
Key (0xDDB5849D1736249D)/Fingerprint: F8A2 8693 54EB 8232 17A3  1A34 DDB5 849D 
1736 249D



Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2

2021-12-13 Thread Nishanth Menon
On 23:18-20211210, Thomas Gleixner wrote:
[...]

> 
> It's also available from git:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git 
> msi-v3-part-2

[...]

> ---
>  drivers/dma/ti/k3-udma-private.c|6 
>  drivers/dma/ti/k3-udma.c|   14 -
>  drivers/irqchip/irq-ti-sci-inta.c   |2 
>  drivers/soc/ti/k3-ringacc.c |6 
>  drivers/soc/ti/ti_sci_inta_msi.c|   22 --
>  include/linux/soc/ti/ti_sci_inta_msi.h  |1 

Also while testing on TI K3 platforms, I noticed:

msi_device_data_release/msi_device_destroy_sysfs in am64xx-evm / j7200
[1] 
https://gist.github.com/nmenon/36899c7819681026cfe1ef185fb95f33#file-am64xx-evm-txt-L1018
[2] 
https://gist.github.com/nmenon/36899c7819681026cfe1ef185fb95f33#file-j7200-evm-txt-L1076

Which is not present in vanilla v5.16-rc4

v5.16-rc4:
https://gist.github.com/nmenon/1aee3f0a7da47d5e9dcb7336b32a70cb

msi-v3-part-2:
https://gist.github.com/nmenon/36899c7819681026cfe1ef185fb95f33

(.config https://gist.github.com/nmenon/ec6f95303828abf16a64022d8e3a269f)

Vs:
next-20211208:
https://gist.github.com/nmenon/f5ca3558bd5c1fbe62dc5ceb420b536e

-- 
Regards,
Nishanth Menon
Key (0xDDB5849D1736249D)/Fingerprint: F8A2 8693 54EB 8232 17A3  1A34 DDB5 849D 
1736 249D



Re: [patch V3 34/35] soc: ti: ti_sci_inta_msi: Get rid of ti_sci_inta_msi_get_virq()

2021-12-13 Thread Nishanth Menon
On 23:19-20211210, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> Just use the core function msi_get_virq().
> 
> Signed-off-by: Thomas Gleixner 
> Reviewed-by: Greg Kroah-Hartman 
> Reviewed-by: Jason Gunthorpe 
> Cc: Peter Ujfalusi 
> Cc: Vinod Koul 
> Cc: dmaeng...@vger.kernel.org

Acked-by: Nishanth Menon 

-- 
Regards,
Nishanth Menon
Key (0xDDB5849D1736249D)/Fingerprint: F8A2 8693 54EB 8232 17A3  1A34 DDB5 849D 
1736 249D



[linux-linus test] 167399: tolerable FAIL - PUSHED

2021-12-13 Thread osstest service owner
flight 167399 linux-linus real [real]
flight 167403 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/167399/
http://logs.test-lab.xenproject.org/osstest/logs/167403/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-credit2   8 xen-bootfail pass in 167403-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail in 167403 never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail in 167403 never 
pass
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10   fail  like 167389
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167389
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167389
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167389
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167389
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167389
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 167389
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 167389
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 167389
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass

version targeted for testing:
 linuxaa50faff4416c869b52dff68a937c84d29e12f4b
baseline version:
 linux2585cf9dfaaddf00b069673f27bb3f8530e2039c

Last test of basis   167389  2021-12-13 03:17:19 Z1 days
Testing same since   167399  2021-12-13 19:39:34 Z0 days1 attempts


People who touched revisions under test:
  Carel Si 
  Linus Torvalds 
  Sergio Paracuellos 

jobs:
 build-

[xen-unstable-smoke test] 167400: tolerable all pass - PUSHED

2021-12-13 Thread osstest service owner
flight 167400 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/167400/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  d828caa9aeee80c59a35f662f875f6573e9b532f
baseline version:
 xen  df3e1a5efe700a9f59eced801cac73f9fd02a0e2

Last test of basis   167343  2021-12-10 15:01:44 Z3 days
Testing same since   167400  2021-12-13 20:00:36 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Anthony PERARD 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   df3e1a5efe..d828caa9ae  d828caa9aeee80c59a35f662f875f6573e9b532f -> smoke



Re: [PATCH v2] arm/efi: Handle Xen bootargs from both xen.cfg and DT

2021-12-13 Thread Stefano Stabellini
On Mon, 13 Dec 2021, Luca Fancellu wrote:
> Currently the Xen UEFI stub can accept Xen boot arguments from
> the Xen configuration file using the "options=" keyword, but also
> directly from the device tree specifying xen,xen-bootargs
> property.
> 
> When the configuration file is used, device tree boot arguments
> are ignored and overwritten even if the keyword "options=" is
> not used.
> 
> This patch handle this case, so if the Xen configuration file is not
> specifying boot arguments, the device tree boot arguments will be
> used, if they are present.
> 
> Signed-off-by: Luca Fancellu 

Reviewed-by: Stefano Stabellini 


> ---
> v2 changes:
>  - Changed logic, now xen cfg bootarg value has precedence over DT
> ---
>  docs/misc/efi.pandoc|  4 
>  xen/arch/arm/efi/efi-boot.h | 15 +++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/docs/misc/efi.pandoc b/docs/misc/efi.pandoc
> index abafb3452758..71fdc316b67b 100644
> --- a/docs/misc/efi.pandoc
> +++ b/docs/misc/efi.pandoc
> @@ -249,6 +249,10 @@ UEFI stub for module loading.
>  When adding DomU modules to device tree, also add the property
>  xen,uefi-cfg-load under chosen for Xen to load the Xen config file.
>  Otherwise, Xen will skip the config file and rely on device tree alone.
> +When using the Xen configuration file in conjunction with the device tree, 
> you
> +can specify the Xen boot arguments in the configuration file with the 
> "options="
> +keyword or in the device tree with the "xen,xen-bootargs" property, but be
> +aware that the Xen configuration file value has a precedence over the DT 
> value.
>  
>  Example 1 of how to boot a true dom0less configuration:
>  
> diff --git a/xen/arch/arm/efi/efi-boot.h b/xen/arch/arm/efi/efi-boot.h
> index 4fb345f225c8..ae8627134e5a 100644
> --- a/xen/arch/arm/efi/efi-boot.h
> +++ b/xen/arch/arm/efi/efi-boot.h
> @@ -503,11 +503,26 @@ static void __init efi_arch_handle_cmdline(CHAR16 
> *image_name,
>  
>  if ( cfgfile_options )
>  {
> +PrintMessage(L"Using bootargs from Xen configuration file.");
>  prop_len += snprintf(buf + prop_len,
> EFI_PAGE_SIZE - prop_len, " %s", 
> cfgfile_options);
>  if ( prop_len >= EFI_PAGE_SIZE )
>  blexit(L"FDT string overflow");
>  }
> +else
> +{
> +/* Get xen,xen-bootargs in /chosen if it is specified */
> +const char *dt_bootargs_prop = fdt_getprop(fdt, chosen,
> +   "xen,xen-bootargs", NULL);
> +if ( dt_bootargs_prop )
> +{
> +PrintMessage(L"Using bootargs from device tree.");
> +prop_len += snprintf(buf + prop_len, EFI_PAGE_SIZE - prop_len,
> + " %s", dt_bootargs_prop);
> +if ( prop_len >= EFI_PAGE_SIZE )
> +blexit(L"FDT string overflow");
> +}
> +}
>  
>  if ( cmdline_options )
>  {
> -- 
> 2.17.1
> 



Re: [PATCH] xen/arm: increase memory banks number define value

2021-12-13 Thread Stefano Stabellini
On Mon, 13 Dec 2021, Luca Fancellu wrote:
> Currently the maximum number of memory banks (NR_MEM_BANKS define)
> is fixed to 128, but on some new platforms that have a large amount
> of memory, this value is not enough and prevents Xen from booting.
> 
> Increase the value to 256.
> 
> Signed-off-by: Luca Fancellu 

Reviewed-by: Stefano Stabellini 


> ---
>  xen/include/asm-arm/setup.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/include/asm-arm/setup.h b/xen/include/asm-arm/setup.h
> index 95da0b7ab9cd..07daf160dc57 100644
> --- a/xen/include/asm-arm/setup.h
> +++ b/xen/include/asm-arm/setup.h
> @@ -6,7 +6,7 @@
>  #define MIN_FDT_ALIGN 8
>  #define MAX_FDT_SIZE SZ_2M
>  
> -#define NR_MEM_BANKS 128
> +#define NR_MEM_BANKS 256
>  
>  #define MAX_MODULES 32 /* Current maximum useful modules */
>  
> -- 
> 2.17.1
> 



Re: [PATCH 03/10] mini-os: don't assume contiguous RAM when initializing in PVH mode

2021-12-13 Thread Samuel Thibault
Juergen Gross, le lun. 13 déc. 2021 15:58:58 +0100, a ecrit:
> On 12.12.21 01:15, Samuel Thibault wrote:
> > Juergen Gross, le lun. 06 déc. 2021 08:23:30 +0100, a ecrit:
> > > -unsigned long pfn, max = 0;
> > > +unsigned long pfns, max = 0;
> > 
> > I'd say rather rename max to start.
> > 
> > >   e820_get_memmap();
> > > @@ -166,9 +166,12 @@ unsigned long e820_get_maxpfn(void)
> > >   {
> > >   if ( e820_map[i].type != E820_RAM )
> > >   continue;
> > > -pfn = (e820_map[i].addr + e820_map[i].size) >> PAGE_SHIFT;
> > > -if ( pfn > max )
> > > -max = pfn;
> > > +pfns = e820_map[i].size >> PAGE_SHIFT;
> > > +max = e820_map[i].addr >> PAGE_SHIFT;
> > 
> > since it's it's always the start of the e820 entry.
> > 
> > > +if ( pages <= pfns )
> > > +return max + pages;
> > > +pages -= pfns;
> > > +max += pfns;
> > 
> > Here we don't need do change max, only pages.
> 
> It is needed in case the loop is finished.
> 
> And this was the reason for naming it max.

Ah, ok.

At first read the name was confusing me. Perhaps better use two
variables then: start and max, so that we have

start = e820_map[i].addr >> PAGE_SHIFT;
if ( pages <= pfns )
return start + pages;
pages -= pfns;
max = start + pfns;

Samuel



Re: [PATCH 02/10] mini-os: sort and sanitize e820 memory map

2021-12-13 Thread Samuel Thibault
Juergen Gross, le lun. 13 déc. 2021 15:56:21 +0100, a ecrit:
> On 12.12.21 01:05, Samuel Thibault wrote:
> > Hello,
> > 
> > Juergen Gross, le lun. 06 déc. 2021 08:23:29 +0100, a ecrit:
> > > - align the entries to page boundaries
> > 
> > > +/* Adjust map entries to page boundaries. */
> > > +for ( i = 0; i < e820_entries; i++ )
> > > +{
> > > +end = (e820_map[i].addr + e820_map[i].size + PAGE_SIZE - 1) & 
> > > PAGE_MASK;
> > > +e820_map[i].addr &= PAGE_MASK;
> > > +e820_map[i].size = end - e820_map[i].addr;
> > > +}
> > 
> > Mmm, what if the previous entry ends after the aligned start?
> > 
> > On real machines that does happen, and you'd rather round up the start
> > address of usable areas, rather than rounding it down (and conversely
> > for the end).
> 
> I think you are partially right. :-)
> 
> Entries for resources managed by Mini-OS (RAM, maybe NVME?) should be
> rounded to cover only complete pages (start rounded up, end rounded
> down), but all other entries should be rounded to cover the complete
> area (start rounded down, end rounded up) in order not to use any
> partial used page for e.g. mapping foreign pages.

Right!

> > > +/* Sort entries by start address. */
> > > +for ( i = 0; i < e820_entries - 1; i++ )
> > > +{
> > > +if ( e820_map[i].addr > e820_map[i + 1].addr )
> > > +{
> > > +e820_swap_entries(i, i + 1);
> > > +i = -1;
> > > +}
> > > +}
> > 
> > This looks O(n^3) to me? A bubble sort like this should be fine:
> > 
> >  /* Sort entries by start address. */
> >  for ( last = e820_entries; last > 1; last-- )
> >  {
> >  for ( i = 0; i < last - 1; i++ )
> >  {
> >  if ( e820_map[i].addr > e820_map[i + 1].addr )
> >  {
> >  e820_swap_entries(i, i + 1);
> >  }
> >  }
> >  }
> 
> Hmm, depends.
> 
> Assuming a rather well sorted map my version is O(n), while yours
> is still O(n^2).

Right, I was a bit lazy :)

This should be fine:

/* Sort entries by start address. */
for ( i = 1; i < e820_entries; i++ )
for ( j = i; j > 0 && e820_map[j-1].addr > e820_map[j].addr ) ; j-- )
e820_swap_entries(j - 1, j);

> I'm fine both ways, whatever you prefer.

I really prefer for loops which don't unexpectedly modify their loop
index, that's much less scary :)

Samuel



[PATCH] tools/libs: Don't recursively expand MAJOR ?= $(shell ...)

2021-12-13 Thread Andrew Cooper
?= is a deferred assignment.  Switch to an alternative form which lets us use
an immediate assignment.

Before, version.sh gets run anywhere between 46 and 88 times, with 50 on a
`clean`.  After, 6 times, invariant of main rune, and whether it is an
incremental build or not.

Signed-off-by: Andrew Cooper 
---
CC: Juergen Gross 
CC: Wei Liu 
CC: Anthony PERARD 

The identity transform comes from the docs
https://www.gnu.org/software/make/manual/make.html#Flavors (final paragraph).

Something slightly weird is going on.  Before this, the exact number of hits
that verson.sh gets isn't stable, even when running repeat incremental builds.
I suspect this means we've got a lurking parallel build issue.
---
 tools/libs/libs.mk | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/libs/libs.mk b/tools/libs/libs.mk
index dfbbef4080f6..b21e0bf083a0 100644
--- a/tools/libs/libs.mk
+++ b/tools/libs/libs.mk
@@ -6,7 +6,10 @@
 #   MINOR:   minor version of lib (0 if empty)
 
 LIBNAME := $(notdir $(CURDIR))
-MAJOR ?= $(shell $(XEN_ROOT)/version.sh $(XEN_ROOT)/xen/Makefile)
+
+ifeq ($(origin MAJOR), undefined)
+MAJOR := $(shell $(XEN_ROOT)/version.sh $(XEN_ROOT)/xen/Makefile)
+endif
 MINOR ?= 0
 
 SHLIB_LDFLAGS += -Wl,--version-script=libxen$(LIBNAME).map
-- 
2.11.0




Re: [PATCH v2 2/2] memory: XENMEM_add_to_physmap (almost) wrapping checks

2021-12-13 Thread Stefano Stabellini
On Mon, 13 Dec 2021, Jan Beulich wrote:
> On 10.12.2021 15:10, Oleksandr Andrushchenko wrote:
> > On 10.12.21 11:40, Jan Beulich wrote:
> >> --- a/xen/arch/arm/mm.c
> >> +++ b/xen/arch/arm/mm.c
> >> @@ -1479,7 +1479,7 @@ int xenmem_add_to_physmap_one(
> >>   break;
> >>   }
> >>   case XENMAPSPACE_dev_mmio:
> >> -rc = map_dev_mmio_region(d, gfn, 1, _mfn(idx));
> >> +rc = map_dev_mmio_region(d, gfn, _mfn(idx));
> > Technically this is ok, but reads odd now: the function maps a single
> > page, but its name has "region" in it (which might also be ok, e.g.
> > for a region of a single page).
> > 
> > I think it is worth either implementing full mfn range check inside
> > map_dev_mmio_region or renaming it to something else:
> > with mfn check map_dev_mmio_region will indeed be able to map
> > a region consisting of multiple pages and perform required validation.
> 
> Well, I had no maintainer comments on v1 regarding the name. I'd be
> happy to rename to e.g. map_dev_mmio_page(), so long as there can be
> an agreed upon name before I submit a possible v3. Julien, Stefano?

I like map_dev_mmio_page



Re: [PATCH] tools/libfsimage: Bump SONAME to 4.17

2021-12-13 Thread Andrew Cooper
On 13/12/2021 18:26, Anthony PERARD wrote:
> On Mon, Dec 13, 2021 at 05:56:33PM +, Andrew Cooper wrote:
>> Fixes: a5706b80f42e ("Set version to 4.17: rerun autogen.sh")
>> Signed-off-by: Andrew Cooper 
>> ---
>> CC: Wei Liu 
>> CC: Anthony PERARD 
>> CC: Juergen Gross 
>>
>> This gets forgotten each release.  Any chance we can move libfsimage and/or
>> libacpi into libs/, where this issue would go away.
> libfsimage is a weird kind of library, I don't think it's going to sit
> well in libs/, and I don't think moving it just because MAJOR hasn't been
> bumped is a good reason...

Honestly, the better reasons is "because it's a library, it ought to
live in libs/ with the rest of them", but yeah, it is weird.  Perhaps
instead we want to tie it to pygrub so people have an easier time of
turning the whole lot off.

>
> libacpi isn't a library, not really. It's a collection of source files
> designed to be embedded into other programmes.

We'd normally call that libacpi.a, and it's perhaps a worthwhile change
to make.

>
> You know, we could simply replace "4.16" by
> $(shell $(XEN_ROOT)/version.sh $(XEN_ROOT)/xen/Makefile)
> then we can forget libfsimage exist.
>
> Any any case,
> Acked-by: Anthony PERARD 
> for this patch, or for using version.sh instead.

Good shout.  I'll double check that

MAJOR := $(shell $(XEN_ROOT)/version.sh $(XEN_ROOT)/xen/Makefile)

does the right thing, and commit that version.  One fewer thing to worry
about.

~Andrew



Re: [PATCH] tools/libfsimage: Bump SONAME to 4.17

2021-12-13 Thread Anthony PERARD
On Mon, Dec 13, 2021 at 05:56:33PM +, Andrew Cooper wrote:
> Fixes: a5706b80f42e ("Set version to 4.17: rerun autogen.sh")
> Signed-off-by: Andrew Cooper 
> ---
> CC: Wei Liu 
> CC: Anthony PERARD 
> CC: Juergen Gross 
> 
> This gets forgotten each release.  Any chance we can move libfsimage and/or
> libacpi into libs/, where this issue would go away.

libfsimage is a weird kind of library, I don't think it's going to sit
well in libs/, and I don't think moving it just because MAJOR hasn't been
bumped is a good reason...

libacpi isn't a library, not really. It's a collection of source files
designed to be embedded into other programmes.

You know, we could simply replace "4.16" by
$(shell $(XEN_ROOT)/version.sh $(XEN_ROOT)/xen/Makefile)
then we can forget libfsimage exist.

Any any case,
Acked-by: Anthony PERARD 
for this patch, or for using version.sh instead.

> ---
>  tools/libfsimage/common/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/libfsimage/common/Makefile 
> b/tools/libfsimage/common/Makefile
> index 24bc90e73e71..54049ebaae86 100644
> --- a/tools/libfsimage/common/Makefile
> +++ b/tools/libfsimage/common/Makefile
> @@ -1,7 +1,7 @@
>  XEN_ROOT = $(CURDIR)/../../..
>  include $(XEN_ROOT)/tools/libfsimage/Rules.mk
>  
> -MAJOR = 4.16
> +MAJOR = 4.17
>  MINOR = 0
>  
>  LDFLAGS-$(CONFIG_SunOS) = -Wl,-M -Wl,mapfile-SunOS

Thanks,

-- 
Anthony PERARD



[ovmf test] 167394: all pass - PUSHED

2021-12-13 Thread osstest service owner
flight 167394 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/167394/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 7a6e6ae9332614d386446d2a73e34b74fe66446f
baseline version:
 ovmf ec37fd9c1fbc6c14ad3291b415ad6677a022a554

Last test of basis   167393  2021-12-13 13:42:29 Z0 days
Testing same since   167394  2021-12-13 16:12:39 Z0 days1 attempts


People who touched revisions under test:
  Ard Biesheuvel 
  Gerd Hoffmann 
  Rebecca Cran 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   ec37fd9c1f..7a6e6ae933  7a6e6ae9332614d386446d2a73e34b74fe66446f -> 
xen-tested-master



[PATCH] tools/libfsimage: Bump SONAME to 4.17

2021-12-13 Thread Andrew Cooper
Fixes: a5706b80f42e ("Set version to 4.17: rerun autogen.sh")
Signed-off-by: Andrew Cooper 
---
CC: Wei Liu 
CC: Anthony PERARD 
CC: Juergen Gross 

This gets forgotten each release.  Any chance we can move libfsimage and/or
libacpi into libs/, where this issue would go away.
---
 tools/libfsimage/common/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libfsimage/common/Makefile b/tools/libfsimage/common/Makefile
index 24bc90e73e71..54049ebaae86 100644
--- a/tools/libfsimage/common/Makefile
+++ b/tools/libfsimage/common/Makefile
@@ -1,7 +1,7 @@
 XEN_ROOT = $(CURDIR)/../../..
 include $(XEN_ROOT)/tools/libfsimage/Rules.mk
 
-MAJOR = 4.16
+MAJOR = 4.17
 MINOR = 0
 
 LDFLAGS-$(CONFIG_SunOS) = -Wl,-M -Wl,mapfile-SunOS
-- 
2.11.0




Re: [PATCH] tools/libxl: Don't read STORE/CONSOLE_PFN from Xen

2021-12-13 Thread Andrew Cooper
On 10/12/2021 14:08, Juergen Gross wrote:
> On 10.12.21 14:49, Andrew Cooper wrote:
>> On 10/12/2021 11:16, Juergen Gross wrote:
>>> On 09.12.21 18:07, Andrew Cooper wrote:
 The values are already available in dom->{console,xenstore}_pfn, just
 like on
 the PV side of things.  No need to ask Xen.

 Signed-off-by: Andrew Cooper 
 ---
 CC: Wei Liu 
 CC: Anthony PERARD 
 CC: Juergen Gross 
 ---
    tools/libs/light/libxl_dom.c | 17 +
    1 file changed, 5 insertions(+), 12 deletions(-)

 diff --git a/tools/libs/light/libxl_dom.c
 b/tools/libs/light/libxl_dom.c
 index c9c24666cd04..03841243ab47 100644
 --- a/tools/libs/light/libxl_dom.c
 +++ b/tools/libs/light/libxl_dom.c
 @@ -722,13 +722,10 @@ int libxl__build_pv(libxl__gc *gc, uint32_t
 domid,
    }
      static int hvm_build_set_params(xc_interface *handle, uint32_t
 domid,
 -    libxl_domain_build_info *info,
 -    unsigned long *store_mfn,
 -    unsigned long *console_mfn)
 +    libxl_domain_build_info *info)
    {
    struct hvm_info_table *va_hvm;
    uint8_t *va_map, sum;
 -    uint64_t str_mfn, cons_mfn;
    int i;
      if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
>>>
>>> What about moving this if () to the only caller and renaming the
>>> function from hvm_build_set_params() to hvm_set_info_table()?
>>
>> Because I was hoping to delete it outright in a subsequent patch.
>
> I'd suggest to either do the renaming or to add that subsequent patch
> making this a small series.

That's a separate task, which I don't have time to untangle right now.

I don't think it is worth delaying this improvement for a future
only-tangentially-related change.

~Andrew



Re: [PATCH v3 0/6] aio-posix: split poll check from ready handler

2021-12-13 Thread Stefan Hajnoczi
On Tue, Dec 07, 2021 at 01:23:30PM +, Stefan Hajnoczi wrote:
> v3:
> - Fixed FUSE export aio_set_fd_handler() call that I missed and double-checked
>   for any other missing call sites using Coccinelle [Rich]
> v2:
> - Cleaned up unused return values in nvme and virtio-blk [Stefano]
> - Documented try_poll_mode() ready_list argument [Stefano]
> - Unified virtio-blk/scsi dataplane and non-dataplane virtqueue handlers 
> [Stefano]
> 
> The first patch improves AioContext's adaptive polling execution time
> measurement. This can result in better performance because the algorithm makes
> better decisions about when to poll versus when to fall back to file 
> descriptor
> monitoring.
> 
> The remaining patches unify the virtio-blk and virtio-scsi dataplane and
> non-dataplane virtqueue handlers. This became possible because the dataplane
> handler function now has the same function signature as the non-dataplane
> handler function. Stefano Garzarella prompted me to make this refactoring.
> 
> Stefan Hajnoczi (6):
>   aio-posix: split poll check from ready handler
>   virtio: get rid of VirtIOHandleAIOOutput
>   virtio-blk: drop unused virtio_blk_handle_vq() return value
>   virtio-scsi: prepare virtio_scsi_handle_cmd for dataplane
>   virtio: use ->handle_output() instead of ->handle_aio_output()
>   virtio: unify dataplane and non-dataplane ->handle_output()
> 
>  include/block/aio.h |  4 +-
>  include/hw/virtio/virtio-blk.h  |  2 +-
>  include/hw/virtio/virtio.h  |  5 +-
>  util/aio-posix.h|  1 +
>  block/curl.c| 11 ++--
>  block/export/fuse.c |  4 +-
>  block/io_uring.c| 19 ---
>  block/iscsi.c   |  4 +-
>  block/linux-aio.c   | 16 +++---
>  block/nfs.c |  6 +--
>  block/nvme.c| 51 ---
>  block/ssh.c |  4 +-
>  block/win32-aio.c   |  4 +-
>  hw/block/dataplane/virtio-blk.c | 16 +-
>  hw/block/virtio-blk.c   | 14 ++
>  hw/scsi/virtio-scsi-dataplane.c | 60 +++---
>  hw/scsi/virtio-scsi.c   |  2 +-
>  hw/virtio/virtio.c  | 73 +--
>  hw/xen/xen-bus.c|  6 +--
>  io/channel-command.c|  6 ++-
>  io/channel-file.c   |  3 +-
>  io/channel-socket.c |  3 +-
>  migration/rdma.c|  8 +--
>  tests/unit/test-aio.c   |  4 +-
>  util/aio-posix.c| 89 +
>  util/aio-win32.c|  4 +-
>  util/async.c| 10 +++-
>  util/main-loop.c|  4 +-
>  util/qemu-coroutine-io.c|  5 +-
>  util/vhost-user-server.c| 11 ++--
>  30 files changed, 219 insertions(+), 230 deletions(-)
> 
> -- 
> 2.33.1
> 
> 

Thanks, applied to my block-next tree:
https://gitlab.com/stefanha/qemu/commits/block-next

Stefan


signature.asc
Description: PGP signature


[ovmf test] 167393: all pass - PUSHED

2021-12-13 Thread osstest service owner
flight 167393 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/167393/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf ec37fd9c1fbc6c14ad3291b415ad6677a022a554
baseline version:
 ovmf 1203eba58ecfcddf9a9ae164ccf32ca29037af82

Last test of basis   167392  2021-12-13 11:41:45 Z0 days
Testing same since   167393  2021-12-13 13:42:29 Z0 days1 attempts


People who touched revisions under test:
  Gerd Hoffmann 
  Jiewen Yao 
  Pierre Gondois 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   1203eba58e..ec37fd9c1f  ec37fd9c1fbc6c14ad3291b415ad6677a022a554 -> 
xen-tested-master



Re: [PATCH v2 15/18] IOMMU/x86: prefill newly allocate page tables

2021-12-13 Thread Roger Pau Monné
On Fri, Sep 24, 2021 at 11:54:58AM +0200, Jan Beulich wrote:
> Page table are used for two purposes after allocation: They either start
> out all empty, or they get filled to replace a superpage. Subsequently,
> to replace all empty or fully contiguous page tables, contiguous sub-
> regions will be recorded within individual page tables. Install the
> initial set of markers immediately after allocation. Make sure to retain
> these markers when further populating a page table in preparation for it
> to replace a superpage.
> 
> The markers are simply 4-bit fields holding the order value of
> contiguous entries. To demonstrate this, if a page table had just 16
> entries, this would be the initial (fully contiguous) set of markers:
> 
> index  0 1 2 3 4 5 6 7 8 9 A B C D E F
> marker 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0
> 
> "Contiguous" here means not only present entries with successively
> increasing MFNs, each one suitably aligned for its slot, but also a
> respective number of all non-present entries.

I'm afraid I'm slightly lost with all this, please bear with me. Is
this just a performance improvement when doing super-page
replacements, or there's more to it?

Thanks, Roger.



[PATCH] x86: enable interrupts around dump_execstate()

2021-12-13 Thread Jan Beulich
show_hvm_stack() requires interrupts to be enabled to avoids triggering
the consistency check in check_lock() for the p2m lock. To do so in
spurious_interrupt() requires adding reentrancy protection / handling
there.

Fixes: adb715db698b ("x86/HVM: also dump stacks from show_execution_state()")
Signed-off-by: Jan Beulich 
---
The obvious (but imo undesirable) alternative is to suppress the call to
show_hvm_stack() when interrupts are disabled.

--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1386,22 +1386,40 @@ void smp_send_state_dump(unsigned int cp
  */
 void spurious_interrupt(struct cpu_user_regs *regs)
 {
+static DEFINE_PER_CPU(unsigned int, recursed);
+unsigned int cpu = smp_processor_id();
+
 /*
  * Check if this is a vectored interrupt (most likely, as this is probably
  * a request to dump local CPU state or to continue NMI handling).
  * Vectored interrupts are ACKed; spurious interrupts are not.
  */
-if (apic_isr_read(SPURIOUS_APIC_VECTOR)) {
+while ( apic_isr_read(SPURIOUS_APIC_VECTOR) )
+{
 bool is_spurious;
 
+if ( per_cpu(recursed, cpu)++ )
+return;
+
 ack_APIC_irq();
 is_spurious = !nmi_check_continuation();
-if (this_cpu(state_dump_pending)) {
-this_cpu(state_dump_pending) = false;
+
+if ( per_cpu(state_dump_pending, cpu) )
+{
+per_cpu(state_dump_pending, cpu) = false;
+
+local_irq_enable();
+
 dump_execstate(regs);
-is_spurious = false;
+
+local_irq_disable();
+
+/* (Ab)use is_spurious to arrange for loop continuation. */
+is_spurious = per_cpu(recursed, cpu) > 1;
 }
 
+per_cpu(recursed, cpu) = 0;
+
 if ( !is_spurious )
 return;
 }




Re: [PATCH 04/10] mini-os: respect memory map when ballooning up

2021-12-13 Thread Juergen Gross

On 12.12.21 01:26, Samuel Thibault wrote:

Juergen Gross, le lun. 06 déc. 2021 08:23:31 +0100, a ecrit:

@@ -81,8 +93,11 @@ int balloon_up(unsigned long n_pages)
  if ( n_pages > N_BALLOON_FRAMES )
  n_pages = N_BALLOON_FRAMES;
  
+start_pfn = e820_get_maxpfn(nr_mem_pages + 1) - 1;

+n_pages = e820_get_max_pages(start_pfn, n_pages);


I'd say call it e820_get_max_contig_pages?


Fine with me.




+unsigned long e820_get_max_pages(unsigned long pfn, unsigned long pages)
+{
+int i;
+unsigned long end;
+
+for ( i = 0; i < e820_entries; i++ )
+{
+if ( e820_map[i].type != E820_RAM ||
+ (e820_map[i].addr >> PAGE_SHIFT) > pfn )
+continue;


"> pfn" looks odd to me? If the start of the e820 entry is already
beyond pfn, we'll never find any other entry. We however do want to skip
entries that have addr+size that is below pfn.


Oh, you are right.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH v2 14/18] IOMMU: fold flush-all hook into "flush one"

2021-12-13 Thread Roger Pau Monné
On Fri, Sep 24, 2021 at 11:53:59AM +0200, Jan Beulich wrote:
> Having a separate flush-all hook has always been puzzling me some. We
> will want to be able to force a full flush via accumulated flush flags
> from the map/unmap functions. Introduce a respective new flag and fold
> all flush handling to use the single remaining hook.
> 
> Note that because of the respective comments in SMMU and IPMMU-VMSA
> code, I've folded the two prior hook functions into one. For SMMU-v3,
> which lacks a comment towards incapable hardware, I've left both
> functions in place on the assumption that selective and full flushes
> will eventually want separating.
> 
> Signed-off-by: Jan Beulich 

Just one nit I think.

> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -731,18 +731,21 @@ static int __must_check iommu_flush_iotl
>  unsigned long page_count,
>  unsigned int flush_flags)
>  {
> -ASSERT(page_count && !dfn_eq(dfn, INVALID_DFN));
> -ASSERT(flush_flags);
> +if ( flush_flags & IOMMU_FLUSHF_all )
> +{
> +dfn = INVALID_DFN;
> +page_count = 0;

Don't we expect callers to already pass an invalid dfn and a 0 page
count when doing a full flush?

In the equivalent AMD code you didn't set those for the FLUSHF_all
case.

Thanks, Roger.



Re: [PATCH 03/10] mini-os: don't assume contiguous RAM when initializing in PVH mode

2021-12-13 Thread Juergen Gross

On 12.12.21 01:15, Samuel Thibault wrote:

Juergen Gross, le lun. 06 déc. 2021 08:23:30 +0100, a ecrit:

-unsigned long pfn, max = 0;
+unsigned long pfns, max = 0;


I'd say rather rename max to start.


  e820_get_memmap();
  
@@ -166,9 +166,12 @@ unsigned long e820_get_maxpfn(void)

  {
  if ( e820_map[i].type != E820_RAM )
  continue;
-pfn = (e820_map[i].addr + e820_map[i].size) >> PAGE_SHIFT;
-if ( pfn > max )
-max = pfn;
+pfns = e820_map[i].size >> PAGE_SHIFT;
+max = e820_map[i].addr >> PAGE_SHIFT;


since it's it's always the start of the e820 entry.


+if ( pages <= pfns )
+return max + pages;
+pages -= pfns;
+max += pfns;


Here we don't need do change max, only pages.


It is needed in case the loop is finished.

And this was the reason for naming it max.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 02/10] mini-os: sort and sanitize e820 memory map

2021-12-13 Thread Juergen Gross

On 12.12.21 01:05, Samuel Thibault wrote:

Hello,

Juergen Gross, le lun. 06 déc. 2021 08:23:29 +0100, a ecrit:

- align the entries to page boundaries



+/* Adjust map entries to page boundaries. */
+for ( i = 0; i < e820_entries; i++ )
+{
+end = (e820_map[i].addr + e820_map[i].size + PAGE_SIZE - 1) & 
PAGE_MASK;
+e820_map[i].addr &= PAGE_MASK;
+e820_map[i].size = end - e820_map[i].addr;
+}


Mmm, what if the previous entry ends after the aligned start?

On real machines that does happen, and you'd rather round up the start
address of usable areas, rather than rounding it down (and conversely
for the end).


I think you are partially right. :-)

Entries for resources managed by Mini-OS (RAM, maybe NVME?) should be
rounded to cover only complete pages (start rounded up, end rounded
down), but all other entries should be rounded to cover the complete
area (start rounded down, end rounded up) in order not to use any
partial used page for e.g. mapping foreign pages.




+/* Sort entries by start address. */
+for ( i = 0; i < e820_entries - 1; i++ )
+{
+if ( e820_map[i].addr > e820_map[i + 1].addr )
+{
+e820_swap_entries(i, i + 1);
+i = -1;
+}
+}


This looks O(n^3) to me? A bubble sort like this should be fine:

 /* Sort entries by start address. */
 for ( last = e820_entries; last > 1; last-- )
 {
 for ( i = 0; i < last - 1; i++ )
 {
 if ( e820_map[i].addr > e820_map[i + 1].addr )
 {
 e820_swap_entries(i, i + 1);
 }
 }
 }


Hmm, depends.

Assuming a rather well sorted map my version is O(n), while yours
is still O(n^2).

In the end it won't matter that much, because a normal map will have
only very few entries (usually 5 before merging consecutive entries).

I'm fine both ways, whatever you prefer.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [patch V3 27/35] PCI/MSI: Use __msi_get_virq() in pci_get_vector()

2021-12-13 Thread Jason Gunthorpe
On Fri, Dec 10, 2021 at 11:19:25PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> Use msi_get_vector() and handle the return value to be compatible.
> 
> No functional change intended.
> 
> Signed-off-by: Thomas Gleixner 
> Reviewed-by: Greg Kroah-Hartman 
> ---
> V2: Handle the INTx case directly instead of trying to be overly smart - Marc
> ---
>  drivers/pci/msi/msi.c |   25 +
>  1 file changed, 5 insertions(+), 20 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason



Re: [patch V3 26/35] genirq/msi: Provide interface to retrieve Linux interrupt number

2021-12-13 Thread Jason Gunthorpe
On Fri, Dec 10, 2021 at 11:19:23PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> This allows drivers to retrieve the Linux interrupt number instead of
> fiddling with MSI descriptors.
> 
> msi_get_virq() returns the Linux interrupt number or 0 in case that there
> is no entry for the given MSI index.
> 
> Signed-off-by: Thomas Gleixner 
> Reviewed-by: Greg Kroah-Hartman 
> ---
> V2: Simplify the implementation and let PCI deal with the PCI specialities - 
> Marc
> ---
>  include/linux/msi.h |2 ++
>  kernel/irq/msi.c|   36 
>  2 files changed, 38 insertions(+)

Reviewed-by: Jason Gunthorpe 

Jason



Re: [patch V3 25/35] powerpc/pseries/msi: Let core code check for contiguous entries

2021-12-13 Thread Jason Gunthorpe
On Fri, Dec 10, 2021 at 11:19:22PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> Set the domain info flag and remove the check.
> 
> Signed-off-by: Thomas Gleixner 
> Reviewed-by: Greg Kroah-Hartman 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: "Cédric Le Goater" 
> Cc: linuxppc-...@lists.ozlabs.org
> 
> ---
> V2: Remove it completely - Cedric
> ---
>  arch/powerpc/platforms/pseries/msi.c |   33 -
>  1 file changed, 8 insertions(+), 25 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason
 



Re: [patch V3 06/35] powerpc/pseries/msi: Use PCI device properties

2021-12-13 Thread Jason Gunthorpe
On Fri, Dec 10, 2021 at 11:18:52PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> instead of fiddling with MSI descriptors.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Michael Ellerman 
> Cc: linuxppc-...@lists.ozlabs.org
> ---
> V3: Use pci_dev->msix_enabled - Jason
> ---
>  arch/powerpc/platforms/pseries/msi.c |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Reviewed-by: Jason Gunthorpe 
 
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -448,8 +448,7 @@ static int pseries_msi_ops_prepare(struc
>  int nvec, msi_alloc_info_t *arg)
>  {
>   struct pci_dev *pdev = to_pci_dev(dev);
> - struct msi_desc *desc = first_pci_msi_entry(pdev);
> - int type = desc->pci.msi_attrib.is_msix ? PCI_CAP_ID_MSIX : 
> PCI_CAP_ID_MSI;
> + int type = pdev->msix_enabled ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI;

Long term it probably makes sense to change the msi_domain_ops so that
it has PCI versions of the ops to use in places like this that hard
assume PCI is the only kind of MSI at all.

If the non-PCI op isn't provided then things like IMS would be denied
- and the PCI op can directly pass in a pci_dev * so we don't have all
these to_pci_devs() in drivers.

Jason



Re: [patch V3 05/35] powerpc/cell/axon_msi: Use PCI device property

2021-12-13 Thread Jason Gunthorpe
On Fri, Dec 10, 2021 at 11:18:51PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> instead of fiddling with MSI descriptors.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Arnd Bergmann 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: linuxppc-...@lists.ozlabs.org
> ---
> V3: Use pci_dev property - Jason
> V2: Invoke the function with the correct number of arguments - Andy
> ---
>  arch/powerpc/platforms/cell/axon_msi.c |5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason



Re: [patch V3 04/35] genirq/msi: Use PCI device property

2021-12-13 Thread Jason Gunthorpe
On Fri, Dec 10, 2021 at 11:18:49PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> to determine whether this is MSI or MSIX instead of consulting MSI
> descriptors.
> 
> Signed-off-by: Thomas Gleixner 
> ---
> V2: Use PCI device property - Jason
> ---
>  kernel/irq/msi.c |   17 ++---
>  1 file changed, 2 insertions(+), 15 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason



Re: [patch V3 03/35] x86/apic/msi: Use PCI device MSI property

2021-12-13 Thread Jason Gunthorpe
On Fri, Dec 10, 2021 at 11:18:47PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> instead of fiddling with MSI descriptors.
> 
> Signed-off-by: Thomas Gleixner 
> ---
> V3: Use pci_dev->msix_enabled - Jason
> ---
>  arch/x86/kernel/apic/msi.c |5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason



Re: [PATCH v7 2/3] usb: Introduce Xen pvUSB frontend (xen hcd)

2021-12-13 Thread Greg Kroah-Hartman
On Fri, Dec 03, 2021 at 01:50:44PM +0100, Juergen Gross wrote:
> On 03.12.21 13:49, Greg Kroah-Hartman wrote:
> > On Tue, Nov 23, 2021 at 02:20:47PM +0100, Juergen Gross wrote:
> > > Introduces the Xen pvUSB frontend. With pvUSB it is possible for a Xen
> > > domU to communicate with a USB device assigned to that domU. The
> > > communication is all done via the pvUSB backend in a driver domain
> > > (usually Dom0) which is owner of the physical device.
> > > 
> > > The pvUSB frontend is a USB hcd for a virtual USB host connector.
> > > 
> > > The code is taken from the pvUSB implementation in Xen done by Fujitsu
> > > based on Linux kernel 2.6.18.
> > > 
> > > Changes from the original version are:
> > > - port to upstream kernel
> > > - put all code in just one source file
> > > - move module to appropriate location in kernel tree
> > > - adapt to Linux style guide
> > > - minor code modifications to increase readability
> > > 
> > > Signed-off-by: Juergen Gross 
> > > ---
> > >   drivers/usb/host/Kconfig   |   11 +
> > >   drivers/usb/host/Makefile  |1 +
> > >   drivers/usb/host/xen-hcd.c | 1606 
> > >   3 files changed, 1618 insertions(+)
> > >   create mode 100644 drivers/usb/host/xen-hcd.c
> > 
> > This looks sane to me, but I don't know the HCD interface as well as
> > others on linux-usb do, like Alan Stern.
> > 
> > What tree do you want this to be merged through, my USB one?
> 
> Either that, or I can carry it through the Xen tree.
> 
> Its your choice. :-)

I've grabbed them now, thanks.

greg k-h



Re: [patch V3 02/35] x86/pci/XEN: Use PCI device property

2021-12-13 Thread Jason Gunthorpe
On Fri, Dec 10, 2021 at 11:18:46PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> instead of fiddling with MSI descriptors.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Juergen Gross 
> Cc: xen-devel@lists.xenproject.org
> ---
> V3: Use pci_dev->msix_enabled.
> ---
>  arch/x86/pci/xen.c |9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason



Re: [PATCH] xen/gntdev: fix unmap notification order

2021-12-13 Thread Boris Ostrovsky



On 12/10/21 4:28 AM, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

While working with Xen's libxenvchan library I have faced an issue with
unmap notifications sent in wrong order if both UNMAP_NOTIFY_SEND_EVENT
and UNMAP_NOTIFY_CLEAR_BYTE were requested: first we send an event channel
notification and then clear the notification byte which renders in the below
inconsistency (cli_live is the byte which was requested to be cleared on unmap):

[  444.514243] gntdev_put_map UNMAP_NOTIFY_SEND_EVENT map->notify.event 6
libxenvchan_is_open cli_live 1
[  444.515239] __unmap_grant_pages UNMAP_NOTIFY_CLEAR_BYTE at 14

Thus it is not possible to reliably implement the checks like
- wait for the notification (UNMAP_NOTIFY_SEND_EVENT)
- check the variable (UNMAP_NOTIFY_CLEAR_BYTE)
because it is possible that the variable gets checked before it is cleared
by the kernel.

To fix that we need to re-order the notifications, so the variable is first
gets cleared and then the event channel notification is sent.
With this fix I can see the correct order of execution:

[   54.522611] __unmap_grant_pages UNMAP_NOTIFY_CLEAR_BYTE at 14
[   54.537966] gntdev_put_map UNMAP_NOTIFY_SEND_EVENT map->notify.event 6
libxenvchan_is_open cli_live 0

Cc: sta...@vger.kernel.org
Signed-off-by: Oleksandr Andrushchenko 




Applied to for-linus-5.16c


-boris




Re: [PATCH v2 13/18] VT-d: allow use of superpage mappings

2021-12-13 Thread Jan Beulich
On 13.12.2021 12:54, Roger Pau Monné wrote:
> On Fri, Sep 24, 2021 at 11:52:47AM +0200, Jan Beulich wrote:
>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -743,18 +743,37 @@ static int __must_check iommu_flush_iotl
>>  return iommu_flush_iotlb(d, INVALID_DFN, 0, 0);
>>  }
>>  
>> +static void queue_free_pt(struct domain *d, mfn_t mfn, unsigned int 
>> next_level)
> 
> Same comment as the AMD side patch, about naming the parameter just
> level.

Sure, will change.

>> @@ -1901,13 +1926,15 @@ static int __must_check intel_iommu_map_
>>  }
>>  
>>  page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
>> -pte = &page[dfn_x(dfn) & LEVEL_MASK];
>> +pte = &page[address_level_offset(dfn_to_daddr(dfn), level)];
>>  old = *pte;
>>  
>>  dma_set_pte_addr(new, mfn_to_maddr(mfn));
>>  dma_set_pte_prot(new,
>>   ((flags & IOMMUF_readable) ? DMA_PTE_READ  : 0) |
>>   ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
>> +if ( IOMMUF_order(flags) )
> 
> You seem to use level > 1 in other places to check for whether the
> entry is intended to be a super-page. Is there any reason to use
> IOMMUF_order here instead?

"flags" is the original source of information here, so it seemed more
natural to use it. The following hunk uses "level > 1" to better
match the similar unmap logic as well as AMD code. Maybe I should
change those to also use "flags" (or "order" in the unmap case), as
that would allow re-using the local variable in the new patches in v3
doing the re-coalescing of present superpages (right now I'm using a
second, not very nicely named variable there).

I'll have to think about this some and check whether there are other
issues if I made such a change.

>> @@ -2328,6 +2361,11 @@ static int __init vtd_setup(void)
>> cap_sps_2mb(iommu->cap) ? ", 2MB" : "",
>> cap_sps_1gb(iommu->cap) ? ", 1GB" : "");
>>  
>> +if ( !cap_sps_2mb(iommu->cap) )
>> +large_sizes &= ~PAGE_SIZE_2M;
>> +if ( !cap_sps_1gb(iommu->cap) )
>> +large_sizes &= ~PAGE_SIZE_1G;
>> +
>>  #ifndef iommu_snoop
>>  if ( iommu_snoop && !ecap_snp_ctl(iommu->ecap) )
>>  iommu_snoop = false;
>> @@ -2399,6 +2437,9 @@ static int __init vtd_setup(void)
>>  if ( ret )
>>  goto error;
>>  
>> +ASSERT(iommu_ops.page_sizes & PAGE_SIZE_4K);
> 
> Since you are adding the assert, it might be more complete to check no
> other page sizes are set, iommu_ops.page_sizes == PAGE_SIZE_4K?

Ah yes, would make sense. Let me change this.

Jan




Re: [patch V3 01/35] PCI/MSI: Set pci_dev::msi[x]_enabled early

2021-12-13 Thread Jason Gunthorpe
On Fri, Dec 10, 2021 at 11:18:44PM +0100, Thomas Gleixner wrote:
> There are quite some places which retrieve the first MSI descriptor to
> evaluate whether the setup is for MSI or MSI-X. That's required because
> pci_dev::msi[x]_enabled is only set when the setup completed successfully.
> 
> There is no real reason why msi[x]_enabled can't be set at the beginning of
> the setup sequence and cleared in case of a failure.
> 
> Implement that so the MSI descriptor evaluations can be converted to simple
> property queries.
> 
> Signed-off-by: Thomas Gleixner 
> ---
> V3: New patch
> ---
>  drivers/pci/msi/msi.c |   23 +--
>  1 file changed, 17 insertions(+), 6 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason



[ovmf test] 167392: all pass - PUSHED

2021-12-13 Thread osstest service owner
flight 167392 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/167392/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 1203eba58ecfcddf9a9ae164ccf32ca29037af82
baseline version:
 ovmf 2686468c437f23e5dbd0a517b04852c3c1f84f39

Last test of basis   167391  2021-12-13 09:41:34 Z0 days
Testing same since   167392  2021-12-13 11:41:45 Z0 days1 attempts


People who touched revisions under test:
  Ard Biesheuvel 
  Gerd Hoffmann 
  Jiewen Yao 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   2686468c43..1203eba58e  1203eba58ecfcddf9a9ae164ccf32ca29037af82 -> 
xen-tested-master



Re: [PATCH 01/11] x86/entry: Use swapgs and native_iret directly in swapgs_restore_regs_and_return_to_usermode

2021-12-13 Thread Juergen Gross

On 08.12.21 12:08, Lai Jiangshan wrote:

From: Lai Jiangshan 

swapgs_restore_regs_and_return_to_usermode() is used in native code
(non-xenpv) only now, so it doesn't need the PV-aware SWAPGS and
INTERRUPT_RETURN.

Signed-off-by: Lai Jiangshan 


Reviewed-by: Juergen Gross 


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH v2 13/18] VT-d: allow use of superpage mappings

2021-12-13 Thread Roger Pau Monné
On Fri, Sep 24, 2021 at 11:52:47AM +0200, Jan Beulich wrote:
> ... depending on feature availability (and absence of quirks).
> 
> Also make the page table dumping function aware of superpages.
> 
> Signed-off-by: Jan Beulich 

Just some minor nits.

> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -743,18 +743,37 @@ static int __must_check iommu_flush_iotl
>  return iommu_flush_iotlb(d, INVALID_DFN, 0, 0);
>  }
>  
> +static void queue_free_pt(struct domain *d, mfn_t mfn, unsigned int 
> next_level)

Same comment as the AMD side patch, about naming the parameter just
level.

> @@ -1901,13 +1926,15 @@ static int __must_check intel_iommu_map_
>  }
>  
>  page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
> -pte = &page[dfn_x(dfn) & LEVEL_MASK];
> +pte = &page[address_level_offset(dfn_to_daddr(dfn), level)];
>  old = *pte;
>  
>  dma_set_pte_addr(new, mfn_to_maddr(mfn));
>  dma_set_pte_prot(new,
>   ((flags & IOMMUF_readable) ? DMA_PTE_READ  : 0) |
>   ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
> +if ( IOMMUF_order(flags) )

You seem to use level > 1 in other places to check for whether the
entry is intended to be a super-page. Is there any reason to use
IOMMUF_order here instead?


> @@ -2328,6 +2361,11 @@ static int __init vtd_setup(void)
> cap_sps_2mb(iommu->cap) ? ", 2MB" : "",
> cap_sps_1gb(iommu->cap) ? ", 1GB" : "");
>  
> +if ( !cap_sps_2mb(iommu->cap) )
> +large_sizes &= ~PAGE_SIZE_2M;
> +if ( !cap_sps_1gb(iommu->cap) )
> +large_sizes &= ~PAGE_SIZE_1G;
> +
>  #ifndef iommu_snoop
>  if ( iommu_snoop && !ecap_snp_ctl(iommu->ecap) )
>  iommu_snoop = false;
> @@ -2399,6 +2437,9 @@ static int __init vtd_setup(void)
>  if ( ret )
>  goto error;
>  
> +ASSERT(iommu_ops.page_sizes & PAGE_SIZE_4K);

Since you are adding the assert, it might be more complete to check no
other page sizes are set, iommu_ops.page_sizes == PAGE_SIZE_4K?

Thanks, Roger.



[PATCH v2] arm/efi: Handle Xen bootargs from both xen.cfg and DT

2021-12-13 Thread Luca Fancellu
Currently the Xen UEFI stub can accept Xen boot arguments from
the Xen configuration file using the "options=" keyword, but also
directly from the device tree specifying xen,xen-bootargs
property.

When the configuration file is used, device tree boot arguments
are ignored and overwritten even if the keyword "options=" is
not used.

This patch handle this case, so if the Xen configuration file is not
specifying boot arguments, the device tree boot arguments will be
used, if they are present.

Signed-off-by: Luca Fancellu 
---
v2 changes:
 - Changed logic, now xen cfg bootarg value has precedence over DT
---
 docs/misc/efi.pandoc|  4 
 xen/arch/arm/efi/efi-boot.h | 15 +++
 2 files changed, 19 insertions(+)

diff --git a/docs/misc/efi.pandoc b/docs/misc/efi.pandoc
index abafb3452758..71fdc316b67b 100644
--- a/docs/misc/efi.pandoc
+++ b/docs/misc/efi.pandoc
@@ -249,6 +249,10 @@ UEFI stub for module loading.
 When adding DomU modules to device tree, also add the property
 xen,uefi-cfg-load under chosen for Xen to load the Xen config file.
 Otherwise, Xen will skip the config file and rely on device tree alone.
+When using the Xen configuration file in conjunction with the device tree, you
+can specify the Xen boot arguments in the configuration file with the 
"options="
+keyword or in the device tree with the "xen,xen-bootargs" property, but be
+aware that the Xen configuration file value has a precedence over the DT value.
 
 Example 1 of how to boot a true dom0less configuration:
 
diff --git a/xen/arch/arm/efi/efi-boot.h b/xen/arch/arm/efi/efi-boot.h
index 4fb345f225c8..ae8627134e5a 100644
--- a/xen/arch/arm/efi/efi-boot.h
+++ b/xen/arch/arm/efi/efi-boot.h
@@ -503,11 +503,26 @@ static void __init efi_arch_handle_cmdline(CHAR16 
*image_name,
 
 if ( cfgfile_options )
 {
+PrintMessage(L"Using bootargs from Xen configuration file.");
 prop_len += snprintf(buf + prop_len,
EFI_PAGE_SIZE - prop_len, " %s", 
cfgfile_options);
 if ( prop_len >= EFI_PAGE_SIZE )
 blexit(L"FDT string overflow");
 }
+else
+{
+/* Get xen,xen-bootargs in /chosen if it is specified */
+const char *dt_bootargs_prop = fdt_getprop(fdt, chosen,
+   "xen,xen-bootargs", NULL);
+if ( dt_bootargs_prop )
+{
+PrintMessage(L"Using bootargs from device tree.");
+prop_len += snprintf(buf + prop_len, EFI_PAGE_SIZE - prop_len,
+ " %s", dt_bootargs_prop);
+if ( prop_len >= EFI_PAGE_SIZE )
+blexit(L"FDT string overflow");
+}
+}
 
 if ( cmdline_options )
 {
-- 
2.17.1




[PATCH] xen/arm: increase memory banks number define value

2021-12-13 Thread Luca Fancellu
Currently the maximum number of memory banks (NR_MEM_BANKS define)
is fixed to 128, but on some new platforms that have a large amount
of memory, this value is not enough and prevents Xen from booting.

Increase the value to 256.

Signed-off-by: Luca Fancellu 
---
 xen/include/asm-arm/setup.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/asm-arm/setup.h b/xen/include/asm-arm/setup.h
index 95da0b7ab9cd..07daf160dc57 100644
--- a/xen/include/asm-arm/setup.h
+++ b/xen/include/asm-arm/setup.h
@@ -6,7 +6,7 @@
 #define MIN_FDT_ALIGN 8
 #define MAX_FDT_SIZE SZ_2M
 
-#define NR_MEM_BANKS 128
+#define NR_MEM_BANKS 256
 
 #define MAX_MODULES 32 /* Current maximum useful modules */
 
-- 
2.17.1




[ovmf test] 167391: all pass - PUSHED

2021-12-13 Thread osstest service owner
flight 167391 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/167391/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 2686468c437f23e5dbd0a517b04852c3c1f84f39
baseline version:
 ovmf 8c06c53b585a7443b1e0e6c0eff45a62d56472cc

Last test of basis   167379  2021-12-11 18:11:27 Z1 days
Testing same since   167391  2021-12-13 09:41:34 Z0 days1 attempts


People who touched revisions under test:
  Gerd Hoffmann 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   8c06c53b58..2686468c43  2686468c437f23e5dbd0a517b04852c3c1f84f39 -> 
xen-tested-master



Re: [PATCH 00/10] Add support for Renesas R-Car S4 IPMMU and other misc changes

2021-12-13 Thread Oleksandr



On 13.12.21 12:11, Julien Grall wrote:

Hi,



Hi Julien




On 13/12/2021 10:05, Oleksandr wrote:


On 27.11.21 19:51, Oleksandr Tyshchenko wrote:


Hello all.

Gentle reminder.


This is in my queue of 50+ patches to review. EPAM is the main 
contributor for the IPMMU patches, so can one of your colleagues help 
to review it?


I think, yes, it is possible.





Cheers,


--
Regards,

Oleksandr Tyshchenko




Re: [PATCH v3 00/13] xen: drop hypercall function tables

2021-12-13 Thread Juergen Gross

On 13.12.21 11:35, Jan Beulich wrote:

On 09.12.2021 10:10, Juergen Gross wrote:

On 09.12.21 10:05, Jan Beulich wrote:

On 08.12.2021 16:55, Juergen Gross wrote:

In order to avoid indirect function calls on the hypercall path as
much as possible this series is removing the hypercall function tables
and is replacing the hypercall handler calls via the function array
by automatically generated call macros.

Another by-product of generating the call macros is the automatic
generating of the hypercall handler prototypes from the same data base
which is used to generate the macros.

This has the additional advantage of using type safe calls of the
handlers and to ensure related handler (e.g. PV and HVM ones) share
the same prototypes.

A very brief performance test (parallel build of the Xen hypervisor
in a 6 vcpu guest) showed a very slim improvement (less than 1%) of
the performance with the patches applied. The test was performed using
a PV and a PVH guest.

Changes in V2:
- new patches 6, 14, 15
- patch 7: support hypercall priorities for faster code
- comments addressed

Changes in V3:
- patches 1 and 4 removed as already applied
- comments addressed

Juergen Gross (13):
xen: move do_vcpu_op() to arch specific code
xen: harmonize return types of hypercall handlers
xen: don't include asm/hypercall.h from C sources
xen: include compat/platform.h from hypercall.h
xen: generate hypercall interface related code
xen: use generated prototypes for hypercall handlers
x86/pv-shim: don't modify hypercall table
xen/x86: don't use hypercall table for calling compat hypercalls
xen/x86: call hypercall handlers via generated macro
xen/arm: call hypercall handlers via generated macro
xen/x86: add hypercall performance counters for hvm, correct pv
xen: drop calls_to_multicall performance counter
tools/xenperf: update hypercall names


It's not easy to tell which, if any, of the later patches are fully
independent of earlier ones and could go in ahead of those awaiting
further acks. Do you have any suggestion there, or should we wait
until things can be applied in order?


I think all but the last patch should be applied in order. The last one
obviously can be applied at any time.


Hmm, I think 11 and 12 are fine to go ahead as well; I actually need them
for some immediate purpose and hence I did pull them (but nothing else)
into my local tree, without observing issues.


Yeah, those should be okay to take.


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


[linux-linus test] 167389: tolerable FAIL - PUSHED

2021-12-13 Thread osstest service owner
flight 167389 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/167389/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10   fail  like 167386
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167386
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 167386
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167386
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167386
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167386
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167386
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 167386
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 167386
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass

version targeted for testing:
 linux2585cf9dfaaddf00b069673f27bb3f8530e2039c
baseline version:
 linux90d9fbc16b691403a80a119d7094528721c03279

Last test of basis   167386  2021-12-12 19:40:29 Z0 days
Testing same since   167389  2021-12-13 03:17:19 Z0 days1 attempts


People who touched revisions under test:
  Linus Torvalds 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass

Re: [PATCH v2 12/18] AMD/IOMMU: allow use of superpage mappings

2021-12-13 Thread Jan Beulich
On 13.12.2021 11:33, Roger Pau Monné wrote:
> On Mon, Dec 13, 2021 at 11:00:23AM +0100, Jan Beulich wrote:
>> On 13.12.2021 10:45, Roger Pau Monné wrote:
>>> It would be better if we could somehow account this in a per-vCPU way,
>>> kind of similar to what we do with vPCI BAR mappings.
>>
>> But recording them per-vCPU wouldn't make any difference to the
>> number of pages that could accumulate in a single run. Maybe I'm
>> missing something in what you're thinking about here ...
> 
> If Xen somehow did the free in guest vCPU context before resuming
> guest execution then you could yield when events are pending and thus
> allow other guests to run without hogging the pCPU, and the freeing
> would be accounted to vCPU sched slice time.

That's an interesting thought. Shouldn't be difficult to arrange for
HVM (from {svm,vmx}_vmenter_helper()), but I can't immediately see a
clean way of having the same for PV (short of an ad hoc call out of
assembly code somewhere after test_all_events).

Jan




Re: [PATCH v3 00/13] xen: drop hypercall function tables

2021-12-13 Thread Jan Beulich
On 09.12.2021 10:10, Juergen Gross wrote:
> On 09.12.21 10:05, Jan Beulich wrote:
>> On 08.12.2021 16:55, Juergen Gross wrote:
>>> In order to avoid indirect function calls on the hypercall path as
>>> much as possible this series is removing the hypercall function tables
>>> and is replacing the hypercall handler calls via the function array
>>> by automatically generated call macros.
>>>
>>> Another by-product of generating the call macros is the automatic
>>> generating of the hypercall handler prototypes from the same data base
>>> which is used to generate the macros.
>>>
>>> This has the additional advantage of using type safe calls of the
>>> handlers and to ensure related handler (e.g. PV and HVM ones) share
>>> the same prototypes.
>>>
>>> A very brief performance test (parallel build of the Xen hypervisor
>>> in a 6 vcpu guest) showed a very slim improvement (less than 1%) of
>>> the performance with the patches applied. The test was performed using
>>> a PV and a PVH guest.
>>>
>>> Changes in V2:
>>> - new patches 6, 14, 15
>>> - patch 7: support hypercall priorities for faster code
>>> - comments addressed
>>>
>>> Changes in V3:
>>> - patches 1 and 4 removed as already applied
>>> - comments addressed
>>>
>>> Juergen Gross (13):
>>>xen: move do_vcpu_op() to arch specific code
>>>xen: harmonize return types of hypercall handlers
>>>xen: don't include asm/hypercall.h from C sources
>>>xen: include compat/platform.h from hypercall.h
>>>xen: generate hypercall interface related code
>>>xen: use generated prototypes for hypercall handlers
>>>x86/pv-shim: don't modify hypercall table
>>>xen/x86: don't use hypercall table for calling compat hypercalls
>>>xen/x86: call hypercall handlers via generated macro
>>>xen/arm: call hypercall handlers via generated macro
>>>xen/x86: add hypercall performance counters for hvm, correct pv
>>>xen: drop calls_to_multicall performance counter
>>>tools/xenperf: update hypercall names
>>
>> It's not easy to tell which, if any, of the later patches are fully
>> independent of earlier ones and could go in ahead of those awaiting
>> further acks. Do you have any suggestion there, or should we wait
>> until things can be applied in order?
> 
> I think all but the last patch should be applied in order. The last one
> obviously can be applied at any time.

Hmm, I think 11 and 12 are fine to go ahead as well; I actually need them
for some immediate purpose and hence I did pull them (but nothing else)
into my local tree, without observing issues.

Jan




Re: [PATCH v2 12/18] AMD/IOMMU: allow use of superpage mappings

2021-12-13 Thread Roger Pau Monné
On Mon, Dec 13, 2021 at 11:00:23AM +0100, Jan Beulich wrote:
> On 13.12.2021 10:45, Roger Pau Monné wrote:
> > On Mon, Dec 13, 2021 at 09:49:50AM +0100, Jan Beulich wrote:
> >> On 10.12.2021 16:06, Roger Pau Monné wrote:
> >>> On Fri, Sep 24, 2021 at 11:52:14AM +0200, Jan Beulich wrote:
>  ---
>  I'm not fully sure about allowing 512G mappings: The scheduling-for-
>  freeing of intermediate page tables can take quite a while when
>  replacing a tree of 4k mappings by a single 512G one. Plus (or otoh)
>  there's no present code path via which 512G chunks of memory could be
>  allocated (and hence mapped) anyway.
> >>>
> >>> I would limit to 1G, which is what we support for CPU page tables
> >>> also.
> >>
> >> I'm not sure I buy comparing with CPU side support when not sharing
> >> page tables. Not the least with PV in mind.
> > 
> > Hm, my thinking was that similar reasons that don't allow us to do
> > 512G mappings for the CPU side would also apply to IOMMU. Regardless
> > of that, given the current way in which replaced page table entries
> > are freed, I'm not sure it's fine to allow 512G mappings as the
> > freeing of the possible huge amount of 4K entries could allow guests
> > to hog a CPU for a long time.
> 
> This huge amount can occur only when replacing a hierarchy with
> sufficiently many 4k leaves by a single 512G page. Yet there's no
> way - afaics - that such an operation can be initiated right now.
> That's, as said in the remark, because there's no way to allocate
> a 512G chunk of memory in one go. When re-coalescing, the worst
> that can happen is one L1 table worth of 4k mappings, one L2
> table worth of 2M mappings, and one L3 table worth of 1G mappings.
> All other mappings already need to have been superpage ones at the
> time of the checking. Hence the total upper bound (for the
> enclosing map / unmap) is again primarily determined by there not
> being any way to establish 512G mappings.
> 
> Actually, thinking about it, there's one path where 512G mappings
> could be established, but that's Dom0-reachable only
> (XEN_DOMCTL_memory_mapping) and would assume gigantic BARs in a
> PCI device. Even if such a device existed, I think we're fine to
> assume that Dom0 won't establish such mappings to replace
> existing ones, but only ever put them in place when nothing was
> mapped in that range yet.
> 
> > It would be better if we could somehow account this in a per-vCPU way,
> > kind of similar to what we do with vPCI BAR mappings.
> 
> But recording them per-vCPU wouldn't make any difference to the
> number of pages that could accumulate in a single run. Maybe I'm
> missing something in what you're thinking about here ...

If Xen somehow did the free in guest vCPU context before resuming
guest execution then you could yield when events are pending and thus
allow other guests to run without hogging the pCPU, and the freeing
would be accounted to vCPU sched slice time.

Thanks, Roger.



[libvirt test] 167390: regressions - FAIL

2021-12-13 Thread osstest service owner
flight 167390 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/167390/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  9402db25f8fa88a68c4b3d5b21a459b66c54ef6e
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  521 days
Failing since151818  2020-07-11 04:18:52 Z  520 days  502 attempts
Testing same since   167361  2021-12-11 04:18:51 Z2 days3 attempts


People who touched revisions under test:
Adolfo Jayme Barrientos 
  Aleksandr Alekseev 
  Aleksei Zakharov 
  Andika Triwidada 
  Andrea Bolognani 
  Ani Sinha 
  Balázs Meskó 
  Barrett Schonefeld 
  Bastian Germann 
  Bastien Orivel 
  BiaoXiang Ye 
  Bihong Yu 
  Binfeng Wu 
  Bjoern Walk 
  Boris Fiuczynski 
  Brian Turek 
  Bruno Haible 
  Chris Mayo 
  Christian Borntraeger 
  Christian Ehrhardt 
  Christian Kirbach 
  Christian Schoenebeck 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Cédric Bosdonnat 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Didik Supriadi 
  dinglimin 
  Dmitrii Shcherbakov 
  Dmytro Linkin 
  Eiichi Tsukata 
  Eric Farman 
  Erik Skultety 
  Fabian Affolter 
  Fabian Freyer 
  Fabiano Fidêncio 
  Fangge Jin 
  Farhan Ali 
  Fedora Weblate Translation 
  Franck Ridel 
  Gavi Teitz 
  gongwei 
  Guoyi Tu
  Göran Uddeborg 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Hela Basa 
  Helmut Grohne 
  Hiroki Narukawa 
  Ian Wienand 
  Ioanna Alifieraki 
  Jakob Meng 
  Jamie Strandboge 
  Jamie Strandboge 
  Jan Kuparinen 
  jason lee 
  Jean-Baptiste Holcroft 
  Jia Zhou 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jinsheng Zhang 
  Jiri Denemark 
  Joachim Falk 
  John Ferlan 
  Jonathan Watt 
  Jonathon Jongsma 
  Julio Faracco 
  Justin Gatzen 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Koichi Murase 
  Kristina Hanicova 
  Laine Stump 
  Laszlo Ersek 
  Lee Yarwood 
  Lei Yang 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Lin Ma 
  Liu Yiding 
  Luke Yue 
  Luyao Zhong 
  Marc Hartmayer 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Markus Schade 
  Martin Kletzander 
  Masayoshi Mizuma 
  Matej Cepl 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Meina Li 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Moshe Levi 
  Muha Aliss 
  Nathan 
  Neal Gompa 
  Nick Chevsky 
  Nick Shyrokovskiy 
  Nickys Music Group 
  Nico Pache 
  Nikolay Shirokovskiy 
  Olaf Hering 
  Olesya Gerasimenko 
  Or Ozeri 
  Orion Poplawski 
  Pany 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peng Liang 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Praveen K Paladugu 
  Richard W.M. Jones 
  Ricky Tigg 
  Robin Lee 
  Roman Bogorodskiy 
  Roman Bolshakov 
  Ryan Gahagan 
  Ryan Schmidt 
  Sam Hartman 
  Scott Shambarger 
  Sebastian Mitterle 
  SeongHyun Jo 
  Shalini Chellathurai Saroja 
  Shaojun Yang 
  Shi Lei 
  simmon 
  Simon Chopin 
  Simon Gaiser 
  Simon Rowe 
  Stefan Bader 
  Stefan Berger 
  Stefan Berger 
  Stefan Hajnoczi 
  Stefan Hajnoczi 
  Szymon Scholz 
  Thomas Huth 
  T

Re: [PATCH 00/10] Add support for Renesas R-Car S4 IPMMU and other misc changes

2021-12-13 Thread Julien Grall

Hi,

On 13/12/2021 10:05, Oleksandr wrote:


On 27.11.21 19:51, Oleksandr Tyshchenko wrote:


Hello all.

Gentle reminder.


This is in my queue of 50+ patches to review. EPAM is the main 
contributor for the IPMMU patches, so can one of your colleagues help to 
review it?


Cheers,

--
Julien Grall



Re: [PATCH 00/10] Add support for Renesas R-Car S4 IPMMU and other misc changes

2021-12-13 Thread Oleksandr



On 27.11.21 19:51, Oleksandr Tyshchenko wrote:


Hello all.

Gentle reminder.


From: Oleksandr Tyshchenko 

The R-Car S4 is an automotive System-on-Chip (SoC) for Car Server/Communication
Gateway and is one of the first products in Renesas’ 4th-generation R-Car 
Family.

The integrated IOMMU HW is also VMSA-compatible and supports stage 2 translation
table format, therefore can be used with current R-Car Gen3 driver with slight
modifications.

In the context of Xen driver the main differences between Gen3 and S4 are
the following:
  - HW capacity was enlarged to support up to 16 IPMMU contexts (sets of page 
table)
and up to 64 micro-TLBs per IPMMU device
  - the memory mapped registers have different bases and offset

The first part (commits #1-6) is a non-verbatim port of Linux upstream commits
needed to add support for S4 series easily (prereq work).
The second part (commits #7-8) is based on the code from the Renesas BSP and
actually introduces support for R-Car S4 IPMMU.
The third part (commits #9-10) is misc changes I have locally.

The patch series is based on 4.16.0-rc4 branch and also available at [1].

Tested on Renesas Salvator-X board with H3 ES3.0 SoC (Gen3) and Renesas Spider
board with S4 SoC.

[1] https://github.com/otyshchenko1/xen/commits/s4_ipmmu_ml1

Oleksandr Tyshchenko (10):
   iommu/ipmmu-vmsa: Remove all unused register definitions
   iommu/ipmmu-vmsa: Add helper functions for MMU "context" registers
   iommu/ipmmu-vmsa: Add helper functions for "uTLB" registers
   iommu/ipmmu-vmsa: Add light version of Linux's ipmmu_features
   iommu/ipmmu-vmsa: Calculate context registers' offset instead of a
 macro
   iommu/ipmmu-vmsa: Add utlb_offset_base
   iommu/ipmmu-vmsa: Add Renesas R8A779F0 (R-Car S4) support
   iommu/ipmmu-vmsa: Set IPMMU bit IMSCTLR_USE_SECGRP to 0
   iommu/ipmmu-vmsa: Use refcount for the micro-TLBs
   iommu/arm: Remove code duplication in all IOMMU drivers

  xen/drivers/passthrough/Kconfig  |   6 +-
  xen/drivers/passthrough/arm/iommu.c  |   7 +
  xen/drivers/passthrough/arm/ipmmu-vmsa.c | 278 +++
  xen/drivers/passthrough/arm/smmu-v3.c|  10 --
  xen/drivers/passthrough/arm/smmu.c   |  10 --
  5 files changed, 178 insertions(+), 133 deletions(-)


--
Regards,

Oleksandr Tyshchenko




Re: [PATCH v2 12/18] AMD/IOMMU: allow use of superpage mappings

2021-12-13 Thread Jan Beulich
On 13.12.2021 10:45, Roger Pau Monné wrote:
> On Mon, Dec 13, 2021 at 09:49:50AM +0100, Jan Beulich wrote:
>> On 10.12.2021 16:06, Roger Pau Monné wrote:
>>> On Fri, Sep 24, 2021 at 11:52:14AM +0200, Jan Beulich wrote:
 ---
 I'm not fully sure about allowing 512G mappings: The scheduling-for-
 freeing of intermediate page tables can take quite a while when
 replacing a tree of 4k mappings by a single 512G one. Plus (or otoh)
 there's no present code path via which 512G chunks of memory could be
 allocated (and hence mapped) anyway.
>>>
>>> I would limit to 1G, which is what we support for CPU page tables
>>> also.
>>
>> I'm not sure I buy comparing with CPU side support when not sharing
>> page tables. Not the least with PV in mind.
> 
> Hm, my thinking was that similar reasons that don't allow us to do
> 512G mappings for the CPU side would also apply to IOMMU. Regardless
> of that, given the current way in which replaced page table entries
> are freed, I'm not sure it's fine to allow 512G mappings as the
> freeing of the possible huge amount of 4K entries could allow guests
> to hog a CPU for a long time.

This huge amount can occur only when replacing a hierarchy with
sufficiently many 4k leaves by a single 512G page. Yet there's no
way - afaics - that such an operation can be initiated right now.
That's, as said in the remark, because there's no way to allocate
a 512G chunk of memory in one go. When re-coalescing, the worst
that can happen is one L1 table worth of 4k mappings, one L2
table worth of 2M mappings, and one L3 table worth of 1G mappings.
All other mappings already need to have been superpage ones at the
time of the checking. Hence the total upper bound (for the
enclosing map / unmap) is again primarily determined by there not
being any way to establish 512G mappings.

Actually, thinking about it, there's one path where 512G mappings
could be established, but that's Dom0-reachable only
(XEN_DOMCTL_memory_mapping) and would assume gigantic BARs in a
PCI device. Even if such a device existed, I think we're fine to
assume that Dom0 won't establish such mappings to replace
existing ones, but only ever put them in place when nothing was
mapped in that range yet.

> It would be better if we could somehow account this in a per-vCPU way,
> kind of similar to what we do with vPCI BAR mappings.

But recording them per-vCPU wouldn't make any difference to the
number of pages that could accumulate in a single run. Maybe I'm
missing something in what you're thinking about here ...

Jan




[xen-unstable test] 167387: tolerable FAIL

2021-12-13 Thread osstest service owner
flight 167387 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/167387/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10   fail  like 167381
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167381
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 167381
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167381
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167381
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 167381
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 167381
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 167381
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 167381
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167381
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167381
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167381
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167381
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  df3e1a5efe700a9f5

Re: [PATCH v2 12/18] AMD/IOMMU: allow use of superpage mappings

2021-12-13 Thread Roger Pau Monné
On Mon, Dec 13, 2021 at 09:49:50AM +0100, Jan Beulich wrote:
> On 10.12.2021 16:06, Roger Pau Monné wrote:
> > On Fri, Sep 24, 2021 at 11:52:14AM +0200, Jan Beulich wrote:
> >> ---
> >> I'm not fully sure about allowing 512G mappings: The scheduling-for-
> >> freeing of intermediate page tables can take quite a while when
> >> replacing a tree of 4k mappings by a single 512G one. Plus (or otoh)
> >> there's no present code path via which 512G chunks of memory could be
> >> allocated (and hence mapped) anyway.
> > 
> > I would limit to 1G, which is what we support for CPU page tables
> > also.
> 
> I'm not sure I buy comparing with CPU side support when not sharing
> page tables. Not the least with PV in mind.

Hm, my thinking was that similar reasons that don't allow us to do
512G mappings for the CPU side would also apply to IOMMU. Regardless
of that, given the current way in which replaced page table entries
are freed, I'm not sure it's fine to allow 512G mappings as the
freeing of the possible huge amount of 4K entries could allow guests
to hog a CPU for a long time.

It would be better if we could somehow account this in a per-vCPU way,
kind of similar to what we do with vPCI BAR mappings.

> > Should we also assert that level (or next_level) is always != 0 for
> > extra safety?
> 
> As said elsewhere - if this wasn't a static helper, I'd agree. But all
> call sites have respective conditionals around the call. If anything
> I'd move those checks into the function (but only if you think that
> would improve things, as to me having them at the call sites is more
> logical).

I'm fine to leave the checks in the callers, was just a suggestion in
case we gain new callers that forgot to do the checks themselves.

Thanks, Roger.



Re: [PATCH v4 05/25] reboot: Warn if restart handler has duplicated priority

2021-12-13 Thread Geert Uytterhoeven
On Fri, Dec 10, 2021 at 8:14 PM Rafael J. Wysocki  wrote:
> On Fri, Dec 10, 2021 at 8:04 PM Dmitry Osipenko  wrote:
> > 10.12.2021 21:27, Rafael J. Wysocki пишет:
> > > On Mon, Nov 29, 2021 at 12:34 PM Dmitry Osipenko  wrote:
> > >> 29.11.2021 03:26, Michał Mirosław пишет:
> > >>> On Mon, Nov 29, 2021 at 12:06:19AM +0300, Dmitry Osipenko wrote:
> >  28.11.2021 03:28, Michał Mirosław пишет:
> > > On Fri, Nov 26, 2021 at 09:00:41PM +0300, Dmitry Osipenko wrote:
> > >> Add sanity check which ensures that there are no two restart handlers
> > >> registered with the same priority. Normally it's a direct sign of a
> > >> problem if two handlers use the same priority.
> > >
> > > The patch doesn't ensure the property that there are no 
> > > duplicated-priority
> > > entries on the chain.
> > 
> >  It's not the exact point of this patch.
> > 
> > > I'd rather see a atomic_notifier_chain_register_unique() that returns
> > > -EBUSY or something istead of adding an entry with duplicate priority.
> > > That way it would need only one list traversal unless you want to
> > > register the duplicate anyway (then you would call the older
> > > atomic_notifier_chain_register() after reporting the error).
> > 
> >  The point of this patch is to warn developers about the problem that
> >  needs to be fixed. We already have such troubling drivers in mainline.
> > 
> >  It's not critical to register different handlers with a duplicated
> >  priorities, but such cases really need to be corrected. We shouldn't
> >  break users' machines during transition to the new API, meanwhile
> >  developers should take action of fixing theirs drivers.
> > 
> > > (Or you could return > 0 when a duplicate is registered in
> > > atomic_notifier_chain_register() if the callers are prepared
> > > for that. I don't really like this way, though.)
> > 
> >  I had a similar thought at some point before and decided that I'm not 
> >  in
> >  favor of this approach. It's nicer to have a dedicated function that
> >  verifies the uniqueness, IMO.
> > >>>
> > >>> I don't like the part that it traverses the list second time to check
> > >>> the uniqueness. But actually you could avoid that if
> > >>> notifier_chain_register() would always add equal-priority entries in
> > >>> reverse order:
> > >>>
> > >>>  static int notifier_chain_register(struct notifier_block **nl,
> > >>>   struct notifier_block *n)
> > >>>  {
> > >>>   while ((*nl) != NULL) {
> > >>>   if (unlikely((*nl) == n)) {
> > >>>   WARN(1, "double register detected");
> > >>>   return 0;
> > >>>   }
> > >>> - if (n->priority > (*nl)->priority)
> > >>> + if (n->priority >= (*nl)->priority)
> > >>>   break;
> > >>>   nl = &((*nl)->next);
> > >>>   }
> > >>>   n->next = *nl;
> > >>>   rcu_assign_pointer(*nl, n);
> > >>>   return 0;
> > >>>  }
> > >>>
> > >>> Then the check for uniqueness after adding would be:
> > >>>
> > >>>  WARN(nb->next && nb->priority == nb->next->priority);
> > >>
> > >> We can't just change the registration order because invocation order of
> > >> the call chain depends on the registration order
> > >
> > > It doesn't if unique priorities are required and isn't that what you want?
> > >
> > >> and some of current
> > >> users may rely on that order. I'm pretty sure that changing the order
> > >> will have unfortunate consequences.
> > >
> > > Well, the WARN() doesn't help much then.
> > >
> > > Either you can make all of the users register with unique priorities,
> > > and then you can make the registration reject non-unique ones, or you
> > > cannot assume them to be unique.
> >
> > There is no strong requirement for priorities to be unique, the reboot.c
> > code will work properly.
>
> In which case adding the WARN() is not appropriate IMV.
>
> Also I've looked at the existing code and at least in some cases the
> order in which the notifiers run doesn't matter.  I'm not sure what
> the purpose of this patch is TBH.
>
> > The potential problem is on the user's side and the warning is intended
> > to aid the user.
>
> Unless somebody has the panic_on_warn mentioned previously set and
> really the user need not understand what the WARN() is about.  IOW,
> WARN() helps developers, not users.

Do panic_on_warn and reboot_on_panic play well with having a WARN()
in the reboot notifier handling?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds



Re: [PATCH v2 12/18] AMD/IOMMU: allow use of superpage mappings

2021-12-13 Thread Jan Beulich
On 10.12.2021 16:06, Roger Pau Monné wrote:
> On Fri, Sep 24, 2021 at 11:52:14AM +0200, Jan Beulich wrote:
>> ---
>> I'm not fully sure about allowing 512G mappings: The scheduling-for-
>> freeing of intermediate page tables can take quite a while when
>> replacing a tree of 4k mappings by a single 512G one. Plus (or otoh)
>> there's no present code path via which 512G chunks of memory could be
>> allocated (and hence mapped) anyway.
> 
> I would limit to 1G, which is what we support for CPU page tables
> also.

I'm not sure I buy comparing with CPU side support when not sharing
page tables. Not the least with PV in mind.

>> @@ -288,10 +289,31 @@ static int iommu_pde_from_dfn(struct dom
>>  return 0;
>>  }
>>  
>> +static void queue_free_pt(struct domain *d, mfn_t mfn, unsigned int 
>> next_level)
> 
> Nit: should the last parameter be named level rather than next_level?
> AFAICT it's the level of the mfn parameter.

Yeah, might make sense.

> Should we also assert that level (or next_level) is always != 0 for
> extra safety?

As said elsewhere - if this wasn't a static helper, I'd agree. But all
call sites have respective conditionals around the call. If anything
I'd move those checks into the function (but only if you think that
would improve things, as to me having them at the call sites is more
logical).

Jan




Re: [PATCH v2 08/18] IOMMU/x86: support freeing of pagetables

2021-12-13 Thread Jan Beulich
On 10.12.2021 14:51, Roger Pau Monné wrote:
> On Fri, Sep 24, 2021 at 11:48:21AM +0200, Jan Beulich wrote:
>> For vendor specific code to support superpages we need to be able to
>> deal with a superpage mapping replacing an intermediate page table (or
>> hierarchy thereof). Consequently an iommu_alloc_pgtable() counterpart is
>> needed to free individual page tables while a domain is still alive.
>> Since the freeing needs to be deferred until after a suitable IOTLB
>> flush was performed, released page tables get queued for processing by a
>> tasklet.
>>
>> Signed-off-by: Jan Beulich 
>> ---
>> I was considering whether to use a softirq-taklet instead. This would
>> have the benefit of avoiding extra scheduling operations, but come with
>> the risk of the freeing happening prematurely because of a
>> process_pending_softirqs() somewhere.
> 
> The main one that comes to mind would be the debug keys and it's usage
> of process_pending_softirqs that could interfere with iommu unmaps, so
> I guess if only for that reason it's best to run in idle vcpu context.

IOW you support the choice made.

>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -12,6 +12,7 @@
>>   * this program; If not, see .
>>   */
>>  
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -463,6 +464,85 @@ struct page_info *iommu_alloc_pgtable(st
>>  return pg;
>>  }
>>  
>> +/*
>> + * Intermediate page tables which get replaced by large pages may only be
>> + * freed after a suitable IOTLB flush. Hence such pages get queued on a
>> + * per-CPU list, with a per-CPU tasklet processing the list on the 
>> assumption
>> + * that the necessary IOTLB flush will have occurred by the time tasklets 
>> get
>> + * to run. (List and tasklet being per-CPU has the benefit of accesses not
>> + * requiring any locking.)
>> + */
>> +static DEFINE_PER_CPU(struct page_list_head, free_pgt_list);
>> +static DEFINE_PER_CPU(struct tasklet, free_pgt_tasklet);
>> +
>> +static void free_queued_pgtables(void *arg)
>> +{
>> +struct page_list_head *list = arg;
>> +struct page_info *pg;
>> +
>> +while ( (pg = page_list_remove_head(list)) )
>> +free_domheap_page(pg);
> 
> Should you add a preempt check here to yield and schedule the tasklet
> again, in order to be able to process any pending work?

I did ask myself this question, yes, but ...

> Maybe just calling process_pending_softirqs would be enough?

... I think I didn't consider this as a possible simpler variant (compared
to re-scheduling the tasklet). Let me add such - I agree that this should
be sufficient.

Jan