[xen-4.17-testing test] 186135: tolerable FAIL - PUSHED

2024-05-24 Thread osstest service owner
flight 186135 xen-4.17-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186135/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail in 186129 pass in 186135
 test-amd64-coresched-amd64-xl 22 guest-start/debian.repeat fail pass in 186129

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 185864
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 185864
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 185864
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 185864
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 185864
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 185864
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-qcow2 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-raw 15 saverestore-support-check fail never pass

version targeted for testing:
 xen  3c7c9225ffa5605bf0603f9dd1666f3f786e2c44
baseline version:
 xen  effcf70f020ff12d34c80e2abde0ecb00ce92bda

 Last test of basis   185864  2024-04-29 08:08:55 Z   25 days
 Failing since        186063  2024-05-21 10:06:36 Z    3 days    6 attempts
 Testing same since   186069  2024-05-22 01:58:18 Z    3 days    5 attempts


People who touched revisions under test:
  Andrew Cooper 
  Daniel P. Smith 
  Demi Marie Obenour 
  Jan Beulich 
  Jason Andryuk 
  Jason Andryuk 
  Juergen Gross 
  Leigh Brown 
  Roger Pau Monné 
  Ross Lagerwall 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm

[xen-unstable-smoke test] 186142: tolerable all pass - PUSHED

2024-05-24 Thread osstest service owner
flight 186142 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186142/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass

version targeted for testing:
 xen  ac572152e578a8853de0534384c1539ec21f11e0
baseline version:
 xen  2172a01c4cecbaa1d79bad200bfe3b996a3e4ba5

 Last test of basis   186139  2024-05-24 17:00:22 Z    0 days
 Testing same since   186142  2024-05-25 00:02:11 Z    0 days    1 attempts


People who touched revisions under test:
  Henry Wang 
  Julien Grall 
  Stefano Stabellini 
  Stefano Stabellini 
  Vikram Garhwal 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   2172a01c4c..ac572152e5  ac572152e578a8853de0534384c1539ec21f11e0 -> smoke



Re: [PATCH v3 2/4] xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor

2024-05-24 Thread Stefano Stabellini
On Fri, 24 May 2024, Jürgen Groß wrote:
> On 24.05.24 15:58, Julien Grall wrote:
> > Hi Henry,
> > 
> > + Juergen as the Xenstore maintainers. I'd like his opinion on the approach.
> > The documentation of the new logic is in:
> > 
> > https://lore.kernel.org/xen-devel/20240517032156.1490515-5-xin.wa...@amd.com/
> > 
> > FWIW I am happy in principle with the logic (this is what we discussed on
> > the call last week). Some comments below.
> 
> I'm not against this logic, but I'm wondering why it needs to be so
> complicated.

Actually the reason I like it is that in my view, this is the simplest
approach. You allocate a domain, you also allocate the xenstore page
together with it. Initially the xenstore connection has an
"uninitialized" state, as it should be. That's it. At some point, when
xenstored is ready, the state changes to CONNECTED.


> Can't the domU itself allocate the Xenstore page from its RAM pages,
> write the PFN into the Xenstore grant tab entry, and then make it
> public via setting HVM_PARAM_STORE_PFN?

This is not simpler in my view.


> The init-dom0less application could then check HVM_PARAM_STORE_PFN
> being set and call XS_introduce_domain.
> 
> Note that at least C-xenstored does not need the PFN of the Xenstore
> page, as it is just using GNTTAB_RESERVED_XENSTORE for mapping the
> page.

Re: [PATCH v6 7/7] docs: Add device tree overlay documentation

2024-05-24 Thread Julien Grall

Hi Stefano,

On 24/05/2024 23:16, Stefano Stabellini wrote:

From: Vikram Garhwal 

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
  docs/misc/arm/overlay.txt | 82 +++
  1 file changed, 82 insertions(+)
  create mode 100644 docs/misc/arm/overlay.txt

diff --git a/docs/misc/arm/overlay.txt b/docs/misc/arm/overlay.txt
new file mode 100644
index 00..ef3ef792f7
--- /dev/null
+++ b/docs/misc/arm/overlay.txt
@@ -0,0 +1,82 @@
+# Device Tree Overlays support in Xen
+
+Xen experimentally supports dynamic device assignment to running
+domains, i.e. adding/removing nodes (using .dtbo) to/from Xen device
+tree, and attaching them to a running domain with given $domid.
+
+Dynamic node assignment works in two steps:
+
+## Add/Remove device tree overlay to/from Xen device tree
+
+1. Xen tools check the given dtbo and parse all other user-provided arguments.
+2. Xen tools pass the dtbo to the Xen hypervisor via a hypercall.
+3. Xen hypervisor applies/removes the dtbo to/from the Xen device tree.
+
+## Attach device from the DT overlay to domain
+
+1. Xen tools check the given dtbo and parse all other user-provided arguments.
+2. Xen tools pass the dtbo to the Xen hypervisor via a hypercall.
+3. Xen hypervisor attaches the device to the user-provided $domid by
+   mapping node resources in the DT overlay.
+
+# Examples
+
+Here are a few examples of how to use it.
+
+## Dom0 device add
+
+For assigning a device tree overlay to Dom0, the user should first prepare
+the DT overlay properly. More information about device tree overlays can be
+found in [1]. Then, in Dom0, enter the following:
+
+(dom0) xl dt-overlay add overlay.dtbo
+
+This will add the devices described in overlay.dtbo to the Xen device tree.
+
+To assign the newly added device from the dtbo to Dom0:
+
+(dom0) xl dt-overlay attach overlay.dtbo 0
+
+Next, if the user wants to add the same device tree overlay to dom0
+Linux, execute the following:
+
+(dom0) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+(dom0) cat overlay.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be loaded using:
+
+(dom0) modprobe module_name.ko
+
+## DomU device add/remove
+
+All the nodes in the dtbo will be assigned to one domain. The user will need
+to prepare a different dtbo for the domU. For example, the
+`interrupt-parent` property of the DomU overlay should be changed to the
+Xen hardcoded value `0xfde8` and the xen,reg property should be added to
+specify the address mappings. If the domain is not 1:1 mapped, xen,reg
+must be present. See the xen,reg format description in
+docs/misc/arm/passthrough.txt. Below assumes the properly written DomU
+dtbo is `overlay_domu.dtbo`.
+
+You need to set the `passthrough` property in the xl config file if you


s/You need/The user needs/ to match the rest of the documentation.

With that addressed:

Reviewed-by: Julien Grall 

Cheers,

--
Julien Grall



Re: [PATCH v6 5/7] xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains

2024-05-24 Thread Julien Grall

Hi Stefano,

On 24/05/2024 23:16, Stefano Stabellini wrote:

From: Henry Wang 

In order to support the dynamic dtbo device assignment to a running
VM, the add/remove of the DT overlay and the attach/detach of the
device from the DT overlay should happen separately. Therefore,
repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT
overlay to Xen device tree, instead of assigning the device to the
hardware domain at the same time. It is OK to change the sysctl
behavior and break compatibility, as this feature is experimental.

Add the XEN_DOMCTL_dt_overlay with operations
XEN_DOMCTL_DT_OVERLAY_ATTACH to do the device assignment to the domain.

The hypervisor first checks that the DT overlay passed from the toolstack
is valid. Then the device nodes are retrieved from the overlay tracker
based on the DT overlay. The attach of the device is implemented by
mapping the IRQ and IOMMU resources. All devices in the overlay are
assigned to a single domain.

Also take the opportunity to make one coding style fix in sysctl.h.
Introduce DT_OVERLAY_MAX_SIZE and use it to avoid repetitions of
KB(500).

xen,reg is to be used to handle non-1:1 mappings, but it is currently
unsupported. For now, return errors for non-1:1-mapped domains.

Signed-off-by: Henry Wang 
Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 


Acked-by: Julien Grall 

Cheers,

--
Julien Grall



Re: [PATCH v3 2/4] xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor

2024-05-24 Thread Julien Grall

Hi,

On 25/05/2024 00:02, Stefano Stabellini wrote:

On Fri, 24 May 2024, Julien Grall wrote:

Hi Stefano,

On 24/05/2024 23:49, Stefano Stabellini wrote:

On Fri, 24 May 2024, Julien Grall wrote:

Hi Henry,

+ Juergen as the Xenstore maintainers. I'd like his opinion on the
approach.
The documentation of the new logic is in:

https://lore.kernel.org/xen-devel/20240517032156.1490515-5-xin.wa...@amd.com/

FWIW I am happy in principle with the logic (this is what we discussed on
the call last week). Some comments below.

On 17/05/2024 04:21, Henry Wang wrote:

There are use cases (for example using the PV driver) in Dom0less
setup that require Dom0less DomUs start immediately with Dom0, but
initialize XenStore later after Dom0's successful boot and call to
the init-dom0less application.

An error message can be seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

The "magic page" is a terminology used in the toolstack as reserved
pages for the VM to have access to virtual platform capabilities.
Currently the magic pages for Dom0less DomUs are populated by the
init-dom0less app through populate_physmap(), and populate_physmap()
automatically assumes gfn == mfn for 1:1 direct mapped domains. This
cannot be true for the magic pages that are allocated later from the
init-dom0less application executed in Dom0. For domains using statically
allocated memory but not 1:1 direct-mapped, a similar error "failed to
retrieve a reserved page" can be seen as the reserved memory list is
empty at that time.

Since for init-dom0less the magic page region is only used for XenStore,
to solve the above issue this commit allocates the XenStore page for
Dom0less DomUs at the domain construction time. The PFN will be
noted and communicated to the init-dom0less application executed
from Dom0. To keep the XenStore late init protocol, set the connection
status to XENSTORE_RECONNECT.


So this commit is allocating the page, but it will not be used by
init-dom0less until the next patch. But Linux could use it. So would this
break bisection? If so, then I think patch #3 needs to be folded in this
patch.


I think that's fine,


I am not sure what you mean. Are you saying it is ok to break bisection?


No, I meant to say that it is fine to merge on commit.



I'll leave that with you on commit.


I am sorry but I don't think the folding should be done on commit. It should
happen beforehand because the commit message will also need to be updated.


Understood. I'll send one more version with the patches merged (ideally
with an ack?)

Sorry I don't feel it is right for me to ack this patch with the pending
questions from Juergen.


Cheers,

--
Julien Grall



Re: [PATCH v4 2/4] xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor

2024-05-24 Thread Julien Grall

Hi Stefano,

You have sent a new version. But you didn't reply to Juergen's comment.

While he is not the maintainer of the Arm code, he is the Xenstore
maintainer. Even if I agree with this approach, I don't feel it is
right to ack this patch without the pending questions addressed.


In this case, we are changing the sequence for initializing Xenstore for
dom0less domains yet another time. I would rather not have to change it
a 4th time...


I don't think it is a big problem if this is not merged for the code 
freeze as this is technically a bug fix.


Cheers,

On 24/05/2024 23:55, Stefano Stabellini wrote:

From: Henry Wang 

There are use cases (for example using the PV driver) in Dom0less
setup that require Dom0less DomUs start immediately with Dom0, but
initialize XenStore later after Dom0's successful boot and call to
the init-dom0less application.

An error message can be seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

The "magic page" is a terminology used in the toolstack as reserved
pages for the VM to have access to virtual platform capabilities.
Currently the magic pages for Dom0less DomUs are populated by the
init-dom0less app through populate_physmap(), and populate_physmap()
automatically assumes gfn == mfn for 1:1 direct mapped domains. This
cannot be true for the magic pages that are allocated later from the
init-dom0less application executed in Dom0. For domains using statically
allocated memory but not 1:1 direct-mapped, a similar error "failed to
retrieve a reserved page" can be seen as the reserved memory list is
empty at that time.

Since for init-dom0less the magic page region is only used for XenStore,
to solve the above issue this commit allocates the XenStore page for
Dom0less DomUs at the domain construction time. The PFN will be
noted and communicated to the init-dom0less application executed
from Dom0. To keep the XenStore late init protocol, set the connection
status to XENSTORE_RECONNECT.

Reported-by: Alec Kwapis 
Suggested-by: Daniel P. Smith 
Signed-off-by: Henry Wang 
Signed-off-by: Stefano Stabellini 
---
  xen/arch/arm/dom0less-build.c | 55 ++-
  1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242..2963ecc0b3 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -1,5 +1,6 @@
  /* SPDX-License-Identifier: GPL-2.0-only */
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -10,6 +11,8 @@
  #include 
  #include 
  
+#include 
+
  #include 
  #include 
  #include 
@@ -739,6 +742,53 @@ static int __init alloc_xenstore_evtchn(struct domain *d)
  return 0;
  }
  
+#define XENSTORE_PFN_OFFSET 1
+static int __init alloc_xenstore_page(struct domain *d)
+{
+struct page_info *xenstore_pg;
+struct xenstore_domain_interface *interface;
+mfn_t mfn;
+gfn_t gfn;
+int rc;
+
+if ( (UINT_MAX - d->max_pages) < 1 )
+{
+printk(XENLOG_ERR "%pd: Over-allocation for d->max_pages by 1 page.\n",
+   d);
+return -EINVAL;
+}
+d->max_pages += 1;
+xenstore_pg = alloc_domheap_page(d, MEMF_bits(32));
+if ( xenstore_pg == NULL && is_64bit_domain(d) )
+xenstore_pg = alloc_domheap_page(d, 0);
+if ( xenstore_pg == NULL )
+return -ENOMEM;
+
+mfn = page_to_mfn(xenstore_pg);
+if ( !mfn_x(mfn) )
+return -ENOMEM;
+
+if ( !is_domain_direct_mapped(d) )
+gfn = gaddr_to_gfn(GUEST_MAGIC_BASE +
+   (XENSTORE_PFN_OFFSET << PAGE_SHIFT));
+else
+gfn = gaddr_to_gfn(mfn_to_maddr(mfn));
+
+rc = guest_physmap_add_page(d, gfn, mfn, 0);
+if ( rc )
+{
+free_domheap_page(xenstore_pg);
+return rc;
+}
+
+d->arch.hvm.params[HVM_PARAM_STORE_PFN] = gfn_x(gfn);
+interface = map_domain_page(mfn);
+interface->connection = XENSTORE_RECONNECT;
+unmap_domain_page(interface);
+
+return 0;
+}
+
  static int __init construct_domU(struct domain *d,
   const struct dt_device_node *node)
  {
@@ -839,7 +889,10 @@ static int __init construct_domU(struct domain *d,
  rc = alloc_xenstore_evtchn(d);
  if ( rc < 0 )
  return rc;
-d->arch.hvm.params[HVM_PARAM_STORE_PFN] = ~0ULL;
+
+rc = alloc_xenstore_page(d);
+if ( rc < 0 )
+return rc;
  }
  
  return rc;


--
Julien Grall



Re: [PATCH v3 2/4] xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor

2024-05-24 Thread Stefano Stabellini
On Fri, 24 May 2024, Julien Grall wrote:
> Hi Stefano,
> 
> On 24/05/2024 23:49, Stefano Stabellini wrote:
> > On Fri, 24 May 2024, Julien Grall wrote:
> > > Hi Henry,
> > > 
> > > + Juergen as the Xenstore maintainers. I'd like his opinion on the
> > > approach.
> > > The documentation of the new logic is in:
> > > 
> > > https://lore.kernel.org/xen-devel/20240517032156.1490515-5-xin.wa...@amd.com/
> > > 
> > > FWIW I am happy in principle with the logic (this is what we discussed on
> > > the
> > > call last week). Some comments below.
> > > 
> > > On 17/05/2024 04:21, Henry Wang wrote:
> > > > There are use cases (for example using the PV driver) in Dom0less
> > > > setup that require Dom0less DomUs start immediately with Dom0, but
> > > > initialize XenStore later after Dom0's successful boot and call to
> > > > the init-dom0less application.
> > > > 
> > > > An error message can be seen from the init-dom0less application on
> > > > 1:1 direct-mapped domains:
> > > > ```
> > > > Allocating magic pages
> > > > memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
> > > > Error on alloc magic pages
> > > > ```
> > > > 
> > > > The "magic page" is a terminology used in the toolstack as reserved
> > > > pages for the VM to have access to virtual platform capabilities.
> > > > Currently the magic pages for Dom0less DomUs are populated by the
> > > > init-dom0less app through populate_physmap(), and populate_physmap()
> > > > automatically assumes gfn == mfn for 1:1 direct mapped domains. This
> > > > cannot be true for the magic pages that are allocated later from the
> > > > init-dom0less application executed in Dom0. For domains using statically
> > > > allocated memory but not 1:1 direct-mapped, a similar error "failed to
> > > > retrieve a reserved page" can be seen as the reserved memory list is
> > > > empty at that time.
> > > > 
> > > > Since for init-dom0less the magic page region is only used for XenStore,
> > > > to solve the above issue this commit allocates the XenStore page for
> > > > Dom0less DomUs at the domain construction time. The PFN will be
> > > > noted and communicated to the init-dom0less application executed
> > > > from Dom0. To keep the XenStore late init protocol, set the connection
> > > > status to XENSTORE_RECONNECT.
> > > 
> > > So this commit is allocating the page, but it will not be used by
> > > init-dom0less until the next patch. But Linux could use it. So would this
> > > break bisection? If so, then I think patch #3 needs to be folded in this
> > > patch.
> > 
> > I think that's fine, 
> 
> I am not sure what you mean. Are you saying it is ok to break bisection?

No, I meant to say that it is fine to merge on commit.


> > I'll leave that with you on commit.
> 
> I am sorry but I don't think the folding should be done on commit. It should
> happen beforehand because the commit message will also need to be updated.

Understood. I'll send one more version with the patches merged (ideally
with an ack?)



Re: [PATCH v3 2/4] xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor

2024-05-24 Thread Julien Grall

Hi Stefano,

On 24/05/2024 23:49, Stefano Stabellini wrote:

On Fri, 24 May 2024, Julien Grall wrote:

Hi Henry,

+ Juergen as the Xenstore maintainers. I'd like his opinion on the approach.
The documentation of the new logic is in:

https://lore.kernel.org/xen-devel/20240517032156.1490515-5-xin.wa...@amd.com/

FWIW I am happy in principle with the logic (this is what we discussed on the
call last week). Some comments below.

On 17/05/2024 04:21, Henry Wang wrote:

There are use cases (for example using the PV driver) in Dom0less
setup that require Dom0less DomUs start immediately with Dom0, but
initialize XenStore later after Dom0's successful boot and call to
the init-dom0less application.

An error message can be seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

The "magic page" is a terminology used in the toolstack as reserved
pages for the VM to have access to virtual platform capabilities.
Currently the magic pages for Dom0less DomUs are populated by the
init-dom0less app through populate_physmap(), and populate_physmap()
automatically assumes gfn == mfn for 1:1 direct mapped domains. This
cannot be true for the magic pages that are allocated later from the
init-dom0less application executed in Dom0. For domains using statically
allocated memory but not 1:1 direct-mapped, a similar error "failed to
retrieve a reserved page" can be seen as the reserved memory list is
empty at that time.

Since for init-dom0less the magic page region is only used for XenStore,
to solve the above issue this commit allocates the XenStore page for
Dom0less DomUs at the domain construction time. The PFN will be
noted and communicated to the init-dom0less application executed
from Dom0. To keep the XenStore late init protocol, set the connection
status to XENSTORE_RECONNECT.


So this commit is allocating the page, but it will not be used by
init-dom0less until the next patch. But Linux could use it. So would this
break bisection? If so, then I think patch #3 needs to be folded in this
patch.


I think that's fine, 


I am not sure what you mean. Are you saying it is ok to break bisection?


I'll leave that with you on commit.


I am sorry but I don't think the folding should be done on commit. It
should happen beforehand because the commit message will also need to
be updated.


Cheers,

--
Julien Grall



[PATCH v4 2/4] xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor

2024-05-24 Thread Stefano Stabellini
From: Henry Wang 

There are use cases (for example using the PV driver) in Dom0less
setup that require Dom0less DomUs start immediately with Dom0, but
initialize XenStore later after Dom0's successful boot and call to
the init-dom0less application.

An error message can be seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

The "magic page" is a terminology used in the toolstack as reserved
pages for the VM to have access to virtual platform capabilities.
Currently the magic pages for Dom0less DomUs are populated by the
init-dom0less app through populate_physmap(), and populate_physmap()
automatically assumes gfn == mfn for 1:1 direct mapped domains. This
cannot be true for the magic pages that are allocated later from the
init-dom0less application executed in Dom0. For domains using statically
allocated memory but not 1:1 direct-mapped, a similar error "failed to
retrieve a reserved page" can be seen as the reserved memory list is
empty at that time.

Since for init-dom0less the magic page region is only used for XenStore,
to solve the above issue this commit allocates the XenStore page for
Dom0less DomUs at the domain construction time. The PFN will be
noted and communicated to the init-dom0less application executed
from Dom0. To keep the XenStore late init protocol, set the connection
status to XENSTORE_RECONNECT.

Reported-by: Alec Kwapis 
Suggested-by: Daniel P. Smith 
Signed-off-by: Henry Wang 
Signed-off-by: Stefano Stabellini 
---
 xen/arch/arm/dom0less-build.c | 55 ++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242..2963ecc0b3 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -10,6 +11,8 @@
 #include 
 #include 
 
+#include 
+
 #include 
 #include 
 #include 
@@ -739,6 +742,53 @@ static int __init alloc_xenstore_evtchn(struct domain *d)
 return 0;
 }
 
+#define XENSTORE_PFN_OFFSET 1
+static int __init alloc_xenstore_page(struct domain *d)
+{
+struct page_info *xenstore_pg;
+struct xenstore_domain_interface *interface;
+mfn_t mfn;
+gfn_t gfn;
+int rc;
+
+if ( (UINT_MAX - d->max_pages) < 1 )
+{
+printk(XENLOG_ERR "%pd: Over-allocation for d->max_pages by 1 page.\n",
+   d);
+return -EINVAL;
+}
+d->max_pages += 1;
+xenstore_pg = alloc_domheap_page(d, MEMF_bits(32));
+if ( xenstore_pg == NULL && is_64bit_domain(d) )
+xenstore_pg = alloc_domheap_page(d, 0);
+if ( xenstore_pg == NULL )
+return -ENOMEM;
+
+mfn = page_to_mfn(xenstore_pg);
+if ( !mfn_x(mfn) )
+return -ENOMEM;
+
+if ( !is_domain_direct_mapped(d) )
+gfn = gaddr_to_gfn(GUEST_MAGIC_BASE +
+   (XENSTORE_PFN_OFFSET << PAGE_SHIFT));
+else
+gfn = gaddr_to_gfn(mfn_to_maddr(mfn));
+
+rc = guest_physmap_add_page(d, gfn, mfn, 0);
+if ( rc )
+{
+free_domheap_page(xenstore_pg);
+return rc;
+}
+
+d->arch.hvm.params[HVM_PARAM_STORE_PFN] = gfn_x(gfn);
+interface = map_domain_page(mfn);
+interface->connection = XENSTORE_RECONNECT;
+unmap_domain_page(interface);
+
+return 0;
+}
+
 static int __init construct_domU(struct domain *d,
  const struct dt_device_node *node)
 {
@@ -839,7 +889,10 @@ static int __init construct_domU(struct domain *d,
 rc = alloc_xenstore_evtchn(d);
 if ( rc < 0 )
 return rc;
-d->arch.hvm.params[HVM_PARAM_STORE_PFN] = ~0ULL;
+
+rc = alloc_xenstore_page(d);
+if ( rc < 0 )
+return rc;
 }
 
 return rc;
-- 
2.25.1




[PATCH v4 3/4] tools/init-dom0less: Avoid hardcoding GUEST_MAGIC_BASE

2024-05-24 Thread Stefano Stabellini
From: Henry Wang 

Currently the GUEST_MAGIC_BASE in the init-dom0less application is
hardcoded, which will lead to failures for 1:1 direct-mapped Dom0less
DomUs.

Since the guest magic region allocation from init-dom0less is for
XenStore, and the XenStore page is now allocated from the hypervisor,
instead of hardcoding the guest magic pages region, use
xc_hvm_param_get() to get the XenStore page PFN. Rename alloc_xs_page()
to get_xs_page() to reflect the changes.

With this change, some existing code is not needed anymore, including:
(1) The definition of the XenStore page offset.
(2) Call to xc_domain_setmaxmem() and xc_clear_domain_page() as we
don't need to set the max mem and clear the page anymore.
(3) Foreign mapping of the XenStore page, setting of XenStore interface
status and HVM_PARAM_STORE_PFN from init-dom0less, as they are set
by the hypervisor.

Take the opportunity to do some coding style improvements when possible.

Reported-by: Alec Kwapis 
Signed-off-by: Henry Wang 
Reviewed-by: Jason Andryuk 
CC: anth...@xenproject.org
---
 tools/helpers/init-dom0less.c | 58 +--
 1 file changed, 14 insertions(+), 44 deletions(-)

diff --git a/tools/helpers/init-dom0less.c b/tools/helpers/init-dom0less.c
index fee93459c4..2b51965fa7 100644
--- a/tools/helpers/init-dom0less.c
+++ b/tools/helpers/init-dom0less.c
@@ -16,30 +16,18 @@
 
 #include "init-dom-json.h"
 
-#define XENSTORE_PFN_OFFSET 1
 #define STR_MAX_LENGTH 128
 
-static int alloc_xs_page(struct xc_interface_core *xch,
- libxl_dominfo *info,
- uint64_t *xenstore_pfn)
+static int get_xs_page(struct xc_interface_core *xch, libxl_dominfo *info,
+   uint64_t *xenstore_pfn)
 {
 int rc;
-const xen_pfn_t base = GUEST_MAGIC_BASE >> XC_PAGE_SHIFT;
-xen_pfn_t p2m = (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET;
 
-rc = xc_domain_setmaxmem(xch, info->domid,
- info->max_memkb + (XC_PAGE_SIZE/1024));
-if (rc < 0)
-return rc;
-
-rc = xc_domain_populate_physmap_exact(xch, info->domid, 1, 0, 0, &p2m);
-if (rc < 0)
-return rc;
-
-*xenstore_pfn = base + XENSTORE_PFN_OFFSET;
-rc = xc_clear_domain_page(xch, info->domid, *xenstore_pfn);
-if (rc < 0)
-return rc;
+rc = xc_hvm_param_get(xch, info->domid, HVM_PARAM_STORE_PFN, xenstore_pfn);
+if (rc < 0) {
+printf("Failed to get HVM_PARAM_STORE_PFN\n");
+return 1;
+}
 
 return 0;
 }
@@ -100,6 +88,7 @@ static bool do_xs_write_vm(struct xs_handle *xsh, 
xs_transaction_t t,
  */
 static int create_xenstore(struct xs_handle *xsh,
libxl_dominfo *info, libxl_uuid uuid,
+   uint64_t xenstore_pfn,
evtchn_port_t xenstore_port)
 {
 domid_t domid;
@@ -145,8 +134,7 @@ static int create_xenstore(struct xs_handle *xsh,
 rc = snprintf(target_memkb_str, STR_MAX_LENGTH, "%"PRIu64, 
info->current_memkb);
 if (rc < 0 || rc >= STR_MAX_LENGTH)
 return rc;
-rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%lld",
-  (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET);
+rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%"PRIu64, xenstore_pfn);
 if (rc < 0 || rc >= STR_MAX_LENGTH)
 return rc;
 rc = snprintf(xenstore_port_str, STR_MAX_LENGTH, "%u", xenstore_port);
@@ -230,7 +218,6 @@ static int init_domain(struct xs_handle *xsh,
 libxl_uuid uuid;
 uint64_t xenstore_evtchn, xenstore_pfn;
 int rc;
-struct xenstore_domain_interface *intf;
 
 printf("Init dom0less domain: %u\n", info->domid);
 
@@ -245,20 +232,11 @@ static int init_domain(struct xs_handle *xsh,
 if (!xenstore_evtchn)
 return 0;
 
-/* Alloc xenstore page */
-if (alloc_xs_page(xch, info, &xenstore_pfn) != 0) {
-printf("Error on alloc magic pages\n");
-return 1;
-}
-
-intf = xenforeignmemory_map(xfh, info->domid, PROT_READ | PROT_WRITE, 1,
-&xenstore_pfn, NULL);
-if (!intf) {
-printf("Error mapping xenstore page\n");
+/* Get xenstore page */
+if (get_xs_page(xch, info, &xenstore_pfn) != 0) {
+printf("Error on getting xenstore page\n");
 return 1;
 }
-intf->connection = XENSTORE_RECONNECT;
-xenforeignmemory_unmap(xfh, intf, 1);
 
 rc = xc_dom_gnttab_seed(xch, info->domid, true,
 (xen_pfn_t)-1, xenstore_pfn, 0, 0);
@@ -272,19 +250,11 @@ static int init_domain(struct xs_handle *xsh,
 if (rc)
 err(1, "gen_stub_json_config");
 
-/* Now everything is ready: set HVM_PARAM_STORE_PFN */
-rc = xc_hvm_param_set(xch, info->domid, HVM_PARAM_STORE_PFN,
-  xenstore_pfn);
-if (rc < 0)
-return rc;
-
-rc = create_xenstore(xsh, info, uuid, xenstore_evtchn);
+rc = create_xenstore(xsh, info, uuid, xenstore_pfn, xenstore_evtchn);

[PATCH v4 0/4] Guest XenStore page allocation for 11 Dom0less domUs

2024-05-24 Thread Stefano Stabellini
Hi all,

This series tries to fix the reported guest magic region allocation
issue for 11 Dom0less domUs. An error message can be seen from the
init-dom0less application on 1:1 direct-mapped Dom0less DomUs:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

This is because populate_physmap() automatically assumes gfn == mfn
for direct mapped domains. This cannot be true for the magic pages
that are allocated later for 1:1 Dom0less DomUs from the init-dom0less
helper application executed in Dom0. For domains using statically
allocated memory but not 1:1 direct-mapped, a similar error "failed to
retrieve a reserved page" can be seen, as the reserved memory list
is empty at that time.

In [1] I tried to fix this issue with a domctl approach, but the
discussions in [2] and [3] indicate that a domctl is not really
necessary, as we can reduce the issue to "allocate the Dom0less
guest magic regions at the Dom0less DomU build time and pass the
region base PFN to the init-dom0less application". The later
discussion [4] reached an agreement that we only need to allocate
one single page for XenStore, and set HVM_PARAM_STORE_PFN from the
hypervisor with some Linux XenStore late init protocol improvements.
Therefore, this series tries to fix the issue based on all discussions.
The first patch puts a restriction that static shared memory on
direct-mapped DomUs should also be direct-mapped, as otherwise it will
clash [5]. Patch 2 allocates the XenStore page from Xen and sets the
initial connection status to XENSTORE_RECONNECT. Patch 3 updates the
init-dom0less application with all of the changes. Patch 4 updates the
documentation to reflect the changes introduced by this series.

**NOTE**: This series should work with the Linux change [6].

[1] https://lore.kernel.org/xen-devel/20240409045357.236802-1-xin.wa...@amd.com/
[2] 
https://lore.kernel.org/xen-devel/c7857223-eab8-409a-b618-6ec70f616...@apertussolutions.com/
[3] 
https://lore.kernel.org/xen-devel/alpine.DEB.2.22.394.2404251508470.3940@ubuntu-linux-20-04-desktop/
[4] 
https://lore.kernel.org/xen-devel/d33ea00d-890d-45cc-9583-64c953abd...@xen.org/
[5] 
https://lore.kernel.org/xen-devel/686ba256-f8bf-47e7-872f-d277bf7df...@xen.org/
[6] 
https://lore.kernel.org/xen-devel/20240517011516.1451087-1-xin.wa...@amd.com/

Henry Wang (4):
  xen/arm/static-shmem: Static-shmem should be direct-mapped for
direct-mapped domains
  xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor
  tools/init-dom0less: Avoid hardcoding GUEST_MAGIC_BASE
  docs/features/dom0less: Update the late XenStore init protocol




[PATCH v4 4/4] docs/features/dom0less: Update the late XenStore init protocol

2024-05-24 Thread Stefano Stabellini
From: Henry Wang 

With the new allocation strategy of Dom0less DomUs XenStore page,
update the doc of the late XenStore init protocol accordingly.

Signed-off-by: Henry Wang 
---
 docs/features/dom0less.pandoc | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/features/dom0less.pandoc b/docs/features/dom0less.pandoc
index 725afa0558..8b178edee0 100644
--- a/docs/features/dom0less.pandoc
+++ b/docs/features/dom0less.pandoc
@@ -110,9 +110,10 @@ hotplug PV drivers to dom0less guests. E.g. xl network-attach domU.
 The implementation works as follows:
 - Xen allocates the xenstore event channel for each dom0less domU that
   has the "xen,enhanced" property, and sets HVM_PARAM_STORE_EVTCHN
-- Xen does *not* allocate the xenstore page and sets HVM_PARAM_STORE_PFN
-  to ~0ULL (invalid)
-- Dom0less domU kernels check that HVM_PARAM_STORE_PFN is set to invalid
+- Xen allocates the xenstore page and sets HVM_PARAM_STORE_PFN as well
+  as the connection status to XENSTORE_RECONNECT.
+- Dom0less domU kernels check that HVM_PARAM_STORE_PFN is set to
+  ~0ULL (invalid) or the connection status is *not* XENSTORE_CONNECTED.
 - Old kernels will continue without xenstore support (Note: some old
   buggy kernels might crash because they don't check the validity of
   HVM_PARAM_STORE_PFN before using it! Disable "xen,enhanced" in
@@ -121,13 +122,14 @@ The implementation works as follows:
   channel (HVM_PARAM_STORE_EVTCHN) before continuing with the
   initialization
 - Once dom0 is booted, init-dom0less is executed:
-- it allocates the xenstore shared page and sets HVM_PARAM_STORE_PFN
+- it gets the xenstore shared page from HVM_PARAM_STORE_PFN
 - it calls xs_introduce_domain
 - Xenstored notices the new domain, initializes interfaces as usual, and
   sends an event channel notification to the domain using the xenstore
   event channel (HVM_PARAM_STORE_EVTCHN)
 - The Linux domU kernel receives the event channel notification, checks
-  HVM_PARAM_STORE_PFN again and continue with the initialization
+  HVM_PARAM_STORE_PFN and the connection status again and continues with
+  the initialization
 
 
 Limitations
-- 
2.25.1




[PATCH v4 1/4] xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains

2024-05-24 Thread Stefano Stabellini
From: Henry Wang 

Currently, users are allowed to map static shared memory in a
non-direct-mapped way for direct-mapped domains. This can lead to
clashing of guest memory spaces. Also, the current extended region
finding logic only removes the host physical addresses of the
static shared memory areas for direct-mapped domains, which may be
inconsistent with the guest memory map if users map the static
shared memory in a non-direct-mapped way. This will lead to incorrect
extended region calculation results.

To make things easier, add restriction that static shared memory
should also be direct-mapped for direct-mapped domains. Check the
host physical address to be matched with guest physical address when
parsing the device tree. Document this restriction in the doc.

Signed-off-by: Henry Wang 
Signed-off-by: Stefano Stabellini 
Acked-by: Michal Orzel 
---
 docs/misc/arm/device-tree/booting.txt | 3 +++
 xen/arch/arm/static-shmem.c   | 6 ++
 2 files changed, 9 insertions(+)

diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
index bbd955e9c2..c994e48391 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -591,6 +591,9 @@ communication.
 shared memory region in host physical address space, a size, and a guest
 physical address, as the target address of the mapping.
 e.g. xen,shared-mem = < [host physical address] [guest address] [size] >
+Note that if a domain is direct-mapped, i.e. Dom0 and Dom0less
+DomUs with the `direct-map` device tree property, the static shared memory
+should also be direct-mapped (host physical address == guest address).
 
 It shall also meet the following criteria:
 1) If the SHM ID matches with an existing region, the address range of the
diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
index 78881dd1d3..5bf1916e5e 100644
--- a/xen/arch/arm/static-shmem.c
+++ b/xen/arch/arm/static-shmem.c
@@ -235,6 +235,12 @@ int __init process_shm(struct domain *d, struct kernel_info *kinfo,
d, psize);
 return -EINVAL;
 }
+if ( is_domain_direct_mapped(d) && (pbase != gbase) )
+{
+printk("%pd: physical address 0x%"PRIpaddr" and guest address 0x%"PRIpaddr" are not direct-mapped.\n",
+   d, pbase, gbase);
+return -EINVAL;
+}
 
 for ( i = 0; i < PFN_DOWN(psize); i++ )
 if ( !mfn_valid(mfn_add(maddr_to_mfn(pbase), i)) )
-- 
2.25.1




Re: [PATCH v3 3/4] tools/init-dom0less: Avoid hardcoding GUEST_MAGIC_BASE

2024-05-24 Thread Stefano Stabellini
On Mon, 20 May 2024, Jason Andryuk wrote:
> On 2024-05-16 23:21, Henry Wang wrote:
> > Currently the GUEST_MAGIC_BASE in the init-dom0less application is
> > hardcoded, which will lead to failures for 1:1 direct-mapped Dom0less
> > DomUs.
> > 
> > Since the guest magic region allocation from init-dom0less is for
> > XenStore, and the XenStore page is now allocated from the hypervisor,
> > instead of hardcoding the guest magic pages region, use
> > xc_hvm_param_get() to get the XenStore page PFN. Rename alloc_xs_page()
> > to get_xs_page() to reflect the changes.
> > 
> > With this change, some existing code is not needed anymore, including:
> > (1) The definition of the XenStore page offset.
> > (2) Call to xc_domain_setmaxmem() and xc_clear_domain_page() as we
> >  don't need to set the max mem and clear the page anymore.
> > (3) Foreign mapping of the XenStore page, setting of XenStore interface
> >  status and HVM_PARAM_STORE_PFN from init-dom0less, as they are set
> >  by the hypervisor.
> > 
> > Take the opportunity to do some coding style improvements when possible.
> > 
> > Reported-by: Alec Kwapis 
> > Signed-off-by: Henry Wang 
> 
> Reviewed-by: Jason Andryuk 

Reviewed-by: Stefano Stabellini 



Re: [PATCH v3 4/4] docs/features/dom0less: Update the late XenStore init protocol

2024-05-24 Thread Stefano Stabellini
On Fri, 17 May 2024, Henry Wang wrote:
> With the new allocation strategy of Dom0less DomUs XenStore page,
> update the doc of the late XenStore init protocol accordingly.
> 
> Signed-off-by: Henry Wang 

Reviewed-by: Stefano Stabellini 


> ---
> v3:
> - Wording change.
> v2:
> - New patch.
> ---
>  docs/features/dom0less.pandoc | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/docs/features/dom0less.pandoc b/docs/features/dom0less.pandoc
> index 725afa0558..8b178edee0 100644
> --- a/docs/features/dom0less.pandoc
> +++ b/docs/features/dom0less.pandoc
> @@ -110,9 +110,10 @@ hotplug PV drivers to dom0less guests. E.g. xl 
> network-attach domU.
>  The implementation works as follows:
>  - Xen allocates the xenstore event channel for each dom0less domU that
>has the "xen,enhanced" property, and sets HVM_PARAM_STORE_EVTCHN
> -- Xen does *not* allocate the xenstore page and sets HVM_PARAM_STORE_PFN
> -  to ~0ULL (invalid)
> -- Dom0less domU kernels check that HVM_PARAM_STORE_PFN is set to invalid
> +- Xen allocates the xenstore page and sets HVM_PARAM_STORE_PFN as well
> +  as the connection status to XENSTORE_RECONNECT.
> +- Dom0less domU kernels check that HVM_PARAM_STORE_PFN is set to
> +  ~0ULL (invalid) or the connection status is *not* XENSTORE_CONNECTED.
>  - Old kernels will continue without xenstore support (Note: some old
>buggy kernels might crash because they don't check the validity of
>HVM_PARAM_STORE_PFN before using it! Disable "xen,enhanced" in
> @@ -121,13 +122,14 @@ The implementation works as follows:
>channel (HVM_PARAM_STORE_EVTCHN) before continuing with the
>initialization
>  - Once dom0 is booted, init-dom0less is executed:
> -- it allocates the xenstore shared page and sets HVM_PARAM_STORE_PFN
> +- it gets the xenstore shared page from HVM_PARAM_STORE_PFN
>  - it calls xs_introduce_domain
>  - Xenstored notices the new domain, initializes interfaces as usual, and
>sends an event channel notification to the domain using the xenstore
>event channel (HVM_PARAM_STORE_EVTCHN)
>  - The Linux domU kernel receives the event channel notification, checks
> -  HVM_PARAM_STORE_PFN again and continue with the initialization
> +  HVM_PARAM_STORE_PFN and the connection status again and continue with
> +  the initialization
>  
>  
>  Limitations
> -- 
> 2.34.1
> 
> 



Re: [PATCH v3 2/4] xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor

2024-05-24 Thread Stefano Stabellini
On Fri, 24 May 2024, Julien Grall wrote:
> Hi Henry,
> 
> + Juergen as the Xenstore maintainers. I'd like his opinion on the approach.
> The documentation of the new logic is in:
> 
> https://lore.kernel.org/xen-devel/20240517032156.1490515-5-xin.wa...@amd.com/
> 
> FWIW I am happy in principle with the logic (this is what we discussed on the
> call last week). Some comments below.
> 
> On 17/05/2024 04:21, Henry Wang wrote:
> > There are use cases (for example using the PV driver) in Dom0less
> > setup that require Dom0less DomUs start immediately with Dom0, but
> > initialize XenStore later after Dom0's successful boot and call to
> > the init-dom0less application.
> > 
> > An error message can be seen from the init-dom0less application on
> > 1:1 direct-mapped domains:
> > ```
> > Allocating magic pages
> > memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
> > Error on alloc magic pages
> > ```
> > 
> > The "magic page" is a terminology used in the toolstack as reserved
> > pages for the VM to have access to virtual platform capabilities.
> > Currently the magic pages for Dom0less DomUs are populated by the
> > init-dom0less app through populate_physmap(), and populate_physmap()
> > automatically assumes gfn == mfn for 1:1 direct mapped domains. This
> > cannot be true for the magic pages that are allocated later from the
> > init-dom0less application executed in Dom0. For domain using statically
> > allocated memory but not 1:1 direct-mapped, similar error "failed to
> > retrieve a reserved page" can be seen as the reserved memory list is
> > empty at that time.
> > 
> > Since for init-dom0less the magic page region is only used for XenStore,
> > to solve the above issue this commit allocates the XenStore page for
> > Dom0less DomUs at domain construction time. The PFN will be
> > noted and communicated to the init-dom0less application executed
> > from Dom0. To keep the XenStore late init protocol, set the connection
> > status to XENSTORE_RECONNECT.
> 
> So this commit is allocating the page, but it will not be used by
> init-dom0less until the next patch. But Linux could use it. So would this
> break bisection? If so, then I think patch #3 needs to be folded in this
> patch.

I think that's fine, I'll leave that with you on commit. I'll resend as
is addressing the other comments.


> > 
> > Reported-by: Alec Kwapis 
> > Suggested-by: Daniel P. Smith 
> > Signed-off-by: Henry Wang 
> > ---
> > v3:
> > - Only allocate XenStore page. (Julien)
> > - Set HVM_PARAM_STORE_PFN and the XenStore connection status directly
> >from hypervisor. (Stefano)
> > v2:
> > - Reword the commit msg to explain what is "magic page" and use generic
> >terminology "hypervisor reserved pages" in commit msg. (Daniel)
> > - Also move the offset definition of magic pages. (Michal)
> > - Extract the magic page allocation logic to a function. (Michal)
> > ---
> >   xen/arch/arm/dom0less-build.c | 44 ++-
> >   1 file changed, 43 insertions(+), 1 deletion(-)
> > 
> > diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
> > index 74f053c242..95c4fd1a2d 100644
> > --- a/xen/arch/arm/dom0less-build.c
> > +++ b/xen/arch/arm/dom0less-build.c
> > @@ -1,5 +1,6 @@
> >   /* SPDX-License-Identifier: GPL-2.0-only */
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >   #include 
> > @@ -10,6 +11,8 @@
> >   #include 
> >   #include 
> >   +#include 
> > +
> >   #include 
> >   #include 
> >   #include 
> > @@ -739,6 +742,42 @@ static int __init alloc_xenstore_evtchn(struct domain
> > *d)
> >   return 0;
> >   }
> >   +#define XENSTORE_PFN_OFFSET 1
> > +static int __init alloc_xenstore_page(struct domain *d)
> > +{
> > +struct page_info *xenstore_pg;
> > +struct xenstore_domain_interface *interface;
> > +mfn_t mfn;
> > +gfn_t gfn;
> > +int rc;
> > +
> > +d->max_pages += 1;
> 
> Sorry I should have spotted it earlier. But you want to check d->max_pages is
> not overflowing. You can look at acquire_shared_memory_bank() for how to do
> it.
> 
> Also, maybe we want an helper to do it so it is not open-coded in multiple
> places.

Makes sense. I open-coded it because I wasn't sure if you preferred a
static inline or a macro and where to add the implementation.


> > +xenstore_pg = alloc_domheap_page(d, 0);
> 
> I think we may want to restrict where the page is allocated. For instance,
> 32-bit domain using short page-tables will not be able to address all the
> physical memory.
> 
> I would consider trying to allocate the page below 32-bit (using
> MEMF_bits(32)). And then fall back to above 32-bit only for 64-bit domains.

done


> Also, just to note that in theory alloc_domheap_page() could return MFN 0. In
> practice we have excluded MFN 0 because it breaks the page allocator so far.
> 
> But I would still prefer if we add a check on the MFN. This will make easier
> to spot any issue if we ever give MFN 0 to the allocator.

Good idea, but for 

Re: [PATCH v3 1/4] xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains

2024-05-24 Thread Stefano Stabellini
On Tue, 21 May 2024, Henry Wang wrote:
> Hi Michal,
> 
> On 5/21/2024 12:09 AM, Michal Orzel wrote:
> > > > > Thanks. I will take the tag if you are ok with above diff (for the
> > > > > case
> > > > > if this series goes in later than Luca's).
> > > > I would move this check to process_shm() right after "gbase =
> > > > dt_read_paddr" setting.
> > > > This would be the most natural placement for such a check.
> > > That sounds good. Thanks! IIUC we only need to add the check for the
> > > pbase != INVALID_PADDR case correct?
> > Yes, but at the same time I wonder whether we should also return error if a
> > user omits pbase
> > for direct mapped domain.
> 
> I think this makes sense. So I will add also a check for the case if users
> omit pbase in the device tree for the direct mapped domain.

I fixed the NIT and added the ack, but as Luca's series hasn't been
committed yet, I have not made this change. I'll leave it to Julien when
he commits both series.



[PATCH v6 7/7] docs: Add device tree overlay documentation

2024-05-24 Thread Stefano Stabellini
From: Vikram Garhwal 

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
 docs/misc/arm/overlay.txt | 82 +++
 1 file changed, 82 insertions(+)
 create mode 100644 docs/misc/arm/overlay.txt

diff --git a/docs/misc/arm/overlay.txt b/docs/misc/arm/overlay.txt
new file mode 100644
index 00..ef3ef792f7
--- /dev/null
+++ b/docs/misc/arm/overlay.txt
@@ -0,0 +1,82 @@
+# Device Tree Overlays support in Xen
+
+Xen experimentally supports dynamic device assignment to running
+domains, i.e. adding/removing nodes (using .dtbo) to/from the Xen device
+tree, and attaching them to a running domain with a given $domid.
+
+Dynamic node assignment works in two steps:
+
+## Add/Remove device tree overlay to/from Xen device tree
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor applies/removes the dtbo to/from Xen device tree.
+
+## Attach device from the DT overlay to domain
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor attaches the device to the user-provided $domid by
+   mapping node resources in the DT overlay.
+
+# Examples
+
+Here are a few examples on how to use it.
+
+## Dom0 device add
+
+For assigning a device tree overlay to Dom0, the user should first properly
+prepare the DT overlay. More information about device tree overlays can be
+found in [1]. Then, in Dom0, enter the following:
+
+(dom0) xl dt-overlay add overlay.dtbo
+
+This will add the devices mentioned in overlay.dtbo to the Xen device tree.
+
+To assign the newly added device from the dtbo to Dom0:
+
+(dom0) xl dt-overlay attach overlay.dtbo 0
+
+Next, if the user wants to add the same device tree overlay to dom0
+Linux, execute the following:
+
+(dom0) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+(dom0) cat overlay.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be loaded using:
+
+(dom0) modprobe module_name.ko
+
+## DomU device add/remove
+
+All the nodes in dtbo will be assigned to one domain. The user will need
+to prepare a different dtbo for the domU. For example, the
+`interrupt-parent` property of the DomU overlay should be changed to the
+Xen hardcoded value `0xfde8` and the xen,reg property should be added to
+specify the address mappings. If the domain is not 1:1 mapped, xen,reg
+must be present. See the xen,reg format description in
+docs/misc/arm/passthrough.txt. Below assumes the properly written DomU
+dtbo is `overlay_domu.dtbo`.
+
+You need to set the `passthrough` property in the xl config file if you
+plan to use DT overlay and devices requiring an IOMMU.
+
+The user will also need to modprobe the relevant drivers. For already
+running domains, the user can use the xl dt-overlay attach command,
+for example:
+
+(dom0) xl dt-overlay add overlay.dtbo# If not executed before
+(dom0) xl dt-overlay attach overlay_domu.dtbo $domid
+(dom0) xl console $domid # To access $domid console
+
+Next, if the user needs to modify/prepare the overlay.dtbo suitable for
+the domU:
+
+(domU) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+(domU) cat overlay_domu.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be probed:
+
+(domU) modprobe module_name.ko
+
+[1] https://www.kernel.org/doc/Documentation/devicetree/overlay-notes.txt
-- 
2.25.1




[PATCH v6 5/7] xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains

2024-05-24 Thread Stefano Stabellini
From: Henry Wang 

In order to support the dynamic dtbo device assignment to a running
VM, the add/remove of the DT overlay and the attach/detach of the
device from the DT overlay should happen separately. Therefore,
repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT
overlay to Xen device tree, instead of assigning the device to the
hardware domain at the same time. Changing the sysctl behavior and
breaking compatibility is OK because this feature is experimental.

Add the XEN_DOMCTL_dt_overlay with operations
XEN_DOMCTL_DT_OVERLAY_ATTACH to do the device assignment to the domain.

The hypervisor first checks that the DT overlay passed from the toolstack
is valid. Then the device nodes are retrieved from the overlay tracker
based on the DT overlay. The attach of the device is implemented by
mapping the IRQ and IOMMU resources. All devices in the overlay are
assigned to a single domain.

Also take the opportunity to make one coding style fix in sysctl.h.
Introduce DT_OVERLAY_MAX_SIZE and use it to avoid repetitions of
KB(500).

xen,reg is to be used to handle non-1:1 mappings but it is currently
unsupported. For now, return an error for non-1:1-mapped domains.

Signed-off-by: Henry Wang 
Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
---
 xen/arch/arm/domctl.c|   3 +
 xen/common/dt-overlay.c  | 211 ++-
 xen/include/public/domctl.h  |  16 ++-
 xen/include/public/sysctl.h  |  11 +-
 xen/include/xen/dt-overlay.h |   8 ++
 5 files changed, 189 insertions(+), 60 deletions(-)

diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index ad56efb0f5..12a12ee781 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -5,6 +5,7 @@
  * Copyright (c) 2012, Citrix Systems
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -176,6 +177,8 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
 
 return rc;
 }
+case XEN_DOMCTL_dt_overlay:
+return dt_overlay_domctl(d, &domctl->u.dt_overlay);
 default:
 return subarch_do_domctl(domctl, d, u_domctl);
 }
diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 9cece79067..d53b4706cd 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -15,6 +15,8 @@
 #include 
 #include 
 
+#define DT_OVERLAY_MAX_SIZE KB(500)
+
 static LIST_HEAD(overlay_tracker);
 static DEFINE_SPINLOCK(overlay_lock);
 
@@ -356,6 +358,42 @@ static int overlay_get_nodes_info(const void *fdto, char **nodes_full_path)
 return 0;
 }
 
+/* This function should be called with the overlay_lock taken */
+static struct overlay_track *
+find_track_entry_from_tracker(const void *overlay_fdt,
+  uint32_t overlay_fdt_size)
+{
+struct overlay_track *entry, *temp;
+bool found_entry = false;
+
+ASSERT(spin_is_locked(&overlay_lock));
+
+/*
+ * First check if the dtbo is correct, i.e. it should be one of the dtbos
+ * that was used when dynamically adding the node.
+ * Limitation: Cases with the same node names but different properties are
+ * not supported currently. We are relying on the user to provide the same
+ * dtbo as was used when adding the nodes.
+ */
+list_for_each_entry_safe( entry, temp, &overlay_tracker, entry )
+{
+if ( memcmp(entry->overlay_fdt, overlay_fdt, overlay_fdt_size) == 0 )
+{
+found_entry = true;
+break;
+}
+}
+
+if ( !found_entry )
+{
+printk(XENLOG_ERR "Cannot find any matching tracker with input dtbo."
+   " Operation is supported only for prior added dtbo.\n");
+return NULL;
+}
+
+return entry;
+}
+
 /* Check if node itself can be removed and remove node from IOMMU. */
 static int remove_node_resources(struct dt_device_node *device_node)
 {
@@ -485,8 +523,7 @@ static long handle_remove_overlay_nodes(const void *overlay_fdt,
 uint32_t overlay_fdt_size)
 {
 int rc;
-struct overlay_track *entry, *temp, *track;
-bool found_entry = false;
+struct overlay_track *entry;
 
 rc = check_overlay_fdt(overlay_fdt, overlay_fdt_size);
 if ( rc )
@@ -494,29 +531,10 @@ static long handle_remove_overlay_nodes(const void *overlay_fdt,
 
spin_lock(&overlay_lock);
 
-/*
- * First check if dtbo is correct i.e. it should one of the dtbo which was
- * used when dynamically adding the node.
- * Limitation: Cases with same node names but different property are not
- * supported currently. We are relying on user to provide the same dtbo
- * as it was used when adding the nodes.
- */
-list_for_each_entry_safe( entry, temp, &overlay_tracker, entry )
-{
-if ( memcmp(entry->overlay_fdt, overlay_fdt, overlay_fdt_size) == 0 )
-{
-track = entry;
-found_entry = true;
-break;
-}
-}
-
-if ( !found_entry )
+entry = find_track_entry_from_tracker(overlay_fdt, overlay_fdt_size);

[PATCH v6 6/7] tools: Introduce the "xl dt-overlay attach" command

2024-05-24 Thread Stefano Stabellini
From: Henry Wang 

With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to
attach (in the future also detach) devices from the provided DT overlay
to domains. Support this by introducing a new "xl dt-overlay" command
and related documentation, i.e. "xl dt-overlay attach". Slightly rework
the command option parsing logic.

Signed-off-by: Henry Wang 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Jason Andryuk 
Reviewed-by: Stefano Stabellini 
---
 tools/include/libxl.h   | 15 +++
 tools/include/xenctrl.h |  3 +++
 tools/libs/ctrl/xc_dt_overlay.c | 31 +++
 tools/libs/light/libxl_dt_overlay.c | 28 +
 tools/xl/xl_cmdtable.c  |  4 +--
 tools/xl/xl_vmcontrol.c | 39 -
 6 files changed, 106 insertions(+), 14 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index a3d05c840b..f5c7167742 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -641,6 +641,12 @@
  */
 #define LIBXL_HAVE_XEN_9PFS 1
 
+/*
+ * LIBXL_HAVE_DT_OVERLAY_DOMAIN indicates the presence of
+ * libxl_dt_overlay_domain.
+ */
+#define LIBXL_HAVE_DT_OVERLAY_DOMAIN 1
+
 /*
  * libxl memory management
  *
@@ -2554,8 +2560,17 @@ libxl_device_pci *libxl_device_pci_list(libxl_ctx *ctx, uint32_t domid,
 void libxl_device_pci_list_free(libxl_device_pci* list, int num);
 
 #if defined(__arm__) || defined(__aarch64__)
+/* Values should keep consistent with the op from XEN_SYSCTL_dt_overlay */
+#define LIBXL_DT_OVERLAY_ADD   1
+#define LIBXL_DT_OVERLAY_REMOVE2
 int libxl_dt_overlay(libxl_ctx *ctx, void *overlay,
  uint32_t overlay_size, uint8_t overlay_op);
+
+/* Values should keep consistent with the op from XEN_DOMCTL_dt_overlay */
+#define LIBXL_DT_OVERLAY_DOMAIN_ATTACH 1
+int libxl_dt_overlay_domain(libxl_ctx *ctx, uint32_t domain_id,
+void *overlay_dt, uint32_t overlay_dt_size,
+uint8_t overlay_op);
 #endif
 
 /*
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 4996855944..9ceca0cffc 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2657,6 +2657,9 @@ int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
 #if defined(__arm__) || defined(__aarch64__)
 int xc_dt_overlay(xc_interface *xch, void *overlay_fdt,
   uint32_t overlay_fdt_size, uint8_t overlay_op);
+int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt,
+ uint32_t overlay_fdt_size, uint8_t overlay_op,
+ uint32_t domain_id);
 #endif
 
 /* Compat shims */
diff --git a/tools/libs/ctrl/xc_dt_overlay.c b/tools/libs/ctrl/xc_dt_overlay.c
index c2224c4d15..ea1da522d1 100644
--- a/tools/libs/ctrl/xc_dt_overlay.c
+++ b/tools/libs/ctrl/xc_dt_overlay.c
@@ -48,3 +48,34 @@ err:
 
 return err;
 }
+
+int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt,
+ uint32_t overlay_fdt_size, uint8_t overlay_op,
+ uint32_t domain_id)
+{
+int err;
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_dt_overlay,
+.domain = domain_id,
+.u.dt_overlay = {
+.overlay_op = overlay_op,
+.overlay_fdt_size = overlay_fdt_size,
+}
+};
+
+DECLARE_HYPERCALL_BOUNCE(overlay_fdt, overlay_fdt_size,
+ XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+if ( (err = xc_hypercall_bounce_pre(xch, overlay_fdt)) )
+goto err;
+
+set_xen_guest_handle(domctl.u.dt_overlay.overlay_fdt, overlay_fdt);
+
+if ( (err = do_domctl(xch, &domctl)) != 0 )
+PERROR("%s failed", __func__);
+
+err:
+xc_hypercall_bounce_post(xch, overlay_fdt);
+
+return err;
+}
diff --git a/tools/libs/light/libxl_dt_overlay.c b/tools/libs/light/libxl_dt_overlay.c
index a6c709a6dc..00503b76bd 100644
--- a/tools/libs/light/libxl_dt_overlay.c
+++ b/tools/libs/light/libxl_dt_overlay.c
@@ -69,3 +69,31 @@ out:
 return rc;
 }
 
+int libxl_dt_overlay_domain(libxl_ctx *ctx, uint32_t domain_id,
+void *overlay_dt, uint32_t overlay_dt_size,
+uint8_t overlay_op)
+{
+int rc;
+int r;
+GC_INIT(ctx);
+
+if (check_overlay_fdt(gc, overlay_dt, overlay_dt_size)) {
+LOG(ERROR, "Overlay DTB check failed");
+rc = ERROR_FAIL;
+goto out;
+} else {
+LOG(DEBUG, "Overlay DTB check passed");
+rc = 0;
+}
+
+r = xc_dt_overlay_domain(ctx->xch, overlay_dt, overlay_dt_size, overlay_op,
+ domain_id);
+if (r) {
+LOG(ERROR, "%s: Attaching/Detaching overlay dtb failed.", __func__);
+rc = ERROR_FAIL;
+}
+
+out:
+GC_FREE;
+return rc;
+}
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 1f3c6b5897..42751228c1 100644
--- a/tools/xl/xl_cmdtable.c
+++ 

[PATCH v6 1/7] tools/xl: Correct the help information and exit code of the dt-overlay command

2024-05-24 Thread Stefano Stabellini
From: Henry Wang 

Fix the name mismatch in the xl dt-overlay command: the command
name should be "dt-overlay", not "dt_overlay". Also add the
missing "," in the cmdtable.

Fix the exit code of the dt-overlay command by using EXIT_FAILURE
instead of ERROR_FAIL.

Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree 
overlay support")
Suggested-by: Anthony PERARD 
Signed-off-by: Henry Wang 
Reviewed-by: Jason Andryuk 
Reviewed-by: Stefano Stabellini 
---
 tools/xl/xl_cmdtable.c  | 2 +-
 tools/xl/xl_vmcontrol.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 62bdb2aeaa..1f3c6b5897 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -635,7 +635,7 @@ const struct cmd_spec cmd_table[] = {
 { "dt-overlay",
  &main_dt_overlay, 0, 1,
   "Add/Remove a device tree overlay",
-  "add/remove <.dtbo>"
+  "add/remove <.dtbo>",
   "-h print this help\n"
 },
 #endif
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 98f6bd2e76..02575d5d36 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -1278,7 +1278,7 @@ int main_dt_overlay(int argc, char **argv)
 const int overlay_remove_op = 2;
 
 if (argc < 2) {
-help("dt_overlay");
+help("dt-overlay");
 return EXIT_FAILURE;
 }
 
@@ -1302,11 +1302,11 @@ int main_dt_overlay(int argc, char **argv)
 fprintf(stderr, "failed to read the overlay device tree file %s\n",
 overlay_config_file);
 free(overlay_dtb);
-return ERROR_FAIL;
+return EXIT_FAILURE;
 }
 } else {
 fprintf(stderr, "overlay dtbo file not provided\n");
-return ERROR_FAIL;
+return EXIT_FAILURE;
 }
 
 rc = libxl_dt_overlay(ctx, overlay_dtb, overlay_dtb_size, op);
-- 
2.25.1
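
For context, the reason ERROR_FAIL is wrong as a return value from main() can be
shown in a few lines. This is a standalone sketch (the ERROR_FAIL definition is
copied here for illustration; it is libxl's internal error code, -3): a process
exit status is truncated to its low 8 bits, so -3 would surface in the shell as
253 rather than a conventional failure code.

```c
#include <stdlib.h>

/* libxl_error value ERROR_FAIL, reproduced for illustration only --
 * it is an API error code, not a process exit status. */
#define ERROR_FAIL (-3)

/* What the shell would observe for a given return-from-main value:
 * the exit status is the value truncated to 8 bits. */
static unsigned int observed_exit_status(int main_return_value)
{
    return (unsigned int)(unsigned char)main_return_value;
}
```

This is why the patch switches main_dt_overlay() to the portable EXIT_FAILURE.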




[PATCH v6 2/7] xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs

2024-05-24 Thread Stefano Stabellini
From: Henry Wang 

There are use cases in which dom0less domUs need to have
XEN_DOMCTL_CDF_iommu set at domain construction time. For
example, the dynamic dtbo feature allows a domain to be assigned
a device that is behind the IOMMU at runtime. For these use cases,
we need a way to specify, at domain construction time, that the
domain will need the IOMMU mappings.

Introduce a "passthrough" DT property for Dom0less DomUs,
mirroring the xl.cfg entry of the same name. Currently only two
values are accepted, i.e. "enabled" and "disabled". Set
XEN_DOMCTL_CDF_iommu at domain construction time based on the
property.

Signed-off-by: Henry Wang 
Reviewed-by: Julien Grall 
---
 docs/misc/arm/device-tree/booting.txt | 16 ++++++++++++++++
 xen/arch/arm/dom0less-build.c         | 11 +++++++++--
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/docs/misc/arm/device-tree/booting.txt 
b/docs/misc/arm/device-tree/booting.txt
index bbd955e9c2..f1fd069c87 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -260,6 +260,22 @@ with the following properties:
 value specified by Xen command line parameter gnttab_max_maptrack_frames
 (or its default value if unspecified, i.e. 1024) is used.
 
+- passthrough
+
+A string property specifying whether IOMMU mappings are enabled for the
+domain, and hence whether hardware can be passed through to it.
+Possible property values are:
+
+- "enabled"
+IOMMU mappings are enabled for the domain. Note that this option is the
+default if the user provides a partial device tree for passthrough
+devices for the domain.
+
+- "disabled"
+IOMMU mappings are disabled for the domain and so hardware may not be
+passed through. This option is the default if this property is missing
+and the user does not provide a partial device tree for the domain.
+
 Under the "xen,domain" compatible node, one or more sub-nodes are present
 for the DomU kernel and ramdisk.
 
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242..5830a7051d 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -848,6 +848,8 @@ static int __init construct_domU(struct domain *d,
 void __init create_domUs(void)
 {
 struct dt_device_node *node;
+const char *dom0less_iommu;
+bool iommu = false;
 const struct dt_device_node *cpupool_node,
 *chosen = dt_find_node_by_path("/chosen");
 
@@ -895,8 +897,13 @@ void __init create_domUs(void)
 panic("Missing property 'cpus' for domain %s\n",
   dt_node_name(node));
 
-if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") &&
- iommu_enabled )
+if ( !dt_property_read_string(node, "passthrough", &dom0less_iommu) &&
+ !strcmp(dom0less_iommu, "enabled") )
+iommu = true;
+
+if ( iommu_enabled &&
+ (iommu || dt_find_compatible_node(node, NULL,
+   "multiboot,device-tree")) )
 d_cfg.flags |= XEN_DOMCTL_CDF_iommu;
 
 if ( !dt_property_read_u32(node, "nr_spis", &d_cfg.arch.nr_spis) )
-- 
2.25.1
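
The flag-selection logic added to create_domUs() can be modelled standalone.
This is a sketch only (the flag value and function name below are illustrative,
not the real Xen definitions): XEN_DOMCTL_CDF_iommu is set when the IOMMU is
enabled and either the "passthrough" property reads "enabled" or the domU has a
partial device tree ("multiboot,device-tree" node).

```c
#include <string.h>

/* Illustrative stand-in for XEN_DOMCTL_CDF_iommu; not the real value. */
#define CDF_IOMMU (1u << 0)

/* Mirror of the dom0less decision: passthrough_prop is NULL when the
 * "passthrough" DT property is absent. */
static unsigned int domU_flags(int iommu_enabled,
                               const char *passthrough_prop,
                               int has_partial_dt)
{
    int iommu = passthrough_prop && !strcmp(passthrough_prop, "enabled");

    return (iommu_enabled && (iommu || has_partial_dt)) ? CDF_IOMMU : 0;
}
```

Note that "passthrough" = "enabled" and a partial device tree each suffice on
their own, but neither has any effect when the IOMMU is disabled globally.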




[PATCH v6 3/7] tools/arm: Introduce the "nr_spis" xl config entry

2024-05-24 Thread Stefano Stabellini
From: Henry Wang 

Currently, the number of SPIs allocated to a domain is only
configurable for Dom0less DomUs. Xen domains are supposed to be
platform agnostic, so the number of SPIs for libxl guests
should not be based on the hardware.

Introduce a new xl config entry for Arm to provide a method for
the user to decide the number of SPIs. This helps to avoid
bumping `config->arch.nr_spis` in libxl every time a new
platform with a larger SPI count appears.

Update the doc and the golang bindings accordingly.

Signed-off-by: Henry Wang 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Jason Andryuk 
---
 docs/man/xl.cfg.5.pod.in             | 16 ++++++++++++++++
 tools/golang/xenlight/helpers.gen.go |  2 ++
 tools/golang/xenlight/types.gen.go   |  1 +
 tools/include/libxl.h                |  5 +++++
 tools/libs/light/libxl_arm.c         |  4 ++--
 tools/libs/light/libxl_types.idl     |  1 +
 tools/xl/xl_parse.c                  |  3 +++
 7 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 8f2b375ce9..ac3f88fd57 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -3072,6 +3072,22 @@ raised.
 
 =back
 
+=over 4
+
+=item B<nr_spis="NUMBER">
+
+An optional integer parameter specifying the number of SPIs (Shared
+Peripheral Interrupts) to allocate for the domain. Max is 991 SPIs. If
+the value specified by the `nr_spis` parameter is smaller than the
+number of SPIs calculated by the toolstack based on the devices
+allocated for the domain, or the `nr_spis` parameter is not specified,
+the value calculated by the toolstack will be used for the domain.
+Otherwise, the value specified by the `nr_spis` parameter will be used.
+The number of SPIs should match the highest interrupt ID that will be
+assigned to the domain.
+
+=back
+
 =head3 x86
 
 =over 4
diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index b9cb5b33c7..fe5110474d 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1154,6 +1154,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
 x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
 x.ArchArm.SveVl = SveType(xc.arch_arm.sve_vl)
+x.ArchArm.NrSpis = uint32(xc.arch_arm.nr_spis)
if err := x.ArchX86.MsrRelaxed.fromC(&xc.arch_x86.msr_relaxed); err != nil {
 return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
 }
@@ -1670,6 +1671,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
 xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
 xc.arch_arm.sve_vl = C.libxl_sve_type(x.ArchArm.SveVl)
+xc.arch_arm.nr_spis = C.uint32_t(x.ArchArm.NrSpis)
if err := x.ArchX86.MsrRelaxed.toC(&xc.arch_x86.msr_relaxed); err != nil {
 return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
 }
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index 5b293755d7..c9e45b306f 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -597,6 +597,7 @@ ArchArm struct {
 GicVersion GicVersion
 Vuart VuartType
 SveVl SveType
+NrSpis uint32
 }
 ArchX86 struct {
 MsrRelaxed Defbool
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 62cb07dea6..a3d05c840b 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -308,6 +308,11 @@
  */
 #define LIBXL_HAVE_BUILDINFO_ARCH_ARM_SVE_VL 1
 
+/*
+ * libxl_domain_build_info has the arch_arm.nr_spis field
+ */
+#define LIBXL_HAVE_BUILDINFO_ARCH_NR_SPIS 1
+
 /*
  * LIBXL_HAVE_SOFT_RESET indicates that libxl supports performing
  * 'soft reset' for domains and there is 'soft_reset' shutdown reason
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 1cb89fa584..a4029e3ac8 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -181,8 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
 
 LOG(DEBUG, "Configure the domain");
 
-config->arch.nr_spis = nr_spis;
-LOG(DEBUG, " - Allocate %u SPIs", nr_spis);
+config->arch.nr_spis = max(nr_spis, d_config->b_info.arch_arm.nr_spis);
+LOG(DEBUG, " - Allocate %u SPIs", config->arch.nr_spis);
 
 switch (d_config->b_info.arch_arm.gic_version) {
 case LIBXL_GIC_VERSION_DEFAULT:
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 79e9c656cc..4e65e6fda5 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -722,6 +722,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
 ("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
("vuart", libxl_vuart_type),
("sve_vl", libxl_sve_type),
+   ("nr_spis", uint32),
   ])),
 ("arch_x86", Struct(None, [("msr_relaxed", 

[PATCH v6 4/7] xen/arm/gic: Allow adding interrupt to running VMs

2024-05-24 Thread Stefano Stabellini
From: Henry Wang 

Currently, adding physical interrupts is only allowed at domain
creation time. For use cases such as dynamic device tree overlay
addition, adding physical IRQs to running domains should be
allowed as well.

Drop the above-mentioned domain-creation check. Since dropping it
would leave the virtual and physical interrupt state out of sync
when the interrupt is active or pending in the guest, simply
reject the operation in those cases. Do this for both the new and
the old vGIC implementations.

Signed-off-by: Henry Wang 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Julien Grall 
---
 xen/arch/arm/gic-vgic.c  | 9 +++++++--
 xen/arch/arm/gic.c       | 8 --------
 xen/arch/arm/vgic/vgic.c | 7 +++++--
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 56490dbc43..b99e287224 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -442,9 +442,14 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, 
unsigned int virq,
 
 if ( connect )
 {
-/* The VIRQ should not be already enabled by the guest */
+/*
+ * The VIRQ should not be already enabled by the guest nor
+ * active/pending in the guest.
+ */
 if ( !p->desc &&
- !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
+ !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
 p->desc = desc;
 else
 ret = -EBUSY;
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 44c40e86de..b3467a76ae 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -135,14 +135,6 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int 
virq,
 ASSERT(virq < vgic_num_irqs(d));
 ASSERT(!is_lpi(virq));
 
-/*
- * When routing an IRQ to guest, the virtual state is not synced
- * back to the physical IRQ. To prevent get unsync, restrict the
- * routing to when the Domain is been created.
- */
-if ( d->creation_finished )
-return -EBUSY;
-
 ret = vgic_connect_hw_irq(d, NULL, virq, desc, true);
 if ( ret )
 return ret;
diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index b9463a5f27..6cabd0496d 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -876,8 +876,11 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu 
*vcpu,
 
 if ( connect )  /* assign a mapped IRQ */
 {
-/* The VIRQ should not be already enabled by the guest */
-if ( !irq->hw && !irq->enabled )
+/*
+ * The VIRQ should not be already enabled by the guest nor
+ * active/pending in the guest.
+ */
+if ( !irq->hw && !irq->enabled && !irq->active && !irq->pending_latch )
 {
 irq->hw = true;
 irq->hwintid = desc->irq;
-- 
2.25.1
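
The connect condition shared by both vGIC implementations in this patch can be
summarised in a small standalone model. This is a sketch of the semantics only
(struct layout, locking, and the real p->status bit handling are omitted; the
names below are invented for the example): a physical IRQ may be connected to a
running VM only while the virtual IRQ is not already mapped, enabled, active,
or pending; otherwise -EBUSY is returned.

```c
#include <errno.h>
#include <stdbool.h>

/* Minimal model of the per-vIRQ state consulted by the connect check. */
struct virq_state {
    bool hw;        /* already backed by a physical IRQ */
    bool enabled;   /* enabled by the guest */
    bool active;    /* active in the guest */
    bool pending;   /* pending in the guest */
};

/* Mirror of the vgic_connect_hw_irq() condition after this patch. */
static int try_connect_hw_irq(const struct virq_state *irq)
{
    if ( irq->hw || irq->enabled || irq->active || irq->pending )
        return -EBUSY;
    return 0;   /* safe to record the physical IRQ mapping */
}
```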




[PATCH v6 0/7] Remaining patches for dynamic node programming using overlay dtbo

2024-05-24 Thread Stefano Stabellini
Hi all,

This is the remaining series for the fully functional "dynamic node
programming using overlay dtbo" feature. The first part [1] has
already been merged.

Quoting from the original series: the first part made Xen aware of
new device tree nodes, i.e. updating dt_host with the overlay node
information. The goal of this series is to map IRQs and the IOMMU at
runtime, doing the actual IOMMU and IRQ mapping and unmapping for a
running domain. Documentation of the "dynamic node programming using
overlay dtbo" feature is also added.

During the discussion in v3, it was recommended to split the
attach/detach of overlay devices to/from running domains into
separate patches [3]. But I decided to only expose the xl user
interfaces once device attach/detach is fully functional, so I didn't
split the toolstack patch (#8).

Patch 1 is a fix for existing code noticed during my local tests; see
the commit message for details.

Gitlab CI for this series can be found in [2].

[1] 
https://lore.kernel.org/xen-devel/20230906011631.30310-1-vikram.garh...@amd.com/
[2] https://gitlab.com/xen-project/people/henryw/xen/-/pipelines/1301720278
[3] 
https://lore.kernel.org/xen-devel/e743d3d2-5884-4e55-8627-85985ba33...@amd.com/


Changes in v6:
- address Julien's comments
- use a !is_domain_direct_mapped check

- Stefano



Re: [PATCH v5 7/7] docs: Add device tree overlay documentation

2024-05-24 Thread Stefano Stabellini
On Fri, 24 May 2024, Julien Grall wrote:
> Hi Stefano,
> 
> On 24/05/2024 03:18, Stefano Stabellini wrote:
> > From: Vikram Garhwal 
> > 
> > Signed-off-by: Vikram Garhwal 
> > Signed-off-by: Stefano Stabellini 
> > Signed-off-by: Henry Wang 
> > ---
> >   docs/misc/arm/overlay.txt | 82 +++
> >   1 file changed, 82 insertions(+)
> >   create mode 100644 docs/misc/arm/overlay.txt
> > 
> > diff --git a/docs/misc/arm/overlay.txt b/docs/misc/arm/overlay.txt
> > new file mode 100644
> > index 00..0a2dee951a
> > --- /dev/null
> > +++ b/docs/misc/arm/overlay.txt
> > @@ -0,0 +1,82 @@
> > +# Device Tree Overlays support in Xen
> > +
> > +Xen experimentally supports dynamic device assignment to running
> > +domains, i.e. adding/removing nodes (using .dtbo) to/from Xen device
> > +tree, and attaching them to a running domain with given $domid.
> > +
> > +Dynamic node assignment works in two steps:
> > +
> > +## Add/Remove device tree overlay to/from Xen device tree
> > +
> > +1. Xen tools check the dtbo given and parse all other user provided
> > arguments
> > +2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
> > +3. Xen hypervisor applies/removes the dtbo to/from Xen device tree.
> > +
> > +## Attach device from the DT overlay to domain
> > +
> > +1. Xen tools check the dtbo given and parse all other user provided
> > arguments
> > +2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
> > +3. Xen hypervisor attach the device to the user-provided $domid by
> > +   mapping node resources in the DT overlay.
> > +
> > +# Examples
> > +
> > +Here are a few examples on how to use it.
> > +
> > +## Dom0 device add
> > +
> > +For assigning a device tree overlay to Dom0, user should firstly properly
> > +prepare the DT overlay. More information about device tree overlays can be
> > +found in [1]. Then, in Dom0, enter the following:
> > +
> > +(dom0) xl dt-overlay add overlay.dtbo
> > +
> > +This will allocate the devices mentioned in overlay.dtbo to Xen device
> > tree.
> > +
> > +To assign the newly added device from the dtbo to Dom0:
> > +
> > +(dom0) xl dt-overlay attach overlay.dtbo 0
> > +
> > +Next, if the user wants to add the same device tree overlay to dom0
> > +Linux, execute the following:
> > +
> > +(dom0) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
> > +(dom0) cat overlay.dtbo >
> > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
> > +
> > +Finally if needed, the relevant Linux kernel drive can be loaded using:
> > +
> > +(dom0) modprobe module_name.ko
> > +
> > +## DomU device add/remove
> > +
> > +All the nodes in dtbo will be assigned to a domain; the user will need
> > +to prepare the dtb for the domU.
> 
> s/dtb/dtbo/? 

Yes, done


> But I am little bit confused with the wording. I think you may
> want to add *different dtbo* so it clarifies from the start (this only becomes
> obvious at the end of the section) that the user is not meant to use the same
> for all the commands.

Yes it was confusing, I tried to clarify it. This is a doc so we could
improve it further after the freeze.


>  For example, the `interrupt-parent`
> > +property of the DomU overlay should be changed to the Xen hardcoded
> > +value `0xfde8`, and the xen,reg property should be added to specify the
> > +address mappings. If xen,reg is not present, it is assumed 1:1 mapping.
> 
> Repeating an earlier comment here. I think xen,reg should be mandatory for
> non-direct mapped domain.

That's OK


> Also, can you clarify what is the expect property layout for xen,reg?

I'll add a pointer


> > +Below assumes the properly written DomU dtbo is `overlay_domu.dtbo`.
> > +
> > +For new domains to be created, the user will need to create the DomU
> > +with below properties properly configured in the xl config file:
> > +- `iomem`
> 
> I looked at your reply in v4 and I am afraid I still don't understand why we
> are mentioning 'iomem'. If we want to use the commands below, then the domain
> needs to be created in advance. So you can't yet know 'iomem'.
> 
> You could avoid "xl dt-overlay attach" but then you need the user to specify
> both "irqs" and "iomem". From a user point of view, it would be easier to add
> a new propery in the configuration file listing the overlays. Something like:
> 
> dt_overlays = [ "overlay.dtbo", ... ]
> 
> Anyway, that somewhat separate. For now, I think we want to drop 'iomem' from
> the list and reword this paragraph to say that the 'passthrough' property
> needs to be set if you plan to use DT overlay and devices requiring the IOMMU.

OK, I did that


> 
> > +- `passthrough` (if IOMMU is needed)
> 
> This property is required at the start because we don't support enabling the
> IOMMU lazily.

Yeah. I think it is much clearer now.


> > +
> > +User will also need to modprobe the relevant drivers. For already
> > +running domains, the user can use the xl dt-overlay attach command,
> > +example:
> > +
> > + 

Re: [PATCH v5 6/7] tools: Introduce the "xl dt-overlay attach" command

2024-05-24 Thread Stefano Stabellini
On Fri, 24 May 2024, Julien Grall wrote:
> Hi Stefano,
> 
> On 24/05/2024 03:18, Stefano Stabellini wrote:
> > From: Henry Wang 
> > 
> > With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to
> > attach (in the future also detach) devices from the provided DT overlay
> > to domains. Support this by introducing a new "xl dt-overlay" command
> > and related documentation, i.e. "xl dt-overlay attach". Slightly rework
> > the command option parsing logic.
> > 
> > Signed-off-by: Henry Wang 
> > Signed-off-by: Stefano Stabellini 
> > Reviewed-by: Jason Andryuk 
> > Reviewed-by: Stefano Stabellini 
> > ---
> >   tools/include/libxl.h   | 15 +++
> >   tools/include/xenctrl.h |  3 +++
> >   tools/libs/ctrl/xc_dt_overlay.c | 31 +++
> >   tools/libs/light/libxl_dt_overlay.c | 28 +
> >   tools/xl/xl_cmdtable.c  |  4 +--
> >   tools/xl/xl_vmcontrol.c | 39 -
> >   6 files changed, 106 insertions(+), 14 deletions(-)
> > 
> > diff --git a/tools/include/libxl.h b/tools/include/libxl.h
> > index 3b5c18b48b..f2e19ec592 100644
> > --- a/tools/include/libxl.h
> > +++ b/tools/include/libxl.h
> > @@ -643,6 +643,12 @@
> >*/
> >   #define LIBXL_HAVE_NR_SPIS 1
> >   +/*
> > + * LIBXL_HAVE_OVERLAY_DOMAIN indicates the presence of
> > + * libxl_dt_overlay_domain.
> > + */
> > +#define LIBXL_HAVE_OVERLAY_DOMAIN 1
> I think this wants to gain DT_ just before OVERLAY. So from the name it is
> clearer we are talking about the Device-Tree overlay and not filesystem (or
> anything else where overlays are involved).

Done



Re: [PATCH v5 5/7] xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains

2024-05-24 Thread Stefano Stabellini
On Fri, 24 May 2024, Julien Grall wrote:
> Hi Stefano,
> 
> On 24/05/2024 03:18, Stefano Stabellini wrote:
> > From: Henry Wang 
> > 
> > In order to support the dynamic dtbo device assignment to a running
> > VM, the add/remove of the DT overlay and the attach/detach of the
> > device from the DT overlay should happen separately. Therefore,
> > repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT
> > overlay to Xen device tree, instead of assigning the device to the
> > hardware domain at the same time. It is OK to change the sysctl behavior
> > as this feature is experimental so changing sysctl behavior and breaking
> > compatibility is OK.
> > 
> > Add the XEN_DOMCTL_dt_overlay with operations
> > XEN_DOMCTL_DT_OVERLAY_ATTACH to do the device assignment to the domain.
> > 
> > The hypervisor firstly checks the DT overlay passed from the toolstack
> > is valid. Then the device nodes are retrieved from the overlay tracker
> > based on the DT overlay. The attach of the device is implemented by
> > mapping the IRQ and IOMMU resources. All devices in the overlay are
> > assigned to a single domain.
> > 
> > Also take the opportunity to make one coding style fix in sysctl.h.
> > 
> > xen,reg is to be used to handle non-1:1 mappings but it is currently
> > unsupported.
> 
> This means that we would still try to use 1:1 mappings for non-directmap
> domain. Given that the overlay is a blob, I am a bit concerned that the user
> may not notice any clash and it would be difficult to debug.
> 
> Therefore, I would like xen,reg to be mandatory when using non directmapped
> domain. For now, the best approach would be to prevent device assignment if
> !is_domain_direct_mapped().

That's fine, I'll make the change


> > +long dt_overlay_domctl(struct domain *d, struct xen_domctl_dt_overlay *op)
> > +{
> > +long ret;
> > +void *overlay_fdt;
> > +
> > +if ( op->overlay_op != XEN_DOMCTL_DT_OVERLAY_ATTACH )
> > +return -EOPNOTSUPP;
> > +
> > +if ( op->overlay_fdt_size == 0 || op->overlay_fdt_size > KB(500) )
> 
> Please add #define DT_OVERLAY_MAX_SIZE KB(500) and use it here and the other
> place.

OK



[libvirt test] 186133: tolerable all pass - PUSHED

2024-05-24 Thread osstest service owner
flight 186133 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186133/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 186070
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-check fail never pass

version targeted for testing:
 libvirt  3b3efef58dc4bf6c07a73862c280e30f2023054d
baseline version:
 libvirt  7dda4a03ac77bbe14b12b7b8f3a509a0e09f3129

Last test of basis   186070  2024-05-22 04:20:52 Z    2 days
Failing since        186099  2024-05-23 04:18:41 Z    1 days    2 attempts
Testing same since   186133  2024-05-24 04:18:53 Z    0 days    1 attempts


People who touched revisions under test:
  Daniel P. Berrangé 
  Laine Stump 
  Michal Privoznik 
  Peter Krempa 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-amd64-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-amd64-amd64-libvirt-raw pass
 test-arm64-arm64-libvirt-raw pass
 test-amd64-amd64-libvirt-vhd pass
 test-armhf-armhf-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/libvirt.git
   7dda4a03ac..3b3efef58d  3b3efef58dc4bf6c07a73862c280e30f2023054d -> xen-tested-master



Re: [PATCH v5 3/7] tools/arm: Introduce the "nr_spis" xl config entry

2024-05-24 Thread Stefano Stabellini
On Fri, 23 May 2024, Julien Grall wrote:
> Hi,
> 
> On 24/05/2024 03:18, Stefano Stabellini wrote:
> > From: Henry Wang 
> > 
> > Currently, the number of SPIs allocated to the domain is only
> > configurable for Dom0less DomUs. Xen domains are supposed to be
> > platform agnostics and therefore the numbers of SPIs for libxl
> > guests should not be based on the hardware.
> > 
> > Introduce a new xl config entry for Arm to provide a method for
> > user to decide the number of SPIs. This would help to avoid
> > bumping the `config->arch.nr_spis` in libxl everytime there is a
> > new platform with increased SPI numbers.
> > 
> > Update the doc and the golang bindings accordingly.
> > 
> > Signed-off-by: Henry Wang 
> > Signed-off-by: Stefano Stabellini 
> > Reviewed-by: Jason Andryuk 
> > ---
> >   docs/man/xl.cfg.5.pod.in | 16 
> >   tools/golang/xenlight/helpers.gen.go |  2 ++
> >   tools/golang/xenlight/types.gen.go   |  1 +
> >   tools/include/libxl.h|  7 +++
> >   tools/libs/light/libxl_arm.c |  4 ++--
> >   tools/libs/light/libxl_types.idl |  1 +
> >   tools/xl/xl_parse.c  |  3 +++
> >   7 files changed, 32 insertions(+), 2 deletions(-)
> > 
> > diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> > index 8f2b375ce9..ac3f88fd57 100644
> > --- a/docs/man/xl.cfg.5.pod.in
> > +++ b/docs/man/xl.cfg.5.pod.in
> > @@ -3072,6 +3072,22 @@ raised.
> > =back
> >   +=over 4
> > +
> > +=item B<nr_spis="NUMBER">
> > +
> > +An optional integer parameter specifying the number of SPIs (Shared
> > +Peripheral Interrupts) to allocate for the domain. Max is 991 SPIs. If
> > +the value specified by the `nr_spis` parameter is smaller than the
> > +number of SPIs calculated by the toolstack based on the devices
> > +allocated for the domain, or the `nr_spis` parameter is not specified,
> > +the value calculated by the toolstack will be used for the domain.
> > +Otherwise, the value specified by the `nr_spis` parameter will be used.
> > +The number of SPIs should match the highest interrupt ID that will be
> > +assigned to the domain.
> > +
> > +=back
> > +
> >   =head3 x86
> > =over 4
> > diff --git a/tools/golang/xenlight/helpers.gen.go
> > b/tools/golang/xenlight/helpers.gen.go
> > index b9cb5b33c7..fe5110474d 100644
> > --- a/tools/golang/xenlight/helpers.gen.go
> > +++ b/tools/golang/xenlight/helpers.gen.go
> > @@ -1154,6 +1154,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
> >   x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
> >   x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
> >   x.ArchArm.SveVl = SveType(xc.arch_arm.sve_vl)
> > +x.ArchArm.NrSpis = uint32(xc.arch_arm.nr_spis)
> >   if err := x.ArchX86.MsrRelaxed.fromC(&xc.arch_x86.msr_relaxed); err != nil {
> >   return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
> >   }
> > @@ -1670,6 +1671,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
> >   xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
> >   xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
> >   xc.arch_arm.sve_vl = C.libxl_sve_type(x.ArchArm.SveVl)
> > +xc.arch_arm.nr_spis = C.uint32_t(x.ArchArm.NrSpis)
> >   if err := x.ArchX86.MsrRelaxed.toC(&xc.arch_x86.msr_relaxed); err != nil {
> >   return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
> >   }
> > diff --git a/tools/golang/xenlight/types.gen.go
> > b/tools/golang/xenlight/types.gen.go
> > index 5b293755d7..c9e45b306f 100644
> > --- a/tools/golang/xenlight/types.gen.go
> > +++ b/tools/golang/xenlight/types.gen.go
> > @@ -597,6 +597,7 @@ ArchArm struct {
> >   GicVersion GicVersion
> >   Vuart VuartType
> >   SveVl SveType
> > +NrSpis uint32
> >   }
> >   ArchX86 struct {
> >   MsrRelaxed Defbool
> > diff --git a/tools/include/libxl.h b/tools/include/libxl.h
> > index 62cb07dea6..3b5c18b48b 100644
> > --- a/tools/include/libxl.h
> > +++ b/tools/include/libxl.h
> > @@ -636,6 +636,13 @@
> >*/
> >   #define LIBXL_HAVE_XEN_9PFS 1
> >   +/*
> > + * LIBXL_HAVE_NR_SPIS indicates the presence of the nr_spis field in
> > + * libxl_domain_build_info that specifies the number of SPIs interrupts
> > + * for the guest.
> > + */
> > +#define LIBXL_HAVE_NR_SPIS 1
> > +
> 
> Looking at the other arch.arm field, I think this wants to be:
> 
> /*
>  * libxl_domain_build_info has the arch_arm.nr_spis field
>  */
> #define LIBXL_HAVE_BUILDINFO_ARCH_NR_SPIS 1
> 
> This would also clarify that the field is Arm specific.

I made the change



[PATCH v2 13/13] xen/bitops: Rearrange the top of xen/bitops.h

2024-05-24 Thread Andrew Cooper
The #include <asm/bitops.h> can move to the top of the file now that
generic_f?s() have been untangled.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
CC: consult...@bugseng.com 
CC: Simone Ballarin 
CC: Federico Serafini 
CC: Nicola Vetrini 

v2:
 * New
---
 xen/include/xen/bitops.h | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index c5518d2c8552..6a5e28730a25 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -4,6 +4,8 @@
 #include <xen/compiler.h>
 #include <xen/types.h>
 
 +#include <asm/bitops.h>
+
 /*
  * Create a contiguous bitmask starting at bit position @l and ending at
  * position @h. For example GENMASK(30, 21) gives us 0x7fe00000ul.
@@ -15,27 +17,13 @@
 (((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LLONG - 1 - (h))))
 
 /*
- * Find First/Last Set bit.
+ * Find First/Last Set bit (all forms).
  *
  * Bits are labelled from 1.  Returns 0 if given 0.
  */
 unsigned int __pure generic_ffsl(unsigned long x);
 unsigned int __pure generic_flsl(unsigned long x);
 
-/*
- * Include this here because some architectures need generic_ffs/fls in
- * scope
- */
-
-/* - Please tidy above here - */
-
-#include 
-
-/*
- * Find First/Last Set bit (all forms).
- *
- * Bits are labelled from 1.  Returns 0 if given 0.
- */
 static always_inline __pure unsigned int ffs(unsigned int x)
 {
 if ( __builtin_constant_p(x) )
-- 
2.30.2




[PATCH v2 11/13] xen/bitops: Implement fls()/flsl() in common logic

2024-05-24 Thread Andrew Cooper
From: Oleksii Kurochko 

This is most easily done together because of how arm32 is currently
structured, but it does just mirror the existing ffs()/ffsl() work.

Introduce compile and boot time testing.

Signed-off-by: Oleksii Kurochko 
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
CC: consult...@bugseng.com 
CC: Simone Ballarin 
CC: Federico Serafini 
CC: Nicola Vetrini 

v2:
 * New, incorporated from Oleksii's RISC-V series and adjusted.

for x86:

  add/remove: 0/0 grow/shrink: 3/17 up/down: 28/-153 (-125)
  Function old new   delta
  pci_enable_msi  10331049 +16
  vlapic_lowest_prio   330 338  +8
  kexec_early_calculations  53  57  +4
  pci_restore_msi_state   11591157  -2
  arch_hwdom_irqs   61  59  -2
  control_read 132 129  -3
  pci_enable_msi.cold  121 117  -4
  arch_get_dma_bitsize 173 169  -4
  xmem_pool_alloc 10391032  -7
  xenheap_max_mfn   49  42  -7
  mba_sanitize_thrtl83  76  -7
  xstate_init  807 799  -8
  offline_page 965 957  -8
  apicid_to_socket 160 152  -8
  vlapic_find_highest_vector61  48 -13
  xmem_pool_free   983 967 -16
  iommu_alloc  935 919 -16
  free_heap_pages 15121496 -16
  detect_ht318 302 -16
  alloc_heap_pages15691553 -16

showing that the optimiser can now do a better job in most cases.
---
 xen/arch/arm/include/asm/arm32/bitops.h |  2 --
 xen/arch/arm/include/asm/arm64/bitops.h | 12 ---
 xen/arch/arm/include/asm/bitops.h   | 19 ++
 xen/arch/ppc/include/asm/bitops.h   |  4 +--
 xen/arch/x86/include/asm/bitops.h   | 46 +++--
 xen/common/bitops.c | 25 ++
 xen/include/xen/bitops.h| 24 +
 7 files changed, 80 insertions(+), 52 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm32/bitops.h 
b/xen/arch/arm/include/asm/arm32/bitops.h
index d0309d47c188..0d7bb12d5c19 100644
--- a/xen/arch/arm/include/asm/arm32/bitops.h
+++ b/xen/arch/arm/include/asm/arm32/bitops.h
@@ -1,8 +1,6 @@
 #ifndef _ARM_ARM32_BITOPS_H
 #define _ARM_ARM32_BITOPS_H
 
-#define flsl fls
-
 /*
  * Little endian assembly bitops.  nr = 0 -> byte 0 bit 0.
  */
diff --git a/xen/arch/arm/include/asm/arm64/bitops.h 
b/xen/arch/arm/include/asm/arm64/bitops.h
index 906d84e5f295..a6135838dcfa 100644
--- a/xen/arch/arm/include/asm/arm64/bitops.h
+++ b/xen/arch/arm/include/asm/arm64/bitops.h
@@ -1,18 +1,6 @@
 #ifndef _ARM_ARM64_BITOPS_H
 #define _ARM_ARM64_BITOPS_H
 
-static inline int flsl(unsigned long x)
-{
-uint64_t ret;
-
-if (__builtin_constant_p(x))
-   return generic_flsl(x);
-
-asm("clz\t%0, %1" : "=r" (ret) : "r" (x));
-
-return BITS_PER_LONG - ret;
-}
-
 /* Based on linux/include/asm-generic/bitops/find.h */
 
 #ifndef CONFIG_GENERIC_FIND_FIRST_BIT
diff --git a/xen/arch/arm/include/asm/bitops.h 
b/xen/arch/arm/include/asm/bitops.h
index d30ba44598e3..8f4bdc09d128 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -140,25 +140,10 @@ static inline int test_bit(int nr, const volatile void 
*addr)
 return 1UL & (p[BITOP_WORD(nr)] >> (nr & (BITOP_BITS_PER_WORD-1)));
 }
 
-/*
- * On ARMv5 and above those functions can be implemented around
- * the clz instruction for much better code efficiency.
- */
-
-static inline int fls(unsigned int x)
-{
-int ret;
-
-if (__builtin_constant_p(x))
-   return generic_flsl(x);
-
-asm("clz\t%"__OP32"0, %"__OP32"1" : "=r" (ret) : "r" (x));
-return 32 - ret;
-}
-
-
 #define arch_ffs(x)  ((x) ? 1 + __builtin_ctz(x) : 0)
 #define arch_ffsl(x) ((x) ? 1 + __builtin_ctzl(x) : 0)
+#define arch_fls(x)  ((x) ? 32 - __builtin_clz(x) : 0)
+#define arch_flsl(x) ((x) ? BITS_PER_LONG - __builtin_clzl(x) : 0)
 
 /**
  * hweightN - returns the hamming weight of a N-bit word
diff --git a/xen/arch/ppc/include/asm/bitops.h 
b/xen/arch/ppc/include/asm/bitops.h
index 761361291e6f..8119b5ace877 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -171,10 +171,10 @@ static inline int __test_and_clear_bit(int nr, 

[PATCH v2 06/13] xen/bitops: Implement ffs() in common logic

2024-05-24 Thread Andrew Cooper
Perform constant-folding unconditionally, rather than having it implemented
inconsistently between architectures.

Confirm the expected behaviour with compile time and boot time tests.

For non-constant inputs, use arch_ffs() if provided but fall back to
generic_ffsl() if not.  In particular, RISC-V doesn't have a builtin that
works in all configurations.

For x86, rename ffs() to arch_ffs() and adjust the prototype.

For PPC, __builtin_ctz() is 1/3 of the size of the transform to
generic_fls().  Drop the definition entirely.  ARM too benefits in the general
case by using __builtin_ctz(), but less dramatically because it was using
optimised asm().

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
CC: consult...@bugseng.com 
CC: Simone Ballarin 
CC: Federico Serafini 
CC: Nicola Vetrini 

v2:
 * Fall back to generic, not builtin.
 * Extend the testing with multi-bit values.
 * Use always_inline for x86
 * Defer x86 optimisation to a later change
---
 xen/arch/arm/include/asm/bitops.h |  2 +-
 xen/arch/ppc/include/asm/bitops.h |  2 +-
 xen/arch/x86/include/asm/bitops.h |  3 ++-
 xen/common/Makefile   |  1 +
 xen/common/bitops.c   | 19 +++
 xen/include/xen/bitops.h  | 17 +
 6 files changed, 41 insertions(+), 3 deletions(-)
 create mode 100644 xen/common/bitops.c

diff --git a/xen/arch/arm/include/asm/bitops.h 
b/xen/arch/arm/include/asm/bitops.h
index ec1cf7b9b323..a88ec2612e16 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -157,7 +157,7 @@ static inline int fls(unsigned int x)
 }
 
 
-#define ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
+#define arch_ffs(x)  ((x) ? 1 + __builtin_ctz(x) : 0)
 #define ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); })
 
 /**
diff --git a/xen/arch/ppc/include/asm/bitops.h 
b/xen/arch/ppc/include/asm/bitops.h
index ab692d01717b..5c36a6cc0ce3 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -173,7 +173,7 @@ static inline int __test_and_clear_bit(int nr, volatile 
void *addr)
 
 #define flsl(x) generic_flsl(x)
 #define fls(x) generic_flsl(x)
-#define ffs(x) ({ unsigned int t_ = (x); fls(t_ & -t_); })
+#define arch_ffs(x)  ((x) ? 1 + __builtin_ctz(x) : 0)
 #define ffsl(x) ({ unsigned long t_ = (x); flsl(t_ & -t_); })
 
 /**
diff --git a/xen/arch/x86/include/asm/bitops.h 
b/xen/arch/x86/include/asm/bitops.h
index 5a71afbc89d5..122767fc0d10 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -430,7 +430,7 @@ static inline int ffsl(unsigned long x)
 return (int)r+1;
 }
 
-static inline int ffs(unsigned int x)
+static always_inline unsigned int arch_ffs(unsigned int x)
 {
 int r;
 
@@ -440,6 +440,7 @@ static inline int ffs(unsigned int x)
   "1:" : "=r" (r) : "rm" (x));
 return r + 1;
 }
+#define arch_ffs arch_ffs
 
 /**
  * fls - find last bit set
diff --git a/xen/common/Makefile b/xen/common/Makefile
index d512cad5243f..21a4fb4c7166 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_ARGO) += argo.o
 obj-y += bitmap.o
+obj-bin-$(CONFIG_DEBUG) += bitops.init.o
 obj-$(CONFIG_GENERIC_BUG_FRAME) += bug.o
 obj-$(CONFIG_HYPFS_CONFIG) += config_data.o
 obj-$(CONFIG_CORE_PARKING) += core_parking.o
diff --git a/xen/common/bitops.c b/xen/common/bitops.c
new file mode 100644
index ..8c161b8ea7fa
--- /dev/null
+++ b/xen/common/bitops.c
@@ -0,0 +1,19 @@
+#include 
+#include 
+#include 
+
+static void __init test_ffs(void)
+{
+/* unsigned int ffs(unsigned int) */
+CHECK(ffs, 0, 0);
+CHECK(ffs, 1, 1);
+CHECK(ffs, 3, 1);
+CHECK(ffs, 7, 1);
+CHECK(ffs, 6, 2);
+CHECK(ffs, 0x8000U, 32);
+}
+
+static void __init __constructor test_bitops(void)
+{
+test_ffs();
+}
diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index cd405df96180..f7e90a2893a5 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -31,6 +31,23 @@ unsigned int __pure generic_flsl(unsigned long x);
 
 #include 
 
+/*
+ * Find First/Last Set bit (all forms).
+ *
+ * Bits are labelled from 1.  Returns 0 if given 0.
+ */
+static always_inline __pure unsigned int ffs(unsigned int x)
+{
+if ( __builtin_constant_p(x) )
+return __builtin_ffs(x);
+
+#ifdef arch_ffs
+return arch_ffs(x);
+#else
+return generic_ffsl(x);
+#endif
+}
+
 /* - Please tidy below here - */
 
 #ifndef find_next_bit
-- 
2.30.2




[PATCH v2 09/13] xen/bitops: Replace find_first_set_bit() with ffsl() - 1

2024-05-24 Thread Andrew Cooper
find_first_set_bit() is a Xen-ism which has undefined behaviour with a 0
input.  ffsl(), by contrast, is well defined with an input of 0, and is found
outside of Xen too.

_init_heap_pages() is the one special case here, comparing the LSB of two
different addresses.  The -1 cancels off both sides of the expression.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
CC: consult...@bugseng.com 
CC: Simone Ballarin 
CC: Federico Serafini 
CC: Nicola Vetrini 

v2:
 * Reorder from later in the series to keep ARM bisectable

In an x86 build, we get the following delta:

  add/remove: 0/0 grow/shrink: 2/4 up/down: 39/-52 (-13)
  Function old new   delta
  hpet_write  21832206 +23
  init_heap_pages 12221238 +16
  dom0_construct_pvh  39593958  -1
  mapping_order139 126 -13
  guest_physmap_mark_populate_on_demand   13011285 -16
  vcpumask_to_pcpumask 525 503 -22

so the optimiser improvements for ffsl() really do speak for themselves.

I'm surprised by the increase in hpet_write(), but looking at the code, it
very clearly wants the same treatment as:

  commit 188fa82305e72b725473db9146e20cc9abf7bff3
  Author: Andrew Cooper 
  Date:   Fri Mar 15 11:31:33 2024

  xen/vpci: Improve code generation in mask_write()

which I'm confident will end up as a net improvement.
---
 xen/arch/x86/guest/xen/xen.c | 4 ++--
 xen/arch/x86/hvm/dom0_build.c| 2 +-
 xen/arch/x86/hvm/hpet.c  | 8 
 xen/arch/x86/include/asm/pt-contig-markers.h | 2 +-
 xen/arch/x86/mm.c| 2 +-
 xen/arch/x86/mm/p2m-pod.c| 4 ++--
 xen/common/page_alloc.c  | 2 +-
 xen/common/softirq.c | 2 +-
 xen/drivers/passthrough/amd/iommu_map.c  | 2 +-
 xen/drivers/passthrough/iommu.c  | 4 ++--
 xen/drivers/passthrough/x86/iommu.c  | 4 ++--
 11 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/guest/xen/xen.c b/xen/arch/x86/guest/xen/xen.c
index d9768cc9527d..7484b3f73ad3 100644
--- a/xen/arch/x86/guest/xen/xen.c
+++ b/xen/arch/x86/guest/xen/xen.c
@@ -168,14 +168,14 @@ static void cf_check xen_evtchn_upcall(void)
 
 while ( pending )
 {
-unsigned int l1 = find_first_set_bit(pending);
+unsigned int l1 = ffsl(pending) - 1;
 unsigned long evtchn = xchg(_shared_info->evtchn_pending[l1], 0);
 
 __clear_bit(l1, );
 evtchn &= ~XEN_shared_info->evtchn_mask[l1];
 while ( evtchn )
 {
-unsigned int port = find_first_set_bit(evtchn);
+unsigned int port = ffsl(evtchn) - 1;
 
 __clear_bit(port, );
 port += l1 * BITS_PER_LONG;
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index b0cb96c3bc76..68c08bbe94f7 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -139,7 +139,7 @@ static int __init pvh_populate_memory_range(struct domain 
*d,
 order = get_order_from_pages(end - start + 1);
 order = min(order ? order - 1 : 0, max_order);
 /* The order allocated and populated must be aligned to the address. */
-order = min(order, start ? find_first_set_bit(start) : MAX_ORDER);
+order = min(order, start ? ffsl(start) - 1 : MAX_ORDER);
 page = alloc_domheap_pages(d, order, dom0_memflags | MEMF_no_scrub);
 if ( page == NULL )
 {
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index 12b00b770257..37e765e97df9 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -335,7 +335,7 @@ static void timer_sanitize_int_route(HPETState *h, unsigned 
int tn)
  * enabled pick the first irq.
  */
 timer_config(h, tn) |=
-MASK_INSR(find_first_set_bit(timer_int_route_cap(h, tn)),
+MASK_INSR(ffsl(timer_int_route_cap(h, tn)) - 1,
   HPET_TN_ROUTE);
 }
 
@@ -409,7 +409,7 @@ static int cf_check hpet_write(
 {
 bool active;
 
-i = find_first_set_bit(new_val);
+i = ffsl(new_val) - 1;
 if ( i >= HPET_TIMER_NUM )
 break;
 __clear_bit(i, _val);
@@ -535,14 +535,14 @@ static int cf_check hpet_write(
 /* stop/start timers whos state was changed by this write. */
 while (stop_timers)
 {
-i = find_first_set_bit(stop_timers);
+i = ffsl(stop_timers) - 1;
 __clear_bit(i, _timers);
 hpet_stop_timer(h, i, guest_time);
 }
 
 while (start_timers)
 {
-i = 

[PATCH v2 02/13] xen/bitops: Cleanup ahead of rearrangements

2024-05-24 Thread Andrew Cooper
 * Rename __attribute_pure__ to just __pure before it gains users.
 * Introduce __constructor which is going to be used in lib/, and is
   unconditionally cf_check.
 * Identify the areas of xen/bitops.h which are a mess.
 * Introduce xen/boot-check.h as helpers for compile and boot time testing.
   This provides a statement of the ABI, and a confirmation that arch-specific
   implementations behave as expected.

Sadly Clang 7 and older isn't happy with the compile time checks.  Skip them,
and just rely on the runtime checks.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
CC: consult...@bugseng.com 
CC: Simone Ballarin 
CC: Federico Serafini 
CC: Nicola Vetrini 

v2:
 * Break macros out into a header as they're going to be used elsewhere too
 * Use panic() rather than BUG_ON() to be more helpful when something fails
 * Brackets in HIDE()
 * Alignment adjustments
 * Skip COMPILE_CHECK() for Clang < 8
---
 xen/include/xen/bitops.h | 13 ++--
 xen/include/xen/boot-check.h | 60 
 xen/include/xen/compiler.h   |  3 +-
 3 files changed, 72 insertions(+), 4 deletions(-)
 create mode 100644 xen/include/xen/boot-check.h

diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index e3c5a4ccf321..9b40f20381a2 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -1,5 +1,7 @@
-#ifndef _LINUX_BITOPS_H
-#define _LINUX_BITOPS_H
+#ifndef XEN_BITOPS_H
+#define XEN_BITOPS_H
+
+#include 
 #include 
 
 /*
@@ -103,8 +105,13 @@ static inline int generic_flsl(unsigned long x)
  * Include this here because some architectures need generic_ffs/fls in
  * scope
  */
+
+/* - Please tidy above here - */
+
 #include 
 
+/* - Please tidy below here - */
+
 #ifndef find_next_bit
 /**
  * find_next_bit - find the next set bit in a memory region
@@ -294,4 +301,4 @@ static inline __u32 ror32(__u32 word, unsigned int shift)
 
 #define BIT_WORD(nr) ((nr) / BITS_PER_LONG)
 
-#endif
+#endif /* XEN_BITOPS_H */
diff --git a/xen/include/xen/boot-check.h b/xen/include/xen/boot-check.h
new file mode 100644
index ..250f9a40d3b0
--- /dev/null
+++ b/xen/include/xen/boot-check.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Helpers for boot-time checks of basic logic, including confirming that
+ * examples which should be calculated by the compiler are.
+ */
+#ifndef XEN_BOOT_CHECK_H
+#define XEN_BOOT_CHECK_H
+
+#include 
+
+/* Hide a value from the optimiser. */
+#define HIDE(x) \
+({ typeof(x) _x = (x); asm volatile ( "" : "+r" (_x) ); _x; })
+
+/*
+ * Check that fn(val) can be calculated by the compiler, and that it gives the
+ * expected answer.
+ *
+ * Clang < 8 can't fold constants through static inlines, causing this to
+ * fail.  Simply skip it for incredibly old compilers.
+ */
+#if !CONFIG_CC_IS_CLANG || CONFIG_CLANG_VERSION >= 8
+#define COMPILE_CHECK(fn, val, res) \
+do {\
+typeof(fn(val)) real = fn(val); \
+\
+if ( !__builtin_constant_p(real) )  \
+asm ( ".error \"'" STR(fn(val)) "' not compile-time constant\"" ); 
\
+else if ( real != res ) \
+asm ( ".error \"Compile time check '" STR(fn(val) == res) "' 
failed\"" ); \
+} while ( 0 )
+#else
+#define COMPILE_CHECK(fn, val, res)
+#endif
+
+/*
+ * Check that Xen's runtime logic for fn(val) gives the expected answer.  This
+ * requires using HIDE() to prevent the optimiser from collapsing the logic
+ * into a constant.
+ */
+#define RUNTIME_CHECK(fn, val, res) \
+do {\
+typeof(fn(val)) real = fn(HIDE(val));   \
+\
+if ( real != res )  \
+panic("%s: %s(%s) expected %u, got %u\n",   \
+  __func__, #fn, #val, real, res);  \
+} while ( 0 )
+
+/*
+ * Perform compiletime and runtime checks for fn(val) == res.
+ */
+#define CHECK(fn, val, res) \
+do {\
+COMPILE_CHECK(fn, val, res);\
+RUNTIME_CHECK(fn, val, res);\
+} while ( 0 )
+
+#endif /* XEN_BOOT_CHECK_H */
diff --git a/xen/include/xen/compiler.h b/xen/include/xen/compiler.h
index 179ff23e62c5..444bf80142c7 100644
--- a/xen/include/xen/compiler.h
+++ 

[PATCH v2 for-4.19 00/13] xen/bitops: Untangle ffs()/fls() infrastructure

2024-05-24 Thread Andrew Cooper
bitops.h is a mess.  It has grown organically over many years, and forces
unreasonable responsibilities out into the per-arch stubs.

Start cleaning it up with ffs() and friends.  Across the board, this adds:

 * Functioning bitops without arch-specific asm
 * An option for arches to provide more optimal code generation
 * Compile-time constant folding
 * Testing at both compile time and during init that the basic operations
   behave according to spec.

and the only reason this series isn't a net reduction in code alone is
because of the new unit testing.

This form is superior in many ways, including getting RISC-V support for free.

v2:
 * Many changes.  See patches for details
 * Include the fls() side of the infrastructure too.

Testing:
  https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/1304664544
  https://cirrus-ci.com/github/andyhhp/xen/

Series-wide net bloat-o-meter:

  x86:   up/down: 51/-247 (-196)
  ARM64: up/down: 40/-400 (-360)

and PPC64 reproduced in full, just to demonstrate how absurd it was to have
generic_f?s() as static inlines...

  add/remove: 1/0 grow/shrink: 1/11 up/down: 228/-4832 (-4604)
  Function old new   delta
  init_constructors  - 220+220
  start_xen 92 100  +8
  alloc_heap_pages19801744-236
  xenheap_max_mfn  360 120-240
  free_heap_pages  784 536-248
  find_next_zero_bit   564 276-288
  find_next_bit548 260-288
  find_first_zero_bit  444 148-296
  find_first_bit   444 132-312
  xmem_pool_free  17761440-336
  __do_softirq 604 252-352
  init_heap_pages 23281416-912
  xmem_pool_alloc 29201596   -1324


Andrew Cooper (12):
  ppc/boot: Run constructors on boot
  xen/bitops: Cleanup ahead of rearrangements
  ARM/bitops: Change find_first_set_bit() to be a define
  xen/page_alloc: Coerce min(flsl(), foo) expressions to being unsigned
  xen/bitops: Implement generic_f?sl() in lib/
  xen/bitops: Implement ffs() in common logic
  x86/bitops: Improve arch_ffs() in the general case
  xen/bitops: Implement ffsl() in common logic
  xen/bitops: Replace find_first_set_bit() with ffsl() - 1
  xen/bitops: Delete find_first_set_bit()
  xen/bitops: Clean up ffs64()/fls64() definitions
  xen/bitops: Rearrange the top of xen/bitops.h

Oleksii Kurochko (1):
  xen/bitops: Implement fls()/flsl() in common logic

 xen/arch/arm/include/asm/arm32/bitops.h  |   2 -
 xen/arch/arm/include/asm/arm64/bitops.h  |  12 --
 xen/arch/arm/include/asm/bitops.h|  35 +---
 xen/arch/ppc/include/asm/bitops.h|  17 +-
 xen/arch/ppc/setup.c |   2 +
 xen/arch/x86/guest/xen/xen.c |   4 +-
 xen/arch/x86/hvm/dom0_build.c|   2 +-
 xen/arch/x86/hvm/hpet.c  |   8 +-
 xen/arch/x86/include/asm/bitops.h| 114 +++--
 xen/arch/x86/include/asm/pt-contig-markers.h |   2 +-
 xen/arch/x86/mm.c|   2 +-
 xen/arch/x86/mm/p2m-pod.c|   4 +-
 xen/common/Makefile  |   1 +
 xen/common/bitops.c  |  89 +++
 xen/common/page_alloc.c  |   6 +-
 xen/common/softirq.c |   2 +-
 xen/drivers/passthrough/amd/iommu_map.c  |   2 +-
 xen/drivers/passthrough/iommu.c  |   4 +-
 xen/drivers/passthrough/x86/iommu.c  |   4 +-
 xen/include/xen/bitops.h | 159 ---
 xen/include/xen/boot-check.h |  60 +++
 xen/include/xen/compiler.h   |   3 +-
 xen/lib/Makefile |   2 +
 xen/lib/generic-ffsl.c   |  65 
 xen/lib/generic-flsl.c   |  68 
 25 files changed, 444 insertions(+), 225 deletions(-)
 create mode 100644 xen/common/bitops.c
 create mode 100644 xen/include/xen/boot-check.h
 create mode 100644 xen/lib/generic-ffsl.c
 create mode 100644 xen/lib/generic-flsl.c

-- 
2.30.2




[PATCH v2 05/13] xen/bitops: Implement generic_f?sl() in lib/

2024-05-24 Thread Andrew Cooper
generic_f?s() being static inline is the cause of lots of the complexity
between the common and arch-specific bitops.h

They appear to be static inline for constant-folding reasons (ARM uses them
for this), but there are better ways to achieve the same effect.

It is presumptuous that an unrolled binary search is the right algorithm to
use on all microarchitectures.  Indeed, it's not for the eventual users, but
that can be addressed at a later point.

It is also nonsense to implement the int form as the base primitive and
construct the long form from 2x int in 64-bit builds, when it's just one extra
step to operate at the native register width.

Therefore, implement generic_f?sl() in lib/.  They're not actually needed in
x86/ARM/PPC by the end of the cleanup (i.e. the functions will be dropped by
the linker), and they're only expected to be needed by RISC-V on hardware which
lacks the Zbb extension.

Implement generic_fls() in terms of generic_flsl() for now, but this will be
cleaned up in due course.

Provide basic runtime testing using __constructor inside the lib/ file.  This
is important, as it means testing runs if and only if generic_f?sl() are used
elsewhere in Xen.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
CC: consult...@bugseng.com 
CC: Simone Ballarin 
CC: Federico Serafini 
CC: Nicola Vetrini 

v2:
 * New

I suspect we want to swap CONFIG_DEBUG for CONFIG_BOOT_UNIT_TESTS in due
course.  These ought to be able to be used in a release build too.
---
 xen/arch/arm/include/asm/bitops.h |  2 +-
 xen/arch/ppc/include/asm/bitops.h |  2 +-
 xen/include/xen/bitops.h  | 89 ++-
 xen/lib/Makefile  |  2 +
 xen/lib/generic-ffsl.c| 65 ++
 xen/lib/generic-flsl.c| 68 +++
 6 files changed, 142 insertions(+), 86 deletions(-)
 create mode 100644 xen/lib/generic-ffsl.c
 create mode 100644 xen/lib/generic-flsl.c

diff --git a/xen/arch/arm/include/asm/bitops.h 
b/xen/arch/arm/include/asm/bitops.h
index 199252201291..ec1cf7b9b323 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -150,7 +150,7 @@ static inline int fls(unsigned int x)
 int ret;
 
 if (__builtin_constant_p(x))
-   return generic_fls(x);
+   return generic_flsl(x);
 
 asm("clz\t%"__OP32"0, %"__OP32"1" : "=r" (ret) : "r" (x));
 return 32 - ret;
diff --git a/xen/arch/ppc/include/asm/bitops.h 
b/xen/arch/ppc/include/asm/bitops.h
index bea655796d64..ab692d01717b 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -172,7 +172,7 @@ static inline int __test_and_clear_bit(int nr, volatile 
void *addr)
 }
 
 #define flsl(x) generic_flsl(x)
-#define fls(x) generic_fls(x)
+#define fls(x) generic_flsl(x)
 #define ffs(x) ({ unsigned int t_ = (x); fls(t_ & -t_); })
 #define ffsl(x) ({ unsigned long t_ = (x); flsl(t_ & -t_); })
 
diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index 9b40f20381a2..cd405df96180 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -15,91 +15,12 @@
 (((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LLONG - 1 - (h
 
 /*
- * ffs: find first bit set. This is defined the same way as
- * the libc and compiler builtin ffs routines, therefore
- * differs in spirit from the above ffz (man ffs).
- */
-
-static inline int generic_ffs(unsigned int x)
-{
-int r = 1;
-
-if (!x)
-return 0;
-if (!(x & 0x)) {
-x >>= 16;
-r += 16;
-}
-if (!(x & 0xff)) {
-x >>= 8;
-r += 8;
-}
-if (!(x & 0xf)) {
-x >>= 4;
-r += 4;
-}
-if (!(x & 3)) {
-x >>= 2;
-r += 2;
-}
-if (!(x & 1)) {
-x >>= 1;
-r += 1;
-}
-return r;
-}
-
-/*
- * fls: find last bit set.
+ * Find First/Last Set bit.
+ *
+ * Bits are labelled from 1.  Returns 0 if given 0.
  */
-
-static inline int generic_fls(unsigned int x)
-{
-int r = 32;
-
-if (!x)
-return 0;
-if (!(x & 0xu)) {
-x <<= 16;
-r -= 16;
-}
-if (!(x & 0xff00u)) {
-x <<= 8;
-r -= 8;
-}
-if (!(x & 0xf000u)) {
-x <<= 4;
-r -= 4;
-}
-if (!(x & 0xc000u)) {
-x <<= 2;
-r -= 2;
-}
-if (!(x & 0x8000u)) {
-x <<= 1;
-r -= 1;
-}
-return r;
-}
-
-#if BITS_PER_LONG == 64
-
-static inline int generic_ffsl(unsigned long x)
-{
-return !x || (u32)x ? generic_ffs(x) : generic_ffs(x >> 32) + 32;
-}
-
-static inline int generic_flsl(unsigned long x)
-{
-u32 h = x >> 32;
-
-return h ? generic_fls(h) + 32 : generic_fls(x);
-}
-
-#else
-# define generic_ffsl generic_ffs
-# define 

[PATCH v2 12/13] xen/bitops: Clean up ffs64()/fls64() definitions

2024-05-24 Thread Andrew Cooper
Implement ffs64() and fls64() as plain static inlines, dropping the ifdefary
and intermediate generic_f?s64() forms.

Add tests for all interesting bit positions at 32bit boundaries.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
CC: consult...@bugseng.com 
CC: Simone Ballarin 
CC: Federico Serafini 
CC: Nicola Vetrini 

v2:
 * Use ULL rather than a uint64_t cast.
 * Extend to fls64() too.
---
 xen/common/bitops.c  | 32 ++
 xen/include/xen/bitops.h | 42 +++-
 2 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/xen/common/bitops.c b/xen/common/bitops.c
index b4845d9e84d1..5482e5a1218d 100644
--- a/xen/common/bitops.c
+++ b/xen/common/bitops.c
@@ -24,6 +24,22 @@ static void __init test_ffs(void)
 CHECK(ffsl, 1UL << 32, 33);
 CHECK(ffsl, 1UL << 63, 64);
 #endif
+
+/*
+ * unsigned int ffs64(uint64_t)
+ *
+ * 32-bit builds of Xen have to split this into two adjacent operations,
+ * so test all interesting bit positions across the divide.
+ */
+CHECK(ffs64, 0, 0);
+CHECK(ffs64, 1, 1);
+CHECK(ffs64, 3, 1);
+CHECK(ffs64, 7, 1);
+CHECK(ffs64, 6, 2);
+
+CHECK(ffs64, 0x80008000ULL, 32);
+CHECK(ffs64, 0x8001ULL, 33);
+CHECK(ffs64, 0x8000ULL, 64);
 }
 
 static void __init test_fls(void)
@@ -48,6 +64,22 @@ static void __init test_fls(void)
 CHECK(flsl, 1 | (1UL << 32), 33);
 CHECK(flsl, 1 | (1UL << 63), 64);
 #endif
+
+/*
+ * unsigned int fls64(uint64_t)
+ *
+ * 32-bit builds of Xen have to split this into two adjacent operations,
+ * so test all interesting bit positions across the divide.
+ */
+CHECK(fls64, 0, 0);
+CHECK(fls64, 1, 1);
+CHECK(fls64, 3, 2);
+CHECK(fls64, 7, 3);
+CHECK(fls64, 6, 3);
+
+CHECK(fls64, 0x8001ULL, 32);
+CHECK(fls64, 0x00010001ULL, 33);
+CHECK(fls64, 0x8001ULL, 64);
 }
 
 static void __init __constructor test_bitops(void)
diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index e7df6377372d..c5518d2c8552 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -60,6 +60,14 @@ static always_inline __pure unsigned int ffsl(unsigned long 
x)
 #endif
 }
 
+static always_inline __pure unsigned int ffs64(uint64_t x)
+{
+if ( BITS_PER_LONG == 64 )
+return ffsl(x);
+else
+return !x || (uint32_t)x ? ffs(x) : ffs(x >> 32) + 32;
+}
+
 static always_inline __pure unsigned int fls(unsigned int x)
 {
 if ( __builtin_constant_p(x) )
@@ -84,6 +92,18 @@ static always_inline __pure unsigned int flsl(unsigned long 
x)
 #endif
 }
 
+static always_inline __pure unsigned int fls64(uint64_t x)
+{
+if ( BITS_PER_LONG == 64 )
+return flsl(x);
+else
+{
+uint32_t h = x >> 32;
+
+return h ? fls(h) + 32 : fls(x);
+}
+}
+
 /* - Please tidy below here - */
 
 #ifndef find_next_bit
@@ -134,28 +154,6 @@ extern unsigned long find_first_zero_bit(const unsigned 
long *addr,
  unsigned long size);
 #endif
 
-#if BITS_PER_LONG == 64
-# define fls64 flsl
-# define ffs64 ffsl
-#else
-# ifndef ffs64
-static inline int generic_ffs64(__u64 x)
-{
-return !x || (__u32)x ? ffs(x) : ffs(x >> 32) + 32;
-}
-#  define ffs64 generic_ffs64
-# endif
-# ifndef fls64
-static inline int generic_fls64(__u64 x)
-{
-__u32 h = x >> 32;
-
-return h ? fls(h) + 32 : fls(x);
-}
-#  define fls64 generic_fls64
-# endif
-#endif
-
 static inline int get_bitmask_order(unsigned int count)
 {
 int order;
-- 
2.30.2




[PATCH v2 08/13] xen/bitops: Implement ffsl() in common logic

2024-05-24 Thread Andrew Cooper
Just like ffs() in the previous changes.  Express the upper bound of the
testing in terms of BITS_PER_LONG as it varies between architectures.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
CC: consult...@bugseng.com 
CC: Simone Ballarin 
CC: Federico Serafini 
CC: Nicola Vetrini 

v2:
 * Swap to #if BITS_PER_LONG > 32 to avoid a compile error on arm32
 * Changes to mirror ffs() v2.
---
 xen/arch/arm/include/asm/bitops.h |  2 +-
 xen/arch/ppc/include/asm/bitops.h |  2 +-
 xen/arch/x86/include/asm/bitops.h | 35 ---
 xen/common/bitops.c   | 13 
 xen/include/xen/bitops.h  | 12 +++
 5 files changed, 45 insertions(+), 19 deletions(-)

diff --git a/xen/arch/arm/include/asm/bitops.h 
b/xen/arch/arm/include/asm/bitops.h
index a88ec2612e16..ba39802c9de3 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -158,7 +158,7 @@ static inline int fls(unsigned int x)
 
 
 #define arch_ffs(x)  ((x) ? 1 + __builtin_ctz(x) : 0)
-#define ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); })
+#define arch_ffsl(x) ((x) ? 1 + __builtin_ctzl(x) : 0)
 
 /**
  * find_first_set_bit - find the first set bit in @word
diff --git a/xen/arch/ppc/include/asm/bitops.h 
b/xen/arch/ppc/include/asm/bitops.h
index 5c36a6cc0ce3..ce0f6436f727 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -174,7 +174,7 @@ static inline int __test_and_clear_bit(int nr, volatile 
void *addr)
 #define flsl(x) generic_flsl(x)
 #define fls(x) generic_flsl(x)
 #define arch_ffs(x)  ((x) ? 1 + __builtin_ctz(x) : 0)
-#define ffsl(x) ({ unsigned long t_ = (x); flsl(t_ & -t_); })
+#define arch_ffsl(x) ((x) ? 1 + __builtin_ctzl(x) : 0)
 
 /**
  * hweightN - returns the hamming weight of a N-bit word
diff --git a/xen/arch/x86/include/asm/bitops.h 
b/xen/arch/x86/include/asm/bitops.h
index 1d7aea6065ef..51d3c0f40473 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -413,23 +413,6 @@ static inline unsigned int find_first_set_bit(unsigned 
long word)
 return (unsigned int)word;
 }
 
-/**
- * ffs - find first bit set
- * @x: the word to search
- *
- * This is defined the same way as the libc and compiler builtin ffs routines.
- */
-static inline int ffsl(unsigned long x)
-{
-long r;
-
-asm ( "bsf %1,%0\n\t"
-  "jnz 1f\n\t"
-  "mov $-1,%0\n"
-  "1:" : "=r" (r) : "rm" (x));
-return (int)r+1;
-}
-
 static always_inline unsigned int arch_ffs(unsigned int x)
 {
 unsigned int r;
@@ -458,6 +441,24 @@ static always_inline unsigned int arch_ffs(unsigned int x)
 }
 #define arch_ffs arch_ffs
 
+static always_inline unsigned int arch_ffsl(unsigned long x)
+{
+unsigned int r;
+
+/* See arch_ffs() for safety discussions. */
+if ( __builtin_constant_p(x > 0) && x > 0 )
+asm ( "bsf %[val], %q[res]"
+  : [res] "=r" (r)
+  : [val] "rm" (x) );
+else
+asm ( "bsf %[val], %q[res]"
+  : [res] "=r" (r)
+  : [val] "rm" (x), "[res]" (-1) );
+
+return r + 1;
+}
+#define arch_ffsl arch_ffsl
+
 /**
  * fls - find last bit set
  * @x: the word to search
diff --git a/xen/common/bitops.c b/xen/common/bitops.c
index 8c161b8ea7fa..b3813f818198 100644
--- a/xen/common/bitops.c
+++ b/xen/common/bitops.c
@@ -11,6 +11,19 @@ static void __init test_ffs(void)
 CHECK(ffs, 7, 1);
 CHECK(ffs, 6, 2);
 CHECK(ffs, 0x8000U, 32);
+
+/* unsigned int ffsl(unsigned long) */
+CHECK(ffsl, 0, 0);
+CHECK(ffsl, 1, 1);
+CHECK(ffsl, 3, 1);
+CHECK(ffsl, 7, 1);
+CHECK(ffsl, 6, 2);
+
+CHECK(ffsl, 1UL << (BITS_PER_LONG - 1), BITS_PER_LONG);
+#if BITS_PER_LONG > 32
+CHECK(ffsl, 1UL << 32, 33);
+CHECK(ffsl, 1UL << 63, 64);
+#endif
 }
 
 static void __init __constructor test_bitops(void)
diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index f7e90a2893a5..88cf27a88bcf 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -48,6 +48,18 @@ static always_inline __pure unsigned int ffs(unsigned int x)
 #endif
 }
 
+static always_inline __pure unsigned int ffsl(unsigned long x)
+{
+if ( __builtin_constant_p(x) )
+return __builtin_ffsl(x);
+
+#ifdef arch_ffsl
+return arch_ffsl(x);
+#else
+return generic_ffsl(x);
+#endif
+}
+
 /* - Please tidy below here - */
 
 #ifndef find_next_bit
-- 
2.30.2




[PATCH v2 07/13] x86/bitops: Improve arch_ffs() in the general case

2024-05-24 Thread Andrew Cooper
The asm in arch_ffs() is safe but inefficient.

CMOV would be an improvement over a conditional branch, but for 64bit CPUs
both Intel and AMD have provided enough details about the behaviour for a zero
input.  It is safe to pre-load the destination register with -1 and drop the
conditional logic.

However, it is common to find ffs() in a context where the optimiser knows
that x is nonzero even if the value isn't known precisely, and in that case
it's safe to drop the preload of -1 too.

There are only a handful of uses of ffs() in the x86 build, and all of them
improve as a result of this:

  add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-31 (-31)
  Function old new   delta
  mask_write   114 107  -7
  xmem_pool_alloc 10631039 -24

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
CC: consult...@bugseng.com 
CC: Simone Ballarin 
CC: Federico Serafini 
CC: Nicola Vetrini 

v2:
 * New.
 * Use __builtin_constant_p(x > 0) to optimise better.
---
 xen/arch/x86/include/asm/bitops.h | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/include/asm/bitops.h 
b/xen/arch/x86/include/asm/bitops.h
index 122767fc0d10..1d7aea6065ef 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
 
 static always_inline unsigned int arch_ffs(unsigned int x)
 {
-int r;
+unsigned int r;
+
+if ( __builtin_constant_p(x > 0) && x > 0 )
+{
+/* Safe, when the compiler knows that x is nonzero. */
+asm ( "bsf %[val], %[res]"
+  : [res] "=r" (r)
+  : [val] "rm" (x) );
+}
+else
+{
+/*
+ * The AMD manual states that BSF won't modify the destination
+ * register if x=0.  The Intel manual states that the result is
+ * undefined, but the architects have said that the register is
+ * written back with its old value (zero extended as normal).
+ */
+asm ( "bsf %[val], %[res]"
+  : [res] "=r" (r)
+  : [val] "rm" (x), "[res]" (-1) );
+}
 
-asm ( "bsf %1,%0\n\t"
-  "jnz 1f\n\t"
-  "mov $-1,%0\n"
-  "1:" : "=r" (r) : "rm" (x));
 return r + 1;
 }
 #define arch_ffs arch_ffs
-- 
2.30.2




[PATCH v2 10/13] xen/bitops: Delete find_first_set_bit()

2024-05-24 Thread Andrew Cooper
No more users.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
CC: consult...@bugseng.com 
CC: Simone Ballarin 
CC: Federico Serafini 
CC: Nicola Vetrini 

v2:
 * Reorder from later in the series to keep ARM bisectable
---
 xen/arch/arm/include/asm/bitops.h |  9 -
 xen/arch/ppc/include/asm/bitops.h |  9 -
 xen/arch/x86/include/asm/bitops.h | 12 
 3 files changed, 30 deletions(-)

diff --git a/xen/arch/arm/include/asm/bitops.h 
b/xen/arch/arm/include/asm/bitops.h
index ba39802c9de3..d30ba44598e3 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -160,15 +160,6 @@ static inline int fls(unsigned int x)
 #define arch_ffs(x)  ((x) ? 1 + __builtin_ctz(x) : 0)
 #define arch_ffsl(x) ((x) ? 1 + __builtin_ctzl(x) : 0)
 
-/**
- * find_first_set_bit - find the first set bit in @word
- * @word: the word to search
- *
- * Returns the bit-number of the first set bit (first bit being 0).
- * The input must *not* be zero.
- */
-#define find_first_set_bit(w) (ffsl(w) - 1)
-
 /**
  * hweightN - returns the hamming weight of a N-bit word
  * @x: the word to weigh
diff --git a/xen/arch/ppc/include/asm/bitops.h 
b/xen/arch/ppc/include/asm/bitops.h
index ce0f6436f727..761361291e6f 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -187,13 +187,4 @@ static inline int __test_and_clear_bit(int nr, volatile 
void *addr)
 #define hweight16(x) __builtin_popcount((uint16_t)(x))
 #define hweight8(x)  __builtin_popcount((uint8_t)(x))
 
-/**
- * find_first_set_bit - find the first set bit in @word
- * @word: the word to search
- *
- * Returns the bit-number of the first set bit (first bit being 0).
- * The input must *not* be zero.
- */
-#define find_first_set_bit(x) (ffsl(x) - 1)
-
 #endif /* _ASM_PPC_BITOPS_H */
diff --git a/xen/arch/x86/include/asm/bitops.h 
b/xen/arch/x86/include/asm/bitops.h
index 51d3c0f40473..830e488f33a0 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long 
val, unsigned int max)
 r__;\
 })
 
-/**
- * find_first_set_bit - find the first set bit in @word
- * @word: the word to search
- * 
- * Returns the bit-number of the first set bit. The input must *not* be zero.
- */
-static inline unsigned int find_first_set_bit(unsigned long word)
-{
-asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
-return (unsigned int)word;
-}
-
 static always_inline unsigned int arch_ffs(unsigned int x)
 {
 unsigned int r;
-- 
2.30.2




[PATCH v2 03/13] ARM/bitops: Change find_first_set_bit() to be a define

2024-05-24 Thread Andrew Cooper
This is in order to maintain bisectability through the subsequent changes, as
the order of definitions is altered.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
CC: consult...@bugseng.com 
CC: Simone Ballarin 
CC: Federico Serafini 
CC: Nicola Vetrini 

v2:
 * New
---
 xen/arch/arm/include/asm/bitops.h | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/xen/arch/arm/include/asm/bitops.h 
b/xen/arch/arm/include/asm/bitops.h
index ab030b6cb032..199252201291 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -167,10 +167,7 @@ static inline int fls(unsigned int x)
  * Returns the bit-number of the first set bit (first bit being 0).
  * The input must *not* be zero.
  */
-static inline unsigned int find_first_set_bit(unsigned long word)
-{
-return ffsl(word) - 1;
-}
+#define find_first_set_bit(w) (ffsl(w) - 1)
 
 /**
  * hweightN - returns the hamming weight of a N-bit word
-- 
2.30.2




[PATCH v2 04/13] xen/page_alloc: Coerce min(flsl(), foo) expressions to being unsigned

2024-05-24 Thread Andrew Cooper
This is in order to maintain bisectability through the subsequent changes,
where flsl() changes signedness non-atomically, one architecture at a time.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
CC: consult...@bugseng.com 
CC: Simone Ballarin 
CC: Federico Serafini 
CC: Nicola Vetrini 

v2:
 * New
---
 xen/common/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 7c1bdfc046bf..8d3342e95236 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1842,7 +1842,7 @@ static void _init_heap_pages(const struct page_info *pg,
  * Note that the value of ffsl() and flsl() starts from 1 so we need
  * to decrement it by 1.
  */
-unsigned int inc_order = min(MAX_ORDER, flsl(e - s) - 1);
+unsigned int inc_order = min(MAX_ORDER + 0U, flsl(e - s) - 1U);
 
 if ( s )
 inc_order = min(inc_order, ffsl(s) - 1U);
@@ -2266,7 +2266,7 @@ void __init xenheap_max_mfn(unsigned long mfn)
 ASSERT(!first_node_initialised);
 ASSERT(!xenheap_bits);
 BUILD_BUG_ON((PADDR_BITS - PAGE_SHIFT) >= BITS_PER_LONG);
-xenheap_bits = min(flsl(mfn + 1) - 1 + PAGE_SHIFT, PADDR_BITS);
+xenheap_bits = min(flsl(mfn + 1) - 1U + PAGE_SHIFT, PADDR_BITS + 0U);
 printk(XENLOG_INFO "Xen heap: %u bits\n", xenheap_bits);
 }
 
-- 
2.30.2




[PATCH v2 01/13] ppc/boot: Run constructors on boot

2024-05-24 Thread Andrew Cooper
PPC collects constructors, but doesn't run them yet.  Do so.

They'll shortly be used to confirm correct behaviour of the bitops primitives.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
CC: consult...@bugseng.com 
CC: Simone Ballarin 
CC: Federico Serafini 
CC: Nicola Vetrini 

CI: https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/6931084695

v2:
 * New

RISC-V collects them too, but can't call init_constructors() until lib/ctors.c
is included in the build.

Constructors are the only way to get these tests working on PPC/RISC-V, as
neither survives boot with initcalls() active.  Then again, initcalls() are
just a not-invented-here constructor, and we'd probably do well to move them
over.
---
 xen/arch/ppc/setup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/ppc/setup.c b/xen/arch/ppc/setup.c
index 101bdd8bb648..7fe06aa4bfb0 100644
--- a/xen/arch/ppc/setup.c
+++ b/xen/arch/ppc/setup.c
@@ -39,6 +39,8 @@ void __init noreturn start_xen(unsigned long r3, unsigned 
long r4,
 
 setup_initial_pagetables();
 
+init_constructors();
+
 early_printk("Hello, ppc64le!\n");
 
 for ( ; ; )
-- 
2.30.2




[xen-unstable test] 186132: tolerable FAIL - PUSHED

2024-05-24 Thread osstest service owner
flight 186132 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186132/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 186078
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 186105
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 186105
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 186105
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 186105
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186105
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-raw  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-qcow214 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow215 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  2a40b106e92aaa7ce808c8608dd6473edc67f608
baseline version:
 xen  ced21fbb2842ac4655048bdee56232974ff9ff9c

Last test of basis   186105  2024-05-23 09:38:07 Z    1 days
Testing same since   186132  2024-05-24 04:13:21 Z    0 days    1 attempts


People who touched revisions under test:
  Alejandro Vallejo 
  Alessandro Zucchelli 
  Andrew Cooper 
  Bobby Eshleman 
  Christian Lindig 
  George Dunlap 
  Jan Beulich 
  Julien Grall 
  Olaf Hering 
  Oleksandr Andrushchenko 
  Oleksii Kurochko 
  Roger Pau Monné 
  Stewart Hildebrand 
  Tamas K Lengyel 
  Volodymyr Babchuk 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  

[xen-unstable-smoke test] 186139: tolerable all pass - PUSHED

2024-05-24 Thread osstest service owner
flight 186139 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186139/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  2172a01c4cecbaa1d79bad200bfe3b996a3e4ba5
baseline version:
 xen  2a40b106e92aaa7ce808c8608dd6473edc67f608

Last test of basis   186117  2024-05-23 17:02:09 Z    1 days
Failing since        186136  2024-05-24 14:02:11 Z    0 days    2 attempts
Testing same since   186139  2024-05-24 17:00:22 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  George Dunlap 
  Henry Wang 
  Henry Wang 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   2a40b106e9..2172a01c4c  2172a01c4cecbaa1d79bad200bfe3b996a3e4ba5 -> smoke



[ovmf test] 186137: all pass - PUSHED

2024-05-24 Thread osstest service owner
flight 186137 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186137/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 3e722403cd16388a0e4044e705a2b34c841d76ca
baseline version:
 ovmf 7142e648416ff5d3eac6c6d607874805f5de0ca8

Last test of basis   186054  2024-05-21 02:43:06 Z    3 days
Testing same since   186137  2024-05-24 16:12:58 Z    0 days    1 attempts


People who touched revisions under test:
  Ard Biesheuvel 
  Doug Flick 
  Doug Flick [MSFT] 
  Flickdm 
  Gerd Hoffmann 
  Jiewe Yao 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   7142e64841..3e722403cd  3e722403cd16388a0e4044e705a2b34c841d76ca -> 
xen-tested-master



Re: [GIT PULL] xen: branch for v6.10-rc1

2024-05-24 Thread pr-tracker-bot
The pull request you sent on Fri, 24 May 2024 15:37:33 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git 
> for-linus-6.10a-rc1-tag

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/9351f138d1dcbe504cd829abe590ba7f3387f09c

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html



Re: [PATCH v2 8/8] xen/x86: Synthesise domain topologies

2024-05-24 Thread Alejandro Vallejo
On 24/05/2024 09:58, Roger Pau Monné wrote:
> On Wed, May 08, 2024 at 01:39:27PM +0100, Alejandro Vallejo wrote:
>> Expose sensible topologies in leaf 0xb. At the moment it synthesises non-HT
>> systems, in line with the previous code intent.
>>
>> Signed-off-by: Alejandro Vallejo 
>> ---
>> v2:
>>   * Zap the topology leaves of (pv/hvm)_(def/max)_policy rather than the 
>> host policy
>> ---
>>  tools/libs/guest/xg_cpuid_x86.c | 62 +
>>  xen/arch/x86/cpu-policy.c   |  9 +++--
>>  2 files changed, 15 insertions(+), 56 deletions(-)
>>
>> diff --git a/tools/libs/guest/xg_cpuid_x86.c 
>> b/tools/libs/guest/xg_cpuid_x86.c
>> index 4453178100ad..8170769dbe43 100644
>> --- a/tools/libs/guest/xg_cpuid_x86.c
>> +++ b/tools/libs/guest/xg_cpuid_x86.c
>> @@ -584,7 +584,7 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t 
>> domid, bool restore,
>>  bool hvm;
>>  xc_domaininfo_t di;
>>  struct xc_cpu_policy *p = xc_cpu_policy_init();
>> -unsigned int i, nr_leaves = ARRAY_SIZE(p->leaves), nr_msrs = 0;
>> +unsigned int nr_leaves = ARRAY_SIZE(p->leaves), nr_msrs = 0;
>>  uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
>>  uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {};
>>  uint32_t len = ARRAY_SIZE(host_featureset);
>> @@ -727,59 +727,15 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t 
>> domid, bool restore,
>>  }
>>  else
>>  {
>> -/*
>> - * Topology for HVM guests is entirely controlled by Xen.  For now, 
>> we
>> - * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
>> - */
>> -p->policy.basic.htt = true;
>> -p->policy.extd.cmp_legacy = false;
>> -
>> -/*
>> - * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
>> - * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
>> - * overflow.
>> - */
>> -if ( !p->policy.basic.lppp )
>> -p->policy.basic.lppp = 2;
>> -else if ( !(p->policy.basic.lppp & 0x80) )
>> -p->policy.basic.lppp *= 2;
>> -
>> -switch ( p->policy.x86_vendor )
>> +/* TODO: Expose the ability to choose a custom topology for HVM/PVH 
>> */
>> +unsigned int threads_per_core = 1;
>> +unsigned int cores_per_pkg = di.max_vcpu_id + 1;
> 
> Newline.

ack

> 
>> +rc = x86_topo_from_parts(&p->policy, threads_per_core, 
>> cores_per_pkg);
> 
> I assume this generates the same topology as the current code, or will
> the population of the leaves be different in some way?
> 

The current code does not populate 0xb. This generates a topology
consistent with the existing INTENDED topology. The actual APIC IDs will
be different though (because there's no skipping of odd values).

All the dance in patch 1 was to make this migrate-safe. The x2apic ID is
stored in the lapic hidden regs so differences with previous behaviour
don't matter.

IOW, The differences are:
  * 0xb is exposed, whereas previously it wasn't
  * APIC IDs are compacted such that new_apicid=old_apicid/2
  * There's also a cleanup of the murkier paths to put the right core
counts in the right leaves (whereas previously it was bonkers)

>> +if ( rc )
>>  {
>> -case X86_VENDOR_INTEL:
>> -for ( i = 0; (p->policy.cache.subleaf[i].type &&
>> -  i < ARRAY_SIZE(p->policy.cache.raw)); ++i )
>> -{
>> -p->policy.cache.subleaf[i].cores_per_package =
>> -(p->policy.cache.subleaf[i].cores_per_package << 1) | 1;
>> -p->policy.cache.subleaf[i].threads_per_cache = 0;
>> -}
>> -break;
>> -
>> -case X86_VENDOR_AMD:
>> -case X86_VENDOR_HYGON:
>> -/*
>> - * Leaf 0x8008 ECX[15:12] is ApicIdCoreSize.
>> - * Leaf 0x8008 ECX[7:0] is NumberOfCores (minus one).
>> - * Update to reflect vLAPIC_ID = vCPU_ID * 2.  But avoid
>> - * - overflow,
>> - * - going out of sync with leaf 1 EBX[23:16],
>> - * - incrementing ApicIdCoreSize when it's zero (which changes 
>> the
>> - *   meaning of bits 7:0).
>> - *
>> - * UPDATE: I addition to avoiding overflow, some
>> - * proprietary operating systems have trouble with
>> - * apic_id_size values greater than 7.  Limit the value to
>> - * 7 for now.
>> - */
>> -if ( p->policy.extd.nc < 0x7f )
>> -{
>> -if ( p->policy.extd.apic_id_size != 0 && 
>> p->policy.extd.apic_id_size < 0x7 )
>> -p->policy.extd.apic_id_size++;
>> -
>> -p->policy.extd.nc = (p->policy.extd.nc << 1) | 1;
>> -}
>> -break;
>> +ERROR("Failed to generate topology: t/c=%u c/p=%u",
>> +  threads_per_core, 

Re: [PATCH v2 7/8] xen/x86: Derive topologically correct x2APIC IDs from the policy

2024-05-24 Thread Alejandro Vallejo
On 24/05/2024 09:39, Roger Pau Monné wrote:
> On Wed, May 08, 2024 at 01:39:26PM +0100, Alejandro Vallejo wrote:
>> Implements the helper for mapping vcpu_id to x2apic_id given a valid
>> topology in a policy. The algo is written with the intention of extending
>> it to leaves 0x1f and e26 in the future.
> 
> Using 0x1f and e26 is kind of confusing.  I would word as "0x1f and
> extended leaf 0x26" to avoid confusion.
> 
>>
>> Toolstack doesn't set leaf 0xb and the HVM default policy has it cleared,
>> so the leaf is not implemented. In that case, the new helper just returns
>> the legacy mapping.
>>
>> Signed-off-by: Alejandro Vallejo 
>> ---
>> v2:
>>   * const-ify the test definitions
>>   * Cosmetic changes (newline + parameter name in prototype)
>> ---
>>  tools/tests/cpu-policy/test-cpu-policy.c | 63 
>>  xen/include/xen/lib/x86/cpu-policy.h |  2 +
>>  xen/lib/x86/policy.c | 73 ++--
>>  3 files changed, 133 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/tests/cpu-policy/test-cpu-policy.c 
>> b/tools/tests/cpu-policy/test-cpu-policy.c
>> index 0ba8c418b1b3..82a6aeb23317 100644
>> --- a/tools/tests/cpu-policy/test-cpu-policy.c
>> +++ b/tools/tests/cpu-policy/test-cpu-policy.c
>> @@ -776,6 +776,68 @@ static void test_topo_from_parts(void)
>>  }
>>  }
>>  
>> +static void test_x2apic_id_from_vcpu_id_success(void)
>> +{
>> +static const struct test {
>> +unsigned int vcpu_id;
>> +unsigned int threads_per_core;
>> +unsigned int cores_per_pkg;
>> +uint32_t x2apic_id;
>> +uint8_t x86_vendor;
>> +} tests[] = {
>> +{
>> +.vcpu_id = 3, .threads_per_core = 3, .cores_per_pkg = 8,
>> +.x2apic_id = 1 << 2,
>> +},
>> +{
>> +.vcpu_id = 6, .threads_per_core = 3, .cores_per_pkg = 8,
>> +.x2apic_id = 2 << 2,
>> +},
>> +{
>> +.vcpu_id = 24, .threads_per_core = 3, .cores_per_pkg = 8,
>> +.x2apic_id = 1 << 5,
>> +},
>> +{
>> +.vcpu_id = 35, .threads_per_core = 3, .cores_per_pkg = 8,
>> +.x2apic_id = (35 % 3) | (((35 / 3) % 8)  << 2) | ((35 / 24) << 
>> 5),
>> +},
>> +{
>> +.vcpu_id = 96, .threads_per_core = 7, .cores_per_pkg = 3,
>> +.x2apic_id = (96 % 7) | (((96 / 7) % 3)  << 3) | ((96 / 21) << 
>> 5),
>   ^ extra space (same 
> above)
> 
>> +},
>> +};
>> +
>> +const uint8_t vendors[] = {
>> +X86_VENDOR_INTEL,
>> +X86_VENDOR_AMD,
>> +X86_VENDOR_CENTAUR,
>> +X86_VENDOR_SHANGHAI,
>> +X86_VENDOR_HYGON,
>> +};
>> +
>> +printf("Testing x2apic id from vcpu id success:\n");
>> +
>> +/* Perform the test run on every vendor we know about */
>> +for ( size_t i = 0; i < ARRAY_SIZE(vendors); ++i )
>> +{
>> +struct cpu_policy policy = { .x86_vendor = vendors[i] };
> 
> Newline.

Ack

> 
>> +for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
>> +{
>> +const struct test *t = &tests[i];
>> +uint32_t x2apic_id;
>> +int rc = x86_topo_from_parts(&policy, t->threads_per_core, 
>> t->cores_per_pkg);
> 
> Overly long line.
> 
> Won't it be better to define `policy` in this scope, so that for each
> test you start with a clean policy, rather than having leftover data
> from the previous test?

The leftover data is overridden during setup by x86_topo_from_parts(),
but I can see the appeal. Sure.

> 
> Also you could initialize x2apic_id at definition:
> 
> const struct test *t = &tests[j];
> struct cpu_policy policy = { .x86_vendor = vendors[i] };
> int rc = x86_topo_from_parts(&policy, t->threads_per_core, t->cores_per_pkg);
> uint32_t x2apic_id = x86_x2apic_id_from_vcpu_id(&policy, t->vcpu_id);

Seeing this snippet I just realized there's a bug. The second loop
should use j rather than i. Ugh.

As for the initialization, I want to prevent feeding garbage into
x86_x2apic_id_from_vcpu_id(), for which there's an "if ( !rc )" missing
to gate the call.

I'll sort both of those.

> 
>> +
>> +x2apic_id = x86_x2apic_id_from_vcpu_id(&policy, t->vcpu_id);
>> +if ( rc || x2apic_id != t->x2apic_id )
>> +fail("FAIL[%d] - '%s cpu%u %u t/c %u c/p'. bad x2apic_id: 
>> expected=%u actual=%u\n",
>> + rc,
>> + x86_cpuid_vendor_to_str(policy.x86_vendor),
>> + t->vcpu_id, t->threads_per_core, t->cores_per_pkg,
>> + t->x2apic_id, x2apic_id);
>> +}
>> +}
>> +}
>> +
>>  int main(int argc, char **argv)
>>  {
>>  printf("CPU Policy unit tests\n");
>> @@ -794,6 +856,7 @@ int main(int argc, char **argv)
>>  test_is_compatible_failure();
>>  
>>  test_topo_from_parts();
>> +test_x2apic_id_from_vcpu_id_success();
>>  
>>  if ( nr_failures )
>>  printf("Done: %u 

[xen-unstable-smoke test] 186136: regressions - FAIL

2024-05-24 Thread osstest service owner
flight 186136 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186136/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf   6 xen-buildfail REGR. vs. 186117

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  701190abb3d5873cf77349b3d90bd11ded74871a
baseline version:
 xen  2a40b106e92aaa7ce808c8608dd6473edc67f608

Last test of basis   186117  2024-05-23 17:02:09 Z    0 days
Testing same since   186136  2024-05-24 14:02:11 Z    0 days    1 attempts


People who touched revisions under test:
  Henry Wang 
  Henry Wang 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  fail
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  blocked 
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.


commit 701190abb3d5873cf77349b3d90bd11ded74871a
Author: Henry Wang 
Date:   Thu Mar 21 11:57:06 2024 +0800

xen/arm: Set correct per-cpu cpu_core_mask

In the common sysctl command XEN_SYSCTL_physinfo, the value of
cores_per_socket is calculated based on the cpu_core_mask of CPU0.
Currently on Arm this is a fixed value 1 (can be checked via xl info),
which is not correct. This is because during the Arm CPU online
process at boot time, setup_cpu_sibling_map() only sets the per-cpu
cpu_core_mask for itself.

cores_per_socket refers to the number of cores that belong to the same
socket (NUMA node). Currently Xen on Arm does not support physical
CPU hotplug and NUMA, also we assume there is no multithread. Therefore
cores_per_socket means all possible CPUs detected from the device
tree. Setting the per-cpu cpu_core_mask in setup_cpu_sibling_map()
accordingly. Modify the in-code comment which seems to be outdated. Add
a warning to users if Xen is running on processors with multithread
support.

Signed-off-by: Henry Wang 
Signed-off-by: Henry Wang 
Reviewed-by: Michal Orzel 
(qemu changes not included)



Re: [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves

2024-05-24 Thread Alejandro Vallejo
On 23/05/2024 17:13, Roger Pau Monné wrote:
> On Wed, May 08, 2024 at 01:39:24PM +0100, Alejandro Vallejo wrote:
>> Make it so the APs expose their own APIC IDs in a LUT. We can use that LUT to
>> populate the MADT, decoupling the algorithm that relates CPU IDs and APIC IDs
>> from hvmloader.
>>
>> While at this also remove ap_callin, as writing the APIC ID may serve the
>> same purpose.
>>
>> Signed-off-by: Alejandro Vallejo 
>> ---
>> v2:
>>   * New patch. Replaces adding cpu policy to hvmloader in v1.
>> ---
>>  tools/firmware/hvmloader/config.h|  6 -
>>  tools/firmware/hvmloader/hvmloader.c |  4 +--
>>  tools/firmware/hvmloader/smp.c   | 40 +++-
>>  tools/firmware/hvmloader/util.h  |  5 
>>  xen/arch/x86/include/asm/hvm/hvm.h   |  1 +
>>  5 files changed, 47 insertions(+), 9 deletions(-)
>>
>> diff --git a/tools/firmware/hvmloader/config.h 
>> b/tools/firmware/hvmloader/config.h
>> index c82adf6dc508..edf6fa9c908c 100644
>> --- a/tools/firmware/hvmloader/config.h
>> +++ b/tools/firmware/hvmloader/config.h
>> @@ -4,6 +4,8 @@
>>  #include 
>>  #include 
>>  
>> +#include 
>> +
>>  enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt };
>>  extern enum virtual_vga virtual_vga;
>>  
>> @@ -49,8 +51,10 @@ extern uint8_t ioapic_version;
>>  
>>  #define IOAPIC_ID   0x01
>>  
>> +extern uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
>> +
>> #define LAPIC_BASE_ADDRESS  0xfee00000
>> -#define LAPIC_ID(vcpu_id)   ((vcpu_id) * 2)
>> +#define LAPIC_ID(vcpu_id)   (CPU_TO_X2APICID[(vcpu_id)])
>>  
>>  #define PCI_ISA_DEVFN   0x08/* dev 1, fn 0 */
>>  #define PCI_ISA_IRQ_MASK0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>> diff --git a/tools/firmware/hvmloader/hvmloader.c 
>> b/tools/firmware/hvmloader/hvmloader.c
>> index c58841e5b556..1eba92229925 100644
>> --- a/tools/firmware/hvmloader/hvmloader.c
>> +++ b/tools/firmware/hvmloader/hvmloader.c
>> @@ -342,11 +342,11 @@ int main(void)
>>  
>>  printf("CPU speed is %u MHz\n", get_cpu_mhz());
>>  
>> +smp_initialise();
>> +
>>  apic_setup();
>>  pci_setup();
>>  
>> -smp_initialise();
>> -
>>  perform_tests();
>>  
>>  if ( bios->bios_info_setup )
>> diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
>> index a668f15d7e1f..4d75f239c2f5 100644
>> --- a/tools/firmware/hvmloader/smp.c
>> +++ b/tools/firmware/hvmloader/smp.c
>> @@ -29,7 +29,34 @@
>>  
>>  #include 
>>  
>> -static int ap_callin, ap_cpuid;
>> +static int ap_cpuid;
>> +
>> +/**
>> + * Lookup table of x2APIC IDs.
>> + *
>> + * Each entry is populated by its respective CPU as it comes online. This is
>> + * required for generating the MADT with minimal assumptions about ID
>> + * relationships.
>> + */
>> +uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
>> +
>> +static uint32_t read_apic_id(void)
>> +{
>> +uint32_t apic_id;
>> +
>> +cpuid(1, NULL, &apic_id, NULL, NULL);
>> +apic_id >>= 24;
>> +
>> +/*
>> + * APIC IDs over 255 are represented by 255 in leaf 1 and are meant to be
>> + * read from topology leaves instead. Xen exposes x2APIC IDs in leaf 0xb,
>> + * but only if the x2APIC feature is present. If there are that many CPUs
>> + * it's guaranteed to be there so we can avoid checking for it specifically.
>> + */
> 
> Maybe I'm missing something, but given the current code won't Xen just
> return the low 8 bits from the x2APIC ID?  I don't see any code in
> guest_cpuid() that adjusts the IDs to be 255 when > 255.
>> +if ( apic_id == 255 )
>> +cpuid(0xb, NULL, NULL, NULL, &apic_id);
> 
> Won't the correct logic be to check if x2APIC is set in CPUID, and
> then fetch the APIC ID from leaf 0xb, otherwise fallback to fetching
> the APID ID from leaf 1?

I was pretty sure that was the behaviour of real HW, but clearly I was
wrong. Just checked on a beefy machine and that's indeed the low 8 bits,
just as Xen emulates. Got confused with the core count, which does clip
to 255.

Will adjust by explicitly checking for the x2apic_id bit.

> 
>> +
>> +return apic_id;
>> +}
>>  
>>  static void ap_start(void)
>>  {
>> @@ -37,12 +64,12 @@ static void ap_start(void)
>>  cacheattr_init();
>>  printf("done.\n");
>>  
>> +wmb();
>> +ACCESS_ONCE(CPU_TO_X2APICID[ap_cpuid]) = read_apic_id();
> 
> A comment would be helpful here, that CPU_TO_X2APICID[ap_cpuid] is
> used as synchronization that the AP has started.
> 
> You probably want to assert that read_apic_id() doesn't return 0,
> otherwise we are skewed.

Not a bad idea. Sure

> 
>> +
>>  if ( !ap_cpuid )
>>  return;
>>  
>> -wmb();
>> -ap_callin = 1;
>> -
>>  while ( 1 )
>>  asm volatile ( "hlt" );
>>  }
>> @@ -86,10 +113,11 @@ static void boot_cpu(unsigned int cpu)
>>  BUG();
>>  
>>  /*
>> - * Wait for the secondary processor to complete initialisation.
>> + * Wait for the secondary processor to complete initialisation,
>> + * 

Re: [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves

2024-05-24 Thread Alejandro Vallejo
On 24/05/2024 08:21, Roger Pau Monné wrote:
> On Wed, May 08, 2024 at 01:39:24PM +0100, Alejandro Vallejo wrote:
>> [...]
>>
>> +wmb();
>> +ACCESS_ONCE(CPU_TO_X2APICID[ap_cpuid]) = read_apic_id();
> 
> Further thinking about this: do we really need the wmb(), given the
> usage of ACCESS_ONCE()?
> 
> wmb() is a compiler barrier, and the usage of volatile in
> ACCESS_ONCE() should already prevent any compiler re-ordering.
> 
> Thanks, Roger.

volatile reads/writes cannot be reordered with other volatile
reads/writes, but volatile reads/writes can be reordered with
non-volatile reads/writes, AFAIR.

My intent here was to prevent the compiler from omitting the write
(though in practice it didn't). I think the barrier is still required to
prevent reordering according to the spec.

Cheers,
Alejandro



Re: [PATCH v3 2/4] xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor

2024-05-24 Thread Juergen Gross

On 24.05.24 16:30, Jürgen Groß wrote:

On 24.05.24 15:58, Julien Grall wrote:

Hi Henry,

+ Juergen as the Xenstore maintainer. I'd like his opinion on the approach.
The documentation of the new logic is in:


https://lore.kernel.org/xen-devel/20240517032156.1490515-5-xin.wa...@amd.com/

FWIW I am happy in principle with the logic (this is what we discussed on the 
call last week). Some comments below.


I'm not against this logic, but I'm wondering why it needs to be so
complicated.

Can't the domU itself allocate the Xenstore page from its RAM pages,
write the PFN into the Xenstore grant tab entry, and then make it
public via setting HVM_PARAM_STORE_PFN?

The init-dom0less application could then check HVM_PARAM_STORE_PFN
being set and call XS_introduce_domain.

Note that at least C-xenstored does not need the PFN of the Xenstore
page, as it is just using GNTTAB_RESERVED_XENSTORE for mapping the
page.


Hmm, seems as if O-xenstored is using the map_foreign_page mechanism with
the PFN to map the Xenstore ring page. Maybe this should be changed to use
the grant entry, too?


Juergen





Re: [PATCH v3 2/4] xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor

2024-05-24 Thread Jürgen Groß

On 24.05.24 15:58, Julien Grall wrote:

Hi Henry,

+ Juergen as the Xenstore maintainer. I'd like his opinion on the approach. The 
documentation of the new logic is in:


https://lore.kernel.org/xen-devel/20240517032156.1490515-5-xin.wa...@amd.com/

FWIW I am happy in principle with the logic (this is what we discussed on the 
call last week). Some comments below.


I'm not against this logic, but I'm wondering why it needs to be so
complicated.

Can't the domU itself allocate the Xenstore page from its RAM pages,
write the PFN into the Xenstore grant tab entry, and then make it
public via setting HVM_PARAM_STORE_PFN?

The init-dom0less application could then check HVM_PARAM_STORE_PFN
being set and call XS_introduce_domain.

Note that at least C-xenstored does not need the PFN of the Xenstore
page, as it is just using GNTTAB_RESERVED_XENSTORE for mapping the
page.


Juergen



Re: [PATCH v3 2/4] xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor

2024-05-24 Thread Julien Grall

Hi Henry,

+ Juergen as the Xenstore maintainer. I'd like his opinion on the 
approach. The documentation of the new logic is in:


https://lore.kernel.org/xen-devel/20240517032156.1490515-5-xin.wa...@amd.com/

FWIW I am happy in principle with the logic (this is what we discussed 
on the call last week). Some comments below.


On 17/05/2024 04:21, Henry Wang wrote:

There are use cases (for example using the PV driver) in Dom0less
setup that require Dom0less DomUs start immediately with Dom0, but
initialize XenStore later after Dom0's successful boot and call to
the init-dom0less application.

An error message can be seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

The "magic page" is a term used in the toolstack for reserved
pages that give the VM access to virtual platform capabilities.
Currently the magic pages for Dom0less DomUs are populated by the
init-dom0less app through populate_physmap(), and populate_physmap()
automatically assumes gfn == mfn for 1:1 direct mapped domains. This
cannot be true for the magic pages that are allocated later from the
init-dom0less application executed in Dom0. For domain using statically
allocated memory but not 1:1 direct-mapped, similar error "failed to
retrieve a reserved page" can be seen as the reserved memory list is
empty at that time.

Since for init-dom0less the magic page region is only used for XenStore,
this commit solves the above issue by allocating the XenStore page for
Dom0less DomUs at domain construction time. The PFN will be
noted and communicated to the init-dom0less application executed
from Dom0. To keep the XenStore late init protocol, set the connection
status to XENSTORE_RECONNECT.


So this commit is allocating the page, but it will not be used by 
init-dom0less until the next patch. But Linux could use it. So would 
this break bisection? If so, then I think patch #3 needs to be folded into
this patch.




Reported-by: Alec Kwapis 
Suggested-by: Daniel P. Smith 
Signed-off-by: Henry Wang 
---
v3:
- Only allocate XenStore page. (Julien)
- Set HVM_PARAM_STORE_PFN and the XenStore connection status directly
   from hypervisor. (Stefano)
v2:
- Reword the commit msg to explain what is "magic page" and use generic
   terminology "hypervisor reserved pages" in commit msg. (Daniel)
- Also move the offset definition of magic pages. (Michal)
- Extract the magic page allocation logic to a function. (Michal)
---
  xen/arch/arm/dom0less-build.c | 44 ++-
  1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242..95c4fd1a2d 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -1,5 +1,6 @@
  /* SPDX-License-Identifier: GPL-2.0-only */
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -10,6 +11,8 @@
  #include 
  #include 
  
+#include 

+
  #include 
  #include 
  #include 
@@ -739,6 +742,42 @@ static int __init alloc_xenstore_evtchn(struct domain *d)
  return 0;
  }
  
+#define XENSTORE_PFN_OFFSET 1

+static int __init alloc_xenstore_page(struct domain *d)
+{
+struct page_info *xenstore_pg;
+struct xenstore_domain_interface *interface;
+mfn_t mfn;
+gfn_t gfn;
+int rc;
+
+d->max_pages += 1;


Sorry I should have spotted it earlier. But you want to check 
d->max_pages is not overflowing. You can look at 
acquire_shared_memory_bank() for how to do it.


Also, maybe we want a helper to do it so it is not open-coded in
multiple places.



+xenstore_pg = alloc_domheap_page(d, 0);


I think we may want to restrict where the page is allocated. For
instance, a 32-bit domain using short page-tables will not be able to
address all the physical memory.


I would consider trying to allocate the page below 32-bit (using
MEMF_bits(32)), and then fall back to above 32-bit only for 64-bit domains.


Also, just to note that in theory alloc_domheap_page() could return MFN
0. In practice we have so far excluded MFN 0 because it breaks the page
allocator.


But I would still prefer if we add a check on the MFN. This will make it
easier to spot any issue if we ever give MFN 0 to the allocator.


A possible implementation would be to call alloc_domheap_page() a second
time and then free the first one (e.g. MFN 0).



+if ( xenstore_pg == NULL )
+return -ENOMEM;
+
+mfn = page_to_mfn(xenstore_pg);
+if ( !is_domain_direct_mapped(d) )
+gfn = gaddr_to_gfn(GUEST_MAGIC_BASE +
+   (XENSTORE_PFN_OFFSET << PAGE_SHIFT));
+else
+gfn = gaddr_to_gfn(mfn_to_maddr(mfn));
+
+rc = guest_physmap_add_page(d, gfn, mfn, 0);
+if ( rc )
+{
+free_domheap_page(xenstore_pg);
+return rc;
+}
+
+d->arch.hvm.params[HVM_PARAM_STORE_PFN] = gfn_x(gfn);
+interface = (struct 

[GIT PULL] xen: branch for v6.10-rc1

2024-05-24 Thread Juergen Gross
Linus,

Please git pull the following tag:

 git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git 
for-linus-6.10a-rc1-tag

xen: branch for v6.10-rc1

It contains the following patches:

- a small cleanup in the drivers/xen/xenbus Makefile
- a fix of the Xen xenstore driver to improve connecting to a late started
  Xenstore
- an enhancement for better support of ballooning in PVH guests
- a cleanup using try_cmpxchg() instead of open coding it

Thanks.

Juergen

 arch/x86/xen/enlighten.c  | 33 +
 arch/x86/xen/p2m.c| 11 +--
 drivers/xen/xenbus/Makefile   | 14 ++
 drivers/xen/xenbus/xenbus_probe.c | 36 +++-
 4 files changed, 67 insertions(+), 27 deletions(-)

Andy Shevchenko (1):
  xen/xenbus: Use *-y instead of *-objs in Makefile

Henry Wang (1):
  drivers/xen: Improve the late XenStore init protocol

Roger Pau Monne (1):
  xen/x86: add extra pages to unpopulated-alloc if available

Uros Bizjak (1):
  locking/x86/xen: Use try_cmpxchg() in xen_alloc_p2m_entry()



Re: [PATCH for-4.19 v3 2/3] xen: enable altp2m at create domain domctl

2024-05-24 Thread Jürgen Groß

On 17.05.24 15:33, Roger Pau Monne wrote:

Enabling it using an HVM param is fragile, and complicates the logic when
deciding whether options that interact with altp2m can also be enabled.

Leave the HVM param value for consumption by the guest, but prevent it from
being set.  Enabling is now done using an additional altp2m specific field in
xen_domctl_createdomain.

Note that albeit only currently implemented in x86, altp2m could be implemented
in other architectures, hence why the field is added to xen_domctl_createdomain
instead of xen_arch_domainconfig.

Signed-off-by: Roger Pau Monné 


Reviewed-by: Juergen Gross  # tools/libs/


Juergen



Re: [PATCH v3] xen/arm: Set correct per-cpu cpu_core_mask

2024-05-24 Thread Julien Grall

Hi,

On 21/05/2024 08:57, Michal Orzel wrote:



On 21/05/2024 09:51, Henry Wang wrote:

Hi Michal,

On 5/21/2024 3:47 PM, Michal Orzel wrote:

Hi Henry.

On 3/21/2024 11:57 AM, Henry Wang wrote:

In the common sysctl command XEN_SYSCTL_physinfo, the value of
cores_per_socket is calculated based on the cpu_core_mask of CPU0.
Currently on Arm this is a fixed value 1 (can be checked via xl info),
which is not correct. This is because during the Arm CPU online
process at boot time, setup_cpu_sibling_map() only sets the per-cpu
cpu_core_mask for itself.

cores_per_socket refers to the number of cores that belong to the same
socket (NUMA node). Currently Xen on Arm does not support physical
CPU hotplug and NUMA, also we assume there is no multithread. Therefore
cores_per_socket means all possible CPUs detected from the device
tree. Set the per-cpu cpu_core_mask in setup_cpu_sibling_map()
accordingly. Modify the in-code comment which seems to be outdated. Add
a warning to users if Xen is running on processors with multithread
support.

Signed-off-by: Henry Wang 
Signed-off-by: Henry Wang 

Reviewed-by: Michal Orzel 


Thanks.


/* ID of the PCPU we're running on */
DEFINE_PER_CPU(unsigned int, cpu_id);
-/* XXX these seem awfully x86ish... */
+/*
+ * Although multithread is part of the Arm spec, there are not many
+ * processors support multithread and current Xen on Arm assumes there

NIT: s/support/supporting


Sorry, it should have been spotted locally before sending. Anyway, I
will correct this in v4 with your Reviewed-by tag taken. Thanks for
pointing this out.

I don't think there is a need to resend a patch just for fixing this typo. It 
can be done on commit.


Fixed and committed.

Cheers,



~Michal



--
Julien Grall



Re: [PATCH v16 4/5] xen/arm: translate virtual PCI bus topology for guests

2024-05-24 Thread Julien Grall

Hi,

Sorry I didn't notice there was a v16 and posted comments on the v15. 
The only one is about the size of the list we iterate.


On 23/05/2024 08:48, Roger Pau Monné wrote:

On Wed, May 22, 2024 at 06:59:23PM -0400, Stewart Hildebrand wrote:

From: Oleksandr Andrushchenko 

There are three originators for the PCI configuration space access:
1. The domain that owns physical host bridge: MMIO handlers are
there so we can update vPCI register handlers with the values
written by the hardware domain, e.g. physical view of the registers
vs guest's view on the configuration space.
2. Guest access to the passed through PCI devices: we need to properly
map virtual bus topology to the physical one, e.g. pass the configuration
space access to the corresponding physical devices.
3. Emulated host PCI bridge access. It doesn't exist in the physical
topology, e.g. it can't be mapped to some physical host bridge.
So, all access to the host bridge itself needs to be trapped and
emulated.

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
Signed-off-by: Stewart Hildebrand 


Acked-by: Roger Pau Monné 


For Arm:

Acked-by: Julien Grall 



One unrelated question below.


---
In v15:
- base on top of ("arm/vpci: honor access size when returning an error")
In v11:
- Fixed format issues
- Added ASSERT_UNREACHABLE() to the dummy implementation of
vpci_translate_virtual_device()
- Moved variable in vpci_sbdf_from_gpa(), now it is easier to follow
the logic in the function
Since v9:
- Commend about required lock replaced with ASSERT()
- Style fixes
- call to vpci_translate_virtual_device folded into vpci_sbdf_from_gpa
Since v8:
- locks moved out of vpci_translate_virtual_device()
Since v6:
- add pcidevs locking to vpci_translate_virtual_device
- update wrt to the new locking scheme
Since v5:
- add vpci_translate_virtual_device for #ifndef CONFIG_HAS_VPCI_GUEST_SUPPORT
   case to simplify ifdefery
- add ASSERT(!is_hardware_domain(d)); to vpci_translate_virtual_device
- reset output register on failed virtual SBDF translation
Since v4:
- indentation fixes
- constify struct domain
- updated commit message
- updates to the new locking scheme (pdev->vpci_lock)
Since v3:
- revisit locking
- move code to vpci.c
Since v2:
  - pass struct domain instead of struct vcpu
  - constify arguments where possible
  - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
  xen/arch/arm/vpci.c | 45 -
  xen/drivers/vpci/vpci.c | 24 ++
  xen/include/xen/vpci.h  | 12 +++
  3 files changed, 71 insertions(+), 10 deletions(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index b63a356bb4a8..516933bebfb3 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -7,33 +7,53 @@
  
  #include 
  
-static pci_sbdf_t vpci_sbdf_from_gpa(const struct pci_host_bridge *bridge,

- paddr_t gpa)
+static bool vpci_sbdf_from_gpa(struct domain *d,
+   const struct pci_host_bridge *bridge,
+   paddr_t gpa, pci_sbdf_t *sbdf)
  {
-pci_sbdf_t sbdf;
+bool translated = true;
+
+ASSERT(sbdf);
  
  if ( bridge )

  {
-sbdf.sbdf = VPCI_ECAM_BDF(gpa - bridge->cfg->phys_addr);
-sbdf.seg = bridge->segment;
-sbdf.bus += bridge->cfg->busn_start;
+sbdf->sbdf = VPCI_ECAM_BDF(gpa - bridge->cfg->phys_addr);
+sbdf->seg = bridge->segment;
+sbdf->bus += bridge->cfg->busn_start;
  }
  else
-sbdf.sbdf = VPCI_ECAM_BDF(gpa - GUEST_VPCI_ECAM_BASE);
+{
+/*
+ * For the passed through devices we need to map their virtual SBDF
+ * to the physical PCI device being passed through.
+ */
+sbdf->sbdf = VPCI_ECAM_BDF(gpa - GUEST_VPCI_ECAM_BASE);
+read_lock(&d->pci_lock);
+translated = vpci_translate_virtual_device(d, sbdf);
+read_unlock(&d->pci_lock);


I would consider moving the read_{,un}lock() calls inside
vpci_translate_virtual_device(), if that's the only caller of
vpci_translate_virtual_device().  Maybe further patches add other
instances that call from an already locked context.


+}
  
-return sbdf;

+return translated;
  }
  
  static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,

register_t *r, void *p)
  {
  struct pci_host_bridge *bridge = p;
-pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
+pci_sbdf_t sbdf;
  const unsigned int access_size = (1U << info->dabt.size) * 8;
  const register_t invalid = GENMASK_ULL(access_size - 1, 0);


Do you know why the invalid value is truncated to the access size?


Because no other callers are doing the truncation and therefore the 
guest would read 1s even for 8-byte unsigned access.


Cheers,

--
Julien Grall



Re: [PATCH v15 4/5] xen/arm: translate virtual PCI bus topology for guests

2024-05-24 Thread Julien Grall

Hi Stewart,

On 17/05/2024 18:06, Stewart Hildebrand wrote:

From: Oleksandr Andrushchenko 

There are three originators for the PCI configuration space access:
1. The domain that owns physical host bridge: MMIO handlers are
there so we can update vPCI register handlers with the values
written by the hardware domain, e.g. physical view of the registers
vs guest's view on the configuration space.
2. Guest access to the passed through PCI devices: we need to properly
map virtual bus topology to the physical one, e.g. pass the configuration
space access to the corresponding physical devices.
3. Emulated host PCI bridge access. It doesn't exist in the physical
topology, e.g. it can't be mapped to some physical host bridge.
So, all access to the host bridge itself needs to be trapped and
emulated.

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
Signed-off-by: Stewart Hildebrand 


With one remark below, for Arm:

Acked-by: Julien Grall 

[...]


diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 23722634d50b..98b294f09688 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -81,6 +81,30 @@ static int add_virtual_device(struct pci_dev *pdev)
  return 0;
  }
  
+/*

+ * Find the physical device which is mapped to the virtual device
+ * and translate virtual SBDF to the physical one.
+ */
+bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
+{
+const struct pci_dev *pdev;
+
+ASSERT(!is_hardware_domain(d));
+ASSERT(rw_is_locked(&d->pci_lock));
+
+for_each_pdev ( d, pdev )


(This doesn't need to be addressed now)


I see that we have other places with for_each_pdev() in the vPCI code. So
are we expecting the list to be smallish?


If not, then we may want to consider reworking the datastructure or
putting a limit on the number of PCI devices assigned.




+{
+if ( pdev->vpci && (pdev->vpci->guest_sbdf.sbdf == sbdf->sbdf) )
+{
+/* Replace guest SBDF with the physical one. */
+*sbdf = pdev->sbdf;
+return true;
+}
+}
+
+return false;
+}
+
  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
  
  void vpci_deassign_device(struct pci_dev *pdev)

diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 980aded26fc1..7e5a0f0c50c1 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -303,6 +303,18 @@ static inline bool __must_check 
vpci_process_pending(struct vcpu *v)
  }
  #endif
  
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT

+bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf);
+#else
+static inline bool vpci_translate_virtual_device(const struct domain *d,
+ pci_sbdf_t *sbdf)
+{
+ASSERT_UNREACHABLE();
+
+return false;
+}
+#endif
+
  #endif
  
  /*


Cheers,

--
Julien Grall



[xen-4.17-testing test] 186129: regressions - FAIL

2024-05-24 Thread osstest service owner
flight 186129 xen-4.17-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186129/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-pvops 6 kernel-build   fail in 186109 REGR. vs. 185864

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-credit1 10 host-ping-check-xen fail in 186109 pass in 
186129
 test-armhf-armhf-xl   8 xen-boot fail in 186109 pass in 186129
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail pass 
in 186109

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked in 186109 n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked in 186109 n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked in 186109 n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked in 186109 n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked in 186109 n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked in 186109 n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked in 186109 n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked in 186109 n/a
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 185864
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 185864
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 185864
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 185864
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 185864
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 185864
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-qcow214 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow215 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-checkfail   never pass
 

Re: [PATCH v4 3/7] xen/p2m: put reference for level 2 superpage

2024-05-24 Thread Julien Grall

Hi Luca,

On 24/05/2024 13:40, Luca Fancellu wrote:

From: Penny Zheng 

We are doing foreign memory mapping for static shared memory, and
there is a great possibility that it could be super mapped.
But today, p2m_put_l3_page cannot handle superpages.

This commit implements a new function p2m_put_l2_superpage to handle
2MB superpages, specifically for helping put extra references for
foreign superpages.

Modify relinquish_p2m_mapping as well to take into account preemption
when type is foreign memory and order is above 9 (2MB).

Currently 1GB superpages are not handled: because Xen is not preemptible,
putting all the references for such a big mapping (which at some point
might end up freeing memory) could turn into a very long operation, so
some extra work is needed before such superpages can be supported.

Signed-off-by: Penny Zheng 
Signed-off-by: Luca Fancellu 
---
v4 changes:
  - optimised the path to call put_page() on the foreign mapping as
Julien suggested. Add a comment in p2m_put_l2_superpage to state
that any changes need to take into account some change in the
relinquish code. (Julien)
v3 changes:
  - Add reasoning why we don't support now 1GB superpage, remove level_order
variable from p2m_put_l2_superpage, update TODO comment inside
p2m_free_entry, use XEN_PT_LEVEL_ORDER(2) instead of value 9 inside
relinquish_p2m_mapping. (Michal)
v2:
  - Do not handle 1GB super pages, as there might be an issue where
a lot of calls to put_page(...) are issued, which could lead
to freeing memory, a long operation.
v1:
  - patch from 
https://patchwork.kernel.org/project/xen-devel/patch/20231206090623.1932275-9-penny.zh...@arm.com/
---
  xen/arch/arm/mmu/p2m.c | 82 +++---
  1 file changed, 62 insertions(+), 20 deletions(-)

diff --git a/xen/arch/arm/mmu/p2m.c b/xen/arch/arm/mmu/p2m.c
index 41fcca011cf4..986c5a03c54b 100644
--- a/xen/arch/arm/mmu/p2m.c
+++ b/xen/arch/arm/mmu/p2m.c
@@ -753,34 +753,66 @@ static int p2m_mem_access_radix_set(struct p2m_domain *p2m, gfn_t gfn,
  return rc;
  }
  
-/*
- * Put any references on the single 4K page referenced by pte.
- * TODO: Handle superpages, for now we only take special references for leaf
- * pages (specifically foreign ones, which can't be super mapped today).
- */
-static void p2m_put_l3_page(const lpae_t pte)
+static void p2m_put_foreign_page(struct page_info *pg)
  {
-mfn_t mfn = lpae_get_mfn(pte);
-
-ASSERT(p2m_is_valid(pte));
-
  /*
- * TODO: Handle other p2m types
- *
   * It's safe to do the put_page here because page_alloc will
   * flush the TLBs if the page is reallocated before the end of
   * this loop.
   */
-if ( p2m_is_foreign(pte.p2m.type) )
+put_page(pg);
+}
+
+/* Put any references on the single 4K page referenced by mfn. */
+static void p2m_put_l3_page(mfn_t mfn, p2m_type_t type)
+{
+/* TODO: Handle other p2m types */
+if ( p2m_is_foreign(type) )
  {
  ASSERT(mfn_valid(mfn));
-put_page(mfn_to_page(mfn));
+p2m_put_foreign_page(mfn_to_page(mfn));
  }
  /* Detect the xenheap page and mark the stored GFN as invalid. */
-else if ( p2m_is_ram(pte.p2m.type) && is_xen_heap_mfn(mfn) )
+else if ( p2m_is_ram(type) && is_xen_heap_mfn(mfn) )
  page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN);
  }
  
+/* Put any references on the superpage referenced by mfn. */
+static void p2m_put_l2_superpage(mfn_t mfn, p2m_type_t type)
+{
+struct page_info *pg;
+unsigned int i;
+
+/*
+ * TODO: Handle other p2m types, but be aware that any changes to handle
+ * different types should require an update on the relinquish code to handle
+ * preemption.
+ */
+if ( !p2m_is_foreign(type) )
+return;
+
+ASSERT(mfn_valid(mfn));
+
+pg = mfn_to_page(mfn);
+
+for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++, pg++ )
+p2m_put_foreign_page(pg);
+}
+
+/* Put any references on the page referenced by pte. */
+static void p2m_put_page(const lpae_t pte, unsigned int level)
+{
+mfn_t mfn = lpae_get_mfn(pte);
+
+ASSERT(p2m_is_valid(pte));
+
+/* We have a second level 2M superpage */
+if ( p2m_is_superpage(pte, level) && (level == 2) )


AFAICT, p2m_put_page() can only be called if the pte points to a 
superpage or page:


if ( p2m_is_superpage(entry, level) || (level == 3) )
{
   ...
   p2m_put_page()

}

So do we actually need to check p2m_is_superpage()?


+return p2m_put_l2_superpage(mfn, pte.p2m.type);
+else if ( level == 3 )
+return p2m_put_l3_page(mfn, pte.p2m.type);
+}
+
  /* Free lpae sub-tree behind an entry */
  static void p2m_free_entry(struct p2m_domain *p2m,
 lpae_t entry, unsigned int level)
@@ -809,9 +841,16 @@ static void p2m_free_entry(struct p2m_domain *p2m,
  #endif
  
  p2m->stats.mappings[level]--;

-/* Nothing to do if the entry is a 

Re: [PATCH v2 3/8] x86/vlapic: Move lapic_load_hidden migration checks to the check hook

2024-05-24 Thread Roger Pau Monné
On Fri, May 24, 2024 at 12:16:00PM +0100, Alejandro Vallejo wrote:
> On 23/05/2024 15:50, Roger Pau Monné wrote:
> > On Wed, May 08, 2024 at 01:39:22PM +0100, Alejandro Vallejo wrote:
> >> While at it, add a check for the reserved field in the hidden save area.
> >>
> >> Signed-off-by: Alejandro Vallejo 
> >> ---
> >> v2:
> >>   * New patch. Addresses the missing check for rsvd_zero in v1.
> > 
> > Oh, it would be better if this was done at the time when rsvd_zero is
> > introduced.  I think this should be moved ahead of the series, so that
> > the patch that introduces rsvd_zero can add the check in
> > lapic_check_hidden().
> 
> I'll give that a whirl.
> 
> > 
> >> ---
> >>  xen/arch/x86/hvm/vlapic.c | 41 ---
> >>  1 file changed, 30 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> >> index 8a24419c..2f06bff1b2cc 100644
> >> --- a/xen/arch/x86/hvm/vlapic.c
> >> +++ b/xen/arch/x86/hvm/vlapic.c
> >> @@ -1573,35 +1573,54 @@ static void lapic_load_fixup(struct vlapic *vlapic)
> >> v, vlapic->loaded.id, vlapic->loaded.ldr, good_ldr);
> >>  }
> >>  
> >> -static int cf_check lapic_load_hidden(struct domain *d, 
> >> hvm_domain_context_t *h)
> >> +static int cf_check lapic_check_hidden(const struct domain *d,
> >> +   hvm_domain_context_t *h)
> >>  {
> >>  unsigned int vcpuid = hvm_load_instance(h);
> >> -struct vcpu *v;
> >> -struct vlapic *s;
> >> +struct hvm_hw_lapic s;
> >>  
> >>  if ( !has_vlapic(d) )
> >>  return -ENODEV;
> >>  
> >>  /* Which vlapic to load? */
> >> -if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> >> +if ( vcpuid >= d->max_vcpus || d->vcpu[vcpuid] == NULL )
> >>  {
> >>  dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n",
> >>  d->domain_id, vcpuid);
> >>  return -EINVAL;
> >>  }
> >> -s = vcpu_vlapic(v);
> >>  
> >> -if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
> >> +if ( hvm_load_entry_zeroextend(LAPIC, h, &s) )
> > 
> > Can't you use hvm_get_entry() to perform the sanity checks:
> > 
> > const struct hvm_hw_lapic *s = hvm_get_entry(LAPIC, h);
> > 
> > Thanks, Roger.
> 
> I don't think I can. Because the last field (rsvd_zero) might or might
> not be there, so it needs to be zero-extended. Unless I misunderstood
> what hvm_get_entry() is meant to do. It seems to check for exact sizes.

Oh, indeed, hvm_get_entry() uses strict checking and will refuse to
return the entry if sizes don't match.  There seems to be no way to
avoid the copy if we want to do this in a sane way.

Thanks, Roger.



[PATCH v4 6/7] xen/arm: Implement the logic for static shared memory from Xen heap

2024-05-24 Thread Luca Fancellu
This commit implements the logic to have the static shared memory banks
allocated from the Xen heap instead of having the host physical address
passed by the user.

When the host physical address is not supplied, the physical memory is
taken from the Xen heap using allocate_domheap_memory; the allocation
needs to occur at the first handled DT node, and the allocated banks
need to be saved somewhere.

Introduce 'shm_heap_banks' for that reason: a structure that holds the
banks allocated from the heap. Its field bank[].shmem_extra is used to
point to the .shmem_extra space of the bootinfo shared memory banks, so
that no further memory is allocated and every bank in shm_heap_banks can
be safely identified by its shm_id, keeping track of whether it was
allocated and from where.

A search in 'shm_heap_banks' reveals whether the banks were allocated,
for the case where the host address is not passed; the callback given to
allocate_domheap_memory stores the banks in the structure and maps them
to the current domain. To do that, acquire_shared_memory_bank is changed
to differentiate banks coming from the heap: for those, assign_pages is
called on every bank.

When the bank is already allocated, handle_shared_mem_bank is called for
every bank allocated with the corresponding shm_id and the mappings are
done.

Signed-off-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
v4 changes:
 - Add R-by Michal
v3 changes:
 - reworded commit msg section, swap role_str and gbase in
   alloc_heap_pages_cb_extra to avoid padding hole in arm32, remove
   not needed printk, modify printk to print KB instead of KB, swap
   strncmp for strcmp, reduced memory footprint for shm_heap_banks.
   (Michal)
v2 changes:
 - add static inline get_shmem_heap_banks(), given the changes to the
   struct membanks interface. Rebase changes due to removal of
   owner_dom_io arg from handle_shared_mem_bank.
   Change save_map_heap_pages return type given the changes to the
   allocate_domheap_memory callback type.
---
 xen/arch/arm/static-shmem.c | 187 ++--
 1 file changed, 155 insertions(+), 32 deletions(-)

diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
index bc093b9da9ea..dbb017c7d76e 100644
--- a/xen/arch/arm/static-shmem.c
+++ b/xen/arch/arm/static-shmem.c
@@ -9,6 +9,25 @@
 #include 
 #include 
 
+typedef struct {
+struct domain *d;
+const char *role_str;
+paddr_t gbase;
+struct shmem_membank_extra *bank_extra_info;
+} alloc_heap_pages_cb_extra;
+
+static struct {
+struct membanks_hdr common;
+struct membank bank[NR_SHMEM_BANKS];
+} shm_heap_banks __initdata = {
+.common.max_banks = NR_SHMEM_BANKS
+};
+
+static inline struct membanks *get_shmem_heap_banks(void)
+{
+return container_of(&shm_heap_banks.common, struct membanks, common);
+}
+
 static void __init __maybe_unused build_assertions(void)
 {
 /*
@@ -63,7 +82,8 @@ static bool __init is_shm_allocated_to_domio(paddr_t pbase)
 }
 
 static mfn_t __init acquire_shared_memory_bank(struct domain *d,
-   paddr_t pbase, paddr_t psize)
+   paddr_t pbase, paddr_t psize,
+   bool bank_from_heap)
 {
 mfn_t smfn;
 unsigned long nr_pfns;
@@ -83,19 +103,31 @@ static mfn_t __init acquire_shared_memory_bank(struct domain *d,
 d->max_pages += nr_pfns;
 
 smfn = maddr_to_mfn(pbase);
-res = acquire_domstatic_pages(d, smfn, nr_pfns, 0);
+if ( bank_from_heap )
+/*
+ * When host address is not provided, static shared memory is
+ * allocated from heap and shall be assigned to owner domain.
+ */
+res = assign_pages(maddr_to_page(pbase), nr_pfns, d, 0);
+else
+res = acquire_domstatic_pages(d, smfn, nr_pfns, 0);
+
 if ( res )
 {
-printk(XENLOG_ERR
-   "%pd: failed to acquire static memory: %d.\n", d, res);
-d->max_pages -= nr_pfns;
-return INVALID_MFN;
+printk(XENLOG_ERR "%pd: failed to %s static memory: %d.\n", d,
+   bank_from_heap ? "assign" : "acquire", res);
+goto fail;
 }
 
 return smfn;
+
+ fail:
+d->max_pages -= nr_pfns;
+return INVALID_MFN;
 }
 
 static int __init assign_shared_memory(struct domain *d, paddr_t gbase,
+   bool bank_from_heap,
const struct membank *shm_bank)
 {
 mfn_t smfn;
@@ -108,10 +140,7 @@ static int __init assign_shared_memory(struct domain *d, paddr_t gbase,
 psize = shm_bank->size;
 nr_borrowers = shm_bank->shmem_extra->nr_shm_borrowers;
 
-printk("%pd: allocate static shared memory BANK %#"PRIpaddr"-%#"PRIpaddr".\n",
-   d, pbase, pbase + psize);
-
-smfn = acquire_shared_memory_bank(d, pbase, psize);
+smfn = acquire_shared_memory_bank(d, pbase, psize, 

[PATCH v4 1/7] xen/arm: Lookup bootinfo shm bank during the mapping

2024-05-24 Thread Luca Fancellu
The current static shared memory code is using bootinfo banks when it
needs to find the number of borrowers, so every time assign_shared_memory
is called, the bank is searched in the bootinfo.shmem structure.

There is nothing wrong with it; however, the bank can also be used to
retrieve the start address and size, and to pass fewer arguments to
assign_shared_memory. When retrieving the information from the bootinfo
bank, it's also possible to move the checks on alignment to
process_shm_node in the early stages.

So create a new function find_shm_bank_by_id() which takes a
'struct shared_meminfo' structure and the shared memory ID, to look for a
bank with a matching ID, take the physical host address and size from the
bank, pass the bank to assign_shared_memory() removing the now unnecessary
arguments and finally remove the acquire_nr_borrower_domain() function
since now the information can be extracted from the passed bank.
Move the "xen,shm-id" parsing early in process_shm to bail out quickly in
case of errors (unlikely) and, as said above, move the checks on alignment
to process_shm_node.

A drawback of this change is that the bootinfo is now also used when the
bank doesn't need to be allocated; however, it will be convenient later
to use it as an argument for assign_shared_memory when dealing with
the use case where the host physical address is not supplied by the user.

Signed-off-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
v3 changes:
 - switch strncmp with strcmp in find_shm_bank_by_id, fix commit msg typo,
   add R-by Michal.
v2 changes:
 - fix typo commit msg, renamed find_shm() to find_shm_bank_by_id(),
   swap region size check different from zero and size alignment, remove
   not necessary BUGON(). (Michal)
---
 xen/arch/arm/static-shmem.c | 100 +++-
 1 file changed, 53 insertions(+), 47 deletions(-)

diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
index 78881dd1d3f7..0a1c327e90ea 100644
--- a/xen/arch/arm/static-shmem.c
+++ b/xen/arch/arm/static-shmem.c
@@ -19,29 +19,21 @@ static void __init __maybe_unused build_assertions(void)
  offsetof(struct shared_meminfo, bank)));
 }
 
-static int __init acquire_nr_borrower_domain(struct domain *d,
- paddr_t pbase, paddr_t psize,
- unsigned long *nr_borrowers)
+static const struct membank __init *
+find_shm_bank_by_id(const struct membanks *shmem, const char *shm_id)
 {
-const struct membanks *shmem = bootinfo_get_shmem();
 unsigned int bank;
 
-/* Iterate reserved memory to find requested shm bank. */
 for ( bank = 0 ; bank < shmem->nr_banks; bank++ )
 {
-paddr_t bank_start = shmem->bank[bank].start;
-paddr_t bank_size = shmem->bank[bank].size;
-
-if ( (pbase == bank_start) && (psize == bank_size) )
+if ( strcmp(shm_id, shmem->bank[bank].shmem_extra->shm_id) == 0 )
 break;
 }
 
 if ( bank == shmem->nr_banks )
-return -ENOENT;
+return NULL;
 
-*nr_borrowers = shmem->bank[bank].shmem_extra->nr_shm_borrowers;
-
-return 0;
+return &shmem->bank[bank];
 }
 
 /*
@@ -103,14 +95,18 @@ static mfn_t __init acquire_shared_memory_bank(struct domain *d,
 return smfn;
 }
 
-static int __init assign_shared_memory(struct domain *d,
-   paddr_t pbase, paddr_t psize,
-   paddr_t gbase)
+static int __init assign_shared_memory(struct domain *d, paddr_t gbase,
+   const struct membank *shm_bank)
 {
 mfn_t smfn;
 int ret = 0;
 unsigned long nr_pages, nr_borrowers, i;
 struct page_info *page;
+paddr_t pbase, psize;
+
+pbase = shm_bank->start;
+psize = shm_bank->size;
+nr_borrowers = shm_bank->shmem_extra->nr_shm_borrowers;
 
printk("%pd: allocate static shared memory BANK %#"PRIpaddr"-%#"PRIpaddr".\n",
d, pbase, pbase + psize);
@@ -135,14 +131,6 @@ static int __init assign_shared_memory(struct domain *d,
 }
 }
 
-/*
- * Get the right amount of references per page, which is the number of
- * borrower domains.
- */
-ret = acquire_nr_borrower_domain(d, pbase, psize, &nr_borrowers);
-if ( ret )
-return ret;
-
 /*
  * Instead of letting borrower domain get a page ref, we add as many
  * additional reference as the number of borrowers when the owner
@@ -199,6 +187,7 @@ int __init process_shm(struct domain *d, struct kernel_info *kinfo,
 
 dt_for_each_child_node(node, shm_node)
 {
+const struct membank *boot_shm_bank;
 const struct dt_property *prop;
 const __be32 *cells;
 uint32_t addr_cells, size_cells;
@@ -212,6 +201,23 @@ int __init process_shm(struct domain *d, struct kernel_info *kinfo,
if ( !dt_device_is_compatible(shm_node, "xen,domain-shared-memory-v1") )
   

[PATCH v4 0/7] Static shared memory followup v2 - pt2

2024-05-24 Thread Luca Fancellu
This series is a partial rework of this other series:
https://patchwork.kernel.org/project/xen-devel/cover/20231206090623.1932275-1-penny.zh...@arm.com/

The original series addresses an issue of the static shared memory feature
that impacts the memory footprint of other components when the feature is
enabled; another issue impacts the device tree generation for the guests when
the feature is enabled and used; and the last one is a missing feature, namely
the option to have a static shared memory region that is not from the host
address space.

This series handles some comments on the original series and splits the
rework in two parts. The first part addresses the memory footprint issue and
the device tree generation, and is currently fully merged
(https://patchwork.kernel.org/project/xen-devel/cover/20240418073652.3622828-1-luca.fance...@arm.com/);
this series addresses the static shared memory allocation from the Xen heap.

Luca Fancellu (5):
  xen/arm: Lookup bootinfo shm bank during the mapping
  xen/arm: Wrap shared memory mapping code in one function
  xen/arm: Parse xen,shared-mem when host phys address is not provided
  xen/arm: Rework heap page allocation outside allocate_bank_memory
  xen/arm: Implement the logic for static shared memory from Xen heap

Penny Zheng (2):
  xen/p2m: put reference for level 2 superpage
  xen/docs: Describe static shared memory when host address is not
provided

 docs/misc/arm/device-tree/booting.txt   |  52 ++-
 xen/arch/arm/arm32/mmu/mm.c |  11 +-
 xen/arch/arm/dom0less-build.c   |   4 +-
 xen/arch/arm/domain_build.c |  84 +++--
 xen/arch/arm/include/asm/domain_build.h |   9 +-
 xen/arch/arm/mmu/p2m.c  |  82 +++--
 xen/arch/arm/setup.c|  14 +-
 xen/arch/arm/static-shmem.c | 432 +---
 8 files changed, 502 insertions(+), 186 deletions(-)

-- 
2.34.1




[PATCH v4 2/7] xen/arm: Wrap shared memory mapping code in one function

2024-05-24 Thread Luca Fancellu
Wrap the code and logic that is calling assign_shared_memory
and map_regions_p2mt into a new function 'handle_shared_mem_bank',
it will become useful later when the code will allow the user to
not pass the host physical address.

Signed-off-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
v3 changes:
 - check return value of dt_property_read_string, add R-by Michal
v2 changes:
 - add blank line, move owner_dom_io computation inside
   handle_shared_mem_bank in order to reduce args count, remove
   not needed BUGON(). (Michal)
---
 xen/arch/arm/static-shmem.c | 86 +++--
 1 file changed, 53 insertions(+), 33 deletions(-)

diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
index 0a1c327e90ea..c15a65130659 100644
--- a/xen/arch/arm/static-shmem.c
+++ b/xen/arch/arm/static-shmem.c
@@ -180,6 +180,53 @@ append_shm_bank_to_domain(struct kernel_info *kinfo, paddr_t start,
 return 0;
 }
 
+static int __init handle_shared_mem_bank(struct domain *d, paddr_t gbase,
+ const char *role_str,
+ const struct membank *shm_bank)
+{
+bool owner_dom_io = true;
+paddr_t pbase, psize;
+int ret;
+
+pbase = shm_bank->start;
+psize = shm_bank->size;
+
+/*
+ * "role" property is optional and if it is defined explicitly,
+ * then the owner domain is not the default "dom_io" domain.
+ */
+if ( role_str != NULL )
+owner_dom_io = false;
+
+/*
+ * DOMID_IO is a fake domain and is not described in the Device-Tree.
+ * Therefore when the owner of the shared region is DOMID_IO, we will
+ * only find the borrowers.
+ */
+if ( (owner_dom_io && !is_shm_allocated_to_domio(pbase)) ||
+ (!owner_dom_io && strcmp(role_str, "owner") == 0) )
+{
+/*
+ * We found the first borrower of the region, the owner was not
+ * specified, so they should be assigned to dom_io.
+ */
+ret = assign_shared_memory(owner_dom_io ? dom_io : d, gbase, shm_bank);
+if ( ret )
+return ret;
+}
+
+if ( owner_dom_io || (strcmp(role_str, "borrower") == 0) )
+{
+/* Set up P2M foreign mapping for borrower domain. */
+ret = map_regions_p2mt(d, _gfn(PFN_UP(gbase)), PFN_DOWN(psize),
+   _mfn(PFN_UP(pbase)), p2m_map_foreign_rw);
+if ( ret )
+return ret;
+}
+
+return 0;
+}
+
 int __init process_shm(struct domain *d, struct kernel_info *kinfo,
const struct dt_device_node *node)
 {
@@ -196,7 +243,6 @@ int __init process_shm(struct domain *d, struct kernel_info *kinfo,
 unsigned int i;
 const char *role_str;
 const char *shm_id;
-bool owner_dom_io = true;
 
if ( !dt_device_is_compatible(shm_node, "xen,domain-shared-memory-v1") )
 continue;
@@ -237,39 +283,13 @@ int __init process_shm(struct domain *d, struct kernel_info *kinfo,
 return -EINVAL;
 }
 
-/*
- * "role" property is optional and if it is defined explicitly,
- * then the owner domain is not the default "dom_io" domain.
- */
-if ( dt_property_read_string(shm_node, "role", &role_str) == 0 )
-owner_dom_io = false;
+/* "role" property is optional */
+if ( dt_property_read_string(shm_node, "role", &role_str) != 0 )
+role_str = NULL;
 
-/*
- * DOMID_IO is a fake domain and is not described in the Device-Tree.
- * Therefore when the owner of the shared region is DOMID_IO, we will
- * only find the borrowers.
- */
-if ( (owner_dom_io && !is_shm_allocated_to_domio(pbase)) ||
- (!owner_dom_io && strcmp(role_str, "owner") == 0) )
-{
-/*
- * We found the first borrower of the region, the owner was not
- * specified, so they should be assigned to dom_io.
- */
-ret = assign_shared_memory(owner_dom_io ? dom_io : d, gbase,
-   boot_shm_bank);
-if ( ret )
-return ret;
-}
-
-if ( owner_dom_io || (strcmp(role_str, "borrower") == 0) )
-{
-/* Set up P2M foreign mapping for borrower domain. */
-ret = map_regions_p2mt(d, _gfn(PFN_UP(gbase)), PFN_DOWN(psize),
-   _mfn(PFN_UP(pbase)), p2m_map_foreign_rw);
-if ( ret )
-return ret;
-}
+ret = handle_shared_mem_bank(d, gbase, role_str, boot_shm_bank);
+if ( ret )
+return ret;
 
 /*
  * Record static shared memory region info for later setting
-- 
2.34.1




[PATCH v4 7/7] xen/docs: Describe static shared memory when host address is not provided

2024-05-24 Thread Luca Fancellu
From: Penny Zheng 

This commit describes the new scenario where the host address is not provided
in the "xen,shared-mem" property, and a new example is added to the page to
explain it in detail.

Take the occasion to fix some typos in the page.

Signed-off-by: Penny Zheng 
Signed-off-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
v2:
 - Add Michal R-by
v1:
 - patch from 
https://patchwork.kernel.org/project/xen-devel/patch/20231206090623.1932275-10-penny.zh...@arm.com/
   with some changes in the commit message.
---
 docs/misc/arm/device-tree/booting.txt | 52 ---
 1 file changed, 39 insertions(+), 13 deletions(-)

diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
index bbd955e9c2f6..ac4bad6fe5e0 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -590,7 +590,7 @@ communication.
 An array takes a physical address, which is the base address of the
 shared memory region in host physical address space, a size, and a guest
 physical address, as the target address of the mapping.
-e.g. xen,shared-mem = < [host physical address] [guest address] [size] >
+e.g. xen,shared-mem = < [host physical address] [guest address] [size] >;
 
 It shall also meet the following criteria:
 1) If the SHM ID matches with an existing region, the address range of the
@@ -601,8 +601,8 @@ communication.
 The number of cells for the host address (and size) is the same as the
 guest pseudo-physical address and they are inherited from the parent node.
 
-Host physical address is optional, when missing Xen decides the location
-(currently unimplemented).
+Host physical address is optional, when missing Xen decides the location.
+e.g. xen,shared-mem = < [guest address] [size] >;
 
 - role (Optional)
 
@@ -629,7 +629,7 @@ chosen {
 role = "owner";
 xen,shm-id = "my-shared-mem-0";
 xen,shared-mem = <0x1000 0x1000 0x1000>;
-}
+};
 
 domU1 {
 compatible = "xen,domain";
@@ -640,25 +640,36 @@ chosen {
 vpl011;
 
 /*
- * shared memory region identified as 0x0(xen,shm-id = <0x0>)
- * is shared between Dom0 and DomU1.
+ * shared memory region "my-shared-mem-0" is shared
+ * between Dom0 and DomU1.
  */
 domU1-shared-mem@1000 {
 compatible = "xen,domain-shared-memory-v1";
 role = "borrower";
 xen,shm-id = "my-shared-mem-0";
 xen,shared-mem = <0x1000 0x5000 0x1000>;
-}
+};
 
 /*
- * shared memory region identified as 0x1(xen,shm-id = <0x1>)
- * is shared between DomU1 and DomU2.
+ * shared memory region "my-shared-mem-1" is shared between
+ * DomU1 and DomU2.
  */
 domU1-shared-mem@5000 {
 compatible = "xen,domain-shared-memory-v1";
 xen,shm-id = "my-shared-mem-1";
 xen,shared-mem = <0x5000 0x6000 0x2000>;
-}
+};
+
+/*
+ * shared memory region "my-shared-mem-2" is shared between
+ * DomU1 and DomU2.
+ */
+domU1-shared-mem-2 {
+compatible = "xen,domain-shared-memory-v1";
+xen,shm-id = "my-shared-mem-2";
+role = "owner";
+xen,shared-mem = <0x8000 0x2000>;
+};
 
 ..
 
@@ -672,14 +683,21 @@ chosen {
 cpus = <1>;
 
 /*
- * shared memory region identified as 0x1(xen,shm-id = <0x1>)
- * is shared between domU1 and domU2.
+ * shared memory region "my-shared-mem-1" is shared between
+ * domU1 and domU2.
  */
 domU2-shared-mem@5000 {
 compatible = "xen,domain-shared-memory-v1";
 xen,shm-id = "my-shared-mem-1";
 xen,shared-mem = <0x5000 0x7000 0x2000>;
-}
+};
+
+domU2-shared-mem-2 {
+compatible = "xen,domain-shared-memory-v1";
+xen,shm-id = "my-shared-mem-2";
+role = "borrower";
+xen,shared-mem = <0x9000 0x2000>;
+};
 
 ..
 };
@@ -699,3 +717,11 @@ shared between DomU1 and DomU2. It will get mapped at 0x6000 in DomU1 guest
 physical address space, and at 0x7000 in DomU2 guest physical address space.
 DomU1 and DomU2 are both the borrower domain, the owner domain is the default
 owner domain DOMID_IO.
+
+For the static shared memory region "my-shared-mem-2", since host physical
+address is not provided by user, Xen will automatically allocate 512MB
+from heap as static shared memory to be shared between DomU1 and DomU2.
+The automatically allocated static shared memory will get mapped at
+0x8000 in DomU1 guest physical address space, and at 0x9000 in DomU2
+guest physical address space. DomU1 is explicitly defined as the owner domain,
+and DomU2 

[PATCH v4 4/7] xen/arm: Parse xen,shared-mem when host phys address is not provided

2024-05-24 Thread Luca Fancellu
Handle the parsing of the 'xen,shared-mem' property when the host physical
address is not provided. This commit introduces the logic to parse it, but
the functionality is still not implemented and will be part of future
commits.

Rework the logic inside process_shm_node to check the shm_id before doing
the other checks, because it eases the logic itself; also add more comments
on the logic.
Now, when the host physical address is not provided, the value
INVALID_PADDR is chosen to signal this condition and it is stored as the
start of the bank; due to that change, early_print_info_shmem and
init_sharedmem_pages are also changed, so as not to handle banks with a
start equal to INVALID_PADDR.
Another change is done inside meminfo_overlap_check, to skip banks whose
start address is INVALID_PADDR; that function is used to check banks from
reserved memory, shared memory and ACPI, and since the comment above the
function states that wrapping around is not handled, it's unlikely for
these banks to have INVALID_PADDR as start address.
The same change is done inside the consider_modules, find_unallocated_memory
and dt_unreserved_regions functions, in order to skip banks that start at
INVALID_PADDR from any computation.
The changes above hold because of this consideration.

Signed-off-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
v4 changes:
 - Fix wrong condition for paddr_assigned (Michal)
v3 changes:
 - fix typo in commit msg, add R-by Michal
v2 changes:
 - fix comments, add parenthesis to some conditions, remove unneeded
   variables, remove else branch, increment counter in the for loop,
   skip INVALID_PADDR start banks from also consider_modules,
   find_unallocated_memory and dt_unreserved_regions. (Michal)
---
 xen/arch/arm/arm32/mmu/mm.c |  11 +++-
 xen/arch/arm/domain_build.c |   5 ++
 xen/arch/arm/setup.c|  14 +++-
 xen/arch/arm/static-shmem.c | 125 +---
 4 files changed, 111 insertions(+), 44 deletions(-)

diff --git a/xen/arch/arm/arm32/mmu/mm.c b/xen/arch/arm/arm32/mmu/mm.c
index be480c31ea05..30a7aa1e8e51 100644
--- a/xen/arch/arm/arm32/mmu/mm.c
+++ b/xen/arch/arm/arm32/mmu/mm.c
@@ -101,8 +101,15 @@ static paddr_t __init consider_modules(paddr_t s, paddr_t e,
 nr += reserved_mem->nr_banks;
 for ( ; i - nr < shmem->nr_banks; i++ )
 {
-paddr_t r_s = shmem->bank[i - nr].start;
-paddr_t r_e = r_s + shmem->bank[i - nr].size;
+paddr_t r_s, r_e;
+
+r_s = shmem->bank[i - nr].start;
+
+/* Shared memory banks can contain INVALID_PADDR as start */
+if ( INVALID_PADDR == r_s )
+continue;
+
+r_e = r_s + shmem->bank[i - nr].size;
 
 if ( s < r_e && r_s < e )
 {
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 968c497efc78..02e741685102 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -927,6 +927,11 @@ static int __init find_unallocated_memory(const struct kernel_info *kinfo,
 for ( j = 0; j < mem_banks[i]->nr_banks; j++ )
 {
 start = mem_banks[i]->bank[j].start;
+
+/* Shared memory banks can contain INVALID_PADDR as start */
+if ( INVALID_PADDR == start )
+continue;
+
 end = mem_banks[i]->bank[j].start + mem_banks[i]->bank[j].size;
 res = rangeset_remove_range(unalloc_mem, PFN_DOWN(start),
 PFN_DOWN(end - 1));
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index c4e5c19b11d6..0c2fdaceaf21 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -240,8 +240,15 @@ static void __init dt_unreserved_regions(paddr_t s, paddr_t e,
 offset = reserved_mem->nr_banks;
 for ( ; i - offset < shmem->nr_banks; i++ )
 {
-paddr_t r_s = shmem->bank[i - offset].start;
-paddr_t r_e = r_s + shmem->bank[i - offset].size;
+paddr_t r_s, r_e;
+
+r_s = shmem->bank[i - offset].start;
+
+/* Shared memory banks can contain INVALID_PADDR as start */
+if ( INVALID_PADDR == r_s )
+continue;
+
+r_e = r_s + shmem->bank[i - offset].size;
 
 if ( s < r_e && r_s < e )
 {
@@ -272,7 +279,8 @@ static bool __init meminfo_overlap_check(const struct membanks *mem,
 bank_start = mem->bank[i].start;
 bank_end = bank_start + mem->bank[i].size;
 
-if ( region_end <= bank_start || region_start >= bank_end )
+if ( INVALID_PADDR == bank_start || region_end <= bank_start ||
+ region_start >= bank_end )
 continue;
 else
 {
diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
index c15a65130659..bc093b9da9ea 100644
--- a/xen/arch/arm/static-shmem.c
+++ b/xen/arch/arm/static-shmem.c
@@ -264,6 +264,12 @@ int __init process_shm(struct domain *d, struct kernel_info *kinfo,
 pbase = boot_shm_bank->start;
 psize = 

[PATCH v4 3/7] xen/p2m: put reference for level 2 superpage

2024-05-24 Thread Luca Fancellu
From: Penny Zheng 

We are doing foreign memory mapping for static shared memory, and
there is a great possibility that it could be super mapped.
But today, p2m_put_l3_page could not handle superpages.

This commits implements a new function p2m_put_l2_superpage to handle
2MB superpages, specifically for helping put extra references for
foreign superpages.

Modify relinquish_p2m_mapping as well to take into account preemption
when type is foreign memory and order is above 9 (2MB).

Currently 1GB superpages are not handled: because Xen is not preemptible,
putting all the references for such a big mapping (which at some point
might end up freeing memory) could turn into a very long operation, so
some extra work is needed before such superpages can be supported.

Signed-off-by: Penny Zheng 
Signed-off-by: Luca Fancellu 
---
v4 changes:
 - optimised the path to call put_page() on the foreign mapping as
   Julien suggested. Add a comment in p2m_put_l2_superpage to state
   that any changes needs to take into account some change in the
   relinquish code. (Julien)
v3 changes:
 - Add reasoning why we don't support now 1GB superpage, remove level_order
   variable from p2m_put_l2_superpage, update TODO comment inside
   p2m_free_entry, use XEN_PT_LEVEL_ORDER(2) instead of value 9 inside
   relinquish_p2m_mapping. (Michal)
v2:
 - Do not handle 1GB super page as there might be some issue where
   a lot of calls to put_page(...) might be issued which could lead
   to free memory that is a long operation.
v1:
 - patch from https://patchwork.kernel.org/project/xen-devel/patch/20231206090623.1932275-9-penny.zh...@arm.com/
---
 xen/arch/arm/mmu/p2m.c | 82 +++---
 1 file changed, 62 insertions(+), 20 deletions(-)

diff --git a/xen/arch/arm/mmu/p2m.c b/xen/arch/arm/mmu/p2m.c
index 41fcca011cf4..986c5a03c54b 100644
--- a/xen/arch/arm/mmu/p2m.c
+++ b/xen/arch/arm/mmu/p2m.c
@@ -753,34 +753,66 @@ static int p2m_mem_access_radix_set(struct p2m_domain *p2m, gfn_t gfn,
 return rc;
 }
 
-/*
- * Put any references on the single 4K page referenced by pte.
- * TODO: Handle superpages, for now we only take special references for leaf
- * pages (specifically foreign ones, which can't be super mapped today).
- */
-static void p2m_put_l3_page(const lpae_t pte)
+static void p2m_put_foreign_page(struct page_info *pg)
 {
-mfn_t mfn = lpae_get_mfn(pte);
-
-ASSERT(p2m_is_valid(pte));
-
 /*
- * TODO: Handle other p2m types
- *
  * It's safe to do the put_page here because page_alloc will
  * flush the TLBs if the page is reallocated before the end of
  * this loop.
  */
-if ( p2m_is_foreign(pte.p2m.type) )
+put_page(pg);
+}
+
+/* Put any references on the single 4K page referenced by mfn. */
+static void p2m_put_l3_page(mfn_t mfn, p2m_type_t type)
+{
+/* TODO: Handle other p2m types */
+if ( p2m_is_foreign(type) )
 {
 ASSERT(mfn_valid(mfn));
-put_page(mfn_to_page(mfn));
+p2m_put_foreign_page(mfn_to_page(mfn));
 }
 /* Detect the xenheap page and mark the stored GFN as invalid. */
-else if ( p2m_is_ram(pte.p2m.type) && is_xen_heap_mfn(mfn) )
+else if ( p2m_is_ram(type) && is_xen_heap_mfn(mfn) )
 page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN);
 }
 
+/* Put any references on the superpage referenced by mfn. */
+static void p2m_put_l2_superpage(mfn_t mfn, p2m_type_t type)
+{
+struct page_info *pg;
+unsigned int i;
+
+/*
+ * TODO: Handle other p2m types, but be aware that any changes to handle
+ * different types should require an update on the relinquish code to handle
+ * preemption.
+ */
+if ( !p2m_is_foreign(type) )
+return;
+
+ASSERT(mfn_valid(mfn));
+
+pg = mfn_to_page(mfn);
+
+for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++, pg++ )
+p2m_put_foreign_page(pg);
+}
+
+/* Put any references on the page referenced by pte. */
+static void p2m_put_page(const lpae_t pte, unsigned int level)
+{
+mfn_t mfn = lpae_get_mfn(pte);
+
+ASSERT(p2m_is_valid(pte));
+
+/* We have a second level 2M superpage */
+if ( p2m_is_superpage(pte, level) && (level == 2) )
+return p2m_put_l2_superpage(mfn, pte.p2m.type);
+else if ( level == 3 )
+return p2m_put_l3_page(mfn, pte.p2m.type);
+}
+
 /* Free lpae sub-tree behind an entry */
 static void p2m_free_entry(struct p2m_domain *p2m,
lpae_t entry, unsigned int level)
@@ -809,9 +841,16 @@ static void p2m_free_entry(struct p2m_domain *p2m,
 #endif
 
 p2m->stats.mappings[level]--;
-/* Nothing to do if the entry is a super-page. */
-if ( level == 3 )
-p2m_put_l3_page(entry);
+/*
+ * TODO: Currently we don't handle 1GB super-page, Xen is not
+ * preemptible and therefore some work is needed to handle such
+ * superpages, for which at some point Xen might end up freeing memory
+ * and therefore for such 

[PATCH v4 5/7] xen/arm: Rework heap page allocation outside allocate_bank_memory

2024-05-24 Thread Luca Fancellu
The function allocate_bank_memory allocates pages from the heap and
maps them to the guest using guest_physmap_add_page.

As preparatory work to support static shared memory banks when the
host physical address is not provided, Xen needs to allocate memory
from the heap, so rework allocate_bank_memory by moving the page
allocation out into a new function called allocate_domheap_memory.

The function allocate_domheap_memory takes a callback function and a
pointer to extra information passed to the callback; the callback is
invoked for every allocated region until the requested size is
reached.

In order to keep allocate_bank_memory functionality, the callback
passed to allocate_domheap_memory is a wrapper for
guest_physmap_add_page.

Let allocate_domheap_memory be externally visible, in order to use
it in the future from the static shared memory module.

Take the opportunity to change the signature of allocate_bank_memory
and remove the 'struct domain' parameter, which can be retrieved from
'struct kernel_info'.

No functional change is intended.

Signed-off-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
v3 changes:
 - Add R-by Michal
v2:
 - Reduced scope of pg var in allocate_domheap_memory, removed not
   necessary BUG_ON(), changed callback to return bool and fix
   comment. (Michal)
---
 xen/arch/arm/dom0less-build.c   |  4 +-
 xen/arch/arm/domain_build.c | 79 +
 xen/arch/arm/include/asm/domain_build.h |  9 ++-
 3 files changed, 62 insertions(+), 30 deletions(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242f4..20ddf6f8f250 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -60,12 +60,12 @@ static void __init allocate_memory(struct domain *d, struct kernel_info *kinfo)
 
 mem->nr_banks = 0;
 bank_size = MIN(GUEST_RAM0_SIZE, kinfo->unassigned_mem);
-if ( !allocate_bank_memory(d, kinfo, gaddr_to_gfn(GUEST_RAM0_BASE),
+if ( !allocate_bank_memory(kinfo, gaddr_to_gfn(GUEST_RAM0_BASE),
bank_size) )
 goto fail;
 
 bank_size = MIN(GUEST_RAM1_SIZE, kinfo->unassigned_mem);
-if ( !allocate_bank_memory(d, kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
+if ( !allocate_bank_memory(kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
bank_size) )
 goto fail;
 
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 02e741685102..669970c86fd5 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -417,30 +417,15 @@ static void __init allocate_memory_11(struct domain *d,
 }
 
 #ifdef CONFIG_DOM0LESS_BOOT
-bool __init allocate_bank_memory(struct domain *d, struct kernel_info *kinfo,
- gfn_t sgfn, paddr_t tot_size)
+bool __init allocate_domheap_memory(struct domain *d, paddr_t tot_size,
+alloc_domheap_mem_cb cb, void *extra)
 {
-struct membanks *mem = kernel_info_get_mem(kinfo);
-int res;
-struct page_info *pg;
-struct membank *bank;
-unsigned int max_order = ~0;
-
-/*
- * allocate_bank_memory can be called with a tot_size of zero for
- * the second memory bank. It is not an error and we can safely
- * avoid creating a zero-size memory bank.
- */
-if ( tot_size == 0 )
-return true;
-
-bank = &mem->bank[mem->nr_banks];
-bank->start = gfn_to_gaddr(sgfn);
-bank->size = tot_size;
+unsigned int max_order = UINT_MAX;
 
 while ( tot_size > 0 )
 {
 unsigned int order = get_allocation_size(tot_size);
+struct page_info *pg;
 
 order = min(max_order, order);
 
@@ -463,17 +448,61 @@ bool __init allocate_bank_memory(struct domain *d, struct kernel_info *kinfo,
 continue;
 }
 
-res = guest_physmap_add_page(d, sgfn, page_to_mfn(pg), order);
-if ( res )
-{
-dprintk(XENLOG_ERR, "Failed map pages to DOMU: %d", res);
+if ( !cb(d, pg, order, extra) )
 return false;
-}
 
-sgfn = gfn_add(sgfn, 1UL << order);
 tot_size -= (1ULL << (PAGE_SHIFT + order));
 }
 
+return true;
+}
+
+static bool __init guest_map_pages(struct domain *d, struct page_info *pg,
+   unsigned int order, void *extra)
+{
+gfn_t *sgfn = (gfn_t *)extra;
+int res;
+
+BUG_ON(!sgfn);
+res = guest_physmap_add_page(d, *sgfn, page_to_mfn(pg), order);
+if ( res )
+{
+dprintk(XENLOG_ERR, "Failed map pages to DOMU: %d", res);
+return false;
+}
+
+*sgfn = gfn_add(*sgfn, 1UL << order);
+
+return true;
+}
+
+bool __init allocate_bank_memory(struct kernel_info *kinfo, gfn_t sgfn,
+ paddr_t tot_size)
+{
+struct membanks *mem = kernel_info_get_mem(kinfo);
+struct domain *d = kinfo->d;
+struct membank *bank;
+
+/*
+ * allocate_bank_memory can be 

[RFC for-4.20 v1 0/1] x86/hvm: Introduce Xen-wide ASID allocator

2024-05-24 Thread Vaishali Thakkar
Motivation:
---
This is part of the effort to enable AMD SEV technologies in Xen. For
AMD SEV support, we need a fixed ASID associated with all vcpus of the
same domain throughout the domain's lifetime. This is because for SEV/
SEV-{ES,SNP} VM, the ASID is the index which is associated with the
encryption key.

Currently, ASID generation and management is done per-pCPU in Xen, and
at each VMENTER the ASID associated with the domain's vCPUs is
changed. This implementation is incompatible with SEV technologies for
the above-mentioned reason. In a discussion with Andrew Cooper, it
came up that it would be nice to have fixed ASIDs not only for SEV VMs
but for all VMs, because that opens up the opportunity to use
instructions like TLBSYNC and INVLPGB (Section 5.5.3 in the AMD
architecture manual[0]) for broadcasting TLB invalidations.

Why is this RFC?

This is only tested on AMD SVM at the moment. There are a few points
that I would like to discuss and get a feedback on from the community
before further development and testing. I’ve also submitted a design
session for this RFC to discuss further at the Xen Summit.

Points of discussion:
-
1. I’m not sure how this should be handled for nested HVM. To start
with, at the moment all the values seem to be handled via struct
nestedvcpu. Do we want to keep it that way, or do we want something
like a nestedhvm_domain to hold domain-wide values such as the ASID?
I’ve not handled this as part of this RFC, as I would like to hear the
opinions and plans of those working on nested virtualization.

2. I’m initialising the Xen-wide ASIDs in setup.c at the moment, but
is that the right place to do it? I’m asking because I’ve been seeing
a strange bug with the code in this RFC: Dom0 keeps a fixed ASID
throughout its lifecycle, but if I start a domU with 2 or 4 vCPUs via
xl, sometimes only one vCPU comes up and the kernel prints
‘tsc: Unable to calibrate against PIT’ while booting.

Notes:
-
1. Currently the RFC doesn’t demonstrate the use of TLBSYNC and INVLPGB.
This can be added later if required; I’m not sure whether it should be
part of the same patch series or not.

2. This is a basic RFC to start the discussion on the above points, but
I further plan to add logic to reclaim ASIDs that are no longer in use,
and a check to pick an ASID from that stack before doing
hvm_asid_flush_all.

[0] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf



Vaishali Thakkar (1):
  x86/hvm: Introduce Xen-wide ASID Allocator

 xen/arch/x86/flushtlb.c|  1 -
 xen/arch/x86/hvm/asid.c| 61 ++
 xen/arch/x86/hvm/emulate.c |  3 --
 xen/arch/x86/hvm/hvm.c |  2 +-
 xen/arch/x86/hvm/nestedhvm.c   |  4 +-
 xen/arch/x86/hvm/svm/asid.c| 28 +++-
 xen/arch/x86/hvm/svm/nestedsvm.c   |  2 +-
 xen/arch/x86/hvm/svm/svm.c | 33 ++
 xen/arch/x86/hvm/svm/svm.h |  1 -
 xen/arch/x86/hvm/vmx/vmcs.c|  2 +-
 xen/arch/x86/hvm/vmx/vmx.c | 15 +++
 xen/arch/x86/hvm/vmx/vvmx.c|  6 +--
 xen/arch/x86/include/asm/hvm/asid.h| 19 
 xen/arch/x86/include/asm/hvm/domain.h  |  1 +
 xen/arch/x86/include/asm/hvm/hvm.h |  2 +-
 xen/arch/x86/include/asm/hvm/svm/svm.h |  1 +
 xen/arch/x86/include/asm/hvm/vcpu.h|  6 +--
 xen/arch/x86/include/asm/hvm/vmx/vmx.h |  3 +-
 xen/arch/x86/mm/hap/hap.c  |  4 +-
 xen/arch/x86/mm/p2m.c  |  6 +--
 xen/arch/x86/mm/paging.c   |  2 +-
 xen/arch/x86/setup.c   |  7 +++
 22 files changed, 108 insertions(+), 101 deletions(-)

--
2.45.0




[RFC for-4.20 v1 1/1] x86/hvm: Introduce Xen-wide ASID Allocator

2024-05-24 Thread Vaishali Thakkar
Currently ASID generation and management is done per-pCPU. This
scheme is incompatible with SEV technologies, as SEV VMs need to
have a fixed ASID associated with all vCPUs of the VM throughout
its lifetime.

This commit introduces a Xen-wide allocator which initializes
the ASIDs at the start of Xen and allows a fixed ASID to be kept
throughout the lifecycle of every domain. Having a fixed ASID
for non-SEV domains also presents the opportunity to further
make use of AMD instructions like TLBSYNC and INVLPGB for
broadcasting TLB invalidations.

Signed-off-by: Vaishali Thakkar 
---
 xen/arch/x86/flushtlb.c|  1 -
 xen/arch/x86/hvm/asid.c| 61 ++
 xen/arch/x86/hvm/emulate.c |  3 --
 xen/arch/x86/hvm/hvm.c |  2 +-
 xen/arch/x86/hvm/nestedhvm.c   |  4 +-
 xen/arch/x86/hvm/svm/asid.c| 28 +++-
 xen/arch/x86/hvm/svm/nestedsvm.c   |  2 +-
 xen/arch/x86/hvm/svm/svm.c | 33 ++
 xen/arch/x86/hvm/svm/svm.h |  1 -
 xen/arch/x86/hvm/vmx/vmcs.c|  2 +-
 xen/arch/x86/hvm/vmx/vmx.c | 15 +++
 xen/arch/x86/hvm/vmx/vvmx.c|  6 +--
 xen/arch/x86/include/asm/hvm/asid.h| 19 
 xen/arch/x86/include/asm/hvm/domain.h  |  1 +
 xen/arch/x86/include/asm/hvm/hvm.h |  2 +-
 xen/arch/x86/include/asm/hvm/svm/svm.h |  1 +
 xen/arch/x86/include/asm/hvm/vcpu.h|  6 +--
 xen/arch/x86/include/asm/hvm/vmx/vmx.h |  3 +-
 xen/arch/x86/mm/hap/hap.c  |  4 +-
 xen/arch/x86/mm/p2m.c  |  6 +--
 xen/arch/x86/mm/paging.c   |  2 +-
 xen/arch/x86/setup.c   |  7 +++
 22 files changed, 108 insertions(+), 101 deletions(-)

diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 18748b2bc8..69d72944d7 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -124,7 +124,6 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
 
 if ( tlb_clk_enabled )
 t = pre_flush();
-hvm_flush_guest_tlbs();
 
 old_cr4 = read_cr4();
 ASSERT(!(old_cr4 & X86_CR4_PCIDE) || !(old_cr4 & X86_CR4_PGE));
diff --git a/xen/arch/x86/hvm/asid.c b/xen/arch/x86/hvm/asid.c
index 8d27b7dba1..f343bfcdb9 100644
--- a/xen/arch/x86/hvm/asid.c
+++ b/xen/arch/x86/hvm/asid.c
@@ -27,8 +27,8 @@ boolean_param("asid", opt_asid_enabled);
  * the TLB.
  *
  * Sketch of the Implementation:
- *
- * ASIDs are a CPU-local resource.  As preemption of ASIDs is not possible,
+ * TODO(vaishali): Update this comment
+ * ASIDs are Xen-wide resource.  As preemption of ASIDs is not possible,
  * ASIDs are assigned in a round-robin scheme.  To minimize the overhead of
  * ASID invalidation, at the time of a TLB flush,  ASIDs are tagged with a
  * 64-bit generation.  Only on a generation overflow the code needs to
@@ -38,20 +38,21 @@ boolean_param("asid", opt_asid_enabled);
  * ASID useage to retain correctness.
  */
 
-/* Per-CPU ASID management. */
+/* Xen-wide ASID management */
 struct hvm_asid_data {
-   uint64_t core_asid_generation;
+   uint64_t asid_generation;
uint32_t next_asid;
uint32_t max_asid;
+   uint32_t min_asid;
bool disabled;
 };
 
-static DEFINE_PER_CPU(struct hvm_asid_data, hvm_asid_data);
+static struct hvm_asid_data asid_data;
 
 void hvm_asid_init(int nasids)
 {
 static int8_t g_disabled = -1;
-struct hvm_asid_data *data = &this_cpu(hvm_asid_data);
+struct hvm_asid_data *data = &asid_data;
 
 data->max_asid = nasids - 1;
 data->disabled = !opt_asid_enabled || (nasids <= 1);
@@ -64,67 +65,73 @@ void hvm_asid_init(int nasids)
 }
 
 /* Zero indicates 'invalid generation', so we start the count at one. */
-data->core_asid_generation = 1;
+data->asid_generation = 1;
 
 /* Zero indicates 'ASIDs disabled', so we start the count at one. */
 data->next_asid = 1;
 }
 
-void hvm_asid_flush_vcpu_asid(struct hvm_vcpu_asid *asid)
+void hvm_asid_flush_domain_asid(struct hvm_domain_asid *asid)
 {
 write_atomic(&asid->generation, 0);
 }
 
-void hvm_asid_flush_vcpu(struct vcpu *v)
+void hvm_asid_flush_domain(struct domain *d)
 {
-hvm_asid_flush_vcpu_asid(&v->arch.hvm.n1asid);
-hvm_asid_flush_vcpu_asid(&vcpu_nestedhvm(v).nv_n2asid);
+hvm_asid_flush_domain_asid(&d->arch.hvm.n1asid);
+//hvm_asid_flush_domain_asid(&vcpu_nestedhvm(v).nv_n2asid);
 }
 
-void hvm_asid_flush_core(void)
+void hvm_asid_flush_all(void)
 {
-struct hvm_asid_data *data = &this_cpu(hvm_asid_data);
+struct hvm_asid_data *data = &asid_data;
 
-if ( data->disabled )
+if ( data->disabled)
 return;
 
-if ( likely(++data->core_asid_generation != 0) )
+if ( likely(++data->asid_generation != 0) )
 return;
 
 /*
- * ASID generations are 64 bit.  Overflow of generations never happens.
- * For safety, we simply disable ASIDs, so correctness is established; it
- * only runs a bit slower.
- */
+* ASID generations are 64 bit.  Overflow of 

Re: [PATCH v2 3/8] x86/vlapic: Move lapic_load_hidden migration checks to the check hook

2024-05-24 Thread Alejandro Vallejo
On 23/05/2024 15:50, Roger Pau Monné wrote:
> On Wed, May 08, 2024 at 01:39:22PM +0100, Alejandro Vallejo wrote:
>> While at it, add a check for the reserved field in the hidden save area.
>>
>> Signed-off-by: Alejandro Vallejo 
>> ---
>> v2:
>>   * New patch. Addresses the missing check for rsvd_zero in v1.
> 
> Oh, it would be better if this was done at the time when rsvd_zero is
> introduced.  I think this should be moved ahead of the series, so that
> the patch that introduces rsvd_zero can add the check in
> lapic_check_hidden().

I'll give that a whirl.

> 
>> ---
>>  xen/arch/x86/hvm/vlapic.c | 41 ---
>>  1 file changed, 30 insertions(+), 11 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>> index 8a24419c..2f06bff1b2cc 100644
>> --- a/xen/arch/x86/hvm/vlapic.c
>> +++ b/xen/arch/x86/hvm/vlapic.c
>> @@ -1573,35 +1573,54 @@ static void lapic_load_fixup(struct vlapic *vlapic)
>> v, vlapic->loaded.id, vlapic->loaded.ldr, good_ldr);
>>  }
>>  
>> -static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
>> +static int cf_check lapic_check_hidden(const struct domain *d,
>> +   hvm_domain_context_t *h)
>>  {
>>  unsigned int vcpuid = hvm_load_instance(h);
>> -struct vcpu *v;
>> -struct vlapic *s;
>> +struct hvm_hw_lapic s;
>>  
>>  if ( !has_vlapic(d) )
>>  return -ENODEV;
>>  
>>  /* Which vlapic to load? */
>> -if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
>> +if ( vcpuid >= d->max_vcpus || d->vcpu[vcpuid] == NULL )
>>  {
>>  dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n",
>>  d->domain_id, vcpuid);
>>  return -EINVAL;
>>  }
>> -s = vcpu_vlapic(v);
>>  
>> -if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
>> +if ( hvm_load_entry_zeroextend(LAPIC, h, &s) )
> 
> Can't you use hvm_get_entry() to perform the sanity checks:
> 
> const struct hvm_hw_lapic *s = hvm_get_entry(LAPIC, h);
> 
> Thanks, Roger.

I don't think I can. Because the last field (rsvd_zero) might or might
not be there, so it needs to be zero-extended. Unless I misunderstood
what hvm_get_entry() is meant to do. It seems to check for exact sizes.

Cheers,
Alejandro



Re: [PATCH v2 1/8] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area

2024-05-24 Thread Roger Pau Monné
On Fri, May 24, 2024 at 11:58:44AM +0100, Alejandro Vallejo wrote:
> On 23/05/2024 15:32, Roger Pau Monné wrote:
> >>  case 0xb:
> >> -/*
> >> - * In principle, this leaf is Intel-only.  In practice, it is 
> >> tightly
> >> - * coupled with x2apic, and we offer an x2apic-capable APIC 
> >> emulation
> >> - * to guests on AMD hardware as well.
> >> - *
> >> - * TODO: Rework topology logic.
> >> - */
> >> -if ( p->basic.x2apic )
> >> +/* Don't expose topology information to PV guests */
> > 
> > Not sure whether we want to keep part of the comment about exposing
> > x2APIC to guests even when x2APIC is not present in the host.  I think
> > this code has changed and the comment is kind of stale now.
> 
> The comment is definitely stale. Nowadays x2APIC is fully supported by
> AMD, as is leaf 0xb. The fact we emulate the x2APIC seems hardly
> relevant in a CPUID leaf about topology. I could keep a note showing...
> 
> /* Exposed alongside x2apic, as it's tightly coupled with it */
> 
> ... although that's directly implied by the conditional.

Yeah, I haven't gone through the history of this file, but I bet at
some point before the introduction of CPUID policies we leaked (part
of) the host CPUID contents in here.

It's also no longer true that the leaf is Intel only.

I fine either adding your newly proposed comment, or leaving it as-is.

> >> +}
> >> +
> >>  int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
> >>  {
> >>  const struct cpu_policy *cp = v->domain->arch.cpu_policy;
> >> @@ -1449,7 +1465,7 @@ void vlapic_reset(struct vlapic *vlapic)
> >>  if ( v->vcpu_id == 0 )
> >>  vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
> >>  
> >> -vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
> >> +vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
> >>  vlapic_do_init(vlapic);
> >>  }
> >>  
> >> @@ -1514,6 +1530,16 @@ static void lapic_load_fixup(struct vlapic *vlapic)
> >>  const struct vcpu *v = vlapic_vcpu(vlapic);
> >>  uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
> >>  
> >> +/*
> >> + * Loading record without hw.x2apic_id in the save stream, calculate 
> >> using
> >> + * the traditional "vcpu_id * 2" relation. There's an implicit 
> >> assumption
> >> + * that vCPU0 always has x2APIC0, which is true for the old relation, 
> >> and
> >> + * still holds under the new x2APIC generation algorithm. While that 
> >> case
> >> + * goes through the conditional it's benign because it still maps to 
> >> zero.
> >> + */
> >> +if ( !vlapic->hw.x2apic_id )
> >> +vlapic->hw.x2apic_id = v->vcpu_id * 2;
> >> +
> >>  /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct 
> >> */
> >>  if ( !vlapic_x2apic_mode(vlapic) ||
> >>   (vlapic->loaded.ldr == good_ldr) )
> >> diff --git a/xen/arch/x86/include/asm/hvm/hvm.h 
> >> b/xen/arch/x86/include/asm/hvm/hvm.h
> >> index 0c9e6f15645d..e1f0585d75a9 100644
> >> --- a/xen/arch/x86/include/asm/hvm/hvm.h
> >> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
> >> @@ -448,6 +448,7 @@ static inline void hvm_update_guest_efer(struct vcpu 
> >> *v)
> >>  static inline void hvm_cpuid_policy_changed(struct vcpu *v)
> >>  {
> >>  alternative_vcall(hvm_funcs.cpuid_policy_changed, v);
> >> +vlapic_cpu_policy_changed(v);
> > 
> > Note sure whether this call would better be placed in
> > cpu_policy_updated() inside the is_hvm_vcpu() conditional branch.
> > 
> > hvm_cpuid_policy_changed()  are just wrappers around the hvm_funcs
> > hooks, pulling vlapic functions in there is likely to complicate the
> > header dependencies in the long term.
> > 
> 
> That's how it was in v1 and I moved it in v2 answering one of Jan's
> feedback points.
> 
> I don't mind either way.

Oh (goes and reads Jan's reply to v1) I see.  Let's leave it as-is
then.

> 
> >>  }
> >>  
> >>  static inline void hvm_set_tsc_offset(struct vcpu *v, uint64_t offset,
> >> diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h 
> >> b/xen/arch/x86/include/asm/hvm/vlapic.h
> >> index 88ef94524339..e8d41313abd3 100644
> >> --- a/xen/arch/x86/include/asm/hvm/vlapic.h
> >> +++ b/xen/arch/x86/include/asm/hvm/vlapic.h
> >> @@ -44,6 +44,7 @@
> >>  #define vlapic_xapic_mode(vlapic)   \
> >>  (!vlapic_hw_disabled(vlapic) && \
> >>   !((vlapic)->hw.apic_base_msr & APIC_BASE_EXTD))
> >> +#define vlapic_x2apic_id(vlapic) ((vlapic)->hw.x2apic_id)
> >>  
> >>  /*
> >>   * Generic APIC bitmap vector update & search routines.
> >> @@ -107,6 +108,7 @@ int vlapic_ack_pending_irq(struct vcpu *v, int vector, 
> >> bool force_ack);
> >>  
> >>  int  vlapic_init(struct vcpu *v);
> >>  void vlapic_destroy(struct vcpu *v);
> >> +void vlapic_cpu_policy_changed(struct vcpu *v);
> >>  
> >>  void vlapic_reset(struct vlapic *vlapic);
> >>  
> >> diff --git a/xen/include/public/arch-x86/hvm/save.h 

[PATCH v11 6/9] xen/riscv: add minimal stuff to mm.h to build full Xen

2024-05-24 Thread Oleksii Kurochko
Signed-off-by: Oleksii Kurochko 
Acked-by: Jan Beulich 
---
Changes in V8-V11:
 - Nothing changed only rebase.
---
Changes in V7:
 - update argument type of maddr_to_virt() function: unsigned long -> paddr_t
 - rename argument of PFN_ORDER(): pfn -> pg.
 - add Acked-by: Jan Beulich 
---
Changes in V6:
 - drop __virt_to_maddr() ( transform to macro ) and __maddr_to_virt ( rename to maddr_to_virt ).
 - parenthesize va in definition of vmap_to_mfn().
 - Code style fixes.
---
Changes in V5:
 - update the comment around "struct domain *domain;" : zero -> NULL
 - fix indentation for unsigned long val;
 - put page_to_virt() and virt_to_page() close to each other.
 - drop unnecessary leading underscore
 - drop a space before the comment: /* Count of uses of this frame as its current type. */
 - drop comment about a page 'not as a shadow'. it is not necessary for RISC-V
---
Changes in V4:
 - update an argument name of PFN_ORDERN macros.
 - drop pad at the end of 'struct page_info'.
 - Change message -> subject in "Changes in V3"
 - delete duplicated macros from riscv/mm.h
 - fix indentation in struct page_info
 - align comment for PGC_ macros
 - update definitions of domain_set_alloc_bitsize() and domain_clamp_alloc_bitsize()
 - drop unnecessary comments.
 - s/BUG/BUG_ON("...")
 - define __virt_to_maddr, __maddr_to_virt as stubs
 - add inclusion of xen/mm-frame.h for mfn_x and others
 - include "xen/mm.h" instead of "asm/mm.h" to fix compilation issues:
 In file included from arch/riscv/setup.c:7:
./arch/riscv/include/asm/mm.h:60:28: error: field 'list' has incomplete 
type
   60 | struct page_list_entry list;
  |^~~~
./arch/riscv/include/asm/mm.h:81:43: error: 'MAX_ORDER' undeclared here 
(not in a function)
   81 | unsigned long first_dirty:MAX_ORDER + 1;
  |   ^
./arch/riscv/include/asm/mm.h:81:31: error: bit-field 'first_dirty' 
width not an integer constant
   81 | unsigned long first_dirty:MAX_ORDER + 1;
 - Define __virt_to_mfn() and __mfn_to_virt() using maddr_to_mfn() and mfn_to_maddr().
---
Changes in V3:
 - update the commit title
 - introduce DIRECTMAP_VIRT_START.
 - drop changes related to pfn_to_paddr() and paddr_to_pfn() as they were removed in
   [PATCH v2 32/39] xen/riscv: add minimal stuff to asm/page.h to build full Xen
 - code style fixes.
 - drop get_page_nr  and put_page_nr as they don't need for time being
 - drop CONFIG_STATIC_MEMORY related things
 - code style fixes
---
Changes in V2:
 - define stub for arch_get_dma_bitsize(void)
---
 xen/arch/riscv/include/asm/mm.h | 240 
 xen/arch/riscv/mm.c |   2 +-
 xen/arch/riscv/setup.c  |   2 +-
 3 files changed, 242 insertions(+), 2 deletions(-)

diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index 07c7a0abba..cc4a07a71c 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -3,11 +3,246 @@
 #ifndef _ASM_RISCV_MM_H
 #define _ASM_RISCV_MM_H
 
+#include 
+#include 
+#include 
+#include 
+#include 
+
 #include 
 
 #define pfn_to_paddr(pfn) ((paddr_t)(pfn) << PAGE_SHIFT)
 #define paddr_to_pfn(pa)  ((unsigned long)((pa) >> PAGE_SHIFT))
 
+#define paddr_to_pdx(pa)mfn_to_pdx(maddr_to_mfn(pa))
+#define gfn_to_gaddr(gfn)   pfn_to_paddr(gfn_x(gfn))
+#define gaddr_to_gfn(ga)_gfn(paddr_to_pfn(ga))
+#define mfn_to_maddr(mfn)   pfn_to_paddr(mfn_x(mfn))
+#define maddr_to_mfn(ma)_mfn(paddr_to_pfn(ma))
+#define vmap_to_mfn(va) maddr_to_mfn(virt_to_maddr((vaddr_t)(va)))
+#define vmap_to_page(va)mfn_to_page(vmap_to_mfn(va))
+
+static inline void *maddr_to_virt(paddr_t ma)
+{
+BUG_ON("unimplemented");
+return NULL;
+}
+
+#define virt_to_maddr(va) ({ BUG_ON("unimplemented"); 0; })
+
+/* Convert between Xen-heap virtual addresses and machine frame numbers. */
+#define __virt_to_mfn(va)  mfn_x(maddr_to_mfn(virt_to_maddr(va)))
+#define __mfn_to_virt(mfn) maddr_to_virt(mfn_to_maddr(_mfn(mfn)))
+
+/*
+ * We define non-underscored wrappers for above conversion functions.
+ * These are overriden in various source files while underscored version
+ * remain intact.
+ */
+#define virt_to_mfn(va) __virt_to_mfn(va)
+#define mfn_to_virt(mfn)__mfn_to_virt(mfn)
+
+struct page_info
+{
+/* Each frame can be threaded onto a doubly-linked list. */
+struct page_list_entry list;
+
+/* Reference count and various PGC_xxx flags and fields. */
+unsigned long count_info;
+
+/* Context-dependent fields follow... */
+union {
+/* Page is in use: ((count_info & PGC_count_mask) != 0). */
+struct {
+/* Type reference count and various PGT_xxx flags and fields. */
+unsigned long type_info;
+} inuse;
+
+/* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
+union {
+   

[PATCH v11 7/9] xen/riscv: add minimal amount of stubs to build full Xen

2024-05-24 Thread Oleksii Kurochko
Signed-off-by: Oleksii Kurochko 
Acked-by: Jan Beulich 
---
Changes in V7-V11:
 - Only rebase was done.
---
Changes in V6:
 - update the commit in stubs.c around /* ... common/irq.c ... */
 - add Acked-by: Jan Beulich 
---
Changes in V5:
 - drop unrelated changes
 - assert_failed("unimplmented...") change to BUG_ON()
---
Changes in V4:
  - added new stubs which are necessary for compilation after rebase: __cpu_up(), __cpu_disable(), __cpu_die() from smpboot.c
  - back changes related to printk() in early_printk() as they should be removed in the next patch to avoid a compilation error.
  - update definition of cpu_khz: __read_mostly -> __ro_after_init.
  - drop vm_event_reset_vmtrace(). It is defined in asm-generic/vm_event.h.
  - move vm_event_*() functions from stubs.c to riscv/vm_event.c.
  - s/BUG/BUG_ON("unimplemented") in stubs.c
  - back irq_actor_none() and irq_actor_none() as common/irq.c isn't compiled at this moment, so these functions are needed to avoid a compilation error.
  - defined max_page to avoid a compilation error; it will be removed as soon as common/page_alloc.c is compiled.
---
Changes in V3:
 - code style fixes.
 - update attribute for frametable_base_pdx and frametable_virt_end to __ro_after_init instead of __read_mostly.
 - use BUG() instead of assert_failed/WARN for newly introduced stubs.
 - drop "#include " in stubs.c and use forward declaration instead.
 - drop ack_node() and end_node() as they aren't used now.
---
Changes in V2:
 - define udelay stub
 - remove 'select HAS_PDX' from RISC-V Kconfig because of
   
https://lore.kernel.org/xen-devel/20231006144405.1078260-1-andrew.coop...@citrix.com/
---
 xen/arch/riscv/Makefile |   1 +
 xen/arch/riscv/mm.c |  50 +
 xen/arch/riscv/setup.c  |   8 +
 xen/arch/riscv/stubs.c  | 439 
 xen/arch/riscv/traps.c  |  25 +++
 5 files changed, 523 insertions(+)
 create mode 100644 xen/arch/riscv/stubs.c

diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 1ed1a8369b..60afbc0ad9 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -4,6 +4,7 @@ obj-y += mm.o
 obj-$(CONFIG_RISCV_64) += riscv64/
 obj-y += sbi.o
 obj-y += setup.o
+obj-y += stubs.o
 obj-y += traps.o
 obj-y += vm_event.o
 
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index fe3a43be20..2c3fb7d72e 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 
+#include 
 #include 
 #include 
 #include 
@@ -14,6 +15,9 @@
 #include 
 #include 
 
+unsigned long __ro_after_init frametable_base_pdx;
+unsigned long __ro_after_init frametable_virt_end;
+
 struct mmu_desc {
 unsigned int num_levels;
 unsigned int pgtbl_count;
@@ -294,3 +298,49 @@ unsigned long __init calc_phys_offset(void)
 phys_offset = load_start - XEN_VIRT_START;
 return phys_offset;
 }
+
+void put_page(struct page_info *page)
+{
+BUG_ON("unimplemented");
+}
+
+unsigned long get_upper_mfn_bound(void)
+{
+/* No memory hotplug yet, so current memory limit is the final one. */
+return max_page - 1;
+}
+
+void arch_dump_shared_mem_info(void)
+{
+BUG_ON("unimplemented");
+}
+
+int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
+{
+BUG_ON("unimplemented");
+return -1;
+}
+
+int xenmem_add_to_physmap_one(struct domain *d, unsigned int space,
+  union add_to_physmap_extra extra,
+  unsigned long idx, gfn_t gfn)
+{
+BUG_ON("unimplemented");
+
+return 0;
+}
+
+int destroy_xen_mappings(unsigned long s, unsigned long e)
+{
+BUG_ON("unimplemented");
+return -1;
+}
+
+int map_pages_to_xen(unsigned long virt,
+ mfn_t mfn,
+ unsigned long nr_mfns,
+ unsigned int flags)
+{
+BUG_ON("unimplemented");
+return -1;
+}
diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index 98a94c4c48..8bb5bdb2ae 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -1,11 +1,19 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 
+#include 
 #include 
 #include 
 #include 
 
+#include 
+
 #include 
 
+void arch_get_xen_caps(xen_capabilities_info_t *info)
+{
+BUG_ON("unimplemented");
+}
+
 /* Xen stack for bringing up the first CPU. */
 unsigned char __initdata cpu0_boot_stack[STACK_SIZE]
 __aligned(STACK_SIZE);
diff --git a/xen/arch/riscv/stubs.c b/xen/arch/riscv/stubs.c
new file mode 100644
index 00..8285bcffef
--- /dev/null
+++ b/xen/arch/riscv/stubs.c
@@ -0,0 +1,439 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+/* smpboot.c */
+
+cpumask_t cpu_online_map;
+cpumask_t cpu_present_map;
+cpumask_t cpu_possible_map;
+
+/* ID of the PCPU we're running on */
+DEFINE_PER_CPU(unsigned int, cpu_id);
+/* XXX these seem awfully x86ish... */
+/* representing HT siblings of each logical CPU 

[PATCH v11 2/9] xen: introduce generic non-atomic test_*bit()

2024-05-24 Thread Oleksii Kurochko
The following generic functions were introduced:
* test_bit
* generic__test_and_set_bit
* generic__test_and_clear_bit
* generic__test_and_change_bit

These functions and macros can be useful for architectures
that don't have corresponding arch-specific instructions.

Also, the patch introduces the following generics which are
used by the functions mentioned above:
* BITOP_BITS_PER_WORD
* BITOP_MASK
* BITOP_WORD
* BITOP_TYPE

BITS_PER_BYTE was moved to xen/bitops.h as BITS_PER_BYTE is the same
across architectures.

The following approach was chosen for generic*() and arch*() bit
operation functions:
If the bit operation function that is going to be generic starts
with the prefix "__", then the corresponding generic/arch function
will also contain the "__" prefix. For example:
 * test_bit() will be defined using arch_test_bit() and
   generic_test_bit().
 * __test_and_set_bit() will be defined using
   arch__test_and_set_bit() and generic__test_and_set_bit().

Signed-off-by: Oleksii Kurochko 
---
 Reviewed-by: Jan Beulich jbeul...@suse.com? Jan gave his R-b for the previous
 version of the patch, but some changes were made since, so I wasn't sure
 whether I could use the R-b in this patch version.


 The current patch can be merged without waiting for Andrew's patch series as
 it was rebased on top of staging.
---
Changes in V11:
 - fix indentation in the generic_test_bit() function.
 - move the definition of BITS_PER_BYTE from /bitops.h to xen/bitops.h
 - drop the changes in arm64/livepatch.c.
 - update the comments on top of the functions generic__test_and_set_bit(),
   generic__test_and_clear_bit(), generic__test_and_change_bit() and
   generic_test_bit().
 - update the footer after the Signed-off section.
 - rebase the patch on top of the staging branch, so it can be merged once
   the necessary approvals are given.
 - update the commit message.
---
Changes in V10:
 - update the commit message (re-order paragraphs and add an explanation of
   the use of the "__" prefix in bit operation function names).
 - add parentheses around the whole expression of the bitop_bad_size() macro.
 - move the bitop_bad_size() macro above asm/bitops.h as it is not
   arch-specific anymore and there is no need to override it.
 - drop the check_bitop_size() macro and use "if ( bitop_bad_size(addr) )
   __bitop_bad_size();" implicitly where it is needed.
 - in  use 'int' as the first parameter for __test_and_*() and
   generic__test_and_*() to be consistent with how the mentioned functions
   were declared in the original per-arch functions.
 - add 'const' to the p variable in generic_test_bit().
 - move the definitions of BITOP_BITS_PER_WORD and bitop_uint_t to
   xen/bitops.h as we no longer allow arch overrides of these definitions.
---
Changes in V9:
  - move up xen/bitops.h in ppc/asm/page.h.
  - update the definition of arch_check_bitop_size and drop the
    corresponding macros from x86/asm/bitops.h.
  - drop parentheses in generic__test_and_set_bit() for the definition of
    the local variable p.
  - fix indentation inside #ifndef BITOP_TYPE...#endif.
  - update the commit message.
---
 Changes in V8:
  - drop __pure for functions which use volatile.
  - drop unnecessary () in generic__test_and_change_bit() for the addr cast.
  - update the prototypes of generic_test_bit() and test_bit(): they now
    return bool instead of int.
  - update generic_test_bit() to use BITOP_MASK().
  - deal with fls{l} changes: they should be in the patch which introduces
    the generic fls{l}.
  - add a footer after Signed-off with an explanation of the dependency on
    an uncommitted patch.
  - abstract bitop_size().
  - move the BITOP_TYPE define to .
---
 Changes in V7:
  - move everything to xen/bitops.h to follow the same approach for all generic
bit ops.
  - put together BITOP_BITS_PER_WORD and bitops_uint_t.
  - make BITOP_MASK more generic.
  - drop #ifdef ... #endif around BITOP_MASK, BITOP_WORD as they are generic
enough.
  - drop "_" for generic__{test_and_set_bit,...}().
  - drop " != 0" for functions which return bool.
  - add volatile during the cast for generic__{...}().
  - update the commit message.
  - update arch related code to follow the proposed generic approach.
---
 Changes in V6:
  - Nothing changed ( only rebase )
---
 Changes in V5:
   - new patch
---
 xen/arch/arm/include/asm/bitops.h |  69 ---
 xen/arch/ppc/include/asm/bitops.h |  54 -
 xen/arch/ppc/include/asm/page.h   |   2 +-
 xen/arch/ppc/mm-radix.c   |   2 +-
 xen/arch/x86/include/asm/bitops.h |  31 ++---
 xen/include/xen/bitops.h  | 185 ++
 6 files changed, 196 insertions(+), 147 deletions(-)

diff --git a/xen/arch/arm/include/asm/bitops.h 
b/xen/arch/arm/include/asm/bitops.h
index ab030b6cb0..af38fbffdd 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -22,11 +22,6 @@
 #define __set_bit(n,p)set_bit(n,p)
 #define __clear_bit(n,p)  clear_bit(n,p)
 
-#define BITOP_BITS_PER_WORD 32
-#define BITOP_MASK(nr)  (1UL << 

[PATCH v11 3/9] xen/bitops: implement fls{l}() in common logic

2024-05-24 Thread Oleksii Kurochko
To avoid the compilation error below, the places in common/page_alloc.c where
flsl() is used need to be updated, as flsl() now returns unsigned int:

./include/xen/kernel.h:18:21: error: comparison of distinct pointer types lacks 
a cast [-Werror]
   18 | (void) (&_x == &_y);\
  | ^~
common/page_alloc.c:1843:34: note: in expansion of macro 'min'
 1843 | unsigned int inc_order = min(MAX_ORDER, flsl(e - s) - 1);

generic_fls{l} was used instead of __builtin_clz{l}(x) because if x is 0,
the result is undefined.

The prototype of the per-architecture fls{l}() functions was changed to
return 'unsigned int' to align with the generic implementation of these
functions and avoid introducing signed/unsigned mismatches.

Signed-off-by: Oleksii Kurochko 
---
 The current patch can be merged without waiting for Andrew's patch series.
 
 Andrew C., could you please consider taking the following patch as part of
 your bit operation patch series, if your series reaches staging before mine:
   
https://gitlab.com/xen-project/people/olkur/xen/-/commit/24a346c7aa4f51ba34eacb7bfee2808e431daf00
 
 Thanks in advance.
---
Changes in V11:
 - drop an unnecessary case and fix code style for x86's arch_flsl().
 - rebase on top of staging, so it can be merged to staging once the
   necessary approvals are given.
 - move the fls-related changes to a separate patch on top of Andrew's patch
   series, as the C file with tests is introduced in his series. The patch
   is mentioned in the footer.
---
Changes in V10:
 - update the return type of arch_flsl() across architectures to
   'unsigned int' to be aligned with the return type of the generic flsl()
   in xen/bitops.h.
 - switch inline to always_inline for arch_flsl() across architectures to
   be in sync with other similar changes.
 - define arch_flsl as arch_fls, not just fls.
 - update the commit message (add information that the per-arch fls{l}()
   prototypes were changed).
---
Changes in V9:
 - update return type of fls and flsl() to unsigned int to be aligned with other
   bit ops.
 - update places where return value of fls() and flsl() is compared with int.
 - update the commit message.
---
Changes in V8:
 - do a proper rebase: move the definition of fls{l} back to the current
   patch.
 - drop the changes which removed ffz() on PPC; that should not be done in
   this patch.
 - add a message after Signed-off.
---
Changes in V7:
 - Code style fixes
---
Changes in V6:
 - new patch for the patch series.
---
 xen/arch/arm/include/asm/arm32/bitops.h |  2 +-
 xen/arch/arm/include/asm/arm64/bitops.h |  6 ++
 xen/arch/arm/include/asm/bitops.h   |  9 +++--
 xen/arch/ppc/include/asm/bitops.h   |  2 --
 xen/arch/x86/include/asm/bitops.h   | 12 +++-
 xen/common/page_alloc.c |  4 ++--
 xen/include/xen/bitops.h| 24 
 7 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm32/bitops.h 
b/xen/arch/arm/include/asm/arm32/bitops.h
index d0309d47c1..9ee96f568b 100644
--- a/xen/arch/arm/include/asm/arm32/bitops.h
+++ b/xen/arch/arm/include/asm/arm32/bitops.h
@@ -1,7 +1,7 @@
 #ifndef _ARM_ARM32_BITOPS_H
 #define _ARM_ARM32_BITOPS_H
 
-#define flsl fls
+#define arch_flsl arch_fls
 
 /*
  * Little endian assembly bitops.  nr = 0 -> byte 0 bit 0.
diff --git a/xen/arch/arm/include/asm/arm64/bitops.h 
b/xen/arch/arm/include/asm/arm64/bitops.h
index 906d84e5f2..d942077392 100644
--- a/xen/arch/arm/include/asm/arm64/bitops.h
+++ b/xen/arch/arm/include/asm/arm64/bitops.h
@@ -1,17 +1,15 @@
 #ifndef _ARM_ARM64_BITOPS_H
 #define _ARM_ARM64_BITOPS_H
 
-static inline int flsl(unsigned long x)
+static always_inline unsigned int arch_flsl(unsigned long x)
 {
 uint64_t ret;
 
-if (__builtin_constant_p(x))
-   return generic_flsl(x);
-
 asm("clz\t%0, %1" : "=r" (ret) : "r" (x));
 
 return BITS_PER_LONG - ret;
 }
+#define arch_flsl arch_flsl
 
 /* Based on linux/include/asm-generic/bitops/find.h */
 
diff --git a/xen/arch/arm/include/asm/bitops.h 
b/xen/arch/arm/include/asm/bitops.h
index af38fbffdd..52fd33a797 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -76,17 +76,14 @@ bool clear_mask16_timeout(uint16_t mask, volatile void *p,
  * the clz instruction for much better code efficiency.
  */
 
-static inline int fls(unsigned int x)
+static always_inline unsigned int arch_fls(unsigned int x)
 {
-int ret;
-
-if (__builtin_constant_p(x))
-   return generic_fls(x);
+unsigned int ret;
 
 asm("clz\t%"__OP32"0, %"__OP32"1" : "=r" (ret) : "r" (x));
 return 32 - ret;
 }
-
+#define arch_fls arch_fls
 
 #define ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
 #define ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); })
diff --git a/xen/arch/ppc/include/asm/bitops.h 
b/xen/arch/ppc/include/asm/bitops.h

[PATCH v11 0/9] Enable build of full Xen for RISC-V

2024-05-24 Thread Oleksii Kurochko
This patch series performs all of the additions necessary to drop the
build overrides for RISCV and enable the full Xen build. Except in cases
where compatible implementations already exist (e.g. atomic.h and
bitops.h), the newly added definitions are simple.

The patch series is based on the following patch series:
- [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs() [1]
- [PATCH] move __read_mostly to xen/cache.h  [2]

Right now, the patch series doesn't have a direct dependency on [2] and it
provides __read_mostly in the patch:
[PATCH v3 26/34] xen/riscv: add definition of __read_mostly
However, it will be dropped as soon as [2] is merged, or at least when the
final version of patch [2] is provided.
As an option, the arch-specific patch could be merged first, with [2] then
rebased and its arch-specific changes dropped.

[1] 
https://lore.kernel.org/xen-devel/20240313172716.2325427-1-andrew.coop...@citrix.com/T/#t
[2] 
https://lore.kernel.org/xen-devel/f25eb5c9-7c14-6e23-8535-2c66772b3...@suse.com/

---
Changes in V11:
 - Patches were merged to staging:
   - [PATCH v10 05/14] xen/riscv: introduce cmpxchg.h
   - [PATCH v10 06/14] xen/riscv: introduce atomic.h
   - [PATCH v10 07/14] xen/riscv: introduce monitor.h
   - [PATCH v10 09/14] xen/riscv: add required things to current.h
   - [PATCH v10 11/14] xen/riscv: introduce vm_event_*() functions
 - Other changes are specific to individual patches; please look at the
   per-patch changelogs.
---
Changes in V10:
 - Patch was merged to staging:
   - [PATCH v9 04/15] xen/bitops: put __ffs() into linux compatible header
 - Other changes are specific to individual patches; please look at the
   per-patch changelogs.
---
Changes in V9:
 - Patches were merged to staging:
   - [PATCH v8 07/17] xen/riscv: introduce io.h
   - [PATCH v7 14/19] xen/riscv: add minimal stuff to page.h to build full
     Xen
 - Other changes are specific to individual patches; please look at the
   per-patch changelogs.
---
Changes in V8:
 - Patches were merged to staging:
   - [PATCH v7 01/19] automation: introduce fixed randconfig for RISC-V
   - [PATCH v7 03/19] xen/riscv: introduce extension support check by
     compiler
 - Other changes are specific to individual patches; please look at the
   per-patch changelogs.
 - Update the commit message:
   - drop the dependency on the STATIC_ASSERT_UNREACHABLE() implementation.
 - Add a suggestion to merge arch-specific changes related to __read_mostly.
---
Changes in V7:
 - Patch was merged to staging:
   [PATCH v6 15/20] xen/riscv: add minimal stuff to processor.h to build
   full Xen.
 - Other changes are specific to individual patches; please look at the
   per-patch changelogs.
---
Changes in V6:
 - Update the cover letter message: drop already merged dependencies and add
   a new one.
 - Patches were merged to staging:
   - [PATCH v5 02/23] xen/riscv: use some asm-generic headers (even though
     v4 was merged to the staging branch; I just hadn't applied the changes
     on top of the latest staging branch)
   - [PATCH v5 03/23] xen/riscv: introduce nospec.h
   - [PATCH v5 10/23] xen/riscv: introduce acquire, release and full
     barriers
 - Introduce new patches:
   - xen/riscv: introduce extension support check by compiler
   - xen/bitops: put __ffs() and ffz() into linux compatible header
   - xen/bitops: implement fls{l}() in common logic
 - The following patches were dropped:
   - drop some patches related to bitops operations as they were introduced
     in another patch series [...]
   - introduce new version for generic __ffs(), ffz() and fls{l}().
 - Merge the patch from the patch series "[PATCH v9 0/7] Introduce generic
   headers" into this patch series, as only one patch is left in the generic
   headers series and it is more about RISC-V.
 - Other changes are specific to individual patches; please look at the
   per-patch changelogs.
---
Changes in V5:
 - Update the cover letter as one of the dependencies was merged to staging.
 - Introduced asm-generic for atomic ops and separate patches for
   asm-generic bit ops.
 - Moved fence.h to a separate patch to deal with some patches' dependencies
   on fence.h.
 - Patches were dropped as they were merged to staging:
   * [PATCH v4 03/30] xen: add support in public/hvm/save.h for PPC and RISC-V
   * [PATCH v4 04/30] xen/riscv: introduce cpufeature.h
   * [PATCH v4 05/30] xen/riscv: introduce guest_atomics.h
   * [PATCH v4 06/30] xen: avoid generation of empty asm/iommu.h
   * [PATCH v4 08/30] xen/riscv: introduce setup.h
   * [PATCH v4 10/30] xen/riscv: introduce flushtlb.h
   * [PATCH v4 11/30] xen/riscv: introduce smp.h
   * [PATCH v4 15/30] xen/riscv: introduce irq.h
   * [PATCH v4 16/30] xen/riscv: introduce p2m.h
   * [PATCH v4 17/30] xen/riscv: introduce regs.h
   * [PATCH v4 18/30] xen/riscv: introduce time.h
   * [PATCH v4 19/30] xen/riscv: introduce event.h
   * [PATCH v4 22/30] xen/riscv: define an address of frame table
 - Other changes are specific to individual patches.

[PATCH v11 5/9] xen/riscv: add definition of __read_mostly

2024-05-24 Thread Oleksii Kurochko
The definition of __read_mostly should be removed in:
https://lore.kernel.org/xen-devel/f25eb5c9-7c14-6e23-8535-2c66772b3...@suse.com/

The patch introduces it in an arch-specific header so as not to
block enabling the full Xen build for RISC-V.

Signed-off-by: Oleksii Kurochko 
---
- [PATCH] move __read_mostly to xen/cache.h  [2]

Right now, the patch series doesn't have a direct dependency on [2] and it
provides __read_mostly in the patch:
[PATCH v3 26/34] xen/riscv: add definition of __read_mostly
However, it will be dropped as soon as [2] is merged or at least when the
final version of the patch [2] is provided.

Considering that there is still no final decision regarding patch [2], my
suggestion is to merge the RISC-V specific patch and just drop the
corresponding changes from patch [2].

[2] 
https://lore.kernel.org/xen-devel/f25eb5c9-7c14-6e23-8535-2c66772b3...@suse.com/
---
Changes in V9-V11:
 - Only rebase was done.
---
Change in V8:
 - update the footer after Signed-off.
---
Changes in V4-V7:
 - Nothing changed. Only rebase.
---
 xen/arch/riscv/include/asm/cache.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/riscv/include/asm/cache.h 
b/xen/arch/riscv/include/asm/cache.h
index 69573eb051..94bd94db53 100644
--- a/xen/arch/riscv/include/asm/cache.h
+++ b/xen/arch/riscv/include/asm/cache.h
@@ -3,4 +3,6 @@
 #ifndef _ASM_RISCV_CACHE_H
 #define _ASM_RISCV_CACHE_H
 
+#define __read_mostly __section(".data.read_mostly")
+
 #endif /* _ASM_RISCV_CACHE_H */
-- 
2.45.0




[PATCH v11 9/9] xen/README: add compiler and binutils versions for RISC-V64

2024-05-24 Thread Oleksii Kurochko
The versions in this patch don't represent a strict lower bound for GCC and
GNU Binutils; rather, these versions are specifically employed by the Xen
RISC-V container and are anticipated to undergo continuous testing. Older
GCC and GNU Binutils may work, but this is not guaranteed.

While it is feasible to use Clang, it's important to note that there is
currently no Xen RISC-V CI job in place to verify that the build works with
Clang.

Signed-off-by: Oleksii Kurochko 
--
Changes in V5-V11:
 - Nothing changed. Only rebase.
---
 Changes in V6:
  - update the message in README.
---
 Changes in V5:
  - update the commit message and README file with additional explanation
    about the GCC and GNU Binutils versions. Additionally, information about
    Clang was added.
---
 Changes in V4:
  - Update the versions of GCC (12.2) and GNU Binutils (2.39) to the ones in
    Xen's container for RISC-V.
---
 Changes in V3:
  - new patch
---
 README | 4 
 1 file changed, 4 insertions(+)

diff --git a/README b/README
index c8a108449e..30da5ff9c0 100644
--- a/README
+++ b/README
@@ -48,6 +48,10 @@ provided by your OS distributor:
   - For ARM 64-bit:
 - GCC 5.1 or later
 - GNU Binutils 2.24 or later
+  - For RISC-V 64-bit:
+- GCC 12.2 or later
+- GNU Binutils 2.39 or later
+  Older GCC and GNU Binutils would work, but this is not a guarantee.
 * POSIX compatible awk
 * Development install of zlib (e.g., zlib-dev)
 * Development install of Python 2.7 or later (e.g., python-dev)
-- 
2.45.0




[PATCH v11 1/9] xen/riscv: disable unnecessary configs

2024-05-24 Thread Oleksii Kurochko
Disables unnecessary configs for two cases:
1. By utilizing EXTRA_FIXED_RANDCONFIG for randconfig builds (GitLab CI jobs).
2. By using tiny64_defconfig for non-randconfig builds.

Only configs which lead to compilation issues were disabled.

Remove the lines disabling configs which don't affect compilation:
 -# CONFIG_SCHED_CREDIT is not set
 -# CONFIG_SCHED_RTDS is not set
 -# CONFIG_SCHED_NULL is not set
 -# CONFIG_SCHED_ARINC653 is not set
 -# CONFIG_TRACEBUFFER is not set
 -# CONFIG_HYPFS is not set
 -# CONFIG_SPECULATIVE_HARDEN_ARRAY is not set

To make the build happy with CONFIG_ARGO, asm/p2m.h was included in
asm/domain.h, as ARGO requires p2m_type_t ( p2m_ram_rw ) and the declaration
of check_get_page_from_gfn() from xen/p2m-common.h.

Also, xen/errno.h was included in asm/p2m.h, as after the latter was
included in asm/domain.h, compilation errors that EINVAL and EOPNOTSUPP
aren't declared started to occur.

CONFIG_XSM=n as it requires the introduction of:
* boot_module_find_by_kind()
* BOOTMOD_XSM
* struct bootmodule
* copy_from_paddr()
The mentioned things aren't introduced yet.

CPU_BOOT_TIME_CPUPOOLS requires an introduction of cpu_physical_id() and
acpi_disabled, so it is disabled for now.

PERF_COUNTERS requires asm/perf.h and asm/perfc-defn.h, so it is also
disabled for now, as RISC-V hasn't introduced these headers yet.

LIVEPATCH isn't ready for RISC-V either, and it can be overridden by
randconfig, so it is disabled for now to avoid compilation errors in
randconfig builds.

Signed-off-by: Oleksii Kurochko 
---
Changes in V10-V11:
 - Nothing changed. Only rebase.
---
Changes in V9:
 - update the commit message: add info about LIVEPATCH and PERF_COUNTERS.
---
Changes in V8:
 - disabled CPU_BOOT_TIME_CPUPOOLS as it requires the introduction of
   cpu_physical_id() and acpi_disabled.
 - leave XSM disabled; add an explanation in the commit message.
 - drop HYPFS as a patch was provided to resolve the compilation issue when
   this config is enabled for RISC-V.
 - include asm/p2m.h in asm/domain.h, and xen/errno.h in asm/p2m.h, to drop
   the ARGO config from tiny64_defconfig and build.yaml.
 - update the commit message.
---
Changes in V7:
 - Disable only configs which cause compilation issues.
 - Update the commit message.
---
Changes in V6:
 - Nothing changed. Only rebase.
---
Changes in V5:
 - Rebase and drop duplicated configs in EXTRA_FIXED_RANDCONFIG list
 - Update the commit message
---
Changes in V4:
 - Nothing changed. Only rebase
---
Changes in V3:
 - Remove EXTRA_FIXED_RANDCONFIG for non-randconfig jobs.
   For non-randconfig jobs, it is sufficient to disable configs by using the 
defconfig.
 - Remove double blank lines in build.yaml file before 
archlinux-current-gcc-riscv64-debug
---
Changes in V2:
 - update the commit message.
 - remove xen/arch/riscv/Kconfig changes.
---
 automation/gitlab-ci/build.yaml |  4 
 xen/arch/riscv/configs/tiny64_defconfig | 12 +---
 xen/arch/riscv/include/asm/domain.h |  2 ++
 xen/arch/riscv/include/asm/p2m.h|  2 ++
 4 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
index 49d6265ad5..ff5c9055d1 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
@@ -494,10 +494,14 @@ alpine-3.18-gcc-debug-arm64-earlyprintk:
 .riscv-fixed-randconfig:
   variables: 
 EXTRA_FIXED_RANDCONFIG: |
+  CONFIG_BOOT_TIME_CPUPOOLS=n
   CONFIG_COVERAGE=n
   CONFIG_EXPERT=y
   CONFIG_GRANT_TABLE=n
   CONFIG_MEM_ACCESS=n
+  CONFIG_PERF_COUNTERS=n
+  CONFIG_LIVEPATCH=n
+  CONFIG_XSM=n
 
 archlinux-current-gcc-riscv64:
   extends: .gcc-riscv64-cross-build
diff --git a/xen/arch/riscv/configs/tiny64_defconfig 
b/xen/arch/riscv/configs/tiny64_defconfig
index 09defe236b..fc7a04872f 100644
--- a/xen/arch/riscv/configs/tiny64_defconfig
+++ b/xen/arch/riscv/configs/tiny64_defconfig
@@ -1,12 +1,10 @@
-# CONFIG_SCHED_CREDIT is not set
-# CONFIG_SCHED_RTDS is not set
-# CONFIG_SCHED_NULL is not set
-# CONFIG_SCHED_ARINC653 is not set
-# CONFIG_TRACEBUFFER is not set
-# CONFIG_HYPFS is not set
+# CONFIG_BOOT_TIME_CPUPOOLS is not set
 # CONFIG_GRANT_TABLE is not set
-# CONFIG_SPECULATIVE_HARDEN_ARRAY is not set
 # CONFIG_MEM_ACCESS is not set
+# CONFIG_PERF_COUNTERS is not set
+# CONFIG_COVERAGE is not set
+# CONFIG_LIVEPATCH is not set
+# CONFIG_XSM is not set
 
 CONFIG_RISCV_64=y
 CONFIG_DEBUG=y
diff --git a/xen/arch/riscv/include/asm/domain.h 
b/xen/arch/riscv/include/asm/domain.h
index 027bfa8a93..16a9dd57aa 100644
--- a/xen/arch/riscv/include/asm/domain.h
+++ b/xen/arch/riscv/include/asm/domain.h
@@ -5,6 +5,8 @@
 #include 
 #include 
 
+#include 
+
 struct hvm_domain
 {
 uint64_t  params[HVM_NR_PARAMS];
diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h
index 387f372b5d..26860c0ae7 100644
--- a/xen/arch/riscv/include/asm/p2m.h
+++ b/xen/arch/riscv/include/asm/p2m.h
@@ -2,6 +2,8 @@
 #ifndef __ASM_RISCV_P2M_H__
 #define 

[PATCH v11 8/9] xen/riscv: enable full Xen build

2024-05-24 Thread Oleksii Kurochko
Signed-off-by: Oleksii Kurochko 
Reviewed-by: Jan Beulich 
---
 At least this patch cannot be merged before Andrew's patch series is
 merged, as ffs-related functions from that series are used:
 
https://lore.kernel.org/xen-devel/20240313172716.2325427-1-andrew.coop...@citrix.com/T/#t
---
Changes in V5-V11:
 - Nothing changed. Only rebase.
 - Add the footer after Signed-off section.
---
Changes in V4:
 - drop stubs for irq_actor_none() as common/irq.c is compiled now.
 - drop the definition of max_page in stubs.c as common/page_alloc.c is
   compiled now.
 - drop printk() related changes in riscv/early_printk.c as the common
   version will be used.
---
Changes in V3:
 - Reviewed-by: Jan Beulich 
 - unrelated change dropped in tiny64_defconfig
---
Changes in V2:
 - Nothing changed. Only rebase.
---

 xen/arch/riscv/Makefile   |  16 +++-
 xen/arch/riscv/arch.mk|   4 -
 xen/arch/riscv/early_printk.c | 168 --
 xen/arch/riscv/stubs.c|  24 -
 4 files changed, 15 insertions(+), 197 deletions(-)

diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 60afbc0ad9..81b77b13d6 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -12,10 +12,24 @@ $(TARGET): $(TARGET)-syms
$(OBJCOPY) -O binary -S $< $@
 
 $(TARGET)-syms: $(objtree)/prelink.o $(obj)/xen.lds
-   $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< $(build_id_linker) -o $@
+   $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< \
+   $(objtree)/common/symbols-dummy.o -o $(dot-target).0
+   $(NM) -pa --format=sysv $(dot-target).0 \
+   | $(objtree)/tools/symbols $(all_symbols) --sysv --sort \
+   > $(dot-target).0.S
+   $(MAKE) $(build)=$(@D) $(dot-target).0.o
+   $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< \
+   $(dot-target).0.o -o $(dot-target).1
+   $(NM) -pa --format=sysv $(dot-target).1 \
+   | $(objtree)/tools/symbols $(all_symbols) --sysv --sort \
+   > $(dot-target).1.S
+   $(MAKE) $(build)=$(@D) $(dot-target).1.o
+   $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< $(build_id_linker) \
+   $(dot-target).1.o -o $@
$(NM) -pa --format=sysv $@ \
| $(objtree)/tools/symbols --all-symbols --xensyms --sysv 
--sort \
> $@.map
+   rm -f $(@D)/.$(@F).[0-9]*
 
 $(obj)/xen.lds: $(src)/xen.lds.S FORCE
$(call if_changed_dep,cpp_lds_S)
diff --git a/xen/arch/riscv/arch.mk b/xen/arch/riscv/arch.mk
index 8c071aff65..17827c302c 100644
--- a/xen/arch/riscv/arch.mk
+++ b/xen/arch/riscv/arch.mk
@@ -38,7 +38,3 @@ extensions := $(subst $(space),,$(extensions))
 # -mcmodel=medlow would force Xen into the lower half.
 
 CFLAGS += $(riscv-generic-flags)$(extensions) -mstrict-align -mcmodel=medany
-
-# TODO: Drop override when more of the build is working
-override ALL_OBJS-y = arch/$(SRCARCH)/built_in.o
-override ALL_LIBS-y =
diff --git a/xen/arch/riscv/early_printk.c b/xen/arch/riscv/early_printk.c
index 60742a042d..610c814f54 100644
--- a/xen/arch/riscv/early_printk.c
+++ b/xen/arch/riscv/early_printk.c
@@ -40,171 +40,3 @@ void early_printk(const char *str)
 str++;
 }
 }
-
-/*
- * The following #if 1 ... #endif should be removed after printk
- * and related stuff are ready.
- */
-#if 1
-
-#include 
-#include 
-
-/**
- * strlen - Find the length of a string
- * @s: The string to be sized
- */
-size_t (strlen)(const char * s)
-{
-const char *sc;
-
-for (sc = s; *sc != '\0'; ++sc)
-/* nothing */;
-return sc - s;
-}
-
-/**
- * memcpy - Copy one area of memory to another
- * @dest: Where to copy to
- * @src: Where to copy from
- * @count: The size of the area.
- *
- * You should not use this function to access IO space, use memcpy_toio()
- * or memcpy_fromio() instead.
- */
-void *(memcpy)(void *dest, const void *src, size_t count)
-{
-char *tmp = (char *) dest, *s = (char *) src;
-
-while (count--)
-*tmp++ = *s++;
-
-return dest;
-}
-
-int vsnprintf(char* str, size_t size, const char* format, va_list args)
-{
-size_t i = 0; /* Current position in the output string */
-size_t written = 0; /* Total number of characters written */
-char* dest = str;
-
-while ( format[i] != '\0' && written < size - 1 )
-{
-if ( format[i] == '%' )
-{
-i++;
-
-if ( format[i] == '\0' )
-break;
-
-if ( format[i] == '%' )
-{
-if ( written < size - 1 )
-{
-dest[written] = '%';
-written++;
-}
-i++;
-continue;
-}
-
-/*
- * Handle format specifiers.
- * For simplicity, only %s and %d are implemented here.
- */
-
-if ( format[i] == 's' )
-{
-char* arg = va_arg(args, char*);
-size_t 

[PATCH v11 4/9] xen/riscv: introduce bitops.h

2024-05-24 Thread Oleksii Kurochko
Taken from Linux-6.4.0-rc1

Xen's bitops.h consists of several of Linux's headers:
* linux/arch/include/asm/bitops.h:
  * The following functions were removed as they aren't used in Xen:
* test_and_set_bit_lock
* clear_bit_unlock
* __clear_bit_unlock
  * The following functions were renamed to match how they are
used by common code:
* __test_and_set_bit
* __test_and_clear_bit
  * The declarations and implementations of the following functions
were updated to make the Xen build happy:
* clear_bit
* set_bit
* __test_and_clear_bit
* __test_and_set_bit

Signed-off-by: Oleksii Kurochko 
Acked-by: Jan Beulich 
---
Changes in V11:
 - Nothing changed. Only rebase was done.
---
Changes in V10:
 - update the error message BITS_PER_LONG -> BITOP_BITS_PER_WORD
---
Changes in V9:
 - add Acked-by: Jan Beulich 
 - drop the redefinition of bitop_uint_t in asm/types.h as some operations
   in Xen common code expect to work with 32-bit quantities.
 - s/BITS_PER_LONG/BITOP_BITS_PER_WORD in asm/bitops.h around the __AMO()
   macro.
---
Changes in V8:
 - define bitop_uint_t in  after the changes in the patch related to the
   introduction of "introduce generic non-atomic test_*bit()".
 - drop duplicated __set_bit() and __clear_bit().
 - drop the duplicated comment: /* Based on linux/arch/include/asm/bitops.h */.
 - update the types of res and mask in test_and_op_bit_ord():
   unsigned long -> bitop_uint_t.
 - drop 1 padding blank in test_and_op_bit_ord().
 - update the definitions of test_and_set_bit(), test_and_clear_bit() and
   test_and_change_bit(): change the return type to bool.
 - change the addr argument type of test_and_change_bit():
   unsigned long * -> void *.
 - move test_and_change_bit() closer to the other test_and_* functions.
 - Code style fixes: tabs -> spaces.
 - s/#undef __op_bit/#undef op_bit.
 - update the commit message: delete information about generic-non-atomic.h
   changes as it is now a separate patch.
---
Changes in V7:
 - Update the commit message.
 - Drop "__" for __op_bit and __op_bit_ord as they are atomic.
 - add a comment above __set_bit and __clear_bit about why they are defined
   as atomic.
 - align bitop_uint_t with __AMO().
 - make changes after the generic non-atomic test_*bit() helpers were
   changed.
 - s/__asm__ __volatile__/asm volatile
---
Changes in V6:
 - rebase clean ups were done: drop unused asm-generic includes
---
 Changes in V5:
   - new patch
---
 xen/arch/riscv/include/asm/bitops.h | 137 
 1 file changed, 137 insertions(+)
 create mode 100644 xen/arch/riscv/include/asm/bitops.h

diff --git a/xen/arch/riscv/include/asm/bitops.h 
b/xen/arch/riscv/include/asm/bitops.h
new file mode 100644
index 00..7f7af3fda1
--- /dev/null
+++ b/xen/arch/riscv/include/asm/bitops.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2012 Regents of the University of California */
+
+#ifndef _ASM_RISCV_BITOPS_H
+#define _ASM_RISCV_BITOPS_H
+
+#include 
+
+#if BITOP_BITS_PER_WORD == 64
+#define __AMO(op)   "amo" #op ".d"
+#elif BITOP_BITS_PER_WORD == 32
+#define __AMO(op)   "amo" #op ".w"
+#else
+#error "Unexpected BITOP_BITS_PER_WORD"
+#endif
+
+/* Based on linux/arch/include/asm/bitops.h */
+
+/*
+ * Non-atomic bit manipulation.
+ *
+ * Implemented using atomics to be interrupt safe. Could alternatively
+ * implement with local interrupt masking.
+ */
+#define __set_bit(n, p)  set_bit(n, p)
+#define __clear_bit(n, p)clear_bit(n, p)
+
+#define test_and_op_bit_ord(op, mod, nr, addr, ord) \
+({  \
+bitop_uint_t res, mask; \
+mask = BITOP_MASK(nr);  \
+asm volatile (  \
+__AMO(op) #ord " %0, %2, %1"\
+: "=r" (res), "+A" (addr[BITOP_WORD(nr)])   \
+: "r" (mod(mask))   \
+: "memory");\
+((res & mask) != 0);\
+})
+
+#define op_bit_ord(op, mod, nr, addr, ord)  \
+asm volatile (  \
+__AMO(op) #ord " zero, %1, %0"  \
+: "+A" (addr[BITOP_WORD(nr)])   \
+: "r" (mod(BITOP_MASK(nr))) \
+: "memory");
+
+#define test_and_op_bit(op, mod, nr, addr)\
+test_and_op_bit_ord(op, mod, nr, addr, .aqrl)
+#define op_bit(op, mod, nr, addr) \
+op_bit_ord(op, mod, nr, addr, )
+
+/* Bitmask modifiers */
+#define NOP(x)(x)
+#define NOT(x)(~(x))
+
+/**
+ * test_and_set_bit - Set a bit and return its old value
+ * @nr: Bit to set
+ * @addr: Address to count from
+ */
+static inline bool test_and_set_bit(int nr, volatile void *p)
+{
+volatile bitop_uint_t *addr = p;
+
+return test_and_op_bit(or, NOP, nr, addr);
+}
+
+/**
+ * test_and_clear_bit - Clear a bit and return its old value
+ * @nr: Bit to clear
+ * @addr: Address to count from

Re: [PATCH v3 1/2] tools/xg: Streamline cpu policy serialise/deserialise calls

2024-05-24 Thread Roger Pau Monné
On Fri, May 24, 2024 at 11:32:50AM +0100, Alejandro Vallejo wrote:
> On 23/05/2024 11:21, Roger Pau Monné wrote:
> > On Thu, May 23, 2024 at 10:41:29AM +0100, Alejandro Vallejo wrote:
> >> -int xc_cpu_policy_serialise(xc_interface *xch, const xc_cpu_policy_t *p,
> >> -xen_cpuid_leaf_t *leaves, uint32_t *nr_leaves,
> >> -xen_msr_entry_t *msrs, uint32_t *nr_msrs)
> >> +int xc_cpu_policy_serialise(xc_interface *xch, xc_cpu_policy_t *p)
> >>  {
> >> +unsigned int nr_leaves = ARRAY_SIZE(p->leaves);
> >> +unsigned int nr_msrs = ARRAY_SIZE(p->msrs);
> >>  int rc;
> >>  
> >> -if ( leaves )
> >> +rc = x86_cpuid_copy_to_buffer(&p->policy, p->leaves, &nr_leaves);
> >> +if ( rc )
> >>  {
> >> -rc = x86_cpuid_copy_to_buffer(&p->policy, leaves, nr_leaves);
> >> -if ( rc )
> >> -{
> >> -ERROR("Failed to serialize CPUID policy");
> >> -errno = -rc;
> >> -return -1;
> >> -}
> >> +ERROR("Failed to serialize CPUID policy");
> >> +errno = -rc;
> >> +return -1;
> >>  }
> >>  
> >> -if ( msrs )
> >> +p->nr_leaves = nr_leaves;
> > 
> > Nit: FWIW, I think you could avoid having to introduce local
> > nr_{leaves,msrs} variables and just use p->nr_{leaves,msrs}?  By
> > setting them to ARRAY_SIZE() at the top of the function and then
> > letting x86_{cpuid,msr}_copy_to_buffer() adjust as necessary.
> > 
> > Thanks, Roger.
> 
> The intent was to avoid mutating the policy object in the error cases
> during deserialise. Then I adjusted the serialise case to have symmetry.

It's currently unavoidable for the policy to be likely mutated even in
case of error, as x86_{cpuid,msr}_copy_to_buffer() are two separate
operations, and hence the first succeeding but the second failing will
already result in the policy being mutated on error.

> It's true the preservation is not meaningful in the serialise case
> because at that point the serialised form is already corrupted.
> 
> I don't mind either way. Seeing how I'm sending one final version with
> the comments of patch2 I'll just adjust as you proposed.

I'm fine either way (hence why I prefixed it with "nit:"), albeit I have a
preference for not introducing the local variables if they are not
needed.

Thanks, Roger.



Re: [PATCH v2 1/8] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area

2024-05-24 Thread Alejandro Vallejo
On 23/05/2024 15:32, Roger Pau Monné wrote:
>>  case 0xb:
>> -/*
>> - * In principle, this leaf is Intel-only.  In practice, it is 
>> tightly
>> - * coupled with x2apic, and we offer an x2apic-capable APIC 
>> emulation
>> - * to guests on AMD hardware as well.
>> - *
>> - * TODO: Rework topology logic.
>> - */
>> -if ( p->basic.x2apic )
>> +/* Don't expose topology information to PV guests */
> 
> Not sure whether we want to keep part of the comment about exposing
> x2APIC to guests even when x2APIC is not present in the host.  I think
> this code has changed and the comment is kind of stale now.

The comment is definitely stale. Nowadays x2APIC is fully supported by
AMD, as is leaf 0xb. The fact we emulate the x2APIC seems hardly
relevant in a CPUID leaf about topology. I could keep a note showing...

/* Exposed alongside x2apic, as it's tightly coupled with it */

... although that's directly implied by the conditional.

>> +void vlapic_cpu_policy_changed(struct vcpu *v)
>> +{
>> +struct vlapic *vlapic = vcpu_vlapic(v);
>> +const struct cpu_policy *cp = v->domain->arch.cpu_policy;
>> +
>> +/*
>> + * Don't override the initial x2APIC ID if we have migrated it or
>> + * if the domain doesn't have vLAPIC at all.
>> + */
>> +if ( !has_vlapic(v->domain) || vlapic->loaded.hw )
>> +return;
>> +
>> +vlapic->hw.x2apic_id = x86_x2apic_id_from_vcpu_id(cp, v->vcpu_id);
>> +vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
> 
> Nit: in case we decide to start APICs in x2APIC mode, might be good to
> take this into account here and use vlapic_x2apic_mode(vlapic) to
> select whether SET_xAPIC_ID() needs to be used or not:>
> vlapic_set_reg(vlapic, APIC_ID,
> vlapic_x2apic_mode(vlapic) ? vlapic->hw.x2apic_id
>  : SET_xAPIC_ID(vlapic->hw.x2apic_id));
> 
> Or similar.
> 

I like it. Sure.

>> +}
>> +
>>  int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
>>  {
>>  const struct cpu_policy *cp = v->domain->arch.cpu_policy;
>> @@ -1449,7 +1465,7 @@ void vlapic_reset(struct vlapic *vlapic)
>>  if ( v->vcpu_id == 0 )
>>  vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
>>  
>> -vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
>> +vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
>>  vlapic_do_init(vlapic);
>>  }
>>  
>> @@ -1514,6 +1530,16 @@ static void lapic_load_fixup(struct vlapic *vlapic)
>>  const struct vcpu *v = vlapic_vcpu(vlapic);
>>  uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
>>  
>> +/*
>> + * Loading record without hw.x2apic_id in the save stream, calculate 
>> using
>> + * the traditional "vcpu_id * 2" relation. There's an implicit 
>> assumption
>> + * that vCPU0 always has x2APIC0, which is true for the old relation, 
>> and
>> + * still holds under the new x2APIC generation algorithm. While that 
>> case
>> + * goes through the conditional it's benign because it still maps to 
>> zero.
>> + */
>> +if ( !vlapic->hw.x2apic_id )
>> +vlapic->hw.x2apic_id = v->vcpu_id * 2;
>> +
>>  /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
>>  if ( !vlapic_x2apic_mode(vlapic) ||
>>   (vlapic->loaded.ldr == good_ldr) )
>> diff --git a/xen/arch/x86/include/asm/hvm/hvm.h 
>> b/xen/arch/x86/include/asm/hvm/hvm.h
>> index 0c9e6f15645d..e1f0585d75a9 100644
>> --- a/xen/arch/x86/include/asm/hvm/hvm.h
>> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
>> @@ -448,6 +448,7 @@ static inline void hvm_update_guest_efer(struct vcpu *v)
>>  static inline void hvm_cpuid_policy_changed(struct vcpu *v)
>>  {
>>  alternative_vcall(hvm_funcs.cpuid_policy_changed, v);
>> +vlapic_cpu_policy_changed(v);
> 
> Note sure whether this call would better be placed in
> cpu_policy_updated() inside the is_hvm_vcpu() conditional branch.
> 
> hvm_cpuid_policy_changed()  are just wrappers around the hvm_funcs
> hooks, pulling vlapic functions in there is likely to complicate the
> header dependencies in the long term.
> 

That's how it was in v1 and I moved it in v2 answering one of Jan's
feedback points.

I don't mind either way.

>>  }
>>  
>>  static inline void hvm_set_tsc_offset(struct vcpu *v, uint64_t offset,
>> diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h 
>> b/xen/arch/x86/include/asm/hvm/vlapic.h
>> index 88ef94524339..e8d41313abd3 100644
>> --- a/xen/arch/x86/include/asm/hvm/vlapic.h
>> +++ b/xen/arch/x86/include/asm/hvm/vlapic.h
>> @@ -44,6 +44,7 @@
>>  #define vlapic_xapic_mode(vlapic)   \
>>  (!vlapic_hw_disabled(vlapic) && \
>>   !((vlapic)->hw.apic_base_msr & APIC_BASE_EXTD))
>> +#define vlapic_x2apic_id(vlapic) ((vlapic)->hw.x2apic_id)
>>  
>>  /*
>>   * Generic APIC bitmap vector update & search routines.
>> @@ -107,6 +108,7 @@ int 

[PATCH v7 6/8] xen: mapcache: Pass the ram_addr offset to xen_map_cache()

2024-05-24 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Pass the ram_addr offset to xen_map_cache.
This is in preparation for adding grant mappings that need
to compute the address within the RAMBlock.

No functional changes.

Signed-off-by: Edgar E. Iglesias 
Reviewed-by: David Hildenbrand 
Reviewed-by: Stefano Stabellini 
---
 hw/xen/xen-mapcache.c | 16 +++-
 include/sysemu/xen-mapcache.h |  2 ++
 system/physmem.c  |  9 +
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index ec95445696..a07c47b0b1 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -167,7 +167,8 @@ static void xen_remap_bucket(MapCache *mc,
  void *vaddr,
  hwaddr size,
  hwaddr address_index,
- bool dummy)
+ bool dummy,
+ ram_addr_t ram_offset)
 {
 uint8_t *vaddr_base;
 xen_pfn_t *pfns;
@@ -266,6 +267,7 @@ static void xen_remap_bucket(MapCache *mc,
 
 static uint8_t *xen_map_cache_unlocked(MapCache *mc,
hwaddr phys_addr, hwaddr size,
+   ram_addr_t ram_offset,
uint8_t lock, bool dma, bool is_write)
 {
 MapCacheEntry *entry, *pentry = NULL,
@@ -337,14 +339,16 @@ tryagain:
 if (!entry) {
 entry = g_new0(MapCacheEntry, 1);
 pentry->next = entry;
-xen_remap_bucket(mc, entry, NULL, cache_size, address_index, dummy);
+xen_remap_bucket(mc, entry, NULL, cache_size, address_index, dummy,
+ ram_offset);
 } else if (!entry->lock) {
 if (!entry->vaddr_base || entry->paddr_index != address_index ||
 entry->size != cache_size ||
 !test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
-xen_remap_bucket(mc, entry, NULL, cache_size, address_index, dummy);
+xen_remap_bucket(mc, entry, NULL, cache_size, address_index, dummy,
+ ram_offset);
 }
 }
 
@@ -391,13 +395,15 @@ tryagain:
 
 uint8_t *xen_map_cache(MemoryRegion *mr,
hwaddr phys_addr, hwaddr size,
+   ram_addr_t ram_addr_offset,
uint8_t lock, bool dma,
bool is_write)
 {
 uint8_t *p;
 
 mapcache_lock(mapcache);
-p = xen_map_cache_unlocked(mapcache, phys_addr, size, lock, dma, is_write);
+p = xen_map_cache_unlocked(mapcache, phys_addr, size, ram_addr_offset,
+   lock, dma, is_write);
 mapcache_unlock(mapcache);
 return p;
 }
@@ -632,7 +638,7 @@ static uint8_t *xen_replace_cache_entry_unlocked(MapCache 
*mc,
 trace_xen_replace_cache_entry_dummy(old_phys_addr, new_phys_addr);
 
 xen_remap_bucket(mc, entry, entry->vaddr_base,
- cache_size, address_index, false);
+ cache_size, address_index, false, old_phys_addr);
 if (!test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
diff --git a/include/sysemu/xen-mapcache.h b/include/sysemu/xen-mapcache.h
index 1ec9e66752..b5e3ea1bc0 100644
--- a/include/sysemu/xen-mapcache.h
+++ b/include/sysemu/xen-mapcache.h
@@ -19,6 +19,7 @@ typedef hwaddr (*phys_offset_to_gaddr_t)(hwaddr phys_offset,
 void xen_map_cache_init(phys_offset_to_gaddr_t f,
 void *opaque);
 uint8_t *xen_map_cache(MemoryRegion *mr, hwaddr phys_addr, hwaddr size,
+   ram_addr_t ram_addr_offset,
uint8_t lock, bool dma,
bool is_write);
 ram_addr_t xen_ram_addr_from_mapcache(void *ptr);
@@ -37,6 +38,7 @@ static inline void xen_map_cache_init(phys_offset_to_gaddr_t 
f,
 static inline uint8_t *xen_map_cache(MemoryRegion *mr,
  hwaddr phys_addr,
  hwaddr size,
+ ram_addr_t ram_addr_offset,
  uint8_t lock,
  bool dma,
  bool is_write)
diff --git a/system/physmem.c b/system/physmem.c
index b7847db1a2..33d09f7571 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2231,13 +2231,14 @@ static void *qemu_ram_ptr_length(RAMBlock *block, 
ram_addr_t addr,
  */
 if (xen_mr_is_memory(block->mr)) {
 return xen_map_cache(block->mr, block->offset + addr,
- len, lock, lock,
- is_write);
+ len, block->offset,
+ lock, lock, is_write);
 }
 
 

[PATCH v7 2/8] xen: mapcache: Unmap first entries in buckets

2024-05-24 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

When invalidating memory ranges, if we happen to hit the first
entry in a bucket we were never unmapping it. This was harmless
for foreign mappings but now that we're looking to reuse the
mapcache for transient grant mappings, we must unmap entries
when invalidated.

Signed-off-by: Edgar E. Iglesias 
Reviewed-by: Stefano Stabellini 
---
 hw/xen/xen-mapcache.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index bc860f4373..ec95445696 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -491,18 +491,23 @@ static void xen_invalidate_map_cache_entry_unlocked(MapCache *mc,
 return;
 }
 entry->lock--;
-if (entry->lock > 0 || pentry == NULL) {
+if (entry->lock > 0) {
 return;
 }
 
-pentry->next = entry->next;
 ram_block_notify_remove(entry->vaddr_base, entry->size, entry->size);
 if (munmap(entry->vaddr_base, entry->size) != 0) {
 perror("unmap fails");
 exit(-1);
 }
+
 g_free(entry->valid_mapping);
-g_free(entry);
+if (pentry) {
+pentry->next = entry->next;
+g_free(entry);
+} else {
+memset(entry, 0, sizeof *entry);
+}
 }
 
 typedef struct XenMapCacheData {
-- 
2.40.1




[PATCH v7 7/8] xen: mapcache: Add support for grant mappings

2024-05-24 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Add a second mapcache for grant mappings. The mapcache for
grants needs to work with XC_PAGE_SIZE granularity since
we can't map larger ranges than what has been granted to us.

Like with foreign mappings (xen_memory), machines using grants
are expected to initialize the xen_grants MR and map it
into their address-map accordingly.

CC: Manos Pitsidianakis 
Signed-off-by: Edgar E. Iglesias 
Reviewed-by: Stefano Stabellini 
---
 hw/xen/xen-hvm-common.c |  12 ++-
 hw/xen/xen-mapcache.c   | 165 +---
 include/hw/xen/xen-hvm-common.h |   3 +
 include/sysemu/xen.h|   7 ++
 4 files changed, 150 insertions(+), 37 deletions(-)

diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index a0a0252da0..b8ace1c368 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -10,12 +10,18 @@
 #include "hw/boards.h"
 #include "hw/xen/arch_hvm.h"
 
-MemoryRegion xen_memory;
+MemoryRegion xen_memory, xen_grants;
 
-/* Check for xen memory.  */
+/* Check for any kind of xen memory, foreign mappings or grants.  */
 bool xen_mr_is_memory(MemoryRegion *mr)
 {
-return mr == &xen_memory;
+return mr == &xen_memory || mr == &xen_grants;
+}
+
+/* Check specifically for grants.  */
+bool xen_mr_is_grants(MemoryRegion *mr)
+{
+return mr == &xen_grants;
 }
 
 void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index a07c47b0b1..5f23b0adbe 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -14,6 +14,7 @@
 
 #include 
 
+#include "hw/xen/xen-hvm-common.h"
 #include "hw/xen/xen_native.h"
 #include "qemu/bitmap.h"
 
@@ -21,6 +22,8 @@
 #include "sysemu/xen-mapcache.h"
 #include "trace.h"
 
+#include 
+#include 
 
 #if HOST_LONG_BITS == 32
 #  define MCACHE_MAX_SIZE (1UL<<31) /* 2GB Cap */
@@ -41,6 +44,7 @@ typedef struct MapCacheEntry {
 unsigned long *valid_mapping;
 uint32_t lock;
 #define XEN_MAPCACHE_ENTRY_DUMMY (1 << 0)
+#define XEN_MAPCACHE_ENTRY_GRANT (1 << 1)
 uint8_t flags;
 hwaddr size;
 struct MapCacheEntry *next;
@@ -71,6 +75,8 @@ typedef struct MapCache {
 } MapCache;
 
 static MapCache *mapcache;
+static MapCache *mapcache_grants;
+static xengnttab_handle *xen_region_gnttabdev;
 
 static inline void mapcache_lock(MapCache *mc)
 {
@@ -131,6 +137,12 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
*opaque)
 unsigned long max_mcache_size;
 unsigned int bucket_shift;
 
+xen_region_gnttabdev = xengnttab_open(NULL, 0);
+if (xen_region_gnttabdev == NULL) {
+error_report("mapcache: Failed to open gnttab device");
+exit(EXIT_FAILURE);
+}
+
 if (HOST_LONG_BITS == 32) {
 bucket_shift = 16;
 } else {
@@ -159,6 +171,15 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
*opaque)
 mapcache = xen_map_cache_init_single(f, opaque,
  bucket_shift,
  max_mcache_size);
+
+/*
+ * Grant mappings must use XC_PAGE_SIZE granularity since we can't
+ * map anything beyond the number of pages granted to us.
+ */
+mapcache_grants = xen_map_cache_init_single(f, opaque,
+XC_PAGE_SHIFT,
+max_mcache_size);
+
+setrlimit(RLIMIT_AS, &rlimit_as);
 }
 
@@ -168,17 +189,24 @@ static void xen_remap_bucket(MapCache *mc,
  hwaddr size,
  hwaddr address_index,
  bool dummy,
+ bool grant,
+ bool is_write,
  ram_addr_t ram_offset)
 {
 uint8_t *vaddr_base;
-xen_pfn_t *pfns;
-int *err;
+g_autofree uint32_t *refs = NULL;
+g_autofree xen_pfn_t *pfns = NULL;
+g_autofree int *err;
 unsigned int i;
 hwaddr nb_pfn = size >> XC_PAGE_SHIFT;
 
 trace_xen_remap_bucket(address_index);
 
-pfns = g_new0(xen_pfn_t, nb_pfn);
+if (grant) {
+refs = g_new0(uint32_t, nb_pfn);
+} else {
+pfns = g_new0(xen_pfn_t, nb_pfn);
+}
 err = g_new0(int, nb_pfn);
 
 if (entry->vaddr_base != NULL) {
@@ -207,21 +235,51 @@ static void xen_remap_bucket(MapCache *mc,
 g_free(entry->valid_mapping);
 entry->valid_mapping = NULL;
 
-for (i = 0; i < nb_pfn; i++) {
-pfns[i] = (address_index << (mc->bucket_shift - XC_PAGE_SHIFT)) + i;
+if (grant) {
+hwaddr grant_base = address_index - (ram_offset >> XC_PAGE_SHIFT);
+
+for (i = 0; i < nb_pfn; i++) {
+refs[i] = grant_base + i;
+}
+} else {
+for (i = 0; i < nb_pfn; i++) {
+pfns[i] = (address_index << (mc->bucket_shift - XC_PAGE_SHIFT)) + i;
+}
 }
 
-/*
- * If the caller has requested the mapping at a specific address use
- * MAP_FIXED to make sure 

[PATCH v7 3/8] xen: Add xen_mr_is_memory()

2024-05-24 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Add xen_mr_is_memory() to abstract away tests for the
xen_memory MR.

No functional changes.

Signed-off-by: Edgar E. Iglesias 
Reviewed-by: Stefano Stabellini 
Acked-by: David Hildenbrand 
---
 hw/xen/xen-hvm-common.c | 10 --
 include/sysemu/xen.h|  8 
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index 2d1b032121..a0a0252da0 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -12,6 +12,12 @@
 
 MemoryRegion xen_memory;
 
+/* Check for xen memory.  */
+bool xen_mr_is_memory(MemoryRegion *mr)
+{
+return mr == &xen_memory;
+}
+
 void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
Error **errp)
 {
@@ -28,7 +34,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, 
MemoryRegion *mr,
 return;
 }
 
-if (mr == &xen_memory) {
+if (xen_mr_is_memory(mr)) {
 return;
 }
 
@@ -55,7 +61,7 @@ static void xen_set_memory(struct MemoryListener *listener,
 {
 XenIOState *state = container_of(listener, XenIOState, memory_listener);
 
-if (section->mr == &xen_memory) {
+if (xen_mr_is_memory(section->mr)) {
 return;
 } else {
 if (add) {
diff --git a/include/sysemu/xen.h b/include/sysemu/xen.h
index 754ec2e6cb..dc72f83bcb 100644
--- a/include/sysemu/xen.h
+++ b/include/sysemu/xen.h
@@ -34,6 +34,8 @@ void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t 
length);
 void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size,
struct MemoryRegion *mr, Error **errp);
 
+bool xen_mr_is_memory(MemoryRegion *mr);
+
 #else /* !CONFIG_XEN_IS_POSSIBLE */
 
 #define xen_enabled() 0
@@ -47,6 +49,12 @@ static inline void xen_ram_alloc(ram_addr_t ram_addr, 
ram_addr_t size,
 g_assert_not_reached();
 }
 
+static inline bool xen_mr_is_memory(MemoryRegion *mr)
+{
+g_assert_not_reached();
+return false;
+}
+
 #endif /* CONFIG_XEN_IS_POSSIBLE */
 
 #endif
-- 
2.40.1




[PATCH v7 1/8] xen: mapcache: Make MCACHE_BUCKET_SHIFT runtime configurable

2024-05-24 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Make MCACHE_BUCKET_SHIFT runtime configurable per cache instance.

Signed-off-by: Edgar E. Iglesias 
Reviewed-by: Stefano Stabellini 
---
 hw/xen/xen-mapcache.c | 54 ++-
 1 file changed, 33 insertions(+), 21 deletions(-)

diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index fa6813b1ad..bc860f4373 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -23,13 +23,10 @@
 
 
 #if HOST_LONG_BITS == 32
-#  define MCACHE_BUCKET_SHIFT 16
 #  define MCACHE_MAX_SIZE (1UL<<31) /* 2GB Cap */
 #else
-#  define MCACHE_BUCKET_SHIFT 20
 #  define MCACHE_MAX_SIZE (1UL<<35) /* 32GB Cap */
 #endif
-#define MCACHE_BUCKET_SIZE (1UL << MCACHE_BUCKET_SHIFT)
 
 /* This is the size of the virtual address space reserve to QEMU that will not
  * be use by MapCache.
@@ -65,7 +62,8 @@ typedef struct MapCache {
 /* For most cases (>99.9%), the page address is the same. */
 MapCacheEntry *last_entry;
 unsigned long max_mcache_size;
-unsigned int mcache_bucket_shift;
+unsigned int bucket_shift;
+unsigned long bucket_size;
 
 phys_offset_to_gaddr_t phys_offset_to_gaddr;
 QemuMutex lock;
@@ -95,11 +93,14 @@ static inline int test_bits(int nr, int size, const 
unsigned long *addr)
 
 static MapCache *xen_map_cache_init_single(phys_offset_to_gaddr_t f,
void *opaque,
+   unsigned int bucket_shift,
unsigned long max_size)
 {
 unsigned long size;
 MapCache *mc;
 
+assert(bucket_shift >= XC_PAGE_SHIFT);
+
 mc = g_new0(MapCache, 1);
 
 mc->phys_offset_to_gaddr = f;
@@ -108,12 +109,14 @@ static MapCache 
*xen_map_cache_init_single(phys_offset_to_gaddr_t f,
 
 QTAILQ_INIT(&mc->locked_entries);
 
+mc->bucket_shift = bucket_shift;
+mc->bucket_size = 1UL << bucket_shift;
 mc->max_mcache_size = max_size;
 
 mc->nr_buckets =
 (((mc->max_mcache_size >> XC_PAGE_SHIFT) +
-  (1UL << (MCACHE_BUCKET_SHIFT - XC_PAGE_SHIFT)) - 1) >>
- (MCACHE_BUCKET_SHIFT - XC_PAGE_SHIFT));
+  (1UL << (bucket_shift - XC_PAGE_SHIFT)) - 1) >>
+ (bucket_shift - XC_PAGE_SHIFT));
 
 size = mc->nr_buckets * sizeof(MapCacheEntry);
 size = (size + XC_PAGE_SIZE - 1) & ~(XC_PAGE_SIZE - 1);
@@ -126,6 +129,13 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
*opaque)
 {
 struct rlimit rlimit_as;
 unsigned long max_mcache_size;
+unsigned int bucket_shift;
+
+if (HOST_LONG_BITS == 32) {
+bucket_shift = 16;
+} else {
+bucket_shift = 20;
+}
 
 if (geteuid() == 0) {
 rlimit_as.rlim_cur = RLIM_INFINITY;
@@ -146,7 +156,9 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
*opaque)
 }
 }
 
-mapcache = xen_map_cache_init_single(f, opaque, max_mcache_size);
+mapcache = xen_map_cache_init_single(f, opaque,
+ bucket_shift,
+ max_mcache_size);
 setrlimit(RLIMIT_AS, &rlimit_as);
 }
 
@@ -195,7 +207,7 @@ static void xen_remap_bucket(MapCache *mc,
 entry->valid_mapping = NULL;
 
 for (i = 0; i < nb_pfn; i++) {
-pfns[i] = (address_index << (MCACHE_BUCKET_SHIFT-XC_PAGE_SHIFT)) + i;
+pfns[i] = (address_index << (mc->bucket_shift - XC_PAGE_SHIFT)) + i;
 }
 
 /*
@@ -266,8 +278,8 @@ static uint8_t *xen_map_cache_unlocked(MapCache *mc,
 bool dummy = false;
 
 tryagain:
-address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
-address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
+address_index  = phys_addr >> mc->bucket_shift;
+address_offset = phys_addr & (mc->bucket_size - 1);
 
 trace_xen_map_cache(phys_addr);
 
@@ -294,14 +306,14 @@ tryagain:
 return mc->last_entry->vaddr_base + address_offset;
 }
 
-/* size is always a multiple of MCACHE_BUCKET_SIZE */
+/* size is always a multiple of mc->bucket_size */
 if (size) {
 cache_size = size + address_offset;
-if (cache_size % MCACHE_BUCKET_SIZE) {
-cache_size += MCACHE_BUCKET_SIZE - (cache_size % MCACHE_BUCKET_SIZE);
+if (cache_size % mc->bucket_size) {
+cache_size += mc->bucket_size - (cache_size % mc->bucket_size);
 }
 } else {
-cache_size = MCACHE_BUCKET_SIZE;
+cache_size = mc->bucket_size;
 }
 
 entry = &mc->entry[address_index % mc->nr_buckets];
@@ -422,7 +434,7 @@ static ram_addr_t 
xen_ram_addr_from_mapcache_single(MapCache *mc, void *ptr)
 trace_xen_ram_addr_from_mapcache_not_in_cache(ptr);
 raddr = RAM_ADDR_INVALID;
 } else {
-raddr = (reventry->paddr_index << MCACHE_BUCKET_SHIFT) +
+raddr = (reventry->paddr_index << mc->bucket_shift) +
  ((unsigned long) ptr - (unsigned long) entry->vaddr_base);
 }
 mapcache_unlock(mc);
@@ -585,8 +597,8 @@ 

Re: [PATCH v3 1/2] tools/xg: Streamline cpu policy serialise/deserialise calls

2024-05-24 Thread Alejandro Vallejo
On 23/05/2024 11:21, Roger Pau Monné wrote:
> On Thu, May 23, 2024 at 10:41:29AM +0100, Alejandro Vallejo wrote:
>> The idea is to use xc_cpu_policy_t as a single object containing both the
>> serialised and deserialised forms of the policy. Note that we need lengths
>> for the arrays, as the serialised policies may be shorter than the array
>> capacities.
>>
>> * Add the serialised lengths to the struct so we can distinguish
>>   between length and capacity of the serialisation buffers.
>> * Remove explicit buffer+lengths in serialise/deserialise calls
>>   and use the internal buffer inside xc_cpu_policy_t instead.
>> * Refactor everything to use the new serialisation functions.
>> * Remove redundant serialization calls and avoid allocating dynamic
>>   memory aside from the policy objects in xen-cpuid. Also minor cleanup
>>   in the policy print call sites.
>>
>> No functional change intended.
>>
>> Signed-off-by: Alejandro Vallejo 
> 
> Acked-by: Roger Pau Monné 
> 
> Just two comments.
> 
>> ---
>> v3:
>>   * Better context scoping in xg_sr_common_x86.
>> * Can't be const because write_record() takes non-const.
>>   * Adjusted line length of xen-cpuid's print_policy.
>>   * Adjusted error messages in xen-cpuid's print_policy.
>>   * Reverted removal of overscoped loop indices.
>> ---
>>  tools/include/xenguest.h|  8 ++-
>>  tools/libs/guest/xg_cpuid_x86.c | 98 -
>>  tools/libs/guest/xg_private.h   |  2 +
>>  tools/libs/guest/xg_sr_common_x86.c | 56 ++---
>>  tools/misc/xen-cpuid.c  | 41 
>>  5 files changed, 106 insertions(+), 99 deletions(-)
>>
>> diff --git a/tools/include/xenguest.h b/tools/include/xenguest.h
>> index e01f494b772a..563811cd8dde 100644
>> --- a/tools/include/xenguest.h
>> +++ b/tools/include/xenguest.h
>> @@ -799,14 +799,16 @@ int xc_cpu_policy_set_domain(xc_interface *xch, 
>> uint32_t domid,
>>   xc_cpu_policy_t *policy);
>>  
>>  /* Manipulate a policy via architectural representations. */
>> -int xc_cpu_policy_serialise(xc_interface *xch, const xc_cpu_policy_t *policy,
>> -xen_cpuid_leaf_t *leaves, uint32_t *nr_leaves,
>> -xen_msr_entry_t *msrs, uint32_t *nr_msrs);
>> +int xc_cpu_policy_serialise(xc_interface *xch, xc_cpu_policy_t *policy);
>>  int xc_cpu_policy_update_cpuid(xc_interface *xch, xc_cpu_policy_t *policy,
>> const xen_cpuid_leaf_t *leaves,
>> uint32_t nr);
>>  int xc_cpu_policy_update_msrs(xc_interface *xch, xc_cpu_policy_t *policy,
>>const xen_msr_entry_t *msrs, uint32_t nr);
>> +int xc_cpu_policy_get_leaves(xc_interface *xch, const xc_cpu_policy_t *policy,
>> + const xen_cpuid_leaf_t **leaves, uint32_t *nr);
>> +int xc_cpu_policy_get_msrs(xc_interface *xch, const xc_cpu_policy_t *policy,
>> +   const xen_msr_entry_t **msrs, uint32_t *nr);
> 
> Maybe it would be helpful to have a comment clarifying that the return
> of xc_cpu_policy_get_{leaves,msrs}() is a reference to the content of
> the policy, not a copy of it (and hence is tied to the lifetime of
> policy, and doesn't require explicit freeing).

Sure.

> 
>>  
>>  /* Compatibility calculations. */
>>  bool xc_cpu_policy_is_compatible(xc_interface *xch, xc_cpu_policy_t *host,
>> diff --git a/tools/libs/guest/xg_cpuid_x86.c 
>> b/tools/libs/guest/xg_cpuid_x86.c
>> index 4453178100ad..4f4b86b59470 100644
>> --- a/tools/libs/guest/xg_cpuid_x86.c
>> +++ b/tools/libs/guest/xg_cpuid_x86.c
>> @@ -834,14 +834,13 @@ void xc_cpu_policy_destroy(xc_cpu_policy_t *policy)
>>  }
>>  }
>>  
>> -static int deserialize_policy(xc_interface *xch, xc_cpu_policy_t *policy,
>> -  unsigned int nr_leaves, unsigned int nr_entries)
>> +static int deserialize_policy(xc_interface *xch, xc_cpu_policy_t *policy)
>>  {
>>  uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
>>  int rc;
>>  
>>  rc = x86_cpuid_copy_from_buffer(&policy->policy, policy->leaves,
>> -nr_leaves, &err_leaf, &err_subleaf);
>> +policy->nr_leaves, &err_leaf, &err_subleaf);
>>  if ( rc )
>>  {
>>  if ( err_leaf != -1 )
>> @@ -851,7 +850,7 @@ static int deserialize_policy(xc_interface *xch, 
>> xc_cpu_policy_t *policy,
>>  }
>>  
>>  rc = x86_msr_copy_from_buffer(&policy->policy, policy->msrs,
>> -  nr_entries, &err_msr);
>> +  policy->nr_msrs, &err_msr);
>>  if ( rc )
>>  {
>>  if ( err_msr != -1 )
>> @@ -878,7 +877,10 @@ int xc_cpu_policy_get_system(xc_interface *xch, 
>> unsigned int policy_idx,
>>  return rc;
>>  }
>>  
>> -rc = deserialize_policy(xch, policy, nr_leaves, nr_msrs);
>> +policy->nr_leaves = nr_leaves;
>> +

Re: [PATCH v3 2/2] tools/xg: Clean up xend-style overrides for CPU policies

2024-05-24 Thread Alejandro Vallejo
On 23/05/2024 11:47, Roger Pau Monné wrote:
>> -static xen_cpuid_leaf_t *find_leaf(
>> -xen_cpuid_leaf_t *leaves, unsigned int nr_leaves,
>> -const struct xc_xend_cpuid *xend)
>> +static xen_cpuid_leaf_t *find_leaf(xc_cpu_policy_t *p,
>> +   const struct xc_xend_cpuid *xend)
>>  {
>>  const xen_cpuid_leaf_t key = { xend->leaf, xend->subleaf };
>>  
>> -return bsearch(&key, leaves, nr_leaves, sizeof(*leaves), compare_leaves);
>> +return bsearch(&key, p->leaves, ARRAY_SIZE(p->leaves),
> 
> Don't you need to use p->nr_leaves here, as otherwise we could check
> against possibly uninitialized leaves (or leaves with stale data)?

Indeed. Good catch (same on the MSR side).

>> -switch ( p->policy.x86_vendor )
>> +switch ( cur->policy.x86_vendor )
>>  {
>>  case X86_VENDOR_INTEL:
>> -for ( i = 0; (p->policy.cache.subleaf[i].type &&
>> -  i < ARRAY_SIZE(p->policy.cache.raw)); ++i )
>> +for ( i = 0; (cur->policy.cache.subleaf[i].type &&
>> +i < ARRAY_SIZE(cur->policy.cache.raw)); ++i 
>> )
> 
> Nit: indentation is weird here.  I would use:
> 
> for ( i = 0; cur->policy.cache.subleaf[i].type &&
>  i < ARRAY_SIZE(cur->policy.cache.raw); ++i )
> 
> Thanks, Roger.

Sure. Leftover from removing the size_t in v2.

Cheers,
Alejandro



Re: [PATCH v5 7/7] docs: Add device tree overlay documentation

2024-05-24 Thread Julien Grall

Hi Stefano,

On 24/05/2024 03:18, Stefano Stabellini wrote:

From: Vikram Garhwal 

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
  docs/misc/arm/overlay.txt | 82 +++
  1 file changed, 82 insertions(+)
  create mode 100644 docs/misc/arm/overlay.txt

diff --git a/docs/misc/arm/overlay.txt b/docs/misc/arm/overlay.txt
new file mode 100644
index 00..0a2dee951a
--- /dev/null
+++ b/docs/misc/arm/overlay.txt
@@ -0,0 +1,82 @@
+# Device Tree Overlays support in Xen
+
+Xen experimentally supports dynamic device assignment to running
+domains, i.e. adding/removing nodes (using .dtbo) to/from the Xen device
+tree, and attaching them to a running domain with a given $domid.
+
+Dynamic node assignment works in two steps:
+
+## Add/Remove device tree overlay to/from Xen device tree
+
+1. Xen tools check the given dtbo and parse all other user-provided arguments.
+2. Xen tools pass the dtbo to the Xen hypervisor via a hypercall.
+3. Xen hypervisor applies/removes the dtbo to/from the Xen device tree.
+
+## Attach device from the DT overlay to domain
+
+1. Xen tools check the given dtbo and parse all other user-provided arguments.
+2. Xen tools pass the dtbo to the Xen hypervisor via a hypercall.
+3. Xen hypervisor attaches the device to the user-provided $domid by
+   mapping node resources in the DT overlay.
+
+# Examples
+
+Here are a few examples on how to use it.
+
+## Dom0 device add
+
+To assign a device tree overlay to Dom0, the user should first properly
+prepare the DT overlay. More information about device tree overlays can
+be found in [1]. Then, in Dom0, enter the following:
+
+(dom0) xl dt-overlay add overlay.dtbo
+
+This will add the devices described in overlay.dtbo to the Xen device tree.
+
+To assign the newly added device from the dtbo to Dom0:
+
+(dom0) xl dt-overlay attach overlay.dtbo 0
+
+Next, if the user wants to add the same device tree overlay to the dom0
+Linux device tree, execute the following:
+
+(dom0) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+(dom0) cat overlay.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be loaded using:
+
+(dom0) modprobe module_name.ko
+
+## DomU device add/remove
+
+All the nodes in dtbo will be assigned to a domain; the user will need
+to prepare the dtb for the domU.


s/dtb/dtbo/? But I am a little bit confused by the wording. I think you 
may want to say *different dtbo* so it is clear from the start (this 
only becomes obvious at the end of the section) that the user is not 
meant to use the same one for all the commands.


 For example, the `interrupt-parent`

+property of the DomU overlay should be changed to the Xen hardcoded
+value `0xfde8`, and the xen,reg property should be added to specify the
+address mappings. 
If xen,reg is not present, a 1:1 mapping is assumed.


Repeating an earlier comment here. I think xen,reg should be mandatory 
for non-direct mapped domain.


Also, can you clarify what the expected property layout for xen,reg is?


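For illustration only — the exact xen,reg layout is precisely what is being asked about above, so the fragment below is a purely hypothetical sketch of a DomU overlay node carrying the Xen-specific properties from the patch; the node name, compatible string, and all addresses are made up:

```dts
/dts-v1/;
/plugin/;

/ {
    fragment@0 {
        target-path = "/";
        __overlay__ {
            /* Hypothetical device node; compatible/reg values are examples */
            example_device: example@e0000000 {
                compatible = "vendor,example-device";
                reg = <0x0 0xe0000000 0x0 0x1000>;
                /* Xen hardcoded interrupt-parent value from the patch */
                interrupt-parent = <0xfde8>;
                interrupts = <0 89 4>;
                /* Assumed layout: <guest-address size> pairs; the actual
                   layout of xen,reg is still under discussion here */
                xen,reg = <0x0 0xe0000000 0x0 0x1000>;
            };
        };
    };
};
```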
+The following assumes the properly written DomU dtbo is `overlay_domu.dtbo`.
+
+For new domains to be created, the user will need to create the DomU
+with the following properties properly configured in the xl config file:
+- `iomem`


I looked at your reply in v4 and I am afraid I still don't understand 
why we are mentioning 'iomem'. If we want to use the commands below, 
then the domain needs to be created in advance. So you can't yet know 
'iomem'.


You could avoid "xl dt-overlay attach" but then you need the user to 
specify both "irqs" and "iomem". From a user point of view, it would be 
easier to add a new property in the configuration file listing the 
overlays. Something like:


dt_overlays = [ "overlay.dtbo", ... ]

Anyway, that is somewhat separate. For now, I think we want to drop 'iomem' 
from the list and reword this paragraph to say that the 'passthrough' 
property needs to be set if you plan to use DT overlay and devices 
requiring the IOMMU.
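To make the suggestion concrete, a sketch of what such an xl config could look like — note that 'dt_overlays' is only the proposal made above, not an implemented option, and the other values are placeholders:

```
# Hypothetical xl config fragment.
# 'dt_overlays' does not exist yet; it is the property proposed above.
# 'passthrough' must be set up front because the IOMMU cannot be
# enabled lazily for an already-running domain.
name = "domu1"
kernel = "/path/to/Image"
memory = 512
vcpus = 2
passthrough = "enabled"
dt_overlays = [ "overlay_domu.dtbo" ]
```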



+- `passthrough` (if IOMMU is needed)


This property is required at the start because we don't support enabling 
the IOMMU lazily.



+
+The user will also need to modprobe the relevant drivers. For already
+running domains, the user can use the xl dt-overlay attach command, for
+example:
+
+(dom0) xl dt-overlay add overlay.dtbo# If not executed before
+(dom0) xl dt-overlay attach overlay.dtbo $domid
+(dom0) xl console $domid # To access $domid console
+
+Next, if the user wants to add the DomU device tree overlay to the domU
+Linux device tree:
+
+(domU) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+(domU) cat overlay_domu.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be probed:
+
+(domU) modprobe module_name.ko
+
+[1] 

[REMINDER] Xen Summit 2024 - Verification code for design sessions

2024-05-24 Thread Kelly Choi
Hello Xen Community,

Day 2 design sessions will be locked end of today, so get voting!

Day 3 design sessions are still open, and will be finalized 3 June 2024
after 5pm Lisbon Time.

*The verification code is 'LFXEN24'.*

The final schedule will be allocated and arranged by the highest-voted
sessions.

Virtual links for the community to join Xen Summit design sessions will be
shared next week. For in-person tickets, click here.

--


We look forward to holding the Design Sessions at the upcoming Xen Project
Summit. The design sessions will be on Wednesday, 5 June, and Thursday, 6
June 2024.

We encourage everyone to submit a Design Session; the verification code is
“*LFXEN24*”.

*SUBMIT A DESIGN SESSION* 



The process involves the following steps:

   - Anyone interested can propose a topic.
   - All participants review the list of sessions, indicating their
     interest in attending each one.
   - The session scheduler optimizes the schedule to accommodate as many
     preferences as possible.

Participants can also propose long-form talks by adding [TALK] to the
session title.

For suggested topics, sample Design Session submissions, and more tips,
check out the Xen Design Session page.


Best Regards,
Xen Project Events Team


Many thanks,
Kelly Choi

Community Manager
Xen Project


Re: [PATCH v5 6/7] tools: Introduce the "xl dt-overlay attach" command

2024-05-24 Thread Julien Grall

Hi Stefano,

On 24/05/2024 03:18, Stefano Stabellini wrote:

From: Henry Wang 

With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to
attach (in the future also detach) devices from the provided DT overlay
to domains. Support this by introducing a new "xl dt-overlay" command
and related documentation, i.e. "xl dt-overlay attach". Slightly rework
the command option parsing logic.

Signed-off-by: Henry Wang 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Jason Andryuk 
Reviewed-by: Stefano Stabellini 
---
  tools/include/libxl.h   | 15 +++
  tools/include/xenctrl.h |  3 +++
  tools/libs/ctrl/xc_dt_overlay.c | 31 +++
  tools/libs/light/libxl_dt_overlay.c | 28 +
  tools/xl/xl_cmdtable.c  |  4 +--
  tools/xl/xl_vmcontrol.c | 39 -
  6 files changed, 106 insertions(+), 14 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 3b5c18b48b..f2e19ec592 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -643,6 +643,12 @@
   */
  #define LIBXL_HAVE_NR_SPIS 1
  
+/*

+ * LIBXL_HAVE_OVERLAY_DOMAIN indicates the presence of
+ * libxl_dt_overlay_domain.
+ */
+#define LIBXL_HAVE_OVERLAY_DOMAIN 1

I think this wants to gain DT_ just before OVERLAY, so the name makes it 
clearer we are talking about the Device-Tree overlay and not a 
filesystem overlay (or anything else where overlays are involved).


Cheers,

--
Julien Grall



[linux-linus test] 186127: tolerable FAIL - PUSHED

2024-05-24 Thread osstest service owner
flight 186127 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186127/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt   16 saverestore-support-check fail blocked in 186052
 test-armhf-armhf-xl-qcow2 8 xen-boot fail  like 186052
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 186052
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 186052
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 186052
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 186052
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186052
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-raw  15 saverestore-support-checkfail   never pass

version targeted for testing:
 linux                6d69b6c12fce479fde7bc06f686212451688a102
baseline version:
 linux                8f6a15f095a63a83b096d9b29aaff4f0fbe6f6e6

Last test of basis   186052  2024-05-21 01:42:42 Z3 days
Failing since186065  2024-05-21 16:10:24 Z2 days4 attempts
Testing same since   186127  2024-05-23 23:10:22 Z0 days1 attempts


524 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  
