[ovmf test] 176059: all pass - PUSHED

2023-01-23 Thread osstest service owner
flight 176059 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/176059/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 37d3eb026a766b2405daae47e02094c2ec248646
baseline version:
 ovmf 7afef31b2b17d1a8d5248eb562352c6d3505ea14

Last test of basis   176004  2023-01-20 17:41:43 Z    2 days
Testing same since   176059  2023-01-23 06:10:49 Z    0 days    1 attempts


People who touched revisions under test:
  Jan Bobek 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops    pass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   7afef31b2b..37d3eb026a  37d3eb026a766b2405daae47e02094c2ec248646 -> 
xen-tested-master



[PATCH] x86/shadow: sh_type_to_size[] needs L2H entry when HVM+PV32

2023-01-23 Thread Jan Beulich
While the table is used only when HVM=y, the table entry of course needs
to be properly populated when also PV32=y. Fully removing the table
entry was therefore wrong.

Fixes: 1894049fa283 ("x86/shadow: L2H shadow type is PV32-only")
Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/mm/shadow/hvm.c
+++ b/xen/arch/x86/mm/shadow/hvm.c
@@ -56,7 +56,9 @@ const uint8_t sh_type_to_size[] = {
 [SH_type_l1_64_shadow]   = 1,
 [SH_type_fl1_64_shadow]  = 1,
 [SH_type_l2_64_shadow]   = 1,
-/*  [SH_type_l2h_64_shadow]  = 1,  PV32-only */
+#ifdef CONFIG_PV32
+[SH_type_l2h_64_shadow]  = 1,
+#endif
 [SH_type_l3_64_shadow]   = 1,
 [SH_type_l4_64_shadow]   = 1,
 [SH_type_p2m_table]  = 1,



Re: [PATCH 1/2] libxl: Fix guest kexec - skip cpuid policy

2023-01-23 Thread Juergen Gross

On 21.01.23 22:39, Jason Andryuk wrote:

When a domain performs a kexec (soft reset), libxl__build_pre() is
called with the existing domid.  Calling libxl__cpuid_legacy() on the
existing domain fails since the cpuid policy has already been set, and
the guest isn't rebuilt and doesn't kexec.

xc: error: Failed to set d1's policy (err leaf 0x, subleaf 0x, 
msr 0x) (17 = File exists): Internal error
libxl: error: libxl_cpuid.c:494:libxl__cpuid_legacy: Domain 1:Failed to apply 
CPUID policy: File exists
libxl: error: libxl_create.c:1641:domcreate_rebuild_done: Domain 1:cannot 
(re-)build domain: -3
libxl: error: libxl_xshelp.c:201:libxl__xs_read_mandatory: xenstore read 
failed: `/libxl/1/type': No such file or directory
libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain type 
for domid=1, assuming HVM

During a soft_reset, skip calling libxl__cpuid_legacy() to avoid the
issue.  Before the fixes commit, the libxl__cpuid_legacy() failure would
have been ignored, so kexec would continue.

Fixes: 34990446ca91 "libxl: don't ignore the return value from 
xc_cpuid_apply_policy"
Signed-off-by: Jason Andryuk 
---
Probably a backport candidate since this has been broken for a while.
---
  tools/libs/light/libxl_create.c   | 4 ++--
  tools/libs/light/libxl_dom.c  | 5 +++--
  tools/libs/light/libxl_internal.h | 2 +-
  3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index 5cddc3df79..587a515dff 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -510,7 +510,7 @@ int libxl__domain_build(libxl__gc *gc,
  struct timeval start_time;
  int i, ret;
  
-ret = libxl__build_pre(gc, domid, d_config, state);

+ret = libxl__build_pre(gc, domid, d_config, state, false);


Instead of adding a parameter to libxl__build_pre() I'd rather add another
bool "soft_reset" to libxl__domain_build_state.

This would be more similar to the libxl__domain_build_state->restore use
case.
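
For illustration, the suggested shape would be roughly the following (a
sketch only; apart from "soft_reset" the members shown are placeholders,
not the actual libxl__domain_build_state layout):

    struct libxl__domain_build_state {
        /* ... existing members, including the restore-related state ... */
        bool soft_reset;   /* set by the soft-reset path before building */
    };

    /* libxl__build_pre() could then check state->soft_reset internally,
     * e.g.:
     *     if ( !state->soft_reset )
     *         rc = libxl__cpuid_legacy(...);
     * instead of growing its parameter list. */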


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 2/2] Revert "tools/xenstore: simplify loop handling connection I/O"

2023-01-23 Thread Juergen Gross

On 21.01.23 22:39, Jason Andryuk wrote:

I'm observing guest kexec trigger xenstored to abort on a double free.

gdb output:
Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140645614258112) at 
./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
 at ./nptl/pthread_kill.c:44
 at ./nptl/pthread_kill.c:78
 at ./nptl/pthread_kill.c:89
 at ../sysdeps/posix/raise.c:26
 at talloc.c:119
 ptr=ptr@entry=0x559fae724290) at talloc.c:232
 at xenstored_core.c:2945
(gdb) frame 5
 at talloc.c:119
119      TALLOC_ABORT("Bad talloc magic value - double free");
(gdb) frame 7
 at xenstored_core.c:2945
2945      talloc_increase_ref_count(conn);
(gdb) p conn
$1 = (struct connection *) 0x559fae724290

Looking at a xenstore trace, we have:
IN 0x559fae71f250 20230120 17:40:53 READ (/local/domain/3/image/device-model-domid )
wrl: dom0  1  msec  1 credit 100 reserve 100 discard
wrl: dom3  1  msec  1 credit 100 reserve 100 discard
wrl: dom0  0  msec  1 credit 100 reserve   0 discard
wrl: dom3  0  msec  1 credit 100 reserve   0 discard
OUT 0x559fae71f250 20230120 17:40:53 ERROR (ENOENT )
wrl: dom0  1  msec  1 credit 100 reserve 100 discard
wrl: dom3  1  msec  1 credit 100 reserve 100 discard
IN 0x559fae71f250 20230120 17:40:53 RELEASE (3 )
DESTROY watch 0x559fae73f630
DESTROY watch 0x559fae75ddf0
DESTROY watch 0x559fae75ec30
DESTROY watch 0x559fae75ea60
DESTROY watch 0x559fae732c00
DESTROY watch 0x559fae72cea0
DESTROY watch 0x559fae728fc0
DESTROY watch 0x559fae729570
DESTROY connection 0x559fae724290
orphaned node /local/domain/3/device/suspend/event-channel deleted
orphaned node /local/domain/3/device/vbd/51712 deleted
orphaned node /local/domain/3/device/vkbd/0 deleted
orphaned node /local/domain/3/device/vif/0 deleted
orphaned node /local/domain/3/control/shutdown deleted
orphaned node /local/domain/3/control/feature-poweroff deleted
orphaned node /local/domain/3/control/feature-reboot deleted
orphaned node /local/domain/3/control/feature-suspend deleted
orphaned node /local/domain/3/control/feature-s3 deleted
orphaned node /local/domain/3/control/feature-s4 deleted
orphaned node /local/domain/3/control/sysrq deleted
orphaned node /local/domain/3/data deleted
orphaned node /local/domain/3/drivers deleted
orphaned node /local/domain/3/feature deleted
orphaned node /local/domain/3/attr deleted
orphaned node /local/domain/3/error deleted
orphaned node /local/domain/3/console/backend-id deleted

and no further output.

The trace shows that DESTROY was called for connection 0x559fae724290,
but that is the same pointer (conn) main() was looping through from
connections.  So it wasn't actually removed from the connections list?

Reverting commit e8e6e42279a5 "tools/xenstore: simplify loop handling
connection I/O" fixes the abort/double free.  I think the use of
list_for_each_entry_safe is incorrect.  list_for_each_entry_safe makes
traversal safe for deleting the current iterator, but RELEASE/do_release
will delete some other entry in the connections list.  I think the
observed abort is because list_for_each_entry has next pointing to the
deleted connection, and it is used in the subsequent iteration.

Add a comment explaining the unsuitability of list_for_each_entry_safe.
Also notice that the old code takes a reference on next which would
prevent a use-after-free.
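
To illustrate the pitfall (a sketch, not the xenstored source;
handle_connection() is a made-up name):

    struct connection *conn, *next;

    list_for_each_entry_safe ( conn, next, &connections, list )
    {
        /*
         * Safe only against deleting 'conn' itself.  If handling conn
         * frees some *other* connection - possibly the one already
         * cached in 'next' - the next iteration dereferences freed
         * memory.
         */
        handle_connection(conn);
    }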

This reverts commit e8e6e42279a5723239c5c40ba4c7f579a979465d.

Signed-off-by: Jason Andryuk 


Good catch!

Reviewed-by: Juergen Gross 

with one nit: a "Fixes:" tag for commit e8e6e42279a5 should be added.


---
I didn't verify the stale pointers, which is why there are a lot of "I
think" qualifiers.  But reverting the commit has xenstored still running
whereas it was aborting consistently beforehand.


Your analysis seems to be fine. Soft reset handling includes a
"XS_RELEASE" message for the affected guest, which results in the
struct domain and the associated connection being freed. This can
happen to be the connection in the "next" pointer, resulting in
the crash you've observed.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH RFC 07/10] domain: map/unmap GADDR based shared guest areas

2023-01-23 Thread Jan Beulich
On 20.01.2023 19:15, Andrew Cooper wrote:
> On 18/01/2023 9:55 am, Jan Beulich wrote:
>> On 17.01.2023 23:04, Andrew Cooper wrote:
>>> On 19/10/2022 8:43 am, Jan Beulich wrote:
 Noteworthy differences from map_vcpu_info():
 - areas can be registered more than once (and de-registered),
>>> When register by GFN is available, there is never a good reason to register the
>>> same area twice.
>> Why not? Why shouldn't different entities be permitted to register their
>> areas, one after the other? This at the very least requires a way to
>> de-register.
> 
> Because it's useless and extra complexity.  From the point of view of
> any guest, its an MMIO(ish) window that Xen happens to update the
> content of.
> 
> You don't get systems where you can ask hardware for e.g. "another copy
> of the HPET at mfn $foo please".

I/O ports appear in multiple places on many systems. I think MMIO regions
can, too. And then I don't see why there couldn't be a way to actually
control this (via e.g. some chipset specific register).

 RFC: By using global domain page mappings the demand on the underlying
  VA range may increase significantly. I did consider to use per-
  domain mappings instead, but they exist for x86 only. Of course we
  could have arch_{,un}map_guest_area() aliasing global domain page
  mapping functions on Arm and using per-domain mappings on x86. Yet
  then again map_vcpu_info() doesn't do so either (albeit that's
  likely to be converted subsequently to use map_vcpu_area() anyway).
>>> ... this by providing a bound on the amount of vmap() space that can be consumed.
>> I'm afraid I don't understand. When re-registering a different area, the
>> earlier one will be unmapped. The consumption of vmap space cannot grow
>> (or else we'd have a resource leak and hence an XSA).
> 
> In which case you mean "can be re-registered elsewhere".  More
> specifically, the area can be moved, and isn't a singleton operation
> like map_vcpu_info was.
> 
> The wording as presented firmly suggests the presence of an XSA.

You mean the "map_vcpu_info() doesn't do so either"? That talks about the
function not using per-domain mappings. There's no connection at all that
I can see to a missed unmapping, which at this point is the only thing I
can deduce you might be referring to.

 RFC: In map_guest_area() I'm not checking the P2M type, instead - just
  like map_vcpu_info() - solely relying on the type ref acquisition.
  Checking for p2m_ram_rw alone would be wrong, as at least
  p2m_ram_logdirty ought to also be okay to use here (and in similar
  cases, e.g. in Argo's find_ring_mfn()). p2m_is_pageable() could be
  used here (like altp2m_vcpu_enable_ve() does) as well as in
  map_vcpu_info(), yet then again the P2M type is stale by the time
  it is being looked at anyway without the P2M lock held.
>>> Again, another error caused by Xen not knowing the guest physical
>>> address layout.  These mappings should be restricted to just RAM regions
>>> and I think we want to enforce that right from the outset.
>> Meaning what exactly in terms of action for me to take? As said, checking
>> the P2M type is pointless. So without you being more explicit, all I can
>> take your reply for is merely a comment, with no action on my part (not
>> even to remove this RFC remark).
> 
> There will become a point where it will need to become prohibited to
> issue this against something which isn't p2m_type_ram.  If we had a sane
> idea of the guest physmap, I'd go as far as saying E820_RAM, but that's
> clearly not feasible yet.
> 
> Even now, absolutely nothing good can possibly come of e.g. trying to
> overlay it on the grant table, or a grant mapping.
> 
> ram || logdirty ought to exclude most cases we care about the guest
> (not) putting the mapping.

It's still not clear to me what you want me to do: If I add the P2M type
check here including log-dirty, then this will be inconsistent with what
we do elsewhere _and_ useless code (for the time being). I hope you're
not making a scope-creeping request for me to "fix" all the other places
(I may not have found all) where such a P2M type check is either missing
or failing to include log-dirty.

Jan



[xen-unstable test] 176056: regressions - FAIL

2023-01-23 Thread osstest service owner
flight 176056 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/176056/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-coresched-i386-xl 18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-xl-xsm   18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-xl   18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-pair  26 guest-migrate/src_host/dst_host fail REGR. vs. 175994
 test-amd64-i386-xl-vhd   17 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-xl-shadow18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-libvirt-pair 26 guest-migrate/src_host/dst_host fail REGR. vs. 
175994

Tests which are failing intermittently (not blocking):
 test-arm64-arm64-xl-vhd 17 guest-start/debian.repeat fail in 176042 pass in 
176056
 test-amd64-i386-libvirt-xsm   7 xen-install fail pass in 176042

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail in 176042 never pass
 test-armhf-armhf-libvirt 16 saverestore-support-check fail  like 175987
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 175987
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 175987
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 175994
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 175994
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 175994
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 175994
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 175994
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 175994
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 175994
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail  like 175994
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 175994
 test-arm64-arm64-xl-seattle  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail  never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-check fail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf

Re: [PATCH v2 1/9] x86/shadow: replace sh_reset_l3_up_pointers()

2023-01-23 Thread Jan Beulich
On 20.01.2023 18:02, George Dunlap wrote:
> On Wed, Jan 11, 2023 at 1:52 PM Jan Beulich  wrote:
> 
>> Rather than doing a separate hash walk (and then even using the vCPU
>> variant, which is to go away), do the up-pointer-clearing right in
>> sh_unpin(), as an alternative to the (now further limited) enlisting on
>> a "free floating" list fragment. This utilizes the fact that such list
>> fragments are traversed only for multi-page shadows (in shadow_free()).
>> Furthermore sh_terminate_list() is a safe guard only anyway, which isn't
>> in use in the common case (it actually does anything only for BIGMEM
>> configurations).
> 
> One thing that seems strange about this patch is that you're essentially
> adding a field to the domain shadow struct in lieu of adding another
> another argument to sh_unpin() (unless the bit is referenced elsewhere in
> subsequent patches, which I haven't reviewed, in part because about half of
> them don't apply cleanly to the current tree).

Well, to me adding another parameter to sh_unpin() would have looked odd;
the new field looks slightly cleaner to me. But changing that is merely a
matter of taste, so if you and e.g. Andrew think that approach was better,
I could switch to that. And no, I don't foresee further uses of the field.

As to half of the patches not applying: Some were already applied out of
order, and others therefore need re-basing slightly. Till now I saw no
reason to re-send the remaining patches just for that.

Jan



Re: [PATCH RFC 07/10] domain: map/unmap GADDR based shared guest areas

2023-01-23 Thread Jan Beulich
On 20.01.2023 19:15, Andrew Cooper wrote:
> On 18/01/2023 9:55 am, Jan Beulich wrote:
>> On 17.01.2023 23:04, Andrew Cooper wrote:
>>> On 19/10/2022 8:43 am, Jan Beulich wrote:
 In preparation of the introduction of new vCPU operations allowing to
 register the respective areas (one of the two is x86-specific) by
 guest-physical address, flesh out the map/unmap functions.

 Noteworthy differences from map_vcpu_info():
 - areas can be registered more than once (and de-registered),
>>> When register by GFN is available, there is never a good reason to register the
>>> same area twice.
>> Why not? Why shouldn't different entities be permitted to register their
>> areas, one after the other? This at the very least requires a way to
>> de-register.
> 
> Because it's useless and extra complexity.

As to this: Looking at the code I think that I would actually add
complexity (just a little - an extra check) to prevent re-registration.
Things come out more naturally, from what I can tell, by allowing it.
This can also be seen in "common: convert vCPU info area registration"
where I'm actually adding such a (conditional) check to maintain the
"no re-registration" property of the sub-op there. Granted there can be
an argument towards making that check unconditional then ...

Jan



Re: [XEN][RFC PATCH v4 07/16] xen/iommu: Move spin_lock from iommu_dt_device_is_assigned to caller

2023-01-23 Thread Michal Orzel
Hi Vikram,

On 07/12/2022 07:18, Vikram Garhwal wrote:
> 
> 
> Rename iommu_dt_device_is_assigned() to iommu_dt_device_is_assigned_lock().
s/lock/locked/

> 
> Moving spin_lock to caller was done to prevent the concurrent access to
> iommu_dt_device_is_assigned while doing add/remove/assign/deassign.
> 
> Signed-off-by: Vikram Garhwal 
> Reviewed-by: Luca Fancellu 
> ---
>  xen/drivers/passthrough/device_tree.c | 23 +++
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/device_tree.c 
> b/xen/drivers/passthrough/device_tree.c
> index 1c32d7b50c..bb4cf7784d 100644
> --- a/xen/drivers/passthrough/device_tree.c
> +++ b/xen/drivers/passthrough/device_tree.c
> @@ -83,16 +83,15 @@ fail:
>  return rc;
>  }
> 
> -static bool_t iommu_dt_device_is_assigned(const struct dt_device_node *dev)
> +static bool_t
> +iommu_dt_device_is_assigned_locked(const struct dt_device_node *dev)
This should not be indented
>  {
>  bool_t assigned = 0;
> 
>  if ( !dt_device_is_protected(dev) )
>  return 0;
> 
> -spin_lock(&dtdevs_lock);
>  assigned = !list_empty(&dev->domain_list);
> -spin_unlock(&dtdevs_lock);
> 
>  return assigned;
>  }
> @@ -213,27 +212,43 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, 
> struct domain *d,
>  if ( (d && d->is_dying) || domctl->u.assign_device.flags )
>  break;
> 
> +spin_lock(&dtdevs_lock);
> +
>  ret = dt_find_node_by_gpath(domctl->u.assign_device.u.dt.path,
>  domctl->u.assign_device.u.dt.size,
>  &dev);
>  if ( ret )
> +{
> +spin_unlock(&dtdevs_lock);
> +
I think removing a blank line here and in other places would look better.

~Michal



Re: [XEN PATCH v4 3/3] build: compat-xlat-header.py: optimisation to search for just '{' instead of [{}]

2023-01-23 Thread Anthony PERARD
On Fri, Jan 20, 2023 at 06:26:14PM +, Andrew Cooper wrote:
> On 19/01/2023 3:22 pm, Anthony PERARD wrote:
> > `fields` and `extrafields` always contain all the parts of a sub-struct, so
> > when there is '}', there is always a '{' before it. Also, both are
> > lists.
> >
> > Signed-off-by: Anthony PERARD 
> > ---
> >  xen/tools/compat-xlat-header.py | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/tools/compat-xlat-header.py 
> > b/xen/tools/compat-xlat-header.py
> > index ae5c9f11c9..d0a864b68e 100644
> > --- a/xen/tools/compat-xlat-header.py
> > +++ b/xen/tools/compat-xlat-header.py
> > @@ -105,7 +105,7 @@ def handle_field(prefix, name, id, type, fields):
> >  else:
> >  k = id.replace('.', '_')
> >  print("%sXLAT_%s_HNDL_%s(_d_, _s_);" % (prefix, name, k), 
> > end='')
> > -elif not re_brackets.search(' '.join(fields)):
> > +elif not '{' in fields:
> >  tag = ' '.join(fields)
> >  tag = re.sub(r'\s*(struct|union)\s+(compat_)?(\w+)\s.*', '\\3', 
> > tag)
> >  print(" \\")
> > @@ -290,7 +290,7 @@ def build_body(name, tokens):
> >  print(" \\\n} while (0)")
> >  
> >  def check_field(kind, name, field, extrafields):
> > -if not re_brackets.search(' '.join(extrafields)):
> > +if not '{' in extrafields:
> >  print("; \\")
> >  if len(extrafields) != 0:
> >  for token in extrafields:
> 
> These are the only two users of re_brackets aren't they?  In which case
> you should drop the re.compile() too.

Indeed, I missed that; we can drop re_brackets.

Cheers,

-- 
Anthony PERARD



Re: [PATCH v1 01/14] xen/riscv: add _zicsr to CFLAGS

2023-01-23 Thread Oleksii
On Fri, 2023-01-20 at 15:29 +, Andrew Cooper wrote:
> On 20/01/2023 2:59 pm, Oleksii Kurochko wrote:
> > Work with some registers requires csr command which is part of
> > Zicsr.
> > 
> > Signed-off-by: Oleksii Kurochko 
> > ---
> >  xen/arch/riscv/arch.mk | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/xen/arch/riscv/arch.mk b/xen/arch/riscv/arch.mk
> > index 012dc677c3..95b41d9f3e 100644
> > --- a/xen/arch/riscv/arch.mk
> > +++ b/xen/arch/riscv/arch.mk
> > @@ -10,7 +10,7 @@ riscv-march-$(CONFIG_RISCV_ISA_C)   :=
> > $(riscv-march-y)c
> >  # into the upper half _or_ the lower half of the address space.
> >  # -mcmodel=medlow would force Xen into the lower half.
> >  
> > -CFLAGS += -march=$(riscv-march-y) -mstrict-align -mcmodel=medany
> > +CFLAGS += -march=$(riscv-march-y)_zicsr -mstrict-align -
> > mcmodel=medany
> 
> Should we just go straight for G, rather than bumping it along every
> time we make a tweak?
> 
I didn't go straight for G as it represents the “IMAFD Zicsr Zifencei”
base and extensions; that would require adding FPU support (at least
{save,restore}_fp_state), and I am not sure we need it in general.

Another reason is that the Linux kernel introduces the _zicsr extension
separately (but I am not sure that can be considered a serious
argument):
https://elixir.bootlin.com/linux/latest/source/arch/riscv/Makefile#L58
https://lore.kernel.org/all/20221024113000.891820...@linuxfoundation.org/
 
> ~Andrew
~Oleksii




Re: [PATCH] x86/shadow: sh_type_to_size[] needs L2H entry when HVM+PV32

2023-01-23 Thread Andrew Cooper
On 23/01/2023 8:12 am, Jan Beulich wrote:
> While the table is used only when HVM=y, the table entry of course needs
> to be properly populated when also PV32=y. Fully removing the table
> entry was therefore wrong.
>
> Fixes: 1894049fa283 ("x86/shadow: L2H shadow type is PV32-only")
> Signed-off-by: Jan Beulich 

Erm, why?

The safety justification for the original patch was that this is HVM
only code.  And it really is HVM only code - it's genuinely compiled out
for !HVM builds.

So if putting this entry back in fixes the regression OSSTest
identified, then either SH_type_l2h_64_shadow isn't PV32-only, or we
have PV guests entering HVM-only logic.  Either way, the precondition
for correctness of the original patch is violated, and it needs
reverting on those grounds alone.

~Andrew


Re: [XEN][RFC PATCH v4 09/16] xen/iommu: Introduce iommu_remove_dt_device()

2023-01-23 Thread Michal Orzel
Hi Vikram,

On 07/12/2022 07:18, Vikram Garhwal wrote:
> 
> 
> Remove master device from the IOMMU.
Adding some description on the purpose would be beneficial.

> 
> Signed-off-by: Vikram Garhwal 
> ---
>  xen/drivers/passthrough/device_tree.c | 38 +++
>  xen/include/xen/iommu.h   |  2 ++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/device_tree.c 
> b/xen/drivers/passthrough/device_tree.c
> index 457df333a0..a8ba0b0d17 100644
> --- a/xen/drivers/passthrough/device_tree.c
> +++ b/xen/drivers/passthrough/device_tree.c
> @@ -126,6 +126,44 @@ int iommu_release_dt_devices(struct domain *d)
>  return 0;
>  }
> 
> +int iommu_remove_dt_device(struct dt_device_node *np)
> +{
> +const struct iommu_ops *ops = iommu_get_ops();
> +struct device *dev = dt_to_dev(np);
> +int rc;
> +
Aren't we missing a check if iommu is enabled?

> +if ( !ops )
> +return -EOPNOTSUPP;
-EINVAL to match the return values returned by other functions?

> +
> +spin_lock(&dtdevs_lock);
> +
> +if ( iommu_dt_device_is_assigned_locked(np) ) {
Incorrect coding style. The closing brace should be placed on the next line.

> +rc = -EBUSY;
> +goto fail;
> +}
> +
> +/*
> + * The driver which supports generic IOMMU DT bindings must have
> + * these callback implemented.
> + */
> +if ( !ops->remove_device ) {
Incorrect coding style. The closing brace should be placed on the next line.

> +rc = -EOPNOTSUPP;
-EINVAL to match the return values returned by other functions?

> +goto fail;
> +}
> +
> +/*
> + * Remove master device from the IOMMU if latter is present and 
> available.
> + */
No need for a multi-line comment style.

> +rc = ops->remove_device(0, dev);
> +
> +if ( rc == 0 )
!rc is preferred.

> +iommu_fwspec_free(dev);
> +
> +fail:
> +spin_unlock(&dtdevs_lock);
> +return rc;
> +}
> +
>  int iommu_add_dt_device(struct dt_device_node *np)
>  {
>  const struct iommu_ops *ops = iommu_get_ops();
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index 4f22fc1bed..1b36c0419d 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -225,6 +225,8 @@ int iommu_release_dt_devices(struct domain *d);
>   */
>  int iommu_add_dt_device(struct dt_device_node *np);
> 
> +int iommu_remove_dt_device(struct dt_device_node *np);
These prototypes look to be placed in order. So your function should be
placed before the add function.

> +
>  int iommu_do_dt_domctl(struct xen_domctl *, struct domain *,
> XEN_GUEST_HANDLE_PARAM(xen_domctl_t));
> 
> --
> 2.17.1
> 
> 

~Michal



Re: [XEN][RFC PATCH v4 09/16] xen/iommu: Introduce iommu_remove_dt_device()

2023-01-23 Thread Julien Grall

Hi,

On 23/01/2023 10:00, Michal Orzel wrote:

Signed-off-by: Vikram Garhwal 
---
  xen/drivers/passthrough/device_tree.c | 38 +++
  xen/include/xen/iommu.h   |  2 ++
  2 files changed, 40 insertions(+)

diff --git a/xen/drivers/passthrough/device_tree.c 
b/xen/drivers/passthrough/device_tree.c
index 457df333a0..a8ba0b0d17 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -126,6 +126,44 @@ int iommu_release_dt_devices(struct domain *d)
  return 0;
  }

+int iommu_remove_dt_device(struct dt_device_node *np)
+{
+const struct iommu_ops *ops = iommu_get_ops();
+struct device *dev = dt_to_dev(np);
+int rc;
+

Aren't we missing a check if iommu is enabled?


+if ( !ops )
+return -EOPNOTSUPP;

-EINVAL to match the return values returned by other functions?


The meaning of -EINVAL is quite overloaded. So it would be better to use
a mix of errno values to help differentiate the error paths.


In this case, '!ops' means there is no possibility (read "support") to
remove the device. So I think -EOPNOTSUPP is suitable.





+
+spin_lock(&dtdevs_lock);
+
+if ( iommu_dt_device_is_assigned_locked(np) ) {

Incorrect coding style. The closing brace should be placed on the next line.


+rc = -EBUSY;
+goto fail;
+}
+
+/*
+ * The driver which supports generic IOMMU DT bindings must have
+ * these callback implemented.
+ */
+if ( !ops->remove_device ) {

Incorrect coding style. The closing brace should be placed on the next line.


+rc = -EOPNOTSUPP;

-EINVAL to match the return values returned by other functions?


Ditto.

Cheers,

--
Julien Grall



Re: [PATCH v1 02/14] xen/riscv: add <asm/asm.h> header

2023-01-23 Thread Jan Beulich
On 20.01.2023 16:31, Andrew Cooper wrote:
> On 20/01/2023 2:59 pm, Oleksii Kurochko wrote:
>> Signed-off-by: Oleksii Kurochko 
> 
> There's some stuff in here which is not RISCV-specific.  We really want
> to dedup with the other architectures and move into common.

I have to admit that I'm not fully convinced in this case: What an arch
may or may not need in support of its assembly code may heavily vary. It
would need to be a very generic thing which could be moved out. Then again
xen/asm.h feels like a slightly odd name with, as kind of already implied
above, assembly code being at times very specific to an architecture
(including e.g. formatting constraints or whether labels are to be
followed by colons).

Jan



Re: [PATCH v1 05/14] xen/riscv: add early_printk_hnum() function

2023-01-23 Thread Jan Beulich
On 20.01.2023 15:59, Oleksii Kurochko wrote:
> Add ability to print hex number.
> It might be useful to print register value as debug information
> in BUG(), WARN(), etc...
> 
> Signed-off-by: Oleksii Kurochko 

Orthogonal to Andrew's reply (following which I think would be best)
a couple of comments which may be applicable elsewhere as well:

> --- a/xen/arch/riscv/early_printk.c
> +++ b/xen/arch/riscv/early_printk.c
> @@ -43,3 +43,42 @@ void early_printk(const char *str)
>  str++;
>  }
>  }
> +
> +static void reverse(char *s, int length)

Please can you get things const-correct (const char *s) and signedness-
correct (unsigned int length) from the beginning. We're converting other
code as we touch it, but this is extremely slow going and hence would
better be avoided in the first place in new code.

> +{
> +int c;
> +char *begin, *end, temp;
> +
> +begin  = s;
> +end= s + length - 1;
> +
> +for ( c = 0; c < length/2; c++ )

Style: Blanks around binary operators.

> +{
> +temp   = *end;
> +*end   = *begin;
> +*begin = temp;
> +
> +begin++;
> +end--;
> +}
> +}
> +
> +void early_printk_hnum(const register_t reg_val)

Likely this function wants to be __init? (All functions that can be
should also be made so.) With that, reverse() then would also want
to become __init.

As to the const here vs the remark further up: In cases like this one
we typically don't use const. You're free to keep it of course, but
I think it should at least be purged from the declaration (and maybe
also the stub).

> +{
> +char hex[] = "0123456789ABCDEF";

static const char __initconst?

> +char buf[17] = {0};
> +
> +register_t num = reg_val;
> +unsigned int count = 0;
> +
> +for ( count = 0; num != 0; count++, num >>= 4 )
> +buf[count] = hex[num & 0x000f];

Just 0xf?

Jan



Re: [PATCH] x86/shadow: sh_type_to_size[] needs L2H entry when HVM+PV32

2023-01-23 Thread Jan Beulich
On 23.01.2023 11:43, Andrew Cooper wrote:
> On 23/01/2023 8:12 am, Jan Beulich wrote:
>> While the table is used only when HVM=y, the table entry of course needs
>> to be properly populated when also PV32=y. Fully removing the table
>> entry was therefore wrong.
>>
>> Fixes: 1894049fa283 ("x86/shadow: L2H shadow type is PV32-only")
>> Signed-off-by: Jan Beulich 
> 
> Erm, why?
> 
> The safety justification for the original patch was that this is HVM
> only code.  And it really is HVM only code - it's genuinely compiled out
> for !HVM builds.

Right, and we have logic taking care of the !HVM case. But that same
logic uses this "HVM-only" table when HVM=y also for all PV types. Hence
the PV32-special type needs to have a non-zero entry when, besides HVM=y,
PV32=y as well.
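
For reference, the consumer being discussed looks roughly like this (a
sketch reconstructed from the description above, not a verbatim quote):

    static inline unsigned int shadow_size(unsigned int shadow_type)
    {
    #ifdef CONFIG_HVM
        /* With HVM=y this table also serves the PV shadow types, hence
         * the L2H entry must be populated when PV32=y too. */
        return sh_type_to_size[shadow_type];
    #else
        /* The !HVM logic gets by without the table. */
        return shadow_type != SH_type_none;
    #endif
    }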

> So if putting this entry back in fixes the regression OSSTest
> identified, then either SH_type_l2h_64_shadow isn't PV32-only, or we
> have PV guests entering HVM-only logic.  Either way, the precondition
> for correctness of the original patch is violated, and it needs
> reverting on those grounds alone.

I disagree - the table isn't needed when !HVM, and as such can be
considered HVM-only. It merely needs to deal with all cases correctly
when HVM=y.

Jan



Re: [PATCH v1 02/14] xen/riscv: add <asm/asm.h> header

2023-01-23 Thread Andrew Cooper
On 23/01/2023 11:00 am, Jan Beulich wrote:
> On 20.01.2023 16:31, Andrew Cooper wrote:
>> On 20/01/2023 2:59 pm, Oleksii Kurochko wrote:
>>> Signed-off-by: Oleksii Kurochko 
>> There's some stuff in here which is not RISCV-specific.  We really want
>> to dedup with the other architectures and move into common.
> I have to admit that I'm not fully convinced in this case: What an arch
> may or may not need in support of its assembly code may heavily vary. It
> would need to be very generic thing which could be moved out. Then again
> xen/asm.h feels like slightly odd a name with, as kind of already implied
> above, assembly code being at times very specific to an architecture
> (including e.g. formatting constraints or whether labels are to be
> followed by colons).

Half of this header file is re-inventing generic concepts that we
already spell differently in the Xen codebase.

It is the difference between bolting something on the side, and
integrating the code properly.

~Andrew


Re: [PATCH v1 06/14] xen/riscv: introduce exception context

2023-01-23 Thread Jan Beulich
On 20.01.2023 15:59, Oleksii Kurochko wrote:
> +/* On stack VCPU state */
> +struct cpu_user_regs
> +{
> +register_t zero;
> +register_t ra;
> +register_t sp;
> +register_t gp;
> +register_t tp;
> +register_t t0;
> +register_t t1;
> +register_t t2;
> +register_t s0;
> +register_t s1;
> +register_t a0;
> +register_t a1;
> +register_t a2;
> +register_t a3;
> +register_t a4;
> +register_t a5;
> +register_t a6;
> +register_t a7;
> +register_t s2;
> +register_t s3;
> +register_t s4;
> +register_t s5;
> +register_t s6;
> +register_t s7;
> +register_t s8;
> +register_t s9;
> +register_t s10;
> +register_t s11;
> +register_t t3;
> +register_t t4;
> +register_t t5;
> +register_t t6;
> +register_t sepc;
> +register_t sstatus;
> +/* pointer to previous stack_cpu_regs */
> +register_t pregs;
> +};

What is the planned correlation of this to what x86 and Arm have in their
public headers (under the same name)? I think the public header wants
spelling out first, and if a different internal structure is intended to
be used, the interaction between the two would then want outlining in
the description here.

Jan



Re: [PATCH v1 07/14] xen/riscv: introduce exception handlers implementation

2023-01-23 Thread Jan Beulich
On 20.01.2023 15:59, Oleksii Kurochko wrote:
> --- /dev/null
> +++ b/xen/arch/riscv/entry.S
> @@ -0,0 +1,97 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +.global handle_exception
> +.align 4
> +
> +handle_exception:
> +
> +/* Exceptions from xen */
> +save_to_stack:
> +/* Save context to stack */
> +REG_S   sp, (RISCV_CPU_USER_REGS_OFFSET(sp) - 
> RISCV_CPU_USER_REGS_SIZE) (sp)
> +addisp, sp, -RISCV_CPU_USER_REGS_SIZE
> +REG_S   t0, RISCV_CPU_USER_REGS_OFFSET(t0)(sp)
> +j   save_context
> +
> +save_context:

Just curious: Why not simply fall through here, i.e. why the J which really
is a NOP in this case?

Jan



Re: [PATCH v1 12/14] xen/riscv: introduce an implementation of macros from <asm/bug.h>

2023-01-23 Thread Jan Beulich
On 20.01.2023 15:59, Oleksii Kurochko wrote:
> --- /dev/null
> +++ b/xen/arch/riscv/include/asm/bug.h
> @@ -0,0 +1,120 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2012 Regents of the University of California
> + * Copyright (C) 2021-2023 Vates
> + *
> + */
> +
> +#ifndef _ASM_RISCV_BUG_H
> +#define _ASM_RISCV_BUG_H
> +
> +#include 
> +#include 
> +
> +#ifndef __ASSEMBLY__
> +
> +struct bug_frame {
> +signed int loc_disp;/* Relative address to the bug address */
> +signed int file_disp;   /* Relative address to the filename */
> +signed int msg_disp;/* Relative address to the predicate (for 
> ASSERT) */
> +uint16_t line;  /* Line number */
> +uint32_t pad0:16;   /* Padding for 8-bytes align */
> +};
> +
> +#define bug_loc(b) ((const void *)(b) + (b)->loc_disp)
> +#define bug_file(b) ((const void *)(b) + (b)->file_disp);
> +#define bug_line(b) ((b)->line)
> +#define bug_msg(b) ((const char *)(b) + (b)->msg_disp)
> +
> +#define BUGFRAME_run_fn 0
> +#define BUGFRAME_warn   1
> +#define BUGFRAME_bug2
> +#define BUGFRAME_assert 3
> +
> +#define BUGFRAME_NR 4
> +
> +#define __INSN_LENGTH_MASK  _UL(0x3)
> +#define __INSN_LENGTH_32_UL(0x3)
> +#define __COMPRESSED_INSN_MASK   _UL(0x)
> +
> +#define __BUG_INSN_32_UL(0x00100073) /* ebreak */
> +#define __BUG_INSN_16_UL(0x9002) /* c.ebreak */

May I suggest that you avoid double-underscore (or other reserved) names
where possible?

> +#define GET_INSN_LENGTH(insn)
> \
> +({   \
> + unsigned long __len;\
> + __len = ((insn & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) ? \
> + 4UL : 2UL;  \
> + __len;  \
> +})
> +
> +typedef u32 bug_insn_t;

This is problematic beyond the u32 instead of uint32_t. You use it once, ...

> +/* These are defined by the architecture */
> +int is_valid_bugaddr(bug_insn_t addr);

... in a call to this function, but you can't assume that you can access
32 bits when the insn you look at might be a compressed one. Just to be
on the safe side I'd like to suggest to either avoid such a type, or to
introduce two (32- and 16-bit) which then of course need using properly
in respective contexts.
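
A sketch of the two-step access this suggests (read_insn16() is a
hypothetical helper, not existing code):

    uint16_t lo = read_insn16(addr);   /* always-valid 16-bit read */
    uint32_t insn = lo;

    if ( (lo & __INSN_LENGTH_MASK) == __INSN_LENGTH_32 )
        insn |= (uint32_t)read_insn16(addr + 2) << 16;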


> +#define BUG_FN_REG t0
> +
> +/* Many versions of GCC doesn't support the asm %c parameter which would
> + * be preferable to this unpleasantness. We use mergeable string
> + * sections to avoid multiple copies of the string appearing in the
> + * Xen image. BUGFRAME_run_fn needs to be handled separately.
> + */
> +#define BUG_FRAME(type, line, file, has_msg, msg) do {  \
> +asm ("1:ebreak\n"
> \

Something's odd with the padding here; looks like there might be hard tabs.

> + ".pushsection .rodata.str, \"aMS\", %progbits, 1\n"\
> + "2:\t.asciz " __stringify(file) "\n"   \
> + "3:\n" \
> + ".if " #has_msg "\n"   \
> + "\t.asciz " #msg "\n"  \
> + ".endif\n" \
> + ".popsection\n"\
> + ".pushsection .bug_frames." __stringify(type) ", \"a\", 
> %progbits\n"\
> + "4:\n" \
> + ".p2align 2\n" \
> + ".long (1b - 4b)\n"\
> + ".long (2b - 4b)\n"\
> + ".long (3b - 4b)\n"\
> + ".hword " __stringify(line) ", 0\n"\
> + ".popsection");\
> +} while (0)
> +
> +/*
> + * GCC will not allow to use "i"  when PIE is enabled (Xen doesn't set the
> + * flag but instead rely on the default value from the compiler). So the
> + * easiest way to implement run_in_exception_handler() is to pass the to
> + * be called function in a fixed register.
> + */
> +#define  run_in_exception_handler(fn) do {  \

With

register void *fn_ asm(__stringify(BUG_FN_REG)) = (fn);

you should be able to avoid ...

> +asm ("mv " __stringify(BUG_FN_REG) ", %0\n"  
> \

... this and simply use ...

> + "1:ebreak\n"
> 

Re: [PATCH v1 07/14] xen/riscv: introduce exception handlers implementation

2023-01-23 Thread Andrew Cooper
On 20/01/2023 2:59 pm, Oleksii Kurochko wrote:
> diff --git a/xen/arch/riscv/entry.S b/xen/arch/riscv/entry.S
> new file mode 100644
> index 00..f7d46f42bb
> --- /dev/null
> +++ b/xen/arch/riscv/entry.S
> @@ -0,0 +1,97 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +.global handle_exception
> +.align 4
> +
> +handle_exception:

ENTRY() which takes care of the global and the align.

Also, you want a size and type at the end, just like in head.S  (Sorry,
we *still* don't have any sane infrastructure for doing that nicely. 
Opencode it for now.)

> +
> +/* Exceptions from xen */
> +save_to_stack:

This label isn't used at all, is it?

> +/* Save context to stack */
> +REG_S   sp, (RISCV_CPU_USER_REGS_OFFSET(sp) - 
> RISCV_CPU_USER_REGS_SIZE) (sp)
> +addisp, sp, -RISCV_CPU_USER_REGS_SIZE
> +REG_S   t0, RISCV_CPU_USER_REGS_OFFSET(t0)(sp)

Exceptions on RISC-V don't adjust the stack pointer.  This logic depends
on interrupting Xen code, and Xen not having suffered a stack overflow
(and actually, that the space on the stack for all registers also
doesn't overflow).

Which might be fine for now, but I think it warrants a comment somewhere
(probably at handle_exception itself) stating the expectations while
it's still a work in progress.  So in this case something like:

/* Work-in-progress:  Depends on interrupting Xen, and the stack being
good. */


But, do we want to allocate stemp right away (even with an empty
struct), and get tp set up properly?

That said, aren't we going to have to rewrite this when enabling H mode
anyway?

> +j   save_context
> +
> +save_context:

I'd drop this.  It's a nop right now.

> 
> +csrrt0, CSR_SEPC
> +REG_S   t0, RISCV_CPU_USER_REGS_OFFSET(sepc)(sp)
> +csrrt0, CSR_SSTATUS
> +REG_S   t0, RISCV_CPU_USER_REGS_OFFSET(sstatus)(sp)

So something I've noticed about CSRs through this series.

The C CSR macros are set up to use real CSR names, but the CSR_*
constants used in C and ASM are raw numbers.

If we're using raw numbers, then the C CSR accessors should be static
inlines instead, but the advantage of using names is the toolchain can
issue an error when we reference a CSR not supported by the current
extensions.

We ought to use a single form, consistently through Xen.  How feasible
will it be to use names throughout?
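
As an assumption of what "using names" could look like (a sketch, not
the series' code): paste the CSR name into the asm template, so the
assembler validates it against the enabled extensions:

    #define csr_read(csr)                              \
    ({                                                 \
        unsigned long v_;                              \
        asm volatile ( "csrr %0, " #csr : "=r" (v_) ); \
        v_;                                            \
    })

    /* e.g.: unsigned long status = csr_read(sstatus); */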

~Andrew


Re: [PATCH v1 06/14] xen/riscv: introduce exception context

2023-01-23 Thread Oleksii
On Fri, 2023-01-20 at 15:54 +, Andrew Cooper wrote:
> On 20/01/2023 2:59 pm, Oleksii Kurochko wrote:
> > diff --git a/xen/arch/riscv/include/asm/processor.h
> > b/xen/arch/riscv/include/asm/processor.h
> > new file mode 100644
> > index 00..5898a09ce6
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/processor.h
> > @@ -0,0 +1,114 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > *
> > + *
> > + * Copyright 2019 (C) Alistair Francis 
> > + * Copyright 2021 (C) Bobby Eshleman 
> > + * Copyright 2023 (C) Vates
> > + *
> > + */
> > +
> > +#ifndef _ASM_RISCV_PROCESSOR_H
> > +#define _ASM_RISCV_PROCESSOR_H
> > +
> > +#include 
> > +
> > +#define RISCV_CPU_USER_REGS_zero    0
> > +#define RISCV_CPU_USER_REGS_ra  1
> > +#define RISCV_CPU_USER_REGS_sp  2
> > +#define RISCV_CPU_USER_REGS_gp  3
> > +#define RISCV_CPU_USER_REGS_tp  4
> > +#define RISCV_CPU_USER_REGS_t0  5
> > +#define RISCV_CPU_USER_REGS_t1  6
> > +#define RISCV_CPU_USER_REGS_t2  7
> > +#define RISCV_CPU_USER_REGS_s0  8
> > +#define RISCV_CPU_USER_REGS_s1  9
> > +#define RISCV_CPU_USER_REGS_a0  10
> > +#define RISCV_CPU_USER_REGS_a1  11
> > +#define RISCV_CPU_USER_REGS_a2  12
> > +#define RISCV_CPU_USER_REGS_a3  13
> > +#define RISCV_CPU_USER_REGS_a4  14
> > +#define RISCV_CPU_USER_REGS_a5  15
> > +#define RISCV_CPU_USER_REGS_a6  16
> > +#define RISCV_CPU_USER_REGS_a7  17
> > +#define RISCV_CPU_USER_REGS_s2  18
> > +#define RISCV_CPU_USER_REGS_s3  19
> > +#define RISCV_CPU_USER_REGS_s4  20
> > +#define RISCV_CPU_USER_REGS_s5  21
> > +#define RISCV_CPU_USER_REGS_s6  22
> > +#define RISCV_CPU_USER_REGS_s7  23
> > +#define RISCV_CPU_USER_REGS_s8  24
> > +#define RISCV_CPU_USER_REGS_s9  25
> > +#define RISCV_CPU_USER_REGS_s10 26
> > +#define RISCV_CPU_USER_REGS_s11 27
> > +#define RISCV_CPU_USER_REGS_t3  28
> > +#define RISCV_CPU_USER_REGS_t4  29
> > +#define RISCV_CPU_USER_REGS_t5  30
> > +#define RISCV_CPU_USER_REGS_t6  31
> > +#define RISCV_CPU_USER_REGS_sepc    32
> > +#define RISCV_CPU_USER_REGS_sstatus 33
> > +#define RISCV_CPU_USER_REGS_pregs   34
> > +#define RISCV_CPU_USER_REGS_last    35
> 
> This block wants moving into the asm-offsets infrastructure, but I
> suspect they won't want to survive in this form.
> 
> edit: yeah, definitely not this form.  RISCV_CPU_USER_REGS_OFFSET is
> a
> recipe for bugs.
> 
Thanks for the recommendation I'll take it into account during a work
on new version of the patch series.

> > +
> > +#define RISCV_CPU_USER_REGS_OFFSET(x)   ((RISCV_CPU_USER_REGS_##x)
> > * __SIZEOF_POINTER__)
> > +#define RISCV_CPU_USER_REGS_SIZE   
> > RISCV_CPU_USER_REGS_OFFSET(last)
> > +
> > +#ifndef __ASSEMBLY__
> > +
> > +/* On stack VCPU state */
> > +struct cpu_user_regs
> > +{
> > +    register_t zero;
> 
> unsigned long.
Why is it better to define them as 'unsigned long' instead of
register_t?
> 
> > +    register_t ra;
> > +    register_t sp;
> > +    register_t gp;
> > +    register_t tp;
> > +    register_t t0;
> > +    register_t t1;
> > +    register_t t2;
> > +    register_t s0;
> > +    register_t s1;
> > +    register_t a0;
> > +    register_t a1;
> > +    register_t a2;
> > +    register_t a3;
> > +    register_t a4;
> > +    register_t a5;
> > +    register_t a6;
> > +    register_t a7;
> > +    register_t s2;
> > +    register_t s3;
> > +    register_t s4;
> > +    register_t s5;
> > +    register_t s6;
> > +    register_t s7;
> > +    register_t s8;
> > +    register_t s9;
> > +    register_t s10;
> > +    register_t s11;
> > +    register_t t3;
> > +    register_t t4;
> > +    register_t t5;
> > +    register_t t6;
> > +    register_t sepc;
> > +    register_t sstatus;
> > +    /* pointer to previous stack_cpu_regs */
> > +    register_t pregs;
> 
> Stale comment?  Also, surely this wants to be cpu_user_regs *pregs; ?
> 
Not really.
Another structure will be introduced later:
struct pcpu_info {
...
struct cpu_user_regs *stack_cpu_regs;
...
};
And stack_cpu_regs will be updated during context saving before jumping to
__handle_exception:

/* new_stack_cpu_regs.pregs = old_stack_cpu_res */
REG_L   t0, RISCV_PCPUINFO_OFFSET(stack_cpu_regs)(tp)
REG_S   t0, RISCV_CPU_USER_REGS_OFFSET(pregs)(sp)
/* Update stack_cpu_regs */
REG_S   sp, RISCV_PCPUINFO_OFFSET(stack_cpu_regs)(tp)
And I skipped this part as pcpu_info isn't used anywhere yet, but I
reserved some space for pregs in advance.

> > +};
> > +
> > +static inline void wait_for_interrupt(void)
> 
> There's no point writing out the name in longhand for a wrapper
> around a
> single instruction.
> 
Will change it to "... wf

Re: [PATCH v1 05/14] xen/riscv: add early_printk_hnum() function

2023-01-23 Thread Oleksii
On Fri, 2023-01-20 at 15:39 +, Andrew Cooper wrote:
> On 20/01/2023 2:59 pm, Oleksii Kurochko wrote:
> > Add ability to print hex number.
> > It might be useful to print register value as debug information
> > in BUG(), WARN(), etc...
> > 
> > Signed-off-by: Oleksii Kurochko 
> 
> I think it would be better to get s(n)printf() working than to take
> these.  We're going to need to get it working soon anyway, and it will
> be much easier than doing the full printk() infrastructure.
> 
Agree here.

I re-checked the patch and I do not actually use this function anywhere
now (it looks like it was needed only for my personal debug stuff).

This patch can be dropped now.
> ~Andrew




Re: [PATCH v1 08/14] xen/riscv: introduce decode_cause() stuff

2023-01-23 Thread Andrew Cooper
On 20/01/2023 2:59 pm, Oleksii Kurochko wrote:
> diff --git a/xen/arch/riscv/traps.c b/xen/arch/riscv/traps.c
> index 3201b851ef..dd64f053a5 100644
> --- a/xen/arch/riscv/traps.c
> +++ b/xen/arch/riscv/traps.c
> @@ -4,8 +4,96 @@
>   *
>   * RISC-V Trap handlers
>   */
> +#include 
> +#include 
>  #include 
>  #include 
> +#include 
> +
> +const char *decode_trap_cause(unsigned long cause)

These should be static as you've not put a declaration in a header
file.  But as it stands, you'll then get a compiler warning on
decode_cause() as it's not used.

I would merge this patch with the following patch, as the following
patch is very related to this, and then you can get everything nicely
static without unused warnings.

> +{
> +switch ( cause )
> +{
> +case CAUSE_MISALIGNED_FETCH:
> +return "Instruction Address Misaligned";
> +case CAUSE_FETCH_ACCESS:
> +return "Instruction Access Fault";
> +case CAUSE_ILLEGAL_INSTRUCTION:
> +return "Illegal Instruction";
> +case CAUSE_BREAKPOINT:
> +return "Breakpoint";
> +case CAUSE_MISALIGNED_LOAD:
> +return "Load Address Misaligned";
> +case CAUSE_LOAD_ACCESS:
> +return "Load Access Fault";
> +case CAUSE_MISALIGNED_STORE:
> +return "Store/AMO Address Misaligned";
> +case CAUSE_STORE_ACCESS:
> +return "Store/AMO Access Fault";
> +case CAUSE_USER_ECALL:
> +return "Environment Call from U-Mode";
> +case CAUSE_SUPERVISOR_ECALL:
> +return "Environment Call from S-Mode";
> +case CAUSE_MACHINE_ECALL:
> +return "Environment Call from M-Mode";
> +case CAUSE_FETCH_PAGE_FAULT:
> +return "Instruction Page Fault";
> +case CAUSE_LOAD_PAGE_FAULT:
> +return "Load Page Fault";
> +case CAUSE_STORE_PAGE_FAULT:
> +return "Store/AMO Page Fault";
> +case CAUSE_FETCH_GUEST_PAGE_FAULT:
> +return "Instruction Guest Page Fault";
> +case CAUSE_LOAD_GUEST_PAGE_FAULT:
> +return "Load Guest Page Fault";
> +case CAUSE_VIRTUAL_INST_FAULT:
> +return "Virtualized Instruction Fault";
> +case CAUSE_STORE_GUEST_PAGE_FAULT:
> +return "Guest Store/AMO Page Fault";
> +default:
> +return "UNKNOWN";

This style tends to lead to poor code generation.  You probably want:

const char *decode_trap_cause(unsigned long cause)
{
    static const char *const trap_causes[] = {
        [CAUSE_MISALIGNED_FETCH] = "Instruction Address Misaligned",
        ...
        [CAUSE_STORE_GUEST_PAGE_FAULT] = "Guest Store/AMO Page Fault",
    };

    if ( cause < ARRAY_SIZE(trap_causes) && trap_causes[cause] )
        return trap_causes[cause];
    return "UNKNOWN";
}

(note the trailing comma on the final entry, which is there to simplify
future diffs)

However, given the hope to get snprintf() wired up, you actually want
to adjust this to:

    if ( cause < ARRAY_SIZE(trap_causes) )
        return trap_causes[cause];
    return NULL;

And render the raw cause number for the unknown case, because that is
far more useful for whomever is debugging.

~Andrew


Re: [PATCH] x86/shadow: sh_type_to_size[] needs L2H entry when HVM+PV32

2023-01-23 Thread Andrew Cooper
On 23/01/2023 10:47 am, Jan Beulich wrote:
> On 23.01.2023 11:43, Andrew Cooper wrote:
>> On 23/01/2023 8:12 am, Jan Beulich wrote:
>>> While the table is used only when HVM=y, the table entry of course needs
>>> to be properly populated when also PV32=y. Fully removing the table
>>> entry was therefore wrong.
>>>
>>> Fixes: 1894049fa283 ("x86/shadow: L2H shadow type is PV32-only")
>>> Signed-off-by: Jan Beulich 
>> Erm, why?
>>
>> The safety justification for the original patch was that this is HVM
>> only code.  And it really is HVM only code - it's genuinely compiled out
>> for !HVM builds.
> Right, and we have logic taking care of the !HVM case. But that same
> logic uses this "HVM-only" table when HVM=y also for all PV types.

Ok - this is what needs fixing then.

This is a layering violation which has successfully tricked you into
making a buggy patch.

I'm unwilling to bet this will be the final time either...  "this file
is HVM-only, therefore no PV paths enter it" is a reasonable
expectation, and should be true.

~Andrew


Re: [PATCH] x86/shadow: sh_type_to_size[] needs L2H entry when HVM+PV32

2023-01-23 Thread Jan Beulich
On 23.01.2023 13:30, Andrew Cooper wrote:
> On 23/01/2023 10:47 am, Jan Beulich wrote:
>> On 23.01.2023 11:43, Andrew Cooper wrote:
>>> On 23/01/2023 8:12 am, Jan Beulich wrote:
 While the table is used only when HVM=y, the table entry of course needs
 to be properly populated when also PV32=y. Fully removing the table
 entry was therefore wrong.

 Fixes: 1894049fa283 ("x86/shadow: L2H shadow type is PV32-only")
 Signed-off-by: Jan Beulich 
>>> Erm, why?
>>>
>>> The safety justification for the original patch was that this is HVM
>>> only code.  And it really is HVM only code - it's genuinely compiled out
>>> for !HVM builds.
>> Right, and we have logic taking care of the !HVM case. But that same
>> logic uses this "HVM-only" table when HVM=y also for all PV types.
> 
> Ok - this is what needs fixing then.
> 
> This is a layering violation which has successfully tricked you into
> making a buggy patch.
> 
> I'm unwilling to bet this will be the final time either...  "this file
> is HVM-only, therefore no PV paths enter it" is a reasonable
> expectation, and should be true.

Nice abstract consideration, but would you mind pointing out how you
envision shadow_size() looking while meeting your constraints _and_ meeting my
demand of no excess #ifdef-ary? The way I'm reading your reply is that
you ask to special case L2H _right in_ shadow_size(). Then again see also
my remark in the original (now known faulty) patch regarding such special
casing. I could of course follow that route, regardless of HVM (i.e.
unlike said there not just for the #else part) ...

Jan



Re: [PATCH v1 06/14] xen/riscv: introduce exception context

2023-01-23 Thread Andrew Cooper
On 23/01/2023 12:03 pm, Oleksii wrote:
> On Fri, 2023-01-20 at 15:54 +, Andrew Cooper wrote:
>> On 20/01/2023 2:59 pm, Oleksii Kurochko wrote:
>>> +
>>> +#define RISCV_CPU_USER_REGS_OFFSET(x)   ((RISCV_CPU_USER_REGS_##x)
>>> * __SIZEOF_POINTER__)
>>> +#define RISCV_CPU_USER_REGS_SIZE   
>>> RISCV_CPU_USER_REGS_OFFSET(last)
>>> +
>>> +#ifndef __ASSEMBLY__
>>> +
>>> +/* On stack VCPU state */
>>> +struct cpu_user_regs
>>> +{
>>> +    register_t zero;
>> unsigned long.
> Why is it better to define them as 'unsigned long' instead of
> register_t?

Because there is a material cost to deliberately hiding the type, in
terms of code clarity and legibility.

Things like register_t and vaddr_t are nonsense in a POSIX-y build
environment where these things are spelled "unsigned long", not to
mention that the associated infrastructure is longer than the
non-obfuscated form.

~Andrew


[PATCH] automation: Modify static-mem check in qemu-smoke-dom0less-arm64.sh

2023-01-23 Thread Michal Orzel
At the moment, the static-mem check relies on the way Xen exposes the
memory banks in device tree. As this might change, the check should be
modified to be generic and not to rely on device tree. In this case,
let's use /proc/iomem which exposes the memory ranges in %08x format
as follows:
<start>-<end> : <description>

This way, we can grep in /proc/iomem for an entry containing memory
region defined by the static-mem configuration with "System RAM"
description. If it exists, mark the test as passed. Also, take the
opportunity to add the 0x prefix to the domu_{base,size} definitions rather
than adding it in front of each occurrence.

Signed-off-by: Michal Orzel 
---
Patch made as part of the discussion:
https://lore.kernel.org/xen-devel/ba37ee02-c07c-2803-0867-149c77989...@amd.com/

CC: Julien, Ayan
---
 automation/scripts/qemu-smoke-dom0less-arm64.sh | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/automation/scripts/qemu-smoke-dom0less-arm64.sh b/automation/scripts/qemu-smoke-dom0less-arm64.sh
index 2b59346fdcfd..182a4b6c18fc 100755
--- a/automation/scripts/qemu-smoke-dom0less-arm64.sh
+++ b/automation/scripts/qemu-smoke-dom0less-arm64.sh
@@ -16,14 +16,13 @@ fi
 
 if [[ "${test_variant}" == "static-mem" ]]; then
 # Memory range that is statically allocated to DOM1
-domu_base="5000"
-domu_size="1000"
+domu_base="0x5000"
+domu_size="0x1000"
 passed="${test_variant} test passed"
 domU_check="
-current=\$(hexdump -e '16/1 \"%02x\"' /proc/device-tree/memory@${domu_base}/reg 2>/dev/null)
-expected=$(printf \"%016x%016x\" 0x${domu_base} 0x${domu_size})
-if [[ \"\${expected}\" == \"\${current}\" ]]; then
-   echo \"${passed}\"
+mem_range=$(printf \"%08x-%08x\" ${domu_base} $(( ${domu_base} + ${domu_size} - 1 )))
+if grep -q -x \"\${mem_range} : System RAM\" /proc/iomem; then
+echo \"${passed}\"
 fi
 "
 fi
@@ -126,7 +125,7 @@ UBOOT_SOURCE="boot.source"
 UBOOT_SCRIPT="boot.scr"' > binaries/config
 
 if [[ "${test_variant}" == "static-mem" ]]; then
-echo -e "\nDOMU_STATIC_MEM[0]=\"0x${domu_base} 0x${domu_size}\"" >> binaries/config
+echo -e "\nDOMU_STATIC_MEM[0]=\"${domu_base} ${domu_size}\"" >> binaries/config
 fi
 
 if [[ "${test_variant}" == "boot-cpupools" ]]; then
-- 
2.25.1




Re: [PATCH v1 07/14] xen/riscv: introduce exception handlers implementation

2023-01-23 Thread Jan Beulich
On 23.01.2023 12:50, Andrew Cooper wrote:
> On 20/01/2023 2:59 pm, Oleksii Kurochko wrote:
>> +csrr    t0, CSR_SEPC
>> +REG_S   t0, RISCV_CPU_USER_REGS_OFFSET(sepc)(sp)
>> +csrr    t0, CSR_SSTATUS
>> +REG_S   t0, RISCV_CPU_USER_REGS_OFFSET(sstatus)(sp)
> 
> So something I've noticed about CSRs through this series.
> 
> The C CSR macros are set up to use real CSR names, but the CSR_*
> constants used in C and ASM are raw numbers.
> 
> If we're using raw numbers, then the C CSR accessors should be static
> inlines instead, but the advantage of using names is the toolchain can
> issue an error when we reference a CSR not supported by the current
> extensions.

That's a default-off diagnostic iirc, so we'd gain something here only
when explicitly turning that on as well.

Jan



[XEN v3 1/3] xen/arm: Use the correct format specifier

2023-01-23 Thread Ayan Kumar Halder
1. One should use 'PRIpaddr' to display 'paddr_t' variables. However,
when creating nodes in the fdt, an address present in the node name
should be represented using 'PRIx64'. This is to conform with the
following rule from https://elinux.org/Device_Tree_Linux

. node names
"unit-address does not have leading zeros"

As 'PRIpaddr' introduces leading zeros, we cannot use it.

So, we have introduced a wrapper, ie domain_fdt_begin_node(), which will
represent the physical address using 'PRIx64'.
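
For illustration (sketch only, not part of the patch; this assumes
PRIpaddr expands to a zero-padded format such as "016lx", as on Arm):

    paddr_t addr = 0x47000000;

    printk("memory@%"PRIpaddr"\n", addr);         /* memory@0000000047000000 */
    printk("memory@%"PRIx64"\n", (uint64_t)addr); /* memory@47000000 */

Only the latter yields a node name without leading zeros.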

2. One should use 'PRIx64' to display 'u64' in hex format. The current
use of 'PRIpaddr' for printing PTE is buggy as this is not a physical
address.

Signed-off-by: Ayan Kumar Halder 
---

Changes from -

v1 - 1. Moved the patch earlier.
2. Moved a part of change from "[XEN v1 8/9] xen/arm: Other adaptations 
required to support 32bit paddr"
into this patch.

v2 - 1. Use PRIx64 for appending addresses to fdt node names. This fixes the CI 
failure.

 xen/arch/arm/domain_build.c | 45 +
 xen/arch/arm/gic-v2.c   |  6 ++---
 xen/arch/arm/mm.c   |  2 +-
 3 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index f35f4d2456..97c2395f9a 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1288,6 +1288,20 @@ static int __init fdt_property_interrupts(const struct kernel_info *kinfo,
 return res;
 }
 
+static int __init domain_fdt_begin_node(void *fdt, const char *name,
+uint64_t unit)
+{
+/*
+ * The size of the buffer to hold the longest possible string ie
+ * interrupt-controller@ + a 64-bit number + \0
+ */
+char buf[38];
+
+/* ePAPR 3.4 */
+snprintf(buf, sizeof(buf), "%s@%"PRIx64, name, unit);
+return fdt_begin_node(fdt, buf);
+}
+
 static int __init make_memory_node(const struct domain *d,
void *fdt,
int addrcells, int sizecells,
@@ -1296,8 +1310,6 @@ static int __init make_memory_node(const struct domain *d,
 unsigned int i;
 int res, reg_size = addrcells + sizecells;
 int nr_cells = 0;
-/* Placeholder for memory@ + a 64-bit number + \0 */
-char buf[24];
 __be32 reg[NR_MEM_BANKS * 4 /* Worst case addrcells + sizecells */];
 __be32 *cells;
 
@@ -1314,9 +1326,7 @@ static int __init make_memory_node(const struct domain *d,
 
 dt_dprintk("Create memory node\n");
 
-/* ePAPR 3.4 */
-snprintf(buf, sizeof(buf), "memory@%"PRIx64, mem->bank[i].start);
-res = fdt_begin_node(fdt, buf);
+res = domain_fdt_begin_node(fdt, "memory", mem->bank[i].start);
 if ( res )
 return res;
 
@@ -1375,16 +1385,13 @@ static int __init make_shm_memory_node(const struct domain *d,
 {
 uint64_t start = mem->bank[i].start;
 uint64_t size = mem->bank[i].size;
-/* Placeholder for xen-shmem@ + a 64-bit number + \0 */
-char buf[27];
 const char compat[] = "xen,shared-memory-v1";
 /* Worst case addrcells + sizecells */
 __be32 reg[GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS];
 __be32 *cells;
 unsigned int len = (addrcells + sizecells) * sizeof(__be32);
 
-snprintf(buf, sizeof(buf), "xen-shmem@%"PRIx64, mem->bank[i].start);
-res = fdt_begin_node(fdt, buf);
+res = domain_fdt_begin_node(fdt, "xen-shmem", mem->bank[i].start);
 if ( res )
 return res;
 
@@ -2716,12 +2723,9 @@ static int __init make_gicv2_domU_node(struct kernel_info *kinfo)
 __be32 reg[(GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) * 2];
 __be32 *cells;
 const struct domain *d = kinfo->d;
-/* Placeholder for interrupt-controller@ + a 64-bit number + \0 */
-char buf[38];
 
-snprintf(buf, sizeof(buf), "interrupt-controller@%"PRIx64,
- vgic_dist_base(&d->arch.vgic));
-res = fdt_begin_node(fdt, buf);
+res = domain_fdt_begin_node(fdt, "interrupt-controller",
+vgic_dist_base(&d->arch.vgic));
 if ( res )
 return res;
 
@@ -2771,14 +2775,10 @@ static int __init make_gicv3_domU_node(struct kernel_info *kinfo)
 int res = 0;
 __be32 *reg, *cells;
 const struct domain *d = kinfo->d;
-/* Placeholder for interrupt-controller@ + a 64-bit number + \0 */
-char buf[38];
 unsigned int i, len = 0;
 
-snprintf(buf, sizeof(buf), "interrupt-controller@%"PRIx64,
- vgic_dist_base(&d->arch.vgic));
-
-res = fdt_begin_node(fdt, buf);
+res = domain_fdt_begin_node(fdt, "interrupt-controller",
+vgic_dist_base(&d->arch.vgic));
 if ( res )
 return res;
 
@@ -2858,11 +2858,8 @@ static int __init make_vpl011_uart_node(struct kernel_info *kinfo)
 __be32 reg[GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS];
 __be32 *cells;
 struct domain *d = kinfo->d;
-

[XEN v3 0/3] Pre-requisite patches for supporting 32 bit physical address

2023-01-23 Thread Ayan Kumar Halder
Hi All,

These series include some patches and fixes identified during the review of
"[XEN v2 00/11] Add support for 32 bit physical address".

Patch 1/3 : The previous version causes CI to fail. This patch attempts to fix
this.

Patch 2/3 : This was pointed by Jan during the review of
"[XEN v2 05/11] xen/arm: Use paddr_t instead of u64 for address/size".
Similar to Patch 1/3, this can also be considered as a pre-req for supporting
32 bit physical address.

Patch 3/3 : This was also pointed by Jan during the review of
"[XEN v2 05/11] xen/arm: Use paddr_t instead of u64 for address/size".

Ayan Kumar Halder (3):
  xen/arm: Use the correct format specifier
  xen/drivers: ns16550: Fix the use of simple_strtoul() for extracting u64
  xen/drivers: ns16550: Fix an incorrect assignment to uart->io_size

 xen/arch/arm/domain_build.c | 45 +
 xen/arch/arm/gic-v2.c   |  6 ++---
 xen/arch/arm/mm.c   |  2 +-
 xen/drivers/char/ns16550.c  |  6 ++---
 4 files changed, 28 insertions(+), 31 deletions(-)

-- 
2.17.1




Re: [PATCH v1 03/14] xen/riscv: add <asm/riscv_encoding.h>

2023-01-23 Thread Jan Beulich
On 20.01.2023 15:59, Oleksii Kurochko wrote:
> Signed-off-by: Oleksii Kurochko 

I was about to commit this, but ...

> --- /dev/null
> +++ b/xen/arch/riscv/include/asm/riscv_encoding.h
> @@ -0,0 +1,945 @@
> +/* SPDX-License-Identifier: (GPL-2.0-or-later OR BSD-2-Clause) */
> +/*
> + * Copyright (c) 2019 Western Digital Corporation or its affiliates.
> + *
> + * Authors:
> + *   Anup Patel 

... this raises a patch authorship question: Are you missing her/his
S-o-b: and/or From:? 

> + * The source has been largely adapted from OpenSBI:
> + * include/sbi/riscv_encodnig.h

Nit: Typo.

> + * 

Nit: trailing blank.

There also look to be hard tabs in the file. This is fine if the file
is being imported (almost) verbatim from elsewhere, but then the origin
wants stating in an Origin: tag (see docs/process/sending-patches.pandoc).

>[...]
> +#define IMM_I(insn)  ((s32)(insn) >> 20)
> +#define IMM_S(insn)  (((s32)(insn) >> 25 << 5) | \
> +  (s32)(((insn) >> 7) & 0x1f))

Please can you avoid introducing new instances of s<N> or u<N>? See
./CODING_STYLE.

Jan



[XEN v3 2/3] xen/drivers: ns16550: Fix the use of simple_strtoul() for extracting u64

2023-01-23 Thread Ayan Kumar Halder
One should be using simple_strtoull() (instead of simple_strtoul())
to assign a value to a 'u64' variable. The reason is that u64 can be
represented by 'unsigned long long' on all the platforms (ie Arm32,
Arm64 and x86).
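
To illustrate the difference (sketch only, not part of the patch):

    /*
     * On Arm32, unsigned long is 32 bits wide, so an address at or above
     * 4GiB would be truncated by simple_strtoul(), while simple_strtoull()
     * parses the full 64-bit value.
     */
    u64 io_base = simple_strtoull("0x100000000", NULL, 0);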

Signed-off-by: Ayan Kumar Halder 
---

Changes from -

v1,v2 - NA (This patch is introduced in v3).

 xen/drivers/char/ns16550.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
index 58d0ccd889..43e1f971ab 100644
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -1532,7 +1532,7 @@ static bool __init parse_positional(struct ns16550 *uart, char **str)
 else
 #endif
 {
-uart->io_base = simple_strtoul(conf, &conf, 0);
+uart->io_base = simple_strtoull(conf, &conf, 0);
 }
 }
 
@@ -1603,7 +1603,7 @@ static bool __init parse_namevalue_pairs(char *str, struct ns16550 *uart)
"Can't use io_base with dev=pci or dev=amt options\n");
 break;
 }
-uart->io_base = simple_strtoul(param_value, NULL, 0);
+uart->io_base = simple_strtoull(param_value, NULL, 0);
 break;
 
 case irq:
-- 
2.17.1




Re: [XEN v3 2/3] xen/drivers: ns16550: Fix the use of simple_strtoul() for extracting u64

2023-01-23 Thread Jan Beulich
On 23.01.2023 14:44, Ayan Kumar Halder wrote:
> One should be using simple_strtoull() (instead of simple_strtoul())
> to assign a value to a 'u64' variable. The reason is that u64 can be
> represented by 'unsigned long long' on all the platforms (ie Arm32,
> Arm64 and x86).

Suggested-by: Jan Beulich 
(or Reported-by or Requested-by, to your liking)

> Signed-off-by: Ayan Kumar Halder 

Reviewed-by: Jan Beulich 

Jan



Re: [PATCH v1 03/14] xen/riscv: add <asm/riscv_encoding.h>

2023-01-23 Thread Oleksii
On Mon, 2023-01-23 at 14:52 +0100, Jan Beulich wrote:
> On 20.01.2023 15:59, Oleksii Kurochko wrote:
> > Signed-off-by: Oleksii Kurochko 
> 
> I was about to commit this, but ...
> 
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/riscv_encoding.h
> > @@ -0,0 +1,945 @@
> > +/* SPDX-License-Identifier: (GPL-2.0-or-later OR BSD-2-Clause) */
> > +/*
> > + * Copyright (c) 2019 Western Digital Corporation or its
> > affiliates.
> > + *
> > + * Authors:
> > + *   Anup Patel 
> 
> ... this raises a patch authorship question: Are you missing her/his
> S-o-b: and/or From:? 
> 
It is not clear who should be in S-o-b and/or From. So let me explain
the situation:

Anup Patel  is a person who introduced
riscv_encoding.h in OpenSBI.

A person who introduced the header to Xen isn't clear as I see 3 people
who did it:
- Bobby Eshleman 
- Alistair Francis 
- One more person whose last name, unfortunately, I can't find
And in all cases I saw that an author is different.

> > + * The source has been largely adapted from OpenSBI:
> > + * include/sbi/riscv_encodnig.h
> 
> Nit: Typo.
> 
> > + * 
> 
> Nit: trailing blank.
> 
> There also look to be hard tabs in the file. This is fine if the file
> is being imported (almost) verbatim from elsewhere, but then the
> origin
> wants stating in an Origin: tag (see docs/process/sending-
> patches.pandoc).
> 
> > [...]
> > +#define IMM_I(insn)  ((s32)(insn) >> 20)
> > +#define IMM_S(insn)  (((s32)(insn) >> 25 << 5) | \
> > +  (s32)(((insn) >> 7) & 0x1f))
> 
> Please can you avoid introducing new instances of s<N> or u<N>? See
> ./CODING_STYLE.
> 
Thanks. I will update the header.
> Jan




Re: [PATCH v1 03/14] xen/riscv: add <asm/riscv_encoding.h>

2023-01-23 Thread Jan Beulich
On 23.01.2023 15:04, Oleksii wrote:
> On Mon, 2023-01-23 at 14:52 +0100, Jan Beulich wrote:
>> On 20.01.2023 15:59, Oleksii Kurochko wrote:
>>> Signed-off-by: Oleksii Kurochko 
>>
>> I was about to commit this, but ...
>>
>>> --- /dev/null
>>> +++ b/xen/arch/riscv/include/asm/riscv_encoding.h
>>> @@ -0,0 +1,945 @@
>>> +/* SPDX-License-Identifier: (GPL-2.0-or-later OR BSD-2-Clause) */
>>> +/*
>>> + * Copyright (c) 2019 Western Digital Corporation or its
>>> affiliates.
>>> + *
>>> + * Authors:
>>> + *   Anup Patel 
>>
>> ... this raises a patch authorship question: Are you missing her/his
>> S-o-b: and/or From:? 
>>
> It is not clear who should be in S-o-b and/or From. So let me explain
> the situation:
> 
> Anup Patel  is a person who introduced
> riscv_encoding.h in OpenSBI.
> 
> A person who introduced the header to Xen isn't clear as I see 3 people
> who did it:
> - Bobby Eshleman 
> - Alistair Francis 
> - One more person whose last name, unfortunately, I can't find
> And in all cases I saw that an author is different.

Then maybe simply move the "Author:" part into ...

>>> + * The source has been largely adapted from OpenSBI:
>>> + * include/sbi/riscv_encodnig.h

... this sentence, e.g. by appending "originally authored by ..."?

Jan



Re: [PATCH v1 04/14] xen/riscv: add <asm/csr.h> header

2023-01-23 Thread Jan Beulich
On 20.01.2023 15:59, Oleksii Kurochko wrote:
> --- /dev/null
> +++ b/xen/arch/riscv/include/asm/csr.h
> @@ -0,0 +1,82 @@
> +/*
> + * Take from Linux.

This again means you want an Origin: tag. Whether the comment itself is
useful depends on how much customization you expect there to be down
the road. But wait - the header here is quite dissimilar from Linux'es,
so the description wants to go into further detail. That would then want
to include why 5 of the 7 functions are actually commented out at this
point.

Jan



Re: [PATCH v1 04/14] xen/riscv: add <asm/csr.h> header

2023-01-23 Thread Oleksii
On Mon, 2023-01-23 at 14:57 +0100, Jan Beulich wrote:
> On 20.01.2023 15:59, Oleksii Kurochko wrote:
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/csr.h
> > @@ -0,0 +1,82 @@
> > +/*
> > + * Take from Linux.
> 
> This again means you want an Origin: tag. Whether the comment itself
> is
> useful depends on how much customization you expect there to be down
> the road. But wait - the header here is quite dissimilar from
> Linux'es,
> so the description wants to go into further detail. That would then
> want
> to include why 5 of the 7 functions are actually commented out at
> this
> point.
> 
I forgot to remove them. They were commented out as they aren't used now.
But probably it makes sense to add them from the start.

I am curious if "Take from Linux" is needed at all?
Should it be described what was removed from the original header [1] ?

[1]
https://elixir.bootlin.com/linux/latest/source/arch/riscv/include/asm/csr.h
> Jan




[XEN v3 3/3] xen/drivers: ns16550: Fix an incorrect assignment to uart->io_size

2023-01-23 Thread Ayan Kumar Halder
uart->io_size represents the size in bytes. Thus, when serial_port.bit_width
is assigned to it, the value should be converted from bits to bytes.
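
For example (illustrative only, not part of the patch): an SPCR entry
describing a 32-bit wide register results in

    uart->io_size = DIV_ROUND_UP(32, BITS_PER_BYTE); /* == 4 bytes */

whereas the previous code would have stored 32 here.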

Fixes: 17b516196c55 ("ns16550: add ACPI support for ARM only")
Signed-off-by: Ayan Kumar Halder 
---

Changes from -

v1, v2 - NA (New patch introduced in v3).

 xen/drivers/char/ns16550.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
index 43e1f971ab..092f6b9c4b 100644
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -1870,7 +1870,7 @@ static int __init ns16550_acpi_uart_init(const void *data)
 uart->parity = spcr->parity;
 uart->stop_bits = spcr->stop_bits;
 uart->io_base = spcr->serial_port.address;
-uart->io_size = spcr->serial_port.bit_width;
+uart->io_size = DIV_ROUND_UP(spcr->serial_port.bit_width, BITS_PER_BYTE);
 uart->reg_shift = spcr->serial_port.bit_offset;
 uart->reg_width = spcr->serial_port.access_width;
 
-- 
2.17.1




[PATCH v2 0/3] x86/shadow: sh_page_fault() adjustments

2023-01-23 Thread Jan Beulich
The original 2nd patch of v1 was split into two and extended by a 3rd
(1st one here) one.

1: move dm-mmio handling code in sh_page_fault()
2: mark more of sh_page_fault() HVM-only
3: drop dead code from HVM-only sh_page_fault() pieces

Jan



[PATCH v2 1/3] x86/shadow: move dm-mmio handling code in sh_page_fault()

2023-01-23 Thread Jan Beulich
Do away with the partly mis-named "mmio" label there, which really is
only about emulated MMIO. Move the code to the place where the sole
"goto" was. Re-order steps slightly: Assertion first, perfc increment
outside of the locked region, and "gpa" calculation closer to the first
use of the variable. Also make the HVM conditional cover the entire
if(), as p2m_mmio_dm isn't applicable to PV; specifically get_gfn()
won't ever return this type for PV domains.

Signed-off-by: Jan Beulich 
---
v2: New.

--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -2588,13 +2588,33 @@ static int cf_check sh_page_fault(
 goto emulate;
 }
 
+#ifdef CONFIG_HVM
+
 /* Need to hand off device-model MMIO to the device model */
 if ( p2mt == p2m_mmio_dm )
 {
+ASSERT(is_hvm_vcpu(v));
+if ( !guest_mode(regs) )
+goto not_a_shadow_fault;
+
+sh_audit_gw(v, &gw);
 gpa = guest_walk_to_gpa(&gw);
-goto mmio;
+SHADOW_PRINTK("mmio %#"PRIpaddr"\n", gpa);
+shadow_audit_tables(v);
+sh_reset_early_unshadow(v);
+
+paging_unlock(d);
+put_gfn(d, gfn_x(gfn));
+
+perfc_incr(shadow_fault_mmio);
+trace_shadow_gen(TRC_SHADOW_MMIO, va);
+
+return handle_mmio_with_translation(va, gpa >> PAGE_SHIFT, access)
+   ? EXCRET_fault_fixed : 0;
 }
 
+#endif /* CONFIG_HVM */
+
 /* Ignore attempts to write to read-only memory. */
 if ( p2m_is_readonly(p2mt) && (ft == ft_demand_write) )
 goto emulate_readonly; /* skip over the instruction */
@@ -2867,25 +2887,6 @@ static int cf_check sh_page_fault(
 return EXCRET_fault_fixed;
 #endif /* CONFIG_HVM */
 
- mmio:
-if ( !guest_mode(regs) )
-goto not_a_shadow_fault;
-#ifdef CONFIG_HVM
-ASSERT(is_hvm_vcpu(v));
-perfc_incr(shadow_fault_mmio);
-sh_audit_gw(v, &gw);
-SHADOW_PRINTK("mmio %#"PRIpaddr"\n", gpa);
-shadow_audit_tables(v);
-sh_reset_early_unshadow(v);
-paging_unlock(d);
-put_gfn(d, gfn_x(gfn));
-trace_shadow_gen(TRC_SHADOW_MMIO, va);
-return (handle_mmio_with_translation(va, gpa >> PAGE_SHIFT, access)
-? EXCRET_fault_fixed : 0);
-#else
-BUG();
-#endif
-
  not_a_shadow_fault:
 sh_audit_gw(v, &gw);
 SHADOW_PRINTK("not a shadow fault\n");




[PATCH v2 2/3] x86/shadow: mark more of sh_page_fault() HVM-only

2023-01-23 Thread Jan Beulich
The types p2m_is_readonly() checks for aren't applicable to PV;
specifically get_gfn() won't ever return any such type for PV domains.
Extend the HVM-conditional block of code, also past the subsequent HVM-
only if(). This way the "emulate_readonly" label also becomes unreachable when
!HVM, so move the conditional there upwards as well. Noticing the
earlier shadow_mode_refcounts() check, move it up even further, right
after that check. With that, the "done" label also needs marking as
potentially unused.

Signed-off-by: Jan Beulich 
---
v2: Parts split off to a subsequent patch.

--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -2613,8 +2613,6 @@ static int cf_check sh_page_fault(
? EXCRET_fault_fixed : 0;
 }
 
-#endif /* CONFIG_HVM */
-
 /* Ignore attempts to write to read-only memory. */
 if ( p2m_is_readonly(p2mt) && (ft == ft_demand_write) )
 goto emulate_readonly; /* skip over the instruction */
@@ -2633,12 +2631,14 @@ static int cf_check sh_page_fault(
 goto emulate;
 }
 
+#endif /* CONFIG_HVM */
+
 perfc_incr(shadow_fault_fixed);
 d->arch.paging.log_dirty.fault_count++;
 sh_reset_early_unshadow(v);
 
 trace_shadow_fixup(gw.l1e, va);
- done:
+ done: __maybe_unused;
 sh_audit_gw(v, &gw);
 SHADOW_PRINTK("fixed\n");
 shadow_audit_tables(v);
@@ -2650,6 +2650,7 @@ static int cf_check sh_page_fault(
 if ( !shadow_mode_refcounts(d) || !guest_mode(regs) )
 goto not_a_shadow_fault;
 
+#ifdef CONFIG_HVM
 /*
  * We do not emulate user writes. Instead we use them as a hint that the
  * page is no longer a page table. This behaviour differs from native, but
@@ -2677,7 +2678,6 @@ static int cf_check sh_page_fault(
 goto not_a_shadow_fault;
 }
 
-#ifdef CONFIG_HVM
 /* Unshadow if we are writing to a toplevel pagetable that is
  * flagged as a dying process, and that is not currently used. */
 if ( sh_mfn_is_a_page_table(gmfn) && is_hvm_domain(d) &&




[PATCH v2 3/3] x86/shadow: drop dead code from HVM-only sh_page_fault() pieces

2023-01-23 Thread Jan Beulich
The shadow_mode_refcounts() check immediately ahead of the "emulate"
label renders redundant two subsequent is_hvm_domain() checks (the
latter of which was already redundant with the former).

Also guest_mode() checks are pointless when we already know we're
dealing with a HVM domain.

Finally style-adjust a comment which otherwise would be fully visible as
patch context anyway.

Signed-off-by: Jan Beulich 
---
v2: New, split off from earlier patch.

--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -2594,8 +2594,6 @@ static int cf_check sh_page_fault(
 if ( p2mt == p2m_mmio_dm )
 {
 ASSERT(is_hvm_vcpu(v));
-if ( !guest_mode(regs) )
-goto not_a_shadow_fault;
 
 sh_audit_gw(v, &gw);
 gpa = guest_walk_to_gpa(&gw);
@@ -2647,7 +2645,7 @@ static int cf_check sh_page_fault(
 return EXCRET_fault_fixed;
 
  emulate:
-if ( !shadow_mode_refcounts(d) || !guest_mode(regs) )
+if ( !shadow_mode_refcounts(d) )
 goto not_a_shadow_fault;
 
 #ifdef CONFIG_HVM
@@ -2672,16 +2670,11 @@ static int cf_check sh_page_fault(
  * caught by user-mode page-table check above.
  */
  emulate_readonly:
-if ( !is_hvm_domain(d) )
-{
-ASSERT_UNREACHABLE();
-goto not_a_shadow_fault;
-}
-
-/* Unshadow if we are writing to a toplevel pagetable that is
- * flagged as a dying process, and that is not currently used. */
-if ( sh_mfn_is_a_page_table(gmfn) && is_hvm_domain(d) &&
- mfn_to_page(gmfn)->pagetable_dying )
+/*
+ * Unshadow if we are writing to a toplevel pagetable that is
+ * flagged as a dying process, and that is not currently used.
+ */
+if ( sh_mfn_is_a_page_table(gmfn) && mfn_to_page(gmfn)->pagetable_dying )
 {
 int used = 0;
 struct vcpu *tmp;




Re: [PATCH] automation: Modify static-mem check in qemu-smoke-dom0less-arm64.sh

2023-01-23 Thread Xenia Ragiadakou



On 1/23/23 15:10, Michal Orzel wrote:

At the moment, the static-mem check relies on the way Xen exposes the
memory banks in device tree. As this might change, the check should be
modified to be generic and not to rely on device tree. In this case,
let's use /proc/iomem which exposes the memory ranges in %08x format
as follows:
<start>-<end> : <description>

This way, we can grep in /proc/iomem for an entry containing memory
region defined by the static-mem configuration with "System RAM"
description. If it exists, mark the test as passed. Also, take the
opportunity to add the 0x prefix to the domu_{base,size} definitions rather
than adding it in front of each occurrence.

Signed-off-by: Michal Orzel 


Reviewed-by: Xenia Ragiadakou 

Also, you fixed the hard tab.


---
Patch made as part of the discussion:
https://lore.kernel.org/xen-devel/ba37ee02-c07c-2803-0867-149c77989...@amd.com/

CC: Julien, Ayan
---
  automation/scripts/qemu-smoke-dom0less-arm64.sh | 13 ++---
  1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/automation/scripts/qemu-smoke-dom0less-arm64.sh b/automation/scripts/qemu-smoke-dom0less-arm64.sh
index 2b59346fdcfd..182a4b6c18fc 100755
--- a/automation/scripts/qemu-smoke-dom0less-arm64.sh
+++ b/automation/scripts/qemu-smoke-dom0less-arm64.sh
@@ -16,14 +16,13 @@ fi
  
  if [[ "${test_variant}" == "static-mem" ]]; then

  # Memory range that is statically allocated to DOM1
-domu_base="5000"
-domu_size="1000"
+domu_base="0x5000"
+domu_size="0x1000"
  passed="${test_variant} test passed"
  domU_check="
-current=\$(hexdump -e '16/1 \"%02x\"' /proc/device-tree/memory@${domu_base}/reg 2>/dev/null)
-expected=$(printf \"%016x%016x\" 0x${domu_base} 0x${domu_size})
-if [[ \"\${expected}\" == \"\${current}\" ]]; then
-   echo \"${passed}\"
+mem_range=$(printf \"%08x-%08x\" ${domu_base} $(( ${domu_base} + ${domu_size} - 1 )))
+if grep -q -x \"\${mem_range} : System RAM\" /proc/iomem; then
+echo \"${passed}\"
  fi
  "
  fi
@@ -126,7 +125,7 @@ UBOOT_SOURCE="boot.source"
  UBOOT_SCRIPT="boot.scr"' > binaries/config
  
  if [[ "${test_variant}" == "static-mem" ]]; then

-echo -e "\nDOMU_STATIC_MEM[0]=\"0x${domu_base} 0x${domu_size}\"" >> binaries/config
+echo -e "\nDOMU_STATIC_MEM[0]=\"${domu_base} ${domu_size}\"" >> binaries/config
  fi
  
  if [[ "${test_variant}" == "boot-cpupools" ]]; then


--
Xenia



Re: [PATCH v1 04/14] xen/riscv: add header

2023-01-23 Thread Jan Beulich
On 23.01.2023 15:23, Oleksii wrote:
> On Mon, 2023-01-23 at 14:57 +0100, Jan Beulich wrote:
>> On 20.01.2023 15:59, Oleksii Kurochko wrote:
>>> --- /dev/null
>>> +++ b/xen/arch/riscv/include/asm/csr.h
>>> @@ -0,0 +1,82 @@
>>> +/*
>>> + * Take from Linux.
>>
>> This again means you want an Origin: tag. Whether the comment itself
>> is
>> useful depends on how much customization you expect there to be down
>> the road. But wait - the header here is quite dissimilar from
>> Linux'es,
>> so the description wants to go into further detail. That would then
>> want
>> to include why 5 of the 7 functions are actually commented out at
>> this
>> point.
>>
> I forgot to remove them. They were commented out as they aren't used now.
> But probably it makes sense to add them from the start.
> 
> I am curious if "Take from Linux" is needed at all?

I said, I was wondering too. The fewer you take from Linux (and the more
you add on top), the less useful such a comment is going to be.

> Should it be described what was removed from the original header [1] ?

In the description, yes (or, if it's very little, simply say that much
more is present there). Doing so in the leading comment in the header
is risking to go stale very quickly.

Jan



Re: [PATCH v2] x86/ucode/AMD: apply the patch early on every logical thread

2023-01-23 Thread Sergey Dyasli
On Mon, Jan 16, 2023 at 2:47 PM Jan Beulich  wrote:
>
> On 11.01.2023 15:23, Sergey Dyasli wrote:
> > --- a/xen/arch/x86/cpu/microcode/amd.c
> > +++ b/xen/arch/x86/cpu/microcode/amd.c
> > @@ -176,8 +176,13 @@ static enum microcode_match_result compare_revisions(
> >  if ( new_rev > old_rev )
> >  return NEW_UCODE;
> >
> > -if ( opt_ucode_allow_same && new_rev == old_rev )
> > -return NEW_UCODE;
> > +if ( new_rev == old_rev )
> > +{
> > +if ( opt_ucode_allow_same )
> > +return NEW_UCODE;
> > +else
> > +return SAME_UCODE;
> > +}
>
> I find this misleading: "same" should not depend on the command line
> option.

The alternative diff I was considering is this:

--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -179,6 +179,9 @@ static enum microcode_match_result compare_revisions(
 if ( opt_ucode_allow_same && new_rev == old_rev )
 return NEW_UCODE;

+if ( new_rev == old_rev )
+return SAME_UCODE;
+
 return OLD_UCODE;
 }

Do you think the logic is clearer this way? Or should I simply remove
"else" from the first diff above?

> In fact the command line option should affect only the cases
> where ucode is actually to be loaded; it should not affect cases where
> the check is done merely to know whether the cache needs updating.
>
> With that e.g. microcode_update_helper() should then also be adjusted:
> It shouldn't say merely "newer" when "allow-same" is in effect.

I haven't tried late-loading an older ucode blob to see this
inconsistency, but you should be right. I'll test and adjust the
message.

Sergey



[PATCH v2 0/8] runstate/time area registration by (guest) physical address

2023-01-23 Thread Jan Beulich
Since it was indicated that introducing specific new vCPU ops may be
beneficial independent of the introduction of a fully physical-
address-based ABI flavor, here we go. There continue to be a number of
open questions throughout the series, resolving of which is one of the
main goals of this v2 posting.

1: domain: GADDR based shared guest area registration alternative - cleanup
3: domain: update GADDR based runstate guest area
4: x86: update GADDR based secondary time area
5: x86/mem-sharing: copy GADDR based shared guest areas
6: domain: map/unmap GADDR based shared guest areas
7: domain: introduce GADDR based runstate area registration alternative
8: x86: introduce GADDR based secondary time area registration alternative
9: common: convert vCPU info area registration

Jan



[PATCH RFC v2 1/8] domain: GADDR based shared guest area registration alternative - teardown

2023-01-23 Thread Jan Beulich
In preparation of the introduction of new vCPU operations allowing to
register the respective areas (one of the two is x86-specific) by
guest-physical address, add the necessary domain cleanup hooks.

Signed-off-by: Jan Beulich 
Reviewed-by: Julien Grall 
---
RFC: Zapping the areas in pv_shim_shutdown() may not be strictly
 necessary: Aiui unmap_vcpu_info() is called only because the vCPU
 info area cannot be re-registered. Beyond that I guess the
 assumption is that the areas would only be re-registered as they
 were before. If that's not the case I wonder whether the guest
 handles for both areas shouldn't also be zapped.
---
v2: Add assertion in unmap_guest_area().

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1014,7 +1014,10 @@ int arch_domain_soft_reset(struct domain
 }
 
 for_each_vcpu ( d, v )
+{
 set_xen_guest_handle(v->arch.time_info_guest, NULL);
+unmap_guest_area(v, &v->arch.time_guest_area);
+}
 
  exit_put_gfn:
 put_gfn(d, gfn_x(gfn));
@@ -2329,6 +2332,8 @@ int domain_relinquish_resources(struct d
 if ( ret )
 return ret;
 
+unmap_guest_area(v, &v->arch.time_guest_area);
+
 vpmu_destroy(v);
 }
 
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -658,6 +658,7 @@ struct arch_vcpu
 
 /* A secondary copy of the vcpu time info. */
 XEN_GUEST_HANDLE(vcpu_time_info_t) time_info_guest;
+struct guest_area time_guest_area;
 
 struct arch_vm_event *vm_event;
 
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -394,8 +394,10 @@ int pv_shim_shutdown(uint8_t reason)
 
 for_each_vcpu ( d, v )
 {
-/* Unmap guest vcpu_info pages. */
+/* Unmap guest vcpu_info page and runstate/time areas. */
 unmap_vcpu_info(v);
+unmap_guest_area(v, &v->runstate_guest_area);
+unmap_guest_area(v, &v->arch.time_guest_area);
 
 /* Reset the periodic timer to the default value. */
 vcpu_set_periodic_timer(v, MILLISECS(10));
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -963,7 +963,10 @@ int domain_kill(struct domain *d)
 if ( cpupool_move_domain(d, cpupool0) )
 return -ERESTART;
 for_each_vcpu ( d, v )
+{
 unmap_vcpu_info(v);
+unmap_guest_area(v, &v->runstate_guest_area);
+}
 d->is_dying = DOMDYING_dead;
 /* Mem event cleanup has to go here because the rings 
  * have to be put before we call put_domain. */
@@ -1417,6 +1420,7 @@ int domain_soft_reset(struct domain *d,
 {
 set_xen_guest_handle(runstate_guest(v), NULL);
 unmap_vcpu_info(v);
+unmap_guest_area(v, &v->runstate_guest_area);
 }
 
 rc = arch_domain_soft_reset(d);
@@ -1568,6 +1572,19 @@ void unmap_vcpu_info(struct vcpu *v)
 put_page_and_type(mfn_to_page(mfn));
 }
 
+/*
+ * This is only intended to be used for domain cleanup (or more generally only
+ * with at least the respective vCPU, if it's not the current one, reliably
+ * paused).
+ */
+void unmap_guest_area(struct vcpu *v, struct guest_area *area)
+{
+struct domain *d = v->domain;
+
+if ( v != current )
+ASSERT(atomic_read(&v->pause_count) | atomic_read(&d->pause_count));
+}
+
 int default_initialise_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 struct vcpu_guest_context *ctxt;
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -5,6 +5,12 @@
 #include 
 
 #include 
+
+struct guest_area {
+struct page_info *pg;
+void *map;
+};
+
 #include 
 #include 
 
@@ -76,6 +82,11 @@ void arch_vcpu_destroy(struct vcpu *v);
 int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned int offset);
 void unmap_vcpu_info(struct vcpu *v);
 
+int map_guest_area(struct vcpu *v, paddr_t gaddr, unsigned int size,
+   struct guest_area *area,
+   void (*populate)(void *dst, struct vcpu *v));
+void unmap_guest_area(struct vcpu *v, struct guest_area *area);
+
 int arch_domain_create(struct domain *d,
struct xen_domctl_createdomain *config,
unsigned int flags);
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -202,6 +202,7 @@ struct vcpu
 XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
 } runstate_guest; /* guest address */
 #endif
+struct guest_area runstate_guest_area;
 unsigned int new_state;
 
 /* Has the FPU been initialised? */




[PATCH RFC v2 2/8] domain: update GADDR based runstate guest area

2023-01-23 Thread Jan Beulich
Before adding a new vCPU operation to register the runstate area by
guest-physical address, add code to actually keep such areas up-to-date.

Note that updating of the area will be done exclusively following the
model enabled by VMASST_TYPE_runstate_update_flag for virtual-address
based registered areas.
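
For context, a guest consuming an area updated under this model needs a
retry loop along these lines (illustrative sketch only, not part of the
patch; "area" stands for the guest's mapping of the registered space):

    struct vcpu_runstate_info snap;
    uint64_t t;

    do {
        t = read_atomic(&area->state_entry_time);
        smp_rmb();
        snap = *area;
        smp_rmb();
    } while ( (t & XEN_RUNSTATE_UPDATE) ||
              t != read_atomic(&area->state_entry_time) );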

Note further that pages aren't marked dirty when written to (matching
the handling of space mapped by map_vcpu_info()), on the basis that the
registrations are lost anyway across migration (or would need re-
populating at the target for transparent migration). Plus the contents
of the areas in question have to be deemed volatile in the first place
(so saving a "most recent" value is pretty meaningless even for e.g.
snapshotting).

Signed-off-by: Jan Beulich 
---
RFC: HVM guests (on x86) can change bitness and hence layout (and size!
 and alignment) of the runstate area. I don't think it is an option
 to require 32-bit code to pass a range such that even the 64-bit
 layout wouldn't cross a page boundary (and be suitably aligned). I
 also don't see any other good solution, so for now a crude approach
 with an extra boolean is used (using has_32bit_shinfo() isn't race
 free and could hence lead to overrunning the mapped space).
---
v2: Drop VM-assist conditionals.

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1616,14 +1616,53 @@ bool update_runstate_area(struct vcpu *v
 struct guest_memory_policy policy = { };
 void __user *guest_handle = NULL;
 struct vcpu_runstate_info runstate;
+struct vcpu_runstate_info *map = v->runstate_guest_area.map;
+
+memcpy(&runstate, &v->runstate, sizeof(runstate));
+
+if ( map )
+{
+uint64_t *pset;
+#ifdef CONFIG_COMPAT
+struct compat_vcpu_runstate_info *cmap = NULL;
+
+if ( v->runstate_guest_area_compat )
+cmap = (void *)map;
+#endif
+
+/*
+ * NB: No VM_ASSIST(v->domain, runstate_update_flag) check here.
+ * Always using that updating model.
+ */
+#ifdef CONFIG_COMPAT
+if ( cmap )
+pset = &cmap->state_entry_time;
+else
+#endif
+pset = &map->state_entry_time;
+runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+write_atomic(pset, runstate.state_entry_time);
+smp_wmb();
+
+#ifdef CONFIG_COMPAT
+if ( cmap )
+XLAT_vcpu_runstate_info(cmap, &runstate);
+else
+#endif
+*map = runstate;
+
+smp_wmb();
+runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+write_atomic(pset, runstate.state_entry_time);
+
+return true;
+}
 
 if ( guest_handle_is_null(runstate_guest(v)) )
 return true;
 
 update_guest_memory_policy(v, &policy);
 
-memcpy(&runstate, &v->runstate, sizeof(runstate));
-
 if ( VM_ASSIST(v->domain, runstate_update_flag) )
 {
 #ifdef CONFIG_COMPAT
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -231,6 +231,8 @@ struct vcpu
 #ifdef CONFIG_COMPAT
 /* A hypercall is using the compat ABI? */
 bool hcall_compat;
+/* Physical runstate area registered via compat ABI? */
+bool runstate_guest_area_compat;
 #endif
 
 #ifdef CONFIG_IOREQ_SERVER




[PATCH v2 3/8] x86: update GADDR based secondary time area

2023-01-23 Thread Jan Beulich
Before adding a new vCPU operation to register the secondary time area
by guest-physical address, add code to actually keep such areas up-to-
date.

Note that pages aren't marked dirty when written to (matching the
handling of space mapped by map_vcpu_info()), on the basis that the
registrations are lost anyway across migration (or would need re-
populating at the target for transparent migration). Plus the contents
of the areas in question have to be deemed volatile in the first place
(so saving a "most recent" value is pretty meaningless even for e.g.
snapshotting).
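
For reference, the guest-side read of such an area follows the usual
version protocol (illustrative sketch only, not part of the patch): an
odd version means an update is in progress and the snapshot must be
retried.

    vcpu_time_info_t snap;
    uint32_t ver;

    do {
        ver = read_atomic(&map->version);
        smp_rmb();
        snap = *map;
        smp_rmb();
    } while ( (ver & 1) || ver != read_atomic(&map->version) );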

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1462,12 +1462,34 @@ static void __update_vcpu_system_time(st
 v->arch.pv.pending_system_time = _u;
 }
 
+static void write_time_guest_area(struct vcpu_time_info *map,
+  const struct vcpu_time_info *src)
+{
+/* 1. Update userspace version. */
+write_atomic(&map->version, src->version);
+smp_wmb();
+
+/* 2. Update all other userspace fields. */
+*map = *src;
+
+/* 3. Update userspace version again. */
+smp_wmb();
+write_atomic(&map->version, version_update_end(src->version));
+}
+
 bool update_secondary_system_time(struct vcpu *v,
   struct vcpu_time_info *u)
 {
 XEN_GUEST_HANDLE(vcpu_time_info_t) user_u = v->arch.time_info_guest;
+struct vcpu_time_info *map = v->arch.time_guest_area.map;
 struct guest_memory_policy policy = { .nested_guest_mode = false };
 
+if ( map )
+{
+write_time_guest_area(map, u);
+return true;
+}
+
 if ( guest_handle_is_null(user_u) )
 return true;
 




[PATCH v2 4/8] x86/mem-sharing: copy GADDR based shared guest areas

2023-01-23 Thread Jan Beulich
In preparation of the introduction of new vCPU operations allowing to
register the respective areas (one of the two is x86-specific) by
guest-physical address, add the necessary fork handling (with the
backing function yet to be filled in).

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1653,6 +1653,65 @@ static void copy_vcpu_nonreg_state(struc
 hvm_set_nonreg_state(cd_vcpu, &nrs);
 }
 
+static int copy_guest_area(struct guest_area *cd_area,
+   const struct guest_area *d_area,
+   struct vcpu *cd_vcpu,
+   const struct domain *d)
+{
+mfn_t d_mfn, cd_mfn;
+
+if ( !d_area->pg )
+return 0;
+
+d_mfn = page_to_mfn(d_area->pg);
+
+/* Allocate & map a page for the area if it hasn't been already. */
+if ( !cd_area->pg )
+{
+gfn_t gfn = mfn_to_gfn(d, d_mfn);
+struct p2m_domain *p2m = p2m_get_hostp2m(cd_vcpu->domain);
+p2m_type_t p2mt;
+p2m_access_t p2ma;
+unsigned int offset;
+int ret;
+
+cd_mfn = p2m->get_entry(p2m, gfn, &p2mt, &p2ma, 0, NULL, NULL);
+if ( mfn_eq(cd_mfn, INVALID_MFN) )
+{
+struct page_info *pg = alloc_domheap_page(cd_vcpu->domain, 0);
+
+if ( !pg )
+return -ENOMEM;
+
+cd_mfn = page_to_mfn(pg);
+set_gpfn_from_mfn(mfn_x(cd_mfn), gfn_x(gfn));
+
+ret = p2m->set_entry(p2m, gfn, cd_mfn, PAGE_ORDER_4K, p2m_ram_rw,
+ p2m->default_access, -1);
+if ( ret )
+return ret;
+}
+else if ( p2mt != p2m_ram_rw )
+return -EBUSY;
+
+/*
+ * Simply specify the entire range up to the end of the page. All the
+ * function uses it for is a check for not crossing page boundaries.
+ */
+offset = PAGE_OFFSET(d_area->map);
+ret = map_guest_area(cd_vcpu, gfn_to_gaddr(gfn) + offset,
+ PAGE_SIZE - offset, cd_area, NULL);
+if ( ret )
+return ret;
+}
+else
+cd_mfn = page_to_mfn(cd_area->pg);
+
+copy_domain_page(cd_mfn, d_mfn);
+
+return 0;
+}
+
 static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
 {
 struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
@@ -1745,6 +1804,16 @@ static int copy_vcpu_settings(struct dom
 copy_domain_page(new_vcpu_info_mfn, vcpu_info_mfn);
 }
 
+/* Same for the (physically registered) runstate and time info areas. */
+ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
+  &d_vcpu->runstate_guest_area, cd_vcpu, d);
+if ( ret )
+return ret;
+ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
+  &d_vcpu->arch.time_guest_area, cd_vcpu, d);
+if ( ret )
+return ret;
+
 ret = copy_vpmu(d_vcpu, cd_vcpu);
 if ( ret )
 return ret;
@@ -1987,7 +2056,10 @@ int mem_sharing_fork_reset(struct domain
 
  state:
 if ( reset_state )
+{
 rc = copy_settings(d, pd);
+/* TBD: What to do here with -ERESTART? */
+}
 
 domain_unpause(d);
 
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1572,6 +1572,13 @@ void unmap_vcpu_info(struct vcpu *v)
 put_page_and_type(mfn_to_page(mfn));
 }
 
+int map_guest_area(struct vcpu *v, paddr_t gaddr, unsigned int size,
+   struct guest_area *area,
+   void (*populate)(void *dst, struct vcpu *v))
+{
+return -EOPNOTSUPP;
+}
+
 /*
  * This is only intended to be used for domain cleanup (or more generally only
  * with at least the respective vCPU, if it's not the current one, reliably




[PATCH v2 5/8] domain: map/unmap GADDR based shared guest areas

2023-01-23 Thread Jan Beulich
The registration by virtual/linear address has downsides: At least on
x86 the access is expensive for HVM/PVH domains. Furthermore for 64-bit
PV domains the areas are inaccessible (and hence cannot be updated by
Xen) when in guest-user mode, and for HVM guests they may be
inaccessible when Meltdown mitigations are in place. (There are yet
more issues.)

In preparation of the introduction of new vCPU operations allowing to
register the respective areas (one of the two is x86-specific) by
guest-physical address, flesh out the map/unmap functions.

Noteworthy differences from map_vcpu_info():
- areas can be registered more than once (and de-registered),
- remote vCPU-s are paused rather than checked for being down (which in
  principle can change right after the check),
- the domain lock is taken for a much smaller region.

Signed-off-by: Jan Beulich 
---
RFC: By using global domain page mappings the demand on the underlying
 VA range may increase significantly. I did consider to use per-
 domain mappings instead, but they exist for x86 only. Of course we
 could have arch_{,un}map_guest_area() aliasing global domain page
 mapping functions on Arm and using per-domain mappings on x86. Yet
 then again map_vcpu_info() doesn't (and can't) do so.

RFC: In map_guest_area() I'm not checking the P2M type, instead - just
 like map_vcpu_info() - solely relying on the type ref acquisition.
 Checking for p2m_ram_rw alone would be wrong, as at least
 p2m_ram_logdirty ought to also be okay to use here (and in similar
 cases, e.g. in Argo's find_ring_mfn()). p2m_is_pageable() could be
 used here (like altp2m_vcpu_enable_ve() does) as well as in
 map_vcpu_info(), yet then again the P2M type is stale by the time
 it is being looked at anyway without the P2M lock held.
---
v2: currd -> d, to cover mem-sharing's copy_guest_area(). Re-base over
change(s) earlier in the series. Use ~0 as "unmap" request indicator.

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1576,7 +1576,82 @@ int map_guest_area(struct vcpu *v, paddr
struct guest_area *area,
void (*populate)(void *dst, struct vcpu *v))
 {
-return -EOPNOTSUPP;
+struct domain *d = v->domain;
+void *map = NULL;
+struct page_info *pg = NULL;
+int rc = 0;
+
+if ( ~gaddr )
+{
+unsigned long gfn = PFN_DOWN(gaddr);
+unsigned int align;
+p2m_type_t p2mt;
+
+if ( gfn != PFN_DOWN(gaddr + size - 1) )
+return -ENXIO;
+
+#ifdef CONFIG_COMPAT
+if ( has_32bit_shinfo(d) )
+align = alignof(compat_ulong_t);
+else
+#endif
+align = alignof(xen_ulong_t);
+if ( gaddr & (align - 1) )
+return -ENXIO;
+
+rc = check_get_page_from_gfn(d, _gfn(gfn), false, &p2mt, &pg);
+if ( rc )
+return rc;
+
+if ( !get_page_type(pg, PGT_writable_page) )
+{
+put_page(pg);
+return -EACCES;
+}
+
+map = __map_domain_page_global(pg);
+if ( !map )
+{
+put_page_and_type(pg);
+return -ENOMEM;
+}
+map += PAGE_OFFSET(gaddr);
+}
+
+if ( v != current )
+{
+if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
+{
+rc = -ERESTART;
+goto unmap;
+}
+
+vcpu_pause(v);
+
+spin_unlock(&d->hypercall_deadlock_mutex);
+}
+
+domain_lock(d);
+
+if ( map )
+populate(map, v);
+
+SWAP(area->pg, pg);
+SWAP(area->map, map);
+
+domain_unlock(d);
+
+if ( v != current )
+vcpu_unpause(v);
+
+ unmap:
+if ( pg )
+{
+unmap_domain_page_global(map);
+put_page_and_type(pg);
+}
+
+return rc;
 }
 
 /*
@@ -1587,9 +1662,24 @@ int map_guest_area(struct vcpu *v, paddr
 void unmap_guest_area(struct vcpu *v, struct guest_area *area)
 {
 struct domain *d = v->domain;
+void *map;
+struct page_info *pg;
 
 if ( v != current )
 ASSERT(atomic_read(&v->pause_count) | atomic_read(&d->pause_count));
+
+domain_lock(d);
+map = area->map;
+area->map = NULL;
+pg = area->pg;
+area->pg = NULL;
+domain_unlock(d);
+
+if ( pg )
+{
+unmap_domain_page_global(map);
+put_page_and_type(pg);
+}
 }
 
 int default_initialise_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)




[PATCH v2 6/8] domain: introduce GADDR based runstate area registration alternative

2023-01-23 Thread Jan Beulich
The registration by virtual/linear address has downsides: At least on
x86 the access is expensive for HVM/PVH domains. Furthermore for 64-bit
PV domains the area is inaccessible (and hence cannot be updated by Xen)
when in guest-user mode.

Introduce a new vCPU operation allowing to register the runstate area by
guest-physical address.

An at least theoretical downside to using physically registered areas is
that PV then won't see dirty (and perhaps also accessed) bits set in its
respective page table entries.
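
From a guest's perspective the new operation would be invoked along these
lines (illustrative sketch only; "runstate_gpa" is a hypothetical
guest-physical address, which must not cross a page boundary, and the
hypercall wrapper name follows the usual guest-side convention):

    struct vcpu_register_runstate_memory_area area = {
        .addr.p = runstate_gpa,
    };

    rc = HYPERVISOR_vcpu_op(VCPUOP_register_runstate_phys_area,
                            vcpu_id, &area);
    /* Passing ~0 in addr.p de-registers the area again. */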

Signed-off-by: Jan Beulich 
---
v2: Extend comment in public header.

--- a/xen/arch/x86/x86_64/domain.c
+++ b/xen/arch/x86/x86_64/domain.c
@@ -12,6 +12,22 @@
 CHECK_vcpu_get_physid;
 #undef xen_vcpu_get_physid
 
+static void cf_check
+runstate_area_populate(void *map, struct vcpu *v)
+{
+if ( is_pv_vcpu(v) )
+v->arch.pv.need_update_runstate_area = false;
+
+v->runstate_guest_area_compat = true;
+
+if ( v == current )
+{
+struct compat_vcpu_runstate_info *info = map;
+
+XLAT_vcpu_runstate_info(info, &v->runstate);
+}
+}
+
 int
 compat_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
@@ -57,6 +73,25 @@ compat_vcpu_op(int cmd, unsigned int vcp
 
 break;
 }
+
+case VCPUOP_register_runstate_phys_area:
+{
+struct compat_vcpu_register_runstate_memory_area area;
+
+rc = -EFAULT;
+if ( copy_from_guest(&area.addr.p, arg, 1) )
+break;
+
+rc = map_guest_area(v, area.addr.p,
+sizeof(struct compat_vcpu_runstate_info),
+&v->runstate_guest_area,
+runstate_area_populate);
+if ( rc == -ERESTART )
+rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iih",
+   cmd, vcpuid, arg);
+
+break;
+}
 
 case VCPUOP_register_vcpu_time_memory_area:
 {
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1803,6 +1803,26 @@ bool update_runstate_area(struct vcpu *v
 return rc;
 }
 
+static void cf_check
+runstate_area_populate(void *map, struct vcpu *v)
+{
+#ifdef CONFIG_PV
+if ( is_pv_vcpu(v) )
+v->arch.pv.need_update_runstate_area = false;
+#endif
+
+#ifdef CONFIG_COMPAT
+v->runstate_guest_area_compat = false;
+#endif
+
+if ( v == current )
+{
+struct vcpu_runstate_info *info = map;
+
+*info = v->runstate;
+}
+}
+
 long common_vcpu_op(int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 long rc = 0;
@@ -1977,6 +1997,25 @@ long common_vcpu_op(int cmd, struct vcpu
 
 break;
 }
+
+case VCPUOP_register_runstate_phys_area:
+{
+struct vcpu_register_runstate_memory_area area;
+
+rc = -EFAULT;
+if ( copy_from_guest(&area.addr.p, arg, 1) )
+break;
+
+rc = map_guest_area(v, area.addr.p,
+sizeof(struct vcpu_runstate_info),
+&v->runstate_guest_area,
+runstate_area_populate);
+if ( rc == -ERESTART )
+rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iih",
+   cmd, vcpuid, arg);
+
+break;
+}
 
 default:
 rc = -ENOSYS;
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -218,6 +218,19 @@ struct vcpu_register_time_memory_area {
 typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
 DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
 
+/*
+ * Like the respective VCPUOP_register_*_memory_area, just using the "addr.p"
+ * field of the supplied struct as a guest physical address (i.e. in GFN space).
+ * The respective area may not cross a page boundary.  Pass ~0 to unregister an
+ * area.  Note that as long as an area is registered by physical address, the
+ * linear address based area will not be serviced (updated) by the hypervisor.
+ *
+ * Note that the area registered via VCPUOP_register_runstate_memory_area will
+ * be updated in the same manner as the one registered via virtual address PLUS
+ * VMASST_TYPE_runstate_update_flag engaged by the domain.
+ */
+#define VCPUOP_register_runstate_phys_area  14
+
 #endif /* __XEN_PUBLIC_VCPU_H__ */
 
 /*




[PATCH v2 7/8] x86: introduce GADDR based secondary time area registration alternative

2023-01-23 Thread Jan Beulich
The registration by virtual/linear address has downsides: The access is
expensive for HVM/PVH domains. Furthermore for 64-bit PV domains the area
is inaccessible (and hence cannot be updated by Xen) when in guest-user
mode.

Introduce a new vCPU operation allowing to register the secondary time
area by guest-physical address.

An at least theoretical downside to using physically registered areas is
that PV then won't see dirty (and perhaps also accessed) bits set in its
respective page table entries.

Signed-off-by: Jan Beulich 
---
v2: Forge version in force_update_secondary_system_time().

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1499,6 +1499,15 @@ int arch_vcpu_reset(struct vcpu *v)
 return 0;
 }
 
+static void cf_check
+time_area_populate(void *map, struct vcpu *v)
+{
+if ( is_pv_vcpu(v) )
+v->arch.pv.pending_system_time.version = 0;
+
+force_update_secondary_system_time(v, map);
+}
+
 long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 long rc = 0;
@@ -1536,6 +1545,25 @@ long do_vcpu_op(int cmd, unsigned int vc
 
 break;
 }
+
+case VCPUOP_register_vcpu_time_phys_area:
+{
+struct vcpu_register_time_memory_area area;
+
+rc = -EFAULT;
+if ( copy_from_guest(&area.addr.p, arg, 1) )
+break;
+
+rc = map_guest_area(v, area.addr.p,
+sizeof(vcpu_time_info_t),
+&v->arch.time_guest_area,
+time_area_populate);
+if ( rc == -ERESTART )
+rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iih",
+   cmd, vcpuid, arg);
+
+break;
+}
 
 case VCPUOP_get_physid:
 {
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -681,6 +681,8 @@ void domain_cpu_policy_changed(struct do
 
 bool update_secondary_system_time(struct vcpu *,
   struct vcpu_time_info *);
+void force_update_secondary_system_time(struct vcpu *,
+struct vcpu_time_info *);
 
 void vcpu_show_execution_state(struct vcpu *);
 void vcpu_show_registers(const struct vcpu *);
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1524,6 +1524,16 @@ void force_update_vcpu_system_time(struc
 __update_vcpu_system_time(v, 1);
 }
 
+void force_update_secondary_system_time(struct vcpu *v,
+struct vcpu_time_info *map)
+{
+struct vcpu_time_info u;
+
+collect_time_info(v, &u);
+u.version = -1; /* Compensate for version_update_end(). */
+write_time_guest_area(map, &u);
+}
+
 static void update_domain_rtc(void)
 {
 struct domain *d;
--- a/xen/arch/x86/x86_64/domain.c
+++ b/xen/arch/x86/x86_64/domain.c
@@ -115,6 +115,7 @@ compat_vcpu_op(int cmd, unsigned int vcp
 
 case VCPUOP_send_nmi:
 case VCPUOP_get_physid:
+case VCPUOP_register_vcpu_time_phys_area:
 rc = do_vcpu_op(cmd, vcpuid, arg);
 break;
 
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -230,6 +230,7 @@ DEFINE_XEN_GUEST_HANDLE(vcpu_register_ti
  * VMASST_TYPE_runstate_update_flag engaged by the domain.
  */
 #define VCPUOP_register_runstate_phys_area  14
+#define VCPUOP_register_vcpu_time_phys_area 15
 
 #endif /* __XEN_PUBLIC_VCPU_H__ */
 




[PATCH v2 8/8] common: convert vCPU info area registration

2023-01-23 Thread Jan Beulich
Switch to using map_guest_area(). Noteworthy differences from
map_vcpu_info():
- remote vCPU-s are paused rather than checked for being down (which in
  principle can change right after the check),
- the domain lock is taken for a much smaller region,
- the error code for an attempt to re-register the area is now -EBUSY,
- we could in principle permit de-registration when no area was
  previously registered (which would permit "probing", if necessary for
  anything).

Note that this eliminates a bug in copy_vcpu_settings(): The function
did allocate a new page regardless of the GFN already having a mapping,
thus in particular breaking the case of two vCPU-s having their info
areas on the same page.

Signed-off-by: Jan Beulich 
---
RFC: I'm not really certain whether the preliminary check (ahead of
 calling map_guest_area()) is worthwhile to have.
---
v2: Re-base over changes earlier in the series. Properly enforce no re-
registration. Avoid several casts by introducing local variables.

--- a/xen/arch/x86/include/asm/shared.h
+++ b/xen/arch/x86/include/asm/shared.h
@@ -26,17 +26,20 @@ static inline void arch_set_##field(stru
 #define GET_SET_VCPU(type, field)   \
 static inline type arch_get_##field(const struct vcpu *v)   \
 {   \
+const vcpu_info_t *vi = v->vcpu_info_area.map;  \
+\
 return !has_32bit_shinfo(v->domain) ?   \
-   v->vcpu_info->native.arch.field :\
-   v->vcpu_info->compat.arch.field; \
+   vi->native.arch.field : vi->compat.arch.field;   \
 }   \
 static inline void arch_set_##field(struct vcpu *v, \
 type val)   \
 {   \
+vcpu_info_t *vi = v->vcpu_info_area.map;\
+\
 if ( !has_32bit_shinfo(v->domain) ) \
-v->vcpu_info->native.arch.field = val;  \
+vi->native.arch.field = val;\
 else\
-v->vcpu_info->compat.arch.field = val;  \
+vi->compat.arch.field = val;\
 }
 
 #else
@@ -57,12 +60,16 @@ static inline void arch_set_##field(stru
 #define GET_SET_VCPU(type, field)   \
 static inline type arch_get_##field(const struct vcpu *v)   \
 {   \
-return v->vcpu_info->arch.field;\
+const vcpu_info_t *vi = v->vcpu_info_area.map;  \
+\
+return vi->arch.field;  \
 }   \
 static inline void arch_set_##field(struct vcpu *v, \
 type val)   \
 {   \
-v->vcpu_info->arch.field = val; \
+vcpu_info_t *vi = v->vcpu_info_area.map;\
+\
+vi->arch.field = val;   \
 }
 
 #endif
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1758,53 +1758,24 @@ static int copy_vpmu(struct vcpu *d_vcpu
 static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
 {
 unsigned int i;
-struct p2m_domain *p2m = p2m_get_hostp2m(cd);
 int ret = -EINVAL;
 
 for ( i = 0; i < cd->max_vcpus; i++ )
 {
 struct vcpu *d_vcpu = d->vcpu[i];
 struct vcpu *cd_vcpu = cd->vcpu[i];
-mfn_t vcpu_info_mfn;
 
 if ( !d_vcpu || !cd_vcpu )
 continue;
 
-/* Copy & map in the vcpu_info page if the guest uses one */
-vcpu_info_mfn = d_vcpu->vcpu_info_mfn;
-if ( !mfn_eq(vcpu_info_mfn, INVALID_MFN) )
-{
-mfn_t new_vcpu_info_mfn = cd_vcpu->vcpu_info_mfn;
-
-/* Allocate & map the page for it if it hasn't been already */
-if ( mfn_eq(new_vcpu_info_mfn, INVALID_MFN) )
-{
-gfn_t gfn = mfn_to_gfn(d, vcpu_info_mfn);
-unsigned long gfn_l = gfn_x(gfn);
-struct page_info *page;
-
-if ( !(page = alloc_domheap_page(cd, 0)) )
-return -ENOMEM;
-
-new_vcpu_info_mfn = page_to_mfn(page);
-set_gpfn_from_mfn(mfn_x(new_vcpu_info_mfn), gfn_l);
-
-ret = p2m->set_entry(p2m, gfn, new_vcpu_info_mfn,
-

Re: [PATCH v2] x86/ucode/AMD: apply the patch early on every logical thread

2023-01-23 Thread Jan Beulich
On 23.01.2023 15:32, Sergey Dyasli wrote:
> On Mon, Jan 16, 2023 at 2:47 PM Jan Beulich  wrote:
>> On 11.01.2023 15:23, Sergey Dyasli wrote:
>>> --- a/xen/arch/x86/cpu/microcode/amd.c
>>> +++ b/xen/arch/x86/cpu/microcode/amd.c
>>> @@ -176,8 +176,13 @@ static enum microcode_match_result compare_revisions(
>>>  if ( new_rev > old_rev )
>>>  return NEW_UCODE;
>>>
>>> -if ( opt_ucode_allow_same && new_rev == old_rev )
>>> -return NEW_UCODE;
>>> +if ( new_rev == old_rev )
>>> +{
>>> +if ( opt_ucode_allow_same )
>>> +return NEW_UCODE;
>>> +else
>>> +return SAME_UCODE;
>>> +}
>>
>> I find this misleading: "same" should not depend on the command line
>> option.
> 
> The alternative diff I was considering is this:
> 
> --- a/xen/arch/x86/cpu/microcode/amd.c
> +++ b/xen/arch/x86/cpu/microcode/amd.c
> @@ -179,6 +179,9 @@ static enum microcode_match_result compare_revisions(
>  if ( opt_ucode_allow_same && new_rev == old_rev )
>  return NEW_UCODE;
> 
> +if ( new_rev == old_rev )
> +return SAME_UCODE;
> +
>  return OLD_UCODE;
>  }
> 
> Do you think the logic is clearer this way? Or should I simply remove
> "else" from the first diff above?

Neither addresses my comment. I think the command line option check
needs to move out of this function, into ...

>> In fact the command line option should affect only the cases
>> where ucode is actually to be loaded; it should not affect cases where
>> the check is done merely to know whether the cache needs updating.

... some (but not all) of the callers.
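
For illustration only, a hypothetical shape of such a split (the actual
placement is of course up to the patch author): compare_revisions() would
report plain facts,

    static enum microcode_match_result compare_revisions(
        uint32_t old_rev, uint32_t new_rev)
    {
        if ( new_rev > old_rev )
            return NEW_UCODE;
        if ( new_rev == old_rev )
            return SAME_UCODE;
        return OLD_UCODE;
    }

with a loading call site then deciding what it accepts:

    enum microcode_match_result result = compare_revisions(old_rev, new_rev);

    if ( result == NEW_UCODE ||
         (opt_ucode_allow_same && result == SAME_UCODE) )
        /* proceed with loading */;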

Jan



Re: [PATCH v1 07/14] xen/riscv: introduce exception handlers implementation

2023-01-23 Thread Oleksii
On Mon, 2023-01-23 at 12:17 +0100, Jan Beulich wrote:
> On 20.01.2023 15:59, Oleksii Kurochko wrote:
> > --- /dev/null
> > +++ b/xen/arch/riscv/entry.S
> > @@ -0,0 +1,97 @@
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +    .global handle_exception
> > +    .align 4
> > +
> > +handle_exception:
> > +
> > +    /* Exceptions from xen */
> > +save_to_stack:
> > +    /* Save context to stack */
> > +    REG_S   sp, (RISCV_CPU_USER_REGS_OFFSET(sp) -
> > RISCV_CPU_USER_REGS_SIZE) (sp)
> > +    addi    sp, sp, -RISCV_CPU_USER_REGS_SIZE
> > +    REG_S   t0, RISCV_CPU_USER_REGS_OFFSET(t0)(sp)
> > +    j   save_context
> > +
> > +save_context:
> 
> Just curious: Why not simply fall through here, i.e. why the J which
> really
> is a NOP in this case?
> 
There is no specific reason; I left it for the future.
I will remove it in the next patch version.
> Jan




Re: [PATCH v2 1/3] x86/shadow: move dm-mmio handling code in sh_page_fault()

2023-01-23 Thread Jan Beulich
On 23.01.2023 15:26, Jan Beulich wrote:
> Do away with the partly mis-named "mmio" label there, which really is
> only about emulated MMIO. Move the code to the place where the sole
> "goto" was. Re-order steps slightly: Assertion first, perfc increment
> outside of the locked region, and "gpa" calculation closer to the first
> use of the variable. Also make the HVM conditional cover the entire
> if(), as p2m_mmio_dm isn't applicable to PV; specifically get_gfn()
> won't ever return this type for PV domains.
> 
> Signed-off-by: Jan Beulich 
> ---
> v2: New.
> 
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c

I've sent a stale patch, I'm sorry. This further hunk is needed to keep
!HVM builds working:

@@ -2144,8 +2144,8 @@ static int cf_check sh_page_fault(
 gfn_t gfn = _gfn(0);
 mfn_t gmfn, sl1mfn = _mfn(0);
 shadow_l1e_t sl1e, *ptr_sl1e;
-paddr_t gpa;
 #ifdef CONFIG_HVM
+paddr_t gpa;
 struct sh_emulate_ctxt emul_ctxt;
 const struct x86_emulate_ops *emul_ops;
 int r;

Jan

> @@ -2588,13 +2588,33 @@ static int cf_check sh_page_fault(
>  goto emulate;
>  }
>  
> +#ifdef CONFIG_HVM
> +
>  /* Need to hand off device-model MMIO to the device model */
>  if ( p2mt == p2m_mmio_dm )
>  {
> +ASSERT(is_hvm_vcpu(v));
> +if ( !guest_mode(regs) )
> +goto not_a_shadow_fault;
> +
> +sh_audit_gw(v, &gw);
>  gpa = guest_walk_to_gpa(&gw);
> -goto mmio;
> +SHADOW_PRINTK("mmio %#"PRIpaddr"\n", gpa);
> +shadow_audit_tables(v);
> +sh_reset_early_unshadow(v);
> +
> +paging_unlock(d);
> +put_gfn(d, gfn_x(gfn));
> +
> +perfc_incr(shadow_fault_mmio);
> +trace_shadow_gen(TRC_SHADOW_MMIO, va);
> +
> +return handle_mmio_with_translation(va, gpa >> PAGE_SHIFT, access)
> +   ? EXCRET_fault_fixed : 0;
>  }
>  
> +#endif /* CONFIG_HVM */
> +
>  /* Ignore attempts to write to read-only memory. */
>  if ( p2m_is_readonly(p2mt) && (ft == ft_demand_write) )
>  goto emulate_readonly; /* skip over the instruction */
> @@ -2867,25 +2887,6 @@ static int cf_check sh_page_fault(
>  return EXCRET_fault_fixed;
>  #endif /* CONFIG_HVM */
>  
> - mmio:
> -if ( !guest_mode(regs) )
> -goto not_a_shadow_fault;
> -#ifdef CONFIG_HVM
> -ASSERT(is_hvm_vcpu(v));
> -perfc_incr(shadow_fault_mmio);
> -sh_audit_gw(v, &gw);
> -SHADOW_PRINTK("mmio %#"PRIpaddr"\n", gpa);
> -shadow_audit_tables(v);
> -sh_reset_early_unshadow(v);
> -paging_unlock(d);
> -put_gfn(d, gfn_x(gfn));
> -trace_shadow_gen(TRC_SHADOW_MMIO, va);
> -return (handle_mmio_with_translation(va, gpa >> PAGE_SHIFT, access)
> -? EXCRET_fault_fixed : 0);
> -#else
> -BUG();
> -#endif
> -
>   not_a_shadow_fault:
>  sh_audit_gw(v, &gw);
>  SHADOW_PRINTK("not a shadow fault\n");
> 
> 




Re: [PATCH v1 07/14] xen/riscv: introduce exception handlers implementation

2023-01-23 Thread Oleksii
On Mon, 2023-01-23 at 11:50 +, Andrew Cooper wrote:
> On 20/01/2023 2:59 pm, Oleksii Kurochko wrote:
> > diff --git a/xen/arch/riscv/entry.S b/xen/arch/riscv/entry.S
> > new file mode 100644
> > index 00..f7d46f42bb
> > --- /dev/null
> > +++ b/xen/arch/riscv/entry.S
> > @@ -0,0 +1,97 @@
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +    .global handle_exception
> > +    .align 4
> > +
> > +handle_exception:
> 
> ENTRY() which takes care of the global and the align.
> 
> Also, you want a size and type at the end, just like in head.S 
> (Sorry,
> we *still* don't have any sane infrastructure for doing that nicely. 
> Opencode it for now.)
> 
> > +
> > +    /* Exceptions from xen */
> > +save_to_stack:
> 
> This label isn't used at all, is it?
> 
> > +    /* Save context to stack */
> > +    REG_S   sp, (RISCV_CPU_USER_REGS_OFFSET(sp) -
> > RISCV_CPU_USER_REGS_SIZE) (sp)
> > +    addi    sp, sp, -RISCV_CPU_USER_REGS_SIZE
> > +    REG_S   t0, RISCV_CPU_USER_REGS_OFFSET(t0)(sp)
> 
> Exceptions on RISC-V don't adjust the stack pointer.  This logic
> depends
> on interrupting Xen code, and Xen not having suffered a stack
> overflow
> (and actually, that the space on the stack for all registers also
> doesn't overflow).
> 
> Which might be fine for now, but I think it warrants a comment
> somewhere
> (probably at handle_exception itself) stating the expectations while
> it's still a work in progress.  So in this case something like:
> 
> /* Work-in-progress:  Depends on interrupting Xen, and the stack
> being
> good. */
> 
> 
> But, do we want to allocate stemp right away (even with an empty
> struct), and get tp set up properly?
> 
I am not sure I get your point about stemp here. Could you please clarify
a little bit?

> That said, aren't we going to have to rewrite this when enabling H
> mode
> anyway?
I based this code on code from Bobby's repo (on top of which, with some
additional patches, I've successfully run Dom0), so I am not sure that
it will be rewritten.
I probably don't understand which part you are talking about.

Regarding H mode, to be honest I didn't see where the switch to it
happens.
Maybe Bobby or Alistair can explain it to me?
> 
> > +    j   save_context
> > +
> > +save_context:
> 
> I'd drop this.  It's a nop right now.
> 
> > 
> > +    csrr    t0, CSR_SEPC
> > +    REG_S   t0, RISCV_CPU_USER_REGS_OFFSET(sepc)(sp)
> > +    csrr    t0, CSR_SSTATUS
> > +    REG_S   t0, RISCV_CPU_USER_REGS_OFFSET(sstatus)(sp)
> 
> So something I've noticed about CSRs through this series.
> 
> The C CSR macros are set up to use real CSR names, but the CSR_*
> constants used in C and ASM are raw numbers.
> 
> If we're using raw numbers, then the C CSR accessors should be static
> inlines instead, but the advantage of using names is the toolchain
> can
> issue an error when we reference a CSR not supported by the current
> extensions.
> 
> We ought to use a single form, consistently through Xen.  How
> feasible
> will it be to use names throughout?
> 
> ~Andrew
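
(As a sketch of the named-CSR direction, hypothetical and untested: a C
accessor using the CSR name lets the assembler reject CSRs unknown to the
selected extensions.)

    static inline unsigned long csr_read_sstatus(void)
    {
        unsigned long val;

        asm volatile ( "csrr %0, sstatus" : "=r" (val) );
        return val;
    }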




Re: [PATCH v2 1/9] x86/shadow: replace sh_reset_l3_up_pointers()

2023-01-23 Thread George Dunlap
On Mon, Jan 23, 2023 at 8:41 AM Jan Beulich  wrote:

> On 20.01.2023 18:02, George Dunlap wrote:
> > On Wed, Jan 11, 2023 at 1:52 PM Jan Beulich  wrote:
> >
> >> Rather than doing a separate hash walk (and then even using the vCPU
> >> variant, which is to go away), do the up-pointer-clearing right in
> >> sh_unpin(), as an alternative to the (now further limited) enlisting on
> >> a "free floating" list fragment. This utilizes the fact that such list
> >> fragments are traversed only for multi-page shadows (in shadow_free()).
> >> Furthermore sh_terminate_list() is a safe guard only anyway, which isn't
> >> in use in the common case (it actually does anything only for BIGMEM
> >> configurations).
> >
> > One thing that seems strange about this patch is that you're essentially
> > adding a field to the domain shadow struct in lieu of adding another
> > another argument to sh_unpin() (unless the bit is referenced elsewhere in
> > subsequent patches, which I haven't reviewed, in part because about half
> of
> > them don't apply cleanly to the current tree).
>
> Well, to me adding another parameter to sh_unpin() would have looked odd;
> the new field looks slightly cleaner to me. But changing that is merely a
> matter of taste, so if you and e.g. Andrew think that approach was better,
> I could switch to that. And no, I don't foresee further uses of the field.
>

You're about to call sh_unpin(), and you want to tell that function to
change its behavior.  What's so odd about adding an argument to the
function to indicate the behavior?  Instead you're adding a bit of global
state which is carried around 100% of the time, even when that function
isn't being called.  That's not what people normally expect; it makes the
code harder to reason about.

It would certainly be ugly to have to add "false" to every other instance
of sh_unpin; but the normal way you get around that is to redefine
sh_unpin() as a wrapper which calls the other function with the 'false'
argument set.
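
Something like this (hypothetical names, just to sketch the shape):

    static void _sh_unpin(struct domain *d, mfn_t smfn, bool clear_up);

    static inline void sh_unpin(struct domain *d, mfn_t smfn)
    {
        _sh_unpin(d, smfn, false);
    }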

You asked me to review this for a second opinion on the safety of clearing
the up-pointer this way, not because you need an ack; so I don't really
want to block the patch for non-functional reasons.  But I think this is
one of the "death by a thousand cuts" that makes the shadow code more
fragile and difficult for new people to approach and understand.

Re the original question: I've stared at the code for a bit now, and I
can't see anything obviously wrong or dangerous about it.

But it does make me ask, why do we need the "unpinning_l3" pseudo-argument
at all?  Is there any reason not to unconditionally zero out sp->up when we
find a head_type of SH_type_l3_64_shadow?  As far as I can tell, sp->list
doesn't require any special state.  Why do we make the effort to leave it
alone when we're not unpinning all l3s?
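I.e. (a sketch, modulo the exact field names):

    if ( sp->u.sh.head && sp->u.sh.type == SH_type_l3_64_shadow )
        sp->up = 0;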

In fact, is there a way to unpin an l3 shadow *other* than when we're
unpinning all l3's?  If so, then this patch, as written, is broken -- the
original code clears the up-pointer for *all* L3_64 shadows, regardless of
whether they're on the pinned list; the new patch will only clear the ones
on the pinned list.  But unconditionally clearing sp->up could actually fix
that.

Thoughts?

As to half of the patches not applying: Some where already applied out of
> order, and others therefore need re-basing slightly. Till now I saw no
> reason to re-send the remaining patches just for that.
>

Sorry if that sounded like complaining; I was only being preemptively
defensive against the potential accusation that the answer would have been
obvious if I'd just continued reviewing the series. :-). (And indeed if the
whole series had applied I would have checked that the final result didn't
have any other references to it.)

 -George


[PATCH v4 00/11] Arm cache coloring

2023-01-23 Thread Carlo Nonato
Shared caches in multi-core CPU architectures represent a problem for
predictability of memory access latency. This jeopardizes the applicability
of many Arm platforms in real-time critical and mixed-criticality
scenarios. We introduce support for cache partitioning with page
coloring, a transparent software technique that enables isolation
between domains and Xen, and thus avoids cache interference.

When creating a domain, a simple syntax (e.g. `0-3` or `4-11`) allows
the user to define assignments of cache partition ids, called colors,
where assigning different colors guarantees that no mutual eviction in the
cache will ever happen. This instructs the Xen memory allocator to provide
the i-th color assignee only with pages that map to color i, i.e. that
are indexed in the i-th cache partition.
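
(As a rough illustration of the address-to-color mapping, a sketch
assuming way size and page size are powers of two, mirroring the
addr_to_color() logic introduced later in the series:)

    /* Color of the page containing physical address pa. */
    static unsigned int page_color(paddr_t pa, paddr_t llc_way_size)
    {
        unsigned int nr_colors = llc_way_size >> PAGE_SHIFT;

        return (pa >> PAGE_SHIFT) & (nr_colors - 1);
    }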

The proposed implementation supports the dom0less feature.
The proposed implementation doesn't support the static-mem feature.
The solution has been tested in several scenarios, including Xilinx Zynq
MPSoCs.

v4 global changes:
- added "llc" acronym (Last Level Cache) in multiple places in code
  (e.g. coloring.{c|h} -> llc_coloring.{c|h}) to better describe the
  feature and to remove ambiguity with too generic "colors". "llc" is also
  shorter than "cache"
- reordered the patches again since the code is now split into common + arch

Carlo Nonato (8):
  xen/common: add cache coloring common code
  xen/arm: add cache coloring initialization
  xen: extend domctl interface for cache coloring
  tools: add support for cache coloring configuration
  xen/arm: add support for cache coloring configuration via device-tree
  xen/arm: use colored allocator for p2m page tables
  Revert "xen/arm: Remove unused BOOT_RELOC_VIRT_START"
  xen/arm: add cache coloring support for Xen

Luca Miccio (3):
  xen/arm: add Dom0 cache coloring support
  xen: add cache coloring allocator for domains
  xen/arm: add Xen cache colors command line parameter

 docs/man/xl.cfg.5.pod.in|  10 +
 docs/misc/arm/cache-coloring.rst| 223 +
 docs/misc/arm/device-tree/booting.txt   |   4 +
 docs/misc/xen-command-line.pandoc   |  61 
 tools/libs/ctrl/xc_domain.c |  17 +
 tools/libs/light/libxl_create.c |   2 +
 tools/libs/light/libxl_types.idl|   1 +
 tools/xl/xl_parse.c |  38 ++-
 xen/arch/Kconfig|  29 ++
 xen/arch/arm/Kconfig|   1 +
 xen/arch/arm/Makefile   |   1 +
 xen/arch/arm/alternative.c  |   9 +-
 xen/arch/arm/arm64/head.S   |  50 +++
 xen/arch/arm/arm64/mm.c |  26 +-
 xen/arch/arm/domain_build.c |  35 ++-
 xen/arch/arm/include/asm/config.h   |   4 +-
 xen/arch/arm/include/asm/llc_coloring.h |  65 
 xen/arch/arm/include/asm/mm.h   |  10 +-
 xen/arch/arm/include/asm/processor.h|  16 +
 xen/arch/arm/llc_coloring.c | 397 
 xen/arch/arm/mm.c   |  95 +-
 xen/arch/arm/p2m.c  |  11 +-
 xen/arch/arm/psci.c |   9 +-
 xen/arch/arm/setup.c|  82 -
 xen/arch/arm/smpboot.c  |   9 +-
 xen/arch/arm/xen.lds.S  |   2 +-
 xen/common/Kconfig  |   3 +
 xen/common/domain.c |  23 +-
 xen/common/domctl.c |  12 +-
 xen/common/keyhandler.c |   4 +
 xen/common/page_alloc.c | 247 +--
 xen/include/public/domctl.h |   6 +-
 xen/include/xen/llc_coloring.h  |  63 
 xen/include/xen/mm.h|  33 ++
 xen/include/xen/sched.h |   9 +
 35 files changed, 1552 insertions(+), 55 deletions(-)
 create mode 100644 docs/misc/arm/cache-coloring.rst
 create mode 100644 xen/arch/arm/include/asm/llc_coloring.h
 create mode 100644 xen/arch/arm/llc_coloring.c
 create mode 100644 xen/include/xen/llc_coloring.h

-- 
2.34.1




[PATCH v4 03/11] xen/arm: add Dom0 cache coloring support

2023-01-23 Thread Carlo Nonato
From: Luca Miccio 

This commit allows the user to set the cache coloring configuration for
Dom0 via a command line parameter.
Since cache coloring and static memory are incompatible, direct mapping
Dom0 isn't possible when coloring is enabled.

A common configuration syntax for cache colors is also introduced here.
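
(Illustrative usage of the dom0-llc-colors parameter added here, with
arbitrarily chosen values, as also shown in the documentation example
later in the series: booting with `dom0-llc-colors=2-6` on the Xen
command line assigns colors 2 through 6 to Dom0.)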

Signed-off-by: Luca Miccio 
Signed-off-by: Marco Solieri 
Signed-off-by: Carlo Nonato 
---
v4:
- dom0 colors are dynamically allocated as for any other domain
  (colors are duplicated in dom0_colors and in the new array, but logic
  is simpler)
---
 docs/misc/arm/cache-coloring.rst| 32 ++---
 xen/arch/arm/domain_build.c | 17 +++--
 xen/arch/arm/include/asm/llc_coloring.h |  4 
 xen/arch/arm/llc_coloring.c | 14 +++
 4 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/docs/misc/arm/cache-coloring.rst b/docs/misc/arm/cache-coloring.rst
index 0244d2f606..c2e0e87426 100644
--- a/docs/misc/arm/cache-coloring.rst
+++ b/docs/misc/arm/cache-coloring.rst
@@ -83,12 +83,38 @@ manually set the way size it's left for the user to 
overcome failing situations
 or for debugging/testing purposes. See `Coloring parameters and domain
 configurations`_ section for more information on that.
 
+Colors selection format
+***
+
+Regardless of the memory pool that has to be colored (Xen, Dom0/DomUs),
+the color selection can be expressed using the same syntax. In particular a
+comma-separated list of colors or ranges of colors is used.
+Ranges are hyphen-separated intervals (such as `0-4`) and are inclusive on both
+sides.
+
+Note that:
+ - no spaces are allowed between values.
+ - no overlapping ranges or duplicated colors are allowed.
+ - values must be written in ascending order.
+
+Examples:
+
++-+---+
+|**Configuration**|**Actual selection**   |
++-+---+
+|  1-2,5-8| [1, 2, 5, 6, 7, 8]|
++-+---+
+|  4-8,10,11,12   | [4, 5, 6, 7, 8, 10, 11, 12]   |
++-+---+
+|  0  | [0]   |
++-+---+
+
 Coloring parameters and domain configurations
 *
 
-LLC way size (as previously discussed) can be set using the appropriate command
-line parameter. See the relevant documentation in
-"docs/misc/xen-command-line.pandoc".
+LLC way size (as previously discussed) and Dom0 colors can be set using the
+appropriate command line parameters. See the relevant documentation
+in "docs/misc/xen-command-line.pandoc".
 
 **Note:** If no color configuration is provided for a domain, the default one,
 which corresponds to all available colors, is used instead.
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index f35f4d2456..093d4ad6f6 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -4014,7 +4015,10 @@ static int __init construct_dom0(struct domain *d)
 /* type must be set before allocate_memory */
 d->arch.type = kinfo.type;
 #endif
-allocate_memory_11(d, &kinfo);
+if ( is_domain_llc_colored(d) )
+allocate_memory(d, &kinfo);
+else
+allocate_memory_11(d, &kinfo);
 find_gnttab_region(d, &kinfo);
 
 #ifdef CONFIG_STATIC_SHM
@@ -4060,6 +4064,8 @@ void __init create_dom0(void)
 .max_maptrack_frames = -1,
 .grant_opts = XEN_DOMCTL_GRANT_version(opt_gnttab_max_version),
 };
+unsigned int *llc_colors = NULL;
+unsigned int num_llc_colors = 0, flags = CDF_privileged;
 
 /* The vGIC for DOM0 is exactly emulating the hardware GIC */
 dom0_cfg.arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE;
@@ -4076,7 +4082,14 @@ void __init create_dom0(void)
 if ( iommu_enabled )
 dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
 
-dom0 = domain_create(0, &dom0_cfg, CDF_privileged | CDF_directmap);
+if ( llc_coloring_enabled )
+llc_colors = dom0_llc_colors(&num_llc_colors);
+else
+flags |= CDF_directmap;
+
+dom0 = domain_create_llc_colored(0, &dom0_cfg, flags, llc_colors,
+ num_llc_colors);
+
 if ( IS_ERR(dom0) || (alloc_dom0_vcpu0(dom0) == NULL) )
 panic("Error creating domain 0\n");
 
diff --git a/xen/arch/arm/include/asm/llc_coloring.h 
b/xen/arch/arm/include/asm/llc_coloring.h
index c7985c8fd0..382ff7de47 100644
--- a/xen/arch/arm/include/asm/llc_coloring.h
+++ b/xen/arch/arm/include/asm/llc_coloring.h
@@ -17,9 +17,13 @@
 
 bool __init llc_coloring_init(void);
 
+unsigned int *dom0_llc_colors(unsigned int *num_colors);
+
 #else /* !CONFIG_LLC_COLORING */
 
 static inline bool __i

[PATCH v4 06/11] xen/arm: add support for cache coloring configuration via device-tree

2023-01-23 Thread Carlo Nonato
This commit adds the "llc-colors" Device Tree attribute that can be used
for DomU and Dom0less color configurations. The syntax is the same as the
one used for every other color configuration.

Based on original work from: Luca Miccio 

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
---
 docs/misc/arm/cache-coloring.rst| 43 +
 docs/misc/arm/device-tree/booting.txt   |  4 +++
 xen/arch/arm/domain_build.c | 18 ++-
 xen/arch/arm/include/asm/llc_coloring.h |  3 ++
 xen/arch/arm/llc_coloring.c | 10 ++
 5 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/docs/misc/arm/cache-coloring.rst b/docs/misc/arm/cache-coloring.rst
index c2e0e87426..a28f75dc26 100644
--- a/docs/misc/arm/cache-coloring.rst
+++ b/docs/misc/arm/cache-coloring.rst
@@ -116,6 +116,49 @@ LLC way size (as previously discussed) and Dom0 colors can 
be set using the
 appropriate command line parameters. See the relevant documentation
 in "docs/misc/xen-command-line.pandoc".
 
+DomU colors can be set either in the xl configuration file (relevant
+documentation at "docs/man/xl.cfg.5.pod.in") or via Device Tree, also for
+Dom0less configurations (relevant documentation in
+"docs/misc/arm/device-tree/booting.txt"), as in the following example:
+
+.. raw:: html
+
+
+xen,xen-bootargs = "console=dtuart dtuart=serial0 dom0_mem=1G 
dom0_max_vcpus=1 sched=null llc-coloring=on llc-way-size=64K xen-llc-colors=0-1 
dom0-llc-colors=2-6";
+xen,dom0-bootargs "console=hvc0 earlycon=xen earlyprintk=xen 
root=/dev/ram0"
+
+dom0 {
+compatible = "xen,linux-zimage" "xen,multiboot-module";
+reg = <0x0 0x100 0x0 15858176>;
+};
+
+dom0-ramdisk {
+compatible = "xen,linux-initrd" "xen,multiboot-module";
+reg = <0x0 0x200 0x0 20638062>;
+};
+
+domU0 {
+#address-cells = <0x1>;
+#size-cells = <0x1>;
+compatible = "xen,domain";
+memory = <0x0 0x4>;
+llc-colors = "4-8,10,11,12";
+cpus = <0x1>;
+vpl011 = <0x1>;
+
+module@200 {
+compatible = "multiboot,kernel", "multiboot,module";
+reg = <0x200 0xff>;
+bootargs = "console=ttyAMA0";
+};
+
+module@3000 {
+compatible = "multiboot,ramdisk", "multiboot,module";
+reg = <0x300 0xff>;
+};
+};
+
+
 **Note:** If no color configuration is provided for a domain, the default one,
 which corresponds to all available colors, is used instead.
 
diff --git a/docs/misc/arm/device-tree/booting.txt 
b/docs/misc/arm/device-tree/booting.txt
index 3879340b5e..ad71c16b00 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -162,6 +162,10 @@ with the following properties:
 
 An integer specifying the number of vcpus to allocate to the guest.
 
+- llc-colors
+A string specifying the LLC color configuration for the guest.
+Refer to "docs/misc/arm/cache_coloring.rst" for syntax.
+
 - vpl011
 
 An empty property to enable/disable a virtual pl011 for the guest to
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 093d4ad6f6..2c1307d349 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -3854,6 +3854,8 @@ void __init create_domUs(void)
 struct dt_device_node *node;
 const struct dt_device_node *cpupool_node,
 *chosen = dt_find_node_by_path("/chosen");
+const char *llc_colors_str;
+unsigned int *llc_colors = NULL, num_llc_colors = 0;
 
 BUG_ON(chosen == NULL);
 dt_for_each_child_node(chosen, node)
@@ -3960,12 +3962,26 @@ void __init create_domUs(void)
 d_cfg.max_maptrack_frames = val;
 }
 
+if ( !dt_property_read_string(node, "llc-colors", &llc_colors_str) )
+{
+if ( !llc_coloring_enabled )
+printk(XENLOG_WARNING
+   "'llc-colors' found, but LLC coloring is disabled\n");
+else if ( dt_find_property(node, "xen,static-mem", NULL) )
+panic("static-mem and LLC coloring are incompatible\n");
+else
+llc_colors = llc_colors_from_str(llc_colors_str,
+ &num_llc_colors);
+}
+
 /*
  * The variable max_init_domid is initialized with zero, so here it's
  * very important to use the pre-increment operator to call
  * domain_create() with a domid > 0. (domid == 0 is reserved for Dom0)
  */
-d = domain_create(++max_init_domid, &d_cfg, flags);
+d = domain_create_llc_colored(++max_init_domid, &d_cfg, flags,
+  llc_colors, num_llc_colors);
+
 if ( IS_ERR(d) )
 panic("Error creating dom

[PATCH v4 05/11] tools: add support for cache coloring configuration

2023-01-23 Thread Carlo Nonato
Add a new "llc_colors" parameter that defines the LLC color assignment for
a domain. The user can specify one or more color ranges using the same
syntax used everywhere else for color configuration, as described in the
documentation.
The parameter is defined as a list of strings that represent the color
ranges.
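
For example, an illustrative xl configuration snippet using the list
syntax this patch parses (values chosen arbitrarily):

    llc_colors = [ "0-3", "6" ]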

Documentation is also added.

Based on original work from: Luca Miccio 

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
---
v4:
- removed overlapping color ranges checks during parsing
- moved hypercall buffer initialization in libxenctrl
---
 docs/man/xl.cfg.5.pod.in | 10 +
 tools/libs/ctrl/xc_domain.c  | 17 ++
 tools/libs/light/libxl_create.c  |  2 ++
 tools/libs/light/libxl_types.idl |  1 +
 tools/xl/xl_parse.c  | 38 +++-
 5 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 024bceeb61..96f9249c3d 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -2903,6 +2903,16 @@ Currently, only the "sbsa_uart" model is supported for 
ARM.
 
 =back
 
+=over 4
+
+=item B
+
+Specify the Last Level Cache (LLC) color configuration for the guest.
+B can be either a single color value or a hyphen-separated closed
+interval of colors (such as "0-4").
+
+=back
+
 =head3 x86
 
 =over 4
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index e939d07157..064f54c349 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -28,6 +28,20 @@ int xc_domain_create(xc_interface *xch, uint32_t *pdomid,
 {
 int err;
 DECLARE_DOMCTL;
+DECLARE_HYPERCALL_BUFFER(uint32_t, llc_colors);
+
+if ( config->num_llc_colors )
+{
+size_t bytes = sizeof(uint32_t) * config->num_llc_colors;
+
+llc_colors = xc_hypercall_buffer_alloc(xch, llc_colors, bytes);
+if ( llc_colors == NULL ) {
+PERROR("Could not allocate LLC colors for xc_domain_create");
+return -ENOMEM;
+}
+memcpy(llc_colors, config->llc_colors.p, bytes);
+set_xen_guest_handle(config->llc_colors, llc_colors);
+}
 
 domctl.cmd = XEN_DOMCTL_createdomain;
 domctl.domain = *pdomid;
@@ -39,6 +53,9 @@ int xc_domain_create(xc_interface *xch, uint32_t *pdomid,
 *pdomid = (uint16_t)domctl.domain;
 *config = domctl.u.createdomain;
 
+if ( llc_colors )
+xc_hypercall_buffer_free(xch, llc_colors);
+
 return 0;
 }
 
diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index beec3f6b6f..6d0c768241 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -638,6 +638,8 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config 
*d_config,
 .grant_opts = XEN_DOMCTL_GRANT_version(b_info->max_grant_version),
 .vmtrace_size = ROUNDUP(b_info->vmtrace_buf_kb << 10, 
XC_PAGE_SHIFT),
 .cpupool_id = info->poolid,
+.num_llc_colors = b_info->num_llc_colors,
+.llc_colors.p = b_info->llc_colors,
 };
 
 if (info->type != LIBXL_DOMAIN_TYPE_PV) {
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 0cfad8508d..1f944ca6d7 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -562,6 +562,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
 ("ioports",  Array(libxl_ioport_range, "num_ioports")),
 ("irqs", Array(uint32, "num_irqs")),
 ("iomem",Array(libxl_iomem_range, "num_iomem")),
+("llc_colors",   Array(uint32, "num_llc_colors")),
 ("claim_mode",  libxl_defbool),
 ("event_channels",   uint32),
 ("kernel",   string),
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 853e9f357a..0f8c469fb5 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1297,8 +1297,9 @@ void parse_config_data(const char *config_source,
 XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms,
*usbctrls, *usbdevs, *p9devs, *vdispls, *pvcallsifs_devs;
 XLU_ConfigList *channels, *ioports, *irqs, *iomem, *viridian, *dtdevs,
-   *mca_caps;
+   *mca_caps, *llc_colors;
 int num_ioports, num_irqs, num_iomem, num_cpus, num_viridian, num_mca_caps;
+int num_llc_colors;
 int pci_power_mgmt = 0;
 int pci_msitranslate = 0;
 int pci_permissive = 0;
@@ -1447,6 +1448,41 @@ void parse_config_data(const char *config_source,
 if (!xlu_cfg_get_long (config, "maxmem", &l, 0))
 b_info->max_memkb = l * 1024;
 
+if (!xlu_cfg_get_list(config, "llc_colors", &llc_colors, &num_llc_colors, 
0)) {
+int k, cur_index = 0;
+
+b_info->num_llc_colors = 0;
+for (i = 0; i < num_llc_colors; i++) {
+uint32_t start = 0, end = 0;
+
+buf = xlu_cfg_get_listitem(llc_colors, i);
+if (!buf) {
+

[PATCH v4 02/11] xen/arm: add cache coloring initialization

2023-01-23 Thread Carlo Nonato
This commit implements functions declared in the LLC coloring common header
for arm64 and adds documentation. It also adds two command line options: a
runtime switch for the cache coloring feature and the LLC way size
parameter.

The feature init function consists of auto-probing the cache layout to
retrieve the LLC way size, which is used to compute the number of
platform colors. It also adds a debug-key to dump general cache coloring
info.

The domain init function, instead, allocates default colors if needed and
checks the provided configuration for errors.

Note that until this patch, there are no implemented methods for actually
configuring cache colors for domains and all the configurations fall back
to the default one.
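
(A worked example of the computation, assuming the nr_colors = way_size /
page_size relation described in the added documentation: an 8 MiB, 16-way
LLC has a way size of 8 MiB / 16 = 512 KiB; with 4 KiB pages this yields
512 KiB / 4 KiB = 128 colors, matching the CONFIG_NR_LLC_COLORS default.)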

Based on original work from: Luca Miccio 

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
---
v4:
- added "llc-coloring" cmdline option for the boot-time switch
- dom0 colors are now checked during domain init as for any other domain
- fixed processor.h masks bit width
- check for overflow in parse_color_config()
- check_colors() now checks also that colors are sorted and unique
---
 docs/misc/arm/cache-coloring.rst| 105 +
 docs/misc/xen-command-line.pandoc   |  37 
 xen/arch/arm/Kconfig|   1 +
 xen/arch/arm/Makefile   |   1 +
 xen/arch/arm/include/asm/llc_coloring.h |  36 
 xen/arch/arm/include/asm/processor.h|  16 ++
 xen/arch/arm/llc_coloring.c | 272 
 xen/arch/arm/setup.c|   7 +
 8 files changed, 475 insertions(+)
 create mode 100644 docs/misc/arm/cache-coloring.rst
 create mode 100644 xen/arch/arm/include/asm/llc_coloring.h
 create mode 100644 xen/arch/arm/llc_coloring.c

diff --git a/docs/misc/arm/cache-coloring.rst b/docs/misc/arm/cache-coloring.rst
new file mode 100644
index 00..0244d2f606
--- /dev/null
+++ b/docs/misc/arm/cache-coloring.rst
@@ -0,0 +1,105 @@
+Xen cache coloring user guide
+=
+
+The cache coloring support in Xen allows reserving Last Level Cache (LLC)
+partitions for Dom0, DomUs and Xen itself. Currently only ARM64 is supported.
+
+In order to enable and use it, a few steps are needed.
+
+In Kconfig:
+
+- Enable LLC coloring.
+
+CONFIG_LLC_COLORING=y
+- If needed, change the maximum number of colors (refer to menuconfig help for
+  value meaning and when it should be changed).
+
+CONFIG_NR_LLC_COLORS=
+
+Compile Xen and the toolstack and then:
+
+- Set the `llc-coloring=on` command line option.
+- Set `Coloring parameters and domain configurations`_.
+
+Background
+**
+
+The cache hierarchy of a modern multi-core CPU typically has its first levels
+dedicated to each core (hence using multiple cache units), while the last
+level is shared among all of them. Such a configuration implies that memory
+operations on one core (e.g. running a DomU) are able to generate interference
+on another core (e.g. hosting another DomU). Cache coloring allows eliminating
+this mutual interference, and thus guaranteeing higher and more predictable
+performance for memory accesses.
+The key concept underlying cache coloring is a fragmentation of the memory
+space into a set of sub-spaces called colors that are mapped to disjoint cache
+partitions. Technically, the whole memory space is first divided into a number
+of subsequent regions. Then each region is in turn divided into a number of
+subsequent sub-colors. The generic i-th color is then obtained by all the
+i-th sub-colors in each region.
+
+.. raw:: html
+
+
+Region jRegion j+1
+.   
+. . .
+.   .
+_ _ ___ _ _ _ _
+| | | | | | |
+| c_0 | c_1 | | c_n | c_0 | c_1 |
+   _ _ _|_|_|_ _ _|_|_|_|_ _ _
+:   :
+:   :... ... .
+:color 0
+:... ... .
+:
+  . . ..:
+
+
+There are two pragmatic lessons to be learnt.
+
+1. If one wants to avoid cache interference between two domains, different
+   colors need to be used for their memory.
+
+2. Color assignment must privilege contiguity in the partitioning. E.g.,
+   assigning colors (0,1) to domain I and (2,3) to domain J is better than
+   assigning colors (0,2) to I and (1,3) to J.
+
+How to compute the number of colors
+***
+
+To compute the number of available colors for a specific platform, the size of
+an LLC way and the page size used by Xen must be known. The first parameter can
+be 

[PATCH v4 04/11] xen: extend domctl interface for cache coloring

2023-01-23 Thread Carlo Nonato
This commit updates the domctl interface to allow the user to set cache
coloring configurations from the toolstack.
It also implements the functionality for arm64.

Based on original work from: Luca Miccio 

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
---
v4:
- updated XEN_DOMCTL_INTERFACE_VERSION
---
 xen/arch/arm/llc_coloring.c| 14 ++
 xen/common/domctl.c| 12 +++-
 xen/include/public/domctl.h|  6 +-
 xen/include/xen/llc_coloring.h |  4 
 4 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/llc_coloring.c b/xen/arch/arm/llc_coloring.c
index 51f057d7c9..2d0457cdbc 100644
--- a/xen/arch/arm/llc_coloring.c
+++ b/xen/arch/arm/llc_coloring.c
@@ -10,6 +10,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -275,6 +276,19 @@ unsigned int *dom0_llc_colors(unsigned int *num_colors)
 return colors;
 }
 
+unsigned int *llc_colors_from_guest(struct xen_domctl_createdomain *config)
+{
+unsigned int *colors;
+
+if ( !config->num_llc_colors )
+return NULL;
+
+colors = alloc_colors(config->num_llc_colors);
+copy_from_guest(colors, config->llc_colors, config->num_llc_colors);
+
+return colors;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index ad71ad8a4c..505626ec46 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -409,6 +410,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
u_domctl)
 {
 domid_tdom;
 static domid_t rover = 0;
+unsigned int *llc_colors = NULL, num_llc_colors = 0;
 
 dom = op->domain;
 if ( (dom > 0) && (dom < DOMID_FIRST_RESERVED) )
@@ -434,7 +436,15 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
u_domctl)
 rover = dom;
 }
 
-d = domain_create(dom, &op->u.createdomain, false);
+if ( llc_coloring_enabled )
+{
+llc_colors = llc_colors_from_guest(&op->u.createdomain);
+num_llc_colors = op->u.createdomain.num_llc_colors;
+}
+
+d = domain_create_llc_colored(dom, &op->u.createdomain, false,
+  llc_colors, num_llc_colors);
+
 if ( IS_ERR(d) )
 {
 ret = PTR_ERR(d);
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 51be28c3de..498503 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -21,7 +21,7 @@
 #include "hvm/save.h"
 #include "memory.h"
 
-#define XEN_DOMCTL_INTERFACE_VERSION 0x0015
+#define XEN_DOMCTL_INTERFACE_VERSION 0x0016
 
 /*
  * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
@@ -92,6 +92,10 @@ struct xen_domctl_createdomain {
 /* CPU pool to use; specify 0 or a specific existing pool */
 uint32_t cpupool_id;
 
+/* IN LLC coloring parameters */
+uint32_t num_llc_colors;
+XEN_GUEST_HANDLE(uint32) llc_colors;
+
 struct xen_arch_domainconfig arch;
 };
 
diff --git a/xen/include/xen/llc_coloring.h b/xen/include/xen/llc_coloring.h
index 625930d378..2855f38296 100644
--- a/xen/include/xen/llc_coloring.h
+++ b/xen/include/xen/llc_coloring.h
@@ -24,6 +24,8 @@ int domain_llc_coloring_init(struct domain *d, unsigned int 
*colors,
 void domain_llc_coloring_free(struct domain *d);
 void domain_dump_llc_colors(struct domain *d);
 
+unsigned int *llc_colors_from_guest(struct xen_domctl_createdomain *config);
+
 #else
 
 #define llc_coloring_enabled (false)
@@ -36,6 +38,8 @@ static inline int domain_llc_coloring_init(struct domain *d,
 }
 static inline void domain_llc_coloring_free(struct domain *d) {}
 static inline void domain_dump_llc_colors(struct domain *d) {}
+static inline unsigned int *llc_colors_from_guest(
+struct xen_domctl_createdomain *config) { return NULL; }
 
 #endif /* CONFIG_HAS_LLC_COLORING */
 
-- 
2.34.1




[PATCH v4 09/11] Revert "xen/arm: Remove unused BOOT_RELOC_VIRT_START"

2023-01-23 Thread Carlo Nonato
This reverts commit 0c18fb76323bfb13615b6f13c98767face2d8097 (not clean).

This is not a clean revert because of the rework of the memory layout, but it is
sufficiently similar to a clean one.
The only difference is that the BOOT_RELOC_VIRT_START must match the new
layout.

Cache coloring support for Xen needs to relocate Xen code and data into a new
colored physical space. The BOOT_RELOC_VIRT_START will be used as the virtual
base address for a temporary mapping to this new space.

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
---
 xen/arch/arm/include/asm/config.h | 4 +++-
 xen/arch/arm/mm.c | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/include/asm/config.h 
b/xen/arch/arm/include/asm/config.h
index c5d407a749..5359acd529 100644
--- a/xen/arch/arm/include/asm/config.h
+++ b/xen/arch/arm/include/asm/config.h
@@ -96,7 +96,8 @@
  *   2M -   4M   Xen text, data, bss
  *   4M -   6M   Fixmap: special-purpose 4K mapping slots
  *   6M -  10M   Early boot mapping of FDT
- *  10M -  12M   Livepatch vmap (if compiled in)
+ *  10M -  12M   Early relocation address (used when relocating Xen)
+ *   and later for livepatch vmap (if compiled in)
  *
  *   1G -   2G   VMAP: ioremap and early_ioremap
  *
@@ -133,6 +134,7 @@
 #define BOOT_FDT_VIRT_START (FIXMAP_VIRT_START + FIXMAP_VIRT_SIZE)
 #define BOOT_FDT_VIRT_SIZE  _AT(vaddr_t, MB(4))
 
+#define BOOT_RELOC_VIRT_START   (BOOT_FDT_VIRT_START + BOOT_FDT_VIRT_SIZE)
 #ifdef CONFIG_LIVEPATCH
 #define LIVEPATCH_VMAP_START(BOOT_FDT_VIRT_START + BOOT_FDT_VIRT_SIZE)
 #define LIVEPATCH_VMAP_SIZE_AT(vaddr_t, MB(2))
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index b9c698088b..7015a0f841 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -145,6 +145,7 @@ static void __init __maybe_unused build_assertions(void)
 /* 2MB aligned regions */
 BUILD_BUG_ON(XEN_VIRT_START & ~SECOND_MASK);
 BUILD_BUG_ON(FIXMAP_ADDR(0) & ~SECOND_MASK);
+BUILD_BUG_ON(BOOT_RELOC_VIRT_START & ~SECOND_MASK);
 /* 1GB aligned regions */
 #ifdef CONFIG_ARM_32
 BUILD_BUG_ON(XENHEAP_VIRT_START & ~FIRST_MASK);
-- 
2.34.1




[PATCH v4 01/11] xen/common: add cache coloring common code

2023-01-23 Thread Carlo Nonato
This commit adds the Last Level Cache (LLC) coloring common header,
Kconfig options and stub functions for domain coloring. Since this is an
arch specific feature, actual implementation is postponed to later patches
and Kconfig options are placed under xen/arch.

LLC colors are represented as dynamic arrays plus their size, and since
they have to be passed during domain creation, domain_create() is replaced
by domain_create_llc_colored(). domain_create() is then turned into a
wrapper of the colored version to let all the original call sites remain
untouched.

Based on original work from: Luca Miccio 

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
---
v4:
- Kconfig options moved to xen/arch
- removed range for CONFIG_NR_LLC_COLORS
- added "llc_coloring_enabled" global to later implement the boot-time
  switch
- added domain_create_llc_colored() to be able to pass colors
- added is_domain_llc_colored() macro
---
 xen/arch/Kconfig   | 17 +++
 xen/common/Kconfig |  3 ++
 xen/common/domain.c| 23 +--
 xen/common/keyhandler.c|  4 +++
 xen/include/xen/llc_coloring.h | 54 ++
 xen/include/xen/sched.h|  9 ++
 6 files changed, 107 insertions(+), 3 deletions(-)
 create mode 100644 xen/include/xen/llc_coloring.h

diff --git a/xen/arch/Kconfig b/xen/arch/Kconfig
index 7028f7b74f..39c23f2528 100644
--- a/xen/arch/Kconfig
+++ b/xen/arch/Kconfig
@@ -28,3 +28,20 @@ config NR_NUMA_NODES
  associated with multiple-nodes management. It is the upper bound of
  the number of NUMA nodes that the scheduler, memory allocation and
  other NUMA-aware components can handle.
+
+config LLC_COLORING
+   bool "Last Level Cache (LLC) coloring" if EXPERT
+   depends on HAS_LLC_COLORING
+
+config NR_LLC_COLORS
+   int "Maximum number of LLC colors"
+   default 128
+   depends on LLC_COLORING
+   help
+ Controls the build-time size of various arrays associated with LLC
+ coloring. Refer to the documentation for how to compute the number
+ of colors supported by the platform.
+ The default value corresponds to an 8 MiB 16-way LLC, which should be
+ more than what is needed in the general case.
+ Note that if, at any time, a color configuration with more colors than
+ the maximum is employed, an error is produced.
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index f1ea3199c8..c796c633f1 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -49,6 +49,9 @@ config HAS_IOPORTS
 config HAS_KEXEC
bool
 
+config HAS_LLC_COLORING
+   bool
+
 config HAS_PDX
bool
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 626debbae0..87aae86081 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -549,9 +550,11 @@ static int sanitise_domain_config(struct 
xen_domctl_createdomain *config)
 return arch_sanitise_domain_config(config);
 }
 
-struct domain *domain_create(domid_t domid,
- struct xen_domctl_createdomain *config,
- unsigned int flags)
+struct domain *domain_create_llc_colored(domid_t domid,
+ struct xen_domctl_createdomain 
*config,
+ unsigned int flags,
+ unsigned int *llc_colors,
+ unsigned int num_llc_colors)
 {
 struct domain *d, **pd, *old_hwdom = NULL;
 enum { INIT_watchdog = 1u<<1,
@@ -663,6 +666,10 @@ struct domain *domain_create(domid_t domid,
 d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
 
 radix_tree_init(&d->pirq_tree);
+
+if ( llc_coloring_enabled &&
+ (err = domain_llc_coloring_init(d, llc_colors, num_llc_colors)) )
+return ERR_PTR(err);
 }
 
 if ( (err = arch_domain_create(d, config, flags)) != 0 )
@@ -769,6 +776,13 @@ struct domain *domain_create(domid_t domid,
 return ERR_PTR(err);
 }
 
+struct domain *domain_create(domid_t domid,
+ struct xen_domctl_createdomain *config,
+ unsigned int flags)
+{
+return domain_create_llc_colored(domid, config, flags, 0, 0);
+}
+
 void __init setup_system_domains(void)
 {
 /*
@@ -1103,6 +1117,9 @@ static void cf_check complete_domain_destroy(struct 
rcu_head *head)
 struct vcpu *v;
 int i;
 
+if ( is_domain_llc_colored(d) )
+domain_llc_coloring_free(d);
+
 /*
  * Flush all state for the vCPU previously having run on the current CPU.
  * This is in particular relevant for x86 HVM ones on VMX, so that this
diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
index 0a551033c4..56f7731595 100644
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -6,6 +

[PATCH v4 08/11] xen/arm: use colored allocator for p2m page tables

2023-01-23 Thread Carlo Nonato
Cache colored domains can benefit from having p2m page tables allocated
with the same coloring scheme so that isolation can be achieved also for
those kinds of memory accesses.
In order to do that, the domain struct is passed to the allocator and the
MEMF_no_owner flag is used.

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
---
v4:
- fixed p2m page allocation using MEMF_no_owner memflag
---
 xen/arch/arm/p2m.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 948f199d84..f9faeb61af 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -56,7 +57,10 @@ static struct page_info *p2m_alloc_page(struct domain *d)
  */
 if ( is_hardware_domain(d) )
 {
-pg = alloc_domheap_page(NULL, 0);
+if ( is_domain_llc_colored(d) )
+pg = alloc_domheap_page(d, MEMF_no_owner);
+else
+pg = alloc_domheap_page(NULL, 0);
 if ( pg == NULL )
 printk(XENLOG_G_ERR "Failed to allocate P2M pages for hwdom.\n");
 }
@@ -105,7 +109,10 @@ int p2m_set_allocation(struct domain *d, unsigned long 
pages, bool *preempted)
 if ( d->arch.paging.p2m_total_pages < pages )
 {
 /* Need to allocate more memory from domheap */
-pg = alloc_domheap_page(NULL, 0);
+if ( is_domain_llc_colored(d) )
+pg = alloc_domheap_page(d, MEMF_no_owner);
+else
+pg = alloc_domheap_page(NULL, 0);
 if ( pg == NULL )
 {
 printk(XENLOG_ERR "Failed to allocate P2M pages.\n");
-- 
2.34.1




[PATCH v4 07/11] xen: add cache coloring allocator for domains

2023-01-23 Thread Carlo Nonato
From: Luca Miccio 

This commit adds a new memory page allocator that implements the cache
coloring mechanism. The allocation algorithm follows the given domain color
configuration and maximizes contiguity in the page selection of multiple
subsequent requests.

Pages are stored in a color-indexed array of lists, each one sorted by
machine address, which is referred to as the "colored heap". Those lists are
filled by a simple init function which computes the color of each page.
When a domain requests a page, the allocator takes one from those lists
whose colors equal the domain configuration. It chooses the page with the
lowest machine address such that contiguous pages are sequentially
allocated if this is made possible by a color assignment which includes
adjacent colors.

The allocator can handle only requests with order equal to 0 since the
single color granularity is represented in memory by one page.
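
(An illustrative sketch of the lookup described above, assuming one page
list per color and the field names llc_colors/num_llc_colors in struct
domain; not the actual patch code:)

    /* One free list per color; each list is kept sorted by address. */
    static struct page_list_head colored_pages[CONFIG_NR_LLC_COLORS];

    static struct page_info *alloc_colored_page(const struct domain *d)
    {
        struct page_info *pg = NULL;
        unsigned int i;

        /* Pick the lowest-address list head among the domain's colors. */
        for ( i = 0; i < d->num_llc_colors; i++ )
        {
            struct page_list_head *head = &colored_pages[d->llc_colors[i]];

            if ( !page_list_empty(head) &&
                 (!pg || page_to_maddr(page_list_first(head)) <
                         page_to_maddr(pg)) )
                pg = page_list_first(head);
        }

        return pg; /* a caller would then page_list_del() and assign it */
    }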

The buddy allocator must coexist with the colored one because the Xen heap
isn't colored. For this reason a new Kconfig option and a command line
parameter are added to let the user set the amount of memory reserved for
the buddy allocator. Even when cache coloring is enabled, this memory
isn't managed by the colored allocator.

Colored heap information is dumped in the dump_heap() debug-key function.

Signed-off-by: Luca Miccio 
Signed-off-by: Marco Solieri 
Signed-off-by: Carlo Nonato 
---
v4:
- moved colored allocator code after buddy allocator because it now has
  some dependencies on buddy functions
- buddy_alloc_size is now used only by the colored allocator
- fixed a bug that allowed the buddy to merge pages when they were colored
- free_color_heap_page() now calls mark_page_free()
- free_color_heap_page() uses of the frametable array for faster searches
- added FIXME comment for the linear search in free_color_heap_page()
- removed alloc_color_domheap_page() to let the colored allocator exploit
  some more buddy allocator code
- alloc_color_heap_page() now allocs min address pages first
- reduced the mess in end_boot_allocator(): use the first loop for
  init_color_heap_pages()
- fixed page_list_add_prev() (list.h) since it was doing the opposite of
  what it was supposed to do
- fixed page_list_add_prev() (non list.h) to check also for next existence
- removed unused page_list_add_next()
- moved p2m code in another patch
---
 docs/misc/arm/cache-coloring.rst  |  49 ++
 docs/misc/xen-command-line.pandoc |  14 ++
 xen/arch/Kconfig  |  12 ++
 xen/arch/arm/include/asm/mm.h |   3 +
 xen/arch/arm/llc_coloring.c   |  12 ++
 xen/common/page_alloc.c   | 247 +++---
 xen/include/xen/llc_coloring.h|   5 +
 xen/include/xen/mm.h  |  33 
 8 files changed, 355 insertions(+), 20 deletions(-)

diff --git a/docs/misc/arm/cache-coloring.rst b/docs/misc/arm/cache-coloring.rst
index a28f75dc26..d56dafe815 100644
--- a/docs/misc/arm/cache-coloring.rst
+++ b/docs/misc/arm/cache-coloring.rst
@@ -15,10 +15,16 @@ In Kconfig:
   value meaning and when it should be changed).
 
 CONFIG_NR_LLC_COLORS=
+- If needed, change the amount of memory reserved for the buddy allocator
+  (see `Colored allocator and buddy allocator`_).
+
+CONFIG_BUDDY_ALLOCATOR_SIZE=
 
 Compile Xen and the toolstack and then:
 
 - Set the `llc-coloring=on` command line option.
+- If needed, set the amount of memory reserved for the buddy allocator
+  via the appropriate command line option.
 - Set `Coloring parameters and domain configurations`_.
 
 Background
@@ -162,6 +168,18 @@ Dom0less configurations (relative documentation in
 **Note:** If no color configuration is provided for a domain, the default one,
 which corresponds to all available colors, is used instead.
 
+Colored allocator and buddy allocator
+*
+
+The colored allocator distributes pages based on color configurations of
+domains so that each domain only gets pages of its own colors.
+The colored allocator is meant as an alternative to the buddy allocator because
+its allocation policy is by definition incompatible with the generic one. Since
+the Xen heap is not colored yet, we need to support the coexistence of the two
+allocators and some memory must be left for the buddy one.
+The buddy allocator memory can be reserved from the Xen Kconfig or with the
+help of a command-line option.
+
 Known issues and limitations
 
 
@@ -172,3 +190,34 @@ In the domain configuration, "xen,static-mem" allows 
memory to be statically
allocated to the domain. This isn't possible when LLC coloring is enabled,
 because that memory can't be guaranteed to use only colors assigned to the
 domain.
+
+Cache coloring is intended only for embedded systems
+
+
+The current implementation aims to satisfy the need for predictability in
+embedded systems with a small amount of memory to be managed in a colored way.

[PATCH v4 10/11] xen/arm: add Xen cache colors command line parameter

2023-01-23 Thread Carlo Nonato
From: Luca Miccio 

This commit adds a new command line parameter to configure Xen cache
colors. These colors can be dumped with the cache coloring info debug-key.

By default, Xen uses the first color.
Benchmarking the VM interrupt response time provides an estimation of
LLC usage by Xen's most latency-critical runtime task. Results on Arm
Cortex-A53 on Xilinx Zynq UltraScale+ XCZU9EG show that one color, which
reserves 64 KiB of L2, is enough to attain best responsiveness.

More colors are instead very likely to be needed on processors whose L1
cache is physically-indexed and physically-tagged, such as Cortex-A57.
In such cases, coloring applies to L1 also, and there typically are two
distinct L1-colors. Therefore, reserving only one color for Xen would
senselessly partition a cache memory that is already private, i.e.
underutilize it. The default amount of Xen colors is thus set to one.

Signed-off-by: Luca Miccio 
Signed-off-by: Marco Solieri 
Signed-off-by: Carlo Nonato 
---
 docs/misc/xen-command-line.pandoc | 10 ++
 xen/arch/arm/llc_coloring.c   | 30 ++
 2 files changed, 40 insertions(+)

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index a89c0cef61..d486946648 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2796,6 +2796,16 @@ In the case that x2apic is in use, this option switches 
between physical and
 clustered mode.  The default, given no hint from the **FADT**, is cluster
 mode.
 
+### xen-llc-colors (arm64)
+> `= List of [ <color> | <color>-<color> ]`
+
+> Default: `0: the lowermost color`
+
+Specify the Xen LLC color configuration. This option is available only when
+`CONFIG_LLC_COLORING` is enabled.
+Two colors are most likely needed on platforms where private caches are
+physically indexed, e.g. the L1 instruction cache of the Arm Cortex-A57.
+
 ### xenheap_megabytes (arm32)
 > `= `
 
diff --git a/xen/arch/arm/llc_coloring.c b/xen/arch/arm/llc_coloring.c
index 22612d455b..745e93a61a 100644
--- a/xen/arch/arm/llc_coloring.c
+++ b/xen/arch/arm/llc_coloring.c
@@ -19,6 +19,10 @@
 #include 
 #include 
 
+/* By default Xen uses the lowest color */
+#define XEN_DEFAULT_COLOR   0
+#define XEN_DEFAULT_NUM_COLORS  1
+
 bool llc_coloring_enabled;
 boolean_param("llc-coloring", llc_coloring_enabled);
 
@@ -33,6 +37,9 @@ static paddr_t __ro_after_init addr_col_mask;
 static unsigned int __ro_after_init dom0_colors[CONFIG_NR_LLC_COLORS];
 static unsigned int __ro_after_init dom0_num_colors;
 
+static unsigned int __ro_after_init xen_colors[CONFIG_NR_LLC_COLORS];
+static unsigned int __ro_after_init xen_num_colors;
+
 #define addr_to_color(addr) (((addr) & addr_col_mask) >> PAGE_SHIFT)
 
 /*
@@ -83,6 +90,12 @@ static int parse_color_config(const char *buf, unsigned int 
*colors,
 return *s ? -EINVAL : 0;
 }
 
+static int __init parse_xen_colors(const char *s)
+{
+return parse_color_config(s, xen_colors, &xen_num_colors);
+}
+custom_param("xen-llc-colors", parse_xen_colors);
+
 static int __init parse_dom0_colors(const char *s)
 {
 return parse_color_config(s, dom0_colors, &dom0_num_colors);
@@ -166,6 +179,8 @@ static void dump_coloring_info(unsigned char key)
 printk("LLC way size: %u KiB\n", llc_way_size >> 10);
 printk("Number of LLC colors supported: %u\n", nr_colors);
 printk("Address to LLC color mask: 0x%lx\n", addr_col_mask);
+printk("Xen LLC colors: ");
+print_colors(xen_colors, xen_num_colors);
 }
 
 bool __init llc_coloring_init(void)
@@ -202,6 +217,21 @@ bool __init llc_coloring_init(void)
 
 addr_col_mask = (nr_colors - 1) << PAGE_SHIFT;
 
+if ( !xen_num_colors )
+{
+printk(XENLOG_WARNING
+   "Xen LLC color config not found. Using default color: %u\n",
+   XEN_DEFAULT_COLOR);
+xen_colors[0] = XEN_DEFAULT_COLOR;
+xen_num_colors = XEN_DEFAULT_NUM_COLORS;
+}
+
+if ( !check_colors(xen_colors, xen_num_colors) )
+{
+printk(XENLOG_ERR "Bad LLC color config for Xen\n");
+return false;
+}
+
 register_keyhandler('K', dump_coloring_info, "dump LLC coloring info", 1);
 
 return true;
-- 
2.34.1




[PATCH v4 11/11] xen/arm: add cache coloring support for Xen

2023-01-23 Thread Carlo Nonato
This commit adds the cache coloring support for Xen's own physical space.

It extends the implementation of setup_pagetables() to make use of the Xen
cache coloring configuration. Page table construction is essentially the
same except for the fact that PTEs point to a new temporarily mapped,
physically colored space.

The temporary mapping is also used to relocate Xen to the new physical
space starting at the address taken from the old get_xen_paddr() function
which is brought back for the occasion.
The temporary mapping is finally converted to a mapping of the "old"
(meaning the original physical space) Xen code, so that the boot CPU can
actually address the variables and functions used by secondary CPUs until
they enable the MMU. This happens when the boot CPU needs to bring up other
CPUs (psci.c and smpboot.c) and when the TTBR value is passed to them
(init_secondary_pagetables()).

Finally, since the alternative framework needs to remap the Xen text and
inittext sections, this operation must be done in a coloring-aware way.
The function xen_remap_colored() is introduced for that.

Based on original work from: Luca Miccio 

Signed-off-by: Carlo Nonato 
Signed-off-by: Marco Solieri 
---
v4:
- removed set_value_for_secondary() because it was wrongly cleaning cache
- relocate_xen() now calls switch_ttbr_id()
---
 xen/arch/arm/alternative.c  |  9 ++-
 xen/arch/arm/arm64/head.S   | 50 +
 xen/arch/arm/arm64/mm.c | 26 +--
 xen/arch/arm/include/asm/llc_coloring.h | 22 ++
 xen/arch/arm/include/asm/mm.h   |  7 +-
 xen/arch/arm/llc_coloring.c | 45 
 xen/arch/arm/mm.c   | 94 ++---
 xen/arch/arm/psci.c |  9 ++-
 xen/arch/arm/setup.c| 75 +++-
 xen/arch/arm/smpboot.c  |  9 ++-
 xen/arch/arm/xen.lds.S  |  2 +-
 11 files changed, 325 insertions(+), 23 deletions(-)

diff --git a/xen/arch/arm/alternative.c b/xen/arch/arm/alternative.c
index f00e3b9b3c..29f1ff34d4 100644
--- a/xen/arch/arm/alternative.c
+++ b/xen/arch/arm/alternative.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -209,8 +210,12 @@ void __init apply_alternatives_all(void)
  * The text and inittext section are read-only. So re-map Xen to
  * be able to patch the code.
  */
-xenmap = __vmap(&xen_mfn, 1U << xen_order, 1, 1, PAGE_HYPERVISOR,
-VMAP_DEFAULT);
+if ( llc_coloring_enabled )
+xenmap = xen_remap_colored(xen_mfn, xen_size);
+else
+xenmap = __vmap(&xen_mfn, 1U << xen_order, 1, 1, PAGE_HYPERVISOR,
+VMAP_DEFAULT);
+
 /* Re-mapping Xen is not expected to fail during boot. */
 BUG_ON(!xenmap);
 
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index a61b4d3c27..9ed7610afa 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -801,6 +801,56 @@ fail:   PRINT("- Boot failed -\r\n")
 b 1b
 ENDPROC(fail)
 
+GLOBAL(_end_boot)
+
+/* Copy Xen to new location and switch TTBR
+ * x0ttbr
+ * x1source address
+ * x2destination address
+ * x3length
+ *
+ * Source and destination must be word aligned, length is rounded up
+ * to a 16 byte boundary.
+ *
+ * MUST BE VERY CAREFUL when saving things to RAM over the copy */
+ENTRY(relocate_xen)
+/* Copy 16 bytes at a time using:
+ *   x9: counter
+ *   x10: data
+ *   x11: data
+ *   x12: source
+ *   x13: destination
+ */
+mov x9, x3
+mov x12, x1
+mov x13, x2
+
+1:  ldp x10, x11, [x12], #16
+stp x10, x11, [x13], #16
+
+subsx9, x9, #16
+bgt 1b
+
+/* Flush destination from dcache using:
+ * x9: counter
+ * x10: step
+ * x11: vaddr
+ */
+dsb   sy    /* So the CPU issues all writes to the range */
+
+mov   x9, x3
+ldr   x10, =dcache_line_bytes /* x10 := step */
+ldr   x10, [x10]
+mov   x11, x2
+
+1:  dc    cvac, x11
+
+add   x11, x11, x10
+subs  x9, x9, x10
+bgt   1b
+
+b switch_ttbr_id
+
 /*
  * Switch TTBR
  *
diff --git a/xen/arch/arm/arm64/mm.c b/xen/arch/arm/arm64/mm.c
index 2ede4e75ae..4419381fdd 100644
--- a/xen/arch/arm/arm64/mm.c
+++ b/xen/arch/arm/arm64/mm.c
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
 #include 
+#include <asm/llc_coloring.h>
 #include 
 
 #include 
@@ -121,26 +122,43 @@ void update_identity_mapping(bool enable)
 }
 
 extern void switch_ttbr_id(uint64_t ttbr);
+extern void relocate_xen(uint64_t ttbr, void *src, void *dst, size_t len);
 
 typedef void (switch_ttbr_fn)(uint64_t ttbr);
+typedef void (relocate_xen_fn)(uint64_t ttbr, void *src, void *dst, size_t len);
 
 void __init switch_ttbr(uint64_t ttbr)
 {
-vaddr_t id_addr = virt_to_madd

Re: [PATCH v4 00/11] Arm cache coloring

2023-01-23 Thread Jan Beulich
On 23.01.2023 16:47, Carlo Nonato wrote:
> Shared caches in multi-core CPU architectures represent a problem for
> predictability of memory access latency. This jeopardizes applicability
> of many Arm platforms in real-time critical and mixed-criticality
> scenarios. We introduce support for cache partitioning with page
> coloring, a transparent software technique that enables isolation
> between domains and Xen, and thus avoids cache interference.
> 
> When creating a domain, a simple syntax (e.g. `0-3` or `4-11`) allows
> the user to define assignments of cache partition ids, called colors,
> where assigning different colors guarantees that no mutual cache
> eviction will ever happen. This instructs the Xen memory allocator to
> provide the i-th color assignee only with pages that map to color i,
> i.e. that are indexed in the i-th cache partition.
> 
> The proposed implementation supports the dom0less feature.
> The proposed implementation doesn't support the static-mem feature.
> The solution has been tested in several scenarios, including Xilinx Zynq
> MPSoCs.
> 
> v4 global changes:
> - added "llc" acronym (Last Level Cache) in multiple places in code
>   (e.g. coloring.{c|h} -> llc_coloring.{c|h}) to better describe the

Please can you use dashes in favor of underscores in the names of new
files?

Jan



Re: [PATCH] automation: Modify static-mem check in qemu-smoke-dom0less-arm64.sh

2023-01-23 Thread Ayan Kumar Halder



On 23/01/2023 14:30, Xenia Ragiadakou wrote:


On 1/23/23 15:10, Michal Orzel wrote:

At the moment, the static-mem check relies on the way Xen exposes the
memory banks in device tree. As this might change, the check should be
modified to be generic and not to rely on device tree. In this case,
let's use /proc/iomem which exposes the memory ranges in %08x format
as follows:
<start>-<end> : <description>

This way, we can grep in /proc/iomem for an entry containing memory
region defined by the static-mem configuration with "System RAM"
description. If it exists, mark the test as passed. Also, take the
opportunity to add 0x prefix to domu_{base,size} definition rather than
adding it in front of each occurrence.

Signed-off-by: Michal Orzel 


Reviewed-by: Xenia Ragiadakou 

Reviewed-by: Ayan Kumar Halder 


Also you fixed the hard tab.


---
Patch made as part of the discussion:
https://lore.kernel.org/xen-devel/ba37ee02-c07c-2803-0867-149c77989...@amd.com/ 



CC: Julien, Ayan
---
  automation/scripts/qemu-smoke-dom0less-arm64.sh | 13 ++---
  1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/automation/scripts/qemu-smoke-dom0less-arm64.sh b/automation/scripts/qemu-smoke-dom0less-arm64.sh
index 2b59346fdcfd..182a4b6c18fc 100755
--- a/automation/scripts/qemu-smoke-dom0less-arm64.sh
+++ b/automation/scripts/qemu-smoke-dom0less-arm64.sh
@@ -16,14 +16,13 @@ fi
    if [[ "${test_variant}" == "static-mem" ]]; then
  # Memory range that is statically allocated to DOM1
-    domu_base="50000000"
-    domu_size="10000000"
+    domu_base="0x50000000"
+    domu_size="0x10000000"
  passed="${test_variant} test passed"
  domU_check="
-current=\$(hexdump -e '16/1 \"%02x\"' /proc/device-tree/memory@${domu_base}/reg 2>/dev/null)
-expected=$(printf \"%016x%016x\" 0x${domu_base} 0x${domu_size})
-if [[ \"\${expected}\" == \"\${current}\" ]]; then
-    echo \"${passed}\"
+mem_range=$(printf \"%08x-%08x\" ${domu_base} $(( ${domu_base} + ${domu_size} - 1 )))
+if grep -q -x \"\${mem_range} : System RAM\" /proc/iomem; then
+    echo \"${passed}\"
  fi
  "
  fi
@@ -126,7 +125,7 @@ UBOOT_SOURCE="boot.source"
  UBOOT_SCRIPT="boot.scr"' > binaries/config
    if [[ "${test_variant}" == "static-mem" ]]; then
-    echo -e "\nDOMU_STATIC_MEM[0]=\"0x${domu_base} 0x${domu_size}\"" >> binaries/config
+    echo -e "\nDOMU_STATIC_MEM[0]=\"${domu_base} ${domu_size}\"" >> binaries/config
  fi
    if [[ "${test_variant}" == "boot-cpupools" ]]; then






Re: [PATCH v2 4/8] x86/mem-sharing: copy GADDR based shared guest areas

2023-01-23 Thread Tamas K Lengyel
On Mon, Jan 23, 2023 at 9:55 AM Jan Beulich  wrote:
>
> In preparation of the introduction of new vCPU operations allowing to
> register the respective areas (one of the two is x86-specific) by
> guest-physical address, add the necessary fork handling (with the
> backing function yet to be filled in).
>
> Signed-off-by: Jan Beulich 
>
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -1653,6 +1653,65 @@ static void copy_vcpu_nonreg_state(struc
>  hvm_set_nonreg_state(cd_vcpu, &nrs);
>  }
>
> +static int copy_guest_area(struct guest_area *cd_area,
> +   const struct guest_area *d_area,
> +   struct vcpu *cd_vcpu,
> +   const struct domain *d)
> +{
> +mfn_t d_mfn, cd_mfn;
> +
> +if ( !d_area->pg )
> +return 0;
> +
> +d_mfn = page_to_mfn(d_area->pg);
> +
> +/* Allocate & map a page for the area if it hasn't been already. */
> +if ( !cd_area->pg )
> +{
> +gfn_t gfn = mfn_to_gfn(d, d_mfn);
> +struct p2m_domain *p2m = p2m_get_hostp2m(cd_vcpu->domain);
> +p2m_type_t p2mt;
> +p2m_access_t p2ma;
> +unsigned int offset;
> +int ret;
> +
> +cd_mfn = p2m->get_entry(p2m, gfn, &p2mt, &p2ma, 0, NULL, NULL);
> +if ( mfn_eq(cd_mfn, INVALID_MFN) )
> +{
> +struct page_info *pg = alloc_domheap_page(cd_vcpu->domain, 0);
> +
> +if ( !pg )
> +return -ENOMEM;
> +
> +cd_mfn = page_to_mfn(pg);
> +set_gpfn_from_mfn(mfn_x(cd_mfn), gfn_x(gfn));
> +
> +ret = p2m->set_entry(p2m, gfn, cd_mfn, PAGE_ORDER_4K, p2m_ram_rw,
> + p2m->default_access, -1);
> +if ( ret )
> +return ret;
> +}
> +else if ( p2mt != p2m_ram_rw )
> +return -EBUSY;
> +
> +/*
> + * Simply specify the entire range up to the end of the page. All the
> + * function uses it for is a check for not crossing page boundaries.
> + */
> +offset = PAGE_OFFSET(d_area->map);
> +ret = map_guest_area(cd_vcpu, gfn_to_gaddr(gfn) + offset,
> + PAGE_SIZE - offset, cd_area, NULL);
> +if ( ret )
> +return ret;
> +}
> +else
> +cd_mfn = page_to_mfn(cd_area->pg);

Everything to this point seems to be non mem-sharing/forking related. Could
these live somewhere else? There must be some other place where allocating
these areas happens already for non-fork VMs so it would make sense to just
refactor that code to be callable from here.

> +
> +copy_domain_page(cd_mfn, d_mfn);
> +
> +return 0;
> +}
> +
>  static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
>  {
>  struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
> @@ -1745,6 +1804,16 @@ static int copy_vcpu_settings(struct dom
>  copy_domain_page(new_vcpu_info_mfn, vcpu_info_mfn);
>  }
>
> +/* Same for the (physically registered) runstate and time info areas. */
> +ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
> +  &d_vcpu->runstate_guest_area, cd_vcpu, d);
> +if ( ret )
> +return ret;
> +ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
> +  &d_vcpu->arch.time_guest_area, cd_vcpu, d);
> +if ( ret )
> +return ret;
> +
>  ret = copy_vpmu(d_vcpu, cd_vcpu);
>  if ( ret )
>  return ret;
> @@ -1987,7 +2056,10 @@ int mem_sharing_fork_reset(struct domain
>
>   state:
>  if ( reset_state )
> +{
>  rc = copy_settings(d, pd);
> +/* TBD: What to do here with -ERESTART? */

Where is ERESTART coming from?


Re: [PATCH v2 1/9] x86/shadow: replace sh_reset_l3_up_pointers()

2023-01-23 Thread Jan Beulich
On 23.01.2023 16:20, George Dunlap wrote:
> Re the original question: I've stared at the code for a bit now, and I
> can't see anything obviously wrong or dangerous about it.
> 
> But it does make me ask, why do we need the "unpinning_l3" pseudo-argument
> at all?  Is there any reason not to unconditionally zero out sp->up when we
> find a head_type of SH_type_l3_64_shadow?  As far as I can tell, sp->list
> doesn't require any special state.  Why do we make the effort to leave it
> alone when we're not unpinning all l3s?

This was an attempt to retain original behavior as much as possible, but I'm
afraid that, ...

> In fact, is there a way to unpin an l3 shadow *other* than when we're
> unpinning all l3's?

... since the answer here is of course "yes", ...

>  If so, then this patch, as written, is broken -- the
> original code clears the up-pointer for *all* L3_64 shadows, regardless of
> whether they're on the pinned list; the new patch will only clear the ones
> on the pinned list.  But unconditionally clearing sp->up could actually fix
> that.

... you're right, and I failed (went too far) with that attempt. Plus it'll
naturally resolve the parameter-vs-state aspect.
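
Concretely, the unconditional variant suggested above would reduce the walk
to something like (a sketch using the names from this thread, not a tested
patch):

/* While walking shadow pages, clear the up-pointer of every L3 shadow,
 * whether or not it sits on the pinned list. */
if ( sp->u.sh.type == SH_type_l3_64_shadow )
    sp->up = 0;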

Jan



Re: [PATCH v4 00/11] Arm cache coloring

2023-01-23 Thread Carlo Nonato
Hi Jan,

On Mon, Jan 23, 2023 at 4:52 PM Jan Beulich  wrote:
>
> On 23.01.2023 16:47, Carlo Nonato wrote:
> > Shared caches in multi-core CPU architectures represent a problem for
> > predictability of memory access latency. This jeopardizes applicability
> > of many Arm platforms in real-time critical and mixed-criticality
> > scenarios. We introduce support for cache partitioning with page
> > coloring, a transparent software technique that enables isolation
> > between domains and Xen, and thus avoids cache interference.
> >
> > When creating a domain, a simple syntax (e.g. `0-3` or `4-11`) allows
> > the user to define assignments of cache partition ids, called colors,
> > where assigning different colors guarantees that no mutual cache
> > eviction will ever happen. This instructs the Xen memory allocator to
> > provide the i-th color assignee only with pages that map to color i,
> > i.e. that are indexed in the i-th cache partition.
> >
> > The proposed implementation supports the dom0less feature.
> > The proposed implementation doesn't support the static-mem feature.
> > The solution has been tested in several scenarios, including Xilinx Zynq
> > MPSoCs.
> >
> > v4 global changes:
> > - added "llc" acronym (Last Level Cache) in multiple places in code
> >   (e.g. coloring.{c|h} -> llc_coloring.{c|h}) to better describe the
>
> Please can you use dashes in favor of underscores in the names of new
> files?

Yes, ok.

> Jan

I also forgot to mention that this patch series applies on top of the
most recent
version of Julien's series (https://marc.info/?l=xen-devel&m=167360469228247).

Thanks.



Re: [PATCH v3 01/18] xen/arm64: flushtlb: Reduce scope of barrier for local TLB flush

2023-01-23 Thread Ayan Kumar Halder



On 12/12/2022 09:55, Julien Grall wrote:



From: Julien Grall 

Per D5-4929 in ARM DDI 0487H.a:
"A DSB NSH is sufficient to ensure completion of TLB maintenance
  instructions that apply to a single PE. A DSB ISH is sufficient to
  ensure completion of TLB maintenance instructions that apply to PEs
  in the same Inner Shareable domain.
"

This means barrier after local TLB flushes could be reduced to
non-shareable.

Note that the scope of the barrier in the workaround has not been
changed because Linux v6.1-rc8 is also using 'ish' and I couldn't
find anything in the Neoverse N1 suggesting that a 'nsh' would
be sufficient.

Signed-off-by: Julien Grall 

---

 I have used an older version of the Arm Arm because the explanation
 in the latest (ARM DDI 0487I.a) is less obvious. I reckon the paragraph
 about DSB in D8.13.8 is missing the shareability. But this is implied
 in B2.3.11:

 "If the required access types of the DSB is reads and writes, the
  following instructions issued by PEe before the DSB are complete for
  the required shareability domain:

  [...]

  — All TLB maintenance instructions.
 "

 Changes in v3:
 - Patch added
---
  xen/arch/arm/include/asm/arm64/flushtlb.h | 27 ++-
  1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/flushtlb.h 
b/xen/arch/arm/include/asm/arm64/flushtlb.h
index 7c5431518741..39d429ace552 100644
--- a/xen/arch/arm/include/asm/arm64/flushtlb.h
+++ b/xen/arch/arm/include/asm/arm64/flushtlb.h
@@ -12,8 +12,9 @@
   * ARM64_WORKAROUND_REPEAT_TLBI:
   * Modification of the translation table for a virtual address might lead to
   * read-after-read ordering violation.
- * The workaround repeats TLBI+DSB operation for all the TLB flush operations.
- * While this is stricly not necessary, we don't want to take any risk.
+ * The workaround repeats TLBI+DSB ISH operation for all the TLB flush
+ * operations. While this is stricly not necessary, we don't want to
+ * take any risk.
   *
   * For Xen page-tables the ISB will discard any instructions fetched
   * from the old mappings.
@@ -21,38 +22,42 @@
   * For the Stage-2 page-tables the ISB ensures the completion of the DSB
   * (and therefore the TLB invalidation) before continuing. So we know
   * the TLBs cannot contain an entry for a mapping we may have removed.
+ *
+ * Note that for local TLB flush, using non-shareable (nsh) is sufficient
+ * (see D5-4929 in ARM DDI 0487H.a), although the memory barrier in
+ * the workaround is left as inner-shareable to match with Linux.


Nit:- It might be good to mention the Linux commit id.


   */
-#define TLB_HELPER(name, tlbop)  \
+#define TLB_HELPER(name, tlbop, sh)  \
  static inline void name(void)\
  {\
  asm volatile(\
-"dsb  ishst;"\
+"dsb  "  # sh  "st;" \
  "tlbi "  # tlbop  ";"\
  ALTERNATIVE( \
  "nop; nop;", \
-"dsb  ish;"  \
+"dsb  "  # sh  ";"   \
  "tlbi "  # tlbop  ";",   \
  ARM64_WORKAROUND_REPEAT_TLBI,\
  CONFIG_ARM64_WORKAROUND_REPEAT_TLBI) \
-"dsb  ish;"  \
+"dsb  "  # sh  ";"   \
  "isb;"   \
  : : : "memory"); \
  }

  /* Flush local TLBs, current VMID only. */
-TLB_HELPER(flush_guest_tlb_local, vmalls12e1);
+TLB_HELPER(flush_guest_tlb_local, vmalls12e1, nsh);

  /* Flush innershareable TLBs, current VMID only */
-TLB_HELPER(flush_guest_tlb, vmalls12e1is);
+TLB_HELPER(flush_guest_tlb, vmalls12e1is, ish);

  /* Flush local TLBs, all VMIDs, non-hypervisor mode */
-TLB_HELPER(flush_all_guests_tlb_local, alle1);
+TLB_HELPER(flush_all_guests_tlb_local, alle1, nsh);

  /* Flush innershareable TLBs, all VMIDs, non-hypervisor mode */
-TLB_HELPER(flush_all_guests_tlb, alle1is);
+TLB_HELPER(flush_all_guests_tlb, alle1is, ish);

  /* Flush all hypervisor mappings from the TLB of the local processor. */
-TLB_HELPER(flush_xen_tlb_local, alle2);
+TLB_HELPER(flush_xen_tlb_local, alle2, nsh);

  /* Flush TLB of local processor for address va. */
  static inline void  __flush_xen_tlb_one_local(vaddr_t va)
--
2.38.1

Reviewed-by: Ayan Kumar Halder 
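
For reference, after this change TLB_HELPER(flush_xen_tlb_local, alle2, nsh)
expands to roughly the following (a sketch; the ALTERNATIVE() patching is
shown as a comment rather than the real plumbing):

static inline void flush_xen_tlb_local(void)
{
    asm volatile(
        "dsb  nshst;"   /* order prior page-table writes, this PE only */
        "tlbi alle2;"
        /* "dsb nsh; tlbi alle2;" is patched in here when
         * ARM64_WORKAROUND_REPEAT_TLBI applies, "nop; nop;" otherwise */
        "dsb  nsh;"     /* wait for the invalidation to complete */
        "isb;"
        : : : "memory");
}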






Re: [PATCH v2 4/8] x86/mem-sharing: copy GADDR based shared guest areas

2023-01-23 Thread Jan Beulich
On 23.01.2023 17:09, Tamas K Lengyel wrote:
> On Mon, Jan 23, 2023 at 9:55 AM Jan Beulich  wrote:
>> --- a/xen/arch/x86/mm/mem_sharing.c
>> +++ b/xen/arch/x86/mm/mem_sharing.c
>> @@ -1653,6 +1653,65 @@ static void copy_vcpu_nonreg_state(struc
>>  hvm_set_nonreg_state(cd_vcpu, &nrs);
>>  }
>>
>> +static int copy_guest_area(struct guest_area *cd_area,
>> +   const struct guest_area *d_area,
>> +   struct vcpu *cd_vcpu,
>> +   const struct domain *d)
>> +{
>> +mfn_t d_mfn, cd_mfn;
>> +
>> +if ( !d_area->pg )
>> +return 0;
>> +
>> +d_mfn = page_to_mfn(d_area->pg);
>> +
>> +/* Allocate & map a page for the area if it hasn't been already. */
>> +if ( !cd_area->pg )
>> +{
>> +gfn_t gfn = mfn_to_gfn(d, d_mfn);
>> +struct p2m_domain *p2m = p2m_get_hostp2m(cd_vcpu->domain);
>> +p2m_type_t p2mt;
>> +p2m_access_t p2ma;
>> +unsigned int offset;
>> +int ret;
>> +
>> +cd_mfn = p2m->get_entry(p2m, gfn, &p2mt, &p2ma, 0, NULL, NULL);
>> +if ( mfn_eq(cd_mfn, INVALID_MFN) )
>> +{
>> +struct page_info *pg = alloc_domheap_page(cd_vcpu->domain, 0);
>> +
>> +if ( !pg )
>> +return -ENOMEM;
>> +
>> +cd_mfn = page_to_mfn(pg);
>> +set_gpfn_from_mfn(mfn_x(cd_mfn), gfn_x(gfn));
>> +
>> +ret = p2m->set_entry(p2m, gfn, cd_mfn, PAGE_ORDER_4K, p2m_ram_rw,
>> + p2m->default_access, -1);
>> +if ( ret )
>> +return ret;
>> +}
>> +else if ( p2mt != p2m_ram_rw )
>> +return -EBUSY;
>> +
>> +/*
>> + * Simply specify the entire range up to the end of the page. All the
>> + * function uses it for is a check for not crossing page boundaries.
>> + */
>> +offset = PAGE_OFFSET(d_area->map);
>> +ret = map_guest_area(cd_vcpu, gfn_to_gaddr(gfn) + offset,
>> + PAGE_SIZE - offset, cd_area, NULL);
>> +if ( ret )
>> +return ret;
>> +}
>> +else
>> +cd_mfn = page_to_mfn(cd_area->pg);
> 
> Everything to this point seems to be non mem-sharing/forking related. Could
> these live somewhere else? There must be some other place where allocating
> these areas happens already for non-fork VMs so it would make sense to just
> refactor that code to be callable from here.

It is the "copy" aspect with makes this mem-sharing (or really fork)
specific. Plus in the end this is no different from what you have
there right now for copying the vCPU info area. In the final patch
that other code gets removed by re-using the code here.

I also haven't been able to spot anything that could be factored
out (and one might expect that if there was something, then the vCPU
info area copying should also already have used it). map_guest_area()
is all that is used for other purposes as well.

>> +
>> +copy_domain_page(cd_mfn, d_mfn);
>> +
>> +return 0;
>> +}
>> +
>>  static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
>>  {
>>  struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
>> @@ -1745,6 +1804,16 @@ static int copy_vcpu_settings(struct dom
>>  copy_domain_page(new_vcpu_info_mfn, vcpu_info_mfn);
>>  }
>>
>> +/* Same for the (physically registered) runstate and time info areas. */
>> +ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
>> +  &d_vcpu->runstate_guest_area, cd_vcpu, d);
>> +if ( ret )
>> +return ret;
>> +ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
>> +  &d_vcpu->arch.time_guest_area, cd_vcpu, d);
>> +if ( ret )
>> +return ret;
>> +
>>  ret = copy_vpmu(d_vcpu, cd_vcpu);
>>  if ( ret )
>>  return ret;
>> @@ -1987,7 +2056,10 @@ int mem_sharing_fork_reset(struct domain
>>
>>   state:
>>  if ( reset_state )
>> +{
>>  rc = copy_settings(d, pd);
>> +/* TBD: What to do here with -ERESTART? */
> 
> Where is ERESTART coming from?

From map_guest_area()'s attempt to acquire the hypercall deadlock mutex,
in order to then pause the subject vCPU. I suppose that in the forking
case it may already be paused, but then there's no way map_guest_area()
could know. Looking at the pause count is fragile, as there's no
guarantee that the vCPU may be unpaused while we're still doing work on
it. Hence I view such checks as only suitable for assertions.

Jan



Re: [PATCH] x86/shadow: sh_type_to_size[] needs L2H entry when HVM+PV32

2023-01-23 Thread Jan Beulich
On 23.01.2023 13:49, Jan Beulich wrote:
> On 23.01.2023 13:30, Andrew Cooper wrote:
>> On 23/01/2023 10:47 am, Jan Beulich wrote:
>>> On 23.01.2023 11:43, Andrew Cooper wrote:
 On 23/01/2023 8:12 am, Jan Beulich wrote:
> While the table is used only when HVM=y, the table entry of course needs
> to be properly populated when also PV32=y. Fully removing the table
> entry was therefore wrong.
>
> Fixes: 1894049fa283 ("x86/shadow: L2H shadow type is PV32-only")
> Signed-off-by: Jan Beulich 
 Erm, why?

 The safety justification for the original patch was that this is HVM
 only code.

Coming back to this: There was no such claim. There was a claim about
the type in question being PV32-only, and there was a comparison with
other types which are HVM-only.

  And it really is HVM only code - it's genuinely compiled out
 for !HVM builds.
>>> Right, and we have logic taking care of the !HVM case. But that same
>>> logic uses this "HVM-only" table when HVM=y also for all PV types.
>>
>> Ok - this is what needs fixing then.
>>
>> This is a layering violation which has successfully tricked you into
>> making a buggy patch.
>>
>> I'm unwilling to bet this will be the final time either...  "this file
>> is HVM-only, therefore no PV paths enter it" is a reasonable
>> expectation, and should be true.
> 
> Nice abstract consideration, but would you mind pointing out how you envision
> shadow_size() to look like meeting your constraints _and_ meeting my
> demand of no excess #ifdef-ary? The way I'm reading your reply is that
> you ask to special case L2H _right in_ shadow_size(). Then again see also
> my remark in the original (now known faulty) patch regarding such special
> casing. I could of course follow that route, regardless of HVM (i.e.
> unlike said there not just for the #else part) ...

Actually no, that remark was about the opposite (!PV32) case, so if I
took both together, this would result:

static inline unsigned int
shadow_size(unsigned int shadow_type)
{
#ifdef CONFIG_HVM
#ifdef CONFIG_PV32
if ( shadow_type == SH_type_l2h_64_shadow )
return 1;
#endif
ASSERT(shadow_type < ARRAY_SIZE(sh_type_to_size));
return sh_type_to_size[shadow_type];
#else
#ifndef CONFIG_PV32
if ( shadow_type == SH_type_l2h_64_shadow )
return 0;
#endif
ASSERT(shadow_type < SH_type_unused);
return shadow_type != SH_type_none;
#endif
}

I think that's quite a bit worse than using sh_type_to_size[] for all
kinds of guest uniformly when HVM=y. This

static inline unsigned int
shadow_size(unsigned int shadow_type)
{
if ( shadow_type == SH_type_l2h_64_shadow )
return IS_ENABLED(CONFIG_PV32);
#ifdef CONFIG_HVM
ASSERT(shadow_type < ARRAY_SIZE(sh_type_to_size));
return sh_type_to_size[shadow_type];
#else
ASSERT(shadow_type < SH_type_unused);
return shadow_type != SH_type_none;
#endif
}

is also only marginally better, as we really would better avoid any
such open-coding.

Jan



[RISC-V] Switch to H-mode

2023-01-23 Thread Oleksii
Hi Alistair and community,

I am working on RISC-V support upstream for Xen based on your and Bobby
patches.

Adding the RISC-V support I realized that Xen is run in S-mode. Output
of OpenSBI:
...
Domain0 Next Mode : S-mode
...
So my first question is: shouldn't it be in H-mode?

If I am right then it looks like we have to do a patch to OpenSBI to
add support of H-mode as it is not supported now:
[1]
https://github.com/riscv-software-src/opensbi/blob/master/lib/sbi/sbi_domain.c#L380
[2]
https://github.com/riscv-software-src/opensbi/blob/master/include/sbi/riscv_encoding.h#L110
Please correct me if I am wrong.

The other option I see is to switch to H-mode in U-boot as I understand
the classical boot flow is:
OpenSBI -> U-boot -> Xen -> Domain{0,...}
If it is at all possible since U-boot will be in S mode after OpenSBI.

Thanks in advance.

~ Oleksii



Re: [PATCH] x86/shadow: sh_type_to_size[] needs L2H entry when HVM+PV32

2023-01-23 Thread Jan Beulich
On 23.01.2023 17:56, Jan Beulich wrote:
> On 23.01.2023 13:49, Jan Beulich wrote:
>> On 23.01.2023 13:30, Andrew Cooper wrote:
>>> This is a layering violation which has successfully tricked you into
>>> making a buggy patch.
>>>
>>> I'm unwilling to bet this will be the final time either...  "this file
>>> is HVM-only, therefore no PV paths enter it" is a reasonable
>>> expectation, and should be true.
>>
>> Nice abstract consideration, but would you mind pointing out how you envision
>> shadow_size() to look like meeting your constraints _and_ meeting my
>> demand of no excess #ifdef-ary? The way I'm reading your reply is that
>> you ask to special case L2H _right in_ shadow_size(). Then again see also
>> my remark in the original (now known faulty) patch regarding such special
>> casing. I could of course follow that route, regardless of HVM (i.e.
>> unlike said there not just for the #else part) ...
> 
> Actually no, that remark was about the opposite (!PV32) case, so if I
> took both together, this would result:
> 
> static inline unsigned int
> shadow_size(unsigned int shadow_type)
> {
> #ifdef CONFIG_HVM
> #ifdef CONFIG_PV32
> if ( shadow_type == SH_type_l2h_64_shadow )
> return 1;
> #endif
> ASSERT(shadow_type < ARRAY_SIZE(sh_type_to_size));
> return sh_type_to_size[shadow_type];
> #else
> #ifndef CONFIG_PV32
> if ( shadow_type == SH_type_l2h_64_shadow )
> return 0;
> #endif
> ASSERT(shadow_type < SH_type_unused);
> return shadow_type != SH_type_none;
> #endif
> }
> 
> I think that's quite a bit worse than using sh_type_to_size[] for all
> kinds of guest uniformly when HVM=y. This
> 
> static inline unsigned int
> shadow_size(unsigned int shadow_type)
> {
> if ( shadow_type == SH_type_l2h_64_shadow )
> return IS_ENABLED(CONFIG_PV32);

Which might better use opt_pv32 instead, if we really were to go this route.

Jan

> #ifdef CONFIG_HVM
> ASSERT(shadow_type < ARRAY_SIZE(sh_type_to_size));
> return sh_type_to_size[shadow_type];
> #else
> ASSERT(shadow_type < SH_type_unused);
> return shadow_type != SH_type_none;
> #endif
> }
> 
> is also only marginally better, as we really would better avoid any
> such open-coding.
> 
> Jan
> 




Re: [PATCH v2 12/40] xen/mpu: introduce helpers for MPU enablement

2023-01-23 Thread Ayan Kumar Halder

Hi Penny,

On 13/01/2023 05:28, Penny Zheng wrote:



We need a new helper for Xen to enable the MPU at boot time.
The new helper is semantically consistent with the original enable_mmu.

If the Background region is enabled, then the MPU uses the default memory
map as the Background region for generating the memory
attributes when MPU is disabled.
Since the default memory map of the Armv8-R AArch64 architecture is
IMPLEMENTATION DEFINED, we always turn off the Background region.

In this patch, we also introduce a neutral name enable_mm for
Xen to enable MMU/MPU. This can help us to keep one code flow
in head.S

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
  xen/arch/arm/arm64/head.S |  5 +++--
  xen/arch/arm/arm64/head_mmu.S |  4 ++--
  xen/arch/arm/arm64/head_mpu.S | 19 +++
  3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 145e3d53dc..7f3f973468 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -258,7 +258,8 @@ real_start_efi:
   * and memory regions for MPU systems.
   */
  blprepare_early_mappings
-blenable_mmu
+/* Turn on MMU or MPU */
+blenable_mm

  /* We are still in the 1:1 mapping. Jump to the runtime Virtual 
Address. */
  ldr   x0, =primary_switched
@@ -316,7 +317,7 @@ GLOBAL(init_secondary)
  blcheck_cpu_mode
  blcpu_init
  blprepare_early_mappings
-blenable_mmu
+blenable_mm

  /* We are still in the 1:1 mapping. Jump to the runtime Virtual 
Address. */
  ldr   x0, =secondary_switched
diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
index 2346f755df..b59c40495f 100644
--- a/xen/arch/arm/arm64/head_mmu.S
+++ b/xen/arch/arm/arm64/head_mmu.S
@@ -217,7 +217,7 @@ ENDPROC(prepare_early_mappings)
   *
   * Clobbers x0 - x3
   */
-ENTRY(enable_mmu)
+ENTRY(enable_mm)
  PRINT("- Turning on paging -\r\n")

  /*
@@ -239,7 +239,7 @@ ENTRY(enable_mmu)
  msr   SCTLR_EL2, x0  /* now paging is enabled */
  isb  /* Now, flush the icache */
  ret
-ENDPROC(enable_mmu)
+ENDPROC(enable_mm)

  /*
   * Remove the 1:1 map from the page-tables. It is not easy to keep track
diff --git a/xen/arch/arm/arm64/head_mpu.S b/xen/arch/arm/arm64/head_mpu.S
index 0b97ce4646..e2ac69b0cc 100644
--- a/xen/arch/arm/arm64/head_mpu.S
+++ b/xen/arch/arm/arm64/head_mpu.S
@@ -315,6 +315,25 @@ ENDPROC(prepare_early_mappings)

  GLOBAL(_end_boot)

+/*
+ * Enable EL2 MPU and data cache
+ * If the Background region is enabled, then the MPU uses the default memory
+ * map as the Background region for generating the memory
+ * attributes when MPU is disabled.
+ * Since the default memory map of the Armv8-R AArch64 architecture is
+ * IMPLEMENTATION DEFINED, we intend to turn off the Background region here.
+ */
+ENTRY(enable_mm)
+mrs   x0, SCTLR_EL2
+orr   x0, x0, #SCTLR_Axx_ELx_M    /* Enable MPU */
+orr   x0, x0, #SCTLR_Axx_ELx_C    /* Enable D-cache */
+orr   x0, x0, #SCTLR_Axx_ELx_WXN  /* Enable WXN */
+dsb   sy
+msr   SCTLR_EL2, x0
+isb
+ret
+ENDPROC(enable_mm)


Can this be renamed to enable_mpu or enable_mpu_and_cache() ?

Can we also have the corresponding disable function in this patch ?
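
For comparison, a corresponding disable helper would presumably just clear
the same bits again (a sketch only, not part of the posted patch):

ENTRY(disable_mm)
    mrs   x0, SCTLR_EL2
    bic   x0, x0, #SCTLR_Axx_ELx_M    /* Disable MPU */
    bic   x0, x0, #SCTLR_Axx_ELx_C    /* Disable D-cache */
    dsb   sy
    msr   SCTLR_EL2, x0
    isb
    ret
ENDPROC(disable_mm)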

Also (compared with "[PATCH v6 10/11] xen/arm64: introduce helpers for 
MPU enable/disable"), I see that you have added #SCTLR_Axx_ELx_WXN. What 
is the reason for this ?


- Ayan


+
  /*
   * Local variables:
   * mode: ASM
--
2.25.1






[linux-linus test] 176060: regressions - FAIL

2023-01-23 Thread osstest service owner
flight 176060 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/176060/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-examine  8 reboot   fail REGR. vs. 173462
 test-arm64-arm64-xl-vhd   8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-libvirt-raw  8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-seattle   8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-xsm   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-arndale   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-credit2   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-multivcpu  8 xen-bootfail REGR. vs. 173462
 test-armhf-armhf-xl   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-credit1   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-vhd   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-libvirt  8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-libvirt-xsm  8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-credit2   8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-credit1   8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-examine  8 reboot   fail REGR. vs. 173462
 test-armhf-armhf-libvirt-qcow2  8 xen-boot   fail REGR. vs. 173462
 test-armhf-armhf-libvirt-raw  8 xen-boot fail REGR. vs. 173462

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds  8 xen-boot fail REGR. vs. 173462

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 173462
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 173462
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 173462
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173462
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 173462
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass

version targeted for testing:
 linux                2475bf0250dee99b477e0c56d7dc9d7ac3f04117
baseline version:
 linux                9d84bb40bcb30a7fa16f33baa967aeb9953dda78

Last test of basis   173462  2022-10-07 18:41:45 Z  107 days
Failing since        173470  2022-10-08 06:21:34 Z  107 days  221 attempts
Testing same since   176053  2023-01-22 23:10:33 Z0 days2 attempts


3437 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 test-amd64-coresched-amd64-xlpass
 test-arm64-arm64-xl  fail
 test-armhf-armhf-xl  fail
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-am

[xen-unstable-smoke test] 176066: tolerable all pass - PUSHED

2023-01-23 Thread osstest service owner
flight 176066 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/176066/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  d60324d8af9404014cfcc37bba09e9facfd02fcf
baseline version:
 xen  1d60c20260c7e82fe5344d06c20d718e0cc03b8b

Last test of basis   176006  2023-01-21 01:01:52 Z2 days
Testing same since   176066  2023-01-23 15:00:27 Z0 days1 attempts


People who touched revisions under test:
  Anthony PERARD 
  Daniel P. Smith 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   1d60c20260..d60324d8af  d60324d8af9404014cfcc37bba09e9facfd02fcf -> smoke



Re: [PATCH v2 4/8] x86/mem-sharing: copy GADDR based shared guest areas

2023-01-23 Thread Tamas K Lengyel
On Mon, Jan 23, 2023 at 11:24 AM Jan Beulich  wrote:
>
> On 23.01.2023 17:09, Tamas K Lengyel wrote:
> > On Mon, Jan 23, 2023 at 9:55 AM Jan Beulich  wrote:
> >> --- a/xen/arch/x86/mm/mem_sharing.c
> >> +++ b/xen/arch/x86/mm/mem_sharing.c
> >> @@ -1653,6 +1653,65 @@ static void copy_vcpu_nonreg_state(struc
> >>  hvm_set_nonreg_state(cd_vcpu, &nrs);
> >>  }
> >>
> >> +static int copy_guest_area(struct guest_area *cd_area,
> >> +   const struct guest_area *d_area,
> >> +   struct vcpu *cd_vcpu,
> >> +   const struct domain *d)
> >> +{
> >> +mfn_t d_mfn, cd_mfn;
> >> +
> >> +if ( !d_area->pg )
> >> +return 0;
> >> +
> >> +d_mfn = page_to_mfn(d_area->pg);
> >> +
> >> +/* Allocate & map a page for the area if it hasn't been already. */
> >> +if ( !cd_area->pg )
> >> +{
> >> +gfn_t gfn = mfn_to_gfn(d, d_mfn);
> >> +struct p2m_domain *p2m = p2m_get_hostp2m(cd_vcpu->domain);
> >> +p2m_type_t p2mt;
> >> +p2m_access_t p2ma;
> >> +unsigned int offset;
> >> +int ret;
> >> +
> >> +cd_mfn = p2m->get_entry(p2m, gfn, &p2mt, &p2ma, 0, NULL, NULL);
> >> +if ( mfn_eq(cd_mfn, INVALID_MFN) )
> >> +{
> >> +struct page_info *pg = alloc_domheap_page(cd_vcpu->domain, 0);
> >> +
> >> +if ( !pg )
> >> +return -ENOMEM;
> >> +
> >> +cd_mfn = page_to_mfn(pg);
> >> +set_gpfn_from_mfn(mfn_x(cd_mfn), gfn_x(gfn));
> >> +
> >> +ret = p2m->set_entry(p2m, gfn, cd_mfn, PAGE_ORDER_4K, p2m_ram_rw,
> >> + p2m->default_access, -1);
> >> +if ( ret )
> >> +return ret;
> >> +}
> >> +else if ( p2mt != p2m_ram_rw )
> >> +return -EBUSY;
> >> +
> >> +/*
> >> + * Simply specify the entire range up to the end of the page. All the
> >> + * function uses it for is a check for not crossing page boundaries.
> >> + */
> >> +offset = PAGE_OFFSET(d_area->map);
> >> +ret = map_guest_area(cd_vcpu, gfn_to_gaddr(gfn) + offset,
> >> + PAGE_SIZE - offset, cd_area, NULL);
> >> +if ( ret )
> >> +return ret;
> >> +}
> >> +else
> >> +cd_mfn = page_to_mfn(cd_area->pg);
> >
> > Everything to this point seems to be non mem-sharing/forking related.
> > Could these live somewhere else? There must be some other place where
> > allocating these areas happens already for non-fork VMs so it would
> > make sense to just refactor that code to be callable from here.
>
> It is the "copy" aspect with makes this mem-sharing (or really fork)
> specific. Plus in the end this is no different from what you have
> there right now for copying the vCPU info area. In the final patch
> that other code gets removed by re-using the code here.

Yes, the copy part is fork-specific. Arguably if there was a way to do the
allocation of the page for vcpu_info I would prefer that being elsewhere,
but while the only requirement is allocate-page and copy from parent I'm OK
with that logic being in here because it's really straight forward. But now
you also do extra sanity checks here which are harder to comprehend in this
context alone. What if extra sanity checks will be needed in the future? Or
the sanity checks in the future diverge from where this happens for normal
VMs because someone overlooks this needing to be synched here too?

> I also haven't been able to spot anything that could be factored
> out (and one might expect that if there was something, then the vCPU
> info area copying should also already have used it). map_guest_area()
> is all that is used for other purposes as well.

Well, there must be a location where all this happens for normal VMs as
well, no? Why not factor that code so that it can be called from here, so
that we don't have to track sanity check requirements in two different
locations? Or for normal VMs that sanity checking bit isn't required? If
so, why?

> >> +
> >> +copy_domain_page(cd_mfn, d_mfn);
> >> +
> >> +return 0;
> >> +}
> >> +
> >>  static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> >>  {
> >>  struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
> >> @@ -1745,6 +1804,16 @@ static int copy_vcpu_settings(struct dom
> >>  copy_domain_page(new_vcpu_info_mfn, vcpu_info_mfn);
> >>  }
> >>
> >> +/* Same for the (physically registered) runstate and time info areas. */
> >> +ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
> >> +  &d_vcpu->runstate_guest_area, cd_vcpu, d);
> >> +if ( ret )
> >> +return ret;
> >> +ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
> >> +  &d_vcpu->arch.time_guest_area, cd_vcpu, d);
> >> +if ( ret )
> >> +ret

[xen-unstable test] 176062: regressions - FAIL

2023-01-23 Thread osstest service owner
flight 176062 xen-unstable real [real]
flight 176073 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/176062/
http://logs.test-lab.xenproject.org/osstest/logs/176073/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-coresched-i386-xl 18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-xl-xsm   18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-xl   18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-pair  26 guest-migrate/src_host/dst_host fail REGR. vs. 175994
 test-amd64-i386-xl-vhd   17 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-xl-shadow18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-libvirt-pair 26 guest-migrate/src_host/dst_host fail REGR. vs. 
175994

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 175987
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 175987
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 175987
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 175994
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 175994
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 175994
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 175994
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 175994
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 175994
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 175994
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 175994
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 175994
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-

Re: [PATCH v1 07/14] xen/riscv: introduce exception handlers implementation

2023-01-23 Thread Andrew Cooper
On 23/01/2023 3:17 pm, Oleksii wrote:
> On Mon, 2023-01-23 at 11:50 +, Andrew Cooper wrote:
>> On 20/01/2023 2:59 pm, Oleksii Kurochko wrote:
>>> +    /* Save context to stack */
>>> +    REG_S   sp, (RISCV_CPU_USER_REGS_OFFSET(sp) - RISCV_CPU_USER_REGS_SIZE) (sp)
>>> +    addi    sp, sp, -RISCV_CPU_USER_REGS_SIZE
>>> +    REG_S   t0, RISCV_CPU_USER_REGS_OFFSET(t0)(sp)
>> Exceptions on RISC-V don't adjust the stack pointer.  This logic
>> depends
>> on interrupting Xen code, and Xen not having suffered a stack
>> overflow
>> (and actually, that the space on the stack for all registers also
>> doesn't overflow).
>>
>> Which might be fine for now, but I think it warrants a comment
>> somewhere
>> (probably at handle_exception itself) stating the expectations while
>> it's still a work in progress.  So in this case something like:
>>
>> /* Work-in-progress:  Depends on interrupting Xen, and the stack being good. */
>>
>>
>> But, do we want to allocate stemp right away (even with an empty
>> struct), and get tp set up properly?
>>
> I am not sure that I get you here about stemp. Could you please clarify
> a little bit.

Sorry - sscratch, not stemp - I got the name wrong.

All registers are the interrupted context, not Xen's context.  This
includes the stack pointer, global pointer, and thread pointer.

Trap setup is supposed to stash Xen's tp in sscratch so on an
interrupt/exception, it can exchange sscratch with tp and recover the
stack pointer.

Linux plays games with having sscratch be 0 while in kernel and uses
this to determine whether the exception occurred in kernel or user
mode.  This is a massive can of re-entrancy bugs that appears to be baked
into the architecture.
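
For concreteness, the Linux-style convention described above looks roughly
like this in a trap entry path (a sketch of the idiom, not Xen code; label
names are illustrative):

handle_trap:
    csrrw   tp, sscratch, tp   /* swap: tp <- sscratch, sscratch <- old tp */
    bnez    tp, 1f             /* non-zero tp: trapped from lower privilege */
    csrr    tp, sscratch       /* zero: nested trap from kernel mode; the   */
                               /* kernel tp we just stored is in sscratch   */
1:  /* ... save registers; later, csrw sscratch, zero while in kernel ... */

The re-entrancy hazard is visible here: any trap taken between the csrrw and
the point where sscratch is re-zeroed will misclassify its origin.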

I genuinely can't figure out a safe way to cope with a stack overflow,
or a bad tp, because it is not safe to take a pagefault until the exception
prologue has completed.  If you do, you'll switch back to the
interrupted task's tp and use that as if it were Xen's.

~Andrew


Re: [PATCH] automation: Modify static-mem check in qemu-smoke-dom0less-arm64.sh

2023-01-23 Thread Stefano Stabellini
On Mon, 23 Jan 2023, Michal Orzel wrote:
> At the moment, the static-mem check relies on the way Xen exposes the
> memory banks in device tree. As this might change, the check should be
> modified to be generic and not to rely on device tree. In this case,
> let's use /proc/iomem which exposes the memory ranges in %08x format
> as follows:
> <start>-<end> : <description>
> 
> This way, we can grep in /proc/iomem for an entry containing memory
> region defined by the static-mem configuration with "System RAM"
> description. If it exists, mark the test as passed. Also, take the
> opportunity to add 0x prefix to domu_{base,size} definition rather than
> adding it in front of each occurrence.
> 
> Signed-off-by: Michal Orzel 

Acked-by: Stefano Stabellini 


> ---
> Patch made as part of the discussion:
> https://lore.kernel.org/xen-devel/ba37ee02-c07c-2803-0867-149c77989...@amd.com/
> 
> CC: Julien, Ayan
> ---
>  automation/scripts/qemu-smoke-dom0less-arm64.sh | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/automation/scripts/qemu-smoke-dom0less-arm64.sh b/automation/scripts/qemu-smoke-dom0less-arm64.sh
> index 2b59346fdcfd..182a4b6c18fc 100755
> --- a/automation/scripts/qemu-smoke-dom0less-arm64.sh
> +++ b/automation/scripts/qemu-smoke-dom0less-arm64.sh
> @@ -16,14 +16,13 @@ fi
>  
>  if [[ "${test_variant}" == "static-mem" ]]; then
>  # Memory range that is statically allocated to DOM1
> -domu_base="50000000"
> -domu_size="10000000"
> +domu_base="0x50000000"
> +domu_size="0x10000000"
>  passed="${test_variant} test passed"
>  domU_check="
> -current=\$(hexdump -e '16/1 \"%02x\"' /proc/device-tree/memory@${domu_base}/reg 2>/dev/null)
> -expected=$(printf \"%016x%016x\" 0x${domu_base} 0x${domu_size})
> -if [[ \"\${expected}\" == \"\${current}\" ]]; then
> - echo \"${passed}\"
> +mem_range=$(printf \"%08x-%08x\" ${domu_base} $(( ${domu_base} + ${domu_size} - 1 )))
> +if grep -q -x \"\${mem_range} : System RAM\" /proc/iomem; then
> +echo \"${passed}\"
>  fi
>  "
>  fi
> @@ -126,7 +125,7 @@ UBOOT_SOURCE="boot.source"
>  UBOOT_SCRIPT="boot.scr"' > binaries/config
>  
>  if [[ "${test_variant}" == "static-mem" ]]; then
> -    echo -e "\nDOMU_STATIC_MEM[0]=\"0x${domu_base} 0x${domu_size}\"" >> binaries/config
> +    echo -e "\nDOMU_STATIC_MEM[0]=\"${domu_base} ${domu_size}\"" >> binaries/config
>  fi
>  
>  if [[ "${test_variant}" == "boot-cpupools" ]]; then
> -- 
> 2.25.1
> 
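
For illustration, with the values above the generated domU check boils down
to the following (expanded by hand from the patch):

mem_range=$(printf "%08x-%08x" 0x50000000 $(( 0x50000000 + 0x10000000 - 1 )))
# mem_range is now "50000000-5fffffff"
if grep -q -x "${mem_range} : System RAM" /proc/iomem; then
    echo "static-mem test passed"
fi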



Re: [PATCH v11] xen/pt: reserve PCI slot 2 for Intel igd-passthru

2023-01-23 Thread Stefano Stabellini
On Sat, 21 Jan 2023, Chuck Zmudzinski wrote:
> Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
> as noted in docs/igd-assign.txt in the Qemu source code.
> 
> Currently, when the xl toolstack is used to configure a Xen HVM guest with
> Intel IGD passthrough to the guest with the Qemu upstream device model,
> a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
> a different slot. This problem often prevents the guest from booting.
> 
> The only available workarounds are not good: Configure Xen HVM guests to
> use the old and no longer maintained Qemu traditional device model
> available from xenbits.xen.org which does reserve slot 2 for the Intel
> IGD or use the "pc" machine type instead of the "xenfv" machine type and
> add the xen platform device at slot 3 using a command line option
> instead of patching qemu to fix the "xenfv" machine type directly. The
> second workaround causes some degradation in startup performance such as
> a longer boot time and reduced resolution of the grub menu that is
> displayed on the monitor. This patch avoids that reduced startup
> performance when using the Qemu upstream device model for Xen HVM guests
> configured with the igd-passthru=on option.
> 
> To implement this feature in the Qemu upstream device model for Xen HVM
> guests, introduce the following new functions, types, and macros:
> 
> * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
> * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
> * typedef XenPTQdevRealize function pointer
> * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
> * xen_igd_reserve_slot and xen_igd_clear_slot functions
> 
> Michael Tsirkin:
> * Introduce XEN_PCI_IGD_DOMAIN, XEN_PCI_IGD_BUS, XEN_PCI_IGD_DEV, and
>   XEN_PCI_IGD_FN - use them to compute the value of XEN_PCI_IGD_SLOT_MASK
> 
> The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
> member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
> the xl toolstack with the gfx_passthru option enabled, which sets the
> igd-passthru=on option to Qemu for the Xen HVM machine type.
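
Putting the pieces named above together, the reservation itself reduces to a
few lines (a sketch derived from this commit message; the exact shape of the
function in the applied patch may differ):

#define XEN_PCI_IGD_DEV        2
#define XEN_PCI_IGD_SLOT_MASK  (1ULL << XEN_PCI_IGD_DEV)

/* Called when the PCI bus of the "xenfv" machine is created: mark slot 2
 * as reserved so no emulated device can claim it before the IGD does. */
void xen_igd_reserve_slot(PCIBus *pci_bus)
{
    if (!xen_igd_gfx_pt_enabled())
        return;
    pci_bus->slot_reserved_mask |= XEN_PCI_IGD_SLOT_MASK;
}
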
> 
> The new xen_igd_reserve_slot function also needs to be implemented in
> hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
> when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
> in which case it does nothing.
> 
> The new xen_igd_clear_slot function overrides qdev->realize of the parent
> PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
> since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
> created in hw/i386/pc_piix.c for the case when igd-passthru=on.
> 
> Move the call to xen_host_pci_device_get, and the associated error
> handling, from xen_pt_realize to the new xen_igd_clear_slot function to
> initialize the device class and vendor values which enables the checks for
> the Intel IGD to succeed. The verification that the host device is an
> Intel IGD to be passed through is done by checking the domain, bus, slot,
> and function values as well as by checking that gfx_passthru is enabled,
> the device class is VGA, and the device vendor is Intel.
> 
> Signed-off-by: Chuck Zmudzinski 
> ---
> Notes that might be helpful to reviewers of patched code in hw/xen:
> 
> The new functions and types are based on recommendations from Qemu docs:
> https://qemu.readthedocs.io/en/latest/devel/qom.html
> 
> Notes that might be helpful to reviewers of patched code in hw/i386:
> 
> The small patch to hw/i386/pc_piix.c is protected by CONFIG_XEN so it does
> not affect builds that do not have CONFIG_XEN defined.
> 
> xen_igd_gfx_pt_enabled() in the patched hw/i386/pc_piix.c file is an
> existing function that is only true when Qemu is built with
> xen-pci-passthrough enabled and the administrator has configured the Xen
> HVM guest with Qemu's igd-passthru=on option.
> 
> v2: Remove From:  tag at top of commit message
> 
> v3: Changed the test for the Intel IGD in xen_igd_clear_slot:
> 
> if (is_igd_vga_passthrough(&s->real_device) &&
> (s->real_device.vendor_id == PCI_VENDOR_ID_INTEL)) {
> 
> is changed to
> 
> if (xen_igd_gfx_pt_enabled() && (s->hostaddr.slot == 2)
> && (s->hostaddr.function == 0)) {
> 
> I hoped that I could use the test in v2, since it matches the
> other tests for the Intel IGD in Qemu and Xen, but those tests
> do not work because the necessary data structures are not set with
> their values yet. So instead use the test that the administrator
> has enabled gfx_passthru and the device address on the host is
> 02.0. This test does detect the Intel IGD correctly.
> 
> v4: Use brchu...@aol.com instead of brchu...@netscape.net for the author's
> email address to match the address used by the same author in commits
> be9c61da and c0e86b76
> 
> Change variable for XEN_PT_DEVICE_CLASS: xptc change

[xen-unstable bisection] complete test-amd64-i386-xl-xsm

2023-01-23 Thread osstest service owner
branch xen-unstable
xenbranch xen-unstable
job test-amd64-i386-xl-xsm
testid guest-localmigrate

Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  xen git://xenbits.xen.org/xen.git
  Bug introduced:  1894049fa283308d5f90446370be1ade7afe8975
  Bug not present: 20279afd732371dd2534380d27aa6d1863d82d1f
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/176075/


  commit 1894049fa283308d5f90446370be1ade7afe8975
  Author: Jan Beulich 
  Date:   Fri Jan 20 09:17:33 2023 +0100
  
  x86/shadow: L2H shadow type is PV32-only
  
  Like for the various HVM-only types, save a little bit of code by suitably
  "masking" this type out when !PV32.
  
  Signed-off-by: Jan Beulich 
  Acked-by: Andrew Cooper 


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/xen-unstable/test-amd64-i386-xl-xsm.guest-localmigrate.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/xen-unstable/test-amd64-i386-xl-xsm.guest-localmigrate
 --summary-out=tmp/176075.bisection-summary --basis-template=175994 
--blessings=real,real-bisect,real-retry xen-unstable test-amd64-i386-xl-xsm 
guest-localmigrate
Searching for failure / basis pass:
 176062 fail [host=fiano0] / 175994 [host=elbling0] 175987 [host=fiano1] 175965 
[host=elbling1] 175734 [host=debina1] 175726 [host=italia0] 175720 
[host=pinot1] 175714 [host=nobling0] 175694 [host=albana1] 175671 
[host=nobling1] 175651 [host=debina0] 175635 [host=huxelrebe0] 175624 
[host=nocera1] 175612 [host=albana0] 175601 [host=italia0] 175592 ok.
Failure / basis pass flights: 176062 / 175592
(tree with no url: minios)
(tree with no url: ovmf)
(tree with no url: seabios)
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
Latest c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
625eb5e96dc96aa7fddef59a08edae215527f19c 
1d60c20260c7e82fe5344d06c20d718e0cc03b8b
Basis pass c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
1cf02b05b27c48775a25699e61b93b814b9ae042 
671f50ffab3329c5497208da89620322b9721a77
Generating revisions with ./adhoc-revtuple-generator  
git://xenbits.xen.org/linux-pvops.git#c3038e718a19fc596f7b1baba0f83d5146dc7784-c3038e718a19fc596f7b1baba0f83d5146dc7784
 
git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 
git://xenbits.xen.org/qemu-xen-traditional.git#3d273dd05e51e5a1ffba3d98c7437ee84e8f8764-3d273dd05e51e5a1ffba3d98c7437ee84e8f8764
 git://xenbits.xen.org/qemu-xen.git#1cf02b05b27c48775a25699e61b93b814b9ae042-625eb5e96dc96aa7fddef59a08edae215527f19c 
git://xenbits.xen.org/xen.git#671f50ffab3329c5497208da89620322b9721a77-1d60c20260c7e82fe5344d06c20d718e0cc03b8b
Loaded 10003 nodes in revision graph
Searching for test results:
 175592 pass c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
1cf02b05b27c48775a25699e61b93b814b9ae042 
671f50ffab3329c5497208da89620322b9721a77
 175601 [host=italia0]
 175612 [host=albana0]
 175624 [host=nocera1]
 175635 [host=huxelrebe0]
 175651 [host=debina0]
 175671 [host=nobling1]
 175694 [host=albana1]
 175714 [host=nobling0]
 175720 [host=pinot1]
 175726 [host=italia0]
 175734 [host=debina1]
 175834 []
 175861 []
 175890 []
 175907 []
 175931 []
 175956 []
 175965 [host=elbling1]
 175987 [host=fiano1]
 175994 [host=elbling0]
 176003 fail c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
625eb5e96dc96aa7fddef59a08edae215527f19c 
89cc5d96a9d1fce81cf58b6814dac62a9e07fbee
 176011 fail c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
625eb5e96dc96aa7fddef59a08edae215527f19c 
1d60c20260c7e82fe5344d06c20d718e0cc03b8b
 176025 fail c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
625eb5e96dc96aa7fddef59a08edae215527f19c 
1d60c20260c7e82fe5344d06c20d718e0cc03b8b
 176035 fail c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
625eb5e96dc96aa7fddef59a08edae

Re: [RISC-V] Switch to H-mode

2023-01-23 Thread Bobby Eshleman
On Mon, Jan 23, 2023 at 06:56:19PM +0200, Oleksii wrote:
> Hi Alistair and community,
> 
> I am working on upstream RISC-V support for Xen, based on your and
> Bobby's patches.
> 
> While adding the RISC-V support I realized that Xen is run in S-mode.
> Output of OpenSBI:
> ...
> Domain0 Next Mode : S-mode
> ...
> So my first question is: shouldn't it be in H-mode?
> 
> If I am right, then it looks like we have to patch OpenSBI to add
> support for H-mode, as it is not supported now:
> [1]
> https://github.com/riscv-software-src/opensbi/blob/master/lib/sbi/sbi_domain.c#L380
> [2]
> https://github.com/riscv-software-src/opensbi/blob/master/include/sbi/riscv_encoding.h#L110
> Please correct me if I am wrong.
> 
> The other option I see is to switch to H-mode in U-Boot, as I understand
> the classical boot flow is:
> OpenSBI -> U-Boot -> Xen -> Domain{0,...}
> That is, if it is possible at all, since U-Boot will be in S-mode after
> OpenSBI.
> 
> Thanks in advance.
> 
> ~ Oleksii
> 

Ah, what you are seeing there is that OpenSBI's Next Mode excludes
the virtualization mode (it treats HS and S synonymously); it is only
used for setting mstatus.MPP. The code also has next_virt for
setting mstatus.MPV, but I don't think that is exposed via the device tree
yet. For Xen, you'd want next_mode = PRIV_S and next_virt = 0 (HS mode,
not VS mode). The relevant setup prior to mret is here for interested
readers:
https://github.com/riscv-software-src/opensbi/blob/001106d19b21cd6443ae7f7f6d4d048d80e9ecac/lib/sbi/sbi_hart.c#L759

As long as next_mode and next_virt are set correctly, Xen should be
launching in HS mode. I do believe this is the default for the stock
build for Domain0 too, unless something has changed.
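
In other words, a minimal pseudocode sketch of that setup (set_field()
and the CSR names here are illustrative shorthand for the RV64 layout,
not necessarily OpenSBI's exact helpers):

    /* Conceptually what OpenSBI does just before mret: */
    val = csr_read(CSR_MSTATUS);
    val = set_field(val, MSTATUS_MPP, next_mode);  /* PRIV_S for Xen */
    val = set_field(val, MSTATUS_MPV, next_virt);  /* 0 => HS, not VS */
    csr_write(CSR_MSTATUS, val);
    csr_write(CSR_MEPC, next_addr);                /* Xen's entry point */
    mret();                                        /* hart enters HS-mode */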

Thanks,
Bobby



Re: [XEN v3 1/3] xen/arm: Use the correct format specifier

2023-01-23 Thread Stefano Stabellini
On Mon, 23 Jan 2023, Ayan Kumar Halder wrote:
> 1. One should use 'PRIpaddr' to display 'paddr_t' variables. However,
> while creating nodes in fdt, the address (if present in the node name)
> should be represented using 'PRIx64'. This is to be in conformance
> with the following rule present in https://elinux.org/Device_Tree_Linux
> 
> . node names
> "unit-address does not have leading zeros"
> 
> As 'PRIpaddr' introduces leading zeros, we cannot use it.
> 
> So, we have introduced a wrapper, i.e. domain_fdt_begin_node(), which
> will represent the physical address using 'PRIx64'.
> 
> 2. One should use 'PRIx64' to display 'u64' in hex format. The current
> use of 'PRIpaddr' for printing a PTE is buggy, as a PTE is not a
> physical address.
> 
> Signed-off-by: Ayan Kumar Halder 
> ---
> 
> Changes from -
> 
> v1 - 1. Moved the patch earlier.
> 2. Moved a part of change from "[XEN v1 8/9] xen/arm: Other adaptations 
> required to support 32bit paddr"
> into this patch.
> 
> v2 - 1. Use PRIx64 for appending addresses to fdt node names. This fixes the 
> CI failure.
> 
>  xen/arch/arm/domain_build.c | 45 +
>  xen/arch/arm/gic-v2.c   |  6 ++---
>  xen/arch/arm/mm.c   |  2 +-

The changes to mm.c and gic-v2.c look OK and I'd ack them already. One
question on the changes to domain_build.c below.


>  3 files changed, 25 insertions(+), 28 deletions(-)
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index f35f4d2456..97c2395f9a 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -1288,6 +1288,20 @@ static int __init fdt_property_interrupts(const struct 
> kernel_info *kinfo,
>  return res;
>  }
>  
> +static int __init domain_fdt_begin_node(void *fdt, const char *name,
> +uint64_t unit)
> +{
> +/*
> + * The size of the buffer to hold the longest possible string ie
> + * interrupt-controller@ + a 64-bit number + \0
> + */
> +char buf[38];
> +
> +/* ePAPR 3.4 */
> +snprintf(buf, sizeof(buf), "%s@%"PRIx64, name, unit);
> +return fdt_begin_node(fdt, buf);
> +}
> +
>  static int __init make_memory_node(const struct domain *d,
> void *fdt,
> int addrcells, int sizecells,
> @@ -1296,8 +1310,6 @@ static int __init make_memory_node(const struct domain 
> *d,
>  unsigned int i;
>  int res, reg_size = addrcells + sizecells;
>  int nr_cells = 0;
> -/* Placeholder for memory@ + a 64-bit number + \0 */
> -char buf[24];
>  __be32 reg[NR_MEM_BANKS * 4 /* Worst case addrcells + sizecells */];
>  __be32 *cells;
>  
> @@ -1314,9 +1326,7 @@ static int __init make_memory_node(const struct domain 
> *d,
>  
>  dt_dprintk("Create memory node\n");
>  
> -/* ePAPR 3.4 */
> -snprintf(buf, sizeof(buf), "memory@%"PRIx64, mem->bank[i].start);
> -res = fdt_begin_node(fdt, buf);
> +res = domain_fdt_begin_node(fdt, "memory", mem->bank[i].start);

Basically this "hides" the paddr_t->uint64_t cast because it happens
implicitly when passing mem->bank[i].start as an argument to
domain_fdt_begin_node.

To be honest, I don't know if it is necessary. Also a normal cast would
be fine:

snprintf(buf, sizeof(buf), "memory@%"PRIx64, (uint64_t)mem->bank[i].start);
res = fdt_begin_node(fdt, buf);

Julien, what do you prefer?
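
For reference, the leading-zero behaviour at issue looks like this
(hypothetical example, assuming PRIpaddr expands to the zero-padded
"016lx" variant on this configuration):

    paddr_t start = 0x40000000;

    printk("memory@%"PRIpaddr"\n", start);          /* memory@0000000040000000 */
    printk("memory@%"PRIx64"\n", (uint64_t)start);  /* memory@40000000 */

Only the second form satisfies the device tree rule quoted in the
commit message, at the cost of an explicit (or hidden) cast.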



Re: [XEN v3 3/3] xen/drivers: ns16550: Fix an incorrect assignment to uart->io_size

2023-01-23 Thread Stefano Stabellini
On Mon, 23 Jan 2023, Ayan Kumar Halder wrote:
> uart->io_size represents the size in bytes. Thus, when serial_port.bit_width
> is assigned to it, it should first be converted from bits to bytes.
> 
> Fixes: 17b516196c55 ("ns16550: add ACPI support for ARM only")
> Signed-off-by: Ayan Kumar Halder 

Reviewed-by: Stefano Stabellini 


> ---
> 
> Changes from -
> 
> v1, v2 - NA (New patch introduced in v3).
> 
>  xen/drivers/char/ns16550.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
> index 43e1f971ab..092f6b9c4b 100644
> --- a/xen/drivers/char/ns16550.c
> +++ b/xen/drivers/char/ns16550.c
> @@ -1870,7 +1870,7 @@ static int __init ns16550_acpi_uart_init(const void 
> *data)
>  uart->parity = spcr->parity;
>  uart->stop_bits = spcr->stop_bits;
>  uart->io_base = spcr->serial_port.address;
> -uart->io_size = spcr->serial_port.bit_width;
> +uart->io_size = DIV_ROUND_UP(spcr->serial_port.bit_width, BITS_PER_BYTE);
>  uart->reg_shift = spcr->serial_port.bit_offset;
>  uart->reg_width = spcr->serial_port.access_width;
>  
> -- 
> 2.17.1
> 
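
As a worked example of the fix: a typical SPCR entry describing a
32-bit-wide register window has serial_port.bit_width = 32, so io_size
now becomes DIV_ROUND_UP(32, 8) = 4 bytes, where the old code wrongly
stored 32.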
> 



Re: [PATCH 01/22] xen/common: page_alloc: Re-order includes

2023-01-23 Thread Stefano Stabellini
On Fri, 16 Dec 2022, Julien Grall wrote:
> From: Julien Grall 
> 
> Order the includes with the xen headers first, then asm headers, and
> public headers last. Within each category, they are sorted alphabetically.
> 
> Note that the includes protected by CONFIG_X86 haven't been sorted,
> to avoid adding multiple #ifdefs.
> 
> Signed-off-by: Julien Grall 

This patch doesn't apply as is any longer. Assuming it gets ported to
the latest staging appropriately:

Acked-by: Stefano Stabellini 
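
The convention being applied, sketched here with hypothetical header
names (the actual list is in the diff below):

    #include <xen/mm.h>          /* xen/ headers first, alphabetical */
    #include <xen/sched.h>

    #include <asm/page.h>        /* then asm/ headers */

    #include <public/memory.h>   /* public/ headers last */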


> 
> 
> I am open to sorting the includes protected by CONFIG_X86
> and adding multiple #ifdefs if this is preferred.
> ---
>  xen/common/page_alloc.c | 29 -
>  1 file changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 0c93a1078702..0a950288e241 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -120,27 +120,30 @@
>   *   regions within it.
>   */
>  
> +#include 
> +#include 
>  #include 
> -#include 
> +#include 
> +#include 
>  #include 
> -#include 
> -#include 
>  #include 
> +#include 
> +#include 
>  #include 
> -#include 
> -#include 
> -#include 
> -#include 
>  #include 
>  #include 
> -#include 
> -#include 
> -#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
>  #include 
>  #include 
> -#include 
> -#include 
> -#include 
> +
>  #ifdef CONFIG_X86
>  #include 
>  #include 
> -- 
> 2.38.1
> 



Re: [PATCH 02/22] x86/setup: move vm_init() before acpi calls

2023-01-23 Thread Stefano Stabellini
On Fri, 16 Dec 2022, Julien Grall wrote:
> From: Wei Liu 
> 
> After the direct map removal, pages from the boot allocator are not
> mapped at all in the direct map. Although we have map_domain_page, its
> mappings are ephemeral and less helpful for mappings larger than a
> page, so we want a mechanism to globally map a range of pages, which is
> what vmap is for. Therefore, we bring vm_init into the early boot stage.
> 
> To allow vmap to be initialised and used in early boot, we need to
> modify vmap to receive pages from the boot allocator during the early
> boot stage.
> 
> Signed-off-by: Wei Liu 
> Signed-off-by: David Woodhouse 
> Signed-off-by: Hongyan Xia 
> Signed-off-by: Julien Grall 

For the arm and common parts:

Reviewed-by: Stefano Stabellini 


> ---
>  xen/arch/arm/setup.c |  4 ++--
>  xen/arch/x86/setup.c | 31 ---
>  xen/common/vmap.c| 37 +
>  3 files changed, 51 insertions(+), 21 deletions(-)
> 
> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> index 1f26f67b90e3..2311726f5ddd 100644
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -1028,6 +1028,8 @@ void __init start_xen(unsigned long boot_phys_offset,
>  
>  setup_mm();
>  
> +vm_init();
> +
>  /* Parse the ACPI tables for possible boot-time configuration */
>  acpi_boot_table_init();
>  
> @@ -1039,8 +1041,6 @@ void __init start_xen(unsigned long boot_phys_offset,
>   */
>  system_state = SYS_STATE_boot;
>  
> -vm_init();
> -
>  if ( acpi_disabled )
>  {
>  printk("Booting using Device Tree\n");
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index 6bb5bc7c84be..1c2e09711eb0 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -870,6 +870,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  unsigned long eb_start, eb_end;
>  bool acpi_boot_table_init_done = false, relocated = false;
>  int ret;
> +bool vm_init_done = false;
>  struct ns16550_defaults ns16550 = {
>  .data_bits = 8,
>  .parity= 'n',
> @@ -1442,12 +1443,23 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  continue;
>  
>  if ( !acpi_boot_table_init_done &&
> - s >= (1ULL << 32) &&
> - !acpi_boot_table_init() )
> + s >= (1ULL << 32) )
>  {
> -acpi_boot_table_init_done = true;
> -srat_parse_regions(s);
> -setup_max_pdx(raw_max_page);
> +/*
> + * We only initialise vmap and acpi after going through the 
> bottom
> + * 4GiB, so that we have enough pages in the boot allocator.
> + */
> +if ( !vm_init_done )
> +{
> +vm_init();
> +vm_init_done = true;
> +}
> +if ( !acpi_boot_table_init() )
> +{
> +acpi_boot_table_init_done = true;
> +srat_parse_regions(s);
> +setup_max_pdx(raw_max_page);
> +}
>  }
>  
>  if ( pfn_to_pdx((e - 1) >> PAGE_SHIFT) >= max_pdx )
> @@ -1624,6 +1636,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  
>  init_frametable();
>  
> +if ( !vm_init_done )
> +vm_init();
> +
>  if ( !acpi_boot_table_init_done )
>  acpi_boot_table_init();
>  
> @@ -1661,12 +1676,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  end_boot_allocator();
>  
>  system_state = SYS_STATE_boot;
> -/*
> - * No calls involving ACPI code should go between the setting of
> - * SYS_STATE_boot and vm_init() (or else acpi_os_{,un}map_memory()
> - * will break).
> - */
> -vm_init();
>  
>  bsp_stack = cpu_alloc_stack(0);
>  if ( !bsp_stack )
> diff --git a/xen/common/vmap.c b/xen/common/vmap.c
> index 4fd6b3067ec1..1340c7c6faf6 100644
> --- a/xen/common/vmap.c
> +++ b/xen/common/vmap.c
> @@ -34,9 +34,20 @@ void __init vm_init_type(enum vmap_region type, void 
> *start, void *end)
>  
>  for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += 
> PAGE_SIZE )
>  {
> -struct page_info *pg = alloc_domheap_page(NULL, 0);
> +mfn_t mfn;
> +int rc;
>  
> -map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
> +if ( system_state == SYS_STATE_early_boot )
> +mfn = alloc_boot_pages(1, 1);
> +else
> +{
> +struct page_info *pg = alloc_domheap_page(NULL, 0);
> +
> +BUG_ON(!pg);
> +mfn = page_to_mfn(pg);
> +}
> +rc = map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR);
> +BUG_ON(rc);
>  clear_page((void *)va);
>  }
>  bitmap_fill(vm_bitmap(type), vm_low[type]);
> @@ -62,7 +73,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
>  spin_lock(&vm_lock);
>  for ( ; ; )
>  {
> -struct page_info *pg;
> 

Re: [PATCH 03/22] acpi: vmap pages in acpi_os_alloc_memory

2023-01-23 Thread Stefano Stabellini
On Fri, 16 Dec 2022, Julien Grall wrote:
> From: Hongyan Xia 
> 
> Also, introduce a wrapper around vmap that maps a contiguous range for
> boot allocations. Unfortunately, the new helper cannot be a static inline
> because the dependencies are a mess. We would need to re-include
> asm/page.h (was removed in aa4b9d1ee653 "include: don't use asm/page.h
> from common headers"), and it doesn't look to be enough anymore
> because bits from asm/cpufeature.h are used in the definition of PAGE_NX.
> 
> Signed-off-by: Hongyan Xia 
> Signed-off-by: Julien Grall 

I saw Jan's comments and I agree with them but I also wanted to track
that I reviewed this patch and looks OK:

Reviewed-by: Stefano Stabellini 


> 
> 
> Changes since Hongyan's version:
> * Rename vmap_boot_pages() to vmap_contig_pages()
> * Move the new helper in vmap.c to avoid compilation issue
> * Don't use __pa() to translate the virtual address
> ---
>  xen/common/vmap.c  |  5 +
>  xen/drivers/acpi/osl.c | 13 +++--
>  xen/include/xen/vmap.h |  2 ++
>  3 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/vmap.c b/xen/common/vmap.c
> index 1340c7c6faf6..78f051a67682 100644
> --- a/xen/common/vmap.c
> +++ b/xen/common/vmap.c
> @@ -244,6 +244,11 @@ void *vmap(const mfn_t *mfn, unsigned int nr)
>  return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
>  }
>  
> +void *vmap_contig_pages(mfn_t mfn, unsigned int nr_pages)
> +{
> +return __vmap(&mfn, nr_pages, 1, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
> +}
> +
>  void vunmap(const void *va)
>  {
>  unsigned long addr = (unsigned long)va;
> diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
> index 389505f78666..44a9719b0dcf 100644
> --- a/xen/drivers/acpi/osl.c
> +++ b/xen/drivers/acpi/osl.c
> @@ -221,7 +221,11 @@ void *__init acpi_os_alloc_memory(size_t sz)
>   void *ptr;
>  
>   if (system_state == SYS_STATE_early_boot)
> - return mfn_to_virt(mfn_x(alloc_boot_pages(PFN_UP(sz), 1)));
> + {
> + mfn_t mfn = alloc_boot_pages(PFN_UP(sz), 1);
> +
> + return vmap_contig_pages(mfn, PFN_UP(sz));
> + }
>  
>   ptr = xmalloc_bytes(sz);
>   ASSERT(!ptr || is_xmalloc_memory(ptr));
> @@ -246,5 +250,10 @@ void __init acpi_os_free_memory(void *ptr)
>   if (is_xmalloc_memory(ptr))
>   xfree(ptr);
>   else if (ptr && system_state == SYS_STATE_early_boot)
> - init_boot_pages(__pa(ptr), __pa(ptr) + PAGE_SIZE);
> + {
> + paddr_t addr = mfn_to_maddr(vmap_to_mfn(ptr));
> +
> + vunmap(ptr);
> + init_boot_pages(addr, addr + PAGE_SIZE);
> + }
>  }
> diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
> index b0f7632e8985..3c06c7c3ba30 100644
> --- a/xen/include/xen/vmap.h
> +++ b/xen/include/xen/vmap.h
> @@ -23,6 +23,8 @@ void *vmalloc_xen(size_t size);
>  void *vzalloc(size_t size);
>  void vfree(void *va);
>  
> +void *vmap_contig_pages(mfn_t mfn, unsigned int nr_pages);
> +
>  void __iomem *ioremap(paddr_t, size_t);
>  
>  static inline void iounmap(void __iomem *va)
> -- 
> 2.38.1
> 



Re: [PATCH 11/22] x86: add a boot option to enable and disable the direct map

2023-01-23 Thread Stefano Stabellini
On Fri, 16 Dec 2022, Julien Grall wrote:
> From: Hongyan Xia 
> 
> Also add a helper function to retrieve it. Change arch_mfns_in_directmap
> to check this option before returning.
> 
> This is added as a boot command line option, not a Kconfig option, to
> allow the user to experiment with the feature without rebuilding the
> hypervisor.
> 
> Signed-off-by: Hongyan Xia 
> Signed-off-by: Julien Grall 
> 
> 
> 
> TODO:
> * Do we also want to provide a Kconfig option?
> 
> Changes since Hongyan's version:
> * Reword the commit message
> * opt_directmap is only modified during boot so mark it as
>   __ro_after_init
> ---
>  docs/misc/xen-command-line.pandoc | 12 
>  xen/arch/arm/include/asm/mm.h |  5 +
>  xen/arch/x86/include/asm/mm.h | 17 -
>  xen/arch/x86/mm.c |  3 +++
>  xen/arch/x86/setup.c  |  2 ++
>  5 files changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc 
> b/docs/misc/xen-command-line.pandoc
> index b7ee97be762e..a63e4612acac 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -760,6 +760,18 @@ Specify the size of the console debug trace buffer. By 
> specifying `cpu:`
>  additionally a trace buffer of the specified size is allocated per cpu.
>  The debug trace feature is only enabled in debugging builds of Xen.
>  
> +### directmap (x86)
> +> `= `
> +
> +> Default: `true`
> +
> +Enable or disable the direct map region in Xen.
> +
> +By default, Xen creates the direct map region which maps physical memory
> +in that region. Setting this to no will remove the direct map, blocking
> +exploits that leak secrets via speculative memory access in the direct
> +map.
> +
>  ### dma_bits
>  > `= `
>  
> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> index 68adcac9fa8d..2366928d71aa 100644
> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -406,6 +406,11 @@ static inline void page_set_xenheap_gfn(struct page_info 
> *p, gfn_t gfn)
>  } while ( (y = cmpxchg(&p->u.inuse.type_info, x, nx)) != x );
>  }
>  
> +static inline bool arch_has_directmap(void)
> +{
> +return true;

Shouldn't arch_has_directmap() return false for arm32?
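
A hypothetical sketch of that alternative (whether arm32 should be
modelled this way is exactly the question):

    static inline bool arch_has_directmap(void)
    {
    #ifdef CONFIG_ARM_64
        return true;
    #else
        /* arm32 only direct-maps the xenheap, not all of RAM. */
        return false;
    #endif
    }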



> +}
> +
>  #endif /*  __ARCH_ARM_MM__ */
>  /*
>   * Local variables:
> diff --git a/xen/arch/x86/include/asm/mm.h b/xen/arch/x86/include/asm/mm.h
> index db29e3e2059f..cf8b20817c6c 100644
> --- a/xen/arch/x86/include/asm/mm.h
> +++ b/xen/arch/x86/include/asm/mm.h
> @@ -464,6 +464,8 @@ static inline int get_page_and_type(struct page_info 
> *page,
>  ASSERT(((_p)->count_info & PGC_count_mask) != 0);  \
>  ASSERT(page_get_owner(_p) == (_d))
>  
> +extern bool opt_directmap;
> +
>  
> /**
>   * With shadow pagetables, the different kinds of address start
>   * to get get confusing.
> @@ -620,13 +622,26 @@ extern const char zero_page[];
>  /* Build a 32bit PSE page table using 4MB pages. */
>  void write_32bit_pse_identmap(uint32_t *l2);
>  
> +static inline bool arch_has_directmap(void)
> +{
> +return opt_directmap;
> +}
> +
>  /*
>   * x86 maps part of physical memory via the directmap region.
>   * Return whether the range of MFN falls in the directmap region.
> + *
> + * When boot command line sets directmap=no, we will not have a direct map at
> + * all so this will always return false.
>   */
>  static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long 
> nr)
>  {
> -unsigned long eva = min(DIRECTMAP_VIRT_END, HYPERVISOR_VIRT_END);
> +unsigned long eva;
> +
> +if ( !arch_has_directmap() )
> +return false;
> +
> +eva = min(DIRECTMAP_VIRT_END, HYPERVISOR_VIRT_END);
>  
>  return (mfn + nr) <= (virt_to_mfn(eva - 1) + 1);
>  }
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 041bd4cfde17..e76e135b96fc 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -157,6 +157,9 @@ l1_pgentry_t __section(".bss.page_aligned") 
> __aligned(PAGE_SIZE)
>  l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
>  l1_fixmap_x[L1_PAGETABLE_ENTRIES];
>  
> +bool __ro_after_init opt_directmap = true;
> +boolean_param("directmap", opt_directmap);
> +
>  /* Frame table size in pages. */
>  unsigned long max_page;
>  unsigned long total_pages;
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index 1c2e09711eb0..2cb051c6e4e7 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1423,6 +1423,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  if ( highmem_start )
>  xenheap_max_mfn(PFN_DOWN(highmem_start - 1));
>  
> +printk("Booting with directmap %s\n", arch_has_directmap() ? "on" : 
> "off");
> +
>  /*
>   * Walk every RAM region and map it in its entirety (on x86/64, at least)
>   * and notify it to the boot allocator.
> -- 

Re: [PATCH 12/22] xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention

2023-01-23 Thread Stefano Stabellini
On Fri, 16 Dec 2022, Julien Grall wrote:
> From: Julien Grall 
> 
> At the moment the fixmap slots are prefixed differently between arm and
> x86.
> 
> Some of them (e.g. the PMAP slots) are used in common code. So it would
> be better if they are named the same way to avoid having to create
> aliases.
> 
> I have decided to use the x86 naming because it requires fewer changes. So
> all the Arm fixmap slots will now be prefixed with FIX rather than
> FIXMAP.
> 
> Signed-off-by: Julien Grall 

Reviewed-by: Stefano Stabellini 


> 
> 
> Note that potentially more renaming could be done to share
> more code in the future. I have decided to not do that to avoid going
> down a rabbit hole.
> ---
>  xen/arch/arm/acpi/lib.c | 18 +-
>  xen/arch/arm/include/asm/early_printk.h |  2 +-
>  xen/arch/arm/include/asm/fixmap.h   | 16 
>  xen/arch/arm/kernel.c   |  6 +++---
>  xen/common/pmap.c   |  8 
>  5 files changed, 25 insertions(+), 25 deletions(-)
> 
> diff --git a/xen/arch/arm/acpi/lib.c b/xen/arch/arm/acpi/lib.c
> index 41d521f720ac..736cf09ecaa8 100644
> --- a/xen/arch/arm/acpi/lib.c
> +++ b/xen/arch/arm/acpi/lib.c
> @@ -40,10 +40,10 @@ char *__acpi_map_table(paddr_t phys, unsigned long size)
>  return NULL;
>  
>  offset = phys & (PAGE_SIZE - 1);
> -base = FIXMAP_ADDR(FIXMAP_ACPI_BEGIN) + offset;
> +base = FIXMAP_ADDR(FIX_ACPI_BEGIN) + offset;
>  
>  /* Check the fixmap is big enough to map the region */
> -if ( (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE - base) < size )
> +if ( (FIXMAP_ADDR(FIX_ACPI_END) + PAGE_SIZE - base) < size )
>  return NULL;
>  
>  /* With the fixmap, we can only map one region at the time */
> @@ -54,7 +54,7 @@ char *__acpi_map_table(paddr_t phys, unsigned long size)
>  
>  size += offset;
>  mfn = maddr_to_mfn(phys);
> -idx = FIXMAP_ACPI_BEGIN;
> +idx = FIX_ACPI_BEGIN;
>  
>  do {
>  set_fixmap(idx, mfn, PAGE_HYPERVISOR);
> @@ -72,8 +72,8 @@ bool __acpi_unmap_table(const void *ptr, unsigned long size)
>  unsigned int idx;
>  
>  /* We are only handling fixmap address in the arch code */
> -if ( (vaddr < FIXMAP_ADDR(FIXMAP_ACPI_BEGIN)) ||
> - (vaddr >= (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE)) )
> +if ( (vaddr < FIXMAP_ADDR(FIX_ACPI_BEGIN)) ||
> + (vaddr >= (FIXMAP_ADDR(FIX_ACPI_END) + PAGE_SIZE)) )
>  return false;
>  
>  /*
> @@ -81,16 +81,16 @@ bool __acpi_unmap_table(const void *ptr, unsigned long 
> size)
>   * for the ACPI fixmap region. The caller is expected to free with
>   * the same address.
>   */
> -ASSERT((vaddr & PAGE_MASK) == FIXMAP_ADDR(FIXMAP_ACPI_BEGIN));
> +ASSERT((vaddr & PAGE_MASK) == FIXMAP_ADDR(FIX_ACPI_BEGIN));
>  
>  /* The region allocated fit in the ACPI fixmap region. */
> -ASSERT(size < (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE - vaddr));
> +ASSERT(size < (FIXMAP_ADDR(FIX_ACPI_END) + PAGE_SIZE - vaddr));
>  ASSERT(fixmap_inuse);
>  
>  fixmap_inuse = false;
>  
> -size += vaddr - FIXMAP_ADDR(FIXMAP_ACPI_BEGIN);
> -idx = FIXMAP_ACPI_BEGIN;
> +size += vaddr - FIXMAP_ADDR(FIX_ACPI_BEGIN);
> +idx = FIX_ACPI_BEGIN;
>  
>  do
>  {
> diff --git a/xen/arch/arm/include/asm/early_printk.h 
> b/xen/arch/arm/include/asm/early_printk.h
> index c5149b2976da..a5f48801f476 100644
> --- a/xen/arch/arm/include/asm/early_printk.h
> +++ b/xen/arch/arm/include/asm/early_printk.h
> @@ -17,7 +17,7 @@
>  
>  /* need to add the uart address offset in page to the fixmap address */
>  #define EARLY_UART_VIRTUAL_ADDRESS \
> -(FIXMAP_ADDR(FIXMAP_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS & 
> ~PAGE_MASK))
> +(FIXMAP_ADDR(FIX_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS & 
> ~PAGE_MASK))
>  
>  #endif /* !CONFIG_EARLY_PRINTK */
>  
> diff --git a/xen/arch/arm/include/asm/fixmap.h 
> b/xen/arch/arm/include/asm/fixmap.h
> index d0c9a52c8c28..154db85686c2 100644
> --- a/xen/arch/arm/include/asm/fixmap.h
> +++ b/xen/arch/arm/include/asm/fixmap.h
> @@ -8,17 +8,17 @@
>  #include 
>  
>  /* Fixmap slots */
> -#define FIXMAP_CONSOLE  0  /* The primary UART */
> -#define FIXMAP_MISC 1  /* Ephemeral mappings of hardware */
> -#define FIXMAP_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
> -#define FIXMAP_ACPI_END(FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  
> /* End mappings of ACPI tables */
> -#define FIXMAP_PMAP_BEGIN (FIXMAP_ACPI_END + 1) /* Start of PMAP */
> -#define FIXMAP_PMAP_END (FIXMAP_PMAP_BEGIN + NUM_FIX_PMAP - 1) /* End of 
> PMAP */
> +#define FIX_CONSOLE  0  /* The primary UART */
> +#define FIX_MISC 1  /* Ephemeral mappings of hardware */
> +#define FIX_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
> +#define FIX_ACPI_END(FIX_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* End 
> mappings of ACPI tables */
> +#define FIX_PMAP_BEGIN (FIX_ACPI_END + 1) /* Start of PM
