Re: [PATCH v3] xen/evtchn: Introduce new IOCTL to bind static evtchn

2023-07-26 Thread Juergen Gross

On 18.07.23 13:31, Rahul Singh wrote:

Xen 4.17 supports the creation of static evtchns. To allow user-space
applications to bind static evtchns, introduce a new ioctl,
"IOCTL_EVTCHN_BIND_STATIC". The existing IOCTL does more than binding,
which is why a new IOCTL is needed that only binds the static
event channels.

Static evtchns must remain available for use during the lifetime of the
guest. When the application exits, __unbind_from_irq() ends up being
called from the release() file operation, and as a result static evtchns
get closed. To avoid closing the static event channel, add a
new bool variable "is_static" to "struct irq_info" that marks the event
channel as static when it is created.

Also, take this opportunity to remove the open-coded version of the
evtchn close in drivers/xen/evtchn.c file and use xen_evtchn_close().

Signed-off-by: Rahul Singh 


Pushed to xen/tip.git for-linus-6.5a


Juergen
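
Editorial sketch of the "is_static" idea described in the patch above. This is
a simplified user-space model, not the kernel code: struct irq_info_model,
its "closed" field and unbind_model() are hypothetical names used only for
illustration, and the real __unbind_from_irq() does considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the kernel's per-IRQ bookkeeping; only the
 * field relevant to the patch description is modelled. */
struct irq_info_model {
    unsigned int evtchn;   /* event channel port, 0 once unbound */
    bool is_static;        /* set when the channel was bound as static */
    bool closed;           /* hypothetical: records that a close was issued */
};

/* Hypothetical unbind path: the local binding is always dropped, but the
 * event channel itself is only closed when it is not static. */
static void unbind_model(struct irq_info_model *info)
{
    assert(info != NULL);

    if (!info->is_static)
        info->closed = true;   /* stands in for xen_evtchn_close() */

    info->evtchn = 0;
}
```

In this model a static channel survives the unbind and could be bound again
later, which is the behaviour the commit message asks for.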



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH] xenbus: check xen_domain in xenbus_probe_initcall

2023-07-26 Thread Juergen Gross

On 22.07.23 01:13, Stefano Stabellini wrote:

The same way we already do in xenbus_init.
Fixes the following warning:

[  352.175563] Trying to free already-free IRQ 0
[  352.177355] WARNING: CPU: 1 PID: 88 at kernel/irq/manage.c:1893 free_irq+0xbf/0x350
[...]
[  352.213951] Call Trace:
[  352.214390]  <TASK>
[  352.214717]  ? __warn+0x81/0x170
[  352.215436]  ? free_irq+0xbf/0x350
[  352.215906]  ? report_bug+0x10b/0x200
[  352.216408]  ? prb_read_valid+0x17/0x20
[  352.216926]  ? handle_bug+0x44/0x80
[  352.217409]  ? exc_invalid_op+0x13/0x60
[  352.217932]  ? asm_exc_invalid_op+0x16/0x20
[  352.218497]  ? free_irq+0xbf/0x350
[  352.218979]  ? __pfx_xenbus_probe_thread+0x10/0x10
[  352.219600]  xenbus_probe+0x7a/0x80
[  352.221030]  xenbus_probe_thread+0x76/0xc0

Signed-off-by: Stefano Stabellini 
Tested-by: Petr Mladek 


Pushed to xen/tip.git for-linus-6.5a


Juergen





Re: [PATCH v4] xen: speed up grant-table reclaim

2023-07-26 Thread Juergen Gross

On 26.07.23 18:52, Demi Marie Obenour wrote:

When a grant entry is still in use by the remote domain, Linux must put
it on a deferred list.  Normally, this list is very short, because
the PV network and block protocols expect the backend to unmap the grant
first.  However, Qubes OS's GUI protocol is subject to the constraints
of the X Window System, and as such winds up with the frontend unmapping
the window first.  As a result, the list can grow very large, resulting
in a massive memory leak and eventual VM freeze.

To partially solve this problem, make the number of entries that the VM
will attempt to free at each iteration tunable.  The default is still
10, but it can be overridden via a module parameter.

This is Cc: stable because (when combined with appropriate userspace
changes) it fixes a severe performance and stability problem for Qubes
OS users.

Cc: sta...@vger.kernel.org
Signed-off-by: Demi Marie Obenour 


Reviewed-by: Juergen Gross 


Juergen
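
Editorial sketch of the batching idea in the patch description above: each
pass looks at a bounded number of deferred entries, and that bound is the
tunable. This is a user-space model under stated assumptions, not the
kernel's grant-table code. struct deferred_entry and reclaim_batch() are
hypothetical names, and unlike the real list handling, entries that are still
in use simply stay in place here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical deferred entry: 'in_use' models the remote domain still
 * having the grant mapped. */
struct deferred_entry {
    bool in_use;
    bool freed;
};

/* Examine at most 'batch' not-yet-freed entries, free those the remote
 * side has released, and return how many were freed in this pass. */
static size_t reclaim_batch(struct deferred_entry *list, size_t len,
                            size_t batch)
{
    size_t attempted = 0, freed = 0;

    assert(list != NULL || len == 0);

    for (size_t i = 0; i < len && attempted < batch; i++) {
        if (list[i].freed)
            continue;              /* reclaimed in an earlier pass */
        attempted++;
        if (!list[i].in_use) {
            list[i].freed = true;
            freed++;
        }
    }
    return freed;
}
```

With a small batch a long list drains slowly, one pass at a time; a larger
batch lets a single pass make more progress, which is the trade-off the
module parameter exposes.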





Re: [PATCH] libxl: Add missing libxl__virtio_devtype to device_type_tbl array

2023-07-26 Thread Viresh Kumar
On 26-07-23, 17:14, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko 
> 
> Without it being present it won't be possible to use some
> libxl__device_type's callbacks for virtio devices as the common code
> can only invoke these callbacks (by dereferencing a pointer) for valid
> libxl__device_type's elements when iterating over device_type_tbl[].
> 
> Signed-off-by: Oleksandr Tyshchenko 
> ---
>  tools/libs/light/libxl_create.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
> index 393c535579..c91059d713 100644
> --- a/tools/libs/light/libxl_create.c
> +++ b/tools/libs/light/libxl_create.c
> @@ -1887,6 +1887,7 @@ const libxl__device_type *device_type_tbl[] = {
>      &libxl__dtdev_devtype,
>      &libxl__vdispl_devtype,
>      &libxl__vsnd_devtype,
> +    &libxl__virtio_devtype,
>      NULL
>  };

Reviewed-by: Viresh Kumar 

-- 
viresh
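
Editorial sketch of why a missing table entry matters: common code that walks
a NULL-terminated table of type descriptors can only invoke callbacks for
types that are actually listed. The struct and function names below are
hypothetical and much simpler than the real libxl__device_type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down device type; the real structure has more fields
 * and different callback signatures. */
struct device_type {
    const char *name;
    int (*count)(void);    /* illustrative callback */
};

static int one(void) { return 1; }

static const struct device_type vsnd_type   = { "vsnd",   one };
static const struct device_type virtio_type = { "virtio", one };

/* NULL-terminated table, mirroring the shape of device_type_tbl[]. */
static const struct device_type *type_tbl[] = {
    &vsnd_type,
    &virtio_type,   /* without this entry its callback is never reached */
    NULL
};

/* Common code: visit every listed type and invoke its callback. */
static int total_count(const struct device_type *const *tbl)
{
    int total = 0;

    assert(tbl != NULL);
    for (size_t i = 0; tbl[i] != NULL; i++)
        if (tbl[i]->count)
            total += tbl[i]->count();
    return total;
}
```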



[qemu-mainline test] 182023: tolerable FAIL - PUSHED

2023-07-26 Thread osstest service owner
flight 182023 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/182023/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-check  fail  like 182006
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check  fail  like 182006
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop  fail  like 182006
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check  fail  like 182006
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop  fail  like 182006
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop  fail  like 182006
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2  fail  like 182006
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop  fail  like 182006
 test-amd64-i386-xl-pvshim 14 guest-start  fail  never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt 15 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl 16 saverestore-support-check  fail  never pass

version targeted for testing:
 qemuu  6cb2011fedf8c4e7b66b4a3abd6b42c1bae99ce6
baseline version:
 qemuu  885fc169f09f5915ce037263d20a59eb226d473d

Last test of basis   182006  2023-07-25 05:21:48 Z    1 days
Testing same since   182023  2023-07-26 13:00:59 Z    0 days    1 attempts


People who touched revisions under test:
  Andrei Gudkov 
  Ani Sinha 
  Bastian Koppelmann 
  Bernhard Beschow 
  Guenter Roeck 
  Ilya Leoshkevich 
  Juan Quintela 
  Markus Armbruster 
  Michael Tokarev 
  Peter Maydell 
  Peter Xu 
  Philippe Mathieu-Daudé 
  Sean Estabrooks 
  Thomas Huth 
  Thomas Huth 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  

Re: [PATCH v2 44/47] mm: shrinker: make global slab shrink lockless

2023-07-26 Thread Qi Zheng

Hi Dave,

On 2023/7/27 07:09, Dave Chinner wrote:

On Wed, Jul 26, 2023 at 05:14:09PM +0800, Qi Zheng wrote:

On 2023/7/26 16:08, Dave Chinner wrote:

On Mon, Jul 24, 2023 at 05:43:51PM +0800, Qi Zheng wrote:

@@ -122,6 +126,13 @@ void shrinker_free_non_registered(struct shrinker *shrinker);
   void shrinker_register(struct shrinker *shrinker);
   void shrinker_unregister(struct shrinker *shrinker);
+static inline bool shrinker_try_get(struct shrinker *shrinker)
+{
+    return READ_ONCE(shrinker->registered) &&
+           refcount_inc_not_zero(&shrinker->refcount);
+}


Why do we care about shrinker->registered here? If we don't set
the refcount to 1 until we have fully initialised everything, then
the shrinker code can key entirely off the reference count and
none of the lookup code needs to care about whether the shrinker is
registered or not.


The purpose of checking shrinker->registered here is to stop running
shrinker after calling shrinker_free(), which can prevent the following
situations from happening:

CPU 0                   CPU 1

shrinker_try_get()

                        shrinker_try_get()

shrinker_put()
shrinker_try_get()
                        shrinker_put()


I don't see any race here? What is wrong with having multiple active
users at once?


Maybe I'm overthinking. What I think is that if there are multiple users
at once, it may cause the above-mentioned livelock, which will cause
shrinker_free() to wait for a long time. But this probability should be
very low.





This should use a completion, then it is always safe under
rcu_read_lock().  This also gets rid of the shrinker_lock spin lock,
which only exists because we can't take a blocking lock under
rcu_read_lock(). i.e:


void shrinker_put(struct shrinker *shrinker)
{
    if (refcount_dec_and_test(&shrinker->refcount))
        complete(&shrinker->done);
}

void shrinker_free()
{
    .
    refcount_dec(&shrinker->refcount);


I guess what you mean is shrinker_put(), because here may be the last
refcount.


Yes, I did.


    wait_for_completion(&shrinker->done);
    /*
     * lookups on the shrinker will now all fail as refcount has
     * fallen to zero. We can now remove it from the lists and
     * free it.
     */
    down_write(shrinker_rwsem);
    list_del_rcu(&shrinker->list);
    up_write(&shrinker_rwsem);
    call_rcu(shrinker->rcu_head, shrinker_free_rcu_cb);
}




@@ -686,11 +711,14 @@ EXPORT_SYMBOL(shrinker_free_non_registered);
   void shrinker_register(struct shrinker *shrinker)
   {
-    down_write(&shrinker_rwsem);
-    list_add_tail(&shrinker->list, &shrinker_list);
-    shrinker->flags |= SHRINKER_REGISTERED;
+    refcount_set(&shrinker->refcount, 1);
+
+    spin_lock(&shrinker_lock);
+    list_add_tail_rcu(&shrinker->list, &shrinker_list);
+    spin_unlock(&shrinker_lock);
+
     shrinker_debugfs_add(shrinker);
-    up_write(&shrinker_rwsem);
+    WRITE_ONCE(shrinker->registered, true);
   }
   EXPORT_SYMBOL(shrinker_register);


This just looks wrong - you are trying to use WRITE_ONCE() as a
release barrier to indicate that the shrinker is now set up fully.
That's not necessary - the refcount is an atomic and along with the
rcu locks they should provide all the barriers we need. i.e.


The reason I used WRITE_ONCE() here is because the shrinker->registered
will be read and written concurrently (read in shrinker_try_get() and
written in shrinker_free()), which is why I added shrinker::registered
field instead of using SHRINKER_REGISTERED flag (this can reduce the
addition of WRITE_ONCE()/READ_ONCE()).


Using WRITE_ONCE/READ_ONCE doesn't provide memory barriers needed to
use the field like this. You need release/acquire memory ordering
here. i.e. smp_store_release()/smp_load_acquire().

As it is, the refcount_inc_not_zero() provides a control dependency,
as documented in include/linux/refcount.h, refcount_dec_and_test()
provides release memory ordering. The only thing I think we may need
is a write barrier before refcount_set(), such that if
refcount_inc_not_zero() sees a non-zero value, it is guaranteed to
see an initialised structure...

i.e. refcounts provide all the existence and initialisation
guarantees. Hence I don't see the need to use shrinker->registered
like this and it can remain a bit flag protected by the
shrinker_rwsem().


Ah, I didn't consider the memory order with refcount when I added
WRITE_ONCE/READ_ONCE to shrinker->registered, just didn't want KCSAN
to complain (there are multiple visitors at the same time, one of which
is a writer).

And the livelock case mentioned above is indeed unlikely to happen, so
I will delete shrinker->registered in the next version.





void shrinker_register(struct shrinker *shrinker)
{
    down_write(&shrinker_rwsem);
    list_add_tail_rcu(&shrinker->list, &shrinker_list);
    shrinker->flags |= SHRINKER_REGISTERED;
    shrinker_debugfs_add(shrinker);
    up_write(&shrinker_rwsem);

/*
 * now the shrinker is fully set up, take the first
 * reference to it to indicate that lookup operations are
   

[xen-unstable-smoke test] 182028: tolerable all pass - PUSHED

2023-07-26 Thread osstest service owner
flight 182028 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/182028/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl 16 saverestore-support-check  fail  never pass

version targeted for testing:
 xen  f126d7eeba33a1de04f6d9f6f64855637d4eadb9
baseline version:
 xen  3d2d4ea026df73c37a7df7e216443cbf652ff892

Last test of basis   182024  2023-07-26 13:01:24 Z    0 days
Testing same since   182028  2023-07-26 23:00:27 Z    0 days    1 attempts


People who touched revisions under test:
  Federico Serafini 
  Jan Beulich 
  Simone Ballarin 
  Stefano Stabellini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   3d2d4ea026..f126d7eeba  f126d7eeba33a1de04f6d9f6f64855637d4eadb9 -> smoke



Re: [PATCH v8 02/13] vpci: use per-domain PCI lock to protect vpci structure

2023-07-26 Thread Volodymyr Babchuk
Hi Roger,

Roger Pau Monné  writes:

> On Wed, Jul 26, 2023 at 01:17:58AM +, Volodymyr Babchuk wrote:
>> 
>> Hi Roger,
>> 
>> Roger Pau Monné  writes:
>> 
>> > On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote:
>> >> From: Oleksandr Andrushchenko 
>> >> @@ -498,6 +537,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>> >>          ASSERT(data_offset < size);
>> >>      }
>> >>      spin_unlock(&pdev->vpci->lock);
>> >> +    unlock_locks(d);
>> >
>> > There's one issue here: some handlers will call pcidevs_lock(), which
>> > will result in a lock order inversion, as in the previous patch we
>> > agreed that the locking order was pcidevs_lock first, d->pci_lock
>> > after.
>> >
>> > For example the MSI control_write() handler will call
>> > vpci_msi_arch_enable() which takes the pcidevs lock.  I think I will
>> > have to look into using a dedicated lock for MSI related handling, as
>> > that's the only place where I think we have this pattern of taking the
>> > pcidevs_lock after the d->pci_lock.
>> 
>> I'll mention this in the commit message. Is there something else that I
>> should do right now?
>
> Well, I don't think we want to commit this as-is with a known lock
> inversion.
>
> The functions that require the pcidevs lock are:
>
> pt_irq_{create,destroy}_bind()
> unmap_domain_pirq()
>
> AFAICT those functions require the lock in order to assert that the
> underlying device doesn't go away, as they do also use d->event_lock
> in order to get exclusive access to the data fields.  Please double
> check that I'm not mistaken.

You are right, none of the three functions accesses any PCI state
directly. However...

> If that's accurate you will have to check the call tree that spawns
> from those functions in order to modify the asserts to check for
> either the pcidevs or the per-domain pci_list lock being taken.

... I checked the calls for the PT_IRQ_TYPE_MSI case, and there is only one
call that bothers me: hvm_pi_update_irte(), which calls IO-MMU code via
vmx_pi_update_irte():

amd_iommu_msi_msg_update_ire() or msi_msg_write_remap_rte().

Both functions read basic pdev fields like sbdf or type. I see no
problem there, as the values of those fields are not supposed to change.
Also, those functions use their own locks to protect shared state. But as
the IO-MMU code is quite convoluted, it is hard to be sure that it is safe
to call those functions without holding the pcidevs lock. All I can say is
that those functions and their callees have no ASSERT(pcidevs_locked()).

-- 
WBR, Volodymyr
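
Editorial sketch of the lock-ordering rule discussed in this thread (pcidevs
lock first, per-domain PCI lock after). This is a toy single-threaded model
that only checks ordering; it takes no real locks and does not model
releasing them. enum lock_rank, struct lock_tracker and acquire() are
hypothetical names, not Xen interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Ranks encode the agreed order: a lower rank must be taken first. */
enum lock_rank { RANK_PCIDEVS = 1, RANK_DOMAIN_PCI = 2 };

struct lock_tracker {
    int highest_held;   /* 0 while nothing is held */
};

/* Record an acquisition; returns false if taking 'rank' now would
 * invert the agreed order. */
static bool acquire(struct lock_tracker *t, enum lock_rank rank)
{
    assert(t != NULL);

    if ((int)rank <= t->highest_held)
        return false;          /* would be a lock order inversion */
    t->highest_held = (int)rank;
    return true;
}
```

In this model, a handler that already runs under the per-domain lock and then
asks for the pcidevs lock is rejected, which corresponds to the MSI
control_write() path that Roger points out.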

[libvirt test] 182021: tolerable all pass - PUSHED

2023-07-26 Thread osstest service owner
flight 182021 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/182021/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check  fail  blocked in 182004
 test-armhf-armhf-libvirt 16 saverestore-support-check  fail  like 182004
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check  fail  like 182004
 test-amd64-amd64-libvirt 15 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt 15 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt 15 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt 16 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-check  fail  never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check  fail  never pass

version targeted for testing:
 libvirt  ea4c67f56769b292fdb03dc3e626bde22111c79d
baseline version:
 libvirt  411cbe7199ce533ae5fa78f5558dddca6f88ef1a

Last test of basis   182004  2023-07-25 04:20:37 Z    1 days
Testing same since   182021  2023-07-26 10:25:25 Z    0 days    1 attempts


People who touched revisions under test:
  Han Han 
  Michal Privoznik 
  Oleg Vasilev 
  Pavel Borecki 
  Peter Krempa 
  Tim Wiederhake 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-arm64-pvops  pass
 build-armhf-pvops  pass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  pass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pair  pass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at

Re: [PATCH v2 44/47] mm: shrinker: make global slab shrink lockless

2023-07-26 Thread Dave Chinner
On Wed, Jul 26, 2023 at 05:14:09PM +0800, Qi Zheng wrote:
> On 2023/7/26 16:08, Dave Chinner wrote:
> > On Mon, Jul 24, 2023 at 05:43:51PM +0800, Qi Zheng wrote:
> > > @@ -122,6 +126,13 @@ void shrinker_free_non_registered(struct shrinker *shrinker);
> > >   void shrinker_register(struct shrinker *shrinker);
> > >   void shrinker_unregister(struct shrinker *shrinker);
> > > +static inline bool shrinker_try_get(struct shrinker *shrinker)
> > > +{
> > > +    return READ_ONCE(shrinker->registered) &&
> > > +           refcount_inc_not_zero(&shrinker->refcount);
> > > +}
> > 
> > Why do we care about shrinker->registered here? If we don't set
> > the refcount to 1 until we have fully initialised everything, then
> > the shrinker code can key entirely off the reference count and
> > none of the lookup code needs to care about whether the shrinker is
> > registered or not.
> 
> The purpose of checking shrinker->registered here is to stop running
> shrinker after calling shrinker_free(), which can prevent the following
> situations from happening:
> 
> CPU 0                   CPU 1
> 
> shrinker_try_get()
> 
>                         shrinker_try_get()
> 
> shrinker_put()
> shrinker_try_get()
>                         shrinker_put()

I don't see any race here? What is wrong with having multiple active
users at once?

> > 
> > This should use a completion, then it is always safe under
> > rcu_read_lock().  This also gets rid of the shrinker_lock spin lock,
> > which only exists because we can't take a blocking lock under
> > rcu_read_lock(). i.e:
> > 
> > 
> > void shrinker_put(struct shrinker *shrinker)
> > {
> >     if (refcount_dec_and_test(&shrinker->refcount))
> >         complete(&shrinker->done);
> > }
> > 
> > void shrinker_free()
> > {
> >     .
> >     refcount_dec(&shrinker->refcount);
> 
> I guess what you mean is shrinker_put(), because here may be the last
> refcount.

Yes, I did.

> >     wait_for_completion(&shrinker->done);
> >     /*
> >      * lookups on the shrinker will now all fail as refcount has
> >      * fallen to zero. We can now remove it from the lists and
> >      * free it.
> >      */
> >     down_write(shrinker_rwsem);
> >     list_del_rcu(&shrinker->list);
> >     up_write(&shrinker_rwsem);
> >     call_rcu(shrinker->rcu_head, shrinker_free_rcu_cb);
> > }
> > 
> > 
> > 
> > > @@ -686,11 +711,14 @@ EXPORT_SYMBOL(shrinker_free_non_registered);
> > >   void shrinker_register(struct shrinker *shrinker)
> > >   {
> > > -    down_write(&shrinker_rwsem);
> > > -    list_add_tail(&shrinker->list, &shrinker_list);
> > > -    shrinker->flags |= SHRINKER_REGISTERED;
> > > +    refcount_set(&shrinker->refcount, 1);
> > > +
> > > +    spin_lock(&shrinker_lock);
> > > +    list_add_tail_rcu(&shrinker->list, &shrinker_list);
> > > +    spin_unlock(&shrinker_lock);
> > > +
> > >      shrinker_debugfs_add(shrinker);
> > > -    up_write(&shrinker_rwsem);
> > > +    WRITE_ONCE(shrinker->registered, true);
> > >   }
> > >   EXPORT_SYMBOL(shrinker_register);
> > 
> > This just looks wrong - you are trying to use WRITE_ONCE() as a
> > release barrier to indicate that the shrinker is now set up fully.
> > That's not necessary - the refcount is an atomic and along with the
> > rcu locks they should provide all the barriers we need. i.e.
> 
> The reason I used WRITE_ONCE() here is because the shrinker->registered
> will be read and written concurrently (read in shrinker_try_get() and
> written in shrinker_free()), which is why I added shrinker::registered
> field instead of using SHRINKER_REGISTERED flag (this can reduce the
> addition of WRITE_ONCE()/READ_ONCE()).

Using WRITE_ONCE/READ_ONCE doesn't provide memory barriers needed to
use the field like this. You need release/acquire memory ordering
here. i.e. smp_store_release()/smp_load_acquire().

As it is, the refcount_inc_not_zero() provides a control dependency,
as documented in include/linux/refcount.h, refcount_dec_and_test()
provides release memory ordering. The only thing I think we may need
is a write barrier before refcount_set(), such that if
refcount_inc_not_zero() sees a non-zero value, it is guaranteed to
see an initialised structure...

i.e. refcounts provide all the existence and initialisation
guarantees. Hence I don't see the need to use shrinker->registered
like this and it can remain a bit flag protected by the
shrinker_rwsem().
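
Editorial sketch of the "key entirely off the reference count" scheme
described above, written with C11 atomics so it can be compiled in user
space. This is an analogue under stated assumptions, not the kernel
refcount_t API: struct obj, obj_publish(), obj_try_get() and obj_put() are
hypothetical names, and the acquire ordering used on a successful get is
stronger than the control dependency that refcount_inc_not_zero() is
documented to provide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct obj {
    atomic_int refcount;   /* 0 means "not visible to lookups" */
};

/* Publish a fully initialised object: the release store orders all
 * earlier initialisation before the count becomes non-zero. */
static void obj_publish(struct obj *o)
{
    assert(o != NULL);
    atomic_store_explicit(&o->refcount, 1, memory_order_release);
}

/* Take a reference only if the count is currently non-zero. */
static bool obj_try_get(struct obj *o)
{
    int old = atomic_load_explicit(&o->refcount, memory_order_relaxed);

    while (old != 0) {
        if (atomic_compare_exchange_weak_explicit(&o->refcount, &old,
                                                  old + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
        /* The failed exchange reloaded 'old'; retry with the new value. */
    }
    return false;
}

/* Drop a reference; returns true when this was the last one, which is
 * the point where a waiter (the completion above) would be woken. */
static bool obj_put(struct obj *o)
{
    return atomic_fetch_sub_explicit(&o->refcount, 1,
                                     memory_order_release) == 1;
}
```

Once the final put has brought the count to zero, every later obj_try_get()
fails, so no separate "registered" flag is needed to stop new users.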


> > void shrinker_register(struct shrinker *shrinker)
> > {
> >     down_write(&shrinker_rwsem);
> >     list_add_tail_rcu(&shrinker->list, &shrinker_list);
> >     shrinker->flags |= SHRINKER_REGISTERED;
> >     shrinker_debugfs_add(shrinker);
> >     up_write(&shrinker_rwsem);
> > 
> >     /*
> >      * now the shrinker is fully set up, take the first
> >      * reference to it to indicate that lookup operations are
> >      * now allowed to use it via shrinker_try_get().
> >      */
> >     refcount_set(&shrinker->refcount, 1);
> > }
> > 
> > > diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
> > > index f1becfd45853..c5573066adbf 100644
> > > --- a/mm/shrinker_debug.c
> > > +++ b/mm/shrinker_debug.c
> > > @@ -5,6 +5,7 @@
> > >   #include 
> > >   #include 
> > >   #include 
> > > +#include 
> > >   /* 

Re: [XEN PATCH v4 3/4] automation: Add ECLAIR pipelines

2023-07-26 Thread Stefano Stabellini
On Wed, 26 Jul 2023, Simone Ballarin wrote:
> Add two pipelines that analyze an ARM64 and a X86_64 build with the
> ECLAIR static analyzer on the guidelines contained in Set1.
> 
> The analysis configuration is stored in automation/eclair_analysis.
> 
> All commits on the xen-project/xen:staging branch will be analyzed
> and their artifacts will be stored indefinitely; the integration will
> report differential information with respect to the previous analysis.
> 
> All commits on other branches or repositories will be analyzed and
> only the last ten artifacts will be kept; the integration will report
> differential information with respect to the analysis done on the common
> ancestor with xen-project/xen:staging (if available).
> 
> Currently the pipeline variable ENABLE_ECLAIR_BOT is set to "n".
> Doing so disables the generation of comments with the analysis summary
> on the commit threads. The variable can be set to "y" if a masked
> variable named ECLAIR_BOT_TOKEN is set with the impersonation token of
> an account with enough privileges to write on all repositories.
> 
> Additionally, any repository should be able to read a masked variable
> named WTOKEN with the token provided by BUGSENG.
> 
> The analysis fails if it contains violations of guidelines tagged as
> clean:added. The list of clean guidelines is maintained in
> automation/eclair_analysis/ECLAIR/tagging.ecl.
> 
> Signed-off-by: Simone Ballarin 

Reviewed-by: Stefano Stabellini 

I committed the series with one change:

> --

This needs to be exactly --- so that git am will remove the content
below on commit. In this case I removed it manually.


> Changes in v4:
> - move link to the analysis results at the end of the console log
> 
> Changes in v3:
> - split definitions of the ECLAIR pipelines in a separate patch;
> - if the WTOKEN variable is missing now the analysis fails immediately.
> 
> Changes in v2:
> - add ECLAIR configuration files (before they were fetched from a separate
> repository);
> - now the pipeline fails if there are new violations of guidelines tagged
> with clean:added.
> ---
>  .gitlab-ci.yml|  2 ++
>  automation/gitlab-ci/analyze.yaml | 38 +++
>  automation/gitlab-ci/build.yaml   |  1 +
>  automation/scripts/eclair | 34 +++
>  4 files changed, 75 insertions(+)
>  create mode 100644 automation/gitlab-ci/analyze.yaml
>  create mode 100755 automation/scripts/eclair
> 
> diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
> index c8bd7519d5..ee5430b8b7 100644
> --- a/.gitlab-ci.yml
> +++ b/.gitlab-ci.yml
> @@ -1,7 +1,9 @@
>  stages:
> +  - analyze
>    - build
>    - test
>  
>  include:
> +  - 'automation/gitlab-ci/analyze.yaml'
>    - 'automation/gitlab-ci/build.yaml'
>    - 'automation/gitlab-ci/test.yaml'
> diff --git a/automation/gitlab-ci/analyze.yaml b/automation/gitlab-ci/analyze.yaml
> new file mode 100644
> index 00..3d8166572b
> --- /dev/null
> +++ b/automation/gitlab-ci/analyze.yaml
> @@ -0,0 +1,38 @@
> +.eclair-analysis:
> +  stage: analyze
> +  tags:
> +    - eclair-analysis
> +  variables:
> +    ECLAIR_OUTPUT_DIR: "ECLAIR_out"
> +    ANALYSIS_KIND: "normal"
> +    ENABLE_ECLAIR_BOT: "n"
> +    AUTO_PR_BRANCH: "staging"
> +    AUTO_PR_REPOSITORY: "xen-project/xen"
> +  artifacts:
> +    when: always
> +    paths:
> +      - "${ECLAIR_OUTPUT_DIR}/*.log"
> +      - "${ECLAIR_OUTPUT_DIR}/*.txt"
> +      - '*.log'
> +    reports:
> +      codequality: gl-code-quality-report.json
> +
> +eclair-x86_64:
> +  extends: .eclair-analysis
> +  variables:
> +    LOGFILE: "eclair-x86_64.log"
> +    VARIANT: "X86_64"
> +    RULESET: "Set1"
> +  script:
> +    - ./automation/scripts/eclair 2>&1 | tee "${LOGFILE}"
> +  allow_failure: true
> +
> +eclair-ARM64:
> +  extends: .eclair-analysis
> +  variables:
> +    LOGFILE: "eclair-ARM64.log"
> +    VARIANT: "ARM64"
> +    RULESET: "Set1"
> +  script:
> +    - ./automation/scripts/eclair 2>&1 | tee "${LOGFILE}"
> +  allow_failure: true
> diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
> index c401f62d61..f01e2c32bb 100644
> --- a/automation/gitlab-ci/build.yaml
> +++ b/automation/gitlab-ci/build.yaml
> @@ -11,6 +11,7 @@
>        - '*.log'
>        - '*/*.log'
>      when: always
> +  needs: []
>    except:
>      - master
>      - smoke
> diff --git a/automation/scripts/eclair b/automation/scripts/eclair
> new file mode 100755
> index 00..813a56eb6a
> --- /dev/null
> +++ b/automation/scripts/eclair
> @@ -0,0 +1,34 @@
> +#!/bin/sh -eu
> +
> +ECLAIR_ANALYSIS_DIR=automation/eclair_analysis
> +ECLAIR_DIR="${ECLAIR_ANALYSIS_DIR}/ECLAIR"
> +ECLAIR_OUTPUT_DIR=$(realpath "${ECLAIR_OUTPUT_DIR}")
> +
> +if [ -z "${WTOKEN:-}" ]; then
> +    echo "Failure: the WTOKEN variable is not defined." >&2
> +    exit 1
> +fi
> +
> +"${ECLAIR_ANALYSIS_DIR}/prepare.sh" "${VARIANT}"
> +
> +ex=0
> +"${ECLAIR_DIR}/analyze.sh" "${VARIANT}" "${RULESET}" || ex=$?
> 

Re: [XEN PATCH v2] xen: use parameter name 'mcs' in arch_do_multicall_call()

2023-07-26 Thread Stefano Stabellini
On Wed, 26 Jul 2023, Federico Serafini wrote:
> Make function declaration and definition consistent using the same
> parameter name ('mcs' to denote a pointer to an 'mc_state').
> This addresses a violation of MISRA C:2012 Rule 8.3: "All declarations
> of an object or function shall use the same names and type qualifiers".
> 
> Signed-off-by: Federico Serafini 

Reviewed-by: Stefano Stabellini 
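
Editorial sketch of what MISRA C:2012 Rule 8.3 asks for, as quoted in the
commit message above: the declaration and the definition of a function use
the same parameter names and type qualifiers. The type and function below
are hypothetical stand-ins, not the Xen multicall code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down state object. */
struct mc_state_model {
    int calls_done;
};

/* Declaration: the parameter is named 'mcs'. */
int step_model(struct mc_state_model *mcs);

/* Definition: same parameter name and qualifiers as the declaration.
 * Naming it 'state' here while the declaration says 'mcs' (or 'mc')
 * would be the kind of mismatch the rule flags. */
int step_model(struct mc_state_model *mcs)
{
    assert(mcs != NULL);
    return ++mcs->calls_done;
}
```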

> ---
> Changes in v2:
>   - added forgotten changes about x86.
> ---
>  xen/arch/arm/traps.c| 4 ++--
>  xen/arch/x86/hypercall.c| 6 +++---
>  xen/include/xen/multicall.h | 2 +-
>  3 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index d1ef787638..eab2fcfc02 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1470,9 +1470,9 @@ static bool check_multicall_32bit_clean(struct 
> multicall_entry *multi)
>  return true;
>  }
>  
> -enum mc_disposition arch_do_multicall_call(struct mc_state *state)
> +enum mc_disposition arch_do_multicall_call(struct mc_state *mcs)
>  {
> -struct multicall_entry *multi = &state->call;
> +struct multicall_entry *multi = &mcs->call;
>  
>  if ( multi->op >= ARRAY_SIZE(hypercall_args) )
>  {
> diff --git a/xen/arch/x86/hypercall.c b/xen/arch/x86/hypercall.c
> index cc9a6a9d59..01cd73040d 100644
> --- a/xen/arch/x86/hypercall.c
> +++ b/xen/arch/x86/hypercall.c
> @@ -203,15 +203,15 @@ int hypercall_xlat_continuation(unsigned int *id, 
> unsigned int nr,
>  return rc;
>  }
>  
> -enum mc_disposition arch_do_multicall_call(struct mc_state *state)
> +enum mc_disposition arch_do_multicall_call(struct mc_state *mcs)
>  {
>  const struct domain *currd = current->domain;
>  
>  if ( is_pv_domain(currd) )
> -return pv_do_multicall_call(state);
> +return pv_do_multicall_call(mcs);
>  
>  if ( is_hvm_domain(currd) )
> -return hvm_do_multicall_call(state);
> +return hvm_do_multicall_call(mcs);
>  
>  return mc_exit;
>  }
> diff --git a/xen/include/xen/multicall.h b/xen/include/xen/multicall.h
> index ac8238660a..5bebeca7b2 100644
> --- a/xen/include/xen/multicall.h
> +++ b/xen/include/xen/multicall.h
> @@ -25,6 +25,6 @@ enum mc_disposition {
>  mc_continue,
>  mc_exit,
>  mc_preempt,
> -} arch_do_multicall_call(struct mc_state *mc);
> +} arch_do_multicall_call(struct mc_state *mcs);
>  
>  #endif /* __XEN_MULTICALL_H__ */
> -- 
> 2.34.1
> 



Re: [XEN PATCH v3] xen/spinlock: mechanically rename parameter name 'debug'

2023-07-26 Thread Stefano Stabellini
On Wed, 26 Jul 2023, Jan Beulich wrote:
> On 26.07.2023 08:42, Nicola Vetrini wrote:
> > On 26/07/23 08:34, Jan Beulich wrote:
> >> On 25.07.2023 22:45, Nicola Vetrini wrote:
> >>> Rule 5.3 has the following headline:
> >>> "An identifier declared in an inner scope shall not hide an
> >>> identifier declared in an outer scope"
> >>>
> >>> To avoid any confusion resulting from the parameter 'debug'
> >>> hiding the homonymous function declared at
> >>> 'xen/arch/x86/include/asm/processor.h:428'
> >>> the rename of parameters s/debug/lkdbg/ is performed.
> >>>
> >>> Signed-off-by: Nicola Vetrini 
> >>> ---
> >>> Changes in v2:
> >>> - s/dbg/lkdbg/
> >>> Changes in v3:
> >>> - Added missing renames for consistency
> >>
> >> Hmm, you asked whether to send v3, but then you didn't wait for an
> >> answer. So to repeat what I said there: I'd prefer if we could first
> >> settle whether to rename the conflicting x86 symbol.
> >>
> > 
> > Stefano replied asking for a v3 [1] before I had a chance to read your 
> > message this morning.
> 
> Right, sorry, I spotted his reply only after seeing the v3.

For what it's worth, I prefer the current implementation over
renaming debug().
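
For readers unfamiliar with Rule 5.3, the shadowing under discussion can be
sketched as follows. This is only an illustration under stated assumptions:
the function bodies and the names lock_check_before/lock_check_after are
hypothetical and are not the Xen spinlock code; only the identifiers 'debug'
and 'lkdbg' come from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a file-scope function called debug(). */
static int debug(void)
{
    return 42;
}

/*
 * Before the rename: the parameter 'debug' hides the function debug()
 * for the whole body, which is what Rule 5.3 flags.  Inside this
 * function the name 'debug' can only mean the bool parameter.
 */
static bool lock_check_before(bool debug)
{
    return debug;
}

/*
 * After the rename: nothing is hidden, so the outer debug() is still
 * reachable and the reader cannot confuse the two.
 */
static int lock_check_after(bool lkdbg)
{
    assert(lkdbg == true || lkdbg == false);

    return lkdbg ? debug() : 0;
}
```

Renaming the function instead of the parameter would remove the violation
equally well; the sketch only shows why one of the two has to change.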



Re: [PATCH v8 11/13] vpci: add initial support for virtual PCI bus topology

2023-07-26 Thread Stewart Hildebrand
On 7/19/23 20:32, Volodymyr Babchuk wrote:
> From: Oleksandr Andrushchenko 
> 
> Assign SBDF to the PCI devices being passed through with bus 0.
> The resulting topology is where PCIe devices reside on the bus 0 of the
> root complex itself (embedded endpoints).
> This implementation is limited to 32 devices which are allowed on
> a single PCI bus.
> 
> Please note, that at the moment only function 0 of a multifunction
> device can be passed through.
> 
> Signed-off-by: Oleksandr Andrushchenko 
> ---
> Since v8:
> - Added write lock in add_virtual_device
> Since v6:
> - re-work wrt new locking scheme
> - OT: add ASSERT(pcidevs_write_locked()); to add_virtual_device()
> Since v5:
> - s/vpci_add_virtual_device/add_virtual_device and make it static
> - call add_virtual_device from vpci_assign_device and do not use
>   REGISTER_VPCI_INIT machinery
> - add pcidevs_locked ASSERT
> - use DECLARE_BITMAP for vpci_dev_assigned_map
> Since v4:
> - moved and re-worked guest sbdf initializers
> - s/set_bit/__set_bit
> - s/clear_bit/__clear_bit
> - minor comment fix s/Virtual/Guest/
> - added VPCI_MAX_VIRT_DEV constant (PCI_SLOT(~0) + 1) which will be used
>   later for counting the number of MMIO handlers required for a guest
>   (Julien)
> Since v3:
>  - make use of VPCI_INIT
>  - moved all new code to vpci.c which belongs to it
>  - changed open-coded 31 to PCI_SLOT(~0)
>  - added comments and code to reject multifunction devices with
>functions other than 0
>  - updated comment about vpci_dev_next and made it unsigned int
>  - implement roll back in case of error while assigning/deassigning devices
>  - s/dom%pd/%pd
> Since v2:
>  - remove casts that are (a) malformed and (b) unnecessary
>  - add new line for better readability
>  - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
> functions are now completely gated with this config
>  - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
> New in v2
> ---
>  xen/drivers/vpci/vpci.c | 72 -
>  xen/include/xen/sched.h |  8 +
>  xen/include/xen/vpci.h  | 11 +++
>  3 files changed, 90 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index ca3505ecb7..baaafe4a2a 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -46,6 +46,16 @@ void vpci_remove_device(struct pci_dev *pdev)
>  return;
> 
>  spin_lock(&pdev->vpci->lock);
> +
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +if ( pdev->vpci->guest_sbdf.sbdf != ~0 )
> +{
> +__clear_bit(pdev->vpci->guest_sbdf.dev,
> +&pdev->domain->vpci_dev_assigned_map);
> +pdev->vpci->guest_sbdf.sbdf = ~0;
> +}
> +#endif
> +
>  while ( !list_empty(&pdev->vpci->handlers) )
>  {
>  struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
> @@ -101,6 +111,10 @@ int vpci_add_handlers(struct pci_dev *pdev)
>  INIT_LIST_HEAD(&pdev->vpci->handlers);
>  spin_lock_init(&pdev->vpci->lock);
> 
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +pdev->vpci->guest_sbdf.sbdf = ~0;
> +#endif
> +
>  for ( i = 0; i < NUM_VPCI_INIT; i++ )
>  {
>  rc = __start_vpci_array[i](pdev);
> @@ -115,6 +129,54 @@ int vpci_add_handlers(struct pci_dev *pdev)
>  }
> 
>  #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +static int add_virtual_device(struct pci_dev *pdev)
> +{
> +struct domain *d = pdev->domain;
> +pci_sbdf_t sbdf = { 0 };
> +unsigned long new_dev_number;
> +
> +if ( is_hardware_domain(d) )
> +return 0;
> +
> +ASSERT(pcidevs_locked());
> +
> +/*
> + * Each PCI bus supports 32 devices/slots at max or up to 256 when
> + * there are multi-function ones which are not yet supported.
> + */
> +if ( pdev->info.is_extfn )
> +{
> +gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
> + &pdev->sbdf);
> +return -EOPNOTSUPP;
> +}
> +
> +write_lock(&pdev->domain->pci_lock);

This should be replaced with an ASSERT, same as the one in vpci_add_handlers() 
above.

The lock is already acquired a few patches before this in the caller in
drivers/passthrough/pci.c:assign_device()

1524 write_lock(&pdev->domain->pci_lock);
1525 rc = vpci_assign_device(pdev);
1526 write_unlock(&pdev->domain->pci_lock);

> +new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map,
> + VPCI_MAX_VIRT_DEV);
> +if ( new_dev_number >= VPCI_MAX_VIRT_DEV )
> +{
> +write_unlock(&pdev->domain->pci_lock);
> +return -ENOSPC;
> +}
> +
> +__set_bit(new_dev_number, &d->vpci_dev_assigned_map);
> +
> +/*
> + * Both segment and bus number are 0:
> + *  - we emulate a single host bridge for the guest, e.g. segment 0
> + *  - with bus 0 the virtual devices are seen as embedded
> + *endpoints behind the root complex
> + *
> + * TODO: add support for multi-function devices.
> + */
> +sbdf.devfn = 

[xen-unstable test] 182012: tolerable FAIL - PUSHED

2023-07-26 Thread osstest service owner
flight 182012 xen-unstable real [real]
flight 182026 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/182012/
http://logs.test-lab.xenproject.org/osstest/logs/182026/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-vhd  13 guest-start fail pass in 182026-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail in 182026 never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail in 182026 never pass
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 181987
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 181987
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 181987
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 181987
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 181987
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 181987
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 181987
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 181987
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 181987
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 181987
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 181987
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 181987
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass

version targeted for testing:
 xen  0b1171be87698bc7d14760383c0770aeb6e41dd4
baseline version:
 xen  0c53c638e16278078371ce028c74693841d7738a

Last test of basis   181987  2023-07-24 01:53:36 Z

Re: [PATCH v4 2/2] xen/riscv: introduce identity mapping

2023-07-26 Thread Oleksii
On Wed, 2023-07-26 at 17:59 +0200, Jan Beulich wrote:
> On 26.07.2023 17:54, Oleksii wrote:
> > On Wed, 2023-07-26 at 17:00 +0200, Jan Beulich wrote:
> > > On 26.07.2023 15:12, Oleksii wrote:
> > > > On Wed, 2023-07-26 at 13:58 +0200, Jan Beulich wrote:
> > > > > On 26.07.2023 13:23, Oleksii wrote:
> > > > > > I would like to ask for advice on whether it would be easier,
> > > > > > less bug-provoking ( during identity mapping to remove of whole
> > > > > > Xen ) to have a separate identity section that won't be more
> > > > > > than PAGE_SIZE.
> > > > > 
> > > > > I'm afraid you can't safely do this in C, or at least not without
> > > > > further checking on what the compiler actually did.
> > > > > 
> > > > > > @@ -264,6 +268,19 @@ void __init enable_mmu(void)
> > > > > >                RV_STAGE1_MODE << SATP_MODE_SHIFT);
> > > > > >  }
> > > > > >  
> > > > > > +void __attribute__((naked)) __section(".ident")
> > > > > > turn_on_mmu(unsigned
> > > > > > long ra)
> > > > > 
> > > > > Did you read what gcc doc says about "naked"? Extended asm() isn't
> > > > > supported there. Since ...
> > > > > 
> > > > > > +{
> > > > > > +    /* Ensure page table writes precede loading the SATP
> > > > > > */
> > > > > > +    sfence_vma();
> > > > > > +
> > > > > > +    /* Enable the MMU and load the new pagetable for Xen
> > > > > > */
> > > > > > +    csr_write(CSR_SATP,
> > > > > > +              PFN_DOWN((unsigned long)stage1_pgtbl_root) |
> > > > > > +              RV_STAGE1_MODE << SATP_MODE_SHIFT);
> > > > > > +
> > > > > > +    asm volatile( "jr %0\n" : : "r"(ra) );
> > > > > > +}
> > > > > 
> > > > > ... none of this really requires C, I think we're at the point
> > > > > where (iirc) Andrew's and my suggestion wants following, moving
> > > > > this to assembly code (at which point it doesn't need to be a
> > > > > separate function). You can still build page tables in C, of
> > > > > course. (Likely you then also won't need a separate section; some
> > > > > minimal alignment guarantees ought to suffice to make sure the
> > > > > critical code is confined to a single page.)
> > > > 
> > > > Thanks. I'll move all of this to assembly code.
> > > > Regarding alignment, is it needed at both the start and the end of
> > > > the function:
> > > >     .balign PAGE_SIZE
> > > >     GLOBAL(turn_on_mmu)
> > > >     ...
> > > >     .balign PAGE_SIZE
> > > >     ENDPROC(turn_on_mmu)
> > > > 
> > > > Is there a better way?
> > > 
> > > The function is only going to be a handful of instructions. Its
> > > alignment doesn't need to be larger than the next power of 2. I
> > > expect you'll be good with 64-byte alignment. (In no case do you
> > > need to align the end of the function: Putting other stuff there
> > > is not a problem at all.) What you want in any event is a build
> > > time check that the within-a-page constraint is met.
> > But shouldn't the address be aligned to a boundary equal to the page
> > size?
> > 
> > According to the RISC-V privileged spec:
> > Any level of PTE may be a leaf PTE, so in addition to 4 KiB pages, Sv39
> > supports 2 MiB megapages and 1 GiB gigapages, each of which must be
> > virtually and physically aligned to a boundary equal to its size. A
> > page-fault exception is raised if the physical address is
> > insufficiently aligned.
> 
> You'd simply map the page containing the chunk, i.e. masking off the
> low 12 bits. If far enough away from the Xen virtual range, you could
> as well map a 2M page masking off the low 21 bits, or a 1G page with
> the low 30 bits of the address cleared.
Agreed, then it will work.

But it still isn't clear what to do if turn_on_mmu becomes bigger than
64 bytes (with ASSERT( (turn_on_mmu_end - turn_on_mmu) <= 64 ) somewhere
in xen.lds.S). Right now the turn_on_mmu() function is 0x22 bytes, so
64-byte alignment is enough to be sure that we don't cross a 4k
boundary. But if the size grows beyond 64 bytes, then the alignment
needs to be changed to 128 (0x80).
Am I right?


~ Oleksii
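
The alignment argument in this exchange can be checked with a small
standalone sketch. It is not Xen code: PAGE_SIZE is assumed to be 4 KiB, and
the helper names (next_pow2, fits_in_one_page, alignment_is_sufficient) are
made up for the illustration. It tries every start address that a given
alignment allows within a page and reports whether a function of the given
size can ever straddle a page boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumption: 4 KiB base pages */

/* Smallest power of two that is >= size. */
static uint32_t next_pow2(uint32_t size)
{
    uint32_t p = 1;

    assert(size > 0 && size <= PAGE_SIZE);

    while ( p < size )
        p <<= 1;

    return p;
}

/* True if the byte range [start, start + size) lies within one page. */
static bool fits_in_one_page(uint32_t start, uint32_t size)
{
    return (start / PAGE_SIZE) == ((start + size - 1) / PAGE_SIZE);
}

/*
 * True if a function of 'size' bytes placed at any 'align'-aligned
 * offset within a page is guaranteed not to cross into the next page.
 */
static bool alignment_is_sufficient(uint32_t size, uint32_t align)
{
    uint32_t start;

    assert(align > 0 && size > 0);

    for ( start = 0; start < PAGE_SIZE; start += align )
        if ( !fits_in_one_page(start, size) )
            return false;

    return true;
}
```

With these helpers, a 0x22-byte function is safe at 64-byte alignment, while
a 65-byte function is not and needs 128. This matches the rule of thumb
above (align to the next power of two of the size), but a link-time check is
still what guards against the function growing unnoticed.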



[PATCH v7 13/15] libxc: Add xc_set_cpufreq_cppc

2023-07-26 Thread Jason Andryuk
Add xc_set_cpufreq_cppc to allow calling xen_sysctl_pm_op
SET_CPUFREQ_CPPC.

Signed-off-by: Jason Andryuk 
Acked-by: Anthony PERARD 
---
v2:
Mark xc_set_hwp_para_t const

v4:
s/hwp/cppc/
Add Anthony's Ack

v5:
Remove const and copy back result
---
 tools/include/xenctrl.h |  4 
 tools/libs/ctrl/xc_pm.c | 23 +++
 2 files changed, 27 insertions(+)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2092632296..52f42fb5b6 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1930,11 +1930,15 @@ struct xc_get_cpufreq_para {
 int32_t turbo_enabled;
 };
 
+typedef struct xen_set_cppc_para xc_set_cppc_para_t;
+
 int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
 struct xc_get_cpufreq_para *user_para);
 int xc_set_cpufreq_gov(xc_interface *xch, int cpuid, char *govname);
 int xc_set_cpufreq_para(xc_interface *xch, int cpuid,
 int ctrl_type, int ctrl_value);
+int xc_set_cpufreq_cppc(xc_interface *xch, int cpuid,
+xc_set_cppc_para_t *set_cppc);
 int xc_get_cpufreq_avgfreq(xc_interface *xch, int cpuid, int *avg_freq);
 
 int xc_set_sched_opt_smt(xc_interface *xch, uint32_t value);
diff --git a/tools/libs/ctrl/xc_pm.c b/tools/libs/ctrl/xc_pm.c
index cea3eab22e..1f267147f6 100644
--- a/tools/libs/ctrl/xc_pm.c
+++ b/tools/libs/ctrl/xc_pm.c
@@ -329,6 +329,29 @@ int xc_set_cpufreq_para(xc_interface *xch, int cpuid,
 return xc_sysctl(xch, &sysctl);
 }
 
+int xc_set_cpufreq_cppc(xc_interface *xch, int cpuid,
+xc_set_cppc_para_t *set_cppc)
+{
+DECLARE_SYSCTL;
+int ret;
+
+if ( !xch )
+{
+errno = EINVAL;
+return -1;
+}
+sysctl.cmd = XEN_SYSCTL_pm_op;
+sysctl.u.pm_op.cmd = SET_CPUFREQ_CPPC;
+sysctl.u.pm_op.cpuid = cpuid;
+sysctl.u.pm_op.u.set_cppc = *set_cppc;
+
+ret = xc_sysctl(xch, &sysctl);
+
+*set_cppc = sysctl.u.pm_op.u.set_cppc;
+
+return ret;
+}
+
 int xc_get_cpufreq_avgfreq(xc_interface *xch, int cpuid, int *avg_freq)
 {
 int ret = 0;
-- 
2.41.0




[PATCH v7 14/15] xenpm: Add set-cpufreq-cppc subcommand

2023-07-26 Thread Jason Andryuk
set-cpufreq-cppc allows setting the Hardware P-State (HWP) parameters.

It can be run on all or just a single cpu.  There are presets of
balance, powersave & performance.  Those can be further tweaked by
param:val arguments as explained in the usage description.

Parameter names are just checked to the first 3 characters to shorten
typing.

Some options are hardware dependent, and ranges can be found in
get-cpufreq-para.

Signed-off-by: Jason Andryuk 
Acked-by: Jan Beulich 
---
v5:
Make i unsigned int
Use local max_cpuid instead of max_cpu_nr
Add Jan's Ack
Check set_params and print a message if activity window wasn't set

v4:
Remove energy bias 0-15 & 7 references
Use MASK_INSR
Fixup { placement
Drop extra case in parse_activity_window
strcmp suffix
Expand help text
s/hwp/cppc/
Use isdigit() to check cpuid - otherwise run on all CPUs.

v2:
Compare provided parameter name and not just 3 characters.
Use "-" in parameter names
Remove hw_
Replace sscanf with strchr & strtoul.
Remove toplevel error message with lower level ones.
Help text s/127/128/
Help text mention truncation.
Avoid some truncation rounding down by adding 5 before division.
Help text mentions default microseconds
Also comment the limit check written to avoid overflow.
---
 tools/misc/xenpm.c | 244 +
 1 file changed, 244 insertions(+)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 3abd99fd20..0877f43946 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -16,6 +16,8 @@
  */
 #define MAX_NR_CPU 512
 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -67,6 +69,30 @@ void show_help(void)
 " set-max-cstate|'unlimited' [|'unlimited']\n"
 " set the C-State limitation 
( >= 0) and\n"
 " optionally the C-sub-state 
limitation ( >= 0)\n"
+" set-cpufreq-cppc  [cpuid] [balance|performance|powersave] 
*\n"
+" set Hardware P-State (HWP) 
parameters\n"
+" on CPU  or all if 
omitted.\n"
+" optionally a preset of one 
of:\n"
+"   
balance|performance|powersave\n"
+" an optional list of 
param:val arguments\n"
+"   minimum:N (0-255)\n"
+"   maximum:N (0-255)\n"
+"   get-cpufreq-para 
lowest/highest\n"
+"   values are limits 
for\n"
+"   minimum/maximum.\n"
+"   desired:N (0-255)\n"
+"   set explicit 
performance target.\n"
+"   non-zero disables 
auto-HWP mode.\n"
+"   energy-perf:N (0-255)\n"
+"   
energy/performance hint\n"
+"   lower - favor 
performance\n"
+"   higher - favor 
powersave\n"
+"   128 - 
balance\n"
+"   act-window:N{,m,u}s range 
1us-1270s\n"
+"   window for internal 
calculations.\n"
+"   units default to 
\"us\" if unspecified.\n"
+"   truncates 
un-representable values.\n"
+"   0 lets the hardware 
decide.\n"
 " start [seconds] start collect Cx/Px 
statistics,\n"
 " output after CTRL-C or 
SIGINT or several seconds.\n"
 " enable-turbo-mode [cpuid]   enable Turbo Mode for 
processors that support it.\n"
@@ -1292,6 +1318,223 @@ void disable_turbo_mode(int argc, char *argv[])
 errno, strerror(errno));
 }
 
+/*
+ * Parse activity_window:NNN{us,ms,s} and validate range.
+ *
+ * Activity window is a 7bit mantissa (0-127) with a 3bit exponent (0-7) base
+ * 10 in microseconds.  So the range is 1 microsecond to 1270 seconds.  A value
+ * of 0 lets the hardware autonomously select the window.
+ *
+ * Return 0 on success
+ *   -1 on error
+ */
+static int parse_activity_window(xc_set_cppc_para_t *set_cppc, unsigned long u,
+ const char *suffix)
+{
+unsigned int exponent = 0;
+unsigned int multiplier = 1;
+
+if ( suffix && 

[PATCH v7 12/15] xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op

2023-07-26 Thread Jason Andryuk
Add SET_CPUFREQ_HWP xen_sysctl_pm_op to set HWP parameters.  The sysctl
supports setting multiple values simultaneously as indicated by the
set_params bits.  This allows atomically applying new HWP configuration
via a single wrmsr.

XEN_SYSCTL_HWP_SET_PRESET_BALANCE/PERFORMANCE/POWERSAVE provide three
common presets.  Setting them depends on hardware limits which the
hypervisor is already caching.  So using them allows skipping a
hypercall to query the limits (lowest/highest) to then set those same
values.  The code is organized to allow a preset to be refined with
additional parameters if desired.

"most_efficient" and "guaranteed" could be additional presets in the
future, but they are not added now.  Those levels can change at runtime,
but we don't have code in place to monitor and update for those events.

Since activity window may not be supported by all hardware, omit writing
it when not supported, and return that fact to userspace by updating
set_params.

CPPC parameter checking disallows setting reserved bytes and ensures
values are only non-zero when the corresponding set_params bit is set.
There is no range checking (0-255 is allowed) since hardware is
documented to clip internally.

Signed-off-by: Jason Andryuk 
Reviewed-by: Jan Beulich 
---
v7:
Fix brace placement
Add Jan's R-b

v6:
Add () around bit test
Add U suffix to XEN_SYSCTL_ constants
Change -EINVAL returns to -ENOENT/-EOPNOTSUPP
Return success for no-op set_hwp_para()

v5:
Use cpufreq.h for include
Add () around bit test
Fix Configure typo
Drop duplicated energy_perf comment
Define XEN_SYSCTL_CPPC_ACT_WINDOW_MASK from *_MANTISSA_MASK and *_EXPONENT_MASK
Re-order XEN_SYSCTL_CPPC_SET_* flags to match field and hardware order.
Remove const from set_cppc param to update set_params
Skip Activity Window if not supported by hardware and clear set_params
Make parameter parsing consistent
Add an exit path when there are no parameters to write.
Expand the header file to cover the IN/OUT set_params.
Remove the "desired" lowest/highest checking as hardware clips internally

v4:
Remove IA32_ENERGY_BIAS support
Validate parameters don't exceed 255
Use CPPC/cppc name
set_cppc_para() add const
set_cppc_para() return hwp_cpufreq_target()
Expand sysctl comments

v3:
Remove cpufreq_governor_internal from set_cpufreq_hwp

v2:
Update for naming anonymous union
Drop hwp_err for invalid input in set_hwp_para()
Drop uint16_t cast in XEN_SYSCTL_HWP_SET_PARAM_MASK
Drop parens for HWP_SET_PRESET defines
Reference activity_window format comment
Place SET_CPUFREQ_HWP after SET_CPUFREQ_PARA
Add {HWP,IA32}_ENERGY_PERF_MAX_{PERFORMANCE,POWERSAVE} defines
Order defines before fields in sysctl.h
Use XEN_HWP_GOVERNOR
Use per_cpu for hwp_drv_data
---
 xen/arch/x86/acpi/cpufreq/hwp.c| 102 +
 xen/drivers/acpi/pmstat.c  |  17 +
 xen/include/acpi/cpufreq/cpufreq.h |   2 +
 xen/include/public/sysctl.h|  64 ++
 4 files changed, 185 insertions(+)

diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index 2cbfa1ce33..f6d445a8eb 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -21,6 +21,10 @@ static bool __ro_after_init feature_hdc;
 
 static bool __ro_after_init opt_cpufreq_hdc = true;
 
+#define HWP_ENERGY_PERF_MAX_PERFORMANCE 0
+#define HWP_ENERGY_PERF_BALANCE 0x80
+#define HWP_ENERGY_PERF_MAX_POWERSAVE   0xff
+
 union hwp_request
 {
 struct
@@ -542,6 +546,104 @@ int get_hwp_para(unsigned int cpu,
 return 0;
 }
 
+int set_hwp_para(struct cpufreq_policy *policy,
+ struct xen_set_cppc_para *set_cppc)
+{
+unsigned int cpu = policy->cpu;
+struct hwp_drv_data *data = per_cpu(hwp_drv_data, cpu);
+bool cleared_act_window = false;
+
+if ( data == NULL )
+return -ENOENT;
+
+/* Validate all parameters - Disallow reserved bits. */
+if ( set_cppc->minimum > 255 ||
+ set_cppc->maximum > 255 ||
+ set_cppc->desired > 255 ||
+ set_cppc->energy_perf > 255 ||
+ (set_cppc->set_params & ~XEN_SYSCTL_CPPC_SET_PARAM_MASK) ||
+ (set_cppc->activity_window & ~XEN_SYSCTL_CPPC_ACT_WINDOW_MASK) )
+return -EINVAL;
+
+/* Only allow values if params bit is set. */
+if ( (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED) &&
+  set_cppc->desired) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM) &&
+  set_cppc->minimum) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM) &&
+  set_cppc->maximum) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ENERGY_PERF) &&
+  set_cppc->energy_perf) ||
+ (!(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ACT_WINDOW) &&
+  set_cppc->activity_window) )
+return -EINVAL;
+
+/* Clear out activity window if lacking HW supported. */
+if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ACT_WINDOW) &&
+ 

[PATCH v7 15/15] CHANGELOG: Add Intel HWP entry

2023-07-26 Thread Jason Andryuk
Signed-off-by: Jason Andryuk 
Acked-by: Henry Wang 
---
v3:
Position under existing Added section
Add Henry's Ack

v2:
Add blank line
---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7d7e0590f8..8d6e6c3088 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -24,7 +24,7 @@ The format is based on [Keep a 
Changelog](https://keepachangelog.com/en/1.0.0/)
  - xl/libxl can customize SMBIOS strings for HVM guests.
  - Add support for AVX512-FP16 on x86.
  - On Arm, Xen supports guests running SVE/SVE2 instructions. (Tech Preview)
-
+ - Add Intel Hardware P-States (HWP) cpufreq driver.
 
 ## 
[4.17.0](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.17.0) 
- 2022-12-12
 
-- 
2.41.0




[PATCH v7 11/15] xenpm: Print HWP/CPPC parameters

2023-07-26 Thread Jason Andryuk
Print HWP-specific parameters.  Some are always present, but others
depend on hardware support.

Signed-off-by: Jason Andryuk 
Reviewed-by: Jan Beulich 
---
v2:
Style fixes
Declare i outside loop
Replace repeated hardware/configured limits with spaces
Fixup for hw_ removal
Use XEN_HWP_GOVERNOR
Use HWP_ACT_WINDOW_EXPONENT_*
Remove energy_perf hw autonomous - 0 doesn't mean autonomous

v4:
Return activity_window from calculate_hwp_activity_window
Use blanks instead of _ in output
Use MASK_EXTR
Check XEN_HWP_DRIVER name since governor is no longer returned
s/hwp/cppc

v5:
Add Jan's Reviewed-by
---
 tools/misc/xenpm.c | 66 ++
 1 file changed, 66 insertions(+)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 21c93386de..3abd99fd20 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -708,6 +708,46 @@ void start_gather_func(int argc, char *argv[])
 pause();
 }
 
+static unsigned int calculate_activity_window(const xc_cppc_para_t *cppc,
+  const char **units)
+{
+unsigned int mantissa = MASK_EXTR(cppc->activity_window,
+  XEN_CPPC_ACT_WINDOW_MANTISSA_MASK);
+unsigned int exponent = MASK_EXTR(cppc->activity_window,
+  XEN_CPPC_ACT_WINDOW_EXPONENT_MASK);
+unsigned int multiplier = 1;
+unsigned int i;
+
+/*
+ * SDM only states a 0 register is hardware selected, and doesn't mention
+ * a 0 mantissa with a non-0 exponent.  Only special case a 0 register.
+ */
+if ( cppc->activity_window == 0 )
+{
+*units = "hardware selected";
+
+return 0;
+}
+
+if ( exponent >= 6 )
+{
+*units = "s";
+exponent -= 6;
+}
+else if ( exponent >= 3 )
+{
+*units = "ms";
+exponent -= 3;
+}
+else
+*units = "us";
+
+for ( i = 0; i < exponent; i++ )
+multiplier *= 10;
+
+return mantissa * multiplier;
+}
+
 /* print out parameters about cpu frequency */
 static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para 
*p_cpufreq)
 {
@@ -772,6 +812,32 @@ static void print_cpufreq_para(int cpuid, struct 
xc_get_cpufreq_para *p_cpufreq)
p_cpufreq->u.s.scaling_min_freq,
p_cpufreq->u.s.scaling_cur_freq);
 }
+else
+{
+const xc_cppc_para_t *cppc = &p_cpufreq->u.cppc_para;
+
+printf("cppc variables   :\n");
+printf("  hardware limits: lowest [%u] lowest nonlinear [%u]\n",
+   cppc->lowest, cppc->lowest_nonlinear);
+printf(" : nominal [%u] highest [%u]\n",
+   cppc->nominal, cppc->highest);
+printf("  configured limits  : min [%u] max [%u] energy perf [%u]\n",
+   cppc->minimum, cppc->maximum, cppc->energy_perf);
+
+if ( cppc->features & XEN_SYSCTL_CPPC_FEAT_ACT_WINDOW )
+{
+unsigned int activity_window;
+const char *units;
+
+activity_window = calculate_activity_window(cppc, &units);
+printf(" : activity_window [%u %s]\n",
+   activity_window, units);
+}
+
+printf(" : desired [%u%s]\n",
+   cppc->desired,
+   cppc->desired ? "" : " hw autonomous");
+}
 
 printf("turbo mode   : %s\n",
p_cpufreq->turbo_enabled ? "enabled" : "disabled or n/a");
-- 
2.41.0
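
The activity window decoding in the patch above can be tried in isolation
with the sketch below. It follows the same steps as
calculate_activity_window(), but the mask values and the function name
decode_activity_window are assumptions made for this example (a 7-bit
mantissa in bits 0-6 and a 3-bit base-10 exponent in bits 7-9); they are not
copied from the Xen public headers, which define their own
XEN_CPPC_ACT_WINDOW_* masks.

```c
#include <assert.h>

/* Assumed field layout, for illustration only. */
#define ACT_WINDOW_MANTISSA_MASK  0x07fu /* bits 0-6 */
#define ACT_WINDOW_EXPONENT_MASK  0x380u /* bits 7-9 */
#define ACT_WINDOW_EXPONENT_SHIFT 7

/*
 * Decode a raw activity window into a value plus a unit string.
 * The encoded time is mantissa * 10^exponent microseconds; the exponent
 * is reduced by 3 or 6 so the result can be reported in ms or s.
 */
static unsigned int decode_activity_window(unsigned int raw,
                                           const char **units)
{
    unsigned int mantissa = raw & ACT_WINDOW_MANTISSA_MASK;
    unsigned int exponent = (raw & ACT_WINDOW_EXPONENT_MASK) >>
                            ACT_WINDOW_EXPONENT_SHIFT;
    unsigned int multiplier = 1;
    unsigned int i;

    assert(units);

    /* Only an all-zero field means "let the hardware choose". */
    if ( raw == 0 )
    {
        *units = "hardware selected";
        return 0;
    }

    if ( exponent >= 6 )
    {
        *units = "s";
        exponent -= 6;
    }
    else if ( exponent >= 3 )
    {
        *units = "ms";
        exponent -= 3;
    }
    else
        *units = "us";

    for ( i = 0; i < exponent; i++ )
        multiplier *= 10;

    return mantissa * multiplier;
}
```

Under these assumptions the largest encodable window is mantissa 127 with
exponent 7, which decodes to 1270 s, in line with the "1us-1270s" range
quoted in the set-cpufreq-cppc help text.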




[PATCH v7 10/15] libxc: Include cppc_para in definitions

2023-07-26 Thread Jason Andryuk
Expose the cppc_para fields through libxc.

Signed-off-by: Jason Andryuk 
Acked-by: Anthony PERARD 
---
v4:
Rename hwp to cppc
Add Anthony's Ack
---
 tools/include/xenctrl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 8aedb952a0..2092632296 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1892,6 +1892,7 @@ int xc_smt_disable(xc_interface *xch);
  */
 typedef struct xen_userspace xc_userspace_t;
 typedef struct xen_ondemand xc_ondemand_t;
+typedef struct xen_cppc_para xc_cppc_para_t;
 
 struct xc_get_cpufreq_para {
 /* IN/OUT variable */
@@ -1923,6 +1924,7 @@ struct xc_get_cpufreq_para {
 xc_ondemand_t ondemand;
 } u;
 } s;
+xc_cppc_para_t cppc_para;
 } u;
 
 int32_t turbo_enabled;
-- 
2.41.0




[PATCH v7 09/15] cpufreq: Export HWP parameters to userspace as CPPC

2023-07-26 Thread Jason Andryuk
Extend xen_get_cpufreq_para to return hwp parameters.  HWP is an
implementation of ACPI CPPC (Collaborative Processor Performance
Control).  Use the CPPC name since that might be useful in the future
for AMD P-state.

We need the features bitmask to indicate fields supported by the actual
hardware - this only applies to activity window for the time being.

The HWP most_efficient is mapped to CPPC lowest_nonlinear, and guaranteed is
mapped to nominal.  CPPC has a guaranteed that is optional while nominal
is required.  ACPI spec says "If this register is not implemented, OSPM
assumes guaranteed performance is always equal to nominal performance."

Signed-off-by: Jason Andryuk 
Acked-by: Jan Beulich 
---
v2:
Style fixes
Don't bump XEN_SYSCTL_INTERFACE_VERSION
Drop cpufreq.h comment divider
Expand xen_hwp_para comment
Add HWP activity window mantissa/exponent defines
Handle union rename
Add const to get_hwp_para
Remove hw_ prefix from xen_hwp_para members
Use XEN_HWP_GOVERNOR
Use per_cpu for hwp_drv_data

v4:
Fixup for opt_cpufreq_hwp/hdc removal
get_hwp_para() takes cpu as arg
XEN_ prefix HWP_ACT_WINDOW_*
Drop HWP_ACT_WINDOW_EXPONENT_SHIFT - shift MASK
Remove Energy Bias (0-15) EPP fallback
Rename xen_hwp_para to xen_cppc_para
s/hwp/cppc/
Use scaling driver to switch output

v5:
Use XEN_HWP_DRIVER_NAME
Use cpufreq.h for declarations
Fixup some comments
Drop const from unsigned int cpu
Drop some unnecessary { }
Use strncmp
Switch sizeof(char) to sizeof(*scaling_available_governors)
Reindent copy_to_guest call
Add "HWP: " prefix to sysctl comments for cppc->hwp mapping

v6:
Drop commit message uint8_t paragraph
sysctl.h s/They/They're/
Fix indent and add parens for read_scaling_available_governors() args
Add Jan's Ack
---
 xen/arch/x86/acpi/cpufreq/hwp.c| 23 ++
 xen/drivers/acpi/pmstat.c  | 74 --
 xen/include/acpi/cpufreq/cpufreq.h |  2 +
 xen/include/public/sysctl.h| 56 ++
 4 files changed, 121 insertions(+), 34 deletions(-)

diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index 33039ad39e..2cbfa1ce33 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -519,6 +519,29 @@ static const struct cpufreq_driver __initconstrel 
hwp_cpufreq_driver =
 .update = hwp_cpufreq_update,
 };
 
+int get_hwp_para(unsigned int cpu,
+ struct xen_cppc_para *cppc_para)
+{
+const struct hwp_drv_data *data = per_cpu(hwp_drv_data, cpu);
+
+if ( data == NULL )
+return -ENODATA;
+
+cppc_para->features =
+(feature_hwp_activity_window ? XEN_SYSCTL_CPPC_FEAT_ACT_WINDOW : 0);
+cppc_para->lowest   = data->hw.lowest;
+cppc_para->lowest_nonlinear = data->hw.most_efficient;
+cppc_para->nominal  = data->hw.guaranteed;
+cppc_para->highest  = data->hw.highest;
+cppc_para->minimum  = data->minimum;
+cppc_para->maximum  = data->maximum;
+cppc_para->desired  = data->desired;
+cppc_para->energy_perf  = data->energy_perf;
+cppc_para->activity_window  = data->activity_window;
+
+return 0;
+}
+
 int __init hwp_register_driver(void)
 {
 int ret;
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index d67d99e62f..e4e2df94a7 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -251,46 +251,52 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
 else
 strlcpy(op->u.get_para.scaling_driver, "Unknown", CPUFREQ_NAME_LEN);
 
-if ( !(scaling_available_governors =
-   xzalloc_array(char, gov_num * CPUFREQ_NAME_LEN)) )
-return -ENOMEM;
-if ( (ret = read_scaling_available_governors(scaling_available_governors,
-gov_num * CPUFREQ_NAME_LEN * sizeof(char))) )
+if ( !strncmp(op->u.get_para.scaling_driver, XEN_HWP_DRIVER_NAME,
+  CPUFREQ_NAME_LEN) )
+ret = get_hwp_para(policy->cpu, &op->u.get_para.u.cppc_para);
+else
 {
+if ( !(scaling_available_governors =
+   xzalloc_array(char, gov_num * CPUFREQ_NAME_LEN)) )
+return -ENOMEM;
+if ( (ret = read_scaling_available_governors(
+scaling_available_governors,
+(gov_num * CPUFREQ_NAME_LEN *
+ sizeof(*scaling_available_governors)))) )
+{
+xfree(scaling_available_governors);
+return ret;
+}
+ret = copy_to_guest(op->u.get_para.scaling_available_governors,
+scaling_available_governors,
+gov_num * CPUFREQ_NAME_LEN);
 xfree(scaling_available_governors);
-return ret;
-}
-ret = copy_to_guest(op->u.get_para.scaling_available_governors,
-scaling_available_governors, gov_num * CPUFREQ_NAME_LEN);
-xfree(scaling_available_governors);
-if ( ret )
-return 

[PATCH v7 08/15] xenpm: Change get-cpufreq-para output for hwp

2023-07-26 Thread Jason Andryuk
When using HWP, some of the returned data is not applicable.  In that
case, we should just omit it to avoid confusing the user.  So switch to
printing the base and max frequencies since those are relevant to HWP.
Similarly, stop printing the CPU frequencies since those do not apply.
The scaling fields are also no longer printed.

Signed-off-by: Jason Andryuk 
Acked-by: Jan Beulich 
---
v5:
Use XEN_HWP_DRIVER_NAME
Add Jan's Ack

v4:
s/turbo/max/
Check for XEN_HWP_DRIVER driver instead of "-internal"

v2:
Use full governor name XEN_HWP_GOVERNOR to change output
Style fixes
---
 tools/misc/xenpm.c | 83 +-
 1 file changed, 46 insertions(+), 37 deletions(-)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 1c474c3b59..21c93386de 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -711,6 +711,7 @@ void start_gather_func(int argc, char *argv[])
 /* print out parameters about cpu frequency */
 static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para 
*p_cpufreq)
 {
+bool hwp = strcmp(p_cpufreq->scaling_driver, XEN_HWP_DRIVER_NAME) == 0;
 int i;
 
 printf("cpu id   : %d\n", cpuid);
@@ -720,49 +721,57 @@ static void print_cpufreq_para(int cpuid, struct 
xc_get_cpufreq_para *p_cpufreq)
 printf(" %d", p_cpufreq->affected_cpus[i]);
 printf("\n");
 
-printf("cpuinfo frequency: max [%u] min [%u] cur [%u]\n",
-   p_cpufreq->cpuinfo_max_freq,
-   p_cpufreq->cpuinfo_min_freq,
-   p_cpufreq->cpuinfo_cur_freq);
+if ( hwp )
+printf("cpuinfo frequency: base [%u] max [%u]\n",
+   p_cpufreq->cpuinfo_min_freq,
+   p_cpufreq->cpuinfo_max_freq);
+else
+printf("cpuinfo frequency: max [%u] min [%u] cur [%u]\n",
+   p_cpufreq->cpuinfo_max_freq,
+   p_cpufreq->cpuinfo_min_freq,
+   p_cpufreq->cpuinfo_cur_freq);
 
 printf("scaling_driver   : %s\n", p_cpufreq->scaling_driver);
 
-printf("scaling_avail_gov: %s\n",
-   p_cpufreq->scaling_available_governors);
-
-printf("current_governor : %s\n", p_cpufreq->u.s.scaling_governor);
-if ( !strncmp(p_cpufreq->u.s.scaling_governor,
-  "userspace", CPUFREQ_NAME_LEN) )
-{
-printf("  userspace specific :\n");
-printf("scaling_setspeed : %u\n",
-   p_cpufreq->u.s.u.userspace.scaling_setspeed);
-}
-else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
-   "ondemand", CPUFREQ_NAME_LEN) )
+if ( !hwp )
 {
-printf("  ondemand specific  :\n");
-printf("sampling_rate: max [%u] min [%u] cur [%u]\n",
-   p_cpufreq->u.s.u.ondemand.sampling_rate_max,
-   p_cpufreq->u.s.u.ondemand.sampling_rate_min,
-   p_cpufreq->u.s.u.ondemand.sampling_rate);
-printf("up_threshold : %u\n",
-   p_cpufreq->u.s.u.ondemand.up_threshold);
-}
+printf("scaling_avail_gov: %s\n",
+   p_cpufreq->scaling_available_governors);
 
-printf("scaling_avail_freq   :");
-for ( i = 0; i < p_cpufreq->freq_num; i++ )
-if ( p_cpufreq->scaling_available_frequencies[i] ==
- p_cpufreq->u.s.scaling_cur_freq )
-printf(" *%d", p_cpufreq->scaling_available_frequencies[i]);
-else
-printf(" %d", p_cpufreq->scaling_available_frequencies[i]);
-printf("\n");
+printf("current_governor : %s\n", p_cpufreq->u.s.scaling_governor);
+if ( !strncmp(p_cpufreq->u.s.scaling_governor,
+  "userspace", CPUFREQ_NAME_LEN) )
+{
+printf("  userspace specific :\n");
+printf("scaling_setspeed : %u\n",
+   p_cpufreq->u.s.u.userspace.scaling_setspeed);
+}
+else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
+   "ondemand", CPUFREQ_NAME_LEN) )
+{
+printf("  ondemand specific  :\n");
+printf("sampling_rate: max [%u] min [%u] cur [%u]\n",
+   p_cpufreq->u.s.u.ondemand.sampling_rate_max,
+   p_cpufreq->u.s.u.ondemand.sampling_rate_min,
+   p_cpufreq->u.s.u.ondemand.sampling_rate);
+printf("up_threshold : %u\n",
+   p_cpufreq->u.s.u.ondemand.up_threshold);
+}
+
+printf("scaling_avail_freq   :");
+for ( i = 0; i < p_cpufreq->freq_num; i++ )
+if ( p_cpufreq->scaling_available_frequencies[i] ==
+ p_cpufreq->u.s.scaling_cur_freq )
+printf(" *%d", p_cpufreq->scaling_available_frequencies[i]);
+else
+printf(" %d", p_cpufreq->scaling_available_frequencies[i]);
+printf("\n");
 
-printf("scaling frequency: max [%u] min [%u] cur [%u]\n",
-   p_cpufreq->u.s.scaling_max_freq,
-   

[PATCH v7 07/15] xen/x86: Tweak PDC bits when using HWP

2023-07-26 Thread Jason Andryuk
Qubes testing of HWP support had a report of a laptop, Thinkpad X1
Carbon Gen 4 with a Skylake processor, locking up during boot when HWP
is enabled.  A user found a kernel bug that seems to be the same issue:
https://bugzilla.kernel.org/show_bug.cgi?id=110941.

That bug was fixed by Linux commit a21211672c9a ("ACPI / processor:
Request native thermal interrupt handling via _OSC").  The tl;dr is SMM
crashes when it receives thermal interrupts, so Linux calls the ACPI
_OSC method to take over interrupt handling.

The Linux fix looks at the CPU features to decide whether or not to call
_OSC with bit 12 set to take over native interrupt handling.  Xen needs
some way to communicate HWP to Dom0 for making an equivalent call.

Xen exposes modified PDC bits via the platform_op set_pminfo hypercall.
Expand that to set bit 12 when HWP is present and in use.
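
As a rough illustration of the bit manipulation described above, here is a
minimal, self-contained sketch.  The value of ACPI_PDC_CPPC_NATIVE_INTR is
assumed to be bit 12, as the text says.  The helper set_pdc_hwp_bit() and
its hwp_in_use parameter are hypothetical stand-ins for illustration only,
not the actual Xen code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit 12, per the commit message. */
#define ACPI_PDC_CPPC_NATIVE_INTR (UINT32_C(1) << 12)

/*
 * Hypothetical helper: set the native-interrupt capability bit in the
 * third word of a _PDC buffer, but only when HWP is in use.
 */
void set_pdc_hwp_bit(uint32_t *pdc, bool hwp_in_use)
{
    assert(pdc != NULL);

    if ( hwp_in_use )
        pdc[2] |= ACPI_PDC_CPPC_NATIVE_INTR;
}
```

With a zeroed capabilities word, calling the helper with hwp_in_use set
leaves pdc[2] equal to 0x1000.  Other bits already present are preserved.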

Any generated interrupt would be handled by Xen's thermal driver, which
clears the status.

Bit 12 isn't named in the Linux header and is open coded in Linux's
usage.  Name it ACPI_PDC_CPPC_NATIVE_INTR.

This will need a corresponding Linux patch to pick up and apply the PDC
bits.

Signed-off-by: Jason Andryuk 
Reviewed-by: Jan Beulich 
---
v5:
Minor fixup for feature_hwp removal
Use cpufreq.h for declaration

v4:
Added __ro_after_init
s/ACPI_PDC_CPPC_NTV_INT/ACPI_PDC_CPPC_NATIVE_INTR/
Remove _IA32_
Fixup for opt_cpufreq_hwp removal
Add Jan Reviewed-by

v3:
New
---
 xen/arch/x86/acpi/cpufreq/hwp.c  | 9 +
 xen/arch/x86/acpi/lib.c  | 5 +
 xen/arch/x86/cpu/mcheck/mce_intel.c  | 6 ++
 xen/arch/x86/include/asm/msr-index.h | 1 +
 xen/include/acpi/cpufreq/cpufreq.h   | 1 +
 xen/include/acpi/pdc_intel.h | 1 +
 6 files changed, 23 insertions(+)

diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index e9a6d23e38..33039ad39e 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -12,6 +12,8 @@
 #include 
 #include 
 
+static bool __ro_after_init hwp_in_use;
+
 static bool __ro_after_init feature_hwp_notification;
 static bool __ro_after_init feature_hwp_activity_window;
 
@@ -152,6 +154,11 @@ static int __init cf_check cpufreq_gov_hwp_init(void)
 }
 __initcall(cpufreq_gov_hwp_init);
 
+bool hwp_active(void)
+{
+return hwp_in_use;
+}
+
 static bool __init hwp_available(void)
 {
 unsigned int eax;
@@ -204,6 +211,8 @@ static bool __init hwp_available(void)
 hwp_verbose("HW_FEEDBACK %ssupported\n",
 (eax & CPUID6_EAX_HW_FEEDBACK) ? "" : "not ");
 
+hwp_in_use = true;
+
 hwp_info("Using HWP for cpufreq\n");
 
 return true;
diff --git a/xen/arch/x86/acpi/lib.c b/xen/arch/x86/acpi/lib.c
index 43831b92d1..51cb082ca0 100644
--- a/xen/arch/x86/acpi/lib.c
+++ b/xen/arch/x86/acpi/lib.c
@@ -26,6 +26,8 @@
 #include 
 #include 
 
+#include 
+
 u32 __read_mostly acpi_smi_cmd;
 u8 __read_mostly acpi_enable_value;
 u8 __read_mostly acpi_disable_value;
@@ -140,5 +142,8 @@ int arch_acpi_set_pdc_bits(u32 acpi_id, u32 *pdc, u32 mask)
!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
pdc[2] &= ~(ACPI_PDC_C_C1_FFH | ACPI_PDC_C_C2C3_FFH);
 
+   if (hwp_active())
+   pdc[2] |= ACPI_PDC_CPPC_NATIVE_INTR;
+
return 0;
 }
diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c 
b/xen/arch/x86/cpu/mcheck/mce_intel.c
index 2f23f02923..4045c6591d 100644
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -15,6 +15,9 @@
 #include 
 #include 
 #include 
+
+#include 
+
 #include "mce.h"
 #include "x86_mca.h"
 #include "barrier.h"
@@ -64,6 +67,9 @@ static void cf_check intel_thermal_interrupt(struct 
cpu_user_regs *regs)
 
 ack_APIC_irq();
 
+if ( hwp_active() )
+wrmsr_safe(MSR_HWP_STATUS, 0);
+
 if ( NOW() < per_cpu(next, cpu) )
 return;
 
diff --git a/xen/arch/x86/include/asm/msr-index.h 
b/xen/arch/x86/include/asm/msr-index.h
index 56f93cb8e2..f2410fc8c8 100644
--- a/xen/arch/x86/include/asm/msr-index.h
+++ b/xen/arch/x86/include/asm/msr-index.h
@@ -157,6 +157,7 @@
 #define MSR_HWP_CAPABILITIES0x0771
 #define MSR_HWP_INTERRUPT   0x0773
 #define MSR_HWP_REQUEST 0x0774
+#define MSR_HWP_STATUS  0x0777
 
 #define MSR_X2APIC_FIRST0x0800
 #define MSR_X2APIC_LAST 0x08ff
diff --git a/xen/include/acpi/cpufreq/cpufreq.h 
b/xen/include/acpi/cpufreq/cpufreq.h
index cfe975814e..6737798e25 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -254,5 +254,6 @@ void intel_feature_detect(struct cpufreq_policy *policy);
 
 int hwp_cmdline_parse(const char *s, const char *e);
 int hwp_register_driver(void);
+bool hwp_active(void);
 
 #endif /* __XEN_CPUFREQ_PM_H__ */
diff --git a/xen/include/acpi/pdc_intel.h b/xen/include/acpi/pdc_intel.h
index 4fb719d6f5..abaa098b51 100644
--- a/xen/include/acpi/pdc_intel.h
+++ 

[PATCH v7 06/15] cpufreq: Add Hardware P-State (HWP) driver

2023-07-26 Thread Jason Andryuk
From the Intel SDM: "Hardware-Controlled Performance States (HWP), which
autonomously selects performance states while utilizing OS supplied
performance guidance hints."

Enable HWP to run in autonomous mode by poking the correct MSRs.  HWP is
disabled by default, and cpufreq=hwp enables it.

cpufreq= parsing is expanded to allow cpufreq=hwp;xen.  This allows
trying HWP and falling back to xen if not available.  Only hwp and xen
are supported for this fallback feature.  hdc is a sub-option under hwp
(i.e.  cpufreq=hwp,hdc=0) as is verbose.
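
The fallback behaviour described above can be sketched roughly as follows.
This is only an illustration of walking a "hwp;xen" style list from left to
right.  The enum, the pick_cpufreq() function and the hwp_available flag
are hypothetical names, and this is not how Xen's actual option parser is
written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum cpufreq_choice { CHOICE_NONE, CHOICE_HWP, CHOICE_XEN };

/*
 * Hypothetical sketch: entries are separated by ';' and tried in order.
 * Sub-options following ',' or ':' are skipped here, not interpreted.
 */
enum cpufreq_choice pick_cpufreq(const char *arg, bool hwp_available)
{
    assert(arg != NULL);

    while ( *arg )
    {
        size_t len = strcspn(arg, ";");        /* whole entry */
        size_t name_len = strcspn(arg, ",:;"); /* bare driver name */

        if ( name_len == 3 && !strncmp(arg, "hwp", 3) && hwp_available )
            return CHOICE_HWP;
        if ( name_len == 3 && !strncmp(arg, "xen", 3) )
            return CHOICE_XEN;

        arg += len;
        if ( *arg == ';' )
            arg++;
    }

    return CHOICE_NONE;
}
```

In this sketch "hwp;xen" selects HWP when the hardware supports it and xen
otherwise, while a plain "hwp" on unsupported hardware selects nothing.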

There is no interface to configure - xen_sysctl_pm_op/xenpm will
be extended to configure in subsequent patches.  It will run with the
default values, which should be the default 0x80 (out of 0x0-0xff)
energy/performance preference.

Unscientific powertop measurement of a mostly idle, customized OpenXT
install:
A 10th gen 6-core laptop showed battery discharge drop from ~9.x to
~7.x watts.
An 8th gen 4-core laptop dropped from ~10 to ~9 watts.

Power usage depends on many factors, especially display brightness, but
this does show a power saving in balanced mode when CPU utilization is
low.

HWP isn't compatible with an external governor - it doesn't take
explicit frequency requests.  Therefore a minimal internal governor,
hwp, is also added as a placeholder.

While adding to the xen-command-line.pandoc entry, un-nest verbose from
minfreq.  They are independent.

With cpufreq=hwp,verbose, HWP prints processor capabilities that are not
used by the code, like HW_FEEDBACK.  This is done because otherwise
there isn't a convenient way to query the information.

Xen doesn't use the HWP interrupt, so it is disabled like in the Linux
pstate driver.

Signed-off-by: Jason Andryuk 
---
We disable on cpuid_level < 0x16.  cpuid(0x16) is used to get the cpu
frequencies for calculating the APERF/MPERF.  Without it, things would
still work, but the average cpu frequency output would be wrong.

My 8th & 10th gen test systems both report:
(XEN) HWP: 1 notify: 1 act_window: 1 energy_perf: 1 pkg_level: 0 peci: 0
(XEN) HWP: Hardware Duty Cycling (HDC) supported
(XEN) HWP: HW_FEEDBACK not supported

We can't use parse_boolean() since it requires a single name=val string
and cpufreq_handle_common_option is provided two strings.  Use
parse_bool() and manually handle no-hwp.

FAST_IA32_HWP_REQUEST was removed in v2.  The check in v1 was wrong,
it's a model specific feature and the CPUID bit is only available
after enabling via the MSR.  Support was untested since I don't have
hardware with the feature.  Writes are expected to be infrequent, so
just leave it out.

---
v2:
Alphabetize headers
Re-work driver registration
name hwp_drv_data anonymous union "hw"
Drop hwp_verbose_cont
style cleanups
Condense hwp_governor switch
hwp_cpufreq_target remove .raw from hwp_req assignment
Use typed-pointer in a few functions
Pass type to xzalloc
Add HWP_ENERGY_PERF_BALANCE/IA32_ENERGY_BIAS_BALANCE defines
Add XEN_HWP_GOVERNOR define for "hwp-internal"
Capitalize CPUID and MSR defines
Change '_' to '-' for energy-perf & act-window
Read-modify-write MSRs updates
Use FAST_IA32_HWP_REQUEST_MSR_ENABLE define
constify pointer in hwp_set_misc_turbo
Add space after non-fallthrough break in governor switch
Add IA32_ENERGY_BIAS_MASK define
Check CPUID_PM_LEAK for energy bias when needed
Fail initialization with curr_req = -1
Fold hwp_read_capabilities into hwp_init_msrs
Add command line cpufreq=xen:hwp
Add command line cpufreq=xen:hdc
Use per_cpu for hwp_drv_data pointers
Move hwp_energy_perf_bias call into hwp_write_request
energy_perf 0 is valid, so hwp_energy_perf_bias cannot be skipped
Ensure we don't generate interrupts
Remove Fast Write of Uncore MSR
Initialize hwp_drv_data from curr_req
Use SPDX line instead of license text in hwp.c

v3:
Add cf_check to cpufreq_gov_hwp_init() - Marek
Print cpuid_level with %#x - Marek

v4:
Use BIT() for CPUID and MSR bits
Move __initdata after type
Add __ro_after_init to feature_*
Remove aperf/mperf comment
Move feature_hwp_energy_perf { to newline
Remove _IA32_ infix
Use unsigned int & bool for bitfields
Require energy perf pref (Remove ENERGY_PERF_BIAS support)
Initialize activity_window
Return errors on wrmsr failure
Change command line to: cpufreq=xen:hwp
Move hdc into the hwp-specific handle_options
Drop feature_hwp_energy_perf, feature_hwp_pkg_level_ctl & feature_hwp_peci
Print features before exiting when energy/performance preference isn't available
Disable HWP MSR on initialization error
Change hwp_ print macros to add prefixes
Disable HDC when hdc=0 - (opt_hdc no longer initdata)
Mark hwp governor internal and use "hwp" name
Add XEN_HWP_DRIVER
Use top-level cpufreq=hwp command line option
Document that cpufreq=hwp falls back to cpufreq=xen without hardware
Add SPDX suffix GPL-2.0-only

v5:
Use _AC() macro in MSR_IA32_MISC_ENABLE_TURBO_DISENGAGE definition
hwp_err arg re-ordering
Use XEN_HWP_DRIVER_NAME
Use cpufreq.h for all declarations
Clear feature_hdc on failure and print 

[PATCH v7 05/15] pmstat: Re-arrage for cpufreq union

2023-07-26 Thread Jason Andryuk
Rearrange code now that xen_sysctl_pm_op's get_para field has the
nested union and struct.  In particular, the scaling governor
information like scaling_available_governors is inside the union, so it
is not always available.  Move those fields (op->u.get_para.u.s.u.*)
together as well as the common fields (ones outside the union like
op->u.get_para.turbo_enabled).

With that, gov_num may be 0, so bounce buffer handling needs
to be modified.
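
A minimal sketch of the "gov_num may be 0" case, under stated assumptions:
the helper name alloc_governor_names() is hypothetical, CPUFREQ_NAME_LEN is
taken to be 16 as elsewhere in the series, and plain calloc() stands in for
the real allocation and bounce-buffer calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define CPUFREQ_NAME_LEN 16

/*
 * Hypothetical helper: allocate space for gov_num governor names.
 * With gov_num == 0 no buffer is needed, so *buf stays NULL and the
 * call still succeeds.
 */
int alloc_governor_names(unsigned int gov_num, char **buf)
{
    assert(buf != NULL);

    *buf = NULL;
    if ( gov_num == 0 )
        return 0;

    *buf = calloc(gov_num, CPUFREQ_NAME_LEN);

    return *buf ? 0 : -ENOMEM;
}
```

The point is that a NULL buffer is only an error when gov_num is non-zero,
which is the same condition the xc_pm.c hunk below checks.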

scaling_governor and other fields inside op->u.get_para.u.s.u.* won't be
used for hwp, so this will simplify the change when hwp support is
introduced and re-indents these lines all together.

Signed-off-by: Jason Andryuk 
Acked-by: Jan Beulich 
---
v6:
Add Jan's Ack

v5:
Remove some formatting changes
Expand commit message
---
 tools/libs/ctrl/xc_pm.c   | 12 
 tools/misc/xenpm.c|  3 ++-
 xen/drivers/acpi/pmstat.c | 24 
 3 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/tools/libs/ctrl/xc_pm.c b/tools/libs/ctrl/xc_pm.c
index 6e751e242f..cea3eab22e 100644
--- a/tools/libs/ctrl/xc_pm.c
+++ b/tools/libs/ctrl/xc_pm.c
@@ -221,7 +221,7 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
 {
 if ( (!user_para->affected_cpus)||
  (!user_para->scaling_available_frequencies)||
- (!user_para->scaling_available_governors) )
+ (user_para->gov_num && !user_para->scaling_available_governors) )
 {
 errno = EINVAL;
 return -1;
@@ -230,12 +230,15 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
 goto unlock_1;
 if ( xc_hypercall_bounce_pre(xch, scaling_available_frequencies) )
 goto unlock_2;
-if ( xc_hypercall_bounce_pre(xch, scaling_available_governors) )
+if ( user_para->gov_num &&
+ xc_hypercall_bounce_pre(xch, scaling_available_governors) )
 goto unlock_3;
 
 set_xen_guest_handle(sys_para->affected_cpus, affected_cpus);
 set_xen_guest_handle(sys_para->scaling_available_frequencies, 
scaling_available_frequencies);
-set_xen_guest_handle(sys_para->scaling_available_governors, 
scaling_available_governors);
+if ( user_para->gov_num )
+set_xen_guest_handle(sys_para->scaling_available_governors,
+ scaling_available_governors);
 }
 
 sysctl.cmd = XEN_SYSCTL_pm_op;
@@ -278,7 +281,8 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
 }
 
 unlock_4:
-xc_hypercall_bounce_post(xch, scaling_available_governors);
+if ( user_para->gov_num )
+xc_hypercall_bounce_post(xch, scaling_available_governors);
 unlock_3:
 xc_hypercall_bounce_post(xch, scaling_available_frequencies);
 unlock_2:
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index ee8ce5d5f2..1c474c3b59 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -811,7 +811,8 @@ static int show_cpufreq_para_by_cpuid(xc_interface 
*xc_handle, int cpuid)
 ret = -ENOMEM;
 goto out;
 }
-if (!(p_cpufreq->scaling_available_governors =
+if (p_cpufreq->gov_num &&
+!(p_cpufreq->scaling_available_governors =
   malloc(p_cpufreq->gov_num * CPUFREQ_NAME_LEN * sizeof(char))))
 {
 fprintf(stderr,
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index f5a9ac3f1a..d67d99e62f 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -239,6 +239,18 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
 if ( ret )
 return ret;
 
+op->u.get_para.cpuinfo_cur_freq =
+cpufreq_driver.get ? cpufreq_driver.get(op->cpuid) : policy->cur;
+op->u.get_para.cpuinfo_max_freq = policy->cpuinfo.max_freq;
+op->u.get_para.cpuinfo_min_freq = policy->cpuinfo.min_freq;
+op->u.get_para.turbo_enabled = cpufreq_get_turbo_status(op->cpuid);
+
+if ( cpufreq_driver.name[0] )
+strlcpy(op->u.get_para.scaling_driver,
+cpufreq_driver.name, CPUFREQ_NAME_LEN);
+else
+strlcpy(op->u.get_para.scaling_driver, "Unknown", CPUFREQ_NAME_LEN);
+
 if ( !(scaling_available_governors =
xzalloc_array(char, gov_num * CPUFREQ_NAME_LEN)) )
 return -ENOMEM;
@@ -254,21 +266,10 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
 if ( ret )
 return ret;
 
-op->u.get_para.cpuinfo_cur_freq =
-cpufreq_driver.get ? cpufreq_driver.get(op->cpuid) : policy->cur;
-op->u.get_para.cpuinfo_max_freq = policy->cpuinfo.max_freq;
-op->u.get_para.cpuinfo_min_freq = policy->cpuinfo.min_freq;
-
 op->u.get_para.u.s.scaling_cur_freq = policy->cur;
 op->u.get_para.u.s.scaling_max_freq = policy->max;
 op->u.get_para.u.s.scaling_min_freq = policy->min;
 
-if ( cpufreq_driver.name[0] )
-strlcpy(op->u.get_para.scaling_driver,
-cpufreq_driver.name, CPUFREQ_NAME_LEN);
-

[PATCH v7 04/15] xen/sysctl: Nest cpufreq scaling options

2023-07-26 Thread Jason Andryuk
Add a union and struct so that most of the scaling variables of struct
xen_get_cpufreq_para are within a binary-compatible layout.  This
allows cppc_para to live in the larger union and use uint32_ts - struct
xen_cppc_para will be 10 uint32_t's.

The new scaling struct is 3 * uint32_t + 16 bytes CPUFREQ_NAME_LEN + 4 *
uint32_t for xen_ondemand = 11 uint32_t.  That means the old size is
retained, int32_t turbo_enabled doesn't move and it's binary compatible.
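
The size arithmetic above can be checked with a small stand-alone sketch.
The struct and union names below are stand-ins rather than the real
sysctl.h definitions.  The field names are taken from this series, and
CPUFREQ_NAME_LEN is 16 as stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPUFREQ_NAME_LEN 16

/* Stand-ins for the governor-specific structs. */
struct userspace_para { uint32_t scaling_setspeed; };
struct ondemand_para {
    uint32_t sampling_rate_max;
    uint32_t sampling_rate_min;
    uint32_t sampling_rate;
    uint32_t up_threshold;
};

/* Stand-in for xen_cppc_para: ten uint32_t fields. */
struct cppc_para {
    uint32_t features, lowest, lowest_nonlinear, nominal, highest;
    uint32_t minimum, maximum, desired, energy_perf, activity_window;
};

/* The nested scaling struct: 3 uint32_t + 16 bytes + 4 uint32_t. */
struct scaling_para {
    uint32_t scaling_cur_freq;
    char scaling_governor[CPUFREQ_NAME_LEN];
    uint32_t scaling_max_freq;
    uint32_t scaling_min_freq;
    union {
        struct userspace_para userspace;
        struct ondemand_para ondemand;
    } u;
};

union para_union {
    struct scaling_para s;
    struct cppc_para cppc_para;
};

static_assert(sizeof(struct scaling_para) == 11 * sizeof(uint32_t),
              "scaling struct is 11 uint32_t");
static_assert(sizeof(struct cppc_para) == 10 * sizeof(uint32_t),
              "cppc struct is 10 uint32_t");
static_assert(sizeof(union para_union) == sizeof(struct scaling_para),
              "cppc_para fits without growing the union");

/* Size of the outer union in uint32_t units. */
size_t para_union_words(void)
{
    return sizeof(union para_union) / sizeof(uint32_t);
}
```

Since the cppc struct is smaller than the scaling struct, the union keeps
the size of the scaling struct, so whatever follows it stays at the same
offset.  Note that this only checks sizes; as the note below points out,
matching sizes alone don't prove the two layouts stay in sync.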

The out-of-context memcpy() in xc_get_cpufreq_para() now handles the
copying of the fields removed there.

Signed-off-by: Jason Andryuk 
Reviewed-by: Jan Beulich 
---
NOTE: Jan would like a toolside review / ack because:
Nevertheless I continue to be uncertain about all of this: Parts of
the struct can apparently go out of sync with the sysctl struct, but
other parts have to remain in sync without there being an
appropriate build-time check (checking merely sizes clearly isn't
enough). Therefore I'd really like to have a toolstack side review /
ack here as well.

v6:
Add Jan's Reviewed-by

v5:
Expand commit message
Change comment to driver/governor
---
 tools/include/xenctrl.h | 22 +-
 tools/libs/ctrl/xc_pm.c |  7 +--
 tools/misc/xenpm.c  | 24 
 xen/drivers/acpi/pmstat.c   | 27 ++-
 xen/include/public/sysctl.h | 22 +-
 5 files changed, 53 insertions(+), 49 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index dba33d5d0f..8aedb952a0 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1909,16 +1909,20 @@ struct xc_get_cpufreq_para {
 uint32_t cpuinfo_cur_freq;
 uint32_t cpuinfo_max_freq;
 uint32_t cpuinfo_min_freq;
-uint32_t scaling_cur_freq;
-
-char scaling_governor[CPUFREQ_NAME_LEN];
-uint32_t scaling_max_freq;
-uint32_t scaling_min_freq;
-
-/* for specific governor */
 union {
-xc_userspace_t userspace;
-xc_ondemand_t ondemand;
+struct {
+uint32_t scaling_cur_freq;
+
+char scaling_governor[CPUFREQ_NAME_LEN];
+uint32_t scaling_max_freq;
+uint32_t scaling_min_freq;
+
+/* for specific governor */
+union {
+xc_userspace_t userspace;
+xc_ondemand_t ondemand;
+} u;
+} s;
 } u;
 
 int32_t turbo_enabled;
diff --git a/tools/libs/ctrl/xc_pm.c b/tools/libs/ctrl/xc_pm.c
index c3a9864bf7..6e751e242f 100644
--- a/tools/libs/ctrl/xc_pm.c
+++ b/tools/libs/ctrl/xc_pm.c
@@ -265,17 +265,12 @@ int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
 user_para->cpuinfo_cur_freq = sys_para->cpuinfo_cur_freq;
 user_para->cpuinfo_max_freq = sys_para->cpuinfo_max_freq;
 user_para->cpuinfo_min_freq = sys_para->cpuinfo_min_freq;
-user_para->scaling_cur_freq = sys_para->scaling_cur_freq;
-user_para->scaling_max_freq = sys_para->scaling_max_freq;
-user_para->scaling_min_freq = sys_para->scaling_min_freq;
 user_para->turbo_enabled= sys_para->turbo_enabled;
 
 memcpy(user_para->scaling_driver,
 sys_para->scaling_driver, CPUFREQ_NAME_LEN);
-memcpy(user_para->scaling_governor,
-sys_para->scaling_governor, CPUFREQ_NAME_LEN);
 
-/* copy to user_para no matter what cpufreq governor */
+/* copy to user_para no matter what cpufreq driver/governor */
 BUILD_BUG_ON(sizeof(((struct xc_get_cpufreq_para *)0)->u) !=
 sizeof(((struct xen_get_cpufreq_para *)0)->u));
 
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 1bb6187e56..ee8ce5d5f2 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -730,39 +730,39 @@ static void print_cpufreq_para(int cpuid, struct 
xc_get_cpufreq_para *p_cpufreq)
 printf("scaling_avail_gov: %s\n",
p_cpufreq->scaling_available_governors);
 
-printf("current_governor : %s\n", p_cpufreq->scaling_governor);
-if ( !strncmp(p_cpufreq->scaling_governor,
+printf("current_governor : %s\n", p_cpufreq->u.s.scaling_governor);
+if ( !strncmp(p_cpufreq->u.s.scaling_governor,
   "userspace", CPUFREQ_NAME_LEN) )
 {
 printf("  userspace specific :\n");
 printf("scaling_setspeed : %u\n",
-   p_cpufreq->u.userspace.scaling_setspeed);
+   p_cpufreq->u.s.u.userspace.scaling_setspeed);
 }
-else if ( !strncmp(p_cpufreq->scaling_governor,
+else if ( !strncmp(p_cpufreq->u.s.scaling_governor,
"ondemand", CPUFREQ_NAME_LEN) )
 {
 printf("  ondemand specific  :\n");
 printf("sampling_rate: max [%u] min [%u] cur [%u]\n",
-   p_cpufreq->u.ondemand.sampling_rate_max,
-   p_cpufreq->u.ondemand.sampling_rate_min,
-   p_cpufreq->u.ondemand.sampling_rate);
+

[PATCH v7 03/15] cpufreq: Export intel_feature_detect

2023-07-26 Thread Jason Andryuk
Export feature_detect as intel_feature_detect so it can be re-used by
HWP.

Signed-off-by: Jason Andryuk 
Acked-by: Jan Beulich 
---
v4:
Add Jan's Ack

v3:
Remove void * cast when calling intel_feature_detect

v2:
export intel_feature_detect with typed pointer
Move intel_feature_detect to acpi/cpufreq/cpufreq.h since the
declaration now contains struct cpufreq_policy *.
---
 xen/arch/x86/acpi/cpufreq/cpufreq.c | 8 ++--
 xen/include/acpi/cpufreq/cpufreq.h  | 2 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c 
b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index 6c70d04395..f1cc473b4f 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -339,9 +339,8 @@ static unsigned int cf_check get_cur_freq_on_cpu(unsigned 
int cpu)
 return extract_freq(get_cur_val(cpumask_of(cpu)), data);
 }
 
-static void cf_check feature_detect(void *info)
+void intel_feature_detect(struct cpufreq_policy *policy)
 {
-struct cpufreq_policy *policy = info;
 unsigned int eax;
 
 eax = cpuid_eax(6);
@@ -353,6 +352,11 @@ static void cf_check feature_detect(void *info)
 }
 }
 
+static void cf_check feature_detect(void *info)
+{
+intel_feature_detect(info);
+}
+
 static unsigned int check_freqs(const cpumask_t *mask, unsigned int freq,
 struct acpi_cpufreq_data *data)
 {
diff --git a/xen/include/acpi/cpufreq/cpufreq.h 
b/xen/include/acpi/cpufreq/cpufreq.h
index 1f1898d811..482ea5b0de 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -243,4 +243,6 @@ int write_userspace_scaling_setspeed(unsigned int cpu, 
unsigned int freq);
 void cpufreq_dbs_timer_suspend(void);
 void cpufreq_dbs_timer_resume(void);
 
+void intel_feature_detect(struct cpufreq_policy *policy);
+
 #endif /* __XEN_CPUFREQ_PM_H__ */
-- 
2.41.0




[PATCH v7 00/15] Intel Hardware P-States (HWP) support

2023-07-26 Thread Jason Andryuk
Hi,

This patch series adds Hardware-Controlled Performance States (HWP) for
Intel processors to Xen.

v2 was only partially reviewed, so v3 is mostly a reposting of v2.  In v2 &
v3, I think I addressed all comments for v1.  I kept patch 11 "xenpm:
Factor out a non-fatal cpuid_parse variant", with a v2 comment
explaining why I keep it.

v3 adds "xen/x86: Tweak PDC bits when using HWP".  Qubes testing revealed
an issue where enabling HWP can crash firmware code (maybe SMM).  This
requires a Linux change to get the PDC bits from Xen and pass them to
ACPI.  Roger has a patch [0] to set the PDC bits.  Roger's 3 patch
series was tested with "xen/x86: Tweak PDC bits when using HWP" on
affected hardware and allowed proper operation.

v4:
There is a large amount of renaming from HWP/hwp to CPPC/cppc in the series.
The driver remains hwp_ prefixed since it is dealing with the hardware
interface.  The sysctl, xc and xenpm interfaces were renamed to cppc to
be the generic ACPI CPPC (Collaborative Processor Performance Control)
interface.

struct xen_get_cpufreq_para was re-organized in a binary compatible
fashion to nest scaling governor options.  This allows the cppc support
to use uint32_t's for its parameters.

HWP is now enabled with a top-level cpufreq=hwp option.  It will
fall back to cpufreq=xen if hwp is unavailable.  This seems like the most
user-friendly option.  Since the user was trying to specify *some*
cpufreq, we should give them the best that we can instead of disabling
the functionality.

"xenpm: Factor out a non-fatal cpuid_parse variant" was dropped.
set-cpufreq-cppc expects either a cpu number or none specified, which
implies all.

Some patches were re-arranged - "xen/x86: Tweak PDC bits when using HWP"
now comes immediately after "cpufreq: Add Hardware P-State (HWP) driver".

The implementation of "cpufreq: Allow restricting to internal governors
only" changed, so I removed Jan's Ack.

v5:
HWP is enabled with a toplevel cpufreq=hwp option.  There is no fallback
by default, but a cpufreq=hwp;xen syntax is now supported.  That tries
hwp first.  If HWP registration is unsuccessful, then xen registration
is performed as a fallback.

More changes from Jan's feedback.  They are typically minor and
documented in individual patches.

v6:
After some discussion, the cpufreq=hwp;xen syntax is retained, with the
global applicability of "verbose" mentioned in docs.  Minor tweaks
elsewhere as documented in individual patches.

v7:
Minor tweaks from review feedback - documented in respective patches.
Jan previously asked for a toolstack side review / ack of 
"[PATCH 04/15] xen/sysctl: Nest cpufreq scaling options"

Previous cover letter:

With HWP, the processor makes its own determinations for frequency
selection, though users can set some parameters and preferences.  There
is also Turbo Boost which dynamically pushes the max frequency if
possible.

The existing governors don't work with HWP since they select frequencies
and HWP doesn't expose those.  Therefore a dummy hwp-internal governor is
used that doesn't do anything.

xenpm get-cpufreq-para is extended to show HWP parameters, and
set-cpufreq-cppc is added to set them.

A lightly loaded OpenXT laptop showed ~1W power savings according to
powertop.  A mostly idle Fedora system (dom0 only) showed more modest
power savings.

This is for a 10th gen 6-core 1600 MHz base 4900 MHz max CPU.  In the
default balance mode, Turbo Boost doesn't exceed 4GHz.  Tweaking the
energy_perf preference with `xenpm set-cpufreq-para balance ene:64`,
I've seen the CPU hit 4.7GHz before throttling down and bouncing around
between 4.3 and 4.5 GHz.  Curiously the other cores read ~4GHz when
turbo boost takes effect.  This was done after pinning all dom0 cores,
and using taskset to pin to vCPU/pCPU 11 and running a bash tight loop.

HWP defaults to disabled and running with the existing HWP configuration
- it doesn't reconfigure by default.  It can be enabled with
cpufreq=hwp.

Hardware Duty Cycling (HDC) is another feature to autonomously power down
things.  It defaults to enabled when HWP is enabled, but HDC can be
disabled on the command line.  cpufreq=xen:hwp,no-hdc

I've only tested on 8th gen and 10th gen systems with activity window
and energy_perf support.  So the paths for CPUs lacking those features
are untested.

Fast MSR support was removed in v2.  The model specific checking was not
done properly, and I don't have hardware to test with.  Since writes are
expected to be infrequent, I just removed the code.

This changes the sysctl_pm_op hypercall, so that wants review.

Regards,
Jason

[0] 
https://lore.kernel.org/xen-devel/20221121102113.41893-3-roger@citrix.com/

Jason Andryuk (15):
  cpufreq: Allow restricting to internal governors only
  cpufreq: Add perf_freq to cpuinfo
  cpufreq: Export intel_feature_detect
  xen/sysctl: Nest cpufreq scaling options
  pmstat: Re-arrage for cpufreq union
  cpufreq: Add Hardware P-State (HWP) driver
  xen/x86: Tweak PDC bits when 

[PATCH v7 02/15] cpufreq: Add perf_freq to cpuinfo

2023-07-26 Thread Jason Andryuk
acpi-cpufreq scales the aperf/mperf measurements by max_freq, but HWP
needs to scale by base frequency.  Setting max_freq to base_freq
"works" but the code is not obvious, and returning values to userspace
is tricky.  Add an additional perf_freq member which is used for scaling
aperf/mperf measurements.

Signed-off-by: Jason Andryuk 
Acked-by: Jan Beulich 
---
v3:
Add Jan's Ack

I don't like this, but it seems the best way to re-use the common
aperf/mperf code.  The other option would be to add wrappers that then
do the acpi vs. hwp scaling.
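
As a rough illustration of why the scaling frequency matters, here is a
standalone sketch of the calculation.  It is a simplified model, not the
Xen code: struct cpuinfo_sketch and measured_freq() are hypothetical
names, the counter deltas are passed in directly, and the overflow
handling a real implementation needs is left out.

```c
#include <stdint.h>

/* Simplified model of the cpuinfo fields; frequencies are in kHz. */
struct cpuinfo_sketch {
    unsigned int max_freq;
    unsigned int perf_freq; /* frequency used to scale aperf/mperf */
};

/*
 * Estimate the average delivered frequency over an interval from the
 * aperf/mperf deltas, going through an integer percentage the same way
 * the hunk below does.  Returns 0 when no mperf ticks were counted.
 */
static unsigned int measured_freq(const struct cpuinfo_sketch *ci,
                                  uint64_t aperf_delta, uint64_t mperf_delta)
{
    unsigned int perf_percent;

    if ( !mperf_delta )
        return 0;

    perf_percent = (unsigned int)(aperf_delta * 100 / mperf_delta);

    return ci->perf_freq * perf_percent / 100;
}
```

With made-up numbers for an HWP system (perf_freq = 1600000, i.e. the
base frequency) and an aperf/mperf ratio of 2.5, this gives 4000000 kHz.
Scaling the same ratio by a 4900000 kHz max_freq instead would report
12250000 kHz, which is not a frequency the part can reach.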
---
 xen/arch/x86/acpi/cpufreq/cpufreq.c | 2 +-
 xen/drivers/cpufreq/utility.c   | 1 +
 xen/include/acpi/cpufreq/cpufreq.h  | 3 +++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c 
b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index 2e0067fbe5..6c70d04395 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -316,7 +316,7 @@ unsigned int get_measured_perf(unsigned int cpu, unsigned 
int flag)
 else
 perf_percent = 0;
 
-return policy->cpuinfo.max_freq * perf_percent / 100;
+return policy->cpuinfo.perf_freq * perf_percent / 100;
 }
 
 static unsigned int cf_check get_cur_freq_on_cpu(unsigned int cpu)
diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
index 9eb7ecedcd..6831f62851 100644
--- a/xen/drivers/cpufreq/utility.c
+++ b/xen/drivers/cpufreq/utility.c
@@ -236,6 +236,7 @@ int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy 
*policy,
 
 policy->min = policy->cpuinfo.min_freq = min_freq;
 policy->max = policy->cpuinfo.max_freq = max_freq;
+policy->cpuinfo.perf_freq = max_freq;
 policy->cpuinfo.second_max_freq = second_max_freq;
 
 if (policy->min == ~0)
diff --git a/xen/include/acpi/cpufreq/cpufreq.h 
b/xen/include/acpi/cpufreq/cpufreq.h
index 44fc4c58fc..1f1898d811 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -37,6 +37,9 @@ extern struct acpi_cpufreq_data *cpufreq_drv_data[NR_CPUS];
 struct cpufreq_cpuinfo {
 unsigned intmax_freq;
 unsigned intsecond_max_freq;/* P1 if Turbo Mode is on */
+unsigned intperf_freq; /* Scaling freq for aperf/mpref.
+  acpi-cpufreq uses max_freq, but HWP uses
+  base_freq.*/
 unsigned intmin_freq;
 unsigned inttransition_latency; /* in 10^(-9) s = nanoseconds */
 };
-- 
2.41.0




[PATCH v7 01/15] cpufreq: Allow restricting to internal governors only

2023-07-26 Thread Jason Andryuk
For hwp, the standard governors are not usable, and only the internal
one is applicable.  Add the cpufreq_governor_internal boolean to
indicate when an internal governor, like hwp, will be used.  This is set
during presmp_initcall, and governor registration can be skipped when
called during initcall.

This way unusable governors are not registered, and only compatible
governors are advertised to userspace.

Signed-off-by: Jason Andryuk 
Acked-by: Jan Beulich 
---
v6:
Add Jan's Ack

v5:
Check cpufreq_governor_internal and skip registration as applicable
Remove internal flag

v4:
Rework to use an internal flag
Removed Jan's Ack since the approach is different.

v3:
Switch to initdata
Add Jan Acked-by
Commit message s/they/the/ typo
Don't register hwp-internal when running non-hwp - Marek

v2:
Switch to "-internal"
Add blank line in header
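
For readers skimming the history above, the v5 approach boils down to a
single boolean checked at the top of each governor's init function.  A
minimal standalone sketch follows, assuming a trivial counter in place of
the real governor list; governor_internal, nr_registered and the
*_sketch functions are illustrative names, not Xen symbols.

```c
#include <stdbool.h>

/* Models cpufreq_governor_internal: set early when hwp takes over. */
static bool governor_internal;

/* Hypothetical registry: just counts successful registrations. */
static unsigned int nr_registered;

static int register_governor_sketch(const char *name)
{
    (void)name; /* a real registry would store the governor */
    nr_registered++;
    return 0;
}

/* Each standard governor bails out before registering itself. */
static int gov_performance_init_sketch(void)
{
    if ( governor_internal )
        return 0;

    return register_governor_sketch("performance");
}

static int gov_powersave_init_sketch(void)
{
    if ( governor_internal )
        return 0;

    return register_governor_sketch("powersave");
}
```

Because the flag is set before the init functions run, nothing needs to
be unregistered afterwards; the unusable governors never reach the list
that is advertised to userspace.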
---
 xen/drivers/cpufreq/cpufreq.c| 1 +
 xen/drivers/cpufreq/cpufreq_misc_governors.c | 9 +
 xen/drivers/cpufreq/cpufreq_ondemand.c   | 3 +++
 xen/include/acpi/cpufreq/cpufreq.h   | 2 ++
 4 files changed, 15 insertions(+)

diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index 2321c7dd07..67a58d409b 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -56,6 +56,7 @@ struct cpufreq_dom {
 };
 static LIST_HEAD_READ_MOSTLY(cpufreq_dom_list_head);
 
+bool __initdata cpufreq_governor_internal;
 struct cpufreq_governor *__read_mostly cpufreq_opt_governor;
 LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
 
diff --git a/xen/drivers/cpufreq/cpufreq_misc_governors.c 
b/xen/drivers/cpufreq/cpufreq_misc_governors.c
index f5571f5486..0327fad23b 100644
--- a/xen/drivers/cpufreq/cpufreq_misc_governors.c
+++ b/xen/drivers/cpufreq/cpufreq_misc_governors.c
@@ -120,6 +120,9 @@ static int __init cf_check cpufreq_gov_userspace_init(void)
 {
 unsigned int cpu;
 
+if ( cpufreq_governor_internal )
+return 0;
+
 for_each_online_cpu(cpu)
 per_cpu(cpu_set_freq, cpu) = userspace_cmdline_freq;
 register_cpu_notifier(&cpufreq_userspace_cpu_nfb);
@@ -162,6 +165,9 @@ struct cpufreq_governor cpufreq_gov_performance = {
 
 static int __init cf_check cpufreq_gov_performance_init(void)
 {
+if ( cpufreq_governor_internal )
+return 0;
+
 return cpufreq_register_governor(&cpufreq_gov_performance);
 }
 __initcall(cpufreq_gov_performance_init);
@@ -201,6 +207,9 @@ struct cpufreq_governor cpufreq_gov_powersave = {
 
 static int __init cf_check cpufreq_gov_powersave_init(void)
 {
+if ( cpufreq_governor_internal )
+return 0;
+
 return cpufreq_register_governor(&cpufreq_gov_powersave);
 }
 __initcall(cpufreq_gov_powersave_init);
diff --git a/xen/drivers/cpufreq/cpufreq_ondemand.c 
b/xen/drivers/cpufreq/cpufreq_ondemand.c
index fbcd14d6c3..06cfc88d30 100644
--- a/xen/drivers/cpufreq/cpufreq_ondemand.c
+++ b/xen/drivers/cpufreq/cpufreq_ondemand.c
@@ -360,6 +360,9 @@ struct cpufreq_governor cpufreq_gov_dbs = {
 
 static int __init cf_check cpufreq_gov_dbs_init(void)
 {
+if ( cpufreq_governor_internal )
+return 0;
+
 return cpufreq_register_governor(&cpufreq_gov_dbs);
 }
 __initcall(cpufreq_gov_dbs_init);
diff --git a/xen/include/acpi/cpufreq/cpufreq.h 
b/xen/include/acpi/cpufreq/cpufreq.h
index 35dcf21e8f..44fc4c58fc 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -116,6 +116,8 @@ extern struct cpufreq_governor cpufreq_gov_powersave;
 
 extern struct list_head cpufreq_governor_list;
 
+extern bool cpufreq_governor_internal;
+
 extern int cpufreq_register_governor(struct cpufreq_governor *governor);
 extern struct cpufreq_governor *__find_governor(const char *governor);
 #define CPUFREQ_DEFAULT_GOVERNOR &cpufreq_gov_dbs
-- 
2.41.0




Re: [PATCH v6 06/15] cpufreq: Add Hardware P-State (HWP) driver

2023-07-26 Thread Jason Andryuk
On Tue, Jul 25, 2023 at 10:37 AM Jan Beulich  wrote:
>
> On 25.07.2023 15:26, Jason Andryuk wrote:

> > #define hwp_err(cpu, fmt, args...) \
> > printk(XENLOG_ERR "HWP: CPU%u error: " fmt, cpu, args)
>
> ..., just that you're missing the ##:
>
> #define hwp_err(cpu, fmt, args...) \
> printk(XENLOG_ERR "HWP: CPU%u error: " fmt, cpu, ## args)

Thanks.  I thought I was trying to avoid the use of "##", which sent
me off in the wrong direction.
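
For anyone else tripping over this, a standalone sketch of the
difference.  It relies on the GNU named-variadic extension, so it assumes
GCC or Clang in their default modes; log_buf, report_plain() and
report_code() are made up for the example, and snprintf() stands in for
printk().

```c
#include <stdio.h>

static char log_buf[128];

/*
 * "## args" removes the comma in front of it when no variadic
 * arguments are passed, so the macro works both with and without
 * extra arguments.  Without the "##" the first call below would
 * expand to snprintf(..., cpu, ) and fail to compile.
 */
#define hwp_err(cpu, fmt, args...) \
    snprintf(log_buf, sizeof(log_buf), "HWP: CPU%u error: " fmt, cpu, ## args)

/* No variadic arguments: the trailing comma must disappear. */
static const char *report_plain(unsigned int cpu)
{
    hwp_err(cpu, "not supported\n");
    return log_buf;
}

/* One variadic argument: the comma is kept as usual. */
static const char *report_code(unsigned int cpu, int rc)
{
    hwp_err(cpu, "wrmsr failed: %d\n", rc);
    return log_buf;
}
```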

Regards,
Jason



Re: [PATCH v3 6/8] RISC-V: annotate entry points with type and size

2023-07-26 Thread Oleksii
On Wed, 2023-07-26 at 17:43 +0200, Jan Beulich wrote:
> On 26.07.2023 17:28, Oleksii wrote:
> > On Mon, 2023-07-10 at 10:58 +0200, Jan Beulich wrote:
> > > On 10.07.2023 10:56, Jan Beulich wrote:
> > > > Use the generic framework in xen/linkage.h. No change in
> > > > generated
> > > > code
> > > > except of course the converted symbols change to be hidden ones
> > > > and
> > > > gain
> > > > a valid size.
> > > > 
> > > > Signed-off-by: Jan Beulich 
> > > 
> > > I'm sorry, the Cc list was incomplete here. Adding back the
> > > remaining
> > > REST
> > > maintainers.
> > > 
> > > Jan
> > > 
> > > > ---
> > > > Probably count_args_exp() should move to macros.h, but I first
> > > > wanted to
> > > > see whether anyone can suggest any better approach for checking
> > > > whether
> > > > a defined macro expands to nothing.
> > What about introduction of conditional macros ?
> > Something similar to:
> > #include <stdio.h>
> > 
> > #define CONDITIONAL_RETURN(arg1, arg2)
> > CONDITIONAL_RETURN_IMPL(arg1,
> > arg2, EMPTY)
> > 
> > #define EMPTY(...) ""
> > 
> > #define CONDITIONAL_RETURN_IMPL(arg1, arg2, empty_check) \
> >     CONDITIONAL_RETURN_##empty_check(arg1, arg2)
> > 
> > #define CONDITIONAL_RETURN_EMPTY(arg1, arg2) \
> >     CONDITIONAL_RETURN_ARG1(arg1, arg2)
> > 
> > #define CONDITIONAL_RETURN_ARG1(arg1, arg2) arg1, arg2
> > 
> > #define CONDITIONAL_RETURN_ARG2(arg1, arg2) arg1
> 
> I don't see how this would be used in your scheme. It ...
> 
> > int main() {
> >     int a = 42;
> >     const char* b = "hello";
> > 
> >     // Second argument is not empty, both arguments are returned
> >     printf("Case 1: %d, %s\n", CONDITIONAL_RETURN(a, b));  //
> > Prints:
> > Case 1: 42, hello
> > 
> >     // Second argument is empty, only the first argument is
> > returned
> >     printf("Case 2: %d, %s\n", CONDITIONAL_RETURN(a, "")); //
> > Prints:
> > Case 2: 42,
> 
> ... certainly isn't here, or this likely would cause at least a
> warning
> from the compiler (for there being too few arguments to printf()) and
> then a runtime UB for interpreting something as a pointer to a string
> which likely isn't.
> 
> >     return 0;
> > }
> > 
> > and then define DO_CODE_ALIGN using CONDITIONAL_RETURN?
> 
> Afaict instead of getting rid of the comma, you'd actually add ""
> after it. What am I missing?
You are right. I missed that actually it returns "".

~ Oleksii



[PATCH v4] xen: speed up grant-table reclaim

2023-07-26 Thread Demi Marie Obenour
When a grant entry is still in use by the remote domain, Linux must put
it on a deferred list.  Normally, this list is very short, because
the PV network and block protocols expect the backend to unmap the grant
first.  However, Qubes OS's GUI protocol is subject to the constraints
of the X Window System, and as such winds up with the frontend unmapping
the window first.  As a result, the list can grow very large, resulting
in a massive memory leak and eventual VM freeze.

To partially solve this problem, make the number of entries that the VM
will attempt to free at each iteration tunable.  The default is still
10, but it can be overridden via a module parameter.

This is Cc: stable because (when combined with appropriate userspace
changes) it fixes a severe performance and stability problem for Qubes
OS users.

Cc: sta...@vger.kernel.org
Signed-off-by: Demi Marie Obenour 
---
 Documentation/ABI/testing/sysfs-module | 11 +++
 drivers/xen/grant-table.c  | 40 +++---
 2 files changed, 40 insertions(+), 11 deletions(-)

Changes since v3:
- do not mention Kconfig in the commit message.
- add entry to Documentation/ABI for the new sysfs entry.

Changes since v2:
- use atomic_inc_return(x) and atomic_dec_return(x) instead of
  atomic_add_return(1, x) and atomic_sub_return(1, x) respectively.
- move module_param macro closer to the definition of
  free_per_iteration.
- add blank line between declarations and statements.

Changes since v1:
- drop setting default via Kconfig
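
The loop condition in the diff below is compact, so here is a standalone
sketch of just the limit handling.  It is a simplification under stated
assumptions: drain() is a hypothetical helper, the deferred list is
reduced to a counter, and every entry is assumed to be freeable (in the
real code an entry still in use by the remote domain goes back on the
list).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Free up to `limit` entries from `*pending`; a limit of 0 means
 * "no limit".  Returns the number of entries freed.
 */
static size_t drain(size_t *pending, unsigned int limit)
{
	unsigned int nr = limit;
	const bool ignore_limit = nr == 0;
	size_t freed = 0;

	/*
	 * Same shape as the patched loop: the budget is only consulted
	 * when a limit is in force, and an empty list ends the loop.
	 */
	while ((ignore_limit || nr--) && *pending) {
		(*pending)--;
		freed++;
	}

	return freed;
}
```

Starting from 25 pending entries, drain(&pending, 10) frees 10 and leaves
15; a following drain(&pending, 0) frees the remaining 15 in one pass.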

diff --git a/Documentation/ABI/testing/sysfs-module 
b/Documentation/ABI/testing/sysfs-module
index 
08886367d0470e8d8922703a7d5174077801c2a8..62addab47d0c5908d26ec2f5d07db5ce21833566
 100644
--- a/Documentation/ABI/testing/sysfs-module
+++ b/Documentation/ABI/testing/sysfs-module
@@ -60,3 +60,14 @@ Description: Module taint flags:
C   staging driver module
E   unsigned module
==  =
+
+What:  /sys/module/grant_table/parameters/free_per_iteration
+Date:  July 2023
+KernelVersion: 6.5 but backported to all supported stable branches
+Contact:   Xen developer discussion 
+Description:   Read and write number of grant entries to attempt to free per 
iteration.
+
+   Note: Future versions of Xen and Linux may provide a better
+   interface for controlling the rate of deferred grant reclaim
+   or may not need it at all.
+Users: Qubes OS (https://www.qubes-os.org)
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 
e1ec725c2819d4d5dede063eb00d86a6d52944c0..f13c3b76ad1eb7110e2a2981e9fa4e504174e431
 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -498,14 +498,21 @@ static LIST_HEAD(deferred_list);
 static void gnttab_handle_deferred(struct timer_list *);
 static DEFINE_TIMER(deferred_timer, gnttab_handle_deferred);
 
+static atomic64_t deferred_count;
+static atomic64_t leaked_count;
+static unsigned int free_per_iteration = 10;
+module_param(free_per_iteration, uint, 0600);
+
 static void gnttab_handle_deferred(struct timer_list *unused)
 {
-   unsigned int nr = 10;
+   unsigned int nr = READ_ONCE(free_per_iteration);
+   const bool ignore_limit = nr == 0;
struct deferred_entry *first = NULL;
unsigned long flags;
+   size_t freed = 0;
 
spin_lock_irqsave(&gnttab_list_lock, flags);
-   while (nr--) {
+   while ((ignore_limit || nr--) && !list_empty(&deferred_list)) {
struct deferred_entry *entry
= list_first_entry(&deferred_list,
   struct deferred_entry, list);
@@ -515,10 +522,14 @@ static void gnttab_handle_deferred(struct timer_list 
*unused)
list_del(&entry->list);
spin_unlock_irqrestore(&gnttab_list_lock, flags);
if (_gnttab_end_foreign_access_ref(entry->ref)) {
+   uint64_t ret = atomic64_dec_return(&deferred_count);
+
put_free_entry(entry->ref);
-   pr_debug("freeing g.e. %#x (pfn %#lx)\n",
-entry->ref, page_to_pfn(entry->page));
+   pr_debug("freeing g.e. %#x (pfn %#lx), %llu 
remaining\n",
+entry->ref, page_to_pfn(entry->page),
+(unsigned long long)ret);
put_page(entry->page);
+   freed++;
kfree(entry);
entry = NULL;
} else {
@@ -530,21 +541,22 @@ static void gnttab_handle_deferred(struct timer_list 
*unused)
spin_lock_irqsave(&gnttab_list_lock, flags);
if (entry)
list_add_tail(&entry->list, &deferred_list);
-   else if (list_empty(&deferred_list))
-   break;
}
-   if (!list_empty(&deferred_list) && !timer_pending(&deferred_timer)) {
+   if 

[XEN PATCH v4 2/4] automation: Add xen builds for the ECLAIR analyses

2023-07-26 Thread Simone Ballarin
This patch defines an ARM64 and an X86_64 build for the
ECLAIR pipelines.

These files are used by the analyze.sh script in
automation/eclair_analysis: it first calls prepare.sh,
then runs build.sh inside an ECLAIR environment.

Only the toolchain invocations triggered by build.sh
are analyzed; the prepare.sh script is instead intended
to perform all the required operations for building xen
that are not supposed to be analyzed, e.g. building
dependencies.

Signed-off-by: Simone Ballarin 
Reviewed-by: Stefano Stabellini 

--
Changes in v4:
- add 'Reviewed-by'

Changes in v3:
- split build definitions in a separate patch

Changes in v2:
- add ECLAIR configuration files (before they were fetched from a separate
  repository);
- now the pipeline fails if there are new violations of guidelines tagged
  with clean:added.
---
 automation/eclair_analysis/Makefile.prepare |   6 +
 automation/eclair_analysis/build.sh |  44 ++
 automation/eclair_analysis/prepare.sh   |  42 ++
 automation/eclair_analysis/xen_arm_config   | 147 +++
 automation/eclair_analysis/xen_x86_config   | 152 
 5 files changed, 391 insertions(+)
 create mode 100644 automation/eclair_analysis/Makefile.prepare
 create mode 100755 automation/eclair_analysis/build.sh
 create mode 100755 automation/eclair_analysis/prepare.sh
 create mode 100644 automation/eclair_analysis/xen_arm_config
 create mode 100644 automation/eclair_analysis/xen_x86_config

diff --git a/automation/eclair_analysis/Makefile.prepare 
b/automation/eclair_analysis/Makefile.prepare
new file mode 100644
index 00..90f4a31172
--- /dev/null
+++ b/automation/eclair_analysis/Makefile.prepare
@@ -0,0 +1,6 @@
+include Makefile
+prepare:
+   $(Q)$(MAKE) $(build)=tools
+   $(Q)$(MAKE) $(build)=. include/xen/compile.h
+   $(Q)$(MAKE) $(build)=include all
+   $(Q)$(MAKE) $(build)=arch/$(SRCARCH) include
diff --git a/automation/eclair_analysis/build.sh 
b/automation/eclair_analysis/build.sh
new file mode 100755
index 00..ec087dd822
--- /dev/null
+++ b/automation/eclair_analysis/build.sh
@@ -0,0 +1,44 @@
+#!/bin/bash
+# Stop immediately if any executed command has exit status different from 0.
+set -e
+
+script_name="$(basename "$0")"
+
+fatal() {
+  echo "${script_name}: $*" >&2
+  exit 1
+}
+
+usage() {
+  fatal "Usage: ${script_name} "
+}
+
+if [ $# -ne 1 ]; then
+  usage
+fi
+
+if [ "$1" = "X86_64" ]; then
+  export CROSS_COMPILE=
+  export XEN_TARGET_ARCH=x86_64
+elif [ "$1" = "ARM64" ]; then
+  export CROSS_COMPILE=aarch64-linux-gnu-
+  export XEN_TARGET_ARCH=arm64
+else
+  fatal "Unknown configuration: $1"
+fi
+
+if [[ -f /proc/cpuinfo ]]; then
+  PROCESSORS=$(grep -c ^processor /proc/cpuinfo)
+else
+  PROCESSORS=6
+fi
+
+(
+  cd xen
+
+  make "-j${PROCESSORS}" "-l${PROCESSORS}.0"\
+   "CROSS_COMPILE=${CROSS_COMPILE}" \
+   "CC=${CROSS_COMPILE}gcc-12"  \
+   "CXX=${CROSS_COMPILE}g++-12" \
+   "XEN_TARGET_ARCH=${XEN_TARGET_ARCH}"
+)
diff --git a/automation/eclair_analysis/prepare.sh 
b/automation/eclair_analysis/prepare.sh
new file mode 100755
index 00..275a1a3f51
--- /dev/null
+++ b/automation/eclair_analysis/prepare.sh
@@ -0,0 +1,42 @@
+#!/bin/bash
+# Stop immediately if any executed command has exit status different from 0.
+set -e
+
+script_name="$(basename "$0")"
+script_dir="$(
+  cd "$(dirname "$0")"
+  echo "${PWD}"
+)"
+
+fatal() {
+  echo "${script_name}: $*" >&2
+  exit 1
+}
+
+usage() {
+  fatal "Usage: ${script_name}"
+}
+
+if [ $# -ne 1 ]; then
+  usage
+  exit 1
+fi
+
+export XEN_TARGET_ARCH
+
+if [ "$1" = "X86_64" ]; then
+  CONFIG_FILE="${script_dir}/xen_x86_config"
+  XEN_TARGET_ARCH=x86_64
+elif [ "$1" = "ARM64" ]; then
+  CONFIG_FILE="${script_dir}/xen_arm_config"
+  XEN_TARGET_ARCH=arm64
+else
+  fatal "Unknown configuration: $1"
+fi
+
+(
+cd xen
+cp "${CONFIG_FILE}" .config
+make clean
+make -f ${script_dir}/Makefile.prepare prepare
+)
diff --git a/automation/eclair_analysis/xen_arm_config 
b/automation/eclair_analysis/xen_arm_config
new file mode 100644
index 00..82102b889e
--- /dev/null
+++ b/automation/eclair_analysis/xen_arm_config
@@ -0,0 +1,147 @@
+# File provided in
+# Re: Xen MISRA C: Source files in scope and out of scope
+# from:Stefano Stabellini 
+# date:6 giu 2023, 02:53
+
+#
+# Automatically generated file; DO NOT EDIT.
+# Xen/arm 4.18-unstable Configuration
+#
+CONFIG_CC_IS_GCC=y
+CONFIG_GCC_VERSION=90400
+CONFIG_CLANG_VERSION=0
+CONFIG_LD_IS_GNU=y
+CONFIG_CC_HAS_VISIBILITY_ATTRIBUTE=y
+CONFIG_ARM_64=y
+CONFIG_ARM=y
+CONFIG_ARCH_DEFCONFIG="arch/arm/configs/arm64_defconfig"
+
+# UBSAN
+CONFIG_UBSAN=n
+
+#
+# Architecture Features
+#
+CONFIG_ARM64_SVE=n
+CONFIG_64BIT=y
+CONFIG_NR_CPUS=4
+# CONFIG_ACPI is not set
+CONFIG_ARM_EFI=y
+CONFIG_GICV3=y
+CONFIG_HAS_ITS=y
+CONFIG_HVM=y
+# CONFIG_NEW_VGIC is not set
+CONFIG_SBSA_VUART_CONSOLE=y
+CONFIG_ARM_SSBD=y

[XEN PATCH v4 1/4] automation: Add ECLAIR utilities and settings

2023-07-26 Thread Simone Ballarin
The files with extension ecl are ECLAIR configurations that
are loaded during the analysis phase or during the report
generation phase: analysis.ecl is the main file for the analysis
phase, while reports.ecl is the one for the report phase.
All other ecl files are included by one of the two main ones.

The action* scripts implement the integration with the CI server;
they are completely general and can be amended to work with any CI
server. Their presence in xen.git is recommended so that maintenance
would be easier.

analyze.sh is the script that actually triggers the analysis.

Signed-off-by: Simone Ballarin 
Acked-by: Stefano Stabellini 

--
Changes in v4:
- avoid warning from git fetch
- add --quiet to git fetch
- add 'Acked-by'

Changes in v3:
- split ECLAIR configurations and scripts in a separate patch;
- remove references to "Task (a): Xen Coding Guidelines v1.0".

Changes in v2:
- add ECLAIR configuration files (before they were fetched from a separate
repository);
- now the pipeline fails if there are new violations of guidelines tagged
with clean:added.
---
 automation/eclair_analysis/ECLAIR/Set1.ecl|  59 
 automation/eclair_analysis/ECLAIR/Set2.ecl|  25 ++
 automation/eclair_analysis/ECLAIR/Set3.ecl|  67 
 .../eclair_analysis/ECLAIR/action.helpers | 193 
 .../eclair_analysis/ECLAIR/action.settings| 172 ++
 .../ECLAIR/action_clean_added.sh  |  36 +++
 .../eclair_analysis/ECLAIR/action_log.sh  |  15 +
 .../ECLAIR/action_pull_request.sh |  57 
 .../eclair_analysis/ECLAIR/action_push.sh |  95 ++
 .../ECLAIR/action_upload_sarif.sh |  31 ++
 .../eclair_analysis/ECLAIR/analysis.ecl   |  25 ++
 automation/eclair_analysis/ECLAIR/analyze.sh  | 106 +++
 .../ECLAIR/call_properties.ecl| 106 +++
 .../eclair_analysis/ECLAIR/deviations.ecl | 298 ++
 .../eclair_analysis/ECLAIR/out_of_scope.ecl   | 127 
 .../ECLAIR/print_analyzed_files.sh|  66 
 .../eclair_analysis/ECLAIR/public_APIs.ecl|   6 +
 automation/eclair_analysis/ECLAIR/report.ecl  |   4 +
 automation/eclair_analysis/ECLAIR/tagging.ecl |  34 ++
 .../eclair_analysis/ECLAIR/toolchain.ecl  | 275 
 20 files changed, 1797 insertions(+)
 create mode 100644 automation/eclair_analysis/ECLAIR/Set1.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/Set2.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/Set3.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/action.helpers
 create mode 100644 automation/eclair_analysis/ECLAIR/action.settings
 create mode 100755 automation/eclair_analysis/ECLAIR/action_clean_added.sh
 create mode 100755 automation/eclair_analysis/ECLAIR/action_log.sh
 create mode 100644 automation/eclair_analysis/ECLAIR/action_pull_request.sh
 create mode 100755 automation/eclair_analysis/ECLAIR/action_push.sh
 create mode 100755 automation/eclair_analysis/ECLAIR/action_upload_sarif.sh
 create mode 100644 automation/eclair_analysis/ECLAIR/analysis.ecl
 create mode 100755 automation/eclair_analysis/ECLAIR/analyze.sh
 create mode 100644 automation/eclair_analysis/ECLAIR/call_properties.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/deviations.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/out_of_scope.ecl
 create mode 100755 automation/eclair_analysis/ECLAIR/print_analyzed_files.sh
 create mode 100644 automation/eclair_analysis/ECLAIR/public_APIs.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/report.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/tagging.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/toolchain.ecl

diff --git a/automation/eclair_analysis/ECLAIR/Set1.ecl 
b/automation/eclair_analysis/ECLAIR/Set1.ecl
new file mode 100644
index 00..86b8e7e772
--- /dev/null
+++ b/automation/eclair_analysis/ECLAIR/Set1.ecl
@@ -0,0 +1,59 @@
+-doc_begin="Set 1 of Xen MISRA C guidelines"
+-enable=MC3R1.R9.1
+-enable=MC3R1.R12.5
+-enable=MC3R1.R17.3
+-enable=MC3R1.R17.4
+-enable=MC3R1.R17.6
+-enable=MC3R1.R19.1
+-enable=MC3R1.R21.13
+-enable=MC3R1.R21.17
+-enable=MC3R1.R21.18
+-enable=MC3R1.R21.19
+-enable=MC3R1.R21.20
+-enable=MC3R1.R21.21
+-enable=MC3R1.R22.2
+-enable=MC3R1.R22.4
+-enable=MC3R1.R22.5
+-enable=MC3R1.R22.6
+-enable=MC3R1.D1.1
+-enable=MC3R1.D2.1
+-enable=MC3R1.D4.1
+-enable=MC3R1.D4.3
+-enable=MC3R1.D4.7
+-enable=MC3R1.D4.10
+-enable=MC3R1.D4.11
+-enable=MC3R1.D4.14
+-enable=MC3R1.R1.1
+-enable=MC3R1.R1.3
+-enable=MC3R1.R1.4
+-enable=MC3R1.R2.1
+-enable=MC3R1.R2.2
+-enable=MC3R1.R3.1
+-enable=MC3R1.R3.2
+-enable=MC3R1.R4.1
+-enable=MC3R1.R5.1
+-enable=MC3R1.R5.2
+-enable=MC3R1.R5.3
+-enable=MC3R1.R5.4
+-enable=MC3R1.R5.6
+-enable=MC3R1.R6.1
+-enable=MC3R1.R6.2
+-enable=MC3R1.R7.1
+-enable=MC3R1.R7.2
+-enable=MC3R1.R7.3
+-enable=MC3R1.R7.4
+-enable=MC3R1.R8.1
+-enable=MC3R1.R8.2
+-enable=MC3R1.R8.3
+-enable=MC3R1.R8.4
+-enable=MC3R1.R8.5
+-enable=MC3R1.R8.6

[XEN PATCH v4 4/4] maintainers: Add ECLAIR reviewer

2023-07-26 Thread Simone Ballarin
Signed-off-by: Simone Ballarin 
Acked-by: Stefano Stabellini 
--

Changes in v4:
- add 'Acked-by'
- remove undesired blanks

Changes in v3:
- split maintainer add in a separate patch;
- substitute blanks with tabs;
- change role from maintainer to reviewer.

Changes in v2:
- add ECLAIR configuration files (before they were fetched from a separate
  repository);
- now the pipeline fails if there are new violations of guidelines tagged
  with clean:added.
---
 MAINTAINERS | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 180e57dac4..d8a02a6c19 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -305,6 +305,12 @@ F: xen/include/xen/libfdt/
 F: xen/include/xen/device_tree.h
 F: xen/drivers/passthrough/device_tree.c
 
+ECLAIR
+R: Simone Ballarin 
+S: Supported
+F: automation/eclair_analysis/
+F: automation/scripts/eclair
+
 EFI
 M: Jan Beulich 
 S: Supported
-- 
2.34.1




[XEN PATCH v4 3/4] automation: Add ECLAIR pipelines

2023-07-26 Thread Simone Ballarin
Add two pipelines that analyze an ARM64 and an X86_64 build with the
ECLAIR static analyzer on the guidelines contained in Set1.

The analysis configuration is stored in automation/eclair_analysis.

All commits on the xen-project/xen:staging branch will be analyzed
and their artifacts will be stored indefinitely; the integration will
report differential information with respect to the previous analysis.

All commits on other branches or repositories will be analyzed and
only the last ten artifacts will be kept; the integration will report
differential information with respect to the analysis done on the common
ancestor with xen-project/xen:staging (if available).

Currently the pipeline variable ENABLE_ECLAIR_BOT is set to "n".
Doing so disables the generation of comments with the analysis summary
on the commit threads. The variable can be set to "y" if a masked
variable named ECLAIR_BOT_TOKEN is set with the impersonation token of
an account with enough privileges to write on all repositories.

Additionally, any repository should be able to read a masked variable
named WTOKEN with the token provided by BUGSENG.

The analysis fails if it contains violations of guidelines tagged as
clean:added. The list of clean guidelines is maintained in
automation/eclair_analysis/ECLAIR/tagging.ecl.

Signed-off-by: Simone Ballarin 

--
Changes in v4:
- move link to the analysis results at the end of the console log

Changes in v3:
- split definitions of the ECLAIR pipelines in a separate patch;
- if the WTOKEN variable is missing now the analysis fails immediately.

Changes in v2:
- add ECLAIR configuration files (before they were fetched from a separate
repository);
- now the pipeline fails if there are new violations of guidelines tagged
with clean:added.
---
 .gitlab-ci.yml|  2 ++
 automation/gitlab-ci/analyze.yaml | 38 +++
 automation/gitlab-ci/build.yaml   |  1 +
 automation/scripts/eclair | 34 +++
 4 files changed, 75 insertions(+)
 create mode 100644 automation/gitlab-ci/analyze.yaml
 create mode 100755 automation/scripts/eclair

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index c8bd7519d5..ee5430b8b7 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -1,7 +1,9 @@
 stages:
+  - analyze
   - build
   - test
 
 include:
+  - 'automation/gitlab-ci/analyze.yaml'
   - 'automation/gitlab-ci/build.yaml'
   - 'automation/gitlab-ci/test.yaml'
diff --git a/automation/gitlab-ci/analyze.yaml 
b/automation/gitlab-ci/analyze.yaml
new file mode 100644
index 00..3d8166572b
--- /dev/null
+++ b/automation/gitlab-ci/analyze.yaml
@@ -0,0 +1,38 @@
+.eclair-analysis:
+  stage: analyze
+  tags:
+- eclair-analysis
+  variables:
+ECLAIR_OUTPUT_DIR: "ECLAIR_out"
+ANALYSIS_KIND: "normal"
+ENABLE_ECLAIR_BOT: "n"
+AUTO_PR_BRANCH: "staging"
+AUTO_PR_REPOSITORY: "xen-project/xen"
+  artifacts:
+when: always
+paths:
+  - "${ECLAIR_OUTPUT_DIR}/*.log"
+  - "${ECLAIR_OUTPUT_DIR}/*.txt"
+  - '*.log'
+reports:
+  codequality: gl-code-quality-report.json
+
+eclair-x86_64:
+  extends: .eclair-analysis
+  variables:
+LOGFILE: "eclair-x86_64.log"
+VARIANT: "X86_64"
+RULESET: "Set1"
+  script:
+- ./automation/scripts/eclair 2>&1 | tee "${LOGFILE}"
+  allow_failure: true
+
+eclair-ARM64:
+  extends: .eclair-analysis
+  variables:
+LOGFILE: "eclair-ARM64.log"
+VARIANT: "ARM64"
+RULESET: "Set1"
+  script:
+- ./automation/scripts/eclair 2>&1 | tee "${LOGFILE}"
+  allow_failure: true
diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
index c401f62d61..f01e2c32bb 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
@@ -11,6 +11,7 @@
   - '*.log'
   - '*/*.log'
 when: always
+  needs: []
   except:
 - master
 - smoke
diff --git a/automation/scripts/eclair b/automation/scripts/eclair
new file mode 100755
index 00..813a56eb6a
--- /dev/null
+++ b/automation/scripts/eclair
@@ -0,0 +1,34 @@
+#!/bin/sh -eu
+
+ECLAIR_ANALYSIS_DIR=automation/eclair_analysis
+ECLAIR_DIR="${ECLAIR_ANALYSIS_DIR}/ECLAIR"
+ECLAIR_OUTPUT_DIR=$(realpath "${ECLAIR_OUTPUT_DIR}")
+
+if [ -z "${WTOKEN:-}" ]; then
+echo "Failure: the WTOKEN variable is not defined." >&2
+exit 1
+fi
+
+"${ECLAIR_ANALYSIS_DIR}/prepare.sh" "${VARIANT}"
+
+ex=0
+"${ECLAIR_DIR}/analyze.sh" "${VARIANT}" "${RULESET}" || ex=$?
+"${ECLAIR_DIR}/action_log.sh" ANALYSIS_LOG \
+ "ECLAIR analysis log" \
+ "${ECLAIR_OUTPUT_DIR}/ANALYSIS.log" \
+ "${ex}"
+"${ECLAIR_DIR}/action_log.sh" REPORT_LOG \
+ "ECLAIR report log" \
+ "${ECLAIR_OUTPUT_DIR}/REPORT.log" \
+ "${ex}"
+[ "${ex}" = 0 ] || exit "${ex}"
+
+# Fail in case of new reports
+"${ECLAIR_DIR}/action_clean_added.sh" 

[XEN PATCH v4 0/4] automation: Add ECLAIR pipelines

2023-07-26 Thread Simone Ballarin
This patch series adds two pipelines that analyze an ARM64 and an X86_64
build with the ECLAIR static analyzer on the guidelines contained in Set1.
The builds analyzed are the ones triggered by 
automation/eclair_analysis/build.sh.

automation/eclair_analysis/ECLAIR contains the ECLAIR configuration files
(.ecl files) and scripts that implement the integration (action* scripts).

All commits on the xen-project/xen:staging branch will be analyzed
and their artifacts will be stored indefinitely; the integration will
report differential information with respect to the previous analysis.

All commits on other branches or repositories will be analyzed and
only the last ten artifacts will be kept; the integration will report
differential information with respect to the analysis done on the common
ancestor with xen-project/xen:staging (if available).

Additionally, any repository should be able to read a masked variable
named WTOKEN with the token provided by BUGSENG; otherwise the pipeline
will fail.

The analysis fails if it contains violations of guidelines tagged as
clean:added. The list of clean guidelines is maintained in
automation/eclair_analysis/ECLAIR/tagging.ecl.

--
Changes in v4:
- move link to the analysis results at the end of the console log
- avoid git warning
- avoid undesired git logs
- remove undesired blanks in MAINTAINERS

Simone Ballarin (4):
  automation: Add ECLAIR utilities and settings
  automation: Add xen builds for the ECLAIR analyses
  automation: Add ECLAIR pipelines
  maintainers: Add ECLAIR reviewer

 .gitlab-ci.yml|   2 +
 MAINTAINERS   |   6 +
 automation/eclair_analysis/ECLAIR/Set1.ecl|  59 
 automation/eclair_analysis/ECLAIR/Set2.ecl|  25 ++
 automation/eclair_analysis/ECLAIR/Set3.ecl|  67 
 .../eclair_analysis/ECLAIR/action.helpers | 193 
 .../eclair_analysis/ECLAIR/action.settings| 172 ++
 .../ECLAIR/action_clean_added.sh  |  36 +++
 .../eclair_analysis/ECLAIR/action_log.sh  |  15 +
 .../ECLAIR/action_pull_request.sh |  57 
 .../eclair_analysis/ECLAIR/action_push.sh |  95 ++
 .../ECLAIR/action_upload_sarif.sh |  31 ++
 .../eclair_analysis/ECLAIR/analysis.ecl   |  25 ++
 automation/eclair_analysis/ECLAIR/analyze.sh  | 106 +++
 .../ECLAIR/call_properties.ecl| 106 +++
 .../eclair_analysis/ECLAIR/deviations.ecl | 298 ++
 .../eclair_analysis/ECLAIR/out_of_scope.ecl   | 127 
 .../ECLAIR/print_analyzed_files.sh|  66 
 .../eclair_analysis/ECLAIR/public_APIs.ecl|   6 +
 automation/eclair_analysis/ECLAIR/report.ecl  |   4 +
 automation/eclair_analysis/ECLAIR/tagging.ecl |  34 ++
 .../eclair_analysis/ECLAIR/toolchain.ecl  | 275 
 automation/eclair_analysis/Makefile.prepare   |   6 +
 automation/eclair_analysis/build.sh   |  44 +++
 automation/eclair_analysis/prepare.sh |  42 +++
 automation/eclair_analysis/xen_arm_config | 147 +
 automation/eclair_analysis/xen_x86_config | 152 +
 automation/gitlab-ci/analyze.yaml |  38 +++
 automation/gitlab-ci/build.yaml   |   1 +
 automation/scripts/eclair |  34 ++
 30 files changed, 2269 insertions(+)
 create mode 100644 automation/eclair_analysis/ECLAIR/Set1.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/Set2.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/Set3.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/action.helpers
 create mode 100644 automation/eclair_analysis/ECLAIR/action.settings
 create mode 100755 automation/eclair_analysis/ECLAIR/action_clean_added.sh
 create mode 100755 automation/eclair_analysis/ECLAIR/action_log.sh
 create mode 100644 automation/eclair_analysis/ECLAIR/action_pull_request.sh
 create mode 100755 automation/eclair_analysis/ECLAIR/action_push.sh
 create mode 100755 automation/eclair_analysis/ECLAIR/action_upload_sarif.sh
 create mode 100644 automation/eclair_analysis/ECLAIR/analysis.ecl
 create mode 100755 automation/eclair_analysis/ECLAIR/analyze.sh
 create mode 100644 automation/eclair_analysis/ECLAIR/call_properties.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/deviations.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/out_of_scope.ecl
 create mode 100755 automation/eclair_analysis/ECLAIR/print_analyzed_files.sh
 create mode 100644 automation/eclair_analysis/ECLAIR/public_APIs.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/report.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/tagging.ecl
 create mode 100644 automation/eclair_analysis/ECLAIR/toolchain.ecl
 create mode 100644 automation/eclair_analysis/Makefile.prepare
 create mode 100755 automation/eclair_analysis/build.sh
 create mode 100755 automation/eclair_analysis/prepare.sh
 create mode 100644 automation/eclair_analysis/xen_arm_config
 create mode 

Re: [PATCH v4 2/2] xen/riscv: introduce identity mapping

2023-07-26 Thread Jan Beulich
On 26.07.2023 17:54, Oleksii wrote:
> On Wed, 2023-07-26 at 17:00 +0200, Jan Beulich wrote:
>> On 26.07.2023 15:12, Oleksii wrote:
>>> On Wed, 2023-07-26 at 13:58 +0200, Jan Beulich wrote:
>>>> On 26.07.2023 13:23, Oleksii wrote:
>>>>> I would like to ask for advice on whether it would be easier, less
>>>>> bug-provoking ( during identity mapping to remove of whole Xen ) to
>>>>> have a separate identity section that won't be more than PAGE_SIZE.
>>>>
>>>> I'm afraid you can't safely do this in C, or at least not without
>>>> further checking on what the compiler actually did.
>>>>
>>>>> @@ -264,6 +268,19 @@ void __init enable_mmu(void)
>>>>>                RV_STAGE1_MODE << SATP_MODE_SHIFT);
>>>>>  }
>>>>>  
>>>>> +void __attribute__((naked)) __section(".ident") turn_on_mmu(unsigned long ra)
>>>>
>>>> Did you read what gcc doc says about "naked"? Extended asm() isn't
>>>> supported there. Since ...
>>>>
>>>>> +{
>>>>> +    /* Ensure page table writes precede loading the SATP */
>>>>> +    sfence_vma();
>>>>> +
>>>>> +    /* Enable the MMU and load the new pagetable for Xen */
>>>>> +    csr_write(CSR_SATP,
>>>>> +              PFN_DOWN((unsigned long)stage1_pgtbl_root) |
>>>>> +              RV_STAGE1_MODE << SATP_MODE_SHIFT);
>>>>> +
>>>>> +    asm volatile( "jr %0\n" : : "r"(ra) );
>>>>> +}
>>>>
>>>> ... none of this really requires C, I think we're at the point where
>>>> (iirc) Andrew's and my suggestion wants following, moving this to
>>>> assembly code (at which point it doesn't need to be a separate
>>>> function). You can still build page tables in C, of course. (Likely
>>>> you then also won't need a separate section; some minimal alignment
>>>> guarantees ought to suffice to make sure the critical code is
>>>> confined to a single page.)
>>>
>>> Thanks. I'll move all of this to assembly code.
>>> Regarding alignment, the function would need to be aligned at both
>>> its start and its end:
>>>     .balign PAGE_SIZE
>>>     GLOBAL(turn_on_mmu)
>>>     ...
>>>     .balign PAGE_SIZE
>>>     ENDPROC(turn_on_mmu)
>>>
>>> Is there a better way?
>>
>> The function is only going to be a handful of instructions. Its
>> alignment doesn't need to be larger than the next power of 2. I
>> expect you'll be good with 64-byte alignment. (In no case do you
>> need to align the end of the function: Putting other stuff there
>> is not a problem at all.) What you want in any event is a build
>> time check that the within-a-page constraint is met.
> But shouldn't the address be aligned to a boundary equal to the page
> size?
> 
> According to the RISC-V privileged spec:
> Any level of PTE may be a leaf PTE, so in addition to 4 KiB pages, Sv39
> supports 2 MiB megapages
> and 1 GiB gigapages, each of which must be virtually and physically
> aligned to a boundary equal
> to its size. A page-fault exception is raised if the physical address
> is insufficiently aligned.

You'd simply map the page containing the chunk, i.e. masking off the
low 12 bits. If far enough away from the Xen virtual range, you could
as well map a 2M page masking off the low 21 bits, or a 1G page with
the low 30 bits of the address cleared.

Jan
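
As a rough illustration of the masking described above, the sketch below
computes the base of the 4 KiB, 2 MiB or 1 GiB region containing an address
by clearing its low 12, 21 or 30 bits. The helper name map_base(), the
SKETCH_* macros and the sample address are assumptions made for this
example; they are not taken from the series.

```c
#include <stdint.h>

/* Shift values matching the three Sv39 leaf sizes named in the thread. */
#define SKETCH_SHIFT_4K 12
#define SKETCH_SHIFT_2M 21
#define SKETCH_SHIFT_1G 30

/*
 * Hypothetical helper: return the base of the naturally aligned region of
 * size (1 << shift) that contains addr, i.e. addr with its low bits cleared.
 */
static inline uint64_t map_base(uint64_t addr, unsigned int shift)
{
    return addr & ~((UINT64_C(1) << shift) - 1);
}
```

For the sample address 0x80201234 the three shifts give 0x80201000,
0x80200000 and 0x80000000, so the code itself never has to be aligned to the
size of the mapping that covers it.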



[linux-linus test] 182011: regressions - FAIL

2023-07-26 Thread osstest service owner
flight 182011 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/182011/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-xsm               6 xen-build                 fail REGR. vs. 180278
 test-amd64-amd64-xl-credit2  20 guest-localmigrate/x10    fail REGR. vs. 180278
 test-amd64-amd64-xl-credit1  20 guest-localmigrate/x10    fail REGR. vs. 180278
 test-arm64-arm64-libvirt-raw 13 guest-start               fail REGR. vs. 180278
 test-arm64-arm64-xl-vhd      13 guest-start               fail REGR. vs. 180278
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install        fail REGR. vs. 180278
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 20 guest-start/debianhvm.repeat fail REGR. vs. 180278

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm       1 build-check(1)            blocked n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)            blocked n/a
 test-armhf-armhf-libvirt     16 saverestore-support-check fail blocked in 180278
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail blocked in 180278
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail blocked in 180278
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop                fail like 180278
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop                fail like 180278
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop                fail like 180278
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop                fail like 180278
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2  fail like 180278
 test-amd64-amd64-libvirt     15 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-vhd      14 migrate-support-check     fail never pass
 test-armhf-armhf-xl-vhd      15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt     15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl          15 migrate-support-check     fail never pass
 test-arm64-arm64-xl          16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check     fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl          15 migrate-support-check     fail never pass
 test-armhf-armhf-xl          16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds     15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-rtds     16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check     fail never pass

version targeted for testing:
 linux                0b5547c51827e053cc754db47d3ec3e6c2c451d2
baseline version:
 linux                6c538e1adbfc696ac4747fb10d63e704344f763d

Last test of basis   180278  2023-04-16 19:41:46 Z  100 days
Failing since        180281  2023-04-17 06:24:36 Z  100 days  189 attempts
Testing same since   182011  2023-07-25 10:46:27 Z    1 days    1 attempts


3820 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  fail
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass

Re: [PATCH v4 2/2] xen/riscv: introduce identity mapping

2023-07-26 Thread Oleksii
On Wed, 2023-07-26 at 17:00 +0200, Jan Beulich wrote:
> On 26.07.2023 15:12, Oleksii wrote:
> > On Wed, 2023-07-26 at 13:58 +0200, Jan Beulich wrote:
> > > On 26.07.2023 13:23, Oleksii wrote:
> > > > I would like to ask for advice on whether it would be easier,
> > > > less
> > > > bug-
> > > > provoking ( during identity mapping to remove of whole Xen ) to
> > > > have a
> > > > separate identity section that won't be more than PAGE_SIZE.
> > > 
> > > I'm afraid you can't safely do this in C, or at least not without
> > > further checking on what the compiler actually did.
> > > 
> > > > @@ -264,6 +268,19 @@ void __init enable_mmu(void)
> > > >                RV_STAGE1_MODE << SATP_MODE_SHIFT);
> > > >  }
> > > >  
> > > > +void __attribute__((naked)) __section(".ident")
> > > > turn_on_mmu(unsigned
> > > > long ra)
> > > 
> > > Did you read what gcc doc says about "naked"? Extended asm()
> > > isn't
> > > supported there. Since ...
> > > 
> > > > +{
> > > > +    /* Ensure page table writes precede loading the SATP */
> > > > +    sfence_vma();
> > > > +
> > > > +    /* Enable the MMU and load the new pagetable for Xen */
> > > > +    csr_write(CSR_SATP,
> > > > +              PFN_DOWN((unsigned long)stage1_pgtbl_root) |
> > > > +              RV_STAGE1_MODE << SATP_MODE_SHIFT);
> > > > +
> > > > +    asm volatile( "jr %0\n" : : "r"(ra) );
> > > > +}
> > > 
> > > ... none of this really requires C, I think we're at the point
> > > where
> > > (iirc) Andrew's and my suggestion wants following, moving this to
> > > assembly code (at which point it doesn't need to be a separate
> > > function). You can still build page tables in C, of course.
> > > (Likely
> > > you then also won't need a separate section; some minimal
> > > alignment
> > > guarantees ought to suffice to make sure the critical code is
> > > confined to a single page.)
> > 
> > Thanks. I'll move all of this to assembly code.
> > Regarding alignment it is needed alignment on start and end of
> > function:
> >     .balign PAGE_SIZE
> >     GLOBAL(turn_on_mmu)
> >     ...
> >     .balign PAGE_SIZE
> >     ENDPROC(turn_on_mmu)
> > 
> > Does the better way exist?
> 
> The function is only going to be a handful of instructions. Its
> alignment doesn't need to be larger than the next power of 2. I
> expect you'll be good with 64-byte alignment. (In no case do you
> need to align the end of the function: Putting other stuff there
> is not a problem at all.) What you want in any event is a build
> time check that the within-a-page constraint is met.
But shouldn't the address be aligned to a boundary equal to the page
size?

According to the RISC-V privileged spec:
Any level of PTE may be a leaf PTE, so in addition to 4 KiB pages, Sv39
supports 2 MiB megapages
and 1 GiB gigapages, each of which must be virtually and physically
aligned to a boundary equal
to its size. A page-fault exception is raised if the physical address
is insufficiently aligned.

~ Oleksii
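
The within-a-page constraint mentioned in the quoted text can be written as
a simple predicate. The sketch below is only an illustration in C: the names
within_one_page() and SKETCH_PAGE_SHIFT are assumptions, the addresses are
made up, and a real build-time check would more likely live in the linker
script or in the assembly source.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* 4 KiB pages */

/*
 * Hypothetical helper: true when the byte range [start, end) does not cross
 * a 4 KiB boundary, i.e. its first and last byte share a page frame number.
 */
static inline bool within_one_page(uint64_t start, uint64_t end)
{
    return end > start &&
           ((start ^ (end - 1)) >> SKETCH_PAGE_SHIFT) == 0;
}
```

A chunk of at most 64 bytes that starts on a 64-byte boundary always passes
this test, because 4096 is a multiple of 64; that is why page alignment of
the function itself is not required for the constraint to hold.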



Re: [PATCH v5 3/4] xen/ppc: Implement early serial printk on pseries

2023-07-26 Thread Shawn Anastasio
On 7/26/23 10:45 AM, Jan Beulich wrote:
> On 26.07.2023 17:42, Shawn Anastasio wrote:
>> On 7/26/23 10:32 AM, Jan Beulich wrote:
>>> On 24.07.2023 17:06, Shawn Anastasio wrote:
>>>> On 7/24/23 7:40 AM, Jan Beulich wrote:
>>>>> On 21.07.2023 19:02, Shawn Anastasio wrote:
>>>>>> On typical Power VMs (e.g. QEMU's -M pseries), a variety of services
>>>>>> including an early serial console are provided by Open Firmware.
>>>>>> Implement the required interfaces to call into Open Firmware and write
>>>>>> to the serial console.
>>>>>>
>>>>>> Since Open Firmware runs in 32-bit Big Endian mode and Xen runs in
>>>>>> 64-bit Little Endian mode, a thunk is required to save/restore
>>>>>> any potentially-clobbered registers as well as to perform the
>>>>>> required endianness switch. Thankfully, linux already has such
>>>>>> a routine, which was imported into ppc64/of-call.S.
>>>>>>
>>>>>> Support for bare metal (PowerNV) will be implemented in a future
>>>>>> patch.
>>>>>>
>>>>>> Signed-off-by: Shawn Anastasio 
>>>>>
>>>>> While I've committed the earlier two patches, I had to back out this
>>>>> one. In my environment (gcc13) the build fails due to an unresolved
>>>>> reference to memset() out of boot-of.c (supposedly from of_call()).
>>>>
>>>> Does removing the `{ 0 }` initializer to `struct of_service s` on line
>>>> 43 resolve this?
>>>
>>> Yes, that's what's causing the call (and removing, whether or not correct,
>>> helps).
>>
>> Thanks for confirming. Removing it should be fine since the code
>> manually initializes all of the other fields of the struct. The only
>> behavioral difference is that the members of `ofs_args` at indices >=
>> nargs would be left uninitialized. This shouldn't be an issue though
>> since we're guarding reads of the array on `nargs` and `nrets` and thus
>> only read explicitly initialized values (and of course, firmware would
>> do the same).
>>
>> Naturally we can't avoid memset calls forever. I have lib/ building
>> locally, but if we could get this series in without having to make those
>> changes here that'd be great.
> 
> Are you suggesting I should put in this patch almost as is, with just
> that initializer dropped?

Yes. I've tested the change locally and it still behaves correctly, so
if dropping it removes the memset invocation on your toolchain then I'd
say the patch is fine to go with that initializer dropped.

I could also submit a v6 with the initializer dropped, if you'd be more
comfortable with that.

> Jan

Thanks,
Shawn



Re: [PATCH v5 3/4] xen/ppc: Implement early serial printk on pseries

2023-07-26 Thread Jan Beulich
On 26.07.2023 17:42, Shawn Anastasio wrote:
> On 7/26/23 10:32 AM, Jan Beulich wrote:
>> On 24.07.2023 17:06, Shawn Anastasio wrote:
>>> On 7/24/23 7:40 AM, Jan Beulich wrote:
>>>> On 21.07.2023 19:02, Shawn Anastasio wrote:
>>>>> On typical Power VMs (e.g. QEMU's -M pseries), a variety of services
>>>>> including an early serial console are provided by Open Firmware.
>>>>> Implement the required interfaces to call into Open Firmware and write
>>>>> to the serial console.
>>>>>
>>>>> Since Open Firmware runs in 32-bit Big Endian mode and Xen runs in
>>>>> 64-bit Little Endian mode, a thunk is required to save/restore
>>>>> any potentially-clobbered registers as well as to perform the
>>>>> required endianness switch. Thankfully, linux already has such
>>>>> a routine, which was imported into ppc64/of-call.S.
>>>>>
>>>>> Support for bare metal (PowerNV) will be implemented in a future
>>>>> patch.
>>>>>
>>>>> Signed-off-by: Shawn Anastasio 
>>>>
>>>> While I've committed the earlier two patches, I had to back out this
>>>> one. In my environment (gcc13) the build fails due to an unresolved
>>>> reference to memset() out of boot-of.c (supposedly from of_call()).
>>>
>>> Does removing the `{ 0 }` initializer to `struct of_service s` on line
>>> 43 resolve this?
>>
>> Yes, that's what's causing the call (and removing, whether or not correct,
>> helps).
> 
> Thanks for confirming. Removing it should be fine since the code
> manually initializes all of the other fields of the struct. The only
> behavioral difference is that the members of `ofs_args` at indices >=
> nargs would be left uninitialized. This shouldn't be an issue though
> since we're guarding reads of the array on `nargs` and `nrets` and thus
> only read explicitly initialized values (and of course, firmware would
> do the same).
> 
> Naturally we can't avoid memset calls forever. I have lib/ building
> locally, but if we could get this series in without having to make those
> changes here that'd be great.

Are you suggesting I should put in this patch almost as is, with just
that initializer dropped?

Jan



Re: [PATCH v3 6/8] RISC-V: annotate entry points with type and size

2023-07-26 Thread Jan Beulich
On 26.07.2023 17:28, Oleksii wrote:
> On Mon, 2023-07-10 at 10:58 +0200, Jan Beulich wrote:
>> On 10.07.2023 10:56, Jan Beulich wrote:
>>> Use the generic framework in xen/linkage.h. No change in generated
>>> code
>>> except of course the converted symbols change to be hidden ones and
>>> gain
>>> a valid size.
>>>
>>> Signed-off-by: Jan Beulich 
>>
>> I'm sorry, the Cc list was incomplete here. Adding back the remaining
>> REST
>> maintainers.
>>
>> Jan
>>
>>> ---
>>> Probably count_args_exp() should move to macros.h, but I first
>>> wanted to
>>> see whether anyone can suggest any better approach for checking
>>> whether
>>> a defined macro expands to nothing.
> What about introducing conditional macros?
> Something similar to:
> #include 
> 
> #define CONDITIONAL_RETURN(arg1, arg2) CONDITIONAL_RETURN_IMPL(arg1,
> arg2, EMPTY)
> 
> #define EMPTY(...) ""
> 
> #define CONDITIONAL_RETURN_IMPL(arg1, arg2, empty_check) \
> CONDITIONAL_RETURN_##empty_check(arg1, arg2)
> 
> #define CONDITIONAL_RETURN_EMPTY(arg1, arg2) \
> CONDITIONAL_RETURN_ARG1(arg1, arg2)
> 
> #define CONDITIONAL_RETURN_ARG1(arg1, arg2) arg1, arg2
> 
> #define CONDITIONAL_RETURN_ARG2(arg1, arg2) arg1

I don't see how this would be used in your scheme. It ...

> int main() {
> int a = 42;
> const char* b = "hello";
> 
> // Second argument is not empty, both arguments are returned
> printf("Case 1: %d, %s\n", CONDITIONAL_RETURN(a, b));  // Prints:
> Case 1: 42, hello
> 
> // Second argument is empty, only the first argument is returned
> printf("Case 2: %d, %s\n", CONDITIONAL_RETURN(a, "")); // Prints:
> Case 2: 42,

... certainly isn't here, or this likely would cause at least a warning
from the compiler (for there being too few arguments to printf()) and
then a runtime UB for interpreting something as a pointer to a string
which likely isn't.

> return 0;
> }
> 
> and then define DO_CODE_ALIGN using CONDITIONAL_RETURN?

Afaict instead of getting rid of the comma, you'd actually add ""
after it. What am I missing?

Jan



Re: [PATCH v5 3/4] xen/ppc: Implement early serial printk on pseries

2023-07-26 Thread Shawn Anastasio
On 7/26/23 10:32 AM, Jan Beulich wrote:
> On 24.07.2023 17:06, Shawn Anastasio wrote:
>> On 7/24/23 7:40 AM, Jan Beulich wrote:
>>> On 21.07.2023 19:02, Shawn Anastasio wrote:
>>>> On typical Power VMs (e.g. QEMU's -M pseries), a variety of services
>>>> including an early serial console are provided by Open Firmware.
>>>> Implement the required interfaces to call into Open Firmware and write
>>>> to the serial console.
>>>>
>>>> Since Open Firmware runs in 32-bit Big Endian mode and Xen runs in
>>>> 64-bit Little Endian mode, a thunk is required to save/restore
>>>> any potentially-clobbered registers as well as to perform the
>>>> required endianness switch. Thankfully, linux already has such
>>>> a routine, which was imported into ppc64/of-call.S.
>>>>
>>>> Support for bare metal (PowerNV) will be implemented in a future
>>>> patch.
>>>>
>>>> Signed-off-by: Shawn Anastasio 
>>>
>>> While I've committed the earlier two patches, I had to back out this
>>> one. In my environment (gcc13) the build fails due to an unresolved
>>> reference to memset() out of boot-of.c (supposedly from of_call()).
>>
>> Does removing the `{ 0 }` initializer to `struct of_service s` on line
>> 43 resolve this?
> 
> Yes, that's what's causing the call (and removing, whether or not correct,
> helps).

Thanks for confirming. Removing it should be fine since the code
manually initializes all of the other fields of the struct. The only
behavioral difference is that the members of `ofs_args` at indices >=
nargs would be left uninitialized. This shouldn't be an issue though
since we're guarding reads of the array on `nargs` and `nrets` and thus
only read explicitly initialized values (and of course, firmware would
do the same).

Naturally we can't avoid memset calls forever. I have lib/ building
locally, but if we could get this series in without having to make those
changes here that'd be great.

> Jan

Thanks,
Shawn
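
The argument above (explicitly initialized fields plus reads guarded by the
argument count) can be sketched as follows. The struct layout, field names
and helper names are hypothetical stand-ins chosen for this example and are
not the actual boot-of.c code; the point is only that slots at indices >=
nargs are never read, so leaving them uninitialized is harmless.

```c
#include <stdint.h>

#define OF_SKETCH_MAX_ARGS 10   /* hypothetical capacity */

/* Hypothetical stand-in for the firmware argument block. */
struct of_service_sketch {
    uint32_t service;
    uint32_t nargs;
    uint32_t nrets;
    uint32_t args[OF_SKETCH_MAX_ARGS];
};

/*
 * Set every scalar field and the first nargs slots by hand, so the caller
 * does not need an "= { 0 }" aggregate initializer on the struct.
 */
static inline void of_sketch_fill(struct of_service_sketch *s,
                                  uint32_t service, const uint32_t *args,
                                  uint32_t nargs, uint32_t nrets)
{
    uint32_t i;

    s->service = service;
    s->nargs = nargs;
    s->nrets = nrets;
    for ( i = 0; i < nargs && i < OF_SKETCH_MAX_ARGS; i++ )
        s->args[i] = args[i];
}

/* Readers stay below nargs, so untouched slots are never looked at. */
static inline uint32_t of_sketch_sum_args(const struct of_service_sketch *s)
{
    uint32_t i, sum = 0;

    for ( i = 0; i < s->nargs && i < OF_SKETCH_MAX_ARGS; i++ )
        sum += s->args[i];

    return sum;
}
```

Whether a given compiler still emits a library call for such code depends on
the toolchain and flags, which is presumably why the thread treats a proper
memset() in lib/ as the longer-term answer.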




Re: [PATCH v2] vpci: add permission checks to map_range()

2023-07-26 Thread Rahul Singh
Hi Roger,

> On 26 Jul 2023, at 3:01 pm, Roger Pau Monne  wrote:
> 
> Just like it's done for the XEN_DOMCTL_memory_mapping hypercall, add
> the permissions checks to vPCI map_range(), which is used to map the
> BARs into the domain p2m.
> 
> Adding those checks requires that for x86 PVH hardware domain builder
> the permissions are set before initializing the IOMMU, or else
> attempts to initialize vPCI done as part of IOMMU device setup will
> fail due to missing permissions to create the BAR mappings.
> 
> While moving the call to dom0_setup_permissions() convert the panic()
> used for error handling to a printk, the caller will already panic if
> required.
> 
> Fixes: 9c244fdef7e7 ('vpci: add header handlers')
> Signed-off-by: Roger Pau Monné 

I tested the patch on an ARM board with vPCI enabled; everything works.

Reviewed-by: Rahul Singh 
Tested-by: Rahul Singh 

Regards,
Rahul



Re: [PATCH v5 3/4] xen/ppc: Implement early serial printk on pseries

2023-07-26 Thread Jan Beulich
On 24.07.2023 17:06, Shawn Anastasio wrote:
> On 7/24/23 7:40 AM, Jan Beulich wrote:
>> On 21.07.2023 19:02, Shawn Anastasio wrote:
>>> On typical Power VMs (e.g. QEMU's -M pseries), a variety of services
>>> including an early serial console are provided by Open Firmware.
>>> Implement the required interfaces to call into Open Firmware and write
>>> to the serial console.
>>>
>>> Since Open Firmware runs in 32-bit Big Endian mode and Xen runs in
>>> 64-bit Little Endian mode, a thunk is required to save/restore
>>> any potentially-clobbered registers as well as to perform the
>>> required endianness switch. Thankfully, linux already has such
>>> a routine, which was imported into ppc64/of-call.S.
>>>
>>> Support for bare metal (PowerNV) will be implemented in a future
>>> patch.
>>>
>>> Signed-off-by: Shawn Anastasio 
>>
>> While I've committed the earlier two patches, I had to back out this
>> one. In my environment (gcc13) the build fails due to an unresolved
>> reference to memset() out of boot-of.c (supposedly from of_call()).
> 
> Does removing the `{ 0 }` initializer to `struct of_service s` on line
> 43 resolve this?

Yes, that's what's causing the call (and removing, whether or not correct,
helps).

Jan



Re: [PATCH v3 6/8] RISC-V: annotate entry points with type and size

2023-07-26 Thread Oleksii
On Mon, 2023-07-10 at 10:58 +0200, Jan Beulich wrote:
> On 10.07.2023 10:56, Jan Beulich wrote:
> > Use the generic framework in xen/linkage.h. No change in generated
> > code
> > except of course the converted symbols change to be hidden ones and
> > gain
> > a valid size.
> > 
> > Signed-off-by: Jan Beulich 
> 
> I'm sorry, the Cc list was incomplete here. Adding back the remaining
> REST
> maintainers.
> 
> Jan
> 
> > ---
> > Probably count_args_exp() should move to macros.h, but I first
> > wanted to
> > see whether anyone can suggest any better approach for checking
> > whether
> > a defined macro expands to nothing.
What about introducing conditional macros?
Something similar to:
#include 

#define CONDITIONAL_RETURN(arg1, arg2) CONDITIONAL_RETURN_IMPL(arg1,
arg2, EMPTY)

#define EMPTY(...) ""

#define CONDITIONAL_RETURN_IMPL(arg1, arg2, empty_check) \
CONDITIONAL_RETURN_##empty_check(arg1, arg2)

#define CONDITIONAL_RETURN_EMPTY(arg1, arg2) \
CONDITIONAL_RETURN_ARG1(arg1, arg2)

#define CONDITIONAL_RETURN_ARG1(arg1, arg2) arg1, arg2

#define CONDITIONAL_RETURN_ARG2(arg1, arg2) arg1

int main() {
int a = 42;
const char* b = "hello";

// Second argument is not empty, both arguments are returned
printf("Case 1: %d, %s\n", CONDITIONAL_RETURN(a, b));  // Prints:
Case 1: 42, hello

// Second argument is empty, only the first argument is returned
printf("Case 2: %d, %s\n", CONDITIONAL_RETURN(a, "")); // Prints:
Case 2: 42,

return 0;
}

and then define DO_CODE_ALIGN using CONDITIONAL_RETURN?


~ Oleksii

> > ---
> > v3: New.
> > 
> > --- a/xen/arch/riscv/entry.S
> > +++ b/xen/arch/riscv/entry.S
> > @@ -5,7 +5,7 @@
> >  #include 
> >  
> >  /* WIP: only works while interrupting Xen context */
> > -ENTRY(handle_trap)
> > +FUNC(handle_trap)
> >  
> >  /* Exceptions from xen */
> >  save_to_stack:
> > @@ -92,3 +92,4 @@ restore_registers:
> >  REG_L   sp, CPU_USER_REGS_SP(sp)
> >  
> >  sret
> > +END(handle_trap)
> > --- a/xen/arch/riscv/include/asm/asm.h
> > +++ b/xen/arch/riscv/include/asm/asm.h
> > @@ -7,6 +7,7 @@
> >  #define _ASM_RISCV_ASM_H
> >  
> >  #ifdef __ASSEMBLY__
> > +#include 
> >  #define __ASM_STR(x)   x
> >  #else
> >  #define __ASM_STR(x)   #x
> > --- a/xen/arch/riscv/include/asm/config.h
> > +++ b/xen/arch/riscv/include/asm/config.h
> > @@ -67,12 +67,8 @@
> >  
> >  /* Linkage for RISCV */
> >  #ifdef __ASSEMBLY__
> > -#define ALIGN .align 4
> > -
> > -#define ENTRY(name)    \
> > -  .globl name; \
> > -  ALIGN;   \
> > -  name:
> > +#define CODE_ALIGN 16
> > +#define CODE_FILL /* empty */
> >  #endif
> >  
> >  #ifdef CONFIG_RISCV_64
> > --- a/xen/arch/riscv/riscv64/head.S
> > +++ b/xen/arch/riscv/riscv64/head.S
> > @@ -8,7 +8,7 @@
> >   *   a0 -> hart_id ( bootcpu_id )
> >   *   a1 -> dtb_base 
> >   */
> > -ENTRY(start)
> > +FUNC(start)
> >  /* Mask all interrupts */
> >  csrw    CSR_SIE, zero
> >  
> > @@ -30,13 +30,14 @@ ENTRY(start)
> >  jal reset_stack
> >  
> >  tail    start_xen
> > +END(start)
> >  
> >  .section .text, "ax", %progbits
> >  
> > -ENTRY(reset_stack)
> > +FUNC(reset_stack)
> >  la  sp, cpu0_boot_stack
> >  li  t0, STACK_SIZE
> >  add sp, sp, t0
> >  
> >  ret
> > -
> > +END(reset_stack)
> > --- a/xen/include/xen/linkage.h
> > +++ b/xen/include/xen/linkage.h
> > @@ -37,17 +37,28 @@
> >  
> >  #define END(name) .size name, . - name
> >  
> > +/*
> > + * CODE_FILL in particular may need to expand to nothing (e.g. for
> > RISC-V), in
> > + * which case we also need to get rid of the comma in the .balign
> > directive.
> > + */
> > +#define count_args_exp(args...) count_args(args)
> > +#if count_args_exp(CODE_FILL)
> > +# define DO_CODE_ALIGN(algn...) LASTARG(CODE_ALIGN, ## algn),
> > CODE_FILL
> > +#else
> > +# define DO_CODE_ALIGN(algn...) LASTARG(CODE_ALIGN, ## algn)
> > +#endif
> > +
> >  #define FUNC(name, algn...) \
> > -    SYM(name, FUNC, GLOBAL, LASTARG(CODE_ALIGN, ## algn),
> > CODE_FILL)
> > +    SYM(name, FUNC, GLOBAL, DO_CODE_ALIGN(algn))
> >  #define LABEL(name, algn...) \
> > -    SYM(name, NONE, GLOBAL, LASTARG(CODE_ALIGN, ## algn),
> > CODE_FILL)
> > +    SYM(name, NONE, GLOBAL, DO_CODE_ALIGN(algn))
> >  #define DATA(name, algn...) \
> >  SYM(name, DATA, GLOBAL, LASTARG(DATA_ALIGN, ## algn),
> > DATA_FILL)
> >  
> >  #define FUNC_LOCAL(name, algn...) \
> > -    SYM(name, FUNC, LOCAL, LASTARG(CODE_ALIGN, ## algn),
> > CODE_FILL)
> > +    SYM(name, FUNC, LOCAL, DO_CODE_ALIGN(algn))
> >  #define LABEL_LOCAL(name, algn...) \
> > -    SYM(name, NONE, LOCAL, LASTARG(CODE_ALIGN, ## algn),
> > CODE_FILL)
> > +    SYM(name, NONE, LOCAL, DO_CODE_ALIGN(algn))
> >  #define DATA_LOCAL(name, algn...) \
> >  SYM(name, DATA, LOCAL, LASTARG(DATA_ALIGN, 

[xen-unstable-smoke test] 182024: tolerable all pass - PUSHED

2023-07-26 Thread osstest service owner
flight 182024 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/182024/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check fail never pass
 test-armhf-armhf-xl          15 migrate-support-check     fail never pass
 test-armhf-armhf-xl          16 saverestore-support-check fail never pass

version targeted for testing:
 xen  3d2d4ea026df73c37a7df7e216443cbf652ff892
baseline version:
 xen  1f8a6a99b225d34cf608f47b2938092e310f9e03

Last test of basis   182015  2023-07-25 22:02:06 Z    0 days
Testing same since   182024  2023-07-26 13:01:24 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Anthony PERARD 
  Jan Beulich 
  Roger Pau Monne 
  Roger Pau Monné 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   1f8a6a99b2..3d2d4ea026  3d2d4ea026df73c37a7df7e216443cbf652ff892 -> smoke



Re: [PATCH] libxl: Add missing libxl__virtio_devtype to device_type_tbl array

2023-07-26 Thread Oleksandr Tyshchenko


On 26.07.23 17:50, Jan Beulich wrote:

Hello Jan


> On 26.07.2023 16:14, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko 
>>
>> Without it being present it won't be possible to use some
>> libxl__device_type's callbacks for virtio devices as the common code
>> can only invoke these callbacks (by dereferencing a pointer) for valid
>> libxl__device_type's elements when iterating over device_type_tbl[].
>>
>> Signed-off-by: Oleksandr Tyshchenko 
>> ---
>>   tools/libs/light/libxl_create.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/tools/libs/light/libxl_create.c 
>> b/tools/libs/light/libxl_create.c
>> index 393c535579..c91059d713 100644
>> --- a/tools/libs/light/libxl_create.c
>> +++ b/tools/libs/light/libxl_create.c
>> @@ -1887,6 +1887,7 @@ const libxl__device_type *device_type_tbl[] = {
>>   __dtdev_devtype,
>>   __vdispl_devtype,
>>   __vsnd_devtype,
>> +__virtio_devtype,
>>   NULL
>>   };
> 
>  From description and nature of the change this looks like a Fixes:
> tag would be warranted.

Looks like, yes. Thanks.

I guess, this should point to the commit that introduced 
libxl__virtio_devtype

Fixes: 43ba5202e2ee ('libxl: add support for generic virtio device')


> 
> Jan

Re: [PATCH v4 2/2] xen/riscv: introduce identity mapping

2023-07-26 Thread Jan Beulich
On 26.07.2023 15:12, Oleksii wrote:
> On Wed, 2023-07-26 at 13:58 +0200, Jan Beulich wrote:
>> On 26.07.2023 13:23, Oleksii wrote:
>>> I would like to ask for advice on whether it would be easier, less
>>> bug-provoking ( during identity mapping to remove of whole Xen ) to
>>> have a separate identity section that won't be more than PAGE_SIZE.
>>
>> I'm afraid you can't safely do this in C, or at least not without
>> further checking on what the compiler actually did.
>>
>>> @@ -264,6 +268,19 @@ void __init enable_mmu(void)
>>>                RV_STAGE1_MODE << SATP_MODE_SHIFT);
>>>  }
>>>  
>>> +void __attribute__((naked)) __section(".ident")
>>> turn_on_mmu(unsigned
>>> long ra)
>>
>> Did you read what gcc doc says about "naked"? Extended asm() isn't
>> supported there. Since ...
>>
>>> +{
>>> +    /* Ensure page table writes precede loading the SATP */
>>> +    sfence_vma();
>>> +
>>> +    /* Enable the MMU and load the new pagetable for Xen */
>>> +    csr_write(CSR_SATP,
>>> +              PFN_DOWN((unsigned long)stage1_pgtbl_root) |
>>> +              RV_STAGE1_MODE << SATP_MODE_SHIFT);
>>> +
>>> +    asm volatile( "jr %0\n" : : "r"(ra) );
>>> +}
>>
>> ... none of this really requires C, I think we're at the point where
>> (iirc) Andrew's and my suggestion wants following, moving this to
>> assembly code (at which point it doesn't need to be a separate
>> function). You can still build page tables in C, of course. (Likely
>> you then also won't need a separate section; some minimal alignment
>> guarantees ought to suffice to make sure the critical code is
>> confined to a single page.)
> 
> Thanks. I'll move all of this to assembly code.
> Regarding alignment, the function needs to be aligned at both its
> start and its end:
> .balign PAGE_SIZE
> GLOBAL(turn_on_mmu)
> ...
> .balign PAGE_SIZE
> ENDPROC(turn_on_mmu)
> 
> Is there a better way?

The function is only going to be a handful of instructions. Its
alignment doesn't need to be larger than the next power of 2. I
expect you'll be good with 64-byte alignment. (In no case do you
need to align the end of the function: Putting other stuff there
is not a problem at all.) What you want in any event is a build
time check that the within-a-page constraint is met.

Jan
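
A minimal sketch of the within-a-page condition mentioned above, written in C purely for illustration. It assumes 4 KiB pages; `DEMO_PAGE_SIZE` and `within_one_page()` are hypothetical names, not Xen identifiers, and in a real build the same condition would more likely be asserted at link time on the addresses of the start and end symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. Hypothetical name, not a Xen macro. */
#define DEMO_PAGE_SIZE ((uintptr_t)4096)

/*
 * Return true when the half-open range [start, end) is non-empty and
 * does not cross a page boundary, i.e. its first and last bytes share
 * the same page frame.
 */
static bool within_one_page(uintptr_t start, uintptr_t end)
{
    if (end <= start)
        return false;

    return ((start ^ (end - 1)) & ~(DEMO_PAGE_SIZE - 1)) == 0;
}
```

With 64-byte alignment and a body of at most 64 bytes the condition holds for any placement, because 64 divides the page size; the check only matters once the code grows past its alignment.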



Re: [PATCH v2] vpci: add permission checks to map_range()

2023-07-26 Thread Jan Beulich
On 26.07.2023 16:01, Roger Pau Monne wrote:
> Just like it's done for the XEN_DOMCTL_memory_mapping hypercall, add
> the permissions checks to vPCI map_range(), which is used to map the
> BARs into the domain p2m.
> 
> Adding those checks requires that for x86 PVH hardware domain builder
> the permissions are set before initializing the IOMMU, or else
> attempts to initialize vPCI done as part of IOMMU device setup will
> fail due to missing permissions to create the BAR mappings.
> 
> While moving the call to dom0_setup_permissions() convert the panic()
> used for error handling to a printk, the caller will already panic if
> required.
> 
> Fixes: 9c244fdef7e7 ('vpci: add header handlers')
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Jan Beulich 





Re: [XEN PATCH] xen/kernel: change parameter name in add_taint() definition

2023-07-26 Thread Jan Beulich
On 26.07.2023 15:58, Federico Serafini wrote:
> Change parameter name from 'flag' to 'taint' for consistency with
> the corresponding declaration.
> This addresses a violation of MISRA C:2012 Rule 8.3: "All declarations
> of an object or function shall use the same names and type qualifiers".
> 
> No functional changes.
> 
> Signed-off-by: Federico Serafini 

Acked-by: Jan Beulich 
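
For readers unfamiliar with Rule 8.3, a minimal sketch of what the rule asks for: the declaration and the definition spell the parameter the same way. The flag values below are hypothetical and exist only for the demonstration; they are not Xen's taint flags.

```c
/* Hypothetical flag values, for illustration only. */
#define DEMO_TAINT_A (1u << 0)
#define DEMO_TAINT_B (1u << 2)

/* Declaration, as it might appear in a header. */
void add_taint(unsigned int taint);

static unsigned int tainted;

/* Definition: the parameter name matches the declaration above. */
void add_taint(unsigned int taint)
{
    tainted |= taint;
}
```

Calling `add_taint(DEMO_TAINT_A)` followed by `add_taint(DEMO_TAINT_B)` leaves both bits set in `tainted`; the rename changes nothing about behaviour, only about how the prototype and the body read side by side.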





Re: [PATCH] libxl: Add missing libxl__virtio_devtype to device_type_tbl array

2023-07-26 Thread Jan Beulich
On 26.07.2023 16:14, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko 
> 
> Without it being present it won't be possible to use some
> libxl__device_type's callbacks for virtio devices as the common code
> can only invoke these callbacks (by dereferencing a pointer) for valid
> libxl__device_type's elements when iterating over device_type_tbl[].
> 
> Signed-off-by: Oleksandr Tyshchenko 
> ---
>  tools/libs/light/libxl_create.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
> index 393c535579..c91059d713 100644
> --- a/tools/libs/light/libxl_create.c
> +++ b/tools/libs/light/libxl_create.c
> @@ -1887,6 +1887,7 @@ const libxl__device_type *device_type_tbl[] = {
>  __dtdev_devtype,
>  __vdispl_devtype,
>  __vsnd_devtype,
> +__virtio_devtype,
>  NULL
>  };

From description and nature of the change this looks like a Fixes:
tag would be warranted.

Jan
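
To illustrate why a missing table entry matters, here is a simplified sketch of iterating a NULL-terminated table of device types and invoking an optional callback on each. All names (`struct dev_type`, `run_callbacks()`, the two tables) are hypothetical stand-ins, not libxl's actual definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified device type with one optional callback. */
struct dev_type {
    const char *name;
    int (*set_default)(void);
};

static int calls;                       /* counts callback invocations */
static int count_call(void) { calls++; return 0; }

static const struct dev_type vsnd_type   = { "vsnd",   count_call };
static const struct dev_type virtio_type = { "virtio", count_call };

/* Table with the entry present, and one where it was forgotten. */
static const struct dev_type *const tbl_complete[] = {
    &vsnd_type, &virtio_type, NULL
};
static const struct dev_type *const tbl_missing[] = {
    &vsnd_type, NULL
};

/* Walk the table up to the NULL terminator; return callbacks invoked. */
static int run_callbacks(const struct dev_type *const *tbl)
{
    int invoked = 0;

    for (size_t i = 0; tbl[i] != NULL; i++) {
        if (tbl[i]->set_default != NULL) {
            tbl[i]->set_default();
            invoked++;
        }
    }
    return invoked;
}
```

A type that exists but is absent from the table is simply never visited, so none of its callbacks run; nothing fails loudly, which is why this kind of omission is easy to miss.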



Re: [PATCH mm-unstable v7 00/31] Split ptdesc from struct page

2023-07-26 Thread Matthew Wilcox
On Mon, Jul 24, 2023 at 09:41:36PM -0700, Hugh Dickins wrote:
> On Mon, 24 Jul 2023, Vishal Moola (Oracle) wrote:
> 
> > The MM subsystem is trying to shrink struct page. This patchset
> > introduces a memory descriptor for page table tracking - struct ptdesc.
> > 
> > This patchset introduces ptdesc, splits ptdesc from struct page, and
> > converts many callers of page table constructor/destructors to use ptdescs.
> > 
> > Ptdesc is a foundation to further standardize page tables, and eventually
> > allow for dynamic allocation of page tables independent of struct page.
> > However, the use of pages for page table tracking is quite deeply
> > ingrained and varied across architectures, so there is still a lot of
> > work to be done before that can happen.
> 
> Others may differ, but it remains the case that I see no point to this
> patchset, until the minimal descriptor that replaces struct page is
> working, and struct page then becomes just overhead.  Until that time,
> let architectures continue to use struct page as they do - whyever not?

Because it's easier for architecture maintainers to understand what they
should and shouldn't be using.  Look at the definition:

+struct ptdesc {
+   unsigned long __page_flags;
+
+   union {
+   struct rcu_head pt_rcu_head;
+   struct list_head pt_list;
+   struct {
+   unsigned long _pt_pad_1;
+   pgtable_t pmd_huge_pte;
+   };
+   };
+   unsigned long __page_mapping;
+
+   union {
+   struct mm_struct *pt_mm;
+   atomic_t pt_frag_refcount;
+   };
+
+   union {
+   unsigned long _pt_pad_2;
+#if ALLOC_SPLIT_PTLOCKS
+   spinlock_t *ptl;
+#else
+   spinlock_t ptl;
+#endif
+   };
+   unsigned int __page_type;
+   atomic_t _refcount;
+#ifdef CONFIG_MEMCG
+   unsigned long pt_memcg_data;
+#endif
+};

It's still a 31-line struct definition, I'll grant you.  But it's far
easier to comprehend than the definition of struct page (~140 lines).
An architecture maintainer can look at it and see what might be available,
and what is already used.  And hopefully we'll have less "Oh, I'll just
use page->private".  It's really not fair to expect arch maintainers to
learn so much about the mm.

It's still messier than I would like, but I don't think we can do better
for now.  I don't understand why you're so interested in delaying doing
this work.
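
As an aside, one way a narrower descriptor can be kept layout-compatible with the larger structure it overlays is a set of compile-time offset checks. The sketch below uses toy structures and a hypothetical `TOY_MATCH()` macro; it is not the kernel's definition of either struct.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the large structure. */
struct toy_page {
    unsigned long flags;
    void *lru_next;
    void *lru_prev;
    unsigned long private_data;
};

/* Hypothetical narrower view exposing only page-table related fields. */
struct toy_ptdesc {
    unsigned long pt_flags;
    void *pt_list_next;
    void *pt_list_prev;
    unsigned long pt_private;
};

/* Fail the build if an overlaid field drifts to a different offset. */
#define TOY_MATCH(pg, pt)                                          \
    _Static_assert(offsetof(struct toy_page, pg) ==                \
                   offsetof(struct toy_ptdesc, pt),                \
                   "field offset mismatch")

TOY_MATCH(flags, pt_flags);
TOY_MATCH(lru_next, pt_list_next);
TOY_MATCH(lru_prev, pt_list_prev);
TOY_MATCH(private_data, pt_private);

_Static_assert(sizeof(struct toy_ptdesc) <= sizeof(struct toy_page),
               "descriptor must fit inside the page structure");
```

The point of the narrower type is readability: code that only ever sees `struct toy_ptdesc` cannot reach for fields that were never meant for it.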



[PATCH] libxl: Add missing libxl__virtio_devtype to device_type_tbl array

2023-07-26 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

Without it being present it won't be possible to use some
libxl__device_type's callbacks for virtio devices as the common code
can only invoke these callbacks (by dereferencing a pointer) for valid
libxl__device_type's elements when iterating over device_type_tbl[].

Signed-off-by: Oleksandr Tyshchenko 
---
 tools/libs/light/libxl_create.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index 393c535579..c91059d713 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -1887,6 +1887,7 @@ const libxl__device_type *device_type_tbl[] = {
 __dtdev_devtype,
 __vdispl_devtype,
 __vsnd_devtype,
+__virtio_devtype,
 NULL
 };
 
-- 
2.34.1




Re: [PATCH] vpci: add permission checks to map_range()

2023-07-26 Thread Jan Beulich
On 26.07.2023 15:37, Roger Pau Monné wrote:
> On Wed, Jul 26, 2023 at 02:36:17PM +0200, Jan Beulich wrote:
>> Another Dom0 related concern can probably be put off until we actually
>> get a report of this failing (which may be more likely because of the
>> XSM check below): The function being used as a callback passed to
>> rangeset_consume_ranges(), failure may affect just a single BAR, while
>> the incoming range may cover multiple of them in one go. Depending on
>> what functionality such a BAR covers, the device may remain usable (a
>> typical example of what I'm thinking of is a multi-function device
>> having serial and/or parallel port on it, which are fine to be driven
>> via I/O ports even if driving via MMIO is possible [and would likely
>> be more efficient]). Of course, to allow some MMIO bars to be used
>> while prohibiting use of some others, further trickery may be needed.
>> But not exposing the device to Dom0 at all doesn't seem very nice in
>> such a case.
> 
> Hm, I see.  For dom0 we might want to consider ignoring mapping
> failures, the problem is that we would need to narrow down the pages
> not allowed to be mapped, as part of the range passed to map_range()
> might be allowed.  We would need to resort to checking permissions on
> a page by page basis, which is not overly nice.

Right, all of which is why I prefixed the whole paragraph with "can
probably be put off until ...".

> I think it's more likely for such BARs to be marked as read-only
> (instead of denying access), in which case the checking here would
> still be OK.

Maybe, but (a) granting r/o access may still be beyond what an XSM
policy intends and (b) might be a problem when reads have side
effects.

Jan



[PATCH v2] vpci: add permission checks to map_range()

2023-07-26 Thread Roger Pau Monne
Just like it's done for the XEN_DOMCTL_memory_mapping hypercall, add
the permissions checks to vPCI map_range(), which is used to map the
BARs into the domain p2m.

Adding those checks requires that for x86 PVH hardware domain builder
the permissions are set before initializing the IOMMU, or else
attempts to initialize vPCI done as part of IOMMU device setup will
fail due to missing permissions to create the BAR mappings.

While moving the call to dom0_setup_permissions() convert the panic()
used for error handling to a printk, the caller will already panic if
required.

Fixes: 9c244fdef7e7 ('vpci: add header handlers')
Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Fix printk calls.
 - Expand commit message.
---
I'm unsure whether on ARM MMIO permissions are properly set for the
hardware domain, but I don't have a system to test with.
---
 xen/arch/x86/hvm/dom0_build.c | 21 ++---
 xen/drivers/vpci/header.c | 20 
 2 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index a7ae9c3b046e..bc0e290db612 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -715,13 +715,6 @@ static int __init pvh_setup_cpus(struct domain *d, paddr_t 
entry,
 return rc;
 }
 
-rc = dom0_setup_permissions(d);
-if ( rc )
-{
-panic("Unable to setup Dom0 permissions: %d\n", rc);
-return rc;
-}
-
 update_domain_wallclock_time(d);
 
 v->is_initialised = 1;
@@ -1184,6 +1177,20 @@ int __init dom0_construct_pvh(struct domain *d, const 
module_t *image,
 
 printk(XENLOG_INFO "*** Building a PVH Dom%d ***\n", d->domain_id);
 
+if ( is_hardware_domain(d) )
+{
+/*
+ * Setup permissions early so that calls to add MMIO regions to the
+ * p2m as part of vPCI setup don't fail due to permission checks.
+ */
+rc = dom0_setup_permissions(d);
+if ( rc )
+{
+printk("%pd unable to setup permissions: %d\n", d, rc);
+return rc;
+}
+}
+
 /*
  * NB: MMCFG initialization needs to be performed before iommu
  * initialization so the iommu code can fetch the MMCFG regions used by the
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index b41556d00746..60f7049e3498 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -17,10 +17,13 @@
  * License along with this program; If not, see .
  */
 
+#include 
 #include 
 #include 
 #include 
 
+#include 
+
 #include 
 #include 
 
@@ -43,6 +46,23 @@ static int cf_check map_range(
 {
 unsigned long size = e - s + 1;
 
+if ( !iomem_access_permitted(map->d, s, e) )
+{
+printk(XENLOG_G_WARNING
+   "%pd denied access to MMIO range [%#lx, %#lx]\n",
+   map->d, s, e);
+return -EPERM;
+}
+
+rc = xsm_iomem_mapping(XSM_HOOK, map->d, s, e, map->map);
+if ( rc )
+{
+printk(XENLOG_G_WARNING
+   "%pd XSM denied access to MMIO range [%#lx, %#lx]: %d\n",
+   map->d, s, e, rc);
+return rc;
+}
+
 /*
  * ARM TODOs:
  * - On ARM whether the memory is prefetchable or not should be passed
-- 
2.41.0




[XEN PATCH] xen/kernel: change parameter name in add_taint() definition

2023-07-26 Thread Federico Serafini
Change parameter name from 'flag' to 'taint' for consistency with
the corresponding declaration.
This addresses a violation of MISRA C:2012 Rule 8.3: "All declarations
of an object or function shall use the same names and type qualifiers".

No functional changes.

Signed-off-by: Federico Serafini 
---
 xen/common/kernel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 719b08d6c7..fb919f3d9c 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -397,9 +397,9 @@ char *print_tainted(char *str)
 return str;
 }
 
-void add_taint(unsigned int flag)
+void add_taint(unsigned int taint)
 {
-tainted |= flag;
+tainted |= taint;
 }
 
 extern const initcall_t __initcall_start[], __presmp_initcall_end[],
-- 
2.34.1




Re: [PATCH] vpci: add permission checks to map_range()

2023-07-26 Thread Roger Pau Monné
On Wed, Jul 26, 2023 at 02:36:17PM +0200, Jan Beulich wrote:
> On 24.07.2023 17:37, Roger Pau Monne wrote:
> > @@ -1184,6 +1177,20 @@ int __init dom0_construct_pvh(struct domain *d, 
> > const module_t *image,
> >  
> >  printk(XENLOG_INFO "*** Building a PVH Dom%d ***\n", d->domain_id);
> >  
> > +if ( is_hardware_domain(d) )
> > +{
> > +/*
> > + * Setup permissions early so that calls to add MMIO regions to the
> > + * p2m as part of vPCI setup don't fail due to permission checks.
> > + */
> > +rc = dom0_setup_permissions(d);
> > +if ( rc )
> > +{
> > +printk("%pd unable to setup permissions: %d\n", d, rc);
> 
> The switch from panic() to printk() may want mentioning in the description
> as deliberate. (The usefulness of %pd here is debatable, as it can't be
> other than Dom0. But I don't mind.)

The printk just above uses Dom%d, so I assumed it was best to not
hardcode 0 here either.

> > @@ -43,6 +46,21 @@ static int cf_check map_range(
> >  {
> >  unsigned long size = e - s + 1;
> >  
> > +if ( !iomem_access_permitted(map->d, s, e) )
> > +{
> > +gprintk(XENLOG_WARNING,
> > +"%pd denied access to MMIO range [%#lx, %#lx]\n", s, 
> > e);
> 
> This doesn't look like it would compile. Also gprintk() logs current,
> which I'm not sure is generally applicable here. IOW I think it wants
> to be
> 
> printk(XENLOG_G_WARNING,
>"%pd denied access to MMIO range [%#lx, %#lx]\n",
>map->d, s, e);
> 
> Same for the other log message then.

Oh great.  I changed the format of those before sending and didn't
rebuild properly.

> Another Dom0 related concern can probably be put off until we actually
> get a report of this failing (which may be more likely because of the
> XSM check below): The function being used as a callback passed to
> rangeset_consume_ranges(), failure may affect just a single BAR, while
> the incoming range may cover multiple of them in one go. Depending on
> what functionality such a BAR covers, the device may remain usable (a
> typical example of what I'm thinking of is a multi-function device
> having serial and/or parallel port on it, which are fine to be driven
> via I/O ports even if driving via MMIO is possible [and would likely
> be more efficient]). Of course, to allow some MMIO bars to be used
> while prohibiting use of some others, further trickery may be needed.
> But not exposing the device to Dom0 at all doesn't seem very nice in
> such a case.

Hm, I see.  For dom0 we might want to consider ignoring mapping
failures, the problem is that we would need to narrow down the pages
not allowed to be mapped, as part of the range passed to map_range()
might be allowed.  We would need to resort to checking permissions on
a page by page basis, which is not overly nice.

I think it's more likely for such BARs to be marked as read-only
(instead of denying access), in which case the checking here would
still be OK.

Thanks, Roger.
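
A rough sketch of the page-by-page narrowing discussed above: given an inclusive range of page frames and a per-page permission predicate, split the range into the maximal runs that are allowed. `split_permitted()`, `struct pfn_range`, and `demo_permitted()` are hypothetical names for illustration; the predicate stands in for whatever per-page check would really be used and is not Xen's API.

```c
#include <stdbool.h>
#include <stddef.h>

struct pfn_range {
    unsigned long s, e;                 /* inclusive bounds */
};

typedef bool (*permit_fn)(unsigned long pfn);

/* Demo predicate: everything is allowed except frames 3 and 4. */
static bool demo_permitted(unsigned long pfn)
{
    return pfn != 3 && pfn != 4;
}

/*
 * Split [s, e] (s <= e) into maximal permitted sub-ranges, storing at
 * most max of them in out. Returns the number of sub-ranges stored.
 */
static size_t split_permitted(unsigned long s, unsigned long e,
                              permit_fn permitted,
                              struct pfn_range *out, size_t max)
{
    size_t n = 0;
    bool in_run = false;

    for (unsigned long pfn = s; ; pfn++) {
        if (permitted(pfn)) {
            if (!in_run) {
                if (n == max)
                    return n;           /* no room for another run */
                out[n].s = pfn;
                in_run = true;
            }
            out[n].e = pfn;             /* extend the current run */
        } else if (in_run) {
            in_run = false;
            n++;                        /* close the current run */
        }
        if (pfn == e)
            break;                      /* stop before pfn can wrap */
    }

    return in_run ? n + 1 : n;
}
```

The cost is one predicate call per page, which is the "not overly nice" part: a large BAR means many checks, where the range-based check does a single lookup.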



Re: [PATCH v4 2/2] xen/riscv: introduce identity mapping

2023-07-26 Thread Oleksii
On Wed, 2023-07-26 at 13:58 +0200, Jan Beulich wrote:
> On 26.07.2023 13:23, Oleksii wrote:
> > I would like to ask for advice on whether it would be easier, less
> > bug-provoking ( during identity mapping to remove of whole Xen ) to
> > have a separate identity section that won't be more than PAGE_SIZE.
> 
> I'm afraid you can't safely do this in C, or at least not without
> further checking on what the compiler actually did.
> 
> > @@ -264,6 +268,19 @@ void __init enable_mmu(void)
> >                RV_STAGE1_MODE << SATP_MODE_SHIFT);
> >  }
> >  
> > +void __attribute__((naked)) __section(".ident")
> > turn_on_mmu(unsigned
> > long ra)
> 
> Did you read what gcc doc says about "naked"? Extended asm() isn't
> supported there. Since ...
> 
> > +{
> > +    /* Ensure page table writes precede loading the SATP */
> > +    sfence_vma();
> > +
> > +    /* Enable the MMU and load the new pagetable for Xen */
> > +    csr_write(CSR_SATP,
> > +              PFN_DOWN((unsigned long)stage1_pgtbl_root) |
> > +              RV_STAGE1_MODE << SATP_MODE_SHIFT);
> > +
> > +    asm volatile( "jr %0\n" : : "r"(ra) );
> > +}
> 
> ... none of this really requires C, I think we're at the point where
> (iirc) Andrew's and my suggestion wants following, moving this to
> assembly code (at which point it doesn't need to be a separate
> function). You can still build page tables in C, of course. (Likely
> you then also won't need a separate section; some minimal alignment
> guarantees ought to suffice to make sure the critical code is
> confined to a single page.)

Thanks. I'll move all of this to assembly code.
Regarding alignment, the function needs to be aligned at both its
start and its end:
.balign PAGE_SIZE
GLOBAL(turn_on_mmu)
...
.balign PAGE_SIZE
ENDPROC(turn_on_mmu)

Is there a better way?

~ Oleksii



[qemu-mainline test] 182006: tolerable FAIL - PUSHED

2023-07-26 Thread osstest service owner
flight 182006 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/182006/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt     16 saverestore-support-check    fail  like 181952
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check    fail  like 181952
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop                   fail  like 181952
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check    fail  like 181952
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop                   fail  like 181952
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop                   fail  like 181952
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop                   fail  like 181952
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2     fail  like 181952
 test-amd64-i386-xl-pvshim    14 guest-start                  fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt      15 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check    fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-rtds     15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-vhd      14 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-vhd      15 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check        fail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      15 saverestore-support-check    fail   never pass

version targeted for testing:
 qemuu  885fc169f09f5915ce037263d20a59eb226d473d
baseline version:
 qemuu  d1181d29370a4318a9f11ea92065bea6bb159f83

Last test of basis   181952  2023-07-21 10:07:07 Z    5 days
Testing same since   181993  2023-07-24 13:37:28 Z    1 days    2 attempts


People who touched revisions under test:
  Alistair Francis 
  Anton Johansson 
  Anushree Mathur 
  Benjamin Gray 
  Bin Meng 
  Conor Dooley 
  Ilya Leoshkevich 
  Jordan Niethe 
  Luca Bonissi 
  Michael Tokarev 
  Peter Maydell 
  Richard Henderson 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 

[PATCH v3 3/4] iommu/vtd: rename io_apic_read_remap_rte() local variable

2023-07-26 Thread Roger Pau Monne
Preparatory change to unify the IO-APIC pin variable name between
io_apic_read_remap_rte() and amd_iommu_ioapic_update_ire(), so that
the local variable can be made a function parameter with the same name
across vendors.

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - New in this version.
---
 xen/drivers/passthrough/vtd/intremap.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/intremap.c 
b/xen/drivers/passthrough/vtd/intremap.c
index 786388b4d9c7..05df6d5759b1 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -441,14 +441,14 @@ unsigned int cf_check io_apic_read_remap_rte(
 void cf_check io_apic_write_remap_rte(
 unsigned int apic, unsigned int reg, unsigned int value)
 {
-unsigned int ioapic_pin = (reg - 0x10) / 2;
+unsigned int pin = (reg - 0x10) / 2;
 struct IO_xAPIC_route_entry old_rte = { };
 struct IO_APIC_route_remap_entry *remap_rte;
 unsigned int rte_upper = (reg & 1) ? 1 : 0;
 struct vtd_iommu *iommu = ioapic_to_iommu(IO_APIC_ID(apic));
 int saved_mask;
 
-old_rte = __ioapic_read_entry(apic, ioapic_pin, true);
+old_rte = __ioapic_read_entry(apic, pin, true);
 
 remap_rte = (struct IO_APIC_route_remap_entry *) &old_rte;
 
@@ -458,7 +458,7 @@ void cf_check io_apic_write_remap_rte(
 __io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
 remap_rte->mask = saved_mask;
 
-if ( ioapic_rte_to_remap_entry(iommu, apic, ioapic_pin,
+if ( ioapic_rte_to_remap_entry(iommu, apic, pin,
&old_rte, rte_upper, value) )
 {
 __io_apic_write(apic, reg, value);
@@ -468,7 +468,7 @@ void cf_check io_apic_write_remap_rte(
 __io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
 }
 else
-__ioapic_write_entry(apic, ioapic_pin, true, old_rte);
+__ioapic_write_entry(apic, pin, true, old_rte);
 }
 
 static void set_msi_source_id(struct pci_dev *pdev, struct iremap_entry *ire)
-- 
2.41.0




[PATCH v3 4/4] x86/iommu: pass full IO-APIC RTE for remapping table update

2023-07-26 Thread Roger Pau Monne
So that the remapping entry can be updated atomically when possible.

Doing such update atomically will avoid Xen having to mask the IO-APIC
pin prior to performing any interrupt movements (ie: changing the
destination and vector fields), as the interrupt remapping entry is
always consistent.

This also simplifies some of the logic on both VT-d and AMD-Vi
implementations, as having the full RTE available instead of half of
it avoids to possibly read and update the missing other half from
hardware.

While there remove the explicit zeroing of new_ire fields in
ioapic_rte_to_remap_entry() and initialize the variable at definition
so all fields are zeroed.  Note fields could be also initialized with
final values at definition, but I found that likely too much to be
done at this time.

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Remove unneeded initialization.
 - Use 'rte' as parameter name for update_ire_from_apic()
   implementations.
 - Fix comment style in ioapic_rte_to_remap_entry().
 - Fix requirement for atomic write in update_irte() call from
   ioapic_rte_to_remap_entry().
 - Remove remap_rte from io_apic_write_remap_rte().
---
Note that certain combination of changes to the RTE are impossible to
handle atomically. For example changing the vector and/or destination
fields together with the triggering mode is impossible to be performed
atomically (as the destination and vector is set in the IRTE, but the
triggering mode is set in the RTE).  Xen doesn't attempt to perform
such changes in a single update to the RTE anyway, so it's fine.

Naming the iommu_update_ire_from_apic() parameter RTE is not really
correct, as the format of the passed value expands the destination
field to be 32bits (in order to fit an x2APIC ID).  Passing an
IO_APIC_route_entry struct is not possible due to the circular
dependency that would create between io_apic.h and iommu.h.  It might
be possible to move IO_APIC_route_entry declaration to a different
header, but I haven't looked into it.
---
 xen/arch/x86/include/asm/iommu.h |   3 +-
 xen/arch/x86/io_apic.c   |   5 +-
 xen/drivers/passthrough/amd/iommu.h  |   2 +-
 xen/drivers/passthrough/amd/iommu_intr.c | 100 ++---
 xen/drivers/passthrough/vtd/extern.h |   2 +-
 xen/drivers/passthrough/vtd/intremap.c   | 136 +++
 xen/drivers/passthrough/x86/iommu.c  |   4 +-
 xen/include/xen/iommu.h  |   3 +-
 8 files changed, 86 insertions(+), 169 deletions(-)

diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/iommu.h
index 0540cd9faa87..eb720205e25e 100644
--- a/xen/arch/x86/include/asm/iommu.h
+++ b/xen/arch/x86/include/asm/iommu.h
@@ -84,7 +84,8 @@ struct iommu_init_ops {
 
 extern const struct iommu_init_ops *iommu_init_ops;
 
-void iommu_update_ire_from_apic(unsigned int apic, unsigned int reg, unsigned 
int value);
+void iommu_update_ire_from_apic(unsigned int apic, unsigned int pin,
+uint64_t rte);
 unsigned int iommu_read_apic_from_ire(unsigned int apic, unsigned int reg);
 int iommu_setup_hpet_msi(struct msi_desc *);
 
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index 041233b9b706..b3afef8933d7 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -275,10 +275,7 @@ void __ioapic_write_entry(
 __io_apic_write(apic, 0x10 + 2 * pin, eu.w1);
 }
 else
-{
-iommu_update_ire_from_apic(apic, 0x11 + 2 * pin, eu.w2);
-iommu_update_ire_from_apic(apic, 0x10 + 2 * pin, eu.w1);
-}
+iommu_update_ire_from_apic(apic, pin, e.raw);
 }
 
 static void ioapic_write_entry(
diff --git a/xen/drivers/passthrough/amd/iommu.h 
b/xen/drivers/passthrough/amd/iommu.h
index 8bc3c35b1bb1..5429ada58ef5 100644
--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -300,7 +300,7 @@ int cf_check amd_iommu_free_intremap_table(
 unsigned int amd_iommu_intremap_table_order(
 const void *irt, const struct amd_iommu *iommu);
 void cf_check amd_iommu_ioapic_update_ire(
-unsigned int apic, unsigned int reg, unsigned int value);
+unsigned int apic, unsigned int pin, uint64_t rte);
 unsigned int cf_check amd_iommu_read_ioapic_from_ire(
 unsigned int apic, unsigned int reg);
 int cf_check amd_iommu_msi_msg_update_ire(
diff --git a/xen/drivers/passthrough/amd/iommu_intr.c 
b/xen/drivers/passthrough/amd/iommu_intr.c
index f32c418a7e49..e83a2a932af8 100644
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -247,11 +247,6 @@ static void update_intremap_entry(const struct amd_iommu 
*iommu,
 }
 }
 
-static inline int get_rte_index(const struct IO_APIC_route_entry *rte)
-{
-return rte->vector | (rte->delivery_mode << 8);
-}
-
 static inline void set_rte_index(struct IO_APIC_route_entry *rte, int offset)
 {
 rte->vector = (u8)offset;
@@ -267,7 +262,6 @@ static int update_intremap_entry_from_ioapic(
 

[PATCH v3 2/4] x86/ioapic: RTE modifications must use ioapic_write_entry

2023-07-26 Thread Roger Pau Monne
Do not allow to write to RTE registers using io_apic_write and instead
require changes to RTE to be performed using ioapic_write_entry.

This is in preparation for passing the full contents of the RTE to the
IOMMU interrupt remapping handlers, so remapping entries for IO-APIC
RTEs can be updated atomically when possible.

While immediately this commit might expand the number of MMIO accesses
in order to update an IO-APIC RTE, further changes will benefit from
getting the full RTE value passed to the IOMMU handlers, as the logic
is greatly simplified when the IOMMU handlers can get the complete RTE
value in one go.
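
For readers following along, here is a minimal sketch (not Xen code) of how a
pin maps onto the two 32-bit RTE registers and of the kind of check the patch
adds. The helper names rte_reg_lo(), rte_reg_hi(), is_rte_reg() and
rte_pin_of_reg() are hypothetical; only the "0x10 + 2 * pin" layout, the
"(reg - 0x10) / 2" conversion and the "reg >= 0x10" test are taken from the
diffs in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* First IO-APIC register index that belongs to the redirection table. */
#define RTE_REG_BASE 0x10u

/* Register holding the low 32 bits of a pin's RTE (vector, mask, ...). */
static inline unsigned int rte_reg_lo(unsigned int pin)
{
    return RTE_REG_BASE + 2 * pin;
}

/* Register holding the high 32 bits of a pin's RTE (destination). */
static inline unsigned int rte_reg_hi(unsigned int pin)
{
    return RTE_REG_BASE + 2 * pin + 1;
}

/* True if the register index targets an RTE rather than ID/version/etc. */
static inline bool is_rte_reg(unsigned int reg)
{
    return reg >= RTE_REG_BASE;
}

/* Recover the pin number from either of its two RTE register indices. */
static inline unsigned int rte_pin_of_reg(unsigned int reg)
{
    assert(is_rte_reg(reg));
    return (reg - RTE_REG_BASE) / 2;
}
```

With such a split, a plain register writer can refuse any index for which
is_rte_reg() is true, which is what the BUG_ON() in the patch enforces.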

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Also adjust __ioapic_read_entry().

Changes since v1:
 - Reinstate io_apic_modify().
 - Expand commit message.
---
 xen/arch/x86/include/asm/io_apic.h   |  8 ++---
 xen/arch/x86/io_apic.c   | 43 
 xen/drivers/passthrough/amd/iommu_intr.c |  6 
 3 files changed, 25 insertions(+), 32 deletions(-)

diff --git a/xen/arch/x86/include/asm/io_apic.h 
b/xen/arch/x86/include/asm/io_apic.h
index 4c4777b68a51..9165da2281ae 100644
--- a/xen/arch/x86/include/asm/io_apic.h
+++ b/xen/arch/x86/include/asm/io_apic.h
@@ -162,8 +162,8 @@ static inline void __io_apic_write(unsigned int apic, 
unsigned int reg, unsigned
 
 static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned 
int value)
 {
-if ( ioapic_reg_remapped(reg) )
-return iommu_update_ire_from_apic(apic, reg, value);
+/* RTE writes must use ioapic_write_entry. */
+BUG_ON(reg >= 0x10);
 __io_apic_write(apic, reg, value);
 }
 
@@ -173,8 +173,8 @@ static inline void io_apic_write(unsigned int apic, 
unsigned int reg, unsigned i
  */
 static inline void io_apic_modify(unsigned int apic, unsigned int reg, 
unsigned int value)
 {
-if ( ioapic_reg_remapped(reg) )
-return iommu_update_ire_from_apic(apic, reg, value);
+/* RTE writes must use ioapic_write_entry. */
+BUG_ON(reg >= 0x10);
 *(IO_APIC_BASE(apic) + 4) = value;
 }
 
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index aada2ef96c62..041233b9b706 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -237,15 +237,15 @@ struct IO_APIC_route_entry __ioapic_read_entry(
 {
 union entry_union eu;
 
-if ( raw )
+if ( raw || !iommu_intremap )
 {
 eu.w1 = __io_apic_read(apic, 0x10 + 2 * pin);
 eu.w2 = __io_apic_read(apic, 0x11 + 2 * pin);
 }
 else
 {
-eu.w1 = io_apic_read(apic, 0x10 + 2 * pin);
-eu.w2 = io_apic_read(apic, 0x11 + 2 * pin);
+eu.w1 = iommu_read_apic_from_ire(apic, 0x10 + 2 * pin);
+eu.w2 = iommu_read_apic_from_ire(apic, 0x11 + 2 * pin);
 }
 
 return eu.entry;
@@ -269,15 +269,15 @@ void __ioapic_write_entry(
 {
 union entry_union eu = { .entry = e };
 
-if ( raw )
+if ( raw || !iommu_intremap )
 {
 __io_apic_write(apic, 0x11 + 2 * pin, eu.w2);
 __io_apic_write(apic, 0x10 + 2 * pin, eu.w1);
 }
 else
 {
-io_apic_write(apic, 0x11 + 2 * pin, eu.w2);
-io_apic_write(apic, 0x10 + 2 * pin, eu.w1);
+iommu_update_ire_from_apic(apic, 0x11 + 2 * pin, eu.w2);
+iommu_update_ire_from_apic(apic, 0x10 + 2 * pin, eu.w1);
 }
 }
 
@@ -433,16 +433,17 @@ static void modify_IO_APIC_irq(unsigned int irq, unsigned 
int enable,
unsigned int disable)
 {
 struct irq_pin_list *entry = irq_2_pin + irq;
-unsigned int pin, reg;
 
 for (;;) {
-pin = entry->pin;
+unsigned int pin = entry->pin;
+struct IO_APIC_route_entry rte;
+
 if (pin == -1)
 break;
-reg = io_apic_read(entry->apic, 0x10 + pin*2);
-reg &= ~disable;
-reg |= enable;
-io_apic_modify(entry->apic, 0x10 + pin*2, reg);
+rte = __ioapic_read_entry(entry->apic, pin, false);
+rte.raw &= ~(uint64_t)disable;
+rte.raw |= enable;
+__ioapic_write_entry(entry->apic, pin, false, rte);
 if (!entry->next)
 break;
 entry = irq_2_pin + entry->next;
@@ -584,16 +585,16 @@ set_ioapic_affinity_irq(struct irq_desc *desc, const 
cpumask_t *mask)
 dest = SET_APIC_LOGICAL_ID(dest);
 entry = irq_2_pin + irq;
 for (;;) {
-unsigned int data;
+struct IO_APIC_route_entry rte;
+
 pin = entry->pin;
 if (pin == -1)
 break;
 
-io_apic_write(entry->apic, 0x10 + 1 + pin*2, dest);
-data = io_apic_read(entry->apic, 0x10 + pin*2);
-data &= ~IO_APIC_REDIR_VECTOR_MASK;
-data |= MASK_INSR(desc->arch.vector, IO_APIC_REDIR_VECTOR_MASK);
-io_apic_modify(entry->apic, 0x10 + pin*2, data);
+rte = __ioapic_read_entry(entry->apic, pin, false);
+rte.dest.dest32 = dest;
+rte.vector = 

[PATCH v3 0/4] x86/ioapic: fix edge triggered interrupt migration

2023-07-26 Thread Roger Pau Monne
Hello,

The following series attempts to solve the issue with IO-APIC edge triggered
interrupts seeing an inconsistent IRTE when injected while being
migrated.

It's been simplified a bit from the original RFC, and does make the
logic in the IOMMU RTE update handlers cleaner, as those get passed the
full RTE.

When not using interrupt remapping the logic is left as-is.  One further
change would be to mask edge triggered interrupts, as destination and
vector cannot be changed atomically when dealing with plain IO-APIC
RTEs.
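
To illustrate why destination and vector cannot change atomically on a plain
IO-APIC, here is a small stand-alone model (not Xen code). It assumes the
xAPIC RTE format, with the vector in bits 0-7 and the destination in bits
56-63, and models the RTE as two 32-bit registers written one after the
other. struct rte_regs, make_rte() and update_split() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* A plain IO-APIC RTE as seen through its two 32-bit registers. */
struct rte_regs {
    uint32_t lo; /* bits 0-7: vector */
    uint32_t hi; /* bits 24-31 of this word: destination */
};

/* Combine both halves into the 64-bit RTE value. */
static uint64_t rte_value(const struct rte_regs *r)
{
    return ((uint64_t)r->hi << 32) | r->lo;
}

/* Build an RTE value from a vector and an 8-bit destination. */
static uint64_t make_rte(uint8_t vector, uint8_t dest)
{
    return ((uint64_t)dest << 56) | vector;
}

/*
 * Update the RTE with two separate 32-bit writes, high word first, and
 * return the value that is visible between the two writes.
 */
static uint64_t update_split(struct rte_regs *r, uint64_t new_rte)
{
    uint64_t intermediate;

    r->hi = (uint32_t)(new_rte >> 32);
    intermediate = rte_value(r); /* new destination, old vector */
    r->lo = (uint32_t)new_rte;

    /* Only after the second write does the full value match. */
    assert(rte_value(r) == new_rte);

    return intermediate;
}
```

An edge triggered interrupt arriving between the two writes would be
delivered using the intermediate value, i.e. the old vector on the new
destination.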

The previously final patch to switch AMD IOMMU to use atomic RTE updates
has been dropped pending feedback from AMD on whether that approach is
suitable.

Thanks, Roger.

Roger Pau Monne (4):
  x86/ioapic: add a raw field to RTE struct
  x86/ioapic: RTE modifications must use ioapic_write_entry
  iommu/vtd: rename io_apic_read_remap_rte() local variable
  x86/iommu: pass full IO-APIC RTE for remapping table update

 xen/arch/x86/include/asm/io_apic.h   |  65 ++-
 xen/arch/x86/include/asm/iommu.h |   3 +-
 xen/arch/x86/io_apic.c   |  46 
 xen/drivers/passthrough/amd/iommu.h  |   2 +-
 xen/drivers/passthrough/amd/iommu_intr.c | 106 ++---
 xen/drivers/passthrough/vtd/extern.h |   2 +-
 xen/drivers/passthrough/vtd/intremap.c   | 140 +++
 xen/drivers/passthrough/x86/iommu.c  |   4 +-
 xen/include/xen/iommu.h  |   3 +-
 9 files changed, 142 insertions(+), 229 deletions(-)

-- 
2.41.0




[PATCH v3 1/4] x86/ioapic: add a raw field to RTE struct

2023-07-26 Thread Roger Pau Monne
Further changes will require access to the full RTE as a single value
in order to pass it to IOMMU interrupt remapping handlers.

No functional change intended.
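
As an illustration of the layout change, the sketch below overlays a 64-bit
"raw" member on a simplified set of bitfields. It is not the Xen definition:
struct demo_rte is a hypothetical stand-in with the destination union
collapsed to dest32, and the expected bit positions assume the usual
GCC/Clang bitfield layout on a little-endian target.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct IO_APIC_route_entry. */
struct demo_rte {
    union {
        struct {
            unsigned int vector:8;
            unsigned int delivery_mode:3;
            unsigned int dest_mode:1;
            unsigned int delivery_status:1;
            unsigned int polarity:1;
            unsigned int irr:1;
            unsigned int trigger:1;
            unsigned int mask:1;
            unsigned int reserved:15;
            unsigned int dest32;
        };
        uint64_t raw; /* the whole entry as one value */
    };
};

/* The bitfield view and the raw view must cover exactly the same 64 bits. */
static_assert(sizeof(struct demo_rte) == sizeof(uint64_t),
              "RTE must be 64 bits wide");
```

Fields can then be set individually while the complete entry is handed
around through the raw member.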

Signed-off-by: Roger Pau Monné 
Acked-by: Jan Beulich 
---
Changes since v1:
 - Fix initializers.
---
Tested on gitlab CI, builds on all compilers used there.
---
 xen/arch/x86/include/asm/io_apic.h   | 57 +---
 xen/arch/x86/io_apic.c   |  2 +-
 xen/drivers/passthrough/amd/iommu_intr.c |  4 +-
 xen/drivers/passthrough/vtd/intremap.c   |  4 +-
 4 files changed, 35 insertions(+), 32 deletions(-)

diff --git a/xen/arch/x86/include/asm/io_apic.h 
b/xen/arch/x86/include/asm/io_apic.h
index bd8cb95f46b1..4c4777b68a51 100644
--- a/xen/arch/x86/include/asm/io_apic.h
+++ b/xen/arch/x86/include/asm/io_apic.h
@@ -90,35 +90,38 @@ enum ioapic_irq_destination_types {
 };
 
 struct IO_APIC_route_entry {
-unsigned int vector:8;
-unsigned int delivery_mode:3; /*
-   * 000: FIXED
-   * 001: lowest prio
-   * 111: ExtINT
-   */
-unsigned int dest_mode:1; /* 0: physical, 1: logical */
-unsigned int delivery_status:1;
-unsigned int polarity:1;  /* 0: low, 1: high */
-unsigned int irr:1;
-unsigned int trigger:1;   /* 0: edge, 1: level */
-unsigned int mask:1;  /* 0: enabled, 1: disabled */
-unsigned int __reserved_2:15;
-
 union {
 struct {
-unsigned int __reserved_1:24;
-unsigned int physical_dest:4;
-unsigned int __reserved_2:4;
-} physical;
-
-struct {
-unsigned int __reserved_1:24;
-unsigned int logical_dest:8;
-} logical;
-
-/* used when Interrupt Remapping with EIM is enabled */
-unsigned int dest32;
-} dest;
+unsigned int vector:8;
+unsigned int delivery_mode:3; /*
+   * 000: FIXED
+   * 001: lowest prio
+   * 111: ExtINT
+   */
+unsigned int dest_mode:1; /* 0: physical, 1: logical */
+unsigned int delivery_status:1;
+unsigned int polarity:1;  /* 0: low, 1: high */
+unsigned int irr:1;
+unsigned int trigger:1;   /* 0: edge, 1: level */
+unsigned int mask:1;  /* 0: enabled, 1: disabled */
+unsigned int __reserved_2:15;
+
+union {
+struct {
+unsigned int __reserved_1:24;
+unsigned int physical_dest:4;
+unsigned int __reserved_2:4;
+} physical;
+
+struct {
+unsigned int __reserved_1:24;
+unsigned int logical_dest:8;
+} logical;
+unsigned int dest32;
+} dest;
+};
+uint64_t raw;
+};
 };
 
 /*
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index 25a08b1ea6c6..aada2ef96c62 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -2360,7 +2360,7 @@ int ioapic_guest_read(unsigned long physbase, unsigned 
int reg, u32 *pval)
 int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val)
 {
 int apic, pin, irq, ret, pirq;
-struct IO_APIC_route_entry rte = { 0 };
+struct IO_APIC_route_entry rte = { };
 unsigned long flags;
 struct irq_desc *desc;
 
diff --git a/xen/drivers/passthrough/amd/iommu_intr.c 
b/xen/drivers/passthrough/amd/iommu_intr.c
index f4de09f43180..9e6be3be3515 100644
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -352,8 +352,8 @@ static int update_intremap_entry_from_ioapic(
 void cf_check amd_iommu_ioapic_update_ire(
 unsigned int apic, unsigned int reg, unsigned int value)
 {
-struct IO_APIC_route_entry old_rte = { 0 };
-struct IO_APIC_route_entry new_rte = { 0 };
+struct IO_APIC_route_entry old_rte = { };
+struct IO_APIC_route_entry new_rte = { };
 unsigned int rte_lo = (reg & 1) ? reg - 1 : reg;
 unsigned int pin = (reg - 0x10) / 2;
 int seg, bdf, rc;
diff --git a/xen/drivers/passthrough/vtd/intremap.c 
b/xen/drivers/passthrough/vtd/intremap.c
index b39bc832821a..786388b4d9c7 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -419,7 +419,7 @@ unsigned int cf_check io_apic_read_remap_rte(
 {
 unsigned int ioapic_pin = (reg - 0x10) / 2;
 int index;
-struct IO_xAPIC_route_entry old_rte = { 0 };
+struct IO_xAPIC_route_entry old_rte = { };
 int rte_upper = (reg & 1) ? 1 : 0;
 struct vtd_iommu *iommu = ioapic_to_iommu(IO_APIC_ID(apic));
 
@@ -442,7 +442,7 @@ void cf_check io_apic_write_remap_rte(
 unsigned int 

Re: [PATCH] vpci: add permission checks to map_range()

2023-07-26 Thread Jan Beulich
On 24.07.2023 17:37, Roger Pau Monne wrote:
> @@ -1184,6 +1177,20 @@ int __init dom0_construct_pvh(struct domain *d, const 
> module_t *image,
>  
>  printk(XENLOG_INFO "*** Building a PVH Dom%d ***\n", d->domain_id);
>  
> +if ( is_hardware_domain(d) )
> +{
> +/*
> + * Setup permissions early so that calls to add MMIO regions to the
> + * p2m as part of vPCI setup don't fail due to permission checks.
> + */
> +rc = dom0_setup_permissions(d);
> +if ( rc )
> +{
> +printk("%pd unable to setup permissions: %d\n", d, rc);

The switch from panic() to printk() may want mentioning in the description
as deliberate. (The usefulness of %pd here is debatable, as it can't be
other than Dom0. But I don't mind.)

> @@ -43,6 +46,21 @@ static int cf_check map_range(
>  {
>  unsigned long size = e - s + 1;
>  
> +if ( !iomem_access_permitted(map->d, s, e) )
> +{
> +gprintk(XENLOG_WARNING,
> +"%pd denied access to MMIO range [%#lx, %#lx]\n", s, e);

This doesn't look like it would compile. Also gprintk() logs current,
which I'm not sure is generally applicable here. IOW I think it wants
to be

printk(XENLOG_G_WARNING,
   "%pd denied access to MMIO range [%#lx, %#lx]\n",
   map->d, s, e);

Same for the other log message then.
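
To make the argument count explicit, here is a stand-alone sketch using
standard snprintf() rather than Xen's printk(). Since %pd is a Xen-specific
format extension, the domain is printed as "d%u" here and passed as a plain
number; format_denied() is a hypothetical helper. The point is only that the
format string has three conversions and therefore needs three arguments.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format the denial message into buf. The domain id is passed explicitly,
 * so each of the three conversions has a matching argument.
 * Returns what snprintf() returns: the length of the full message.
 */
static int format_denied(char *buf, size_t len, unsigned int domid,
                         unsigned long s, unsigned long e)
{
    assert(s <= e); /* the range is inclusive: [s, e] */

    return snprintf(buf, len,
                    "d%u denied access to MMIO range [%#lx, %#lx]",
                    domid, s, e);
}
```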

Another Dom0 related concern can probably be put off until we actually
get a report of this failing (which may be more likely because of the
XSM check below): The function being used as a callback passed to
rangeset_consume_ranges(), failure may affect just a single BAR, while
the incoming range may cover multiple of them in one go. Depending on
what functionality such a BAR covers, the device may remain usable (a
typical example of what I'm thinking of is a multi-function device
having serial and/or parallel port on it, which are fine to be driven
via I/O ports even if driving via MMIO is possible [and would likely
be more efficient]). Of course, to allow some MMIO bars to be used
while prohibiting use of some others, further trickery may be needed.
But not exposing the device to Dom0 at all doesn't seem very nice in
such a case.

Jan

> +return -EPERM;
> +}
> +
> +rc = xsm_iomem_mapping(XSM_HOOK, map->d, s, e, map->map);
> +if ( rc )
> +{
> +gprintk(XENLOG_WARNING,
> +"%pd XSM denied access to MMIO range [%#lx, %#lx]\n", s, 
> e);
> +return rc;
> +}
> +
>  /*
>   * ARM TODOs:
>   * - On ARM whether the memory is prefetchable or not should be 
> passed




Re: [XEN PATCH 0/4] xen: address violations of MISRA C:2012 Rule 7.2

2023-07-26 Thread Simone Ballarin
On Wed, 26 Jul 2023 at 14:02, Jan Beulich wrote:

> On 26.07.2023 13:03, Simone Ballarin wrote:
> > The xen sources contains violations of MISRA C:2012 Rule 7.2 whose
> headline
> > states:
> > "A 'u' or 'U' suffix shall be applied to all integer constants that are
> > represented in an unsigned type".
> >
> > These violations are caused by the missing "u" or "U" suffix in unsigned
> > integer constants, such as:
> >
> > xen/arch/x86/hvm/hypercall.c:132.17-132.26
> > if ( (eax & 0x8000) && is_viridian_domain(currd) )
> >
> > If a rule is not met, changes are needed in order to achieve compliance.
> > The patches in this series achieve compliance for MISRA C:2012 Rule 7.2
> by
> > adding the 'U' suffix to integers literals with unsigned type and also
> to other
> > literals used in the same contexts or near violations, when their
> positive
> > nature is immediately clear. The latter changes are done for the sake of
> > uniformity.
> >
> > Gianluca Luparini (4):
> >   x86/vmx: address violations of MISRA C:2012 Rule 7.2
> >   xen/vpci: address violations of MISRA C:2012 Rule 7.2
> >   x86/viridian: address violations of MISRA C:2012 Rule 7.2
> >   xen/x86: address violations of MISRA C:2012 Rule 7.2
>
> Looks like this is v4 of a previously larger series, where fair parts have
> gone in already?
>
> Jan
>

Yes, it is. I accidentally removed the v4 from the subject.
In this submission, I've dropped the commits that have
already been merged.

-- 
Simone Ballarin, M.Sc.

Field Application Engineer, BUGSENG (https://bugseng.com)


Re: [XEN PATCH 0/4] xen: address violations of MISRA C:2012 Rule 7.2

2023-07-26 Thread Jan Beulich
On 26.07.2023 13:03, Simone Ballarin wrote:
> The xen sources contains violations of MISRA C:2012 Rule 7.2 whose headline
> states:
> "A 'u' or 'U' suffix shall be applied to all integer constants that are
> represented in an unsigned type".
> 
> These violations are caused by the missing "u" or "U" suffix in unsigned
> integer constants, such as:
> 
> xen/arch/x86/hvm/hypercall.c:132.17-132.26
> if ( (eax & 0x8000) && is_viridian_domain(currd) )
> 
> If a rule is not met, changes are needed in order to achieve compliance.
> The patches in this series achieve compliance for MISRA C:2012 Rule 7.2 by
> adding the 'U' suffix to integers literals with unsigned type and also to 
> other
> literals used in the same contexts or near violations, when their positive
> nature is immediately clear. The latter changes are done for the sake of
> uniformity.
> 
> Gianluca Luparini (4):
>   x86/vmx: address violations of MISRA C:2012 Rule 7.2
>   xen/vpci: address violations of MISRA C:2012 Rule 7.2
>   x86/viridian: address violations of MISRA C:2012 Rule 7.2
>   xen/x86: address violations of MISRA C:2012 Rule 7.2

Looks like this is v4 of a previously larger series, where fair parts have
gone in already?

Jan



Re: [PATCH v4 2/2] xen/riscv: introduce identity mapping

2023-07-26 Thread Jan Beulich
On 26.07.2023 13:23, Oleksii wrote:
> I would like to ask for advice on whether it would be easier, less bug-
> provoking ( during identity mapping to remove of whole Xen ) to have a
> separate identity section that won't be more than PAGE_SIZE.

I'm afraid you can't safely do this in C, or at least not without
further checking on what the compiler actually did.

> @@ -264,6 +268,19 @@ void __init enable_mmu(void)
>                RV_STAGE1_MODE << SATP_MODE_SHIFT);
>  }
>  
> +void __attribute__((naked)) __section(".ident") turn_on_mmu(unsigned
> long ra)

Did you read what gcc doc says about "naked"? Extended asm() isn't
supported there. Since ...

> +{
> +    /* Ensure page table writes precede loading the SATP */
> +    sfence_vma();
> +
> +    /* Enable the MMU and load the new pagetable for Xen */
> +    csr_write(CSR_SATP,
> +              PFN_DOWN((unsigned long)stage1_pgtbl_root) |
> +              RV_STAGE1_MODE << SATP_MODE_SHIFT);
> +
> +    asm volatile( "jr %0\n" : : "r"(ra) );
> +}

... none of this really requires C, I think we're at the point where
(iirc) Andrew's and my suggestion wants following, moving this to
assembly code (at which point it doesn't need to be a separate
function). You can still build page tables in C, of course. (Likely
you then also won't need a separate section; some minimal alignment
guarantees ought to suffice to make sure the critical code is
confined to a single page.)
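
The single-page condition mentioned above can be written down as a small
check. This is only a sketch under assumptions: a 4 KiB page size is assumed,
and DEMO_PAGE_SHIFT and fits_in_one_page() are hypothetical names, not
anything from the Xen tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* assumed 4 KiB pages */

/* True if the non-empty byte range [start, end) lies within one page. */
static bool fits_in_one_page(uint64_t start, uint64_t end)
{
    assert(start < end);

    /* First and last byte must share the same page frame number. */
    return (start >> DEMO_PAGE_SHIFT) == ((end - 1) >> DEMO_PAGE_SHIFT);
}
```

If the critical code is, say, 64 bytes long and placed at a 64-byte aligned
address, this condition always holds, which is the kind of alignment
guarantee referred to above.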

Jan



[xen-4.17-testing test] 182016: tolerable trouble: fail/pass/starved - PUSHED

2023-07-26 Thread osstest service owner
flight 182016 xen-4.17-testing real [real]
flight 182022 xen-4.17-testing real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/182016/
http://logs.test-lab.xenproject.org/osstest/logs/182022/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-migrupgrade 11 xen-install/dst_host fail pass in 182022-retest

Tests which did not succeed, but are not blocking:
 test-amd64-i386-migrupgrade  10 xen-install/src_host fail  like 181997
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 181997
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 181997
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 181997
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 181997
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 181997
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 181997
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 181997
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 181997
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 181997
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw  3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl   3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl-credit1   3 hosts-allocate   starved  n/a
 test-armhf-armhf-libvirt  3 hosts-allocate   starved  n/a
 test-armhf-armhf-libvirt-qcow2  3 hosts-allocate   starved  n/a
 test-armhf-armhf-libvirt-raw  3 hosts-allocate   starved  n/a
 test-armhf-armhf-xl-vhd   3 hosts-allocate   starved  n/a
 test-arm64-arm64-libvirt-xsm  3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl-credit2   3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl-thunderx  3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl-vhd   3 hosts-allocate   starved  n/a
 test-armhf-armhf-xl   3 hosts-allocate   starved  n/a
 test-armhf-armhf-xl-credit1   3 hosts-allocate   starved  n/a
 test-armhf-armhf-xl-multivcpu  3 hosts-allocate   starved  n/a
 test-armhf-armhf-xl-rtds  3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl-xsm   3 hosts-allocate   starved  n/a
 test-armhf-armhf-xl-credit2   3 hosts-allocate   starved  n/a

version targeted for testing:
 xen  04f14792a6c7a4e4cd8fda10b29c7e72837c5433
baseline version:
 xen  3141a0b85c37b76e069ec7dcb906ff202f5c4075

Last test of basis   181997  2023-07-24 16:37:35 Z    1 days
Testing same since   182016  2023-07-26 00:36:44 Z    0 days    1 attempts


People who touched revisions under test:
  Michal Orzel 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-prev pass
 build-i386-prev  pass
 build-amd64-pvops  pass
 build-arm64-pvops  pass
 

Re: [PATCH v4 2/2] xen/riscv: introduce identity mapping

2023-07-26 Thread Oleksii
Hi all,

I would like to ask for advice on whether it would be easier and less
bug-prone (than removing the identity mapping of the whole of Xen) to have a
separate identity section that won't be more than PAGE_SIZE.

Please take a look at the changes below. Comments are welcome.

diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index 7d1a8beba8..ba4af48fc6 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -26,6 +26,8 @@ static unsigned long __ro_after_init phys_offset;
 #define LOAD_TO_LINK(addr) ((unsigned long)(addr) - phys_offset)
 #define LINK_TO_LOAD(addr) ((unsigned long)(addr) + phys_offset)
 
+extern char _ident_start[], _ident_end[];
+
 /*
  * It is expected that Xen won't be more then 2 MB.
  * The check in xen.lds.S guarantees that.
@@ -112,7 +114,9 @@ static void __init setup_initial_mapping(struct
mmu_desc *mmu_desc,
         case 1: /* Level 0 */
             {
                 unsigned long paddr = (page_addr - map_start) +
pa_start;
-                unsigned int permissions = PTE_LEAF_DEFAULT;
+                unsigned int permissions = is_identity_mapping
+                                           ? PTE_LEAF_DEFAULT |
PTE_EXECUTABLE
+                                           : PTE_LEAF_DEFAULT ;
                 unsigned long addr = is_identity_mapping
                                      ? page_addr :
LINK_TO_LOAD(page_addr);
                 pte_t pte_to_be_written;
@@ -248,9 +252,9 @@ void __init setup_initial_pagetables(void)
         return;
 
     setup_initial_mapping(_desc,
-                          load_start,
-                          load_end,
-                          load_start);
+                          (unsigned long)_ident_start,
+                          (unsigned long)_ident_end,
+                          (unsigned long)_ident_start);
 }
 
 void __init enable_mmu(void)
@@ -264,6 +268,19 @@ void __init enable_mmu(void)
               RV_STAGE1_MODE << SATP_MODE_SHIFT);
 }
 
+void __attribute__((naked)) __section(".ident") turn_on_mmu(unsigned
long ra)
+{
+    /* Ensure page table writes precede loading the SATP */
+    sfence_vma();
+
+    /* Enable the MMU and load the new pagetable for Xen */
+    csr_write(CSR_SATP,
+              PFN_DOWN((unsigned long)stage1_pgtbl_root) |
+              RV_STAGE1_MODE << SATP_MODE_SHIFT);
+
+    asm volatile( "jr %0\n" : : "r"(ra) );
+}
+
 static void __init __remove_identity_mapping(pte_t *pgtbl,
                                              unsigned long load_start,
                                              unsigned int pt_level)
@@ -297,20 +314,42 @@ static void __init
__remove_identity_mapping(pte_t *pgtbl,
 
 void __init remove_identity_mapping(void)
 {
-    unsigned long load_start = LINK_TO_LOAD(_start);
+    unsigned int i;
+    pte_t *pgtbl;
+    unsigned int index, xen_index;
+    unsigned long ident_start = LINK_TO_LOAD(_ident_start);
 
-    if ( XEN_VIRT_START <= load_start )
+    for ( pgtbl = stage1_pgtbl_root, i = CONFIG_PAGING_LEVELS; i; i--
)
     {
-        early_printk("remove identity mapping algo expects that"
-                     "XEN_VIRT_START > load_start\n");
-        die();
-    }
+        index = pt_index(i - 1, ident_start);
+        xen_index = pt_index(i - 1, XEN_VIRT_START);
 
-    __remove_identity_mapping(stage1_pgtbl_root,
-                              LINK_TO_LOAD(_start),
-                              CONFIG_PAGING_LEVELS - 1);
+        if ( index != xen_index )
+        {
+            pgtbl[index].pte = 0;
+            break;
+        }
+
+        pgtbl = (pte_t *)pte_to_paddr(pgtbl[index]);
+    }
 }
 
+// void __init remove_identity_mapping(void)
+// {
+//     unsigned long load_start = LINK_TO_LOAD(_start);
+
+//     if ( XEN_VIRT_START <= load_start )
+//     {
+//         early_printk("remove identity mapping algo expects that"
+//                      "XEN_VIRT_START > load_start\n");
+//         die();
+//     }
+
+//     __remove_identity_mapping(stage1_pgtbl_root,
+//                               LINK_TO_LOAD(_start),
+//                               CONFIG_PAGING_LEVELS - 1);
+// }
+
 /*
  * calc_phys_offset() should be used before MMU is enabled because
access to
  * start() is PC-relative and in case when load_addr != linker_addr
phys_offset
diff --git a/xen/arch/riscv/riscv64/head.S
b/xen/arch/riscv/riscv64/head.S
index 613e25ea6f..bb529f6a11 100644
--- a/xen/arch/riscv/riscv64/head.S
+++ b/xen/arch/riscv/riscv64/head.S
@@ -41,14 +41,12 @@ ENTRY(start)
 
         jal     setup_initial_pagetables
 
-        jal     enable_mmu
-
         /* Calculate proper VA after jump from 1:1 mapping */
         la      t0, .L_primary_switched
         sub     t0, t0, s2
 
-        /* Jump from 1:1 mapping world */
-        jr      t0
+        mv      a0, t0
+        jal     turn_on_mmu
 
 .L_primary_switched:
         /*
diff --git a/xen/arch/riscv/xen.lds.S b/xen/arch/riscv/xen.lds.S
index 31ccebadcb..ffa0225332 100644
--- a/xen/arch/riscv/xen.lds.S
+++ 

Xen Security Advisory 433 v2 (CVE-2023-20593) - x86/AMD: Zenbleed

2023-07-26 Thread Xen . org security team
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Xen Security Advisory CVE-2023-20593 / XSA-433
  version 2

  x86/AMD: Zenbleed

UPDATES IN VERSION 2
====================

Include the CVE, which was missed accidentally in the rush of
timelines repeatedly moving underfoot.

ISSUE DESCRIPTION
=================

Researchers at Google have discovered Zenbleed, a hardware bug causing
corruption of the vector registers.

When a VZEROUPPER instruction is discarded as part of a bad transient
execution path, its effects on internal tracking are not unwound
correctly.  This manifests as the wrong micro-architectural state
becoming architectural, and corrupting the vector registers.

Note: While this malfunction is related to speculative execution, this
  is not a speculative sidechannel vulnerability.

The corruption is not random.  It happens to be stale values from the
physical vector register file, a structure competitively shared between
sibling threads.  Therefore, an attacker can directly access data from
the sibling thread, or from a more privileged context.

For more details, see:
  https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7008.html
  
https://github.com/google/security-research/security/advisories/GHSA-v6wh-rxpg-cmm8

IMPACT
======

With very low probability, corruption of the vector registers can occur.
This data corruption causes mis-calculations in subsequent logic.

An attacker can exploit this bug to read data from different contexts on
the same core.  Examples of such data include key material, cypher and
plaintext from the AES-NI instructions, or the contents of REP-MOVS
instructions, commonly used to implement memcpy().

VULNERABLE SYSTEMS
==================

Systems running all versions of Xen are affected.

This bug is specific to the AMD Zen2 microarchitecture.  AMD do not
believe that other microarchitectures are affected.

MITIGATION
==========

This issue can be mitigated by disabling AVX, either by booting Xen with
`cpuid=no-avx` on the command line, or by specifying `cpuid="host:avx=0"` in
the vm.cfg file of all untrusted VMs.  However, this will come with a
significant impact on the system and is not recommended for anyone able to
deploy the microcode or patch described below.

RESOLUTION
==========

AMD are producing microcode updates to address the bug.  Consult your
dom0 OS vendor.  This microcode is effective when late-loaded, which can
be performed on a live system without reboot.

In cases where microcode is not available, the appropriate attached
patch updates Xen to use a control register to avoid the issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa433.patch   xen-unstable
xsa433-4.17.patch  Xen 4.17.x
xsa433-4.16.patch  Xen 4.16.x
xsa433-4.15.patch  Xen 4.15.x
xsa433-4.14.patch  Xen 4.14.x

$ sha256sum xsa433*
a9331733b63e3e566f1436a48e9bd9e8b86eb48da6a8ced72ff4affb7859e027  xsa433.patch
6f1db2a2078b0152631f819f8ddee21720dabe185ec49dc9806d4a9d3478adfd  xsa433-4.14.patch
ca3a92605195307ae9b6ff87240beb52a097c125a760c919d7b9a0aff6e557c0  xsa433-4.15.patch
e5e94b3de68842a1c8d222802fb204d64acd118e3293c8e909dfaf3ada23d912  xsa433-4.16.patch
41d12104869b7e8307cd93af1af12b4fd75a669aeff15d31b234dc72981ae407  xsa433-4.17.patch
$

NOTE CONCERNING TIMELINE
========================

This issue is subject to coordinated disclosure on August 8th.  The
discoverer chose to publish details ahead of this timeline.
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmTA/2cMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZ0EIH/02n/gvMGF5RCwfs/uvwjsQASAgELWTgAFv+tXOG
yLZWCxNkWAWDxTWAEWfdcSsLCN8GDc4c6lNuhqnV3mVsIDiGSHmXgSkI9pcCQ79T
2KTgC+ncMM4yeYTI5SUL4xvzzIQ/38t5gK5+AyPxg3jpMhCLEz2dJwbjgd4CKai+
ax+l3cX9ibLj/lQQwvgkPXweAVsfILnCAB5J1VQb1Jw0DWauYJLurMj0flz82a2O
NftdEx3b5ADDxXHdE52J5p/kpXMDohdPm0R07Y63j+eY+QJADLHfwE+n4pqyzvDf
kPEGUtxbcCj4VygmO6xrHgoHYqaGbRYeHJyHEt4jpZDLwP4=
=9wn5
-----END PGP SIGNATURE-----



Re: [PATCH v3 08/25] tools/xenstore: make hashtable key and value parameters const

2023-07-26 Thread Juergen Gross

On 26.07.23 11:29, Julien Grall wrote:

Hi,

On 26/07/2023 09:44, Juergen Gross wrote:

On 26.07.23 10:20, Julien Grall wrote:
To give a concrete example, with the current interface we are telling the 
user that what they store in the hashtable can be modified at some point. By 
adding 'const' for the value in hashtable_add(), we can mislead a user into 
thinking it is fine to store a static string, yet this is not enforced all the 
way through. So one could mistakenly think that values returned by 
hashtable_search() can be modified. And the compiler will not be there to help 
enforce it because you cast away the const.


Yes, like in the case of strstr().

It takes two const char * parameters and it returns char *, even though the
result points into the first parameter.


This is a pretty good example on how to not write an interface. :)
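
The strstr() point can be shown in a few lines. This is a stand-alone sketch,
assuming the classic declaration of strstr() that returns a plain char *
(as in C standards up to C17); poke_through_const() is a hypothetical helper
written only for this illustration.

```c
#include <assert.h>
#include <string.h>

/*
 * Overwrite the first character of the first occurrence of 'needle' in the
 * buffer 'haystack' points to. 'haystack' is const-qualified, yet strstr()
 * hands back a plain char *, so the write below needs no cast.
 * This is only well-defined if the underlying object is really writable.
 * Returns 1 if a character was replaced, 0 if 'needle' was not found.
 */
static int poke_through_const(const char *haystack, const char *needle, char c)
{
    char *hit;

    assert(needle[0] != '\0'); /* an empty needle would match the terminator */

    hit = strstr(haystack, needle);
    if (!hit)
        return 0;

    *hit = c; /* the const on 'haystack' has silently been lost */
    return 1;
}
```

Called on a writable array this merely surprises the caller; called on a
string literal it is undefined behaviour, and the compiler gives no warning
in either case. That is the same loss of checking being discussed for the
value parameter of hashtable_add().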



Do you have any code in this series that requires the 'const' in 
hashtable_add()? If so, can you point me to the patch and I will have a look?


I had it when writing this patch, but this requirement is gone now. But please
note that this means to drop the const from db_write(), too.

If not, then I will strongly argue that this should be dropped because 
dropping a const is always a recipe for disaster.


Depends IMO.

I believe it is better as I've done it, but in case you insist on it I can drop
the patch.


Well... I can always be swayed if there is a good argument to make it const. So 
far, you mention that hashtable doesn't modify the content but you don't really 
explain why waiving away the help from the compiler is ok. Therefore, to me it 
seems the downside is bigger than the benefit.


Also, I am not asking to drop the patch. The const on the key is ok. I am only 
requesting to remove the const on the value.




An alternative would be to make hashtable_search() return a const and only cast the
const away where it is really needed (and probably with a prominent comment at
the related hashtable_add() place). I think this will hit xenstored_domain.c use
cases only.


Again, this still means we are casting away the const somewhere. This is the 
part I am against if there is no strong justification for it (i.e. there is no 
other way to do it).


Okay, I'll drop the const attribute for the value parameter.


Juergen





[XEN PATCH v4 4/4] xen/x86: address violations of MISRA C:2012 Rule 7.2

2023-07-26 Thread Simone Ballarin
From: Gianluca Luparini 

The Xen sources contain violations of MISRA C:2012 Rule 7.2 whose
headline states:
"A 'u' or 'U' suffix shall be applied to all integer constants
that are represented in an unsigned type".

Add the 'U' suffix to integer literals with unsigned type.
Use _AC() for macro constants that are also used in assembly files.

For the sake of uniformity, the following changes are made:
- add the 'U' suffix to the first argument of all macros in 'mce-apei.c'
- add the 'U' suffix to switch cases in 'cpuid.c'
- add 'U' suffixes to 'mask16' in 'stdvga.c'
- add the 'U' suffix to macros in 'pci.h'
- use _AC() for macros near 'X86_CR0_PG'
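
For readers unfamiliar with the _AC() idiom, here is a minimal sketch of how
such a helper is commonly built. EX_AC() stands in for _AC() to stay clear of
reserved identifiers, and the EXAMPLE_* names are made up for illustration;
this is not a copy of the Xen headers. The suffix is pasted onto the constant
when compiling C and dropped when the header is included from assembly, where
a 'U' suffix would not be understood.

```c
#include <stdint.h>

/*
 * Sketch of an _AC()-style helper. In assembly the suffix is dropped;
 * in C it is pasted onto the constant.
 */
#ifdef __ASSEMBLY__
#define EX_AC(X, Y)   X
#else
#define EX_AC_(X, Y)  (X##Y)
#define EX_AC(X, Y)   EX_AC_(X, Y)
#endif

#define EXAMPLE_CR0_PE  EX_AC(0x00000001, U)  /* suffixed for uniformity */
#define EXAMPLE_CR0_PG  EX_AC(0x80000000, U)  /* bit 31: unsigned in C */

/* Returns 1 when the paging bit is set in a CR0-like value. */
static int example_paging_enabled(uint32_t cr0)
{
    return (cr0 & EXAMPLE_CR0_PG) != 0;
}
```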

Signed-off-by: Gianluca Luparini 
Signed-off-by: Simone Ballarin 
Reviewed-by: Stefano Stabellini 
---
Changes in v4:
- change commit headline
- remove 'U' suffix from '0x' in 'cpu-policy.c'
- remove some changes in 'msr-index.h'
- remove changes in 'irq.c' and 'acpi_mmcfg.c'

Changes in v3:
- change 'Signed-off-by' ordering
- change commit message
- add 'UL' in 'extable.c'
- fix indentation in 'cpu-policy.c'
- remove excessive suffixes in 'mce-apei.c'
- add 'UL' in 'x86-defns.h'
- remove changes to 'sr_mask' in 'stdvga.c'
- remove comments to 'gr_mask' in 'stdvga.c'
- move 'viridian.c' and 'hyperv-tlfs.h' in a separate commit

Changes in v2:
- minor change to commit title
- change commit message
- remove comments from 'gr_mask' in 'stdvga.c'
- correct code style in 'trace.h'
- add fix in 'extable.c'
- remove changes in 'x86-defns.h', 'msr-index.h' and 'xen-x86_64.h'
---
 xen/arch/x86/apic.c                    |   2 +-
 xen/arch/x86/cpu-policy.c              |  18 +--
 xen/arch/x86/cpu/mcheck/mce-apei.c     |   4 +-
 xen/arch/x86/cpuid.c                   |   8 +-
 xen/arch/x86/efi/efi-boot.h            |   6 +-
 xen/arch/x86/extable.c                 |   2 +-
 xen/arch/x86/hvm/hypercall.c           |   2 +-
 xen/arch/x86/hvm/pmtimer.c             |   4 +-
 xen/arch/x86/hvm/stdvga.c              |  50 +++
 xen/arch/x86/hvm/vlapic.c              |   6 +-
 xen/arch/x86/include/asm/apicdef.h     |   2 +-
 xen/arch/x86/include/asm/config.h      |   2 +-
 xen/arch/x86/include/asm/hpet.h        |   2 +-
 xen/arch/x86/include/asm/hvm/trace.h   |   4 +-
 xen/arch/x86/include/asm/hvm/vioapic.h |   2 +-
 xen/arch/x86/include/asm/msi.h         |   2 +-
 xen/arch/x86/include/asm/msr-index.h   | 180 -
 xen/arch/x86/include/asm/pci.h         |   8 +-
 xen/arch/x86/include/asm/x86-defns.h   |  24 ++--
 xen/arch/x86/percpu.c                  |   2 +-
 xen/arch/x86/psr.c                     |   2 +-
 xen/arch/x86/spec_ctrl.c               |   8 +-
 xen/arch/x86/x86_64/pci.c              |   2 +-
 xen/arch/x86/x86_emulate/x86_emulate.h |   2 +-
 xen/lib/x86/cpuid.c                    |   8 +-
 xen/lib/x86/policy.c                   |   2 +-
 26 files changed, 177 insertions(+), 177 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 41879230ec..1109c0d9cf 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1211,7 +1211,7 @@ static void __init calibrate_APIC_clock(void)
  * Setup the APIC counter to maximum. There is no way the lapic
  * can underflow in the 100ms detection time frame.
  */
-__setup_APIC_LVTT(0x);
+__setup_APIC_LVTT(0xU);
 
 bus_freq = calibrate_apic_timer();
 if ( !bus_freq )
diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index f40eeb8be8..b0c7e3bd02 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -321,7 +321,7 @@ static void recalculate_misc(struct cpu_policy *p)
 p->extd.vendor_edx = p->basic.vendor_edx;
 
 p->extd.raw_fms = p->basic.raw_fms;
-p->extd.raw[0x1].b &= 0xff00;
+p->extd.raw[0x1].b &= 0xff00U;
 p->extd.e1d |= p->basic._1d & CPUID_COMMON_1D_FEATURES;
 
 p->extd.raw[0x8].a &= 0x; /* GuestMaxPhysAddr hidden. */
@@ -378,10 +378,10 @@ static void __init calculate_host_policy(void)
  * this information.
  */
 if ( cpu_has_lfence_dispatch )
-max_extd_leaf = max(max_extd_leaf, 0x8021);
+max_extd_leaf = max(max_extd_leaf, 0x8021U);
 
-p->extd.max_leaf = 0x8000 | min_t(uint32_t, max_extd_leaf & 0x,
-  ARRAY_SIZE(p->extd.raw) - 1);
+p->extd.max_leaf = 0x8000U | min_t(uint32_t, max_extd_leaf & 0x,
+   ARRAY_SIZE(p->extd.raw) - 1);
 
 x86_cpu_featureset_to_policy(boot_cpu_data.x86_capability, p);
 recalculate_xstate(p);
@@ -768,11 +768,11 @@ void recalculate_cpuid_policy(struct domain *d)
 
 p->basic.max_leaf   = min(p->basic.max_leaf,   max->basic.max_leaf);
 p->feat.max_subleaf = min(p->feat.max_subleaf, max->feat.max_subleaf);
-p->extd.max_leaf= 0x8000 | min(p->extd.max_leaf & 0x,
-   ((p->x86_vendor & (X86_VENDOR_AMD |
-

[XEN PATCH v4 3/4] x86/viridian: address violations of MISRA C:2012 Rule 7.2

2023-07-26 Thread Simone Ballarin
From: Gianluca Luparini 

The Xen sources contain violations of MISRA C:2012 Rule 7.2 whose
headline states:
"A 'u' or 'U' suffix shall be applied to all integer constants
that are represented in an unsigned type".

Add the 'U' suffix to integer literals with unsigned type and also to other
literals used in the same contexts or near violations, when their positive
nature is immediately clear. The latter changes are done for the sake of
uniformity.

Signed-off-by: Gianluca Luparini 
Signed-off-by: Simone Ballarin 
Reviewed-by: Stefano Stabellini 
---
Changes in v4:
- change commit headline
- add Reviewed-by

Changes in v3:
- create this commit for 'viridian.c' and 'hyperv-tlfs.h'
---
 xen/arch/x86/hvm/viridian/viridian.c |  2 +-
 xen/arch/x86/include/asm/guest/hyperv-tlfs.h | 28 ++--
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/hvm/viridian/viridian.c 
b/xen/arch/x86/hvm/viridian/viridian.c
index 7405c117bc..61171e3363 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -291,7 +291,7 @@ static void enable_hypercall_page(struct domain *d)
  * calling convention) to differentiate Xen and Viridian hypercalls.
  */
 *(u8  *)(p + 0) = 0x0d; /* orl $0x8000, %eax */
-*(u32 *)(p + 1) = 0x8000;
+*(u32 *)(p + 1) = 0x8000U;
 *(u8  *)(p + 5) = 0x0f; /* vmcall/vmmcall */
 *(u8  *)(p + 6) = 0x01;
 *(u8  *)(p + 7) = (cpu_has_vmx ? 0xc1 : 0xd9);
diff --git a/xen/arch/x86/include/asm/guest/hyperv-tlfs.h 
b/xen/arch/x86/include/asm/guest/hyperv-tlfs.h
index 38f997a0c8..a6915ad731 100644
--- a/xen/arch/x86/include/asm/guest/hyperv-tlfs.h
+++ b/xen/arch/x86/include/asm/guest/hyperv-tlfs.h
@@ -471,30 +471,30 @@ typedef struct _HV_REFERENCE_TSC_PAGE {
 
 /* Define hypervisor message types. */
 enum hv_message_type {
-   HVMSG_NONE  = 0x,
+   HVMSG_NONE  = 0xU,
 
/* Memory access messages. */
-   HVMSG_UNMAPPED_GPA  = 0x8000,
-   HVMSG_GPA_INTERCEPT = 0x8001,
+   HVMSG_UNMAPPED_GPA  = 0x8000U,
+   HVMSG_GPA_INTERCEPT = 0x8001U,
 
/* Timer notification messages. */
-   HVMSG_TIMER_EXPIRED = 0x8010,
+   HVMSG_TIMER_EXPIRED = 0x8010U,
 
/* Error messages. */
-   HVMSG_INVALID_VP_REGISTER_VALUE = 0x8020,
-   HVMSG_UNRECOVERABLE_EXCEPTION   = 0x8021,
-   HVMSG_UNSUPPORTED_FEATURE   = 0x8022,
+   HVMSG_INVALID_VP_REGISTER_VALUE = 0x8020U,
+   HVMSG_UNRECOVERABLE_EXCEPTION   = 0x8021U,
+   HVMSG_UNSUPPORTED_FEATURE   = 0x8022U,
 
/* Trace buffer complete messages. */
-   HVMSG_EVENTLOG_BUFFERCOMPLETE   = 0x8040,
+   HVMSG_EVENTLOG_BUFFERCOMPLETE   = 0x8040U,
 
/* Platform-specific processor intercept messages. */
-   HVMSG_X64_IOPORT_INTERCEPT  = 0x8001,
-   HVMSG_X64_MSR_INTERCEPT = 0x80010001,
-   HVMSG_X64_CPUID_INTERCEPT   = 0x80010002,
-   HVMSG_X64_EXCEPTION_INTERCEPT   = 0x80010003,
-   HVMSG_X64_APIC_EOI  = 0x80010004,
-   HVMSG_X64_LEGACY_FP_ERROR   = 0x80010005
+   HVMSG_X64_IOPORT_INTERCEPT  = 0x8001U,
+   HVMSG_X64_MSR_INTERCEPT = 0x80010001U,
+   HVMSG_X64_CPUID_INTERCEPT   = 0x80010002U,
+   HVMSG_X64_EXCEPTION_INTERCEPT   = 0x80010003U,
+   HVMSG_X64_APIC_EOI  = 0x80010004U,
+   HVMSG_X64_LEGACY_FP_ERROR   = 0x80010005U
 };
 
 /* Define synthetic interrupt controller message flags. */
-- 
2.34.1




[XEN PATCH 0/4] xen: address violations of MISRA C:2012 Rule 7.2

2023-07-26 Thread Simone Ballarin
The Xen sources contain violations of MISRA C:2012 Rule 7.2 whose headline
states:
"A 'u' or 'U' suffix shall be applied to all integer constants that are
represented in an unsigned type".

These violations are caused by the missing "u" or "U" suffix in unsigned
integer constants, such as:

xen/arch/x86/hvm/hypercall.c:132.17-132.26
if ( (eax & 0x8000) && is_viridian_domain(currd) )

If a rule is not met, changes are needed in order to achieve compliance.
The patches in this series achieve compliance for MISRA C:2012 Rule 7.2 by
adding the 'U' suffix to integer literals with unsigned type and also to other
literals used in the same contexts or near violations, when their positive
nature is immediately clear. The latter changes are done for the sake of
uniformity.
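
Why the suffix matters can be seen by asking the compiler which type it gives
an unsuffixed constant. The following is an illustration only, not code from
the series: it assumes a C11 compiler and a platform where int is 32 bits
wide (as on x86), and the constant values are picked for the example. A
hexadecimal constant that does not fit in int silently becomes unsigned int,
which is exactly the situation Rule 7.2 wants spelled out with a 'U'.

```c
/*
 * Illustration only: report the type the compiler assigns to an integer
 * constant. Assumes a platform where int is 32 bits wide.
 */
#define TYPE_NAME(x) _Generic((x),            \
    int:                "int",                \
    unsigned int:       "unsigned int",       \
    long:               "long",               \
    unsigned long:      "unsigned long",      \
    long long:          "long long",          \
    unsigned long long: "unsigned long long", \
    default:            "other")

/* Fits in int, so it stays signed. */
static const char *type_of_7fffffff(void)  { return TYPE_NAME(0x7fffffff); }

/* Does not fit in int: implicitly unsigned, a Rule 7.2 violation. */
static const char *type_of_80000000(void)  { return TYPE_NAME(0x80000000); }

/* Same type as above, but now visible to the reader. */
static const char *type_of_80000000u(void) { return TYPE_NAME(0x80000000U); }
```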

Gianluca Luparini (4):
  x86/vmx: address violations of MISRA C:2012 Rule 7.2
  xen/vpci: address violations of MISRA C:2012 Rule 7.2
  x86/viridian: address violations of MISRA C:2012 Rule 7.2
  xen/x86: address violations of MISRA C:2012 Rule 7.2

 xen/arch/x86/apic.c  |   2 +-
 xen/arch/x86/cpu-policy.c|  18 +-
 xen/arch/x86/cpu/mcheck/mce-apei.c   |   4 +-
 xen/arch/x86/cpu/vpmu_intel.c|   2 +-
 xen/arch/x86/cpuid.c |   8 +-
 xen/arch/x86/efi/efi-boot.h  |   6 +-
 xen/arch/x86/extable.c   |   2 +-
 xen/arch/x86/hvm/hypercall.c |   2 +-
 xen/arch/x86/hvm/pmtimer.c   |   4 +-
 xen/arch/x86/hvm/stdvga.c|  50 +++---
 xen/arch/x86/hvm/viridian/viridian.c |   2 +-
 xen/arch/x86/hvm/vlapic.c|   6 +-
 xen/arch/x86/hvm/vmx/vmcs.c  |   6 +-
 xen/arch/x86/hvm/vmx/vvmx.c  |   8 +-
 xen/arch/x86/include/asm/apicdef.h   |   2 +-
 xen/arch/x86/include/asm/config.h|   2 +-
 xen/arch/x86/include/asm/guest/hyperv-tlfs.h |  28 +--
 xen/arch/x86/include/asm/hpet.h  |   2 +-
 xen/arch/x86/include/asm/hvm/trace.h |   4 +-
 xen/arch/x86/include/asm/hvm/vioapic.h   |   2 +-
 xen/arch/x86/include/asm/hvm/vmx/vmcs.h  |  84 -
 xen/arch/x86/include/asm/hvm/vmx/vmx.h   |  16 +-
 xen/arch/x86/include/asm/msi.h   |   2 +-
 xen/arch/x86/include/asm/msr-index.h | 180 +--
 xen/arch/x86/include/asm/pci.h   |   8 +-
 xen/arch/x86/include/asm/x86-defns.h |  24 +--
 xen/arch/x86/percpu.c|   2 +-
 xen/arch/x86/psr.c   |   2 +-
 xen/arch/x86/spec_ctrl.c |   8 +-
 xen/arch/x86/x86_64/pci.c|   2 +-
 xen/arch/x86/x86_emulate/x86_emulate.h   |   2 +-
 xen/drivers/vpci/msi.c   |   2 +-
 xen/drivers/vpci/msix.c  |   2 +-
 xen/drivers/vpci/vpci.c  |   6 +-
 xen/lib/x86/cpuid.c  |   8 +-
 xen/lib/x86/policy.c |   2 +-
 36 files changed, 255 insertions(+), 255 deletions(-)

-- 
2.34.1




[XEN PATCH v4 2/4] xen/vpci: address violations of MISRA C:2012 Rule 7.2

2023-07-26 Thread Simone Ballarin
From: Gianluca Luparini 

The Xen sources contain violations of MISRA C:2012 Rule 7.2 whose
headline states:
"A 'u' or 'U' suffix shall be applied to all integer constants
that are represented in an unsigned type".

Add the 'U' suffix to integer literals with unsigned type and also to other
literals used in the same contexts or near violations, when their positive
nature is immediately clear. The latter changes are done for the sake of
uniformity.

Signed-off-by: Gianluca Luparini 
Signed-off-by: Simone Ballarin 
Reviewed-by: Stefano Stabellini 
---
Changes in v4:
- change commit headline

Changes in v3:
- change 'Signed-off-by' ordering
- add 'uint32_t' casts in 'msi.c' and 'msix.c'

Changes in v2:
- minor change to commit title
- change commit message
---
 xen/drivers/vpci/msi.c  | 2 +-
 xen/drivers/vpci/msix.c | 2 +-
 xen/drivers/vpci/vpci.c | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 8f2b59e61a..bf5fe2f981 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -124,7 +124,7 @@ static void cf_check address_hi_write(
 struct vpci_msi *msi = data;
 
 /* Clear and update high part. */
-msi->address &= 0x;
+msi->address  = (uint32_t)msi->address;
 msi->address |= (uint64_t)val << 32;
 
 update_msi(pdev, msi);
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 25bde77586..2090168f42 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -531,7 +531,7 @@ static int cf_check msix_write(
 
 case PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET:
 entry->updated = true;
-entry->addr &= 0x;
+entry->addr  = (uint32_t)entry->addr;
 entry->addr |= (uint64_t)data << 32;
 break;
 
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index d73fa76302..3bec9a4153 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -319,7 +319,7 @@ static void vpci_write_hw(pci_sbdf_t sbdf, unsigned int 
reg, unsigned int size,
 static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
  unsigned int offset)
 {
-uint32_t mask = 0x >> (32 - 8 * size);
+uint32_t mask = 0xU >> (32 - 8 * size);
 
 return (data & ~(mask << (offset * 8))) | ((new & mask) << (offset * 8));
 }
@@ -402,7 +402,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, 
unsigned int size)
 data = merge_result(data, tmp_data, size - data_offset, data_offset);
 }
 
-return data & (0x >> (32 - 8 * size));
+return data & (0xU >> (32 - 8 * size));
 }
 
 /*
@@ -427,7 +427,7 @@ static void vpci_write_helper(const struct pci_dev *pdev,
 data = merge_result(val, data, size, offset);
 }
 
-r->write(pdev, r->offset, data & (0x >> (32 - 8 * r->size)),
+r->write(pdev, r->offset, data & (0xU >> (32 - 8 * r->size)),
  r->private);
 }
 
-- 
2.34.1
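
For reference, the two idioms touched by this patch can be sketched as
follows. This is a stand-alone illustration, not the vpci code itself: the
function names are made up, and the all-ones constant is an assumption, since
the archived diff shows it truncated. The first function mirrors the shape of
merge_result(); the other two show that truncating to uint32_t and masking
with an all-ones 32-bit constant yield the same value, which is why the cast
can replace the unsuffixed mask.

```c
#include <stdint.h>

/*
 * Sketch modelled on merge_result(). size is in bytes and must be
 * between 1 and 4; (offset + size) must not exceed 4.
 */
static uint32_t merge_sketch(uint32_t data, uint32_t new_val,
                             unsigned int size, unsigned int offset)
{
    uint32_t mask = 0xffffffffU >> (32 - 8 * size);

    return (data & ~(mask << (offset * 8))) |
           ((new_val & mask) << (offset * 8));
}

/* Keeping the low 32 bits: the cast and the mask give the same result. */
static uint64_t low32_by_cast(uint64_t v) { return (uint32_t)v; }
static uint64_t low32_by_mask(uint64_t v) { return v & 0xffffffffU; }
```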




[XEN PATCH v4 1/4] x86/vmx: address violations of MISRA C:2012 Rule 7.2

2023-07-26 Thread Simone Ballarin
From: Gianluca Luparini 

The Xen sources contain violations of MISRA C:2012 Rule 7.2 whose
headline states:
"A 'u' or 'U' suffix shall be applied to all integer constants
that are represented in an unsigned type".

Add the 'U' suffix to integer literals with unsigned type.

For the sake of uniformity, the following changes are made:
- add the 'U' suffix to macros near
  'CPU_BASED_ACTIVATE_SECONDARY_CONTROLS' and
  'SECONDARY_EXEC_NOTIFY_VM_EXITING' macros in 'vmcs.h'
- add the 'U' suffix to macros near 'INTR_INFO_VALID_MASK'
  macro in 'vmx.h'

Signed-off-by: Gianluca Luparini 
Signed-off-by: Simone Ballarin 
Reviewed-by: Stefano Stabellini 
Reviewed-by: Jan Beulich 
---
Changes in v4:
- change commit headline

Changes in v3:
- change 'Signed-off-by' ordering
- change commit message
- remove unnecessary changes in 'vvmx.c'
- add 'uint32_t' casts in 'vvmx.c'
- add missing 'U' in 'vmcs.h' macros
- change macro to '(1u << 31)' in 'vmx.h'
- remove unnecessary changes to 'vmx.h'

Changes in v2:
- minor change to commit title
- change commit message
- remove unnecessary changes in 'vpmu_intel.c' and 'vmx.h'
- add 'ULL' suffix in 'vpmu_intel.c'
- add zero-padding to constants in 'vmx.h'
- add missing 'U' in 'vmx.h'
---
 xen/arch/x86/cpu/vpmu_intel.c   |  2 +-
 xen/arch/x86/hvm/vmx/vmcs.c |  6 +-
 xen/arch/x86/hvm/vmx/vvmx.c |  8 +--
 xen/arch/x86/include/asm/hvm/vmx/vmcs.h | 84 -
 xen/arch/x86/include/asm/hvm/vmx/vmx.h  | 16 ++---
 5 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/xen/arch/x86/cpu/vpmu_intel.c b/xen/arch/x86/cpu/vpmu_intel.c
index fa5b40c65c..6330c89b47 100644
--- a/xen/arch/x86/cpu/vpmu_intel.c
+++ b/xen/arch/x86/cpu/vpmu_intel.c
@@ -945,7 +945,7 @@ const struct arch_vpmu_ops *__init core2_vpmu_init(void)
 fixed_counters_mask = ~((1ull << core2_get_bitwidth_fix_count()) - 1);
 global_ctrl_mask = ~((((1ULL << fixed_pmc_cnt) - 1) << 32) |
  ((1ULL << arch_pmc_cnt) - 1));
-global_ovf_ctrl_mask = ~(0xC000 |
+global_ovf_ctrl_mask = ~(0xC000ULL |
  (((1ULL << fixed_pmc_cnt) - 1) << 32) |
  ((1ULL << arch_pmc_cnt) - 1));
 if ( version > 2 )
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 13719cc923..6cefb88aec 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -911,7 +911,7 @@ void vmx_clear_msr_intercept(struct vcpu *v, unsigned int 
msr,
 if ( type & VMX_MSR_W )
 clear_bit(msr, msr_bitmap->write_low);
 }
-else if ( (msr >= 0xc000) && (msr <= 0xc0001fff) )
+else if ( (msr >= 0xc000U) && (msr <= 0xc0001fffU) )
 {
 msr &= 0x1fff;
 if ( type & VMX_MSR_R )
@@ -939,7 +939,7 @@ void vmx_set_msr_intercept(struct vcpu *v, unsigned int msr,
 if ( type & VMX_MSR_W )
 set_bit(msr, msr_bitmap->write_low);
 }
-else if ( (msr >= 0xc000) && (msr <= 0xc0001fff) )
+else if ( (msr >= 0xc000U) && (msr <= 0xc0001fffU) )
 {
 msr &= 0x1fff;
 if ( type & VMX_MSR_R )
@@ -957,7 +957,7 @@ bool vmx_msr_is_intercepted(struct vmx_msr_bitmap 
*msr_bitmap,
 if ( msr <= 0x1fff )
 return test_bit(msr, is_write ? msr_bitmap->write_low
   : msr_bitmap->read_low);
-else if ( (msr >= 0xc000) && (msr <= 0xc0001fff) )
+else if ( (msr >= 0xc000U) && (msr <= 0xc0001fffU) )
 return test_bit(msr & 0x1fff, is_write ? msr_bitmap->write_high
: msr_bitmap->read_high);
 else
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 16b0ef82b6..b7be424afb 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -263,7 +263,7 @@ uint64_t get_vvmcs_virtual(void *vvmcs, uint32_t 
vmcs_encoding)
 res >>= 32;
 break;
 case VVMCS_WIDTH_32:
-res &= 0x;
+res = (uint32_t)res;
 break;
 case VVMCS_WIDTH_NATURAL:
 default:
@@ -315,14 +315,14 @@ void set_vvmcs_virtual(void *vvmcs, uint32_t 
vmcs_encoding, uint64_t val)
 case VVMCS_WIDTH_64:
 if ( enc.access_type )
 {
-res &= 0x;
+res = (uint32_t)res;
 res |= val << 32;
 }
 else
 res = val;
 break;
 case VVMCS_WIDTH_32:
-res = val & 0x;
+res = (uint32_t)val;
 break;
 case VVMCS_WIDTH_NATURAL:
 default:
@@ -2306,7 +2306,7 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 
*msr_content)
 break;
 case MSR_IA32_VMX_CR0_FIXED1:
 /* allow 0-settings for all bits */
-data = 0x;
+data = 0xU;
 break;
 case MSR_IA32_VMX_CR4_FIXED0:
 /* VMXE bit must be 1 in VMX operation */
diff --git 

[PATCH] libxenstat/Linux: pass nul-terminated string to strpbrk()

2023-07-26 Thread Jan Beulich
While what "tmp" points to has been cleared at the end of the first
iteration of parseNetDevLine()'s main loop, this is too late for the
first iteration's invocation of strpbrk() (copying the interface name).
Properly nul-terminate the string at population time instead, removing
the late clearing.
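
The pattern can be sketched in isolation. The helpers below are hypothetical
and are not the libxenstat code; they only show the idea of terminating the
copy at the moment it is populated, so that strpbrk() never reads past the
copied field.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy line[start, end) into a fresh buffer and
 * terminate it immediately, so that string functions may be used on the
 * copy right away. Returns NULL on allocation failure or if end < start;
 * the caller frees the result.
 */
static char *copy_field(const char *line, size_t start, size_t end)
{
    size_t len, x;
    char *tmp;

    if (end < start)
        return NULL;

    len = end - start;
    tmp = malloc(len + 1);
    if (tmp == NULL)
        return NULL;

    for (x = 0; x < len; x++)
        tmp[x] = line[start + x];
    tmp[len] = '\0';   /* terminate at population time */

    return tmp;
}

/* Skip leading non-alphanumeric characters, e.g. blanks before "eth0". */
static const char *first_alnum(const char *s)
{
    return strpbrk(s, "abcdefghijklmnopqrstuvwxyz"
                      "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");
}
```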

While there, also eliminate a confusing (because of being wrong) comment:
A regex parsing error would be handled one further scope outwards. Here
we're dealing with field 1 vs any of the later fields.

Signed-off-by: Jan Beulich 
---
Of course the function leaves much to be desired.

We had a report of a problem which may have been because of the issue
addressed here. Sadly the reporter hasn't come back with any results, so
I have to submit this largely "blindly".

--- a/tools/libs/stat/xenstat_linux.c
+++ b/tools/libs/stat/xenstat_linux.c
@@ -169,6 +169,7 @@ static int parseNetDevLine(char *line, c
matches[i].rm_so + 1) * 
sizeof(char));
for (x = matches[i].rm_so; x < 
matches[i].rm_eo; x++)
tmp[x - matches[i].rm_so] = line[x];
+   tmp[x - matches[i].rm_so] = 0;
 
/* We populate all the fields from 
/proc/net/dev line */
if (i > 1) {
@@ -225,15 +226,11 @@ static int parseNetDevLine(char *line, c
break;
}
}
-   else
-   /* There were errors when parsing this directly 
in RE. strpbrk() helps */
-   if (iface != NULL) {
+   else if (iface != NULL) {
char *tmp2 = strpbrk(tmp, 
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");
if (tmp2 != NULL)
strcpy(iface, tmp2);
}
-
-   memset(tmp, 0, matches[i].rm_eo - 
matches[i].rm_so);
}
}
}



[PATCH v3 5/5] x86: short-circuit certain cpu_has_* when x86-64-v{2,3} are in effect

2023-07-26 Thread Jan Beulich
Certain fallback code can be made subject to DCE this way. Note that
CX16 has no compiler provided manifest constant, so CONFIG_* are used
there instead. Note also that we don't have cpu_has_movbe nor
cpu_has_lzcnt (aka cpu_has_abm).
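
The effect on fallback code can be sketched outside of Xen. Everything
prefixed ex_ or EX_ below is made up for the example (including the stand-in
for boot_cpu_has()), and GCC or Clang is assumed for __builtin_popcount().
When the compiler defines __POPCNT__, the feature test collapses to a
constant and the software path becomes unreachable, so the compiler is free
to drop it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for Xen's feature bitmap and boot_cpu_has(). */
#define EX_FEATURE_POPCNT 0u
static uint32_t ex_boot_features;

static bool ex_boot_cpu_has(unsigned int feature)
{
    return (ex_boot_features >> feature) & 1u;
}

/* Constant when the compiler already targets POPCNT, runtime test otherwise. */
#ifdef __POPCNT__
#define ex_cpu_has_popcnt true
#else
#define ex_cpu_has_popcnt ex_boot_cpu_has(EX_FEATURE_POPCNT)
#endif

/* Software fallback: clear the lowest set bit until none remain. */
static unsigned int ex_count_bits_slow(uint32_t x)
{
    unsigned int n = 0;

    for ( ; x != 0; x &= x - 1 )
        n++;
    return n;
}

static unsigned int ex_count_bits(uint32_t x)
{
    if ( ex_cpu_has_popcnt )
        return (unsigned int)__builtin_popcount(x);

    /* Unreachable, and hence removable, when __POPCNT__ is defined. */
    return ex_count_bits_slow(x);
}
```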

Signed-off-by: Jan Beulich 
---
Of course we could use IS_ENABLED(CONFIG_X86_64_V) everywhere, but as
CX16 shows this isn't necessarily better than the #if/#else approach
based on compiler-provided manifest symbols. While not really intended
to be used that way, it looks as if we could also use
IS_ENABLED(__POPCNT__) and alike if we thought this would end up neater
(because of avoiding the #ifdef).

We could go further and also short-circuit SSE*, AVX and alike, which we
don't use outside of the emulator. This then would of course call for
also having a way to select x86-64-v4.
---
v2: Also cover XSAVE.

--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -76,15 +76,25 @@ static inline bool boot_cpu_has(unsigned
 #define cpu_has_eist            boot_cpu_has(X86_FEATURE_EIST)
 #define cpu_has_ssse3           boot_cpu_has(X86_FEATURE_SSSE3)
 #define cpu_has_fma             boot_cpu_has(X86_FEATURE_FMA)
-#define cpu_has_cx16            boot_cpu_has(X86_FEATURE_CX16)
+#define cpu_has_cx16            (IS_ENABLED(CONFIG_X86_64_V2) || \
+                                 IS_ENABLED(CONFIG_X86_64_V3) || \
+                                 boot_cpu_has(X86_FEATURE_CX16))
 #define cpu_has_pdcm            boot_cpu_has(X86_FEATURE_PDCM)
 #define cpu_has_pcid            boot_cpu_has(X86_FEATURE_PCID)
 #define cpu_has_sse4_1          boot_cpu_has(X86_FEATURE_SSE4_1)
 #define cpu_has_sse4_2          boot_cpu_has(X86_FEATURE_SSE4_2)
 #define cpu_has_x2apic          boot_cpu_has(X86_FEATURE_X2APIC)
+#ifdef __POPCNT__
+#define cpu_has_popcnt          true
+#else
 #define cpu_has_popcnt          boot_cpu_has(X86_FEATURE_POPCNT)
+#endif
 #define cpu_has_aesni           boot_cpu_has(X86_FEATURE_AESNI)
+#ifdef __XSAVE__
+#define cpu_has_xsave           true
+#else
 #define cpu_has_xsave           boot_cpu_has(X86_FEATURE_XSAVE)
+#endif
 #define cpu_has_avx             boot_cpu_has(X86_FEATURE_AVX)
 #define cpu_has_f16c            boot_cpu_has(X86_FEATURE_F16C)
 #define cpu_has_rdrand          boot_cpu_has(X86_FEATURE_RDRAND)
@@ -114,11 +124,19 @@ static inline bool boot_cpu_has(unsigned
 #define cpu_has_xsaves          boot_cpu_has(X86_FEATURE_XSAVES)
 
 /* CPUID level 0x0007:0.ebx */
+#ifdef __BMI__
+#define cpu_has_bmi1            true
+#else
 #define cpu_has_bmi1            boot_cpu_has(X86_FEATURE_BMI1)
+#endif
 #define cpu_has_hle             boot_cpu_has(X86_FEATURE_HLE)
 #define cpu_has_avx2            boot_cpu_has(X86_FEATURE_AVX2)
 #define cpu_has_smep            boot_cpu_has(X86_FEATURE_SMEP)
+#ifdef __BMI2__
+#define cpu_has_bmi2            true
+#else
 #define cpu_has_bmi2            boot_cpu_has(X86_FEATURE_BMI2)
+#endif
 #define cpu_has_invpcid         boot_cpu_has(X86_FEATURE_INVPCID)
 #define cpu_has_rtm             boot_cpu_has(X86_FEATURE_RTM)
 #define cpu_has_pqe             boot_cpu_has(X86_FEATURE_PQE)




[PATCH v3 4/5] x86: use POPCNT for hweight() when available

2023-07-26 Thread Jan Beulich
This is faster than using the software implementation, and the insn is
available on all half-way recent hardware. Use the respective compiler
builtins when available.
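
As a rough illustration of what is being swapped, the sketch below puts a
textbook parallel bit-summing routine next to builtin-based macros shaped
like the ones in the patch. The ex_ names are made up, the fallback is
written from the well-known algorithm rather than copied from
generic_hweight32(), and GCC or Clang is assumed for __builtin_popcount().
The cast in the 8-bit variant matters: without it, set bits above bit 7 of
the argument would be counted as well.

```c
#include <stdint.h>

/*
 * Classic parallel bit-summing fallback: add up bits in pairs, nibbles
 * and bytes, then sum the four byte counts with one multiplication.
 */
static unsigned int ex_generic_hweight32(uint32_t w)
{
    w -= (w >> 1) & 0x55555555U;
    w  = (w & 0x33333333U) + ((w >> 2) & 0x33333333U);
    w  = (w + (w >> 4)) & 0x0f0f0f0fU;
    return (w * 0x01010101U) >> 24;
}

/* Builtin-based variants, mirroring the shape of the patch's macros. */
#define ex_hweight32(x) ((unsigned int)__builtin_popcount(x))
#define ex_hweight8(x)  ((unsigned int)__builtin_popcount((uint8_t)(x)))
```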

Signed-off-by: Jan Beulich 
Reviewed-by: Jason Andryuk 

--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -475,9 +475,16 @@ static inline int fls(unsigned int x)
  *
  * The Hamming Weight of a number is the total number of bits set in it.
  */
+#ifdef __POPCNT__
+#define hweight64(x) __builtin_popcountll(x)
+#define hweight32(x) __builtin_popcount(x)
+#define hweight16(x) __builtin_popcount((uint16_t)(x))
+#define hweight8(x)  __builtin_popcount((uint8_t)(x))
+#else
 #define hweight64(x) generic_hweight64(x)
 #define hweight32(x) generic_hweight32(x)
 #define hweight16(x) generic_hweight16(x)
 #define hweight8(x) generic_hweight8(x)
+#endif
 
 #endif /* _X86_BITOPS_H */




[PATCH v3 3/5] x86: allow Kconfig control over psABI level

2023-07-26 Thread Jan Beulich
Newer hardware offers more efficient and/or flexible and/or capable
instructions, some of which we can make good use of in the hypervisor
as well. Allow a basic way (no alternatives patching) of enabling their
use. Of course this means that hypervisors thus built won't work
anymore on older, less capable hardware.

Since older compilers (apparently gcc10 / clang11 and older) won't
recognize -march=x86-64-v2 etc, also add fallback logic passing
-mpopcnt and the like explicitly.

Note that in efi_arch_cpu() the filling of 7c0 and 7d0 are forward-
looking; we only require 7b0, but we need to use cpuid_count() anyway.
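
To make the feature requirements tangible, here is a hedged sketch of a
check over raw CPUID register values. It is not the boot code added by this
patch: the function and macro names are invented, and only the subset of
features named in the Kconfig help text is tested. The bit positions are the
architectural ones for CPUID leaf 1 ECX, leaf 7 subleaf 0 EBX ("7b0" above)
and leaf 0x80000001 ECX.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CPUID bit positions. */
#define EX_L1C_CX16    (1U << 13)  /* leaf 1, ECX */
#define EX_L1C_MOVBE   (1U << 22)
#define EX_L1C_POPCNT  (1U << 23)
#define EX_L1C_XSAVE   (1U << 26)
#define EX_L7B_BMI1    (1U << 3)   /* leaf 7 subleaf 0, EBX */
#define EX_L7B_BMI2    (1U << 8)
#define EX_E1C_LZCNT   (1U << 5)   /* leaf 0x80000001, ECX */

/*
 * Hypothetical check: do the given CPUID register values cover the
 * subset of psABI level 2 or 3 that is of interest here? Level 1
 * (baseline) always passes.
 */
static bool ex_psabi_ok(unsigned int level, uint32_t l1c, uint32_t l7b,
                        uint32_t e1c)
{
    const uint32_t v2 = EX_L1C_CX16 | EX_L1C_POPCNT;
    const uint32_t v3_l1c = v2 | EX_L1C_MOVBE | EX_L1C_XSAVE;
    const uint32_t v3_l7b = EX_L7B_BMI1 | EX_L7B_BMI2;

    if ( level >= 3 )
        return (l1c & v3_l1c) == v3_l1c &&
               (l7b & v3_l7b) == v3_l7b &&
               (e1c & EX_E1C_LZCNT) != 0;
    if ( level == 2 )
        return (l1c & v2) == v2;
    return true;
}
```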

Signed-off-by: Jan Beulich 
---
TBD: While the added assembly code goes strictly from the CONFIG_*
 settings, EFI code uses the compiler provided manifest constants
 where available (just like the subsequent "x86: short-circuit
 certain cpu_has_* when x86-64-v{2,3} are in effect"). While I
 generally prefer it that way, this comes with a downside: If we
 ever started to use one of the ISA extensions in assembly code
 (and then keyed to the CONFIG_* settings), things could break. I
 don't view us doing this as very likely though, as then we'd need
 to have two (or more) variants of such assembly code, which we
 would likely want to avoid. (What I'd like to avoid is using the
 compiler manifest constants in assembly code: In principle we ought
 to filter out any -march= when constructing AFLAGS, and perhaps
 at least all -m... and -f... options. Compilers might rightfully
 complain about their use as inapplicable, much like we've seen for
 -Wl,... when no linking is done.)

TBD: While we don't document most "cpuid=no-*" options (and hence imply
 their use to be unsupported), using e.g. "cpuid=no-popcnt" with a
 V2=y hypervisor clearly can't have the intended effect, and hence
 might perhaps better be flagged in some way.

TBD: v2 also includes LAHF/SAHF. Since we don't use floating point and
 hence FPU insns, we ought to be okay not explicitly checking for
 it. But there is a certain amount of risk of the compiler finding
 some "smart" use for one or both of the insns. However, if we were
 to check the feature, we'd need to account for the quirk that
 init_amd() also works around.

Whereas the baseline -> v2 step isn't much of a difference (we'll gain
more there by a subsequent patch), v2 -> v3, while presumably (or shall
I say hopefully) faster, yields an overall growth of .text size by (in
my build) about 2k. The primary reason for this appears to be conversions
of SHL-by-immediate to SHLX.

The VGA output attempt in early (MB/MB2) boot code does not appear to
work (anymore?). The serial output may work, but only if - without any
setup in Xen - both sides agree on the serial settings (baud rate etc).
Hence the feature checks added on the legacy paths are of limited use.
---
v3: Add fallback logic and use logic from new prereq change.
v2: Also cover XSAVE. Add early boot feature checking.

--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -118,6 +118,36 @@ config HVM
 
  If unsure, say Y.
 
+choice
+   prompt "base psABI level"
+   default X86_64_BASELINE
+   help
+ The psABI defines 4 levels of ISA extension sets as a coarse granular
+ way of identifying advanced functionality that would be uniformly
+ available in respectively newer hardware.  While v4 is not really of
+ interest for Xen, the others can be selected here, making the
+ resulting Xen no longer work on older hardware.  This option won't
+ have any effect if the toolchain doesn't support the distinction.
+
+ If unsure, stick to the default.
+
+config X86_64_BASELINE
+   bool "baseline"
+
+config X86_64_V2
+   bool "v2"
+   help
+ This enables POPCNT and CX16, besides other extensions which are of
+ no interest here.
+
+config X86_64_V3
+   bool "v3"
+   help
+ This enables BMI, BMI2, LZCNT, MOVBE, and XSAVE, besides other
+ extensions which are of no interest here.
+
+endchoice
+
 config XEN_SHSTK
bool "Supervisor Shadow Stacks"
depends on HAS_AS_CET_SS
--- a/xen/arch/x86/arch.mk
+++ b/xen/arch/x86/arch.mk
@@ -36,6 +36,29 @@ CFLAGS += -mno-red-zone -fpic
 # the SSE setup for variadic function calls.
 CFLAGS += -mno-mmx -mno-sse $(call cc-option,$(CC),-mskip-rax-setup)
 
+# Enable the selected baseline ABI, if supported by the compiler.
+x86-64-$(CONFIG_X86_64_BASELINE) :=
+x86-64-$(CONFIG_X86_64_V2) := x86-64-v2
+x86-64-$(CONFIG_X86_64_V3) := x86-64-v3
+
+ifneq ($(x86-64-y),)
+CFLAGS-psabi := $(call cc-option,$(CC),-march=$(x86-64-y))
+
+ifeq ($(CFLAGS-psabi),)
+# Fall back to using the subset of -m... options that are of interest.
+x86-64-v2 := -mpopcnt -mcx16
+x86-64-v3 := $(x86-64-v2) -mbmi -mbmi2 -mlzcnt -mmovbe -mxsave
+$(call cc-options-add,CFLAGS-psabi,CC,$($(x86-64-y)))
+
+ifneq ($(strip 

[PATCH v3 2/5] build: permit Kconfig control over how to deal with unsatisfiable choices

2023-07-26 Thread Jan Beulich
Some options we allow the build admin to select may require new enough
tool chain components to fulfill (partly or entirely). Provide yet
another control to pick what action to take at the end of the build
process - be silent about this, warn, or fail the build.

Signed-off-by: Jan Beulich 
---
This may not be fine grained enough: Optimization settings (like those added
by "x86: allow Kconfig control over psABI level") may want dealing with
differently than security relevant ones (like XEN_SHSTK or XEN_IBT).

Whether to do this uniformly at the end of the build is up for
discussion: In the "warn" case we will want the resulting output late,
so it is more likely to be noticed. In the "fail build" case though we
may want the failure to occur early.
---
v3: New.

--- a/xen/Kconfig
+++ b/xen/Kconfig
@@ -64,6 +64,25 @@ config UNSUPPORTED
  preview features as defined by SUPPORT.md. (Note that if an option
  doesn't depend on UNSUPPORTED it doesn't imply that is supported.)
 
+choice
+   prompt "How to deal with settings which cannot be satisfied"
+   default UNSATISFIED_WARNING
+   help
+ Some selectable options may depend on e.g. tool chain functionality.
+ Select here how to deal with such when actually building a such
+ configured hypervisor.
+
+config UNSATISFIED_SILENT
+   bool "silent"
+
+config UNSATISFIED_WARNING
+   bool "emit warnings"
+
+config UNSATISFIED_ERROR
+   bool "fail the build"
+
+endchoice
+
 config LTO
bool "Link Time Optimisation"
depends on BROKEN
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -588,6 +588,10 @@ $(TARGET): outputmakefile FORCE
$(Q)$(MAKE) $(build)=arch/$(SRCARCH) include
$(Q)$(MAKE) $(build)=. arch/$(SRCARCH)/include/asm/asm-offsets.h
$(Q)$(MAKE) $(build)=. MKRELOC=$(MKRELOC) 'ALL_OBJS=$(ALL_OBJS-y)' 
'ALL_LIBS=$(ALL_LIBS-y)' $@
+ifneq ($(CONFIG_UNSATISFIED_SILENT),y)
+   $(Q)$(if $(strip $(XEN_CONFIG_UNSATISFIED)),$(MAKE),:) \
+   $(build)=. 'XEN_CONFIG_UNSATISFIED=$(XEN_CONFIG_UNSATISFIED)' 
check_unsatisfied
+endif
 
 SUBDIRS = xsm arch common crypto drivers lib test
 define all_sources
--- a/xen/build.mk
+++ b/xen/build.mk
@@ -88,3 +88,11 @@ targets += prelink.o
 
 $(TARGET): prelink.o FORCE
$(Q)$(MAKE) $(build)=arch/$(SRCARCH) $@
+
+PHONY += check_unsatisfied
+check_unsatisfied:
+   $(Q): $(if $(filter y,$(CONFIG_UNSATISFIED_WARNING)), \
+  $(warning The following selections could not be satisfied:), 
\
+  $(shell echo 'The following selections could not be 
satisfied:' >&2)) \
+ $(foreach c,$(sort $(XEN_CONFIG_UNSATISFIED)),$(shell echo ' - 
CONFIG_$c' >&2)) \
+ $(if $(filter y,$(CONFIG_UNSATISFIED_ERROR)),$(error Failing 
build))




[PATCH v3 1/5] build: make cc-option properly deal with unrecognized sub-options

2023-07-26 Thread Jan Beulich
In options like -march=, it may be only the sub-option which is
unrecognized by the compiler. In such an event the error message often
splits option and argument, typically saying something like "bad value
'<argument>' for '<option>'". Extend the grep invocation accordingly,
also accounting for Clang to not mention e.g. -march at all when an
incorrect argument was given for it.

To keep things halfway readable, re-wrap and re-indent the entire
construct.

Signed-off-by: Jan Beulich 
---
In principle -e "$$pat" could now be omitted from the grep invocation,
since if that matches, both $$opt and $$arg will, too. But I thought I'd
leave it for completeness.
---
v3: Fix build with make 4.3 and newer, where the treatment of \# has
changed.
v2: Further relax grep patterns for clang, which doesn't mention -march
when complaining about an invalid argument to it.
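
For readers less used to shell parameter expansion, the following is a minimal C sketch of what the new cc-option does with the option text: split it at the first '=' and then look for the whole option, the option name, or the sub-option in the compiler's output. The helper names (split_option, diag_mentions) are hypothetical and not part of the patch, and plain substring matching stands in for grep's regular expressions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the two shell expansions in the patch:
 *   opt="${pat%%=*}"   text before the first '='
 *   arg="${pat#*=}"    text after the first '='
 * Without an '=' both expansions yield the whole pattern.
 */
static void split_option(const char *pat, char *opt, char *arg, size_t len)
{
    const char *eq;

    assert(pat != NULL && opt != NULL && arg != NULL && len > 0);

    eq = strchr(pat, '=');
    if (eq == NULL) {
        snprintf(opt, len, "%s", pat);
        snprintf(arg, len, "%s", pat);
        return;
    }
    snprintf(opt, len, "%.*s", (int)(eq - pat), pat);
    snprintf(arg, len, "%s", eq + 1);
}

/*
 * True if the compiler output mentions the full option, the option name,
 * or the sub-option. Substring search is an assumption of this sketch;
 * the Makefile uses grep with three -e patterns.
 */
static bool diag_mentions(const char *diag, const char *pat)
{
    char opt[128], arg[128];

    split_option(pat, opt, arg, sizeof(opt));
    return strstr(diag, pat) != NULL ||
           strstr(diag, opt) != NULL ||
           strstr(diag, arg) != NULL;
}
```

With this model, a diagnostic naming only the sub-option (the Clang case) or only the option name is treated as "unsupported", while the unrelated warning provoked by the 'void*p=1;' test input is not.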

--- a/Config.mk
+++ b/Config.mk
@@ -8,6 +8,7 @@ endif
 comma   := ,
 open:= (
 close   := )
+sharp   := \#
 squote  := '
 #' Balancing squote, to help syntax highlighting
 empty   :=
@@ -90,9 +91,14 @@ PYTHON_PREFIX_ARG ?= --prefix="$(prefix)
 # of which would indicate an "unrecognized command-line option" warning/error.
 #
 # Usage: cflags-y += $(call cc-option,$(CC),-march=winchip-c6,-march=i586)
-cc-option = $(shell if test -z "`echo 'void*p=1;' | \
-  $(1) $(2) -c -o /dev/null -x c - 2>&1 | grep -- $(2:-Wa$(comma)%=%) -`"; \
-  then echo "$(2)"; else echo "$(3)"; fi ;)
+cc-option = $(shell pat='$(2:-Wa$(comma)%=%)'; \
+opt="$${pat%%=*}" arg="$${pat$(sharp)*=}"; \
+if test -z "`echo 'void*p=1;' | \
+ $(1) $(2) -c -o /dev/null -x c - 2>&1 | \
+ grep -e "$$pat" -e "$$opt" -e "$$arg" -`"; \
+then echo "$(2)"; \
+else echo "$(3)"; \
+fi;)
 
 # cc-option-add: Add an option to compilation flags, but only if supported.
 # Usage: $(call cc-option-add CFLAGS,CC,-march=winchip-c6)




[PATCH v3 0/5] x86: allow Kconfig control over psABI level

2023-07-26 Thread Jan Beulich
As discussed in Prague, with some minor add-ons (patch 2 replacing a much
older patch under the same title).

v3 addresses issues found as well as review feedback; see individual
patches for details. Patch 3 in particular continues to raise several
questions as well, though.

1: build: make cc-option properly deal with unrecognized sub-options
2: build: permit Kconfig control over how to deal with unsatisfiable choices
3: allow Kconfig control over psABI level
4: use POPCNT for hweight() when available
5: short-circuit certain cpu_has_* when x86-64-v{2,3} are in effect

Jan



[libvirt test] 182004: tolerable FAIL - PUSHED

2023-07-26 Thread osstest service owner
flight 182004 libvirt real [real]
flight 182020 libvirt real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/182004/
http://logs.test-lab.xenproject.org/osstest/logs/182020/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt-raw 13 guest-start fail pass in 182020-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail in 182020 like 181966
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail in 182020 never pass
 test-armhf-armhf-libvirt 16 saverestore-support-check    fail  like 181966
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 181966
 test-amd64-amd64-libvirt 15 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt 15 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-check        fail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-check    fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check        fail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check        fail never pass

version targeted for testing:
 libvirt  411cbe7199ce533ae5fa78f5558dddca6f88ef1a
baseline version:
 libvirt  6c3ef66cc6d60d35e86206fd7b78178c4d0cbb88

Last test of basis   181966  2023-07-22 04:21:55 Z    4 days
Testing same since   182004  2023-07-25 04:20:37 Z    1 days    1 attempts


People who touched revisions under test:
  Oleg Vasilev 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-arm64-pvops  pass
 build-armhf-pvops  pass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pair   pass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw fail
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on 

Re: [PATCH v5 1/2] tools/console: Add escape argument to configure escape character

2023-07-26 Thread Hongda Deng



On 2023/7/12 18:29, Peter Hoyes wrote:

From: Peter Hoyes 

Dom0 may be accessed via telnet, meaning the default escape character
(which is the same as telnet's) cannot be directly used to exit the
console. It would be helpful to make the escape character customizable
in such use cases.

Add --escape argument to console tool for this purpose.

Add argument to getopt options, parse and validate the escape character
and pass value to console_loop.

If --escape is not specified, it falls back to the existing behavior
using DEFAULT_ESCAPE_SEQUENCE.

Signed-off-by: Peter Hoyes 
---
Changes in v5:
- Add this changelog

Changes in v4:
- Improve validation of the escape_character optarg

Changes in v3:
- Re-add the Reviewed-By tag accidentally removed in v2

Changes in v2:
- Drop the tags intended only for internal use at Arm

  tools/console/client/main.c | 21 +
  1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/tools/console/client/main.c b/tools/console/client/main.c
index 6775006488..d2dcc3ddca 100644
--- a/tools/console/client/main.c
+++ b/tools/console/client/main.c
@@ -42,7 +42,7 @@
  #include 
  #include "xenctrl.h"
  
-#define ESCAPE_CHARACTER 0x1d

+#define DEFAULT_ESCAPE_CHARACTER 0x1d
  
  static volatile sig_atomic_t received_signal = 0;

  static char lockfile[sizeof (XEN_LOCK_DIR "/xenconsole.") + 8] = { 0 };
@@ -77,6 +77,7 @@ static void usage(const char *program) {
   "  -n, --num N  use console number N\n"
   "  --type TYPE  console type. must be 'pv', 'serial' or 
'vuart'\n"
   "  --start-notify-fd N file descriptor used to notify parent\n"
+  "  --escape E   escape sequence to exit console\n"
   , program);
  }
  
@@ -174,7 +175,7 @@ static void restore_term(int fd, struct termios *old)

  }
  
  static int console_loop(int fd, struct xs_handle *xs, char *pty_path,

-   bool interactive)
+   bool interactive, char escape_character)
  {
int ret, xs_fd = xs_fileno(xs), max_fd = -1;
  
@@ -215,7 +216,7 @@ static int console_loop(int fd, struct xs_handle *xs, char *pty_path,

char msg[60];
  
  			len = read(STDIN_FILENO, msg, sizeof(msg));

-   if (len == 1 && msg[0] == ESCAPE_CHARACTER) {
+   if (len == 1 && msg[0] == escape_character) {
return 0;
}
  
@@ -335,6 +336,7 @@ int main(int argc, char **argv)

{ "help",0, 0, 'h' },
{ "start-notify-fd", 1, 0, 's' },
{ "interactive", 0, 0, 'i' },
+   { "escape",  1, 0, 'e' },
{ 0 },
  
  	};

@@ -345,6 +347,7 @@ int main(int argc, char **argv)
console_type type = CONSOLE_INVAL;
bool interactive = 0;
const char *console_names = "serial, pv, vuart";
+   char escape_character = DEFAULT_ESCAPE_CHARACTER;
  
  	while((ch = getopt_long(argc, argv, sopt, lopt, &opt_ind)) != -1) {

switch(ch) {
@@ -375,6 +378,16 @@ int main(int argc, char **argv)
case 'i':
interactive = 1;
break;
+   case 'e':
+   if (optarg[0] == '^' && optarg[1] && optarg[2] == '\0')
+   escape_character = optarg[1] & 0x1f;
+   else if (optarg[0] && optarg[1] == '\0')
+   escape_character = optarg[0];
+   else {
+   fprintf(stderr, "Invalid escape argument\n");
+   exit(EINVAL);
+   }
+   break;
default:
fprintf(stderr, "Invalid argument\n");
fprintf(stderr, "Try `%s --help' for more information.\n",
@@ -493,7 +506,7 @@ int main(int argc, char **argv)
close(start_notify_fd);
}
  
-	console_loop(spty, xs, path, interactive);

+   console_loop(spty, xs, path, interactive, escape_character);
  
  	free(path);

free(dom_path);


Nice work~

Reviewed-by: Hongda Deng 
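
As a side note for anyone wanting to try the accepted forms of --escape without building the tool, the validation in the 'e' case above can be sketched as a standalone function. The function name parse_escape and the -1 error return are assumptions of this sketch; the patch itself prints an error and calls exit(EINVAL).

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_ESCAPE_CHARACTER 0x1d

/*
 * Hypothetical standalone version of the --escape parsing in the patch.
 * Accepts either "^X" (control character, X masked with 0x1f) or a single
 * literal character. Returns the escape character, or -1 if invalid.
 */
static int parse_escape(const char *arg)
{
    assert(arg != NULL);

    if (arg[0] == '^' && arg[1] && arg[2] == '\0')
        return arg[1] & 0x1f;          /* e.g. "^]" yields 0x1d */
    if (arg[0] && arg[1] == '\0')
        return (unsigned char)arg[0];  /* e.g. "q" yields 'q' */
    return -1;                         /* empty or too long */
}
```

Under this reading, "^]" reproduces DEFAULT_ESCAPE_CHARACTER, and a lone "^" is accepted as the literal caret because it falls through to the single-character branch.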




Re: [PATCH v2 4/5] x86/iommu: pass full IO-APIC RTE for remapping table update

2023-07-26 Thread Roger Pau Monné
On Wed, Jul 19, 2023 at 12:37:47PM +0200, Jan Beulich wrote:
> On 18.07.2023 14:43, Roger Pau Monne wrote:
> > @@ -439,36 +427,47 @@ unsigned int cf_check io_apic_read_remap_rte(
> >  }
> >  
> >  void cf_check io_apic_write_remap_rte(
> > -unsigned int apic, unsigned int reg, unsigned int value)
> > +unsigned int apic, unsigned int pin, uint64_t raw)
> >  {
> > -unsigned int pin = (reg - 0x10) / 2;
> > +struct IO_xAPIC_route_entry rte = { .raw = raw };
> >  struct IO_xAPIC_route_entry old_rte = { };
> >  struct IO_APIC_route_remap_entry *remap_rte;
> > -unsigned int rte_upper = (reg & 1) ? 1 : 0;
> >  struct vtd_iommu *iommu = ioapic_to_iommu(IO_APIC_ID(apic));
> > -int saved_mask;
> > -
> > -old_rte = __ioapic_read_entry(apic, pin, true);
> > +bool masked = true;
> > +int rc;
> >  
> >  remap_rte = (struct IO_APIC_route_remap_entry *) &old_rte;
> >  
> > -/* mask the interrupt while we change the intremap table */
> > -saved_mask = remap_rte->mask;
> > -remap_rte->mask = 1;
> > -__io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
> > -remap_rte->mask = saved_mask;
> > -
> > -if ( ioapic_rte_to_remap_entry(iommu, apic, pin,
> > -   &old_rte, rte_upper, value) )
> > +if ( !cpu_has_cx16 )
> >  {
> > -__io_apic_write(apic, reg, value);
> > +   /*
> > +* Cannot atomically update the IRTE entry: mask the IO-APIC pin to
> > +* avoid interrupts seeing an inconsistent IRTE entry.
> > +*/
> > +old_rte = __ioapic_read_entry(apic, pin, true);
> > +if ( !old_rte.mask )
> > +{
> > +masked = false;
> > +old_rte.mask = 1;
> > +__ioapic_write_entry(apic, pin, true, old_rte);
> > +}
> > +}
> >  
> > -/* Recover the original value of 'mask' bit */
> > -if ( rte_upper )
> > -__io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
> > +rc = ioapic_rte_to_remap_entry(iommu, apic, pin, &old_rte, rte);
> 
> I realize it has been like this before, but passing &old_rte here is
> odd. We already have its properly typed alias: remap_rte. All the
> called function does is do the same type cast again. Question is
> whether ...
> 
> > +if ( rc )
> > +{
> > +if ( !masked )
> > +{
> > +/* Recover the original value of 'mask' bit */
> > +old_rte.mask = 0;
> > +__ioapic_write_entry(apic, pin, true, old_rte);
> > +}
> > +dprintk(XENLOG_ERR VTDPREFIX,
> > +"failed to update IRTE for IO-APIC#%u pin %u: %d\n",
> > +apic, pin, rc);
> > +return;
> >  }
> > -else
> > -__ioapic_write_entry(apic, pin, true, old_rte);
> > +__ioapic_write_entry(apic, pin, true, old_rte);
> 
> ... the further uses of old_rte then won't end up yet more confusing
> than they already are (first and foremost again because of "old" not
> being applicable here).

I've instead opted to remove remap_rte from io_apic_write_remap_rte(),
as it was unused.  I've also added a comment to clarify the usage of
old_rte when ioapic_rte_to_remap_entry() returns success.

Thanks, Roger.



Re: [XEN PATCH v2] xen: use parameter name 'mcs' in arch_do_multicall_call()

2023-07-26 Thread Jan Beulich
On 26.07.2023 11:34, Federico Serafini wrote:
> Make function declaration and definition consistent using the same
> parameter name ('mcs' to denote a pointer to an 'mc_state').
> This addresses a violation of MISRA C:2012 Rule 8.3: "All declarations
> of an object or function shall use the same names and type qualifiers".
> 
> Signed-off-by: Federico Serafini 

Acked-by: Jan Beulich 
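
For context, what Rule 8.3 asks for can be shown with a small self-contained sketch. The struct layout and the function body below are stand-ins invented for illustration; only the function name, the enum constants, and the parameter name 'mcs' come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in type for the sketch; the real struct mc_state differs. */
struct mc_state {
    int call;
};

enum mc_disposition {
    mc_continue,
    mc_exit,
    mc_preempt,
};

/* Declaration: the parameter is named 'mcs'. */
enum mc_disposition arch_do_multicall_call(struct mc_state *mcs);

/*
 * Definition: uses the same parameter name as the declaration, which is
 * what Rule 8.3 requires. Naming it 'state' here would still compile,
 * but would be flagged as a violation.
 */
enum mc_disposition arch_do_multicall_call(struct mc_state *mcs)
{
    assert(mcs != NULL);
    return mcs->call ? mc_continue : mc_exit;
}
```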





[PATCH v2] x86/cpu-policy: Advertise MSR_ARCH_CAPS to guests by default

2023-07-26 Thread Andrew Cooper
With xl/libxl now able to control the policy bits for MSR_ARCH_CAPS, it is
safe to advertise to guests by default.  In turn, we don't need the special
case to expose details to dom0.

This advertises MSR_ARCH_CAPS to guests on *all* Intel hardware, even if the
register content ends up being empty.

  - Advertising ARCH_CAPS and not RSBA signals "retpoline is safe here and
everywhere you might migrate to".  This is important because it avoids the
guest kernel needing to rely on model checks.

  - Alternatively, levelling for safety across the Broadwell/Skylake divide
requires advertising ARCH_CAPS and RSBA, meaning "retpoline not safe on
some hardware you might migrate to".

On Cascade Lake and later hardware, guests can now see RDCL_NO (not vulnerable
to Meltdown) amongst others.  This causes substantial performance
improvements, as guests are no longer applying software mitigations in cases
where they don't need to.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

Not to go in before Roger's libxl changes...

v2:
 * Extend the commit message.
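
To make the two bullet points in the description concrete, here is a hedged sketch of how a guest kernel might interpret what it is shown. The struct, enum, and function names are hypothetical; the bit positions follow the architectural layout of IA32_ARCH_CAPABILITIES (RDCL_NO is bit 0, RSBA is bit 2). Real guest kernels consider more inputs than this, so treat it only as a model of the statements above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARCH_CAPS_RDCL_NO (UINT64_C(1) << 0)  /* not vulnerable to Meltdown */
#define ARCH_CAPS_RSBA    (UINT64_C(1) << 2)  /* RSB alternate behaviour */

/* Hypothetical view of what the guest can see. */
struct guest_view {
    bool has_arch_caps;   /* CPUID advertises MSR_ARCH_CAPS */
    uint64_t arch_caps;   /* MSR content, only meaningful if advertised */
};

enum retpoline_verdict {
    RETPOLINE_SAFE,         /* ARCH_CAPS advertised, RSBA clear */
    RETPOLINE_UNSAFE,       /* ARCH_CAPS advertised, RSBA set */
    RETPOLINE_MODEL_CHECK,  /* no ARCH_CAPS: fall back to model checks */
};

static enum retpoline_verdict classify_retpoline(const struct guest_view *v)
{
    assert(v != NULL);

    if (!v->has_arch_caps)
        return RETPOLINE_MODEL_CHECK;
    return (v->arch_caps & ARCH_CAPS_RSBA) ? RETPOLINE_UNSAFE
                                           : RETPOLINE_SAFE;
}

/* A guest can skip Meltdown mitigations only if it can see RDCL_NO. */
static bool needs_meltdown_mitigation(const struct guest_view *v)
{
    assert(v != NULL);
    return !(v->has_arch_caps && (v->arch_caps & ARCH_CAPS_RDCL_NO));
}
```

This also shows why advertising the MSR with empty content is still useful: the "advertised, RSBA clear" case carries information that the "not advertised" case does not.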
---
 xen/arch/x86/cpu-policy.c   | 11 ---
 xen/include/public/arch-x86/cpufeatureset.h |  2 +-
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index f40eeb8be8dc..1f954d4e5940 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -888,17 +888,6 @@ void __init init_dom0_cpuid_policy(struct domain *d)
 if ( cpu_has_itsc )
 p->extd.itsc = true;
 
-/*
- * Expose the "hardware speculation behaviour" bits of ARCH_CAPS to dom0,
- * so dom0 can turn off workarounds as appropriate.  Temporary, until the
- * domain policy logic gains a better understanding of MSRs.
- */
-if ( is_hardware_domain(d) && cpu_has_arch_caps )
-{
-p->feat.arch_caps = true;
-p->arch_caps.raw = host_cpu_policy.arch_caps.raw;
-}
-
 /* Apply dom0-cpuid= command line settings, if provided. */
 if ( dom0_cpuid_cmdline )
 {
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index ce7407d6a10c..6d20810cb9d1 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -271,7 +271,7 @@ XEN_CPUFEATURE(AVX512_FP16,   9*32+23) /*A  AVX512 FP16 instructions */
 XEN_CPUFEATURE(IBRSB, 9*32+26) /*A  IBRS and IBPB support (used by Intel) */
 XEN_CPUFEATURE(STIBP, 9*32+27) /*A  STIBP */
 XEN_CPUFEATURE(L1D_FLUSH, 9*32+28) /*S  MSR_FLUSH_CMD and L1D flush. */
-XEN_CPUFEATURE(ARCH_CAPS, 9*32+29) /*!a IA32_ARCH_CAPABILITIES MSR */
+XEN_CPUFEATURE(ARCH_CAPS, 9*32+29) /*!A IA32_ARCH_CAPABILITIES MSR */
 XEN_CPUFEATURE(CORE_CAPS, 9*32+30) /*   IA32_CORE_CAPABILITIES MSR */
 XEN_CPUFEATURE(SSBD,  9*32+31) /*A  MSR_SPEC_CTRL.SSBD available */
 
-- 
2.30.2




Re: [PATCH v8 02/13] vpci: use per-domain PCI lock to protect vpci structure

2023-07-26 Thread Roger Pau Monné
On Wed, Jul 26, 2023 at 01:17:58AM +, Volodymyr Babchuk wrote:
> 
> Hi Roger,
> 
> Roger Pau Monné  writes:
> 
> > On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote:
> >> From: Oleksandr Andrushchenko 
> >> @@ -498,6 +537,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, 
> >> unsigned int size,
> >>  ASSERT(data_offset < size);
> >>  }
> >>  spin_unlock(&pdev->vpci->lock);
> >> +unlock_locks(d);
> >
> > There's one issue here, some handlers will call pcidevs_lock(), which
> > will result in a lock order inversion, as in the previous patch we
> > agreed that the locking order was pcidevs_lock first, d->pci_lock
> > after.
> >
> > For example the MSI control_write() handler will call
> > vpci_msi_arch_enable() which takes the pcidevs lock.  I think I will
> > have to look into using a dedicated lock for MSI related handling, as
> > that's the only place where I think we have this pattern of taking the
> > pcidevs_lock after the d->pci_lock.
> 
> I'll mention this in the commit message. Is there something else that I
> should do right now?

Well, I don't think we want to commit this as-is with a known lock
inversion.

The functions that require the pcidevs lock are:

pt_irq_{create,destroy}_bind()
unmap_domain_pirq()

AFAICT those functions require the lock in order to assert that the
underlying device doesn't go away, as they do also use d->event_lock
in order to get exclusive access to the data fields.  Please double
check that I'm not mistaken.

If that's accurate you will have to check the call tree that spawns
from those functions in order to modify the asserts to check for
either the pcidevs or the per-domain pci_list lock being taken.

Thanks, Roger.
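
To illustrate the kind of assertion change meant above, here is a toy model, not Xen code: locks are reduced to a 'held' flag and all names (toy_lock, toy_domain, pci_state_protected, may_take_pcidevs) are made up for the sketch. It shows an assert that accepts either lock, plus a check for the agreed order (pcidevs lock first, d->pci_lock second).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: only tracks whether it is held (single-threaded model). */
struct toy_lock {
    bool held;
};

static void toy_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = true;
}

static void toy_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Global lock, to be taken first according to the agreed order. */
static struct toy_lock pcidevs_lock;

/* Per-domain lock, to be taken second. */
struct toy_domain {
    struct toy_lock pci_lock;
};

/* Relaxed check: either lock keeps the device from going away. */
static bool pci_state_protected(const struct toy_domain *d)
{
    return pcidevs_lock.held || d->pci_lock.held;
}

/* Taking pcidevs_lock while holding d->pci_lock would invert the order. */
static bool may_take_pcidevs(const struct toy_domain *d)
{
    return !d->pci_lock.held;
}

/* Stand-in for a function that used to assert on pcidevs_lock only. */
static void unmap_pirq_model(const struct toy_domain *d)
{
    assert(pci_state_protected(d));
    /* ... work on the device would go here ... */
}
```

In this model a vPCI handler that already holds d->pci_lock can call unmap_pirq_model() directly, instead of taking pcidevs_lock in the wrong order.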



[XEN PATCH v2] xen: use parameter name 'mcs' in arch_do_multicall_call()

2023-07-26 Thread Federico Serafini
Make function declaration and definition consistent using the same
parameter name ('mcs' to denote a pointer to an 'mc_state').
This addresses a violation of MISRA C:2012 Rule 8.3: "All declarations
of an object or function shall use the same names and type qualifiers".

Signed-off-by: Federico Serafini 
---
Changes in v2:
  - added forgotten changes about x86.
---
 xen/arch/arm/traps.c| 4 ++--
 xen/arch/x86/hypercall.c| 6 +++---
 xen/include/xen/multicall.h | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index d1ef787638..eab2fcfc02 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1470,9 +1470,9 @@ static bool check_multicall_32bit_clean(struct 
multicall_entry *multi)
 return true;
 }
 
-enum mc_disposition arch_do_multicall_call(struct mc_state *state)
+enum mc_disposition arch_do_multicall_call(struct mc_state *mcs)
 {
-struct multicall_entry *multi = &state->call;
+struct multicall_entry *multi = &mcs->call;
 
 if ( multi->op >= ARRAY_SIZE(hypercall_args) )
 {
diff --git a/xen/arch/x86/hypercall.c b/xen/arch/x86/hypercall.c
index cc9a6a9d59..01cd73040d 100644
--- a/xen/arch/x86/hypercall.c
+++ b/xen/arch/x86/hypercall.c
@@ -203,15 +203,15 @@ int hypercall_xlat_continuation(unsigned int *id, 
unsigned int nr,
 return rc;
 }
 
-enum mc_disposition arch_do_multicall_call(struct mc_state *state)
+enum mc_disposition arch_do_multicall_call(struct mc_state *mcs)
 {
 const struct domain *currd = current->domain;
 
 if ( is_pv_domain(currd) )
-return pv_do_multicall_call(state);
+return pv_do_multicall_call(mcs);
 
 if ( is_hvm_domain(currd) )
-return hvm_do_multicall_call(state);
+return hvm_do_multicall_call(mcs);
 
 return mc_exit;
 }
diff --git a/xen/include/xen/multicall.h b/xen/include/xen/multicall.h
index ac8238660a..5bebeca7b2 100644
--- a/xen/include/xen/multicall.h
+++ b/xen/include/xen/multicall.h
@@ -25,6 +25,6 @@ enum mc_disposition {
 mc_continue,
 mc_exit,
 mc_preempt,
-} arch_do_multicall_call(struct mc_state *mc);
+} arch_do_multicall_call(struct mc_state *mcs);
 
 #endif /* __XEN_MULTICALL_H__ */
-- 
2.34.1




Re: [PATCH v2 28/47] bcache: dynamically allocate the md-bcache shrinker

2023-07-26 Thread Qi Zheng




On 2023/7/26 15:32, Muchun Song wrote:



On 2023/7/24 17:43, Qi Zheng wrote:

In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the md-bcache shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct cache_set.

Signed-off-by: Qi Zheng 
---
  drivers/md/bcache/bcache.h |  2 +-
  drivers/md/bcache/btree.c  | 27 ---
  drivers/md/bcache/sysfs.c  |  3 ++-
  3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 5a79bb3c272f..c622bc50f81b 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -541,7 +541,7 @@ struct cache_set {
  struct bio_set    bio_split;
  /* For the btree cache */
-    struct shrinker    shrink;
+    struct shrinker    *shrink;
  /* For the btree cache and anything allocation related */
  struct mutex    bucket_lock;
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index fd121a61f17c..c176c7fc77d9 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -667,7 +667,7 @@ static int mca_reap(struct btree *b, unsigned int 
min_order, bool flush)

  static unsigned long bch_mca_scan(struct shrinker *shrink,
    struct shrink_control *sc)
  {
-    struct cache_set *c = container_of(shrink, struct cache_set, 
shrink);

+    struct cache_set *c = shrink->private_data;
  struct btree *b, *t;
  unsigned long i, nr = sc->nr_to_scan;
  unsigned long freed = 0;
@@ -734,7 +734,7 @@ static unsigned long bch_mca_scan(struct shrinker 
*shrink,

  static unsigned long bch_mca_count(struct shrinker *shrink,
 struct shrink_control *sc)
  {
-    struct cache_set *c = container_of(shrink, struct cache_set, 
shrink);

+    struct cache_set *c = shrink->private_data;
  if (c->shrinker_disabled)
  return 0;
@@ -752,8 +752,8 @@ void bch_btree_cache_free(struct cache_set *c)
  closure_init_stack(&cl);
-    if (c->shrink.list.next)
-    unregister_shrinker(&c->shrink);
+    if (c->shrink)
+    shrinker_unregister(c->shrink);
  mutex_lock(&c->bucket_lock);
@@ -828,14 +828,19 @@ int bch_btree_cache_alloc(struct cache_set *c)
  c->verify_data = NULL;
  #endif
-    c->shrink.count_objects = bch_mca_count;
-    c->shrink.scan_objects = bch_mca_scan;
-    c->shrink.seeks = 4;
-    c->shrink.batch = c->btree_pages * 2;
+    c->shrink = shrinker_alloc(0, "md-bcache:%pU", c->set_uuid);
+    if (!c->shrink) {
+    pr_warn("bcache: %s: could not allocate shrinker\n", __func__);
+    return -ENOMEM;


Seems you have changed the semantics of this. In the past,
it was better to have a shrinker, but now it becomes mandatory.
Right? I don't know if it is acceptable. From my point of view,
just do the cleanup, don't change any behaviour.


Oh, should return 0 here, will do.




+    }
+
+    c->shrink->count_objects = bch_mca_count;
+    c->shrink->scan_objects = bch_mca_scan;
+    c->shrink->seeks = 4;
+    c->shrink->batch = c->btree_pages * 2;
+    c->shrink->private_data = c;
-    if (register_shrinker(&c->shrink, "md-bcache:%pU", c->set_uuid))
-    pr_warn("bcache: %s: could not register shrinker\n",
-    __func__);
+    shrinker_register(c->shrink);
  return 0;
  }
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 0e2c1880f60b..45d8af755de6 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -866,7 +866,8 @@ STORE(__bch_cache_set)
  sc.gfp_mask = GFP_KERNEL;
  sc.nr_to_scan = strtoul_or_return(buf);
-    c->shrink.scan_objects(&c->shrink, &sc);
+    if (c->shrink)
+    c->shrink->scan_objects(c->shrink, &sc);
  }
  sysfs_strtoul_clamp(congested_read_threshold_us,






Re: [PATCH v2 43/47] mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}

2023-07-26 Thread Muchun Song



> On Jul 24, 2023, at 17:43, Qi Zheng  wrote:
> 
> Currently, we maintain two linear arrays per node per memcg, which are
> shrinker_info::map and shrinker_info::nr_deferred. And we need to resize
> them when the shrinker_nr_max is exceeded, that is, allocate a new array,
> and then copy the old array to the new array, and finally free the old
> array by RCU.
> 
> For shrinker_info::map, we do set_bit() under the RCU lock, so we may set
> the value into the old map which is about to be freed. This may cause the
> value set to be lost. The current solution is not to copy the old map when
> resizing, but to set all the corresponding bits in the new map to 1. This
> solves the data loss problem, but brings the overhead of more pointless
> loops while doing memcg slab shrink.
> 
> For shrinker_info::nr_deferred, we will only modify it under the read lock
> of shrinker_rwsem, so it will not run concurrently with the resizing. But
> after we make memcg slab shrink lockless, there will be the same data loss
> problem as shrinker_info::map, and we can't work around it like the map.
> 
> For such resizable arrays, the most straightforward idea is to change it
> to xarray, like we did for list_lru [1]. We need to do xa_store() in the
> list_lru_add()-->set_shrinker_bit(), but this will cause memory
> allocation, and the list_lru_add() doesn't accept failure. A possible
> solution is to pre-allocate, but the location of pre-allocation is not
> well determined.
> 
> Therefore, this commit chooses to introduce a secondary array for
> shrinker_info::{map, nr_deferred}, so that we only need to copy this
> secondary array every time the size is resized. Then even if we get the
> old secondary array under the RCU lock, the found map and nr_deferred are
> also true, so no data is lost.
> 
> [1]. 
> https://lore.kernel.org/all/20220228122126.37293-13-songmuc...@bytedance.com/
> 
> Signed-off-by: Qi Zheng 

Reviewed-by: Muchun Song 
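
For readers following along, the idea in the commit message can be modelled in userspace roughly as below. This is only a sketch under assumptions: the names (info, info_unit, info_expand, UNIT_BITS) are invented here, there is no RCU or atomics, and memory is never freed. The point it demonstrates is that resizing copies only the top-level array of pointers, so a bit set through the old array is still visible through the new one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define UNIT_BITS 64  /* ids covered by one unit; arbitrary for the sketch */

/* Second-level unit holding the actual data for UNIT_BITS ids. */
struct info_unit {
    uint64_t map;
    long nr_deferred[UNIT_BITS];
};

/* Top level: only an array of pointers to units. */
struct info {
    int nr_units;
    struct info_unit **unit;
};

/*
 * Build an info with nr_units units. Existing units of 'old' (may be NULL)
 * are shared by pointer rather than copied; only the missing units are
 * allocated. Returns NULL on allocation failure.
 */
static struct info *info_expand(const struct info *old, int nr_units)
{
    int i, nr_old = old ? old->nr_units : 0;
    struct info *grown;

    assert(nr_units >= nr_old);

    grown = malloc(sizeof(*grown));
    if (!grown)
        return NULL;
    grown->unit = calloc((size_t)nr_units, sizeof(*grown->unit));
    if (!grown->unit) {
        free(grown);
        return NULL;
    }
    grown->nr_units = nr_units;

    for (i = 0; i < nr_units; i++) {
        if (i < nr_old) {
            grown->unit[i] = old->unit[i];   /* share, do not copy */
            continue;
        }
        grown->unit[i] = calloc(1, sizeof(**grown->unit));
        if (!grown->unit[i]) {
            while (--i >= nr_old)            /* free only the new units */
                free(grown->unit[i]);
            free(grown->unit);
            free(grown);
            return NULL;
        }
    }
    return grown;
}

static void info_set_bit(struct info *info, int id)
{
    assert(id >= 0 && id / UNIT_BITS < info->nr_units);
    info->unit[id / UNIT_BITS]->map |= UINT64_C(1) << (id % UNIT_BITS);
}

static bool info_test_bit(const struct info *info, int id)
{
    assert(id >= 0 && id / UNIT_BITS < info->nr_units);
    return (info->unit[id / UNIT_BITS]->map >> (id % UNIT_BITS)) & 1;
}
```

A writer that still holds the old top-level array after a resize therefore updates the same unit the new array points to, which is the "no data is lost" property described above.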





Re: [PATCH v2 23/47] drm/msm: dynamically allocate the drm-msm_gem shrinker

2023-07-26 Thread Qi Zheng




On 2023/7/26 15:24, Muchun Song wrote:



On 2023/7/24 17:43, Qi Zheng wrote:

In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the drm-msm_gem shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct msm_drm_private.

Signed-off-by: Qi Zheng 


Reviewed-by: Muchun Song 

A nit below.


---
  drivers/gpu/drm/msm/msm_drv.c  |  4 ++-
  drivers/gpu/drm/msm/msm_drv.h  |  4 +--
  drivers/gpu/drm/msm/msm_gem_shrinker.c | 36 --
  3 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c 
b/drivers/gpu/drm/msm/msm_drv.c

index 891eff8433a9..7f6933be703f 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -461,7 +461,9 @@ static int msm_drm_init(struct device *dev, const 
struct drm_driver *drv)

  if (ret)
  goto err_msm_uninit;
-    msm_gem_shrinker_init(ddev);
+    ret = msm_gem_shrinker_init(ddev);
+    if (ret)
+    goto err_msm_uninit;
  if (priv->kms_init) {
  ret = priv->kms_init(ddev);
diff --git a/drivers/gpu/drm/msm/msm_drv.h 
b/drivers/gpu/drm/msm/msm_drv.h

index e13a8cbd61c9..84523d4a1e58 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -217,7 +217,7 @@ struct msm_drm_private {
  } vram;
  struct notifier_block vmap_notifier;
-    struct shrinker shrinker;
+    struct shrinker *shrinker;
  struct drm_atomic_state *pm_state;
@@ -279,7 +279,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, 
void *data,
  unsigned long msm_gem_shrinker_shrink(struct drm_device *dev, 
unsigned long nr_to_scan);

  #endif
-void msm_gem_shrinker_init(struct drm_device *dev);
+int msm_gem_shrinker_init(struct drm_device *dev);
  void msm_gem_shrinker_cleanup(struct drm_device *dev);
  int msm_gem_prime_mmap(struct drm_gem_object *obj, struct 
vm_area_struct *vma);
diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c 
b/drivers/gpu/drm/msm/msm_gem_shrinker.c

index f38296ad8743..7daab1298c11 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -34,8 +34,7 @@ static bool can_block(struct shrink_control *sc)
  static unsigned long
  msm_gem_shrinker_count(struct shrinker *shrinker, struct 
shrink_control *sc)

  {
-    struct msm_drm_private *priv =
-    container_of(shrinker, struct msm_drm_private, shrinker);
+    struct msm_drm_private *priv = shrinker->private_data;
  unsigned count = priv->lru.dontneed.count;
  if (can_swap())
@@ -100,8 +99,7 @@ active_evict(struct drm_gem_object *obj)
  static unsigned long
  msm_gem_shrinker_scan(struct shrinker *shrinker, struct 
shrink_control *sc)

  {
-    struct msm_drm_private *priv =
-    container_of(shrinker, struct msm_drm_private, shrinker);
+    struct msm_drm_private *priv = shrinker->private_data;
  struct {
  struct drm_gem_lru *lru;
  bool (*shrink)(struct drm_gem_object *obj);
@@ -148,10 +146,11 @@ msm_gem_shrinker_shrink(struct drm_device *dev, 
unsigned long nr_to_scan)

  struct shrink_control sc = {
  .nr_to_scan = nr_to_scan,
  };
-    int ret;
+    unsigned long ret = SHRINK_STOP;
  fs_reclaim_acquire(GFP_KERNEL);
-    ret = msm_gem_shrinker_scan(&priv->shrinker, &sc);
+    if (priv->shrinker)
+    ret = msm_gem_shrinker_scan(priv->shrinker, &sc);
  fs_reclaim_release(GFP_KERNEL);
  return ret;
@@ -210,16 +209,27 @@ msm_gem_shrinker_vmap(struct notifier_block *nb, 
unsigned long event, void *ptr)

   *
   * This function registers and sets up the msm shrinker.
   */
-void msm_gem_shrinker_init(struct drm_device *dev)
+int msm_gem_shrinker_init(struct drm_device *dev)
  {
  struct msm_drm_private *priv = dev->dev_private;
-    priv->shrinker.count_objects = msm_gem_shrinker_count;
-    priv->shrinker.scan_objects = msm_gem_shrinker_scan;
-    priv->shrinker.seeks = DEFAULT_SEEKS;
-    WARN_ON(register_shrinker(&priv->shrinker, "drm-msm_gem"));
+
+    priv->shrinker = shrinker_alloc(0, "drm-msm_gem");
+    if (!priv->shrinker) {


Just "if (WARN_ON(!priv->shrinker))"


As suggested by Steven Price in patch #24, this warning is
unnecessary, so I will remove it in the next version.




+    WARN_ON(1);
+    return -ENOMEM;
+    }
+
+    priv->shrinker->count_objects = msm_gem_shrinker_count;
+    priv->shrinker->scan_objects = msm_gem_shrinker_scan;
+    priv->shrinker->seeks = DEFAULT_SEEKS;
+    priv->shrinker->private_data = priv;
+
+    shrinker_register(priv->shrinker);
  priv->vmap_notifier.notifier_call = msm_gem_shrinker_vmap;
  WARN_ON(register_vmap_purge_notifier(&priv->vmap_notifier));
+
+    return 0;
  }
  /**
@@ -232,8 +242,8 @@ void msm_gem_shrinker_cleanup(struct drm_device *dev)
  {
  struct msm_drm_private *priv = dev->dev_private;
-    if (priv->shrinker.nr_deferred) {
+    if (priv->shrinker) {
  

Re: [PATCH v3 08/25] tools/xenstore: make hashtable key and value parameters const

2023-07-26 Thread Julien Grall

Hi,

On 26/07/2023 09:44, Juergen Gross wrote:

On 26.07.23 10:20, Julien Grall wrote:
To give a concrete example, with the current interface we are telling
the user that what they store in the hashtable can be modified at some
point. By adding 'const' for the value in hashtable_add(), we can
mislead a user into thinking it is fine to store a static string, yet this
is not enforced all the way through. So one could mistakenly think that
values returned by hashtable_search() can be modified. And the compiler
will not be there to help enforce it because you cast away the const.


Yes, like in the case of strstr().

It takes two const char * parameters and it is returning char *, even
with it pointing into the first parameter.


This is a pretty good example on how to not write an interface. :)
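
To spell out the strstr() point, here is a small sketch. The wrapper name find_const is made up for this mail; the only library call used is the standard strstr(). The classic prototype takes const char * and returns char *, so the const is silently dropped unless the caller puts it back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical const-preserving wrapper. The caller gets back a pointer
 * it cannot write through, so the compiler keeps helping.
 */
static const char *find_const(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    return strstr(haystack, needle);
}

/*
 * By contrast, with the classic prototype the following is accepted
 * without a cast, and writing through 'p' is undefined behaviour when
 * 's' points to a string literal:
 *
 *     const char *s = "hello";
 *     char *p = strstr(s, "l");
 *     *p = 'L';
 */
```

The same trade-off applies to hashtable_add() and hashtable_search(): if one side takes const and the other returns non-const, the cast has to live somewhere.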



Do you have any code in this series that requires the 'const' in 
hashtable_add()? If so, can you point me to the patch and I will have 
a look?


I had it when writing this patch, but this requirement is gone now. But
please note that this means to drop the const from db_write(), too.

If not, then I will strongly argue that this should be dropped because 
dropping a const is always a recipe for disaster.


Depends IMO.

I believe it is better as I've done it, but in case you insist on it I
can drop the patch.


Well... I can always be swayed if there is a good argument to make it 
const. So far, you mention that hashtable doesn't modify the content but 
you don't really explain why waiving away the help from the compiler is 
ok. Therefore, to me it seems the downside is bigger than the benefit.


Also, I am not asking to drop the patch. The const on the key is ok. I 
am only requesting to remove the const on the value.




An alternative would be make hashtable_search() return a const and only
cast the const away where it is really needed (and probably with a
prominent comment at the related hashtable_add() place). I think this
will hit xenstored_domain.c use cases only.


Again, this still means we are casting away the const somewhere. This is 
the part I am against if there is no strong justification for it (i.e. 
there is no other way to do it).


Cheers,

--
Julien Grall


