date:20240614

On Fri, Jun 14, 2024 at 08:38:51AM +0200, Jan Beulich wrote:
> On 13.06.2024 20:43, Demi Marie Obenour wrote:
> > GPU acceleration requires that pageable host memory be able to be mapped
> > into a guest.
> 
> I'm sure it was explained in the session, which sadly I couldn't attend.
> I've been asking Ray and Xenia the same before, but I'm afraid it still
> hasn't become clear to me why this is a _requirement_. After all that's
> against what we're doing elsewhere (i.e. so far it has always been
> guest memory that's mapped in the host). I can appreciate that it might
> be more difficult to implement, but avoiding to violate this fundamental
> (kind of) rule might be worth the price (and would avoid other
> complexities, of which there may be lurking more than what you enumerate
> below).

My limited understanding (please someone correct me if wrong) is that
the GPU buffer (or context I think it's also called?) is always
allocated from dom0 (the owner of the GPU).  The underling memory
addresses of such buffer needs to be mapped into the guest.  The
buffer backing memory might be GPU MMIO from the device BAR(s) or
system RAM, and such buffer can be paged by the dom0 kernel at any
time (iow: changing the backing memory from MMIO to RAM or vice
versa).  Also, the buffer must be contiguous in physical address
space.

I'm not sure it's possible to ensure that when using system RAM such
memory comes from the guest rather than the host, as it would likely
require some very intrusive hooks into the kernel logic, and
negotiation with the guest to allocate the requested amount of
memory and hand it over to dom0.  If the maximum size of the buffer is
known in advance maybe dom0 can negotiate with the guest to allocate
such a region and grant it access to dom0 at driver attachment time.

One aspect that I'm lacking clarity is better understanding of how the
process of allocating and assigning a GPU buffer to a guest is
performed (I think this is the key to how GPU VirtIO native contexts
work?).

Another question I have, are guest expected to have a single GPU
buffer, or they can have multiple GPU buffers simultaneously
allocated?

Regards, Roger.

Re: [PATCH v3 0/3] x86/irq: fixes for CPU hot{,un}plug

Sorry, forgot to add the for-4.19 tag and Cc Oleksii.

Since we have taken the start of the series, we might as well take the
remaining patches (if other x86 maintainers agree) and attempt to
hopefully fix all the interrupt issues with CPU hotplug/unplug.

FTR: there are further issues when doing CPU hotplug/unplug from a PVH
dom0, but those are out of the scope for 4.19, as I haven't even
started to diagnose what's going on.

Thanks, Roger.

On Thu, Jun 13, 2024 at 06:56:14PM +0200, Roger Pau Monne wrote:
> Hello,
> 
> The following series aim to fix interrupt handling when doing CPU
> plug/unplug operations.  Without this series running:
> 
> cpus=`xl info max_cpu_id`
> while [ 1 ]; do
> for i in `seq 1 $cpus`; do
> xen-hptool cpu-offline $i;
> xen-hptool cpu-online $i;
> done
> done
> 
> Quite quickly results in interrupts getting lost and "No irq handler for
> vector" messages on the Xen console.  Drivers in dom0 also start getting
> interrupt timeouts and the system becomes unusable.
> 
> After applying the series running the loop over night still result in a
> fully usable system, no  "No irq handler for vector" messages at all, no
> interrupt loses reported by dom0.  Test with x2apic-mode={mixed,cluster}.
> 
> I've attempted to document all code as good as I could, interrupt
> handling has some unexpected corner cases that are hard to diagnose and
> reason about.
> 
> Some XenRT testing is undergoing to ensure no breakages.
> 
> Thanks, Roger.
> 
> Roger Pau Monne (3):
>   x86/irq: deal with old_cpu_mask for interrupts in movement in
> fixup_irqs()
>   x86/irq: handle moving interrupts in _assign_irq_vector()
>   x86/irq: forward pending interrupts to new destination in fixup_irqs()
> 
>  xen/arch/x86/include/asm/apic.h |   5 +
>  xen/arch/x86/irq.c  | 163 +---
>  2 files changed, 132 insertions(+), 36 deletions(-)
> 
> -- 
> 2.45.2
>

Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail

On Thu, Jun 13, 2024 at 04:05:08PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 12, 2024 at 05:56:15PM +0200, Roger Pau Monné wrote:
> > Right.  AFAICT advertising "feature-barrier" and/or
> > "feature-flush-cache" could be done based on whether blkback
> > understand those commands, not on whether the underlying storage
> > supports the equivalent of them.
> > 
> > Worst case we can print a warning message once about the underlying
> > storage failing to complete flush/barrier requests, and that data
> > integrity might not be guaranteed going forward, and not propagate the
> > error to the upper layer?
> > 
> > What would be the consequence of propagating a flush error to the
> > upper layers?
> 
> If you propage the error to the upper layer you will generate an
> I/O error there, which usually leads to a file system shutdown.
> 
> > Given the description of the feature in the blkif header, I'm afraid
> > we cannot guarantee that seeing the feature exposed implies barrier or
> > flush support, since the request could fail at any time (or even from
> > the start of the disk attachment) and it would still sadly be a correct
> > implementation given the description of the options.
> 
> Well, then we could do something like the patch below, which keeps
> the existing behavior, but insolates the block layer from it and
> removes the only user of blk_queue_write_cache from interrupt
> context:

LGTM, I'm not sure there's much else we can do.

> ---
> From e6e82c769ab209a77302994c3829cf6ff7a595b8 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Thu, 30 May 2024 08:58:52 +0200
> Subject: xen-blkfront: don't disable cache flushes when they fail
> 
> blkfront always had a robust negotiation protocol for detecting a write
> cache.  Stop simply disabling cache flushes in the block layer as the
> flags handling is moving to the atomic queue limits API that needs
> user context to freeze the queue for that.  Instead handle the case
> of the feature flags cleared inside of blkfront.  This removes old
> debug code to check for such a mismatch which was previously impossible
> to hit, including the check for passthrough requests that blkfront
> never used to start with.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/block/xen-blkfront.c | 44 +++-
>  1 file changed, 23 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 9b4ec3e4908cce..e2c92d5095ff17 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -788,6 +788,14 @@ static int blkif_queue_rw_req(struct request *req, 
> struct blkfront_ring_info *ri
>* A barrier request a superset of FUA, so we can
>* implement it the same way.  (It's also a FLUSH+FUA,
>* since it is guaranteed ordered WRT previous writes.)
> +  *
> +  * Note that can end up here with a FUA write and the
> +  * flags cleared.  This happens when the flag was
> +  * run-time disabled and raced with I/O submission in
> +  * the block layer.  We submit it as a normal write

Since blkfront no longer signals that FUA is no longer available for the
device, getting a request with FUA is not actually a race I think?

> +  * here.  A pure flush should never end up here with
> +  * the flags cleared as they are completed earlier for
> +  * the !feature_flush case.
>*/
>   if (info->feature_flush && info->feature_fua)
>   ring_req->operation =
> @@ -795,8 +803,6 @@ static int blkif_queue_rw_req(struct request *req, struct 
> blkfront_ring_info *ri
>   else if (info->feature_flush)
>   ring_req->operation =
>   BLKIF_OP_FLUSH_DISKCACHE;
> - else
> - ring_req->operation = 0;
>   }
>   ring_req->u.rw.nr_segments = num_grant;
>   if (unlikely(require_extra_req)) {
> @@ -887,16 +893,6 @@ static inline void flush_requests(struct 
> blkfront_ring_info *rinfo)
>   notify_remote_via_irq(rinfo->irq);
>  }
>  
> -static inline bool blkif_request_flush_invalid(struct request *req,
> -struct blkfront_info *info)
> -{
> - return (blk_rq_is_passthrough(req) ||
> - ((req_op(req) == REQ_OP_FLUSH) &&
> -  !info->feature_flush) ||
> - ((req->cmd_flags & REQ_FUA) &&
> -  !info->feature_fua));
> -}
> -
>  static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
> const struct blk_mq_queue_data *qd)
>  {
> @@ -908,23 +904,30 @@ static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx 
> *hc

Re: [PATCH v4 0/7] Static shared memory followup v2 - pt2

2024-06-14 Thread Luca Fancellu

Hi Julien,

> On 13 Jun 2024, at 12:31, Julien Grall  wrote:
> 
> Hi,
> 
> On 11/06/2024 13:42, Michal Orzel wrote:
>>> We would like this serie to be in Xen 4.19, there was a misunderstanding on 
>>> our side because we thought
>>> that since the serie was sent before the last posting date, it could have 
>>> been a candidate for merging in the
>>> new release, instead after speaking with Julien and Oleksii we are now 
>>> aware that we need to provide a
>>> justification for its presence.
>>> 
>>> Pros to this serie is that we are closing the circle for static shared 
>>> memory, allowing it to use memory from
>>> the host or from Xen, it is also a feature that is not enabled by default, 
>>> so it should not cause too much
>>> disruption in case of any bugs that escaped the review, however we’ve 
>>> tested many configurations for that
>>> with/without enabling the feature if that can be an additional value.
>>> 
>>> Cons: we are touching some common code related to p2m, but also there the 
>>> impact should be minimal because
>>> the new code is subject to l2 foreign mapping (to be confirmed maybe from a 
>>> p2m expert like Julien).
>>> 
>>> The comments on patch 3 of this serie are addressed by this patch:
>>> https://patchwork.kernel.org/project/xen-devel/patch/20240528125603.2467640-1-luca.fance...@arm.com/
>>> And the serie is fully reviewed.
>>> 
>>> So our request is to allow this serie in 4.19, Oleksii, ARM maintainers, do 
>>> you agree on that?
>> As a main reviewer of this series I'm ok to have this series in. It is 
>> nicely encapsulated and the feature itself
>> is still in unsupported state. I don't foresee any issues with it.
> 
> There are changes in the p2m code and the memory allocation for boot domains. 
> So is it really encapsulated?
> 
> For me there are two risks:
> * p2m (already mentioned by Luca): We modify the code to put foreign mapping. 
> The worse that can happen if we don't release the foreign mapping. This would 
> mean the page will not be freed. AFAIK, we don't exercise this path in the CI.
> * domain allocation: This mainly look like refactoring. And the path is 
> exercised in the CI.
> 
> So I am not concerned with the domain allocation one. @Luca, would it be 
> possible to detail how did you test the foreign pages were properly removed?

So at first we tested the code, with/without the static shared memory feature 
enabled, creating/destroying guest from Dom0 and checking that everything
was ok.

After a chat on Matrix with Julien he suggested that using a virtio-mmio disk 
was better to stress out the foreign mapping looking for
regressions.

Luckily I’ve found this slide deck from @Oleksandr: 
https://static.linaro.org/connect/lvc21/presentations/lvc21-314.pdf

So I did a setup using fvp-base, having a disk with two partitions containing 
Dom0 rootfs and DomU rootfs, Dom0 sees
this disk using VirtIO block.

The Dom0 rootfs contains the virtio-disk backend: 
https://github.com/xen-troops/virtio-disk

And the DomU XL configuration is using these parameters:

cmdline="console=hvc0 root=/dev/vda rw"
disk = ['/dev/vda2,raw,xvda,w,specification=virtio’]

Running the setup and creating/destroying a couple of times the guest is not 
showing regressions, here an example of the output:

root@fvp-base:/opt/xtp/guests/linux-guests# xl create -c 
linux-ext-arm64-stresstests-rootfs.cfg
Parsing config from linux-ext-arm64-stresstests-rootfs.cfg
main: read frontend domid 2
  Info: connected to dom2

demu_seq_next: >XENSTORE_ATTACHED
demu_seq_next: domid = 2
demu_seq_next: devid = 51712
demu_seq_next: filename[0] = /dev/vda2
demu_seq_next: readonly[0] = 0
demu_seq_next: base[0] = 0x200
demu_seq_next: irq[0]  = 33
demu_seq_next: >XENEVTCHN_OPEN
demu_seq_next: >XENFOREIGNMEMORY_OPEN
demu_seq_next: >XENDEVICEMODEL_OPEN
demu_seq_next: >XENGNTTAB_OPEN
demu_initialize: 1 vCPU(s)
demu_seq_next: >SERVER_REGISTERED
demu_seq_next: ioservid = 0
demu_seq_next: >RESOURCE_MAPPED
demu_seq_next: shared_iopage = 0x7f80c58000
demu_seq_next: >SERVER_ENABLED
demu_seq_next: >PORT_ARRAY_ALLOCATED
demu_seq_next: >EVTCHN_PORTS_BOUND
demu_seq_next: VCPU0: 3 -> 6
demu_register_memory_space: 200 - 20001ff
  Info: (virtio/mmio.c) virtio_mmio_init:165: 
virtio-mmio.devices=0x200@0x200:33
demu_seq_next: >DEVICE_INITIALIZED
demu_seq_next: >INITIALIZED
IO request not ready
(XEN) d2v0 Unhandled SMC/HVC: 0x8450
(XEN) d2v0 Unhandled SMC/HVC: 0x8600ff01
(XEN) d2v0: vGICD: RAZ on reserved register offset 0x0c
(XEN) d2v0: vGICD: unhandled word write 0x00 to ICACTIVER4
(XEN) d2v0: vGICR: SGI: unhandled word write 0x00 to ICACTIVER0
[0.00] Booting Linux on physical CPU 0x00 [0x410fd0f0]
[0.00] Linux version 6.1.25 (lucfan01@e125770) (aarch64-poky-linux-gcc 
(GCC) 12.2.0, GNU ld (GNU Binutils) 2.40.20230119) #4 SMP PREEMPT Thu Jun 13 
21:55:06 UTC 2024
[0.00] Machine model: XENVM-4.19
[0.00] Xen 4.19 support found
[0.0

Re: Design session notes: GPU acceleration in Xen

2024-06-14 Thread Jan Beulich

On 14.06.2024 09:21, Roger Pau Monné wrote:
> On Fri, Jun 14, 2024 at 08:38:51AM +0200, Jan Beulich wrote:
>> On 13.06.2024 20:43, Demi Marie Obenour wrote:
>>> GPU acceleration requires that pageable host memory be able to be mapped
>>> into a guest.
>>
>> I'm sure it was explained in the session, which sadly I couldn't attend.
>> I've been asking Ray and Xenia the same before, but I'm afraid it still
>> hasn't become clear to me why this is a _requirement_. After all that's
>> against what we're doing elsewhere (i.e. so far it has always been
>> guest memory that's mapped in the host). I can appreciate that it might
>> be more difficult to implement, but avoiding to violate this fundamental
>> (kind of) rule might be worth the price (and would avoid other
>> complexities, of which there may be lurking more than what you enumerate
>> below).
> 
> My limited understanding (please someone correct me if wrong) is that
> the GPU buffer (or context I think it's also called?) is always
> allocated from dom0 (the owner of the GPU).  The underling memory
> addresses of such buffer needs to be mapped into the guest.  The
> buffer backing memory might be GPU MMIO from the device BAR(s) or
> system RAM, and such buffer can be paged by the dom0 kernel at any
> time (iow: changing the backing memory from MMIO to RAM or vice
> versa).  Also, the buffer must be contiguous in physical address
> space.

This last one in particular would of course be a severe restriction.
Yet: There's an IOMMU involved, isn't there?

> I'm not sure it's possible to ensure that when using system RAM such
> memory comes from the guest rather than the host, as it would likely
> require some very intrusive hooks into the kernel logic, and
> negotiation with the guest to allocate the requested amount of
> memory and hand it over to dom0.  If the maximum size of the buffer is
> known in advance maybe dom0 can negotiate with the guest to allocate
> such a region and grant it access to dom0 at driver attachment time.

Besides the thought of transiently converting RAM to kind-of-MMIO, this
makes me think of another possible option: Could Dom0 transfer ownership
of the RAM that wants mapping in the guest (remotely resembling
grant-transfer)? Would require the guest to have ballooned down enough
first, of course. (In both cases it would certainly need working out how
the conversion / transfer back could be made work safely and reasonably
cleanly.)

Jan

> One aspect that I'm lacking clarity is better understanding of how the
> process of allocating and assigning a GPU buffer to a guest is
> performed (I think this is the key to how GPU VirtIO native contexts
> work?).
> 
> Another question I have, are guest expected to have a single GPU
> buffer, or they can have multiple GPU buffers simultaneously
> allocated?
> 
> Regards, Roger.

Re: Design session notes: GPU acceleration in Xen

On Fri, Jun 14, 2024 at 10:12:40AM +0200, Jan Beulich wrote:
> On 14.06.2024 09:21, Roger Pau Monné wrote:
> > On Fri, Jun 14, 2024 at 08:38:51AM +0200, Jan Beulich wrote:
> >> On 13.06.2024 20:43, Demi Marie Obenour wrote:
> >>> GPU acceleration requires that pageable host memory be able to be mapped
> >>> into a guest.
> >>
> >> I'm sure it was explained in the session, which sadly I couldn't attend.
> >> I've been asking Ray and Xenia the same before, but I'm afraid it still
> >> hasn't become clear to me why this is a _requirement_. After all that's
> >> against what we're doing elsewhere (i.e. so far it has always been
> >> guest memory that's mapped in the host). I can appreciate that it might
> >> be more difficult to implement, but avoiding to violate this fundamental
> >> (kind of) rule might be worth the price (and would avoid other
> >> complexities, of which there may be lurking more than what you enumerate
> >> below).
> > 
> > My limited understanding (please someone correct me if wrong) is that
> > the GPU buffer (or context I think it's also called?) is always
> > allocated from dom0 (the owner of the GPU).  The underling memory
> > addresses of such buffer needs to be mapped into the guest.  The
> > buffer backing memory might be GPU MMIO from the device BAR(s) or
> > system RAM, and such buffer can be paged by the dom0 kernel at any
> > time (iow: changing the backing memory from MMIO to RAM or vice
> > versa).  Also, the buffer must be contiguous in physical address
> > space.
> 
> This last one in particular would of course be a severe restriction.
> Yet: There's an IOMMU involved, isn't there?

Yup, IIRC that's why Ray said it was much more easier for them to
support VirtIO GPUs from a PVH dom0 rather than classic PV one.

It might be easier to implement from a classic PV dom0 if there's
pv-iommu support, so that dom0 can create it's own contiguous memory
buffers from the device PoV.

> > I'm not sure it's possible to ensure that when using system RAM such
> > memory comes from the guest rather than the host, as it would likely
> > require some very intrusive hooks into the kernel logic, and
> > negotiation with the guest to allocate the requested amount of
> > memory and hand it over to dom0.  If the maximum size of the buffer is
> > known in advance maybe dom0 can negotiate with the guest to allocate
> > such a region and grant it access to dom0 at driver attachment time.
> 
> Besides the thought of transiently converting RAM to kind-of-MMIO, this

As a note here, changing the type to MMIO would likely involve
modifying the EPT/NPT tables to propagate the new type.  On a PVH dom0
this would likely involve shattering superpages in order to set the
correct memory types.

Depending on how often and how random those system RAM changes are
necessary this could also create contention on the p2m lock.

> makes me think of another possible option: Could Dom0 transfer ownership
> of the RAM that wants mapping in the guest (remotely resembling
> grant-transfer)? Would require the guest to have ballooned down enough
> first, of course. (In both cases it would certainly need working out how
> the conversion / transfer back could be made work safely and reasonably
> cleanly.)

Maybe.  The fact the guest needs to balloon down that amount of memory
seems weird to me, as from the guest PoV that mapped memory is
MMIO-like and not system RAM.

Thanks, Roger.

Re: [PATCH 02/14] ubd: untagle discard vs write zeroes not support handling

2024-06-14 Thread Anton Ivanov





On 31/05/2024 08:47, Christoph Hellwig wrote:

Discard and Write Zeroes are different operation and implemented
by different fallocate opcodes for ubd.  If one fails the other one
can work and vice versa.

Split the code to disable the operations in ubd_handler to only
disable the operation that actually failed.

Fixes: 50109b5a03b4 ("um: Add support for DISCARD in the UBD Driver")
Signed-off-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
Reviewed-by: Damien Le Moal 
Reviewed-by: Martin K. Petersen 
---
  arch/um/drivers/ubd_kern.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index 0c9542d58c01b7..093c87879d08ba 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -449,10 +449,11 @@ static int bulk_req_safe_read(
  
  static void ubd_end_request(struct io_thread_req *io_req)

  {
-   if (io_req->error == BLK_STS_NOTSUPP &&
-   req_op(io_req->req) == REQ_OP_DISCARD) {
-   blk_queue_max_discard_sectors(io_req->req->q, 0);
-   blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
+   if (io_req->error == BLK_STS_NOTSUPP) {
+   if (req_op(io_req->req) == REQ_OP_DISCARD)
+   blk_queue_max_discard_sectors(io_req->req->q, 0);
+   else if (req_op(io_req->req) == REQ_OP_WRITE_ZEROES)
+   blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
}
blk_mq_end_request(io_req->req, io_req->error);
kfree(io_req);

Acked-By: Anton Ivanov 
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/

Re: [PATCH 01/14] ubd: refactor the interrupt handler

2024-06-14 Thread Anton Ivanov





On 31/05/2024 08:47, Christoph Hellwig wrote:

Instead of a separate handler function that leaves no work in the
interrupt hanler itself, split out a per-request end I/O helper and
clean up the coding style and variable naming while we're at it.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Martin K. Petersen 
---
  arch/um/drivers/ubd_kern.c | 49 ++
  1 file changed, 18 insertions(+), 31 deletions(-)

diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index ef805eaa9e013d..0c9542d58c01b7 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -447,43 +447,30 @@ static int bulk_req_safe_read(
return n;
  }
  
-/* Called without dev->lock held, and only in interrupt context. */

-static void ubd_handler(void)
+static void ubd_end_request(struct io_thread_req *io_req)
  {
-   int n;
-   int count;
-
-   while(1){
-   n = bulk_req_safe_read(
-   thread_fd,
-   irq_req_buffer,
-   &irq_remainder,
-   &irq_remainder_size,
-   UBD_REQ_BUFFER_SIZE
-   );
-   if (n < 0) {
-   if(n == -EAGAIN)
-   break;
-   printk(KERN_ERR "spurious interrupt in ubd_handler, "
-  "err = %d\n", -n);
-   return;
-   }
-   for (count = 0; count < n/sizeof(struct io_thread_req *); 
count++) {
-   struct io_thread_req *io_req = (*irq_req_buffer)[count];
-
-   if ((io_req->error == BLK_STS_NOTSUPP) && 
(req_op(io_req->req) == REQ_OP_DISCARD)) {
-   blk_queue_max_discard_sectors(io_req->req->q, 
0);
-   
blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
-   }
-   blk_mq_end_request(io_req->req, io_req->error);
-   kfree(io_req);
-   }
+   if (io_req->error == BLK_STS_NOTSUPP &&
+   req_op(io_req->req) == REQ_OP_DISCARD) {
+   blk_queue_max_discard_sectors(io_req->req->q, 0);
+   blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
}
+   blk_mq_end_request(io_req->req, io_req->error);
+   kfree(io_req);
  }
  
  static irqreturn_t ubd_intr(int irq, void *dev)

  {
-   ubd_handler();
+   int len, i;
+
+   while ((len = bulk_req_safe_read(thread_fd, irq_req_buffer,
+   &irq_remainder, &irq_remainder_size,
+   UBD_REQ_BUFFER_SIZE)) >= 0) {
+   for (i = 0; i < len / sizeof(struct io_thread_req *); i++)
+   ubd_end_request((*irq_req_buffer)[i]);
+   }
+
+   if (len < 0 && len != -EAGAIN)
+   pr_err("spurious interrupt in %s, err = %d\n", __func__, len);
return IRQ_HANDLED;
  }
  

Acked-By: Anton Ivanov 
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/

[XEN PATCH v3] automation/eclair: add deviation for MISRA C Rule 17.7

2024-06-14 Thread Federico Serafini

Update ECLAIR configuration to deviate some cases where not using
the return value of a function is not dangerous.

Signed-off-by: Federico Serafini 
---
Changes in v3:
- removed unwanted underscores;
- grammar fixed;
- do not constraint to the first actual argument.
Changes in v2:
- do not deviate strlcpy and strlcat.
---
 automation/eclair_analysis/ECLAIR/deviations.ecl | 4 
 docs/misra/deviations.rst| 9 +
 2 files changed, 13 insertions(+)

diff --git a/automation/eclair_analysis/ECLAIR/deviations.ecl 
b/automation/eclair_analysis/ECLAIR/deviations.ecl
index 447c1e6661..97281082a8 100644
--- a/automation/eclair_analysis/ECLAIR/deviations.ecl
+++ b/automation/eclair_analysis/ECLAIR/deviations.ecl
@@ -413,6 +413,10 @@ explicit comment indicating the fallthrough intention is 
present."
 -config=MC3R1.R17.1,macros+={hide , "^va_(arg|start|copy|end)$"}
 -doc_end
 
+-doc_begin="Not using the return value of a function does not endanger safety 
if it coincides with an actual argument."
+-config=MC3R1.R17.7,calls+={safe, "any()", 
"decl(name(__builtin_memcpy||__builtin_memmove||__builtin_memset||cpumask_check))"}
+-doc_end
+
 #
 # Series 18.
 #
diff --git a/docs/misra/deviations.rst b/docs/misra/deviations.rst
index 36959aa44a..f3abe31eb5 100644
--- a/docs/misra/deviations.rst
+++ b/docs/misra/deviations.rst
@@ -364,6 +364,15 @@ Deviations related to MISRA C:2012 Rules:
by `stdarg.h`.
  - Tagged as `deliberate` for ECLAIR.
 
+   * - R17.7
+ - Not using the return value of a function does not endanger safety if it
+   coincides with an actual argument.
+ - Tagged as `safe` for ECLAIR. Such functions are:
+ - __builtin_memcpy()
+ - __builtin_memmove()
+ - __builtin_memset()
+ - cpumask_check()
+
* - R20.4
  - The override of the keyword \"inline\" in xen/compiler.h is present so
that section contents checks pass when the compiler chooses not to
-- 
2.34.1

[XEN PATCH v3] automation/eclair: extend existing deviations of MISRA C Rule 16.3

2024-06-14 Thread Federico Serafini

Update ECLAIR configuration to deviate more cases where an
unintentional fallthrough cannot happen.

Add Rule 16.3 to the monitored set and tag it as clean for arm.

Signed-off-by: Federico Serafini 
---
Changes from v2:
- fixed grammar;
- reprhased deviations regarding do-while-false and ASSERT_UNREACHABLE().
---
 .../eclair_analysis/ECLAIR/deviations.ecl | 31 ++-
 .../eclair_analysis/ECLAIR/monitored.ecl  |  1 +
 automation/eclair_analysis/ECLAIR/tagging.ecl |  2 +-
 docs/misra/deviations.rst | 28 +++--
 4 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/automation/eclair_analysis/ECLAIR/deviations.ecl 
b/automation/eclair_analysis/ECLAIR/deviations.ecl
index 447c1e6661..3bdfc3a84d 100644
--- a/automation/eclair_analysis/ECLAIR/deviations.ecl
+++ b/automation/eclair_analysis/ECLAIR/deviations.ecl
@@ -364,14 +364,30 @@ therefore it is deemed better to leave such files as is."
 -config=MC3R1.R16.2,reports+={deliberate, 
"any_area(any_loc(file(x86_emulate||x86_svm_emulate)))"}
 -doc_end
 
--doc_begin="Switch clauses ending with continue, goto, return statements are
-safe."
--config=MC3R1.R16.3,terminals+={safe, 
"node(continue_stmt||goto_stmt||return_stmt)"}
+-doc_begin="Statements that change the control flow (i.e., break, continue, 
goto, return) and calls to functions that do not return the control back are 
\"allowed terminal statements\"."
+-stmt_selector+={r16_3_allowed_terminal, 
"node(break_stmt||continue_stmt||goto_stmt||return_stmt)||call(property(noreturn))"}
+-config=MC3R1.R16.3,terminals+={safe, "r16_3_allowed_terminal"}
+-doc_end
+
+-doc_begin="An if-else statement having both branches ending with an allowed 
terminal statement is itself an allowed terminal statement."
+-stmt_selector+={r16_3_if, 
"node(if_stmt)&&(child(then,r16_3_allowed_terminal)||child(then,any_stmt(stmt,-1,r16_3_allowed_terminal)))"}
+-stmt_selector+={r16_3_else, 
"node(if_stmt)&&(child(else,r16_3_allowed_terminal)||child(else,any_stmt(stmt,-1,r16_3_allowed_terminal)))"}
+-stmt_selector+={r16_3_if_else, "r16_3_if&&r16_3_else"}
+-config=MC3R1.R16.3,terminals+={safe, "r16_3_if_else"}
+-doc_end
+
+-doc_begin="An if-else statement having an always true condition and the true 
branch ending with an allowed terminal statement is itself an allowed terminal 
statement."
+-stmt_selector+={r16_3_if_true, "r16_3_if&&child(cond,definitely_in(1..))"}
+-config=MC3R1.R16.3,terminals+={safe, "r16_3_if_true"}
+-doc_end
+
+-doc_begin="A switch clause ending with a statement expression which, in turn, 
ends with an allowed terminal statement is safe."
+-config=MC3R1.R16.3,terminals+={safe, 
"node(stmt_expr)&&child(stmt,node(compound_stmt)&&any_stmt(stmt,-1,r16_3_allowed_terminal||r16_3_if_else||r16_3_if_true))"}
 -doc_end
 
--doc_begin="Switch clauses ending with a call to a function that does not give
-the control back (i.e., a function with attribute noreturn) are safe."
--config=MC3R1.R16.3,terminals+={safe, "call(property(noreturn))"}
+-doc_begin="A switch clause ending with a do-while-false the body of which, in 
turn, ends with an allowed terminal statement is safe.
+An exception to that is the macro ASSERT_UNREACHABLE() which is effective in 
debug build only: a switch clause ending with ASSERT_UNREACHABLE() is not 
considered safe."
+-config=MC3R1.R16.3,terminals+={safe, 
"!macro(name(ASSERT_UNREACHABLE))&&node(do_stmt)&&child(cond,definitely_in(0))&&child(body,any_stmt(stmt,-1,r16_3_allowed_terminal||r16_3_if_else||r16_3_if_true))"}
 -doc_end
 
 -doc_begin="Switch clauses ending with pseudo-keyword \"fallthrough\" are
@@ -383,8 +399,7 @@ safe."
 -config=MC3R1.R16.3,reports+={safe, 
"any_area(end_loc(any_exp(text(/BUG\\(\\);/"}
 -doc_end
 
--doc_begin="Switch clauses not ending with the break statement are safe if an
-explicit comment indicating the fallthrough intention is present."
+-doc_begin="Switch clauses ending with an explicit comment indicating the 
fallthrough intention are safe."
 -config=MC3R1.R16.3,reports+={safe, "any_area(end_loc(any_exp(text(^(?s).*/\\* 
[fF]all ?through.? \\*/.*$,0..1"}
 -doc_end
 
diff --git a/automation/eclair_analysis/ECLAIR/monitored.ecl 
b/automation/eclair_analysis/ECLAIR/monitored.ecl
index 4daecb0c83..45a60074f9 100644
--- a/automation/eclair_analysis/ECLAIR/monitored.ecl
+++ b/automation/eclair_analysis/ECLAIR/monitored.ecl
@@ -22,6 +22,7 @@
 -enable=MC3R1.R14.1
 -enable=MC3R1.R14.4
 -enable=MC3R1.R16.2
+-enable=MC3R1.R16.3
 -enable=MC3R1.R16.6
 -enable=MC3R1.R16.7
 -enable=MC3R1.R17.1
diff --git a/automation/eclair_analysis/ECLAIR/tagging.ecl 
b/automation/eclair_analysis/ECLAIR/tagging.ecl
index a354ff322e..07de2e7b65 100644
--- a/automation/eclair_analysis/ECLAIR/tagging.ecl
+++ b/automation/eclair_analysis/ECLAIR/tagging.ecl
@@ -105,7 +105,7 @@ if(string_equal(target,"x86_64"),
 )
 
 if(string_equal(target,"arm64"),
-
service_selector({"additional_clean_guidelines","MC3R1.R14.4||MC3R1.R16.6||MC3R1.R20.12||M

[libvirt test] 186343: tolerable all pass - PUSHED

flight 186343 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186343/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 186330
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass

version targeted for testing:
 libvirt  991c324fae2cfca8d592437ffc386171d343c836
baseline version:
 libvirt  230d81fc3a1398e66f482e404abe9759afd9bb59

Last test of basis   186330  2024-06-13 04:18:43 Z1 days
Testing same since   186343  2024-06-14 04:22:22 Z0 days1 attempts


People who touched revisions under test:
  Daniel P. Berrangé 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-amd64-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-amd64-amd64-libvirt-raw pass
 test-arm64-arm64-libvirt-raw pass
 test-amd64-amd64-libvirt-vhd pass
 test-armhf-armhf-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/libvirt.git
   230d81fc3a..991c324fae  991c324fae2cfca8d592437ffc386171d343c836 -> 
xen-tested-master

[ovmf test] 186346: all pass - PUSHED

flight 186346 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186346/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf ce91687a1b2d4e03b01abb474b4665629776f588
baseline version:
 ovmf 712797cf19acd292bf203522a79e40e7e13d268b

Last test of basis   186338  2024-06-13 16:11:15 Z0 days
Testing same since   186346  2024-06-14 07:13:40 Z0 days1 attempts


People who touched revisions under test:
  Jiaxin Wu 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   712797cf19..ce91687a1b  ce91687a1b2d4e03b01abb474b4665629776f588 -> 
xen-tested-master

Re: [PATCH v4 0/7] Static shared memory followup v2 - pt2

2024-06-14 Thread Julien Grall

On 14/06/2024 08:54, Luca Fancellu wrote:

Hi Julien,

Hi Luca,

On 13 Jun 2024, at 12:31, Julien Grall wrote:

Hi,

On 11/06/2024 13:42, Michal Orzel wrote:

We would like this serie to be in Xen 4.19, there was a misunderstanding on our
side because we thought
that since the serie was sent before the last posting date, it could have been
a candidate for merging in the
new release, instead after speaking with Julien and Oleksii we are now aware
that we need to provide a
justification for its presence.

Pros to this serie is that we are closing the circle for static shared memory,
allowing it to use memory from
the host or from Xen, it is also a feature that is not enabled by default, so
it should not cause too much
disruption in case of any bugs that escaped the review, however we’ve tested
many configurations for that
with/without enabling the feature if that can be an additional value.

Cons: we are touching some common code related to p2m, but also there the
impact should be minimal because
the new code is subject to l2 foreign mapping (to be confirmed maybe from a p2m
expert like Julien).

The comments on patch 3 of this serie are addressed by this patch:
https://patchwork.kernel.org/project/xen-devel/patch/20240528125603.2467640-1-luca.fance...@arm.com/
And the serie is fully reviewed.

So our request is to allow this serie in 4.19, Oleksii, ARM maintainers, do you
agree on that?

As a main reviewer of this series I'm ok to have this series in. It is nicely
encapsulated and the feature itself
is still in unsupported state. I don't foresee any issues with it.

There are changes in the p2m code and the memory allocation for boot domains.
So is it really encapsulated?

For me there are two risks:
* p2m (already mentioned by Luca): We modify the code to put foreign mapping.
The worse that can happen if we don't release the foreign mapping. This would
mean the page will not be freed. AFAIK, we don't exercise this path in the CI.
* domain allocation: This mainly look like refactoring. And the path is
exercised in the CI.

So I am not concerned with the domain allocation one. @Luca, would it be
possible to detail how did you test the foreign pages were properly removed?

So at first we tested the code, with/without the static shared memory feature
enabled, creating/destroying guest from Dom0 and checking that everything
was ok.

Just to details a bit more for the other. In a normal setup, guest would
be using PV block/network which are based on grant table. To exercise
the foreing mapping path, you would need a backend that map using the
GFN directly.

This is the case for virtio MMIO. Other users are IIRC XenFB, IOREQ (if
you emulate a device).

After a chat on Matrix with Julien he suggested that using a virtio-mmio disk
was better to stress out the foreign mapping looking for
regressions.

Luckily I’ve found this slide deck from @Oleksandr:
https://static.linaro.org/connect/lvc21/presentations/lvc21-314.pdf

So I did a setup using fvp-base, having a disk with two partitions containing
Dom0 rootfs and DomU rootfs, Dom0 sees
this disk using VirtIO block.

Thanks for the testing! I am satisfied with the result. I have committed
the series.

Cheers,

--
Julien Grall

Re: [PATCH v12 5/8] xen/riscv: add minimal stuff to mm.h to build full Xen

On 11/06/2024 7:23 pm, Oleksii K. wrote:
> On Tue, 2024-06-11 at 16:53 +0100, Andrew Cooper wrote:
>> On 30/05/2024 7:22 pm, Oleksii K. wrote:
>>> On Thu, 2024-05-30 at 18:23 +0100, Andrew Cooper wrote:
 On 29/05/2024 8:55 pm, Oleksii Kurochko wrote:
> Signed-off-by: Oleksii Kurochko 
> Acked-by: Jan Beulich 
 This patch looks like it can go in independently?  Or does it
 depend
 on
 having bitops.h working in practice?

 However, one very strong suggestion...


> diff --git a/xen/arch/riscv/include/asm/mm.h
> b/xen/arch/riscv/include/asm/mm.h
> index 07c7a0abba..cc4a07a71c 100644
> --- a/xen/arch/riscv/include/asm/mm.h
> +++ b/xen/arch/riscv/include/asm/mm.h
> @@ -3,11 +3,246 @@
> 
> +/* PDX of the first page in the frame table. */
> +extern unsigned long frametable_base_pdx;
> +
> +/* Convert between machine frame numbers and page-info
> structures.
> */
> +#define
> mfn_to_page(mfn)    \
> +    (frame_table + (mfn_to_pdx(mfn) - frametable_base_pdx))
> +#define
> page_to_mfn(pg) \
> +    pdx_to_mfn((unsigned long)((pg) - frame_table) +
> frametable_base_pdx)
 Do yourself a favour and not introduce frametable_base_pdx to
 begin
 with.

 Every RISC-V board I can find has things starting from 0 in
 physical
 address space, with RAM starting immediately after.
>>> I checked Linux kernel and grep there:
>>>    [ok@fedora linux-aia]$ grep -Rni "memory@" arch/riscv/boot/dts/
>>> --
>>>    exclude "*.tmp" -I
>>>    arch/riscv/boot/dts/starfive/jh7110-starfive-visionfive-
>>> 2.dtsi:33:    
>>>    memory@4000 {
>>>    arch/riscv/boot/dts/starfive/jh7100-common.dtsi:28:
>>> memory@8000
>>>    {
>>>    arch/riscv/boot/dts/microchip/mpfs-sev-kit.dts:49: 
>>> ddrc_cache:
>>>    memory@10 {
>>>    arch/riscv/boot/dts/microchip/mpfs-m100pfsevp.dts:33:  
>>> ddrc_cache_lo:
>>>    memory@8000 {
>>>    arch/riscv/boot/dts/microchip/mpfs-m100pfsevp.dts:37:  
>>> ddrc_cache_hi:
>>>    memory@104000 {
>>>    arch/riscv/boot/dts/microchip/mpfs-tysom-m.dts:34: 
>>> ddrc_cache_lo:
>>>    memory@8000 {
>>>    arch/riscv/boot/dts/microchip/mpfs-tysom-m.dts:40: 
>>> ddrc_cache_hi:
>>>    memory@10 {
>>>    arch/riscv/boot/dts/microchip/mpfs-polarberry.dts:22:  
>>> ddrc_cache_lo:
>>>    memory@8000 {
>>>    arch/riscv/boot/dts/microchip/mpfs-polarberry.dts:27:  
>>> ddrc_cache_hi:
>>>    memory@10 {
>>>    arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dts:57:  
>>> ddrc_cache_lo:
>>>    memory@8000 {
>>>    arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dts:63:  
>>> ddrc_cache_hi:
>>>    memory@104000 {
>>>    arch/riscv/boot/dts/thead/th1520-beaglev-ahead.dts:32:  memory@0
>>> {
>>>    arch/riscv/boot/dts/thead/th1520-lichee-module-4a.dtsi:14: 
>>>    memory@0 {
>>>    arch/riscv/boot/dts/sophgo/cv1800b-milkv-duo.dts:26:   
>>> memory@8000
>>>    {
>>>    arch/riscv/boot/dts/sophgo/cv1812h.dtsi:12: memory@8000
>>> {
>>>    arch/riscv/boot/dts/sifive/hifive-unmatched-a00.dts:26:
>>> memory@8000
>>>    {
>>>    arch/riscv/boot/dts/sifive/hifive-unleashed-a00.dts:25:
>>> memory@8000
>>>    {
>>>    arch/riscv/boot/dts/canaan/k210.dtsi:82:    sram:
>>> memory@8000 {
>>>    
>>> And based on  that majority of supported by Linux kernel boards has
>>> RAM
>>> starting not from 0 in physical address space. Am I confusing
>>> something?
>>>
 Taking the microchip board as an example, RAM actually starts at
 0x800,
>>> Today we had conversation with the guy from SiFive in xen-devel
>>> channel
>>> and he mentioned that they are using "starfive visionfive2 and
>>> sifive
>>> unleashed platforms." which based on the grep above has RAM not at
>>> 0
>>> address.
>>>
>>> Also, QEMU uses 0x800.
>>>
  which means that having frametable_base_pdx and assuming it
 does get set to 0x8000 (which isn't even a certainty, given that
 I
 think
 you'll need struct pages covering the PLICs), then what you are
 trading
 off is:

 * Saving 32k of virtual address space only (no need to even
 allocate
 memory for this range of the framtable), by
 * Having an extra memory load and add/sub in every page <-> mfn
 conversion, which is a screaming hotpath all over Xen.

 It's a terribly short-sighted tradeoff.

 32k of VA space might be worth saving in a 32bit build (I
 personally
 wouldn't - especially as there's no need to share Xen's VA space
 with
 guests, given no PV guests on ARM/RISC-V), but it's absolutely
 not at
 all in an a 64bit build with TB of VA space available.

 Even if we do find a board with the first interesting thing in
 the
 frametable starting sufficiently away from 0 that it might be

Re: [PATCH for 4.19?] x86/Intel: unlock CPUID earlier for the BSP

On 14/06/2024 7:27 am, Jan Beulich wrote:
> On 13.06.2024 18:17, Andrew Cooper wrote:
>> On 13/06/2024 9:19 am, Jan Beulich wrote:
>>> Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If
>>> this bit is set by the BIOS then CPUID evaluation does not work when
>>> data from any leaf greater than two is needed; early_cpu_init() in
>>> particular wants to collect leaf 7 data.
>>>
>>> Cure this by unlocking CPUID right before evaluating anything which
>>> depends on the maximum CPUID leaf being greater than two.
>>>
>>> Inspired by (and description cloned from) Linux commit 0c2f6d04619e
>>> ("x86/topology/intel: Unlock CPUID before evaluating anything").
>>>
>>> Signed-off-by: Jan Beulich 
>>> ---
>>> While I couldn't spot anything, it kind of feels as if I'm overlooking
>>> further places where we might be inspecting in particular leaf 7 yet
>>> earlier.
>>>
>>> No Fixes: tag(s), as imo it would be too many that would want
>>> enumerating.
>> I also saw that go by, but concluded that Xen doesn't need it, hence why
>> I left it alone.
>>
>> The truth is that only the BSP needs it.  APs sort it out in the
>> trampoline via trampoline_misc_enable_off, because they need to clear
>> XD_DISABLE in prior to enabling paging, so we should be taking it out of
>> early_init_intel().
> Except for the (odd) case also mentioned to Roger, where the BSP might have
> the bit clear but some (or all) AP(s) have it set.

Fine I suppose.  It's a single MSR adjustment once per CPU.

>
>> But, we don't have an early BSP-only early hook, and I'm not overwhelmed
>> at the idea of exporting it from intel.c
>>
>> I was intending to leave it alone until I can burn this whole
>> infrastructure to the ground and make it work nicely with policies, but
>> that's not a job for this point in the release...
> This last part reads like the rest of your reply isn't an objection to me
> putting this in with Roger's R-b, but it would be nice if you could
> confirm this understanding of mine. Without this last part it, especially
> the 2nd from last paragraph, certainly reads a little like an objection.

I'm -1 to this generally.  It's churn without fixing anything AFAICT,
and is moving in a direction which will need undoing in the future.

But, because it doesn't fix anything, I don't think it's appropriate to
be tagged as 4.19 even if you and roger feel strongly enough to put it
into 4.20.

~Andrew

Re: [PATCH for 4.19?] x86/Intel: unlock CPUID earlier for the BSP

2024-06-14 Thread Jan Beulich

On 14.06.2024 12:14, Andrew Cooper wrote:
> On 14/06/2024 7:27 am, Jan Beulich wrote:
>> On 13.06.2024 18:17, Andrew Cooper wrote:
>>> On 13/06/2024 9:19 am, Jan Beulich wrote:
 Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If
 this bit is set by the BIOS then CPUID evaluation does not work when
 data from any leaf greater than two is needed; early_cpu_init() in
 particular wants to collect leaf 7 data.

 Cure this by unlocking CPUID right before evaluating anything which
 depends on the maximum CPUID leaf being greater than two.

 Inspired by (and description cloned from) Linux commit 0c2f6d04619e
 ("x86/topology/intel: Unlock CPUID before evaluating anything").

 Signed-off-by: Jan Beulich 
 ---
 While I couldn't spot anything, it kind of feels as if I'm overlooking
 further places where we might be inspecting in particular leaf 7 yet
 earlier.

 No Fixes: tag(s), as imo it would be too many that would want
 enumerating.
>>> I also saw that go by, but concluded that Xen doesn't need it, hence why
>>> I left it alone.
>>>
>>> The truth is that only the BSP needs it.  APs sort it out in the
>>> trampoline via trampoline_misc_enable_off, because they need to clear
>>> XD_DISABLE in prior to enabling paging, so we should be taking it out of
>>> early_init_intel().
>> Except for the (odd) case also mentioned to Roger, where the BSP might have
>> the bit clear but some (or all) AP(s) have it set.
> 
> Fine I suppose.  It's a single MSR adjustment once per CPU.
> 
>>
>>> But, we don't have an early BSP-only early hook, and I'm not overwhelmed
>>> at the idea of exporting it from intel.c
>>>
>>> I was intending to leave it alone until I can burn this whole
>>> infrastructure to the ground and make it work nicely with policies, but
>>> that's not a job for this point in the release...
>> This last part reads like the rest of your reply isn't an objection to me
>> putting this in with Roger's R-b, but it would be nice if you could
>> confirm this understanding of mine. Without this last part it, especially
>> the 2nd from last paragraph, certainly reads a little like an objection.
> 
> I'm -1 to this generally.  It's churn without fixing anything AFAICT,

How that? We clearly do the adjustment too late right now for the BSP.
All the leaf-7 stuff added to early_cpu_init() (iirc in part in the course
of speculation work) is useless on a system where firmware set that flag.

Jan

> and is moving in a direction which will need undoing in the future.
> 
> But, because it doesn't fix anything, I don't think it's appropriate to
> be tagged as 4.19 even if you and roger feel strongly enough to put it
> into 4.20.
> 
> ~Andrew

Re: [PATCH v3 0/3] x86/irq: fixes for CPU hot{,un}plug

2024-06-14 Thread Oleksii K.

On Fri, 2024-06-14 at 09:28 +0200, Roger Pau Monné wrote:
> Sorry, forgot to add the for-4.19 tag and Cc Oleksii.
> 
> Since we have taken the start of the series, we might as well take
> the
> remaining patches (if other x86 maintainers agree) and attempt to
> hopefully fix all the interrupt issues with CPU hotplug/unplug.
> 
> FTR: there are further issues when doing CPU hotplug/unplug from a
> PVH
> dom0, but those are out of the scope for 4.19, as I haven't even
> started to diagnose what's going on.
And this issues were before the current patch series was introduced?

~ Oleksii
> 
> Thanks, Roger.
> 
> On Thu, Jun 13, 2024 at 06:56:14PM +0200, Roger Pau Monne wrote:
> > Hello,
> > 
> > The following series aim to fix interrupt handling when doing CPU
> > plug/unplug operations.  Without this series running:
> > 
> > cpus=`xl info max_cpu_id`
> > while [ 1 ]; do
> >     for i in `seq 1 $cpus`; do
> >     xen-hptool cpu-offline $i;
> >     xen-hptool cpu-online $i;
> >     done
> > done
> > 
> > Quite quickly results in interrupts getting lost and "No irq
> > handler for
> > vector" messages on the Xen console.  Drivers in dom0 also start
> > getting
> > interrupt timeouts and the system becomes unusable.
> > 
> > After applying the series running the loop over night still result
> > in a
> > fully usable system, no  "No irq handler for vector" messages at
> > all, no
> > interrupt loses reported by dom0.  Test with x2apic-
> > mode={mixed,cluster}.
> > 
> > I've attempted to document all code as good as I could, interrupt
> > handling has some unexpected corner cases that are hard to diagnose
> > and
> > reason about.
> > 
> > Some XenRT testing is undergoing to ensure no breakages.
> > 
> > Thanks, Roger.
> > 
> > Roger Pau Monne (3):
> >   x86/irq: deal with old_cpu_mask for interrupts in movement in
> >     fixup_irqs()
> >   x86/irq: handle moving interrupts in _assign_irq_vector()
> >   x86/irq: forward pending interrupts to new destination in
> > fixup_irqs()
> > 
> >  xen/arch/x86/include/asm/apic.h |   5 +
> >  xen/arch/x86/irq.c  | 163 +---
> > 
> >  2 files changed, 132 insertions(+), 36 deletions(-)
> > 
> > -- 
> > 2.45.2
> >

Xen Summit talks live on YouTube

2024-06-14 Thread Kelly Choi

Hi everyone,

We had a great few days filled with discussions and talks during the Xen
Summit in Lisbon.

These are now available for you to watch on YouTube!
https://www.youtube.com/playlist?list=PLQMQQsKgvLntZiKoELFs22Mtk-tBNNOMJ

Many thanks,
Kelly Choi

Community Manager
Xen Project

Re: [PATCH v3 0/3] x86/irq: fixes for CPU hot{,un}plug

On Fri, Jun 14, 2024 at 01:52:59PM +0200, Oleksii K. wrote:
> On Fri, 2024-06-14 at 09:28 +0200, Roger Pau Monné wrote:
> > Sorry, forgot to add the for-4.19 tag and Cc Oleksii.
> > 
> > Since we have taken the start of the series, we might as well take
> > the
> > remaining patches (if other x86 maintainers agree) and attempt to
> > hopefully fix all the interrupt issues with CPU hotplug/unplug.
> > 
> > FTR: there are further issues when doing CPU hotplug/unplug from a
> > PVH
> > dom0, but those are out of the scope for 4.19, as I haven't even
> > started to diagnose what's going on.
> And this issues were before the current patch series was introduced?

Sure, the issues with PVH dom0 cpu hotplug/unplug are additional to
the ones fixed here.

Thanks, Roger.

[PATCH for-4.19] xen/arch: Centralise __read_mostly and __ro_after_init

These being in cache.h is inherited from Linux, but is an inappropriate
location to live.

__read_mostly is an optimisation related to data placement in order to avoid
having shared data in cachelines that are likely to be written to, but it
really is just a section of the linked image separating data by usage
patterns; it has nothing to do with cache sizes or flushing logic.

Worse, __ro_after_init was only in xen/cache.h because __read_mostly was in
arch/cache.h, and has literally nothing whatsoever to do with caches.

Move the definitions into xen/sections.h, which in paritcular means that
RISC-V doesn't need to repeat the problematic pattern.  Take the opportunity
to provide a short descriptions of what these are used for.

For now, leave TODO comments next to the other identical definitions.  It
turns out that unpicking cache.h is more complicated than it appears because a
number of files use it for transitive dependencies.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 

For 4.19.  This is to help the RISC-V "full build of Xen" effort without
introducing a pattern that's going to require extra effort to undo after the
fact.
---
 xen/arch/arm/include/asm/cache.h |  1 +
 xen/arch/ppc/include/asm/cache.h |  1 +
 xen/arch/x86/include/asm/cache.h |  1 +
 xen/include/xen/cache.h  |  1 +
 xen/include/xen/sections.h   | 21 +
 5 files changed, 25 insertions(+)

diff --git a/xen/arch/arm/include/asm/cache.h b/xen/arch/arm/include/asm/cache.h
index 240b6ae0eaa3..029b2896fb3e 100644
--- a/xen/arch/arm/include/asm/cache.h
+++ b/xen/arch/arm/include/asm/cache.h
@@ -6,6 +6,7 @@
 #define L1_CACHE_SHIFT  (CONFIG_ARM_L1_CACHE_SHIFT)
 #define L1_CACHE_BYTES  (1 << L1_CACHE_SHIFT)
 
+/* TODO: Phase out the use of this via cache.h */
 #define __read_mostly __section(".data.read_mostly")
 
 #endif
diff --git a/xen/arch/ppc/include/asm/cache.h b/xen/arch/ppc/include/asm/cache.h
index 0d7323d7892f..13c0bf3242c8 100644
--- a/xen/arch/ppc/include/asm/cache.h
+++ b/xen/arch/ppc/include/asm/cache.h
@@ -3,6 +3,7 @@
 #ifndef _ASM_PPC_CACHE_H
 #define _ASM_PPC_CACHE_H
 
+/* TODO: Phase out the use of this via cache.h */
 #define __read_mostly __section(".data.read_mostly")
 
 #endif /* _ASM_PPC_CACHE_H */
diff --git a/xen/arch/x86/include/asm/cache.h b/xen/arch/x86/include/asm/cache.h
index e4770efb22b9..956c05493e23 100644
--- a/xen/arch/x86/include/asm/cache.h
+++ b/xen/arch/x86/include/asm/cache.h
@@ -9,6 +9,7 @@
 #define L1_CACHE_SHIFT (CONFIG_X86_L1_CACHE_SHIFT)
 #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
 
+/* TODO: Phase out the use of this via cache.h */
 #define __read_mostly __section(".data.read_mostly")
 
 #ifndef __ASSEMBLY__
diff --git a/xen/include/xen/cache.h b/xen/include/xen/cache.h
index f52a0aedf768..55456823c543 100644
--- a/xen/include/xen/cache.h
+++ b/xen/include/xen/cache.h
@@ -15,6 +15,7 @@
 #define __cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
 #endif
 
+/* TODO: Phase out the use of this via cache.h */
 #define __ro_after_init __section(".data.ro_after_init")
 
 #endif /* __LINUX_CACHE_H */
diff --git a/xen/include/xen/sections.h b/xen/include/xen/sections.h
index b6cb5604c285..6d4db2b38f0f 100644
--- a/xen/include/xen/sections.h
+++ b/xen/include/xen/sections.h
@@ -3,9 +3,30 @@
 #ifndef __XEN_SECTIONS_H__
 #define __XEN_SECTIONS_H__
 
+#include 
+
 /* SAF-0-safe */
 extern char __init_begin[], __init_end[];
 
+/*
+ * Some data is expected to be written very rarely (if at all).
+ *
+ * For performance reasons is it helpful to group such data in the build, to
+ * avoid the linker placing it adjacent to often-written data.
+ */
+#define __read_mostly __section(".data.read_mostly")
+
+/*
+ * Some data should be chosen during boot and be immutable thereafter.
+ *
+ * Variables annotated with __ro_after_init will become read-only after boot
+ * and suffer a runtime access fault if modified.
+ *
+ * For architectures/platforms which haven't implemented support, these
+ * variables will be treated as regular mutable data.
+ */
+#define __ro_after_init __section(".data.ro_after_init")
+
 #endif /* !__XEN_SECTIONS_H__ */
 /*
  * Local variables:

base-commit: 8b4243a9b560c89bb259db5a27832c253d4bebc7
-- 
2.39.2

Re: [PATCH for-4.19] xen/arch: Centralise __read_mostly and __ro_after_init

2024-06-14 Thread Jan Beulich

On 14.06.2024 14:49, Andrew Cooper wrote:
> These being in cache.h is inherited from Linux, but is an inappropriate
> location to live.
> 
> __read_mostly is an optimisation related to data placement in order to avoid
> having shared data in cachelines that are likely to be written to, but it
> really is just a section of the linked image separating data by usage
> patterns; it has nothing to do with cache sizes or flushing logic.
> 
> Worse, __ro_after_init was only in xen/cache.h because __read_mostly was in
> arch/cache.h, and has literally nothing whatsoever to do with caches.
> 
> Move the definitions into xen/sections.h, which in paritcular means that
> RISC-V doesn't need to repeat the problematic pattern.  Take the opportunity
> to provide a short descriptions of what these are used for.
> 
> For now, leave TODO comments next to the other identical definitions.  It
> turns out that unpicking cache.h is more complicated than it appears because a
> number of files use it for transitive dependencies.

I don't particularly mind this approach, so ...

> Signed-off-by: Andrew Cooper 

Acked-by: Jan Beulich 

Yet this (or variants thereof) is precisely what I wouldn't have wanted to do
myself, for leaving garbled state and, if I'm not mistaken, introducing new
Misra violations because of the redundant #define-s.

Jan

Re: [PATCH 2/7] x86/xstate: Cross-check dynamic XSTATE sizes at boot

On 23/05/2024 4:34 pm, Jan Beulich wrote:
> On 23.05.2024 13:16, Andrew Cooper wrote:
>> Right now, xstate_ctxt_size() performs a cross-check of size with CPUID in 
>> for
>> every call.  This is expensive, being used for domain create/migrate, as well
>> as to service certain guest CPUID instructions.
>>
>> Instead, arrange to check the sizes once at boot.  See the code comments for
>> details.  Right now, it just checks hardware against the algorithm
>> expectations.  Later patches will add further cross-checking.
>>
>> Introduce the missing X86_XCR0_* and X86_XSS_* constants, and a couple of
>> missing CPUID bits.  This is to maximise coverage in the sanity check, even 
>> if
>> we don't expect to use/virtualise some of these features any time soon.  
>> Leave
>> HDC and HWP alone for now.  We don't have CPUID bits from them stored nicely.
> Since you say "the missing", ...
>
>> --- a/xen/arch/x86/include/asm/x86-defns.h
>> +++ b/xen/arch/x86/include/asm/x86-defns.h
>> @@ -77,7 +77,7 @@
>>  #define X86_CR4_PKS0x0100 /* Protection Key Supervisor */
>>  
>>  /*
>> - * XSTATE component flags in XCR0
>> + * XSTATE component flags in XCR0 | MSR_XSS
>>   */
>>  #define X86_XCR0_FP_POS   0
>>  #define X86_XCR0_FP   (1ULL << X86_XCR0_FP_POS)
>> @@ -95,11 +95,34 @@
>>  #define X86_XCR0_ZMM  (1ULL << X86_XCR0_ZMM_POS)
>>  #define X86_XCR0_HI_ZMM_POS   7
>>  #define X86_XCR0_HI_ZMM   (1ULL << X86_XCR0_HI_ZMM_POS)
>> +#define X86_XSS_PROC_TRACE(_AC(1, ULL) <<  8)
>>  #define X86_XCR0_PKRU_POS 9
>>  #define X86_XCR0_PKRU (1ULL << X86_XCR0_PKRU_POS)
>> +#define X86_XSS_PASID (_AC(1, ULL) << 10)
>> +#define X86_XSS_CET_U (_AC(1, ULL) << 11)
>> +#define X86_XSS_CET_S (_AC(1, ULL) << 12)
>> +#define X86_XSS_HDC   (_AC(1, ULL) << 13)
>> +#define X86_XSS_UINTR (_AC(1, ULL) << 14)
>> +#define X86_XSS_LBR   (_AC(1, ULL) << 15)
>> +#define X86_XSS_HWP   (_AC(1, ULL) << 16)
>> +#define X86_XCR0_TILE_CFG (_AC(1, ULL) << 17)
>> +#define X86_XCR0_TILE_DATA(_AC(1, ULL) << 18)
> ... I'm wondering if you deliberately left out APX (bit 19).

It was deliberate.  APX isn't in the SDM yet, so in principle is still
subject to change.

I've tweaked the commit message to avoid using the word 'missing'.

> Since you're re-doing some of what I have long had in patches already,
> I'd also like to ask whether the last underscores each in the two AMX
> names really are useful in your opinion. While rebasing isn't going
> to be difficult either way, it would be yet simpler with
> X86_XCR0_TILECFG and X86_XCR0_TILEDATA, as I've had it in my patches
> for over 3 years.

I'm torn here.  I don't want to deliberately make things harder for you,
but I would really prefer that we use the more legible form...
>> --- a/xen/arch/x86/xstate.c
>> +++ b/xen/arch/x86/xstate.c
>> @@ -604,9 +604,156 @@ static bool valid_xcr0(uint64_t xcr0)
>>  if ( !(xcr0 & X86_XCR0_BNDREGS) != !(xcr0 & X86_XCR0_BNDCSR) )
>>  return false;
>>  
>> +/* TILE_CFG and TILE_DATA must be the same. */
>> +if ( !(xcr0 & X86_XCR0_TILE_CFG) != !(xcr0 & X86_XCR0_TILE_DATA) )
>> +return false;
>> +
>>  return true;
>>  }
>>  
>> +struct xcheck_state {
>> +uint64_t states;
>> +uint32_t uncomp_size;
>> +uint32_t comp_size;
>> +};
>> +
>> +static void __init check_new_xstate(struct xcheck_state *s, uint64_t new)
>> +{
>> +uint32_t hw_size;
>> +
>> +BUILD_BUG_ON(X86_XCR0_STATES & X86_XSS_STATES);
>> +
>> +BUG_ON(s->states & new); /* States only increase. */
>> +BUG_ON(!valid_xcr0(s->states | new)); /* Xen thinks it's a good value. 
>> */
>> +BUG_ON(new & ~(X86_XCR0_STATES | X86_XSS_STATES)); /* Known state. */
>> +BUG_ON((new & X86_XCR0_STATES) &&
>> +   (new & X86_XSS_STATES)); /* User or supervisor, not both. */
>> +
>> +s->states |= new;
>> +if ( new & X86_XCR0_STATES )
>> +{
>> +if ( !set_xcr0(s->states & X86_XCR0_STATES) )
>> +BUG();
>> +}
>> +else
>> +set_msr_xss(s->states & X86_XSS_STATES);
>> +
>> +/*
>> + * Check the uncompressed size.  Some XSTATEs are out-of-order and fill 
>> in
>> + * prior holes in the state area, so we check that the size doesn't
>> + * decrease.
>> + */
>> +hw_size = cpuid_count_ebx(0xd, 0);
>> +
>> +if ( hw_size < s->uncomp_size )
>> +panic("XSTATE 0x%016"PRIx64", new bits {%63pbl}, uncompressed hw 
>> size %#x < prev size %#x\n",
>> +  s->states, &new, hw_size, s->uncomp_size);
>> +
>> +s->uncomp_size = hw_size;
>> +
>> +/*
>> + * Check the compressed size, if available.  All components strictly
>> + * appear in index order.  In principle there are no holes, but some
>> + * components have their base address 64-byte aligned for efficiency
>> + * reasons (e.g. AMX-TILE) and there are ot

Re: [PATCH 4/7] x86/xstate: Rework xstate_ctxt_size() as xstate_uncompressed_size()

On 23/05/2024 5:09 pm, Jan Beulich wrote:
> On 23.05.2024 13:16, Andrew Cooper wrote:
>> @@ -611,6 +587,40 @@ static bool valid_xcr0(uint64_t xcr0)
>>  return true;
>>  }
>>  
>> +unsigned int xstate_uncompressed_size(uint64_t xcr0)
>> +{
>> +unsigned int size = XSTATE_AREA_MIN_SIZE, i;
>> +
>> +ASSERT((xcr0 & ~X86_XCR0_STATES) == 0);
> I'm puzzled by the combination of this assertion and ...
>
>> +if ( xcr0 == xfeature_mask )
>> +return xsave_cntxt_size;
> ... this conditional return. Yes, right now we don't support/use any XSS
> components, but without any comment the assertion looks overly restrictive
> to me.

The ASSERT() is to cover a bug I found during testing.

It is a hard error to pass in non-XCR0 states.  XSS states do not exist
in an uncompressed image, and have a base of 0, which ends up hitting a
later assertion.

This snippet with xfeature_mask is just code motion from
xstate_ctxt_size(), expressed as an addition because of diff.  It was to
save a double XCR0 write in a perceived common case.

But, your AMX series makes both xfeature_mask and xsave_cntxt_size bogus
by there no longer being a uniform size of the save area, so I can
probably get away with simply dropping it here.

~Andrew

Re: [PATCH for-4.19] xen/arch: Centralise __read_mostly and __ro_after_init

On Fri, Jun 14, 2024 at 01:49:50PM +0100, Andrew Cooper wrote:
> These being in cache.h is inherited from Linux, but is an inappropriate
> location to live.
> 
> __read_mostly is an optimisation related to data placement in order to avoid
> having shared data in cachelines that are likely to be written to, but it
> really is just a section of the linked image separating data by usage
> patterns; it has nothing to do with cache sizes or flushing logic.
> 
> Worse, __ro_after_init was only in xen/cache.h because __read_mostly was in
> arch/cache.h, and has literally nothing whatsoever to do with caches.
> 
> Move the definitions into xen/sections.h, which in paritcular means that
> RISC-V doesn't need to repeat the problematic pattern.  Take the opportunity
> to provide a short descriptions of what these are used for.
> 
> For now, leave TODO comments next to the other identical definitions.  It
> turns out that unpicking cache.h is more complicated than it appears because a
> number of files use it for transitive dependencies.

I assume that including sections.h from cache.h (in the meantime)
creates a circular header dependency?

> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Roger Pau Monné 
> CC: Stefano Stabellini 
> CC: Julien Grall 
> CC: Volodymyr Babchuk 
> CC: Bertrand Marquis 
> CC: Michal Orzel 
> CC: Oleksii Kurochko 
> CC: Shawn Anastasio 
> 
> For 4.19.  This is to help the RISC-V "full build of Xen" effort without
> introducing a pattern that's going to require extra effort to undo after the
> fact.
> ---
>  xen/arch/arm/include/asm/cache.h |  1 +
>  xen/arch/ppc/include/asm/cache.h |  1 +
>  xen/arch/x86/include/asm/cache.h |  1 +
>  xen/include/xen/cache.h  |  1 +
>  xen/include/xen/sections.h   | 21 +
>  5 files changed, 25 insertions(+)
> 
> diff --git a/xen/arch/arm/include/asm/cache.h 
> b/xen/arch/arm/include/asm/cache.h
> index 240b6ae0eaa3..029b2896fb3e 100644
> --- a/xen/arch/arm/include/asm/cache.h
> +++ b/xen/arch/arm/include/asm/cache.h
> @@ -6,6 +6,7 @@
>  #define L1_CACHE_SHIFT  (CONFIG_ARM_L1_CACHE_SHIFT)
>  #define L1_CACHE_BYTES  (1 << L1_CACHE_SHIFT)
>  
> +/* TODO: Phase out the use of this via cache.h */
>  #define __read_mostly __section(".data.read_mostly")
>  
>  #endif
> diff --git a/xen/arch/ppc/include/asm/cache.h 
> b/xen/arch/ppc/include/asm/cache.h
> index 0d7323d7892f..13c0bf3242c8 100644
> --- a/xen/arch/ppc/include/asm/cache.h
> +++ b/xen/arch/ppc/include/asm/cache.h
> @@ -3,6 +3,7 @@
>  #ifndef _ASM_PPC_CACHE_H
>  #define _ASM_PPC_CACHE_H
>  
> +/* TODO: Phase out the use of this via cache.h */
>  #define __read_mostly __section(".data.read_mostly")
>  
>  #endif /* _ASM_PPC_CACHE_H */
> diff --git a/xen/arch/x86/include/asm/cache.h 
> b/xen/arch/x86/include/asm/cache.h
> index e4770efb22b9..956c05493e23 100644
> --- a/xen/arch/x86/include/asm/cache.h
> +++ b/xen/arch/x86/include/asm/cache.h
> @@ -9,6 +9,7 @@
>  #define L1_CACHE_SHIFT   (CONFIG_X86_L1_CACHE_SHIFT)
>  #define L1_CACHE_BYTES   (1 << L1_CACHE_SHIFT)
>  
> +/* TODO: Phase out the use of this via cache.h */
>  #define __read_mostly __section(".data.read_mostly")
>  
>  #ifndef __ASSEMBLY__
> diff --git a/xen/include/xen/cache.h b/xen/include/xen/cache.h
> index f52a0aedf768..55456823c543 100644
> --- a/xen/include/xen/cache.h
> +++ b/xen/include/xen/cache.h
> @@ -15,6 +15,7 @@
>  #define __cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
>  #endif
>  
> +/* TODO: Phase out the use of this via cache.h */
>  #define __ro_after_init __section(".data.ro_after_init")
>  
>  #endif /* __LINUX_CACHE_H */
> diff --git a/xen/include/xen/sections.h b/xen/include/xen/sections.h
> index b6cb5604c285..6d4db2b38f0f 100644
> --- a/xen/include/xen/sections.h
> +++ b/xen/include/xen/sections.h
> @@ -3,9 +3,30 @@
>  #ifndef __XEN_SECTIONS_H__
>  #define __XEN_SECTIONS_H__
>  
> +#include 
> +
>  /* SAF-0-safe */
>  extern char __init_begin[], __init_end[];
>  
> +/*
> + * Some data is expected to be written very rarely (if at all).
> + *
> + * For performance reasons is it helpful to group such data in the build, to
> + * avoid the linker placing it adjacent to often-written data.
> + */
> +#define __read_mostly __section(".data.read_mostly")
> +
> +/*
> + * Some data should be chosen during boot and be immutable thereafter.
> + *
> + * Variables annotated with __ro_after_init will become read-only after boot
> + * and suffer a runtime access fault if modified.
> + *
> + * For architectures/platforms which haven't implemented support, these
> + * variables will be treated as regular mutable data.
> + */
> +#define __ro_after_init __section(".data.ro_after_init")

Is it fine for MISRA to have possibly two identical defines in the
same translation unit?

Thanks, Roger.

[xen-unstable test] 186344: tolerable FAIL

flight 186344 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186344/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 186341
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 186341
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 186341
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 186341
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 186341
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186341
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-qcow214 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow215 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-raw  15 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  4fdd8d75566fdad06667a79ec0ce6f43cc466c54
baseline version:
 xen  4fdd8d75566fdad06667a79ec0ce6f43cc466c54

Last test of basis   186344  2024-06-14 05:48:54 Z0 days
Testing same since  (not found) 0 attempts

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-l

Re: [PATCH 6/7] x86/cpuid: Fix handling of XSAVE dynamic leaves

On 23/05/2024 5:16 pm, Jan Beulich wrote:
> On 23.05.2024 13:16, Andrew Cooper wrote:
>> First, if XSAVE is available in hardware but not visible to the guest, the
>> dynamic leaves shouldn't be filled in.
>>
>> Second, the comment concerning XSS state is wrong.  VT-x doesn't manage
>> host/guest state automatically, but there is provision for "host only" bits 
>> to
>> be set, so the implications are still accurate.
>>
>> Introduce xstate_compressed_size() to mirror the uncompressed one.  Cross
>> check it at boot.
>>
>> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 

Thanks.

> Irrespective ...
>
>> v3:
>>  * Adjust commit message about !XSAVE guests
>>  * Rebase over boot time cross check
>>  * Use raw policy
> ... it should probably have occurred to me earlier on to ask: Why raw policy?
> Isn't the host one the more appropriate one to use for any kind of internal
> decisions?

State information is identical in all policies.  It's the ABI of the
X{SAVE,RSTOR}* instructions.

Beyond that, consistency.

xstate_uncompressed_size() does strictly need to be the raw policy,
because it is used by recalculate_xstate() to calculate the host policy.

xstate_compressed_size() doesn't have the same restriction, but should
use the same source of data.

Finally, xstate_{un,}compressed_size() aren't really tied to a choice of
features in the first place.  They shouldn't be limited to the
host_policy's subset of active features.

~Andrew

[xen-unstable-smoke test] 186349: tolerable all pass - PUSHED

flight 186349 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186349/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  8b4243a9b560c89bb259db5a27832c253d4bebc7
baseline version:
 xen  4fdd8d75566fdad06667a79ec0ce6f43cc466c54

Last test of basis   186337  2024-06-13 16:02:07 Z0 days
Testing same since   186349  2024-06-14 10:03:57 Z0 days1 attempts


People who touched revisions under test:
  Luca Fancellu 
  Penny Zheng 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   4fdd8d7556..8b4243a9b5  8b4243a9b560c89bb259db5a27832c253d4bebc7 -> smoke

Re: [PATCH] x86: pci: xen: Remove unnecessary ‘0’ values from ret

2024-06-14 Thread Ilpo Järvinen

On Wed, 12 Jun 2024, Li zeming wrote:

> ret is assigned first, so it does not need to initialize the assignment.

While the patch seems fine, this description and the shortlog are
confusing.

-- 
 i.

> Signed-off-by: Li zeming 
> ---
>  arch/x86/pci/xen.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> index 652cd53e77f6..67cb9dc9b2e7 100644
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -267,7 +267,7 @@ static bool __read_mostly pci_seg_supported = true;
>  
>  static int xen_initdom_setup_msi_irqs(struct pci_dev *dev, int nvec, int 
> type)
>  {
> - int ret = 0;
> + int ret;
>   struct msi_desc *msidesc;
>  
>   msi_for_each_desc(msidesc, &dev->dev, MSI_DESC_NOTASSOCIATED) {
> @@ -353,7 +353,7 @@ static int xen_initdom_setup_msi_irqs(struct pci_dev 
> *dev, int nvec, int type)
>  
>  bool xen_initdom_restore_msi(struct pci_dev *dev)
>  {
> - int ret = 0;
> + int ret;
>  
>   if (!xen_initial_domain())
>   return true;
>

Re: [PATCH for-4.19] xen/arch: Centralise __read_mostly and __ro_after_init

2024-06-14 Thread Oleksii K.

On Fri, 2024-06-14 at 13:49 +0100, Andrew Cooper wrote:
> These being in cache.h is inherited from Linux, but is an
> inappropriate
> location to live.
> 
> __read_mostly is an optimisation related to data placement in order
> to avoid
> having shared data in cachelines that are likely to be written to,
> but it
> really is just a section of the linked image separating data by usage
> patterns; it has nothing to do with cache sizes or flushing logic.
> 
> Worse, __ro_after_init was only in xen/cache.h because __read_mostly
> was in
> arch/cache.h, and has literally nothing whatsoever to do with caches.
> 
> Move the definitions into xen/sections.h, which in paritcular means
> that
> RISC-V doesn't need to repeat the problematic pattern.  Take the
> opportunity
> to provide a short descriptions of what these are used for.
> 
> For now, leave TODO comments next to the other identical
> definitions.  It
> turns out that unpicking cache.h is more complicated than it appears
> because a
> number of files use it for transitive dependencies.
> 
> Signed-off-by: Andrew Cooper 
Release-Acked-By: Oleksii Kurochko 

> ---
> CC: Jan Beulich 
> CC: Roger Pau Monné 
> CC: Stefano Stabellini 
> CC: Julien Grall 
> CC: Volodymyr Babchuk 
> CC: Bertrand Marquis 
> CC: Michal Orzel 
> CC: Oleksii Kurochko 
> CC: Shawn Anastasio 
> 
> For 4.19.  This is to help the RISC-V "full build of Xen" effort
> without
> introducing a pattern that's going to require extra effort to undo
> after the
> fact.
> ---
>  xen/arch/arm/include/asm/cache.h |  1 +
>  xen/arch/ppc/include/asm/cache.h |  1 +
>  xen/arch/x86/include/asm/cache.h |  1 +
>  xen/include/xen/cache.h  |  1 +
>  xen/include/xen/sections.h   | 21 +
>  5 files changed, 25 insertions(+)
> 
> diff --git a/xen/arch/arm/include/asm/cache.h
> b/xen/arch/arm/include/asm/cache.h
> index 240b6ae0eaa3..029b2896fb3e 100644
> --- a/xen/arch/arm/include/asm/cache.h
> +++ b/xen/arch/arm/include/asm/cache.h
> @@ -6,6 +6,7 @@
>  #define L1_CACHE_SHIFT  (CONFIG_ARM_L1_CACHE_SHIFT)
>  #define L1_CACHE_BYTES  (1 << L1_CACHE_SHIFT)
>  
> +/* TODO: Phase out the use of this via cache.h */
>  #define __read_mostly __section(".data.read_mostly")
>  
>  #endif
> diff --git a/xen/arch/ppc/include/asm/cache.h
> b/xen/arch/ppc/include/asm/cache.h
> index 0d7323d7892f..13c0bf3242c8 100644
> --- a/xen/arch/ppc/include/asm/cache.h
> +++ b/xen/arch/ppc/include/asm/cache.h
> @@ -3,6 +3,7 @@
>  #ifndef _ASM_PPC_CACHE_H
>  #define _ASM_PPC_CACHE_H
>  
> +/* TODO: Phase out the use of this via cache.h */
>  #define __read_mostly __section(".data.read_mostly")
>  
>  #endif /* _ASM_PPC_CACHE_H */
> diff --git a/xen/arch/x86/include/asm/cache.h
> b/xen/arch/x86/include/asm/cache.h
> index e4770efb22b9..956c05493e23 100644
> --- a/xen/arch/x86/include/asm/cache.h
> +++ b/xen/arch/x86/include/asm/cache.h
> @@ -9,6 +9,7 @@
>  #define L1_CACHE_SHIFT   (CONFIG_X86_L1_CACHE_SHIFT)
>  #define L1_CACHE_BYTES   (1 << L1_CACHE_SHIFT)
>  
> +/* TODO: Phase out the use of this via cache.h */
>  #define __read_mostly __section(".data.read_mostly")
>  
>  #ifndef __ASSEMBLY__
> diff --git a/xen/include/xen/cache.h b/xen/include/xen/cache.h
> index f52a0aedf768..55456823c543 100644
> --- a/xen/include/xen/cache.h
> +++ b/xen/include/xen/cache.h
> @@ -15,6 +15,7 @@
>  #define __cacheline_aligned
> __attribute__((__aligned__(SMP_CACHE_BYTES)))
>  #endif
>  
> +/* TODO: Phase out the use of this via cache.h */
>  #define __ro_after_init __section(".data.ro_after_init")
>  
>  #endif /* __LINUX_CACHE_H */
> diff --git a/xen/include/xen/sections.h b/xen/include/xen/sections.h
> index b6cb5604c285..6d4db2b38f0f 100644
> --- a/xen/include/xen/sections.h
> +++ b/xen/include/xen/sections.h
> @@ -3,9 +3,30 @@
>  #ifndef __XEN_SECTIONS_H__
>  #define __XEN_SECTIONS_H__
>  
> +#include 
> +
>  /* SAF-0-safe */
>  extern char __init_begin[], __init_end[];
>  
> +/*
> + * Some data is expected to be written very rarely (if at all).
> + *
> + * For performance reasons is it helpful to group such data in the
> build, to
> + * avoid the linker placing it adjacent to often-written data.
> + */
> +#define __read_mostly __section(".data.read_mostly")
> +
> +/*
> + * Some data should be chosen during boot and be immutable
> thereafter.
> + *
> + * Variables annotated with __ro_after_init will become read-only
> after boot
> + * and suffer a runtime access fault if modified.
> + *
> + * For architectures/platforms which haven't implemented support,
> these
> + * variables will be treated as regular mutable data.
> + */
> +#define __ro_after_init __section(".data.ro_after_init")
> +
>  #endif /* !__XEN_SECTIONS_H__ */
>  /*
>   * Local variables:
> 
> base-commit: 8b4243a9b560c89bb259db5a27832c253d4bebc7

[ovmf test] 186352: all pass - PUSHED

flight 186352 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186352/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 5e776299a2604b336a947e68593012ab2cc16eb4
baseline version:
 ovmf ce91687a1b2d4e03b01abb474b4665629776f588

Last test of basis   186346  2024-06-14 07:13:40 Z0 days
Testing same since   186352  2024-06-14 13:42:52 Z0 days1 attempts


People who touched revisions under test:
  Gerd Hoffmann 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   ce91687a1b..5e776299a2  5e776299a2604b336a947e68593012ab2cc16eb4 -> 
xen-tested-master

[PATCH 0/2] Address violations of MISRA C:2012 Rule 5.3

2024-06-14 Thread Alessandro Zucchelli

This addresses violations of MISRA C:2012 Rule 5.3 which states as
following: An identifier declared in an inner scope shall not hide an
identifier declared in an outer scope.

In this series are modified files x86/mm.c and x86/e820.c in which occurred
instances of variable names shadowing a global variable; these patches are aimed
to remove said occurrences leading to partial compliance under MISRA C:2012
Rule 5.3.

No functional change.

Alessandro Zucchelli (2):
  x86/mm address violations of MISRA C:2012 Rule 5.3
  x86/e820 address violations of MISRA C:2012 Rule 5.3

 xen/arch/x86/e820.c | 74 ++---
 xen/arch/x86/mm.c   | 12 
 2 files changed, 43 insertions(+), 43 deletions(-)

-- 
2.34.1

[PATCH 2/2] x86/e820 address violations of MISRA C:2012 Rule 5.3

2024-06-14 Thread Alessandro Zucchelli

This addresses violations of MISRA C:2012 Rule 5.3 which states as
following: An identifier declared in an inner scope shall not hide an
identifier declared in an outer scope.

No functional change.

Signed-off-by: Alessandro Zucchelli 
---
 xen/arch/x86/e820.c | 74 ++---
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c
index 6a3ce7e0a0..3726823e88 100644
--- a/xen/arch/x86/e820.c
+++ b/xen/arch/x86/e820.c
@@ -593,79 +593,79 @@ int __init e820_add_range(uint64_t s, uint64_t e, 
uint32_t type)
 }
 
 int __init e820_change_range_type(
-struct e820map *e820, uint64_t s, uint64_t e,
+struct e820map *map, uint64_t s, uint64_t e,
 uint32_t orig_type, uint32_t new_type)
 {
 uint64_t rs = 0, re = 0;
 unsigned int i;
 
-for ( i = 0; i < e820->nr_map; i++ )
+for ( i = 0; i < map->nr_map; i++ )
 {
 /* Have we found the e820 region that includes the specified range? */
-rs = e820->map[i].addr;
-re = rs + e820->map[i].size;
+rs = map->map[i].addr;
+re = rs + map->map[i].size;
 if ( (s >= rs) && (e <= re) )
 break;
 }
 
-if ( (i == e820->nr_map) || (e820->map[i].type != orig_type) )
+if ( (i == map->nr_map) || (map->map[i].type != orig_type) )
 return 0;
 
 if ( (s == rs) && (e == re) )
 {
-e820->map[i].type = new_type;
+map->map[i].type = new_type;
 }
 else if ( (s == rs) || (e == re) )
 {
-if ( (e820->nr_map + 1) > ARRAY_SIZE(e820->map) )
+if ( (map->nr_map + 1) > ARRAY_SIZE(map->map) )
 goto overflow;
 
-memmove(&e820->map[i+1], &e820->map[i],
-(e820->nr_map-i) * sizeof(e820->map[0]));
-e820->nr_map++;
+memmove(&map->map[i+1], &map->map[i],
+(map->nr_map-i) * sizeof(map->map[0]));
+map->nr_map++;
 
 if ( s == rs )
 {
-e820->map[i].size = e - s;
-e820->map[i].type = new_type;
-e820->map[i+1].addr = e;
-e820->map[i+1].size = re - e;
+map->map[i].size = e - s;
+map->map[i].type = new_type;
+map->map[i+1].addr = e;
+map->map[i+1].size = re - e;
 }
 else
 {
-e820->map[i].size = s - rs;
-e820->map[i+1].addr = s;
-e820->map[i+1].size = e - s;
-e820->map[i+1].type = new_type;
+map->map[i].size = s - rs;
+map->map[i+1].addr = s;
+map->map[i+1].size = e - s;
+map->map[i+1].type = new_type;
 }
 }
 else
 {
-if ( (e820->nr_map + 2) > ARRAY_SIZE(e820->map) )
+if ( (map->nr_map + 2) > ARRAY_SIZE(map->map) )
 goto overflow;
 
-memmove(&e820->map[i+2], &e820->map[i],
-(e820->nr_map-i) * sizeof(e820->map[0]));
-e820->nr_map += 2;
+memmove(&map->map[i+2], &map->map[i],
+(map->nr_map-i) * sizeof(map->map[0]));
+map->nr_map += 2;
 
-e820->map[i].size = s - rs;
-e820->map[i+1].addr = s;
-e820->map[i+1].size = e - s;
-e820->map[i+1].type = new_type;
-e820->map[i+2].addr = e;
-e820->map[i+2].size = re - e;
+map->map[i].size = s - rs;
+map->map[i+1].addr = s;
+map->map[i+1].size = e - s;
+map->map[i+1].type = new_type;
+map->map[i+2].addr = e;
+map->map[i+2].size = re - e;
 }
 
 /* Finally, look for any opportunities to merge adjacent e820 entries. */
-for ( i = 0; i < (e820->nr_map - 1); i++ )
+for ( i = 0; i < (map->nr_map - 1); i++ )
 {
-if ( (e820->map[i].type != e820->map[i+1].type) ||
- ((e820->map[i].addr + e820->map[i].size) != e820->map[i+1].addr) )
+if ( (map->map[i].type != map->map[i+1].type) ||
+ ((map->map[i].addr + map->map[i].size) != map->map[i+1].addr) )
 continue;
-e820->map[i].size += e820->map[i+1].size;
-memmove(&e820->map[i+1], &e820->map[i+2],
-(e820->nr_map-i-2) * sizeof(e820->map[0]));
-e820->nr_map--;
+map->map[i].size += map->map[i+1].size;
+memmove(&map->map[i+1], &map->map[i+2],
+(map->nr_map-i-2) * sizeof(map->map[0]));
+map->nr_map--;
 i--;
 }
 
@@ -678,9 +678,9 @@ int __init e820_change_range_type(
 }
 
 /* Set E820_RAM area (@s,@e) as RESERVED in specified e820 map. */
-int __init reserve_e820_ram(struct e820map *e820, uint64_t s, uint64_t e)
+int __init reserve_e820_ram(struct e820map *map, uint64_t s, uint64_t e)
 {
-return e820_change_range_type(e820, s, e, E820_RAM, E820_RESERVED);
+return e820_change_range_type(map, s, e, E820_RAM, E820_RESERVED);
 }
 
 unsigned long __init init_e820(const char *str, struct e820map *raw)
-- 
2.34.1

[PATCH 1/2] x86/mm address violations of MISRA C:2012 Rule 5.3

2024-06-14 Thread Alessandro Zucchelli

This addresses violations of MISRA C:2012 Rule 5.3 which states as
following: An identifier declared in an inner scope shall not hide an
identifier declared in an outer scope.

No functional change.

Signed-off-by: Alessandro Zucchelli 
---
 xen/arch/x86/mm.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 5471b6b1f2..720d56e103 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4703,7 +4703,7 @@ long arch_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 struct xen_foreign_memory_map fmap;
 struct domain *d;
-struct e820entry *map;
+struct e820entry *e;
 
 if ( copy_from_guest(&fmap, arg, 1) )
 return -EFAULT;
@@ -4722,23 +4722,23 @@ long arch_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 return rc;
 }
 
-map = xmalloc_array(e820entry_t, fmap.map.nr_entries);
-if ( map == NULL )
+e = xmalloc_array(e820entry_t, fmap.map.nr_entries);
+if ( e == NULL )
 {
 rcu_unlock_domain(d);
 return -ENOMEM;
 }
 
-if ( copy_from_guest(map, fmap.map.buffer, fmap.map.nr_entries) )
+if ( copy_from_guest(e, fmap.map.buffer, fmap.map.nr_entries) )
 {
-xfree(map);
+xfree(e);
 rcu_unlock_domain(d);
 return -EFAULT;
 }
 
 spin_lock(&d->arch.e820_lock);
 xfree(d->arch.e820);
-d->arch.e820 = map;
+d->arch.e820 = e;
 d->arch.nr_e820 = fmap.map.nr_entries;
 spin_unlock(&d->arch.e820_lock);
 
-- 
2.34.1

Re: convert the SCSI ULDs to the atomic queue limits API v2

2024-06-14 Thread Jens Axboe



On Fri, 31 May 2024 09:47:55 +0200, Christoph Hellwig wrote:
> this series converts the SCSI upper level drivers to the atomic queue
> limits API.
> 
> The first patch is a bug fix for ubd that later patches depend on and
> might be worth picking up for 6.10.
> 
> The second patch changes the max_sectors calculation to take the optimal
> I/O size into account so that sd, nbd and rbd don't have to mess with
> the user max_sector value.  I'd love to see a careful review from the
> nbd and rbd maintainers for this one!
> 
> [...]

Applied, thanks!

[01/14] ubd: refactor the interrupt handler
commit: 5db755fbb1a0de4a4cfd5d5edfaa19853b9c56e6
[02/14] ubd: untagle discard vs write zeroes not support handling
commit: 31ade7d4fdcf382beb8cb229a1f5d77e0f239672
[03/14] rbd: increase io_opt again
commit: a00d4bfce7c6d7fa4712b8133ec195c9bd142ae6
[04/14] block: take io_opt and io_min into account for max_sectors
commit: a23634644afc2f7c1bac98776440a1f3b161819e
[05/14] sd: simplify the ZBC case in provisioning_mode_store
commit: b3491b0db165c0cbe25874da66d97652c03db654
[06/14] sd: add a sd_disable_discard helper
commit: b0dadb86a90bd5a7b723f9d3a6cf69da9b596496
[07/14] sd: add a sd_disable_write_same helper
commit: 9972b8ce0d4ba373901bdd1e15e4de58fcd7f662
[08/14] sd: simplify the disable case in sd_config_discard
commit: d15b9bd42cd3b2077812d4bf32f532a9bd5c4914
[09/14] sd: factor out a sd_discard_mode helper
commit: f1e8185fc12c699c3abf4f39b1ff5d7793da3a95
[10/14] sd: cleanup zoned queue limits initialization
commit: 9c1d339a1bf45f4d3a2e77bbf24b0ec51f02551c
[11/14] sd: convert to the atomic queue limits API
commit: 804e498e0496d889090f32f812b5ce1a4f2aa63e
[12/14] sr: convert to the atomic queue limits API
commit: 969f17e10f5b732c05186ee0126c8a08166d0cda
[13/14] block: remove unused queue limits API
commit: 1652b0bafeaa8281ca9a805d81e13d7647bd2f44
[14/14] block: add special APIs for run-time disabling of discard and friends
commit: 73e3715ed14844067c5c598e72777641004a7f60

Best regards,
-- 
Jens Axboe

Re: Design session notes: GPU acceleration in Xen

2024-06-14 Thread Demi Marie Obenour

On Fri, Jun 14, 2024 at 10:12:40AM +0200, Jan Beulich wrote:
> On 14.06.2024 09:21, Roger Pau Monné wrote:
> > On Fri, Jun 14, 2024 at 08:38:51AM +0200, Jan Beulich wrote:
> >> On 13.06.2024 20:43, Demi Marie Obenour wrote:
> >>> GPU acceleration requires that pageable host memory be able to be mapped
> >>> into a guest.
> >>
> >> I'm sure it was explained in the session, which sadly I couldn't attend.
> >> I've been asking Ray and Xenia the same before, but I'm afraid it still
> >> hasn't become clear to me why this is a _requirement_. After all that's
> >> against what we're doing elsewhere (i.e. so far it has always been
> >> guest memory that's mapped in the host). I can appreciate that it might
> >> be more difficult to implement, but avoiding to violate this fundamental
> >> (kind of) rule might be worth the price (and would avoid other
> >> complexities, of which there may be lurking more than what you enumerate
> >> below).
> > 
> > My limited understanding (please someone correct me if wrong) is that
> > the GPU buffer (or context I think it's also called?) is always
> > allocated from dom0 (the owner of the GPU).  The underling memory
> > addresses of such buffer needs to be mapped into the guest.  The
> > buffer backing memory might be GPU MMIO from the device BAR(s) or
> > system RAM, and such buffer can be paged by the dom0 kernel at any
> > time (iow: changing the backing memory from MMIO to RAM or vice
> > versa).  Also, the buffer must be contiguous in physical address
> > space.
> 
> This last one in particular would of course be a severe restriction.
> Yet: There's an IOMMU involved, isn't there?

On x86 there is.  On Arm there might or might not be.  There are
non-embedded systems (such as Apple silicon) where the GPU is not behind
an IOMMU, for performance reasons IIUC.

> > I'm not sure it's possible to ensure that when using system RAM such
> > memory comes from the guest rather than the host, as it would likely
> > require some very intrusive hooks into the kernel logic, and
> > negotiation with the guest to allocate the requested amount of
> > memory and hand it over to dom0.  If the maximum size of the buffer is
> > known in advance maybe dom0 can negotiate with the guest to allocate
> > such a region and grant it access to dom0 at driver attachment time.
> 
> Besides the thought of transiently converting RAM to kind-of-MMIO, this
> makes me think of another possible option: Could Dom0 transfer ownership
> of the RAM that wants mapping in the guest (remotely resembling
> grant-transfer)? Would require the guest to have ballooned down enough
> first, of course. (In both cases it would certainly need working out how
> the conversion / transfer back could be made work safely and reasonably
> cleanly.)
> 
> Jan

The kernel driver needs to be able to reclaim the memory at any time.
My understanding is that this is used to migrate memory between VRAM and
system RAM.  It might also be used for other purposes.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature

for_each_set_bit() clean-up (API RFC)

More fallout from looking at the code generation...

for_each_set_bit() forces it's bitmap parameter out into memory.  For an
arbitrary sized bitmap, this is fine - and likely preferable as it's an
in-memory to begin with.

However, more than half the current users of for_each_set_bit() are
operating over an single int/long, and this too is spilled to the
stack.  Worse, x86 seems to be the only architecture which (tries, but
not very well) to optimise find_{first,next}_bit() for GPR-sized
quantities, meaning that for_each_set_bit() hides 2 backing function calls.

The ARM (v)GIC code in particular suffers horribly because of this.

We also have several interesting opencoded forms:
* evtchn_check_pollers() is a (preprocessor identical) opencoding.
* hvm_emulate_writeback() is equivalent.
* for_each_vp() exists just to hardcode a constant and swap the other
two parameters.

and several others forms which I think could be expressed more cleanly
as for_each_set_bit().

We also have the while()/ffs() forms which are "just" for_each_set_bit()
and some even manage to not spill their main variable to memory.


I want to get to a position where there is one clear API to use, and
that the compiler will handle nicely.  Xen's code generation will
definitely improve as a consequence.


Sadly, transforming the ideal while()/ffs() form into a for() loop is a
bit tricky.  This works:

for ( unsigned int v = (val), (bit);
      v;
      v &= v - 1 )
if ( 1 )
{
    (bit) = ffs(v) - 1;
    goto body;
}
else
    body:

which is a C metaprogramming trick borrowed from PuTTY to make:

for_each_BLAH ( bit, val )
{
    // nice loop body
}

work, while having the ffs() calculated logically within the loop body.

The first issue I expect people to have with the above is the raw 'body'
label, although with a macro that can be fixed using body_ ## __COUNTER__.

A full example is https://godbolt.org/z/oMGfah696 although a real
example in Xen is going to have to be variadic for at least ffs() and
ffsl().


Now, from an API point of view, it would be lovely if we could make a
single for_each_set_bit() which covers both cases, and while I can
distinguish the two forms by whether there are 2 or 3 args, I expect
MISRA is going to have a fit at that.  Also there's a difference based
on the scope of 'bit' and also whether modifications to 'val' in the
loop body take effect on the loop condition (they don't because a copy
is taken).

So I expect everyone is going to want a new API to use here.  But what
to call it?

More than half of the callers in Xen really want the GPR form, so we
could introduce a new bitmap_for_each_set_bit(), move all the callers
over, then introduce a "new" for_each_set_bit() which is only of the GPR
form.

Or does anyone want to suggest an alternative name?

Thoughts?

~Andrew

Re: Design session notes: GPU acceleration in Xen

2024-06-14 Thread Demi Marie Obenour

On Fri, Jun 14, 2024 at 09:21:56AM +0200, Roger Pau Monné wrote:
> On Fri, Jun 14, 2024 at 08:38:51AM +0200, Jan Beulich wrote:
> > On 13.06.2024 20:43, Demi Marie Obenour wrote:
> > > GPU acceleration requires that pageable host memory be able to be mapped
> > > into a guest.
> > 
> > I'm sure it was explained in the session, which sadly I couldn't attend.
> > I've been asking Ray and Xenia the same before, but I'm afraid it still
> > hasn't become clear to me why this is a _requirement_. After all that's
> > against what we're doing elsewhere (i.e. so far it has always been
> > guest memory that's mapped in the host). I can appreciate that it might
> > be more difficult to implement, but avoiding to violate this fundamental
> > (kind of) rule might be worth the price (and would avoid other
> > complexities, of which there may be lurking more than what you enumerate
> > below).
> 
> My limited understanding (please someone correct me if wrong) is that
> the GPU buffer (or context I think it's also called?) is always
> allocated from dom0 (the owner of the GPU).

A GPU context is a GPU address space.  It's the GPU equivalent of a CPU
process.  I don't believe that the same context can be used by more than
one userspace process (though I could be wrong), but the same userspace
process can create and use as many contexts as it wants.

> The underling memory
> addresses of such buffer needs to be mapped into the guest.  The
> buffer backing memory might be GPU MMIO from the device BAR(s) or
> system RAM, and such buffer can be paged by the dom0 kernel at any
> time (iow: changing the backing memory from MMIO to RAM or vice
> versa).  Also, the buffer must be contiguous in physical address
> space.
> 
> I'm not sure it's possible to ensure that when using system RAM such
> memory comes from the guest rather than the host, as it would likely
> require some very intrusive hooks into the kernel logic, and
> negotiation with the guest to allocate the requested amount of
> memory and hand it over to dom0.  If the maximum size of the buffer is
> known in advance maybe dom0 can negotiate with the guest to allocate
> such a region and grant it access to dom0 at driver attachment time.

I don't think there is a useful maximum size known.  There may be a
limit, but it would be around 4GiB or more, which is far too high to
reserve physical memory for up front.

> One aspect that I'm lacking clarity is better understanding of how the
> process of allocating and assigning a GPU buffer to a guest is
> performed (I think this is the key to how GPU VirtIO native contexts
> work?).

The buffer is allocated by the GPU driver in response to an ioctl() made
by the userspace server process.  If the buffer needs to be accessed by
the guest CPU (not all do), it is mapped into part of an emulated PCI
BAR for access by the guest.  This mailing list thread is about making
that possible.

> Another question I have, are guest expected to have a single GPU
> buffer, or they can have multiple GPU buffers simultaneously
> allocated?

I believe there is only one emulated BAR, but this is very large (GiBs)
and sparsely populated.  There can be many GPU buffers mapped into the
BAR.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature

Re: [PATCH] MAINTAINERS: add me as scheduer maintainer

2024-06-14 Thread Dario Faggioli

On Mon, 2024-06-10 at 13:35 +0200, Jan Beulich wrote:
> George, Dario,
> 
> On 06.06.2024 07:47, Juergen Gross wrote:
> > I've been active in the scheduling code since many years now. Add
> > me as a maintainer.
> > 
> > Signed-off-by: Juergen Gross 
> > ---
> >  MAINTAINERS | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 6ba7d2765f..cc40c0be9d 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -490,6 +490,7 @@ F:  xen/common/sched/rt.c
> >  SCHEDULING
> >  M: George Dunlap 
> >  M: Dario Faggioli 
> > +M: Juergen Gross 
> >  S: Supported
> >  F: xen/common/sched/
> 
> no matter what get-maintainers.pl may say for changes here, I think
> it's
> largely on the two of you to approve this addition.
> 
Well, I for sure could not approve more, and I'm supper happy to
provide my:

Acked-by: Dario Faggioli 

And thanks a lot again Juergen for offering help! 

Regards,
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)


signature.asc
Description: This is a digitally signed message part

Re: [PATCH 3/7] x86/boot: Collect the Raw CPU Policy earlier on boot

On 23/05/2024 4:44 pm, Jan Beulich wrote:
> On 23.05.2024 13:16, Andrew Cooper wrote:
>> This is a tangle, but it's a small step in the right direction.
>>
>> xstate_init() is shortly going to want data from the Raw policy.
>> calculate_raw_cpu_policy() is sufficiently separate from the other policies 
>> to
>> be safe to do.
>>
>> No functional change.
>>
>> Signed-off-by: Andrew Cooper 
> Would you mind taking a look at
> https://lists.xen.org/archives/html/xen-devel/2021-04/msg01335.html
> to make clear (to me at least) in how far we can perhaps find common grounds
> on what wants doing when? (Of course the local version I have has been
> constantly re-based, so some of the function names would have changed from
> what's visible there.)

That's been covered several times, at least in part.

I want to eventually move the host policy too, but I'm not willing to
compound the mess we've currently got just to do it earlier.  It's just
creating even more obstacles to doing it nicely.

Nothing in this series needs (or indeed should) use the host policy.

The same is true of your AMX series.  You're (correctly) breaking the
uniform allocation size and (when policy selection is ordered WRT vCPU
creation, as discussed) it becomes solely depend on the guest policy.

xsave.c really has no legitimate use for the host policy once the
uniform allocation size aspect has gone away.

>> --- a/xen/arch/x86/cpu-policy.c
>> +++ b/xen/arch/x86/cpu-policy.c
>> @@ -845,7 +845,6 @@ static void __init calculate_hvm_def_policy(void)
>>  
>>  void __init init_guest_cpu_policies(void)
>>  {
>> -calculate_raw_cpu_policy();
>>  calculate_host_policy();
>>  
>>  if ( IS_ENABLED(CONFIG_PV) )
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -1888,7 +1888,9 @@ void asmlinkage __init noreturn __start_xen(unsigned 
>> long mbi_p)
>>  
>>  tsx_init(); /* Needs microcode.  May change HLE/RTM feature bits. */
>>  
>> -identify_cpu(&boot_cpu_data);
>> +calculate_raw_cpu_policy(); /* Needs microcode.  No other dependenices. 
>> */
>> +
>> +identify_cpu(&boot_cpu_data); /* Needs microcode and raw policy. */
> You don't introduce any dependency on raw policy here, and there cannot 
> possibly
> have been such a dependency before (unless there was a bug somewhere). 
> Therefore
> I consider this latter comment misleading at this point.

It's made true by the next patch, and I'm not included to split the
comment across two patches which are going to be committed together in a
unit.

~Andrew

[linux-linus test] 186345: regressions - FAIL

flight 186345 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186345/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf   6 xen-build  fail in 186342 REGR. vs. 186314

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass 
in 186342

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-xl-raw   1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-xl-qcow2 1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-examine  1 build-check(1)   blocked in 186342 n/a
 build-armhf-libvirt   1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-libvirt-vhd  1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)  blocked in 186342 n/a
 test-armhf-armhf-xl-rtds  8 xen-boot fail  like 186314
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 186314
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 186314
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 186314
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 186314
 test-armhf-armhf-xl-arndale   8 xen-boot fail  like 186314
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 186314
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186314
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-qcow214 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow215 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-raw  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd

Re: Design session notes: GPU acceleration in Xen

2024-06-14 Thread Demi Marie Obenour

On Fri, Jun 14, 2024 at 08:38:51AM +0200, Jan Beulich wrote:
> On 13.06.2024 20:43, Demi Marie Obenour wrote:
> > GPU acceleration requires that pageable host memory be able to be mapped
> > into a guest.
> 
> I'm sure it was explained in the session, which sadly I couldn't attend.
> I've been asking Ray and Xenia the same before, but I'm afraid it still
> hasn't become clear to me why this is a _requirement_. After all that's
> against what we're doing elsewhere (i.e. so far it has always been
> guest memory that's mapped in the host). I can appreciate that it might
> be more difficult to implement, but avoiding to violate this fundamental
> (kind of) rule might be worth the price (and would avoid other
> complexities, of which there may be lurking more than what you enumerate
> below).

The GPU driver knows how to allocate buffers that are usable by the GPU.
On a discrete GPU, these buffers will generally be in VRAM, rather than
in system RAM, because access to system RAM requires going through the
PCI bus (slow).  However, VRAM is a limited resource, so the driver will
migrate pages between VRAM and system RAM as needed.  During the
migration, a guest that tries to access the pages must block until the
migration is complete.

Some GPU drivers support accessing externally provided memory.  This is
called "userptr", and is supported by i915 and amdgpu.  However, it
appears that some other drivers (such as MSM) do not support it, and
since GPUs with VRAM need to be supported anyway, Xen still needs to
support GPU driver-allocated memory.

I also CCd dri-de...@lists.freedesktop.org and the general GPU driver
maintainers in Linux in case they can give a better answer, as well as
Rob Clark who invented native contexts.

> >  This requires changes to all of the Xen hypervisor, Linux
> > kernel, and userspace device model.
> > 
> > ### Goals
> > 
> >  - Allow any userspace pages to be mapped into a guest.
> >  - Support deprivileged operation: this API must not be usable for 
> > privilege escalation.
> >  - Use MMU notifiers to ensure safety with respect to use-after-free.
> > 
> > ### Hypervisor changes
> > 
> > There are at least two Xen changes required:
> > 
> > 1. Add a new flag to IOREQ that means "retry this instruction".
> > 
> >An IOREQ server can set this flag after having successfully handled a
> >page fault.  It is expected that the IOREQ server has successfully
> >mapped a page into the guest at the location of the fault.
> >Otherwise, the same fault will likely happen again.
> 
> Were there any thoughts on how to prevent this becoming an infinite loop?
> I.e. how to (a) guarantee forward progress in the guest and (b) deal with
> misbehaving IOREQ servers?

Guaranteeing forward progress is up to the IOREQ server.  If the IOREQ
server misbehaves, an infinite loop is possible, but the CPU time used
by it should be charged to the IOREQ server, so this isn't a
vulnerability.

> > 2. Add support for `XEN_DOMCTL_memory_mapping` to use system RAM, not
> >just IOMEM.  Mappings made with `XEN_DOMCTL_memory_mapping` are
> >guaranteed to be able to be successfully revoked with
> >`XEN_DOMCTL_memory_mapping`, so all operations that would create
> >extra references to the mapped memory must be forbidden.  These
> >include, but may not be limited to:
> > 
> >1. Granting the pages to the same or other domains.
> >2. Mapping into another domain using `XEN_DOMCTL_memory_mapping`.
> >3. Another domain accessing the pages using the foreign memory APIs,
> >   unless it is privileged over the domain that owns the pages.
> 
> All of which may call for actually converting the memory to kind-of-MMIO,
> with a means to later convert it back.

Would this support the case where the mapping domain is not fully
priviliged, and where it might be a PV guest?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature

[xen-unstable test] 186353: tolerable FAIL - PUSHED

flight 186353 xen-unstable real [real]
flight 186355 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/186353/
http://logs.test-lab.xenproject.org/osstest/logs/186355/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-arndale  10 host-ping-check-xen fail pass in 186355-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail in 186355 never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail in 186355 never 
pass
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 186344
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 186344
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 186344
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 186344
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 186344
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186344
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-qcow214 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow215 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-raw  15 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  8b4243a9b560c89bb259db5a27832c253d4bebc7
baseline version:
 xen  4fdd8d75566fdad06667a79ec0ce6f43cc466c54

Last test of basis   186344  2024-06-14 05:48:54 Z0 days
Testing same since   186353  2024-06-14 15:10:38 Z0 days1 attempts


People who touched revisions under test:
  Luca Fancellu 
  Penny Zheng 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64

[linux-linus test] 186354: tolerable FAIL - PUSHED

flight 186354 linux-linus real [real]
flight 186357 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/186354/
http://logs.test-lab.xenproject.org/osstest/logs/186357/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-qemuu-freebsd12-amd64 21 guest-start/freebsd.repeat fail pass 
in 186357-retest
 test-armhf-armhf-examine  8 reboot  fail pass in 186357-retest
 test-armhf-armhf-xl-multivcpu  8 xen-boot   fail pass in 186357-retest
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass 
in 186357-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail in 186357 never 
pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail in 186357 
never pass
 test-armhf-armhf-xl-raw   8 xen-boot fail  like 186314
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 186314
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 186314
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 186314
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 186314
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 186314
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186314
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-qcow214 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow215 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass

version targeted for testing:
 linux0cac73eb3875f6ecb6105e533218dba1868d04c9
baseline version:
 linux2ef5971ff345d3c000873725db555085e0131961

Last test of basis   186314  2024-06-12 00:10:33 Z3 days
Failing since186324  2024-06-12 17:12:12 Z2 days5 attempts
Testing same since   186354  2024-06-14 18:42:32 Z0 days1 attempts


People who touched revisions under test:
  "Csókás, Bence" 
  Aleksandr Mishin 
  Andrei Vagin 
  Andy Shevchenko 
  Ard Biesheuvel 
  Borislav Petkov (A

[ovmf test] 186358: all pass - PUSHED