Re: [PATCH v3 0/4] x86/spec-ctrl: IPBP improvements

2023-01-25 Thread Jan Beulich
On 25.01.2023 18:49, Andrew Cooper wrote:
> On 25/01/2023 3:24 pm, Jan Beulich wrote:
>> Versions of the two final patches were submitted standalone earlier
>> on. The series here tries to carry out a suggestion from Andrew,
>> which the two of us have been discussing. Then said previously posted
>> patches are re-based on top, utilizing the new functionality.
>>
>> 1: spec-ctrl: add logic to issue IBPB on exit to guest
>> 2: spec-ctrl: defer context-switch IBPB until guest entry
>> 3: limit issuing of IBPB during context switch
>> 4: PV: issue branch prediction barrier when switching 64-bit guest to kernel mode
> 
> In the subject, you mean IBPB.  I think all the individual patches are fine.

Yes, I did notice the typo immediately after sending.

> Do you have an implementation of VMASST_TYPE_mode_switch_no_ibpb for
> Linux yet?  The thing I'd like to avoid is that we commit this perf hit
> to Xen, without lining Linux up to be able to skip it.

No, I don't. I haven't even looked at where invoking this might be best placed.
Also I have to admit that it's not really clear to me what the criteria are
going to be for Linux to disable this, and whether perhaps finer grained
control might be needed (i.e. to turn it on/off dynamically under certain
conditions).

In any event this concern is only related to patch 4; I'd appreciate if at
least the earlier three patches wouldn't be blocked on there being something
on the Linux side. (In fact patch 3 ends up [still] being entirely independent
of the rest of the rework, unlike I think you were expecting it to be.)

Jan



Re: [PATCH v4 04/11] xen: extend domctl interface for cache coloring

2023-01-25 Thread Jan Beulich
On 24.01.2023 17:29, Jan Beulich wrote:
> On 23.01.2023 16:47, Carlo Nonato wrote:
>> @@ -92,6 +92,10 @@ struct xen_domctl_createdomain {
>>  /* CPU pool to use; specify 0 or a specific existing pool */
>>  uint32_t cpupool_id;
>>  
>> +/* IN LLC coloring parameters */
>> +uint32_t num_llc_colors;
>> +XEN_GUEST_HANDLE(uint32) llc_colors;
> 
> Despite your earlier replies I continue to be unconvinced that this
> is information which needs to be available right at domain_create.
> Without that you'd also get away without the sufficiently odd
> domain_create_llc_colored(). (Odd because: Think of two or three
> more extended features appearing, all of which want a special cased
> domain_create().)

And perhaps the real question is: Why do the two items need passing
to a special variant of domain_create() in the first place? The
necessary information already is passed to the normal function via
struct xen_domctl_createdomain. All it would take is to read the
array from guest space later, when struct domain was already
allocated and is hence available for storing the pointer. (Passing
the count separately is redundant in any event.)

Jan




Re: [QEMU][PATCH v4 07/10] hw/xen/xen-hvm-common: Use g_new and error_setg_errno

2023-01-25 Thread Frediano Ziglio
On Wed, 25 Jan 2023 at 22:07, Stefano Stabellini wrote:
>
> On Wed, 25 Jan 2023, Vikram Garhwal wrote:
> > Replace g_malloc with g_new and perror with error_setg_errno.
> >

error_setg_errno -> error_report ?

Also in the title

> > Signed-off-by: Vikram Garhwal 

Frediano



[PATCH v5 4/4] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

2023-01-25 Thread Alexander Bulekov
This protects devices from bh->mmio reentrancy issues.
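
As a rough illustration of the problem class (a toy model, not QEMU's
implementation): a bottom half that touches its own device's MMIO region can
re-enter the device model while the device is still mid-operation. A guard
flag that stays set while the bottom half runs lets the MMIO path detect and
refuse the nested access. All names below (ToyGuard, ToyBH, ToyDev,
toy_bh_run, toy_mmio_write) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-device reentrancy guard. */
typedef struct ToyGuard {
    bool engaged_in_io;
} ToyGuard;

typedef void (*toy_bh_fn)(void *opaque);

/* Hypothetical bottom half: a callback plus the guard of its owning device. */
typedef struct ToyBH {
    toy_bh_fn cb;
    void *opaque;
    ToyGuard *guard;
} ToyBH;

/* MMIO entry point: refuse the access while the device is already busy. */
static bool toy_mmio_write(ToyGuard *guard, unsigned *reg, unsigned val)
{
    if (guard->engaged_in_io) {
        return false;               /* nested access rejected */
    }
    *reg = val;
    return true;
}

/* Run the callback with the guard engaged, then restore the previous state. */
static void toy_bh_run(ToyBH *bh)
{
    bool prev = false;

    assert(bh->cb);
    if (bh->guard) {
        prev = bh->guard->engaged_in_io;
        bh->guard->engaged_in_io = true;
    }
    bh->cb(bh->opaque);
    if (bh->guard) {
        bh->guard->engaged_in_io = prev;
    }
}

/* Toy device whose bottom half tries to write its own register via MMIO. */
typedef struct ToyDev {
    ToyGuard guard;
    unsigned reg;
    bool nested_ok;
} ToyDev;

static void toy_dev_bh(void *opaque)
{
    ToyDev *d = opaque;
    d->nested_ok = toy_mmio_write(&d->guard, &d->reg, 42);
}
```

In this model, running toy_dev_bh through toy_bh_run leaves reg untouched,
while the same write issued outside the bottom half succeeds.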

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Alexander Bulekov 
---
 hw/9pfs/xen-9p-backend.c| 4 +++-
 hw/block/dataplane/virtio-blk.c | 3 ++-
 hw/block/dataplane/xen-block.c  | 5 +++--
 hw/block/virtio-blk.c   | 5 +++--
 hw/char/virtio-serial-bus.c | 3 ++-
 hw/display/qxl.c| 9 ++---
 hw/display/virtio-gpu.c | 6 --
 hw/ide/ahci.c   | 3 ++-
 hw/ide/core.c   | 3 ++-
 hw/misc/imx_rngc.c  | 6 --
 hw/misc/macio/mac_dbdma.c   | 2 +-
 hw/net/virtio-net.c | 3 ++-
 hw/nvme/ctrl.c  | 6 --
 hw/scsi/mptsas.c| 3 ++-
 hw/scsi/scsi-bus.c  | 3 ++-
 hw/scsi/vmw_pvscsi.c| 3 ++-
 hw/usb/dev-uas.c| 3 ++-
 hw/usb/hcd-dwc2.c   | 3 ++-
 hw/usb/hcd-ehci.c   | 3 ++-
 hw/usb/hcd-uhci.c   | 2 +-
 hw/usb/host-libusb.c| 6 --
 hw/usb/redirect.c   | 6 --
 hw/usb/xen-usb.c| 3 ++-
 hw/virtio/virtio-balloon.c  | 5 +++--
 hw/virtio/virtio-crypto.c   | 3 ++-
 25 files changed, 66 insertions(+), 35 deletions(-)

diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
index 65c4979c3c..f077c1b255 100644
--- a/hw/9pfs/xen-9p-backend.c
+++ b/hw/9pfs/xen-9p-backend.c
@@ -441,7 +441,9 @@ static int xen_9pfs_connect(struct XenLegacyDevice *xendev)
 xen_9pdev->rings[i].ring.out = xen_9pdev->rings[i].data +
XEN_FLEX_RING_SIZE(ring_order);
 
-xen_9pdev->rings[i].bh = qemu_bh_new(xen_9pfs_bh, &xen_9pdev->rings[i]);
+xen_9pdev->rings[i].bh = qemu_bh_new_guarded(xen_9pfs_bh,
+ &xen_9pdev->rings[i],
+ (xen_9pdev)->mem_reentrancy_guard);
 xen_9pdev->rings[i].out_cons = 0;
 xen_9pdev->rings[i].out_size = 0;
 xen_9pdev->rings[i].inprogress = false;
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 26f965cabc..191a8c90aa 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -127,7 +127,8 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
 } else {
 s->ctx = qemu_get_aio_context();
 }
-s->bh = aio_bh_new(s->ctx, notify_guest_bh, s);
+s->bh = aio_bh_new_guarded(s->ctx, notify_guest_bh, s,
+   (s)->mem_reentrancy_guard);
 s->batch_notify_vqs = bitmap_new(conf->num_queues);
 
 *dataplane = s;
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index 2785b9e849..e31806b317 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -632,8 +632,9 @@ XenBlockDataPlane *xen_block_dataplane_create(XenDevice *xendev,
 } else {
 dataplane->ctx = qemu_get_aio_context();
 }
-dataplane->bh = aio_bh_new(dataplane->ctx, xen_block_dataplane_bh,
-   dataplane);
+dataplane->bh = aio_bh_new_guarded(dataplane->ctx, xen_block_dataplane_bh,
+   dataplane,
+   (xendev)->mem_reentrancy_guard);
 
 return dataplane;
 }
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f717550fdc..e9f516e633 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -866,8 +866,9 @@ static void virtio_blk_dma_restart_cb(void *opaque, bool running,
  * requests will be processed while starting the data plane.
  */
 if (!s->bh && !virtio_bus_ioeventfd_enabled(bus)) {
-s->bh = aio_bh_new(blk_get_aio_context(s->conf.conf.blk),
-   virtio_blk_dma_restart_bh, s);
+s->bh = aio_bh_new_guarded(blk_get_aio_context(s->conf.conf.blk),
+   virtio_blk_dma_restart_bh, s,
+   (s)->mem_reentrancy_guard);
 blk_inc_in_flight(s->conf.conf.blk);
 qemu_bh_schedule(s->bh);
 }
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 7d4601cb5d..dd619f0731 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -985,7 +985,8 @@ static void virtser_port_device_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-port->bh = qemu_bh_new(flush_queued_data_bh, port);
+port->bh = qemu_bh_new_guarded(flush_queued_data_bh, port,
+   &dev->mem_reentrancy_guard);
 port->elem = NULL;
 }
 
diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index 6772849dec..67efa3c3ef 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -2223,11 +2223,14 @@ static void qxl_realize_common(PCIQXLDevice *qxl, Error **errp)
 
 qemu_add_vm_change_state_handler(qxl_vm_change_state_handler, qxl);
 
-qxl->update_irq = qemu_bh_new(qxl_update_irq_bh, 

[xen-unstable test] 176132: regressions - FAIL

2023-01-25 Thread osstest service owner
flight 176132 xen-unstable real [real]
flight 176138 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/176132/
http://logs.test-lab.xenproject.org/osstest/logs/176138/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-coresched-i386-xl 18 guest-localmigrate              fail REGR. vs. 175994
 test-amd64-i386-xl-xsm       18 guest-localmigrate              fail REGR. vs. 175994
 test-amd64-i386-xl           18 guest-localmigrate              fail REGR. vs. 175994
 test-amd64-i386-pair         26 guest-migrate/src_host/dst_host fail REGR. vs. 175994
 test-amd64-i386-xl-vhd       17 guest-localmigrate              fail REGR. vs. 175994
 test-amd64-i386-xl-shadow    18 guest-localmigrate              fail REGR. vs. 175994
 test-amd64-i386-libvirt-pair 26 guest-migrate/src_host/dst_host fail REGR. vs. 175994

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 7 xen-install fail pass in 176138-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt  16 saverestore-support-check  fail like 175987
 test-amd64-i386-xl-qemuu-ws16-amd64  19 guest-stop  fail like 175987
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop  fail like 175987
 test-amd64-amd64-xl-qemut-win7-amd64  19 guest-stop  fail like 175994
 test-amd64-i386-xl-qemuu-win7-amd64  19 guest-stop  fail like 175994
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop  fail like 175994
 test-amd64-amd64-qemuu-nested-amd  20 debian-hvm-install/l1/l2  fail like 175994
 test-amd64-i386-xl-qemut-win7-amd64  19 guest-stop  fail like 175994
 test-amd64-i386-xl-qemut-ws16-amd64  19 guest-stop  fail like 175994
 test-armhf-armhf-libvirt-qcow2  15 saverestore-support-check  fail like 175994
 test-armhf-armhf-libvirt-raw  15 saverestore-support-check  fail like 175994
 test-amd64-amd64-xl-qemut-ws16-amd64  19 guest-stop  fail like 175994
 test-arm64-arm64-xl-seattle  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt  15 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-xsm  15 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check  fail never pass
 test-amd64-i386-libvirt  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-thunderx  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-thunderx  16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl  16 saverestore-support-check  fail never pass
 test-arm64-arm64-libvirt-xsm  15 migrate-support-check  fail never pass
 test-arm64-arm64-libvirt-xsm  16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd  14 migrate-support-check  fail never pass
 test-arm64-arm64-libvirt-raw  14 migrate-support-check  fail never pass
 test-arm64-arm64-libvirt-raw  15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl  16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check  fail never pass
 test-armhf-armhf-xl-cubietruck  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-cubietruck  16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-check  fail never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check

[PATCH] tools/python: change 's#' size type for Python >= 3.10

2023-01-25 Thread Marek Marczykowski-Górecki
Python < 3.10 by default uses the 'int' type for the length of data+size
string formats (s#), unless PY_SSIZE_T_CLEAN is defined, in which case it
uses Py_ssize_t. The former behavior was removed in Python 3.10: it is now
required to define PY_SSIZE_T_CLEAN before including Python.h and to use
Py_ssize_t for the length argument. PY_SSIZE_T_CLEAN has been supported
since Python 2.5.

Adjust bindings accordingly.

Signed-off-by: Marek Marczykowski-Górecki 
---
 tools/python/xen/lowlevel/xc/xc.c | 3 ++-
 tools/python/xen/lowlevel/xs/xs.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index fd008610329b..cfb2734a992b 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -4,6 +4,7 @@
  * Copyright (c) 2003-2004, K A Fraser (University of Cambridge)
  */
 
+#define PY_SSIZE_T_CLEAN
 #include <Python.h>
 #define XC_WANT_COMPAT_MAP_FOREIGN_API
 #include 
@@ -1774,7 +1775,7 @@ static PyObject *pyflask_load(PyObject *self, PyObject *args, PyObject *kwds)
 {
 xc_interface *xc_handle;
 char *policy;
-uint32_t len;
+Py_ssize_t len;
 int ret;
 
 static char *kwd_list[] = { "policy", NULL };
diff --git a/tools/python/xen/lowlevel/xs/xs.c b/tools/python/xen/lowlevel/xs/xs.c
index 0dad7fa5f2fc..3ba5a8b893d9 100644
--- a/tools/python/xen/lowlevel/xs/xs.c
+++ b/tools/python/xen/lowlevel/xs/xs.c
@@ -18,6 +18,7 @@
  * Copyright (C) 2005 XenSource Ltd.
  */
 
+#define PY_SSIZE_T_CLEAN
 #include <Python.h>
 
 #include 
@@ -141,7 +142,7 @@ static PyObject *xspy_write(XsHandle *self, PyObject *args)
 char *thstr;
 char *path;
 char *data;
-int data_n;
+Py_ssize_t data_n;
 bool result;
 
 if (!xh)
-- 
2.37.3




Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Matthew Wilcox
On Wed, Jan 25, 2023 at 08:49:50AM -0800, Suren Baghdasaryan wrote:
> On Wed, Jan 25, 2023 at 1:10 AM Peter Zijlstra  wrote:
> > > + /*
> > > +  * Flags, see mm.h.
> > > +  * WARNING! Do not modify directly.
> > > +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> > > +  */
> > > + unsigned long vm_flags;
> >
> > We have __private and ACCESS_PRIVATE() to help with enforcing this.
> 
> Thanks for pointing this out, Peter! I guess for that I'll need to
> convert all read accesses and provide get_vm_flags() too? That will
> cause some additional churn (a quick search shows 801 hits over 248
> files) but maybe it's worth it? I think Michal suggested that too in
> another patch. Should I do that while we are at it?

Here's a trick I saw somewhere in the VFS:

union {
const vm_flags_t vm_flags;
vm_flags_t __private __vm_flags;
};

Now it can be read by anybody but written only by those using
ACCESS_PRIVATE.
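
A minimal userspace sketch of that union, for illustration only. In the
kernel, __private and ACCESS_PRIVATE() come from the compiler headers and are
enforced by sparse; here they are reduced to plain stand-ins so the example
compiles on its own. struct toy_vma and the toy_* helpers are hypothetical
names modelled on the modifier functions discussed in this thread.

```c
#include <assert.h>

typedef unsigned long vm_flags_t;

/* Simplified stand-ins for the kernel annotations (assumption: no sparse). */
#define __private
#define ACCESS_PRIVATE(p, member) ((p)->member)

struct toy_vma {
    union {
        const vm_flags_t vm_flags;        /* readable by everyone */
        vm_flags_t __private __vm_flags;  /* written only via ACCESS_PRIVATE */
    };
};

/* Hypothetical modifier helpers: the only code that writes the flags. */
static inline void toy_set_vm_flags(struct toy_vma *vma, vm_flags_t flags)
{
    ACCESS_PRIVATE(vma, __vm_flags) |= flags;
}

static inline void toy_clear_vm_flags(struct toy_vma *vma, vm_flags_t flags)
{
    ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
}
```

With this layout a plain read such as "vma->vm_flags & flag" keeps working
unchanged, whereas a direct "vma->vm_flags |= flag" is rejected by the
compiler because the member is const.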



Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 10:33 AM Matthew Wilcox  wrote:
>
> On Wed, Jan 25, 2023 at 12:38:46AM -0800, Suren Baghdasaryan wrote:
> > +/* Use when VMA is not part of the VMA tree and needs no locking */
> > +static inline void init_vm_flags(struct vm_area_struct *vma,
> > +  unsigned long flags)
> > +{
> > + vma->vm_flags = flags;
>
> vm_flags are supposed to have type vm_flags_t.  That's not been
> fully realised yet, but perhaps we could avoid making it worse?
>
> >   pgprot_t vm_page_prot;
> > - unsigned long vm_flags; /* Flags, see mm.h. */
> > +
> > + /*
> > +  * Flags, see mm.h.
> > +  * WARNING! Do not modify directly.
> > +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> > +  */
> > + unsigned long vm_flags;
>
> Including changing this line to vm_flags_t

Good point. Will make the change. Thanks!



Re: [PATCH v2 5/6] mm: introduce mod_vm_flags_nolock and use it in untrack_pfn

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 1:42 AM Michal Hocko  wrote:
>
> On Wed 25-01-23 00:38:50, Suren Baghdasaryan wrote:
> > When VMA flags are modified after the VMA was isolated and mmap_lock
> > was downgraded, the modification would trigger an assertion because
> > the mmap write lock is not held.
> > Introduce mod_vm_flags_nolock to be used in such situations.
> > Pass a hint to untrack_pfn to conditionally use mod_vm_flags_nolock for
> > flags modification and to avoid the assertion.
>
> Neither the changelog nor the documentation of mod_vm_flags_nolock
> really explains when it is safe to use it. This is really important for
> future potential users.

True. I'll add clarification in the comments and in the changelog. Thanks!
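
For reference, the relationship between the two variants can be modelled in
userspace roughly as follows. This is only a sketch: the boolean in struct
toy_mm stands in for the mmap_lock state, assert() stands in for
mmap_assert_write_locked(), and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_mm {
    bool write_locked;              /* stand-in for the mmap_lock state */
};

struct toy_vma {
    struct toy_mm *vm_mm;
    unsigned long vm_flags;
};

/*
 * No lock assertion here: the caller has to guarantee exclusive access some
 * other way, e.g. because the VMA has already been isolated.
 */
static inline void toy_mod_vm_flags_nolock(struct toy_vma *vma,
                                           unsigned long set,
                                           unsigned long clear)
{
    vma->vm_flags |= set;
    vma->vm_flags &= ~clear;        /* clear is applied last, so it wins */
}

static inline void toy_mod_vm_flags(struct toy_vma *vma,
                                    unsigned long set, unsigned long clear)
{
    assert(vma->vm_mm->write_locked);   /* models mmap_assert_write_locked() */
    toy_mod_vm_flags_nolock(vma, set, clear);
}
```

The locked variant is just the unlocked one plus the assertion, which is why
the unlocked one needs its safety conditions spelled out.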

>
> > Signed-off-by: Suren Baghdasaryan 
> > ---
> >  arch/x86/mm/pat/memtype.c | 10 +++---
> >  include/linux/mm.h| 12 +---
> >  include/linux/pgtable.h   |  5 +++--
> >  mm/memory.c   | 13 +++--
> >  mm/memremap.c |  4 ++--
> >  mm/mmap.c | 16 ++--
> >  6 files changed, 38 insertions(+), 22 deletions(-)
> >
> > diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> > index ae9645c900fa..d8adc0b42cf2 100644
> > --- a/arch/x86/mm/pat/memtype.c
> > +++ b/arch/x86/mm/pat/memtype.c
> > @@ -1046,7 +1046,7 @@ void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot, pfn_t pfn)
> >   * can be for the entire vma (in which case pfn, size are zero).
> >   */
> >  void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
> > -  unsigned long size)
> > +  unsigned long size, bool mm_wr_locked)
> >  {
> >   resource_size_t paddr;
> >   unsigned long prot;
> > @@ -1065,8 +1065,12 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
> >   size = vma->vm_end - vma->vm_start;
> >   }
> >   free_pfn_range(paddr, size);
> > - if (vma)
> > - clear_vm_flags(vma, VM_PAT);
> > + if (vma) {
> > + if (mm_wr_locked)
> > + clear_vm_flags(vma, VM_PAT);
> > + else
> > + mod_vm_flags_nolock(vma, 0, VM_PAT);
> > + }
> >  }
> >
> >  /*
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 55335edd1373..48d49930c411 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -656,12 +656,18 @@ static inline void clear_vm_flags(struct vm_area_struct *vma,
> >   vma->vm_flags &= ~flags;
> >  }
> >
> > +static inline void mod_vm_flags_nolock(struct vm_area_struct *vma,
> > +unsigned long set, unsigned long clear)
> > +{
> > + vma->vm_flags |= set;
> > + vma->vm_flags &= ~clear;
> > +}
> > +
> >  static inline void mod_vm_flags(struct vm_area_struct *vma,
> >   unsigned long set, unsigned long clear)
> >  {
> >   mmap_assert_write_locked(vma->vm_mm);
> > - vma->vm_flags |= set;
> > - vma->vm_flags &= ~clear;
> > + mod_vm_flags_nolock(vma, set, clear);
> >  }
> >
> >  static inline void vma_set_anonymous(struct vm_area_struct *vma)
> > @@ -2087,7 +2093,7 @@ static inline void zap_vma_pages(struct vm_area_struct *vma)
> >  }
> >  void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
> >   struct vm_area_struct *start_vma, unsigned long start,
> > - unsigned long end);
> > + unsigned long end, bool mm_wr_locked);
> >
> >  struct mmu_notifier_range;
> >
> > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> > index 5fd45454c073..c63cd44777ec 100644
> > --- a/include/linux/pgtable.h
> > +++ b/include/linux/pgtable.h
> > @@ -1185,7 +1185,8 @@ static inline int track_pfn_copy(struct vm_area_struct *vma)
> >   * can be for the entire vma (in which case pfn, size are zero).
> >   */
> >  static inline void untrack_pfn(struct vm_area_struct *vma,
> > -unsigned long pfn, unsigned long size)
> > +unsigned long pfn, unsigned long size,
> > +bool mm_wr_locked)
> >  {
> >  }
> >
> > @@ -1203,7 +1204,7 @@ extern void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
> >pfn_t pfn);
> >  extern int track_pfn_copy(struct vm_area_struct *vma);
> >  extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
> > - unsigned long size);
> > + unsigned long size, bool mm_wr_locked);
> >  extern void untrack_pfn_moved(struct vm_area_struct *vma);
> >  #endif
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index d6902065e558..5b11b50e2c4a 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1613,7 +1613,7 @@ void unmap_page_range(struct mmu_gather *tlb,
> >  static void unmap_single_vma(struct mmu_gather *tlb,
> >   struct vm_area_struct *vma, unsigned long start_addr,
> >   unsigned long 

Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Matthew Wilcox
On Wed, Jan 25, 2023 at 12:38:46AM -0800, Suren Baghdasaryan wrote:
> +/* Use when VMA is not part of the VMA tree and needs no locking */
> +static inline void init_vm_flags(struct vm_area_struct *vma,
> +  unsigned long flags)
> +{
> + vma->vm_flags = flags;

vm_flags are supposed to have type vm_flags_t.  That's not been
fully realised yet, but perhaps we could avoid making it worse?

>   pgprot_t vm_page_prot;
> - unsigned long vm_flags; /* Flags, see mm.h. */
> +
> + /*
> +  * Flags, see mm.h.
> +  * WARNING! Do not modify directly.
> +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> +  */
> + unsigned long vm_flags;

Including changing this line to vm_flags_t



Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 1:10 AM Peter Zijlstra  wrote:
>
> On Wed, Jan 25, 2023 at 12:38:46AM -0800, Suren Baghdasaryan wrote:
>
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 2d6d790d9bed..6c7c70bf50dd 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -491,7 +491,13 @@ struct vm_area_struct {
> >* See vmf_insert_mixed_prot() for discussion.
> >*/
> >   pgprot_t vm_page_prot;
> > - unsigned long vm_flags; /* Flags, see mm.h. */
> > +
> > + /*
> > +  * Flags, see mm.h.
> > +  * WARNING! Do not modify directly.
> > +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> > +  */
> > + unsigned long vm_flags;
>
> We have __private and ACCESS_PRIVATE() to help with enforcing this.

Thanks for pointing this out, Peter! I guess for that I'll need to
convert all read accesses and provide get_vm_flags() too? That will
cause some additional churn (a quick search shows 801 hits over 248
files) but maybe it's worth it? I think Michal suggested that too in
another patch. Should I do that while we are at it?

>



Re: [PATCH v2 3/6] mm: replace vma->vm_flags direct modifications with modifier calls

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 1:30 AM 'Michal Hocko' via kernel-team wrote:
>
> On Wed 25-01-23 00:38:48, Suren Baghdasaryan wrote:
> > Replace direct modifications to vma->vm_flags with calls to modifier
> > functions to be able to track flag changes and to keep vma locking
> > correctness.
>
> Is this a manual (git grep) based work or have you used Coccinelle for
> the patch generation?

It was a manual "search and replace" and in the process I temporarily
renamed vm_flags to ensure I did not miss any usage.

>
> My potentially incomplete check
> $ git grep ">[[:space:]]*vm_flags[[:space:]]*[&|^]="
>
> shows that nothing should be left after this. There is still quite a lot
> of direct checks of the flags (more than 600). Maybe it would be good to
> make flags accessible only via accessors which would also prevent any
> future direct setting of those flags in uncontrolled way as well.

Yes, I think Peter's suggestion in the first patch would also require
that. Much more churn but probably worth it for the future
maintenance. I'll add a patch which converts all readers as well.

>
> Anyway
> Acked-by: Michal Hocko 

Thanks for all the reviews!

> --
> Michal Hocko
> SUSE Labs
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to kernel-team+unsubscr...@android.com.
>



Re: [PATCH v2 4/6] mm: replace vma->vm_flags indirect modification in ksm_madvise

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 9:08 AM Michal Hocko  wrote:
>
> On Wed 25-01-23 08:57:48, Suren Baghdasaryan wrote:
> > On Wed, Jan 25, 2023 at 1:38 AM 'Michal Hocko' via kernel-team wrote:
> > >
> > > On Wed 25-01-23 00:38:49, Suren Baghdasaryan wrote:
> > > > Replace indirect modifications to vma->vm_flags with calls to modifier
> > > > functions to be able to track flag changes and to keep vma locking
> > > > correctness. Add a BUG_ON check in ksm_madvise() to catch indirect
> > > > vm_flags modification attempts.
> > >
> > > Those BUG_ONs scream too much IMHO. KSM is MM-internal code so I
> > > guess we should be willing to trust it.
> >
> > Yes, but I really want to prevent an indirect misuse since it was not
> > easy to find these. If you feel strongly about it I will remove them
> > or if you have a better suggestion I'm all for it.
>
> You can avoid that by making flags inaccessible directly, right?

Ah, you mean Peter's suggestion of using __private? I guess that would
cover it. I'll drop these BUG_ONs in the next version. Thanks!

>
> --
> Michal Hocko
> SUSE Labs



Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 10:37 AM Matthew Wilcox  wrote:
>
> On Wed, Jan 25, 2023 at 08:49:50AM -0800, Suren Baghdasaryan wrote:
> > On Wed, Jan 25, 2023 at 1:10 AM Peter Zijlstra  wrote:
> > > > + /*
> > > > +  * Flags, see mm.h.
> > > > +  * WARNING! Do not modify directly.
> > > > +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> > > > +  */
> > > > + unsigned long vm_flags;
> > >
> > > We have __private and ACCESS_PRIVATE() to help with enforcing this.
> >
> > Thanks for pointing this out, Peter! I guess for that I'll need to
> > convert all read accesses and provide get_vm_flags() too? That will
> > cause some additional churn (a quick search shows 801 hits over 248
> > files) but maybe it's worth it? I think Michal suggested that too in
> > another patch. Should I do that while we are at it?
>
> Here's a trick I saw somewhere in the VFS:
>
> union {
> const vm_flags_t vm_flags;
> vm_flags_t __private __vm_flags;
> };
>
> Now it can be read by anybody but written only by those using
> ACCESS_PRIVATE.

Huh, this is quite nice! I think it does not save us from the cases
when vma->vm_flags is passed by a reference and modified indirectly,
like in ksm_madvise()? Though maybe such usecases are so rare (I found
only 2 cases) that we can ignore this?
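
One possible shape for such by-reference call sites, sketched under the
assumption that the helper only needs a scratch copy of the flags: the caller
passes a local variable and commits the result through a setter afterwards.
toy_reset_vm_flags() stands in for reset_vm_flags() from the series; the flag
bit and the other toy_* names are hypothetical.

```c
#include <assert.h>

typedef unsigned long vm_flags_t;

#define TOY_VM_MERGEABLE 0x1UL      /* hypothetical flag bit */

struct toy_vma {
    vm_flags_t vm_flags;
};

/* Stand-in for a reset_vm_flags()-style setter. */
static inline void toy_reset_vm_flags(struct toy_vma *vma, vm_flags_t flags)
{
    vma->vm_flags = flags;
}

/* Helper that edits flags through a pointer, in the style of ksm_madvise(). */
static int toy_madvise_helper(int enable, vm_flags_t *vm_flags)
{
    assert(vm_flags);
    if (enable)
        *vm_flags |= TOY_VM_MERGEABLE;
    else
        *vm_flags &= ~TOY_VM_MERGEABLE;
    return 0;
}

/* Caller: hand over a local copy, then commit it via the setter on success. */
static int toy_apply_advice(struct toy_vma *vma, int enable)
{
    vm_flags_t new_flags = vma->vm_flags;
    int err = toy_madvise_helper(enable, &new_flags);

    if (!err)
        toy_reset_vm_flags(vma, new_flags);
    return err;
}
```

The helper never sees the address of the VMA's own field, so every actual
write still goes through a modifier function.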



Re: [PATCH v2 4/6] mm: replace vma->vm_flags indirect modification in ksm_madvise

2023-01-25 Thread Suren Baghdasaryan
On Wed, Jan 25, 2023 at 1:38 AM 'Michal Hocko' via kernel-team wrote:
>
> On Wed 25-01-23 00:38:49, Suren Baghdasaryan wrote:
> > Replace indirect modifications to vma->vm_flags with calls to modifier
> > functions to be able to track flag changes and to keep vma locking
> > correctness. Add a BUG_ON check in ksm_madvise() to catch indirect
> > vm_flags modification attempts.
>
> Those BUG_ONs scream too much IMHO. KSM is MM-internal code so I
> guess we should be willing to trust it.

Yes, but I really want to prevent an indirect misuse since it was not
easy to find these. If you feel strongly about it I will remove them
or if you have a better suggestion I'm all for it.

>
> > Signed-off-by: Suren Baghdasaryan 
>
> Acked-by: Michal Hocko 
> --
> Michal Hocko
> SUSE Labs
>



Re: [PATCH v2 4/6] mm: replace vma->vm_flags indirect modification in ksm_madvise

2023-01-25 Thread Michal Hocko
On Wed 25-01-23 08:57:48, Suren Baghdasaryan wrote:
> On Wed, Jan 25, 2023 at 1:38 AM 'Michal Hocko' via kernel-team wrote:
> >
> > On Wed 25-01-23 00:38:49, Suren Baghdasaryan wrote:
> > > Replace indirect modifications to vma->vm_flags with calls to modifier
> > > functions to be able to track flag changes and to keep vma locking
> > > correctness. Add a BUG_ON check in ksm_madvise() to catch indirect
> > > vm_flags modification attempts.
> >
> > Those BUG_ONs scream too much IMHO. KSM is MM-internal code so I
> > guess we should be willing to trust it.
> 
> Yes, but I really want to prevent an indirect misuse since it was not
> easy to find these. If you feel strongly about it I will remove them
> or if you have a better suggestion I'm all for it.

You can avoid that by making flags inaccessible directly, right?

-- 
Michal Hocko
SUSE Labs



[RFC PATCH 7/7] xen/blkback: Inform userspace that device has been opened

2023-01-25 Thread Demi Marie Obenour
This allows userspace to use block devices with delete-on-close
behavior, which is necessary to ensure virtual devices (such as loop or
device-mapper devices) are cleaned up automatically.  Protocol details
are included in comments.

Signed-off-by: Demi Marie Obenour 
---
 drivers/block/xen-blkback/xenbus.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 2c43bfc7ab5ba6954f11d4b949a5668660dbd290..ca8dae05985038da490c5ac93364509913f6b4c7 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -3,6 +3,19 @@
 Copyright (C) 2005 Rusty Russell 
 Copyright (C) 2005 XenSource Ltd
 
+In addition to the XenStore nodes required by the Xen block device
+specification, this implementation of blkback uses a new XenStore
+node: "opened".  blkback sets "opened" to "0" before the hotplug script
+is called.  Once the device node has been opened, blkback sets "opened"
+to "1".
+
+"opened" is used exclusively by userspace.  It serves two purposes:
+
+1. It tells userspace that diskseq@major:minor syntax for "physical-device" is
+   supported.
+2. It tells userspace that it can wait for "opened" to be set to 1.  Once
+   "opened" is 1, blkback has a reference to the device, so userspace doesn't
+   need to keep one.
 
 */
 
@@ -698,6 +711,14 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
if (err)
pr_warn("%s write out 'max-ring-page-order' failed\n", __func__);
 
+   /*
+* This informs userspace that the "opened" node will be set to "1" when
+* the device has been opened successfully.
+*/
+   err = xenbus_write(XBT_NIL, dev->nodename, "opened", "0");
+   if (err)
+   goto fail;
+
err = xenbus_switch_state(dev, XenbusStateInitWait);
if (err)
goto fail;
@@ -824,6 +845,19 @@ static void backend_changed(struct xenbus_watch *watch,
goto fail;
}
 
+   /*
+* Tell userspace that the device has been opened and that blkback has a
+* reference to it.  Userspace can then close the device or mark it as
+* delete-on-close, knowing that blkback will keep the device open as
+* long as necessary.
+*/
+   err = xenbus_write(XBT_NIL, dev->nodename, "opened", "1");
+   if (err) {
+   xenbus_dev_fatal(dev, err, "%s: notifying userspace device has been opened",
+dev->nodename);
+   goto free_vbd;
+   }
+
err = xenvbd_sysfs_addif(dev);
if (err) {
xenbus_dev_fatal(dev, err, "creating sysfs entries");
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
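
A hedged sketch of how a userspace tool might interpret the "opened" node
once it has read it. Treating a missing node as "backend does not implement
the protocol" is an assumption drawn from the comment in the patch; the enum
and function names are hypothetical, and the actual XenStore read is left
out.

```c
#include <assert.h>
#include <string.h>

enum toy_opened_state {
    TOY_OPENED_UNSUPPORTED,   /* node absent: assume protocol not implemented */
    TOY_OPENED_WAITING,       /* "0": supported, device not opened yet */
    TOY_OPENED_READY,         /* "1": blkback holds a reference */
    TOY_OPENED_INVALID        /* any other value */
};

/* value is the node's contents, or NULL if the node does not exist. */
static enum toy_opened_state toy_classify_opened(const char *value)
{
    if (value == NULL)
        return TOY_OPENED_UNSUPPORTED;
    if (strcmp(value, "0") == 0)
        return TOY_OPENED_WAITING;
    if (strcmp(value, "1") == 0)
        return TOY_OPENED_READY;
    return TOY_OPENED_INVALID;
}
```

Only in the TOY_OPENED_READY case would the tool drop its own reference or
mark the device delete-on-close.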




[RFC PATCH 6/7] Minor blkback cleanups

2023-01-25 Thread Demi Marie Obenour
No functional change intended.

Signed-off-by: Demi Marie Obenour 
---
 drivers/block/xen-blkback/blkback.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index a5cf7f1e871c7f9ff397ab8ff1d7b9e3db686659..8a49cbe81d8895f89371bdf50d1b445c088c9b6a 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -1238,6 +1238,8 @@ static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
nseg = req->operation == BLKIF_OP_INDIRECT ?
   req->u.indirect.nr_segments : req->u.rw.nr_segments;
 
+   BUILD_BUG_ON(offsetof(struct blkif_request, u.rw.id) != 8);
+   BUILD_BUG_ON(offsetof(struct blkif_request, u.indirect.id) != 8);
if (unlikely(nseg == 0 && operation_flags != REQ_PREFLUSH) ||
unlikely((req->operation != BLKIF_OP_INDIRECT) &&
 (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) ||
@@ -1261,13 +1263,13 @@ static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
preq.sector_number = req->u.rw.sector_number;
for (i = 0; i < nseg; i++) {
pages[i]->gref = req->u.rw.seg[i].gref;
-   seg[i].nsec = req->u.rw.seg[i].last_sect -
-   req->u.rw.seg[i].first_sect + 1;
-   seg[i].offset = (req->u.rw.seg[i].first_sect << 9);
if ((req->u.rw.seg[i].last_sect >= (XEN_PAGE_SIZE >> 9)) ||
(req->u.rw.seg[i].last_sect <
 req->u.rw.seg[i].first_sect))
goto fail_response;
+   seg[i].nsec = req->u.rw.seg[i].last_sect -
+   req->u.rw.seg[i].first_sect + 1;
+   seg[i].offset = (req->u.rw.seg[i].first_sect << 9);
preq.nr_sects += seg[i].nsec;
}
} else {
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
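
The second hunk moves the nsec/offset computation after the bounds check, so values derived from a segment are only produced once the segment is known to be sane. A minimal standalone sketch of that ordering follows. The struct layouts, the PAGE_SIZE_SKETCH constant (a stand-in for XEN_PAGE_SIZE, assumed to be 4096) and the function name seg_compute() are illustrative assumptions, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT      9
#define PAGE_SIZE_SKETCH  4096u  /* assumed stand-in for XEN_PAGE_SIZE */

struct seg_in  { uint8_t first_sect, last_sect; };
struct seg_out { unsigned nsec, offset; };

/* Validate first, then derive nsec and offset from the checked fields. */
static bool seg_compute(const struct seg_in *in, struct seg_out *out)
{
    if (in->last_sect >= (PAGE_SIZE_SKETCH >> SECTOR_SHIFT) ||
        in->last_sect < in->first_sect)
        return false;

    out->nsec = in->last_sect - in->first_sect + 1;
    out->offset = (unsigned)in->first_sect << SECTOR_SHIFT;
    return true;
}
```

With this ordering an invalid segment never leaves a wrapped-around nsec behind in the output structure, which is presumably why the patch reorders the statements even though no functional change is intended.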




[RFC PATCH 0/7] Allow race-free block device handling

2023-01-25 Thread Demi Marie Obenour
This work aims to allow userspace to create and destroy block devices
in a race-free and leak-free way, and to allow them to be exposed to
other Xen VMs via blkback without leaks or races.  It’s marked as RFC
for a few reasons:

- The code has been only lightly tested.  It might be unstable or
  insecure.

- The DM_DEV_CREATE ioctl gains a new flag.  Unknown flags were
  previously ignored, so this could theoretically break buggy userspace
  tools.

- I have no idea if I got the block device reference counting and
  locking correct.

Demi Marie Obenour (7):
  block: Support creating a struct file from a block device
  Allow userspace to get an FD to a newly-created DM device
  Implement diskseq checks in blkback
  Increment diskseq when releasing a loop device
  If autoclear is set, delete a no-longer-used loop device
  Minor blkback cleanups
  xen/blkback: Inform userspace that device has been opened

 block/bdev.c|  77 +++--
 block/genhd.c   |   1 +
 drivers/block/loop.c|  17 ++-
 drivers/block/xen-blkback/blkback.c |   8 +-
 drivers/block/xen-blkback/xenbus.c  | 171 ++--
 drivers/md/dm-ioctl.c   |  67 +--
 include/linux/blkdev.h  |   5 +
 include/uapi/linux/dm-ioctl.h   |  16 ++-
 8 files changed, 298 insertions(+), 64 deletions(-)

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab



[RFC PATCH 3/7] Implement diskseq checks in blkback

2023-01-25 Thread Demi Marie Obenour
From: Demi Marie Obenour 

This allows specifying a disk sequence number in XenStore.  If it does
not match the disk sequence number of the underlying device, the device
will not be exported and a warning will be logged.  Userspace can use
this to eliminate race conditions due to major/minor number reuse.
Older kernels will ignore this, so it is safe for userspace to set it
unconditionally.

This also makes physical-device parsing stricter.  I do not believe this
will break any extant userspace tools.
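
For illustration only, the accepted physical-device formats can be mirrored in a small userspace parser. This is a sketch under the assumption that the node holds either "diskseq@major:minor" or the legacy "major:minor", all in hexadecimal, as the sscanf() formats in the patch suggest. The struct physdev type and the function name parse_physical_device() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct physdev {
    unsigned long long diskseq; /* 0 means no diskseq was given */
    unsigned major, minor;
};

static bool parse_physical_device(const char *s, struct physdev *out)
{
    char junk;

    assert(s != NULL && out != NULL);

    /* Reject whitespace, control characters and non-ASCII bytes. */
    for (const char *p = s; *p; ++p)
        if ((unsigned char)*p <= 0x20 || (unsigned char)*p >= 0x7F)
            return false;

    /* New form: diskseq@major:minor. A trailing byte would make the
     * count 4, so exactly 3 means the whole string was consumed. */
    if (sscanf(s, "%16llx@%8x:%8x%c",
               &out->diskseq, &out->major, &out->minor, &junk) == 3)
        return out->diskseq != 0; /* diskseq 0 is treated as invalid */

    /* Legacy form: major:minor, again with no trailing bytes. */
    if (sscanf(s, "%8x:%8x%c", &out->major, &out->minor, &junk) == 2) {
        out->diskseq = 0;
        return true;
    }
    return false;
}
```

The trailing %c conversion is what makes the parse strict: input such as "fd:3x" matches one conversion too many and is rejected instead of being silently truncated.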

Signed-off-by: Demi Marie Obenour 
---
 drivers/block/xen-blkback/xenbus.c | 137 +
 1 file changed, 100 insertions(+), 37 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 4807af1d58059394d7a992335dabaf2bc3901721..2c43bfc7ab5ba6954f11d4b949a5668660dbd290 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -24,6 +24,7 @@ struct backend_info {
struct xenbus_watch backend_watch;
unsignedmajor;
unsignedminor;
+   unsigned long long  diskseq;
char*mode;
 };
 
@@ -479,7 +480,7 @@ static void xen_vbd_free(struct xen_vbd *vbd)
 
 static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
  unsigned major, unsigned minor, int readonly,
- int cdrom)
+ bool cdrom, u64 diskseq)
 {
struct xen_vbd *vbd;
struct block_device *bdev;
@@ -507,6 +508,25 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
xen_vbd_free(vbd);
return -ENOENT;
}
+
+   if (diskseq) {
+   struct gendisk *disk = bdev->bd_disk;
+   if (unlikely(disk == NULL)) {
+   pr_err("xen_vbd_create: device %08x has no gendisk\n",
+  vbd->pdevice);
+   xen_vbd_free(vbd);
+   return -EFAULT;
+   }
+
+   if (unlikely(disk->diskseq != diskseq)) {
+   pr_warn("xen_vbd_create: device %08x has incorrect sequence "
+   "number 0x%llx (expected 0x%llx)\n",
+   vbd->pdevice, disk->diskseq, diskseq);
+   xen_vbd_free(vbd);
+   return -ENODEV;
+   }
+   }
+
vbd->size = vbd_sz(vbd);
 
if (cdrom || disk_to_cdi(vbd->bdev->bd_disk))
@@ -690,6 +710,55 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
return err;
 }
 
+static bool read_physical_device(struct xenbus_device *dev,
+unsigned long long *diskseq,
+unsigned *major, unsigned *minor)
+{
+   char *physical_device, *problem;
+   int i, physical_device_length;
+   char junk;
+
+   physical_device = xenbus_read(XBT_NIL, dev->nodename, "physical-device",
+ &physical_device_length);
+
+   if (IS_ERR(physical_device)) {
+   int err = PTR_ERR(physical_device);
+   /*
+* Since this watch will fire once immediately after it is
+* registered, we expect "does not exist" errors.  Ignore
+* them and wait for the hotplug scripts.
+*/
+   if (unlikely(!XENBUS_EXIST_ERR(err)))
+   xenbus_dev_fatal(dev, err, "reading physical-device");
+   return false;
+   }
+
+   for (i = 0; i < physical_device_length; ++i)
+   if (unlikely(physical_device[i] <= 0x20 || physical_device[i] >= 0x7F)) {
+   problem = "bad byte in physical-device";
+   goto fail;
+   }
+
+   if (sscanf(physical_device, "%16llx@%8x:%8x%c",
+  diskseq, major, minor, &junk) == 3) {
+   if (*diskseq == 0) {
+   problem = "diskseq 0 is invalid";
+   goto fail;
+   }
+   } else if (sscanf(physical_device, "%8x:%8x%c", major, minor, &junk) == 2) {
+   *diskseq = 0;
+   } else {
+   problem = "invalid physical-device";
+   goto fail;
+   }
+   kfree(physical_device);
+   return true;
+fail:
+   kfree(physical_device);
+   xenbus_dev_fatal(dev, -EINVAL, problem);
+   return false;
+}
+
 /*
  * Callback received when the hotplug scripts have placed the physical-device
  * node.  Read it and the mode node, and create a vbd.  If the frontend is
@@ -707,28 +776,17 @@ static void backend_changed(struct xenbus_watch *watch,
int cdrom = 0;
unsigned long handle;
char *device_type;
+   unsigned long long diskseq;
 
pr_debug("%s %p %d\n", __func__, dev, dev->otherend_id);
-
-   err = xenbus_scanf(XBT_NIL, dev->nodename, 

Re: [QEMU][PATCH v4 09/10] hw/arm: introduce xenpvh machine

2023-01-25 Thread Vikram Garhwal

Hi Stefano,

On 1/25/23 2:20 PM, Stefano Stabellini wrote:

On Wed, 25 Jan 2023, Vikram Garhwal wrote:

Add a new machine xenpvh which creates an IOREQ server to register/connect with
Xen Hypervisor.

Optional: When CONFIG_TPM is enabled, it also creates a tpm-tis-device, adds a
TPM emulator and connects to swtpm running on host machine via chardev socket
to support TPM functionalities for a guest domain.

Extra command line for aarch64 xenpvh QEMU to connect to swtpm:
 -chardev socket,id=chrtpm,path=/tmp/myvtpm2/swtpm-sock \
 -tpmdev emulator,id=tpm0,chardev=chrtpm \
 -machine tpm-base-addr=0x0c00 \

swtpm implements a TPM software emulator(TPM 1.2 & TPM 2) built on libtpms and
provides access to TPM functionality over socket, chardev and CUSE interface.
Github repo: https://github.com/stefanberger/swtpm
Example for starting swtpm on host machine:
 mkdir /tmp/vtpm2
 swtpm socket --tpmstate dir=/tmp/vtpm2 \
 --ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
---
  docs/system/arm/xenpvh.rst|  34 +++
  docs/system/target-arm.rst|   1 +
  hw/arm/meson.build|   2 +
  hw/arm/xen_arm.c  | 184 ++
  include/hw/arm/xen_arch_hvm.h |   9 ++
  include/hw/xen/arch_hvm.h |   2 +
  6 files changed, 232 insertions(+)
  create mode 100644 docs/system/arm/xenpvh.rst
  create mode 100644 hw/arm/xen_arm.c
  create mode 100644 include/hw/arm/xen_arch_hvm.h

diff --git a/docs/system/arm/xenpvh.rst b/docs/system/arm/xenpvh.rst
new file mode 100644
index 00..e1655c7ab8
--- /dev/null
+++ b/docs/system/arm/xenpvh.rst
@@ -0,0 +1,34 @@
+XENPVH (``xenpvh``)
+=
+This machine creates a IOREQ server to register/connect with Xen Hypervisor.
+
+When TPM is enabled, this machine also creates a tpm-tis-device at a user input
+tpm base address, adds a TPM emulator and connects to a swtpm application
+running on host machine via chardev socket. This enables xenpvh to support TPM
+functionalities for a guest domain.
+
+More information about TPM use and installing swtpm linux application can be
+found at: docs/specs/tpm.rst.
+
+Example for starting swtpm on host machine:
+.. code-block:: console
+
+mkdir /tmp/vtpm2
+swtpm socket --tpmstate dir=/tmp/vtpm2 \
+--ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &
+
+Sample QEMU xenpvh commands for running and connecting with Xen:
+.. code-block:: console
+
+qemu-system-aarch64 -xen-domid 1 \
+-chardev socket,id=libxl-cmd,path=qmp-libxl-1,server=on,wait=off \
+-mon chardev=libxl-cmd,mode=control \
+-chardev socket,id=libxenstat-cmd,path=qmp-libxenstat-1,server=on,wait=off \
+-mon chardev=libxenstat-cmd,mode=control \
+-xen-attach -name guest0 -vnc none -display none -nographic \
+-machine xenpvh -m 1301 \
+-chardev socket,id=chrtpm,path=tmp/vtpm2/swtpm-sock \
+-tpmdev emulator,id=tpm0,chardev=chrtpm -machine tpm-base-addr=0x0C00
+
+In above QEMU command, last two lines are for connecting xenpvh QEMU to swtpm
+via chardev socket.
diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
index 91ebc26c6d..af8d7c77d6 100644
--- a/docs/system/target-arm.rst
+++ b/docs/system/target-arm.rst
@@ -106,6 +106,7 @@ undocumented; you can get a complete list by running
 arm/stm32
 arm/virt
 arm/xlnx-versal-virt
+   arm/xenpvh
  
  Emulated CPU architecture support

  =
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index b036045603..06bddbfbb8 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -61,6 +61,8 @@ arm_ss.add(when: 'CONFIG_FSL_IMX7', if_true: files('fsl-imx7.c', 'mcimx7d-sabre.
  arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
  arm_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 'mcimx6ul-evk.c'))
  arm_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
+arm_ss.add(when: 'CONFIG_XEN', if_true: files('xen_arm.c'))
+arm_ss.add_all(xen_ss)
  
  softmmu_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmu-common.c'))

  softmmu_ss.add(when: 'CONFIG_EXYNOS4', if_true: files('exynos4_boards.c'))
diff --git a/hw/arm/xen_arm.c b/hw/arm/xen_arm.c
new file mode 100644
index 00..12b19e3609
--- /dev/null
+++ b/hw/arm/xen_arm.c
@@ -0,0 +1,184 @@
+/*
+ * QEMU ARM Xen PV Machine

^ PVH



+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall 

Re: [QEMU][PATCH v4 01/10] hw/i386/xen/: move xen-mapcache.c to hw/xen/

2023-01-25 Thread Vikram Garhwal

Hi Philippe,

On 1/25/23 2:59 PM, Philippe Mathieu-Daudé wrote:

On 25/1/23 09:53, Vikram Garhwal wrote:
xen-mapcache.c contains common functions which can be used for enabling Xen on
aarch64 with IOREQ handling. Moving it out from hw/i386/xen to hw/xen to make it
accessible for both aarch64 and x86.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
---
  hw/i386/meson.build  | 1 +
  hw/i386/xen/meson.build  | 1 -
  hw/i386/xen/trace-events | 5 -
  hw/xen/meson.build   | 4 
  hw/xen/trace-events  | 5 +
  hw/{i386 => }/xen/xen-mapcache.c | 0
  6 files changed, 10 insertions(+), 6 deletions(-)
  rename hw/{i386 => }/xen/xen-mapcache.c (100%)

diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 213e2e82b3..cfdbfdcbcb 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -33,5 +33,6 @@ subdir('kvm')
  subdir('xen')
    i386_ss.add_all(xenpv_ss)
+i386_ss.add_all(xen_ss)
    hw_arch += {'i386': i386_ss}
diff --git a/hw/i386/xen/meson.build b/hw/i386/xen/meson.build
index be84130300..2fcc46e6ca 100644
--- a/hw/i386/xen/meson.build
+++ b/hw/i386/xen/meson.build
@@ -1,6 +1,5 @@
  i386_ss.add(when: 'CONFIG_XEN', if_true: files(
    'xen-hvm.c',
-  'xen-mapcache.c',
    'xen_apic.c',
    'xen_platform.c',
    'xen_pvdevice.c',
diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events
index 5d6be61090..a0c89d91c4 100644
--- a/hw/i386/xen/trace-events
+++ b/hw/i386/xen/trace-events
@@ -21,8 +21,3 @@ xen_map_resource_ioreq(uint32_t id, void *addr) "id: %u addr: %p"
  cpu_ioreq_config_read(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"
  cpu_ioreq_config_write(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"

-# xen-mapcache.c
-xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
-xen_remap_bucket(uint64_t index) "index 0x%"PRIx64
-xen_map_cache_return(void* ptr) "%p"
-
diff --git a/hw/xen/meson.build b/hw/xen/meson.build
index ae0ace3046..19d0637c46 100644
--- a/hw/xen/meson.build
+++ b/hw/xen/meson.build
@@ -22,3 +22,7 @@ else
  endif
    specific_ss.add_all(when: ['CONFIG_XEN', xen], if_true: xen_specific_ss)

+
+xen_ss = ss.source_set()
+
+xen_ss.add(when: 'CONFIG_XEN', if_true: files('xen-mapcache.c'))


Can't we add it to softmmu_ss directly?

I tried adding this in softmmu_ss as per your comment in v2. But it
fails with the following error:

/mnt/qemu_ioreq_upstream/include/sysemu/xen-mapcache.h:16:8: error: attempt to use poisoned "CONFIG_XEN"
 #ifdef CONFIG_XEN
        ^
../hw/xen/xen-mapcache.c:106:6: error: redefinition of 'xen_map_cache_init'
 void xen_map_cache_init(phys_offset_to_gaddr_t f, void *opaque)

I couldn't fix it in an easy way.


diff --git a/hw/xen/trace-events b/hw/xen/trace-events
index 3da3fd8348..2c8f238f42 100644
--- a/hw/xen/trace-events
+++ b/hw/xen/trace-events
@@ -41,3 +41,8 @@ xs_node_vprintf(char *path, char *value) "%s %s"
  xs_node_vscanf(char *path, char *value) "%s %s"
  xs_node_watch(char *path) "%s"
  xs_node_unwatch(char *path) "%s"
+
+# xen-mapcache.c
+xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
+xen_remap_bucket(uint64_t index) "index 0x%"PRIx64
+xen_map_cache_return(void* ptr) "%p"
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
similarity index 100%
rename from hw/i386/xen/xen-mapcache.c
rename to hw/xen/xen-mapcache.c






Re: [PATCH v2] x86/hotplug: Do not put offline vCPUs in mwait idle state

2023-01-25 Thread Srivatsa S. Bhat


Hi Igor and Sean,

On 1/20/23 10:35 AM, Sean Christopherson wrote:
> On Fri, Jan 20, 2023, Igor Mammedov wrote:
>> On Fri, 20 Jan 2023 05:55:11 -0800
>> "Srivatsa S. Bhat"  wrote:
>>
>>> Hi Igor and Thomas,
>>>
>>> Thank you for your review!
>>>
>>> On 1/19/23 1:12 PM, Thomas Gleixner wrote:
 On Mon, Jan 16 2023 at 15:55, Igor Mammedov wrote:  
> "Srivatsa S. Bhat"  wrote:  
>> Fix this by preventing the use of mwait idle state in the vCPU offline
>> play_dead() path for any hypervisor, even if mwait support is
>> available.  
>
> if mwait is enabled, it's very likely guest to have cpuidle
> enabled and using the same mwait as well. So exiting early from
>  mwait_play_dead(), might just punt workflow down:
>   native_play_dead()
> ...
> mwait_play_dead();
> if (cpuidle_play_dead())   <- possible mwait here 
>  
> hlt_play_dead(); 
>
> and it will end up in mwait again and only if that fails
> it will go HLT route and maybe transition to VMM.  

 Good point.
   
> Instead of workaround on guest side,
> shouldn't hypervisor force VMEXIT on being unplugged vCPU when it's
> actually hot-unplugging vCPU? (ex: QEMU kicks vCPU out from guest
> context when it is removing vCPU, among other things)  

 For a pure guest side CPU unplug operation:

 guest$ echo 0 >/sys/devices/system/cpu/cpu$N/online

 the hypervisor is not involved at all. The vCPU is not removed in that
 case.
   
>>>
>>> Agreed, and this is indeed the scenario I was targeting with this patch,
>>> as opposed to vCPU removal from the host side. I'll add this clarification
>>> to the commit message.
> 
> Forcing HLT doesn't solve anything, it's perfectly legal to passthrough HLT.  I
> guarantee there are use cases that passthrough HLT but _not_ MONITOR/MWAIT, and
> that passthrough all of them.
> 
>> commit message explicitly said:
>> "which prevents the hypervisor from running other vCPUs or workloads on the
>> corresponding pCPU."
>>
>> and that implies unplug on hypervisor side as well.
>> Why? That's because when hypervisor exposes mwait to guest, it has to reserve/pin
>> a pCPU for each of present vCPUs. And you can safely run other VMs/workloads
>> on that pCPU only after it's not possible for it to be reused by VM where
>> it was used originally.
> 
> Pinning isn't strictly required from a safety perspective.  The latency of context
> switching may suffer due to wake times, but preempting a vCPU that's in C1 (or
> deeper) won't cause functional problems.   Passing through an entire socket
> (or whatever scope triggers extra fun) might be a different story, but pinning
> isn't strictly required.
> 
> That said, I 100% agree that this is expected behavior and not a bug.  Letting the
> guest execute MWAIT or HLT means the host won't have perfect visibility into guest
> activity state.
> 
> Oversubscribing a pCPU and exposing MWAIT and/or HLT to vCPUs is generally not done
> precisely because the guest will always appear busy without extra effort on the
> host.  E.g. KVM requires an explicit opt-in from userspace to expose MWAIT and/or
> HLT.
> 
> If someone really wants to efficiently oversubscribe pCPUs and passthrough MWAIT,
> then their best option is probably to have a paravirt interface so that the guest
> can tell the host it's offlining a vCPU.  Barring that the host could inspect the
> guest when preempting a vCPU to try and guesstimate how much work the vCPU is
> actually doing in order to make better scheduling decisions.
> 
>> Now consider following worst (and most likely) case without unplug
>> on hypervisor side:
>>
>>  1. vm1mwait: pin pCPU2 to vCPU2
>>  2. vm1mwait: guest$ echo 0 >/sys/devices/system/cpu/cpu2/online
>> -> HLT -> VMEXIT
>>  --
>>  3. vm2mwait: pin pCPU2 to vCPUx and start VM
>>  4. vm2mwait: guest OS onlines vCPU and starts using it incl.
>>going into idle=>mwait state
>>  --
>>  5. vm1mwait: it still thinks that vCPU is present it can rightfully do:
>>guest$ echo 1 >/sys/devices/system/cpu/cpu2/online
>>  --  
>>  6.1 best case vm1mwait online fails after timeout
>>  6.2 worse case: vm2mwait does VMEXIT on vCPUx around time-frame when
>>  vm1mwait onlines vCPU2, the online may succeed and then vm2mwait's
>>  vCPUx will be stuck (possibly indefinitely) until for some reason
>>  VMEXIT happens on vm1mwait's vCPU2 _and_ host decides to schedule
>>  vCPUx on pCPU2 which would make vm1mwait stuck on vCPU2.
>> So either way it's expected behavior.
>>
>> And if there is no intention to unplug vCPU on hypervisor side,
>> then VMEXIT on play_dead is not really necessary (mwait is better
>> than HLT), since hypervisor can't safely reuse pCPU elsewhere and
>> VCPU goes into deep sleep within guest context.
>>
>> PS:
>> 

[linux-linus test] 176125: regressions - FAIL

2023-01-25 Thread osstest service owner
flight 176125 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/176125/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-examine        8 reboot   fail REGR. vs. 173462
 test-arm64-arm64-xl-vhd         8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-seattle     8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-xsm         8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-libvirt-xsm    8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-arndale     8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-multivcpu   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl             8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-vhd         8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-credit1     8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-libvirt-raw    8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-libvirt        8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-credit2     8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-credit1     8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl             8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-libvirt-raw    8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-examine        8 reboot   fail REGR. vs. 173462
 test-armhf-armhf-libvirt-qcow2  8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-credit2     8 xen-boot fail REGR. vs. 173462

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 15 guest-saverestore fail pass in 176115

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds  8 xen-boot fail REGR. vs. 173462

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 173462
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 173462
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 173462
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173462
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 173462
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass

version targeted for testing:
 linux                948ef7bb70c4acaf74d87420ea3a1190862d4548
baseline version:
 linux                9d84bb40bcb30a7fa16f33baa967aeb9953dda78

Last test of basis   173462  2022-10-07 18:41:45 Z  110 days
Failing since        173470  2022-10-08 06:21:34 Z  109 days  226 attempts
Testing same since   176115  2023-01-25 03:57:20 Z    0 days    2 attempts


3442 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-arm64-pvops  pass
 build-armhf-pvops  pass
 build-i386-pvops  pass
 test-amd64-amd64-xl  pass
 test-amd64-coresched-amd64-xl  pass
 test-arm64-arm64-xl  fail
 

Re: [XEN PATCH v2 0/3] Configure qemu upstream correctly by default for igd-passthru

2023-01-25 Thread Chuck Zmudzinski
On 1/25/2023 6:37 AM, Anthony PERARD wrote:
> On Tue, Jan 10, 2023 at 02:32:01AM -0500, Chuck Zmudzinski wrote:
> > I call attention to the commit message of the first patch which points
> > out that using the "pc" machine and adding the xen platform device on
> > the qemu upstream command line is not functionally equivalent to using
> > the "xenfv" machine which automatically adds the xen platform device
> > earlier in the guest creation process. As a result, there is a noticeable
> > reduction in the performance of the guest during startup with the "pc"
> > machine type even if the xen platform device is added via the qemu
> > command line options, although eventually both Linux and Windows guests
> > perform equally well once the guest operating system is fully loaded.
>
> There shouldn't be a difference between "xenfv" machine or using the
> "pc" machine while adding the "xen-platform" device, at least with
> regards to access to disk or network.
>
> The first patch of the series is using the "pc" machine without any
> "xen-platform" device, so we can't compare startup performance based on
> that.
>
> > Specifically, startup time is longer and neither the grub vga drivers
> > nor the windows vga drivers in early startup perform as well when the
> > xen platform device is added via the qemu command line instead of being
> > added immediately after the other emulated i440fx pci devices when the
> > "xenfv" machine type is used.
>
> The "xen-platform" device is mostly a hint to a guest that they can use
> pv-disk and pv-network devices. I don't think it would change anything
> with regards to graphics.
>
> > For example, when using the "pc" machine, which adds the xen platform
> > device using a command line option, the Linux guest could not display
> > the grub boot menu at the native resolution of the monitor, but with the
> > "xenfv" machine, the grub menu is displayed at the full 1920x1080
> > native resolution of the monitor for testing. So improved startup
> > performance is an advantage for the patch for qemu.
>
> I've just found out that when doing IGD passthrough, both machine
> "xenfv" and "pc" are much more different than I thought ... :-(
> pc_xen_hvm_init_pci() in QEMU changes the pci-host device, which in
> turn copies some information from the real host bridge.
> I guess this new host bridge helps when the firmware sets up the graphics
> for grub.

I am surprised it works at all with the "pc" machine, that is, without the
TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE that is used in the "xenfv"
machine. This only seems to affect the legacy grub vga driver and the legacy
Windows vga driver during early boot. Still, I much prefer keeping the "xenfv"
machine for Intel IGD than this workaround of patching libxl to use the "pc"
machine.

>
> > I also call attention to the last point of the commit message of the
> > second patch and the comments for reviewers section of the second patch.
> > This approach, as opposed to fixing this in qemu upstream, makes
> > maintaining the code in libxl__build_device_model_args_new more
> > difficult and therefore increases the chances of problems caused by
> > coding errors and typos for users of libxl. So that is another advantage
> > of the patch for qemu.
>
> We would just need to use a different approach in libxl when generating
> the command line. We could probably avoid duplications. I was hoping to
> have a patch series for libxl that would change the machine used to start
> using "pc" instead of "xenfv" for all configurations, but based on the
> point above (IGD specific change to "xenfv"), then I guess we can't
> really do anything from libxl to fix IGD passthrough.

We could switch to the "pc" machine, but we would need to patch
qemu also so the "pc" machine uses the special device the "xenfv"
machine uses (TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE).
So it is simpler to just use the other patch to qemu and not patch
libxl at all to fix this.

>
> > OTOH, fixing this in qemu causes newer qemu versions to behave
> > differently than previous versions of qemu, which the qemu community
> > does not like, although they seem OK with the other patch since it only
> > affects qemu "xenfv" machine types, but they do not want the patch to
> > affect toolstacks like libvirt that do not use qemu upstream's
> > autoconfiguration options as much as libxl does, and, of course, libvirt
> > can manage qemu "xenfv" machines so existing "xenfv" guests configured
> > manually by libvirt could be adversely affected by the patch to qemu,
> > but only if those same guests are also configured for igd-passthrough,
> > which is likely a very small number of possibly affected libvirt users
> > of qemu.
> > 
> > A year or two ago I tried to configure guests for pci passthrough on xen
> > using libvirt's tool to convert a libxl xl.cfg file to libvirt xml. It
> > could not convert an xl.cfg file with a configuration item
> > pci = [ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...] for pci 

Re: [QEMU][PATCH v4 01/10] hw/i386/xen/: move xen-mapcache.c to hw/xen/

2023-01-25 Thread Philippe Mathieu-Daudé

On 25/1/23 09:53, Vikram Garhwal wrote:

xen-mapcache.c contains common functions which can be used for enabling Xen on
aarch64 with IOREQ handling. Moving it out from hw/i386/xen to hw/xen to make it
accessible for both aarch64 and x86.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
---
  hw/i386/meson.build  | 1 +
  hw/i386/xen/meson.build  | 1 -
  hw/i386/xen/trace-events | 5 -
  hw/xen/meson.build   | 4 
  hw/xen/trace-events  | 5 +
  hw/{i386 => }/xen/xen-mapcache.c | 0
  6 files changed, 10 insertions(+), 6 deletions(-)
  rename hw/{i386 => }/xen/xen-mapcache.c (100%)

diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 213e2e82b3..cfdbfdcbcb 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -33,5 +33,6 @@ subdir('kvm')
  subdir('xen')
  
  i386_ss.add_all(xenpv_ss)

+i386_ss.add_all(xen_ss)
  
  hw_arch += {'i386': i386_ss}

diff --git a/hw/i386/xen/meson.build b/hw/i386/xen/meson.build
index be84130300..2fcc46e6ca 100644
--- a/hw/i386/xen/meson.build
+++ b/hw/i386/xen/meson.build
@@ -1,6 +1,5 @@
  i386_ss.add(when: 'CONFIG_XEN', if_true: files(
'xen-hvm.c',
-  'xen-mapcache.c',
'xen_apic.c',
'xen_platform.c',
'xen_pvdevice.c',
diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events
index 5d6be61090..a0c89d91c4 100644
--- a/hw/i386/xen/trace-events
+++ b/hw/i386/xen/trace-events
@@ -21,8 +21,3 @@ xen_map_resource_ioreq(uint32_t id, void *addr) "id: %u addr: %p"
  cpu_ioreq_config_read(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"
  cpu_ioreq_config_write(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"
  
-# xen-mapcache.c

-xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
-xen_remap_bucket(uint64_t index) "index 0x%"PRIx64
-xen_map_cache_return(void* ptr) "%p"
-
diff --git a/hw/xen/meson.build b/hw/xen/meson.build
index ae0ace3046..19d0637c46 100644
--- a/hw/xen/meson.build
+++ b/hw/xen/meson.build
@@ -22,3 +22,7 @@ else
  endif
  
  specific_ss.add_all(when: ['CONFIG_XEN', xen], if_true: xen_specific_ss)

+
+xen_ss = ss.source_set()
+
+xen_ss.add(when: 'CONFIG_XEN', if_true: files('xen-mapcache.c'))


Can't we add it to softmmu_ss directly?


diff --git a/hw/xen/trace-events b/hw/xen/trace-events
index 3da3fd8348..2c8f238f42 100644
--- a/hw/xen/trace-events
+++ b/hw/xen/trace-events
@@ -41,3 +41,8 @@ xs_node_vprintf(char *path, char *value) "%s %s"
  xs_node_vscanf(char *path, char *value) "%s %s"
  xs_node_watch(char *path) "%s"
  xs_node_unwatch(char *path) "%s"
+
+# xen-mapcache.c
+xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
+xen_remap_bucket(uint64_t index) "index 0x%"PRIx64
+xen_map_cache_return(void* ptr) "%p"
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
similarity index 100%
rename from hw/i386/xen/xen-mapcache.c
rename to hw/xen/xen-mapcache.c





Re: [QEMU][PATCH v4 04/10] xen-hvm: reorganize xen-hvm and move common function to xen-hvm-common

2023-01-25 Thread Vikram Garhwal

Hi Stefano,

On 1/25/23 1:55 PM, Stefano Stabellini wrote:

On Wed, 25 Jan 2023, Vikram Garhwal wrote:

From: Stefano Stabellini 

This patch does the following:
1. creates arch_handle_ioreq() and arch_xen_set_memory(). This is done in
 preparation for moving most of xen-hvm code to an arch-neutral location,
 move the x86-specific portion of xen_set_memory to arch_xen_set_memory.
 Also, move handle_vmport_ioreq to arch_handle_ioreq.

2. Pure code movement: move common functions to hw/xen/xen-hvm-common.c
 Extract common functionalities from hw/i386/xen/xen-hvm.c and move them to
 hw/xen/xen-hvm-common.c. These common functions are useful for creating
 an IOREQ server.

 xen_hvm_init_pc() contains the architecture independent code for creating
 and mapping a IOREQ server, connecting memory and IO listeners, initializing
 a xen bus and registering backends. Moved this common xen code to a new
 function xen_register_ioreq() which can be used by both x86 and ARM machines.

 Following functions are moved to hw/xen/xen-hvm-common.c:
 xen_vcpu_eport(), xen_vcpu_ioreq(), xen_ram_alloc(), xen_set_memory(),
 xen_region_add(), xen_region_del(), xen_io_add(), xen_io_del(),
 xen_device_realize(), xen_device_unrealize(),
 cpu_get_ioreq_from_shared_memory(), cpu_get_ioreq(), do_inp(),
 do_outp(), rw_phys_req_item(), read_phys_req_item(),
 write_phys_req_item(), cpu_ioreq_pio(), cpu_ioreq_move(),
 cpu_ioreq_config(), handle_ioreq(), handle_buffered_iopage(),
 handle_buffered_io(), cpu_handle_ioreq(), xen_main_loop_prepare(),
 xen_hvm_change_state_handler(), xen_exit_notifier(),
 xen_map_ioreq_server(), destroy_hvm_domain() and
 xen_shutdown_fatal_error()

3. Removed static type from below functions:
 1. xen_region_add()
 2. xen_region_del()
 3. xen_io_add()
 4. xen_io_del()
 5. xen_device_realize()
 6. xen_device_unrealize()
 7. xen_hvm_change_state_handler()
 8. cpu_ioreq_pio()
 9. xen_exit_notifier()

4. Replace TARGET_PAGE_SIZE with XC_PAGE_SIZE to match the page size with Xen.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 

One comment below

[...]


+void xen_exit_notifier(Notifier *n, void *data)
+{
+XenIOState *state = container_of(n, XenIOState, exit);
+
+xen_destroy_ioreq_server(xen_domid, state->ioservid);

In the original code we had:

-if (state->fres != NULL) {
-xenforeignmemory_unmap_resource(xen_fmem, state->fres);
-}

Should we add it here?


I went through the manual process of comparing all the code additions
and deletions (not fun!) and everything checks out except for this.
Thanks for catching this. There were two recent commits upstream and
I missed those. I rechecked and there are actually three other lines
which need updating. I will address it in v5.



+xenevtchn_close(state->xce_handle);
+xs_daemon_close(state->xenstore);
+}




Re: [QEMU][PATCH v4 09/10] hw/arm: introduce xenpvh machine

2023-01-25 Thread Stefano Stabellini
On Wed, 25 Jan 2023, Vikram Garhwal wrote:
> Add a new machine xenpvh which creates a IOREQ server to register/connect with
> Xen Hypervisor.
> 
> Optional: When CONFIG_TPM is enabled, it also creates a tpm-tis-device, adds a
> TPM emulator and connects to swtpm running on host machine via chardev socket
> and support TPM functionalities for a guest domain.
> 
> Extra command line for aarch64 xenpvh QEMU to connect to swtpm:
> -chardev socket,id=chrtpm,path=/tmp/myvtpm2/swtpm-sock \
> -tpmdev emulator,id=tpm0,chardev=chrtpm \
> -machine tpm-base-addr=0x0c00 \
> 
> swtpm implements a TPM software emulator(TPM 1.2 & TPM 2) built on libtpms and
> provides access to TPM functionality over socket, chardev and CUSE interface.
> Github repo: https://github.com/stefanberger/swtpm
> Example for starting swtpm on host machine:
> mkdir /tmp/vtpm2
> swtpm socket --tpmstate dir=/tmp/vtpm2 \
> --ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &
> 
> Signed-off-by: Vikram Garhwal 
> Signed-off-by: Stefano Stabellini 
> ---
>  docs/system/arm/xenpvh.rst|  34 +++
>  docs/system/target-arm.rst|   1 +
>  hw/arm/meson.build|   2 +
>  hw/arm/xen_arm.c  | 184 ++
>  include/hw/arm/xen_arch_hvm.h |   9 ++
>  include/hw/xen/arch_hvm.h |   2 +
>  6 files changed, 232 insertions(+)
>  create mode 100644 docs/system/arm/xenpvh.rst
>  create mode 100644 hw/arm/xen_arm.c
>  create mode 100644 include/hw/arm/xen_arch_hvm.h
> 
> diff --git a/docs/system/arm/xenpvh.rst b/docs/system/arm/xenpvh.rst
> new file mode 100644
> index 00..e1655c7ab8
> --- /dev/null
> +++ b/docs/system/arm/xenpvh.rst
> @@ -0,0 +1,34 @@
> +XENPVH (``xenpvh``)
> +=
> +This machine creates a IOREQ server to register/connect with Xen Hypervisor.
> +
> +When TPM is enabled, this machine also creates a tpm-tis-device at a user 
> input
> +tpm base address, adds a TPM emulator and connects to a swtpm application
> +running on host machine via chardev socket. This enables xenpvh to support 
> TPM
> +functionalities for a guest domain.
> +
> +More information about TPM use and installing swtpm linux application can be
> +found at: docs/specs/tpm.rst.
> +
> +Example for starting swtpm on host machine:
> +.. code-block:: console
> +
> +mkdir /tmp/vtpm2
> +swtpm socket --tpmstate dir=/tmp/vtpm2 \
> +--ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &
> +
> +Sample QEMU xenpvh commands for running and connecting with Xen:
> +.. code-block:: console
> +
> +qemu-system-aarch64 -xen-domid 1 \
> +-chardev socket,id=libxl-cmd,path=qmp-libxl-1,server=on,wait=off \
> +-mon chardev=libxl-cmd,mode=control \
> +-chardev 
> socket,id=libxenstat-cmd,path=qmp-libxenstat-1,server=on,wait=off \
> +-mon chardev=libxenstat-cmd,mode=control \
> +-xen-attach -name guest0 -vnc none -display none -nographic \
> +-machine xenpvh -m 1301 \
> +-chardev socket,id=chrtpm,path=tmp/vtpm2/swtpm-sock \
> +-tpmdev emulator,id=tpm0,chardev=chrtpm -machine tpm-base-addr=0x0C00
> +
> +In above QEMU command, last two lines are for connecting xenpvh QEMU to swtpm
> +via chardev socket.
> diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
> index 91ebc26c6d..af8d7c77d6 100644
> --- a/docs/system/target-arm.rst
> +++ b/docs/system/target-arm.rst
> @@ -106,6 +106,7 @@ undocumented; you can get a complete list by running
> arm/stm32
> arm/virt
> arm/xlnx-versal-virt
> +   arm/xenpvh
>  
>  Emulated CPU architecture support
>  =
> diff --git a/hw/arm/meson.build b/hw/arm/meson.build
> index b036045603..06bddbfbb8 100644
> --- a/hw/arm/meson.build
> +++ b/hw/arm/meson.build
> @@ -61,6 +61,8 @@ arm_ss.add(when: 'CONFIG_FSL_IMX7', if_true: 
> files('fsl-imx7.c', 'mcimx7d-sabre.
>  arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
>  arm_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 
> 'mcimx6ul-evk.c'))
>  arm_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
> +arm_ss.add(when: 'CONFIG_XEN', if_true: files('xen_arm.c'))
> +arm_ss.add_all(xen_ss)
>  
>  softmmu_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmu-common.c'))
>  softmmu_ss.add(when: 'CONFIG_EXYNOS4', if_true: files('exynos4_boards.c'))
> diff --git a/hw/arm/xen_arm.c b/hw/arm/xen_arm.c
> new file mode 100644
> index 00..12b19e3609
> --- /dev/null
> +++ b/hw/arm/xen_arm.c
> @@ -0,0 +1,184 @@
> +/*
> + * QEMU ARM Xen PV Machine
   ^ PVH


> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to 

Re: [PATCH v4 3/3] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

2023-01-25 Thread Stefan Hajnoczi
On Thu, Jan 19, 2023 at 02:03:08AM -0500, Alexander Bulekov wrote:
> This protects devices from bh->mmio reentrancy issues.
> 
> Signed-off-by: Alexander Bulekov 
> ---
>  hw/9pfs/xen-9p-backend.c| 4 +++-
>  hw/block/dataplane/virtio-blk.c | 3 ++-
>  hw/block/dataplane/xen-block.c  | 5 +++--
>  hw/block/virtio-blk.c   | 5 +++--
>  hw/char/virtio-serial-bus.c | 3 ++-
>  hw/display/qxl.c| 9 ++---
>  hw/display/virtio-gpu.c | 6 --
>  hw/ide/ahci.c   | 3 ++-
>  hw/ide/core.c   | 3 ++-
>  hw/misc/imx_rngc.c  | 6 --
>  hw/misc/macio/mac_dbdma.c   | 2 +-
>  hw/net/virtio-net.c | 3 ++-
>  hw/nvme/ctrl.c  | 6 --
>  hw/scsi/mptsas.c| 3 ++-
>  hw/scsi/scsi-bus.c  | 3 ++-
>  hw/scsi/vmw_pvscsi.c| 3 ++-
>  hw/usb/dev-uas.c| 3 ++-
>  hw/usb/hcd-dwc2.c   | 3 ++-
>  hw/usb/hcd-ehci.c   | 3 ++-
>  hw/usb/hcd-uhci.c   | 2 +-
>  hw/usb/host-libusb.c| 6 --
>  hw/usb/redirect.c   | 6 --
>  hw/usb/xen-usb.c| 3 ++-
>  hw/virtio/virtio-balloon.c  | 5 +++--
>  hw/virtio/virtio-crypto.c   | 3 ++-
>  25 files changed, 66 insertions(+), 35 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH v4 3/3] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

2023-01-25 Thread Stefan Hajnoczi
On Thu, Jan 19, 2023 at 02:03:08AM -0500, Alexander Bulekov wrote:
> This protects devices from bh->mmio reentrancy issues.
> 
> Signed-off-by: Alexander Bulekov 
> ---
>  hw/9pfs/xen-9p-backend.c| 4 +++-
>  hw/block/dataplane/virtio-blk.c | 3 ++-
>  hw/block/dataplane/xen-block.c  | 5 +++--
>  hw/block/virtio-blk.c   | 5 +++--
>  hw/char/virtio-serial-bus.c | 3 ++-
>  hw/display/qxl.c| 9 ++---
>  hw/display/virtio-gpu.c | 6 --
>  hw/ide/ahci.c   | 3 ++-
>  hw/ide/core.c   | 3 ++-
>  hw/misc/imx_rngc.c  | 6 --
>  hw/misc/macio/mac_dbdma.c   | 2 +-
>  hw/net/virtio-net.c | 3 ++-
>  hw/nvme/ctrl.c  | 6 --
>  hw/scsi/mptsas.c| 3 ++-
>  hw/scsi/scsi-bus.c  | 3 ++-
>  hw/scsi/vmw_pvscsi.c| 3 ++-
>  hw/usb/dev-uas.c| 3 ++-
>  hw/usb/hcd-dwc2.c   | 3 ++-
>  hw/usb/hcd-ehci.c   | 3 ++-
>  hw/usb/hcd-uhci.c   | 2 +-
>  hw/usb/host-libusb.c| 6 --
>  hw/usb/redirect.c   | 6 --
>  hw/usb/xen-usb.c| 3 ++-
>  hw/virtio/virtio-balloon.c  | 5 +++--
>  hw/virtio/virtio-crypto.c   | 3 ++-
>  25 files changed, 66 insertions(+), 35 deletions(-)

Should scripts/checkpatch.pl complain when qemu_bh_new() or aio_bh_new()
are called from hw/? Adding a check is important so new instances cannot
be added accidentally in the future.

Stefan




Re: [QEMU][PATCH v4 07/10] hw/xen/xen-hvm-common: Use g_new and error_setg_errno

2023-01-25 Thread Stefano Stabellini
On Wed, 25 Jan 2023, Vikram Garhwal wrote:
> Replace g_malloc with g_new and perror with error_setg_errno.
> 
> Signed-off-by: Vikram Garhwal 
> ---
>  hw/xen/xen-hvm-common.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
> index 94dbbe97ed..01c8ec1956 100644
> --- a/hw/xen/xen-hvm-common.c
> +++ b/hw/xen/xen-hvm-common.c
> @@ -34,7 +34,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, 
> MemoryRegion *mr,
>  trace_xen_ram_alloc(ram_addr, size);
>  
>  nr_pfn = size >> TARGET_PAGE_BITS;
> -pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);
> +pfn_list = g_new(xen_pfn_t, nr_pfn);
>  
>  for (i = 0; i < nr_pfn; i++) {
>  pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
> @@ -726,7 +726,7 @@ void destroy_hvm_domain(bool reboot)
>  return;
>  }
>  if (errno != ENOTTY /* old Xen */) {
> -perror("xendevicemodel_shutdown failed");
> +error_report("xendevicemodel_shutdown failed with error %d", 
> errno);

You can use strerror(errno), here and below.

Either way:

Reviewed-by: Stefano Stabellini 
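
For illustration only, a minimal sketch of what the strerror(errno)
suggestion could look like. It formats into a buffer with snprintf() so it
compiles on its own; in QEMU the resulting text would presumably be passed to
error_report() instead. format_errno() is a hypothetical helper, not
something from the patch.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): build a message of the
 * form "<what> failed: <strerror text> (<errno value>)".
 * Returns the snprintf() result.
 */
static int format_errno(char *buf, size_t len, const char *what, int err)
{
    return snprintf(buf, len, "%s failed: %s (%d)", what, strerror(err), err);
}
```

A caller would save errno right after the failing call, e.g.
format_errno(msg, sizeof(msg), "xen: xenstore open", errno), so that later
library calls cannot overwrite it before the message is built.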



>  }
>  /* well, try the old thing then */
>  }
> @@ -797,7 +797,7 @@ static void xen_do_ioreq_register(XenIOState *state,
>  }
>  
>  /* Note: cpus is empty at this point in init */
> -state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
> +state->cpu_by_vcpu_id = g_new0(CPUState *, max_cpus);
>  
>  rc = xen_set_ioreq_server_state(xen_domid, state->ioservid, true);
>  if (rc < 0) {
> @@ -806,7 +806,7 @@ static void xen_do_ioreq_register(XenIOState *state,
>  goto err;
>  }
>  
> -state->ioreq_local_port = g_malloc0(max_cpus * sizeof (evtchn_port_t));
> +state->ioreq_local_port = g_new0(evtchn_port_t, max_cpus);
>  
>  /* FIXME: how about if we overflow the page here? */
>  for (i = 0; i < max_cpus; i++) {
> @@ -860,13 +860,13 @@ void xen_register_ioreq(XenIOState *state, unsigned int 
> max_cpus,
>  
>  state->xce_handle = xenevtchn_open(NULL, 0);
>  if (state->xce_handle == NULL) {
> -perror("xen: event channel open");
> +error_report("xen: event channel open failed with error %d", errno);
>  goto err;
>  }
>  
>  state->xenstore = xs_daemon_open();
>  if (state->xenstore == NULL) {
> -perror("xen: xenstore open");
> +error_report("xen: xenstore open failed with error %d", errno);
>  goto err;
>  }
>  
> -- 
> 2.17.0
> 
> 



Re: [QEMU][PATCH v4 06/10] hw/xen/xen-hvm-common: skip ioreq creation on ioreq registration failure

2023-01-25 Thread Stefano Stabellini
On Wed, 25 Jan 2023, Vikram Garhwal wrote:
> From: Stefano Stabellini 
> 
> On ARM it is possible to have a functioning xenpv machine with only the
> PV backends and no IOREQ server. If the IOREQ server creation fails continue
> to the PV backends initialization.
> 
> Also, moved the IOREQ registration and mapping subroutine to new function
> xen_do_ioreq_register().
> 
> Signed-off-by: Stefano Stabellini 
> Signed-off-by: Vikram Garhwal 

As per my previous reply: even though I am listed as co-author, for
tracking purposes, I did review this version of the patch:

Reviewed-by: Stefano Stabellini 
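
For readers skimming the diff below, a rough sketch of the control flow the
patch ends up with: try to create the IOREQ server, only warn on failure, and
set up the PV backends in both cases. All fake_* functions and the two flags
are hypothetical stand-ins so the sketch compiles without QEMU or Xen
headers; they are not the real APIs.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the QEMU/Xen calls named in the patch. */
static bool ioreq_registered;
static bool backends_registered;

static int fake_create_ioreq_server(bool available)
{
    /* 0 on success, non-zero on failure, like the patched helper. */
    return available ? 0 : -1;
}

static void fake_do_ioreq_register(void) { ioreq_registered = true; }
static void fake_register_backends(void) { backends_registered = true; }

/* Mirrors the shape of the reworked xen_register_ioreq(). */
static void register_ioreq_sketch(bool ioreq_available)
{
    ioreq_registered = false;
    backends_registered = false;

    if (fake_create_ioreq_server(ioreq_available) == 0) {
        fake_do_ioreq_register();
    } else {
        /* Non-fatal: a PV-only machine can still work. */
        fprintf(stderr, "xen: failed to create ioreq server\n");
    }

    /* PV backends are initialised whether or not the server exists. */
    fake_register_backends();
}
```

The point of the split is that a failure in the IOREQ path no longer jumps
to the fatal err: label before the backends get a chance to register.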


> ---
>  hw/xen/xen-hvm-common.c | 53 -
>  1 file changed, 36 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
> index e748d8d423..94dbbe97ed 100644
> --- a/hw/xen/xen-hvm-common.c
> +++ b/hw/xen/xen-hvm-common.c
> @@ -777,25 +777,12 @@ err:
>  exit(1);
>  }
>  
> -void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
> -MemoryListener xen_memory_listener)
> +static void xen_do_ioreq_register(XenIOState *state,
> +   unsigned int max_cpus,
> +   MemoryListener 
> xen_memory_listener)
>  {
>  int i, rc;
>  
> -state->xce_handle = xenevtchn_open(NULL, 0);
> -if (state->xce_handle == NULL) {
> -perror("xen: event channel open");
> -goto err;
> -}
> -
> -state->xenstore = xs_daemon_open();
> -if (state->xenstore == NULL) {
> -perror("xen: xenstore open");
> -goto err;
> -}
> -
> -xen_create_ioreq_server(xen_domid, &state->ioservid);
> -
>  state->exit.notify = xen_exit_notifier;
>  qemu_add_exit_notifier(&state->exit);
>  
> @@ -859,12 +846,44 @@ void xen_register_ioreq(XenIOState *state, unsigned int 
> max_cpus,
>  QLIST_INIT(&state->dev_list);
>  device_listener_register(&state->device_listener);
>  
> +return;
> +
> +err:
> +error_report("xen hardware virtual machine initialisation failed");
> +exit(1);
> +}
> +
> +void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
> +MemoryListener xen_memory_listener)
> +{
> +int rc;
> +
> +state->xce_handle = xenevtchn_open(NULL, 0);
> +if (state->xce_handle == NULL) {
> +perror("xen: event channel open");
> +goto err;
> +}
> +
> +state->xenstore = xs_daemon_open();
> +if (state->xenstore == NULL) {
> +perror("xen: xenstore open");
> +goto err;
> +}
> +
> +rc = xen_create_ioreq_server(xen_domid, &state->ioservid);
> +if (!rc) {
> +xen_do_ioreq_register(state, max_cpus, xen_memory_listener);
> +} else {
> +warn_report("xen: failed to create ioreq server");
> +}
> +
>  xen_bus_init();
>  
>  xen_register_backend(state);
>  
>  return;
> +
>  err:
> -error_report("xen hardware virtual machine initialisation failed");
> +error_report("xen hardware virtual machine backend registration failed");
>  exit(1);
>  }
> -- 
> 2.17.0
> 
> 



Re: [QEMU][PATCH v4 05/10] include/hw/xen/xen_common: return error from xen_create_ioreq_server

2023-01-25 Thread Stefano Stabellini
On Wed, 25 Jan 2023, Vikram Garhwal wrote:
> From: Stefano Stabellini 
> 
> This is done to prepare for enabling xenpv support for ARM architecture.
> On ARM it is possible to have a functioning xenpv machine with only the
> PV backends and no IOREQ server. If the IOREQ server creation fails,
> continue to the PV backends initialization.
> 
> Signed-off-by: Stefano Stabellini 
> Signed-off-by: Vikram Garhwal 

I know I am co-author of the patch, but for record-keeping, note that
I also reviewed it:

Reviewed-by: Stefano Stabellini 


> ---
>  include/hw/xen/xen_common.h | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
> index 9a13a756ae..9ec69582b3 100644
> --- a/include/hw/xen/xen_common.h
> +++ b/include/hw/xen/xen_common.h
> @@ -467,9 +467,10 @@ static inline void xen_unmap_pcidev(domid_t dom,
>  {
>  }
>  
> -static inline void xen_create_ioreq_server(domid_t dom,
> -   ioservid_t *ioservid)
> +static inline int xen_create_ioreq_server(domid_t dom,
> +  ioservid_t *ioservid)
>  {
> +return 0;
>  }
>  
>  static inline void xen_destroy_ioreq_server(domid_t dom,
> @@ -600,8 +601,8 @@ static inline void xen_unmap_pcidev(domid_t dom,
>PCI_FUNC(pci_dev->devfn));
>  }
>  
> -static inline void xen_create_ioreq_server(domid_t dom,
> -   ioservid_t *ioservid)
> +static inline int xen_create_ioreq_server(domid_t dom,
> +  ioservid_t *ioservid)
>  {
>  int rc = xendevicemodel_create_ioreq_server(xen_dmod, dom,
>  HVM_IOREQSRV_BUFIOREQ_ATOMIC,
> @@ -609,12 +610,14 @@ static inline void xen_create_ioreq_server(domid_t dom,
>  
>  if (rc == 0) {
>  trace_xen_ioreq_server_create(*ioservid);
> -return;
> +return rc;
>  }
>  
>  *ioservid = 0;
>  use_default_ioreq_server = true;
>  trace_xen_default_ioreq_server();
> +
> +return rc;
>  }
>  
>  static inline void xen_destroy_ioreq_server(domid_t dom,
> -- 
> 2.17.0
> 
> 



Re: [QEMU][PATCH v4 04/10] xen-hvm: reorganize xen-hvm and move common function to xen-hvm-common

2023-01-25 Thread Stefano Stabellini
On Wed, 25 Jan 2023, Vikram Garhwal wrote:
> From: Stefano Stabellini 
> 
> This patch does following:
> 1. creates arch_handle_ioreq() and arch_xen_set_memory(). This is done in
> preparation for moving most of xen-hvm code to an arch-neutral location,
> move the x86-specific portion of xen_set_memory to arch_xen_set_memory.
> Also, move handle_vmport_ioreq to arch_handle_ioreq.
> 
> 2. Pure code movement: move common functions to hw/xen/xen-hvm-common.c
> Extract common functionalities from hw/i386/xen/xen-hvm.c and move them to
> hw/xen/xen-hvm-common.c. These common functions are useful for creating
> an IOREQ server.
> 
> xen_hvm_init_pc() contains the architecture independent code for creating
> and mapping a IOREQ server, connecting memory and IO listeners, 
> initializing
> a xen bus and registering backends. Moved this common xen code to a new
> function xen_register_ioreq() which can be used by both x86 and ARM 
> machines.
> 
> Following functions are moved to hw/xen/xen-hvm-common.c:
> xen_vcpu_eport(), xen_vcpu_ioreq(), xen_ram_alloc(), xen_set_memory(),
> xen_region_add(), xen_region_del(), xen_io_add(), xen_io_del(),
> xen_device_realize(), xen_device_unrealize(),
> cpu_get_ioreq_from_shared_memory(), cpu_get_ioreq(), do_inp(),
> do_outp(), rw_phys_req_item(), read_phys_req_item(),
> write_phys_req_item(), cpu_ioreq_pio(), cpu_ioreq_move(),
> cpu_ioreq_config(), handle_ioreq(), handle_buffered_iopage(),
> handle_buffered_io(), cpu_handle_ioreq(), xen_main_loop_prepare(),
> xen_hvm_change_state_handler(), xen_exit_notifier(),
> xen_map_ioreq_server(), destroy_hvm_domain() and
> xen_shutdown_fatal_error()
> 
> 3. Removed static type from below functions:
> 1. xen_region_add()
> 2. xen_region_del()
> 3. xen_io_add()
> 4. xen_io_del()
> 5. xen_device_realize()
> 6. xen_device_unrealize()
> 7. xen_hvm_change_state_handler()
> 8. cpu_ioreq_pio()
> 9. xen_exit_notifier()
> 
> 4. Replace TARGET_PAGE_SIZE with XC_PAGE_SIZE to match the page size with Xen.
> 
> Signed-off-by: Vikram Garhwal 
> Signed-off-by: Stefano Stabellini 

One comment below

[...]

> +void xen_exit_notifier(Notifier *n, void *data)
> +{
> +XenIOState *state = container_of(n, XenIOState, exit);
> +
> +xen_destroy_ioreq_server(xen_domid, state->ioservid);

In the original code we had:

-if (state->fres != NULL) {
-xenforeignmemory_unmap_resource(xen_fmem, state->fres);
-}

Should we add it here?


I went through the manual process of comparing all the code additions
and deletions (not fun!) and everything checks out except for this.
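
To make the request concrete, a small sketch of the teardown order with the
guarded unmap restored. The struct and the fake_unmap_resource() helper are
hypothetical, trimmed-down stand-ins (the real XenIOState has many more
fields, and the real call is xenforeignmemory_unmap_resource()); the counters
exist only so the behaviour can be checked.

```c
#include <stddef.h>

/* Hypothetical, reduced state; not the real XenIOState. */
struct io_state_sketch {
    void *fres;        /* mapped ioreq resource, NULL if never mapped */
    int unmap_calls;   /* how often the unmap stand-in ran */
    int close_calls;   /* stands in for the two close calls */
};

static void fake_unmap_resource(struct io_state_sketch *s)
{
    s->unmap_calls++;
    s->fres = NULL;
}

static void exit_notifier_sketch(struct io_state_sketch *s)
{
    /* The ioreq server would be destroyed here first. */

    /* Only unmap when a resource was actually mapped. */
    if (s->fres != NULL) {
        fake_unmap_resource(s);
    }

    /* Then close the event channel and xenstore handles. */
    s->close_calls++;
}
```

The NULL check matters because the resource mapping may never have been set
up, in which case there is nothing to unmap at exit.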


> +xenevtchn_close(state->xce_handle);
> +xs_daemon_close(state->xenstore);
> +}



[xen-unstable bisection] complete test-amd64-i386-pair

2023-01-25 Thread osstest service owner
branch xen-unstable
xenbranch xen-unstable
job test-amd64-i386-pair
testid guest-migrate/src_host/dst_host

Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  xen git://xenbits.xen.org/xen.git
  Bug introduced:  1894049fa283308d5f90446370be1ade7afe8975
  Bug not present: 20279afd732371dd2534380d27aa6d1863d82d1f
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/176130/


  commit 1894049fa283308d5f90446370be1ade7afe8975
  Author: Jan Beulich 
  Date:   Fri Jan 20 09:17:33 2023 +0100
  
  x86/shadow: L2H shadow type is PV32-only
  
  Like for the various HVM-only types, save a little bit of code by suitably
  "masking" this type out when !PV32.
  
  Signed-off-by: Jan Beulich 
  Acked-by: Andrew Cooper 


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/xen-unstable/test-amd64-i386-pair.guest-migrate--src_host--dst_host.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/xen-unstable/test-amd64-i386-pair.guest-migrate--src_host--dst_host
 --summary-out=tmp/176130.bisection-summary --basis-template=175994 
--blessings=real,real-bisect,real-retry xen-unstable test-amd64-i386-pair 
guest-migrate/src_host/dst_host
Searching for failure / basis pass:
 176121 fail [dst_host=elbling1,src_host=elbling0] / 175994 
[dst_host=nocera0,src_host=nocera1] 175987 
[dst_host=huxelrebe1,src_host=huxelrebe0] 175965 
[dst_host=albana0,src_host=albana1] 175734 [dst_host=italia0,src_host=italia1] 
175726 [dst_host=nocera1,src_host=nocera0] 175714 
[dst_host=albana1,src_host=albana0] 175694 
[dst_host=huxelrebe0,src_host=huxelrebe1] 175671 
[dst_host=elbling0,src_host=elbling1] 175651 
[dst_host=debina0,src_host=debina1] 175635 [dst_host=italia1,src_host=italia0] 
175624 [dst_host=fiano0,src_host=fiano1] 175612 [dst_host=pinot1,src_host=pinot0] 
175601 ok.
Failure / basis pass flights: 176121 / 175601
(tree with no url: minios)
(tree with no url: ovmf)
(tree with no url: seabios)
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
Latest c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
625eb5e96dc96aa7fddef59a08edae215527f19c 
3b760245f74ab2022b1aa4da842c4545228c2e83
Basis pass c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
1cf02b05b27c48775a25699e61b93b814b9ae042 
2b21cbbb339fb14414f357a6683b1df74c36fda2
Generating revisions with ./adhoc-revtuple-generator  
git://xenbits.xen.org/linux-pvops.git#c3038e718a19fc596f7b1baba0f83d5146dc7784-c3038e718a19fc596f7b1baba0f83d5146dc7784
 
git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 
git://xenbits.xen.org/qemu-xen-traditional.git#3d273dd05e51e5a1ffba3d98c7437ee84e8f8764-3d273dd05e51e5a1ffba3d98c7437ee84e8f8764
 git://xenbits.xen.org/qemu-xen.git#1cf02b05b27c48775a25699e61b93b814b9ae042-625eb5e96dc96aa7fddef59a08edae215527f19c 
git://xenbits.xen.org/xen.git#2b21cbbb339fb14414f357a6683b1df74c36fda2-3b760245f74ab2022b1aa4da842c4545228c2e83
Loaded 10003 nodes in revision graph
Searching for test results:
 175592 [dst_host=nobling1,src_host=nobling0]
 175601 pass c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
1cf02b05b27c48775a25699e61b93b814b9ae042 
2b21cbbb339fb14414f357a6683b1df74c36fda2
 175612 [dst_host=pinot1,src_host=pinot0]
 175624 [dst_host=fiano0,src_host=fiano1]
 175635 [dst_host=italia1,src_host=italia0]
 175651 [dst_host=debina0,src_host=debina1]
 175671 [dst_host=elbling0,src_host=elbling1]
 175694 [dst_host=huxelrebe0,src_host=huxelrebe1]
 175714 [dst_host=albana1,src_host=albana0]
 175720 [dst_host=nocera1,src_host=nocera0]
 175726 [dst_host=nocera1,src_host=nocera0]
 175734 [dst_host=italia0,src_host=italia1]
 175834 []
 175861 []
 175890 []
 175907 []
 175931 []
 175956 []
 175965 [dst_host=albana0,src_host=albana1]
 175987 [dst_host=huxelrebe1,src_host=huxelrebe0]
 175994 [dst_host=nocera0,src_host=nocera1]
 176003 fail c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
625eb5e96dc96aa7fddef59a08edae215527f19c 
89cc5d96a9d1fce81cf58b6814dac62a9e07fbee
 176011 fail 

[xen-unstable test] 176121: regressions - FAIL

2023-01-25 Thread osstest service owner
flight 176121 xen-unstable real [real]
flight 176129 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/176121/
http://logs.test-lab.xenproject.org/osstest/logs/176129/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-coresched-i386-xl 18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 7 xen-install fail REGR. vs. 175994
 test-amd64-i386-xl-xsm   18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-xl   18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-pair  26 guest-migrate/src_host/dst_host fail REGR. vs. 175994
 test-amd64-i386-xl-vhd   17 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-xl-shadow18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-libvirt-pair 26 guest-migrate/src_host/dst_host fail REGR. vs. 
175994

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 12 debian-hvm-install fail pass in 
176129-retest
 test-amd64-i386-freebsd10-amd64 19 guest-localmigrate/x10 fail pass in 
176129-retest
 test-amd64-amd64-xl-qcow2 21 guest-start/debian.repeat fail pass in 
176129-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 175987
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 175987
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 175987
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 175994
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 175994
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 175994
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 175994
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 175994
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 175994
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 175994
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 175994
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 175994
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 

Re: [PATCH v3 1/4] x86/spec-ctrl: add logic to issue IBPB on exit to guest

2023-01-25 Thread Andrew Cooper
On 25/01/2023 3:25 pm, Jan Beulich wrote:
> In order to be able to defer the context switch IBPB to the last
> possible point, add logic to the exit-to-guest paths to issue the
> barrier there, including the "IBPB doesn't flush the RSB/RAS"
> workaround. Since alternatives, for now at least, can't nest, emit JMP
> to skip past both constructs where both are needed. This may be more
> efficient anyway, as the sequence of NOPs is pretty long.

It is very uarch specific as to when a jump is less overhead than a line
of nops.

In all CPUs liable to be running Xen, even unconditional jumps take up
branch prediction resource, because all branch prediction is pre-decode
these days, so branch locations/types/destinations all need deriving
from %rip and "history" alone.

So whether a branch or a line of nops is better is a tradeoff between
how much competition there is for branch prediction resource, and how
efficiently the CPU can brute-force its way through a long line of nops.

But a very interesting datapoint.  It turns out that AMD Zen4 CPUs
macrofuse adjacent nops, including longnops, because it reduces the
amount of execute/retire resources required.  And a lot of
kernel/hypervisor fastpaths have a lot of nops these days.


For us, the "can't nest" is singularly more important than any worry
about uarch behaviour.  We've frankly got much lower hanging fruit than
worrying about one branch vs a few nops.

> LFENCEs are omitted - for HVM a VM entry is imminent, which already
> elsewhere we deem sufficiently serializing an event. For 32-bit PV
> we're going through IRET, which ought to be good enough as well. While
> 64-bit PV may use SYSRET, there are several more conditional branches
> there which are all unprotected.

Privilege changes are serialising-ish, and this behaviour has been
guaranteed moving forwards, although not documented coherently.

CPL (well - privilege, which includes SMM, root/non-root, etc) is not
written speculatively.  So any logic which needs to modify privilege has
to block until it is known to be an architectural execution path.

This gets us "lfence-like" or "dispatch serialising" behaviour, which is
also the reason why INT3 is our go-to speculation halting instruction. 
Microcode has to be entirely certain we are going to deliver an
interrupt/exception/etc before it can start reading the IDT/etc.

Either way, we've been promised that all instructions like IRET,
SYS{CALL,RET,ENTER,EXIT}, VM{RUN,LAUNCH,RESUME} (and ERET{U,S} in the
future FRED world) do not, and shall continue to not, execute speculatively.

Which in practice means we don't need to worry about Spectre-v1 attacks
against codepaths which hit an exit-from-xen path, in terms of skipping
protections.

We do need to be careful about memory accesses and potential double
dereferences, but all the data is on the top of the stack for XPTI
reasons.  About the only concern is v->arch.msrs->* in the HVM path, and
we're fine with the current layout (AFAICT).

>
> Signed-off-by: Jan Beulich 
> ---
> I have to admit that I'm not really certain about the placement of the
> IBPB wrt the MSR_SPEC_CTRL writes. For now I've simply used "opposite of
> entry".

It really doesn't matter.  They're independent operations that both need
doing, and are fully serialising so can't parallelise.

But on this note, WRMSRNS and WRMSRLIST are on the horizon.  The CPUs
which implement these instructions are the ones which also ought not to
need any adjustments in the exit paths.  So I think it is specifically
not worth trying to make any effort to turn *these* WRMSR's into more
optimised forms.

But WRMSRLIST was designed specifically for this kind of usecase
(actually, more for the main context switch path) where you can prepare
the list of MSRs in memory, including the ability to conditionally skip
certain entries by adjusting the index field.


It occurs to me, having written this out, that what we actually want
to do is have slightly custom not-quite-alternative blocks.  We have a
sequence of independent code blocks, and a small block at the end that
happens to contain an IRET.

We could remove the nops at boot time if we treated it as one large
region, with the IRET at the end also able to have a variable position,
and assembled the "active" blocks tightly from the start.  Complications
would include adjusting the IRET extable entry, but this isn't
insurmountable.  Entrypoints are a bit more tricky but could be done by
packing from the back forward, and adjusting the entry position.
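
To make the packing idea a bit more concrete, here is a rough sketch in
plain C which works on a byte buffer rather than on real instruction
memory.  Everything in it (struct alt_block, pack_blocks(), the layout of
the region) is hypothetical and only illustrates "assemble the active
blocks tightly from the start and let the tail move"; it is not how the
alternatives framework works today.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP 0x90 /* single-byte x86 NOP, used only as filler here */

/* Hypothetical descriptor for one optional block inside the region. */
struct alt_block {
    size_t off;     /* offset of the block within the region */
    size_t len;     /* length of the block in bytes */
    int    active;  /* nonzero if this block is needed on this CPU */
};

/*
 * Pack the active blocks tightly from the start of the region, then place
 * the tail (the part containing the IRET) directly behind them.  Blocks
 * must be sorted by offset and must not overlap; the tail occupies
 * [tail_off, region_len).  Returns the new offset of the tail, which is
 * the value an extable adjustment would need.
 */
static size_t pack_blocks(uint8_t *region, size_t region_len,
                          const struct alt_block *blk, size_t nr,
                          size_t tail_off)
{
    size_t dst = 0;

    for ( size_t i = 0; i < nr; i++ )
    {
        if ( !blk[i].active )
            continue;

        /* dst never exceeds blk[i].off, so moving towards the front is safe. */
        memmove(region + dst, region + blk[i].off, blk[i].len);
        dst += blk[i].len;
    }

    size_t new_tail = dst;
    size_t tail_len = region_len - tail_off;

    memmove(region + dst, region + tail_off, tail_len);
    dst += tail_len;

    /* Whatever is left now sits behind the IRET and is never executed. */
    memset(region + dst, NOP, region_len - dst);

    return new_tail;
}
```

Packing entrypoints from the back forward would be the mirror image of
this, returning the new entry offset instead of the new tail offset.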

Either way, something to ponder.  (It's also possible that it doesn't
make a measurable difference until we get to FRED, at which point we
have a set of fresh entry-points to write anyway, and a slight glimmer
of hope of not needing to pollute them with speculation workarounds...)

> Since we're going to run out of SCF_* bits soon and since the new flag
> is meaningful only in struct cpu_info's spec_ctrl_flags, we could choose
> to widen that field to 16 bits right away and 

Re: [XEN v5] xen/arm: Use the correct format specifier

2023-01-25 Thread Stefano Stabellini
On Wed, 25 Jan 2023, Ayan Kumar Halder wrote:
> 1. One should use 'PRIpaddr' to display 'paddr_t' variables. However,
> while creating nodes in fdt, the address (if present in the node name)
> should be represented using 'PRIx64'. This is to be in conformance
> with the following rule present in https://elinux.org/Device_Tree_Linux
> 
> . node names
> "unit-address does not have leading zeros"
> 
> As 'PRIpaddr' introduces leading zeros, we cannot use it.
> 
> So, we have introduced a wrapper ie domain_fdt_begin_node() which will
> represent physical address using 'PRIx64'.
> 
> 2. One should use 'PRIx64' to display 'u64' in hex format. The current
> use of 'PRIpaddr' for printing PTE is buggy as this is not a physical
> address.
> 
> Signed-off-by: Ayan Kumar Halder 


Reviewed-by: Stefano Stabellini 

(I checked that Ayan also addressed Julien's latest comments.)
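
For anyone following along, the node-name rule can be tried outside of
Xen with a few lines of C.  This is only a sketch: format_node_name() is
a hypothetical standalone stand-in for domain_fdt_begin_node(), it
returns -1 where the patch returns -FDT_ERR_TRUNCATED, and it does not
call fdt_begin_node() at all.

```c
#include <inttypes.h>
#include <stdio.h>

/* "interrupt-controller@" (21) + 16 hex digits + NUL = 38, as in the patch. */
#define NODE_NAME_MAX 38

/*
 * Hypothetical stand-in for domain_fdt_begin_node(): formats
 * "<name>@<unit>" with PRIx64, i.e. without leading zeros.
 * Returns 0 on success, -1 if the result would not fit in buf.
 */
static int format_node_name(char *buf, size_t size,
                            const char *name, uint64_t unit)
{
    int ret = snprintf(buf, size, "%s@%" PRIx64, name, unit);

    /* snprintf() reports the length it wanted, so >= size means truncation. */
    if ( ret < 0 || (size_t)ret >= size )
        return -1;

    return 0;
}
```

With a 38-byte buffer, "memory" and 0x40000000 give "memory@40000000",
whereas a zero-padded format such as "%016" PRIx64 would give
"memory@0000000040000000", which is the form the unit-address rule
quoted in the commit message forbids.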


> ---
> Changes from -
> 
> v1 - 1. Moved the patch earlier.
> 2. Moved a part of change from "[XEN v1 8/9] xen/arm: Other adaptations required to support 32bit paddr"
> into this patch.
> 
> v2 - 1. Use PRIx64 for appending addresses to fdt node names. This fixes the CI failure.
> 
> v3 - 1. Added a comment on top of domain_fdt_begin_node().
> 2. Check for the return of snprintf() in domain_fdt_begin_node().
> 
> v4 - 1. Grammatical error fixes.
> 
>  xen/arch/arm/domain_build.c | 64 +++--
>  xen/arch/arm/gic-v2.c   |  6 ++--
>  xen/arch/arm/mm.c   |  2 +-
>  3 files changed, 44 insertions(+), 28 deletions(-)
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index c2b97fa21e..a798e0b256 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -1288,6 +1288,39 @@ static int __init fdt_property_interrupts(const struct 
> kernel_info *kinfo,
>  return res;
>  }
>  
> +/*
> + * Wrapper to convert physical address from paddr_t to uint64_t and
> + * invoke fdt_begin_node(). This is required as the physical address
> + * provided as part of node name should not contain any leading
> + * zeroes. Thus, one should use PRIx64 (instead of PRIpaddr) to append
> + * unit (which contains the physical address) with name to generate a
> + * node name.
> + */
> +static int __init domain_fdt_begin_node(void *fdt, const char *name,
> +uint64_t unit)
> +{
> +/*
> + * The size of the buffer to hold the longest possible string (i.e.
> + * interrupt-controller@ + a 64-bit number + \0).
> + */
> +char buf[38];
> +int ret;
> +
> +/* ePAPR 3.4 */
> +ret = snprintf(buf, sizeof(buf), "%s@%"PRIx64, name, unit);
> +
> +if ( ret >= sizeof(buf) )
> +{
> +printk(XENLOG_ERR
> +   "Insufficient buffer. Minimum size required is %d\n",
> +   (ret + 1));
> +
> +return -FDT_ERR_TRUNCATED;
> +}
> +
> +return fdt_begin_node(fdt, buf);
> +}
> +
>  static int __init make_memory_node(const struct domain *d,
> void *fdt,
> int addrcells, int sizecells,
> @@ -1296,8 +1329,6 @@ static int __init make_memory_node(const struct domain 
> *d,
>  unsigned int i;
>  int res, reg_size = addrcells + sizecells;
>  int nr_cells = 0;
> -/* Placeholder for memory@ + a 64-bit number + \0 */
> -char buf[24];
>  __be32 reg[NR_MEM_BANKS * 4 /* Worst case addrcells + sizecells */];
>  __be32 *cells;
>  
> @@ -1314,9 +1345,7 @@ static int __init make_memory_node(const struct domain 
> *d,
>  
>  dt_dprintk("Create memory node\n");
>  
> -/* ePAPR 3.4 */
> -snprintf(buf, sizeof(buf), "memory@%"PRIx64, mem->bank[i].start);
> -res = fdt_begin_node(fdt, buf);
> +res = domain_fdt_begin_node(fdt, "memory", mem->bank[i].start);
>  if ( res )
>  return res;
>  
> @@ -1375,16 +1404,13 @@ static int __init make_shm_memory_node(const struct 
> domain *d,
>  {
>  uint64_t start = mem->bank[i].start;
>  uint64_t size = mem->bank[i].size;
> -/* Placeholder for xen-shmem@ + a 64-bit number + \0 */
> -char buf[27];
>  const char compat[] = "xen,shared-memory-v1";
>  /* Worst case addrcells + sizecells */
>  __be32 reg[GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS];
>  __be32 *cells;
>  unsigned int len = (addrcells + sizecells) * sizeof(__be32);
>  
> -snprintf(buf, sizeof(buf), "xen-shmem@%"PRIx64, mem->bank[i].start);
> -res = fdt_begin_node(fdt, buf);
> +res = domain_fdt_begin_node(fdt, "xen-shmem", mem->bank[i].start);
>  if ( res )
>  return res;
>  
> @@ -2716,12 +2742,9 @@ static int __init make_gicv2_domU_node(struct 
> kernel_info *kinfo)
>  __be32 reg[(GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) * 2];
>  __be32 *cells;
>  const struct domain *d = kinfo->d;
> -/* Placeholder for interrupt-controller@ + a 

Re: [XEN v6] xen/arm: Probe the load/entry point address of an uImage correctly

2023-01-25 Thread Stefano Stabellini
On Wed, 25 Jan 2023, Ayan Kumar Halder wrote:
> Currently, kernel_uimage_probe() does not read the load/entry point address
> set in the uImage header. Thus, info->zimage.start is 0 (default value). This
> causes kernel_zimage_place() to treat the binary (contained within uImage)
> as position independent executable. Thus, it loads it at an incorrect
> address.
> 
> The correct approach would be to read "uimage.load" and set
> info->zimage.start. This will ensure that the binary is loaded at the
> correct address. Also, read "uimage.ep" and set info->entry (ie kernel entry
> address).
> 
> If user provides load address (ie "uimage.load") as 0x0, then the image is
> treated as position independent executable. Xen can load such an image at
> any address it considers appropriate. A position independent executable
> cannot have a fixed entry point address.
> 
> This behavior is applicable for both arm32 and arm64 platforms.
> 
> Earlier for arm32 and arm64 platforms, Xen was ignoring the load and entry
> point address set in the uImage header. With this commit, Xen will use them.
> This makes the behavior of Xen consistent with uboot for uimage headers.
> 
> Users who want to use Xen with statically partitioned domains, can provide
> non zero load address and entry address for the dom0/domU kernel. It is
> required that the load and entry address provided must be within the memory
> region allocated by Xen.
> 
> A deviation from uboot behaviour is that we consider load address == 0x0,
> to denote that the image supports position independent execution. This
> is to make the behavior consistent across uImage and zImage.
>
> Signed-off-by: Ayan Kumar Halder 
> ---
> 
> Changes from v1 :-
> 1. Added a check to ensure load address and entry address are the same.
> 2. Considered load address == 0x0 as position independent execution.
> 3. Ensured that the uImage header interpretation is consistent across
> arm32 and arm64.
> 
> v2 :-
> 1. Mentioned the change in existing behavior in booting.txt.
> 2. Updated booting.txt with a new section to document "Booting Guests".
> 
> v3 :-
> 1. Removed the constraint that the entry point should be same as the load
> address. Thus, Xen uses both the load address and entry point to determine
> where the image is to be copied and the start address.
> 2. Updated documentation to denote that load address and start address
> should be within the memory region allocated by Xen.
> 3. Added constraint that user cannot provide entry point for a position
> independent executable (PIE) image.
> 
> v4 :-
> 1. Explicitly mentioned the version in booting.txt from when the uImage
> probing behavior has changed.
> 2. Logged the requested load address and entry point parsed from the uImage
> header.
> 3. Some style issues.
> 
> v5 :-
> 1. Set info->zimage.text_offset = 0 in kernel_uimage_probe().
> 2. Mention that if the kernel has a legacy image header on top of
> zImage/zImage64 header, then the attributes from legacy image header are
> used to determine the load address, entry point, etc. Thus, zImage/zImage64
> header is effectively ignored.
> 
> This is true because Xen currently does not support recursive probing of
> kernel headers ie if uImage header is probed, then Xen will not attempt to
> see if there is an underlying zImage/zImage64 header.
> 
>  docs/misc/arm/booting.txt | 30 
>  xen/arch/arm/include/asm/kernel.h |  2 +-
>  xen/arch/arm/kernel.c | 58 +--
>  3 files changed, 86 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/misc/arm/booting.txt b/docs/misc/arm/booting.txt
> index 3e0c03e065..1837579aef 100644
> --- a/docs/misc/arm/booting.txt
> +++ b/docs/misc/arm/booting.txt
> @@ -23,6 +23,32 @@ The exceptions to this on 32-bit ARM are as follows:
>  
>  There are no exception on 64-bit ARM.
>  
> +Booting Guests
> +--
> +
> +Xen supports the legacy image header[3], zImage protocol for 32-bit
> +ARM Linux[1] and Image protocol defined for ARM64[2].
> +
> +Until Xen 4.17, in case of legacy image protocol, Xen ignored the load
> +address and entry point specified in the header. This has now changed.
> +
> +Now, it loads the image at the load address provided in the header.
> +And the entry point is used as the kernel start address.
> +
> +A deviation from uboot is that, Xen treats "load address == 0x0" as
> +position independent execution (PIE). Thus, Xen will load such an image
> +at an address it considers appropriate. Also, user cannot specify the
> +entry point of a PIE image since the start address cennot be
> +predetermined.
> +
> +Users who want to use Xen with statically partitioned domains, can provide
> +the fixed non zero load address and start address for the dom0/domU kernel.
> +The load address and start address specified by the user in the header must
> +be within the memory region allocated by Xen.
> +
> +Also, it is to be noted that if user provides the legacy image header on 

[PATCH] Add more rules to docs/misra/rules.rst

2023-01-25 Thread Stefano Stabellini
From: Stefano Stabellini 

As agreed during the last MISRA C discussion, I am adding the following
MISRA C rules: 7.1, 7.3, 18.3.

I am also adding 13.1 and 18.2 that were "agreed pending an analysis on
the amount of violations".

In the case of 13.1 there are zero violations reported by cppcheck.

In the case of 18.2, there are zero violations reported by cppcheck
after deviating the linker symbols, as discussed.

Signed-off-by: Stefano Stabellini 
---
 docs/misra/rules.rst | 25 +
 1 file changed, 25 insertions(+)

diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
index dcceab9388..1da79f33c1 100644
--- a/docs/misra/rules.rst
+++ b/docs/misra/rules.rst
@@ -138,6 +138,16 @@ existing codebase are work-in-progress.
  - Single-bit named bit fields shall not be of a signed type
  -
 
+   * - `Rule 7.1 
`_
+ - Required
+ - Octal constants shall not be used
+ -
+
+   * - `Rule 7.3 
`_
+ - Required
+ - The lowercase character l shall not be used in a literal suffix
+ -
+
* - `Rule 8.1 
`_
  - Required
  - Types shall be explicitly specified
@@ -200,6 +210,11 @@ existing codebase are work-in-progress.
expression which has potential side effects
  -
 
+   * - `Rule 13.1 
`_
+ - Required
+ - Initializer lists shall not contain persistent side effects
+ -
+
* - `Rule 14.1 
`_
  - Required
  - A loop counter shall not have essentially floating type
@@ -227,6 +242,16 @@ existing codebase are work-in-progress.
static keyword between the [ ]
  -
 
+   * - `Rule 18.2 
`_
+ - Required
+ - Subtraction between pointers shall only be applied to pointers that address elements of the same array
+ -
+
+   * - `Rule 18.3 
`_
+ - Required
+ - The relational operators > >= < and <= shall not be applied to objects of pointer type except where they point into the same object
+ -
+
* - `Rule 19.1 
`_
  - Mandatory
  - An object shall not be assigned or copied to an overlapping
-- 
2.25.1
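
As a quick illustration of what three of the added rules are about,
here is a small C sketch.  The identifiers are made up for the example
and do not come from the Xen tree; the "violates" lines are there on
purpose.

```c
/* Rule 7.1: a leading zero makes the constant octal, so this is 42, not 52. */
static const int octal_looks_like_52 = 052;   /* violates 7.1 */
static const int decimal_42 = 42;             /* compliant */

/* Rule 7.3: a lowercase 'l' suffix is easily misread as the digit 1. */
static const long easy_to_misread = 10l;      /* violates 7.3: reads like 101 */
static const long clear_suffix = 10L;         /* compliant */

/*
 * Rule 18.2: pointer subtraction is only meaningful when both pointers
 * address elements of the same array.  Here elem must point into arr.
 */
static int arr[8];

static long index_of(const int *elem)
{
    return elem - arr;
}
```

Subtracting pointers to two unrelated objects, such as a pair of linker
symbols, is the case 18.2 forbids, hence the deviation mentioned above.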




Re: [XEN PATCH v2 0/3] Configure qemu upstream correctly by default for igd-passthru

2023-01-25 Thread Chuck Zmudzinski
On 1/25/2023 6:37 AM, Anthony PERARD wrote:
> On Tue, Jan 10, 2023 at 02:32:01AM -0500, Chuck Zmudzinski wrote:
> > I call attention to the commit message of the first patch which points
> > out that using the "pc" machine and adding the xen platform device on
> > the qemu upstream command line is not functionally equivalent to using
> > the "xenfv" machine which automatically adds the xen platform device
> > earlier in the guest creation process. As a result, there is a noticeable
> > reduction in the performance of the guest during startup with the "pc"
> > machine type even if the xen platform device is added via the qemu
> > command line options, although eventually both Linux and Windows guests
> > perform equally well once the guest operating system is fully loaded.
>
> There shouldn't be a difference between "xenfv" machine or using the
> "pc" machine while adding the "xen-platform" device, at least with
> regards to access to disk or network.
>
> The first patch of the series is using the "pc" machine without any
> "xen-platform" device, so we can't compare startup performance based on
> that.
>
> > Specifically, startup time is longer and neither the grub vga drivers
> > nor the windows vga drivers in early startup perform as well when the
> > xen platform device is added via the qemu command line instead of being
> > added immediately after the other emulated i440fx pci devices when the
> > "xenfv" machine type is used.
>
> The "xen-platform" device is mostly a hint to a guest that they can use
> pv-disk and pv-network devices. I don't think it would change anything
> with regards to graphics.
>
> > For example, when using the "pc" machine, which adds the xen platform
> > device using a command line option, the Linux guest could not display
> > the grub boot menu at the native resolution of the monitor, but with the
> > "xenfv" machine, the grub menu is displayed at the full 1920x1080
> > native resolution of the monitor for testing. So improved startup
> > performance is an advantage for the patch for qemu.
>
> I've just found out that when doing IGD passthrough, both machine
> "xenfv" and "pc" are much more different than I thought ... :-(
> pc_xen_hvm_init_pci() in QEMU changes the pci-host device, which in
> turn copies some information from the real host bridge.
> I guess this new host bridge helps when the firmware sets up the graphics
> for grub.
>
> > I also call attention to the last point of the commit message of the
> > second patch and the comments for reviewers section of the second patch.
> > This approach, as opposed to fixing this in qemu upstream, makes
> > maintaining the code in libxl__build_device_model_args_new more
> > difficult and therefore increases the chances of problems caused by
> > coding errors and typos for users of libxl. So that is another advantage
> > of the patch for qemu.
>
> We would just need to use a different approach in libxl when generating
> the command line. We could probably avoid duplications. I was hoping to
> have patch series for libxl that would change the machine used to start
> using "pc" instead of "xenfv" for all configurations, but based on the
> point above (IGD specific change to "xenfv"), then I guess we can't
> really do anything from libxl to fix IGD passthrough.
>
> > OTOH, fixing this in qemu causes newer qemu versions to behave
> > differently than previous versions of qemu, which the qemu community
> > does not like, although they seem OK with the other patch since it only
> > affects qemu "xenfv" machine types, but they do not want the patch to
> > affect toolstacks like libvirt that do not use qemu upstream's
> > autoconfiguration options as much as libxl does, and, of course, libvirt
> > can manage qemu "xenfv" machines so existing "xenfv" guests configured
> > manually by libvirt could be adversely affected by the patch to qemu,
> > but only if those same guests are also configured for igd-passthrough,
> > which is likely a very small number of possibly affected libvirt users
> > of qemu.
> > 
> > A year or two ago I tried to configure guests for pci passthrough on xen
> > using libvirt's tool to convert a libxl xl.cfg file to libvirt xml. It
> > could not convert an xl.cfg file with a configuration item
> > pci = [ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...] for pci passthrough.
> > So it is unlikely there are any users out there using libvirt to
> > configure xen hvm guests for igd passthrough on xen, and those are the
> > only users that could be adversely affected by the simpler patch to qemu
> > to fix this.
>
> FYI, libvirt should be using libxl to create guests, I don't think there
> is another way for libvirt to create xen guests.

I have had success using libvirt as a frontend to libxl for most of my xen guests,
except for HVM guests that have pci devices passed through because the
tool to convert an xl.cfg file to libvirt xml was not able to convert the
pci = ... line in xl.cfg. Perhaps newer versions of libvirt can do it (I 

[PATCH v2] xen/x86: public: add TSC defines for cpuid leaf 4

2023-01-25 Thread Krister Johansen
Cpuid leaf 4 contains information about the state of the tsc, its
mode, and some additional information.  A commit that is queued for
linux would like to use this to determine whether the tsc mode has been
set to 'no emulation' in order to make some decisions about which
clocksource is more reliable.

Expose this information in the public API headers so that they can
subsequently be imported into linux and used there.

Link: https://lore.kernel.org/xen-devel/eda8d9f2-3013-1b68-0df8-64d7f13ee...@suse.com/
Link: https://lore.kernel.org/xen-devel/0835453d-9617-48d5-b2dc-77a2ac298...@oracle.com/
Signed-off-by: Krister Johansen 
---
v2.1:
  - Correct In-Reply-To header for proper threading
v2:
  - Fix whitespace between comment and #defines (feedback from Jan Beulich)
  - Add tsc mode 3: no emulate TSC_AUX (feedback from Jan Beulich)
---
 xen/include/public/arch-x86/cpuid.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/xen/include/public/arch-x86/cpuid.h b/xen/include/public/arch-x86/cpuid.h
index 7ecd16ae05..090f7f0034 100644
--- a/xen/include/public/arch-x86/cpuid.h
+++ b/xen/include/public/arch-x86/cpuid.h
@@ -72,6 +72,14 @@
  * Sub-leaf 2: EAX: host tsc frequency in kHz
  */
 
+#define XEN_CPUID_TSC_EMULATED   (1u << 0)
+#define XEN_CPUID_HOST_TSC_RELIABLE  (1u << 1)
+#define XEN_CPUID_RDTSCP_INSTR_AVAIL (1u << 2)
+#define XEN_CPUID_TSC_MODE_DEFAULT   (0)
+#define XEN_CPUID_TSC_MODE_EMULATE   (1u)
+#define XEN_CPUID_TSC_MODE_NOEMULATE (2u)
+#define XEN_CPUID_TSC_MODE_NOEMULATE_TSC_AUX (3u)
+
 /*
  * Leaf 5 (0x4x04)
  * HVM-specific features
-- 
2.25.1
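
For reference, a consumer of these defines might decode the leaf roughly
as below.  This is a sketch only: struct xen_tsc_info and both helpers
are hypothetical, and it assumes the flag bits arrive in EAX and the tsc
mode in EBX of sub-leaf 0, which should be checked against the comment
block in cpuid.h.  The #defines are repeated so the fragment compiles on
its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patch above. */
#define XEN_CPUID_TSC_EMULATED               (1u << 0)
#define XEN_CPUID_HOST_TSC_RELIABLE          (1u << 1)
#define XEN_CPUID_RDTSCP_INSTR_AVAIL         (1u << 2)
#define XEN_CPUID_TSC_MODE_DEFAULT           (0)
#define XEN_CPUID_TSC_MODE_EMULATE           (1u)
#define XEN_CPUID_TSC_MODE_NOEMULATE         (2u)
#define XEN_CPUID_TSC_MODE_NOEMULATE_TSC_AUX (3u)

/* Hypothetical container for the decoded fields. */
struct xen_tsc_info {
    bool     emulated;       /* TSC accesses are emulated */
    bool     host_reliable;  /* host TSC is known to be reliable */
    bool     rdtscp;         /* RDTSCP instruction is available */
    uint32_t mode;           /* one of XEN_CPUID_TSC_MODE_* */
};

/* Assumption: eax carries the flag bits, ebx carries the tsc mode. */
static struct xen_tsc_info xen_decode_tsc_leaf(uint32_t eax, uint32_t ebx)
{
    struct xen_tsc_info info = {
        .emulated      = eax & XEN_CPUID_TSC_EMULATED,
        .host_reliable = eax & XEN_CPUID_HOST_TSC_RELIABLE,
        .rdtscp        = eax & XEN_CPUID_RDTSCP_INSTR_AVAIL,
        .mode          = ebx,
    };

    return info;
}

/* True only for mode 2; whether mode 3 should also count is left open. */
static bool xen_tsc_mode_is_noemulate(const struct xen_tsc_info *info)
{
    return info->mode == XEN_CPUID_TSC_MODE_NOEMULATE;
}
```

A clocksource decision could then start from
xen_tsc_mode_is_noemulate(), with any further policy left to the
consumer.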




Re: [PATCH] xen/x86: public: add TSC defines for cpuid leaf 4

2023-01-25 Thread Krister Johansen
On Wed, Jan 25, 2023 at 07:57:16AM +0100, Jan Beulich wrote:
> On 24.01.2023 23:35, Krister Johansen wrote:
> > --- a/xen/include/public/arch-x86/cpuid.h
> > +++ b/xen/include/public/arch-x86/cpuid.h
> > @@ -71,6 +71,12 @@
> >   * EDX: shift amount for tsc->ns conversion
> >   * Sub-leaf 2: EAX: host tsc frequency in kHz
> >   */
> > +#define XEN_CPUID_TSC_EMULATED   (1u << 0)
> > +#define XEN_CPUID_HOST_TSC_RELIABLE  (1u << 1)
> > +#define XEN_CPUID_RDTSCP_INSTR_AVAIL (1u << 2)
> > +#define XEN_CPUID_TSC_MODE_DEFAULT   (0)
> > +#define XEN_CPUID_TSC_MODE_EMULATE   (1u)
> > +#define XEN_CPUID_TSC_MODE_NOEMULATE (2u)
> 
> This could do with a blank line between the two groups. You're also
> missing mode 3. Plus, as a formal remark, please follow patch
> submission rules: They are sent To: the list, with maintainers on
> Cc:.

Thanks for the feedback.  I'll make those changes.

My apologies for the breach of etiquette, and thank you for the reminder
about the norms.  I'll correct the To: and CC: headers on the next go
around.

-K



[XEN PATCH v2] Create a Kconfig option to set preferred reboot method

2023-01-25 Thread Per Bilse
Provide a user-friendly option to specify preferred reboot details at
compile time.  It uses the same internals as the command line 'reboot'
parameter, and will be overridden by a choice on the command line.

Signed-off-by: Per Bilse 
---
v2: Incorporate feedback from initial patch.  Separating out warm
reboot as a separate boolean led to a proliferation of code changes,
so we now use the details from Kconfig to assemble a reboot string
identical to what would be specified on the command line.  This leads
to minimal changes and additions to the code.
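
To show how the string assembly is meant to work, here is a standalone
sketch.  The two CONFIG_ values are examples standing in for what
Kconfig would generate, and parse_reboot() is a hypothetical, much
simplified parser written for this note; it is not the parser in
shutdown.c.

```c
#include <string.h>

/* Stand-ins for Kconfig output; the values are an example only. */
#define CONFIG_REBOOT_METHOD      "acpi"
#define CONFIG_REBOOT_TEMPERATURE "warm"

/* Adjacent string literals are concatenated by the compiler. */
#define REBOOT_STR CONFIG_REBOOT_METHOD "," CONFIG_REBOOT_TEMPERATURE

/*
 * Hypothetical simplified parser: look at the first character of each
 * comma-separated token.  'w' and 'c' select warm or cold, anything
 * else is taken as the method letter.
 */
static void parse_reboot(const char *str, char *type, int *warm)
{
    while ( str && *str )
    {
        switch ( *str )
        {
        case 'w': *warm = 1; break;
        case 'c': *warm = 0; break;
        default:  *type = *str; break;
        }

        /* Move on to the token after the next comma, if there is one. */
        str = strchr(str, ',');
        if ( str )
            str++;
    }
}
```

Because only the first character of a token counts, "pci" and "Power"
have to differ in case, which is why the Kconfig default for the power
method is spelled with a capital P.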
---
 xen/arch/x86/Kconfig| 84 +
 xen/arch/x86/shutdown.c | 30 ++-
 2 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 6a7825f4ba..b881a118f1 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -306,6 +306,90 @@ config MEM_SHARING
bool "Xen memory sharing support (UNSUPPORTED)" if UNSUPPORTED
depends on HVM
 
+config REBOOT_SYSTEM_DEFAULT
+   bool "Xen-defined reboot method"
+   default y
+   help
+ Xen will choose the most appropriate reboot method,
+ which will be a Xen SCHEDOP hypercall if running as
+ a Xen guest, otherwise EFI, ACPI, or by way of the
+ keyboard controller, depending on system features.
+ Disabling this will allow you to specify how the
+ system will be rebooted.
+
+choice
+   bool "Reboot method"
+   depends on !REBOOT_SYSTEM_DEFAULT
+   default REBOOT_METHOD_ACPI
+   help
+ This is a compiled-in alternative to specifying the
+ reboot method on the Xen command line.  Specifying a
+ method on the command line will override both this
+ configuration and the warm boot option below.
+
+ noneSuppress automatic reboot after panics or crashes
+ triple  Force a triple fault (init)
+ kbd Use the keyboard controller
+ acpiUse the RESET_REG in the FADT
+ pci Use the so-called "PCI reset register", CF9
+ power   Like 'pci' but for a full power-cyle reset
+ efi Use the EFI reboot (if running under EFI)
+ xen Use Xen SCHEDOP hypercall (if running under Xen as a guest)
+
+   config REBOOT_METHOD_NONE
+   bool "none"
+
+   config REBOOT_METHOD_TRIPLE
+   bool "triple"
+
+   config REBOOT_METHOD_KBD
+   bool "kbd"
+
+   config REBOOT_METHOD_ACPI
+   bool "acpi"
+
+   config REBOOT_METHOD_PCI
+   bool "pci"
+
+   config REBOOT_METHOD_POWER
+   bool "power"
+
+   config REBOOT_METHOD_EFI
+   bool "efi"
+
+   config REBOOT_METHOD_XEN
+   bool "xen"
+   depends on !XEN_GUEST
+
+endchoice
+
+config REBOOT_METHOD
+   string
+   default "none"   if REBOOT_METHOD_NONE
+   default "triple" if REBOOT_METHOD_TRIPLE
+   default "kbd"if REBOOT_METHOD_KBD
+   default "acpi"   if REBOOT_METHOD_ACPI
+   default "pci"if REBOOT_METHOD_PCI
+   default "Power"  if REBOOT_METHOD_POWER
+   default "efi"if REBOOT_METHOD_EFI
+   default "xen"if REBOOT_METHOD_XEN
+
+config REBOOT_WARM
+   bool "Warm reboot"
+   default n
+   help
+ By default the system will perform a cold reboot.
+ Enable this to carry out a warm reboot.  This
+ configuration will have no effect if a "reboot="
+ string is supplied on the Xen command line; in this
+ case the reboot string must include "warm" if a warm
+ reboot is desired.
+
+config REBOOT_TEMPERATURE
+   string
+   default "warm" if REBOOT_WARM
+   default "cold" if !REBOOT_WARM && !REBOOT_SYSTEM_DEFAULT
+
 endmenu
 
 source "common/Kconfig"
diff --git a/xen/arch/x86/shutdown.c b/xen/arch/x86/shutdown.c
index 7619544d14..4969af1316 100644
--- a/xen/arch/x86/shutdown.c
+++ b/xen/arch/x86/shutdown.c
@@ -28,6 +28,19 @@
 #include 
 #include 
 
+/*
+ * We don't define a compiled-in reboot string if both method and
+ * temperature are defaults, in which case we can compile better code.
+ */
+#ifdef CONFIG_REBOOT_METHOD
+#define REBOOT_STR CONFIG_REBOOT_METHOD "," CONFIG_REBOOT_TEMPERATURE
+#else
+#ifdef CONFIG_REBOOT_TEMPERATURE
+#define REBOOT_STR CONFIG_REBOOT_TEMPERATURE
+#endif
+#endif
+
+/* Do not modify without updating arch/x86/Kconfig, see below. */
 enum reboot_type {
 BOOT_INVALID,
 BOOT_TRIPLE = 't',
@@ -42,10 +55,13 @@ enum reboot_type {
 static int reboot_mode;
 
 /*
- * reboot=t[riple] | k[bd] | a[cpi] | p[ci] | n[o] | [e]fi [, [w]arm | [c]old]
+ * These constants are duplicated in full in arch/x86/Kconfig, keep in synch.
+ *
+ * reboot=t[riple] | k[bd] | a[cpi] | p[ci] | P[ower] | n[one] | [e]fi
+ * [, [w]arm | [c]old]
  * warm   Don't set the cold reboot flag
  * cold   Set the cold reboot flag
- * no Suppress automatic reboot after panics 

Re: [PATCH v3 0/4] x86/spec-ctrl: IPBP improvements

2023-01-25 Thread Andrew Cooper
On 25/01/2023 3:24 pm, Jan Beulich wrote:
> Versions of the two final patches were submitted standalone earlier
> on. The series here tries to carry out a suggestion from Andrew,
> which the two of us have been discussing. Then said previously posted
> patches are re-based on top, utilizing the new functionality.
>
> 1: spec-ctrl: add logic to issue IBPB on exit to guest
> 2: spec-ctrl: defer context-switch IBPB until guest entry
> 3: limit issuing of IBPB during context switch
> 4: PV: issue branch prediction barrier when switching 64-bit guest to kernel mode

In the subject, you mean IBPB.  I think all the individual patches are fine.

Do you have an implementation of VMASST_TYPE_mode_switch_no_ibpb for
Linux yet?  The thing I'd like to avoid is that we commit this perf hit
to Xen, without lining Linux up to be able to skip it.

~Andrew



Re: [PATCH v1 09/14] xen/riscv: introduce do_unexpected_trap()

2023-01-25 Thread Andrew Cooper
On 25/01/2023 5:01 pm, Oleksii wrote:
> On Mon, 2023-01-23 at 09:39 +1000, Alistair Francis wrote:
>> On Sat, Jan 21, 2023 at 1:00 AM Oleksii Kurochko
>>  wrote:
>>> The patch introduces the function the purpose of which is to print
>>> a cause of an exception and call "wfi" instruction.
>>>
>>> Signed-off-by: Oleksii Kurochko 
>>> ---
>>>  xen/arch/riscv/traps.c | 14 +-
>>>  1 file changed, 13 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/riscv/traps.c b/xen/arch/riscv/traps.c
>>> index dd64f053a5..fc25138a4b 100644
>>> --- a/xen/arch/riscv/traps.c
>>> +++ b/xen/arch/riscv/traps.c
>>> @@ -95,7 +95,19 @@ const char *decode_cause(unsigned long cause)
>>>  return decode_trap_cause(cause);
>>>  }
>>>
>>> -void __handle_exception(struct cpu_user_regs *cpu_regs)
>>> +static void do_unexpected_trap(const struct cpu_user_regs *regs)
>>>  {
>>> +    unsigned long cause = csr_read(CSR_SCAUSE);
>>> +
>>> +    early_printk("Unhandled exception: ");
>>> +    early_printk(decode_cause(cause));
>>> +    early_printk("\n");
>>> +
>>> +    // kind of die...
>>>  wait_for_interrupt();
>> We could put this in a loop, to ensure we never progress
>>
> I think that right now there is no big difference how to stop
> because we have only 1 CPU, interrupts are disabled and we are in
> exception so it looks like anything can interrupt us.
> And in future it will be changed to panic() so we won't need here wfi()
> any more.

WFI is permitted to be implemented as a NOP by hardware.  Furthermore,
WFI with interrupts already disabled is a supported usecase, and will
resume execution without taking the interrupt that became pending.

You need an infinite loop of WFI's for execution to halt here.
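
Something along these lines, as a sketch which builds on any host.  The
wait_for_interrupt() here is a stand-in which models a WFI implemented
as a NOP by returning straight away; the setjmp()/longjmp() plumbing and
the name halt() are purely so the demonstration can terminate and are
not a proposal for the real code.

```c
#include <setjmp.h>

static jmp_buf demo_escape;     /* demo only: lets a caller leave the loop */
static unsigned int wfi_calls;  /* how many times "WFI" has been executed */

/*
 * Stand-in for the real wait_for_interrupt().  It returns immediately,
 * like hardware which implements WFI as a NOP.  The longjmp() after a
 * few calls exists only so the demonstration can end.
 */
static void wait_for_interrupt(void)
{
    if ( ++wfi_calls >= 3 )
        longjmp(demo_escape, 1);
}

/* Halts for good: a WFI which falls through simply goes round again. */
static void halt(void)
{
    for ( ;; )
        wait_for_interrupt();
}
```

In the real function the body of wait_for_interrupt() is the "wfi"
instruction and there is no escape hatch, so the for ( ;; ) is what
actually stops execution from progressing.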

~Andrew



Re: [PATCH v1 09/14] xen/riscv: introduce do_unexpected_trap()

2023-01-25 Thread Julien Grall

Hi,

On 25/01/2023 17:01, Oleksii wrote:
> On Mon, 2023-01-23 at 09:39 +1000, Alistair Francis wrote:
>> On Sat, Jan 21, 2023 at 1:00 AM Oleksii Kurochko
>>  wrote:
>>> The patch introduces the function the purpose of which is to print
>>> a cause of an exception and call "wfi" instruction.
>>>
>>> Signed-off-by: Oleksii Kurochko 
>>> ---
>>>  xen/arch/riscv/traps.c | 14 +-
>>>  1 file changed, 13 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/riscv/traps.c b/xen/arch/riscv/traps.c
>>> index dd64f053a5..fc25138a4b 100644
>>> --- a/xen/arch/riscv/traps.c
>>> +++ b/xen/arch/riscv/traps.c
>>> @@ -95,7 +95,19 @@ const char *decode_cause(unsigned long cause)
>>>  return decode_trap_cause(cause);
>>>  }
>>>
>>> -void __handle_exception(struct cpu_user_regs *cpu_regs)
>>> +static void do_unexpected_trap(const struct cpu_user_regs *regs)
>>>  {
>>> +    unsigned long cause = csr_read(CSR_SCAUSE);
>>> +
>>> +    early_printk("Unhandled exception: ");
>>> +    early_printk(decode_cause(cause));
>>> +    early_printk("\n");
>>> +
>>> +    // kind of die...
>>>  wait_for_interrupt();
>> We could put this in a loop, to ensure we never progress
>>
> I think that right now there is no big difference how to stop
> because we have only 1 CPU, interrupts are disabled and we are in
> exception so it looks like anything can interrupt us.

From my understanding of the specification, WFI is a hint. So it could
be implemented as a NOP.

Therefore it would sound better to wrap in a loop. That said...

> And in future it will be changed to panic() so we won't need here wfi()
> any more.

... ideally using panic() right now would be the best.

Cheers,

--
Julien Grall



Re: [PATCH v1 09/14] xen/riscv: introduce do_unexpected_trap()

2023-01-25 Thread Oleksii
On Mon, 2023-01-23 at 09:39 +1000, Alistair Francis wrote:
> On Sat, Jan 21, 2023 at 1:00 AM Oleksii Kurochko
>  wrote:
> > 
> > The patch introduces the function the purpose of which is to print
> > a cause of an exception and call "wfi" instruction.
> > 
> > Signed-off-by: Oleksii Kurochko 
> > ---
> >  xen/arch/riscv/traps.c | 14 +-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/xen/arch/riscv/traps.c b/xen/arch/riscv/traps.c
> > index dd64f053a5..fc25138a4b 100644
> > --- a/xen/arch/riscv/traps.c
> > +++ b/xen/arch/riscv/traps.c
> > @@ -95,7 +95,19 @@ const char *decode_cause(unsigned long cause)
> >  return decode_trap_cause(cause);
> >  }
> > 
> > -void __handle_exception(struct cpu_user_regs *cpu_regs)
> > +static void do_unexpected_trap(const struct cpu_user_regs *regs)
> >  {
> > +    unsigned long cause = csr_read(CSR_SCAUSE);
> > +
> > +    early_printk("Unhandled exception: ");
> > +    early_printk(decode_cause(cause));
> > +    early_printk("\n");
> > +
> > +    // kind of die...
> >  wait_for_interrupt();
> 
> We could put this in a loop, to ensure we never progress
> 
I think that right now there is no big difference how to stop
because we have only 1 CPU, interrupts are disabled and we are in
exception so it looks like nothing can interrupt us.
And in future it will be changed to panic() so we won't need here wfi()
any more.
> 

Oleksii



[PATCH] x86/shadow: Fix PV32 shadowing in !HVM builds

2023-01-25 Thread Andrew Cooper
The OSSTest bisector identified an issue with c/s 1894049fa283 ("x86/shadow:
L2H shadow type is PV32-only") in !HVM builds.

The bug is ultimately caused by sh_type_to_size[] not actually being specific
to HVM guests, and its position in shadow/hvm.c misled the reasoning.

To fix the issue that OSSTest identified, SH_type_l2h_64_shadow must still
have the value 1 in any CONFIG_PV32 build.  But simply adjusting this leaves
us with misleading logic, and a reasonable chance of making a related error
again in the future.

In hindsight, moving sh_type_to_size[] out of common.c in the first place was a
mistake.  Therefore, move sh_type_to_size[] back to living in common.c,
leaving a comment explaining why it happens to be inside an HVM conditional.

This effectively reverts the second half of 4fec945409fc ("x86/shadow: adjust
and move sh_type_to_size[]") while retaining the other improvements from the
same changeset.

While making this change, also adjust the sh_type_to_size[] declaration to
match its definition.

Fixes: 4fec945409fc ("x86/shadow: adjust and move sh_type_to_size[]")
Fixes: 1894049fa283 ("x86/shadow: L2H shadow type is PV32-only")
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: George Dunlap 
CC: Tim Deegan 

I was unsure whether it was reasonable to move the table back into its old
position but it can live pretty much anywhere in common.c as far as I'm
concerned.
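
For illustration only, a cut-down sketch of why a missing entry in this kind
of table matters: elements left out of a designated-initializer array
silently read as 0.  All DEMO_* / demo_* names are hypothetical stand-ins and
not the real shadow type enumerators.

```c
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for the shadow type enumerators. */
enum demo_sh_type {
    DEMO_type_none,
    DEMO_type_l2_64,
    DEMO_type_l2h_64,   /* the entry that must not be forgotten */
    DEMO_type_l3_64,
    DEMO_type_unused,
};

/* Complete table: every used type has an explicit size in pages. */
static const uint8_t demo_size_full[DEMO_type_unused] = {
    [DEMO_type_l2_64]  = 1,
    [DEMO_type_l2h_64] = 1,
    [DEMO_type_l3_64]  = 1,
};

/* Same table with the l2h entry left out, as if it were commented out. */
static const uint8_t demo_size_gap[DEMO_type_unused] = {
    [DEMO_type_l2_64]  = 1,
    [DEMO_type_l3_64]  = 1,
};

/* Look up the size (in pages) of a type in the given table. */
unsigned int demo_size(const uint8_t *table, enum demo_sh_type t)
{
    return table[t];
}
```

A lookup of DEMO_type_l2h_64 in the second table yields a size of 0 pages
rather than 1, with no diagnostic from the compiler.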
---
 xen/arch/x86/mm/shadow/common.c  | 38 ++
 xen/arch/x86/mm/shadow/hvm.c | 31 ---
 xen/arch/x86/mm/shadow/private.h |  2 +-
 3 files changed, 39 insertions(+), 32 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 26901b8b3bcf..a74b15e3e75b 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -39,6 +39,44 @@
 #include 
 #include "private.h"
 
+/*
+ * This table shows the allocation behaviour of the different modes:
+ *
+ * Xen paging      64b  64b  64b
+ * Guest paging    32b  pae  64b
+ * PV or HVM       HVM  HVM   *
+ * Shadow paging   pae  pae  64b
+ *
+ * sl1 size         8k   4k   4k
+ * sl2 size        16k   4k   4k
+ * sl3 size         -    -    4k
+ * sl4 size         -    -    4k
+ *
+ * Note: our accessor, shadow_size(), can optimise out this table in PV-only
+ * builds.
+ */
+#ifdef CONFIG_HVM
+const uint8_t sh_type_to_size[] = {
+[SH_type_l1_32_shadow]   = 2,
+[SH_type_fl1_32_shadow]  = 2,
+[SH_type_l2_32_shadow]   = 4,
+[SH_type_l1_pae_shadow]  = 1,
+[SH_type_fl1_pae_shadow] = 1,
+[SH_type_l2_pae_shadow]  = 1,
+[SH_type_l1_64_shadow]   = 1,
+[SH_type_fl1_64_shadow]  = 1,
+[SH_type_l2_64_shadow]   = 1,
+#ifdef CONFIG_PV32
+[SH_type_l2h_64_shadow]  = 1,
+#endif
+[SH_type_l3_64_shadow]   = 1,
+[SH_type_l4_64_shadow]   = 1,
+[SH_type_p2m_table]  = 1,
+[SH_type_monitor_table]  = 1,
+[SH_type_oos_snapshot]   = 1,
+};
+#endif /* CONFIG_HVM */
+
 DEFINE_PER_CPU(uint32_t,trace_shadow_path_flags);
 
 static int cf_check sh_enable_log_dirty(struct domain *, bool log_global);
diff --git a/xen/arch/x86/mm/shadow/hvm.c b/xen/arch/x86/mm/shadow/hvm.c
index 918865cf1b6a..88c3c16322f2 100644
--- a/xen/arch/x86/mm/shadow/hvm.c
+++ b/xen/arch/x86/mm/shadow/hvm.c
@@ -33,37 +33,6 @@
 
 #include "private.h"
 
-/*
- * This table shows the allocation behaviour of the different modes:
- *
- * Xen paging      64b  64b  64b
- * Guest paging    32b  pae  64b
- * PV or HVM       HVM  HVM   *
- * Shadow paging   pae  pae  64b
- *
- * sl1 size         8k   4k   4k
- * sl2 size        16k   4k   4k
- * sl3 size         -    -    4k
- * sl4 size         -    -    4k
- */
-const uint8_t sh_type_to_size[] = {
-[SH_type_l1_32_shadow]   = 2,
-[SH_type_fl1_32_shadow]  = 2,
-[SH_type_l2_32_shadow]   = 4,
-[SH_type_l1_pae_shadow]  = 1,
-[SH_type_fl1_pae_shadow] = 1,
-[SH_type_l2_pae_shadow]  = 1,
-[SH_type_l1_64_shadow]   = 1,
-[SH_type_fl1_64_shadow]  = 1,
-[SH_type_l2_64_shadow]   = 1,
-/*  [SH_type_l2h_64_shadow]  = 1,  PV32-only */
-[SH_type_l3_64_shadow]   = 1,
-[SH_type_l4_64_shadow]   = 1,
-[SH_type_p2m_table]  = 1,
-[SH_type_monitor_table]  = 1,
-[SH_type_oos_snapshot]   = 1,
-};
-
 /**/
 /* x86 emulator support for the shadow code
  */
diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h
index 7d6c846c8037..79d82364fc92 100644
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -362,7 +362,7 @@ static inline int mfn_oos_may_write(mfn_t gmfn)
 #endif /* (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC) */
 
 /* Figure out the size (in pages) of a given shadow type */
-extern const u8 sh_type_to_size[SH_type_unused];
+extern const uint8_t sh_type_to_size[SH_type_unused];
 static inline unsigned int
 shadow_size(unsigned int 

Re: [PATCH v4 04/11] xen: extend domctl interface for cache coloring

2023-01-25 Thread Carlo Nonato
Hi Jan,

On Tue, Jan 24, 2023 at 5:29 PM Jan Beulich  wrote:
>
> On 23.01.2023 16:47, Carlo Nonato wrote:
> > @@ -275,6 +276,19 @@ unsigned int *dom0_llc_colors(unsigned int *num_colors)
> >  return colors;
> >  }
> >
> > +unsigned int *llc_colors_from_guest(struct xen_domctl_createdomain *config)
>
> const struct ...?
>
> > +{
> > +unsigned int *colors;
> > +
> > +if ( !config->num_llc_colors )
> > +return NULL;
> > +
> > +colors = alloc_colors(config->num_llc_colors);
>
> Error handling needs to occur here; the panic() in alloc_colors() needs
> to go away.
>
> > @@ -434,7 +436,15 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
> > u_domctl)
> >  rover = dom;
> >  }
> >
> > -d = domain_create(dom, >u.createdomain, false);
> > +if ( llc_coloring_enabled )
> > +{
> > +llc_colors = llc_colors_from_guest(>u.createdomain);
> > +num_llc_colors = op->u.createdomain.num_llc_colors;
>
> I think you would better avoid setting num_llc_colors to non-zero if
> you got back NULL from the function. It's at best confusing.
>
> > @@ -92,6 +92,10 @@ struct xen_domctl_createdomain {
> >  /* CPU pool to use; specify 0 or a specific existing pool */
> >  uint32_t cpupool_id;
> >
> > +/* IN LLC coloring parameters */
> > +uint32_t num_llc_colors;
> > +XEN_GUEST_HANDLE(uint32) llc_colors;
>
> Despite your earlier replies I continue to be unconvinced that this
> is information which needs to be available right at domain_create.
> Without that you'd also get away without the sufficiently odd
> domain_create_llc_colored(). (Odd because: Think of two or three
> more extended features appearing, all of which want a special cased
> domain_create().)

Yes, I definitely see your point. Still there is the p2m table allocation
problem that you and Julien have discussed previously. I'm not sure I
understood what the approach is.

> Jan



Re: [PATCH v4 01/11] xen/common: add cache coloring common code

2023-01-25 Thread Carlo Nonato
On Wed, Jan 25, 2023 at 2:10 PM Jan Beulich  wrote:
>
> On 25.01.2023 12:18, Carlo Nonato wrote:
> > On Tue, Jan 24, 2023 at 5:37 PM Jan Beulich  wrote:
> >> On 23.01.2023 16:47, Carlo Nonato wrote:
> >>> --- /dev/null
> >>> +++ b/xen/include/xen/llc_coloring.h
> >>> @@ -0,0 +1,54 @@
> >>> +/* SPDX-License-Identifier: GPL-2.0 */
> >>> +/*
> >>> + * Last Level Cache (LLC) coloring common header
> >>> + *
> >>> + * Copyright (C) 2022 Xilinx Inc.
> >>> + *
> >>> + * Authors:
> >>> + *Carlo Nonato 
> >>> + */
> >>> +#ifndef __COLORING_H__
> >>> +#define __COLORING_H__
> >>> +
> >>> +#include 
> >>> +#include 
> >>> +
> >>> +#ifdef CONFIG_HAS_LLC_COLORING
> >>> +
> >>> +#include 
> >>> +
> >>> +extern bool llc_coloring_enabled;
> >>> +
> >>> +int domain_llc_coloring_init(struct domain *d, unsigned int *colors,
> >>> + unsigned int num_colors);
> >>> +void domain_llc_coloring_free(struct domain *d);
> >>> +void domain_dump_llc_colors(struct domain *d);
> >>> +
> >>> +#else
> >>> +
> >>> +#define llc_coloring_enabled (false)
> >>
> >> While I agree this is needed, ...
> >>
> >>> +static inline int domain_llc_coloring_init(struct domain *d,
> >>> +   unsigned int *colors,
> >>> +   unsigned int num_colors)
> >>> +{
> >>> +return 0;
> >>> +}
> >>> +static inline void domain_llc_coloring_free(struct domain *d) {}
> >>> +static inline void domain_dump_llc_colors(struct domain *d) {}
> >>
> >> ... I don't think you need any of these. Instead the declarations above
> >> simply need to be visible unconditionally (to be visible to the compiler
> >> when processing consuming code). We rely on DCE to remove such references
> >> in many other places.
> >
> > > So this is true for any other stub function that I used in the series, right?
>
> Likely. I didn't look at most of the Arm-only pieces.
>
> >>> --- a/xen/include/xen/sched.h
> >>> +++ b/xen/include/xen/sched.h
> >>> @@ -602,6 +602,9 @@ struct domain
> >>>
> >>>  /* Holding CDF_* constant. Internal flags for domain creation. */
> >>>  unsigned int cdf;
> >>> +
> >>> +unsigned int *llc_colors;
> >>> +unsigned int num_llc_colors;
> >>>  };
> >>
> >> Why outside of any #ifdef, and why not in struct arch_domain?
> >
> > Moving this in sched.h seemed like the natural continuation of the common +
> > arch specific split. Notice that this split is also because Julien pointed
> > out (as you did in some earlier revision) that cache coloring can be used
> > > by other arch in the future (even if x86 is excluded). Having two maintainers
> > > saying the same thing sounded like a good reason to do that.
>
> If you mean this to be usable by other arch-es as well (which I would
> welcome, as I think I had expressed on an earlier version), then I think
> more pieces want to be in common code. But putting the fields here and all
> users of them in arch-specific code (which I think is the way I saw it)
> doesn't look very logical to me. IOW to me there exist only two possible
> approaches: As much as possible in common code, or common code being
> disturbed as little as possible.

This means having a llc-coloring.c in common where to put the common
implementation, right?
Anyway right now there is also another user of such fields in common:
page_alloc.c.

> > The missing #ifdef comes from a discussion I had with Julien in v2 about
> > domctl interface where he suggested removing it
> > (https://marc.info/?l=xen-devel=166151802002263).
>
> I went about five levels deep in the replies, without finding any such reply
> from Julien. Can you please be more specific with the link, so readers don't
> need to endlessly dig?

https://marc.info/?l=xen-devel=19617917298

quote (me and then Julien):
>> We can also think of moving the coloring fields from this
>> struct to the common one (xen_domctl_createdomain) protecting them with
>> the proper #ifdef (but we are targeting only arm64...).

> Your code is targeting arm64 but fundamentally this is not an arm64 specific
> feature. IOW, this could be used in the future on other arch. So I think
> it would make sense to define it in common without the #ifdef.

> Jan
>
> > We were talking about
> > a different struct, but I thought the principle was the same. Anyway I would
> > like the #ifdef too.
> >
> > So @Jan, @Julien, can you help me fix this once for all?
> >
> > Thanks.
> >
> > - Carlo Nonato
>



Re: [PATCH v2 1/2] libxl: Fix guest kexec - skip cpuid policy

2023-01-25 Thread Anthony PERARD
On Mon, Jan 23, 2023 at 09:59:38PM -0500, Jason Andryuk wrote:
> When a domain performs a kexec (soft reset), libxl__build_pre() is
> called with the existing domid.  Calling libxl__cpuid_legacy() on the
> existing domain fails since the cpuid policy has already been set, and
> the guest isn't rebuilt and doesn't kexec.
> 
> xc: error: Failed to set d1's policy (err leaf 0x, subleaf 0x, msr 0x) (17 = File exists): Internal error
> libxl: error: libxl_cpuid.c:494:libxl__cpuid_legacy: Domain 1:Failed to apply CPUID policy: File exists
> libxl: error: libxl_create.c:1641:domcreate_rebuild_done: Domain 1:cannot (re-)build domain: -3
> libxl: error: libxl_xshelp.c:201:libxl__xs_read_mandatory: xenstore read failed: `/libxl/1/type': No such file or directory
> libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain type for domid=1, assuming HVM
> 
> During a soft_reset, skip calling libxl__cpuid_legacy() to avoid the
> issue.  Before the fixes commit, the libxl__cpuid_legacy() failure would

s/fixes/fixed/ or maybe better just write: "before commit 34990446ca91".

> have been ignored, so kexec would continue.
> 
> Fixes: 34990446ca91 "libxl: don't ignore the return value from xc_cpuid_apply_policy"

FYI, the tags format is with () around the commit title:
Fixes: 34990446ca91 ("libxl: don't ignore the return value from xc_cpuid_apply_policy")
I have this in my git config file to help generate those:
[alias]
    fixes = log -1 --abbrev=12 --format=tformat:'Fixes: %h (\"%s\")'


> Signed-off-by: Jason Andryuk 
> ---
> Probably a backport candidate since this has been broken for a while.
> 
> v2:
> Use soft_reset field in libxl__domain_build_state. - Juergen

Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



Re: [PATCH v2 4/8] x86/mem-sharing: copy GADDR based shared guest areas

2023-01-25 Thread Tamas K Lengyel
On Tue, Jan 24, 2023 at 6:19 AM Jan Beulich  wrote:
>
> On 23.01.2023 19:32, Tamas K Lengyel wrote:
> > On Mon, Jan 23, 2023 at 11:24 AM Jan Beulich  wrote:
> >> On 23.01.2023 17:09, Tamas K Lengyel wrote:
> >>> On Mon, Jan 23, 2023 at 9:55 AM Jan Beulich  wrote:
> >>>> --- a/xen/arch/x86/mm/mem_sharing.c
> >>>> +++ b/xen/arch/x86/mm/mem_sharing.c
> >>>> @@ -1653,6 +1653,65 @@ static void copy_vcpu_nonreg_state(struc
> >>>>  hvm_set_nonreg_state(cd_vcpu, );
> >>>>  }
> >>>>
> >>>> +static int copy_guest_area(struct guest_area *cd_area,
> >>>> +   const struct guest_area *d_area,
> >>>> +   struct vcpu *cd_vcpu,
> >>>> +   const struct domain *d)
> >>>> +{
> >>>> +mfn_t d_mfn, cd_mfn;
> >>>> +
> >>>> +if ( !d_area->pg )
> >>>> +return 0;
> >>>> +
> >>>> +d_mfn = page_to_mfn(d_area->pg);
> >>>> +
> >>>> +/* Allocate & map a page for the area if it hasn't been already. */
> >>>> +if ( !cd_area->pg )
> >>>> +{
> >>>> +gfn_t gfn = mfn_to_gfn(d, d_mfn);
> >>>> +struct p2m_domain *p2m = p2m_get_hostp2m(cd_vcpu->domain);
> >>>> +p2m_type_t p2mt;
> >>>> +p2m_access_t p2ma;
> >>>> +unsigned int offset;
> >>>> +int ret;
> >>>> +
> >>>> +cd_mfn = p2m->get_entry(p2m, gfn, &p2mt, &p2ma, 0, NULL, NULL);
> >>>> +if ( mfn_eq(cd_mfn, INVALID_MFN) )
> >>>> +{
> >>>> +struct page_info *pg = alloc_domheap_page(cd_vcpu->domain, 0);
> >>>> +
> >>>> +if ( !pg )
> >>>> +return -ENOMEM;
> >>>> +
> >>>> +cd_mfn = page_to_mfn(pg);
> >>>> +set_gpfn_from_mfn(mfn_x(cd_mfn), gfn_x(gfn));
> >>>> +
> >>>> +ret = p2m->set_entry(p2m, gfn, cd_mfn, PAGE_ORDER_4K, p2m_ram_rw,
> >>>> + p2m->default_access, -1);
> >>>> +if ( ret )
> >>>> +return ret;
> >>>> +}
> >>>> +else if ( p2mt != p2m_ram_rw )
> >>>> +return -EBUSY;
> >>>> +
> >>>> +/*
> >>>> + * Simply specify the entire range up to the end of the page. All the
> >>>> + * function uses it for is a check for not crossing page boundaries.
> >>>> + */
> >>>> +offset = PAGE_OFFSET(d_area->map);
> >>>> +ret = map_guest_area(cd_vcpu, gfn_to_gaddr(gfn) + offset,
> >>>> + PAGE_SIZE - offset, cd_area, NULL);
> >>>> +if ( ret )
> >>>> +return ret;
> >>>> +}
> >>>> +else
> >>>> +cd_mfn = page_to_mfn(cd_area->pg);
> >>>
> >>> Everything to this point seems to be non mem-sharing/forking related. Could
> >>> these live somewhere else? There must be some other place where allocating
> >>> these areas happens already for non-fork VMs so it would make sense to just
> >>> refactor that code to be callable from here.
> >>
> >> It is the "copy" aspect which makes this mem-sharing (or really fork)
> >> specific. Plus in the end this is no different from what you have
> >> there right now for copying the vCPU info area. In the final patch
> >> that other code gets removed by re-using the code here.
> >
> > Yes, the copy part is fork-specific. Arguably if there was a way to do the
> > allocation of the page for vcpu_info I would prefer that being elsewhere,
> > but while the only requirement is allocate-page and copy from parent I'm OK
> > with that logic being in here because it's really straight forward. But now
> > you also do extra sanity checks here which are harder to comprehend in this
> > context alone.
>
> What sanity checks are you talking about (also below, where you claim
> map_guest_area() would be used only to sanity check)?

Did I misread your comment above "All the function uses it for is a check
for not crossing page boundaries"? That sounds to me like a simple sanity
check, unclear why it matters though and why only for forks.

>
> > What if extra sanity checks will be needed in the future? Or
> > the sanity checks in the future diverge from where this happens for normal
> > VMs because someone overlooks this needing to be synched here too?
> >
> >> I also haven't been able to spot anything that could be factored
> >> out (and one might expect that if there was something, then the vCPU
> >> info area copying should also already have used it). map_guest_area()
> >> is all that is used for other purposes as well.
> >
> > Well, there must be a location where all this happens for normal VMs as
> > well, no?
>
> That's map_guest_area(). What is needed here but not elsewhere is the
> populating of the GFN underlying the to-be-mapped area. That's the code
> being added here, mirroring what you need to do for the vCPU info page.
> Similar code isn't needed elsewhere because the guest invoked operation
> is purely a "map" - the underlying pages are already expected to be
> populated (which of course we check, or else we 

[PATCH v3 4/4] x86/PV: issue branch prediction barrier when switching 64-bit guest to kernel mode

2023-01-25 Thread Jan Beulich
Since both kernel and user mode run in ring 3, they run in the same
"predictor mode". While the kernel could take care of this itself, doing
so would be yet another item distinguishing PV from native. Additionally
we're in a much better position to issue the barrier command, and we can
save a #GP (for privileged instruction emulation) this way.

To allow performance to be recovered, introduce a new VM assist allowing the
guest kernel to suppress this barrier. Make availability of the assist
dependent upon the command line control, such that kernels have a way to
know whether their request actually took effect.

Note that because of its use in PV64_VM_ASSIST_MASK, the declaration of
opt_ibpb_mode_switch can't live in asm/spec_ctrl.h.

Signed-off-by: Jan Beulich 
---
Is the placement of the clearing of opt_ibpb_ctxt_switch correct in
parse_spec_ctrl()? Shouldn't it live ahead of the "disable_common"
label, as being about guest protection, not Xen's?

Adding setting of the variable to the "pv" sub-case in parse_spec_ctrl()
didn't seem quite right to me, considering that we default it to the
opposite of opt_ibpb_entry_pv.
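
As a rough sketch of the mask construction used for PV64_VM_ASSIST_MASK
below: the assist's bit is only present in the valid mask when the option is
enabled.  The DEMO_* bit positions and the function name are hypothetical,
and the sketch assumes the option has already been resolved from its -1
default to 0 or 1 before the mask is evaluated.

```c
#include <stdint.h>

/* Hypothetical bit positions; the real VMASST_TYPE_* values differ. */
#define DEMO_ASSIST_m2p_strict          1
#define DEMO_ASSIST_mode_switch_no_ibpb 2

/*
 * Build the mask of assists a guest may enable.  Adding 0UL converts the
 * small signed option to unsigned long before shifting, so a value of 1
 * contributes exactly one bit and a value of 0 contributes none.  The
 * option must be 0 or 1 by this point; -1 would set every bit from the
 * shift position upwards.
 */
unsigned long demo_assist_mask(int8_t opt_mode_switch)
{
    return (1UL << DEMO_ASSIST_m2p_strict) |
           ((opt_mode_switch + 0UL) << DEMO_ASSIST_mode_switch_no_ibpb);
}
```

A kernel whose request to set the assist is rejected can thereby tell that
the barrier is not being issued in the first place.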
---
v3: Leverage exit-IBPB. Introduce separate command line control.
v2: Leverage entry-IBPB. Add VM assist. Re-base.

--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2315,8 +2315,8 @@ By default SSBD will be mitigated at run
 ### spec-ctrl (x86)
 > `= List of [ , xen=, {pv,hvm}=,
 >  {msr-sc,rsb,md-clear,ibpb-entry}=|{pv,hvm}=,
->  bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
->  eager-fpu,l1d-flush,branch-harden,srb-lock,
+>  bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ibpb-mode-switch,
+>  ssbd,psfd,eager-fpu,l1d-flush,branch-harden,srb-lock,
 >  unpriv-mmio}= ]`
 
 Controls for speculative execution sidechannel mitigations.  By default, Xen
@@ -2398,7 +2398,10 @@ default.
 
 On hardware supporting IBPB (Indirect Branch Prediction Barrier), the `ibpb=`
 option can be used to force (the default) or prevent Xen from issuing branch
-prediction barriers on vcpu context switches.
+prediction barriers on vcpu context switches.  On such hardware the
+`ibpb-mode-switch` option can be used to control whether, by default, Xen
+would issue branch prediction barriers when 64-bit PV guests switch from
+user to kernel mode.  If enabled, guest kernels can opt out of this behavior.
 
 On all hardware, the `eager-fpu=` option can be used to force or prevent Xen
 from using fully eager FPU context switches.  This is currently implemented as
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -742,6 +742,8 @@ static inline void pv_inject_sw_interrup
 pv_inject_event();
 }
 
+extern int8_t opt_ibpb_mode_switch;
+
 #define PV32_VM_ASSIST_MASK ((1UL << VMASST_TYPE_4gb_segments)| \
  (1UL << VMASST_TYPE_4gb_segments_notify) | \
  (1UL << VMASST_TYPE_writable_pagetables) | \
@@ -753,7 +755,9 @@ static inline void pv_inject_sw_interrup
  * but we can't make such requests fail all of the sudden.
  */
 #define PV64_VM_ASSIST_MASK (PV32_VM_ASSIST_MASK  | \
- (1UL << VMASST_TYPE_m2p_strict))
+ (1UL << VMASST_TYPE_m2p_strict)  | \
+ ((opt_ibpb_mode_switch + 0UL) <<   \
+  VMASST_TYPE_mode_switch_no_ibpb))
 #define HVM_VM_ASSIST_MASK  (1UL << VMASST_TYPE_runstate_update_flag)
 
 #define arch_vm_assist_valid_mask(d) \
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -455,6 +455,7 @@ static void _toggle_guest_pt(struct vcpu
 void toggle_guest_mode(struct vcpu *v)
 {
 const struct domain *d = v->domain;
+struct cpu_info *cpu_info = get_cpu_info();
 unsigned long gs_base;
 
 ASSERT(!is_pv_32bit_vcpu(v));
@@ -467,15 +468,21 @@ void toggle_guest_mode(struct vcpu *v)
 if ( v->arch.flags & TF_kernel_mode )
 v->arch.pv.gs_base_kernel = gs_base;
 else
+{
 v->arch.pv.gs_base_user = gs_base;
+
+if ( opt_ibpb_mode_switch &&
+ !(d->arch.spec_ctrl_flags & SCF_entry_ibpb) &&
+ !VM_ASSIST(d, mode_switch_no_ibpb) )
+cpu_info->spec_ctrl_flags |= SCF_exit_ibpb;
+}
+
 asm volatile ( "swapgs" );
 
 _toggle_guest_pt(v);
 
 if ( d->arch.pv.xpti )
 {
-struct cpu_info *cpu_info = get_cpu_info();
-
 cpu_info->root_pgt_changed = true;
 cpu_info->pv_cr3 = __pa(this_cpu(root_pgt)) |
(d->arch.pv.pcid ? get_pcid_bits(v, true) : 0);
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -60,6 +60,7 @@ bool __ro_after_init opt_ssbd;
 int8_t __initdata opt_psfd = -1;
 
 int8_t __ro_after_init opt_ibpb_ctxt_switch = -1;
+int8_t __ro_after_init opt_ibpb_mode_switch = -1;
 int8_t __read_mostly opt_eager_fpu = -1;
 

[PATCH v3 3/4] x86: limit issuing of IBPB during context switch

2023-01-25 Thread Jan Beulich
When the outgoing vCPU had IBPB issued upon entering Xen there's no
need for a 2nd barrier during context switch.

Signed-off-by: Jan Beulich 
---
v3: Fold into series.

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2015,7 +2015,8 @@ void context_switch(struct vcpu *prev, s
 
 ctxt_switch_levelling(next);
 
-if ( opt_ibpb_ctxt_switch && !is_idle_domain(nextd) )
+if ( opt_ibpb_ctxt_switch && !is_idle_domain(nextd) &&
+ !(prevd->arch.spec_ctrl_flags & SCF_entry_ibpb) )
 {
 static DEFINE_PER_CPU(unsigned int, last);
 unsigned int *last_id = &this_cpu(last);




[PATCH v3 2/4] x86/spec-ctrl: defer context-switch IBPB until guest entry

2023-01-25 Thread Jan Beulich
In order to avoid clobbering Xen's own predictions, defer the barrier as
much as possible. Merely mark the CPU as needing a barrier issued the
next time we're exiting to guest context.

Suggested-by: Andrew Cooper 
Signed-off-by: Jan Beulich 
---
I couldn't find any sensible (central/unique) place where to move the
comment which is being deleted alongside spec_ctrl_new_guest_context().
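
For reference, the deferral can be modelled in plain C roughly as below.
This is only a sketch: struct demo_cpu_info and the demo_* functions are
hypothetical, the barrier itself is replaced by a counter, and the real
exit path does the test-and-clear in assembly.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SCF_exit_ibpb (1u << 6)

/* Hypothetical per-CPU state; only the fields needed for the model. */
struct demo_cpu_info {
    uint8_t spec_ctrl_flags;
    unsigned int last_id;          /* last guest context run on this CPU */
    unsigned int barriers_issued;  /* stands in for the actual IBPB */
};

/* Context switch: merely record that a barrier is owed, don't issue it. */
void demo_context_switch(struct demo_cpu_info *info, unsigned int next_id)
{
    if ( info->last_id != next_id )
    {
        info->spec_ctrl_flags |= DEMO_SCF_exit_ibpb;
        info->last_id = next_id;
    }
}

/* Exit to guest: one-shot test-and-clear, then issue the barrier. */
bool demo_exit_to_guest(struct demo_cpu_info *info)
{
    if ( !(info->spec_ctrl_flags & DEMO_SCF_exit_ibpb) )
        return false;

    info->spec_ctrl_flags &= (uint8_t)~DEMO_SCF_exit_ibpb;
    info->barriers_issued++;
    return true;
}
```

Several context switches before the next guest entry collapse into a single
barrier in this model, and nothing is issued while Xen keeps running its own
code.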
---
v3: New.

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2038,7 +2038,7 @@ void context_switch(struct vcpu *prev, s
  */
 if ( *last_id != next_id )
 {
-spec_ctrl_new_guest_context();
+info->spec_ctrl_flags |= SCF_exit_ibpb;
 *last_id = next_id;
 }
 }
--- a/xen/arch/x86/include/asm/spec_ctrl.h
+++ b/xen/arch/x86/include/asm/spec_ctrl.h
@@ -67,28 +67,6 @@
 void init_speculation_mitigations(void);
 void spec_ctrl_init_domain(struct domain *d);
 
-/*
- * Switch to a new guest prediction context.
- *
- * This flushes all indirect branch predictors (BTB, RSB/RAS), so guest code
- * which has previously run on this CPU can't attack subsequent guest code.
- *
- * As this flushes the RSB/RAS, it destroys the predictions of the calling
- * context.  For best performace, arrange for this to be used when we're going
- * to jump out of the current context, e.g. with reset_stack_and_jump().
- *
- * For hardware which mis-implements IBPB, fix up by flushing the RSB/RAS
- * manually.
- */
-static always_inline void spec_ctrl_new_guest_context(void)
-{
-wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
-
-/* (ab)use alternative_input() to specify clobbers. */
-alternative_input("", "DO_OVERWRITE_RSB", X86_BUG_IBPB_NO_RET,
-  : "rax", "rcx");
-}
-
 extern int8_t opt_ibpb_ctxt_switch;
 extern bool opt_ssbd;
 extern int8_t opt_eager_fpu;
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -854,6 +854,11 @@ static void __init ibpb_calculations(voi
  */
 if ( opt_ibpb_ctxt_switch == -1 )
 opt_ibpb_ctxt_switch = !(opt_ibpb_entry_hvm && opt_ibpb_entry_pv);
+if ( opt_ibpb_ctxt_switch )
+{
+setup_force_cpu_cap(X86_FEATURE_IBPB_EXIT_PV);
+setup_force_cpu_cap(X86_FEATURE_IBPB_EXIT_HVM);
+}
 }
 
 /* Calculate whether this CPU is vulnerable to L1TF. */




[PATCH v3 1/4] x86/spec-ctrl: add logic to issue IBPB on exit to guest

2023-01-25 Thread Jan Beulich
In order to be able to defer the context switch IBPB to the last
possible point, add logic to the exit-to-guest paths to issue the
barrier there, including the "IBPB doesn't flush the RSB/RAS"
workaround. Since alternatives, for now at least, can't nest, emit JMP
to skip past both constructs where both are needed. This may be more
efficient anyway, as the sequence of NOPs is pretty long.

LFENCEs are omitted - for HVM a VM entry is imminent, which elsewhere we
already deem a sufficiently serializing event. For 32-bit PV
we're going through IRET, which ought to be good enough as well. While
64-bit PV may use SYSRET, there are several more conditional branches
there which are all unprotected.

Signed-off-by: Jan Beulich 
---
I have to admit that I'm not really certain about the placement of the
IBPB wrt the MSR_SPEC_CTRL writes. For now I've simply used "opposite of
entry".

Since we're going to run out of SCF_* bits soon and since the new flag
is meaningful only in struct cpu_info's spec_ctrl_flags, we could choose
to widen that field to 16 bits right away and then use bit 8 (or higher)
for the purpose here.
---
v3: New.

--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -75,6 +75,12 @@ __UNLIKELY_END(nsvm_hap)
 .endm
 ALTERNATIVE "", svm_vmentry_spec_ctrl, X86_FEATURE_SC_MSR_HVM
 
+ALTERNATIVE "jmp 2f", __stringify(DO_SPEC_CTRL_EXIT_IBPB disp=(2f-1f)), \
+X86_FEATURE_IBPB_EXIT_HVM
+1:
+ALTERNATIVE "", DO_OVERWRITE_RSB, X86_BUG_IBPB_NO_RET
+2:
+
 pop  %r15
 pop  %r14
 pop  %r13
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -86,7 +86,8 @@ UNLIKELY_END(realmode)
 jz .Lvmx_vmentry_restart
 
 /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-/* SPEC_CTRL_EXIT_TO_VMX   Req: %rsp=regs/cpuinfo  Clob:    */
+/* SPEC_CTRL_EXIT_TO_VMX   Req: %rsp=regs/cpuinfo  Clob: acd */
+ALTERNATIVE "", DO_SPEC_CTRL_EXIT_IBPB, X86_FEATURE_IBPB_EXIT_HVM
 DO_SPEC_CTRL_COND_VERW
 
 mov  VCPU_hvm_guest_cr2(%rbx),%rax
--- a/xen/arch/x86/include/asm/cpufeatures.h
+++ b/xen/arch/x86/include/asm/cpufeatures.h
@@ -39,8 +39,10 @@ XEN_CPUFEATURE(XEN_LBR,   X86_SY
 XEN_CPUFEATURE(SC_VERW_IDLE,  X86_SYNTH(25)) /* VERW used by Xen for idle */
 XEN_CPUFEATURE(XEN_SHSTK, X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
 XEN_CPUFEATURE(XEN_IBT,   X86_SYNTH(27)) /* Xen uses CET Indirect Branch Tracking */
-XEN_CPUFEATURE(IBPB_ENTRY_PV, X86_SYNTH(28)) /* MSR_PRED_CMD used by Xen for PV */
-XEN_CPUFEATURE(IBPB_ENTRY_HVM,X86_SYNTH(29)) /* MSR_PRED_CMD used by Xen for HVM */
+XEN_CPUFEATURE(IBPB_ENTRY_PV, X86_SYNTH(28)) /* MSR_PRED_CMD used by Xen when entered from PV */
+XEN_CPUFEATURE(IBPB_ENTRY_HVM,X86_SYNTH(29)) /* MSR_PRED_CMD used by Xen when entered from HVM */
+XEN_CPUFEATURE(IBPB_EXIT_PV,  X86_SYNTH(30)) /* MSR_PRED_CMD used by Xen when exiting to PV */
+XEN_CPUFEATURE(IBPB_EXIT_HVM, X86_SYNTH(31)) /* MSR_PRED_CMD used by Xen when exiting to HVM */
 
 /* Bug words follow the synthetic words. */
 #define X86_NR_BUG 1
--- a/xen/arch/x86/include/asm/current.h
+++ b/xen/arch/x86/include/asm/current.h
@@ -55,9 +55,13 @@ struct cpu_info {
 
 /* See asm/spec_ctrl_asm.h for usage. */
 unsigned int shadow_spec_ctrl;
+/*
+ * spec_ctrl_flags can be accessed as a 32-bit entity and hence needs
+ * placing suitably.
+ */
+uint8_t  spec_ctrl_flags;
 uint8_t  xen_spec_ctrl;
 uint8_t  last_spec_ctrl;
-uint8_t  spec_ctrl_flags;
 
 /*
  * The following field controls copying of the L4 page table of 64-bit
--- a/xen/arch/x86/include/asm/spec_ctrl.h
+++ b/xen/arch/x86/include/asm/spec_ctrl.h
@@ -36,6 +36,8 @@
 #define SCF_verw   (1 << 3)
 #define SCF_ist_ibpb   (1 << 4)
 #define SCF_entry_ibpb (1 << 5)
+#define SCF_exit_ibpb_bit 6
+#define SCF_exit_ibpb  (1 << SCF_exit_ibpb_bit)
 
 /*
  * The IST paths (NMI/#MC) can interrupt any arbitrary context.  Some
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -117,6 +117,27 @@
 .L\@_done:
 .endm
 
+.macro DO_SPEC_CTRL_EXIT_IBPB disp=0
+/*
+ * Requires %rsp=regs
+ * Clobbers %rax, %rcx, %rdx
+ *
+ * Conditionally issue IBPB if SCF_exit_ibpb is active.  The macro invocation
+ * may be followed by X86_BUG_IBPB_NO_RET workaround code.  The "disp" argument
+ * is to allow invocation sites to pass in the extra amount of code which needs
+ * skipping in case no action is necessary.
+ *
+ * The flag is a "one-shot" indicator, so it is being cleared at the same time.
+ */
+btrl    $SCF_exit_ibpb_bit, CPUINFO_spec_ctrl_flags(%rsp)
+jnc .L\@_skip + (\disp)
+mov $MSR_PRED_CMD, %ecx
+mov $PRED_CMD_IBPB, %eax
+xor %edx, %edx
+wrmsr
+.L\@_skip:
+.endm
+
 .macro DO_OVERWRITE_RSB tmp=rax
 /*
  * 

[PATCH v3 0/4] x86/spec-ctrl: IPBP improvements

2023-01-25 Thread Jan Beulich
Versions of the two final patches were submitted standalone earlier
on. The series here tries to carry out a suggestion from Andrew,
which the two of us have been discussing. Then said previously posted
patches are re-based on top, utilizing the new functionality.

1: spec-ctrl: add logic to issue IBPB on exit to guest
2: spec-ctrl: defer context-switch IBPB until guest entry
3: limit issuing of IBPB during context switch
4: PV: issue branch prediction barrier when switching 64-bit guest to kernel mode

Jan
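
For readers who want the "one-shot" behaviour of patch 1's DO_SPEC_CTRL_EXIT_IBPB macro spelled out, the following is a minimal C sketch of the same idea: test the SCF_exit_ibpb bit, clear it in the same step, and issue the barrier only if it was set. This is not code from the series. issue_ibpb(), the ibpb_issued counter and exit_ibpb() are hypothetical names, and the counter merely stands in for the WRMSR to MSR_PRED_CMD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCF_exit_ibpb_bit 6
#define SCF_exit_ibpb     (1u << SCF_exit_ibpb_bit)

/* Hypothetical stand-in for "wrmsr" to MSR_PRED_CMD: just counts barriers. */
unsigned int ibpb_issued;

void issue_ibpb(void)
{
    ibpb_issued++;
}

/*
 * Mirror of the btrl + jnc pair: read the flag, clear it unconditionally,
 * and issue the barrier only when the flag had been set.  The flags word
 * is modelled as 32 bits because the macro accesses it with a 32-bit
 * instruction.  Returns true if a barrier was issued.
 */
bool exit_ibpb(uint32_t *spec_ctrl_flags)
{
    bool pending = (*spec_ctrl_flags & SCF_exit_ibpb) != 0;

    *spec_ctrl_flags &= ~SCF_exit_ibpb;   /* one-shot: always cleared */
    if ( pending )
        issue_ibpb();

    return pending;
}
```

Calling exit_ibpb() twice in a row on the same flags word issues at most one barrier, and all other flag bits are left untouched.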



Xen Security Advisory 425 v1 (CVE-2022-42330) - Guests can cause Xenstore crash via soft reset

2023-01-25 Thread Xen . org security team
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Xen Security Advisory CVE-2022-42330 / XSA-425

Guests can cause Xenstore crash via soft reset

ISSUE DESCRIPTION
=================

When a guest issues a "Soft Reset" (e.g. for performing a kexec) the
libxl based Xen toolstack will normally perform a XS_RELEASE Xenstore
operation.

Due to a bug in xenstored this can result in a crash of xenstored.

Any other use of XS_RELEASE will have the same impact.

IMPACT
======

A malicious guest could try to kexec until it hits the xenstored bug,
resulting in the inability to perform any further domain administration
like starting new guests, or adding/removing resources to or from any
existing guest.

VULNERABLE SYSTEMS
==================

Only Xen version 4.17 is vulnerable. Systems running an older version
of Xen are not vulnerable.

All Xen systems using C xenstored are vulnerable. Systems using the
OCaml variant of xenstored are not vulnerable.

Systems running only PV guests (x86 only) are not vulnerable, as long as
they are using a libxl based toolstack.

MITIGATION
==========

The problem can be avoided by either:

- - using the OCaml xenstored variant

- - explicitly configuring guests to NOT perform the "Soft Reset" action
  by adding:
on_soft_reset="reboot"
  or similar to the guest's configuration. This will break kexec in the
  guest, though.

NOTE REGARDING LACK OF EMBARGO
==============================

This issue was discussed in public already.

RESOLUTION
==========

Applying the attached patch resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa425.patch   xen-unstable, Xen 4.17.x

$ sha256sum xsa425*
49f322c955fe7857cc824bba80625e56f582fdf0a4b244f513b6750e15ba5e48  xsa425.patch
$

-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmPRQroMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZEpsIAJmIVB2lvqT2Qdp0pPSoaJIxXxuGE320kVTWmudB
F2WbRCxeubqoOC/MyHTLOujMix6wBHnbm1cMQo0r4Vah/KX34vPS3wYqDZQYZtES
aEkOQ+214QLAS2futcT0gde9idKpShI9jjWSRwcH01a7V6tlwwidc4V0luUFV0iX
EKHPJ89rbbCMP1fOq5B+C7UP8oyiHItNWPWPFBwtUeXKvFiPOoyUPCoTHG8CCYHG
WiVbeaZab7x/9+WUwXJ6hZqZiVr6NqoaItOx9Nbw4yCHwJlAj2UfA9skmqtGbPbB
vxhkbIgOeiWoPvZgTGQjzZLosWO5+y30Fv5QYIbjA2/1OSQ=
=7kiM
-----END PGP SIGNATURE-----




Re: [PATCH v1 07/14] xen/riscv: introduce exception handlers implementation

2023-01-25 Thread Oleksii
On Mon, 2023-01-23 at 11:50 +0000, Andrew Cooper wrote:
> 
> 
> > +    /* Save context to stack */
> > +    REG_S   sp, (RISCV_CPU_USER_REGS_OFFSET(sp) - RISCV_CPU_USER_REGS_SIZE) (sp)
> > +    addi    sp, sp, -RISCV_CPU_USER_REGS_SIZE
> > +    REG_S   t0, RISCV_CPU_USER_REGS_OFFSET(t0)(sp)
> 
> Exceptions on RISC-V don't adjust the stack pointer.  This logic depends
> on interrupting Xen code, and Xen not having suffered a stack overflow
> (and actually, that the space on the stack for all registers also
> doesn't overflow).
> 
Perhaps I am missing something, but the idea of the code above was to
reserve memory on the stack to save the registers which can be changed
in __handler_expception(), as the code where the exception occurred
will expect the register values to be unchanged.
Otherwise, if we don't reserve memory on the stack, it will be corrupted by
REG_S, which is basically an SD instruction.





[linux-linus test] 176115: regressions - FAIL

2023-01-25 Thread osstest service owner
flight 176115 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/176115/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-examine  8 reboot   fail REGR. vs. 173462
 test-arm64-arm64-xl-vhd   8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-seattle   8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-xsm   8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-libvirt-xsm  8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-arndale   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-multivcpu  8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-vhd   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-credit1   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-libvirt-raw  8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-libvirt  8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-credit2   8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-credit1   8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl   8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-libvirt-raw  8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-examine  8 reboot   fail REGR. vs. 173462
 test-armhf-armhf-libvirt-qcow2  8 xen-boot   fail REGR. vs. 173462
 test-armhf-armhf-xl-credit2   8 xen-boot fail REGR. vs. 173462

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds  8 xen-boot fail REGR. vs. 173462

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop                fail like 173462
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop                fail like 173462
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop                fail like 173462
 test-amd64-amd64-qemuu-nested-amd    20 debian-hvm-install/l1/l2  fail like 173462
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop                fail like 173462
 test-amd64-amd64-libvirt-xsm         15 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt             15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-thunderx         15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-thunderx         16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2       14 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-raw         14 migrate-support-check     fail never pass
 test-armhf-armhf-xl-cubietruck       15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-cubietruck       16 saverestore-support-check fail never pass

version targeted for testing:
 linux                948ef7bb70c4acaf74d87420ea3a1190862d4548
baseline version:
 linux                9d84bb40bcb30a7fa16f33baa967aeb9953dda78

Last test of basis   173462  2022-10-07 18:41:45 Z  109 days
Failing since        173470  2022-10-08 06:21:34 Z  109 days  225 attempts
Testing same since   176115  2023-01-25 03:57:20 Z    0 days    1 attempts


3442 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-arm64-pvops  pass
 build-armhf-pvops  pass
 build-i386-pvops  pass
 test-amd64-amd64-xl  pass
 test-amd64-coresched-amd64-xl  pass
 test-arm64-arm64-xl  fail
 test-armhf-armhf-xl  fail
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 

Re: [RFC PATCH 0/8] SVE feature for arm guests

2023-01-25 Thread Julien Grall

Hi Bertrand,

On 25/01/2023 13:21, Bertrand Marquis wrote:
>> On 13 Jan 2023, at 09:44, Julien Grall wrote:
>>
>> Hi Luca,
>>
>> On 12/01/2023 11:58, Luca Fancellu wrote:
>>>> On 11 Jan 2023, at 16:59, Julien Grall wrote:
>>>> On 11/01/2023 14:38, Luca Fancellu wrote:

>>>>> This serie is introducing the possibility for Dom0 and DomU guests to use
>>>>> sve/sve2 instructions.
>>>>> SVE feature introduces new instruction and registers to improve performances on
>>>>> floating point operations.
>>>>> The SVE feature is advertised using the ID_AA64PFR0_EL1 register, SVE field, and
>>>>> when available the ID_AA64ZFR0_EL1 register provides additional information
>>>>> about the implemented version and other SVE feature.
>>>>> New registers added by the SVE feature are Z0-Z31, P0-P15, FFR, ZCR_ELx.
>>>>> Z0-Z31 are scalable vector register whose size is implementation defined and
>>>>> goes from 128 bits to maximum 2048, the term vector length will be used to refer
>>>>> to this quantity.
>>>>> P0-P15 are predicate registers and the size is the vector length divided by 8,
>>>>> same size is the FFR (First Fault Register).
>>>>> ZCR_ELx is a register that can control and restrict the maximum vector length
>>>>> used by the  exception level and all the lower exception levels, so for
>>>>> example EL3 can restrict the vector length usable by EL3,2,1,0.
>>>>> The platform has a maximum implemented vector length, so for every value
>>>>> written in ZCR register, if this value is above the implemented length, then the
>>>>> lower value will be used. The RDVL instruction can be used to check what vector
>>>>> length is the HW using after setting ZCR.
>>>>> For an SVE guest, the V0-V31 registers are part of the Z0-Z31, so there is no
>>>>> need to save them separately, saving Z0-Z31 will save implicitly also V0-V31.
>>>>> SVE usage can be trapped using a flag in CPTR_EL2, hence in this serie the
>>>>> register is added to the domain state, to be able to trap only the guests that
>>>>> are not allowed to use SVE.
>>>>> This serie is introducing a command line parameter to enable Dom0 to use SVE and
>>>>> to set its maximum vector length that by default is 0 which means the guest is
>>>>> not allowed to use SVE. Values from 128 to 2048 mean the guest can use SVE with
>>>>> the selected value used as maximum allowed vector length (which could be lower
>>>>> if the implemented one is lower).
>>>>> For DomUs, an XL parameter with the same way of use is introduced and a dom0less
>>>>> DTB binding is created.
>>>>> The context switch is the most critical part because there can be big registers
>>>>> to be saved, in this serie an easy approach is used and the context is
>>>>> saved/restored every time for the guests that are allowed to use SVE.


>>>> This would be OK for an initial approach. But I would be worry to officially
>>>> support SVE because of the potential large impact on other users.
>>>>
>>>> What's the long term plan?
>>> Hi Julien,
>>> For the future we can plan some work and decide together how to handle the
>>> context switch, we might need some suggestions from you (arm maintainers)
>>> to design that part in the best way for functional and security perspective.
>> I think SVE will need to be lazily saved/restored. So on context switch, we
>> would tell that the context belongs to the a previous domain. The first time
>> after the current domain tries to access SVE, then we would load it.


> We should try to prevent this kind of thing because it makes the real-time
> analysis a lot more complex.


The choice of SVE (including the vector length) is per-domain. If all
the VMs are using the same vector length, then the delay would indeed be
fixed. Otherwise, the delay will vary depending on the scheduling choice.

It is not clear to me how this is better for real-time analysis.


> The only use case where this would make the system a lot faster is if there is
> only one guest using SVE (which might be a use case); other than that, this
> will just create delays when someone else is trying to use SVE instead of
> having a fixed delay at context switch.

Even in the case you mention, I think it will highly depend on the cost
of context switching SVE. I have been told this is quite large, and one
surely doesn't want to spend an extra thousand cycles when receiving an
interrupt (I don't expect the handler to use SVE).


I think we need to understand the workload (and cost) in order to decide 
whether it should be eager/lazy.


At least, I know that in Linux, only the parts common with VFP are
guaranteed to be preserved (see [1]). So the expectation seems to be that
SVE use will be short-lived.







>>> For now we might flag the feature as unsupported, explaining in the Kconfig
>>> help that switching between SVE and non-SVE guests, or between SVE guests,
>>> might add latency compared to switching between non-SVE guests.
>>
>> I am OK with that. I actually like the idea to spell it out because that helps
>> us to remember what are the gaps in the code :).
>
> I like this solution too.
>
> Cheers
> Bertrand
>
>>
>> Cheers,
>>
>> --
>> Julien Grall




[1] https://www.kernel.org/doc/Documentation/arm64/sve.txt

--
Julien Grall
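
To put rough numbers on the "big registers to be saved" point from the quoted cover letter, here is a small C sketch that computes the SVE register state size from the figures given there: 32 Z registers of one vector length each, 16 P registers of one eighth of the vector length each, and an FFR the same size as a P register. The function names are invented for this illustration, and sve_effective_vl() is a deliberately simplified model of the ZCR clamping described in the cover letter, not the architectural rule in full.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of the ZCR behaviour described above: a requested
 * vector length above the implemented maximum is reduced to that maximum.
 * Both values are in bits.
 */
unsigned int sve_effective_vl(unsigned int requested_bits,
                              unsigned int implemented_bits)
{
    return requested_bits < implemented_bits ? requested_bits
                                             : implemented_bits;
}

/* Bytes needed to hold Z0-Z31, P0-P15 and FFR for a given vector length. */
size_t sve_state_bytes(unsigned int vl_bits)
{
    size_t z_bytes   = 32 * (size_t)(vl_bits / 8);   /* 32 regs, VL bits each   */
    size_t p_bytes   = 16 * (size_t)(vl_bits / 64);  /* 16 regs, VL/8 bits each */
    size_t ffr_bytes = vl_bits / 64;                 /* same size as one P reg  */

    return z_bytes + p_bytes + ffr_bytes;
}
```

Under these assumptions the state is 546 bytes at the minimum vector length of 128 bits and 8736 bytes at the maximum of 2048 bits, which is the amount an eager scheme would copy in each direction on every switch involving an SVE guest.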



[libvirt test] 176116: tolerable FAIL - PUSHED

2023-01-25 Thread osstest service owner
flight 176116 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/176116/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt-raw     7 xen-install               fail like 176085
 test-armhf-armhf-libvirt       16 saverestore-support-check fail like 176085
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 176085
 test-armhf-armhf-libvirt-raw   15 saverestore-support-check fail like 176085
 test-amd64-amd64-libvirt       15 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-xsm    15 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-raw   14 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-raw   15 saverestore-support-check fail never pass
 test-amd64-i386-libvirt        15 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-xsm   15 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm   15 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-xsm   16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt       15 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt       16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd   14 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt       15 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt-raw   14 migrate-support-check     fail never pass

version targeted for testing:
 libvirt  d5ecc2aa779d48be32bba51a6c8c16635c52721d
baseline version:
 libvirt  7b5777afcbe508a15a509444ff6e951e7201f321

Last test of basis   176085  2023-01-24 04:20:12 Z    1 days
Testing same since   176116  2023-01-25 04:18:48 Z    0 days    1 attempts


People who touched revisions under test:
  Brooks Swinnerton 
  Daniel Henrique Barboza 
  Martin Kletzander 
  Michal Privoznik 
  Peter Krempa 
  Shaleen Bathla 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-arm64-pvops  pass
 build-armhf-pvops  pass
 build-i386-pvops  pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  pass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pair  pass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-i386-libvirt-raw  fail
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at

Re: [RFC PATCH 0/8] SVE feature for arm guests

2023-01-25 Thread Bertrand Marquis
Hi Julien,

> On 13 Jan 2023, at 09:44, Julien Grall  wrote:
> 
> Hi Luca,
> 
> On 12/01/2023 11:58, Luca Fancellu wrote:
>>> On 11 Jan 2023, at 16:59, Julien Grall  wrote:
>>> On 11/01/2023 14:38, Luca Fancellu wrote:
>>>> This serie is introducing the possibility for Dom0 and DomU guests to use
>>>> sve/sve2 instructions.
>>>> SVE feature introduces new instruction and registers to improve performances on
>>>> floating point operations.
>>>> The SVE feature is advertised using the ID_AA64PFR0_EL1 register, SVE field, and
>>>> when available the ID_AA64ZFR0_EL1 register provides additional information
>>>> about the implemented version and other SVE feature.
>>>> New registers added by the SVE feature are Z0-Z31, P0-P15, FFR, ZCR_ELx.
>>>> Z0-Z31 are scalable vector register whose size is implementation defined and
>>>> goes from 128 bits to maximum 2048, the term vector length will be used to refer
>>>> to this quantity.
>>>> P0-P15 are predicate registers and the size is the vector length divided by 8,
>>>> same size is the FFR (First Fault Register).
>>>> ZCR_ELx is a register that can control and restrict the maximum vector length
>>>> used by the  exception level and all the lower exception levels, so for
>>>> example EL3 can restrict the vector length usable by EL3,2,1,0.
>>>> The platform has a maximum implemented vector length, so for every value
>>>> written in ZCR register, if this value is above the implemented length, then the
>>>> lower value will be used. The RDVL instruction can be used to check what vector
>>>> length is the HW using after setting ZCR.
>>>> For an SVE guest, the V0-V31 registers are part of the Z0-Z31, so there is no
>>>> need to save them separately, saving Z0-Z31 will save implicitly also V0-V31.
>>>> SVE usage can be trapped using a flag in CPTR_EL2, hence in this serie the
>>>> register is added to the domain state, to be able to trap only the guests that
>>>> are not allowed to use SVE.
>>>> This serie is introducing a command line parameter to enable Dom0 to use SVE and
>>>> to set its maximum vector length that by default is 0 which means the guest is
>>>> not allowed to use SVE. Values from 128 to 2048 mean the guest can use SVE with
>>>> the selected value used as maximum allowed vector length (which could be lower
>>>> if the implemented one is lower).
>>>> For DomUs, an XL parameter with the same way of use is introduced and a dom0less
>>>> DTB binding is created.
>>>> The context switch is the most critical part because there can be big registers
>>>> to be saved, in this serie an easy approach is used and the context is
>>>> saved/restored every time for the guests that are allowed to use SVE.
>>> 
>>> This would be OK for an initial approach. But I would be worry to 
>>> officially support SVE because of the potential large impact on other users.
>>> 
>>> What's the long term plan?
>> Hi Julien,
>> For the future we can plan some work and decide together how to handle the 
>> context switch,
>> we might need some suggestions from you (arm maintainers) to design that 
>> part in the best
>> way for functional and security perspective.
> I think SVE will need to be lazily saved/restored. So on context switch, we 
> would tell that the context belongs to the a previous domain. The first time 
> after the current domain tries to access SVE, then we would load it.

We should try to prevent this kind of thing because it makes the real-time
analysis a lot more complex.
The only use case where this would make the system a lot faster is if there is
only one guest using SVE (which might be a use case); other than that, this
will just create delays when someone else is trying to use SVE instead of
having a fixed delay at context switch.

> 
>> For now we might flag the feature as unsupported, explaining in the Kconfig 
>> help that switching
>> between SVE and non-SVE guests, or between SVE guests, might add latency 
>> compared to
>> switching between non-SVE guests.
> 
> I am OK with that. I actually like the idea to spell it out because that 
> helps us to remember what are the gaps in the code :).

I like this solution too.

Cheers
Bertrand

> 
> Cheers,
> 
> -- 
> Julien Grall




Re: [PATCH v4 01/11] xen/common: add cache coloring common code

2023-01-25 Thread Jan Beulich
On 25.01.2023 12:18, Carlo Nonato wrote:
> On Tue, Jan 24, 2023 at 5:37 PM Jan Beulich  wrote:
>> On 23.01.2023 16:47, Carlo Nonato wrote:
>>> --- /dev/null
>>> +++ b/xen/include/xen/llc_coloring.h
>>> @@ -0,0 +1,54 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +/*
>>> + * Last Level Cache (LLC) coloring common header
>>> + *
>>> + * Copyright (C) 2022 Xilinx Inc.
>>> + *
>>> + * Authors:
>>> + *Carlo Nonato 
>>> + */
>>> +#ifndef __COLORING_H__
>>> +#define __COLORING_H__
>>> +
>>> +#include 
>>> +#include 
>>> +
>>> +#ifdef CONFIG_HAS_LLC_COLORING
>>> +
>>> +#include 
>>> +
>>> +extern bool llc_coloring_enabled;
>>> +
>>> +int domain_llc_coloring_init(struct domain *d, unsigned int *colors,
>>> + unsigned int num_colors);
>>> +void domain_llc_coloring_free(struct domain *d);
>>> +void domain_dump_llc_colors(struct domain *d);
>>> +
>>> +#else
>>> +
>>> +#define llc_coloring_enabled (false)
>>
>> While I agree this is needed, ...
>>
>>> +static inline int domain_llc_coloring_init(struct domain *d,
>>> +   unsigned int *colors,
>>> +   unsigned int num_colors)
>>> +{
>>> +return 0;
>>> +}
>>> +static inline void domain_llc_coloring_free(struct domain *d) {}
>>> +static inline void domain_dump_llc_colors(struct domain *d) {}
>>
>> ... I don't think you need any of these. Instead the declarations above
>> simply need to be visible unconditionally (to be visible to the compiler
>> when processing consuming code). We rely on DCE to remove such references
>> in many other places.
> 
> So this is true for any other stub function that I used in the series, right?

Likely. I didn't look at most of the Arm-only pieces.

>>> --- a/xen/include/xen/sched.h
>>> +++ b/xen/include/xen/sched.h
>>> @@ -602,6 +602,9 @@ struct domain
>>>
>>>  /* Holding CDF_* constant. Internal flags for domain creation. */
>>>  unsigned int cdf;
>>> +
>>> +unsigned int *llc_colors;
>>> +unsigned int num_llc_colors;
>>>  };
>>
>> Why outside of any #ifdef, and why not in struct arch_domain?
> 
> Moving this in sched.h seemed like the natural continuation of the common +
> arch specific split. Notice that this split is also because Julien pointed
> out (as you did in some earlier revision) that cache coloring can be used
> by other arch in the future (even if x86 is excluded). Having two maintainers
> saying the same thing sounded like a good reason to do that.

If you mean this to be usable by other arch-es as well (which I would
welcome, as I think I had expressed on an earlier version), then I think
more pieces want to be in common code. But putting the fields here and all
users of them in arch-specific code (which I think is the way I saw it)
doesn't look very logical to me. IOW to me there exist only two possible
approaches: As much as possible in common code, or common code being
disturbed as little as possible.

> The missing #ifdef comes from a discussion I had with Julien in v2 about
> domctl interface where he suggested removing it
> (https://marc.info/?l=xen-devel&m=166151802002263).

I went about five levels deep in the replies, without finding any such reply
from Julien. Can you please be more specific with the link, so readers don't
need to endlessly dig?

Jan

> We were talking about
> a different struct, but I thought the principle was the same. Anyway I would
> like the #ifdef too.
> 
> So @Jan, @Julien, can you help me fix this once for all?
> 
> Thanks.
> 
> - Carlo Nonato




Re: [XEN PATCH v2 0/3] Configure qemu upstream correctly by default for igd-passthru

2023-01-25 Thread Anthony PERARD
On Tue, Jan 10, 2023 at 02:32:01AM -0500, Chuck Zmudzinski wrote:
> I call attention to the commit message of the first patch which points
> out that using the "pc" machine and adding the xen platform device on
> the qemu upstream command line is not functionally equivalent to using
> the "xenfv" machine which automatically adds the xen platform device
> earlier in the guest creation process. As a result, there is a noticeable
> reduction in the performance of the guest during startup with the "pc"
> machine type even if the xen platform device is added via the qemu
> command line options, although eventually both Linux and Windows guests
> perform equally well once the guest operating system is fully loaded.

There shouldn't be a difference between using the "xenfv" machine and
using the "pc" machine with the "xen-platform" device added, at least with
regard to access to disk or network.

The first patch of the series is using the "pc" machine without any
"xen-platform" device, so we can't compare startup performance based on
that.

> Specifically, startup time is longer and neither the grub vga drivers
> nor the windows vga drivers in early startup perform as well when the
> xen platform device is added via the qemu command line instead of being
> added immediately after the other emulated i440fx pci devices when the
> "xenfv" machine type is used.

The "xen-platform" device is mostly a hint to a guest that it can use
pv-disk and pv-network devices. I don't think it would change anything
with regard to graphics.

> For example, when using the "pc" machine, which adds the xen platform
> device using a command line option, the Linux guest could not display
> the grub boot menu at the native resolution of the monitor, but with the
> "xenfv" machine, the grub menu is displayed at the full 1920x1080
> native resolution of the monitor for testing. So improved startup
> performance is an advantage for the patch for qemu.

I've just found out that when doing IGD passthrough, the "xenfv" and
"pc" machines are much more different than I thought ... :-(
pc_xen_hvm_init_pci() in QEMU changes the pci-host device, which in
turn copies some information from the real host bridge.
I guess this new host bridge helps when the firmware sets up the graphics
for grub.

> I also call attention to the last point of the commit message of the
> second patch and the comments for reviewers section of the second patch.
> This approach, as opposed to fixing this in qemu upstream, makes
> maintaining the code in libxl__build_device_model_args_new more
> difficult and therefore increases the chances of problems caused by
> coding errors and typos for users of libxl. So that is another advantage
> of the patch for qemu.

We would just need to use a different approach in libxl when generating
the command line. We could probably avoid duplication. I was hoping to
have a patch series for libxl that would change the machine used, to start
using "pc" instead of "xenfv" for all configurations, but based on the
point above (IGD specific change to "xenfv"), I guess we can't
really do anything from libxl to fix IGD passthrough.

> OTOH, fixing this in qemu causes newer qemu versions to behave
> differently than previous versions of qemu, which the qemu community
> does not like, although they seem OK with the other patch since it only
> affects qemu "xenfv" machine types, but they do not want the patch to
> affect toolstacks like libvirt that do not use qemu upstream's
> autoconfiguration options as much as libxl does, and, of course, libvirt
> can manage qemu "xenfv" machines so existing "xenfv" guests configured
> manually by libvirt could be adversely affected by the patch to qemu,
> but only if those same guests are also configured for igd-passthrough,
> which is likely a very small number of possibly affected libvirt users
> of qemu.
> 
> A year or two ago I tried to configure guests for pci passthrough on xen
> using libvirt's tool to convert a libxl xl.cfg file to libvirt xml. It
> could not convert an xl.cfg file with a configuration item
> pci = [ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ...] for pci passthrough.
> So it is unlikely there are any users out there using libvirt to
> configure xen hvm guests for igd passthrough on xen, and those are the
> only users that could be adversely affected by the simpler patch to qemu
> to fix this.

FYI, libvirt should be using libxl to create guests; I don't think there
is another way for libvirt to create Xen guests.



So overall, unfortunately the "pc" machine in QEMU isn't suitable for
IGD passthrough, as the "xenfv" machine already has some workarounds to
make IGD work and just needs some more.

I've seen that the patch for QEMU is now reviewed, so I'll look at having
it merged soonish.

Thanks,

-- 
Anthony PERARD



Re: [PATCH v2 1/3] xen/arm: Add memory overlap check for bootinfo.reserved_mem

2023-01-25 Thread Julien Grall




On 14/12/2022 03:16, Henry Wang wrote:

As we are having more and more types of static region, and all of
these static regions are defined in bootinfo.reserved_mem, it is
necessary to add the overlap check of reserved memory regions in Xen,
because such check will help user to identify the misconfiguration in
the device tree at the early stage of boot time.

Currently we have 3 types of static region, namely
(1) static memory
(2) static heap
(3) static shared memory

(1) and (2) are parsed by the function `device_tree_get_meminfo()` and
(3) is parsed using its own logic. All of parsed information of these
types will be stored in `struct meminfo`.

Therefore, to unify the overlap checking logic for all of these types,
this commit firstly introduces a helper `meminfo_overlap_check()` and
a function `check_reserved_regions_overlap()` to check if an input
physical address range is overlapping with the existing memory regions
defined in bootinfo. After that, use `check_reserved_regions_overlap()`
in `device_tree_get_meminfo()` to do the overlap check of (1) and (2)
and replace the original overlap check of (3) with
`check_reserved_regions_overlap()`.

Signed-off-by: Henry Wang 
---
v1 -> v2:
1. Split original `overlap_check()` to `meminfo_overlap_check()`.
2. Rework commit message.
---
  xen/arch/arm/bootfdt.c   | 13 +-
  xen/arch/arm/include/asm/setup.h |  2 ++
  xen/arch/arm/setup.c | 42 
  3 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index 0085c28d74..e2f6c7324b 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -88,6 +88,9 @@ static int __init device_tree_get_meminfo(const void *fdt, int node,
  for ( i = 0; i < banks && mem->nr_banks < NR_MEM_BANKS; i++ )
  {
  device_tree_get_reg(&cell, address_cells, size_cells, &start, &size);
+if ( mem == &bootinfo.reserved_mem &&
+ check_reserved_regions_overlap(start, size) )
+return -EINVAL;
  /* Some DT may describe empty bank, ignore them */
  if ( !size )
  continue;
@@ -482,7 +485,9 @@ static int __init process_shm_node(const void *fdt, int 
node,
  return -EINVAL;
  }
  
-if ( (end <= mem->bank[i].start) || (paddr >= bank_end) )

+if ( check_reserved_regions_overlap(paddr, size) )
+return -EINVAL;
+else
  {
  if ( strcmp(shm_id, mem->bank[i].shm_id) != 0 )
  continue;
@@ -493,12 +498,6 @@ static int __init process_shm_node(const void *fdt, int 
node,
  return -EINVAL;
  }
  }
-else
-{
-printk("fdt: shared memory region overlap with an existing entry %#"PRIpaddr" 
- %#"PRIpaddr"\n",
-mem->bank[i].start, bank_end);
-return -EINVAL;
-}
  }
  }
  
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h

index fdbf68aadc..6a9f88ecbb 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -143,6 +143,8 @@ void fw_unreserved_regions(paddr_t s, paddr_t e,
  size_t boot_fdt_info(const void *fdt, paddr_t paddr);
  const char *boot_fdt_cmdline(const void *fdt);
  
+int check_reserved_regions_overlap(paddr_t region_start, paddr_t region_size);

+
  struct bootmodule *add_boot_module(bootmodule_kind kind,
 paddr_t start, paddr_t size, bool domU);
  struct bootmodule *boot_module_find_by_kind(bootmodule_kind kind);
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 1f26f67b90..e6eeb3a306 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -261,6 +261,31 @@ static void __init dt_unreserved_regions(paddr_t s, 
paddr_t e,
  cb(s, e);
  }
  
+static int __init meminfo_overlap_check(struct meminfo *meminfo,

+paddr_t region_start,
+paddr_t region_end)
+{
+paddr_t bank_start = INVALID_PADDR, bank_end = 0;
+unsigned int i, bank_num = meminfo->nr_banks;
+
+for ( i = 0; i < bank_num; i++ )
+{
+bank_start = meminfo->bank[i].start;
+bank_end = bank_start + meminfo->bank[i].size;
+
+if ( region_end <= bank_start || region_start >= bank_end )
+continue;
+else
+{
+printk("Region %#"PRIpaddr" - %#"PRIpaddr" overlapping with bank[%u] %#"PRIpaddr" 
- %#"PRIpaddr"\n",


AFAICT, in messages, the end would be inclusive. But here...


+   region_start, region_end, i, bank_start, bank_end);


... it would be exclusive. I would suggest to print using the format 
[start, end[ or decrement the value by 1.
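
Either option could be sketched roughly as below. This is only an
illustration: format_region_inclusive() and format_region_half_open() are
hypothetical helpers, paddr_t is a local stand-in for Xen's type, and plain
PRIx64 is used instead of PRIpaddr to keep the sketch self-contained.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t paddr_t; /* stand-in for Xen's paddr_t (assumption) */

/*
 * Option 1: print an inclusive end by decrementing the exclusive end.
 * The caller must ensure end_excl > start.
 */
static int format_region_inclusive(char *buf, size_t len,
                                   paddr_t start, paddr_t end_excl)
{
    return snprintf(buf, len, "%#" PRIx64 " - %#" PRIx64,
                    start, end_excl - 1);
}

/* Option 2: keep the exclusive end and make that explicit: [start, end[ */
static int format_region_half_open(char *buf, size_t len,
                                   paddr_t start, paddr_t end_excl)
{
    return snprintf(buf, len, "[%#" PRIx64 ", %#" PRIx64 "[",
                    start, end_excl);
}
```

For a region starting at 0x40000000 with an exclusive end of 0x50000000, the
first helper yields an end of 0x4fffffff and the second prints the end
unchanged inside half-open brackets.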


Cheers,

--
Julien Grall



Re: Usage of Xen Security Data in VulnerableCode

2023-01-25 Thread George Dunlap
On Thu, Jan 19, 2023 at 1:10 PM Tushar Goel 
wrote:

> Hi Andrew,
>
> > Maybe we want to make it CC-BY-4 to require people to reference back to
> > the canonical upstream ?
> Thanks for your response, can we have a more declarative statement on
> the license from your end
> and also can you please provide your acknowledgement over the usage of
> Xen security data in vulnerablecode.
>

Hey Tushar,

Informally, the Xen Project Security Team is happy for you to include the
data from xsa.json in your open-source vulnerability database.  As a
courtesy we'd request that it be documented where the information came
from.  (I think if the data includes links to the advisories on our
website, that will suffice.)

Formally, we're not copyright lawyers; but we don't think there's anything
copyright-able in the xsa.json: There is no editorial or creative control
in the generation of that file; it's just a collection of facts which you
could re-generate by scanning all the advisories.  (In fact that's exactly
how the file is created; i.e., the collection of advisory texts is our
"source of truth".)

We do have "Officially license all advisory text as CC-BY-4" on our to-do
list; if you'd be more comfortable with an official license for xsa.json as
well, we can add that to the list.

 -George


Re: [PATCH v2 1/3] xen/arm: Add memory overlap check for bootinfo.reserved_mem

2023-01-25 Thread Julien Grall

Hi Henry,

On 14/12/2022 03:16, Henry Wang wrote:

As we are having more and more types of static region, and all of
these static regions are defined in bootinfo.reserved_mem, it is
necessary to add the overlap check of reserved memory regions in Xen,
because such check will help user to identify the misconfiguration in
the device tree at the early stage of boot time.

Currently we have 3 types of static region, namely
(1) static memory
(2) static heap
(3) static shared memory

(1) and (2) are parsed by the function `device_tree_get_meminfo()` and
(3) is parsed using its own logic. All of parsed information of these
types will be stored in `struct meminfo`.

Therefore, to unify the overlap checking logic for all of these types,
this commit firstly introduces a helper `meminfo_overlap_check()` and
a function `check_reserved_regions_overlap()` to check if an input
physical address range is overlapping with the existing memory regions
defined in bootinfo. After that, use `check_reserved_regions_overlap()`
in `device_tree_get_meminfo()` to do the overlap check of (1) and (2)
and replace the original overlap check of (3) with
`check_reserved_regions_overlap()`.

Signed-off-by: Henry Wang 
---
v1 -> v2:
1. Split original `overlap_check()` to `meminfo_overlap_check()`.
2. Rework commit message.
---
  xen/arch/arm/bootfdt.c   | 13 +-
  xen/arch/arm/include/asm/setup.h |  2 ++
  xen/arch/arm/setup.c | 42 
  3 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index 0085c28d74..e2f6c7324b 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -88,6 +88,9 @@ static int __init device_tree_get_meminfo(const void *fdt, 
int node,
  for ( i = 0; i < banks && mem->nr_banks < NR_MEM_BANKS; i++ )
  {
  device_tree_get_reg(&cell, address_cells, size_cells, &start, &size);
+if ( mem == &bootinfo.reserved_mem &&
+ check_reserved_regions_overlap(start, size) )
+return -EINVAL;
  /* Some DT may describe empty bank, ignore them */
  if ( !size )
  continue;
@@ -482,7 +485,9 @@ static int __init process_shm_node(const void *fdt, int 
node,
  return -EINVAL;
  }
  
-if ( (end <= mem->bank[i].start) || (paddr >= bank_end) )

+if ( check_reserved_regions_overlap(paddr, size) )
+return -EINVAL;
+else
  {
  if ( strcmp(shm_id, mem->bank[i].shm_id) != 0 )
  continue;
@@ -493,12 +498,6 @@ static int __init process_shm_node(const void *fdt, int 
node,
  return -EINVAL;
  }
  }
-else
-{
-printk("fdt: shared memory region overlap with an existing entry %#"PRIpaddr" 
- %#"PRIpaddr"\n",
-mem->bank[i].start, bank_end);
-return -EINVAL;
-}
  }
  }
  
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h

index fdbf68aadc..6a9f88ecbb 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -143,6 +143,8 @@ void fw_unreserved_regions(paddr_t s, paddr_t e,
  size_t boot_fdt_info(const void *fdt, paddr_t paddr);
  const char *boot_fdt_cmdline(const void *fdt);
  
+int check_reserved_regions_overlap(paddr_t region_start, paddr_t region_size);

+
  struct bootmodule *add_boot_module(bootmodule_kind kind,
 paddr_t start, paddr_t size, bool domU);
  struct bootmodule *boot_module_find_by_kind(bootmodule_kind kind);
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 1f26f67b90..e6eeb3a306 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -261,6 +261,31 @@ static void __init dt_unreserved_regions(paddr_t s, 
paddr_t e,
  cb(s, e);
  }
  
+static int __init meminfo_overlap_check(struct meminfo *meminfo,

+paddr_t region_start,
+paddr_t region_end)


I am starting to dislike the use of 'end' for a couple of reasons:
  1) It is never clear whether it is inclusive or exclusive
  2) When it is exclusive, it does not work properly if the region
finishes at (2^64 - 1), as 'end' would be 0


I have started to clean up the Arm code to avoid all those issues. So
for new code, I would prefer that we use 'start' and 'size' to
describe a region.
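
To illustrate what I mean, here is a rough sketch of an overlap check
written purely in terms of 'start' and 'size'. regions_overlap() is a
hypothetical name, paddr_t is a local stand-in, and the sketch assumes that
each region on its own does not wrap past the top of the address space.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t paddr_t; /* stand-in for Xen's paddr_t (assumption) */

/*
 * Return true if the regions (a_start, a_size) and (b_start, b_size)
 * share at least one byte. An empty region never overlaps anything.
 *
 * Only the inclusive last byte (start + size - 1) is computed, so a
 * region finishing at 2^64 - 1 is handled without 'end' becoming 0.
 */
static bool regions_overlap(paddr_t a_start, paddr_t a_size,
                            paddr_t b_start, paddr_t b_size)
{
    paddr_t a_last, b_last;

    if ( a_size == 0 || b_size == 0 )
        return false;

    a_last = a_start + (a_size - 1);
    b_last = b_start + (b_size - 1);

    return a_start <= b_last && b_start <= a_last;
}
```

With an exclusive 'end', a region covering the last page of a 64-bit address
space gets end == 0, so a test like "region_end <= bank_start" would wrongly
report no overlap. The inclusive form above avoids that case.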



+{
+paddr_t bank_start = INVALID_PADDR, bank_end = 0;
+unsigned int i, bank_num = meminfo->nr_banks;
+
+for ( i = 0; i < bank_num; i++ )
+{
+bank_start = meminfo->bank[i].start;
+bank_end = bank_start + meminfo->bank[i].size;
+
+if ( region_end <= bank_start || region_start >= bank_end )
+continue;
+else
+{
+printk("Region %#"PRIpaddr" - %#"PRIpaddr" overlapping with bank[%u] %#"PRIpaddr" 
- 

Re: [XEN v5] xen/arm: Probe the load/entry point address of an uImage correctly

2023-01-25 Thread Ayan Kumar Halder

Hi Stefano,

On 20/01/2023 22:28, Stefano Stabellini wrote:

On Fri, 13 Jan 2023, Ayan Kumar Halder wrote:

Currently, kernel_uimage_probe() does not read the load/entry point address
set in the uImage header. Thus, info->zimage.start is 0 (default value). This
causes kernel_zimage_place() to treat the binary (contained within uImage)
as position independent executable. Thus, it loads it at an incorrect
address.

The correct approach would be to read "uimage.load" and set
info->zimage.start. This will ensure that the binary is loaded at the
correct address. Also, read "uimage.ep" and set info->entry (ie kernel entry
address).

If user provides load address (ie "uimage.load") as 0x0, then the image is
treated as position independent executable. Xen can load such an image at
any address it considers appropriate. A position independent executable
cannot have a fixed entry point address.

This behavior is applicable for both arm32 and arm64 platforms.

Earlier for arm32 and arm64 platforms, Xen was ignoring the load and entry
point address set in the uImage header. With this commit, Xen will use them.
This makes the behavior of Xen consistent with uboot for uimage headers.

Users who want to use Xen with statically partitioned domains, can provide
non zero load address and entry address for the dom0/domU kernel. It is
required that the load and entry address provided must be within the memory
region allocated by Xen.

A deviation from uboot behaviour is that we consider load address == 0x0,
to denote that the image supports position independent execution. This
is to make the behavior consistent across uImage and zImage.

Signed-off-by: Ayan Kumar Halder 
---

Changes from v1 :-
1. Added a check to ensure load address and entry address are the same.
2. Considered load address == 0x0 as position independent execution.
3. Ensured that the uImage header interpretation is consistent across
arm32 and arm64.

v2 :-
1. Mentioned the change in existing behavior in booting.txt.
2. Updated booting.txt with a new section to document "Booting Guests".

v3 :-
1. Removed the constraint that the entry point should be same as the load
address. Thus, Xen uses both the load address and entry point to determine
where the image is to be copied and the start address.
2. Updated documentation to denote that load address and start address
should be within the memory region allocated by Xen.
3. Added constraint that user cannot provide entry point for a position
independent executable (PIE) image.

v4 :-
1. Explicitly mentioned the version in booting.txt from when the uImage
probing behavior has changed.
2. Logged the requested load address and entry point parsed from the uImage
header.
3. Some style issues.

  docs/misc/arm/booting.txt | 26 
  xen/arch/arm/include/asm/kernel.h |  2 +-
  xen/arch/arm/kernel.c | 49 +--
  3 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/docs/misc/arm/booting.txt b/docs/misc/arm/booting.txt
index 3e0c03e065..aeb0123e8d 100644
--- a/docs/misc/arm/booting.txt
+++ b/docs/misc/arm/booting.txt
@@ -23,6 +23,28 @@ The exceptions to this on 32-bit ARM are as follows:
  
  There are no exception on 64-bit ARM.
  
+Booting Guests

+--
+
+Xen supports the legacy image header[3], zImage protocol for 32-bit
+ARM Linux[1] and Image protocol defined for ARM64[2].
+
+Until Xen 4.17, in case of legacy image protocol, Xen ignored the load
+address and entry point specified in the header. This has now changed.
+
+Now, it loads the image at the load address provided in the header.
+And the entry point is used as the kernel start address.
+
+A deviation from uboot is that, Xen treats "load address == 0x0" as
+position independent execution (PIE). Thus, Xen will load such an image
+at an address it considers appropriate. Also, user cannot specify the
+entry point of a PIE image since the start address cannot be
+predetermined.
+
+Users who want to use Xen with statically partitioned domains, can provide
+the fixed non zero load address and start address for the dom0/domU kernel.
+The load address and start address specified by the user in the header must
+be within the memory region allocated by Xen.
  
  Firmware/bootloader requirements

  
@@ -39,3 +61,7 @@ Latest version: 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/t
  
  [2] linux/Documentation/arm64/booting.rst

  Latest version: 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/arm64/booting.rst
+
+[3] legacy format header
+Latest version: 
https://source.denx.de/u-boot/u-boot/-/blob/master/include/image.h#L315
+https://linux.die.net/man/1/mkimage
diff --git a/xen/arch/arm/include/asm/kernel.h 
b/xen/arch/arm/include/asm/kernel.h
index 5bb30c3f2f..4617cdc83b 100644
--- a/xen/arch/arm/include/asm/kernel.h
+++ b/xen/arch/arm/include/asm/kernel.h
@@ -72,7 +72,7 @@ struct kernel_info {
  #ifdef 

[XEN v6] xen/arm: Probe the load/entry point address of an uImage correctly

2023-01-25 Thread Ayan Kumar Halder
Currently, kernel_uimage_probe() does not read the load/entry point address
set in the uImage header. Thus, info->zimage.start is 0 (default value). This
causes kernel_zimage_place() to treat the binary (contained within uImage)
as position independent executable. Thus, it loads it at an incorrect
address.

The correct approach would be to read "uimage.load" and set
info->zimage.start. This will ensure that the binary is loaded at the
correct address. Also, read "uimage.ep" and set info->entry (ie kernel entry
address).

If user provides load address (ie "uimage.load") as 0x0, then the image is
treated as position independent executable. Xen can load such an image at
any address it considers appropriate. A position independent executable
cannot have a fixed entry point address.

This behavior is applicable for both arm32 and arm64 platforms.

Earlier for arm32 and arm64 platforms, Xen was ignoring the load and entry
point address set in the uImage header. With this commit, Xen will use them.
This makes the behavior of Xen consistent with uboot for uimage headers.

Users who want to use Xen with statically partitioned domains, can provide
non zero load address and entry address for the dom0/domU kernel. It is
required that the load and entry address provided must be within the memory
region allocated by Xen.

A deviation from uboot behaviour is that we consider load address == 0x0,
to denote that the image supports position independent execution. This
is to make the behavior consistent across uImage and zImage.

Signed-off-by: Ayan Kumar Halder 
---

Changes from v1 :-
1. Added a check to ensure load address and entry address are the same.
2. Considered load address == 0x0 as position independent execution.
3. Ensured that the uImage header interpretation is consistent across
arm32 and arm64.

v2 :-
1. Mentioned the change in existing behavior in booting.txt.
2. Updated booting.txt with a new section to document "Booting Guests".

v3 :-
1. Removed the constraint that the entry point should be same as the load
address. Thus, Xen uses both the load address and entry point to determine
where the image is to be copied and the start address.
2. Updated documentation to denote that load address and start address
should be within the memory region allocated by Xen.
3. Added constraint that user cannot provide entry point for a position
independent executable (PIE) image.

v4 :-
1. Explicitly mentioned the version in booting.txt from when the uImage
probing behavior has changed.
2. Logged the requested load address and entry point parsed from the uImage
header.
3. Some style issues.

v5 :-
1. Set info->zimage.text_offset = 0 in kernel_uimage_probe().
2. Mentioned that if the kernel has a legacy image header on top of a
zImage/zImage64 header, then the attributes from the legacy image header are
used to determine the load address, entry point, etc. Thus, the zImage/zImage64
header is effectively ignored.

This is true because Xen currently does not support recursive probing of kernel
headers ie if uImage header is probed, then Xen will not attempt to see if there
is an underlying zImage/zImage64 header.
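
For readers following the thread, the load/entry point rules described above
could be sketched roughly as follows. This is not the code from the patch:
the enum, classify_uimage(), be32_decode() and addr_in_region() are
hypothetical names, and rejecting a non-zero entry point when the load
address is 0 is one reading of the "no entry point for a PIE image" rule.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t paddr_t; /* stand-in for Xen's paddr_t (assumption) */

/* Hypothetical result type, used by this sketch only. */
enum uimage_placement {
    UIMAGE_PLACE_ANYWHERE, /* load == 0: position independent, Xen chooses */
    UIMAGE_PLACE_FIXED,    /* load != 0: use the load address and entry point */
    UIMAGE_PLACE_INVALID,  /* load == 0 combined with a non-zero entry point */
};

/* Decode a big-endian 32-bit field, as stored in the legacy image header. */
static uint32_t be32_decode(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Decide how an image should be placed from its header fields. */
static enum uimage_placement classify_uimage(uint32_t load, uint32_t ep)
{
    if ( load == 0 )
        return ep == 0 ? UIMAGE_PLACE_ANYWHERE : UIMAGE_PLACE_INVALID;

    return UIMAGE_PLACE_FIXED;
}

/* True if addr lies inside the region described by (start, size). */
static bool addr_in_region(paddr_t addr, paddr_t start, paddr_t size)
{
    return size != 0 && addr >= start && addr - start < size;
}
```

For a fixed placement, both the load address and the entry point would then
be checked with something like addr_in_region() against the memory allocated
to the domain.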

 docs/misc/arm/booting.txt | 30 
 xen/arch/arm/include/asm/kernel.h |  2 +-
 xen/arch/arm/kernel.c | 58 +--
 3 files changed, 86 insertions(+), 4 deletions(-)

diff --git a/docs/misc/arm/booting.txt b/docs/misc/arm/booting.txt
index 3e0c03e065..1837579aef 100644
--- a/docs/misc/arm/booting.txt
+++ b/docs/misc/arm/booting.txt
@@ -23,6 +23,32 @@ The exceptions to this on 32-bit ARM are as follows:
 
 There are no exception on 64-bit ARM.
 
+Booting Guests
+--
+
+Xen supports the legacy image header[3], zImage protocol for 32-bit
+ARM Linux[1] and Image protocol defined for ARM64[2].
+
+Until Xen 4.17, in case of legacy image protocol, Xen ignored the load
+address and entry point specified in the header. This has now changed.
+
+Now, it loads the image at the load address provided in the header.
+And the entry point is used as the kernel start address.
+
+A deviation from uboot is that, Xen treats "load address == 0x0" as
+position independent execution (PIE). Thus, Xen will load such an image
+at an address it considers appropriate. Also, user cannot specify the
+entry point of a PIE image since the start address cannot be
+predetermined.
+
+Users who want to use Xen with statically partitioned domains, can provide
+the fixed non zero load address and start address for the dom0/domU kernel.
+The load address and start address specified by the user in the header must
+be within the memory region allocated by Xen.
+
+Also, it is to be noted that if user provides the legacy image header on top of
+zImage or Image header, then Xen uses the attributes of legacy image header only
+to determine the load address, entry point, etc.
 
 Firmware/bootloader requirements
 
@@ -39,3 +65,7 @@ Latest version: 

Re: [PATCH v4 01/11] xen/common: add cache coloring common code

2023-01-25 Thread Carlo Nonato
Hi Jan, Julien

On Tue, Jan 24, 2023 at 5:37 PM Jan Beulich  wrote:
>
> On 23.01.2023 16:47, Carlo Nonato wrote:
> > @@ -769,6 +776,13 @@ struct domain *domain_create(domid_t domid,
> >  return ERR_PTR(err);
> >  }
> >
> > +struct domain *domain_create(domid_t domid,
> > + struct xen_domctl_createdomain *config,
> > + unsigned int flags)
> > +{
> > +return domain_create_llc_colored(domid, config, flags, 0, 0);
>
> Please can you use NULL when you mean a null pointer?
>
> > --- /dev/null
> > +++ b/xen/include/xen/llc_coloring.h
> > @@ -0,0 +1,54 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Last Level Cache (LLC) coloring common header
> > + *
> > + * Copyright (C) 2022 Xilinx Inc.
> > + *
> > + * Authors:
> > + *Carlo Nonato 
> > + */
> > +#ifndef __COLORING_H__
> > +#define __COLORING_H__
> > +
> > +#include 
> > +#include 
> > +
> > +#ifdef CONFIG_HAS_LLC_COLORING
> > +
> > +#include 
> > +
> > +extern bool llc_coloring_enabled;
> > +
> > +int domain_llc_coloring_init(struct domain *d, unsigned int *colors,
> > + unsigned int num_colors);
> > +void domain_llc_coloring_free(struct domain *d);
> > +void domain_dump_llc_colors(struct domain *d);
> > +
> > +#else
> > +
> > +#define llc_coloring_enabled (false)
>
> While I agree this is needed, ...
>
> > +static inline int domain_llc_coloring_init(struct domain *d,
> > +   unsigned int *colors,
> > +   unsigned int num_colors)
> > +{
> > +return 0;
> > +}
> > +static inline void domain_llc_coloring_free(struct domain *d) {}
> > +static inline void domain_dump_llc_colors(struct domain *d) {}
>
> ... I don't think you need any of these. Instead the declarations above
> simply need to be visible unconditionally (to be visible to the compiler
> when processing consuming code). We rely on DCE to remove such references
> in many other places.

So this is true for any other stub function that I used in the series, right?
Since all of them are guarded by the same kind of if statement: checking for
llc_coloring_enabled value which, in case of coloring disabled from Kconfig,
is always false and then DCE comes in. Sorry for being so verbose, but I just
want to be sure I understood.
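
To check my understanding, the pattern would look roughly like the sketch
below. struct domain is reduced to one field, domain_teardown() and
free_calls are made up for the example, and the function keeps a definition
here only so that the sketch links at any optimization level. In Xen the
definition would live in a file that is built only when the option is
enabled.

```c
#include <stdbool.h>

/* Reduced stand-in for Xen's struct domain (assumption). */
struct domain {
    unsigned int num_llc_colors;
};

#ifdef CONFIG_HAS_LLC_COLORING
extern bool llc_coloring_enabled;
#else
/* Compile-time false, so guarded calls become dead code. */
#define llc_coloring_enabled (false)
#endif

/* Declared unconditionally: no inline stub for the disabled case. */
void domain_llc_coloring_free(struct domain *d);

/* Counter used only to observe whether the call was made. */
static unsigned int free_calls;

/* Kept in this sketch so it links even if the compiler does no DCE. */
void domain_llc_coloring_free(struct domain *d)
{
    d->num_llc_colors = 0;
    free_calls++;
}

/* Hypothetical caller: the guard is what lets DCE drop the reference. */
static void domain_teardown(struct domain *d)
{
    if ( llc_coloring_enabled )
        domain_llc_coloring_free(d);
}
```

With the option disabled the condition is a constant false, so the call is
never made and the compiler can drop the reference entirely.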

> > +#endif /* CONFIG_HAS_LLC_COLORING */
> > +
> > +#define is_domain_llc_colored(d) (llc_coloring_enabled)
> > +
> > +#endif /* __COLORING_H__ */
> > +
> > +/*
> > + * Local variables:
> > + * mode: C
> > + * c-file-style: "BSD"
> > + * c-basic-offset: 4
> > + * tab-width: 4
> > + * indent-tabs-mode: nil
> > + * End:
> > + */
> > \ No newline at end of file
>
> This wants taking care of.
>
> > --- a/xen/include/xen/sched.h
> > +++ b/xen/include/xen/sched.h
> > @@ -602,6 +602,9 @@ struct domain
> >
> >  /* Holding CDF_* constant. Internal flags for domain creation. */
> >  unsigned int cdf;
> > +
> > +unsigned int *llc_colors;
> > +unsigned int num_llc_colors;
> >  };
>
> Why outside of any #ifdef, and why not in struct arch_domain?

Moving this in sched.h seemed like the natural continuation of the common +
arch specific split. Notice that this split is also because Julien pointed
out (as you did in some earlier revision) that cache coloring can be used
by other arch in the future (even if x86 is excluded). Having two maintainers
saying the same thing sounded like a good reason to do that.

The missing #ifdef comes from a discussion I had with Julien in v2 about
domctl interface where he suggested removing it
(https://marc.info/?l=xen-devel&m=166151802002263). We were talking about
a different struct, but I thought the principle was the same. Anyway I would
like the #ifdef too.

So @Jan, @Julien, can you help me fix this once for all?

> Jan

Thanks.

- Carlo Nonato



[xen-unstable test] 176110: regressions - FAIL

2023-01-25 Thread osstest service owner
flight 176110 xen-unstable real [real]
flight 176119 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/176110/
http://logs.test-lab.xenproject.org/osstest/logs/176119/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-coresched-i386-xl 18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-examine-bios  6 xen-install  fail REGR. vs. 175994
 test-amd64-i386-xl-xsm   18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-xl   18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-pair  26 guest-migrate/src_host/dst_host fail REGR. vs. 175994
 test-amd64-i386-xl-vhd   17 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-xl-shadow18 guest-localmigrate   fail REGR. vs. 175994
 test-amd64-i386-libvirt-pair 26 guest-migrate/src_host/dst_host fail REGR. vs. 175994

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 7 xen-install fail pass in 176119-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 175987
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 175987
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 175987
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 175994
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 175994
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 175994
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 175994
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 175994
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 175994
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 175994
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 175994
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 175994
 test-arm64-arm64-xl-seattle  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check fail never pass
 test-amd64-i386-libvirt  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl  15 migrate-support-check fail never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl  15 migrate-support-check fail never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check

[XEN v5] xen/arm: Use the correct format specifier

2023-01-25 Thread Ayan Kumar Halder
1. One should use 'PRIpaddr' to display 'paddr_t' variables. However,
while creating nodes in fdt, the address (if present in the node name)
should be represented using 'PRIx64'. This is to be in conformance
with the following rule present in https://elinux.org/Device_Tree_Linux

. node names
"unit-address does not have leading zeros"

As 'PRIpaddr' introduces leading zeros, we cannot use it.

So, we have introduced a wrapper ie domain_fdt_begin_node() which will
represent physical address using 'PRIx64'.

2. One should use 'PRIx64' to display 'u64' in hex format. The current
use of 'PRIpaddr' for printing PTE is buggy as this is not a physical
address.
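
As a quick illustration of point 1, the difference between the two formats
can be sketched as below. format_node_name() and format_node_name_padded()
are hypothetical helpers for this example only, and the zero-padded
"%016"PRIx64 stands in for what PRIpaddr would produce for a 64-bit paddr_t.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build "<name>@<unit>" with no leading zeros in the unit address.
 * Returns 0 on success, -1 if the buffer is too small or on error.
 */
static int format_node_name(char *buf, size_t len,
                            const char *name, uint64_t unit)
{
    int ret = snprintf(buf, len, "%s@%" PRIx64, name, unit);

    if ( ret < 0 || (size_t)ret >= len )
        return -1;

    return 0;
}

/* Same, but zero-padded to 16 digits: not valid for a DT node name. */
static int format_node_name_padded(char *buf, size_t len,
                                   const char *name, uint64_t unit)
{
    int ret = snprintf(buf, len, "%s@%016" PRIx64, name, unit);

    if ( ret < 0 || (size_t)ret >= len )
        return -1;

    return 0;
}
```

A 38-byte buffer is enough for the longest name used here:
"interrupt-controller@" is 21 characters, plus 16 hex digits, plus the
terminating NUL.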

Signed-off-by: Ayan Kumar Halder 
---
Changes from -

v1 - 1. Moved the patch earlier.
2. Moved a part of change from "[XEN v1 8/9] xen/arm: Other adaptations 
required to support 32bit paddr"
into this patch.

v2 - 1. Use PRIx64 for appending addresses to fdt node names. This fixes the CI 
failure.

v3 - 1. Added a comment on top of domain_fdt_begin_node().
2. Check for the return of snprintf() in domain_fdt_begin_node().

v4 - 1. Grammatical error fixes.

 xen/arch/arm/domain_build.c | 64 +++--
 xen/arch/arm/gic-v2.c   |  6 ++--
 xen/arch/arm/mm.c   |  2 +-
 3 files changed, 44 insertions(+), 28 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index c2b97fa21e..a798e0b256 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1288,6 +1288,39 @@ static int __init fdt_property_interrupts(const struct 
kernel_info *kinfo,
 return res;
 }
 
+/*
+ * Wrapper to convert physical address from paddr_t to uint64_t and
+ * invoke fdt_begin_node(). This is required as the physical address
+ * provided as part of node name should not contain any leading
+ * zeroes. Thus, one should use PRIx64 (instead of PRIpaddr) to append
+ * unit (which contains the physical address) with name to generate a
+ * node name.
+ */
+static int __init domain_fdt_begin_node(void *fdt, const char *name,
+uint64_t unit)
+{
+/*
+ * The size of the buffer to hold the longest possible string (i.e.
+ * interrupt-controller@ + a 64-bit number + \0).
+ */
+char buf[38];
+int ret;
+
+/* ePAPR 3.4 */
+ret = snprintf(buf, sizeof(buf), "%s@%"PRIx64, name, unit);
+
+if ( ret >= sizeof(buf) )
+{
+printk(XENLOG_ERR
+   "Insufficient buffer. Minimum size required is %d\n",
+   (ret + 1));
+
+return -FDT_ERR_TRUNCATED;
+}
+
+return fdt_begin_node(fdt, buf);
+}
+
 static int __init make_memory_node(const struct domain *d,
void *fdt,
int addrcells, int sizecells,
@@ -1296,8 +1329,6 @@ static int __init make_memory_node(const struct domain *d,
 unsigned int i;
 int res, reg_size = addrcells + sizecells;
 int nr_cells = 0;
-/* Placeholder for memory@ + a 64-bit number + \0 */
-char buf[24];
 __be32 reg[NR_MEM_BANKS * 4 /* Worst case addrcells + sizecells */];
 __be32 *cells;
 
@@ -1314,9 +1345,7 @@ static int __init make_memory_node(const struct domain *d,
 
 dt_dprintk("Create memory node\n");
 
-/* ePAPR 3.4 */
-snprintf(buf, sizeof(buf), "memory@%"PRIx64, mem->bank[i].start);
-res = fdt_begin_node(fdt, buf);
+res = domain_fdt_begin_node(fdt, "memory", mem->bank[i].start);
 if ( res )
 return res;
 
@@ -1375,16 +1404,13 @@ static int __init make_shm_memory_node(const struct 
domain *d,
 {
 uint64_t start = mem->bank[i].start;
 uint64_t size = mem->bank[i].size;
-/* Placeholder for xen-shmem@ + a 64-bit number + \0 */
-char buf[27];
 const char compat[] = "xen,shared-memory-v1";
 /* Worst case addrcells + sizecells */
 __be32 reg[GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS];
 __be32 *cells;
 unsigned int len = (addrcells + sizecells) * sizeof(__be32);
 
-snprintf(buf, sizeof(buf), "xen-shmem@%"PRIx64, mem->bank[i].start);
-res = fdt_begin_node(fdt, buf);
+res = domain_fdt_begin_node(fdt, "xen-shmem", mem->bank[i].start);
 if ( res )
 return res;
 
@@ -2716,12 +2742,9 @@ static int __init make_gicv2_domU_node(struct 
kernel_info *kinfo)
 __be32 reg[(GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) * 2];
 __be32 *cells;
 const struct domain *d = kinfo->d;
-/* Placeholder for interrupt-controller@ + a 64-bit number + \0 */
-char buf[38];
 
-snprintf(buf, sizeof(buf), "interrupt-controller@%"PRIx64,
- vgic_dist_base(&d->arch.vgic));
-res = fdt_begin_node(fdt, buf);
+res = domain_fdt_begin_node(fdt, "interrupt-controller",
+vgic_dist_base(&d->arch.vgic));
 if ( res )
 return res;
 
@@ -2771,14 +2794,10 @@ static int __init 

Re: [PATCH v2 6/6] mm: export dump_mm()

2023-01-25 Thread Michal Hocko
On Wed 25-01-23 00:38:51, Suren Baghdasaryan wrote:
> mmap_assert_write_locked() is used in vm_flags modifiers. Because
> mmap_assert_write_locked() uses dump_mm() and vm_flags are sometimes
> modified from inside a module, it's necessary to export
> dump_mm() function.
> 
> Signed-off-by: Suren Baghdasaryan 

Acked-by: Michal Hocko 

> ---
>  mm/debug.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/debug.c b/mm/debug.c
> index 9d3d893dc7f4..96d594e16292 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -215,6 +215,7 @@ void dump_mm(const struct mm_struct *mm)
>   mm->def_flags, &mm->def_flags
>   );
>  }
> +EXPORT_SYMBOL(dump_mm);
>  
>  static bool page_init_poisoning __read_mostly = true;
>  
> -- 
> 2.39.1

-- 
Michal Hocko
SUSE Labs



Re: [PATCH v2 5/6] mm: introduce mod_vm_flags_nolock and use it in untrack_pfn

2023-01-25 Thread Michal Hocko
On Wed 25-01-23 00:38:50, Suren Baghdasaryan wrote:
> In cases when VMA flags are modified after VMA was isolated and mmap_lock
> was downgraded, flags modifications would result in an assertion because
> mmap write lock is not held.
> Introduce mod_vm_flags_nolock to be used in such situation.
> Pass a hint to untrack_pfn to conditionally use mod_vm_flags_nolock for
> flags modification and to avoid assertion.

Neither the changelog nor the documentation of mod_vm_flags_nolock
really explains when it is safe to use it. This is really important for
future potential users.

> Signed-off-by: Suren Baghdasaryan 
> ---
>  arch/x86/mm/pat/memtype.c | 10 +++---
>  include/linux/mm.h| 12 +---
>  include/linux/pgtable.h   |  5 +++--
>  mm/memory.c   | 13 +++--
>  mm/memremap.c |  4 ++--
>  mm/mmap.c | 16 ++--
>  6 files changed, 38 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> index ae9645c900fa..d8adc0b42cf2 100644
> --- a/arch/x86/mm/pat/memtype.c
> +++ b/arch/x86/mm/pat/memtype.c
> @@ -1046,7 +1046,7 @@ void track_pfn_insert(struct vm_area_struct *vma, 
> pgprot_t *prot, pfn_t pfn)
>   * can be for the entire vma (in which case pfn, size are zero).
>   */
>  void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
> -  unsigned long size)
> +  unsigned long size, bool mm_wr_locked)
>  {
>   resource_size_t paddr;
>   unsigned long prot;
> @@ -1065,8 +1065,12 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned 
> long pfn,
>   size = vma->vm_end - vma->vm_start;
>   }
>   free_pfn_range(paddr, size);
> - if (vma)
> - clear_vm_flags(vma, VM_PAT);
> + if (vma) {
> + if (mm_wr_locked)
> + clear_vm_flags(vma, VM_PAT);
> + else
> + mod_vm_flags_nolock(vma, 0, VM_PAT);
> + }
>  }
>  
>  /*
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 55335edd1373..48d49930c411 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -656,12 +656,18 @@ static inline void clear_vm_flags(struct vm_area_struct 
> *vma,
>   vma->vm_flags &= ~flags;
>  }
>  
> +static inline void mod_vm_flags_nolock(struct vm_area_struct *vma,
> +unsigned long set, unsigned long clear)
> +{
> + vma->vm_flags |= set;
> + vma->vm_flags &= ~clear;
> +}
> +
>  static inline void mod_vm_flags(struct vm_area_struct *vma,
>   unsigned long set, unsigned long clear)
>  {
>   mmap_assert_write_locked(vma->vm_mm);
> - vma->vm_flags |= set;
> - vma->vm_flags &= ~clear;
> + mod_vm_flags_nolock(vma, set, clear);
>  }
>  
>  static inline void vma_set_anonymous(struct vm_area_struct *vma)
> @@ -2087,7 +2093,7 @@ static inline void zap_vma_pages(struct vm_area_struct 
> *vma)
>  }
>  void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
>   struct vm_area_struct *start_vma, unsigned long start,
> - unsigned long end);
> + unsigned long end, bool mm_wr_locked);
>  
>  struct mmu_notifier_range;
>  
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 5fd45454c073..c63cd44777ec 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1185,7 +1185,8 @@ static inline int track_pfn_copy(struct vm_area_struct 
> *vma)
>   * can be for the entire vma (in which case pfn, size are zero).
>   */
>  static inline void untrack_pfn(struct vm_area_struct *vma,
> -unsigned long pfn, unsigned long size)
> +unsigned long pfn, unsigned long size,
> +bool mm_wr_locked)
>  {
>  }
>  
> @@ -1203,7 +1204,7 @@ extern void track_pfn_insert(struct vm_area_struct 
> *vma, pgprot_t *prot,
>pfn_t pfn);
>  extern int track_pfn_copy(struct vm_area_struct *vma);
>  extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
> - unsigned long size);
> + unsigned long size, bool mm_wr_locked);
>  extern void untrack_pfn_moved(struct vm_area_struct *vma);
>  #endif
>  
> diff --git a/mm/memory.c b/mm/memory.c
> index d6902065e558..5b11b50e2c4a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1613,7 +1613,7 @@ void unmap_page_range(struct mmu_gather *tlb,
>  static void unmap_single_vma(struct mmu_gather *tlb,
>   struct vm_area_struct *vma, unsigned long start_addr,
>   unsigned long end_addr,
> - struct zap_details *details)
> + struct zap_details *details, bool mm_wr_locked)
>  {
>   unsigned long start = max(vma->vm_start, start_addr);
>   unsigned long end;
> @@ -1628,7 +1628,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
>   uprobe_munmap(vma, start, end);
>  
>   if 

Re: [PATCH v2 4/6] mm: replace vma->vm_flags indirect modification in ksm_madvise

2023-01-25 Thread Michal Hocko
On Wed 25-01-23 00:38:49, Suren Baghdasaryan wrote:
> Replace indirect modifications to vma->vm_flags with calls to modifier
> functions to be able to track flag changes and to keep vma locking
> correctness. Add a BUG_ON check in ksm_madvise() to catch indirect
> vm_flags modification attempts.

Those BUG_ONs scream too much IMHO. KSM is MM-internal code, so I
guess we should be willing to trust it.

> Signed-off-by: Suren Baghdasaryan 

Acked-by: Michal Hocko 
-- 
Michal Hocko
SUSE Labs



Re: [PATCH v2 3/6] mm: replace vma->vm_flags direct modifications with modifier calls

2023-01-25 Thread Michal Hocko
On Wed 25-01-23 00:38:48, Suren Baghdasaryan wrote:
> Replace direct modifications to vma->vm_flags with calls to modifier
> functions to be able to track flag changes and to keep vma locking
> correctness.

Is this manual (git grep) based work, or have you used Coccinelle for
the patch generation?

My potentially incomplete check
$ git grep ">[[:space:]]*vm_flags[[:space:]]*[&|^]="

shows that nothing should be left after this. There are still quite a lot
of direct checks of the flags (more than 600). Maybe it would be good to
make the flags accessible only via accessors, which would also prevent any
future direct setting of those flags in an uncontrolled way.

Anyway
Acked-by: Michal Hocko 
-- 
Michal Hocko
SUSE Labs



[PATCH v2 5/6] mm: introduce mod_vm_flags_nolock and use it in untrack_pfn

2023-01-25 Thread Suren Baghdasaryan
In cases when VMA flags are modified after VMA was isolated and mmap_lock
was downgraded, flags modifications would result in an assertion because
mmap write lock is not held.
Introduce mod_vm_flags_nolock to be used in such situation.
Pass a hint to untrack_pfn to conditionally use mod_vm_flags_nolock for
flags modification and to avoid assertion.

Signed-off-by: Suren Baghdasaryan 
---
 arch/x86/mm/pat/memtype.c | 10 +++---
 include/linux/mm.h| 12 +---
 include/linux/pgtable.h   |  5 +++--
 mm/memory.c   | 13 +++--
 mm/memremap.c |  4 ++--
 mm/mmap.c | 16 ++--
 6 files changed, 38 insertions(+), 22 deletions(-)

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index ae9645c900fa..d8adc0b42cf2 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -1046,7 +1046,7 @@ void track_pfn_insert(struct vm_area_struct *vma, 
pgprot_t *prot, pfn_t pfn)
  * can be for the entire vma (in which case pfn, size are zero).
  */
 void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
-unsigned long size)
+unsigned long size, bool mm_wr_locked)
 {
resource_size_t paddr;
unsigned long prot;
@@ -1065,8 +1065,12 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned 
long pfn,
size = vma->vm_end - vma->vm_start;
}
free_pfn_range(paddr, size);
-   if (vma)
-   clear_vm_flags(vma, VM_PAT);
+   if (vma) {
+   if (mm_wr_locked)
+   clear_vm_flags(vma, VM_PAT);
+   else
+   mod_vm_flags_nolock(vma, 0, VM_PAT);
+   }
 }
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 55335edd1373..48d49930c411 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -656,12 +656,18 @@ static inline void clear_vm_flags(struct vm_area_struct 
*vma,
vma->vm_flags &= ~flags;
 }
 
+static inline void mod_vm_flags_nolock(struct vm_area_struct *vma,
+  unsigned long set, unsigned long clear)
+{
+   vma->vm_flags |= set;
+   vma->vm_flags &= ~clear;
+}
+
 static inline void mod_vm_flags(struct vm_area_struct *vma,
unsigned long set, unsigned long clear)
 {
mmap_assert_write_locked(vma->vm_mm);
-   vma->vm_flags |= set;
-   vma->vm_flags &= ~clear;
+   mod_vm_flags_nolock(vma, set, clear);
 }
 
 static inline void vma_set_anonymous(struct vm_area_struct *vma)
@@ -2087,7 +2093,7 @@ static inline void zap_vma_pages(struct vm_area_struct 
*vma)
 }
 void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
struct vm_area_struct *start_vma, unsigned long start,
-   unsigned long end);
+   unsigned long end, bool mm_wr_locked);
 
 struct mmu_notifier_range;
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5fd45454c073..c63cd44777ec 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1185,7 +1185,8 @@ static inline int track_pfn_copy(struct vm_area_struct 
*vma)
  * can be for the entire vma (in which case pfn, size are zero).
  */
 static inline void untrack_pfn(struct vm_area_struct *vma,
-  unsigned long pfn, unsigned long size)
+  unsigned long pfn, unsigned long size,
+  bool mm_wr_locked)
 {
 }
 
@@ -1203,7 +1204,7 @@ extern void track_pfn_insert(struct vm_area_struct *vma, 
pgprot_t *prot,
 pfn_t pfn);
 extern int track_pfn_copy(struct vm_area_struct *vma);
 extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
-   unsigned long size);
+   unsigned long size, bool mm_wr_locked);
 extern void untrack_pfn_moved(struct vm_area_struct *vma);
 #endif
 
diff --git a/mm/memory.c b/mm/memory.c
index d6902065e558..5b11b50e2c4a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1613,7 +1613,7 @@ void unmap_page_range(struct mmu_gather *tlb,
 static void unmap_single_vma(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start_addr,
unsigned long end_addr,
-   struct zap_details *details)
+   struct zap_details *details, bool mm_wr_locked)
 {
unsigned long start = max(vma->vm_start, start_addr);
unsigned long end;
@@ -1628,7 +1628,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
uprobe_munmap(vma, start, end);
 
if (unlikely(vma->vm_flags & VM_PFNMAP))
-   untrack_pfn(vma, 0, 0);
+   untrack_pfn(vma, 0, 0, mm_wr_locked);
 
if (start != end) {
if (unlikely(is_vm_hugetlb_page(vma))) {
@@ -1675,7 +1675,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
  */
 void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,

[PATCH v2 4/6] mm: replace vma->vm_flags indirect modification in ksm_madvise

2023-01-25 Thread Suren Baghdasaryan
Replace indirect modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness. Add a BUG_ON check in ksm_madvise() to catch indirect
vm_flags modification attempts.

Signed-off-by: Suren Baghdasaryan 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 5 -
 arch/s390/mm/gmap.c| 5 -
 mm/khugepaged.c| 2 ++
 mm/ksm.c   | 2 ++
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 1d67baa5557a..325a7a47d348 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -393,6 +393,7 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
 {
unsigned long gfn = memslot->base_gfn;
unsigned long end, start = gfn_to_hva(kvm, gfn);
+   unsigned long vm_flags;
int ret = 0;
struct vm_area_struct *vma;
int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
@@ -409,12 +410,14 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
ret = H_STATE;
break;
}
+   vm_flags = vma->vm_flags;
ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
- merge_flag, &vma->vm_flags);
+ merge_flag, &vm_flags);
if (ret) {
ret = H_STATE;
break;
}
+   reset_vm_flags(vma, vm_flags);
start = vma->vm_end;
} while (end > vma->vm_end);
 
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 3a695b8a1e3c..d5eb47dcdacb 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2587,14 +2587,17 @@ int gmap_mark_unmergeable(void)
 {
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
+   unsigned long vm_flags;
int ret;
VMA_ITERATOR(vmi, mm, 0);
 
for_each_vma(vmi, vma) {
+   vm_flags = vma->vm_flags;
ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
- MADV_UNMERGEABLE, &vma->vm_flags);
+ MADV_UNMERGEABLE, &vm_flags);
if (ret)
return ret;
+   reset_vm_flags(vma, vm_flags);
}
mm->def_flags &= ~VM_MERGEABLE;
return 0;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8abc59345bf2..76b24cd0c179 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -354,6 +354,8 @@ struct attribute_group khugepaged_attr_group = {
 int hugepage_madvise(struct vm_area_struct *vma,
 unsigned long *vm_flags, int advice)
 {
+   /* vma->vm_flags can be changed only using modifier functions */
+   BUG_ON(vm_flags == &vma->vm_flags);
switch (advice) {
case MADV_HUGEPAGE:
 #ifdef CONFIG_S390
diff --git a/mm/ksm.c b/mm/ksm.c
index 04f1c8c2df11..992b2be9f5e6 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2573,6 +2573,8 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long 
start,
struct mm_struct *mm = vma->vm_mm;
int err;
 
+   /* vma->vm_flags can be changed only using modifier functions */
+   BUG_ON(vm_flags == &vma->vm_flags);
switch (advice) {
case MADV_MERGEABLE:
/*
-- 
2.39.1




Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Peter Zijlstra
On Wed, Jan 25, 2023 at 12:38:46AM -0800, Suren Baghdasaryan wrote:

> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 2d6d790d9bed..6c7c70bf50dd 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -491,7 +491,13 @@ struct vm_area_struct {
>* See vmf_insert_mixed_prot() for discussion.
>*/
>   pgprot_t vm_page_prot;
> - unsigned long vm_flags; /* Flags, see mm.h. */
> +
> + /*
> +  * Flags, see mm.h.
> +  * WARNING! Do not modify directly.
> +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> +  */
> + unsigned long vm_flags;

We have __private and ACCESS_PRIVATE() to help with enforcing this.
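
For readers unfamiliar with the annotation, here is a rough user-space sketch of the idea. It is not the kernel's definition: as far as I recall, the kernel gives __private teeth only when building under sparse, and a regular compiler sees it as empty. All sketch_* and pvma_* names below are hypothetical stand-ins invented for illustration.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for __private and ACCESS_PRIVATE().
 * Under a static checker the annotation would forbid direct
 * dereference of the member; here it expands to nothing, so the
 * convention is enforced only by code review in this sketch.
 */
#define sketch_private
#define SKETCH_ACCESS_PRIVATE(p, member) ((p)->member)

struct pvma_sketch {
	/* Never touch directly; go through the helpers below. */
	unsigned long sketch_private vm_flags;
};

/* Read accessor: the only sanctioned way to look at the flags. */
static inline unsigned long pvma_flags(const struct pvma_sketch *vma)
{
	return SKETCH_ACCESS_PRIVATE(vma, vm_flags);
}

/* Write accessor: a real version would assert the lock here. */
static inline void pvma_set_flags(struct pvma_sketch *vma,
				  unsigned long flags)
{
	SKETCH_ACCESS_PRIVATE(vma, vm_flags) |= flags;
}

/* Write accessor clearing the given bits. */
static inline void pvma_clear_flags(struct pvma_sketch *vma,
				    unsigned long flags)
{
	SKETCH_ACCESS_PRIVATE(vma, vm_flags) &= ~flags;
}
```

With the member marked this way, any code that bypasses the helpers would be flagged by the checker rather than relying on the WARNING comment alone.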



Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Michal Hocko
On Wed 25-01-23 00:38:46, Suren Baghdasaryan wrote:
> vm_flags are among VMA attributes which affect decisions like VMA merging
> and splitting. Therefore all vm_flags modifications are performed after
> taking exclusive mmap_lock to prevent vm_flags updates racing with such
> operations. Introduce modifier functions for vm_flags to be used whenever
> flags are updated. This way we can better check and control correct
> locking behavior during these updates.
> 
> Signed-off-by: Suren Baghdasaryan 

Acked-by: Michal Hocko 

> ---
>  include/linux/mm.h   | 37 +
>  include/linux/mm_types.h |  8 +++-
>  2 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c2f62bdce134..b71f2809caac 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -627,6 +627,43 @@ static inline void vma_init(struct vm_area_struct *vma, 
> struct mm_struct *mm)
>   INIT_LIST_HEAD(&vma->anon_vma_chain);
>  }
>  
> +/* Use when VMA is not part of the VMA tree and needs no locking */
> +static inline void init_vm_flags(struct vm_area_struct *vma,
> +  unsigned long flags)
> +{
> + vma->vm_flags = flags;
> +}
> +
> +/* Use when VMA is part of the VMA tree and modifications need coordination 
> */
> +static inline void reset_vm_flags(struct vm_area_struct *vma,
> +   unsigned long flags)
> +{
> + mmap_assert_write_locked(vma->vm_mm);
> + init_vm_flags(vma, flags);
> +}
> +
> +static inline void set_vm_flags(struct vm_area_struct *vma,
> + unsigned long flags)
> +{
> + mmap_assert_write_locked(vma->vm_mm);
> + vma->vm_flags |= flags;
> +}
> +
> +static inline void clear_vm_flags(struct vm_area_struct *vma,
> +   unsigned long flags)
> +{
> + mmap_assert_write_locked(vma->vm_mm);
> + vma->vm_flags &= ~flags;
> +}
> +
> +static inline void mod_vm_flags(struct vm_area_struct *vma,
> + unsigned long set, unsigned long clear)
> +{
> + mmap_assert_write_locked(vma->vm_mm);
> + vma->vm_flags |= set;
> + vma->vm_flags &= ~clear;
> +}
> +
>  static inline void vma_set_anonymous(struct vm_area_struct *vma)
>  {
>   vma->vm_ops = NULL;
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 2d6d790d9bed..6c7c70bf50dd 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -491,7 +491,13 @@ struct vm_area_struct {
>* See vmf_insert_mixed_prot() for discussion.
>*/
>   pgprot_t vm_page_prot;
> - unsigned long vm_flags; /* Flags, see mm.h. */
> +
> + /*
> +  * Flags, see mm.h.
> +  * WARNING! Do not modify directly.
> +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> +  */
> + unsigned long vm_flags;
>  
>   /*
>* For areas with an address space and backing store,
> -- 
> 2.39.1

-- 
Michal Hocko
SUSE Labs



[PATCH v2 3/6] mm: replace vma->vm_flags direct modifications with modifier calls

2023-01-25 Thread Suren Baghdasaryan
Replace direct modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.

Signed-off-by: Suren Baghdasaryan 
---
 arch/arm/kernel/process.c  |  2 +-
 arch/ia64/mm/init.c|  8 
 arch/loongarch/include/asm/tlb.h   |  2 +-
 arch/powerpc/kvm/book3s_xive_native.c  |  2 +-
 arch/powerpc/mm/book3s64/subpage_prot.c|  2 +-
 arch/powerpc/platforms/book3s/vas-api.c|  2 +-
 arch/powerpc/platforms/cell/spufs/file.c   | 14 +++---
 arch/s390/mm/gmap.c|  3 +--
 arch/x86/entry/vsyscall/vsyscall_64.c  |  2 +-
 arch/x86/kernel/cpu/sgx/driver.c   |  2 +-
 arch/x86/kernel/cpu/sgx/virt.c |  2 +-
 arch/x86/mm/pat/memtype.c  |  6 +++---
 arch/x86/um/mem_32.c   |  2 +-
 drivers/acpi/pfr_telemetry.c   |  2 +-
 drivers/android/binder.c   |  3 +--
 drivers/char/mspec.c   |  2 +-
 drivers/crypto/hisilicon/qm.c  |  2 +-
 drivers/dax/device.c   |  2 +-
 drivers/dma/idxd/cdev.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |  4 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c  |  4 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_events.c|  4 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_process.c   |  4 ++--
 drivers/gpu/drm/drm_gem.c  |  2 +-
 drivers/gpu/drm/drm_gem_dma_helper.c   |  3 +--
 drivers/gpu/drm/drm_gem_shmem_helper.c |  2 +-
 drivers/gpu/drm/drm_vm.c   |  8 
 drivers/gpu/drm/etnaviv/etnaviv_gem.c  |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_gem.c|  4 ++--
 drivers/gpu/drm/gma500/framebuffer.c   |  2 +-
 drivers/gpu/drm/i810/i810_dma.c|  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c   |  4 ++--
 drivers/gpu/drm/mediatek/mtk_drm_gem.c |  2 +-
 drivers/gpu/drm/msm/msm_gem.c  |  2 +-
 drivers/gpu/drm/omapdrm/omap_gem.c |  3 +--
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c|  3 +--
 drivers/gpu/drm/tegra/gem.c|  5 ++---
 drivers/gpu/drm/ttm/ttm_bo_vm.c|  3 +--
 drivers/gpu/drm/virtio/virtgpu_vram.c  |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c   |  2 +-
 drivers/gpu/drm/xen/xen_drm_front_gem.c|  3 +--
 drivers/hsi/clients/cmt_speech.c   |  2 +-
 drivers/hwtracing/intel_th/msu.c   |  2 +-
 drivers/hwtracing/stm/core.c   |  2 +-
 drivers/infiniband/hw/hfi1/file_ops.c  |  4 ++--
 drivers/infiniband/hw/mlx5/main.c  |  4 ++--
 drivers/infiniband/hw/qib/qib_file_ops.c   | 13 ++---
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c   |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c|  2 +-
 .../media/common/videobuf2/videobuf2-dma-contig.c  |  2 +-
 drivers/media/common/videobuf2/videobuf2-vmalloc.c |  2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c  |  2 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c  |  4 ++--
 drivers/media/v4l2-core/videobuf-vmalloc.c |  2 +-
 drivers/misc/cxl/context.c |  2 +-
 drivers/misc/habanalabs/common/memory.c|  2 +-
 drivers/misc/habanalabs/gaudi/gaudi.c  |  4 ++--
 drivers/misc/habanalabs/gaudi2/gaudi2.c|  8 
 drivers/misc/habanalabs/goya/goya.c|  4 ++--
 drivers/misc/ocxl/context.c|  4 ++--
 drivers/misc/ocxl/sysfs.c  |  2 +-
 drivers/misc/open-dice.c   |  4 ++--
 drivers/misc/sgi-gru/grufile.c |  4 ++--
 drivers/misc/uacce/uacce.c |  2 +-
 drivers/sbus/char/oradax.c |  2 +-
 drivers/scsi/cxlflash/ocxl_hw.c|  2 +-
 drivers/scsi/sg.c  |  2 +-
 drivers/staging/media/atomisp/pci/hmm/hmm_bo.c |  2 +-
 drivers/staging/media/deprecated/meye/meye.c   |  4 ++--
 .../media/deprecated/stkwebcam/stk-webcam.c|  2 +-
 drivers/target/target_core_user.c  |  2 +-
 drivers/uio/uio.c  |  2 +-
 drivers/usb/core/devio.c   |  3 +--
 drivers/usb/mon/mon_bin.c  |  3 +--
 drivers/vdpa/vdpa_user/iova_domain.c   |  2 +-
 drivers/vfio/pci/vfio_pci_core.c   |  2 +-
 drivers/vhost/vdpa.c   |  2 +-
 drivers/video/fbdev/68328fb.c 

[PATCH v2 6/6] mm: export dump_mm()

2023-01-25 Thread Suren Baghdasaryan
mmap_assert_write_locked() is used in vm_flags modifiers. Because
mmap_assert_write_locked() uses dump_mm() and vm_flags are sometimes
modified from inside a module, it's necessary to export
dump_mm() function.

Signed-off-by: Suren Baghdasaryan 
---
 mm/debug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/debug.c b/mm/debug.c
index 9d3d893dc7f4..96d594e16292 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -215,6 +215,7 @@ void dump_mm(const struct mm_struct *mm)
mm->def_flags, &mm->def_flags
);
 }
+EXPORT_SYMBOL(dump_mm);
 
 static bool page_init_poisoning __read_mostly = true;
 
-- 
2.39.1




[PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Suren Baghdasaryan
vm_flags are among VMA attributes which affect decisions like VMA merging
and splitting. Therefore all vm_flags modifications are performed after
taking exclusive mmap_lock to prevent vm_flags updates racing with such
operations. Introduce modifier functions for vm_flags to be used whenever
flags are updated. This way we can better check and control correct
locking behavior during these updates.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mm.h   | 37 +
 include/linux/mm_types.h |  8 +++-
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c2f62bdce134..b71f2809caac 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -627,6 +627,43 @@ static inline void vma_init(struct vm_area_struct *vma, 
struct mm_struct *mm)
INIT_LIST_HEAD(&vma->anon_vma_chain);
 }
 
+/* Use when VMA is not part of the VMA tree and needs no locking */
+static inline void init_vm_flags(struct vm_area_struct *vma,
+unsigned long flags)
+{
+   vma->vm_flags = flags;
+}
+
+/* Use when VMA is part of the VMA tree and modifications need coordination */
+static inline void reset_vm_flags(struct vm_area_struct *vma,
+ unsigned long flags)
+{
+   mmap_assert_write_locked(vma->vm_mm);
+   init_vm_flags(vma, flags);
+}
+
+static inline void set_vm_flags(struct vm_area_struct *vma,
+   unsigned long flags)
+{
+   mmap_assert_write_locked(vma->vm_mm);
+   vma->vm_flags |= flags;
+}
+
+static inline void clear_vm_flags(struct vm_area_struct *vma,
+ unsigned long flags)
+{
+   mmap_assert_write_locked(vma->vm_mm);
+   vma->vm_flags &= ~flags;
+}
+
+static inline void mod_vm_flags(struct vm_area_struct *vma,
+   unsigned long set, unsigned long clear)
+{
+   mmap_assert_write_locked(vma->vm_mm);
+   vma->vm_flags |= set;
+   vma->vm_flags &= ~clear;
+}
+
 static inline void vma_set_anonymous(struct vm_area_struct *vma)
 {
vma->vm_ops = NULL;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2d6d790d9bed..6c7c70bf50dd 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -491,7 +491,13 @@ struct vm_area_struct {
 * See vmf_insert_mixed_prot() for discussion.
 */
pgprot_t vm_page_prot;
-   unsigned long vm_flags; /* Flags, see mm.h. */
+
+   /*
+* Flags, see mm.h.
+* WARNING! Do not modify directly.
+* Use {init|reset|set|clear|mod}_vm_flags() functions instead.
+*/
+   unsigned long vm_flags;
 
/*
 * For areas with an address space and backing store,
-- 
2.39.1




Re: [PATCH v2 2/6] mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK

2023-01-25 Thread Michal Hocko
On Wed 25-01-23 00:38:47, Suren Baghdasaryan wrote:
> To simplify the usage of VM_LOCKED_CLEAR_MASK in clear_vm_flags(),
> replace it with VM_LOCKED_MASK bitmask and convert all users.
>
> Signed-off-by: Suren Baghdasaryan 

Acked-by: Michal Hocko 
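
To spell out why the positive mask is the more convenient form: clear_vm_flags() takes the bits to clear and inverts them itself, so the old pre-inverted mask could not be passed to it. A small stand-alone sketch, using illustrative SK_* bit values that are assumptions and not necessarily the real ones from include/linux/mm.h:

```c
#include <assert.h>

/* Illustrative bit values only; assumed for this sketch. */
#define SK_VM_LOCKED		0x00002000UL
#define SK_VM_LOCKONFAULT	0x00080000UL

/* Old style: the mask is already inverted. */
#define SK_LOCKED_CLEAR_MASK	(~(SK_VM_LOCKED | SK_VM_LOCKONFAULT))
/* New style: the mask names the bits themselves. */
#define SK_LOCKED_MASK		(SK_VM_LOCKED | SK_VM_LOCKONFAULT)

/* Strip the mlock bits using the old, pre-inverted mask. */
static inline unsigned long strip_old(unsigned long flags)
{
	return flags & SK_LOCKED_CLEAR_MASK;
}

/* Strip the mlock bits using the new mask, inverted at the use site. */
static inline unsigned long strip_new(unsigned long flags)
{
	return flags & ~SK_LOCKED_MASK;
}

/* Shape of a clear helper: the caller passes the bits to clear. */
static inline unsigned long clear_bits(unsigned long flags,
				       unsigned long to_clear)
{
	return flags & ~to_clear;
}
```

Both strip helpers give the same result for any input, and only the new mask can be handed to a helper shaped like clear_bits().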

> ---
>  include/linux/mm.h | 4 ++--
>  kernel/fork.c  | 2 +-
>  mm/hugetlb.c   | 4 ++--
>  mm/mlock.c | 6 +++---
>  mm/mmap.c  | 6 +++---
>  mm/mremap.c| 2 +-
>  6 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b71f2809caac..da62bdd627bf 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -421,8 +421,8 @@ extern unsigned int kobjsize(const void *objp);
>  /* This mask defines which mm->def_flags a process can inherit its parent */
>  #define VM_INIT_DEF_MASK VM_NOHUGEPAGE
>  
> -/* This mask is used to clear all the VMA flags used by mlock */
> -#define VM_LOCKED_CLEAR_MASK (~(VM_LOCKED | VM_LOCKONFAULT))
> +/* This mask represents all the VMA flag bits used by mlock */
> +#define VM_LOCKED_MASK   (VM_LOCKED | VM_LOCKONFAULT)
>  
>  /* Arch-specific flags to clear when updating VM flags on protection change 
> */
>  #ifndef VM_ARCH_CLEAR
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 6683c1b0f460..03d472051236 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -669,7 +669,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>   tmp->anon_vma = NULL;
>   } else if (anon_vma_fork(tmp, mpnt))
>   goto fail_nomem_anon_vma_fork;
> - tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
> + clear_vm_flags(tmp, VM_LOCKED_MASK);
>   file = tmp->vm_file;
>   if (file) {
>   struct address_space *mapping = file->f_mapping;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d20c8b09890e..4ecdbad9a451 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6973,8 +6973,8 @@ static unsigned long page_table_shareable(struct 
> vm_area_struct *svma,
>   unsigned long s_end = sbase + PUD_SIZE;
>  
>   /* Allow segments to share if only one is marked locked */
> - unsigned long vm_flags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
> - unsigned long svm_flags = svma->vm_flags & VM_LOCKED_CLEAR_MASK;
> + unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED_MASK;
> + unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED_MASK;
>  
>   /*
>* match the virtual addresses, permission and the alignment of the
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 0336f52e03d7..5c4fff93cd6b 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -497,7 +497,7 @@ static int apply_vma_lock_flags(unsigned long start, 
> size_t len,
>   if (vma->vm_start != tmp)
>   return -ENOMEM;
>  
> - newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
> + newflags = vma->vm_flags & ~VM_LOCKED_MASK;
>   newflags |= flags;
>   /* Here we know that  vma->vm_start <= nstart < vma->vm_end. */
>   tmp = vma->vm_end;
> @@ -661,7 +661,7 @@ static int apply_mlockall_flags(int flags)
>   struct vm_area_struct *vma, *prev = NULL;
>   vm_flags_t to_add = 0;
>  
> - current->mm->def_flags &= VM_LOCKED_CLEAR_MASK;
> + current->mm->def_flags &= ~VM_LOCKED_MASK;
>   if (flags & MCL_FUTURE) {
>   current->mm->def_flags |= VM_LOCKED;
>  
> @@ -681,7 +681,7 @@ static int apply_mlockall_flags(int flags)
>   for_each_vma(vmi, vma) {
>   vm_flags_t newflags;
>  
> - newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
> + newflags = vma->vm_flags & ~VM_LOCKED_MASK;
>   newflags |= to_add;
>  
>   /* Ignore errors */
> diff --git a/mm/mmap.c b/mm/mmap.c
> index d4abc6feced1..323bd253b25a 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2671,7 +2671,7 @@ unsigned long mmap_region(struct file *file, unsigned 
> long addr,
>   if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
>   is_vm_hugetlb_page(vma) ||
>   vma == get_gate_vma(current->mm))
> - vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
> + clear_vm_flags(vma, VM_LOCKED_MASK);
>   else
>   mm->locked_vm += (len >> PAGE_SHIFT);
>   }
> @@ -3340,8 +3340,8 @@ static struct vm_area_struct *__install_special_mapping(
>   vma->vm_start = addr;
>   vma->vm_end = addr + len;
>  
> - vma->vm_flags = vm_flags | mm->def_flags | VM_DONTEXPAND | VM_SOFTDIRTY;
> - vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
> + init_vm_flags(vma, (vm_flags | mm->def_flags |
> +   VM_DONTEXPAND | VM_SOFTDIRTY) & ~VM_LOCKED_MASK);
>   vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>  
>   vma->vm_ops = ops;
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 

[PATCH v2 0/6] introduce vm_flags modifier functions

2023-01-25 Thread Suren Baghdasaryan
This patchset was originally published as part of per-VMA locking [1] and
was split out after a suggestion that it is viable on its own and that doing
so would facilitate the review process. It is now a prerequisite for the next
version of the per-VMA lock patchset, which reuses the vm_flags modifier
functions to lock the VMA when vm_flags are being updated.

VMA vm_flags modifications are usually done under exclusive mmap_lock
protection because this attribute affects other decisions like VMA merging
or splitting and races should be prevented. Introduce vm_flags modifier
functions to enforce correct locking.
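
As a rough illustration of the idea (a user-space sketch only, not the code from the patches), the modifiers funnel every flag update through one place where the lock state can be checked. The mm_sketch and vma_sketch types, the write_locked field, and all sketch_* names are hypothetical stand-ins for the kernel structures and for mmap_assert_write_locked():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the real kernel definitions. */
struct mm_sketch {
	bool write_locked;	/* models "mmap_lock held for write" */
};

struct vma_sketch {
	struct mm_sketch *mm;
	unsigned long flags;
};

/* Models mmap_assert_write_locked(): trap if the lock is not held. */
static inline void sketch_assert_write_locked(const struct mm_sketch *mm)
{
	assert(mm->write_locked);
}

/* For a VMA not yet visible to anyone else: no locking needed. */
static inline void sketch_init_flags(struct vma_sketch *vma,
				     unsigned long flags)
{
	vma->flags = flags;
}

/* Set bits on a live VMA; the write lock must be held. */
static inline void sketch_set_flags(struct vma_sketch *vma,
				    unsigned long flags)
{
	sketch_assert_write_locked(vma->mm);
	vma->flags |= flags;
}

/* Clear bits on a live VMA; the write lock must be held. */
static inline void sketch_clear_flags(struct vma_sketch *vma,
				      unsigned long flags)
{
	sketch_assert_write_locked(vma->mm);
	vma->flags &= ~flags;
}

/* Set and clear in one call, under a single lock check. */
static inline void sketch_mod_flags(struct vma_sketch *vma,
				    unsigned long set,
				    unsigned long clear)
{
	sketch_assert_write_locked(vma->mm);
	vma->flags |= set;
	vma->flags &= ~clear;
}
```

A caller that updates flags without holding the lock trips the assertion immediately, which is the checking behavior the series is after.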

[1] https://lore.kernel.org/all/20230109205336.3665937-1-sur...@google.com/

The patchset applies cleanly over mm-unstable branch of mm tree.

My apologies for an extremely large distribution list. The patch touches
lots of files and many are in arch/ and drivers/.

Suren Baghdasaryan (6):
  mm: introduce vma->vm_flags modifier functions
  mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK
  mm: replace vma->vm_flags direct modifications with modifier calls
  mm: replace vma->vm_flags indirect modification in ksm_madvise
  mm: introduce mod_vm_flags_nolock and use it in untrack_pfn
  mm: export dump_mm()

 arch/arm/kernel/process.c |  2 +-
 arch/ia64/mm/init.c   |  8 +--
 arch/loongarch/include/asm/tlb.h  |  2 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c|  5 +-
 arch/powerpc/kvm/book3s_xive_native.c |  2 +-
 arch/powerpc/mm/book3s64/subpage_prot.c   |  2 +-
 arch/powerpc/platforms/book3s/vas-api.c   |  2 +-
 arch/powerpc/platforms/cell/spufs/file.c  | 14 ++---
 arch/s390/mm/gmap.c   |  8 +--
 arch/x86/entry/vsyscall/vsyscall_64.c |  2 +-
 arch/x86/kernel/cpu/sgx/driver.c  |  2 +-
 arch/x86/kernel/cpu/sgx/virt.c|  2 +-
 arch/x86/mm/pat/memtype.c | 14 +++--
 arch/x86/um/mem_32.c  |  2 +-
 drivers/acpi/pfr_telemetry.c  |  2 +-
 drivers/android/binder.c  |  3 +-
 drivers/char/mspec.c  |  2 +-
 drivers/crypto/hisilicon/qm.c |  2 +-
 drivers/dax/device.c  |  2 +-
 drivers/dma/idxd/cdev.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  4 +-
 drivers/gpu/drm/drm_gem.c |  2 +-
 drivers/gpu/drm/drm_gem_dma_helper.c  |  3 +-
 drivers/gpu/drm/drm_gem_shmem_helper.c|  2 +-
 drivers/gpu/drm/drm_vm.c  |  8 +--
 drivers/gpu/drm/etnaviv/etnaviv_gem.c |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_gem.c   |  4 +-
 drivers/gpu/drm/gma500/framebuffer.c  |  2 +-
 drivers/gpu/drm/i810/i810_dma.c   |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  4 +-
 drivers/gpu/drm/mediatek/mtk_drm_gem.c|  2 +-
 drivers/gpu/drm/msm/msm_gem.c |  2 +-
 drivers/gpu/drm/omapdrm/omap_gem.c|  3 +-
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c   |  3 +-
 drivers/gpu/drm/tegra/gem.c   |  5 +-
 drivers/gpu/drm/ttm/ttm_bo_vm.c   |  3 +-
 drivers/gpu/drm/virtio/virtgpu_vram.c |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c  |  2 +-
 drivers/gpu/drm/xen/xen_drm_front_gem.c   |  3 +-
 drivers/hsi/clients/cmt_speech.c  |  2 +-
 drivers/hwtracing/intel_th/msu.c  |  2 +-
 drivers/hwtracing/stm/core.c  |  2 +-
 drivers/infiniband/hw/hfi1/file_ops.c |  4 +-
 drivers/infiniband/hw/mlx5/main.c |  4 +-
 drivers/infiniband/hw/qib/qib_file_ops.c  | 13 +++--
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c  |  2 +-
 .../infiniband/hw/vmw_pvrdma/pvrdma_verbs.c   |  2 +-
 .../common/videobuf2/videobuf2-dma-contig.c   |  2 +-
 .../common/videobuf2/videobuf2-vmalloc.c  |  2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c |  2 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c |  4 +-
 drivers/media/v4l2-core/videobuf-vmalloc.c|  2 +-
 drivers/misc/cxl/context.c|  2 +-
 drivers/misc/habanalabs/common/memory.c   |  2 +-
 drivers/misc/habanalabs/gaudi/gaudi.c |  4 +-
 drivers/misc/habanalabs/gaudi2/gaudi2.c   |  8 +--
 drivers/misc/habanalabs/goya/goya.c   |  4 +-
 drivers/misc/ocxl/context.c   |  4 +-
 drivers/misc/ocxl/sysfs.c |  2 +-
 drivers/misc/open-dice.c  |  4 +-
 drivers/misc/sgi-gru/grufile.c|  4 +-
 drivers/misc/uacce/uacce.c|  2 +-
 drivers/sbus/char/oradax.c|  2 +-
 drivers/scsi/cxlflash/ocxl_hw.c   |  2 +-
 drivers/scsi/sg.c

[PATCH v2 2/6] mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK

2023-01-25 Thread Suren Baghdasaryan
To simplify the usage of VM_LOCKED_CLEAR_MASK in clear_vm_flags(),
replace it with VM_LOCKED_MASK bitmask and convert all users.
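
The mask change can be sketched in plain C as follows. This is only an
illustration: the bit values are placeholders, and clear_flags() and
has_lock_flags() are hypothetical stand-ins, not the kernel helpers.

```c
/* Illustrative bit values; the real definitions live in include/linux/mm.h. */
#define VM_LOCKED       0x00002000UL
#define VM_LOCKONFAULT  0x00080000UL

/* Positive mask: every VMA flag bit used by mlock. */
#define VM_LOCKED_MASK  (VM_LOCKED | VM_LOCKONFAULT)

/* Hypothetical stand-in for a helper that clears the given bits. */
static unsigned long clear_flags(unsigned long vm_flags, unsigned long mask)
{
    return vm_flags & ~mask;
}

/* Hypothetical helper: non-zero if any mlock-related bit is set. */
static int has_lock_flags(unsigned long vm_flags)
{
    return (vm_flags & VM_LOCKED_MASK) != 0;
}
```

With a positive mask, the same name serves both a "clear these bits" helper
and an open-coded "& ~VM_LOCKED_MASK", which an inverted mask cannot do.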

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mm.h | 4 ++--
 kernel/fork.c  | 2 +-
 mm/hugetlb.c   | 4 ++--
 mm/mlock.c | 6 +++---
 mm/mmap.c  | 6 +++---
 mm/mremap.c| 2 +-
 6 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b71f2809caac..da62bdd627bf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -421,8 +421,8 @@ extern unsigned int kobjsize(const void *objp);
 /* This mask defines which mm->def_flags a process can inherit its parent */
 #define VM_INIT_DEF_MASK   VM_NOHUGEPAGE
 
-/* This mask is used to clear all the VMA flags used by mlock */
-#define VM_LOCKED_CLEAR_MASK   (~(VM_LOCKED | VM_LOCKONFAULT))
+/* This mask represents all the VMA flag bits used by mlock */
+#define VM_LOCKED_MASK (VM_LOCKED | VM_LOCKONFAULT)
 
 /* Arch-specific flags to clear when updating VM flags on protection change */
 #ifndef VM_ARCH_CLEAR
diff --git a/kernel/fork.c b/kernel/fork.c
index 6683c1b0f460..03d472051236 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -669,7 +669,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
tmp->anon_vma = NULL;
} else if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
-   tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
+   clear_vm_flags(tmp, VM_LOCKED_MASK);
file = tmp->vm_file;
if (file) {
struct address_space *mapping = file->f_mapping;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d20c8b09890e..4ecdbad9a451 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6973,8 +6973,8 @@ static unsigned long page_table_shareable(struct 
vm_area_struct *svma,
unsigned long s_end = sbase + PUD_SIZE;
 
/* Allow segments to share if only one is marked locked */
-   unsigned long vm_flags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
-   unsigned long svm_flags = svma->vm_flags & VM_LOCKED_CLEAR_MASK;
+   unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED_MASK;
+   unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED_MASK;
 
/*
 * match the virtual addresses, permission and the alignment of the
diff --git a/mm/mlock.c b/mm/mlock.c
index 0336f52e03d7..5c4fff93cd6b 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -497,7 +497,7 @@ static int apply_vma_lock_flags(unsigned long start, size_t 
len,
if (vma->vm_start != tmp)
return -ENOMEM;
 
-   newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+   newflags = vma->vm_flags & ~VM_LOCKED_MASK;
newflags |= flags;
/* Here we know that  vma->vm_start <= nstart < vma->vm_end. */
tmp = vma->vm_end;
@@ -661,7 +661,7 @@ static int apply_mlockall_flags(int flags)
struct vm_area_struct *vma, *prev = NULL;
vm_flags_t to_add = 0;
 
-   current->mm->def_flags &= VM_LOCKED_CLEAR_MASK;
+   current->mm->def_flags &= ~VM_LOCKED_MASK;
if (flags & MCL_FUTURE) {
current->mm->def_flags |= VM_LOCKED;
 
@@ -681,7 +681,7 @@ static int apply_mlockall_flags(int flags)
for_each_vma(vmi, vma) {
vm_flags_t newflags;
 
-   newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+   newflags = vma->vm_flags & ~VM_LOCKED_MASK;
newflags |= to_add;
 
/* Ignore errors */
diff --git a/mm/mmap.c b/mm/mmap.c
index d4abc6feced1..323bd253b25a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2671,7 +2671,7 @@ unsigned long mmap_region(struct file *file, unsigned 
long addr,
if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
is_vm_hugetlb_page(vma) ||
vma == get_gate_vma(current->mm))
-   vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
+   clear_vm_flags(vma, VM_LOCKED_MASK);
else
mm->locked_vm += (len >> PAGE_SHIFT);
}
@@ -3340,8 +3340,8 @@ static struct vm_area_struct *__install_special_mapping(
vma->vm_start = addr;
vma->vm_end = addr + len;
 
-   vma->vm_flags = vm_flags | mm->def_flags | VM_DONTEXPAND | VM_SOFTDIRTY;
-   vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
+   init_vm_flags(vma, (vm_flags | mm->def_flags |
+ VM_DONTEXPAND | VM_SOFTDIRTY) & ~VM_LOCKED_MASK);
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
 
vma->vm_ops = ops;
diff --git a/mm/mremap.c b/mm/mremap.c
index 1b3ee02bead7..35db9752cb6a 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -687,7 +687,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 
if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
  

[QEMU][PATCH v4 08/10] meson.build: do not set have_xen_pci_passthrough for aarch64 targets

2023-01-25 Thread Vikram Garhwal
From: Stefano Stabellini 

have_xen_pci_passthrough is only used for Xen x86 VMs.

Signed-off-by: Stefano Stabellini 
Reviewed-by: Alex Bennée 
---
 meson.build | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/meson.build b/meson.build
index 6d3b665629..693802adb2 100644
--- a/meson.build
+++ b/meson.build
@@ -1471,6 +1471,8 @@ have_xen_pci_passthrough = 
get_option('xen_pci_passthrough') \
error_message: 'Xen PCI passthrough requested but Xen not enabled') 
\
   .require(targetos == 'linux',
error_message: 'Xen PCI passthrough not available on this 
platform') \
+  .require(cpu == 'x86'  or cpu == 'x86_64',
+   error_message: 'Xen PCI passthrough not available on this 
platform') \
   .allowed()
 
 
-- 
2.17.0




[QEMU][PATCH v4 04/10] xen-hvm: reorganize xen-hvm and move common function to xen-hvm-common

2023-01-25 Thread Vikram Garhwal
From: Stefano Stabellini 

This patch does the following:
1. Creates arch_handle_ioreq() and arch_xen_set_memory(). In preparation for
moving most of the xen-hvm code to an arch-neutral location, the x86-specific
portion of xen_set_memory() moves to arch_xen_set_memory(), and
handle_vmport_ioreq() moves to arch_handle_ioreq().

2. Pure code movement: move common functions to hw/xen/xen-hvm-common.c
Extract common functionalities from hw/i386/xen/xen-hvm.c and move them to
hw/xen/xen-hvm-common.c. These common functions are useful for creating
an IOREQ server.

xen_hvm_init_pc() contains the architecture-independent code for creating
and mapping an IOREQ server, connecting memory and IO listeners, initializing
a xen bus and registering backends. This common xen code is moved to a new
function, xen_register_ioreq(), which can be used by both x86 and ARM
machines.

Following functions are moved to hw/xen/xen-hvm-common.c:
xen_vcpu_eport(), xen_vcpu_ioreq(), xen_ram_alloc(), xen_set_memory(),
xen_region_add(), xen_region_del(), xen_io_add(), xen_io_del(),
xen_device_realize(), xen_device_unrealize(),
cpu_get_ioreq_from_shared_memory(), cpu_get_ioreq(), do_inp(),
do_outp(), rw_phys_req_item(), read_phys_req_item(),
write_phys_req_item(), cpu_ioreq_pio(), cpu_ioreq_move(),
cpu_ioreq_config(), handle_ioreq(), handle_buffered_iopage(),
handle_buffered_io(), cpu_handle_ioreq(), xen_main_loop_prepare(),
xen_hvm_change_state_handler(), xen_exit_notifier(),
xen_map_ioreq_server(), destroy_hvm_domain() and
xen_shutdown_fatal_error()

3. Removed the static qualifier from the following functions:
1. xen_region_add()
2. xen_region_del()
3. xen_io_add()
4. xen_io_del()
5. xen_device_realize()
6. xen_device_unrealize()
7. xen_hvm_change_state_handler()
8. cpu_ioreq_pio()
9. xen_exit_notifier()

4. Replace TARGET_PAGE_SIZE with XC_PAGE_SIZE to match the page size with Xen.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
---
 hw/i386/xen/trace-events|   14 -
 hw/i386/xen/xen-hvm.c   | 1023 ++-
 hw/xen/meson.build  |5 +-
 hw/xen/trace-events |   14 +
 hw/xen/xen-hvm-common.c |  870 ++
 include/hw/i386/xen_arch_hvm.h  |   11 +
 include/hw/xen/arch_hvm.h   |3 +
 include/hw/xen/xen-hvm-common.h |   97 +++
 8 files changed, 1063 insertions(+), 974 deletions(-)
 create mode 100644 hw/xen/xen-hvm-common.c
 create mode 100644 include/hw/i386/xen_arch_hvm.h
 create mode 100644 include/hw/xen/arch_hvm.h
 create mode 100644 include/hw/xen/xen-hvm-common.h

diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events
index a0c89d91c4..5d0a8d6dcf 100644
--- a/hw/i386/xen/trace-events
+++ b/hw/i386/xen/trace-events
@@ -7,17 +7,3 @@ xen_platform_log(char *s) "xen platform: %s"
 xen_pv_mmio_read(uint64_t addr) "WARNING: read from Xen PV Device MMIO space 
(address 0x%"PRIx64")"
 xen_pv_mmio_write(uint64_t addr) "WARNING: write to Xen PV Device MMIO space 
(address 0x%"PRIx64")"
 
-# xen-hvm.c
-xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: 0x%lx, 
size 0x%lx"
-xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) 
"0x%"PRIx64" size 0x%lx, log_dirty %i"
-handle_ioreq(void *req, uint32_t type, uint32_t dir, uint32_t df, uint32_t 
data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) 
"I/O=%p type=%d dir=%d df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d 
size=%d"
-handle_ioreq_read(void *req, uint32_t type, uint32_t df, uint32_t data_is_ptr, 
uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p read 
type=%d df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d size=%d"
-handle_ioreq_write(void *req, uint32_t type, uint32_t df, uint32_t 
data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) 
"I/O=%p write type=%d df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d 
size=%d"
-cpu_ioreq_pio(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, 
uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p pio dir=%d 
df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d size=%d"
-cpu_ioreq_pio_read_reg(void *req, uint64_t data, uint64_t addr, uint32_t size) 
"I/O=%p pio read reg data=0x%"PRIx64" port=0x%"PRIx64" size=%d"
-cpu_ioreq_pio_write_reg(void *req, uint64_t data, uint64_t addr, uint32_t 
size) "I/O=%p pio write reg data=0x%"PRIx64" port=0x%"PRIx64" size=%d"
-cpu_ioreq_move(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, 
uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p copy 
dir=%d df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d size=%d"
-xen_map_resource_ioreq(uint32_t id, void *addr) "id: %u addr: %p"
-cpu_ioreq_config_read(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, 
uint32_t 

[QEMU][PATCH v4 05/10] include/hw/xen/xen_common: return error from xen_create_ioreq_server

2023-01-25 Thread Vikram Garhwal
From: Stefano Stabellini 

This is done to prepare for enabling xenpv support for ARM architecture.
On ARM it is possible to have a functioning xenpv machine with only the
PV backends and no IOREQ server. If the IOREQ server creation fails,
continue to the PV backends initialization.

Signed-off-by: Stefano Stabellini 
Signed-off-by: Vikram Garhwal 
---
 include/hw/xen/xen_common.h | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
index 9a13a756ae..9ec69582b3 100644
--- a/include/hw/xen/xen_common.h
+++ b/include/hw/xen/xen_common.h
@@ -467,9 +467,10 @@ static inline void xen_unmap_pcidev(domid_t dom,
 {
 }
 
-static inline void xen_create_ioreq_server(domid_t dom,
-   ioservid_t *ioservid)
+static inline int xen_create_ioreq_server(domid_t dom,
+  ioservid_t *ioservid)
 {
+return 0;
 }
 
 static inline void xen_destroy_ioreq_server(domid_t dom,
@@ -600,8 +601,8 @@ static inline void xen_unmap_pcidev(domid_t dom,
   PCI_FUNC(pci_dev->devfn));
 }
 
-static inline void xen_create_ioreq_server(domid_t dom,
-   ioservid_t *ioservid)
+static inline int xen_create_ioreq_server(domid_t dom,
+  ioservid_t *ioservid)
 {
 int rc = xendevicemodel_create_ioreq_server(xen_dmod, dom,
 HVM_IOREQSRV_BUFIOREQ_ATOMIC,
@@ -609,12 +610,14 @@ static inline void xen_create_ioreq_server(domid_t dom,
 
 if (rc == 0) {
 trace_xen_ioreq_server_create(*ioservid);
-return;
+return rc;
 }
 
 *ioservid = 0;
 use_default_ioreq_server = true;
 trace_xen_default_ioreq_server();
+
+return rc;
 }
 
 static inline void xen_destroy_ioreq_server(domid_t dom,
-- 
2.17.0




[QEMU][PATCH v4 10/10] meson.build: enable xenpv machine build for ARM

2023-01-25 Thread Vikram Garhwal
Add CONFIG_XEN for the aarch64-softmmu target so that Xen support is built for
ARM targets.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Alex Bennée 
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 693802adb2..13c4ad1017 100644
--- a/meson.build
+++ b/meson.build
@@ -135,7 +135,7 @@ endif
 if cpu in ['x86', 'x86_64', 'arm', 'aarch64']
   # i386 emulator provides xenpv machine type for multiple architectures
   accelerator_targets += {
-'CONFIG_XEN': ['i386-softmmu', 'x86_64-softmmu'],
+'CONFIG_XEN': ['i386-softmmu', 'x86_64-softmmu', 'aarch64-softmmu'],
   }
 endif
 if cpu in ['x86', 'x86_64']
-- 
2.17.0




[QEMU][PATCH v4 09/10] hw/arm: introduce xenpvh machine

2023-01-25 Thread Vikram Garhwal
Add a new machine, xenpvh, which creates an IOREQ server to register/connect
with the Xen hypervisor.

Optional: when CONFIG_TPM is enabled, it also creates a tpm-tis-device, adds a
TPM emulator and connects to swtpm running on the host machine via a chardev
socket, to support TPM functionality for a guest domain.

Extra command line for aarch64 xenpvh QEMU to connect to swtpm:
-chardev socket,id=chrtpm,path=/tmp/vtpm2/swtpm-sock \
-tpmdev emulator,id=tpm0,chardev=chrtpm \
-machine tpm-base-addr=0x0c00 \

swtpm implements a TPM software emulator (TPM 1.2 & TPM 2) built on libtpms and
provides access to TPM functionality over socket, chardev and CUSE interfaces.
Github repo: https://github.com/stefanberger/swtpm
Example for starting swtpm on host machine:
mkdir /tmp/vtpm2
swtpm socket --tpmstate dir=/tmp/vtpm2 \
--ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
---
 docs/system/arm/xenpvh.rst|  34 +++
 docs/system/target-arm.rst|   1 +
 hw/arm/meson.build|   2 +
 hw/arm/xen_arm.c  | 184 ++
 include/hw/arm/xen_arch_hvm.h |   9 ++
 include/hw/xen/arch_hvm.h |   2 +
 6 files changed, 232 insertions(+)
 create mode 100644 docs/system/arm/xenpvh.rst
 create mode 100644 hw/arm/xen_arm.c
 create mode 100644 include/hw/arm/xen_arch_hvm.h

diff --git a/docs/system/arm/xenpvh.rst b/docs/system/arm/xenpvh.rst
new file mode 100644
index 00..e1655c7ab8
--- /dev/null
+++ b/docs/system/arm/xenpvh.rst
@@ -0,0 +1,34 @@
+XENPVH (``xenpvh``)
+=
+This machine creates a IOREQ server to register/connect with Xen Hypervisor.
+
+When TPM is enabled, this machine also creates a tpm-tis-device at a user input
+tpm base address, adds a TPM emulator and connects to a swtpm application
+running on host machine via chardev socket. This enables xenpvh to support TPM
+functionalities for a guest domain.
+
+More information about TPM use and installing swtpm linux application can be
+found at: docs/specs/tpm.rst.
+
+Example for starting swtpm on host machine:
+.. code-block:: console
+
+mkdir /tmp/vtpm2
+swtpm socket --tpmstate dir=/tmp/vtpm2 \
+--ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &
+
+Sample QEMU xenpvh commands for running and connecting with Xen:
+.. code-block:: console
+
+qemu-system-aarch64 -xen-domid 1 \
+-chardev socket,id=libxl-cmd,path=qmp-libxl-1,server=on,wait=off \
+-mon chardev=libxl-cmd,mode=control \
+-chardev socket,id=libxenstat-cmd,path=qmp-libxenstat-1,server=on,wait=off 
\
+-mon chardev=libxenstat-cmd,mode=control \
+-xen-attach -name guest0 -vnc none -display none -nographic \
+-machine xenpvh -m 1301 \
+-chardev socket,id=chrtpm,path=tmp/vtpm2/swtpm-sock \
+-tpmdev emulator,id=tpm0,chardev=chrtpm -machine tpm-base-addr=0x0C00
+
+In above QEMU command, last two lines are for connecting xenpvh QEMU to swtpm
+via chardev socket.
diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
index 91ebc26c6d..af8d7c77d6 100644
--- a/docs/system/target-arm.rst
+++ b/docs/system/target-arm.rst
@@ -106,6 +106,7 @@ undocumented; you can get a complete list by running
arm/stm32
arm/virt
arm/xlnx-versal-virt
+   arm/xenpvh
 
 Emulated CPU architecture support
 =
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index b036045603..06bddbfbb8 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -61,6 +61,8 @@ arm_ss.add(when: 'CONFIG_FSL_IMX7', if_true: 
files('fsl-imx7.c', 'mcimx7d-sabre.
 arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
 arm_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 
'mcimx6ul-evk.c'))
 arm_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
+arm_ss.add(when: 'CONFIG_XEN', if_true: files('xen_arm.c'))
+arm_ss.add_all(xen_ss)
 
 softmmu_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmu-common.c'))
 softmmu_ss.add(when: 'CONFIG_EXYNOS4', if_true: files('exynos4_boards.c'))
diff --git a/hw/arm/xen_arm.c b/hw/arm/xen_arm.c
new file mode 100644
index 00..12b19e3609
--- /dev/null
+++ b/hw/arm/xen_arm.c
@@ -0,0 +1,184 @@
+/*
+ * QEMU ARM Xen PV Machine
+ *
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * 

[QEMU][PATCH v4 02/10] hw/i386/xen: rearrange xen_hvm_init_pc

2023-01-25 Thread Vikram Garhwal
In preparation for moving most of the xen-hvm code to an arch-neutral location,
move non-IOREQ references to:
- xen_get_vmport_regs_pfn
- xen_suspend_notifier
- xen_wakeup_notifier
- xen_ram_init

towards the end of the xen_hvm_init_pc() function.

This keeps the common ioreq functions in one place, so that the next patch can
move them to a new function shared by both x86 and aarch64 machines.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Paul Durrant 
---
 hw/i386/xen/xen-hvm.c | 49 ++-
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index b9a6f7f538..1fba0e0ae1 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -1416,12 +1416,6 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 state->exit.notify = xen_exit_notifier;
 qemu_add_exit_notifier(&state->exit);
 
-state->suspend.notify = xen_suspend_notifier;
-qemu_register_suspend_notifier(&state->suspend);
-
-state->wakeup.notify = xen_wakeup_notifier;
-qemu_register_wakeup_notifier(&state->wakeup);
-
 /*
  * Register wake-up support in QMP query-current-machine API
  */
@@ -1432,23 +1426,6 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 goto err;
 }
 
-rc = xen_get_vmport_regs_pfn(xen_xc, xen_domid, &ioreq_pfn);
-if (!rc) {
-DPRINTF("shared vmport page at pfn %lx\n", ioreq_pfn);
-state->shared_vmport_page =
-xenforeignmemory_map(xen_fmem, xen_domid, PROT_READ|PROT_WRITE,
- 1, &ioreq_pfn, NULL);
-if (state->shared_vmport_page == NULL) {
-error_report("map shared vmport IO page returned error %d 
handle=%p",
- errno, xen_xc);
-goto err;
-}
-} else if (rc != -ENOSYS) {
-error_report("get vmport regs pfn returned error %d, rc=%d",
- errno, rc);
-goto err;
-}
-
 /* Note: cpus is empty at this point in init */
 state->cpu_by_vcpu_id = g_new0(CPUState *, max_cpus);
 
@@ -1486,7 +1463,6 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 #else
 xen_map_cache_init(NULL, state);
 #endif
-xen_ram_init(pcms, ms->ram_size, ram_memory);
 
 qemu_add_vm_change_state_handler(xen_hvm_change_state_handler, state);
 
@@ -1513,6 +1489,31 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 QLIST_INIT(&xen_physmap);
 xen_read_physmap(state);
 
+state->suspend.notify = xen_suspend_notifier;
+qemu_register_suspend_notifier(&state->suspend);
+
+state->wakeup.notify = xen_wakeup_notifier;
+qemu_register_wakeup_notifier(&state->wakeup);
+
+rc = xen_get_vmport_regs_pfn(xen_xc, xen_domid, &ioreq_pfn);
+if (!rc) {
+DPRINTF("shared vmport page at pfn %lx\n", ioreq_pfn);
+state->shared_vmport_page =
+xenforeignmemory_map(xen_fmem, xen_domid, PROT_READ|PROT_WRITE,
+ 1, &ioreq_pfn, NULL);
+if (state->shared_vmport_page == NULL) {
+error_report("map shared vmport IO page returned error %d 
handle=%p",
+ errno, xen_xc);
+goto err;
+}
+} else if (rc != -ENOSYS) {
+error_report("get vmport regs pfn returned error %d, rc=%d",
+ errno, rc);
+goto err;
+}
+
+xen_ram_init(pcms, ms->ram_size, ram_memory);
+
 /* Disable ACPI build because Xen handles it */
 pcms->acpi_build_enabled = false;
 
-- 
2.17.0




[QEMU][PATCH v4 07/10] hw/xen/xen-hvm-common: Use g_new and error_setg_errno

2023-01-25 Thread Vikram Garhwal
Replace g_malloc()/g_malloc0() with g_new()/g_new0(), and replace perror() with
error_report() calls that include errno in the message.
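
The motivation for preferring a typed, counted allocator over an open-coded
"count * sizeof(type)" can be sketched in plain C. demo_new0() and DEMO_NEW0()
are hypothetical names, not GLib API; GLib documents g_new() as guarding the
size calculation against overflow, and returning NULL here instead of aborting
is an assumption made to keep the sketch testable.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate zeroed storage for `count` items of
 * `item_size` bytes, refusing requests whose byte size would overflow size_t.
 */
static void *demo_new0(size_t item_size, size_t count)
{
    if (item_size != 0 && count > SIZE_MAX / item_size) {
        return NULL;            /* count * item_size would overflow */
    }
    return calloc(count, item_size);
}

/* Typed wrapper: the result is `type *`, so a mismatched assignment is diagnosed. */
#define DEMO_NEW0(type, count) ((type *)demo_new0(sizeof(type), (count)))
```

The element type is stated once and the multiplication is checked in one
place, which is what the open-coded g_malloc0(max_cpus * sizeof(...)) calls
lacked.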

Signed-off-by: Vikram Garhwal 
---
 hw/xen/xen-hvm-common.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index 94dbbe97ed..01c8ec1956 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -34,7 +34,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, 
MemoryRegion *mr,
 trace_xen_ram_alloc(ram_addr, size);
 
 nr_pfn = size >> TARGET_PAGE_BITS;
-pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);
+pfn_list = g_new(xen_pfn_t, nr_pfn);
 
 for (i = 0; i < nr_pfn; i++) {
 pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
@@ -726,7 +726,7 @@ void destroy_hvm_domain(bool reboot)
 return;
 }
 if (errno != ENOTTY /* old Xen */) {
-perror("xendevicemodel_shutdown failed");
+error_report("xendevicemodel_shutdown failed with error %d", 
errno);
 }
 /* well, try the old thing then */
 }
@@ -797,7 +797,7 @@ static void xen_do_ioreq_register(XenIOState *state,
 }
 
 /* Note: cpus is empty at this point in init */
-state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
+state->cpu_by_vcpu_id = g_new0(CPUState *, max_cpus);
 
 rc = xen_set_ioreq_server_state(xen_domid, state->ioservid, true);
 if (rc < 0) {
@@ -806,7 +806,7 @@ static void xen_do_ioreq_register(XenIOState *state,
 goto err;
 }
 
-state->ioreq_local_port = g_malloc0(max_cpus * sizeof (evtchn_port_t));
+state->ioreq_local_port = g_new0(evtchn_port_t, max_cpus);
 
 /* FIXME: how about if we overflow the page here? */
 for (i = 0; i < max_cpus; i++) {
@@ -860,13 +860,13 @@ void xen_register_ioreq(XenIOState *state, unsigned int 
max_cpus,
 
 state->xce_handle = xenevtchn_open(NULL, 0);
 if (state->xce_handle == NULL) {
-perror("xen: event channel open");
+error_report("xen: event channel open failed with error %d", errno);
 goto err;
 }
 
 state->xenstore = xs_daemon_open();
 if (state->xenstore == NULL) {
-perror("xen: xenstore open");
+error_report("xen: xenstore open failed with error %d", errno);
 goto err;
 }
 
-- 
2.17.0




[QEMU][PATCH v4 00/10] Introduce xenpvh machine for arm architecture

2023-01-25 Thread Vikram Garhwal
Hi,
This series adds a xenpvh machine for aarch64. The motivation behind creating a
xenpvh machine with IOREQ and TPM was to enable each guest on Xen aarch64 to
have its own unique, emulated TPM.

This series does the following:
1. Moves common xen functionality from hw/i386/xen to hw/xen/ so it can
   be used for aarch64.
2. Adds a minimal xenpvh arm machine which creates an IOREQ server and
   supports TPM.

Also, checkpatch.pl fails for 03/12 and 06/12. These failures are due to
moving old code, which was not compliant with QEMU code style, to a new place.
No new code was added.

Regards,
Vikram

ChangeLog:
v3->v4:
Removed the out of series 04/12 patch.

v2->v3:
1. Change machine name to xenpvh as per Jurgen's input.
2. Add docs/system/xenpvh.rst documentation.
3. Removed GUEST_TPM_BASE and added tpm_base_address as property.
4. Correct CONFIG_TPM related issues.
5. Added xen_register_backend() function call to xen_register_ioreq().
6. Added Oleksandr's suggestion i.e. removed extra interface opening and
   used accel=xen option

v1 -> v2
Merged patch 05 and 06.
04/12: xen-hvm-common.c:
1. Moved xen_be_init() and xen_be_register_common() from
xen_register_ioreq() to xen_register_backend().
2. Changed g_malloc to g_new and perror -> error_setg_errno.
3. Created a local subroutine function for Xen_IOREQ_register.
4. Fixed build issues with inclusion of xenstore.h.
5. Fixed minor errors.

Stefano Stabellini (5):
  hw/i386/xen/xen-hvm: move x86-specific fields out of XenIOState
  xen-hvm: reorganize xen-hvm and move common function to xen-hvm-common
  include/hw/xen/xen_common: return error from xen_create_ioreq_server
  hw/xen/xen-hvm-common: skip ioreq creation on ioreq registration
failure
  meson.build: do not set have_xen_pci_passthrough for aarch64 targets

Vikram Garhwal (5):
  hw/i386/xen/: move xen-mapcache.c to hw/xen/
  hw/i386/xen: rearrange xen_hvm_init_pc
  hw/xen/xen-hvm-common: Use g_new and error_setg_errno
  hw/arm: introduce xenpvh machine
  meson.build: enable xenpv machine build for ARM

 docs/system/arm/xenpvh.rst   |   34 +
 docs/system/target-arm.rst   |1 +
 hw/arm/meson.build   |2 +
 hw/arm/xen_arm.c |  184 +
 hw/i386/meson.build  |1 +
 hw/i386/xen/meson.build  |1 -
 hw/i386/xen/trace-events |   19 -
 hw/i386/xen/xen-hvm.c| 1084 +++---
 hw/xen/meson.build   |7 +
 hw/xen/trace-events  |   19 +
 hw/xen/xen-hvm-common.c  |  889 
 hw/{i386 => }/xen/xen-mapcache.c |0
 include/hw/arm/xen_arch_hvm.h|9 +
 include/hw/i386/xen_arch_hvm.h   |   11 +
 include/hw/xen/arch_hvm.h|5 +
 include/hw/xen/xen-hvm-common.h  |   97 +++
 include/hw/xen/xen_common.h  |   13 +-
 meson.build  |4 +-
 18 files changed, 1363 insertions(+), 1017 deletions(-)
 create mode 100644 docs/system/arm/xenpvh.rst
 create mode 100644 hw/arm/xen_arm.c
 create mode 100644 hw/xen/xen-hvm-common.c
 rename hw/{i386 => }/xen/xen-mapcache.c (100%)
 create mode 100644 include/hw/arm/xen_arch_hvm.h
 create mode 100644 include/hw/i386/xen_arch_hvm.h
 create mode 100644 include/hw/xen/arch_hvm.h
 create mode 100644 include/hw/xen/xen-hvm-common.h

-- 
2.17.0




[QEMU][PATCH v4 06/10] hw/xen/xen-hvm-common: skip ioreq creation on ioreq registration failure

2023-01-25 Thread Vikram Garhwal
From: Stefano Stabellini 

On ARM it is possible to have a functioning xenpv machine with only the
PV backends and no IOREQ server. If the IOREQ server creation fails, continue
to the PV backends initialization.

Also, move the IOREQ registration and mapping subroutine to a new function,
xen_do_ioreq_register().
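
The control flow described above can be sketched as follows. This is a
simplified model, not the QEMU code: struct demo_state and the demo_*
functions are hypothetical stand-ins for XenIOState, xen_create_ioreq_server()
and xen_register_ioreq().

```c
#include <stdbool.h>

/* Hypothetical state record; not QEMU's XenIOState. */
struct demo_state {
    bool ioreq_registered;
    bool backends_registered;
};

/* Stand-in for xen_create_ioreq_server(): 0 on success, non-zero on failure. */
static int demo_create_ioreq_server(bool available)
{
    return available ? 0 : -1;
}

/* IOREQ setup is optional; backend setup runs in both cases. */
static void demo_register_ioreq(struct demo_state *s, bool ioreq_available)
{
    int rc = demo_create_ioreq_server(ioreq_available);

    if (rc == 0) {
        s->ioreq_registered = true;   /* where xen_do_ioreq_register() would run */
    }
    /* On failure the patch only warns; initialisation continues. */
    s->backends_registered = true;    /* where the PV backends are registered */
}
```

The point of the split is that a failed IOREQ server creation is no longer
fatal, while the backend registration path stays common to both outcomes.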

Signed-off-by: Stefano Stabellini 
Signed-off-by: Vikram Garhwal 
---
 hw/xen/xen-hvm-common.c | 53 -
 1 file changed, 36 insertions(+), 17 deletions(-)

diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index e748d8d423..94dbbe97ed 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -777,25 +777,12 @@ err:
 exit(1);
 }
 
-void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
-MemoryListener xen_memory_listener)
+static void xen_do_ioreq_register(XenIOState *state,
+   unsigned int max_cpus,
+   MemoryListener xen_memory_listener)
 {
 int i, rc;
 
-state->xce_handle = xenevtchn_open(NULL, 0);
-if (state->xce_handle == NULL) {
-perror("xen: event channel open");
-goto err;
-}
-
-state->xenstore = xs_daemon_open();
-if (state->xenstore == NULL) {
-perror("xen: xenstore open");
-goto err;
-}
-
-xen_create_ioreq_server(xen_domid, &state->ioservid);
-
 state->exit.notify = xen_exit_notifier;
 qemu_add_exit_notifier(&state->exit);
 
@@ -859,12 +846,44 @@ void xen_register_ioreq(XenIOState *state, unsigned int 
max_cpus,
 QLIST_INIT(&state->dev_list);
 device_listener_register(&state->device_listener);
 
+return;
+
+err:
+error_report("xen hardware virtual machine initialisation failed");
+exit(1);
+}
+
+void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
+MemoryListener xen_memory_listener)
+{
+int rc;
+
+state->xce_handle = xenevtchn_open(NULL, 0);
+if (state->xce_handle == NULL) {
+perror("xen: event channel open");
+goto err;
+}
+
+state->xenstore = xs_daemon_open();
+if (state->xenstore == NULL) {
+perror("xen: xenstore open");
+goto err;
+}
+
+rc = xen_create_ioreq_server(xen_domid, &state->ioservid);
+if (!rc) {
+xen_do_ioreq_register(state, max_cpus, xen_memory_listener);
+} else {
+warn_report("xen: failed to create ioreq server");
+}
+
 xen_bus_init();
 
 xen_register_backend(state);
 
 return;
+
 err:
-error_report("xen hardware virtual machine initialisation failed");
+error_report("xen hardware virtual machine backend registration failed");
 exit(1);
 }
-- 
2.17.0




[QEMU][PATCH v4 03/10] hw/i386/xen/xen-hvm: move x86-specific fields out of XenIOState

2023-01-25 Thread Vikram Garhwal
From: Stefano Stabellini 

In preparation for moving most of the xen-hvm code to an arch-neutral location, move:
- shared_vmport_page
- log_for_dirtybit
- dirty_bitmap
- suspend
- wakeup

out of the XenIOState struct, as these are only used on x86, especially the ones
related to dirty logging.
The updated XenIOState can be used for both aarch64 and x86.

Also, remove free_phys_offset as it was unused.

Signed-off-by: Stefano Stabellini 
Signed-off-by: Vikram Garhwal 
Reviewed-by: Paul Durrant 
Reviewed-by: Alex Bennée 
---
 hw/i386/xen/xen-hvm.c | 58 ---
 1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 1fba0e0ae1..06c446e7be 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -73,6 +73,7 @@ struct shared_vmport_iopage {
 };
 typedef struct shared_vmport_iopage shared_vmport_iopage_t;
 #endif
+static shared_vmport_iopage_t *shared_vmport_page;
 
 static inline uint32_t xen_vcpu_eport(shared_iopage_t *shared_page, int i)
 {
@@ -95,6 +96,11 @@ typedef struct XenPhysmap {
 } XenPhysmap;
 
 static QLIST_HEAD(, XenPhysmap) xen_physmap;
+static const XenPhysmap *log_for_dirtybit;
+/* Buffer used by xen_sync_dirty_bitmap */
+static unsigned long *dirty_bitmap;
+static Notifier suspend;
+static Notifier wakeup;
 
 typedef struct XenPciDevice {
 PCIDevice *pci_dev;
@@ -105,7 +111,6 @@ typedef struct XenPciDevice {
 typedef struct XenIOState {
 ioservid_t ioservid;
 shared_iopage_t *shared_page;
-shared_vmport_iopage_t *shared_vmport_page;
 buffered_iopage_t *buffered_io_page;
 xenforeignmemory_resource_handle *fres;
 QEMUTimer *buffered_io_timer;
@@ -125,14 +130,8 @@ typedef struct XenIOState {
 MemoryListener io_listener;
 QLIST_HEAD(, XenPciDevice) dev_list;
 DeviceListener device_listener;
-hwaddr free_phys_offset;
-const XenPhysmap *log_for_dirtybit;
-/* Buffer used by xen_sync_dirty_bitmap */
-unsigned long *dirty_bitmap;
 
 Notifier exit;
-Notifier suspend;
-Notifier wakeup;
 } XenIOState;
 
 /* Xen specific function for piix pci */
@@ -462,10 +461,10 @@ static int xen_remove_from_physmap(XenIOState *state,
 }
 
 QLIST_REMOVE(physmap, list);
-if (state->log_for_dirtybit == physmap) {
-state->log_for_dirtybit = NULL;
-g_free(state->dirty_bitmap);
-state->dirty_bitmap = NULL;
+if (log_for_dirtybit == physmap) {
+log_for_dirtybit = NULL;
+g_free(dirty_bitmap);
+dirty_bitmap = NULL;
 }
 g_free(physmap);
 
@@ -626,16 +625,16 @@ static void xen_sync_dirty_bitmap(XenIOState *state,
 return;
 }
 
-if (state->log_for_dirtybit == NULL) {
-state->log_for_dirtybit = physmap;
-state->dirty_bitmap = g_new(unsigned long, bitmap_size);
-} else if (state->log_for_dirtybit != physmap) {
+if (log_for_dirtybit == NULL) {
+log_for_dirtybit = physmap;
+dirty_bitmap = g_new(unsigned long, bitmap_size);
+} else if (log_for_dirtybit != physmap) {
 /* Only one range for dirty bitmap can be tracked. */
 return;
 }
 
 rc = xen_track_dirty_vram(xen_domid, start_addr >> TARGET_PAGE_BITS,
-  npages, state->dirty_bitmap);
+  npages, dirty_bitmap);
 if (rc < 0) {
 #ifndef ENODATA
 #define ENODATA  ENOENT
@@ -650,7 +649,7 @@ static void xen_sync_dirty_bitmap(XenIOState *state,
 }
 
 for (i = 0; i < bitmap_size; i++) {
-unsigned long map = state->dirty_bitmap[i];
+unsigned long map = dirty_bitmap[i];
 while (map != 0) {
 j = ctzl(map);
 map &= ~(1ul << j);
@@ -676,12 +675,10 @@ static void xen_log_start(MemoryListener *listener,
 static void xen_log_stop(MemoryListener *listener, MemoryRegionSection *section,
  int old, int new)
 {
-XenIOState *state = container_of(listener, XenIOState, memory_listener);
-
 if (old & ~new & (1 << DIRTY_MEMORY_VGA)) {
-state->log_for_dirtybit = NULL;
-g_free(state->dirty_bitmap);
-state->dirty_bitmap = NULL;
+log_for_dirtybit = NULL;
+g_free(dirty_bitmap);
+dirty_bitmap = NULL;
 /* Disable dirty bit tracking */
 xen_track_dirty_vram(xen_domid, 0, 0, NULL);
 }
@@ -1021,9 +1018,9 @@ static void handle_vmport_ioreq(XenIOState *state, ioreq_t *req)
 {
 vmware_regs_t *vmport_regs;
 
-assert(state->shared_vmport_page);
+assert(shared_vmport_page);
 vmport_regs =
-&state->shared_vmport_page->vcpu_vmport_regs[state->send_vcpu];
+&shared_vmport_page->vcpu_vmport_regs[state->send_vcpu];
 QEMU_BUILD_BUG_ON(sizeof(*req) < sizeof(*vmport_regs));
 
 current_cpu = state->cpu_by_vcpu_id[state->send_vcpu];
@@ -1468,7 +1465,6 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion **ram_memory)
 
 state->memory_listener = xen_memory_listener;
 

[QEMU][PATCH v4 01/10] hw/i386/xen/: move xen-mapcache.c to hw/xen/

2023-01-25 Thread Vikram Garhwal
xen-mapcache.c contains common functions that can be used to enable Xen on
aarch64 with IOREQ handling. Move it from hw/i386/xen to hw/xen to make it
accessible to both aarch64 and x86.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
---
 hw/i386/meson.build  | 1 +
 hw/i386/xen/meson.build  | 1 -
 hw/i386/xen/trace-events | 5 -
 hw/xen/meson.build   | 4 
 hw/xen/trace-events  | 5 +
 hw/{i386 => }/xen/xen-mapcache.c | 0
 6 files changed, 10 insertions(+), 6 deletions(-)
 rename hw/{i386 => }/xen/xen-mapcache.c (100%)

diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 213e2e82b3..cfdbfdcbcb 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -33,5 +33,6 @@ subdir('kvm')
 subdir('xen')
 
 i386_ss.add_all(xenpv_ss)
+i386_ss.add_all(xen_ss)
 
 hw_arch += {'i386': i386_ss}
diff --git a/hw/i386/xen/meson.build b/hw/i386/xen/meson.build
index be84130300..2fcc46e6ca 100644
--- a/hw/i386/xen/meson.build
+++ b/hw/i386/xen/meson.build
@@ -1,6 +1,5 @@
 i386_ss.add(when: 'CONFIG_XEN', if_true: files(
   'xen-hvm.c',
-  'xen-mapcache.c',
   'xen_apic.c',
   'xen_platform.c',
   'xen_pvdevice.c',
diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events
index 5d6be61090..a0c89d91c4 100644
--- a/hw/i386/xen/trace-events
+++ b/hw/i386/xen/trace-events
@@ -21,8 +21,3 @@ xen_map_resource_ioreq(uint32_t id, void *addr) "id: %u addr: %p"
 cpu_ioreq_config_read(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"
 cpu_ioreq_config_write(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"
 
-# xen-mapcache.c
-xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
-xen_remap_bucket(uint64_t index) "index 0x%"PRIx64
-xen_map_cache_return(void* ptr) "%p"
-
diff --git a/hw/xen/meson.build b/hw/xen/meson.build
index ae0ace3046..19d0637c46 100644
--- a/hw/xen/meson.build
+++ b/hw/xen/meson.build
@@ -22,3 +22,7 @@ else
 endif
 
 specific_ss.add_all(when: ['CONFIG_XEN', xen], if_true: xen_specific_ss)
+
+xen_ss = ss.source_set()
+
+xen_ss.add(when: 'CONFIG_XEN', if_true: files('xen-mapcache.c'))
diff --git a/hw/xen/trace-events b/hw/xen/trace-events
index 3da3fd8348..2c8f238f42 100644
--- a/hw/xen/trace-events
+++ b/hw/xen/trace-events
@@ -41,3 +41,8 @@ xs_node_vprintf(char *path, char *value) "%s %s"
 xs_node_vscanf(char *path, char *value) "%s %s"
 xs_node_watch(char *path) "%s"
 xs_node_unwatch(char *path) "%s"
+
+# xen-mapcache.c
+xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
+xen_remap_bucket(uint64_t index) "index 0x%"PRIx64
+xen_map_cache_return(void* ptr) "%p"
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
similarity index 100%
rename from hw/i386/xen/xen-mapcache.c
rename to hw/xen/xen-mapcache.c
-- 
2.17.0



