date:20230601

Re: [PATCH V3 3/3] libxl: arm: Add grant_usage parameter for virtio devices

2023-06-01 Thread Viresh Kumar

On 02-06-23, 08:25, Erik Schilling wrote:
> > diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go
> > index 0a203d22321f..bf846dca8ec0 100644
> > --- a/tools/golang/xenlight/helpers.gen.go
> > +++ b/tools/golang/xenlight/helpers.gen.go
> > @@ -1792,6 +1792,9 @@ func (x *DeviceVirtio) fromC(xc *C.libxl_device_virtio) error {
> >  x.BackendDomname = C.GoString(xc.backend_domname)
> >  x.Type = C.GoString(xc._type)
> >  x.Transport = VirtioTransport(xc.transport)
> > +if err := x.GrantUsage.fromC(&xc.grant_usage);err != nil {
> 
> NITPICK: space after ; seems missing.

This is an auto-generated file, perhaps the script has a bug :)

-- 
viresh

Re: [PATCH V3 3/3] libxl: arm: Add grant_usage parameter for virtio devices

2023-06-01 Thread Erik Schilling

> diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go
> index 0a203d22321f..bf846dca8ec0 100644
> --- a/tools/golang/xenlight/helpers.gen.go
> +++ b/tools/golang/xenlight/helpers.gen.go
> @@ -1792,6 +1792,9 @@ func (x *DeviceVirtio) fromC(xc *C.libxl_device_virtio) error {
>  x.BackendDomname = C.GoString(xc.backend_domname)
>  x.Type = C.GoString(xc._type)
>  x.Transport = VirtioTransport(xc.transport)
> +if err := x.GrantUsage.fromC(&xc.grant_usage);err != nil {

NITPICK: space after ; seems missing.

> +return fmt.Errorf("converting field GrantUsage: %v", err)
> +}
>  x.Devid = Devid(xc.devid)
>  x.Irq = uint32(xc.irq)
>  x.Base = uint64(xc.base)
> @@ -1809,6 +1812,9 @@ xc.backend_domname = C.CString(x.BackendDomname)}
>  if x.Type != "" {
>  xc._type = C.CString(x.Type)}
>  xc.transport = C.libxl_virtio_transport(x.Transport)
> +if err := x.GrantUsage.toC(&xc.grant_usage); err != nil {

Here it exists.

- Erik

Re: [PATCH v2] iscsi_ibft: Fix finding the iBFT under Xen Dom 0

2023-06-01 Thread Juergen Gross


On 01.06.23 18:57, Dave Hansen wrote:

On 5/30/23 08:01, Ross Lagerwall wrote:

Since firmware doesn't indicate the iBFT in the E820, add a reserved
region so that it gets identity mapped when running as Dom 0 so that it
is possible to search for it. Move the call to reserve_ibft_region()
later so that it is called after the Xen identity mapping adjustments
are applied.

Finally, instead of using isa_bus_to_virt() which doesn't do the right
thing under Xen, use early_memremap() like the dmi_scan code does.


This is connecting Xen, iSCSI and x86.  Some background here would be
*really* nice for dummies like me that deal heavily in only one of those
three.

One or two sentences like this:

Firmware can provide an iSCSI-specific table called the iBFT
which helps the OS boot from iSCSI devices.

can go a long way for dummies like me.  As could some background about
why this:

... add a reserved region so that it gets identity mapped when
running as Dom 0 so that it is possible to search for it.

These are all English words, but off the top of my head, I have no idea
why reserved regions get identity mapped when running as Dom 0 or why
that makes it possible to search.

The addresses and size here:


+#ifdef CONFIG_ISCSI_IBFT_FIND
+   /* Reserve 0.5 MiB to 1 MiB region so iBFT can be found */
+   xen_e820_table.entries[xen_e820_table.nr_entries].addr = 0x80000;
+   xen_e820_table.entries[xen_e820_table.nr_entries].size = 0x80000;
+   xen_e820_table.entries[xen_e820_table.nr_entries].type = E820_TYPE_RESERVED;
+   xen_e820_table.nr_entries++;
+#endif


also appear to be conjured out of thin air.


I'd suggest moving the definitions of IBFT_START and IBFT_END from
drivers/firmware/iscsi_ibft_find.c to include/linux/iscsi_ibft.h and using
them here.
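A minimal sketch of what sharing those constants could look like, so that the reserved E820 entry and the finder cannot drift apart. It assumes IBFT_START = 0x80000 and IBFT_END = 0x100000 (the range implied by the "0.5 MiB to 1 MiB" comment in the patch); `struct e820_entry`, `ibft_reserved_region()` and `find_ibft()` are hypothetical stand-ins, and a plain buffer replaces the early_memremap()'d window. The 16-byte-aligned "iBFT" signature scan only illustrates the idea and is not copied from drivers/firmware/iscsi_ibft_find.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, matching the "0.5 MiB to 1 MiB" comment in the patch. */
#define IBFT_START 0x80000u  /* 512 KiB */
#define IBFT_END   0x100000u /* 1 MiB */

/* Hypothetical, simplified E820 entry (type omitted). */
struct e820_entry {
    uint64_t addr;
    uint64_t size;
};

/* The reserved region is derived from the shared macros, not hard-coded. */
static struct e820_entry ibft_reserved_region(void)
{
    struct e820_entry e = { IBFT_START, IBFT_END - IBFT_START };
    return e;
}

/*
 * Scan a window that is assumed to be mapped at physical address
 * IBFT_START, looking for the "iBFT" signature on 16-byte boundaries.
 * Returns the physical address of the table, or 0 if none was found.
 */
static uint64_t find_ibft(const uint8_t *window, size_t len)
{
    assert(len <= IBFT_END - IBFT_START);

    for (size_t off = 0; off + 4 <= len; off += 16) {
        if (memcmp(window + off, "iBFT", 4) == 0)
            return (uint64_t)IBFT_START + off;
    }
    return 0;
}
```

With both the E820 reservation and the scan expressed in terms of IBFT_START and IBFT_END, the "magic" 0x80000 values no longer appear on their own.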


Juergen


[PATCH V3 1/3] libxl: virtio: Remove unused frontend nodes

2023-06-01 Thread Viresh Kumar

Only the VirtIO backend will watch xenstore to find out when a new
instance needs to be created for a guest, and read the parameters from
there. VirtIO frontends only speak virtio, so they will not do anything
with the xenstore nodes. They can be removed.

While at it, also add a comment to the libxl_virtio.c file.

Signed-off-by: Viresh Kumar 
---
 tools/libs/light/libxl_virtio.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libs/light/libxl_virtio.c b/tools/libs/light/libxl_virtio.c
index faada49e184e..f8a78e22d156 100644
--- a/tools/libs/light/libxl_virtio.c
+++ b/tools/libs/light/libxl_virtio.c
@@ -1,4 +1,9 @@
 /*
+ * Setup VirtIO backend. This is intended to interact with a VirtIO
+ * backend that is watching xenstore, and create new VirtIO devices
+ * with the parameters found in xenstore (VirtIO frontends don't
+ * interact with xenstore).
+ *
  * Copyright (C) 2022 Linaro Ltd.
  *
  * This program is free software; you can redistribute it and/or modify
@@ -49,11 +54,6 @@ static int libxl__set_xenstore_virtio(libxl__gc *gc, uint32_t domid,
     flexarray_append_pair(back, "type", GCSPRINTF("%s", virtio->type));
     flexarray_append_pair(back, "transport", GCSPRINTF("%s", transport));
 
-    flexarray_append_pair(front, "irq", GCSPRINTF("%u", virtio->irq));
-    flexarray_append_pair(front, "base", GCSPRINTF("%#"PRIx64, virtio->base));
-    flexarray_append_pair(front, "type", GCSPRINTF("%s", virtio->type));
-    flexarray_append_pair(front, "transport", GCSPRINTF("%s", transport));
-
     return 0;
 }
 
-- 
2.31.1.272.g89b43f80a514

[PATCH V3 3/3] libxl: arm: Add grant_usage parameter for virtio devices

2023-06-01 Thread Viresh Kumar

Currently, the grant-mapping-related device tree properties are added only
if the backend domain is not Dom0. Although Dom0 is privileged and can
foreign-map the entire guest memory, it is still desirable for Dom0 to
access the guest's memory via grant mappings, and hence map only what is
required.

This commit adds the "grant_usage" parameter for virtio devices, which
provides better control over the functionality.
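A minimal sketch of the resulting decision, assuming (as the xl.cfg text in this patch says) that an unset value means "enable grants if backend-domid != 0". The `tristate` enum and `resolve_grant_usage()` are hypothetical stand-ins for `libxl_defbool` and the real defaulting code, and Dom0 is assumed to have domid 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for libxl_defbool: unset, false, or true. */
enum tristate { TRISTATE_DEFAULT, TRISTATE_FALSE, TRISTATE_TRUE };

/* Assumption: the toolstack domain (Dom0) has domid 0. */
#define TOOLSTACK_DOMID 0u

/*
 * Resolve the user's grant_usage setting into a plain bool.
 * An explicit setting always wins; otherwise grants are used only
 * when the backend runs outside Dom0 (the pre-patch behaviour).
 */
static bool resolve_grant_usage(enum tristate setting, uint32_t backend_domid)
{
    switch (setting) {
    case TRISTATE_TRUE:
        return true;
    case TRISTATE_FALSE:
        return false;
    case TRISTATE_DEFAULT:
    default:
        return backend_domid != TOOLSTACK_DOMID;
    }
}
```

The explicitly-enabled-with-a-Dom0-backend case is the one the new parameter makes possible; without it, a Dom0 backend never got the grant-related properties.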

Signed-off-by: Viresh Kumar 
---
 docs/man/xl.cfg.5.pod.in |  8 
 tools/golang/xenlight/helpers.gen.go |  6 ++
 tools/golang/xenlight/types.gen.go   |  1 +
 tools/libs/light/libxl_arm.c | 22 +-
 tools/libs/light/libxl_types.idl |  1 +
 tools/libs/light/libxl_virtio.c  | 23 +--
 tools/xl/xl_parse.c  |  2 ++
 7 files changed, 52 insertions(+), 11 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 24ac92718288..3a40ac8cb322 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -1619,6 +1619,14 @@ hexadecimal format, without the "0x" prefix and all in lower case, like
 Specifies the transport mechanism for the Virtio device, only "mmio" is
 supported for now.
 
+=item B
+
+If this option is B, the Xen grants are always enabled.
+If this option is B, the Xen grants are always disabled.
+
+If this option is missing, then the default grant setting will be used,
+i.e. enable grants if backend-domid != 0.
+
 =back
 
 =item B
diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go
index 0a203d22321f..bf846dca8ec0 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1792,6 +1792,9 @@ func (x *DeviceVirtio) fromC(xc *C.libxl_device_virtio) error {
 x.BackendDomname = C.GoString(xc.backend_domname)
 x.Type = C.GoString(xc._type)
 x.Transport = VirtioTransport(xc.transport)
+if err := x.GrantUsage.fromC(&xc.grant_usage);err != nil {
+return fmt.Errorf("converting field GrantUsage: %v", err)
+}
 x.Devid = Devid(xc.devid)
 x.Irq = uint32(xc.irq)
 x.Base = uint64(xc.base)
@@ -1809,6 +1812,9 @@ xc.backend_domname = C.CString(x.BackendDomname)}
 if x.Type != "" {
 xc._type = C.CString(x.Type)}
 xc.transport = C.libxl_virtio_transport(x.Transport)
+if err := x.GrantUsage.toC(&xc.grant_usage); err != nil {
+return fmt.Errorf("converting field GrantUsage: %v", err)
+}
 xc.devid = C.libxl_devid(x.Devid)
 xc.irq = C.uint32_t(x.Irq)
 xc.base = C.uint64_t(x.Base)
diff --git a/tools/golang/xenlight/types.gen.go b/tools/golang/xenlight/types.gen.go
index a7c17699f80e..e0c6e91bb0ef 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -683,6 +683,7 @@ BackendDomid Domid
 BackendDomname string
 Type string
 Transport VirtioTransport
+GrantUsage Defbool
 Devid Devid
 Irq uint32
 Base uint64
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 97c80d7ed0fa..bc2bd9649b95 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -922,7 +922,8 @@ static int make_xen_iommu_node(libxl__gc *gc, void *fdt)
 
 /* The caller is responsible to complete / close the fdt node */
 static int make_virtio_mmio_node_common(libxl__gc *gc, void *fdt, uint64_t base,
-                                        uint32_t irq, uint32_t backend_domid)
+                                        uint32_t irq, uint32_t backend_domid,
+                                        bool grant_usage)
 {
     int res;
     gic_interrupt intr;
@@ -945,7 +946,7 @@ static int make_virtio_mmio_node_common(libxl__gc *gc, void *fdt, uint64_t base,
     res = fdt_property(fdt, "dma-coherent", NULL, 0);
     if (res) return res;
 
-    if (backend_domid != LIBXL_TOOLSTACK_DOMID) {
+    if (grant_usage) {
         uint32_t iommus_prop[2];
 
         iommus_prop[0] = cpu_to_fdt32(GUEST_PHANDLE_IOMMU);
@@ -959,11 +960,12 @@ static int make_virtio_mmio_node_common(libxl__gc *gc, void *fdt, uint64_t base,
 }
 
 static int make_virtio_mmio_node(libxl__gc *gc, void *fdt, uint64_t base,
-                                 uint32_t irq, uint32_t backend_domid)
+                                 uint32_t irq, uint32_t backend_domid,
+                                 bool grant_usage)
 {
     int res;
 
-    res = make_virtio_mmio_node_common(gc, fdt, base, irq, backend_domid);
+    res = make_virtio_mmio_node_common(gc, fdt, base, irq, backend_domid, grant_usage);
     if (res) return res;
 
     return fdt_end_node(fdt);
@@ -1019,11 +1021,11 @@ static int make_virtio_mmio_node_gpio(libxl__gc *gc, void *fdt)
 
 static int make_virtio_mmio_node_device(libxl__gc *gc, void *fdt, uint64_t base,
                                         uint32_t irq, const char *type,
-                                        uint32_t backend_domid)
+                                        uint32_t backend_domid, bool grant_usage)
 {
     int res;
 
-res = make_virtio_mmio_node_common(gc, fdt, b

[PATCH V3 0/3] libxl: Make grants configurable for virtio devices

2023-06-01 Thread Viresh Kumar

Hi,

This patchset intends to make grant mapping usage configurable for virtio
devices. Currently, grants are force-enabled for backends running in non-Dom0
domains. This patchset adds a new `grant_usage` parameter for virtio devices,
which can be used to enable or disable grant mappings irrespective of the
backend domain, while still preserving the default behavior when the parameter
is absent.

V2->V3:
- Patch 2/3 is new and fixes ordering issues with default values.
- Reuse `libxl_defbool` instead of defining a new type; it can take the values
  0 and 1.
- Improved commit logs and comments.

V1->V2:
- Instead of just 0 or 1, the argument can take multiple values now and control
  the functionality in a better way.

- Update .gen.go files as well.

- Don't add nodes under frontend path.

Viresh Kumar (3):
  libxl: virtio: Remove unused frontend nodes
  libxl: Call libxl__virtio_devtype.set_default() early enough
  libxl: arm: Add grant_usage parameter for virtio devices

 docs/man/xl.cfg.5.pod.in |  8 +++
 tools/golang/xenlight/helpers.gen.go |  6 +
 tools/golang/xenlight/types.gen.go   |  1 +
 tools/libs/light/libxl_arm.c | 22 +++
 tools/libs/light/libxl_create.c  | 11 +-
 tools/libs/light/libxl_types.idl |  1 +
 tools/libs/light/libxl_virtio.c  | 33 ++--
 tools/xl/xl_parse.c  |  2 ++
 8 files changed, 67 insertions(+), 17 deletions(-)

-- 
2.31.1.272.g89b43f80a514

[PATCH V3 2/3] libxl: Call libxl__virtio_devtype.set_default() early enough

2023-06-01 Thread Viresh Kumar

The _setdefault() function for virtio devices is currently called after
libxl__prepare_dtb(), which is too late, as libxl__prepare_dtb() expects the
defaults to already be set by then.

Call libxl__virtio_devtype.set_default() from
libxl__domain_config_setdefault(), in a similar way as other devices
like disk, etc.
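The ordering problem can be modelled in a few lines: defaults are filled in by a config-wide pass, and a DTB-building step that needs resolved values must run after it. This is a simplified sketch under that assumption; `config_setdefault()`, `virtio_setdefault()`, `prepare_dtb()` and the `tristate` enum are hypothetical and only mirror the roles of libxl__domain_config_setdefault(), libxl__virtio_devtype.set_default() and libxl__prepare_dtb().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the libxl structures. */
enum tristate { TRISTATE_DEFAULT, TRISTATE_FALSE, TRISTATE_TRUE };

struct virtio_dev {
    unsigned int backend_domid;
    enum tristate grant_usage;
};

struct domain_config {
    struct virtio_dev *virtios;
    size_t num_virtios;
};

/* Per-device defaulting: fill in grant_usage only if the user left it unset. */
static void virtio_setdefault(struct virtio_dev *dev)
{
    if (dev->grant_usage == TRISTATE_DEFAULT)
        dev->grant_usage = dev->backend_domid != 0 ? TRISTATE_TRUE
                                                   : TRISTATE_FALSE;
}

/* Config-wide defaulting, run before anything consumes the config. */
static void config_setdefault(struct domain_config *cfg)
{
    for (size_t i = 0; i < cfg->num_virtios; i++)
        virtio_setdefault(&cfg->virtios[i]);
}

/*
 * Consumer that, like the DTB builder, needs resolved values.
 * Returns the number of devices that get grant properties, or -1 if
 * it was called before the defaults were applied.
 */
static int prepare_dtb(const struct domain_config *cfg)
{
    int with_grants = 0;

    for (size_t i = 0; i < cfg->num_virtios; i++) {
        if (cfg->virtios[i].grant_usage == TRISTATE_DEFAULT)
            return -1; /* ordering bug: defaults not set yet */
        if (cfg->virtios[i].grant_usage == TRISTATE_TRUE)
            with_grants++;
    }
    return with_grants;
}
```

Calling prepare_dtb() first reports the ordering bug; running config_setdefault() first gives it fully resolved values.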

Suggested-by: Anthony PERARD 
Signed-off-by: Viresh Kumar 
---
 tools/libs/light/libxl_create.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index ec8eab02c207..36770af6d4ff 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -1068,7 +1068,7 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
                                     uint32_t domid /* for logging, only */)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
-    int ret;
+    int ret, i;
     bool pod_enabled = false;
     libxl_domain_create_info *c_info = &d_config->c_info;
 
@@ -1266,6 +1266,15 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
         goto error_out;
     }
 
+    for (i = 0; i < d_config->num_virtios; i++) {
+        ret = libxl__virtio_devtype.set_default(gc, domid,
+                                                &d_config->virtios[i], false);
+        if (ret) {
+            LOGD(ERROR, domid, "Unable to set virtio defaults for device %d", i);
+            goto error_out;
+        }
+    }
+
     ret = 0;
  error_out:
     return ret;
-- 
2.31.1.272.g89b43f80a514

[qemu-mainline test] 181089: regressions - FAIL

2023-06-01 Thread osstest service owner

flight 181089 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181089/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt-pair 30 leak-check/check/src_host fail REGR. vs. 180691
 test-amd64-amd64-libvirt-pair 31 leak-check/check/dst_host fail REGR. vs. 180691
 test-amd64-i386-libvirt  23 leak-check/check fail REGR. vs. 180691
 test-amd64-amd64-libvirt-xsm 23 leak-check/check fail REGR. vs. 180691
 build-arm64-xsm   6 xen-build fail REGR. vs. 180691
 build-arm64   6 xen-build fail REGR. vs. 180691
 test-amd64-amd64-libvirt 23 leak-check/check fail REGR. vs. 180691
 test-amd64-i386-libvirt-xsm  23 leak-check/check fail REGR. vs. 180691
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 21 leak-check/check fail REGR. vs. 180691
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 21 leak-check/check fail REGR. vs. 180691
 test-amd64-i386-xl-vhd   24 leak-check/check fail REGR. vs. 180691
 test-amd64-i386-libvirt-raw  22 leak-check/check fail REGR. vs. 180691
 test-amd64-amd64-xl-qcow2 24 leak-check/check fail REGR. vs. 180691
 test-armhf-armhf-libvirt 21 leak-check/check fail REGR. vs. 180691
 test-armhf-armhf-libvirt-qcow2 20 leak-check/check   fail REGR. vs. 180691
 test-armhf-armhf-libvirt-raw 20 leak-check/check fail REGR. vs. 180691
 test-armhf-armhf-xl-vhd  20 leak-check/check fail REGR. vs. 180691
 test-amd64-amd64-libvirt-vhd 22 leak-check/check fail in 181068 REGR. vs. 180691

Tests which are failing intermittently (not blocking):
 test-amd64-i386-libvirt-pair 11 xen-install/dst_host fail in 181068 pass in 181089
 test-amd64-i386-libvirt-pair 10 xen-install/src_host   fail pass in 181068
 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeat  fail pass in 181068

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 180691
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 180691
 test-armhf-armhf-libvirt 16 saverestore-support-check fail  like 180691
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 180691
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 180691
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail  like 180691
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180691
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 180691
 test-amd64-i386-xl-pvshim 14 guest-start  fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail  never pass
 test-armhf-armhf-xl-mult

[linux-linus test] 181082: regressions - FAIL

2023-06-01 Thread osstest service owner

flight 181082 linux-linus real [real]
flight 181098 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/181082/
http://logs.test-lab.xenproject.org/osstest/logs/181098/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-credit1   8 xen-boot fail REGR. vs. 180278

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-examine  8 reboot   fail  like 180278
 test-armhf-armhf-xl-arndale   8 xen-boot fail  like 180278
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 180278
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 180278
 test-armhf-armhf-xl-credit2   8 xen-boot fail  like 180278
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180278
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 180278
 test-armhf-armhf-xl-multivcpu  8 xen-boot fail like 180278
 test-armhf-armhf-libvirt-raw  8 xen-boot fail  like 180278
 test-armhf-armhf-libvirt  8 xen-boot fail  like 180278
 test-armhf-armhf-libvirt-qcow2  8 xen-boot fail like 180278
 test-armhf-armhf-xl   8 xen-boot fail  like 180278
 test-armhf-armhf-xl-vhd   8 xen-boot fail  like 180278
 test-armhf-armhf-xl-rtds  8 xen-boot fail  like 180278
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 180278
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-check fail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail   never pass

version targeted for testing:
 linux                929ed21dfdb6ee94391db51c9eedb63314ef6847
baseline version:
 linux                6c538e1adbfc696ac4747fb10d63e704344f763d

Last test of basis   180278  2023-04-16 19:41:46 Z   46 days
Failing since        180281  2023-04-17 06:24:36 Z   45 days   86 attempts
Testing same since   181063  2023-06-01 00:42:42 Z    1 days    2 attempts


2563 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-arm64-pvops  pass
 build-armhf-pvops  pass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 test-amd64-cores

RE: [XEN][PATCH v7 14/19] common/device_tree: Add rwlock for dt_host

2023-06-01 Thread Henry Wang

Hi Vikram,

> -----Original Message-----
> Subject: [XEN][PATCH v7 14/19] common/device_tree: Add rwlock for dt_host
> 
>  Dynamic programming ops will modify the dt_host and there might be other
>  functions which are browsing the dt_host at the same time. To avoid race
>  conditions, add an rwlock for browsing the dt_host during runtime.
> 
>  Reason behind adding an rwlock instead of a spinlock:
> For now, dynamic programming is the sole modifier of dt_host in Xen during
> run time. All other access functions like iommu_release_dt_device() are
> just reading the dt_host during run time. So, there is a need to protect
> others from browsing the dt_host while dynamic programming is modifying
> it. An rwlock is better suited for this task, as a spinlock won't be able
> to differentiate between read and write access.
> 
> Signed-off-by: Vikram Garhwal 

Reviewed-by: Henry Wang 

Kind regards,
Henry

RE: [XEN][PATCH v7 08/19] xen/device-tree: Add device_tree_find_node_by_path() to find nodes in device tree

2023-06-01 Thread Henry Wang

Hi Vikram,

> -----Original Message-----
> Subject: [XEN][PATCH v7 08/19] xen/device-tree: Add
> device_tree_find_node_by_path() to find nodes in device tree
> 
> Add device_tree_find_node_by_path() to find a matching node with path for a
> dt_device_node.
> 
> Reason behind this function:
> Each time overlay nodes are added using .dtbo, a new fdt(memcpy of
> device_tree_flattened) is created and updated with overlay nodes. This
> updated fdt is further unflattened to a dt_host_new. Next, we need to find
> the overlay nodes in dt_host_new, find the overlay node's parent in dt_host
> and add the nodes as child under their parent in the dt_host. Thus we need
> this function to search for node in different unflattened device trees.
> 
> Also, make dt_find_node_by_path() static inline.
> 
> Signed-off-by: Vikram Garhwal 
> 
> ---
> Changes from v6:
> Rename "dt_node" to "from"
> ---
>  xen/common/device_tree.c  |  6 --
>  xen/include/xen/device_tree.h | 18 --
>  2 files changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
> index 16b4b4e946..c5250a1644 100644
> --- a/xen/common/device_tree.c
> +++ b/xen/common/device_tree.c
> @@ -358,11 +358,13 @@ struct dt_device_node *dt_find_node_by_type(struct dt_device_node *from,
>  return np;
>  }
> 
> -struct dt_device_node *dt_find_node_by_path(const char *path)
> +struct dt_device_node *
> +device_tree_find_node_by_path(struct dt_device_node *from,
> +  const char *path)

NIT: I found the indentation here a bit strange. I personally would write it
like:
struct dt_device_node *
device_tree_find_node_by_path(struct dt_device_node *from, const char *path)

[...]

> -struct dt_device_node *dt_find_node_by_path(const char *path);
> +struct dt_device_node *
> +device_tree_find_node_by_path(struct dt_device_node *from,
> +  const char *path);

Same here.

But anyway, the content of this patch looks good to me and I confirm you've
addressed the comment of mine and Michal's in v6, so:

Reviewed-by: Henry Wang 

Kind regards,
Henry
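The change replaces a lookup that is hard-wired to dt_host with one that takes the starting node of whichever unflattened tree should be searched, and keeps the old name as a static inline wrapper. A simplified sketch of that shape, assuming a node type that only has `full_name` and an `allnext` link for the flat walk; all names here are illustrative, not the Xen definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal node: only what the lookup needs. */
struct dt_node {
    const char *full_name;   /* e.g. "/soc/serial@0" */
    struct dt_node *allnext; /* next node in a flat walk of the tree */
};

/* The live host tree; stands in for dt_host. */
static struct dt_node *host_tree;

/* Search the tree that starts at @from, so any unflattened tree can be used. */
static struct dt_node *find_node_by_path_from(struct dt_node *from,
                                              const char *path)
{
    for (struct dt_node *np = from; np != NULL; np = np->allnext)
        if (strcmp(np->full_name, path) == 0)
            return np;
    return NULL;
}

/* Old-style helper kept as a thin wrapper that always searches the host tree. */
static inline struct dt_node *find_node_by_path(const char *path)
{
    return find_node_by_path_from(host_tree, path);
}
```

Existing callers keep using the wrapper unchanged, while the overlay code can point the lookup at dt_host_new instead.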

RE: [XEN][PATCH v7 05/19] xen/arm: Add CONFIG_OVERLAY_DTB

2023-06-01 Thread Henry Wang

Hi Vikram,

> -----Original Message-----
> Subject: [XEN][PATCH v7 05/19] xen/arm: Add CONFIG_OVERLAY_DTB
> 
> Introduce a config option where the user can enable support for
> adding/removing device tree nodes using a device tree binary overlay.
> 
> Update SUPPORT.md and CHANGELOG.md to state the Device Tree Overlays support
> for Arm.
> 
> Signed-off-by: Vikram Garhwal 

Acked-by: Henry Wang  # CHANGELOG

Kind regards,
Henry

[xen-unstable-smoke test] 181093: tolerable all pass - PUSHED

2023-06-01 Thread osstest service owner

flight 181093 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181093/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass

version targeted for testing:
 xen  71226054f28ad98ab214b56a15899e8f62a7bca8
baseline version:
 xen  59d0bf62861f5c9b317ccf89f8b5c8b4d19927ad

Last test of basis   181074  2023-06-01 10:00:27 Z    0 days
Testing same since   181093  2023-06-01 21:00:28 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   59d0bf6286..71226054f2  71226054f28ad98ab214b56a15899e8f62a7bca8 -> smoke

[XEN][PATCH v7 17/19] tools/libs/ctrl: Implement new xc interfaces for dt overlay

2023-06-01 Thread Vikram Garhwal

xc_dt_overlay() sends the device tree binary overlay, the size of the .dtbo,
and the overlay operation type (i.e. add or remove) to Xen.

Signed-off-by: Vikram Garhwal 
---
 tools/include/xenctrl.h |  5 
 tools/libs/ctrl/Makefile.common |  1 +
 tools/libs/ctrl/xc_dt_overlay.c | 51 +
 3 files changed, 57 insertions(+)
 create mode 100644 tools/libs/ctrl/xc_dt_overlay.c

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index dba33d5d0f..411f7ef04b 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2626,6 +2626,11 @@ int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout, uint32
 int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
                          xen_pfn_t start_pfn, xen_pfn_t nr_pfns);
 
+#if defined(__arm__) || defined(__aarch64__)
+int xc_dt_overlay(xc_interface *xch, void *overlay_fdt,
+                  uint32_t overlay_fdt_size, uint8_t overlay_op);
+#endif
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libs/ctrl/Makefile.common b/tools/libs/ctrl/Makefile.common
index 0a09c28fd3..247afbe5f9 100644
--- a/tools/libs/ctrl/Makefile.common
+++ b/tools/libs/ctrl/Makefile.common
@@ -24,6 +24,7 @@ OBJS-y   += xc_hcall_buf.o
 OBJS-y   += xc_foreign_memory.o
 OBJS-y   += xc_kexec.o
 OBJS-y   += xc_resource.o
+OBJS-$(CONFIG_ARM)  += xc_dt_overlay.o
 OBJS-$(CONFIG_X86) += xc_psr.o
 OBJS-$(CONFIG_X86) += xc_pagetab.o
 OBJS-$(CONFIG_Linux) += xc_linux.o
diff --git a/tools/libs/ctrl/xc_dt_overlay.c b/tools/libs/ctrl/xc_dt_overlay.c
new file mode 100644
index 0000000000..58283b9ef6
--- /dev/null
+++ b/tools/libs/ctrl/xc_dt_overlay.c
@@ -0,0 +1,51 @@
+/*
+ *
+ * Device Tree Overlay functions.
+ * Copyright (C) 2021 Xilinx Inc.
+ * Author Vikram Garhwal 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see .
+ */
+
+#include "xc_private.h"
+
+int xc_dt_overlay(xc_interface *xch, void *overlay_fdt,
+                  uint32_t overlay_fdt_size, uint8_t overlay_op)
+{
+    int err;
+    DECLARE_SYSCTL;
+
+    DECLARE_HYPERCALL_BOUNCE(overlay_fdt, overlay_fdt_size,
+                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+    if ( (err = xc_hypercall_bounce_pre(xch, overlay_fdt)) )
+        goto err;
+
+    sysctl.cmd = XEN_SYSCTL_dt_overlay;
+    sysctl.u.dt_overlay.overlay_op = overlay_op;
+    sysctl.u.dt_overlay.overlay_fdt_size = overlay_fdt_size;
+    sysctl.u.dt_overlay.pad[0]    = 0;
+    sysctl.u.dt_overlay.pad[1]    = 0;
+    sysctl.u.dt_overlay.pad[2]    = 0;
+
+    set_xen_guest_handle(sysctl.u.dt_overlay.overlay_fdt, overlay_fdt);
+
+    if ( (err = do_sysctl(xch, &sysctl)) != 0 )
+        PERROR("%s failed", __func__);
+
+err:
+    xc_hypercall_bounce_post(xch, overlay_fdt);
+
+    return err;
+}
-- 
2.17.1
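Because xc_dt_overlay() takes the blob and its size as separate arguments, a caller has to pass a size that matches the .dtbo it read. A minimal caller-side sanity check, based on the flattened devicetree header layout (big-endian magic 0xd00dfeed at offset 0, big-endian totalsize at offset 4, 40-byte header for version 17); the function names are hypothetical, and in real code libfdt's fdt_check_header() would be the natural tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC       0xd00dfeedu /* devicetree blob magic, stored big-endian */
#define FDT_HEADER_SIZE 40u         /* ten 32-bit fields in a v17 header */

/* Read a big-endian 32-bit value from an unaligned byte buffer. */
static uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Sanity-check a .dtbo buffer before handing it to something like
 * xc_dt_overlay(). Returns the blob's declared total size, or 0 if the
 * buffer is too short, has the wrong magic, or claims to be larger than
 * the data actually read.
 */
static uint32_t dtbo_checked_size(const uint8_t *buf, size_t len)
{
    uint32_t totalsize;

    if (len < FDT_HEADER_SIZE)
        return 0;
    if (be32_at(buf) != FDT_MAGIC)
        return 0;
    totalsize = be32_at(buf + 4);
    if (totalsize < FDT_HEADER_SIZE || totalsize > len)
        return 0;
    return totalsize;
}
```

The returned value, rather than the raw file size, is what would be passed as overlay_fdt_size.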

Re: [XEN][PATCH v6 16/19] xen/arm: Implement device tree node addition functionalities

2023-06-01 Thread Vikram Garhwal


Hi Michal,

On 5/10/23 3:18 AM, Michal Orzel wrote:


On 03/05/2023 01:36, Vikram Garhwal wrote:

Update sysctl XEN_SYSCTL_dt_overlay to enable support for dtbo nodes addition
using device tree overlay.

xl dt-overlay add file.dtbo:
 Each time overlay nodes are added using .dtbo, a new fdt(memcpy of
 device_tree_flattened) is created and updated with overlay nodes. This
 updated fdt is further unflattened to a dt_host_new. Next, it checks if any
 of the overlay nodes already exists in the dt_host. If the overlay nodes don't
 exist, then find the overlay nodes in dt_host_new, find the overlay node's
 parent in dt_host and add the nodes as child under their parent in the
 dt_host. The node is attached as the last node under target parent.

 Finally, add IRQs, add device to IOMMUs, set permissions and map MMIO for the
 overlay node.

When a node is added using overlay, a new entry is allocated in the
overlay_track to keep track of the memory allocated due to the addition of the
overlay node. This is helpful for freeing that memory when the device tree node
is removed.

The main purpose of this is to address the first part of dynamic programming,
i.e. making Xen aware of the new device tree node, which means updating the
dt_host with the overlay node information. Here we are adding/removing nodes
from dt_host, and checking/setting IOMMU and IRQ permissions, but never
mapping them to any domain. Right now, mapping/unmapping will happen only when
a new domU is created/destroyed using "xl create".
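The splice described above (the new node becomes the last child of its parent, and in the flat allnext walk it goes right after the last node of the parent's existing subtree) can be modelled as follows. This is a simplified sketch, not the patch code: `struct node`, `last_descendant()` and `add_as_last_child()` are hypothetical, and it assumes the node being added is a leaf; a node with its own subtree would need its last descendant, rather than itself, linked to the saved successor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal node with the links the overlay code maintains. */
struct node {
    const char *name;
    struct node *parent, *child, *sibling;
    struct node *allnext; /* depth-first "all nodes" list */
};

/* Deepest, right-most descendant of @np (the last node of its subtree). */
static struct node *last_descendant(struct node *np)
{
    while (np->child) {
        np = np->child;
        while (np->sibling)
            np = np->sibling;
    }
    return np;
}

/*
 * Attach leaf @n as the last child of @parent, keeping the allnext list
 * in depth-first order: @n goes right after the last node of @parent's
 * existing subtree (or right after @parent if it had no children).
 */
static void add_as_last_child(struct node *parent, struct node *n)
{
    struct node *prev;

    assert(n->child == NULL); /* assumption: only leaves are added here */
    prev = last_descendant(parent);

    n->parent = parent;
    n->sibling = NULL;
    if (parent->child == NULL) {
        parent->child = n;
    } else {
        struct node *np = parent->child;

        while (np->sibling)
            np = np->sibling;
        np->sibling = n;
    }

    /* Splice into the flat list, remembering the old successor. */
    n->allnext = prev->allnext;
    prev->allnext = n;
}
```

Saving prev->allnext before overwriting it plays the same role as next_node in the patch: it is the node that must follow the newly inserted one.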

Signed-off-by: Vikram Garhwal 
---
  xen/common/dt-overlay.c | 510 
  1 file changed, 510 insertions(+)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index b89cceab84..09ea46111b 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -33,6 +33,25 @@ static struct dt_device_node *
  return child_node;
  }
  
+/*
+ * Returns next node to the input node. If node has children then return
+ * last descendant's next node.
+*/
+static struct dt_device_node *
+dt_find_next_node(struct dt_device_node *dt, const struct dt_device_node *node)
+{
+struct dt_device_node *np;
+
+dt_for_each_device_node(dt, np)
+if ( np == node )
+break;
+
+if ( np->child )
+np = find_last_descendants_node(np);
+
+return np->allnext;
+}
+
  static int dt_overlay_remove_node(struct dt_device_node *device_node)
  {
  struct dt_device_node *np;
@@ -106,6 +125,76 @@ static int dt_overlay_remove_node(struct dt_device_node *device_node)
  return 0;
  }
  
+static int dt_overlay_add_node(struct dt_device_node *device_node,
+   const char *parent_node_path)
+{
+struct dt_device_node *parent_node;
+struct dt_device_node *next_node;
+
+parent_node = dt_find_node_by_path(parent_node_path);
+
+if ( parent_node == NULL )
+{
+dt_dprintk("Parent node %s not found. Overlay node will not be added\n",
+   parent_node_path);
+return -EINVAL;
+}
+
+/* If parent has no child. */
+if ( parent_node->child == NULL )
+{
+next_node = parent_node->allnext;
+device_node->parent = parent_node;
+parent_node->allnext = device_node;
+parent_node->child = device_node;
+}
+else
+{
+struct dt_device_node *np;
+/* If parent has at least one child node.

incorrect comment style, should be:
/*
  *
  */

Changed this in v7.



+ * Iterate to the last child node of parent.
+ */
+for ( np = parent_node->child; np->sibling != NULL; np = np->sibling );
+
+/* Iterate over all child nodes of np node. */
+if ( np->child )
+{
+struct dt_device_node *np_last_descendant;
+
+np_last_descendant = find_last_descendants_node(np);
+
+next_node = np_last_descendant->allnext;
+np_last_descendant->allnext = device_node;
+}
+else
+{
+next_node = np->allnext;
+np->allnext = device_node;
+}
+
+device_node->parent = parent_node;
+np->sibling = device_node;
+np->sibling->sibling = NULL;
+}
+
+/* Iterate over all child nodes of device_node to add children too. */
+if ( device_node->child )
+{
+struct dt_device_node *device_node_last_descendant;
+
+device_node_last_descendant = find_last_descendants_node(device_node);

empty line


+/* Plug next_node at the end of last children of device_node. */
+device_node_last_descendant->allnext = next_node;
+}
+else
+{
+/* Now plug next_node at the end of device_node. */
+device_node->allnext = next_node;
+}
+
+return 0;
+}
+
  /* Basic sanity check for the dtbo tool stack provided to Xen. */
  static int check_overlay_fdt(const void *overlay_fdt, uint32_t overlay_fdt_size)
  {
@@ -145,6 +234,82 @@ static unsigned int

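The splice performed by dt_overlay_add_node() above can be reduced to a small, self-contained sketch. It rests on two assumptions. First, it uses a trimmed-down node with only the parent/child/sibling/allnext links; the struct node, last_descendant() and add_last_child() names are hypothetical, not Xen APIs. Second, the overlay subtree's own descendants are assumed to be linked already, as they are after unflattening. Unlike the patch, last_descendant() here returns the node itself when it has no children, which folds the patch's if/else branches into one path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down node: only the links the splice touches. */
struct node {
    struct node *parent;
    struct node *child;    /* first child */
    struct node *sibling;  /* next sibling under the same parent */
    struct node *allnext;  /* next node in pre-order (depth-first) order */
};

/* Deepest, right-most descendant of n; n itself if it has no children. */
static struct node *last_descendant(struct node *n)
{
    while (n->child != NULL) {
        n = n->child;
        while (n->sibling != NULL)
            n = n->sibling;
    }
    return n;
}

/*
 * Attach the subtree rooted at n as the last child of parent and keep the
 * allnext list in pre-order. n's own descendants must already be linked.
 */
static void add_last_child(struct node *parent, struct node *n)
{
    struct node *pred;  /* node that will precede n in allnext order */
    struct node *next;  /* node that used to follow parent's subtree */

    if (parent->child == NULL) {
        parent->child = n;
        pred = parent;
    } else {
        struct node *last = parent->child;

        while (last->sibling != NULL)
            last = last->sibling;
        last->sibling = n;
        pred = last_descendant(last);
    }

    next = pred->allnext;
    pred->allnext = n;
    n->parent = parent;
    n->sibling = NULL;
    last_descendant(n)->allnext = next;
}
```

When the parent already has children, the new subtree goes after the last descendant of the parent's last child in allnext order. That is exactly where a pre-order walk expects it.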
Re: [XEN][PATCH v6 17/19] tools/libs/ctrl: Implement new xc interfaces for dt overlay

2023-06-01 Thread Vikram Garhwal


Hi Anthony,

On 5/18/23 9:01 AM, Anthony PERARD wrote:

On Tue, May 02, 2023 at 04:36:48PM -0700, Vikram Garhwal wrote:

xc_dt_overlay() sends the device tree binary overlay, the size of the .dtbo,
and the overlay operation type (i.e. add or remove) to Xen.

Signed-off-by: Vikram Garhwal 

Reviewed-by: Anthony PERARD 
Thanks for reviewing this one. Can you please re-review the v7 version of
this patch? I added a small padding change as per Jan's comment on patch
15/19, and that changed this patch too.


Thanks,

[XEN][PATCH v7 19/19] tools/xl: Add new xl command overlay for device tree overlay support

2023-06-01 Thread Vikram Garhwal

Signed-off-by: Vikram Garhwal 
Reviewed-by: Anthony PERARD 
---
 tools/xl/xl.h   |  1 +
 tools/xl/xl_cmdtable.c  |  6 +
 tools/xl/xl_vmcontrol.c | 52 +
 3 files changed, 59 insertions(+)

diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index 72538d6a81..a923daccd3 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -138,6 +138,7 @@ int main_shutdown(int argc, char **argv);
 int main_reboot(int argc, char **argv);
 int main_list(int argc, char **argv);
 int main_vm_list(int argc, char **argv);
+int main_dt_overlay(int argc, char **argv);
 int main_create(int argc, char **argv);
 int main_config_update(int argc, char **argv);
 int main_button_press(int argc, char **argv);
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index ccf4d83584..db0acff62a 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -630,6 +630,12 @@ const struct cmd_spec cmd_table[] = {
   "Issue a qemu monitor command to the device model of a domain",
   " ",
 },
+{ "dt-overlay",
+  &main_dt_overlay, 0, 1,
+  "Add/Remove a device tree overlay",
+  "add/remove <.dtbo>",
+  "-h print this help\n"
+},
 };
 
 const int cmdtable_len = ARRAY_SIZE(cmd_table);
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 5518c78dc6..de56e00d8b 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -1265,6 +1265,58 @@ int main_create(int argc, char **argv)
 return 0;
 }
 
+int main_dt_overlay(int argc, char **argv)
+{
+const char *overlay_ops = NULL;
+const char *overlay_config_file = NULL;
+void *overlay_dtb = NULL;
+int rc;
+uint8_t op;
+int overlay_dtb_size = 0;
+const int overlay_add_op = 1;
+const int overlay_remove_op = 2;
+
+if (argc < 2) {
+help("dt-overlay");
+return EXIT_FAILURE;
+}
+
+overlay_ops = argv[1];
+overlay_config_file = argv[2];
+
+if (strcmp(overlay_ops, "add") == 0)
+op = overlay_add_op;
+else if (strcmp(overlay_ops, "remove") == 0)
+op = overlay_remove_op;
+else {
+fprintf(stderr, "Invalid dt overlay operation\n");
+return EXIT_FAILURE;
+}
+
+if (overlay_config_file) {
+rc = libxl_read_file_contents(ctx, overlay_config_file,
+  &overlay_dtb, &overlay_dtb_size);
+
+if (rc) {
+fprintf(stderr, "failed to read the overlay device tree file %s\n",
+overlay_config_file);
+free(overlay_dtb);
+return EXIT_FAILURE;
+}
+} else {
+fprintf(stderr, "overlay dtbo file not provided\n");
+return EXIT_FAILURE;
+}
+
+rc = libxl_dt_overlay(ctx, overlay_dtb, overlay_dtb_size, op);
+
+free(overlay_dtb);
+
+if (rc)
+return EXIT_FAILURE;
+
+return rc;
+}
 /*
  * Local variables:
  * mode: C
-- 
2.17.1
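main_dt_overlay() only checks argc < 2. It then appears to rely on argv[argc] being NULL to detect a missing .dtbo path. Below is a minimal sketch of the same argument handling that rejects a short argument list up front. The parse_dt_overlay_args() and OVERLAY_OP_* names are hypothetical; the values 1 and 2 mirror overlay_add_op and overlay_remove_op in the patch.

```c
#include <assert.h>
#include <string.h>

/* Values mirror overlay_add_op / overlay_remove_op in main_dt_overlay(). */
enum { OVERLAY_OP_ADD = 1, OVERLAY_OP_REMOVE = 2 };

/*
 * Parse "<cmd> add <file>" or "<cmd> remove <file>" (argv[0] is the
 * subcommand name). Returns the op code and stores the file name in *file,
 * or returns -1 on a missing argument or an unknown operation.
 */
static int parse_dt_overlay_args(int argc, char **argv, const char **file)
{
    if (argc < 3)
        return -1;  /* need both an operation and a .dtbo path */

    *file = argv[2];
    if (strcmp(argv[1], "add") == 0)
        return OVERLAY_OP_ADD;
    if (strcmp(argv[1], "remove") == 0)
        return OVERLAY_OP_REMOVE;
    return -1;
}
```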

[XEN][PATCH v7 13/19] asm/smp.h: Fix circular dependency for device_tree.h and rwlock.h

2023-06-01 Thread Vikram Garhwal

Dynamic programming ops will modify dt_host, and other functions may be
browsing dt_host at the same time. To avoid race conditions, an rwlock is
added for browsing dt_host. However, adding the rwlock in device_tree.h
causes the following circular dependency:
device_tree.h->rwlock.h->smp.h->asm/smp.h->device_tree.h

To fix this, remove the device_tree.h include from asm/smp.h and
forward-declare "struct dt_device_node".

Signed-off-by: Vikram Garhwal 
Reviewed-by: Henry Wang 
Reviewed-by: Michal Orzel 
---
 xen/arch/arm/include/asm/smp.h | 3 ++-
 xen/arch/arm/smpboot.c | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/include/asm/smp.h b/xen/arch/arm/include/asm/smp.h
index a37ca55bff..b12949ba8a 100644
--- a/xen/arch/arm/include/asm/smp.h
+++ b/xen/arch/arm/include/asm/smp.h
@@ -3,13 +3,14 @@
 
 #ifndef __ASSEMBLY__
 #include 
-#include 
 #include 
 #endif
 
 DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_mask);
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
 
+struct dt_device_node;
+
 #define cpu_is_offline(cpu) unlikely(!cpu_online(cpu))
 
 #define smp_processor_id() get_processor_id()
diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index e107b86b7b..eeb76cd551 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.17.1
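The fix relies on a standard C rule. A header that only passes struct dt_device_node around by pointer needs just a forward declaration, not the full definition. The toy below shows both halves in one file for illustration. node_depth() and the one-field struct are hypothetical; the real definition lives in device_tree.h and has many more fields. Any file that dereferences the struct still needs the full header, which is presumably why the patch adds an #include to smpboot.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * "Header" half: the type is only used through pointers, so a forward
 * declaration (an incomplete type) is enough and no #include is needed.
 */
struct dt_device_node;
int node_depth(const struct dt_device_node *np);

/* "Implementation" half: the full definition is visible only here. */
struct dt_device_node {
    const struct dt_device_node *parent;
};

/* Number of ancestors above np; dereferencing needs the full definition. */
int node_depth(const struct dt_device_node *np)
{
    int depth = 0;

    for (; np->parent != NULL; np = np->parent)
        depth++;
    return depth;
}
```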

[XEN][PATCH v7 18/19] tools/libs/light: Implement new libxl functions for device tree overlay ops

2023-06-01 Thread Vikram Garhwal

Signed-off-by: Vikram Garhwal 
Reviewed-by: Anthony PERARD 
---
 tools/include/libxl.h   | 11 +
 tools/libs/light/Makefile   |  3 ++
 tools/libs/light/libxl_dt_overlay.c | 71 +
 3 files changed, 85 insertions(+)
 create mode 100644 tools/libs/light/libxl_dt_overlay.c

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index cfa1a19131..1c5e8abaae 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -250,6 +250,12 @@
  */
 #define LIBXL_HAVE_DEVICETREE_PASSTHROUGH 1
 
+#if defined(__arm__) || defined(__aarch64__)
+/**
+ * This means Device Tree Overlay is supported.
+ */
+#define LIBXL_HAVE_DT_OVERLAY 1
+#endif
 /*
  * libxl_domain_build_info has device_model_user to specify the user to
  * run the device model with. See docs/misc/qemu-deprivilege.txt.
@@ -2453,6 +2459,11 @@ libxl_device_pci *libxl_device_pci_list(libxl_ctx *ctx, 
uint32_t domid,
 int *num);
 void libxl_device_pci_list_free(libxl_device_pci* list, int num);
 
+#if defined(__arm__) || defined(__aarch64__)
+int libxl_dt_overlay(libxl_ctx *ctx, void *overlay,
+ uint32_t overlay_size, uint8_t overlay_op);
+#endif
+
 /*
  * Turns the current process into a backend device service daemon
  * for a driver domain.
diff --git a/tools/libs/light/Makefile b/tools/libs/light/Makefile
index 5d7ff94b05..ba4c1b7933 100644
--- a/tools/libs/light/Makefile
+++ b/tools/libs/light/Makefile
@@ -112,6 +112,9 @@ OBJS-y += _libxl_types.o
 OBJS-y += libxl_flask.o
 OBJS-y += _libxl_types_internal.o
 
+# Device tree overlay is enabled only for ARM architecture.
+OBJS-$(CONFIG_ARM) += libxl_dt_overlay.o
+
 ifeq ($(CONFIG_LIBNL),y)
 CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
 endif
diff --git a/tools/libs/light/libxl_dt_overlay.c b/tools/libs/light/libxl_dt_overlay.c
new file mode 100644
index 00..a6c709a6dc
--- /dev/null
+++ b/tools/libs/light/libxl_dt_overlay.c
@@ -0,0 +1,71 @@
+/*
+ * Copyright (C) 2021 Xilinx Inc.
+ * Author Vikram Garhwal 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+#include "libxl_internal.h"
+#include 
+#include 
+
+static int check_overlay_fdt(libxl__gc *gc, void *fdt, size_t size)
+{
+int r;
+
+if (fdt_magic(fdt) != FDT_MAGIC) {
+LOG(ERROR, "Overlay FDT is not a valid Flat Device Tree");
+return ERROR_FAIL;
+}
+
+r = fdt_check_header(fdt);
+if (r) {
+LOG(ERROR, "Failed to check the overlay FDT (%d)", r);
+return ERROR_FAIL;
+}
+
+if (fdt_totalsize(fdt) > size) {
+LOG(ERROR, "Overlay FDT totalsize is too big");
+return ERROR_FAIL;
+}
+
+return 0;
+}
+
+int libxl_dt_overlay(libxl_ctx *ctx, void *overlay_dt, uint32_t overlay_dt_size,
+ uint8_t overlay_op)
+{
+int rc;
+int r;
+GC_INIT(ctx);
+
+if (check_overlay_fdt(gc, overlay_dt, overlay_dt_size)) {
+LOG(ERROR, "Overlay DTB check failed");
+rc = ERROR_FAIL;
+goto out;
+} else {
+LOG(DEBUG, "Overlay DTB check passed");
+rc = 0;
+}
+
+r = xc_dt_overlay(ctx->xch, overlay_dt, overlay_dt_size, overlay_op);
+
+if (r) {
+LOG(ERROR, "%s: Adding/Removing overlay dtb failed.", __func__);
+rc = ERROR_FAIL;
+}
+
+out:
+GC_FREE;
+return rc;
+}
+
-- 
2.17.1
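check_overlay_fdt() relies on libfdt's fdt_magic(), fdt_check_header() and fdt_totalsize(). To show what the magic and totalsize checks look at, here is a hand-rolled sketch that reads the big-endian FDT header directly instead of using libfdt. overlay_fdt_plausible() and the two macros are hypothetical, and the sketch is not a substitute for the fuller validation that fdt_check_header() performs. It also confirms that the buffer can hold a header before reading any field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC_BE    0xd00dfeedU  /* magic, stored big-endian at offset 0 */
#define FDT_HEADER_LEN  40u          /* ten 32-bit fields in a v17 header */

/* Read a big-endian 32-bit value without alignment assumptions. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 if buf looks like a complete FDT of at most size bytes, else -1. */
static int overlay_fdt_plausible(const void *buf, size_t size)
{
    const unsigned char *p = buf;
    uint32_t total;

    /* Never read a header field past the end of the caller's buffer. */
    if (size < FDT_HEADER_LEN)
        return -1;

    if (read_be32(p) != FDT_MAGIC_BE)
        return -1;

    /* totalsize (offset 4) must cover a header and fit in the buffer. */
    total = read_be32(p + 4);
    if (total < FDT_HEADER_LEN || total > size)
        return -1;

    return 0;
}
```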

[XEN][PATCH v7 12/19] xen/smmu: Add remove_device callback for smmu_iommu ops

2023-06-01 Thread Vikram Garhwal

Add a remove_device callback that removes the device entry from the SMMU
masters using the following steps:
1. Find if SMMU master exists for the device node.
2. Check if device is currently in use.
3. Remove the SMMU master.

Signed-off-by: Vikram Garhwal 
Reviewed-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
 xen/drivers/passthrough/arm/smmu.c | 59 ++
 1 file changed, 59 insertions(+)

diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index c37fa9af13..fdef6e7a7d 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -815,6 +816,19 @@ static int insert_smmu_master(struct arm_smmu_device *smmu,
return 0;
 }
 
+static int remove_smmu_master(struct arm_smmu_device *smmu,
+ struct arm_smmu_master *master)
+{
+   if (!smmu->masters.rb_node) {
+   ASSERT_UNREACHABLE();
+   return -ENOENT;
+   }
+
+   rb_erase(&master->node, &smmu->masters);
+
+   return 0;
+}
+
 static int arm_smmu_dt_add_device_legacy(struct arm_smmu_device *smmu,
 struct device *dev,
 struct iommu_fwspec *fwspec)
@@ -852,6 +866,34 @@ static int arm_smmu_dt_add_device_legacy(struct arm_smmu_device *smmu,
return insert_smmu_master(smmu, master);
 }
 
+static int arm_smmu_dt_remove_device_legacy(struct arm_smmu_device *smmu,
+struct device *dev)
+{
+   struct arm_smmu_master *master;
+   struct device_node *dev_node = dev_get_dev_node(dev);
+   int ret;
+
+   master = find_smmu_master(smmu, dev_node);
+   if (master == NULL) {
+   dev_err(dev,
+   "No registrations found for master device %s\n",
+   dev_node->name);
+   return -EINVAL;
+   }
+
+   if (iommu_dt_device_is_assigned_locked(dev_to_dt(dev)))
+   return -EBUSY;
+
+   ret = remove_smmu_master(smmu, master);
+   if (ret)
+   return ret;
+
+   dev_node->is_protected = false;
+
+   kfree(master);
+   return 0;
+}
+
 static int register_smmu_master(struct arm_smmu_device *smmu,
struct device *dev,
struct of_phandle_args *masterspec)
@@ -875,6 +917,22 @@ static int register_smmu_master(struct arm_smmu_device *smmu,
 fwspec);
 }
 
+static int arm_smmu_dt_remove_device_generic(u8 devfn, struct device *dev)
+{
+   struct arm_smmu_device *smmu;
+   struct iommu_fwspec *fwspec;
+
+   fwspec = dev_iommu_fwspec_get(dev);
+   if (fwspec == NULL)
+   return -ENXIO;
+
+   smmu = find_smmu(fwspec->iommu_dev);
+   if (smmu == NULL)
+   return -ENXIO;
+
+   return arm_smmu_dt_remove_device_legacy(smmu, dev);
+}
+
 static int arm_smmu_dt_add_device_generic(u8 devfn, struct device *dev)
 {
struct arm_smmu_device *smmu;
@@ -2859,6 +2917,7 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
 .init = arm_smmu_iommu_domain_init,
 .hwdom_init = arch_iommu_hwdom_init,
 .add_device = arm_smmu_dt_add_device_generic,
+.remove_device = arm_smmu_dt_remove_device_generic,
 .teardown = arm_smmu_iommu_domain_teardown,
 .iotlb_flush = arm_smmu_iotlb_flush,
 .assign_device = arm_smmu_assign_dev,
-- 
2.17.1
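The three removal steps in the commit message can be sketched in isolation: find the master, refuse if the device is in use, then remove it. This is a simplified model, not the driver's implementation. The real driver keeps masters in an rb-tree keyed by device-tree node and checks assignment with iommu_dt_device_is_assigned_locked(); this sketch uses a linked list and a boolean flag instead. struct master, add_master() and remove_master() are hypothetical. The error codes mirror the patch: -EINVAL when no registration exists and -EBUSY when the device is assigned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical master record; the real driver keys an rb-tree by DT node. */
struct master {
    const void *dev_node;   /* identity of the device-tree node */
    bool assigned;          /* stand-in for "assigned to a domain" */
    struct master *next;
};

static struct master *masters;  /* simplified registry: a singly linked list */

static int add_master(const void *dev_node, bool assigned)
{
    struct master *m = malloc(sizeof(*m));

    if (m == NULL)
        return -ENOMEM;
    m->dev_node = dev_node;
    m->assigned = assigned;
    m->next = masters;
    masters = m;
    return 0;
}

static int remove_master(const void *dev_node)
{
    struct master **pp, *m;

    /* 1. Find the master registered for this device node. */
    for (pp = &masters; *pp != NULL; pp = &(*pp)->next)
        if ((*pp)->dev_node == dev_node)
            break;
    if (*pp == NULL)
        return -EINVAL;

    /* 2. Refuse while the device is still in use. */
    if ((*pp)->assigned)
        return -EBUSY;

    /* 3. Unlink the master and free it. */
    m = *pp;
    *pp = m->next;
    free(m);
    return 0;
}
```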

[XEN][PATCH v7 14/19] common/device_tree: Add rwlock for dt_host

2023-06-01 Thread Vikram Garhwal

Dynamic programming ops will modify dt_host, and other functions may be
browsing dt_host at the same time. To avoid race conditions, add an rwlock
for browsing dt_host at runtime.

Reason for adding an rwlock instead of a spinlock:
For now, dynamic programming is the sole modifier of dt_host in Xen at
runtime. All other access functions, like iommu_release_dt_devices(), only
read dt_host at runtime. So others need to be prevented from browsing
dt_host while dynamic programming is modifying it. An rwlock is better
suited to this task than a spinlock, because a spinlock can't differentiate
between read and write access.

Signed-off-by: Vikram Garhwal 

---
Changes from v6:
Remove redundant "read_unlock(&dt_host->lock);" in the following case:
 XEN_DOMCTL_deassign_device
---
 xen/common/device_tree.c  |  4 
 xen/drivers/passthrough/device_tree.c | 15 +++
 xen/include/xen/device_tree.h |  6 ++
 3 files changed, 25 insertions(+)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index c5250a1644..c8fcdf8fa1 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -2146,7 +2146,11 @@ int unflatten_device_tree(const void *fdt, struct dt_device_node **mynodes)
 
 dt_dprintk(" <- unflatten_device_tree()\n");
 
+/* Init r/w lock for host device tree. */
+rwlock_init(&dt_host->lock);
+
 return 0;
+
 }
 
 static void dt_alias_add(struct dt_alias_prop *ap,
diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 301a5bcd97..f4d9deb624 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -112,6 +112,8 @@ int iommu_release_dt_devices(struct domain *d)
 if ( !is_iommu_enabled(d) )
 return 0;
 
+read_lock(&dt_host->lock);
+
 list_for_each_entry_safe(dev, _dev, &hd->dt_devices, domain_list)
 {
 rc = iommu_deassign_dt_device(d, dev);
@@ -119,10 +121,14 @@ int iommu_release_dt_devices(struct domain *d)
 {
 dprintk(XENLOG_ERR, "Failed to deassign %s in domain %u\n",
 dt_node_full_name(dev), d->domain_id);
+
+read_unlock(&dt_host->lock);
 return rc;
 }
 }
 
+read_unlock(&dt_host->lock);
+
 return 0;
 }
 
@@ -246,6 +252,8 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
 int ret;
 struct dt_device_node *dev;
 
+read_lock(&dt_host->lock);
+
 switch ( domctl->cmd )
 {
 case XEN_DOMCTL_assign_device:
@@ -295,7 +303,10 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
 spin_unlock(&dtdevs_lock);
 
 if ( d == dom_io )
+{
+read_unlock(&dt_host->lock);
 return -EINVAL;
+}
 
 ret = iommu_add_dt_device(dev);
 if ( ret < 0 )
@@ -333,7 +344,10 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
 break;
 
 if ( d == dom_io )
+{
+read_unlock(&dt_host->lock);
 return -EINVAL;
+}
 
 ret = iommu_deassign_dt_device(d, dev);
 
@@ -348,5 +362,6 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
 break;
 }
 
+read_unlock(&dt_host->lock);
 return ret;
 }
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index e239f7de26..dee40d2ea3 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DEVICE_TREE_MAX_DEPTH 16
 
@@ -106,6 +107,11 @@ struct dt_device_node {
 struct list_head domain_list;
 
 struct device dev;
+
+/*
+ * Lock that protects r/w updates to unflattened device tree i.e. dt_host.
+ */
+rwlock_t lock;
 };
 
 #define dt_to_dev(dt_node)  (&(dt_node)->dev)
-- 
2.17.1
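The reader/writer split described above maps onto any rwlock API. Here is a userspace analogue in which POSIX pthread_rwlock_t stands in for Xen's rwlock_t with read_lock()/write_lock(); the list and function names are hypothetical. Readers may browse concurrently, while a writer gets exclusive access.

```c
#define _POSIX_C_SOURCE 200809L  /* expose pthread_rwlock_* in strict ISO modes */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for dt_host: a singly linked list of node ids. */
struct tnode {
    int id;
    struct tnode *next;
};

static struct tnode *tree_head;
static pthread_rwlock_t tree_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Reader: any number of these may hold the lock at the same time. */
static int tree_contains(int id)
{
    int found = 0;

    pthread_rwlock_rdlock(&tree_lock);
    for (const struct tnode *n = tree_head; n != NULL; n = n->next) {
        if (n->id == id) {
            found = 1;
            break;
        }
    }
    pthread_rwlock_unlock(&tree_lock);  /* release on every exit path */
    return found;
}

/* Writer: excludes all readers and other writers while it modifies the list. */
static void tree_add(struct tnode *n)
{
    pthread_rwlock_wrlock(&tree_lock);
    n->next = tree_head;
    tree_head = n;
    pthread_rwlock_unlock(&tree_lock);
}
```

The patch's iommu_release_dt_devices() and iommu_do_dt_domctl() hunks follow the same rule: every early return taken while the read lock is held releases it first.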

[XEN][PATCH v7 16/19] xen/arm: Implement device tree node addition functionalities

2023-06-01 Thread Vikram Garhwal

Update sysctl XEN_SYSCTL_dt_overlay to enable support for dtbo nodes addition
using device tree overlay.

xl dt-overlay add file.dtbo:
Each time overlay nodes are added using a .dtbo, a new fdt (a memcpy of
device_tree_flattened) is created and updated with the overlay nodes. This
updated fdt is then unflattened into dt_host_new. Next, it checks whether any
of the overlay nodes already exist in dt_host. If they don't, it finds the
overlay nodes in dt_host_new, finds each overlay node's parent in dt_host, and
adds the nodes as children of that parent in dt_host. Each node is attached as
the last child of its target parent.

Finally, add IRQs, add the device to the IOMMUs, set permissions, and map MMIO
for the overlay node.

When a node is added using an overlay, a new entry is allocated in
overlay_track to keep track of the memory allocated for the overlay node.
This makes it possible to free that memory when the device tree node is
removed.

The main purpose of this is to address the first part of dynamic programming,
i.e. making Xen aware of the new device tree node, which means updating dt_host
with the overlay node information. Here we are adding/removing nodes from
dt_host and checking/setting IOMMU and IRQ permissions, but never mapping them
to any domain. Right now, mapping/unmapping happens only when a new domU is
created/destroyed using "xl create".

Signed-off-by: Vikram Garhwal 

---
Changes from v6:
Fix comment style and add comment regarding false flag in irq mapping.
Move malloc for nodes_full_path to handle_add_overlay_nodes.
Move node_num define to start of overlay_get_nodes_info().
Remove "domain *d" from handle_add_irq_iommu().
Fix error handling for handle_add_irq_iommu().
Split handle_add_overlay_nodes to two functions.
Create a separate function for freeing nodes_full_path.
Fix xfree for dt_sysctl.
---
 xen/common/dt-overlay.c | 533 
 1 file changed, 533 insertions(+)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index b2a7e441df..12b6b010ef 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -33,6 +33,25 @@ static struct dt_device_node *
 return child_node;
 }
 
+/*
+ * Returns next node to the input node. If node has children then return
+ * last descendant's next node.
+*/
+static struct dt_device_node *
+dt_find_next_node(struct dt_device_node *dt, const struct dt_device_node *node)
+{
+struct dt_device_node *np;
+
+dt_for_each_device_node(dt, np)
+if ( np == node )
+break;
+
+if ( np->child )
+np = find_last_descendants_node(np);
+
+return np->allnext;
+}
+
 static int dt_overlay_remove_node(struct dt_device_node *device_node)
 {
 struct dt_device_node *np;
@@ -106,6 +125,78 @@ static int dt_overlay_remove_node(struct dt_device_node *device_node)
 return 0;
 }
 
+static int dt_overlay_add_node(struct dt_device_node *device_node,
+   const char *parent_node_path)
+{
+struct dt_device_node *parent_node;
+struct dt_device_node *next_node;
+
+parent_node = dt_find_node_by_path(parent_node_path);
+
+if ( parent_node == NULL )
+{
+dt_dprintk("Parent node %s not found. Overlay node will not be added\n",
+   parent_node_path);
+return -EINVAL;
+}
+
+/* If parent has no child. */
+if ( parent_node->child == NULL )
+{
+next_node = parent_node->allnext;
+device_node->parent = parent_node;
+parent_node->allnext = device_node;
+parent_node->child = device_node;
+}
+else
+{
+struct dt_device_node *np;
+/*
+ * If parent has at least one child node.
+ * Iterate to the last child node of parent.
+ */
+for ( np = parent_node->child; np->sibling != NULL; np = np->sibling );
+
+/* Iterate over all child nodes of np node. */
+if ( np->child )
+{
+struct dt_device_node *np_last_descendant;
+
+np_last_descendant = find_last_descendants_node(np);
+
+next_node = np_last_descendant->allnext;
+np_last_descendant->allnext = device_node;
+}
+else
+{
+next_node = np->allnext;
+np->allnext = device_node;
+}
+
+device_node->parent = parent_node;
+np->sibling = device_node;
+np->sibling->sibling = NULL;
+}
+
+/* Iterate over all child nodes of device_node to add children too. */
+if ( device_node->child )
+{
+struct dt_device_node *device_node_last_descendant;
+
+device_node_last_descendant = find_last_descendants_node(device_node);
+
+/* Plug next_node at the end of last children of device_node. */
+device_node_last_descendant->allnext = next_node;
+}
+else
+{
+/* Now plug next_node at the end of device_nod

[XEN][PATCH v7 15/19] xen/arm: Implement device tree node removal functionalities

2023-06-01 Thread Vikram Garhwal

Introduce sysctl XEN_SYSCTL_dt_overlay to remove device-tree nodes added using
device tree overlay.

xl dt-overlay remove file.dtbo:
Removes all the nodes in a given dtbo.
First, it removes IRQ permissions and MMIO accesses. Next, it finds the nodes
in dt_host and deletes the device node entries from dt_host.

A node is removed only if it is not in use by dom0 or domio.

Also, add an overlay_track struct to keep track of nodes added through device
tree overlay. overlay_track holds dt_host_new, which is the unflattened form of
the updated fdt, and the names of the overlay nodes. When a node is removed,
the memory used by overlay_track for that overlay node is freed as well.

Nested overlay removal is supported only sequentially, i.e. if overlay_child
nests under overlay_parent, the user is expected to remove overlay_child first
and then overlay_parent.

Signed-off-by: Vikram Garhwal 

---
Changes from v6:
Add explicit padding for xen_system_dt_overlay{}
Update license.
Rearrange xfree in dt_sysctl()
Update overlay_track struct comment with relevant message.
Fix missing xen/errno.h for builds without CONFIG_OVERLAY_DTB cases.
Fix header formatting.
---
 xen/arch/arm/sysctl.c|  16 +-
 xen/common/Makefile  |   1 +
 xen/common/dt-overlay.c  | 420 +++
 xen/include/public/sysctl.h  |  24 ++
 xen/include/xen/dt-overlay.h |  59 +
 5 files changed, 519 insertions(+), 1 deletion(-)
 create mode 100644 xen/common/dt-overlay.c
 create mode 100644 xen/include/xen/dt-overlay.h

diff --git a/xen/arch/arm/sysctl.c b/xen/arch/arm/sysctl.c
index b0a78a8b10..8b813c970f 100644
--- a/xen/arch/arm/sysctl.c
+++ b/xen/arch/arm/sysctl.c
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -21,7 +22,20 @@ void arch_do_physinfo(struct xen_sysctl_physinfo *pi)
 long arch_do_sysctl(struct xen_sysctl *sysctl,
 XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
-return -ENOSYS;
+long ret;
+
+switch ( sysctl->cmd )
+{
+case XEN_SYSCTL_dt_overlay:
+ret = dt_sysctl(&sysctl->u.dt_overlay);
+break;
+
+default:
+ret = -ENOSYS;
+break;
+}
+
+return ret;
 }
 
 /*
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 46049eac35..e7e96b1087 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_DEBUG_TRACE) += debugtrace.o
 obj-$(CONFIG_HAS_DEVICE_TREE) += device_tree.o
 obj-$(CONFIG_IOREQ_SERVER) += dm.o
 obj-y += domain.o
+obj-$(CONFIG_OVERLAY_DTB) += dt-overlay.o
 obj-y += event_2l.o
 obj-y += event_channel.o
 obj-y += event_fifo.o
diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
new file mode 100644
index 00..b2a7e441df
--- /dev/null
+++ b/xen/common/dt-overlay.c
@@ -0,0 +1,420 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * xen/common/dt-overlay.c
+ *
+ * Device tree overlay support in Xen.
+ *
+ * Copyright (C) 2023, Advanced Micro Devices, Inc. All Rights Reserved.
+ * Written by Vikram Garhwal 
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static LIST_HEAD(overlay_tracker);
+static DEFINE_SPINLOCK(overlay_lock);
+
+/* Find last descendants of the device_node. */
+static struct dt_device_node *
+find_last_descendants_node(struct dt_device_node *device_node)
+{
+struct dt_device_node *child_node;
+
+for ( child_node = device_node->child; child_node->sibling != NULL;
+  child_node = child_node->sibling );
+
+/* If last child_node also have children. */
+if ( child_node->child )
+child_node = find_last_descendants_node(child_node);
+
+return child_node;
+}
+
+static int dt_overlay_remove_node(struct dt_device_node *device_node)
+{
+struct dt_device_node *np;
+struct dt_device_node *parent_node;
+struct dt_device_node *device_node_last_descendant = device_node->child;
+
+parent_node = device_node->parent;
+
+if ( parent_node == NULL )
+{
+dt_dprintk("%s's parent node not found\n", device_node->name);
+return -EFAULT;
+}
+
+np = parent_node->child;
+
+if ( np == NULL )
+{
+dt_dprintk("parent node %s's not found\n", parent_node->name);
+return -EFAULT;
+}
+
+/* If node to be removed is only child node or first child. */
+if ( !dt_node_cmp(np->full_name, device_node->full_name) )
+{
+parent_node->child = np->sibling;
+
+/*
+ * Iterate over all child nodes of device_node. Given that we are
+ * removing parent node, we need to remove all it's descendants too.
+ */
+if ( device_node_last_descendant )
+{
+device_node_last_descendant =
+    find_last_descendants_node(device_node);
+parent_node->allnext = device_node_last_descendant->allnext;
+}
+else
+parent_node->al

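Removing an overlay node (dt_overlay_remove_node() above, truncated in this archive) means unlinking a whole subtree from two structures at once: the parent's child/sibling list and the pre-order allnext list. The following is a minimal sketch under assumptions. struct node, last_descendant() and remove_subtree() are hypothetical, and the sketch finds the allnext predecessor by walking from the root, which may differ from how the patch does it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down node: only the links removal touches. */
struct node {
    struct node *parent;
    struct node *child;    /* first child */
    struct node *sibling;  /* next sibling under the same parent */
    struct node *allnext;  /* next node in pre-order (depth-first) order */
};

/* Deepest, right-most descendant of n; n itself if it has no children. */
static struct node *last_descendant(struct node *n)
{
    while (n->child != NULL) {
        n = n->child;
        while (n->sibling != NULL)
            n = n->sibling;
    }
    return n;
}

/*
 * Detach the subtree rooted at n from its parent's child/sibling list and
 * from the allnext list that starts at root. Returns 0 on success, or -1 if
 * n is not reachable from root via allnext (or has no parent).
 */
static int remove_subtree(struct node *root, struct node *n)
{
    struct node *pred;
    struct node *after = last_descendant(n)->allnext;

    /* Find n's pre-order predecessor, i.e. the node whose allnext is n. */
    for (pred = root; pred != NULL && pred->allnext != n; pred = pred->allnext)
        ;
    if (pred == NULL || n->parent == NULL)
        return -1;

    /* Skip over n and all of its descendants in the allnext list. */
    pred->allnext = after;

    /* Unlink n from its parent's child/sibling list. */
    if (n->parent->child == n) {
        n->parent->child = n->sibling;
    } else {
        struct node *s = n->parent->child;

        while (s->sibling != n)
            s = s->sibling;
        s->sibling = n->sibling;
    }

    /* Leave the detached subtree self-contained. */
    last_descendant(n)->allnext = NULL;
    n->parent = NULL;
    n->sibling = NULL;
    return 0;
}
```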
[XEN][PATCH v7 09/19] xen/iommu: Move spin_lock from iommu_dt_device_is_assigned to caller

2023-06-01 Thread Vikram Garhwal

Rename iommu_dt_device_is_assigned() to iommu_dt_device_is_assigned_locked().
Drop the static qualifier so that SMMU drivers can also use it to check
whether a device is in use before removing it.

The spin_lock was moved to the caller to prevent concurrent access to
iommu_dt_device_is_assigned_locked() during add/remove/assign/deassign.

Signed-off-by: Vikram Garhwal 

---
Changes from v6:
Created a private header and moved iommu_dt_device_is_assigned() to header.
---
 xen/drivers/passthrough/device_tree.c | 20 
 xen/include/xen/iommu-private.h   | 27 +++
 2 files changed, 43 insertions(+), 4 deletions(-)
 create mode 100644 xen/include/xen/iommu-private.h

diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 1c32d7b50c..52e370db01 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -83,16 +84,14 @@ fail:
 return rc;
 }
 
-static bool_t iommu_dt_device_is_assigned(const struct dt_device_node *dev)
+bool_t iommu_dt_device_is_assigned_locked(const struct dt_device_node *dev)
 {
 bool_t assigned = 0;
 
 if ( !dt_device_is_protected(dev) )
 return 0;
 
-spin_lock(&dtdevs_lock);
 assigned = !list_empty(&dev->domain_list);
-spin_unlock(&dtdevs_lock);
 
 return assigned;
 }
@@ -213,27 +212,40 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
 if ( (d && d->is_dying) || domctl->u.assign_device.flags )
 break;
 
+spin_lock(&dtdevs_lock);
+
 ret = dt_find_node_by_gpath(domctl->u.assign_device.u.dt.path,
 domctl->u.assign_device.u.dt.size,
 &dev);
 if ( ret )
+{
+spin_unlock(&dtdevs_lock);
 break;
+}
 
 ret = xsm_assign_dtdevice(XSM_HOOK, d, dt_node_full_name(dev));
 if ( ret )
+{
+spin_unlock(&dtdevs_lock);
 break;
+}
 
 if ( domctl->cmd == XEN_DOMCTL_test_assign_device )
 {
-if ( iommu_dt_device_is_assigned(dev) )
+
+if ( iommu_dt_device_is_assigned_locked(dev) )
 {
 printk(XENLOG_G_ERR "%s already assigned.\n",
dt_node_full_name(dev));
 ret = -EINVAL;
 }
+
+spin_unlock(&dtdevs_lock);
 break;
 }
 
+spin_unlock(&dtdevs_lock);
+
 if ( d == dom_io )
 return -EINVAL;
 
diff --git a/xen/include/xen/iommu-private.h b/xen/include/xen/iommu-private.h
new file mode 100644
index 00..5615decaff
--- /dev/null
+++ b/xen/include/xen/iommu-private.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+ /*
+ * xen/iommu-private.h
+ *
+ *
+ * Copyright (C) 2023, Advanced Micro Devices, Inc. All Rights Reserved.
+ * Written by Vikram Garhwal 
+ *
+ */
+#ifndef __XEN_IOMMU_PRIVATE_H__
+#define __XEN_IOMMU_PRIVATE_H__
+
+#ifdef CONFIG_HAS_DEVICE_TREE
+#include 
+bool_t iommu_dt_device_is_assigned_locked(const struct dt_device_node *dev);
+#endif
+
+#endif /* __XEN_IOMMU_PRIVATE_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.17.1
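The _locked suffix marks a function that expects its caller to already hold the lock. A plausible reading of this patch's motivation is making check-then-act sequences atomic: the check and the update happen in one critical section. The sketch below shows this in userspace with a pthread mutex. struct dev, devs_lock, dev_is_assigned_locked() and dev_assign() are hypothetical stand-ins for dtdevs_lock and iommu_dt_device_is_assigned_locked().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical device record: owner < 0 means "not assigned". */
struct dev {
    int owner;
};

static pthread_mutex_t devs_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold devs_lock; the _locked suffix documents that contract. */
static bool dev_is_assigned_locked(const struct dev *d)
{
    return d->owner >= 0;
}

/*
 * Check and update inside one critical section, so no other thread can
 * assign the device between the check and the write.
 */
static int dev_assign(struct dev *d, int domid)
{
    int rc = 0;

    pthread_mutex_lock(&devs_lock);
    if (dev_is_assigned_locked(d))
        rc = -1;
    else
        d->owner = domid;
    pthread_mutex_unlock(&devs_lock);
    return rc;
}
```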

[XEN][PATCH v7 01/19] common/device_tree: handle memory allocation failure in __unflatten_device_tree()

2023-06-01 Thread Vikram Garhwal

Change __unflatten_device_tree() return type to integer so it can propagate
memory allocation failure. Add panic() in dt_unflatten_host_device_tree() for
memory allocation failure during boot.

Fixes: fb97eb614acf ("xen/arm: Create a hierarchical device tree")

Signed-off-by: Vikram Garhwal 
Reviewed-by: Henry Wang 
---
 xen/common/device_tree.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 8da1052911..dfdb10e488 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -2092,8 +2092,8 @@ static unsigned long __init unflatten_dt_node(const void *fdt,
  * @fdt: The fdt to expand
  * @mynodes: The device_node tree created by the call
  */
-static void __init __unflatten_device_tree(const void *fdt,
-   struct dt_device_node **mynodes)
+static int __init __unflatten_device_tree(const void *fdt,
+  struct dt_device_node **mynodes)
 {
 unsigned long start, mem, size;
 struct dt_device_node **allnextp = mynodes;
@@ -2114,6 +2114,8 @@ static void __init __unflatten_device_tree(const void *fdt,
 
 /* Allocate memory for the expanded device tree */
 mem = (unsigned long)_xmalloc (size + 4, __alignof__(struct dt_device_node));
+if ( !mem )
+return -ENOMEM;
 
 ((__be32 *)mem)[size / 4] = cpu_to_be32(0xdeadbeef);
 
@@ -2131,6 +2133,8 @@ static void __init __unflatten_device_tree(const void *fdt,
 *allnextp = NULL;
 
 dt_dprintk(" <- unflatten_device_tree()\n");
+
+return 0;
 }
 
 static void dt_alias_add(struct dt_alias_prop *ap,
@@ -2215,7 +2219,11 @@ dt_find_interrupt_controller(const struct dt_device_match *matches)
 
 void __init dt_unflatten_host_device_tree(void)
 {
-__unflatten_device_tree(device_tree_flattened, &dt_host);
+int error = __unflatten_device_tree(device_tree_flattened, &dt_host);
+
+if ( error )
+panic("__unflatten_device_tree failed with error %d\n", error);
+
 dt_alias_scan();
 }
 
-- 
2.17.1
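A minimal userspace sketch of the error-propagation pattern this patch introduces, under stated assumptions: `struct tree_buf`, `unflatten_sketch()`, `unflatten_host_sketch()` and `failing_alloc()` are hypothetical names, and `abort()` stands in for Xen's `panic()`. The allocator is passed as a parameter only so that the `-ENOMEM` path can be exercised.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the expanded tree: just a sized heap buffer. */
struct tree_buf {
    unsigned char *mem;
    size_t size;
};

/* Allocator is a parameter only so the failure path can be tested. */
typedef void *(*alloc_fn)(size_t);

/* Hypothetical allocator that always fails, to simulate out-of-memory. */
void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}

/*
 * Same shape as the patched __unflatten_device_tree(): 0 on success,
 * a negative errno value on failure, instead of returning void.
 */
int unflatten_sketch(alloc_fn alloc, size_t size, struct tree_buf *out)
{
    unsigned char *mem = alloc(size + 4);   /* + 4 bytes for an end marker */

    if ( !mem )
        return -ENOMEM;                     /* report it instead of writing through NULL */

    out->mem = mem;
    out->size = size;
    return 0;
}

/* Boot-time caller: nothing can recover, so stop (Xen calls panic()). */
void unflatten_host_sketch(size_t size, struct tree_buf *out)
{
    int error = unflatten_sketch(malloc, size, out);

    if ( error )
    {
        fprintf(stderr, "unflatten_sketch failed with error %d\n", error);
        abort();
    }
}
```

Returning a negative errno value lets runtime callers handle the failure, while the boot-time caller can still treat it as fatal.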

[XEN][PATCH v7 07/19] libfdt: overlay: change overlay_get_target()

2023-06-01 Thread Vikram Garhwal

Rename overlay_get_target() to fdt_overlay_target_offset() and drop its static
qualifier.

This makes it possible to get the target path of overlay nodes, which is useful
in many cases. For example, the Xen hypervisor needs it when applying overlays
because it must further process the overlay nodes, e.g. map resources (IRQs and
IOMMUs) to other VMs, create SMMU page tables, etc.

Signed-off-by: Vikram Garhwal 
Message-Id: <1637204036-382159-2-git-send-email-fnu.vik...@xilinx.com>
Signed-off-by: David Gibson 
Origin: git://git.kernel.org/pub/scm/utils/dtc/dtc.git 45f3d1a095dd

Signed-off-by: Vikram Garhwal 
Reviewed-by: Michal Orzel 
Reviewed-by: Henry Wang 
---
 xen/common/libfdt/fdt_overlay.c | 29 +++--
 xen/common/libfdt/version.lds   |  1 +
 xen/include/xen/libfdt/libfdt.h | 18 ++
 3 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/xen/common/libfdt/fdt_overlay.c b/xen/common/libfdt/fdt_overlay.c
index 7b95e2b639..acf0c4c2a6 100644
--- a/xen/common/libfdt/fdt_overlay.c
+++ b/xen/common/libfdt/fdt_overlay.c
@@ -41,37 +41,22 @@ static uint32_t overlay_get_target_phandle(const void *fdto, int fragment)
return fdt32_to_cpu(*val);
 }
 
-/**
- * overlay_get_target - retrieves the offset of a fragment's target
- * @fdt: Base device tree blob
- * @fdto: Device tree overlay blob
- * @fragment: node offset of the fragment in the overlay
- * @pathp: pointer which receives the path of the target (or NULL)
- *
- * overlay_get_target() retrieves the target offset in the base
- * device tree of a fragment, no matter how the actual targeting is
- * done (through a phandle or a path)
- *
- * returns:
- *  the targeted node offset in the base device tree
- *  Negative error code on error
- */
-static int overlay_get_target(const void *fdt, const void *fdto,
- int fragment, char const **pathp)
+int fdt_overlay_target_offset(const void *fdt, const void *fdto,
+ int fragment_offset, char const **pathp)
 {
uint32_t phandle;
const char *path = NULL;
int path_len = 0, ret;
 
/* Try first to do a phandle based lookup */
-   phandle = overlay_get_target_phandle(fdto, fragment);
+   phandle = overlay_get_target_phandle(fdto, fragment_offset);
if (phandle == (uint32_t)-1)
return -FDT_ERR_BADPHANDLE;
 
/* no phandle, try path */
if (!phandle) {
/* And then a path based lookup */
-   path = fdt_getprop(fdto, fragment, "target-path", &path_len);
+   path = fdt_getprop(fdto, fragment_offset, "target-path", &path_len);
if (path)
ret = fdt_path_offset(fdt, path);
else
@@ -638,7 +623,7 @@ static int overlay_merge(void *fdt, void *fdto)
if (overlay < 0)
return overlay;
 
-   target = overlay_get_target(fdt, fdto, fragment, NULL);
+   target = fdt_overlay_target_offset(fdt, fdto, fragment, NULL);
if (target < 0)
return target;
 
@@ -781,7 +766,7 @@ static int overlay_symbol_update(void *fdt, void *fdto)
return -FDT_ERR_BADOVERLAY;
 
/* get the target of the fragment */
-   ret = overlay_get_target(fdt, fdto, fragment, &target_path);
+   ret = fdt_overlay_target_offset(fdt, fdto, fragment, &target_path);
if (ret < 0)
return ret;
target = ret;
@@ -803,7 +788,7 @@ static int overlay_symbol_update(void *fdt, void *fdto)
 
if (!target_path) {
/* again in case setprop_placeholder changed it */
-   ret = overlay_get_target(fdt, fdto, fragment, &target_path);
+   ret = fdt_overlay_target_offset(fdt, fdto, fragment, &target_path);
if (ret < 0)
return ret;
target = ret;
diff --git a/xen/common/libfdt/version.lds b/xen/common/libfdt/version.lds
index 7ab85f1d9d..cbce5d4a8b 100644
--- a/xen/common/libfdt/version.lds
+++ b/xen/common/libfdt/version.lds
@@ -77,6 +77,7 @@ LIBFDT_1.2 {
fdt_appendprop_addrrange;
fdt_setprop_inplace_namelen_partial;
fdt_create_with_flags;
+   fdt_overlay_target_offset;
local:
*;
 };
diff --git a/xen/include/xen/libfdt/libfdt.h b/xen/include/xen/libfdt/libfdt.h
index c71689e2be..fabddbee8c 100644
--- a/xen/include/xen/libfdt/libfdt.h
+++ b/xen/include/xen/libfdt/libfdt.h
@@ -2109,6 +2109,24 @@ int fdt_del_node(void *fdt, int nodeoffset);
  */
 int fdt_overlay_apply(void *fdt, void *fdto);
 
+/**
+ * fdt_overlay_target_offset - retrieves the offset of a fragment's target
+ * @fdt: Base device tree blob
+ * @fdto: Device tree overlay blob
+ * @fr

[XEN][PATCH v7 11/19] xen/iommu: Introduce iommu_remove_dt_device()

2023-06-01 Thread Vikram Garhwal

Add iommu_remove_dt_device() to remove a master device from the IOMMU. This is
needed when overlay nodes are removed at run time using dynamic programming.

Signed-off-by: Vikram Garhwal 
Reviewed-by: Michal Orzel 
Acked-by: Jan Beulich 
---
 xen/drivers/passthrough/device_tree.c | 41 +++
 xen/include/xen/iommu.h   |  2 ++
 2 files changed, 43 insertions(+)

diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 8cc413f867..301a5bcd97 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -126,6 +126,47 @@ int iommu_release_dt_devices(struct domain *d)
 return 0;
 }
 
+int iommu_remove_dt_device(struct dt_device_node *np)
+{
+const struct iommu_ops *ops = iommu_get_ops();
+struct device *dev = dt_to_dev(np);
+int rc;
+
+if ( !ops )
+return -EOPNOTSUPP;
+
+spin_lock(&dtdevs_lock);
+
+if ( iommu_dt_device_is_assigned_locked(np) )
+{
+rc = -EBUSY;
+goto fail;
+}
+
+/*
+ * The driver which supports generic IOMMU DT bindings must have this
+ * callback implemented.
+ */
+if ( !ops->remove_device )
+{
+rc = -EOPNOTSUPP;
+goto fail;
+}
+
+/*
+ * Remove master device from the IOMMU if latter is present and available.
+ * The driver is responsible for removing is_protected flag.
+ */
+rc = ops->remove_device(0, dev);
+
+if ( !rc )
+iommu_fwspec_free(dev);
+
+fail:
+spin_unlock(&dtdevs_lock);
+return rc;
+}
+
 int iommu_add_dt_device(struct dt_device_node *np)
 {
 const struct iommu_ops *ops = iommu_get_ops();
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 405db59971..0d7924821b 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -229,6 +229,8 @@ int iommu_release_dt_devices(struct domain *d);
  */
 int iommu_add_dt_device(struct dt_device_node *np);
 
+int iommu_remove_dt_device(struct dt_device_node *np);
+
 int iommu_do_dt_domctl(struct xen_domctl *, struct domain *,
XEN_GUEST_HANDLE_PARAM(xen_domctl_t));
 
-- 
2.17.1
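To make the return-code contract of iommu_remove_dt_device() concrete, here is a minimal userspace sketch under stated assumptions: `struct dev`, `struct ops`, `remove_dt_device()` and `sample_remove()` are hypothetical, clearing `fwspec` stands in for `iommu_fwspec_free()`, and the `dtdevs_lock` handling is left out to keep the focus on the checks and their order.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; names are illustrative, not Xen's. */
struct dev {
    bool assigned;      /* currently in use by a domain */
    void *fwspec;       /* IOMMU firmware spec, non-NULL once added */
};

struct ops {
    int (*remove_device)(struct dev *d);   /* optional driver callback */
};

/* Hypothetical driver callback that always succeeds. */
int sample_remove(struct dev *d)
{
    (void)d;
    return 0;
}

int remove_dt_device(const struct ops *ops, struct dev *d)
{
    int rc;

    if ( !ops )
        return -EOPNOTSUPP;     /* no IOMMU driver at all */

    if ( d->assigned )
        return -EBUSY;          /* never pull a device from under a domain */

    if ( !ops->remove_device )
        return -EOPNOTSUPP;     /* driver lacks the removal callback */

    rc = ops->remove_device(d);
    if ( !rc )
        d->fwspec = NULL;       /* drop the fwspec only once the driver agreed */

    return rc;
}
```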

[XEN][PATCH v7 03/19] xen/arm/device: Remove __init from function type

2023-06-01 Thread Vikram Garhwal

Remove __init from the following functions so that they can be used at runtime:
1. map_irq_to_domain()
2. handle_device_interrupts()
3. map_range_to_domain()
4. unflatten_dt_node()

Move the map_irq_to_domain() prototype from domain_build.h to setup.h.

To avoid breaking the build, the following changes are also made:
1. Move map_irq_to_domain(), handle_device_interrupts() and map_range_to_domain()
to device.c. Once __init is removed, these functions are no longer specific to
domain building, so they are moved out of domain_build.c into device.c.
2. Remove the static qualifier from handle_device_interrupts().

Overall, these changes support the dynamic programming of nodes, where an
overlay node is added to the fdt and the unflattened node is added to dt_host.
IRQ and MMIO mapping is then done for the added node.

Signed-off-by: Vikram Garhwal 
Reviewed-by: Michal Orzel 
---
 xen/arch/arm/device.c   | 149 
 xen/arch/arm/domain_build.c | 147 ---
 xen/arch/arm/include/asm/domain_build.h |   2 -
 xen/arch/arm/include/asm/setup.h|   6 +
 xen/common/device_tree.c|  12 +-
 5 files changed, 161 insertions(+), 155 deletions(-)

diff --git a/xen/arch/arm/device.c b/xen/arch/arm/device.c
index ca8539dee5..1652d765bd 100644
--- a/xen/arch/arm/device.c
+++ b/xen/arch/arm/device.c
@@ -9,8 +9,10 @@
  */
 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include 
 
 extern const struct device_desc _sdevice[], _edevice[];
@@ -75,6 +77,153 @@ enum device_class device_get_class(const struct dt_device_node *dev)
 return DEVICE_UNKNOWN;
 }
 
+int map_irq_to_domain(struct domain *d, unsigned int irq,
+  bool need_mapping, const char *devname)
+{
+int res;
+
+res = irq_permit_access(d, irq);
+if ( res )
+{
+printk(XENLOG_ERR "Unable to permit to %pd access to IRQ %u\n", d, irq);
+return res;
+}
+
+if ( need_mapping )
+{
+/*
+ * Checking the return of vgic_reserve_virq is not
+ * necessary. It should not fail except when we try to map
+ * the IRQ twice. This can legitimately happen if the IRQ is shared
+ */
+vgic_reserve_virq(d, irq);
+
+res = route_irq_to_guest(d, irq, irq, devname);
+if ( res < 0 )
+{
+printk(XENLOG_ERR "Unable to map IRQ%u to %pd\n", irq, d);
+return res;
+}
+}
+
+dt_dprintk("  - IRQ: %u\n", irq);
+return 0;
+}
+
+int map_range_to_domain(const struct dt_device_node *dev,
+uint64_t addr, uint64_t len, void *data)
+{
+struct map_range_data *mr_data = data;
+struct domain *d = mr_data->d;
+int res;
+
+if ( (addr != (paddr_t)addr) || (((paddr_t)~0 - addr) < len) )
+{
+printk(XENLOG_ERR "%s: [0x%"PRIx64", 0x%"PRIx64"] exceeds the maximum allowed PA width (%u bits)",
+   dt_node_full_name(dev), addr, (addr + len), PADDR_BITS);
+return -ERANGE;
+}
+
+/*
+ * reserved-memory regions are RAM carved out for a special purpose.
+ * They are not MMIO and therefore a domain should not be able to
+ * manage them via the IOMEM interface.
+ */
+if ( strncasecmp(dt_node_full_name(dev), "/reserved-memory/",
+ strlen("/reserved-memory/")) != 0 )
+{
+res = iomem_permit_access(d, paddr_to_pfn(addr),
+paddr_to_pfn(PAGE_ALIGN(addr + len - 1)));
+if ( res )
+{
+printk(XENLOG_ERR "Unable to permit to dom%d access to"
+" 0x%"PRIx64" - 0x%"PRIx64"\n",
+d->domain_id,
+addr & PAGE_MASK, PAGE_ALIGN(addr + len) - 1);
+return res;
+}
+}
+
+if ( !mr_data->skip_mapping )
+{
+res = map_regions_p2mt(d,
+   gaddr_to_gfn(addr),
+   PFN_UP(len),
+   maddr_to_mfn(addr),
+   mr_data->p2mt);
+
+if ( res < 0 )
+{
+printk(XENLOG_ERR "Unable to map 0x%"PRIx64
+   " - 0x%"PRIx64" in domain %d\n",
+   addr & PAGE_MASK, PAGE_ALIGN(addr + len) - 1,
+   d->domain_id);
+return res;
+}
+}
+
+dt_dprintk("  - MMIO: %010"PRIx64" - %010"PRIx64" P2MType=%x\n",
+   addr, addr + len, mr_data->p2mt);
+
+return 0;
+}
+
+/*
+ * handle_device_interrupts retrieves the interrupts configuration from
+ * a device tree node and maps those interrupts to the target domain.
+ *
+ * Returns:
+ *   < 0 error
+ *   0   success
+ */
+int handle_device_interrupts(struct domain *d,
+ struct dt_device_node *dev,
+ bool need_mapping)
+{
+unsigned int i, nirq;
+int res;
+struct dt_raw_irq rirq;
+
+nirq = dt_number_of_irq(d
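The PA-width check in map_range_to_domain() is easy to misread, so here is a standalone C sketch of it together with the `/reserved-memory/` prefix test. It assumes a 32-bit `paddr_t` purely for illustration (the real width depends on the Xen build), and `check_pa_range()` / `is_reserved_memory_node()` are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <strings.h>

/* Assumption for illustration only: a 32-bit physical address type. */
typedef uint32_t paddr_t;

/* The PA-width test from map_range_to_domain(), as a standalone function. */
int check_pa_range(uint64_t addr, uint64_t len)
{
    /* addr must not lose bits when narrowed to paddr_t ... */
    if ( addr != (paddr_t)addr )
        return -ERANGE;

    /* ... and addr + len must not exceed the largest paddr_t value. */
    if ( ((paddr_t)~0 - addr) < len )
        return -ERANGE;

    return 0;
}

/* reserved-memory nodes are RAM, so they get no IOMEM permission. */
int is_reserved_memory_node(const char *full_name)
{
    const char prefix[] = "/reserved-memory/";

    return strncasecmp(full_name, prefix, strlen(prefix)) == 0;
}
```

With a 32-bit `paddr_t`, `check_pa_range(0xFFFFF000, 0xFFF)` passes but `check_pa_range(0xFFFFF000, 0x1000)` returns `-ERANGE`, because `(paddr_t)~0 - addr` is `0xFFF`, one less than `len`.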

[XEN][PATCH v7 10/19] xen/iommu: protect iommu_add_dt_device() with dtdevs_lock

2023-06-01 Thread Vikram Garhwal

Protect iommu_add_dt_device() with dtdevs_lock to prevent concurrent device additions.

Signed-off-by: Vikram Garhwal 
Reviewed-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
 xen/drivers/passthrough/device_tree.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 52e370db01..8cc413f867 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -146,6 +146,8 @@ int iommu_add_dt_device(struct dt_device_node *np)
 if ( dev_iommu_fwspec_get(dev) )
 return 0;
 
+spin_lock(&dtdevs_lock);
+
 /*
  * According to the Documentation/devicetree/bindings/iommu/iommu.txt
  * from Linux.
@@ -158,7 +160,10 @@ int iommu_add_dt_device(struct dt_device_node *np)
  * these callback implemented.
  */
 if ( !ops->add_device || !ops->dt_xlate )
-return -EINVAL;
+{
+rc = -EINVAL;
+goto fail;
+}
 
 if ( !dt_device_is_available(iommu_spec.np) )
 break;
@@ -189,6 +194,8 @@ int iommu_add_dt_device(struct dt_device_node *np)
 if ( rc < 0 )
 iommu_fwspec_free(dev);
 
+fail:
+spin_unlock(&dtdevs_lock);
 return rc;
 }
 
-- 
2.17.1
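The reason for turning the early `return -EINVAL` into `goto fail` can be shown with a small userspace sketch. `struct fake_lock`, `take_lock()`, `drop_lock()` and `add_device_sketch()` are hypothetical; the fake lock only records whether it is held, whereas a real spinlock left held would make the next caller spin forever.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock that just records whether it is held. */
struct fake_lock {
    bool held;
};

static struct fake_lock devs_lock;

static void take_lock(struct fake_lock *l) { l->held = true; }
static void drop_lock(struct fake_lock *l) { l->held = false; }

/*
 * Every exit taken after take_lock() goes through the single unlock
 * point, which is why an early "return -EINVAL" becomes
 * "rc = -EINVAL; goto fail;".
 */
int add_device_sketch(bool have_callbacks, int *registered)
{
    int rc = 0;

    take_lock(&devs_lock);

    if ( !have_callbacks )
    {
        rc = -EINVAL;
        goto fail;
    }

    (*registered)++;            /* the work done under the lock */

 fail:
    drop_lock(&devs_lock);
    return rc;
}
```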

[XEN][PATCH v7 05/19] xen/arm: Add CONFIG_OVERLAY_DTB

2023-06-01 Thread Vikram Garhwal

Introduce a config option with which the user can enable support for
adding/removing device tree nodes using a device tree binary overlay.

Update SUPPORT.md and CHANGELOG.md to document Device Tree Overlay support on
Arm.

Signed-off-by: Vikram Garhwal 

---
Changes from v6:
Add CHANGELOG and SUPPORT.md entries.
---
 CHANGELOG.md | 2 ++
 SUPPORT.md   | 6 ++
 xen/arch/arm/Kconfig | 5 +
 3 files changed, 13 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5bfd3aa5c0..a137fce576 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -20,6 +20,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- Bus-lock detection, used by Xen to mitigate (by rate-limiting) the system
  wide impact of a guest misusing atomic instructions.
  - xl/libxl can customize SMBIOS strings for HVM guests.
+ - On Arm, support for dynamic addition/removal of Xen device tree nodes using
+   a device tree overlay binary(.dtbo).
 
 ## [4.17.0](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.17.0) - 2022-12-12
 
diff --git a/SUPPORT.md b/SUPPORT.md
index 6dbed9d5d0..6b27d43cc6 100644
--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -832,6 +832,12 @@ No support for QEMU backends in a 16K or 64K domain.
 
 Status: Supported
 
+### Device Tree Overlays
+
+Add/Remove device tree nodes using a device tree overlay binary(.dtbo).
+
+Status, ARM: Experimental
+
 ### ARM: Guest ACPI support
 
 Status: Supported
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 239d3aed3c..1fe3d698a5 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -53,6 +53,11 @@ config HAS_ITS
 bool "GICv3 ITS MSI controller support (UNSUPPORTED)" if UNSUPPORTED
 depends on GICV3 && !NEW_VGIC && !ARM_32
 
+config OVERLAY_DTB
+   bool "DTB overlay support (UNSUPPORTED)" if UNSUPPORTED
+   help
+ Dynamic addition/removal of Xen device tree nodes using a dtbo.
+
 config HVM
 def_bool y
 
-- 
2.17.1

[XEN][PATCH v7 08/19] xen/device-tree: Add device_tree_find_node_by_path() to find nodes in device tree

2023-06-01 Thread Vikram Garhwal

Add device_tree_find_node_by_path() to find the node matching a given path,
starting the search from a given dt_device_node.

Reason behind this function:
Each time overlay nodes are added using a .dtbo, a new fdt (a memcpy of
device_tree_flattened) is created and updated with the overlay nodes. This
updated fdt is then unflattened into dt_host_new. Next, we need to find the
overlay nodes in dt_host_new, find each overlay node's parent in dt_host and
add the nodes as children of their parent in dt_host. Thus we need this
function to search for nodes in different unflattened device trees.

Also, make dt_find_node_by_path() a static inline wrapper that searches dt_host.

Signed-off-by: Vikram Garhwal 

---
Changes from v6:
Rename "dt_node" to "from"
---
 xen/common/device_tree.c  |  6 --
 xen/include/xen/device_tree.h | 18 --
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 16b4b4e946..c5250a1644 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -358,11 +358,13 @@ struct dt_device_node *dt_find_node_by_type(struct dt_device_node *from,
 return np;
 }
 
-struct dt_device_node *dt_find_node_by_path(const char *path)
+struct dt_device_node *
+device_tree_find_node_by_path(struct dt_device_node *from,
+  const char *path)
 {
 struct dt_device_node *np;
 
-dt_for_each_device_node(dt_host, np)
+dt_for_each_device_node(from, np)
 if ( np->full_name && (dt_node_cmp(np->full_name, path) == 0) )
 break;
 
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index 2c35c0d391..e239f7de26 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -561,13 +561,27 @@ struct dt_device_node *dt_find_node_by_type(struct dt_device_node *from,
 struct dt_device_node *dt_find_node_by_alias(const char *alias);
 
 /**
- * dt_find_node_by_path - Find a node matching a full DT path
+ * device_tree_find_node_by_path - Generic function to find a node matching the
+ * full DT path for any given unflatten device tree
+ * @from: The device tree node to start searching from
  * @path: The full path to match
  *
  * Returns a node pointer.
  */
-struct dt_device_node *dt_find_node_by_path(const char *path);
+struct dt_device_node *
+device_tree_find_node_by_path(struct dt_device_node *from,
+  const char *path);
 
+/**
+ * dt_find_node_by_path - Find a node matching a full DT path in dt_host
+ * @path: The full path to match
+ *
+ * Returns a node pointer.
+ */
+static inline struct dt_device_node *dt_find_node_by_path(const char *path)
+{
+return device_tree_find_node_by_path(dt_host, path);
+}
 
 /**
  * dt_find_node_by_gpath - Same as dt_find_node_by_path but retrieve the
-- 
2.17.1
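A minimal C sketch of the generic lookup plus the dt_host wrapper, assuming a stripped-down node type: `struct node`, `find_node_by_path()`, `find_host_node_by_path()` and `host_root` are hypothetical stand-ins for `struct dt_device_node`, `device_tree_find_node_by_path()`, `dt_find_node_by_path()` and `dt_host`. It walks a flat `allnext` list and compares with `strcmp()`, whereas Xen iterates with `dt_for_each_device_node()` and compares with `dt_node_cmp()`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal node; the real struct dt_device_node has many more fields. */
struct node {
    const char *full_name;
    struct node *allnext;   /* flat list linking every node of one tree */
};

/* Search any unflattened tree, starting from node `from`. */
struct node *find_node_by_path(struct node *from, const char *path)
{
    struct node *np;

    for ( np = from; np; np = np->allnext )
        if ( np->full_name && strcmp(np->full_name, path) == 0 )
            break;

    return np;              /* NULL when no node matches */
}

static struct node *host_root;  /* plays the role of dt_host */

/* Wrapper that keeps the old single-tree interface. */
static inline struct node *find_host_node_by_path(const char *path)
{
    return find_node_by_path(host_root, path);
}
```

Because the search root is a parameter, the same function can look up a path in a freshly unflattened dt_host_new and then find the parent in dt_host, which is the use case described in the commit message.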

[XEN][PATCH v7 06/19] libfdt: Keep fdt functions after init for CONFIG_OVERLAY_DTB.

2023-06-01 Thread Vikram Garhwal

This is done so that the libfdt functions required for adding device tree
overlay nodes (dynamic programming of nodes) remain available after init.

Signed-off-by: Vikram Garhwal 
Acked-by: Julien Grall 
---
 xen/common/libfdt/Makefile | 4 
 1 file changed, 4 insertions(+)

diff --git a/xen/common/libfdt/Makefile b/xen/common/libfdt/Makefile
index 75aaefa2e3..d50487aa6e 100644
--- a/xen/common/libfdt/Makefile
+++ b/xen/common/libfdt/Makefile
@@ -1,7 +1,11 @@
 include $(src)/Makefile.libfdt
 
 SECTIONS := text data $(SPECIAL_DATA_SECTIONS)
+
+# For CONFIG_OVERLAY_DTB, libfdt functionalities will be needed during runtime.
+ifneq ($(CONFIG_OVERLAY_DTB),y)
 OBJCOPYFLAGS := $(foreach s,$(SECTIONS),--rename-section .$(s)=.init.$(s))
+endif
 
 obj-y += libfdt.o
 nocov-y += libfdt.o
-- 
2.17.1

[XEN][PATCH v7 02/19] common/device_tree.c: unflatten_device_tree() propagate errors

2023-06-01 Thread Vikram Garhwal

This will be useful in dynamic node programming, when new dt nodes are
unflattened at runtime. Errors related to invalid device tree nodes should be
propagated back to the caller.

Signed-off-by: Vikram Garhwal 
Reviewed-by: Henry Wang 
---
 xen/common/device_tree.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index dfdb10e488..117b09 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -2108,6 +2108,9 @@ static int __init __unflatten_device_tree(const void *fdt,
 /* First pass, scan for size */
 start = ((unsigned long)fdt) + fdt_off_dt_struct(fdt);
 size = unflatten_dt_node(fdt, 0, &start, NULL, NULL, 0);
+if ( !size )
+return -EINVAL;
+
 size = (size | 3) + 1;
 
 dt_dprintk("  size is %#lx allocating...\n", size);
@@ -2125,11 +2128,19 @@ static int __init __unflatten_device_tree(const void *fdt,
 start = ((unsigned long)fdt) + fdt_off_dt_struct(fdt);
 unflatten_dt_node(fdt, mem, &start, NULL, &allnextp, 0);
 if ( be32_to_cpup((__be32 *)start) != FDT_END )
-printk(XENLOG_WARNING "Weird tag at end of tree: %08x\n",
+{
+printk(XENLOG_ERR "Weird tag at end of tree: %08x\n",
   *((u32 *)start));
+return -EINVAL;
+}
+
 if ( be32_to_cpu(((__be32 *)mem)[size / 4]) != 0xdeadbeef )
-printk(XENLOG_WARNING "End of tree marker overwritten: %08x\n",
+{
+printk(XENLOG_ERR "End of tree marker overwritten: %08x\n",
   be32_to_cpu(((__be32 *)mem)[size / 4]));
+return -EINVAL;
+}
+
 *allnextp = NULL;
 
 dt_dprintk(" <- unflatten_device_tree()\n");
-- 
2.17.1
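The end-of-tree check that this patch turns into an error can be illustrated with a guard-word sketch: allocate `size + 4` bytes, write a marker right after the tree, and verify it after the fill step. `fill_and_check()` and `END_MARKER` are hypothetical names, the fill is simulated with `memset()`, and the big-endian conversion Xen performs is omitted.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define END_MARKER 0xdeadbeefu

/*
 * Hypothetical second pass: `size` is what the first pass computed and
 * `used` is how many bytes the fill step actually wrote.
 */
int fill_and_check(size_t size, size_t used)
{
    unsigned char *mem;
    uint32_t marker = END_MARKER;
    int rc = 0;

    if ( !size )
        return -EINVAL;                 /* empty first pass: nothing to unflatten */

    size = (size | 3) + 1;              /* next multiple of 4 above size */
    if ( used > size + 4 )
        return -E2BIG;                  /* keep this demo itself in bounds */

    mem = malloc(size + 4);
    if ( !mem )
        return -ENOMEM;

    memcpy(mem + size, &marker, sizeof(marker));   /* guard word after the tree */
    memset(mem, 0xab, used);                        /* simulated fill */

    memcpy(&marker, mem + size, sizeof(marker));
    if ( marker != END_MARKER )
        rc = -EINVAL;                   /* the fill overran its buffer */

    free(mem);
    return rc;
}
```

For example, a first-pass size of 10 is rounded to 12; filling 12 bytes leaves the marker intact, while filling 14 bytes overwrites part of it and is reported as `-EINVAL`.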

[XEN][PATCH v7 04/19] common/device_tree: change __unflatten_device_tree() type

2023-06-01 Thread Vikram Garhwal

The following changes are made to __unflatten_device_tree():
1. Rename __unflatten_device_tree() to unflatten_device_tree().
2. Drop the __init and static qualifiers.

Signed-off-by: Vikram Garhwal 
Reviewed-by: Henry Wang 
---
 xen/common/device_tree.c  | 9 -
 xen/include/xen/device_tree.h | 5 +
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index bbdab07596..16b4b4e946 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -2083,7 +2083,7 @@ static unsigned long unflatten_dt_node(const void *fdt,
 }
 
 /**
- * __unflatten_device_tree - create tree of device_nodes from flat blob
+ * unflatten_device_tree - create tree of device_nodes from flat blob
  *
  * unflattens a device-tree, creating the
  * tree of struct device_node. It also fills the "name" and "type"
@@ -2092,8 +2092,7 @@ static unsigned long unflatten_dt_node(const void *fdt,
  * @fdt: The fdt to expand
  * @mynodes: The device_node tree created by the call
  */
-static int __init __unflatten_device_tree(const void *fdt,
-  struct dt_device_node **mynodes)
+int unflatten_device_tree(const void *fdt, struct dt_device_node **mynodes)
 {
 unsigned long start, mem, size;
 struct dt_device_node **allnextp = mynodes;
@@ -2230,10 +2229,10 @@ dt_find_interrupt_controller(const struct dt_device_match *matches)
 
 void __init dt_unflatten_host_device_tree(void)
 {
-int error = __unflatten_device_tree(device_tree_flattened, &dt_host);
+int error = unflatten_device_tree(device_tree_flattened, &dt_host);
 
 if ( error )
-panic("__unflatten_device_tree failed with error %d\n", error);
+panic("unflatten_device_tree failed with error %d\n", error);
 
 dt_alias_scan();
 }
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index c2eada7489..2c35c0d391 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -178,6 +178,11 @@ int device_tree_for_each_node(const void *fdt, int node,
  */
 void dt_unflatten_host_device_tree(void);
 
+/**
+ * unflatten any device tree.
+ */
+int unflatten_device_tree(const void *fdt, struct dt_device_node **mynodes);
+
 /**
  * IRQ translation callback
  * TODO: For the moment we assume that we only have ONE
-- 
2.17.1

[XEN][PATCH v7 00/19] dynamic node programming using overlay dtbo

2023-06-01 Thread Vikram Garhwal

Hi,
This patch series introduces dynamic programming, i.e. adding/removing devices
at run time. Using "xl dt-overlay", a device can be added/removed with a dtbo.

For adding a node using dynamic programming:
1. The flattened device tree overlay node is added to an fdt.
2. The updated fdt is unflattened into a new dt_host_new.
3. The newly added node information is extracted from dt_host_new.
4. The added node is attached under the correct parent in the original dt_host.
5. Interrupts and iomem regions are mapped/permitted as required.

For removing a node:
1. Find the node with the given path.
2. Check whether the node is used by any domU. The node is removed only when
it's not used by any domain.
3. Remove IRQ permissions and MMIO access.
4. Find the node in dt_host and delete the device node entry from dt_host.
5. Free the overlay_tracker entry, which also frees dt_host_new (created in the
node addition step).
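The dt_host bookkeeping in these steps (attaching an added node under its parent, and refusing removal while the node is in use) can be sketched with a minimal parent/child/sibling tree. This is a hypothetical model, not the series' implementation: `struct node`, the `used` flag, `attach_child()` and `detach_node()` are illustrative names, and the IRQ/MMIO permission steps are not modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tree node; `used` means "assigned to some domain". */
struct node {
    const char *name;
    struct node *parent, *child, *sibling;
    bool used;
};

/* Link `n` as the first child of `parent`. */
void attach_child(struct node *parent, struct node *n)
{
    n->parent = parent;
    n->sibling = parent->child;
    parent->child = n;
}

/* Refuse while in use; otherwise unlink `n` from its parent's child list. */
int detach_node(struct node *n)
{
    struct node **pp;

    if ( n->used )
        return -EBUSY;
    if ( !n->parent )
        return -EINVAL;

    for ( pp = &n->parent->child; *pp; pp = &(*pp)->sibling )
        if ( *pp == n )
        {
            *pp = n->sibling;
            n->parent = n->sibling = NULL;
            return 0;
        }

    return -ENOENT;     /* parent pointer set but not in the child list */
}
```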

The main purpose of this series is to address the first part of dynamic
programming, i.e. making Xen aware of new device tree nodes, which means
updating dt_host with the overlay node information. Here we add/remove nodes
from dt_host and check/set IOMMU and IRQ permissions, but never map them to any
domain. For now, mapping/unmapping happens only when a new domU is
created/destroyed using "xl create".

To map IRQs and IOMMU at runtime, there will be another small series after this
one, which will do the actual IOMMU and IRQ mapping to a running domain and
call unmap_mmio_regions() to remove the mapping.

Change Log:
 v5 -> v6:
Add separate patch for memory allocation failure in __unflatten_device_tree().
Move __unflatten_device_tree() function type changes to single patch.
Add error propagation for failures in unflatten_dt_node.
Change CONFIG_OVERLAY_DTB status to "ARM: Tech Preview".
xen/smmu: Add remove_device callback for smmu_iommu ops:
Added check to see if device is currently used.
common/device_tree: Add rwlock for dt_host:
Addressed feedback from Henry to rearrange code.
xen/arm: Implement device tree node removal functionalities:
Changed file name to dash format.
Addressed Michal's comments.
Rectified formatting related errors pointed by Michal.

 v4 -> v5:
Split patch 01/16 to two patches. One with function type changes and another
with changes inside unflatten_device_tree().
Change dt_overlay xl command to dt-overlay.
Protect overlay functionality with CONFIG(arm).
Fix rwlock issues.
Move include "device_tree.h" to c file where arch_cpu_init() is called and
forward declare dt_device_node. This was done to avoid circular deps b/w
device_tree.h and rwlock.h
Address Michal's comment on coding style.

 v3 -> v4:
Add support for adding node's children.
Add rwlock to dt_host functions.
Corrected fdt size issue when applying overlay into it.
Add memory allocation fail handling for unflatten_device_tree().
Changed xl overlay to xl dt_overlay.
Correct commit messages.
Addressed code issue from v3 review.

 v2 -> v3:
Moved overlay functionalities to dt_overlay.c file.
Renamed XEN_SYSCTL_overlay to XEN_SYSCTL_dt_overlay.
Add dt_* prefix to overlay_add/remove_nodes.
Added dtdevs_lock to protect iommu_add_dt_device().
For iommu, moved spin_lock to caller.
Address code issue from v2 review.

 v1 -> v2:
Add support for multiple node addition/removal using dtbo.
Replaced fpga-add and fpga-remove with one hypercall overlay_op.
Moved common domain_build.c function to device.c
Add OVERLAY_DTB configuration.
Renamed overlay_get_target() to fdt_overlay_get_target().
Split remove_device patch into two patches.
Moved overlay_add/remove code to sysctl and changed it from domctl to sysctl.
Added all overlay code under CONFIG_OVERLAY_DTB
Renamed all tool domains fpga function to overlay
Addressed code issues from v1 review.

Regards,
Vikram

Vikram Garhwal (19):
  common/device_tree: handle memory allocation failure in
__unflatten_device_tree()
  common/device_tree.c: unflatten_device_tree() propagate errors
  xen/arm/device: Remove __init from function type
  common/device_tree: change __unflatten_device_tree() type
  xen/arm: Add CONFIG_OVERLAY_DTB
  libfdt: Keep fdt functions after init for CONFIG_OVERLAY_DTB.
  libfdt: overlay: change overlay_get_target()
  xen/device-tree: Add device_tree_find_node_by_path() to find nodes in
device tree
  xen/iommu: Move spin_lock from iommu_dt_device_is_assigned to caller
  xen/iommu: protect iommu_add_dt_device() with dtdevs_lock
  xen/iommu: Introduce iommu_remove_dt_device()
  xen/smmu: Add remove_device callback for smmu_iommu ops
  asm/smp.h: Fix circular dependency for device_tree.h and rwlock.h
  common/device_tree: Add rwlock for dt_host
  xen/arm: Implement device tree node removal functionalities
  xen/arm: Implement de

Re: [ANNOUNCE] KVM Microconference at LPC 2023

2023-06-01 Thread Sean Christopherson

On Thu, Jun 01, 2023, Mickaël Salaün wrote:
> Hi,
> 
> What is the status of this microconference proposal? We'd be happy to talk
> about Heki [1] and potentially other hypervisor supports.

Proposal submitted (deadline is/was today), now we wait :-)  IIUC, we should
find out rather quickly whether or not the KVM MC is a go.

[ovmf test] 181091: all pass - PUSHED

2023-06-01 Thread osstest service owner

flight 181091 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181091/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 41abf00bf98e36830974bd669ab7ec3679bd5e67
baseline version:
 ovmf ded1d5414b5a0161de8fcf234b7fb200fb59fb2c

Last test of basis   181087  2023-06-01 17:10:49 Z    0 days
Testing same since   181091  2023-06-01 19:43:55 Z    0 days    1 attempts


People who touched revisions under test:
  Anthony PERARD 
  Ard Biesheuvel 
  Corvin Köhne 
  Gerd Hoffmann 
  Pedro Falcato 
  Peter Grehan 

jobs:
 build-amd64-xsm                                     pass
 build-i386-xsm                                      pass
 build-amd64                                         pass
 build-i386                                          pass
 build-amd64-libvirt                                 pass
 build-i386-libvirt                                  pass
 build-amd64-pvops                                   pass
 build-i386-pvops                                    pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                pass
 test-amd64-i386-xl-qemuu-ovmf-amd64                 pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   ded1d5414b..41abf00bf9  41abf00bf98e36830974bd669ab7ec3679bd5e67 -> xen-tested-master

Re: [ANNOUNCE] KVM Microconference at LPC 2023

2023-06-01 Thread Mickaël Salaün

Hi,

What is the status of this microconference proposal? We'd be happy to 
talk about Heki [1] and potentially other hypervisor supports.

Regards,
 Mickaël

[1] https://lore.kernel.org/all/20230505152046.6575-1-...@digikod.net/

On 26/05/2023 18:09, Mickaël Salaün wrote:

See James Morris's proposal here:
https://lore.kernel.org/all/17f62cb1-a5de-2020-2041-359b8e96b...@linux.microsoft.com/

On 26/05/2023 04:36, James Morris wrote:
  > [Side topic]
  >
  > Would folks be interested in a Linux Plumbers Conference MC on this
  > topic generally, across different hypervisors, VMMs, and architectures?
  >
  > If so, please let me know who the key folk would be and we can try
writing
  > up an MC proposal.

The fine-grain memory management proposal from James Gowans looks
interesting, especially the "side-car" virtual machines:
https://lore.kernel.org/all/88db2d9cb42e471692ff1feb0b9ca855906a9d95.ca...@amazon.com/

On 09/05/2023 11:55, Paolo Bonzini wrote:

Hi all!

We are planning on submitting a CFP to host a KVM Microconference at
Linux Plumbers Conference 2023. To help justify the proposal, we would
like to gather a list of folks that would likely attend, and crowdsource
a list of topics to include in the proposal.

For both this year and future years, the intent is that a KVM
Microconference will complement KVM Forum, *NOT* supplant it. As you
probably noticed, KVM Forum is going through a somewhat radical change in
how it's organized; the conference is now free and (with some help from
Red Hat) organized directly by the KVM and QEMU communities. Despite the
unexpected changes and some teething pains, community response to KVM
Forum continues to be overwhelmingly positive! KVM Forum will remain
the venue of choice for KVM/userspace collaboration, for educational
content covering both KVM and userspace, and to discuss new features in
QEMU and other userspace projects.

At least on the x86 side, however, the success of KVM Forum led us
virtualization folks to operate in relative isolation. KVM depends on
and impacts multiple subsystems (MM, scheduler, perf) in profound ways,
and recently we’ve seen more and more ideas/features that require
non-trivial changes outside KVM and buy-in from stakeholders that
(typically) do not attend KVM Forum. Linux Plumbers Conference is a
natural place to establish such collaboration within the kernel.

Therefore, the aim of the KVM Microconference will be:
* to provide a setting in which to discuss KVM and kernel internals
* to increase collaboration and reduce friction with other subsystems
* to discuss system virtualization issues that require coordination with
other subsystems (such as VFIO, or guest support in arch/)

Below is a rough draft of the planned CFP submission.

Thanks!

Paolo Bonzini (KVM Maintainer)
Sean Christopherson (KVM x86 Co-Maintainer)
Marc Zyngier (KVM ARM Co-Maintainer)

===
KVM Microconference
===

KVM (Kernel-based Virtual Machine) enables the use of hardware features
to improve the efficiency, performance, and security of virtual machines
created and managed by userspace.  KVM was originally developed to host
and accelerate "full" virtual machines running a traditional kernel and
operating system, but has long since expanded to cover a wide array of use
cases, e.g. hosting real-time workloads, sandboxing untrusted workloads,
deprivileging third-party code, reducing the trusted computing base of
security-sensitive workloads, etc.  As KVM's use cases have grown, so too
have the requirements placed on KVM and the interactions between it and
other kernel subsystems.

The KVM Microconference will focus on how to evolve KVM and adjacent
subsystems in order to satisfy new and upcoming requirements: serving
guest memory that cannot be accessed by host userspace[1], providing
accurate, feature-rich PMU/perf virtualization in cloud VMs[2], etc.

Potential Topics:
 - Serving inaccessible/unmappable memory for KVM guests (protected VMs)
 - Optimizing mmu_notifiers, e.g. reducing TLB flushes and spurious zapping
 - Supporting multiple KVM modules (for non-disruptive upgrades)
 - Improving and hardening KVM+perf interactions
 - Implementing arch-agnostic abstractions in KVM (e.g. MMU)
 - Defining KVM requirements for hardware vendors
 - Utilizing "fault" injection to increase test coverage of edge cases
 - KVM vs VFIO (e.g. memory types, a rather hot topic on the ARM side)

Key Attendees:
 - Paolo Bonzini  (KVM Maintainer)
 - Sean Christopherson   (KVM x86 Co-Maintainer)
 - Your name could be here!

[1] https://lore.kernel.org/all/20221202061347.1070246-1-chao.p.p...@linux.intel.com
[2] https://lore.kernel.org/all/CALMp9eRBOmwz=mspp0m5q093k3rmueasf3vel39mgv5br9w...@mail.gmail.com

[PATCH 1/2] xen-blkback: Implement diskseq checks

2023-06-01 Thread Demi Marie Obenour

This allows specifying a disk sequence number in XenStore.  If it does
not match the disk sequence number of the underlying device, the device
will not be exported and a warning will be logged.  Userspace can use
this to eliminate race conditions due to major/minor number reuse.
Old kernels do not support the new syntax, but a later patch will allow
userspace to discover that the new syntax is supported.
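For readers following the parsing rules in this patch, here is a minimal userspace sketch of the same checks, assuming the value has already been read from Xenstore together with its length. The helper `parse_diskseq()` is hypothetical and not part of blkback; it mirrors the conditions in the backend_changed() hunk (non-empty, at most 16 hex digits, no leading '0', nothing but hex digits) with an explicit digit loop instead of `simple_strtoull()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: validate and parse a "diskseq" string using the
 * rules enforced by the blkback patch. Returns true and stores the value
 * in *out on success; returns false for any malformed input.
 */
static bool parse_diskseq(const char *s, size_t len, uint64_t *out)
{
    uint64_t value = 0;

    if (len == 0 || len > 16)   /* empty, or more hex digits than fit in 64 bits */
        return false;
    if (s[0] == '0')            /* rejects "0" and zero-padded values */
        return false;

    for (size_t i = 0; i < len; i++) {
        char c = s[i];
        unsigned int digit;

        if (c >= '0' && c <= '9')
            digit = (unsigned int)(c - '0');
        else if (c >= 'a' && c <= 'f')
            digit = (unsigned int)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            digit = (unsigned int)(c - 'A' + 10);
        else
            return false;       /* sign, whitespace, "0x" or trailing garbage */

        value = (value << 4) | digit;   /* at most 16 digits, so no overflow */
    }

    *out = value;
    return true;
}
```

Rejecting a leading '0' also rejects "0" itself, which fits the patch treating a diskseq of 0 as "not specified" (the legacy path sets `diskseq = 0`, and xen_vbd_create() only checks when `diskseq` is non-zero).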

Signed-off-by: Demi Marie Obenour 
---
 drivers/block/xen-blkback/xenbus.c | 112 +++--
 1 file changed, 89 insertions(+), 23 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 4807af1d58059394d7a992335dabaf2bc3901721..9c3eb148fbd802c74e626c3d7bcd69dcb09bd921 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -24,6 +24,7 @@ struct backend_info {
struct xenbus_watch backend_watch;
unsigned major;
unsigned minor;
+   unsigned long long  diskseq;
char *mode;
 };
 
@@ -479,7 +480,7 @@ static void xen_vbd_free(struct xen_vbd *vbd)
 
 static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
  unsigned major, unsigned minor, int readonly,
- int cdrom)
+ bool cdrom, u64 diskseq)
 {
struct xen_vbd *vbd;
struct block_device *bdev;
@@ -507,6 +508,26 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
xen_vbd_free(vbd);
return -ENOENT;
}
+
+   if (diskseq) {
+   struct gendisk *disk = bdev->bd_disk;
+
+   if (unlikely(disk == NULL)) {
+   pr_err("%s: device %08x has no gendisk\n",
+  __func__, vbd->pdevice);
+   xen_vbd_free(vbd);
+   return -EFAULT;
+   }
+
+   if (unlikely(disk->diskseq != diskseq)) {
+   pr_warn("%s: device %08x has incorrect sequence "
+   "number 0x%llx (expected 0x%llx)\n",
+   __func__, vbd->pdevice, disk->diskseq, diskseq);
+   xen_vbd_free(vbd);
+   return -ENODEV;
+   }
+   }
+
vbd->size = vbd_sz(vbd);
 
if (cdrom || disk_to_cdi(vbd->bdev->bd_disk))
@@ -707,6 +728,9 @@ static void backend_changed(struct xenbus_watch *watch,
int cdrom = 0;
unsigned long handle;
char *device_type;
+   char *diskseq_str = NULL;
+   int diskseq_len;
+   unsigned long long diskseq;
 
pr_debug("%s %p %d\n", __func__, dev, dev->otherend_id);
 
@@ -725,10 +749,46 @@ static void backend_changed(struct xenbus_watch *watch,
return;
}
 
-   if (be->major | be->minor) {
-   if (be->major != major || be->minor != minor)
-   pr_warn("changing physical device (from %x:%x to %x:%x) not supported.\n",
-   be->major, be->minor, major, minor);
+   diskseq_str = xenbus_read(XBT_NIL, dev->nodename, "diskseq", &diskseq_len);
+   if (IS_ERR(diskseq_str)) {
+   int err = PTR_ERR(diskseq_str);
+   diskseq_str = NULL;
+
+   /*
+* If this does not exist, it means legacy userspace that does not
+* support diskseq.
+*/
+   if (unlikely(!XENBUS_EXIST_ERR(err))) {
+   xenbus_dev_fatal(dev, err, "reading diskseq");
+   return;
+   }
+   diskseq = 0;
+   } else if (diskseq_len <= 0) {
+   xenbus_dev_fatal(dev, -EFAULT, "diskseq must not be empty");
+   goto fail;
+   } else if (diskseq_len > 16) {
+   xenbus_dev_fatal(dev, -ERANGE, "diskseq too long: got %d but limit is 16",
+diskseq_len);
+   goto fail;
+   } else if (diskseq_str[0] == '0') {
+   xenbus_dev_fatal(dev, -ERANGE, "diskseq must not start with '0'");
+   goto fail;
+   } else {
+   char *diskseq_end;
+   diskseq = simple_strtoull(diskseq_str, &diskseq_end, 16);
+   if (diskseq_end != diskseq_str + diskseq_len) {
+   xenbus_dev_fatal(dev, -EINVAL, "invalid diskseq");
+   goto fail;
+   }
+   kfree(diskseq_str);
+   diskseq_str = NULL;
+   }
+
+   if (be->major | be->minor | be->diskseq) {
+   if (be->major != major || be->minor != minor || be->diskseq != diskseq)
+   pr_warn("changing physical device (from %x:%x:%llx to %x:%x:%llx)"
+   " not supported.\n",
+   be->major, be->minor, be->diskseq, majo

[PATCH 2/2] xen-blkback: Inform userspace that device has been opened

2023-06-01 Thread Demi Marie Obenour

Set "opened" to "0" before the hotplug script is called.  Once the
device node has been opened, set "opened" to "1".

"opened" is used exclusively by userspace.  It serves two purposes:

1. It tells userspace that the diskseq Xenstore entry is supported.

2. It tells userspace that it can wait for "opened" to be set to 1.
   Once "opened" is 1, blkback has a reference to the device, so
   userspace doesn't need to keep one.

Together, these changes allow userspace to use block devices with
delete-on-close behavior, such as loop devices with the autoclear flag
set.
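A userspace toolstack could act on the "opened" node roughly as follows. This is a sketch under stated assumptions: `classify_opened()` and `enum opened_state` are hypothetical names, and the caller is assumed to read the node only after the backend has been probed (blkback writes "0" in xen_blkbk_probe()), passing NULL when the node does not exist.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the blkback "opened" Xenstore node. */
enum opened_state {
    OPENED_LEGACY,   /* node absent after probe: kernel lacks this feature */
    OPENED_PENDING,  /* "0": feature supported, device not opened yet */
    OPENED_DONE,     /* "1": blkback now holds its own reference */
    OPENED_INVALID,  /* any other value */
};

/* value is the node contents, or NULL if the node does not exist. */
static enum opened_state classify_opened(const char *value)
{
    if (value == NULL)
        return OPENED_LEGACY;
    if (strcmp(value, "0") == 0)
        return OPENED_PENDING;
    if (strcmp(value, "1") == 0)
        return OPENED_DONE;
    return OPENED_INVALID;
}
```

In the OPENED_PENDING case the toolstack would write the device information, watch the node until it reaches OPENED_DONE, and only then drop its own reference (for example, closing a loop device that has autoclear set). In the OPENED_LEGACY case it has to keep its reference, since the kernel will never report that the device has been opened.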

Signed-off-by: Demi Marie Obenour 
---
 drivers/block/xen-blkback/xenbus.c | 35 ++
 1 file changed, 35 insertions(+)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 9c3eb148fbd802c74e626c3d7bcd69dcb09bd921..519a78aa9073d1faa1dce5c1b36e95ae58da534b 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -3,6 +3,20 @@
 Copyright (C) 2005 Rusty Russell 
 Copyright (C) 2005 XenSource Ltd
 
+In addition to the Xenstore nodes required by the Xen block device
+specification, this implementation of blkback uses a new Xenstore
+node: "opened".  blkback sets "opened" to "0" before the hotplug script
+is called.  Once the device node has been opened, blkback sets "opened"
+to "1".
+
+"opened" is read exclusively by userspace.  It serves two purposes:
+
+1. It tells userspace that diskseq@major:minor syntax for "physical-device" is
+   supported.
+
2. It tells userspace that it can wait for "opened" to be set to 1 after writing
+   "physical-device".  Once "opened" is 1, blkback has a reference to the
+   device, so userspace doesn't need to keep one.
 
 */
 
@@ -699,6 +713,14 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
if (err)
pr_warn("%s write out 'max-ring-page-order' failed\n", __func__);
 
+   /*
+* This informs userspace that the "opened" node will be set to "1" when
+* the device has been opened successfully.
+*/
+   err = xenbus_write(XBT_NIL, dev->nodename, "opened", "0");
+   if (err)
+   goto fail;
+
err = xenbus_switch_state(dev, XenbusStateInitWait);
if (err)
goto fail;
@@ -826,6 +848,19 @@ static void backend_changed(struct xenbus_watch *watch,
goto fail;
}
 
+   /*
+* Tell userspace that the device has been opened and that blkback has a
+* reference to it.  Userspace can then close the device or mark it as
+* delete-on-close, knowing that blkback will keep the device open as
+* long as necessary.
+*/
+   err = xenbus_write(XBT_NIL, dev->nodename, "opened", "1");
+   if (err) {
+   xenbus_dev_fatal(dev, err, "%s: notifying userspace device has been opened",
+dev->nodename);
+   goto free_vbd;
+   }
+
err = xenvbd_sysfs_addif(dev);
if (err) {
xenbus_dev_fatal(dev, err, "creating sysfs entries");
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[xen-unstable test] 181079: tolerable FAIL - PUSHED

2023-06-01 Thread osstest service owner

flight 181079 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181079/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 7 xen-install  fail in 181061 pass in 181079
 test-amd64-amd64-pair 28 guest-migrate/dst_host/src_host/debian.repeat  fail in 181061 pass in 181079
 test-amd64-amd64-xl-qemuu-win7-amd64 12 windows-install  fail in 181061 pass in 181079
 test-amd64-i386-qemuu-rhel6hvm-intel 14 guest-start/redhat.repeat  fail pass in 181061
 test-amd64-amd64-xl-qcow2 21 guest-start/debian.repeat  fail pass in 181061
 test-amd64-i386-xl-vhd   21 guest-start/debian.repeat  fail pass in 181061

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop  fail like 181027
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop  fail like 181027
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop  fail like 181027
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2  fail like 181027
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop  fail like 181027
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop  fail like 181027
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check  fail like 181027
 test-armhf-armhf-libvirt 16 saverestore-support-check  fail like 181027
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check  fail like 181027
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop  fail like 181027
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop  fail like 181027
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop  fail like 181027
 test-amd64-amd64-libvirt 15 migrate-support-check  fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start  fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check  fail never pass
 test-arm64-arm64-xl 15 migrate-support-check  fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check  fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check  fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check  fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check  fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check  fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check  fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check  fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check  fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check  fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check  fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-

[PATCH 0/2] xen/blkback: support delete-on-close block devices

2023-06-01 Thread Demi Marie Obenour

These two patches allow userspace to provide an expected diskseq of a
block device and discover when blkback has opened the device.  Together,
these features allow using blkback with delete-on-close block devices,
such as loop devices with autoclear set.

Demi Marie Obenour (2):
  xen-blkback: Implement diskseq checks
  xen-blkback: Inform userspace that device has been opened

 drivers/block/xen-blkback/xenbus.c | 147 -
 1 file changed, 124 insertions(+), 23 deletions(-)

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

Re: [PATCH v3 03/34] s390: Use pt_frag_refcount for pagetables

2023-06-01 Thread Vishal Moola

On Thu, Jun 1, 2023 at 6:19 AM Gerald Schaefer
 wrote:
>
>  On Wed, 31 May 2023 14:30:01 -0700
> "Vishal Moola (Oracle)"  wrote:
>
> > s390 currently uses _refcount to identify fragmented page tables.
> > The page table struct already has a member pt_frag_refcount used by
> > powerpc, so have s390 use that instead of the _refcount field as well.
> > This improves the safety for _refcount and the page table tracking.
> >
> > This also allows us to simplify the tracking since we can once again use
> > the lower byte of pt_frag_refcount instead of the upper byte of _refcount.
>
> This would conflict with s390 impact of pte_free_defer() work from Hugh Dickins
> https://lore.kernel.org/lkml/35e983f5-7ed3-b310-d949-9ae8b130c...@google.com/
> https://lore.kernel.org/lkml/6dd63b39-e71f-2e8b-7e0-83e02f3bc...@google.com/
>
> There he uses pt_frag_refcount, or rather pt_mm in the same union, to save
> the mm_struct for deferred pte_free().
>
> I still need to look closer into both of your patch series, but so far it
> seems that you have no hard functional requirement to switch from _refcount
> to pt_frag_refcount here, for s390.
>
> If this is correct, and you do not e.g. need this to make some other use
> of _refcount, I would suggest to drop this patch.

The goal of this preparation patch is to consolidate s390's usage of
struct page fields so that struct ptdesc can be smaller. It's not particularly
mandatory; leaving _refcount in ptdesc only increases the struct by
8 bytes and can always be changed later.

However, it is a little annoying since s390 is the only architecture
that egregiously uses space throughout struct page for its page
tables, rather than just the page table struct. For example, s390
gmap uses page->index which also aliases with pt_mm and
pt_frag_refcount. I'm not sure if/how gmap page tables interact
with s390 process page tables at all, but if they do, that could
potentially cause problems with Hugh's patch as well :(

I can add _refcount to ptdesc if we would like, but I still
prefer if s390 could be simplified instead.

Re: [PATCH v6 5/6] xen/riscv: introduce an implementation of macros from

2023-06-01 Thread Oleksii

On Thu, 2023-06-01 at 09:59 +0200, Jan Beulich wrote:
> On 31.05.2023 22:06, Oleksii wrote:
> > On Tue, 2023-05-30 at 18:00 +0200, Jan Beulich wrote:
> > > > +static uint32_t read_instr(unsigned long pc)
> > > > +{
> > > > +    uint16_t instr16 = *(uint16_t *)pc;
> > > > +
> > > > +    if ( GET_INSN_LENGTH(instr16) == 2 )
> > > > +    return (uint32_t)instr16;
> > > > +    else
> > > > +    return *(uint32_t *)pc;
> > > > +}
> > > 
> > > As long as this function is only used on Xen code, it's kind of
> > > okay.
> > > There you/we control whether code can change behind our backs.
> > > But as
> > > soon as you might use this on guest code, the double read is
> > > going to
> > > be a problem
> > Will it be enough to add a comment that read_instr() should be used
> > only on Xen code? Or it is needed to introduce some lock?
> 
> A comment will do for now. A lock would be problematic: It won't help
> when the function is used on non-Xen code, and since you use this in
> exception handling you may deadlock unless you carefully use a
> recursive lock.
Then I'll add a comment.

> 
> > > (I think; I wonder how hardware is supposed to deal with
> > > the situation: Maybe they indeed fetch in 16-bit quantities?).
> > I thought that it reads amount of bytes corresponded to i-cache
> > size
> > and then the pipeline tracks whether an instruction is 16  or 32
> > bit.
> 
> And what if an insn spans a cacheline boundary?
I think it is CPU-specific, but your original assumption (about 16-bit
fetching) was probably right.

In the RISC-V ISA spec [1], I found the following in section 1.2:
 The base RISC-V ISA has fixed-length 32-bit instructions that must be
naturally aligned on 32-bit boundaries. However, the standard RISC-V 
encoding scheme is designed to support ISA extensions with variable-
length instructions, where each instruction can be any number of 16-bit
instruction parcels in length, and parcels are naturally aligned on 16-
bit boundaries. The standard compressed ISA extension described in 
Chapter 12 reduces code size by providing compressed 16-bit
instructions and relaxes the alignment constraints to allow all
instructions (16 bit and 32 bit) to be aligned on any 16-bit boundary
to improve code density.

It sounds like the hardware reads one 16-bit parcel and then, based on
its lowest bits, decides whether it needs to read more 16-bit parcels.

[1] https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf
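Following the parcel-based reading of the spec, read_instr() could fetch one 16-bit parcel, decide the length from that same value, and fetch the second parcel only when needed. This is a sketch under assumptions: `insn_length()` and `read_insn_parcels()` are hypothetical stand-ins (not Xen's GET_INSN_LENGTH), `pc` is taken as a pointer to keep the example testable, and only 16- and 32-bit encodings are handled. It avoids re-reading the first parcel and a possibly misaligned 32-bit load when a 32-bit instruction sits on a 16-bit boundary; it does not make the fetch atomic against concurrently modified code.

```c
#include <stdint.h>

/*
 * Length in bytes implied by the first parcel: bits [1:0] != 0b11 means a
 * 16-bit compressed instruction; bits [1:0] == 0b11 with bits [4:2] != 0b111
 * means a 32-bit instruction. Longer encodings return 0 (not handled here).
 */
static unsigned int insn_length(uint16_t parcel)
{
    if ((parcel & 0x3) != 0x3)
        return 2;
    if ((parcel & 0x1c) != 0x1c)
        return 4;
    return 0;
}

/*
 * Read each parcel exactly once. Parcels are stored little-endian with the
 * lowest-addressed parcel holding the lowest-numbered bits, so the second
 * parcel becomes the upper half. Returns the length, or -1 if unsupported.
 */
static int read_insn_parcels(const uint16_t *pc, uint32_t *insn)
{
    uint16_t lo = pc[0];
    unsigned int len = insn_length(lo);

    if (len == 2) {
        *insn = lo;
        return 2;
    }
    if (len == 4) {
        *insn = (uint32_t)lo | ((uint32_t)pc[1] << 16);
        return 4;
    }
    return -1;
}
```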

[ovmf test] 181087: all pass - PUSHED

2023-06-01 Thread osstest service owner

flight 181087 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181087/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf ded1d5414b5a0161de8fcf234b7fb200fb59fb2c
baseline version:
 ovmf 15f83fa36442eaa272300b31699b3b82ce7e07a9

Last test of basis   181081  2023-06-01 13:15:20 Z    0 days
Testing same since   181087  2023-06-01 17:10:49 Z    0 days    1 attempts


People who touched revisions under test:
  Neil Jones 
  Sami Mujawar 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   15f83fa364..ded1d5414b  ded1d5414b5a0161de8fcf234b7fb200fb59fb2c -> xen-tested-master

[qemu-mainline test] 181068: regressions - FAIL

2023-06-01 Thread osstest service owner

flight 181068 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181068/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-libvirt-pair 11 xen-install/dst_host fail REGR. vs. 180691
 test-amd64-amd64-libvirt-pair 30 leak-check/check/src_host  fail REGR. vs. 180691
 test-amd64-amd64-libvirt-pair 31 leak-check/check/dst_host  fail REGR. vs. 180691
 test-amd64-i386-libvirt 23 leak-check/check  fail REGR. vs. 180691
 test-amd64-amd64-libvirt-xsm 23 leak-check/check  fail REGR. vs. 180691
 build-arm64-xsm 6 xen-build  fail REGR. vs. 180691
 build-arm64 6 xen-build  fail REGR. vs. 180691
 test-amd64-amd64-libvirt 23 leak-check/check  fail REGR. vs. 180691
 test-amd64-i386-libvirt-xsm 23 leak-check/check  fail REGR. vs. 180691
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 21 leak-check/check  fail REGR. vs. 180691
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 21 leak-check/check  fail REGR. vs. 180691
 test-amd64-amd64-libvirt-vhd 22 leak-check/check  fail REGR. vs. 180691
 test-amd64-i386-xl-vhd 24 leak-check/check  fail REGR. vs. 180691
 test-amd64-i386-libvirt-raw 22 leak-check/check  fail REGR. vs. 180691
 test-amd64-amd64-xl-qcow2 24 leak-check/check  fail REGR. vs. 180691
 test-armhf-armhf-libvirt 21 leak-check/check  fail REGR. vs. 180691
 test-armhf-armhf-libvirt-qcow2 20 leak-check/check  fail REGR. vs. 180691
 test-armhf-armhf-libvirt-raw 20 leak-check/check  fail REGR. vs. 180691
 test-armhf-armhf-xl-vhd 20 leak-check/check  fail REGR. vs. 180691

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop  fail like 180691
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop  fail like 180691
 test-armhf-armhf-libvirt 16 saverestore-support-check  fail like 180691
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check  fail like 180691
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop  fail like 180691
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check  fail like 180691
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2  fail like 180691
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop  fail like 180691
 test-amd64-i386-xl-pvshim 14 guest-start  fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check  fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check  fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check  fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check  fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check  fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check  fail never pass
 test-arm

Re: [PATCH v2] iscsi_ibft: Fix finding the iBFT under Xen Dom 0

2023-06-01 Thread Andrew Cooper

On 01/06/2023 6:08 pm, Dave Hansen wrote:
> On 6/1/23 09:57, Dave Hansen wrote:
>> On 5/30/23 08:01, Ross Lagerwall wrote:
>>> Since firmware doesn't indicate the iBFT in the E820, add a reserved
>>> region so that it gets identity mapped when running as Dom 0 so that it
>>> is possible to search for it. Move the call to reserve_ibft_region()
>>> later so that it is called after the Xen identity mapping adjustments
>>> are applied.
> Oh, and one more thing:
>
> What is the end user visible effect of this problem and of your solution?
>
> Do Xen Dom 0 systems fail to find their boot iSCSI volume and refuse to
> boot?  I take it after this patch that they can boot again.
>

Yeah, this isn't as clear as it could be.  In short...

The iBFT suffers from the same problem as the legacy ACPI RSDP.  You've got
to search lowmem for a magic marker to find it.
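To make the "search lowmem for a magic marker" step concrete, here is a sketch of such a scan over a buffer standing in for the first 1 MiB of physical memory. It is modeled loosely on the scan in Linux's iscsi_ibft_find.c, but the function and macro names are hypothetical: it only looks for the "iBFT" signature at 16-byte steps between 0x80000 and 0x100000, skips the legacy VGA window, and sanity-checks the little-endian length that follows the signature. Under Xen, dom0 can only run this scan over memory it has an identity mapping for, which is why the patch marks 0.5 MiB to 1 MiB as reserved in the E820.

```c
#include <stdint.h>
#include <string.h>

#define IBFT_SCAN_START 0x80000u   /* 512 KiB */
#define IBFT_SCAN_END   0x100000u  /* 1 MiB */
#define VGA_HOLE_START  0xA0000u
#define VGA_HOLE_END    0xC0000u

/*
 * lowmem: a mapping (or copy) of physical memory starting at address 0,
 * at least IBFT_SCAN_END bytes long. Returns the physical address of the
 * table, or 0 if no plausible table was found.
 */
static uint32_t find_ibft(const uint8_t *lowmem)
{
    for (uint32_t pos = IBFT_SCAN_START; pos + 8 <= IBFT_SCAN_END; pos += 16) {
        if (pos >= VGA_HOLE_START && pos < VGA_HOLE_END)
            continue;                       /* skip the legacy VGA window */
        if (memcmp(lowmem + pos, "iBFT", 4) != 0)
            continue;

        /* ACPI-style header: 32-bit little-endian length after the signature. */
        uint32_t len = (uint32_t)lowmem[pos + 4] |
                       ((uint32_t)lowmem[pos + 5] << 8) |
                       ((uint32_t)lowmem[pos + 6] << 16) |
                       ((uint32_t)lowmem[pos + 7] << 24);
        if (len >= 8 && len <= IBFT_SCAN_END - pos)
            return pos;                     /* table fits inside the window */
    }
    return 0;
}
```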

Xen dom0 is just a VM with root-like perms.  Anything it wants an
identity map of, it has to ask for.  And because dom0 is commonly
sharing ownership of hardware, it requests identity mappings for
everything reserved in the E820 table.

The consequence of not having this patch is that if you try iSCSI boot
under Xen, dom0 can't find its filesystem, because it can't get at the
iSCSI initiator.

~Andrew

[PATCH] x86: Add Kconfig option to require NX bit support

2023-06-01 Thread Alejandro Vallejo

This allows replacing many instances of runtime checks with folded
constants. When the option is enabled, Xen asserts at boot time that the
CPU supports the NX bit in PTEs and short-circuits cpu_has_nx to 1. This
has several knock-on effects that improve codegen:
  * _PAGE_NX matches _PAGE_NX_BIT, optimising the macro to a constant.
  * Many PAGE_HYPERVISOR_X are also folded into constants
  * A few if ( cpu_has_nx ) statements are optimised out

We save 2.5KiB off the text section and remove the runtime dependency for
applying NX, which hardens our security posture. The config option defaults
to OFF for compatibility with previous behaviour.
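A minimal sketch of the constant-folding idea, using hypothetical DEMO_ names rather than Xen's real macros (the cpufeature.h hunk is truncated here, so its exact form is not reproduced). The bit positions are the architectural ones: NX/XD is bit 63 of an x86-64 PTE, and EFER.SCE, EFER.LME and EFER.NXE are bits 0, 8 and 11. The last helper mirrors the `orb $EFER_NXE >> 8, 1 + sym_esi(trampoline_efer)` trick, which sets bit 11 by OR-ing 0x08 into byte 1 of the little-endian mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for Kconfig: 1 models CONFIG_REQUIRE_NX_BIT=y (hypothetical). */
#ifndef DEMO_REQUIRE_NX_BIT
#define DEMO_REQUIRE_NX_BIT 1
#endif

#define DEMO_PAGE_NX_BIT (UINT64_C(1) << 63)   /* NX/XD bit in an x86-64 PTE */
#define DEMO_EFER_SCE    (1u << 0)
#define DEMO_EFER_LME    (1u << 8)
#define DEMO_EFER_NXE    (1u << 11)

/* Stand-in for the CPUID-derived runtime feature flag. */
static bool demo_boot_cpu_has_nx = true;

/*
 * With the option enabled this is "1 || ...", which the compiler can fold,
 * so expressions built on top of it (like DEMO_PAGE_NX) become constants.
 * With it disabled, every use remains a runtime read.
 */
#define demo_cpu_has_nx (DEMO_REQUIRE_NX_BIT || demo_boot_cpu_has_nx)
#define DEMO_PAGE_NX    (demo_cpu_has_nx ? DEMO_PAGE_NX_BIT : 0)

/* Mirrors the shape of the patched trampoline_efer initializer. */
static uint32_t demo_trampoline_efer(bool pv)
{
    return DEMO_EFER_LME | (DEMO_EFER_SCE * pv) |
           (DEMO_EFER_NXE * DEMO_REQUIRE_NX_BIT);
}

/* Mirrors the runtime path: OR (EFER_NXE >> 8) into byte 1 of the mask. */
static uint32_t demo_or_nxe_byte1(uint32_t efer)
{
    return efer | ((uint32_t)(DEMO_EFER_NXE >> 8) << 8);
}
```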

Signed-off-by: Alejandro Vallejo 
---
 xen/arch/x86/Kconfig  | 10 ++
 xen/arch/x86/boot/head.S  | 19 ---
 xen/arch/x86/boot/trampoline.S|  3 ++-
 xen/arch/x86/efi/efi-boot.h   |  9 +
 xen/arch/x86/include/asm/cpufeature.h |  3 ++-
 5 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 406445a358..0983915372 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -307,6 +307,16 @@ config MEM_SHARING
bool "Xen memory sharing support (UNSUPPORTED)" if UNSUPPORTED
depends on HVM
 
+config REQUIRE_NX_BIT
+   def_bool n
+   prompt "Require NX bit support"
+   ---help---
+ Makes Xen require NX bit support on page table entries. This
+ allows the resulting code to have folded constants where
+ otherwise branches are required, yielding a smaller binary as a
+ result. Requiring NX trades compatibility with older CPUs for
+ improvements in speed and code size.
+
 endmenu
 
 source "common/Kconfig"
diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 09bebf8635..8414266281 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -123,6 +123,7 @@ multiboot2_header:
 .Lbad_ldr_nih: .asciz "ERR: EFI ImageHandle is not provided by bootloader!"
 .Lbad_efi_msg: .asciz "ERR: EFI IA-32 platforms are not supported!"
 .Lbag_alg_msg: .asciz "ERR: Xen must be loaded at a 2Mb boundary!"
+.Lno_nx_bit_msg: .asciz "ERR: Not an NX-bit capable CPU!"
 
 .section .init.data, "aw", @progbits
 .align 4
@@ -151,6 +152,11 @@ not_multiboot:
 .Lnot_aligned:
 add $sym_offs(.Lbag_alg_msg),%esi   # Error message
 jmp .Lget_vtb
+#if IS_ENABLED(CONFIG_REQUIRE_NX_BIT)
+no_nx_bit:
+add $sym_offs(.Lno_nx_bit_msg),%esi   # Error message
+jmp .Lget_vtb
+#endif
 .Lmb2_no_st:
 /*
  * Here we are on EFI platform. vga_text_buffer was zapped earlier
@@ -647,11 +653,18 @@ trampoline_setup:
 cpuid
1:  mov %edx, CPUINFO_FEATURE_OFFSET(X86_FEATURE_LM) + sym_esi(boot_cpu_data)
 
-/* Check for NX. Adjust EFER setting if available. */
+/*
+ * Check for NX:
+ *   - If Xen was compiled requiring it simply assert it's
+ * supported. The trampoline already has the right constant.
+ *   - Otherwise, update the trampoline EFER mask accordingly.
+ */
 bt  $cpufeat_bit(X86_FEATURE_NX), %edx
-jnc 1f
+jnc no_nx_bit
+#if !IS_ENABLED(CONFIG_REQUIRE_NX_BIT)
 orb $EFER_NXE >> 8, 1 + sym_esi(trampoline_efer)
-1:
+no_nx_bit:
+#endif
 
 /* Check for availability of long mode. */
 bt  $cpufeat_bit(X86_FEATURE_LM),%edx
diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S
index c6005fa33d..b964031085 100644
--- a/xen/arch/x86/boot/trampoline.S
+++ b/xen/arch/x86/boot/trampoline.S
@@ -147,7 +147,8 @@ GLOBAL(trampoline_misc_enable_off)
 
/* EFER OR-mask for boot paths.  SCE conditional on PV support, NX added when available. */
 GLOBAL(trampoline_efer)
-.long   EFER_LME | (EFER_SCE * IS_ENABLED(CONFIG_PV))
+.long   EFER_LME | (EFER_SCE * IS_ENABLED(CONFIG_PV)) | \
+(EFER_NXE * IS_ENABLED(CONFIG_REQUIRE_NX_BIT))
 
 GLOBAL(trampoline_xen_phys_start)
 .long   0
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index c94e53d139..641d6996c9 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -751,6 +751,15 @@ static void __init efi_arch_cpu(void)
 {
caps[FEATURESET_e1d] = cpuid_edx(0x80000001);
 
+/*
+ * This check purposefully doesn't use cpu_has_nx because
+ * cpu_has_nx bypasses the boot_cpu_data read if Xen was compiled
+ * with CONFIG_REQUIRE_NX_BIT
+ */
+if ( IS_ENABLED(CONFIG_REQUIRE_NX_BIT) &&
+ !boot_cpu_has(X86_FEATURE_NX) )
+blexit(L"This Xen build requires NX bit support");
+
 if ( cpu_has_nx )
 trampoline_efer |= EFER_NXE;
 }
diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h
index ace31e3b1f..166f480bbc 100644
--- a/xen/arch/x86/include/asm/cpu

Re: [PATCH v2] iscsi_ibft: Fix finding the iBFT under Xen Dom 0

2023-06-01 Thread Dave Hansen

On 6/1/23 09:57, Dave Hansen wrote:
> On 5/30/23 08:01, Ross Lagerwall wrote:
>> Since firmware doesn't indicate the iBFT in the E820, add a reserved
>> region so that it gets identity mapped when running as Dom 0 so that it
>> is possible to search for it. Move the call to reserve_ibft_region()
>> later so that it is called after the Xen identity mapping adjustments
>> are applied.

Oh, and one more thing:

What is the end user visible effect of this problem and of your solution?

Do Xen Dom 0 systems fail to find their boot iSCSI volume and refuse to
boot?  I take it after this patch that they can boot again.

Re: [PATCH v2] iscsi_ibft: Fix finding the iBFT under Xen Dom 0

2023-06-01 Thread Dave Hansen

On 5/30/23 08:01, Ross Lagerwall wrote:
> Since firmware doesn't indicate the iBFT in the E820, add a reserved
> region so that it gets identity mapped when running as Dom 0 so that it
> is possible to search for it. Move the call to reserve_ibft_region()
> later so that it is called after the Xen identity mapping adjustments
> are applied.
> 
> Finally, instead of using isa_bus_to_virt() which doesn't do the right
> thing under Xen, use early_memremap() like the dmi_scan code does.

This is connecting Xen, iSCSI and x86.  Some background here would be
*really* nice for dummies like me that deal heavily in only one of those
three.

One or two sentences like this:

Firmware can provide an iSCSI-specific table called the iBFT
which helps the OS boot from iSCSI devices.

can go a long way for dummies like me.  As could some background about
why this:

... add a reserved region so that it gets identity mapped when
running as Dom 0 so that it is possible to search for it.

These are all English words, but off the top of my head, I have no idea
why reserved regions get identity mapped when running as Dom 0 or why
that makes it possible to search.

The addresses and size here:

> +#ifdef CONFIG_ISCSI_IBFT_FIND
> + /* Reserve 0.5 MiB to 1 MiB region so iBFT can be found */
> + xen_e820_table.entries[xen_e820_table.nr_entries].addr = 0x80000;
> + xen_e820_table.entries[xen_e820_table.nr_entries].size = 0x80000;
> + xen_e820_table.entries[xen_e820_table.nr_entries].type = E820_TYPE_RESERVED;
> + xen_e820_table.nr_entries++;
> +#endif

also appear to be conjured out of thin air.

As does the move of:

> + reserve_ibft_region();

I'm sure I can go figure this all out with some research.  But, I'd
really appreciate some extra effort from you in this changelog to save
me the time.  I bet you can explain it a lot more efficiently than I can
go figure it out.
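For readers in the same position as the reviewer: the patch comment says the new entry covers 0.5 MiB to 1 MiB. In hex that is addr 0x80000 (512 KiB) with size 0x80000, so the range ends at 0x100000 (1 MiB), the legacy window the iBFT is searched in. The quoted hunk also writes `entries[nr_entries]` without checking the table's capacity; whether Xen's setup already guarantees room there is not stated in the thread. A hedged sketch of the append with an explicit bound, using mock types (`struct mock_e820_table`, `MOCK_E820_MAX`, `MOCK_E820_RESERVED` and `append_reserved()` are hypothetical, not Xen or Linux definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_E820_MAX      8   /* hypothetical table capacity */
#define MOCK_E820_RESERVED 2   /* hypothetical "reserved" type code */

struct mock_e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

struct mock_e820_table {
    uint32_t nr_entries;
    struct mock_e820_entry entries[MOCK_E820_MAX];
};

/* Append a reserved range, refusing to write past the fixed-size array. */
bool append_reserved(struct mock_e820_table *t, uint64_t addr, uint64_t size)
{
    if (t->nr_entries >= MOCK_E820_MAX) {
        return false;
    }
    t->entries[t->nr_entries] = (struct mock_e820_entry){
        .addr = addr,
        .size = size,
        .type = MOCK_E820_RESERVED,
    };
    t->nr_entries++;
    return true;
}
```

Calling `append_reserved(&table, 0x80000, 0x80000)` records [0x80000, 0x100000); a full table makes the call fail instead of corrupting adjacent memory.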

[ovmf test] 181081: all pass - PUSHED

2023-06-01 Thread osstest service owner

flight 181081 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181081/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 15f83fa36442eaa272300b31699b3b82ce7e07a9
baseline version:
 ovmf 1df6658bcbc4cade29a8763808a9804e5d449046

Last test of basis   181076  2023-06-01 11:12:17 Z    0 days
Testing same since   181081  2023-06-01 13:15:20 Z    0 days    1 attempts


People who touched revisions under test:
  Ard Biesheuvel 
  Gerd Hoffmann 

jobs:
 build-amd64-xsm                         pass
 build-i386-xsm                          pass
 build-amd64                             pass
 build-i386                              pass
 build-amd64-libvirt                     pass
 build-i386-libvirt                      pass
 build-amd64-pvops                       pass
 build-i386-pvops                        pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64    pass
 test-amd64-i386-xl-qemuu-ovmf-amd64     pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   1df6658bcb..15f83fa364  15f83fa36442eaa272300b31699b3b82ce7e07a9 -> xen-tested-master

Re: [PATCH] x86/ucode: Exit early from early_update_cache() if loading not available

2023-06-01 Thread Alejandro Vallejo

On Thu, Jun 01, 2023 at 03:38:13PM +0100, Andrew Cooper wrote:
> If for any reason early_microcode_init() concludes that no microcode loading
> is available, early_update_cache() will fall over a NULL function pointer:
> 
>   (XEN) Xen call trace:
>   (XEN)[] R show_code+0x91/0x18f
>   (XEN)[] F show_execution_state+0x2d/0x1fc
>   (XEN)[] F fatal_trap+0x87/0x19a
>   (XEN)[] F init_idt_traps+0/0x1bd
>   (XEN)[] F early_page_fault+0x8f/0x94
>   (XEN)[<>] F 
>   (XEN)[] F arch/x86/cpu/microcode/core.c#early_update_cache+0x11/0x74
>   (XEN)[] F microcode_init_cache+0x5a/0x5c
>   (XEN)[] F __start_xen+0x1e11/0x27ee
>   (XEN)[] F __high_start+0x94/0xa0
> 
> which is actually parse_blob()'s use of ucode_ops.collect_cpu_info.
> 
> Skip trying to cache anything if microcode loading is unavailable.
> [...]
> ---
>  xen/arch/x86/cpu/microcode/core.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
> index 5a5c0a8c70db..9029301107d6 100644
> --- a/xen/arch/x86/cpu/microcode/core.c
> +++ b/xen/arch/x86/cpu/microcode/core.c
> @@ -789,6 +789,9 @@ int __init microcode_init_cache(unsigned long *module_map,
>  {
>  int rc = 0;
>  
> +if ( !ucode_ops.apply_microcode )
> +return -ENODEV;
> +
>  if ( ucode_scan )
>  /* Need to rescan the modules because they might have been relocated */
>  microcode_scan_module(module_map, mbi);

Ugh. These bugs are forever. IMO, it would be helpful to have a default set
of stubs (ucode_ops_default?) that unconditionally return -ENODEV when
called. At least the whole system won't crash under our feet if we forgot
an "if ( !ucode_ops.foo ) return -1".

It's still imperfect but there's far less room for errors.
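A minimal sketch of the default-stubs idea, assuming a simplified ops table: the hook names `collect_cpu_info` and `apply_microcode` come from the thread, but their signatures here are simplified assumptions, and `ucode_ops_default`, `stub_*`, `vendor_*`, `ucode_ops_init()` and `parse_and_apply()` are hypothetical names, not Xen code. Every hook starts out pointing at a stub returning -ENODEV, so a caller that forgets the availability check gets an error back instead of jumping through a NULL pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for Xen's microcode ops table. */
struct microcode_ops {
    int (*collect_cpu_info)(void);
    int (*apply_microcode)(const void *patch, size_t len);
};

/* Default stubs: report "no microcode loading" instead of crashing. */
static int stub_collect_cpu_info(void)
{
    return -ENODEV;
}

static int stub_apply_microcode(const void *patch, size_t len)
{
    (void)patch;
    (void)len;
    return -ENODEV;
}

static const struct microcode_ops ucode_ops_default = {
    .collect_cpu_info = stub_collect_cpu_info,
    .apply_microcode  = stub_apply_microcode,
};

/* Hypothetical vendor hooks, installed only when a loader is detected. */
static int vendor_collect_cpu_info(void)
{
    return 0;
}

static int vendor_apply_microcode(const void *patch, size_t len)
{
    return (patch != NULL && len > 0) ? 0 : -EINVAL;
}

static struct microcode_ops ucode_ops;

/* Reset every hook to its stub, then let detection override them. */
void ucode_ops_init(int loader_available)
{
    ucode_ops = ucode_ops_default;
    if (loader_available) {
        ucode_ops.collect_cpu_info = vendor_collect_cpu_info;
        ucode_ops.apply_microcode  = vendor_apply_microcode;
    }
}

/* A caller with no availability check degrades to -ENODEV, not a crash. */
int parse_and_apply(const void *blob, size_t len)
{
    int rc;

    /* Invariant provided by the stubs: hooks are never NULL after init. */
    assert(ucode_ops.collect_cpu_info && ucode_ops.apply_microcode);
    rc = ucode_ops.collect_cpu_info();
    if (rc)
        return rc;
    return ucode_ops.apply_microcode(blob, len);
}
```

One trade-off: with stubs installed, `if ( !ucode_ops.apply_microcode )` can no longer be used to detect whether loading is available, so such code would need a separate flag or a comparison against the stub.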

Alejandro

Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users

2023-06-01 Thread Andy Shevchenko

On Thu, Jun 01, 2023 at 07:25:46PM +0300, Andy Shevchenko wrote:
> On Wed, May 31, 2023 at 08:48:35PM +0200, Jonas Gorski wrote:
> > On Tue, 30 May 2023 at 23:34, Bjorn Helgaas  wrote:
> > > On Fri, May 12, 2023 at 02:48:51PM -0500, Bjorn Helgaas wrote:

...

> > > Where are we at?  Are we going to ignore this because some Coverity
> > > reports are false positives?
> > 
> > Looking at the code I understand where coverity is coming from:
> > 
> > #define __pci_dev_for_each_res0(dev, res, ...) \
> >for (unsigned int __b = 0;  \
> > res = pci_resource_n(dev, __b), __b < PCI_NUM_RESOURCES;   \
> > __b++)
> > 
> >  res will be assigned before __b is checked for being less than
> > PCI_NUM_RESOURCES, making it point to behind the array at the end of
> > the last loop iteration.
> 
> Which is fine and you stumbled over the same mistake I made, that's why the
> documentation has been added to describe why the heck this macro is written
> the way it's written.
> 
> Coverity sucks.
> 
> > Rewriting the test expression as
> > 
> > __b < PCI_NUM_RESOURCES && (res = pci_resource_n(dev, __b));
> > 
> > should avoid the (coverity) warning by making use of lazy evaluation.
> 
> Obviously NAK.
> 
> > It probably makes the code slightly less performant as res will now be
> > checked for being not NULL (which will always be true), but I doubt it
> > will be significant (or in any hot paths).

Oh my god, I mistakenly read this as the bus macro. Sorry for my rant;
it's simply wrong.

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users

2023-06-01 Thread Andy Shevchenko

On Wed, May 31, 2023 at 08:48:35PM +0200, Jonas Gorski wrote:
> On Tue, 30 May 2023 at 23:34, Bjorn Helgaas  wrote:
> > On Fri, May 12, 2023 at 02:48:51PM -0500, Bjorn Helgaas wrote:
> > > On Fri, May 12, 2023 at 01:56:29PM +0300, Andy Shevchenko wrote:
> > > > On Tue, May 09, 2023 at 01:21:22PM -0500, Bjorn Helgaas wrote:
> > > > > On Tue, Apr 04, 2023 at 11:11:01AM -0500, Bjorn Helgaas wrote:
> > > > > > On Thu, Mar 30, 2023 at 07:24:27PM +0300, Andy Shevchenko wrote:
> > > > > > > Provide two new helper macros to iterate over PCI device 
> > > > > > > resources and
> > > > > > > convert users.
> > > > >
> > > > > > Applied 2-7 to pci/resource for v6.4, thanks, I really like this!
> > > > >
> > > > > This is 09cc90063240 ("PCI: Introduce pci_dev_for_each_resource()")
> > > > > upstream now.
> > > > >
> > > > > Coverity complains about each use,
> > > >
> > > > It needs more clarification here. Use of reduced variant of the
> > > > macro or all of them? If the former one, then I can speculate that
> > > > Coverity (famous for false positives) simply doesn't understand `for
> > > > (type var; var ...)` code.
> > >
> > > True, Coverity finds false positives.  It flagged every use in
> > > drivers/pci and drivers/pnp.  It didn't mention the arch/alpha, arm,
> > > mips, powerpc, sh, or sparc uses, but I think it just didn't look at
> > > those.
> > >
> > > It flagged both:
> > >
> > >   pbus_size_io    pci_dev_for_each_resource(dev, r)
> > >   pbus_size_mem   pci_dev_for_each_resource(dev, r, i)
> > >
> > > Here's a spreadsheet with a few more details (unfortunately I don't
> > > know how to make it dump the actual line numbers or analysis like I
> > > pasted below, so "pci_dev_for_each_resource" doesn't appear).  These
> > > are mostly in the "Drivers-PCI" component.
> > >
> > > https://docs.google.com/spreadsheets/d/1ohOJwxqXXoDUA0gwopgk-z-6ArLvhN7AZn4mIlDkHhQ/edit?usp=sharing
> > >
> > > These particular reports are in the "High Impact Outstanding" tab.
> >
> > Where are we at?  Are we going to ignore this because some Coverity
> > reports are false positives?
> 
> Looking at the code I understand where coverity is coming from:
> 
> #define __pci_dev_for_each_res0(dev, res, ...) \
>for (unsigned int __b = 0;  \
> res = pci_resource_n(dev, __b), __b < PCI_NUM_RESOURCES;   \
> __b++)
> 
>  res will be assigned before __b is checked for being less than
> PCI_NUM_RESOURCES, making it point to behind the array at the end of
> the last loop iteration.

Which is fine and you stumbled over the same mistake I made, that's why the
documentation has been added to describe why the heck this macro is written
the way it's written.

Coverity sucks.

> Rewriting the test expression as
> 
> __b < PCI_NUM_RESOURCES && (res = pci_resource_n(dev, __b));
> 
> should avoid the (coverity) warning by making use of lazy evaluation.

Obviously NAK.

> It probably makes the code slightly less performant as res will now be
> checked for being not NULL (which will always be true), but I doubt it
> will be significant (or in any hot paths).
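The behaviour Jonas describes can be reproduced in plain C. The sketch below uses mock types (`struct mock_dev`, `NUM_RES`, `res_n()`), not the kernel's, and only shows that the comma-expression form assigns `res` one more time than the body runs, leaving it one past the end of the array. Computing such a pointer is valid C as long as it is never dereferenced, which is why the warning is arguably a false positive; the `&&` form instead leaves `res` at the last valid entry.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_RES 4   /* stand-in for PCI_NUM_RESOURCES */

struct mock_res { unsigned long start; };
struct mock_dev { struct mock_res resource[NUM_RES]; };

/* Same shape as pci_resource_n(): address of the n-th array entry. */
#define res_n(dev, n) (&(dev)->resource[(n)])

/* Comma form: 'res' is assigned first, then the bound is checked. */
#define for_each_res_comma(dev, res)                     \
    for (unsigned int i_ = 0;                            \
         (res) = res_n(dev, i_), i_ < NUM_RES;           \
         i_++)

/* Short-circuit form: the bound is checked before 'res' is assigned. */
#define for_each_res_and(dev, res)                       \
    for (unsigned int i_ = 0;                            \
         i_ < NUM_RES && ((res) = res_n(dev, i_));       \
         i_++)

/* Index that 'res' holds after the loop; *iters counts body executions. */
ptrdiff_t last_index_comma(struct mock_dev *dev, unsigned *iters)
{
    struct mock_res *res = NULL;

    *iters = 0;
    for_each_res_comma(dev, res)
        (*iters)++;
    /* res == &dev->resource[NUM_RES]: one past the end, never dereferenced. */
    return res - dev->resource;
}

ptrdiff_t last_index_and(struct mock_dev *dev, unsigned *iters)
{
    struct mock_res *res = NULL;

    *iters = 0;
    for_each_res_and(dev, res)
        (*iters)++;
    /* res still points at the last valid entry. */
    return res - dev->resource;
}
```

Both loops run the body `NUM_RES` times; they differ only in where `res` is left afterwards, and the `&&` form pays for that with one always-true pointer test per iteration.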

-- 
With Best Regards,
Andy Shevchenko

Re: [linux-linus test] 181063: regressions - FAIL

2023-06-01 Thread Roger Pau Monné

On Thu, Jun 01, 2023 at 01:20:26PM +, osstest service owner wrote:
> flight 181063 linux-linus real [real]
> flight 181077 linux-linus real-retest [real]
> http://logs.test-lab.xenproject.org/osstest/logs/181063/
> http://logs.test-lab.xenproject.org/osstest/logs/181077/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-armhf-armhf-xl-credit1   8 xen-boot fail REGR. vs. 
> 180278

History for the test:

http://logs.test-lab.xenproject.org/osstest/results/history/test-armhf-armhf-xl/linux-linus.html

Last working flight seems to be 173462, that was using hash
9d84bb40bcb3.  First failure is 173462, which is hash e8bc52cb8df8.

Roger.

[PULL 1/8] block: add blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi

Introduce a new API for thread-local blk_io_plug() that does not
traverse the block graph. The goal is to make blk_io_plug() multi-queue
friendly.

Instead of having block drivers track whether or not we're in a plugged
section, provide an API that allows them to defer a function call until
we're unplugged: blk_io_plug_call(fn, opaque). If blk_io_plug_call() is
called multiple times with the same fn/opaque pair, then fn() is only
called once, at the end of the plugged section, resulting in batching.

This patch introduces the API and changes blk_io_plug()/blk_io_unplug().
blk_io_plug()/blk_io_unplug() no longer require a BlockBackend argument
because the plug state is now thread-local.
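A minimal sketch of the semantics described in this commit message, assuming a thread-local nesting counter and a small fixed-size array of deferred calls. The `demo_*` names, `MAX_DEFERRED` and `count_call()` are hypothetical; this is not QEMU's block/plug.c. It drops duplicate fn/opaque pairs with a linear scan, aborts on overflow, and (a choice made for this sketch) calls fn immediately when no plugged section is active.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DEFERRED 16   /* hypothetical capacity for this sketch */

typedef struct {
    void (*fn)(void *);
    void *opaque;
} DeferredCall;

/* Per-thread plug state (C11 _Thread_local), so no BlockBackend argument. */
static _Thread_local unsigned plug_count;
static _Thread_local DeferredCall deferred[MAX_DEFERRED];
static _Thread_local size_t n_deferred;

void demo_io_plug(void)
{
    plug_count++;
}

void demo_io_plug_call(void (*fn)(void *), void *opaque)
{
    if (plug_count == 0) {
        fn(opaque);            /* not plugged: nothing to batch with */
        return;
    }
    for (size_t i = 0; i < n_deferred; i++) {
        if (deferred[i].fn == fn && deferred[i].opaque == opaque) {
            return;            /* same fn/opaque pair already queued */
        }
    }
    if (n_deferred == MAX_DEFERRED) {
        abort();               /* sketch: no dynamic growth */
    }
    deferred[n_deferred++] = (DeferredCall){ fn, opaque };
}

void demo_io_unplug(void)
{
    assert(plug_count > 0);
    if (--plug_count > 0) {
        return;                /* nested section: keep deferring */
    }
    /* Outermost unplug: run each deferred call exactly once. */
    for (size_t i = 0; i < n_deferred; i++) {
        deferred[i].fn(deferred[i].opaque);
    }
    n_deferred = 0;
}

/* Example deferred function: counts how often it actually runs. */
void count_call(void *opaque)
{
    (*(int *)opaque)++;
}
```

With two nested `demo_io_plug()` calls and two identical `demo_io_plug_call(count_call, &hits)` calls, `count_call` runs once, and only when the outer `demo_io_unplug()` is reached.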

Later patches convert block drivers to blk_io_plug_call() and then we
can finally remove .bdrv_co_io_plug() once all block drivers have been
converted.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Garzarella 
Acked-by: Kevin Wolf 
Message-id: 20230530180959.1108766-2-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS   |   1 +
 include/sysemu/block-backend-io.h |  13 +--
 block/block-backend.c |  22 -
 block/plug.c  | 159 ++
 hw/block/dataplane/xen-block.c|   8 +-
 hw/block/virtio-blk.c |   4 +-
 hw/scsi/virtio-scsi.c |   6 +-
 block/meson.build |   1 +
 8 files changed, 173 insertions(+), 41 deletions(-)
 create mode 100644 block/plug.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 4b025a7b63..89f274f85e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2650,6 +2650,7 @@ F: util/aio-*.c
 F: util/aio-*.h
 F: util/fdmon-*.c
 F: block/io.c
+F: block/plug.c
 F: migration/block*
 F: include/block/aio.h
 F: include/block/aio-wait.h
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index d62a7ee773..be4dcef59d 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -100,16 +100,9 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-/*
- * blk_io_plug/unplug are thread-local operations. This means that multiple
- * IOThreads can simultaneously call plug/unplug, but the caller must ensure
- * that each unplug() is called in the same IOThread of the matching plug().
- */
-void coroutine_fn blk_co_io_plug(BlockBackend *blk);
-void co_wrapper blk_io_plug(BlockBackend *blk);
-
-void coroutine_fn blk_co_io_unplug(BlockBackend *blk);
-void co_wrapper blk_io_unplug(BlockBackend *blk);
+void blk_io_plug(void);
+void blk_io_unplug(void);
+void blk_io_plug_call(void (*fn)(void *), void *opaque);
 
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index 241f643507..4009ed5fed 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2582,28 +2582,6 @@ void blk_add_insert_bs_notifier(BlockBackend *blk, Notifier *notify)
 notifier_list_add(&blk->insert_bs_notifiers, notify);
 }
 
-void coroutine_fn blk_co_io_plug(BlockBackend *blk)
-{
-BlockDriverState *bs = blk_bs(blk);
-IO_CODE();
-GRAPH_RDLOCK_GUARD();
-
-if (bs) {
-bdrv_co_io_plug(bs);
-}
-}
-
-void coroutine_fn blk_co_io_unplug(BlockBackend *blk)
-{
-BlockDriverState *bs = blk_bs(blk);
-IO_CODE();
-GRAPH_RDLOCK_GUARD();
-
-if (bs) {
-bdrv_co_io_unplug(bs);
-}
-}
-
 BlockAcctStats *blk_get_stats(BlockBackend *blk)
 {
 IO_CODE();
diff --git a/block/plug.c b/block/plug.c
new file mode 100644
index 0000000000..98a155d2f4
--- /dev/null
+++ b/block/plug.c
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Block I/O plugging
+ *
+ * Copyright Red Hat.
+ *
+ * This API defers a function call within a blk_io_plug()/blk_io_unplug()
+ * section, allowing multiple calls to batch up. This is a performance
+ * optimization that is used in the block layer to submit several I/O requests
+ * at once instead of individually:
+ *
+ *   blk_io_plug(); <-- start of plugged region
+ *   ...
+ *   blk_io_plug_call(my_func, my_obj); <-- deferred my_func(my_obj) call
+ *   blk_io_plug_call(my_func, my_obj); <-- another
+ *   blk_io_plug_call(my_func, my_obj); <-- another
+ *   ...
+ *   blk_io_unplug(); <-- end of plugged region, my_func(my_obj) is called once
+ *
+ * This code is actually generic and not tied to the block layer. If another
+ * subsystem needs this functionality, it could be renamed.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/coroutine-tls.h"
+#include "qemu/notify.h"
+#include "qemu/thread.h"
+#include "sysemu/block-backend.h"
+
+/* A function call that has been deferred until unplug() */
+typedef struct {
+void (*fn)(void *);
+void *opaque;
+} UnplugFn;
+
+/* Per-thread state */
+typedef struct {
+unsigned count;   /* how many times has plug

[PULL 8/8] qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa

2023-06-01 Thread Stefan Hajnoczi

From: Stefano Garzarella 

The virtio-blk-vhost-vdpa driver in libblkio 1.3.0 supports fd
passing through the new 'fd' property.

Since we now use qemu_open() on '@path' when the virtio-blk driver
supports fd passing, let's announce it.
In this way, the management layer can pass the file descriptor of an
already opened vhost-vdpa character device. This is useful especially
when the device can only be accessed with certain privileges.

Add the '@fdset' feature only when the virtio-blk-vhost-vdpa driver
in libblkio supports it.

Suggested-by: Markus Armbruster 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
Message-id: 20230530071941.8954-3-sgarz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 qapi/block-core.json | 6 ++
 meson.build  | 4 
 2 files changed, 10 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 98d9116dae..4bf89171c6 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3955,10 +3955,16 @@
 #
 # @path: path to the vhost-vdpa character device.
 #
+# Features:
+# @fdset: Member @path supports the special "/dev/fdset/N" path
+# (since 8.1)
+#
 # Since: 7.2
 ##
 { 'struct': 'BlockdevOptionsVirtioBlkVhostVdpa',
   'data': { 'path': 'str' },
+  'features': [ { 'name' :'fdset',
+  'if': 'CONFIG_BLKIO_VHOST_VDPA_FD' } ],
   'if': 'CONFIG_BLKIO' }
 
 ##
diff --git a/meson.build b/meson.build
index bc76ea96bf..a61d3e9b06 100644
--- a/meson.build
+++ b/meson.build
@@ -2106,6 +2106,10 @@ config_host_data.set('CONFIG_LZO', lzo.found())
 config_host_data.set('CONFIG_MPATH', mpathpersist.found())
 config_host_data.set('CONFIG_MPATH_NEW_API', mpathpersist_new_api)
 config_host_data.set('CONFIG_BLKIO', blkio.found())
+if blkio.found()
+  config_host_data.set('CONFIG_BLKIO_VHOST_VDPA_FD',
+   blkio.version().version_compare('>=1.3.0'))
+endif
 config_host_data.set('CONFIG_CURL', curl.found())
 config_host_data.set('CONFIG_CURSES', curses.found())
 config_host_data.set('CONFIG_GBM', gbm.found())
-- 
2.40.1

[PULL 6/8] block: remove bdrv_co_io_plug() API

2023-06-01 Thread Stefan Hajnoczi

No block driver implements .bdrv_co_io_plug() anymore. Get rid of the
function pointers.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Garzarella 
Acked-by: Kevin Wolf 
Message-id: 20230530180959.1108766-7-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/block-io.h |  3 ---
 include/block/block_int-common.h | 11 --
 block/io.c   | 37 
 3 files changed, 51 deletions(-)

diff --git a/include/block/block-io.h b/include/block/block-io.h
index a27e471a87..43af816d75 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -259,9 +259,6 @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx);
 
 AioContext *child_of_bds_get_parent_aio_context(BdrvChild *c);
 
-void coroutine_fn GRAPH_RDLOCK bdrv_co_io_plug(BlockDriverState *bs);
-void coroutine_fn GRAPH_RDLOCK bdrv_co_io_unplug(BlockDriverState *bs);
-
 bool coroutine_fn GRAPH_RDLOCK
 bdrv_co_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
uint32_t granularity, Error **errp);
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index b1cbc1e00c..74195c3004 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -768,11 +768,6 @@ struct BlockDriver {
 void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_debug_event)(
 BlockDriverState *bs, BlkdebugEvent event);
 
-/* io queue for linux-aio */
-void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_plug)(BlockDriverState *bs);
-void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_unplug)(
-BlockDriverState *bs);
-
 bool (*bdrv_supports_persistent_dirty_bitmap)(BlockDriverState *bs);
 
 bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_can_store_new_dirty_bitmap)(
@@ -1227,12 +1222,6 @@ struct BlockDriverState {
 unsigned int in_flight;
 unsigned int serialising_in_flight;
 
-/*
- * counter for nested bdrv_io_plug.
- * Accessed with atomic ops.
- */
-unsigned io_plugged;
-
 /* do we need to tell the quest if we have a volatile write cache? */
 int enable_write_cache;
 
diff --git a/block/io.c b/block/io.c
index 540bf8d26d..f2dfc7c405 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3223,43 +3223,6 @@ void *qemu_try_blockalign0(BlockDriverState *bs, size_t size)
 return mem;
 }
 
-void coroutine_fn bdrv_co_io_plug(BlockDriverState *bs)
-{
-BdrvChild *child;
-IO_CODE();
-assert_bdrv_graph_readable();
-
-QLIST_FOREACH(child, &bs->children, next) {
-bdrv_co_io_plug(child->bs);
-}
-
-if (qatomic_fetch_inc(&bs->io_plugged) == 0) {
-BlockDriver *drv = bs->drv;
-if (drv && drv->bdrv_co_io_plug) {
-drv->bdrv_co_io_plug(bs);
-}
-}
-}
-
-void coroutine_fn bdrv_co_io_unplug(BlockDriverState *bs)
-{
-BdrvChild *child;
-IO_CODE();
-assert_bdrv_graph_readable();
-
-assert(bs->io_plugged);
-if (qatomic_fetch_dec(&bs->io_plugged) == 1) {
-BlockDriver *drv = bs->drv;
-if (drv && drv->bdrv_co_io_unplug) {
-drv->bdrv_co_io_unplug(bs);
-}
-}
-
-QLIST_FOREACH(child, &bs->children, next) {
-bdrv_co_io_unplug(child->bs);
-}
-}
-
 /* Helper that undoes bdrv_register_buf() when it fails partway through */
 static void GRAPH_RDLOCK
 bdrv_register_buf_rollback(BlockDriverState *bs, void *host, size_t size,
-- 
2.40.1

[PULL 5/8] block/linux-aio: convert to blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Note that a dev_max_batch check is dropped in laio_io_unplug() because
the semantics of unplug_fn() are different from .bdrv_co_io_unplug():
1. unplug_fn() is only called when the last blk_io_unplug() call occurs,
   not every time blk_io_unplug() is called.
2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no
   way to get per-BlockDriverState fields like dev_max_batch.

Therefore this condition cannot be moved to laio_unplug_fn(). It is not
obvious that this condition affects performance in practice, so I am
removing it instead of trying to come up with a more complex mechanism
to preserve the condition.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Acked-by: Kevin Wolf 
Reviewed-by: Stefano Garzarella 
Message-id: 20230530180959.1108766-6-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/raw-aio.h |  7 ---
 block/file-posix.c  | 28 
 block/linux-aio.c   | 41 +++--
 3 files changed, 11 insertions(+), 65 deletions(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index da60ca13ef..0f63c2800c 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -62,13 +62,6 @@ int coroutine_fn laio_co_submit(int fd, uint64_t offset, QEMUIOVector *qiov,
 
 void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context);
 void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
-
-/*
- * laio_io_plug/unplug work in the thread's current AioContext, therefore the
- * caller must ensure that they are paired in the same IOThread.
- */
-void laio_io_plug(void);
-void laio_io_unplug(uint64_t dev_max_batch);
 #endif
 /* io_uring.c - Linux io_uring implementation */
 #ifdef CONFIG_LINUX_IO_URING
diff --git a/block/file-posix.c b/block/file-posix.c
index 7baa8491dd..ac1ed54811 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2550,26 +2550,6 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, int64_t offset,
 return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE);
 }
 
-static void coroutine_fn raw_co_io_plug(BlockDriverState *bs)
-{
-BDRVRawState __attribute__((unused)) *s = bs->opaque;
-#ifdef CONFIG_LINUX_AIO
-if (s->use_linux_aio) {
-laio_io_plug();
-}
-#endif
-}
-
-static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
-{
-BDRVRawState __attribute__((unused)) *s = bs->opaque;
-#ifdef CONFIG_LINUX_AIO
-if (s->use_linux_aio) {
-laio_io_unplug(s->aio_max_batch);
-}
-#endif
-}
-
 static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
 {
 BDRVRawState *s = bs->opaque;
@@ -3914,8 +3894,6 @@ BlockDriver bdrv_file = {
 .bdrv_co_copy_range_from = raw_co_copy_range_from,
 .bdrv_co_copy_range_to  = raw_co_copy_range_to,
 .bdrv_refresh_limits = raw_refresh_limits,
-.bdrv_co_io_plug= raw_co_io_plug,
-.bdrv_co_io_unplug  = raw_co_io_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
 .bdrv_co_truncate   = raw_co_truncate,
@@ -4286,8 +4264,6 @@ static BlockDriver bdrv_host_device = {
 .bdrv_co_copy_range_from = raw_co_copy_range_from,
 .bdrv_co_copy_range_to  = raw_co_copy_range_to,
 .bdrv_refresh_limits = raw_refresh_limits,
-.bdrv_co_io_plug= raw_co_io_plug,
-.bdrv_co_io_unplug  = raw_co_io_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
 .bdrv_co_truncate   = raw_co_truncate,
@@ -4424,8 +4400,6 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_co_pwritev= raw_co_pwritev,
 .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
 .bdrv_refresh_limits= cdrom_refresh_limits,
-.bdrv_co_io_plug= raw_co_io_plug,
-.bdrv_co_io_unplug  = raw_co_io_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
 .bdrv_co_truncate   = raw_co_truncate,
@@ -4552,8 +4526,6 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_co_pwritev= raw_co_pwritev,
 .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
 .bdrv_refresh_limits= cdrom_refresh_limits,
-.bdrv_co_io_plug= raw_co_io_plug,
-.bdrv_co_io_unplug  = raw_co_io_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
 .bdrv_co_truncate   = raw_co_truncate,
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 916f001e32..561c71a9ae 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -15,6 +15,7 @@
 #include "qemu/event_notifier.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "sysemu/block-backend.h"
 
 /* Only used for assertions.  */
 #include "qemu/coroutine_int.h"
@@ -46,7 +47,6 @@ struct qemu_laiocb {
 };
 
 typedef struct {
-int plugged;
 unsigned int in_queue;

[PULL 2/8] block/nvme: convert to blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Garzarella 
Acked-by: Kevin Wolf 
Message-id: 20230530180959.1108766-3-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 block/nvme.c   | 44 
 block/trace-events |  1 -
 2 files changed, 12 insertions(+), 33 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 17937d398d..7ca85bc44a 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -25,6 +25,7 @@
 #include "qemu/vfio-helpers.h"
 #include "block/block-io.h"
 #include "block/block_int.h"
+#include "sysemu/block-backend.h"
 #include "sysemu/replay.h"
 #include "trace.h"
 
@@ -119,7 +120,6 @@ struct BDRVNVMeState {
 int blkshift;
 
 uint64_t max_transfer;
-bool plugged;
 
 bool supports_write_zeroes;
 bool supports_discard;
@@ -282,7 +282,7 @@ static void nvme_kick(NVMeQueuePair *q)
 {
 BDRVNVMeState *s = q->s;
 
-if (s->plugged || !q->need_kick) {
+if (!q->need_kick) {
 return;
 }
 trace_nvme_kick(s, q->index);
@@ -387,10 +387,6 @@ static bool nvme_process_completion(NVMeQueuePair *q)
 NvmeCqe *c;
 
 trace_nvme_process_completion(s, q->index, q->inflight);
-if (s->plugged) {
-trace_nvme_process_completion_queue_plugged(s, q->index);
-return false;
-}
 
 /*
  * Support re-entrancy when a request cb() function invokes aio_poll().
@@ -480,6 +476,15 @@ static void nvme_trace_command(const NvmeCmd *cmd)
 }
 }
 
+static void nvme_unplug_fn(void *opaque)
+{
+NVMeQueuePair *q = opaque;
+
+QEMU_LOCK_GUARD(&q->lock);
+nvme_kick(q);
+nvme_process_completion(q);
+}
+
 static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
 NvmeCmd *cmd, BlockCompletionFunc cb,
 void *opaque)
@@ -496,8 +501,7 @@ static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
q->sq.tail * NVME_SQ_ENTRY_BYTES, cmd, sizeof(*cmd));
 q->sq.tail = (q->sq.tail + 1) % NVME_QUEUE_SIZE;
 q->need_kick++;
-nvme_kick(q);
-nvme_process_completion(q);
+blk_io_plug_call(nvme_unplug_fn, q);
 qemu_mutex_unlock(&q->lock);
 }
 
@@ -1567,27 +1571,6 @@ static void nvme_attach_aio_context(BlockDriverState *bs,
 }
 }
 
-static void coroutine_fn nvme_co_io_plug(BlockDriverState *bs)
-{
-BDRVNVMeState *s = bs->opaque;
-assert(!s->plugged);
-s->plugged = true;
-}
-
-static void coroutine_fn nvme_co_io_unplug(BlockDriverState *bs)
-{
-BDRVNVMeState *s = bs->opaque;
-assert(s->plugged);
-s->plugged = false;
-for (unsigned i = INDEX_IO(0); i < s->queue_count; i++) {
-NVMeQueuePair *q = s->queues[i];
-qemu_mutex_lock(&q->lock);
-nvme_kick(q);
-nvme_process_completion(q);
-qemu_mutex_unlock(&q->lock);
-}
-}
-
 static bool nvme_register_buf(BlockDriverState *bs, void *host, size_t size,
   Error **errp)
 {
@@ -1664,9 +1647,6 @@ static BlockDriver bdrv_nvme = {
 .bdrv_detach_aio_context  = nvme_detach_aio_context,
 .bdrv_attach_aio_context  = nvme_attach_aio_context,
 
-.bdrv_co_io_plug  = nvme_co_io_plug,
-.bdrv_co_io_unplug= nvme_co_io_unplug,
-
 .bdrv_register_buf= nvme_register_buf,
 .bdrv_unregister_buf  = nvme_unregister_buf,
 };
diff --git a/block/trace-events b/block/trace-events
index 32665158d6..048ad27519 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -141,7 +141,6 @@ nvme_kick(void *s, unsigned q_index) "s %p q #%u"
 nvme_dma_flush_queue_wait(void *s) "s %p"
 nvme_error(int cmd_specific, int sq_head, int sqid, int cid, int status) "cmd_specific %d sq_head %d sqid %d cid %d status 0x%x"
 nvme_process_completion(void *s, unsigned q_index, int inflight) "s %p q #%u inflight %d"
-nvme_process_completion_queue_plugged(void *s, unsigned q_index) "s %p q #%u"
 nvme_complete_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
 nvme_submit_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
 nvme_submit_command_raw(int c0, int c1, int c2, int c3, int c4, int c5, int c6, int c7) "%02x %02x %02x %02x %02x %02x %02x %02x"
-- 
2.40.1

[PULL 7/8] block/blkio: use qemu_open() to support fd passing for virtio-blk

2023-06-01 Thread Stefan Hajnoczi

From: Stefano Garzarella 

Some virtio-blk drivers (e.g. virtio-blk-vhost-vdpa) support fd
passing. Let's expose this to the user, so the management layer
can pass the file descriptor of an already opened path.

If the libblkio virtio-blk driver supports fd passing, let's always
use qemu_open() to open the `path`, so we can handle fd passing
from the management layer through the "/dev/fdset/N" special path.
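The open step described above boils down to mapping the block layer's read/write flag to open(2) flags and opening the path before the descriptor is handed to libblkio. The sketch below uses plain POSIX `open()` in place of `qemu_open()`, so it does not understand the "/dev/fdset/N" special path; `fd_open_flags()` and `open_path_for_backend()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Choose open(2) flags the way the patch does for BDRV_O_RDWR. */
int fd_open_flags(bool read_write)
{
    return read_write ? O_RDWR : O_RDONLY;
}

/*
 * Open 'path' for handing to a backend. Returns the descriptor, or -errno
 * on failure. The caller owns the fd and must close() it if passing it on
 * fails (the patch does this with qemu_close()).
 */
int open_path_for_backend(const char *path, bool read_write)
{
    int fd = open(path, fd_open_flags(read_write));

    return fd < 0 ? -errno : fd;
}
```

In the patch itself the returned descriptor is then passed with `blkio_set_int(s->blkio, "fd", fd)`, and the plain `path` property is only used when the driver does not report an "fd" property.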

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
Message-id: 20230530071941.8954-2-sgarz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 block/blkio.c | 53 ++-
 1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 11be8787a3..527323d625 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -673,25 +673,60 @@ static int blkio_virtio_blk_common_open(BlockDriverState *bs,
 {
 const char *path = qdict_get_try_str(options, "path");
 BDRVBlkioState *s = bs->opaque;
-int ret;
+bool fd_supported = false;
+int fd, ret;
 
 if (!path) {
 error_setg(errp, "missing 'path' option");
 return -EINVAL;
 }
 
-ret = blkio_set_str(s->blkio, "path", path);
-qdict_del(options, "path");
-if (ret < 0) {
-error_setg_errno(errp, -ret, "failed to set path: %s",
- blkio_get_error_msg());
-return ret;
-}
-
 if (!(flags & BDRV_O_NOCACHE)) {
 error_setg(errp, "cache.direct=off is not supported");
 return -EINVAL;
 }
+
+if (blkio_get_int(s->blkio, "fd", &fd) == 0) {
+fd_supported = true;
+}
+
+/*
+ * If the libblkio driver supports fd passing, let's always use qemu_open()
+ * to open the `path`, so we can handle fd passing from the management
+ * layer through the "/dev/fdset/N" special path.
+ */
+if (fd_supported) {
+int open_flags;
+
+if (flags & BDRV_O_RDWR) {
+open_flags = O_RDWR;
+} else {
+open_flags = O_RDONLY;
+}
+
+fd = qemu_open(path, open_flags, errp);
+if (fd < 0) {
+return -EINVAL;
+}
+
+ret = blkio_set_int(s->blkio, "fd", fd);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "failed to set fd: %s",
+ blkio_get_error_msg());
+qemu_close(fd);
+return ret;
+}
+} else {
+ret = blkio_set_str(s->blkio, "path", path);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "failed to set path: %s",
+ blkio_get_error_msg());
+return ret;
+}
+}
+
+qdict_del(options, "path");
+
 return 0;
 }
 
-- 
2.40.1

[PULL 4/8] block/io_uring: convert to blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Garzarella 
Acked-by: Kevin Wolf 
Message-id: 20230530180959.1108766-5-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/raw-aio.h |  7 ---
 block/file-posix.c  | 10 --
 block/io_uring.c| 44 -
 block/trace-events  |  5 ++---
 4 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index 0fe85ade77..da60ca13ef 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -81,13 +81,6 @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, int fd, uint64_t offset,
   QEMUIOVector *qiov, int type);
 void luring_detach_aio_context(LuringState *s, AioContext *old_context);
 void luring_attach_aio_context(LuringState *s, AioContext *new_context);
-
-/*
- * luring_io_plug/unplug work in the thread's current AioContext, therefore the
- * caller must ensure that they are paired in the same IOThread.
- */
-void luring_io_plug(void);
-void luring_io_unplug(void);
 #endif
 
 #ifdef _WIN32
diff --git a/block/file-posix.c b/block/file-posix.c
index 0ab158efba..7baa8491dd 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2558,11 +2558,6 @@ static void coroutine_fn raw_co_io_plug(BlockDriverState *bs)
 laio_io_plug();
 }
 #endif
-#ifdef CONFIG_LINUX_IO_URING
-if (s->use_linux_io_uring) {
-luring_io_plug();
-}
-#endif
 }
 
 static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
@@ -2573,11 +2568,6 @@ static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
 laio_io_unplug(s->aio_max_batch);
 }
 #endif
-#ifdef CONFIG_LINUX_IO_URING
-if (s->use_linux_io_uring) {
-luring_io_unplug();
-}
-#endif
 }
 
 static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
diff --git a/block/io_uring.c b/block/io_uring.c
index 3a77480e16..69d9820928 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -16,6 +16,7 @@
 #include "block/raw-aio.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "sysemu/block-backend.h"
 #include "trace.h"
 
 /* Only used for assertions.  */
@@ -41,7 +42,6 @@ typedef struct LuringAIOCB {
 } LuringAIOCB;
 
 typedef struct LuringQueue {
-int plugged;
 unsigned int in_queue;
 unsigned int in_flight;
 bool blocked;
@@ -267,7 +267,7 @@ static void luring_process_completions_and_submit(LuringState *s)
 {
 luring_process_completions(s);
 
-if (!s->io_q.plugged && s->io_q.in_queue > 0) {
+if (s->io_q.in_queue > 0) {
 ioq_submit(s);
 }
 }
@@ -301,29 +301,17 @@ static void qemu_luring_poll_ready(void *opaque)
 static void ioq_init(LuringQueue *io_q)
 {
 QSIMPLEQ_INIT(&io_q->submit_queue);
-io_q->plugged = 0;
 io_q->in_queue = 0;
 io_q->in_flight = 0;
 io_q->blocked = false;
 }
 
-void luring_io_plug(void)
+static void luring_unplug_fn(void *opaque)
 {
-AioContext *ctx = qemu_get_current_aio_context();
-LuringState *s = aio_get_linux_io_uring(ctx);
-trace_luring_io_plug(s);
-s->io_q.plugged++;
-}
-
-void luring_io_unplug(void)
-{
-AioContext *ctx = qemu_get_current_aio_context();
-LuringState *s = aio_get_linux_io_uring(ctx);
-assert(s->io_q.plugged);
-trace_luring_io_unplug(s, s->io_q.blocked, s->io_q.plugged,
-   s->io_q.in_queue, s->io_q.in_flight);
-if (--s->io_q.plugged == 0 &&
-!s->io_q.blocked && s->io_q.in_queue > 0) {
+LuringState *s = opaque;
+trace_luring_unplug_fn(s, s->io_q.blocked, s->io_q.in_queue,
+   s->io_q.in_flight);
+if (!s->io_q.blocked && s->io_q.in_queue > 0) {
 ioq_submit(s);
 }
 }
@@ -370,14 +358,16 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
 
 QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next);
 s->io_q.in_queue++;
-trace_luring_do_submit(s, s->io_q.blocked, s->io_q.plugged,
-   s->io_q.in_queue, s->io_q.in_flight);
-if (!s->io_q.blocked &&
-(!s->io_q.plugged ||
- s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES)) {
-ret = ioq_submit(s);
-trace_luring_do_submit_done(s, ret);
-return ret;
+trace_luring_do_submit(s, s->io_q.blocked, s->io_q.in_queue,
+   s->io_q.in_flight);
+if (!s->io_q.blocked) {
+if (s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES) {
+ret = ioq_submit(s);
+trace_luring_do_submit_done(s, ret);
+return ret;
+}
+
+blk_io_plug_call(luring_unplug_fn, s);
 }
 return 0;
 }
diff --git a/block/trace-events b/block/trace-events
index 048ad27519..6f121b7636

[PULL 3/8] block/blkio: convert to blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Garzarella 
Acked-by: Kevin Wolf 
Message-id: 20230530180959.1108766-4-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 block/blkio.c | 43 ---
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 72117fa005..11be8787a3 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -17,6 +17,7 @@
 #include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
 #include "qemu/module.h"
+#include "sysemu/block-backend.h"
 #include "exec/memory.h" /* for ram_block_discard_disable() */
 
 #include "block/block-io.h"
@@ -320,16 +321,30 @@ static void blkio_detach_aio_context(BlockDriverState *bs)
NULL, NULL, NULL);
 }
 
-/* Call with s->blkio_lock held to submit I/O after enqueuing a new request */
-static void blkio_submit_io(BlockDriverState *bs)
+/*
+ * Called by blk_io_unplug() or immediately if not plugged. Called without
+ * blkio_lock.
+ */
+static void blkio_unplug_fn(void *opaque)
 {
-if (qatomic_read(&bs->io_plugged) == 0) {
-BDRVBlkioState *s = bs->opaque;
+BDRVBlkioState *s = opaque;
 
+WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
 blkioq_do_io(s->blkioq, NULL, 0, 0, NULL);
 }
 }
 
+/*
+ * Schedule I/O submission after enqueuing a new request. Called without
+ * blkio_lock.
+ */
+static void blkio_submit_io(BlockDriverState *bs)
+{
+BDRVBlkioState *s = bs->opaque;
+
+blk_io_plug_call(blkio_unplug_fn, s);
+}
+
 static int coroutine_fn
 blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
@@ -340,9 +355,9 @@ blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 
 WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
 blkioq_discard(s->blkioq, offset, bytes, &cod, 0);
-blkio_submit_io(bs);
 }
 
+blkio_submit_io(bs);
 qemu_coroutine_yield();
 return cod.ret;
 }
@@ -373,9 +388,9 @@ blkio_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
 
 WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
 blkioq_readv(s->blkioq, offset, iov, iovcnt, &cod, 0);
-blkio_submit_io(bs);
 }
 
+blkio_submit_io(bs);
 qemu_coroutine_yield();
 
 if (use_bounce_buffer) {
@@ -418,9 +433,9 @@ static int coroutine_fn blkio_co_pwritev(BlockDriverState *bs, int64_t offset,
 
 WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
 blkioq_writev(s->blkioq, offset, iov, iovcnt, &cod, blkio_flags);
-blkio_submit_io(bs);
 }
 
+blkio_submit_io(bs);
 qemu_coroutine_yield();
 
 if (use_bounce_buffer) {
@@ -439,9 +454,9 @@ static int coroutine_fn blkio_co_flush(BlockDriverState *bs)
 
 WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
 blkioq_flush(s->blkioq, &cod, 0);
-blkio_submit_io(bs);
 }
 
+blkio_submit_io(bs);
 qemu_coroutine_yield();
 return cod.ret;
 }
@@ -467,22 +482,13 @@ static int coroutine_fn blkio_co_pwrite_zeroes(BlockDriverState *bs,
 
 WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
 blkioq_write_zeroes(s->blkioq, offset, bytes, &cod, blkio_flags);
-blkio_submit_io(bs);
 }
 
+blkio_submit_io(bs);
 qemu_coroutine_yield();
 return cod.ret;
 }
 
-static void coroutine_fn blkio_co_io_unplug(BlockDriverState *bs)
-{
-BDRVBlkioState *s = bs->opaque;
-
-WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
-blkio_submit_io(bs);
-}
-}
-
 typedef enum {
 BMRR_OK,
 BMRR_SKIP,
@@ -1004,7 +1010,6 @@ static void blkio_refresh_limits(BlockDriverState *bs, Error **errp)
 .bdrv_co_pwritev = blkio_co_pwritev, \
 .bdrv_co_flush_to_disk   = blkio_co_flush, \
 .bdrv_co_pwrite_zeroes   = blkio_co_pwrite_zeroes, \
-.bdrv_co_io_unplug   = blkio_co_io_unplug, \
 .bdrv_refresh_limits = blkio_refresh_limits, \
 .bdrv_register_buf   = blkio_register_buf, \
 .bdrv_unregister_buf = blkio_unregister_buf, \
-- 
2.40.1

[PULL 0/8] Block patches

2023-06-01 Thread Stefan Hajnoczi

The following changes since commit c6a5fc2ac76c5ab709896ee1b0edd33685a67ed1:

  decodetree: Add --output-null for meson testing (2023-05-31 19:56:42 -0700)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 98b126f5e3228a346c774e569e26689943b401dd:

  qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa (2023-06-01 11:08:21 -0400)


Pull request

- Stefano Garzarella's blkio block driver 'fd' parameter
- My thread-local blk_io_plug() series



Stefan Hajnoczi (6):
  block: add blk_io_plug_call() API
  block/nvme: convert to blk_io_plug_call() API
  block/blkio: convert to blk_io_plug_call() API
  block/io_uring: convert to blk_io_plug_call() API
  block/linux-aio: convert to blk_io_plug_call() API
  block: remove bdrv_co_io_plug() API

Stefano Garzarella (2):
  block/blkio: use qemu_open() to support fd passing for virtio-blk
  qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa

 MAINTAINERS   |   1 +
 qapi/block-core.json  |   6 ++
 meson.build   |   4 +
 include/block/block-io.h  |   3 -
 include/block/block_int-common.h  |  11 ---
 include/block/raw-aio.h   |  14 ---
 include/sysemu/block-backend-io.h |  13 +--
 block/blkio.c |  96 --
 block/block-backend.c |  22 -
 block/file-posix.c|  38 ---
 block/io.c|  37 ---
 block/io_uring.c  |  44 -
 block/linux-aio.c |  41 +++-
 block/nvme.c  |  44 +++--
 block/plug.c  | 159 ++
 hw/block/dataplane/xen-block.c|   8 +-
 hw/block/virtio-blk.c |   4 +-
 hw/scsi/virtio-scsi.c |   6 +-
 block/meson.build |   1 +
 block/trace-events|   6 +-
 20 files changed, 293 insertions(+), 265 deletions(-)
 create mode 100644 block/plug.c

-- 
2.40.1

Re: [PATCH] x86/ucode: Exit early from early_update_cache() if loading not available

2023-06-01 Thread Jan Beulich

On 01.06.2023 16:38, Andrew Cooper wrote:
> If for any reason early_microcode_init() concludes that no microcode loading
> is available, early_update_cache() will fall over a NULL function pointer:
> 
>   (XEN) Xen call trace:
>   (XEN)[] R show_code+0x91/0x18f
>   (XEN)[] F show_execution_state+0x2d/0x1fc
>   (XEN)[] F fatal_trap+0x87/0x19a
>   (XEN)[] F init_idt_traps+0/0x1bd
>   (XEN)[] F early_page_fault+0x8f/0x94
>   (XEN)[<>] F 
>   (XEN)[] F arch/x86/cpu/microcode/core.c#early_update_cache+0x11/0x74
>   (XEN)[] F microcode_init_cache+0x5a/0x5c
>   (XEN)[] F __start_xen+0x1e11/0x27ee
>   (XEN)[] F __high_start+0x94/0xa0
> 
> which is actually parse_blob()'s use of ucode_ops.collect_cpu_info.
> 
> Skip trying to cache anything if microcode loading is unavailable.
> 
> Fixes: dc380df12acf ("x86/ucode: load microcode earlier on boot CPU")
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich

[PATCH v2 3/3] x86/cpu-policy: Derive RSBA/RRSBA for guest policies

2023-06-01 Thread Andrew Cooper

The RSBA bit, "RSB Alternative", means that the RSB may use alternative
predictors when empty.  From a practical point of view, this means "Retpoline
not safe".

Enhanced IBRS (officially IBRS_ALL in Intel's docs, previously IBRS_ATT) is a
statement that IBRS is implemented in hardware (as opposed to the form
retrofitted to existing CPUs in microcode).

The RRSBA bit, "Restricted-RSBA", is a combination of RSBA, and the eIBRS
property that predictions are tagged with the mode in which they were learnt.
Therefore, it means "when eIBRS is active, the RSB may fall back to
alternative predictors but restricted to the current prediction mode".  As
such, it's a stronger statement than RSBA, but still means "Retpoline not safe".

CPUs are not expected to enumerate both RSBA and RRSBA.

Add feature dependencies for EIBRS and RRSBA.  While technically they're not
linked, absolutely nothing good can come of letting the guest see RRSBA without
EIBRS.  Nor can anything good come of a guest seeing EIBRS without IBRSB.
Furthermore, we use this dependency to simplify the max derivation logic.

The max policies get RSBA and RRSBA unconditionally set (with the EIBRS
dependency maybe hiding RRSBA).  We can run any VM, even if it has been told
"somewhere you might run, Retpoline isn't safe".

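A toy model of the max-policy rule just described: both RSBA and RRSBA are set unconditionally, and the feature dependencies (RRSBA requires EIBRS, EIBRS requires IBRSB) then clear anything whose prerequisite is absent. The bit layout and the names `apply_deps()` and `max_policy()` are invented for this sketch and do not match Xen's featureset or the tables generated by `gen-cpuid.py`.

```c
#include <stdint.h>

/* Hypothetical bit assignments for this sketch only. */
enum {
    F_IBRSB = 1u << 0,
    F_EIBRS = 1u << 1,
    F_RSBA  = 1u << 2,
    F_RRSBA = 1u << 3,
};

/* Clear features whose prerequisite is missing, parents before children. */
static uint32_t apply_deps(uint32_t fs)
{
    if ( !(fs & F_IBRSB) )
        fs &= ~F_EIBRS;             /* EIBRS depends on IBRSB */
    if ( !(fs & F_EIBRS) )
        fs &= ~F_RRSBA;             /* RRSBA depends on EIBRS */
    return fs;
}

/* Max policy: advertise both bits, then let the dependencies prune them. */
static uint32_t max_policy(uint32_t host_fs)
{
    return apply_deps(host_fs | F_RSBA | F_RRSBA);
}
```

Because the dependency pass runs last, the max derivation never needs its own "only set RRSBA if EIBRS is visible" check.
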
The default policies are more complicated.  A guest shouldn't see both bits,
but it needs to see one if the current host suffers from any form of RSBA, and
which bit it needs to see depends on whether eIBRS is visible or not.
Therefore, the calculation must be performed after sanitise_featureset().
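
The default-policy rule can be modelled the same way. Only when the host suffers from some form of RSBA and the guest can see MSR_ARCH_CAPS does the guest get exactly one bit: RRSBA if eIBRS is visible, RSBA otherwise. This mirrors the `if` block the patch adds after `sanitise_featureset()`. The standalone bit layout and `default_rsba_bits()` are hypothetical, and clearing both bits first is an assumption of the sketch, made so that the "never both" property is explicit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments for this sketch only. */
enum {
    G_ARCH_CAPS = 1u << 0,
    G_EIBRS     = 1u << 1,
    G_RSBA      = 1u << 2,
    G_RRSBA     = 1u << 3,
};

/* host_rsba_any models "cpu_has_rsba || cpu_has_rrsba" on the host. */
static uint32_t default_rsba_bits(uint32_t guest_fs, bool host_rsba_any)
{
    guest_fs &= ~(G_RSBA | G_RRSBA);    /* sketch assumption: start clean */

    if ( (guest_fs & G_ARCH_CAPS) && host_rsba_any )
        guest_fs |= (guest_fs & G_EIBRS) ? G_RRSBA : G_RSBA;

    return guest_fs;
}
```
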

Finally, apply the same logic in recalculate_cpuid_policy(), as we do for
other safety settings while we're still overhauling the toolstack logic in
this area.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v2:
 * Expand/adjust the comment for the max features.
 * Rewrite the default feature derivation in light of new information.
 * Fix up in recalculate_cpuid_policy() too.
---
 xen/arch/x86/cpu-policy.c   | 53 +
 xen/include/public/arch-x86/cpufeatureset.h |  4 +-
 xen/tools/gen-cpuid.py  |  5 +-
 3 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index ee256ff5a137..f3bcb1ea4101 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -423,8 +423,17 @@ static void __init guest_common_max_feature_adjustments(uint32_t *fs)
  * Retpoline not safe)", so these need to be visible to a guest in all
  * cases, even when it's only some other server in the pool which
  * suffers the identified behaviour.
+ *
+ * We can always run any VM which has previously (or will
+ * subsequently) run on hardware where Retpoline is not safe.
+ * Note:
+ *  - The dependency logic may hide RRSBA for other reasons.
+ *  - The max policy does not constitute a sensible configuration to
+ *run a guest in.
  */
 __set_bit(X86_FEATURE_ARCH_CAPS, fs);
+__set_bit(X86_FEATURE_RSBA, fs);
+__set_bit(X86_FEATURE_RRSBA, fs);
 }
 }
 
@@ -532,6 +541,21 @@ static void __init calculate_pv_def_policy(void)
 guest_common_default_feature_adjustments(fs);
 
 sanitise_featureset(fs);
+
+/*
+ * If the host suffers from RSBA of any form, and the guest can see
+ * MSR_ARCH_CAPS, reflect the appropriate RSBA/RRSBA property to the guest
+ * depending on the visibility of eIBRS.
+ */
+if ( test_bit(X86_FEATURE_ARCH_CAPS, fs) &&
+ (cpu_has_rsba || cpu_has_rrsba) )
+{
+bool eibrs = test_bit(X86_FEATURE_EIBRS, fs);
+
+__set_bit(eibrs ? X86_FEATURE_RRSBA
+: X86_FEATURE_RSBA, fs);
+}
+
 x86_cpu_featureset_to_policy(fs, p);
 recalculate_xstate(p);
 }
@@ -664,6 +688,21 @@ static void __init calculate_hvm_def_policy(void)
 __set_bit(X86_FEATURE_VIRT_SSBD, fs);
 
 sanitise_featureset(fs);
+
+/*
+ * If the host suffers from RSBA of any form, and the guest can see
+ * MSR_ARCH_CAPS, reflect the appropriate RSBA/RRSBA property to the guest
+ * depending on the visibility of eIBRS.
+ */
+if ( test_bit(X86_FEATURE_ARCH_CAPS, fs) &&
+ (cpu_has_rsba || cpu_has_rrsba) )
+{
+bool eibrs = test_bit(X86_FEATURE_EIBRS, fs);
+
+__set_bit(eibrs ? X86_FEATURE_RRSBA
+: X86_FEATURE_RSBA, fs);
+}
+
 x86_cpu_featureset_to_policy(fs, p);
 recalculate_xstate(p);
 }
@@ -786,6 +825,20 @@ void recalculate_cpuid_policy(struct domain *d)
 
 sanitise_featureset(fs);
 
+/*
+ * If the host suffers from RSBA of any form, and the guest can see
+ * MSR_ARCH_CAPS, reflect the appropriate RSBA/RRSBA property to the guest
+ * depending on the visibility of eIBRS.
+ */
+if ( test_bit(X86_FEATURE_ARCH_CAPS, fs) &&
+ (cpu_h

[PATCH v2 2/3] x86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate

2023-06-01 Thread Andrew Cooper

In order to level a VM safely for migration, the toolstack needs to know the
RSBA/RRSBA properties of the CPU, whether or not they happen to be enumerated.

See the code comment for details.
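
The state table in the code comment of `retpoline_calculations()` reduces to a small amount of logic, modelled below with plain booleans. `is_broken()` flags the same rows as the patch's `FIRMWARE BUG` check (Rows 2 and 6: RRSBA without eIBRS; Rows 7 and 8: RSBA with eIBRS), and `fixup()` applies only the Row 1 and Row 3 fix-ups the comment calls for. The code for Row 1 ("Maybe +RSBA") is cut off in this archive, so the sketch takes that per-model decision as an input. `struct rsba_state`, `is_broken()`, `fixup()` and `unsafe_model` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical model of the three enumerated bits. */
struct rsba_state {
    bool rsba, eibrs, rrsba;
};

/* Rows 2 and 6 (RRSBA without eIBRS) and Rows 7 and 8 (RSBA with eIBRS). */
static bool is_broken(struct rsba_state s)
{
    return s.eibrs ? s.rsba : s.rrsba;
}

/*
 * Fix up Rows 1 and 3 only, so that "rsba || rrsba" means "alternative
 * predictors potentially in use".  unsafe_model stands in for the
 * per-model decision (an assumption of this sketch).
 */
static struct rsba_state fixup(struct rsba_state s, bool unsafe_model)
{
    if ( s.eibrs )
        s.rrsba = true;                 /* Row 3 becomes Row 4 */
    else if ( !s.rsba && !s.rrsba && unsafe_model )
        s.rsba = true;                  /* Row 1, "Maybe +RSBA" */
    return s;
}
```
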

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v2:
 * Rewrite almost from scratch.
---
 xen/arch/x86/include/asm/cpufeature.h |  1 +
 xen/arch/x86/spec_ctrl.c  | 92 +--
 2 files changed, 88 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h
index ace31e3b1f1a..e2cb8f3cc728 100644
--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -193,6 +193,7 @@ static inline bool boot_cpu_has(unsigned int feat)
 #define cpu_has_tsx_ctrl        boot_cpu_has(X86_FEATURE_TSX_CTRL)
 #define cpu_has_taa_no          boot_cpu_has(X86_FEATURE_TAA_NO)
 #define cpu_has_fb_clear        boot_cpu_has(X86_FEATURE_FB_CLEAR)
+#define cpu_has_rrsba           boot_cpu_has(X86_FEATURE_RRSBA)

 /* Synthesized. */
 #define cpu_has_arch_perfmon    boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index daee61900afa..29ed410da47a 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -579,7 +579,10 @@ static bool __init check_smt_enabled(void)
 return false;
 }
 
-/* Calculate whether Retpoline is known-safe on this CPU. */
+/*
+ * Calculate whether Retpoline is known-safe on this CPU.  Fix up the
+ * RSBA/RRSBA bits as necessary.
+ */
 static bool __init retpoline_calculations(void)
 {
 unsigned int ucode_rev = this_cpu(cpu_sig).rev;
@@ -593,15 +596,85 @@ static bool __init retpoline_calculations(void)
 return false;
 
 /*
- * RSBA may be set by a hypervisor to indicate that we may move to a
- * processor which isn't retpoline-safe.
+ * The meaning of the RSBA and RRSBA bits has evolved over time.  The
+ * agreed upon meaning at the time of writing (May 2023) is thus:
+ *
+ * - RSBA (RSB Alternative) means that an RSB may fall back to an
+ *   alternative predictor on underflow.  Skylake uarch and later all have
+ *   this property.  Broadwell too, when running microcode versions prior
+ *   to Jan 2018.
+ *
+ * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
+ *   tagging of predictions with the mode in which they were learned.  So
+ *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
+ *
+ * - CPUs are not expected to enumerate both RSBA and RRSBA.
+ *
+ * Some parts (Broadwell) are not expected to ever enumerate this
+ * behaviour directly.  Other parts have differing enumeration with
+ * microcode version.  Fix up Xen's idea, so we can advertise them safely
+ * to guests, and so toolstacks can level a VM safely for migration.
+ *
+ * The following states exist:
+ *
+ * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
+ * |---+------+-------+-------+--------------------+---------------|
+ * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
+ * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
+ * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
+ * | 4 |    0 |     1 |     1 | OK                 |               |
+ * | 5 |    1 |     0 |     0 | OK                 |               |
+ * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
+ * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
+ * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |
  *
+ * However, we don't need perfect adherence to the spec.  Identify the
+ * broken cases (so we stand a chance of spotting violated assumptions),
+ * and fix up Rows 1 and 3 so Xen can use RSBA || RRSBA to identify
+ * "alternative predictors potentially in use".
+ */
+if ( cpu_has_eibrs ? cpu_has_rsba  /* Rows 7, 8 */
+   : cpu_has_rrsba /* Rows 2, 6 */ )
+printk(XENLOG_ERR
+   "FIRMWARE BUG: CPU %02x-%02x-%02x, ucode 0x%08x: RSBA %u, EIBRS %u, RRSBA %u\n",
+   boot_cpu_data.x86, boot_cpu_data.x86_model,
+   boot_cpu_data.x86_mask, ucode_rev,
+   cpu_has_rsba, cpu_has_eibrs, cpu_has_rrsba);
+
+/*
  * Processors offering Enhanced IBRS are not guarenteed to be
  * repoline-safe.
  */
-if ( cpu_has_rsba || cpu_has_eibrs )
+if ( cpu_has_eibrs )
+{
+/*
+ * Prior to the August 2023 microcode, many eIBRS-capable parts did
+ * not enumerate RRSBA.
+ */
+if ( !cpu_has_rrsba )
+setup_force_cpu_cap(X86_FEATURE_RRSBA);
+
+return false;
+}
+
+/*
+ * RSBA is explicitly enumerated in some cases, but may also be set by a
+ * hypervisor to indicate that we may move to a

[PATCH v2 0/3] x86: RSBA and RRSBA handling

2023-06-01 Thread Andrew Cooper

This series deals with the handling of the RSBA and RRSBA bits across all
parts and all mistakes encountered in various microcode versions.

There are substantial changes from v1, following a clarification from Intel.
Importantly, CPUs are not expected to enumerate both RSBA and RRSBA, so VMs
should not be shown both bits either.

Andrew Cooper (3):
  x86/spec-ctrl: Rename retpoline_safe() to retpoline_calculations()
  x86/spec-ctrl: Fix up the RSBA/RRSBA bits as appropriate
  x86/cpu-policy: Derive RSBA/RRSBA for guest policies

 xen/arch/x86/cpu-policy.c   |  53 
 xen/arch/x86/include/asm/cpufeature.h   |   1 +
 xen/arch/x86/spec_ctrl.c| 131 +---
 xen/include/public/arch-x86/cpufeatureset.h |   4 +-
 xen/tools/gen-cpuid.py  |   5 +-
 5 files changed, 172 insertions(+), 22 deletions(-)

-- 
2.30.2

[PATCH v2 1/3] x86/spec-ctrl: Rename retpoline_safe() to retpoline_calculations()

2023-06-01 Thread Andrew Cooper

This is prep work, split out to simplify the diff on the following change.

 * Rename to retpoline_calculations(), and call unconditionally.  It is
   shortly going to synthesise missing enumerations required for guest safety.
 * For the model check switch statement, store the result in a variable and
   break rather than returning directly.

No functional change.
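
The Broadwell D (model 0x56) case is essentially a lookup of a minimum microcode revision per stepping, and the single `safe` result variable introduced here makes a table-driven form easy to picture. The sketch below uses the thresholds from the patch; `struct ucode_min` and `broadwell_d_retpoline_safe()` are hypothetical names, and Xen itself keeps the nested `switch`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimum microcode revision per Broadwell D stepping, as listed in the patch. */
struct ucode_min {
    unsigned int stepping;
    unsigned int min_rev;
};

static const struct ucode_min broadwell_d[] = {
    { 2, 0x15 }, { 3, 0x712 }, { 4, 0xf11 }, { 5, 0xe09 },
};

/* Unknown steppings are treated as not retpoline-safe, like the default case. */
static bool broadwell_d_retpoline_safe(unsigned int stepping,
                                       unsigned int ucode_rev)
{
    bool safe = false;

    for ( size_t i = 0; i < sizeof(broadwell_d) / sizeof(broadwell_d[0]); i++ )
    {
        if ( broadwell_d[i].stepping == stepping )
        {
            safe = ucode_rev >= broadwell_d[i].min_rev;
            break;
        }
    }

    return safe;
}
```
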

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v2:
 * Extend the 'safe' variable to the entire switch statement.
---
 xen/arch/x86/spec_ctrl.c | 41 +---
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index cd5ea6aa52d9..daee61900afa 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -580,9 +580,10 @@ static bool __init check_smt_enabled(void)
 }
 
 /* Calculate whether Retpoline is known-safe on this CPU. */
-static bool __init retpoline_safe(void)
+static bool __init retpoline_calculations(void)
 {
 unsigned int ucode_rev = this_cpu(cpu_sig).rev;
+bool safe = false;
 
 if ( boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) )
 return true;
@@ -620,29 +621,31 @@ static bool __init retpoline_safe(void)
 case 0x3f: /* Haswell EX/EP */
 case 0x45: /* Haswell D */
 case 0x46: /* Haswell H */
-return true;
+safe = true;
+break;
 
 /*
  * Broadwell processors are retpoline-safe after specific microcode
  * versions.
  */
 case 0x3d: /* Broadwell */
-return ucode_rev >= 0x2a;
+safe = ucode_rev >= 0x2a;  break;
 case 0x47: /* Broadwell H */
-return ucode_rev >= 0x1d;
+safe = ucode_rev >= 0x1d;  break;
 case 0x4f: /* Broadwell EP/EX */
-return ucode_rev >= 0xb21;
+safe = ucode_rev >= 0xb21; break;
 case 0x56: /* Broadwell D */
 switch ( boot_cpu_data.x86_mask )
 {
-case 2:  return ucode_rev >= 0x15;
-case 3:  return ucode_rev >= 0x712;
-case 4:  return ucode_rev >= 0xf11;
-case 5:  return ucode_rev >= 0xe09;
+case 2:  safe = ucode_rev >= 0x15;  break;
+case 3:  safe = ucode_rev >= 0x712; break;
+case 4:  safe = ucode_rev >= 0xf11; break;
+case 5:  safe = ucode_rev >= 0xe09; break;
 default:
printk("Unrecognised CPU stepping %#x - assuming not reptpoline safe\n",
boot_cpu_data.x86_mask);
-return false;
+safe = false;
+break;
 }
 break;
 
@@ -656,7 +659,8 @@ static bool __init retpoline_safe(void)
 case 0x67: /* Cannonlake? */
 case 0x8e: /* Kabylake M */
 case 0x9e: /* Kabylake D */
-return false;
+safe = false;
+break;
 
 /*
  * Atom processors before Goldmont Plus/Gemini Lake are retpoline-safe.
@@ -675,13 +679,17 @@ static bool __init retpoline_safe(void)
 case 0x5c: /* Goldmont */
 case 0x5f: /* Denverton */
 case 0x85: /* Knights Mill */
-return true;
+safe = true;
+break;
 
 default:
 printk("Unrecognised CPU model %#x - assuming not reptpoline safe\n",
boot_cpu_data.x86_model);
-return false;
+safe = false;
+break;
 }
+
+return safe;
 }
 
 /*
@@ -1114,7 +1122,7 @@ void __init init_speculation_mitigations(void)
 {
 enum ind_thunk thunk = THUNK_DEFAULT;
 bool has_spec_ctrl, ibrs = false, hw_smt_enabled;
-bool cpu_has_bug_taa;
+bool cpu_has_bug_taa, retpoline_safe;
 
 hw_smt_enabled = check_smt_enabled();
 
@@ -1140,6 +1148,9 @@ void __init init_speculation_mitigations(void)
 thunk = THUNK_JMP;
 }
 
+/* Determine if retpoline is safe on this CPU. */
+retpoline_safe = retpoline_calculations();
+
 /*
  * Has the user specified any custom BTI mitigations?  If so, follow their
  * instructions exactly and disable all heuristics.
@@ -1161,7 +1172,7 @@ void __init init_speculation_mitigations(void)
  * On all hardware, we'd like to use retpoline in preference to
  * IBRS, but only if it is safe on this hardware.
  */
-if ( retpoline_safe() )
+if ( retpoline_safe )
 thunk = THUNK_RETPOLINE;
 else if ( has_spec_ctrl )
 ibrs = true;
-- 
2.30.2

[PATCH] x86/ucode: Exit early from early_update_cache() if loading not available

2023-06-01 Thread Andrew Cooper

If for any reason early_microcode_init() concludes that no microcode loading
is available, early_update_cache() will fall over a NULL function pointer:

  (XEN) Xen call trace:
  (XEN)[] R show_code+0x91/0x18f
  (XEN)[] F show_execution_state+0x2d/0x1fc
  (XEN)[] F fatal_trap+0x87/0x19a
  (XEN)[] F init_idt_traps+0/0x1bd
  (XEN)[] F early_page_fault+0x8f/0x94
  (XEN)[<>] F 
  (XEN)[] F arch/x86/cpu/microcode/core.c#early_update_cache+0x11/0x74
  (XEN)[] F microcode_init_cache+0x5a/0x5c
  (XEN)[] F __start_xen+0x1e11/0x27ee
  (XEN)[] F __high_start+0x94/0xa0

which is actually parse_blob()'s use of ucode_ops.collect_cpu_info.

Skip trying to cache anything if microcode loading is unavailable.
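
The failure is a call through a hook in an ops table that was never filled in. The general pattern the fix applies, checking a representative hook before doing any work that depends on the table, is sketched below. Note that the patch tests `ucode_ops.apply_microcode`, while the crash itself came from `collect_cpu_info`; the sketch likewise assumes a driver that provides one hook provides both. `struct sketch_ucode_ops` and `sketch_init_cache()` are hypothetical names.

```c
#include <errno.h>

/* Hypothetical ops table; a vendor driver fills in the hooks it supports. */
struct sketch_ucode_ops {
    void (*collect_cpu_info)(void);
    int (*apply_microcode)(const void *patch);
};

static struct sketch_ucode_ops ops; /* stays zeroed when loading is unavailable */

static int sketch_init_cache(void)
{
    /* Bail out before anything dereferences a (possibly NULL) hook. */
    if ( !ops.apply_microcode )
        return -ENODEV;

    ops.collect_cpu_info();         /* assumed set whenever apply_microcode is */
    return 0;
}

/* Stand-in hooks used to exercise the sketch. */
static int collected;
static void fake_collect(void) { collected++; }
static int fake_apply(const void *patch) { (void)patch; return 0; }
```
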

Fixes: dc380df12acf ("x86/ucode: load microcode earlier on boot CPU")
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Alejandro Vallejo 

Found while doing something unrelated, but this is going to interact poorly
with MCU_CONTROL_DIS_MCU_LOAD.
---
 xen/arch/x86/cpu/microcode/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
index 5a5c0a8c70db..9029301107d6 100644
--- a/xen/arch/x86/cpu/microcode/core.c
+++ b/xen/arch/x86/cpu/microcode/core.c
@@ -789,6 +789,9 @@ int __init microcode_init_cache(unsigned long *module_map,
 {
 int rc = 0;
 
+if ( !ucode_ops.apply_microcode )
+return -ENODEV;
+
 if ( ucode_scan )
 /* Need to rescan the modules because they might have been relocated */
 microcode_scan_module(module_map, mbi);

base-commit: 59d0bf62861f5c9b317ccf89f8b5c8b4d19927ad
prerequisite-patch-id: c3f6ae7def85b63808449493e3b5185bc40c405d
prerequisite-patch-id: 59a20dfb4778c62bf512f746e36b1bea0949b0a8
prerequisite-patch-id: a70c8dd42245affe402b08cacd5872b5a32a6d69
prerequisite-patch-id: 3efc26e008858670286c173f77f8ec34ddfd9df1
prerequisite-patch-id: 5f6f7dd6029f401d13bbb87ac3bb88c15700
prerequisite-patch-id: 4133b7d49c978a89042e95f899f46c4ec4ac4498
prerequisite-patch-id: d2d3a24a650f6b1b50e279be158cdd097eb43a4b
prerequisite-patch-id: 358299b6b56983e3c069ea1f30e7cf214b0a2c54
prerequisite-patch-id: b17530cf5672ada3e7792606b7a3bef55c8aa372
prerequisite-patch-id: e9bc40cc80e61b24d90eeb7097cd9b703f0170a6
-- 
2.30.2

Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-06-01 Thread Juergen Gross


On 01.06.23 16:33, Borislav Petkov wrote:
> On Thu, Jun 01, 2023 at 03:22:33PM +0200, Borislav Petkov wrote:
>> Now lemme restart testing.
>
> This is from another box, with the latest changes incorporated:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc1-mtrr
>
> --- proc-mtrr.before 2011-03-04 01:03:35.243994733 +0100
> +++ proc-mtrr.after 2023-06-01 16:28:54.95456 +0200
> @@ -1,3 +1,3 @@
>  reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
>  reg01: base=0x08000 ( 2048MB), size= 1024MB, count=1: write-back
> -reg02: base=0x0c000 ( 3072MB), size=  256MB, count=1: write-back
> +reg02: base=0x0c000 ( 3072MB), size=  128MB, count=1: write-back
>
> Want mtrr=debug output again?

Yes, please

Juergen


Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-06-01 Thread Borislav Petkov

On Thu, Jun 01, 2023 at 03:22:33PM +0200, Borislav Petkov wrote:
> Now lemme restart testing.

This is from another box, with the latest changes incorporated:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc1-mtrr

--- proc-mtrr.before 2011-03-04 01:03:35.243994733 +0100
+++ proc-mtrr.after 2023-06-01 16:28:54.95456 +0200
@@ -1,3 +1,3 @@
 reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
 reg01: base=0x08000 ( 2048MB), size= 1024MB, count=1: write-back
-reg02: base=0x0c000 ( 3072MB), size=  256MB, count=1: write-back
+reg02: base=0x0c000 ( 3072MB), size=  128MB, count=1: write-back

Want mtrr=debug output again?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

[PATCH v1 5/7] xenalyze: sync with vmx.h, use EXIT_REASON_VMXON

2023-06-01 Thread Olaf Hering

Signed-off-by: Olaf Hering 
---
 tools/xentrace/xenalyze.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index d2e6c77590..88c3d5f873 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -467,7 +467,7 @@ struct {
 #define EXIT_REASON_VMRESUME24
 #define EXIT_REASON_VMWRITE 25
 #define EXIT_REASON_VMXOFF  26
-#define EXIT_REASON_VMON27
+#define EXIT_REASON_VMXON   27
 #define EXIT_REASON_CR_ACCESS   28
 #define EXIT_REASON_DR_ACCESS   29
 #define EXIT_REASON_IO_INSTRUCTION  30
@@ -523,7 +523,7 @@ const char * hvm_vmx_exit_reason_name[HVM_VMX_EXIT_REASON_MAX] = {
 [EXIT_REASON_VMRESUME]="VMRESUME",
 [EXIT_REASON_VMWRITE]="VMWRITE",
 [EXIT_REASON_VMXOFF]="VMXOFF",
-[EXIT_REASON_VMON]="VMON",
+[EXIT_REASON_VMXON]="VMXON",
 [EXIT_REASON_CR_ACCESS]="CR_ACCESS",
 [EXIT_REASON_DR_ACCESS]="DR_ACCESS",
 [EXIT_REASON_IO_INSTRUCTION]="IO_INSTRUCTION",

[PATCH v1 7/7] xenalyze: handle more potential exit reason values from vmx.h

2023-06-01 Thread Olaf Hering

Copy and use more constants from vmx.h, to turn numbers into strings.
Adjust the REASON_MAX value accordingly.
Remove the explicit size from the string array; the compiler will size it
to fit the highest designated initializer.

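Dropping the explicit size means the array length follows the highest designated initializer, and reason numbers with no entry (for example 57, 60 and 61 in this list) are implicitly NULL. Code that indexes such an array therefore needs both a bounds check and a NULL check. A minimal sketch, using an abbreviated table and a hypothetical `exit_reason_str()` helper rather than xenalyze's own lookup code:

```c
#include <stddef.h>
#include <stdio.h>

#define EXIT_REASON_XSETBV      55
#define EXIT_REASON_APIC_WRITE  56
#define EXIT_REASON_INVPCID     58
#define EXIT_REASON_NOTIFY      75

/* Abbreviated table: its length is implied by the largest index (75 -> 76). */
static const char *const reason_name[] = {
    [EXIT_REASON_XSETBV]     = "XSETBV",
    [EXIT_REASON_APIC_WRITE] = "APIC_WRITE",
    [EXIT_REASON_INVPCID]    = "INVPCID",
    [EXIT_REASON_NOTIFY]     = "NOTIFY",
};

#define REASON_COUNT (sizeof(reason_name) / sizeof(reason_name[0]))

/* Return a printable name, formatting unknown reasons into the caller's buffer. */
static const char *exit_reason_str(unsigned int reason, char *buf, size_t len)
{
    if ( reason < REASON_COUNT && reason_name[reason] )
        return reason_name[reason];

    snprintf(buf, len, "UNKNOWN(%u)", reason);
    return buf;
}
```
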
Signed-off-by: Olaf Hering 
---
 tools/xentrace/xenalyze.c | 28 
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index 9635ff453a..9af17d45bf 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -482,6 +482,7 @@ struct {
 #define EXIT_REASON_MCE_DURING_VMENTRY  41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS 44
+#define EXIT_REASON_EOI_INDUCED 45
 #define EXIT_REASON_ACCESS_GDTR_OR_IDTR 46
 #define EXIT_REASON_ACCESS_LDTR_OR_TR   47
 #define EXIT_REASON_EPT_VIOLATION   48
@@ -492,10 +493,18 @@ struct {
 #define EXIT_REASON_INVVPID 53
 #define EXIT_REASON_WBINVD  54
 #define EXIT_REASON_XSETBV  55
-
-#define HVM_VMX_EXIT_REASON_MAX (EXIT_REASON_XSETBV+1)
-
-const char * hvm_vmx_exit_reason_name[HVM_VMX_EXIT_REASON_MAX] = {
+#define EXIT_REASON_APIC_WRITE  56
+#define EXIT_REASON_INVPCID 58
+#define EXIT_REASON_VMFUNC  59
+#define EXIT_REASON_PML_FULL62
+#define EXIT_REASON_XSAVES  63
+#define EXIT_REASON_XRSTORS 64
+#define EXIT_REASON_BUS_LOCK74
+#define EXIT_REASON_NOTIFY  75
+
+#define HVM_VMX_EXIT_REASON_MAX (EXIT_REASON_NOTIFY+1)
+
+const char * hvm_vmx_exit_reason_name[] = {
 [EXIT_REASON_EXCEPTION_NMI]="EXCEPTION_NMI",
 [EXIT_REASON_EXTERNAL_INTERRUPT]="EXTERNAL_INTERRUPT",
 [EXIT_REASON_TRIPLE_FAULT]="TRIPLE_FAULT",
@@ -538,6 +547,9 @@ const char * hvm_vmx_exit_reason_name[HVM_VMX_EXIT_REASON_MAX] = {
 [EXIT_REASON_MCE_DURING_VMENTRY]="MCE_DURING_VMENTRY",
 [EXIT_REASON_TPR_BELOW_THRESHOLD]="TPR_BELOW_THRESHOLD",
 [EXIT_REASON_APIC_ACCESS]="APIC_ACCESS",
+[EXIT_REASON_EOI_INDUCED]="EOI_INDUCED",
+[EXIT_REASON_ACCESS_GDTR_OR_IDTR]="ACCESS_GDTR_OR_IDTR",
+[EXIT_REASON_ACCESS_LDTR_OR_TR]="ACCESS_LDTR_OR_TR",
 [EXIT_REASON_EPT_VIOLATION]="EPT_VIOLATION",
 [EXIT_REASON_EPT_MISCONFIG]="EPT_MISCONFIG",
 [EXIT_REASON_INVEPT]="INVEPT",
@@ -546,6 +558,14 @@ const char * hvm_vmx_exit_reason_name[HVM_VMX_EXIT_REASON_MAX] = {
 [EXIT_REASON_INVVPID]="INVVPID",
 [EXIT_REASON_WBINVD]="WBINVD",
 [EXIT_REASON_XSETBV]="XSETBV",
+[EXIT_REASON_APIC_WRITE]="APIC_WRITE",
+[EXIT_REASON_INVPCID]="INVPCID",
+[EXIT_REASON_VMFUNC]="VMFUNC",
+[EXIT_REASON_PML_FULL]="PML_FULL",
+[EXIT_REASON_XSAVES]="XSAVES",
+[EXIT_REASON_XRSTORS]="XRSTORS",
+[EXIT_REASON_BUS_LOCK]="BUS_LOCK",
+[EXIT_REASON_NOTIFY]="NOTIFY",
 };
 
 /* SVM data */

[PATCH v1 6/7] xenalyze: sync with vmx.h, use EXIT_REASON_MCE_DURING_VMENTRY

2023-06-01 Thread Olaf Hering

Signed-off-by: Olaf Hering 
---
 tools/xentrace/xenalyze.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index 88c3d5f873..9635ff453a 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -479,7 +479,7 @@ struct {
 #define EXIT_REASON_MONITOR_TRAP_FLAG   37
 #define EXIT_REASON_MONITOR_INSTRUCTION 39
 #define EXIT_REASON_PAUSE_INSTRUCTION   40
-#define EXIT_REASON_MACHINE_CHECK   41
+#define EXIT_REASON_MCE_DURING_VMENTRY  41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS 44
 #define EXIT_REASON_ACCESS_GDTR_OR_IDTR 46
@@ -535,7 +535,7 @@ const char * hvm_vmx_exit_reason_name[HVM_VMX_EXIT_REASON_MAX] = {
 [EXIT_REASON_MONITOR_TRAP_FLAG]="MONITOR_TRAP_FLAG",
 [EXIT_REASON_MONITOR_INSTRUCTION]="MONITOR_INSTRUCTION",
 [EXIT_REASON_PAUSE_INSTRUCTION]="PAUSE_INSTRUCTION",
-[EXIT_REASON_MACHINE_CHECK]="MACHINE_CHECK",
+[EXIT_REASON_MCE_DURING_VMENTRY]="MCE_DURING_VMENTRY",
 [EXIT_REASON_TPR_BELOW_THRESHOLD]="TPR_BELOW_THRESHOLD",
 [EXIT_REASON_APIC_ACCESS]="APIC_ACCESS",
 [EXIT_REASON_EPT_VIOLATION]="EPT_VIOLATION",

[PATCH v1 1/7] xentrace: remove unimplemented option from man page

2023-06-01 Thread Olaf Hering

The documented option --usage only appeared to work, because every
unknown option showed the help.

Signed-off-by: Olaf Hering 
---
 docs/man/xentrace.8.pod | 4 
 1 file changed, 4 deletions(-)

diff --git a/docs/man/xentrace.8.pod b/docs/man/xentrace.8.pod
index 69aef05f65..4c174a84c0 100644
--- a/docs/man/xentrace.8.pod
+++ b/docs/man/xentrace.8.pod
@@ -69,10 +69,6 @@ set event capture mask. If not specified the TRC_ALL will be used.
 
 =item B<-?>, B<--help>
 
-Give this help list
-
-=item B<--usage>
-
 Give a short usage message
 
 =item B<-V>, B<--version>

[PATCH v1 4/7] xenalyze: sync with vmx.h, use EXIT_REASON_VMXOFF

2023-06-01 Thread Olaf Hering

Signed-off-by: Olaf Hering 
---
 tools/xentrace/xenalyze.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index 1cae055ef4..d2e6c77590 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -466,7 +466,7 @@ struct {
 #define EXIT_REASON_VMREAD  23
 #define EXIT_REASON_VMRESUME 24
 #define EXIT_REASON_VMWRITE 25
-#define EXIT_REASON_VMOFF   26
+#define EXIT_REASON_VMXOFF  26
 #define EXIT_REASON_VMON 27
 #define EXIT_REASON_CR_ACCESS   28
 #define EXIT_REASON_DR_ACCESS   29
@@ -522,7 +522,7 @@ const char * hvm_vmx_exit_reason_name[HVM_VMX_EXIT_REASON_MAX] = {
 [EXIT_REASON_VMREAD]="VMREAD",
 [EXIT_REASON_VMRESUME]="VMRESUME",
 [EXIT_REASON_VMWRITE]="VMWRITE",
-[EXIT_REASON_VMOFF]="VMOFF",
+[EXIT_REASON_VMXOFF]="VMXOFF",
 [EXIT_REASON_VMON]="VMON",
 [EXIT_REASON_CR_ACCESS]="CR_ACCESS",
 [EXIT_REASON_DR_ACCESS]="DR_ACCESS",

[PATCH v1 3/7] xenalyze: sync with vmx.h, use EXIT_REASON_PENDING_VIRT_INTR

2023-06-01 Thread Olaf Hering

Signed-off-by: Olaf Hering 
---
 tools/xentrace/xenalyze.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index a50538e9a8..1cae055ef4 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -447,7 +447,7 @@ struct {
 #define EXIT_REASON_SIPI 4
 #define EXIT_REASON_IO_SMI  5
 #define EXIT_REASON_OTHER_SMI   6
-#define EXIT_REASON_PENDING_INTERRUPT   7
+#define EXIT_REASON_PENDING_VIRT_INTR   7
 #define EXIT_REASON_PENDING_VIRT_NMI 8
 #define EXIT_REASON_TASK_SWITCH 9
 #define EXIT_REASON_CPUID   10
@@ -503,7 +503,7 @@ const char * hvm_vmx_exit_reason_name[HVM_VMX_EXIT_REASON_MAX] = {
 [EXIT_REASON_SIPI]="SIPI",
 [EXIT_REASON_IO_SMI]="IO_SMI",
 [EXIT_REASON_OTHER_SMI]="OTHER_SMI",
-[EXIT_REASON_PENDING_INTERRUPT]="PENDING_INTERRUPT",
+[EXIT_REASON_PENDING_VIRT_INTR]="PENDING_VIRT_INTR",
 [EXIT_REASON_PENDING_VIRT_NMI]="PENDING_VIRT_NMI",
 [EXIT_REASON_TASK_SWITCH]="TASK_SWITCH",
 [EXIT_REASON_CPUID]="CPUID",
@@ -4632,7 +4632,7 @@ void hvm_generic_postprocess(struct hvm_data *h)
 switch(h->exit_reason)
 {
 /* These just need us to go through the return path */
-case EXIT_REASON_PENDING_INTERRUPT:
+case EXIT_REASON_PENDING_VIRT_INTR:
 case EXIT_REASON_TPR_BELOW_THRESHOLD:
 /* Not much to log now; may need later */
 case EXIT_REASON_WBINVD:

[PATCH v1 2/7] xentrace: use correct output format for pit and rtc

2023-06-01 Thread Olaf Hering

The input values were always 32-bit, so print them with 8 hex digits
instead of 16.

Fixes: 55ee5dea32 ("xentrace: add TRC_HVM_EMUL")

Signed-off-by: Olaf Hering 
---
 tools/xentrace/formats | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/xentrace/formats b/tools/xentrace/formats
index 0fcc327a40..afb5ee0112 100644
--- a/tools/xentrace/formats
+++ b/tools/xentrace/formats
@@ -211,8 +211,8 @@
 0x00802008  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  do_irq [ irq = %(1)d, began = %(2)dus, ended = %(3)dus ]
 
 0x00084001  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  hpet create [ tn = %(1)d, irq = %(2)d, delta = 0x%(4)08x%(3)08x, period = 0x%(6)08x%(5)08x ]
-0x00084002  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  pit create [ delta = 0x%(1)016x, period = 0x%(2)016x ]
-0x00084003  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtc create [ delta = 0x%(1)016x , period = 0x%(2)016x ]
+0x00084002  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  pit create [ delta = 0x%(1)08x, period = 0x%(2)08x ]
+0x00084003  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  rtc create [ delta = 0x%(1)08x, period = 0x%(2)08x ]
 0x00084004  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  vlapic create [ delta = 0x%(2)08x%(1)08x , period = 0x%(4)08x%(3)08x, irq = %(5)d ]
 0x00084005  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  hpet destroy [ tn = %(1)d ]
 0x00084006  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  pit destroy  [ ]

[PATCH v1 0/7] xentrace changes

2023-06-01 Thread Olaf Hering

Olaf Hering (7):
  xentrace: remove unimplemented option from man page
  xentrace: use correct output format for pit and rtc
  xenalyze: sync with vmx.h, use EXIT_REASON_PENDING_VIRT_INTR
  xenalyze: sync with vmx.h, use EXIT_REASON_VMXOFF
  xenalyze: sync with vmx.h, use EXIT_REASON_VMXON
  xenalyze: sync with vmx.h, use EXIT_REASON_MCE_DURING_VMENTRY
  xenalyze: handle more potential exit reason values from vmx.h

 docs/man/xentrace.8.pod   |  4 
 tools/xentrace/formats|  4 ++--
 tools/xentrace/xenalyze.c | 46 ---
 3 files changed, 35 insertions(+), 19 deletions(-)

[libvirt test] 181066: tolerable all pass - PUSHED

2023-06-01 Thread osstest service owner

flight 181066 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181066/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 181023
 test-armhf-armhf-libvirt 16 saverestore-support-check    fail  like 181023
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check    fail  like 181023
 test-amd64-i386-libvirt-xsm  15 migrate-support-check    fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check    fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-check    fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-check    fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check    fail   never pass
 test-arm64-arm64-libvirt 15 migrate-support-check    fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check    fail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check    fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check    fail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-check    fail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-check    fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check    fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check    fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check    fail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check    fail   never pass

version targeted for testing:
 libvirt  c47e17689e3309e544b59f5a9eb7b9d668967787
baseline version:
 libvirt  9222f35dc6917f00d166be3bb69ac4e5ff8536f0

Last test of basis   181023  2023-05-31 04:21:47 Z    1 days
Testing same since   181066  2023-06-01 04:18:54 Z    0 days    1 attempts


People who touched revisions under test:
  Michal Privoznik 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-arm64-pvops  pass
 build-armhf-pvops  pass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  pass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pair  pass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is a

Re: [PATCH v3 25/34] m68k: Convert various functions to use ptdescs

2023-06-01 Thread kernel test robot

Hi Vishal,

kernel test robot noticed the following build errors:

[auto build test ERROR on next-20230531]
[cannot apply to akpm-mm/mm-everything s390/features powerpc/next powerpc/fixes geert-m68k/for-next geert-m68k/for-linus linus/master v6.4-rc4 v6.4-rc3 v6.4-rc2 v6.4-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Vishal-Moola-Oracle/mm-Add-PAGE_TYPE_OP-folio-functions/20230601-053454
base:   next-20230531
patch link:    https://lore.kernel.org/r/20230531213032.25338-26-vishal.moola%40gmail.com
patch subject: [PATCH v3 25/34] m68k: Convert various functions to use ptdescs
config: m68k-randconfig-r002-20230531 (https://download.01.org/0day-ci/archive/20230601/202306011704.i8xmwkpl-...@intel.com/config)
compiler: m68k-linux-gcc (GCC) 12.3.0
reproduce (this is a W=1 build):
        mkdir -p ~/bin
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/915ab62dc3315fe0a0544fccb4ee5f3ee32694b5
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Vishal-Moola-Oracle/mm-Add-PAGE_TYPE_OP-folio-functions/20230601-053454
        git checkout 915ab62dc3315fe0a0544fccb4ee5f3ee32694b5
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross W=1 O=build_dir ARCH=m68k olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash arch/m68k/mm/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot 
| Closes: https://lore.kernel.org/oe-kbuild-all/202306011704.i8xmwkpl-...@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from arch/m68k/include/asm/pgalloc.h:12,
                    from arch/m68k/mm/init.c:26:
   arch/m68k/include/asm/mcf_pgalloc.h: In function 'pgd_alloc':
>> arch/m68k/include/asm/mcf_pgalloc.h:83:59: error: 'GFP_NOWARN' undeclared (first use in this function); did you mean 'GFP_NOWAIT'?
  83 | struct ptdesc *ptdesc = pagetable_alloc(GFP_DMA | GFP_NOWARN, 0);
 |   ^~
 |   GFP_NOWAIT
   arch/m68k/include/asm/mcf_pgalloc.h:83:59: note: each undeclared identifier is reported only once for each function it appears in
   arch/m68k/include/asm/mcf_pgalloc.h: At top level:
>> arch/m68k/include/asm/mcf_pgalloc.h:22:27: warning: 'ptdesc_address' is static but used in inline function 'pte_alloc_one_kernel' which is not static
  22 | return (pte_t *) (ptdesc_address(ptdesc));
 |   ^~
>> arch/m68k/include/asm/mcf_pgalloc.h:17:33: warning: 'pagetable_alloc' is static but used in inline function 'pte_alloc_one_kernel' which is not static
  17 | struct ptdesc *ptdesc = pagetable_alloc(GFP_DMA | __GFP_ZERO, 0);
 | ^~~
>> arch/m68k/include/asm/mcf_pgalloc.h:10:24: warning: 'virt_to_ptdesc' is static but used in inline function 'pte_free_kernel' which is not static
  10 | pagetable_free(virt_to_ptdesc(pte));
 |^~
>> arch/m68k/include/asm/mcf_pgalloc.h:10:9: warning: 'pagetable_free' is static but used in inline function 'pte_free_kernel' which is not static
  10 | pagetable_free(virt_to_ptdesc(pte));
 | ^~
--
   In file included from arch/m68k/mm/mcfmmu.c:21:
   arch/m68k/include/asm/mcf_pgalloc.h: In function 'pgd_alloc':
>> arch/m68k/include/asm/mcf_pgalloc.h:83:59: error: 'GFP_NOWARN' undeclared (first use in this function); did you mean 'GFP_NOWAIT'?
  83 | struct ptdesc *ptdesc = pagetable_alloc(GFP_DMA | GFP_NOWARN, 0);
 |   ^~
 |   GFP_NOWAIT
   arch/m68k/include/asm/mcf_pgalloc.h:83:59: note: each undeclared identifier is reported only once for each function it appears in
   arch/m68k/mm/mcfmmu.c: At top level:
   arch/m68k/mm/mcfmmu.c:36:13: warning: no previous prototype for 'paging_init' [-Wmissing-prototypes]
  36 | void __init paging_init(voi

Re: [PATCH 2/2] x86/vPIT: account for "counter stopped" time

2023-06-01 Thread Jan Beulich

On 01.06.2023 13:48, Roger Pau Monné wrote:
> On Tue, May 30, 2023 at 05:30:40PM +0200, Jan Beulich wrote:
>> TBD: "gate" can only ever be low for chan2 (with "x86/vPIT: check/bound
>>  values loaded from state save record" [2] in place), so in
>>  principle we could get away without a new pair of arrays, but just
>>  two individual fields. At the expense of more special casing in
>>  code.
> 
> Hm, I guess we could rename to pit_set_gate_ch2 and remove the ch
> parameter.  That would be OK for me.

Well, simplifying the function is the less ugly part, so I'd be okay
doing that. But doing _just_ that feels wrong: Why would we make the
function less general when we still maintain all the data also for
the other channels, just that we don't update it. My concern was
really towards the further special casing of channel 2 that would be
required if I didn't introduce two new arrays, but just two new
fields.

>> TBD: Should we deal with other aspects of "gate low" in pit_get_out()
>>  here as well, right away? I was hoping to get away without ...
>>  (Note how the two functions also disagree in their placement of the
>>  "default" labels, even if that's largely benign when taking into
>>  account that modes 6 and 7 are transformed to 2 and 3 respectively
>>  by pit_load(). A difference would occur only before the guest first
>>  sets the mode, as pit_reset() sets it to 7.)
> 
> I'm in general afraid of doing changes here (apart from bugfixes)
> because we don't really have a good way to test them AFAIK,

Right, hence why I'm asking.

> maybe you
> do have some XTF or similar tests to exercise those paths?

I did consider writing something, but I can't go the route of "try it
directly and then compare with emulation results". Without that, I'm not
sure such a test (and the time spent putting it together) is worth it,
all the more so since, without being able to compare, I might also end
up testing some wrong behavior simply because of not properly
understanding the somewhat scarce documentation that's available.
(I already had to resort to 30-year-old hardcopy documentation to at
least stand a chance of getting things right.)

>> Other observations:
>> - Loading of new counts occurs too early in some of the modes (2/3: at
>>   end of current sequence or when gate goes high; 1/5: only when gate
>>   goes high).

Because of this ...

>> @@ -109,6 +112,7 @@ static void pit_load_count(PITState *pit
>>  pit->count_load_time[channel] = 0;
>>  else
>>  pit->count_load_time[channel] = get_guest_time(v);
>> +pit->stopped_time[channel] = 0;
> 
> Don't you need to also set count_stop_time == count_load_time in case
> the counter is disabled? (s->gate == 0).

... I think you're right, and I should do so unconditionally. In
principle I think this would need to be mode dependent.

>> @@ -181,22 +188,39 @@ static void pit_set_gate(PITState *pit,
>>  
>>  ASSERT(spin_is_locked(&pit->lock));
>>  
>> -switch ( s->mode )
>> -{
>> -default:
>> -case 0:
>> -case 4:
>> -/* XXX: just disable/enable counting */
>> -break;
>> -case 1:
>> -case 5:
>> -case 2:
>> -case 3:
>> -/* Restart counting on rising edge. */
>> -if ( s->gate < val )
>> -pit->count_load_time[channel] = get_guest_time(v);
>> -break;
>> -}
>> +if ( s->gate > val )
>> +switch ( s->mode )
>> +{
>> +case 0:
>> +case 2:
>> +case 3:
>> +case 4:
>> +/* Disable counting. */
>> +if ( !channel )
>> +destroy_periodic_time(&pit->pt0);
>> +pit->count_stop_time[channel] = get_guest_time(v);
>> +break;
>> +}
>> +
>> +if ( s->gate < val )
> 
> Shouldn't this be an else if?

They could, but they don't need to be. I ended up thinking that with the
blank line between both if()s things read slightly better. If you're
pretty convinced that's unhelpful, I'd be willing to adjust.

Jan

Re: [PATCH 1/2] x86/vPIT: re-order functions

2023-06-01 Thread Jan Beulich

On 01.06.2023 13:50, Roger Pau Monné wrote:
> On Thu, Jun 01, 2023 at 11:56:12AM +0200, Jan Beulich wrote:
>> On 01.06.2023 11:17, Roger Pau Monné wrote:
>>> On Tue, May 30, 2023 at 05:30:02PM +0200, Jan Beulich wrote:
>>>> To avoid the need for a forward declaration of pit_load_count() in a
>>>> subsequent change, move it earlier in the file (along with its helper
>>>> callback).
>>>>
>>>> Signed-off-by: Jan Beulich 
>>>
>>> Reviewed-by: Roger Pau Monné 
>>
>> Thanks.
>>
>>> Just a couple of nits, which you might also have noticed but decided not
>>> to fix given this is just code movement.
>>
>> Indeed, I meant this to be pure code movement. Nevertheless I'd be happy
>> to take care of style issues, if that's deemed okay in a "pure code
>> movement" patch. However, ...
> 
> It's just small style issues, so it would be OK for me.

So I've done the obvious ones. There's a further signed/unsigned issue
where it's less clear whether to take care of it "on the fly": the
function's 2nd and 3rd parameters both ought to be unsigned, yet the
same issue exists many more times throughout the file. So I guess I'll
leave those untouched for now.

Jan

Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-06-01 Thread Borislav Petkov

On Thu, Jun 01, 2023 at 10:19:01AM +0200, Juergen Gross wrote:
> Patch 2 wants this diff on top:

Obviously. :-)

That fixes it, thx.

Now lemme restart testing.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

[linux-linus test] 181063: regressions - FAIL

2023-06-01 Thread osstest service owner

flight 181063 linux-linus real [real]
flight 181077 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/181063/
http://logs.test-lab.xenproject.org/osstest/logs/181077/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-credit1   8 xen-boot fail REGR. vs. 180278

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-libvirt-qcow2 19 guest-start/debian.repeat fail pass in 181077-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-examine  8 reboot   fail  like 180278
 test-armhf-armhf-xl-arndale   8 xen-boot fail  like 180278
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop    fail like 180278
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop    fail like 180278
 test-armhf-armhf-xl-credit2   8 xen-boot fail  like 180278
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180278
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop    fail like 180278
 test-armhf-armhf-xl-multivcpu  8 xen-boot fail like 180278
 test-armhf-armhf-libvirt-raw  8 xen-boot fail  like 180278
 test-armhf-armhf-libvirt  8 xen-boot fail  like 180278
 test-armhf-armhf-libvirt-qcow2  8 xen-boot    fail like 180278
 test-armhf-armhf-xl   8 xen-boot fail  like 180278
 test-armhf-armhf-xl-vhd   8 xen-boot fail  like 180278
 test-armhf-armhf-xl-rtds  8 xen-boot fail  like 180278
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop    fail like 180278
 test-amd64-amd64-libvirt 15 migrate-support-check    fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check    fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check    fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl  15 migrate-support-check    fail   never pass
 test-arm64-arm64-xl  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check    fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check    fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check    fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check    fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check    fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check    fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-check    fail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check    fail   never pass

version targeted for testing:
 linux  929ed21dfdb6ee94391db51c9eedb63314ef6847
baseline version:
 linux  6c538e1adbfc696ac4747fb10d63e704344f763d

Last test of basis   180278  2023-04-16 19:41:46 Z   45 days
Failing since        180281  2023-04-17 06:24:36 Z   45 days   85 attempts
Testing same since   181063  2023-06-01 00:42:42 Z    0 days    1 attempts


2563 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-arm64-pvops  pass
 build-armhf-pvops  pass
 build-i386-pv

Re: [PATCH v3 03/34] s390: Use pt_frag_refcount for pagetables

2023-06-01 Thread Gerald Schaefer

On Wed, 31 May 2023 14:30:01 -0700
"Vishal Moola (Oracle)"  wrote:

> s390 currently uses _refcount to identify fragmented page tables.
> The page table struct already has a member pt_frag_refcount used by
> powerpc, so have s390 use that instead of the _refcount field as well.
> This improves the safety for _refcount and the page table tracking.
> 
> This also allows us to simplify the tracking since we can once again use
> the lower byte of pt_frag_refcount instead of the upper byte of _refcount.

This would conflict with the s390 part of Hugh Dickins' pte_free_defer() work:
https://lore.kernel.org/lkml/35e983f5-7ed3-b310-d949-9ae8b130c...@google.com/
https://lore.kernel.org/lkml/6dd63b39-e71f-2e8b-7e0-83e02f3bc...@google.com/

There he uses pt_frag_refcount, or rather pt_mm in the same union, to save
the mm_struct for deferred pte_free().

I still need to look closer into both of your patch series, but so far it
seems that you have no hard functional requirement to switch from _refcount
to pt_frag_refcount here, for s390.

If this is correct, and you do not e.g. need this to make some other use
of _refcount, I would suggest dropping this patch.

Re: [PATCH] MAINTAINERS: remove xenstore related files from LIBS

2023-06-01 Thread Anthony PERARD

On Thu, Jun 01, 2023 at 12:57:56PM +0200, Jan Beulich wrote:
> On 22.05.2023 18:00, Juergen Gross wrote:
> > There is no need to have the Xenstore headers listed in the LIBS
> > section now that they have been added to the XENSTORE section.
> > 
> > Suggested-by: Jan Beulich 
> > Signed-off-by: Juergen Gross 
> 
> Anthony, Wei,
> 
> since this is taking away things from an area you're the maintainers for,
> I think it would best be acked by you.

Acked-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD

Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-06-01 Thread Juergen Gross


On 31.05.23 19:48, Borislav Petkov wrote:
> On Wed, May 31, 2023 at 04:20:08PM +0200, Juergen Gross wrote:
>> One other note: why does mtrr_cleanup() think that using 8 instead of 6
>> variable MTRRs would be an "optimal setting"?
>
> Maybe the more extensive debug output below would help answer that...
>
>> IMO it should replace the original setup only in case it is using _less_
>> MTRRs than before.
>
> Right.

The attached patch will do that.


Juergen

From 7989ef9822115a708fc2ba3f7740888a350cb40f Mon Sep 17 00:00:00 2001
From: Juergen Gross 
Date: Thu, 1 Jun 2023 14:40:58 +0200
Subject: [PATCH v7] x86/mtrr: Let mtrr_cleanup() not increase number of used
 MTRRs

Today mtrr_cleanup() will always use the best alternative MTRR setting it
finds, even if that setting uses more variable MTRRs than the setup
provided by the BIOS.

Add a check so that only settings with fewer variable MTRRs are used.

Signed-off-by: Juergen Gross 
---
V7:
- new patch
---
 arch/x86/kernel/cpu/mtrr/cleanup.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
index a7cb5d32d03d..a5d331722092 100644
--- a/arch/x86/kernel/cpu/mtrr/cleanup.c
+++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
@@ -567,7 +567,7 @@ static int __init mtrr_need_cleanup(void)
 	num_var_ranges - num[MTRR_NUM_TYPES])
 		return 0;
 
-	return 1;
+	return num_var_ranges - num[MTRR_NUM_TYPES];
 }
 
 static unsigned long __initdata range_sums;
@@ -673,6 +673,7 @@ int __init mtrr_cleanup(void)
 	u64 chunk_size, gran_size;
 	mtrr_type type;
 	int index_good;
+	int num_used;
 	int i;
 
 	if (!cpu_feature_enabled(X86_FEATURE_MTRR) || enable_mtrr_cleanup < 1)
@@ -693,7 +694,8 @@ int __init mtrr_cleanup(void)
 	}
 
 	/* Check if we need handle it and can handle it: */
-	if (!mtrr_need_cleanup())
+	num_used = mtrr_need_cleanup();
+	if (!num_used)
 		return 0;
 
 	/* Print original var MTRRs at first, for debugging: */
@@ -728,6 +730,10 @@ int __init mtrr_cleanup(void)
 		mtrr_print_out_one_result(i);
 
 		if (!result[i].bad) {
+			if (result[i].num_reg >= num_used) {
+				Dprintk("BIOS provided MTRR setting is better than found one\n");
+				return 0;
+			}
 			set_var_mtrr_all();
 			Dprintk("New variable MTRRs\n");
 			print_out_mtrr_range_state();
@@ -762,8 +768,12 @@ int __init mtrr_cleanup(void)
 	index_good = mtrr_search_optimal_index();
 
 	if (index_good != -1) {
-		pr_info("Found optimal setting for mtrr clean up\n");
 		i = index_good;
+		if (result[i].num_reg >= num_used) {
+			Dprintk("BIOS provided MTRR setting is better than found one\n");
+			return 0;
+		}
+		pr_info("Found optimal setting for mtrr clean up\n");
 		mtrr_print_out_one_result(i);
 
 		/* Convert ranges to var ranges state: */
-- 
2.35.3



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature

[ovmf test] 181076: all pass - PUSHED

2023-06-01 Thread osstest service owner

flight 181076 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181076/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 1df6658bcbc4cade29a8763808a9804e5d449046
baseline version:
 ovmf c1e853769046b322690ad336fdb98966757e7414

Last test of basis   181072  2023-06-01 09:10:44 Z    0 days
Testing same since   181076  2023-06-01 11:12:17 Z    0 days    1 attempts


People who touched revisions under test:
  Gerd Hoffmann 
  Liming Gao 
  Sunil V L 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   c1e8537690..1df6658bcb  1df6658bcbc4cade29a8763808a9804e5d449046 -> xen-tested-master

[PATCH v3 3/3] cmdline: parse multiple instances of the vga option

2023-06-01 Thread Roger Pau Monne

Parse all instances of the vga= option on the command line, in order
to always enforce the last selection on the command line.

Note that it's not safe to parse just the last occurrence of the vga=
option, as then a command line with `vga=current vga=keep` would
result in current being ignored.

Adjust the command line documentation to describe the new behavior.

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - New in this version.
---
Build tested only, as I don't have a system that does legacy boot and
has VGA output I can check.

The change mostly wraps the current code in a while loop and adds an
extra else-if for the "ask" option, so most of the diff is indentation
changes.
---
 docs/misc/xen-command-line.pandoc |  3 ++
 xen/arch/x86/boot/cmdline.c   | 85 +++
 2 files changed, 45 insertions(+), 43 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index e0b89b7d3319..8cf2f3423d47 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2628,6 +2628,9 @@ with the specified width, height and depth.
 `ask` option.  (N.B menu modes are displayed in hex, so ``
 should be a hexadecimal number)
 
+Note that all the occurrences of the vga option in the command line are parsed,
+and hence later occurrences can overwrite selections done by prior ones.
+
 The optional `keep` parameter causes Xen to continue using the vga
 console even after dom0 has been started.  The default behaviour is to
 relinquish control to dom0.
diff --git a/xen/arch/x86/boot/cmdline.c b/xen/arch/x86/boot/cmdline.c
index fc11c6d3c5c4..511e77e0c2b5 100644
--- a/xen/arch/x86/boot/cmdline.c
+++ b/xen/arch/x86/boot/cmdline.c
@@ -277,59 +277,58 @@ static u16 rows2vmode(unsigned int rows)
 
 static void vga_parse(const char *cmdline, early_boot_opts_t *ebo)
 {
-const char *c;
-unsigned int tmp, vesa_depth, vesa_height, vesa_width;
-
-c = find_opt(cmdline, "vga=", true);
-
-if ( !c )
-return;
+const char *c = cmdline;
 
-ebo->boot_vid_mode = ASK_VGA;
-
-if ( !strmaxcmp(c, "current", delim_chars_comma) )
-ebo->boot_vid_mode = VIDEO_CURRENT_MODE;
-else if ( !strsubcmp(c, "text-80x") )
-{
-c += strlen("text-80x");
-ebo->boot_vid_mode = rows2vmode(strtoui(c, delim_chars_comma, NULL));
-}
-else if ( !strsubcmp(c, "gfx-") )
+while ( (c = find_opt(c, "vga=", true)) != NULL )
 {
-vesa_width = strtoui(c + strlen("gfx-"), "x", &c);
+unsigned int tmp, vesa_depth, vesa_height, vesa_width;
 
-if ( vesa_width > U16_MAX )
-return;
+if ( !strmaxcmp(c, "current", delim_chars_comma) )
+ebo->boot_vid_mode = VIDEO_CURRENT_MODE;
+else if ( !strsubcmp(c, "text-80x") )
+{
+c += strlen("text-80x");
+ebo->boot_vid_mode = rows2vmode(strtoui(c, delim_chars_comma, NULL));
+}
+else if ( !strsubcmp(c, "gfx-") )
+{
+vesa_width = strtoui(c + strlen("gfx-"), "x", &c);
 
-/*
- * Increment c outside of strtoui() because otherwise some
- * compiler may complain with following message:
- * warning: operation on 'c' may be undefined.
- */
-++c;
-vesa_height = strtoui(c, "x", &c);
+if ( vesa_width > U16_MAX )
+return;
 
-if ( vesa_height > U16_MAX )
-return;
+/*
+ * Increment c outside of strtoui() because otherwise some
+ * compiler may complain with following message:
+ * warning: operation on 'c' may be undefined.
+ */
+++c;
+vesa_height = strtoui(c, "x", &c);
 
-vesa_depth = strtoui(++c, delim_chars_comma, NULL);
+if ( vesa_height > U16_MAX )
+return;
 
-if ( vesa_depth > U16_MAX )
-return;
+vesa_depth = strtoui(++c, delim_chars_comma, NULL);
 
-ebo->vesa_width = vesa_width;
-ebo->vesa_height = vesa_height;
-ebo->vesa_depth = vesa_depth;
-ebo->boot_vid_mode = VIDEO_VESA_BY_SIZE;
-}
-else if ( !strsubcmp(c, "mode-") )
-{
-tmp = strtoui(c + strlen("mode-"), delim_chars_comma, NULL);
+if ( vesa_depth > U16_MAX )
+return;
 
-if ( tmp > U16_MAX )
-return;
+ebo->vesa_width = vesa_width;
+ebo->vesa_height = vesa_height;
+ebo->vesa_depth = vesa_depth;
+ebo->boot_vid_mode = VIDEO_VESA_BY_SIZE;
+}
+else if ( !strsubcmp(c, "mode-") )
+{
+tmp = strtoui(c + strlen("mode-"), delim_chars_comma, NULL);
 
-ebo->boot_vid_mode = tmp;
+if ( tmp > U16_MAX )
+return;
+
+ebo->boot_vid_mode = tmp;
+}
+else if ( !strsubcmp(c, "ask") )
+ebo->boot_vid_mode = ASK_

[PATCH v3 2/3] multiboot2: do not set StdOut mode unconditionally

2023-06-01 Thread Roger Pau Monne

Only initialize StdOut if the current StdOut mode is unusable.  This
avoids forcefully switching StdOut to the maximum supported
resolution, and thus very likely changing the GOP mode without having
first parsed the command line options.
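The check can be modelled outside of EFI with the sketch below. `struct con_out`, `query_mode()` and `set_max_mode()` are hypothetical stand-ins for the firmware's `QueryMode()` and Xen's `efi_console_set_mode()`, and the console geometry is invented. Only the control flow mirrors the patch: query the active mode and reset only on failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a text console: modes 0 .. nr_modes-1 are valid. */
struct con_out {
    int mode;          /* currently active mode, possibly invalid */
    int nr_modes;
    int switches;      /* how many times the mode was forcibly changed */
};

/* Stand-in for QueryMode(): fails for modes the console does not support. */
static bool query_mode(const struct con_out *c, int mode,
                       unsigned int *cols, unsigned int *rows)
{
    if ( mode < 0 || mode >= c->nr_modes )
        return false;

    *cols = 80 + 20 * (unsigned int)mode;   /* made-up geometry */
    *rows = 25 + 5 * (unsigned int)mode;
    return true;
}

/* Stand-in for switching to the largest supported mode. */
static void set_max_mode(struct con_out *c)
{
    c->mode = c->nr_modes - 1;
    c->switches++;
}

static void console_init(struct con_out *c)
{
    unsigned int cols, rows;

    assert(c->nr_modes > 0);

    /* Keep the mode firmware left us in, as long as it can be queried. */
    if ( !query_mode(c, c->mode, &cols, &rows) )
        set_max_mode(c);
}
```

A console already in a valid mode is left alone (no switch is counted). Only an unqueryable mode triggers the switch to the largest one.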

Signed-off-by: Roger Pau Monné 
---
The code is very similar to the approach suggested by Jan; please let
me know if you would be OK with adding your Suggested-by tag.
---
Changes since v2:
 - Use approach suggested by Jan.

Changes since v1:
 - New in this version.
---
 xen/arch/x86/efi/efi-boot.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 003ef037bf07..5314f4293b12 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -820,7 +820,13 @@ void __init efi_multiboot2(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable
 
 efi_init(ImageHandle, SystemTable);
 
-efi_console_set_mode();
+if ( StdOut->QueryMode(StdOut, StdOut->Mode->Mode,
+   &cols, &rows) != EFI_SUCCESS )
+/*
+ * If active StdOut mode is invalid init ConOut (StdOut) to the max
+ * supported size.
+ */
+efi_console_set_mode();
 
 if ( StdOut->QueryMode(StdOut, StdOut->Mode->Mode,
&cols, &rows) == EFI_SUCCESS )
-- 
2.40.0

[PATCH v3 1/3] multiboot2: parse vga= option when setting GOP mode

2023-06-01 Thread Roger Pau Monne

Introduce support for passing the command line to the efi_multiboot2()
helper, and parse the vga= option if present.
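In head.S the command line is found by walking the Multiboot2 tag list. The sketch below is a rough C equivalent. It assumes the Multiboot2 information layout: an 8-byte fixed part holding `total_size` and a reserved word, followed by 8-byte-aligned tags. Each tag starts with a 32-bit `type` and `size`, where type 1 is the boot command line and type 0 ends the list. The names are illustrative rather than Xen's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MB2_FIXED_SIZE        8u   /* total_size + reserved */
#define MB2_TAG_ALIGN         8u
#define MB2_TAG_TYPE_END      0u
#define MB2_TAG_TYPE_CMDLINE  1u

struct mb2_tag_hdr {
    uint32_t type;
    uint32_t size;    /* includes this header, excludes padding */
};

/* Return the NUL-terminated command line string, or NULL if there is none. */
static const char *mb2_find_cmdline(const unsigned char *mbi)
{
    uint32_t total_size;
    const unsigned char *p = mbi + MB2_FIXED_SIZE;
    const unsigned char *end;

    assert(mbi != NULL);
    memcpy(&total_size, mbi, sizeof(total_size));
    end = mbi + total_size;

    while ( p + sizeof(struct mb2_tag_hdr) <= end )
    {
        struct mb2_tag_hdr tag;

        memcpy(&tag, p, sizeof(tag));
        if ( tag.type == MB2_TAG_TYPE_END || tag.size < sizeof(tag) )
            break;
        if ( tag.type == MB2_TAG_TYPE_CMDLINE )
            return (const char *)(p + sizeof(tag));

        /* Tags start on 8-byte boundaries relative to the MBI start. */
        p += (tag.size + MB2_TAG_ALIGN - 1) & ~(MB2_TAG_ALIGN - 1);
    }

    return NULL;
}
```

Returning NULL when no command line tag exists corresponds to the assembly leaving %rdx zeroed before calling efi_multiboot2().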

Add support for the 'gfx' and 'current' vga options, ignore the 'keep'
option, and print a warning message about other options not being
currently implemented.
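The vga= handling added to efi_multiboot2() can be approximated in portable C as follows. This sketch rests on several assumptions: `strtoul()` replaces Xen's `simple_strtoul()`, and `next_opt()`, `is_delim()`, `parse_gop_req()` and `struct gop_req` are hypothetical. There is also one deliberate difference. The sketch clears height and depth at every `gfx-` value, whereas the patch keeps earlier values when a later `gfx-` omits them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of parsing the vga= options relevant to GOP setup. */
struct gop_req {
    unsigned long width, height, depth;   /* 0 means "no preference" */
    bool keep_current;
};

/* Pointer past the next word-initial "opt" in cmdline at or after from. */
static const char *next_opt(const char *cmdline, const char *from,
                            const char *opt)
{
    size_t len = strlen(opt);
    const char *s = from;

    while ( (s = strstr(s, opt)) != NULL )
    {
        if ( s == cmdline || s[-1] == ' ' )
            return s + len;
        s += len;
    }

    return NULL;
}

static bool is_delim(char c)
{
    return c == ' ' || c == '\t' || c == ',' || c == '\0';
}

static struct gop_req parse_gop_req(const char *cmdline)
{
    struct gop_req r = { 0, 0, 0, false };
    const char *v = cmdline;

    assert(cmdline != NULL);

    while ( (v = next_opt(cmdline, v, "vga=")) != NULL )
    {
        if ( !strncmp(v, "gfx-", 4) )
        {
            char *end;

            r.width = strtoul(v + 4, &end, 10);
            r.height = r.depth = 0;
            if ( *end == 'x' )
                r.height = strtoul(end + 1, &end, 10);
            if ( *end == 'x' )
                r.depth = strtoul(end + 1, &end, 10);
            if ( !is_delim(*end) )
                r.width = r.height = r.depth = 0;   /* malformed value */
            r.keep_current = false;
        }
        else if ( !strncmp(v, "current", 7) )
            r.keep_current = true;
        else if ( !strncmp(v, "keep", 4) )
        {
            /* Ignored, as in the patch. */
        }
        else
        {
            /* Unsupported value: fall back to defaults. */
            r.width = r.height = r.depth = 0;
            r.keep_current = false;
        }
    }

    return r;
}
```

For `vga=gfx-800x600x16 vga=current,keep` the result keeps the parsed 800x600x16 but sets `keep_current`. A caller following the patch would therefore skip `efi_find_gop_mode()` and leave the GOP mode untouched.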

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Do not parse console=.
 - Allow width or height to be 0 as long as the gfx- option is well
   formed.

Changes since v1:
 - Do not return the last occurrence of a command line.
 - Rearrange the code for assembly processing of the cmdline and use
   lea.
 - Merge patches handling console= and vga= together.
---
 xen/arch/x86/boot/head.S  | 13 ++-
 xen/arch/x86/efi/efi-boot.h   | 61 ++-
 xen/arch/x86/x86_64/asm-offsets.c |  1 +
 3 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 09bebf8635d0..aa443574d26f 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -226,9 +226,10 @@ __efi64_mb2_start:
 jmp x86_32_switch
 
 .Lefi_multiboot2_proto:
-/* Zero EFI SystemTable and EFI ImageHandle addresses. */
+/* Zero EFI SystemTable, EFI ImageHandle addresses and cmdline. */
 xor %esi,%esi
 xor %edi,%edi
+xor %edx,%edx
 
 /* Skip Multiboot2 information fixed part. */
 lea (MB2_fixed_sizeof+MULTIBOOT2_TAG_ALIGN-1)(%rbx),%ecx
@@ -266,6 +267,13 @@ __efi64_mb2_start:
 cmove   MB2_efi64_ih(%rcx),%rdi
 je  .Lefi_mb2_next_tag
 
+/* Get command line from Multiboot2 information. */
+cmpl    $MULTIBOOT2_TAG_TYPE_CMDLINE,MB2_tag_type(%rcx)
+jne .Lno_cmdline
+lea MB2_tag_string(%rcx),%rdx
+jmp .Lefi_mb2_next_tag
+.Lno_cmdline:
+
 /* Is it the end of Multiboot2 information? */
 cmpl    $MULTIBOOT2_TAG_TYPE_END,MB2_tag_type(%rcx)
 je  .Lrun_bs
@@ -329,7 +337,8 @@ __efi64_mb2_start:
 
 /*
  * efi_multiboot2() is called according to System V AMD64 ABI:
- *   - IN:  %rdi - EFI ImageHandle, %rsi - EFI SystemTable.
+ *   - IN:  %rdi - EFI ImageHandle, %rsi - EFI SystemTable,
+ *  %rdx - MB2 cmdline
  */
 call    efi_multiboot2
 
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index c94e53d139a3..003ef037bf07 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -786,7 +786,30 @@ static bool __init efi_arch_use_config_file(EFI_SYSTEM_TABLE *SystemTable)
 
 static void __init efi_arch_flush_dcache_area(const void *vaddr, UINTN size) { }
 
-void __init efi_multiboot2(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable)
+/* Return the first occurrence of opt in cmd. */
+static const char __init *get_option(const char *cmd, const char *opt)
+{
+const char *s = cmd, *o = NULL;
+
+if ( !cmd || !opt )
+return NULL;
+
+while ( (s = strstr(s, opt)) != NULL )
+{
+if ( s == cmd || *(s - 1) == ' ' )
+{
+o = s + strlen(opt);
+break;
+}
+
+s += strlen(opt);
+}
+
+return o;
+}
+
+void __init efi_multiboot2(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable,
+   const char *cmdline)
 {
 EFI_GRAPHICS_OUTPUT_PROTOCOL *gop;
 EFI_HANDLE gop_handle;
@@ -807,7 +830,41 @@ void __init efi_multiboot2(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable
 
 if ( gop )
 {
-gop_mode = efi_find_gop_mode(gop, 0, 0, 0);
+const char *last = cmdline;
+unsigned int width = 0, height = 0, depth = 0;
+bool keep_current = false;
+
+while ( (last = get_option(last, "vga=")) != NULL )
+{
+if ( !strncmp(last, "gfx-", 4) )
+{
+width = simple_strtoul(last + 4, &last, 10);
+if ( *last == 'x' )
+height = simple_strtoul(last + 1, &last, 10);
+if ( *last == 'x' )
+depth = simple_strtoul(last + 1, &last, 10);
+if ( *last != ' ' && *last != '\t' && *last != '\0' &&
+ *last != ',' )
+width = height = depth = 0;
+keep_current = false;
+}
+else if ( !strncmp(last, "current", 7) )
+keep_current = true;
+else if ( !strncmp(last, "keep", 4) )
+{
+/* Ignore. */
+}
+else
+{
+/* Fallback to defaults if unimplemented. */
+width = height = depth = 0;
+keep_current = false;
+PrintStr(L"Warning: Cannot use selected vga option.\r\n");
+}
+}
+
+if ( !keep_current )
+gop_mode = efi_find_gop_mode(gop, width, height, depth);
 
 efi_arch_edid(gop_handle);
 }
diff -

[PATCH v3 0/3] x86/gfx: early boot improvements

2023-06-01 Thread Roger Pau Monne

Hello,

The following series contains some fixes and improvements related to
graphics usage when booting Xen.

The proposed patches fix some shortcomings when booting with multiboot2,
such as the vga= parameter being ignored and the console being forcefully
switched to the maximum supported resolution.

Thanks, Roger.

Roger Pau Monne (3):
  multiboot2: parse vga= option when setting GOP mode
  multiboot2: do not set StdOut mode unconditionally
  cmdline: parse multiple instances of the vga option

 docs/misc/xen-command-line.pandoc |  3 ++
 xen/arch/x86/boot/cmdline.c   | 85 +++
 xen/arch/x86/boot/head.S  | 13 -
 xen/arch/x86/efi/efi-boot.h   | 69 +++--
 xen/arch/x86/x86_64/asm-offsets.c |  1 +
 5 files changed, 123 insertions(+), 48 deletions(-)

-- 
2.40.0

Re: [PATCH v3 0/6] block: add blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi

On Tue, May 30, 2023 at 02:09:53PM -0400, Stefan Hajnoczi wrote:
> v3
> - Patch 5: Mention why dev_max_batch condition was dropped [Stefano]
> v2
> - Patch 1: "is not be freed" -> "is not freed" [Eric]
> - Patch 2: Remove unused nvme_process_completion_queue_plugged trace event
>   [Stefano]
> - Patch 3: Add missing #include and fix blkio_unplug_fn() prototype [Stefano]
> - Patch 4: Removed whitespace hunk [Eric]
> 
> The existing blk_io_plug() API is not block layer multi-queue friendly because
> the plug state is per-BlockDriverState.
> 
> Change blk_io_plug()'s implementation so it is thread-local. This is done by
> introducing the blk_io_plug_call() function that block drivers use to batch
> calls while plugged. It is relatively easy to convert block drivers from
> .bdrv_co_io_plug() to blk_io_plug_call().
> 
> Random read 4KB performance with virtio-blk on a host NVMe block device:
> 
> iodepth   iops     change vs today
> 1          45612   -4%
> 2          87967   +2%
> 4         129872   +0%
> 8         171096   -3%
> 16        194508   -4%
> 32        208947   -1%
> 64        217647   +0%
> 128       229629   +0%
> 
> The results are within the noise for these benchmarks. This is to be expected
> because the plugging behavior for a single thread hasn't changed in this patch
> series, only that the state is thread-local now.
> 
> The following graph compares several approaches:
> https://vmsplice.net/~stefan/blk_io_plug-thread-local.png
> - v7.2.0: before most of the multi-queue block layer changes landed.
> - with-blk_io_plug: today's post-8.0.0 QEMU.
> - blk_io_plug-thread-local: this patch series.
> - no-blk_io_plug: what happens when we simply remove plugging?
> - call-after-dispatch: what if we integrate plugging into the event loop? I
>   decided against this approach in the end because it's more likely to
>   introduce performance regressions since I/O submission is deferred until the
>   end of the event loop iteration.
> 
> Aside from the no-blk_io_plug case, which bottlenecks much earlier than the
> others, we see that all plugging approaches are more or less equivalent in this
> benchmark. It is also clear that QEMU 8.0.0 has lower performance than 7.2.0.
> 
> The Ansible playbook, fio results, and a Jupyter notebook are available here:
> https://github.com/stefanha/qemu-perf/tree/remove-blk_io_plug
> 
> Stefan Hajnoczi (6):
>   block: add blk_io_plug_call() API
>   block/nvme: convert to blk_io_plug_call() API
>   block/blkio: convert to blk_io_plug_call() API
>   block/io_uring: convert to blk_io_plug_call() API
>   block/linux-aio: convert to blk_io_plug_call() API
>   block: remove bdrv_co_io_plug() API
> 
>  MAINTAINERS   |   1 +
>  include/block/block-io.h  |   3 -
>  include/block/block_int-common.h  |  11 ---
>  include/block/raw-aio.h   |  14 ---
>  include/sysemu/block-backend-io.h |  13 +--
>  block/blkio.c |  43 
>  block/block-backend.c |  22 -
>  block/file-posix.c|  38 ---
>  block/io.c|  37 ---
>  block/io_uring.c  |  44 -
>  block/linux-aio.c |  41 +++-
>  block/nvme.c  |  44 +++--
>  block/plug.c  | 159 ++
>  hw/block/dataplane/xen-block.c|   8 +-
>  hw/block/virtio-blk.c |   4 +-
>  hw/scsi/virtio-scsi.c |   6 +-
>  block/meson.build |   1 +
>  block/trace-events|   6 +-
>  18 files changed, 239 insertions(+), 256 deletions(-)
>  create mode 100644 block/plug.c
> 
> -- 
> 2.40.1
> 
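Conceptually, thread-local plugging defers calls made while a thread is plugged and runs them when the outermost plug is released. The sketch below shows only that idea. `plug_begin()`, `plug_call()`, `plug_end()`, the fixed-size queue and the `count_submit()` callback are hypothetical and do not reproduce QEMU's block/plug.c. In this sketch, repeated calls with the same function and argument are coalesced into one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void plug_fn(void *opaque);

struct plug_entry {
    plug_fn *fn;
    void *opaque;
};

#define PLUG_QUEUE_MAX 16

/* Per-thread plug state: no locking and no per-device state needed. */
static _Thread_local unsigned int plug_depth;
static _Thread_local struct plug_entry plug_queue[PLUG_QUEUE_MAX];
static _Thread_local size_t plug_queued;

static void plug_begin(void)
{
    plug_depth++;
}

/* Run fn(opaque) now, or once at unplug time if this thread is plugged. */
static void plug_call(plug_fn *fn, void *opaque)
{
    size_t i;

    if ( plug_depth == 0 )
    {
        fn(opaque);
        return;
    }

    for ( i = 0; i < plug_queued; i++ )
        if ( plug_queue[i].fn == fn && plug_queue[i].opaque == opaque )
            return;                   /* already queued: coalesced */

    if ( plug_queued == PLUG_QUEUE_MAX )
    {
        fn(opaque);                   /* queue full: do not lose the call */
        return;
    }

    plug_queue[plug_queued].fn = fn;
    plug_queue[plug_queued].opaque = opaque;
    plug_queued++;
}

static void plug_end(void)
{
    struct plug_entry batch[PLUG_QUEUE_MAX];
    size_t i, n;

    assert(plug_depth > 0);
    if ( --plug_depth > 0 )
        return;                       /* still nested inside another plug */

    /* Snapshot the queue so callbacks may safely plug again. */
    n = plug_queued;
    memcpy(batch, plug_queue, n * sizeof(batch[0]));
    plug_queued = 0;

    for ( i = 0; i < n; i++ )
        batch[i].fn(batch[i].opaque);
}

/* Example driver callback: "submit" the batch by bumping a counter. */
static void count_submit(void *opaque)
{
    (*(int *)opaque)++;
}
```

Because the state is `_Thread_local`, each I/O thread plugs and unplugs independently. This is the property the cover letter needs for multi-queue, as opposed to keeping plug state per BlockDriverState.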

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-06-01 Thread Juergen Gross


On 01.06.23 14:48, Borislav Petkov wrote:
> On Thu, Jun 01, 2023 at 08:39:17AM +0200, Juergen Gross wrote:
>> Does this translate to: "we should remove that cleanup crap"? I'd be
>> positive to that. :-)
>
> Why, what's wrong with that thing?

Why do you need it if you don't think adding MTRRs dynamically is
important?

Having a sub-optimal MTRR setup doesn't matter unless you are running
out of MTRRs to use. When you are not adding MTRRs, you can't run out
of them.

This in turn means you don't need mtrr_cleanup().


Juergen



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-06-01 Thread Borislav Petkov

On Thu, Jun 01, 2023 at 08:39:17AM +0200, Juergen Gross wrote:
> Does this translate to: "we should remove that cleanup crap"? I'd be
> positive to that. :-)

Why, what's wrong with that thing?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH] xen/arm: debug-pl011: Use 32-bit accessors for broader compatibility

2023-06-01 Thread Michal Orzel



On 01/06/2023 14:17, Julien Grall wrote:
> On Thu, 1 Jun 2023 at 13:48, Michal Orzel wrote:
> 
> Hi Julien,
> 
> On 01/06/2023 13:12, Julien Grall wrote:
> >       
> >
> >
> > Hi,
> >
> > Sorry for the formatting.
> >
> > On Thu, 1 Jun 2023 at 12:31, Michal Orzel wrote:
> >
> >     Hi Bertrand,
> >
> >     On 01/06/2023 12:19, Bertrand Marquis wrote:
> >     >
> >     >
> >     > Hi Michal,
> >     >
> >     >> On 1 Jun 2023, at 10:50, Michal Orzel wrote:
> >     >>
> >     >> There are implementations of the PL011 that can only handle 32-bit
> >     >> accesses (i.e. no 16-bit or 8-bit), usually advertised by 'reg-io-width'
> >     >> dt property set to 4. On such UARTs, the current early printk code for
> >     >> arm64 does not work. To fix this issue, make all the accesses to be 32-bit
> >     >> by using ldr, str without a size field. This makes it possible to use
> >     >> early printk on such platforms, while all the other implementations should
> >     >> generally cope with 32-bit accesses. In case they do not, they would
> >     >> already fail as we explicitly use writel/readl in the runtime driver to
> >     >> maintain broader compatibility and to be SBSAv2 compliant. Therefore, this
> >     >> change makes the runtime/early handling consistent (also it matches the
> >     >> arm32 debug-pl011 code).
> >     >>
> >     >> Signed-off-by: Michal Orzel
> >     >> ---
> >     >> xen/arch/arm/arm64/debug-pl011.inc | 8 
> >     >> 1 file changed, 4 insertions(+), 4 deletions(-)
> >     >>
> >     >> diff --git a/xen/arch/arm/arm64/debug-pl011.inc b/xen/arch/arm/arm64/debug-pl011.inc
> >     >> index 6d60e78c8ba3..80eb8fdc1ec7 100644
> >     >> --- a/xen/arch/arm/arm64/debug-pl011.inc
> >     >> +++ b/xen/arch/arm/arm64/debug-pl011.inc
> >     >> @@ -25,9 +25,9 @@
> >     >>  */
> >     >> .macro early_uart_init xb, c
> >     >>         mov   x\c, #(7372800 / CONFIG_EARLY_UART_PL011_BAUD_RATE % 16)
> >     >> -        strh  w\c, [\xb, #FBRD]      /* -> UARTFBRD (Baud divisor fraction) */
> >     >> +        str   w\c, [\xb, #FBRD]      /* -> UARTFBRD (Baud divisor fraction) */
> >     >>         mov   x\c, #(7372800 / CONFIG_EARLY_UART_PL011_BAUD_RATE / 16)
> >     >> -        strh  w\c, [\xb, #IBRD]      /* -> UARTIBRD (Baud divisor integer) */
> >     >> +        str   w\c, [\xb, #IBRD]      /* -> UARTIBRD (Baud divisor integer) */
> >     >>         mov   x\c, #WLEN_8           /* 8n1 */
> >     >>         str   w\c, [\xb, #LCR_H]     /* -> UARTLCR_H (Line control) */
> >     >>         ldr   x\c, =(RXE | TXE | UARTEN)
> >     >> @@ -41,7 +41,7 @@
> >     >>  */
> >     >> .macro early_uart_ready xb, c
> >     >> 1:
> >     >> -        ldrh  w\c, [\xb, #FR]        /* <- UARTFR (Flag register) */
> >     >> +        ldr   w\c, [\xb, #FR]        /* <- UARTFR (Flag register) */
> >     >>         tst   w\c, #BUSY             /* Check BUSY bit */
> >     >>         b.ne  1b                     /* Wait for the UART to be ready */
> >     >> .endm
> >     >> @@ -52,7 +52,7 @@
> >     >>  * wt: register which contains the character to transmit
> >     >>  */
> >     >> .macro early_uart_transmit xb, wt
> >     >> -        strb  \wt, [\xb, #DR]        /* -> UARTDR (Data Register) */
> >     >> +        str   \wt, [\xb, #DR]        /* -> UARTDR (Data Register) */
> >     >
> >     > Is it really ok to drop the 8bit access here ?
> >     It is not only ok, it is necessary. Otherwise it won't work on the
> >     above mentioned UARTs (they can only perform 32-bit access).
> >
> >
> > IIRC some compilers will complain because you use wN with “str”.
> Hmm, I would expect it to be totally ok as the size is determined by the 
> reg name. Any reference?
> 
> 
> I don’t have the spec with me. I will have a look on Monday and reply back 
> here.
> 
> 
> 
> >
> >
> >     And following to what I wrote in commit msg:
> >     - we use str already in arm32 which results in 32-bit access
> >
> >
> >     - we use readl/writel that end up as str/ldr in runtime driver
> >
> >
> >     - we are down to SBSAv2 spec that runtime driver follows (meaning
> >     we can use early printk for SBSA too)
> >
> >
> > The runtime driver is meant to

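The core of the patch is that every PL011 register access becomes a full 32-bit load or store. The C sketch below shows the same idea. The register offsets (UARTDR at 0x00, UARTFR at 0x18) and the BUSY flag (bit 3 of UARTFR) come from the PL011 TRM, while the helper names are hypothetical. The sketch relies on the compiler emitting one 32-bit access for each aligned `volatile uint32_t` access. That is typical, but the C standard does not guarantee it, which is why real drivers use dedicated accessors such as readl()/writel().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PL011_DR       0x00u          /* data register */
#define PL011_FR       0x18u          /* flag register */
#define PL011_FR_BUSY  (1u << 3)      /* UART busy transmitting */

/* One aligned 32-bit load, the C counterpart of "ldr wN, [xM, #off]". */
static inline uint32_t pl011_read32(volatile uint32_t *base, uint32_t off)
{
    return base[off / sizeof(uint32_t)];
}

/* One aligned 32-bit store, the C counterpart of "str wN, [xM, #off]". */
static inline void pl011_write32(volatile uint32_t *base, uint32_t off,
                                 uint32_t val)
{
    base[off / sizeof(uint32_t)] = val;
}

/* Wait until the UART is idle, then write one character to UARTDR. */
static void early_putc(volatile uint32_t *base, char ch)
{
    assert(base != NULL);

    while ( pl011_read32(base, PL011_FR) & PL011_FR_BUSY )
        ;   /* spin, like early_uart_ready */

    /* The character goes in bits [7:0]; the rest of the word is zero. */
    pl011_write32(base, PL011_DR, (uint8_t)ch);
}
```

Here a plain array can stand in for the device's register window. On real hardware `base` would be the UART's MMIO mapping.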
[xen-unstable-smoke test] 181074: tolerable all pass - PUSHED

2023-06-01 Thread osstest service owner

flight 181074 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181074/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  59d0bf62861f5c9b317ccf89f8b5c8b4d19927ad
baseline version:
 xen  dc98fa74446e5abe417e5ba9a6a632b50444cfa1

Last test of basis   181054  2023-05-31 19:00:25 Z    0 days
Testing same since   181074  2023-06-01 10:00:27 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   dc98fa7444..59d0bf6286  59d0bf62861f5c9b317ccf89f8b5c8b4d19927ad -> smoke

Re: [PATCH] xen/arm: debug-pl011: Use 32-bit accessors for broader compatibility

2023-06-01 Thread Julien Grall

On Thu, 1 Jun 2023 at 13:48, Michal Orzel  wrote:

> Hi Julien,
>
> On 01/06/2023 13:12, Julien Grall wrote:
> >
> >
> >
> > Hi,
> >
> > Sorry for the formatting.
> >
> > On Thu, 1 Jun 2023 at 12:31, Michal Orzel wrote:
> >
> > Hi Bertrand,
> >
> > On 01/06/2023 12:19, Bertrand Marquis wrote:
> > >
> > >
> > > Hi Michal,
> > >
> > >> On 1 Jun 2023, at 10:50, Michal Orzel wrote:
> > >>
> > >> There are implementations of the PL011 that can only handle 32-bit
> > >> accesses (i.e. no 16-bit or 8-bit), usually advertised by 'reg-io-width'
> > >> dt property set to 4. On such UARTs, the current early printk code for
> > >> arm64 does not work. To fix this issue, make all the accesses to be 32-bit
> > >> by using ldr, str without a size field. This makes it possible to use
> > >> early printk on such platforms, while all the other implementations should
> > >> generally cope with 32-bit accesses. In case they do not, they would
> > >> already fail as we explicitly use writel/readl in the runtime driver to
> > >> maintain broader compatibility and to be SBSAv2 compliant. Therefore, this
> > >> change makes the runtime/early handling consistent (also it matches the
> > >> arm32 debug-pl011 code).
> > >>
> > >> Signed-off-by: Michal Orzel
> > >> ---
> > >> xen/arch/arm/arm64/debug-pl011.inc | 8 
> > >> 1 file changed, 4 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/xen/arch/arm/arm64/debug-pl011.inc b/xen/arch/arm/arm64/debug-pl011.inc
> > >> index 6d60e78c8ba3..80eb8fdc1ec7 100644
> > >> --- a/xen/arch/arm/arm64/debug-pl011.inc
> > >> +++ b/xen/arch/arm/arm64/debug-pl011.inc
> > >> @@ -25,9 +25,9 @@
> > >>  */
> > >> .macro early_uart_init xb, c
> > >>         mov   x\c, #(7372800 / CONFIG_EARLY_UART_PL011_BAUD_RATE % 16)
> > >> -        strh  w\c, [\xb, #FBRD]      /* -> UARTFBRD (Baud divisor fraction) */
> > >> +        str   w\c, [\xb, #FBRD]      /* -> UARTFBRD (Baud divisor fraction) */
> > >>         mov   x\c, #(7372800 / CONFIG_EARLY_UART_PL011_BAUD_RATE / 16)
> > >> -        strh  w\c, [\xb, #IBRD]      /* -> UARTIBRD (Baud divisor integer) */
> > >> +        str   w\c, [\xb, #IBRD]      /* -> UARTIBRD (Baud divisor integer) */
> > >>         mov   x\c, #WLEN_8           /* 8n1 */
> > >>         str   w\c, [\xb, #LCR_H]     /* -> UARTLCR_H (Line control) */
> > >>         ldr   x\c, =(RXE | TXE | UARTEN)
> > >> @@ -41,7 +41,7 @@
> > >>  */
> > >> .macro early_uart_ready xb, c
> > >> 1:
> > >> -        ldrh  w\c, [\xb, #FR]        /* <- UARTFR (Flag register) */
> > >> +        ldr   w\c, [\xb, #FR]        /* <- UARTFR (Flag register) */
> > >>         tst   w\c, #BUSY             /* Check BUSY bit */
> > >>         b.ne  1b                     /* Wait for the UART to be ready */
> > >> .endm
> > >> @@ -52,7 +52,7 @@
> > >>  * wt: register which contains the character to transmit
> > >>  */
> > >> .macro early_uart_transmit xb, wt
> > >> -        strb  \wt, [\xb, #DR]        /* -> UARTDR (Data Register) */
> > >> +        str   \wt, [\xb, #DR]        /* -> UARTDR (Data Register) */
> > >
> > > Is it really ok to drop the 8bit access here ?
> > It is not only ok, it is necessary. Otherwise it won't work on the
> > above mentioned UARTs (they can only perform 32-bit access).
> >
> >
> > IIRC some compilers will complain because you use wN with “str”.
> Hmm, I would expect it to be totally ok as the size is determined by the
> reg name. Any reference?


I don’t have the spec with me. I will have a look on Monday and reply back
here.


>
> >
> >
> > And following to what I wrote in commit msg:
> > - we use str already in arm32 which results in 32-bit access
> >
> >
> > - we use readl/writel that end up as str/ldr in runtime driver
> >
> >
> > - we are down to SBSAv2 spec that runtime driver follows (meaning we
> > can use early printk for SBSA too)
> >
> >
> > The runtime driver is meant to follow the PL011 spec first and may have
> > some adaptation for SBSA.
> >
> >
> > - this way we support broader list of PL011s consistently (i.e. both
> > early and runtime driver works as opposed to only runtime)
> >
> >
> >  I am not sure I agree here. You are focussing on HW that only supports
> > 32-bit access. And, AFAICT this shouldn't be the norm.
> I'm focusing on supporting wider range of devices.
> At the moment Xen PL011 runtime makes 32-bit accesses while early code
> makes 8/16-bit accesses (arm32 uses 32-bit only as well).
> So my patch can only improve things and not make them worse. In case of
> some very old legacy device that cannot cope with 32-bit accesses,
> such device would not work anyway with the runtim
