date:20210125

Re: [PATCH v2.5 1/5] libxenguest: support zstd compressed kernels

2021-01-25 Thread Jan Beulich

On 25.01.2021 18:30, Ian Jackson wrote:
> Jan Beulich writes ("Re: [PATCH v2.5 1/5] libxenguest: support zstd 
> compressed kernels"):
>> On 25.01.2021 17:17, Ian Jackson wrote:
>>> I think we had concluded not to print a warning ?
>>
>> Yes. Even in the projected new form of using the construct I
>> don't intend to change the description's wording, as the
>> intended use of [true] still looks like that can't be intended
>> usage. IOW my remark extended beyond the warning; I'm sorry if
>> this did end up confusing because you were referring to just
>> the warning.
> 
> I'm afraid I don't understand what you mean.  In particular, what you
> mean by "the intended use of [true] still looks like that can't be
> intended usage".
> 
>   the intended {by whom for what purpose?} use of [true] still looks
>   like that {what?} can't be intended {by whom?} usage
> 
> I have the feeling that I have totally failed to grasp your mental
> model, which naturally underlies your comments.
> 
> Do you mean that with "true" for the 4th argument, the printed output
> is not correct, in the failure case ?  Maybe it needs a call to AC_MSG
> or something (but AIUI most of these PKG_* macros ought to do that for
> us).  I'm just guessing at your meaning here...

Well, I'm afraid I'm ending up confusing you because I'm confused
about the possible intentions here. My initial attempt to avoid
configure failing was to specify [] as the 4th argument. This, to
me, would have felt the half-way natural indication that I don't
mean anything to be done in the failure case, neither autoconf's
default nor anything else. [true], otoh, already feels like a
workaround for some shortcoming.

Anyway - I guess we should continue from v3, which I hope to post
later this morning.

Jan

Re: [PATCH v7 04/10] xen/memory: Add a vmtrace_buf resource type

2021-01-25 Thread Jan Beulich

On 25.01.2021 17:31, Jan Beulich wrote:
> On 21.01.2021 22:27, Andrew Cooper wrote:
>> --- a/xen/common/memory.c
>> +++ b/xen/common/memory.c
>> @@ -1068,11 +1068,35 @@ static unsigned int resource_max_frames(const struct domain *d,
>>      case XENMEM_resource_grant_table:
>>          return gnttab_resource_max_frames(d, id);
>>  
>> +    case XENMEM_resource_vmtrace_buf:
>> +        return d->vmtrace_frames;
>> +
>>      default:
>>          return arch_resource_max_frames(d, type, id);
>>      }
>>  }
>>  
>> +static int acquire_vmtrace_buf(
>> +    struct domain *d, unsigned int id, unsigned long frame,
>> +    unsigned int nr_frames, xen_pfn_t mfn_list[])
>> +{
>> +    const struct vcpu *v = domain_vcpu(d, id);
>> +    unsigned int i;
>> +    mfn_t mfn;
>> +
>> +    if ( !v || !v->vmtrace.buf ||
>> +         nr_frames > d->vmtrace_frames ||
>> +         (frame + nr_frames) > d->vmtrace_frames )
>> +        return -EINVAL;
> 
> 
> I think that for this to guard against overflow, the first nr_frames
> needs to be replaced by frame (as having the wider type), or else a
> very large value of frame coming in will not yield the intended
> -EINVAL.

Actually, besides this then wanting to be >= instead of >, this
wouldn't take care of the 32-bit case (or more generally the
sizeof(long) == sizeof(int) one). So I think you want

    if ( !v || !v->vmtrace.buf ||
         (frame + nr_frames) < frame ||
         (frame + nr_frames) > d->vmtrace_frames )
        return -EINVAL;

> If you agree, with this changed,
> Reviewed-by: Jan Beulich 

This holds.

Jan
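
For reference, the range check suggested above can be tried out in isolation. The following is only a minimal sketch: frame_range_ok() and its max_frames parameter are hypothetical names chosen for illustration, not part of the patch. It relies on unsigned addition wrapping around, so a sum smaller than one of its operands indicates overflow.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, for illustration only: returns true when the
 * range [frame, frame + nr_frames) fits within max_frames frames.
 */
static bool frame_range_ok(unsigned long frame, unsigned int nr_frames,
                           unsigned int max_frames)
{
    /* Unsigned addition wraps on overflow; this is well defined in C. */
    unsigned long end = frame + nr_frames;

    /* A wrapped sum is smaller than the original frame value. */
    if ( end < frame )
        return false;

    return end <= max_frames;
}
```

With max_frames of 4, a request for (frame 0, 4 frames) is accepted, (frame 1, 4 frames) is rejected, and a huge frame value whose sum wraps is rejected as well, independent of whether long is wider than int.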

[linux-5.4 test] 158616: regressions - FAIL

2021-01-25 Thread osstest service owner

flight 158616 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158616/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-dom0pvh-xl-intel  8 xen-boot fail REGR. vs. 158387

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 18 guest-start/debianhvm.repeat fail pass in 158609

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 158387
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 158387
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 158387
 test-armhf-armhf-libvirt 16 saverestore-support-check fail  like 158387
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 158387
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 158387
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 158387
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 158387
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 158387
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail  like 158387
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 158387
 test-amd64-i386-xl-pvshim 14 guest-start  fail   never pass
 test-arm64-arm64-xl  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-check fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail  never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail   never pass

version targeted for testing:
 linux                09f983f0c7fc0db79a5f6c883ec3510d424c369c
baseline version:
 linux                a829146c3fdcf6d0b76d9c54556a223820f1f73b

Last test of basis   158387  2021-01-12 19:40:06 Z   13 days
Failing since        158473  2021-01-17 13:42:20 Z    8 days   14 attempts
Testing same since   158593  2021-01-23 21:09:36 Z    2 days    4 attempts

Re: [PATCH] x86/xen: avoid warning in Xen pv guest with CONFIG_AMD_MEM_ENCRYPT enabled

2021-01-25 Thread Jürgen Groß


On 25.01.21 18:26, Andrew Cooper wrote:
> On 25/01/2021 14:00, Juergen Gross wrote:
>> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
>> index 4409306364dc..82948251f57b 100644
>> --- a/arch/x86/xen/enlighten_pv.c
>> +++ b/arch/x86/xen/enlighten_pv.c
>> @@ -583,6 +583,14 @@ DEFINE_IDTENTRY_RAW(xenpv_exc_debug)
>>          exc_debug(regs);
>>  }
>>  
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +DEFINE_IDTENTRY_RAW(xenpv_exc_vmm_communication)
>> +{
>> +    /* This should never happen and there is no way to handle it. */
>> +    panic("X86_TRAP_VC in Xen PV mode.");
>
> Honestly, exactly the same is true of #VE, #HV and #SX.
>
> What we do in the hypervisor is wire up one handler for all unknown
> exceptions (to avoid potential future #DF issues) leading to a panic.
> Wouldn't it be better to do this unconditionally, especially as #GP/#NP
> doesn't work for PV guests for unregistered callbacks, rather than
> fixing up piecewise like this?

I agree it would be better to have a "catch all unknown" handler.

I'll have a go and see what this would look like.


Juergen
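
To illustrate the "catch all unknown" idea discussed above, here is a minimal user-space sketch, not kernel code. All names (handlers, handle_unknown, dispatch, the TRAP_* codes) are hypothetical. Every vector starts out pointing at one fallback handler, and only known vectors are overridden, so no vector is ever left without a callback. Where this sketch returns TRAP_FATAL, a real implementation would presumably panic.

```c
#include <stddef.h>

#define NR_VECTORS 32   /* architecturally defined x86 exception vectors */

typedef int (*trap_handler_t)(unsigned int vector);

/* Hypothetical result codes, for illustration only. */
enum { TRAP_HANDLED = 0, TRAP_FATAL = -1 };

/* Fallback for every vector without a dedicated handler. */
static int handle_unknown(unsigned int vector)
{
    (void)vector;
    return TRAP_FATAL;
}

/* Stand-in for a known exception handler such as the #DB one. */
static int handle_debug(unsigned int vector)
{
    (void)vector;
    return TRAP_HANDLED;
}

static trap_handler_t handlers[NR_VECTORS];

static void init_handlers(void)
{
    /* Register the catch-all first, unconditionally, for all vectors. */
    for ( size_t i = 0; i < NR_VECTORS; i++ )
        handlers[i] = handle_unknown;

    /* Then override only the vectors that are actually handled. */
    handlers[1] = handle_debug;   /* #DB is vector 1 */
}

static int dispatch(unsigned int vector)
{
    if ( vector >= NR_VECTORS )
        return TRAP_FATAL;

    return handlers[vector](vector);
}
```

After init_handlers(), dispatch(1) reaches the dedicated handler, while vector 29 (#VC), like #VE, #HV and #SX, falls through to the catch-all without needing any per-exception #ifdef.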




[qemu-mainline test] 158613: regressions - FAIL

2021-01-25 Thread osstest service owner

flight 158613 qemu-mainline real [real]
flight 158621 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/158613/
http://logs.test-lab.xenproject.org/osstest/logs/158621/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeat fail REGR. vs. 152631
 test-amd64-amd64-xl-qcow2   21 guest-start/debian.repeat fail REGR. vs. 152631
 test-armhf-armhf-xl-vhd 17 guest-start/debian.repeat fail REGR. vs. 152631

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 152631
 test-armhf-armhf-libvirt 16 saverestore-support-check fail  like 152631
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 152631
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail  like 152631
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 152631
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 152631
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 152631
 test-amd64-i386-xl-pvshim 14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-check fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail   never pass
 test-arm64-arm64-xl  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail   never pass

version targeted for testing:
 qemuu                e81eb5e6d108008445821e4f891fb9563016c71b
baseline version:
 qemuu                1d806cef0e38b5db8347a8e12f214d543204a314

Last test of basis   152631  2020-08-20 09:07:46 Z  158 days
Failing since        152659  2020-08-21 14:07:39 Z  157 days  322 attempts
Testing same since   158606  2021-01-24 19:38:17 Z    1 days    2 attempts


363 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm

Re: Question about xen and Rasp 4B

2021-01-25 Thread Stefano Stabellini

On Sat, 23 Jan 2021, Jukka Kaartinen wrote:
> Thanks for the response!
> 
> On Sat, Jan 23, 2021 at 2:27 AM Stefano Stabellini wrote:
>   + xen-devel, Roman,
> 
> 
>   On Fri, 22 Jan 2021, Jukka Kaartinen wrote:
>   > Hi Stefano,
>   > I'm Jukka Kaartinen a SW developer working on enabling hypervisors on 
> mobile platforms. One of our HW that we use on
>   development is
>   > Raspberry Pi 4B. I wonder if you could help me a bit :).
>   >
>   > I'm trying to enable the GPU with Xen + Raspberry Pi for
>   dom0. 
> https://www.raspberrypi.org/forums/viewtopic.php?f=63=232323#p1797605
>   >
>   > I got so far that GPU drivers are loaded (v3d & vc4) without errors. 
> But now Xen returns error when X is starting:
>   > (XEN) traps.c:1986:d0v1 HSR=0x93880045 pc=0x7f97b14e70 
> gva=0x7f7f817000 gpa=0x401315d000
>   >  I tried to debug what causes this and looks like find_mmio_handler 
> cannot find handler.
>   > (See more here: 
> https://www.raspberrypi.org/forums/viewtopic.php?f=63=232323=25#p1801691
>  )
>   >
>   > Any ideas why the handler is not found?
> 
> 
>   Hi Jukka,
> 
>   I am glad to hear that you are interested in Xen on RaspberryPi :-)  I
>   haven't tried the GPU yet, I have been using the serial only.
>   Roman, did you ever get the GPU working?
> 
> 
>   The error is a data abort error: Linux is trying to access an address
>   which is not mapped to dom0. The address seems to be 0x401315d000. It is
>   a pretty high address; I looked in device tree but couldn't spot it.
> 
>   From the HSR (the syndrome register) it looks like it is a translation
>   fault at EL1 on stage1. As if the Linux address mapping was wrong.
>   Does anyone have any idea how this could happen? Maybe a reserved-memory
>   misconfiguration?
> 
> I had issues with loading the driver in the first place. Apparently swiotlb 
> is used, maybe it can cause this. I also tried to enable CMA.
> config.txt:
> dtoverlay=vc4-fkms-v3d,cma=320M@0x0-0x4000
> gpu_mem=128

Also looking at your other reply and the implementation of
vc4_bo_create, it looks like this is a CMA problem.

It would be good to run a test with the swiotlb-xen disabled:

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 467fa225c3d0..2bdd12785d14 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -138,8 +138,7 @@ void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
 static int __init xen_mm_init(void)
 {
    struct gnttab_cache_flush cflush;
-   if (!xen_initial_domain())
-       return 0;
+   return 0;
    xen_swiotlb_init(1, false);
 
    cflush.op = 0;

It is going to be fine just to boot Dom0 and DomUs without PV drivers.
Also, can you post the device tree that you are using here? Just in case
there is an issue with Xen parsing any possible /reserved-memory nodes
with CMA info that need to be passed on to Dom0.


>   > p.s.
>   > While testing I found issue with Xen master branch and your patch: 
> xen/rpi4: implement watchdog-based reset
>   >
>   > Looks like black listing the bcm2835-pm
>   > @@ -37,12 +41,69 @@ static const struct dt_device_match 
> rpi4_blacklist_dev[] __initconst =
>   >       * The aux peripheral also shares a page with the aux UART.
>   >       */
>   >      DT_MATCH_COMPATIBLE("brcm,bcm2835-aux"),
>   > +    /* Special device used for rebooting */
>   > +    DT_MATCH_COMPATIBLE("brcm,bcm2835-pm"),
>   >
>   > will prevent v3d driver to locate phandle. I think it will use the 
> same resource:
>   >   pm: watchdog@7e10 {
>   >       compatible = "brcm,bcm2835-pm", "brcm,bcm2835-pm-wdt";
>   > #power-domain-cells = <1>;
>   > #reset-cells = <1>;
>   > reg = <0x7e10 0x114>,
>   >      <0x7e00a000 0x24>,
>   >      <0x7ec11000 0x20>;
>   > clocks = < BCM2835_CLOCK_V3D>,
>   > < BCM2835_CLOCK_PERI_IMAGE>,
>   > < BCM2835_CLOCK_H264>,
>   > < BCM2835_CLOCK_ISP>;
>   > clock-names = "v3d", "peri_image", "h264", "isp";
>   > system-power-controller;
>   >
>   > };
> 
>   Yeah, I imagine it could be possible. Can you post the error message you
>   are seeing from the v3d driver?
> 
> This is the error:
> [    0.069682] OF: /v3dbus/v3d@7ec04000: could not find phandle
> [    0.074828] OF: /v3dbus/v3d@7ec04000: could not find phandle
> v3d driver is not loaded.
> 
> --
> Br,
> Jukka Kaartinen
> 
>
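
As a side note to the trap message quoted above, the HSR value can be split into its fields mechanically. The sketch below assumes the AArch64 ESR_EL2 data-abort layout from the Arm architecture manual (EC in bits [31:26], ISV in bit 24, SAS in [23:22], SRT in [20:16], S1PTW in bit 7, WnR in bit 6, DFSC in [5:0]). The struct and function names are hypothetical and are not taken from Xen.

```c
#include <stdint.h>

/* Selected syndrome fields of a data abort (hypothetical struct). */
struct dabt_syndrome {
    unsigned int ec;     /* [31:26] exception class */
    unsigned int isv;    /* [24]    instruction syndrome valid */
    unsigned int sas;    /* [23:22] access size, log2 of bytes */
    unsigned int srt;    /* [20:16] register used by the access */
    unsigned int s1ptw;  /* [7]     fault on a stage 1 table walk */
    unsigned int wnr;    /* [6]     1 = write, 0 = read */
    unsigned int dfsc;   /* [5:0]   data fault status code */
};

static struct dabt_syndrome decode_dabt(uint32_t hsr)
{
    struct dabt_syndrome s;

    s.ec    = (hsr >> 26) & 0x3f;
    s.isv   = (hsr >> 24) & 0x1;
    s.sas   = (hsr >> 22) & 0x3;
    s.srt   = (hsr >> 16) & 0x1f;
    s.s1ptw = (hsr >> 7) & 0x1;
    s.wnr   = (hsr >> 6) & 0x1;
    s.dfsc  = hsr & 0x3f;

    return s;
}
```

For HSR=0x93880045 this gives EC 0x24 (data abort from a lower exception level), a valid syndrome for a 32-bit write from register 8, S1PTW clear, and DFSC 0x05, which encodes a translation fault at level 1.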

[XTF] Add Argo test

2021-01-25 Thread Christopher Clark

Simple test cases for the four Argo operations, register, unregister,
sendv and notify exercised with a single test domain.
Add infrastructure to access Argo: a 5-argument hypercall, number 39.

Signed-off-by: Christopher Clark 
---
 arch/x86/hypercall_page.S                |   2 +-
 arch/x86/include/arch/hypercall-x86_32.h |  13 +
 arch/x86/include/arch/hypercall-x86_64.h |  14 +
 docs/all-tests.dox                       |   2 +
 include/xen/argo.h                       | 255 
 include/xen/xen.h                        |  11 +-
 include/xtf/hypercall.h                  |   8 +
 include/xtf/numbers.h                    |   5 +
 tests/argo/Makefile                      |   9 +
 tests/argo/main.c                        | 360 +++
 10 files changed, 677 insertions(+), 2 deletions(-)
 create mode 100644 include/xen/argo.h
 create mode 100644 tests/argo/Makefile
 create mode 100644 tests/argo/main.c

diff --git a/arch/x86/hypercall_page.S b/arch/x86/hypercall_page.S
index b77c1b9..cc6ddc2 100644
--- a/arch/x86/hypercall_page.S
+++ b/arch/x86/hypercall_page.S
@@ -59,7 +59,7 @@ DECLARE_HYPERCALL(sysctl)
 DECLARE_HYPERCALL(domctl)
 DECLARE_HYPERCALL(kexec_op)
 DECLARE_HYPERCALL(tmem_op)
-DECLARE_HYPERCALL(xc_reserved_op)
+DECLARE_HYPERCALL(argo_op)
 DECLARE_HYPERCALL(xenpmu_op)
 
 DECLARE_HYPERCALL(arch_0)
diff --git a/arch/x86/include/arch/hypercall-x86_32.h b/arch/x86/include/arch/hypercall-x86_32.h
index 34a7026..f372291 100644
--- a/arch/x86/include/arch/hypercall-x86_32.h
+++ b/arch/x86/include/arch/hypercall-x86_32.h
@@ -53,6 +53,19 @@
 (type)res;  \
 })
 
+#define _hypercall32_5(type, hcall, a1, a2, a3, a4, a5)                 \
+    ({                                                                  \
+        long res, _a1 = (long)(a1), _a2 = (long)(a2), _a3 = (long)(a3), \
+            _a4 = (long)(a4), _a5 = (long)(a5);                         \
+        asm volatile (                                                  \
+            "call hypercall_page + %c[offset]"                          \
+            : "=a" (res), "+b" (_a1), "+c" (_a2), "+d" (_a3),           \
+              "+S" (_a4), "+D" (_a5)                                    \
+            : [offset] "i" (hcall * 32)                                 \
+            : "memory" );                                               \
+        (type)res;                                                      \
+    })
+
 #endif /* XTF_X86_32_HYPERCALL_H */
 
 /*
diff --git a/arch/x86/include/arch/hypercall-x86_64.h b/arch/x86/include/arch/hypercall-x86_64.h
index d283ad3..728bf74 100644
--- a/arch/x86/include/arch/hypercall-x86_64.h
+++ b/arch/x86/include/arch/hypercall-x86_64.h
@@ -53,6 +53,20 @@
 (type)res;  \
 })
 
+#define _hypercall64_5(type, hcall, a1, a2, a3, a4, a5)                 \
+    ({                                                                  \
+        long res, _a1 = (long)(a1), _a2 = (long)(a2), _a3 = (long)(a3); \
+        register long _a4 asm ("r10") = (long)(a4);                     \
+        register long _a5 asm ("r8") = (long)(a5);                      \
+        asm volatile (                                                  \
+            "call hypercall_page + %c[offset]"                          \
+            : "=a" (res), "+D" (_a1), "+S" (_a2), "+d" (_a3),           \
+              "+r" (_a4), "+r" (_a5)                                    \
+            : [offset] "i" (hcall * 32)                                 \
+            : "memory" );                                               \
+        (type)res;                                                      \
+    })
+
 #endif /* XTF_X86_64_HYPERCALL_H */
 
 /*
diff --git a/docs/all-tests.dox b/docs/all-tests.dox
index 902fc44..bed674c 100644
--- a/docs/all-tests.dox
+++ b/docs/all-tests.dox
@@ -164,6 +164,8 @@ states.
 
 @section index-utility Utilities
 
+@subpage test-argo - Argo functionality test
+
 @subpage test-cpuid - Print CPUID information.
 
 @subpage test-fep - Test availability of HVM Forced Emulation Prefix.
diff --git a/include/xen/argo.h b/include/xen/argo.h
new file mode 100644
index 000..58ff439
--- /dev/null
+++ b/include/xen/argo.h
@@ -0,0 +1,255 @@
+/**
+ * Argo : Hypervisor-Mediated data eXchange
+ *
+ * Derived from v4v, the version 2 of v2v.
+ *
+ * Copyright (c) 2010, Citrix Systems
+ * Copyright (c) 2018-2019, BAE Systems
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to

Re: [PATCH V5 00/22] IOREQ feature (+ virtio-mmio) on Arm

2021-01-25 Thread Oleksandr




On 26.01.21 01:20, Julien Grall wrote:

Hi Julien, Stefano

> On Mon, 25 Jan 2021 at 20:56, Stefano Stabellini wrote:
>> Julien,
>
> Hi,
>
>> This seems to be an arm randconfig failure:
>>
>> https://gitlab.com/xen-project/patchew/xen/-/pipelines/246632953
>> https://gitlab.com/xen-project/patchew/xen/-/jobs/985455044
>
> Thanks! The error is:
>
> #'target_mem_ref' not supported by expression#'memory.c: In function
> 'do_memory_op':
> memory.c:1210:18: error:  may be used uninitialized in this function
> [-Werror=maybe-uninitialized]
>  1210 | rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
>       |  ^~~~
>  1211 |_mfn(mfn_list[i]));
>       |~~
>
> I found a few references online of the error message, but it is not
> clear what it means. From a quick look at Oleksandr's branch, I also
> can't spot anything uninitialized. Any ideas?

It seems that the error happens only if *both* CONFIG_GRANT_TABLE and
CONFIG_IOREQ_SERVER are disabled. It looks like mfn_list is
initialized either in acquire_grant_table() or in acquire_ioreq_server().
If these options are disabled, the corresponding helpers are just stubs, so
mfn_list is indeed left uninitialized. But I am not sure why gcc
complains about it, as set_foreign_p2m_entry() is *not* going to be
called in that case.
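
The pattern described above can be reproduced in a reduced form. The following is only a sketch under stated assumptions: acquire_stub(), acquire_real() and acquire_and_sum() are hypothetical stand-ins, not the Xen functions, and zero-initializing the array is merely one conservative way to keep the compiler quiet, not necessarily what the series should do. The stub reports failure without ever writing the output array, and the caller reads the array only on success.

```c
#include <errno.h>
#include <stddef.h>

#define NR 4

/*
 * Stand-in for a helper compiled out by Kconfig: it reports failure
 * and never writes to mfn_list.
 */
static int acquire_stub(unsigned long mfn_list[], size_t n)
{
    (void)mfn_list;
    (void)n;
    return -EOPNOTSUPP;
}

/* Stand-in for a real helper: fills in every element on success. */
static int acquire_real(unsigned long mfn_list[], size_t n)
{
    for ( size_t i = 0; i < n; i++ )
        mfn_list[i] = 0x1000 + i;

    return 0;
}

static int acquire_and_sum(int use_real, unsigned long *sum)
{
    /* Zero-initialized so no path can observe indeterminate values. */
    unsigned long mfn_list[NR] = { 0 };
    int rc = use_real ? acquire_real(mfn_list, NR)
                      : acquire_stub(mfn_list, NR);

    if ( rc )
        return rc;   /* the array is never read on the failure path */

    *sum = 0;
    for ( size_t i = 0; i < NR; i++ )
        *sum += mfn_list[i];

    return 0;
}
```

Even though the failure path returns before the array is read, a compiler that inlines the stub may not always prove this, which is why -Wmaybe-uninitialized is documented as prone to false positives.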




--
Regards,

Oleksandr Tyshchenko

Re: [PATCH V5 04/22] xen/ioreq: Make x86's IOREQ feature common

2021-01-25 Thread Oleksandr




On 26.01.21 01:13, Julien Grall wrote:
> Hi,

Hi Julien

> On Mon, 25 Jan 2021 at 19:09, Oleksandr Tyshchenko wrote:
>> ***
>> Please note, this patch depends on the following which is
>> on review:
>> https://patchwork.kernel.org/patch/11816689/
>> The effort (to get it upstreamed) was paused because of
>> the security issue around that code (XSA-348).
>> ***
>
> I read this comment as "This series should be applied on top the patch
> X". However, looking at your branch, I can't see the patch. What did I
> miss?

You didn't miss anything. The patch series doesn't contain it. I mentioned
this patch so as not to forget about it and to draw reviewers' attention
to it. It looks like the activity (to get it upstreamed) hasn't been
resumed yet, and I don't know what we should do with that dependency
in the context of this series...

> Cheers,


--
Regards,

Oleksandr Tyshchenko

Re: [PATCH V5 00/22] IOREQ feature (+ virtio-mmio) on Arm

2021-01-25 Thread Julien Grall

On Mon, 25 Jan 2021 at 20:56, Stefano Stabellini  wrote:
>
> Julien,

Hi,

>
> This seems to be an arm randconfig failure:
>
> https://gitlab.com/xen-project/patchew/xen/-/pipelines/246632953
> https://gitlab.com/xen-project/patchew/xen/-/jobs/985455044

Thanks! The error is:

#'target_mem_ref' not supported by expression#'memory.c: In function
'do_memory_op':
memory.c:1210:18: error:  may be used uninitialized in this function
[-Werror=maybe-uninitialized]
 1210 | rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
  |  ^~~~
 1211 |_mfn(mfn_list[i]));
  |~~

I found a few references online of the error message, but it is not
clear what it means. From a quick look at Oleksandr's branch, I also
can't spot anything uninitialized. Any ideas?

Cheers,

[xen-unstable-smoke test] 158618: tolerable all pass - PUSHED

2021-01-25 Thread osstest service owner

flight 158618 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158618/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass

version targeted for testing:
 xen  25fcedefaa9fcbd20203202aa1b73eef051a5fa9
baseline version:
 xen  1d24e551b99a85f50c69e72b7828a7d6c4c4e7a5

Last test of basis   158614  2021-01-25 18:02:33 Z    0 days
Testing same since   158618  2021-01-25 21:01:28 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Paul Durrant 
  Wei Liu 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   1d24e551b9..25fcedefaa  25fcedefaa9fcbd20203202aa1b73eef051a5fa9 -> smoke

Re: [PATCH V5 04/22] xen/ioreq: Make x86's IOREQ feature common

2021-01-25 Thread Julien Grall

Hi,

On Mon, 25 Jan 2021 at 19:09, Oleksandr Tyshchenko  wrote:
> ***
> Please note, this patch depends on the following which is
> on review:
> https://patchwork.kernel.org/patch/11816689/
> The effort (to get it upstreamed) was paused because of
> the security issue around that code (XSA-348).
> ***

I read this comment as "This series should be applied on top the patch
X". However, looking at your branch, I can't see the patch. What did I
miss?

Cheers,

[linux-linus test] 158611: regressions - FAIL

2021-01-25 Thread osstest service owner

flight 158611 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158611/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-xsm    7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt   7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-coresched-i386-xl  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-pair 10 xen-install/src_host fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-pair 11 xen-install/dst_host fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-install    fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-install    fail REGR. vs. 152332
 test-amd64-i386-xl        7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm   7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-raw    7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim 7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-xl-shadow 7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386  7 xen-install    fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-amd64-xl-multivcpu 14 guest-start fail REGR. vs. 152332
 test-amd64-amd64-xl-pvshim   14 guest-start  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 10 xen-install/src_host fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 11 xen-install/dst_host fail REGR. vs. 152332
 test-amd64-amd64-xl  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-pvhv2-intel 14 guest-start   fail REGR. vs. 152332
 test-amd64-amd64-xl-credit2  14 guest-start  fail REGR. vs. 152332
 test-arm64-arm64-xl-seattle  10 host-ping-check-xen  fail REGR. vs. 152332
 test-amd64-amd64-dom0pvh-xl-amd 14 guest-start   fail REGR. vs. 152332
 test-amd64-amd64-xl-shadow   14 guest-start  fail REGR. vs. 152332
 test-arm64-arm64-xl-xsm  10 host-ping-check-xen  fail REGR. vs. 152332
 test-amd64-i386-examine   6 xen-install  fail REGR. vs. 152332
 test-amd64-coresched-amd64-xl 14 guest-start fail REGR. vs. 152332
 test-amd64-amd64-xl-pvhv2-amd 14 guest-start fail REGR. vs. 152332
 test-amd64-amd64-qemuu-freebsd11-amd64 13 guest-start    fail REGR. vs. 152332
 test-amd64-amd64-libvirt-xsm 14 guest-start  fail REGR. vs. 152332
 test-arm64-arm64-examine  8 reboot   fail REGR. vs. 152332
 test-amd64-amd64-libvirt 14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-credit1  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-xsm  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-dom0pvh-xl-intel 14 guest-start fail REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub 12 debian-di-install   fail REGR. vs. 152332
 test-amd64-amd64-qemuu-freebsd12-amd64 13 guest-start    fail REGR. vs. 152332
 test-amd64-amd64-libvirt-pair 25 guest-start/debian  fail REGR. vs. 152332
 test-arm64-arm64-xl-credit1   8 xen-boot fail REGR. vs. 152332
 test-arm64-arm64-xl-credit2   8 xen-boot fail REGR. vs. 152332
 test-amd64-amd64-qemuu-nested-intel 12 debian-hvm-install fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub 12 debian-di-install    fail REGR. vs. 152332
 test-amd64-amd64-xl-qemut-debianhvm-amd64 12 debian-hvm-install fail REGR. vs. 152332
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 12 debian-hvm-install fail REGR. vs. 152332
 test-amd64-amd64-pygrub

[PATCH v4 1/2] xen: EXPERT clean-up and introduce UNSUPPORTED

2021-01-25 Thread Stefano Stabellini

A recent thread [1] has exposed a couple of issues with our current way
of handling EXPERT.

1) It is not obvious that "Configure standard Xen features (expert
users)" is actually the famous EXPERT we keep talking about on xen-devel

2) It is not obvious when we need to enable EXPERT to get a specific
feature

In particular if you want to enable ACPI support so that you can boot
Xen on an ACPI platform, you have to enable EXPERT first. But searching
through the kconfig menu it is really not clear (type '/' and "ACPI"):
nothing in the description tells you that you need to enable EXPERT to
get the option.

So this patch makes things easier by doing the following:

- introduce a new kconfig option UNSUPPORTED, whose purpose is clearly
  to enable UNSUPPORTED features as defined by SUPPORT.md

- change EXPERT options to UNSUPPORTED where it makes sense; keep
  depending on EXPERT for features made for experts

- tag unsupported features by adding (UNSUPPORTED) to the one-line
  description

- clarify the EXPERT one-line description

[1] https://marc.info/?l=xen-devel&m=160333101228981

Signed-off-by: Stefano Stabellini 
CC: andrew.coop...@citrix.com
CC: george.dun...@citrix.com
CC: i...@xenproject.org
CC: jbeul...@suse.com
CC: jul...@xen.org
CC: w...@xen.org

---
Changes in v4:
- clarify support statement of UNSUPPORTED
- move UNSUPPORTED past EXPERT
- add default EXPERT to UNSUPPORTED

Changes in v3:
- improve UNSUPPORTED text description
- avoid changing XEN_SHSTK and EFI_SET_VIRTUAL_ADDRESS_MAP
- update HVM_FEP to be UNSUPPORTED

Changes in v2:
- introduce UNSUPPORTED
- don't switch all EXPERT options to UNSUPPORTED

See as reference the v2 thread here:
https://marc.info/?l=xen-devel&m=160566066013723
---
 xen/Kconfig  | 11 ++-
 xen/arch/arm/Kconfig | 10 +-
 xen/arch/x86/Kconfig |  6 +++---
 xen/common/Kconfig   |  2 +-
 xen/common/sched/Kconfig |  6 +++---
 5 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/xen/Kconfig b/xen/Kconfig
index 34c318bfa2..bcbd2758e5 100644
--- a/xen/Kconfig
+++ b/xen/Kconfig
@@ -35,7 +35,7 @@ config DEFCONFIG_LIST
default ARCH_DEFCONFIG
 
 config EXPERT
-   bool "Configure standard Xen features (expert users)"
+   bool "Configure EXPERT features"
help
  This option allows certain base Xen options and settings
  to be disabled or tweaked. This is for specialized environments
@@ -45,6 +45,15 @@ config EXPERT
  supported.
default n
 
+config UNSUPPORTED
+   bool "Configure UNSUPPORTED features"
+   default EXPERT
+   help
+ This option allows certain unsupported Xen options to be changed,
+ which includes non-security-supported, experimental, and tech
+ preview features as defined by SUPPORT.md. (Note that if an option
+ doesn't depend on UNSUPPORTED it doesn't imply that is supported.)
+
 config LTO
bool "Link Time Optimisation"
depends on BROKEN
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index c3eb13ea73..cca76040e5 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -32,7 +32,7 @@ menu "Architecture Features"
 source "arch/Kconfig"
 
 config ACPI
-   bool "ACPI (Advanced Configuration and Power Interface) Support" if EXPERT
+   bool "ACPI (Advanced Configuration and Power Interface) Support (UNSUPPORTED)" if UNSUPPORTED
depends on ARM_64
---help---
 
@@ -49,7 +49,7 @@ config GICV3
  If unsure, say Y
 
 config HAS_ITS
-bool "GICv3 ITS MSI controller support" if EXPERT
+bool "GICv3 ITS MSI controller support (UNSUPPORTED)" if UNSUPPORTED
 depends on GICV3 && !NEW_VGIC
 
 config HVM
@@ -77,7 +77,7 @@ config SBSA_VUART_CONSOLE
  SBSA Generic UART implements a subset of ARM PL011 UART.
 
 config ARM_SSBD
-   bool "Speculative Store Bypass Disable" if EXPERT
+   bool "Speculative Store Bypass Disable (UNSUPPORTED)" if UNSUPPORTED
depends on HAS_ALTERNATIVE
default y
help
@@ -87,7 +87,7 @@ config ARM_SSBD
  If unsure, say Y.
 
 config HARDEN_BRANCH_PREDICTOR
-   bool "Harden the branch predictor against aliasing attacks" if EXPERT
+   bool "Harden the branch predictor against aliasing attacks (UNSUPPORTED)" if UNSUPPORTED
default y
help
  Speculation attacks against some high-performance processors rely on
@@ -104,7 +104,7 @@ config HARDEN_BRANCH_PREDICTOR
  If unsure, say Y.
 
 config TEE
-   bool "Enable TEE mediators support" if EXPERT
+   bool "Enable TEE mediators support (UNSUPPORTED)" if UNSUPPORTED
default n
help
  This option enables generic TEE mediators support. It allows guests
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 78f351f94b..302334d3e4 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -147,7 +147,7 @@ config BIGMEM
  If unsure, say N.
 
 config HVM_FEP
-   bool "HVM

[PATCH v4 2/2] xen: add (EXPERT) to one-line descriptions when appropriate

2021-01-25 Thread Stefano Stabellini

Add an "(EXPERT)" tag to the one-line description of Kconfig options
that depend on EXPERT.

Signed-off-by: Stefano Stabellini 
CC: andrew.coop...@citrix.com
CC: george.dun...@citrix.com
CC: i...@xenproject.org
CC: jbeul...@suse.com
CC: jul...@xen.org
CC: w...@xen.org

---
Changes in v4:
- new patch
---
 xen/arch/x86/Kconfig |  2 +-
 xen/common/Kconfig   | 12 ++--
 xen/common/sched/Kconfig |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 302334d3e4..3f630b89e8 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -103,7 +103,7 @@ config HVM
  If unsure, say Y.
 
 config XEN_SHSTK
-   bool "Supervisor Shadow Stacks"
+   bool "Supervisor Shadow Stacks (EXPERT)"
depends on HAS_AS_CET_SS && EXPERT
default y
---help---
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 39451e8350..b49127463d 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -12,7 +12,7 @@ config CORE_PARKING
bool
 
 config GRANT_TABLE
-   bool "Grant table support" if EXPERT
+   bool "Grant table support (EXPERT)" if EXPERT
default y
---help---
  Grant table provides a generic mechanism to memory sharing
@@ -151,7 +151,7 @@ config KEXEC
  If unsure, say Y.
 
 config EFI_SET_VIRTUAL_ADDRESS_MAP
-bool "EFI: call SetVirtualAddressMap()" if EXPERT
+bool "EFI: call SetVirtualAddressMap() (EXPERT)" if EXPERT
 ---help---
   Call EFI SetVirtualAddressMap() runtime service to setup memory map for
   further runtime services. According to UEFI spec, it isn't strictly
@@ -162,7 +162,7 @@ config EFI_SET_VIRTUAL_ADDRESS_MAP
 
 config XENOPROF
def_bool y
-   prompt "Xen Oprofile Support" if EXPERT
+   prompt "Xen Oprofile Support (EXPERT)" if EXPERT
depends on X86
---help---
  Xen OProfile (Xenoprof) is a system-wide profiler for Xen virtual
@@ -199,7 +199,7 @@ config XSM_FLASK
 
 config XSM_FLASK_AVC_STATS
def_bool y
-   prompt "Maintain statistics on the FLASK access vector cache" if EXPERT
+   prompt "Maintain statistics on the FLASK access vector cache (EXPERT)" if EXPERT
depends on XSM_FLASK
---help---
  Maintain counters on the access vector cache that can be viewed using
@@ -344,7 +344,7 @@ config SUPPRESS_DUPLICATE_SYMBOL_WARNINGS
  build becoming overly verbose.
 
 config CMDLINE
-   string "Built-in hypervisor command string" if EXPERT
+   string "Built-in hypervisor command string (EXPERT)" if EXPERT
default ""
---help---
  Enter arguments here that should be compiled into the hypervisor
@@ -377,7 +377,7 @@ config DOM0_MEM
  Leave empty if you are not sure what to specify.
 
 config TRACEBUFFER
-   bool "Enable tracing infrastructure" if EXPERT
+   bool "Enable tracing infrastructure (EXPERT)" if EXPERT
default y
---help---
  Enable tracing infrastructure and pre-defined tracepoints within Xen.
diff --git a/xen/common/sched/Kconfig b/xen/common/sched/Kconfig
index 94c9e20139..b6020a83c6 100644
--- a/xen/common/sched/Kconfig
+++ b/xen/common/sched/Kconfig
@@ -1,4 +1,4 @@
-menu "Schedulers"
+menu "Schedulers (EXPERT)"
visible if EXPERT
 
 config SCHED_CREDIT
-- 
2.17.1

[PATCH v4 0/2] introduce UNSUPPORTED

2021-01-25 Thread Stefano Stabellini

Hi all,

A recent thread [1] has exposed a couple of issues with our current way
of handling EXPERT.

1) It is not obvious that "Configure standard Xen features (expert
users)" is actually the famous EXPERT we keep talking about on xen-devel

2) It is not obvious when we need to enable EXPERT to get a specific
feature

In particular if you want to enable ACPI support so that you can boot
Xen on an ACPI platform, you have to enable EXPERT first. But searching
through the kconfig menu it is really not clear (type '/' and "ACPI"):
nothing in the description tells you that you need to enable EXPERT to
get the option.

This series makes things easier by doing the following:

- introduce a new kconfig option UNSUPPORTED, whose purpose is clearly
  to enable UNSUPPORTED features as defined by SUPPORT.md

- change EXPERT options to UNSUPPORTED where it makes sense; keep
  depending on EXPERT for features made for experts

- tag unsupported features by adding (UNSUPPORTED) to the one-line
  description

- clarify the EXPERT one-line description

[1] https://marc.info/?l=xen-devel&m=160333101228981

Cheers,

Stefano



Stefano Stabellini (2):
  xen: EXPERT clean-up and introduce UNSUPPORTED
  xen: add (EXPERT) to one-line descriptions when appropriate

 xen/Kconfig  | 11 ++-
 xen/arch/arm/Kconfig | 10 +-
 xen/arch/x86/Kconfig |  8 
 xen/common/Kconfig   | 14 +++---
 xen/common/sched/Kconfig |  8 
 5 files changed, 30 insertions(+), 21 deletions(-)

Re: [PATCH v3] xen: EXPERT clean-up and introduce UNSUPPORTED

2021-01-25 Thread Stefano Stabellini

On Mon, 25 Jan 2021, Bertrand Marquis wrote:
> Hi Stefano,
> 
> > On 23 Jan 2021, at 02:19, Stefano Stabellini  wrote:
> > 
> > A recent thread [1] has exposed a couple of issues with our current way
> > of handling EXPERT.
> > 
> > 1) It is not obvious that "Configure standard Xen features (expert
> > users)" is actually the famous EXPERT we keep talking about on xen-devel
> > 
> > 2) It is not obvious when we need to enable EXPERT to get a specific
> > feature
> > 
> > In particular if you want to enable ACPI support so that you can boot
> > Xen on an ACPI platform, you have to enable EXPERT first. But searching
> > through the kconfig menu it is really not clear (type '/' and "ACPI"):
> > nothing in the description tells you that you need to enable EXPERT to
> > get the option.
> > 
> > So this patch makes things easier by doing two things:
> > 
> > - introduce a new kconfig option UNSUPPORTED which is clearly to enable
> >  UNSUPPORTED features as defined by SUPPORT.md
> 
> That’s a great change which will improve user experience.

Thank you!


> > - change EXPERT options to UNSUPPORTED where it makes sense: keep
> >  depending on EXPERT for features made for experts
> > 
> > - tag unsupported features by adding (UNSUPPORTED) to the one-line
> >  description
> > 
> 
> Shouldn’t we add  (EXPERT) for expert options in the same way for coherency ?

I take it you mean adding "(EXPERT)" to the one-line description in
kconfig. I am OK with that, though it is probably better as a second,
separate patch. I'll add it.

Re: [PATCH v3] xen: EXPERT clean-up and introduce UNSUPPORTED

2021-01-25 Thread Stefano Stabellini

On Mon, 25 Jan 2021, Jan Beulich wrote:
> On 23.01.2021 03:19, Stefano Stabellini wrote:
> > --- a/xen/Kconfig
> > +++ b/xen/Kconfig
> > @@ -34,8 +34,15 @@ config DEFCONFIG_LIST
> > option defconfig_list
> > default ARCH_DEFCONFIG
> >  
> > +config UNSUPPORTED
> > +   bool "Configure UNSUPPORTED features"
> > +   help
> > + This option allows certain unsupported Xen options to be changed,
> > + which includes non-security-supported, experimental, and tech
> > + preview features as defined by SUPPORT.md.
> 
> And by implication anything not depending on UNSUPPORTED is
> supported? I didn't think this was the case (some unsupported
> code can't even be turned off via Kconfig), so I think this
> needs clarifying here, so we won't end up with people
> considering some feature supported which really isn't. That's
> irrespective of the reference to SUPPORT.md.

I'll clarify.


> >  config EXPERT
> > -   bool "Configure standard Xen features (expert users)"
> > +   bool "Configure EXPERT features"
> > help
> >   This option allows certain base Xen options and settings
> >   to be disabled or tweaked. This is for specialized environments
> 
> I'd like to suggest to move UNSUPPORTED past this one, to
> then have that have "default EXPERT".

Sure, good idea.
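
As a rough illustration of the ordering suggestion above, the effect of
giving UNSUPPORTED a "default EXPERT", and of guarding a prompt with
"if UNSUPPORTED", can be modelled in a few lines of Python. This is only
a sketch of how Kconfig treats bool symbols, not Kconfig itself: the
resolve() and configure() helpers and their parameters are hypothetical
names made up for the example, and "depends on" / "select" are ignored.

```python
def resolve(default, prompt_visible, user_choice=None):
    """Model how a Kconfig bool symbol gets its value.

    If the prompt is visible and the user answered, the answer wins;
    otherwise the symbol silently takes its default.
    (Simplified: 'depends on' and 'select' are ignored.)
    """
    if prompt_visible and user_choice is not None:
        return user_choice
    return default


def configure(expert, unsupported_choice=None, acpi_choice=None):
    # UNSUPPORTED has an unconditional prompt and "default EXPERT".
    unsupported = resolve(default=expert, prompt_visible=True,
                          user_choice=unsupported_choice)
    # The ACPI prompt is only shown "if UNSUPPORTED"; with no default
    # given, a hidden bool stays off.
    acpi = resolve(default=False, prompt_visible=unsupported,
                   user_choice=acpi_choice)
    return {"EXPERT": expert, "UNSUPPORTED": unsupported, "ACPI": acpi}
```

With this model, enabling EXPERT alone also turns on UNSUPPORTED unless
the user explicitly answers otherwise, which is presumably why placing
UNSUPPORTED after EXPERT in the menu reads more naturally.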

Re: [PATCH V5 00/22] IOREQ feature (+ virtio-mmio) on Arm

2021-01-25 Thread Stefano Stabellini

Julien,

This seems to be an arm randconfig failure:

https://gitlab.com/xen-project/patchew/xen/-/pipelines/246632953
https://gitlab.com/xen-project/patchew/xen/-/jobs/985455044


On Mon, 25 Jan 2021, no-re...@patchew.org wrote:
> Hi,
> 
> Patchew automatically ran gitlab-ci pipeline with this patch (series) 
> applied, but the job failed. Maybe there's a bug in the patches?
> 
> You can find the link to the pipeline near the end of the report below:
> 
> Type: series
> Message-id: 1611601709-28361-1-git-send-email-olekst...@gmail.com
> Subject: [PATCH V5 00/22] IOREQ feature (+ virtio-mmio) on Arm
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> sleep 10
> patchew gitlab-pipeline-check -p xen-project/patchew/xen
> === TEST SCRIPT END ===
> 
> warning: redirecting to https://gitlab.com/xen-project/patchew/xen.git/
> warning: redirecting to https://gitlab.com/xen-project/patchew/xen.git/
> From https://gitlab.com/xen-project/patchew/xen
>  * [new tag]   patchew/1611601709-28361-1-git-send-email-olekst...@gmail.com -> patchew/1611601709-28361-1-git-send-email-olekst...@gmail.com
> Switched to a new branch 'test'
> 51c66995c4 xen/arm: Add mapcache invalidation handling
> ac18dce945 xen/ioreq: Make x86's send_invalidate_req() common
> bff65c909c xen/arm: io: Harden sign extension check
> db7d33509b xen/arm: io: Abstract sign-extension
> 49d920436c xen/dm: Introduce xendevicemodel_set_irq_level DM op
> 9f750615ff xen/ioreq: Introduce domain_has_ioreq_server()
> 8bcf50987f xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
> 54b6d5517d xen/arm: Call vcpu_ioreq_handle_completion() in check_for_vcpu_work()
> c54d6b6a4c arm/ioreq: Introduce arch specific bits for IOREQ/DM features
> 955655a87c xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
> ff3da51e59 xen/ioreq: Remove "hvm" prefixes from involved function names
> f2d022b8d2 xen/mm: Make x86's XENMEM_resource_ioreq_server handling common
> dc137ff63c xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu
> da63f74717 xen/ioreq: Make x86's IOREQ related dm-op handling common
> 53ed326f85 xen/ioreq: Move x86's ioreq_server to struct domain
> 6eb4a9b103 xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common
> 9198aac40e xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common
> 64669ca3f1 xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
> 64ed7b62fb xen/ioreq: Make x86's IOREQ feature common
> 8fc382a03b x86/ioreq: Provide out-of-line wrapper for the handle_mmio()
> 1aac704b38 x86/ioreq: Add IOREQ_STATUS_* #define-s and update code for moving
> ab818a53dc x86/ioreq: Prepare IOREQ feature for making it common
> 
> === OUTPUT BEGIN ===
> [2021-01-25 19:28:41] Looking up pipeline...
> [2021-01-25 19:28:42] Found pipeline 246632953:
> 
> https://gitlab.com/xen-project/patchew/xen/-/pipelines/246632953
> 
> [2021-01-25 19:28:42] Waiting for pipeline to finish...
> [2021-01-25 19:43:46] Still waiting...
> [2021-01-25 19:58:50] Still waiting...
> [2021-01-25 20:13:55] Still waiting...
> [2021-01-25 20:29:00] Still waiting...
> [2021-01-25 20:44:04] Still waiting...
> [2021-01-25 20:53:07] Pipeline failed
> [2021-01-25 20:53:08] Job 'qemu-smoke-x86-64-clang-pvh' in stage 'test' is skipped
> [2021-01-25 20:53:08] Job 'qemu-smoke-x86-64-gcc-pvh' in stage 'test' is skipped
> [2021-01-25 20:53:08] Job 'qemu-smoke-x86-64-clang' in stage 'test' is skipped
> [2021-01-25 20:53:08] Job 'qemu-smoke-x86-64-gcc' in stage 'test' is skipped
> [2021-01-25 20:53:08] Job 'qemu-smoke-arm64-gcc' in stage 'test' is skipped
> [2021-01-25 20:53:08] Job 'qemu-alpine-arm64-gcc' in stage 'test' is skipped
> [2021-01-25 20:53:08] Job 'build-each-commit-gcc' in stage 'test' is skipped
> [2021-01-25 20:53:08] Job 'debian-unstable-gcc-debug-arm64-randconfig' in stage 'build' is failed
> [2021-01-25 20:53:08] Job 'debian-unstable-gcc-arm64-randconfig' in stage 'build' is failed
> [2021-01-25 20:53:08] Job 'alpine-3.12-clang-debug' in stage 'build' is failed
> [2021-01-25 20:53:08] Job 'alpine-3.12-clang' in stage 'build' is failed
> [2021-01-25 20:53:08] Job 'alpine-3.12-gcc-debug' in stage 'build' is failed
> [2021-01-25 20:53:08] Job 'alpine-3.12-gcc' in stage 'build' is failed
> === OUTPUT END ===
> 
> Test command exited with code: 1

Re: [PATCH V5 14/22] arm/ioreq: Introduce arch specific bits for IOREQ/DM features

2021-01-25 Thread Stefano Stabellini

On Mon, 25 Jan 2021, Oleksandr Tyshchenko wrote:
> From: Julien Grall 
> 
> This patch adds basic IOREQ/DM support on Arm. The subsequent
> patches will improve functionality and add remaining bits.
> 
> The IOREQ/DM features are supposed to be built with IOREQ_SERVER
> option enabled, which is disabled by default on Arm for now.
> 
> Please note, the "PIO handling" TODO is expected to be left unaddressed
> for the current series. It is not a big issue for now while Xen
> doesn't have support for vPCI on Arm. On Arm64 they are only used
> for PCI IO Bar and we would probably want to expose them to emulator
> as PIO access to make a DM completely arch-agnostic. So "PIO handling"
> should be implemented when we add support for vPCI.
> 
> Signed-off-by: Julien Grall 
> Signed-off-by: Oleksandr Tyshchenko 
> [On Arm only]
> Tested-by: Wei Chen 

Reviewed-by: Stefano Stabellini 


> ---
> Please note, this is a split/cleanup/hardening of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> ***
> I admit, I didn't resolve header dependencies completely.
> For now, public/hvm/dm_op.h is included by xen/dm.h, but ought to be included
> by arch/arm/dm.c. Details here:
> https://lore.kernel.org/xen-devel/e0bc7f80-974e-945d-4605-173bd0530...@gmail.com/
> ***
> 
> Changes RFC -> V1:
>- was split into:
>  - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
>  - xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
>- update patch description
>- update asm-arm/hvm/ioreq.h according to the newly introduced arch 
> functions:
>  - arch_hvm_destroy_ioreq_server()
>  - arch_handle_hvm_io_completion()
>- update arch files to include xen/ioreq.h
>- remove HVMOP plumbing
>- rewrite a logic to handle properly case when hvm_send_ioreq() returns 
> IO_RETRY
>- add a logic to handle properly handle_hvm_io_completion() return value
>- rename handle_mmio() to ioreq_handle_complete_mmio()
>- move paging_mark_pfn_dirty() to asm-arm/paging.h
>- remove forward declaration for hvm_ioreq_server in asm-arm/paging.h
>- move try_fwd_ioserv() to ioreq.c, provide stubs if !CONFIG_IOREQ_SERVER
>- do not remove #ifdef CONFIG_IOREQ_SERVER in memory.c for guarding 
> xen/ioreq.h
>- use gdprintk in try_fwd_ioserv(), remove unneeded prints
>- update list of #include-s
>- move has_vpci() to asm-arm/domain.h
>- add a comment (TODO) to unimplemented yet handle_pio()
>- remove hvm_mmio_first(last)_byte() and hvm_ioreq_(page/vcpu/server) 
> structs
>  from the arch files, they were already moved to the common code
>- remove set_foreign_p2m_entry() changes, they will be properly implemented
>  in the follow-up patch
>- select IOREQ_SERVER for Arm instead of Arm64 in Kconfig
>- remove x86's realmode and other unneeded stubs from xen/ioreq.h
>    - clarify ioreq_t p.df usage in try_fwd_ioserv()
>- set ioreq_t p.count to 1 in try_fwd_ioserv()
> 
> Changes V1 -> V2:
>- was split into:
>  - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
>  - xen/arm: Stick around in leave_hypervisor_to_guest until I/O has 
> completed
>- update the author of a patch
>- update patch description
>- move a loop in leave_hypervisor_to_guest() to a separate patch
>- set IOREQ_SERVER disabled by default
>- remove already clarified /* XXX */
>- replace BUG() by ASSERT_UNREACHABLE() in handle_pio()
>- remove default case for handling the return value of try_handle_mmio()
>- remove struct hvm_domain, enum hvm_io_completion, struct hvm_vcpu_io,
>  struct hvm_vcpu from asm-arm/domain.h, these are common materials now
>- update everything according to the recent changes (IOREQ related function
>  names don't contain "hvm" prefixes/infixes anymore, IOREQ related fields
>  are part of common struct vcpu/domain now, etc)
> 
> Changes V2 -> V3:
>    - update patch given that the "legacy interface" is x86 specific
>- add dummy arch hooks
>- remove dummy paging_mark_pfn_dirty()
>- don’t include  in common ioreq.c
>- don’t include  in arch ioreq.h
>- remove #define ioreq_params(d, i)
> 
> Changes V3 -> V4:
>- rebase
>- update patch according to the renaming IO_ -> VIO_ (io_ -> vio_)
>  and misc changes to arch hooks
>- update patch according to the IOREQ related dm-op handling changes
>- don't include  from arch header
>- make all arch hooks out-of-line
>- add a comment above IOREQ_STATUS_* #define-s
> 
> Changes V4 -> V5:
>- change the placement of ioreq_server_destroy_all() in arm/domain.c
>- don't include public/hvm/dm_op.h by asm-arm/domain.h
>- include public/hvm/dm_op.h by xen/dm.h
>- put arch ioreq.h directly into asm-arm subdir
>- remove do_dm_op() in arm/dm.c, this is a common material now
>- remove obsolete ioreq_complete_mmio() from asm-arm/ioreq.h
>- optimize

Re: [PATCH v10 00/11] domain context infrastructure

2021-01-25 Thread Andrew Cooper

On 08/10/2020 19:57, Paul Durrant wrote:
> From: Paul Durrant 
>
> Paul Durrant (11):
>   docs / include: introduce a new framework for 'domain context' records
>   xen: introduce implementation of save/restore of 'domain context'
>   xen/common/domctl: introduce XEN_DOMCTL_get/set_domain_context
>   tools/misc: add xen-domctx to present domain context
>   common/domain: add a domain context record for shared_info...
>   x86/time: add a domain context record for tsc_info...
>   docs/specs: add missing definitions to libxc-migration-stream
>   docs / tools: specify migration v4 to include DOMAIN_CONTEXT
>   tools/python: modify libxc.py to verify v4 stream
>   tools/libs/guest: add code to restore a v4 libxc stream
>   tools/libs/guest: add code to save a v4 libxc stream

Thanks - this is much better when it comes to the public API/ABI.

However, my concern still stands.  What *else* is going in the domain
context record?  You can pull the "bump the interface version and
deprecate fields" trick exactly once, as the libxg logic doesn't delve
into the domain context stream.

At the moment this does increase the downtime of the VM for no gain. 
What I'm trying to understand is whether this is "no gain (yet)" or
whether you consider this "done" as far as cooperative migrate is concerned.

~Andrew

Re: [PATCH v10 11/11] tools/libs/guest: add code to save a v4 libxc stream

2021-01-25 Thread Andrew Cooper

On 08/10/2020 19:57, Paul Durrant wrote:
> From: Paul Durrant 
>
> This patch adds the necessary code to save a REC_TYPE_DOMAIN_CONTEXT record,
> and stop saving the now obsolete REC_TYPE_SHARED_INFO and REC_TYPE_TSC_INFO
> records for PV guests.
>
> Signed-off-by: Paul Durrant 

Looks broadly ok.

> diff --git a/tools/libs/guest/xg_sr_common_x86.h 
> b/tools/libs/guest/xg_sr_common_x86.h
> index b55758c96d..e504169705 100644
> --- a/tools/libs/guest/xg_sr_common_x86.h
> +++ b/tools/libs/guest/xg_sr_common_x86.h
> @@ -44,6 +44,52 @@ static int write_headers(struct xc_sr_context *ctx, 
> uint16_t guest_type)
>  return 0;
>  }
>  
> +/*
> + * Writes a DOMAIN_CONTEXT record into the stream.
> + */
> +static int write_domain_context_record(struct xc_sr_context *ctx)
> +{
> +xc_interface *xch = ctx->xch;
> +struct xc_sr_record rec = {
> +.type = REC_TYPE_DOMAIN_CONTEXT,
> +};
> +size_t len = 0;
> +int rc;
> +
> +rc = xc_domain_get_context(xch, ctx->domid, NULL, &len);
> +if ( rc < 0 )
> +{
> +PERROR("can't get record length for dom %u\n", ctx->domid);
> +goto out;
> +}
> +
> +rec.data = malloc(len);
> +
> +rc = -1;
> +if ( !rec.data )
> +{
> +PERROR("can't allocate %lu bytes\n", len);

%zu, because size_t is not the same type as unsigned long on every platform.

> +goto out;
> +}
> +
> +rc = xc_domain_get_context(xch, ctx->domid, rec.data, &len);
> +if ( rc < 0 )
> +{
> +PERROR("can't get domain record for dom %u\n", ctx->domid);

"domain context", and above.

> diff --git a/tools/libs/guest/xg_sr_save_x86_pv.c 
> b/tools/libs/guest/xg_sr_save_x86_pv.c
> index 4964f1f7b8..3de7b19f54 100644
> --- a/tools/libs/guest/xg_sr_save_x86_pv.c
> +++ b/tools/libs/guest/xg_sr_save_x86_pv.c
> @@ -849,20 +849,6 @@ static int write_x86_pv_p2m_frames(struct xc_sr_context 
> *ctx)
>  return rc;
>  }
>  
> -/*
> - * Writes an SHARED_INFO record into the stream.
> - */
> -static int write_shared_info(struct xc_sr_context *ctx)
> -{
> -struct xc_sr_record rec = {
> -.type = REC_TYPE_SHARED_INFO,
> -.length = PAGE_SIZE,
> -.data = ctx->x86.pv.shinfo,
> -};
> -
> -return write_record(ctx, &rec);
> -}

This change also wants to strip out ctx->x86.pv.shinfo, and the mapping
logic.

~Andrew

[xen-unstable-smoke test] 158614: tolerable all pass - PUSHED

2021-01-25 Thread osstest service owner

flight 158614 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158614/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl      15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl      16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  1d24e551b99a85f50c69e72b7828a7d6c4c4e7a5
baseline version:
 xen  452ddbe3592b141b05a7e0676f09c8ae07f98fdd

Last test of basis   158590  2021-01-23 12:02:37 Z    2 days
Testing same since   158614  2021-01-25 18:02:33 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Juergen Gross 

jobs:
 build-arm64-xsm                           pass
 build-amd64                               pass
 build-armhf                               pass
 build-amd64-libvirt                       pass
 test-armhf-armhf-xl                       pass
 test-arm64-arm64-xl-xsm                   pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-libvirt                  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   452ddbe359..1d24e551b9  1d24e551b99a85f50c69e72b7828a7d6c4c4e7a5 -> smoke

Re: [PATCH v10 10/11] tools/libs/guest: add code to restore a v4 libxc stream

2021-01-25 Thread Andrew Cooper

On 08/10/2020 19:57, Paul Durrant wrote:
> From: Paul Durrant 
>
> This patch adds the necessary code to accept a v4 stream, and to recognise and
> restore a REC_TYPE_DOMAIN_CONTEXT record.
>
> Signed-off-by: Paul Durrant 

Somewhere within this needs to be logic to reject the forbidden records
in relevant stream versions.

> diff --git a/tools/libs/guest/xg_sr_restore_x86_hvm.c 
> b/tools/libs/guest/xg_sr_restore_x86_hvm.c
> index d6ea6f3012..6bb164b9f0 100644
> --- a/tools/libs/guest/xg_sr_restore_x86_hvm.c
> +++ b/tools/libs/guest/xg_sr_restore_x86_hvm.c
> @@ -225,6 +225,15 @@ static int x86_hvm_stream_complete(struct xc_sr_context 
> *ctx)
>  return rc;
>  }
>  
> +rc = xc_domain_set_context(xch, ctx->domid,
> +   ctx->restore.dom_ctx.ptr,
> +   ctx->restore.dom_ctx.size);
> +if ( rc )
> +{
> +PERROR("Unable to restore Domain context");
> +return rc;
> +}

This doesn't match where you specified the record to live in the stream,
and in particular is reordered WRT HVMCONTEXT restoration.

Also, it appears to be in the middle of a block of code which needs to
become `if ( guest-aware )`.

> +
>  rc = xc_dom_gnttab_seed(xch, ctx->domid, true,
>  ctx->restore.console_gfn,
>  ctx->restore.xenstore_gfn,
> diff --git a/tools/libs/guest/xg_sr_restore_x86_pv.c 
> b/tools/libs/guest/xg_sr_restore_x86_pv.c
> index dc50b0f5a8..2dafad7b83 100644
> --- a/tools/libs/guest/xg_sr_restore_x86_pv.c
> +++ b/tools/libs/guest/xg_sr_restore_x86_pv.c
> @@ -1134,6 +1134,15 @@ static int x86_pv_stream_complete(struct xc_sr_context 
> *ctx)
>  if ( rc )
>  return rc;
>  
> +rc = xc_domain_set_context(xch, ctx->domid,
> +   ctx->restore.dom_ctx.ptr,
> +   ctx->restore.dom_ctx.size);
> +if ( rc )
> +{
> +PERROR("Unable to restore Domain context");
> +return rc;
> +}

Similar comment as HVM for the reordering.  PV guests in particular tend
to be far more sensitive to the restoration order.

~Andrew

Re: [PATCH v10 09/11] tools/python: modify libxc.py to verify v4 stream

2021-01-25 Thread Andrew Cooper

On 08/10/2020 19:57, Paul Durrant wrote:
> @@ -476,6 +484,14 @@ class VerifyLibxc(VerifyBase):
>  raise RecordError("Record length %u, expected multiple of %u" %
>(contentsz, sz))
>  
> +def verify_record_domain_context(self, content):
> +""" domain context record """
> +
> +if self.version < 4:
> +raise RecordError("Domain context record found in v3 stream")
> +
> +if len(content) == 0:
> +raise RecordError("Zero length domain context")

This needs a recursive dissector to validate the domain context format,
as it is not a private ABI within Xen.

~Andrew
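
A dissector along the lines requested above might look roughly like the
following. This is a minimal sketch under stated assumptions, not the
code that went into libxc.py: the 16-byte record header (type, instance,
body length), the little-endian encoding and the padding of bodies to 8
octets are all assumed for illustration and would have to be checked
against the domain context specification. RecordError is a local
stand-in for the verifier's exception of the same name, and
dissect_domain_context() is a hypothetical helper.

```python
import struct

# ASSUMPTION: each domain context record starts with a 16-byte header
# (type: u32, instance: u32, body length: u64; little-endian) and the
# body is padded to a multiple of 8 octets. This layout is illustrative
# only; the real one is defined by the domain context specification.
HDR = struct.Struct("<IIQ")


class RecordError(Exception):
    """Local stand-in for the verifier's RecordError."""


def dissect_domain_context(content):
    """Walk a domain context blob; return (type, instance, body) tuples."""
    records = []
    off = 0
    while off < len(content):
        if len(content) - off < HDR.size:
            raise RecordError("Truncated record header at offset %u" % off)
        rtype, instance, length = HDR.unpack_from(content, off)
        off += HDR.size
        # Round the body length up to the assumed 8-octet alignment.
        padded = (length + 7) & ~7
        if len(content) - off < padded:
            raise RecordError("Record body overruns blob at offset %u" % off)
        records.append((rtype, instance, content[off:off + length]))
        off += padded
    return records
```

verify_record_domain_context() could then call such a helper after its
zero-length check and validate each returned record by type, instead of
treating the whole blob as opaque.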

Re: [PATCH v10 08/11] docs / tools: specify migration v4 to include DOMAIN_CONTEXT

2021-01-25 Thread Andrew Cooper

On 08/10/2020 19:57, Paul Durrant wrote:
> diff --git a/docs/specs/libxc-migration-stream.pandoc 
> b/docs/specs/libxc-migration-stream.pandoc
> index 8aeab3b11b..aa6fe284f3 100644
> --- a/docs/specs/libxc-migration-stream.pandoc
> +++ b/docs/specs/libxc-migration-stream.pandoc
> @@ -127,7 +127,7 @@ marker  0x.
>  
>  id  0x58454E46 ("XENF" in ASCII).
>  
> -version 0x0003.  The version of this specification.
> +version 0x0004.  The version of this specification.
>  
>  options bit 0: Endianness.  0 = little-endian, 1 = big-endian.
>  
> @@ -209,9 +209,9 @@ type 0x: END
>  
>   0x0006: X86_PV_VCPU_XSAVE
>  
> - 0x0007: SHARED_INFO
> + 0x0007: SHARED_INFO (deprecated)

"in v4" ?

> @@ -442,10 +444,11 @@ X86_PV_VCPU_MSRS 
> XEN_DOMCTL_{get,set}\_vcpu_msrs
>  
>  \clearpage
>  
> -SHARED_INFO
> ------------
> +SHARED_INFO (deprecated)
> +------------------------
>  
> -The content of the Shared Info page.
> +The content of the Shared Info page. This is incorporated into the
> +DOMAIN_CONTEXT record as of specification version 4.

This needs to be stricter.  A SHARED_INFO frame must not be present in a
v4 stream.

Absolutely nothing good can come from having the state twice.  Moreover, ...

>  
>   0 1 2 3 4 5 6 7 octet
>  +-+
> @@ -462,11 +465,12 @@ shared_info  Contents of the shared info page.  
> This record
>  
>  \clearpage
>  
> -X86_TSC_INFO
> -------------
> +X86_TSC_INFO (deprecated)
> +-------------------------
>  
>  Domain TSC information, as accessed by the
> -XEN_DOMCTL_{get,set}tscinfo hypercall sub-ops.
> +XEN_DOMCTL_{get,set}tscinfo hypercall sub-ops. This is incorporated into the
> +DOMAIN_CONTEXT record as of specification version 4.

... it is actively problematic for this one, as incarnation counts the
number of set_tsc_info() hypercalls.

~Andrew
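
The stricter rule asked for above (records superseded by DOMAIN_CONTEXT
must not appear in a v4 stream at all) could be enforced with a check of
roughly this shape. It is a sketch only: the SHARED_INFO type value is
the one shown in the quoted spec hunk, the X86_TSC_INFO value is an
assumption, and StreamError and check_record_allowed() are hypothetical
names rather than existing libxc.py API.

```python
# SHARED_INFO's value comes from the spec hunk quoted above; the
# X86_TSC_INFO value is an ASSUMPTION made for this sketch.
REC_TYPE_SHARED_INFO = 0x00000007
REC_TYPE_X86_TSC_INFO = 0x00000008  # assumed, not shown in the quoted hunk

# Records a v4 stream must not carry, because their state now lives in
# the DOMAIN_CONTEXT record.
FORBIDDEN_FROM_V4 = {
    REC_TYPE_SHARED_INFO: "SHARED_INFO",
    REC_TYPE_X86_TSC_INFO: "X86_TSC_INFO",
}


class StreamError(Exception):
    """Hypothetical error type for malformed streams."""


def check_record_allowed(version, rec_type):
    """Reject records that are forbidden in the given stream version."""
    if version >= 4 and rec_type in FORBIDDEN_FROM_V4:
        raise StreamError("%s record found in v%u stream"
                          % (FORBIDDEN_FROM_V4[rec_type], version))
```

Rejecting the record outright, rather than ignoring it, avoids ever
having the same state restored twice from one stream.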

Re: [PATCH v10 07/11] docs/specs: add missing definitions to libxc-migration-stream

2021-01-25 Thread Andrew Cooper

On 08/10/2020 19:57, Paul Durrant wrote:
> From: Paul Durrant 
>
> The STATIC_DATA_END, X86_CPUID_POLICY and X86_MSR_POLICY record types have
> sections explaining what they are but their values are not defined. Indeed
> their values are defined as "Reserved for future mandatory records."
>
> Also, the spec revision is adjusted to match the migration stream version
> and an END record is added to the description of a 'typical save record for
> an x86 HVM guest.'
>
> Signed-off-by: Paul Durrant 
> Fixes: 6f71b5b1506 ("docs/migration Specify migration v3 and STATIC_DATA_END")
> Fixes: ddd273d8863 ("docs/migration: Specify X86_{CPUID,MSR}_POLICY records")
> Acked-by: Wei Liu 

I've committed this.  I have no idea where these hunks got lost, because
I definitely did have them at some point during the mig-v3 work.

~Andrew

Re: [PATCH v10 06/11] x86/time: add a domain context record for tsc_info...

2021-01-25 Thread Andrew Cooper

On 08/10/2020 19:57, Paul Durrant wrote:
> +The record body contains the following fields:
> +
> +| Field | Description   |
> +|---|---|
> +| `mode`| 0x: Default (emulate if necessary)|
> +|   | 0x0001: Always emulate|
> +|   | 0x0002: Never emulate |
> +|   |   |
> +| `khz` | The TSC frequency in kHz  |
> +|   |   |
> +| `nsec`| Elapsed time in nanoseconds   |
> +|   |   |
> +| `incarnation` | Incarnation (counter indicating how many  |
> +|   | times the TSC value has been set) |

It is how many set_tsc_info() (hyper)calls have been made, not how many
times the guest has written to the TSC MSR.

That said, it is totally useless now that PVRDTSCP mode has gone, other
than the fact that it appears in guest CPUID as an approximation of "how
many times has this domain been migrated".

I.e. it's a number you'll want to actively squash in your usecase.

I'm not sure whether to suggest dropping the field entirely, or not.  I
highly doubt any user exists - IIRC, it was specifically for PVRDTSCP
userspace to notice that the frequency may have changed, and to
re-adjust its calculations.

> diff --git a/xen/include/public/save.h b/xen/include/public/save.h
> index bccbaadd0b..86881864cf 100644
> --- a/xen/include/public/save.h
> +++ b/xen/include/public/save.h
> @@ -50,6 +50,7 @@ enum {
>  DOMAIN_CONTEXT_END,
>  DOMAIN_CONTEXT_START,
>  DOMAIN_CONTEXT_SHARED_INFO,
> +DOMAIN_CONTEXT_TSC_INFO,

At a minimum, this wants an /* x86 only */ comment.  Possibly an X86 infix.

~Andrew

[PATCH V5 14/22] arm/ioreq: Introduce arch specific bits for IOREQ/DM features

2021-01-25 Thread Oleksandr Tyshchenko

From: Julien Grall 

This patch adds basic IOREQ/DM support on Arm. The subsequent
patches will improve functionality and add remaining bits.

The IOREQ/DM features are supposed to be built with IOREQ_SERVER
option enabled, which is disabled by default on Arm for now.

Please note, the "PIO handling" TODO is expected to be left unaddressed
for the current series. It is not a big issue for now while Xen
doesn't have support for vPCI on Arm. On Arm64 they are only used
for PCI IO Bar and we would probably want to expose them to emulator
as PIO access to make a DM completely arch-agnostic. So "PIO handling"
should be implemented when we add support for vPCI.

Signed-off-by: Julien Grall 
Signed-off-by: Oleksandr Tyshchenko 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

***
I admit, I didn't resolve header dependencies completely.
For now, public/hvm/dm_op.h is included by xen/dm.h, but ought to be included
by arch/arm/dm.c. Details here:
https://lore.kernel.org/xen-devel/e0bc7f80-974e-945d-4605-173bd0530...@gmail.com/
***

Changes RFC -> V1:
   - was split into:
 - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
 - xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
   - update patch description
   - update asm-arm/hvm/ioreq.h according to the newly introduced arch 
functions:
 - arch_hvm_destroy_ioreq_server()
 - arch_handle_hvm_io_completion()
   - update arch files to include xen/ioreq.h
   - remove HVMOP plumbing
   - rewrite a logic to handle properly case when hvm_send_ioreq() returns 
IO_RETRY
   - add a logic to handle properly handle_hvm_io_completion() return value
   - rename handle_mmio() to ioreq_handle_complete_mmio()
   - move paging_mark_pfn_dirty() to asm-arm/paging.h
   - remove forward declaration for hvm_ioreq_server in asm-arm/paging.h
   - move try_fwd_ioserv() to ioreq.c, provide stubs if !CONFIG_IOREQ_SERVER
   - do not remove #ifdef CONFIG_IOREQ_SERVER in memory.c for guarding 
xen/ioreq.h
   - use gdprintk in try_fwd_ioserv(), remove unneeded prints
   - update list of #include-s
   - move has_vpci() to asm-arm/domain.h
   - add a comment (TODO) to unimplemented yet handle_pio()
   - remove hvm_mmio_first(last)_byte() and hvm_ioreq_(page/vcpu/server) structs
 from the arch files, they were already moved to the common code
   - remove set_foreign_p2m_entry() changes, they will be properly implemented
 in the follow-up patch
   - select IOREQ_SERVER for Arm instead of Arm64 in Kconfig
   - remove x86's realmode and other unneeded stubs from xen/ioreq.h
   - clarify ioreq_t p.df usage in try_fwd_ioserv()
   - set ioreq_t p.count to 1 in try_fwd_ioserv()

Changes V1 -> V2:
   - was split into:
 - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
 - xen/arm: Stick around in leave_hypervisor_to_guest until I/O has 
completed
   - update the author of a patch
   - update patch description
   - move a loop in leave_hypervisor_to_guest() to a separate patch
   - set IOREQ_SERVER disabled by default
   - remove already clarified /* XXX */
   - replace BUG() by ASSERT_UNREACHABLE() in handle_pio()
   - remove default case for handling the return value of try_handle_mmio()
   - remove struct hvm_domain, enum hvm_io_completion, struct hvm_vcpu_io,
 struct hvm_vcpu from asm-arm/domain.h, these are common materials now
   - update everything according to the recent changes (IOREQ related function
 names don't contain "hvm" prefixes/infixes anymore, IOREQ related fields
 are part of common struct vcpu/domain now, etc)

Changes V2 -> V3:
   - update patch according the "legacy interface" is x86 specific
   - add dummy arch hooks
   - remove dummy paging_mark_pfn_dirty()
   - don’t include  in common ioreq.c
   - don’t include  in arch ioreq.h
   - remove #define ioreq_params(d, i)

Changes V3 -> V4:
   - rebase
   - update patch according to the renaming IO_ -> VIO_ (io_ -> vio_)
 and misc changes to arch hooks
   - update patch according to the IOREQ related dm-op handling changes
   - don't include  from arch header
   - make all arch hooks out-of-line
   - add a comment above IOREQ_STATUS_* #define-s

Changes V4 -> V5:
   - change the placement of ioreq_server_destroy_all() in arm/domain.c
   - don't include public/hvm/dm_op.h by asm-arm/domain.h
   - include public/hvm/dm_op.h by xen/dm.h
   - put arch ioreq.h directly into asm-arm subdir
   - remove do_dm_op() in arm/dm.c, this is a common material now
   - remove obsolete ioreq_complete_mmio() from asm-arm/ioreq.h
   - optimize arch_ioreq_complete_mmio() to not call try_handle_mmio(),
 but try_handle_mmio(), use ASSERT_UNREACHABLE() if state is incorrect
   - split changes to check_for_vcpu_work() to be squashed with patch #15
---
---
 xen/arch/arm/Makefile|   2 +
 xen/arch/arm/dm.c|  97 +++

[PATCH V5 15/22] xen/arm: Call vcpu_ioreq_handle_completion() in check_for_vcpu_work()

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

This patch adds remaining bits needed for the IOREQ support on Arm.
Besides just calling vcpu_ioreq_handle_completion() we need to handle
its return value to make sure that all the vCPU work is done before
we return to the guest (the vcpu_ioreq_handle_completion() may return
false if there is vCPU work to do or IOREQ state is invalid).
For that reason we use an unbounded loop in leave_hypervisor_to_guest().

The worst that can happen here is that the vCPU will never run again
(the I/O will never complete). But, in Xen's case, if the I/O never
completes then it most likely means that something went horribly
wrong with the Device Emulator, and it is most likely not safe
to continue. So letting the vCPU spin forever if the I/O never
completes is safer than letting it continue and leaving the guest
in an unclear state, and is the best we can do for now.

Please note, using this loop we will not spin forever on a pCPU,
preventing any other vCPUs from being scheduled. On every iteration
we call check_for_pcpu_work(), which processes pending softirqs.
In case of failure, the guest will crash and the vCPU will be
unscheduled. In the normal case, if rescheduling is necessary,
the vCPU will be rescheduled to make room for someone else.
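
The shape of that loop can be modelled outside of Xen. The following is
only a sketch under stated assumptions: struct sketch_vcpu and all the
sketch_* functions are hypothetical stand-ins, not Xen code. It shows
that pCPU work is processed once per pass while vCPU work is pending,
plus once more before returning to the guest.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state standing in for a vCPU waiting on an emulator. */
struct sketch_vcpu {
    unsigned int io_polls_left;   /* polls until the I/O "completes" */
    unsigned int pcpu_work_runs;  /* how often pCPU work was processed */
};

/* Returns true while there is still vCPU work outstanding. */
static bool sketch_check_for_vcpu_work(struct sketch_vcpu *v)
{
    if ( v->io_polls_left == 0 )
        return false;

    v->io_polls_left--;
    return true;
}

/* Stand-in for softirq processing, where rescheduling could happen. */
static void sketch_check_for_pcpu_work(struct sketch_vcpu *v)
{
    v->pcpu_work_runs++;
}

/* Same loop shape as the patched leave_hypervisor_to_guest(). */
static void sketch_leave_to_guest(struct sketch_vcpu *v)
{
    while ( sketch_check_for_vcpu_work(v) )
        sketch_check_for_pcpu_work(v);
    sketch_check_for_pcpu_work(v);
}
```

If the I/O never completes, the model loops forever as well, but every
pass still goes through the pCPU work hook, which is the point made in
the paragraph above.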

Signed-off-by: Oleksandr Tyshchenko 
CC: Julien Grall 
Reviewed-by: Stefano Stabellini 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V1 -> V2:
   - new patch, changes were derived from (+ new explanation):
 arm/ioreq: Introduce arch specific bits for IOREQ/DM features

Changes V2 -> V3:
   - update patch description

Changes V3 -> V4:
   - update patch description and comment in code

Changes V4 -> V5:
   - add Stefano's R-b
   - update patch subject/description and comment in code,
 was "xen/arm: Stick around in leave_hypervisor_to_guest until I/O has 
completed"
   - change loop logic a bit
   - squash with changes to check_for_vcpu_work() from patch #14

---
---
 xen/arch/arm/traps.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index b0cd8f9..2039ff5 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2261,12 +2262,23 @@ static void check_for_pcpu_work(void)
  * Process pending work for the vCPU. Any call should be fast or
  * implement preemption.
  */
-static void check_for_vcpu_work(void)
+static bool check_for_vcpu_work(void)
 {
 struct vcpu *v = current;
 
+#ifdef CONFIG_IOREQ_SERVER
+bool handled;
+
+local_irq_enable();
+handled = vcpu_ioreq_handle_completion(v);
+local_irq_disable();
+
+if ( !handled )
+return true;
+#endif
+
 if ( likely(!v->arch.need_flush_to_ram) )
-return;
+return false;
 
 /*
  * Give a chance for the pCPU to process work before handling the vCPU
@@ -2277,6 +2289,8 @@ static void check_for_vcpu_work(void)
 local_irq_enable();
 p2m_flush_vm(v);
 local_irq_disable();
+
+return false;
 }
 
 /*
@@ -2289,7 +2303,13 @@ void leave_hypervisor_to_guest(void)
 {
 local_irq_disable();
 
-check_for_vcpu_work();
+/*
+ * check_for_vcpu_work() may return true if there is more work to do before
+ * the vCPU can safely resume. This gives us an opportunity to deschedule
+ * the vCPU if needed.
+ */
+while ( check_for_vcpu_work() )
+check_for_pcpu_work();
 check_for_pcpu_work();
 
 vgic_sync_to_lrs();
-- 
2.7.4

[PATCH V5 18/22] xen/dm: Introduce xendevicemodel_set_irq_level DM op

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

This patch adds the ability for the device emulator to notify the other
end (some entity running in the guest) using an SPI and implements the
Arm specific bits for it. The proposed interface allows the emulator to
set the logical level of one of a domain's IRQ lines.

We can't reuse the existing DM op (xen_dm_op_set_isa_irq_level)
to inject an interrupt as the "isa_irq" field is only 8-bit and
able to cover IRQ 0 - 255, whereas we need a wider range (0 - 1020).
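
To make the width argument concrete, here is a small standalone sketch
(plain C, not part of the patch; the through_u8()/through_u32() helpers
and SKETCH_MAX_IRQ are hypothetical names) showing what happens to an
interrupt number above 255 when it is carried in an 8-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Highest interrupt number quoted in the description above. */
#define SKETCH_MAX_IRQ 1020u

/* What an 8-bit field would carry for a given interrupt number. */
static unsigned int through_u8(unsigned int irq)
{
    uint8_t field = (uint8_t)irq;   /* keeps only the low 8 bits */

    return field;
}

/* What a 32-bit field would carry for the same interrupt number. */
static unsigned int through_u32(unsigned int irq)
{
    uint32_t field = irq;

    return field;
}
```

Any value above 255 silently wraps in the 8-bit case, so a new op with a
wider field is the simpler choice compared to overloading the ISA one.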

Please note, for an edge-triggered interrupt (which is used for
the virtio-mmio emulation) we only trigger the interrupt on Arm
if the level is asserted (rising edge) and do nothing if the level
is deasserted (falling edge), so the call could be named "trigger_irq"
(without the level parameter). But, in order to model the line closely
(to be able to support level-triggered interrupt) we need to know whether
the line is low or high, so the proposed interface has been chosen.
However, it is worth mentioning that in case of the level-triggered
interrupt, we should keep injecting the interrupt to the guest until
the line is deasserted (this is not covered by current patch).

Signed-off-by: Julien Grall 
Signed-off-by: Oleksandr Tyshchenko 
Acked-by: Stefano Stabellini 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - check incoming parameters in arch_dm_op()
   - add explicit padding to struct xen_dm_op_set_irq_level

Changes V1 -> V2:
   - update the author of a patch
   - update patch description
   - check that padding is always 0
   - mention that interface is Arm only and only SPIs are
 supported for now
   - allow to set the logical level of a line for non-allocated
 interrupts only
   - add xen_dm_op_set_irq_level_t

Changes V2 -> V3:
   - no changes

Changes V3 -> V4:
   - update patch description
   - update patch according to the IOREQ related dm-op handling changes

Changes V4 -> V5:
   - rebase
   - add Stefano's A-b
---
---
 tools/include/xendevicemodel.h   |  4 +++
 tools/libs/devicemodel/core.c| 18 ++
 tools/libs/devicemodel/libxendevicemodel.map |  1 +
 xen/arch/arm/dm.c| 54 +++-
 xen/include/public/hvm/dm_op.h   | 16 +
 5 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/tools/include/xendevicemodel.h b/tools/include/xendevicemodel.h
index e877f5c..c06b3c8 100644
--- a/tools/include/xendevicemodel.h
+++ b/tools/include/xendevicemodel.h
@@ -209,6 +209,10 @@ int xendevicemodel_set_isa_irq_level(
 xendevicemodel_handle *dmod, domid_t domid, uint8_t irq,
 unsigned int level);
 
+int xendevicemodel_set_irq_level(
+xendevicemodel_handle *dmod, domid_t domid, unsigned int irq,
+unsigned int level);
+
 /**
  * This function maps a PCI INTx line to a an IRQ line.
  *
diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index 4d40639..30bd79f 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
 return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
 }
 
+int xendevicemodel_set_irq_level(
+xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
+unsigned int level)
+{
+struct xen_dm_op op;
+struct xen_dm_op_set_irq_level *data;
+
+memset(&op, 0, sizeof(op));
+
+op.op = XEN_DMOP_set_irq_level;
+data = &op.u.set_irq_level;
+
+data->irq = irq;
+data->level = level;
+
+return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
+}
+
 int xendevicemodel_set_pci_link_route(
 xendevicemodel_handle *dmod, domid_t domid, uint8_t link, uint8_t irq)
 {
diff --git a/tools/libs/devicemodel/libxendevicemodel.map 
b/tools/libs/devicemodel/libxendevicemodel.map
index 561c62d..a0c3012 100644
--- a/tools/libs/devicemodel/libxendevicemodel.map
+++ b/tools/libs/devicemodel/libxendevicemodel.map
@@ -32,6 +32,7 @@ VERS_1.2 {
global:
xendevicemodel_relocate_memory;
xendevicemodel_pin_memory_cacheattr;
+   xendevicemodel_set_irq_level;
 } VERS_1.1;
 
 VERS_1.3 {
diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
index f254ed7..7854133 100644
--- a/xen/arch/arm/dm.c
+++ b/xen/arch/arm/dm.c
@@ -20,6 +20,8 @@
 #include 
 #include 
 
+#include 
+
 int dm_op(const struct dmop_args *op_args)
 {
 struct domain *d;
@@ -35,6 +37,7 @@ int dm_op(const struct dmop_args *op_args)
 [XEN_DMOP_unmap_io_range_from_ioreq_server] = sizeof(struct 
xen_dm_op_ioreq_server_range),
 [XEN_DMOP_set_ioreq_server_state]   = sizeof(struct 
xen_dm_op_set_ioreq_server_state),
 [XEN_DMOP_destroy_ioreq_server] = sizeof(struct 
xen_dm_op_destroy_ioreq_server),
+[XEN_DMOP_set_irq_level]= sizeof(struct 
xen_dm_op_set_irq_level),
 };
 
 rc =

[PATCH V5 22/22] xen/arm: Add mapcache invalidation handling

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

We need to send a mapcache invalidation request to qemu/demu every time
the page gets removed from a guest.

At the moment, the Arm code doesn't explicitly remove the existing
mapping before inserting the new mapping. Instead, this is done
implicitly by __p2m_set_entry().

So we need to recognize a case when old entry is a RAM page *and*
the new MFN is different in order to set the corresponding flag.
The most suitable place to do this is p2m_free_entry(), there
we can find the correct leaf type. The invalidation request
will be sent in do_trap_hypercall() later on.

Taking the following into account, do_trap_hypercall() is the best
place to send the invalidation request:
 - The only way a guest can modify its P2M on Arm is via a hypercall
 - When sending the invalidation request, the vCPU will be blocked
   until all the IOREQ servers have acknowledged the invalidation

Signed-off-by: Oleksandr Tyshchenko 
CC: Julien Grall 
Reviewed-by: Stefano Stabellini 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

***
Please note, this patch depends on the following which is
on review:
https://patchwork.kernel.org/patch/11803383/

This patch is on par with x86 code (whether it is buggy or not).
If there is a need to improve/harden something, this can be done on
a follow-up.
***

Changes V1 -> V2:
   - new patch, some changes were derived from (+ new explanation):
 xen/ioreq: Make x86's invalidate qemu mapcache handling common
   - put setting of the flag into __p2m_set_entry()
   - clarify the conditions when the flag should be set
   - use domain_has_ioreq_server()
   - update do_trap_hypercall() by adding local variable

Changes V2 -> V3:
   - update patch description
   - move check to p2m_free_entry()
   - add a comment
   - use "curr" instead of "v" in do_trap_hypercall()

Changes V3 -> V4:
   - update patch description
   - re-order check in p2m_free_entry() to call domain_has_ioreq_server()
 only if p2m->domain == current->domain
   - add a comment in do_trap_hypercall()

Changes V4 -> V5:
   - add Stefano's R-b
   - update comment in do_trap_hypercall()
---
---
 xen/arch/arm/p2m.c   | 25 +
 xen/arch/arm/traps.c | 20 +---
 2 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index d41c4fa..26acb95d 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1,6 +1,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -749,17 +750,25 @@ static void p2m_free_entry(struct p2m_domain *p2m,
 if ( !p2m_is_valid(entry) )
 return;
 
-/* Nothing to do but updating the stats if the entry is a super-page. */
-if ( p2m_is_superpage(entry, level) )
+if ( p2m_is_superpage(entry, level) || (level == 3) )
 {
-p2m->stats.mappings[level]--;
-return;
-}
+#ifdef CONFIG_IOREQ_SERVER
+/*
+ * If this gets called (non-recursively) then either the entry
+ * was replaced by an entry with a different base (valid case) or
+ * the shattering of a superpage has failed (error case).
+ * So, at worst, a spurious mapcache invalidation might be sent.
+ */
+if ( (p2m->domain == current->domain) &&
+  domain_has_ioreq_server(p2m->domain) &&
+  p2m_is_ram(entry.p2m.type) )
+p2m->domain->mapcache_invalidate = true;
+#endif
 
-if ( level == 3 )
-{
 p2m->stats.mappings[level]--;
-p2m_put_l3_page(entry);
+/* Nothing to do if the entry is a super-page. */
+if ( level == 3 )
+p2m_put_l3_page(entry);
 return;
 }
 
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 4cdd343..64b740b 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1443,6 +1443,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, 
register_t *nr,
   const union hsr hsr)
 {
 arm_hypercall_fn_t call = NULL;
+struct vcpu *curr = current;
 
 BUILD_BUG_ON(NR_hypercalls < ARRAY_SIZE(arm_hypercall_table) );
 
@@ -1459,7 +1460,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, 
register_t *nr,
 return;
 }
 
-current->hcall_preempted = false;
+curr->hcall_preempted = false;
 
 perfc_incra(hypercalls, *nr);
 call = arm_hypercall_table[*nr].fn;
@@ -1472,7 +1473,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, 
register_t *nr,
 HYPERCALL_RESULT_REG(regs) = call(HYPERCALL_ARGS(regs));
 
 #ifndef NDEBUG
-if ( !current->hcall_preempted )
+if ( !curr->hcall_preempted )
 {
 /* Deliberately corrupt parameter regs used by this hypercall. */
 switch ( arm_hypercall_table[*nr].nr_args ) {
@@ -1489,8 +1490,21 @@ static void do_trap_hypercall(struct cpu_user_regs 
*regs,

[PATCH V5 20/22] xen/arm: io: Harden sign extension check

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

In an ideal world we would never get undefined behavior when
propagating the sign bit since that bit can only be set for access
size smaller than the register size (i.e. byte/half-word for aarch32,
byte/half-word/word for aarch64).

In the real world we need to care for a *possible* hardware bug such as
advertising a sign extension for either 64-bit (or 32-bit) on Arm64
(resp. Arm32).

So harden the code a bit more to prevent undefined behavior when
propagating the sign bit in case of buggy hardware.
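
For illustration only, the hardened check can be sketched as a standalone
function. This is not the Xen helper: reg_t is a hypothetical stand-in for
register_t on a 64-bit build, and sign_extend_sketch() takes the sign flag
and the access size in bits directly instead of a struct hsr_dabt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Xen's register_t on a 64-bit build. */
typedef uint64_t reg_t;

/*
 * Sign-extend the low `size` bits of r (size is given in bits).
 * The (size < width) test mirrors the hardening in this patch: when
 * size equals the register width there is nothing to extend, and the
 * shift by `size` below would otherwise be a shift by the full width,
 * which is undefined behavior in C.
 */
static reg_t sign_extend_sketch(bool sign, unsigned int size, reg_t r)
{
    const unsigned int width = sizeof(reg_t) * 8;

    if ( sign && size > 0 && size < width &&
         (r & ((reg_t)1 << (size - 1))) )
        r |= ~(reg_t)0 << size;

    return r;
}
```

With the extra test, a (buggy) report of a sign-extended full-width
access leaves the value untouched instead of reaching the undefined
shift.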

Signed-off-by: Oleksandr Tyshchenko 
Reviewed-by: Stefano Stabellini 
Reviewed-by: Volodymyr Babchuk 
CC: Julien Grall 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V3 -> V4:
   - new patch

Changes V4 -> V5:
   - add Stefano's and Volodymyr's R-b
---
---
 xen/include/asm-arm/traps.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/include/asm-arm/traps.h b/xen/include/asm-arm/traps.h
index c6b3cc7..2ed2b85 100644
--- a/xen/include/asm-arm/traps.h
+++ b/xen/include/asm-arm/traps.h
@@ -94,7 +94,8 @@ static inline register_t sign_extend(const struct hsr_dabt 
dabt, register_t r)
  * Note that we expect the read handler to have zeroed the bits
  * outside the requested access size.
  */
-if ( dabt.sign && (r & (1UL << (size - 1))) )
+if ( dabt.sign && (size < sizeof(register_t) * 8) &&
+ (r & (1UL << (size - 1))) )
 {
 /*
  * We are relying on register_t using the same as
-- 
2.7.4

Re: [PATCH v10 05/11] common/domain: add a domain context record for shared_info...

2021-01-25 Thread Andrew Cooper

On 08/10/2020 19:57, Paul Durrant wrote:
> diff --git a/xen/include/public/save.h b/xen/include/public/save.h
> index c4be9f570c..bccbaadd0b 100644
> --- a/xen/include/public/save.h
> +++ b/xen/include/public/save.h
> @@ -58,6 +59,16 @@ struct domain_context_start {
>  uint32_t xen_major, xen_minor;
>  };
>  
> +struct domain_context_shared_info {
> +uint32_t flags;
> +
> +#define _DOMAIN_CONTEXT_32BIT_SHARED_INFO 0
> +#define DOMAIN_CONTEXT_32BIT_SHARED_INFO \
> +(1U << _DOMAIN_CONTEXT_32BIT_SHARED_INFO)

There is no need for the logarithm version of this constant.

You do however want an explicit uint32_t _rsvd; so buffer[] doesn't
start at the wrong alignment for an efficient memcpy() in 64bit builds
of Xen.
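
To show the alignment point concretely, here is a small standalone
sketch. The struct names and the two helper functions are made up for
this example, and the buffer[] member is assumed to be a trailing byte
array as implied above; only flags and _rsvd come from the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout without padding: the payload starts 4 bytes into the record. */
struct shared_info_rec_unpadded {
    uint32_t flags;
    uint8_t buffer[];
};

/* Layout with an explicit reserved field: payload starts at offset 8. */
struct shared_info_rec_padded {
    uint32_t flags;
    uint32_t _rsvd;
    uint8_t buffer[];
};

/* Byte offset of the payload in each layout. */
static size_t unpadded_offset(void)
{
    return offsetof(struct shared_info_rec_unpadded, buffer);
}

static size_t padded_offset(void)
{
    return offsetof(struct shared_info_rec_padded, buffer);
}
```

Assuming the record itself starts on an 8-byte boundary, only the padded
layout keeps buffer[] 8-byte aligned.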

~Andrew

[PATCH V5 10/22] xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

The IOREQ is a common feature now and these fields will be used
on Arm as is. Move them to common struct vcpu as a part of new
struct vcpu_io and drop duplicating "io" prefixes. Also move
enum hvm_io_completion to xen/sched.h and remove "hvm" prefixes.

This patch completely removes layering violation in the common code.

Signed-off-by: Oleksandr Tyshchenko 
Reviewed-by: Julien Grall 
Reviewed-by: Paul Durrant 
Acked-by: Jan Beulich 
CC: Julien Grall 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V1 -> V2:
   - new patch

Changes V2 -> V3:
   - update patch according the "legacy interface" is x86 specific
   - update patch description
   - drop the "io" prefixes from the field names
   - wrap IO_realmode_completion

Changes V3 -> V4:
   - rename all hvm_vcpu_io locals to "hvio"
   - rename according to the new renaming scheme IO_ -> VIO_ (io_ -> vio_)
   - drop "io" prefix from io_completion locals

Changes V4 -> V5:
   - add Julien's and Paul's R-b, Jan's A-b
---
---
 xen/arch/x86/hvm/emulate.c| 210 +++---
 xen/arch/x86/hvm/hvm.c|   2 +-
 xen/arch/x86/hvm/io.c |  32 +++---
 xen/arch/x86/hvm/ioreq.c  |   6 +-
 xen/arch/x86/hvm/svm/nestedsvm.c  |   2 +-
 xen/arch/x86/hvm/vmx/realmode.c   |   8 +-
 xen/common/ioreq.c|  26 ++---
 xen/include/asm-x86/hvm/emulate.h |   2 +-
 xen/include/asm-x86/hvm/vcpu.h|  11 --
 xen/include/xen/ioreq.h   |   2 +-
 xen/include/xen/sched.h   |  19 
 11 files changed, 164 insertions(+), 156 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 4d62199..21051ce 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -140,15 +140,15 @@ static const struct hvm_io_handler ioreq_server_handler = 
{
  */
 void hvmemul_cancel(struct vcpu *v)
 {
-struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
+struct hvm_vcpu_io *hvio = &v->arch.hvm.hvm_io;
 
-vio->io_req.state = STATE_IOREQ_NONE;
-vio->io_completion = HVMIO_no_completion;
-vio->mmio_cache_count = 0;
-vio->mmio_insn_bytes = 0;
-vio->mmio_access = (struct npfec){};
-vio->mmio_retry = false;
-vio->g2m_ioport = NULL;
+v->io.req.state = STATE_IOREQ_NONE;
+v->io.completion = VIO_no_completion;
+hvio->mmio_cache_count = 0;
+hvio->mmio_insn_bytes = 0;
+hvio->mmio_access = (struct npfec){};
+hvio->mmio_retry = false;
+hvio->g2m_ioport = NULL;
 
 hvmemul_cache_disable(v);
 }
@@ -159,7 +159,7 @@ static int hvmemul_do_io(
 {
 struct vcpu *curr = current;
 struct domain *currd = curr->domain;
-struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io;
+struct vcpu_io *vio = &curr->io;
 ioreq_t p = {
 .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
 .addr = addr,
@@ -184,13 +184,13 @@ static int hvmemul_do_io(
 return X86EMUL_UNHANDLEABLE;
 }
 
-switch ( vio->io_req.state )
+switch ( vio->req.state )
 {
 case STATE_IOREQ_NONE:
 break;
 case STATE_IORESP_READY:
-vio->io_req.state = STATE_IOREQ_NONE;
-p = vio->io_req;
+vio->req.state = STATE_IOREQ_NONE;
+p = vio->req;
 
 /* Verify the emulation request has been correctly re-issued */
 if ( (p.type != (is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO)) ||
@@ -238,7 +238,7 @@ static int hvmemul_do_io(
 }
 ASSERT(p.count);
 
-vio->io_req = p;
+vio->req = p;
 
 rc = hvm_io_intercept(&p);
 
@@ -247,12 +247,12 @@ static int hvmemul_do_io(
  * our callers and mirror this into latched state.
  */
 ASSERT(p.count <= *reps);
-*reps = vio->io_req.count = p.count;
+*reps = vio->req.count = p.count;
 
 switch ( rc )
 {
 case X86EMUL_OKAY:
-vio->io_req.state = STATE_IOREQ_NONE;
+vio->req.state = STATE_IOREQ_NONE;
 break;
 case X86EMUL_UNHANDLEABLE:
 {
@@ -305,7 +305,7 @@ static int hvmemul_do_io(
 if ( s == NULL )
 {
 rc = X86EMUL_RETRY;
-vio->io_req.state = STATE_IOREQ_NONE;
+vio->req.state = STATE_IOREQ_NONE;
 break;
 }
 
@@ -316,7 +316,7 @@ static int hvmemul_do_io(
 if ( dir == IOREQ_READ )
 {
 rc = hvm_process_io_intercept(&ioreq_server_handler, &p);
-vio->io_req.state = STATE_IOREQ_NONE;
+vio->req.state = STATE_IOREQ_NONE;
 break;
 }
 }
@@ -329,14 +329,14 @@ static int hvmemul_do_io(
 if ( !s )
 {
 rc = hvm_process_io_intercept(&null_handler, &p);
-vio->io_req.state = STATE_IOREQ_NONE;
+vio->req.state = STATE_IOREQ_NONE;
 }
 else

[PATCH V5 11/22] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common

2021-01-25 Thread Oleksandr Tyshchenko

From: Julien Grall 

As x86 implementation of XENMEM_resource_ioreq_server can be
re-used on Arm later on, this patch makes it common and removes
arch_acquire_resource (and the corresponding option) as unneeded.

Also re-order #include-s alphabetically.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Signed-off-by: Julien Grall 
Signed-off-by: Oleksandr Tyshchenko 
Reviewed-by: Jan Beulich 
Reviewed-by: Paul Durrant 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - no changes

Changes V1 -> V2:
   - update the author of a patch

Changes V2 -> V3:
   - don't wrap #include 
   - limit the number of #ifdef-s
   - re-order #include-s alphabetically

Changes V3 -> V4:
   - rebase
   - Add Jan's R-b

Changes V4 -> V5:
   - add Paul's R-b
   - update patch description
   - remove ARCH_ACQUIRE_RESOURCE option, etc
---
---
 xen/arch/x86/Kconfig |  1 -
 xen/arch/x86/mm.c| 44 -
 xen/common/Kconfig   |  3 ---
 xen/common/memory.c  | 63 +++-
 xen/include/asm-x86/mm.h |  4 ---
 xen/include/xen/mm.h |  9 ---
 6 files changed, 51 insertions(+), 73 deletions(-)

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index ea9a9ea..abe0fce 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -6,7 +6,6 @@ config X86
select ACPI
select ACPI_LEGACY_TABLES_LOOKUP
select ARCH_SUPPORTS_INT128
-   select ARCH_ACQUIRE_RESOURCE
select COMPAT
select CORE_PARKING
select HAS_ALTERNATIVE
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 59eb5c8..4366ea3 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4587,50 +4587,6 @@ static int handle_iomem_range(unsigned long s, unsigned 
long e, void *p)
 return err || s > e ? err : _handle_iomem_range(s, e, p);
 }
 
-int arch_acquire_resource(struct domain *d, unsigned int type,
-  unsigned int id, unsigned long frame,
-  unsigned int nr_frames, xen_pfn_t mfn_list[])
-{
-int rc;
-
-switch ( type )
-{
-#ifdef CONFIG_HVM
-case XENMEM_resource_ioreq_server:
-{
-ioservid_t ioservid = id;
-unsigned int i;
-
-rc = -EINVAL;
-if ( !is_hvm_domain(d) )
-break;
-
-if ( id != (unsigned int)ioservid )
-break;
-
-rc = 0;
-for ( i = 0; i < nr_frames; i++ )
-{
-mfn_t mfn;
-
-rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn);
-if ( rc )
-break;
-
-mfn_list[i] = mfn_x(mfn);
-}
-break;
-}
-#endif
-
-default:
-rc = -EOPNOTSUPP;
-break;
-}
-
-return rc;
-}
-
 long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 int rc;
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index cf32a07..fa049a6 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -22,9 +22,6 @@ config GRANT_TABLE
 
  If unsure, say Y.
 
-config ARCH_ACQUIRE_RESOURCE
-   bool
-
 config HAS_ALTERNATIVE
bool
 
diff --git a/xen/common/memory.c b/xen/common/memory.c
index ccb4d49..2f274a6 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -8,22 +8,23 @@
  */
 
 #include 
-#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 #include 
+#include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -1091,6 +1092,40 @@ static int acquire_grant_table(struct domain *d, 
unsigned int id,
 return 0;
 }
 
+static int acquire_ioreq_server(struct domain *d,
+unsigned int id,
+unsigned long frame,
+unsigned int nr_frames,
+xen_pfn_t mfn_list[])
+{
+#ifdef CONFIG_IOREQ_SERVER
+ioservid_t ioservid = id;
+unsigned int i;
+int rc;
+
+if ( !is_hvm_domain(d) )
+return -EINVAL;
+
+if ( id != (unsigned int)ioservid )
+return -EINVAL;
+
+for ( i = 0; i < nr_frames; i++ )
+{
+mfn_t mfn;
+
+rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn);
+if ( rc )
+return rc;
+
+mfn_list[i] = mfn_x(mfn);
+}
+
+return 0;
+#else
+return -EOPNOTSUPP;
+#endif
+}
+
 static int acquire_resource(
 XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
 {
@@ -1149,9 +1184,13 @@ static int acquire_resource(
  mfn_list);
 break;
 
+case XENMEM_resource_ioreq_server:
+rc = acquire_ioreq_server(d, xmar.id,

[PATCH V5 16/22] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

This patch implements reference counting of foreign entries
in set_foreign_p2m_entry() on Arm. This is mandatory if we want
to run an emulator (IOREQ server) in a domain other than dom0,
as we can't trust it to do the right thing if it is not running
in dom0. So we need to grab a reference on the page to avoid it
disappearing.

It is valid to always pass the "p2m_map_foreign_rw" type to
guest_physmap_add_entry() since the current and foreign domains
are always different. A case where they are equal would be
rejected by rcu_lock_remote_domain_by_id(). In addition to a similar
comment in the code, add a corresponding ASSERT() to catch incorrect
usage in the future.

It was tested with IOREQ feature to confirm that all the pages given
to this function belong to a domain, so we can use the same approach
as for XENMAPSPACE_gmfn_foreign handling in xenmem_add_to_physmap_one().

This involves adding an extra parameter for the foreign domain to
set_foreign_p2m_entry(), plus a helper that indicates whether the arch
supports reference counting of foreign entries, in which case the
hardware domain restriction in the common code can be skipped.
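The take-a-reference-then-map pattern described above can be sketched in isolation. This is only an illustration, not the Xen implementation: struct page, page_get(), page_put() and map_page() are hypothetical stand-ins for struct page_info, get_page(), put_page() and guest_physmap_add_entry().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page with an owner id and a reference count. */
struct page {
    unsigned int refcount;
    int owner;
};

/* Take a reference only if the page really belongs to the expected owner. */
static bool page_get(struct page *pg, int owner)
{
    if ( pg->owner != owner )
        return false;

    pg->refcount++;
    return true;
}

static void page_put(struct page *pg)
{
    pg->refcount--;
}

/* Hypothetical mapping step that can be made to fail on demand. */
static int map_page(struct page *pg, bool fail)
{
    (void)pg;
    return fail ? -ENOMEM : 0;
}

/* Hold a reference for as long as the mapping exists; drop it on failure. */
static int map_foreign(struct page *pg, int foreign_owner, bool fail)
{
    int rc;

    if ( !page_get(pg, foreign_owner) )
        return -EINVAL;

    rc = map_page(pg, fail);
    if ( rc )
        page_put(pg);

    return rc;
}
```

On success the extra reference is kept, so the page cannot disappear while it is mapped; on any failure the count is left unchanged.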

Signed-off-by: Oleksandr Tyshchenko 
CC: Julien Grall 
Acked-by: Stefano Stabellini 
Reviewed-by: Julien Grall 
Reviewed-by: Jan Beulich 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch, was split from:
 "[RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM 
features"
   - rewrite a logic to handle properly reference in set_foreign_p2m_entry()
 instead of treating foreign entries as p2m_ram_rw

Changes V1 -> V2:
   - rebase according to the recent changes to acquire_resource()
   - update patch description
   - introduce arch_refcounts_p2m()
   - add an explanation why p2m_map_foreign_rw is valid
   - move set_foreign_p2m_entry() to p2m-common.h
   - add const to new parameter

Changes V2 -> V3:
   - update patch description
   - rename arch_refcounts_p2m() to arch_acquire_resource_check()
   - move comment to x86’s arch_acquire_resource_check()
   - return rc in Arm's set_foreign_p2m_entry()
   - put a respective ASSERT() into Arm's set_foreign_p2m_entry()

Changes V3 -> V4:
   - update arch_acquire_resource_check() implementation on x86
 and common code which uses it, pass struct domain to the function
   - put ASSERT() to x86/Arm set_foreign_p2m_entry()
   - use arch_acquire_resource_check() in p2m_add_foreign()
 instead of open-coding it

Changes V4 -> V5:
   - update x86's arch_acquire_resource_check()
 - use single return statement
 - update comment
   - add Jan's and Julien's R-b, Stefano's A-b
---
---
 xen/arch/arm/p2m.c   | 26 ++
 xen/arch/x86/mm/p2m.c|  9 ++---
 xen/common/memory.c  |  9 ++---
 xen/include/asm-arm/p2m.h| 19 +--
 xen/include/asm-x86/p2m.h| 14 +++---
 xen/include/xen/p2m-common.h |  4 
 6 files changed, 58 insertions(+), 23 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 4eeb867..d41c4fa 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1380,6 +1380,32 @@ int guest_physmap_remove_page(struct domain *d, gfn_t 
gfn, mfn_t mfn,
 return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
 }
 
+int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
+  unsigned long gfn, mfn_t mfn)
+{
+struct page_info *page = mfn_to_page(mfn);
+int rc;
+
+ASSERT(arch_acquire_resource_check(d));
+
+if ( !get_page(page, fd) )
+return -EINVAL;
+
+/*
+ * It is valid to always use p2m_map_foreign_rw here as if this gets
+ * called then d != fd. A case when d == fd would be rejected by
+ * rcu_lock_remote_domain_by_id() earlier. Put a respective ASSERT()
+ * to catch incorrect usage in future.
+ */
+ASSERT(d != fd);
+
+rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_map_foreign_rw);
+if ( rc )
+put_page(page);
+
+return rc;
+}
+
 static struct page_info *p2m_allocate_root(void)
 {
 struct page_info *page;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c1dd45b..2091aed 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1323,8 +1323,11 @@ static int set_typed_p2m_entry(struct domain *d, 
unsigned long gfn_l,
 }
 
 /* Set foreign mfn in the given guest's p2m table. */
-int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
+  unsigned long gfn, mfn_t mfn)
 {
+ASSERT(arch_acquire_resource_check(d));
+
 return set_typed_p2m_entry(d, gfn, mfn, PAGE_ORDER_4K, p2m_map_foreign,
p2m_get_hostp2m(d)->default_access);
 }
@@ -2587,7 +2590,7 @@

[PATCH V5 19/22] xen/arm: io: Abstract sign-extension

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

In order to avoid code duplication (both handle_read() and
handle_ioserv() contain the same code for the sign-extension),
move this code into a common helper used by both.

Signed-off-by: Oleksandr Tyshchenko 
CC: Julien Grall 
Reviewed-by: Stefano Stabellini 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V1 -> V2:
   - new patch

Changes V2 -> V3:
   - no changes

Changes V3 -> V4:
   - no changes here, but in new patch:
 "xen/arm: io: Harden sign extension check"

Changes V4 -> V5:
   - add Stefano's R-b
---
---
 xen/arch/arm/io.c   | 18 ++
 xen/arch/arm/ioreq.c| 17 +
 xen/include/asm-arm/traps.h | 24 
 3 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index 7ac0303..729287e 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "decode.h"
 
@@ -40,26 +41,11 @@ static enum io_state handle_read(const struct mmio_handler 
*handler,
  * setting r).
  */
 register_t r = 0;
-uint8_t size = (1 << dabt.size) * 8;
 
 if ( !handler->ops->read(v, info, &r, handler->priv) )
 return IO_ABORT;
 
-/*
- * Sign extend if required.
- * Note that we expect the read handler to have zeroed the bits
- * outside the requested access size.
- */
-if ( dabt.sign && (r & (1UL << (size - 1))) )
-{
-/*
- * We are relying on register_t using the same as
- * an unsigned long in order to keep the 32-bit assembly
- * code smaller.
- */
-BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
-r |= (~0UL) << size;
-}
+r = sign_extend(dabt, r);
 
 set_user_reg(regs, dabt.reg, r);
 
diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
index ffeeb0b..c7ee1a7 100644
--- a/xen/arch/arm/ioreq.c
+++ b/xen/arch/arm/ioreq.c
@@ -28,7 +28,6 @@ enum io_state handle_ioserv(struct cpu_user_regs *regs, 
struct vcpu *v)
 const union hsr hsr = { .bits = regs->hsr };
 const struct hsr_dabt dabt = hsr.dabt;
 /* Code is similar to handle_read */
-uint8_t size = (1 << dabt.size) * 8;
 register_t r = v->io.req.data;
 
 /* We are done with the IO */
@@ -37,21 +36,7 @@ enum io_state handle_ioserv(struct cpu_user_regs *regs, 
struct vcpu *v)
 if ( dabt.write )
 return IO_HANDLED;
 
-/*
- * Sign extend if required.
- * Note that we expect the read handler to have zeroed the bits
- * outside the requested access size.
- */
-if ( dabt.sign && (r & (1UL << (size - 1))) )
-{
-/*
- * We are relying on register_t using the same as
- * an unsigned long in order to keep the 32-bit assembly
- * code smaller.
- */
-BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
-r |= (~0UL) << size;
-}
+r = sign_extend(dabt, r);
 
 set_user_reg(regs, dabt.reg, r);
 
diff --git a/xen/include/asm-arm/traps.h b/xen/include/asm-arm/traps.h
index c4a3d0f..c6b3cc7 100644
--- a/xen/include/asm-arm/traps.h
+++ b/xen/include/asm-arm/traps.h
@@ -84,6 +84,30 @@ static inline bool VABORT_GEN_BY_GUEST(const struct 
cpu_user_regs *regs)
 (unsigned long)abort_guest_exit_end == regs->pc;
 }
 
+/* Check whether the sign extension is required and perform it */
+static inline register_t sign_extend(const struct hsr_dabt dabt, register_t r)
+{
+uint8_t size = (1 << dabt.size) * 8;
+
+/*
+ * Sign extend if required.
+ * Note that we expect the read handler to have zeroed the bits
+ * outside the requested access size.
+ */
+if ( dabt.sign && (r & (1UL << (size - 1))) )
+{
+/*
+ * We are relying on register_t using the same as
+ * an unsigned long in order to keep the 32-bit assembly
+ * code smaller.
+ */
+BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
+r |= (~0UL) << size;
+}
+
+return r;
+}
+
 #endif /* __ASM_ARM_TRAPS__ */
 /*
  * Local variables:
-- 
2.7.4
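For readers who want to try the sign-extension logic outside of Xen, here is a minimal standalone sketch. It is not the patch's sign_extend(): the function name and parameters are hypothetical, with size_log2 playing the role of dabt.size and is_signed the role of dabt.sign, and uint64_t standing in for register_t. The bits < 64 guard is an addition of this sketch to keep the shift well-defined for a full-width access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * size_log2: 0 = byte, 1 = halfword, 2 = word, 3 = doubleword.
 * r is expected to have the bits above the access size already zeroed.
 */
static inline uint64_t sign_extend_sketch(unsigned int size_log2,
                                          bool is_signed, uint64_t r)
{
    unsigned int bits = (1u << size_log2) * 8;

    /* A full-width access has no upper bits left to fill. */
    if ( is_signed && bits < 64 && (r & (UINT64_C(1) << (bits - 1))) )
        r |= ~UINT64_C(0) << bits;

    return r;
}
```

A signed byte load of 0x80 becomes 0xffffffffffffff80, while an unsigned load of the same value is returned unchanged.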

[PATCH V5 09/22] xen/ioreq: Make x86's IOREQ related dm-op handling common

2021-01-25 Thread Oleksandr Tyshchenko

From: Julien Grall 

As a lot of x86 code can be re-used on Arm later on, this patch
moves the IOREQ related dm-op handling to the common code.

The idea is to have the top level dm-op handling arch-specific
and call into ioreq_server_dm_op() for otherwise unhandled ops.
Pros:
- More natural than doing it the other way around (top level dm-op
handling common).
- Leave compat_dm_op() in x86 code.
Cons:
- Code duplication. Both arches have to duplicate dm_op(), etc.
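The layering described above (arch code owns the top level and forwards otherwise unhandled ops to common code) can be sketched as follows. All names and op numbers here are hypothetical and chosen for illustration only; they are not the real XEN_DMOP_* values or the signatures of dm_op() and ioreq_server_dm_op().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical op numbers. */
enum sketch_op { OP_CREATE_SERVER, OP_DESTROY_SERVER, OP_INJECT_EVENT, OP_BOGUS };

/* Records which layer handled the last op, for demonstration only. */
enum sketch_layer { LAYER_NONE, LAYER_ARCH, LAYER_COMMON };
static enum sketch_layer last_layer;

/* Common code: knows only the IOREQ server ops. */
static int common_ioreq_server_op(enum sketch_op op)
{
    switch ( op )
    {
    case OP_CREATE_SERVER:
    case OP_DESTROY_SERVER:
        last_layer = LAYER_COMMON;
        return 0;

    default:
        return -EOPNOTSUPP;
    }
}

/* Arch code: owns the top level and forwards whatever it does not handle. */
static int arch_dm_op(enum sketch_op op)
{
    switch ( op )
    {
    case OP_INJECT_EVENT:
        last_layer = LAYER_ARCH;
        return 0;

    default:
        return common_ioreq_server_op(op);
    }
}
```

With this shape each arch keeps its own switch for arch-only ops, which is the duplication listed under Cons, while the IOREQ server ops live in one place.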

Make the corresponding functions static and rename them according
to the new naming scheme (including dropping the "hvm" prefixes).

Introduce common dm.c file as a resting place for the do_dm_op()
(which is identical for both Arm and x86) to minimize code duplication.
The common DM feature is supposed to be built with IOREQ_SERVER
option enabled (as well as the IOREQ feature), which is selected
for x86's config HVM for now.

Also update XSM code a bit to let dm-op be used on Arm.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Signed-off-by: Julien Grall 
Signed-off-by: Oleksandr Tyshchenko 
Acked-by: Jan Beulich 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - update XSM, related changes were pulled from:
 [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM 
features

Changes V1 -> V2:
   - update the author of a patch
   - update patch description
   - introduce xen/dm.h and move definitions here

Changes V2 -> V3:
   - no changes

Changes V3 -> V4:
   - rework to have the top level dm-op handling arch-specific
   - update patch subject/description, was "xen/dm: Make x86's DM feature 
common"
   - make a few functions static in common ioreq.c

Changes V4 -> V5:
   - update patch description
   - add Jan's A-b
   - drop the 'hvm_' prefixes of touched functions and rename them
 instead of doing that in patch #12
   - add common dm.c to keep do_dm_op(), make dm_op() global
---
---
 xen/arch/x86/hvm/dm.c   | 132 ++
 xen/common/Makefile |   1 +
 xen/common/dm.c |  55 +++
 xen/common/ioreq.c  | 137 ++--
 xen/include/xen/dm.h|  41 +++
 xen/include/xen/ioreq.h |  21 ++--
 xen/include/xsm/dummy.h |   4 +-
 xen/include/xsm/xsm.h   |   6 +--
 xen/xsm/dummy.c |   2 +-
 xen/xsm/flask/hooks.c   |   5 +-
 10 files changed, 235 insertions(+), 169 deletions(-)
 create mode 100644 xen/common/dm.c
 create mode 100644 xen/include/xen/dm.h

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index d3e2a9e..5bc172a 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -16,6 +16,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -29,13 +30,6 @@
 
 #include 
 
-struct dmop_args {
-domid_t domid;
-unsigned int nr_bufs;
-/* Reserve enough buf elements for all current hypercalls. */
-struct xen_dm_op_buf buf[2];
-};
-
 static bool _raw_copy_from_guest_buf_offset(void *dst,
 const struct dmop_args *args,
 unsigned int buf_idx,
@@ -338,7 +332,7 @@ static int inject_event(struct domain *d,
 return 0;
 }
 
-static int dm_op(const struct dmop_args *op_args)
+int dm_op(const struct dmop_args *op_args)
 {
 struct domain *d;
 struct xen_dm_op op;
@@ -408,71 +402,6 @@ static int dm_op(const struct dmop_args *op_args)
 
 switch ( op.op )
 {
-case XEN_DMOP_create_ioreq_server:
-{
-struct xen_dm_op_create_ioreq_server *data =
-_ioreq_server;
-
-const_op = false;
-
-rc = -EINVAL;
-if ( data->pad[0] || data->pad[1] || data->pad[2] )
-break;
-
-rc = hvm_create_ioreq_server(d, data->handle_bufioreq,
- >id);
-break;
-}
-
-case XEN_DMOP_get_ioreq_server_info:
-{
-struct xen_dm_op_get_ioreq_server_info *data =
-_ioreq_server_info;
-const uint16_t valid_flags = XEN_DMOP_no_gfns;
-
-const_op = false;
-
-rc = -EINVAL;
-if ( data->flags & ~valid_flags )
-break;
-
-rc = hvm_get_ioreq_server_info(d, data->id,
-   (data->flags & XEN_DMOP_no_gfns) ?
-   NULL : >ioreq_gfn,
-   (data->flags & XEN_DMOP_no_gfns) ?
-   NULL : >bufioreq_gfn,
-   >bufioreq_port);
-break;
-}
-
-case XEN_DMOP_map_io_range_to_ioreq_server:
-{
-const struct xen_dm_op_ioreq_server_range *data =
-_io_range_to_ioreq_server;
-
-rc = -EINVAL;
-if ( data->pad )
-

[PATCH V5 17/22] xen/ioreq: Introduce domain_has_ioreq_server()

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

This patch introduces a helper whose main purpose is to check
whether a domain is using IOREQ server(s).

On Arm the current benefit is to avoid calling vcpu_ioreq_handle_completion()
(which implies iterating over all possible IOREQ servers anyway)
on every return in leave_hypervisor_to_guest() if there are no active
servers for the particular domain.
Also, this helper will be used by one of the subsequent Arm patches.

Signed-off-by: Oleksandr Tyshchenko 
CC: Julien Grall 
Reviewed-by: Stefano Stabellini 
Reviewed-by: Paul Durrant 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch

Changes V1 -> V2:
   - update patch description
   - guard helper with CONFIG_IOREQ_SERVER
   - remove "hvm" prefix
   - modify helper to just return d->arch.hvm.ioreq_server.nr_servers
   - put suitable ASSERT()s
   - use ASSERT(d->ioreq_server.server[id] ? !s : !!s) in set_ioreq_server()
   - remove d->ioreq_server.nr_servers = 0 from hvm_ioreq_init()

Changes V2 -> V3:
   - update patch description
   - remove ASSERT()s from the helper, add a comment
   - use #ifdef CONFIG_IOREQ_SERVER inside function body
   - use new ASSERT() construction in set_ioreq_server()

Changes V3 -> V4:
   - update patch description
   - drop per-domain variable "nr_servers"
   - reimplement a helper to count the non-NULL entries
   - make the helper out-of-line

Changes V4 -> V5:
   - add Stefano's and Paul's R-b

---
---
 xen/arch/arm/traps.c| 15 +--
 xen/common/ioreq.c  | 16 
 xen/include/xen/ioreq.h |  2 ++
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 2039ff5..4cdd343 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2267,14 +2267,17 @@ static bool check_for_vcpu_work(void)
 struct vcpu *v = current;
 
 #ifdef CONFIG_IOREQ_SERVER
-bool handled;
+if ( domain_has_ioreq_server(v->domain) )
+{
+bool handled;
 
-local_irq_enable();
-handled = vcpu_ioreq_handle_completion(v);
-local_irq_disable();
+local_irq_enable();
+handled = vcpu_ioreq_handle_completion(v);
+local_irq_disable();
 
-if ( !handled )
-return true;
+if ( !handled )
+return true;
+}
 #endif
 
 if ( likely(!v->arch.need_flush_to_ram) )
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 07572a5..5b0f03e 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -80,6 +80,22 @@ static ioreq_t *get_ioreq(struct ioreq_server *s, struct 
vcpu *v)
 return &s->vcpu_ioreq[v->vcpu_id];
 }
 
+/*
+ * This should only be used when d == current->domain or when they're
+ * distinct and d is paused. Otherwise the result is stale before
+ * the caller can inspect it.
+ */
+bool domain_has_ioreq_server(const struct domain *d)
+{
+const struct ioreq_server *s;
+unsigned int id;
+
+FOR_EACH_IOREQ_SERVER(d, id, s)
+return true;
+
+return false;
+}
+
 static struct ioreq_vcpu *get_pending_vcpu(const struct vcpu *v,
struct ioreq_server **srvp)
 {
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index 0b433e2..89ee171 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -83,6 +83,8 @@ static inline bool ioreq_needs_completion(const ioreq_t 
*ioreq)
 #define HANDLE_BUFIOREQ(s) \
 ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
 
+bool domain_has_ioreq_server(const struct domain *d);
+
 bool vcpu_ioreq_pending(struct vcpu *v);
 bool vcpu_ioreq_handle_completion(struct vcpu *v);
 bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
-- 
2.7.4
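The "return true from inside the iteration macro" idiom used by domain_has_ioreq_server() can be tried standalone. The sketch below is an assumption-laden model: MAX_SERVERS, struct dom, struct server and FOR_EACH_SERVER are hypothetical, and the macro only imitates the documented behaviour of visiting non-NULL slots; the real FOR_EACH_IOREQ_SERVER may be written differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SERVERS 8  /* hypothetical; stands in for MAX_NR_IOREQ_SERVERS */

struct server { int unused; };
struct dom { struct server *server[MAX_SERVERS]; };

/* Run the loop body only for populated slots. */
#define FOR_EACH_SERVER(d, id, s) \
    for ( (id) = 0; (id) < MAX_SERVERS; (id)++ ) \
        if ( ((s) = (d)->server[(id)]) != NULL )

/* True as soon as one populated slot is found, false if all are empty. */
static bool dom_has_server(const struct dom *d)
{
    const struct server *s;
    unsigned int id;

    FOR_EACH_SERVER(d, id, s)
        return true;

    return false;
}
```

Because the body returns on the first populated slot, the cost is bounded by the table size and no separate per-domain counter has to be kept in sync.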

[PATCH V5 12/22] xen/ioreq: Remove "hvm" prefixes from involved function names

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

This patch removes "hvm" prefixes and infixes from IOREQ related
function names in the common code and performs a renaming where
appropriate according to the more consistent new naming scheme:
- IOREQ server functions should start with "ioreq_server_"
- IOREQ functions should start with "ioreq_"

A few function names are clarified to better fit into their purposes:
handle_hvm_io_completion -> vcpu_ioreq_handle_completion
hvm_io_pending   -> vcpu_ioreq_pending
hvm_ioreq_init   -> ioreq_domain_init
hvm_alloc_ioreq_mfn  -> ioreq_server_alloc_mfn
hvm_free_ioreq_mfn   -> ioreq_server_free_mfn

Signed-off-by: Oleksandr Tyshchenko 
Reviewed-by: Jan Beulich 
Reviewed-by: Paul Durrant 
CC: Julien Grall 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V1 -> V2:
   - new patch

Changes V2 -> V3:
   - update patch given that the "legacy interface" is x86 specific
   - update patch description
   - rename everything touched according to new naming scheme

Changes V3 -> V4:
   - rebase
   - rename ioreq_update_evtchn() to ioreq_server_update_evtchn()
   - add Jan's R-b

Changes V4 -> V5:
   - rebase
   - add Paul's R-b
---
---
 xen/arch/x86/hvm/emulate.c  |   6 +-
 xen/arch/x86/hvm/hvm.c  |  10 ++--
 xen/arch/x86/hvm/io.c   |   6 +-
 xen/arch/x86/hvm/ioreq.c|   2 +-
 xen/arch/x86/hvm/stdvga.c   |   4 +-
 xen/arch/x86/hvm/vmx/vvmx.c |   2 +-
 xen/common/ioreq.c  | 138 ++--
 xen/common/memory.c |   2 +-
 xen/include/xen/ioreq.h |  26 -
 9 files changed, 98 insertions(+), 98 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 21051ce..425c8dd 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -261,7 +261,7 @@ static int hvmemul_do_io(
  * an ioreq server that can handle it.
  *
  * Rules:
- * A> PIO or MMIO accesses run through hvm_select_ioreq_server() to
+ * A> PIO or MMIO accesses run through ioreq_server_select() to
  * choose the ioreq server by range. If no server is found, the access
  * is ignored.
  *
@@ -323,7 +323,7 @@ static int hvmemul_do_io(
 }
 
 if ( !s )
-s = hvm_select_ioreq_server(currd, );
+s = ioreq_server_select(currd, );
 
 /* If there is no suitable backing DM, just ignore accesses */
 if ( !s )
@@ -333,7 +333,7 @@ static int hvmemul_do_io(
 }
 else
 {
-rc = hvm_send_ioreq(s, , 0);
+rc = ioreq_send(s, , 0);
 if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
 vio->req.state = STATE_IOREQ_NONE;
 else if ( !ioreq_needs_completion(&vio->req) )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4ed929c..0d7bb42 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -546,7 +546,7 @@ void hvm_do_resume(struct vcpu *v)
 
 pt_restore_timer(v);
 
-if ( !handle_hvm_io_completion(v) )
+if ( !vcpu_ioreq_handle_completion(v) )
 return;
 
 if ( unlikely(v->arch.vm_event) )
@@ -677,7 +677,7 @@ int hvm_domain_initialise(struct domain *d)
 register_g2m_portio_handler(d);
 register_vpci_portio_handler(d);
 
-hvm_ioreq_init(d);
+ioreq_domain_init(d);
 
 hvm_init_guest_time(d);
 
@@ -739,7 +739,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
 
 viridian_domain_deinit(d);
 
-hvm_destroy_all_ioreq_servers(d);
+ioreq_server_destroy_all(d);
 
 msixtbl_pt_cleanup(d);
 
@@ -1582,7 +1582,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
 if ( rc )
 goto fail5;
 
-rc = hvm_all_ioreq_servers_add_vcpu(d, v);
+rc = ioreq_server_add_vcpu_all(d, v);
 if ( rc != 0 )
 goto fail6;
 
@@ -1618,7 +1618,7 @@ void hvm_vcpu_destroy(struct vcpu *v)
 {
 viridian_vcpu_deinit(v);
 
-hvm_all_ioreq_servers_remove_vcpu(v->domain, v);
+ioreq_server_remove_vcpu_all(v->domain, v);
 
 if ( hvm_altp2m_supported() )
 altp2m_vcpu_destroy(v);
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index dd733e1..66a37ee 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -60,7 +60,7 @@ void send_timeoffset_req(unsigned long timeoff)
 if ( timeoff == 0 )
 return;
 
-if ( hvm_broadcast_ioreq(, true) != 0 )
+if ( ioreq_broadcast(, true) != 0 )
 gprintk(XENLOG_ERR, "Unsuccessful timeoffset update\n");
 }
 
@@ -74,7 +74,7 @@ void send_invalidate_req(void)
 .data = ~0UL, /* flush all */
 };
 
-if ( hvm_broadcast_ioreq(, false) != 0 )
+if ( ioreq_broadcast(, false) != 0 )
 gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n");
 }
 
@@ -155,7 +155,7 @@ bool handle_pio(uint16_t port, unsigned int size, int dir)
  *

[PATCH V5 21/22] xen/ioreq: Make x86's send_invalidate_req() common

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

As the IOREQ is a common feature now and we also need to
invalidate qemu/demu mapcache on Arm when the required condition
occurs, this patch moves this function to the common code
(and renames it to ioreq_signal_mapcache_invalidate).
This patch also moves the per-domain qemu_mapcache_invalidate
variable out of the arch sub-struct (and drops the "qemu" prefix).

We don't put this variable inside the #ifdef CONFIG_IOREQ_SERVER
at the end of struct domain, but in the hole next to the group
of 5 bools further up which is more efficient.

The subsequent patch will add mapcache invalidation handling on Arm.
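The flag handling around the per-domain variable follows a set-then-test-and-clear pattern, which can be modelled as below. This is a sketch using C11 atomics rather than Xen's test_and_clear_bool(); struct dom_flags, note_decrease_reservation(), hypercall_exit() and the broadcasts counter are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-domain state holding only the flag of interest. */
struct dom_flags { atomic_bool mapcache_invalidate; };

/* Counts how many invalidate requests would have been broadcast. */
static unsigned int broadcasts;

static void signal_invalidate(void)
{
    broadcasts++;
}

/* Called when memory is handed back: only marks that work is pending. */
static void note_decrease_reservation(struct dom_flags *d)
{
    atomic_store(&d->mapcache_invalidate, true);
}

/* Called on the way out of a hypercall: sends at most one request. */
static void hypercall_exit(struct dom_flags *d)
{
    /* Cheap read first, then an atomic exchange so only one caller wins. */
    if ( atomic_load(&d->mapcache_invalidate) &&
         atomic_exchange(&d->mapcache_invalidate, false) )
        signal_invalidate();
}
```

Several pending marks collapse into a single broadcast, and an exit with nothing pending sends nothing.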

Signed-off-by: Oleksandr Tyshchenko 
CC: Julien Grall 
Reviewed-by: Paul Durrant 
Acked-by: Jan Beulich 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - move send_invalidate_req() to the common code
   - update patch subject/description
   - move qemu_mapcache_invalidate out of the arch sub-struct,
 update checks
   - remove #if defined(CONFIG_ARM64) from the common code

Changes V1 -> V2:
   - was split into:
 - xen/ioreq: Make x86's send_invalidate_req() common
 - xen/arm: Add mapcache invalidation handling
   - update patch description/subject
   - move Arm bits to a separate patch
   - don't alter the common code, the flag is set by arch code
   - rename send_invalidate_req() to send_invalidate_ioreq()
   - guard qemu_mapcache_invalidate with CONFIG_IOREQ_SERVER
   - use bool instead of bool_t
   - remove blank line between head comment and #include-s

Changes V2 -> V3:
   - update patch description
   - drop "qemu" prefix from the variable name
   - rename send_invalidate_req() to ioreq_signal_mapcache_invalidate()

Changes V3 -> V4:
   - change variable location in struct domain

Changes V4 -> V5:
   - add Jan's A-b and Paul's R-b
---
---
 xen/arch/x86/hvm/hypercall.c |  9 +
 xen/arch/x86/hvm/io.c| 14 --
 xen/common/ioreq.c   | 14 ++
 xen/include/asm-x86/hvm/domain.h |  1 -
 xen/include/asm-x86/hvm/io.h |  1 -
 xen/include/xen/ioreq.h  |  1 +
 xen/include/xen/sched.h  |  5 +
 7 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index ac573c8..6d41c56 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -20,6 +20,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -47,7 +48,7 @@ static long hvm_memory_op(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 rc = compat_memory_op(cmd, arg);
 
 if ( (cmd & MEMOP_CMD_MASK) == XENMEM_decrease_reservation )
-curr->domain->arch.hvm.qemu_mapcache_invalidate = true;
+curr->domain->mapcache_invalidate = true;
 
 return rc;
 }
@@ -326,9 +327,9 @@ int hvm_hypercall(struct cpu_user_regs *regs)
 
 HVM_DBG_LOG(DBG_LEVEL_HCALL, "hcall%lu -> %lx", eax, regs->rax);
 
-if ( unlikely(currd->arch.hvm.qemu_mapcache_invalidate) &&
- test_and_clear_bool(currd->arch.hvm.qemu_mapcache_invalidate) )
-send_invalidate_req();
+if ( unlikely(currd->mapcache_invalidate) &&
+ test_and_clear_bool(currd->mapcache_invalidate) )
+ioreq_signal_mapcache_invalidate();
 
 return curr->hcall_preempted ? HVM_HCALL_preempted : HVM_HCALL_completed;
 }
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 66a37ee..046a8eb 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -64,20 +64,6 @@ void send_timeoffset_req(unsigned long timeoff)
 gprintk(XENLOG_ERR, "Unsuccessful timeoffset update\n");
 }
 
-/* Ask ioemu mapcache to invalidate mappings. */
-void send_invalidate_req(void)
-{
-ioreq_t p = {
-.type = IOREQ_TYPE_INVALIDATE,
-.size = 4,
-.dir = IOREQ_WRITE,
-.data = ~0UL, /* flush all */
-};
-
-if ( ioreq_broadcast(&p, false) != 0 )
-gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n");
-}
-
 bool hvm_emulate_one_insn(hvm_emulate_validate_t *validate, const char *descr)
 {
 struct hvm_emulate_ctxt ctxt;
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 5b0f03e..67ef1f7 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -35,6 +35,20 @@
 #include 
 #include 
 
+/* Ask ioemu mapcache to invalidate mappings. */
+void ioreq_signal_mapcache_invalidate(void)
+{
+ioreq_t p = {
+.type = IOREQ_TYPE_INVALIDATE,
+.size = 4,
+.dir = IOREQ_WRITE,
+.data = ~0UL, /* flush all */
+};
+
+if ( ioreq_broadcast(&p, false) != 0 )
+gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n");
+}
+
 static void set_ioreq_server(struct domain *d, unsigned int id,
  struct ioreq_server *s)
 {
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index

[PATCH V5 13/22] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

The cmpxchg() in ioreq_send_buffered() operates on memory shared
with the emulator domain (and the target domain if the legacy
interface is used).

In order to be on the safe side we need to switch
to guest_cmpxchg64() to prevent a domain from DoSing Xen on Arm.
The 64-bit version of the helper is used to support Arm32,
since the IOREQ code uses cmpxchg() with a 64-bit value.

As there is no plan to support the legacy interface on Arm,
a page will be mapped in a single domain at a time,
so we can use s->emulator in guest_cmpxchg64() safely.

Thankfully the only user of the legacy interface is x86 so far
and there is no concern regarding the atomic operations.

Please note that the legacy interface *must not* be used on Arm
without revisiting the code.

Signed-off-by: Oleksandr Tyshchenko 
Acked-by: Stefano Stabellini 
Reviewed-by: Paul Durrant 
CC: Julien Grall 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch

Changes V1 -> V2:
   - move earlier to avoid breaking arm32 compilation
   - add an explanation to commit description and hvm_allow_set_param()
   - pass s->emulator

Changes V2 -> V3:
   - update patch description

Changes V3 -> V4:
   - add Stefano's A-b
   - drop comment from arm/hvm.c

Changes V4 -> V5:
   - update patch description
   - rebase
   - add Paul's R-b
---
---
 xen/common/ioreq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index de3066a..07572a5 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 #include 
@@ -1185,7 +1186,7 @@ static int ioreq_send_buffered(struct ioreq_server *s, 
ioreq_t *p)
 
 new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
 new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
-cmpxchg(>ptrs.full, old.full, new.full);
+guest_cmpxchg64(s->emulator, >ptrs.full, old.full, new.full);
 }
 
 notify_via_xen_event_channel(d, s->bufioreq_evtchn);
-- 
2.7.4
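To see why a 64-bit compare-and-exchange is needed here, consider that the read and write pointers are two 32-bit values that must be updated together. The sketch below models that with C11 atomics. It is not guest_cmpxchg64(): SLOT_NUM, union ring_ptrs and ring_rebase() are hypothetical names, and the union layout is assumed from the old.full / new.full usage visible in the diff.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOT_NUM 511u  /* hypothetical; stands in for IOREQ_BUFFER_SLOT_NUM */

/* Both 32-bit pointers share one 64-bit word so they can be swapped together. */
union ring_ptrs {
    struct {
        uint32_t read_pointer;
        uint32_t write_pointer;
    };
    uint64_t full;
};

/*
 * Pull both pointers back by a whole number of ring sizes so they do not
 * overflow. A single attempt is made; if the other side moved a pointer in
 * the meantime the exchange fails and the word is left as it was.
 */
static bool ring_rebase(_Atomic uint64_t *shared)
{
    union ring_ptrs old, upd;
    uint32_t n;

    old.full = atomic_load(shared);
    if ( old.read_pointer < SLOT_NUM )
        return false;

    n = old.read_pointer / SLOT_NUM;
    upd.read_pointer = old.read_pointer - n * SLOT_NUM;
    upd.write_pointer = old.write_pointer - n * SLOT_NUM;

    return atomic_compare_exchange_strong(shared, &old.full, upd.full);
}
```

Updating the two halves with separate 32-bit operations would let the consumer observe a rebased read pointer next to a stale write pointer, which is what the single 64-bit exchange avoids.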

[PATCH V5 08/22] xen/ioreq: Move x86's ioreq_server to struct domain

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

The IOREQ is a common feature now and this struct will be used
on Arm as is. Move it to common struct domain. This also
significantly reduces the layering violation in the common code
(*arch.hvm* usage).

We don't move ioreq_gfn since it is not used in the common code
(the "legacy" mechanism is x86 specific).

Signed-off-by: Oleksandr Tyshchenko 
Acked-by: Jan Beulich 
Reviewed-by: Julien Grall 
Reviewed-by: Paul Durrant 
Reviewed-by: Alex Bennée 
CC: Julien Grall 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V1 -> V2:
   - new patch

Changes V2 -> V3:
   - remove the mention of "ioreq_gfn" from patch subject/description
   - update patch given that the "legacy interface" is x86 specific
   - drop hvm_params related changes in arch/x86/hvm/hvm.c
   - leave ioreq_gfn in hvm_domain

Changes V3 -> V4:
   - rebase
   - drop the stale part of the comment above struct ioreq_server
   - add Jan's A-b

Changes V4 -> V5:
   - add Julien's, Alex's and Paul's R-b
---
---
 xen/common/ioreq.c   | 60 
 xen/include/asm-x86/hvm/domain.h |  8 --
 xen/include/xen/sched.h  | 10 +++
 3 files changed, 40 insertions(+), 38 deletions(-)

diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 7320f23..4cb26e6 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -38,13 +38,13 @@ static void set_ioreq_server(struct domain *d, unsigned int 
id,
  struct ioreq_server *s)
 {
 ASSERT(id < MAX_NR_IOREQ_SERVERS);
-ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
+ASSERT(!s || !d->ioreq_server.server[id]);
 
-d->arch.hvm.ioreq_server.server[id] = s;
+d->ioreq_server.server[id] = s;
 }
 
 #define GET_IOREQ_SERVER(d, id) \
-(d)->arch.hvm.ioreq_server.server[id]
+(d)->ioreq_server.server[id]
 
 static struct ioreq_server *get_ioreq_server(const struct domain *d,
  unsigned int id)
@@ -285,7 +285,7 @@ bool is_ioreq_server_page(struct domain *d, const struct 
page_info *page)
 unsigned int id;
 bool found = false;
 
-spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+spin_lock_recursive(&d->ioreq_server.lock);
 
 FOR_EACH_IOREQ_SERVER(d, id, s)
 {
@@ -296,7 +296,7 @@ bool is_ioreq_server_page(struct domain *d, const struct 
page_info *page)
 }
 }
 
-spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+spin_unlock_recursive(&d->ioreq_server.lock);
 
 return found;
 }
@@ -606,7 +606,7 @@ int hvm_create_ioreq_server(struct domain *d, int 
bufioreq_handling,
 return -ENOMEM;
 
 domain_pause(d);
-spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+spin_lock_recursive(&d->ioreq_server.lock);
 
 for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ )
 {
@@ -634,13 +634,13 @@ int hvm_create_ioreq_server(struct domain *d, int 
bufioreq_handling,
 if ( id )
 *id = i;
 
-spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+spin_unlock_recursive(&d->ioreq_server.lock);
 domain_unpause(d);
 
 return 0;
 
  fail:
-spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+spin_unlock_recursive(&d->ioreq_server.lock);
 domain_unpause(d);
 
 xfree(s);
@@ -652,7 +652,7 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t 
id)
 struct ioreq_server *s;
 int rc;
 
-spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+spin_lock_recursive(&d->ioreq_server.lock);
 
 s = get_ioreq_server(d, id);
 
@@ -684,7 +684,7 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t 
id)
 rc = 0;
 
  out:
-spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+spin_unlock_recursive(&d->ioreq_server.lock);
 
 return rc;
 }
@@ -697,7 +697,7 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t 
id,
 struct ioreq_server *s;
 int rc;
 
-spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+spin_lock_recursive(&d->ioreq_server.lock);
 
 s = get_ioreq_server(d, id);
 
@@ -731,7 +731,7 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t 
id,
 rc = 0;
 
  out:
-spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+spin_unlock_recursive(&d->ioreq_server.lock);
 
 return rc;
 }
@@ -744,7 +744,7 @@ int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t 
id,
 
 ASSERT(is_hvm_domain(d));
 
-spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+spin_lock_recursive(&d->ioreq_server.lock);
 
 s = get_ioreq_server(d, id);
 
@@ -782,7 +782,7 @@ int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t 
id,
 }
 
  out:
-spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+spin_unlock_recursive(&d->ioreq_server.lock);
 
 return rc;
 }
@@ -798,7 +798,7 @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, 
ioservid_t id,
 if ( start > end )
 return -EINVAL;

[PATCH V5 07/22] xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

The IOREQ is a common feature now and these structs will be used
on Arm as is. Move them to xen/ioreq.h and remove "hvm" prefixes.

Signed-off-by: Oleksandr Tyshchenko 
Acked-by: Jan Beulich 
Reviewed-by: Julien Grall 
Reviewed-by: Paul Durrant 
Reviewed-by: Alex Bennée 
CC: Julien Grall 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch

Changes V1 -> V2:
   - remove "hvm" prefix

Changes V2 -> V3:
   - update patch to reflect that the "legacy interface" is x86 specific

Changes V3 -> V4:
   - add Jan's A-b

Changes V4 -> V5:
   - rebase
   - add Julien's, Alex's and Paul's R-b
   - fix alignment issue in domain.h
---
---
 xen/arch/x86/hvm/emulate.c   |   2 +-
 xen/arch/x86/hvm/ioreq.c |  38 +++---
 xen/arch/x86/hvm/stdvga.c|   2 +-
 xen/arch/x86/mm/p2m.c|   8 +--
 xen/common/ioreq.c   | 108 +++
 xen/include/asm-x86/hvm/domain.h |  36 +
 xen/include/asm-x86/p2m.h|   8 +--
 xen/include/xen/ioreq.h  |  54 
 8 files changed, 128 insertions(+), 128 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index c3487b5..4d62199 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -287,7 +287,7 @@ static int hvmemul_do_io(
  * However, there's no cheap approach to avoid above situations in xen,
  * so the device model side needs to check the incoming ioreq event.
  */
-struct hvm_ioreq_server *s = NULL;
+struct ioreq_server *s = NULL;
 p2m_type_t p2mt = p2m_invalid;
 
 if ( is_mmio )
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 666d695..0cadf34 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -63,7 +63,7 @@ bool arch_vcpu_ioreq_completion(enum hvm_io_completion 
io_completion)
 return true;
 }
 
-static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s)
+static gfn_t hvm_alloc_legacy_ioreq_gfn(struct ioreq_server *s)
 {
 struct domain *d = s->target;
 unsigned int i;
@@ -79,7 +79,7 @@ static gfn_t hvm_alloc_legacy_ioreq_gfn(struct 
hvm_ioreq_server *s)
 return INVALID_GFN;
 }
 
-static gfn_t hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s)
+static gfn_t hvm_alloc_ioreq_gfn(struct ioreq_server *s)
 {
 struct domain *d = s->target;
 unsigned int i;
@@ -97,7 +97,7 @@ static gfn_t hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s)
 return hvm_alloc_legacy_ioreq_gfn(s);
 }
 
-static bool hvm_free_legacy_ioreq_gfn(struct hvm_ioreq_server *s,
+static bool hvm_free_legacy_ioreq_gfn(struct ioreq_server *s,
   gfn_t gfn)
 {
 struct domain *d = s->target;
@@ -115,7 +115,7 @@ static bool hvm_free_legacy_ioreq_gfn(struct 
hvm_ioreq_server *s,
 return true;
 }
 
-static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn)
+static void hvm_free_ioreq_gfn(struct ioreq_server *s, gfn_t gfn)
 {
 struct domain *d = s->target;
 unsigned int i = gfn_x(gfn) - d->arch.hvm.ioreq_gfn.base;
@@ -129,9 +129,9 @@ static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, 
gfn_t gfn)
 }
 }
 
-static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+static void hvm_unmap_ioreq_gfn(struct ioreq_server *s, bool buf)
 {
-struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+struct ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
 
 if ( gfn_eq(iorp->gfn, INVALID_GFN) )
 return;
@@ -143,10 +143,10 @@ static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server 
*s, bool buf)
 iorp->gfn = INVALID_GFN;
 }
 
-static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+static int hvm_map_ioreq_gfn(struct ioreq_server *s, bool buf)
 {
 struct domain *d = s->target;
-struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+struct ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
 int rc;
 
 if ( iorp->page )
@@ -179,11 +179,11 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, 
bool buf)
 return rc;
 }
 
-static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+static void hvm_remove_ioreq_gfn(struct ioreq_server *s, bool buf)
 
 {
 struct domain *d = s->target;
-struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+struct ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
 
 if ( gfn_eq(iorp->gfn, INVALID_GFN) )
 return;
@@ -194,10 +194,10 @@ static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server 
*s, bool buf)
 clear_page(iorp->va);
 }
 
-static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+static int hvm_add_ioreq_gfn(struct ioreq_server *s, bool buf)
 {
 struct domain *d = s->target;
-struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+struct ioreq_page

[PATCH V5 04/22] xen/ioreq: Make x86's IOREQ feature common

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

As a lot of x86 code can be re-used on Arm later on, this patch
moves the previously prepared IOREQ support to the common code
(the code movement is a verbatim copy).

The "legacy" mechanism of mapping magic pages for the IOREQ servers
remains x86 specific and not exposed to the common code.

The common IOREQ feature is supposed to be built with IOREQ_SERVER
option enabled, which is selected for x86's config HVM for now.

In order to avoid having a gigantic patch here, the subsequent
patches will update remaining bits in the common code step by step:
- Make IOREQ related structs/materials common
- Drop the "hvm" prefixes and infixes
- Remove layering violation by moving corresponding fields
  out of *arch.hvm* or abstracting away accesses to them

Introduce an asm/ioreq.h wrapper to be included by common ioreq.h
instead of asm/hvm/ioreq.h to avoid HVM-isms in the common code.

Also include public/hvm/dm_op.h, which will be needed on Arm, to avoid
touching the common code again when introducing Arm-specific bits.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Signed-off-by: Oleksandr Tyshchenko 
Reviewed-by: Paul Durrant 
CC: Julien Grall 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

***
Please note, this patch depends on the following which is
on review:
https://patchwork.kernel.org/patch/11816689/
The effort (to get it upstreamed) was paused because of
the security issue around that code (XSA-348).
***

Changes RFC -> V1:
   - was split into three patches:
 - x86/ioreq: Prepare IOREQ feature for making it common
 - xen/ioreq: Make x86's IOREQ feature common
 - xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
   - update MAINTAINERS file
   - do not use a separate subdir for the IOREQ stuff, move it to:
 - xen/common/ioreq.c
 - xen/include/xen/ioreq.h
   - update x86's files to include xen/ioreq.h
   - remove unneeded headers in arch/x86/hvm/ioreq.c
   - re-order the headers alphabetically in common/ioreq.c
   - update common/ioreq.c according to the newly introduced arch functions:
 arch_hvm_destroy_ioreq_server()/arch_handle_hvm_io_completion()

Changes V1 -> V2:
   - update patch description
   - make everything needed in the previous patch to achieve
 a true rename here
   - don't include unnecessary headers from asm-x86/hvm/ioreq.h
 and xen/ioreq.h
   - use __XEN_IOREQ_H__ instead of __IOREQ_H__
   - move get_ioreq_server() to common/ioreq.c

Changes V2 -> V3:
   - update patch description
   - make everything needed in the previous patch to not
 expose "legacy" interface to the common code here
   - update patch to reflect that the "legacy interface" is x86 specific
   - include  in common ioreq.c

Changes V3 -> V4:
   - rebase
   - don't include  from arch header
   - move all arch hook declarations to the common header

Changes V4 -> V5:
   - rebase
   - introduce asm-x86/ioreq.h wrapper:
 - update MAINTAINERS file
 - include asm/ioreq.h instead of asm/hvm/ioreq.h from common ioreq.c
   - include public/hvm/dm_op.h from common ioreq.h
   - add Paul's R-b
---
---
 MAINTAINERS |9 +-
 xen/arch/x86/Kconfig|1 +
 xen/arch/x86/hvm/dm.c   |2 +-
 xen/arch/x86/hvm/emulate.c  |2 +-
 xen/arch/x86/hvm/hvm.c  |2 +-
 xen/arch/x86/hvm/io.c   |2 +-
 xen/arch/x86/hvm/ioreq.c| 1316 ++-
 xen/arch/x86/hvm/stdvga.c   |2 +-
 xen/arch/x86/hvm/vmx/vvmx.c |3 +-
 xen/arch/x86/mm.c   |2 +-
 xen/arch/x86/mm/shadow/common.c |2 +-
 xen/common/Kconfig  |3 +
 xen/common/Makefile |1 +
 xen/common/ioreq.c  | 1290 ++
 xen/include/asm-x86/hvm/ioreq.h |   36 --
 xen/include/asm-x86/ioreq.h |   37 ++
 xen/include/xen/ioreq.h |   38 ++
 17 files changed, 1422 insertions(+), 1326 deletions(-)
 create mode 100644 xen/common/ioreq.c
 create mode 100644 xen/include/asm-x86/ioreq.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 5079b83..c4f9aff 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -333,6 +333,13 @@ X: xen/drivers/passthrough/vtd/
 X: xen/drivers/passthrough/device_tree.c
 F: xen/include/xen/iommu.h
 
+I/O EMULATION (IOREQ)
+M: Paul Durrant 
+S: Supported
+F: xen/common/ioreq.c
+F: xen/include/xen/ioreq.h
+F: xen/include/public/hvm/ioreq.h
+
 KCONFIG
 M: Doug Goldstein 
 S: Supported
@@ -549,7 +556,7 @@ F:  xen/arch/x86/hvm/ioreq.c
 F: xen/include/asm-x86/hvm/emulate.h
 F: xen/include/asm-x86/hvm/io.h
 F: xen/include/asm-x86/hvm/ioreq.h
-F: xen/include/public/hvm/ioreq.h
+F: xen/include/asm-x86/ioreq.h
 
 X86 MEMORY MANAGEMENT
 M: Jan Beulich 
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index

[PATCH V5 06/22] xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

The IOREQ is a common feature now and these helpers will be used
on Arm as is. Move them to xen/ioreq.h and replace "hvm" prefixes
with "ioreq".
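For readers unfamiliar with these helpers, the following is a minimal, standalone sketch of the arithmetic they perform. The struct demo_ioreq type and the demo_* names are hypothetical stand-ins introduced only for illustration (the real ioreq_t lives in Xen's public headers and has more fields); the expressions mirror the ones in the diff further down.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for Xen's ioreq_t (illustration only). */
struct demo_ioreq {
    uint64_t addr;   /* address of the first element accessed */
    uint32_t count;  /* number of repetitions */
    uint32_t size;   /* bytes per element */
    uint8_t  df;     /* direction flag: non-zero means addresses decrease */
};

/* Lowest byte address touched by the whole (possibly repeated) access. */
static inline uint64_t demo_mmio_first_byte(const struct demo_ioreq *p)
{
    return p->df ? p->addr - (p->count - 1ul) * p->size
                 : p->addr;
}

/* Highest byte address touched by the whole (possibly repeated) access. */
static inline uint64_t demo_mmio_last_byte(const struct demo_ioreq *p)
{
    uint64_t size = p->size;

    return p->df ? p->addr + size - 1
                 : p->addr + (p->count * size) - 1;
}
```

With addr = 0x1000, count = 4 and size = 2, an ascending access covers 0x1000..0x1007, while a descending one (df set) covers 0xffa..0x1001.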

Signed-off-by: Oleksandr Tyshchenko 
Reviewed-by: Paul Durrant 
Acked-by: Jan Beulich 
Reviewed-by: Julien Grall 
Reviewed-by: Alex Bennée 
CC: Julien Grall 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch

Changes V1 -> V2:
   - replace "hvm" prefix by "ioreq"

Changes V2 -> V3:
   - add Paul's R-b

Changes V3 -> V4:
   - add Jan's A-b

Changes V4 -> V5:
   - rebase
   - add Julien's and Alex's R-b
---
---
 xen/arch/x86/hvm/intercept.c |  5 +++--
 xen/arch/x86/hvm/stdvga.c|  4 ++--
 xen/common/ioreq.c   |  4 ++--
 xen/include/asm-x86/hvm/io.h | 16 
 xen/include/xen/ioreq.h  | 16 
 5 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index cd4c4c1..02ca3b0 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -17,6 +17,7 @@
  * this program; If not, see .
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -34,7 +35,7 @@
 static bool_t hvm_mmio_accept(const struct hvm_io_handler *handler,
   const ioreq_t *p)
 {
-paddr_t first = hvm_mmio_first_byte(p), last;
+paddr_t first = ioreq_mmio_first_byte(p), last;
 
 BUG_ON(handler->type != IOREQ_TYPE_COPY);
 
@@ -42,7 +43,7 @@ static bool_t hvm_mmio_accept(const struct hvm_io_handler 
*handler,
 return 0;
 
 /* Make sure the handler will accept the whole access. */
-last = hvm_mmio_last_byte(p);
+last = ioreq_mmio_last_byte(p);
 if ( last != first &&
  !handler->mmio.ops->check(current, last) )
 domain_crash(current->domain);
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index fd7cadb..17dee74 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -524,8 +524,8 @@ static bool_t stdvga_mem_accept(const struct hvm_io_handler 
*handler,
  * deadlock when hvm_mmio_internal() is called from
  * hvm_copy_to/from_guest_phys() in hvm_process_io_intercept().
  */
-if ( (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
- (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
+if ( (ioreq_mmio_first_byte(p) < VGA_MEM_BASE) ||
+ (ioreq_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
 return 0;
 
 spin_lock(>lock);
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 61ddd54..89e75ff 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -1078,8 +1078,8 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct 
domain *d,
 break;
 
 case XEN_DMOP_IO_RANGE_MEMORY:
-start = hvm_mmio_first_byte(p);
-end = hvm_mmio_last_byte(p);
+start = ioreq_mmio_first_byte(p);
+end = ioreq_mmio_last_byte(p);
 
 if ( rangeset_contains_range(r, start, end) )
 return s;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 9453b9b..6bc80db 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -40,22 +40,6 @@ struct hvm_mmio_ops {
 hvm_mmio_write_t write;
 };
 
-static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p)
-{
-return unlikely(p->df) ?
-   p->addr - (p->count - 1ul) * p->size :
-   p->addr;
-}
-
-static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p)
-{
-unsigned long size = p->size;
-
-return unlikely(p->df) ?
-   p->addr + size - 1:
-   p->addr + (p->count * size) - 1;
-}
-
 typedef int (*portio_action_t)(
 int dir, unsigned int port, unsigned int bytes, uint32_t *val);
 
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index e957b52..6853aa3 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -23,6 +23,22 @@
 
 #include 
 
+static inline paddr_t ioreq_mmio_first_byte(const ioreq_t *p)
+{
+return unlikely(p->df) ?
+   p->addr - (p->count - 1ul) * p->size :
+   p->addr;
+}
+
+static inline paddr_t ioreq_mmio_last_byte(const ioreq_t *p)
+{
+unsigned long size = p->size;
+
+return unlikely(p->df) ?
+   p->addr + size - 1:
+   p->addr + (p->count * size) - 1;
+}
+
 static inline bool ioreq_needs_completion(const ioreq_t *ioreq)
 {
 return ioreq->state == STATE_IOREQ_READY &&
-- 
2.7.4

[PATCH V5 05/22] xen/ioreq: Make x86's hvm_ioreq_needs_completion() common

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

The IOREQ is a common feature now and this helper will be used
on Arm as is. Move it to xen/ioreq.h and remove "hvm" prefix.

Although PIO handling on Arm is not introduced with the current series
(it will be implemented when we add support for vPCI), technically
the PIOs exist on Arm (however they are accessed the same way as MMIO)
and it would be better not to diverge now.
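As a rough illustration of what the helper decides, here is a standalone sketch of the same predicate. The DEMO_* constants and struct demo_req are hypothetical placeholders (the real constants come from Xen's public ioreq header and their values are not reproduced here); only the shape of the condition follows the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants only; not the values used by Xen. */
enum { DEMO_STATE_NONE, DEMO_STATE_READY };
enum { DEMO_TYPE_PIO, DEMO_TYPE_COPY };
enum { DEMO_DIR_WRITE, DEMO_DIR_READ };

/* Hypothetical subset of the fields the predicate looks at. */
struct demo_req {
    uint8_t state;
    uint8_t data_is_ptr;
    uint8_t type;
    uint8_t dir;
};

/*
 * A request needs a completion step when it has been sent (READY),
 * carries its data inline, and is anything other than a port write.
 */
static inline bool demo_needs_completion(const struct demo_req *r)
{
    return r->state == DEMO_STATE_READY &&
           !r->data_is_ptr &&
           (r->type != DEMO_TYPE_PIO || r->dir != DEMO_DIR_WRITE);
}
```

In this sketch a port write never needs completion, whereas a port read or any MMIO (copy) access with inline data does, which is why keeping the PIO check on Arm costs nothing today.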

Signed-off-by: Oleksandr Tyshchenko 
Reviewed-by: Paul Durrant 
Acked-by: Jan Beulich 
Reviewed-by: Julien Grall 
Reviewed-by: Alex Bennée 
CC: Julien Grall 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch, was split from:
 "[RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common"

Changes V1 -> V2:
   - remove "hvm" prefix

Changes V2 -> V3:
   - add Paul's R-b

Changes V3 -> V4:
   - add Jan's A-b

Changes V4 -> V5:
   - rebase
   - add Julien's and Alex's R-b
---
---
 xen/arch/x86/hvm/emulate.c | 4 ++--
 xen/arch/x86/hvm/io.c  | 2 +-
 xen/common/ioreq.c | 4 ++--
 xen/include/asm-x86/hvm/vcpu.h | 7 ---
 xen/include/xen/ioreq.h| 7 +++
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 60ca465..c3487b5 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -336,7 +336,7 @@ static int hvmemul_do_io(
 rc = hvm_send_ioreq(s, , 0);
 if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
 vio->io_req.state = STATE_IOREQ_NONE;
-else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
+else if ( !ioreq_needs_completion(&vio->io_req) )
 rc = X86EMUL_OKAY;
 }
 break;
@@ -2649,7 +2649,7 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt 
*hvmemul_ctxt,
 if ( rc == X86EMUL_OKAY && vio->mmio_retry )
 rc = X86EMUL_RETRY;
 
-if ( !hvm_ioreq_needs_completion(&vio->io_req) )
+if ( !ioreq_needs_completion(&vio->io_req) )
 completion = HVMIO_no_completion;
 else if ( completion == HVMIO_no_completion )
 completion = (vio->io_req.type != IOREQ_TYPE_PIO ||
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 11e007d..ef8286b 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -135,7 +135,7 @@ bool handle_pio(uint16_t port, unsigned int size, int dir)
 
 rc = hvmemul_do_pio_buffer(port, size, dir, );
 
-if ( hvm_ioreq_needs_completion(&vio->io_req) )
+if ( ioreq_needs_completion(&vio->io_req) )
 vio->io_completion = HVMIO_pio_completion;
 
 switch ( rc )
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 4e7d91b..61ddd54 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -160,7 +160,7 @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, 
ioreq_t *p)
 }
 
 p = &sv->vcpu->arch.hvm.hvm_io.io_req;
-if ( hvm_ioreq_needs_completion(p) )
+if ( ioreq_needs_completion(p) )
 p->data = data;
 
 sv->pending = false;
@@ -186,7 +186,7 @@ bool handle_hvm_io_completion(struct vcpu *v)
 if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
 return false;
 
-vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
+vio->io_req.state = ioreq_needs_completion(&vio->io_req) ?
 STATE_IORESP_READY : STATE_IOREQ_NONE;
 
 msix_write_completion(v);
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 5ccd075..6c1feda 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -91,13 +91,6 @@ struct hvm_vcpu_io {
 const struct g2m_ioport *g2m_ioport;
 };
 
-static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
-{
-return ioreq->state == STATE_IOREQ_READY &&
-   !ioreq->data_is_ptr &&
-   (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
-}
-
 struct nestedvcpu {
 bool_t nv_guestmode; /* vcpu in guestmode? */
 void *nv_vvmcx; /* l1 guest virtual VMCB/VMCS */
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index 430fc22..e957b52 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -23,6 +23,13 @@
 
 #include 
 
+static inline bool ioreq_needs_completion(const ioreq_t *ioreq)
+{
+return ioreq->state == STATE_IOREQ_READY &&
+   !ioreq->data_is_ptr &&
+   (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
+}
+
 #define HANDLE_BUFIOREQ(s) \
 ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
 
-- 
2.7.4

[PATCH V5 03/22] x86/ioreq: Provide out-of-line wrapper for the handle_mmio()

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

The IOREQ is about to become a common feature and Arm will have its own
implementation.

But the name of the function is pretty generic and can be confusing
on Arm (we already have a try_handle_mmio()).

To avoid renaming the function globally (it is used for a varying
set of purposes on x86) while still getting a non-confusing variant on Arm,
provide a wrapper arch_ioreq_complete_mmio() to be used in common
and Arm code.
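The pattern can be sketched in isolation as follows. Everything prefixed with demo_ or DEMO_ is a hypothetical stand-in (including the call counter, which exists only so the effect is observable); the point is merely that common code calls the arch hook by name and each architecture supplies the body.

```c
#include <stdbool.h>

/* Stand-in for x86's handle_mmio(); it only records that it ran. */
static int demo_mmio_calls;

static bool demo_handle_mmio(void)
{
    demo_mmio_calls++;
    return true;
}

/* Arch hook: on x86 it simply forwards to the existing function. */
static bool demo_arch_ioreq_complete_mmio(void)
{
    return demo_handle_mmio();
}

enum demo_completion { DEMO_NO_COMPLETION, DEMO_MMIO_COMPLETION };

/* "Common" code: knows only the hook name, not the x86 function. */
static bool demo_handle_io_completion(enum demo_completion c)
{
    switch ( c )
    {
    case DEMO_MMIO_COMPLETION:
        return demo_arch_ioreq_complete_mmio();

    default:
        return true;
    }
}
```

An out-of-line function (rather than a #define) keeps the x86 name out of the common header, at the cost of one extra call.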

Signed-off-by: Oleksandr Tyshchenko 
Reviewed-by: Jan Beulich 
Reviewed-by: Alex Bennée 
Reviewed-by: Julien Grall 
Reviewed-by: Paul Durrant 
CC: Julien Grall 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch

Changes V1 -> V2:
   - remove "handle"
   - add Jan's A-b

Changes V2 -> V3:
   - remove Jan's A-b
   - update patch subject/description
   - use out-of-line function instead of #define
   - put earlier in the series to avoid breakage

Changes V3 -> V4:
   - add Jan's R-b
   - rename ioreq_complete_mmio() to arch_ioreq_complete_mmio()

Changes V4 -> V5:
   - rebase
   - add Alex's, Julien's and Paul's R-b
---
---
 xen/arch/x86/hvm/ioreq.c | 7 ++-
 xen/include/xen/ioreq.h  | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 27a4a6f..30e8724 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -36,6 +36,11 @@
 #include 
 #include 
 
+bool arch_ioreq_complete_mmio(void)
+{
+return handle_mmio();
+}
+
 static void set_ioreq_server(struct domain *d, unsigned int id,
  struct hvm_ioreq_server *s)
 {
@@ -226,7 +231,7 @@ bool handle_hvm_io_completion(struct vcpu *v)
 break;
 
 case HVMIO_mmio_completion:
-return handle_mmio();
+return arch_ioreq_complete_mmio();
 
 case HVMIO_pio_completion:
 return handle_pio(vio->io_req.addr, vio->io_req.size,
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index d0980c5..b95d3ef 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -24,6 +24,7 @@
 #define HANDLE_BUFIOREQ(s) \
 ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
 
+bool arch_ioreq_complete_mmio(void);
 bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion);
 int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s);
 void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s);
-- 
2.7.4

[PATCH V5 01/22] x86/ioreq: Prepare IOREQ feature for making it common

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

As a lot of x86 code can be re-used on Arm later on, this
patch makes some preparation to x86/hvm/ioreq.c before moving
to the common code. This way we will get a verbatim copy
for the code movement in a subsequent patch.

This patch mostly introduces specific hooks to abstract arch-specific
material, taking into account the requirement to keep the "legacy"
mechanism of mapping magic pages for the IOREQ servers x86 specific
and not expose it to the common code.

These hooks are named according to the more consistent new naming
scheme right away (including dropping the "hvm" prefixes and infixes):
- IOREQ server functions should start with "ioreq_server_"
- IOREQ functions should start with "ioreq_"
Other functions will be renamed in subsequent patches.

Introduce common ioreq.h right away and put arch hook declarations
there.

Also re-order #include-s alphabetically.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Signed-off-by: Oleksandr Tyshchenko 
Reviewed-by: Alex Bennée 
Reviewed-by: Julien Grall 
Reviewed-by: Paul Durrant 
CC: Julien Grall 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch, was split from:
 "[RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common"
   - fold the check of p->type into hvm_get_ioreq_server_range_type()
 and make it return success/failure
   - remove relocate_portio_handler() call from arch_hvm_ioreq_destroy()
 in arch/x86/hvm/ioreq.c
   - introduce arch_hvm_destroy_ioreq_server()/arch_handle_hvm_io_completion()

Changes V1 -> V2:
   - update patch description
   - make arch functions inline and put them into arch header
 to achieve a true rename by the subsequent patch
   - return void in arch_hvm_destroy_ioreq_server()
   - return bool in arch_hvm_ioreq_destroy()
   - bring relocate_portio_handler() back to arch_hvm_ioreq_destroy()
   - rename IOREQ_IO* to IOREQ_STATUS*
   - remove *handle* from arch_handle_hvm_io_completion()
   - re-order #include-s alphabetically
   - rename hvm_get_ioreq_server_range_type() to hvm_ioreq_server_get_type_addr()
 and add "const" to several arguments

Changes V2 -> V3:
   - update patch description
   - name new arch hooks according to the new naming scheme
   - don't make arch hooks inline, move them to ioreq.c
   - make get_ioreq_server() local again
   - rework the whole patch taking into account that the "legacy" interface
 should remain x86 specific (additional arch hooks, etc)
   - update the code to be able to use hvm_map_mem_type_to_ioreq_server()
 in the common code (an extra arch hook, etc)
   - don’t include  from arch header
   - add "arch" prefix to hvm_ioreq_server_get_type_addr()
   - move IOREQ_STATUS_* #define-s introduction to the separate patch
   - move HANDLE_BUFIOREQ to the arch header
   - just return relocate_portio_handler() from arch_ioreq_server_destroy_all()
   - misc adjustments proposed by Jan (adding const, unsigned int instead of uint32_t)

Changes V3 -> V4:
   - add Alex's R-b
   - update patch description
   - make arch_ioreq_server_get_type_addr return bool
   - drop #include 
   - use two arch hooks in hvm_map_mem_type_to_ioreq_server()
 to avoid calling p2m_change_entry_type_global() with lock held

Changes V4 -> V5:
   - add Julien's and Paul's R-b
   - update patch description
   - remove single use variable in arch_ioreq_server_map_mem_type_completed()
   - put multiple function parameters on a single line in the header
 where possible
   - introduce common ioreq.h right away and put arch hooks declarations
 there instead of doing that in patch #4
---
---
 xen/arch/x86/hvm/ioreq.c | 175 +++
 xen/include/xen/ioreq.h  |  54 +++
 2 files changed, 169 insertions(+), 60 deletions(-)
 create mode 100644 xen/include/xen/ioreq.h

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 1cc27df..3c3c173 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -16,16 +16,16 @@
  * this program; If not, see .
  */
 
-#include 
+#include 
+#include 
 #include 
+#include 
+#include 
 #include 
-#include 
+#include 
 #include 
-#include 
 #include 
-#include 
-#include 
-#include 
+#include 
 #include 
 
 #include 
@@ -170,6 +170,29 @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, 
ioreq_t *p)
 return true;
 }
 
+bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion)
+{
+switch ( io_completion )
+{
+case HVMIO_realmode_completion:
+{
+struct hvm_emulate_ctxt ctxt;
+
+hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
+vmx_realmode_emulate_one(&ctxt);
+hvm_emulate_writeback(&ctxt);
+
+break;
+}
+
+default:
+ASSERT_UNREACHABLE();
+

[PATCH V5 02/22] x86/ioreq: Add IOREQ_STATUS_* #define-s and update code for moving

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

This patch continues to make some preparation to x86/hvm/ioreq.c
before moving to the common code.

Add IOREQ_STATUS_* #define-s and update candidates for moving
since X86EMUL_* shouldn't be exposed to the common code in
that form.
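A small sketch of the idea, under stated assumptions: the DEMO_X86EMUL_* enumerators below are placeholders with illustrative values (not Xen's real X86EMUL_* values), and demo_send_buffered() is a hypothetical, heavily simplified analogue of the buffered send path. It shows how code destined for the common directory can return arch-neutral names while x86 callers keep comparing against their own constants.

```c
#include <stddef.h>

/* Placeholders for the x86 emulator return codes (values illustrative). */
enum {
    DEMO_X86EMUL_OKAY,
    DEMO_X86EMUL_UNHANDLEABLE,
    DEMO_X86EMUL_RETRY,
};

/* Arch header: each arch maps the neutral names onto its own codes. */
#define DEMO_IOREQ_STATUS_HANDLED    DEMO_X86EMUL_OKAY
#define DEMO_IOREQ_STATUS_UNHANDLED  DEMO_X86EMUL_UNHANDLEABLE
#define DEMO_IOREQ_STATUS_RETRY      DEMO_X86EMUL_RETRY

/* Simplified "common" code: only the neutral names appear here. */
static int demo_send_buffered(const void *page, int queue_full)
{
    if ( page == NULL )
        return DEMO_IOREQ_STATUS_UNHANDLED;

    if ( queue_full )
        return DEMO_IOREQ_STATUS_UNHANDLED;

    return DEMO_IOREQ_STATUS_HANDLED;
}
```

Because the neutral names are plain aliases, existing x86 callers that test the result against the emulator codes continue to work unchanged, which is presumably why the mapping must not be altered.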

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Signed-off-by: Oleksandr Tyshchenko 
Acked-by: Jan Beulich 
Reviewed-by: Alex Bennée 
Reviewed-by: Julien Grall 
Reviewed-by: Paul Durrant 
CC: Julien Grall 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V2 -> V3:
 - new patch, was split from
   [PATCH V2 01/23] x86/ioreq: Prepare IOREQ feature for making it common

Changes V3 -> V4:
 - add Alex's R-b and Jan's A-b
 - add a comment above IOREQ_STATUS_* #define-s

Changes V4 -> V5:
 - rebase
 - add Julien's and Paul's R-b
---
---
 xen/arch/x86/hvm/ioreq.c| 16 
 xen/include/asm-x86/hvm/ioreq.h |  5 +
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 3c3c173..27a4a6f 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -1401,7 +1401,7 @@ static int hvm_send_buffered_ioreq(struct 
hvm_ioreq_server *s, ioreq_t *p)
 pg = iorp->va;
 
 if ( !pg )
-return X86EMUL_UNHANDLEABLE;
+return IOREQ_STATUS_UNHANDLED;
 
 /*
  * Return 0 for the cases we can't deal with:
@@ -1431,7 +1431,7 @@ static int hvm_send_buffered_ioreq(struct 
hvm_ioreq_server *s, ioreq_t *p)
 break;
 default:
 gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
-return X86EMUL_UNHANDLEABLE;
+return IOREQ_STATUS_UNHANDLED;
 }
 
 spin_lock(&s->bufioreq_lock);
@@ -1441,7 +1441,7 @@ static int hvm_send_buffered_ioreq(struct 
hvm_ioreq_server *s, ioreq_t *p)
 {
 /* The queue is full: send the iopacket through the normal path. */
 spin_unlock(&s->bufioreq_lock);
-return X86EMUL_UNHANDLEABLE;
+return IOREQ_STATUS_UNHANDLED;
 }
 
 pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
@@ -1472,7 +1472,7 @@ static int hvm_send_buffered_ioreq(struct 
hvm_ioreq_server *s, ioreq_t *p)
 notify_via_xen_event_channel(d, s->bufioreq_evtchn);
 spin_unlock(&s->bufioreq_lock);
 
-return X86EMUL_OKAY;
+return IOREQ_STATUS_HANDLED;
 }
 
 int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
@@ -1488,7 +1488,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t 
*proto_p,
 return hvm_send_buffered_ioreq(s, proto_p);
 
 if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
-return X86EMUL_RETRY;
+return IOREQ_STATUS_RETRY;
 
 list_for_each_entry ( sv,
   &s->ioreq_vcpu_list,
@@ -1528,11 +1528,11 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t 
*proto_p,
 notify_via_xen_event_channel(d, port);
 
 sv->pending = true;
-return X86EMUL_RETRY;
+return IOREQ_STATUS_RETRY;
 }
 }
 
-return X86EMUL_UNHANDLEABLE;
+return IOREQ_STATUS_UNHANDLED;
 }
 
 unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
@@ -1546,7 +1546,7 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool 
buffered)
 if ( !s->enabled )
 continue;
 
-if ( hvm_send_ioreq(s, p, buffered) == X86EMUL_UNHANDLEABLE )
+if ( hvm_send_ioreq(s, p, buffered) == IOREQ_STATUS_UNHANDLED )
 failed++;
 }
 
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index e2588e9..df0c292 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -55,6 +55,11 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
 
 void hvm_ioreq_init(struct domain *d);
 
+/* This correlation must not be altered */
+#define IOREQ_STATUS_HANDLED X86EMUL_OKAY
+#define IOREQ_STATUS_UNHANDLED   X86EMUL_UNHANDLEABLE
+#define IOREQ_STATUS_RETRY   X86EMUL_RETRY
+
 #endif /* __ASM_X86_HVM_IOREQ_H__ */
 
 /*
-- 
2.7.4

[PATCH V5 00/22] IOREQ feature (+ virtio-mmio) on Arm

2021-01-25 Thread Oleksandr Tyshchenko

From: Oleksandr Tyshchenko 

Hello all.

The purpose of this patch series is to add IOREQ/DM support to Xen on Arm.
You can find an initial discussion at [1] and RFC-V4 series at [2]-[6].
Xen on Arm requires some implementation to forward guest MMIO access to a device
model in order to implement a virtio-mmio backend or even a mediator outside of
the hypervisor.
As Xen on x86 already contains the required support, this series tries to make it
common and introduce Arm-specific bits plus some new functionality. The patch
series is based on Julien's PoC "xen/arm: Add support for Guest IO forwarding to
a device emulator".

***

IMPORTANT NOTE:

Current patch series doesn't contain VirtIO related changes for the toolstack
(but they are still available at the GitHub repo [8]):
- libxl: Introduce basic virtio-mmio support on Arm
- [RFC] libxl: Add support for virtio-disk configuration
I decided to skip these patches for now since they require some rework (not Xen
4.15 material); I will resume pushing them once we get *common* IOREQ in.

***

According to the initial/subsequent discussions there are a few open
questions/concerns regarding security and performance in the VirtIO solution:
1. virtio-mmio vs virtio-pci, SPI vs MSI, or even a composition of
   virtio-mmio + MSI; different use-cases require different transports...
2. The virtio backend is able to access all guest memory, so some kind of
   protection is needed: 'virtio-iommu in Xen' vs 'pre-shared-memory &
   memcpys in guest', etc. (Alex has provided some input on these at [7].)
3. The interface between the toolstack and an 'out-of-qemu' virtio backend:
   avoid using Xenstore in the virtio backend if possible. Also, there is a
   desire to make the VirtIO backend hypervisor-agnostic.
4. A lot of 'foreign mappings' could lead to memory exhaustion on the host
   side, as we are stealing a page from host memory in order to map each
   guest page. Julien has an idea regarding that.
5. Julien also has some ideas on how to optimize the IOREQ code:
   5.1 vcpu_ioreq_handle_completion (formerly handle_hvm_io_completion) is
   called in a hot path on Arm (every time we re-enter the guest):
   Ideally, vcpu_ioreq_handle_completion should be a NOP (at most a few
   instructions) if there is nothing to do (if we don't have I/O forwarded
   to an IOREQ server).
   Maybe we want to introduce a per-vCPU flag indicating whether an I/O has
   been forwarded to an IOREQ server. This would allow us to bypass most of
   the function if there is nothing to do.
   5.2 The current way to handle MMIO is the following:
   - Pause the vCPU
   - Forward the access to the backend domain
   - Schedule the backend domain
   - Wait for the access to be handled
   - Unpause the vCPU
   The sequence is going to be fairly expensive on Xen.
   It might be possible to optimize the ACK and avoid waiting for the
   backend to handle the access.
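
The per-vCPU flag idea from 5.1 could look roughly like the sketch below. This
is only an illustration under stated assumptions: struct vcpu_sketch,
forward_io() and handle_completion() are hypothetical names invented for this
example, not the actual Xen structures or the real
vcpu_ioreq_handle_completion() implementation.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for per-vCPU state (not Xen's struct vcpu). */
struct vcpu_sketch {
    bool io_forwarded;            /* set when an I/O is sent to an IOREQ server */
    unsigned int slow_path_runs;  /* counts full completion passes, for illustration */
};

/* Hypothetical: record that an access was forwarded to an IOREQ server. */
void forward_io(struct vcpu_sketch *v)
{
    v->io_forwarded = true;
}

/*
 * Sketch of a completion hook that runs on every return to the guest.
 * Returns true when the vCPU may resume.
 */
bool handle_completion(struct vcpu_sketch *v)
{
    /* Fast path: nothing was forwarded, so this is a single flag test. */
    if ( !v->io_forwarded )
        return true;

    /* Slow path: the real code would inspect the ioreq state here. */
    v->slow_path_runs++;
    v->io_forwarded = false;

    return true;
}
```

With such a flag the common case (no I/O in flight) costs one load and one
branch, which is what "at most a few instructions" asks for.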

Looks like all of them are valid and worth considering, but the first thing
which we need on Arm is a mechanism to forward guest IO to a device emulator,
so let's focus on it in the first place.

There are a lot of changes since the RFC series: almost all TODOs were
resolved on Arm, the Arm code was improved and hardened, the common IOREQ/DM
code became really arch-agnostic (without HVM-isms), the "legacy" mechanism
of mapping magic pages for the IOREQ servers was left x86 specific, etc.
But one TODO still remains, which is "PIO handling" on Arm.
The "PIO handling" TODO is expected to be left unaddressed for the current
series. It is not a big issue for now while Xen doesn't have support for
vPCI on Arm. On Arm64 they are only used for PCI IO BARs and we would
probably want to expose them to the emulator as PIO accesses to make a DM
completely arch-agnostic. So "PIO handling" should be implemented when we
add support for vPCI.

There are patches on review this series depends on:
https://patchwork.kernel.org/patch/11816689
https://patchwork.kernel.org/patch/11803383

Please note that the IOREQ feature is disabled by default on Arm within the
current series.

***

Patch series [8] was rebased on the recent "staging" branch
(5e31789 tools/ocaml/libs/xb: Do not crash after xenbus is unmapped) and
tested on a Renesas Salvator-X board + H3 ES3.0 SoC (Arm64) with a
virtio-mmio disk backend [9] running in a driver domain and an unmodified
Linux guest running on the existing virtio-blk driver (frontend). No issues
were observed. Guest domain 'reboot/destroy' use-cases work properly. The
patch series was only build-tested on x86.

Please note, build-test passed for the following modes:
1. x86: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
2. x86: #CONFIG_HVM is not set / #CONFIG_IOREQ_SERVER is not set
3. Arm64: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y
4. Arm64: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set  (default)
5. Arm32: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y
6. Arm32: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set  (default)

***

Any feedback/help would be highly

[linux-5.4 test] 158609: regressions - FAIL

2021-01-25 Thread osstest service owner

flight 158609 linux-5.4 real [real]
flight 158615 linux-5.4 real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/158609/
http://logs.test-lab.xenproject.org/osstest/logs/158615/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-dom0pvh-xl-intel  8 xen-boot fail REGR. vs. 158387

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 158387
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 158387
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 158387
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 158387
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 158387
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 158387
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 158387
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 158387
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 158387
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 158387
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 158387
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass

version targeted for testing:
 linux                09f983f0c7fc0db79a5f6c883ec3510d424c369c
baseline version:
 linux                a829146c3fdcf6d0b76d9c54556a223820f1f73b

Last test of basis   158387  2021-01-12 19:40:06 Z   12 days
Failing since        158473  2021-01-17 13:42:20 Z    8 days   13 attempts
Testing same since   158593  2021-01-23 21:09:36 Z    1 days    3 attempts


People who

Re: [PATCH v2 3/4] x86: Allow non-faulting accesses to non-emulated MSRs if policy permits this

2021-01-25 Thread Boris Ostrovsky

On 21-01-25 11:22:08, Jan Beulich wrote:
> On 22.01.2021 20:52, Boris Ostrovsky wrote:
> > On 1/22/21 7:51 AM, Jan Beulich wrote:
> >> On 20.01.2021 23:49, Boris Ostrovsky wrote:
> >>> +
> >>> +/*
> >>> + * Accesses to unimplemented MSRs as part of emulation of instructions
> >>> + * other than guest's RDMSR/WRMSR should never succeed.
> >>> + */
> >>> +if ( !is_guest_msr_access )
> >>> +ignore_msrs = MSR_UNHANDLED_NEVER;
> >>
> >> Wouldn't you better "return true" here? Such accesses also
> >> shouldn't be logged imo (albeit I agree that's a change from
> >> current behavior).
> > 
> > 
> > Yes, that's why I didn't return here. We will be here in the
> > !is_guest_msr_access case most likely due to a bug in the emulator so I
> > think we do want to see the error logged.
> 
> Why "most likely"?


OK, definitely ;-) But I still think logging these accesses would be helpful.

> 
> >>> +if ( unlikely(ignore_msrs != MSR_UNHANDLED_NEVER) )
> >>> +*val = 0;
> >>
> >> I don't understand the conditional here, even more so with
> >> the respective changelog entry. In any event you don't
> >> want to clobber the value ahead of ...
> >>
> >>> +if ( likely(ignore_msrs != MSR_UNHANDLED_SILENT) )
> >>> +{
> >>> +if ( is_write )
> >>> +gdprintk(XENLOG_WARNING, "WRMSR 0x%08x val 0x%016"PRIx64
> >>> +" unimplemented\n", msr, *val);
> >>
> >> ... logging it.
> > 
> > 
> > True. I dropped !is_write from v1 without considering this.
> > 
> > As far as the conditional --- dropping it too would be a behavior change. 
> 
> Albeit an intentional one then? Plus I think I have trouble
> seeing what behavior it would be that would change.


Currently callers of, say, read_msr() don't expect the argument that they pass 
in to change. Granted, they shouldn't (and AFAICS don't) look at it but it's a 
change nonetheless.

> 
> >>> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
> >>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
> >>> @@ -850,4 +850,10 @@ static inline void x86_emul_reset_event(struct x86_emulate_ctxt *ctxt)
> >>>  ctxt->event = (struct x86_event){};
> >>>  }
> >>>  
> >>> +static inline bool x86_emul_guest_msr_access(struct x86_emulate_ctxt *ctxt)
> >>
> >> The parameter wants to be pointer-to-const. In addition I wonder
> >> whether this wouldn't better be a sibling to
> >> x86_insn_is_cr_access() (without a "state" parameter, which
> >> would be unused and unavailable to the callers), which may end
> >> up finding further uses down the road.
> > 
> > 
> > "Sibling" in terms of name (yes, it would be) or something else?
> 
> Name and (possible) purpose - a validate hook could want to
> make use of this, for example.

A validate hook? 

> 
> >>> +{
> >>> +return ctxt->opcode == X86EMUL_OPC(0x0f, 0x32) ||  /* RDMSR */
> >>> +   ctxt->opcode == X86EMUL_OPC(0x0f, 0x30);/* WRMSR */
> >>> +}
> >>
> >> Personally I'd prefer if this was a single comparison:
> >>
> >> return (ctxt->opcode | 2) == X86EMUL_OPC(0x0f, 0x32);
> >>
> >> But maybe nowadays' compilers are capable of this
> >> transformation?
> > 
> > > Here is what I've got (not an inline but shouldn't make much difference I'd think)
> > > 
> > > 82d040385960 : # your code
> > > 82d040385960:   8b 47 2c                mov    0x2c(%rdi),%eax
> > > 82d040385963:   83 e0 fd                and    $0xfffffffd,%eax
> > > 82d040385966:   3d 30 00 0f 00          cmp    $0xf0030,%eax
> > > 82d04038596b:   0f 94 c0                sete   %al
> > > 82d04038596e:   c3                      retq
> > > 
> > > 82d04038596f : # my code
> > > 82d04038596f:   8b 47 2c                mov    0x2c(%rdi),%eax
> > > 82d040385972:   83 c8 02                or     $0x2,%eax
> > > 82d040385975:   3d 32 00 0f 00          cmp    $0xf0032,%eax
> > > 82d04038597a:   0f 94 c0                sete   %al
> > > 82d04038597d:   c3                      retq
> > 
> > 
> > So it's a wash in terms of generated code.
> 
> True, albeit I guess you got "your code" and "my code" the
> wrong way round, as I don't expect the compiler to
> translate | into "and".


Yes, looks like I did switch them.
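
For reference, a small sketch of why the two forms are equivalent. The
SKETCH_OPC() macro is an assumption made for this example (escape byte above
bit 16, opcode byte in the low bits), chosen only to match the 0xf0030 and
0xf0032 constants visible in the disassembly; it is not the real X86EMUL_OPC()
definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding for illustration: escape byte << 16 | opcode byte. */
#define SKETCH_OPC(ext, byte) (((uint32_t)(ext) << 16) | (uint32_t)(byte))

/* Two explicit comparisons, as in the patch. */
bool is_msr_access_two_cmp(uint32_t opcode)
{
    return opcode == SKETCH_OPC(0x0f, 0x32) ||  /* RDMSR */
           opcode == SKETCH_OPC(0x0f, 0x30);    /* WRMSR */
}

/*
 * Single comparison: 0x30 and 0x32 differ only in bit 1, so forcing that
 * bit on maps both opcodes (and no others) onto the RDMSR encoding.
 */
bool is_msr_access_one_cmp(uint32_t opcode)
{
    return (opcode | 2) == SKETCH_OPC(0x0f, 0x32);
}
```

The compiler output quoted above shows the same trick from the other side:
clearing bit 1 and comparing against the WRMSR encoding.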

> 
> >> I notice you use this function only from PV priv-op emulation.
> >> What about the call paths through hvmemul_{read,write}_msr()?
> >> (It's also questionable whether the write paths need this -
> >> the only MSR written outside of WRMSR emulation is
> >> MSR_SHADOW_GS_BASE, which can't possibly reach the "unhandled"
> >> logic anywhere. But maybe better to be future proof here in
> >> case new MSR writes appear in the emulator, down the road.)
> > 
> > 
> > Won't we end up in hvm_funcs.msr_write_intercept ops which do call it?
> 
> Of course we will - the boolean will very likely need
> propagating (a possible alternative being a per-vCPU flag
> indicating "in emulator").


Oh, I see what you mean. By per-vcpu flag you mean

Re: [PATCH v10 02/11] xen: introduce implementation of save/restore of 'domain context'

2021-01-25 Thread Andrew Cooper

On 08/10/2020 19:57, Paul Durrant wrote:
> diff --git a/xen/common/save.c b/xen/common/save.c
> new file mode 100644
> index 00..9287b20198
> --- /dev/null
> +++ b/xen/common/save.c
> @@ -0,0 +1,339 @@
>
> +static int load_start(struct domain *d, struct domain_ctxt_state *c)
> +{
> +static struct domain_context_start s;
> +unsigned int i;
> +int rc = domain_load_ctxt_rec(c, DOMAIN_CONTEXT_START, &i, &s, sizeof(s));
> +
> +if ( rc )
> +return rc;
> +
> +if ( i )
> +return -EINVAL;
> +
> +/*
> + * Make sure we are not attempting to load an image generated by a newer
> + * version of Xen.
> + */
> +if ( s.xen_major > XEN_VERSION && s.xen_minor > XEN_SUBVERSION )

major > XEN_VERSION || (major == XEN_VERSION && minor > XEN_SUBVERSION)
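
A minimal sketch of that comparison, with the running version passed in as
parameters so it can be exercised standalone. image_is_newer() is a
hypothetical helper name for this example, not something in the patch.

```c
#include <stdbool.h>

/*
 * Returns true when the image's (major, minor) version is strictly newer
 * than the running (major, minor) version. Comparison is lexicographic:
 * the minor number only matters when the major numbers are equal.
 */
bool image_is_newer(unsigned int img_major, unsigned int img_minor,
                    unsigned int cur_major, unsigned int cur_minor)
{
    return img_major > cur_major ||
           (img_major == cur_major && img_minor > cur_minor);
}
```

With the original "&&" form, an image from 5.0 loaded on 4.15 would be
accepted, because 0 > 15 is false even though the major version is newer.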

~Andrew

Re: [PATCH v6 10/10] xen/arm: smmuv3: Add support for SMMUv3 driver

2021-01-25 Thread Stefano Stabellini

On Mon, 25 Jan 2021, Rahul Singh wrote:
> Hello Julien,
> 
> > On 23 Jan 2021, at 11:55 am, Julien Grall  wrote:
> > 
> > Hi Rahul
> > 
> > On 22/01/2021 11:37, Rahul Singh wrote:
> >> Add support for ARM architected SMMUv3 implementation. It is based on
> >> the Linux SMMUv3 driver.
> >> Driver is currently supported as Tech Preview.
> >> Major differences with regard to Linux driver are as follows:
> >> 2. Only Stage-2 translation is supported as compared to the Linux driver
> >>that supports both Stage-1 and Stage-2 translations.
> >> 3. Use P2M  page table instead of creating one as SMMUv3 has the
> >>capability to share the page tables with the CPU.
> >> 4. Tasklets are used in place of threaded IRQ's in Linux for event queue
> >>and priority queue IRQ handling.
> >> 5. Latest version of the Linux SMMUv3 code implements the commands queue
> >>access functions based on atomic operations implemented in Linux.
> >>Atomic functions used by the commands queue access functions are not
> >>implemented in XEN therefore we decided to port the earlier version
> >>of the code. Atomic operations are introduced to fix the bottleneck
> >>of the SMMU command queue insertion operation. A new algorithm for
> >>inserting commands into the queue is introduced, which is lock-free
> >>on the fast-path.
> >>Consequence of reverting the patch is that the command queue
> >>insertion will be slow for large systems as spinlock will be used to
> >>serializes accesses from all CPUs to the single queue supported by
> >>the hardware. Once the proper atomic operations will be available in
> >>XEN the driver can be updated.
> >> 6. Spin lock is used in place of mutex when attaching a device to the
> >>SMMU, as there is no blocking locks implementation available in XEN.
> >>This might introduce latency in XEN. Need to investigate before
> >>driver is out for tech preview.
> >> 7. PCI ATS functionality is not supported, as there is no support
> >>available in XEN to test the functionality. Code is not tested and
> >>compiled. Code is guarded by the flag CONFIG_PCI_ATS.
> >> 8. MSI interrupts are not supported as there is no support available in
> >>XEN to request MSI interrupts. Code is not tested and compiled. Code
> >>is guarded by the flag CONFIG_MSI.
> >> Signed-off-by: Rahul Singh 
> >> Reviewed-by: Bertrand Marquis 
> > 
> > Thank you for sending a new version. I have commited the series now.
> > 
> 
> Thank you for committing the series.

Well done, Rahul!

Re: [PATCH v10 01/11] docs / include: introduce a new framework for 'domain context' records

2021-01-25 Thread Andrew Cooper

On 19/10/2020 14:46, Jan Beulich wrote:
> On 08.10.2020 20:57, Paul Durrant wrote:
>> --- /dev/null
>> +++ b/xen/include/public/save.h
>> @@ -0,0 +1,66 @@
>> +/*
>> + * save.h
>> + *
>> + * Structure definitions for common PV/HVM domain state that is held by Xen.
>> + *
>> + * Copyright Amazon.com Inc. or its affiliates.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to
>> + * deal in the Software without restriction, including without limitation the
>> + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
>> + * sell copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
>> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> + * DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#ifndef XEN_PUBLIC_SAVE_H
>> +#define XEN_PUBLIC_SAVE_H
>> +
>> +#if defined(__XEN__) || defined(__XEN_TOOLS__)
>> +
>> +#include "xen.h"
>> +
>> +/*
>> + * C structures for the Domain Context v1 format.
>> + * See docs/specs/domain-context.md
>> + */
>> +
>> +struct domain_context_record {
>> +uint32_t type;
>> +uint32_t instance;
>> +uint64_t length;
> Should this be uint64_aligned_t, such that alignof() will
> produce consistent values regardless of bitness of the invoking
> domain?

Does it matter?  It's just a bitstream, and can appear in the migration
fd at any arbitrary alignment.

What matters is that the structure is aligned appropriately for the
bitness of code operating on these fields.

Even with the tools ABI fixed to allow a 32-on-64-on-64  toolstack to
function, I'm not sure that excess alignment would be appropriate.  Sure
- it would be more efficient for 32bit code to align to the 8 byte
boundary for the benefit of a 64bit Xen's copy_from_user(), but this
alignment happens anyway because of how hypercall buffers work.
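
To make the alignment question concrete, here is a sketch comparing the two
declarations. u64_aligned_sketch is a hypothetical stand-in for
uint64_aligned_t, written with the GCC/Clang attribute syntax; the struct names
are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for uint64_aligned_t (GCC/Clang attribute syntax). */
typedef uint64_t __attribute__((aligned(8))) u64_aligned_sketch;

/* As in the patch: a plain uint64_t length field. */
struct record_plain {
    uint32_t type;
    uint32_t instance;
    uint64_t length;
};

/* As suggested in review: force 8-byte alignment on every ABI. */
struct record_aligned {
    uint32_t type;
    uint32_t instance;
    u64_aligned_sketch length;
};
```

Because the two leading uint32_t fields already occupy 8 bytes, the field
offsets and the 16-byte size are the same in both variants. What can differ is
only the alignment requirement of the struct as a whole: on ABIs such as 32-bit
x86, where a plain uint64_t member needs only 4-byte alignment, the plain
variant would report a smaller alignment than the forced one.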

~Andrew

Re: [PATCH v10 01/11] docs / include: introduce a new framework for 'domain context' records

2021-01-25 Thread Andrew Cooper

On 08/10/2020 19:57, Paul Durrant wrote:
> diff --git a/xen/include/public/save.h b/xen/include/public/save.h
> new file mode 100644
> index 00..c4be9f570c
> --- /dev/null
> +++ b/xen/include/public/save.h
> @@ -0,0 +1,66 @@
> +/*
> + * save.h
> + *
> + * Structure definitions for common PV/HVM domain state that is held by Xen.

What exactly is, and is not in scope, for this new stream?  The PV above
I think refers to "paravirtual state", not PV guests.

> +#define _DOMAIN_CONTEXT_RECORD_ALIGN 3
> +#define DOMAIN_CONTEXT_RECORD_ALIGN (1U << _DOMAIN_CONTEXT_RECORD_ALIGN)

Do we need the logarithm version?

> +
> +enum {
> +DOMAIN_CONTEXT_END,
> +DOMAIN_CONTEXT_START,
> +/* New types go here */
> +DOMAIN_CONTEXT_NR_TYPES
> +};

Does this enum ever end up in an API?

We might be ok as we're inside __XEN_TOOLS__, but enums normally cannot
be used in ABIs because their size is implementation defined, and not
always 4 bytes.
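
A sketch of the usual way around this: keep the enum for naming the values,
but store it in a fixed-width field. The names below are hypothetical and only
mirror the shape of the record header quoted in this thread.

```c
#include <stdint.h>

/* Named constants only; this type never appears in the on-wire layout. */
enum record_type_sketch {
    RECORD_END,
    RECORD_START,
    /* New types would go here. */
    RECORD_NR_TYPES
};

/* The on-wire header uses fixed-width integers exclusively. */
struct record_header_sketch {
    uint32_t type;      /* holds an enum record_type_sketch value */
    uint32_t instance;
    uint64_t length;
};

/* Explicit conversion at the boundary between code and stream. */
uint32_t encode_type(enum record_type_sketch t)
{
    return (uint32_t)t;
}
```

The size of the header is then independent of whichever integer type the
compiler picks for the enum.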

~Andrew

RE: [PATCH v2 2/2] viridian: allow vCPU hotplug for Windows VMs

2021-01-25 Thread Paul Durrant

> -Original Message-
> From: Igor Druzhinin 
> Sent: 12 January 2021 04:17
> To: xen-devel@lists.xenproject.org
> Cc: i...@xenproject.org; w...@xen.org; andrew.coop...@citrix.com; 
> george.dun...@citrix.com;
> jbeul...@suse.com; jul...@xen.org; sstabell...@kernel.org; 
> anthony.per...@citrix.com; p...@xen.org;
> roger@citrix.com; Igor Druzhinin 
> Subject: [PATCH v2 2/2] viridian: allow vCPU hotplug for Windows VMs
> 
> If Viridian extensions are enabled, Windows wouldn't currently allow
> a hotplugged vCPU to be brought up dynamically. We need to expose a special
> bit to let the guest know we allow it. Hide it behind an option to stay
> on the safe side regarding compatibility with existing guests but
> nevertheless set the option on by default.
> 
> Signed-off-by: Igor Druzhinin 

LGTM

Reviewed-by: Paul Durrant 

> ---
> Changes on v2:
> - hide the bit under an option and expose it in libxl
> ---
>  docs/man/xl.cfg.5.pod.in | 7 ++-
>  tools/include/libxl.h| 6 ++
>  tools/libs/light/libxl_types.idl | 1 +
>  tools/libs/light/libxl_x86.c | 4 
>  xen/arch/x86/hvm/viridian/viridian.c | 5 -
>  xen/include/public/hvm/params.h  | 7 ++-
>  6 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index 3467eae..7cdb859 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -2267,11 +2267,16 @@ explicitly have any limits on the number of Virtual 
> processors a guest
>  is allowed to bring up. It is strongly recommended to keep this enabled
>  for guests with more than 64 vCPUs.
> 
> +=item B
> +
> +This set enables dynamic changes to Virtual processor states in Windows
> +guests effectively allowing vCPU hotplug.
> +
>  =item B
> 
>  This is a special value that enables the default set of groups, which
>  is currently the B, B, B, B,
> -B, B and B groups.
> +B, B, B and B groups.
> 
>  =item B
> 
> diff --git a/tools/include/libxl.h b/tools/include/libxl.h
> index be1e288..7c7c541 100644
> --- a/tools/include/libxl.h
> +++ b/tools/include/libxl.h
> @@ -458,6 +458,12 @@
>  #define LIBXL_HAVE_VIRIDIAN_NO_VP_LIMIT 1
> 
>  /*
> + * LIBXL_HAVE_VIRIDIAN_CPU_HOTPLUG indicates that the 'cpu_hotplug' value
> + * is present in the viridian enlightenment enumeration.
> + */
> +#define LIBXL_HAVE_VIRIDIAN_CPU_HOTPLUG 1
> +
> +/*
>   * LIBXL_HAVE_DEVICE_PCI_LIST_FREE indicates that the
>   * libxl_device_pci_list_free() function is defined.
>   */
> diff --git a/tools/libs/light/libxl_types.idl 
> b/tools/libs/light/libxl_types.idl
> index 8502b29..00a8e68 100644
> --- a/tools/libs/light/libxl_types.idl
> +++ b/tools/libs/light/libxl_types.idl
> @@ -240,6 +240,7 @@ libxl_viridian_enlightenment = 
> Enumeration("viridian_enlightenment", [
>  (9, "hcall_ipi"),
>  (10, "ex_processor_masks"),
>  (11, "no_vp_limit"),
> +(12, "cpu_hotplug"),
>  ])
> 
>  libxl_hdtype = Enumeration("hdtype", [
> diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
> index 5c4c194..91a9fc7 100644
> --- a/tools/libs/light/libxl_x86.c
> +++ b/tools/libs/light/libxl_x86.c
> @@ -310,6 +310,7 @@ static int hvm_set_viridian_features(libxl__gc *gc, 
> uint32_t domid,
>  libxl_bitmap_set(, 
> LIBXL_VIRIDIAN_ENLIGHTENMENT_APIC_ASSIST);
>  libxl_bitmap_set(, 
> LIBXL_VIRIDIAN_ENLIGHTENMENT_CRASH_CTL);
>  libxl_bitmap_set(, 
> LIBXL_VIRIDIAN_ENLIGHTENMENT_NO_VP_LIMIT);
> +libxl_bitmap_set(, 
> LIBXL_VIRIDIAN_ENLIGHTENMENT_CPU_HOTPLUG);
>  }
> 
>  libxl_for_each_set_bit(v, info->u.hvm.viridian_enable) {
> @@ -373,6 +374,9 @@ static int hvm_set_viridian_features(libxl__gc *gc, 
> uint32_t domid,
>  if (libxl_bitmap_test(, 
> LIBXL_VIRIDIAN_ENLIGHTENMENT_NO_VP_LIMIT))
>  mask |= HVMPV_no_vp_limit;
> 
> +if (libxl_bitmap_test(, 
> LIBXL_VIRIDIAN_ENLIGHTENMENT_CPU_HOTPLUG))
> +mask |= HVMPV_cpu_hotplug;
> +
>  if (mask != 0 &&
>  xc_hvm_param_set(CTX->xch,
>   domid,
> diff --git a/xen/arch/x86/hvm/viridian/viridian.c 
> b/xen/arch/x86/hvm/viridian/viridian.c
> index ae1ea86..b906f7b 100644
> --- a/xen/arch/x86/hvm/viridian/viridian.c
> +++ b/xen/arch/x86/hvm/viridian/viridian.c
> @@ -76,6 +76,7 @@ typedef union _HV_CRASH_CTL_REG_CONTENTS
>  } HV_CRASH_CTL_REG_CONTENTS;
> 
>  /* Viridian CPUID leaf 3, Hypervisor Feature Indication */
> +#define CPUID3D_CPU_DYNAMIC_PARTITIONING (1 << 3)
>  #define CPUID3D_CRASH_MSRS (1 << 10)
>  #define CPUID3D_SINT_POLLING (1 << 17)
> 
> @@ -179,8 +180,10 @@ void cpuid_viridian_leaves(const struct vcpu *v, 
> uint32_t leaf,
>  res->a = u.lo;
>  res->b = u.hi;
> 
> +if ( viridian_feature_mask(d) & HVMPV_cpu_hotplug )
> +   res->d = CPUID3D_CPU_DYNAMIC_PARTITIONING;
>  if ( viridian_feature_mask(d) & HVMPV_crash_ctl )
> -res->d = CPUID3D_CRASH_MSRS;
> +res->d |=

Re: [PATCH v3] x86/mm: Short circuit damage from "fishy" ref/typecount failure

2021-01-25 Thread Andrew Cooper

On 20/01/2021 08:06, Jan Beulich wrote:
> On 19.01.2021 19:09, Andrew Cooper wrote:
>> On 19/01/2021 16:48, Jan Beulich wrote:
>>> On 19.01.2021 14:02, Andrew Cooper wrote:
>>>> This code has been copied in 3 places, but it is problematic.
>>>>
>>>> All cases will hit a BUG() later in domain teardown, when the missing
>>>> type/count reference is underflowed.
>>> I'm afraid I could use some help with this: Why would there
>>> be a missing reference, when the getting of one failed?
>> Look at the cleanup logic for the associated fields.
>>
>> Either the plain ref fails (impossible without other fatal refcounting
>> errors AFAICT), or the typeref fails (a concern, but impossible AFAICT).
> In principle I would agree, if there wasn't the question of
> count overflows. The type count presently is 56 bits wide,
> while the general refcount has 54 bits. It'll be a long time
> until they overflow, but it's not impossible. The underlying
> problem there that I see is - where do we draw the line
> between "can't possibly overflow in practice" (as we would
> typically assume for 64-bit counters) and "is to be expected
> to overflow (as we would typically assume for 32-bit
> counters)?

Ok fine - I was treating 54 bits as "not going to happen in practice".

A PV guest needs 2^43 pages of RAM to turn into pagetables to approach
the general refcount limit.  This is more RAM than most people can
afford, and this is way in excess of our security supported limits.

Errors in this area already hit BUG()s in loads of cases, because that
is less bad than the alternatives.

In principle, and as previously discussed, some issues in this area
could be fixed by porting refcount_t from PaX/Linux KSPP which will turn
refcount overflows into memory leaks, which is an even less bad alternative.

>
> Also, as far as "impossible" here goes - the constructs all
> anyway exist only to deal with what we consider impossible.
> The question therefore really is of almost exclusively
> theoretical nature, and hence something like a counter
> possibly overflowing imo needs to be accounted for as
> theoretically possible, albeit impossible with today's
> computers and realistic timing assumptions. If a counter
> overflow occurred, it definitely wouldn't be because of a
> bug in Xen, but because of abnormal behavior elsewhere.
> Hence I remain unconvinced it is appropriate to deal with
> the situation by BUG().

I'm not sure how to be any clearer.

I am literally not changing the current behaviour.  Xen *will* hit a
BUG() if any of these domain_crash() paths are taken.

If you do not believe me, then please go and actually check what happens
when simulating a ref-acquisition failure.

What I am doing is removing complexity (the point of the change) which
gives a false sense of the error being survivable.

If you want to do something other than BUG() in these cases, then you
need to figure some way for the teardown logic to identify which ref
went missing, but this would be a different, follow-on patch.

> But yes, if otoh we assume the failure here to be the result
> of a bug elsewhere in Xen (and not an overflow), then BUG()
> may be warranted. Yet afaic these constructs weren't meant
> to deal with bugs elsewhere in Xen, but with the
> "impossible". So if we change our collective mind here, I
> think the conversion to BUG() would then need accompanying
> by respective commentary.

BUG() is, and has always been, Xen's way of dealing with impossibles,
particularly when it comes to memory handling.

This isn't a "changing minds" occasion.  Removals of BUG()s elsewhere
pertains to logical error based on guest state, which is indeed
inappropriate error handling.

~Andrew

Re: [PATCH] x86/shadow: replace stale literal numbers in hash_{vcpu,domain}_foreach()

2021-01-25 Thread Tim Deegan

At 12:07 +0100 on 25 Jan (1611576438), Jan Beulich wrote:
> 15 apparently once used to be the last valid type to request a callback
> for, and the dimension of the respective array. The arrays meanwhile are
> larger than this (in a benign way, i.e. no caller ever sets a mask bit
> higher than 15), dimensioned by SH_type_unused. Have the ASSERT()s
> follow suit and add build time checks at the call sites.
> 
> Also adjust a comment naming the wrong of the two functions.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Tim Deegan 

> ---
> The ASSERT()s being adjusted look redundant with the BUILD_BUG_ON()s
> being added, so I wonder whether dropping them wouldn't be the better
> route.

I'm happy to keep both, as they do slightly different things.

Thanks for fixing this up!

Tim.

Re: [PATCH] x86/pod: Do not fragment PoD memory allocations

2021-01-25 Thread Elliott Mitchell

On Mon, Jan 25, 2021 at 10:56:25AM +0100, Jan Beulich wrote:
> On 24.01.2021 05:47, Elliott Mitchell wrote:
> > 
> > ---
> > Changes in v2:
> > - Include the obvious removal of the goto target.  Always realize you're
> >   at the wrong place when you press "send".
> 
> Please could you also label the submission then accordingly? I
> got puzzled by two identically titled messages side by side,
> until I noticed the difference.

Sorry about that.  Would you have preferred a third message mentioning
this mistake?

> > I'm not including a separate cover message since this is a single hunk.
> > This really needs some checking in `xl`.  If one has a domain which
> > sometimes gets started on different hosts and is sometimes modified with
> > slightly differing settings, one can run into trouble.
> > 
> > In this case the particular domain is most often used as
> > PV/PVH, but every so often is used as a template for HVM.  Starting it
> > HVM will trigger PoD mode.  If it is started on a machine with less
> > memory than others, PoD may well exhaust all memory and then trigger a
> > panic.
> > 
> > `xl` should likely fail HVM domain creation when the maximum memory
> > exceeds available memory (never mind total memory).
> 
> I don't think so, no - it's the purpose of PoD to allow starting
> a guest despite there not being enough memory available to
> satisfy its "max", as such guests are expected to balloon down
> immediately, rather than triggering an oom condition.

Even Qemu/OVMF is expected to handle ballooning for a *HVM* domain?

> > For example try a domain with the following settings:
> > 
> > memory = 8192
> > maxmem = 2147483648
> > 
> > If type is PV or PVH, it will likely boot successfully.  Change type to
> > HVM and unless your hardware budget is impressive, Xen will soon panic.
> 
> Xen will panic? That would need fixing if so. Also I'd consider
> an excessively high maxmem (compared to memory) a configuration
> error. According to my experiments long, long ago I seem to
> recall that a factor beyond 32 is almost never going to lead to
> anything good, irrespective of guest type. (But as said, badness
> here should be restricted to the guest; Xen itself should limp
> on fine.)

I'll confess I haven't confirmed the panic is in Xen itself.  Problem is
when this gets triggered, by the time the situation is clear and I can
get to the console the computer is already restarting, thus no error
message has been observed.

This is most certainly a configuration error.  The problem is that there
is only a very small delta between a perfectly valid configuration and
the one which reliably triggers a panic.

The memory:maxmem ratio isn't the problem.  My example had a maxmem of
2147483648 since that is enough to exceed the memory of sub-$100K
computers.  The crucial features are maxmem >= machine memory,
memory < free memory (thus potentially bootable PV/PVH) and type = "hvm".
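
A minimal sketch of that combination as a predicate, purely to make the
three conditions explicit.  The enum, the function name and the choice of
MiB as the unit are assumptions for illustration; none of this is existing
`xl` or libxl code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guest types for this sketch; not libxl's real enum. */
enum guest_type { GUEST_PV, GUEST_PVH, GUEST_HVM };

/*
 * Returns true for the combination described above: an HVM guest whose
 * initial allocation fits in free memory (so it can start), but whose
 * maxmem is at least the machine's total memory.  All sizes are in MiB.
 */
static bool pod_config_is_risky(enum guest_type type, uint64_t memory_mib,
                                uint64_t maxmem_mib, uint64_t free_mib,
                                uint64_t total_mib)
{
    return type == GUEST_HVM &&
           memory_mib < free_mib &&
           maxmem_mib >= total_mib;
}
```

With memory = 8192 and maxmem = 2147483648 on a host with 32768 MiB in
total, the predicate is true for type = "hvm" and false for PV/PVH.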

When was the last time you tried running a Xen machine with near zero
free memory?  Perhaps in the past Xen kept the promise of never panicking
on memory exhaustion, but it feels like this hasn't held for some time.

-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445

Re: [PATCH v2.5 1/5] libxenguest: support zstd compressed kernels

2021-01-25 Thread Ian Jackson

The quoted-reply part of this message may be going off into the weeds.
Feel free to ignore it, or parts of it, if you think you can make
progress without disabusing me of what I think are my
misunderstandings...

Jan Beulich writes ("Re: [PATCH v2.5 1/5] libxenguest: support zstd compressed 
kernels"):
> On 25.01.2021 17:17, Ian Jackson wrote:
> >  I don't understand why this
> > situation should be handled differently for zstd than for any of the
> > other calls to *PKG* (glib, pixman, libnl).
> 
> The difference is that glib and pixman aren't optional (if
> building qemu), i.e. we want configure to fail if they can't
> be found or are too old.

Yes, but I think that just means adding the [true] fourth argument, to
make failure to find the library a no-op.  I don't think it needs any
more complex handling.  At least, I have not yet understood a need for
more complex handling...

> > Perhaps you experienced some issue which would have been fixed by the
> > addition of the missing PKG_PROG_PKG_CONFIG ?
> 
> I don't think so, no, as I've not tried configuring in a way
> where the earlier PKG_CHECK_MODULES() would be bypassed.

I guess I should take care of this then, since I think it's probably
an accident waiting to happen.

> >> I can, but it feels wrong, in particular if I gave it a
> >> generic looking name (get_unaligned_le32() or some such,
> > 
> > That would seem perfect to me.  I don't know what would be wrong
> > with it.
> 
> Using this (most?) natural name has two issues in my view:
> For one, it'll likely cause conflicts with how other code
> (using hypervisor files) gets built. And then I consider it
> odd to have just one out of a larger set of functions, but
> I would consider it odd as well if I had to introduce them
> all right here.

If you put this new definition in xg_private.h for now then it ought
not to conflict with anyone else.  (Assuming it's a static inline, or
a macro.  If it will have external linkage then it will need a name
prefix which I would prefer to avoid.)
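
For concreteness, a static inline along those lines might look like the
sketch below.  The name get_unaligned_le32() is the one floated earlier in
this thread; the exact signature and the use of <stdint.h> types are
assumptions, not what the patch ended up with.

```c
#include <stdint.h>

/*
 * Read a 32-bit little-endian value from a possibly unaligned address.
 * Assembling it byte by byte avoids both alignment faults and any
 * dependence on host endianness.
 */
static inline uint32_t get_unaligned_le32(const void *p)
{
    const uint8_t *b = p;

    return (uint32_t)b[0] |
           ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}
```

Being static inline, it has no external linkage, so it needs no name
prefix to stay out of other code's way.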

I think it best to add this one macro/inline now.  When someone wants
more of these, they can add them.  If someone wants them elsewhere they
can do the work of finding or making a suitably central place.  If it
weren't for the release timing I might think it better to add more,
but my general rule is that steps towards the best possible situation
are better than steps that go away from the best possible situation,
even if the former are not complete.

I think a reasonable alternative would be to arrange to import a
comprehensive set from somewhere.

> > I think we had concluded not to print a warning ?
> 
> Yes. Even in the projected new form of using the construct I
> don't intend to change the description's wording, as the
> intended use of [true] still looks like that can't be intended
> usage. IOW my remark extended beyond the warning; I'm sorry if
> this did end up confusing because you were referring to just
> the warning.

I'm afraid I don't understand what you mean.  In particular, what you
mean by "the intended use of [true] still looks like that can't be
intended usage".

  the intended {by whom for what purpose?} use of [true] still looks
  like that {what?} can't be intended {by whom?} usage

I have the feeling that I have totally failed to grasp your mental
model, which naturally underlies your comments.

Do you mean that with "true" for the 4th argument, the printed output
is not correct, in the failure case ?  Maybe it needs a call to AC_MSG
or something (but AIUI most of these PKG_* macros ought to do that for
us).  I'm just guessing at your meaning here...

> >>>>> I mean the inclusion of $libzstd_PKG_ERRORS in the output.
> >>>>
> >>>> I see no point in the warning without including this. In fact
> >>>> I added the AC_MSG_WARN() just so that the contents of this
> >>>> variable (and hence an indication to the user of what to do)
> >>>> was easily accessible.
> >>>
> >>> This is not usual autoconf practice.  The usual approach is to
> >>> consider that missing features are just to be dealt with with a
> >>> minimum of fuss.
> >>
> >> Which is why I made the description say what it says. Just
> >> that - as per above - I don't see viable alternatives (yet).

Quoting this because I think it may still be relevant for
understanding the foregoing...

Ian.

Re: [PATCH] x86/xen: avoid warning in Xen pv guest with CONFIG_AMD_MEM_ENCRYPT enabled

2021-01-25 Thread Andrew Cooper

On 25/01/2021 14:00, Juergen Gross wrote:
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index 4409306364dc..82948251f57b 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -583,6 +583,14 @@ DEFINE_IDTENTRY_RAW(xenpv_exc_debug)
>   exc_debug(regs);
>  }
>  
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +DEFINE_IDTENTRY_RAW(xenpv_exc_vmm_communication)
> +{
> + /* This should never happen and there is no way to handle it. */
> + panic("X86_TRAP_VC in Xen PV mode.");

Honestly, exactly the same is true of #VE, #HV and #SX.

What we do in the hypervisor is wire up one handler for all unknown
exceptions (to avoid potential future #DF issues) leading to a panic. 
Wouldn't it be better to do this unconditionally, especially as #GP/#NP
doesn't work for PV guests for unregistered callbacks, rather than
fixing up piecewise like this?
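
As a toy illustration of the "one handler for all unknown exceptions"
arrangement: every slot defaults to a catch-all and known vectors are
overridden afterwards.  All names here are made up for the sketch, and the
catch-all only records the vector where real code would panic.

```c
#define NR_VECTORS 32

typedef void (*exc_handler_t)(unsigned int vector);

/* NR_VECTORS means "no unknown vector seen yet". */
static unsigned int last_unknown = NR_VECTORS;

/* Stand-in for a real panic(): just record the offending vector. */
static void unknown_exception(unsigned int vector)
{
    last_unknown = vector;
}

/* Placeholder for a vector that does have a proper handler. */
static void handle_debug(unsigned int vector)
{
    (void)vector;
}

static exc_handler_t handlers[NR_VECTORS];

static void init_handlers(void)
{
    /* Default every slot to the catch-all first ... */
    for (unsigned int i = 0; i < NR_VECTORS; i++)
        handlers[i] = unknown_exception;

    /* ... then override the vectors that are actually handled. */
    handlers[1] = handle_debug; /* #DB */
}

static void dispatch(unsigned int vector)
{
    if (vector < NR_VECTORS)
        handlers[vector](vector);
}
```

With this shape, a vector such as 29 (#VC) reaches the catch-all without
needing its own per-exception stub.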

~Andrew

Re: [PATCH v7 02/10] xen/domain: Add vmtrace_frames domain creation parameter

2021-01-25 Thread Andrew Cooper

On 25/01/2021 15:08, Jan Beulich wrote:
> On 21.01.2021 22:27, Andrew Cooper wrote:
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -132,6 +132,48 @@ static void vcpu_info_reset(struct vcpu *v)
>>  v->vcpu_info_mfn = INVALID_MFN;
>>  }
>>  
>> +static void vmtrace_free_buffer(struct vcpu *v)
>> +{
>> +const struct domain *d = v->domain;
>> +struct page_info *pg = v->vmtrace.buf;
>> +unsigned int i;
>> +
>> +if ( !pg )
>> +return;
>> +
>> +for ( i = 0; i < d->vmtrace_frames; i++ )
>> +{
>> +put_page_alloc_ref(&pg[i]);
>> +put_page_and_type(&pg[i]);
>> +}
>> +
>> +v->vmtrace.buf = NULL;
> To set a good precedent, maybe this wants moving up ahead of
> the loop and ...
>
>> +}
>> +
>> +static int vmtrace_alloc_buffer(struct vcpu *v)
>> +{
>> +struct domain *d = v->domain;
>> +struct page_info *pg;
>> +unsigned int i;
>> +
>> +if ( !d->vmtrace_frames )
>> +return 0;
>> +
>> +pg = alloc_domheap_pages(d, get_order_from_pages(d->vmtrace_frames),
>> + MEMF_no_refcount);
>> +if ( !pg )
>> +return -ENOMEM;
>> +
>> +v->vmtrace.buf = pg;
> ... this wants moving down past the loop, to avoid
> globally announcing something that isn't fully initialized
> yet / anymore?

Fine.

>
>> +for ( i = 0; i < d->vmtrace_frames; i++ )
>> +/* Domain can't know about this page yet - something fishy going 
>> on. */
>> +if ( !get_page_and_type(&pg[i], d, PGT_writable_page) )
>> +BUG();
> Whatever the final verdict to the other similar places
> that one of your patch changes should be applied here,
> too.

Obviously, except there's 0 room for manoeuvring on that patch, so this
hunk is correct.

>
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -94,6 +94,7 @@ struct xen_domctl_createdomain {
>>  uint32_t max_evtchn_port;
>>  int32_t max_grant_frames;
>>  int32_t max_maptrack_frames;
>> +uint32_t vmtrace_frames;
> Considering page size related irritations elsewhere in the
> public interface, could you have a comment clarify the unit
> of this value (Xen's page size according to the rest of the
> patch), and that space will be allocated once per-vCPU
> rather than per-domain (to stand a chance of recognizing
> the ultimate memory footprint resulting from this)?

Well - it's hopefully obvious that it shares the same units as the other
*_frames parameters.

But yes - with the future ABI fixes, it will be forbidden to use anything in
units of frames, to fix the multitude of interface bugs pertaining to
non-4k page sizes.

I'll switch to using vmtrace_size, in units of bytes, and the per-arch
filtering can enforce being a multiple of 4k.
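
A rough sketch of what that filtering could amount to, assuming a 4k
granule.  check_vmtrace_size(), vmtrace_footprint() and SKETCH_PAGE_SIZE
are hypothetical names for illustration, not code from the series.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed 4k granule */

/* Reject sizes that are not a whole number of 4k pages. */
static int check_vmtrace_size(uint32_t size)
{
    if (size & (SKETCH_PAGE_SIZE - 1))
        return -EINVAL;

    return 0;
}

/* The buffer is allocated once per vCPU, so the total scales with vCPUs. */
static uint64_t vmtrace_footprint(uint32_t size, unsigned int nr_vcpus)
{
    return (uint64_t)size * nr_vcpus;
}
```

The second helper is only there to make the per-vCPU footprint Jan asked
about explicit.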

>
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -257,6 +257,10 @@ struct vcpu
>>  /* vPCI per-vCPU area, used to store data for long running operations. 
>> */
>>  struct vpci_vcpu vpci;
>>  
>> +struct {
>> +struct page_info *buf;
>> +} vmtrace;
> While perhaps minor, I'm unconvinced "buf" is a good name
> for a field of this type.

Please suggest a better one then.  This one is properly namespaced as
v->vmtrace.buf which is the least bad option I could come up with.

>
>> @@ -470,6 +474,9 @@ struct domain
>>  unsigned    pbuf_idx;
>>  spinlock_t  pbuf_lock;
>>  
>> +/* Used by vmtrace features */
>> +uint32_t    vmtrace_frames;
> unsigned int? Also could you move this to an existing 32-bit
> hole, like immediately after "monitor"?

Ok.

~Andrew

Re: [PATCH v2.5 1/5] libxenguest: support zstd compressed kernels

2021-01-25 Thread Jan Beulich

On 25.01.2021 17:17, Ian Jackson wrote:
> Jan Beulich writes ("Re: [PATCH v2.5 1/5] libxenguest: support zstd 
> compressed kernels"):
>> On 25.01.2021 15:53, Ian Jackson wrote:
>>> Well how about passing "true" for the fourth argument then ?
>>
>> That I did try intermediately, but didn't ever post. It'll
>> screw up when libzstd_CFLAGS and libzstd_LIBS were provided
>> to override pkg-config. When you look at the expanded code,
>> this will end up with pkg_failed set to "untried" and still
>> take the error path. I.e. we wouldn't get the overridden
>> settings appended to $zlib.
> 
> I infer you're reading the autoconf output.  I think pkg_failed is
> something to do with tracking whether pkg-config exists at all.  In
> general, reading autoconf output is an act of desperation when RTFM
> and so on fails.  The output is typically much more complicated than
> the input and can be quite confusing.

Well, after Michael's report I had to understand why the
construct behaved the way it does (and not the way I
thought would be sensible), and short of any documentation
clearly saying so I had to go look at the generated shell
code. Which made me notice the apparently (see below)
unhelpful behavior wrt user overrides.

> I noticed that configure.ac fails to say PKG_PROG_PKG_CONFIG contrary
> to the imprecations in the documentation.  For example, for
> PKG_CHECK_MODULES we have:
> 
>  | # Note that if there is a possibility the first call to
>  | # PKG_CHECK_MODULES might not happen, you should be sure to include
>  | # an explicit call to PKG_PROG_PKG_CONFIG in your configure.ac
>  
> Indeed our first call to PKG_CHECK_* in the existing configure.ac is
> within an if and there is no call to PKG_PROG_PKG_CONFIG.  I think one
> should be added probably somewhere near the top (eg, just after
> AX_XEN_EXPAND_CONFIG).

Probably, but I don't think I should do so here. I did ask
about making the compression checks x86-only, and that would
be the point where I would have seen the need. But you've
asked for the checks to remain arch-independent. 

> I'm not sure exactly what you mean in your paragraph I quote above.  I
> think you mean that if the user supplies the options on the command
> line but pkg-config is absent ?

Ah, looks like I indeed got misled by the bad indentation
of the generated shell code. So let me try again with [true]
as the 4th argument.

>  I don't understand why this
> situation should be handled differently for zstd than for any of the
> other calls to *PKG* (glib, pixman, libnl).

The difference is that glib and pixman aren't optional (if
building qemu), i.e. we want configure to fail if they can't
be found or are too old.

> Perhaps you experienced some issue which would have been fixed by the
> addition of the missing PKG_PROG_PKG_CONFIG ?

I don't think so, no, as I've not tried configuring in a way
where the earlier PKG_CHECK_MODULES() would be bypassed.

>>>>> If you want a warning I think it should be a call to AC_MSG_WARN in
>>>>> ACTION-IF-NOT-FOUND.
>>>>
>>>> I didn't to avoid the nesting of things yielding even harder
>>>> to read code.
>>>
>>> In your code it's nested too, just in an if rather than in the
>>> macro argument - but with a separate condition.  Please do it the
>>> "usual autoconf way".
>>
>> Pieces of shell code look to be permitted - a few lines down
>> from the addition to configure.ac there is a shell case
>> statement. Or are you telling me that's an abuse I shouldn't
>> follow? But then I still don't see how to sensibly replace
>> the construct, given the issue described further up.
> 
> I don't understand what you are getting at.  I think you must have
> misunderstood me.
> 
> You explained that you preferred not to use the 4th argument,
> ACTION-IF-NOT-FOUND, "to avoid nesting".  I was trying to say that I
> didn't think this was a good reason and that instead putting the code
> in a separate conditional is not warranted here (and not idiomatic).
> 
> There is nothing wrong[1] with including (cautious) shell code in
> configure.ac, so that was not part of my argument.

I think the confusion results from my misunderstanding of when
"untried" would result, see above. For that reason I did
consider it necessary to evaluate things once _after_ the
entire construct, rather than inside.

>>>>> unziplen = (size_t)gzlen[3] << 24 | gzlen[2] << 16 | gzlen[1] << 8 | 
>>>>> gzlen[0];
>>>>
>>>> Okay, I'll copy that then.
>>>
>>> Could you make a macro or inline function in xg_private.h[1] rather
>>> than open-coding a copy, please ?
>>>
>>> [1] Or, if you prefer, a header with wider scope.
>>
>> I can, but it feels wrong, in particular if I gave it a
>> generic looking name (get_unaligned_le32() or some such,
> 
> That would seem perfect to me.  I don't know what would be wrong
> with it.

Using this (most?) natural name has two issues in my view:
For one, it'll likely cause conflicts with how other code
(using hypervisor files) gets built. And then I consider it
odd

Re: [PATCH v7 04/10] xen/memory: Add a vmtrace_buf resource type

2021-01-25 Thread Jan Beulich

On 21.01.2021 22:27, Andrew Cooper wrote:
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1068,11 +1068,35 @@ static unsigned int resource_max_frames(const struct 
> domain *d,
>  case XENMEM_resource_grant_table:
>  return gnttab_resource_max_frames(d, id);
>  
> +case XENMEM_resource_vmtrace_buf:
> +return d->vmtrace_frames;
> +
>  default:
>  return arch_resource_max_frames(d, type, id);
>  }
>  }
>  
> +static int acquire_vmtrace_buf(
> +struct domain *d, unsigned int id, unsigned long frame,
> +unsigned int nr_frames, xen_pfn_t mfn_list[])
> +{
> +const struct vcpu *v = domain_vcpu(d, id);
> +unsigned int i;
> +mfn_t mfn;
> +
> +if ( !v || !v->vmtrace.buf ||
> + nr_frames > d->vmtrace_frames ||
> + (frame + nr_frames) > d->vmtrace_frames )
> +return -EINVAL;


I think that for this to guard against overflow, the first nr_frames
needs to be replaced by frame (as having the wider type), or else a
very large value of frame coming in will not yield the intended
-EINVAL. If you agree, with this changed,
Reviewed-by: Jan Beulich 
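
To illustrate the wrap-around with stand-alone code: the two predicates
below mirror the check as posted and the check with the suggested change.
The function names are made up for this sketch, and frame is modelled as
uint64_t so the behaviour does not depend on the width of unsigned long.

```c
#include <stdbool.h>
#include <stdint.h>

/* Check as posted: tests nr_frames, then the (possibly wrapped) sum. */
static bool range_ok_original(uint64_t frame, uint32_t nr_frames,
                              uint32_t max)
{
    return !(nr_frames > max || (frame + nr_frames) > max);
}

/*
 * Check as suggested: once frame <= max is established, frame + nr_frames
 * cannot wrap a 64-bit value, so the second comparison is trustworthy.
 */
static bool range_ok_suggested(uint64_t frame, uint32_t nr_frames,
                               uint32_t max)
{
    return !(frame > max || (frame + nr_frames) > max);
}
```

With frame = UINT64_MAX, nr_frames = 1 and max = 16, the sum wraps to 0:
the first form accepts the request, the second rejects it.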

Jan

Re: [PATCH v2.5 1/5] libxenguest: support zstd compressed kernels

2021-01-25 Thread Ian Jackson

I have a feeling we may be talking at cross purposes rather too much.

Jan Beulich writes ("Re: [PATCH v2.5 1/5] libxenguest: support zstd compressed 
kernels"):
> On 25.01.2021 15:53, Ian Jackson wrote:
> > Well how about passing "true" for the fourth argument then ?
> 
> That I did try intermediately, but didn't ever post. It'll
> screw up when libzstd_CFLAGS and libzstd_LIBS were provided
> to override pkg-config. When you look at the expanded code,
> this will end up with pkg_failed set to "untried" and still
> take the error path. I.e. we wouldn't get the overridden
> settings appended to $zlib.

I infer you're reading the autoconf output.  I think pkg_failed is
something to do with tracking whether pkg-config exists at all.  In
general, reading autoconf output is an act of desperation when RTFM
and so on fails.  The output is typically much more complicated than
the input and can be quite confusing.

I noticed that configure.ac fails to say PKG_PROG_PKG_CONFIG contrary
to the imprecations in the documentation.  For example, for
PKG_CHECK_MODULES we have:

 | # Note that if there is a possibility the first call to
 | # PKG_CHECK_MODULES might not happen, you should be sure to include
 | # an explicit call to PKG_PROG_PKG_CONFIG in your configure.ac

Indeed our first call to PKG_CHECK_* in the existing configure.ac is
within an if and there is no call to PKG_PROG_PKG_CONFIG.  I think one
should be added probably somewhere near the top (eg, just after
AX_XEN_EXPAND_CONFIG).

I'm not sure exactly what you mean in your paragraph I quote above.  I
think you mean that if the user supplies the options on the command
line but pkg-config is absent ?  I don't understand why this
situation should be handled differently for zstd than for any of the
other calls to *PKG* (glib, pixman, libnl).

Perhaps you experienced some issue which would have been fixed by the
addition of the missing PKG_PROG_PKG_CONFIG ?

(Also I note that the docs for PKG_CHECK_EXISTS contain a confusing
slip: there it says "you have to call PKG_CHECK_EXISTS manually" but
surely it must mean PKG_PROG_PKG_CONFIG.)

> >>> If you want a warning I think it should be a call to AC_MSG_WARN in
> >>> ACTION-IF-NOT-FOUND.
> >>
> >> I didn't to avoid the nesting of things yielding even harder
> >> to read code.
> > 
> > In your code it's nested too, just in an if rather than in the
> > macro argument - but with a separate condition.  Please do it the
> > "usual autoconf way".
> 
> Pieces of shell code look to be permitted - a few lines down
> from the addition to configure.ac there is a shell case
> statement. Or are you telling me that's an abuse I shouldn't
> follow? But then I still don't see how to sensibly replace
> the construct, given the issue described further up.

I don't understand what you are getting at.  I think you must have
misunderstood me.

You explained that you preferred not to use the 4th argument,
ACTION-IF-NOT-FOUND, "to avoid nesting".  I was trying to say that I
didn't think this was a good reason and that instead putting the code
in a separate conditional is not warranted here (and not idiomatic).

There is nothing wrong[1] with including (cautious) shell code in
configure.ac, so that was not part of my argument.

> >>> unziplen = (size_t)gzlen[3] << 24 | gzlen[2] << 16 | gzlen[1] << 8 | 
> >>> gzlen[0];
> >>
> >> Okay, I'll copy that then.
> > 
> > Could you make a macro or inline function in xg_private.h[1] rather
> > than open-coding a copy, please ?
> > 
> > [1] Or, if you prefer, a header with wider scope.
> 
> I can, but it feels wrong, in particular if I gave it a
> generic looking name (get_unaligned_le32() or some such,

That would seem perfect to me.  I don't know what would be wrong
with it.

> >>> I mean the inclusion of $libzstd_PKG_ERRORS in the output.
> >>
> >> I see no point in the warning without including this. In fact
> >> I added the AC_MSG_WARN() just so that the contents of this
> >> variable (and hence an indication to the user of what to do)
> >> was easily accessible.
> > 
> > This is not usual autoconf practice.  The usual approach is to
> > consider that missing features are just to be dealt with with a
> > minimum of fuss.
> 
> Which is why I made the description say what it says. Just
> that - as per above - I don't see viable alternatives (yet).

I think we had concluded not to print a warning ?

Ian.

[1] Contrary to the autoconf docs, which were written for a time when
even some very basic shell constructs such as "if" statements were not
always reliable.  In xen.git we can assume a vaguely posix shell.

Re: Null scheduler and vwfi native problem

2021-01-25 Thread Dario Faggioli

On Fri, 2021-01-22 at 14:26 +0000, Julien Grall wrote:
> Hi Anders,
> 
> On 22/01/2021 08:06, Anders Törnqvist wrote:
> > On 1/22/21 12:35 AM, Dario Faggioli wrote:
> > > > On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
> > - booting with "sched=null vwfi=native" but not doing the IRQ 
> > passthrough that you mentioned above
> > "xl destroy" gives
> > (XEN) End of domain_destroy function
> > 
> > Then a "xl create" says nothing but the domain has not started
> > correct. 
> > "xl list" look like this for the domain:
> > mydomu   2   512 1 --  
> > 0.0
> 
> This is odd. I would have expected ``xl create`` to fail if something
> went wrong with the domain creation.
>
So, Anders, would it be possible to issue a:

# xl debug-keys r
# xl dmesg

And send it to us ?

Ideally, you'd do it:
 - with Julien's patch (the one he sent the other day, and that you 
   have already given a try to) applied
 - while you are in the state above, i.e., after having tried to 
   destroy a domain and failing
 - and maybe again after having tried to start a new domain

> One possibility is the NULL scheduler doesn't release the pCPUs until
> the domain is fully destroyed. So if there is no pCPU free, it
> wouldn't 
> be able to schedule the new domain.
> 
> However, I would have expected the NULL scheduler to refuse the
> domain 
> to create if there is no pCPU available.
> 
Yeah but, unfortunately, the scheduler does not have it easy to fail
domain creation at this stage (i.e., when we realize there are no
available pCPUs). That's the reason why the NULL scheduler has a
waitqueue, where vCPUs that cannot be put on any pCPU are put.

Of course, this is a configuration error (or a bug, like maybe in this
case :-/), and we print warnings when it happens.

> @Dario, @Stefano, do you know when the NULL scheduler decides to 
> allocate the pCPU?
> 
On which pCPU to allocate a vCPU is decided in null_unit_insert(),
called from sched_alloc_unit() and sched_init_vcpu().

On the other hand, a vCPU is properly removed from its pCPU, hence
making the pCPU free for being assigned to some other vCPU, in
unit_deassign(), called from null_unit_remove(), which in turn is
called from sched_destroy_vcpu(), which is indeed called from
complete_domain_destroy().
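
A deliberately tiny model of that behaviour, in case it helps to picture
it: a vCPU either gets a free pCPU or lands on a wait queue, and freeing a
pCPU hands it to the first waiter.  The toy_* names, the fixed sizes and
the hand-over policy are assumptions of this sketch, not the scheduler's
actual code.

```c
#define NR_PCPUS 2
#define MAX_WAIT 8
#define NO_VCPU  (-1)

static int pcpu_owner[NR_PCPUS] = { NO_VCPU, NO_VCPU };
static int waitq[MAX_WAIT];
static unsigned int waitq_len;

/* Returns the pCPU the vCPU was placed on, or -1 if it has to wait. */
static int toy_unit_insert(int vcpu)
{
    for (int cpu = 0; cpu < NR_PCPUS; cpu++) {
        if (pcpu_owner[cpu] == NO_VCPU) {
            pcpu_owner[cpu] = vcpu;
            return cpu;
        }
    }

    /* No free pCPU: creation does not fail, the vCPU simply waits.
     * (A full queue drops the vCPU here; a toy-model shortcut.) */
    if (waitq_len < MAX_WAIT)
        waitq[waitq_len++] = vcpu;

    return -1;
}

/* Frees the pCPU and hands it to the first waiter, if there is one. */
static void toy_unit_deassign(int cpu)
{
    pcpu_owner[cpu] = NO_VCPU;

    if (waitq_len > 0) {
        pcpu_owner[cpu] = waitq[0];
        for (unsigned int i = 1; i < waitq_len; i++)
            waitq[i - 1] = waitq[i];
        waitq_len--;
    }
}
```

In this model a new domain started before the old one's pCPU is released
just sits on the wait queue, which matches the "created but never runs"
symptom.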

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)



Re: Null scheduler and vwfi native problem

2021-01-25 Thread Dario Faggioli

On Fri, 2021-01-22 at 18:44 +0100, Anders Törnqvist wrote:
> Listing vcpus looks like this when the domain is running:
> 
> xl vcpu-list
> Name       ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
> Domain-0    0     0     0   r--     101.7 0 / all
> Domain-0    0     1     1   r--     101.0 1 / all
> Domain-0    0     2     2   r--     101.0 2 / all
> Domain-0    0     3     3   r--     100.9 3 / all
> Domain-0    0     4     4   r--     100.9 4 / all
> mydomu      1     0     5   r--      89.5 5 / all
> 
> vCPU nr 0 is also for dom0. Is that normal?
> 
Yeah, that's the vCPU ID numbering. Each VM/guest (including dom0) has
its own vCPUs, and their IDs start from 0.

What counts here, to make sure that the NULL scheduler "configuration"
is correct, is that each VCPU is associated to one and only one PCPU.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)



Re: [PATCH v2 1/2] viridian: remove implicit limit of 64 VPs per partition

2021-01-25 Thread Igor Druzhinin

On 12/01/2021 04:17, Igor Druzhinin wrote:
> TLFS 7.8.1 stipulates that "a virtual processor index must be less than
> the maximum number of virtual processors per partition" that "can be obtained
> through CPUID leaf 0x40000005". Furthermore, "Requirements for Implementing
> the Microsoft Hypervisor Interface" defines that starting from Windows Server
> 2012, which allowed more than 64 CPUs to be brought up, this leaf can now
> contain a value -1 basically assuming the hypervisor has no restriction while
> 0 (that we currently expose) means the default restriction is still present.
> 
> Along with the previous changes exposing ExProcessorMasks this allows a recent
> Windows VM with Viridian extension enabled to have more than 64 vCPUs without
> going into BSOD in some cases.
> 
> Since we didn't expose the leaf before and to keep CPUID data consistent for
> incoming streams from previous Xen versions - let's keep it behind an option.
> 
> Signed-off-by: Igor Druzhinin 
> ---

ping? Paul?

Igor

Re: [PATCH v2.5 1/5] libxenguest: support zstd compressed kernels

2021-01-25 Thread Jan Beulich

On 25.01.2021 15:53, Ian Jackson wrote:
> Jan Beulich writes ("Re: [PATCH v2.5 1/5] libxenguest: support zstd 
> compressed kernels"):
>> On 25.01.2021 14:51, Ian Jackson wrote:
>>> I mean, the parts where you examine libzstd_PKG_ERRORS.
>>
>> The only thing I make use of is it being (non-)empty. Do you
>> really think that's a problem?
> 
> It's highly unusual.   Conceivably it might be empty even if
> pkg-config failed.
> 
>>>  AC_CHECK_LIB([lzo2], [lzo1x_decompress], [zlib="$zlib -DHAVE_LZO1X 
>>> -llzo2"])
>>> +PKG_CHECK_MODULES([libzstd], [libzstd], [zlib="$zlib -DHAVE_ZSTD 
>>> $libzstd_CFLAGS $libzstd_LIBS"])
>>
>> No, that's what I did initially, resulting in libzstd becoming
>> a strict requirement (i.e. configure failing if it's absent),
>> as reported by Michael Young.
> 
> Well how about passing "true" for the fourth argument then ?

That I did try intermediately, but didn't ever post. It'll
screw up when libzstd_CFLAGS and libzstd_LIBS were provided
to override pkg-config. When you look at the expanded code,
this will end up with pkg_failed set to "untried" and still
take the error path. I.e. we wouldn't get the overridden
settings appended to $zlib.

>>> I mean the inclusion of $libzstd_PKG_ERRORS in the output.
>>
>> I see no point in the warning without including this. In fact
>> I added the AC_MSG_WARN() just so that the contents of this
>> variable (and hence an indication to the user of what to do)
>> was easily accessible.
> 
> This is not usual autoconf practice.  The usual approach is to
> consider that missing features are just to be dealt with with a
> minimum of fuss.

Which is why I made the description say what it says. Just
that - as per above - I don't see viable alternatives (yet).

>>> If you want a warning I think it should be a call to AC_MSG_WARN in
>>> ACTION-IF-NOT-FOUND.
>>
>> I didn't to avoid the nesting of things yielding even harder
>> to read code.
> 
> In your code it's nested too, just in an if rather than in the
> macro argument - but with a separate condition.  Please do it the
> "usual autoconf way".

Pieces of shell code look to be permitted - a few lines down
from the addition to configure.ac there is a shell case
statement. Or are you telling me that's an abuse I shouldn't
follow? But then I still don't see how to sensibly replace
the construct, given the issue described further up.

>>> How unfortunate.  I have also hunted for some existing code and also
>>> didn't find anything suitably general.
>>>
>>> I did find this, open-coded in xg_dom_core.c:xc_dom_check_gzip:
>>>
>>> unziplen = (size_t)gzlen[3] << 24 | gzlen[2] << 16 | gzlen[1] << 8 | 
>>> gzlen[0];
>>
>> Okay, I'll copy that then.
> 
> Could you make a macro or inline function in xg_private.h[1] rather
> than open-coding a copy, please ?
> 
> [1] Or, if you prefer, a header with wider scope.

I can, but it feels wrong, in particular if I gave it a
generic looking name (get_unaligned_le32() or some such, if
I was to follow the kernel-originating(?) approach used in
the mini-os wrapping of the hypervisor decompressor code),
and something like get_linuxes_idea_of_uncompressed_size()
is also, well, not really nice. Especially if put in a
general header like xg_private.h (i.e. in this latter case
I'd rather see the helper have more narrow scope, but of
course introducing a new header just for this seems overkill
as well). Any more concrete suggestion would be appreciated
here.

Jan

Re: [PATCH v7 08/10] tools/misc: Add xen-vmtrace tool

2021-01-25 Thread Andrew Cooper

On 22/01/2021 15:33, Ian Jackson wrote:
> Andrew Cooper writes ("[PATCH v7 08/10] tools/misc: Add xen-vmtrace tool"):
>> From: Michał Leszczyński 
> ...
>> +if ( signal(SIGINT, int_handler) == SIG_ERR )
>> +err(1, "Failed to register signal handler\n");
> How bad is it if this signal handler is not effective ?

I believe far less so now that I've fixed up everything to use a (fixed)
XENMEM_acquire_resource, so Xen doesn't crash if this process dies in
the wrong order WRT the domain shutting down.

But I would have to defer to Michał on that.

>> +if ( xc_vmtrace_disable(xch, domid, vcpu) )
>> +perror("xc_vmtrace_disable()");
> I guess the tracing will remain on, pointlessly, which has a perf
> impact but nothing else ?

The perf hit is substantial, but it is safe to leave enabled.

> How is it possible for the user to clean this up ?

For now, enable/disable can only fail with -EINVAL for calls made in the
wrong context, so a failure here is benign in practice.

I specifically didn't opt for reference counting the enable/disable
calls, because there cannot (usefully) be two users of this interface.

>
> Also: at the very least, you need to trap SIGTERM SIGHUP SIGPIPE.
>
> It would be good to exit with the right signal by re-raising it.

This is example code, not a production utility.

Anything more production-wise using this needs to account for the fact
that Intel Processor Trace can't pause on a full buffer.  (It ought to
be able to on forthcoming hardware, but this facility isn't available yet.)

The use-cases thus far are always "small delta of execution between
introspection events", using a massive buffer as the mitigation for
hardware wrapping.

No amount of additional code here can prevent stream corruption problems
with the buffer wrapping.  As a result, it is kept as simple as possible
as a demonstration of how to use the API.
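
For anyone building something more production-like on top of this, the
"trap more signals and re-raise" pattern Ian describes could be sketched
roughly as below.  The helper names are hypothetical and this is not code
from the patch.

```c
#include <signal.h>

/* Set by the handler; 0 means no signal has been seen. */
static volatile sig_atomic_t pending_signal;

/* Only record the signal; the main loop is expected to poll the flag. */
static void note_signal(int sig)
{
    pending_signal = sig;
}

/* Returns 0 on success, -1 if any handler could not be installed. */
static int install_handlers(void)
{
    static const int sigs[] = { SIGINT, SIGTERM, SIGHUP, SIGPIPE };

    for (unsigned int i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++)
        if (signal(sigs[i], note_signal) == SIG_ERR)
            return -1;

    return 0;
}

/*
 * Call after cleanup (e.g. after disabling tracing): restore the default
 * disposition and re-raise, so the parent sees death by the right signal.
 */
static void reraise_pending(void)
{
    if (pending_signal) {
        signal(pending_signal, SIG_DFL);
        raise(pending_signal);
    }
}
```

None of this helps with the buffer wrapping issue above; it only covers
the exit path.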

~Andrew

Re: [PATCH v7 02/10] xen/domain: Add vmtrace_frames domain creation parameter

2021-01-25 Thread Jan Beulich

On 21.01.2021 22:27, Andrew Cooper wrote:
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -132,6 +132,48 @@ static void vcpu_info_reset(struct vcpu *v)
>  v->vcpu_info_mfn = INVALID_MFN;
>  }
>  
> +static void vmtrace_free_buffer(struct vcpu *v)
> +{
> +const struct domain *d = v->domain;
> +struct page_info *pg = v->vmtrace.buf;
> +unsigned int i;
> +
> +if ( !pg )
> +return;
> +
> +for ( i = 0; i < d->vmtrace_frames; i++ )
> +{
> +put_page_alloc_ref(&pg[i]);
> +put_page_and_type(&pg[i]);
> +}
> +
> +v->vmtrace.buf = NULL;

To set a good precedent, maybe this wants moving up ahead of
the loop and ...

> +}
> +
> +static int vmtrace_alloc_buffer(struct vcpu *v)
> +{
> +struct domain *d = v->domain;
> +struct page_info *pg;
> +unsigned int i;
> +
> +if ( !d->vmtrace_frames )
> +return 0;
> +
> +pg = alloc_domheap_pages(d, get_order_from_pages(d->vmtrace_frames),
> + MEMF_no_refcount);
> +if ( !pg )
> +return -ENOMEM;
> +
> +v->vmtrace.buf = pg;

... this wants moving down past the loop, to avoid
globally announcing something that isn't fully initialized
yet / anymore?
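
To illustrate the ordering being suggested (publish the pointer last on allocation, clear it first on free), here is a minimal stand-alone sketch. `struct page`, `struct tracer` and both functions are hypothetical stand-ins, not Xen code, and the sketch is single-threaded, so the ordering is purely illustrative; real concurrent code would also need appropriate barriers or locking.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for Xen's struct page_info / struct vcpu. */
struct page { int refs; };
struct tracer { struct page *buf; unsigned int nr; };

static int tracer_alloc(struct tracer *t, unsigned int nr)
{
    struct page *pg = calloc(nr, sizeof(*pg));
    unsigned int i;

    if ( !pg )
        return -1;

    for ( i = 0; i < nr; i++ )
        pg[i].refs = 1;          /* finish initialising every page ... */

    t->nr = nr;
    t->buf = pg;                 /* ... and only then publish the pointer */

    return 0;
}

static void tracer_free(struct tracer *t)
{
    struct page *pg = t->buf;
    unsigned int i;

    if ( !pg )
        return;

    t->buf = NULL;               /* unpublish first ... */

    for ( i = 0; i < t->nr; i++ )
        pg[i].refs = 0;          /* ... then tear the pages down */

    free(pg);
}
```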

> +for ( i = 0; i < d->vmtrace_frames; i++ )
> +/* Domain can't know about this page yet - something fishy going on. */
> +if ( !get_page_and_type(&pg[i], d, PGT_writable_page) )
> +BUG();

Whatever the final verdict for the other similar places
that one of your patches changes, it should be applied
here, too.

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -94,6 +94,7 @@ struct xen_domctl_createdomain {
>  uint32_t max_evtchn_port;
>  int32_t max_grant_frames;
>  int32_t max_maptrack_frames;
> +uint32_t vmtrace_frames;

Considering page size related irritations elsewhere in the
public interface, could you have a comment clarify the unit
of this value (Xen's page size according to the rest of the
patch), and that space will be allocated once per-vCPU
rather than per-domain (to stand a chance of recognizing
the ultimate memory footprint resulting from this)?
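
As a rough illustration of why the unit and the per-vCPU nature matter, the footprint scales as frames x page size x vCPUs. The sketch below assumes 4 KiB pages and ignores any rounding the allocator may apply; `SKETCH_PAGE_SIZE` and `vmtrace_footprint()` are made-up names.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB pages, as on x86 */

/* Total trace-buffer memory for a domain, in bytes. */
static uint64_t vmtrace_footprint(uint32_t frames_per_vcpu,
                                  unsigned int nr_vcpus)
{
    return (uint64_t)frames_per_vcpu * SKETCH_PAGE_SIZE * nr_vcpus;
}
```

With 16 frames and 4 vCPUs this comes to 256 KiB, four times what a reader assuming a per-domain buffer would expect.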

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -257,6 +257,10 @@ struct vcpu
>  /* vPCI per-vCPU area, used to store data for long running operations. */
>  struct vpci_vcpu vpci;
>  
> +struct {
> +struct page_info *buf;
> +} vmtrace;

While perhaps minor, I'm unconvinced "buf" is a good name
for a field of this type.

> @@ -470,6 +474,9 @@ struct domain
>  unsigned    pbuf_idx;
>  spinlock_t  pbuf_lock;
>  
> +/* Used by vmtrace features */
> +uint32_t    vmtrace_frames;

unsigned int? Also could you move this to an existing 32-bit
hole, like immediately after "monitor"?
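
The effect of filling an existing hole can be shown with two toy layouts. These are hypothetical structs, not struct domain; the numbers in the comments assume a typical LP64 ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts; field names are illustrative only. */
struct appended {
    void    *a;
    uint32_t flag;      /* on LP64, followed by a 4-byte padding hole */
    void    *b;
    uint32_t frames;    /* new field at the end, followed by tail padding */
};

struct in_hole {
    void    *a;
    uint32_t flag;
    uint32_t frames;    /* new field placed in the existing hole */
    void    *b;
};

/* Bytes saved by using the hole (8 on a typical LP64 ABI, 0 on ILP32). */
static size_t bytes_saved(void)
{
    return sizeof(struct appended) - sizeof(struct in_hole);
}
```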

Jan

Re: [PATCH] xen/include: compat/xlat.h may change with .config changes

2021-01-25 Thread Andrew Cooper

On 25/01/2021 11:03, Jan Beulich wrote:
> $(xlat-y) getting derived from $(headers-y) means its contents may
> change with changes to .config. The individual files $(xlat-y) refers
> to, otoh, may not change, and hence not trigger rebuilding of xlat.h.
> (Note that the issue was already present before the commit referred to
> below, but it was far more limited in affecting only changes to
> CONFIG_XSM_FLASK.)
>
> Fixes: 2c8fabb2232d ("x86: only generate compat headers actually needed")
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper

Re: [PATCH v2.5 1/5] libxenguest: support zstd compressed kernels

2021-01-25 Thread Ian Jackson

Jan Beulich writes ("Re: [PATCH v2.5 1/5] libxenguest: support zstd compressed 
kernels"):
> On 25.01.2021 14:51, Ian Jackson wrote:
> > I mean, the parts where you examine libzstd_PKG_ERRORS.
> 
> The only thing I make use of is it being (non-)empty. Do you
> really think that's a problem?

It's highly unusual.   Conceivably it might be empty even if
pkg-config failed.

> >  AC_CHECK_LIB([lzo2], [lzo1x_decompress], [zlib="$zlib -DHAVE_LZO1X 
> > -llzo2"])
> > +PKG_CHECK_MODULES([libzstd], [libzstd], [zlib="$zlib -DHAVE_ZSTD 
> > $libzstd_CFLAGS $libzstd_LIBS"])
> 
> No, that's what I did initially, resulting in libzstd becoming
> a strict requirement (i.e. configure failing if it's absent),
> as reported by Michael Young.

Well how about passing "true" for the fourth argument then ?

> > I mean the inclusion of $libzstd_PKG_ERRORS in the output.
> 
> I see no point in the warning without including this. In fact
> I added the AC_MSG_WARN() just so that the contents of this
> variable (and hence an indication to the user of what to do)
> was easily accessible.

This is not usual autoconf practice.  The usual approach is to
consider that missing features are just to be dealt with with a
minimum of fuss.

> The [true] in the 4th argument is there to prevent the default
> behavior of failing the configure process altogether. You'd
> see autoconf's default there only if the argument was absent.

I see.  Thanks for correcting me.  See above.

> > If you want a warning I think it should be a call to AC_MSG_WARN in
> > ACTION-IF-NOT-FOUND.
> 
> I didn't to avoid the nesting of things yielding even harder
> to read code.

In your code it's nested too, just in an if rather than in the
macro argument - but with a separate condition.  Please do it the
"usual autoconf way".

> > This suggests to me that a warning for missing zstd is not necessarily
> > a good idea unless it is conditional for x86.
> 
> Well, okay, I'll drop the warning then.

Thanks.

> > IDK what the zstd-defined endianness is.  I guess it must be LE for
> > your patch to work on x86.
> 
> This field is not part of the zstd output, but gets appended
> to the output by the Linux kernel build system. IOW its
> endianness gets defined by Linux; the text in the respective
> Makefile says "littleendian".

How odd.  OK.

> > How unfortunate.  I have also hunted for some existing code and also
> > didn't find anything suitably general.
> > 
> > I did find this, open-coded in xg_dom_core.c:xc_dom_check_gzip:
> > 
> > unziplen = (size_t)gzlen[3] << 24 | gzlen[2] << 16 | gzlen[1] << 8 | 
> > gzlen[0];
> 
> Okay, I'll copy that then.

Could you make a macro or inline function in xg_private.h[1] rather
than open-coding a copy, please ?

[1] Or, if you prefer, a header with wider scope.
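
Such a helper might look roughly like the sketch below. The name `get_unaligned_le32()` is an assumption, not an existing libxenguest function; it generalises the open-coded expression quoted above and casts every byte before shifting.

```c
#include <stdint.h>

/* Hypothetical name; reads a 32-bit little-endian value from any address. */
static inline uint32_t get_unaligned_le32(const void *buf)
{
    const uint8_t *p = buf;

    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

Because it only ever dereferences single bytes, it has neither the alignment nor the host-endianness dependency of a `*(uint32_t *)` cast.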

Thanks,
Ian.

[xen-unstable test] 158607: tolerable FAIL

2021-01-25 Thread osstest service owner

flight 158607 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158607/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-migrupgrade 21 debian-fixup/dst_host  fail pass in 158601
 test-arm64-arm64-examine  8 reboot fail pass in 158601

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail like 158591
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 158601
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 158601
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 158601
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 158601
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 158601
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 158601
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 158601
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 158601
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 158601
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 158601
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 158601
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass

version targeted for testing:
 xen  452ddbe3592b141b05a7e0676f09c8ae07f98fdd
baseline version:
 xen  452ddbe3592b141b05a7e0676f09c8ae07f98fdd

Last test of basis   158607  2021-01-25 01:53:39 Z    0 days
Testing same since  (not found) 0 attempts

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm

Re: [PATCH v2.5 1/5] libxenguest: support zstd compressed kernels

2021-01-25 Thread Jan Beulich

On 25.01.2021 14:51, Ian Jackson wrote:
> Jan Beulich writes ("Re: [PATCH v2.5 1/5] libxenguest: support zstd 
> compressed kernels"):
>> On 25.01.2021 12:30, Ian Jackson wrote:
>>>> As far as configure.ac goes, I'm pretty sure there is a better (more
>>>> "standard") way of using PKG_CHECK_MODULES().
>>>
>>> Yes, what you have done is rather unidiomatic and seems to rely on
>>> undocumented internals of the PKG_*. macros.
>>
>> Which specific part of the construct are you referring to?
>> I didn't think I used anything outright undocumented. Of
>> course I did have some trouble finding suitable docs, but in
>> the end I managed to locate at least something that I was
>> able to grok.
> 
> I mean, the parts where you examine libzstd_PKG_ERRORS.

The only thing I make use of is it being (non-)empty. Do you
really think that's a problem?

>>>  Why not do as was done for bz2, lzma, lzo2 ?
>>
>> Because the pkg-config approach is more flexible - aiui
>> AC_CHECK_HEADER() and AC_CHECK_LIB() won't find a
>> dependency when sitting in some custom place, which the *.pc
>> files are specifically supposed to cover for.
> 
> Yes, sorry, I didn't mean to suggest that the use of PKG_CHECK_MODULES
> rather than AC_CHECK_LIB was wrong.  But I think you can just pass
> similar if-found and if-not-found fragments.  Maybe something like:
> 
>  AC_CHECK_LIB([lzo2], [lzo1x_decompress], [zlib="$zlib -DHAVE_LZO1X -llzo2"])
> +PKG_CHECK_MODULES([libzstd], [libzstd], [zlib="$zlib -DHAVE_ZSTD 
> $libzstd_CFLAGS $libzstd_LIBS"])

No, that's what I did initially, resulting in libzstd becoming
a strict requirement (i.e. configure failing if it's absent),
as reported by Michael Young.

>>>  Printing the errors to configure's terminal is
>>> not normally done, either.
>>
>> With this you mean the AC_MSG_WARN()?
> 
> I don't mind there being a call to AC_MSG_WARN.  I don't think I have
> a strong opinion about whether lack of zstd ought to produce a
> warning.  If there ought to be a warning, then it ought to be made
> with AC_MSG_WARN, indeed.
> 
> I mean the inclusion of $libzstd_PKG_ERRORS in the output.

I see no point in the warning without including this. In fact
I added the AC_MSG_WARN() just so that the contents of this
variable (and hence an indication to the user of what to do)
was easily accessible.

>> I'm okay to drop it; I was actually half tempted to myself already,
>> but thought having it would be better in line with
>> PKG_CHECK_MODULES() when not passed a 4th argument (where it gets
>> quite verbose, but of course also fails the configure process
>> altogether).
> 
> Does it ?  Admittedly the documentation I found in pkg.m4 for these
> PKG_* macros doesn't say what the default is for ACTION-IF-NOT-FOUND
> but it would surely parallel all the autoconf-provided macros where
> the default is a no-op.  I read the autoconf output in your patch
> (where admittedly you pass [true]) and that seems to support my
> supposition.

The [true] in the 4th argument is there to prevent the default
behavior of failing the configure process altogether. You'd
see autoconf's default there only if the argument was absent.

> If you want a warning I think it should be a call to AC_MSG_WARN in
> ACTION-IF-NOT-FOUND.

I didn't to avoid the nesting of things yielding even harder
to read code.

>>> I don't understand why there is an x86-specific angle here.
>>
>> On a "normal" libxenguest build decompression is available
>> only on x86, because of
>>
>> SRCS-$(CONFIG_X86) += xg_dom_bzimageloader.c
> 
> Oh!
> 
>> Hence the dependencies thereof also only ought to need
>> checking on x86.
> 
> I see.  Hmm.  TBH this seems anomalous.  I would prefer to keep the
> configure test and expect that eventually some non-x86 folks will
> decide to turn this on there too.
> 
> This suggests to me that a warning for missing zstd is not necessarily
> a good idea unless it is conditional for x86.

Well, okay, I'll drop the warning then.

>>>> +insize = *size - 4;
>>>> +outsize = *(uint32_t *)(*blob + insize);
>>>
>>> Potentially unaligned access.  IDK if this kind of thing is thought
>>> OK in hypervisor code but it's not sufficiently portable for tools.
>>
>> Also a possible endianness issue, yes.
> 
> The endianness issue at least just means "this code doesn't work and
> will always reject images".  The alignment issue might mean "feeding
> a corrupted image file will crash your management daemon".
> 
> IDK what the zstd-defined endianness is.  I guess it must be LE for
> your patch to work on x86.

This field is not part of the zstd output, but gets appended
to the output by the Linux kernel build system. IOW its
endianness gets defined by Linux; the text in the respective
Makefile says "littleendian".
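
As an illustration of handling such a trailer without alignment or host-endianness assumptions, one could split the blob along these lines. `split_size_trailer()` is a hypothetical helper, not code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Split a blob into the compressed payload and the 4-byte little-endian
 * size that follows it.  Returns 0 on success, -1 if the blob is too
 * short to hold both a payload and the trailer.
 */
static int split_size_trailer(const uint8_t *blob, size_t size,
                              size_t *insize, uint32_t *outsize)
{
    unsigned int i;
    uint32_t val = 0;

    if ( size <= 4 )
        return -1;

    *insize = size - 4;

    /* Assemble byte-wise: no alignment or host-endianness assumptions. */
    for ( i = 0; i < 4; i++ )
        val |= (uint32_t)blob[*insize + i] << (8 * i);

    *outsize = val;

    return 0;
}
```

The length check also guards the `*size - 4` subtraction against wrapping on a truncated image.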

>> Since as per above this
>> code gets used on x86 only, I thought this would be fine at least
>> for now.
> 
> I think that's too much of a boobytrap to leave in the code.
> 
>> In fact before using this simplistic approach I did
>> check

[PATCH] x86/xen: avoid warning in Xen pv guest with CONFIG_AMD_MEM_ENCRYPT enabled

2021-01-25 Thread Juergen Gross

When booting a kernel which has been built with CONFIG_AMD_MEM_ENCRYPT
enabled as a Xen pv guest a warning is issued for each processor:

[5.964347] [ cut here ]
[5.968314] WARNING: CPU: 0 PID: 1 at 
/home/gross/linux/head/arch/x86/xen/enlighten_pv.c:660 get_trap_addr+0x59/0x90
[5.972321] Modules linked in:
[5.976313] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 
5.11.0-rc5-default #75
[5.980313] Hardware name: Dell Inc. OptiPlex 9020/0PC5F7, BIOS A05 
12/05/2013
[5.984313] RIP: e030:get_trap_addr+0x59/0x90
[5.988313] Code: 42 10 83 f0 01 85 f6 74 04 84 c0 75 1d b8 01 00 00 00 c3 
48 3d 00 80 83 82 72 08 48 3d 20 81 83 82 72 0c b8 01 00 00 00 eb db <0f> 0b 31 
c0 c3 48 2d 00 80 83 82 48 ba 72 1c c7 71 1c c7 71 1c 48
[5.992313] RSP: e02b:c90040033d38 EFLAGS: 00010202
[5.996313] RAX: 0001 RBX: 82a141d0 RCX: 8222ec38
[6.000312] RDX: 8222ec38 RSI: 0005 RDI: c90040033d40
[6.004313] RBP: 8881003984a0 R08: 0007 R09: 888100398000
[6.008312] R10: 0007 R11: c90040246000 R12: 8884082182a8
[6.012313] R13: 0100 R14: 001d R15: 8881003982d0
[6.016316] FS:  () GS:88840820() 
knlGS:
[6.020313] CS:  e030 DS:  ES:  CR0: 80050033
[6.024313] CR2: c900020ef000 CR3: 0220a000 CR4: 00050660
[6.028314] Call Trace:
[6.032313]  cvt_gate_to_trap.part.7+0x3f/0x90
[6.036313]  ? asm_exc_double_fault+0x30/0x30
[6.040313]  xen_convert_trap_info+0x87/0xd0
[6.044313]  xen_pv_cpu_up+0x17a/0x450
[6.048313]  bringup_cpu+0x2b/0xc0
[6.052313]  ? cpus_read_trylock+0x50/0x50
[6.056313]  cpuhp_invoke_callback+0x80/0x4c0
[6.060313]  _cpu_up+0xa7/0x140
[6.064313]  cpu_up+0x98/0xd0
[6.068313]  bringup_nonboot_cpus+0x4f/0x60
[6.072313]  smp_init+0x26/0x79
[6.076313]  kernel_init_freeable+0x103/0x258
[6.080313]  ? rest_init+0xd0/0xd0
[6.084313]  kernel_init+0xa/0x110
[6.088313]  ret_from_fork+0x1f/0x30
[6.092313] ---[ end trace be9ecf17dceeb4f3 ]---

Reason is that there is no Xen pv trap entry for X86_TRAP_VC.

Fix that by defining a trap entry for X86_TRAP_VC in Xen pv mode.

Fixes: 0786138c78e793 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Cc:  # v5.10
Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/idtentry.h |  3 +++
 arch/x86/xen/enlighten_pv.c | 11 +++
 arch/x86/xen/xen-asm.S  |  3 +++
 3 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 247a60a47331..115a76e77e65 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -609,6 +609,9 @@ DECLARE_IDTENTRY_DF(X86_TRAP_DF,exc_double_fault);
 /* #VC */
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 DECLARE_IDTENTRY_VC(X86_TRAP_VC,   exc_vmm_communication);
+#ifdef CONFIG_XEN_PV
+DECLARE_IDTENTRY_RAW(X86_TRAP_VC,  xenpv_exc_vmm_communication);
+#endif
 #endif
 
 #ifdef CONFIG_XEN_PV
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 4409306364dc..82948251f57b 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -583,6 +583,14 @@ DEFINE_IDTENTRY_RAW(xenpv_exc_debug)
exc_debug(regs);
 }
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+DEFINE_IDTENTRY_RAW(xenpv_exc_vmm_communication)
+{
+   /* This should never happen and there is no way to handle it. */
+   panic("X86_TRAP_VC in Xen PV mode.");
+}
+#endif
+
 struct trap_array_entry {
void (*orig)(void);
void (*xen)(void);
@@ -625,6 +633,9 @@ static struct trap_array_entry trap_array[] = {
TRAP_ENTRY(exc_coprocessor_error,   false ),
TRAP_ENTRY(exc_alignment_check, false ),
TRAP_ENTRY(exc_simd_coprocessor_error,  false ),
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   TRAP_ENTRY_REDIR(exc_vmm_communication, true ),
+#endif
 };
 
 static bool __ref get_trap_addr(void **addr, unsigned int ist)
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 1cb0e84b9161..16f4db35de44 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -175,6 +175,9 @@ xen_pv_trap asm_exc_alignment_check
 xen_pv_trap asm_exc_machine_check
 #endif /* CONFIG_X86_MCE */
 xen_pv_trap asm_exc_simd_coprocessor_error
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+xen_pv_trap asm_xenpv_exc_vmm_communication
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
 #ifdef CONFIG_IA32_EMULATION
 xen_pv_trap entry_INT80_compat
 #endif
-- 
2.26.2

Re: [PATCH v2.5 1/5] libxenguest: support zstd compressed kernels

2021-01-25 Thread Ian Jackson

Jan Beulich writes ("Re: [PATCH v2.5 1/5] libxenguest: support zstd compressed 
kernels"):
> On 25.01.2021 12:30, Ian Jackson wrote:
> >> As far as configure.ac goes, I'm pretty sure there is a better (more
> >> "standard") way of using PKG_CHECK_MODULES().
> > 
> > Yes, what you have done is rather unidiomatic and seems to rely on
> > undocumented internals of the PKG_*. macros.
> 
> Which specific part of the construct are you referring to?
> I didn't think I used anything outright undocumented. Of
> course I did have some trouble finding suitable docs, but in
> the end I managed to locate at least something that I was
> able to grok.

I mean, the parts where you examine libzstd_PKG_ERRORS.

> >  Why not do as was done for bz2, lzma, lzo2 ?
> 
> Because the pkg-config approach is more flexible - aiui
> AC_CHECK_HEADER() and AC_CHECK_LIB() won't find a
> dependency when sitting in some custom place, which the *.pc
> files are specifically supposed to cover for.

Yes, sorry, I didn't mean to suggest that the use of PKG_CHECK_MODULES
rather than AC_CHECK_LIB was wrong.  But I think you can just pass
similar if-found and if-not-found fragments.  Maybe something like:

 AC_CHECK_LIB([lzo2], [lzo1x_decompress], [zlib="$zlib -DHAVE_LZO1X -llzo2"])
+PKG_CHECK_MODULES([libzstd], [libzstd], [zlib="$zlib -DHAVE_ZSTD 
$libzstd_CFLAGS $libzstd_LIBS"])

> >  Printing the errors to configure's terminal is
> > not normally done, either.
> 
> With this you mean the AC_MSG_WARN()?

I don't mind there being a call to AC_MSG_WARN.  I don't think I have
a strong opinion about whether lack of zstd ought to produce a
warning.  If there ought to be a warning, then it ought to be made
with AC_MSG_WARN, indeed.

I mean the inclusion of $libzstd_PKG_ERRORS in the output.

> I'm okay to drop it; I was actually half tempted to myself already,
> but thought having it would be better in line with
> PKG_CHECK_MODULES() when not passed a 4th argument (where it gets
> quite verbose, but of course also fails the configure process
> altogether).

Does it ?  Admittedly the documentation I found in pkg.m4 for these
PKG_* macros doesn't say what the default is for ACTION-IF-NOT-FOUND
but it would surely parallel all the autoconf-provided macros where
the default is a no-op.  I read the autoconf output in your patch
(where admittedly you pass [true]) and that seems to support my
supposition.

If you want a warning I think it should be a call to AC_MSG_WARN in
ACTION-IF-NOT-FOUND.

> > I don't understand why there is an x86-specific angle here.
> 
> On a "normal" libxenguest build decompression is available
> only on x86, because of
> 
> SRCS-$(CONFIG_X86) += xg_dom_bzimageloader.c

Oh!

> Hence the dependencies thereof also only ought to need
> checking on x86.

I see.  Hmm.  TBH this seems anomalous.  I would prefer to keep the
configure test and expect that eventually some non-x86 folks will
decide to turn this on there too.

This suggests to me that a warning for missing zstd is not necessarily
a good idea unless it is conditional for x86.

> I have to admit I'm uncertain about the stubdom build. I was
> merely implying that if decompression is unavailable in "normal"
> builds outside of x86, then _if_ non-x86 builds of stubdom exist
> in the first place, decompression code there is at best dead
> (the quoted restriction from Makefile applies in this case too,
> and hence I can't see callers of that code, despite
> 
> ifeq ($(CONFIG_LIBXC_MINIOS),y)
> SRCS-y += xg_dom_decompress_unsafe.c
> SRCS-y += xg_dom_decompress_unsafe_bzip2.c
> SRCS-y += xg_dom_decompress_unsafe_lzma.c
> SRCS-y += xg_dom_decompress_unsafe_lzo1x.c
> SRCS-y += xg_dom_decompress_unsafe_xz.c
> SRCS-y += xg_dom_decompress_unsafe_zstd.c
> endif
> 
> not restricting it to x86).

I think there is no mini-os and no stubdom build on ARM.  I don't
think this is necessarily for any particularly principled reason
except that minios in particular is not so easy to port.

So that would explain why the build isn't broken despite this
inconsistency.

> >> This follows the logic used for other decompression methods utilizing an
> >> external library, albeit here we can't ignore the 32-bit size field
> >> appended to the compressed image - its presence causes decompression to
> >> fail. Leverage the field instead to allocate the output buffer in one
> >> go, i.e. without incrementally realloc()ing.
> > 
> >> +insize = *size - 4;
> >> +outsize = *(uint32_t *)(*blob + insize);
> > 
> > Potentially unaligned access.  IDK if this kind of thing is thought
> > OK in hypervisor code but it's not sufficiently portable for tools.
> 
> Also a possible endianness issue, yes.

The endianness issue at least just means "this code doesn't work and
will always reject images".  The alignment issue might mean "feeding
a corrupted image file will crash your management daemon".

IDK what the

Re: [PATCH v4 02/10] evtchn: bind-interdomain doesn't need to hold both domains' event locks

2021-01-25 Thread Jan Beulich

On 09.01.2021 17:14, Julien Grall wrote:
> On 09/01/2021 15:41, Julien Grall wrote:
>> On 05/01/2021 13:09, Jan Beulich wrote:
>>> The local domain's lock is needed for the port allocation, but for the
>>> remote side the per-channel lock is sufficient. The per-channel locks
>>> then need to be acquired slightly earlier, though.
>>
>> I was expecting a little bit more information in the commit message 
>> because there are a few changes in behavior with this change:
>>
>>   1) AFAICT, evtchn_allocate_port() relies on rchn->state to be protected 
>> by the rd->event_lock. Now that you dropped the rd->event_lock, 
>> rchn->state may be accessed while it is updated in 
>> evtchn_bind_interdomain(). The port cannot go back to ECS_FREE here, but 
>> I think the access needs to be switched to {read, write}_atomic() or 
>> ACCESS_ONCE.
>>
>>    2) xsm_evtchn_interdomain() is now going to be called without the 
>> rd->event_lock. Can you confirm that the lock is not needed by XSM?
> 
> Actually, I think there is a bigger issue. evtchn_close() will check 
> chn1->state with just d1->event_lock held (IOW, there chn1->lock is not 
> held).
> 
> If the remote domain happens to close the unbound port at the same time 
> the local domain bound it, then you may end up in the following situation:
> 
> 
> evtchn_bind_interdomain()     | evtchn_close()
>                               |
>                               |  switch ( chn1->state )
>                               |  case ECS_UNBOUND:
>                               |      /* nothing to do */
> double_evtchn_lock()          |
> rchn->state = ECS_INTERDOMAIN |
> double_evtchn_unlock()        |
>                               |  evtchn_write_lock(chn1)
>                               |  evtchn_free(d1, chn1)
>                               |  evtchn_write_unlock(chn1)
> 
> When the local domain then tries to close the port, it will hit the 
> BUG_ON(chn2->state != ECS_INTERDOMAIN) because the remote port was 
> already freed.

Hmm, yes, thanks for spotting (and sorry for taking a while to
reply).
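
The interleaving described above can be replayed deterministically in a single thread. The sketch below is a heavily simplified model under made-up names (`enum state`, `struct chan`, `replay_interleaving()`); it has no locks at all and only demonstrates the stale state check.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for Xen's channel states. */
enum state { ST_FREE, ST_UNBOUND, ST_INTERDOMAIN };

struct chan { enum state state; };

/*
 * Replay the problematic interleaving in a single thread.  Returns true
 * if the invariant "a bound channel's peer is also bound" still holds
 * at the end.
 */
static bool replay_interleaving(void)
{
    struct chan local = { ST_FREE }, remote = { ST_UNBOUND };

    /* Closer: samples the state before taking the per-channel lock. */
    enum state seen = remote.state;

    /* Binder: runs to completion in between. */
    local.state = ST_INTERDOMAIN;
    remote.state = ST_INTERDOMAIN;

    /* Closer: acts on the stale value and frees the remote port. */
    if ( seen == ST_UNBOUND )
        remote.state = ST_FREE;

    return !(local.state == ST_INTERDOMAIN &&
             remote.state != ST_INTERDOMAIN);
}
```

The function returns false: the local side is left bound to a freed peer, which is the condition the BUG_ON() would then trip over.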

> I think this can be solved by acquiring the event lock earlier on in 
> evtchn_close(). Although, this may become a can of worms as it would be 
> more complex to prevent lock inversion because of chn1->lock and chn2->lock.

Indeed. I think I'll give up on trying to eliminate the double
per-domain event locking for the time being, and resubmit with
both patches dropped.

Jan

Re: [PATCH v2 3/5] libxenguest: "standardize" LZO kernel decompression code

2021-01-25 Thread Ian Jackson

Jan Beulich writes ("Re: [PATCH v2 3/5] libxenguest: "standardize" LZO kernel 
decompression code"):
> On 25.01.2021 12:59, Ian Jackson wrote:
> > I don't mind throwing in the DOMPRINTF too.
> 
> Am I fine to transliterate this into R-a-b?

Err, yes, sorry, should have been more explicit.

Ian.

Re: [PATCH v2 3/5] libxenguest: "standardize" LZO kernel decompression code

2021-01-25 Thread Jan Beulich

On 25.01.2021 12:59, Ian Jackson wrote:
> Wei Liu writes ("Re: [PATCH v2 3/5] libxenguest: "standardize" LZO kernel 
> decompression code"):
>> On Tue, Jan 19, 2021 at 04:16:35PM +0100, Jan Beulich wrote:
>>> Add a DOMPRINTF() other methods have, indicating success. To facilitate
>>> this, introduce an "outsize" local variable and update *size as well as
>>> *blob only once done. The latter then also avoids leaving a pointer to
>>> freed memory in dom->kernel_blob in case of a decompression error.
>>>
>>> Signed-off-by: Jan Beulich 
>>
>> Acked-by: Wei Liu 
> 
> The latter part of this is a bugfix which ought to go into 4.15, I
> think, and be backported.
> 
> I don't mind throwing in the DOMPRINTF too.

Am I fine to transliterate this into R-a-b?

Jan

Re: [PATCH v2.5 1/5] libxenguest: support zstd compressed kernels

2021-01-25 Thread Jan Beulich

On 25.01.2021 12:30, Ian Jackson wrote:
> Hi.  Thanks for this.  Firstly, RM hat: this is the tools half of zstd
> decompression support which I think is a blocker for the release.  So
> I am going to waive the last posting date requirement.  Therefore,
> 
> Assuming it's committed this week:
> 
> Release-Acked-by: Ian Jackson 

Thanks.

> Secondly, I think it would be sensible for me to review it:
> 
>> As far as configure.ac goes, I'm pretty sure there is a better (more
>> "standard") way of using PKG_CHECK_MODULES().
> 
> Yes, what you have done is rather unidiomatic and seems to rely on
> undocumented internals of the PKG_*. macros.

Which specific part of the construct are you referring to?
I didn't think I used anything outright undocumented. Of
course I did have some trouble finding suitable docs, but in
the end I managed to locate at least something that I was
able to grok.

>  Why not do as was done for bz2, lzma, lzo2 ?

Because the pkg-config approach is more flexible - aiui
AC_CHECK_HEADER() and AC_CHECK_LIB() won't find a
dependency when sitting in some custom place, which the *.pc
files are specifically supposed to cover for.

>  Printing the errors to configure's terminal is
> not normally done, either.

With this you mean the AC_MSG_WARN()? I'm okay to drop it; I
was actually half tempted to myself already, but thought having
it would be better in line with PKG_CHECK_MODULES() when not
passed a 4th argument (where it gets quite verbose, but of
course also fails the configure process altogether).

>>  The construct also gets
>> put next to the other decompression library checks, albeit I think they
>> all ought to be x86-specific (e.g. placed in the existing case block a
>> few lines down).
> 
> I don't understand why there is an x86-specific angle here.

On a "normal" libxenguest build decompression is available
only on x86, because of

SRCS-$(CONFIG_X86) += xg_dom_bzimageloader.c

Hence the dependencies thereof also only ought to need
checking on x86.

I have to admit I'm uncertain about the stubdom build. I was
merely implying that if decompression is unavailable in "normal"
builds outside of x86, then _if_ non-x86 builds of stubdom exist
in the first place, decompression code there is at best dead
(the quoted restriction from Makefile applies in this case too,
and hence I can't see callers of that code, despite

ifeq ($(CONFIG_LIBXC_MINIOS),y)
SRCS-y += xg_dom_decompress_unsafe.c
SRCS-y += xg_dom_decompress_unsafe_bzip2.c
SRCS-y += xg_dom_decompress_unsafe_lzma.c
SRCS-y += xg_dom_decompress_unsafe_lzo1x.c
SRCS-y += xg_dom_decompress_unsafe_xz.c
SRCS-y += xg_dom_decompress_unsafe_zstd.c
endif

not restricting it to x86).

>> This follows the logic used for other decompression methods utilizing an
>> external library, albeit here we can't ignore the 32-bit size field
>> appended to the compressed image - its presence causes decompression to
>> fail. Leverage the field instead to allocate the output buffer in one
>> go, i.e. without incrementally realloc()ing.
> 
>> +insize = *size - 4;
>> +outsize = *(uint32_t *)(*blob + insize);
> 
> Potentially unaligned access.  IDK if this kind of thing is thought
> OK in hypervisor code but it's not sufficiently portable for tools.

Also a possible endianness issue, yes. Since as per above this
code gets used on x86 only, I thought this would be fine at least
for now. In fact before using this simplistic approach I did
check whether xg_dom_bzimageloader.c had suitable abstraction
available, yet I couldn't spot any.

> The rest of this code looks OK to me.  I spent quite a while trying to
> figure out the memory management / ownership rules for the interface
> to these decompression functions.  This business where they all
> allocate a new buffer, and overwrite their input pointer with it (but
> only on success), is pretty nasty.  I wasn't able to find where the
> old buffer was freed.  But the other decompressors all seem to work
> the same way.  Urgh.  In summary: nasty, but, this new code seems to
> follow the existing convention.

Yes, this isn't pretty, but looks to have served the purpose. I'd
be happy to see it improved, but I'm afraid beyond what's in this
series I won't have much time to help the overall situation.
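
The convention under discussion (allocate a new buffer, and overwrite the caller's pointer and size only on success) can be sketched roughly as follows. toy_expand() is a made-up stand-in for a decompressor, used purely to show the update order; it is not libxenguest code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy stand-in for a decompressor (hypothetical): "expands" the input by
 * repeating every byte twice.  All work happens on local variables, and
 * *blob / *size get overwritten only once everything has succeeded.  The
 * old buffer is deliberately not freed here; who owns it is left to the
 * caller.
 */
static int toy_expand(void **blob, size_t *size)
{
    const unsigned char *in = *blob;
    size_t insize = *size, outsize, i;
    unsigned char *out;

    if ( insize == 0 || insize > (size_t)-1 / 2 )
        return -1;                  /* caller's pointer and size untouched */

    outsize = insize * 2;
    out = malloc(outsize);
    if ( !out )
        return -1;                  /* likewise untouched */

    for ( i = 0; i < insize; i++ )
        out[2 * i] = out[2 * i + 1] = in[i];

    *blob = out;                    /* publish results only at the very end */
    *size = outsize;

    return 0;
}
```

On any error path the caller still sees its original pointer and size, so it never ends up holding a pointer to freed or half-written memory.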

Jan

Re: [PATCH v2 3/5] libxenguest: "standardize" LZO kernel decompression code

2021-01-25 Thread Ian Jackson

Wei Liu writes ("Re: [PATCH v2 3/5] libxenguest: "standardize" LZO kernel 
decompression code"):
> On Tue, Jan 19, 2021 at 04:16:35PM +0100, Jan Beulich wrote:
> > Add a DOMPRINTF() other methods have, indicating success. To facilitate
> > this, introduce an "outsize" local variable and update *size as well as
> > *blob only once done. The latter then also avoids leaving a pointer to
> > freed memory in dom->kernel_blob in case of a decompression error.
> > 
> > Signed-off-by: Jan Beulich 
> 
> Acked-by: Wei Liu 

The latter part of this is a bugfix which ought to go into 4.15, I
think, and be backported.

I don't mind throwing in the DOMPRINTF too.

Ian.

[ovmf test] 158608: all pass - PUSHED

2021-01-25 Thread osstest service owner

flight 158608 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158608/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 96a9acfc527964dc5ab7298862a0cd8aa5fffc6a
baseline version:
 ovmf 3b769c5110384fb33bcfeddced80f721ec7838cc

Last test of basis   158585  2021-01-23 01:06:49 Z    2 days
Testing same since   158608  2021-01-25 02:39:45 Z    0 days    1 attempts


People who touched revisions under test:
  Nhi Pham 

jobs:
 build-amd64-xsm                              pass
 build-i386-xsm                               pass
 build-amd64                                  pass
 build-i386                                   pass
 build-amd64-libvirt                          pass
 build-i386-libvirt                           pass
 build-amd64-pvops                            pass
 build-i386-pvops                             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64         pass
 test-amd64-i386-xl-qemuu-ovmf-amd64          pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   3b769c5110..96a9acfc52  96a9acfc527964dc5ab7298862a0cd8aa5fffc6a -> xen-tested-master

Re: [PATCH 15/17] x86/shadow: drop SH_type_l2h_pae_shadow

2021-01-25 Thread Jan Beulich

On 22.01.2021 21:02, Tim Deegan wrote:
> At 17:31 +0100 on 22 Jan (1611336662), Jan Beulich wrote:
>> Because of this having been benign (due to none of the callback
>> tables specifying non-NULL entries there), wouldn't it make
>> sense to dimension the tables by SH_type_max_shadow + 1 only?
>> Or would you consider this too risky?
> 
> Yes, I think that would be fine, also changing '<= 15' to
> '<= SH_type_max_shadow'.  Maybe add a matching
> ASSERT(t <= SH_type_max_shadow) in shadow_hash_insert as well?

The latter (also for shadow_hash_delete()) would seem kind
of orthogonal to me, but for now I've put it in there.

Jan

Re: [PATCH v2.5 1/5] libxenguest: support zstd compressed kernels

2021-01-25 Thread Ian Jackson

Hi.  Thanks for this.  Firstly, RM hat: this is the tools half of zstd
decompression support which I think is a blocker for the release.  So
I am going to waive the last posting date requirement.  Therefore,

Assuming it's committed this week:

Release-Acked-by: Ian Jackson 


Secondly, I think it would be sensible for me to review it:

> As far as configure.ac goes, I'm pretty sure there is a better (more
> "standard") way of using PKG_CHECK_MODULES().

Yes, what you have done is rather unidiomatic and seems to rely on
undocumented internals of the PKG_* macros.  Why not do as was done
for bz2, lzma, lzo2 ?  Printing the errors to configure's terminal is
not normally done, either.

>  The construct also gets
> put next to the other decompression library checks, albeit I think they
> all ought to be x86-specific (e.g. placed in the existing case block a
> few lines down).

I don't understand why there is an x86-specific angle here.

> This follows the logic used for other decompression methods utilizing an
> external library, albeit here we can't ignore the 32-bit size field
> appended to the compressed image - its presence causes decompression to
> fail. Leverage the field instead to allocate the output buffer in one
> go, i.e. without incrementally realloc()ing.

> +insize = *size - 4;
> +outsize = *(uint32_t *)(*blob + insize);

Potentially unaligned access.  IDK if this kind of thing is thought
OK in hypervisor code but it's not sufficiently portable for tools.

The rest of this code looks OK to me.  I spent quite a while trying to
figure out the memory management / ownership rules for the interface
to these decompression functions.  This business where they all
allocate a new buffer, and overwrite their input pointer with it (but
only on success), is pretty nasty.  I wasn't able to find where the
old buffer was freed.  But the other decompressors all seem to work
the same way.  Urgh.  In summary: nasty, but, this new code seems to
follow the existing convention.

Thanks,
Ian.

Re: [PATCH 15/17] x86/shadow: drop SH_type_l2h_pae_shadow

2021-01-25 Thread Jan Beulich

On 22.01.2021 21:02, Tim Deegan wrote:
> At 17:31 +0100 on 22 Jan (1611336662), Jan Beulich wrote:
>> On 22.01.2021 14:11, Tim Deegan wrote:
>>> At 16:10 +0100 on 14 Jan (1610640627), Jan Beulich wrote:
>>>> hash_{domain,vcpu}_foreach() have a use each of literal 15. It's not
>>>> clear to me what the proper replacement constant would be, as it
>>>> doesn't look as if SH_type_monitor_table was meant.
>>>
>>> Good spot.  I think the '<= 15' should be replaced with '< SH_type_unused'.
>>> It originally matched the callback arrays all being coded as
>>> "static hash_callback_t callbacks[16]".
>>
>> So are you saying this was off by one then prior to this patch
>> (when SH_type_unused was still 17), albeit in apparently a
>> benign way?
> 
> Yes - this assertion is just to catch overruns of the callback table,
> and so it was OK for its limit to be too low.  The new types that were
> added since then are never put in the hash table, so don't trigger
> this assertion.
> 
>> And the comments in sh_remove_write_access(),
>> sh_remove_all_mappings(), sh_remove_shadows(), and
>> sh_reset_l3_up_pointers() are then wrong as well, and would
>> instead better be like in shadow_audit_tables()?
> 
> Yes, it looks like those comments are also out of date where they
> mention 'unused'.

For this, which likely will end up being part of ...

>> Because of this having been benign (due to none of the callback
>> tables specifying non-NULL entries there), wouldn't it make
>> sense to dimension the tables by SH_type_max_shadow + 1 only?
>> Or would you consider this too risky?
> 
> Yes, I think that would be fine, also changing '<= 15' to
> '<= SH_type_max_shadow'.  Maybe add a matching
> ASSERT(t <= SH_type_max_shadow) in shadow_hash_insert as well?

... this, I'll send a patch for 4.16 going beyond the more
immediate one sent, which I'll ask Ian to consider for 4.15
(assuming of course you consider it okay in the first place).
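
As a rough sketch of the idea (a build-time check that no bit in a callback mask lies beyond the callback table), assuming C11 and using simplified stand-ins for Xen's ARRAY_SIZE() and BUILD_BUG_ON(). The table, the mask and the bump() callback are purely illustrative.

```c
#include <assert.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for Xen's BUILD_BUG_ON(), using C11 _Static_assert. */
#define BUILD_BUG_ON(cond) _Static_assert(!(cond), "!(" #cond ")")

/* Every set bit in the mask must index a slot of the local "callbacks". */
#define HASH_CALLBACKS_CHECK(mask) \
    BUILD_BUG_ON((mask) > (1U << ARRAY_SIZE(callbacks)) - 1)

typedef int (*callback_t)(int);

static int bump(int x) { return x + 1; }

/* Invoke the callback of every type selected by the mask. */
static int run_callbacks(void)
{
    static const callback_t callbacks[4] = { [1] = bump, [3] = bump };
    enum { callback_mask = (1U << 1) | (1U << 3) };
    unsigned int t;
    int calls = 0;

    HASH_CALLBACKS_CHECK(callback_mask);   /* a (1U << 4) bit would not build */

    for ( t = 0; t < ARRAY_SIZE(callbacks); t++ )
        if ( callback_mask & (1U << t) )
            calls += callbacks[t](0);

    return calls;
}
```

Shrinking the table then turns a stale mask bit into a build failure rather than an out-of-bounds access that only a run-time ASSERT() might catch.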

Jan

[PATCH] x86/shadow: replace stale literal numbers in hash_{vcpu,domain}_foreach()

2021-01-25 Thread Jan Beulich

15 apparently once used to be the last valid type to request a callback
for, and the dimension of the respective array. The arrays meanwhile are
larger than this (in a benign way, i.e. no caller ever sets a mask bit
higher than 15), dimensioned by SH_type_unused. Have the ASSERT()s
follow suit and add build time checks at the call sites.

Also adjust a comment naming the wrong of the two functions.

Signed-off-by: Jan Beulich 
---
The ASSERT()s being adjusted look redundant with the BUILD_BUG_ON()s
being added, so I wonder whether dropping them wouldn't be the better
route.

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1623,6 +1623,9 @@ void shadow_hash_delete(struct domain *d
 typedef int (*hash_vcpu_callback_t)(struct vcpu *v, mfn_t smfn, mfn_t other_mfn);
 typedef int (*hash_domain_callback_t)(struct domain *d, mfn_t smfn, mfn_t other_mfn);
 
+#define HASH_CALLBACKS_CHECK(mask) \
+BUILD_BUG_ON((mask) > (1U << ARRAY_SIZE(callbacks)) - 1)
+
 static void hash_vcpu_foreach(struct vcpu *v, unsigned int callback_mask,
   const hash_vcpu_callback_t callbacks[],
   mfn_t callback_mfn)
@@ -1658,7 +1661,7 @@ static void hash_vcpu_foreach(struct vcp
 {
 if ( callback_mask & (1 << x->u.sh.type) )
 {
-ASSERT(x->u.sh.type <= 15);
+ASSERT(x->u.sh.type < SH_type_unused);
 ASSERT(callbacks[x->u.sh.type] != NULL);
 done = callbacks[x->u.sh.type](v, page_to_mfn(x),
callback_mfn);
@@ -1705,7 +1708,7 @@ static void hash_domain_foreach(struct d
 {
 if ( callback_mask & (1 << x->u.sh.type) )
 {
-ASSERT(x->u.sh.type <= 15);
+ASSERT(x->u.sh.type < SH_type_unused);
 ASSERT(callbacks[x->u.sh.type] != NULL);
 done = callbacks[x->u.sh.type](d, page_to_mfn(x),
callback_mfn);
@@ -2009,6 +2012,7 @@ int sh_remove_write_access(struct domain
 perfc_incr(shadow_writeable_bf_1);
 else
 perfc_incr(shadow_writeable_bf);
+HASH_CALLBACKS_CHECK(callback_mask);
 hash_domain_foreach(d, callback_mask, callbacks, gmfn);
 
 /* If that didn't catch the mapping, then there's some non-pagetable
@@ -2080,6 +2084,7 @@ int sh_remove_all_mappings(struct domain
 
 /* Brute-force search of all the shadows, by walking the hash */
 perfc_incr(shadow_mappings_bf);
+HASH_CALLBACKS_CHECK(callback_mask);
 hash_domain_foreach(d, callback_mask, callbacks, gmfn);
 
 /* If that didn't catch the mapping, something is very wrong */
@@ -2246,10 +2251,12 @@ void sh_remove_shadows(struct domain *d,
 /* Search for this shadow in all appropriate shadows */
 perfc_incr(shadow_unshadow);
 
-/* Lower-level shadows need to be excised from upper-level shadows.
- * This call to hash_vcpu_foreach() looks dangerous but is in fact OK: each
+/*
+ * Lower-level shadows need to be excised from upper-level shadows. This
+ * call to hash_domain_foreach() looks dangerous but is in fact OK: each
  * call will remove at most one shadow, and terminate immediately when
- * it does remove it, so we never walk the hash after doing a deletion.  */
+ * it does remove it, so we never walk the hash after doing a deletion.
+ */
 #define DO_UNSHADOW(_type) do { \
 t = (_type);\
 if( !(pg->count_info & PGC_page_table)  \
@@ -2270,6 +2277,7 @@ void sh_remove_shadows(struct domain *d,
 if( !fast   \
 && (pg->count_info & PGC_page_table)\
 && (pg->shadow_flags & (1 << t)) )  \
+HASH_CALLBACKS_CHECK(SHF_page_type_mask);   \
 hash_domain_foreach(d, masks[t], callbacks, smfn);  \
 } while (0)
 
@@ -2370,6 +2378,7 @@ void sh_reset_l3_up_pointers(struct vcpu
 };
 static const unsigned int callback_mask = SHF_L3_64;
 
+HASH_CALLBACKS_CHECK(callback_mask);
 hash_vcpu_foreach(v, callback_mask, callbacks, INVALID_MFN);
 }
 
@@ -3420,6 +3429,7 @@ void shadow_audit_tables(struct vcpu *v)
 }
 }
 
+HASH_CALLBACKS_CHECK(SHF_page_type_mask);
 hash_vcpu_foreach(v, mask, callbacks, INVALID_MFN);
 }

[PATCH] xen/include: compat/xlat.h may change with .config changes

2021-01-25 Thread Jan Beulich

$(xlat-y) getting derived from $(headers-y) means its contents may
change with changes to .config. The individual files $(xlat-y) refers
to, otoh, may not change, and hence not trigger rebuilding of xlat.h.
(Note that the issue was already present before the commit referred to
below, but it was far more limited in affecting only changes to
CONFIG_XSM_FLASK.)

Fixes: 2c8fabb2232d ("x86: only generate compat headers actually needed")
Signed-off-by: Jan Beulich 

--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -81,7 +81,7 @@ compat/.xlat/%.lst: xlat.lst Makefile
 xlat-y := $(shell sed -ne 's,@arch@,$(compat-arch-y),g' -re 's,^[?!][[:blank:]]+[^[:blank:]]+[[:blank:]]+,,p' xlat.lst | uniq)
 xlat-y := $(filter $(patsubst compat/%,%,$(headers-y)),$(xlat-y))
 
-compat/xlat.h: $(addprefix compat/.xlat/,$(xlat-y)) Makefile
+compat/xlat.h: $(addprefix compat/.xlat/,$(xlat-y)) config/auto.conf Makefile
 	cat $(filter %.h,$^) >$@.new
 	mv -f $@.new $@

Re: [PATCH] tools/xenstore: fix use after free bug in xenstore_control

2021-01-25 Thread Andrew Cooper

On 25/01/2021 07:23, Juergen Gross wrote:
> There is a very unlikely use after free bug and a memory leak in
> live_update_start() of xenstore_control. Fix those.
>
> Coverity-Id: 1472399
> Fixes: 7f97193e6aa858 ("tools/xenstore: add live update command to xenstore-control")
> Signed-off-by: Juergen Gross 

Acked-by: Andrew Cooper

[qemu-mainline test] 158606: regressions - FAIL

2021-01-25 Thread osstest service owner

flight 158606 qemu-mainline real [real]
flight 158612 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/158606/
http://logs.test-lab.xenproject.org/osstest/logs/158612/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeat fail REGR. vs. 152631
 test-amd64-amd64-xl-qcow2   21 guest-start/debian.repeat fail REGR. vs. 152631
 test-armhf-armhf-xl-vhd 17 guest-start/debian.repeat fail REGR. vs. 152631

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemuu-debianhvm-amd64 18 guest-localmigrate/x10 fail pass in 158612-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop            fail like 152631
 test-armhf-armhf-libvirt     16 saverestore-support-check    fail  like 152631
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop             fail like 152631
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check    fail  like 152631
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop            fail like 152631
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop             fail like 152631
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 152631
 test-amd64-i386-xl-pvshim    14 guest-start                  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt      15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check    fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check       fail   never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check   fail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl-vhd      14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      15 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt     15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     16 saverestore-support-check    fail   never pass

version targeted for testing:
 qemuu                e81eb5e6d108008445821e4f891fb9563016c71b
baseline version:
 qemuu                1d806cef0e38b5db8347a8e12f214d543204a314

Last test of basis   152631  2020-08-20 09:07:46 Z  158 days
Failing since        152659  2020-08-21 14:07:39 Z  156 days  321 attempts
Testing same since   158606  2021-01-24 19:38:17 Z    0 days    1 attempts


363 people

Re: [PATCH v3] xen: EXPERT clean-up and introduce UNSUPPORTED

2021-01-25 Thread Bertrand Marquis

Hi Stefano,

> On 23 Jan 2021, at 02:19, Stefano Stabellini  wrote:
> 
> A recent thread [1] has exposed a couple of issues with our current way
> of handling EXPERT.
> 
> 1) It is not obvious that "Configure standard Xen features (expert
> users)" is actually the famous EXPERT we keep talking about on xen-devel
> 
> 2) It is not obvious when we need to enable EXPERT to get a specific
> feature
> 
> In particular if you want to enable ACPI support so that you can boot
> Xen on an ACPI platform, you have to enable EXPERT first. But searching
> through the kconfig menu it is really not clear (type '/' and "ACPI"):
> nothing in the description tells you that you need to enable EXPERT to
> get the option.
> 
> So this patch makes things easier by doing two things:
> 
> - introduce a new kconfig option UNSUPPORTED which is clearly to enable
>  UNSUPPORTED features as defined by SUPPORT.md

That’s a great change which will improve user experience.

> 
> - change EXPERT options to UNSUPPORTED where it makes sense: keep
>  depending on EXPERT for features made for experts
> 
> - tag unsupported features by adding (UNSUPPORTED) to the one-line
>  description
> 

Shouldn’t we add  (EXPERT) for expert options in the same way for coherency ?

Cheers
Bertrand

> - clarify the EXPERT one-line description
> 
> [1] https://marc.info/?l=xen-devel&m=160333101228981
> 
> Signed-off-by: Stefano Stabellini 
> CC: andrew.coop...@citrix.com
> CC: george.dun...@citrix.com
> CC: i...@xenproject.org
> CC: jbeul...@suse.com
> CC: jul...@xen.org
> CC: w...@xen.org
> CC: bertrand.marq...@arm.com
> 
> Signed-off-by: Stefano Stabellini 
> ---
> Changes in v3:
> - improve UNSUPPORTED text description
> - avoid changing XEN_SHSTK and EFI_SET_VIRTUAL_ADDRESS_MAP
> - update HVM_FEP to be UNSUPPORTED
> 
> Changes in v2:
> - introduce UNSUPPORTED
> - don't switch all EXPERT options to UNSUPPORTED
> 
> See as reference the v2 thread here:
> https://marc.info/?l=xen-devel&m=160566066013723
> ---
> xen/Kconfig  |  9 -
> xen/arch/arm/Kconfig | 10 +-
> xen/arch/x86/Kconfig |  6 +++---
> xen/common/Kconfig   |  2 +-
> xen/common/sched/Kconfig |  6 +++---
> 5 files changed, 20 insertions(+), 13 deletions(-)
> 
> diff --git a/xen/Kconfig b/xen/Kconfig
> index 34c318bfa2..4a3d988353 100644
> --- a/xen/Kconfig
> +++ b/xen/Kconfig
> @@ -34,8 +34,15 @@ config DEFCONFIG_LIST
>   option defconfig_list
>   default ARCH_DEFCONFIG
> 
> +config UNSUPPORTED
> + bool "Configure UNSUPPORTED features"
> + help
> +   This option allows certain unsupported Xen options to be changed,
> +   which includes non-security-supported, experimental, and tech
> +   preview features as defined by SUPPORT.md.
> +
> config EXPERT
> - bool "Configure standard Xen features (expert users)"
> + bool "Configure EXPERT features"
>   help
> This option allows certain base Xen options and settings
> to be disabled or tweaked. This is for specialized environments
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index c3eb13ea73..cca76040e5 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -32,7 +32,7 @@ menu "Architecture Features"
> source "arch/Kconfig"
> 
> config ACPI
> - bool "ACPI (Advanced Configuration and Power Interface) Support" if EXPERT
> + bool "ACPI (Advanced Configuration and Power Interface) Support (UNSUPPORTED)" if UNSUPPORTED
>   depends on ARM_64
>   ---help---
> 
> @@ -49,7 +49,7 @@ config GICV3
> If unsure, say Y
> 
> config HAS_ITS
> -bool "GICv3 ITS MSI controller support" if EXPERT
> +bool "GICv3 ITS MSI controller support (UNSUPPORTED)" if UNSUPPORTED
> depends on GICV3 && !NEW_VGIC
> 
> config HVM
> @@ -77,7 +77,7 @@ config SBSA_VUART_CONSOLE
> SBSA Generic UART implements a subset of ARM PL011 UART.
> 
> config ARM_SSBD
> - bool "Speculative Store Bypass Disable" if EXPERT
> + bool "Speculative Store Bypass Disable (UNSUPPORTED)" if UNSUPPORTED
>   depends on HAS_ALTERNATIVE
>   default y
>   help
> @@ -87,7 +87,7 @@ config ARM_SSBD
> If unsure, say Y.
> 
> config HARDEN_BRANCH_PREDICTOR
> - bool "Harden the branch predictor against aliasing attacks" if EXPERT
> + bool "Harden the branch predictor against aliasing attacks (UNSUPPORTED)" if UNSUPPORTED
>   default y
>   help
> Speculation attacks against some high-performance processors rely on
> @@ -104,7 +104,7 @@ config HARDEN_BRANCH_PREDICTOR
> If unsure, say Y.
> 
> config TEE
> - bool "Enable TEE mediators support" if EXPERT
> + bool "Enable TEE mediators support (UNSUPPORTED)" if UNSUPPORTED
>   default n
>   help
> This option enables generic TEE mediators support. It allows guests
> diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
> index 78f351f94b..302334d3e4 100644
> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig

Re: [PATCH 2/6] x86/mm: p2m_add_foreign() is HVM-only

2021-01-25 Thread Jan Beulich

On 23.01.2021 14:22, Julien Grall wrote:
> On 13/01/2021 15:06, Oleksandr wrote:
>> On 12.01.21 13:58, Jan Beulich wrote:
>>> On 11.01.2021 09:23, Oleksandr wrote:
>>>> On 11.01.21 09:41, Jan Beulich wrote:
>>>>> If you could also provide your exact .config, I could see whether I
>>>>> can repro here with some of the gcc5 versions I have laying around.
>>>> Please see attached
>>> Builds perfectly fine with 5.4.0 here.
>>
>> Thank you for testing.
>>
>>
>> I wonder whether I indeed missed something. I have switched to 5.4.0 
>> again (from 9.3.0) and rechecked, a build issue was still present.
>> I even downloaded 5.4.0 sources and built them to try to build Xen, and 
>> got the same effect.  What I noticed is that for non-debug builds the 
>> build issue wasn't present.
>> Then I decided to build today's staging 
>> (414be7b66349e7dca42bc1fd47c2b2f5b2d27432 xen/memory: Fix compat 
>> XENMEM_acquire_resource for size requests) instead of 9-day's old one when
>> I had initially reported about that build issue 
>> (7ba2ab495be54f608cb47440e1497b2795bd301a x86/p2m: Fix 
>> paging_gva_to_gfn() for nested virt). Today's staging builds perfectly 
>> fine with 5.4.0.
>> It seems that commit in the middle 
>> (994f6478a48a60e3b407c7defc2d36a80f880b04 xsm/dummy: harden against 
>> speculative abuse) indirectly fixes that weird build issue with 5.4.0...
> 
> The gitlab CI reported a similar issue today (see [1]) when building 
> with randconfig ([2]). This is happening on Debian sid with GCC 9.3.
> 
> Note that the default compiler on sid is GCC 10.2.1. So you will have to 
> install the package gcc-9 and then use CC=gcc-9 make <...>.
> 
> 
>  From a local repro, I get the following message:
> 
> ld: ld: prelink.o: in function `xenmem_add_to_physmap_batch':
> /root/xen/xen/common/memory.c:942: undefined reference to 
> `xenmem_add_to_physmap_one'
> /root/xen/xen/common/memory.c:942:(.text+0x22145): relocation truncated 
> to fit: R_X86_64_PLT32 against undefined symbol `xenmem_add_to_physmap_one'
> prelink-efi.o: in function `xenmem_add_to_physmap_batch':
> /root/xen/xen/common/memory.c:942: undefined reference to 
> `xenmem_add_to_physmap_one'
> make[2]: *** [Makefile:215: /root/xen/xen/xen.efi] Error 1
> make[2]: *** Waiting for unfinished jobs
> ld: /root/xen/xen/.xen-syms.0: hidden symbol `xenmem_add_to_physmap_one' 
> isn't defined
> ld: final link failed: bad value
> 
> 
> This points to the call in xenmem_add_to_physmap_batch(). I have played 
> a bit with the .config options. I was able to get it built as soon as I 
> disabled CONFIG_COVERAGE=y.
> 
> So maybe the optimizer is not clever enough on GCC 9 when building with 
> coverage enabled?

Possibly, albeit I can't repro this locally. Even with gcc9
the code gets collapsed sufficiently. I do notice though
that overall gcc10 does quite a bit better a job, so I could
see further factors potentially leading to what you did
observe, and then possibly independent of the specific gcc9
build in use.

Jan

Re: [PATCH v2 3/4] x86: Allow non-faulting accesses to non-emulated MSRs if policy permits this

2021-01-25 Thread Jan Beulich

On 22.01.2021 20:52, Boris Ostrovsky wrote:
> On 1/22/21 7:51 AM, Jan Beulich wrote:
>> On 20.01.2021 23:49, Boris Ostrovsky wrote:
>>> +
>>> +/*
>>> + * Accesses to unimplemented MSRs as part of emulation of instructions
>>> + * other than guest's RDMSR/WRMSR should never succeed.
>>> + */
>>> +if ( !is_guest_msr_access )
>>> +ignore_msrs = MSR_UNHANDLED_NEVER;
>>
>> Wouldn't you better "return true" here? Such accesses also
>> shouldn't be logged imo (albeit I agree that's a change from
>> current behavior).
> 
> 
> Yes, that's why I didn't return here. We will be here in !is_guest_msr_access 
> case most likely due to a bug in the emulator so I think we do want to see 
> the error logged.

Why "most likely"?

>>> +if ( unlikely(ignore_msrs != MSR_UNHANDLED_NEVER) )
>>> +*val = 0;
>>
>> I don't understand the conditional here, even more so with
>> the respective changelog entry. In any event you don't
>> want to clobber the value ahead of ...
>>
>>> +if ( likely(ignore_msrs != MSR_UNHANDLED_SILENT) )
>>> +{
>>> +if ( is_write )
>>> +gdprintk(XENLOG_WARNING, "WRMSR 0x%08x val 0x%016"PRIx64
>>> +" unimplemented\n", msr, *val);
>>
>> ... logging it.
> 
> 
> True. I dropped !is_write from v1 without considering this.
> 
> As far as the conditional --- dropping it too would be a behavior change. 

Albeit an intentional one then? Plus I think I have trouble
seeing what behavior it would be that would change.

>>> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
>>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
>>> @@ -850,4 +850,10 @@ static inline void x86_emul_reset_event(struct x86_emulate_ctxt *ctxt)
>>>  ctxt->event = (struct x86_event){};
>>>  }
>>>  
>>> +static inline bool x86_emul_guest_msr_access(struct x86_emulate_ctxt *ctxt)
>>
>> The parameter wants to be pointer-to-const. In addition I wonder
>> whether this wouldn't better be a sibling to
>> x86_insn_is_cr_access() (without a "state" parameter, which
>> would be unused and unavailable to the callers), which may end
>> up finding further uses down the road.
> 
> 
> "Sibling" in terms of name (yes, it would be) or something else?

Name and (possible) purpose - a validate hook could want to
make use of this, for example.

>>> +{
>>> +return ctxt->opcode == X86EMUL_OPC(0x0f, 0x32) ||  /* RDMSR */
>>> +   ctxt->opcode == X86EMUL_OPC(0x0f, 0x30);/* WRMSR */
>>> +}
>>
>> Personally I'd prefer if this was a single comparison:
>>
>> return (ctxt->opcode | 2) == X86EMUL_OPC(0x0f, 0x32);
>>
>> But maybe nowadays' compilers are capable of this
>> transformation?
> 
> Here is what I've got (not an inline but shouldn't make much difference I'd 
> think)
> 
> 82d040385960 : # your code
> 82d040385960:   8b 47 2c                mov    0x2c(%rdi),%eax
> 82d040385963:   83 e0 fd                and    $0xfffd,%eax
> 82d040385966:   3d 30 00 0f 00          cmp    $0xf0030,%eax
> 82d04038596b:   0f 94 c0                sete   %al
> 82d04038596e:   c3                      retq
> 
> 82d04038596f : # my code
> 82d04038596f:   8b 47 2c                mov    0x2c(%rdi),%eax
> 82d040385972:   83 c8 02                or     $0x2,%eax
> 82d040385975:   3d 32 00 0f 00          cmp    $0xf0032,%eax
> 82d04038597a:   0f 94 c0                sete   %al
> 82d04038597d:   c3                      retq
> 
> 
> So it's a wash in terms of generated code.

True, albeit I guess you got "your code" and "my code" the
wrong way round, as I don't expect the compiler to
translate | into "and".
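
To make the equivalence concrete, here is a small sketch comparing both forms. The constants are the immediates visible in the disassembly above; the macro and function names are illustrative only and not taken from the emulator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as they appear in the disassembly above; names are illustrative. */
#define OPC_WRMSR 0xf0030u   /* X86EMUL_OPC(0x0f, 0x30) */
#define OPC_RDMSR 0xf0032u   /* X86EMUL_OPC(0x0f, 0x32) */

/* Two explicit comparisons, as in the patch. */
static bool is_msr_access_two_cmp(uint32_t opcode)
{
    return opcode == OPC_RDMSR || opcode == OPC_WRMSR;
}

/* Single comparison: the two opcodes differ in bit 1 only. */
static bool is_msr_access_one_cmp(uint32_t opcode)
{
    return (opcode | 2) == OPC_RDMSR;
}

/*
 * Returns true if both forms agree for every opcode in [lo, hi].
 * hi must be below UINT32_MAX for the loop to terminate.
 */
static bool forms_agree(uint32_t lo, uint32_t hi)
{
    uint32_t op;

    for ( op = lo; op <= hi; op++ )
        if ( is_msr_access_two_cmp(op) != is_msr_access_one_cmp(op) )
            return false;

    return true;
}
```

The "or $0x2" listing corresponds to the single-comparison form, while the "and" listing is what the compiler made of the two-comparison form, clearing bit 1 instead of setting it.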

>> I notice you use this function only from PV priv-op emulation.
>> What about the call paths through hvmemul_{read,write}_msr()?
>> (It's also questionable whether the write paths need this -
>> the only MSR written outside of WRMSR emulation is
>> MSR_SHADOW_GS_BASE, which can't possibly reach the "unhandled"
>> logic anywhere. But maybe better to be future proof here in
>> case new MSR writes appear in the emulator, down the road.)
> 
> 
> Won't we end up in hvm_funcs.msr_write_intercept ops which do call it?

Of course we will - the boolean will very likely need
propagating (a possible alternative being a per-vCPU flag
indicating "in emulator").

Jan

Re: [PATCH] x86/pod: Do not fragment PoD memory allocations

2021-01-25 Thread Andrew Cooper

On 25/01/2021 09:56, Jan Beulich wrote:
> On 24.01.2021 05:47, Elliott Mitchell wrote:
>> Previously p2m_pod_set_cache_target() would fall back to allocating 4KB
>> pages if 2MB pages ran out.  This is counterproductive since it suggests
>> severe memory pressure and is likely a precursor to a memory exhaustion
>> panic.  As such don't try to fill requests for 2MB pages from 4KB pages
>> if 2MB pages run out.
> I disagree - there may be ample 4k pages available, yet no 2Mb
> ones at all. I only agree that this _may_ be counterproductive
> _if indeed_ the system is short on memory.

Further to this, PoD is very frequently used in combination with
ballooning operations, in which case there are (or can be made to be)
plenty of 4k pages, even without a single 2M range in sight.

~Andrew

Re: New Defects reported by Coverity Scan for XenProject

2021-01-25 Thread Jan Beulich

On 24.01.2021 11:35, scan-ad...@coverity.com wrote:
> *** CID 1472394:  Concurrent data access violations  (MISSING_LOCK)
> /xen/drivers/passthrough/x86/hvm.c: 1054 in pci_clean_dpci_irq()
> 1048     list_for_each_entry_safe ( digl, tmp, &pirq_dpci->digl_list, list )
> 1049     {
> 1050         list_del(&digl->list);
> 1051         xfree(digl);
> 1052     }
> 1053     /* Note the pirq is now unbound. */
>>>>     CID 1472394:  Concurrent data access violations  (MISSING_LOCK)
>>>>     Accessing "pirq_dpci->flags" without holding lock "domain.event_lock". Elsewhere, "hvm_pirq_dpci.flags" is accessed with "domain.event_lock" held 10 out of 11 times.
> 1054     pirq_dpci->flags = 0;
> 1055 
> 1056     return pt_pirq_softirq_active(pirq_dpci) ? -ERESTART : 0;
> 1057 }

The only (indirect) caller of this function is ...

> 1059 int arch_pci_clean_pirqs(struct domain *d)

... this one, which very clearly acquires the lock in question.
Does anyone have any idea what misleads Coverity here in its
conclusion, and hence possibly what may silence this?

Jan

Re: [PATCH] x86/pod: Do not fragment PoD memory allocations

2021-01-25 Thread Jan Beulich

On 24.01.2021 05:47, Elliott Mitchell wrote:
> Previously p2m_pod_set_cache_target() would fall back to allocating 4KB
> pages if 2MB pages ran out.  This is counterproductive since it suggests
> severe memory pressure and is likely a precursor to a memory exhaustion
> panic.  As such don't try to fill requests for 2MB pages from 4KB pages
> if 2MB pages run out.

I disagree - there may be ample 4k pages available, yet no 2Mb
ones at all. I only agree that this _may_ be counterproductive
_if indeed_ the system is short on memory.
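
The case being argued about can be shown with a toy model of the
fallback. Everything in the sketch below is hypothetical (toy_heap,
toy_alloc() and alloc_with_fallback() are illustrative names, not Xen's
allocator interface); it only mirrors the order-9 to order-0 retry,
assuming a 2MB superpage consists of 512 4KB pages.

```c
#include <assert.h>
#include <stdbool.h>

#define ORDER_4K 0u
#define ORDER_2M 9u   /* 2^9 * 4KB = 2MB */

/* Toy heap: counts of free chunks per size; not Xen's allocator. */
struct toy_heap {
    unsigned long free_2m;   /* free contiguous 2MB ranges */
    unsigned long free_4k;   /* free 4KB pages outside those ranges */
};

/* Take one chunk of the given order; false if none is left. */
static bool toy_alloc(struct toy_heap *h, unsigned int order)
{
    unsigned long *pool = (order == ORDER_2M) ? &h->free_2m : &h->free_4k;

    if ( *pool == 0 )
        return false;
    --*pool;

    return true;
}

/*
 * Try a 2MB allocation and, if allowed, retry with a single 4KB page.
 * Returns the order actually obtained, or -1 on failure.
 */
static int alloc_with_fallback(struct toy_heap *h, bool allow_fallback)
{
    if ( toy_alloc(h, ORDER_2M) )
        return ORDER_2M;
    if ( allow_fallback && toy_alloc(h, ORDER_4K) )
        return ORDER_4K;

    return -1;
}
```

With no 2MB range free but plenty of 4KB pages (fragmented rather than
exhausted memory), the variant without fallback fails outright while the
variant with fallback still makes progress.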

> Signed-off-by: Elliott Mitchell 
> 
> ---
> Changes in v2:
> - Include the obvious removal of the goto target.  Always realize you're
>   at the wrong place when you press "send".

Please could you also label the submission then accordingly? I
got puzzled by two identically titled messages side by side,
until I noticed the difference.

> I'm not including a separate cover message since this is a single hunk.
> This really needs some checking in `xl`.  If one has a domain which
> sometimes gets started on different hosts and is sometimes modified with
> slightly differing settings, one can run into trouble.
> 
> In this case most of the time the particular domain is most often used
> PV/PVH, but every so often is used as a template for HVM.  Starting it
> HVM will trigger PoD mode.  If it is started on a machine with less
> memory than others, PoD may well exhaust all memory and then trigger a
> panic.
> 
> `xl` should likely fail HVM domain creation when the maximum memory
> exceeds available memory (never mind total memory).

I don't think so, no - it's the purpose of PoD to allow starting
a guest despite there not being enough memory available to
satisfy its "max", as such guests are expected to balloon down
immediately, rather than triggering an oom condition.

> For example try a domain with the following settings:
> 
> memory = 8192
> maxmem = 2147483648
> 
> If type is PV or PVH, it will likely boot successfully.  Change type to
> HVM and unless your hardware budget is impressive, Xen will soon panic.

Xen will panic? That would need fixing if so. Also I'd consider
an excessively high maxmem (compared to memory) a configuration
error. According to my experiments long, long ago I seem to
recall that a factor beyond 32 is almost never going to lead to
anything good, irrespective of guest type. (But as said, badness
here should be restricted to the guest; Xen itself should limp
on fine.)
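
To put numbers on the quoted configuration, here is a rough sketch of
the ratio check hinted at above. The factor of 32 is only the rule of
thumb recalled in this mail, not a limit enforced by Xen or xl, and
maxmem_ratio(), ratio_looks_sane() and balloon_out_mib() are
hypothetical helpers. Both settings are assumed to be in MiB, as in an
xl domain configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RULE_OF_THUMB_FACTOR 32u  /* from the mail above, not from Xen */

/* maxmem / memory, rounded down; 0 if memory is 0. */
static uint64_t maxmem_ratio(uint64_t memory_mib, uint64_t maxmem_mib)
{
    return memory_mib ? maxmem_mib / memory_mib : 0;
}

/* True if maxmem is at least memory and within the rule-of-thumb factor. */
static bool ratio_looks_sane(uint64_t memory_mib, uint64_t maxmem_mib)
{
    return memory_mib != 0 &&
           maxmem_mib >= memory_mib &&
           maxmem_ratio(memory_mib, maxmem_mib) <= RULE_OF_THUMB_FACTOR;
}

/* Amount a PoD guest would have to balloon out to fit into "memory". */
static uint64_t balloon_out_mib(uint64_t memory_mib, uint64_t maxmem_mib)
{
    return maxmem_mib > memory_mib ? maxmem_mib - memory_mib : 0;
}
```

For the quoted settings the ratio is 2147483648 / 8192 = 262144, far
beyond 32.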

> --- a/xen/arch/x86/mm/p2m-pod.c
> +++ b/xen/arch/x86/mm/p2m-pod.c
> @@ -212,16 +212,13 @@ p2m_pod_set_cache_target(struct p2m_domain *p2m, unsigned long pod_target, int p
>  order = PAGE_ORDER_2M;
>  else
>  order = PAGE_ORDER_4K;
> -retry:
>  page = alloc_domheap_pages(d, order, 0);
>  if ( unlikely(page == NULL) )
>  {
> -if ( order == PAGE_ORDER_2M )
> -{
> -/* If we can't allocate a superpage, try singleton pages */
> -order = PAGE_ORDER_4K;
> -goto retry;
> -}
> +/* Superpages allocation failures likely indicate severe memory
> +** pressure.  Continuing to try to fulfill attempts using 4KB pages
> +** is likely to exhaust memory and trigger a panic.  As such it is
> +** NOT worth trying to use 4KB pages to fulfill 2MB page requests.*/

Just in case my arguments against this change get overridden:
This comment is malformed - please see ./CODING_STYLE.

Jan

Re: [PATCH v3] xen: EXPERT clean-up and introduce UNSUPPORTED

2021-01-25 Thread Jan Beulich

On 23.01.2021 03:19, Stefano Stabellini wrote:
> --- a/xen/Kconfig
> +++ b/xen/Kconfig
> @@ -34,8 +34,15 @@ config DEFCONFIG_LIST
>   option defconfig_list
>   default ARCH_DEFCONFIG
>  
> +config UNSUPPORTED
> + bool "Configure UNSUPPORTED features"
> + help
> +   This option allows certain unsupported Xen options to be changed,
> +   which includes non-security-supported, experimental, and tech
> +   preview features as defined by SUPPORT.md.

And by implication anything not depending on UNSUPPORTED is
supported? I didn't think this was the case (some unsupported
code can't even be turned off via Kconfig), so I think this
needs clarifying here, so we won't end up with people
considering some feature supported which really isn't. That's
irrespective of the reference to SUPPORT.md.

>  config EXPERT
> - bool "Configure standard Xen features (expert users)"
> + bool "Configure EXPERT features"
>   help
> This option allows certain base Xen options and settings
> to be disabled or tweaked. This is for specialized environments

I'd like to suggest moving UNSUPPORTED past this one, so that
it can then have "default EXPERT".

Jan
