Re: [Xen-devel] [PATCH v3] xen: make sure stop_machine_run() is always called in a tasklet

2020-02-28 Thread Jürgen Groß

On 28.02.20 20:06, Andrew Cooper wrote:

On 28/02/2020 17:13, Juergen Gross wrote:

@@ -700,6 +688,32 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
  return ret;
  }
  
+int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
+{
+int ret;
+struct ucode_buf *buffer;
+
+if ( len != (uint32_t)len )
+return -E2BIG;
+
+if ( microcode_ops == NULL )
+return -EINVAL;
+
+buffer = xmalloc_flex_struct(struct ucode_buf, buffer, len);
+if ( !buffer )
+return -ENOMEM;
+
+ret = copy_from_guest(buffer->buffer, buf, len);
+if ( ret )
+{
+xfree(buffer);
+return -EFAULT;
+}
+buffer->len = len;
+
+return continue_hypercall_on_cpu(0, microcode_update_helper, buffer);


Any reason why cpu 0 here?  There is no restriction at the moment, and
running the tasklet on the current CPU is surely better than poking
CPU0's tasklet queue remotely, then interrupting it.


As stop_machine_run() is scheduling a tasklet on all other cpus it
doesn't really matter. In the end I don't really mind either way.



Everything else looks ok.  These adjustments could be done on commit to
save a v4.

~Andrew

P.S. Might it be sensible to have a continue_hypercall_in_tasklet()
wrapper which passes smp_processor_id() into continue_hypercall_on_cpu()?


When a second user is coming up, maybe.

The other would be continue_hypercall_on_bootcpu().
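
For reference, such a wrapper would be a near one-liner (a sketch only;
the name and placement are hypothetical, nothing here is part of the
patch):

    static inline int continue_hypercall_in_tasklet(
        long (*func)(void *data), void *data)
    {
        /* Run the continuation on the CPU we are already executing on,
         * avoiding remote pokes of another CPU's tasklet queue. */
        return continue_hypercall_on_cpu(smp_processor_id(), func, data);
    }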


Juergen


[Xen-devel] [qemu-mainline test] 147710: regressions - FAIL

2020-02-28 Thread osstest service owner
flight 147710 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/147710/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-freebsd10-i386 11 guest-startfail REGR. vs. 144861
 test-amd64-i386-freebsd10-amd64 11 guest-start   fail REGR. vs. 144861
 test-amd64-i386-qemuu-rhel6hvm-amd 10 redhat-install fail REGR. vs. 144861
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-qemuu-nested-amd 10 debian-hvm-install  fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-qemuu-rhel6hvm-intel 10 redhat-install   fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-win7-amd64 10 windows-install  fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-qemuu-nested-intel 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-win7-amd64 10 windows-install   fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install  fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ws16-amd64 10 windows-install   fail REGR. vs. 144861

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10   fail  like 144861
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 144861
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 144861
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore

[Xen-devel] [linux-4.14 test] 147718: regressions - FAIL

2020-02-28 Thread osstest service owner
flight 147718 linux-4.14 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/147718/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-examine  8 reboot   fail REGR. vs. 142849
 test-armhf-armhf-xl-credit2   7 xen-boot fail REGR. vs. 142849
 test-armhf-armhf-xl   7 xen-boot fail REGR. vs. 142849
 test-armhf-armhf-xl-vhd   7 xen-boot fail REGR. vs. 142849
 test-armhf-armhf-xl-credit1   7 xen-boot fail REGR. vs. 142849
 test-armhf-armhf-libvirt-raw  7 xen-boot fail REGR. vs. 142849
 test-armhf-armhf-xl-multivcpu  7 xen-bootfail REGR. vs. 142849
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 142849
 test-armhf-armhf-xl-arndale   7 xen-boot fail REGR. vs. 142849
 test-armhf-armhf-libvirt  7 xen-boot fail REGR. vs. 142849

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-amd64 15 guest-saverestore.2 fail in 147334 pass in 147718
 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install  fail pass in 147334
 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail pass in 147487

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10   fail  like 142849
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeatfail  like 142849
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass

version targeted for testing:
 linux 98db2bf27b9ed2d5ed0b6c9c8a4bfcb127a19796
baseline version:
 linux b98aebd298246df37b472c52a2ee1023256d02e3

Last test of basis   142849  2019-10-17 21:11:16 Z  134 days
Failing since143327  2019-10-29 08:49:30 Z  122 days   19 attempts
Testing same since   147094  2020-02-15 17:36:50 Z   13 days9 attempts


1438 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm   

[Xen-devel] CfP for RT-Cloud (Academic) Research Conference

2020-02-28 Thread Dario Faggioli
Hello everyone,

Please, find here The CfP for the first edition of the RT-Cloud
workshop:

  https://www.ecrts.org/rt-cloud-2020/

It's an academic research event, affiliated with one of the most
important academic conferences on real-time systems and real-time
scheduling (ECRTS, https://www.ecrts.org/).

Here are the first paragraphs of the CfP:

"While most of the current cloud services operate on a best effort
basis and provide no timing guarantees, there is a growing interest in
cloud applications for industrial and critical applications. However,
for cloud technologies to be fully embraced by industry, a higher
degree of determinism is desired. This calls for new techniques and
methodologies to design predictable and reliable cloud applications."

The deadline for writing and submitting a 4-to-6-page academic research
style paper is April 16, 2020. This being a workshop, it would be
generally ok for the paper to be about ideas, proposals and/or
unfinished work.

If you are interested but need more information, don't hesitate to ask
me.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)




[Xen-devel] [linux-linus test] 147706: regressions - FAIL

2020-02-28 Thread osstest service owner
flight 147706 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/147706/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-shadow12 guest-start  fail REGR. vs. 133580
 test-amd64-amd64-xl-shadow   12 guest-start  fail REGR. vs. 133580
 test-amd64-amd64-xl-credit2  12 guest-start  fail REGR. vs. 133580
 test-amd64-amd64-xl-multivcpu 12 guest-start fail REGR. vs. 133580
 test-amd64-amd64-xl-credit1  12 guest-start  fail REGR. vs. 133580
 test-arm64-arm64-xl-credit1  12 guest-start  fail REGR. vs. 133580
 test-arm64-arm64-xl-credit2  12 guest-start  fail REGR. vs. 133580
 test-armhf-armhf-xl-multivcpu 12 guest-start fail REGR. vs. 133580
 test-amd64-i386-qemut-rhel6hvm-amd 12 guest-start/redhat.repeat fail REGR. vs. 133580
 test-armhf-armhf-xl-credit2  12 guest-start  fail REGR. vs. 133580
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 133580
 test-armhf-armhf-xl-credit1  12 guest-start  fail REGR. vs. 133580
 test-arm64-arm64-xl-xsm 16 guest-start/debian.repeat fail REGR. vs. 133580
 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 133580
 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 133580

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 12 guest-start  fail REGR. vs. 133580
 test-armhf-armhf-xl-rtds 12 guest-start  fail REGR. vs. 133580

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-seattle 16 guest-start/debian.repeat fail baseline untested
 test-arm64-arm64-xl-thunderx 16 guest-start/debian.repeat fail baseline untested
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 133580
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 133580
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 133580
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 133580
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 133580
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 133580
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 133580
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 133580
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never p

[Xen-devel] PVH dom0 construction timeout

2020-02-28 Thread Andrew Cooper
It turns out that PVH dom0 construction doesn't work so well on a
2-socket Rome system...

(XEN) NX (Execute Disable) protection active

(XEN) *** Building a PVH Dom0 ***

(XEN) Watchdog timer detects that CPU0 is stuck!

(XEN) [ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]

(XEN) CPU:    0

(XEN) RIP:    e008:[] page_get_ram_type+0x58/0xb6

(XEN) RFLAGS: 0206   CONTEXT: hypervisor

(XEN) rax: 82d080948fe0   rbx: 02b73db9   rcx: 

(XEN) rdx: 0400   rsi: 0400   rdi: 002b73db9000

(XEN) rbp: 82d080827be0   rsp: 82d080827ba0   r8:  82d080948fcc

(XEN) r9:  002b73dba000   r10: 82d0809491fc   r11: 8000

(XEN) r12: 02b73db9   r13: 8320341bc000   r14: 0404fc00

(XEN) r15: 82d08046f209   cr0: 8005003b   cr4: 001506e0

(XEN) cr3: a0414000   cr2: 

(XEN) fsb:    gsb:    gss: 

(XEN) ds:    es:    fs:    gs:    ss:    cs: e008

(XEN) Xen code around  (page_get_ram_type+0x58/0xb6):

(XEN)  4c 39 d0 74 4d 49 39 d1 <76> 0b 89 ca 83 ca 10 48 39 38 0f 47 ca 49 89 c0

(XEN) Xen stack trace from rsp=82d080827ba0:

(XEN)    82d08061ee91 82d080827bb4 000b2403 82d080804340

(XEN)    8320341bc000 82d080804340 8303df90 8320341bc000

(XEN)    82d080827c08 82d08061c38c 8320341bc000 82d080827ca8

(XEN)    82d080648750 82d080827c20 82d08061852c 0020

(XEN)    82d080827d60 82d080638abe 82d080232854 82d080930c60

(XEN)    82d080930280 82d080674800 8303df90 01a4

(XEN)    8303df80 82d080827c80 0206 8320341bc000

(XEN)    82d080827cb8 82d080827ca8 82d080232854 82d080961780

(XEN)    82d080930280 82d080827c00 0002 82d08022f9a0

(XEN)    010a4bb0 82d080827ce0 0206 0381b66d

(XEN)    82d080827d00 82d0802b1e87 82d080936900 82d080936900

(XEN)    82d080827d18 82d0802b30d0 82d080936900 82d080827d50

(XEN)    82d08022ef5e 8320341bc000 8303df80 8320341bc000

(XEN)    8303df80 01a4 8303df90 82d080674800

(XEN)    82d080827d98 82d08063cd06 0001 82d080674800

(XEN)    82d080931050 0100 82d080950c80 82d080827ee8

(XEN)    82d08062eae7 01a40fff  00082d080e00

(XEN)     0005 0004 0004

(XEN)    0003 0003 0002 0002

(XEN)    0205  82d080674c20 82d080674ea0

(XEN) Xen call trace:

(XEN)    [] R page_get_ram_type+0x58/0xb6

(XEN)    [] S arch_iommu_hwdom_init+0x239/0x2b7

(XEN)    [] F 
drivers/passthrough/amd/pci_amd_iommu.c#amd_iommu_hwdom_init+0x85/0x9f

(XEN)    [] F iommu_hwdom_init+0x44/0x4b

(XEN)    [] F dom0_construct_pvh+0x160/0x1233

(XEN)    [] F construct_dom0+0x5c/0x280e

(XEN)    [] F __start_xen+0x25db/0x2860

(XEN)    [] F __high_start+0x4c/0x4e

(XEN)

(XEN) CPU1 @ e008:82d0802f203f 
(arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)

(XEN) CPU31 @ e008:82d0802f203f 
(arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)

(XEN) CPU30 @ e008:82d0802f203f 
(arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)

(XEN) CPU27 @ e008:82d08022ad5a (scrub_one_page+0x6d/0x7b)

(XEN) CPU26 @ e008:82d0802f203f 
(arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)

(XEN) CPU244 @ e008:82d0802f203f 
(arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)

(XEN) CPU245 @ e008:82d08022ad5a (scrub_one_page+0x6d/0x7b)

(XEN) CPU247 @ e008:82d080256e3f 
(drivers/char/ns16550.c#ns_read_reg+0x2d/0x35)

(XEN) CPU246 @ e008:82d0802f203f 
(arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)




This stack trace is the same across several boots, and in particular
page_get_ram_type() is the %rip at which the timeout fired.  An
equivalent PV dom0 build takes perceptibly zero time, judging by how
quickly the next line is printed.

I haven't diagnosed the exact issue, but some observations:

The arch_iommu_hwdom_init() loop's positioning of
process_pending_softirqs() looks problematic, because it is
conditionally short-circuited by hwdom_iommu_map().

page_get_ram_type() is definitely suboptimal here.  We have a linear
search over a (large-ish) sorted list, and a caller which passes every
MFN in the system into it, which makes the total runtime of
arch_iommu_hwdom_init() quadratic in the size of the system.
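
Since the list is sorted, the per-MFN lookup could be made logarithmic
with a binary search.  A minimal sketch, assuming a sorted array of
non-overlapping { start, end, type } ranges (hypothetical names, not
the actual memory-map structures):

    struct ram_range { unsigned long start, end; unsigned int type; };

    /* O(log n) per MFN instead of O(n), which would turn the hwdom
     * init loop from quadratic into n log n overall. */
    static unsigned int lookup_ram_type(const struct ram_range *r,
                                        unsigned int nr, unsigned long mfn)
    {
        unsigned int lo = 0, hi = nr;

        while ( lo < hi )
        {
            unsigned int mid = lo + (hi - lo) / 2;

            if ( mfn < r[mid].start )
                hi = mid;
            else if ( mfn >= r[mid].end )
                lo = mid + 1;
            else
                return r[mid].type;
        }

        return 0; /* no matching range */
    }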

~Andrew

Re: [Xen-devel] [PATCH 5/5] IOMMU: iommu_snoop is x86/HVM-only

2020-02-28 Thread Andrew Cooper
On 28/02/2020 12:27, Jan Beulich wrote:
> In fact it's VT-d specific, but we don't have a way yet to build code
> for just one vendor. Provide a #define for all other cases.
>
> Signed-off-by: Jan Beulich 

iommu_snoop has no specific interaction with HVM.

It is for any cacheability games the hypervisor may play on a VM, and
that in principle includes PV guests as well.

~Andrew


Re: [Xen-devel] [PATCH 1/5] IOMMU: iommu_intremap is x86-only

2020-02-28 Thread Andrew Cooper
On 28/02/2020 12:26, Jan Beulich wrote:
> Provide a #define for other cases; it didn't seem worthwhile to me to
> introduce an IOMMU_INTREMAP Kconfig option at this point.
>
> Signed-off-by: Jan Beulich 
>
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1299,6 +1299,8 @@ boolean (e.g. `iommu=no`) can override t
>  generation of IOMMUs only supported DMA remapping, and Interrupt 
> Remapping
>  appeared in the second generation.
>  
> +This option is not valid on Arm.

The longevity of this comment would be greater if it were phrased as "is
only valid on x86", especially given the RFC RISCV series on list.

> +
>  *   The `intpost` boolean controls the Posted Interrupt sub-feature.  In
>  combination with APIC acceleration (VT-x APICV, SVM AVIC), the IOMMU can
>  be configured to deliver interrupts from assigned PCI devices directly
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -35,7 +35,6 @@ bool __read_mostly iommu_quarantine = tr
>  bool_t __read_mostly iommu_igfx = 1;
>  bool_t __read_mostly iommu_snoop = 1;
>  bool_t __read_mostly iommu_qinval = 1;
> -enum iommu_intremap __read_mostly iommu_intremap = iommu_intremap_full;
>  bool_t __read_mostly iommu_crash_disable;
>  
>  static bool __hwdom_initdata iommu_hwdom_none;
> @@ -90,8 +89,10 @@ static int __init parse_iommu_param(cons
>  iommu_snoop = val;
>  else if ( (val = parse_boolean("qinval", s, ss)) >= 0 )
>  iommu_qinval = val;
> +#ifndef iommu_intremap
>  else if ( (val = parse_boolean("intremap", s, ss)) >= 0 )
>  iommu_intremap = val ? iommu_intremap_full : iommu_intremap_off;
> +#endif

The use of ifndef in particular makes the result very weird to read. 
There appear to be no uses of iommu_intremap outside of x86 code, other
than in this setup, so having it false in the !CONFIG_X86 case isn't
helpful.

How about just guarding uses of the variable with IS_ENABLED(CONFIG_X86)
and a common extern?  We use this DCE trick already to reduce the
ifdefery in the code.

The result would certainly be easier to follow.
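
To illustrate the pattern applied to the quoted hunk (a sketch only,
not code from the tree):

    /* Common header: a single unconditional declaration, no #ifdef. */
    extern enum iommu_intremap iommu_intremap;

    /* Use site: IS_ENABLED(CONFIG_X86) folds to a compile-time
     * constant, so on !x86 builds the compiler discards the whole arm,
     * and with it the reference to iommu_intremap, meaning the variable
     * only needs a definition where the code is live. */
    else if ( IS_ENABLED(CONFIG_X86) &&
              (val = parse_boolean("intremap", s, ss)) >= 0 )
        iommu_intremap = val ? iommu_intremap_full : iommu_intremap_off;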

~Andrew


[Xen-devel] [linux-4.19 test] 147688: regressions - FAIL

2020-02-28 Thread osstest service owner
flight 147688 linux-4.19 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/147688/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 142932
 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 142932
 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 142932
 build-i386-pvops  6 kernel-build   fail in 147609 REGR. vs. 142932

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-multivcpu  7 xen-boot  fail pass in 147609

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-pair  1 build-check(1)   blocked in 147609 n/a
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)blocked in 147609 n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)blocked in 147609 n/a
 test-amd64-i386-xl1 build-check(1)   blocked in 147609 n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)blocked in 147609 n/a
 test-amd64-i386-xl-qemut-win7-amd64  1 build-check(1)blocked in 147609 n/a
 test-amd64-i386-xl-raw1 build-check(1)   blocked in 147609 n/a
 test-amd64-i386-xl-qemut-ws16-amd64  1 build-check(1)blocked in 147609 n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-qemut-rhel6hvm-intel  1 build-check(1)   blocked in 147609 n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-qemuu-rhel6hvm-intel  1 build-check(1)   blocked in 147609 n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked in 147609 n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked in 147609 n/a
 test-amd64-i386-xl-shadow 1 build-check(1)   blocked in 147609 n/a
 test-amd64-i386-qemut-rhel6hvm-amd  1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-pvshim 1 build-check(1)   blocked in 147609 n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked in 147609 n/a
 test-amd64-i386-examine   1 build-check(1)   blocked in 147609 n/a
 test-amd64-i386-xl-qemut-debianhvm-amd64 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-qemuu-ws16-amd64  1 build-check(1)blocked in 147609 n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked in 147609 n/a
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 1 build-check(1) blocked in 147609 n/a
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail in 147609 never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail in 147609 never pass
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeatfail  like 142880
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail like 142932
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-cre

Re: [Xen-devel] [PATCH 5/6] mm: add 'is_special_page' macro...

2020-02-28 Thread Tamas K Lengyel
> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index 3835bc928f..c14a724c6d 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -842,7 +842,7 @@ static int nominate_page(struct domain *d, gfn_t gfn,
>
>  /* Skip xen heap pages */

Perhaps adjust (or remove) the comment?

>  page = mfn_to_page(mfn);
> -if ( !page || is_xen_heap_page(page) )
> +if ( !page || is_special_page(page) )
>  goto out;
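
For instance, the comment could become something like this (a
hypothetical wording, not part of the posted patch):

    /* Skip "special" pages, e.g. Xen heap pages */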

Thanks,
Tamas


Re: [Xen-devel] [PATCH] xen/grant-table: Remove 'led' variable in map_grant_ref

2020-02-28 Thread Andrew Cooper
On 28/02/2020 18:57, Julien Grall wrote:
> From: Julien Grall 
>
> The name of the variable 'led' is confusing, and it is only used in one
> place a line later.  So remove it.
>
> Signed-off-by: Julien Grall 

I agree.  Acked-by: Andrew Cooper 


Re: [Xen-devel] [PATCH v3] xen: make sure stop_machine_run() is always called in a tasklet

2020-02-28 Thread Andrew Cooper
On 28/02/2020 17:13, Juergen Gross wrote:
> @@ -700,6 +688,32 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>  return ret;
>  }
>  
> +int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
> +{
> +int ret;
> +struct ucode_buf *buffer;
> +
> +if ( len != (uint32_t)len )
> +return -E2BIG;
> +
> +if ( microcode_ops == NULL )
> +return -EINVAL;
> +
> +buffer = xmalloc_flex_struct(struct ucode_buf, buffer, len);
> +if ( !buffer )
> +return -ENOMEM;
> +
> +ret = copy_from_guest(buffer->buffer, buf, len);
> +if ( ret )
> +{
> +xfree(buffer);
> +return -EFAULT;
> +}
> +buffer->len = len;
> +
> +return continue_hypercall_on_cpu(0, microcode_update_helper, buffer);

Any reason why cpu 0 here?  There is no restriction at the moment, and
running the tasklet on the current CPU is surely better than poking
CPU0's tasklet queue remotely, then interrupting it.

Everything else looks ok.  These adjustments could be done on commit to
save a v4.

~Andrew

P.S. Might it be sensible to have a continue_hypercall_in_tasklet()
wrapper which passes smp_processor_id() into continue_hypercall_on_cpu()?


Re: [Xen-devel] [PATCH] xen/grant-table: Remove 'led' variable in map_grant_ref

2020-02-28 Thread Julien Grall

Please ignore this version as I forgot to CC the maintainers on it.

Cheers,

On 28/02/2020 18:57, Julien Grall wrote:

From: Julien Grall 

The name of the variable 'led' is confusing, and it is only used in one
place a line later.  So remove it.

Signed-off-by: Julien Grall 
---
  xen/common/grant_table.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 057c78f620..9fd6e60416 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -944,7 +944,6 @@ map_grant_ref(
  struct domain *ld, *rd, *owner = NULL;
  struct grant_table *lgt, *rgt;
  grant_ref_t ref;
-struct vcpu   *led;
  grant_handle_t handle;
  mfn_t mfn;
  struct page_info *pg = NULL;
@@ -957,8 +956,7 @@ map_grant_ref(
  uint16_t *status;
  bool_t need_iommu;
  
-led = current;
-ld = led->domain;
+ld = current->domain;
  
  if ( unlikely((op->flags & (GNTMAP_device_map|GNTMAP_host_map)) == 0) )
  {



--
Julien Grall


[Xen-devel] [PATCH] xen/grant-table: Remove 'led' variable in map_grant_ref

2020-02-28 Thread Julien Grall
From: Julien Grall 

The name of the variable 'led' is confusing, and it is only used in one
place a line later.  So remove it.

Signed-off-by: Julien Grall 
---
 xen/common/grant_table.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 057c78f620..9fd6e60416 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -944,7 +944,6 @@ map_grant_ref(
 struct domain *ld, *rd, *owner = NULL;
 struct grant_table *lgt, *rgt;
 grant_ref_t ref;
-struct vcpu   *led;
 grant_handle_t handle;
 mfn_t mfn;
 struct page_info *pg = NULL;
@@ -957,8 +956,7 @@ map_grant_ref(
 uint16_t *status;
 bool_t need_iommu;
 
-led = current;
-ld = led->domain;
+ld = current->domain;
 
 if ( unlikely((op->flags & (GNTMAP_device_map|GNTMAP_host_map)) == 0) )
 {
-- 
2.17.1




Re: [Xen-devel] [PATCH v2 10/17] tools/libxl: Plumb a restore boolean down into libxl__build_pre()

2020-02-28 Thread Andrew Cooper
On 24/02/2020 17:39, Ian Jackson wrote:
> Andrew Cooper writes ("[PATCH v2 10/17] tools/libxl: Plumb a restore boolean 
> down into libxl__build_pre()"):
>> To fix CPUID handling, libxl__build_pre() is going to have to distinguish
>> between a brand new VM vs one which is being migrated-in/resumed.
> The distinction between libxl__build_pre and these other functions is
> not particularly principled.  It is a legacy from an old API (prior to
> the existince of *create) where libxl callers were expected to "build"
> a domain first and then do other things to it.
>
> Maybe it would be better to pass this via libxl__domain_build_state
> rather than as an additional parameter ?

Well - I tried a similar approach the first time around, and it broke
stubdoms so badly it needed reverting.

(Untrim the commit details)

> v2:
>  * New.  This is c/s aacc1430064 "tools/libxl: Plumb domain_create_state down
>into libxl__build_pre()" take-2, without any collateral damage to stubdoms.

The actual information we want is in libxl__domain_create_state
(specifically, restore_fd >= -1).  I first tried plumbing dcs down, to
avoid stashing the same information in two different structures at
different times.

Sadly, plumbing dcs didn't work because it is common between the real
domain and the stubdom (and this led to the stubdom getting no settings
at all).  What we want to do is only influence the CPUID construction of
the main domain (which may be migrating in), whereas the stubdom always
wants fresh settings.

I could duplicate it into dbs, and at a guess that would probably work,
but isn't it taking a bad problem and making it worse?

~Andrew


[Xen-devel] [PATCH v11 0/3] VM forking

2020-02-28 Thread Tamas K Lengyel
The following series implements VM forking for Intel HVM guests to allow for
the fast creation of identical VMs without the associated high startup costs
of booting or restoring the VM from a savefile.

JIRA issue: https://xenproject.atlassian.net/browse/XEN-89

The fork operation is implemented as part of the "xl fork-vm" command:
xl fork-vm -C  -Q  -m  

By default a fully functional fork is created. However, the user is in charge
of creating the appropriate config file for the fork and of generating the QEMU
save file before the fork-vm call is made. The config file needs to give the
fork a new name at minimum, but other settings may also require changes. Certain
settings in the config files of both the parent and the fork have to be left at
their defaults. Details are documented.

The interface also allows to split the forking into two steps:
xl fork-vm --launch-dm no \
   -m  \
   -p 
xl fork-vm --launch-dm late \
   -C  \
   -Q  \
   

The split creation model is useful when the VM needs to be created as fast as
possible. The forked VM can be unpaused without the device model being
launched, and can then be monitored and accessed via VMI. Note however that
without its device model running it is bound to misbehave or even crash
(depending on what is executing in the VM) when it tries to access devices that
would be emulated by QEMU. We anticipate that for certain use-cases this would
be an acceptable situation, for example when fuzzing code segments that don't
access such devices.

Launching the device model requires the QEMU Xen savefile to be generated
manually from the parent VM. This can be accomplished simply by connecting to
its QMP socket and issuing the "xen-save-devices-state" command. For example
using the standard tool socat these commands can be used to generate the file:
socat - UNIX-CONNECT:/var/run/xen/qmp-libxl-
{ "execute": "qmp_capabilities" }
{ "execute": "xen-save-devices-state", \
"arguments": { "filename": "/path/to/save/qemu_state", \
"live": false} }

At runtime the forked VM starts running with an empty p2m which gets lazily
populated when the VM generates EPT faults, similar to how altp2m views are
populated. If the memory access is a read-only access, the p2m entry is
populated with a memory shared entry with its parent. For write memory accesses
or in case memory sharing wasn't possible (for example in case a reference is
held by a third party), a new page is allocated and the page contents are
copied over from the parent VM. Forks can be further forked if needed, thus
allowing for further memory savings.
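
As an illustration, the fault-handling decision described above boils
down to the following flow (a simplified sketch with hypothetical helper
names; the real logic lives in mem_sharing_fork_page() in patch 1):

    /* Fork p2m fault: populate the empty p2m lazily. */
    static int handle_fork_fault(struct domain *fork, gfn_t gfn, bool write)
    {
        if ( !write && share_with_parent(fork, gfn) == 0 )
            /* Read access: back the gfn with an entry shared with the
             * parent, no copy needed. */
            return 0;

        /* Write access, or sharing failed (e.g. a third party holds a
         * reference): allocate a fresh page and copy the contents over
         * from the parent. */
        return copy_page_from_parent(fork, gfn);
    }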

A VM fork reset hypercall is also added that allows the fork to be reset to
the state it was in just after forking, also accessible via xl:
xl fork-vm --fork-reset -p 

This is an optimization for cases where the forks are very short-lived and run
without a device model, so resetting saves some time compared to creating a
brand new fork, provided the fork has not acquired a lot of memory. If the fork
has a lot of memory deduplicated it is likely going to be faster to create a
new fork from scratch and asynchronously destroy the old one.

The series has been tested with Windows VMs and functions as expected. Linux
VMs when forked from a running VM will have a frozen VNC screen. Linux VMs at
this time can only be forked with a working device model when the parent VM was
restored from a snapshot using "xl restore -p". This is a known limitation.
Also note that PVHVM/PVH Linux guests have not been tested. Forking most likely
works but PV devices and drivers would require additional wiring to set things
up properly since the guests are unaware of the forking taking place, unlike
the save/restore routine where the guest is made aware of the procedure.

Forking time has been measured to be 0.0007s, device model launch to be around
1s depending largely on the number of devices being emulated. Fork resets have
been measured to be 0.0001s under the optimal circumstances.

New in v11:
Fully copy & reset vcpu_info pages
Setup vcpu_runstate for forks
Added TODO note for PV timers
Copy & reset shared_info page
Copy & reset HVM special pages

New in v10:
Rebased on staging and minor fixes for things pointed out by Roger
Allocate pages for vcpu_info if used by parent
Document limitation of guest settings that have to be set to default
Require max-vcpus to be specified by toolstack-side
Code movement in toolstack & compile tested on ARM
Implement hypercall continuation for reset operation

Patch 1-2 implements the VM fork & reset operation hypervisor side bits

Patch 3 adds the toolstack-side code implementing VM forking and reset

Tamas K Lengyel (3):
  xen/mem_sharing: VM forking
  x86/mem_sharing: reset a fork
  xen/tools: VM forking toolstack side

 docs/man/xl.1.pod.in  |  44 +++
 tools/libxc/include/xenctrl.

[Xen-devel] [xen-unstable-smoke test] 147730: tolerable all pass - PUSHED

2020-02-28 Thread osstest service owner
flight 147730 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/147730/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  9649cef3b3a7eaca1347154ea7f274586d48bc29
baseline version:
 xen  90d19e6f53a47f8f7f2154c67f03adc192c0d760

Last test of basis   147711  2020-02-28 00:00:57 Z0 days
Testing same since   147730  2020-02-28 16:00:31 Z0 days1 attempts


People who touched revisions under test:
  Jan Beulich 
  Roger Pau Monné 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   90d19e6f53..9649cef3b3  9649cef3b3a7eaca1347154ea7f274586d48bc29 -> smoke


[Xen-devel] [PATCH v11 1/3] xen/mem_sharing: VM forking

2020-02-28 Thread Tamas K Lengyel
VM forking is the process of creating a domain with an empty memory space and a
parent domain specified from which to populate the memory when necessary. For
the new domain to be functional the VM state is copied over as part of the fork
operation (HVM params, hap allocation, etc).

Signed-off-by: Tamas K Lengyel 
---
v11: Fully copy vcpu_info pages
 Setup vcpu_runstate for forks
 Added TODO note for PV timers
 Copy shared_info page
 Add copy_settings function, to be shared with fork_reset in the next patch
---
 xen/arch/x86/domain.c |  11 +
 xen/arch/x86/hvm/hvm.c|   4 +-
 xen/arch/x86/mm/hap/hap.c |   3 +-
 xen/arch/x86/mm/mem_sharing.c | 368 ++
 xen/arch/x86/mm/p2m.c |   9 +-
 xen/common/domain.c   |   3 +
 xen/include/asm-x86/hap.h |   1 +
 xen/include/asm-x86/hvm/hvm.h |   2 +
 xen/include/asm-x86/mem_sharing.h |  17 ++
 xen/include/public/memory.h   |   5 +
 xen/include/xen/sched.h   |   5 +
 11 files changed, 423 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fe63c23676..1ab0ca0942 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2203,6 +2203,17 @@ int domain_relinquish_resources(struct domain *d)
 ret = relinquish_shared_pages(d);
 if ( ret )
 return ret;
+
+/*
+ * If the domain is forked, decrement the parent's pause count
+ * and release the domain.
+ */
+if ( d->parent )
+{
+domain_unpause(d->parent);
+put_domain(d->parent);
+d->parent = NULL;
+}
 }
 #endif
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index db5d7b4d30..c551cc31f3 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1916,7 +1916,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
 }
 #endif
 
-/* Spurious fault? PoD and log-dirty also take this path. */
+/* Spurious fault? PoD, log-dirty and VM forking also take this path. */
 if ( p2m_is_ram(p2mt) )
 {
 rc = 1;
@@ -4430,7 +4430,7 @@ static int hvm_allow_get_param(struct domain *d,
 return rc;
 }
 
-static int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value)
+int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value)
 {
 int rc;
 
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 3d93f3451c..c7c7ff6e99 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -321,8 +321,7 @@ static void hap_free_p2m_page(struct domain *d, struct page_info *pg)
 }
 
 /* Return the size of the pool, rounded up to the nearest MB */
-static unsigned int
-hap_get_allocation(struct domain *d)
+unsigned int hap_get_allocation(struct domain *d)
 {
 unsigned int pg = d->arch.paging.hap.total_pages
 + d->arch.paging.hap.p2m_pages;
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 3835bc928f..4a840d09dd 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -22,6 +22,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -36,6 +37,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 #include "mm-locks.h"
@@ -1444,6 +1447,334 @@ static inline int mem_sharing_control(struct domain *d, bool enable)
 return 0;
 }
 
+/*
+ * Forking a page only gets called when the VM faults due to no entry being
+ * in the EPT for the access. Depending on the type of access we either
+ * populate the physmap with a shared entry for read-only access or
+ * fork the page if its a write access.
+ *
+ * The client p2m is already locked so we only need to lock
+ * the parent's here.
+ */
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing)
+{
+int rc = -ENOENT;
+shr_handle_t handle;
+struct domain *parent = d->parent;
+struct p2m_domain *p2m;
+unsigned long gfn_l = gfn_x(gfn);
+mfn_t mfn, new_mfn;
+p2m_type_t p2mt;
+struct page_info *page;
+
+if ( !mem_sharing_is_fork(d) )
+return -ENOENT;
+
+if ( !unsharing )
+{
+/* For read-only accesses we just add a shared entry to the physmap */
+while ( parent )
+{
+if ( !(rc = nominate_page(parent, gfn, 0, &handle)) )
+break;
+
+parent = parent->parent;
+}
+
+if ( !rc )
+{
+/* The client's p2m is already locked */
+struct p2m_domain *pp2m = p2m_get_hostp2m(parent);
+
+p2m_lock(pp2m);
+rc = add_to_physmap(parent, gfn_l, handle, d, gfn_l, false);
+p2m_unlock(pp2m);
+
+if ( !rc )
+return 0;
+}
+}
+
+/*
+ * If it's a write access (ie. unsharing) or if adding a shared entry to
+ * the physmap failed 

[Xen-devel] [PATCH v11 2/3] x86/mem_sharing: reset a fork

2020-02-28 Thread Tamas K Lengyel
Implement a hypercall that allows a fork to shed all memory that got allocated
for it during its execution and re-load its vCPU context from the parent VM.
This allows the forked VM to be reset into the same state the parent VM is in,
in a faster way than creating a new fork would be. Measurements show about a 2x
speedup during normal fuzzing operations. Performance may vary depending on how
much memory got allocated for the forked VM. If it has been completely
deduplicated from the parent VM then creating a new fork would likely be more
performant.

Signed-off-by: Tamas K Lengyel 
---
v11: use copy_settings function to reset special pages
---
 xen/arch/x86/mm/mem_sharing.c | 115 ++
 xen/include/public/memory.h   |   4 ++
 2 files changed, 119 insertions(+)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 4a840d09dd..a458096b01 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1775,6 +1775,91 @@ static int fork(struct domain *cd, struct domain *d)
 return rc;
 }
 
+/*
+ * The fork reset operation is intended to be used on short-lived forks only.
+ */
+static int fork_reset(struct domain *d, struct domain *cd,
+  struct mem_sharing_op_fork_reset *fr)
+{
+int rc = 0;
+struct p2m_domain* p2m = p2m_get_hostp2m(cd);
+struct page_info *page, *tmp;
+unsigned long list_position = 0, preempt_count = 0, restart = fr->opaque;
+
+domain_pause(cd);
+
+page_list_for_each_safe(page, tmp, &cd->page_list)
+{
+p2m_type_t p2mt;
+p2m_access_t p2ma;
+gfn_t gfn;
+mfn_t mfn;
+bool shared = false;
+
+list_position++;
+
+/* Resume where we left off before preemption */
+if ( restart && list_position < restart )
+continue;
+
+mfn = page_to_mfn(page);
+if ( mfn_valid(mfn) )
+{
+
+gfn = mfn_to_gfn(cd, mfn);
+mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
+0, NULL, false);
+
+if ( p2m_is_ram(p2mt) && !p2m_is_shared(p2mt) )
+{
+/* take an extra reference, must work for a shared page */
+if( !get_page(page, cd) )
+{
+ASSERT_UNREACHABLE();
+return -EINVAL;
+}
+
+shared = true;
+preempt_count += 0x10;
+
+/*
+ * Must succeed, it's a shared page that exists and
+ * thus its size is guaranteed to be 4k so we are not splitting
+ * large pages.
+ */
+rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K,
+p2m_invalid, p2m_access_rwx, -1);
+ASSERT(!rc);
+
+put_page_alloc_ref(page);
+put_page(page);
+}
+}
+
+if ( !shared )
+preempt_count++;
+
+/* Preempt every 2MiB (shared) or 32MiB (unshared) - arbitrary. */
+if ( preempt_count >= 0x2000 )
+{
+if ( hypercall_preempt_check() )
+{
+rc = -ERESTART;
+break;
+}
+preempt_count = 0;
+}
+}
+
+if ( rc )
+fr->opaque = list_position;
+else
+rc = copy_settings(cd, d);
+
+domain_unpause(cd);
+return rc;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
 int rc;
@@ -2066,6 +2151,36 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 break;
 }
 
+case XENMEM_sharing_op_fork_reset:
+{
+struct domain *pd;
+
+rc = -ENOSYS;
+if ( !mem_sharing_is_fork(d) )
+goto out;
+
+rc = rcu_lock_live_remote_domain_by_id(d->parent->domain_id, &pd);
+if ( rc )
+goto out;
+
+rc = fork_reset(pd, d, &mso.u.fork_reset);
+
+rcu_unlock_domain(pd);
+
+if ( rc > 0 )
+{
+if ( __copy_to_guest(arg, &mso, 1) )
+rc = -EFAULT;
+else
+rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
+   "lh", XENMEM_sharing_op,
+   arg);
+}
+else
+mso.u.fork_reset.opaque = 0;
+break;
+}
+
 default:
 rc = -ENOSYS;
 break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index c1dbad060e..7ca07c01dd 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -483,6 +483,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_audit 7
 #define XENMEM_sharing_op_range_share   8
 #define XENMEM_sharing_op_fork  9
+#define XENMEM_sharing_op_fork_reset10

[Xen-devel] [PATCH v11 3/3] xen/tools: VM forking toolstack side

2020-02-28 Thread Tamas K Lengyel
Add necessary bits to implement "xl fork-vm" commands. The command allows the
user to specify how to launch the device model, allowing for a late-launch
model in which the user can execute the fork without the device model and
decide only later to launch it.

Signed-off-by: Tamas K Lengyel 
---
 docs/man/xl.1.pod.in  |  44 +
 tools/libxc/include/xenctrl.h |  13 ++
 tools/libxc/xc_memshr.c   |  22 +++
 tools/libxl/libxl.h   |  11 ++
 tools/libxl/libxl_create.c| 361 +++---
 tools/libxl/libxl_dm.c|   2 +-
 tools/libxl/libxl_dom.c   |  43 +++-
 tools/libxl/libxl_internal.h  |   7 +
 tools/libxl/libxl_types.idl   |   1 +
 tools/libxl/libxl_x86.c   |  41 
 tools/xl/Makefile |   2 +-
 tools/xl/xl.h |   5 +
 tools/xl/xl_cmdtable.c|  15 ++
 tools/xl/xl_forkvm.c  | 147 ++
 tools/xl/xl_vmcontrol.c   |  14 ++
 15 files changed, 562 insertions(+), 166 deletions(-)
 create mode 100644 tools/xl/xl_forkvm.c

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 09339282e6..59c03c6427 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -708,6 +708,50 @@ above).
 
 =back
 
+=item B [I] I
+
+Create a fork of a running VM.  The domain will be paused after the operation
+and remains paused while forks of it exist.  Experimental and x86 only.
+Forks can only be made of domains with HAP enabled and on Intel hardware.  The
+parent domain must be created with the xl toolstack and its configuration must
+not manually define max_grant_frames, max_maptrack_frames or max_event_channels.
+
+B
+
+=over 4
+
+=item B<-p>
+
+Leave the fork paused after creating it.
+
+=item B<--launch-dm>
+
+Specify whether the device model (QEMU) should be launched for the fork. Late
+launch allows starting the device model for an already-running fork.
+
+=item B<-C>
+
+The config file to use when launching the device model.  Currently required when
+launching the device model.  Most config settings MUST match the parent domain
+exactly, only change VM name, disk path and network configurations.
+
+=item B<-Q>
+
+The path to the qemu save file to use when launching the device model.  Currently
+required when launching the device model.
+
+=item B<--fork-reset>
+
+Perform a reset operation of an already running fork.  Note that resetting may
+be less performant than creating a new fork depending on how much memory the
+fork has deduplicated during its runtime.
+
+=item B<--max-vcpus>
+
+Specify the max-vcpus matching the parent domain when not launching the dm.
+
+=back
+
 =item B [I]
 
 Display the number of shared pages for a specified domain. If no domain is
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index fc6e57a1a0..00cb4cf1f7 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2225,6 +2225,19 @@ int xc_memshr_range_share(xc_interface *xch,
   uint64_t first_gfn,
   uint64_t last_gfn);
 
+int xc_memshr_fork(xc_interface *xch,
+   uint32_t source_domain,
+   uint32_t client_domain);
+
+/*
+ * Note: this function is only intended to be used on short-lived forks that
+ * haven't yet acquired a lot of memory. In case the fork has a lot of memory
+ * it is likely more performant to create a new fork with xc_memshr_fork.
+ *
+ * With VMs that have a lot of memory this call may block for a long time.
+ */
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t forked_domain);
+
 /* Debug calls: return the number of pages referencing the shared frame backing
  * the input argument. Should be one or greater.
  *
diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c
index 97e2e6a8d9..d0e4ee225b 100644
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -239,6 +239,28 @@ int xc_memshr_debug_gref(xc_interface *xch,
 return xc_memshr_memop(xch, domid, &mso);
 }
 
+int xc_memshr_fork(xc_interface *xch, uint32_t pdomid, uint32_t domid)
+{
+xen_mem_sharing_op_t mso;
+
+memset(&mso, 0, sizeof(mso));
+
+mso.op = XENMEM_sharing_op_fork;
+mso.u.fork.parent_domain = pdomid;
+
+return xc_memshr_memop(xch, domid, &mso);
+}
+
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t domid)
+{
+xen_mem_sharing_op_t mso;
+
+memset(&mso, 0, sizeof(mso));
+mso.op = XENMEM_sharing_op_fork_reset;
+
+return xc_memshr_memop(xch, domid, &mso);
+}
+
 int xc_memshr_audit(xc_interface *xch)
 {
 xen_mem_sharing_op_t mso;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 35e13428b2..6b968bdcb4 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -2657,6 +2657,17 @@ int libxl_psr_get_hw_info(libxl_ctx *ctx, 
libxl_psr_feat_type type,
   unsigned int lvl, unsigned int *nr,
   libxl_psr_hw_info **info);
 void libxl_psr_hw_info_list_free(libxl_psr_hw_info *list, unsigned in

[Xen-devel] [libvirt test] 147703: regressions - FAIL

2020-02-28 Thread osstest service owner
flight 147703 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/147703/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 146182
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 146182
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 146182
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 146182

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  0b0907316dd34b3266bf5e8d766e659e15742da5
baseline version:
 libvirt  a1cd25b919509be2645dbe6f952d5263e0d4e4e5

Last test of basis   146182  2020-01-17 06:00:23 Z   42 days
Failing since146211  2020-01-18 04:18:52 Z   41 days   39 attempts
Testing same since   147703  2020-02-27 21:03:20 Z0 days1 attempts


People who touched revisions under test:
  Andrea Bolognani 
  Arnaud Patard 
  Boris Fiuczynski 
  Christian Ehrhardt 
  Collin Walling 
  Daniel Henrique Barboza 
  Daniel P. Berrangé 
  Dario Faggioli 
  Erik Skultety 
  Han Han 
  Jim Fehlig 
  Jiri Denemark 
  Jonathon Jongsma 
  Julio Faracco 
  Ján Tomko 
  Laine Stump 
  Marek Marczykowski-Górecki 
  Michal Privoznik 
  Nikolay Shirokovskiy 
  Pavel Hrdina 
  Pavel Mores 
  Peter Krempa 
  Richard W.M. Jones 
  Rikard Falkeborn 
  Ryan Moeller 
  Sahid Orentino Ferdjaoui 
  Stefan Berger 
  Stefan Berger 
  Thomas Huth 
  Your Name 
  zhenwei pi 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  fail
 build-arm64-libvirt  fail
 build-armhf-libvirt  fail
 build-i386-libvirt   fail
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   blocked 
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmblocked 
 test-amd64-amd64-libvirt-xsm blocked 
 test-arm64-arm64-libvirt-xsm blocked 
 test-amd64-i386-libvirt-xsm  blocked 
 test-amd64-amd64-libvirt blocked 
 test-arm64-arm64-libvirt blocked 
 test-armhf-armhf-libvirt blocked 
 test-amd64-i386-libvirt  blocked 
 test-amd64-amd64-libvirt-pairblocked 
 test-amd64-i386-libvirt-pair blocked 
 test-arm64-arm64-libvirt-qcow2   blocked 
 test-armhf-armhf-libvirt-raw blocked 
 test-amd64-amd64-libvirt-vhd blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs/147703/

[Xen-devel] [xen-unstable test] 147683: regressions - FAIL

2020-02-28 Thread osstest service owner
flight 147683 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/147683/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-qemut-rhel6hvm-amd 12 guest-start/redhat.repeat fail REGR. vs. 147600

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 16 guest-localmigrate   fail REGR. vs. 147600

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail like 147600
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 147600
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeatfail  like 147600
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 147600
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 147600
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 147600
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 147600
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 147600
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 147600
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 147600
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 147600
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass

version targeted for testing:
 xen  d7d751bfdbd4424e286eddfa0ce36075627e3c31
baseline version:
 xen  e465fecbfdb865c75f762055c0396bc617005748

Last test of basis   147600  2020-02-25 13:42:49 Z3 days
Testing sam

Re: [Xen-devel] [PATCH] x86/ioperm: add new paravirt function update_io_bitmap

2020-02-28 Thread Thomas Gleixner
Jürgen Groß  writes:

> Friendly ping...

Ooops. I'll pick it up first thing tomorrow morning

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 0/2] docs: Migration design documents

2020-02-28 Thread Durrant, Paul
Ping again...

> -Original Message-
> From: Durrant, Paul 
> Sent: 20 February 2020 12:54
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Andrew Cooper ; George Dunlap
> ; Ian Jackson ;
> Jan Beulich ; Julien Grall ; Konrad
> Rzeszutek Wilk ; Stefano Stabellini
> ; Wei Liu 
> Subject: RE: [PATCH v5 0/2] docs: Migration design documents
> 
> Ping?
> 
> I have not received any further comments on v5. Can I please get acks or
> otherwise so we can (hopefully) move on with coding?
> 
>   Paul
> 
> > -Original Message-
> > From: Paul Durrant 
> > Sent: 13 February 2020 10:53
> > To: xen-devel@lists.xenproject.org
> > Cc: Durrant, Paul ; Andrew Cooper
> > ; George Dunlap
> ;
> > Ian Jackson ; Jan Beulich
> ;
> > Julien Grall ; Konrad Rzeszutek Wilk
> > ; Stefano Stabellini ;
> Wei
> > Liu 
> > Subject: [PATCH v5 0/2] docs: Migration design documents
> >
> > Paul Durrant (2):
> >   docs/designs: Add a design document for non-cooperative live migration
> >   docs/designs: Add a design document for migration of xenstore data
> >
> >  docs/designs/non-cooperative-migration.md | 272 ++
> >  docs/designs/xenstore-migration.md| 136 +++
> >  2 files changed, 408 insertions(+)
> >  create mode 100644 docs/designs/non-cooperative-migration.md
> >  create mode 100644 docs/designs/xenstore-migration.md
> > ---
> > Cc: Andrew Cooper 
> > Cc: George Dunlap 
> > Cc: Ian Jackson 
> > Cc: Jan Beulich 
> > Cc: Julien Grall 
> > Cc: Konrad Rzeszutek Wilk 
> > Cc: Stefano Stabellini 
> > Cc: Wei Liu 
> > --
> > 2.20.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [XEN PATCH v2] libxl: wait for console path before firing console_available

2020-02-28 Thread Marek Marczykowski-Górecki
On Thu, Feb 20, 2020 at 02:31:03PM +0100, Paweł Marczewski wrote:
> If we skip the bootloader, the TTY path will be set by xenconsoled.
> However, there is no guarantee that this will happen by the time we
> want to call the console_available callback, so we have to wait.
> 
> Signed-off-by: Paweł Marczewski 

With minor fix below:
Reviewed-by: Marek Marczykowski-Górecki 

> ---
> Changed since v1:
>   * use xswait mechanism to add a timeout
> 
> As mentioned before, this is to fix a race condition that appears when
> using libxl via libvirt and not using bootloader (console_available
> fires too early).
> 
> I have tested the patch on Qubes OS 4.1 (with Xen 4.13), and it seems
> to solve the problem. I also checked the timeout: when xenconsoled is
> stopped, libxl waits for 10 seconds and then aborts domain creation.
> 
>  tools/libxl/libxl_create.c   | 43 +---
>  tools/libxl/libxl_internal.h |  1 +
>  2 files changed, 41 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 3a7364e2ac..4b150d92b9 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -1190,6 +1190,33 @@ static void domcreate_console_available(libxl__egc 
> *egc,
>  dcs->aop_console_how.for_event));
>  }
>  
> +static void console_xswait_callback(libxl__egc *egc, libxl__xswait_state 
> *xswa,
> +int rc, const char *p)
> +{
> +EGC_GC;
> +libxl__domain_create_state *dcs = CONTAINER_OF(xswa, *dcs, 
> console_xswait);
> +char *dompath = libxl__xs_get_dompath(gc, dcs->guest_domid);
> +char *tty_path = GCSPRINTF("%s/console/tty", dompath);
> +char *tty;
> +
> +if (rc) {
> +if (rc == ERROR_TIMEDOUT)
> +LOG(ERROR, "%s: timed out", xswa->what);
> +libxl__xswait_stop(gc, xswa);
> +domcreate_complete(egc, dcs, rc);
> +return;
> +}
> +
> +tty = libxl__xs_read(gc, XBT_NULL, tty_path);
> +
> +if (tty && tty[0] != '\0') {
> +libxl__xswait_stop(gc, xswa);
> +
> +domcreate_console_available(egc, dcs);
> +domcreate_complete(egc, dcs, 0);
> +}
> +}
> +
>  static void domcreate_bootloader_done(libxl__egc *egc,
>libxl__bootloader_state *bl,
>int rc)
> @@ -1728,9 +1755,18 @@ static void domcreate_attach_devices(libxl__egc *egc,
>  return;
>  }
>  
> -domcreate_console_available(egc, dcs);
> -
> -domcreate_complete(egc, dcs, 0);
> +dcs->console_xswait.ao = ao;
> +dcs->console_xswait.what = GCSPRINTF("domain %d console tty", domid);
> +dcs->console_xswait.path = GCSPRINTF("%s/console/tty",
> + libxl__xs_get_dompath(gc, domid));
> +dcs->console_xswait.timeout_ms = 10 * 1000;

Better not use explicit value _here_, but a constant in some header. I
think LIBXL_INIT_TIMEOUT is a good fit here.

> +dcs->console_xswait.callback = console_xswait_callback;
> +ret = libxl__xswait_start(gc, &dcs->console_xswait);
> +if (ret) {
> +LOG(ERROR, "unable to set up watch for domain %d console path",
> +domid);
> +goto error_out;
> +}
>  
>  return;
>  
> @@ -1861,6 +1897,7 @@ static int do_domain_create(libxl_ctx *ctx, 
> libxl_domain_config *d_config,
>  
>  libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how);
>  cdcs->domid_out = domid;
> +libxl__xswait_init(&cdcs->dcs.console_xswait);
>  
>  initiate_domain_create(egc, &cdcs->dcs);
>  
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 4936446069..d8129417dc 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -4180,6 +4180,7 @@ struct libxl__domain_create_state {
>  /* necessary if the domain creation failed and we have to destroy it */
>  libxl__domain_destroy_state dds;
>  libxl__multidev multidev;
> +libxl__xswait_state console_xswait;
>  };
>  
>  _hidden int libxl__device_nic_set_devids(libxl__gc *gc,

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 7/7] x86/tlb: use Xen L0 assisted TLB flush when available

2020-02-28 Thread Roger Pau Monné
On Fri, Feb 28, 2020 at 06:00:44PM +0100, Jan Beulich wrote:
> On 19.02.2020 18:43, Roger Pau Monne wrote:
> > Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
> > This greatly increases the performance of TLB flushes when running
> with a high amount of vCPUs as a Xen guest, and is especially important
> > when running in shim mode.
> > 
> > The following figures are from a PV guest running `make -j32 xen` in
> > shim mode with 32 vCPUs and HAP.
> > 
> > Using x2APIC and ALLBUT shorthand:
> > real4m35.973s
> > user4m35.110s
> > sys 36m24.117s
> > 
> > Using L0 assisted flush:
> > real1m2.596s
> > user4m34.818s
> > sys 5m16.374s
> > 
> > The implementation adds a new hook to hypervisor_ops so other
> > enlightenments can also implement such assisted flush just by filling
> > the hook. Note that the Xen implementation completely ignores the
> > dirty CPU mask and the linear address passed in, and always performs a
> > global TLB flush on all vCPUs.
> 
> This isn't because of an implementation choice of yours, but because
> of how HVMOP_flush_tlbs works. I think the statement should somehow
> express this. I also think it wants clarifying that using the
> hypercall is indeed faster even in the case of single-page, single-
> CPU flush

Are you sure about this? I think taking a vmexit is going to be more
costly than executing a local invlpg?

> (which I suspect may not be the case especially as vCPU
> count grows). The stats above prove a positive overall effect, but
> they don't say whether the effect could be even bigger by being at
> least a little selective.

I assume that being able to provide a bitmap with the vCPUs on whether
the TLB flush should be performed would give us some more performance,
but I haven't looked into it yet.

> > @@ -73,6 +74,15 @@ void __init hypervisor_e820_fixup(struct e820map *e820)
> >  ops.e820_fixup(e820);
> >  }
> >  
> > +int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
> > + unsigned int order)
> > +{
> > +if ( ops.flush_tlb )
> > +return alternative_call(ops.flush_tlb, mask, va, order);
> > +
> > +return -ENOSYS;
> > +}
> 
> Please no new -ENOSYS anywhere (except in new ports' top level
> hypercall handlers).

Ack. Is EOPNOTSUPP OK?

> > @@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void 
> > *va, unsigned int flags)
> >  if ( (flags & ~FLUSH_ORDER_MASK) &&
> >   !cpumask_subset(mask, cpumask_of(cpu)) )
> >  {
> > +if ( cpu_has_hypervisor &&
> > + !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
> > + FLUSH_ORDER_MASK)) &&
> > + !hypervisor_flush_tlb(mask, va, (flags - 1) & 
> > FLUSH_ORDER_MASK) )
> > +{
> > +if ( tlb_clk_enabled )
> > +tlb_clk_enabled = false;
> 
> Why does this need doing here? Couldn't Xen guest setup code
> clear the flag?

Because it's possible that the hypercalls fails, and hence the tlb
clock should be kept enabled. There's no reason to disable it until
Xen knows the assisted flush is indeed available.

I don't mind moving it to Xen guest setup code, but I'm not sure I see
why it would be any better than doing it here. The only reason I guess
is to avoid checking tlb_clk_enabled on every successful assisted
flush?

Thanks, Roger.
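
To restate the dispatch being discussed in standalone form, here is a sketch
of the "assisted flush with IPI fallback" pattern (the type and function
names are stand-ins, alternative_call is reduced to a plain indirect call,
and -EOPNOTSUPP reflects the review feedback rather than the posted patch):

#include <errno.h>

typedef unsigned long cpumask_t;   /* stand-in for Xen's real cpumask_t */

struct demo_hypervisor_ops {
    /* Filled in by the guest enlightenment (Xen, Hyper-V, ...) if it can
     * perform an assisted flush; left NULL otherwise. */
    int (*flush_tlb)(const cpumask_t *mask, const void *va,
                     unsigned int order);
};

static struct demo_hypervisor_ops ops;

static int demo_hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
                                     unsigned int order)
{
    if ( ops.flush_tlb )
        return ops.flush_tlb(mask, va, order);

    return -EOPNOTSUPP;
}

static void demo_ipi_flush(const cpumask_t *mask, const void *va)
{
    (void)mask; (void)va;          /* IPI-based flush elided in this sketch */
}

static void demo_flush_area_mask(const cpumask_t *mask, const void *va,
                                 unsigned int order)
{
    /* Try the L0-assisted flush first; any error falls back to IPIs. */
    if ( demo_hypervisor_flush_tlb(mask, va, order) )
        demo_ipi_flush(mask, va);
}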

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3] xen: make sure stop_machine_run() is always called in a tasklet

2020-02-28 Thread Juergen Gross
With core scheduling active it is mandatory for stop_machine_run() to
be called in idle context only (so either during boot or in a tasklet),
as otherwise a scheduling deadlock would occur: stop_machine_run()
does a cpu rendezvous by activating a tasklet on all other cpus. In
case stop_machine_run() was not called in an idle vcpu it would block
scheduling the idle vcpu on its siblings with core scheduling being
active, resulting in a hang.

Put a BUG_ON() into stop_machine_run() to test for being called in an
idle vcpu only and adapt the missing call site (ucode loading) to use a
tasklet for calling stop_machine_run().

Signed-off-by: Juergen Gross 
---
V2:
- rephrase commit message (Julien Grall)

V3:
- add comments (Jan Beulich)
---
 xen/arch/x86/microcode.c  | 54 +--
 xen/common/rcupdate.c |  4 
 xen/common/stop_machine.c |  7 ++
 3 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 35c1d36cdc..4cf4e66b18 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -562,30 +562,18 @@ static int do_microcode_update(void *patch)
 return ret;
 }
 
-int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
+struct ucode_buf {
+unsigned int len;
+char buffer[];
+};
+
+static long microcode_update_helper(void *data)
 {
 int ret;
-void *buffer;
+struct ucode_buf *buffer = data;
 unsigned int cpu, updated;
 struct microcode_patch *patch;
 
-if ( len != (uint32_t)len )
-return -E2BIG;
-
-if ( microcode_ops == NULL )
-return -EINVAL;
-
-buffer = xmalloc_bytes(len);
-if ( !buffer )
-return -ENOMEM;
-
-ret = copy_from_guest(buffer, buf, len);
-if ( ret )
-{
-xfree(buffer);
-return -EFAULT;
-}
-
 /* cpu_online_map must not change during update */
 if ( !get_cpu_maps() )
 {
@@ -607,7 +595,7 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
 return -EPERM;
 }
 
-patch = parse_blob(buffer, len);
+patch = parse_blob(buffer->buffer, buffer->len);
 xfree(buffer);
 if ( IS_ERR(patch) )
 {
@@ -700,6 +688,32 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
 return ret;
 }
 
+int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
+{
+int ret;
+struct ucode_buf *buffer;
+
+if ( len != (uint32_t)len )
+return -E2BIG;
+
+if ( microcode_ops == NULL )
+return -EINVAL;
+
+buffer = xmalloc_flex_struct(struct ucode_buf, buffer, len);
+if ( !buffer )
+return -ENOMEM;
+
+ret = copy_from_guest(buffer->buffer, buf, len);
+if ( ret )
+{
+xfree(buffer);
+return -EFAULT;
+}
+buffer->len = len;
+
+return continue_hypercall_on_cpu(0, microcode_update_helper, buffer);
+}
+
 static int __init microcode_init(void)
 {
 /*
diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index 91d4ad0fd8..d76b991627 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -178,6 +178,10 @@ static int rcu_barrier_action(void *_cpu_count)
 return 0;
 }
 
+/*
+ * As rcu_barrier() is using stop_machine_run() it is allowed to be used in
+ * idle context only (see comment for stop_machine_run()).
+ */
 int rcu_barrier(void)
 {
 atomic_t cpu_count = ATOMIC_INIT(0);
diff --git a/xen/common/stop_machine.c b/xen/common/stop_machine.c
index 33d9602217..2d5f6aef61 100644
--- a/xen/common/stop_machine.c
+++ b/xen/common/stop_machine.c
@@ -67,6 +67,12 @@ static void stopmachine_wait_state(void)
 cpu_relax();
 }
 
+/*
+ * Sync all processors and call a function on one or all of them.
+ * As stop_machine_run() is using a tasklet for syncing the processors it is
+ * mandatory to be called only on an idle vcpu, as otherwise active core
+ * scheduling might hang.
+ */
 int stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu)
 {
 unsigned int i, nr_cpus;
@@ -74,6 +80,7 @@ int stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu)
 int ret;
 
 BUG_ON(!local_irq_is_enabled());
+BUG_ON(!is_idle_vcpu(current));
 
 /* cpu_online_map must not change. */
 if ( !get_cpu_maps() )
-- 
2.16.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush

2020-02-28 Thread Jan Beulich
On 28.02.2020 17:57, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 05:44:57PM +0100, Jan Beulich wrote:
>> On 28.02.2020 17:31, Roger Pau Monné wrote:
>>> On Fri, Feb 28, 2020 at 02:58:42PM +0100, Jan Beulich wrote:
 On 19.02.2020 18:43, Roger Pau Monne wrote:
> @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, 
> struct vcpu *v),
>  if ( !flush_vcpu(ctxt, v) )
>  continue;
>  
> -paging_update_cr3(v, false);
> +hvm_asid_flush_vcpu(v);
>  
>  cpu = read_atomic(&v->dirty_cpu);
> -if ( is_vcpu_dirty_cpu(cpu) )
> +if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
>  __cpumask_set_cpu(cpu, mask);
>  }
>  
> -/* Flush TLBs on all CPUs with dirty vcpu state. */
> -flush_tlb_mask(mask);
> -
> -/* Done. */
> -for_each_vcpu ( d, v )
> -if ( v != current && flush_vcpu(ctxt, v) )
> -vcpu_unpause(v);
> +/*
> + * Trigger a vmexit on all pCPUs with dirty vCPU state in order to 
> force an
> + * ASID/VPID change and hence accomplish a guest TLB flush. Note 
> that vCPUs
> + * not currently running will already be flushed when scheduled 
> because of
> + * the ASID tickle done in the loop above.
> + */
> +on_selected_cpus(mask, handle_flush, mask, 0);
> +while ( !cpumask_empty(mask) )
> +cpu_relax();

 Why do you pass 0 as last argument? If you passed 1, you wouldn't
 need the while() here, handle_flush() could be empty, and you'd
 save a perhaps large amount of CPUs to all try to modify two
 cache lines (the one used by on_selected_cpus() itself and the
 one here) instead of just one. Yes, latency from the last CPU
 clearing its bit to you being able to exit from here may be a
 little larger this way, but overall latency may shrink with the
 cut by half amount of atomic ops issued to the bus.
>>>
>>> In fact I think passing 0 as the last argument is fine, and
>>> handle_flush can be empty in that case anyway. We don't really care
>>> whether on_selected_cpus returns before all CPUs have executed the
>>> dummy function, as long as all of them have taken a vmexit. Using 0
>>> already guarantees that AFAICT.
>>
>> Isn't it that the caller wants to be guaranteed that the flush
>> has actually occurred (or at least that no further insns can
>> be executed prior to the flush happening, i.e. at least the VM
>> exit having occurred)?
> 
> Yes, but that would happen with 0 as the last argument already, see
> the code in smp_call_function_interrupt:
> 
> if ( call_data.wait )
> {
> (*func)(info);
> smp_mb();
> cpumask_clear_cpu(cpu, &call_data.selected);
> }
> else
> {
> smp_mb();
> cpumask_clear_cpu(cpu, &call_data.selected);
> (*func)(info);
> }
> 
> The only difference is whether on_selected_cpus can return before all
> the functions have been executed on all CPUs, or whether all CPUs have
> taken a vmexit. The later is fine for our use-case.

Oh, yes - right you are.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 7/7] x86/tlb: use Xen L0 assisted TLB flush when available

2020-02-28 Thread Jan Beulich
On 19.02.2020 18:43, Roger Pau Monne wrote:
> Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
> This greatly increases the performance of TLB flushes when running
> with a high amount of vCPUs as a Xen guest, and is especially important
> when running in shim mode.
> 
> The following figures are from a PV guest running `make -j32 xen` in
> shim mode with 32 vCPUs and HAP.
> 
> Using x2APIC and ALLBUT shorthand:
> real  4m35.973s
> user  4m35.110s
> sys   36m24.117s
> 
> Using L0 assisted flush:
> real1m2.596s
> user4m34.818s
> sys 5m16.374s
> 
> The implementation adds a new hook to hypervisor_ops so other
> enlightenments can also implement such assisted flush just by filling
> the hook. Note that the Xen implementation completely ignores the
> dirty CPU mask and the linear address passed in, and always performs a
> global TLB flush on all vCPUs.

This isn't because of an implementation choice of yours, but because
of how HVMOP_flush_tlbs works. I think the statement should somehow
express this. I also think it wants clarifying that using the
hypercall is indeed faster even in the case of single-page, single-
CPU flush (which I suspect may not be the case especially as vCPU
count grows). The stats above prove a positive overall effect, but
they don't say whether the effect could be even bigger by being at
least a little selective.

> @@ -73,6 +74,15 @@ void __init hypervisor_e820_fixup(struct e820map *e820)
>  ops.e820_fixup(e820);
>  }
>  
> +int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
> + unsigned int order)
> +{
> +if ( ops.flush_tlb )
> +return alternative_call(ops.flush_tlb, mask, va, order);
> +
> +return -ENOSYS;
> +}

Please no new -ENOSYS anywhere (except in new ports' top level
hypercall handlers).

> @@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void 
> *va, unsigned int flags)
>  if ( (flags & ~FLUSH_ORDER_MASK) &&
>   !cpumask_subset(mask, cpumask_of(cpu)) )
>  {
> +if ( cpu_has_hypervisor &&
> + !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
> + FLUSH_ORDER_MASK)) &&
> + !hypervisor_flush_tlb(mask, va, (flags - 1) & FLUSH_ORDER_MASK) 
> )
> +{
> +if ( tlb_clk_enabled )
> +tlb_clk_enabled = false;

Why does this need doing here? Couldn't Xen guest setup code
clear the flag?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush

2020-02-28 Thread Roger Pau Monné
On Fri, Feb 28, 2020 at 05:44:57PM +0100, Jan Beulich wrote:
> On 28.02.2020 17:31, Roger Pau Monné wrote:
> > On Fri, Feb 28, 2020 at 02:58:42PM +0100, Jan Beulich wrote:
> >> On 19.02.2020 18:43, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/mm/hap/hap.c
> >>> +++ b/xen/arch/x86/mm/hap/hap.c
> >>> @@ -669,32 +669,28 @@ static void hap_update_cr3(struct vcpu *v, int 
> >>> do_locking, bool noflush)
> >>>  hvm_update_guest_cr3(v, noflush);
> >>>  }
> >>>  
> >>> +/*
> >>> + * NB: doesn't actually perform any flush, used just to clear the CPU 
> >>> from the
> >>> + * mask and hence signal that the guest TLB flush has been done.
> >>> + */
> >>
> >> "has been done" isn't correct, as the flush may happen only later
> >> on (upon next VM entry). I think wording here needs to be as
> >> precise as possible, however the comment may turn out unnecessary
> >> altogether:
> > 
> > What about:
> > 
> > /*
> >  * NB: Dummy function exclusively used as a way to trigger a vmexit,
> >  * and thus force an ASID/VPID update on vmentry (that will flush the
> >  * cache).
> >  */
> 
> Once it's empty - yes, looks okay (with s/cache/TLB/).

Ack.

> >>> @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, 
> >>> struct vcpu *v),
> >>>  if ( !flush_vcpu(ctxt, v) )
> >>>  continue;
> >>>  
> >>> -paging_update_cr3(v, false);
> >>> +hvm_asid_flush_vcpu(v);
> >>>  
> >>>  cpu = read_atomic(&v->dirty_cpu);
> >>> -if ( is_vcpu_dirty_cpu(cpu) )
> >>> +if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
> >>>  __cpumask_set_cpu(cpu, mask);
> >>>  }
> >>>  
> >>> -/* Flush TLBs on all CPUs with dirty vcpu state. */
> >>> -flush_tlb_mask(mask);
> >>> -
> >>> -/* Done. */
> >>> -for_each_vcpu ( d, v )
> >>> -if ( v != current && flush_vcpu(ctxt, v) )
> >>> -vcpu_unpause(v);
> >>> +/*
> >>> + * Trigger a vmexit on all pCPUs with dirty vCPU state in order to 
> >>> force an
> >>> + * ASID/VPID change and hence accomplish a guest TLB flush. Note 
> >>> that vCPUs
> >>> + * not currently running will already be flushed when scheduled 
> >>> because of
> >>> + * the ASID tickle done in the loop above.
> >>> + */
> >>> +on_selected_cpus(mask, handle_flush, mask, 0);
> >>> +while ( !cpumask_empty(mask) )
> >>> +cpu_relax();
> >>
> >> Why do you pass 0 as last argument? If you passed 1, you wouldn't
> >> need the while() here, handle_flush() could be empty, and you'd
> >> save a perhaps large amount of CPUs to all try to modify two
> >> cache lines (the one used by on_selected_cpus() itself and the
> >> one here) instead of just one. Yes, latency from the last CPU
> >> clearing its bit to you being able to exit from here may be a
> >> little larger this way, but overall latency may shrink with the
> >> cut by half amount of atomic ops issued to the bus.
> > 
> > In fact I think passing 0 as the last argument is fine, and
> > handle_flush can be empty in that case anyway. We don't really care
> > whether on_selected_cpus returns before all CPUs have executed the
> > dummy function, as long as all of them have taken a vmexit. Using 0
> > already guarantees that AFAICT.
> 
> Isn't it that the caller wants to be guaranteed that the flush
> has actually occurred (or at least that no further insns can
> be executed prior to the flush happening, i.e. at least the VM
> exit having occurred)?

Yes, but that would happen with 0 as the last argument already, see
the code in smp_call_function_interrupt:

if ( call_data.wait )
{
(*func)(info);
smp_mb();
cpumask_clear_cpu(cpu, &call_data.selected);
}
else
{
smp_mb();
cpumask_clear_cpu(cpu, &call_data.selected);
(*func)(info);
}

The only difference is whether on_selected_cpus can return before all
the functions have been executed on all CPUs, or whether all CPUs have
taken a vmexit. The later is fine for our use-case.

> >> (Forcing an empty function to be called may seem odd, but sending
> >>  just some IPI [like the event check one] wouldn't be enough, as
> >>  you want to be sure the other side has actually received the
> >>  request.)
> > 
> > Exactly, that's the reason for the empty function.
> 
> But the function isn't empty.

It will be now, since on_selected_cpus already does the needed
synchronization as you pointed out.

Thanks, Roger.
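
The ordering argument can be restated in code. Here is a toy model of the two
completion orders in smp_call_function_interrupt(), with C11 atomics standing
in for Xen's smp_mb()/cpumask helpers (the names and the one-word bitmap are
illustrative simplifications, not the real implementation):

#include <stdatomic.h>
#include <stdbool.h>

static _Atomic unsigned long selected;   /* one bit per CPU yet to answer */
static bool wait_for_func;               /* the on_selected_cpus() wait flag */

/* Runs on each target CPU in interrupt context, i.e. only after the CPU has
 * taken the interrupt (the "vmexit" that the flush actually relies on). */
static void demo_call_function_interrupt(unsigned int cpu,
                                         void (*func)(void *), void *info)
{
    if ( wait_for_func )
    {
        func(info);                                  /* run first ... */
        atomic_fetch_and(&selected, ~(1UL << cpu));  /* ... then signal */
    }
    else
    {
        atomic_fetch_and(&selected, ~(1UL << cpu));  /* signal first ... */
        func(info);                                  /* ... then run */
    }
}

/* Initiator: with wait == false the spin below only guarantees that each
 * CPU *entered* the handler, which is exactly enough here, since taking
 * the interrupt is the whole point and func can be empty. */
static void demo_initiator(unsigned long target_mask)
{
    atomic_store(&selected, target_mask);
    /* ... send IPIs to the CPUs in target_mask ... */
    while ( atomic_load(&selected) )
        ;                                            /* cpu_relax() really */
}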

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 6/7] xen/guest: prepare hypervisor ops to use alternative calls

2020-02-28 Thread Roger Pau Monné
On Fri, Feb 28, 2020 at 05:29:32PM +0100, Jan Beulich wrote:
> On 19.02.2020 18:43, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/guest/hyperv/hyperv.c
> > +++ b/xen/arch/x86/guest/hyperv/hyperv.c
> > @@ -199,7 +199,7 @@ static void __init e820_fixup(struct e820map *e820)
> >  panic("Unable to reserve Hyper-V hypercall range\n");
> >  }
> >  
> > -static const struct hypervisor_ops ops = {
> > +static const struct hypervisor_ops __initdata ops = {
> 
> This needs to be __initconstrel in order to avoid triggering
> (possibly only in the future) section mismatch warnings with
> at least some gcc versions. With this and the other instance
> adjusted

I can do that when posting a new version, unless you want to pick this
earlier and adjust on commit.

> Reviewed-by: Jan Beulich 

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 4/7] x86/tlb: introduce a flush guests TLB flag

2020-02-28 Thread Roger Pau Monné
On Fri, Feb 28, 2020 at 05:14:05PM +0100, Jan Beulich wrote:
> On 19.02.2020 18:43, Roger Pau Monne wrote:
> > Introduce a specific flag to request a HVM guest TLB flush, which is
> > an ASID/VPID tickle that forces a linear TLB flush for all HVM guests.
> 
> Here and below, what do you mean by "linear"? I guess you mean
> TLBs holding translations from guest linear to guest physical,
> but I think this could do with then also saying so, even if it's
> more words.

Yes, will change.

> > This was previously unconditionally done in each pre_flush call, but
> > that's not required: HVM guests not using shadow don't require linear
> > TLB flushes as Xen doesn't modify the guest page tables in that case
> > (ie: when using HAP).
> 
> This explains the correctness in one direction. What about the
> removal of this from the switch_cr3_cr4() path?

AFAICT that's never used by shadow code to modify cr3 or cr4, and
hence doesn't require a guest linear TLB flush.

> And what about
> our safety assumptions from the ticking of tlbflush_clock,
> where we then imply that pages e.g. about to be freed can't
> have any translations left in any TLBs anymore?

I'm slightly confused. That flush only affects HVM guests linear TLB,
and hence is not under Xen control unless shadow mode is used. Pages
to be freed in the HAP case need to be flushed from the EPT/NPT, but
whether there are references left in the guest TLB to point to that
gfn really doesn't matter to Xen at all, since the guest is in full
control of its MMU and TLB in that case.

For shadow any change in the guest page-tables should already be
accompanied by a guest TLB flush, or else the guest could access no
longer present entries, regardless of whether the affected pages are
freed or not.

> > --- a/xen/include/asm-x86/flushtlb.h
> > +++ b/xen/include/asm-x86/flushtlb.h
> > @@ -105,6 +105,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long 
> > cr4);
> >  #define FLUSH_VCPU_STATE 0x1000
> >   /* Flush the per-cpu root page table */
> >  #define FLUSH_ROOT_PGTBL 0x2000
> > + /* Flush all HVM guests linear TLB (using ASID/VPID) */
> > +#define FLUSH_GUESTS_TLB 0x4000
> 
> For one, the "all" is pretty misleading. A single such request
> doesn't do this for all vCPU-s of all HVM guests, does it?

It kind of does because it tickles the pCPU ASID/VPID generation ID,
so any vCPU scheduled on the selected pCPUs will get a new ASID/VPID
allocated and thus a clean TLB.

> I'm
> also struggling with the 'S' in "GUESTS" - why is it not just
> "GUEST"? 

Any guest vCPUs running on the selected pCPUs will get a new ASID/VPID
ID and thus a clean TLB.

> I admit the names of the involved functions
> (hvm_flush_guest_tlbs(), hvm_asid_flush_core()) are somewhat
> misleading, as they don't actually do any flushing, they merely
> arrange for what is in the TLB to no longer be able to be used,
> so giving this a suitable name is "historically" complicated.
> What if we did away with the hvm_flush_guest_tlbs() wrapper,
> naming the constant here then after hvm_asid_flush_core(), e.g.
> FLUSH_ASID_CORE?

I'm not opposed to renaming. The comment before the definition was
also meant to clarify it's usage, and hence the explicit mention of
ASID/VPID.

> I also think this constant might better be zero when !HVM.

Yes, that's a good idea.

Thanks, Roger.
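
The "tickle, don't flush" behaviour described above can be captured in a toy
model (this is not Xen's actual hvm_asid code; the struct layout and names
are illustrative, and wraparound handling is omitted):

#include <stdbool.h>
#include <stdint.h>

struct pcpu_asid_state {
    uint64_t core_generation;   /* bumped by the "flush" */
    uint32_t next_asid;
};

struct vcpu_asid_state {
    uint64_t generation;        /* generation the vCPU's ASID belongs to */
    uint32_t asid;
};

/* The "flush": nothing is invalidated here.  Existing TLB entries merely
 * become unreachable, because no vCPU will run with an old-generation
 * ASID/VPID again. */
static void demo_asid_flush_core(struct pcpu_asid_state *core)
{
    core->core_generation++;
}

/* On vmentry: a stale generation forces a fresh ASID, so the vCPU resumes
 * with an effectively clean guest TLB.  Returns true if a new ASID was
 * assigned. */
static bool demo_asid_handle_vmentry(struct pcpu_asid_state *core,
                                     struct vcpu_asid_state *v)
{
    if ( v->generation == core->core_generation )
        return false;

    v->asid = ++core->next_asid;
    v->generation = core->core_generation;
    return true;
}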

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xev-devel][PATCH][linux-4.19] swiotlb-xen.c: Fixed cache invalidation fault

2020-02-28 Thread Nataliya Korovkina
Problem: xen_swiotlb_map_sg_attrs() maps a set of buffers described in a
scatterlist. It calls xen_dma_map_page() and sets sglist.dma_address to the
address calculated by xen_phys_to_bus(). Let's call it dma_address_1.

xen_dma_map_page() maps the area and gets a dma address different from
dma_address_1, let's call it dma_address_2. Then the function
__swiotlb_map_page() makes the conversion dma_addr -> phys_addr -> virt_addr
using dma_address_2 and passes "virt_addr" to __dma_map_area(), which
successfully invalidates the cache.

When xen_swiotlb_unmap_sg_attrs() unmaps the area it passes dma_address_1
to __swiotlb_unmap_page().  __swiotlb_unmap_page() makes the same conversion
dma_addr -> phys_addr -> virt_addr using dma_address_1 and passes "virt_addr"
to __dma_map_area(), which gets a Data Abort in an attempt to invalidate the
cache.

The problem manifests on devices that have dma_pfn_offset != 0.
Specifically, while loading the brcmfmac module (BCM4345 WiFi chip) in Dom0.

What was done: sglist.dma_address is set to phys_to_dma("phys").

Signed-off-by: Nataliya Korovkina 
---
 drivers/xen/swiotlb-xen.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index a6f9ba85dc4b..9363a52e4756 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -583,7 +583,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
sg->length,
dir,
attrs);
-   sg->dma_address = dev_addr;
+   sg->dma_address = phys_to_dma(hwdev, dev_addr);
} else {
/* we are not interested in the dma_addr returned by
 * xen_dma_map_page, only in the potential cache flushes executed
@@ -594,7 +594,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
sg->length,
dir,
attrs);
-   sg->dma_address = dev_addr;
+   sg->dma_address = phys_to_dma(hwdev, dev_addr);
}
sg_dma_len(sg) = sg->length;
}
-- 
2.17.1
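
To make the dma_pfn_offset mismatch concrete, a self-contained toy follows.
The offset arithmetic mirrors the arm64 __phys_to_dma()/__dma_to_phys() pair,
but the device, the addresses and the offset value are made-up assumptions
for illustration:

#include <stdint.h>
#include <stdio.h>

#define DEMO_PAGE_SHIFT 12

struct demo_dev { uint64_t dma_pfn_offset; };

static uint64_t demo_phys_to_dma(const struct demo_dev *d, uint64_t phys)
{
    return phys - (d->dma_pfn_offset << DEMO_PAGE_SHIFT);
}

static uint64_t demo_dma_to_phys(const struct demo_dev *d, uint64_t dma)
{
    return dma + (d->dma_pfn_offset << DEMO_PAGE_SHIFT);
}

int main(void)
{
    struct demo_dev d = { .dma_pfn_offset = 0x80000 };  /* 2 GiB offset */
    uint64_t phys = 0x140000000ULL;                     /* buffer address */

    /* dma_address_1: physical address stored as-is (the bug). */
    uint64_t dma1 = phys;
    /* dma_address_2: offset applied, as phys_to_dma() would do (the fix). */
    uint64_t dma2 = demo_phys_to_dma(&d, phys);

    /* Unmap converts dma -> phys; only dma2 round-trips to the buffer. */
    printf("from dma_address_1: %#llx (wrong, faults on cache maintenance)\n",
           (unsigned long long)demo_dma_to_phys(&d, dma1));
    printf("from dma_address_2: %#llx (correct)\n",
           (unsigned long long)demo_dma_to_phys(&d, dma2));
    return 0;
}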


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush

2020-02-28 Thread Jan Beulich
On 28.02.2020 17:31, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 02:58:42PM +0100, Jan Beulich wrote:
>> On 19.02.2020 18:43, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/mm/hap/hap.c
>>> +++ b/xen/arch/x86/mm/hap/hap.c
>>> @@ -669,32 +669,28 @@ static void hap_update_cr3(struct vcpu *v, int 
>>> do_locking, bool noflush)
>>>  hvm_update_guest_cr3(v, noflush);
>>>  }
>>>  
>>> +/*
>>> + * NB: doesn't actually perform any flush, used just to clear the CPU from 
>>> the
>>> + * mask and hence signal that the guest TLB flush has been done.
>>> + */
>>
>> "has been done" isn't correct, as the flush may happen only later
>> on (upon next VM entry). I think wording here needs to be as
>> precise as possible, however the comment may turn out unnecessary
>> altogether:
> 
> What about:
> 
> /*
>  * NB: Dummy function exclusively used as a way to trigger a vmexit,
>  * and thus force an ASID/VPID update on vmentry (that will flush the
>  * cache).
>  */

Once it's empty - yes, looks okay (with s/cache/TLB/).

>>> @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, 
>>> struct vcpu *v),
>>>  if ( !flush_vcpu(ctxt, v) )
>>>  continue;
>>>  
>>> -paging_update_cr3(v, false);
>>> +hvm_asid_flush_vcpu(v);
>>>  
>>>  cpu = read_atomic(&v->dirty_cpu);
>>> -if ( is_vcpu_dirty_cpu(cpu) )
>>> +if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
>>>  __cpumask_set_cpu(cpu, mask);
>>>  }
>>>  
>>> -/* Flush TLBs on all CPUs with dirty vcpu state. */
>>> -flush_tlb_mask(mask);
>>> -
>>> -/* Done. */
>>> -for_each_vcpu ( d, v )
>>> -if ( v != current && flush_vcpu(ctxt, v) )
>>> -vcpu_unpause(v);
>>> +/*
>>> + * Trigger a vmexit on all pCPUs with dirty vCPU state in order to 
>>> force an
>>> + * ASID/VPID change and hence accomplish a guest TLB flush. Note that 
>>> vCPUs
>>> + * not currently running will already be flushed when scheduled 
>>> because of
>>> + * the ASID tickle done in the loop above.
>>> + */
>>> +on_selected_cpus(mask, handle_flush, mask, 0);
>>> +while ( !cpumask_empty(mask) )
>>> +cpu_relax();
>>
>> Why do you pass 0 as last argument? If you passed 1, you wouldn't
>> need the while() here, handle_flush() could be empty, and you'd
>> save a perhaps large amount of CPUs to all try to modify two
>> cache lines (the one used by on_selected_cpus() itself and the
>> one here) instead of just one. Yes, latency from the last CPU
>> clearing its bit to you being able to exit from here may be a
>> little larger this way, but overall latency may shrink with the
>> cut by half amount of atomic ops issued to the bus.
> 
> In fact I think passing 0 as the last argument is fine, and
> handle_flush can be empty in that case anyway. We don't really care
> whether on_selected_cpus returns before all CPUs have executed the
> dummy function, as long as all of them have taken a vmexit. Using 0
> already guarantees that AFAICT.

Isn't it that the caller wants to be guaranteed that the flush
has actually occurred (or at least that no further insns can
be executed prior to the flush happening, i.e. at least the VM
exit having occurred)?

>> (Forcing an empty function to be called may seem odd, but sending
>>  just some IPI [like the event check one] wouldn't be enough, as
>>  you want to be sure the other side has actually received the
>>  request.)
> 
> Exactly, that's the reason for the empty function.

But the function isn't empty.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 2/7] x86/paging: add TLB flush hooks

2020-02-28 Thread Jan Beulich
On 28.02.2020 17:19, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 03:50:31PM +0100, Jan Beulich wrote:
>> On 19.02.2020 18:43, Roger Pau Monne wrote:
>>> Add shadow and hap implementation specific helpers to perform guest
>>> TLB flushes. Note that the code for both is exactly the same at the
>>> moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by
>>> further patches that will add implementation specific optimizations to
>>> them.
>>>
>>> No functional change intended.
>>>
>>> Signed-off-by: Roger Pau Monné 
>>> Reviewed-by: Wei Liu 
>>> Acked-by: Tim Deegan 
>>
>> This looks good in principle, with one possible anomaly:
>>
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -3990,55 +3990,10 @@ static void hvm_s3_resume(struct domain *d)
>>>  bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
>>>  void *ctxt)
>>>  {
>>> -static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
>>> -cpumask_t *mask = &this_cpu(flush_cpumask);
>>> -struct domain *d = current->domain;
>>> -struct vcpu *v;
>>> -
>>> -/* Avoid deadlock if more than one vcpu tries this at the same time. */
>>> -if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
>>> -return false;
>>> -
>>> -/* Pause all other vcpus. */
>>> -for_each_vcpu ( d, v )
>>> -if ( v != current && flush_vcpu(ctxt, v) )
>>> -vcpu_pause_nosync(v);
>>> -
>>> -/* Now that all VCPUs are signalled to deschedule, we wait... */
>>> -for_each_vcpu ( d, v )
>>> -if ( v != current && flush_vcpu(ctxt, v) )
>>> -while ( !vcpu_runnable(v) && v->is_running )
>>> -cpu_relax();
>>> -
>>> -/* All other vcpus are paused, safe to unlock now. */
>>> -spin_unlock(&d->hypercall_deadlock_mutex);
>>> -
>>> -cpumask_clear(mask);
>>> -
>>> -/* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). 
>>> */
>>> -for_each_vcpu ( d, v )
>>> -{
>>> -unsigned int cpu;
>>> -
>>> -if ( !flush_vcpu(ctxt, v) )
>>> -continue;
>>> -
>>> -paging_update_cr3(v, false);
>>> +struct domain *currd = current->domain;
>>>  
>>> -cpu = read_atomic(&v->dirty_cpu);
>>> -if ( is_vcpu_dirty_cpu(cpu) )
>>> -__cpumask_set_cpu(cpu, mask);
>>> -}
>>> -
>>> -/* Flush TLBs on all CPUs with dirty vcpu state. */
>>> -flush_tlb_mask(mask);
>>> -
>>> -/* Done. */
>>> -for_each_vcpu ( d, v )
>>> -if ( v != current && flush_vcpu(ctxt, v) )
>>> -vcpu_unpause(v);
>>> -
>>> -return true;
>>> +return shadow_mode_enabled(currd) ? shadow_flush_tlb(flush_vcpu, ctxt)
>>> +  : hap_flush_tlb(flush_vcpu, ctxt);
>>>  }
>>
>> Following our current model I think this should be a new pointer
>> in struct paging_mode (then truly fitting "hooks" in the title).
> 
> I tried doing it that way, but there was something weird about it, the
> paging mode is per-vcpu, and hence I needed to do something like:
> 
> paging_get_hostmode(current)->flush(current->domain, ...)

I don't see anything wrong with the left side of the -> (it
parallels what is needed for the write_p2m_entry() hook). For
the right I can't see why you'd want to have current->domain
there when both functions want flush_vcpu and ctxt. Ultimately
we probably want per-vCPU and per-domain hooks in separate
structures (the former for hooks where the current paging mode
matters, the latter for those where it doesn't matter), but of
course that's nothing I'm meaning to ask you to do.

> I can try to move it to being a paging_mode hook if you prefer.

It would seem cleaner to me, but ...

>> I can see the desire to avoid the indirect call though, but I
>> also think that if we were to go that route, we should settle on
>> switching around others as well which are paging mode dependent.
>> (FAOD this is nothing I ask you to do here.) Andrew, thoughts?

... as said I'd prefer to also know Andrew's opinion, in
particular to settle where we would want this to move in the
mid to long term. Whereas ...

> I think it's already quite of a mixed bag, see track_dirty_vram for
> example which uses a similar model.

... you probably know my typical response to something like
this: Bad examples aren't to be taken as excuse to introduce
further inconsistencies.

Jan
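
For illustration, the per-mode hook under discussion would have roughly this
shape (a sketch of the dispatch only; the struct and field names are
hypothetical and the real struct paging_mode contents are elided):

struct vcpu;

struct demo_paging_mode {
    /* ... existing hooks (page_fault, update_cr3, ...) ... */
    bool (*flush_tlb)(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
                      void *ctxt);
};

/* Call site, paralleling paging_get_hostmode(current)->...: one indirect
 * call per flush, versus the direct shadow/hap dispatch in the patch. */
static inline bool demo_paging_flush_tlb(const struct demo_paging_mode *mode,
                                         bool (*flush_vcpu)(void *ctxt,
                                                            struct vcpu *v),
                                         void *ctxt)
{
    return mode->flush_tlb(flush_vcpu, ctxt);
}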

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush

2020-02-28 Thread Roger Pau Monné
On Fri, Feb 28, 2020 at 02:58:42PM +0100, Jan Beulich wrote:
> On 19.02.2020 18:43, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/mm/hap/hap.c
> > +++ b/xen/arch/x86/mm/hap/hap.c
> > @@ -669,32 +669,28 @@ static void hap_update_cr3(struct vcpu *v, int 
> > do_locking, bool noflush)
> >  hvm_update_guest_cr3(v, noflush);
> >  }
> >  
> > +/*
> > + * NB: doesn't actually perform any flush, used just to clear the CPU from 
> > the
> > + * mask and hence signal that the guest TLB flush has been done.
> > + */
> 
> "has been done" isn't correct, as the flush may happen only later
> on (upon next VM entry). I think wording here needs to be as
> precise as possible, however the comment may turn out unnecessary
> altogether:

What about:

/*
 * NB: Dummy function exclusively used as a way to trigger a vmexit,
 * and thus force an ASID/VPID update on vmentry (that will flush the
 * cache).
 */

> > @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, 
> > struct vcpu *v),
> >  if ( !flush_vcpu(ctxt, v) )
> >  continue;
> >  
> > -paging_update_cr3(v, false);
> > +hvm_asid_flush_vcpu(v);
> >  
> >  cpu = read_atomic(&v->dirty_cpu);
> > -if ( is_vcpu_dirty_cpu(cpu) )
> > +if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
> >  __cpumask_set_cpu(cpu, mask);
> >  }
> >  
> > -/* Flush TLBs on all CPUs with dirty vcpu state. */
> > -flush_tlb_mask(mask);
> > -
> > -/* Done. */
> > -for_each_vcpu ( d, v )
> > -if ( v != current && flush_vcpu(ctxt, v) )
> > -vcpu_unpause(v);
> > +/*
> > + * Trigger a vmexit on all pCPUs with dirty vCPU state in order to 
> > force an
> > + * ASID/VPID change and hence accomplish a guest TLB flush. Note that 
> > vCPUs
> > + * not currently running will already be flushed when scheduled 
> > because of
> > + * the ASID tickle done in the loop above.
> > + */
> > +on_selected_cpus(mask, handle_flush, mask, 0);
> > +while ( !cpumask_empty(mask) )
> > +cpu_relax();
> 
> Why do you pass 0 as last argument? If you passed 1, you wouldn't
> need the while() here, handle_flush() could be empty, and you'd
> save a perhaps large amount of CPUs to all try to modify two
> cache lines (the one used by on_selected_cpus() itself and the
> one here) instead of just one. Yes, latency from the last CPU
> clearing its bit to you being able to exit from here may be a
> little larger this way, but overall latency may shrink with the
> cut by half amount of atomic ops issued to the bus.

In fact I think passing 0 as the last argument is fine, and
handle_flush can be empty in that case anyway. We don't really care
whether on_selected_cpus returns before all CPUs have executed the
dummy function, as long as all of them have taken a vmexit. Using 0
already guarantees that AFAICT.

> (Forcing an empty function to be called may seem odd, but sending
>  just some IPI [like the event check one] wouldn't be enough, as
>  you want to be sure the other side has actually received the
>  request.)

Exactly, that's the reason for the empty function.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 6/7] xen/guest: prepare hypervisor ops to use alternative calls

2020-02-28 Thread Jan Beulich
On 19.02.2020 18:43, Roger Pau Monne wrote:
> --- a/xen/arch/x86/guest/hyperv/hyperv.c
> +++ b/xen/arch/x86/guest/hyperv/hyperv.c
> @@ -199,7 +199,7 @@ static void __init e820_fixup(struct e820map *e820)
>  panic("Unable to reserve Hyper-V hypercall range\n");
>  }
>  
> -static const struct hypervisor_ops ops = {
> +static const struct hypervisor_ops __initdata ops = {

This needs to be __initconstrel in order to avoid triggering
(possibly only in the future) section mismatch warnings with
at least some gcc versions. With this and the other instance
adjusted
Reviewed-by: Jan Beulich 

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 5/7] x86/tlb: allow disabling the TLB clock

2020-02-28 Thread Jan Beulich
On 19.02.2020 18:43, Roger Pau Monne wrote:
> The TLB clock is helpful when running Xen on bare metal because when
> doing a TLB flush each CPU is IPI'ed and can keep a timestamp of the
> last flush.
> 
> This is not the case however when Xen is running virtualized, and the
> underlying hypervisor provides mechanisms to assist in performing TLB
> flushes: Xen itself for example offers a HVMOP_flush_tlbs hypercall in
> order to perform a TLB flush without having to IPI each CPU. When
> using such mechanisms it's no longer possible to keep a timestamp of
> the flushes on each CPU, as they are performed by the underlying
> hypervisor.
> 
> Offer a boolean in order to signal Xen that the timestamped TLB
> shouldn't be used. This avoids keeping the timestamps of the flushes,
> and also forces NEED_FLUSH to always return true.
> 
> No functional change intended, as this change doesn't introduce any
> user that disables the timestamped TLB.
> 
> Signed-off-by: Roger Pau Monné 
> Reviewed-by: Wei Liu 

Acked-by: Jan Beulich 
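
A toy rendering of the trade-off (illustrative only: Xen's real NEED_FLUSH()
also has to cope with clock wraparound, and the names below are made up):

#include <stdbool.h>
#include <stdint.h>

static bool demo_tlb_clk_enabled = true;

/* With the clock on, a page whose last potential TLB residency predates the
 * CPU's last flush cannot still be cached, so the flush can be skipped.
 * With hypervisor-assisted flushes the per-CPU timestamps cannot be
 * maintained, so the check degenerates to "always flush". */
static bool demo_need_flush(uint64_t cpu_last_flush, uint64_t page_stamp)
{
    if ( !demo_tlb_clk_enabled )
        return true;

    return page_stamp >= cpu_last_flush;
}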


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current

2020-02-28 Thread Roger Pau Monné
On Fri, Feb 28, 2020 at 04:47:57PM +0100, Jan Beulich wrote:
> On 28.02.2020 16:27, Roger Pau Monné wrote:
> > On Fri, Feb 28, 2020 at 02:29:09PM +0100, Jan Beulich wrote:
> >> On 19.02.2020 18:43, Roger Pau Monne wrote:
> >>> Current implementation of hvm_asid_flush_vcpu is not safe to use
> >>> unless the target vCPU is either paused or the currently running one,
> >>> as it modifies the generation without any locking.
> >>
> >> Indeed, but the issue you're taking care of is highly theoretical:
> >> I don't think any sane compiler will split writes of the fields
> >> to multiple insns. It would be nice if this was made clear here.
> > 
> > What about adding:
> > 
> >>> Fix this by using atomic operations when accessing the generation
> >>> field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This
> >>> allows to safely flush the current ASID generation. Note that for the
> >>> flush to take effect if the vCPU is currently running a vmexit is
> >>> required.
> > 
> > "Most compilers will already do such writes and reads as a single
> > instruction, so the usage of atomic operations is mostly used as a
> > safety measure."
> > 
> > Here?
> 
> Could you perhaps start with "Compilers will normally ..." I'm fine
> with the rest, it's just that "most compilers" still feels like
> an understatement.

Sure, that's fine.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 2/7] x86/paging: add TLB flush hooks

2020-02-28 Thread Roger Pau Monné
On Fri, Feb 28, 2020 at 03:50:31PM +0100, Jan Beulich wrote:
> On 19.02.2020 18:43, Roger Pau Monne wrote:
> > Add shadow and hap implementation specific helpers to perform guest
> > TLB flushes. Note that the code for both is exactly the same at the
> > moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by
> > further patches that will add implementation specific optimizations to
> > them.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Roger Pau Monné 
> > Reviewed-by: Wei Liu 
> > Acked-by: Tim Deegan 
> 
> This looks good in principle, with one possible anomaly:
> 
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -3990,55 +3990,10 @@ static void hvm_s3_resume(struct domain *d)
> >  bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
> >  void *ctxt)
> >  {
> > -static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
> > -cpumask_t *mask = &this_cpu(flush_cpumask);
> > -struct domain *d = current->domain;
> > -struct vcpu *v;
> > -
> > -/* Avoid deadlock if more than one vcpu tries this at the same time. */
> > -if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
> > -return false;
> > -
> > -/* Pause all other vcpus. */
> > -for_each_vcpu ( d, v )
> > -if ( v != current && flush_vcpu(ctxt, v) )
> > -vcpu_pause_nosync(v);
> > -
> > -/* Now that all VCPUs are signalled to deschedule, we wait... */
> > -for_each_vcpu ( d, v )
> > -if ( v != current && flush_vcpu(ctxt, v) )
> > -while ( !vcpu_runnable(v) && v->is_running )
> > -cpu_relax();
> > -
> > -/* All other vcpus are paused, safe to unlock now. */
> > -spin_unlock(&d->hypercall_deadlock_mutex);
> > -
> > -cpumask_clear(mask);
> > -
> > -/* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). 
> > */
> > -for_each_vcpu ( d, v )
> > -{
> > -unsigned int cpu;
> > -
> > -if ( !flush_vcpu(ctxt, v) )
> > -continue;
> > -
> > -paging_update_cr3(v, false);
> > +struct domain *currd = current->domain;
> >  
> > -cpu = read_atomic(&v->dirty_cpu);
> > -if ( is_vcpu_dirty_cpu(cpu) )
> > -__cpumask_set_cpu(cpu, mask);
> > -}
> > -
> > -/* Flush TLBs on all CPUs with dirty vcpu state. */
> > -flush_tlb_mask(mask);
> > -
> > -/* Done. */
> > -for_each_vcpu ( d, v )
> > -if ( v != current && flush_vcpu(ctxt, v) )
> > -vcpu_unpause(v);
> > -
> > -return true;
> > +return shadow_mode_enabled(currd) ? shadow_flush_tlb(flush_vcpu, ctxt)
> > +  : hap_flush_tlb(flush_vcpu, ctxt);
> >  }
> 
> Following our current model I think this should be a new pointer
> in struct paging_mode (then truly fitting "hooks" in the title).

I tried doing it that way, but there was something weird about it, the
paging mode is per-vcpu, and hence I needed to do something like:

paging_get_hostmode(current)->flush(current->domain, ...)

I can try to move it to being a paging_mode hook if you prefer.

> I can see the desire to avoid the indirect call though, but I
> also think that if we were to go that route, we should settle on
> switching around others as well which are paging mode dependent.
> (FAOD this is nothing I ask you to do here.) Andrew, thoughts?

I think it's already quite of a mixed bag, see track_dirty_vram for
example which uses a similar model.

Thanks, Roger.


Re: [Xen-devel] [PATCH v5 4/7] x86/tlb: introduce a flush guests TLB flag

2020-02-28 Thread Jan Beulich
On 19.02.2020 18:43, Roger Pau Monne wrote:
> Introduce a specific flag to request a HVM guest TLB flush, which is
> an ASID/VPID tickle that forces a linear TLB flush for all HVM guests.

Here and below, what do you mean by "linear"? I guess you mean
TLBs holding translations from guest linear to guest physical,
but I think this could do with then also saying so, even if it's
more words.

> This was previously unconditionally done in each pre_flush call, but
> that's not required: HVM guests not using shadow don't require linear
> TLB flushes as Xen doesn't modify the guest page tables in that case
> (ie: when using HAP).

This explains the correctness in one direction. What about the
removal of this from the switch_cr3_cr4() path? And what about
our safety assumptions from the ticking of tlbflush_clock,
where we then imply that pages e.g. about to be freed can't
have any translations left in any TLBs anymore?

> --- a/xen/include/asm-x86/flushtlb.h
> +++ b/xen/include/asm-x86/flushtlb.h
> @@ -105,6 +105,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
>  #define FLUSH_VCPU_STATE 0x1000
>   /* Flush the per-cpu root page table */
>  #define FLUSH_ROOT_PGTBL 0x2000
> + /* Flush all HVM guests linear TLB (using ASID/VPID) */
> +#define FLUSH_GUESTS_TLB 0x4000

For one, the "all" is pretty misleading. A single such request
doesn't do this for all vCPU-s of all HVM guests, does it? I'm
also struggling with the 'S' in "GUESTS" - why is it not just
"GUEST"? I admit the names of the involved functions
(hvm_flush_guest_tlbs(), hvm_asid_flush_core()) are somewhat
misleading, as they don't actually do any flushing, they merely
arrange for what is in the TLB to no longer be able to be used,
so giving this a suitable name is "historically" complicated.
What if we did away with the hvm_flush_guest_tlbs() wrapper,
naming the constant here then after hvm_asid_flush_core(), e.g.
FLUSH_ASID_CORE?

I also think this constant might better be zero when !HVM.
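
I.e. something along these lines (untested sketch, using the name
suggested above):

    #ifdef CONFIG_HVM
    # define FLUSH_ASID_CORE 0x4000
    #else
    # define FLUSH_ASID_CORE 0
    #endif

such that the flag folds away entirely in !HVM builds.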

Jan


Re: [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current

2020-02-28 Thread Jan Beulich
On 28.02.2020 16:27, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 02:29:09PM +0100, Jan Beulich wrote:
>> On 19.02.2020 18:43, Roger Pau Monne wrote:
>>> Current implementation of hvm_asid_flush_vcpu is not safe to use
>>> unless the target vCPU is either paused or the currently running one,
>>> as it modifies the generation without any locking.
>>
>> Indeed, but the issue you're taking care of is highly theoretical:
>> I don't think any sane compiler will split writes of the fields
>> to multiple insns. It would be nice if this was made clear here.
> 
> What about adding:
> 
>>> Fix this by using atomic operations when accessing the generation
>>> field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This
>>> allows to safely flush the current ASID generation. Note that for the
>>> flush to take effect if the vCPU is currently running a vmexit is
>>> required.
> 
> "Most compilers will already do such writes and reads as a single
> instruction, so the usage of atomic operations is mostly used as a
> safety measure."
> 
> Here?

Could you perhaps start with "Compilers will normally ..." I'm fine
with the rest, it's just that "most compilers" still feels like
an understatement.

Jan


Re: [Xen-devel] [PATCH] x86/ioperm: add new paravirt function update_io_bitmap

2020-02-28 Thread Jürgen Groß

Friendly ping...

On 18.02.20 16:47, Juergen Gross wrote:

Commit 111e7b15cf10f6 ("x86/ioperm: Extend IOPL config to control
ioperm() as well") reworked the iopl syscall to use I/O bitmaps.

Unfortunately this broke Xen PV domains using that syscall as there
is currently no I/O bitmap support in PV domains.

Add I/O bitmap support via a new paravirt function update_io_bitmap
which Xen PV domains can use to update their I/O bitmaps via a
hypercall.

Fixes: 111e7b15cf10f6 ("x86/ioperm: Extend IOPL config to control ioperm() as 
well")
Reported-by: Jan Beulich 
Cc:  # 5.5
Signed-off-by: Juergen Gross 
Reviewed-by: Jan Beulich 
Tested-by: Jan Beulich 
---
  arch/x86/include/asm/io_bitmap.h  |  9 -
  arch/x86/include/asm/paravirt.h   |  7 +++
  arch/x86/include/asm/paravirt_types.h |  4 
  arch/x86/kernel/paravirt.c|  5 +
  arch/x86/kernel/process.c |  2 +-
  arch/x86/xen/enlighten_pv.c   | 25 +
  6 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/io_bitmap.h b/arch/x86/include/asm/io_bitmap.h
index 02c6ef8f7667..07344d82e88e 100644
--- a/arch/x86/include/asm/io_bitmap.h
+++ b/arch/x86/include/asm/io_bitmap.h
@@ -19,7 +19,14 @@ struct task_struct;
  void io_bitmap_share(struct task_struct *tsk);
  void io_bitmap_exit(void);
  
-void tss_update_io_bitmap(void);
+void native_tss_update_io_bitmap(void);
+
+#ifdef CONFIG_PARAVIRT_XXL
+#include 
+#else
+#define tss_update_io_bitmap native_tss_update_io_bitmap
+#endif
+
  #else
  static inline void io_bitmap_share(struct task_struct *tsk) { }
  static inline void io_bitmap_exit(void) { }
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 86e7317eb31f..694d8daf4983 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -295,6 +295,13 @@ static inline void write_idt_entry(gate_desc *dt, int 
entry, const gate_desc *g)
PVOP_VCALL3(cpu.write_idt_entry, dt, entry, g);
  }
  
+#ifdef CONFIG_X86_IOPL_IOPERM
+static inline void tss_update_io_bitmap(void)
+{
+   PVOP_VCALL0(cpu.update_io_bitmap);
+}
+#endif
+
  static inline void paravirt_activate_mm(struct mm_struct *prev,
struct mm_struct *next)
  {
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 84812964d3dd..732f62e04ddb 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -140,6 +140,10 @@ struct pv_cpu_ops {
  
  	void (*load_sp0)(unsigned long sp0);
  
+#ifdef CONFIG_X86_IOPL_IOPERM
+   void (*update_io_bitmap)(void);
+#endif
+
void (*wbinvd)(void);
  
  	/* cpuid emulation, mostly so that caps bits can be disabled */

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 789f5e4f89de..c131ba4e70ef 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -30,6 +30,7 @@
  #include 
  #include 
  #include 
+#include 
  
  /*
   * nop stub, which must not clobber anything *including the stack* to
@@ -341,6 +342,10 @@ struct paravirt_patch_template pv_ops = {
.cpu.iret   = native_iret,
.cpu.swapgs = native_swapgs,
  
+#ifdef CONFIG_X86_IOPL_IOPERM
+   .cpu.update_io_bitmap   = native_tss_update_io_bitmap,
+#endif
+
.cpu.start_context_switch   = paravirt_nop,
.cpu.end_context_switch = paravirt_nop,
  
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 839b5244e3b7..3053c85e0e42 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -374,7 +374,7 @@ static void tss_copy_io_bitmap(struct tss_struct *tss, 
struct io_bitmap *iobm)
  /**
   * tss_update_io_bitmap - Update I/O bitmap before exiting to usermode
   */
-void tss_update_io_bitmap(void)
+void native_tss_update_io_bitmap(void)
  {
struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);
	struct thread_struct *t = &current->thread;
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 1f756e8b..feaf2e68ee5c 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -72,6 +72,9 @@
  #include 
  #include 
  #include 
+#ifdef CONFIG_X86_IOPL_IOPERM
+#include 
+#endif
  
  #ifdef CONFIG_ACPI
  #include 
@@ -837,6 +840,25 @@ static void xen_load_sp0(unsigned long sp0)
this_cpu_write(cpu_tss_rw.x86_tss.sp0, sp0);
  }
  
+#ifdef CONFIG_X86_IOPL_IOPERM
+static void xen_update_io_bitmap(void)
+{
+   struct physdev_set_iobitmap iobitmap;
+   struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);
+
+   native_tss_update_io_bitmap();
+
+   iobitmap.bitmap = (uint8_t *)(&tss->x86_tss) +
+ tss->x86_tss.io_bitmap_base;
+   if (tss->x86_tss.io_bitmap_base == IO_BITMAP_OFFSET_INVALID)
+   iobitmap.nr_ports = 0;
+   else
+   iobitmap.nr_ports = IO_BITMAP_BITS;
+
+ 

Re: [Xen-devel] [PATCH] x86/mm: fix dump_pagetables with Xen PV

2020-02-28 Thread Jürgen Groß

Friendly ping...

On 21.02.20 11:38, Juergen Gross wrote:

Commit 2ae27137b2db89 ("x86: mm: convert dump_pagetables to use
walk_page_range") broke Xen PV guests as the hypervisor reserved hole
in the memory map was not taken into account.

Fix that by starting the kernel range only at GUARD_HOLE_END_ADDR.

Fixes: 2ae27137b2db89 ("x86: mm: convert dump_pagetables to use 
walk_page_range")
Reported-by: Julien Grall 
Signed-off-by: Juergen Gross 
---
  arch/x86/mm/dump_pagetables.c | 7 +--
  1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 64229dad7eab..69309cd56fdf 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -363,13 +363,8 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m,
  {
const struct ptdump_range ptdump_ranges[] = {
  #ifdef CONFIG_X86_64
-
-#define normalize_addr_shift (64 - (__VIRTUAL_MASK_SHIFT + 1))
-#define normalize_addr(u) ((signed long)((u) << normalize_addr_shift) >> \
-  normalize_addr_shift)
-
{0, PTRS_PER_PGD * PGD_LEVEL_MULT / 2},
-   {normalize_addr(PTRS_PER_PGD * PGD_LEVEL_MULT / 2), ~0UL},
+   {GUARD_HOLE_END_ADDR, ~0UL},
  #else
{0, ~0UL},
  #endif





Re: [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current

2020-02-28 Thread Roger Pau Monné
On Fri, Feb 28, 2020 at 02:29:09PM +0100, Jan Beulich wrote:
> On 19.02.2020 18:43, Roger Pau Monne wrote:
> > Current implementation of hvm_asid_flush_vcpu is not safe to use
> > unless the target vCPU is either paused or the currently running one,
> > as it modifies the generation without any locking.
> 
> Indeed, but the issue you're taking care of is highly theoretical:
> I don't think any sane compiler will split writes of the fields
> to multiple insns. It would be nice if this was made clear here.

What about adding:

> > Fix this by using atomic operations when accessing the generation
> > field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This
> > allows to safely flush the current ASID generation. Note that for the
> > flush to take effect if the vCPU is currently running a vmexit is
> > required.

"Most compilers will already do such writes and reads as a single
instruction, so the usage of atomic operations is mostly used as a
safety measure."

Here?

> > Note the same could be achieved by introducing an extra field to
> > hvm_vcpu_asid that signals hvm_asid_handle_vmenter the need to call
> > hvm_asid_flush_vcpu on the given vCPU before vmentry, this however
> > seems unnecessary as hvm_asid_flush_vcpu itself only sets two vCPU
> > fields to 0, so there's no need to delay this to the vmentry ASID
> > helper.
> > 
> > This is not a bugfix as no callers that would violate the assumptions
> > listed in the first paragraph have been found, but a preparatory
> > change in order to allow remote flushing of HVM vCPUs.
> > 
> > Signed-off-by: Roger Pau Monné 
> > Reviewed-by: Wei Liu 
> 
> With a suitable clarification added
> Acked-by: Jan Beulich 

Thanks, Roger.


Re: [Xen-devel] [PATCH v2 2/2] AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode

2020-02-28 Thread Roger Pau Monné
On Fri, Feb 28, 2020 at 01:39:59PM +0100, Jan Beulich wrote:
> On 28.02.2020 13:31, Roger Pau Monné wrote:
> > On Fri, Feb 28, 2020 at 01:12:03PM +0100, Jan Beulich wrote:
> >> --- a/xen/arch/x86/genapic/x2apic.c
> >> +++ b/xen/arch/x86/genapic/x2apic.c
> >> @@ -236,12 +236,21 @@ const struct genapic *__init apic_x2apic
> >>  x2apic_phys = !iommu_intremap ||
> >>(acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL);
> >>  }
> >> -else if ( !x2apic_phys && !iommu_intremap )
> >> -{
> >> -printk("WARNING: x2APIC cluster mode is not supported without 
> >> interrupt remapping\n"
> >> -   "x2APIC: forcing phys mode\n");
> >> -x2apic_phys = true;
> >> -}
> >> +else if ( !x2apic_phys )
> >> +switch ( iommu_intremap )
> >> +{
> >> +case iommu_intremap_off:
> >> +case iommu_intremap_restricted:
> >> +printk("WARNING: x2APIC cluster mode is not supported %s 
> >> interrupt remapping\n"
> >> +   "x2APIC: forcing phys mode\n",
> >> +   iommu_intremap == iommu_intremap_off ? "without"
> >> +: "with 
> >> restricted");
> >> +x2apic_phys = true;
> > 
> > I think you also need to fixup the usage of iommu_intremap in __cpu_up
> > so that CPUs with APIC IDs > 255 are not brought up when in
> > iommu_intremap_restricted mode.
> 
> That certainly wants changing, yes, but I view this as an orthogonal
> adjustment, which I'd like to make only once I understand what the
> behavior for APIC ID 0xff should be in this setup.

I would say APIC ID 0xff should be the broadcast ID, or else remapped
interrupts won't be able to use a broadcast destination? I'm however
not able to find any mention of this in the AMD-Vi spec.

So the check in __cpu_up should be adjusted to iommu_intremap !=
iommu_intremap_full I think, or else you won't be able to address the
CPUs brought up from the interrupt remapping tables.
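
I.e. something like this in __cpu_up (untested sketch; exact placement
and error code would need confirming):

    unsigned int apicid = x86_cpu_to_apicid[cpu];

    if ( (apicid >> 8) && iommu_intremap != iommu_intremap_full )
    {
        printk("Cannot bring up CPU%u (APIC ID %#x) without full "
               "interrupt remapping\n", cpu, apicid);
        return -EIO;
    }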

Anyway, the change looks fine, so:

Reviewed-by: Roger Pau Monné 

Thanks, Roger.


Re: [Xen-devel] [PATCH v2 2/2] AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode

2020-02-28 Thread Jan Beulich
On 28.02.2020 15:48, Andrew Cooper wrote:
> On 28/02/2020 12:12, Jan Beulich wrote:
>> The wider cluster mode APIC IDs aren't generally representable. Convert
>> the iommu_intremap variable into a tristate, allowing the AMD IOMMU
>> driver to signal this special restriction to the apic_x2apic_probe().
>> (Note: assignments to the variable get adjusted, while existing
>> consumers - all assuming a boolean property - are left alone.)
> 
> I think it would be helpful to state that while we are not aware of any
> hardware with this as a restriction, it is a situation which can be
> created on fully x2apic-capable systems via firmware settings.

Added.

>> Signed-off-by: Jan Beulich 
>> ---
>> v2: New.
>>
>> --- a/xen/arch/x86/genapic/x2apic.c
>> +++ b/xen/arch/x86/genapic/x2apic.c
>> @@ -236,12 +236,21 @@ const struct genapic *__init apic_x2apic
>>  x2apic_phys = !iommu_intremap ||
>>(acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL);
>>  }
>> -else if ( !x2apic_phys && !iommu_intremap )
>> -{
>> -printk("WARNING: x2APIC cluster mode is not supported without 
>> interrupt remapping\n"
>> -   "x2APIC: forcing phys mode\n");
>> -x2apic_phys = true;
>> -}
>> +else if ( !x2apic_phys )
>> +switch ( iommu_intremap )
>> +{
>> +case iommu_intremap_off:
>> +case iommu_intremap_restricted:
>> +printk("WARNING: x2APIC cluster mode is not supported %s 
>> interrupt remapping\n"
>> +   "x2APIC: forcing phys mode\n",
> 
> Any chance to fold this into a single line with "- forcing phys mode\n"
> as a suffix?

I did consider doing so myself, but didn't do it then for being
an unrelated change. Now that you ask for it - done.

>> +   iommu_intremap == iommu_intremap_off ? "without"
>> +: "with 
>> restricted");
>> +x2apic_phys = true;
>> +break;
>> +
>> +case iommu_intremap_full:
>> +break;
>> +}
>>  
>>  if ( x2apic_phys )
>>  return &apic_x2apic_phys;
>> --- a/xen/include/xen/iommu.h
>> +++ b/xen/include/xen/iommu.h
>> @@ -54,7 +54,18 @@ static inline bool_t dfn_eq(dfn_t x, dfn
>>  
>>  extern bool_t iommu_enable, iommu_enabled;
>>  extern bool force_iommu, iommu_quarantine, iommu_verbose, iommu_igfx;
>> -extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
>> +extern bool_t iommu_snoop, iommu_qinval, iommu_intpost;
>> +extern enum __packed iommu_intremap {
>> +   /*
>> +* In order to allow traditional boolean uses of the iommu_intremap
>> +* variable, the "off" value has to come first (yielding a value of 
>> zero).
>> +*/
>> +   iommu_intremap_off,
>> +#ifdef CONFIG_X86
>> +   iommu_intremap_restricted,
> 
> This needs a note about its meaning.  How about this?
> 
> /* Interrupt remapping enabled, but only able to generate interrupts
> with an 8-bit APIC ID. */

Added.

> Otherwise, Reviewed-by: Andrew Cooper 

Thanks.

> Not an issue for now, but "restricted" might become ambiguous with
> future extensions.

Yes, fair point.

Jan


[Xen-devel] [ovmf test] 147686: regressions - FAIL

2020-02-28 Thread osstest service owner
flight 147686 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/147686/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 
145767
 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 145767

version targeted for testing:
 ovmf edfe16a6d9f8c6830d7ad93ee7616225fe4e9c13
baseline version:
 ovmf 70911f1f4aee0366b6122f2b90d367ec0f066beb

Last test of basis   145767  2020-01-08 00:39:09 Z   51 days
Failing since        145774  2020-01-08 02:50:20 Z   51 days  133 attempts
Testing same since   147686  2020-02-27 12:29:19 Z    1 days    1 attempts


People who touched revisions under test:
  Aaron Li 
  Albecki, Mateusz 
  Amol N Sukerkar 
  Anthony PERARD 
  Antoine Coeur 
  Ard Biesheuvel 
  Ashish Singhal 
  Bob Feng 
  Bret Barkelew 
  Brian R Haug 
  Chasel Chiu 
  Dandan Bi 
  Eric Dong 
  Fan, ZhijuX 
  Felix Polyudov 
  Guo Dong 
  GuoMinJ 
  Hao A Wu 
  Heinrich Schuchardt 
  Heng Luo 
  Jason Voelz 
  Jeff Brasen 
  Jian J Wang 
  Jiaxin Wu 
  Kinney, Michael D 
  Krzysztof Koch 
  Laszlo Ersek 
  Leif Lindholm 
  Li, Aaron 
  Liming Gao 
  Liu, Zhiguang 
  Mateusz Albecki 
  Matthew Carlson 
  Michael D Kinney 
  Michael Kubacki 
  Nicholas Armour 
  Pavana.K 
  Philippe Mathieu-Daude 
  Philippe Mathieu-Daudé 
  Pierre Gondois 
  Ray Ni 
  Sami Mujawar 
  Sean Brogan 
  Siyuan Fu 
  Siyuan, Fu 
  Star Zeng 
  Steven 
  Steven Shi 
  Sudipto Paul 
  Vitaly Cheptsov 
  Vitaly Cheptsov via Groups.Io 
  Wei6 Xu 
  Wu Jiaxin 
  Xu, Wei6 
  Zeng, Star 
  Zhichao Gao 
  Zhiguang Liu 
  Zhiju.Fan 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 fail
 test-amd64-i386-xl-qemuu-ovmf-amd64  fail



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 5772 lines long.)


Re: [Xen-devel] [PATCH v2 07/17] libxc/restore: STATIC_DATA_END inference for v2 compatibility

2020-02-28 Thread Andrew Cooper
On 24/02/2020 17:32, Ian Jackson wrote:
> Andrew Cooper writes ("[PATCH v2 07/17] libxc/restore: STATIC_DATA_END 
> inference for v2 compatibility"):
>> A v3 stream can compatibly read a v2 stream by inferring the position of the
>> STATIC_DATA_END record.
>>
>> v2 compatibility is only needed for x86.  No other architectures exist yet,
>> but they will have a minimum of v3 when introduced.
>>
>> The x86 HVM compatibility point being in handle_page_data() (which is common
>> code) is a bit awkward.  However, as the two compatibility points are subtly
>> different, and it is (intentionally) not possible to call into arch specific
>> code from common code (except via the ops hooks), use some #ifdef-ary and
>> opencode the check, rather than make handle_page_data() a per-arch helper.
> ...
>> +#if defined(__i386__) || defined(__x86_64__)
>> +/* v2 compat.  Infer the position of STATIC_DATA_END. */
>> +if ( ctx->restore.format_version < 3 && 
>> !ctx->restore.seen_static_data_end )
>> +{
>> +rc = handle_static_data_end(ctx);
>> +if ( rc )
> These 17 lines appear twice, in basically identical form. Could it
> be refactored?

Not really, no.

The error handling (i.e. half of those 17 lines) is different, making it
somewhat awkward to fit into a static inline.

More importantly however, by design, common code can't call
arch-specific code without a restore_ops hook.  Deduping these would
require breaking the restriction which is currently doing a decent job
of keeping x86-isms out of common code.

~Andrew


Re: [Xen-devel] [PATCH v5 2/7] x86/paging: add TLB flush hooks

2020-02-28 Thread Jan Beulich
On 19.02.2020 18:43, Roger Pau Monne wrote:
> Add shadow and hap implementation specific helpers to perform guest
> TLB flushes. Note that the code for both is exactly the same at the
> moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by
> further patches that will add implementation specific optimizations to
> them.
> 
> No functional change intended.
> 
> Signed-off-by: Roger Pau Monné 
> Reviewed-by: Wei Liu 
> Acked-by: Tim Deegan 

This looks good in principle, with one possible anomaly:

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -3990,55 +3990,10 @@ static void hvm_s3_resume(struct domain *d)
>  bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
>  void *ctxt)
>  {
> -static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
> -cpumask_t *mask = &this_cpu(flush_cpumask);
> -struct domain *d = current->domain;
> -struct vcpu *v;
> -
> -/* Avoid deadlock if more than one vcpu tries this at the same time. */
> -if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
> -return false;
> -
> -/* Pause all other vcpus. */
> -for_each_vcpu ( d, v )
> -if ( v != current && flush_vcpu(ctxt, v) )
> -vcpu_pause_nosync(v);
> -
> -/* Now that all VCPUs are signalled to deschedule, we wait... */
> -for_each_vcpu ( d, v )
> -if ( v != current && flush_vcpu(ctxt, v) )
> -while ( !vcpu_runnable(v) && v->is_running )
> -cpu_relax();
> -
> -/* All other vcpus are paused, safe to unlock now. */
> -spin_unlock(&d->hypercall_deadlock_mutex);
> -
> -cpumask_clear(mask);
> -
> -/* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). */
> -for_each_vcpu ( d, v )
> -{
> -unsigned int cpu;
> -
> -if ( !flush_vcpu(ctxt, v) )
> -continue;
> -
> -paging_update_cr3(v, false);
> +struct domain *currd = current->domain;
>  
> -cpu = read_atomic(&v->dirty_cpu);
> -if ( is_vcpu_dirty_cpu(cpu) )
> -__cpumask_set_cpu(cpu, mask);
> -}
> -
> -/* Flush TLBs on all CPUs with dirty vcpu state. */
> -flush_tlb_mask(mask);
> -
> -/* Done. */
> -for_each_vcpu ( d, v )
> -if ( v != current && flush_vcpu(ctxt, v) )
> -vcpu_unpause(v);
> -
> -return true;
> +return shadow_mode_enabled(currd) ? shadow_flush_tlb(flush_vcpu, ctxt)
> +  : hap_flush_tlb(flush_vcpu, ctxt);
>  }

Following our current model I think this should be a new pointer
in struct paging_mode (then truly fitting "hooks" in the title).
I can see the desire to avoid the indirect call though, but I
also think that if we were to go that route, we should settle on
switching around others as well which are paging mode dependent.
(FAOD this is nothing I ask you to do here.) Andrew, thoughts?

Jan


Re: [Xen-devel] [PATCH v2 2/2] AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode

2020-02-28 Thread Andrew Cooper
On 28/02/2020 12:12, Jan Beulich wrote:
> The wider cluster mode APIC IDs aren't generally representable. Convert
> the iommu_intremap variable into a tristate, allowing the AMD IOMMU
> driver to signal this special restriction to the apic_x2apic_probe().
> (Note: assignments to the variable get adjusted, while existing
> consumers - all assuming a boolean property - are left alone.)

I think it would be helpful to state that while we are not aware of any
hardware with this as a restriction, it is a situation which can be
created on fully x2apic-capable systems via firmware settings.

>
> Signed-off-by: Jan Beulich 
> ---
> v2: New.
>
> --- a/xen/arch/x86/genapic/x2apic.c
> +++ b/xen/arch/x86/genapic/x2apic.c
> @@ -236,12 +236,21 @@ const struct genapic *__init apic_x2apic
>  x2apic_phys = !iommu_intremap ||
>(acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL);
>  }
> -else if ( !x2apic_phys && !iommu_intremap )
> -{
> -printk("WARNING: x2APIC cluster mode is not supported without 
> interrupt remapping\n"
> -   "x2APIC: forcing phys mode\n");
> -x2apic_phys = true;
> -}
> +else if ( !x2apic_phys )
> +switch ( iommu_intremap )
> +{
> +case iommu_intremap_off:
> +case iommu_intremap_restricted:
> +printk("WARNING: x2APIC cluster mode is not supported %s 
> interrupt remapping\n"
> +   "x2APIC: forcing phys mode\n",

Any chance to fold this into a single line with "- forcing phys mode\n"
as a suffix?

> +   iommu_intremap == iommu_intremap_off ? "without"
> +: "with restricted");
> +x2apic_phys = true;
> +break;
> +
> +case iommu_intremap_full:
> +break;
> +}
>  
>  if ( x2apic_phys )
>  return &apic_x2apic_phys;
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -54,7 +54,18 @@ static inline bool_t dfn_eq(dfn_t x, dfn
>  
>  extern bool_t iommu_enable, iommu_enabled;
>  extern bool force_iommu, iommu_quarantine, iommu_verbose, iommu_igfx;
> -extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
> +extern bool_t iommu_snoop, iommu_qinval, iommu_intpost;
> +extern enum __packed iommu_intremap {
> +   /*
> +* In order to allow traditional boolean uses of the iommu_intremap
> +* variable, the "off" value has to come first (yielding a value of zero).
> +*/
> +   iommu_intremap_off,
> +#ifdef CONFIG_X86
> +   iommu_intremap_restricted,

This needs a note about its meaning.  How about this?

/* Interrupt remapping enabled, but only able to generate interrupts
with an 8-bit APIC ID. */

Otherwise, Reviewed-by: Andrew Cooper 

Not an issue for now, but "restricted" might become ambiguous with
future extensions.


Re: [Xen-devel] [PATCH v2 1/2] AMD/IOMMU: correct handling when XT's prereq features are unavailable

2020-02-28 Thread Roger Pau Monné
On Fri, Feb 28, 2020 at 02:13:45PM +0100, Jan Beulich wrote:
> On 28.02.2020 13:41, Roger Pau Monné wrote:
> > On Fri, Feb 28, 2020 at 01:10:59PM +0100, Jan Beulich wrote:
> >> We should neither cause IOMMU initialization as a whole to fail in this
> >> case (we should still be able to bring up the system in non-x2APIC or
> >> x2APIC physical mode), nor should the remainder of the function be
> >> skipped (as the main part of it won't get entered a 2nd time)
> > 
> > I'm not sure I see why it won't get entered a second time. AFAICT
> > init_done won't be set if amd_iommu_init fails, and hence will be
> > called again with xt == false in iov_detect?
> 
> Not very far from the top of the function there is
> 
> /* Have we been here before? */
> if ( ivrs_bdf_entries )
> return 0;
> 
> Hence me saying "main part" rather than "the function as a whole".

Oh, yes, sorry.

Reviewed-by: Roger Pau Monné 

Thanks, Roger.


Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush

2020-02-28 Thread Jan Beulich
On 19.02.2020 18:43, Roger Pau Monne wrote:
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -669,32 +669,28 @@ static void hap_update_cr3(struct vcpu *v, int 
> do_locking, bool noflush)
>  hvm_update_guest_cr3(v, noflush);
>  }
>  
> +/*
> + * NB: doesn't actually perform any flush, used just to clear the CPU from 
> the
> + * mask and hence signal that the guest TLB flush has been done.
> + */

"has been done" isn't correct, as the flush may happen only later
on (upon next VM entry). I think wording here needs to be as
precise as possible, however the comment may turn out unnecessary
altogether:

> @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, 
> struct vcpu *v),
>  if ( !flush_vcpu(ctxt, v) )
>  continue;
>  
> -paging_update_cr3(v, false);
> +hvm_asid_flush_vcpu(v);
>  
>  cpu = read_atomic(&v->dirty_cpu);
> -if ( is_vcpu_dirty_cpu(cpu) )
> +if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
>  __cpumask_set_cpu(cpu, mask);
>  }
>  
> -/* Flush TLBs on all CPUs with dirty vcpu state. */
> -flush_tlb_mask(mask);
> -
> -/* Done. */
> -for_each_vcpu ( d, v )
> -if ( v != current && flush_vcpu(ctxt, v) )
> -vcpu_unpause(v);
> +/*
> + * Trigger a vmexit on all pCPUs with dirty vCPU state in order to force 
> an
> + * ASID/VPID change and hence accomplish a guest TLB flush. Note that 
> vCPUs
> + * not currently running will already be flushed when scheduled because 
> of
> + * the ASID tickle done in the loop above.
> + */
> +on_selected_cpus(mask, handle_flush, mask, 0);
> +while ( !cpumask_empty(mask) )
> +cpu_relax();

Why do you pass 0 as last argument? If you passed 1, you wouldn't
need the while() here, handle_flush() could be empty, and you'd
save a perhaps large amount of CPUs to all try to modify two
cache lines (the one used by on_selected_cpus() itself and the
one here) instead of just one. Yes, latency from the last CPU
clearing its bit to you being able to exit from here may be a
little larger this way, but overall latency may shrink with the
cut by half amount of atomic ops issued to the bus.

(Forcing an empty function to be called may seem odd, but sending
 just some IPI [like the event check one] wouldn't be enough, as
 you want to be sure the other side has actually received the
 request.)
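
I.e. (sketch):

    static void handle_flush(void *data)
    {
        /*
         * Nothing to do: the vmexit forced by this IPI is what makes
         * the vCPU pick up its new ASID/VPID on the next VM entry.
         */
    }

    /* ... */
    on_selected_cpus(mask, handle_flush, NULL, 1 /* wait */);

with the cpumask_empty() spin loop dropped.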

Jan


[Xen-devel] [PATCH 2/6] x86 / p2m: remove page_list check in p2m_alloc_table

2020-02-28 Thread Paul Durrant
There does not seem to be any justification for refusing to create the
domain's p2m table simply because it may have assigned pages. In
particular, the check prevents the prior allocation of PGC_extra pages.

Signed-off-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Wei Liu 
Cc: "Roger Pau Monné" 

v2:
 - New in v2
---
 xen/arch/x86/mm/p2m.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 3719deae77..9fd4b115be 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -695,14 +695,6 @@ int p2m_alloc_table(struct p2m_domain *p2m)
 
 p2m_lock(p2m);
 
-if ( p2m_is_hostp2m(p2m)
- && !page_list_empty(&d->page_list) )
-{
-P2M_ERROR("dom %d already has memory allocated\n", d->domain_id);
-p2m_unlock(p2m);
-return -EINVAL;
-}
-
 if ( pagetable_get_pfn(p2m_get_pagetable(p2m)) != 0 )
 {
 P2M_ERROR("p2m already allocated for this domain\n");
-- 
2.20.1



[Xen-devel] [PATCH 6/6] domain: use PGC_extra domheap page for shared_info

2020-02-28 Thread Paul Durrant
Currently shared_info is a shared xenheap page, but shared xenheap pages
complicate future plans for live-update of Xen, so it is desirable to,
where possible, not use them [1]. This patch therefore converts shared_info
into a PGC_extra domheap page. This does entail freeing shared_info during
domain_relinquish_resources() rather than domain_destroy(), so care is
needed to avoid de-referencing a NULL shared_info pointer; hence some
extra checks of 'is_dying' are needed.

NOTE: For Arm, the call to free_shared_info() in arch_domain_destroy() is
  left in place since it is idempotent and called in the error path for
  arch_domain_create().

[1] See 
https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg02018.html

Signed-off-by: Paul Durrant 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
Cc: Volodymyr Babchuk 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Wei Liu 

v2:
 - Addressed comments from Julien
 - Expanded the commit comment to explain why this patch is wanted
---
 xen/arch/arm/domain.c  |  2 ++
 xen/arch/x86/domain.c  |  3 ++-
 xen/common/domain.c| 28 
 xen/common/event_channel.c |  3 +++
 xen/common/time.c  | 15 +++
 5 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 2cbcdaac08..3904519256 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -1006,6 +1006,8 @@ int domain_relinquish_resources(struct domain *d)
 BUG();
 }
 
+free_shared_info(d);
+
 return 0;
 }
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index eb7b0fc51c..3ad532eccf 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -691,7 +691,6 @@ void arch_domain_destroy(struct domain *d)
 pv_domain_destroy(d);
 free_perdomain_mappings(d);
 
-free_shared_info(d);
 cleanup_domain_irq_mapping(d);
 
 psr_domain_free(d);
@@ -2246,6 +2245,8 @@ int domain_relinquish_resources(struct domain *d)
 if ( is_hvm_domain(d) )
 hvm_domain_relinquish_resources(d);
 
+free_shared_info(d);
+
 return 0;
 }
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index ba7a905258..886206f648 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1650,24 +1650,44 @@ int continue_hypercall_on_cpu(
 
 int alloc_shared_info(struct domain *d, unsigned int memflags)
 {
-if ( (d->shared_info.virt = alloc_xenheap_pages(0, memflags)) == NULL )
+struct page_info *pg;
+
+pg = alloc_domheap_page(d, MEMF_no_refcount | memflags);
+if ( !pg )
 return -ENOMEM;
 
-d->shared_info.mfn = virt_to_mfn(d->shared_info.virt);
+if ( !get_page_and_type(pg, d, PGT_writable_page) )
+{
+/*
+ * The domain should not be running at this point so there is
+ * no way we should reach this error path.
+ */
+ASSERT_UNREACHABLE();
+return -ENODATA;
+}
+
+d->shared_info.mfn = page_to_mfn(pg);
+d->shared_info.virt = __map_domain_page_global(pg);
 
 clear_page(d->shared_info.virt);
-share_xen_page_with_guest(mfn_to_page(d->shared_info.mfn), d, SHARE_rw);
 
 return 0;
 }
 
 void free_shared_info(struct domain *d)
 {
+struct page_info *pg;
+
 if ( !d->shared_info.virt )
 return;
 
-free_xenheap_page(d->shared_info.virt);
+unmap_domain_page_global(d->shared_info.virt);
 d->shared_info.virt = NULL;
+
+pg = mfn_to_page(d->shared_info.mfn);
+
+put_page_alloc_ref(pg);
+put_page_and_type(pg);
 }
 
 /*
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index e86e2bfab0..a17422284d 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1325,6 +1325,9 @@ void evtchn_destroy(struct domain *d)
 {
 unsigned int i;
 
+/* This must be done before shared_info is freed */
+BUG_ON(!d->shared_info.virt);
+
 /* After this barrier no new event-channel allocations can occur. */
 BUG_ON(!d->is_dying);
 spin_barrier(&d->event_lock);
diff --git a/xen/common/time.c b/xen/common/time.c
index 58fa9abc40..ada02faf07 100644
--- a/xen/common/time.c
+++ b/xen/common/time.c
@@ -99,6 +99,18 @@ void update_domain_wallclock_time(struct domain *d)
 uint32_t *wc_version;
 uint64_t sec;
 
+if ( d != current->domain )
+{
+/*
+ * We need to check is_dying here as, if it is set, the
+ * shared_info may have been freed. To do this safely we need to
+ * hold the domain lock.
+ */
+domain_lock(d);
+if ( d->is_dying )
+goto unlock;
+}
+
 spin_lock(&wc_lock);
 
 wc_version = &shared_info(d, wc_version);
@@ -121,6 +133,9 @@ void update_domain_wallclock_time(struct domain *d)
 *wc_version = version_update_end(*wc_version);
 
 spin_unlock(&wc_lock);
+ unlock:
+if ( d != current->domain )
+domain_unlock(d);
 }
 
 /* Set clock to 

[Xen-devel] [PATCH v2 0/6] remove one more shared xenheap page: shared_info

2020-02-28 Thread Paul Durrant
Patches #2 and #3 have been split out of the previous version of patch #6
(which was patch #2 of the previous series). Patch #4 is not entirely
related but is useful to have in place before patch #5. Patch #5 is also
new.

Paul Durrant (6):
  domain: introduce alloc/free_shared_info() helpers...
  x86 / p2m: remove page_list check in p2m_alloc_table
  x86 / pv: do not treat PGC_extra pages as RAM when constructing dom0
  x86 / ioreq: use a MEMF_no_refcount allocation for server pages...
  mm: add 'is_special_page' macro...
  domain: use PGC_extra domheap page for shared_info

 xen/arch/arm/domain.c   | 10 +++
 xen/arch/arm/mm.c   |  2 +-
 xen/arch/x86/domain.c   | 12 -
 xen/arch/x86/domctl.c   |  2 +-
 xen/arch/x86/hvm/ioreq.c|  2 +-
 xen/arch/x86/mm.c   | 11 
 xen/arch/x86/mm/altp2m.c|  2 +-
 xen/arch/x86/mm/mem_sharing.c   |  2 +-
 xen/arch/x86/mm/p2m.c   |  8 --
 xen/arch/x86/mm/shadow/common.c | 13 ++
 xen/arch/x86/pv/dom0_build.c|  6 -
 xen/arch/x86/pv/shim.c  |  2 +-
 xen/common/domain.c | 46 +
 xen/common/domctl.c |  2 +-
 xen/common/event_channel.c  |  3 +++
 xen/common/time.c   | 19 --
 xen/include/asm-x86/shared.h| 15 ++-
 xen/include/xen/domain.h|  3 +++
 xen/include/xen/mm.h|  3 +++
 xen/include/xen/sched.h |  5 +++-
 xen/include/xen/shared.h|  2 +-
 21 files changed, 119 insertions(+), 51 deletions(-)
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Paul Durrant 
Cc: "Roger Pau Monné" 
Cc: Stefano Stabellini 
Cc: Tamas K Lengyel 
Cc: Tim Deegan 
Cc: Volodymyr Babchuk 
Cc: Wei Liu 
-- 
2.20.1



[Xen-devel] [PATCH 5/6] mm: add 'is_special_page' macro...

2020-02-28 Thread Paul Durrant
... to cover xenheap and PGC_extra pages.

PGC_extra pages are intended to hold data structures that are associated
with a domain and may be mapped by that domain. They should not be treated
as 'normal' guest pages (i.e. RAM or page tables). Hence, in many cases
where code currently tests is_xen_heap_page() it should also check for
the PGC_extra bit in 'count_info'.

This patch therefore defines is_special_page() to cover both cases and
converts tests of is_xen_heap_page() to is_special_page() where
appropriate.
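
For reference, the macro boils down to the following (the xen/mm.h hunk
is clipped from this copy of the message, so this is its expected shape
rather than a verbatim quote):

    #define is_special_page(page) \
        (is_xen_heap_page(page) || ((page)->count_info & PGC_extra))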

Signed-off-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: "Roger Pau Monné" 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tamas K Lengyel 
Cc: Tim Deegan 

v2:
 - New in v2
---
 xen/arch/x86/domctl.c   |  2 +-
 xen/arch/x86/mm.c   |  9 -
 xen/arch/x86/mm/altp2m.c|  2 +-
 xen/arch/x86/mm/mem_sharing.c   |  2 +-
 xen/arch/x86/mm/shadow/common.c | 13 -
 xen/include/xen/mm.h|  3 +++
 6 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index ed86762fa6..add70126b9 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -394,7 +394,7 @@ long arch_do_domctl(
 page = get_page_from_gfn(d, gfn, &t, P2M_ALLOC);
 
 if ( unlikely(!page) ||
- unlikely(is_xen_heap_page(page)) )
+ unlikely(is_special_page(page)) )
 {
 if ( unlikely(p2m_is_broken(t)) )
 type = XEN_DOMCTL_PFINFO_BROKEN;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 5067125cbb..051ea5ebd6 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1012,7 +1012,7 @@ get_page_from_l1e(
 unsigned long cacheattr = pte_flags_to_cacheattr(l1f);
 int err;
 
-if ( is_xen_heap_page(page) )
+if ( is_special_page(page) )
 {
 if ( write )
 put_page_type(page);
@@ -2445,7 +2445,7 @@ static int cleanup_page_mappings(struct page_info *page)
 {
 page->count_info &= ~PGC_cacheattr_mask;
 
-BUG_ON(is_xen_heap_page(page));
+BUG_ON(is_special_page(page));
 
 rc = update_xen_mappings(mfn, 0);
 }
@@ -2475,7 +2475,7 @@ static int cleanup_page_mappings(struct page_info *page)
 rc = rc2;
 }
 
-if ( likely(!is_xen_heap_page(page)) )
+if ( likely(!is_special_page(page)) )
 {
 ASSERT((page->u.inuse.type_info &
 (PGT_type_mask | PGT_count_mask)) == PGT_writable_page);
@@ -4214,8 +4214,7 @@ int steal_page(
 if ( !(owner = page_get_owner_and_reference(page)) )
 goto fail;
 
-if ( owner != d || is_xen_heap_page(page) ||
- (page->count_info & PGC_extra) )
+if ( owner != d || is_special_page(page) )
 goto fail_put;
 
 /*
diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c
index 50768f2547..c091b03ea3 100644
--- a/xen/arch/x86/mm/altp2m.c
+++ b/xen/arch/x86/mm/altp2m.c
@@ -77,7 +77,7 @@ int altp2m_vcpu_enable_ve(struct vcpu *v, gfn_t gfn)
  * pageable() predicate for this, due to it having the same properties
  * that we want.
  */
-if ( !p2m_is_pageable(p2mt) || is_xen_heap_page(pg) )
+if ( !p2m_is_pageable(p2mt) || is_special_page(pg) )
 {
 rc = -EINVAL;
 goto err;
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 3835bc928f..c14a724c6d 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -842,7 +842,7 @@ static int nominate_page(struct domain *d, gfn_t gfn,
 
 /* Skip xen heap pages */
 page = mfn_to_page(mfn);
-if ( !page || is_xen_heap_page(page) )
+if ( !page || is_special_page(page) )
 goto out;
 
 /* Check if there are mem_access/remapped altp2m entries for this page */
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index cba3ab1eba..e835940d86 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -2087,19 +2087,22 @@ static int sh_remove_all_mappings(struct domain *d, 
mfn_t gmfn, gfn_t gfn)
  * The qemu helper process has an untyped mapping of this dom's RAM
  * and the HVM restore program takes another.
  * Also allow one typed refcount for
- * - Xen heap pages, to match share_xen_page_with_guest(),
- * - ioreq server pages, to match prepare_ring_for_helper().
+ * - special pages, which are explicitly referenced and mapped by
+ *   Xen.
+ * - ioreq server pages, which may be special pages or normal
+ *   guest pages with an extra reference taken by
+ *   prepare_ring_for_helper().
  */
 if ( !(shadow_mode_external(d)
&& (page->count_info & PGC_count_mask) <= 3
&& ((page->u.inuse.type_info & PGT_

[Xen-devel] [PATCH 4/6] x86 / ioreq: use a MEMF_no_refcount allocation for server pages...

2020-02-28 Thread Paul Durrant
... now that it is safe to assign them.

This avoids relying on libxl (or whatever toolstack is in use) setting
max_pages up with sufficient 'slop' to allow all necessary ioreq server
pages to be allocated.

Signed-off-by: Paul Durrant 
---
Cc: Paul Durrant 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: "Roger Pau Monné" 

v2:
 - New in v2
---
 xen/arch/x86/hvm/ioreq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index f8a5c81546..648ef9137f 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -375,7 +375,7 @@ static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, 
bool buf)
 return 0;
 }
 
-page = alloc_domheap_page(s->target, 0);
+page = alloc_domheap_page(s->target, MEMF_no_refcount);
 
 if ( !page )
 return -ENOMEM;
-- 
2.20.1



[Xen-devel] [PATCH 3/6] x86 / pv: do not treat PGC_extra pages as RAM when constructing dom0

2020-02-28 Thread Paul Durrant
The walk of page_list in dom0_construct_pv() should ignore PGC_extra pages
as they are only created for special purposes and, if mapped, will be mapped
explicitly for whatever purpose is relevant.

Signed-off-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: "Roger Pau Monné" 

v2:
 - New in v2
---
 xen/arch/x86/pv/dom0_build.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index dc16ef2e79..f8f1bbe2f4 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -792,6 +792,10 @@ int __init dom0_construct_pv(struct domain *d,
 {
 mfn = mfn_x(page_to_mfn(page));
 BUG_ON(SHARED_M2P(get_gpfn_from_mfn(mfn)));
+
+if ( page->count_info & PGC_extra )
+continue;
+
 if ( get_gpfn_from_mfn(mfn) >= count )
 {
 BUG_ON(is_pv_32bit_domain(d));
-- 
2.20.1



[Xen-devel] [PATCH 1/6] domain: introduce alloc/free_shared_info() helpers...

2020-02-28 Thread Paul Durrant
... and save the MFN.

This patch modifies the 'shared_info' field of struct domain to be
a structure comprising an MFN and a virtual address. Allocations are
still done from xenheap, so the MFN still equates to virt_to_mfn()
called on the virtual address, but a subsequent patch will change this.
Hence the need to save the MFN.

NOTE: Whilst defining the new helpers, virt_to_mfn() in common/domain.c
  is made type safe.
  The definition of nmi_reason() in asm-x86/shared.h is also re-
  flowed to avoid overly long lines.
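
(The xen/sched.h hunk is clipped from this copy of the patch;
reconstructed from the usage sites, the new field has this shape:

    /* In struct domain: */
    struct {
        mfn_t mfn;
        void *virt;
    } shared_info;
)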

Signed-off-by: Paul Durrant 
Reviewed-by: Julien Grall 
---
Cc: Stefano Stabellini 
Cc: Volodymyr Babchuk 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Wei Liu 
Cc: "Roger Pau Monné" 
---
 xen/arch/arm/domain.c|  8 ++--
 xen/arch/arm/mm.c|  2 +-
 xen/arch/x86/domain.c| 11 ---
 xen/arch/x86/mm.c|  2 +-
 xen/arch/x86/pv/dom0_build.c |  2 +-
 xen/arch/x86/pv/shim.c   |  2 +-
 xen/common/domain.c  | 26 ++
 xen/common/domctl.c  |  2 +-
 xen/common/time.c|  4 ++--
 xen/include/asm-x86/shared.h | 15 ---
 xen/include/xen/domain.h |  3 +++
 xen/include/xen/sched.h  |  5 -
 xen/include/xen/shared.h |  2 +-
 13 files changed, 55 insertions(+), 29 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index aa3df3b3ba..2cbcdaac08 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -690,13 +690,9 @@ int arch_domain_create(struct domain *d,
 if ( (rc = p2m_init(d)) != 0 )
 goto fail;
 
-rc = -ENOMEM;
-if ( (d->shared_info = alloc_xenheap_pages(0, 0)) == NULL )
+if ( (rc = alloc_shared_info(d, 0)) != 0 )
 goto fail;
 
-clear_page(d->shared_info);
-share_xen_page_with_guest(virt_to_page(d->shared_info), d, SHARE_rw);
-
 switch ( config->arch.gic_version )
 {
 case XEN_DOMCTL_CONFIG_GIC_V2:
@@ -767,7 +763,7 @@ void arch_domain_destroy(struct domain *d)
 p2m_teardown(d);
 domain_vgic_free(d);
 domain_vuart_free(d);
-free_xenheap_page(d->shared_info);
+free_shared_info(d);
 #ifdef CONFIG_ACPI
 free_xenheap_pages(d->arch.efi_acpi_table,
get_order_from_bytes(d->arch.efi_acpi_len));
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 727107eefa..2bb592101d 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1424,7 +1424,7 @@ int xenmem_add_to_physmap_one(
 if ( idx != 0 )
 return -EINVAL;
 
-mfn = virt_to_mfn(d->shared_info);
+mfn = d->shared_info.mfn;
 t = p2m_ram_rw;
 
 break;
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fe63c23676..eb7b0fc51c 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -612,12 +612,9 @@ int arch_domain_create(struct domain *d,
  * The shared_info machine address must fit in a 32-bit field within a
  * 32-bit guest's start_info structure. Hence we specify MEMF_bits(32).
  */
-if ( (d->shared_info = alloc_xenheap_pages(0, MEMF_bits(32))) == NULL )
+if ( (rc = alloc_shared_info(d, MEMF_bits(32))) != 0 )
 goto fail;
 
-clear_page(d->shared_info);
-share_xen_page_with_guest(virt_to_page(d->shared_info), d, SHARE_rw);
-
 if ( (rc = init_domain_irq_mapping(d)) != 0 )
 goto fail;
 
@@ -665,7 +662,7 @@ int arch_domain_create(struct domain *d,
 psr_domain_free(d);
 iommu_domain_destroy(d);
 cleanup_domain_irq_mapping(d);
-free_xenheap_page(d->shared_info);
+free_shared_info(d);
 xfree(d->arch.cpuid);
 xfree(d->arch.msr);
 if ( paging_initialised )
@@ -694,7 +691,7 @@ void arch_domain_destroy(struct domain *d)
 pv_domain_destroy(d);
 free_perdomain_mappings(d);
 
-free_xenheap_page(d->shared_info);
+free_shared_info(d);
 cleanup_domain_irq_mapping(d);
 
 psr_domain_free(d);
@@ -720,7 +717,7 @@ void arch_domain_unpause(struct domain *d)
 
 int arch_domain_soft_reset(struct domain *d)
 {
-struct page_info *page = virt_to_page(d->shared_info), *new_page;
+struct page_info *page = mfn_to_page(d->shared_info.mfn), *new_page;
 int ret = 0;
 struct domain *owner;
 mfn_t mfn;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 70b87c4830..5067125cbb 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4538,7 +4538,7 @@ int xenmem_add_to_physmap_one(
 {
 case XENMAPSPACE_shared_info:
 if ( idx == 0 )
-mfn = virt_to_mfn(d->shared_info);
+mfn = d->shared_info.mfn;
 break;
 case XENMAPSPACE_grant_table:
 rc = gnttab_map_frame(d, idx, gpfn, &mfn);
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 5678da782d..dc16ef2e79 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -743,7 +743,7 @@ int __init dom0_co

Re: [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current

2020-02-28 Thread Jan Beulich
On 19.02.2020 18:43, Roger Pau Monne wrote:
> Current implementation of hvm_asid_flush_vcpu is not safe to use
> unless the target vCPU is either paused or the currently running one,
> as it modifies the generation without any locking.

Indeed, but the issue you're taking care of is highly theoretical:
I don't think any sane compiler will split writes of the fields
to multiple insns. It would be nice if this was made clear here.
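
To make the nature of the change concrete, the difference is essentially
(illustration only, not the actual hunk):

    /* before: plain read-modify-write of the generation */
    data->core_asid_generation++;

    /* after: explicitly single-access read and write */
    uint64_t generation = read_atomic(&data->core_asid_generation);
    write_atomic(&data->core_asid_generation, generation + 1);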

> Fix this by using atomic operations when accessing the generation
> field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This
> allows to safely flush the current ASID generation. Note that for the
> flush to take effect if the vCPU is currently running a vmexit is
> required.
> 
> Note the same could be achieved by introducing an extra field to
> hvm_vcpu_asid that signals hvm_asid_handle_vmenter the need to call
> hvm_asid_flush_vcpu on the given vCPU before vmentry, this however
> seems unnecessary as hvm_asid_flush_vcpu itself only sets two vCPU
> fields to 0, so there's no need to delay this to the vmentry ASID
> helper.
> 
> This is not a bugfix as no callers that would violate the assumptions
> listed in the first paragraph have been found, but a preparatory
> change in order to allow remote flushing of HVM vCPUs.
> 
> Signed-off-by: Roger Pau Monné 
> Reviewed-by: Wei Liu 

With a suitable clarification added
Acked-by: Jan Beulich 

Jan


Re: [Xen-devel] [PATCH v2 1/2] AMD/IOMMU: correct handling when XT's prereq features are unavailable

2020-02-28 Thread Jan Beulich
On 28.02.2020 13:41, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 01:10:59PM +0100, Jan Beulich wrote:
>> We should neither cause IOMMU initialization as a whole to fail in this
>> case (we should still be able to bring up the system in non-x2APIC or
>> x2APIC physical mode), nor should the remainder of the function be
>> skipped (as the main part of it won't get entered a 2nd time)
> 
> I'm not sure I see why it won't get entered a second time. AFAICT
> init_done won't be set if amd_iommu_init fails, and hence will be
> called again with xt == false in iov_detect?

Not very far from the top of the function there is

/* Have we been here before? */
if ( ivrs_bdf_entries )
return 0;

Hence me saying "main part" rather than "the function as a whole".

Jan


Re: [Xen-devel] [PATCH v2 1/2] AMD/IOMMU: correct handling when XT's prereq features are unavailable

2020-02-28 Thread Jan Beulich
On 28.02.2020 13:41, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 01:10:59PM +0100, Jan Beulich wrote:
>> We should neither cause IOMMU initialization as a whole to fail in this
>> case (we should still be able to bring up the system in non-x2APIC or
>> x2APIC physical mode), nor should the remainder of the function be
>> skipped (as the main part of it won't get entered a 2nd time)
> 
> I'm not sure I see why it won't get entered a second time. AFAICT
> init_done won't be set if amd_iommu_init fails, and hence will be
> called again with xt == false in iov_detect?
> 
>> in such an
>> event. It is merely necessary for the function to indicate to the caller
>> (iov_supports_xt()) that setup failed as far as x2APIC is concerned.
>>
>> Signed-off-by: Jan Beulich 
> 
> LGTM, but it needs to go in with 2/2 AFAICT, or else you would report
> interrupt remapping enabled and x2APIC also enabled but won't handle
> correctly a 32 bit destination field.

Well, this is an improvement on its own imo, even if it doesn't
fully fix everything. The primary point here is to make IOMMU
initialization not fail altogether, when the system could be run
in xAPIC mode.

Jan


Re: [Xen-devel] [PATCH v5 2/2] x86: add accessors for scratch cpu mask

2020-02-28 Thread Jan Beulich
On 28.02.2020 13:07, Roger Pau Monne wrote:
> Current usage of the per-CPU scratch cpumask is dangerous since
> there's no way to figure out if the mask is already being used except
> for manual code inspection of all the callers and possible call paths.
> 
> This is unsafe and not reliable, so introduce a minimal get/put
> infrastructure to prevent nested usage of the scratch mask and usage
> in interrupt context.
> 
> Move the declaration of scratch_cpumask to smp.c in order to place the
> declaration and the accessors as close as possible.
> 
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Jan Beulich 
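
For reference, the get/put pattern being acked looks roughly like this
(a reconstruction from the description above, not the verbatim hunk):

    DEFINE_PER_CPU(cpumask_t, scratch_cpumask);
    static DEFINE_PER_CPU(const void *, scratch_cpumask_use);

    cpumask_t *get_scratch_cpumask(void)
    {
        /* Neither nested usage nor usage in interrupt context is allowed. */
        ASSERT(!in_irq());
        ASSERT(!this_cpu(scratch_cpumask_use));
        this_cpu(scratch_cpumask_use) = __builtin_return_address(0);
        return &this_cpu(scratch_cpumask);
    }

    void put_scratch_cpumask(void)
    {
        this_cpu(scratch_cpumask_use) = NULL;
    }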


Re: [Xen-devel] [PATCH v2 1/2] AMD/IOMMU: correct handling when XT's prereq features are unavailable

2020-02-28 Thread Roger Pau Monné
On Fri, Feb 28, 2020 at 01:10:59PM +0100, Jan Beulich wrote:
> We should neither cause IOMMU initialization as a whole to fail in this
> case (we should still be able to bring up the system in non-x2APIC or
> x2APIC physical mode), nor should the remainder of the function be
> skipped (as the main part of it won't get entered a 2nd time)

I'm not sure I see why it won't get entered a second time. AFAICT
init_done won't be set if amd_iommu_init fails, and hence will be
called again with xt == false in iov_detect?

> in such an
> event. It is merely necessary for the function to indicate to the caller
> (iov_supports_xt()) that setup failed as far as x2APIC is concerned.
> 
> Signed-off-by: Jan Beulich 

LGTM, but it needs to go in with 2/2 AFAICT, or else you would report
interrupt remapping enabled and x2APIC also enabled but won't handle
correctly a 32 bit destination field.

Thanks, Roger.


Re: [Xen-devel] [PATCH v2 2/2] AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode

2020-02-28 Thread Jan Beulich
On 28.02.2020 13:31, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 01:12:03PM +0100, Jan Beulich wrote:
>> --- a/xen/arch/x86/genapic/x2apic.c
>> +++ b/xen/arch/x86/genapic/x2apic.c
>> @@ -236,12 +236,21 @@ const struct genapic *__init apic_x2apic
>>  x2apic_phys = !iommu_intremap ||
>>(acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL);
>>  }
>> -else if ( !x2apic_phys && !iommu_intremap )
>> -{
>> -printk("WARNING: x2APIC cluster mode is not supported without 
>> interrupt remapping\n"
>> -   "x2APIC: forcing phys mode\n");
>> -x2apic_phys = true;
>> -}
>> +else if ( !x2apic_phys )
>> +switch ( iommu_intremap )
>> +{
>> +case iommu_intremap_off:
>> +case iommu_intremap_restricted:
>> +printk("WARNING: x2APIC cluster mode is not supported %s 
>> interrupt remapping\n"
>> +   "x2APIC: forcing phys mode\n",
>> +   iommu_intremap == iommu_intremap_off ? "without"
>> +: "with 
>> restricted");
>> +x2apic_phys = true;
> 
> I think you also need to fixup the usage of iommu_intremap in __cpu_up
> so that CPUs with APIC IDs > 255 are not brought up when in
> iommu_intremap_restricted mode.

That certainly wants changing, yes, but I view this as an orthogonal
adjustment, which I'd like to make only once I understand what the
behavior for APIC ID 0xff should be in this setup.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] AMD/IOMMU: correct handling when XT's prereq features are unavailable

2020-02-28 Thread Andrew Cooper
On 28/02/2020 12:10, Jan Beulich wrote:
> We should neither cause IOMMU initialization as a whole to fail in this
> case (we should still be able to bring up the system in non-x2APIC or
> x2APIC physical mode), nor should the remainder of the function be
> skipped (as the main part of it won't get entered a 2nd time) in such an
> event. It is merely necessary for the function to indicate to the caller
> (iov_supports_xt()) that setup failed as far as x2APIC is concerned.
>
> Signed-off-by: Jan Beulich 

Looks much better.

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/2] AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode

2020-02-28 Thread Roger Pau Monné
On Fri, Feb 28, 2020 at 01:12:03PM +0100, Jan Beulich wrote:
> The wider cluster mode APIC IDs aren't generally representable. Convert
> the iommu_intremap variable into a tristate, allowing the AMD IOMMU
> driver to signal this special restriction to the apic_x2apic_probe().
> (Note: assignments to the variable get adjusted, while existing
> consumers - all assuming a boolean property - are left alone.)
> 
> Signed-off-by: Jan Beulich 
> ---
> v2: New.
> 
> --- a/xen/arch/x86/genapic/x2apic.c
> +++ b/xen/arch/x86/genapic/x2apic.c
> @@ -236,12 +236,21 @@ const struct genapic *__init apic_x2apic
>  x2apic_phys = !iommu_intremap ||
>(acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL);
>  }
> -else if ( !x2apic_phys && !iommu_intremap )
> -{
> -printk("WARNING: x2APIC cluster mode is not supported without 
> interrupt remapping\n"
> -   "x2APIC: forcing phys mode\n");
> -x2apic_phys = true;
> -}
> +else if ( !x2apic_phys )
> +switch ( iommu_intremap )
> +{
> +case iommu_intremap_off:
> +case iommu_intremap_restricted:
> +printk("WARNING: x2APIC cluster mode is not supported %s 
> interrupt remapping\n"
> +   "x2APIC: forcing phys mode\n",
> +   iommu_intremap == iommu_intremap_off ? "without"
> +: "with restricted");
> +x2apic_phys = true;

I think you also need to fixup the usage of iommu_intremap in __cpu_up
so that CPUs with APIC IDs > 255 are not brought up when in
iommu_intremap_restricted mode.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 4/5] IOMMU: iommu_qinval is x86-only

2020-02-28 Thread Jan Beulich
In fact it's VT-d specific, but we don't have a way yet to build code
for just one vendor.

Signed-off-by: Jan Beulich 

--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -33,7 +33,6 @@ bool_t __read_mostly force_iommu;
 bool_t __read_mostly iommu_verbose;
 bool __read_mostly iommu_quarantine = true;
 bool_t __read_mostly iommu_snoop = 1;
-bool_t __read_mostly iommu_qinval = 1;
 bool_t __read_mostly iommu_crash_disable;
 
 static bool __hwdom_initdata iommu_hwdom_none;
@@ -75,13 +74,13 @@ static int __init parse_iommu_param(cons
 #ifdef CONFIG_X86
 else if ( (val = parse_boolean("igfx", s, ss)) >= 0 )
 iommu_igfx = val;
+else if ( (val = parse_boolean("qinval", s, ss)) >= 0 )
+iommu_qinval = val;
 #endif
 else if ( (val = parse_boolean("verbose", s, ss)) >= 0 )
 iommu_verbose = val;
 else if ( (val = parse_boolean("snoop", s, ss)) >= 0 )
 iommu_snoop = val;
-else if ( (val = parse_boolean("qinval", s, ss)) >= 0 )
-iommu_qinval = val;
 #ifndef iommu_intremap
 else if ( (val = parse_boolean("intremap", s, ss)) >= 0 )
 iommu_intremap = val ? iommu_intremap_full : iommu_intremap_off;
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -51,6 +51,7 @@ struct mapped_rmrr {
 bool __read_mostly untrusted_msi;
 
 bool __read_mostly iommu_igfx = true;
+bool __read_mostly iommu_qinval = true;
 
 int nr_iommus;
 
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -54,7 +54,7 @@ static inline bool_t dfn_eq(dfn_t x, dfn
 
 extern bool_t iommu_enable, iommu_enabled;
 extern bool force_iommu, iommu_quarantine, iommu_verbose;
-extern bool_t iommu_snoop, iommu_qinval;
+extern bool_t iommu_snoop;
 
 #ifdef CONFIG_X86
 extern enum __packed iommu_intremap {
@@ -66,7 +66,7 @@ extern enum __packed iommu_intremap {
iommu_intremap_restricted,
iommu_intremap_full,
 } iommu_intremap;
-extern bool iommu_igfx;
+extern bool iommu_igfx, iommu_qinval;
 #else
 # define iommu_intremap false
 #endif


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 5/5] IOMMU: iommu_snoop is x86/HVM-only

2020-02-28 Thread Jan Beulich
In fact it's VT-d specific, but we don't have a way yet to build code
for just one vendor. Provide a #define for all other cases.

Signed-off-by: Jan Beulich 

--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -32,7 +32,6 @@ bool_t __read_mostly iommu_enabled;
 bool_t __read_mostly force_iommu;
 bool_t __read_mostly iommu_verbose;
 bool __read_mostly iommu_quarantine = true;
-bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_crash_disable;
 
 static bool __hwdom_initdata iommu_hwdom_none;
@@ -79,8 +78,10 @@ static int __init parse_iommu_param(cons
 #endif
 else if ( (val = parse_boolean("verbose", s, ss)) >= 0 )
 iommu_verbose = val;
+#ifndef iommu_snoop
 else if ( (val = parse_boolean("snoop", s, ss)) >= 0 )
 iommu_snoop = val;
+#endif
 #ifndef iommu_intremap
 else if ( (val = parse_boolean("intremap", s, ss)) >= 0 )
 iommu_intremap = val ? iommu_intremap_full : iommu_intremap_off;
@@ -488,7 +489,9 @@ int __init iommu_setup(void)
 printk("I/O virtualisation %sabled\n", iommu_enabled ? "en" : "dis");
 if ( !iommu_enabled )
 {
-iommu_snoop = 0;
+#ifndef iommu_snoop
+iommu_snoop = false;
+#endif
 iommu_hwdom_passthrough = false;
 iommu_hwdom_strict = false;
 }
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -52,6 +52,9 @@ bool __read_mostly untrusted_msi;
 
 bool __read_mostly iommu_igfx = true;
 bool __read_mostly iommu_qinval = true;
+#ifndef iommu_snoop
+bool __read_mostly iommu_snoop = true;
+#endif
 
 int nr_iommus;
 
@@ -2288,8 +2291,10 @@ static int __init vtd_setup(void)
cap_sps_2mb(iommu->cap) ? ", 2MB" : "",
cap_sps_1gb(iommu->cap) ? ", 1GB" : "");
 
+#ifndef iommu_snoop
 if ( iommu_snoop && !ecap_snp_ctl(iommu->ecap) )
-iommu_snoop = 0;
+iommu_snoop = false;
+#endif
 
 if ( iommu_hwdom_passthrough && !ecap_pass_thru(iommu->ecap) )
 iommu_hwdom_passthrough = false;
@@ -2331,7 +2336,9 @@ static int __init vtd_setup(void)
 }
 
 #define P(p,s) printk("Intel VT-d %s %senabled.\n", s, (p)? "" : "not ")
+#ifndef iommu_snoop
 P(iommu_snoop, "Snoop Control");
+#endif
 P(iommu_hwdom_passthrough, "Dom0 DMA Passthrough");
 P(iommu_qinval, "Queued Invalidation");
 P(iommu_intremap, "Interrupt Remapping");
@@ -2351,7 +2358,9 @@ static int __init vtd_setup(void)
 
  error:
 iommu_enabled = 0;
-iommu_snoop = 0;
+#ifndef iommu_snoop
+iommu_snoop = false;
+#endif
 iommu_hwdom_passthrough = false;
 iommu_qinval = 0;
 iommu_intremap = iommu_intremap_off;
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -54,7 +54,6 @@ static inline bool_t dfn_eq(dfn_t x, dfn
 
 extern bool_t iommu_enable, iommu_enabled;
 extern bool force_iommu, iommu_quarantine, iommu_verbose;
-extern bool_t iommu_snoop;
 
 #ifdef CONFIG_X86
 extern enum __packed iommu_intremap {
@@ -72,9 +71,10 @@ extern bool iommu_igfx, iommu_qinval;
 #endif
 
 #if defined(CONFIG_X86) && defined(CONFIG_HVM)
-extern bool iommu_intpost;
+extern bool iommu_intpost, iommu_snoop;
 #else
 # define iommu_intpost false
+# define iommu_snoop false
 #endif
 
 #if defined(CONFIG_IOMMU_FORCE_PT_SHARE)
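
As an aside on the #ifndef pattern used throughout this series: it relies
on the flag being either a real variable or an object-like macro of the
same name, so reads compile either way while writes have to stay guarded.
A minimal, self-contained sketch (hypothetical CONFIG_FOO flag and
do_foo() helper, not Xen code):

#ifdef CONFIG_FOO
extern bool foo_enabled;      /* real variable when the feature is built */
#else
# define foo_enabled false    /* compile-time constant otherwise */
#endif

void do_foo(void);

void foo_consumer(void)
{
    if ( foo_enabled )        /* reads work in both configurations; the */
        do_foo();             /* branch constant-folds away when false  */

#ifndef foo_enabled           /* writes must be guarded: assigning to   */
    foo_enabled = false;      /* the macro form would not compile       */
#endif
}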


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 2/5] IOMMU: iommu_intpost is x86/HVM-only

2020-02-28 Thread Jan Beulich
Provide a #define for all other cases.

Signed-off-by: Jan Beulich 

--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1309,6 +1309,9 @@ boolean (e.g. `iommu=no`) can override t
 This option depends on `intremap`, and is disabled by default due to some
 corner cases in the implementation which have yet to be resolved.
 
+This option is not valid on Arm, or on x86 builds of Xen without HVM
+support.
+
 *   The `crash-disable` boolean controls disabling IOMMU functionality (DMAR/IR/QI)
 before switching to a crash kernel. This option is inactive by default and
 is for compatibility with older kdump kernels only. Modern kernels copy
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -43,14 +43,6 @@ bool __read_mostly iommu_hwdom_passthrou
 bool __hwdom_initdata iommu_hwdom_inclusive;
 int8_t __hwdom_initdata iommu_hwdom_reserved = -1;
 
-/*
- * In the current implementation of VT-d posted interrupts, in some extreme
- * cases, the per cpu list which saves the blocked vCPU will be very long,
- * and this will affect the interrupt latency, so let this feature off by
- * default until we find a good solution to resolve it.
- */
-bool_t __read_mostly iommu_intpost;
-
 #ifndef iommu_hap_pt_share
 bool __read_mostly iommu_hap_pt_share = true;
 #endif
@@ -93,8 +85,10 @@ static int __init parse_iommu_param(cons
 else if ( (val = parse_boolean("intremap", s, ss)) >= 0 )
 iommu_intremap = val ? iommu_intremap_full : iommu_intremap_off;
 #endif
+#ifndef iommu_intpost
 else if ( (val = parse_boolean("intpost", s, ss)) >= 0 )
 iommu_intpost = val;
+#endif
 #ifdef CONFIG_KEXEC
 else if ( (val = parse_boolean("crash-disable", s, ss)) >= 0 )
 iommu_crash_disable = val;
@@ -486,8 +480,10 @@ int __init iommu_setup(void)
 panic("Couldn't enable %s and iommu=required/force\n",
   !iommu_enabled ? "IOMMU" : "Interrupt Remapping");
 
+#ifndef iommu_intpost
 if ( !iommu_intremap )
 iommu_intpost = 0;
+#endif
 
 printk("I/O virtualisation %sabled\n", iommu_enabled ? "en" : "dis");
 if ( !iommu_enabled )
@@ -563,10 +559,13 @@ void iommu_crash_shutdown(void)
 
 if ( iommu_enabled )
 iommu_get_ops()->crash_shutdown();
-iommu_enabled = iommu_intpost = 0;
+iommu_enabled = false;
 #ifndef iommu_intremap
 iommu_intremap = iommu_intremap_off;
 #endif
+#ifndef iommu_intpost
+iommu_intpost = false;
+#endif
 }
 
 int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2297,13 +2297,15 @@ static int __init vtd_setup(void)
 if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
 iommu_intremap = iommu_intremap_off;
 
+#ifndef iommu_intpost
 /*
  * We cannot use posted interrupt if X86_FEATURE_CX16 is
  * not supported, since we count on this feature to
  * atomically update 16-byte IRTE in posted format.
  */
 if ( !cap_intr_post(iommu->cap) || !iommu_intremap || !cpu_has_cx16 )
-iommu_intpost = 0;
+iommu_intpost = false;
+#endif
 
 if ( !vtd_ept_page_compatible(iommu) )
 clear_iommu_hap_pt_share();
@@ -2330,7 +2332,9 @@ static int __init vtd_setup(void)
 P(iommu_hwdom_passthrough, "Dom0 DMA Passthrough");
 P(iommu_qinval, "Queued Invalidation");
 P(iommu_intremap, "Interrupt Remapping");
+#ifndef iommu_intpost
 P(iommu_intpost, "Posted Interrupt");
+#endif
 P(iommu_hap_pt_share, "Shared EPT tables");
 #undef P
 
@@ -2348,7 +2352,9 @@ static int __init vtd_setup(void)
 iommu_hwdom_passthrough = false;
 iommu_qinval = 0;
 iommu_intremap = iommu_intremap_off;
-iommu_intpost = 0;
+#ifndef iommu_intpost
+iommu_intpost = false;
+#endif
 return ret;
 }
 
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -29,6 +29,16 @@ struct iommu_ops __read_mostly iommu_ops
 
 enum iommu_intremap __read_mostly iommu_intremap = iommu_intremap_full;
 
+#ifndef iommu_intpost
+/*
+ * In the current implementation of VT-d posted interrupts, in some extreme
+ * cases, the per cpu list which saves the blocked vCPU will be very long,
+ * and this will affect the interrupt latency, so let this feature off by
+ * default until we find a good solution to resolve it.
+ */
+bool __read_mostly iommu_intpost;
+#endif
+
 int __init iommu_hardware_setup(void)
 {
 struct IO_APIC_route_entry **ioapic_entries = NULL;
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -54,7 +54,7 @@ static inline bool_t dfn_eq(dfn_t x, dfn
 
 extern bool_t iommu_enable, iommu_enabled;
 extern bool force_iommu, iommu_quarantine, iommu_verbose, iommu_igfx;
-extern bool_t iommu_snoop, iommu_qinval, iommu_intpost;
+extern bool_t iommu_snoop, iommu_qinval;
 
 #ifdef CONFIG_X86
 extern enum __p

[Xen-devel] [PATCH 3/5] IOMMU: iommu_igfx is x86-only

2020-02-28 Thread Jan Beulich
In fact it's VT-d specific, but we don't have a way yet to build code
for just one vendor.

Signed-off-by: Jan Beulich 

--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -32,7 +32,6 @@ bool_t __read_mostly iommu_enabled;
 bool_t __read_mostly force_iommu;
 bool_t __read_mostly iommu_verbose;
 bool __read_mostly iommu_quarantine = true;
-bool_t __read_mostly iommu_igfx = 1;
 bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_qinval = 1;
 bool_t __read_mostly iommu_crash_disable;
@@ -73,8 +72,10 @@ static int __init parse_iommu_param(cons
 force_iommu = val;
 else if ( (val = parse_boolean("quarantine", s, ss)) >= 0 )
 iommu_quarantine = val;
+#ifdef CONFIG_X86
 else if ( (val = parse_boolean("igfx", s, ss)) >= 0 )
 iommu_igfx = val;
+#endif
 else if ( (val = parse_boolean("verbose", s, ss)) >= 0 )
 iommu_verbose = val;
 else if ( (val = parse_boolean("snoop", s, ss)) >= 0 )
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -50,6 +50,8 @@ struct mapped_rmrr {
 /* Possible unfiltered LAPIC/MSI messages from untrusted sources? */
 bool __read_mostly untrusted_msi;
 
+bool __read_mostly iommu_igfx = true;
+
 int nr_iommus;
 
 static struct tasklet vtd_fault_tasklet;
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -53,7 +53,7 @@ static inline bool_t dfn_eq(dfn_t x, dfn
 }
 
 extern bool_t iommu_enable, iommu_enabled;
-extern bool force_iommu, iommu_quarantine, iommu_verbose, iommu_igfx;
+extern bool force_iommu, iommu_quarantine, iommu_verbose;
 extern bool_t iommu_snoop, iommu_qinval;
 
 #ifdef CONFIG_X86
@@ -66,6 +66,7 @@ extern enum __packed iommu_intremap {
iommu_intremap_restricted,
iommu_intremap_full,
 } iommu_intremap;
+extern bool iommu_igfx;
 #else
 # define iommu_intremap false
 #endif


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 1/5] IOMMU: iommu_intremap is x86-only

2020-02-28 Thread Jan Beulich
Provide a #define for other cases; it didn't seem worthwhile to me to
introduce an IOMMU_INTREMAP Kconfig option at this point.

Signed-off-by: Jan Beulich 

--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1299,6 +1299,8 @@ boolean (e.g. `iommu=no`) can override t
 generation of IOMMUs only supported DMA remapping, and Interrupt Remapping
 appeared in the second generation.
 
+This option is not valid on Arm.
+
 *   The `intpost` boolean controls the Posted Interrupt sub-feature.  In
 combination with APIC acceleration (VT-x APICV, SVM AVIC), the IOMMU can
 be configured to deliver interrupts from assigned PCI devices directly
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -35,7 +35,6 @@ bool __read_mostly iommu_quarantine = tr
 bool_t __read_mostly iommu_igfx = 1;
 bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_qinval = 1;
-enum iommu_intremap __read_mostly iommu_intremap = iommu_intremap_full;
 bool_t __read_mostly iommu_crash_disable;
 
 static bool __hwdom_initdata iommu_hwdom_none;
@@ -90,8 +89,10 @@ static int __init parse_iommu_param(cons
 iommu_snoop = val;
 else if ( (val = parse_boolean("qinval", s, ss)) >= 0 )
 iommu_qinval = val;
+#ifndef iommu_intremap
 else if ( (val = parse_boolean("intremap", s, ss)) >= 0 )
 iommu_intremap = val ? iommu_intremap_full : iommu_intremap_off;
+#endif
 else if ( (val = parse_boolean("intpost", s, ss)) >= 0 )
 iommu_intpost = val;
 #ifdef CONFIG_KEXEC
@@ -474,8 +475,11 @@ int __init iommu_setup(void)
 rc = iommu_hardware_setup();
 iommu_enabled = (rc == 0);
 }
+
+#ifndef iommu_intremap
 if ( !iommu_enabled )
 iommu_intremap = iommu_intremap_off;
+#endif
 
 if ( (force_iommu && !iommu_enabled) ||
  (force_intremap && !iommu_intremap) )
@@ -500,7 +504,9 @@ int __init iommu_setup(void)
 printk(" - Dom0 mode: %s\n",
iommu_hwdom_passthrough ? "Passthrough" :
iommu_hwdom_strict ? "Strict" : "Relaxed");
+#ifndef iommu_intremap
 printk("Interrupt remapping %sabled\n", iommu_intremap ? "en" : "dis");
+#endif
 tasklet_init(&iommu_pt_cleanup_tasklet, iommu_free_pagetables, NULL);
 }
 
@@ -558,7 +564,9 @@ void iommu_crash_shutdown(void)
 if ( iommu_enabled )
 iommu_get_ops()->crash_shutdown();
 iommu_enabled = iommu_intpost = 0;
+#ifndef iommu_intremap
 iommu_intremap = iommu_intremap_off;
+#endif
 }
 
 int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -27,6 +27,8 @@
 const struct iommu_init_ops *__initdata iommu_init_ops;
 struct iommu_ops __read_mostly iommu_ops;
 
+enum iommu_intremap __read_mostly iommu_intremap = iommu_intremap_full;
+
 int __init iommu_hardware_setup(void)
 {
 struct IO_APIC_route_entry **ioapic_entries = NULL;
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -55,17 +55,20 @@ static inline bool_t dfn_eq(dfn_t x, dfn
 extern bool_t iommu_enable, iommu_enabled;
 extern bool force_iommu, iommu_quarantine, iommu_verbose, iommu_igfx;
 extern bool_t iommu_snoop, iommu_qinval, iommu_intpost;
+
+#ifdef CONFIG_X86
 extern enum __packed iommu_intremap {
/*
 * In order to allow traditional boolean uses of the iommu_intremap
 * variable, the "off" value has to come first (yielding a value of zero).
 */
iommu_intremap_off,
-#ifdef CONFIG_X86
iommu_intremap_restricted,
-#endif
iommu_intremap_full,
 } iommu_intremap;
+#else
+# define iommu_intremap false
+#endif
 
 #if defined(CONFIG_IOMMU_FORCE_PT_SHARE)
 #define iommu_hap_pt_share true
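
To illustrate the "traditional boolean uses" the enum comment above
refers to: because iommu_intremap_off is the first enumerator and
therefore zero, existing truth-value tests keep working unchanged. A
hedged sketch (hypothetical call site, not from the patch):

static void report_intremap(void)
{
    /* iommu_intremap_off == 0, so the enum still tests like a boolean: */
    if ( !iommu_intremap )                 /* iommu_intremap_off only */
        printk("interrupt remapping disabled\n");
    else                                   /* restricted or full */
        printk("interrupt remapping in use\n");
}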


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 0/5] IOMMU: restrict visibility/scope of certain variables

2020-02-28 Thread Jan Beulich
A number of the command line controlled variables are x86-
or even x86-HVM-specific. Don't have those variables elsewhere
in the first place (in some cases replace them by a #define),
and as a result also don't silently accept such "iommu="
sub-options which in fact have no effect.

1: iommu_intremap is x86-only
2: iommu_intpost is x86/HVM-only
3: iommu_igfx is x86-only
4: iommu_qinval is x86-only
5: iommu_snoop is x86/HVM-only

The series contextually depends on "AMD/IOMMU: without XT,
x2APIC needs to be forced into physical mode"

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] xen: Replace zero-length array with flexible-array member

2020-02-28 Thread Boris Ostrovsky


On 2/27/20 4:31 AM, Jürgen Groß wrote:
> On 26.02.20 22:26, Gustavo A. R. Silva wrote:
>> The current codebase makes use of the zero-length array language
>> extension to the C90 standard, but the preferred mechanism to declare
>> variable-length types such as these is a flexible array member[1][2],
>> introduced in C99:
>>
>> struct foo {
>>  int stuff;
>>  struct boo array[];
>> };
>>
>> By making use of the mechanism above, we will get a compiler warning
>> in case the flexible array does not occur last in the structure, which
>> will help us prevent some kind of undefined behavior bugs from being
>> inadvertently introduced[3] to the codebase from now on.
>>
>> Also, notice that dynamic memory allocations won't be affected by
>> this change:
>>
>> "Flexible array members have incomplete type, and so the sizeof operator
>> may not be applied. As a quirk of the original implementation of
>> zero-length arrays, sizeof evaluates to zero."[1]
>>
>> This issue was found with the help of Coccinelle.
>>
>> [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
>> [2] https://github.com/KSPP/linux/issues/21
>> [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
>>
>> Signed-off-by: Gustavo A. R. Silva 
>
> Reviewed-by: Juergen Gross 


Applied to for-linus-5.6.

-boris
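
As a standalone illustration of the pattern the commit message above
describes (hypothetical struct and allocator, not from the patch):
sizeof covers only the fixed members, so a caller sizes the trailing
array explicitly at allocation time.

#include <stdlib.h>
#include <string.h>

struct msg {
    size_t len;
    char data[];    /* flexible array member, C99 */
};

static struct msg *msg_alloc(const char *s)
{
    size_t n = strlen(s) + 1;
    /* sizeof(struct msg) counts only 'len'; the array contributes zero. */
    struct msg *m = malloc(sizeof(*m) + n);

    if ( m )
    {
        m->len = n;
        memcpy(m->data, s, n);
    }

    return m;
}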

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 2/2] AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode

2020-02-28 Thread Jan Beulich
The wider cluster mode APIC IDs aren't generally representable. Convert
the iommu_intremap variable into a tristate, allowing the AMD IOMMU
driver to signal this special restriction to the apic_x2apic_probe().
(Note: assignments to the variable get adjusted, while existing
consumers - all assuming a boolean property - are left alone.)

Signed-off-by: Jan Beulich 
---
v2: New.

--- a/xen/arch/x86/genapic/x2apic.c
+++ b/xen/arch/x86/genapic/x2apic.c
@@ -236,12 +236,21 @@ const struct genapic *__init apic_x2apic
 x2apic_phys = !iommu_intremap ||
   (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL);
 }
-else if ( !x2apic_phys && !iommu_intremap )
-{
-printk("WARNING: x2APIC cluster mode is not supported without 
interrupt remapping\n"
-   "x2APIC: forcing phys mode\n");
-x2apic_phys = true;
-}
+else if ( !x2apic_phys )
+switch ( iommu_intremap )
+{
+case iommu_intremap_off:
+case iommu_intremap_restricted:
+printk("WARNING: x2APIC cluster mode is not supported %s interrupt 
remapping\n"
+   "x2APIC: forcing phys mode\n",
+   iommu_intremap == iommu_intremap_off ? "without"
+: "with restricted");
+x2apic_phys = true;
+break;
+
+case iommu_intremap_full:
+break;
+}
 
 if ( x2apic_phys )
 return &apic_x2apic_phys;
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1139,7 +1139,7 @@ static void __init amd_iommu_init_cleanu
 
 iommu_enabled = 0;
 iommu_hwdom_passthrough = false;
-iommu_intremap = 0;
+iommu_intremap = iommu_intremap_off;
 iommuv2_enabled = 0;
 }
 
@@ -1413,6 +1413,9 @@ int __init amd_iommu_prepare(bool xt)
 iommu->ctrl.int_cap_xt_en = xt && has_xt;
 }
 
+if ( iommu_intremap && !has_xt )
+iommu_intremap = iommu_intremap_restricted;
+
 rc = amd_iommu_update_ivrs_mapping_acpi();
 
  error_out:
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -157,7 +157,7 @@ int __init acpi_ivrs_init(void)
 
 if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
 {
-iommu_intremap = 0;
+iommu_intremap = iommu_intremap_off;
 return -ENODEV;
 }
 
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -35,7 +35,7 @@ bool __read_mostly iommu_quarantine = tr
 bool_t __read_mostly iommu_igfx = 1;
 bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_qinval = 1;
-bool_t __read_mostly iommu_intremap = 1;
+enum iommu_intremap __read_mostly iommu_intremap = iommu_intremap_full;
 bool_t __read_mostly iommu_crash_disable;
 
 static bool __hwdom_initdata iommu_hwdom_none;
@@ -91,7 +91,7 @@ static int __init parse_iommu_param(cons
 else if ( (val = parse_boolean("qinval", s, ss)) >= 0 )
 iommu_qinval = val;
 else if ( (val = parse_boolean("intremap", s, ss)) >= 0 )
-iommu_intremap = val;
+iommu_intremap = val ? iommu_intremap_full : iommu_intremap_off;
 else if ( (val = parse_boolean("intpost", s, ss)) >= 0 )
 iommu_intpost = val;
 #ifdef CONFIG_KEXEC
@@ -475,7 +475,7 @@ int __init iommu_setup(void)
 iommu_enabled = (rc == 0);
 }
 if ( !iommu_enabled )
-iommu_intremap = 0;
+iommu_intremap = iommu_intremap_off;
 
 if ( (force_iommu && !iommu_enabled) ||
  (force_intremap && !iommu_intremap) )
@@ -557,7 +557,8 @@ void iommu_crash_shutdown(void)
 
 if ( iommu_enabled )
 iommu_get_ops()->crash_shutdown();
-iommu_enabled = iommu_intremap = iommu_intpost = 0;
+iommu_enabled = iommu_intpost = 0;
+iommu_intremap = iommu_intremap_off;
 }
 
 int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2177,7 +2177,7 @@ static int __must_check init_vtd_hw(void
 {
 if ( ioapic_to_iommu(IO_APIC_ID(apic)) == NULL )
 {
-iommu_intremap = 0;
+iommu_intremap = iommu_intremap_off;
 dprintk(XENLOG_ERR VTDPREFIX,
 "ioapic_to_iommu: ioapic %#x (id: %#x) is NULL! "
 "Will not try to enable Interrupt Remapping.\n",
@@ -2193,7 +2193,7 @@ static int __must_check init_vtd_hw(void
 iommu = drhd->iommu;
 if ( enable_intremap(iommu, 0) != 0 )
 {
-iommu_intremap = 0;
+iommu_intremap = iommu_intremap_off;
 dprintk(XENLOG_WARNING VTDPREFIX,
 "Interrupt Remapping not enabled\n");
 
@@ -2295,7 +2295,7 @@ static int __init vtd_setup(void)
 iommu_qinval = 0;
 
 if ( iommu_intremap && !ecap_intr_remap(iomm

[Xen-devel] [PATCH v2 1/2] AMD/IOMMU: correct handling when XT's prereq features are unavailable

2020-02-28 Thread Jan Beulich
We should neither cause IOMMU initialization as a whole to fail in this
case (we should still be able to bring up the system in non-x2APIC or
x2APIC physical mode), nor should the remainder of the function be
skipped (as the main part of it won't get entered a 2nd time) in such an
event. It is merely necessary for the function to indicate to the caller
(iov_supports_xt()) that setup failed as far as x2APIC is concerned.

Signed-off-by: Jan Beulich 
---
v2: Also adjust the loop consuming "xt". no_xt -> has_xt.

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1364,6 +1364,7 @@ static int __init amd_iommu_prepare_one(
 int __init amd_iommu_prepare(bool xt)
 {
 struct amd_iommu *iommu;
+bool has_xt = true;
 int rc = -ENODEV;
 
 BUG_ON( !iommu_found() );
@@ -1400,17 +1401,16 @@ int __init amd_iommu_prepare(bool xt)
 if ( rc )
 goto error_out;
 
-rc = -ENODEV;
-if ( xt && (!iommu->features.flds.ga_sup || !iommu->features.flds.xt_sup) )
-goto error_out;
+if ( !iommu->features.flds.ga_sup || !iommu->features.flds.xt_sup )
+has_xt = false;
 }
 
 for_each_amd_iommu ( iommu )
 {
 /* NB: There's no need to actually write these out right here. */
-iommu->ctrl.ga_en |= xt;
-iommu->ctrl.xt_en = xt;
-iommu->ctrl.int_cap_xt_en = xt;
+iommu->ctrl.ga_en |= xt && has_xt;
+iommu->ctrl.xt_en = xt && has_xt;
+iommu->ctrl.int_cap_xt_en = xt && has_xt;
 }
 
 rc = amd_iommu_update_ivrs_mapping_acpi();
@@ -1422,7 +1422,7 @@ int __init amd_iommu_prepare(bool xt)
 ivhd_type = 0;
 }
 
-return rc;
+return rc ?: xt && !has_xt ? -ENODEV : 0;
 }
 
 int __init amd_iommu_init(bool xt)
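
For readers unfamiliar with the final return statement: "rc ?: e" is the
GNU C conditional with an omitted middle operand; it evaluates rc once
and yields it when non-zero, else e. A spelled-out equivalent
(hypothetical helper, not part of the patch):

static int prepare_result(int rc, bool xt, bool has_xt)
{
    /* Same effect as: return rc ?: (xt && !has_xt ? -ENODEV : 0); */
    if ( rc )
        return rc;                        /* a real error wins */

    return (xt && !has_xt) ? -ENODEV : 0; /* XT wanted but unavailable */
}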


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 1/2] x86/smp: use a dedicated CPU mask in send_IPI_mask

2020-02-28 Thread Jan Beulich
On 28.02.2020 13:07, Roger Pau Monne wrote:
> Some callers of send_IPI_mask pass the scratch cpumask as the mask
> parameter of send_IPI_mask, so the scratch cpumask cannot be used by
> the function. The following trace has been obtained with a debug patch
> and shows one of those callers:
> 
> (XEN) scratch CPU mask already in use by arch/x86/mm.c#_get_page_type+0x1f9/0x1abf
> (XEN) Xen BUG at smp.c:45
> [...]
> (XEN) Xen call trace:
> (XEN)[] R scratch_cpumask+0xd3/0xf9
> (XEN)[] F send_IPI_mask+0x72/0x1ca
> (XEN)[] F flush_area_mask+0x10c/0x16c
> (XEN)[] F arch/x86/mm.c#_get_page_type+0x3ff/0x1abf
> (XEN)[] F get_page_type+0xe/0x2c
> (XEN)[] F pv_set_gdt+0xa1/0x2aa
> (XEN)[] F arch_set_info_guest+0x1196/0x16ba
> (XEN)[] F default_initialise_vcpu+0xc7/0xd4
> (XEN)[] F arch_initialise_vcpu+0x61/0xcd
> (XEN)[] F do_vcpu_op+0x219/0x690
> (XEN)[] F pv_hypercall+0x2f6/0x593
> (XEN)[] F lstar_enter+0x112/0x120
> 
> _get_page_type will use the scratch cpumask to call flush_tlb_mask,
> which in turn calls send_IPI_mask.
> 
> Fix this by using a dedicated per CPU cpumask in send_IPI_mask.
> 
> Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible')
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Jan Beulich 


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 0/2] AMD/IOMMU: improve x2APIC handling when XT is unavailable

2020-02-28 Thread Jan Beulich
For AMD the connection between IOMMU and x2APIC is less strong,
hence even if unlikely to occur we would better deal correctly
with XT being unavailable (for whatever reasons) in particular
when x2APIC is already enabled on a system.

1: correct handling when XT's prereq features are unavailable
2: without XT, x2APIC needs to be forced into physical mode

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v5 1/2] x86/smp: use a dedicated CPU mask in send_IPI_mask

2020-02-28 Thread Roger Pau Monne
Some callers of send_IPI_mask pass the scratch cpumask as the mask
parameter of send_IPI_mask, so the scratch cpumask cannot be used by
the function. The following trace has been obtained with a debug patch
and shows one of those callers:

(XEN) scratch CPU mask already in use by arch/x86/mm.c#_get_page_type+0x1f9/0x1abf
(XEN) Xen BUG at smp.c:45
[...]
(XEN) Xen call trace:
(XEN)[] R scratch_cpumask+0xd3/0xf9
(XEN)[] F send_IPI_mask+0x72/0x1ca
(XEN)[] F flush_area_mask+0x10c/0x16c
(XEN)[] F arch/x86/mm.c#_get_page_type+0x3ff/0x1abf
(XEN)[] F get_page_type+0xe/0x2c
(XEN)[] F pv_set_gdt+0xa1/0x2aa
(XEN)[] F arch_set_info_guest+0x1196/0x16ba
(XEN)[] F default_initialise_vcpu+0xc7/0xd4
(XEN)[] F arch_initialise_vcpu+0x61/0xcd
(XEN)[] F do_vcpu_op+0x219/0x690
(XEN)[] F pv_hypercall+0x2f6/0x593
(XEN)[] F lstar_enter+0x112/0x120

_get_page_type will use the scratch cpumask to call flush_tlb_mask,
which in turn calls send_IPI_mask.

Fix this by using a dedicated per CPU cpumask in send_IPI_mask.

Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible')
Signed-off-by: Roger Pau Monné 
---
Changes since v4:
 - Place the declaration of send_ipi_cpumask in smp.h.
---
 xen/arch/x86/smp.c| 2 +-
 xen/arch/x86/smpboot.c| 9 -
 xen/include/asm-x86/smp.h | 1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 0461812cf6..dd0b49d731 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -67,7 +67,7 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
 void send_IPI_mask(const cpumask_t *mask, int vector)
 {
 bool cpus_locked = false;
-cpumask_t *scratch = this_cpu(scratch_cpumask);
+cpumask_t *scratch = this_cpu(send_ipi_cpumask);
 
 if ( in_irq() || in_mce_handler() || in_nmi_handler() )
 {
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index ad49f2dcd7..6c548b0b53 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -57,6 +57,9 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
 static cpumask_t scratch_cpu0mask;
 
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, send_ipi_cpumask);
+static cpumask_t send_ipi_cpu0mask;
+
 cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
@@ -930,6 +933,8 @@ static void cpu_smpboot_free(unsigned int cpu, bool remove)
 FREE_CPUMASK_VAR(per_cpu(cpu_core_mask, cpu));
 if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
 FREE_CPUMASK_VAR(per_cpu(scratch_cpumask, cpu));
+if ( per_cpu(send_ipi_cpumask, cpu) != &send_ipi_cpu0mask )
+FREE_CPUMASK_VAR(per_cpu(send_ipi_cpumask, cpu));
 }
 
 cleanup_cpu_root_pgt(cpu);
@@ -1034,7 +1039,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
 
 if ( !(cond_zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
cond_zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
-   cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu))) )
+   cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu)) &&
+   cond_alloc_cpumask_var(&per_cpu(send_ipi_cpumask, cpu))) )
 goto out;
 
 rc = 0;
@@ -1175,6 +1181,7 @@ void __init smp_prepare_boot_cpu(void)
 cpumask_set_cpu(cpu, &cpu_present_map);
 #if NR_CPUS > 2 * BITS_PER_LONG
 per_cpu(scratch_cpumask, cpu) = &scratch_cpu0mask;
+per_cpu(send_ipi_cpumask, cpu) = &send_ipi_cpu0mask;
 #endif
 
 get_cpu_info()->use_pv_cr3 = false;
diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index 92d69a5ea0..6150363655 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -22,6 +22,7 @@
 DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_mask);
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
 DECLARE_PER_CPU(cpumask_var_t, scratch_cpumask);
+DECLARE_PER_CPU(cpumask_var_t, send_ipi_cpumask);
 
 /*
  * Do we, for platform reasons, need to actually keep CPUs online when we
-- 
2.25.0
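
To make the failure mode in the trace concrete, a simplified,
hypothetical sketch of the pre-fix hazard (abbreviated call chain, not
the actual code):

/* Both levels borrowed the same per-CPU mask before this fix. */
static void borrow_then_flush(void)
{
    cpumask_t *m = this_cpu(scratch_cpumask);  /* caller's borrow */

    cpumask_copy(m, &cpu_online_map);
    flush_tlb_mask(m);    /* reaches send_IPI_mask(), which also used */
                          /* scratch_cpumask, clobbering the caller's */
                          /* still-live contents                      */
}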


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v5 0/2] x86: scratch cpumask fixes/improvement

2020-02-28 Thread Roger Pau Monne
Hello,

Following series contain yet one more bugfix that removes the usage of
the scratch cpumask in send_IPI_mask and the introduction of accessors
to get/put the per-CPU scratch cpumask in order to prevent such issues
form happening in the future.

Thanks, Roger.

Roger Pau Monne (2):
  x86/smp: use a dedicated CPU mask in send_IPI_mask
  x86: add accessors for scratch cpu mask

 xen/arch/x86/io_apic.c|  6 --
 xen/arch/x86/irq.c| 14 ++
 xen/arch/x86/mm.c | 39 +++
 xen/arch/x86/msi.c|  4 +++-
 xen/arch/x86/smp.c| 27 ++-
 xen/arch/x86/smpboot.c| 10 --
 xen/include/asm-x86/smp.h | 15 +++
 7 files changed, 93 insertions(+), 22 deletions(-)

-- 
2.25.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v5 2/2] x86: add accessors for scratch cpu mask

2020-02-28 Thread Roger Pau Monne
Current usage of the per-CPU scratch cpumask is dangerous since
there's no way to figure out if the mask is already being used except
for manual code inspection of all the callers and possible call paths.

This is unsafe and not reliable, so introduce a minimal get/put
infrastructure to prevent nested usage of the scratch mask and usage
in interrupt context.

Move the declaration of scratch_cpumask to smp.c in order to place the
declaration and the accessors as close as possible.

Signed-off-by: Roger Pau Monné 
---
Changes since v4:
 - Fix get_scratch_cpumask call order in _clear_irq_vector.
 - Remove double newline in __do_update_va_mapping.
 - Constify scratch_cpumask_use.
 - Don't explicitly print the address in the format string, a followup
   patch will be sent to make %p[sS] do so.

Changes since v3:
 - Fix commit message.
 - Split the cpumask taken section into two in _clear_irq_vector.
 - Add an empty statement in do_mmuext_op to avoid a break.
 - Change the logic used to release the scratch cpumask in
   __do_update_va_mapping.
 - Add a %ps print to scratch_cpumask helper.
 - Remove printing the current IP, as that would be done by BUG
   anyway.
 - Pass the cpumask to put_scratch_cpumask and zap the pointer.

Changes since v1:
 - Use __builtin_return_address(0) instead of __func__.
 - Move declaration of scratch_cpumask and scratch_cpumask accessor to
   smp.c.
 - Do not allow usage in #MC or #NMI context.
---
 xen/arch/x86/io_apic.c|  6 --
 xen/arch/x86/irq.c| 14 ++
 xen/arch/x86/mm.c | 39 +++
 xen/arch/x86/msi.c|  4 +++-
 xen/arch/x86/smp.c| 25 +
 xen/arch/x86/smpboot.c|  1 -
 xen/include/asm-x86/smp.h | 14 ++
 7 files changed, 83 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index e98e08e9c8..0bb994f0ba 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -2236,10 +2236,11 @@ int io_apic_set_pci_routing (int ioapic, int pin, int irq, int edge_level, int a
 entry.vector = vector;
 
 if (cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS)) {
-cpumask_t *mask = this_cpu(scratch_cpumask);
+cpumask_t *mask = get_scratch_cpumask();
 
 cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
 SET_DEST(entry, logical, cpu_mask_to_apicid(mask));
+put_scratch_cpumask(mask);
 } else {
 printk(XENLOG_ERR "IRQ%d: no target CPU (%*pb vs %*pb)\n",
irq, CPUMASK_PR(desc->arch.cpu_mask), CPUMASK_PR(TARGET_CPUS));
@@ -2433,10 +2434,11 @@ int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val)
 
 if ( cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS) )
 {
-cpumask_t *mask = this_cpu(scratch_cpumask);
+cpumask_t *mask = get_scratch_cpumask();
 
 cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
 SET_DEST(rte, logical, cpu_mask_to_apicid(mask));
+put_scratch_cpumask(mask);
 }
 else
 {
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index cc2eb8e925..0a526ee800 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -196,7 +196,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
 {
 unsigned int cpu, old_vector, irq = desc->irq;
 unsigned int vector = desc->arch.vector;
-cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
+cpumask_t *tmp_mask = get_scratch_cpumask();
 
 BUG_ON(!valid_irq_vector(vector));
 
@@ -208,6 +208,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
 ASSERT(per_cpu(vector_irq, cpu)[vector] == irq);
 per_cpu(vector_irq, cpu)[vector] = ~irq;
 }
+put_scratch_cpumask(tmp_mask);
 
 desc->arch.vector = IRQ_VECTOR_UNASSIGNED;
 cpumask_clear(desc->arch.cpu_mask);
@@ -227,8 +228,9 @@ static void _clear_irq_vector(struct irq_desc *desc)
 
 /* If we were in motion, also clear desc->arch.old_vector */
 old_vector = desc->arch.old_vector;
-cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
 
+tmp_mask = get_scratch_cpumask();
+cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
 for_each_cpu(cpu, tmp_mask)
 {
 ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
@@ -236,6 +238,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
 per_cpu(vector_irq, cpu)[old_vector] = ~irq;
 }
 
+put_scratch_cpumask(tmp_mask);
 release_old_vec(desc);
 
 desc->arch.move_in_progress = 0;
@@ -1152,10 +1155,11 @@ static void irq_guest_eoi_timer_fn(void *data)
 break;
 
 case ACKTYPE_EOI:
-cpu_eoi_map = this_cpu(scratch_cpumask);
+cpu_eoi_map = get_scratch_cpumask();
 cpumask_copy(cpu_eoi_map, action->cpu_eoi_map);
 spin_unlock_irq(&desc->lock);
 on_selected_cpus(cpu_eoi_map, set_eoi_ready, desc, 0);
+put_scratch_cpumask(cpu_eoi_map);
 return;
 }
 
@@ -25
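
The smp.c hunk with the accessor implementation is truncated above.
Based on the review discussion (the context BUG_ON, the %ps diagnostic,
__builtin_return_address(0), and the pointer zapping in
put_scratch_cpumask), a minimal sketch of what the helpers could look
like; an approximation, not the committed code:

DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
static DEFINE_PER_CPU(const void *, scratch_cpumask_use);

cpumask_t *scratch_cpumask(const void *use)
{
    /*
     * Due to reentrancy the scratch cpumask cannot be used in IRQ,
     * #MC or #NMI context.
     */
    BUG_ON(in_irq() || in_mce_handler() || in_nmi_handler());

    if ( use && unlikely(this_cpu(scratch_cpumask_use)) )
    {
        printk("scratch CPU mask already in use by %ps (%p)\n",
               this_cpu(scratch_cpumask_use),
               this_cpu(scratch_cpumask_use));
        BUG();
    }

    this_cpu(scratch_cpumask_use) = use;

    return use ? this_cpu(scratch_cpumask) : NULL;
}

/* In smp.h (sketch): */
cpumask_t *scratch_cpumask(const void *use);
#define get_scratch_cpumask() scratch_cpumask(__builtin_return_address(0))
#define put_scratch_cpumask(m) do {               \
    BUG_ON((m) != this_cpu(scratch_cpumask));     \
    scratch_cpumask(NULL);                        \
    (m) = NULL;                                   \
} while ( false )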

Re: [Xen-devel] [PATCH v4 2/2] x86: add accessors for scratch cpu mask

2020-02-28 Thread Jan Beulich
On 28.02.2020 12:41, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 12:15:21PM +0100, Jan Beulich wrote:
>> On 28.02.2020 11:31, Roger Pau Monné wrote:
>>> On Fri, Feb 28, 2020 at 11:16:55AM +0100, Jan Beulich wrote:
 On 28.02.2020 10:33, Roger Pau Monne wrote:
> +/*
> + * Due to reentrancy scratch cpumask cannot be used in IRQ, #MC or #NMI
> + * context.
> + */
> +BUG_ON(in_irq() || in_mce_handler() || in_nmi_handler());
> +
> +if ( use && unlikely(this_cpu(scratch_cpumask_use)) )
> +{
> +printk("scratch CPU mask already in use by %ps (%p)\n",
> +   this_cpu(scratch_cpumask_use), this_cpu(scratch_cpumask_use));

 Why the raw %p as well? We don't do so elsewhere, I think. Yes,
 it's debugging code only, but I wonder anyway.
>>>
>>> I use addr2line to find the offending line, and it's much easier to do
>>> so if you have the address directly, rather than having to use nm in
>>> order to figure out the address of the symbol and then add the offset.
>>>
>>> Maybe I'm missing some other way to do this more easily?
>>
>> In such a case we may want to consider making %ps (and %pS)
>> print a hex presentation next to the decoded one, in debug
>> builds at least. Andrew, thoughts? (There may be cases where
>> this is not wanted, but if we made this a debug mode only
>> feature, I think it wouldn't do too much harm.)
> 
> If you agree to make %p[sS] print the address then I can drop this
> and send a patch to that effect (likely next week).

In principle I agree, but the effect in particular on stack
dumps needs to be looked at pretty closely.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [linux-5.4 test] 147670: regressions - FAIL

2020-02-28 Thread osstest service owner
flight 147670 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/147670/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 146121
 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 146121
 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 146121

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat  fail pass in 147586

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass

version targeted for testing:
 linux    f22dcb31727e3cf31a9143437f134ea133021982
baseline version:
 linux    122179cb7d648a6f36b20dd6bf34f953cb384c30

Last test of basis   146121  2020-01-15 17:42:04 Z   43 days
Failing since 

Re: [Xen-devel] [PATCH v4 2/2] x86: add accessors for scratch cpu mask

2020-02-28 Thread Roger Pau Monné
On Fri, Feb 28, 2020 at 12:15:21PM +0100, Jan Beulich wrote:
> On 28.02.2020 11:31, Roger Pau Monné wrote:
> > On Fri, Feb 28, 2020 at 11:16:55AM +0100, Jan Beulich wrote:
> >> On 28.02.2020 10:33, Roger Pau Monne wrote:
> >>> +/*
> >>> + * Due to reentrancy scratch cpumask cannot be used in IRQ, #MC or #NMI
> >>> + * context.
> >>> + */
> >>> +BUG_ON(in_irq() || in_mce_handler() || in_nmi_handler());
> >>> +
> >>> +if ( use && unlikely(this_cpu(scratch_cpumask_use)) )
> >>> +{
> >>> +printk("scratch CPU mask already in use by %ps (%p)\n",
> >>> +   this_cpu(scratch_cpumask_use), this_cpu(scratch_cpumask_use));
> >>
> >> Why the raw %p as well? We don't do so elsewhere, I think. Yes,
> >> it's debugging code only, but I wonder anyway.
> > 
> > I use addr2line to find the offending line, and it's much easier to do
> > so if you have the address directly, rather than having to use nm in
> > order to figure out the address of the symbol and then add the offset.
> > 
> > Maybe I'm missing some other way to do this more easily?
> 
> In such a case we may want to consider making %ps (and %pS)
> print a hex presentation next to the decoded one, in debug
> builds at least. Andrew, thoughts? (There may be cases where
> this is not wanted, but if we made this a debug mode only
> feature, I think it wouldn't do too much harm.)

If you agree to make %p[sS] print the address then I can drop this
and send a patch to that effect (likely next week).

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PULL 0/3] Xen queue 2020-02-27

2020-02-28 Thread Peter Maydell
On Thu, 27 Feb 2020 at 12:16, Anthony PERARD  wrote:
>
> The following changes since commit db736e0437aa6fd7c1b7e4599c17f9619ab6b837:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging (2020-02-25 13:31:16 +)
>
> are available in the Git repository at:
>
>   https://xenbits.xen.org/git-http/people/aperard/qemu-dm.git tags/pull-xen-20200227
>
> for you to fetch changes up to 5d4c954931ba62661c6a1bc16ce604a012a10007:
>
>   Memory: Only call ramblock_ptr when needed in qemu_ram_writeback (2020-02-27 11:50:30 +)
>
> 
> Xen queue 2020-02-27
>
> * fix for xen-block
> * fix in exec.c for migration of xen guest
> * one cleanup patch
>
> 

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
for any user-visible changes.

-- PMM

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 2/2] x86: add accessors for scratch cpu mask

2020-02-28 Thread Jan Beulich
On 28.02.2020 11:31, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 11:16:55AM +0100, Jan Beulich wrote:
>> On 28.02.2020 10:33, Roger Pau Monne wrote:
>>> +/*
>>> + * Due to reentrancy scratch cpumask cannot be used in IRQ, #MC or #NMI
>>> + * context.
>>> + */
>>> +BUG_ON(in_irq() || in_mce_handler() || in_nmi_handler());
>>> +
>>> +if ( use && unlikely(this_cpu(scratch_cpumask_use)) )
>>> +{
>>> +printk("scratch CPU mask already in use by %ps (%p)\n",
>>> +   this_cpu(scratch_cpumask_use), this_cpu(scratch_cpumask_use));
>>
>> Why the raw %p as well? We don't do so elsewhere, I think. Yes,
>> it's debugging code only, but I wonder anyway.
> 
> I use addr2line to find the offending line, and it's much easier to do
> so if you have the address directly, rather than having to use nm in
> order to figure out the address of the symbol and then add the offset.
> 
> Maybe I'm missing some other way to do this more easily?

In such a case we may want to consider making %ps (and %pS)
print a hex presentation next to the decoded one, in debug
builds at least. Andrew, thoughts? (There may be cases where
this is not wanted, but if we made this a debug mode only
feature, I think it wouldn't do too much harm.)

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore suspend event channel node optional

2020-02-28 Thread Durrant, Paul
> -Original Message-
> From: Julien Grall 
> Sent: 28 February 2020 10:26
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Anthony PERARD ; Ian Jackson
> ; Wei Liu 
> Subject: Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore
> suspend event channel node optional
> 
> Hi Paul,
> 
> On 28/02/2020 09:28, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Julien Grall 
> >> Sent: 27 February 2020 22:52
> >> To: Durrant, Paul ; xen-
> de...@lists.xenproject.org
> >> Cc: Anthony PERARD ; Ian Jackson
> >> ; Wei Liu 
> >> Subject: Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore
> >> suspend event channel node optional
> >>
> >> Hi,
> >>
> >> On 26/02/2020 16:08, Paul Durrant wrote:
> >>> The purpose and semantics of the node are explained in
> >>> xenstore-paths.pandoc [1]. It was originally introduced in xend by
> >>> commit 17636f47a474 "Teach xc_save to use event-channel-based domain
> >>> suspend if available.". Note that, because the top-level frontend
> >>> 'device' node was created writable by the guest in xend, there was no
> >>> need to explicitly create the 'suspend-event-channel' node as a
> >>> writable node.
> >>>
> >>> However, libxl creates the 'device' node as read-only by the guest
> >>> and so explicit creation of the 'suspend-event-channel' node is
> >>> necessary to make it usable. This unfortunately has the side-effect
> >>> of making some old Windows PV drivers [2] cease to function. This is
> >>> because they scan the top-level 'device' node, find the 'suspend'
> >>> node and expect it to contain the usual sub-nodes describing a PV
> >>> frontend. When this is found not to be the case, enumeration ceases
> >>> and (because the 'suspend' node is observed before the 'vbd' node) no
> >>> system disk is enumerated. Windows will then crash with bugcheck code
> >>> 0x7B.
> >>>
> >>> This patch adds a boolean 'suspend_event_channel' field into
> >>> libxl_create_info to control whether the xenstore node is created
> >>> and a similarly named option in xl.cfg which, for compatibility with
> >>> previous libxl behaviour, defaults to true.
> >>>
> >>> [1]
> >> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore-
> >> paths.pandoc;hb=HEAD#l177
> >>> [2] https://access.redhat.com/documentation/en-
> >> us/red_hat_enterprise_linux/5/html/para-
> >> virtualized_windows_drivers_guide/sect-para-
> >> virtualized_windows_drivers_guide-
> >> installing_and_configuring_the_para_virtualized_drivers-
> >> installing_the_para_virtualized_drivers
> >>>
> >>> NOTE: While adding the new LIBXL_HAVE_CREATEINFO_SUSPEND_EVENT_CHANNEL
> >>> definition into libxl.h, this patch corrects the previous stanza
> >>> which erroneously implies libxl_domain_create_info is a function.
> >>>
> >>> Signed-off-by: Paul Durrant 
> >>> ---
> >>> Cc: Ian Jackson 
> >>> Cc: Wei Liu 
> >>> Cc: Anthony PERARD 
> >>> ---
> >>>docs/man/xl.cfg.5.pod.in|  7 +++
> >>>tools/libxl/libxl.h | 13 -
> >>>tools/libxl/libxl_create.c  | 12 +---
> >>>tools/libxl/libxl_types.idl |  1 +
> >>>tools/xl/xl_parse.c |  3 +++
> >>
> >> You may want to update xenstore-paths.pandoc as the document mentions
> >> the node will be created by the toolstack.
> >>
> >
> > The doc already says that:
> >
> > -
> >  ~/device/suspend/event-channel = ""|EVTCHN [w]
> >
> > The domain's suspend event channel. The toolstack will create this
> > path with an empty value which the guest may choose to overwrite.
> > -
> 
> The paragraph you quoted does not suggest the node may not exist. So I
> think you want to update the documentation to reflect the node may not
> exist.
> 
> >>>5 files changed, 32 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> >>> index 0cad561375..5f476f1e1d 100644
> >>> --- a/docs/man/xl.cfg.5.pod.in
> >>> +++ b/docs/man/xl.cfg.5.pod.in
> >>> @@ -668,6 +668,13 @@ file.
> >>>
> >>>=back
> >>>
> >>> +=item B<suspend_event_channel=BOOLEAN>
> >>> +
> >>> +Create the xenstore path for the domain's suspend event channel. The
> >>> +existence of this path can cause problems with older PV drivers
> >>> +running in the guest. If this option is not specified then it will
> >>> +default to B<true>.
> >>
> >> In the next patch you are going to make device/ rw. Do you see any
> >> issue with just not creating the node for everyone? Are PV drivers
> >> allowed to assume a node will be present?
> >
> > Well that's the problem. Some of them may have become accustomed to it
> > being present. Also the documentation does say it is created by the
> > toolstack (but not when). Perhaps I should update the doc to say the
> > toolstack *may* create this path when the domain is created.
> 
> Hmmm fair point. However, this now means you may need to modify your
> configuration file depending on the PV driver installed.
> 
> This is not a very ideal

Re: [Xen-devel] [PATCH v4 2/2] x86: add accessors for scratch cpu mask

2020-02-28 Thread Roger Pau Monné
On Fri, Feb 28, 2020 at 11:16:55AM +0100, Jan Beulich wrote:
> On 28.02.2020 10:33, Roger Pau Monne wrote:
> > Current usage of the per-CPU scratch cpumask is dangerous since
> > there's no way to figure out if the mask is already being used except
> > for manual code inspection of all the callers and possible call paths.
> > 
> > This is unsafe and not reliable, so introduce a minimal get/put
> > infrastructure to prevent nested usage of the scratch mask and usage
> > in interrupt context.
> > 
> > Move the definition of scratch_cpumask to smp.c in order to place the
> > declaration and the accessors as close as possible.
> 
> You've changed one instance of "declaration", but not also the other.

Oh, sorry. Sadly you are not the only one with a cold this week :).

> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -196,7 +196,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
> >  {
> >  unsigned int cpu, old_vector, irq = desc->irq;
> >  unsigned int vector = desc->arch.vector;
> > -cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
> > +cpumask_t *tmp_mask = get_scratch_cpumask();
> >  
> >  BUG_ON(!valid_irq_vector(vector));
> >  
> > @@ -208,6 +208,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
> >  ASSERT(per_cpu(vector_irq, cpu)[vector] == irq);
> >  per_cpu(vector_irq, cpu)[vector] = ~irq;
> >  }
> > +put_scratch_cpumask(tmp_mask);
> >  
> >  desc->arch.vector = IRQ_VECTOR_UNASSIGNED;
> >  cpumask_clear(desc->arch.cpu_mask);
> > @@ -227,8 +228,9 @@ static void _clear_irq_vector(struct irq_desc *desc)
> >  
> >  /* If we were in motion, also clear desc->arch.old_vector */
> >  old_vector = desc->arch.old_vector;
> > -cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
> >  
> > +cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
> > +tmp_mask = get_scratch_cpumask();
> 
> Did you test this? It looks overwhelmingly likely that the two
> lines need to be the other way around.

Urg, yes, I've tested it but likely failed to trigger this case and,
even worse, failed to spot it in my own review. It's obviously wrong.
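
For reference, the fix should presumably be just the swap you suggest,
i.e. (sketch only, not the final patch):

    /* Take the scratch mask first, then compute into it. */
    tmp_mask = get_scratch_cpumask();
    cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);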

> > +/*
> > + * Due to reentrancy scratch cpumask cannot be used in IRQ, #MC or #NMI
> > + * context.
> > + */
> > +BUG_ON(in_irq() || in_mce_handler() || in_nmi_handler());
> > +
> > +if ( use && unlikely(this_cpu(scratch_cpumask_use)) )
> > +{
> > +printk("scratch CPU mask already in use by %ps (%p)\n",
> > +   this_cpu(scratch_cpumask_use), this_cpu(scratch_cpumask_use));
> 
> Why the raw %p as well? We don't do so elsewhere, I think. Yes,
> it's debugging code only, but I wonder anyway.

I use addr2line to find the offending line, and it's much easier to do
so if you have the address directly, rather than having to use nm in
order to figure out the address of the symbol and then add the offset.

Maybe I'm missing some other way to do this more easily?
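
(To illustrate the workflow: with the raw %p value one can run something
like `addr2line -e xen-syms <address>' directly, whereas from the %ps
output alone one first has to look up the symbol's base address with nm
and add the offset by hand.)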

Thanks, Roger.


Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore suspend event channel node optional

2020-02-28 Thread Julien Grall

Hi Paul,

On 28/02/2020 09:28, Durrant, Paul wrote:

-Original Message-
From: Julien Grall 
Sent: 27 February 2020 22:52
To: Durrant, Paul ; xen-devel@lists.xenproject.org
Cc: Anthony PERARD ; Ian Jackson
; Wei Liu 
Subject: Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore
suspend event channel node optional

Hi,

On 26/02/2020 16:08, Paul Durrant wrote:

The purpose and semantics of the node are explained in
xenstore-paths.pandoc [1]. It was originally introduced in xend by commit
17636f47a474 "Teach xc_save to use event-channel-based domain suspend if
available.". Note that, because the top-level frontend 'device' node was
created writable by the guest in xend, there was no need to explicitly
create the 'suspend-event-channel' node as a writable node.

However, libxl creates the 'device' node as read-only by the guest and so
explicit creation of the 'suspend-event-channel' node is necessary to make
it usable. This unfortunately has the side-effect of making some old
Windows PV drivers [2] cease to function. This is because they scan the
top level 'device' node, find the 'suspend' node and expect it to contain
the usual sub-nodes describing a PV frontend. When this is found not to be
the case, enumeration ceases and (because the 'suspend' node is observed
before the 'vbd' node) no system disk is enumerated. Windows will then
crash with bugcheck code 0x7B.

This patch adds a boolean 'suspend_event_channel' field into
libxl_create_info to control whether the xenstore node is created and a
similarly named option in xl.cfg which, for compatibility with previous
libxl behaviour, defaults to true.

[1] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore-
paths.pandoc;hb=HEAD#l177

[2] https://access.redhat.com/documentation/en-
us/red_hat_enterprise_linux/5/html/para-
virtualized_windows_drivers_guide/sect-para-
virtualized_windows_drivers_guide-
installing_and_configuring_the_para_virtualized_drivers-
installing_the_para_virtualized_drivers


NOTE: While adding the new LIBXL_HAVE_CREATEINFO_SUSPEND_EVENT_CHANNEL
definition into libxl.h, this patch corrects the previous stanza
which erroneously implies libxl_domain_create_info is a function.

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Anthony PERARD 
---
   docs/man/xl.cfg.5.pod.in|  7 +++
   tools/libxl/libxl.h | 13 -
   tools/libxl/libxl_create.c  | 12 +---
   tools/libxl/libxl_types.idl |  1 +
   tools/xl/xl_parse.c |  3 +++


You may want to update xenstore-paths.pandoc as the document mentions the
node will be created by the toolstack.



The doc already says that:

-
 ~/device/suspend/event-channel = ""|EVTCHN [w]
  
The domain's suspend event channel. The toolstack will create this
path with an empty value which the guest may choose to overwrite.
-


The paragraph you quoted does not suggest the node may not exist. So I 
think you want to update the documentation to reflect the node may not 
exist.



   5 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 0cad561375..5f476f1e1d 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -668,6 +668,13 @@ file.

   =back

+=item B<suspend_event_channel=BOOLEAN>
+
+Create the xenstore path for the domain's suspend event channel. The
+existence of this path can cause problems with older PV drivers running
+in the guest. If this option is not specified then it will default to
+B<true>.


In the next patch you are going to make device/ rw. Do you see any issue
with just not creating the node for everyone? Are PV drivers allowed to
assume a node will be present?


Well that's the problem. Some of them may have become accustomed to it being 
present. Also the documentation does say it is created by the toolstack (but 
not when). Perhaps I should update the doc to say the toolstack *may* create 
this path when the domain is created.


Hmmm fair point. However, this now means you may need to modify your 
configuration file depending on the PV driver installed.


This is not a very ideal situation to be in when you have to upgrade 
your OS (imagine the new PV driver requires the path).


How do you envision this working?

My knowledge of xenstore is limited, so I thought I would ask the
questions to understand a bit more how stable the ABI is meant to be. :).


Stable? That'll be the day :-)


Thank you for the confirmation :).

Cheers,

--
Julien Grall


Re: [Xen-devel] [PATCH v4 1/2] x86/smp: use a dedicated CPU mask in send_IPI_mask

2020-02-28 Thread Roger Pau Monné
On Fri, Feb 28, 2020 at 11:08:43AM +0100, Jan Beulich wrote:
> On 28.02.2020 10:33, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/smp.c
> > +++ b/xen/arch/x86/smp.c
> > @@ -59,6 +59,8 @@ static void send_IPI_shortcut(unsigned int shortcut, int 
> > vector,
> >  apic_write(APIC_ICR, cfg);
> >  }
> >  
> > +DECLARE_PER_CPU(cpumask_var_t, send_ipi_cpumask);
> 
> This needs to be put in a header, so that ...
> 
> > --- a/xen/arch/x86/smpboot.c
> > +++ b/xen/arch/x86/smpboot.c
> > @@ -57,6 +57,9 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
> >  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
> >  static cpumask_t scratch_cpu0mask;
> >  
> > +DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, send_ipi_cpumask);
> 
> ... this gets compiled with having seen the declaration, such
> that if one gets changed without also changing the other, the
> build will break.

Right, was trying to limit the scope of the declaration, but your
suggestion makes sense.
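
Something like the following in asm-x86/smp.h then (the exact placement
is my assumption):

    /* Declaration visible to both smp.c and smpboot.c, so changing one
     * side without the other breaks the build. */
    DECLARE_PER_CPU(cpumask_var_t, send_ipi_cpumask);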

> Everything else looks fine to me.

Thanks, Roger.


Re: [Xen-devel] [PATCH v4 2/2] x86: add accessors for scratch cpu mask

2020-02-28 Thread Jan Beulich
On 28.02.2020 10:33, Roger Pau Monne wrote:
> Current usage of the per-CPU scratch cpumask is dangerous since
> there's no way to figure out if the mask is already being used except
> for manual code inspection of all the callers and possible call paths.
> 
> This is unsafe and not reliable, so introduce a minimal get/put
> infrastructure to prevent nested usage of the scratch mask and usage
> in interrupt context.
> 
> Move the definition of scratch_cpumask to smp.c in order to place the
> declaration and the accessors as close as possible.

You've changed one instance of "declaration", but not also the other.

> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -196,7 +196,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
>  {
>  unsigned int cpu, old_vector, irq = desc->irq;
>  unsigned int vector = desc->arch.vector;
> -cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
> +cpumask_t *tmp_mask = get_scratch_cpumask();
>  
>  BUG_ON(!valid_irq_vector(vector));
>  
> @@ -208,6 +208,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
>  ASSERT(per_cpu(vector_irq, cpu)[vector] == irq);
>  per_cpu(vector_irq, cpu)[vector] = ~irq;
>  }
> +put_scratch_cpumask(tmp_mask);
>  
>  desc->arch.vector = IRQ_VECTOR_UNASSIGNED;
>  cpumask_clear(desc->arch.cpu_mask);
> @@ -227,8 +228,9 @@ static void _clear_irq_vector(struct irq_desc *desc)
>  
>  /* If we were in motion, also clear desc->arch.old_vector */
>  old_vector = desc->arch.old_vector;
> -cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
>  
> +cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
> +tmp_mask = get_scratch_cpumask();

Did you test this? It looks overwhelmingly likely that the two
lines need to be the other way around.

> @@ -4384,6 +4389,17 @@ static int __do_update_va_mapping(
>  break;
>  }
>  
> +switch ( flags & ~UVMF_FLUSHTYPE_MASK )
> +{
> +case UVMF_LOCAL:
> +case UVMF_ALL:
> +break;
> +
> +default:
> +put_scratch_cpumask(mask);
> +}
> +
> +
>  return rc;

No two successive blank lines please.

> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -25,6 +25,31 @@
>  #include 
>  #include 
>  
> +DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
> +
> +#ifndef NDEBUG
> +cpumask_t *scratch_cpumask(bool use)
> +{
> +static DEFINE_PER_CPU(void *, scratch_cpumask_use);

I'd make this "const void *", btw.

> +/*
> + * Due to reentrancy scratch cpumask cannot be used in IRQ, #MC or #NMI
> + * context.
> + */
> +BUG_ON(in_irq() || in_mce_handler() || in_nmi_handler());
> +
> +if ( use && unlikely(this_cpu(scratch_cpumask_use)) )
> +{
> +printk("scratch CPU mask already in use by %ps (%p)\n",
> +   this_cpu(scratch_cpumask_use), this_cpu(scratch_cpumask_use));

Why the raw %p as well? We don't do so elsewhere, I think. Yes,
it's debugging code only, but I wonder anyway.

Jan


Re: [Xen-devel] [PATCH v4 1/2] x86/smp: use a dedicated CPU mask in send_IPI_mask

2020-02-28 Thread Jan Beulich
On 28.02.2020 10:33, Roger Pau Monne wrote:
> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -59,6 +59,8 @@ static void send_IPI_shortcut(unsigned int shortcut, int 
> vector,
>  apic_write(APIC_ICR, cfg);
>  }
>  
> +DECLARE_PER_CPU(cpumask_var_t, send_ipi_cpumask);

This needs to be put in a header, so that ...

> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -57,6 +57,9 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
>  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
>  static cpumask_t scratch_cpu0mask;
>  
> +DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, send_ipi_cpumask);

... this gets compiled with having seen the declaration, such
that if one gets changed without also changing the other, the
build will break.

Everything else looks fine to me.

Jan


Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore suspend event channel node optional

2020-02-28 Thread Durrant, Paul
> -Original Message-
> From: Julien Grall 
> Sent: 27 February 2020 22:52
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Anthony PERARD ; Ian Jackson
> ; Wei Liu 
> Subject: Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore
> suspend event channel node optional
> 
> Hi,
> 
> On 26/02/2020 16:08, Paul Durrant wrote:
> > The purpose and semantics of the node are explained in
> > xenstore-paths.pandoc [1]. It was originally introduced in xend by
> commit
> > 17636f47a474 "Teach xc_save to use event-channel-based domain suspend if
> > available.". Note that, because, the top-level frontend 'device' node
> was
> > created writable by the guest in xend, there was no need to explicitly
> > create the 'suspend-event-channel' node as a writable node.
> >
> > However, libxl creates the 'device' node as read-only by the guest and
> so
> > explicit creation of the 'suspend-event-channel' node is necessary to
> make
> > it usable. This unfortunately has the side-effect of making some old
> > Windows PV drivers [2] cease to function. This is because they scan the
> top
> > level 'device' node, find the 'suspend' node and expect it to contain
> the
> > usual sub-nodes describing a PV frontend. When this is found not to be
> the
> > case, enumeration ceases and (because the 'suspend' node is observed
> before
> > the 'vbd' node) no system disk is enumerated. Windows will then crash
> with
> > bugcheck code 0x7B.
> >
> > This patch adds a boolean 'suspend_event_channel' field into
> > libxl_create_info to control whether the xenstore node is created and a
> > similarly named option in xl.cfg which, for compatibility with previous
> > libxl behaviour, defaults to true.
> >
> > [1]
> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore-
> paths.pandoc;hb=HEAD#l177
> > [2] https://access.redhat.com/documentation/en-
> us/red_hat_enterprise_linux/5/html/para-
> virtualized_windows_drivers_guide/sect-para-
> virtualized_windows_drivers_guide-
> installing_and_configuring_the_para_virtualized_drivers-
> installing_the_para_virtualized_drivers
> >
> > NOTE: While adding the new LIBXL_HAVE_CREATEINFO_SUSPEND_EVENT_CHANNEL
> >definition into libxl.h, this patch corrects the previous stanza
> >which erroneously implies libxl_domain_create_info is a function.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Ian Jackson 
> > Cc: Wei Liu 
> > Cc: Anthony PERARD 
> > ---
> >   docs/man/xl.cfg.5.pod.in|  7 +++
> >   tools/libxl/libxl.h | 13 -
> >   tools/libxl/libxl_create.c  | 12 +---
> >   tools/libxl/libxl_types.idl |  1 +
> >   tools/xl/xl_parse.c |  3 +++
> 
> You may want to update xenstore-paths.pandoc as the document mentions the
> node will be created by the toolstack.
> 

The doc already says that:

-
 ~/device/suspend/event-channel = ""|EVTCHN [w]
 
The domain's suspend event channel. The toolstack will create this
path with an empty value which the guest may choose to overwrite.
-

> >   5 files changed, 32 insertions(+), 4 deletions(-)
> >
> > diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> > index 0cad561375..5f476f1e1d 100644
> > --- a/docs/man/xl.cfg.5.pod.in
> > +++ b/docs/man/xl.cfg.5.pod.in
> > @@ -668,6 +668,13 @@ file.
> >
> >   =back
> >
> > +=item B<suspend_event_channel=BOOLEAN>
> > +
> > +Create the xenstore path for the domain's suspend event channel. The
> > +existence of this path can cause problems with older PV drivers running
> > +in the guest. If this option is not specified then it will default to
> > +B<true>.
> 
> In the next patch you are going to make device/ rw. Do you see any issue
> with just not creating the node for everyone? Are PV drivers allowed to
> assume a node will be present?

Well that's the problem. Some of them may have become accustomed to it being 
present. Also the documentation does say it is created by the toolstack (but 
not when). Perhaps I should update the doc to say the toolstack *may* create 
this path when the domain is created.

> 
> My knowledge of xenstore is limited, so I thought I would ask the
> questions to understand a bit more how stable the ABI is meant to be. :).

Stable? That'll be the day :-)

  Paul

> 
> Cheers,
> 
> --
> Julien Grall

Re: [Xen-devel] [PATCH] xen/grant: Fix signed/unsigned comparisons issues

2020-02-28 Thread Julien Grall

Hi Jan,

On 28/02/2020 09:19, Jan Beulich wrote:

On 28.02.2020 10:09, Julien Grall wrote:

Hi Jan,

On 28/02/2020 08:41, Jan Beulich wrote:

On 27.02.2020 19:46, Andrew Cooper wrote:

Each function takes an unsigned count, and loops based on a signed i.  This
causes problems when between 2 and 4 billion operations are requested.
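
(Schematically, with names made up for illustration, the pattern at
issue is:

    static long gnttab_do_ops(unsigned int count)
    {
        int i;                          /* signed loop counter */

        for ( i = 0; i < count; i++ )   /* i converted to unsigned here */
            process_op(i);

        return 0;
    }

where a count above INT_MAX can never be reached by the signed i without
first overflowing it.)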


I can see problems, but not (as the title says) with comparison issues
(the signed i gets converted to unsigned for the purpose of the
comparison).


In practice, signed-ness issues aren't possible because count exceeding
INT_MAX is excluded earlier in do_grant_op(), but the code reads as if it is
buggy, and GCC obviously can't spot this either.

Bloat-o-meter reports:
add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-95 (-95)
Function                   old     new   delta
do_grant_table_op         7155    7140     -15
gnttab_transfer           2732    2716     -16
gnttab_unmap_grant_ref     771     739     -32
gnttab_unmap_and_replace   771     739     -32
Total: Before=2996364, After=2996269, chg -0.00%

and inspection of gnttab_unmap_grant_ref() at least reveals one fewer local
variable on the stack.


This is a good thing for x86. However, having started to familiarize
myself a tiny bit with RISC-V, I'm no longer certain our present way
of preferring unsigned int for array indexing variable and alike is
actually a good thing without further abstraction. Nevertheless, this
isn't an immediate issue, so ...


Would you mind expanding a bit more about this comment here? How
unsigned int is going to make things worse on RISC-V?


Other than x86-64 and Arm64, 64-bit (and wider) RISC-V sign-extends
the result of 32-bit operations by default. Hence for an unsigned
32-bit type to be used as array index, an additional zero-extending
insn is going to be needed (just like for our existing ports a
sign-extending insn is needed when array indexing variables are
calculated as 32-bit signed quantities, which is one of the reasons
why right now we prefer unsigned int in such cases).
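
To illustrate the codegen point with a made-up example (not taken from
the patch):

    /* On RV64 the 32-bit addition sign-extends its result, so the
     * unsigned 32-bit index below needs an extra zero-extension before
     * being added to the 64-bit pointer. x86-64 and Arm64 zero-extend
     * 32-bit results instead, which penalises signed indexes there. */
    int load(const int *arr, unsigned int i, unsigned int j)
    {
        return arr[i + j];
    }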


Meh, I am not convinced this is a good enough reason to switch array 
indexes to signed int for RISC-V. There are probably better places to
look for optimizations in common code that would benefit all architectures.


Regarding the reason for "unsigned int", I don't request it because it
produces "shorter" code but because there is no reason to have a signed
index for an array.


Anyway, let's cross that bridge when we have an actual RISC-V port for 
Xen merged.


Cheers,

--
Julien Grall


[Xen-devel] [PATCH v4 2/2] x86: add accessors for scratch cpu mask

2020-02-28 Thread Roger Pau Monne
Current usage of the per-CPU scratch cpumask is dangerous since
there's no way to figure out if the mask is already being used except
for manual code inspection of all the callers and possible call paths.

This is unsafe and not reliable, so introduce a minimal get/put
infrastructure to prevent nested usage of the scratch mask and usage
in interrupt context.

Move the definition of scratch_cpumask to smp.c in order to place the
declaration and the accessors as close as possible.
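
As an illustration (not part of the patch), the intended calling pattern
is simply:

    cpumask_t *mask = get_scratch_cpumask();

    cpumask_and(mask, some_mask, &cpu_online_map);
    /* ... use mask; no nested get, no use in IRQ/#MC/#NMI context ... */
    put_scratch_cpumask(mask);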

Signed-off-by: Roger Pau Monné 
---
Changes since v3:
 - Fix commit message.
 - Split the cpumask taken section into two in _clear_irq_vector.
 - Add an empty statement in do_mmuext_op to avoid a break.
 - Change the logic used to release the scratch cpumask in
   __do_update_va_mapping.
 - Add a %ps print to scratch_cpumask helper.
 - Remove printing the current IP, as that would be done by BUG
   anyway.
 - Pass the cpumask to put_scratch_cpumask and zap the pointer.

Changes since v1:
 - Use __builtin_return_address(0) instead of __func__.
 - Move declaration of scratch_cpumask and scratch_cpumask accessor to
   smp.c.
 - Do not allow usage in #MC or #NMI context.
---
 xen/arch/x86/io_apic.c|  6 --
 xen/arch/x86/irq.c| 14 ++
 xen/arch/x86/mm.c | 40 +++
 xen/arch/x86/msi.c|  4 +++-
 xen/arch/x86/smp.c| 25 
 xen/arch/x86/smpboot.c|  1 -
 xen/include/asm-x86/smp.h | 14 ++
 7 files changed, 84 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index e98e08e9c8..0bb994f0ba 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -2236,10 +2236,11 @@ int io_apic_set_pci_routing (int ioapic, int pin, int 
irq, int edge_level, int a
 entry.vector = vector;
 
 if (cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS)) {
-cpumask_t *mask = this_cpu(scratch_cpumask);
+cpumask_t *mask = get_scratch_cpumask();
 
 cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
 SET_DEST(entry, logical, cpu_mask_to_apicid(mask));
+put_scratch_cpumask(mask);
 } else {
 printk(XENLOG_ERR "IRQ%d: no target CPU (%*pb vs %*pb)\n",
irq, CPUMASK_PR(desc->arch.cpu_mask), CPUMASK_PR(TARGET_CPUS));
@@ -2433,10 +2434,11 @@ int ioapic_guest_write(unsigned long physbase, unsigned 
int reg, u32 val)
 
 if ( cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS) )
 {
-cpumask_t *mask = this_cpu(scratch_cpumask);
+cpumask_t *mask = get_scratch_cpumask();
 
 cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
 SET_DEST(rte, logical, cpu_mask_to_apicid(mask));
+put_scratch_cpumask(mask);
 }
 else
 {
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index cc2eb8e925..19488dae21 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -196,7 +196,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
 {
 unsigned int cpu, old_vector, irq = desc->irq;
 unsigned int vector = desc->arch.vector;
-cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
+cpumask_t *tmp_mask = get_scratch_cpumask();
 
 BUG_ON(!valid_irq_vector(vector));
 
@@ -208,6 +208,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
 ASSERT(per_cpu(vector_irq, cpu)[vector] == irq);
 per_cpu(vector_irq, cpu)[vector] = ~irq;
 }
+put_scratch_cpumask(tmp_mask);
 
 desc->arch.vector = IRQ_VECTOR_UNASSIGNED;
 cpumask_clear(desc->arch.cpu_mask);
@@ -227,8 +228,9 @@ static void _clear_irq_vector(struct irq_desc *desc)
 
 /* If we were in motion, also clear desc->arch.old_vector */
 old_vector = desc->arch.old_vector;
-cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
 
+cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
+tmp_mask = get_scratch_cpumask();
 for_each_cpu(cpu, tmp_mask)
 {
 ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
@@ -236,6 +238,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
 per_cpu(vector_irq, cpu)[old_vector] = ~irq;
 }
 
+put_scratch_cpumask(tmp_mask);
 release_old_vec(desc);
 
 desc->arch.move_in_progress = 0;
@@ -1152,10 +1155,11 @@ static void irq_guest_eoi_timer_fn(void *data)
 break;
 
 case ACKTYPE_EOI:
-cpu_eoi_map = this_cpu(scratch_cpumask);
+cpu_eoi_map = get_scratch_cpumask();
 cpumask_copy(cpu_eoi_map, action->cpu_eoi_map);
 spin_unlock_irq(&desc->lock);
 on_selected_cpus(cpu_eoi_map, set_eoi_ready, desc, 0);
+put_scratch_cpumask(cpu_eoi_map);
 return;
 }
 
@@ -2531,12 +2535,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
 unsigned int irq;
 static int warned;
 struct irq_desc *desc;
+cpumask_t *affinity = get_scratch_cpumask();
 
 for ( irq = 0; irq < nr_irqs; irq++ )
 {
 bool break_affinity = fa

[Xen-devel] [PATCH v4 0/2] x86: scratch cpumask fixes/improvement

2020-02-28 Thread Roger Pau Monne
Hello,

The following series contains one more bugfix that removes the usage of
the scratch cpumask in send_IPI_mask, and introduces accessors to
get/put the per-CPU scratch cpumask in order to prevent such issues
from happening in the future.

Thanks, Roger.

Roger Pau Monne (2):
  x86/smp: use a dedicated CPU mask in send_IPI_mask
  x86: add accessors for scratch cpu mask

 xen/arch/x86/io_apic.c|  6 --
 xen/arch/x86/irq.c| 14 ++
 xen/arch/x86/mm.c | 40 +++
 xen/arch/x86/msi.c|  4 +++-
 xen/arch/x86/smp.c| 29 +++-
 xen/arch/x86/smpboot.c| 10 --
 xen/include/asm-x86/smp.h | 14 ++
 7 files changed, 95 insertions(+), 22 deletions(-)

-- 
2.25.0



[Xen-devel] [PATCH v4 1/2] x86/smp: use a dedicated CPU mask in send_IPI_mask

2020-02-28 Thread Roger Pau Monne
Some callers of send_IPI_mask pass the scratch cpumask as the mask
parameter of send_IPI_mask, so the scratch cpumask cannot be used by
the function. The following trace has been obtained with a debug patch
and shows one of those callers:

(XEN) scratch CPU mask already in use by 
arch/x86/mm.c#_get_page_type+0x1f9/0x1abf
(XEN) Xen BUG at smp.c:45
[...]
(XEN) Xen call trace:
(XEN)[] R scratch_cpumask+0xd3/0xf9
(XEN)[] F send_IPI_mask+0x72/0x1ca
(XEN)[] F flush_area_mask+0x10c/0x16c
(XEN)[] F arch/x86/mm.c#_get_page_type+0x3ff/0x1abf
(XEN)[] F get_page_type+0xe/0x2c
(XEN)[] F pv_set_gdt+0xa1/0x2aa
(XEN)[] F arch_set_info_guest+0x1196/0x16ba
(XEN)[] F default_initialise_vcpu+0xc7/0xd4
(XEN)[] F arch_initialise_vcpu+0x61/0xcd
(XEN)[] F do_vcpu_op+0x219/0x690
(XEN)[] F pv_hypercall+0x2f6/0x593
(XEN)[] F lstar_enter+0x112/0x120

_get_page_type will use the scratch cpumask to call flush_tlb_mask,
which in turn calls send_IPI_mask.

Fix this by using a dedicated per CPU cpumask in send_IPI_mask.
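
In outline, the hazard being fixed looks like this (caller taken from
the trace above, details elided):

    cpumask_t *m = this_cpu(scratch_cpumask);   /* _get_page_type's use */

    cpumask_and(m, mask, &cpu_online_map);
    flush_tlb_mask(m);   /* ends up in send_IPI_mask(m, ...), which until
                          * now also used the same per-CPU scratch mask */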

Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when 
possible')
Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/smp.c | 4 +++-
 xen/arch/x86/smpboot.c | 9 -
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 0461812cf6..072638f0f6 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -59,6 +59,8 @@ static void send_IPI_shortcut(unsigned int shortcut, int 
vector,
 apic_write(APIC_ICR, cfg);
 }
 
+DECLARE_PER_CPU(cpumask_var_t, send_ipi_cpumask);
+
 /*
  * send_IPI_mask(cpumask, vector): sends @vector IPI to CPUs in @cpumask,
  * excluding the local CPU. @cpumask may be empty.
@@ -67,7 +69,7 @@ static void send_IPI_shortcut(unsigned int shortcut, int 
vector,
 void send_IPI_mask(const cpumask_t *mask, int vector)
 {
 bool cpus_locked = false;
-cpumask_t *scratch = this_cpu(scratch_cpumask);
+cpumask_t *scratch = this_cpu(send_ipi_cpumask);
 
 if ( in_irq() || in_mce_handler() || in_nmi_handler() )
 {
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index ad49f2dcd7..6c548b0b53 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -57,6 +57,9 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
 static cpumask_t scratch_cpu0mask;
 
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, send_ipi_cpumask);
+static cpumask_t send_ipi_cpu0mask;
+
 cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
@@ -930,6 +933,8 @@ static void cpu_smpboot_free(unsigned int cpu, bool remove)
 FREE_CPUMASK_VAR(per_cpu(cpu_core_mask, cpu));
 if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
 FREE_CPUMASK_VAR(per_cpu(scratch_cpumask, cpu));
+if ( per_cpu(send_ipi_cpumask, cpu) != &send_ipi_cpu0mask )
+FREE_CPUMASK_VAR(per_cpu(send_ipi_cpumask, cpu));
 }
 
 cleanup_cpu_root_pgt(cpu);
@@ -1034,7 +1039,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
 
 if ( !(cond_zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
cond_zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
-   cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu))) )
+   cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu)) &&
+   cond_alloc_cpumask_var(&per_cpu(send_ipi_cpumask, cpu))) )
 goto out;
 
 rc = 0;
@@ -1175,6 +1181,7 @@ void __init smp_prepare_boot_cpu(void)
 cpumask_set_cpu(cpu, &cpu_present_map);
 #if NR_CPUS > 2 * BITS_PER_LONG
 per_cpu(scratch_cpumask, cpu) = &scratch_cpu0mask;
+per_cpu(send_ipi_cpumask, cpu) = &send_ipi_cpu0mask;
 #endif
 
 get_cpu_info()->use_pv_cr3 = false;
-- 
2.25.0



Re: [Xen-devel] [PATCH v2] xen: make sure stop_machine_run() is always called in a tasklet

2020-02-28 Thread Jan Beulich
On 28.02.2020 10:18, Jürgen Groß wrote:
> On 28.02.20 10:15, Jan Beulich wrote:
>> On 28.02.2020 09:58, Jürgen Groß wrote:
>>> On 28.02.20 09:27, Jan Beulich wrote:
 On 28.02.2020 08:19, Juergen Gross wrote:
> With core scheduling active it is mandatory for stop_machine_run() to
> be called in idle context only (so either during boot or in a tasklet),
> as otherwise a scheduling deadlock would occur: stop_machine_run()
> does a cpu rendezvous by activating a tasklet on all other cpus. In
> case stop_machine_run() was not called in an idle vcpu it would block
> scheduling the idle vcpu on its siblings with core scheduling being
> active, resulting in a hang.
>
> Put a BUG_ON() into stop_machine_run() to test for being called in an
> idle vcpu only and adapt the missing call site (ucode loading) to use a
> tasklet for calling stop_machine_run().
>
> Signed-off-by: Juergen Gross 
> ---
> V2:
> - rephrase commit message (Julien Grall)
> ---
>xen/arch/x86/microcode.c  | 54 
> +--
>xen/common/stop_machine.c |  1 +
>2 files changed, 35 insertions(+), 20 deletions(-)

 There's no mention anywhere of a connection to your RCU series,
 nor to rcu_barrier(). Yet the change puts a new restriction also
 on its use, and iirc this was mentioned in prior discussion. Did
 I miss anything?
>>>
>>> Basically this patch makes the restriction explicit. Without the patch
>>> stop_machine_run() being called outside of a tasklet would just hang
>>> with core scheduling being active. Now it will catch those violations
>>> early even with core scheduling inactive.
>>>
>>> Note that currently there are no violations of this restriction anywhere
>>> in the hypervisor other than the one addressed by this patch.
>>
>> Well, there is a connection to core scheduling. Without it, i.e.
>> prior to 4.13, there was no restriction on the placement of
>> rcu_barrier() invocations. According to what you say above, the
>> restriction was implicitly introduced with core scheduling. It
>> should imo be made explicit by attaching a comment, which would
>> (again imo) best be done here because now you also make this
>> case crash without core scheduling in use.
> 
> Okay, I'll add a comment to stop_machine_run() and to rcu_barrier(). The
> rcu_barrier() comment will then be removed again by my rcu series.

Right - the alternative then would be to call out a dependency of
this patch on that series.

Jan
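
For reference, a minimal sketch of the guard under discussion (whether
the patch uses exactly this predicate is an assumption here):

    int stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu)
    {
        /* Tasklets are run by the idle vcpu, so this catches any call
         * made outside of boot or tasklet context. */
        BUG_ON(!is_idle_vcpu(current));

        /* ... existing rendezvous logic ... */
    }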

