Re: [Xen-devel] [PATCH v3] xen: make sure stop_machine_run() is always called in a tasklet
On 28.02.20 20:06, Andrew Cooper wrote:
> On 28/02/2020 17:13, Juergen Gross wrote:
>> @@ -700,6 +688,32 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>>      return ret;
>>  }
>>
>> +int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>> +{
>> +    int ret;
>> +    struct ucode_buf *buffer;
>> +
>> +    if ( len != (uint32_t)len )
>> +        return -E2BIG;
>> +
>> +    if ( microcode_ops == NULL )
>> +        return -EINVAL;
>> +
>> +    buffer = xmalloc_flex_struct(struct ucode_buf, buffer, len);
>> +    if ( !buffer )
>> +        return -ENOMEM;
>> +
>> +    ret = copy_from_guest(buffer->buffer, buf, len);
>> +    if ( ret )
>> +    {
>> +        xfree(buffer);
>> +        return -EFAULT;
>> +    }
>> +    buffer->len = len;
>> +
>> +    return continue_hypercall_on_cpu(0, microcode_update_helper, buffer);
>
> Any reason why cpu 0 here? There is no restriction at the moment, and
> running the tasklet on the current CPU is surely better than poking
> CPU0's tasklet queue remotely, then interrupting it.

As stop_machine_run() is scheduling a tasklet on all other cpus it
doesn't really matter. In the end I don't really mind either way.

> Everything else looks ok. These adjustments could be done on commit to
> save a v4.
>
> ~Andrew
>
> P.S. Might it be sensible to have a continue_hypercall_in_tasklet()
> wrapper which passes smp_processor_id() into continue_hypercall_on_cpu()?

When a second user is coming up, maybe. The other would be
continue_hypercall_on_bootcpu().


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [qemu-mainline test] 147710: regressions - FAIL
flight 147710 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/147710/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-freebsd10-i386 11 guest-start fail REGR. vs. 144861
 test-amd64-i386-freebsd10-amd64 11 guest-start fail REGR. vs. 144861
 test-amd64-i386-qemuu-rhel6hvm-amd 10 redhat-install fail REGR. vs. 144861
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-qemuu-nested-amd 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-qemuu-rhel6hvm-intel 10 redhat-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-win7-amd64 10 windows-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-qemuu-nested-intel 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-win7-amd64 10 windows-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ws16-amd64 10 windows-install fail REGR. vs. 144861

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10 fail like 144861
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail like 144861
 test-armhf-armhf-libvirt 14 saverestore-support-check fail like 144861
 test-amd64-i386-xl-pvshim 12 guest-start fail never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 14 saverestore-support-check fail never pass
 test-amd64-i386-libvirt 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl 13 migrate-support-check fail never pass
 test-arm64-arm64-xl 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 14 saverestore
[Xen-devel] [linux-4.14 test] 147718: regressions - FAIL
flight 147718 linux-4.14 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/147718/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-examine 8 reboot fail REGR. vs. 142849
 test-armhf-armhf-xl-credit2 7 xen-boot fail REGR. vs. 142849
 test-armhf-armhf-xl 7 xen-boot fail REGR. vs. 142849
 test-armhf-armhf-xl-vhd 7 xen-boot fail REGR. vs. 142849
 test-armhf-armhf-xl-credit1 7 xen-boot fail REGR. vs. 142849
 test-armhf-armhf-libvirt-raw 7 xen-boot fail REGR. vs. 142849
 test-armhf-armhf-xl-multivcpu 7 xen-boot fail REGR. vs. 142849
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 142849
 test-armhf-armhf-xl-arndale 7 xen-boot fail REGR. vs. 142849
 test-armhf-armhf-libvirt 7 xen-boot fail REGR. vs. 142849

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-amd64 15 guest-saverestore.2 fail in 147334 pass in 147718
 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail pass in 147334
 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail pass in 147487

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10 fail like 142849
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail like 142849
 test-amd64-i386-xl-pvshim 12 guest-start fail never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 14 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass
 test-arm64-arm64-xl 13 migrate-support-check fail never pass
 test-arm64-arm64-xl 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 14 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass

version targeted for testing:
 linux 98db2bf27b9ed2d5ed0b6c9c8a4bfcb127a19796
baseline version:
 linux b98aebd298246df37b472c52a2ee1023256d02e3

Last test of basis  142849  2019-10-17 21:11:16 Z  134 days
Failing since       143327  2019-10-29 08:49:30 Z  122 days  19 attempts
Testing same since  147094  2020-02-15 17:36:50 Z   13 days   9 attempts

1438 people touched revisions under test, not listing them all

jobs:
 build-amd64-xsm pass
 build-arm64-xsm
[Xen-devel] CfP for RT-Cloud (Academic) Research Conference
Hello everyone,

Please find here the CfP for the first edition of the RT-Cloud workshop:
https://www.ecrts.org/rt-cloud-2020/

It's an academic research event, affiliated with one of the most
important academic conferences on real-time systems and real-time
scheduling (ECRTS, https://www.ecrts.org/).

Here are the first paragraphs of the CfP:

"While most of the current cloud services operate on a best effort
basis and provide no timing guarantees, there is a growing interest in
cloud applications for industrial and critical applications. However,
for cloud technologies to be fully embraced by industry, a higher
degree of determinism is desired. This calls for new techniques and
methodologies to design predictable and reliable cloud applications."

The deadline for writing and submitting a 4-to-6-page academic
research-style paper is April 16, 2020. This being a workshop, it would
generally be fine for the paper to be about ideas, proposals and/or
unfinished work.

If you are interested but need more information, don't hesitate to ask me.

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
[Xen-devel] [linux-linus test] 147706: regressions - FAIL
flight 147706 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/147706/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-shadow 12 guest-start fail REGR. vs. 133580
 test-amd64-amd64-xl-shadow 12 guest-start fail REGR. vs. 133580
 test-amd64-amd64-xl-credit2 12 guest-start fail REGR. vs. 133580
 test-amd64-amd64-xl-multivcpu 12 guest-start fail REGR. vs. 133580
 test-amd64-amd64-xl-credit1 12 guest-start fail REGR. vs. 133580
 test-arm64-arm64-xl-credit1 12 guest-start fail REGR. vs. 133580
 test-arm64-arm64-xl-credit2 12 guest-start fail REGR. vs. 133580
 test-armhf-armhf-xl-multivcpu 12 guest-start fail REGR. vs. 133580
 test-amd64-i386-qemut-rhel6hvm-amd 12 guest-start/redhat.repeat fail REGR. vs. 133580
 test-armhf-armhf-xl-credit2 12 guest-start fail REGR. vs. 133580
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 133580
 test-armhf-armhf-xl-credit1 12 guest-start fail REGR. vs. 133580
 test-arm64-arm64-xl-xsm 16 guest-start/debian.repeat fail REGR. vs. 133580
 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 133580
 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 133580

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 12 guest-start fail REGR. vs. 133580
 test-armhf-armhf-xl-rtds 12 guest-start fail REGR. vs. 133580

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-seattle 16 guest-start/debian.repeat fail baseline untested
 test-arm64-arm64-xl-thunderx 16 guest-start/debian.repeat fail baseline untested
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 133580
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 133580
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail like 133580
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 133580
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail like 133580
 test-armhf-armhf-libvirt 14 saverestore-support-check fail like 133580
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 133580
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail like 133580
 test-amd64-i386-xl-pvshim 12 guest-start fail never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 14 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass
 test-arm64-arm64-xl 13 migrate-support-check fail never pass
 test-arm64-arm64-xl 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 13 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail never pass
 test-armhf-armhf-xl 13 migrate-support-check fail never pass
 test-armhf-armhf-xl 14 saverestore-support-check fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never p
[Xen-devel] PVH dom0 construction timeout
It turns out that PVH dom0 construction doesn't work so well on a
2-socket Rome system...

(XEN) NX (Execute Disable) protection active
(XEN) *** Building a PVH Dom0 ***
(XEN) Watchdog timer detects that CPU0 is stuck!
(XEN) [ Xen-4.14-unstable x86_64 debug=y Not tainted ]
(XEN) CPU: 0
(XEN) RIP: e008:[] page_get_ram_type+0x58/0xb6
(XEN) RFLAGS: 0206 CONTEXT: hypervisor
(XEN) rax: 82d080948fe0 rbx: 02b73db9 rcx:
(XEN) rdx: 0400 rsi: 0400 rdi: 002b73db9000
(XEN) rbp: 82d080827be0 rsp: 82d080827ba0 r8: 82d080948fcc
(XEN) r9: 002b73dba000 r10: 82d0809491fc r11: 8000
(XEN) r12: 02b73db9 r13: 8320341bc000 r14: 0404fc00
(XEN) r15: 82d08046f209 cr0: 8005003b cr4: 001506e0
(XEN) cr3: a0414000 cr2:
(XEN) fsb: gsb: gss:
(XEN) ds: es: fs: gs: ss: cs: e008
(XEN) Xen code around (page_get_ram_type+0x58/0xb6):
(XEN) 4c 39 d0 74 4d 49 39 d1 <76> 0b 89 ca 83 ca 10 48 39 38 0f 47 ca 49 89 c0
(XEN) Xen stack trace from rsp=82d080827ba0:
(XEN) 82d08061ee91 82d080827bb4 000b2403 82d080804340
(XEN) 8320341bc000 82d080804340 8303df90 8320341bc000
(XEN) 82d080827c08 82d08061c38c 8320341bc000 82d080827ca8
(XEN) 82d080648750 82d080827c20 82d08061852c 0020
(XEN) 82d080827d60 82d080638abe 82d080232854 82d080930c60
(XEN) 82d080930280 82d080674800 8303df90 01a4
(XEN) 8303df80 82d080827c80 0206 8320341bc000
(XEN) 82d080827cb8 82d080827ca8 82d080232854 82d080961780
(XEN) 82d080930280 82d080827c00 0002 82d08022f9a0
(XEN) 010a4bb0 82d080827ce0 0206 0381b66d
(XEN) 82d080827d00 82d0802b1e87 82d080936900 82d080936900
(XEN) 82d080827d18 82d0802b30d0 82d080936900 82d080827d50
(XEN) 82d08022ef5e 8320341bc000 8303df80 8320341bc000
(XEN) 8303df80 01a4 8303df90 82d080674800
(XEN) 82d080827d98 82d08063cd06 0001 82d080674800
(XEN) 82d080931050 0100 82d080950c80 82d080827ee8
(XEN) 82d08062eae7 01a40fff 00082d080e00
(XEN) 0005 0004 0004
(XEN) 0003 0003 0002 0002
(XEN) 0205 82d080674c20 82d080674ea0
(XEN) Xen call trace:
(XEN) [] R page_get_ram_type+0x58/0xb6
(XEN) [] S arch_iommu_hwdom_init+0x239/0x2b7
(XEN) [] F drivers/passthrough/amd/pci_amd_iommu.c#amd_iommu_hwdom_init+0x85/0x9f
(XEN) [] F iommu_hwdom_init+0x44/0x4b
(XEN) [] F dom0_construct_pvh+0x160/0x1233
(XEN) [] F construct_dom0+0x5c/0x280e
(XEN) [] F __start_xen+0x25db/0x2860
(XEN) [] F __high_start+0x4c/0x4e
(XEN)
(XEN) CPU1 @ e008:82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
(XEN) CPU31 @ e008:82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
(XEN) CPU30 @ e008:82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
(XEN) CPU27 @ e008:82d08022ad5a (scrub_one_page+0x6d/0x7b)
(XEN) CPU26 @ e008:82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
(XEN) CPU244 @ e008:82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
(XEN) CPU245 @ e008:82d08022ad5a (scrub_one_page+0x6d/0x7b)
(XEN) CPU247 @ e008:82d080256e3f (drivers/char/ns16550.c#ns_read_reg+0x2d/0x35)
(XEN) CPU246 @ e008:82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)

This stack trace is the same on several boots, and in particular,
page_get_ram_type() is the %rip which took the timeout. An equivalent
PV dom0 build takes perceptibly zero time, based on how quickly the
next line is printed.

I haven't diagnosed the exact issue, but some observations:

The arch_iommu_hwdom_init() loop's positioning of
process_pending_softirqs() looks problematic, because it is
conditionally short-circuited by hwdom_iommu_map().

page_get_ram_type() is definitely suboptimal here. We have a linear
search over a (large-ish) sorted list, and a caller which passes every
MFN in the system into it, which makes the total runtime of
arch_iommu_hwdom_init() quadratic with the size of the system.

~Andrew
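Since the range list is already sorted, the linear scan can in principle be replaced with a binary search, turning the whole-system walk from O(N*n) into O(N*log n). Below is a minimal, self-contained sketch of the idea; the `struct range` layout and function names are illustrative stand-ins, not Xen's actual e820/page_get_ram_type() types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified model of a sorted, non-overlapping range table,
 * in the spirit of the e820-derived list described above. */
struct range {
    unsigned long start, end;   /* [start, end), in page frames */
    int type;
};

/* Linear scan: O(n) per query - the behaviour described for
 * page_get_ram_type() above. */
static int type_linear(const struct range *r, size_t n, unsigned long pfn)
{
    for ( size_t i = 0; i < n; i++ )
        if ( pfn >= r[i].start && pfn < r[i].end )
            return r[i].type;
    return 0;                   /* not covered by any range */
}

/* Binary search over the sorted table: O(log n) per query, so a caller
 * iterating over every MFN in the system is no longer quadratic. */
static int type_bsearch(const struct range *r, size_t n, unsigned long pfn)
{
    size_t lo = 0, hi = n;

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;

        if ( pfn < r[mid].start )
            hi = mid;
        else if ( pfn >= r[mid].end )
            lo = mid + 1;
        else
            return r[mid].type;
    }
    return 0;
}
```

Both functions return the same results for any query, so the binary-search version is a drop-in replacement as long as the table really is sorted and non-overlapping.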
Re: [Xen-devel] [PATCH 5/5] IOMMU: iommu_snoop is x86/HVM-only
On 28/02/2020 12:27, Jan Beulich wrote:
> In fact it's VT-d specific, but we don't have a way yet to build code
> for just one vendor. Provide a #define for all other cases.
>
> Signed-off-by: Jan Beulich

iommu_snoop has no specific interaction with HVM. It is for any
cacheability games the hypervisor may play on a VM, and that in
principle includes PV guests as well.

~Andrew
Re: [Xen-devel] [PATCH 1/5] IOMMU: iommu_intremap is x86-only
On 28/02/2020 12:26, Jan Beulich wrote:
> Provide a #define for other cases; it didn't seem worthwhile to me to
> introduce an IOMMU_INTREMAP Kconfig option at this point.
>
> Signed-off-by: Jan Beulich
>
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1299,6 +1299,8 @@ boolean (e.g. `iommu=no`) can override t
>  generation of IOMMUs only supported DMA remapping, and Interrupt Remapping
>  appeared in the second generation.
>
> +This option is not valid on Arm.

The longevity of this comment would be greater if it were phrased as "is
only valid on x86", especially given the RFC RISC-V series on list.

> +
>  * The `intpost` boolean controls the Posted Interrupt sub-feature. In
>    combination with APIC acceleration (VT-x APICV, SVM AVIC), the IOMMU can
>    be configured to deliver interrupts from assigned PCI devices directly
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -35,7 +35,6 @@ bool __read_mostly iommu_quarantine = tr
>  bool_t __read_mostly iommu_igfx = 1;
>  bool_t __read_mostly iommu_snoop = 1;
>  bool_t __read_mostly iommu_qinval = 1;
> -enum iommu_intremap __read_mostly iommu_intremap = iommu_intremap_full;
>  bool_t __read_mostly iommu_crash_disable;
>
>  static bool __hwdom_initdata iommu_hwdom_none;
> @@ -90,8 +89,10 @@ static int __init parse_iommu_param(cons
>          iommu_snoop = val;
>      else if ( (val = parse_boolean("qinval", s, ss)) >= 0 )
>          iommu_qinval = val;
> +#ifndef iommu_intremap
>      else if ( (val = parse_boolean("intremap", s, ss)) >= 0 )
>          iommu_intremap = val ? iommu_intremap_full : iommu_intremap_off;
> +#endif

The use of #ifndef in particular makes the result very weird to read.

There appear to be no uses of iommu_intremap outside of x86 code, other
than in this setup, so having it false in the !CONFIG_X86 case isn't
helpful. How about just guarding uses of the variable with
IS_ENABLED(CONFIG_X86) and a common extern? We use this DCE trick
already to reduce the ifdefary in the code. The result would certainly
be easier to follow.

~Andrew
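The IS_ENABLED()-plus-dead-code-elimination pattern suggested above can be sketched in a standalone form. This is a loose model, not Xen's real Kconfig machinery: here IS_ENABLED() and CONFIG_X86 are hard-coded stand-ins, and the enum/variable names merely echo the patch under review.

```c
#include <assert.h>

/* Stand-in for Kconfig: in a real build CONFIG_X86 comes from the build
 * system, and IS_ENABLED() expands via token-pasting tricks.  Here we
 * hard-code an "x86 build" to keep the sketch self-contained. */
#define CONFIG_X86 1
#define IS_ENABLED(cfg) (cfg)

enum intremap_val { intremap_off, intremap_full };

/* In the real code this would be a common extern declaration; on non-x86
 * every use is guarded by IS_ENABLED(CONFIG_X86), so the constant-false
 * branch is eliminated by the optimiser and no definition is required
 * there - yet the code still type-checks on all architectures, with no
 * #ifdef/#ifndef in the parser. */
static enum intremap_val intremap_setting = intremap_full;

static int parse_intremap(int val)
{
    if ( IS_ENABLED(CONFIG_X86) )
    {
        intremap_setting = val ? intremap_full : intremap_off;
        return 0;
    }

    return -1;  /* option not valid on this architecture */
}
```

With CONFIG_X86 set to 0 instead, `parse_intremap()` compiles to a bare `return -1` and the variable reference disappears, which is the point of the trick.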
[Xen-devel] [linux-4.19 test] 147688: regressions - FAIL
flight 147688 linux-4.19 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/147688/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 142932
 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 142932
 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 142932
 build-i386-pvops 6 kernel-build fail in 147609 REGR. vs. 142932

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-multivcpu 7 xen-boot fail pass in 147609

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-pair 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-freebsd10-amd64 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-qemuu-win7-amd64 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-qemut-win7-amd64 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-raw 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-qemut-ws16-amd64 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-qemuu-rhel6hvm-amd 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-qemut-rhel6hvm-intel 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-freebsd10-i386 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-qemuu-rhel6hvm-intel 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-libvirt 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-xsm 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-shadow 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-qemut-rhel6hvm-amd 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-pvshim 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-libvirt-pair 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-examine 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-qemut-debianhvm-amd64 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-qemuu-ws16-amd64 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-libvirt-xsm 1 build-check(1) blocked in 147609 n/a
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 1 build-check(1) blocked in 147609 n/a
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail in 147609 never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail in 147609 never pass
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail like 142880
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail like 142932
 test-amd64-i386-xl-pvshim 12 guest-start fail never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 14 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-cre
Re: [Xen-devel] [PATCH 5/6] mm: add 'is_special_page' macro...
> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> index 3835bc928f..c14a724c6d 100644
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -842,7 +842,7 @@ static int nominate_page(struct domain *d, gfn_t gfn,
>
>      /* Skip xen heap pages */

Perhaps adjust (or remove) the comment?

>      page = mfn_to_page(mfn);
> -    if ( !page || is_xen_heap_page(page) )
> +    if ( !page || is_special_page(page) )
>          goto out;

Thanks,
Tamas
Re: [Xen-devel] [PATCH] xen/grant-table: Remove 'led' variable in map_grant_ref
On 28/02/2020 18:57, Julien Grall wrote:
> From: Julien Grall
>
> The name of the variable 'led' is confusing and only used in one place
> a line after. So remove it.
>
> Signed-off-by: Julien Grall

I agree. Acked-by: Andrew Cooper
Re: [Xen-devel] [PATCH v3] xen: make sure stop_machine_run() is always called in a tasklet
On 28/02/2020 17:13, Juergen Gross wrote:
> @@ -700,6 +688,32 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>      return ret;
>  }
>
> +int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
> +{
> +    int ret;
> +    struct ucode_buf *buffer;
> +
> +    if ( len != (uint32_t)len )
> +        return -E2BIG;
> +
> +    if ( microcode_ops == NULL )
> +        return -EINVAL;
> +
> +    buffer = xmalloc_flex_struct(struct ucode_buf, buffer, len);
> +    if ( !buffer )
> +        return -ENOMEM;
> +
> +    ret = copy_from_guest(buffer->buffer, buf, len);
> +    if ( ret )
> +    {
> +        xfree(buffer);
> +        return -EFAULT;
> +    }
> +    buffer->len = len;
> +
> +    return continue_hypercall_on_cpu(0, microcode_update_helper, buffer);

Any reason why cpu 0 here? There is no restriction at the moment, and
running the tasklet on the current CPU is surely better than poking
CPU0's tasklet queue remotely, then interrupting it.

Everything else looks ok. These adjustments could be done on commit to
save a v4.

~Andrew

P.S. Might it be sensible to have a continue_hypercall_in_tasklet()
wrapper which passes smp_processor_id() into continue_hypercall_on_cpu()?
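The `len != (uint32_t)len` guard in the quoted patch is a compact overflow check: casting to uint32_t discards the upper bits, so the comparison fails exactly when the length does not fit in 32 bits, with no second constant needed. A standalone sketch of the idiom (the error value is a placeholder, not Xen's -E2BIG):

```c
#include <stdint.h>

/* Reject lengths which do not survive truncation to 32 bits.  On an
 * LP64 platform unsigned long is 64-bit, so this catches any request
 * above UINT32_MAX; on ILP32 the check is trivially true and the
 * compiler can drop it. */
static int check_len(unsigned long len)
{
    if ( len != (uint32_t)len )
        return -1;      /* stands in for -E2BIG */

    return 0;
}
```

The same shape works for any narrowing check, e.g. `x != (uint16_t)x`, which is why it is preferred over an explicit `len > UINT32_MAX` comparison that would warn as always-false on 32-bit builds.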
Re: [Xen-devel] [PATCH] xen/grant-table: Remove 'led' variable in map_grant_ref
Please ignore this version, as I forgot to CC the maintainers on it.

Cheers,

On 28/02/2020 18:57, Julien Grall wrote:
> From: Julien Grall
>
> The name of the variable 'led' is confusing and only used in one place
> a line after. So remove it.
>
> Signed-off-by: Julien Grall
> ---
>  xen/common/grant_table.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index 057c78f620..9fd6e60416 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -944,7 +944,6 @@ map_grant_ref(
>      struct domain *ld, *rd, *owner = NULL;
>      struct grant_table *lgt, *rgt;
>      grant_ref_t ref;
> -    struct vcpu *led;
>      grant_handle_t handle;
>      mfn_t mfn;
>      struct page_info *pg = NULL;
> @@ -957,8 +956,7 @@ map_grant_ref(
>      uint16_t *status;
>      bool_t need_iommu;
>
> -    led = current;
> -    ld = led->domain;
> +    ld = current->domain;
>
>      if ( unlikely((op->flags & (GNTMAP_device_map|GNTMAP_host_map)) == 0) )
>      {

--
Julien Grall
[Xen-devel] [PATCH] xen/grant-table: Remove 'led' variable in map_grant_ref
From: Julien Grall

The name of the variable 'led' is confusing and only used in one place
a line after. So remove it.

Signed-off-by: Julien Grall
---
 xen/common/grant_table.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 057c78f620..9fd6e60416 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -944,7 +944,6 @@ map_grant_ref(
     struct domain *ld, *rd, *owner = NULL;
     struct grant_table *lgt, *rgt;
     grant_ref_t ref;
-    struct vcpu *led;
     grant_handle_t handle;
     mfn_t mfn;
     struct page_info *pg = NULL;
@@ -957,8 +956,7 @@ map_grant_ref(
     uint16_t *status;
     bool_t need_iommu;
 
-    led = current;
-    ld = led->domain;
+    ld = current->domain;
 
     if ( unlikely((op->flags & (GNTMAP_device_map|GNTMAP_host_map)) == 0) )
     {
-- 
2.17.1
Re: [Xen-devel] [PATCH v2 10/17] tools/libxl: Plumb a restore boolean down into libxl__build_pre()
On 24/02/2020 17:39, Ian Jackson wrote: > Andrew Cooper writes ("[PATCH v2 10/17] tools/libxl: Plumb a restore boolean > down into libxl__build_pre()"): >> To fix CPUID handling, libxl__build_pre() is going to have to distinguish >> between a brand new VM vs one which is being migrated-in/resumed. > The distinction between libxl__build_pre and these other functions is > not particularly principled. It is a legacy from an old API (prior to > the existence of *create) where libxl callers were expected to "build" > a domain first and then do other things to it. > > Maybe it would be better to pass this via libxl__domain_build_state > rather than as an additional parameter ? Well - I tried a similar approach the first time around, and it broke stubdoms so badly it needed reverting. (Untrim the commit details) > v2: > * New. This is c/s aacc1430064 "tools/libxl: Plumb domain_create_state down >into libxl__build_pre()" take-2, without any collateral damage to stubdoms. The actual information we want is in libxl__domain_create_state (specifically, restore_fd >= -1). I first tried plumbing dcs down, to avoid stashing the same information in two different structures at different times. Sadly, plumbing dcs didn't work because it is common between the real domain and the stubdom (and this led to the stubdom getting no settings at all). What we want to do is only influence the CPUID construction of the main domain (which may be migrating in), whereas the stubdom always wants fresh settings. I could duplicate it into dbs, and at a guess that would probably work, but isn't it taking a bad problem and making it worse? ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v11 0/3] VM forking
The following series implements VM forking for Intel HVM guests to allow for the fast creation of identical VMs without the associated high startup costs of booting or restoring the VM from a savefile. JIRA issue: https://xenproject.atlassian.net/browse/XEN-89 The fork operation is implemented as part of the "xl fork-vm" command: xl fork-vm -C <config> -Q <qemu-save-file> -m <max-vcpus> <parent-domid> By default a fully functional fork is created. However, the user is in charge of creating the appropriate config file for the fork and of generating the QEMU save file before the fork-vm call is made. The config file needs to give the fork a new name at minimum, but other settings may also require changes. Certain settings in the config files of both the parent and the fork have to be left at their defaults. Details are documented. The interface also allows the forking to be split into two steps: xl fork-vm --launch-dm no \ -m <max-vcpus> \ -p <parent-domid> xl fork-vm --launch-dm late \ -C <config> \ -Q <qemu-save-file> \ <fork-domid> The split creation model is useful when the VM needs to be created as fast as possible. The forked VM can be unpaused without the device model being launched, to be monitored and accessed via VMI. Note however that without its device model running (depending on what is executing in the VM) it is bound to misbehave or even crash when it tries to access devices that would be emulated by QEMU. We anticipate that for certain use-cases this would be an acceptable situation, for example when fuzzing is performed on code segments that don't access such devices. Launching the device model requires the QEMU Xen savefile to be generated manually from the parent VM. This can be accomplished simply by connecting to its QMP socket and issuing the "xen-save-devices-state" command. 
For example, using the standard tool socat, these commands can be used to generate the file: socat - UNIX-CONNECT:/var/run/xen/qmp-libxl-<domid> { "execute": "qmp_capabilities" } { "execute": "xen-save-devices-state", \ "arguments": { "filename": "/path/to/save/qemu_state", \ "live": false} } At runtime the forked VM starts running with an empty p2m which gets lazily populated when the VM generates EPT faults, similar to how altp2m views are populated. If the memory access is a read-only access, the p2m entry is populated with a memory-shared entry pointing to its parent. For write memory accesses, or in case memory sharing wasn't possible (for example when a reference is held by a third party), a new page is allocated and the page contents are copied over from the parent VM. Forks can be further forked if needed, thus allowing for further memory savings. A VM fork reset hypercall is also added that allows the fork to be reset to the state it was in just after the fork, also accessible via xl: xl fork-vm --fork-reset -p <fork-domid> This is an optimization for cases where the forks are very short-lived and run without a device model, so resetting saves some time compared to creating a brand new fork, provided the fork has not acquired a lot of memory. If the fork has a lot of memory deduplicated it is likely going to be faster to create a new fork from scratch and asynchronously destroy the old one. The series has been tested with Windows VMs and functions as expected. Linux VMs, when forked from a running VM, will have a frozen VNC screen. Linux VMs at this time can only be forked with a working device model when the parent VM was restored from a snapshot using "xl restore -p". This is a known limitation. Also note that PVHVM/PVH Linux guests have not been tested. 
Forking most likely works but PV devices and drivers would require additional wiring to set things up properly since the guests are unaware of the forking taking place, unlike the save/restore routine where the guest is made aware of the procedure. Forking time has been measured to be 0.0007s, device model launch to be around 1s depending largely on the number of devices being emulated. Fork resets have been measured to be 0.0001s under the optimal circumstances. New in v11: Fully copy & reset vcpu_info pages Setup vcpu_runstate for forks Added TODO note for PV timers Copy & reset shared_info page Copy & reset HVM special pages New in v10: Rebased on staging and minor fixes for things pointed out by Roger Allocate pages for vcpu_info if used by parent Document limitation of guest settings that have to be set to default Require max-vcpus to be specified by toolstack-side Code movement in toolstack & compile tested on ARM Implement hypercall continuation for reset operation Patch 1-2 implements the VM fork & reset operation hypervisor side bits Patch 3 adds the toolstack-side code implementing VM forking and reset Tamas K Lengyel (3): xen/mem_sharing: VM forking x86/mem_sharing: reset a fork xen/tools: VM forking toolstack side docs/man/xl.1.pod.in | 44 +++ tools/libxc/include/xenctrl.
[Xen-devel] [xen-unstable-smoke test] 147730: tolerable all pass - PUSHED
flight 147730 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/147730/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen 9649cef3b3a7eaca1347154ea7f274586d48bc29 baseline version: xen 90d19e6f53a47f8f7f2154c67f03adc192c0d760 Last test of basis 147711 2020-02-28 00:00:57 Z0 days Testing same since 147730 2020-02-28 16:00:31 Z0 days1 attempts People who touched revisions under test: Jan Beulich Roger Pau Monné jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 90d19e6f53..9649cef3b3 9649cef3b3a7eaca1347154ea7f274586d48bc29 -> smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v11 1/3] xen/mem_sharing: VM forking
VM forking is the process of creating a domain with an empty memory space and a parent domain specified from which to populate the memory when necessary. For the new domain to be functional the VM state is copied over as part of the fork operation (HVM params, hap allocation, etc). Signed-off-by: Tamas K Lengyel --- v11: Fully copy vcpu_info pages Setup vcpu_runstate for forks Added TODO note for PV timers Copy shared_info page Add copy_settings function, to be shared with fork_reset in the next patch --- xen/arch/x86/domain.c | 11 + xen/arch/x86/hvm/hvm.c| 4 +- xen/arch/x86/mm/hap/hap.c | 3 +- xen/arch/x86/mm/mem_sharing.c | 368 ++ xen/arch/x86/mm/p2m.c | 9 +- xen/common/domain.c | 3 + xen/include/asm-x86/hap.h | 1 + xen/include/asm-x86/hvm/hvm.h | 2 + xen/include/asm-x86/mem_sharing.h | 17 ++ xen/include/public/memory.h | 5 + xen/include/xen/sched.h | 5 + 11 files changed, 423 insertions(+), 5 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index fe63c23676..1ab0ca0942 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -2203,6 +2203,17 @@ int domain_relinquish_resources(struct domain *d) ret = relinquish_shared_pages(d); if ( ret ) return ret; + +/* + * If the domain is forked, decrement the parent's pause count + * and release the domain. + */ +if ( d->parent ) +{ +domain_unpause(d->parent); +put_domain(d->parent); +d->parent = NULL; +} } #endif diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index db5d7b4d30..c551cc31f3 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -1916,7 +1916,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla, } #endif -/* Spurious fault? PoD and log-dirty also take this path. */ +/* Spurious fault? PoD, log-dirty and VM forking also take this path. 
*/ if ( p2m_is_ram(p2mt) ) { rc = 1; @@ -4430,7 +4430,7 @@ static int hvm_allow_get_param(struct domain *d, return rc; } -static int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value) +int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value) { int rc; diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c index 3d93f3451c..c7c7ff6e99 100644 --- a/xen/arch/x86/mm/hap/hap.c +++ b/xen/arch/x86/mm/hap/hap.c @@ -321,8 +321,7 @@ static void hap_free_p2m_page(struct domain *d, struct page_info *pg) } /* Return the size of the pool, rounded up to the nearest MB */ -static unsigned int -hap_get_allocation(struct domain *d) +unsigned int hap_get_allocation(struct domain *d) { unsigned int pg = d->arch.paging.hap.total_pages + d->arch.paging.hap.p2m_pages; diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c index 3835bc928f..4a840d09dd 100644 --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -22,6 +22,7 @@ #include #include +#include #include #include #include @@ -36,6 +37,8 @@ #include #include #include +#include +#include #include #include "mm-locks.h" @@ -1444,6 +1447,334 @@ static inline int mem_sharing_control(struct domain *d, bool enable) return 0; } +/* + * Forking a page only gets called when the VM faults due to no entry being + * in the EPT for the access. Depending on the type of access we either + * populate the physmap with a shared entry for read-only access or + * fork the page if its a write access. + * + * The client p2m is already locked so we only need to lock + * the parent's here. 
+ */ +int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing) +{ +int rc = -ENOENT; +shr_handle_t handle; +struct domain *parent = d->parent; +struct p2m_domain *p2m; +unsigned long gfn_l = gfn_x(gfn); +mfn_t mfn, new_mfn; +p2m_type_t p2mt; +struct page_info *page; + +if ( !mem_sharing_is_fork(d) ) +return -ENOENT; + +if ( !unsharing ) +{ +/* For read-only accesses we just add a shared entry to the physmap */ +while ( parent ) +{ +if ( !(rc = nominate_page(parent, gfn, 0, &handle)) ) +break; + +parent = parent->parent; +} + +if ( !rc ) +{ +/* The client's p2m is already locked */ +struct p2m_domain *pp2m = p2m_get_hostp2m(parent); + +p2m_lock(pp2m); +rc = add_to_physmap(parent, gfn_l, handle, d, gfn_l, false); +p2m_unlock(pp2m); + +if ( !rc ) +return 0; +} +} + +/* + * If it's a write access (ie. unsharing) or if adding a shared entry to + * the physmap failed
[Xen-devel] [PATCH v11 2/3] x86/mem_sharing: reset a fork
Implement hypercall that allows a fork to shed all memory that got allocated for it during its execution and re-load its vCPU context from the parent VM. This allows the forked VM to reset into the same state the parent VM is in a faster way then creating a new fork would be. Measurements show about a 2x speedup during normal fuzzing operations. Performance may vary depending how much memory got allocated for the forked VM. If it has been completely deduplicated from the parent VM then creating a new fork would likely be more performant. Signed-off-by: Tamas K Lengyel --- v11: use copy_settings function to reset special pages --- xen/arch/x86/mm/mem_sharing.c | 115 ++ xen/include/public/memory.h | 4 ++ 2 files changed, 119 insertions(+) diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c index 4a840d09dd..a458096b01 100644 --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -1775,6 +1775,91 @@ static int fork(struct domain *cd, struct domain *d) return rc; } +/* + * The fork reset operation is intended to be used on short-lived forks only. 
+ */ +static int fork_reset(struct domain *d, struct domain *cd, + struct mem_sharing_op_fork_reset *fr) +{ +int rc = 0; +struct p2m_domain* p2m = p2m_get_hostp2m(cd); +struct page_info *page, *tmp; +unsigned long list_position = 0, preempt_count = 0, restart = fr->opaque; + +domain_pause(cd); + +page_list_for_each_safe(page, tmp, &cd->page_list) +{ +p2m_type_t p2mt; +p2m_access_t p2ma; +gfn_t gfn; +mfn_t mfn; +bool shared = false; + +list_position++; + +/* Resume were we left of before preemption */ +if ( restart && list_position < restart ) +continue; + +mfn = page_to_mfn(page); +if ( mfn_valid(mfn) ) +{ + +gfn = mfn_to_gfn(cd, mfn); +mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma, +0, NULL, false); + +if ( p2m_is_ram(p2mt) && !p2m_is_shared(p2mt) ) +{ +/* take an extra reference, must work for a shared page */ +if( !get_page(page, cd) ) +{ +ASSERT_UNREACHABLE(); +return -EINVAL; +} + +shared = true; +preempt_count += 0x10; + +/* + * Must succeed, it's a shared page that exists and + * thus its size is guaranteed to be 4k so we are not splitting + * large pages. + */ +rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K, +p2m_invalid, p2m_access_rwx, -1); +ASSERT(!rc); + +put_page_alloc_ref(page); +put_page(page); +} +} + +if ( !shared ) +preempt_count++; + +/* Preempt every 2MiB (shared) or 32MiB (unshared) - arbitrary. 
*/ +if ( preempt_count >= 0x2000 ) +{ +if ( hypercall_preempt_check() ) +{ +rc = -ERESTART; +break; +} +preempt_count = 0; +} +} + +if ( rc ) +fr->opaque = list_position; +else +rc = copy_settings(cd, d); + +domain_unpause(cd); +return rc; +} + int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg) { int rc; @@ -2066,6 +2151,36 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg) break; } +case XENMEM_sharing_op_fork_reset: +{ +struct domain *pd; + +rc = -ENOSYS; +if ( !mem_sharing_is_fork(d) ) +goto out; + +rc = rcu_lock_live_remote_domain_by_id(d->parent->domain_id, &pd); +if ( rc ) +goto out; + +rc = fork_reset(pd, d, &mso.u.fork_reset); + +rcu_unlock_domain(pd); + +if ( rc > 0 ) +{ +if ( __copy_to_guest(arg, &mso, 1) ) +rc = -EFAULT; +else +rc = hypercall_create_continuation(__HYPERVISOR_memory_op, + "lh", XENMEM_sharing_op, + arg); +} +else +mso.u.fork_reset.opaque = 0; +break; +} + default: rc = -ENOSYS; break; diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h index c1dbad060e..7ca07c01dd 100644 --- a/xen/include/public/memory.h +++ b/xen/include/public/memory.h @@ -483,6 +483,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t); #define XENMEM_sharing_op_audit 7 #define XENMEM_sharing_op_range_share 8 #define XENMEM_sharing_op_fork 9 +#define XENMEM_sharing_op_fork_reset10
[Xen-devel] [PATCH v11 3/3] xen/tools: VM forking toolstack side
Add necessary bits to implement "xl fork-vm" commands. The command allows the user to specify how to launch the device model allowing for a late-launch model in which the user can execute the fork without the device model and decide to only later launch it. Signed-off-by: Tamas K Lengyel --- docs/man/xl.1.pod.in | 44 + tools/libxc/include/xenctrl.h | 13 ++ tools/libxc/xc_memshr.c | 22 +++ tools/libxl/libxl.h | 11 ++ tools/libxl/libxl_create.c| 361 +++--- tools/libxl/libxl_dm.c| 2 +- tools/libxl/libxl_dom.c | 43 +++- tools/libxl/libxl_internal.h | 7 + tools/libxl/libxl_types.idl | 1 + tools/libxl/libxl_x86.c | 41 tools/xl/Makefile | 2 +- tools/xl/xl.h | 5 + tools/xl/xl_cmdtable.c| 15 ++ tools/xl/xl_forkvm.c | 147 ++ tools/xl/xl_vmcontrol.c | 14 ++ 15 files changed, 562 insertions(+), 166 deletions(-) create mode 100644 tools/xl/xl_forkvm.c diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in index 09339282e6..59c03c6427 100644 --- a/docs/man/xl.1.pod.in +++ b/docs/man/xl.1.pod.in @@ -708,6 +708,50 @@ above). =back +=item B [I] I + +Create a fork of a running VM. The domain will be paused after the operation +and remains paused while forks of it exist. Experimental and x86 only. +Forks can only be made of domains with HAP enabled and on Intel hardware. The +parent domain must be created with the xl toolstack and its configuration must +not manually define max_grant_frames, max_maptrack_frames or max_event_channels. + +B + +=over 4 + +=item B<-p> + +Leave the fork paused after creating it. + +=item B<--launch-dm> + +Specify whether the device model (QEMU) should be launched for the fork. Late +launch allows to start the device model for an already running fork. + +=item B<-C> + +The config file to use when launching the device model. Currently required when +launching the device model. Most config settings MUST match the parent domain +exactly, only change VM name, disk path and network configurations. 
+ +=item B<-Q> + +The path to the qemu save file to use when launching the device model. Currently +required when launching the device model. + +=item B<--fork-reset> + +Perform a reset operation of an already running fork. Note that resetting may +be less performant then creating a new fork depending on how much memory the +fork has deduplicated during its runtime. + +=item B<--max-vcpus> + +Specify the max-vcpus matching the parent domain when not launching the dm. + +=back + =item B [I] Display the number of shared pages for a specified domain. If no domain is diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index fc6e57a1a0..00cb4cf1f7 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2225,6 +2225,19 @@ int xc_memshr_range_share(xc_interface *xch, uint64_t first_gfn, uint64_t last_gfn); +int xc_memshr_fork(xc_interface *xch, + uint32_t source_domain, + uint32_t client_domain); + +/* + * Note: this function is only intended to be used on short-lived forks that + * haven't yet aquired a lot of memory. In case the fork has a lot of memory + * it is likely more performant to create a new fork with xc_memshr_fork. + * + * With VMs that have a lot of memory this call may block for a long time. + */ +int xc_memshr_fork_reset(xc_interface *xch, uint32_t forked_domain); + /* Debug calls: return the number of pages referencing the shared frame backing * the input argument. Should be one or greater. 
* diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c index 97e2e6a8d9..d0e4ee225b 100644 --- a/tools/libxc/xc_memshr.c +++ b/tools/libxc/xc_memshr.c @@ -239,6 +239,28 @@ int xc_memshr_debug_gref(xc_interface *xch, return xc_memshr_memop(xch, domid, &mso); } +int xc_memshr_fork(xc_interface *xch, uint32_t pdomid, uint32_t domid) +{ +xen_mem_sharing_op_t mso; + +memset(&mso, 0, sizeof(mso)); + +mso.op = XENMEM_sharing_op_fork; +mso.u.fork.parent_domain = pdomid; + +return xc_memshr_memop(xch, domid, &mso); +} + +int xc_memshr_fork_reset(xc_interface *xch, uint32_t domid) +{ +xen_mem_sharing_op_t mso; + +memset(&mso, 0, sizeof(mso)); +mso.op = XENMEM_sharing_op_fork_reset; + +return xc_memshr_memop(xch, domid, &mso); +} + int xc_memshr_audit(xc_interface *xch) { xen_mem_sharing_op_t mso; diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 35e13428b2..6b968bdcb4 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -2657,6 +2657,17 @@ int libxl_psr_get_hw_info(libxl_ctx *ctx, libxl_psr_feat_type type, unsigned int lvl, unsigned int *nr, libxl_psr_hw_info **info); void libxl_psr_hw_info_list_free(libxl_psr_hw_info *list, unsigned in
[Xen-devel] [libvirt test] 147703: regressions - FAIL
flight 147703 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/147703/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-libvirt 6 libvirt-buildfail REGR. vs. 146182 build-i386-libvirt6 libvirt-buildfail REGR. vs. 146182 build-arm64-libvirt 6 libvirt-buildfail REGR. vs. 146182 build-armhf-libvirt 6 libvirt-buildfail REGR. vs. 146182 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt-pair 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-raw 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-qcow2 1 build-check(1) blocked n/a test-amd64-i386-libvirt-pair 1 build-check(1) blocked n/a test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-arm64-arm64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-vhd 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a version targeted for testing: libvirt 0b0907316dd34b3266bf5e8d766e659e15742da5 baseline version: libvirt a1cd25b919509be2645dbe6f952d5263e0d4e4e5 Last test of basis 146182 2020-01-17 06:00:23 Z 42 days Failing since146211 2020-01-18 04:18:52 Z 41 days 39 attempts Testing same since 147703 2020-02-27 21:03:20 Z0 days1 attempts People who touched revisions under test: Andrea Bolognani Arnaud Patard Boris Fiuczynski Christian Ehrhardt Collin Walling Daniel Henrique Barboza Daniel P. 
Berrangé Dario Faggioli Erik Skultety Han Han Jim Fehlig Jiri Denemark Jonathon Jongsma Julio Faracco Ján Tomko Laine Stump Marek Marczykowski-Górecki Michal Privoznik Nikolay Shirokovskiy Pavel Hrdina Pavel Mores Peter Krempa Richard W.M. Jones Rikard Falkeborn Ryan Moeller Sahid Orentino Ferdjaoui Stefan Berger Stefan Berger Thomas Huth Your Name zhenwei pi jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64 pass build-arm64 pass build-armhf pass build-i386 pass build-amd64-libvirt fail build-arm64-libvirt fail build-armhf-libvirt fail build-i386-libvirt fail build-amd64-pvopspass build-arm64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm blocked test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmblocked test-amd64-amd64-libvirt-xsm blocked test-arm64-arm64-libvirt-xsm blocked test-amd64-i386-libvirt-xsm blocked test-amd64-amd64-libvirt blocked test-arm64-arm64-libvirt blocked test-armhf-armhf-libvirt blocked test-amd64-i386-libvirt blocked test-amd64-amd64-libvirt-pairblocked test-amd64-i386-libvirt-pair blocked test-arm64-arm64-libvirt-qcow2 blocked test-armhf-armhf-libvirt-raw blocked test-amd64-amd64-libvirt-vhd blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject
[Xen-devel] [xen-unstable test] 147683: regressions - FAIL
flight 147683 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/147683/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-qemut-rhel6hvm-amd 12 guest-start/redhat.repeat fail REGR. vs. 147600 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-rtds 16 guest-localmigrate fail REGR. vs. 147600 Tests which did not succeed, but are not blocking: test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail like 147600 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 147600 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeatfail like 147600 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 147600 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 147600 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 147600 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 147600 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 147600 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 147600 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 147600 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 147600 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-libvirt-vhd 12 
migrate-support-checkfail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 
saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass version targeted for testing: xen d7d751bfdbd4424e286eddfa0ce36075627e3c31 baseline version: xen e465fecbfdb865c75f762055c0396bc617005748 Last test of basis 147600 2020-02-25 13:42:49 Z3 days Testing sam
Re: [Xen-devel] [PATCH] x86/ioperm: add new paravirt function update_io_bitmap
Jürgen Groß writes: > Friendly ping... Ooops. I pick it up first thing tomorrow morning ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 0/2] docs: Migration design documents
Ping again... > -Original Message- > From: Durrant, Paul > Sent: 20 February 2020 12:54 > To: Durrant, Paul ; xen-devel@lists.xenproject.org > Cc: Andrew Cooper ; George Dunlap > ; Ian Jackson ; > Jan Beulich ; Julien Grall ; Konrad > Rzeszutek Wilk ; Stefano Stabellini > ; Wei Liu > Subject: RE: [PATCH v5 0/2] docs: Migration design documents > > Ping? > > I have not receieved any further comments on v5. Can I please get acks or > otherwise so we can (hopefully) move on with coding? > > Paul > > > -Original Message- > > From: Paul Durrant > > Sent: 13 February 2020 10:53 > > To: xen-devel@lists.xenproject.org > > Cc: Durrant, Paul ; Andrew Cooper > > ; George Dunlap > ; > > Ian Jackson ; Jan Beulich > ; > > Julien Grall ; Konrad Rzeszutek Wilk > > ; Stefano Stabellini ; > Wei > > Liu > > Subject: [PATCH v5 0/2] docs: Migration design documents > > > > Paul Durrant (2): > > docs/designs: Add a design document for non-cooperative live migration > > docs/designs: Add a design document for migration of xenstore data > > > > docs/designs/non-cooperative-migration.md | 272 ++ > > docs/designs/xenstore-migration.md| 136 +++ > > 2 files changed, 408 insertions(+) > > create mode 100644 docs/designs/non-cooperative-migration.md > > create mode 100644 docs/designs/xenstore-migration.md > > --- > > Cc: Andrew Cooper > > Cc: George Dunlap > > Cc: Ian Jackson > > Cc: Jan Beulich > > Cc: Julien Grall > > Cc: Konrad Rzeszutek Wilk > > Cc: Stefano Stabellini > > Cc: Wei Liu > > -- > > 2.20.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [XEN PATCH v2] libxl: wait for console path before firing console_available
On Thu, Feb 20, 2020 at 02:31:03PM +0100, Paweł Marczewski wrote: > If we skip the bootloader, the TTY path will be set for xenconsoled. > However, there is no guarantee that this will happen by the time we > want to call the console_available callback, so we have to wait. > > Signed-off-by: Paweł Marczewski With minor fix below: Reviewed-by: Marek Marczykowski-Górecki > --- > Changed since v1: > * use xswait mechanism to add a timeout > > As mentioned before, this is to fix a race condition that appears when > using libxl via libvirt and not using bootloader (console_available > fires too early). > > I have tested the patch on Qubes OS 4.1 (with Xen 4.13), and it seems > to solve the problem. I also checked the timeout: when xenconsoled is > stopped, libxl waits for 10 seconds and then aborts domain creation. > > tools/libxl/libxl_create.c | 43 +--- > tools/libxl/libxl_internal.h | 1 + > 2 files changed, 41 insertions(+), 3 deletions(-) > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > index 3a7364e2ac..4b150d92b9 100644 > --- a/tools/libxl/libxl_create.c > +++ b/tools/libxl/libxl_create.c > @@ -1190,6 +1190,33 @@ static void domcreate_console_available(libxl__egc > *egc, > dcs->aop_console_how.for_event)); > } > > +static void console_xswait_callback(libxl__egc *egc, libxl__xswait_state > *xswa, > +int rc, const char *p) > +{ > +EGC_GC; > +libxl__domain_create_state *dcs = CONTAINER_OF(xswa, *dcs, > console_xswait); > +char *dompath = libxl__xs_get_dompath(gc, dcs->guest_domid); > +char *tty_path = GCSPRINTF("%s/console/tty", dompath); > +char *tty; > + > +if (rc) { > +if (rc == ERROR_TIMEDOUT) > +LOG(ERROR, "%s: timed out", xswa->what); > +libxl__xswait_stop(gc, xswa); > +domcreate_complete(egc, dcs, rc); > +return; > +} > + > +tty = libxl__xs_read(gc, XBT_NULL, tty_path); > + > +if (tty && tty[0] != '\0') { > +libxl__xswait_stop(gc, xswa); > + > +domcreate_console_available(egc, dcs); > +domcreate_complete(egc, dcs, 0); > +} > +} > + > 
static void domcreate_bootloader_done(libxl__egc *egc, >libxl__bootloader_state *bl, >int rc) > @@ -1728,9 +1755,18 @@ static void domcreate_attach_devices(libxl__egc *egc, > return; > } > > -domcreate_console_available(egc, dcs); > - > -domcreate_complete(egc, dcs, 0); > +dcs->console_xswait.ao = ao; > +dcs->console_xswait.what = GCSPRINTF("domain %d console tty", domid); > +dcs->console_xswait.path = GCSPRINTF("%s/console/tty", > + libxl__xs_get_dompath(gc, domid)); > +dcs->console_xswait.timeout_ms = 10 * 1000; Better not use explicit value _here_, but a constant in some header. I think LIBXL_INIT_TIMEOUT is a good fit here. > +dcs->console_xswait.callback = console_xswait_callback; > +ret = libxl__xswait_start(gc, &dcs->console_xswait); > +if (ret) { > +LOG(ERROR, "unable to set up watch for domain %d console path", > +domid); > +goto error_out; > +} > > return; > > @@ -1861,6 +1897,7 @@ static int do_domain_create(libxl_ctx *ctx, > libxl_domain_config *d_config, > > libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how); > cdcs->domid_out = domid; > +libxl__xswait_init(&cdcs->dcs.console_xswait); > > initiate_domain_create(egc, &cdcs->dcs); > > diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h > index 4936446069..d8129417dc 100644 > --- a/tools/libxl/libxl_internal.h > +++ b/tools/libxl/libxl_internal.h > @@ -4180,6 +4180,7 @@ struct libxl__domain_create_state { > /* necessary if the domain creation failed and we have to destroy it */ > libxl__domain_destroy_state dds; > libxl__multidev multidev; > +libxl__xswait_state console_xswait; > }; > > _hidden int libxl__device_nic_set_devids(libxl__gc *gc, -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
On Fri, Feb 28, 2020 at 06:00:44PM +0100, Jan Beulich wrote: > On 19.02.2020 18:43, Roger Pau Monne wrote: > > Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes. > > This greatly increases the performance of TLB flushes when running > > with a high amount of vCPUs as a Xen guest, and is specially important > > when running in shim mode. > > > > The following figures are from a PV guest running `make -j32 xen` in > > shim mode with 32 vCPUs and HAP. > > > > Using x2APIC and ALLBUT shorthand: > > real4m35.973s > > user4m35.110s > > sys 36m24.117s > > > > Using L0 assisted flush: > > real1m2.596s > > user4m34.818s > > sys 5m16.374s > > > > The implementation adds a new hook to hypervisor_ops so other > > enlightenments can also implement such assisted flush just by filling > > the hook. Note that the Xen implementation completely ignores the > > dirty CPU mask and the linear address passed in, and always performs a > > global TLB flush on all vCPUs. > > This isn't because of an implementation choice of yours, but because > of how HVMOP_flush_tlbs works. I think the statement should somehow > express this. I also think it wants clarifying that using the > hypercall is indeed faster even in the case of single-page, single- > CPU flush Are you sure about this? I think taking a vmexit is going to be more costly than executing a local invlpg? > (which I suspect may not be the case especially as vCPU > count grows). The stats above prove a positive overall effect, but > they don't say whether the effect could be even bigger by being at > least a little selective. I assume that being able to provide a bitmap with the vCPUs on whether the TLB flush should be performed would give us some more performance, but I haven't looked into it yet. 
> > @@ -73,6 +74,15 @@ void __init hypervisor_e820_fixup(struct e820map *e820) > > ops.e820_fixup(e820); > > } > > > > +int hypervisor_flush_tlb(const cpumask_t *mask, const void *va, > > + unsigned int order) > > +{ > > +if ( ops.flush_tlb ) > > +return alternative_call(ops.flush_tlb, mask, va, order); > > + > > +return -ENOSYS; > > +} > > Please no new -ENOSYS anywhere (except in new ports' top level > hypercall handlers). Ack. Is EOPNOTSUPP OK? > > @@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void > > *va, unsigned int flags) > > if ( (flags & ~FLUSH_ORDER_MASK) && > > !cpumask_subset(mask, cpumask_of(cpu)) ) > > { > > +if ( cpu_has_hypervisor && > > + !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID | > > + FLUSH_ORDER_MASK)) && > > + !hypervisor_flush_tlb(mask, va, (flags - 1) & > > FLUSH_ORDER_MASK) ) > > +{ > > +if ( tlb_clk_enabled ) > > +tlb_clk_enabled = false; > > Why does this need doing here? Couldn't Xen guest setup code > clear the flag? Because it's possible that the hypercalls fails, and hence the tlb clock should be kept enabled. There's no reason to disable it until Xen knows the assisted flush is indeed available. I don't mind moving it to Xen guest setup code, but I'm not sure I see why it would be any better than doing it here. The only reason I guess is to avoid checking tlb_clk_enabled on every successful assisted flush? Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3] xen: make sure stop_machine_run() is always called in a tasklet
With core scheduling active it is mandatory for stop_machine_run() to be called in idle context only (so either during boot or in a tasklet), as otherwise a scheduling deadlock would occur: stop_machine_run() does a cpu rendezvous by activating a tasklet on all other cpus. In case stop_machine_run() was not called in an idle vcpu it would block scheduling the idle vcpu on its siblings with core scheduling being active, resulting in a hang. Put a BUG_ON() into stop_machine_run() to test for being called in an idle vcpu only and adapt the missing call site (ucode loading) to use a tasklet for calling stop_machine_run(). Signed-off-by: Juergen Gross --- V2: - rephrase commit message (Julien Grall) V3: - add comments (Jan Beulich) --- xen/arch/x86/microcode.c | 54 +-- xen/common/rcupdate.c | 4 xen/common/stop_machine.c | 7 ++ 3 files changed, 45 insertions(+), 20 deletions(-) diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 35c1d36cdc..4cf4e66b18 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -562,30 +562,18 @@ static int do_microcode_update(void *patch) return ret; } -int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) +struct ucode_buf { +unsigned int len; +char buffer[]; +}; + +static long microcode_update_helper(void *data) { int ret; -void *buffer; +struct ucode_buf *buffer = data; unsigned int cpu, updated; struct microcode_patch *patch; -if ( len != (uint32_t)len ) -return -E2BIG; - -if ( microcode_ops == NULL ) -return -EINVAL; - -buffer = xmalloc_bytes(len); -if ( !buffer ) -return -ENOMEM; - -ret = copy_from_guest(buffer, buf, len); -if ( ret ) -{ -xfree(buffer); -return -EFAULT; -} - /* cpu_online_map must not change during update */ if ( !get_cpu_maps() ) { @@ -607,7 +595,7 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) return -EPERM; } -patch = parse_blob(buffer, len); +patch = parse_blob(buffer->buffer, buffer->len); xfree(buffer); if ( 
IS_ERR(patch) ) { @@ -700,6 +688,32 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) return ret; } +int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) +{ +int ret; +struct ucode_buf *buffer; + +if ( len != (uint32_t)len ) +return -E2BIG; + +if ( microcode_ops == NULL ) +return -EINVAL; + +buffer = xmalloc_flex_struct(struct ucode_buf, buffer, len); +if ( !buffer ) +return -ENOMEM; + +ret = copy_from_guest(buffer->buffer, buf, len); +if ( ret ) +{ +xfree(buffer); +return -EFAULT; +} +buffer->len = len; + +return continue_hypercall_on_cpu(0, microcode_update_helper, buffer); +} + static int __init microcode_init(void) { /* diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c index 91d4ad0fd8..d76b991627 100644 --- a/xen/common/rcupdate.c +++ b/xen/common/rcupdate.c @@ -178,6 +178,10 @@ static int rcu_barrier_action(void *_cpu_count) return 0; } +/* + * As rcu_barrier() is using stop_machine_run() it is allowed to be used in + * idle context only (see comment for stop_machine_run()). + */ int rcu_barrier(void) { atomic_t cpu_count = ATOMIC_INIT(0); diff --git a/xen/common/stop_machine.c b/xen/common/stop_machine.c index 33d9602217..2d5f6aef61 100644 --- a/xen/common/stop_machine.c +++ b/xen/common/stop_machine.c @@ -67,6 +67,12 @@ static void stopmachine_wait_state(void) cpu_relax(); } +/* + * Sync all processors and call a function on one or all of them. + * As stop_machine_run() is using a tasklet for syncing the processors it is + * mandatory to be called only on an idle vcpu, as otherwise active core + * scheduling might hang. + */ int stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu) { unsigned int i, nr_cpus; @@ -74,6 +80,7 @@ int stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu) int ret; BUG_ON(!local_irq_is_enabled()); +BUG_ON(!is_idle_vcpu(current)); /* cpu_online_map must not change. 
*/ if ( !get_cpu_maps() ) -- 2.16.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush
On 28.02.2020 17:57, Roger Pau Monné wrote: > On Fri, Feb 28, 2020 at 05:44:57PM +0100, Jan Beulich wrote: >> On 28.02.2020 17:31, Roger Pau Monné wrote: >>> On Fri, Feb 28, 2020 at 02:58:42PM +0100, Jan Beulich wrote: On 19.02.2020 18:43, Roger Pau Monne wrote: > @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, > struct vcpu *v), > if ( !flush_vcpu(ctxt, v) ) > continue; > > -paging_update_cr3(v, false); > +hvm_asid_flush_vcpu(v); > > cpu = read_atomic(&v->dirty_cpu); > -if ( is_vcpu_dirty_cpu(cpu) ) > +if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) ) > __cpumask_set_cpu(cpu, mask); > } > > -/* Flush TLBs on all CPUs with dirty vcpu state. */ > -flush_tlb_mask(mask); > - > -/* Done. */ > -for_each_vcpu ( d, v ) > -if ( v != current && flush_vcpu(ctxt, v) ) > -vcpu_unpause(v); > +/* > + * Trigger a vmexit on all pCPUs with dirty vCPU state in order to > force an > + * ASID/VPID change and hence accomplish a guest TLB flush. Note > that vCPUs > + * not currently running will already be flushed when scheduled > because of > + * the ASID tickle done in the loop above. > + */ > +on_selected_cpus(mask, handle_flush, mask, 0); > +while ( !cpumask_empty(mask) ) > +cpu_relax(); Why do you pass 0 as last argument? If you passed 1, you wouldn't need the while() here, handle_flush() could be empty, and you'd save a perhaps large amount of CPUs to all try to modify two cache lines (the one used by on_selected_cpus() itself and the one here) instead of just one. Yes, latency from the last CPU clearing its bit to you being able to exit from here may be a little larger this way, but overall latency may shrink with the cut by half amount of atomic ops issued to the bus. >>> >>> In fact I think passing 0 as the last argument is fine, and >>> handle_flush can be empty in that case anyway. We don't really care >>> whether on_selected_cpus returns before all CPUs have executed the >>> dummy function, as long as all of them have taken a vmexit. 
Using 0 >>> already guarantees that AFAICT. >> >> Isn't it that the caller wants to be guaranteed that the flush >> has actually occurred (or at least that no further insns can >> be executed prior to the flush happening, i.e. at least the VM >> exit having occurred)? > > Yes, but that would happen with 0 as the last argument already, see > the code in smp_call_function_interrupt: > > if ( call_data.wait ) > { > (*func)(info); > smp_mb(); > cpumask_clear_cpu(cpu, &call_data.selected); > } > else > { > smp_mb(); > cpumask_clear_cpu(cpu, &call_data.selected); > (*func)(info); > } > > The only difference is whether on_selected_cpus can return before all > the functions have been executed on all CPUs, or whether all CPUs have > taken a vmexit. The latter is fine for our use-case. Oh, yes - right you are. Jan
Re: [Xen-devel] [PATCH v5 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
On 19.02.2020 18:43, Roger Pau Monne wrote: > Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes. > This greatly increases the performance of TLB flushes when running > with a high amount of vCPUs as a Xen guest, and is specially important > when running in shim mode. > > The following figures are from a PV guest running `make -j32 xen` in > shim mode with 32 vCPUs and HAP. > > Using x2APIC and ALLBUT shorthand: > real 4m35.973s > user 4m35.110s > sys 36m24.117s > > Using L0 assisted flush: > real1m2.596s > user4m34.818s > sys 5m16.374s > > The implementation adds a new hook to hypervisor_ops so other > enlightenments can also implement such assisted flush just by filling > the hook. Note that the Xen implementation completely ignores the > dirty CPU mask and the linear address passed in, and always performs a > global TLB flush on all vCPUs. This isn't because of an implementation choice of yours, but because of how HVMOP_flush_tlbs works. I think the statement should somehow express this. I also think it wants clarifying that using the hypercall is indeed faster even in the case of single-page, single- CPU flush (which I suspect may not be the case especially as vCPU count grows). The stats above prove a positive overall effect, but they don't say whether the effect could be even bigger by being at least a little selective. > @@ -73,6 +74,15 @@ void __init hypervisor_e820_fixup(struct e820map *e820) > ops.e820_fixup(e820); > } > > +int hypervisor_flush_tlb(const cpumask_t *mask, const void *va, > + unsigned int order) > +{ > +if ( ops.flush_tlb ) > +return alternative_call(ops.flush_tlb, mask, va, order); > + > +return -ENOSYS; > +} Please no new -ENOSYS anywhere (except in new ports' top level hypercall handlers). 
> @@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void > *va, unsigned int flags) > if ( (flags & ~FLUSH_ORDER_MASK) && > !cpumask_subset(mask, cpumask_of(cpu)) ) > { > +if ( cpu_has_hypervisor && > + !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID | > + FLUSH_ORDER_MASK)) && > + !hypervisor_flush_tlb(mask, va, (flags - 1) & FLUSH_ORDER_MASK) > ) > +{ > +if ( tlb_clk_enabled ) > +tlb_clk_enabled = false; Why does this need doing here? Couldn't Xen guest setup code clear the flag? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush
On Fri, Feb 28, 2020 at 05:44:57PM +0100, Jan Beulich wrote: > On 28.02.2020 17:31, Roger Pau Monné wrote: > > On Fri, Feb 28, 2020 at 02:58:42PM +0100, Jan Beulich wrote: > >> On 19.02.2020 18:43, Roger Pau Monne wrote: > >>> --- a/xen/arch/x86/mm/hap/hap.c > >>> +++ b/xen/arch/x86/mm/hap/hap.c > >>> @@ -669,32 +669,28 @@ static void hap_update_cr3(struct vcpu *v, int > >>> do_locking, bool noflush) > >>> hvm_update_guest_cr3(v, noflush); > >>> } > >>> > >>> +/* > >>> + * NB: doesn't actually perform any flush, used just to clear the CPU > >>> from the > >>> + * mask and hence signal that the guest TLB flush has been done. > >>> + */ > >> > >> "has been done" isn't correct, as the flush may happen only later > >> on (upon next VM entry). I think wording here needs to be as > >> precise as possible, however the comment may turn out unnecessary > >> altogether: > > > > What about: > > > > /* > > * NB: Dummy function exclusively used as a way to trigger a vmexit, > > * and thus force an ASID/VPID update on vmentry (that will flush the > > * cache). > > */ > > Once it's empty - yes, looks okay (with s/cache/TLB/). Ack. > >>> @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, > >>> struct vcpu *v), > >>> if ( !flush_vcpu(ctxt, v) ) > >>> continue; > >>> > >>> -paging_update_cr3(v, false); > >>> +hvm_asid_flush_vcpu(v); > >>> > >>> cpu = read_atomic(&v->dirty_cpu); > >>> -if ( is_vcpu_dirty_cpu(cpu) ) > >>> +if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) ) > >>> __cpumask_set_cpu(cpu, mask); > >>> } > >>> > >>> -/* Flush TLBs on all CPUs with dirty vcpu state. */ > >>> -flush_tlb_mask(mask); > >>> - > >>> -/* Done. */ > >>> -for_each_vcpu ( d, v ) > >>> -if ( v != current && flush_vcpu(ctxt, v) ) > >>> -vcpu_unpause(v); > >>> +/* > >>> + * Trigger a vmexit on all pCPUs with dirty vCPU state in order to > >>> force an > >>> + * ASID/VPID change and hence accomplish a guest TLB flush. 
Note > >>> that vCPUs > >>> + * not currently running will already be flushed when scheduled > >>> because of > >>> + * the ASID tickle done in the loop above. > >>> + */ > >>> +on_selected_cpus(mask, handle_flush, mask, 0); > >>> +while ( !cpumask_empty(mask) ) > >>> +cpu_relax(); > >> > >> Why do you pass 0 as last argument? If you passed 1, you wouldn't > >> need the while() here, handle_flush() could be empty, and you'd > >> save a perhaps large amount of CPUs to all try to modify two > >> cache lines (the one used by on_selected_cpus() itself and the > >> one here) instead of just one. Yes, latency from the last CPU > >> clearing its bit to you being able to exit from here may be a > >> little larger this way, but overall latency may shrink with the > >> cut by half amount of atomic ops issued to the bus. > > > > In fact I think passing 0 as the last argument is fine, and > > handle_flush can be empty in that case anyway. We don't really care > > whether on_selected_cpus returns before all CPUs have executed the > > dummy function, as long as all of them have taken a vmexit. Using 0 > > already guarantees that AFAICT. > > Isn't it that the caller wants to be guaranteed that the flush > has actually occurred (or at least that no further insns can > be executed prior to the flush happening, i.e. at least the VM > exit having occurred)? Yes, but that would happen with 0 as the last argument already, see the code in smp_call_function_interrupt: if ( call_data.wait ) { (*func)(info); smp_mb(); cpumask_clear_cpu(cpu, &call_data.selected); } else { smp_mb(); cpumask_clear_cpu(cpu, &call_data.selected); (*func)(info); } The only difference is whether on_selected_cpus can return before all the functions have been executed on all CPUs, or whether all CPUs have taken a vmexit. The latter is fine for our use-case.
> >> (Forcing an empty function to be called may seem odd, but sending > >> just some IPI [like the event check one] wouldn't be enough, as > >> you want to be sure the other side has actually received the > >> request.) > > > > Exactly, that's the reason for the empty function. > > But the function isn't empty. It will be now, since on_selected_cpus already does the needed synchronization as you pointed out. Thanks, Roger.
Re: [Xen-devel] [PATCH v5 6/7] xen/guest: prepare hypervisor ops to use alternative calls
On Fri, Feb 28, 2020 at 05:29:32PM +0100, Jan Beulich wrote: > On 19.02.2020 18:43, Roger Pau Monne wrote: > > --- a/xen/arch/x86/guest/hyperv/hyperv.c > > +++ b/xen/arch/x86/guest/hyperv/hyperv.c > > @@ -199,7 +199,7 @@ static void __init e820_fixup(struct e820map *e820) > > panic("Unable to reserve Hyper-V hypercall range\n"); > > } > > > > -static const struct hypervisor_ops ops = { > > +static const struct hypervisor_ops __initdata ops = { > > This needs to be __initconstrel in order to avoid triggering > (possibly only in the future) section mismatch warnings with > at least some gcc versions. With this and the other instance > adjusted: > Reviewed-by: Jan Beulich I can do that when posting a new version, unless you want to pick this up earlier and adjust on commit. Thanks, Roger.
Re: [Xen-devel] [PATCH v5 4/7] x86/tlb: introduce a flush guests TLB flag
On Fri, Feb 28, 2020 at 05:14:05PM +0100, Jan Beulich wrote: > On 19.02.2020 18:43, Roger Pau Monne wrote: > > Introduce a specific flag to request a HVM guest TLB flush, which is > > an ASID/VPID tickle that forces a linear TLB flush for all HVM guests. > > Here and below, what do you mean by "linear"? I guess you mean > TLBs holding translations from guest linear to guest physical, > but I think this could do with then also saying so, even if it's > more words. Yes, will change. > > This was previously unconditionally done in each pre_flush call, but > > that's not required: HVM guests not using shadow don't require linear > > TLB flushes as Xen doesn't modify the guest page tables in that case > > (ie: when using HAP). > > This explains the correctness in one direction. What about the > removal of this from the switch_cr3_cr4() path? AFAICT that's never used by shadow code to modify cr3 or cr4, and hence doesn't require a guest linear TLB flush. > And what about > our safety assumptions from the ticking of tlbflush_clock, > where we then imply that pages e.g. about to be freed can't > have any translations left in any TLBs anymore? I'm slightly confused. That flush only affects HVM guests' linear TLBs, which are not under Xen's control unless shadow mode is used. Pages to be freed in the HAP case need to be flushed from the EPT/NPT, but whether there are references left in the guest TLB to point to that gfn really doesn't matter to Xen at all, since the guest is in full control of its MMU and TLB in that case. For shadow any change in the guest page-tables should already be accompanied by a guest TLB flush, or else the guest could access no longer present entries, regardless of whether the affected pages are freed or not.
> > --- a/xen/include/asm-x86/flushtlb.h > > +++ b/xen/include/asm-x86/flushtlb.h > > @@ -105,6 +105,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long > > cr4); > > #define FLUSH_VCPU_STATE 0x1000 > > /* Flush the per-cpu root page table */ > > #define FLUSH_ROOT_PGTBL 0x2000 > > + /* Flush all HVM guests linear TLB (using ASID/VPID) */ > > +#define FLUSH_GUESTS_TLB 0x4000 > > For one, the "all" is pretty misleading. A single such request > doesn't do this for all vCPU-s of all HVM guests, does it? It kind of does because it tickles the pCPU ASID/VPID generation ID, so any vCPU scheduled on the selected pCPUs will get a new ASID/VPID allocated and thus a clean TLB. > I'm > also struggling with the 'S' in "GUESTS" - why is it not just > "GUEST"? Any guest vCPUs running on the selected pCPUs will get a new ASID/VPID ID and thus a clean TLB. > I admit the names of the involved functions > (hvm_flush_guest_tlbs(), hvm_asid_flush_core()) are somewhat > misleading, as they don't actually do any flushing, they merely > arrange for what is in the TLB to no longer be able to be used, > so giving this a suitable name is "historically" complicated. > What if we did away with the hvm_flush_guest_tlbs() wrapper, > naming the constant here then after hvm_asid_flush_core(), e.g. > FLUSH_ASID_CORE? I'm not opposed to renaming. The comment before the definition was also meant to clarify it's usage, and hence the explicit mention of ASID/VPID. > I also think this constant might better be zero when !HVM. Yes, that's a good idea. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-devel][PATCH][linux-4.19] swiotlb-xen.c: Fixed cache invalidation fault
Problem: xen_swiotlb_map_sg_attrs() maps a set of buffers described in a scatterlist. It calls xen_dma_map_page() and sets sglist.dma_address to the address calculated by xen_phys_to_bus(); let's call it dma_address_1. xen_dma_map_page() maps the area and obtains a DMA address different from dma_address_1; let's call it dma_address_2. The function __swiotlb_map_page() then performs the conversion dma_addr -> phys_addr -> virt_addr using dma_address_2 and passes "virt_addr" to __dma_map_area(), which successfully invalidates the cache. When xen_swiotlb_unmap_sg_attrs() unmaps the area, it passes dma_address_1 to __swiotlb_unmap_page(). __swiotlb_unmap_page() performs the same conversion dma_addr -> phys_addr -> virt_addr using dma_address_1 and passes "virt_addr" to __dma_map_area(), which gets a Data Abort in an attempt to invalidate the cache. The problem manifests on devices that have dma_pfn_offset != 0, specifically while loading the brcmfmac module (BCM4345 WiFi chip) in Dom0. What was done: sglist.dma_address is set to phys_to_dma("phys").
Signed-off-by: Nataliya Korovkina --- drivers/xen/swiotlb-xen.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index a6f9ba85dc4b..9363a52e4756 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -583,7 +583,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, sg->length, dir, attrs); - sg->dma_address = dev_addr; + sg->dma_address = phys_to_dma(hwdev, dev_addr); } else { /* we are not interested in the dma_addr returned by * xen_dma_map_page, only in the potential cache flushes executed @@ -594,7 +594,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, sg->length, dir, attrs); - sg->dma_address = dev_addr; + sg->dma_address = phys_to_dma(hwdev, dev_addr); } sg_dma_len(sg) = sg->length; } -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush
On 28.02.2020 17:31, Roger Pau Monné wrote: > On Fri, Feb 28, 2020 at 02:58:42PM +0100, Jan Beulich wrote: >> On 19.02.2020 18:43, Roger Pau Monne wrote: >>> --- a/xen/arch/x86/mm/hap/hap.c >>> +++ b/xen/arch/x86/mm/hap/hap.c >>> @@ -669,32 +669,28 @@ static void hap_update_cr3(struct vcpu *v, int >>> do_locking, bool noflush) >>> hvm_update_guest_cr3(v, noflush); >>> } >>> >>> +/* >>> + * NB: doesn't actually perform any flush, used just to clear the CPU from >>> the >>> + * mask and hence signal that the guest TLB flush has been done. >>> + */ >> >> "has been done" isn't correct, as the flush may happen only later >> on (upon next VM entry). I think wording here needs to be as >> precise as possible, however the comment may turn out unnecessary >> altogether: > > What about: > > /* > * NB: Dummy function exclusively used as a way to trigger a vmexit, > * and thus force an ASID/VPID update on vmentry (that will flush the > * cache). > */ Once it's empty - yes, looks okay (with s/cache/TLB/). >>> @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, >>> struct vcpu *v), >>> if ( !flush_vcpu(ctxt, v) ) >>> continue; >>> >>> -paging_update_cr3(v, false); >>> +hvm_asid_flush_vcpu(v); >>> >>> cpu = read_atomic(&v->dirty_cpu); >>> -if ( is_vcpu_dirty_cpu(cpu) ) >>> +if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) ) >>> __cpumask_set_cpu(cpu, mask); >>> } >>> >>> -/* Flush TLBs on all CPUs with dirty vcpu state. */ >>> -flush_tlb_mask(mask); >>> - >>> -/* Done. */ >>> -for_each_vcpu ( d, v ) >>> -if ( v != current && flush_vcpu(ctxt, v) ) >>> -vcpu_unpause(v); >>> +/* >>> + * Trigger a vmexit on all pCPUs with dirty vCPU state in order to >>> force an >>> + * ASID/VPID change and hence accomplish a guest TLB flush. Note that >>> vCPUs >>> + * not currently running will already be flushed when scheduled >>> because of >>> + * the ASID tickle done in the loop above. 
>>> + */ >>> +on_selected_cpus(mask, handle_flush, mask, 0); >>> +while ( !cpumask_empty(mask) ) >>> +cpu_relax(); >> >> Why do you pass 0 as last argument? If you passed 1, you wouldn't >> need the while() here, handle_flush() could be empty, and you'd >> save a perhaps large amount of CPUs to all try to modify two >> cache lines (the one used by on_selected_cpus() itself and the >> one here) instead of just one. Yes, latency from the last CPU >> clearing its bit to you being able to exit from here may be a >> little larger this way, but overall latency may shrink with the >> cut by half amount of atomic ops issued to the bus. > > In fact I think passing 0 as the last argument is fine, and > handle_flush can be empty in that case anyway. We don't really care > whether on_selected_cpus returns before all CPUs have executed the > dummy function, as long as all of them have taken a vmexit. Using 0 > already guarantees that AFAICT. Isn't it that the caller wants to be guaranteed that the flush has actually occurred (or at least that no further insns can be executed prior to the flush happening, i.e. at least the VM exit having occurred)? >> (Forcing an empty function to be called may seem odd, but sending >> just some IPI [like the event check one] wouldn't be enough, as >> you want to be sure the other side has actually received the >> request.) > > Exactly, that's the reason for the empty function. But the function isn't empty. Jan
Re: [Xen-devel] [PATCH v5 2/7] x86/paging: add TLB flush hooks
On 28.02.2020 17:19, Roger Pau Monné wrote: > On Fri, Feb 28, 2020 at 03:50:31PM +0100, Jan Beulich wrote: >> On 19.02.2020 18:43, Roger Pau Monne wrote: >>> Add shadow and hap implementation specific helpers to perform guest >>> TLB flushes. Note that the code for both is exactly the same at the >>> moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by >>> further patches that will add implementation specific optimizations to >>> them. >>> >>> No functional change intended. >>> >>> Signed-off-by: Roger Pau Monné >>> Reviewed-by: Wei Liu >>> Acked-by: Tim Deegan >> >> This looks good in principle, with one possible anomaly: >> >>> --- a/xen/arch/x86/hvm/hvm.c >>> +++ b/xen/arch/x86/hvm/hvm.c >>> @@ -3990,55 +3990,10 @@ static void hvm_s3_resume(struct domain *d) >>> bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v), >>> void *ctxt) >>> { >>> -static DEFINE_PER_CPU(cpumask_t, flush_cpumask); >>> -cpumask_t *mask = &this_cpu(flush_cpumask); >>> -struct domain *d = current->domain; >>> -struct vcpu *v; >>> - >>> -/* Avoid deadlock if more than one vcpu tries this at the same time. */ >>> -if ( !spin_trylock(&d->hypercall_deadlock_mutex) ) >>> -return false; >>> - >>> -/* Pause all other vcpus. */ >>> -for_each_vcpu ( d, v ) >>> -if ( v != current && flush_vcpu(ctxt, v) ) >>> -vcpu_pause_nosync(v); >>> - >>> -/* Now that all VCPUs are signalled to deschedule, we wait... */ >>> -for_each_vcpu ( d, v ) >>> -if ( v != current && flush_vcpu(ctxt, v) ) >>> -while ( !vcpu_runnable(v) && v->is_running ) >>> -cpu_relax(); >>> - >>> -/* All other vcpus are paused, safe to unlock now. */ >>> -spin_unlock(&d->hypercall_deadlock_mutex); >>> - >>> -cpumask_clear(mask); >>> - >>> -/* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). 
>>> */ >>> -for_each_vcpu ( d, v ) >>> -{ >>> -unsigned int cpu; >>> - >>> -if ( !flush_vcpu(ctxt, v) ) >>> -continue; >>> - >>> -paging_update_cr3(v, false); >>> +struct domain *currd = current->domain; >>> >>> -cpu = read_atomic(&v->dirty_cpu); >>> -if ( is_vcpu_dirty_cpu(cpu) ) >>> -__cpumask_set_cpu(cpu, mask); >>> -} >>> - >>> -/* Flush TLBs on all CPUs with dirty vcpu state. */ >>> -flush_tlb_mask(mask); >>> - >>> -/* Done. */ >>> -for_each_vcpu ( d, v ) >>> -if ( v != current && flush_vcpu(ctxt, v) ) >>> -vcpu_unpause(v); >>> - >>> -return true; >>> +return shadow_mode_enabled(currd) ? shadow_flush_tlb(flush_vcpu, ctxt) >>> + : hap_flush_tlb(flush_vcpu, ctxt); >>> } >> >> Following our current model I think this should be a new pointer >> in struct paging_mode (then truly fitting "hooks" in the title). > > I tried doing it that way, but there was something weird about it, the > paging mode is per-vcpu, and hence I needed to do something like: > > paging_get_hostmode(current)->flush(current->domain, ...) I don't see anything wrong with the left side of the -> (it parallels what is needed for the write_p2m_entry() hook). For the right I can't see why you'd want to have current->domain there when both functions want flush_vcpu and ctxt. Ultimately we probably want per-vCPU and per-domain hooks in separate structures (the former for hooks where the current paging mode matters, the latter for those where it doesn't matter), but of course that's nothing I'm meaning to ask you to do. > I can try to move it to being a paging_mode hook if you prefer. It would seem cleaner to me, but ... >> I can see the desire to avoid the indirect call though, but I >> also think that if we were to go that route, we should settle on >> switching around others as well which are paging mode dependent. >> (FAOD this is nothing I ask you to do here.) Andrew, thoughts? ... 
as said I'd prefer to also know Andrew's opinion, in particular to settle where we would want this to move in the mid to long term. Whereas ... > I think it's already quite of a mixed bag, see track_dirty_vram for > example which uses a similar model. ... you probably know my typical response to something like this: Bad examples aren't to be taken as excuse to introduce further inconsistencies. Jan
Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush
On Fri, Feb 28, 2020 at 02:58:42PM +0100, Jan Beulich wrote: > On 19.02.2020 18:43, Roger Pau Monne wrote: > > --- a/xen/arch/x86/mm/hap/hap.c > > +++ b/xen/arch/x86/mm/hap/hap.c > > @@ -669,32 +669,28 @@ static void hap_update_cr3(struct vcpu *v, int > > do_locking, bool noflush) > > hvm_update_guest_cr3(v, noflush); > > } > > > > +/* > > + * NB: doesn't actually perform any flush, used just to clear the CPU from > > the > > + * mask and hence signal that the guest TLB flush has been done. > > + */ > > "has been done" isn't correct, as the flush may happen only later > on (upon next VM entry). I think wording here needs to be as > precise as possible, however the comment may turn out unnecessary > altogether: What about: /* * NB: Dummy function exclusively used as a way to trigger a vmexit, * and thus force an ASID/VPID update on vmentry (that will flush the * cache). */ > > @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, > > struct vcpu *v), > > if ( !flush_vcpu(ctxt, v) ) > > continue; > > > > -paging_update_cr3(v, false); > > +hvm_asid_flush_vcpu(v); > > > > cpu = read_atomic(&v->dirty_cpu); > > -if ( is_vcpu_dirty_cpu(cpu) ) > > +if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) ) > > __cpumask_set_cpu(cpu, mask); > > } > > > > -/* Flush TLBs on all CPUs with dirty vcpu state. */ > > -flush_tlb_mask(mask); > > - > > -/* Done. */ > > -for_each_vcpu ( d, v ) > > -if ( v != current && flush_vcpu(ctxt, v) ) > > -vcpu_unpause(v); > > +/* > > + * Trigger a vmexit on all pCPUs with dirty vCPU state in order to > > force an > > + * ASID/VPID change and hence accomplish a guest TLB flush. Note that > > vCPUs > > + * not currently running will already be flushed when scheduled > > because of > > + * the ASID tickle done in the loop above. > > + */ > > +on_selected_cpus(mask, handle_flush, mask, 0); > > +while ( !cpumask_empty(mask) ) > > +cpu_relax(); > > Why do you pass 0 as last argument? 
If you passed 1, you wouldn't > need the while() here, handle_flush() could be empty, and you'd > save a perhaps large amount of CPUs to all try to modify two > cache lines (the one used by on_selected_cpus() itself and the > one here) instead of just one. Yes, latency from the last CPU > clearing its bit to you being able to exit from here may be a > little larger this way, but overall latency may shrink with the > cut by half amount of atomic ops issued to the bus. In fact I think passing 0 as the last argument is fine, and handle_flush can be empty in that case anyway. We don't really care whether on_selected_cpus returns before all CPUs have executed the dummy function, as long as all of them have taken a vmexit. Using 0 already guarantees that AFAICT. > (Forcing an empty function to be called may seem odd, but sending > just some IPI [like the event check one] wouldn't be enough, as > you want to be sure the other side has actually received the > request.) Exactly, that's the reason for the empty function. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 6/7] xen/guest: prepare hypervisor ops to use alternative calls
On 19.02.2020 18:43, Roger Pau Monne wrote: > --- a/xen/arch/x86/guest/hyperv/hyperv.c > +++ b/xen/arch/x86/guest/hyperv/hyperv.c > @@ -199,7 +199,7 @@ static void __init e820_fixup(struct e820map *e820) > panic("Unable to reserve Hyper-V hypercall range\n"); > } > > -static const struct hypervisor_ops ops = { > +static const struct hypervisor_ops __initdata ops = { This needs to be __initconstrel in order to avoid triggering (possibly only in the future) section mismatch warnings with at least some gcc versions. With this and the other instance adjusted Reviewed-by: Jan Beulich Jan
Re: [Xen-devel] [PATCH v5 5/7] x86/tlb: allow disabling the TLB clock
On 19.02.2020 18:43, Roger Pau Monne wrote: > The TLB clock is helpful when running Xen on bare metal because when > doing a TLB flush each CPU is IPI'ed and can keep a timestamp of the > last flush. > > This is not the case however when Xen is running virtualized, and the > underlying hypervisor provides mechanism to assist in performing TLB > flushes: Xen itself for example offers a HVMOP_flush_tlbs hypercall in > order to perform a TLB flush without having to IPI each CPU. When > using such mechanisms it's no longer possible to keep a timestamp of > the flushes on each CPU, as they are performed by the underlying > hypervisor. > > Offer a boolean in order to signal Xen that the timestamped TLB > shouldn't be used. This avoids keeping the timestamps of the flushes, > and also forces NEED_FLUSH to always return true. > > No functional change intended, as this change doesn't introduce any > user that disables the timestamped TLB. > > Signed-off-by: Roger Pau Monné > Reviewed-by: Wei Liu Acked-by: Jan Beulich
Re: [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current
On Fri, Feb 28, 2020 at 04:47:57PM +0100, Jan Beulich wrote: > On 28.02.2020 16:27, Roger Pau Monné wrote: > > On Fri, Feb 28, 2020 at 02:29:09PM +0100, Jan Beulich wrote: > >> On 19.02.2020 18:43, Roger Pau Monne wrote: > >>> Current implementation of hvm_asid_flush_vcpu is not safe to use > >>> unless the target vCPU is either paused or the currently running one, > >>> as it modifies the generation without any locking. > >> > >> Indeed, but the issue you're taking care of is highly theoretical: > >> I don't think any sane compiler will split writes of the fields > >> to multiple insns. It would be nice if this was made clear here. > > > > What about adding: > > > >>> Fix this by using atomic operations when accessing the generation > >>> field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This > >>> allows to safely flush the current ASID generation. Note that for the > >>> flush to take effect if the vCPU is currently running a vmexit is > >>> required. > > > > "Most compilers will already do such writes and reads as a single > > instruction, so the usage of atomic operations is mostly used as a > > safety measure." > > > > Here? > > Could you perhaps start with "Compilers will normally ..." I'm fine > with the rest, it's just that "most compilers" still feels like > an understatement. Sure, that's fine. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 2/7] x86/paging: add TLB flush hooks
On Fri, Feb 28, 2020 at 03:50:31PM +0100, Jan Beulich wrote: > On 19.02.2020 18:43, Roger Pau Monne wrote: > > Add shadow and hap implementation specific helpers to perform guest > > TLB flushes. Note that the code for both is exactly the same at the > > moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by > > further patches that will add implementation specific optimizations to > > them. > > > > No functional change intended. > > > > Signed-off-by: Roger Pau Monné > > Reviewed-by: Wei Liu > > Acked-by: Tim Deegan > > This looks good in principle, with one possible anomaly: > > > --- a/xen/arch/x86/hvm/hvm.c > > +++ b/xen/arch/x86/hvm/hvm.c > > @@ -3990,55 +3990,10 @@ static void hvm_s3_resume(struct domain *d) > > bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v), > > void *ctxt) > > { > > -static DEFINE_PER_CPU(cpumask_t, flush_cpumask); > > -cpumask_t *mask = &this_cpu(flush_cpumask); > > -struct domain *d = current->domain; > > -struct vcpu *v; > > - > > -/* Avoid deadlock if more than one vcpu tries this at the same time. */ > > -if ( !spin_trylock(&d->hypercall_deadlock_mutex) ) > > -return false; > > - > > -/* Pause all other vcpus. */ > > -for_each_vcpu ( d, v ) > > -if ( v != current && flush_vcpu(ctxt, v) ) > > -vcpu_pause_nosync(v); > > - > > -/* Now that all VCPUs are signalled to deschedule, we wait... */ > > -for_each_vcpu ( d, v ) > > -if ( v != current && flush_vcpu(ctxt, v) ) > > -while ( !vcpu_runnable(v) && v->is_running ) > > -cpu_relax(); > > - > > -/* All other vcpus are paused, safe to unlock now. */ > > -spin_unlock(&d->hypercall_deadlock_mutex); > > - > > -cpumask_clear(mask); > > - > > -/* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). 
> > */ > > -for_each_vcpu ( d, v ) > > -{ > > -unsigned int cpu; > > - > > -if ( !flush_vcpu(ctxt, v) ) > > -continue; > > - > > -paging_update_cr3(v, false); > > +struct domain *currd = current->domain; > > > > -cpu = read_atomic(&v->dirty_cpu); > > -if ( is_vcpu_dirty_cpu(cpu) ) > > -__cpumask_set_cpu(cpu, mask); > > -} > > - > > -/* Flush TLBs on all CPUs with dirty vcpu state. */ > > -flush_tlb_mask(mask); > > - > > -/* Done. */ > > -for_each_vcpu ( d, v ) > > -if ( v != current && flush_vcpu(ctxt, v) ) > > -vcpu_unpause(v); > > - > > -return true; > > +return shadow_mode_enabled(currd) ? shadow_flush_tlb(flush_vcpu, ctxt) > > + : hap_flush_tlb(flush_vcpu, ctxt); > > } > > Following our current model I think this should be a new pointer > in struct paging_mode (then truly fitting "hooks" in the title). I tried doing it that way, but there was something weird about it, the paging mode is per-vcpu, and hence I needed to do something like: paging_get_hostmode(current)->flush(current->domain, ...) I can try to move it to being a paging_mode hook if you prefer. > I can see the desire to avoid the indirect call though, but I > also think that if we were to go that route, we should settle on > switching around others as well which are paging mode dependent. > (FAOD this is nothing I ask you to do here.) Andrew, thoughts? I think it's already quite of a mixed bag, see track_dirty_vram for example which uses a similar model. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 4/7] x86/tlb: introduce a flush guests TLB flag
On 19.02.2020 18:43, Roger Pau Monne wrote: > Introduce a specific flag to request a HVM guest TLB flush, which is > an ASID/VPID tickle that forces a linear TLB flush for all HVM guests. Here and below, what do you mean by "linear"? I guess you mean TLBs holding translations from guest linear to guest physical, but I think this could do with then also saying so, even if it's more words. > This was previously unconditionally done in each pre_flush call, but > that's not required: HVM guests not using shadow don't require linear > TLB flushes as Xen doesn't modify the guest page tables in that case > (ie: when using HAP). This explains the correctness in one direction. What about the removal of this from the switch_cr3_cr4() path? And what about our safety assumptions from the ticking of tlbflush_clock, where we then imply that pages e.g. about to be freed can't have any translations left in any TLBs anymore? > --- a/xen/include/asm-x86/flushtlb.h > +++ b/xen/include/asm-x86/flushtlb.h > @@ -105,6 +105,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4); > #define FLUSH_VCPU_STATE 0x1000 > /* Flush the per-cpu root page table */ > #define FLUSH_ROOT_PGTBL 0x2000 > + /* Flush all HVM guests linear TLB (using ASID/VPID) */ > +#define FLUSH_GUESTS_TLB 0x4000 For one, the "all" is pretty misleading. A single such request doesn't do this for all vCPU-s of all HVM guests, does it? I'm also struggling with the 'S' in "GUESTS" - why is it not just "GUEST"? I admit the names of the involved functions (hvm_flush_guest_tlbs(), hvm_asid_flush_core()) are somewhat misleading, as they don't actually do any flushing, they merely arrange for what is in the TLB to no longer be able to be used, so giving this a suitable name is "historically" complicated. What if we did away with the hvm_flush_guest_tlbs() wrapper, naming the constant here then after hvm_asid_flush_core(), e.g. FLUSH_ASID_CORE? I also think this constant might better be zero when !HVM. 
Jan
Re: [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current
On 28.02.2020 16:27, Roger Pau Monné wrote: > On Fri, Feb 28, 2020 at 02:29:09PM +0100, Jan Beulich wrote: >> On 19.02.2020 18:43, Roger Pau Monne wrote: >>> Current implementation of hvm_asid_flush_vcpu is not safe to use >>> unless the target vCPU is either paused or the currently running one, >>> as it modifies the generation without any locking. >> >> Indeed, but the issue you're taking care of is highly theoretical: >> I don't think any sane compiler will split writes of the fields >> to multiple insns. It would be nice if this was made clear here. > > What about adding: > >>> Fix this by using atomic operations when accessing the generation >>> field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This >>> allows to safely flush the current ASID generation. Note that for the >>> flush to take effect if the vCPU is currently running a vmexit is >>> required. > > "Most compilers will already do such writes and reads as a single > instruction, so the usage of atomic operations is mostly used as a > safety measure." > > Here? Could you perhaps start with "Compilers will normally ..." I'm fine with the rest, it's just that "most compilers" still feels like an understatement. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/ioperm: add new paravirt function update_io_bitmap
Friendly ping... On 18.02.20 16:47, Juergen Gross wrote: Commit 111e7b15cf10f6 ("x86/ioperm: Extend IOPL config to control ioperm() as well") reworked the iopl syscall to use I/O bitmaps. Unfortunately this broke Xen PV domains using that syscall as there is currently no I/O bitmap support in PV domains. Add I/O bitmap support via a new paravirt function update_io_bitmap which Xen PV domains can use to update their I/O bitmaps via a hypercall. Fixes: 111e7b15cf10f6 ("x86/ioperm: Extend IOPL config to control ioperm() as well") Reported-by: Jan Beulich Cc: # 5.5 Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich Tested-by: Jan Beulich --- arch/x86/include/asm/io_bitmap.h | 9 - arch/x86/include/asm/paravirt.h | 7 +++ arch/x86/include/asm/paravirt_types.h | 4 arch/x86/kernel/paravirt.c| 5 + arch/x86/kernel/process.c | 2 +- arch/x86/xen/enlighten_pv.c | 25 + 6 files changed, 50 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/io_bitmap.h b/arch/x86/include/asm/io_bitmap.h index 02c6ef8f7667..07344d82e88e 100644 --- a/arch/x86/include/asm/io_bitmap.h +++ b/arch/x86/include/asm/io_bitmap.h @@ -19,7 +19,14 @@ struct task_struct; void io_bitmap_share(struct task_struct *tsk); void io_bitmap_exit(void); -void tss_update_io_bitmap(void); +void native_tss_update_io_bitmap(void); + +#ifdef CONFIG_PARAVIRT_XXL +#include +#else +#define tss_update_io_bitmap native_tss_update_io_bitmap +#endif + #else static inline void io_bitmap_share(struct task_struct *tsk) { } static inline void io_bitmap_exit(void) { } diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 86e7317eb31f..694d8daf4983 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -295,6 +295,13 @@ static inline void write_idt_entry(gate_desc *dt, int entry, const gate_desc *g) PVOP_VCALL3(cpu.write_idt_entry, dt, entry, g); } +#ifdef CONFIG_X86_IOPL_IOPERM +static inline void tss_update_io_bitmap(void) +{ + 
PVOP_VCALL0(cpu.update_io_bitmap); +} +#endif + static inline void paravirt_activate_mm(struct mm_struct *prev, struct mm_struct *next) { diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 84812964d3dd..732f62e04ddb 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -140,6 +140,10 @@ struct pv_cpu_ops { void (*load_sp0)(unsigned long sp0); +#ifdef CONFIG_X86_IOPL_IOPERM + void (*update_io_bitmap)(void); +#endif + void (*wbinvd)(void); /* cpuid emulation, mostly so that caps bits can be disabled */ diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 789f5e4f89de..c131ba4e70ef 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -30,6 +30,7 @@ #include #include #include +#include /* * nop stub, which must not clobber anything *including the stack* to @@ -341,6 +342,10 @@ struct paravirt_patch_template pv_ops = { .cpu.iret = native_iret, .cpu.swapgs = native_swapgs, +#ifdef CONFIG_X86_IOPL_IOPERM + .cpu.update_io_bitmap = native_tss_update_io_bitmap, +#endif + .cpu.start_context_switch = paravirt_nop, .cpu.end_context_switch = paravirt_nop, diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 839b5244e3b7..3053c85e0e42 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -374,7 +374,7 @@ static void tss_copy_io_bitmap(struct tss_struct *tss, struct io_bitmap *iobm) /** * tss_update_io_bitmap - Update I/O bitmap before exiting to usermode */ -void tss_update_io_bitmap(void) +void native_tss_update_io_bitmap(void) { struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw); struct thread_struct *t = &current->thread; diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 1f756e8b..feaf2e68ee5c 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -72,6 +72,9 @@ #include #include #include +#ifdef CONFIG_X86_IOPL_IOPERM +#include +#endif #ifdef CONFIG_ACPI
#include @@ -837,6 +840,25 @@ static void xen_load_sp0(unsigned long sp0) this_cpu_write(cpu_tss_rw.x86_tss.sp0, sp0); } +#ifdef CONFIG_X86_IOPL_IOPERM +static void xen_update_io_bitmap(void) +{ + struct physdev_set_iobitmap iobitmap; + struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw); + + native_tss_update_io_bitmap(); + + iobitmap.bitmap = (uint8_t *)(&tss->x86_tss) + + tss->x86_tss.io_bitmap_base; + if (tss->x86_tss.io_bitmap_base == IO_BITMAP_OFFSET_INVALID) + iobitmap.nr_ports = 0; + else + iobitmap.nr_ports = IO_BITMAP_BITS; + +
Re: [Xen-devel] [PATCH] x86/mm: fix dump_pagetables with Xen PV
Friendly ping... On 21.02.20 11:38, Juergen Gross wrote: Commit 2ae27137b2db89 ("x86: mm: convert dump_pagetables to use walk_page_range") broke Xen PV guests as the hypervisor reserved hole in the memory map was not taken into account. Fix that by starting the kernel range only at GUARD_HOLE_END_ADDR. Fixes: 2ae27137b2db89 ("x86: mm: convert dump_pagetables to use walk_page_range") Reported-by: Julien Grall Signed-off-by: Juergen Gross --- arch/x86/mm/dump_pagetables.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index 64229dad7eab..69309cd56fdf 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -363,13 +363,8 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, { const struct ptdump_range ptdump_ranges[] = { #ifdef CONFIG_X86_64 - -#define normalize_addr_shift (64 - (__VIRTUAL_MASK_SHIFT + 1)) -#define normalize_addr(u) ((signed long)((u) << normalize_addr_shift) >> \ - normalize_addr_shift) - {0, PTRS_PER_PGD * PGD_LEVEL_MULT / 2}, - {normalize_addr(PTRS_PER_PGD * PGD_LEVEL_MULT / 2), ~0UL}, + {GUARD_HOLE_END_ADDR, ~0UL}, #else {0, ~0UL}, #endif ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current
On Fri, Feb 28, 2020 at 02:29:09PM +0100, Jan Beulich wrote: > On 19.02.2020 18:43, Roger Pau Monne wrote: > > Current implementation of hvm_asid_flush_vcpu is not safe to use > > unless the target vCPU is either paused or the currently running one, > > as it modifies the generation without any locking. > > Indeed, but the issue you're taking care of is highly theoretical: > I don't think any sane compiler will split writes of the fields > to multiple insns. It would be nice if this was made clear here. What about adding: > > Fix this by using atomic operations when accessing the generation > > field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This > > allows to safely flush the current ASID generation. Note that for the > > flush to take effect if the vCPU is currently running a vmexit is > > required. "Most compilers will already do such writes and reads as a single instruction, so the usage of atomic operations is mostly used as a safety measure." Here? > > Note the same could be achieved by introducing an extra field to > > hvm_vcpu_asid that signals hvm_asid_handle_vmenter the need to call > > hvm_asid_flush_vcpu on the given vCPU before vmentry, this however > > seems unnecessary as hvm_asid_flush_vcpu itself only sets two vCPU > > fields to 0, so there's no need to delay this to the vmentry ASID > > helper. > > > > This is not a bugfix as no callers that would violate the assumptions > > listed in the first paragraph have been found, but a preparatory > > change in order to allow remote flushing of HVM vCPUs. > > > > Signed-off-by: Roger Pau Monné > > Reviewed-by: Wei Liu > > With a suitable clarification added > Acked-by: Jan Beulich Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/2] AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode
On Fri, Feb 28, 2020 at 01:39:59PM +0100, Jan Beulich wrote: > On 28.02.2020 13:31, Roger Pau Monné wrote: > > On Fri, Feb 28, 2020 at 01:12:03PM +0100, Jan Beulich wrote: > >> --- a/xen/arch/x86/genapic/x2apic.c > >> +++ b/xen/arch/x86/genapic/x2apic.c > >> @@ -236,12 +236,21 @@ const struct genapic *__init apic_x2apic > >> x2apic_phys = !iommu_intremap || > >>(acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL); > >> } > >> -else if ( !x2apic_phys && !iommu_intremap ) > >> -{ > >> -printk("WARNING: x2APIC cluster mode is not supported without > >> interrupt remapping\n" > >> - "x2APIC: forcing phys mode\n"); > >> -x2apic_phys = true; > >> -} > >> +else if ( !x2apic_phys ) > >> +switch ( iommu_intremap ) > >> +{ > >> +case iommu_intremap_off: > >> +case iommu_intremap_restricted: > >> +printk("WARNING: x2APIC cluster mode is not supported %s > >> interrupt remapping\n" > >> + "x2APIC: forcing phys mode\n", > >> + iommu_intremap == iommu_intremap_off ? "without" > >> +: "with > >> restricted"); > >> +x2apic_phys = true; > > > > I think you also need to fixup the usage of iommu_intremap in __cpu_up > > so that CPUs with APIC IDs > 255 are not brought up when in > > iommu_intremap_restricted mode. > > That certainly wants changing, yes, but I view this as an orthogonal > adjustment, which I'd like to make only once I understand what the > behavior for APIC ID 0xff should be in this setup. I would say APIC ID 0xff should be the broadcast ID, or else remapped interrupts won't be able to use a broadcast destination? I'm however not able to find any mention to this in the AMD-Vi spec. So the check in __cpu_up should be adjusted to iommu_intremap != iommu_intremap_full I think, or else you won't be able to address the CPUs brought up from the interrupt remapping tables. Anyway, the change looks fine, so: Reviewed-by: Roger Pau Monné Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/2] AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode
On 28.02.2020 15:48, Andrew Cooper wrote: > On 28/02/2020 12:12, Jan Beulich wrote: >> The wider cluster mode APIC IDs aren't generally representable. Convert >> the iommu_intremap variable into a tristate, allowing the AMD IOMMU >> driver to signal this special restriction to the apic_x2apic_probe(). >> (Note: assignments to the variable get adjusted, while existing >> consumers - all assuming a boolean property - are left alone.) > > I think it would be helpful to state that while we are not aware of any > hardware with this as a restriction, it is a situation which can be > created on fully x2apic-capable systems via firmware settings. Added. >> Signed-off-by: Jan Beulich >> --- >> v2: New. >> >> --- a/xen/arch/x86/genapic/x2apic.c >> +++ b/xen/arch/x86/genapic/x2apic.c >> @@ -236,12 +236,21 @@ const struct genapic *__init apic_x2apic >> x2apic_phys = !iommu_intremap || >>(acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL); >> } >> -else if ( !x2apic_phys && !iommu_intremap ) >> -{ >> -printk("WARNING: x2APIC cluster mode is not supported without >> interrupt remapping\n" >> - "x2APIC: forcing phys mode\n"); >> -x2apic_phys = true; >> -} >> +else if ( !x2apic_phys ) >> +switch ( iommu_intremap ) >> +{ >> +case iommu_intremap_off: >> +case iommu_intremap_restricted: >> +printk("WARNING: x2APIC cluster mode is not supported %s >> interrupt remapping\n" >> + "x2APIC: forcing phys mode\n", > > Any chance to fold this into a single line with "- forcing phys mode\n" > as a suffix? I did consider doing so myself, but didn't do it then for being an unrelated change. Now that you ask for it - done. >> + iommu_intremap == iommu_intremap_off ? 
"without" >> +: "with >> restricted"); >> +x2apic_phys = true; >> +break; >> + >> +case iommu_intremap_full: >> +break; >> +} >> >> if ( x2apic_phys ) >> return &apic_x2apic_phys; >> --- a/xen/include/xen/iommu.h >> +++ b/xen/include/xen/iommu.h >> @@ -54,7 +54,18 @@ static inline bool_t dfn_eq(dfn_t x, dfn >> >> extern bool_t iommu_enable, iommu_enabled; >> extern bool force_iommu, iommu_quarantine, iommu_verbose, iommu_igfx; >> -extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost; >> +extern bool_t iommu_snoop, iommu_qinval, iommu_intpost; >> +extern enum __packed iommu_intremap { >> + /* >> +* In order to allow traditional boolean uses of the iommu_intremap >> +* variable, the "off" value has to come first (yielding a value of >> zero). >> +*/ >> + iommu_intremap_off, >> +#ifdef CONFIG_X86 >> + iommu_intremap_restricted, > > This needs a note about its meaning. How about this? > > /* Interrupt remapping enabled, but only able to generate interrupts > with an 8-bit APIC ID. */ Added. > Otherwise, Reviewed-by: Andrew Cooper Thanks. > Not an issue for now, but "restricted" might become ambiguous with > future extensions. Yes, fair point. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [ovmf test] 147686: regressions - FAIL
flight 147686 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/147686/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 145767 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 145767 version targeted for testing: ovmf edfe16a6d9f8c6830d7ad93ee7616225fe4e9c13 baseline version: ovmf 70911f1f4aee0366b6122f2b90d367ec0f066beb Last test of basis 145767 2020-01-08 00:39:09 Z 51 days Failing since145774 2020-01-08 02:50:20 Z 51 days 133 attempts Testing same since 147686 2020-02-27 12:29:19 Z1 days1 attempts People who touched revisions under test: Aaron Li Albecki, Mateusz Amol N Sukerkar Anthony PERARD Antoine Coeur Ard Biesheuvel Ashish Singhal Bob Feng Bret Barkelew Brian R Haug Chasel Chiu Dandan Bi Eric Dong Fan, ZhijuX Felix Polyudov Guo Dong GuoMinJ Hao A Wu Heinrich Schuchardt Heng Luo Jason Voelz Jeff Brasen Jian J Wang Jiaxin Wu Kinney, Michael D Krzysztof Koch Laszlo Ersek Leif Lindholm Li, Aaron Liming Gao Liu, Zhiguang Mateusz Albecki Matthew Carlson Michael D Kinney Michael Kubacki Nicholas Armour Pavana.K Philippe Mathieu-Daud? Philippe Mathieu-Daude Philippe Mathieu-Daudé Philippe Mathieu-Daudé Pierre Gondois Ray Ni Sami Mujawar Sean Brogan Siyuan Fu Siyuan, Fu Star Zeng Steven Steven Shi Sudipto Paul Vitaly Cheptsov Vitaly Cheptsov via Groups.Io Wei6 Xu Wu Jiaxin Xu, Wei6 Zeng, Star Zhichao Gao Zhiguang Liu Zhiju.Fan jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 fail test-amd64-i386-xl-qemuu-ovmf-amd64 fail sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. 
are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 5772 lines long.) ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 07/17] libxc/restore: STATIC_DATA_END inference for v2 compatibility
On 24/02/2020 17:32, Ian Jackson wrote: > Andrew Cooper writes ("[PATCH v2 07/17] libxc/restore: STATIC_DATA_END > inference for v2 compatibility"): >> A v3 stream can compatibly read a v2 stream by inferring the position of the >> STATIC_DATA_END record. >> >> v2 compatibility is only needed for x86. No other architectures exist yet, >> but they will have a minimum of v3 when introduced. >> >> The x86 HVM compatibility point being in handle_page_data() (which is common >> code) is a bit awkward. However, as the two compatibility points are subtly >> different, and it is (intentionally) not possible to call into arch specific >> code from common code (except via the ops hooks), use some #ifdef-ary and >> opencode the check, rather than make handle_page_data() a per-arch helper. > ... >> +#if defined(__i386__) || defined(__x86_64__) >> +/* v2 compat. Infer the position of STATIC_DATA_END. */ >> +if ( ctx->restore.format_version < 3 && >> !ctx->restore.seen_static_data_end ) >> +{ >> +rc = handle_static_data_end(ctx); >> +if ( rc ) > These 17 lines appears twice, in basically identical form. Could it > be refactored ? Not really, no. The error handling (i.e. half of those 17 lines) is different, making it somewhat awkward to fit into a static inline. More importantly however, by design, common code can't call arch-specific code without a restore_ops hook. Deduping these would require breaking the restriction which is currently doing a decent job of keeping x86-isms out of common code. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 2/7] x86/paging: add TLB flush hooks
On 19.02.2020 18:43, Roger Pau Monne wrote: > Add shadow and hap implementation specific helpers to perform guest > TLB flushes. Note that the code for both is exactly the same at the > moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by > further patches that will add implementation specific optimizations to > them. > > No functional change intended. > > Signed-off-by: Roger Pau Monné > Reviewed-by: Wei Liu > Acked-by: Tim Deegan This looks good in principle, with one possible anomaly: > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -3990,55 +3990,10 @@ static void hvm_s3_resume(struct domain *d) > bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v), > void *ctxt) > { > -static DEFINE_PER_CPU(cpumask_t, flush_cpumask); > -cpumask_t *mask = &this_cpu(flush_cpumask); > -struct domain *d = current->domain; > -struct vcpu *v; > - > -/* Avoid deadlock if more than one vcpu tries this at the same time. */ > -if ( !spin_trylock(&d->hypercall_deadlock_mutex) ) > -return false; > - > -/* Pause all other vcpus. */ > -for_each_vcpu ( d, v ) > -if ( v != current && flush_vcpu(ctxt, v) ) > -vcpu_pause_nosync(v); > - > -/* Now that all VCPUs are signalled to deschedule, we wait... */ > -for_each_vcpu ( d, v ) > -if ( v != current && flush_vcpu(ctxt, v) ) > -while ( !vcpu_runnable(v) && v->is_running ) > -cpu_relax(); > - > -/* All other vcpus are paused, safe to unlock now. */ > -spin_unlock(&d->hypercall_deadlock_mutex); > - > -cpumask_clear(mask); > - > -/* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). */ > -for_each_vcpu ( d, v ) > -{ > -unsigned int cpu; > - > -if ( !flush_vcpu(ctxt, v) ) > -continue; > - > -paging_update_cr3(v, false); > +struct domain *currd = current->domain; > > -cpu = read_atomic(&v->dirty_cpu); > -if ( is_vcpu_dirty_cpu(cpu) ) > -__cpumask_set_cpu(cpu, mask); > -} > - > -/* Flush TLBs on all CPUs with dirty vcpu state. */ > -flush_tlb_mask(mask); > - > -/* Done. 
*/ > -for_each_vcpu ( d, v ) > -if ( v != current && flush_vcpu(ctxt, v) ) > -vcpu_unpause(v); > - > -return true; > +return shadow_mode_enabled(currd) ? shadow_flush_tlb(flush_vcpu, ctxt) > + : hap_flush_tlb(flush_vcpu, ctxt); > } Following our current model I think this should be a new pointer in struct paging_mode (then truly fitting "hooks" in the title). I can see the desire to avoid the indirect call though, but I also think that if we were to go that route, we should settle on switching around others as well which are paging mode dependent. (FAOD this is nothing I ask you to do here.) Andrew, thoughts? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
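The design question Jan raises, conditional dispatch versus a hook in `struct paging_mode`, can be illustrated with a minimal model. This is not Xen code; the flush helpers return distinct markers purely so a caller can observe which implementation ran:

```c
#include <assert.h>

/* Minimal model of the two dispatch styles discussed above. */
struct domain { int shadow_mode; };

static int shadow_flush_tlb(struct domain *d) { (void)d; return 1; }
static int hap_flush_tlb(struct domain *d)    { (void)d; return 2; }

/* Style 1: direct dispatch, as in the quoted hunk (no indirect call). */
static int flush_tlb_direct(struct domain *d)
{
    return d->shadow_mode ? shadow_flush_tlb(d) : hap_flush_tlb(d);
}

/* Style 2: a function pointer in a per-mode ops table, which would fit
 * the existing paging_mode model at the cost of an indirect call. */
struct paging_ops { int (*flush_tlb)(struct domain *d); };

static const struct paging_ops shadow_ops = { shadow_flush_tlb };
static const struct paging_ops hap_ops    = { hap_flush_tlb };

static int flush_tlb_hooked(struct domain *d)
{
    const struct paging_ops *ops = d->shadow_mode ? &shadow_ops : &hap_ops;

    return ops->flush_tlb(d);
}
```

With retpoline-era mitigation costs, avoiding the indirect call is a real consideration, which is the tension the thread leaves open.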
Re: [Xen-devel] [PATCH v2 2/2] AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode
On 28/02/2020 12:12, Jan Beulich wrote: > The wider cluster mode APIC IDs aren't generally representable. Convert > the iommu_intremap variable into a tristate, allowing the AMD IOMMU > driver to signal this special restriction to the apic_x2apic_probe(). > (Note: assignments to the variable get adjusted, while existing > consumers - all assuming a boolean property - are left alone.) I think it would be helpful to state that while we are not aware of any hardware with this as a restriction, it is a situation which can be created on fully x2apic-capable systems via firmware settings. > > Signed-off-by: Jan Beulich > --- > v2: New. > > --- a/xen/arch/x86/genapic/x2apic.c > +++ b/xen/arch/x86/genapic/x2apic.c > @@ -236,12 +236,21 @@ const struct genapic *__init apic_x2apic > x2apic_phys = !iommu_intremap || >(acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL); > } > -else if ( !x2apic_phys && !iommu_intremap ) > -{ > -printk("WARNING: x2APIC cluster mode is not supported without > interrupt remapping\n" > - "x2APIC: forcing phys mode\n"); > -x2apic_phys = true; > -} > +else if ( !x2apic_phys ) > +switch ( iommu_intremap ) > +{ > +case iommu_intremap_off: > +case iommu_intremap_restricted: > +printk("WARNING: x2APIC cluster mode is not supported %s > interrupt remapping\n" > + "x2APIC: forcing phys mode\n", Any chance to fold this into a single line with "- forcing phys mode\n" as a suffix? > + iommu_intremap == iommu_intremap_off ? 
"without" > +: "with restricted"); > +x2apic_phys = true; > +break; > + > +case iommu_intremap_full: > +break; > +} > > if ( x2apic_phys ) > return &apic_x2apic_phys; > --- a/xen/include/xen/iommu.h > +++ b/xen/include/xen/iommu.h > @@ -54,7 +54,18 @@ static inline bool_t dfn_eq(dfn_t x, dfn > > extern bool_t iommu_enable, iommu_enabled; > extern bool force_iommu, iommu_quarantine, iommu_verbose, iommu_igfx; > -extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost; > +extern bool_t iommu_snoop, iommu_qinval, iommu_intpost; > +extern enum __packed iommu_intremap { > + /* > +* In order to allow traditional boolean uses of the iommu_intremap > +* variable, the "off" value has to come first (yielding a value of zero). > +*/ > + iommu_intremap_off, > +#ifdef CONFIG_X86 > + iommu_intremap_restricted, This needs a note about its meaning. How about this? /* Interrupt remapping enabled, but only able to generate interrupts with an 8-bit APIC ID. */ Otherwise, Reviewed-by: Andrew Cooper Not an issue for now, but "restricted" might become ambiguous with future extensions. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
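The key trick in the patch, noted in the quoted comment, is that the tristate stays compatible with existing boolean tests because the "off" enumerator is zero. A standalone sketch (names mirror the patch; the decision helper is invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the tristate from the patch. Because "off" is the first
 * enumerator (value zero), code that still treats iommu_intremap as a
 * boolean keeps working: any enabled state is non-zero. */
enum iommu_intremap {
    iommu_intremap_off,         /* == 0, so !iommu_intremap means "off" */
    iommu_intremap_restricted,  /* enabled, but only 8-bit APIC IDs */
    iommu_intremap_full,
};

/* The decision quoted above: x2APIC cluster mode needs full remapping,
 * since cluster-mode APIC IDs aren't generally representable otherwise. */
static bool resolve_x2apic_phys(bool requested_phys, enum iommu_intremap ir)
{
    if ( !requested_phys && ir != iommu_intremap_full )
        return true;            /* force phys mode */
    return requested_phys;
}
```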
Re: [Xen-devel] [PATCH v2 1/2] AMD/IOMMU: correct handling when XT's prereq features are unavailable
On Fri, Feb 28, 2020 at 02:13:45PM +0100, Jan Beulich wrote: > On 28.02.2020 13:41, Roger Pau Monné wrote: > > On Fri, Feb 28, 2020 at 01:10:59PM +0100, Jan Beulich wrote: > >> We should neither cause IOMMU initialization as a whole to fail in this > >> case (we should still be able to bring up the system in non-x2APIC or > >> x2APIC physical mode), nor should the remainder of the function be > >> skipped (as the main part of it won't get entered a 2nd time) > > > > I'm not sure I see why it won't get entered a second time. AFAICT > > init_done won't be set if amd_iommu_init fails, and hence will be > > called again with xt == false in iov_detect? > > Not very far from the top of the function there is > > /* Have we been here before? */ > if ( ivrs_bdf_entries ) > return 0; > > Hence me saying "main part" rather than "the function as a whole". Oh, yes, sorry. Reviewed-by: Roger Pau Monné Thanks, Roger.
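The control flow behind the "been here before" guard Jan quotes can be modelled in a few lines. This is a toy model, not the real AMD IOMMU code: the point is that the main part runs exactly once, so an XT set-up failure must only be signalled to the caller rather than abort the rest of the one-time initialisation:

```c
#include <assert.h>
#include <stdbool.h>

static int ivrs_bdf_entries;   /* stands in for "have we been here?" */
static bool xt_setup_failed;

static int amd_iommu_init_model(bool xt, bool xt_prereqs_ok)
{
    if ( ivrs_bdf_entries )    /* been here before: main part is skipped */
        return xt && xt_setup_failed ? -1 : 0;

    ivrs_bdf_entries = 1;      /* main part runs exactly once */
    if ( xt && !xt_prereqs_ok )
    {
        xt_setup_failed = true; /* x2APIC mode unusable ... */
        return -1;              /* ... but everything else is now set up */
    }
    return 0;
}
```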
Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush
On 19.02.2020 18:43, Roger Pau Monne wrote: > --- a/xen/arch/x86/mm/hap/hap.c > +++ b/xen/arch/x86/mm/hap/hap.c > @@ -669,32 +669,28 @@ static void hap_update_cr3(struct vcpu *v, int > do_locking, bool noflush) > hvm_update_guest_cr3(v, noflush); > } > > +/* > + * NB: doesn't actually perform any flush, used just to clear the CPU from > the > + * mask and hence signal that the guest TLB flush has been done. > + */ "has been done" isn't correct, as the flush may happen only later on (upon next VM entry). I think wording here needs to be as precise as possible, however the comment may turn out unnecessary altogether: > @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, > struct vcpu *v), > if ( !flush_vcpu(ctxt, v) ) > continue; > > -paging_update_cr3(v, false); > +hvm_asid_flush_vcpu(v); > > cpu = read_atomic(&v->dirty_cpu); > -if ( is_vcpu_dirty_cpu(cpu) ) > +if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) ) > __cpumask_set_cpu(cpu, mask); > } > > -/* Flush TLBs on all CPUs with dirty vcpu state. */ > -flush_tlb_mask(mask); > - > -/* Done. */ > -for_each_vcpu ( d, v ) > -if ( v != current && flush_vcpu(ctxt, v) ) > -vcpu_unpause(v); > +/* > + * Trigger a vmexit on all pCPUs with dirty vCPU state in order to force > an > + * ASID/VPID change and hence accomplish a guest TLB flush. Note that > vCPUs > + * not currently running will already be flushed when scheduled because > of > + * the ASID tickle done in the loop above. > + */ > +on_selected_cpus(mask, handle_flush, mask, 0); > +while ( !cpumask_empty(mask) ) > +cpu_relax(); Why do you pass 0 as last argument? If you passed 1, you wouldn't need the while() here, handle_flush() could be empty, and you'd save a perhaps large amount of CPUs to all try to modify two cache lines (the one used by on_selected_cpus() itself and the one here) instead of just one. 
Yes, latency from the last CPU clearing its bit to you being able to exit from here may be a little larger this way, but overall latency may shrink with the cut by half amount of atomic ops issued to the bus. (Forcing an empty function to be called may seem odd, but sending just some IPI [like the event check one] wouldn't be enough, as you want to be sure the other side has actually received the request.) Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 2/6] x86 / p2m: remove page_list check in p2m_alloc_table
There does not seem to be any justification for refusing to create the domain's p2m table simply because it may have assigned pages. Particularly it prevents the prior allocation of PGC_extra pages. Signed-off-by: Paul Durrant --- Cc: Jan Beulich Cc: Andrew Cooper Cc: George Dunlap Cc: Wei Liu Cc: "Roger Pau Monné" v2: - New in v2 --- xen/arch/x86/mm/p2m.c | 8 1 file changed, 8 deletions(-) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index 3719deae77..9fd4b115be 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -695,14 +695,6 @@ int p2m_alloc_table(struct p2m_domain *p2m) p2m_lock(p2m); -if ( p2m_is_hostp2m(p2m) - && !page_list_empty(&d->page_list) ) -{ -P2M_ERROR("dom %d already has memory allocated\n", d->domain_id); -p2m_unlock(p2m); -return -EINVAL; -} - if ( pagetable_get_pfn(p2m_get_pagetable(p2m)) != 0 ) { P2M_ERROR("p2m already allocated for this domain\n"); -- 2.20.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 6/6] domain: use PGC_extra domheap page for shared_info
Currently shared_info is a shared xenheap page but shared xenheap pages complicate future plans for live-update of Xen so it is desirable to, where possible, not use them [1]. This patch therefore converts shared_info into a PGC_extra domheap page. This does entail freeing shared_info during domain_relinquish_resources() rather than domain_destroy() so care is needed to avoid de-referencing a NULL shared_info pointer hence some extra checks of 'is_dying' are needed. NOTE: For Arm, the call to free_shared_info() in arch_domain_destroy() is left in place since it is idempotent and called in the error path for arch_domain_create(). [1] See https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg02018.html Signed-off-by: Paul Durrant --- Cc: Stefano Stabellini Cc: Julien Grall Cc: Volodymyr Babchuk Cc: Andrew Cooper Cc: George Dunlap Cc: Ian Jackson Cc: Jan Beulich Cc: Konrad Rzeszutek Wilk Cc: Wei Liu v2: - Addressed comments from Julien - Expanded the commit comment to explain why this patch is wanted --- xen/arch/arm/domain.c | 2 ++ xen/arch/x86/domain.c | 3 ++- xen/common/domain.c| 28 xen/common/event_channel.c | 3 +++ xen/common/time.c | 15 +++ 5 files changed, 46 insertions(+), 5 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 2cbcdaac08..3904519256 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -1006,6 +1006,8 @@ int domain_relinquish_resources(struct domain *d) BUG(); } +free_shared_info(d); + return 0; } diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index eb7b0fc51c..3ad532eccf 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -691,7 +691,6 @@ void arch_domain_destroy(struct domain *d) pv_domain_destroy(d); free_perdomain_mappings(d); -free_shared_info(d); cleanup_domain_irq_mapping(d); psr_domain_free(d); @@ -2246,6 +2245,8 @@ int domain_relinquish_resources(struct domain *d) if ( is_hvm_domain(d) ) hvm_domain_relinquish_resources(d); +free_shared_info(d); + return 0; } diff 
--git a/xen/common/domain.c b/xen/common/domain.c index ba7a905258..886206f648 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -1650,24 +1650,44 @@ int continue_hypercall_on_cpu( int alloc_shared_info(struct domain *d, unsigned int memflags) { -if ( (d->shared_info.virt = alloc_xenheap_pages(0, memflags)) == NULL ) +struct page_info *pg; + +pg = alloc_domheap_page(d, MEMF_no_refcount | memflags); +if ( !pg ) return -ENOMEM; -d->shared_info.mfn = virt_to_mfn(d->shared_info.virt); +if ( !get_page_and_type(pg, d, PGT_writable_page) ) +{ +/* + * The domain should not be running at this point so there is + * no way we should reach this error path. + */ +ASSERT_UNREACHABLE(); +return -ENODATA; +} + +d->shared_info.mfn = page_to_mfn(pg); +d->shared_info.virt = __map_domain_page_global(pg); clear_page(d->shared_info.virt); -share_xen_page_with_guest(mfn_to_page(d->shared_info.mfn), d, SHARE_rw); return 0; } void free_shared_info(struct domain *d) { +struct page_info *pg; + if ( !d->shared_info.virt ) return; -free_xenheap_page(d->shared_info.virt); +unmap_domain_page_global(d->shared_info.virt); d->shared_info.virt = NULL; + +pg = mfn_to_page(d->shared_info.mfn); + +put_page_alloc_ref(pg); +put_page_and_type(pg); } /* diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index e86e2bfab0..a17422284d 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -1325,6 +1325,9 @@ void evtchn_destroy(struct domain *d) { unsigned int i; +/* This must be done before shared_info is freed */ +BUG_ON(!d->shared_info.virt); + /* After this barrier no new event-channel allocations can occur. 
*/ BUG_ON(!d->is_dying); spin_barrier(&d->event_lock); diff --git a/xen/common/time.c b/xen/common/time.c index 58fa9abc40..ada02faf07 100644 --- a/xen/common/time.c +++ b/xen/common/time.c @@ -99,6 +99,18 @@ void update_domain_wallclock_time(struct domain *d) uint32_t *wc_version; uint64_t sec; +if ( d != current->domain ) +{ +/* + * We need to check is_dying here as, if it is set, the + * shared_info may have been freed. To do this safely we need + * hold the domain lock. + */ +domain_lock(d); +if ( d->is_dying ) +goto unlock; +} + spin_lock(&wc_lock); wc_version = &shared_info(d, wc_version); @@ -121,6 +133,9 @@ void update_domain_wallclock_time(struct domain *d) *wc_version = version_update_end(*wc_version); spin_unlock(&wc_lock); + unlock: +if ( d != current->domain ) +domain_unlock(d); } /* Set clock to
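The `is_dying` re-check added to `update_domain_wallclock_time()` above can be reduced to a small model. All types here are simplified stand-ins; the point is that once shared_info is freed at relinquish time rather than final destruction, a remote updater must re-check under the domain lock before dereferencing the mapping:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct domain {
    bool is_dying;
    unsigned int *shared_info;   /* NULL once freed */
};

static int update_wallclock(struct domain *d, bool remote)
{
    if ( remote )
    {
        /* domain_lock(d) would be taken here in the real code. */
        if ( d->is_dying )
            return 0;            /* shared_info may already be gone */
    }
    /* A domain cannot free its own shared_info while running this path,
     * so the local case needs no check. */
    *d->shared_info += 1;
    return 1;
}
```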
[Xen-devel] [PATCH v2 0/6] remove one more shared xenheap page: shared_info
Patches #2 and #3 have been split out of the previous version of patch #6 (which was patch #2 of the previous series). Patch #4 is not entirely related but is useful to have in place before patch #5. Patch #5 is also new. Paul Durrant (6): domain: introduce alloc/free_shared_info() helpers... x86 / p2m: remove page_list check in p2m_alloc_table x86 / pv: do not treat PGC_extra pages as RAM when constructing dom0 x86 / ioreq: use a MEMF_no_refcount allocation for server pages... mm: add 'is_special_page' macro... domain: use PGC_extra domheap page for shared_info xen/arch/arm/domain.c | 10 +++ xen/arch/arm/mm.c | 2 +- xen/arch/x86/domain.c | 12 - xen/arch/x86/domctl.c | 2 +- xen/arch/x86/hvm/ioreq.c| 2 +- xen/arch/x86/mm.c | 11 xen/arch/x86/mm/altp2m.c| 2 +- xen/arch/x86/mm/mem_sharing.c | 2 +- xen/arch/x86/mm/p2m.c | 8 -- xen/arch/x86/mm/shadow/common.c | 13 ++ xen/arch/x86/pv/dom0_build.c| 6 - xen/arch/x86/pv/shim.c | 2 +- xen/common/domain.c | 46 + xen/common/domctl.c | 2 +- xen/common/event_channel.c | 3 +++ xen/common/time.c | 19 -- xen/include/asm-x86/shared.h| 15 ++- xen/include/xen/domain.h| 3 +++ xen/include/xen/mm.h| 3 +++ xen/include/xen/sched.h | 5 +++- xen/include/xen/shared.h| 2 +- 21 files changed, 119 insertions(+), 51 deletions(-) --- Cc: Andrew Cooper Cc: George Dunlap Cc: Ian Jackson Cc: Jan Beulich Cc: Julien Grall Cc: Konrad Rzeszutek Wilk Cc: Paul Durrant Cc: "Roger Pau Monné" Cc: Stefano Stabellini Cc: Tamas K Lengyel Cc: Tim Deegan Cc: Volodymyr Babchuk Cc: Wei Liu -- 2.20.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 5/6] mm: add 'is_special_page' macro...
... to cover xenheap and PGC_extra pages. PGC_extra pages are intended to hold data structures that are associated with a domain and may be mapped by that domain. They should not be treated as 'normal' guest pages (i.e. RAM or page tables). Hence, in many cases where code currently tests is_xen_heap_page() it should also check for the PGC_extra bit in 'count_info'. This patch therefore defines is_special_page() to cover both cases and converts tests of is_xen_heap_page() to is_special_page() where appropriate. Signed-off-by: Paul Durrant --- Cc: Jan Beulich Cc: Andrew Cooper Cc: Wei Liu Cc: "Roger Pau Monné" Cc: George Dunlap Cc: Ian Jackson Cc: Julien Grall Cc: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Tamas K Lengyel Cc: Tim Deegan v2: - New in v2 --- xen/arch/x86/domctl.c | 2 +- xen/arch/x86/mm.c | 9 - xen/arch/x86/mm/altp2m.c| 2 +- xen/arch/x86/mm/mem_sharing.c | 2 +- xen/arch/x86/mm/shadow/common.c | 13 - xen/include/xen/mm.h| 3 +++ 6 files changed, 18 insertions(+), 13 deletions(-) diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index ed86762fa6..add70126b9 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -394,7 +394,7 @@ long arch_do_domctl( page = get_page_from_gfn(d, gfn, &t, P2M_ALLOC); if ( unlikely(!page) || - unlikely(is_xen_heap_page(page)) ) + unlikely(is_special_page(page)) ) { if ( unlikely(p2m_is_broken(t)) ) type = XEN_DOMCTL_PFINFO_BROKEN; diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 5067125cbb..051ea5ebd6 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -1012,7 +1012,7 @@ get_page_from_l1e( unsigned long cacheattr = pte_flags_to_cacheattr(l1f); int err; -if ( is_xen_heap_page(page) ) +if ( is_special_page(page) ) { if ( write ) put_page_type(page); @@ -2445,7 +2445,7 @@ static int cleanup_page_mappings(struct page_info *page) { page->count_info &= ~PGC_cacheattr_mask; -BUG_ON(is_xen_heap_page(page)); +BUG_ON(is_special_page(page)); rc = update_xen_mappings(mfn, 0); } @@ -2475,7 +2475,7 @@
static int cleanup_page_mappings(struct page_info *page) rc = rc2; } -if ( likely(!is_xen_heap_page(page)) ) +if ( likely(!is_special_page(page)) ) { ASSERT((page->u.inuse.type_info & (PGT_type_mask | PGT_count_mask)) == PGT_writable_page); @@ -4214,8 +4214,7 @@ int steal_page( if ( !(owner = page_get_owner_and_reference(page)) ) goto fail; -if ( owner != d || is_xen_heap_page(page) || - (page->count_info & PGC_extra) ) +if ( owner != d || is_special_page(page) ) goto fail_put; /* diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c index 50768f2547..c091b03ea3 100644 --- a/xen/arch/x86/mm/altp2m.c +++ b/xen/arch/x86/mm/altp2m.c @@ -77,7 +77,7 @@ int altp2m_vcpu_enable_ve(struct vcpu *v, gfn_t gfn) * pageable() predicate for this, due to it having the same properties * that we want. */ -if ( !p2m_is_pageable(p2mt) || is_xen_heap_page(pg) ) +if ( !p2m_is_pageable(p2mt) || is_special_page(pg) ) { rc = -EINVAL; goto err; diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c index 3835bc928f..c14a724c6d 100644 --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -842,7 +842,7 @@ static int nominate_page(struct domain *d, gfn_t gfn, /* Skip xen heap pages */ page = mfn_to_page(mfn); -if ( !page || is_xen_heap_page(page) ) +if ( !page || is_special_page(page) ) goto out; /* Check if there are mem_access/remapped altp2m entries for this page */ diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c index cba3ab1eba..e835940d86 100644 --- a/xen/arch/x86/mm/shadow/common.c +++ b/xen/arch/x86/mm/shadow/common.c @@ -2087,19 +2087,22 @@ static int sh_remove_all_mappings(struct domain *d, mfn_t gmfn, gfn_t gfn) * The qemu helper process has an untyped mapping of this dom's RAM * and the HVM restore program takes another. * Also allow one typed refcount for - * - Xen heap pages, to match share_xen_page_with_guest(), - * - ioreq server pages, to match prepare_ring_for_helper(). 
+ * - special pages, which are explicitly referenced and mapped by + * Xen. + * - ioreq server pages, which may be special pages or normal + * guest pages with an extra reference taken by + * prepare_ring_for_helper(). */ if ( !(shadow_mode_external(d) && (page->count_info & PGC_count_mask) <= 3 && ((page->u.inuse.type_info & PGT_
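The predicate this patch introduces is simple enough to sketch standalone. The flag values below are invented for illustration; the real bits are defined in Xen's asm/mm.h:

```c
#include <assert.h>

/* A page is "special" if it is a xenheap page or carries PGC_extra. */
#define PGC_xen_heap (1u << 0)
#define PGC_extra    (1u << 1)

struct page_info { unsigned int count_info; };

static inline int is_special_page(const struct page_info *pg)
{
    return pg->count_info & (PGC_xen_heap | PGC_extra);
}
```

Call sites such as steal_page() then collapse two tests, `is_xen_heap_page()` plus an explicit `PGC_extra` check, into one.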
[Xen-devel] [PATCH 4/6] x86 / ioreq: use a MEMF_no_refcount allocation for server pages...
... now that it is safe to assign them. This avoids relying on libxl (or whatever toolstack is in use) setting max_pages up with sufficient 'slop' to allow all necessary ioreq server pages to be allocated. Signed-off-by: Paul Durrant --- Cc: Paul Durrant Cc: Jan Beulich Cc: Andrew Cooper Cc: Wei Liu Cc: "Roger Pau Monné" v2: - New in v2 --- xen/arch/x86/hvm/ioreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index f8a5c81546..648ef9137f 100644 --- a/xen/arch/x86/hvm/ioreq.c +++ b/xen/arch/x86/hvm/ioreq.c @@ -375,7 +375,7 @@ static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) return 0; } -page = alloc_domheap_page(s->target, 0); +page = alloc_domheap_page(s->target, MEMF_no_refcount); if ( !page ) return -ENOMEM; -- 2.20.1
[Xen-devel] [PATCH 3/6] x86 / pv: do not treat PGC_extra pages as RAM when constructing dom0
The walk of page_list in dom0_construct_pv() should ignore PGC_extra pages as they are only created for special purposes and, if mapped, will be mapped explicitly for whatever purpose is relevant. Signed-off-by: Paul Durrant --- Cc: Jan Beulich Cc: Andrew Cooper Cc: Wei Liu Cc: "Roger Pau Monné" v2: - New in v2 --- xen/arch/x86/pv/dom0_build.c | 4 1 file changed, 4 insertions(+) diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c index dc16ef2e79..f8f1bbe2f4 100644 --- a/xen/arch/x86/pv/dom0_build.c +++ b/xen/arch/x86/pv/dom0_build.c @@ -792,6 +792,10 @@ int __init dom0_construct_pv(struct domain *d, { mfn = mfn_x(page_to_mfn(page)); BUG_ON(SHARED_M2P(get_gpfn_from_mfn(mfn))); + +if ( page->count_info & PGC_extra ) +continue; + if ( get_gpfn_from_mfn(mfn) >= count ) { BUG_ON(is_pv_32bit_domain(d)); -- 2.20.1
[Xen-devel] [PATCH 1/6] domain: introduce alloc/free_shared_info() helpers...
... and save the MFN. This patch modifies the 'shared_info' field of struct domain to be a structure comprising an MFN and a virtual address. Allocations are still done from xenheap, so the virtual address still equates to virt_to_mfn() called on the MFN but a subsequent patch will change this. Hence the need to save the MFN. NOTE: Whilst defining the new helpers, virt_to_mfn() in common/domain.c is made type safe. The definition of nmi_reason() in asm-x86/shared.h is also re-flowed to avoid overly long lines. Signed-off-by: Paul Durrant Reviewed-by: Julien Grall --- Cc: Stefano Stabellini Cc: Volodymyr Babchuk Cc: Andrew Cooper Cc: George Dunlap Cc: Ian Jackson Cc: Jan Beulich Cc: Konrad Rzeszutek Wilk Cc: Wei Liu Cc: "Roger Pau Monné" --- xen/arch/arm/domain.c| 8 ++-- xen/arch/arm/mm.c| 2 +- xen/arch/x86/domain.c| 11 --- xen/arch/x86/mm.c| 2 +- xen/arch/x86/pv/dom0_build.c | 2 +- xen/arch/x86/pv/shim.c | 2 +- xen/common/domain.c | 26 ++ xen/common/domctl.c | 2 +- xen/common/time.c| 4 ++-- xen/include/asm-x86/shared.h | 15 --- xen/include/xen/domain.h | 3 +++ xen/include/xen/sched.h | 5 - xen/include/xen/shared.h | 2 +- 13 files changed, 55 insertions(+), 29 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index aa3df3b3ba..2cbcdaac08 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -690,13 +690,9 @@ int arch_domain_create(struct domain *d, if ( (rc = p2m_init(d)) != 0 ) goto fail; -rc = -ENOMEM; -if ( (d->shared_info = alloc_xenheap_pages(0, 0)) == NULL ) +if ( (rc = alloc_shared_info(d, 0)) != 0 ) goto fail; -clear_page(d->shared_info); -share_xen_page_with_guest(virt_to_page(d->shared_info), d, SHARE_rw); - switch ( config->arch.gic_version ) { case XEN_DOMCTL_CONFIG_GIC_V2: @@ -767,7 +763,7 @@ void arch_domain_destroy(struct domain *d) p2m_teardown(d); domain_vgic_free(d); domain_vuart_free(d); -free_xenheap_page(d->shared_info); +free_shared_info(d); #ifdef CONFIG_ACPI free_xenheap_pages(d->arch.efi_acpi_table,
get_order_from_bytes(d->arch.efi_acpi_len)); diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 727107eefa..2bb592101d 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1424,7 +1424,7 @@ int xenmem_add_to_physmap_one( if ( idx != 0 ) return -EINVAL; -mfn = virt_to_mfn(d->shared_info); +mfn = d->shared_info.mfn; t = p2m_ram_rw; break; diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index fe63c23676..eb7b0fc51c 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -612,12 +612,9 @@ int arch_domain_create(struct domain *d, * The shared_info machine address must fit in a 32-bit field within a * 32-bit guest's start_info structure. Hence we specify MEMF_bits(32). */ -if ( (d->shared_info = alloc_xenheap_pages(0, MEMF_bits(32))) == NULL ) +if ( (rc = alloc_shared_info(d, MEMF_bits(32))) != 0 ) goto fail; -clear_page(d->shared_info); -share_xen_page_with_guest(virt_to_page(d->shared_info), d, SHARE_rw); - if ( (rc = init_domain_irq_mapping(d)) != 0 ) goto fail; @@ -665,7 +662,7 @@ int arch_domain_create(struct domain *d, psr_domain_free(d); iommu_domain_destroy(d); cleanup_domain_irq_mapping(d); -free_xenheap_page(d->shared_info); +free_shared_info(d); xfree(d->arch.cpuid); xfree(d->arch.msr); if ( paging_initialised ) @@ -694,7 +691,7 @@ void arch_domain_destroy(struct domain *d) pv_domain_destroy(d); free_perdomain_mappings(d); -free_xenheap_page(d->shared_info); +free_shared_info(d); cleanup_domain_irq_mapping(d); psr_domain_free(d); @@ -720,7 +717,7 @@ void arch_domain_unpause(struct domain *d) int arch_domain_soft_reset(struct domain *d) { -struct page_info *page = virt_to_page(d->shared_info), *new_page; +struct page_info *page = mfn_to_page(d->shared_info.mfn), *new_page; int ret = 0; struct domain *owner; mfn_t mfn; diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 70b87c4830..5067125cbb 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -4538,7 +4538,7 @@ int xenmem_add_to_physmap_one( { case 
XENMAPSPACE_shared_info: if ( idx == 0 ) -mfn = virt_to_mfn(d->shared_info); +mfn = d->shared_info.mfn; break; case XENMAPSPACE_grant_table: rc = gnttab_map_frame(d, idx, gpfn, &mfn); diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c index 5678da782d..dc16ef2e79 100644 --- a/xen/arch/x86/pv/dom0_build.c +++ b/xen/arch/x86/pv/dom0_build.c @@ -743,7 +743,7 @@ int __init dom0_co
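The core of this patch, replacing a bare pointer with an (mfn, virt) pair so callers stop recomputing `virt_to_mfn()`, can be modelled in userspace. Everything below is simplified scaffolding with an invented virt-to-MFN mapping:

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned long mfn_t;

/* Mirrors the new struct-of-two-fields shape of d->shared_info. */
struct shared_info_ref {
    mfn_t mfn;
    void *virt;
};

/* Fake virt<->mfn relationship, for this model only. */
static mfn_t virt_to_mfn_model(void *v) { return (mfn_t)v >> 12; }

static int alloc_shared_info_model(struct shared_info_ref *si)
{
    void *page = calloc(1, 4096);

    if ( !page )
        return -1;
    si->virt = page;
    si->mfn = virt_to_mfn_model(page);   /* saved once, at allocation */
    return 0;
}

/* Idempotent, like the real helper: safe to call on an already-freed ref. */
static void free_shared_info_model(struct shared_info_ref *si)
{
    if ( !si->virt )
        return;
    free(si->virt);
    si->virt = NULL;
}
```

Saving the MFN at allocation time is what later lets the backing page become a domheap page mapped via `map_domain_page_global()`, where virt and MFN are no longer related by `virt_to_mfn()`.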
Re: [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current
On 19.02.2020 18:43, Roger Pau Monne wrote: > Current implementation of hvm_asid_flush_vcpu is not safe to use > unless the target vCPU is either paused or the currently running one, > as it modifies the generation without any locking. Indeed, but the issue you're taking care of is highly theoretical: I don't think any sane compiler will split writes of the fields to multiple insns. It would be nice if this was made clear here. > Fix this by using atomic operations when accessing the generation > field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This > allows to safely flush the current ASID generation. Note that for the > flush to take effect if the vCPU is currently running a vmexit is > required. > > Note the same could be achieved by introducing an extra field to > hvm_vcpu_asid that signals hvm_asid_handle_vmenter the need to call > hvm_asid_flush_vcpu on the given vCPU before vmentry, this however > seems unnecessary as hvm_asid_flush_vcpu itself only sets two vCPU > fields to 0, so there's no need to delay this to the vmentry ASID > helper. > > This is not a bugfix as no callers that would violate the assumptions > listed in the first paragraph have been found, but a preparatory > change in order to allow remote flushing of HVM vCPUs. > > Signed-off-by: Roger Pau Monné > Reviewed-by: Wei Liu With a suitable clarification added Acked-by: Jan Beulich Jan
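The mechanism under review can be modelled with C11 atomics. This is a userspace sketch, not Xen code: a remote CPU zeroes the per-vCPU ASID generation atomically, and the flush takes effect at the next vmentry when the stale generation is noticed:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct vcpu_asid {
    atomic_uint_fast64_t generation;
    uint32_t asid;
};

static atomic_uint_fast64_t core_generation = 1;

/* Now safe for v != current: a plain (non-atomic) write here was the
 * theoretical hazard being removed. */
static void asid_flush_vcpu(struct vcpu_asid *a)
{
    atomic_store(&a->generation, 0);
}

/* vmentry path: returns 1 if a TLB flush (fresh ASID) is needed. */
static int asid_handle_vmenter(struct vcpu_asid *a)
{
    uint_fast64_t cur = atomic_load(&core_generation);

    if ( atomic_load(&a->generation) == cur )
        return 0;
    a->asid = 0;                 /* a fresh ASID would be assigned here */
    atomic_store(&a->generation, cur);
    return 1;
}
```

This also shows why, as the commit message argues, no extra "flush pending" flag is needed: zeroing the generation alone is enough to make the next vmentry re-assign the ASID.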
Re: [Xen-devel] [PATCH v2 1/2] AMD/IOMMU: correct handling when XT's prereq features are unavailable
On 28.02.2020 13:41, Roger Pau Monné wrote: > On Fri, Feb 28, 2020 at 01:10:59PM +0100, Jan Beulich wrote: >> We should neither cause IOMMU initialization as a whole to fail in this >> case (we should still be able to bring up the system in non-x2APIC or >> x2APIC physical mode), nor should the remainder of the function be >> skipped (as the main part of it won't get entered a 2nd time) > > I'm not sure I see why it won't get entered a second time. AFAICT > init_done won't be set if amd_iommu_init fails, and hence will be > called again with xt == false in iov_detect? Not very far from the top of the function there is /* Have we been here before? */ if ( ivrs_bdf_entries ) return 0; Hence me saying "main part" rather than "the function as a whole". Jan
Re: [Xen-devel] [PATCH v2 1/2] AMD/IOMMU: correct handling when XT's prereq features are unavailable
On 28.02.2020 13:41, Roger Pau Monné wrote: > On Fri, Feb 28, 2020 at 01:10:59PM +0100, Jan Beulich wrote: >> We should neither cause IOMMU initialization as a whole to fail in this >> case (we should still be able to bring up the system in non-x2APIC or >> x2APIC physical mode), nor should the remainder of the function be >> skipped (as the main part of it won't get entered a 2nd time) > > I'm not sure I see why it won't get entered a second time. AFAICT > init_done won't be set if amd_iommu_init fails, and hence will be > called again with xt == false in iov_detect? > >> in such an >> event. It is merely necessary for the function to indicate to the caller >> (iov_supports_xt()) that setup failed as far as x2APIC is concerned. >> >> Signed-off-by: Jan Beulich > > LGTM, but it needs to go in with 2/2 AFAICT, or else you would report > interrupt remapping enabled and x2APIC also enabled but won't handle > correctly a 32 bit destination field. Well, this is an improvement on its own imo, even if it doesn't fully fix everything. The primary point here is to make IOMMU initialization not fail altogether, when the system could be run in xAPIC mode. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 2/2] x86: add accessors for scratch cpu mask
On 28.02.2020 13:07, Roger Pau Monne wrote: > Current usage of the per-CPU scratch cpumask is dangerous since > there's no way to figure out if the mask is already being used except > for manual code inspection of all the callers and possible call paths. > > This is unsafe and not reliable, so introduce a minimal get/put > infrastructure to prevent nested usage of the scratch mask and usage > in interrupt context. > > Move the declaration of scratch_cpumask to smp.c in order to place the > declaration and the accessors as close as possible. > > Signed-off-by: Roger Pau Monné Reviewed-by: Jan Beulich
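The get/put discipline described in the commit message can be sketched in a toy form. Names and types are simplified stand-ins for the real accessors in Xen's smp.c, which additionally refuse use in interrupt context:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long cpumask_t;

static cpumask_t scratch_cpumask;
static int scratch_in_use;

static cpumask_t *get_scratch_cpumask(void)
{
    /* Nested use would silently corrupt the outer caller's data. */
    assert(!scratch_in_use);
    scratch_in_use = 1;
    return &scratch_cpumask;
}

static void put_scratch_cpumask(cpumask_t *mask)
{
    assert(mask == &scratch_cpumask && scratch_in_use);
    scratch_in_use = 0;
}
```

The value of the accessor pair is that misuse now trips an assertion at the offending call site, instead of requiring manual inspection of every possible call path.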
Re: [Xen-devel] [PATCH v2 1/2] AMD/IOMMU: correct handling when XT's prereq features are unavailable
On Fri, Feb 28, 2020 at 01:10:59PM +0100, Jan Beulich wrote: > We should neither cause IOMMU initialization as a whole to fail in this > case (we should still be able to bring up the system in non-x2APIC or > x2APIC physical mode), nor should the remainder of the function be > skipped (as the main part of it won't get entered a 2nd time) I'm not sure I see why it won't get entered a second time. AFAICT init_done won't be set if amd_iommu_init fails, and hence will be called again with xt == false in iov_detect? > in such an > event. It is merely necessary for the function to indicate to the caller > (iov_supports_xt()) that setup failed as far as x2APIC is concerned. > > Signed-off-by: Jan Beulich LGTM, but it needs to go in with 2/2 AFAICT, or else you would report interrupt remapping enabled and x2APIC also enabled but won't handle correctly a 32 bit destination field. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/2] AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode
On 28.02.2020 13:31, Roger Pau Monné wrote: > On Fri, Feb 28, 2020 at 01:12:03PM +0100, Jan Beulich wrote: >> --- a/xen/arch/x86/genapic/x2apic.c >> +++ b/xen/arch/x86/genapic/x2apic.c >> @@ -236,12 +236,21 @@ const struct genapic *__init apic_x2apic >> x2apic_phys = !iommu_intremap || >>(acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL); >> } >> -else if ( !x2apic_phys && !iommu_intremap ) >> -{ >> -printk("WARNING: x2APIC cluster mode is not supported without >> interrupt remapping\n" >> - "x2APIC: forcing phys mode\n"); >> -x2apic_phys = true; >> -} >> +else if ( !x2apic_phys ) >> +switch ( iommu_intremap ) >> +{ >> +case iommu_intremap_off: >> +case iommu_intremap_restricted: >> +printk("WARNING: x2APIC cluster mode is not supported %s >> interrupt remapping\n" >> + "x2APIC: forcing phys mode\n", >> + iommu_intremap == iommu_intremap_off ? "without" >> +: "with >> restricted"); >> +x2apic_phys = true; > > I think you also need to fixup the usage of iommu_intremap in __cpu_up > so that CPUs with APIC IDs > 255 are not brought up when in > iommu_intremap_restricted mode. That certainly wants changing, yes, but I view this as an orthogonal adjustment, which I'd like to make only once I understand what the behavior for APIC ID 0xff should be in this setup. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] AMD/IOMMU: correct handling when XT's prereq features are unavailable
On 28/02/2020 12:10, Jan Beulich wrote: > We should neither cause IOMMU initialization as a whole to fail in this > case (we should still be able to bring up the system in non-x2APIC or > x2APIC physical mode), nor should the remainder of the function be > skipped (as the main part of it won't get entered a 2nd time) in such an > event. It is merely necessary for the function to indicate to the caller > (iov_supports_xt()) that setup failed as far as x2APIC is concerned. > > Signed-off-by: Jan Beulich Looks much better. Reviewed-by: Andrew Cooper ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/2] AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode
On Fri, Feb 28, 2020 at 01:12:03PM +0100, Jan Beulich wrote: > The wider cluster mode APIC IDs aren't generally representable. Convert > the iommu_intremap variable into a tristate, allowing the AMD IOMMU > driver to signal this special restriction to the apic_x2apic_probe(). > (Note: assignments to the variable get adjusted, while existing > consumers - all assuming a boolean property - are left alone.) > > Signed-off-by: Jan Beulich > --- > v2: New. > > --- a/xen/arch/x86/genapic/x2apic.c > +++ b/xen/arch/x86/genapic/x2apic.c > @@ -236,12 +236,21 @@ const struct genapic *__init apic_x2apic > x2apic_phys = !iommu_intremap || >(acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL); > } > -else if ( !x2apic_phys && !iommu_intremap ) > -{ > -printk("WARNING: x2APIC cluster mode is not supported without > interrupt remapping\n" > - "x2APIC: forcing phys mode\n"); > -x2apic_phys = true; > -} > +else if ( !x2apic_phys ) > +switch ( iommu_intremap ) > +{ > +case iommu_intremap_off: > +case iommu_intremap_restricted: > +printk("WARNING: x2APIC cluster mode is not supported %s > interrupt remapping\n" > + "x2APIC: forcing phys mode\n", > + iommu_intremap == iommu_intremap_off ? "without" > +: "with restricted"); > +x2apic_phys = true; I think you also need to fixup the usage of iommu_intremap in __cpu_up so that CPUs with APIC IDs > 255 are not brought up when in iommu_intremap_restricted mode. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 4/5] IOMMU: iommu_qinval is x86-only
In fact it's VT-d specific, but we don't have a way yet to build code for just one vendor. Signed-off-by: Jan Beulich --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -33,7 +33,6 @@ bool_t __read_mostly force_iommu; bool_t __read_mostly iommu_verbose; bool __read_mostly iommu_quarantine = true; bool_t __read_mostly iommu_snoop = 1; -bool_t __read_mostly iommu_qinval = 1; bool_t __read_mostly iommu_crash_disable; static bool __hwdom_initdata iommu_hwdom_none; @@ -75,13 +74,13 @@ static int __init parse_iommu_param(cons #ifdef CONFIG_X86 else if ( (val = parse_boolean("igfx", s, ss)) >= 0 ) iommu_igfx = val; +else if ( (val = parse_boolean("qinval", s, ss)) >= 0 ) +iommu_qinval = val; #endif else if ( (val = parse_boolean("verbose", s, ss)) >= 0 ) iommu_verbose = val; else if ( (val = parse_boolean("snoop", s, ss)) >= 0 ) iommu_snoop = val; -else if ( (val = parse_boolean("qinval", s, ss)) >= 0 ) -iommu_qinval = val; #ifndef iommu_intremap else if ( (val = parse_boolean("intremap", s, ss)) >= 0 ) iommu_intremap = val ? 
iommu_intremap_full : iommu_intremap_off; --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -51,6 +51,7 @@ struct mapped_rmrr { bool __read_mostly untrusted_msi; bool __read_mostly iommu_igfx = true; +bool __read_mostly iommu_qinval = true; int nr_iommus; --- a/xen/include/xen/iommu.h +++ b/xen/include/xen/iommu.h @@ -54,7 +54,7 @@ static inline bool_t dfn_eq(dfn_t x, dfn extern bool_t iommu_enable, iommu_enabled; extern bool force_iommu, iommu_quarantine, iommu_verbose; -extern bool_t iommu_snoop, iommu_qinval; +extern bool_t iommu_snoop; #ifdef CONFIG_X86 extern enum __packed iommu_intremap { @@ -66,7 +66,7 @@ extern enum __packed iommu_intremap { iommu_intremap_restricted, iommu_intremap_full, } iommu_intremap; -extern bool iommu_igfx; +extern bool iommu_igfx, iommu_qinval; #else # define iommu_intremap false #endif ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 5/5] IOMMU: iommu_snoop is x86/HVM-only
In fact it's VT-d specific, but we don't have a way yet to build code for just one vendor. Provide a #define for all other cases. Signed-off-by: Jan Beulich --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -32,7 +32,6 @@ bool_t __read_mostly iommu_enabled; bool_t __read_mostly force_iommu; bool_t __read_mostly iommu_verbose; bool __read_mostly iommu_quarantine = true; -bool_t __read_mostly iommu_snoop = 1; bool_t __read_mostly iommu_crash_disable; static bool __hwdom_initdata iommu_hwdom_none; @@ -79,8 +78,10 @@ static int __init parse_iommu_param(cons #endif else if ( (val = parse_boolean("verbose", s, ss)) >= 0 ) iommu_verbose = val; +#ifndef iommu_snoop else if ( (val = parse_boolean("snoop", s, ss)) >= 0 ) iommu_snoop = val; +#endif #ifndef iommu_intremap else if ( (val = parse_boolean("intremap", s, ss)) >= 0 ) iommu_intremap = val ? iommu_intremap_full : iommu_intremap_off; @@ -488,7 +489,9 @@ int __init iommu_setup(void) printk("I/O virtualisation %sabled\n", iommu_enabled ? "en" : "dis"); if ( !iommu_enabled ) { -iommu_snoop = 0; +#ifndef iommu_snoop +iommu_snoop = false; +#endif iommu_hwdom_passthrough = false; iommu_hwdom_strict = false; } --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -52,6 +52,9 @@ bool __read_mostly untrusted_msi; bool __read_mostly iommu_igfx = true; bool __read_mostly iommu_qinval = true; +#ifndef iommu_snoop +bool __read_mostly iommu_snoop = true; +#endif int nr_iommus; @@ -2288,8 +2291,10 @@ static int __init vtd_setup(void) cap_sps_2mb(iommu->cap) ? ", 2MB" : "", cap_sps_1gb(iommu->cap) ? ", 1GB" : ""); +#ifndef iommu_snoop if ( iommu_snoop && !ecap_snp_ctl(iommu->ecap) ) -iommu_snoop = 0; +iommu_snoop = false; +#endif if ( iommu_hwdom_passthrough && !ecap_pass_thru(iommu->ecap) ) iommu_hwdom_passthrough = false; @@ -2331,7 +2336,9 @@ static int __init vtd_setup(void) } #define P(p,s) printk("Intel VT-d %s %senabled.\n", s, (p)? 
"" : "not ") +#ifndef iommu_snoop P(iommu_snoop, "Snoop Control"); +#endif P(iommu_hwdom_passthrough, "Dom0 DMA Passthrough"); P(iommu_qinval, "Queued Invalidation"); P(iommu_intremap, "Interrupt Remapping"); @@ -2351,7 +2358,9 @@ static int __init vtd_setup(void) error: iommu_enabled = 0; -iommu_snoop = 0; +#ifndef iommu_snoop +iommu_snoop = false; +#endif iommu_hwdom_passthrough = false; iommu_qinval = 0; iommu_intremap = iommu_intremap_off; --- a/xen/include/xen/iommu.h +++ b/xen/include/xen/iommu.h @@ -54,7 +54,6 @@ static inline bool_t dfn_eq(dfn_t x, dfn extern bool_t iommu_enable, iommu_enabled; extern bool force_iommu, iommu_quarantine, iommu_verbose; -extern bool_t iommu_snoop; #ifdef CONFIG_X86 extern enum __packed iommu_intremap { @@ -72,9 +71,10 @@ extern bool iommu_igfx, iommu_qinval; #endif #if defined(CONFIG_X86) && defined(CONFIG_HVM) -extern bool iommu_intpost; +extern bool iommu_intpost, iommu_snoop; #else # define iommu_intpost false +# define iommu_snoop false #endif #if defined(CONFIG_IOMMU_FORCE_PT_SHARE) ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 2/5] IOMMU: iommu_intpost is x86/HVM-only
Provide a #define for all other cases. Signed-off-by: Jan Beulich --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -1309,6 +1309,9 @@ boolean (e.g. `iommu=no`) can override t This option depends on `intremap`, and is disabled by default due to some corner cases in the implementation which have yet to be resolved. +This option is not valid on Arm, or on x86 builds of Xen without HVM +support. + * The `crash-disable` boolean controls disabling IOMMU functionality (DMAR/IR/QI) before switching to a crash kernel. This option is inactive by default and is for compatibility with older kdump kernels only. Modern kernels copy --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -43,14 +43,6 @@ bool __read_mostly iommu_hwdom_passthrou bool __hwdom_initdata iommu_hwdom_inclusive; int8_t __hwdom_initdata iommu_hwdom_reserved = -1; -/* - * In the current implementation of VT-d posted interrupts, in some extreme - * cases, the per cpu list which saves the blocked vCPU will be very long, - * and this will affect the interrupt latency, so let this feature off by - * default until we find a good solution to resolve it. - */ -bool_t __read_mostly iommu_intpost; - #ifndef iommu_hap_pt_share bool __read_mostly iommu_hap_pt_share = true; #endif @@ -93,8 +85,10 @@ static int __init parse_iommu_param(cons else if ( (val = parse_boolean("intremap", s, ss)) >= 0 ) iommu_intremap = val ? iommu_intremap_full : iommu_intremap_off; #endif +#ifndef iommu_intpost else if ( (val = parse_boolean("intpost", s, ss)) >= 0 ) iommu_intpost = val; +#endif #ifdef CONFIG_KEXEC else if ( (val = parse_boolean("crash-disable", s, ss)) >= 0 ) iommu_crash_disable = val; @@ -486,8 +480,10 @@ int __init iommu_setup(void) panic("Couldn't enable %s and iommu=required/force\n", !iommu_enabled ? "IOMMU" : "Interrupt Remapping"); +#ifndef iommu_intpost if ( !iommu_intremap ) iommu_intpost = 0; +#endif printk("I/O virtualisation %sabled\n", iommu_enabled ? 
"en" : "dis"); if ( !iommu_enabled ) @@ -563,10 +559,13 @@ void iommu_crash_shutdown(void) if ( iommu_enabled ) iommu_get_ops()->crash_shutdown(); -iommu_enabled = iommu_intpost = 0; +iommu_enabled = false; #ifndef iommu_intremap iommu_intremap = iommu_intremap_off; #endif +#ifndef iommu_intpost +iommu_intpost = false; +#endif } int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt) --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -2297,13 +2297,15 @@ static int __init vtd_setup(void) if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) ) iommu_intremap = iommu_intremap_off; +#ifndef iommu_intpost /* * We cannot use posted interrupt if X86_FEATURE_CX16 is * not supported, since we count on this feature to * atomically update 16-byte IRTE in posted format. */ if ( !cap_intr_post(iommu->cap) || !iommu_intremap || !cpu_has_cx16 ) -iommu_intpost = 0; +iommu_intpost = false; +#endif if ( !vtd_ept_page_compatible(iommu) ) clear_iommu_hap_pt_share(); @@ -2330,7 +2332,9 @@ static int __init vtd_setup(void) P(iommu_hwdom_passthrough, "Dom0 DMA Passthrough"); P(iommu_qinval, "Queued Invalidation"); P(iommu_intremap, "Interrupt Remapping"); +#ifndef iommu_intpost P(iommu_intpost, "Posted Interrupt"); +#endif P(iommu_hap_pt_share, "Shared EPT tables"); #undef P @@ -2348,7 +2352,9 @@ static int __init vtd_setup(void) iommu_hwdom_passthrough = false; iommu_qinval = 0; iommu_intremap = iommu_intremap_off; -iommu_intpost = 0; +#ifndef iommu_intpost +iommu_intpost = false; +#endif return ret; } --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -29,6 +29,16 @@ struct iommu_ops __read_mostly iommu_ops enum iommu_intremap __read_mostly iommu_intremap = iommu_intremap_full; +#ifndef iommu_intpost +/* + * In the current implementation of VT-d posted interrupts, in some extreme + * cases, the per cpu list which saves the blocked vCPU will be very long, + * and this will affect the interrupt latency, 
so let this feature off by + * default until we find a good solution to resolve it. + */ +bool __read_mostly iommu_intpost; +#endif + int __init iommu_hardware_setup(void) { struct IO_APIC_route_entry **ioapic_entries = NULL; --- a/xen/include/xen/iommu.h +++ b/xen/include/xen/iommu.h @@ -54,7 +54,7 @@ static inline bool_t dfn_eq(dfn_t x, dfn extern bool_t iommu_enable, iommu_enabled; extern bool force_iommu, iommu_quarantine, iommu_verbose, iommu_igfx; -extern bool_t iommu_snoop, iommu_qinval, iommu_intpost; +extern bool_t iommu_snoop, iommu_qinval; #ifdef CONFIG_X86 extern enum __p
[Xen-devel] [PATCH 3/5] IOMMU: iommu_igfx is x86-only
In fact it's VT-d specific, but we don't have a way yet to build code for just one vendor. Signed-off-by: Jan Beulich --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -32,7 +32,6 @@ bool_t __read_mostly iommu_enabled; bool_t __read_mostly force_iommu; bool_t __read_mostly iommu_verbose; bool __read_mostly iommu_quarantine = true; -bool_t __read_mostly iommu_igfx = 1; bool_t __read_mostly iommu_snoop = 1; bool_t __read_mostly iommu_qinval = 1; bool_t __read_mostly iommu_crash_disable; @@ -73,8 +72,10 @@ static int __init parse_iommu_param(cons force_iommu = val; else if ( (val = parse_boolean("quarantine", s, ss)) >= 0 ) iommu_quarantine = val; +#ifdef CONFIG_X86 else if ( (val = parse_boolean("igfx", s, ss)) >= 0 ) iommu_igfx = val; +#endif else if ( (val = parse_boolean("verbose", s, ss)) >= 0 ) iommu_verbose = val; else if ( (val = parse_boolean("snoop", s, ss)) >= 0 ) --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -50,6 +50,8 @@ struct mapped_rmrr { /* Possible unfiltered LAPIC/MSI messages from untrusted sources? */ bool __read_mostly untrusted_msi; +bool __read_mostly iommu_igfx = true; + int nr_iommus; static struct tasklet vtd_fault_tasklet; --- a/xen/include/xen/iommu.h +++ b/xen/include/xen/iommu.h @@ -53,7 +53,7 @@ static inline bool_t dfn_eq(dfn_t x, dfn } extern bool_t iommu_enable, iommu_enabled; -extern bool force_iommu, iommu_quarantine, iommu_verbose, iommu_igfx; +extern bool force_iommu, iommu_quarantine, iommu_verbose; extern bool_t iommu_snoop, iommu_qinval; #ifdef CONFIG_X86 @@ -66,6 +66,7 @@ extern enum __packed iommu_intremap { iommu_intremap_restricted, iommu_intremap_full, } iommu_intremap; +extern bool iommu_igfx; #else # define iommu_intremap false #endif ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 1/5] IOMMU: iommu_intremap is x86-only
Provide a #define for other cases; it didn't seem worthwhile to me to introduce an IOMMU_INTREMAP Kconfig option at this point. Signed-off-by: Jan Beulich --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -1299,6 +1299,8 @@ boolean (e.g. `iommu=no`) can override t generation of IOMMUs only supported DMA remapping, and Interrupt Remapping appeared in the second generation. +This option is not valid on Arm. + * The `intpost` boolean controls the Posted Interrupt sub-feature. In combination with APIC acceleration (VT-x APICV, SVM AVIC), the IOMMU can be configured to deliver interrupts from assigned PCI devices directly --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -35,7 +35,6 @@ bool __read_mostly iommu_quarantine = tr bool_t __read_mostly iommu_igfx = 1; bool_t __read_mostly iommu_snoop = 1; bool_t __read_mostly iommu_qinval = 1; -enum iommu_intremap __read_mostly iommu_intremap = iommu_intremap_full; bool_t __read_mostly iommu_crash_disable; static bool __hwdom_initdata iommu_hwdom_none; @@ -90,8 +89,10 @@ static int __init parse_iommu_param(cons iommu_snoop = val; else if ( (val = parse_boolean("qinval", s, ss)) >= 0 ) iommu_qinval = val; +#ifndef iommu_intremap else if ( (val = parse_boolean("intremap", s, ss)) >= 0 ) iommu_intremap = val ? iommu_intremap_full : iommu_intremap_off; +#endif else if ( (val = parse_boolean("intpost", s, ss)) >= 0 ) iommu_intpost = val; #ifdef CONFIG_KEXEC @@ -474,8 +475,11 @@ int __init iommu_setup(void) rc = iommu_hardware_setup(); iommu_enabled = (rc == 0); } + +#ifndef iommu_intremap if ( !iommu_enabled ) iommu_intremap = iommu_intremap_off; +#endif if ( (force_iommu && !iommu_enabled) || (force_intremap && !iommu_intremap) ) @@ -500,7 +504,9 @@ int __init iommu_setup(void) printk(" - Dom0 mode: %s\n", iommu_hwdom_passthrough ? "Passthrough" : iommu_hwdom_strict ? 
"Strict" : "Relaxed"); +#ifndef iommu_intremap printk("Interrupt remapping %sabled\n", iommu_intremap ? "en" : "dis"); +#endif tasklet_init(&iommu_pt_cleanup_tasklet, iommu_free_pagetables, NULL); } @@ -558,7 +564,9 @@ void iommu_crash_shutdown(void) if ( iommu_enabled ) iommu_get_ops()->crash_shutdown(); iommu_enabled = iommu_intpost = 0; +#ifndef iommu_intremap iommu_intremap = iommu_intremap_off; +#endif } int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt) --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -27,6 +27,8 @@ const struct iommu_init_ops *__initdata iommu_init_ops; struct iommu_ops __read_mostly iommu_ops; +enum iommu_intremap __read_mostly iommu_intremap = iommu_intremap_full; + int __init iommu_hardware_setup(void) { struct IO_APIC_route_entry **ioapic_entries = NULL; --- a/xen/include/xen/iommu.h +++ b/xen/include/xen/iommu.h @@ -55,17 +55,20 @@ static inline bool_t dfn_eq(dfn_t x, dfn extern bool_t iommu_enable, iommu_enabled; extern bool force_iommu, iommu_quarantine, iommu_verbose, iommu_igfx; extern bool_t iommu_snoop, iommu_qinval, iommu_intpost; + +#ifdef CONFIG_X86 extern enum __packed iommu_intremap { /* * In order to allow traditional boolean uses of the iommu_intremap * variable, the "off" value has to come first (yielding a value of zero). */ iommu_intremap_off, -#ifdef CONFIG_X86 iommu_intremap_restricted, -#endif iommu_intremap_full, } iommu_intremap; +#else +# define iommu_intremap false +#endif #if defined(CONFIG_IOMMU_FORCE_PT_SHARE) #define iommu_hap_pt_share true ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 0/5] IOMMU: restrict visibility/scope of certain variables
[Xen-devel] [PATCH 0/5] IOMMU: restrict visibility/scope of certain variables
A number of the command line controlled variables are x86- or even x86-HVM-specific. Don't have those variables elsewhere in the first place (in some cases replace them by a #define), and as a result also don't silently accept such "iommu=" sub-options which in fact have no effect. 1: iommu_intremap is x86-only 2: iommu_intpost is x86/HVM-only 3: iommu_igfx is x86-only 4: iommu_qinval is x86-only 5: iommu_snoop is x86/HVM-only The series contextually depends on "AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode" Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen: Replace zero-length array with flexible-array member
On 2/27/20 4:31 AM, Jürgen Groß wrote: > On 26.02.20 22:26, Gustavo A. R. Silva wrote: >> The current codebase makes use of the zero-length array language >> extension to the C90 standard, but the preferred mechanism to declare >> variable-length types such as these ones is a flexible array >> member[1][2], >> introduced in C99: >> >> struct foo { >> int stuff; >> struct boo array[]; >> }; >> >> By making use of the mechanism above, we will get a compiler warning >> in case the flexible array does not occur last in the structure, which >> will help us prevent some kind of undefined behavior bugs from being >> inadvertently introduced[3] to the codebase from now on. >> >> Also, notice that, dynamic memory allocations won't be affected by >> this change: >> >> "Flexible array members have incomplete type, and so the sizeof operator >> may not be applied. As a quirk of the original implementation of >> zero-length arrays, sizeof evaluates to zero."[1] >> >> This issue was found with the help of Coccinelle. >> >> [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html >> [2] https://github.com/KSPP/linux/issues/21 >> [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") >> >> Signed-off-by: Gustavo A. R. Silva > > Reviewed-by: Juergen Gross Applied to for-linus-5.6. -boris ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
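The conversion the patch performs can be demonstrated with a small self-contained example (the struct and helper below are illustrative, not the kernel's actual code): a C99 flexible array member replaces the zero-length array, and since `sizeof` still excludes the trailing array, allocation-size arithmetic is unchanged.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Modeled loosely on struct ucode_buf; names are for the example only. */
struct ucode_buf_example {
    unsigned int len;
    char buffer[];          /* flexible array member, must come last */
};

/* Allocate header plus payload in one block, as xmalloc_flex_struct does. */
static struct ucode_buf_example *alloc_buf(const char *src, unsigned int len)
{
    struct ucode_buf_example *b = malloc(sizeof(*b) + len);

    if ( !b )
        return NULL;
    b->len = len;
    memcpy(b->buffer, src, len);
    return b;
}
```

Had `buffer` been declared anywhere but last, the compiler would reject the flexible array member outright, which is exactly the class of layout bug the change guards against.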
[Xen-devel] [PATCH v2 2/2] AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode
The wider cluster mode APIC IDs aren't generally representable. Convert the iommu_intremap variable into a tristate, allowing the AMD IOMMU driver to signal this special restriction to the apic_x2apic_probe(). (Note: assignments to the variable get adjusted, while existing consumers - all assuming a boolean property - are left alone.) Signed-off-by: Jan Beulich --- v2: New. --- a/xen/arch/x86/genapic/x2apic.c +++ b/xen/arch/x86/genapic/x2apic.c @@ -236,12 +236,21 @@ const struct genapic *__init apic_x2apic x2apic_phys = !iommu_intremap || (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL); } -else if ( !x2apic_phys && !iommu_intremap ) -{ -printk("WARNING: x2APIC cluster mode is not supported without interrupt remapping\n" - "x2APIC: forcing phys mode\n"); -x2apic_phys = true; -} +else if ( !x2apic_phys ) +switch ( iommu_intremap ) +{ +case iommu_intremap_off: +case iommu_intremap_restricted: +printk("WARNING: x2APIC cluster mode is not supported %s interrupt remapping\n" + "x2APIC: forcing phys mode\n", + iommu_intremap == iommu_intremap_off ? 
"without" +: "with restricted"); +x2apic_phys = true; +break; + +case iommu_intremap_full: +break; +} if ( x2apic_phys ) return &apic_x2apic_phys; --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -1139,7 +1139,7 @@ static void __init amd_iommu_init_cleanu iommu_enabled = 0; iommu_hwdom_passthrough = false; -iommu_intremap = 0; +iommu_intremap = iommu_intremap_off; iommuv2_enabled = 0; } @@ -1413,6 +1413,9 @@ int __init amd_iommu_prepare(bool xt) iommu->ctrl.int_cap_xt_en = xt && has_xt; } +if ( iommu_intremap && !has_xt ) +iommu_intremap = iommu_intremap_restricted; + rc = amd_iommu_update_ivrs_mapping_acpi(); error_out: --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c @@ -157,7 +157,7 @@ int __init acpi_ivrs_init(void) if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) ) { -iommu_intremap = 0; +iommu_intremap = iommu_intremap_off; return -ENODEV; } --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -35,7 +35,7 @@ bool __read_mostly iommu_quarantine = tr bool_t __read_mostly iommu_igfx = 1; bool_t __read_mostly iommu_snoop = 1; bool_t __read_mostly iommu_qinval = 1; -bool_t __read_mostly iommu_intremap = 1; +enum iommu_intremap __read_mostly iommu_intremap = iommu_intremap_full; bool_t __read_mostly iommu_crash_disable; static bool __hwdom_initdata iommu_hwdom_none; @@ -91,7 +91,7 @@ static int __init parse_iommu_param(cons else if ( (val = parse_boolean("qinval", s, ss)) >= 0 ) iommu_qinval = val; else if ( (val = parse_boolean("intremap", s, ss)) >= 0 ) -iommu_intremap = val; +iommu_intremap = val ? 
iommu_intremap_full : iommu_intremap_off; else if ( (val = parse_boolean("intpost", s, ss)) >= 0 ) iommu_intpost = val; #ifdef CONFIG_KEXEC @@ -475,7 +475,7 @@ int __init iommu_setup(void) iommu_enabled = (rc == 0); } if ( !iommu_enabled ) -iommu_intremap = 0; +iommu_intremap = iommu_intremap_off; if ( (force_iommu && !iommu_enabled) || (force_intremap && !iommu_intremap) ) @@ -557,7 +557,8 @@ void iommu_crash_shutdown(void) if ( iommu_enabled ) iommu_get_ops()->crash_shutdown(); -iommu_enabled = iommu_intremap = iommu_intpost = 0; +iommu_enabled = iommu_intpost = 0; +iommu_intremap = iommu_intremap_off; } int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt) --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -2177,7 +2177,7 @@ static int __must_check init_vtd_hw(void { if ( ioapic_to_iommu(IO_APIC_ID(apic)) == NULL ) { -iommu_intremap = 0; +iommu_intremap = iommu_intremap_off; dprintk(XENLOG_ERR VTDPREFIX, "ioapic_to_iommu: ioapic %#x (id: %#x) is NULL! " "Will not try to enable Interrupt Remapping.\n", @@ -2193,7 +2193,7 @@ static int __must_check init_vtd_hw(void iommu = drhd->iommu; if ( enable_intremap(iommu, 0) != 0 ) { -iommu_intremap = 0; +iommu_intremap = iommu_intremap_off; dprintk(XENLOG_WARNING VTDPREFIX, "Interrupt Remapping not enabled\n"); @@ -2295,7 +2295,7 @@ static int __init vtd_setup(void) iommu_qinval = 0; if ( iommu_intremap && !ecap_intr_remap(iomm
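The design constraint noted in the patch's enum comment — the "off" value has to come first — can be sketched as follows (names invented for the example): keeping `off` at value zero means every existing boolean-style test of the variable keeps working, while the new "restricted" state stays distinguishable for the AMD no-XT case.

```c
#include <assert.h>

/* Sketch of the tristate; mirrors the shape of enum iommu_intremap. */
enum example_intremap {
    example_intremap_off,         /* 0: reads as false in boolean tests */
    example_intremap_restricted,  /* enabled, but destination IDs limited */
    example_intremap_full,
};

/* The old "if ( iommu_intremap )" idiom, unchanged by the conversion. */
static int remapping_enabled(enum example_intremap v)
{
    return v != example_intremap_off;
}
```

This is why the patch only adjusts assignments: boolean consumers see both `restricted` and `full` as "enabled", and only apic_x2apic_probe() needs to tell them apart.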
[Xen-devel] [PATCH v2 1/2] AMD/IOMMU: correct handling when XT's prereq features are unavailable
We should neither cause IOMMU initialization as a whole to fail in this case (we should still be able to bring up the system in non-x2APIC or x2APIC physical mode), nor should the remainder of the function be skipped (as the main part of it won't get entered a 2nd time) in such an event. It is merely necessary for the function to indicate to the caller (iov_supports_xt()) that setup failed as far as x2APIC is concerned. Signed-off-by: Jan Beulich --- v2: Also adjust the loop consuming "xt". no_xt -> has_xt. --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -1364,6 +1364,7 @@ static int __init amd_iommu_prepare_one( int __init amd_iommu_prepare(bool xt) { struct amd_iommu *iommu; +bool has_xt = true; int rc = -ENODEV; BUG_ON( !iommu_found() ); @@ -1400,17 +1401,16 @@ int __init amd_iommu_prepare(bool xt) if ( rc ) goto error_out; -rc = -ENODEV; -if ( xt && (!iommu->features.flds.ga_sup || !iommu->features.flds.xt_sup) ) -goto error_out; +if ( !iommu->features.flds.ga_sup || !iommu->features.flds.xt_sup ) +has_xt = false; } for_each_amd_iommu ( iommu ) { /* NB: There's no need to actually write these out right here. */ -iommu->ctrl.ga_en |= xt; -iommu->ctrl.xt_en = xt; -iommu->ctrl.int_cap_xt_en = xt; +iommu->ctrl.ga_en |= xt && has_xt; +iommu->ctrl.xt_en = xt && has_xt; +iommu->ctrl.int_cap_xt_en = xt && has_xt; } rc = amd_iommu_update_ivrs_mapping_acpi(); @@ -1422,7 +1422,7 @@ int __init amd_iommu_prepare(bool xt) ivhd_type = 0; } -return rc; +return rc ?: xt && !has_xt ? -ENODEV : 0; } int __init amd_iommu_init(bool xt) ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
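The final `return rc ?: xt && !has_xt ? -ENODEV : 0;` uses the GNU `?:` extension (the middle operand omitted, so a non-zero `rc` is returned as-is). A plain-C equivalent, written out as a helper for clarity (the helper itself is illustrative, not part of the patch):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Equivalent of "return rc ?: xt && !has_xt ? -ENODEV : 0;":
 * keep an earlier error if there was one; otherwise report -ENODEV
 * only when x2APIC (xt) was requested but the prereq features are
 * missing, so plain-xAPIC setups still succeed.
 */
static int final_rc(int rc, bool xt, bool has_xt)
{
    if ( rc )
        return rc;
    return (xt && !has_xt) ? -ENODEV : 0;
}
```

This is the mechanism by which setup no longer fails as a whole: the -ENODEV is surfaced only to the x2APIC-probing caller, after the rest of the function has run.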
Re: [Xen-devel] [PATCH v5 1/2] x86/smp: use a dedicated CPU mask in send_IPI_mask
On 28.02.2020 13:07, Roger Pau Monne wrote: > Some callers of send_IPI_mask pass the scratch cpumask as the mask > parameter of send_IPI_mask, so the scratch cpumask cannot be used by > the function. The following trace has been obtained with a debug patch > and shows one of those callers: > > (XEN) scratch CPU mask already in use by > arch/x86/mm.c#_get_page_type+0x1f9/0x1abf > (XEN) Xen BUG at smp.c:45 > [...] > (XEN) Xen call trace: > (XEN)[] R scratch_cpumask+0xd3/0xf9 > (XEN)[] F send_IPI_mask+0x72/0x1ca > (XEN)[] F flush_area_mask+0x10c/0x16c > (XEN)[] F arch/x86/mm.c#_get_page_type+0x3ff/0x1abf > (XEN)[] F get_page_type+0xe/0x2c > (XEN)[] F pv_set_gdt+0xa1/0x2aa > (XEN)[] F arch_set_info_guest+0x1196/0x16ba > (XEN)[] F default_initialise_vcpu+0xc7/0xd4 > (XEN)[] F arch_initialise_vcpu+0x61/0xcd > (XEN)[] F do_vcpu_op+0x219/0x690 > (XEN)[] F pv_hypercall+0x2f6/0x593 > (XEN)[] F lstar_enter+0x112/0x120 > > _get_page_type will use the scratch cpumask to call flush_tlb_mask, > which in turn calls send_IPI_mask. > > Fix this by using a dedicated per CPU cpumask in send_IPI_mask. > > Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when > possible') > Signed-off-by: Roger Pau Monné Reviewed-by: Jan Beulich ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 0/2] AMD/IOMMU: improve x2APIC handling when XT is unavailable
For AMD the connection between IOMMU and x2APIC is less strong, hence even if unlikely to occur we would better deal correctly with XT being unavailable (for whatever reasons) in particular when x2APIC is already enabled on a system. 1: correct handling when XT's prereq features are unavailable 2: without XT, x2APIC needs to be forced into physical mode Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v5 1/2] x86/smp: use a dedicated CPU mask in send_IPI_mask
Some callers of send_IPI_mask pass the scratch cpumask as the mask parameter of send_IPI_mask, so the scratch cpumask cannot be used by the function. The following trace has been obtained with a debug patch and shows one of those callers: (XEN) scratch CPU mask already in use by arch/x86/mm.c#_get_page_type+0x1f9/0x1abf (XEN) Xen BUG at smp.c:45 [...] (XEN) Xen call trace: (XEN)[] R scratch_cpumask+0xd3/0xf9 (XEN)[] F send_IPI_mask+0x72/0x1ca (XEN)[] F flush_area_mask+0x10c/0x16c (XEN)[] F arch/x86/mm.c#_get_page_type+0x3ff/0x1abf (XEN)[] F get_page_type+0xe/0x2c (XEN)[] F pv_set_gdt+0xa1/0x2aa (XEN)[] F arch_set_info_guest+0x1196/0x16ba (XEN)[] F default_initialise_vcpu+0xc7/0xd4 (XEN)[] F arch_initialise_vcpu+0x61/0xcd (XEN)[] F do_vcpu_op+0x219/0x690 (XEN)[] F pv_hypercall+0x2f6/0x593 (XEN)[] F lstar_enter+0x112/0x120 _get_page_type will use the scratch cpumask to call flush_tlb_mask, which in turn calls send_IPI_mask. Fix this by using a dedicated per CPU cpumask in send_IPI_mask. Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible') Signed-off-by: Roger Pau Monné --- Changes since v4: - Place the declaration of send_ipi_cpumask in smp.h. 
---
 xen/arch/x86/smp.c        | 2 +-
 xen/arch/x86/smpboot.c    | 9 -
 xen/include/asm-x86/smp.h | 1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 0461812cf6..dd0b49d731 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -67,7 +67,7 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
 void send_IPI_mask(const cpumask_t *mask, int vector)
 {
     bool cpus_locked = false;
-    cpumask_t *scratch = this_cpu(scratch_cpumask);
+    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
 
     if ( in_irq() || in_mce_handler() || in_nmi_handler() )
     {
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index ad49f2dcd7..6c548b0b53 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -57,6 +57,9 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
 static cpumask_t scratch_cpu0mask;
 
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, send_ipi_cpumask);
+static cpumask_t send_ipi_cpu0mask;
+
 cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
@@ -930,6 +933,8 @@ static void cpu_smpboot_free(unsigned int cpu, bool remove)
         FREE_CPUMASK_VAR(per_cpu(cpu_core_mask, cpu));
         if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
             FREE_CPUMASK_VAR(per_cpu(scratch_cpumask, cpu));
+        if ( per_cpu(send_ipi_cpumask, cpu) != &send_ipi_cpu0mask )
+            FREE_CPUMASK_VAR(per_cpu(send_ipi_cpumask, cpu));
     }
 
     cleanup_cpu_root_pgt(cpu);
@@ -1034,7 +1039,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
 
     if ( !(cond_zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
            cond_zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
-           cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu))) )
+           cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu)) &&
+           cond_alloc_cpumask_var(&per_cpu(send_ipi_cpumask, cpu))) )
         goto out;
 
     rc = 0;
@@ -1175,6 +1181,7 @@ void __init smp_prepare_boot_cpu(void)
     cpumask_set_cpu(cpu, &cpu_present_map);
 #if NR_CPUS > 2 * BITS_PER_LONG
     per_cpu(scratch_cpumask, cpu) = &scratch_cpu0mask;
+    per_cpu(send_ipi_cpumask, cpu) = &send_ipi_cpu0mask;
 #endif
 
     get_cpu_info()->use_pv_cr3 = false;
diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index 92d69a5ea0..6150363655 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -22,6 +22,7 @@
 DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_mask);
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
 DECLARE_PER_CPU(cpumask_var_t, scratch_cpumask);
+DECLARE_PER_CPU(cpumask_var_t, send_ipi_cpumask);
 
 /*
  * Do we, for platform reasons, need to actually keep CPUs online when we
-- 
2.25.0
[Xen-devel] [PATCH v5 0/2] x86: scratch cpumask fixes/improvement
Hello,

The following series contains one more bugfix that removes the usage of
the scratch cpumask in send_IPI_mask, plus the introduction of accessors
to get/put the per-CPU scratch cpumask in order to prevent such issues
from happening in the future.

Thanks, Roger.

Roger Pau Monne (2):
  x86/smp: use a dedicated CPU mask in send_IPI_mask
  x86: add accessors for scratch cpu mask

 xen/arch/x86/io_apic.c    |  6 --
 xen/arch/x86/irq.c        | 14 ++
 xen/arch/x86/mm.c         | 39 +++
 xen/arch/x86/msi.c        |  4 +++-
 xen/arch/x86/smp.c        | 27 ++-
 xen/arch/x86/smpboot.c    | 10 --
 xen/include/asm-x86/smp.h | 15 +++
 7 files changed, 93 insertions(+), 22 deletions(-)

-- 
2.25.0
[Xen-devel] [PATCH v5 2/2] x86: add accessors for scratch cpu mask
Current usage of the per-CPU scratch cpumask is dangerous since there's
no way to figure out if the mask is already being used except for manual
code inspection of all the callers and possible call paths.

This is unsafe and not reliable, so introduce a minimal get/put
infrastructure to prevent nested usage of the scratch mask and usage in
interrupt context.

Move the definition of scratch_cpumask to smp.c in order to place the
definition and the accessors as close as possible.

Signed-off-by: Roger Pau Monné
---
Changes since v4:
 - Fix get_scratch_cpumask call order in _clear_irq_vector.
 - Remove double newline in __do_update_va_mapping.
 - Constify scratch_cpumask_use.
 - Don't explicitly print the address in the format string; a followup
   patch will be sent to make %p[sS] do so.

Changes since v3:
 - Fix commit message.
 - Split the cpumask taken section into two in _clear_irq_vector.
 - Add an empty statement in do_mmuext_op to avoid a break.
 - Change the logic used to release the scratch cpumask in
   __do_update_va_mapping.
 - Add a %ps print to the scratch_cpumask helper.
 - Remove printing the current IP, as that would be done by BUG anyway.
 - Pass the cpumask to put_scratch_cpumask and zap the pointer.

Changes since v1:
 - Use __builtin_return_address(0) instead of __func__.
 - Move the definition of scratch_cpumask and the scratch_cpumask
   accessor to smp.c.
 - Do not allow usage in #MC or #NMI context.
---
 xen/arch/x86/io_apic.c    |  6 --
 xen/arch/x86/irq.c        | 14 ++
 xen/arch/x86/mm.c         | 39 +++
 xen/arch/x86/msi.c        |  4 +++-
 xen/arch/x86/smp.c        | 25 +
 xen/arch/x86/smpboot.c    |  1 -
 xen/include/asm-x86/smp.h | 14 ++
 7 files changed, 83 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index e98e08e9c8..0bb994f0ba 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -2236,10 +2236,11 @@ int io_apic_set_pci_routing (int ioapic, int pin, int irq, int edge_level, int a
     entry.vector = vector;
 
     if (cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS)) {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
+        cpumask_t *mask = get_scratch_cpumask();
 
         cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
         SET_DEST(entry, logical, cpu_mask_to_apicid(mask));
+        put_scratch_cpumask(mask);
     } else {
         printk(XENLOG_ERR "IRQ%d: no target CPU (%*pb vs %*pb)\n",
                irq, CPUMASK_PR(desc->arch.cpu_mask), CPUMASK_PR(TARGET_CPUS));
@@ -2433,10 +2434,11 @@ int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val)
 
     if ( cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS) )
     {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
+        cpumask_t *mask = get_scratch_cpumask();
 
         cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
         SET_DEST(rte, logical, cpu_mask_to_apicid(mask));
+        put_scratch_cpumask(mask);
     }
     else
     {
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index cc2eb8e925..0a526ee800 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -196,7 +196,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
 {
     unsigned int cpu, old_vector, irq = desc->irq;
     unsigned int vector = desc->arch.vector;
-    cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
+    cpumask_t *tmp_mask = get_scratch_cpumask();
 
     BUG_ON(!valid_irq_vector(vector));
 
@@ -208,6 +208,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
         ASSERT(per_cpu(vector_irq, cpu)[vector] == irq);
         per_cpu(vector_irq, cpu)[vector] = ~irq;
     }
+    put_scratch_cpumask(tmp_mask);
 
     desc->arch.vector = IRQ_VECTOR_UNASSIGNED;
     cpumask_clear(desc->arch.cpu_mask);
@@ -227,8 +228,9 @@ static void _clear_irq_vector(struct irq_desc *desc)
 
     /* If we were in motion, also clear desc->arch.old_vector */
     old_vector = desc->arch.old_vector;
-    cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
 
+    tmp_mask = get_scratch_cpumask();
+    cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
     for_each_cpu(cpu, tmp_mask)
     {
         ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
@@ -236,6 +238,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
         per_cpu(vector_irq, cpu)[old_vector] = ~irq;
     }
+    put_scratch_cpumask(tmp_mask);
 
     release_old_vec(desc);
 
     desc->arch.move_in_progress = 0;
@@ -1152,10 +1155,11 @@ static void irq_guest_eoi_timer_fn(void *data)
         break;
 
     case ACKTYPE_EOI:
-        cpu_eoi_map = this_cpu(scratch_cpumask);
+        cpu_eoi_map = get_scratch_cpumask();
         cpumask_copy(cpu_eoi_map, action->cpu_eoi_map);
         spin_unlock_irq(&desc->lock);
         on_selected_cpus(cpu_eoi_map, set_eoi_ready, desc, 0);
+        put_scratch_cpumask(cpu_eoi_map);
         return;
     }
 
@@ -25
Re: [Xen-devel] [PATCH v4 2/2] x86: add accessors for scratch cpu mask
On 28.02.2020 12:41, Roger Pau Monné wrote: > On Fri, Feb 28, 2020 at 12:15:21PM +0100, Jan Beulich wrote: >> On 28.02.2020 11:31, Roger Pau Monné wrote: >>> On Fri, Feb 28, 2020 at 11:16:55AM +0100, Jan Beulich wrote: On 28.02.2020 10:33, Roger Pau Monne wrote: > +/* > + * Due to reentrancy scratch cpumask cannot be used in IRQ, #MC or > #NMI > + * context. > + */ > +BUG_ON(in_irq() || in_mce_handler() || in_nmi_handler()); > + > +if ( use && unlikely(this_cpu(scratch_cpumask_use)) ) > +{ > +printk("scratch CPU mask already in use by %ps (%p)\n", > + this_cpu(scratch_cpumask_use), > this_cpu(scratch_cpumask_use)); Why the raw %p as well? We don't do so elsewhere, I think. Yes, it's debugging code only, but I wonder anyway. >>> >>> I use addr2line to find the offending line, and it's much easier to do >>> so if you have the address directly, rather than having to use nm in >>> order to figure out the address of the symbol and then add the offset. >>> >>> Maybe I'm missing some other way to do this more easily? >> >> In such a case we may want to consider making %ps (and %pS) >> print a hex presentation next to the decoded one, in debug >> builds at least. Andrew, thoughts? (There may be cases where >> this is not wanted, bit if we made this a debug mode only >> feature, I think it wouldn't do too much harm.) > > If you agree to make %p[sS] print the address then I can drop this > and send a patch to that effect (likely next week). In principle I agree, but the effect in particular on stack dumps needs to be looked at pretty closely. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [linux-5.4 test] 147670: regressions - FAIL
flight 147670 linux-5.4 real [real] http://logs.test-lab.xenproject.org/osstest/logs/147670/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 146121 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 146121 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 146121 Tests which are failing intermittently (not blocking): test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail pass in 147586 Tests which did not succeed, but are not blocking: test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl 14 
saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 
saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass version targeted for testing: linuxf22dcb31727e3cf31a9143437f134ea133021982 baseline version: linux122179cb7d648a6f36b20dd6bf34f953cb384c30 Last test of basis 146121 2020-01-15 17:42:04 Z 43 days Failing since
Re: [Xen-devel] [PATCH v4 2/2] x86: add accessors for scratch cpu mask
On Fri, Feb 28, 2020 at 12:15:21PM +0100, Jan Beulich wrote:
> On 28.02.2020 11:31, Roger Pau Monné wrote:
> > On Fri, Feb 28, 2020 at 11:16:55AM +0100, Jan Beulich wrote:
> >> On 28.02.2020 10:33, Roger Pau Monne wrote:
> >>> +/*
> >>> + * Due to reentrancy scratch cpumask cannot be used in IRQ, #MC or
> >>> #NMI
> >>> + * context.
> >>> + */
> >>> +BUG_ON(in_irq() || in_mce_handler() || in_nmi_handler());
> >>> +
> >>> +if ( use && unlikely(this_cpu(scratch_cpumask_use)) )
> >>> +{
> >>> +printk("scratch CPU mask already in use by %ps (%p)\n",
> >>> + this_cpu(scratch_cpumask_use),
> >>> this_cpu(scratch_cpumask_use));
> >>
> >> Why the raw %p as well? We don't do so elsewhere, I think. Yes,
> >> it's debugging code only, but I wonder anyway.
> >
> > I use addr2line to find the offending line, and it's much easier to do
> > so if you have the address directly, rather than having to use nm in
> > order to figure out the address of the symbol and then add the offset.
> >
> > Maybe I'm missing some other way to do this more easily?
>
> In such a case we may want to consider making %ps (and %pS)
> print a hex presentation next to the decoded one, in debug
> builds at least. Andrew, thoughts? (There may be cases where
> this is not wanted, bit if we made this a debug mode only
> feature, I think it wouldn't do too much harm.)

If you agree to make %p[sS] print the address then I can drop this and
send a patch to that effect (likely next week).

Thanks, Roger.
Re: [Xen-devel] [PULL 0/3] Xen queue 2020-02-27
On Thu, 27 Feb 2020 at 12:16, Anthony PERARD wrote: > > The following changes since commit db736e0437aa6fd7c1b7e4599c17f9619ab6b837: > > Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into > staging (2020-02-25 13:31:16 +) > > are available in the Git repository at: > > https://xenbits.xen.org/git-http/people/aperard/qemu-dm.git > tags/pull-xen-20200227 > > for you to fetch changes up to 5d4c954931ba62661c6a1bc16ce604a012a10007: > > Memory: Only call ramblock_ptr when needed in qemu_ram_writeback > (2020-02-27 11:50:30 +) > > > Xen queue 2020-02-27 > > * fix for xen-block > * fix in exec.c for migration of xen guest > * one cleanup patch > > Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0 for any user-visible changes. -- PMM ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 2/2] x86: add accessors for scratch cpu mask
On 28.02.2020 11:31, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 11:16:55AM +0100, Jan Beulich wrote:
>> On 28.02.2020 10:33, Roger Pau Monne wrote:
>>> +/*
>>> + * Due to reentrancy scratch cpumask cannot be used in IRQ, #MC or #NMI
>>> + * context.
>>> + */
>>> +BUG_ON(in_irq() || in_mce_handler() || in_nmi_handler());
>>> +
>>> +if ( use && unlikely(this_cpu(scratch_cpumask_use)) )
>>> +{
>>> +printk("scratch CPU mask already in use by %ps (%p)\n",
>>> + this_cpu(scratch_cpumask_use),
>>> this_cpu(scratch_cpumask_use));
>>
>> Why the raw %p as well? We don't do so elsewhere, I think. Yes,
>> it's debugging code only, but I wonder anyway.
>
> I use addr2line to find the offending line, and it's much easier to do
> so if you have the address directly, rather than having to use nm in
> order to figure out the address of the symbol and then add the offset.
>
> Maybe I'm missing some other way to do this more easily?

In such a case we may want to consider making %ps (and %pS) print a hex
representation next to the decoded one, in debug builds at least. Andrew,
thoughts? (There may be cases where this is not wanted, but if we made
this a debug-mode-only feature, I think it wouldn't do too much harm.)

Jan
Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore suspend event channel node optional
> -Original Message- > From: Julien Grall > Sent: 28 February 2020 10:26 > To: Durrant, Paul ; xen-devel@lists.xenproject.org > Cc: Anthony PERARD ; Ian Jackson > ; Wei Liu > Subject: Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore > suspend event channel node optional > > Hi Paul, > > On 28/02/2020 09:28, Durrant, Paul wrote: > >> -Original Message- > >> From: Julien Grall > >> Sent: 27 February 2020 22:52 > >> To: Durrant, Paul ; xen- > de...@lists.xenproject.org > >> Cc: Anthony PERARD ; Ian Jackson > >> ; Wei Liu > >> Subject: Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore > >> suspend event channel node optional > >> > >> Hi, > >> > >> On 26/02/2020 16:08, Paul Durrant wrote: > >>> The purpose and semantics of the node are explained in > >>> xenstore-paths.pandoc [1]. It was originally introduced in xend by > >> commit > >>> 17636f47a474 "Teach xc_save to use event-channel-based domain suspend > if > >>> available.". Note that, because, the top-level frontend 'device' node > >> was > >>> created writable by the guest in xend, there was no need to explicitly > >>> create the 'suspend-event-channel' node as writable node. > >>> > >>> However, libxl creates the 'device' node as read-only by the guest and > >> so > >>> explicit creation of the 'suspend-event-channel' node is necessary to > >> make > >>> it usable. This unfortunately has the side-effect of making some old > >>> Windows PV drivers [2] cease to function. This is because they scan > the > >> top > >>> level 'device' node, find the 'suspend' node and expect it to contain > >> the > >>> usual sub-nodes describing a PV frontend. When this is found not to be > >> the > >>> case, enumeration ceases and (because the 'suspend' node is observed > >> before > >>> the 'vbd' node) no system disk is enumerated. Windows will then crash > >> with > >>> bugcheck code 0x7B. 
> >>> > >>> This patch adds a boolean 'suspend_event_channel' field into > >>> libxl_create_info to control whether the xenstore node is created and > a > >>> similarly named option in xl.cfg which, for compatibility with > previous > >>> libxl behaviour, defaults to true. > >>> > >>> [1] > >> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore- > >> paths.pandoc;hb=HEAD#l177 > >>> [2] https://access.redhat.com/documentation/en- > >> us/red_hat_enterprise_linux/5/html/para- > >> virtualized_windows_drivers_guide/sect-para- > >> virtualized_windows_drivers_guide- > >> installing_and_configuring_the_para_virtualized_drivers- > >> installing_the_para_virtualized_drivers > >>> > >>> NOTE: While adding the new LIBXL_HAVE_CREATEINFO_SUSPEND_EVENT_CHANNEL > >>> definition into libxl.h, this patch corrects the previous > stanza > >>> which erroneously implies ibxl_domain_create_info is a > function. > >>> > >>> Signed-off-by: Paul Durrant > >>> --- > >>> Cc: Ian Jackson > >>> Cc: Wei Liu > >>> Cc: Anthony PERARD > >>> --- > >>>docs/man/xl.cfg.5.pod.in| 7 +++ > >>>tools/libxl/libxl.h | 13 - > >>>tools/libxl/libxl_create.c | 12 +--- > >>>tools/libxl/libxl_types.idl | 1 + > >>>tools/xl/xl_parse.c | 3 +++ > >> > >> You may want to update xenstore-paths.pandoc as the document mention > the > >> node will be created by the toolstack. > >> > > > > The doc already says that: > > > > - > > ~/device/suspend/event-channel = ""|EVTCHN [w] > > > > The domain's suspend event channel. The toolstack will create this > > path with an empty value which the guest may choose to overwrite. > > - > > The paragraph you quoted does not suggest the node may not exist. So I > think you want to update the documentation to reflect the node may not > exist. 
> > >>>5 files changed, 32 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in > >>> index 0cad561375..5f476f1e1d 100644 > >>> --- a/docs/man/xl.cfg.5.pod.in > >>> +++ b/docs/man/xl.cfg.5.pod.in > >>> @@ -668,6 +668,13 @@ file. > >>> > >>>=back > >>> > >>> +=item B > >>> + > >>> +Create the xenstore path for the domain's suspend event channel. The > >>> +existence of this path can cause problems with older PV drivers > running > >>> +in the guest. If this option is not specified then it will default to > >>> +B. > >> > >> In the next patch you are going to make device/ rw. Do you see any > issue > >> with just not creating the node for everyone? Are PV drivers allowed to > >> assume a node will be present? > > > > Well that's the problem. Some of them may have become accustomed to it > being present. Also the documentation does say it is created by the > toolstack (but not when). Perhaps I should update the doc to say the > toolstack *may* create this path when the domain is created. > > Hmmm fair point. However, this now means you may need to modify your > configuration file depending on the PV driver installed. > > This is not a very ideal
Re: [Xen-devel] [PATCH v4 2/2] x86: add accessors for scratch cpu mask
On Fri, Feb 28, 2020 at 11:16:55AM +0100, Jan Beulich wrote: > On 28.02.2020 10:33, Roger Pau Monne wrote: > > Current usage of the per-CPU scratch cpumask is dangerous since > > there's no way to figure out if the mask is already being used except > > for manual code inspection of all the callers and possible call paths. > > > > This is unsafe and not reliable, so introduce a minimal get/put > > infrastructure to prevent nested usage of the scratch mask and usage > > in interrupt context. > > > > Move the definition of scratch_cpumask to smp.c in order to place the > > declaration and the accessors as close as possible. > > You've changed one instance of "declaration", but not also the other. Oh, sorry. Sadly you are not the only one with a cold this week :). > > --- a/xen/arch/x86/irq.c > > +++ b/xen/arch/x86/irq.c > > @@ -196,7 +196,7 @@ static void _clear_irq_vector(struct irq_desc *desc) > > { > > unsigned int cpu, old_vector, irq = desc->irq; > > unsigned int vector = desc->arch.vector; > > -cpumask_t *tmp_mask = this_cpu(scratch_cpumask); > > +cpumask_t *tmp_mask = get_scratch_cpumask(); > > > > BUG_ON(!valid_irq_vector(vector)); > > > > @@ -208,6 +208,7 @@ static void _clear_irq_vector(struct irq_desc *desc) > > ASSERT(per_cpu(vector_irq, cpu)[vector] == irq); > > per_cpu(vector_irq, cpu)[vector] = ~irq; > > } > > +put_scratch_cpumask(tmp_mask); > > > > desc->arch.vector = IRQ_VECTOR_UNASSIGNED; > > cpumask_clear(desc->arch.cpu_mask); > > @@ -227,8 +228,9 @@ static void _clear_irq_vector(struct irq_desc *desc) > > > > /* If we were in motion, also clear desc->arch.old_vector */ > > old_vector = desc->arch.old_vector; > > -cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map); > > > > +cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map); > > +tmp_mask = get_scratch_cpumask(); > > Did you test this? It looks overwhelmingly likely that the two > lines need to be the other way around. 
Urg, yes, I've tested it but likely failed to trigger this case, and even
worse failed to spot it on my own review. It's obviously wrong.

> > +/*
> > + * Due to reentrancy scratch cpumask cannot be used in IRQ, #MC or #NMI
> > + * context.
> > + */
> > +BUG_ON(in_irq() || in_mce_handler() || in_nmi_handler());
> > +
> > +if ( use && unlikely(this_cpu(scratch_cpumask_use)) )
> > +{
> > +printk("scratch CPU mask already in use by %ps (%p)\n",
> > + this_cpu(scratch_cpumask_use), this_cpu(scratch_cpumask_use));
>
> Why the raw %p as well? We don't do so elsewhere, I think. Yes,
> it's debugging code only, but I wonder anyway.

I use addr2line to find the offending line, and it's much easier to do
so if you have the address directly, rather than having to use nm in
order to figure out the address of the symbol and then add the offset.

Maybe I'm missing some other way to do this more easily?

Thanks, Roger.
Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore suspend event channel node optional
Hi Paul, On 28/02/2020 09:28, Durrant, Paul wrote: -Original Message- From: Julien Grall Sent: 27 February 2020 22:52 To: Durrant, Paul ; xen-devel@lists.xenproject.org Cc: Anthony PERARD ; Ian Jackson ; Wei Liu Subject: Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore suspend event channel node optional Hi, On 26/02/2020 16:08, Paul Durrant wrote: The purpose and semantics of the node are explained in xenstore-paths.pandoc [1]. It was originally introduced in xend by commit 17636f47a474 "Teach xc_save to use event-channel-based domain suspend if available.". Note that, because, the top-level frontend 'device' node was created writable by the guest in xend, there was no need to explicitly create the 'suspend-event-channel' node as writable node. However, libxl creates the 'device' node as read-only by the guest and so explicit creation of the 'suspend-event-channel' node is necessary to make it usable. This unfortunately has the side-effect of making some old Windows PV drivers [2] cease to function. This is because they scan the top level 'device' node, find the 'suspend' node and expect it to contain the usual sub-nodes describing a PV frontend. When this is found not to be the case, enumeration ceases and (because the 'suspend' node is observed before the 'vbd' node) no system disk is enumerated. Windows will then crash with bugcheck code 0x7B. This patch adds a boolean 'suspend_event_channel' field into libxl_create_info to control whether the xenstore node is created and a similarly named option in xl.cfg which, for compatibility with previous libxl behaviour, defaults to true. 
[1] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore- paths.pandoc;hb=HEAD#l177 [2] https://access.redhat.com/documentation/en- us/red_hat_enterprise_linux/5/html/para- virtualized_windows_drivers_guide/sect-para- virtualized_windows_drivers_guide- installing_and_configuring_the_para_virtualized_drivers- installing_the_para_virtualized_drivers NOTE: While adding the new LIBXL_HAVE_CREATEINFO_SUSPEND_EVENT_CHANNEL definition into libxl.h, this patch corrects the previous stanza which erroneously implies ibxl_domain_create_info is a function. Signed-off-by: Paul Durrant --- Cc: Ian Jackson Cc: Wei Liu Cc: Anthony PERARD --- docs/man/xl.cfg.5.pod.in| 7 +++ tools/libxl/libxl.h | 13 - tools/libxl/libxl_create.c | 12 +--- tools/libxl/libxl_types.idl | 1 + tools/xl/xl_parse.c | 3 +++ You may want to update xenstore-paths.pandoc as the document mention the node will be created by the toolstack. The doc already says that: - ~/device/suspend/event-channel = ""|EVTCHN [w] The domain's suspend event channel. The toolstack will create this path with an empty value which the guest may choose to overwrite. - The paragraph you quoted does not suggest the node may not exist. So I think you want to update the documentation to reflect the node may not exist. 5 files changed, 32 insertions(+), 4 deletions(-) diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in index 0cad561375..5f476f1e1d 100644 --- a/docs/man/xl.cfg.5.pod.in +++ b/docs/man/xl.cfg.5.pod.in @@ -668,6 +668,13 @@ file. =back +=item B + +Create the xenstore path for the domain's suspend event channel. The +existence of this path can cause problems with older PV drivers running +in the guest. If this option is not specified then it will default to +B. In the next patch you are going to make device/ rw. Do you see any issue with just not creating the node for everyone? Are PV drivers allowed to assume a node will be present? Well that's the problem. 
Some of them may have become accustomed to it being present. Also the
documentation does say it is created by the toolstack (but not when).
Perhaps I should update the doc to say the toolstack *may* create this
path when the domain is created.

Hmmm fair point. However, this now means you may need to modify your
configuration file depending on the PV driver installed.

This is not a very ideal situation to be in when you have to upgrade
your OS (imagine the new PV driver requires the path). How do you
envision this to work?

My knowledge of xenstore is limited, so I thought I would ask the
questions to understand a bit more how stable the ABI is meant to be. :).

Stable? That'll be the day :-)

Thank you for the confirmation :).

Cheers,

-- 
Julien Grall
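For concreteness, a guest configuration using the proposed knob might look like the fragment below. This is a hypothetical sketch: the option name `suspend_event_channel` is taken from the patch under discussion and may change (or be dropped) before anything is committed.

```
# Hypothetical xl.cfg fragment: a Windows HVM guest with old PV drivers
# that crash (bugcheck 0x7B) when they find a 'suspend' node under
# 'device'. Ask libxl not to create the node; the option defaults to 1.
name = "win-oldpv"
type = "hvm"
suspend_event_channel = 0
```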
Re: [Xen-devel] [PATCH v4 1/2] x86/smp: use a dedicated CPU mask in send_IPI_mask
On Fri, Feb 28, 2020 at 11:08:43AM +0100, Jan Beulich wrote:
> On 28.02.2020 10:33, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/smp.c
> > +++ b/xen/arch/x86/smp.c
> > @@ -59,6 +59,8 @@ static void send_IPI_shortcut(unsigned int shortcut, int
> > vector,
> > apic_write(APIC_ICR, cfg);
> > }
> >
> > +DECLARE_PER_CPU(cpumask_var_t, send_ipi_cpumask);
>
> This needs to be put in a header, so that ...
>
> > --- a/xen/arch/x86/smpboot.c
> > +++ b/xen/arch/x86/smpboot.c
> > @@ -57,6 +57,9 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
> > DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
> > static cpumask_t scratch_cpu0mask;
> >
> > +DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, send_ipi_cpumask);
>
> ... this gets compiled with having seen the declaration, such
> that if one gets changed without also changing the other, the
> build will break.

Right, I was trying to limit the scope of the declaration, but your
suggestion makes sense.

> Everything else looks fine to me.

Thanks, Roger.
Re: [Xen-devel] [PATCH v4 2/2] x86: add accessors for scratch cpu mask
On 28.02.2020 10:33, Roger Pau Monne wrote:
> Current usage of the per-CPU scratch cpumask is dangerous since
> there's no way to figure out if the mask is already being used except
> for manual code inspection of all the callers and possible call paths.
>
> This is unsafe and not reliable, so introduce a minimal get/put
> infrastructure to prevent nested usage of the scratch mask and usage
> in interrupt context.
>
> Move the definition of scratch_cpumask to smp.c in order to place the
> declaration and the accessors as close as possible.

You've changed one instance of "declaration", but not also the other.

> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -196,7 +196,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
>  {
>      unsigned int cpu, old_vector, irq = desc->irq;
>      unsigned int vector = desc->arch.vector;
> -    cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
> +    cpumask_t *tmp_mask = get_scratch_cpumask();
>
>      BUG_ON(!valid_irq_vector(vector));
>
> @@ -208,6 +208,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
>          ASSERT(per_cpu(vector_irq, cpu)[vector] == irq);
>          per_cpu(vector_irq, cpu)[vector] = ~irq;
>      }
> +    put_scratch_cpumask(tmp_mask);
>
>      desc->arch.vector = IRQ_VECTOR_UNASSIGNED;
>      cpumask_clear(desc->arch.cpu_mask);
> @@ -227,8 +228,9 @@ static void _clear_irq_vector(struct irq_desc *desc)
>
>      /* If we were in motion, also clear desc->arch.old_vector */
>      old_vector = desc->arch.old_vector;
> -    cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
>
> +    cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
> +    tmp_mask = get_scratch_cpumask();

Did you test this? It looks overwhelmingly likely that the two lines
need to be the other way around.

> @@ -4384,6 +4389,17 @@ static int __do_update_va_mapping(
>          break;
>      }
>
> +    switch ( flags & ~UVMF_FLUSHTYPE_MASK )
> +    {
> +    case UVMF_LOCAL:
> +    case UVMF_ALL:
> +        break;
> +
> +    default:
> +        put_scratch_cpumask(mask);
> +    }
> +
> +
>      return rc;

No two successive blank lines please.

> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -25,6 +25,31 @@
>  #include
>  #include
>
> +DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
> +
> +#ifndef NDEBUG
> +cpumask_t *scratch_cpumask(bool use)
> +{
> +    static DEFINE_PER_CPU(void *, scratch_cpumask_use);

I'd make this "const void *", btw.

> +    /*
> +     * Due to reentrancy scratch cpumask cannot be used in IRQ, #MC or #NMI
> +     * context.
> +     */
> +    BUG_ON(in_irq() || in_mce_handler() || in_nmi_handler());
> +
> +    if ( use && unlikely(this_cpu(scratch_cpumask_use)) )
> +    {
> +        printk("scratch CPU mask already in use by %ps (%p)\n",
> +               this_cpu(scratch_cpumask_use), this_cpu(scratch_cpumask_use));

Why the raw %p as well? We don't do so elsewhere, I think. Yes, it's
debugging code only, but I wonder anyway.

Jan
Re: [Xen-devel] [PATCH v4 1/2] x86/smp: use a dedicated CPU mask in send_IPI_mask
On 28.02.2020 10:33, Roger Pau Monne wrote:
> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -59,6 +59,8 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>      apic_write(APIC_ICR, cfg);
>  }
>
> +DECLARE_PER_CPU(cpumask_var_t, send_ipi_cpumask);

This needs to be put in a header, so that ...

> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -57,6 +57,9 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
>  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
>  static cpumask_t scratch_cpu0mask;
>
> +DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, send_ipi_cpumask);

... this gets compiled with having seen the declaration, such
that if one gets changed without also changing the other, the
build will break.

Everything else looks fine to me.

Jan
Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore suspend event channel node optional
> -----Original Message-----
> From: Julien Grall
> Sent: 27 February 2020 22:52
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Anthony PERARD ; Ian Jackson ; Wei Liu
> Subject: Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore
> suspend event channel node optional
>
> Hi,
>
> On 26/02/2020 16:08, Paul Durrant wrote:
> > The purpose and semantics of the node are explained in
> > xenstore-paths.pandoc [1]. It was originally introduced in xend by commit
> > 17636f47a474 "Teach xc_save to use event-channel-based domain suspend if
> > available.". Note that, because the top-level frontend 'device' node was
> > created writable by the guest in xend, there was no need to explicitly
> > create the 'suspend-event-channel' node as a writable node.
> >
> > However, libxl creates the 'device' node as read-only by the guest and so
> > explicit creation of the 'suspend-event-channel' node is necessary to make
> > it usable. This unfortunately has the side-effect of making some old
> > Windows PV drivers [2] cease to function. This is because they scan the
> > top level 'device' node, find the 'suspend' node and expect it to contain
> > the usual sub-nodes describing a PV frontend. When this is found not to be
> > the case, enumeration ceases and (because the 'suspend' node is observed
> > before the 'vbd' node) no system disk is enumerated. Windows will then
> > crash with bugcheck code 0x7B.
> >
> > This patch adds a boolean 'suspend_event_channel' field into
> > libxl_create_info to control whether the xenstore node is created and a
> > similarly named option in xl.cfg which, for compatibility with previous
> > libxl behaviour, defaults to true.
> >
> > [1] https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore-paths.pandoc;hb=HEAD#l177
> > [2] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/para-virtualized_windows_drivers_guide/sect-para-virtualized_windows_drivers_guide-installing_and_configuring_the_para_virtualized_drivers-installing_the_para_virtualized_drivers
> >
> > NOTE: While adding the new LIBXL_HAVE_CREATEINFO_SUSPEND_EVENT_CHANNEL
> >       definition into libxl.h, this patch corrects the previous stanza
> >       which erroneously implies libxl_domain_create_info is a function.
> >
> > Signed-off-by: Paul Durrant
> > ---
> > Cc: Ian Jackson
> > Cc: Wei Liu
> > Cc: Anthony PERARD
> > ---
> >  docs/man/xl.cfg.5.pod.in    |  7 +++
> >  tools/libxl/libxl.h         | 13 -
> >  tools/libxl/libxl_create.c  | 12 +---
> >  tools/libxl/libxl_types.idl |  1 +
> >  tools/xl/xl_parse.c         |  3 +++
>
> You may want to update xenstore-paths.pandoc as the document mention the
> node will be created by the toolstack.

The doc already says that:

- ~/device/suspend/event-channel = ""|EVTCHN [w]

  The domain's suspend event channel. The toolstack will create this path
  with an empty value which the guest may choose to overwrite.

> >  5 files changed, 32 insertions(+), 4 deletions(-)
> >
> > diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> > index 0cad561375..5f476f1e1d 100644
> > --- a/docs/man/xl.cfg.5.pod.in
> > +++ b/docs/man/xl.cfg.5.pod.in
> > @@ -668,6 +668,13 @@ file.
> >
> >  =back
> >
> > +=item B
> > +
> > +Create the xenstore path for the domain's suspend event channel. The
> > +existence of this path can cause problems with older PV drivers running
> > +in the guest. If this option is not specified then it will default to
> > +B.
>
> In the next patch you are going to make device/ rw. Do you see any issue
> with just not creating the node for everyone? Are PV drivers allowed to
> assume a node will be present?

Well, that's the problem. Some of them may have become accustomed to it
being present. Also the documentation does say it is created by the
toolstack (but not when). Perhaps I should update the doc to say the
toolstack *may* create this path when the domain is created.

> My knowledge of xenstore is limited, so I thought I would ask the
> questions to understand a bit more how stable the ABI is meant to be. :)

Stable? That'll be the day :-)

  Paul

> Cheers,
>
> --
> Julien Grall
Re: [Xen-devel] [PATCH] xen/grant: Fix signed/unsigned comparisons issues
Hi Jan,

On 28/02/2020 09:19, Jan Beulich wrote:
> On 28.02.2020 10:09, Julien Grall wrote:
>> Hi Jan,
>>
>> On 28/02/2020 08:41, Jan Beulich wrote:
>>> On 27.02.2020 19:46, Andrew Cooper wrote:
>>>> Each function takes an unsigned count, and loops based on a signed i.
>>>> This causes problems when between 2 and 4 billion operations are
>>>> requested.
>>>
>>> I can see problems, but not (as the title says) with comparison issues
>>> (the signed i gets converted to unsigned for the purpose of the
>>> comparison).
>>>
>>>> In practice, signed-ness issues aren't possible because count exceeding
>>>> INT_MAX is excluded earlier in do_grant_op(), but the code reads as if
>>>> it is buggy, and GCC obviously can't spot this either.
>>>>
>>>> Bloat-o-meter reports:
>>>>   add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-95 (-95)
>>>>   Function                    old     new   delta
>>>>   do_grant_table_op          7155    7140     -15
>>>>   gnttab_transfer            2732    2716     -16
>>>>   gnttab_unmap_grant_ref      771     739     -32
>>>>   gnttab_unmap_and_replace    771     739     -32
>>>>   Total: Before=2996364, After=2996269, chg -0.00%
>>>>
>>>> and inspection of gnttab_unmap_grant_ref() at least reveals one fewer
>>>> local variables on the stack.
>>>
>>> This is a good thing for x86. However, having started to familiarize
>>> myself a tiny bit with RISC-V, I'm no longer certain our present way
>>> of preferring unsigned int for array indexing variable and alike is
>>> actually a good thing without further abstraction. Nevertheless, this
>>> isn't an immediate issue, so ...
>>
>> Would you mind expanding a bit more about this comment here? How is
>> unsigned int going to make things worse on RISC-V?
>
> Other than x86-64 and Arm64, 64-bit (and wider) RISC-V sign-extends the
> result of 32-bit operations by default. Hence for an unsigned 32-bit
> type to be used as array index, an additional zero-extending insn is
> going to be needed (just like for our existing ports a sign-extending
> insn is needed when array indexing variables are calculated as 32-bit
> signed quantities, which is one of the reasons why right now we prefer
> unsigned int in such cases).

Meh, I am not convinced this is a good enough reason to switch array
indexes to signed int for RISC-V. There are probably better places to
look at for optimization in common code that would benefit all
architectures.

Regarding the reason for "unsigned int", I don't request it because it
produces "shorter" code but because there is no reason to have a signed
index for an array.

Anyway, let's cross that bridge when we have an actual RISC-V port for
Xen merged.

Cheers,

--
Julien Grall
[Xen-devel] [PATCH v4 2/2] x86: add accessors for scratch cpu mask
Current usage of the per-CPU scratch cpumask is dangerous since
there's no way to figure out if the mask is already being used except
for manual code inspection of all the callers and possible call paths.

This is unsafe and not reliable, so introduce a minimal get/put
infrastructure to prevent nested usage of the scratch mask and usage
in interrupt context.

Move the definition of scratch_cpumask to smp.c in order to place the
declaration and the accessors as close as possible.

Signed-off-by: Roger Pau Monné
---
Changes since v3:
 - Fix commit message.
 - Split the cpumask taken section into two in _clear_irq_vector.
 - Add an empty statement in do_mmuext_op to avoid a break.
 - Change the logic used to release the scratch cpumask in
   __do_update_va_mapping.
 - Add a %ps print to scratch_cpumask helper.
 - Remove printing the current IP, as that would be done by BUG anyway.
 - Pass the cpumask to put_scratch_cpumask and zap the pointer.

Changes since v1:
 - Use __builtin_return_address(0) instead of __func__.
 - Move declaration of scratch_cpumask and scratch_cpumask accessor to
   smp.c.
 - Do not allow usage in #MC or #NMI context.
---
 xen/arch/x86/io_apic.c    |  6 --
 xen/arch/x86/irq.c        | 14 ++
 xen/arch/x86/mm.c         | 40 +++
 xen/arch/x86/msi.c        |  4 +++-
 xen/arch/x86/smp.c        | 25 
 xen/arch/x86/smpboot.c    |  1 -
 xen/include/asm-x86/smp.h | 14 ++
 7 files changed, 84 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index e98e08e9c8..0bb994f0ba 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -2236,10 +2236,11 @@ int io_apic_set_pci_routing (int ioapic, int pin, int irq, int edge_level, int a
         entry.vector = vector;
 
         if (cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS)) {
-            cpumask_t *mask = this_cpu(scratch_cpumask);
+            cpumask_t *mask = get_scratch_cpumask();
 
             cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
             SET_DEST(entry, logical, cpu_mask_to_apicid(mask));
+            put_scratch_cpumask(mask);
         } else {
             printk(XENLOG_ERR "IRQ%d: no target CPU (%*pb vs %*pb)\n",
                    irq, CPUMASK_PR(desc->arch.cpu_mask), CPUMASK_PR(TARGET_CPUS));
@@ -2433,10 +2434,11 @@ int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val)
 
     if ( cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS) )
     {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
+        cpumask_t *mask = get_scratch_cpumask();
 
         cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
         SET_DEST(rte, logical, cpu_mask_to_apicid(mask));
+        put_scratch_cpumask(mask);
     }
     else
     {
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index cc2eb8e925..19488dae21 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -196,7 +196,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
 {
     unsigned int cpu, old_vector, irq = desc->irq;
     unsigned int vector = desc->arch.vector;
-    cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
+    cpumask_t *tmp_mask = get_scratch_cpumask();
 
     BUG_ON(!valid_irq_vector(vector));
 
@@ -208,6 +208,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
         ASSERT(per_cpu(vector_irq, cpu)[vector] == irq);
         per_cpu(vector_irq, cpu)[vector] = ~irq;
     }
+    put_scratch_cpumask(tmp_mask);
 
     desc->arch.vector = IRQ_VECTOR_UNASSIGNED;
     cpumask_clear(desc->arch.cpu_mask);
@@ -227,8 +228,9 @@ static void _clear_irq_vector(struct irq_desc *desc)
 
     /* If we were in motion, also clear desc->arch.old_vector */
     old_vector = desc->arch.old_vector;
-    cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
 
+    cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
+    tmp_mask = get_scratch_cpumask();
     for_each_cpu(cpu, tmp_mask)
     {
         ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
@@ -236,6 +238,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
         per_cpu(vector_irq, cpu)[old_vector] = ~irq;
     }
+    put_scratch_cpumask(tmp_mask);
 
     release_old_vec(desc);
 
     desc->arch.move_in_progress = 0;
@@ -1152,10 +1155,11 @@ static void irq_guest_eoi_timer_fn(void *data)
         break;
 
     case ACKTYPE_EOI:
-        cpu_eoi_map = this_cpu(scratch_cpumask);
+        cpu_eoi_map = get_scratch_cpumask();
         cpumask_copy(cpu_eoi_map, action->cpu_eoi_map);
         spin_unlock_irq(&desc->lock);
         on_selected_cpus(cpu_eoi_map, set_eoi_ready, desc, 0);
+        put_scratch_cpumask(cpu_eoi_map);
         return;
     }
 
@@ -2531,12 +2535,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
     unsigned int irq;
     static int warned;
     struct irq_desc *desc;
+    cpumask_t *affinity = get_scratch_cpumask();
 
     for ( irq = 0; irq < nr_irqs; irq++ )
     {
         bool break_affinity = fa
[Xen-devel] [PATCH v4 0/2] x86: scratch cpumask fixes/improvement
Hello,

The following series contains yet one more bugfix that removes the
usage of the scratch cpumask in send_IPI_mask, and the introduction of
accessors to get/put the per-CPU scratch cpumask in order to prevent
such issues from happening in the future.

Thanks, Roger.

Roger Pau Monne (2):
  x86/smp: use a dedicated CPU mask in send_IPI_mask
  x86: add accessors for scratch cpu mask

 xen/arch/x86/io_apic.c    |  6 --
 xen/arch/x86/irq.c        | 14 ++
 xen/arch/x86/mm.c         | 40 +++
 xen/arch/x86/msi.c        |  4 +++-
 xen/arch/x86/smp.c        | 29 +++-
 xen/arch/x86/smpboot.c    | 10 --
 xen/include/asm-x86/smp.h | 14 ++
 7 files changed, 95 insertions(+), 22 deletions(-)

--
2.25.0
[Xen-devel] [PATCH v4 1/2] x86/smp: use a dedicated CPU mask in send_IPI_mask
Some callers of send_IPI_mask pass the scratch cpumask as the mask
parameter of send_IPI_mask, so the scratch cpumask cannot be used by
the function. The following trace has been obtained with a debug patch
and shows one of those callers:

(XEN) scratch CPU mask already in use by arch/x86/mm.c#_get_page_type+0x1f9/0x1abf
(XEN) Xen BUG at smp.c:45
[...]
(XEN) Xen call trace:
(XEN)    [] R scratch_cpumask+0xd3/0xf9
(XEN)    [] F send_IPI_mask+0x72/0x1ca
(XEN)    [] F flush_area_mask+0x10c/0x16c
(XEN)    [] F arch/x86/mm.c#_get_page_type+0x3ff/0x1abf
(XEN)    [] F get_page_type+0xe/0x2c
(XEN)    [] F pv_set_gdt+0xa1/0x2aa
(XEN)    [] F arch_set_info_guest+0x1196/0x16ba
(XEN)    [] F default_initialise_vcpu+0xc7/0xd4
(XEN)    [] F arch_initialise_vcpu+0x61/0xcd
(XEN)    [] F do_vcpu_op+0x219/0x690
(XEN)    [] F pv_hypercall+0x2f6/0x593
(XEN)    [] F lstar_enter+0x112/0x120

_get_page_type will use the scratch cpumask to call flush_tlb_mask,
which in turn calls send_IPI_mask. Fix this by using a dedicated per
CPU cpumask in send_IPI_mask.

Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible')
Signed-off-by: Roger Pau Monné
---
 xen/arch/x86/smp.c     | 4 +++-
 xen/arch/x86/smpboot.c | 9 -
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 0461812cf6..072638f0f6 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -59,6 +59,8 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
     apic_write(APIC_ICR, cfg);
 }
 
+DECLARE_PER_CPU(cpumask_var_t, send_ipi_cpumask);
+
 /*
  * send_IPI_mask(cpumask, vector): sends @vector IPI to CPUs in @cpumask,
  * excluding the local CPU. @cpumask may be empty.
@@ -67,7 +69,7 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
 void send_IPI_mask(const cpumask_t *mask, int vector)
 {
     bool cpus_locked = false;
-    cpumask_t *scratch = this_cpu(scratch_cpumask);
+    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
 
     if ( in_irq() || in_mce_handler() || in_nmi_handler() )
     {
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index ad49f2dcd7..6c548b0b53 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -57,6 +57,9 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
 static cpumask_t scratch_cpu0mask;
 
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, send_ipi_cpumask);
+static cpumask_t send_ipi_cpu0mask;
+
 cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
@@ -930,6 +933,8 @@ static void cpu_smpboot_free(unsigned int cpu, bool remove)
         FREE_CPUMASK_VAR(per_cpu(cpu_core_mask, cpu));
         if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
             FREE_CPUMASK_VAR(per_cpu(scratch_cpumask, cpu));
+        if ( per_cpu(send_ipi_cpumask, cpu) != &send_ipi_cpu0mask )
+            FREE_CPUMASK_VAR(per_cpu(send_ipi_cpumask, cpu));
     }
 
     cleanup_cpu_root_pgt(cpu);
@@ -1034,7 +1039,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
 
     if ( !(cond_zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
            cond_zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
-           cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu))) )
+           cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu)) &&
+           cond_alloc_cpumask_var(&per_cpu(send_ipi_cpumask, cpu))) )
         goto out;
 
     rc = 0;
@@ -1175,6 +1181,7 @@ void __init smp_prepare_boot_cpu(void)
     cpumask_set_cpu(cpu, &cpu_present_map);
 #if NR_CPUS > 2 * BITS_PER_LONG
     per_cpu(scratch_cpumask, cpu) = &scratch_cpu0mask;
+    per_cpu(send_ipi_cpumask, cpu) = &send_ipi_cpu0mask;
 #endif
 
     get_cpu_info()->use_pv_cr3 = false;
-- 
2.25.0
Re: [Xen-devel] [PATCH v2] xen: make sure stop_machine_run() is always called in a tasklet
On 28.02.2020 10:18, Jürgen Groß wrote:
> On 28.02.20 10:15, Jan Beulich wrote:
>> On 28.02.2020 09:58, Jürgen Groß wrote:
>>> On 28.02.20 09:27, Jan Beulich wrote:
>>>> On 28.02.2020 08:19, Juergen Gross wrote:
>>>>> With core scheduling active it is mandatory for stop_machine_run() to
>>>>> be called in idle context only (so either during boot or in a tasklet),
>>>>> as otherwise a scheduling deadlock would occur: stop_machine_run()
>>>>> does a cpu rendezvous by activating a tasklet on all other cpus. In
>>>>> case stop_machine_run() was not called in an idle vcpu it would block
>>>>> scheduling the idle vcpu on its siblings with core scheduling being
>>>>> active, resulting in a hang.
>>>>>
>>>>> Put a BUG_ON() into stop_machine_run() to test for being called in an
>>>>> idle vcpu only and adapt the missing call site (ucode loading) to use
>>>>> a tasklet for calling stop_machine_run().
>>>>>
>>>>> Signed-off-by: Juergen Gross
>>>>> ---
>>>>> V2:
>>>>> - rephrase commit message (Julien Grall)
>>>>> ---
>>>>>  xen/arch/x86/microcode.c  | 54 +--
>>>>>  xen/common/stop_machine.c |  1 +
>>>>>  2 files changed, 35 insertions(+), 20 deletions(-)
>>>>
>>>> There's no mention anywhere of a connection to your RCU series, nor
>>>> to rcu_barrier(). Yet the change puts a new restriction also on its
>>>> use, and iirc this was mentioned in prior discussion. Did I miss
>>>> anything?
>>>
>>> Basically this patch makes the restriction explicit. Without the patch
>>> stop_machine_run() being called outside of a tasklet would just hang
>>> with core scheduling being active. Now it will catch those violations
>>> early even with core scheduling inactive.
>>>
>>> Note that currently there are no violations of this restriction
>>> anywhere in the hypervisor other than the one addressed by this patch.
>>
>> Well, there is a connection to core scheduling. Without it, i.e.
>> prior to 4.13, there was no restriction on the placement of
>> rcu_barrier() invocations. According to what you say above, the
>> restriction was implicitly introduced with core scheduling. It
>> should imo be made explicit by attaching a comment, which would
>> (again imo) best be done here because now you also make this
>> case crash without core scheduling in use.
>
> Okay, I'll add a comment to stop_machine_run() and to rcu_barrier(). The
> rcu_barrier() comment will be then removed by my rcu series again.

Right - the alternative then would be to call out a dependency of this
patch on that series.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel