Re: [Xen-devel] [PATCH] x86/cpu: Support a new cpu vendor, which is Shanghai
Hi Jan, Yes, I mean with another patch. And Thank you again. -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Wednesday, April 18, 2018 6:52 PM To: Fiona Li(BJ-RD)Cc: xen-devel Subject: RE: RE: [PATCH] x86/cpu: Support a new cpu vendor,which is Shanghai >>> On 18.04.18 at 12:25, wrote: > [FionaLi] : I am sorry. I understood wrongly. The C000 range are > extensions, which provide some additional feature different from > Intel. As you suggested, we will enable those features in guest OSes > and remove these code from patch. Can we support the CPUID leaves and > emulate the MSR with another submit? With another patch you mean? Yes please - if at all possible put separate sets of changes into separate patches, combined into a series if there are dependencies between them. Jan 保密声明: 本邮件含有保密或专有信息,仅供指定收件人使用。严禁对本邮件或其内容做任何未经授权的查阅、使用、复制或转发。 CONFIDENTIAL NOTE: This email contains confidential or legally privileged information and is for the sole use of its intended recipient. Any unauthorized review, use, copying or forwarding of this email or the content of this email is strictly prohibited. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 5/6] arm: add a small kconfig for qemu-system-aarch64
Disable erratas because they don't apply to QEMU's software emulated CPUs. Arbitrarily choose a limit of 4 CPUs. Select the credit and NULL schedulers only. Signed-off-by: Stefano Stabellini--- xen/arch/arm/configs/qemu.config | 81 1 file changed, 81 insertions(+) create mode 100644 xen/arch/arm/configs/qemu.config diff --git a/xen/arch/arm/configs/qemu.config b/xen/arch/arm/configs/qemu.config new file mode 100644 index 000..6351cfd --- /dev/null +++ b/xen/arch/arm/configs/qemu.config @@ -0,0 +1,81 @@ +# +# Automatically generated file; DO NOT EDIT. +# Xen/arm 4.11-unstable Configuration +# +CONFIG_64BIT=y +CONFIG_ARM_64=y +CONFIG_ARM=y +CONFIG_ARCH_DEFCONFIG="arch/arm/configs/arm64_defconfig" + +# +# Architecture Features +# +CONFIG_NR_CPUS=4 +# CONFIG_ACPI is not set +CONFIG_HAS_GICV3=y +# CONFIG_HAS_MEM_ACCESS is not set +# CONFIG_HAS_ITS is not set +# CONFIG_NEW_VGIC is not set +# CONFIG_SBSA_VUART_CONSOLE is not set + +# +# ARM errata workaround via the alternative framework +# +# CONFIG_ARM64_ERRATUM_827319 is not set +# CONFIG_ARM64_ERRATUM_824069 is not set +# CONFIG_ARM64_ERRATUM_819472 is not set +# CONFIG_ARM64_ERRATUM_832075 is not set +# CONFIG_ARM64_ERRATUM_834220 is not set +CONFIG_HARDEN_BRANCH_PREDICTOR=y +CONFIG_ARM64_HARDEN_BRANCH_PREDICTOR=y + +# +# Common Features +# +CONFIG_HAS_ALTERNATIVE=y +CONFIG_HAS_DEVICE_TREE=y +CONFIG_HAS_PDX=y +# CONFIG_TMEM is not set +# CONFIG_XSM is not set + +# +# Schedulers +# +CONFIG_SCHED_CREDIT=y +# CONFIG_SCHED_CREDIT2 is not set +# CONFIG_SCHED_RTDS is not set +# CONFIG_SCHED_ARINC653 is not set +CONFIG_SCHED_NULL=y +CONFIG_SCHED_CREDIT_DEFAULT=y +# CONFIG_SCHED_NULL_DEFAULT is not set +CONFIG_SCHED_DEFAULT="credit" +# CONFIG_LIVEPATCH is not set +# CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS is not set +CONFIG_CMDLINE="" + +# +# Device Drivers +# +# CONFIG_HAS_NS16550 is not set +# CONFIG_HAS_CADENCE_UART is not set +# CONFIG_HAS_MVEBU is not set +CONFIG_HAS_PL011=y +# CONFIG_HAS_SCIF is not set +# CONFIG_HAS_EHCI is not set +CONFIG_HAS_PASSTHROUGH=y +# CONFIG_HAS_SMMUv2 is not set +# CONFIG_VIDEO is not set +# CONFIG_HAS_ARM_HDLCD is not set +CONFIG_DEFCONFIG_LIST="$ARCH_DEFCONFIG" + +# +# Debugging Options +# +# CONFIG_DEBUG is not set +# CONFIG_FRAME_POINTER is not set +# CONFIG_COVERAGE is not set +# CONFIG_LOCK_PROFILE is not set +# CONFIG_PERF_COUNTERS is not set +# CONFIG_VERBOSE_DEBUG is not set +# CONFIG_DEVICE_TREE_DEBUG is not set +# CONFIG_SCRUB_DEBUG is not set -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 0/6] arm: more kconfig configurability and small default configs
Hi all, This patch series is the first step toward building a small certifiable Xen hypervisor for ARM boards. First, the series makes a few changes to allow disabling more kconfig options: most of them already exist but cannot be disabled. Then, it introduces a reference kconfig for Renesas RCar (due to popular demand, candidate for certifications) and for QEMU aarch64 (not for certifications, but useful for debugging). The last patch in the series adds a convenient cloc target to count the total lines of code of the source files built. There a couple of open questions which need to be addressed in regards to which options to enable/disable in the reference kconfig. The primary one is which schedulers to enable. In this series, I enabled NULL and credit, but none of the others. The choice is somewhat arbitrary but driven by the idea that NULL is required for best interrupt latency results, and credit is the default. They can be paired using multiple cpupools. Aside from the schedulers, most other kconfig options seem pretty obvious. I am interested in hearing other opinions about this. Cheers, Stefano Stefano Stabellini (6): arm: make it possible to disable more kconfig options arm: make it possible to enable/disable UART drivers arm: make it possible to disable the SMMU driver arm: add a small kconfig for Renesas RCar H3 arm: add a small kconfig for qemu-system-aarch64 xen: add cloc target xen/Makefile | 14 ++- xen/Rules.mk | 2 + xen/arch/arm/Kconfig | 15 +-- xen/arch/arm/configs/qemu.config | 81 xen/arch/arm/configs/renesas.config | 80 +++ xen/drivers/char/Kconfig | 16 +++ xen/drivers/passthrough/Kconfig | 2 + xen/drivers/passthrough/arm/Kconfig | 7 xen/drivers/passthrough/arm/Makefile | 2 +- xen/drivers/video/Kconfig| 8 +++- 10 files changed, 211 insertions(+), 16 deletions(-) create mode 100644 xen/arch/arm/configs/qemu.config create mode 100644 xen/arch/arm/configs/renesas.config create mode 100644 xen/drivers/passthrough/arm/Kconfig ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCHv2] x86/xen: Remove use of VLAs
On 04/18/2018 01:08 PM, Laura Abbott wrote: > There's an ongoing effort to remove VLAs[1] from the kernel to eventually > turn on -Wvla. It turns out, the few VLAs in use in Xen produce only a > single entry array that is always bounded by GDT_SIZE. Clean up the code to > get rid of the VLA and the loop. > > [1] https://lkml.org/lkml/2018/3/7/621 > > Signed-off-by: Laura Abbott> --- > v2: Updated the code to reflect that we know size is always bounded by > GDT_SIZE. This gets rid of the array and the loop. I can throw a few > more comments in there if someone thinks they need to be updated. > --- > arch/x86/xen/enlighten_pv.c | 84 > - > 1 file changed, 29 insertions(+), 55 deletions(-) > > diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c > index c36d23aa6c35..1254f2fa3a89 100644 > --- a/arch/x86/xen/enlighten_pv.c > +++ b/arch/x86/xen/enlighten_pv.c > @@ -421,45 +421,32 @@ static void xen_load_gdt(const struct desc_ptr *dtr) > { > unsigned long va = dtr->address; > unsigned int size = dtr->size + 1; > - unsigned pages = DIV_ROUND_UP(size, PAGE_SIZE); > - unsigned long frames[pages]; > - int f; > - > - /* > - * A GDT can be up to 64k in size, which corresponds to 8192 > - * 8-byte entries, or 16 4k pages.. > - */ > + unsigned long pfn, mfn; > + int level; > + pte_t *ptep; > + void *virt; > > - BUG_ON(size > 65536); > + BUG_ON(size > GDT_SIZE); I'd probably BUG_ON(size>PAGE_SIZE) because that's what we are really trying to avoid. Maybe with a comment that we expect GDT_SIZE at most, and it is less than PAGE_SIZE. I can fix it while committing if you don't object. Reviewed-by: Boris Ostrovsky ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 4/6] arm: add a small kconfig for Renesas RCar H3
This is a reference tiny kconfig for Renesas RCar. In terms of schedulers, it selects credit and NULL only. It enables all the ARM64 errata. Signed-off-by: Stefano StabelliniCC: artem_myga...@epam.com CC: volodymyr_babc...@epam.com --- This patch is untested on Renesas RCar, please test! Also, I am not sure whether some of the errata workarounds can be disabled on the RCar. --- xen/arch/arm/configs/renesas.config | 80 + 1 file changed, 80 insertions(+) create mode 100644 xen/arch/arm/configs/renesas.config diff --git a/xen/arch/arm/configs/renesas.config b/xen/arch/arm/configs/renesas.config new file mode 100644 index 000..7ad3f1c --- /dev/null +++ b/xen/arch/arm/configs/renesas.config @@ -0,0 +1,80 @@ +# +# Automatically generated file; DO NOT EDIT. +# Xen/arm 4.11-unstable Configuration +# +CONFIG_64BIT=y +CONFIG_ARM_64=y +CONFIG_ARM=y +CONFIG_ARCH_DEFCONFIG="arch/arm/configs/arm64_defconfig" + +# +# Architecture Features +# +CONFIG_NR_CPUS=8 +# CONFIG_ACPI is not set +# CONFIG_HAS_GICV3 is not set +# CONFIG_HAS_MEM_ACCESS is not set +# CONFIG_NEW_VGIC is not set +# CONFIG_SBSA_VUART_CONSOLE is not set + +# +# ARM errata workaround via the alternative framework +# +CONFIG_ARM64_ERRATUM_827319=y +CONFIG_ARM64_ERRATUM_824069=y +CONFIG_ARM64_ERRATUM_819472=y +CONFIG_ARM64_ERRATUM_832075=y +CONFIG_ARM64_ERRATUM_834220=y +CONFIG_HARDEN_BRANCH_PREDICTOR=y +CONFIG_ARM64_HARDEN_BRANCH_PREDICTOR=y + +# +# Common Features +# +CONFIG_HAS_ALTERNATIVE=y +CONFIG_HAS_DEVICE_TREE=y +CONFIG_HAS_PDX=y +# CONFIG_TMEM is not set +# CONFIG_XSM is not set + +# +# Schedulers +# +CONFIG_SCHED_CREDIT=y +# CONFIG_SCHED_CREDIT2 is not set +# CONFIG_SCHED_RTDS is not set +# CONFIG_SCHED_ARINC653 is not set +CONFIG_SCHED_NULL=y +# CONFIG_SCHED_CREDIT_DEFAULT is not set +CONFIG_SCHED_NULL_DEFAULT=y +CONFIG_SCHED_DEFAULT="null" +# CONFIG_LIVEPATCH is not set +# CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS is not set +CONFIG_CMDLINE="" + +# +# Device Drivers +# +# CONFIG_HAS_NS16550 is not set +# CONFIG_HAS_CADENCE_UART is not set +# CONFIG_HAS_MVEBU is not set +# CONFIG_HAS_PL011 is not set +CONFIG_HAS_SCIF=y +# CONFIG_HAS_EHCI is not set +CONFIG_HAS_PASSTHROUGH=y +CONFIG_HAS_SMMUv2=y +# CONFIG_VIDEO is not set +# CONFIG_HAS_ARM_HDLCD is not set +CONFIG_DEFCONFIG_LIST="$ARCH_DEFCONFIG" + +# +# Debugging Options +# +# CONFIG_DEBUG is not set +# CONFIG_FRAME_POINTER is not set +# CONFIG_COVERAGE is not set +# CONFIG_LOCK_PROFILE is not set +# CONFIG_PERF_COUNTERS is not set +# CONFIG_VERBOSE_DEBUG is not set +# CONFIG_DEVICE_TREE_DEBUG is not set +# CONFIG_SCRUB_DEBUG is not set -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 6/6] xen: add cloc target
Add a Xen build target to count the lines of code of the source files built. Uses `cloc' to do the job. Generate the list of source files from the %.o targets, append output to "sourcelist". Remove sourcelist on clean, and also at the beginning of the build target to avoid appending to sourcelist on consequence builds. Otherwise one could imagine sourcelist could become large if the user builds Xen repeatedly without calling clean. For the cloc target, first clean, then build to make sure all files are properly accounted (no partial builds). Signed-off-by: Stefano StabelliniCC: jbeul...@suse.com CC: andrew.coop...@citrix.com --- xen/Makefile | 14 -- xen/Rules.mk | 2 ++ 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/xen/Makefile b/xen/Makefile index 62d479c..f53fd22 100644 --- a/xen/Makefile +++ b/xen/Makefile @@ -48,7 +48,7 @@ else endif .PHONY: _build -_build: $(TARGET)$(CONFIG_XEN_INSTALL_SUFFIX) +_build: clean-sourcelist $(TARGET)$(CONFIG_XEN_INSTALL_SUFFIX) .PHONY: _install _install: D=$(DESTDIR) @@ -108,7 +108,7 @@ _debug: $(OBJDUMP) -D -S $(TARGET)-syms > $(TARGET).s .PHONY: _clean -_clean: delete-unfresh-files +_clean: delete-unfresh-files clean-sourcelist $(MAKE) -C tools clean $(MAKE) -f $(BASEDIR)/Rules.mk -C include clean $(MAKE) -f $(BASEDIR)/Rules.mk -C common clean @@ -267,3 +267,13 @@ $(KCONFIG_CONFIG): include/config/auto.conf.cmd: ; -include $(BASEDIR)/include/config/auto.conf.cmd + +.PHONY: cloc +cloc: $(BASEDIR)/sourcelist + cloc --list-file=$(BASEDIR)/sourcelist + +$(BASEDIR)/sourcelist: clean build + +.PHONY: clean-sourcelist +clean-sourcelist: + rm -f $(BASEDIR)/sourcelist diff --git a/xen/Rules.mk b/xen/Rules.mk index 5337e20..cb48aa7 100644 --- a/xen/Rules.mk +++ b/xen/Rules.mk @@ -190,9 +190,11 @@ _clean_%/: FORCE $(MAKE) -f $(BASEDIR)/Rules.mk -C $* clean %.o: %.c Makefile + echo `pwd`/$< >> $(BASEDIR)/sourcelist $(CC) $(CFLAGS) -c $< -o $@ %.o: %.S Makefile + echo `pwd`/$< >> $(BASEDIR)/sourcelist $(CC) $(AFLAGS) -c $< -o $@ SPECIAL_DATA_SECTIONS := rodata $(foreach a,1 2 4 8 16, \ -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 1/6] arm: make it possible to disable more kconfig options
Make it possible to disable the following existing kconfig options: HAS_GICV3 HAS_ARM_HDLCD HAS_MEM_ACCESS Today they are silent option. This patch adds one line descriptions and make them de/selectable. Also, do not select VIDEO: make HAS_ARM_HDLCD select VIDEO instead. In fact, VIDEO is only needed by HAS_ARM_HDLCD. Signed-off-by: Stefano Stabellini--- xen/arch/arm/Kconfig | 15 +++ xen/drivers/video/Kconfig | 8 +++- 2 files changed, 18 insertions(+), 5 deletions(-) diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig index 8174c0c..83963c8 100644 --- a/xen/arch/arm/Kconfig +++ b/xen/arch/arm/Kconfig @@ -12,17 +12,13 @@ config ARM_32 config ARM_64 def_bool y depends on 64BIT - select HAS_GICV3 config ARM def_bool y select HAS_ALTERNATIVE - select HAS_ARM_HDLCD select HAS_DEVICE_TREE - select HAS_MEM_ACCESS select HAS_PASSTHROUGH select HAS_PDX - select VIDEO config ARCH_DEFCONFIG string @@ -44,6 +40,17 @@ config ACPI config HAS_GICV3 bool + prompt "GICv3 driver" + default y + +config HAS_MEM_ACCESS + bool + prompt "Memory Access and VM events" + default y + ---help--- + + Framework to configure memory access types for guests and receive + related events in userspace. config HAS_ITS bool diff --git a/xen/drivers/video/Kconfig b/xen/drivers/video/Kconfig index 52e8ce6..26daf9a 100644 --- a/xen/drivers/video/Kconfig +++ b/xen/drivers/video/Kconfig @@ -13,4 +13,10 @@ config VGA If unsure, say Y. config HAS_ARM_HDLCD - bool + bool "ARM HDLCD driver" + default n + select VIDEO + ---help--- + Enable Xen video output to ARM HDLCD graphic controllers. + + if unsure, say N. -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 2/6] arm: make it possible to enable/disable UART drivers
All the UART drivers are silent options. Add one line descriptions so that can be de/selected via menuconfig. Signed-off-by: Stefano Stabellini--- xen/drivers/char/Kconfig | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/xen/drivers/char/Kconfig b/xen/drivers/char/Kconfig index cc78ec3..b6fcefd 100644 --- a/xen/drivers/char/Kconfig +++ b/xen/drivers/char/Kconfig @@ -1,11 +1,11 @@ config HAS_NS16550 - bool + bool "NS16550 UART driver" default y help This selects the 16550-series UART support. For most systems, say Y. config HAS_CADENCE_UART - bool + bool "Xilinx Cadence UART driver" default y depends on ARM_64 help @@ -13,7 +13,7 @@ config HAS_CADENCE_UART based board, say Y. config HAS_MVEBU - bool + bool "Marvell MVEBU UART driver" default y depends on ARM_64 help @@ -21,7 +21,7 @@ config HAS_MVEBU based board, say Y. config HAS_PL011 - bool + bool "ARM PL011 UART driver" default y depends on ARM help @@ -29,7 +29,7 @@ config HAS_PL011 an Integrator/PP2, Integrator/CP or Versatile platform, say Y. config HAS_EXYNOS4210 - bool + bool "Samsung Exynos 4210 UART driver" default y depends on ARM_32 help @@ -37,7 +37,7 @@ config HAS_EXYNOS4210 Exynos based board, say Y. config HAS_OMAP - bool + bool "Texas Instruments OMAP UART driver" default y depends on ARM_32 help @@ -45,7 +45,7 @@ config HAS_OMAP Instruments based CPU, say Y. config HAS_SCIF - bool + bool "SuperH SCI(F) UART driver" default y depends on ARM help @@ -53,7 +53,7 @@ config HAS_SCIF or Renesas R-Car Gen 2/3 based board say Y. config HAS_EHCI - bool + bool "EHCI UART driver" help This selects the USB based EHCI debug port to be used as a UART. If you have an x86 based system with USB, say Y. -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 3/6] arm: make it possible to disable the SMMU driver
Introduce a Kconfig option for the ARM SMMUv2 driver. Signed-off-by: Stefano StabelliniCC: jbeul...@suse.com --- xen/drivers/passthrough/Kconfig | 2 ++ xen/drivers/passthrough/arm/Kconfig | 7 +++ xen/drivers/passthrough/arm/Makefile | 2 +- 3 files changed, 10 insertions(+), 1 deletion(-) create mode 100644 xen/drivers/passthrough/arm/Kconfig diff --git a/xen/drivers/passthrough/Kconfig b/xen/drivers/passthrough/Kconfig index 8d90b67..2fe94fa 100644 --- a/xen/drivers/passthrough/Kconfig +++ b/xen/drivers/passthrough/Kconfig @@ -1,3 +1,5 @@ config HAS_PASSTHROUGH bool + +source "drivers/passthrough/arm/Kconfig" diff --git a/xen/drivers/passthrough/arm/Kconfig b/xen/drivers/passthrough/arm/Kconfig new file mode 100644 index 000..98318d9 --- /dev/null +++ b/xen/drivers/passthrough/arm/Kconfig @@ -0,0 +1,7 @@ + +config HAS_SMMUv2 + bool "ARM SMMUv2 driver" + default y + depends on ARM + ---help--- + Driver for the ARM SMMU version 2, a popular IOMMU by ARM. diff --git a/xen/drivers/passthrough/arm/Makefile b/xen/drivers/passthrough/arm/Makefile index f4cd26e..4db273a 100644 --- a/xen/drivers/passthrough/arm/Makefile +++ b/xen/drivers/passthrough/arm/Makefile @@ -1,2 +1,2 @@ obj-y += iommu.o -obj-y += smmu.o +obj-$(HAS_SMMUv2) += smmu.o -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.11] x86/spec_ctrl: Updates to retpoline-safety decision making
All of this is as recommended by the Intel whitepaper: https://software.intel.com/sites/default/files/managed/1d/46/Retpoline-A-Branch-Target-Injection-Mitigation.pdf The RSB-alternative bit in MSR_ARCH_CAPABILITIES may be set by a hypervisor to indicate that the virtual machine may migrate to a processor which isn't retpoline-safe. Introduce a shortened name (to reduce code volume), treat it as authorative in retpoline_safe(), and print its value along with the other ARCH_CAPS bits. The exact processor models which do have RSB semantics which fall back to BTB predictions are enumerated, and include Kabylake and Cannonlake. Leave a printk() in the default case to help identify cases which aren't covered. The exact microcode versions from Broadwell RSB-safety are taken from the referenced microcode update file (adjusting for the known-bad microcode versions). In practice, this means that all Broadwell hardware with up-to-date microcode will use retpoline in preference to IBRS. Signed-off-by: Andrew Cooper--- CC: Jan Beulich CC: Juergen Gross This should be backported to everywhere which has Spectre mitigations, and therefore should be considered for 4.11 at this point. --- xen/arch/x86/spec_ctrl.c| 48 +++-- xen/include/asm-x86/msr-index.h | 1 + 2 files changed, 42 insertions(+), 7 deletions(-) diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c index 5b5ec90..aff06f0 100644 --- a/xen/arch/x86/spec_ctrl.c +++ b/xen/arch/x86/spec_ctrl.c @@ -113,12 +113,13 @@ static void __init print_details(enum ind_thunk thunk) printk(XENLOG_DEBUG "Speculative mitigation facilities:\n"); /* Hardware features which pertain to speculative mitigations. */ -printk(XENLOG_DEBUG " Hardware features:%s%s%s%s%s\n", +printk(XENLOG_DEBUG " Hardware features:%s%s%s%s%s%s\n", (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBRS/IBPB" : "", (_7d0 & cpufeat_mask(X86_FEATURE_STIBP)) ? " STIBP" : "", (e8b & cpufeat_mask(X86_FEATURE_IBPB)) ? " IBPB" : "", (caps & ARCH_CAPABILITIES_IBRS_ALL) ? " IBRS_ALL" : "", - (caps & ARCH_CAPABILITIES_RDCL_NO) ? " RDCL_NO" : ""); + (caps & ARCH_CAPABILITIES_RDCL_NO) ? " RDCL_NO" : "", + (caps & ARCH_CAPS_RSBA) ? " RBSA" : ""); /* Compiled-in support which pertains to BTI mitigations. */ if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) ) @@ -151,6 +152,20 @@ static bool __init retpoline_safe(void) boot_cpu_data.x86 != 6 ) return false; +if ( boot_cpu_has(X86_FEATURE_ARCH_CAPS) ) +{ +uint64_t caps; + +rdmsrl(MSR_ARCH_CAPABILITIES, caps); + +/* + * RBSA may be set by a hypervisor to indicate that we may move to a + * processor which isn't retpoline-safe. + */ +if ( caps & ARCH_CAPS_RSBA ) +return false; +} + switch ( boot_cpu_data.x86_model ) { case 0x17: /* Penryn */ @@ -177,18 +192,37 @@ static bool __init retpoline_safe(void) * versions. */ case 0x3d: /* Broadwell */ -return ucode_rev >= 0x28; +return ucode_rev >= 0x2a; case 0x47: /* Broadwell H */ -return ucode_rev >= 0x1b; +return ucode_rev >= 0x1d; case 0x4f: /* Broadwell EP/EX */ -return ucode_rev >= 0xb25; +return ucode_rev >= 0xb21; case 0x56: /* Broadwell D */ -return false; /* TBD. */ +switch ( boot_cpu_data.x86_mask ) +{ +case 2: return ucode_rev >= 0x15; +case 3: return ucode_rev >= 0x712; +case 4: return ucode_rev >= 0xf11; +case 5: return ucode_rev >= 0xe09; +default: return false; +} +break; /* - * Skylake and later processors are not retpoline-safe. + * Skylake, Kabylake and Cannonlake processors are not retpoline-safe. */ +case 0x4e: +case 0x55: +case 0x5e: +case 0x66: +case 0x67: +case 0x8e: +case 0x9e: +return false; + default: +printk("Unrecognised CPU model %#x - assuming not reptpoline safe\n", + boot_cpu_data.x86_model); return false; } } diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h index 8416756..c9f44eb 100644 --- a/xen/include/asm-x86/msr-index.h +++ b/xen/include/asm-x86/msr-index.h @@ -42,6 +42,7 @@ #define MSR_ARCH_CAPABILITIES 0x010a #define ARCH_CAPABILITIES_RDCL_NO (_AC(1, ULL) << 0) #define ARCH_CAPABILITIES_IBRS_ALL (_AC(1, ULL) << 1) +#define ARCH_CAPS_RSBA (_AC(1, ULL) << 2) /* Intel MSRs. Some also available on other CPUs */ #define MSR_IA32_PERFCTR0 0x00c1 -- 2.1.4 ___
Re: [Xen-devel] Setting up a call to discuss PCI Emulation - Future Direction
On Sat, 14 Apr 2018, Paul Durrant wrote: > > -Original Message- > > From: Roger Pau Monne > > Sent: 13 April 2018 20:53 > > To: Lars Kurth> > Cc: xen-devel ; Daniel Smith > > ; Alexey G ; Stefano > > Stabellini ; Julien Grall ; > > Paul > > Durrant ; Christopher Clark > > ; Rich Persaud > > Subject: Re: Setting up a call to discuss PCI Emulation - Future Direction > > > > On Fri, Apr 13, 2018 at 12:59:15PM +0100, Lars Kurth wrote: > > > > > > > > > On 13/04/2018, 11:01, "Roger Pau Monne" wrote: > > > > > > On Thu, Apr 12, 2018 at 05:50:00PM +0100, Lars Kurth wrote: > > > > > > > > > > > > On 12/04/2018, 17:41, "Roger Pau Monne" > > wrote: > > > > > > > > On Thu, Apr 12, 2018 at 05:32:57PM +0100, Lars Kurth wrote: > > > > > > > > > > > > >may work. For me Mon, Wed and Fri’s generally work at those > > time-slots. > > > > >Next week is a little busy for me, so I would prefer the > > > following > > week. > > > > >If you could fill out the following Google poll, if this > > > week works > > that > > > > >would be great. Otherwise please scream. > > > > > > > > I'm afraid I'm on vacations from the 21st to the 29th of April, > > > so I > > > > won't be able to join the meeting unless we move it to the week > > after. > > > > Let's see what people think of the current dates. > > > > > > > > Roger. > > > > > > > > Hi, I changed the dates to the week after. Poll so far has been > > invalidated. > > > > > > > > See https://doodle.com/poll/gdnmcrvnibmw563n > > > > > > Thanks! I've already fixed my vote. > > > > > > I guess this will come later, but we need a clear agenda of items > > > because the x86 and ARM topics are probably going to be completely > > > different (albeit all related to PCI). > > > > > > Royger: I am OK with trying to get a draft agenda in place in this e-mail > > thread. But I can't drive this, as I don’t understand the issues. But I am > > happy > > to collate everything as for the x86 call and write up minutes > > > > On the x86 side: > > > > - Q35 HVM emulation, adding MCFG support to guests. > > One of the issues here is support for multiple external emulators w.r.t. > config space access. > > > - PVH guest pci-passthrough: using the internal vPCI infrastructure. > > Let's not distinguish between HVM or PVH here. We want to use the same > passthrough mechanisms for both. Ideally, we'll use the same mechanism for ARM too.___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCHv2] x86/xen: Remove use of VLAs
There's an ongoing effort to remove VLAs[1] from the kernel to eventually turn on -Wvla. It turns out, the few VLAs in use in Xen produce only a single entry array that is always bounded by GDT_SIZE. Clean up the code to get rid of the VLA and the loop. [1] https://lkml.org/lkml/2018/3/7/621 Signed-off-by: Laura Abbott--- v2: Updated the code to reflect that we know size is always bounded by GDT_SIZE. This gets rid of the array and the loop. I can throw a few more comments in there if someone thinks they need to be updated. --- arch/x86/xen/enlighten_pv.c | 84 - 1 file changed, 29 insertions(+), 55 deletions(-) diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index c36d23aa6c35..1254f2fa3a89 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -421,45 +421,32 @@ static void xen_load_gdt(const struct desc_ptr *dtr) { unsigned long va = dtr->address; unsigned int size = dtr->size + 1; - unsigned pages = DIV_ROUND_UP(size, PAGE_SIZE); - unsigned long frames[pages]; - int f; - - /* -* A GDT can be up to 64k in size, which corresponds to 8192 -* 8-byte entries, or 16 4k pages.. -*/ + unsigned long pfn, mfn; + int level; + pte_t *ptep; + void *virt; - BUG_ON(size > 65536); + BUG_ON(size > GDT_SIZE); BUG_ON(va & ~PAGE_MASK); - for (f = 0; va < dtr->address + size; va += PAGE_SIZE, f++) { - int level; - pte_t *ptep; - unsigned long pfn, mfn; - void *virt; - - /* -* The GDT is per-cpu and is in the percpu data area. -* That can be virtually mapped, so we need to do a -* page-walk to get the underlying MFN for the -* hypercall. The page can also be in the kernel's -* linear range, so we need to RO that mapping too. -*/ - ptep = lookup_address(va, ); - BUG_ON(ptep == NULL); - - pfn = pte_pfn(*ptep); - mfn = pfn_to_mfn(pfn); - virt = __va(PFN_PHYS(pfn)); + /* +* The GDT is per-cpu and is in the percpu data area. +* That can be virtually mapped, so we need to do a +* page-walk to get the underlying MFN for the +* hypercall. The page can also be in the kernel's +* linear range, so we need to RO that mapping too. +*/ + ptep = lookup_address(va, ); + BUG_ON(ptep == NULL); - frames[f] = mfn; + pfn = pte_pfn(*ptep); + mfn = pfn_to_mfn(pfn); + virt = __va(PFN_PHYS(pfn)); - make_lowmem_page_readonly((void *)va); - make_lowmem_page_readonly(virt); - } + make_lowmem_page_readonly((void *)va); + make_lowmem_page_readonly(virt); - if (HYPERVISOR_set_gdt(frames, size / sizeof(struct desc_struct))) + if (HYPERVISOR_set_gdt(, size / sizeof(struct desc_struct))) BUG(); } @@ -470,34 +457,21 @@ static void __init xen_load_gdt_boot(const struct desc_ptr *dtr) { unsigned long va = dtr->address; unsigned int size = dtr->size + 1; - unsigned pages = DIV_ROUND_UP(size, PAGE_SIZE); - unsigned long frames[pages]; - int f; - - /* -* A GDT can be up to 64k in size, which corresponds to 8192 -* 8-byte entries, or 16 4k pages.. -*/ + unsigned long pfn, mfn; + pte_t pte; - BUG_ON(size > 65536); + BUG_ON(size > GDT_SIZE); BUG_ON(va & ~PAGE_MASK); - for (f = 0; va < dtr->address + size; va += PAGE_SIZE, f++) { - pte_t pte; - unsigned long pfn, mfn; + pfn = virt_to_pfn(va); + mfn = pfn_to_mfn(pfn); - pfn = virt_to_pfn(va); - mfn = pfn_to_mfn(pfn); + pte = pfn_pte(pfn, PAGE_KERNEL_RO); - pte = pfn_pte(pfn, PAGE_KERNEL_RO); - - if (HYPERVISOR_update_va_mapping((unsigned long)va, pte, 0)) - BUG(); - - frames[f] = mfn; - } + if (HYPERVISOR_update_va_mapping((unsigned long)va, pte, 0)) + BUG(); - if (HYPERVISOR_set_gdt(frames, size / sizeof(struct desc_struct))) + if (HYPERVISOR_set_gdt(, size / sizeof(struct desc_struct))) BUG(); } -- 2.14.3 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver
On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko wrote: > On 04/17/2018 11:57 PM, Dongwon Kim wrote: > >On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote: > >>On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote: > >>>Yeah, I definitely agree on the idea of expanding the use case to the > >>>general domain where dmabuf sharing is used. However, what you are > >>>targetting with proposed changes is identical to the core design of > >>>hyper_dmabuf. > >>> > >>>On top of this basic functionalities, hyper_dmabuf has driver level > >>>inter-domain communication, that is needed for dma-buf remote tracking > >>>(no fence forwarding though), event triggering and event handling, extra > >>>meta data exchange and hyper_dmabuf_id that represents grefs > >>>(grefs are shared implicitly on driver level) > >>This really isn't a positive design aspect of hyperdmabuf imo. The core > >>code in xen-zcopy (ignoring the ioctl side, which will be cleaned up) is > >>very simple & clean. > >> > >>If there's a clear need later on we can extend that. But for now xen-zcopy > >>seems to cover the basic use-case needs, so gets the job done. > >> > >>>Also it is designed with frontend (common core framework) + backend > >>>(hyper visor specific comm and memory sharing) structure for portability. > >>>We just can't limit this feature to Xen because we want to use the same > >>>uapis not only for Xen but also other applicable hypervisor, like ACORN. > >>See the discussion around udmabuf and the needs for kvm. I think trying to > >>make an ioctl/uapi that works for multiple hypervisors is misguided - it > >>likely won't work. > >> > >>On top of that the 2nd hypervisor you're aiming to support is ACRN. That's > >>not even upstream yet, nor have I seen any patches proposing to land linux > >>support for ACRN. Since it's not upstream, it doesn't really matter for > >>upstream consideration. I'm doubting that ACRN will use the same grant > >>references as xen, so the same uapi won't work on ACRN as on Xen anyway. > >Yeah, ACRN doesn't have grant-table. Only Xen supports it. But that is why > >hyper_dmabuf has been architectured with the concept of backend. > >If you look at the structure of backend, you will find that > >backend is just a set of standard function calls as shown here: > > > >struct hyper_dmabuf_bknd_ops { > > /* backend initialization routine (optional) */ > > int (*init)(void); > > > > /* backend cleanup routine (optional) */ > > int (*cleanup)(void); > > > > /* retreiving id of current virtual machine */ > > int (*get_vm_id)(void); > > > > /* get pages shared via hypervisor-specific method */ > > int (*share_pages)(struct page **pages, int vm_id, > >int nents, void **refs_info); > > > > /* make shared pages unshared via hypervisor specific method */ > > int (*unshare_pages)(void **refs_info, int nents); > > > > /* map remotely shared pages on importer's side via > > * hypervisor-specific method > > */ > > struct page ** (*map_shared_pages)(unsigned long ref, int vm_id, > >int nents, void **refs_info); > > > > /* unmap and free shared pages on importer's side via > > * hypervisor-specific method > > */ > > int (*unmap_shared_pages)(void **refs_info, int nents); > > > > /* initialize communication environment */ > > int (*init_comm_env)(void); > > > > void (*destroy_comm)(void); > > > > /* upstream ch setup (receiving and responding) */ > > int (*init_rx_ch)(int vm_id); > > > > /* downstream ch setup (transmitting and parsing responses) */ > > int (*init_tx_ch)(int vm_id); > > > > int (*send_req)(int vm_id, struct hyper_dmabuf_req *req, int wait); > >}; > > > >All of these can be mapped with any hypervisor specific implementation. > >We designed backend implementation for Xen using grant-table, Xen event > >and ring buffer communication. For ACRN, we have another backend using > >Virt-IO > >for both memory sharing and communication. > > > >We tried to define this structure of backend to make it general enough (or > >it can be even modified or extended to support more cases.) so that it can > >fit to other hypervisor cases. Only requirements/expectation on the > >hypervisor > >are page-level memory sharing and inter-domain communication, which I think > >are standard features of modern hypervisor. > > > >And please review common UAPIs that hyper_dmabuf and xen-zcopy supports. They > >are very general. One is getting FD (dmabuf) and get those shared. The other > >is generating dmabuf from global handle (secure handle hiding gref behind > >it). > >On top of this, hyper_dmabuf has "unshare" and "query" which are also useful > >for any cases. > > > >So I don't know why we wouldn't want to try to make these
Re: [Xen-devel] [PATCH for-4.11] x86: Use spec_ctrl_{enter, exit}_idle() in the S3/S5 path
On 18/04/18 17:14, Jan Beulich wrote: On 18.04.18 at 18:02,wrote: >> On 18/04/18 16:43, Jan Beulich wrote: >> On 18.04.18 at 12:43, wrote: This avoids opencoding the functionality (and missing one bit of it), and and some comments explaining what is going on. >>> Missing which bit of it? The MSR writes aren't strictly necessary afaict, >>> and >>> functionally clearing bti_ist_info is all that's needed for the entry path >>> of >>> interest, while clearing use_shadow_spec_ctrl is all that's needed for the >>> exit-to-Xen path. Hence I'm don't see (yet) what bug it is you think this >>> fixes. >> It is expected that we don't hand off control to the firmware in cases >> like this with the mitigations in place. Here, it doesn't actually >> matter too much, but there is no point having the mitigations in effect >> when we're the final CPU and trying to shut down. > With an improved description the patch can certainly have my R-b; the > code change itself is clearly fine with me. How about: The main purpose of this patch is to avoid opencoding the recovery logic at the end, but also has the positive side effect of relaxing the SPEC_CTRL mitigations when working to shut the final CPU down. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.11] x86: Use spec_ctrl_{enter, exit}_idle() in the S3/S5 path
>>> On 18.04.18 at 18:02,wrote: > On 18/04/18 16:43, Jan Beulich wrote: > On 18.04.18 at 12:43, wrote: >>> This avoids opencoding the functionality (and missing one bit of it), and >>> and >>> some comments explaining what is going on. >> Missing which bit of it? The MSR writes aren't strictly necessary afaict, and >> functionally clearing bti_ist_info is all that's needed for the entry path of >> interest, while clearing use_shadow_spec_ctrl is all that's needed for the >> exit-to-Xen path. Hence I'm don't see (yet) what bug it is you think this >> fixes. > > It is expected that we don't hand off control to the firmware in cases > like this with the mitigations in place. Here, it doesn't actually > matter too much, but there is no point having the mitigations in effect > when we're the final CPU and trying to shut down. With an improved description the patch can certainly have my R-b; the code change itself is clearly fine with me. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v8 1/9] x86/xpti: avoid copying L4 page table contents when possible
>>> On 18.04.18 at 10:30,wrote: > @@ -160,5 +161,20 @@ unsigned int flush_area_local(const void *va, unsigned > int flags) > > local_irq_restore(irqfl); > > +if ( flags & FLUSH_ROOT_PGTBL ) > +get_cpu_info()->root_pgt_changed = true; > + > return flags; > } > + > +void flush_root_pgt_mask(cpumask_t *mask) const > +{ > +int cpu; unsigned int > +struct cpu_info *cpu_info; Declaration in most narrow possible scope please. > +for_each_cpu(cpu, mask) > +{ > +cpu_info = (struct cpu_info *)(stack_base[cpu] + STACK_SIZE) - 1; > +cpu_info->root_pgt_changed = true; > +} > +} So why is this not sending an IPI to trigger the FLUSH_ROOT_PGTBL processing visible above? Especially the cast you need here looks quite undesirable. Plus I think this is racy (even if quite unlikely to trigger) with a CPU being brought down: What if stack_base[cpu] becomes NULL after you've found the respective bit in mask set. Especially when Xen runs itself virtualized, (almost) arbitrarily long periods of time may pass between any two instructions. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v8 3/9] xen/x86: support per-domain flag for xpti
>>> On 18.04.18 at 17:54,wrote: > On 18/04/18 17:45, Jan Beulich wrote: > On 18.04.18 at 17:33, wrote: >>> On 18/04/18 17:29, Jan Beulich wrote: >>> On 18.04.18 at 10:30, wrote: > --- a/xen/arch/x86/mm/shadow/multi.c > +++ b/xen/arch/x86/mm/shadow/multi.c > @@ -967,7 +967,7 @@ static int shadow_set_l4e(struct domain *d, > sh_put_ref(d, osl3mfn, paddr); > } > > -if ( !cpu_has_no_xpti ) > +if ( is_pv_domain(d) && d->arch.pv_domain.xpti ) > /* > * Lazy flushing is enough: either we do a TLB flush right > afterwards > * which will pick up the new root page table on all affected > cpus How come the is_pv_domain() is appearing only here? >>> >>> It is mandatory for testing the per-domain xpti flag. I could add it in >>> patch 1 already if you like that better. >> >> Well, if you added it earlier, some unnecessary IPIs would be suppressed >> right away. > > Which IPIs? There is no IPI involved here. We are just setting the flags > from the current cpu for all cpus which need to picḱ it up. That's the > reason for the comment regarding "lazy flushing". Oh, I did assume flush_root_pgt_mask() would be a function that I had seen in prior versions, invoking the processing of FLUSH_ROOT_PGTBL on remote CPUs. Clearly with the introduction of a new function you should have dropped my R-b; I'll comment there. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.11] x86: Use spec_ctrl_{enter, exit}_idle() in the S3/S5 path
On 18/04/18 16:43, Jan Beulich wrote: On 18.04.18 at 12:43,wrote: >> This avoids opencoding the functionality (and missing one bit of it), and and >> some comments explaining what is going on. > Missing which bit of it? The MSR writes aren't strictly necessary afaict, and > functionally clearing bti_ist_info is all that's needed for the entry path of > interest, while clearing use_shadow_spec_ctrl is all that's needed for the > exit-to-Xen path. Hence I'm don't see (yet) what bug it is you think this > fixes. > > Also s/and$/add/. I hoped noone else would notice. Already found and fixed. It is expected that we don't hand off control to the firmware in cases like this with the mitigations in place. Here, it doesn't actually matter too much, but there is no point having the mitigations in effect when we're the final CPU and trying to shut down. > >> --- a/xen/arch/x86/acpi/power.c >> +++ b/xen/arch/x86/acpi/power.c >> @@ -213,7 +213,8 @@ static int enter_state(u32 state) >> error = 0; >> >> ci = get_cpu_info(); >> -ci->use_shadow_spec_ctrl = 0; >> +spec_ctrl_enter_idle(ci); >> +/* Avoid NMI/#MC using MSR_SPEC_CTRL until we've reloaded microcode. */ >> ci->bti_ist_info = 0; >> >> ACPI_FLUSH_CPU_CACHE(); >> @@ -257,10 +258,9 @@ static int enter_state(u32 state) >> if ( !recheck_cpu_features(0) ) >> panic("Missing previously available feature(s)."); >> >> +/* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */ >> ci->bti_ist_info = default_bti_ist_info; >> -asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET) >> - :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0) >> - : "memory"); >> +spec_ctrl_exit_idle(ci); > The use of "idle" methods is a slight abuse here (fundamentally "idle" is > about C states), but yes, perhaps in a way a system in S3/S5 can be > viewed sort of (in a heavyweight way) idle. That is a matter of interpretation of "idle" and the use here is fine. Either way, the behaviour WRT MSR_SPEC_CTRL is identical whether we are talking about C states or S states, and the purpose of the change is to not opencode the recovery logic. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5] x86/setup: properly update PTEs if src/dst overlaps when relocating Xen image
>>> On 18.04.18 at 12:26,wrote: > @@ -1019,6 +1020,12 @@ void __init noreturn __start_xen(unsigned long mbi_p) > bootsym(trampoline_xen_phys_start) = e; > > /* > + * All PTEs with PFNs above pte_update_limit > + * were updated earlier. Skip them. > + */ > +pte_update_limit = PFN_DOWN(e + XEN_IMG_OFFSET); I don't understand the comment: No PTE updates happen before this point afaict. It is just that PTEs pointing above that address are not candidates for relocation. I think the comment should at least mention the overlap scenario your trying to deal with, with the important point being that there may actually be PTEs pointing into [e, e + XEN_IMG_OFFSET). The actual code adjustments look fine to me now, albeit I wonder whether >= wouldn't be more appropriate to use. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v8 3/9] xen/x86: support per-domain flag for xpti
>>> On 18.04.18 at 17:33,wrote: > On 18/04/18 17:29, Jan Beulich wrote: > On 18.04.18 at 10:30, wrote: >>> --- a/xen/arch/x86/mm/shadow/multi.c >>> +++ b/xen/arch/x86/mm/shadow/multi.c >>> @@ -967,7 +967,7 @@ static int shadow_set_l4e(struct domain *d, >>> sh_put_ref(d, osl3mfn, paddr); >>> } >>> >>> -if ( !cpu_has_no_xpti ) >>> +if ( is_pv_domain(d) && d->arch.pv_domain.xpti ) >>> /* >>> * Lazy flushing is enough: either we do a TLB flush right >>> afterwards >>> * which will pick up the new root page table on all affected cpus >> >> How come the is_pv_domain() is appearing only here? > > It is mandatory for testing the per-domain xpti flag. I could add it in > patch 1 already if you like that better. Well, if you added it earlier, some unnecessary IPIs would be suppressed right away. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.11] x86: Use spec_ctrl_{enter, exit}_idle() in the S3/S5 path
>>> On 18.04.18 at 12:43,wrote: > This avoids opencoding the functionality (and missing one bit of it), and and > some comments explaining what is going on. Missing which bit of it? The MSR writes aren't strictly necessary afaict, and functionally clearing bti_ist_info is all that's needed for the entry path of interest, while clearing use_shadow_spec_ctrl is all that's needed for the exit-to-Xen path. Hence I'm don't see (yet) what bug it is you think this fixes. Also s/and$/add/. > --- a/xen/arch/x86/acpi/power.c > +++ b/xen/arch/x86/acpi/power.c > @@ -213,7 +213,8 @@ static int enter_state(u32 state) > error = 0; > > ci = get_cpu_info(); > -ci->use_shadow_spec_ctrl = 0; > +spec_ctrl_enter_idle(ci); > +/* Avoid NMI/#MC using MSR_SPEC_CTRL until we've reloaded microcode. */ > ci->bti_ist_info = 0; > > ACPI_FLUSH_CPU_CACHE(); > @@ -257,10 +258,9 @@ static int enter_state(u32 state) > if ( !recheck_cpu_features(0) ) > panic("Missing previously available feature(s)."); > > +/* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */ > ci->bti_ist_info = default_bti_ist_info; > -asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET) > - :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0) > - : "memory"); > +spec_ctrl_exit_idle(ci); The use of "idle" methods is a slight abuse here (fundamentally "idle" is about C states), but yes, perhaps in a way a system in S3/S5 can be viewed sort of (in a heavyweight way) idle. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v8 9/9] xen/x86: use PCID feature
On 18/04/18 17:32, Jan Beulich wrote: On 18.04.18 at 11:37,wrote: >> On 18/04/18 11:13, Jan Beulich wrote: >> On 18.04.18 at 10:30, wrote: Avoid flushing the complete TLB when switching %cr3 for mitigation of Meltdown by using the PCID feature if available. We are using 4 PCID values for a 64 bit pv domain subject to XPTI and 2 values for the non-XPTI case: - guest active and in kernel mode - guest active and in user mode - hypervisor active and guest in user mode (XPTI only) - hypervisor active and guest in kernel mode (XPTI only) We use PCID only if PCID _and_ INVPCID are supported. With PCID in use we disable global pages in cr4. A command line parameter controls in which cases PCID is being used. As the non-XPTI case has shown not to perform better with PCID at least on some machines the default is to use PCID only for domains subject to XPTI. With PCID enabled we always disable global pages. This avoids having to either flush the complete TLB or do a cycle through all PCID values when invalidating a single global page. Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich --- V6.1: - address some minor comments (Jan Beulich) >>> >>> No v7 changes? Afaict ... >>> @@ -717,7 +718,7 @@ int __init dom0_construct_pv(struct domain *d, update_cr3(v); /* We run on dom0's page tables for the final part of the build >> process. */ -switch_cr3_cr4(v->arch.cr3, read_cr4()); +switch_cr3_cr4(cr3_pa(v->arch.cr3), read_cr4()); >>> >>> ... at least this was added. And to be honest I'm not convinced cr3_pa() is >>> the right construct to use here: It's neither clear why the other bits don't >>> matter at this point, nor why any of the bits that you mask out this way >>> need masking out in the first place (e.g. why, in the crash case that Andrew >>> had observed, the noflush bit was wrongly set). >> >> At this point in time we only want to use another cr3 value. So PCIDs >> should _not_ be used as this would require adapting cr4, resulting in >> writing the cr3_pa() bits only. It would have been possible to use >> write_cr3() here, but only together with open coding a TLB flush in >> order to avoid any global pages remaining in the TLB. So I decided to >> use switch_cr3_cr4(). >> >> The noflush bit was set as dom0 is subject to XPTI and PCID usage and >> update_cr3() is updating v->arch.cr3 accordingly. >> >> I know you are not fond of setting noflush in v->arch.cr3. The only >> sensible alternative to that would be adding something like >> v->arch.cr3_pcid_bits being or-ed to the cr3 value in restore_all_guest. >> This would let us get rid of having to use cr3_pa() in some places >> while adding a (very little) performance penalty. > > Well, I had accepted the current approach as the (performance wise) > better alternative, but with Andrew sharing that original concern of mine, > and with there having been a first real problem resulting from that > approach, more seriously considering the alternative certainly seems > necessary. Andrew? I'm fine with both variants. I just need to know which one will be accepted. :-) Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v8 9/9] xen/x86: use PCID feature
>>> On 18.04.18 at 11:37,wrote: > On 18/04/18 11:13, Jan Beulich wrote: > On 18.04.18 at 10:30, wrote: >>> Avoid flushing the complete TLB when switching %cr3 for mitigation of >>> Meltdown by using the PCID feature if available. >>> >>> We are using 4 PCID values for a 64 bit pv domain subject to XPTI and >>> 2 values for the non-XPTI case: >>> >>> - guest active and in kernel mode >>> - guest active and in user mode >>> - hypervisor active and guest in user mode (XPTI only) >>> - hypervisor active and guest in kernel mode (XPTI only) >>> >>> We use PCID only if PCID _and_ INVPCID are supported. With PCID in use >>> we disable global pages in cr4. A command line parameter controls in >>> which cases PCID is being used. >>> >>> As the non-XPTI case has shown not to perform better with PCID at least >>> on some machines the default is to use PCID only for domains subject to >>> XPTI. >>> >>> With PCID enabled we always disable global pages. This avoids having to >>> either flush the complete TLB or do a cycle through all PCID values >>> when invalidating a single global page. >>> >>> Signed-off-by: Juergen Gross >>> Reviewed-by: Jan Beulich >>> --- >>> V6.1: >>> - address some minor comments (Jan Beulich) >> >> No v7 changes? Afaict ... >> >>> @@ -717,7 +718,7 @@ int __init dom0_construct_pv(struct domain *d, >>> update_cr3(v); >>> >>> /* We run on dom0's page tables for the final part of the build > process. */ >>> -switch_cr3_cr4(v->arch.cr3, read_cr4()); >>> +switch_cr3_cr4(cr3_pa(v->arch.cr3), read_cr4()); >> >> ... at least this was added. And to be honest I'm not convinced cr3_pa() is >> the right construct to use here: It's neither clear why the other bits don't >> matter at this point, nor why any of the bits that you mask out this way >> need masking out in the first place (e.g. why, in the crash case that Andrew >> had observed, the noflush bit was wrongly set). > > At this point in time we only want to use another cr3 value. So PCIDs > should _not_ be used as this would require adapting cr4, resulting in > writing the cr3_pa() bits only. It would have been possible to use > write_cr3() here, but only together with open coding a TLB flush in > order to avoid any global pages remaining in the TLB. So I decided to > use switch_cr3_cr4(). > > The noflush bit was set as dom0 is subject to XPTI and PCID usage and > update_cr3() is updating v->arch.cr3 accordingly. > > I know you are not fond of setting noflush in v->arch.cr3. The only > sensible alternative to that would be adding something like > v->arch.cr3_pcid_bits being or-ed to the cr3 value in restore_all_guest. > This would let us get rid of having to use cr3_pa() in some places > while adding a (very little) performance penalty. Well, I had accepted the current approach as the (performance wise) better alternative, but with Andrew sharing that original concern of mine, and with there having been a first real problem resulting from that approach, more seriously considering the alternative certainly seems necessary. Andrew? >> I hope there aren't any other such hidden changes in this version of the >> series. > > Oh sorry, I missed to update the history. > > I'm not aware of other changes in v8 (apart from patch 1 and the > resulting needed change in patch 3). Ah - I hadn't spotted that follow-on change, and I've just sent a small comment in its regard. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 0/5] ALSA: xen-front: Add Xen para-virtualized frontend driver
On 04/16/2018 09:24 AM, Oleksandr Andrushchenko wrote: From: Oleksandr AndrushchenkoPlease note: this patch series depends on [3]. The dependency is now merged into Xen kernel tree [4] for-linus-4.17 This patch series adds support for Xen [1] para-virtualized sound frontend driver. It implements the protocol from include/xen/interface/io/sndif.h with the following limitations: - mute/unmute is not supported - get/set volume is not supported Volume control is not supported for the reason that most of the use-cases (at the moment) are based on scenarious where unprivileged OS (e.g. Android, AGL etc) use software mixers. Both capture and playback are supported. Corresponding backend, implemented as a user-space application, can be found at [2]. Thank you, Oleksandr Changes since v1: * 1. Moved driver from sound/drivers to sound/xen 2. Coding style changes to better meet Linux Kernel 3. Added explicit back and front synchronization In order to provide explicit synchronization between backend and frontend the following changes are introduced in the protocol: - add new ring buffer for sending asynchronous events from backend to frontend to report number of bytes played by the frontend (XENSND_EVT_CUR_POS) - introduce trigger events for playback control: start/stop/pause/resume - add "req-" prefix to event-channel and ring-ref to unify naming of the Xen event channels for requests and events 4. Added explicit back and front parameter negotiation In order to provide explicit stream parameter negotiation between backend and frontend the following changes are introduced in the protocol: add XENSND_OP_HW_PARAM_QUERY request to read/update configuration space for the parameters given: request passes desired parameter's intervals/masks and the response to this request returns allowed min/max intervals/masks to be used. [1] https://xenproject.org/ [2] https://github.com/xen-troops/snd_be [3] https://lkml.org/lkml/2018/4/12/522 Oleksandr Andrushchenko (5): ALSA: xen-front: Introduce Xen para-virtualized sound frontend driver ALSA: xen-front: Read sound driver configuration from Xen store ALSA: xen-front: Implement Xen event channel handling ALSA: xen-front: Implement handling of shared buffers ALSA: xen-front: Implement ALSA virtual sound driver sound/Kconfig | 2 + sound/Makefile| 2 +- sound/xen/Kconfig | 10 + sound/xen/Makefile| 9 + sound/xen/xen_snd_front.c | 410 +++ sound/xen/xen_snd_front.h | 57 +++ sound/xen/xen_snd_front_alsa.c| 830 ++ sound/xen/xen_snd_front_alsa.h| 23 ++ sound/xen/xen_snd_front_cfg.c | 517 sound/xen/xen_snd_front_cfg.h | 46 +++ sound/xen/xen_snd_front_evtchnl.c | 478 ++ sound/xen/xen_snd_front_evtchnl.h | 92 + sound/xen/xen_snd_front_shbuf.c | 193 + sound/xen/xen_snd_front_shbuf.h | 36 ++ 14 files changed, 2704 insertions(+), 1 deletion(-) create mode 100644 sound/xen/Kconfig create mode 100644 sound/xen/Makefile create mode 100644 sound/xen/xen_snd_front.c create mode 100644 sound/xen/xen_snd_front.h create mode 100644 sound/xen/xen_snd_front_alsa.c create mode 100644 sound/xen/xen_snd_front_alsa.h create mode 100644 sound/xen/xen_snd_front_cfg.c create mode 100644 sound/xen/xen_snd_front_cfg.h create mode 100644 sound/xen/xen_snd_front_evtchnl.c create mode 100644 sound/xen/xen_snd_front_evtchnl.h create mode 100644 sound/xen/xen_snd_front_shbuf.c create mode 100644 sound/xen/xen_snd_front_shbuf.h [4] https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git/commit/?h=for-linus-4.17=cd6e992b3aab072cc90839508aaf5573c8f7e066 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] Input: xen-kbdfront - allow better run-time configuration
From: Oleksandr AndrushchenkoIt is now only possible to control if multi-touch virtual device is created or not (via the corresponding XenStore entries), but keyboard and pointer devices are always created. In some cases this is not desirable. For example, if virtual keyboard device is exposed to Android then the latter won't automatically show on-screen keyboard as it expects that a physical keyboard device can be used for typing. Make it possible to configure which virtual devices are created with module parameters: - no_ptr_dev=1 if no pointer device needs to be created - no_kbd_dev=1 if no keyboard device needs to be created Keep old behavior by default. Signed-off-by: Oleksandr Andrushchenko Suggested-by: Andrii Chepurnyi Tested-by: Andrii Chepurnyi --- drivers/input/misc/xen-kbdfront.c | 159 +- 1 file changed, 92 insertions(+), 67 deletions(-) diff --git a/drivers/input/misc/xen-kbdfront.c b/drivers/input/misc/xen-kbdfront.c index d91f3b1c5375..a3306aad40b0 100644 --- a/drivers/input/misc/xen-kbdfront.c +++ b/drivers/input/misc/xen-kbdfront.c @@ -51,6 +51,16 @@ module_param_array(ptr_size, int, NULL, 0444); MODULE_PARM_DESC(ptr_size, "Pointing device width, height in pixels (default 800,600)"); +static unsigned int no_ptr_dev; +module_param(no_ptr_dev, uint, 0); +MODULE_PARM_DESC(no_ptr_dev, + "If set then no virtual pointing device exposed to the guest"); + +static unsigned int no_kbd_dev; +module_param(no_kbd_dev, uint, 0); +MODULE_PARM_DESC(no_kbd_dev, + "If set then no virtual keyboard device exposed to the guest"); + static int xenkbd_remove(struct xenbus_device *); static int xenkbd_connect_backend(struct xenbus_device *, struct xenkbd_info *); static void xenkbd_disconnect_backend(struct xenkbd_info *); @@ -63,6 +73,9 @@ static void xenkbd_disconnect_backend(struct xenkbd_info *); static void xenkbd_handle_motion_event(struct xenkbd_info *info, struct xenkbd_motion *motion) { + if (unlikely(!info->ptr)) + return; + input_report_rel(info->ptr, REL_X, motion->rel_x); input_report_rel(info->ptr, REL_Y, motion->rel_y); if (motion->rel_z) @@ -73,6 +86,9 @@ static void xenkbd_handle_motion_event(struct xenkbd_info *info, static void xenkbd_handle_position_event(struct xenkbd_info *info, struct xenkbd_position *pos) { + if (unlikely(!info->ptr)) + return; + input_report_abs(info->ptr, ABS_X, pos->abs_x); input_report_abs(info->ptr, ABS_Y, pos->abs_y); if (pos->rel_z) @@ -97,6 +113,9 @@ static void xenkbd_handle_key_event(struct xenkbd_info *info, return; } + if (unlikely(!dev)) + return; + input_event(dev, EV_KEY, key->keycode, value); input_sync(dev); } @@ -192,7 +211,7 @@ static int xenkbd_probe(struct xenbus_device *dev, const struct xenbus_device_id *id) { int ret, i; - unsigned int abs, touch; + unsigned int touch; struct xenkbd_info *info; struct input_dev *kbd, *ptr, *mtouch; @@ -211,24 +230,6 @@ static int xenkbd_probe(struct xenbus_device *dev, if (!info->page) goto error_nomem; - /* Set input abs params to match backend screen res */ - abs = xenbus_read_unsigned(dev->otherend, - XENKBD_FIELD_FEAT_ABS_POINTER, 0); - ptr_size[KPARAM_X] = xenbus_read_unsigned(dev->otherend, - XENKBD_FIELD_WIDTH, - ptr_size[KPARAM_X]); - ptr_size[KPARAM_Y] = xenbus_read_unsigned(dev->otherend, - XENKBD_FIELD_HEIGHT, - ptr_size[KPARAM_Y]); - if (abs) { - ret = xenbus_write(XBT_NIL, dev->nodename, - XENKBD_FIELD_REQ_ABS_POINTER, "1"); - if (ret) { - pr_warn("xenkbd: can't request abs-pointer\n"); - abs = 0; - } - } - touch = xenbus_read_unsigned(dev->nodename, XENKBD_FIELD_FEAT_MTOUCH, 0); if (touch) { @@ -241,60 +242,84 @@ static int xenkbd_probe(struct xenbus_device *dev, } /* keyboard */ - kbd = input_allocate_device(); - if (!kbd) - goto error_nomem; - kbd->name = "Xen Virtual Keyboard"; - kbd->phys = info->phys; - kbd->id.bustype = BUS_PCI; - kbd->id.vendor = 0x5853; - kbd->id.product = 0x; - - __set_bit(EV_KEY, kbd->evbit); - for (i = KEY_ESC; i < KEY_UNKNOWN; i++) -
[Xen-devel] [PATCH V2] x86/p2m: fixed p2m_change_type_range() start / end check
p2m_change_type_range() handles end > max_mapped_pfn, but not start > max_mapped_pfn. Check the latter just after grabbing the lock and bail if true. Signed-off-by: Razvan CojocaruSuggested-by: George Dunlap --- Changes since V1: - Added ASSERT()s. - Wrapped the new condition in an unlikely(). --- xen/arch/x86/mm/p2m.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index c53cab4..e09b256 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -978,8 +978,19 @@ void p2m_change_type_range(struct domain *d, p2m_lock(p2m); p2m->defer_nested_flush = 1; +ASSERT(start < end); + +if ( unlikely(start > p2m->max_mapped_pfn) ) +{ +ASSERT(!p2m_is_hostp2m(p2m)); +p2m_unlock(p2m); +return; +} + if ( unlikely(end > p2m->max_mapped_pfn) ) { +ASSERT(end == ~0UL || !p2m_is_hostp2m(p2m)); + if ( !gfn ) { p2m->change_entry_type_global(p2m, ot, nt); -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver
On 04/18/2018 01:55 PM, Roger Pau Monné wrote: On Wed, Apr 18, 2018 at 01:39:35PM +0300, Oleksandr Andrushchenko wrote: On 04/18/2018 01:18 PM, Paul Durrant wrote: -Original Message- From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf Of Roger Pau Monné Sent: 18 April 2018 11:11 To: Oleksandr AndrushchenkoCc: jgr...@suse.com; Artem Mygaiev ; Dongwon Kim ; airl...@linux.ie; oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri- de...@lists.freedesktop.org; Potrola, MateuszX ; xen-devel@lists.xenproject.org; daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko wrote: On 04/18/2018 10:35 AM, Roger Pau Monné wrote: After speaking with Oleksandr on IRC, I think the main usage of the gntdev extension is to: 1. Create a dma-buf from a set of grant references. 2. Share dma-buf and get a list of grant references. I think this set of operations could be broken into: 1.1 Map grant references into user-space using the gntdev. 1.2 Create a dma-buf out of a set of user-space virtual addresses. 2.1 Map a dma-buf into user-space. 2.2 Get grefs out of the user-space addresses where the dma-buf is mapped. So it seems like what's actually missing is a way to: - Create a dma-buf from a list of user-space virtual addresses. - Allow to map a dma-buf into user-space, so it can then be used with the gntdev. I think this is generic enough that it could be implemented by a device not tied to Xen. AFAICT the hyper_dma guys also wanted something similar to this. Ok, so just to summarize, xen-zcopy/hyper-dmabuf as they are now, are no go from your POV? My opinion is that there seems to be a more generic way to implement this, and thus I would prefer that one. Instead, we have to make all that fancy stuff with VAs <-> device-X and have that device-X driver live out of drivers/xen as it is not a Xen specific driver? That would be my preference if feasible, simply because it can be reused by other use-cases that need to create dma-bufs in user-space. There is a use-case I have: a display unit on my target has a DMA controller which can't do scatter-gather, e.g. it only expects a single starting address of the buffer. In order to create a dma-buf from grefs in this case I allocate memory with dma_alloc_xxx and then balloon pages of the buffer and finally map grefs onto this DMA buffer. This way I can give this shared buffer to the display unit as its bus addresses are contiguous. With the proposed solution (gntdev + device-X) I won't be able to achieve this, as I have no control over from where gntdev/balloon drivers get the pages (even more, those can easily be out of DMA address space of the display unit). Thus, even if implemented, I can't use this approach. In any case I just knew about dma-bufs this morning, there might be things that I'm missing. Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory
>>> On 18.04.18 at 13:39,wrote: > On Wed, Apr 18, 2018 at 02:53:03AM -0600, Jan Beulich wrote: > On 06.12.17 at 08:50, wrote: >>> Each vcpu of hvm guest consumes at least one shadow page. Currently, only > 256 >>> (for hap case) pages are pre-allocated as shadow memory at beginning. It > would >>> run out if guest has more than 256 vcpus and guest creation fails. Bump the >>> number of shadow pages to 2 * HVM_MAX_VCPUS for hap case and 8 * > HVM_MAX_VCPUS >>> for shadow case. >>> >>> This patch won't lead to more memory consumption for the size of shadow >>> memory >>> will be adjusted via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION according to the >>> size >>> of guest memory and the number of vcpus. >> >>I don't understand this: What's the purpose of bumping the values if it won't >>lead >>to higher memory consumption? Afaict there's be higher consumption at least >>transiently. And I don't see why this would need doing independent of the >>intended >>vCPU count in the guest. I guess you want to base your series on top on >>Andrew's >>max-vCPU-s adjustments (which sadly didn't become ready in time for 4.11). > > The situation here is some pages are pre-allocated as P2M page for domain > initialization. After vCPU creation, the total number of P2M page are > adjusted by the domctl interface. Before vCPU creation, this domctl > is unusable for the check in paging_domctl(): > if ( unlikely(d->vcpu == NULL) || unlikely(d->vcpu[0] == NULL) ) Hence my reference to Andrew's series. > When the number of a guest's vCPU is small, the pre-allocated are > enough. But it won't be if the number of vCPU is bigger than 256. Each > vCPU will at least use one P2M page when it is created, seeing > contruct_vmcs()->hap_update_paging_modes(). Hmm, that might be a problem perhaps to be addressed in Andrew's series then, as the implication would be that the amount of shadow/p2m memory also needs to be set right by XEN_DOMCTL_createdomain (which iirc the series doesn't do so far). Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory
On 18/04/18 12:39, Chao Gao wrote: > On Wed, Apr 18, 2018 at 02:53:03AM -0600, Jan Beulich wrote: > On 06.12.17 at 08:50,wrote: >>> Each vcpu of hvm guest consumes at least one shadow page. Currently, only >>> 256 >>> (for hap case) pages are pre-allocated as shadow memory at beginning. It >>> would >>> run out if guest has more than 256 vcpus and guest creation fails. Bump the >>> number of shadow pages to 2 * HVM_MAX_VCPUS for hap case and 8 * >>> HVM_MAX_VCPUS >>> for shadow case. >>> >>> This patch won't lead to more memory consumption for the size of shadow >>> memory >>> will be adjusted via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION according to the >>> size >>> of guest memory and the number of vcpus. >> I don't understand this: What's the purpose of bumping the values if it >> won't lead >> to higher memory consumption? Afaict there's be higher consumption at least >> transiently. And I don't see why this would need doing independent of the >> intended >> vCPU count in the guest. I guess you want to base your series on top on >> Andrew's >> max-vCPU-s adjustments (which sadly didn't become ready in time for 4.11). > The situation here is some pages are pre-allocated as P2M page for domain > initialization. After vCPU creation, the total number of P2M page are > adjusted by the domctl interface. Before vCPU creation, this domctl > is unusable for the check in paging_domctl(): > if ( unlikely(d->vcpu == NULL) || unlikely(d->vcpu[0] == NULL) ) > > When the number of a guest's vCPU is small, the pre-allocated are > enough. But it won't be if the number of vCPU is bigger than 256. Each > vCPU will at least use one P2M page when it is created, seeing > contruct_vmcs()->hap_update_paging_modes(). I've also found with XTF tests that the current minimum shadow calculations are insufficient for single-vcpu domains with 128M of ram. I haven't had time fix the algorithm, and XTF is using "shadow_memory=4" as a bodge workaround. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory
On Wed, Apr 18, 2018 at 02:53:03AM -0600, Jan Beulich wrote: On 06.12.17 at 08:50,wrote: >> Each vcpu of hvm guest consumes at least one shadow page. Currently, only 256 >> (for hap case) pages are pre-allocated as shadow memory at beginning. It >> would >> run out if guest has more than 256 vcpus and guest creation fails. Bump the >> number of shadow pages to 2 * HVM_MAX_VCPUS for hap case and 8 * >> HVM_MAX_VCPUS >> for shadow case. >> >> This patch won't lead to more memory consumption for the size of shadow >> memory >> will be adjusted via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION according to the >> size >> of guest memory and the number of vcpus. > >I don't understand this: What's the purpose of bumping the values if it won't >lead >to higher memory consumption? Afaict there's be higher consumption at least >transiently. And I don't see why this would need doing independent of the >intended >vCPU count in the guest. I guess you want to base your series on top on >Andrew's >max-vCPU-s adjustments (which sadly didn't become ready in time for 4.11). The situation here is some pages are pre-allocated as P2M page for domain initialization. After vCPU creation, the total number of P2M page are adjusted by the domctl interface. Before vCPU creation, this domctl is unusable for the check in paging_domctl(): if ( unlikely(d->vcpu == NULL) || unlikely(d->vcpu[0] == NULL) ) When the number of a guest's vCPU is small, the pre-allocated are enough. But it won't be if the number of vCPU is bigger than 256. Each vCPU will at least use one P2M page when it is created, seeing contruct_vmcs()->hap_update_paging_modes(). Thanks Chao ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] mktarball: For qemu upstream, use their scripts/archive-source.sh
On Tue, Apr 17, 2018 at 06:04:37PM +0100, Ian Jackson wrote: > diff --git a/tools/misc/mktarball b/tools/misc/mktarball > index 73282b5..42d5430 100755 > --- a/tools/misc/mktarball > +++ b/tools/misc/mktarball > @@ -29,7 +29,21 @@ mkdir -p $tdir > > git_archive_into $xen_root $tdir/xen-$desc > > -git_archive_into $xen_root/tools/qemu-xen-dir-remote > $tdir/xen-$desc/tools/qemu-xen > +# We can't use git_archive_into with qemu upstream because it uses > +# git-submodules. git-submodules are an inherently broken git feature > +# which should never be used in any circumstance. Unfortunately, qemu I don't think xen.git is the right place to bash the use of git submodule. Especially when there is no way, in xen.git, to find out ahead of time which git trees the xen build system is going to want to download. > +# upstream uses them. Relevantly for us, git archive does not work > +# properly when there are submodules. > +( > +cd $xen_root/tools/qemu-xen-dir-remote > +# if it's not clean, the qemu script will call `git stash' ! It's actually `git stash create`. The script doesn't modify the source worktree. The script seems to be intended to make a tarball of the worktree (so also things not commited yet), and we use it make a tarball of a commit. There is scripts/make-release that is simpler (called via `make dist`), but it probably stuff the tarball with more than we need (roms). > +git --no-pager diff --stat HEAD > +scripts/archive-source.sh $tdir/xen-$desc/tools/qemu-xen.tar > +cd $tdir/xen-$desc/tools > +mkdir qemu-xen > +tar +rm qemu-xen.tar > +) -- Anthony PERARD ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast
On Wed, Apr 18, 2018 at 02:38:48AM -0600, Jan Beulich wrote: On 06.12.17 at 08:50,wrote: >> Intel SDM Extended XAPIC (X2APIC) -> "Initialization by System Software" >> has the following description: >> >> "The ACPI interfaces for the x2APIC are described in Section 5.2, “ACPI >> System >> Description Tables,” of the Advanced Configuration and Power Interface >> Specification, Revision 4.0a (http://www.acpi.info/spec.htm). The default >> behavior for BIOS is to pass the control to the operating system with the >> local x2APICs in xAPIC mode if all APIC IDs reported by CPUID.0BH:EDX are >> less >> than 255, and in x2APIC mode if there are any logical processor reporting an >> APIC ID of 255 or greater." > >With this you mean ... > >> In this patch, hvmloader enables x2apic mode for all vcpus if there are cpus >> with APIC ID > 255. To wake up processors whose APIC ID is greater than 255, > >">= 255" and "greater than 254" here respectively. Please be precise. Will do. > >> the SIPI is broadcasted to all APs. It is the way how Seabios wakes up APs. >> APs may compete for the stack, thus a lock is introduced to protect the >> stack. > > > >> --- a/tools/firmware/hvmloader/smp.c >> +++ b/tools/firmware/hvmloader/smp.c >> @@ -26,7 +26,9 @@ >> #define AP_BOOT_EIP 0x1000 >> extern char ap_boot_start[], ap_boot_end[]; >> >> -static int ap_callin, ap_cpuid; >> +static int ap_callin; >> +static int enable_x2apic; >> +static bool lock = 1; >> >> asm ( >> ".text \n" >> @@ -47,7 +49,15 @@ asm ( >> "mov %eax,%ds \n" >> "mov %eax,%es \n" >> "mov %eax,%ss \n" >> -"movl $stack_top,%esp \n" >> +"3: movb $1, %bl \n" >> +"mov $lock,%edx\n" >> +"movzbl %bl,%eax \n" >> +"xchg %al, (%edx) \n" >> +"test %al,%al \n" >> +"je2f\n" >> +"pause \n" >> +"jmp 3b\n" >> +"2: movl $stack_top,%esp \n" > >Please be consistent with suffixes: Either add them only when really needed >(preferred) or add them uniformly everywhere (when permitted of course). >I also don't understand why you need to use %bl here at all. will remove %bl stuff. > >> @@ -68,14 +78,34 @@ asm ( >> ".text \n" >> ); >> >> +unsigned int ap_cpuid(void) > >static > >> +{ >> +if ( !(rdmsr(MSR_IA32_APICBASE) & MSR_IA32_APICBASE_EXTD) ) >> +{ >> +uint32_t eax, ebx, ecx, edx; >> + >> +cpuid(1, , , , ); >> +return ebx >> 24; >> +} >> +else > >pointless "else" > >> +return rdmsr(MSR_IA32_APICBASE_MSR + (APIC_ID >> 4)); >> +} >> + >> void ap_start(void); /* non-static avoids unused-function compiler warning >> */ >> /*static*/ void ap_start(void) >> { >> -printf(" - CPU%d ... ", ap_cpuid); >> +printf(" - CPU%d ... ", ap_cpuid()); >> cacheattr_init(); >> printf("done.\n"); >> wmb(); >> -ap_callin = 1; >> +ap_callin++; >> + >> +if ( enable_x2apic ) >> +wrmsr(MSR_IA32_APICBASE, rdmsr(MSR_IA32_APICBASE) | >> + MSR_IA32_APICBASE_EXTD); >> + >> +/* Release the lock */ >> +asm volatile ( "xchgb %1, %b0" : : "m" (lock), "r" (0) : "memory" ); >> } > >How can you release the lock here - you've not switched off the stack, and >you're about to actually use it (for returning from the function)? "2: movl $stack_top,%esp \n" "movl %esp,%ebp \n" "call ap_start \n" "1: hlt \n" "jmp 1b \n" Yes. I think it would be right to release the lock following the "call ap_start" here. > >> @@ -125,9 +169,15 @@ void smp_initialise(void) >> memcpy((void *)AP_BOOT_EIP, ap_boot_start, ap_boot_end - ap_boot_start); >> >> printf("Multiprocessor initialisation:\n"); >> +if ( nr_cpus > MADT_MAX_LOCAL_APIC ) >> +enable_x2apic = 1; >> + >> ap_start(); >> -for ( i = 1; i < nr_cpus; i++ ) >> -boot_cpu(i); >> +if ( nr_cpus > MADT_MAX_LOCAL_APIC ) > >Where does MADT_MAX_LOCAL_APIC come from? I can't find it anywhere, and >hence can't judge (along the lines of my remark on the description) whether >this >is a correct comparison. It is defined in patch 5/8. I know it is a mistake. With regard to this point, I only boot up vCPU-s via broadcasting INIT-STARTUP signal when the number of vCPU-s is greater than a given value, otherwise the previous way will be used. In general, before all APs are up, BSP couldn't know each AP's APIC ID (currently, we know that because APIC ID can be inferred from vcpu_id) and whether there is a vCPU with APIC_ID >= 255. I plan to discard the old way and always use broadcast to boot up APs. And we don't
Re: [Xen-devel] [PATCH] x86/p2m: fixed p2m_change_type_range() start / end check
On 04/18/2018 11:53 AM, Jan Beulich wrote: On 18.04.18 at 12:23,wrote: >> That is, sensible input looks like: >> * start < end >> * For hostp2ms: >> - start <= max_mapped_pfn >> - either end <= max_mapped_pfn, or end == ~0UL >> (But for altp2ms, start or end > max_mapped_pfn is fine.) > > So I guess that's part of the difficulty I'm having with the original > suggestion: > Why would such input be okay for alternative p2m-s? Because altp2ms use max_mapped_pfn to limit what gets propagated to individual altp2m views when the host p2m is updated. See p2m.c:p2m_altp2m_propagate_change(). The motivation for this patch can be found in marc.info/?i=<8fb5e427-a0e6-79bd-3bae-5fcb6aba4...@bitdefender.com> . In summary: - Razvan had started keeping more altp2m stuff 'in sync' with the hostp2m, including max_mapped_pfn - I suggested that was unnecessary, and would reverse the 'filtering' effect that had obviously been put there intentionally - Razvan posted the above stack trace in which not setting max_mapped_pfn triggered an ASSERT - I suggested "fixing" p2m_change_type_range() to handle start > max_mapped_pfn instead You could argue that this 'filtering' is unnecessary, but at the moment that's the way things work; whether it should be changed is a different discussion. You could also argue that we should be doing the filtering before calling p2m_change_type_range(), but it seems to me that this is the right level to do the filtering. -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 6/7] xen/arm: Setup virtual paging for secondary CPUs in non-boot scenario
On 18/04/18 11:45, Mirela Simonovic wrote: On Tue, Apr 17, 2018 at 4:11 PM, Julien Grallwrote: On 17/04/18 13:54, Mirela Simonovic wrote: Hi Julien, Hi, On Wed, Apr 11, 2018 at 5:11 PM, Julien Grall wrote: Hi, On 11/04/18 14:19, Mirela Simonovic wrote: In existing code the paging for secondary CPUs is setup only in boot flow. The setup is triggered from start_xen function after all CPUs are brought online. In other words, the initialization of VTCR_EL2 register is done out of the cpu_up/start_secondary control flow. However, the cpu_up flow should be self-contained - it should fully initialize a secondary CPU, because the cpu_up is used not only to bring a secondary CPU online on boot, but also to hotplug a CPU during the system resume. With this patch the setting of paging is triggered from start_secondary function if the current system state is not boot. This way, the paging will be setup in non-boot scenarios, while the setup in boot scenario remains unchanged. I am afraid that this is not correct. You can't assume that value chosen for VTCR by Xen at boot will fit this new CPU. So you have to check it is fine or park the CPU if there are any issue. This is not a new CPU. This CPU already went through its boot sequence and it reached the resume point because it does fit the value chosen for VTCR by Xen. If it wouldn't fit the chosen value for VTCR it would be parked so it wouldn't participate in suspend/resume. Please let me know if I misunderstood your comment. This is not a new CPU for your use case. However your commit message speak about "non-boot" CPU bring-up. So for me this is more than suspend/resume, it is about bringing-up CPU at any time. As those CPUs can't participate to the decision (it is too late), you need to make sure the VTCR will fit on that CPU. AFAIU the value chosen by Xen for VTCR config has to be common for all online CPUs. Since this value is also used in the resume path I suggest to make global (static in the p2m.c) the 'val' variable which is currently local in setup_virt_paging() and passed as argument to setup_virt_paging_one(). Then setup_virt_paging_one() would not receive an argument. I need to access this value on resume, so I would call setup_virt_paging_one() without argument from start_secondary() if the system state is not boot. This seems to me a bit cleaner compared to what I submitted in this patch, but fundamentally the functionality is the same. You don't need to introduce a static variable it. I believe you can re-create it based on the information we already have in global variables. So what I would do is moving the creation of vtcr value in that function. Using this approach each CPU will need to recalculate the value which is already known prior to executing the function. I believe this is sub-optimal and contrary to existing implementation where only one CPU performs the calculation. The implementation usually evolves with the requirements. In the existing implementation, the value could be calculated on a single pCPU and then scratched afterwards. Is there any benefit of recalculating the value? > Is there any disadvantage of remembering the value into a static variable? I know that it is only a 32-bit value, but I would rather avoid spreading static variable when a value can be recompute in a few steps with what we have. Stefano do you have any opinions? Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver
On Wed, Apr 18, 2018 at 01:39:35PM +0300, Oleksandr Andrushchenko wrote: > On 04/18/2018 01:18 PM, Paul Durrant wrote: > > > -Original Message- > > > From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf > > > Of Roger Pau Monné > > > Sent: 18 April 2018 11:11 > > > To: Oleksandr Andrushchenko> > > Cc: jgr...@suse.com; Artem Mygaiev ; > > > Dongwon Kim ; airl...@linux.ie; > > > oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri- > > > de...@lists.freedesktop.org; Potrola, MateuszX > > > ; xen-devel@lists.xenproject.org; > > > daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper > > > > > > Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy > > > helper DRM driver > > > > > > On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko > > > wrote: > > > > On 04/18/2018 10:35 AM, Roger Pau Monné wrote: > > > After speaking with Oleksandr on IRC, I think the main usage of the > > > gntdev extension is to: > > > > > > 1. Create a dma-buf from a set of grant references. > > > 2. Share dma-buf and get a list of grant references. > > > > > > I think this set of operations could be broken into: > > > > > > 1.1 Map grant references into user-space using the gntdev. > > > 1.2 Create a dma-buf out of a set of user-space virtual addresses. > > > > > > 2.1 Map a dma-buf into user-space. > > > 2.2 Get grefs out of the user-space addresses where the dma-buf is > > > mapped. > > > > > > So it seems like what's actually missing is a way to: > > > > > > - Create a dma-buf from a list of user-space virtual addresses. > > > - Allow to map a dma-buf into user-space, so it can then be used with > > > the gntdev. > > > > > > I think this is generic enough that it could be implemented by a > > > device not tied to Xen. AFAICT the hyper_dma guys also wanted > > > something similar to this. > Ok, so just to summarize, xen-zcopy/hyper-dmabuf as they are now, > are no go from your POV? My opinion is that there seems to be a more generic way to implement this, and thus I would prefer that one. > Instead, we have to make all that fancy stuff > with VAs <-> device-X and have that device-X driver live out of drivers/xen > as it is not a Xen specific driver? That would be my preference if feasible, simply because it can be reused by other use-cases that need to create dma-bufs in user-space. In any case I just knew about dma-bufs this morning, there might be things that I'm missing. Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/p2m: fixed p2m_change_type_range() start / end check
>>> On 18.04.18 at 12:23,wrote: > That is, sensible input looks like: > * start < end > * For hostp2ms: > - start <= max_mapped_pfn > - either end <= max_mapped_pfn, or end == ~0UL > (But for altp2ms, start or end > max_mapped_pfn is fine.) So I guess that's part of the difficulty I'm having with the original suggestion: Why would such input be okay for alternative p2m-s? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/cpu: Support a new cpu vendor, which is Shanghai
>>> On 18.04.18 at 12:25,wrote: > [FionaLi] : I am sorry. I understood wrongly. The C000 range are > extensions, which provide some additional feature different from Intel. As > you suggested, we will enable those features in guest OSes and remove these > code from patch. Can we support the CPUID leaves and emulate the MSR with > another submit? With another patch you mean? Yes please - if at all possible put separate sets of changes into separate patches, combined into a series if there are dependencies between them. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/18/2018 01:45 PM, George Dunlap wrote: > On 04/18/2018 09:20 AM, Razvan Cojocaru wrote: >> On 04/17/2018 04:53 PM, George Dunlap wrote: >>> On 04/17/2018 11:50 AM, Razvan Cojocaru wrote: void p2m_init_altp2m_ept(struct domain *d, unsigned int i) { struct p2m_domain *p2m = d->arch.altp2m_p2m[i]; +struct p2m_domain *hostp2m = p2m_get_hostp2m(d); struct ept_data *ept; +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn; +p2m->default_access = hostp2m->default_access; +p2m->domain = hostp2m->domain; +p2m->logdirty_ranges = hostp2m->logdirty_ranges; +p2m->global_logdirty = hostp2m->global_logdirty; >>> >>> This would certainly be one approach. But then we'd need to keep a lot >>> more of these things in sync -- for instance, we'd have to have similar >>> code to enable and disable global_logdirty on all active altp2m entries. >>> >>> [...] >>> >>> The other thing that seems to be missing from synchronization is that in >>> p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being >>> part of the eptp). The code seems to indicate that this is required for >>> PML (hardware-assisted logdirty), but I don't see anywhere this is set >>> or copied from the host p2m. >>> >>> It might be nice to have a more structured way of keeping all these >>> changes in sync, rather than relying on this open-coding everywhere. >> >> For logdirty_ranges and global_logdirty, I propose that we put them both >> in a dynamically allocated struct, and have all p2ms share a pointer to >> them. That way, all that's required is for the pointer to be set up in >> p2m_init_altp2m_ept() and the actual data will be automatically shared >> between p2ms. If I've read the code correctly, the hostp2m is the last >> to be destroyed and the first to be initialized, so it should work well >> as long as all p2ms synchronize access to logdirty_ranges and >> global_logdirty (which I assume they already do). > > That's an interesting idea; one potential disadvantage is that it would > make locking even more complex than it already is. Enabling / disabling > logdirty isn't a hot path, so looping through the altp2ms shouldn't have > much of a performance impact; *reading* is much more common, so having > to grab an extra set of locks is more likely to have a performance > impact. And it's not clear to me that the complexity of keeping several > copies in sync is that much higher than the complexity of adding Yet > Another MM Lock. > > Those are just initial reactions though -- feel free to explore the > solution space and/or argue otherwise. :-) No, you're right, there's no point in complicating things, I'll just loop over the p2ms. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 6/7] xen/arm: Setup virtual paging for secondary CPUs in non-boot scenario
On Tue, Apr 17, 2018 at 4:11 PM, Julien Grallwrote: > > > On 17/04/18 13:54, Mirela Simonovic wrote: >> >> Hi Julien, > > > Hi, > >> >> On Wed, Apr 11, 2018 at 5:11 PM, Julien Grall >> wrote: >>> >>> Hi, >>> >>> On 11/04/18 14:19, Mirela Simonovic wrote: In existing code the paging for secondary CPUs is setup only in boot flow. The setup is triggered from start_xen function after all CPUs are brought online. In other words, the initialization of VTCR_EL2 register is done out of the cpu_up/start_secondary control flow. However, the cpu_up flow should be self-contained - it should fully initialize a secondary CPU, because the cpu_up is used not only to bring a secondary CPU online on boot, but also to hotplug a CPU during the system resume. With this patch the setting of paging is triggered from start_secondary function if the current system state is not boot. This way, the paging will be setup in non-boot scenarios, while the setup in boot scenario remains unchanged. >>> >>> >>> >>> I am afraid that this is not correct. You can't assume that value chosen >>> for >>> VTCR by Xen at boot will fit this new CPU. So you have to check it is >>> fine >>> or park the CPU if there are any issue. >>> >> >> This is not a new CPU. This CPU already went through its boot sequence >> and it reached the resume point because it does fit the value chosen >> for VTCR by Xen. >> If it wouldn't fit the chosen value for VTCR it would be parked so it >> wouldn't participate in suspend/resume. Please let me know if I >> misunderstood your comment. > > > This is not a new CPU for your use case. However your commit message > speak about "non-boot" CPU bring-up. So for me this is more than > suspend/resume, it is about bringing-up CPU at any time. > > As those CPUs can't participate to the decision (it is too late), you > need to make sure the VTCR will fit on that CPU. > >> >> AFAIU the value chosen by Xen for VTCR config has to be common for all >> online CPUs. Since this value is also used in the resume path I >> suggest to make global (static in the p2m.c) the 'val' variable which >> is currently local in setup_virt_paging() and passed as argument to >> setup_virt_paging_one(). Then setup_virt_paging_one() would not >> receive an argument. >> I need to access this value on resume, so I would call >> setup_virt_paging_one() without argument from start_secondary() if the >> system state is not boot. >> This seems to me a bit cleaner compared to what I submitted in this >> patch, but fundamentally the functionality is the same. > > > You don't need to introduce a static variable it. I believe you can > re-create it based on the information we already have in global > variables. So what I would do is moving the creation of vtcr value in > that function. Using this approach each CPU will need to recalculate the value which is already known prior to executing the function. I believe this is sub-optimal and contrary to existing implementation where only one CPU performs the calculation. Is there any benefit of recalculating the value? Is there any disadvantage of remembering the value into a static variable? Please let me know, I just need to understand whether there is any particular reason why you want to repeat the calculation. If no particular reasons, saving the calculated value is definitely a better approach. Thanks, Mirela > > Cheers, > > -- > Julien Grall > IMPORTANT NOTICE: The contents of this email and any attachments are > confidential and may also be privileged. If you are not the intended > recipient, please notify the sender immediately and do not disclose the > contents to any other person, use it for any purpose, or store or copy the > information in any medium. Thank you. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/18/2018 09:20 AM, Razvan Cojocaru wrote: > On 04/17/2018 04:53 PM, George Dunlap wrote: >> On 04/17/2018 11:50 AM, Razvan Cojocaru wrote: >>> void p2m_init_altp2m_ept(struct domain *d, unsigned int i) >>> { >>> struct p2m_domain *p2m = d->arch.altp2m_p2m[i]; >>> +struct p2m_domain *hostp2m = p2m_get_hostp2m(d); >>> struct ept_data *ept; >>> >>> +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn; >>> +p2m->default_access = hostp2m->default_access; >>> +p2m->domain = hostp2m->domain; >>> +p2m->logdirty_ranges = hostp2m->logdirty_ranges; >>> +p2m->global_logdirty = hostp2m->global_logdirty; >> >> This would certainly be one approach. But then we'd need to keep a lot >> more of these things in sync -- for instance, we'd have to have similar >> code to enable and disable global_logdirty on all active altp2m entries. >> >> [...] >> >> The other thing that seems to be missing from synchronization is that in >> p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being >> part of the eptp). The code seems to indicate that this is required for >> PML (hardware-assisted logdirty), but I don't see anywhere this is set >> or copied from the host p2m. >> >> It might be nice to have a more structured way of keeping all these >> changes in sync, rather than relying on this open-coding everywhere. > > For logdirty_ranges and global_logdirty, I propose that we put them both > in a dynamically allocated struct, and have all p2ms share a pointer to > them. That way, all that's required is for the pointer to be set up in > p2m_init_altp2m_ept() and the actual data will be automatically shared > between p2ms. If I've read the code correctly, the hostp2m is the last > to be destroyed and the first to be initialized, so it should work well > as long as all p2ms synchronize access to logdirty_ranges and > global_logdirty (which I assume they already do). That's an interesting idea; one potential disadvantage is that it would make locking even more complex than it already is. Enabling / disabling logdirty isn't a hot path, so looping through the altp2ms shouldn't have much of a performance impact; *reading* is much more common, so having to grab an extra set of locks is more likely to have a performance impact. And it's not clear to me that the complexity of keeping several copies in sync is that much higher than the complexity of adding Yet Another MM Lock. Those are just initial reactions though -- feel free to explore the solution space and/or argue otherwise. :-) -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.11] x86: Use spec_ctrl_{enter, exit}_idle() in the S3/S5 path
This avoids opencoding the functionality (and missing one bit of it), and and some comments explaining what is going on. Signed-off-by: Andrew Cooper--- CC: Jan Beulich CC: Juergen Gross This is effectively a bugfix of c/s 710a8eb "x86: suppress BTI mitigations around S3 suspend/resume" so should be considered for 4.11 at this point. --- xen/arch/x86/acpi/power.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c index 0763846..bb0d095 100644 --- a/xen/arch/x86/acpi/power.c +++ b/xen/arch/x86/acpi/power.c @@ -213,7 +213,8 @@ static int enter_state(u32 state) error = 0; ci = get_cpu_info(); -ci->use_shadow_spec_ctrl = 0; +spec_ctrl_enter_idle(ci); +/* Avoid NMI/#MC using MSR_SPEC_CTRL until we've reloaded microcode. */ ci->bti_ist_info = 0; ACPI_FLUSH_CPU_CACHE(); @@ -257,10 +258,9 @@ static int enter_state(u32 state) if ( !recheck_cpu_features(0) ) panic("Missing previously available feature(s)."); +/* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */ ci->bti_ist_info = default_bti_ist_info; -asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET) - :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0) - : "memory"); +spec_ctrl_exit_idle(ci); done: spin_debug_enable(); -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/cpu: Support a new cpu vendor, which is Shanghai
Jan Thanks for your reply. Answer the following. Best wish! FionaLi -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, April 10, 2018 2:35 PM To: lifang...@126.com; Fiona Li(BJ-RD)Cc: xen-devel@lists.xenproject.org Subject: Re: RE: [PATCH] x86/cpu: Support a new cpu vendor,which is Shanghai >>> "Fiona Li(BJ-RD)" 04/10/18 3:08 AM >>> >> +static void init_shanghai(struct cpuinfo_x86 *c) { uint64_t >> +msr_ace,msr_rng; >> +/* Test for Shanghai Extended CPUID information */ if >> +(cpuid_eax(0xC000) >= 0xC001) { /*Get Shanghai Extended >> +function number */ >> +u32 extented_feature_flags = cpuid_edx(0xC001); >> + >> +/* enable ACE,if support ACE unit */ >> +if(ACE_PRESENT(extented_feature_flags) && >> +!ACE_ENABLED(extented_feature_flags)){ >> +rdmsrl(MSR_ZX_ACE, msr_ace); >> +/* enable ACE */ >> +wrmsrl(MSR_ZX_ACE, (msr_ace | ACE_FCR)); >>> +printk(KERN_INFO "CPU: Enabled ACE h/w crypto\n");ai >> +} >> +/* enable RNG,if support RNG unit */ if >> +(RNG_PRESENT(extented_feature_flags) && >> +!RNG_ENABLED(extented_feature_flags)) { rdmsrl(MSR_ZX_RNG, msr_rng); >> +/* enable RNG */ >> +wrmsrl(MSR_ZX_RNG, msr_rng | RNG_ENABLE); printk(KERN_INFO "CPU: >> +Enabled h/w RNG\n"); } } >> + >> +if (c->x86 == 0x6 && c->x86_model >= 0xf) { >> +c->x86_cache_alignment = c->x86_clflush_size * 2; >> +__set_bit(X86_FEATURE_CONSTANT_TSC, c->x86_capability); } > >Is there a specification available anywhere for all of the above? >[FionaLi]: Main usage of Zhaoxin platforms in recent years is for >limited and embedded instead of distributed markets. So there is no >such document for public access at present. But, Zhaoxin's x86 CPU is >compatible with Intel x86 architecture. Its instruction sets are >compatible with Intel. Furthermore, Zhaoxin's CPU virtualization tech >and I/O virtualization tech are compatible with Intel VMX and VT-d >respectively. So maybe we can refer to Intel manual at present. Notw how I had said "for all of the above": The extensions are clearly not in the Intel docs, and the cache alignment and TSC properties aren't general x86 attributes either. [FionaLi]: For cache alignment, it is equal to the size of cache line on Zhaoxin platform. Since it has been initialized in early_cpu_detect(), we should remove this modification in next version patch. For X86_FEATURE_CONSTANT_TSC, It is an additional feature of TSC timer. For Shanghai CPU, it should be set according to X86_FEATURE_ITSC feature as Intel in next version patch. Simple CPUID document in attached file is for your reference, the features supported by Shanghai CPU can be got from this doc. >What about guests? How would they know these extensions are available for >their use? >[FionaLi]: My colleagues is committing code to Linux kernel and >windows. Its extensions will be available to guests. That wasn't the point of the question: Even with aware guest OSes you first of all need to make sure guests can actually obtain the respective CPUID leaves. Afaict the C000 range will come out as all zeros for them without you doing something about it. It is also questionable whether blanket enabling of the features for all guests is a good step - I think this should be left to guest OSes (requiring you to properly emulate their MSR accesses within Xen, unless the MSRs can be made directly accessible to guests without security risks). [FionaLi] : I am sorry. I understood wrongly. The C000 range are extensions, which provide some additional feature different from Intel. As you suggested, we will enable those features in guest OSes and remove these code from patch. Can we support the CPUID leaves and emulate the MSR with another submit? 保密声明: 本邮件含有保密或专有信息,仅供指定收件人使用。严禁对本邮件或其内容做任何未经授权的查阅、使用、复制或转发。 CONFIDENTIAL NOTE: This email contains confidential or legally privileged information and is for the sole use of its intended recipient. Any unauthorized review, use, copying or forwarding of this email or the content of this email is strictly prohibited. zhaoxin_cpuid_leaf_description.xlsx Description: zhaoxin_cpuid_leaf_description.xlsx ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver
On 04/18/2018 01:18 PM, Paul Durrant wrote: -Original Message- From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf Of Roger Pau Monné Sent: 18 April 2018 11:11 To: Oleksandr AndrushchenkoCc: jgr...@suse.com; Artem Mygaiev ; Dongwon Kim ; airl...@linux.ie; oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri- de...@lists.freedesktop.org; Potrola, MateuszX ; xen-devel@lists.xenproject.org; daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko wrote: On 04/18/2018 10:35 AM, Roger Pau Monné wrote: On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko wrote: On 04/17/2018 11:57 PM, Dongwon Kim wrote: On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote: On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote: 3.2 Backend exports dma-buf to xen-front In this case Dom0 pages are shared with DomU. As before, DomU can only write to these pages, not any other page from Dom0, so it can be still considered safe. But, the following must be considered (highlighted in xen-front's Kernel documentation): - If guest domain dies then pages/grants received from the backend cannot be claimed back - think of it as memory lost to Dom0 (won't be used for any other guest) - Misbehaving guest may send too many requests to the backend exhausting its grant references and memory (consider this from security POV). As the backend runs in the trusted domain we also assume that it is trusted as well, e.g. must take measures to prevent DDoS attacks. I cannot parse the above sentence: "As the backend runs in the trusted domain we also assume that it is trusted as well, e.g. must take measures to prevent DDoS attacks." What's the relation between being trusted and protecting from DoS attacks? I mean that we trust the backend that it can prevent Dom0 from crashing in case DomU's frontend misbehaves, e.g. if the frontend sends too many memory requests etc. In any case, all? PV protocols are implemented with the frontend sharing pages to the backend, and I think there's a reason why this model is used, and it should continue to be used. This is the first use-case above. But there are real-world use-cases (embedded in my case) when physically contiguous memory needs to be shared, one of the possible ways to achieve this is to share contiguous memory from Dom0 to DomU (the second use-case above) Having to add logic in the backend to prevent such attacks means that: - We need more code in the backend, which increases complexity and chances of bugs. - Such code/logic could be wrong, thus allowing DoS. You can live without this code at all, but this is then up to backend which may make Dom0 down because of DomU's frontend doing evil things IMO we should design protocols that do not allow such attacks instead of having to defend against them. 4. xen-front/backend/xen-zcopy synchronization 4.1. As I already said in 2) all the inter VM communication happens between xen-front and the backend, xen-zcopy is NOT involved in that. When xen-front wants to destroy a display buffer (dumb/dma-buf) it issues a XENDISPL_OP_DBUF_DESTROY command (opposite to XENDISPL_OP_DBUF_CREATE). This call is synchronous, so xen-front expects that backend does free the buffer pages on return. 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY: - closes all dumb handles/fd's of the buffer according to [3] - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen- zcopy to make sure the buffer is freed (think of it as it waits for dma-buf->release callback) So this zcopy thing keeps some kind of track of the memory usage? Why can't the user-space backend keep track of the buffer usage? Because there is no dma-buf UAPI which allows to track the buffer life cycle (e.g. wait until dma-buf's .release callback is called) - replies to xen-front that the buffer can be destroyed. This way deletion of the buffer happens synchronously on both Dom0 and DomU sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns with time-out error (BTW, wait time is a parameter of this IOCTL), Xen will defer grant reference removal and will retry later until those are free. Hope this helps understand how buffers are synchronously deleted in case of xen-zcopy with a single protocol command. I think the above logic can also be re-used by the hyper-dmabuf driver with some additional work: 1. xen-zcopy can be split into 2 parts and extend: 1.1. Xen gntdev driver [4], [5] to allow creating dma-buf from grefs and vise versa, I don't know much about the dma-buf implementation in Linux, but gntdev is a user-space device, and AFAICT
Re: [Xen-devel] [PATCH 6/7] xen/arm: Setup virtual paging for secondary CPUs in non-boot scenario
Hi Julien, On Wed, Apr 18, 2018 at 11:48 AM, Julien Grallwrote: > > > On 17/04/18 16:22, Mirela Simonovic wrote: >> >> Hi Julien, >> >> On Tue, Apr 17, 2018 at 4:11 PM, Julien Grall >> wrote: >>> >>> >>> >>> On 17/04/18 13:54, Mirela Simonovic wrote: Hi Julien, >>> >>> >>> >>> Hi, >>> On Wed, Apr 11, 2018 at 5:11 PM, Julien Grall wrote: > > > Hi, > > On 11/04/18 14:19, Mirela Simonovic wrote: >> >> >> >> In existing code the paging for secondary CPUs is setup only in boot >> flow. >> The setup is triggered from start_xen function after all CPUs are >> brought >> online. In other words, the initialization of VTCR_EL2 register is >> done >> out of the cpu_up/start_secondary control flow. However, the cpu_up >> flow >> should be self-contained - it should fully initialize a secondary CPU, >> because the cpu_up is used not only to bring a secondary CPU online on >> boot, but also to hotplug a CPU during the system resume. >> With this patch the setting of paging is triggered from >> start_secondary >> function if the current system state is not boot. This way, the paging >> will be setup in non-boot scenarios, while the setup in boot scenario >> remains unchanged. > > > > > I am afraid that this is not correct. You can't assume that value > chosen > for > VTCR by Xen at boot will fit this new CPU. So you have to check it is > fine > or park the CPU if there are any issue. > This is not a new CPU. This CPU already went through its boot sequence and it reached the resume point because it does fit the value chosen for VTCR by Xen. If it wouldn't fit the chosen value for VTCR it would be parked so it wouldn't participate in suspend/resume. Please let me know if I misunderstood your comment. >>> >>> >>> >>> This is not a new CPU for your use case. However your commit message >>> speak about "non-boot" CPU bring-up. So for me this is more than >>> suspend/resume, it is about bringing-up CPU at any time. >>> >> >> Use case you're trying to cover is hotplugging a CPU after the boot is >> done in bit.LITTLE system, and that CPU wasn't initially brought >> online (on boot). Right? > > > That's right. It is how I understood your commit title. > >> >>> As those CPUs can't participate to the decision (it is too late), you >>> need to make sure the VTCR will fit on that CPU. >>> >> >> Could you please point me to the location in sources where this is >> done on boot? I mean checking compliance with chosen VTCR value and >> parking CPU if it doesn't fit. > > > At the moment all CPUs are brought up during Xen boot. So what we do is find > the small PA range that will accommodate all the onlined CPUs (see > setup_virt_paging). > > So there are no parking required at the moment. However, if you start > bringing-up new CPUs after boot. Then you have to check they will be able to > handle the values chosen at boot. > > As you pointed out, this might not be necessary for the suspend/resume case. > So I am not asking you to implement that. However, please make sure the > commit message clearly spell out that you bringing up secondary CPU after > Xen has boot only covers the suspend/resume case. > I'll fix the commit title, thanks. > Cheers, > > -- > Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver
On 04/18/2018 01:23 PM, Paul Durrant wrote: -Original Message- From: Oleksandr Andrushchenko [mailto:andr2...@gmail.com] Sent: 18 April 2018 11:21 To: Paul Durrant; Roger Pau Monne Cc: jgr...@suse.com; Artem Mygaiev ; Dongwon Kim ; airl...@linux.ie; oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri- de...@lists.freedesktop.org; Potrola, MateuszX ; xen-devel@lists.xenproject.org; daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver On 04/18/2018 01:18 PM, Paul Durrant wrote: -Original Message- From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf Of Roger Pau Monné Sent: 18 April 2018 11:11 To: Oleksandr Andrushchenko Cc: jgr...@suse.com; Artem Mygaiev ; Dongwon Kim ; airl...@linux.ie; oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri- de...@lists.freedesktop.org; Potrola, MateuszX ; xen-devel@lists.xenproject.org; daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko wrote: On 04/18/2018 10:35 AM, Roger Pau Monné wrote: On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko wrote: On 04/17/2018 11:57 PM, Dongwon Kim wrote: On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote: On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote: 3.2 Backend exports dma-buf to xen-front In this case Dom0 pages are shared with DomU. As before, DomU can only write to these pages, not any other page from Dom0, so it can be still considered safe. But, the following must be considered (highlighted in xen-front's Kernel documentation): - If guest domain dies then pages/grants received from the backend cannot be claimed back - think of it as memory lost to Dom0 (won't be used for any other guest) - Misbehaving guest may send too many requests to the backend exhausting its grant references and memory (consider this from security POV). As the backend runs in the trusted domain we also assume that it is trusted as well, e.g. must take measures to prevent DDoS attacks. I cannot parse the above sentence: "As the backend runs in the trusted domain we also assume that it is trusted as well, e.g. must take measures to prevent DDoS attacks." What's the relation between being trusted and protecting from DoS attacks? I mean that we trust the backend that it can prevent Dom0 from crashing in case DomU's frontend misbehaves, e.g. if the frontend sends too many memory requests etc. In any case, all? PV protocols are implemented with the frontend sharing pages to the backend, and I think there's a reason why this model is used, and it should continue to be used. This is the first use-case above. But there are real-world use-cases (embedded in my case) when physically contiguous memory needs to be shared, one of the possible ways to achieve this is to share contiguous memory from Dom0 to DomU (the second use-case above) Having to add logic in the backend to prevent such attacks means that: - We need more code in the backend, which increases complexity and chances of bugs. - Such code/logic could be wrong, thus allowing DoS. You can live without this code at all, but this is then up to backend which may make Dom0 down because of DomU's frontend doing evil things IMO we should design protocols that do not allow such attacks instead of having to defend against them. 4. xen-front/backend/xen-zcopy synchronization 4.1. As I already said in 2) all the inter VM communication happens between xen-front and the backend, xen-zcopy is NOT involved in that. When xen-front wants to destroy a display buffer (dumb/dma-buf) it issues a XENDISPL_OP_DBUF_DESTROY command (opposite to XENDISPL_OP_DBUF_CREATE). This call is synchronous, so xen-front expects that backend does free the buffer pages on return. 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY: - closes all dumb handles/fd's of the buffer according to [3] - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen- zcopy to make sure the buffer is freed (think of it as it waits for dma-buf->release callback) So this zcopy thing keeps some kind of track of the memory usage? Why can't the user-space backend keep track of the buffer usage? Because there is no dma-buf UAPI which allows to track the buffer life cycle (e.g. wait until dma-buf's .release callback is called) - replies to xen-front that the buffer can be destroyed. This way deletion of the
[Xen-devel] [PATCH v5] x86/setup: properly update PTEs if src/dst overlaps when relocating Xen image
Commit 0d31d16 (x86/setup: do not relocate Xen over current Xen image placement) disallowed src/dst images overlaps when relocating Xen image. Though it deliberately allowed destination region between __image_base__ and (__image_base__ + XEN_IMG_OFFSET) overlaps with the end of source image. And here is the problem. If anything between __page_tables_start and __page_tables_end in source image lands in the overlap then some or even all page table entries may not be updated. This usually means boom in early boot which will be difficult to the investigate. So, I think that we have three choices to fix the issue: - drop XEN_IMG_OFFSET from if ( (end > s) && (end - reloc_size + XEN_IMG_OFFSET >= __pa(_end)) ) - add XEN_IMG_OFFSET to xen_phys_start in PFN_DOWN(xen_phys_start) used in loops as one of conditions, - change PFN_DOWN(xen_phys_start) to PFN_DOWN(xen_remap_end_pfn) proposed in earlier version of this patch. This patch implements the second option. This way we still allow source and destination partial overlap as described above but PTEs are properly updated now. Signed-off-by: Daniel Kiper--- xen/arch/x86/setup.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c index b2baee3..040fd03 100644 --- a/xen/arch/x86/setup.c +++ b/xen/arch/x86/setup.c @@ -1012,6 +1012,7 @@ void __init noreturn __start_xen(unsigned long mbi_p) l3_pgentry_t *pl3e; l2_pgentry_t *pl2e; int i, j, k; +unsigned long pte_update_limit; /* Select relocation address. */ e = end - reloc_size; @@ -1019,6 +1020,12 @@ void __init noreturn __start_xen(unsigned long mbi_p) bootsym(trampoline_xen_phys_start) = e; /* + * All PTEs with PFNs above pte_update_limit + * were updated earlier. Skip them. + */ +pte_update_limit = PFN_DOWN(e + XEN_IMG_OFFSET); + +/* * Perform relocation to new physical address. * Before doing so we must sync static/global data with main memory * with a barrier(). After this we must *not* modify static/global @@ -1041,7 +1048,7 @@ void __init noreturn __start_xen(unsigned long mbi_p) /* Not present, 1GB mapping, or already relocated? */ if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) || (l3e_get_flags(*pl3e) & _PAGE_PSE) || - (l3e_get_pfn(*pl3e) > PFN_DOWN(xen_phys_start)) ) + (l3e_get_pfn(*pl3e) > pte_update_limit) ) continue; *pl3e = l3e_from_intpte(l3e_get_intpte(*pl3e) + xen_phys_start); @@ -1051,7 +1058,7 @@ void __init noreturn __start_xen(unsigned long mbi_p) /* Not present, PSE, or already relocated? */ if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) || (l2e_get_flags(*pl2e) & _PAGE_PSE) || - (l2e_get_pfn(*pl2e) > PFN_DOWN(xen_phys_start)) ) + (l2e_get_pfn(*pl2e) > pte_update_limit) ) continue; *pl2e = l2e_from_intpte(l2e_get_intpte(*pl2e) + xen_phys_start); @@ -1075,7 +1082,7 @@ void __init noreturn __start_xen(unsigned long mbi_p) unsigned int flags; if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) || - (l2e_get_pfn(*pl2e) > PFN_DOWN(xen_phys_start)) ) + (l2e_get_pfn(*pl2e) > pte_update_limit) ) continue; if ( !using_2M_mapping() ) -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver
> -Original Message- > From: Oleksandr Andrushchenko [mailto:andr2...@gmail.com] > Sent: 18 April 2018 11:21 > To: Paul Durrant; Roger Pau Monne > > Cc: jgr...@suse.com; Artem Mygaiev ; > Dongwon Kim ; airl...@linux.ie; > oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri- > de...@lists.freedesktop.org; Potrola, MateuszX > ; xen-devel@lists.xenproject.org; > daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper > > Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy > helper DRM driver > > On 04/18/2018 01:18 PM, Paul Durrant wrote: > >> -Original Message- > >> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On > Behalf > >> Of Roger Pau Monné > >> Sent: 18 April 2018 11:11 > >> To: Oleksandr Andrushchenko > >> Cc: jgr...@suse.com; Artem Mygaiev ; > >> Dongwon Kim ; airl...@linux.ie; > >> oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; > dri- > >> de...@lists.freedesktop.org; Potrola, MateuszX > >> ; xen-devel@lists.xenproject.org; > >> daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper > >> > >> Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy > >> helper DRM driver > >> > >> On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko > >> wrote: > >>> On 04/18/2018 10:35 AM, Roger Pau Monné wrote: > On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko > >> wrote: > > On 04/17/2018 11:57 PM, Dongwon Kim wrote: > >> On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote: > >>> On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote: > > 3.2 Backend exports dma-buf to xen-front > > > > In this case Dom0 pages are shared with DomU. As before, DomU can > >> only write > > to these pages, not any other page from Dom0, so it can be still > >> considered > > safe. > > But, the following must be considered (highlighted in xen-front's > Kernel > > documentation): > > - If guest domain dies then pages/grants received from the backend > >> cannot > > be claimed back - think of it as memory lost to Dom0 (won't be used > >> for > > any > > other guest) > > - Misbehaving guest may send too many requests to the backend > >> exhausting > > its grant references and memory (consider this from security POV). > >> As the > > backend runs in the trusted domain we also assume that it is > trusted > >> as > > well, > > e.g. must take measures to prevent DDoS attacks. > I cannot parse the above sentence: > > "As the backend runs in the trusted domain we also assume that it is > trusted as well, e.g. must take measures to prevent DDoS attacks." > > What's the relation between being trusted and protecting from DoS > attacks? > >>> I mean that we trust the backend that it can prevent Dom0 > >>> from crashing in case DomU's frontend misbehaves, e.g. > >>> if the frontend sends too many memory requests etc. > In any case, all? PV protocols are implemented with the frontend > sharing pages to the backend, and I think there's a reason why this > model is used, and it should continue to be used. > >>> This is the first use-case above. But there are real-world > >>> use-cases (embedded in my case) when physically contiguous memory > >>> needs to be shared, one of the possible ways to achieve this is > >>> to share contiguous memory from Dom0 to DomU (the second use-case > >> above) > Having to add logic in the backend to prevent such attacks means > that: > > - We need more code in the backend, which increases complexity and > chances of bugs. > - Such code/logic could be wrong, thus allowing DoS. > >>> You can live without this code at all, but this is then up to > >>> backend which may make Dom0 down because of DomU's frontend > doing > >> evil > >>> things > >> IMO we should design protocols that do not allow such attacks instead > >> of having to defend against them. > >> > > 4. xen-front/backend/xen-zcopy synchronization > > > > 4.1. As I already said in 2) all the inter VM communication happens > >> between > > xen-front and the backend, xen-zcopy is NOT involved in that. > > When xen-front wants to destroy a display buffer (dumb/dma-buf) it > >> issues a > > XENDISPL_OP_DBUF_DESTROY command (opposite to > >> XENDISPL_OP_DBUF_CREATE). > > This call is synchronous, so xen-front expects that backend does free > >> the > > buffer pages on return. > > > > 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY: > > - closes all dumb handles/fd's of the buffer according to [3]
Re: [Xen-devel] [PATCH] x86/p2m: fixed p2m_change_type_range() start / end check
On 04/18/2018 08:38 AM, Jan Beulich wrote: On 17.04.18 at 19:16,wrote: >> --- a/xen/arch/x86/mm/p2m.c >> +++ b/xen/arch/x86/mm/p2m.c >> @@ -976,6 +976,13 @@ void p2m_change_type_range(struct domain *d, >> ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt)); >> >> p2m_lock(p2m); >> + >> +if ( start > p2m->max_mapped_pfn ) >> +{ >> +p2m_unlock(p2m); >> +return; >> +} > > I realize this is what George has suggested, but I still wonder if this is the > right thing to do here: Why is this any more important to check than the > more general start >= end (in which case the assertion in the rangeset > code would also trigger)? Till now the function assumes "sensible" things > to be passed in, but specifically also permitting the [start,~0UL] case > (just in a more generalizer fashion). The problem you're trying to fix here > is something passing in a nonsense range (starting above the valid range). > Yet if we want the function to be immune to nonsense being passed in, I > think start < end is what needs checking for, and that check would then > perhaps better go after end was already adjusted. > > The obvious alternative is for callers to only pass in sane ranges (in which > case adding ASSERT() here may be considered, instead of triggering the > one in the rangeset code). The goal of the fix wasn't to get rid of ASSERTs for "nonsense" input entirely, but to accept input which is in fact "sensible" for altp2m systems. I do agree though, that we should either ASSERT for all "nonsense" input, or handle all such input gracefully. I'm inclined to ASSERT(); in which case maybe it would be best if we put a bunch up front: ASSERT(start < end); if ( start > p2m->max_mapped_pfn) { ASSERT(!p2m_is_hostp2m(p2m)); ... } if ( end > p2m->max_mapped_pfn ) { ASSERT(end == ~0UL || !p2m_is_hostp2m(p2m)); ... That is, sensible input looks like: * start < end * For hostp2ms: - start <= max_mapped_pfn - either end <= max_mapped_pfn, or end == ~0UL (But for altp2ms, start or end > max_mapped_pfn is fine.) -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver
On 04/18/2018 01:18 PM, Paul Durrant wrote: -Original Message- From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf Of Roger Pau Monné Sent: 18 April 2018 11:11 To: Oleksandr AndrushchenkoCc: jgr...@suse.com; Artem Mygaiev ; Dongwon Kim ; airl...@linux.ie; oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri- de...@lists.freedesktop.org; Potrola, MateuszX ; xen-devel@lists.xenproject.org; daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko wrote: On 04/18/2018 10:35 AM, Roger Pau Monné wrote: On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko wrote: On 04/17/2018 11:57 PM, Dongwon Kim wrote: On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote: On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote: 3.2 Backend exports dma-buf to xen-front In this case Dom0 pages are shared with DomU. As before, DomU can only write to these pages, not any other page from Dom0, so it can be still considered safe. But, the following must be considered (highlighted in xen-front's Kernel documentation): - If guest domain dies then pages/grants received from the backend cannot be claimed back - think of it as memory lost to Dom0 (won't be used for any other guest) - Misbehaving guest may send too many requests to the backend exhausting its grant references and memory (consider this from security POV). As the backend runs in the trusted domain we also assume that it is trusted as well, e.g. must take measures to prevent DDoS attacks. I cannot parse the above sentence: "As the backend runs in the trusted domain we also assume that it is trusted as well, e.g. must take measures to prevent DDoS attacks." What's the relation between being trusted and protecting from DoS attacks? I mean that we trust the backend that it can prevent Dom0 from crashing in case DomU's frontend misbehaves, e.g. if the frontend sends too many memory requests etc. In any case, all? PV protocols are implemented with the frontend sharing pages to the backend, and I think there's a reason why this model is used, and it should continue to be used. This is the first use-case above. But there are real-world use-cases (embedded in my case) when physically contiguous memory needs to be shared, one of the possible ways to achieve this is to share contiguous memory from Dom0 to DomU (the second use-case above) Having to add logic in the backend to prevent such attacks means that: - We need more code in the backend, which increases complexity and chances of bugs. - Such code/logic could be wrong, thus allowing DoS. You can live without this code at all, but this is then up to backend which may make Dom0 down because of DomU's frontend doing evil things IMO we should design protocols that do not allow such attacks instead of having to defend against them. 4. xen-front/backend/xen-zcopy synchronization 4.1. As I already said in 2) all the inter VM communication happens between xen-front and the backend, xen-zcopy is NOT involved in that. When xen-front wants to destroy a display buffer (dumb/dma-buf) it issues a XENDISPL_OP_DBUF_DESTROY command (opposite to XENDISPL_OP_DBUF_CREATE). This call is synchronous, so xen-front expects that backend does free the buffer pages on return. 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY: - closes all dumb handles/fd's of the buffer according to [3] - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen- zcopy to make sure the buffer is freed (think of it as it waits for dma-buf->release callback) So this zcopy thing keeps some kind of track of the memory usage? Why can't the user-space backend keep track of the buffer usage? Because there is no dma-buf UAPI which allows to track the buffer life cycle (e.g. wait until dma-buf's .release callback is called) - replies to xen-front that the buffer can be destroyed. This way deletion of the buffer happens synchronously on both Dom0 and DomU sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns with time-out error (BTW, wait time is a parameter of this IOCTL), Xen will defer grant reference removal and will retry later until those are free. Hope this helps understand how buffers are synchronously deleted in case of xen-zcopy with a single protocol command. I think the above logic can also be re-used by the hyper-dmabuf driver with some additional work: 1. xen-zcopy can be split into 2 parts and extend: 1.1. Xen gntdev driver [4], [5] to allow creating dma-buf from grefs and vise versa, I don't know much about the dma-buf implementation in Linux, but gntdev is a user-space device, and AFAICT
Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver
> -Original Message- > From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf > Of Roger Pau Monné > Sent: 18 April 2018 11:11 > To: Oleksandr Andrushchenko> Cc: jgr...@suse.com; Artem Mygaiev ; > Dongwon Kim ; airl...@linux.ie; > oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri- > de...@lists.freedesktop.org; Potrola, MateuszX > ; xen-devel@lists.xenproject.org; > daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper > > Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy > helper DRM driver > > On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko > wrote: > > On 04/18/2018 10:35 AM, Roger Pau Monné wrote: > > > On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko > wrote: > > > > On 04/17/2018 11:57 PM, Dongwon Kim wrote: > > > > > On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote: > > > > > > On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote: > > > > 3.2 Backend exports dma-buf to xen-front > > > > > > > > In this case Dom0 pages are shared with DomU. As before, DomU can > only write > > > > to these pages, not any other page from Dom0, so it can be still > considered > > > > safe. > > > > But, the following must be considered (highlighted in xen-front's Kernel > > > > documentation): > > > > - If guest domain dies then pages/grants received from the backend > cannot > > > > be claimed back - think of it as memory lost to Dom0 (won't be used > for > > > > any > > > > other guest) > > > > - Misbehaving guest may send too many requests to the backend > exhausting > > > > its grant references and memory (consider this from security POV). > As the > > > > backend runs in the trusted domain we also assume that it is trusted > as > > > > well, > > > > e.g. must take measures to prevent DDoS attacks. > > > I cannot parse the above sentence: > > > > > > "As the backend runs in the trusted domain we also assume that it is > > > trusted as well, e.g. must take measures to prevent DDoS attacks." > > > > > > What's the relation between being trusted and protecting from DoS > > > attacks? > > I mean that we trust the backend that it can prevent Dom0 > > from crashing in case DomU's frontend misbehaves, e.g. > > if the frontend sends too many memory requests etc. > > > In any case, all? PV protocols are implemented with the frontend > > > sharing pages to the backend, and I think there's a reason why this > > > model is used, and it should continue to be used. > > This is the first use-case above. But there are real-world > > use-cases (embedded in my case) when physically contiguous memory > > needs to be shared, one of the possible ways to achieve this is > > to share contiguous memory from Dom0 to DomU (the second use-case > above) > > > Having to add logic in the backend to prevent such attacks means > > > that: > > > > > > - We need more code in the backend, which increases complexity and > > > chances of bugs. > > > - Such code/logic could be wrong, thus allowing DoS. > > You can live without this code at all, but this is then up to > > backend which may make Dom0 down because of DomU's frontend doing > evil > > things > > IMO we should design protocols that do not allow such attacks instead > of having to defend against them. > > > > > 4. xen-front/backend/xen-zcopy synchronization > > > > > > > > 4.1. As I already said in 2) all the inter VM communication happens > between > > > > xen-front and the backend, xen-zcopy is NOT involved in that. > > > > When xen-front wants to destroy a display buffer (dumb/dma-buf) it > issues a > > > > XENDISPL_OP_DBUF_DESTROY command (opposite to > XENDISPL_OP_DBUF_CREATE). > > > > This call is synchronous, so xen-front expects that backend does free > the > > > > buffer pages on return. > > > > > > > > 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY: > > > > - closes all dumb handles/fd's of the buffer according to [3] > > > > - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen- > zcopy to make > > > > sure > > > > the buffer is freed (think of it as it waits for dma-buf->release > > > > callback) > > > So this zcopy thing keeps some kind of track of the memory usage? Why > > > can't the user-space backend keep track of the buffer usage? > > Because there is no dma-buf UAPI which allows to track the buffer life cycle > > (e.g. wait until dma-buf's .release callback is called) > > > > - replies to xen-front that the buffer can be destroyed. > > > > This way deletion of the buffer happens synchronously on both Dom0 > and DomU > > > > sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns > with time-out > > > > error > > > > (BTW, wait time is a parameter of this IOCTL), Xen will defer grant > > > > reference > > > > removal and will retry later until those are free. > > > > > > >
Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver
On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko wrote: > On 04/18/2018 10:35 AM, Roger Pau Monné wrote: > > On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko wrote: > > > On 04/17/2018 11:57 PM, Dongwon Kim wrote: > > > > On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote: > > > > > On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote: > > > 3.2 Backend exports dma-buf to xen-front > > > > > > In this case Dom0 pages are shared with DomU. As before, DomU can only > > > write > > > to these pages, not any other page from Dom0, so it can be still > > > considered > > > safe. > > > But, the following must be considered (highlighted in xen-front's Kernel > > > documentation): > > > - If guest domain dies then pages/grants received from the backend > > > cannot > > > be claimed back - think of it as memory lost to Dom0 (won't be used > > > for > > > any > > > other guest) > > > - Misbehaving guest may send too many requests to the backend exhausting > > > its grant references and memory (consider this from security POV). As > > > the > > > backend runs in the trusted domain we also assume that it is trusted > > > as > > > well, > > > e.g. must take measures to prevent DDoS attacks. > > I cannot parse the above sentence: > > > > "As the backend runs in the trusted domain we also assume that it is > > trusted as well, e.g. must take measures to prevent DDoS attacks." > > > > What's the relation between being trusted and protecting from DoS > > attacks? > I mean that we trust the backend that it can prevent Dom0 > from crashing in case DomU's frontend misbehaves, e.g. > if the frontend sends too many memory requests etc. > > In any case, all? PV protocols are implemented with the frontend > > sharing pages to the backend, and I think there's a reason why this > > model is used, and it should continue to be used. > This is the first use-case above. But there are real-world > use-cases (embedded in my case) when physically contiguous memory > needs to be shared, one of the possible ways to achieve this is > to share contiguous memory from Dom0 to DomU (the second use-case above) > > Having to add logic in the backend to prevent such attacks means > > that: > > > > - We need more code in the backend, which increases complexity and > > chances of bugs. > > - Such code/logic could be wrong, thus allowing DoS. > You can live without this code at all, but this is then up to > backend which may make Dom0 down because of DomU's frontend doing evil > things IMO we should design protocols that do not allow such attacks instead of having to defend against them. > > > 4. xen-front/backend/xen-zcopy synchronization > > > > > > 4.1. As I already said in 2) all the inter VM communication happens > > > between > > > xen-front and the backend, xen-zcopy is NOT involved in that. > > > When xen-front wants to destroy a display buffer (dumb/dma-buf) it issues > > > a > > > XENDISPL_OP_DBUF_DESTROY command (opposite to XENDISPL_OP_DBUF_CREATE). > > > This call is synchronous, so xen-front expects that backend does free the > > > buffer pages on return. > > > > > > 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY: > > > - closes all dumb handles/fd's of the buffer according to [3] > > > - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-zcopy to make > > > sure > > > the buffer is freed (think of it as it waits for dma-buf->release > > > callback) > > So this zcopy thing keeps some kind of track of the memory usage? Why > > can't the user-space backend keep track of the buffer usage? > Because there is no dma-buf UAPI which allows to track the buffer life cycle > (e.g. wait until dma-buf's .release callback is called) > > > - replies to xen-front that the buffer can be destroyed. > > > This way deletion of the buffer happens synchronously on both Dom0 and > > > DomU > > > sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns with time-out > > > error > > > (BTW, wait time is a parameter of this IOCTL), Xen will defer grant > > > reference > > > removal and will retry later until those are free. > > > > > > Hope this helps understand how buffers are synchronously deleted in case > > > of xen-zcopy with a single protocol command. > > > > > > I think the above logic can also be re-used by the hyper-dmabuf driver > > > with > > > some additional work: > > > > > > 1. xen-zcopy can be split into 2 parts and extend: > > > 1.1. Xen gntdev driver [4], [5] to allow creating dma-buf from grefs and > > > vise versa, > > I don't know much about the dma-buf implementation in Linux, but > > gntdev is a user-space device, and AFAICT user-space applications > > don't have any notion of dma buffers. How are such buffers useful for > > user-space? Why can't this just be called memory? > A dma-buf is seen by user-space as a file descriptor and you can > pass it to different drivers then. For example, you can share a buffer > used by
Re: [Xen-devel] [PATCH v8 3/9] xen/x86: support per-domain flag for xpti
On 18/04/18 11:42, Sergey Dyasli wrote: > Hi Juergen, > > 2 small requests from me below. > > On Wed, 2018-04-18 at 10:30 +0200, Juergen Gross wrote: >> Instead of switching XPTI globally on or off add a per-domain flag for >> that purpose. This allows to modify the xpti boot parameter to support >> running dom0 without Meltdown mitigations. Using "xpti=nodom0" as boot >> parameter will achieve that. >> >> Move the xpti boot parameter handling to xen/arch/x86/pv/domain.c as >> it is pv-domain specific. >> >> Signed-off-by: Juergen Gross>> Reviewed-by: Jan Beulich > > >> diff --git a/docs/misc/xen-command-line.markdown >> b/docs/misc/xen-command-line.markdown >> index b353352adf..d4f758487a 100644 >> --- a/docs/misc/xen-command-line.markdown >> +++ b/docs/misc/xen-command-line.markdown >> @@ -1955,14 +1955,24 @@ clustered mode. The default, given no hint from the >> **FADT**, is cluster >> mode. >> >> ### xpti >> -> `= ` >> +> `= List of [ default | | dom0= | domu= ]` >> >> -> Default: `false` on AMD hardware >> +> Default: `false` on hardware not vulnerable to Meltdown (e.g. AMD) > > Could this line please be changed to: > > `false` on hardware known not to be vulnerable to Meltdown (e.g. AMD) Sure. > >> > Default: `true` everywhere else >> >> Override default selection of whether to isolate 64-bit PV guest page >> tables. >> >> +`true` activates page table isolation even on hardware not vulnerable by >> +Meltdown for all domains. >> + >> +`false` deactivates page table isolation on all systems for all domains. >> + >> +`default` sets the default behaviour. >> + >> +With `dom0` and `domu` it is possible to control page table isolation >> +for dom0 or guest domains only. >> + >> ### xsave >> > `= ` >> > >> diff --git a/xen/include/asm-x86/spec_ctrl.h >> b/xen/include/asm-x86/spec_ctrl.h >> index 5ab4ff3f68..b4fa43269e 100644 >> --- a/xen/include/asm-x86/spec_ctrl.h >> +++ b/xen/include/asm-x86/spec_ctrl.h >> @@ -29,6 +29,10 @@ void init_speculation_mitigations(void); >> extern bool opt_ibpb; >> extern uint8_t default_bti_ist_info; >> >> +extern uint8_t opt_xpti; >> +#define OPT_XPTI_DOM0 0x01 >> +#define OPT_XPTI_DOMU 0x02 >> + >> static inline void init_shadow_spec_ctrl_state(void) >> { >> struct cpu_info *info = get_cpu_info(); > > Could you please also include something like the following: > > @@ -119,8 +122,9 @@ static void __init print_details(enum ind_thunk thunk) > boot_cpu_has(X86_FEATURE_RSB_NATIVE) ? " RSB_NATIVE" : "", > boot_cpu_has(X86_FEATURE_RSB_VMEXIT) ? " RSB_VMEXIT" : ""); > > -printk("XPTI: %s\n", > - boot_cpu_has(X86_FEATURE_NO_XPTI) ? "disabled" : "enabled"); > +printk("XPTI: Dom0 %s, DomU (64-bit PV only) %s\n", > + opt_xpti & OPT_XPTI_DOM0 ? "enabled" : "disabled", > + opt_xpti & OPT_XPTI_DOMU ? "enabled" : "disabled"); > } As stated already in the reply to Jan: I'll use the text: "XPTI (64-bit PV only): Dom0 ...". > (just noticed that commit message also needs update regarding param name) Aah, yes. Thanks, Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v8 3/9] xen/x86: support per-domain flag for xpti
On 18/04/18 11:49, Jan Beulich wrote: On 18.04.18 at 11:42,wrote: >> On Wed, 2018-04-18 at 10:30 +0200, Juergen Gross wrote: >> @@ -119,8 +122,9 @@ static void __init print_details(enum ind_thunk thunk) >> boot_cpu_has(X86_FEATURE_RSB_NATIVE) ? " RSB_NATIVE" : "", >> boot_cpu_has(X86_FEATURE_RSB_VMEXIT) ? " RSB_VMEXIT" : ""); >> >> -printk("XPTI: %s\n", >> - boot_cpu_has(X86_FEATURE_NO_XPTI) ? "disabled" : "enabled"); >> +printk("XPTI: Dom0 %s, DomU (64-bit PV only) %s\n", >> + opt_xpti & OPT_XPTI_DOM0 ? "enabled" : "disabled", >> + opt_xpti & OPT_XPTI_DOMU ? "enabled" : "disabled"); >> } > > Ah, yes, making this log message more precise is certainly worthwhile. The > placement of "(64-bit PV only)" is a little confusing though: I'd put it > either > before the colon or at the end of the line, as it affects Dom0 as much as > DomU. I'd prefer the former. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 6/7] xen/arm: Setup virtual paging for secondary CPUs in non-boot scenario
On 17/04/18 16:22, Mirela Simonovic wrote: Hi Julien, On Tue, Apr 17, 2018 at 4:11 PM, Julien Grallwrote: On 17/04/18 13:54, Mirela Simonovic wrote: Hi Julien, Hi, On Wed, Apr 11, 2018 at 5:11 PM, Julien Grall wrote: Hi, On 11/04/18 14:19, Mirela Simonovic wrote: In existing code the paging for secondary CPUs is setup only in boot flow. The setup is triggered from start_xen function after all CPUs are brought online. In other words, the initialization of VTCR_EL2 register is done out of the cpu_up/start_secondary control flow. However, the cpu_up flow should be self-contained - it should fully initialize a secondary CPU, because the cpu_up is used not only to bring a secondary CPU online on boot, but also to hotplug a CPU during the system resume. With this patch the setting of paging is triggered from start_secondary function if the current system state is not boot. This way, the paging will be setup in non-boot scenarios, while the setup in boot scenario remains unchanged. I am afraid that this is not correct. You can't assume that value chosen for VTCR by Xen at boot will fit this new CPU. So you have to check it is fine or park the CPU if there are any issue. This is not a new CPU. This CPU already went through its boot sequence and it reached the resume point because it does fit the value chosen for VTCR by Xen. If it wouldn't fit the chosen value for VTCR it would be parked so it wouldn't participate in suspend/resume. Please let me know if I misunderstood your comment. This is not a new CPU for your use case. However your commit message speak about "non-boot" CPU bring-up. So for me this is more than suspend/resume, it is about bringing-up CPU at any time. Use case you're trying to cover is hotplugging a CPU after the boot is done in bit.LITTLE system, and that CPU wasn't initially brought online (on boot). Right? That's right. It is how I understood your commit title. As those CPUs can't participate to the decision (it is too late), you need to make sure the VTCR will fit on that CPU. Could you please point me to the location in sources where this is done on boot? I mean checking compliance with chosen VTCR value and parking CPU if it doesn't fit. At the moment all CPUs are brought up during Xen boot. So what we do is find the small PA range that will accommodate all the onlined CPUs (see setup_virt_paging). So there are no parking required at the moment. However, if you start bringing-up new CPUs after boot. Then you have to check they will be able to handle the values chosen at boot. As you pointed out, this might not be necessary for the suspend/resume case. So I am not asking you to implement that. However, please make sure the commit message clearly spell out that you bringing up secondary CPU after Xen has boot only covers the suspend/resume case. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v8 3/9] xen/x86: support per-domain flag for xpti
>>> On 18.04.18 at 11:42,wrote: > On Wed, 2018-04-18 at 10:30 +0200, Juergen Gross wrote: > @@ -119,8 +122,9 @@ static void __init print_details(enum ind_thunk thunk) > boot_cpu_has(X86_FEATURE_RSB_NATIVE) ? " RSB_NATIVE" : "", > boot_cpu_has(X86_FEATURE_RSB_VMEXIT) ? " RSB_VMEXIT" : ""); > > -printk("XPTI: %s\n", > - boot_cpu_has(X86_FEATURE_NO_XPTI) ? "disabled" : "enabled"); > +printk("XPTI: Dom0 %s, DomU (64-bit PV only) %s\n", > + opt_xpti & OPT_XPTI_DOM0 ? "enabled" : "disabled", > + opt_xpti & OPT_XPTI_DOMU ? "enabled" : "disabled"); > } Ah, yes, making this log message more precise is certainly worthwhile. The placement of "(64-bit PV only)" is a little confusing though: I'd put it either before the colon or at the end of the line, as it affects Dom0 as much as DomU. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v8 3/9] xen/x86: support per-domain flag for xpti
Hi Juergen, 2 small requests from me below. On Wed, 2018-04-18 at 10:30 +0200, Juergen Gross wrote: > Instead of switching XPTI globally on or off add a per-domain flag for > that purpose. This allows to modify the xpti boot parameter to support > running dom0 without Meltdown mitigations. Using "xpti=nodom0" as boot > parameter will achieve that. > > Move the xpti boot parameter handling to xen/arch/x86/pv/domain.c as > it is pv-domain specific. > > Signed-off-by: Juergen Gross> Reviewed-by: Jan Beulich > diff --git a/docs/misc/xen-command-line.markdown > b/docs/misc/xen-command-line.markdown > index b353352adf..d4f758487a 100644 > --- a/docs/misc/xen-command-line.markdown > +++ b/docs/misc/xen-command-line.markdown > @@ -1955,14 +1955,24 @@ clustered mode. The default, given no hint from the > **FADT**, is cluster > mode. > > ### xpti > -> `= ` > +> `= List of [ default | | dom0= | domu= ]` > > -> Default: `false` on AMD hardware > +> Default: `false` on hardware not vulnerable to Meltdown (e.g. AMD) Could this line please be changed to: `false` on hardware known not to be vulnerable to Meltdown (e.g. AMD) > > Default: `true` everywhere else > > Override default selection of whether to isolate 64-bit PV guest page > tables. > > +`true` activates page table isolation even on hardware not vulnerable by > +Meltdown for all domains. > + > +`false` deactivates page table isolation on all systems for all domains. > + > +`default` sets the default behaviour. > + > +With `dom0` and `domu` it is possible to control page table isolation > +for dom0 or guest domains only. > + > ### xsave > > `= ` > > diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h > index 5ab4ff3f68..b4fa43269e 100644 > --- a/xen/include/asm-x86/spec_ctrl.h > +++ b/xen/include/asm-x86/spec_ctrl.h > @@ -29,6 +29,10 @@ void init_speculation_mitigations(void); > extern bool opt_ibpb; > extern uint8_t default_bti_ist_info; > > +extern uint8_t opt_xpti; > +#define OPT_XPTI_DOM0 0x01 > +#define OPT_XPTI_DOMU 0x02 > + > static inline void init_shadow_spec_ctrl_state(void) > { > struct cpu_info *info = get_cpu_info(); Could you please also include something like the following: @@ -119,8 +122,9 @@ static void __init print_details(enum ind_thunk thunk) boot_cpu_has(X86_FEATURE_RSB_NATIVE) ? " RSB_NATIVE" : "", boot_cpu_has(X86_FEATURE_RSB_VMEXIT) ? " RSB_VMEXIT" : ""); -printk("XPTI: %s\n", - boot_cpu_has(X86_FEATURE_NO_XPTI) ? "disabled" : "enabled"); +printk("XPTI: Dom0 %s, DomU (64-bit PV only) %s\n", + opt_xpti & OPT_XPTI_DOM0 ? "enabled" : "disabled", + opt_xpti & OPT_XPTI_DOMU ? "enabled" : "disabled"); } (just noticed that commit message also needs update regarding param name) -- Thanks, Sergey ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v8 9/9] xen/x86: use PCID feature
On 18/04/18 11:13, Jan Beulich wrote: On 18.04.18 at 10:30,wrote: >> Avoid flushing the complete TLB when switching %cr3 for mitigation of >> Meltdown by using the PCID feature if available. >> >> We are using 4 PCID values for a 64 bit pv domain subject to XPTI and >> 2 values for the non-XPTI case: >> >> - guest active and in kernel mode >> - guest active and in user mode >> - hypervisor active and guest in user mode (XPTI only) >> - hypervisor active and guest in kernel mode (XPTI only) >> >> We use PCID only if PCID _and_ INVPCID are supported. With PCID in use >> we disable global pages in cr4. A command line parameter controls in >> which cases PCID is being used. >> >> As the non-XPTI case has shown not to perform better with PCID at least >> on some machines the default is to use PCID only for domains subject to >> XPTI. >> >> With PCID enabled we always disable global pages. This avoids having to >> either flush the complete TLB or do a cycle through all PCID values >> when invalidating a single global page. >> >> Signed-off-by: Juergen Gross >> Reviewed-by: Jan Beulich >> --- >> V6.1: >> - address some minor comments (Jan Beulich) > > No v7 changes? Afaict ... > >> @@ -717,7 +718,7 @@ int __init dom0_construct_pv(struct domain *d, >> update_cr3(v); >> >> /* We run on dom0's page tables for the final part of the build >> process. */ >> -switch_cr3_cr4(v->arch.cr3, read_cr4()); >> +switch_cr3_cr4(cr3_pa(v->arch.cr3), read_cr4()); > > ... at least this was added. And to be honest I'm not convinced cr3_pa() is > the right construct to use here: It's neither clear why the other bits don't > matter at this point, nor why any of the bits that you mask out this way > need masking out in the first place (e.g. why, in the crash case that Andrew > had observed, the noflush bit was wrongly set). At this point in time we only want to use another cr3 value. So PCIDs should _not_ be used as this would require adapting cr4, resulting in writing the cr3_pa() bits only. It would have been possible to use write_cr3() here, but only together with open coding a TLB flush in order to avoid any global pages remaining in the TLB. So I decided to use switch_cr3_cr4(). The noflush bit was set as dom0 is subject to XPTI and PCID usage and update_cr3() is updating v->arch.cr3 accordingly. I know you are not fond of setting noflush in v->arch.cr3. The only sensible alternative to that would be adding something like v->arch.cr3_pcid_bits being or-ed to the cr3 value in restore_all_guest. This would let us get rid of having to use cr3_pa() in some places while adding a (very little) performance penalty. > I hope there aren't any other such hidden changes in this version of the > series. Oh sorry, I missed to update the history. I'm not aware of other changes in v8 (apart from patch 1 and the resulting needed change in patch 3). Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2] x86/hpet: Fix possible ASSERT(cpu < nr_cpu_ids)
From: David WangFor the ch->cpumask be cleared by other cpu, cpumask_first() called by hpet_detach_channel() return nr_cpu_ids. That lead an assertion in set_channel_irq_affinity() when cpumask_of() check cpu. Fix this by using a local variable. Signed-off-by: David Wang --- xen/arch/x86/hpet.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c index bc7a851..6b3587a 100644 --- a/xen/arch/x86/hpet.c +++ b/xen/arch/x86/hpet.c @@ -509,15 +509,18 @@ static void hpet_attach_channel(unsigned int cpu, static void hpet_detach_channel(unsigned int cpu, struct hpet_event_channel *ch) { +cpumask_t cpumask; + spin_lock_irq(>lock); ASSERT(ch == per_cpu(cpu_bc_channel, cpu)); per_cpu(cpu_bc_channel, cpu) = NULL; +cpumask_copy(, ch->cpumask); if ( cpu != ch->cpu ) spin_unlock_irq(>lock); -else if ( cpumask_empty(ch->cpumask) ) +else if ( cpumask_empty() ) { ch->cpu = -1; clear_bit(HPET_EVT_USED_BIT, >flags); @@ -525,7 +528,7 @@ static void hpet_detach_channel(unsigned int cpu, } else { -ch->cpu = cpumask_first(ch->cpumask); +ch->cpu = cpumask_first(); set_channel_irq_affinity(ch); local_irq_enable(); } -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] Announcing the Xen Project 4.11 RC and Test Day Schedules
Dear community members, on Tuesday, we created Xen 4.1q RC1 and will release a new release candidate every FRIDAY, until we declare a release candidate as the final candidate and cut the Xen 4.11 release. We will also hold a Test Day every TUESDAY for the release candidate that was released the week prior to the Test Day starting from RC1. Note that RC’s are announced on the following mailing lists: xen-announce, xen-devel and xen-users. This means we will have Test Days coming up on April 24th, May 1st, 8th, 15th and 22nd. Your testing is still valuable on other days, so please feel free to send Test Reports as outlined below at any time. == Getting, Building and Installing a Release Candidate == Release candidates are available from our git repository at git://xenbits.xenproject.org/xen.git (tag 4.11.0-) where is rc1, rc2, rc3, etc. and as tarball from https://downloads.xenproject.org/release/xen/4.11.0-/xen-4.11.0-.tar.gz https://downloads.xenproject.org/release/xen/4.11.0-/xen-4.11.0-.tar.gz.sig Detailed build and Install instructions can be found on the Test Day Wiki. Make sure you check the known issues section of the instructions before trying to download an RC. == Testing new Features, Test and Bug Reports == You can find Test Instructions for new features on our Test Day Wiki and instructions for general tests on Testing Xen. The following pages provide information on how to report successful tests and how to report bugs and issues. Happy Testing! == Resources == * Test Day Wiki: https://wiki.xenproject.org/wiki/Xen_4.11_RC_test_instructions * Known Issues: https://wiki.xenproject.org/wiki/Xen_4.11_RC_test_instructions#Known_issues Best Regards Lars ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v8 9/9] xen/x86: use PCID feature
>>> On 18.04.18 at 10:30,wrote: > Avoid flushing the complete TLB when switching %cr3 for mitigation of > Meltdown by using the PCID feature if available. > > We are using 4 PCID values for a 64 bit pv domain subject to XPTI and > 2 values for the non-XPTI case: > > - guest active and in kernel mode > - guest active and in user mode > - hypervisor active and guest in user mode (XPTI only) > - hypervisor active and guest in kernel mode (XPTI only) > > We use PCID only if PCID _and_ INVPCID are supported. With PCID in use > we disable global pages in cr4. A command line parameter controls in > which cases PCID is being used. > > As the non-XPTI case has shown not to perform better with PCID at least > on some machines the default is to use PCID only for domains subject to > XPTI. > > With PCID enabled we always disable global pages. This avoids having to > either flush the complete TLB or do a cycle through all PCID values > when invalidating a single global page. > > Signed-off-by: Juergen Gross > Reviewed-by: Jan Beulich > --- > V6.1: > - address some minor comments (Jan Beulich) No v7 changes? Afaict ... > @@ -717,7 +718,7 @@ int __init dom0_construct_pv(struct domain *d, > update_cr3(v); > > /* We run on dom0's page tables for the final part of the build process. > */ > -switch_cr3_cr4(v->arch.cr3, read_cr4()); > +switch_cr3_cr4(cr3_pa(v->arch.cr3), read_cr4()); ... at least this was added. And to be honest I'm not convinced cr3_pa() is the right construct to use here: It's neither clear why the other bits don't matter at this point, nor why any of the bits that you mask out this way need masking out in the first place (e.g. why, in the crash case that Andrew had observed, the noflush bit was wrongly set). I hope there aren't any other such hidden changes in this version of the series. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory
>>> On 06.12.17 at 08:50,wrote: > Each vcpu of hvm guest consumes at least one shadow page. Currently, only 256 > (for hap case) pages are pre-allocated as shadow memory at beginning. It would > run out if guest has more than 256 vcpus and guest creation fails. Bump the > number of shadow pages to 2 * HVM_MAX_VCPUS for hap case and 8 * HVM_MAX_VCPUS > for shadow case. > > This patch won't lead to more memory consumption for the size of shadow memory > will be adjusted via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION according to the size > of guest memory and the number of vcpus. I don't understand this: What's the purpose of bumping the values if it won't lead to higher memory consumption? Afaict there's be higher consumption at least transiently. And I don't see why this would need doing independent of the intended vCPU count in the guest. I guess you want to base your series on top on Andrew's max-vCPU-s adjustments (which sadly didn't become ready in time for 4.11). Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC Patch v4 6/8] hvmload: Add x2apic entry support in the MADT and SRAT build
>>> On 06.12.17 at 08:50,wrote: > --- a/tools/libacpi/build.c > +++ b/tools/libacpi/build.c > @@ -30,6 +30,11 @@ > > #define align16(sz)(((sz) + 15) & ~15) > #define fixed_strcpy(d, s) strncpy((d), (s), sizeof(d)) > +#define min(X, Y) ({ \ > +const typeof (X) _x = (X); \ > +const typeof (Y) _y = (Y); \ > +(void) (&_x == &_y); \ > +(_x < _y) ? _x : _y; }) No new name space violations please: Identifiers starting with a single underscore and a lower case letter are reserved for file scope symbols. Use a trailing underscore instead, for example. > @@ -79,16 +84,19 @@ static struct acpi_20_madt *construct_madt(struct > acpi_ctxt *ctxt, > struct acpi_20_madt_intsrcovr *intsrcovr; > struct acpi_20_madt_ioapic*io_apic; > struct acpi_20_madt_lapic *lapic; > +struct acpi_20_madt_x2apic*x2apic; > const struct hvm_info_table *hvminfo = config->hvminfo; > -int i, sz; > +int i, sz, nr_apic; Apart from your own addition I think both i and sz also want to be unsigned. Please make such corrections when you touch a line anyway. > if ( config->lapic_id == NULL ) > return NULL; > > +nr_apic = min(hvminfo->nr_vcpus, MADT_MAX_LOCAL_APIC); With this the variable name is not well chosen. nr_xapic perhaps? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast
>>> On 06.12.17 at 08:50,wrote: > Intel SDM Extended XAPIC (X2APIC) -> "Initialization by System Software" > has the following description: > > "The ACPI interfaces for the x2APIC are described in Section 5.2, “ACPI System > Description Tables,” of the Advanced Configuration and Power Interface > Specification, Revision 4.0a (http://www.acpi.info/spec.htm). The default > behavior for BIOS is to pass the control to the operating system with the > local x2APICs in xAPIC mode if all APIC IDs reported by CPUID.0BH:EDX are less > than 255, and in x2APIC mode if there are any logical processor reporting an > APIC ID of 255 or greater." With this you mean ... > In this patch, hvmloader enables x2apic mode for all vcpus if there are cpus > with APIC ID > 255. To wake up processors whose APIC ID is greater than 255, ">= 255" and "greater than 254" here respectively. Please be precise. > the SIPI is broadcasted to all APs. It is the way how Seabios wakes up APs. > APs may compete for the stack, thus a lock is introduced to protect the > stack. > --- a/tools/firmware/hvmloader/smp.c > +++ b/tools/firmware/hvmloader/smp.c > @@ -26,7 +26,9 @@ > #define AP_BOOT_EIP 0x1000 > extern char ap_boot_start[], ap_boot_end[]; > > -static int ap_callin, ap_cpuid; > +static int ap_callin; > +static int enable_x2apic; > +static bool lock = 1; > > asm ( > ".text \n" > @@ -47,7 +49,15 @@ asm ( > "mov %eax,%ds \n" > "mov %eax,%es \n" > "mov %eax,%ss \n" > -"movl $stack_top,%esp \n" > +"3: movb $1, %bl \n" > +"mov $lock,%edx\n" > +"movzbl %bl,%eax \n" > +"xchg %al, (%edx) \n" > +"test %al,%al \n" > +"je2f\n" > +"pause \n" > +"jmp 3b\n" > +"2: movl $stack_top,%esp \n" Please be consistent with suffixes: Either add them only when really needed (preferred) or add them uniformly everywhere (when permitted of course). I also don't understand why you need to use %bl here at all. > @@ -68,14 +78,34 @@ asm ( > ".text \n" > ); > > +unsigned int ap_cpuid(void) static > +{ > +if ( !(rdmsr(MSR_IA32_APICBASE) & MSR_IA32_APICBASE_EXTD) ) > +{ > +uint32_t eax, ebx, ecx, edx; > + > +cpuid(1, , , , ); > +return ebx >> 24; > +} > +else pointless "else" > +return rdmsr(MSR_IA32_APICBASE_MSR + (APIC_ID >> 4)); > +} > + > void ap_start(void); /* non-static avoids unused-function compiler warning */ > /*static*/ void ap_start(void) > { > -printf(" - CPU%d ... ", ap_cpuid); > +printf(" - CPU%d ... ", ap_cpuid()); > cacheattr_init(); > printf("done.\n"); > wmb(); > -ap_callin = 1; > +ap_callin++; > + > +if ( enable_x2apic ) > +wrmsr(MSR_IA32_APICBASE, rdmsr(MSR_IA32_APICBASE) | > + MSR_IA32_APICBASE_EXTD); > + > +/* Release the lock */ > +asm volatile ( "xchgb %1, %b0" : : "m" (lock), "r" (0) : "memory" ); > } How can you release the lock here - you've not switched off the stack, and you're about to actually use it (for returning from the function)? > @@ -125,9 +169,15 @@ void smp_initialise(void) > memcpy((void *)AP_BOOT_EIP, ap_boot_start, ap_boot_end - ap_boot_start); > > printf("Multiprocessor initialisation:\n"); > +if ( nr_cpus > MADT_MAX_LOCAL_APIC ) > +enable_x2apic = 1; > + > ap_start(); > -for ( i = 1; i < nr_cpus; i++ ) > -boot_cpu(i); > +if ( nr_cpus > MADT_MAX_LOCAL_APIC ) Where does MADT_MAX_LOCAL_APIC come from? I can't find it anywhere, and hence can't judge (along the lines of my remark on the description) whether this is a correct comparison. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v8 2/9] xen/x86: add a function for modifying cr3
Instead of having multiple places with more or less identical asm statements just have one function doing a write to cr3. As this function should be named write_cr3() rename the current write_cr3() function to switch_cr3(). Suggested-by: Andrew CopperSigned-off-by: Juergen Gross Reviewed-by: Jan Beulich --- V6: - new patch --- xen/arch/x86/flushtlb.c | 4 ++-- xen/arch/x86/mm.c | 2 +- xen/arch/x86/pv/domain.c| 2 +- xen/common/efi/runtime.c| 4 ++-- xen/include/asm-x86/flushtlb.h | 2 +- xen/include/asm-x86/processor.h | 5 + 6 files changed, 12 insertions(+), 7 deletions(-) diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c index ad19ba0f5d..e36bd6f383 100644 --- a/xen/arch/x86/flushtlb.c +++ b/xen/arch/x86/flushtlb.c @@ -72,7 +72,7 @@ static void post_flush(u32 t) this_cpu(tlbflush_time) = t; } -void write_cr3(unsigned long cr3) +void switch_cr3(unsigned long cr3) { unsigned long flags, cr4; u32 t; @@ -84,7 +84,7 @@ void write_cr3(unsigned long cr3) cr4 = read_cr4(); write_cr4(cr4 & ~X86_CR4_PGE); -asm volatile ( "mov %0, %%cr3" : : "r" (cr3) : "memory" ); +write_cr3(cr3); write_cr4(cr4); post_flush(t); diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 7d960c742e..e245d96a97 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -503,7 +503,7 @@ void make_cr3(struct vcpu *v, mfn_t mfn) void write_ptbase(struct vcpu *v) { get_cpu_info()->root_pgt_changed = true; -write_cr3(v->arch.cr3); +switch_cr3(v->arch.cr3); } /* diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c index b1c40373fa..be40843b05 100644 --- a/xen/arch/x86/pv/domain.c +++ b/xen/arch/x86/pv/domain.c @@ -220,7 +220,7 @@ static void _toggle_guest_pt(struct vcpu *v) get_cpu_info()->root_pgt_changed = true; /* Don't flush user global mappings from the TLB. Don't tick TLB clock. */ -asm volatile ( "mov %0, %%cr3" : : "r" (v->arch.cr3) : "memory" ); +write_cr3(v->arch.cr3); if ( !(v->arch.flags & TF_kernel_mode) ) return; diff --git a/xen/common/efi/runtime.c b/xen/common/efi/runtime.c index 3dbc2e8ee5..4e5ddfef4f 100644 --- a/xen/common/efi/runtime.c +++ b/xen/common/efi/runtime.c @@ -111,7 +111,7 @@ struct efi_rs_state efi_rs_enter(void) lgdt(_desc); } -write_cr3(virt_to_maddr(efi_l4_pgtable)); +switch_cr3(virt_to_maddr(efi_l4_pgtable)); return state; } @@ -120,7 +120,7 @@ void efi_rs_leave(struct efi_rs_state *state) { if ( !state->cr3 ) return; -write_cr3(state->cr3); +switch_cr3(state->cr3); if ( is_pv_vcpu(current) && !is_idle_vcpu(current) ) { struct desc_ptr gdt_desc = { diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h index 6bf33a6f5e..c150f82ca2 100644 --- a/xen/include/asm-x86/flushtlb.h +++ b/xen/include/asm-x86/flushtlb.h @@ -84,7 +84,7 @@ static inline unsigned long read_cr3(void) } /* Write pagetable base and implicitly tick the tlbflush clock. */ -void write_cr3(unsigned long cr3); +void switch_cr3(unsigned long cr3); /* flush_* flag fields: */ /* diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h index db9988ab33..71d32c0333 100644 --- a/xen/include/asm-x86/processor.h +++ b/xen/include/asm-x86/processor.h @@ -283,6 +283,11 @@ static inline unsigned long read_cr2(void) return cr2; } +static inline void write_cr3(unsigned long val) +{ +asm volatile ( "mov %0, %%cr3" : : "r" (val) : "memory" ); +} + static inline unsigned long read_cr4(void) { return get_cpu_info()->cr4; -- 2.13.6 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v8 6/9] xen/x86: use flag byte for decision whether xen_cr3 is valid
Today cpu_info->xen_cr3 is either 0 to indicate %cr3 doesn't need to be switched on entry to Xen, or negative for keeping the value while indicating not to restore %cr3, or positive in case %cr3 is to be restored. Switch to use a flag byte instead of a negative xen_cr3 value in order to allow %cr3 values with the high bit set in case we want to keep TLB entries when using the PCID feature. This reduces the number of branches in interrupt handling and results in better performance (e.g. parallel make of the Xen hypervisor on my system was using about 3% less system time). Signed-off-by: Juergen GrossReviewed-by: Jan Beulich --- V3: - renamed use_xen_cr3 to better fitting use_pv_cr3 - corrected comment regarding semantics of use_pv_cr3 (Jan Beulich) - prefer 32-bit operations over 8- or 16-bit ones (Jan Beulich) --- xen/arch/x86/domain.c | 1 + xen/arch/x86/mm.c | 3 +- xen/arch/x86/smpboot.c | 2 ++ xen/arch/x86/x86_64/asm-offsets.c | 1 + xen/arch/x86/x86_64/compat/entry.S | 5 ++-- xen/arch/x86/x86_64/entry.S| 59 -- xen/include/asm-x86/current.h | 12 +--- 7 files changed, 41 insertions(+), 42 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 9b001a03ec..801ac33810 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1696,6 +1696,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next) ASSERT(local_irq_is_enabled()); +get_cpu_info()->use_pv_cr3 = false; get_cpu_info()->xen_cr3 = 0; if ( unlikely(dirty_cpu != cpu) && dirty_cpu != VCPU_CPU_CLEAN ) diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 73a38e8715..4997047edf 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -517,7 +517,8 @@ void write_ptbase(struct vcpu *v) } else { -/* Make sure to clear xen_cr3 before pv_cr3. */ +/* Make sure to clear use_pv_cr3 and xen_cr3 before pv_cr3. */ +cpu_info->use_pv_cr3 = false; cpu_info->xen_cr3 = 0; /* switch_cr3_cr4() serializes. */ switch_cr3_cr4(v->arch.cr3, new_cr4); diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c index 980192e71f..d21020c510 100644 --- a/xen/arch/x86/smpboot.c +++ b/xen/arch/x86/smpboot.c @@ -324,6 +324,7 @@ void start_secondary(void *unused) */ spin_debug_disable(); +get_cpu_info()->use_pv_cr3 = false; get_cpu_info()->xen_cr3 = 0; get_cpu_info()->pv_cr3 = 0; @@ -1123,6 +1124,7 @@ void __init smp_prepare_boot_cpu(void) per_cpu(scratch_cpumask, cpu) = _cpu0mask; #endif +get_cpu_info()->use_pv_cr3 = false; get_cpu_info()->xen_cr3 = 0; get_cpu_info()->pv_cr3 = 0; } diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c index 9e2aefb00f..7ad024cf37 100644 --- a/xen/arch/x86/x86_64/asm-offsets.c +++ b/xen/arch/x86/x86_64/asm-offsets.c @@ -144,6 +144,7 @@ void __dummy__(void) OFFSET(CPUINFO_use_shadow_spec_ctrl, struct cpu_info, use_shadow_spec_ctrl); OFFSET(CPUINFO_bti_ist_info, struct cpu_info, bti_ist_info); OFFSET(CPUINFO_root_pgt_changed, struct cpu_info, root_pgt_changed); +OFFSET(CPUINFO_use_pv_cr3, struct cpu_info, use_pv_cr3); DEFINE(CPUINFO_sizeof, sizeof(struct cpu_info)); BLANK(); diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S index ae2bb4bf1e..b909977e33 100644 --- a/xen/arch/x86/x86_64/compat/entry.S +++ b/xen/arch/x86/x86_64/compat/entry.S @@ -210,10 +210,9 @@ ENTRY(cstar_enter) GET_STACK_END(bx) mov STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx -neg %rcx +test %rcx, %rcx jz.Lcstar_cr3_okay -mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx) -neg %rcx +movb $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%rbx) mov %rcx, %cr3 movq $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx) .Lcstar_cr3_okay: diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S index 5f0758d64f..8b7d1c48ad 100644 --- a/xen/arch/x86/x86_64/entry.S +++ b/xen/arch/x86/x86_64/entry.S @@ -154,6 +154,7 @@ restore_all_guest: rep movsq .Lrag_copy_done: mov %r9, STACK_CPUINFO_FIELD(xen_cr3)(%rdx) +movb $1, STACK_CPUINFO_FIELD(use_pv_cr3)(%rdx) mov %rax, %cr3 .Lrag_keep_cr3: @@ -202,14 +203,9 @@ restore_all_xen: * case we return to late PV exit code (from an NMI or #MC). */ GET_STACK_END(bx) -mov STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rdx +cmpb $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%rbx) +UNLIKELY_START(ne, exit_cr3) mov STACK_CPUINFO_FIELD(pv_cr3)(%rbx), %rax -test %rdx, %rdx -/* - * Ideally the condition would be "nsz", but such doesn't exist, - * so "g" will have to do. - */ -UNLIKELY_START(g, exit_cr3) mov %rax, %cr3
[Xen-devel] [PATCH v8 3/9] xen/x86: support per-domain flag for xpti
Instead of switching XPTI globally on or off add a per-domain flag for that purpose. This allows to modify the xpti boot parameter to support running dom0 without Meltdown mitigations. Using "xpti=nodom0" as boot parameter will achieve that. Move the xpti boot parameter handling to xen/arch/x86/pv/domain.c as it is pv-domain specific. Signed-off-by: Juergen GrossReviewed-by: Jan Beulich --- V6.1: - address some minor comments (Jan Beulich) V6: - modify xpti boot parameter options (Andrew Cooper) - move xpti_init() code to spec_ctrl.c (Andrew Cooper) - irework init of per-domain xpti flag (Andrew Cooper) V3: - latch get_cpu_info() return value in variable (Jan Beulich) - call always xpti_domain_init() for pv dom0 (Jan Beulich) - add __init annotations (Jan Beulich) - drop per domain XPTI message (Jan Beulich) - document xpti=default support (Jan Beulich) - move domain xpti flag into a padding hole (Jan Beulich) --- docs/misc/xen-command-line.markdown | 14 ++-- xen/arch/x86/mm.c | 17 +++-- xen/arch/x86/mm/shadow/multi.c | 2 +- xen/arch/x86/pv/dom0_build.c| 1 + xen/arch/x86/pv/domain.c| 6 xen/arch/x86/setup.c| 19 -- xen/arch/x86/smpboot.c | 4 +-- xen/arch/x86/spec_ctrl.c| 70 + xen/include/asm-x86/current.h | 3 +- xen/include/asm-x86/domain.h| 3 ++ xen/include/asm-x86/spec_ctrl.h | 4 +++ 11 files changed, 116 insertions(+), 27 deletions(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index b353352adf..d4f758487a 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -1955,14 +1955,24 @@ clustered mode. The default, given no hint from the **FADT**, is cluster mode. ### xpti -> `= ` +> `= List of [ default | | dom0= | domu= ]` -> Default: `false` on AMD hardware +> Default: `false` on hardware not vulnerable to Meltdown (e.g. AMD) > Default: `true` everywhere else Override default selection of whether to isolate 64-bit PV guest page tables. +`true` activates page table isolation even on hardware not vulnerable by +Meltdown for all domains. + +`false` deactivates page table isolation on all systems for all domains. + +`default` sets the default behaviour. + +With `dom0` and `domu` it is possible to control page table isolation +for dom0 or guest domains only. + ### xsave > `= ` diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index e245d96a97..9c36614099 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -502,8 +502,21 @@ void make_cr3(struct vcpu *v, mfn_t mfn) void write_ptbase(struct vcpu *v) { -get_cpu_info()->root_pgt_changed = true; -switch_cr3(v->arch.cr3); +struct cpu_info *cpu_info = get_cpu_info(); + +if ( is_pv_vcpu(v) && v->domain->arch.pv_domain.xpti ) +{ +cpu_info->root_pgt_changed = true; +cpu_info->pv_cr3 = __pa(this_cpu(root_pgt)); +switch_cr3(v->arch.cr3); +} +else +{ +/* Make sure to clear xen_cr3 before pv_cr3; switch_cr3() serializes. */ +cpu_info->xen_cr3 = 0; +switch_cr3(v->arch.cr3); +cpu_info->pv_cr3 = 0; +} } /* diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c index 5e779baa53..5a9242de71 100644 --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -967,7 +967,7 @@ static int shadow_set_l4e(struct domain *d, sh_put_ref(d, osl3mfn, paddr); } -if ( !cpu_has_no_xpti ) +if ( is_pv_domain(d) && d->arch.pv_domain.xpti ) /* * Lazy flushing is enough: either we do a TLB flush right afterwards * which will pick up the new root page table on all affected cpus diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c index 5b4325b87f..d148395919 100644 --- a/xen/arch/x86/pv/dom0_build.c +++ b/xen/arch/x86/pv/dom0_build.c @@ -387,6 +387,7 @@ int __init dom0_construct_pv(struct domain *d, if ( compat32 ) { d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 1; +d->arch.pv_domain.xpti = false; v->vcpu_info = (void *)>shared_info->compat.vcpu_info[0]; if ( setup_compat_arg_xlat(v) != 0 ) BUG(); diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c index be40843b05..ce1a1a9d35 100644 --- a/xen/arch/x86/pv/domain.c +++ b/xen/arch/x86/pv/domain.c @@ -9,6 +9,7 @@ #include #include +#include #include static void noreturn continue_nonidle_domain(struct vcpu *v) @@ -75,6 +76,8 @@ int switch_compat(struct domain *d) d->arch.x87_fip_width = 4; +d->arch.pv_domain.xpti = false; + return 0; undo_and_fail: @@ -205,6 +208,9 @@ int pv_domain_initialise(struct domain *d) /* 64-bit PV guest by default. */ d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0; +
[Xen-devel] [PATCH v8 9/9] xen/x86: use PCID feature
Avoid flushing the complete TLB when switching %cr3 for mitigation of Meltdown by using the PCID feature if available. We are using 4 PCID values for a 64 bit pv domain subject to XPTI and 2 values for the non-XPTI case: - guest active and in kernel mode - guest active and in user mode - hypervisor active and guest in user mode (XPTI only) - hypervisor active and guest in kernel mode (XPTI only) We use PCID only if PCID _and_ INVPCID are supported. With PCID in use we disable global pages in cr4. A command line parameter controls in which cases PCID is being used. As the non-XPTI case has shown not to perform better with PCID at least on some machines the default is to use PCID only for domains subject to XPTI. With PCID enabled we always disable global pages. This avoids having to either flush the complete TLB or do a cycle through all PCID values when invalidating a single global page. Signed-off-by: Juergen GrossReviewed-by: Jan Beulich --- V6.1: - address some minor comments (Jan Beulich) V6: - split off pv_guest_cr4_to_real_cr4() conversion to function into new patch (Andrew Cooper) - changed some comments (Jan Beulich, Andrew Cooper) V5: - use X86_CR3_ADDR_MASK instead of ~X86_CR3_PCID_MASK (Jan Beulich) - add some const qualifiers (Jan Beulich) - mask X86_CR3_ADDR_MASK with PADDR_MASK (Jan Beulich) - add flushing the TLB from old PCID related entries in write_cr3_cr4() (Jan Beulich) V4: - add cr3 mask for page table address and use that in dbg_pv_va2mfn() (Jan Beulich) - use invpcid_flush_all_nonglobals() instead of invpcid_flush_all() (Jan Beulich) - use PCIDs 0/1 when running in Xen or without XPTI, 2/3 with XPTI in guest (Jan Beulich) - ASSERT cr4.pge and cr4.pcide are never active at the same time (Jan Beulich) - make pv_guest_cr4_to_real_cr4() a real function V3: - support PCID for non-XPTI case, too - add command line parameter for controlling usage of PCID - check PCID active by using cr4.pcide (Jan Beulich) --- docs/misc/xen-command-line.markdown | 14 +++ xen/arch/x86/flushtlb.c | 47 - xen/arch/x86/mm.c | 16 +++- xen/arch/x86/pv/dom0_build.c| 3 +- xen/arch/x86/pv/domain.c| 81 - xen/include/asm-x86/domain.h| 4 +- xen/include/asm-x86/processor.h | 3 ++ xen/include/asm-x86/pv/domain.h | 31 ++ 8 files changed, 192 insertions(+), 7 deletions(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 451a4fa566..f8950a3bb2 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -1451,6 +1451,20 @@ All numbers specified must be hexadecimal ones. This option can be specified more than once (up to 8 times at present). +### pcid (x86) +> `= | xpti=` + +> Default: `xpti` + +> Can be modified at runtime (change takes effect only for domains created + afterwards) + +If available, control usage of the PCID feature of the processor for +64-bit pv-domains. PCID can be used either for no domain at all (`false`), +for all of them (`true`), only for those subject to XPTI (`xpti`) or for +those not subject to XPTI (`no-xpti`). The feature is used only in case +INVPCID is supported and not disabled via `invpcid=false`. + ### ple\_gap > `= ` diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c index d48b900100..bd9a36924a 100644 --- a/xen/arch/x86/flushtlb.c +++ b/xen/arch/x86/flushtlb.c @@ -13,6 +13,7 @@ #include #include #include +#include /* Debug builds: Wrap frequently to stress-test the wrap logic. */ #ifdef NDEBUG @@ -94,6 +95,7 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4) { unsigned long flags, old_cr4; u32 t; +unsigned long old_pcid = cr3_pcid(read_cr3()); /* This non-reentrant function is sometimes called in interrupt context. */ local_irq_save(flags); @@ -103,14 +105,34 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4) old_cr4 = read_cr4(); if ( old_cr4 & X86_CR4_PGE ) { +/* + * X86_CR4_PGE set means PCID is inactive. + * We have to purge the TLB via flipping cr4.pge. + */ old_cr4 = cr4 & ~X86_CR4_PGE; write_cr4(old_cr4); } +else if ( use_invpcid ) +/* + * Flushing the TLB via INVPCID is necessary only in case PCIDs are + * in use, which is true only with INVPCID being available. + * Without PCID usage the following write_cr3() will purge the TLB + * (we are in the cr4.pge off path) of all entries. + * Using invpcid_flush_all_nonglobals() seems to be faster than + * invpcid_flush_all(), so use that. + */ +invpcid_flush_all_nonglobals(); write_cr3(cr3); if ( old_cr4 != cr4 ) write_cr4(cr4); +else if ( old_pcid != cr3_pcid(cr3) ) +/* + * Make sure
[Xen-devel] [PATCH v8 0/9] xen/x86: various XPTI speedups
This patch series aims at reducing the overhead of the XPTI Meltdown mitigation. Patch 1 had been posted before, the main changes in this patch are due to addressing Jan's comments on my first version. The main objective of that patch is to avoid copying the L4 page table each time the guest is being activated, as often the contents didn't change while the hypervisor was active. Patch 2 adds a new helper for writing cr3 instead of open coding the inline assembly in multiple places. Patch 3 sets the stage for being able to activate XPTI per domain. As a first step it is now possible to switch XPTI off for dom0 via the xpti boot parameter. Patch 4 adds support for using the INVPCID instruction for flushing the TLB. Patch 5 reduces the costs of TLB flushes even further: as we don't make any use of global TLB entries with XPTI being active we can avoid removing all global TLB entries on TLB flushes by simply deactivating the global pages in CR4. Patch 6 prepares using PCIDs in patch 6. For that purpose it was necessary to allow CR3 values with bit 63 set in order to avoid flushing TLB entries when writing CR3. This requires a modification of Jan's rather clever state machine with positive and negative CR3 values for the hypervisor by using a dedicated flag byte instead. Patch 7 converts pv_guest_cr4_to_real_cr4() from a macro to a function as it was becoming more and more complex. Patch 8 adds some PCID helper functions for accessing the different parts of cr3 (address and pcid part). Patch 9 is the main performance contributor: by making use of the PCID feature (if available) TLB entries can survive CR3 switches. The TLB needs to be flushed on context switches only and not when switching between guest and hypervisor or guest kernel and user mode. On my machine (Intel i7-4600M) using the PCID feature in the non-XPTI case showed a slightly worse performance than using global pages instead (using PCID and global pages is a bad idea as invalidating global pages in this case would need a complete TLB flush). For this reason I've decided to use PCID for XPTI only as the default. That can easily be changed by using the command line parameter "pcid=true". The complete series has been verified to still mitigate against Meltdown attacks. A simple performance test (make -j 4 in the Xen hypervisor directory) showed significant improvements compared to the state without this series. Numbers are seconds, stddev in braces. xpti=false elapsed system user unpatched: 88.42 ( 2.01) 94.49 ( 1.38) 180.40 ( 1.41) patched : 89.45 ( 3.10) 96.47 ( 3.22) 181.34 ( 1.98) xpti=true elapsed system user unpatched: 113.43 ( 3.68) 165.44 ( 4.41) 183.30 ( 1.72) patched : 92.76 ( 2.11) 103.39 ( 1.13) 184.86 ( 0.12) Changes since last version: - patch 1: set root_pgt_changed flag on other cpus, too, when changing a shadow L4 entry - patch 3: shadow code needs to check xpti flag now due to change in patch 1 Juergen Gross (9): x86/xpti: avoid copying L4 page table contents when possible xen/x86: add a function for modifying cr3 xen/x86: support per-domain flag for xpti xen/x86: use invpcid for flushing the TLB xen/x86: disable global pages for domains with XPTI active xen/x86: use flag byte for decision whether xen_cr3 is valid xen/x86: convert pv_guest_cr4_to_real_cr4() to a function xen/x86: add some cr3 helpers xen/x86: use PCID feature docs/misc/xen-command-line.markdown | 37 +++- xen/arch/x86/cpu/mtrr/generic.c | 37 xen/arch/x86/debug.c| 2 +- xen/arch/x86/domain.c | 6 +- xen/arch/x86/domain_page.c | 2 +- xen/arch/x86/flushtlb.c | 111 ++-- xen/arch/x86/mm.c | 86 +++- xen/arch/x86/mm/shadow/multi.c | 10 xen/arch/x86/pv/dom0_build.c| 8 ++- xen/arch/x86/pv/domain.c| 89 - xen/arch/x86/setup.c| 27 +++-- xen/arch/x86/smp.c | 2 +- xen/arch/x86/smpboot.c | 6 +- xen/arch/x86/spec_ctrl.c| 70 +++ xen/arch/x86/x86_64/asm-offsets.c | 2 + xen/arch/x86/x86_64/compat/entry.S | 5 +- xen/arch/x86/x86_64/entry.S | 78 +++-- xen/common/efi/runtime.c| 4 +- xen/include/asm-x86/current.h | 23 ++-- xen/include/asm-x86/domain.h| 17 +++--- xen/include/asm-x86/flushtlb.h | 7 ++- xen/include/asm-x86/invpcid.h | 2 + xen/include/asm-x86/processor.h | 18 ++ xen/include/asm-x86/pv/domain.h | 31 ++ xen/include/asm-x86/spec_ctrl.h | 4 ++ xen/include/asm-x86/x86-defns.h | 4 +- 26 files changed, 543 insertions(+), 145 deletions(-) -- 2.13.6 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org
[Xen-devel] [PATCH v8 4/9] xen/x86: use invpcid for flushing the TLB
If possible use the INVPCID instruction for flushing the TLB instead of toggling cr4.pge for that purpose. While at it remove the dependency on cr4.pge being required for mtrr loading, as this will be required later anyway. Add a command line option "invpcid" for controlling the use of INVPCID (default to true). Signed-off-by: Juergen GrossReviewed-by: Jan Beulich --- V6: - reword invpcid parameter description (Andrew Cooper) - add __read_mostly to use_invpcid definition (Andrew Cooper) V5: - use pre_flush() as an initializer in do_tlb_flush() (Jan Beulich) - introduce boolean use_invpcid instead of clearing X86_FEATURE_INVPCID (Jan Beulich) V4: - option "invpcid" instead of "noinvpcid" (Jan Beulich) V3: - new patch --- docs/misc/xen-command-line.markdown | 9 + xen/arch/x86/cpu/mtrr/generic.c | 37 ++--- xen/arch/x86/flushtlb.c | 29 +++-- xen/arch/x86/setup.c| 8 xen/include/asm-x86/invpcid.h | 2 ++ 5 files changed, 64 insertions(+), 21 deletions(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index d4f758487a..451a4fa566 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -1380,6 +1380,15 @@ Because responsibility for APIC setup is shared between Xen and the domain 0 kernel this option is automatically propagated to the domain 0 command line. +### invpcid (x86) +> `= ` + +> Default: `true` + +By default, Xen will use the INVPCID instruction for TLB management if +it is available. This option can be used to cause Xen to fall back to +older mechanisms, which are generally slower. + ### noirqbalance > `= ` diff --git a/xen/arch/x86/cpu/mtrr/generic.c b/xen/arch/x86/cpu/mtrr/generic.c index e9c0e5e059..7ba0c3f0fe 100644 --- a/xen/arch/x86/cpu/mtrr/generic.c +++ b/xen/arch/x86/cpu/mtrr/generic.c @@ -5,6 +5,7 @@ #include #include #include +#include #include #include #include @@ -400,8 +401,10 @@ static DEFINE_SPINLOCK(set_atomicity_lock); * has been called. */ -static void prepare_set(void) +static bool prepare_set(void) { + unsigned long cr4; + /* Note that this is not ideal, since the cache is only flushed/disabled for this CPU while the MTRRs are changed, but changing this requires more invasive changes to the way the kernel boots */ @@ -412,18 +415,24 @@ static void prepare_set(void) write_cr0(read_cr0() | X86_CR0_CD); wbinvd(); - /* TLB flushing here relies on Xen always using CR4.PGE. */ - BUILD_BUG_ON(!(XEN_MINIMAL_CR4 & X86_CR4_PGE)); - write_cr4(read_cr4() & ~X86_CR4_PGE); + cr4 = read_cr4(); + if (cr4 & X86_CR4_PGE) + write_cr4(cr4 & ~X86_CR4_PGE); + else if (use_invpcid) + invpcid_flush_all(); + else + write_cr3(read_cr3()); /* Save MTRR state */ rdmsrl(MSR_MTRRdefType, deftype); /* Disable MTRRs, and set the default type to uncached */ mtrr_wrmsr(MSR_MTRRdefType, deftype & ~0xcff); + + return cr4 & X86_CR4_PGE; } -static void post_set(void) +static void post_set(bool pge) { /* Intel (P6) standard MTRRs */ mtrr_wrmsr(MSR_MTRRdefType, deftype); @@ -432,7 +441,12 @@ static void post_set(void) write_cr0(read_cr0() & ~X86_CR0_CD); /* Reenable CR4.PGE (also flushes the TLB) */ - write_cr4(read_cr4() | X86_CR4_PGE); + if (pge) + write_cr4(read_cr4() | X86_CR4_PGE); + else if (use_invpcid) + invpcid_flush_all(); + else + write_cr3(read_cr3()); spin_unlock(_atomicity_lock); } @@ -441,14 +455,15 @@ static void generic_set_all(void) { unsigned long mask, count; unsigned long flags; + bool pge; local_irq_save(flags); - prepare_set(); + pge = prepare_set(); /* Actually set the state */ mask = set_mtrr_state(); - post_set(); + post_set(pge); local_irq_restore(flags); /* Use the atomic bitops to update the global mask */ @@ -457,7 +472,6 @@ static void generic_set_all(void) set_bit(count, _changes_mask); mask >>= 1; } - } static void generic_set_mtrr(unsigned int reg, unsigned long base, @@ -474,11 +488,12 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base, { unsigned long flags; struct mtrr_var_range *vr; + bool pge; vr = _state.var_ranges[reg]; local_irq_save(flags); - prepare_set(); + pge = prepare_set(); if (size == 0) { /* The invalid bit is kept in the mask, so we simply clear the @@ -499,7 +514,7 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base,
[Xen-devel] [PATCH v8 7/9] xen/x86: convert pv_guest_cr4_to_real_cr4() to a function
pv_guest_cr4_to_real_cr4() is becoming more and more complex. Convert it from a macro to an ordinary function. Signed-off-by: Juergen GrossReviewed-by: Jan Beulich --- V6: - new patch, split off from (old) patch 7 (Andrew Cooper) --- xen/arch/x86/mm.c| 14 ++ xen/include/asm-x86/domain.h | 11 ++- 2 files changed, 16 insertions(+), 9 deletions(-) diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 4997047edf..6aa0c34bae 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -500,6 +500,20 @@ void make_cr3(struct vcpu *v, mfn_t mfn) v->arch.cr3 = mfn_x(mfn) << PAGE_SHIFT; } +unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu *v) +{ +const struct domain *d = v->domain; +unsigned long cr4; + +cr4 = v->arch.pv_vcpu.ctrlreg[4] & ~X86_CR4_DE; +cr4 |= mmu_cr4_features & (X86_CR4_PSE | X86_CR4_SMEP | X86_CR4_SMAP | + X86_CR4_OSXSAVE | X86_CR4_FSGSBASE); +cr4 |= d->arch.pv_domain.xpti ? 0 : X86_CR4_PGE; +cr4 |= d->arch.vtsc ? X86_CR4_TSD : 0; + +return cr4; +} + void write_ptbase(struct vcpu *v) { struct cpu_info *cpu_info = get_cpu_info(); diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index b7894dc8c8..9627058cd0 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -615,15 +615,8 @@ void vcpu_show_registers(const struct vcpu *); unsigned long pv_guest_cr4_fixup(const struct vcpu *, unsigned long guest_cr4); /* Convert between guest-visible and real CR4 values. */ -#define pv_guest_cr4_to_real_cr4(v) \ -(((v)->arch.pv_vcpu.ctrlreg[4] \ - | (mmu_cr4_features \ - & (X86_CR4_PSE | X86_CR4_SMEP |\ -X86_CR4_SMAP | X86_CR4_OSXSAVE |\ -X86_CR4_FSGSBASE)) \ - | ((v)->domain->arch.pv_domain.xpti ? 0 : X86_CR4_PGE) \ - | ((v)->domain->arch.vtsc ? X86_CR4_TSD : 0)) \ - & ~X86_CR4_DE) +unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu *v); + #define real_cr4_to_pv_guest_cr4(c) \ ((c) & ~(X86_CR4_PGE | X86_CR4_PSE | X86_CR4_TSD | \ X86_CR4_OSXSAVE | X86_CR4_SMEP | \ -- 2.13.6 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v8 8/9] xen/x86: add some cr3 helpers
Add some helper macros to access the address and pcid parts of cr3. Use those helpers where appropriate. Signed-off-by: Juergen GrossReviewed-by: Jan Beulich --- V6: - new patch (Andrew Cooper) --- xen/arch/x86/debug.c| 2 +- xen/arch/x86/domain_page.c | 2 +- xen/include/asm-x86/processor.h | 10 ++ xen/include/asm-x86/x86-defns.h | 4 +++- 4 files changed, 15 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/debug.c b/xen/arch/x86/debug.c index 9159f32db4..a500df01ac 100644 --- a/xen/arch/x86/debug.c +++ b/xen/arch/x86/debug.c @@ -98,7 +98,7 @@ dbg_pv_va2mfn(dbgva_t vaddr, struct domain *dp, uint64_t pgd3val) l2_pgentry_t l2e, *l2t; l1_pgentry_t l1e, *l1t; unsigned long cr3 = (pgd3val ? pgd3val : dp->vcpu[0]->arch.cr3); -mfn_t mfn = maddr_to_mfn(cr3); +mfn_t mfn = maddr_to_mfn(cr3_pa(cr3)); DBGP2("vaddr:%lx domid:%d cr3:%lx pgd3:%lx\n", vaddr, dp->domain_id, cr3, pgd3val); diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c index 11b6a5421a..0c24530ed9 100644 --- a/xen/arch/x86/domain_page.c +++ b/xen/arch/x86/domain_page.c @@ -51,7 +51,7 @@ static inline struct vcpu *mapcache_current_vcpu(void) if ( (v = idle_vcpu[smp_processor_id()]) == current ) sync_local_execstate(); /* We must now be running on the idle page table. */ -ASSERT(read_cr3() == __pa(idle_pg_table)); +ASSERT(cr3_pa(read_cr3()) == __pa(idle_pg_table)); } return v; diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h index 71d32c0333..36628459dc 100644 --- a/xen/include/asm-x86/processor.h +++ b/xen/include/asm-x86/processor.h @@ -288,6 +288,16 @@ static inline void write_cr3(unsigned long val) asm volatile ( "mov %0, %%cr3" : : "r" (val) : "memory" ); } +static inline unsigned long cr3_pa(unsigned long cr3) +{ +return cr3 & X86_CR3_ADDR_MASK; +} + +static inline unsigned long cr3_pcid(unsigned long cr3) +{ +return cr3 & X86_CR3_PCID_MASK; +} + static inline unsigned long read_cr4(void) { return get_cpu_info()->cr4; diff --git a/xen/include/asm-x86/x86-defns.h b/xen/include/asm-x86/x86-defns.h index ff8d66be3c..904041e1ab 100644 --- a/xen/include/asm-x86/x86-defns.h +++ b/xen/include/asm-x86/x86-defns.h @@ -45,7 +45,9 @@ /* * Intel CPU flags in CR3 */ -#define X86_CR3_NOFLUSH (_AC(1, ULL) << 63) +#define X86_CR3_NOFLUSH(_AC(1, ULL) << 63) +#define X86_CR3_ADDR_MASK (PAGE_MASK & PADDR_MASK) +#define X86_CR3_PCID_MASK _AC(0x0fff, ULL) /* Mask for PCID */ /* * Intel CPU features in CR4 -- 2.13.6 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v8 5/9] xen/x86: disable global pages for domains with XPTI active
Instead of flushing the TLB from global pages when switching address spaces with XPTI being active just disable global pages via %cr4 completely when a domain subject to XPTI is active. This avoids the need for extra TLB flushes as loading %cr3 will remove all TLB entries. In order to avoid states with cr3/cr4 having inconsistent values (e.g. global pages being activated while cr3 already specifies a XPTI address space) move loading of the new cr4 value to write_ptbase() (actually to switch_cr3_cr4() called by write_ptbase()). This requires to use switch_cr3_cr4() instead of write_ptbase() when building dom0 in order to avoid setting cr4 with cr4.smap set. Signed-off-by: Juergen GrossReviewed-by: Jan Beulich --- V7: - use switch_cr3_cr4() in dom0_build.c V6: - don't call read_cr4() multiple times in switch_cr3_cr4() (Andrew Cooper) V4: - don't use mmu_cr4_features for setting new cr4 value (Jan Beulich) - use simpler scheme for setting X86_CR4_PGE in pv_guest_cr4_to_real_cr4() (Jan Beulich) V3: - move cr4 loading for all domains from *_ctxt_switch_to() to write_cr3_cr4() called by write_ptbase() (Jan Beulich) - rebase --- xen/arch/x86/domain.c | 5 - xen/arch/x86/flushtlb.c| 17 - xen/arch/x86/mm.c | 14 +++--- xen/arch/x86/pv/dom0_build.c | 6 +++--- xen/arch/x86/x86_64/entry.S| 10 -- xen/common/efi/runtime.c | 4 ++-- xen/include/asm-x86/domain.h | 3 ++- xen/include/asm-x86/flushtlb.h | 2 +- 8 files changed, 31 insertions(+), 30 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 3d9c19d055..9b001a03ec 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1523,17 +1523,12 @@ void paravirt_ctxt_switch_from(struct vcpu *v) void paravirt_ctxt_switch_to(struct vcpu *v) { root_pgentry_t *root_pgt = this_cpu(root_pgt); -unsigned long cr4; if ( root_pgt ) root_pgt[root_table_offset(PERDOMAIN_VIRT_START)] = l4e_from_page(v->domain->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW); -cr4 = pv_guest_cr4_to_real_cr4(v); -if ( unlikely(cr4 != read_cr4()) ) -write_cr4(cr4); - if ( unlikely(v->arch.debugreg[7] & DR7_ACTIVE_MASK) ) activate_debugregs(v); diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c index 717f3bd19b..d48b900100 100644 --- a/xen/arch/x86/flushtlb.c +++ b/xen/arch/x86/flushtlb.c @@ -90,20 +90,27 @@ static void do_tlb_flush(void) post_flush(t); } -void switch_cr3(unsigned long cr3) +void switch_cr3_cr4(unsigned long cr3, unsigned long cr4) { -unsigned long flags, cr4; +unsigned long flags, old_cr4; u32 t; /* This non-reentrant function is sometimes called in interrupt context. */ local_irq_save(flags); t = pre_flush(); -cr4 = read_cr4(); -write_cr4(cr4 & ~X86_CR4_PGE); +old_cr4 = read_cr4(); +if ( old_cr4 & X86_CR4_PGE ) +{ +old_cr4 = cr4 & ~X86_CR4_PGE; +write_cr4(old_cr4); +} + write_cr3(cr3); -write_cr4(cr4); + +if ( old_cr4 != cr4 ) +write_cr4(cr4); post_flush(t); diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 9c36614099..73a38e8715 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -503,20 +503,28 @@ void make_cr3(struct vcpu *v, mfn_t mfn) void write_ptbase(struct vcpu *v) { struct cpu_info *cpu_info = get_cpu_info(); +unsigned long new_cr4; + +new_cr4 = (is_pv_vcpu(v) && !is_idle_vcpu(v)) + ? pv_guest_cr4_to_real_cr4(v) + : ((read_cr4() & ~X86_CR4_TSD) | X86_CR4_PGE); if ( is_pv_vcpu(v) && v->domain->arch.pv_domain.xpti ) { cpu_info->root_pgt_changed = true; cpu_info->pv_cr3 = __pa(this_cpu(root_pgt)); -switch_cr3(v->arch.cr3); +switch_cr3_cr4(v->arch.cr3, new_cr4); } else { -/* Make sure to clear xen_cr3 before pv_cr3; switch_cr3() serializes. */ +/* Make sure to clear xen_cr3 before pv_cr3. */ cpu_info->xen_cr3 = 0; -switch_cr3(v->arch.cr3); +/* switch_cr3_cr4() serializes. */ +switch_cr3_cr4(v->arch.cr3, new_cr4); cpu_info->pv_cr3 = 0; } + +ASSERT(is_pv_vcpu(v) || read_cr4() == mmu_cr4_features); } /* diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c index d148395919..4465a059a8 100644 --- a/xen/arch/x86/pv/dom0_build.c +++ b/xen/arch/x86/pv/dom0_build.c @@ -717,7 +717,7 @@ int __init dom0_construct_pv(struct domain *d, update_cr3(v); /* We run on dom0's page tables for the final part of the build process. */ -write_ptbase(v); +switch_cr3_cr4(v->arch.cr3, read_cr4()); mapcache_override_current(v); /* Copy the OS image and free temporary buffer. */ @@ -738,7 +738,7 @@ int __init dom0_construct_pv(struct domain *d, (parms.virt_hypercall >=
[Xen-devel] [PATCH v8 1/9] x86/xpti: avoid copying L4 page table contents when possible
For mitigation of Meltdown the current L4 page table is copied to the cpu local root page table each time a 64 bit pv guest is entered. Copying can be avoided in cases where the guest L4 page table hasn't been modified while running the hypervisor, e.g. when handling interrupts or any hypercall not modifying the L4 page table or %cr3. So add a per-cpu flag indicating whether the copying should be performed and set that flag only when loading a new %cr3 or modifying the L4 page table. This includes synchronization of the cpu local root page table with other cpus, so add a special synchronization flag for that case. A simple performance check (compiling the hypervisor via "make -j 4") in dom0 with 4 vcpus shows a significant improvement: - real time drops from 112 seconds to 103 seconds - system time drops from 142 seconds to 131 seconds Signed-off-by: Juergen GrossReviewed-by: Jan Beulich --- V8: - more shadow code: set flag on all affected cpus (Tim Deegan) V7: - add missing flag setting in shadow code V6: - correct an error from rebasing to staging in assembly part V4: - move setting of root_pgt_changed flag in flush_area_local() out of irq disabled section (Jan Beulich) - move setting of root_pgt_changed in make_cr3() to _toggle_guest_pt() (Jan Beulich) - remove most conditionals in write_ptbase() (Jan Beulich) - don't set root_pgt_changed in do_mmu_update() for modification of the user page table (Jan Beulich) V3: - set flag locally only if affected L4 is active (Jan Beulich) - add setting flag to flush_area_mask() (Jan Beulich) - set flag in make_cr3() only if called for current active vcpu To be applied on top of Jan's "Meltdown band-aid overhead reduction" series --- xen/arch/x86/flushtlb.c | 16 xen/arch/x86/mm.c | 36 +++- xen/arch/x86/mm/shadow/multi.c| 10 ++ xen/arch/x86/pv/domain.c | 2 ++ xen/arch/x86/smp.c| 2 +- xen/arch/x86/x86_64/asm-offsets.c | 1 + xen/arch/x86/x86_64/entry.S | 9 +++-- xen/include/asm-x86/current.h | 8 xen/include/asm-x86/flushtlb.h| 5 + 9 files changed, 73 insertions(+), 16 deletions(-) diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c index 8a7a76b8ff..ad19ba0f5d 100644 --- a/xen/arch/x86/flushtlb.c +++ b/xen/arch/x86/flushtlb.c @@ -8,6 +8,7 @@ */ #include +#include #include #include #include @@ -160,5 +161,20 @@ unsigned int flush_area_local(const void *va, unsigned int flags) local_irq_restore(irqfl); +if ( flags & FLUSH_ROOT_PGTBL ) +get_cpu_info()->root_pgt_changed = true; + return flags; } + +void flush_root_pgt_mask(cpumask_t *mask) +{ +int cpu; +struct cpu_info *cpu_info; + +for_each_cpu(cpu, mask) +{ +cpu_info = (struct cpu_info *)(stack_base[cpu] + STACK_SIZE) - 1; +cpu_info->root_pgt_changed = true; +} +} diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 9fe5583fc3..7d960c742e 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -502,6 +502,7 @@ void make_cr3(struct vcpu *v, mfn_t mfn) void write_ptbase(struct vcpu *v) { +get_cpu_info()->root_pgt_changed = true; write_cr3(v->arch.cr3); } @@ -3699,18 +3700,27 @@ long do_mmu_update( break; rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn, cmd == MMU_PT_UPDATE_PRESERVE_AD, v); -/* - * No need to sync if all uses of the page can be accounted - * to the page lock we hold, its pinned status, and uses on - * this (v)CPU. - */ -if ( !rc && !cpu_has_no_xpti && - ((page->u.inuse.type_info & PGT_count_mask) > - (1 + !!(page->u.inuse.type_info & PGT_pinned) + - (pagetable_get_pfn(curr->arch.guest_table) == mfn) + - (pagetable_get_pfn(curr->arch.guest_table_user) == -mfn))) ) -sync_guest = true; +if ( !rc && !cpu_has_no_xpti ) +{ +bool local_in_use = false; + +if ( pagetable_get_pfn(curr->arch.guest_table) == mfn ) +{ +local_in_use = true; +get_cpu_info()->root_pgt_changed = true; +} + +/* + * No need to sync if all uses of the page can be + * accounted to the page lock we hold, its pinned + * status, and uses on this (v)CPU. + */ +if ( (page->u.inuse.type_info & PGT_count_mask) > + (1 +
Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view
On 04/17/2018 04:53 PM, George Dunlap wrote: > On 04/17/2018 11:50 AM, Razvan Cojocaru wrote: >> void p2m_init_altp2m_ept(struct domain *d, unsigned int i) >> { >> struct p2m_domain *p2m = d->arch.altp2m_p2m[i]; >> +struct p2m_domain *hostp2m = p2m_get_hostp2m(d); >> struct ept_data *ept; >> >> +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn; >> +p2m->default_access = hostp2m->default_access; >> +p2m->domain = hostp2m->domain; >> +p2m->logdirty_ranges = hostp2m->logdirty_ranges; >> +p2m->global_logdirty = hostp2m->global_logdirty; > > This would certainly be one approach. But then we'd need to keep a lot > more of these things in sync -- for instance, we'd have to have similar > code to enable and disable global_logdirty on all active altp2m entries. > > [...] > > The other thing that seems to be missing from synchronization is that in > p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being > part of the eptp). The code seems to indicate that this is required for > PML (hardware-assisted logdirty), but I don't see anywhere this is set > or copied from the host p2m. > > It might be nice to have a more structured way of keeping all these > changes in sync, rather than relying on this open-coding everywhere. For logdirty_ranges and global_logdirty, I propose that we put them both in a dynamically allocated struct, and have all p2ms share a pointer to them. That way, all that's required is for the pointer to be set up in p2m_init_altp2m_ept() and the actual data will be automatically shared between p2ms. If I've read the code correctly, the hostp2m is the last to be destroyed and the first to be initialized, so it should work well as long as all p2ms synchronize access to logdirty_ranges and global_logdirty (which I assume they already do). ept.ad probably needs to be propagated on each change and copied from the hostp2m on altp2m init. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
>>> On 06.12.17 at 08:50,wrote: > One 4K-byte page at most contains 128 'ioreq_t'. In order to remove the vcpu > number constraint imposed by one IOREQ page, bump the number of IOREQ page to > 4 pages. With this patch, multiple pages can be used as IOREQ page. In case I didn't say so before - I'm opposed to simply changing the upper limit here. Please make sure any number of vCPU-s can be supported by the new code (it looks like it mostly does already, so it may be mainly the description which needs changing). If we want to impose an upper limit, this shouldn't affect the ioreq page handling code at all. > --- a/xen/arch/x86/hvm/ioreq.c > +++ b/xen/arch/x86/hvm/ioreq.c > @@ -64,14 +64,24 @@ static struct hvm_ioreq_server *get_ioreq_server(const > struct domain *d, > continue; \ > else > > +/* Iterate over all ioreq pages */ > +#define FOR_EACH_IOREQ_PAGE(s, i, iorp) \ > +for ( (i) = 0, iorp = s->ioreq; (i) < (s)->ioreq_page_nr; (i)++, iorp++ ) You're going too far with parenthesization here: Just like iorp, i is required to be a simple identifier anyway. Otoh you parenthesize s only once. > +static ioreq_t *get_ioreq_fallible(struct hvm_ioreq_server *s, struct vcpu > *v) What is "fallible"? And why such a separate wrapper anyway? But I guess a lot of this will need to change anyway with Paul's recent changes. I'm therefore not going to look in close detail at any of this. > static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s) > { > -hvm_unmap_ioreq_gfn(s, >ioreq); > +int i; At the example of this - unsigned int please in all cases where the value can't go negative. > @@ -688,8 +741,15 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server > *s, > INIT_LIST_HEAD(>ioreq_vcpu_list); > spin_lock_init(>bufioreq_lock); > > -s->ioreq.gfn = INVALID_GFN; > +FOR_EACH_IOREQ_PAGE(s, i, iorp) > +iorp->gfn = INVALID_GFN; > s->bufioreq.gfn = INVALID_GFN; > +s->ioreq_page_nr = (d->max_vcpus + IOREQ_NUM_PER_PAGE - 1) / > + IOREQ_NUM_PER_PAGE; DIV_ROUND_UP() - please don't open-code things. > --- a/xen/include/public/hvm/params.h > +++ b/xen/include/public/hvm/params.h > @@ -279,6 +279,12 @@ > #define XEN_HVM_MCA_CAP_LMCE (xen_mk_ullong(1) << 0) > #define XEN_HVM_MCA_CAP_MASK XEN_HVM_MCA_CAP_LMCE > > -#define HVM_NR_PARAMS 39 > +/* > + * Number of pages that are reserved for default IOREQ server. The base PFN > + * is set via HVM_PARAM_IOREQ_PFN. > + */ > +#define HVM_PARAM_IOREQ_PAGES 39 Why is this needed? It can be derived from the vCPU count permitted for the domain. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [distros-debian-squeeze test] 74632: tolerable FAIL
flight 74632 distros-debian-squeeze real [real] http://osstest.xs.citrite.net/~osstest/testlogs/logs/74632/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-i386-squeeze-netboot-pygrub 10 debian-di-install fail like 74577 test-amd64-i386-i386-squeeze-netboot-pygrub 10 debian-di-install fail like 74577 test-amd64-amd64-amd64-squeeze-netboot-pygrub 10 debian-di-install fail like 74577 test-amd64-i386-amd64-squeeze-netboot-pygrub 10 debian-di-install fail like 74577 baseline version: flight 74577 jobs: build-amd64 pass build-armhf pass build-i386 pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-amd64-squeeze-netboot-pygrubfail test-amd64-i386-amd64-squeeze-netboot-pygrub fail test-amd64-amd64-i386-squeeze-netboot-pygrub fail test-amd64-i386-i386-squeeze-netboot-pygrub fail sg-report-flight on osstest.xs.citrite.net logs: /home/osstest/logs images: /home/osstest/images Logs, config files, etc. are available at http://osstest.xs.citrite.net/~osstest/testlogs/logs Test harness code can be found at http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary Push not applicable. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
I wonder why I am starting to get CCed on Xen patches all of a sudden. I happened to run into Jürgen at a conference only last weekend, but I still don't know anything whatsoever about Xen or how it works. If get_maintainer.pl has started to return my name on this stuff I really want to know why :/ Yours, Linus Walleij ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Fwd: [tip:x86/urgent] x86, sched: Allow topologies where NUMA nodes share an LLC
On 4/17/18 6:02 PM, Dario Faggioli wrote: On Tue, 2018-04-17 at 15:08 +0100, Andrew Cooper wrote: On 17/04/18 15:02, Juergen Gross wrote: Is this something we should be aware of in Xen, too? If we had something close to a working topology representation, probably. True, as far as letting the guest know about these details (in appropriate circumnstaces) goes. About using it _within_ Xen, well: "Intel's Skylake Server CPUs have a different LLC topology than previous generations. When in Sub-NUMA-Clustering (SNC) mode, the package is divided into two "slices", each containing half the cores, half the LLC, and one memory controller and each slice is enumerated to Linux as a NUMA node. This is similar to how the cores and LLC were arranged for the Cluster-On-Die (CoD) feature." This looks similar (but not identical) to, e.g., AMD EPYC. If Xen also sees each slice as a NUMA node as well, we already are doing something (although, of course, we can always improve). If not, I see ways of taking advantage of the information that not all the cores in the socket equally share the LLC, e.g., in Credit2, by providing a per-LLC runqueue option and/or by taking that into account in the 'migration resistance' mechanisms (there is already work in that direction). Yeah, per-LLC runqueue option sounds like a good option. We can look into that. Yes, I am working on dynamically definining migration resistance as a factor of cache and cpu topology ie. instead of having a constant value of migration resistance it will consider the placement of cpus, cache and sharing of cache among cpus. Regards, Dario Anshul ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver
On 04/18/2018 10:35 AM, Roger Pau Monné wrote: On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko wrote: On 04/17/2018 11:57 PM, Dongwon Kim wrote: On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote: On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote: 3.2 Backend exports dma-buf to xen-front In this case Dom0 pages are shared with DomU. As before, DomU can only write to these pages, not any other page from Dom0, so it can be still considered safe. But, the following must be considered (highlighted in xen-front's Kernel documentation): - If guest domain dies then pages/grants received from the backend cannot be claimed back - think of it as memory lost to Dom0 (won't be used for any other guest) - Misbehaving guest may send too many requests to the backend exhausting its grant references and memory (consider this from security POV). As the backend runs in the trusted domain we also assume that it is trusted as well, e.g. must take measures to prevent DDoS attacks. I cannot parse the above sentence: "As the backend runs in the trusted domain we also assume that it is trusted as well, e.g. must take measures to prevent DDoS attacks." What's the relation between being trusted and protecting from DoS attacks? I mean that we trust the backend that it can prevent Dom0 from crashing in case DomU's frontend misbehaves, e.g. if the frontend sends too many memory requests etc. In any case, all? PV protocols are implemented with the frontend sharing pages to the backend, and I think there's a reason why this model is used, and it should continue to be used. This is the first use-case above. But there are real-world use-cases (embedded in my case) when physically contiguous memory needs to be shared, one of the possible ways to achieve this is to share contiguous memory from Dom0 to DomU (the second use-case above) Having to add logic in the backend to prevent such attacks means that: - We need more code in the backend, which increases complexity and chances of bugs. - Such code/logic could be wrong, thus allowing DoS. You can live without this code at all, but this is then up to backend which may make Dom0 down because of DomU's frontend doing evil things 4. xen-front/backend/xen-zcopy synchronization 4.1. As I already said in 2) all the inter VM communication happens between xen-front and the backend, xen-zcopy is NOT involved in that. When xen-front wants to destroy a display buffer (dumb/dma-buf) it issues a XENDISPL_OP_DBUF_DESTROY command (opposite to XENDISPL_OP_DBUF_CREATE). This call is synchronous, so xen-front expects that backend does free the buffer pages on return. 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY: - closes all dumb handles/fd's of the buffer according to [3] - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-zcopy to make sure the buffer is freed (think of it as it waits for dma-buf->release callback) So this zcopy thing keeps some kind of track of the memory usage? Why can't the user-space backend keep track of the buffer usage? Because there is no dma-buf UAPI which allows to track the buffer life cycle (e.g. wait until dma-buf's .release callback is called) - replies to xen-front that the buffer can be destroyed. This way deletion of the buffer happens synchronously on both Dom0 and DomU sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns with time-out error (BTW, wait time is a parameter of this IOCTL), Xen will defer grant reference removal and will retry later until those are free. Hope this helps understand how buffers are synchronously deleted in case of xen-zcopy with a single protocol command. I think the above logic can also be re-used by the hyper-dmabuf driver with some additional work: 1. xen-zcopy can be split into 2 parts and extend: 1.1. Xen gntdev driver [4], [5] to allow creating dma-buf from grefs and vise versa, I don't know much about the dma-buf implementation in Linux, but gntdev is a user-space device, and AFAICT user-space applications don't have any notion of dma buffers. How are such buffers useful for user-space? Why can't this just be called memory? A dma-buf is seen by user-space as a file descriptor and you can pass it to different drivers then. For example, you can share a buffer used by a display driver for scanout with a GPU, to compose a picture into it: 1. User-space (US) allocates a display buffer from display driver 2. US asks display driver to export the dma-buf which backs up that buffer, US gets buffer's fd: dma_buf_fd 3. US asks GPU driver to import a buffer and provides it with dma_buf_fd 4. GPU renders contents into display buffer (dma_buf_fd) Finally, this is indeed some memory, but a bit more [1] Also, (with my FreeBSD maintainer hat) how is this going to translate to other OSes? So far the operations performed by the gntdev device are mostly OS-agnostic because this just map/unmap memory, and in fact they are implemented
Re: [Xen-devel] [PATCH] x86/p2m: fixed p2m_change_type_range() start / end check
On 04/18/2018 10:48 AM, Razvan Cojocaru wrote: > On 04/18/2018 10:38 AM, Jan Beulich wrote: > On 17.04.18 at 19:16,wrote: >>> --- a/xen/arch/x86/mm/p2m.c >>> +++ b/xen/arch/x86/mm/p2m.c >>> @@ -976,6 +976,13 @@ void p2m_change_type_range(struct domain *d, >>> ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt)); >>> >>> p2m_lock(p2m); >>> + >>> +if ( start > p2m->max_mapped_pfn ) >>> +{ >>> +p2m_unlock(p2m); >>> +return; >>> +} >> >> I realize this is what George has suggested, but I still wonder if this is >> the >> right thing to do here: Why is this any more important to check than the >> more general start >= end (in which case the assertion in the rangeset >> code would also trigger)? Till now the function assumes "sensible" things >> to be passed in, but specifically also permitting the [start,~0UL] case >> (just in a more generalizer fashion). The problem you're trying to fix here >> is something passing in a nonsense range (starting above the valid range). >> Yet if we want the function to be immune to nonsense being passed in, I >> think start < end is what needs checking for, and that check would then >> perhaps better go after end was already adjusted. >> >> The obvious alternative is for callers to only pass in sane ranges (in which >> case adding ASSERT() here may be considered, instead of triggering the >> one in the rangeset code). >> >> In any event you want to add unlikely(), just like the similar end check has. > > FWIW I'll be happy to change the test to unlikely( start < end ), if > this is OK with George. I'll test the change now. Sorry, unlikely(end <= start), obviously. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/p2m: fixed p2m_change_type_range() start / end check
On 04/18/2018 10:38 AM, Jan Beulich wrote: On 17.04.18 at 19:16,wrote: >> --- a/xen/arch/x86/mm/p2m.c >> +++ b/xen/arch/x86/mm/p2m.c >> @@ -976,6 +976,13 @@ void p2m_change_type_range(struct domain *d, >> ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt)); >> >> p2m_lock(p2m); >> + >> +if ( start > p2m->max_mapped_pfn ) >> +{ >> +p2m_unlock(p2m); >> +return; >> +} > > I realize this is what George has suggested, but I still wonder if this is the > right thing to do here: Why is this any more important to check than the > more general start >= end (in which case the assertion in the rangeset > code would also trigger)? Till now the function assumes "sensible" things > to be passed in, but specifically also permitting the [start,~0UL] case > (just in a more generalizer fashion). The problem you're trying to fix here > is something passing in a nonsense range (starting above the valid range). > Yet if we want the function to be immune to nonsense being passed in, I > think start < end is what needs checking for, and that check would then > perhaps better go after end was already adjusted. > > The obvious alternative is for callers to only pass in sane ranges (in which > case adding ASSERT() here may be considered, instead of triggering the > one in the rangeset code). > > In any event you want to add unlikely(), just like the similar end check has. FWIW I'll be happy to change the test to unlikely( start < end ), if this is OK with George. I'll test the change now. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/p2m: fixed p2m_change_type_range() start / end check
>>> On 17.04.18 at 19:16,wrote: > --- a/xen/arch/x86/mm/p2m.c > +++ b/xen/arch/x86/mm/p2m.c > @@ -976,6 +976,13 @@ void p2m_change_type_range(struct domain *d, > ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt)); > > p2m_lock(p2m); > + > +if ( start > p2m->max_mapped_pfn ) > +{ > +p2m_unlock(p2m); > +return; > +} I realize this is what George has suggested, but I still wonder if this is the right thing to do here: Why is this any more important to check than the more general start >= end (in which case the assertion in the rangeset code would also trigger)? Till now the function assumes "sensible" things to be passed in, but specifically also permitting the [start,~0UL] case (just in a more generalizer fashion). The problem you're trying to fix here is something passing in a nonsense range (starting above the valid range). Yet if we want the function to be immune to nonsense being passed in, I think start < end is what needs checking for, and that check would then perhaps better go after end was already adjusted. The obvious alternative is for callers to only pass in sane ranges (in which case adding ASSERT() here may be considered, instead of triggering the one in the rangeset code). In any event you want to add unlikely(), just like the similar end check has. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver
On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko wrote: > On 04/17/2018 11:57 PM, Dongwon Kim wrote: > > On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote: > > > On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote: > 3.2 Backend exports dma-buf to xen-front > > In this case Dom0 pages are shared with DomU. As before, DomU can only write > to these pages, not any other page from Dom0, so it can be still considered > safe. > But, the following must be considered (highlighted in xen-front's Kernel > documentation): > - If guest domain dies then pages/grants received from the backend cannot > be claimed back - think of it as memory lost to Dom0 (won't be used for > any > other guest) > - Misbehaving guest may send too many requests to the backend exhausting > its grant references and memory (consider this from security POV). As the > backend runs in the trusted domain we also assume that it is trusted as > well, > e.g. must take measures to prevent DDoS attacks. I cannot parse the above sentence: "As the backend runs in the trusted domain we also assume that it is trusted as well, e.g. must take measures to prevent DDoS attacks." What's the relation between being trusted and protecting from DoS attacks? In any case, all? PV protocols are implemented with the frontend sharing pages to the backend, and I think there's a reason why this model is used, and it should continue to be used. Having to add logic in the backend to prevent such attacks means that: - We need more code in the backend, which increases complexity and chances of bugs. - Such code/logic could be wrong, thus allowing DoS. > 4. xen-front/backend/xen-zcopy synchronization > > 4.1. As I already said in 2) all the inter VM communication happens between > xen-front and the backend, xen-zcopy is NOT involved in that. > When xen-front wants to destroy a display buffer (dumb/dma-buf) it issues a > XENDISPL_OP_DBUF_DESTROY command (opposite to XENDISPL_OP_DBUF_CREATE). > This call is synchronous, so xen-front expects that backend does free the > buffer pages on return. > > 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY: > - closes all dumb handles/fd's of the buffer according to [3] > - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-zcopy to make > sure > the buffer is freed (think of it as it waits for dma-buf->release > callback) So this zcopy thing keeps some kind of track of the memory usage? Why can't the user-space backend keep track of the buffer usage? > - replies to xen-front that the buffer can be destroyed. > This way deletion of the buffer happens synchronously on both Dom0 and DomU > sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns with time-out > error > (BTW, wait time is a parameter of this IOCTL), Xen will defer grant > reference > removal and will retry later until those are free. > > Hope this helps understand how buffers are synchronously deleted in case > of xen-zcopy with a single protocol command. > > I think the above logic can also be re-used by the hyper-dmabuf driver with > some additional work: > > 1. xen-zcopy can be split into 2 parts and extend: > 1.1. Xen gntdev driver [4], [5] to allow creating dma-buf from grefs and > vise versa, I don't know much about the dma-buf implementation in Linux, but gntdev is a user-space device, and AFAICT user-space applications don't have any notion of dma buffers. How are such buffers useful for user-space? Why can't this just be called memory? Also, (with my FreeBSD maintainer hat) how is this going to translate to other OSes? So far the operations performed by the gntdev device are mostly OS-agnostic because this just map/unmap memory, and in fact they are implemented by Linux and FreeBSD. > implement "wait" ioctl (wait for dma-buf->release): currently these are > DRM_XEN_ZCOPY_DUMB_FROM_REFS, DRM_XEN_ZCOPY_DUMB_TO_REFS and > DRM_XEN_ZCOPY_DUMB_WAIT_FREE > 1.2. Xen balloon driver [6] to allow allocating contiguous buffers (not > needed > by current hyper-dmabuf, but is a must for xen-zcopy use-cases) I think this needs clarifying. In which memory space do you need those regions to be contiguous? Do they need to be contiguous in host physical memory, or guest physical memory? If it's in guest memory space, isn't there any generic interface that you can use? If it's in host physical memory space, why do you need this buffer to be contiguous in host physical memory space? The IOMMU should hide all this. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] drm/xen-front: Remove CMA support
On 04/17/2018 12:08 PM, Oleksandr Andrushchenko wrote: On 04/17/2018 12:04 PM, Daniel Vetter wrote: On Tue, Apr 17, 2018 at 10:40:12AM +0300, Oleksandr Andrushchenko wrote: From: Oleksandr AndrushchenkoEven if xen-front allocates its buffers from contiguous memory those are still not contiguous in PA space, e.g. the buffer is only contiguous in IPA space. The only use-case for this mode was if xen-front is used to allocate dumb buffers which later be used by some other driver requiring contiguous memory, but there is no currently such a use-case or it can be worked around with xen-front. Please also mention the nents confusion here, and the patch that fixes it. Or just outright take the commit message from my patch with all the details: ok, if you don't mind then I'll use your commit message entirely drm/xen: Dissable CMA support It turns out this was only needed to paper over a bug in the CMA helpers, which was addressed in commit 998fb1a0f478b83492220ff79583bf9ad538bdd8 Author: Liviu Dudau Date: Fri Nov 10 13:33:10 2017 + drm: gem_cma_helper.c: Allow importing of contiguous scatterlists with nents > 1 Without this the following pipeline didn't work: domU: 1. xen-front allocates a non-contig buffer 2. creates grants out of it dom0: 3. converts the grants into a dma-buf. Since they're non-contig, the scatter-list is huge. 4. imports it into rcar-du, which requires dma-contig memory for scanout. -> On this given platform there's an IOMMU, so in theory this should work. But in practice this failed, because of the huge number of sg entries, even though the IOMMU driver mapped it all into a dma-contig range. With a guest-contig buffer allocated in step 1, this problem doesn't exist. But there's technically no reason to require guest-contig memory for xen buffer sharing using grants. With the commit message improved: Acked-by: Daniel Vetter Thank you, I'll wait for a day and apply to drm-misc-next if this is ok applied to drm-misc-next Signed-off-by: Oleksandr Andrushchenko Suggested-by: Daniel Vetter --- Documentation/gpu/xen-front.rst | 12 drivers/gpu/drm/xen/Kconfig | 13 drivers/gpu/drm/xen/Makefile | 9 +-- drivers/gpu/drm/xen/xen_drm_front.c | 62 +++- drivers/gpu/drm/xen/xen_drm_front.h | 42 ++- drivers/gpu/drm/xen/xen_drm_front_gem.c | 12 +--- drivers/gpu/drm/xen/xen_drm_front_gem.h | 3 - drivers/gpu/drm/xen/xen_drm_front_gem_cma.c | 79 - drivers/gpu/drm/xen/xen_drm_front_shbuf.c | 22 -- drivers/gpu/drm/xen/xen_drm_front_shbuf.h | 8 --- 10 files changed, 21 insertions(+), 241 deletions(-) delete mode 100644 drivers/gpu/drm/xen/xen_drm_front_gem_cma.c diff --git a/Documentation/gpu/xen-front.rst b/Documentation/gpu/xen-front.rst index 009d942386c5..d988da7d1983 100644 --- a/Documentation/gpu/xen-front.rst +++ b/Documentation/gpu/xen-front.rst @@ -18,18 +18,6 @@ Buffers allocated by the frontend driver .. kernel-doc:: drivers/gpu/drm/xen/xen_drm_front.h :doc: Buffers allocated by the frontend driver -With GEM CMA helpers - - -.. kernel-doc:: drivers/gpu/drm/xen/xen_drm_front.h - :doc: With GEM CMA helpers - -Without GEM CMA helpers -~~~ - -.. kernel-doc:: drivers/gpu/drm/xen/xen_drm_front.h - :doc: Without GEM CMA helpers - Buffers allocated by the backend diff --git a/drivers/gpu/drm/xen/Kconfig b/drivers/gpu/drm/xen/Kconfig index 4f4abc91f3b6..4cca160782ab 100644 --- a/drivers/gpu/drm/xen/Kconfig +++ b/drivers/gpu/drm/xen/Kconfig @@ -15,16 +15,3 @@ config DRM_XEN_FRONTEND help Choose this option if you want to enable a para-virtualized frontend DRM/KMS driver for Xen guest OSes. - -config DRM_XEN_FRONTEND_CMA - bool "Use DRM CMA to allocate dumb buffers" - depends on DRM_XEN_FRONTEND - select DRM_KMS_CMA_HELPER - select DRM_GEM_CMA_HELPER - help - Use DRM CMA helpers to allocate display buffers. - This is useful for the use-cases when guest driver needs to - share or export buffers to other drivers which only expect - contiguous buffers. - Note: in this mode driver cannot use buffers allocated - by the backend. diff --git a/drivers/gpu/drm/xen/Makefile b/drivers/gpu/drm/xen/Makefile index 352730dc6c13..712afff5ffc3 100644 --- a/drivers/gpu/drm/xen/Makefile +++ b/drivers/gpu/drm/xen/Makefile @@ -5,12 +5,7 @@ drm_xen_front-objs := xen_drm_front.o \ xen_drm_front_conn.o \ xen_drm_front_evtchnl.o \
Re: [Xen-devel] [PATCH v7 1/9] x86/xpti: avoid copying L4 page table contents when possible
On 16/04/18 20:22, Juergen Gross wrote: > On 16/04/18 17:34, Tim Deegan wrote: >> Hi, >> >> At 02:43 -0600 on 13 Apr (1523587395), Jan Beulich wrote: >> On 12.04.18 at 20:09,wrote: For mitigation of Meltdown the current L4 page table is copied to the cpu local root page table each time a 64 bit pv guest is entered. Copying can be avoided in cases where the guest L4 page table hasn't been modified while running the hypervisor, e.g. when handling interrupts or any hypercall not modifying the L4 page table or %cr3. So add a per-cpu flag indicating whether the copying should be performed and set that flag only when loading a new %cr3 or modifying the L4 page table. This includes synchronization of the cpu local root page table with other cpus, so add a special synchronization flag for that case. A simple performance check (compiling the hypervisor via "make -j 4") in dom0 with 4 vcpus shows a significant improvement: - real time drops from 112 seconds to 103 seconds - system time drops from 142 seconds to 131 seconds Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich --- V7: - add missing flag setting in shadow code >>> >>> This now needs an ack from Tim (now Cc-ed). >> >> I may be misunderstanding how this flag is supposed to work, but this >> seems at first glance to do both too much and too little. The sl4 >> table that's being modified may be in use on any number of other >> pcpus, and might not be in use on the current pcpu. > > Okay. > >> It looks like the do_mmu_update path issues a flush IPI; I think that >> the equivalent IPI would be a better place to hook if you can. > > I want to catch only cases where the L4 table is being modified. > >> Also I'm not sure why the flag needs to be set in >> l4e_propagate_from_guest() as well as shadow_set_l4e(). Can you >> elaborate? > > I narrowed down iteratively where the setting of the flag is mandatory. > Doing it in shadow_set_l4e() seemed to look right, but it wasn't enough. > Setting it also in l4e_propagate_from_guest() showed no further problems > migrating the guest. I just revisited this after looking into the code. Seems I remembered the sequence of narrowing down wrong. l4e_propagate_from_guest() doesn't need to set the flag. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver
On 04/17/2018 11:57 PM, Dongwon Kim wrote: On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote: On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote: Yeah, I definitely agree on the idea of expanding the use case to the general domain where dmabuf sharing is used. However, what you are targetting with proposed changes is identical to the core design of hyper_dmabuf. On top of this basic functionalities, hyper_dmabuf has driver level inter-domain communication, that is needed for dma-buf remote tracking (no fence forwarding though), event triggering and event handling, extra meta data exchange and hyper_dmabuf_id that represents grefs (grefs are shared implicitly on driver level) This really isn't a positive design aspect of hyperdmabuf imo. The core code in xen-zcopy (ignoring the ioctl side, which will be cleaned up) is very simple & clean. If there's a clear need later on we can extend that. But for now xen-zcopy seems to cover the basic use-case needs, so gets the job done. Also it is designed with frontend (common core framework) + backend (hyper visor specific comm and memory sharing) structure for portability. We just can't limit this feature to Xen because we want to use the same uapis not only for Xen but also other applicable hypervisor, like ACORN. See the discussion around udmabuf and the needs for kvm. I think trying to make an ioctl/uapi that works for multiple hypervisors is misguided - it likely won't work. On top of that the 2nd hypervisor you're aiming to support is ACRN. That's not even upstream yet, nor have I seen any patches proposing to land linux support for ACRN. Since it's not upstream, it doesn't really matter for upstream consideration. I'm doubting that ACRN will use the same grant references as xen, so the same uapi won't work on ACRN as on Xen anyway. Yeah, ACRN doesn't have grant-table. Only Xen supports it. But that is why hyper_dmabuf has been architectured with the concept of backend. If you look at the structure of backend, you will find that backend is just a set of standard function calls as shown here: struct hyper_dmabuf_bknd_ops { /* backend initialization routine (optional) */ int (*init)(void); /* backend cleanup routine (optional) */ int (*cleanup)(void); /* retreiving id of current virtual machine */ int (*get_vm_id)(void); /* get pages shared via hypervisor-specific method */ int (*share_pages)(struct page **pages, int vm_id, int nents, void **refs_info); /* make shared pages unshared via hypervisor specific method */ int (*unshare_pages)(void **refs_info, int nents); /* map remotely shared pages on importer's side via * hypervisor-specific method */ struct page ** (*map_shared_pages)(unsigned long ref, int vm_id, int nents, void **refs_info); /* unmap and free shared pages on importer's side via * hypervisor-specific method */ int (*unmap_shared_pages)(void **refs_info, int nents); /* initialize communication environment */ int (*init_comm_env)(void); void (*destroy_comm)(void); /* upstream ch setup (receiving and responding) */ int (*init_rx_ch)(int vm_id); /* downstream ch setup (transmitting and parsing responses) */ int (*init_tx_ch)(int vm_id); int (*send_req)(int vm_id, struct hyper_dmabuf_req *req, int wait); }; All of these can be mapped with any hypervisor specific implementation. We designed backend implementation for Xen using grant-table, Xen event and ring buffer communication. For ACRN, we have another backend using Virt-IO for both memory sharing and communication. We tried to define this structure of backend to make it general enough (or it can be even modified or extended to support more cases.) so that it can fit to other hypervisor cases. Only requirements/expectation on the hypervisor are page-level memory sharing and inter-domain communication, which I think are standard features of modern hypervisor. And please review common UAPIs that hyper_dmabuf and xen-zcopy supports. They are very general. One is getting FD (dmabuf) and get those shared. The other is generating dmabuf from global handle (secure handle hiding gref behind it). On top of this, hyper_dmabuf has "unshare" and "query" which are also useful for any cases. So I don't know why we wouldn't want to try to make these standard in most of hypervisor cases instead of limiting it to certain hypervisor like Xen. Frontend-backend structre is optimal for this I think. So I am wondering we can start with this hyper_dmabuf then modify it for your use-case if needed and polish and fix any glitches if we want to to use this for all general dma-buf usecases. Imo xen-zcopy is a much more reasonable starting point for upstream, which can