Re: [Xen-devel] [PATCH] x86/cpu: Support a new cpu vendor, which is Shanghai

2018-04-18 Thread Fiona Li(BJ-RD)
Hi Jan,
Yes, I mean with another patch. And Thank you again.

-Original Message-
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Wednesday, April 18, 2018 6:52 PM
To: Fiona Li(BJ-RD) 
Cc: xen-devel 
Subject: RE: RE: [PATCH] x86/cpu: Support a new cpu vendor,which is Shanghai

>>> On 18.04.18 at 12:25,  wrote:
> [FionaLi] : I am sorry. I understood wrongly. The C000 range are
> extensions, which provide some additional feature different from
> Intel.  As you suggested, we will enable those features in guest OSes
> and remove these code from patch. Can we support the CPUID leaves and
> emulate the MSR  with another submit?

With another patch you mean? Yes please - if at all possible put separate sets 
of changes into separate patches, combined into a series if there are 
dependencies between them.

Jan




保密声明:
本邮件含有保密或专有信息,仅供指定收件人使用。严禁对本邮件或其内容做任何未经授权的查阅、使用、复制或转发。
CONFIDENTIAL NOTE:
This email contains confidential or legally privileged information and is for 
the sole use of its intended recipient. Any unauthorized review, use, copying 
or forwarding of this email or the content of this email is strictly prohibited.
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 5/6] arm: add a small kconfig for qemu-system-aarch64

2018-04-18 Thread Stefano Stabellini
Disable erratas because they don't apply to QEMU's software emulated
CPUs.

Arbitrarily choose a limit of 4 CPUs.

Select the credit and NULL schedulers only.

Signed-off-by: Stefano Stabellini 
---
 xen/arch/arm/configs/qemu.config | 81 
 1 file changed, 81 insertions(+)
 create mode 100644 xen/arch/arm/configs/qemu.config

diff --git a/xen/arch/arm/configs/qemu.config b/xen/arch/arm/configs/qemu.config
new file mode 100644
index 000..6351cfd
--- /dev/null
+++ b/xen/arch/arm/configs/qemu.config
@@ -0,0 +1,81 @@
+#
+# Automatically generated file; DO NOT EDIT.
+# Xen/arm 4.11-unstable Configuration
+#
+CONFIG_64BIT=y
+CONFIG_ARM_64=y
+CONFIG_ARM=y
+CONFIG_ARCH_DEFCONFIG="arch/arm/configs/arm64_defconfig"
+
+#
+# Architecture Features
+#
+CONFIG_NR_CPUS=4
+# CONFIG_ACPI is not set
+CONFIG_HAS_GICV3=y
+# CONFIG_HAS_MEM_ACCESS is not set
+# CONFIG_HAS_ITS is not set
+# CONFIG_NEW_VGIC is not set
+# CONFIG_SBSA_VUART_CONSOLE is not set
+
+#
+# ARM errata workaround via the alternative framework
+#
+# CONFIG_ARM64_ERRATUM_827319 is not set
+# CONFIG_ARM64_ERRATUM_824069 is not set
+# CONFIG_ARM64_ERRATUM_819472 is not set
+# CONFIG_ARM64_ERRATUM_832075 is not set
+# CONFIG_ARM64_ERRATUM_834220 is not set
+CONFIG_HARDEN_BRANCH_PREDICTOR=y
+CONFIG_ARM64_HARDEN_BRANCH_PREDICTOR=y
+
+#
+# Common Features
+#
+CONFIG_HAS_ALTERNATIVE=y
+CONFIG_HAS_DEVICE_TREE=y
+CONFIG_HAS_PDX=y
+# CONFIG_TMEM is not set
+# CONFIG_XSM is not set
+
+#
+# Schedulers
+#
+CONFIG_SCHED_CREDIT=y
+# CONFIG_SCHED_CREDIT2 is not set
+# CONFIG_SCHED_RTDS is not set
+# CONFIG_SCHED_ARINC653 is not set
+CONFIG_SCHED_NULL=y
+CONFIG_SCHED_CREDIT_DEFAULT=y
+# CONFIG_SCHED_NULL_DEFAULT is not set
+CONFIG_SCHED_DEFAULT="credit"
+# CONFIG_LIVEPATCH is not set
+# CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS is not set
+CONFIG_CMDLINE=""
+
+#
+# Device Drivers
+#
+# CONFIG_HAS_NS16550 is not set
+# CONFIG_HAS_CADENCE_UART is not set
+# CONFIG_HAS_MVEBU is not set
+CONFIG_HAS_PL011=y
+# CONFIG_HAS_SCIF is not set
+# CONFIG_HAS_EHCI is not set
+CONFIG_HAS_PASSTHROUGH=y
+# CONFIG_HAS_SMMUv2 is not set
+# CONFIG_VIDEO is not set
+# CONFIG_HAS_ARM_HDLCD is not set
+CONFIG_DEFCONFIG_LIST="$ARCH_DEFCONFIG"
+
+#
+# Debugging Options
+#
+# CONFIG_DEBUG is not set
+# CONFIG_FRAME_POINTER is not set
+# CONFIG_COVERAGE is not set
+# CONFIG_LOCK_PROFILE is not set
+# CONFIG_PERF_COUNTERS is not set
+# CONFIG_VERBOSE_DEBUG is not set
+# CONFIG_DEVICE_TREE_DEBUG is not set
+# CONFIG_SCRUB_DEBUG is not set
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 0/6] arm: more kconfig configurability and small default configs

2018-04-18 Thread Stefano Stabellini
Hi all,

This patch series is the first step toward building a small certifiable
Xen hypervisor for ARM boards.

First, the series makes a few changes to allow disabling more kconfig
options: most of them already exist but cannot be disabled.

Then, it introduces a reference kconfig for Renesas RCar (due to popular
demand, candidate for certifications) and for QEMU aarch64 (not for
certifications, but useful for debugging).

The last patch in the series adds a convenient cloc target to count the
total lines of code of the source files built.


There a couple of open questions which need to be addressed in regards
to which options to enable/disable in the reference kconfig. The primary
one is which schedulers to enable.

In this series, I enabled NULL and credit, but none of the others. The
choice is somewhat arbitrary but driven by the idea that NULL is
required for best interrupt latency results, and credit is the default.
They can be paired using multiple cpupools.

Aside from the schedulers, most other kconfig options seem pretty
obvious.

I am interested in hearing other opinions about this.

Cheers,

Stefano



Stefano Stabellini (6):
  arm: make it possible to disable more kconfig options
  arm: make it possible to enable/disable UART drivers
  arm: make it possible to disable the SMMU driver
  arm: add a small kconfig for Renesas RCar H3
  arm: add a small kconfig for qemu-system-aarch64
  xen: add cloc target

 xen/Makefile | 14 ++-
 xen/Rules.mk |  2 +
 xen/arch/arm/Kconfig | 15 +--
 xen/arch/arm/configs/qemu.config | 81 
 xen/arch/arm/configs/renesas.config  | 80 +++
 xen/drivers/char/Kconfig | 16 +++
 xen/drivers/passthrough/Kconfig  |  2 +
 xen/drivers/passthrough/arm/Kconfig  |  7 
 xen/drivers/passthrough/arm/Makefile |  2 +-
 xen/drivers/video/Kconfig|  8 +++-
 10 files changed, 211 insertions(+), 16 deletions(-)
 create mode 100644 xen/arch/arm/configs/qemu.config
 create mode 100644 xen/arch/arm/configs/renesas.config
 create mode 100644 xen/drivers/passthrough/arm/Kconfig

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCHv2] x86/xen: Remove use of VLAs

2018-04-18 Thread Boris Ostrovsky
On 04/18/2018 01:08 PM, Laura Abbott wrote:
> There's an ongoing effort to remove VLAs[1] from the kernel to eventually
> turn on -Wvla. It turns out, the few VLAs in use in Xen produce only a
> single entry array that is always bounded by GDT_SIZE. Clean up the code to
> get rid of the VLA and the loop.
>
> [1] https://lkml.org/lkml/2018/3/7/621
>
> Signed-off-by: Laura Abbott 
> ---
> v2: Updated the code to reflect that we know size is always bounded by
> GDT_SIZE. This gets rid of the array and the loop. I can throw a few
> more comments in there if someone thinks they need to be updated.
> ---
>  arch/x86/xen/enlighten_pv.c | 84 
> -
>  1 file changed, 29 insertions(+), 55 deletions(-)
>
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index c36d23aa6c35..1254f2fa3a89 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -421,45 +421,32 @@ static void xen_load_gdt(const struct desc_ptr *dtr)
>  {
>   unsigned long va = dtr->address;
>   unsigned int size = dtr->size + 1;
> - unsigned pages = DIV_ROUND_UP(size, PAGE_SIZE);
> - unsigned long frames[pages];
> - int f;
> -
> - /*
> -  * A GDT can be up to 64k in size, which corresponds to 8192
> -  * 8-byte entries, or 16 4k pages..
> -  */
> + unsigned long pfn, mfn;
> + int level;
> + pte_t *ptep;
> + void *virt;
>  
> - BUG_ON(size > 65536);
> + BUG_ON(size > GDT_SIZE);


I'd probably BUG_ON(size>PAGE_SIZE) because that's what we are really
trying to avoid. Maybe with a comment that we expect GDT_SIZE at most,
and it is less than PAGE_SIZE.

I can fix it while committing if you don't object.

Reviewed-by: Boris Ostrovsky 



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 4/6] arm: add a small kconfig for Renesas RCar H3

2018-04-18 Thread Stefano Stabellini
This is a reference tiny kconfig for Renesas RCar.  In terms of
schedulers, it selects credit and NULL only.  It enables all the ARM64
errata.

Signed-off-by: Stefano Stabellini 
CC: artem_myga...@epam.com
CC: volodymyr_babc...@epam.com

---

This patch is untested on Renesas RCar, please test!
Also, I am not sure whether some of the errata workarounds can be
disabled on the RCar.
---
 xen/arch/arm/configs/renesas.config | 80 +
 1 file changed, 80 insertions(+)
 create mode 100644 xen/arch/arm/configs/renesas.config

diff --git a/xen/arch/arm/configs/renesas.config 
b/xen/arch/arm/configs/renesas.config
new file mode 100644
index 000..7ad3f1c
--- /dev/null
+++ b/xen/arch/arm/configs/renesas.config
@@ -0,0 +1,80 @@
+#
+# Automatically generated file; DO NOT EDIT.
+# Xen/arm 4.11-unstable Configuration
+#
+CONFIG_64BIT=y
+CONFIG_ARM_64=y
+CONFIG_ARM=y
+CONFIG_ARCH_DEFCONFIG="arch/arm/configs/arm64_defconfig"
+
+#
+# Architecture Features
+#
+CONFIG_NR_CPUS=8
+# CONFIG_ACPI is not set
+# CONFIG_HAS_GICV3 is not set
+# CONFIG_HAS_MEM_ACCESS is not set
+# CONFIG_NEW_VGIC is not set
+# CONFIG_SBSA_VUART_CONSOLE is not set
+
+#
+# ARM errata workaround via the alternative framework
+#
+CONFIG_ARM64_ERRATUM_827319=y
+CONFIG_ARM64_ERRATUM_824069=y
+CONFIG_ARM64_ERRATUM_819472=y
+CONFIG_ARM64_ERRATUM_832075=y
+CONFIG_ARM64_ERRATUM_834220=y
+CONFIG_HARDEN_BRANCH_PREDICTOR=y
+CONFIG_ARM64_HARDEN_BRANCH_PREDICTOR=y
+
+#
+# Common Features
+#
+CONFIG_HAS_ALTERNATIVE=y
+CONFIG_HAS_DEVICE_TREE=y
+CONFIG_HAS_PDX=y
+# CONFIG_TMEM is not set
+# CONFIG_XSM is not set
+
+#
+# Schedulers
+#
+CONFIG_SCHED_CREDIT=y
+# CONFIG_SCHED_CREDIT2 is not set
+# CONFIG_SCHED_RTDS is not set
+# CONFIG_SCHED_ARINC653 is not set
+CONFIG_SCHED_NULL=y
+# CONFIG_SCHED_CREDIT_DEFAULT is not set
+CONFIG_SCHED_NULL_DEFAULT=y
+CONFIG_SCHED_DEFAULT="null"
+# CONFIG_LIVEPATCH is not set
+# CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS is not set
+CONFIG_CMDLINE=""
+
+#
+# Device Drivers
+#
+# CONFIG_HAS_NS16550 is not set
+# CONFIG_HAS_CADENCE_UART is not set
+# CONFIG_HAS_MVEBU is not set
+# CONFIG_HAS_PL011 is not set
+CONFIG_HAS_SCIF=y
+# CONFIG_HAS_EHCI is not set
+CONFIG_HAS_PASSTHROUGH=y
+CONFIG_HAS_SMMUv2=y
+# CONFIG_VIDEO is not set
+# CONFIG_HAS_ARM_HDLCD is not set
+CONFIG_DEFCONFIG_LIST="$ARCH_DEFCONFIG"
+
+#
+# Debugging Options
+#
+# CONFIG_DEBUG is not set
+# CONFIG_FRAME_POINTER is not set
+# CONFIG_COVERAGE is not set
+# CONFIG_LOCK_PROFILE is not set
+# CONFIG_PERF_COUNTERS is not set
+# CONFIG_VERBOSE_DEBUG is not set
+# CONFIG_DEVICE_TREE_DEBUG is not set
+# CONFIG_SCRUB_DEBUG is not set
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 6/6] xen: add cloc target

2018-04-18 Thread Stefano Stabellini
Add a Xen build target to count the lines of code of the source files
built. Uses `cloc' to do the job.

Generate the list of source files from the %.o targets, append output
to "sourcelist".

Remove sourcelist on clean, and also at the beginning of the build
target to avoid appending to sourcelist on consequence builds. Otherwise
one could imagine sourcelist could become large if the user builds Xen
repeatedly without calling clean.

For the cloc target, first clean, then build to make sure all files are
properly accounted (no partial builds).

Signed-off-by: Stefano Stabellini 
CC: jbeul...@suse.com
CC: andrew.coop...@citrix.com
---
 xen/Makefile | 14 --
 xen/Rules.mk |  2 ++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/xen/Makefile b/xen/Makefile
index 62d479c..f53fd22 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -48,7 +48,7 @@ else
 endif
 
 .PHONY: _build
-_build: $(TARGET)$(CONFIG_XEN_INSTALL_SUFFIX)
+_build: clean-sourcelist $(TARGET)$(CONFIG_XEN_INSTALL_SUFFIX)
 
 .PHONY: _install
 _install: D=$(DESTDIR)
@@ -108,7 +108,7 @@ _debug:
$(OBJDUMP) -D -S $(TARGET)-syms > $(TARGET).s
 
 .PHONY: _clean
-_clean: delete-unfresh-files
+_clean: delete-unfresh-files clean-sourcelist
$(MAKE) -C tools clean
$(MAKE) -f $(BASEDIR)/Rules.mk -C include clean
$(MAKE) -f $(BASEDIR)/Rules.mk -C common clean
@@ -267,3 +267,13 @@ $(KCONFIG_CONFIG):
 include/config/auto.conf.cmd: ;
 
 -include $(BASEDIR)/include/config/auto.conf.cmd
+
+.PHONY: cloc
+cloc: $(BASEDIR)/sourcelist
+   cloc --list-file=$(BASEDIR)/sourcelist
+
+$(BASEDIR)/sourcelist: clean build
+
+.PHONY: clean-sourcelist
+clean-sourcelist:
+   rm -f $(BASEDIR)/sourcelist
diff --git a/xen/Rules.mk b/xen/Rules.mk
index 5337e20..cb48aa7 100644
--- a/xen/Rules.mk
+++ b/xen/Rules.mk
@@ -190,9 +190,11 @@ _clean_%/: FORCE
$(MAKE) -f $(BASEDIR)/Rules.mk -C $* clean
 
 %.o: %.c Makefile
+   echo `pwd`/$< >> $(BASEDIR)/sourcelist
$(CC) $(CFLAGS) -c $< -o $@
 
 %.o: %.S Makefile
+   echo `pwd`/$< >> $(BASEDIR)/sourcelist
$(CC) $(AFLAGS) -c $< -o $@
 
 SPECIAL_DATA_SECTIONS := rodata $(foreach a,1 2 4 8 16, \
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 1/6] arm: make it possible to disable more kconfig options

2018-04-18 Thread Stefano Stabellini
Make it possible to disable the following existing kconfig options:
  HAS_GICV3
  HAS_ARM_HDLCD
  HAS_MEM_ACCESS

Today they are silent option. This patch adds one line descriptions and
make them de/selectable.

Also, do not select VIDEO: make HAS_ARM_HDLCD select VIDEO instead. In
fact, VIDEO is only needed by HAS_ARM_HDLCD.

Signed-off-by: Stefano Stabellini 
---
 xen/arch/arm/Kconfig  | 15 +++
 xen/drivers/video/Kconfig |  8 +++-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 8174c0c..83963c8 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -12,17 +12,13 @@ config ARM_32
 config ARM_64
def_bool y
depends on 64BIT
-   select HAS_GICV3
 
 config ARM
def_bool y
select HAS_ALTERNATIVE
-   select HAS_ARM_HDLCD
select HAS_DEVICE_TREE
-   select HAS_MEM_ACCESS
select HAS_PASSTHROUGH
select HAS_PDX
-   select VIDEO
 
 config ARCH_DEFCONFIG
string
@@ -44,6 +40,17 @@ config ACPI
 
 config HAS_GICV3
bool
+   prompt "GICv3 driver"
+   default y
+
+config HAS_MEM_ACCESS
+   bool
+   prompt "Memory Access and VM events"
+   default y
+   ---help---
+
+ Framework to configure memory access types for guests and receive
+ related events in userspace.
 
 config HAS_ITS
 bool
diff --git a/xen/drivers/video/Kconfig b/xen/drivers/video/Kconfig
index 52e8ce6..26daf9a 100644
--- a/xen/drivers/video/Kconfig
+++ b/xen/drivers/video/Kconfig
@@ -13,4 +13,10 @@ config VGA
  If unsure, say Y.
 
 config HAS_ARM_HDLCD
-   bool
+   bool "ARM HDLCD driver"
+   default n
+   select VIDEO
+   ---help---
+ Enable Xen video output to ARM HDLCD graphic controllers.
+
+ if unsure, say N.
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 2/6] arm: make it possible to enable/disable UART drivers

2018-04-18 Thread Stefano Stabellini
All the UART drivers are silent options. Add one line descriptions so
that can be de/selected via menuconfig.

Signed-off-by: Stefano Stabellini 
---
 xen/drivers/char/Kconfig | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/xen/drivers/char/Kconfig b/xen/drivers/char/Kconfig
index cc78ec3..b6fcefd 100644
--- a/xen/drivers/char/Kconfig
+++ b/xen/drivers/char/Kconfig
@@ -1,11 +1,11 @@
 config HAS_NS16550
-   bool
+   bool "NS16550 UART driver"
default y
help
  This selects the 16550-series UART support. For most systems, say Y.
 
 config HAS_CADENCE_UART
-   bool
+   bool "Xilinx Cadence UART driver"
default y
depends on ARM_64
help
@@ -13,7 +13,7 @@ config HAS_CADENCE_UART
  based board, say Y.
 
 config HAS_MVEBU
-   bool
+   bool "Marvell MVEBU UART driver"
default y
depends on ARM_64
help
@@ -21,7 +21,7 @@ config HAS_MVEBU
  based board, say Y.
 
 config HAS_PL011
-   bool
+   bool "ARM PL011 UART driver"
default y
depends on ARM
help
@@ -29,7 +29,7 @@ config HAS_PL011
  an Integrator/PP2, Integrator/CP or Versatile platform, say Y.
 
 config HAS_EXYNOS4210
-   bool
+   bool "Samsung Exynos 4210 UART driver"
default y
depends on ARM_32
help
@@ -37,7 +37,7 @@ config HAS_EXYNOS4210
  Exynos based board, say Y.
 
 config HAS_OMAP
-   bool
+   bool "Texas Instruments OMAP UART driver"
default y
depends on ARM_32
help
@@ -45,7 +45,7 @@ config HAS_OMAP
  Instruments based CPU, say Y.
 
 config HAS_SCIF
-   bool
+   bool "SuperH SCI(F) UART driver"
default y
depends on ARM
help
@@ -53,7 +53,7 @@ config HAS_SCIF
  or Renesas R-Car Gen 2/3 based board say Y.
 
 config HAS_EHCI
-   bool
+   bool "EHCI UART driver"
help
  This selects the USB based EHCI debug port to be used as a UART. If
  you have an x86 based system with USB, say Y.
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 3/6] arm: make it possible to disable the SMMU driver

2018-04-18 Thread Stefano Stabellini
Introduce a Kconfig option for the ARM SMMUv2 driver.

Signed-off-by: Stefano Stabellini 
CC: jbeul...@suse.com
---
 xen/drivers/passthrough/Kconfig  | 2 ++
 xen/drivers/passthrough/arm/Kconfig  | 7 +++
 xen/drivers/passthrough/arm/Makefile | 2 +-
 3 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 xen/drivers/passthrough/arm/Kconfig

diff --git a/xen/drivers/passthrough/Kconfig b/xen/drivers/passthrough/Kconfig
index 8d90b67..2fe94fa 100644
--- a/xen/drivers/passthrough/Kconfig
+++ b/xen/drivers/passthrough/Kconfig
@@ -1,3 +1,5 @@
 
 config HAS_PASSTHROUGH
bool
+
+source "drivers/passthrough/arm/Kconfig"
diff --git a/xen/drivers/passthrough/arm/Kconfig 
b/xen/drivers/passthrough/arm/Kconfig
new file mode 100644
index 000..98318d9
--- /dev/null
+++ b/xen/drivers/passthrough/arm/Kconfig
@@ -0,0 +1,7 @@
+
+config HAS_SMMUv2
+   bool "ARM SMMUv2 driver"
+   default y
+   depends on ARM
+   ---help---
+ Driver for the ARM SMMU version 2, a popular IOMMU by ARM.
diff --git a/xen/drivers/passthrough/arm/Makefile 
b/xen/drivers/passthrough/arm/Makefile
index f4cd26e..4db273a 100644
--- a/xen/drivers/passthrough/arm/Makefile
+++ b/xen/drivers/passthrough/arm/Makefile
@@ -1,2 +1,2 @@
 obj-y += iommu.o
-obj-y += smmu.o
+obj-$(HAS_SMMUv2) += smmu.o
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.11] x86/spec_ctrl: Updates to retpoline-safety decision making

2018-04-18 Thread Andrew Cooper
All of this is as recommended by the Intel whitepaper:

https://software.intel.com/sites/default/files/managed/1d/46/Retpoline-A-Branch-Target-Injection-Mitigation.pdf

The RSB-alternative bit in MSR_ARCH_CAPABILITIES may be set by a hypervisor to
indicate that the virtual machine may migrate to a processor which isn't
retpoline-safe.  Introduce a shortened name (to reduce code volume), treat it
as authorative in retpoline_safe(), and print its value along with the other
ARCH_CAPS bits.

The exact processor models which do have RSB semantics which fall back to BTB
predictions are enumerated, and include Kabylake and Cannonlake.  Leave a
printk() in the default case to help identify cases which aren't covered.

The exact microcode versions from Broadwell RSB-safety are taken from the
referenced microcode update file (adjusting for the known-bad microcode
versions).  In practice, this means that all Broadwell hardware with
up-to-date microcode will use retpoline in preference to IBRS.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Juergen Gross 

This should be backported to everywhere which has Spectre mitigations, and
therefore should be considered for 4.11 at this point.
---
 xen/arch/x86/spec_ctrl.c| 48 +++--
 xen/include/asm-x86/msr-index.h |  1 +
 2 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 5b5ec90..aff06f0 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -113,12 +113,13 @@ static void __init print_details(enum ind_thunk thunk)
 printk(XENLOG_DEBUG "Speculative mitigation facilities:\n");
 
 /* Hardware features which pertain to speculative mitigations. */
-printk(XENLOG_DEBUG "  Hardware features:%s%s%s%s%s\n",
+printk(XENLOG_DEBUG "  Hardware features:%s%s%s%s%s%s\n",
(_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBRS/IBPB" : "",
(_7d0 & cpufeat_mask(X86_FEATURE_STIBP)) ? " STIBP" : "",
(e8b  & cpufeat_mask(X86_FEATURE_IBPB))  ? " IBPB"  : "",
(caps & ARCH_CAPABILITIES_IBRS_ALL)  ? " IBRS_ALL"  : "",
-   (caps & ARCH_CAPABILITIES_RDCL_NO)   ? " RDCL_NO"   : "");
+   (caps & ARCH_CAPABILITIES_RDCL_NO)   ? " RDCL_NO"   : "",
+   (caps & ARCH_CAPS_RSBA)  ? " RBSA"  : "");
 
 /* Compiled-in support which pertains to BTI mitigations. */
 if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) )
@@ -151,6 +152,20 @@ static bool __init retpoline_safe(void)
  boot_cpu_data.x86 != 6 )
 return false;
 
+if ( boot_cpu_has(X86_FEATURE_ARCH_CAPS) )
+{
+uint64_t caps;
+
+rdmsrl(MSR_ARCH_CAPABILITIES, caps);
+
+/*
+ * RBSA may be set by a hypervisor to indicate that we may move to a
+ * processor which isn't retpoline-safe.
+ */
+if ( caps & ARCH_CAPS_RSBA )
+return false;
+}
+
 switch ( boot_cpu_data.x86_model )
 {
 case 0x17: /* Penryn */
@@ -177,18 +192,37 @@ static bool __init retpoline_safe(void)
  * versions.
  */
 case 0x3d: /* Broadwell */
-return ucode_rev >= 0x28;
+return ucode_rev >= 0x2a;
 case 0x47: /* Broadwell H */
-return ucode_rev >= 0x1b;
+return ucode_rev >= 0x1d;
 case 0x4f: /* Broadwell EP/EX */
-return ucode_rev >= 0xb25;
+return ucode_rev >= 0xb21;
 case 0x56: /* Broadwell D */
-return false; /* TBD. */
+switch ( boot_cpu_data.x86_mask )
+{
+case 2:  return ucode_rev >= 0x15;
+case 3:  return ucode_rev >= 0x712;
+case 4:  return ucode_rev >= 0xf11;
+case 5:  return ucode_rev >= 0xe09;
+default: return false;
+}
+break;
 
 /*
- * Skylake and later processors are not retpoline-safe.
+ * Skylake, Kabylake and Cannonlake processors are not retpoline-safe.
  */
+case 0x4e:
+case 0x55:
+case 0x5e:
+case 0x66:
+case 0x67:
+case 0x8e:
+case 0x9e:
+return false;
+
 default:
+printk("Unrecognised CPU model %#x - assuming not reptpoline safe\n",
+   boot_cpu_data.x86_model);
 return false;
 }
 }
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 8416756..c9f44eb 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -42,6 +42,7 @@
 #define MSR_ARCH_CAPABILITIES  0x010a
 #define ARCH_CAPABILITIES_RDCL_NO  (_AC(1, ULL) << 0)
 #define ARCH_CAPABILITIES_IBRS_ALL (_AC(1, ULL) << 1)
+#define ARCH_CAPS_RSBA (_AC(1, ULL) << 2)
 
 /* Intel MSRs. Some also available on other CPUs */
 #define MSR_IA32_PERFCTR0  0x00c1
-- 
2.1.4


___

Re: [Xen-devel] Setting up a call to discuss PCI Emulation - Future Direction

2018-04-18 Thread Stefano Stabellini
On Sat, 14 Apr 2018, Paul Durrant wrote:
> > -Original Message-
> > From: Roger Pau Monne
> > Sent: 13 April 2018 20:53
> > To: Lars Kurth 
> > Cc: xen-devel ; Daniel Smith
> > ; Alexey G ; Stefano
> > Stabellini ; Julien Grall ; 
> > Paul
> > Durrant ; Christopher Clark
> > ; Rich Persaud 
> > Subject: Re: Setting up a call to discuss PCI Emulation - Future Direction
> > 
> > On Fri, Apr 13, 2018 at 12:59:15PM +0100, Lars Kurth wrote:
> > >
> > >
> > > On 13/04/2018, 11:01, "Roger Pau Monne"  wrote:
> > >
> > > On Thu, Apr 12, 2018 at 05:50:00PM +0100, Lars Kurth wrote:
> > > >
> > > >
> > > > On 12/04/2018, 17:41, "Roger Pau Monne" 
> > wrote:
> > > >
> > > > On Thu, Apr 12, 2018 at 05:32:57PM +0100, Lars Kurth wrote:
> > > >
> > > >
> > > > >may work. For me Mon, Wed and Fri’s generally work at those
> > time-slots.
> > > > >Next week is a little busy for me, so I would prefer the 
> > > following
> > week.
> > > > >If you could fill out the following Google poll, if this 
> > > week works
> > that
> > > > >would be great. Otherwise please scream.
> > > >
> > > > I'm afraid I'm on vacations from the 21st to the 29th of April, 
> > > so I
> > > > won't be able to join the meeting unless we move it to the week
> > after.
> > > > Let's see what people think of the current dates.
> > > >
> > > > Roger.
> > > >
> > > > Hi, I changed the dates to the week after. Poll so far has been
> > invalidated.
> > > >
> > > > See https://doodle.com/poll/gdnmcrvnibmw563n
> > >
> > > Thanks! I've already fixed my vote.
> > >
> > > I guess this will come later, but we need a clear agenda of items
> > > because the x86 and ARM topics are probably going to be completely
> > > different (albeit all related to PCI).
> > >
> > > Royger: I am OK with trying to get a draft agenda in place in this e-mail
> > thread. But I can't drive this, as I don’t understand the issues. But I am 
> > happy
> > to collate everything as for the x86 call and write up minutes
> > 
> > On the x86 side:
> > 
> >  - Q35 HVM emulation, adding MCFG support to guests.
> 
> One of the issues here is support for multiple external emulators w.r.t. 
> config space access.
>
> >  - PVH guest pci-passthrough: using the internal vPCI infrastructure.
> 
> Let's not distinguish between HVM or PVH here. We want to use the same 
> passthrough mechanisms for both.

Ideally, we'll use the same mechanism for ARM too.___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCHv2] x86/xen: Remove use of VLAs

2018-04-18 Thread Laura Abbott

There's an ongoing effort to remove VLAs[1] from the kernel to eventually
turn on -Wvla. It turns out, the few VLAs in use in Xen produce only a
single entry array that is always bounded by GDT_SIZE. Clean up the code to
get rid of the VLA and the loop.

[1] https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Laura Abbott 
---
v2: Updated the code to reflect that we know size is always bounded by
GDT_SIZE. This gets rid of the array and the loop. I can throw a few
more comments in there if someone thinks they need to be updated.
---
 arch/x86/xen/enlighten_pv.c | 84 -
 1 file changed, 29 insertions(+), 55 deletions(-)

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index c36d23aa6c35..1254f2fa3a89 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -421,45 +421,32 @@ static void xen_load_gdt(const struct desc_ptr *dtr)
 {
unsigned long va = dtr->address;
unsigned int size = dtr->size + 1;
-   unsigned pages = DIV_ROUND_UP(size, PAGE_SIZE);
-   unsigned long frames[pages];
-   int f;
-
-   /*
-* A GDT can be up to 64k in size, which corresponds to 8192
-* 8-byte entries, or 16 4k pages..
-*/
+   unsigned long pfn, mfn;
+   int level;
+   pte_t *ptep;
+   void *virt;
 
-   BUG_ON(size > 65536);
+   BUG_ON(size > GDT_SIZE);
BUG_ON(va & ~PAGE_MASK);
 
-   for (f = 0; va < dtr->address + size; va += PAGE_SIZE, f++) {
-   int level;
-   pte_t *ptep;
-   unsigned long pfn, mfn;
-   void *virt;
-
-   /*
-* The GDT is per-cpu and is in the percpu data area.
-* That can be virtually mapped, so we need to do a
-* page-walk to get the underlying MFN for the
-* hypercall.  The page can also be in the kernel's
-* linear range, so we need to RO that mapping too.
-*/
-   ptep = lookup_address(va, );
-   BUG_ON(ptep == NULL);
-
-   pfn = pte_pfn(*ptep);
-   mfn = pfn_to_mfn(pfn);
-   virt = __va(PFN_PHYS(pfn));
+   /*
+* The GDT is per-cpu and is in the percpu data area.
+* That can be virtually mapped, so we need to do a
+* page-walk to get the underlying MFN for the
+* hypercall.  The page can also be in the kernel's
+* linear range, so we need to RO that mapping too.
+*/
+   ptep = lookup_address(va, );
+   BUG_ON(ptep == NULL);
 
-   frames[f] = mfn;
+   pfn = pte_pfn(*ptep);
+   mfn = pfn_to_mfn(pfn);
+   virt = __va(PFN_PHYS(pfn));
 
-   make_lowmem_page_readonly((void *)va);
-   make_lowmem_page_readonly(virt);
-   }
+   make_lowmem_page_readonly((void *)va);
+   make_lowmem_page_readonly(virt);
 
-   if (HYPERVISOR_set_gdt(frames, size / sizeof(struct desc_struct)))
+   if (HYPERVISOR_set_gdt(, size / sizeof(struct desc_struct)))
BUG();
 }
 
@@ -470,34 +457,21 @@ static void __init xen_load_gdt_boot(const struct 
desc_ptr *dtr)
 {
unsigned long va = dtr->address;
unsigned int size = dtr->size + 1;
-   unsigned pages = DIV_ROUND_UP(size, PAGE_SIZE);
-   unsigned long frames[pages];
-   int f;
-
-   /*
-* A GDT can be up to 64k in size, which corresponds to 8192
-* 8-byte entries, or 16 4k pages..
-*/
+   unsigned long pfn, mfn;
+   pte_t pte;
 
-   BUG_ON(size > 65536);
+   BUG_ON(size > GDT_SIZE);
BUG_ON(va & ~PAGE_MASK);
 
-   for (f = 0; va < dtr->address + size; va += PAGE_SIZE, f++) {
-   pte_t pte;
-   unsigned long pfn, mfn;
+   pfn = virt_to_pfn(va);
+   mfn = pfn_to_mfn(pfn);
 
-   pfn = virt_to_pfn(va);
-   mfn = pfn_to_mfn(pfn);
+   pte = pfn_pte(pfn, PAGE_KERNEL_RO);
 
-   pte = pfn_pte(pfn, PAGE_KERNEL_RO);
-
-   if (HYPERVISOR_update_va_mapping((unsigned long)va, pte, 0))
-   BUG();
-
-   frames[f] = mfn;
-   }
+   if (HYPERVISOR_update_va_mapping((unsigned long)va, pte, 0))
+   BUG();
 
-   if (HYPERVISOR_set_gdt(frames, size / sizeof(struct desc_struct)))
+   if (HYPERVISOR_set_gdt(, size / sizeof(struct desc_struct)))
BUG();
 }
 
-- 
2.14.3


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Dongwon Kim
On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko wrote:
> On 04/17/2018 11:57 PM, Dongwon Kim wrote:
> >On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:
> >>On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:
> >>>Yeah, I definitely agree on the idea of expanding the use case to the
> >>>general domain where dmabuf sharing is used. However, what you are
> >>>targetting with proposed changes is identical to the core design of
> >>>hyper_dmabuf.
> >>>
> >>>On top of this basic functionalities, hyper_dmabuf has driver level
> >>>inter-domain communication, that is needed for dma-buf remote tracking
> >>>(no fence forwarding though), event triggering and event handling, extra
> >>>meta data exchange and hyper_dmabuf_id that represents grefs
> >>>(grefs are shared implicitly on driver level)
> >>This really isn't a positive design aspect of hyperdmabuf imo. The core
> >>code in xen-zcopy (ignoring the ioctl side, which will be cleaned up) is
> >>very simple & clean.
> >>
> >>If there's a clear need later on we can extend that. But for now xen-zcopy
> >>seems to cover the basic use-case needs, so gets the job done.
> >>
> >>>Also it is designed with frontend (common core framework) + backend
> >>>(hyper visor specific comm and memory sharing) structure for portability.
> >>>We just can't limit this feature to Xen because we want to use the same
> >>>uapis not only for Xen but also other applicable hypervisor, like ACORN.
> >>See the discussion around udmabuf and the needs for kvm. I think trying to
> >>make an ioctl/uapi that works for multiple hypervisors is misguided - it
> >>likely won't work.
> >>
> >>On top of that the 2nd hypervisor you're aiming to support is ACRN. That's
> >>not even upstream yet, nor have I seen any patches proposing to land linux
> >>support for ACRN. Since it's not upstream, it doesn't really matter for
> >>upstream consideration. I'm doubting that ACRN will use the same grant
> >>references as xen, so the same uapi won't work on ACRN as on Xen anyway.
> >Yeah, ACRN doesn't have grant-table. Only Xen supports it. But that is why
> >hyper_dmabuf has been architectured with the concept of backend.
> >If you look at the structure of backend, you will find that
> >backend is just a set of standard function calls as shown here:
> >
> >struct hyper_dmabuf_bknd_ops {
> > /* backend initialization routine (optional) */
> > int (*init)(void);
> >
> > /* backend cleanup routine (optional) */
> > int (*cleanup)(void);
> >
> > /* retreiving id of current virtual machine */
> > int (*get_vm_id)(void);
> >
> > /* get pages shared via hypervisor-specific method */
> > int (*share_pages)(struct page **pages, int vm_id,
> >int nents, void **refs_info);
> >
> > /* make shared pages unshared via hypervisor specific method */
> > int (*unshare_pages)(void **refs_info, int nents);
> >
> > /* map remotely shared pages on importer's side via
> >  * hypervisor-specific method
> >  */
> > struct page ** (*map_shared_pages)(unsigned long ref, int vm_id,
> >int nents, void **refs_info);
> >
> > /* unmap and free shared pages on importer's side via
> >  * hypervisor-specific method
> >  */
> > int (*unmap_shared_pages)(void **refs_info, int nents);
> >
> > /* initialize communication environment */
> > int (*init_comm_env)(void);
> >
> > void (*destroy_comm)(void);
> >
> > /* upstream ch setup (receiving and responding) */
> > int (*init_rx_ch)(int vm_id);
> >
> > /* downstream ch setup (transmitting and parsing responses) */
> > int (*init_tx_ch)(int vm_id);
> >
> > int (*send_req)(int vm_id, struct hyper_dmabuf_req *req, int wait);
> >};
> >
> >All of these can be mapped with any hypervisor specific implementation.
> >We designed backend implementation for Xen using grant-table, Xen event
> >and ring buffer communication. For ACRN, we have another backend using 
> >Virt-IO
> >for both memory sharing and communication.
> >
> >We tried to define this structure of backend to make it general enough (or
> >it can be even modified or extended to support more cases.) so that it can
> >fit to other hypervisor cases. Only requirements/expectation on the 
> >hypervisor
> >are page-level memory sharing and inter-domain communication, which I think
> >are standard features of modern hypervisor.
> >
> >And please review common UAPIs that hyper_dmabuf and xen-zcopy supports. They
> >are very general. One is getting FD (dmabuf) and get those shared. The other
> >is generating dmabuf from global handle (secure handle hiding gref behind 
> >it).
> >On top of this, hyper_dmabuf has "unshare" and "query" which are also useful
> >for any cases.
> >
> >So I don't know why we wouldn't want to try to make these 

Re: [Xen-devel] [PATCH for-4.11] x86: Use spec_ctrl_{enter, exit}_idle() in the S3/S5 path

2018-04-18 Thread Andrew Cooper
On 18/04/18 17:14, Jan Beulich wrote:
 On 18.04.18 at 18:02,  wrote:
>> On 18/04/18 16:43, Jan Beulich wrote:
>> On 18.04.18 at 12:43,  wrote:
 This avoids opencoding the functionality (and missing one bit of it), and 
 and
 some comments explaining what is going on.
>>> Missing which bit of it? The MSR writes aren't strictly necessary afaict, 
>>> and
>>> functionally clearing bti_ist_info is all that's needed for the entry path 
>>> of
>>> interest, while clearing use_shadow_spec_ctrl is all that's needed for the
>>> exit-to-Xen path. Hence I'm don't see (yet) what bug it is you think this
>>> fixes.
>> It is expected that we don't hand off control to the firmware in cases
>> like this with the mitigations in place.  Here, it doesn't actually
>> matter too much, but there is no point having the mitigations in effect
>> when we're the final CPU and trying to shut down.
> With an improved description the patch can certainly have my R-b; the
> code change itself is clearly fine with me.

How about:

The main purpose of this patch is to avoid opencoding the recovery logic
at the end, but also has the positive side effect of relaxing the
SPEC_CTRL mitigations when working to shut the final CPU down.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.11] x86: Use spec_ctrl_{enter, exit}_idle() in the S3/S5 path

2018-04-18 Thread Jan Beulich
>>> On 18.04.18 at 18:02,  wrote:
> On 18/04/18 16:43, Jan Beulich wrote:
> On 18.04.18 at 12:43,  wrote:
>>> This avoids opencoding the functionality (and missing one bit of it), and 
>>> and
>>> some comments explaining what is going on.
>> Missing which bit of it? The MSR writes aren't strictly necessary afaict, and
>> functionally clearing bti_ist_info is all that's needed for the entry path of
>> interest, while clearing use_shadow_spec_ctrl is all that's needed for the
>> exit-to-Xen path. Hence I'm don't see (yet) what bug it is you think this
>> fixes.
> 
> It is expected that we don't hand off control to the firmware in cases
> like this with the mitigations in place.  Here, it doesn't actually
> matter too much, but there is no point having the mitigations in effect
> when we're the final CPU and trying to shut down.

With an improved description the patch can certainly have my R-b; the
code change itself is clearly fine with me.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v8 1/9] x86/xpti: avoid copying L4 page table contents when possible

2018-04-18 Thread Jan Beulich
>>> On 18.04.18 at 10:30,  wrote:
> @@ -160,5 +161,20 @@ unsigned int flush_area_local(const void *va, unsigned 
> int flags)
>  
>  local_irq_restore(irqfl);
>  
> +if ( flags & FLUSH_ROOT_PGTBL )
> +get_cpu_info()->root_pgt_changed = true;
> +
>  return flags;
>  }
> +
> +void flush_root_pgt_mask(cpumask_t *mask)

const

> +{
> +int cpu;

unsigned int

> +struct cpu_info *cpu_info;

Declaration in most narrow possible scope please.

> +for_each_cpu(cpu, mask)
> +{
> +cpu_info = (struct cpu_info *)(stack_base[cpu] + STACK_SIZE) - 1;
> +cpu_info->root_pgt_changed = true;
> +}
> +}

So why is this not sending an IPI to trigger the FLUSH_ROOT_PGTBL processing
visible above? Especially the cast you need here looks quite undesirable. Plus I
think this is racy (even if quite unlikely to trigger) with a CPU being brought 
down:
What if stack_base[cpu] becomes NULL after you've found the respective bit in
mask set. Especially when Xen runs itself virtualized, (almost) arbitrarily long
periods of time may pass between any two instructions.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v8 3/9] xen/x86: support per-domain flag for xpti

2018-04-18 Thread Jan Beulich
>>> On 18.04.18 at 17:54,  wrote:
> On 18/04/18 17:45, Jan Beulich wrote:
> On 18.04.18 at 17:33,  wrote:
>>> On 18/04/18 17:29, Jan Beulich wrote:
>>> On 18.04.18 at 10:30,  wrote:
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -967,7 +967,7 @@ static int shadow_set_l4e(struct domain *d,
>  sh_put_ref(d, osl3mfn, paddr);
>  }
>  
> -if ( !cpu_has_no_xpti )
> +if ( is_pv_domain(d) && d->arch.pv_domain.xpti )
>  /*
>   * Lazy flushing is enough: either we do a TLB flush right 
> afterwards
>   * which will pick up the new root page table on all affected 
> cpus

 How come the is_pv_domain() is appearing only here?
>>>
>>> It is mandatory for testing the per-domain xpti flag. I could add it in
>>> patch 1 already if you like that better.
>> 
>> Well, if you added it earlier, some unnecessary IPIs would be suppressed
>> right away.
> 
> Which IPIs? There is no IPI involved here. We are just setting the flags
> from the current cpu for all cpus which need to picḱ it up. That's the
> reason for the comment regarding "lazy flushing".

Oh, I did assume flush_root_pgt_mask() would be a function that I had
seen in prior versions, invoking the processing of FLUSH_ROOT_PGTBL on
remote CPUs.

Clearly with the introduction of a new function you should have dropped
my R-b; I'll comment there.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.11] x86: Use spec_ctrl_{enter, exit}_idle() in the S3/S5 path

2018-04-18 Thread Andrew Cooper
On 18/04/18 16:43, Jan Beulich wrote:
 On 18.04.18 at 12:43,  wrote:
>> This avoids opencoding the functionality (and missing one bit of it), and and
>> some comments explaining what is going on.
> Missing which bit of it? The MSR writes aren't strictly necessary afaict, and
> functionally clearing bti_ist_info is all that's needed for the entry path of
> interest, while clearing use_shadow_spec_ctrl is all that's needed for the
> exit-to-Xen path. Hence I'm don't see (yet) what bug it is you think this
> fixes.
>
> Also s/and$/add/.

I hoped noone else would notice.  Already found and fixed.

It is expected that we don't hand off control to the firmware in cases
like this with the mitigations in place.  Here, it doesn't actually
matter too much, but there is no point having the mitigations in effect
when we're the final CPU and trying to shut down.

>
>> --- a/xen/arch/x86/acpi/power.c
>> +++ b/xen/arch/x86/acpi/power.c
>> @@ -213,7 +213,8 @@ static int enter_state(u32 state)
>>  error = 0;
>>  
>>  ci = get_cpu_info();
>> -ci->use_shadow_spec_ctrl = 0;
>> +spec_ctrl_enter_idle(ci);
>> +/* Avoid NMI/#MC using MSR_SPEC_CTRL until we've reloaded microcode. */
>>  ci->bti_ist_info = 0;
>>  
>>  ACPI_FLUSH_CPU_CACHE();
>> @@ -257,10 +258,9 @@ static int enter_state(u32 state)
>>  if ( !recheck_cpu_features(0) )
>>  panic("Missing previously available feature(s).");
>>  
>> +/* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */
>>  ci->bti_ist_info = default_bti_ist_info;
>> -asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
>> -  :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0)
>> -  : "memory");
>> +spec_ctrl_exit_idle(ci);
> The use of "idle" methods is a slight abuse here (fundamentally "idle" is
> about C states), but yes, perhaps in a way a system in S3/S5 can be
> viewed sort of (in a heavyweight way) idle.

That is a matter of interpretation of "idle" and the use here is fine. 
Either way, the behaviour WRT MSR_SPEC_CTRL is identical whether we are
talking about C states or S states, and the purpose of the change is to
not opencode the recovery logic.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5] x86/setup: properly update PTEs if src/dst overlaps when relocating Xen image

2018-04-18 Thread Jan Beulich
>>> On 18.04.18 at 12:26,  wrote:
> @@ -1019,6 +1020,12 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  bootsym(trampoline_xen_phys_start) = e;
>  
>  /*
> + * All PTEs with PFNs above pte_update_limit
> + * were updated earlier. Skip them.
> + */
> +pte_update_limit = PFN_DOWN(e + XEN_IMG_OFFSET);

I don't understand the comment: No PTE updates happen before this point
afaict. It is just that PTEs pointing above that address are not candidates
for relocation. I think the comment should at least mention the overlap
scenario your trying to deal with, with the important point being that there
may actually be PTEs pointing into [e, e + XEN_IMG_OFFSET).

The actual code adjustments look fine to me now, albeit I wonder whether
>= wouldn't be more appropriate to use.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v8 3/9] xen/x86: support per-domain flag for xpti

2018-04-18 Thread Jan Beulich
>>> On 18.04.18 at 17:33,  wrote:
> On 18/04/18 17:29, Jan Beulich wrote:
> On 18.04.18 at 10:30,  wrote:
>>> --- a/xen/arch/x86/mm/shadow/multi.c
>>> +++ b/xen/arch/x86/mm/shadow/multi.c
>>> @@ -967,7 +967,7 @@ static int shadow_set_l4e(struct domain *d,
>>>  sh_put_ref(d, osl3mfn, paddr);
>>>  }
>>>  
>>> -if ( !cpu_has_no_xpti )
>>> +if ( is_pv_domain(d) && d->arch.pv_domain.xpti )
>>>  /*
>>>   * Lazy flushing is enough: either we do a TLB flush right 
>>> afterwards
>>>   * which will pick up the new root page table on all affected cpus
>> 
>> How come the is_pv_domain() is appearing only here?
> 
> It is mandatory for testing the per-domain xpti flag. I could add it in
> patch 1 already if you like that better.

Well, if you added it earlier, some unnecessary IPIs would be suppressed
right away.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.11] x86: Use spec_ctrl_{enter, exit}_idle() in the S3/S5 path

2018-04-18 Thread Jan Beulich
>>> On 18.04.18 at 12:43,  wrote:
> This avoids opencoding the functionality (and missing one bit of it), and and
> some comments explaining what is going on.

Missing which bit of it? The MSR writes aren't strictly necessary afaict, and
functionally clearing bti_ist_info is all that's needed for the entry path of
interest, while clearing use_shadow_spec_ctrl is all that's needed for the
exit-to-Xen path. Hence I'm don't see (yet) what bug it is you think this
fixes.

Also s/and$/add/.

> --- a/xen/arch/x86/acpi/power.c
> +++ b/xen/arch/x86/acpi/power.c
> @@ -213,7 +213,8 @@ static int enter_state(u32 state)
>  error = 0;
>  
>  ci = get_cpu_info();
> -ci->use_shadow_spec_ctrl = 0;
> +spec_ctrl_enter_idle(ci);
> +/* Avoid NMI/#MC using MSR_SPEC_CTRL until we've reloaded microcode. */
>  ci->bti_ist_info = 0;
>  
>  ACPI_FLUSH_CPU_CACHE();
> @@ -257,10 +258,9 @@ static int enter_state(u32 state)
>  if ( !recheck_cpu_features(0) )
>  panic("Missing previously available feature(s).");
>  
> +/* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */
>  ci->bti_ist_info = default_bti_ist_info;
> -asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
> -  :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0)
> -  : "memory");
> +spec_ctrl_exit_idle(ci);

The use of "idle" methods is a slight abuse here (fundamentally "idle" is
about C states), but yes, perhaps in a way a system in S3/S5 can be
viewed sort of (in a heavyweight way) idle.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v8 9/9] xen/x86: use PCID feature

2018-04-18 Thread Juergen Gross
On 18/04/18 17:32, Jan Beulich wrote:
 On 18.04.18 at 11:37,  wrote:
>> On 18/04/18 11:13, Jan Beulich wrote:
>> On 18.04.18 at 10:30,  wrote:
 Avoid flushing the complete TLB when switching %cr3 for mitigation of
 Meltdown by using the PCID feature if available.

 We are using 4 PCID values for a 64 bit pv domain subject to XPTI and
 2 values for the non-XPTI case:

 - guest active and in kernel mode
 - guest active and in user mode
 - hypervisor active and guest in user mode (XPTI only)
 - hypervisor active and guest in kernel mode (XPTI only)

 We use PCID only if PCID _and_ INVPCID are supported. With PCID in use
 we disable global pages in cr4. A command line parameter controls in
 which cases PCID is being used.

 As the non-XPTI case has shown not to perform better with PCID at least
 on some machines the default is to use PCID only for domains subject to
 XPTI.

 With PCID enabled we always disable global pages. This avoids having to
 either flush the complete TLB or do a cycle through all PCID values
 when invalidating a single global page.

 Signed-off-by: Juergen Gross 
 Reviewed-by: Jan Beulich 
 ---
 V6.1:
 - address some minor comments (Jan Beulich)
>>>
>>> No v7 changes? Afaict ...
>>>
 @@ -717,7 +718,7 @@ int __init dom0_construct_pv(struct domain *d,
  update_cr3(v);
  
  /* We run on dom0's page tables for the final part of the build 
>> process. */
 -switch_cr3_cr4(v->arch.cr3, read_cr4());
 +switch_cr3_cr4(cr3_pa(v->arch.cr3), read_cr4());
>>>
>>> ... at least this was added. And to be honest I'm not convinced cr3_pa() is
>>> the right construct to use here: It's neither clear why the other bits don't
>>> matter at this point, nor why any of the bits that you mask out this way
>>> need masking out in the first place (e.g. why, in the crash case that Andrew
>>> had observed, the noflush bit was wrongly set).
>>
>> At this point in time we only want to use another cr3 value. So PCIDs
>> should _not_ be used as this would require adapting cr4, resulting in
>> writing the cr3_pa() bits only. It would have been possible to use
>> write_cr3() here, but only together with open coding a TLB flush in
>> order to avoid any global pages remaining in the TLB. So I decided to
>> use switch_cr3_cr4().
>>
>> The noflush bit was set as dom0 is subject to XPTI and PCID usage and
>> update_cr3() is updating v->arch.cr3 accordingly.
>>
>> I know you are not fond of setting noflush in v->arch.cr3. The only
>> sensible alternative to that would be adding something like
>> v->arch.cr3_pcid_bits being or-ed to the cr3 value in restore_all_guest.
>> This would let us get rid of having to use cr3_pa() in some places
>> while adding a (very little) performance penalty.
> 
> Well, I had accepted the current approach as the (performance wise)
> better alternative, but with Andrew sharing that original concern of mine,
> and with there having been a first real problem resulting from that
> approach, more seriously considering the alternative certainly seems
> necessary. Andrew?

I'm fine with both variants. I just need to know which one will be
accepted. :-)


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v8 9/9] xen/x86: use PCID feature

2018-04-18 Thread Jan Beulich
>>> On 18.04.18 at 11:37,  wrote:
> On 18/04/18 11:13, Jan Beulich wrote:
> On 18.04.18 at 10:30,  wrote:
>>> Avoid flushing the complete TLB when switching %cr3 for mitigation of
>>> Meltdown by using the PCID feature if available.
>>>
>>> We are using 4 PCID values for a 64 bit pv domain subject to XPTI and
>>> 2 values for the non-XPTI case:
>>>
>>> - guest active and in kernel mode
>>> - guest active and in user mode
>>> - hypervisor active and guest in user mode (XPTI only)
>>> - hypervisor active and guest in kernel mode (XPTI only)
>>>
>>> We use PCID only if PCID _and_ INVPCID are supported. With PCID in use
>>> we disable global pages in cr4. A command line parameter controls in
>>> which cases PCID is being used.
>>>
>>> As the non-XPTI case has shown not to perform better with PCID at least
>>> on some machines the default is to use PCID only for domains subject to
>>> XPTI.
>>>
>>> With PCID enabled we always disable global pages. This avoids having to
>>> either flush the complete TLB or do a cycle through all PCID values
>>> when invalidating a single global page.
>>>
>>> Signed-off-by: Juergen Gross 
>>> Reviewed-by: Jan Beulich 
>>> ---
>>> V6.1:
>>> - address some minor comments (Jan Beulich)
>> 
>> No v7 changes? Afaict ...
>> 
>>> @@ -717,7 +718,7 @@ int __init dom0_construct_pv(struct domain *d,
>>>  update_cr3(v);
>>>  
>>>  /* We run on dom0's page tables for the final part of the build 
> process. */
>>> -switch_cr3_cr4(v->arch.cr3, read_cr4());
>>> +switch_cr3_cr4(cr3_pa(v->arch.cr3), read_cr4());
>> 
>> ... at least this was added. And to be honest I'm not convinced cr3_pa() is
>> the right construct to use here: It's neither clear why the other bits don't
>> matter at this point, nor why any of the bits that you mask out this way
>> need masking out in the first place (e.g. why, in the crash case that Andrew
>> had observed, the noflush bit was wrongly set).
> 
> At this point in time we only want to use another cr3 value. So PCIDs
> should _not_ be used as this would require adapting cr4, resulting in
> writing the cr3_pa() bits only. It would have been possible to use
> write_cr3() here, but only together with open coding a TLB flush in
> order to avoid any global pages remaining in the TLB. So I decided to
> use switch_cr3_cr4().
> 
> The noflush bit was set as dom0 is subject to XPTI and PCID usage and
> update_cr3() is updating v->arch.cr3 accordingly.
> 
> I know you are not fond of setting noflush in v->arch.cr3. The only
> sensible alternative to that would be adding something like
> v->arch.cr3_pcid_bits being or-ed to the cr3 value in restore_all_guest.
> This would let us get rid of having to use cr3_pa() in some places
> while adding a (very little) performance penalty.

Well, I had accepted the current approach as the (performance wise)
better alternative, but with Andrew sharing that original concern of mine,
and with there having been a first real problem resulting from that
approach, more seriously considering the alternative certainly seems
necessary. Andrew?

>> I hope there aren't any other such hidden changes in this version of the
>> series.
> 
> Oh sorry, I missed to update the history.
> 
> I'm not aware of other changes in v8 (apart from patch 1 and the
> resulting needed change in patch 3).

Ah - I hadn't spotted that follow-on change, and I've just sent a small
comment in its regard.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 0/5] ALSA: xen-front: Add Xen para-virtualized frontend driver

2018-04-18 Thread Oleksandr Andrushchenko

On 04/16/2018 09:24 AM, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

Please note: this patch series depends on [3].

The dependency is now merged into Xen kernel tree [4] for-linus-4.17


This patch series adds support for Xen [1] para-virtualized
sound frontend driver. It implements the protocol from
include/xen/interface/io/sndif.h with the following limitations:
- mute/unmute is not supported
- get/set volume is not supported
Volume control is not supported for the reason that most of the
use-cases (at the moment) are based on scenarious where
unprivileged OS (e.g. Android, AGL etc) use software mixers.

Both capture and playback are supported.

Corresponding backend, implemented as a user-space application, can be
found at [2].

Thank you,
Oleksandr

Changes since v1:
*

1. Moved driver from sound/drivers to sound/xen

2. Coding style changes to better meet Linux Kernel

3. Added explicit back and front synchronization
In order to provide explicit synchronization between backend and
frontend the following changes are introduced in the protocol:
 - add new ring buffer for sending asynchronous events from
   backend to frontend to report number of bytes played by the
   frontend (XENSND_EVT_CUR_POS)
 - introduce trigger events for playback control: start/stop/pause/resume
 - add "req-" prefix to event-channel and ring-ref to unify naming
   of the Xen event channels for requests and events

4. Added explicit back and front parameter negotiation
In order to provide explicit stream parameter negotiation between
backend and frontend the following changes are introduced in the protocol:
add XENSND_OP_HW_PARAM_QUERY request to read/update
configuration space for the parameters given: request passes
desired parameter's intervals/masks and the response to this request
returns allowed min/max intervals/masks to be used.

[1] https://xenproject.org/
[2] https://github.com/xen-troops/snd_be
[3] https://lkml.org/lkml/2018/4/12/522

Oleksandr Andrushchenko (5):
   ALSA: xen-front: Introduce Xen para-virtualized sound frontend driver
   ALSA: xen-front: Read sound driver configuration from Xen store
   ALSA: xen-front: Implement Xen event channel handling
   ALSA: xen-front: Implement handling of shared buffers
   ALSA: xen-front: Implement ALSA virtual sound driver

  sound/Kconfig |   2 +
  sound/Makefile|   2 +-
  sound/xen/Kconfig |  10 +
  sound/xen/Makefile|   9 +
  sound/xen/xen_snd_front.c | 410 +++
  sound/xen/xen_snd_front.h |  57 +++
  sound/xen/xen_snd_front_alsa.c| 830 ++
  sound/xen/xen_snd_front_alsa.h|  23 ++
  sound/xen/xen_snd_front_cfg.c | 517 
  sound/xen/xen_snd_front_cfg.h |  46 +++
  sound/xen/xen_snd_front_evtchnl.c | 478 ++
  sound/xen/xen_snd_front_evtchnl.h |  92 +
  sound/xen/xen_snd_front_shbuf.c   | 193 +
  sound/xen/xen_snd_front_shbuf.h   |  36 ++
  14 files changed, 2704 insertions(+), 1 deletion(-)
  create mode 100644 sound/xen/Kconfig
  create mode 100644 sound/xen/Makefile
  create mode 100644 sound/xen/xen_snd_front.c
  create mode 100644 sound/xen/xen_snd_front.h
  create mode 100644 sound/xen/xen_snd_front_alsa.c
  create mode 100644 sound/xen/xen_snd_front_alsa.h
  create mode 100644 sound/xen/xen_snd_front_cfg.c
  create mode 100644 sound/xen/xen_snd_front_cfg.h
  create mode 100644 sound/xen/xen_snd_front_evtchnl.c
  create mode 100644 sound/xen/xen_snd_front_evtchnl.h
  create mode 100644 sound/xen/xen_snd_front_shbuf.c
  create mode 100644 sound/xen/xen_snd_front_shbuf.h

[4] 
https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git/commit/?h=for-linus-4.17=cd6e992b3aab072cc90839508aaf5573c8f7e066


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] Input: xen-kbdfront - allow better run-time configuration

2018-04-18 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

It is now only possible to control if multi-touch virtual device
is created or not (via the corresponding XenStore entries),
but keyboard and pointer devices are always created.
In some cases this is not desirable. For example, if virtual
keyboard device is exposed to Android then the latter won't
automatically show on-screen keyboard as it expects that a
physical keyboard device can be used for typing.

Make it possible to configure which virtual devices are created
with module parameters:
  - no_ptr_dev=1 if no pointer device needs to be created
  - no_kbd_dev=1 if no keyboard device needs to be created
Keep old behavior by default.

Signed-off-by: Oleksandr Andrushchenko 
Suggested-by: Andrii Chepurnyi 
Tested-by: Andrii Chepurnyi 
---
 drivers/input/misc/xen-kbdfront.c | 159 +-
 1 file changed, 92 insertions(+), 67 deletions(-)

diff --git a/drivers/input/misc/xen-kbdfront.c 
b/drivers/input/misc/xen-kbdfront.c
index d91f3b1c5375..a3306aad40b0 100644
--- a/drivers/input/misc/xen-kbdfront.c
+++ b/drivers/input/misc/xen-kbdfront.c
@@ -51,6 +51,16 @@ module_param_array(ptr_size, int, NULL, 0444);
 MODULE_PARM_DESC(ptr_size,
"Pointing device width, height in pixels (default 800,600)");
 
+static unsigned int no_ptr_dev;
+module_param(no_ptr_dev, uint, 0);
+MODULE_PARM_DESC(no_ptr_dev,
+   "If set then no virtual pointing device exposed to the guest");
+
+static unsigned int no_kbd_dev;
+module_param(no_kbd_dev, uint, 0);
+MODULE_PARM_DESC(no_kbd_dev,
+   "If set then no virtual keyboard device exposed to the guest");
+
 static int xenkbd_remove(struct xenbus_device *);
 static int xenkbd_connect_backend(struct xenbus_device *, struct xenkbd_info 
*);
 static void xenkbd_disconnect_backend(struct xenkbd_info *);
@@ -63,6 +73,9 @@ static void xenkbd_disconnect_backend(struct xenkbd_info *);
 static void xenkbd_handle_motion_event(struct xenkbd_info *info,
   struct xenkbd_motion *motion)
 {
+   if (unlikely(!info->ptr))
+   return;
+
input_report_rel(info->ptr, REL_X, motion->rel_x);
input_report_rel(info->ptr, REL_Y, motion->rel_y);
if (motion->rel_z)
@@ -73,6 +86,9 @@ static void xenkbd_handle_motion_event(struct xenkbd_info 
*info,
 static void xenkbd_handle_position_event(struct xenkbd_info *info,
 struct xenkbd_position *pos)
 {
+   if (unlikely(!info->ptr))
+   return;
+
input_report_abs(info->ptr, ABS_X, pos->abs_x);
input_report_abs(info->ptr, ABS_Y, pos->abs_y);
if (pos->rel_z)
@@ -97,6 +113,9 @@ static void xenkbd_handle_key_event(struct xenkbd_info *info,
return;
}
 
+   if (unlikely(!dev))
+   return;
+
input_event(dev, EV_KEY, key->keycode, value);
input_sync(dev);
 }
@@ -192,7 +211,7 @@ static int xenkbd_probe(struct xenbus_device *dev,
  const struct xenbus_device_id *id)
 {
int ret, i;
-   unsigned int abs, touch;
+   unsigned int touch;
struct xenkbd_info *info;
struct input_dev *kbd, *ptr, *mtouch;
 
@@ -211,24 +230,6 @@ static int xenkbd_probe(struct xenbus_device *dev,
if (!info->page)
goto error_nomem;
 
-   /* Set input abs params to match backend screen res */
-   abs = xenbus_read_unsigned(dev->otherend,
-  XENKBD_FIELD_FEAT_ABS_POINTER, 0);
-   ptr_size[KPARAM_X] = xenbus_read_unsigned(dev->otherend,
- XENKBD_FIELD_WIDTH,
- ptr_size[KPARAM_X]);
-   ptr_size[KPARAM_Y] = xenbus_read_unsigned(dev->otherend,
- XENKBD_FIELD_HEIGHT,
- ptr_size[KPARAM_Y]);
-   if (abs) {
-   ret = xenbus_write(XBT_NIL, dev->nodename,
-  XENKBD_FIELD_REQ_ABS_POINTER, "1");
-   if (ret) {
-   pr_warn("xenkbd: can't request abs-pointer\n");
-   abs = 0;
-   }
-   }
-
touch = xenbus_read_unsigned(dev->nodename,
 XENKBD_FIELD_FEAT_MTOUCH, 0);
if (touch) {
@@ -241,60 +242,84 @@ static int xenkbd_probe(struct xenbus_device *dev,
}
 
/* keyboard */
-   kbd = input_allocate_device();
-   if (!kbd)
-   goto error_nomem;
-   kbd->name = "Xen Virtual Keyboard";
-   kbd->phys = info->phys;
-   kbd->id.bustype = BUS_PCI;
-   kbd->id.vendor = 0x5853;
-   kbd->id.product = 0x;
-
-   __set_bit(EV_KEY, kbd->evbit);
-   for (i = KEY_ESC; i < KEY_UNKNOWN; i++)
-  

[Xen-devel] [PATCH V2] x86/p2m: fixed p2m_change_type_range() start / end check

2018-04-18 Thread Razvan Cojocaru
p2m_change_type_range() handles end > max_mapped_pfn, but not
start > max_mapped_pfn. Check the latter just after grabbing the
lock and bail if true.

Signed-off-by: Razvan Cojocaru 
Suggested-by: George Dunlap 

---
Changes since V1:
 - Added ASSERT()s.
 - Wrapped the new condition in an unlikely().
---
 xen/arch/x86/mm/p2m.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c53cab4..e09b256 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -978,8 +978,19 @@ void p2m_change_type_range(struct domain *d,
 p2m_lock(p2m);
 p2m->defer_nested_flush = 1;
 
+ASSERT(start < end);
+
+if ( unlikely(start > p2m->max_mapped_pfn) )
+{
+ASSERT(!p2m_is_hostp2m(p2m));
+p2m_unlock(p2m);
+return;
+}
+
 if ( unlikely(end > p2m->max_mapped_pfn) )
 {
+ASSERT(end == ~0UL || !p2m_is_hostp2m(p2m));
+
 if ( !gfn )
 {
 p2m->change_entry_type_global(p2m, ot, nt);
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Oleksandr Andrushchenko

On 04/18/2018 01:55 PM, Roger Pau Monné wrote:

On Wed, Apr 18, 2018 at 01:39:35PM +0300, Oleksandr Andrushchenko wrote:

On 04/18/2018 01:18 PM, Paul Durrant wrote:

-Original Message-
From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
Of Roger Pau Monné
Sent: 18 April 2018 11:11
To: Oleksandr Andrushchenko 
Cc: jgr...@suse.com; Artem Mygaiev ;
Dongwon Kim ; airl...@linux.ie;
oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri-
de...@lists.freedesktop.org; Potrola, MateuszX
; xen-devel@lists.xenproject.org;
daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper

Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
helper DRM driver

On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko
wrote:

On 04/18/2018 10:35 AM, Roger Pau Monné wrote:

After speaking with Oleksandr on IRC, I think the main usage of the
gntdev extension is to:

1. Create a dma-buf from a set of grant references.
2. Share dma-buf and get a list of grant references.

I think this set of operations could be broken into:

1.1 Map grant references into user-space using the gntdev.
1.2 Create a dma-buf out of a set of user-space virtual addresses.

2.1 Map a dma-buf into user-space.
2.2 Get grefs out of the user-space addresses where the dma-buf is
  mapped.

So it seems like what's actually missing is a way to:

   - Create a dma-buf from a list of user-space virtual addresses.
   - Allow to map a dma-buf into user-space, so it can then be used with
 the gntdev.

I think this is generic enough that it could be implemented by a
device not tied to Xen. AFAICT the hyper_dma guys also wanted
something similar to this.

Ok, so just to summarize, xen-zcopy/hyper-dmabuf as they are now,
are no go from your POV?

My opinion is that there seems to be a more generic way to implement
this, and thus I would prefer that one.


Instead, we have to make all that fancy stuff
with VAs <-> device-X and have that device-X driver live out of drivers/xen
as it is not a Xen specific driver?

That would be my preference if feasible, simply because it can be
reused by other use-cases that need to create dma-bufs in user-space.

There is a use-case I have: a display unit on my target has a DMA
controller which can't do scatter-gather, e.g. it only expects a
single starting address of the buffer.
In order to create a dma-buf from grefs in this case
I allocate memory with dma_alloc_xxx and then balloon pages of the
buffer and finally map grefs onto this DMA buffer.
This way I can give this shared buffer to the display unit as its bus
addresses are contiguous.

With the proposed solution (gntdev + device-X) I won't be able to 
achieve this,

as I have no control over from where gntdev/balloon drivers get the pages
(even more, those can easily be out of DMA address space of the display 
unit).


Thus, even if implemented, I can't use this approach.


In any case I just knew about dma-bufs this morning, there might be
things that I'm missing.

Roger.



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory

2018-04-18 Thread Jan Beulich
>>> On 18.04.18 at 13:39,  wrote:
> On Wed, Apr 18, 2018 at 02:53:03AM -0600, Jan Beulich wrote:
> On 06.12.17 at 08:50,  wrote:
>>> Each vcpu of hvm guest consumes at least one shadow page. Currently, only 
> 256
>>> (for hap case) pages are pre-allocated as shadow memory at beginning. It 
> would
>>> run out if guest has more than 256 vcpus and guest creation fails. Bump the
>>> number of shadow pages to 2 * HVM_MAX_VCPUS for hap case and 8 * 
> HVM_MAX_VCPUS
>>> for shadow case.
>>> 
>>> This patch won't lead to more memory consumption for the size of shadow 
>>> memory
>>> will be adjusted via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION according to the 
>>> size
>>> of guest memory and the number of vcpus.
>>
>>I don't understand this: What's the purpose of bumping the values if it won't 
>>lead
>>to higher memory consumption? Afaict there's be higher consumption at least
>>transiently. And I don't see why this would need doing independent of the 
>>intended
>>vCPU count in the guest. I guess you want to base your series on top on 
>>Andrew's
>>max-vCPU-s adjustments (which sadly didn't become ready in time for 4.11).
> 
> The situation here is some pages are pre-allocated as P2M page for domain
> initialization. After vCPU creation, the total number of P2M page are
> adjusted by the domctl interface. Before vCPU creation, this domctl
> is unusable for the check in paging_domctl():
>  if ( unlikely(d->vcpu == NULL) || unlikely(d->vcpu[0] == NULL) )

Hence my reference to Andrew's series.

> When the number of a guest's vCPU is small, the pre-allocated are
> enough. But it won't be if the number of vCPU is bigger than 256. Each
> vCPU will at least use one P2M page when it is created, seeing
> contruct_vmcs()->hap_update_paging_modes().

Hmm, that might be a problem perhaps to be addressed in Andrew's series
then, as the implication would be that the amount of shadow/p2m memory
also needs to be set right by XEN_DOMCTL_createdomain (which iirc the
series doesn't do so far).

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory

2018-04-18 Thread Andrew Cooper
On 18/04/18 12:39, Chao Gao wrote:
> On Wed, Apr 18, 2018 at 02:53:03AM -0600, Jan Beulich wrote:
> On 06.12.17 at 08:50,  wrote:
>>> Each vcpu of hvm guest consumes at least one shadow page. Currently, only 
>>> 256
>>> (for hap case) pages are pre-allocated as shadow memory at beginning. It 
>>> would
>>> run out if guest has more than 256 vcpus and guest creation fails. Bump the
>>> number of shadow pages to 2 * HVM_MAX_VCPUS for hap case and 8 * 
>>> HVM_MAX_VCPUS
>>> for shadow case.
>>>
>>> This patch won't lead to more memory consumption for the size of shadow 
>>> memory
>>> will be adjusted via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION according to the 
>>> size
>>> of guest memory and the number of vcpus.
>> I don't understand this: What's the purpose of bumping the values if it 
>> won't lead
>> to higher memory consumption? Afaict there's be higher consumption at least
>> transiently. And I don't see why this would need doing independent of the 
>> intended
>> vCPU count in the guest. I guess you want to base your series on top on 
>> Andrew's
>> max-vCPU-s adjustments (which sadly didn't become ready in time for 4.11).
> The situation here is some pages are pre-allocated as P2M page for domain
> initialization. After vCPU creation, the total number of P2M page are
> adjusted by the domctl interface. Before vCPU creation, this domctl
> is unusable for the check in paging_domctl():
>  if ( unlikely(d->vcpu == NULL) || unlikely(d->vcpu[0] == NULL) )
>
> When the number of a guest's vCPU is small, the pre-allocated are
> enough. But it won't be if the number of vCPU is bigger than 256. Each
> vCPU will at least use one P2M page when it is created, seeing
> contruct_vmcs()->hap_update_paging_modes().

I've also found with XTF tests that the current minimum shadow
calculations are insufficient for single-vcpu domains with 128M of ram.

I haven't had time fix the algorithm, and XTF is using "shadow_memory=4"
as a bodge workaround.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory

2018-04-18 Thread Chao Gao
On Wed, Apr 18, 2018 at 02:53:03AM -0600, Jan Beulich wrote:
 On 06.12.17 at 08:50,  wrote:
>> Each vcpu of hvm guest consumes at least one shadow page. Currently, only 256
>> (for hap case) pages are pre-allocated as shadow memory at beginning. It 
>> would
>> run out if guest has more than 256 vcpus and guest creation fails. Bump the
>> number of shadow pages to 2 * HVM_MAX_VCPUS for hap case and 8 * 
>> HVM_MAX_VCPUS
>> for shadow case.
>> 
>> This patch won't lead to more memory consumption for the size of shadow 
>> memory
>> will be adjusted via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION according to the 
>> size
>> of guest memory and the number of vcpus.
>
>I don't understand this: What's the purpose of bumping the values if it won't 
>lead
>to higher memory consumption? Afaict there's be higher consumption at least
>transiently. And I don't see why this would need doing independent of the 
>intended
>vCPU count in the guest. I guess you want to base your series on top on 
>Andrew's
>max-vCPU-s adjustments (which sadly didn't become ready in time for 4.11).

The situation here is some pages are pre-allocated as P2M page for domain
initialization. After vCPU creation, the total number of P2M page are
adjusted by the domctl interface. Before vCPU creation, this domctl
is unusable for the check in paging_domctl():
 if ( unlikely(d->vcpu == NULL) || unlikely(d->vcpu[0] == NULL) )

When the number of a guest's vCPU is small, the pre-allocated are
enough. But it won't be if the number of vCPU is bigger than 256. Each
vCPU will at least use one P2M page when it is created, seeing
contruct_vmcs()->hap_update_paging_modes().

Thanks
Chao

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mktarball: For qemu upstream, use their scripts/archive-source.sh

2018-04-18 Thread Anthony PERARD
On Tue, Apr 17, 2018 at 06:04:37PM +0100, Ian Jackson wrote:
> diff --git a/tools/misc/mktarball b/tools/misc/mktarball
> index 73282b5..42d5430 100755
> --- a/tools/misc/mktarball
> +++ b/tools/misc/mktarball
> @@ -29,7 +29,21 @@ mkdir -p $tdir
>  
>  git_archive_into $xen_root $tdir/xen-$desc
>  
> -git_archive_into $xen_root/tools/qemu-xen-dir-remote 
> $tdir/xen-$desc/tools/qemu-xen
> +# We can't use git_archive_into with qemu upstream because it uses
> +# git-submodules.  git-submodules are an inherently broken git feature
> +# which should never be used in any circumstance.  Unfortunately, qemu

I don't think xen.git is the right place to bash the use of git
submodule. Especially when there is no way, in xen.git, to find out
ahead of time which git trees the xen build system is going to want to
download.

> +# upstream uses them.  Relevantly for us, git archive does not work
> +# properly when there are submodules.
> +(
> +cd $xen_root/tools/qemu-xen-dir-remote
> +# if it's not clean, the qemu script will call `git stash' !

It's actually `git stash create`. The script doesn't modify the source
worktree. The script seems to be intended to make a tarball of the
worktree (so also things not commited yet), and we use it make a tarball
of a commit.

There is scripts/make-release that is simpler (called via `make dist`),
but it probably stuff the tarball with more than we need (roms).

> +git --no-pager diff --stat HEAD
> +scripts/archive-source.sh $tdir/xen-$desc/tools/qemu-xen.tar
> +cd $tdir/xen-$desc/tools
> +mkdir qemu-xen
> +tar  +rm qemu-xen.tar
> +)

-- 
Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast

2018-04-18 Thread Chao Gao
On Wed, Apr 18, 2018 at 02:38:48AM -0600, Jan Beulich wrote:
 On 06.12.17 at 08:50,  wrote:
>> Intel SDM Extended XAPIC (X2APIC) -> "Initialization by System Software"
>> has the following description:
>> 
>> "The ACPI interfaces for the x2APIC are described in Section 5.2, “ACPI 
>> System
>> Description Tables,” of the Advanced Configuration and Power Interface
>> Specification, Revision 4.0a (http://www.acpi.info/spec.htm). The default
>> behavior for BIOS is to pass the control to the operating system with the
>> local x2APICs in xAPIC mode if all APIC IDs reported by CPUID.0BH:EDX are 
>> less
>> than 255, and in x2APIC mode if there are any logical processor reporting an
>> APIC ID of 255 or greater."
>
>With this you mean ...
>
>> In this patch, hvmloader enables x2apic mode for all vcpus if there are cpus
>> with APIC ID > 255. To wake up processors whose APIC ID is greater than 255,
>
>">= 255" and "greater than 254" here respectively. Please be precise.

Will do.

>
>> the SIPI is broadcasted to all APs. It is the way how Seabios wakes up APs.
>> APs may compete for the stack, thus a lock is introduced to protect the 
>> stack.
>
>
>
>> --- a/tools/firmware/hvmloader/smp.c
>> +++ b/tools/firmware/hvmloader/smp.c
>> @@ -26,7 +26,9 @@
>>  #define AP_BOOT_EIP 0x1000
>>  extern char ap_boot_start[], ap_boot_end[];
>>  
>> -static int ap_callin, ap_cpuid;
>> +static int ap_callin;
>> +static int enable_x2apic;
>> +static bool lock = 1;
>>  
>>  asm (
>>  ".text   \n"
>> @@ -47,7 +49,15 @@ asm (
>>  "mov   %eax,%ds  \n"
>>  "mov   %eax,%es  \n"
>>  "mov   %eax,%ss  \n"
>> -"movl  $stack_top,%esp   \n"
>> +"3:  movb  $1, %bl   \n"
>> +"mov   $lock,%edx\n"
>> +"movzbl %bl,%eax \n"
>> +"xchg  %al, (%edx)   \n"
>> +"test  %al,%al   \n"
>> +"je2f\n"
>> +"pause   \n"
>> +"jmp   3b\n"
>> +"2:  movl  $stack_top,%esp   \n"
>
>Please be consistent with suffixes: Either add them only when really needed
>(preferred) or add them uniformly everywhere (when permitted of course).
>I also don't understand why you need to use %bl here at all.

will remove %bl stuff.

>
>> @@ -68,14 +78,34 @@ asm (
>>  ".text   \n"
>>  );
>>  
>> +unsigned int ap_cpuid(void)
>
>static
>
>> +{
>> +if ( !(rdmsr(MSR_IA32_APICBASE) & MSR_IA32_APICBASE_EXTD) )
>> +{
>> +uint32_t eax, ebx, ecx, edx;
>> +
>> +cpuid(1, , , , );
>> +return ebx >> 24;
>> +}
>> +else
>
>pointless "else"
>
>> +return rdmsr(MSR_IA32_APICBASE_MSR + (APIC_ID >> 4));
>> +}
>> +
>>  void ap_start(void); /* non-static avoids unused-function compiler warning 
>> */
>>  /*static*/ void ap_start(void)
>>  {
>> -printf(" - CPU%d ... ", ap_cpuid);
>> +printf(" - CPU%d ... ", ap_cpuid());
>>  cacheattr_init();
>>  printf("done.\n");
>>  wmb();
>> -ap_callin = 1;
>> +ap_callin++;
>> +
>> +if ( enable_x2apic )
>> +wrmsr(MSR_IA32_APICBASE, rdmsr(MSR_IA32_APICBASE) |
>> + MSR_IA32_APICBASE_EXTD);
>> +
>> +/* Release the lock */
>> +asm volatile ( "xchgb %1, %b0" : : "m" (lock), "r" (0) : "memory" );
>>  }
>
>How can you release the lock here - you've not switched off the stack, and
>you're about to actually use it (for returning from the function)?

"2:  movl  $stack_top,%esp   \n"
"movl  %esp,%ebp \n"
"call  ap_start  \n"
"1:  hlt \n"
"jmp  1b \n"

Yes. I think it would be right to release the lock following the
"call ap_start" here.
>
>> @@ -125,9 +169,15 @@ void smp_initialise(void)
>>  memcpy((void *)AP_BOOT_EIP, ap_boot_start, ap_boot_end - ap_boot_start);
>>  
>>  printf("Multiprocessor initialisation:\n");
>> +if ( nr_cpus > MADT_MAX_LOCAL_APIC )
>> +enable_x2apic = 1;
>> +
>>  ap_start();
>> -for ( i = 1; i < nr_cpus; i++ )
>> -boot_cpu(i);
>> +if ( nr_cpus > MADT_MAX_LOCAL_APIC )
>
>Where does MADT_MAX_LOCAL_APIC come from? I can't find it anywhere, and
>hence can't judge (along the lines of my remark on the description) whether 
>this
>is a correct comparison.

It is defined in patch 5/8. I know it is a mistake.

With regard to this point, I only boot up vCPU-s via broadcasting
INIT-STARTUP signal when the number of vCPU-s is greater than a
given value, otherwise the previous way will be used.

In general, before all APs are up, BSP couldn't know each AP's APIC ID
(currently, we know that because APIC ID can be inferred from vcpu_id) and
whether there is a vCPU with APIC_ID >= 255. I plan to discard the
old way and always use broadcast to boot up APs. And we don't 

Re: [Xen-devel] [PATCH] x86/p2m: fixed p2m_change_type_range() start / end check

2018-04-18 Thread George Dunlap
On 04/18/2018 11:53 AM, Jan Beulich wrote:
 On 18.04.18 at 12:23,  wrote:
>> That is, sensible input looks like:
>> * start < end
>> * For hostp2ms:
>>  - start <= max_mapped_pfn
>>  - either end <= max_mapped_pfn, or end == ~0UL
>> (But for altp2ms, start or end > max_mapped_pfn is fine.)
> 
> So I guess that's part of the difficulty I'm having with the original 
> suggestion:
> Why would such input be okay for alternative p2m-s?

Because altp2ms use max_mapped_pfn to limit what gets propagated to
individual altp2m views when the host p2m is updated.  See
p2m.c:p2m_altp2m_propagate_change().

The motivation for this patch can be found in
marc.info/?i=<8fb5e427-a0e6-79bd-3bae-5fcb6aba4...@bitdefender.com> .
In summary:

- Razvan had started keeping more altp2m stuff 'in sync' with the
hostp2m, including max_mapped_pfn

- I suggested that was unnecessary, and would reverse the 'filtering'
effect that had obviously been put there intentionally

- Razvan posted the above stack trace in which not setting
max_mapped_pfn triggered an ASSERT

- I suggested "fixing" p2m_change_type_range() to handle start >
max_mapped_pfn instead

You could argue that this 'filtering' is unnecessary, but at the moment
that's the way things work; whether it should be changed is a different
discussion.  You could also argue that we should be doing the filtering
before calling p2m_change_type_range(), but it seems to me that this is
the right level to do the filtering.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 6/7] xen/arm: Setup virtual paging for secondary CPUs in non-boot scenario

2018-04-18 Thread Julien Grall



On 18/04/18 11:45, Mirela Simonovic wrote:

On Tue, Apr 17, 2018 at 4:11 PM, Julien Grall  wrote:



On 17/04/18 13:54, Mirela Simonovic wrote:


Hi Julien,



Hi,



On Wed, Apr 11, 2018 at 5:11 PM, Julien Grall 
wrote:


Hi,

On 11/04/18 14:19, Mirela Simonovic wrote:



In existing code the paging for secondary CPUs is setup only in boot
flow.
The setup is triggered from start_xen function after all CPUs are
brought
online. In other words, the initialization of VTCR_EL2 register is done
out of the cpu_up/start_secondary control flow. However, the cpu_up flow
should be self-contained - it should fully initialize a secondary CPU,
because the cpu_up is used not only to bring a secondary CPU online on
boot, but also to hotplug a CPU during the system resume.
With this patch the setting of paging is triggered from start_secondary
function if the current system state is not boot. This way, the paging
will be setup in non-boot scenarios, while the setup in boot scenario
remains unchanged.




I am afraid that this is not correct. You can't assume that value chosen
for
VTCR by Xen at boot will fit this new CPU. So you have to check it is
fine
or park the CPU if there are any issue.



This is not a new CPU. This CPU already went through its boot sequence
and it reached the resume point because it does fit the value chosen
for VTCR by Xen.
If it wouldn't fit the chosen value for VTCR it would be parked so it
wouldn't participate in suspend/resume. Please let me know if I
misunderstood your comment.



This is not a new CPU for your use case. However your commit message
speak about "non-boot" CPU bring-up. So for me this is more than
suspend/resume, it is about bringing-up CPU at any time.

As those CPUs can't participate to the decision (it is too late), you
need to make sure the VTCR will fit on that CPU.



AFAIU the value chosen by Xen for VTCR config has to be common for all
online CPUs. Since this value is also used in the resume path I
suggest to make global (static in the p2m.c) the 'val' variable which
is currently local in setup_virt_paging() and passed as argument to
setup_virt_paging_one(). Then setup_virt_paging_one() would not
receive an argument.
I need to access this value on resume, so I would call
setup_virt_paging_one() without argument from start_secondary() if the
system state is not boot.
This seems to me a bit cleaner compared to what I submitted in this
patch, but fundamentally the functionality is the same.



You don't need to introduce a static variable it. I believe you can
re-create it based on the information we already have in global
variables. So what I would do is moving the creation of vtcr value in
that function.


Using this approach each CPU will need to recalculate the value which
is already known prior to executing the function.
I believe this is sub-optimal and contrary to existing implementation
where only one CPU performs the calculation.


The implementation usually evolves with the requirements. In the 
existing implementation, the value could be calculated on a single pCPU 
and then scratched afterwards.


Is there any benefit of recalculating the value? > Is there any disadvantage of remembering the value into a static 

variable?

I know that it is only a 32-bit value, but I would rather avoid 
spreading static variable when a value can be recompute in a few steps 
with what we have.


Stefano do you have any opinions?

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Roger Pau Monné
On Wed, Apr 18, 2018 at 01:39:35PM +0300, Oleksandr Andrushchenko wrote:
> On 04/18/2018 01:18 PM, Paul Durrant wrote:
> > > -Original Message-
> > > From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
> > > Of Roger Pau Monné
> > > Sent: 18 April 2018 11:11
> > > To: Oleksandr Andrushchenko 
> > > Cc: jgr...@suse.com; Artem Mygaiev ;
> > > Dongwon Kim ; airl...@linux.ie;
> > > oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri-
> > > de...@lists.freedesktop.org; Potrola, MateuszX
> > > ; xen-devel@lists.xenproject.org;
> > > daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper
> > > 
> > > Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
> > > helper DRM driver
> > > 
> > > On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko
> > > wrote:
> > > > On 04/18/2018 10:35 AM, Roger Pau Monné wrote:
> > > After speaking with Oleksandr on IRC, I think the main usage of the
> > > gntdev extension is to:
> > > 
> > > 1. Create a dma-buf from a set of grant references.
> > > 2. Share dma-buf and get a list of grant references.
> > > 
> > > I think this set of operations could be broken into:
> > > 
> > > 1.1 Map grant references into user-space using the gntdev.
> > > 1.2 Create a dma-buf out of a set of user-space virtual addresses.
> > > 
> > > 2.1 Map a dma-buf into user-space.
> > > 2.2 Get grefs out of the user-space addresses where the dma-buf is
> > >  mapped.
> > > 
> > > So it seems like what's actually missing is a way to:
> > > 
> > >   - Create a dma-buf from a list of user-space virtual addresses.
> > >   - Allow to map a dma-buf into user-space, so it can then be used with
> > > the gntdev.
> > > 
> > > I think this is generic enough that it could be implemented by a
> > > device not tied to Xen. AFAICT the hyper_dma guys also wanted
> > > something similar to this.
> Ok, so just to summarize, xen-zcopy/hyper-dmabuf as they are now,
> are no go from your POV?

My opinion is that there seems to be a more generic way to implement
this, and thus I would prefer that one.

> Instead, we have to make all that fancy stuff
> with VAs <-> device-X and have that device-X driver live out of drivers/xen
> as it is not a Xen specific driver?

That would be my preference if feasible, simply because it can be
reused by other use-cases that need to create dma-bufs in user-space.

In any case I just knew about dma-bufs this morning, there might be
things that I'm missing.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/p2m: fixed p2m_change_type_range() start / end check

2018-04-18 Thread Jan Beulich
>>> On 18.04.18 at 12:23,  wrote:
> That is, sensible input looks like:
> * start < end
> * For hostp2ms:
>  - start <= max_mapped_pfn
>  - either end <= max_mapped_pfn, or end == ~0UL
> (But for altp2ms, start or end > max_mapped_pfn is fine.)

So I guess that's part of the difficulty I'm having with the original 
suggestion:
Why would such input be okay for alternative p2m-s?

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/cpu: Support a new cpu vendor, which is Shanghai

2018-04-18 Thread Jan Beulich
>>> On 18.04.18 at 12:25,  wrote:
> [FionaLi] : I am sorry. I understood wrongly. The C000 range are 
> extensions, which provide some additional feature different from Intel.  As 
> you suggested, we will enable those features in guest OSes and remove these 
> code from patch. Can we support the CPUID leaves and emulate the MSR  with 
> another submit?

With another patch you mean? Yes please - if at all possible put separate
sets of changes into separate patches, combined into a series if there are
dependencies between them.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view

2018-04-18 Thread Razvan Cojocaru
On 04/18/2018 01:45 PM, George Dunlap wrote:
> On 04/18/2018 09:20 AM, Razvan Cojocaru wrote:
>> On 04/17/2018 04:53 PM, George Dunlap wrote:
>>> On 04/17/2018 11:50 AM, Razvan Cojocaru wrote:
  void p2m_init_altp2m_ept(struct domain *d, unsigned int i)
  {
  struct p2m_domain *p2m = d->arch.altp2m_p2m[i];
 +struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
  struct ept_data *ept;
  
 +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
 +p2m->default_access = hostp2m->default_access;
 +p2m->domain = hostp2m->domain;
 +p2m->logdirty_ranges = hostp2m->logdirty_ranges;
 +p2m->global_logdirty = hostp2m->global_logdirty;
>>>
>>> This would certainly be one approach.  But then we'd need to keep a lot
>>> more of these things in sync -- for instance, we'd have to have similar
>>> code to enable and disable global_logdirty on all active altp2m entries.
>>>
>>> [...]
>>>
>>> The other thing that seems to be missing from synchronization is that in
>>> p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being
>>> part of the eptp).  The code seems to indicate that this is required for
>>> PML (hardware-assisted logdirty), but I don't see anywhere this is set
>>> or copied from the host p2m.
>>>
>>> It might be nice to have a more structured way of keeping all these
>>> changes in sync, rather than relying on this open-coding everywhere.
>>
>> For logdirty_ranges and global_logdirty, I propose that we put them both
>> in a dynamically allocated struct, and have all p2ms share a pointer to
>> them. That way, all that's required is for the pointer to be set up in
>> p2m_init_altp2m_ept() and the actual data will be automatically shared
>> between p2ms. If I've read the code correctly, the hostp2m is the last
>> to be destroyed and the first to be initialized, so it should work well
>> as long as all p2ms synchronize access to logdirty_ranges and
>> global_logdirty (which I assume they already do).
> 
> That's an interesting idea; one potential disadvantage is that it would
> make locking even more complex than it already is.  Enabling / disabling
> logdirty isn't a hot path, so looping through the altp2ms shouldn't have
> much of a performance impact; *reading* is much more common, so having
> to grab an extra set of locks is more likely to have a performance
> impact.  And it's not clear to me that the complexity of keeping several
> copies in sync is that much higher than the complexity of adding Yet
> Another MM Lock.
> 
> Those are just initial reactions though -- feel free to explore the
> solution space and/or argue otherwise. :-)

No, you're right, there's no point in complicating things, I'll just
loop over the p2ms.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 6/7] xen/arm: Setup virtual paging for secondary CPUs in non-boot scenario

2018-04-18 Thread Mirela Simonovic
On Tue, Apr 17, 2018 at 4:11 PM, Julien Grall  wrote:
>
>
> On 17/04/18 13:54, Mirela Simonovic wrote:
>>
>> Hi Julien,
>
>
> Hi,
>
>>
>> On Wed, Apr 11, 2018 at 5:11 PM, Julien Grall 
>> wrote:
>>>
>>> Hi,
>>>
>>> On 11/04/18 14:19, Mirela Simonovic wrote:


 In existing code the paging for secondary CPUs is setup only in boot
 flow.
 The setup is triggered from start_xen function after all CPUs are
 brought
 online. In other words, the initialization of VTCR_EL2 register is done
 out of the cpu_up/start_secondary control flow. However, the cpu_up flow
 should be self-contained - it should fully initialize a secondary CPU,
 because the cpu_up is used not only to bring a secondary CPU online on
 boot, but also to hotplug a CPU during the system resume.
 With this patch the setting of paging is triggered from start_secondary
 function if the current system state is not boot. This way, the paging
 will be setup in non-boot scenarios, while the setup in boot scenario
 remains unchanged.
>>>
>>>
>>>
>>> I am afraid that this is not correct. You can't assume that value chosen
>>> for
>>> VTCR by Xen at boot will fit this new CPU. So you have to check it is
>>> fine
>>> or park the CPU if there are any issue.
>>>
>>
>> This is not a new CPU. This CPU already went through its boot sequence
>> and it reached the resume point because it does fit the value chosen
>> for VTCR by Xen.
>> If it wouldn't fit the chosen value for VTCR it would be parked so it
>> wouldn't participate in suspend/resume. Please let me know if I
>> misunderstood your comment.
>
>
> This is not a new CPU for your use case. However your commit message
> speak about "non-boot" CPU bring-up. So for me this is more than
> suspend/resume, it is about bringing-up CPU at any time.
>
> As those CPUs can't participate to the decision (it is too late), you
> need to make sure the VTCR will fit on that CPU.
>
>>
>> AFAIU the value chosen by Xen for VTCR config has to be common for all
>> online CPUs. Since this value is also used in the resume path I
>> suggest to make global (static in the p2m.c) the 'val' variable which
>> is currently local in setup_virt_paging() and passed as argument to
>> setup_virt_paging_one(). Then setup_virt_paging_one() would not
>> receive an argument.
>> I need to access this value on resume, so I would call
>> setup_virt_paging_one() without argument from start_secondary() if the
>> system state is not boot.
>> This seems to me a bit cleaner compared to what I submitted in this
>> patch, but fundamentally the functionality is the same.
>
>
> You don't need to introduce a static variable it. I believe you can
> re-create it based on the information we already have in global
> variables. So what I would do is moving the creation of vtcr value in
> that function.

Using this approach each CPU will need to recalculate the value which
is already known prior to executing the function.
I believe this is sub-optimal and contrary to existing implementation
where only one CPU performs the calculation.
Is there any benefit of recalculating the value?
Is there any disadvantage of remembering the value into a static variable?

Please let me know, I just need to understand whether there is any
particular reason why you want to repeat the calculation. If no
particular reasons, saving the calculated value is definitely a better
approach.

Thanks,
Mirela

>
> Cheers,
>
> --
> Julien Grall
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view

2018-04-18 Thread George Dunlap
On 04/18/2018 09:20 AM, Razvan Cojocaru wrote:
> On 04/17/2018 04:53 PM, George Dunlap wrote:
>> On 04/17/2018 11:50 AM, Razvan Cojocaru wrote:
>>>  void p2m_init_altp2m_ept(struct domain *d, unsigned int i)
>>>  {
>>>  struct p2m_domain *p2m = d->arch.altp2m_p2m[i];
>>> +struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
>>>  struct ept_data *ept;
>>>  
>>> +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
>>> +p2m->default_access = hostp2m->default_access;
>>> +p2m->domain = hostp2m->domain;
>>> +p2m->logdirty_ranges = hostp2m->logdirty_ranges;
>>> +p2m->global_logdirty = hostp2m->global_logdirty;
>>
>> This would certainly be one approach.  But then we'd need to keep a lot
>> more of these things in sync -- for instance, we'd have to have similar
>> code to enable and disable global_logdirty on all active altp2m entries.
>>
>> [...]
>>
>> The other thing that seems to be missing from synchronization is that in
>> p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being
>> part of the eptp).  The code seems to indicate that this is required for
>> PML (hardware-assisted logdirty), but I don't see anywhere this is set
>> or copied from the host p2m.
>>
>> It might be nice to have a more structured way of keeping all these
>> changes in sync, rather than relying on this open-coding everywhere.
> 
> For logdirty_ranges and global_logdirty, I propose that we put them both
> in a dynamically allocated struct, and have all p2ms share a pointer to
> them. That way, all that's required is for the pointer to be set up in
> p2m_init_altp2m_ept() and the actual data will be automatically shared
> between p2ms. If I've read the code correctly, the hostp2m is the last
> to be destroyed and the first to be initialized, so it should work well
> as long as all p2ms synchronize access to logdirty_ranges and
> global_logdirty (which I assume they already do).

That's an interesting idea; one potential disadvantage is that it would
make locking even more complex than it already is.  Enabling / disabling
logdirty isn't a hot path, so looping through the altp2ms shouldn't have
much of a performance impact; *reading* is much more common, so having
to grab an extra set of locks is more likely to have a performance
impact.  And it's not clear to me that the complexity of keeping several
copies in sync is that much higher than the complexity of adding Yet
Another MM Lock.

Those are just initial reactions though -- feel free to explore the
solution space and/or argue otherwise. :-)

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.11] x86: Use spec_ctrl_{enter, exit}_idle() in the S3/S5 path

2018-04-18 Thread Andrew Cooper
This avoids opencoding the functionality (and missing one bit of it), and and
some comments explaining what is going on.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Juergen Gross 

This is effectively a bugfix of c/s 710a8eb "x86: suppress BTI mitigations
around S3 suspend/resume" so should be considered for 4.11 at this point.
---
 xen/arch/x86/acpi/power.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 0763846..bb0d095 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -213,7 +213,8 @@ static int enter_state(u32 state)
 error = 0;
 
 ci = get_cpu_info();
-ci->use_shadow_spec_ctrl = 0;
+spec_ctrl_enter_idle(ci);
+/* Avoid NMI/#MC using MSR_SPEC_CTRL until we've reloaded microcode. */
 ci->bti_ist_info = 0;
 
 ACPI_FLUSH_CPU_CACHE();
@@ -257,10 +258,9 @@ static int enter_state(u32 state)
 if ( !recheck_cpu_features(0) )
 panic("Missing previously available feature(s).");
 
+/* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */
 ci->bti_ist_info = default_bti_ist_info;
-asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
-  :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0)
-  : "memory");
+spec_ctrl_exit_idle(ci);
 
  done:
 spin_debug_enable();
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/cpu: Support a new cpu vendor, which is Shanghai

2018-04-18 Thread Fiona Li(BJ-RD)
Jan
Thanks for your reply.  Answer the following.
Best wish!
FionaLi

-Original Message-
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Tuesday, April 10, 2018 2:35 PM
To: lifang...@126.com; Fiona Li(BJ-RD) 
Cc: xen-devel@lists.xenproject.org
Subject: Re: RE: [PATCH] x86/cpu: Support a new cpu vendor,which is Shanghai

>>> "Fiona Li(BJ-RD)"  04/10/18 3:08 AM >>>
>> +static void init_shanghai(struct cpuinfo_x86 *c) { uint64_t
>> +msr_ace,msr_rng;
>> +/* Test for Shanghai Extended CPUID information */ if
>> +(cpuid_eax(0xC000) >= 0xC001) { /*Get Shanghai Extended
>> +function number */
>> +u32 extented_feature_flags = cpuid_edx(0xC001);
>> +
>> +/* enable ACE,if support ACE unit */
>> +if(ACE_PRESENT(extented_feature_flags) &&
>> +!ACE_ENABLED(extented_feature_flags)){
>> +rdmsrl(MSR_ZX_ACE, msr_ace);
>> +/* enable ACE  */
>> +wrmsrl(MSR_ZX_ACE, (msr_ace | ACE_FCR));
>>> +printk(KERN_INFO "CPU: Enabled ACE h/w crypto\n");ai
>> +}
>> +/* enable RNG,if support RNG unit */ if
>> +(RNG_PRESENT(extented_feature_flags) &&
>> +!RNG_ENABLED(extented_feature_flags)) { rdmsrl(MSR_ZX_RNG, msr_rng);
>> +/* enable RNG  */
>> +wrmsrl(MSR_ZX_RNG, msr_rng | RNG_ENABLE); printk(KERN_INFO "CPU:
>> +Enabled h/w RNG\n"); } }
>> +
>> +if (c->x86 == 0x6 && c->x86_model >= 0xf) {
>> +c->x86_cache_alignment = c->x86_clflush_size * 2;
>> +__set_bit(X86_FEATURE_CONSTANT_TSC, c->x86_capability); }
>
>Is there a specification available anywhere for all of the above?
>[FionaLi]: Main usage of Zhaoxin platforms in recent years is for
>limited and embedded instead of distributed markets. So there is no
>such document for public access at present. But, Zhaoxin's x86 CPU is
>compatible with Intel x86 architecture. Its instruction sets are
>compatible with Intel. Furthermore, Zhaoxin's CPU virtualization tech
>and I/O virtualization tech are compatible with Intel VMX and VT-d 
>respectively. So maybe we can refer to Intel manual at present.

Notw how I had said "for all of the above": The extensions are clearly not in 
the Intel docs, and the cache alignment and TSC properties aren't general x86 
attributes either.
[FionaLi]:
For cache alignment, it is equal to the size of cache line on Zhaoxin platform. 
Since it has been initialized in early_cpu_detect(), we should remove this 
modification in next version patch.
For X86_FEATURE_CONSTANT_TSC, It  is an additional feature of TSC timer. For 
Shanghai CPU, it should be set according to X86_FEATURE_ITSC feature as Intel 
in next version patch.
Simple CPUID document in attached file is for your reference, the features 
supported by Shanghai CPU can be got from this doc.

>What about guests? How would they know these extensions are available for 
>their use?
>[FionaLi]: My colleagues is committing code to Linux kernel and
>windows. Its extensions will be available to guests.

That wasn't the point of the question: Even with aware guest OSes you first of 
all need to make sure guests can actually obtain the respective CPUID leaves. 
Afaict the C000 range will come out as all zeros for them without you doing 
something about it. It is also questionable whether blanket enabling of the 
features for all guests is a good step - I think this should be left to guest 
OSes (requiring you to properly emulate their MSR accesses within Xen, unless 
the MSRs can be made directly accessible to guests without security risks).
[FionaLi] : I am sorry. I understood wrongly. The C000 range are 
extensions, which provide some additional feature different from Intel.  As you 
suggested, we will enable those features in guest OSes and remove these code 
from patch. Can we support the CPUID leaves and emulate the MSR  with another 
submit?



保密声明:
本邮件含有保密或专有信息,仅供指定收件人使用。严禁对本邮件或其内容做任何未经授权的查阅、使用、复制或转发。
CONFIDENTIAL NOTE:
This email contains confidential or legally privileged information and is for 
the sole use of its intended recipient. Any unauthorized review, use, copying 
or forwarding of this email or the content of this email is strictly prohibited.


zhaoxin_cpuid_leaf_description.xlsx
Description: zhaoxin_cpuid_leaf_description.xlsx
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Oleksandr Andrushchenko

On 04/18/2018 01:18 PM, Paul Durrant wrote:

-Original Message-
From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
Of Roger Pau Monné
Sent: 18 April 2018 11:11
To: Oleksandr Andrushchenko 
Cc: jgr...@suse.com; Artem Mygaiev ;
Dongwon Kim ; airl...@linux.ie;
oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri-
de...@lists.freedesktop.org; Potrola, MateuszX
; xen-devel@lists.xenproject.org;
daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper

Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
helper DRM driver

On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko
wrote:

On 04/18/2018 10:35 AM, Roger Pau Monné wrote:

On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko

wrote:

On 04/17/2018 11:57 PM, Dongwon Kim wrote:

On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:

On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:

3.2 Backend exports dma-buf to xen-front

In this case Dom0 pages are shared with DomU. As before, DomU can

only write

to these pages, not any other page from Dom0, so it can be still

considered

safe.
But, the following must be considered (highlighted in xen-front's Kernel
documentation):
   - If guest domain dies then pages/grants received from the backend

cannot

     be claimed back - think of it as memory lost to Dom0 (won't be used

for

any
     other guest)
   - Misbehaving guest may send too many requests to the backend

exhausting

     its grant references and memory (consider this from security POV).

As the

     backend runs in the trusted domain we also assume that it is trusted

as

well,
     e.g. must take measures to prevent DDoS attacks.

I cannot parse the above sentence:

"As the backend runs in the trusted domain we also assume that it is
trusted as well, e.g. must take measures to prevent DDoS attacks."

What's the relation between being trusted and protecting from DoS
attacks?

I mean that we trust the backend that it can prevent Dom0
from crashing in case DomU's frontend misbehaves, e.g.
if the frontend sends too many memory requests etc.

In any case, all? PV protocols are implemented with the frontend
sharing pages to the backend, and I think there's a reason why this
model is used, and it should continue to be used.

This is the first use-case above. But there are real-world
use-cases (embedded in my case) when physically contiguous memory
needs to be shared, one of the possible ways to achieve this is
to share contiguous memory from Dom0 to DomU (the second use-case

above)

Having to add logic in the backend to prevent such attacks means
that:

   - We need more code in the backend, which increases complexity and
 chances of bugs.
   - Such code/logic could be wrong, thus allowing DoS.

You can live without this code at all, but this is then up to
backend which may make Dom0 down because of DomU's frontend doing

evil

things

IMO we should design protocols that do not allow such attacks instead
of having to defend against them.


4. xen-front/backend/xen-zcopy synchronization

4.1. As I already said in 2) all the inter VM communication happens

between

xen-front and the backend, xen-zcopy is NOT involved in that.
When xen-front wants to destroy a display buffer (dumb/dma-buf) it

issues a

XENDISPL_OP_DBUF_DESTROY command (opposite to

XENDISPL_OP_DBUF_CREATE).

This call is synchronous, so xen-front expects that backend does free

the

buffer pages on return.

4.2. Backend, on XENDISPL_OP_DBUF_DESTROY:
    - closes all dumb handles/fd's of the buffer according to [3]
    - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-

zcopy to make

sure
      the buffer is freed (think of it as it waits for dma-buf->release
callback)

So this zcopy thing keeps some kind of track of the memory usage? Why
can't the user-space backend keep track of the buffer usage?

Because there is no dma-buf UAPI which allows to track the buffer life cycle
(e.g. wait until dma-buf's .release callback is called)

    - replies to xen-front that the buffer can be destroyed.
This way deletion of the buffer happens synchronously on both Dom0

and DomU

sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns

with time-out

error
(BTW, wait time is a parameter of this IOCTL), Xen will defer grant
reference
removal and will retry later until those are free.

Hope this helps understand how buffers are synchronously deleted in

case

of xen-zcopy with a single protocol command.

I think the above logic can also be re-used by the hyper-dmabuf driver

with

some additional work:

1. xen-zcopy can be split into 2 parts and extend:
1.1. Xen gntdev driver [4], [5] to allow creating dma-buf from grefs and
vise versa,

I don't know much about the dma-buf implementation in Linux, but
gntdev is a user-space device, and AFAICT 

Re: [Xen-devel] [PATCH 6/7] xen/arm: Setup virtual paging for secondary CPUs in non-boot scenario

2018-04-18 Thread Mirela Simonovic
Hi Julien,

On Wed, Apr 18, 2018 at 11:48 AM, Julien Grall  wrote:
>
>
> On 17/04/18 16:22, Mirela Simonovic wrote:
>>
>> Hi Julien,
>>
>> On Tue, Apr 17, 2018 at 4:11 PM, Julien Grall 
>> wrote:
>>>
>>>
>>>
>>> On 17/04/18 13:54, Mirela Simonovic wrote:


 Hi Julien,
>>>
>>>
>>>
>>> Hi,
>>>

 On Wed, Apr 11, 2018 at 5:11 PM, Julien Grall 
 wrote:
>
>
> Hi,
>
> On 11/04/18 14:19, Mirela Simonovic wrote:
>>
>>
>>
>> In existing code the paging for secondary CPUs is setup only in boot
>> flow.
>> The setup is triggered from start_xen function after all CPUs are
>> brought
>> online. In other words, the initialization of VTCR_EL2 register is
>> done
>> out of the cpu_up/start_secondary control flow. However, the cpu_up
>> flow
>> should be self-contained - it should fully initialize a secondary CPU,
>> because the cpu_up is used not only to bring a secondary CPU online on
>> boot, but also to hotplug a CPU during the system resume.
>> With this patch the setting of paging is triggered from
>> start_secondary
>> function if the current system state is not boot. This way, the paging
>> will be setup in non-boot scenarios, while the setup in boot scenario
>> remains unchanged.
>
>
>
>
> I am afraid that this is not correct. You can't assume that value
> chosen
> for
> VTCR by Xen at boot will fit this new CPU. So you have to check it is
> fine
> or park the CPU if there are any issue.
>

 This is not a new CPU. This CPU already went through its boot sequence
 and it reached the resume point because it does fit the value chosen
 for VTCR by Xen.
 If it wouldn't fit the chosen value for VTCR it would be parked so it
 wouldn't participate in suspend/resume. Please let me know if I
 misunderstood your comment.
>>>
>>>
>>>
>>> This is not a new CPU for your use case. However your commit message
>>> speak about "non-boot" CPU bring-up. So for me this is more than
>>> suspend/resume, it is about bringing-up CPU at any time.
>>>
>>
>> Use case you're trying to cover is hotplugging a CPU after the boot is
>> done in bit.LITTLE system, and that CPU wasn't initially brought
>> online (on boot). Right?
>
>
> That's right. It is how I understood your commit title.
>
>>
>>> As those CPUs can't participate to the decision (it is too late), you
>>> need to make sure the VTCR will fit on that CPU.
>>>
>>
>> Could you please point me to the location in sources where this is
>> done on boot? I mean checking compliance with chosen VTCR value and
>> parking CPU if it doesn't fit.
>
>
> At the moment all CPUs are brought up during Xen boot. So what we do is find
> the small PA range that will accommodate all the onlined CPUs (see
> setup_virt_paging).
>
> So there are no parking required at the moment. However, if you start
> bringing-up new CPUs after boot. Then you have to check they will be able to
> handle the values chosen at boot.
>
> As you pointed out, this might not be necessary for the suspend/resume case.
> So I am not asking you to implement that. However, please make sure the
> commit message clearly spell out that you bringing up secondary CPU after
> Xen has boot only covers the suspend/resume case.
>

I'll fix the commit title, thanks.

> Cheers,
>
> --
> Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Oleksandr Andrushchenko

On 04/18/2018 01:23 PM, Paul Durrant wrote:

-Original Message-
From: Oleksandr Andrushchenko [mailto:andr2...@gmail.com]
Sent: 18 April 2018 11:21
To: Paul Durrant ; Roger Pau Monne

Cc: jgr...@suse.com; Artem Mygaiev ;
Dongwon Kim ; airl...@linux.ie;
oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri-
de...@lists.freedesktop.org; Potrola, MateuszX
; xen-devel@lists.xenproject.org;
daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper

Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
helper DRM driver

On 04/18/2018 01:18 PM, Paul Durrant wrote:

-Original Message-
From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On

Behalf

Of Roger Pau Monné
Sent: 18 April 2018 11:11
To: Oleksandr Andrushchenko 
Cc: jgr...@suse.com; Artem Mygaiev ;
Dongwon Kim ; airl...@linux.ie;
oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org;

dri-

de...@lists.freedesktop.org; Potrola, MateuszX
; xen-devel@lists.xenproject.org;
daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper

Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
helper DRM driver

On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko
wrote:

On 04/18/2018 10:35 AM, Roger Pau Monné wrote:

On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko

wrote:

On 04/17/2018 11:57 PM, Dongwon Kim wrote:

On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:

On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:

3.2 Backend exports dma-buf to xen-front

In this case Dom0 pages are shared with DomU. As before, DomU can

only write

to these pages, not any other page from Dom0, so it can be still

considered

safe.
But, the following must be considered (highlighted in xen-front's

Kernel

documentation):
    - If guest domain dies then pages/grants received from the backend

cannot

      be claimed back - think of it as memory lost to Dom0 (won't be used

for

any
      other guest)
    - Misbehaving guest may send too many requests to the backend

exhausting

      its grant references and memory (consider this from security POV).

As the

      backend runs in the trusted domain we also assume that it is

trusted

as

well,
      e.g. must take measures to prevent DDoS attacks.

I cannot parse the above sentence:

"As the backend runs in the trusted domain we also assume that it is
trusted as well, e.g. must take measures to prevent DDoS attacks."

What's the relation between being trusted and protecting from DoS
attacks?

I mean that we trust the backend that it can prevent Dom0
from crashing in case DomU's frontend misbehaves, e.g.
if the frontend sends too many memory requests etc.

In any case, all? PV protocols are implemented with the frontend
sharing pages to the backend, and I think there's a reason why this
model is used, and it should continue to be used.

This is the first use-case above. But there are real-world
use-cases (embedded in my case) when physically contiguous memory
needs to be shared, one of the possible ways to achieve this is
to share contiguous memory from Dom0 to DomU (the second use-case

above)

Having to add logic in the backend to prevent such attacks means
that:

- We need more code in the backend, which increases complexity and
  chances of bugs.
- Such code/logic could be wrong, thus allowing DoS.

You can live without this code at all, but this is then up to
backend which may make Dom0 down because of DomU's frontend

doing

evil

things

IMO we should design protocols that do not allow such attacks instead
of having to defend against them.


4. xen-front/backend/xen-zcopy synchronization

4.1. As I already said in 2) all the inter VM communication happens

between

xen-front and the backend, xen-zcopy is NOT involved in that.
When xen-front wants to destroy a display buffer (dumb/dma-buf) it

issues a

XENDISPL_OP_DBUF_DESTROY command (opposite to

XENDISPL_OP_DBUF_CREATE).

This call is synchronous, so xen-front expects that backend does free

the

buffer pages on return.

4.2. Backend, on XENDISPL_OP_DBUF_DESTROY:
     - closes all dumb handles/fd's of the buffer according to [3]
     - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-

zcopy to make

sure
       the buffer is freed (think of it as it waits for dma-buf->release
callback)

So this zcopy thing keeps some kind of track of the memory usage?

Why

can't the user-space backend keep track of the buffer usage?

Because there is no dma-buf UAPI which allows to track the buffer life

cycle

(e.g. wait until dma-buf's .release callback is called)

     - replies to xen-front that the buffer can be destroyed.
This way deletion of the 

[Xen-devel] [PATCH v5] x86/setup: properly update PTEs if src/dst overlaps when relocating Xen image

2018-04-18 Thread Daniel Kiper
Commit 0d31d16 (x86/setup: do not relocate Xen over current Xen image
placement) disallowed src/dst images overlaps when relocating Xen image.
Though it deliberately allowed destination region between __image_base__
and (__image_base__ + XEN_IMG_OFFSET) overlaps with the end of source
image. And here is the problem. If anything between __page_tables_start
and __page_tables_end in source image lands in the overlap then some or
even all page table entries may not be updated. This usually means boom
in early boot which will be difficult to the investigate. So, I think
that we have three choices to fix the issue:
  - drop XEN_IMG_OFFSET from
if ( (end > s) && (end - reloc_size + XEN_IMG_OFFSET >= __pa(_end)) )
  - add XEN_IMG_OFFSET to xen_phys_start in PFN_DOWN(xen_phys_start)
used in loops as one of conditions,
  - change PFN_DOWN(xen_phys_start) to PFN_DOWN(xen_remap_end_pfn)
proposed in earlier version of this patch.

This patch implements the second option. This way we still allow source
and destination partial overlap as described above but PTEs are properly
updated now.

Signed-off-by: Daniel Kiper 
---
 xen/arch/x86/setup.c |   13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index b2baee3..040fd03 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1012,6 +1012,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 l3_pgentry_t *pl3e;
 l2_pgentry_t *pl2e;
 int i, j, k;
+unsigned long pte_update_limit;
 
 /* Select relocation address. */
 e = end - reloc_size;
@@ -1019,6 +1020,12 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 bootsym(trampoline_xen_phys_start) = e;
 
 /*
+ * All PTEs with PFNs above pte_update_limit
+ * were updated earlier. Skip them.
+ */
+pte_update_limit = PFN_DOWN(e + XEN_IMG_OFFSET);
+
+/*
  * Perform relocation to new physical address.
  * Before doing so we must sync static/global data with main memory
  * with a barrier(). After this we must *not* modify static/global
@@ -1041,7 +1048,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 /* Not present, 1GB mapping, or already relocated? */
 if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) ||
  (l3e_get_flags(*pl3e) & _PAGE_PSE) ||
- (l3e_get_pfn(*pl3e) > PFN_DOWN(xen_phys_start)) )
+ (l3e_get_pfn(*pl3e) > pte_update_limit) )
 continue;
 *pl3e = l3e_from_intpte(l3e_get_intpte(*pl3e) +
 xen_phys_start);
@@ -1051,7 +1058,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 /* Not present, PSE, or already relocated? */
 if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) ||
  (l2e_get_flags(*pl2e) & _PAGE_PSE) ||
- (l2e_get_pfn(*pl2e) > PFN_DOWN(xen_phys_start)) )
+ (l2e_get_pfn(*pl2e) > pte_update_limit) )
 continue;
 *pl2e = l2e_from_intpte(l2e_get_intpte(*pl2e) +
 xen_phys_start);
@@ -1075,7 +1082,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 unsigned int flags;
 
 if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) ||
- (l2e_get_pfn(*pl2e) > PFN_DOWN(xen_phys_start)) )
+ (l2e_get_pfn(*pl2e) > pte_update_limit) )
 continue;
 
 if ( !using_2M_mapping() )
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Paul Durrant
> -Original Message-
> From: Oleksandr Andrushchenko [mailto:andr2...@gmail.com]
> Sent: 18 April 2018 11:21
> To: Paul Durrant ; Roger Pau Monne
> 
> Cc: jgr...@suse.com; Artem Mygaiev ;
> Dongwon Kim ; airl...@linux.ie;
> oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri-
> de...@lists.freedesktop.org; Potrola, MateuszX
> ; xen-devel@lists.xenproject.org;
> daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper
> 
> Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
> helper DRM driver
> 
> On 04/18/2018 01:18 PM, Paul Durrant wrote:
> >> -Original Message-
> >> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On
> Behalf
> >> Of Roger Pau Monné
> >> Sent: 18 April 2018 11:11
> >> To: Oleksandr Andrushchenko 
> >> Cc: jgr...@suse.com; Artem Mygaiev ;
> >> Dongwon Kim ; airl...@linux.ie;
> >> oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org;
> dri-
> >> de...@lists.freedesktop.org; Potrola, MateuszX
> >> ; xen-devel@lists.xenproject.org;
> >> daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper
> >> 
> >> Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
> >> helper DRM driver
> >>
> >> On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko
> >> wrote:
> >>> On 04/18/2018 10:35 AM, Roger Pau Monné wrote:
>  On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko
> >> wrote:
> > On 04/17/2018 11:57 PM, Dongwon Kim wrote:
> >> On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:
> >>> On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:
> > 3.2 Backend exports dma-buf to xen-front
> >
> > In this case Dom0 pages are shared with DomU. As before, DomU can
> >> only write
> > to these pages, not any other page from Dom0, so it can be still
> >> considered
> > safe.
> > But, the following must be considered (highlighted in xen-front's
> Kernel
> > documentation):
> >    - If guest domain dies then pages/grants received from the backend
> >> cannot
> >      be claimed back - think of it as memory lost to Dom0 (won't be used
> >> for
> > any
> >      other guest)
> >    - Misbehaving guest may send too many requests to the backend
> >> exhausting
> >      its grant references and memory (consider this from security POV).
> >> As the
> >      backend runs in the trusted domain we also assume that it is
> trusted
> >> as
> > well,
> >      e.g. must take measures to prevent DDoS attacks.
>  I cannot parse the above sentence:
> 
>  "As the backend runs in the trusted domain we also assume that it is
>  trusted as well, e.g. must take measures to prevent DDoS attacks."
> 
>  What's the relation between being trusted and protecting from DoS
>  attacks?
> >>> I mean that we trust the backend that it can prevent Dom0
> >>> from crashing in case DomU's frontend misbehaves, e.g.
> >>> if the frontend sends too many memory requests etc.
>  In any case, all? PV protocols are implemented with the frontend
>  sharing pages to the backend, and I think there's a reason why this
>  model is used, and it should continue to be used.
> >>> This is the first use-case above. But there are real-world
> >>> use-cases (embedded in my case) when physically contiguous memory
> >>> needs to be shared, one of the possible ways to achieve this is
> >>> to share contiguous memory from Dom0 to DomU (the second use-case
> >> above)
>  Having to add logic in the backend to prevent such attacks means
>  that:
> 
> - We need more code in the backend, which increases complexity and
>   chances of bugs.
> - Such code/logic could be wrong, thus allowing DoS.
> >>> You can live without this code at all, but this is then up to
> >>> backend which may make Dom0 down because of DomU's frontend
> doing
> >> evil
> >>> things
> >> IMO we should design protocols that do not allow such attacks instead
> >> of having to defend against them.
> >>
> > 4. xen-front/backend/xen-zcopy synchronization
> >
> > 4.1. As I already said in 2) all the inter VM communication happens
> >> between
> > xen-front and the backend, xen-zcopy is NOT involved in that.
> > When xen-front wants to destroy a display buffer (dumb/dma-buf) it
> >> issues a
> > XENDISPL_OP_DBUF_DESTROY command (opposite to
> >> XENDISPL_OP_DBUF_CREATE).
> > This call is synchronous, so xen-front expects that backend does free
> >> the
> > buffer pages on return.
> >
> > 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY:
> >     - closes all dumb handles/fd's of the buffer according to [3]

Re: [Xen-devel] [PATCH] x86/p2m: fixed p2m_change_type_range() start / end check

2018-04-18 Thread George Dunlap
On 04/18/2018 08:38 AM, Jan Beulich wrote:
 On 17.04.18 at 19:16,  wrote:
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -976,6 +976,13 @@ void p2m_change_type_range(struct domain *d,
>>  ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt));
>>  
>>  p2m_lock(p2m);
>> +
>> +if ( start > p2m->max_mapped_pfn )
>> +{
>> +p2m_unlock(p2m);
>> +return;
>> +}
> 
> I realize this is what George has suggested, but I still wonder if this is the
> right thing to do here: Why is this any more important to check than the
> more general start >= end (in which case the assertion in the rangeset
> code would also trigger)? Till now the function assumes "sensible" things
> to be passed in, but specifically also permitting the [start,~0UL] case
> (just in a more generalizer fashion). The problem you're trying to fix here
> is something passing in a nonsense range (starting above the valid range).
> Yet if we want the function to be immune to nonsense being passed in, I
> think start < end is what needs checking for, and that check would then
> perhaps better go after end was already adjusted.
> 
> The obvious alternative is for callers to only pass in sane ranges (in which
> case adding ASSERT() here may be considered, instead of triggering the
> one in the rangeset code).

The goal of the fix wasn't to get rid of ASSERTs for "nonsense" input
entirely, but to accept input which is in fact "sensible" for altp2m
systems.

I do agree though, that we should either ASSERT for all "nonsense"
input, or handle all such input gracefully.  I'm inclined to ASSERT();
in which case maybe it would be best if we put a bunch up front:

ASSERT(start < end);
if ( start > p2m->max_mapped_pfn)
{
ASSERT(!p2m_is_hostp2m(p2m));
...
}

if ( end > p2m->max_mapped_pfn )
{
ASSERT(end == ~0UL || !p2m_is_hostp2m(p2m));
...

That is, sensible input looks like:
* start < end
* For hostp2ms:
 - start <= max_mapped_pfn
 - either end <= max_mapped_pfn, or end == ~0UL
(But for altp2ms, start or end > max_mapped_pfn is fine.)

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Oleksandr Andrushchenko

On 04/18/2018 01:18 PM, Paul Durrant wrote:

-Original Message-
From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
Of Roger Pau Monné
Sent: 18 April 2018 11:11
To: Oleksandr Andrushchenko 
Cc: jgr...@suse.com; Artem Mygaiev ;
Dongwon Kim ; airl...@linux.ie;
oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri-
de...@lists.freedesktop.org; Potrola, MateuszX
; xen-devel@lists.xenproject.org;
daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper

Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
helper DRM driver

On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko
wrote:

On 04/18/2018 10:35 AM, Roger Pau Monné wrote:

On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko

wrote:

On 04/17/2018 11:57 PM, Dongwon Kim wrote:

On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:

On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:

3.2 Backend exports dma-buf to xen-front

In this case Dom0 pages are shared with DomU. As before, DomU can

only write

to these pages, not any other page from Dom0, so it can be still

considered

safe.
But, the following must be considered (highlighted in xen-front's Kernel
documentation):
   - If guest domain dies then pages/grants received from the backend

cannot

     be claimed back - think of it as memory lost to Dom0 (won't be used

for

any
     other guest)
   - Misbehaving guest may send too many requests to the backend

exhausting

     its grant references and memory (consider this from security POV).

As the

     backend runs in the trusted domain we also assume that it is trusted

as

well,
     e.g. must take measures to prevent DDoS attacks.

I cannot parse the above sentence:

"As the backend runs in the trusted domain we also assume that it is
trusted as well, e.g. must take measures to prevent DDoS attacks."

What's the relation between being trusted and protecting from DoS
attacks?

I mean that we trust the backend that it can prevent Dom0
from crashing in case DomU's frontend misbehaves, e.g.
if the frontend sends too many memory requests etc.

In any case, all? PV protocols are implemented with the frontend
sharing pages to the backend, and I think there's a reason why this
model is used, and it should continue to be used.

This is the first use-case above. But there are real-world
use-cases (embedded in my case) when physically contiguous memory
needs to be shared, one of the possible ways to achieve this is
to share contiguous memory from Dom0 to DomU (the second use-case

above)

Having to add logic in the backend to prevent such attacks means
that:

   - We need more code in the backend, which increases complexity and
 chances of bugs.
   - Such code/logic could be wrong, thus allowing DoS.

You can live without this code at all, but this is then up to
backend which may make Dom0 down because of DomU's frontend doing

evil

things

IMO we should design protocols that do not allow such attacks instead
of having to defend against them.


4. xen-front/backend/xen-zcopy synchronization

4.1. As I already said in 2) all the inter VM communication happens

between

xen-front and the backend, xen-zcopy is NOT involved in that.
When xen-front wants to destroy a display buffer (dumb/dma-buf) it

issues a

XENDISPL_OP_DBUF_DESTROY command (opposite to

XENDISPL_OP_DBUF_CREATE).

This call is synchronous, so xen-front expects that backend does free

the

buffer pages on return.

4.2. Backend, on XENDISPL_OP_DBUF_DESTROY:
    - closes all dumb handles/fd's of the buffer according to [3]
    - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-

zcopy to make

sure
      the buffer is freed (think of it as it waits for dma-buf->release
callback)

So this zcopy thing keeps some kind of track of the memory usage? Why
can't the user-space backend keep track of the buffer usage?

Because there is no dma-buf UAPI which allows to track the buffer life cycle
(e.g. wait until dma-buf's .release callback is called)

    - replies to xen-front that the buffer can be destroyed.
This way deletion of the buffer happens synchronously on both Dom0

and DomU

sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns

with time-out

error
(BTW, wait time is a parameter of this IOCTL), Xen will defer grant
reference
removal and will retry later until those are free.

Hope this helps understand how buffers are synchronously deleted in

case

of xen-zcopy with a single protocol command.

I think the above logic can also be re-used by the hyper-dmabuf driver

with

some additional work:

1. xen-zcopy can be split into 2 parts and extend:
1.1. Xen gntdev driver [4], [5] to allow creating dma-buf from grefs and
vise versa,

I don't know much about the dma-buf implementation in Linux, but
gntdev is a user-space device, and AFAICT 

Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Paul Durrant
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
> Of Roger Pau Monné
> Sent: 18 April 2018 11:11
> To: Oleksandr Andrushchenko 
> Cc: jgr...@suse.com; Artem Mygaiev ;
> Dongwon Kim ; airl...@linux.ie;
> oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri-
> de...@lists.freedesktop.org; Potrola, MateuszX
> ; xen-devel@lists.xenproject.org;
> daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper
> 
> Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
> helper DRM driver
> 
> On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko
> wrote:
> > On 04/18/2018 10:35 AM, Roger Pau Monné wrote:
> > > On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko
> wrote:
> > > > On 04/17/2018 11:57 PM, Dongwon Kim wrote:
> > > > > On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:
> > > > > > On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:
> > > > 3.2 Backend exports dma-buf to xen-front
> > > >
> > > > In this case Dom0 pages are shared with DomU. As before, DomU can
> only write
> > > > to these pages, not any other page from Dom0, so it can be still
> considered
> > > > safe.
> > > > But, the following must be considered (highlighted in xen-front's Kernel
> > > > documentation):
> > > >   - If guest domain dies then pages/grants received from the backend
> cannot
> > > >     be claimed back - think of it as memory lost to Dom0 (won't be used
> for
> > > > any
> > > >     other guest)
> > > >   - Misbehaving guest may send too many requests to the backend
> exhausting
> > > >     its grant references and memory (consider this from security POV).
> As the
> > > >     backend runs in the trusted domain we also assume that it is trusted
> as
> > > > well,
> > > >     e.g. must take measures to prevent DDoS attacks.
> > > I cannot parse the above sentence:
> > >
> > > "As the backend runs in the trusted domain we also assume that it is
> > > trusted as well, e.g. must take measures to prevent DDoS attacks."
> > >
> > > What's the relation between being trusted and protecting from DoS
> > > attacks?
> > I mean that we trust the backend that it can prevent Dom0
> > from crashing in case DomU's frontend misbehaves, e.g.
> > if the frontend sends too many memory requests etc.
> > > In any case, all? PV protocols are implemented with the frontend
> > > sharing pages to the backend, and I think there's a reason why this
> > > model is used, and it should continue to be used.
> > This is the first use-case above. But there are real-world
> > use-cases (embedded in my case) when physically contiguous memory
> > needs to be shared, one of the possible ways to achieve this is
> > to share contiguous memory from Dom0 to DomU (the second use-case
> above)
> > > Having to add logic in the backend to prevent such attacks means
> > > that:
> > >
> > >   - We need more code in the backend, which increases complexity and
> > > chances of bugs.
> > >   - Such code/logic could be wrong, thus allowing DoS.
> > You can live without this code at all, but this is then up to
> > backend which may make Dom0 down because of DomU's frontend doing
> evil
> > things
> 
> IMO we should design protocols that do not allow such attacks instead
> of having to defend against them.
> 
> > > > 4. xen-front/backend/xen-zcopy synchronization
> > > >
> > > > 4.1. As I already said in 2) all the inter VM communication happens
> between
> > > > xen-front and the backend, xen-zcopy is NOT involved in that.
> > > > When xen-front wants to destroy a display buffer (dumb/dma-buf) it
> issues a
> > > > XENDISPL_OP_DBUF_DESTROY command (opposite to
> XENDISPL_OP_DBUF_CREATE).
> > > > This call is synchronous, so xen-front expects that backend does free
> the
> > > > buffer pages on return.
> > > >
> > > > 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY:
> > > >    - closes all dumb handles/fd's of the buffer according to [3]
> > > >    - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-
> zcopy to make
> > > > sure
> > > >      the buffer is freed (think of it as it waits for dma-buf->release
> > > > callback)
> > > So this zcopy thing keeps some kind of track of the memory usage? Why
> > > can't the user-space backend keep track of the buffer usage?
> > Because there is no dma-buf UAPI which allows to track the buffer life cycle
> > (e.g. wait until dma-buf's .release callback is called)
> > > >    - replies to xen-front that the buffer can be destroyed.
> > > > This way deletion of the buffer happens synchronously on both Dom0
> and DomU
> > > > sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns
> with time-out
> > > > error
> > > > (BTW, wait time is a parameter of this IOCTL), Xen will defer grant
> > > > reference
> > > > removal and will retry later until those are free.
> > > >
> > > 

Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Roger Pau Monné
On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko wrote:
> On 04/18/2018 10:35 AM, Roger Pau Monné wrote:
> > On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko wrote:
> > > On 04/17/2018 11:57 PM, Dongwon Kim wrote:
> > > > On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:
> > > > > On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:
> > > 3.2 Backend exports dma-buf to xen-front
> > > 
> > > In this case Dom0 pages are shared with DomU. As before, DomU can only 
> > > write
> > > to these pages, not any other page from Dom0, so it can be still 
> > > considered
> > > safe.
> > > But, the following must be considered (highlighted in xen-front's Kernel
> > > documentation):
> > >   - If guest domain dies then pages/grants received from the backend 
> > > cannot
> > >     be claimed back - think of it as memory lost to Dom0 (won't be used 
> > > for
> > > any
> > >     other guest)
> > >   - Misbehaving guest may send too many requests to the backend exhausting
> > >     its grant references and memory (consider this from security POV). As 
> > > the
> > >     backend runs in the trusted domain we also assume that it is trusted 
> > > as
> > > well,
> > >     e.g. must take measures to prevent DDoS attacks.
> > I cannot parse the above sentence:
> > 
> > "As the backend runs in the trusted domain we also assume that it is
> > trusted as well, e.g. must take measures to prevent DDoS attacks."
> > 
> > What's the relation between being trusted and protecting from DoS
> > attacks?
> I mean that we trust the backend that it can prevent Dom0
> from crashing in case DomU's frontend misbehaves, e.g.
> if the frontend sends too many memory requests etc.
> > In any case, all? PV protocols are implemented with the frontend
> > sharing pages to the backend, and I think there's a reason why this
> > model is used, and it should continue to be used.
> This is the first use-case above. But there are real-world
> use-cases (embedded in my case) when physically contiguous memory
> needs to be shared, one of the possible ways to achieve this is
> to share contiguous memory from Dom0 to DomU (the second use-case above)
> > Having to add logic in the backend to prevent such attacks means
> > that:
> > 
> >   - We need more code in the backend, which increases complexity and
> > chances of bugs.
> >   - Such code/logic could be wrong, thus allowing DoS.
> You can live without this code at all, but this is then up to
> backend which may make Dom0 down because of DomU's frontend doing evil
> things

IMO we should design protocols that do not allow such attacks instead
of having to defend against them.

> > > 4. xen-front/backend/xen-zcopy synchronization
> > > 
> > > 4.1. As I already said in 2) all the inter VM communication happens 
> > > between
> > > xen-front and the backend, xen-zcopy is NOT involved in that.
> > > When xen-front wants to destroy a display buffer (dumb/dma-buf) it issues 
> > > a
> > > XENDISPL_OP_DBUF_DESTROY command (opposite to XENDISPL_OP_DBUF_CREATE).
> > > This call is synchronous, so xen-front expects that backend does free the
> > > buffer pages on return.
> > > 
> > > 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY:
> > >    - closes all dumb handles/fd's of the buffer according to [3]
> > >    - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-zcopy to make
> > > sure
> > >      the buffer is freed (think of it as it waits for dma-buf->release
> > > callback)
> > So this zcopy thing keeps some kind of track of the memory usage? Why
> > can't the user-space backend keep track of the buffer usage?
> Because there is no dma-buf UAPI which allows to track the buffer life cycle
> (e.g. wait until dma-buf's .release callback is called)
> > >    - replies to xen-front that the buffer can be destroyed.
> > > This way deletion of the buffer happens synchronously on both Dom0 and 
> > > DomU
> > > sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns with time-out
> > > error
> > > (BTW, wait time is a parameter of this IOCTL), Xen will defer grant
> > > reference
> > > removal and will retry later until those are free.
> > > 
> > > Hope this helps understand how buffers are synchronously deleted in case
> > > of xen-zcopy with a single protocol command.
> > > 
> > > I think the above logic can also be re-used by the hyper-dmabuf driver 
> > > with
> > > some additional work:
> > > 
> > > 1. xen-zcopy can be split into 2 parts and extend:
> > > 1.1. Xen gntdev driver [4], [5] to allow creating dma-buf from grefs and
> > > vise versa,
> > I don't know much about the dma-buf implementation in Linux, but
> > gntdev is a user-space device, and AFAICT user-space applications
> > don't have any notion of dma buffers. How are such buffers useful for
> > user-space? Why can't this just be called memory?
> A dma-buf is seen by user-space as a file descriptor and you can
> pass it to different drivers then. For example, you can share a buffer
> used by 

Re: [Xen-devel] [PATCH v8 3/9] xen/x86: support per-domain flag for xpti

2018-04-18 Thread Juergen Gross
On 18/04/18 11:42, Sergey Dyasli wrote:
> Hi Juergen,
> 
> 2 small requests from me below.
> 
> On Wed, 2018-04-18 at 10:30 +0200, Juergen Gross wrote:
>> Instead of switching XPTI globally on or off add a per-domain flag for
>> that purpose. This allows to modify the xpti boot parameter to support
>> running dom0 without Meltdown mitigations. Using "xpti=nodom0" as boot
>> parameter will achieve that.
>>
>> Move the xpti boot parameter handling to xen/arch/x86/pv/domain.c as
>> it is pv-domain specific.
>>
>> Signed-off-by: Juergen Gross 
>> Reviewed-by: Jan Beulich 
> 
> 
>> diff --git a/docs/misc/xen-command-line.markdown 
>> b/docs/misc/xen-command-line.markdown
>> index b353352adf..d4f758487a 100644
>> --- a/docs/misc/xen-command-line.markdown
>> +++ b/docs/misc/xen-command-line.markdown
>> @@ -1955,14 +1955,24 @@ clustered mode.  The default, given no hint from the 
>> **FADT**, is cluster
>>  mode.
>>  
>>  ### xpti
>> -> `= `
>> +> `= List of [ default |  | dom0= | domu= ]`
>>  
>> -> Default: `false` on AMD hardware
>> +> Default: `false` on hardware not vulnerable to Meltdown (e.g. AMD)
> 
> Could this line please be changed to:
> 
> `false` on hardware known not to be vulnerable to Meltdown (e.g. AMD)

Sure.

> 
>>  > Default: `true` everywhere else
>>  
>>  Override default selection of whether to isolate 64-bit PV guest page
>>  tables.
>>  
>> +`true` activates page table isolation even on hardware not vulnerable by
>> +Meltdown for all domains.
>> +
>> +`false` deactivates page table isolation on all systems for all domains.
>> +
>> +`default` sets the default behaviour.
>> +
>> +With `dom0` and `domu` it is possible to control page table isolation
>> +for dom0 or guest domains only.
>> +
>>  ### xsave
>>  > `= `
>>  
> 
>> diff --git a/xen/include/asm-x86/spec_ctrl.h 
>> b/xen/include/asm-x86/spec_ctrl.h
>> index 5ab4ff3f68..b4fa43269e 100644
>> --- a/xen/include/asm-x86/spec_ctrl.h
>> +++ b/xen/include/asm-x86/spec_ctrl.h
>> @@ -29,6 +29,10 @@ void init_speculation_mitigations(void);
>>  extern bool opt_ibpb;
>>  extern uint8_t default_bti_ist_info;
>>  
>> +extern uint8_t opt_xpti;
>> +#define OPT_XPTI_DOM0  0x01
>> +#define OPT_XPTI_DOMU  0x02
>> +
>>  static inline void init_shadow_spec_ctrl_state(void)
>>  {
>>  struct cpu_info *info = get_cpu_info();
> 
> Could you please also include something like the following:
> 
> @@ -119,8 +122,9 @@ static void __init print_details(enum ind_thunk thunk)
> boot_cpu_has(X86_FEATURE_RSB_NATIVE)  ? " RSB_NATIVE" : "",
> boot_cpu_has(X86_FEATURE_RSB_VMEXIT)  ? " RSB_VMEXIT" : "");
>  
> -printk("XPTI: %s\n",
> -   boot_cpu_has(X86_FEATURE_NO_XPTI) ? "disabled" : "enabled");
> +printk("XPTI: Dom0 %s, DomU (64-bit PV only) %s\n",
> +   opt_xpti & OPT_XPTI_DOM0 ? "enabled" : "disabled",
> +   opt_xpti & OPT_XPTI_DOMU ? "enabled" : "disabled");
>  }

As stated already in the reply to Jan: I'll use the text:

"XPTI (64-bit PV only): Dom0 ...".

> (just noticed that commit message also needs update regarding param name)

Aah, yes.

Thanks,


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v8 3/9] xen/x86: support per-domain flag for xpti

2018-04-18 Thread Juergen Gross
On 18/04/18 11:49, Jan Beulich wrote:
 On 18.04.18 at 11:42,  wrote:
>> On Wed, 2018-04-18 at 10:30 +0200, Juergen Gross wrote:
>> @@ -119,8 +122,9 @@ static void __init print_details(enum ind_thunk thunk)
>> boot_cpu_has(X86_FEATURE_RSB_NATIVE)  ? " RSB_NATIVE" : "",
>> boot_cpu_has(X86_FEATURE_RSB_VMEXIT)  ? " RSB_VMEXIT" : "");
>>  
>> -printk("XPTI: %s\n",
>> -   boot_cpu_has(X86_FEATURE_NO_XPTI) ? "disabled" : "enabled");
>> +printk("XPTI: Dom0 %s, DomU (64-bit PV only) %s\n",
>> +   opt_xpti & OPT_XPTI_DOM0 ? "enabled" : "disabled",
>> +   opt_xpti & OPT_XPTI_DOMU ? "enabled" : "disabled");
>>  }
> 
> Ah, yes, making this log message more precise is certainly worthwhile. The
> placement of "(64-bit PV only)" is a little confusing though: I'd put it 
> either
> before the colon or at the end of the line, as it affects Dom0 as much as
> DomU.

I'd prefer the former.


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 6/7] xen/arm: Setup virtual paging for secondary CPUs in non-boot scenario

2018-04-18 Thread Julien Grall



On 17/04/18 16:22, Mirela Simonovic wrote:

Hi Julien,

On Tue, Apr 17, 2018 at 4:11 PM, Julien Grall  wrote:



On 17/04/18 13:54, Mirela Simonovic wrote:


Hi Julien,



Hi,



On Wed, Apr 11, 2018 at 5:11 PM, Julien Grall 
wrote:


Hi,

On 11/04/18 14:19, Mirela Simonovic wrote:



In existing code the paging for secondary CPUs is setup only in boot
flow.
The setup is triggered from start_xen function after all CPUs are
brought
online. In other words, the initialization of VTCR_EL2 register is done
out of the cpu_up/start_secondary control flow. However, the cpu_up flow
should be self-contained - it should fully initialize a secondary CPU,
because the cpu_up is used not only to bring a secondary CPU online on
boot, but also to hotplug a CPU during the system resume.
With this patch the setting of paging is triggered from start_secondary
function if the current system state is not boot. This way, the paging
will be setup in non-boot scenarios, while the setup in boot scenario
remains unchanged.




I am afraid that this is not correct. You can't assume that value chosen
for
VTCR by Xen at boot will fit this new CPU. So you have to check it is
fine
or park the CPU if there are any issue.



This is not a new CPU. This CPU already went through its boot sequence
and it reached the resume point because it does fit the value chosen
for VTCR by Xen.
If it wouldn't fit the chosen value for VTCR it would be parked so it
wouldn't participate in suspend/resume. Please let me know if I
misunderstood your comment.



This is not a new CPU for your use case. However your commit message
speak about "non-boot" CPU bring-up. So for me this is more than
suspend/resume, it is about bringing-up CPU at any time.



Use case you're trying to cover is hotplugging a CPU after the boot is
done in bit.LITTLE system, and that CPU wasn't initially brought
online (on boot). Right?


That's right. It is how I understood your commit title.




As those CPUs can't participate to the decision (it is too late), you
need to make sure the VTCR will fit on that CPU.



Could you please point me to the location in sources where this is
done on boot? I mean checking compliance with chosen VTCR value and
parking CPU if it doesn't fit.


At the moment all CPUs are brought up during Xen boot. So what we do is 
find the small PA range that will accommodate all the onlined CPUs (see 
setup_virt_paging).


So there are no parking required at the moment. However, if you start 
bringing-up new CPUs after boot. Then you have to check they will be 
able to handle the values chosen at boot.


As you pointed out, this might not be necessary for the suspend/resume 
case. So I am not asking you to implement that. However, please make 
sure the commit message clearly spell out that you bringing up secondary 
CPU after Xen has boot only covers the suspend/resume case.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v8 3/9] xen/x86: support per-domain flag for xpti

2018-04-18 Thread Jan Beulich
>>> On 18.04.18 at 11:42,  wrote:
> On Wed, 2018-04-18 at 10:30 +0200, Juergen Gross wrote:
> @@ -119,8 +122,9 @@ static void __init print_details(enum ind_thunk thunk)
> boot_cpu_has(X86_FEATURE_RSB_NATIVE)  ? " RSB_NATIVE" : "",
> boot_cpu_has(X86_FEATURE_RSB_VMEXIT)  ? " RSB_VMEXIT" : "");
>  
> -printk("XPTI: %s\n",
> -   boot_cpu_has(X86_FEATURE_NO_XPTI) ? "disabled" : "enabled");
> +printk("XPTI: Dom0 %s, DomU (64-bit PV only) %s\n",
> +   opt_xpti & OPT_XPTI_DOM0 ? "enabled" : "disabled",
> +   opt_xpti & OPT_XPTI_DOMU ? "enabled" : "disabled");
>  }

Ah, yes, making this log message more precise is certainly worthwhile. The
placement of "(64-bit PV only)" is a little confusing though: I'd put it either
before the colon or at the end of the line, as it affects Dom0 as much as
DomU.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v8 3/9] xen/x86: support per-domain flag for xpti

2018-04-18 Thread Sergey Dyasli
Hi Juergen,

2 small requests from me below.

On Wed, 2018-04-18 at 10:30 +0200, Juergen Gross wrote:
> Instead of switching XPTI globally on or off add a per-domain flag for
> that purpose. This allows to modify the xpti boot parameter to support
> running dom0 without Meltdown mitigations. Using "xpti=nodom0" as boot
> parameter will achieve that.
> 
> Move the xpti boot parameter handling to xen/arch/x86/pv/domain.c as
> it is pv-domain specific.
> 
> Signed-off-by: Juergen Gross 
> Reviewed-by: Jan Beulich 


> diff --git a/docs/misc/xen-command-line.markdown 
> b/docs/misc/xen-command-line.markdown
> index b353352adf..d4f758487a 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1955,14 +1955,24 @@ clustered mode.  The default, given no hint from the 
> **FADT**, is cluster
>  mode.
>  
>  ### xpti
> -> `= `
> +> `= List of [ default |  | dom0= | domu= ]`
>  
> -> Default: `false` on AMD hardware
> +> Default: `false` on hardware not vulnerable to Meltdown (e.g. AMD)

Could this line please be changed to:

`false` on hardware known not to be vulnerable to Meltdown (e.g. AMD)

>  > Default: `true` everywhere else
>  
>  Override default selection of whether to isolate 64-bit PV guest page
>  tables.
>  
> +`true` activates page table isolation even on hardware not vulnerable by
> +Meltdown for all domains.
> +
> +`false` deactivates page table isolation on all systems for all domains.
> +
> +`default` sets the default behaviour.
> +
> +With `dom0` and `domu` it is possible to control page table isolation
> +for dom0 or guest domains only.
> +
>  ### xsave
>  > `= `
>  

> diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
> index 5ab4ff3f68..b4fa43269e 100644
> --- a/xen/include/asm-x86/spec_ctrl.h
> +++ b/xen/include/asm-x86/spec_ctrl.h
> @@ -29,6 +29,10 @@ void init_speculation_mitigations(void);
>  extern bool opt_ibpb;
>  extern uint8_t default_bti_ist_info;
>  
> +extern uint8_t opt_xpti;
> +#define OPT_XPTI_DOM0  0x01
> +#define OPT_XPTI_DOMU  0x02
> +
>  static inline void init_shadow_spec_ctrl_state(void)
>  {
>  struct cpu_info *info = get_cpu_info();

Could you please also include something like the following:

@@ -119,8 +122,9 @@ static void __init print_details(enum ind_thunk thunk)
boot_cpu_has(X86_FEATURE_RSB_NATIVE)  ? " RSB_NATIVE" : "",
boot_cpu_has(X86_FEATURE_RSB_VMEXIT)  ? " RSB_VMEXIT" : "");
 
-printk("XPTI: %s\n",
-   boot_cpu_has(X86_FEATURE_NO_XPTI) ? "disabled" : "enabled");
+printk("XPTI: Dom0 %s, DomU (64-bit PV only) %s\n",
+   opt_xpti & OPT_XPTI_DOM0 ? "enabled" : "disabled",
+   opt_xpti & OPT_XPTI_DOMU ? "enabled" : "disabled");
 }


(just noticed that commit message also needs update regarding param name)

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v8 9/9] xen/x86: use PCID feature

2018-04-18 Thread Juergen Gross
On 18/04/18 11:13, Jan Beulich wrote:
 On 18.04.18 at 10:30,  wrote:
>> Avoid flushing the complete TLB when switching %cr3 for mitigation of
>> Meltdown by using the PCID feature if available.
>>
>> We are using 4 PCID values for a 64 bit pv domain subject to XPTI and
>> 2 values for the non-XPTI case:
>>
>> - guest active and in kernel mode
>> - guest active and in user mode
>> - hypervisor active and guest in user mode (XPTI only)
>> - hypervisor active and guest in kernel mode (XPTI only)
>>
>> We use PCID only if PCID _and_ INVPCID are supported. With PCID in use
>> we disable global pages in cr4. A command line parameter controls in
>> which cases PCID is being used.
>>
>> As the non-XPTI case has shown not to perform better with PCID at least
>> on some machines the default is to use PCID only for domains subject to
>> XPTI.
>>
>> With PCID enabled we always disable global pages. This avoids having to
>> either flush the complete TLB or do a cycle through all PCID values
>> when invalidating a single global page.
>>
>> Signed-off-by: Juergen Gross 
>> Reviewed-by: Jan Beulich 
>> ---
>> V6.1:
>> - address some minor comments (Jan Beulich)
> 
> No v7 changes? Afaict ...
> 
>> @@ -717,7 +718,7 @@ int __init dom0_construct_pv(struct domain *d,
>>  update_cr3(v);
>>  
>>  /* We run on dom0's page tables for the final part of the build 
>> process. */
>> -switch_cr3_cr4(v->arch.cr3, read_cr4());
>> +switch_cr3_cr4(cr3_pa(v->arch.cr3), read_cr4());
> 
> ... at least this was added. And to be honest I'm not convinced cr3_pa() is
> the right construct to use here: It's neither clear why the other bits don't
> matter at this point, nor why any of the bits that you mask out this way
> need masking out in the first place (e.g. why, in the crash case that Andrew
> had observed, the noflush bit was wrongly set).

At this point in time we only want to use another cr3 value. So PCIDs
should _not_ be used as this would require adapting cr4, resulting in
writing the cr3_pa() bits only. It would have been possible to use
write_cr3() here, but only together with open coding a TLB flush in
order to avoid any global pages remaining in the TLB. So I decided to
use switch_cr3_cr4().

The noflush bit was set as dom0 is subject to XPTI and PCID usage and
update_cr3() is updating v->arch.cr3 accordingly.

I know you are not fond of setting noflush in v->arch.cr3. The only
sensible alternative to that would be adding something like
v->arch.cr3_pcid_bits being or-ed to the cr3 value in restore_all_guest.
This would let us get rid of having to use cr3_pa() in some places
while adding a (very little) performance penalty.

> I hope there aren't any other such hidden changes in this version of the
> series.

Oh sorry, I missed to update the history.

I'm not aware of other changes in v8 (apart from patch 1 and the
resulting needed change in patch 3).


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2] x86/hpet: Fix possible ASSERT(cpu < nr_cpu_ids)

2018-04-18 Thread Davidwang
From: David Wang 

For the ch->cpumask be cleared by other cpu, cpumask_first() called by
hpet_detach_channel() return nr_cpu_ids. That lead an assertion in
set_channel_irq_affinity() when cpumask_of() check cpu.
Fix this by using a local variable.

Signed-off-by: David Wang 
---
 xen/arch/x86/hpet.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index bc7a851..6b3587a 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -509,15 +509,18 @@ static void hpet_attach_channel(unsigned int cpu,
 static void hpet_detach_channel(unsigned int cpu,
 struct hpet_event_channel *ch)
 {
+cpumask_t cpumask;
+
 spin_lock_irq(>lock);
 
 ASSERT(ch == per_cpu(cpu_bc_channel, cpu));
 
 per_cpu(cpu_bc_channel, cpu) = NULL;
+cpumask_copy(, ch->cpumask);
 
 if ( cpu != ch->cpu )
 spin_unlock_irq(>lock);
-else if ( cpumask_empty(ch->cpumask) )
+else if ( cpumask_empty() )
 {
 ch->cpu = -1;
 clear_bit(HPET_EVT_USED_BIT, >flags);
@@ -525,7 +528,7 @@ static void hpet_detach_channel(unsigned int cpu,
 }
 else
 {
-ch->cpu = cpumask_first(ch->cpumask);
+ch->cpu = cpumask_first();
 set_channel_irq_affinity(ch);
 local_irq_enable();
 }
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] Announcing the Xen Project 4.11 RC and Test Day Schedules

2018-04-18 Thread Lars Kurth
Dear community members,

on Tuesday, we created Xen 4.1q RC1 and will release a new release candidate
every FRIDAY, until we declare a release candidate as the final candidate and
cut the Xen 4.11 release. 

We will also hold a Test Day every TUESDAY for the release candidate that was
released the week prior to the Test Day starting from RC1. Note that RC’s are 
announced on the following mailing lists: xen-announce, xen-devel and 
xen-users. 

This means we will have Test Days coming up on April 24th, May 1st, 8th, 15th 
and 
22nd. Your testing is still valuable on other days, so please feel free to send 
Test Reports as outlined below at any time.


== Getting, Building and Installing a Release Candidate ==

Release candidates are available from our git repository at

git://xenbits.xenproject.org/xen.git (tag 4.11.0-)

where  is rc1, rc2, rc3, etc. and as tarball from

https://downloads.xenproject.org/release/xen/4.11.0-/xen-4.11.0-.tar.gz 
https://downloads.xenproject.org/release/xen/4.11.0-/xen-4.11.0-.tar.gz.sig

Detailed build and Install instructions can be found on the Test Day Wiki. Make
sure you check the known issues section of the instructions before trying to
download an RC.


== Testing new Features, Test and Bug Reports ==

You can find Test Instructions for new features on our Test Day Wiki and
instructions for general tests on Testing Xen. The following pages provide
information on how to report successful tests and how to report bugs and issues.

Happy Testing!

== Resources ==
* Test Day Wiki: https://wiki.xenproject.org/wiki/Xen_4.11_RC_test_instructions
* Known Issues:
https://wiki.xenproject.org/wiki/Xen_4.11_RC_test_instructions#Known_issues

Best Regards
Lars

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v8 9/9] xen/x86: use PCID feature

2018-04-18 Thread Jan Beulich
>>> On 18.04.18 at 10:30,  wrote:
> Avoid flushing the complete TLB when switching %cr3 for mitigation of
> Meltdown by using the PCID feature if available.
> 
> We are using 4 PCID values for a 64 bit pv domain subject to XPTI and
> 2 values for the non-XPTI case:
> 
> - guest active and in kernel mode
> - guest active and in user mode
> - hypervisor active and guest in user mode (XPTI only)
> - hypervisor active and guest in kernel mode (XPTI only)
> 
> We use PCID only if PCID _and_ INVPCID are supported. With PCID in use
> we disable global pages in cr4. A command line parameter controls in
> which cases PCID is being used.
> 
> As the non-XPTI case has shown not to perform better with PCID at least
> on some machines the default is to use PCID only for domains subject to
> XPTI.
> 
> With PCID enabled we always disable global pages. This avoids having to
> either flush the complete TLB or do a cycle through all PCID values
> when invalidating a single global page.
> 
> Signed-off-by: Juergen Gross 
> Reviewed-by: Jan Beulich 
> ---
> V6.1:
> - address some minor comments (Jan Beulich)

No v7 changes? Afaict ...

> @@ -717,7 +718,7 @@ int __init dom0_construct_pv(struct domain *d,
>  update_cr3(v);
>  
>  /* We run on dom0's page tables for the final part of the build process. 
> */
> -switch_cr3_cr4(v->arch.cr3, read_cr4());
> +switch_cr3_cr4(cr3_pa(v->arch.cr3), read_cr4());

... at least this was added. And to be honest I'm not convinced cr3_pa() is
the right construct to use here: It's neither clear why the other bits don't
matter at this point, nor why any of the bits that you mask out this way
need masking out in the first place (e.g. why, in the crash case that Andrew
had observed, the noflush bit was wrongly set).

I hope there aren't any other such hidden changes in this version of the
series.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory

2018-04-18 Thread Jan Beulich
>>> On 06.12.17 at 08:50,  wrote:
> Each vcpu of hvm guest consumes at least one shadow page. Currently, only 256
> (for hap case) pages are pre-allocated as shadow memory at beginning. It would
> run out if guest has more than 256 vcpus and guest creation fails. Bump the
> number of shadow pages to 2 * HVM_MAX_VCPUS for hap case and 8 * HVM_MAX_VCPUS
> for shadow case.
> 
> This patch won't lead to more memory consumption for the size of shadow memory
> will be adjusted via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION according to the size
> of guest memory and the number of vcpus.

I don't understand this: What's the purpose of bumping the values if it won't 
lead
to higher memory consumption? Afaict there's be higher consumption at least
transiently. And I don't see why this would need doing independent of the 
intended
vCPU count in the guest. I guess you want to base your series on top on Andrew's
max-vCPU-s adjustments (which sadly didn't become ready in time for 4.11).

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC Patch v4 6/8] hvmload: Add x2apic entry support in the MADT and SRAT build

2018-04-18 Thread Jan Beulich
>>> On 06.12.17 at 08:50,  wrote:
> --- a/tools/libacpi/build.c
> +++ b/tools/libacpi/build.c
> @@ -30,6 +30,11 @@
>  
>  #define align16(sz)(((sz) + 15) & ~15)
>  #define fixed_strcpy(d, s) strncpy((d), (s), sizeof(d))
> +#define min(X, Y) ({ \
> +const typeof (X) _x = (X);   \
> +const typeof (Y) _y = (Y);   \
> +(void) (&_x == &_y); \
> +(_x < _y) ? _x : _y; })

No new name space violations please: Identifiers starting with a single 
underscore
and a lower case letter are reserved for file scope symbols. Use a trailing
underscore instead, for example.

> @@ -79,16 +84,19 @@ static struct acpi_20_madt *construct_madt(struct 
> acpi_ctxt *ctxt,
>  struct acpi_20_madt_intsrcovr *intsrcovr;
>  struct acpi_20_madt_ioapic*io_apic;
>  struct acpi_20_madt_lapic *lapic;
> +struct acpi_20_madt_x2apic*x2apic;
>  const struct hvm_info_table   *hvminfo = config->hvminfo;
> -int i, sz;
> +int i, sz, nr_apic;

Apart from your own addition I think both i and sz also want to be unsigned.
Please make such corrections when you touch a line anyway.

>  if ( config->lapic_id == NULL )
>  return NULL;
>  
> +nr_apic = min(hvminfo->nr_vcpus, MADT_MAX_LOCAL_APIC);

With this the variable name is not well chosen. nr_xapic perhaps?

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast

2018-04-18 Thread Jan Beulich
>>> On 06.12.17 at 08:50,  wrote:
> Intel SDM Extended XAPIC (X2APIC) -> "Initialization by System Software"
> has the following description:
> 
> "The ACPI interfaces for the x2APIC are described in Section 5.2, “ACPI System
> Description Tables,” of the Advanced Configuration and Power Interface
> Specification, Revision 4.0a (http://www.acpi.info/spec.htm). The default
> behavior for BIOS is to pass the control to the operating system with the
> local x2APICs in xAPIC mode if all APIC IDs reported by CPUID.0BH:EDX are less
> than 255, and in x2APIC mode if there are any logical processor reporting an
> APIC ID of 255 or greater."

With this you mean ...

> In this patch, hvmloader enables x2apic mode for all vcpus if there are cpus
> with APIC ID > 255. To wake up processors whose APIC ID is greater than 255,

">= 255" and "greater than 254" here respectively. Please be precise.

> the SIPI is broadcasted to all APs. It is the way how Seabios wakes up APs.
> APs may compete for the stack, thus a lock is introduced to protect the 
> stack.



> --- a/tools/firmware/hvmloader/smp.c
> +++ b/tools/firmware/hvmloader/smp.c
> @@ -26,7 +26,9 @@
>  #define AP_BOOT_EIP 0x1000
>  extern char ap_boot_start[], ap_boot_end[];
>  
> -static int ap_callin, ap_cpuid;
> +static int ap_callin;
> +static int enable_x2apic;
> +static bool lock = 1;
>  
>  asm (
>  ".text   \n"
> @@ -47,7 +49,15 @@ asm (
>  "mov   %eax,%ds  \n"
>  "mov   %eax,%es  \n"
>  "mov   %eax,%ss  \n"
> -"movl  $stack_top,%esp   \n"
> +"3:  movb  $1, %bl   \n"
> +"mov   $lock,%edx\n"
> +"movzbl %bl,%eax \n"
> +"xchg  %al, (%edx)   \n"
> +"test  %al,%al   \n"
> +"je2f\n"
> +"pause   \n"
> +"jmp   3b\n"
> +"2:  movl  $stack_top,%esp   \n"

Please be consistent with suffixes: Either add them only when really needed
(preferred) or add them uniformly everywhere (when permitted of course).
I also don't understand why you need to use %bl here at all.

> @@ -68,14 +78,34 @@ asm (
>  ".text   \n"
>  );
>  
> +unsigned int ap_cpuid(void)

static

> +{
> +if ( !(rdmsr(MSR_IA32_APICBASE) & MSR_IA32_APICBASE_EXTD) )
> +{
> +uint32_t eax, ebx, ecx, edx;
> +
> +cpuid(1, , , , );
> +return ebx >> 24;
> +}
> +else

pointless "else"

> +return rdmsr(MSR_IA32_APICBASE_MSR + (APIC_ID >> 4));
> +}
> +
>  void ap_start(void); /* non-static avoids unused-function compiler warning */
>  /*static*/ void ap_start(void)
>  {
> -printf(" - CPU%d ... ", ap_cpuid);
> +printf(" - CPU%d ... ", ap_cpuid());
>  cacheattr_init();
>  printf("done.\n");
>  wmb();
> -ap_callin = 1;
> +ap_callin++;
> +
> +if ( enable_x2apic )
> +wrmsr(MSR_IA32_APICBASE, rdmsr(MSR_IA32_APICBASE) |
> + MSR_IA32_APICBASE_EXTD);
> +
> +/* Release the lock */
> +asm volatile ( "xchgb %1, %b0" : : "m" (lock), "r" (0) : "memory" );
>  }

How can you release the lock here - you've not switched off the stack, and
you're about to actually use it (for returning from the function)?

> @@ -125,9 +169,15 @@ void smp_initialise(void)
>  memcpy((void *)AP_BOOT_EIP, ap_boot_start, ap_boot_end - ap_boot_start);
>  
>  printf("Multiprocessor initialisation:\n");
> +if ( nr_cpus > MADT_MAX_LOCAL_APIC )
> +enable_x2apic = 1;
> +
>  ap_start();
> -for ( i = 1; i < nr_cpus; i++ )
> -boot_cpu(i);
> +if ( nr_cpus > MADT_MAX_LOCAL_APIC )

Where does MADT_MAX_LOCAL_APIC come from? I can't find it anywhere, and
hence can't judge (along the lines of my remark on the description) whether this
is a correct comparison.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v8 2/9] xen/x86: add a function for modifying cr3

2018-04-18 Thread Juergen Gross
Instead of having multiple places with more or less identical asm
statements just have one function doing a write to cr3.

As this function should be named write_cr3() rename the current
write_cr3() function to switch_cr3().

Suggested-by: Andrew Copper 
Signed-off-by: Juergen Gross 
Reviewed-by: Jan Beulich 
---
V6:
- new patch
---
 xen/arch/x86/flushtlb.c | 4 ++--
 xen/arch/x86/mm.c   | 2 +-
 xen/arch/x86/pv/domain.c| 2 +-
 xen/common/efi/runtime.c| 4 ++--
 xen/include/asm-x86/flushtlb.h  | 2 +-
 xen/include/asm-x86/processor.h | 5 +
 6 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index ad19ba0f5d..e36bd6f383 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -72,7 +72,7 @@ static void post_flush(u32 t)
 this_cpu(tlbflush_time) = t;
 }
 
-void write_cr3(unsigned long cr3)
+void switch_cr3(unsigned long cr3)
 {
 unsigned long flags, cr4;
 u32 t;
@@ -84,7 +84,7 @@ void write_cr3(unsigned long cr3)
 cr4 = read_cr4();
 
 write_cr4(cr4 & ~X86_CR4_PGE);
-asm volatile ( "mov %0, %%cr3" : : "r" (cr3) : "memory" );
+write_cr3(cr3);
 write_cr4(cr4);
 
 post_flush(t);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 7d960c742e..e245d96a97 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -503,7 +503,7 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
 void write_ptbase(struct vcpu *v)
 {
 get_cpu_info()->root_pgt_changed = true;
-write_cr3(v->arch.cr3);
+switch_cr3(v->arch.cr3);
 }
 
 /*
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index b1c40373fa..be40843b05 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -220,7 +220,7 @@ static void _toggle_guest_pt(struct vcpu *v)
 get_cpu_info()->root_pgt_changed = true;
 
 /* Don't flush user global mappings from the TLB. Don't tick TLB clock. */
-asm volatile ( "mov %0, %%cr3" : : "r" (v->arch.cr3) : "memory" );
+write_cr3(v->arch.cr3);
 
 if ( !(v->arch.flags & TF_kernel_mode) )
 return;
diff --git a/xen/common/efi/runtime.c b/xen/common/efi/runtime.c
index 3dbc2e8ee5..4e5ddfef4f 100644
--- a/xen/common/efi/runtime.c
+++ b/xen/common/efi/runtime.c
@@ -111,7 +111,7 @@ struct efi_rs_state efi_rs_enter(void)
 lgdt(_desc);
 }
 
-write_cr3(virt_to_maddr(efi_l4_pgtable));
+switch_cr3(virt_to_maddr(efi_l4_pgtable));
 
 return state;
 }
@@ -120,7 +120,7 @@ void efi_rs_leave(struct efi_rs_state *state)
 {
 if ( !state->cr3 )
 return;
-write_cr3(state->cr3);
+switch_cr3(state->cr3);
 if ( is_pv_vcpu(current) && !is_idle_vcpu(current) )
 {
 struct desc_ptr gdt_desc = {
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 6bf33a6f5e..c150f82ca2 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -84,7 +84,7 @@ static inline unsigned long read_cr3(void)
 }
 
 /* Write pagetable base and implicitly tick the tlbflush clock. */
-void write_cr3(unsigned long cr3);
+void switch_cr3(unsigned long cr3);
 
 /* flush_* flag fields: */
  /*
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index db9988ab33..71d32c0333 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -283,6 +283,11 @@ static inline unsigned long read_cr2(void)
 return cr2;
 }
 
+static inline void write_cr3(unsigned long val)
+{
+asm volatile ( "mov %0, %%cr3" : : "r" (val) : "memory" );
+}
+
 static inline unsigned long read_cr4(void)
 {
 return get_cpu_info()->cr4;
-- 
2.13.6


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v8 6/9] xen/x86: use flag byte for decision whether xen_cr3 is valid

2018-04-18 Thread Juergen Gross
Today cpu_info->xen_cr3 is either 0 to indicate %cr3 doesn't need to
be switched on entry to Xen, or negative for keeping the value while
indicating not to restore %cr3, or positive in case %cr3 is to be
restored.

Switch to use a flag byte instead of a negative xen_cr3 value in order
to allow %cr3 values with the high bit set in case we want to keep TLB
entries when using the PCID feature.

This reduces the number of branches in interrupt handling and results
in better performance (e.g. parallel make of the Xen hypervisor on my
system was using about 3% less system time).

Signed-off-by: Juergen Gross 
Reviewed-by: Jan Beulich 
---
V3:
- renamed use_xen_cr3 to better fitting use_pv_cr3
- corrected comment regarding semantics of use_pv_cr3 (Jan Beulich)
- prefer 32-bit operations over 8- or 16-bit ones (Jan Beulich)
---
 xen/arch/x86/domain.c  |  1 +
 xen/arch/x86/mm.c  |  3 +-
 xen/arch/x86/smpboot.c |  2 ++
 xen/arch/x86/x86_64/asm-offsets.c  |  1 +
 xen/arch/x86/x86_64/compat/entry.S |  5 ++--
 xen/arch/x86/x86_64/entry.S| 59 --
 xen/include/asm-x86/current.h  | 12 +---
 7 files changed, 41 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9b001a03ec..801ac33810 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1696,6 +1696,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
 ASSERT(local_irq_is_enabled());
 
+get_cpu_info()->use_pv_cr3 = false;
 get_cpu_info()->xen_cr3 = 0;
 
 if ( unlikely(dirty_cpu != cpu) && dirty_cpu != VCPU_CPU_CLEAN )
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 73a38e8715..4997047edf 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -517,7 +517,8 @@ void write_ptbase(struct vcpu *v)
 }
 else
 {
-/* Make sure to clear xen_cr3 before pv_cr3. */
+/* Make sure to clear use_pv_cr3 and xen_cr3 before pv_cr3. */
+cpu_info->use_pv_cr3 = false;
 cpu_info->xen_cr3 = 0;
 /* switch_cr3_cr4() serializes. */
 switch_cr3_cr4(v->arch.cr3, new_cr4);
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 980192e71f..d21020c510 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -324,6 +324,7 @@ void start_secondary(void *unused)
  */
 spin_debug_disable();
 
+get_cpu_info()->use_pv_cr3 = false;
 get_cpu_info()->xen_cr3 = 0;
 get_cpu_info()->pv_cr3 = 0;
 
@@ -1123,6 +1124,7 @@ void __init smp_prepare_boot_cpu(void)
 per_cpu(scratch_cpumask, cpu) = _cpu0mask;
 #endif
 
+get_cpu_info()->use_pv_cr3 = false;
 get_cpu_info()->xen_cr3 = 0;
 get_cpu_info()->pv_cr3 = 0;
 }
diff --git a/xen/arch/x86/x86_64/asm-offsets.c 
b/xen/arch/x86/x86_64/asm-offsets.c
index 9e2aefb00f..7ad024cf37 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -144,6 +144,7 @@ void __dummy__(void)
 OFFSET(CPUINFO_use_shadow_spec_ctrl, struct cpu_info, 
use_shadow_spec_ctrl);
 OFFSET(CPUINFO_bti_ist_info, struct cpu_info, bti_ist_info);
 OFFSET(CPUINFO_root_pgt_changed, struct cpu_info, root_pgt_changed);
+OFFSET(CPUINFO_use_pv_cr3, struct cpu_info, use_pv_cr3);
 DEFINE(CPUINFO_sizeof, sizeof(struct cpu_info));
 BLANK();
 
diff --git a/xen/arch/x86/x86_64/compat/entry.S 
b/xen/arch/x86/x86_64/compat/entry.S
index ae2bb4bf1e..b909977e33 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -210,10 +210,9 @@ ENTRY(cstar_enter)
 
 GET_STACK_END(bx)
 mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
-neg   %rcx
+test  %rcx, %rcx
 jz.Lcstar_cr3_okay
-mov   %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
-neg   %rcx
+movb  $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%rbx)
 mov   %rcx, %cr3
 movq  $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
 .Lcstar_cr3_okay:
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 5f0758d64f..8b7d1c48ad 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -154,6 +154,7 @@ restore_all_guest:
 rep movsq
 .Lrag_copy_done:
 mov   %r9, STACK_CPUINFO_FIELD(xen_cr3)(%rdx)
+movb  $1, STACK_CPUINFO_FIELD(use_pv_cr3)(%rdx)
 mov   %rax, %cr3
 .Lrag_keep_cr3:
 
@@ -202,14 +203,9 @@ restore_all_xen:
  * case we return to late PV exit code (from an NMI or #MC).
  */
 GET_STACK_END(bx)
-mov   STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rdx
+cmpb  $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%rbx)
+UNLIKELY_START(ne, exit_cr3)
 mov   STACK_CPUINFO_FIELD(pv_cr3)(%rbx), %rax
-test  %rdx, %rdx
-/*
- * Ideally the condition would be "nsz", but such doesn't exist,
- * so "g" will have to do.
- */
-UNLIKELY_START(g, exit_cr3)
 mov   %rax, %cr3
 

[Xen-devel] [PATCH v8 3/9] xen/x86: support per-domain flag for xpti

2018-04-18 Thread Juergen Gross
Instead of switching XPTI globally on or off add a per-domain flag for
that purpose. This allows to modify the xpti boot parameter to support
running dom0 without Meltdown mitigations. Using "xpti=nodom0" as boot
parameter will achieve that.

Move the xpti boot parameter handling to xen/arch/x86/pv/domain.c as
it is pv-domain specific.

Signed-off-by: Juergen Gross 
Reviewed-by: Jan Beulich 
---
V6.1:
- address some minor comments (Jan Beulich)

V6:
- modify xpti boot parameter options (Andrew Cooper)
- move xpti_init() code to spec_ctrl.c (Andrew Cooper)
- irework init of per-domain xpti flag (Andrew Cooper)

V3:
- latch get_cpu_info() return value in variable (Jan Beulich)
- call always xpti_domain_init() for pv dom0 (Jan Beulich)
- add __init annotations (Jan Beulich)
- drop per domain XPTI message (Jan Beulich)
- document xpti=default support (Jan Beulich)
- move domain xpti flag into a padding hole (Jan Beulich)
---
 docs/misc/xen-command-line.markdown | 14 ++--
 xen/arch/x86/mm.c   | 17 +++--
 xen/arch/x86/mm/shadow/multi.c  |  2 +-
 xen/arch/x86/pv/dom0_build.c|  1 +
 xen/arch/x86/pv/domain.c|  6 
 xen/arch/x86/setup.c| 19 --
 xen/arch/x86/smpboot.c  |  4 +--
 xen/arch/x86/spec_ctrl.c| 70 +
 xen/include/asm-x86/current.h   |  3 +-
 xen/include/asm-x86/domain.h|  3 ++
 xen/include/asm-x86/spec_ctrl.h |  4 +++
 11 files changed, 116 insertions(+), 27 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index b353352adf..d4f758487a 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1955,14 +1955,24 @@ clustered mode.  The default, given no hint from the 
**FADT**, is cluster
 mode.
 
 ### xpti
-> `= `
+> `= List of [ default |  | dom0= | domu= ]`
 
-> Default: `false` on AMD hardware
+> Default: `false` on hardware not vulnerable to Meltdown (e.g. AMD)
 > Default: `true` everywhere else
 
 Override default selection of whether to isolate 64-bit PV guest page
 tables.
 
+`true` activates page table isolation even on hardware not vulnerable by
+Meltdown for all domains.
+
+`false` deactivates page table isolation on all systems for all domains.
+
+`default` sets the default behaviour.
+
+With `dom0` and `domu` it is possible to control page table isolation
+for dom0 or guest domains only.
+
 ### xsave
 > `= `
 
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e245d96a97..9c36614099 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -502,8 +502,21 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
 
 void write_ptbase(struct vcpu *v)
 {
-get_cpu_info()->root_pgt_changed = true;
-switch_cr3(v->arch.cr3);
+struct cpu_info *cpu_info = get_cpu_info();
+
+if ( is_pv_vcpu(v) && v->domain->arch.pv_domain.xpti )
+{
+cpu_info->root_pgt_changed = true;
+cpu_info->pv_cr3 = __pa(this_cpu(root_pgt));
+switch_cr3(v->arch.cr3);
+}
+else
+{
+/* Make sure to clear xen_cr3 before pv_cr3; switch_cr3() serializes. 
*/
+cpu_info->xen_cr3 = 0;
+switch_cr3(v->arch.cr3);
+cpu_info->pv_cr3 = 0;
+}
 }
 
 /*
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 5e779baa53..5a9242de71 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -967,7 +967,7 @@ static int shadow_set_l4e(struct domain *d,
 sh_put_ref(d, osl3mfn, paddr);
 }
 
-if ( !cpu_has_no_xpti )
+if ( is_pv_domain(d) && d->arch.pv_domain.xpti )
 /*
  * Lazy flushing is enough: either we do a TLB flush right afterwards
  * which will pick up the new root page table on all affected cpus
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 5b4325b87f..d148395919 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -387,6 +387,7 @@ int __init dom0_construct_pv(struct domain *d,
 if ( compat32 )
 {
 d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 1;
+d->arch.pv_domain.xpti = false;
 v->vcpu_info = (void *)>shared_info->compat.vcpu_info[0];
 if ( setup_compat_arg_xlat(v) != 0 )
 BUG();
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index be40843b05..ce1a1a9d35 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 static void noreturn continue_nonidle_domain(struct vcpu *v)
@@ -75,6 +76,8 @@ int switch_compat(struct domain *d)
 
 d->arch.x87_fip_width = 4;
 
+d->arch.pv_domain.xpti = false;
+
 return 0;
 
  undo_and_fail:
@@ -205,6 +208,9 @@ int pv_domain_initialise(struct domain *d)
 /* 64-bit PV guest by default. */
 d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
 
+

[Xen-devel] [PATCH v8 9/9] xen/x86: use PCID feature

2018-04-18 Thread Juergen Gross
Avoid flushing the complete TLB when switching %cr3 for mitigation of
Meltdown by using the PCID feature if available.

We are using 4 PCID values for a 64 bit pv domain subject to XPTI and
2 values for the non-XPTI case:

- guest active and in kernel mode
- guest active and in user mode
- hypervisor active and guest in user mode (XPTI only)
- hypervisor active and guest in kernel mode (XPTI only)

We use PCID only if PCID _and_ INVPCID are supported. With PCID in use
we disable global pages in cr4. A command line parameter controls in
which cases PCID is being used.

As the non-XPTI case has shown not to perform better with PCID at least
on some machines the default is to use PCID only for domains subject to
XPTI.

With PCID enabled we always disable global pages. This avoids having to
either flush the complete TLB or do a cycle through all PCID values
when invalidating a single global page.

Signed-off-by: Juergen Gross 
Reviewed-by: Jan Beulich 
---
V6.1:
- address some minor comments (Jan Beulich)

V6:
- split off pv_guest_cr4_to_real_cr4() conversion to function into new
  patch (Andrew Cooper)
- changed some comments (Jan Beulich, Andrew Cooper)

V5:
- use X86_CR3_ADDR_MASK instead of ~X86_CR3_PCID_MASK (Jan Beulich)
- add some const qualifiers (Jan Beulich)
- mask X86_CR3_ADDR_MASK with PADDR_MASK (Jan Beulich)
- add flushing the TLB from old PCID related entries in write_cr3_cr4()
  (Jan Beulich)

V4:
- add cr3 mask for page table address and use that in dbg_pv_va2mfn()
  (Jan Beulich)
- use invpcid_flush_all_nonglobals() instead of invpcid_flush_all()
  (Jan Beulich)
- use PCIDs 0/1 when running in Xen or without XPTI, 2/3 with XPTI in
  guest (Jan Beulich)
- ASSERT cr4.pge and cr4.pcide are never active at the same time
  (Jan Beulich)
- make pv_guest_cr4_to_real_cr4() a real function

V3:
- support PCID for non-XPTI case, too
- add command line parameter for controlling usage of PCID
- check PCID active by using cr4.pcide (Jan Beulich)
---
 docs/misc/xen-command-line.markdown | 14 +++
 xen/arch/x86/flushtlb.c | 47 -
 xen/arch/x86/mm.c   | 16 +++-
 xen/arch/x86/pv/dom0_build.c|  3 +-
 xen/arch/x86/pv/domain.c| 81 -
 xen/include/asm-x86/domain.h|  4 +-
 xen/include/asm-x86/processor.h |  3 ++
 xen/include/asm-x86/pv/domain.h | 31 ++
 8 files changed, 192 insertions(+), 7 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index 451a4fa566..f8950a3bb2 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1451,6 +1451,20 @@ All numbers specified must be hexadecimal ones.
 
 This option can be specified more than once (up to 8 times at present).
 
+### pcid (x86)
+> `=  | xpti=`
+
+> Default: `xpti`
+
+> Can be modified at runtime (change takes effect only for domains created
+  afterwards)
+
+If available, control usage of the PCID feature of the processor for
+64-bit pv-domains. PCID can be used either for no domain at all (`false`),
+for all of them (`true`), only for those subject to XPTI (`xpti`) or for
+those not subject to XPTI (`no-xpti`). The feature is used only in case
+INVPCID is supported and not disabled via `invpcid=false`.
+
 ### ple\_gap
 > `= `
 
diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index d48b900100..bd9a36924a 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Debug builds: Wrap frequently to stress-test the wrap logic. */
 #ifdef NDEBUG
@@ -94,6 +95,7 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
 {
 unsigned long flags, old_cr4;
 u32 t;
+unsigned long old_pcid = cr3_pcid(read_cr3());
 
 /* This non-reentrant function is sometimes called in interrupt context. */
 local_irq_save(flags);
@@ -103,14 +105,34 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
 old_cr4 = read_cr4();
 if ( old_cr4 & X86_CR4_PGE )
 {
+/*
+ * X86_CR4_PGE set means PCID is inactive.
+ * We have to purge the TLB via flipping cr4.pge.
+ */
 old_cr4 = cr4 & ~X86_CR4_PGE;
 write_cr4(old_cr4);
 }
+else if ( use_invpcid )
+/*
+ * Flushing the TLB via INVPCID is necessary only in case PCIDs are
+ * in use, which is true only with INVPCID being available.
+ * Without PCID usage the following write_cr3() will purge the TLB
+ * (we are in the cr4.pge off path) of all entries.
+ * Using invpcid_flush_all_nonglobals() seems to be faster than
+ * invpcid_flush_all(), so use that.
+ */
+invpcid_flush_all_nonglobals();
 
 write_cr3(cr3);
 
 if ( old_cr4 != cr4 )
 write_cr4(cr4);
+else if ( old_pcid != cr3_pcid(cr3) )
+/*
+ * Make sure 

[Xen-devel] [PATCH v8 0/9] xen/x86: various XPTI speedups

2018-04-18 Thread Juergen Gross
This patch series aims at reducing the overhead of the XPTI Meltdown
mitigation.

Patch 1 had been posted before, the main changes in this patch are due
to addressing Jan's comments on my first version. The main objective of
that patch is to avoid copying the L4 page table each time the guest is
being activated, as often the contents didn't change while the
hypervisor was active.

Patch 2 adds a new helper for writing cr3 instead of open coding the
inline assembly in multiple places.

Patch 3 sets the stage for being able to activate XPTI per domain. As a
first step it is now possible to switch XPTI off for dom0 via the xpti
boot parameter.

Patch 4 adds support for using the INVPCID instruction for flushing
the TLB.

Patch 5 reduces the costs of TLB flushes even further: as we don't make
any use of global TLB entries with XPTI being active we can avoid
removing all global TLB entries on TLB flushes by simply deactivating
the global pages in CR4.

Patch 6 prepares using PCIDs in patch 6.
For that purpose it was necessary to allow CR3 values with bit 63 set
in order to avoid flushing TLB entries when writing CR3. This requires
a modification of Jan's rather clever state machine with positive and
negative CR3 values for the hypervisor by using a dedicated flag byte
instead.

Patch 7 converts pv_guest_cr4_to_real_cr4() from a macro to a function
as it was becoming more and more complex.

Patch 8 adds some PCID helper functions for accessing the different
parts of cr3 (address and pcid part).

Patch 9 is the main performance contributor: by making use of the PCID
feature (if available) TLB entries can survive CR3 switches. The TLB
needs to be flushed on context switches only and not when switching
between guest and hypervisor or guest kernel and user mode.

On my machine (Intel i7-4600M) using the PCID feature in the non-XPTI
case showed a slightly worse performance than using global pages
instead (using PCID and global pages is a bad idea as invalidating
global pages in this case would need a complete TLB flush). For this
reason I've decided to use PCID for XPTI only as the default. That
can easily be changed by using the command line parameter "pcid=true".

The complete series has been verified to still mitigate against
Meltdown attacks. A simple performance test (make -j 4 in the Xen
hypervisor directory) showed significant improvements compared to the
state without this series.
Numbers are seconds, stddev in braces.

xpti=false  elapsed system user
unpatched:  88.42 ( 2.01)   94.49 ( 1.38)  180.40 ( 1.41)
patched  :  89.45 ( 3.10)   96.47 ( 3.22)  181.34 ( 1.98)

xpti=true   elapsed system user
unpatched: 113.43 ( 3.68)  165.44 ( 4.41)  183.30 ( 1.72)
patched  :  92.76 ( 2.11)  103.39 ( 1.13)  184.86 ( 0.12)

Changes since last version:
- patch 1: set root_pgt_changed flag on other cpus, too, when changing
  a shadow L4 entry
- patch 3: shadow code needs to check xpti flag now due to change in
  patch 1

Juergen Gross (9):
  x86/xpti: avoid copying L4 page table contents when possible
  xen/x86: add a function for modifying cr3
  xen/x86: support per-domain flag for xpti
  xen/x86: use invpcid for flushing the TLB
  xen/x86: disable global pages for domains with XPTI active
  xen/x86: use flag byte for decision whether xen_cr3 is valid
  xen/x86: convert pv_guest_cr4_to_real_cr4() to a function
  xen/x86: add some cr3 helpers
  xen/x86: use PCID feature

 docs/misc/xen-command-line.markdown |  37 +++-
 xen/arch/x86/cpu/mtrr/generic.c |  37 
 xen/arch/x86/debug.c|   2 +-
 xen/arch/x86/domain.c   |   6 +-
 xen/arch/x86/domain_page.c  |   2 +-
 xen/arch/x86/flushtlb.c | 111 ++--
 xen/arch/x86/mm.c   |  86 +++-
 xen/arch/x86/mm/shadow/multi.c  |  10 
 xen/arch/x86/pv/dom0_build.c|   8 ++-
 xen/arch/x86/pv/domain.c|  89 -
 xen/arch/x86/setup.c|  27 +++--
 xen/arch/x86/smp.c  |   2 +-
 xen/arch/x86/smpboot.c  |   6 +-
 xen/arch/x86/spec_ctrl.c|  70 +++
 xen/arch/x86/x86_64/asm-offsets.c   |   2 +
 xen/arch/x86/x86_64/compat/entry.S  |   5 +-
 xen/arch/x86/x86_64/entry.S |  78 +++--
 xen/common/efi/runtime.c|   4 +-
 xen/include/asm-x86/current.h   |  23 ++--
 xen/include/asm-x86/domain.h|  17 +++---
 xen/include/asm-x86/flushtlb.h  |   7 ++-
 xen/include/asm-x86/invpcid.h   |   2 +
 xen/include/asm-x86/processor.h |  18 ++
 xen/include/asm-x86/pv/domain.h |  31 ++
 xen/include/asm-x86/spec_ctrl.h |   4 ++
 xen/include/asm-x86/x86-defns.h |   4 +-
 26 files changed, 543 insertions(+), 145 deletions(-)

-- 
2.13.6


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org

[Xen-devel] [PATCH v8 4/9] xen/x86: use invpcid for flushing the TLB

2018-04-18 Thread Juergen Gross
If possible use the INVPCID instruction for flushing the TLB instead of
toggling cr4.pge for that purpose.

While at it remove the dependency on cr4.pge being required for mtrr
loading, as this will be required later anyway.

Add a command line option "invpcid" for controlling the use of
INVPCID (default to true).

Signed-off-by: Juergen Gross 
Reviewed-by: Jan Beulich 
---
V6:
- reword invpcid parameter description (Andrew Cooper)
- add __read_mostly to use_invpcid definition (Andrew Cooper)

V5:
- use pre_flush() as an initializer in do_tlb_flush() (Jan Beulich)
- introduce boolean use_invpcid instead of clearing X86_FEATURE_INVPCID
  (Jan Beulich)

V4:
- option "invpcid" instead of "noinvpcid" (Jan Beulich)

V3:
- new patch
---
 docs/misc/xen-command-line.markdown |  9 +
 xen/arch/x86/cpu/mtrr/generic.c | 37 ++---
 xen/arch/x86/flushtlb.c | 29 +++--
 xen/arch/x86/setup.c|  8 
 xen/include/asm-x86/invpcid.h   |  2 ++
 5 files changed, 64 insertions(+), 21 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index d4f758487a..451a4fa566 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1380,6 +1380,15 @@ Because responsibility for APIC setup is shared between 
Xen and the
 domain 0 kernel this option is automatically propagated to the domain
 0 command line.
 
+### invpcid (x86)
+> `= `
+
+> Default: `true`
+
+By default, Xen will use the INVPCID instruction for TLB management if
+it is available.  This option can be used to cause Xen to fall back to
+older mechanisms, which are generally slower.
+
 ### noirqbalance
 > `= `
 
diff --git a/xen/arch/x86/cpu/mtrr/generic.c b/xen/arch/x86/cpu/mtrr/generic.c
index e9c0e5e059..7ba0c3f0fe 100644
--- a/xen/arch/x86/cpu/mtrr/generic.c
+++ b/xen/arch/x86/cpu/mtrr/generic.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -400,8 +401,10 @@ static DEFINE_SPINLOCK(set_atomicity_lock);
  * has been called.
  */
 
-static void prepare_set(void)
+static bool prepare_set(void)
 {
+   unsigned long cr4;
+
/*  Note that this is not ideal, since the cache is only 
flushed/disabled
   for this CPU while the MTRRs are changed, but changing this requires
   more invasive changes to the way the kernel boots  */
@@ -412,18 +415,24 @@ static void prepare_set(void)
write_cr0(read_cr0() | X86_CR0_CD);
wbinvd();
 
-   /*  TLB flushing here relies on Xen always using CR4.PGE. */
-   BUILD_BUG_ON(!(XEN_MINIMAL_CR4 & X86_CR4_PGE));
-   write_cr4(read_cr4() & ~X86_CR4_PGE);
+   cr4 = read_cr4();
+   if (cr4 & X86_CR4_PGE)
+   write_cr4(cr4 & ~X86_CR4_PGE);
+   else if (use_invpcid)
+   invpcid_flush_all();
+   else
+   write_cr3(read_cr3());
 
/*  Save MTRR state */
rdmsrl(MSR_MTRRdefType, deftype);
 
/*  Disable MTRRs, and set the default type to uncached  */
mtrr_wrmsr(MSR_MTRRdefType, deftype & ~0xcff);
+
+   return cr4 & X86_CR4_PGE;
 }
 
-static void post_set(void)
+static void post_set(bool pge)
 {
/* Intel (P6) standard MTRRs */
mtrr_wrmsr(MSR_MTRRdefType, deftype);
@@ -432,7 +441,12 @@ static void post_set(void)
write_cr0(read_cr0() & ~X86_CR0_CD);
 
/*  Reenable CR4.PGE (also flushes the TLB) */
-   write_cr4(read_cr4() | X86_CR4_PGE);
+   if (pge)
+   write_cr4(read_cr4() | X86_CR4_PGE);
+   else if (use_invpcid)
+   invpcid_flush_all();
+   else
+   write_cr3(read_cr3());
 
spin_unlock(_atomicity_lock);
 }
@@ -441,14 +455,15 @@ static void generic_set_all(void)
 {
unsigned long mask, count;
unsigned long flags;
+   bool pge;
 
local_irq_save(flags);
-   prepare_set();
+   pge = prepare_set();
 
/* Actually set the state */
mask = set_mtrr_state();
 
-   post_set();
+   post_set(pge);
local_irq_restore(flags);
 
/*  Use the atomic bitops to update the global mask  */
@@ -457,7 +472,6 @@ static void generic_set_all(void)
set_bit(count, _changes_mask);
mask >>= 1;
}
-   
 }
 
 static void generic_set_mtrr(unsigned int reg, unsigned long base,
@@ -474,11 +488,12 @@ static void generic_set_mtrr(unsigned int reg, unsigned 
long base,
 {
unsigned long flags;
struct mtrr_var_range *vr;
+   bool pge;
 
vr = _state.var_ranges[reg];
 
local_irq_save(flags);
-   prepare_set();
+   pge = prepare_set();
 
if (size == 0) {
/* The invalid bit is kept in the mask, so we simply clear the
@@ -499,7 +514,7 @@ static void generic_set_mtrr(unsigned int reg, unsigned 
long base,

[Xen-devel] [PATCH v8 7/9] xen/x86: convert pv_guest_cr4_to_real_cr4() to a function

2018-04-18 Thread Juergen Gross
pv_guest_cr4_to_real_cr4() is becoming more and more complex. Convert
it from a macro to an ordinary function.

Signed-off-by: Juergen Gross 
Reviewed-by: Jan Beulich 
---
V6:
- new patch, split off from (old) patch 7 (Andrew Cooper)
---
 xen/arch/x86/mm.c| 14 ++
 xen/include/asm-x86/domain.h | 11 ++-
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4997047edf..6aa0c34bae 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -500,6 +500,20 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
 v->arch.cr3 = mfn_x(mfn) << PAGE_SHIFT;
 }
 
+unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu *v)
+{
+const struct domain *d = v->domain;
+unsigned long cr4;
+
+cr4 = v->arch.pv_vcpu.ctrlreg[4] & ~X86_CR4_DE;
+cr4 |= mmu_cr4_features & (X86_CR4_PSE | X86_CR4_SMEP | X86_CR4_SMAP |
+   X86_CR4_OSXSAVE | X86_CR4_FSGSBASE);
+cr4 |= d->arch.pv_domain.xpti  ? 0 : X86_CR4_PGE;
+cr4 |= d->arch.vtsc ? X86_CR4_TSD : 0;
+
+return cr4;
+}
+
 void write_ptbase(struct vcpu *v)
 {
 struct cpu_info *cpu_info = get_cpu_info();
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index b7894dc8c8..9627058cd0 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -615,15 +615,8 @@ void vcpu_show_registers(const struct vcpu *);
 unsigned long pv_guest_cr4_fixup(const struct vcpu *, unsigned long guest_cr4);
 
 /* Convert between guest-visible and real CR4 values. */
-#define pv_guest_cr4_to_real_cr4(v) \
-(((v)->arch.pv_vcpu.ctrlreg[4]  \
-  | (mmu_cr4_features   \
- & (X86_CR4_PSE | X86_CR4_SMEP |\
-X86_CR4_SMAP | X86_CR4_OSXSAVE |\
-X86_CR4_FSGSBASE))  \
-  | ((v)->domain->arch.pv_domain.xpti ? 0 : X86_CR4_PGE) \
-  | ((v)->domain->arch.vtsc ? X86_CR4_TSD : 0)) \
- & ~X86_CR4_DE)
+unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu *v);
+
 #define real_cr4_to_pv_guest_cr4(c) \
 ((c) & ~(X86_CR4_PGE | X86_CR4_PSE | X86_CR4_TSD |  \
  X86_CR4_OSXSAVE | X86_CR4_SMEP |   \
-- 
2.13.6


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v8 8/9] xen/x86: add some cr3 helpers

2018-04-18 Thread Juergen Gross
Add some helper macros to access the address and pcid parts of cr3.

Use those helpers where appropriate.

Signed-off-by: Juergen Gross 
Reviewed-by: Jan Beulich 
---
V6:
- new patch (Andrew Cooper)
---
 xen/arch/x86/debug.c|  2 +-
 xen/arch/x86/domain_page.c  |  2 +-
 xen/include/asm-x86/processor.h | 10 ++
 xen/include/asm-x86/x86-defns.h |  4 +++-
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/debug.c b/xen/arch/x86/debug.c
index 9159f32db4..a500df01ac 100644
--- a/xen/arch/x86/debug.c
+++ b/xen/arch/x86/debug.c
@@ -98,7 +98,7 @@ dbg_pv_va2mfn(dbgva_t vaddr, struct domain *dp, uint64_t 
pgd3val)
 l2_pgentry_t l2e, *l2t;
 l1_pgentry_t l1e, *l1t;
 unsigned long cr3 = (pgd3val ? pgd3val : dp->vcpu[0]->arch.cr3);
-mfn_t mfn = maddr_to_mfn(cr3);
+mfn_t mfn = maddr_to_mfn(cr3_pa(cr3));
 
 DBGP2("vaddr:%lx domid:%d cr3:%lx pgd3:%lx\n", vaddr, dp->domain_id, 
   cr3, pgd3val);
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index 11b6a5421a..0c24530ed9 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -51,7 +51,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
 if ( (v = idle_vcpu[smp_processor_id()]) == current )
 sync_local_execstate();
 /* We must now be running on the idle page table. */
-ASSERT(read_cr3() == __pa(idle_pg_table));
+ASSERT(cr3_pa(read_cr3()) == __pa(idle_pg_table));
 }
 
 return v;
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 71d32c0333..36628459dc 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -288,6 +288,16 @@ static inline void write_cr3(unsigned long val)
 asm volatile ( "mov %0, %%cr3" : : "r" (val) : "memory" );
 }
 
+static inline unsigned long cr3_pa(unsigned long cr3)
+{
+return cr3 & X86_CR3_ADDR_MASK;
+}
+
+static inline unsigned long cr3_pcid(unsigned long cr3)
+{
+return cr3 & X86_CR3_PCID_MASK;
+}
+
 static inline unsigned long read_cr4(void)
 {
 return get_cpu_info()->cr4;
diff --git a/xen/include/asm-x86/x86-defns.h b/xen/include/asm-x86/x86-defns.h
index ff8d66be3c..904041e1ab 100644
--- a/xen/include/asm-x86/x86-defns.h
+++ b/xen/include/asm-x86/x86-defns.h
@@ -45,7 +45,9 @@
 /*
  * Intel CPU flags in CR3
  */
-#define X86_CR3_NOFLUSH (_AC(1, ULL) << 63)
+#define X86_CR3_NOFLUSH(_AC(1, ULL) << 63)
+#define X86_CR3_ADDR_MASK  (PAGE_MASK & PADDR_MASK)
+#define X86_CR3_PCID_MASK  _AC(0x0fff, ULL) /* Mask for PCID */
 
 /*
  * Intel CPU features in CR4
-- 
2.13.6


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v8 5/9] xen/x86: disable global pages for domains with XPTI active

2018-04-18 Thread Juergen Gross
Instead of flushing the TLB from global pages when switching address
spaces with XPTI being active just disable global pages via %cr4
completely when a domain subject to XPTI is active. This avoids the
need for extra TLB flushes as loading %cr3 will remove all TLB
entries.

In order to avoid states with cr3/cr4 having inconsistent values
(e.g. global pages being activated while cr3 already specifies a XPTI
address space) move loading of the new cr4 value to write_ptbase()
(actually to switch_cr3_cr4() called by write_ptbase()).

This requires to use switch_cr3_cr4() instead of write_ptbase() when
building dom0 in order to avoid setting cr4 with cr4.smap set.

Signed-off-by: Juergen Gross 
Reviewed-by: Jan Beulich 
---
V7:
- use switch_cr3_cr4() in dom0_build.c

V6:
- don't call read_cr4() multiple times in switch_cr3_cr4()
  (Andrew Cooper)

V4:
- don't use mmu_cr4_features for setting new cr4 value (Jan Beulich)
- use simpler scheme for setting X86_CR4_PGE in
  pv_guest_cr4_to_real_cr4() (Jan Beulich)

V3:
- move cr4 loading for all domains from *_ctxt_switch_to() to
  write_cr3_cr4() called by write_ptbase() (Jan Beulich)
- rebase
---
 xen/arch/x86/domain.c  |  5 -
 xen/arch/x86/flushtlb.c| 17 -
 xen/arch/x86/mm.c  | 14 +++---
 xen/arch/x86/pv/dom0_build.c   |  6 +++---
 xen/arch/x86/x86_64/entry.S| 10 --
 xen/common/efi/runtime.c   |  4 ++--
 xen/include/asm-x86/domain.h   |  3 ++-
 xen/include/asm-x86/flushtlb.h |  2 +-
 8 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 3d9c19d055..9b001a03ec 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1523,17 +1523,12 @@ void paravirt_ctxt_switch_from(struct vcpu *v)
 void paravirt_ctxt_switch_to(struct vcpu *v)
 {
 root_pgentry_t *root_pgt = this_cpu(root_pgt);
-unsigned long cr4;
 
 if ( root_pgt )
 root_pgt[root_table_offset(PERDOMAIN_VIRT_START)] =
 l4e_from_page(v->domain->arch.perdomain_l3_pg,
   __PAGE_HYPERVISOR_RW);
 
-cr4 = pv_guest_cr4_to_real_cr4(v);
-if ( unlikely(cr4 != read_cr4()) )
-write_cr4(cr4);
-
 if ( unlikely(v->arch.debugreg[7] & DR7_ACTIVE_MASK) )
 activate_debugregs(v);
 
diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 717f3bd19b..d48b900100 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -90,20 +90,27 @@ static void do_tlb_flush(void)
 post_flush(t);
 }
 
-void switch_cr3(unsigned long cr3)
+void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
 {
-unsigned long flags, cr4;
+unsigned long flags, old_cr4;
 u32 t;
 
 /* This non-reentrant function is sometimes called in interrupt context. */
 local_irq_save(flags);
 
 t = pre_flush();
-cr4 = read_cr4();
 
-write_cr4(cr4 & ~X86_CR4_PGE);
+old_cr4 = read_cr4();
+if ( old_cr4 & X86_CR4_PGE )
+{
+old_cr4 = cr4 & ~X86_CR4_PGE;
+write_cr4(old_cr4);
+}
+
 write_cr3(cr3);
-write_cr4(cr4);
+
+if ( old_cr4 != cr4 )
+write_cr4(cr4);
 
 post_flush(t);
 
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9c36614099..73a38e8715 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -503,20 +503,28 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
 void write_ptbase(struct vcpu *v)
 {
 struct cpu_info *cpu_info = get_cpu_info();
+unsigned long new_cr4;
+
+new_cr4 = (is_pv_vcpu(v) && !is_idle_vcpu(v))
+  ? pv_guest_cr4_to_real_cr4(v)
+  : ((read_cr4() & ~X86_CR4_TSD) | X86_CR4_PGE);
 
 if ( is_pv_vcpu(v) && v->domain->arch.pv_domain.xpti )
 {
 cpu_info->root_pgt_changed = true;
 cpu_info->pv_cr3 = __pa(this_cpu(root_pgt));
-switch_cr3(v->arch.cr3);
+switch_cr3_cr4(v->arch.cr3, new_cr4);
 }
 else
 {
-/* Make sure to clear xen_cr3 before pv_cr3; switch_cr3() serializes. 
*/
+/* Make sure to clear xen_cr3 before pv_cr3. */
 cpu_info->xen_cr3 = 0;
-switch_cr3(v->arch.cr3);
+/* switch_cr3_cr4() serializes. */
+switch_cr3_cr4(v->arch.cr3, new_cr4);
 cpu_info->pv_cr3 = 0;
 }
+
+ASSERT(is_pv_vcpu(v) || read_cr4() == mmu_cr4_features);
 }
 
 /*
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index d148395919..4465a059a8 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -717,7 +717,7 @@ int __init dom0_construct_pv(struct domain *d,
 update_cr3(v);
 
 /* We run on dom0's page tables for the final part of the build process. */
-write_ptbase(v);
+switch_cr3_cr4(v->arch.cr3, read_cr4());
 mapcache_override_current(v);
 
 /* Copy the OS image and free temporary buffer. */
@@ -738,7 +738,7 @@ int __init dom0_construct_pv(struct domain *d,
  (parms.virt_hypercall >= 

[Xen-devel] [PATCH v8 1/9] x86/xpti: avoid copying L4 page table contents when possible

2018-04-18 Thread Juergen Gross
For mitigation of Meltdown the current L4 page table is copied to the
cpu local root page table each time a 64 bit pv guest is entered.

Copying can be avoided in cases where the guest L4 page table hasn't
been modified while running the hypervisor, e.g. when handling
interrupts or any hypercall not modifying the L4 page table or %cr3.

So add a per-cpu flag indicating whether the copying should be
performed and set that flag only when loading a new %cr3 or modifying
the L4 page table.  This includes synchronization of the cpu local
root page table with other cpus, so add a special synchronization flag
for that case.

A simple performance check (compiling the hypervisor via "make -j 4")
in dom0 with 4 vcpus shows a significant improvement:

- real time drops from 112 seconds to 103 seconds
- system time drops from 142 seconds to 131 seconds

Signed-off-by: Juergen Gross 
Reviewed-by: Jan Beulich 
---
V8:
- more shadow code: set flag on all affected cpus (Tim Deegan)

V7:
- add missing flag setting in shadow code

V6:
- correct an error from rebasing to staging in assembly part

V4:
- move setting of root_pgt_changed flag in flush_area_local() out of
  irq disabled section (Jan Beulich)
- move setting of root_pgt_changed in make_cr3() to _toggle_guest_pt()
  (Jan Beulich)
- remove most conditionals in write_ptbase() (Jan Beulich)
- don't set root_pgt_changed in do_mmu_update() for modification of
  the user page table (Jan Beulich)

V3:
- set flag locally only if affected L4 is active (Jan Beulich)
- add setting flag to flush_area_mask() (Jan Beulich)
- set flag in make_cr3() only if called for current active vcpu

To be applied on top of Jan's "Meltdown band-aid overhead reduction"
series
---
 xen/arch/x86/flushtlb.c   | 16 
 xen/arch/x86/mm.c | 36 +++-
 xen/arch/x86/mm/shadow/multi.c| 10 ++
 xen/arch/x86/pv/domain.c  |  2 ++
 xen/arch/x86/smp.c|  2 +-
 xen/arch/x86/x86_64/asm-offsets.c |  1 +
 xen/arch/x86/x86_64/entry.S   |  9 +++--
 xen/include/asm-x86/current.h |  8 
 xen/include/asm-x86/flushtlb.h|  5 +
 9 files changed, 73 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 8a7a76b8ff..ad19ba0f5d 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -8,6 +8,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -160,5 +161,20 @@ unsigned int flush_area_local(const void *va, unsigned int 
flags)
 
 local_irq_restore(irqfl);
 
+if ( flags & FLUSH_ROOT_PGTBL )
+get_cpu_info()->root_pgt_changed = true;
+
 return flags;
 }
+
+void flush_root_pgt_mask(cpumask_t *mask)
+{
+int cpu;
+struct cpu_info *cpu_info;
+
+for_each_cpu(cpu, mask)
+{
+cpu_info = (struct cpu_info *)(stack_base[cpu] + STACK_SIZE) - 1;
+cpu_info->root_pgt_changed = true;
+}
+}
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9fe5583fc3..7d960c742e 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -502,6 +502,7 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
 
 void write_ptbase(struct vcpu *v)
 {
+get_cpu_info()->root_pgt_changed = true;
 write_cr3(v->arch.cr3);
 }
 
@@ -3699,18 +3700,27 @@ long do_mmu_update(
 break;
 rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
   cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-/*
- * No need to sync if all uses of the page can be accounted
- * to the page lock we hold, its pinned status, and uses on
- * this (v)CPU.
- */
-if ( !rc && !cpu_has_no_xpti &&
- ((page->u.inuse.type_info & PGT_count_mask) >
-  (1 + !!(page->u.inuse.type_info & PGT_pinned) +
-   (pagetable_get_pfn(curr->arch.guest_table) == mfn) +
-   (pagetable_get_pfn(curr->arch.guest_table_user) ==
-mfn))) )
-sync_guest = true;
+if ( !rc && !cpu_has_no_xpti )
+{
+bool local_in_use = false;
+
+if ( pagetable_get_pfn(curr->arch.guest_table) == mfn )
+{
+local_in_use = true;
+get_cpu_info()->root_pgt_changed = true;
+}
+
+/*
+ * No need to sync if all uses of the page can be
+ * accounted to the page lock we hold, its pinned
+ * status, and uses on this (v)CPU.
+ */
+if ( (page->u.inuse.type_info & PGT_count_mask) >
+ (1 + 

Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view

2018-04-18 Thread Razvan Cojocaru
On 04/17/2018 04:53 PM, George Dunlap wrote:
> On 04/17/2018 11:50 AM, Razvan Cojocaru wrote:
>>  void p2m_init_altp2m_ept(struct domain *d, unsigned int i)
>>  {
>>  struct p2m_domain *p2m = d->arch.altp2m_p2m[i];
>> +struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
>>  struct ept_data *ept;
>>  
>> +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
>> +p2m->default_access = hostp2m->default_access;
>> +p2m->domain = hostp2m->domain;
>> +p2m->logdirty_ranges = hostp2m->logdirty_ranges;
>> +p2m->global_logdirty = hostp2m->global_logdirty;
> 
> This would certainly be one approach.  But then we'd need to keep a lot
> more of these things in sync -- for instance, we'd have to have similar
> code to enable and disable global_logdirty on all active altp2m entries.
>
> [...]
> 
> The other thing that seems to be missing from synchronization is that in
> p2m-ept.c:ept_enable_pml() sets the p2m->ept.ad bit (which ends up being
> part of the eptp).  The code seems to indicate that this is required for
> PML (hardware-assisted logdirty), but I don't see anywhere this is set
> or copied from the host p2m.
> 
> It might be nice to have a more structured way of keeping all these
> changes in sync, rather than relying on this open-coding everywhere.

For logdirty_ranges and global_logdirty, I propose that we put them both
in a dynamically allocated struct, and have all p2ms share a pointer to
them. That way, all that's required is for the pointer to be set up in
p2m_init_altp2m_ept() and the actual data will be automatically shared
between p2ms. If I've read the code correctly, the hostp2m is the last
to be destroyed and the first to be initialized, so it should work well
as long as all p2ms synchronize access to logdirty_ranges and
global_logdirty (which I assume they already do).

ept.ad probably needs to be propagated on each change and copied from
the hostp2m on altp2m init.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages

2018-04-18 Thread Jan Beulich
>>> On 06.12.17 at 08:50,  wrote:
> One 4K-byte page at most contains 128 'ioreq_t'. In order to remove the vcpu
> number constraint imposed by one IOREQ page, bump the number of IOREQ page to
> 4 pages. With this patch, multiple pages can be used as IOREQ page.

In case I didn't say so before - I'm opposed to simply changing the upper limit
here. Please make sure any number of vCPU-s can be supported by the new
code (it looks like it mostly does already, so it may be mainly the description
which needs changing). If we want to impose an upper limit, this shouldn't
affect the ioreq page handling code at all.

> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -64,14 +64,24 @@ static struct hvm_ioreq_server *get_ioreq_server(const 
> struct domain *d,
>  continue; \
>  else
>  
> +/* Iterate over all ioreq pages */
> +#define FOR_EACH_IOREQ_PAGE(s, i, iorp) \
> +for ( (i) = 0, iorp = s->ioreq; (i) < (s)->ioreq_page_nr; (i)++, iorp++ )

You're going too far with parenthesization here: Just like iorp, i is required 
to
be a simple identifier anyway. Otoh you parenthesize s only once.

> +static ioreq_t *get_ioreq_fallible(struct hvm_ioreq_server *s, struct vcpu 
> *v)

What is "fallible"? And why such a separate wrapper anyway? But I guess a lot
of this will need to change anyway with Paul's recent changes. I'm therefore not
going to look in close detail at any of this.

>  static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
>  {
> -hvm_unmap_ioreq_gfn(s, >ioreq);
> +int i;

At the example of this - unsigned int please in all cases where the value can't 
go
negative.

> @@ -688,8 +741,15 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server 
> *s,
>  INIT_LIST_HEAD(>ioreq_vcpu_list);
>  spin_lock_init(>bufioreq_lock);
>  
> -s->ioreq.gfn = INVALID_GFN;
> +FOR_EACH_IOREQ_PAGE(s, i, iorp)
> +iorp->gfn = INVALID_GFN;
>  s->bufioreq.gfn = INVALID_GFN;
> +s->ioreq_page_nr = (d->max_vcpus + IOREQ_NUM_PER_PAGE - 1) /
> +   IOREQ_NUM_PER_PAGE;

DIV_ROUND_UP() - please don't open-code things.

> --- a/xen/include/public/hvm/params.h
> +++ b/xen/include/public/hvm/params.h
> @@ -279,6 +279,12 @@
>  #define XEN_HVM_MCA_CAP_LMCE   (xen_mk_ullong(1) << 0)
>  #define XEN_HVM_MCA_CAP_MASK   XEN_HVM_MCA_CAP_LMCE
>  
> -#define HVM_NR_PARAMS 39
> +/*
> + * Number of pages that are reserved for default IOREQ server. The base PFN
> + * is set via HVM_PARAM_IOREQ_PFN.
> + */
> +#define HVM_PARAM_IOREQ_PAGES 39

Why is this needed? It can be derived from the vCPU count permitted for the
domain.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [distros-debian-squeeze test] 74632: tolerable FAIL

2018-04-18 Thread Platform Team regression test user
flight 74632 distros-debian-squeeze real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/74632/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-i386-squeeze-netboot-pygrub 10 debian-di-install fail like 
74577
 test-amd64-i386-i386-squeeze-netboot-pygrub 10 debian-di-install fail like 
74577
 test-amd64-amd64-amd64-squeeze-netboot-pygrub 10 debian-di-install fail like 
74577
 test-amd64-i386-amd64-squeeze-netboot-pygrub 10 debian-di-install fail like 
74577

baseline version:
 flight   74577

jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-amd64-squeeze-netboot-pygrubfail
 test-amd64-i386-amd64-squeeze-netboot-pygrub fail
 test-amd64-amd64-i386-squeeze-netboot-pygrub fail
 test-amd64-i386-i386-squeeze-netboot-pygrub  fail



sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
http://osstest.xs.citrite.net/~osstest/testlogs/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Push not applicable.


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v7 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point

2018-04-18 Thread Linus Walleij
I wonder why I am starting to get CCed on Xen patches all of a sudden.

I happened to run into Jürgen at a conference only last weekend, but
I still don't know anything whatsoever about Xen or how it works.

If get_maintainer.pl has started to return my name on this stuff I
really want to know why :/

Yours,
Linus Walleij

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Fwd: [tip:x86/urgent] x86, sched: Allow topologies where NUMA nodes share an LLC

2018-04-18 Thread Anshul Makkar



On 4/17/18 6:02 PM, Dario Faggioli wrote:

On Tue, 2018-04-17 at 15:08 +0100, Andrew Cooper wrote:

On 17/04/18 15:02, Juergen Gross wrote:

Is this something we should be aware of in Xen, too?

If we had something close to a working topology representation,
probably.


True, as far as letting the guest know about these details (in
appropriate circumnstaces) goes.

About using it _within_ Xen, well:

"Intel's Skylake Server CPUs have a different LLC topology than
previous
generations. When in Sub-NUMA-Clustering (SNC) mode, the package is
divided
into two "slices", each containing half the cores, half the LLC, and
one
memory controller and each slice is enumerated to Linux as a NUMA
node. This is similar to how the cores and LLC were arranged for the
Cluster-On-Die (CoD) feature."

This looks similar (but not identical) to, e.g., AMD EPYC. If Xen also
sees each slice as a NUMA node as well, we already are doing something
(although, of course, we can always improve).

If not, I see ways of taking advantage of the information that not all
the cores in the socket equally share the LLC, e.g., in Credit2, by
providing a per-LLC runqueue option and/or by taking that into account
in the 'migration resistance' mechanisms (there is already work in that
direction).

Yeah, per-LLC runqueue option sounds like a good option. We can look
into that.
Yes, I am working on dynamically definining migration resistance as a 
factor of
cache and cpu topology ie. instead of having a constant value of 
migration resistance
it will consider the placement of cpus, cache and sharing of cache among 
cpus.


Regards,
Dario

Anshul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Oleksandr Andrushchenko

On 04/18/2018 10:35 AM, Roger Pau Monné wrote:

On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko wrote:

On 04/17/2018 11:57 PM, Dongwon Kim wrote:

On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:

On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:

3.2 Backend exports dma-buf to xen-front

In this case Dom0 pages are shared with DomU. As before, DomU can only write
to these pages, not any other page from Dom0, so it can be still considered
safe.
But, the following must be considered (highlighted in xen-front's Kernel
documentation):
  - If guest domain dies then pages/grants received from the backend cannot
    be claimed back - think of it as memory lost to Dom0 (won't be used for
any
    other guest)
  - Misbehaving guest may send too many requests to the backend exhausting
    its grant references and memory (consider this from security POV). As the
    backend runs in the trusted domain we also assume that it is trusted as
well,
    e.g. must take measures to prevent DDoS attacks.

I cannot parse the above sentence:

"As the backend runs in the trusted domain we also assume that it is
trusted as well, e.g. must take measures to prevent DDoS attacks."

What's the relation between being trusted and protecting from DoS
attacks?

I mean that we trust the backend that it can prevent Dom0
from crashing in case DomU's frontend misbehaves, e.g.
if the frontend sends too many memory requests etc.

In any case, all? PV protocols are implemented with the frontend
sharing pages to the backend, and I think there's a reason why this
model is used, and it should continue to be used.

This is the first use-case above. But there are real-world
use-cases (embedded in my case) when physically contiguous memory
needs to be shared, one of the possible ways to achieve this is
to share contiguous memory from Dom0 to DomU (the second use-case above)

Having to add logic in the backend to prevent such attacks means
that:

  - We need more code in the backend, which increases complexity and
chances of bugs.
  - Such code/logic could be wrong, thus allowing DoS.

You can live without this code at all, but this is then up to
backend which may make Dom0 down because of DomU's frontend doing evil 
things

4. xen-front/backend/xen-zcopy synchronization

4.1. As I already said in 2) all the inter VM communication happens between
xen-front and the backend, xen-zcopy is NOT involved in that.
When xen-front wants to destroy a display buffer (dumb/dma-buf) it issues a
XENDISPL_OP_DBUF_DESTROY command (opposite to XENDISPL_OP_DBUF_CREATE).
This call is synchronous, so xen-front expects that backend does free the
buffer pages on return.

4.2. Backend, on XENDISPL_OP_DBUF_DESTROY:
   - closes all dumb handles/fd's of the buffer according to [3]
   - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-zcopy to make
sure
     the buffer is freed (think of it as it waits for dma-buf->release
callback)

So this zcopy thing keeps some kind of track of the memory usage? Why
can't the user-space backend keep track of the buffer usage?

Because there is no dma-buf UAPI which allows to track the buffer life cycle
(e.g. wait until dma-buf's .release callback is called)

   - replies to xen-front that the buffer can be destroyed.
This way deletion of the buffer happens synchronously on both Dom0 and DomU
sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns with time-out
error
(BTW, wait time is a parameter of this IOCTL), Xen will defer grant
reference
removal and will retry later until those are free.

Hope this helps understand how buffers are synchronously deleted in case
of xen-zcopy with a single protocol command.

I think the above logic can also be re-used by the hyper-dmabuf driver with
some additional work:

1. xen-zcopy can be split into 2 parts and extend:
1.1. Xen gntdev driver [4], [5] to allow creating dma-buf from grefs and
vise versa,

I don't know much about the dma-buf implementation in Linux, but
gntdev is a user-space device, and AFAICT user-space applications
don't have any notion of dma buffers. How are such buffers useful for
user-space? Why can't this just be called memory?

A dma-buf is seen by user-space as a file descriptor and you can
pass it to different drivers then. For example, you can share a buffer
used by a display driver for scanout with a GPU, to compose a picture
into it:
1. User-space (US) allocates a display buffer from display driver
2. US asks display driver to export the dma-buf which backs up that buffer,
US gets buffer's fd: dma_buf_fd
3. US asks GPU driver to import a buffer and provides it with dma_buf_fd
4. GPU renders contents into display buffer (dma_buf_fd)

Finally, this is indeed some memory, but a bit more [1]


Also, (with my FreeBSD maintainer hat) how is this going to translate
to other OSes? So far the operations performed by the gntdev device
are mostly OS-agnostic because this just map/unmap memory, and in fact
they are implemented 

Re: [Xen-devel] [PATCH] x86/p2m: fixed p2m_change_type_range() start / end check

2018-04-18 Thread Razvan Cojocaru


On 04/18/2018 10:48 AM, Razvan Cojocaru wrote:
> On 04/18/2018 10:38 AM, Jan Beulich wrote:
> On 17.04.18 at 19:16,  wrote:
>>> --- a/xen/arch/x86/mm/p2m.c
>>> +++ b/xen/arch/x86/mm/p2m.c
>>> @@ -976,6 +976,13 @@ void p2m_change_type_range(struct domain *d,
>>>  ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt));
>>>  
>>>  p2m_lock(p2m);
>>> +
>>> +if ( start > p2m->max_mapped_pfn )
>>> +{
>>> +p2m_unlock(p2m);
>>> +return;
>>> +}
>>
>> I realize this is what George has suggested, but I still wonder if this is 
>> the
>> right thing to do here: Why is this any more important to check than the
>> more general start >= end (in which case the assertion in the rangeset
>> code would also trigger)? Till now the function assumes "sensible" things
>> to be passed in, but specifically also permitting the [start,~0UL] case
>> (just in a more generalizer fashion). The problem you're trying to fix here
>> is something passing in a nonsense range (starting above the valid range).
>> Yet if we want the function to be immune to nonsense being passed in, I
>> think start < end is what needs checking for, and that check would then
>> perhaps better go after end was already adjusted.
>>
>> The obvious alternative is for callers to only pass in sane ranges (in which
>> case adding ASSERT() here may be considered, instead of triggering the
>> one in the rangeset code).
>>
>> In any event you want to add unlikely(), just like the similar end check has.
> 
> FWIW I'll be happy to change the test to unlikely( start < end ), if
> this is OK with George. I'll test the change now.

Sorry, unlikely(end <= start), obviously.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/p2m: fixed p2m_change_type_range() start / end check

2018-04-18 Thread Razvan Cojocaru
On 04/18/2018 10:38 AM, Jan Beulich wrote:
 On 17.04.18 at 19:16,  wrote:
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -976,6 +976,13 @@ void p2m_change_type_range(struct domain *d,
>>  ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt));
>>  
>>  p2m_lock(p2m);
>> +
>> +if ( start > p2m->max_mapped_pfn )
>> +{
>> +p2m_unlock(p2m);
>> +return;
>> +}
> 
> I realize this is what George has suggested, but I still wonder if this is the
> right thing to do here: Why is this any more important to check than the
> more general start >= end (in which case the assertion in the rangeset
> code would also trigger)? Till now the function assumes "sensible" things
> to be passed in, but specifically also permitting the [start,~0UL] case
> (just in a more generalizer fashion). The problem you're trying to fix here
> is something passing in a nonsense range (starting above the valid range).
> Yet if we want the function to be immune to nonsense being passed in, I
> think start < end is what needs checking for, and that check would then
> perhaps better go after end was already adjusted.
> 
> The obvious alternative is for callers to only pass in sane ranges (in which
> case adding ASSERT() here may be considered, instead of triggering the
> one in the rangeset code).
> 
> In any event you want to add unlikely(), just like the similar end check has.

FWIW I'll be happy to change the test to unlikely( start < end ), if
this is OK with George. I'll test the change now.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/p2m: fixed p2m_change_type_range() start / end check

2018-04-18 Thread Jan Beulich
>>> On 17.04.18 at 19:16,  wrote:
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -976,6 +976,13 @@ void p2m_change_type_range(struct domain *d,
>  ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt));
>  
>  p2m_lock(p2m);
> +
> +if ( start > p2m->max_mapped_pfn )
> +{
> +p2m_unlock(p2m);
> +return;
> +}

I realize this is what George has suggested, but I still wonder if this is the
right thing to do here: Why is this any more important to check than the
more general start >= end (in which case the assertion in the rangeset
code would also trigger)? Till now the function assumes "sensible" things
to be passed in, but specifically also permitting the [start,~0UL] case
(just in a more generalizer fashion). The problem you're trying to fix here
is something passing in a nonsense range (starting above the valid range).
Yet if we want the function to be immune to nonsense being passed in, I
think start < end is what needs checking for, and that check would then
perhaps better go after end was already adjusted.

The obvious alternative is for callers to only pass in sane ranges (in which
case adding ASSERT() here may be considered, instead of triggering the
one in the rangeset code).

In any event you want to add unlikely(), just like the similar end check has.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Roger Pau Monné
On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko wrote:
> On 04/17/2018 11:57 PM, Dongwon Kim wrote:
> > On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:
> > > On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:
> 3.2 Backend exports dma-buf to xen-front
> 
> In this case Dom0 pages are shared with DomU. As before, DomU can only write
> to these pages, not any other page from Dom0, so it can be still considered
> safe.
> But, the following must be considered (highlighted in xen-front's Kernel
> documentation):
>  - If guest domain dies then pages/grants received from the backend cannot
>    be claimed back - think of it as memory lost to Dom0 (won't be used for
> any
>    other guest)
>  - Misbehaving guest may send too many requests to the backend exhausting
>    its grant references and memory (consider this from security POV). As the
>    backend runs in the trusted domain we also assume that it is trusted as
> well,
>    e.g. must take measures to prevent DDoS attacks.

I cannot parse the above sentence:

"As the backend runs in the trusted domain we also assume that it is
trusted as well, e.g. must take measures to prevent DDoS attacks."

What's the relation between being trusted and protecting from DoS
attacks?

In any case, all? PV protocols are implemented with the frontend
sharing pages to the backend, and I think there's a reason why this
model is used, and it should continue to be used.

Having to add logic in the backend to prevent such attacks means
that:

 - We need more code in the backend, which increases complexity and
   chances of bugs.
 - Such code/logic could be wrong, thus allowing DoS.

> 4. xen-front/backend/xen-zcopy synchronization
> 
> 4.1. As I already said in 2) all the inter VM communication happens between
> xen-front and the backend, xen-zcopy is NOT involved in that.
> When xen-front wants to destroy a display buffer (dumb/dma-buf) it issues a
> XENDISPL_OP_DBUF_DESTROY command (opposite to XENDISPL_OP_DBUF_CREATE).
> This call is synchronous, so xen-front expects that backend does free the
> buffer pages on return.
> 
> 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY:
>   - closes all dumb handles/fd's of the buffer according to [3]
>   - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-zcopy to make
> sure
>     the buffer is freed (think of it as it waits for dma-buf->release
> callback)

So this zcopy thing keeps some kind of track of the memory usage? Why
can't the user-space backend keep track of the buffer usage?

>   - replies to xen-front that the buffer can be destroyed.
> This way deletion of the buffer happens synchronously on both Dom0 and DomU
> sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns with time-out
> error
> (BTW, wait time is a parameter of this IOCTL), Xen will defer grant
> reference
> removal and will retry later until those are free.
> 
> Hope this helps understand how buffers are synchronously deleted in case
> of xen-zcopy with a single protocol command.
> 
> I think the above logic can also be re-used by the hyper-dmabuf driver with
> some additional work:
> 
> 1. xen-zcopy can be split into 2 parts and extend:
> 1.1. Xen gntdev driver [4], [5] to allow creating dma-buf from grefs and
> vise versa,

I don't know much about the dma-buf implementation in Linux, but
gntdev is a user-space device, and AFAICT user-space applications
don't have any notion of dma buffers. How are such buffers useful for
user-space? Why can't this just be called memory?

Also, (with my FreeBSD maintainer hat) how is this going to translate
to other OSes? So far the operations performed by the gntdev device
are mostly OS-agnostic because this just map/unmap memory, and in fact
they are implemented by Linux and FreeBSD.

> implement "wait" ioctl (wait for dma-buf->release): currently these are
> DRM_XEN_ZCOPY_DUMB_FROM_REFS, DRM_XEN_ZCOPY_DUMB_TO_REFS and
> DRM_XEN_ZCOPY_DUMB_WAIT_FREE
> 1.2. Xen balloon driver [6] to allow allocating contiguous buffers (not
> needed
> by current hyper-dmabuf, but is a must for xen-zcopy use-cases)

I think this needs clarifying. In which memory space do you need those
regions to be contiguous?

Do they need to be contiguous in host physical memory, or guest
physical memory?

If it's in guest memory space, isn't there any generic interface that
you can use?

If it's in host physical memory space, why do you need this buffer to
be contiguous in host physical memory space? The IOMMU should hide all
this.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] drm/xen-front: Remove CMA support

2018-04-18 Thread Oleksandr Andrushchenko

On 04/17/2018 12:08 PM, Oleksandr Andrushchenko wrote:

On 04/17/2018 12:04 PM, Daniel Vetter wrote:

On Tue, Apr 17, 2018 at 10:40:12AM +0300, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

Even if xen-front allocates its buffers from contiguous memory
those are still not contiguous in PA space, e.g. the buffer is only
contiguous in IPA space.
The only use-case for this mode was if xen-front is used to allocate
dumb buffers which later be used by some other driver requiring
contiguous memory, but there is no currently such a use-case or
it can be worked around with xen-front.
Please also mention the nents confusion here, and the patch that 
fixes it.

Or just outright take the commit message from my patch with all the
details:

ok, if you don't mind then I'll use your commit message entirely

 drm/xen: Dissable CMA support
  It turns out this was only needed to paper over a bug in 
the CMA

 helpers, which was addressed in
  commit 998fb1a0f478b83492220ff79583bf9ad538bdd8
 Author: Liviu Dudau 
 Date:   Fri Nov 10 13:33:10 2017 +
  drm: gem_cma_helper.c: Allow importing of contiguous 
scatterlists with nents > 1

  Without this the following pipeline didn't work:
  domU:
 1. xen-front allocates a non-contig buffer
 2. creates grants out of it
  dom0:
 3. converts the grants into a dma-buf. Since they're non-contig, 
the

 scatter-list is huge.
 4. imports it into rcar-du, which requires dma-contig memory for
 scanout.
  -> On this given platform there's an IOMMU, so in theory 
this should

 work. But in practice this failed, because of the huge number of sg
 entries, even though the IOMMU driver mapped it all into a 
dma-contig

 range.
  With a guest-contig buffer allocated in step 1, this 
problem doesn't

 exist. But there's technically no reason to require guest-contig
 memory for xen buffer sharing using grants.

With the commit message improved:

Acked-by: Daniel Vetter 

Thank you,
I'll wait for a day and apply to drm-misc-next if this is ok

applied to drm-misc-next


Signed-off-by: Oleksandr Andrushchenko 


Suggested-by: Daniel Vetter 
---
  Documentation/gpu/xen-front.rst | 12 
  drivers/gpu/drm/xen/Kconfig | 13 
  drivers/gpu/drm/xen/Makefile    |  9 +--
  drivers/gpu/drm/xen/xen_drm_front.c | 62 +++-
  drivers/gpu/drm/xen/xen_drm_front.h | 42 ++-
  drivers/gpu/drm/xen/xen_drm_front_gem.c | 12 +---
  drivers/gpu/drm/xen/xen_drm_front_gem.h |  3 -
  drivers/gpu/drm/xen/xen_drm_front_gem_cma.c | 79 
-

  drivers/gpu/drm/xen/xen_drm_front_shbuf.c   | 22 --
  drivers/gpu/drm/xen/xen_drm_front_shbuf.h   |  8 ---
  10 files changed, 21 insertions(+), 241 deletions(-)
  delete mode 100644 drivers/gpu/drm/xen/xen_drm_front_gem_cma.c

diff --git a/Documentation/gpu/xen-front.rst 
b/Documentation/gpu/xen-front.rst

index 009d942386c5..d988da7d1983 100644
--- a/Documentation/gpu/xen-front.rst
+++ b/Documentation/gpu/xen-front.rst
@@ -18,18 +18,6 @@ Buffers allocated by the frontend driver
  .. kernel-doc:: drivers/gpu/drm/xen/xen_drm_front.h
 :doc: Buffers allocated by the frontend driver
  -With GEM CMA helpers
-
-
-.. kernel-doc:: drivers/gpu/drm/xen/xen_drm_front.h
-   :doc: With GEM CMA helpers
-
-Without GEM CMA helpers
-~~~
-
-.. kernel-doc:: drivers/gpu/drm/xen/xen_drm_front.h
-   :doc: Without GEM CMA helpers
-
  Buffers allocated by the backend
  
  diff --git a/drivers/gpu/drm/xen/Kconfig 
b/drivers/gpu/drm/xen/Kconfig

index 4f4abc91f3b6..4cca160782ab 100644
--- a/drivers/gpu/drm/xen/Kconfig
+++ b/drivers/gpu/drm/xen/Kconfig
@@ -15,16 +15,3 @@ config DRM_XEN_FRONTEND
  help
    Choose this option if you want to enable a para-virtualized
    frontend DRM/KMS driver for Xen guest OSes.
-
-config DRM_XEN_FRONTEND_CMA
-    bool "Use DRM CMA to allocate dumb buffers"
-    depends on DRM_XEN_FRONTEND
-    select DRM_KMS_CMA_HELPER
-    select DRM_GEM_CMA_HELPER
-    help
-  Use DRM CMA helpers to allocate display buffers.
-  This is useful for the use-cases when guest driver needs to
-  share or export buffers to other drivers which only expect
-  contiguous buffers.
-  Note: in this mode driver cannot use buffers allocated
-  by the backend.
diff --git a/drivers/gpu/drm/xen/Makefile 
b/drivers/gpu/drm/xen/Makefile

index 352730dc6c13..712afff5ffc3 100644
--- a/drivers/gpu/drm/xen/Makefile
+++ b/drivers/gpu/drm/xen/Makefile
@@ -5,12 +5,7 @@ drm_xen_front-objs := xen_drm_front.o \
    xen_drm_front_conn.o \
    xen_drm_front_evtchnl.o \
    

Re: [Xen-devel] [PATCH v7 1/9] x86/xpti: avoid copying L4 page table contents when possible

2018-04-18 Thread Juergen Gross
On 16/04/18 20:22, Juergen Gross wrote:
> On 16/04/18 17:34, Tim Deegan wrote:
>> Hi,
>>
>> At 02:43 -0600 on 13 Apr (1523587395), Jan Beulich wrote:
>> On 12.04.18 at 20:09,  wrote:
 For mitigation of Meltdown the current L4 page table is copied to the
 cpu local root page table each time a 64 bit pv guest is entered.

 Copying can be avoided in cases where the guest L4 page table hasn't
 been modified while running the hypervisor, e.g. when handling
 interrupts or any hypercall not modifying the L4 page table or %cr3.

 So add a per-cpu flag indicating whether the copying should be
 performed and set that flag only when loading a new %cr3 or modifying
 the L4 page table.  This includes synchronization of the cpu local
 root page table with other cpus, so add a special synchronization flag
 for that case.

 A simple performance check (compiling the hypervisor via "make -j 4")
 in dom0 with 4 vcpus shows a significant improvement:

 - real time drops from 112 seconds to 103 seconds
 - system time drops from 142 seconds to 131 seconds

 Signed-off-by: Juergen Gross 
 Reviewed-by: Jan Beulich 
 ---
 V7:
 - add missing flag setting in shadow code
>>>
>>> This now needs an ack from Tim (now Cc-ed).
>>
>> I may be misunderstanding how this flag is supposed to work, but this
>> seems at first glance to do both too much and too little.  The sl4
>> table that's being modified may be in use on any number of other
>> pcpus, and might not be in use on the current pcpu.
> 
> Okay.
> 
>> It looks like the do_mmu_update path issues a flush IPI; I think that
>> the equivalent IPI would be a better place to hook if you can.
> 
> I want to catch only cases where the L4 table is being modified.
> 
>> Also I'm not sure why the flag needs to be set in
>> l4e_propagate_from_guest() as well as shadow_set_l4e().  Can you
>> elaborate?
> 
> I narrowed down iteratively where the setting of the flag is mandatory.
> Doing it in shadow_set_l4e() seemed to look right, but it wasn't enough.
> Setting it also in l4e_propagate_from_guest() showed no further problems
> migrating the guest.

I just revisited this after looking into the code. Seems I remembered
the sequence of narrowing down wrong. l4e_propagate_from_guest() doesn't
need to set the flag.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Oleksandr Andrushchenko

On 04/17/2018 11:57 PM, Dongwon Kim wrote:

On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:

On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:

Yeah, I definitely agree on the idea of expanding the use case to the
general domain where dmabuf sharing is used. However, what you are
targetting with proposed changes is identical to the core design of
hyper_dmabuf.

On top of this basic functionalities, hyper_dmabuf has driver level
inter-domain communication, that is needed for dma-buf remote tracking
(no fence forwarding though), event triggering and event handling, extra
meta data exchange and hyper_dmabuf_id that represents grefs
(grefs are shared implicitly on driver level)

This really isn't a positive design aspect of hyperdmabuf imo. The core
code in xen-zcopy (ignoring the ioctl side, which will be cleaned up) is
very simple & clean.

If there's a clear need later on we can extend that. But for now xen-zcopy
seems to cover the basic use-case needs, so gets the job done.


Also it is designed with frontend (common core framework) + backend
(hyper visor specific comm and memory sharing) structure for portability.
We just can't limit this feature to Xen because we want to use the same
uapis not only for Xen but also other applicable hypervisor, like ACORN.

See the discussion around udmabuf and the needs for kvm. I think trying to
make an ioctl/uapi that works for multiple hypervisors is misguided - it
likely won't work.

On top of that the 2nd hypervisor you're aiming to support is ACRN. That's
not even upstream yet, nor have I seen any patches proposing to land linux
support for ACRN. Since it's not upstream, it doesn't really matter for
upstream consideration. I'm doubting that ACRN will use the same grant
references as xen, so the same uapi won't work on ACRN as on Xen anyway.

Yeah, ACRN doesn't have grant-table. Only Xen supports it. But that is why
hyper_dmabuf has been architectured with the concept of backend.
If you look at the structure of backend, you will find that
backend is just a set of standard function calls as shown here:

struct hyper_dmabuf_bknd_ops {
 /* backend initialization routine (optional) */
 int (*init)(void);

 /* backend cleanup routine (optional) */
 int (*cleanup)(void);

 /* retreiving id of current virtual machine */
 int (*get_vm_id)(void);

 /* get pages shared via hypervisor-specific method */
 int (*share_pages)(struct page **pages, int vm_id,
int nents, void **refs_info);

 /* make shared pages unshared via hypervisor specific method */
 int (*unshare_pages)(void **refs_info, int nents);

 /* map remotely shared pages on importer's side via
  * hypervisor-specific method
  */
 struct page ** (*map_shared_pages)(unsigned long ref, int vm_id,
int nents, void **refs_info);

 /* unmap and free shared pages on importer's side via
  * hypervisor-specific method
  */
 int (*unmap_shared_pages)(void **refs_info, int nents);

 /* initialize communication environment */
 int (*init_comm_env)(void);

 void (*destroy_comm)(void);

 /* upstream ch setup (receiving and responding) */
 int (*init_rx_ch)(int vm_id);

 /* downstream ch setup (transmitting and parsing responses) */
 int (*init_tx_ch)(int vm_id);

 int (*send_req)(int vm_id, struct hyper_dmabuf_req *req, int wait);
};

All of these can be mapped with any hypervisor specific implementation.
We designed backend implementation for Xen using grant-table, Xen event
and ring buffer communication. For ACRN, we have another backend using Virt-IO
for both memory sharing and communication.

We tried to define this structure of backend to make it general enough (or
it can be even modified or extended to support more cases.) so that it can
fit to other hypervisor cases. Only requirements/expectation on the hypervisor
are page-level memory sharing and inter-domain communication, which I think
are standard features of modern hypervisor.

And please review common UAPIs that hyper_dmabuf and xen-zcopy supports. They
are very general. One is getting FD (dmabuf) and get those shared. The other
is generating dmabuf from global handle (secure handle hiding gref behind it).
On top of this, hyper_dmabuf has "unshare" and "query" which are also useful
for any cases.

So I don't know why we wouldn't want to try to make these standard in most of
hypervisor cases instead of limiting it to certain hypervisor like Xen.
Frontend-backend structre is optimal for this I think.


So I am wondering we can start with this hyper_dmabuf then modify it for
your use-case if needed and polish and fix any glitches if we want to
to use this for all general dma-buf usecases.

Imo xen-zcopy is a much more reasonable starting point for upstream, which
can