Re: [Xen-devel] [PATCH v7 0/9] xen/x86: various XPTI speedups

2018-04-13 Thread Juergen Gross
On 13/04/18 11:59, Andrew Cooper wrote:
> On 12/04/18 19:09, Juergen Gross wrote:
>> This patch series aims at reducing the overhead of the XPTI Meltdown
>> mitigation.
> 
> Sadly, there are still problems. 
> 
> The faulting instruction is `mov %r12, %cr3` which is trying to use
> noflush while %cr4.pcide is clear.

While I can see how that happened, I'm not sure why I didn't hit this
when testing my series. Could it be that some CPUs won't raise #GP in
this case?

Could you try the series without the last patch? Maybe it would be
possible to commit some of the patches at least.

I'm just about to leave for the Linux root conference in Kiev, so the
attached patch is only compile-tested. You might want to try that.


Juergen

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 22c5150444..34c77bcbe4 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -718,7 +718,7 @@ int __init dom0_construct_pv(struct domain *d,
 update_cr3

Re: [Xen-devel] [PATCH v7 0/9] xen/x86: various XPTI speedups

2018-04-13 Thread Andrew Cooper
On 12/04/18 19:09, Juergen Gross wrote:
> This patch series aims at reducing the overhead of the XPTI Meltdown
> mitigation.

Sadly, there are still problems. 

(XEN) [   13.486805] Dom0 has maximum 2 VCPUs
(XEN) [   13.486824] [ Xen-4.11.0-5.0.3-d  x86_64  debug=y   Not tainted ]
(XEN) [   13.486826] CPU:0
(XEN) [   13.486828] RIP:e008:[] switch_cr3_cr4+0x58/0x116
(XEN) [   13.486833] RFLAGS: 00010086   CONTEXT: hypervisor
(XEN) [   13.486836] rax: 00df   rbx: 0282   rcx: 82d0804b7fff
(XEN) [   13.486839] rdx: 00152660   rsi: 001526e0   rdi: 801071d4a000
(XEN) [   13.486841] rbp: 82d0804b78d8   rsp: 82d0804b78a8   r8:
(XEN) [   13.486844] r9:     r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
(XEN) [   13.486847] r12: 801071d4a000   r13: 57ea8000   r14: 001526e0
(XEN) [   13.486849] r15: 83107326f000   cr0: 8005003b   cr4: 00152660
(XEN) [   13.486851] cr3: 57ea8000   cr2:
(XEN) [   13.486853] fsb:    gsb:    gss:
(XEN) [   13.486855] ds:    es:    fs:    gs:    ss:    cs: e008
(XEN) [   13.486859] Xen code around (switch_cr3_cr4+0x58/0x116):
(XEN) [   13.486860]  00 00 66 0f 38 82 4d d0 <41> 0f 22 dc 4c 39 f2 75 56 4c 89 ea 81 e2 ff 0f
(XEN) [   13.486869] Xen stack trace from rsp=82d0804b78a8:
(XEN) [   13.486870]82d0804b78d8 82d0804466a2 83005a1f1000 0002
(XEN) [   13.486874]8200 83060fa0 82d0804b7d68 82d08044349e
(XEN) [   13.486878] 83060fa0 8200 0ff0
(XEN) [   13.486881] 001071d4c000 831071d4b000 831071d4c000
(XEN) [   13.486884]81d49000  0013 831071d4dff8
(XEN) [   13.486887]001071d5c000 831071d4d000 81d5e000 8100
(XEN) [   13.486891]01072000 001071d5d000 81d4a000 81d49000
(XEN) [   13.486894]831071d4c080 2000 0107 81d49000
(XEN) [   13.486897]8200 831071d4aff8 2000 0001
(XEN) [   13.486900]00800020 0080 570a 0004
(XEN) [   13.486903] 8000 831071d4dff0 82d080485580
(XEN) [   13.486907]83005a1f1000 05709ac2  832079bd182c
(XEN) [   13.486910]832079bd19e8
(XEN) [   13.486913]0001 82d0803fd5e8 81b051f0 0001
(XEN) [   13.486916]82d0803fd436 81001000 0001 82d0803fd410
(XEN) [   13.486919]8000 0001 82d0803fd429
(XEN) [   13.486923]0002 82d0803fd578 832079bd1868 0002
(XEN) [   13.486926]82d0803fd3d4 832079bd183c 0002 82d0803fd584
(XEN) [   13.486929]832079bd1854 0002 82d0803fd3cd 832079bd1944
(XEN) [   13.486933]0002 82d0803fd592 832079bd1930 0002
(XEN) [   13.486936] Xen call trace:
(XEN) [   13.486938][] switch_cr3_cr4+0x58/0x116
(XEN) [   13.486942][] dom0_construct_pv+0x1bb1/0x29e3
(XEN) [   13.486945][] construct_dom0+0x8c/0xb86
(XEN) [   13.486949][] __start_xen+0x23c4/0x2629
(XEN) [   13.486952][] __high_start+0x53/0x58
(XEN) [   13.486954]
(XEN) [   14.047278]
(XEN) [   14.049274] 
(XEN) [   14.054734] Panic on CPU 0:
(XEN) [   14.058026] GENERAL PROTECTION FAULT
(XEN) [   14.062099] [error_code=]
(XEN) [   14.065565] 
(XEN) [   14.071024]
(XEN) [   14.073018] Reboot in five seconds...

The faulting instruction is `mov %r12, %cr3` which is trying to use
noflush while %cr4.pcide is clear.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v7 0/9] xen/x86: various XPTI speedups

2018-04-12 Thread Juergen Gross
This patch series aims at reducing the overhead of the XPTI Meltdown
mitigation.

Patch 1 has been posted before; the main changes in this version are due
to addressing Jan's comments on my first version. The main objective of
that patch is to avoid copying the L4 page table each time the guest is
activated, as often the contents didn't change while the hypervisor was
active.

Patch 2 adds a new helper for writing cr3 instead of open coding the
inline assembly in multiple places.

Patch 3 sets the stage for being able to activate XPTI per domain. As a
first step it is now possible to switch XPTI off for dom0 via the xpti
boot parameter.

Patch 4 adds support for using the INVPCID instruction for flushing
the TLB.

Patch 5 reduces the costs of TLB flushes even further: as we don't make
any use of global TLB entries with XPTI being active we can avoid
removing all global TLB entries on TLB flushes by simply deactivating
the global pages in CR4.

Patch 6 prepares for using PCIDs in patch 9.
For that purpose it was necessary to allow CR3 values with bit 63 set
in order to avoid flushing TLB entries when writing CR3. This requires
a modification of Jan's rather clever state machine with positive and
negative CR3 values for the hypervisor by using a dedicated flag byte
instead.

Patch 7 converts pv_guest_cr4_to_real_cr4() from a macro to a function
as it was becoming more and more complex.

Patch 8 adds some PCID helper functions for accessing the different
parts of cr3 (address and pcid part).

Patch 9 is the main performance contributor: by making use of the PCID
feature (if available) TLB entries can survive CR3 switches. The TLB
needs to be flushed on context switches only and not when switching
between guest and hypervisor or guest kernel and user mode.

On my machine (Intel i7-4600M) using the PCID feature in the non-XPTI
case showed a slightly worse performance than using global pages
instead (using PCID and global pages is a bad idea as invalidating
global pages in this case would need a complete TLB flush). For this
reason I've decided to use PCID for XPTI only as the default. That
can easily be changed by using the command line parameter "pcid=true".

The complete series has been verified to still mitigate against
Meltdown attacks. A simple performance test (make -j 4 in the Xen
hypervisor directory) showed significant improvements compared to the
state without this series.
Numbers are seconds, stddev in parentheses.

xpti=false     elapsed         system           user
unpatched:  88.42 ( 2.01)   94.49 ( 1.38)  180.40 ( 1.41)
patched  :  89.45 ( 3.10)   96.47 ( 3.22)  181.34 ( 1.98)

xpti=true      elapsed         system           user
unpatched: 113.43 ( 3.68)  165.44 ( 4.41)  183.30 ( 1.72)
patched  :  92.76 ( 2.11)  103.39 ( 1.13)  184.86 ( 0.12)


Juergen Gross (9):
  x86/xpti: avoid copying L4 page table contents when possible
  xen/x86: add a function for modifying cr3
  xen/x86: support per-domain flag for xpti
  xen/x86: use invpcid for flushing the TLB
  xen/x86: disable global pages for domains with XPTI active
  xen/x86: use flag byte for decision whether xen_cr3 is valid
  xen/x86: convert pv_guest_cr4_to_real_cr4() to a function
  xen/x86: add some cr3 helpers
  xen/x86: use PCID feature

 docs/misc/xen-command-line.markdown | 37 +-
 xen/arch/x86/cpu/mtrr/generic.c     | 37 +-
 xen/arch/x86/debug.c                |  2 +-
 xen/arch/x86/domain.c               |  6 +--
 xen/arch/x86/domain_page.c          |  2 +-
 xen/arch/x86/flushtlb.c             | 98 ++---
 xen/arch/x86/mm.c                   | 86 ++--
 xen/arch/x86/mm/shadow/multi.c      |  4 ++
 xen/arch/x86/pv/dom0_build.c        |  8 +--
 xen/arch/x86/pv/domain.c            | 89 -
 xen/arch/x86/setup.c                | 27 +++---
 xen/arch/x86/smp.c                  |  2 +-
 xen/arch/x86/smpboot.c              |  6 ++-
 xen/arch/x86/spec_ctrl.c            | 70 ++
 xen/arch/x86/x86_64/asm-offsets.c   |  2 +
 xen/arch/x86/x86_64/compat/entry.S  |  5 +-
 xen/arch/x86/x86_64/entry.S         | 78 -
 xen/common/efi/runtime.c            |  4 +-
 xen/include/asm-x86/current.h       | 23 +++--
 xen/include/asm-x86/domain.h        | 17 +++
 xen/include/asm-x86/flushtlb.h      |  4 +-
 xen/include/asm-x86/invpcid.h       |  2 +
 xen/include/asm-x86/processor.h     | 18 +++
 xen/include/asm-x86/pv/domain.h     | 31 
 xen/include/asm-x86/spec_ctrl.h     |  4 ++
 xen/include/asm-x86/x86-defns.h     |  4 +-
 26 files changed, 521 insertions(+), 145 deletions(-)

-- 
2.13.6

