Re: [Xen-devel] [PATCH v7 0/9] xen/x86: various XPTI speedups
On 13/04/18 11:59, Andrew Cooper wrote:
> On 12/04/18 19:09, Juergen Gross wrote:
>> This patch series aims at reducing the overhead of the XPTI Meltdown
>> mitigation.
>
> Sadly, there are still problems.
>
> (XEN) [ 13.486805] Dom0 has maximum 2 VCPUs
> (XEN) [ 13.486824] [ Xen-4.11.0-5.0.3-d x86_64 debug=y Not tainted ]
> (XEN) [ 13.486826] CPU:0
> (XEN) [ 13.486828] RIP:e008:[] switch_cr3_cr4+0x58/0x116
> (XEN) [ 13.486833] RFLAGS: 00010086 CONTEXT: hypervisor
> (XEN) [ 13.486836] rax: 00df rbx: 0282 rcx: 82d0804b7fff
> (XEN) [ 13.486839] rdx: 00152660 rsi: 001526e0 rdi: 801071d4a000
> (XEN) [ 13.486841] rbp: 82d0804b78d8 rsp: 82d0804b78a8 r8:
> (XEN) [ 13.486844] r9: r10: 00ff00ff00ff00ff r11: 0f0f0f0f0f0f0f0f
> (XEN) [ 13.486847] r12: 801071d4a000 r13: 57ea8000 r14: 001526e0
> (XEN) [ 13.486849] r15: 83107326f000 cr0: 8005003b cr4: 00152660
> (XEN) [ 13.486851] cr3: 57ea8000 cr2:
> (XEN) [ 13.486853] fsb: gsb: gss:
> (XEN) [ 13.486855] ds: es: fs: gs: ss: cs: e008
> (XEN) [ 13.486859] Xen code around (switch_cr3_cr4+0x58/0x116):
> (XEN) [ 13.486860] 00 00 66 0f 38 82 4d d0 <41> 0f 22 dc 4c 39 f2 75 56 4c 89 ea 81 e2 ff 0f
> (XEN) [ 13.486869] Xen stack trace from rsp=82d0804b78a8:
> (XEN) [ 13.486870]82d0804b78d8 82d0804466a2 83005a1f1000 0002
> (XEN) [ 13.486874]8200 83060fa0 82d0804b7d68 82d08044349e
> (XEN) [ 13.486878] 83060fa0 8200 0ff0
> (XEN) [ 13.486881] 001071d4c000 831071d4b000 831071d4c000
> (XEN) [ 13.486884]81d49000 0013 831071d4dff8
> (XEN) [ 13.486887]001071d5c000 831071d4d000 81d5e000 8100
> (XEN) [ 13.486891]01072000 001071d5d000 81d4a000 81d49000
> (XEN) [ 13.486894]831071d4c080 2000 0107 81d49000
> (XEN) [ 13.486897]8200 831071d4aff8 2000 0001
> (XEN) [ 13.486900]00800020 0080 570a 0004
> (XEN) [ 13.486903] 8000 831071d4dff0 82d080485580
> (XEN) [ 13.486907]83005a1f1000 05709ac2 832079bd182c
> (XEN) [ 13.486910]832079bd19e8
> (XEN) [ 13.486913]0001 82d0803fd5e8 81b051f0 0001
> (XEN) [ 13.486916]82d0803fd436 81001000 0001 82d0803fd410
> (XEN) [ 13.486919]8000 0001 82d0803fd429
> (XEN) [ 13.486923]0002 82d0803fd578 832079bd1868 0002
> (XEN) [ 13.486926]82d0803fd3d4 832079bd183c 0002 82d0803fd584
> (XEN) [ 13.486929]832079bd1854 0002 82d0803fd3cd 832079bd1944
> (XEN) [ 13.486933]0002 82d0803fd592 832079bd1930 0002
> (XEN) [ 13.486936] Xen call trace:
> (XEN) [ 13.486938][] switch_cr3_cr4+0x58/0x116
> (XEN) [ 13.486942][] dom0_construct_pv+0x1bb1/0x29e3
> (XEN) [ 13.486945][] construct_dom0+0x8c/0xb86
> (XEN) [ 13.486949][] __start_xen+0x23c4/0x2629
> (XEN) [ 13.486952][] __high_start+0x53/0x58
> (XEN) [ 13.486954]
> (XEN) [ 14.047278]
> (XEN) [ 14.049274]
> (XEN) [ 14.054734] Panic on CPU 0:
> (XEN) [ 14.058026] GENERAL PROTECTION FAULT
> (XEN) [ 14.062099] [error_code=]
> (XEN) [ 14.065565]
> (XEN) [ 14.071024]
> (XEN) [ 14.073018] Reboot in five seconds...
>
> The faulting instruction is `mov %r12, %cr3` which is trying to use
> noflush while %cr4.pcide is clear.

While I can see how that happened I'm not sure why I didn't hit this
when testing my series. Could it be some cpus won't GP in this case?

Could you try the series without the last patch? Maybe it would be
possible to commit some of the patches at least.

I'm just about to leave for the Linux root conference in Kiev, so the
patch attached is only compile tested. You might want to try that.

Juergen

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 22c5150444..34c77bcbe4 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -718,7 +718,7 @@ int __init dom0_construct_pv(struct domain *d,
update_cr3
Re: [Xen-devel] [PATCH v7 0/9] xen/x86: various XPTI speedups
On 12/04/18 19:09, Juergen Gross wrote:
> This patch series aims at reducing the overhead of the XPTI Meltdown
> mitigation.

Sadly, there are still problems.

(XEN) [ 13.486805] Dom0 has maximum 2 VCPUs
(XEN) [ 13.486824] [ Xen-4.11.0-5.0.3-d x86_64 debug=y Not tainted ]
(XEN) [ 13.486826] CPU:0
(XEN) [ 13.486828] RIP:e008:[] switch_cr3_cr4+0x58/0x116
(XEN) [ 13.486833] RFLAGS: 00010086 CONTEXT: hypervisor
(XEN) [ 13.486836] rax: 00df rbx: 0282 rcx: 82d0804b7fff
(XEN) [ 13.486839] rdx: 00152660 rsi: 001526e0 rdi: 801071d4a000
(XEN) [ 13.486841] rbp: 82d0804b78d8 rsp: 82d0804b78a8 r8:
(XEN) [ 13.486844] r9: r10: 00ff00ff00ff00ff r11: 0f0f0f0f0f0f0f0f
(XEN) [ 13.486847] r12: 801071d4a000 r13: 57ea8000 r14: 001526e0
(XEN) [ 13.486849] r15: 83107326f000 cr0: 8005003b cr4: 00152660
(XEN) [ 13.486851] cr3: 57ea8000 cr2:
(XEN) [ 13.486853] fsb: gsb: gss:
(XEN) [ 13.486855] ds: es: fs: gs: ss: cs: e008
(XEN) [ 13.486859] Xen code around (switch_cr3_cr4+0x58/0x116):
(XEN) [ 13.486860] 00 00 66 0f 38 82 4d d0 <41> 0f 22 dc 4c 39 f2 75 56 4c 89 ea 81 e2 ff 0f
(XEN) [ 13.486869] Xen stack trace from rsp=82d0804b78a8:
(XEN) [ 13.486870]82d0804b78d8 82d0804466a2 83005a1f1000 0002
(XEN) [ 13.486874]8200 83060fa0 82d0804b7d68 82d08044349e
(XEN) [ 13.486878] 83060fa0 8200 0ff0
(XEN) [ 13.486881] 001071d4c000 831071d4b000 831071d4c000
(XEN) [ 13.486884]81d49000 0013 831071d4dff8
(XEN) [ 13.486887]001071d5c000 831071d4d000 81d5e000 8100
(XEN) [ 13.486891]01072000 001071d5d000 81d4a000 81d49000
(XEN) [ 13.486894]831071d4c080 2000 0107 81d49000
(XEN) [ 13.486897]8200 831071d4aff8 2000 0001
(XEN) [ 13.486900]00800020 0080 570a 0004
(XEN) [ 13.486903] 8000 831071d4dff0 82d080485580
(XEN) [ 13.486907]83005a1f1000 05709ac2 832079bd182c
(XEN) [ 13.486910]832079bd19e8
(XEN) [ 13.486913]0001 82d0803fd5e8 81b051f0 0001
(XEN) [ 13.486916]82d0803fd436 81001000 0001 82d0803fd410
(XEN) [ 13.486919]8000 0001 82d0803fd429
(XEN) [ 13.486923]0002 82d0803fd578 832079bd1868 0002
(XEN) [ 13.486926]82d0803fd3d4 832079bd183c 0002 82d0803fd584
(XEN) [ 13.486929]832079bd1854 0002 82d0803fd3cd 832079bd1944
(XEN) [ 13.486933]0002 82d0803fd592 832079bd1930 0002
(XEN) [ 13.486936] Xen call trace:
(XEN) [ 13.486938][] switch_cr3_cr4+0x58/0x116
(XEN) [ 13.486942][] dom0_construct_pv+0x1bb1/0x29e3
(XEN) [ 13.486945][] construct_dom0+0x8c/0xb86
(XEN) [ 13.486949][] __start_xen+0x23c4/0x2629
(XEN) [ 13.486952][] __high_start+0x53/0x58
(XEN) [ 13.486954]
(XEN) [ 14.047278]
(XEN) [ 14.049274]
(XEN) [ 14.054734] Panic on CPU 0:
(XEN) [ 14.058026] GENERAL PROTECTION FAULT
(XEN) [ 14.062099] [error_code=]
(XEN) [ 14.065565]
(XEN) [ 14.071024]
(XEN) [ 14.073018] Reboot in five seconds...

The faulting instruction is `mov %r12, %cr3` which is trying to use
noflush while %cr4.pcide is clear.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v7 0/9] xen/x86: various XPTI speedups
This patch series aims at reducing the overhead of the XPTI Meltdown
mitigation.

Patch 1 had been posted before; the main changes in this patch are due
to addressing Jan's comments on my first version. The main objective of
that patch is to avoid copying the L4 page table each time the guest is
being activated, as often the contents didn't change while the
hypervisor was active.

Patch 2 adds a new helper for writing cr3 instead of open coding the
inline assembly in multiple places.

Patch 3 sets the stage for being able to activate XPTI per domain. As a
first step it is now possible to switch XPTI off for dom0 via the xpti
boot parameter.

Patch 4 adds support for using the INVPCID instruction for flushing the
TLB.

Patch 5 reduces the costs of TLB flushes even further: as we don't make
any use of global TLB entries with XPTI being active, we can avoid
removing all global TLB entries on TLB flushes by simply deactivating
the global pages in CR4.

Patch 6 prepares using PCIDs in patch 9. For that purpose it was
necessary to allow CR3 values with bit 63 set in order to avoid flushing
TLB entries when writing CR3. This requires a modification of Jan's
rather clever state machine with positive and negative CR3 values for
the hypervisor by using a dedicated flag byte instead.

Patch 7 converts pv_guest_cr4_to_real_cr4() from a macro to a function
as it was becoming more and more complex.

Patch 8 adds some PCID helper functions for accessing the different
parts of cr3 (address and pcid part).

Patch 9 is the main performance contributor: by making use of the PCID
feature (if available) TLB entries can survive CR3 switches. The TLB
needs to be flushed on context switches only and not when switching
between guest and hypervisor or guest kernel and user mode.
On my machine (Intel i7-4600M) using the PCID feature in the non-XPTI
case showed slightly worse performance than using global pages instead
(using PCID and global pages together is a bad idea, as invalidating
global pages in this case would need a complete TLB flush). For this
reason I've decided to use PCID for XPTI only as the default. That can
easily be changed by using the command line parameter "pcid=true".

The complete series has been verified to still mitigate against
Meltdown attacks. A simple performance test (make -j 4 in the Xen
hypervisor directory) showed significant improvements compared to the
state without this series. Numbers are seconds, stddev in parentheses.

xpti=false  elapsed         system          user
unpatched:   88.42 ( 2.01)   94.49 ( 1.38)  180.40 ( 1.41)
patched  :   89.45 ( 3.10)   96.47 ( 3.22)  181.34 ( 1.98)

xpti=true   elapsed         system          user
unpatched:  113.43 ( 3.68)  165.44 ( 4.41)  183.30 ( 1.72)
patched  :   92.76 ( 2.11)  103.39 ( 1.13)  184.86 ( 0.12)

Juergen Gross (9):
  x86/xpti: avoid copying L4 page table contents when possible
  xen/x86: add a function for modifying cr3
  xen/x86: support per-domain flag for xpti
  xen/x86: use invpcid for flushing the TLB
  xen/x86: disable global pages for domains with XPTI active
  xen/x86: use flag byte for decision whether xen_cr3 is valid
  xen/x86: convert pv_guest_cr4_to_real_cr4() to a function
  xen/x86: add some cr3 helpers
  xen/x86: use PCID feature

 docs/misc/xen-command-line.markdown | 37 +-
 xen/arch/x86/cpu/mtrr/generic.c     | 37 +-
 xen/arch/x86/debug.c                |  2 +-
 xen/arch/x86/domain.c               |  6 +--
 xen/arch/x86/domain_page.c          |  2 +-
 xen/arch/x86/flushtlb.c             | 98 ++---
 xen/arch/x86/mm.c                   | 86 ++--
 xen/arch/x86/mm/shadow/multi.c      |  4 ++
 xen/arch/x86/pv/dom0_build.c        |  8 +--
 xen/arch/x86/pv/domain.c            | 89 -
 xen/arch/x86/setup.c                | 27 +++---
 xen/arch/x86/smp.c                  |  2 +-
 xen/arch/x86/smpboot.c              |  6 ++-
 xen/arch/x86/spec_ctrl.c            | 70 ++
 xen/arch/x86/x86_64/asm-offsets.c   |  2 +
 xen/arch/x86/x86_64/compat/entry.S  |  5 +-
 xen/arch/x86/x86_64/entry.S         | 78 -
 xen/common/efi/runtime.c            |  4 +-
 xen/include/asm-x86/current.h       | 23 +++--
 xen/include/asm-x86/domain.h        | 17 +++
 xen/include/asm-x86/flushtlb.h      |  4 +-
 xen/include/asm-x86/invpcid.h       |  2 +
 xen/include/asm-x86/processor.h     | 18 +++
 xen/include/asm-x86/pv/domain.h     | 31
 xen/include/asm-x86/spec_ctrl.h     |  4 ++
 xen/include/asm-x86/x86-defns.h     |  4 +-
 26 files changed, 521 insertions(+), 145 deletions(-)

--
2.13.6