On 16/08/2018 10:07, Andrew Cooper wrote:
> On 22/06/2018 11:57, Jan Beulich wrote:
>> --- a/xen/arch/x86/spec_ctrl.c
>> +++ b/xen/arch/x86/spec_ctrl.c
>> @@ -616,7 +616,7 @@ void __init init_speculation_mitigations
>>
>>      /* Check whether Eager FPU should be enabled by default. */
>>      if ( opt_eager_fpu == -1 )
>> -        opt_eager_fpu = should_use_eager_fpu();
>> +        opt_eager_fpu = !cpu_has_xsave && should_use_eager_fpu();
> I'd not spotted this the first time round.
>
> Intel is very clear that, if you're using xsave, you should be using
> eager FPU.  Therefore, this goes specifically against the advice in the
> ORM, and the advice we were given during the LazyFPU timeframe.
>
> Furthermore we (XenServer) and customers have seen a reliable perf
> improvement from the LazyFPU security fix, up to 8% in places, for
> normal VDI and server workloads.  As I said during the development of
> the LazyFPU fixes, this is almost certainly down to the fact that all
> code uses the FPU these days.
>
> I'm still waiting on a more formal statement from AMD, and don't yet
> have any perf numbers on their hardware.
>
> However, as we will definitely get an extra perf boost from fully
> deleting the remaining lazy paths (no more clts/stts in the context
> switch path), my gut feeling is that there is going to have to be some
> terrible chronic case on AMD for us to consider not switching to
> fully eager.
>
> Irrespective of what we do here, I'd really like Wei to rebase his work
> to remove the lazy fpu logic from the nested virt paths, because it's a
> no-brainer (perf-wise) and comes with a massive amount of code
> simplification in Xen.

Actually, this reminds me of a bug report given during XenSummit in
Nanjing.

Once Xen has restored lazy state, we drop the interception of #NM, but
we still take a vmexit on the clts.  This was from Alibaba iirc, and
came in at an astounding 70% perf hit to one particular HPC workload.

I think this can be fixed by using the host/guest cr0 mask to allow
writes of cr0.ts, in exactly the same way as we have recently gained for
cr4.pge.

Also, AMD has a specific option for virtualisation of cr0.ts writes, and
I can't remember if we're using it or not.

~Andrew
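
For illustration only, here is a minimal sketch of the cr0 mask idea
described above, i.e. letting the guest own cr0.ts once its FPU state has
been loaded.  It is not Xen code: cr0_host_mask(), clts_would_vmexit() and
cr0_mask_example() are hypothetical names, and the all-ones base mask is
an assumption.  The only hardware rule it models is the VT-x one that clts
exits when TS is set in both the CR0 guest/host mask and the CR0 read
shadow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_TS (UINT64_C(1) << 3) /* CR0.TS is bit 3 */

/*
 * Hypothetical helper (not an existing Xen function): choose the CR0
 * guest/host mask.  A set bit is host-owned; a clear bit is guest-owned.
 */
uint64_t cr0_host_mask(uint64_t base_mask, bool guest_fpu_loaded)
{
    /* Once the guest's FPU state is loaded, let the guest own CR0.TS. */
    if ( guest_fpu_loaded )
        return base_mask & ~X86_CR0_TS;

    /* Otherwise keep TS host-owned so a lazy restore can still trigger. */
    return base_mask | X86_CR0_TS;
}

/*
 * Model of the VT-x rule for clts: it exits only if TS is set in both
 * the guest/host mask and the CR0 read shadow.
 */
bool clts_would_vmexit(uint64_t host_mask, uint64_t read_shadow)
{
    return (host_mask & X86_CR0_TS) && (read_shadow & X86_CR0_TS);
}

/* Usage example covering the three cases of interest. */
void cr0_mask_example(void)
{
    uint64_t base = ~UINT64_C(0); /* assumption: all bits host-owned */

    /* FPU not loaded, guest sees TS set: clts must still trap. */
    assert(clts_would_vmexit(cr0_host_mask(base, false), X86_CR0_TS));

    /* FPU loaded: TS is guest-owned, so clts no longer traps. */
    assert(!clts_would_vmexit(cr0_host_mask(base, true), X86_CR0_TS));

    /* TS clear in the read shadow: clts does not trap either way. */
    assert(!clts_would_vmexit(cr0_host_mask(base, false), 0));
}
```

The second case is the one from the bug report: with TS dropped from the
mask after the lazy restore, the guest's clts would run without a vmexit.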