On 16/08/2018 10:07, Andrew Cooper wrote:
> On 22/06/2018 11:57, Jan Beulich wrote:
>> --- a/xen/arch/x86/spec_ctrl.c
>> +++ b/xen/arch/x86/spec_ctrl.c
>> @@ -616,7 +616,7 @@ void __init init_speculation_mitigations
>>  
>>      /* Check whether Eager FPU should be enabled by default. */
>>      if ( opt_eager_fpu == -1 )
>> -        opt_eager_fpu = should_use_eager_fpu();
>> +        opt_eager_fpu = !cpu_has_xsave && should_use_eager_fpu();
> I'd not spotted this the first time round.
>
> Intel is very clear that, if you're using xsave, you should be using
> eager FPU.  Therefore, this goes specifically against the advice in the
> ORM, and the advice we were given during the LazyFPU timeframe.
>
> Furthermore we (XenServer) and customers have seen a reliable perf
> improvement from the LazyFPU security fix, up to 8% in places, for
> normal VDI and server workloads.  As I said during the development of
> the LazyFPU fixes, this is almost certainly down to the fact that all
> code uses the FPU these days.
>
> I'm still waiting on a more formal statement from AMD, and don't yet
> have any perf numbers on their hardware.
>
> However, as we will definitely get an extra perf boost from fully
> deleting the remaining lazy paths (no more clts/stts in the context
> switch path), my gut feeling is that there is going to have to be some
> terrible chronic case on AMD for us to consider not switching to
> fully eager.
>
> Irrespective of what we do here, I'd really like Wei to rebase his work
> to remove the lazy fpu logic from the nested virt paths, because it's a
> no-brainer (perf wise) and comes with a massive amount of code
> simplification in Xen.
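To make the effect of the quoted hunk concrete, here is a small standalone model of the default selection for opt_eager_fpu. It is a hypothetical sketch, not Xen code: `has_xsave` and `lazyfpu_affected` are stand-in inputs for `cpu_has_xsave` and `should_use_eager_fpu()` (whose real logic is not reproduced here), and `-1` models "not set on the command line".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the default for opt_eager_fpu.
 * opt: -1 = not given on the command line, 0 = lazy, 1 = eager.
 * has_xsave and lazyfpu_affected stand in for cpu_has_xsave and
 * should_use_eager_fpu(); they are illustrative inputs only.
 */

/* Default before the patch: only should_use_eager_fpu() matters. */
static bool eager_default_before(int opt, bool has_xsave, bool lazyfpu_affected)
{
    assert(opt >= -1 && opt <= 1);
    (void)has_xsave;                 /* not consulted before the patch */
    if ( opt != -1 )
        return opt;                  /* explicit command line choice wins */
    return lazyfpu_affected;
}

/* Default with the patch: XSAVE-capable CPUs never default to eager. */
static bool eager_default_after(int opt, bool has_xsave, bool lazyfpu_affected)
{
    assert(opt >= -1 && opt <= 1);
    if ( opt != -1 )
        return opt;                  /* explicit command line choice wins */
    return !has_xsave && lazyfpu_affected;
}
```

Under this model the two defaults only differ when XSAVE is present: there the patched version always picks lazy, whatever should_use_eager_fpu() would have returned, which is exactly the case the ORM advice quoted above argues against.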

Actually, this reminds me of a bug report given during XenSummit in
Nanjing.  Once Xen has restored lazy state, we drop the interception of
#NM, but we still take a vmexit on the guest's clts.  This was from Alibaba
iirc, and came in at an astounding 70% perf hit to one particular HPC
workload.

I think this can be fixed by using the host/guest cr0 mask to allow
writes of cr0.ts, in exactly the same way as we have recently gained for
cr4.pge.  Also, AMD has a specific option for virtualisation of cr0.ts
writes, and I can't remember if we're using it or not.
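A simplified model of the Intel VMX rules this idea relies on: bits set in the CR0 guest/host mask are host-owned, a MOV to CR0 exits only if it changes a host-owned bit relative to the CR0 read shadow, and CLTS exits only if TS is set in both the mask and the read shadow. `cr0_mask_for()` is a hypothetical helper illustrating one reading of the proposal (hand TS to the guest once its FPU state is loaded); it is not existing Xen code, and the sketch ignores LMSW and the AMD side entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_TS 0x00000008UL  /* CR0.TS is architectural bit 3 */

/* MOV to CR0 exits if any host-owned (mask) bit differs from the read shadow. */
static bool mov_to_cr0_exits(uint64_t mask, uint64_t shadow, uint64_t new_val)
{
    return ((new_val ^ shadow) & mask) != 0;
}

/* CLTS exits only if TS is set in both the guest/host mask and the read shadow. */
static bool clts_exits(uint64_t mask, uint64_t shadow)
{
    return (mask & X86_CR0_TS) && (shadow & X86_CR0_TS);
}

/*
 * Hypothetical policy: keep TS host-owned while Xen still tracks lazy
 * FPU state, and give it to the guest once that state has been loaded.
 */
static uint64_t cr0_mask_for(uint64_t base_mask, bool fpu_state_loaded)
{
    return fpu_state_loaded ? (base_mask & ~X86_CR0_TS)
                            : (base_mask | X86_CR0_TS);
}
```

With TS host-owned, every guest CLTS that finds TS set in the read shadow exits; with TS guest-owned it runs natively. Xen would presumably need to put TS back into the mask before relying on it again for lazy switching.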

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
