On 26/09/18 07:33, Jan Beulich wrote:
>>>> On 25.09.18 at 18:14, <andrew.coop...@citrix.com> wrote:
>> On 18/09/18 13:28, Jan Beulich wrote:
>>> @@ -1281,11 +1282,35 @@ static void load_segments(struct vcpu *n
>>>      struct cpu_user_regs *uregs = &n->arch.user_regs;
>>>      int all_segs_okay = 1;
>>>      unsigned int dirty_segment_mask, cpu = smp_processor_id();
>>> +    bool fs_gs_done = false;
>>>  
>>>      /* Load and clear the dirty segment mask. */
>>>      dirty_segment_mask = per_cpu(dirty_segment_mask, cpu);
>>>      per_cpu(dirty_segment_mask, cpu) = 0;
>>>  
>>> +#ifdef CONFIG_HVM
>>> +    if ( !is_pv_32bit_vcpu(n) && !cpu_has_fsgsbase && cpu_has_svm &&
>>> +         !((uregs->fs | uregs->gs) & ~3) &&
>>> +         /*
>>> +          * The remaining part is just for optimization: If only shadow GS
>>> +          * needs loading, there's nothing to be gained here.
>> VMLOAD also loads LDT, and LLDT is fully serialising, so an even heavier
>> perf hit than wrmsr.
> I don't understand how your remark relates to the comment

Because the comment is false in the case that the LDT also needs loading.

> , or ...
>
>>> +          */
>>> +         (n->arch.pv.fs_base | n->arch.pv.gs_base_user) )
> ... the commented code. There's nothing LDT-ish here. Or are you
> meaning to suggest something LDT-ish should be added? I'd rather not,
> as the common case (afaict) is for there to be no LDT.
>
> I also don't understand the "even heavier" part - WRMSR (for the MSRs
> in question) is as serializing as is LLDT, isn't it?

No - I'd mixed up which MSRs weren't serialising.

>>> +
>>> +    if ( !ldt_base )
>>> +    {
>>> +        /*
>>> +         * The actual structure field used here was arbitrarily chosen.
>>> +         * Empirically it doesn't seem to matter much which element is 
>>> used,
>>> +         * and a clear explanation of the otherwise poor performance has 
>>> not
>>> +         * been found/provided so far.
>>> +         */
>>> +        asm volatile ( "prefetch %0" :: "m" (vmcb->ldtr) );
>> prefetchw(), which already exists and is used.
> I've specifically decided against using it, as we don't mean to write any
> part of the VMCB.

I think you need to double check your reasoning here.  There is a reason
why this function wont compile if you made vmcb a const pointer.

Irrespective of the writeable aspect, we also have prefetch() which
should be used in preference to inline assembly.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to