On 26/09/18 07:33, Jan Beulich wrote: >>>> On 25.09.18 at 18:14, <andrew.coop...@citrix.com> wrote: >> On 18/09/18 13:28, Jan Beulich wrote: >>> @@ -1281,11 +1282,35 @@ static void load_segments(struct vcpu *n >>> struct cpu_user_regs *uregs = &n->arch.user_regs; >>> int all_segs_okay = 1; >>> unsigned int dirty_segment_mask, cpu = smp_processor_id(); >>> + bool fs_gs_done = false; >>> >>> /* Load and clear the dirty segment mask. */ >>> dirty_segment_mask = per_cpu(dirty_segment_mask, cpu); >>> per_cpu(dirty_segment_mask, cpu) = 0; >>> >>> +#ifdef CONFIG_HVM >>> + if ( !is_pv_32bit_vcpu(n) && !cpu_has_fsgsbase && cpu_has_svm && >>> + !((uregs->fs | uregs->gs) & ~3) && >>> + /* >>> + * The remaining part is just for optimization: If only shadow GS >>> + * needs loading, there's nothing to be gained here. >> VMLOAD also loads LDT, and LLDT is fully serialising, so an even heavier >> perf hit than wrmsr. > I don't understand how your remark relates to the comment
Because the comment is false in the case that the LDT also needs loading. > , or ... > >>> + */ >>> + (n->arch.pv.fs_base | n->arch.pv.gs_base_user) ) > ... the commented code. There's nothing LDT-ish here. Or are you > meaning to suggest something LDT-ish should be added? I'd rather not, > as the common case (afaict) is for there to be no LDT. > > I also don't understand the "even heavier" part - WRMSR (for the MSRs > in question) is as serializing as is LLDT, isn't it? No - I'd mixed up which MSRs weren't serialising. >>> + >>> + if ( !ldt_base ) >>> + { >>> + /* >>> + * The actual structure field used here was arbitrarily chosen. >>> + * Empirically it doesn't seem to matter much which element is >>> used, >>> + * and a clear explanation of the otherwise poor performance has >>> not >>> + * been found/provided so far. >>> + */ >>> + asm volatile ( "prefetch %0" :: "m" (vmcb->ldtr) ); >> prefetchw(), which already exists and is used. > I've specifically decided against using it, as we don't mean to write any > part of the VMCB. I think you need to double check your reasoning here. There is a reason why this function wont compile if you made vmcb a const pointer. Irrespective of the writeable aspect, we also have prefetch() which should be used in preference to inline assembly. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel