Re: [Xen-devel] [PATCH 0/6] Intel Processor Trace virtulization enabling
> >>> Hi All, > >>> > >>> Here is a patch-series which adding Processor Trace enabling in XEN > >>> guest. You can get It's software developer manuals from: > >>> https://software.intel.com/sites/default/files/managed/c5/15/archite > >>> ct ure-instruction-set-extensions-programming-reference.pdf > >>> In Chapter 5 INTEL PROCESSOR TRACE: VMX IMPROVEMENTS. > >>> > >>> Introduction: > >>> Intel Processor Trace (Intel PT) is an extension of Intel > >>> Architecture that captures information about software execution > >>> using > >> dedicated hardware facilities that cause only minimal performance > >> perturbation to the software being traced. Details on the Intel PT > >> infrastructure and trace capabilities can be found in the Intel 64 > and IA-32 Architectures Software Developer’s Manual, Volume 3C. > >>> The suite of architecture changes serve to simplify the process of > >>> virtualizing Intel PT for use by a guest software. There are two > >> primary elements to this new architecture support for VMX support > >> improvements made for Intel PT. > >>> 1. Addition of a new guest IA32_RTIT_CTL value field to the VMCS. > >>> — This serves to speed and simplify the process of disabling trace on > >>> VM exit, and restoring it on VM entry. > >>> 2. Enabling use of EPT to redirect PT output. > >>> — This enables the VMM to elect to virtualize the PT output buffer > >>> using EPT. In this mode, the CPU will treat PT output > >> addresses as Guest Physical Addresses (GPAs) and translate them using > >> EPT. This means that Intel PT output reads (of the ToPA > >> table) and writes (of trace output) can cause EPT violations, and other > >> output events. > >> > >> Hello, > >> > >> Having read the new proposed extensions, I've got some architecture > >> questions before diving into the patches themselves. > >> > >> First of all, is this technology expected to end up in Icelake, or > >> something later? > > Yes, this would be enabled on Icelake. > > Thanks. > > > > >> In Vol 3, the existing VMX support describes a number of scenarios > >> (system wide, VMM-only, VM-only, guest aware), which require the use of > >> MSR load lists to atomically alter the IA32_RTIT_* > msrs. > > Currently, I just enabling the guest only mode(VM-only) in my patches. > > That is fine. I'm not asking you to implement these modes; I am just trying > to work out how they would interact. System-Wide: trace Xen hypervisor and guest and output to host buffer; VMM-only: trace Xen hypervisor only and output to host buffer.; VM-only: trace guest only and output to guest buffer; > > > > >> Obviously, system wide mode is incompatible with also allowing the > >> guest to use PT itself, > > Yes, system mode(collect trace packets of the entire system) can't work > > with guest only mode at the same time. > > > >> but what about Xen wanting to use PT for itself, and for the guest to use > >> PT as well? > > I think this can be named by host-guest mode. This may need add new command > > or interface to enable PT in Xen hypervisor. > Trace vmm-only and guest simultaneous and output to their respective buffer. > > > >> Previously, this appears to be possible using the MSR load lists > >> (albeit with Xen needing to shadow the ToPA records to cause the packet > >> stream to end up in the right place). > > Yes. > > > >> However, the new VM consistency checks require that using EPT > >> redirection requires clear/load CTL on exit/entry be set, and having load > >> on entry set requires the host TraceEn to be clear. > > Yes. New bits added in VMCS can make sure PT be disabled before VM exit and > > IA32_RTIT_CTL MSR will be written with the value > of the associated Guest State field of the VMCS on VM entry. > > I am not sure I explained myself clearly. > > It appears that it is possible to implement host-guest mode using MSR load > lists, because all the host configuration gets replaced by > guest configuration on vmentry, and host configuration gets restored at > vmexit. Yes, correct. > > However with these PT extension, a new restriction is that a vmentry failure > will occur if we try to load the guest RTIT_CTL value > while the current RTIT_CTL.TraceEn is set. Yes, as description in "Intel® architecture instruction set extensions programming reference" 5.2.3. If the “Load IA32_RTIT_CTL on entry” is 1, IA32_RTIT_CTL.TraceEn must be zero. Otherwise will VM entry fails by loading processor state from the guest-state area of the VMCS. > > As far as I can tell, this prohibits host-guest mode from working, unless we > tolerate having Xen clear RTIT_CTL before restoring guest > GPR state. Yes. If implement host-guest mode we should clear RTIT_CTL first and restore the guest state before VM-entry . 1. Clear RTIT_CTL before configuration MSRs. A WRMSR to any of these configuration MSRs that begins and ends with IA32_RTIT_CTL.TraceEn set will #GP fault. Packet generation must be disabled before the configuration MSRs c
Re: [Xen-devel] [PATCH 0/6] Intel Processor Trace virtulization enabling
> > Hi All, > > > > Here is a patch-series which adding Processor Trace enabling in XEN guest. > > You can get It's software developer manuals from: > > https://software.intel.com/sites/default/files/managed/c5/15/architect > > ure-instruction-set-extensions-programming-reference.pdf > > In Chapter 5 INTEL PROCESSOR TRACE: VMX IMPROVEMENTS. > > > > Introduction: > > Intel Processor Trace (Intel PT) is an extension of Intel Architecture that > > captures information about software execution using > dedicated hardware facilities that cause only minimal performance > perturbation to the software being traced. Details on the Intel PT > infrastructure and trace capabilities can be found in the Intel 64 and IA-32 > Architectures Software Developer’s Manual, Volume 3C. > > > > The suite of architecture changes serve to simplify the process of > > virtualizing Intel PT for use by a guest software. There are two > primary elements to this new architecture support for VMX support > improvements made for Intel PT. > > 1. Addition of a new guest IA32_RTIT_CTL value field to the VMCS. > > — This serves to speed and simplify the process of disabling trace on VM > > exit, and restoring it on VM entry. > > 2. Enabling use of EPT to redirect PT output. > > — This enables the VMM to elect to virtualize the PT output buffer using > > EPT. In this mode, the CPU will treat PT output > addresses as Guest Physical Addresses (GPAs) and translate them using EPT. > This means that Intel PT output reads (of the ToPA > table) and writes (of trace output) can cause EPT violations, and other > output events. > > Hello, > > Having read the new proposed extensions, I've got some architecture questions > before diving into the patches themselves. > > First of all, is this technology expected to end up in Icelake, or something > later? Yes, this would be enabled on Icelake. > > In Vol 3, the existing VMX support describes a number of scenarios (system > wide, VMM-only, VM-only, guest aware), which require > the use of MSR load lists to atomically alter the IA32_RTIT_* msrs. Currently, I just enabling the guest only mode(VM-only) in my patches. > > Obviously, system wide mode is incompatible with also allowing the guest to > use PT itself, Yes, system mode(collect trace packets of the entire system) can't work with guest only mode at the same time. > but what about Xen wanting to use PT for itself, and for the guest to use PT > as well? I think this can be named by host-guest mode. This may need add new command or interface to enable PT in Xen hypervisor. Trace vmm-only and guest simultaneous and output to their respective buffer. > > Previously, this appears to be possible using the MSR load lists (albeit with > Xen needing to shadow the ToPA records to cause the > packet stream to end up in the right place). Yes. > > However, the new VM consistency checks require that using EPT redirection > requires clear/load CTL on exit/entry be set, and having > load on entry set requires the host TraceEn to be clear. Yes. New bits added in VMCS can make sure PT be disabled before VM exit and IA32_RTIT_CTL MSR will be written with the value of the associated Guest State field of the VMCS on VM entry. Thanks for your response. Luwei Kang > > Therefore, as far as I can see, allowing a guest to use PT via EPT now > prohibits Xen also using PT for its own purposes outside of non- > root mode. > > Is this intentional and/or expected, or have I misunderstood something in the > manuals? > > Thanks, > > ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 for-4.9] x86/vpmu: add cpu hot unplug notifier for vpmu
Hi Julien, Follow your advice, I change the tag to [PATCH v4 for-4.9] because of we hope this patch can be merged in Xen 4.9. Hi Jan, If there have anything need to change in this patch? Thanks, Luwei Kang > -Original Message- > From: Kang, Luwei > Sent: Tuesday, May 23, 2017 4:46 AM > To: xen-devel@lists.xen.org > Cc: jbeul...@suse.com; andrew.coop...@citrix.com; boris.ostrov...@oracle.com; > Kang, Luwei > Subject: [PATCH v4] x86/vpmu: add cpu hot unplug notifier for vpmu > > Currently, Hot unplug a physical CPU with vpmu enabled may cause system hang > due to send a remote call to an offlined pCPU. This > patch add a cpu hot unplug notifer to save vpmu context before cpu offline. > > Consider one scenario, hot unplug pCPU N with vpmu enabled. > The vcpu which running on this pCPU will be switch to other online cpu. A > remote call will be send to pCPU N to save the vpmu > context before loading the vpmu context on this pCPU. > System will hang in function on_select_cpus() because of that pCPU is > offlined and can not do any respond. > > The purpose of add a VPMU_CONTEXT_LOADED check in vpmu_arch_destroy() before > send a remote call to save vpmu contex is: > a. when a vpmu context has been loaded in a remote pCPU, make a >remote call to save the vpmu contex and stop counters is necessary. > b. VPMU_CONTEXT_LOADED flag will be reset if a pCPU is offlined. >this check will prevent send a remote call to an offlined pCPU. > > Signed-off-by: Luwei Kang > --- > v4: > 1.remove cpu_online() check in vpm_load(); 2.remove "vpmu_" prefix; 3.fix > a coding style; 4.add some commit message about > VPMU_CONTEXT_LOADED in vpmu_arch_destroy(); > v3: > 1.add cpu_online() check in vpm_load() and vpmu_arch_destroy(); 2.add vpmu_ > prefix. rename cpu_callback() to > vpmu_cpu_callback(); > v2: > 1.fix some typo and coding style; > 2.change "swith" to "if" in cpu_callback() because of there just have one > case; 3.add VPMU_CONTEX_LOADED check before send > remote call in vpmu_arch_destroy(); > --- > xen/arch/x86/cpu/vpmu.c | 45 + > 1 file changed, 41 insertions(+), 4 deletions(-) > > diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c index > 03401fd..1f7830b 100644 > --- a/xen/arch/x86/cpu/vpmu.c > +++ b/xen/arch/x86/cpu/vpmu.c > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -575,15 +576,21 @@ static void vpmu_arch_destroy(struct vcpu *v) > * We will test it again in vpmu_clear_last() with interrupts > * disabled to make sure we don't clear someone else. > */ > -if ( per_cpu(last_vcpu, vpmu->last_pcpu) == v ) > +if ( cpu_online(vpmu->last_pcpu) && > + per_cpu(last_vcpu, vpmu->last_pcpu) == v ) > on_selected_cpus(cpumask_of(vpmu->last_pcpu), > vpmu_clear_last, v, 1); > > if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy ) > { > -/* Unload VPMU first. This will stop counters */ > -on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu), > - vpmu_save_force, v, 1); > +/* > + * Unload VPMU first if VPMU_CONTEXT_LOADED being set. > + * This will stop counters. > + */ > +if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) > +on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu), > + vpmu_save_force, v, 1); > + > vpmu->arch_vpmu_ops->arch_vpmu_destroy(v); > } > } > @@ -835,6 +842,33 @@ long do_xenpmu_op(unsigned int op, > XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg) > return ret; > } > > +static int cpu_callback( > +struct notifier_block *nfb, unsigned long action, void *hcpu) { > +unsigned int cpu = (unsigned long)hcpu; > +struct vcpu *vcpu = per_cpu(last_vcpu, cpu); > +struct vpmu_struct *vpmu; > + > +if ( !vcpu ) > +return NOTIFY_DONE; > + > +vpmu = vcpu_vpmu(vcpu); > +if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) > +return NOTIFY_DONE; > + > +if ( action == CPU_DYING ) > +{ > +vpmu_save_force(vcpu); > +vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); > +} > + > +return NOTIFY_DONE; > +} > + > +static struct notifier_block cpu_nfb = { > +.notifier_call = cpu_callback > +}; > + > static int __init vpmu_init(void) > { > int vendor = current_cpu_data.x86_vendor; @@ -872,8 +906,11 @@ static > int __init vpmu_init(vo
Re: [Xen-devel] [PATCH v3] x86/vpmu: add cpu hot unplug notifier for vpmu
> >>> On 21.05.17 at 15:09, wrote: > > v3: > > 1.add cpu_online() check in vpm_load() and vpmu_arch_destroy(); > > 2.add vpmu_ prefix. rename cpu_callback() to vpmu_cpu_callback(); > > I had specifically objected to the latter. Sorry, will rollback it. > > > @@ -394,8 +395,11 @@ int vpmu_load(struct vcpu *v, bool_t from_guest) > > if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) > > return 0; > > > > -/* First time this VCPU is running here */ > > -if ( vpmu->last_pcpu != pcpu ) > > +/* > > + * The last pCPU is still online and this is the first time this vCPU > > + * running here. > > + */ > > +if ( cpu_online(vpmu->last_pcpu) && vpmu->last_pcpu != pcpu ) > > Adding a cpu_online() check here is unlikely to be helpful. Actually I may > have misguided you with prior comments (and if so, I'm > sorry) - the LOADED check following this one makes sure on_selected_cpus() > won't be called with an offline CPU here. IOW I think > the code can be left untouched, but the reason why should be spelled out in > the commit message (matching the reasoning why > adding the LOADED check to vpmu_arch_destroy() is sufficient for the second > use of last_pcpu there). > So, remove cpu_online() check here, because of LOADED check can make sure don't send remote call to an offline cpu (cpu_callback() will reset this flag). The cpu_online() check in vpmu_arch_destroy() should be reserved due to per_cpu(last_vcpu, vpmu->last_pcpu) has become an invalid value(Not NULL). Is that right? Thanks, Luwei Kang > > @@ -575,15 +579,21 @@ static void vpmu_arch_destroy(struct vcpu *v) > > * We will test it again in vpmu_clear_last() with interrupts > > * disabled to make sure we don't clear someone else. > > */ > > -if ( per_cpu(last_vcpu, vpmu->last_pcpu) == v ) > > +if ( cpu_online(vpmu->last_pcpu) && > > +per_cpu(last_vcpu, vpmu->last_pcpu) == v ) > > Indentation. > > Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] x86/vpmu: add cpu hot unplug notifier for vpmu
> >>> On 18.05.17 at 16:23, wrote: > >> > --- a/xen/arch/x86/cpu/vpmu.c > >> > +++ b/xen/arch/x86/cpu/vpmu.c > >> > @@ -859,6 +859,7 @@ static int cpu_callback( > >> > { > >> > vpmu_save_force(vcpu); > >> > vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); > >> > +per_cpu(last_vcpu, cpu) = NULL;// OR: > >> > this_cpu(last_vcpu) > > = NULL; > >> > } > >> > As you mentioned in before comments, it has been done in > > vpmu_save_force(). So this change is unnecessary. > >> > >> Indeed. But all I was talking is last_pcpu (whereas you once again > >> talk > > about last_vcpu). > >> > >> > In summary, I think it is enough to solve the issue in > >> > vpmu_load() and > > vpmu_arch_destroy(). > >> > >> That's what I alluded to in my reply. > >> > >> > After cpu_callback() function, per_cpu(last_vcpu, > >> > vpmu->last_pcpu) will be NULL > >> > >> No. per_cpu(..., ) simply is invalid. > >> > >> > and VPMU_CONTEXT_LOADED will be clear. > >> > In vpmu_arch_destroy(), there will not make remote call to clear > >> > last. > >> > >> I don't understand this sentence. > > > > I mean per_cpu(..., ) will be NULL after > > cpu_callback(), so that "per_cpu(last_vcpu, vpmu->last_pcpu) == v" > > check in > > vpmu_arch_destroy() will be fail when last_pcpu is the offlined pCPU. > > Or, it may make a remote call to clear the last_vpcu (vpmu_clear_last()). > > But I don't understand why simply is invalid, last_vcpu set to NULL is > > presented in source code. How to comprehend it? > > per_cpu(..., ) will fault once the CPU is actually offline. > See free_percpu_area(). Oh, yes. Thanks for your clarification. Thanks, Luwei Kang ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] x86/vpmu: add cpu hot unplug notifier for vpmu
> > --- a/xen/arch/x86/cpu/vpmu.c > > +++ b/xen/arch/x86/cpu/vpmu.c > > @@ -859,6 +859,7 @@ static int cpu_callback( > > { > > vpmu_save_force(vcpu); > > vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); > > +per_cpu(last_vcpu, cpu) = NULL;// OR: this_cpu(last_vcpu) > > = NULL; > > } > > As you mentioned in before comments, it has been done in > > vpmu_save_force(). So this change is unnecessary. > > Indeed. But all I was talking is last_pcpu (whereas you once again talk about > last_vcpu). > > > In summary, I think it is enough to solve the issue in vpmu_load() and > > vpmu_arch_destroy(). > > That's what I alluded to in my reply. > > > After cpu_callback() function, per_cpu(last_vcpu, vpmu->last_pcpu) > > will be NULL > > No. per_cpu(..., ) simply is invalid. > > > and VPMU_CONTEXT_LOADED will be clear. > > In vpmu_arch_destroy(), there will not make remote call to clear last. > > I don't understand this sentence. I mean per_cpu(..., ) will be NULL after cpu_callback(), so that "per_cpu(last_vcpu, vpmu->last_pcpu) == v" check in vpmu_arch_destroy() will be fail when last_pcpu is the offlined pCPU. Or, it may make a remote call to clear the last_vpcu (vpmu_clear_last()). But I don't understand why simply is invalid, last_vcpu set to NULL is presented in source code. How to comprehend it? Thanks, Luwei Kang > > > In vpmu_load(), remote call will guarded by VPMU_CONTEXT_LOADED flag > > check. As for vpmu->last_pcpu, we can't use some > random online one to produce false. > > What is your opinion? > > I continue to think that it needs to be made sure last_pcpu is valid before > using it for anything. Agreed, my previous suggestion of > simply storing an invalid value was not very useful, as the questionable > comparison is != (when making the suggestion I did wrongly > rememeber it to be == ), but that doesn't eliminate the need to sanity check > the value before use. Perhaps all that's needed are a > couple of cpu_online() checks. > > Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] x86/vpmu: add cpu hot unplug notifier for vpmu
> >>> On 17.05.17 at 17:57, wrote: > > @@ -581,9 +582,14 @@ static void vpmu_arch_destroy(struct vcpu *v) > > > > if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy ) > > { > > -/* Unload VPMU first. This will stop counters */ > > -on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu), > > - vpmu_save_force, v, 1); > > +/* > > + * Unload VPMU first if VPMU_CONTEXT_LOADED being set. > > + * This will stop counters. > > + */ > > +if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) > > +on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu), > > + vpmu_save_force, v, 1); > > + > > vpmu->arch_vpmu_ops->arch_vpmu_destroy(v); > > } > > } > > So this is a good step towards what was requested during v1 review, provided > it is correct (I'll let Boris comment). You didn't, > however, do anything about the other unguarded last_pcpu uses (in vpmu_load() > and upwards from the code above in > vpmu_arch_destroy()). These _may_ be implicitly fine, but if so please at > least add suitable ASSERT()s. > Hi Jan, Thanks for your reply. I think I understand the issue you mentioned. But sorry, I am not very clear what is your solution from your description. At first, I want to change like this: --- a/xen/arch/x86/cpu/vpmu.c +++ b/xen/arch/x86/cpu/vpmu.c @@ -859,6 +859,7 @@ static int cpu_callback( { vpmu_save_force(vcpu); vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); +per_cpu(last_vcpu, cpu) = NULL;// OR: this_cpu(last_vcpu) = NULL; } As you mentioned in before comments, it has been done in vpmu_save_force(). So this change is unnecessary. In summary, I think it is enough to solve the issue in vpmu_load() and vpmu_arch_destroy(). After cpu_callback() function, per_cpu(last_vcpu, vpmu->last_pcpu) will be NULL and VPMU_CONTEXT_LOADED will be clear. In vpmu_arch_destroy(), there will not make remote call to clear last. In vpmu_load(), remote call will guarded by VPMU_CONTEXT_LOADED flag check. As for vpmu->last_pcpu, we can't use some random online one to produce false. What is your opinion? Thanks, Luwei Kang ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/vpmu: add cpu hot unplug notifier for vpmu
> >>> On 16.05.17 at 19:29, wrote: > > Currently, hot unplug a cpu with vpmu enabled may cause system hang > > due to send IPI to a die physical cpu. This patch add a cpu hot unplug > > notifer to save vpmu context before cpu offline. > > > > Consider one scene, hotplug physical cpu N with vpmu is enabled. > > I think you mean "scenario" and "hot unplug". > > > The vcpu which running on this physical cpu before will be switch to > > other online cpu. Before load the vpmu context to new physical cpu, a > > IPI will be send to cpu N to save the vpmu context. > > System will hang in function on_select_cpus because of that physical > > cpu is offline and can not do any response. > > Doesn't this make clear that you would better also make sure > ->last_pcpu doesn't hold to the then stale CPU anymore? For > example, vpmu_load() compares it with smp_processor_id() (the subsequent use > is guarded by a VPMU_CONTEXT_LOADED flag > check), allowing badness if the same or another CPU with the same number > comes up again quickly enough. Similarly > vpmu_arch_destroy() uses it without checking VPMU_CONTEXT_LOADED. Hi Jan, I think it may can't make sure "->last_pcpu" doesn't hold to the then stale CPU. The purpose of this notifier is to save the vpmu context before cpu offline. Avoid save vpmu context by send IPI to that offline cpu. There is no reason to change the value except it saving (vpmu_save()) in another physical cpu. Regarding vpmu_arch_destroy(), it indeed will cause same issue. What about add " this_cpu(cpu) = NULL" in cpu_callback() to clean the last_vcpu pointer of this physical cpu. In addition, add VPMU_CONTEXT_LOADED check before execute on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu), vpmu_save_force, v, 1) in vpmu_arch_destroy(). Because of force save operation has been finished in notifier function. Thanks, Luwei Kang > > > --- a/xen/arch/x86/cpu/vpmu.c > > +++ b/xen/arch/x86/cpu/vpmu.c > > @@ -35,6 +35,7 @@ > > #include > > #include > > #include > > +#include > > Please place this in the group of other xen/ includes. > > > +static int cpu_callback(struct notifier_block *nfb, unsigned long > > +action, void *hcpu) { > > +unsigned int cpu = (unsigned long)hcpu; > > +struct vcpu *vcpu = per_cpu(last_vcpu, cpu); > > +struct vpmu_struct *vpmu; > > + > > +if ( !vcpu ) > > +return NOTIFY_DONE; > > + > > +vpmu = vcpu_vpmu(vcpu); > > +if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) > > +return NOTIFY_DONE; > > + > > +switch ( action ) > > +{ > > +case CPU_DYING: > > +vpmu_save_force(vcpu); > > +vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); > > +break; > > +default: > > +break; > > Pointless default case. > > > @@ -871,10 +902,11 @@ static int __init vpmu_init(void) > > break; > > } > > > > -if ( vpmu_mode != XENPMU_MODE_OFF ) > > +if ( vpmu_mode != XENPMU_MODE_OFF ) { > > +register_cpu_notifier(&vpmu_cpu_nfb); > > printk(XENLOG_INFO "VPMU: version " __stringify(XENPMU_VER_MAJ) "." > > __stringify(XENPMU_VER_MIN) "\n"); > > -else > > +} else > > Coding style (brace placement). > > Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/vpmu: fix vpmu can't enabled in guest
> On 14/02/17 02:19, Luwei Kang wrote: > > vpmu_enable() can not use for check if vpmu is enabled in guest during > > booting up. Because linux kernel get the status of if supported pmu is > > earler than xen vpmu initialise. vpmu_enable() will return false even > > if vpmu has been enabled in guest. > > Sorry for breaking this. However ... > > > diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c index > > e0a387e..b63c5d8 100644 > > --- a/xen/arch/x86/cpuid.c > > +++ b/xen/arch/x86/cpuid.c > > @@ -713,8 +713,7 @@ static void pv_cpuid(uint32_t leaf, uint32_t subleaf, > > struct cpuid_leaf *res) > > } > > } > > > > -if ( vpmu_enabled(curr) && > > - vpmu_is_set(vcpu_vpmu(curr), VPMU_CPU_HAS_DS) ) > > +if ( opt_vpmu_enabled && boot_cpu_has(X86_FEATURE_DS) ) > > ... this is overly general. The visibility of these flags must be per > domain, and not globally like this. > > In particular, XENPMU_MODE_ALL needs to expose PMU to dom0, but hide it from > all other domains, while even in the > XENPMU_MODE_SELF case, only domains explicitly configured with PMU should see > it (as it generally unsafe to migrate with). > > Longterm (with the inclusion of the CPUID improvements), my plan was to have > PMU available in the max policy but hidden in the > default policy, which requires the toolstack to explicitly opt in for > domains. It would opt in/out by setting the version field in guests > CPUID policy and the other feature bits. > Get it. Thanks for your explanation. Thanks, Luwei Kang ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/vpmu: fix vpmu can't enabled in guest
> >>> On 14.02.17 at 03:19, wrote: > > vpmu_enable() can not use for check if vpmu is enabled in guest during > > booting up. Because linux kernel get the status of if supported pmu is > > earler than xen vpmu initialise. vpmu_enable() will return false even > > if vpmu has been enabled in guest. > > > > Signed-off-by: Luwei Kang > > You've probably seen Boris' patch with the same overall goal. While his looks > to leave things a little too strict, yours looks to be widening > things a little too much. Do both of you think we could find a middle ground, > or do we need to accept the effect of possibly misleading > the guest by surfacing the CPUID data independent of vPMU mode, as is done > here? > Sorry, I didn't check the mail list on time and don't know somebody is working on this. I think Boris' patch is better than me. Thanks, Luwei Kang ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] X86/VPMU:clear the overflow status of which counter happened overflow
> > From: Kang, Luwei > > Sent: Tuesday, December 13, 2016 10:50 AM > > > > just set the corresponding bits of which counter happened overflow, > > rather than set all the available bits of IA32_PERF_GLOBAL_OVF_CTRL > > when happened pmu interrupt. > > "when pmu interrupt happens" > When a counter happens overflow. > > > > Signed-off-by: Luwei Kang > > Acked-by: Kevin Tian ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] X86/VPMU:clear the overflow status of which counter happened overflow
> >>> On 12.12.16 at 11:35, wrote: > > If a counter happend overflow just clear corresponding bit of > > IA32_PERF_GLOBAL_OVF_CTRL, rather than clear all the bit of this msr. > > The code change is fine, but the description appears to be wrong: > Isn't the change to avoid bits getting wrongly set, rather than any getting > wrongly cleared? > just set the corresponding bits of which counter happened overflow, rather than set all the available bits of IA32_PERF_GLOBAL_OVF_CTRL when happened pmu interrupt. Or Remove the redundant bits set of IA32_PERF_GLOBAL_OVF_CTRL. Is that OK? Thank, Luwei Kang ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] X86/VPMU: mask off uncore overflow bit on xeon phi knights landing
> >>> On 12.12.16 at 07:51, wrote: > > By the way, I think another place may need to do some modify as well. > > > > @@ -868,7 +868,7 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs > > *regs) > > if ( is_pmc_quirk ) > > handle_pmc_quirk(msr_content); > > core2_vpmu_cxt->global_status |= msr_content; > > -msr_content = ~global_ovf_ctrl_mask; > > +msr_content &= ~global_ovf_ctrl_mask; > > wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content); > > } > > > > If one counter have overflow all the bit will be clean. I think it > > need add & with current status. > > > > Hi jan and Andrew, > > What is your opinion? > > I agree, but it looks to be an independent change. > Hi Jan, Thanks for your reply. I will submit another patch regarding " msr_content &= ~global_ovf_ctrl_mask;" soon. What is your opinion about this patch? Need any modify? Thanks, Luwei Kang ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] X86/VPMU: mask off uncore overflow bit on xeon phi knights landing
> On 12/08/2016 10:17 PM, Luwei Kang wrote: > > IA32_PERF_GLOBAL_STATUS.OvfUncore (MSR 38EH, bit[61]) is always 0 and > > writing 1 to IA32_PERF_GLOBAL_OVF_CTRL.ClrOvfUncore (MSR 390H, > > bit[61]) signals #GP. > > Reference "Intel Xeon Phi Procssor x200 Product Family", document > > number 334646-008. > > > > Signed-off-by: Luwei Kang > > --- > > xen/arch/x86/cpu/vpmu_intel.c | 6 ++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/xen/arch/x86/cpu/vpmu_intel.c > > b/xen/arch/x86/cpu/vpmu_intel.c index e8049ed..0be78ff 100644 > > --- a/xen/arch/x86/cpu/vpmu_intel.c > > +++ b/xen/arch/x86/cpu/vpmu_intel.c > > @@ -1058,11 +1058,17 @@ int __init core2_vpmu_init(void) > > (((1ULL << fixed_pmc_cnt) - 1) << 32) | > > ((1ULL << arch_pmc_cnt) - 1)); > > if ( version > 2 ) > > +{ > > /* > > * Even though we don't support Uncore counters guests should be > > * able to clear all available overflows. > > */ > > global_ovf_ctrl_mask &= ~(1ULL << 61); > > +/* Knight Landing doesn't support overflow bit on uncore counters > > */ > > +if ( current_cpu_data.x86_model == 0x57 ) > > +global_ovf_ctrl_mask |= (1ULL << 61); > > + > > +} > > > > I think these types of model-specific changes (or errata) should be done in > check_pmc_quirk(). And is_pmc_quirk (along with > handle_pmc_quirk()) should be renamed to something more specific to that > particular problem. > Maybe pmc_underflow_quirk? > By the way, I think another place may need to do some modify as well. @@ -868,7 +868,7 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs) if ( is_pmc_quirk ) handle_pmc_quirk(msr_content); core2_vpmu_cxt->global_status |= msr_content; -msr_content = ~global_ovf_ctrl_mask; +msr_content &= ~global_ovf_ctrl_mask; wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content); } If one counter have overflow all the bit will be clean. I think it need add & with current status. Hi jan and Andrew, What is your opinion? Thanks, Luwei Kang ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5] x86/cpuid: AVX-512 Feature Detection
> >>> On 23.08.16 at 03:54, wrote: > > AVX512 is an extention of AVX2. Its spec can be found at: > > https://software.intel.com/sites/default/files/managed/b4/3a/319433-02 > > 4.pdf This patch detects AVX512 features by CPUID. > > > > Signed-off-by: Luwei Kang > > Reviewed-by: Jan Beulich > > But I'd specifically like to have at least an ack from Andrew too before > putting this in. > Hi Andrew, What is your opinion? Thanks, Luwei Kang ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4] x86/cpuid: AVX-512 Feature Detection
> > >> First of all - please don't top post. > > >> > > >>> What about remove the dependency between AVX2 and AVX512F > > ( AVX2: > > >> [AVX512F], ) ? > > >> > > >> Yes, that's what I think we want, but we need Andrew's agreement > > >> here. > > >> > > > Hi Andrew, what is your opinion ? > > You are in a better position to answer than me. > > > > For a specific instruction which may be VEX and EVEX encoded, > > is the circuitry for a specific instruction shared, or > > discrete? Is there a plausible future where you might > > support only the EVEX variant of an instruction, and not the VEX > > variant? > > > > These dependences are about what may be reasonably assumed > > about the way the environment is structured. It doesn't look > > reasonable to advertise an AVX512 environment to guests while > > at the same time stating that AVX2 is not present. If this > > is correct, then the dependency > > > should stay. > > If Intel plausibly things it might release hardware with > > !AVX2 but AVX512, then the dependency should be dropped. > > >>> Regarding the dependency between AVX2 and AVX512F, I have ask > > >>> some hardware > > > architecture engineer. > > >>> AVX512 is a superset of AVX2, the most important item being > > >>> the state. If > > > the state for AVX2 isn't enabled (in XCR0), then AVX512 > > >> also can't function. > > >>> So if we want to use AVX512, AVX2 must be supported and enabled. > > >>> The > > > dependence between AVX512F and AVX2 may need be > > >> reserved. > > >> > > >> Ok, so AVX512 strictly depends on AVX2. > > >> > > >> In which case, the python code was correct. The meaning of the > > >> key/value > > > relationship is "if the feature in the key is not present, all > > >> features in the value must also be disabled". > > > Hi jan, what is your opinion? > > There's no opinion to be had here, according to your earlier reply. > > I do, > > >>> however, continue to question that model: State and > > instruction set are independent items. Of course YMM state is a > > prereq to > > >>> ZMM state, but I do not buy AVX2 insn support being a > > prereq to AVX-512 insns. Yet to me the AVX2 and AVX-512F feature > > flags solely > > >>> represent the instructions, while the XSTATE leaf bits > > represent the states. > > > > >>> Hi jan, > > >>> I get some information from hardware colleague. There is no > > >>> dependence about AVX512 instructions and AVX2 instructions. It is > > >>> also not states in > > > the > > >>> official document. > > >> As I had assumed. > > >> > > >>>But I want to know the meaning of the dependence "AVX2: > > >>> [AVX512F]" in "gen-cpuid.py" file. > > >>>If it is the feature dependence, I think the dependence between > > >>> AVX512 and AVX2 may need to add. As we talk before, Zmm is part of > > >>> AVX512 feature. > > > > > >>> If the state for AVX2 isn't enabled (in XCR0), then AVX512 also > > >>> can't function. Apart from that, there is no processors with > > >>> AVX512 without AVX2 currently and it is safe to treat AVX2 as part > > >>> of the thermometer leading to > > > > > >>> AVX512. Regarding if will have a cpu support avx512 without avx2 > > >>> in future, it is really hard to say. > > >>> If it is the instructions dependence, I have no idea to delete > > >>> this dependence in "gen-cpuid.py" file. > > >>> So, I want to know your advice. > > >> Well, the main issue here is that we have no feature flag > > >> representation for the xstate bits. If we had, the relevant parts > > >> of the dependencies should look like > > >> > > >> XSTATE_YMM: [AVX, XSTATE_ZMM] > > >> AVX: [AVX2] > > >> XSTATE_ZMM: [AVX512F] > > >> AVX512F: [AVX512CD, ...] > > >> > > >> But in their absence I guess keeping the AVX2 dependency as you > > >> have it is the only reasonable approach. Just that I'd like it to > > >> be accompanied by a comment explaining that this isn't the actual > > >> dependency (so that if XSTATE feature flags got introduced later, > > >> it would be clear how to adjust those parts). > > >> > > >> Andrew - I keep forgetting why you think having such XSTATE_* > > >> feature flags is not appropriate. > > > > > > This is all about providing a plausible cpuid layout to a guest. > > > > > > It is not plausible that we will ever see a processor with e.g. > > > AVX512F but not XSTATE_ZMM. > > > > This is a somewhat bogus argument considering we have > > > > FXSR: [FFXSR, SSE], > > > > which, as you certainly realize, is slightly wrong nowadays: > > XSTATE_XMM would suffice as a prereq, without any FXSR. (But I > > certainly realize that lack of FXSR is unlikely, as that's considered > > part
Re: [Xen-devel] [PATCH v4] x86/cpuid: AVX-512 Feature Detection
> >>> On 10.08.16 at 14:10, wrote: > > On 10/08/16 11:29, Jan Beulich wrote: > >>>>> On 10.08.16 at 11:58, wrote: > >>>>>>> On 03.08.16 at 03:32, wrote: > >>>>>> On 25/07/16 07:09, Kang, Luwei wrote: > >>>>>>>>>> First of all - please don't top post. > >>>>>>>>>> > >>>>>>>>>>> What about remove the dependency between AVX2 and AVX512F > >>>>>>>> ( AVX2: > >>>>>>>>>> [AVX512F], ) ? > >>>>>>>>>> > >>>>>>>>>> Yes, that's what I think we want, but we need Andrew's agreement > >>>>>>>>>> here. > >>>>>>>>>> > >>>>>>>>> Hi Andrew, what is your opinion ? > >>>>>>>> You are in a better position to answer than me. > >>>>>>>> > >>>>>>>> For a specific instruction which may be VEX and EVEX encoded, > >>>>>>>> is the circuitry for a specific instruction shared, or > >>>>>>>> discrete? Is there a plausible future where you might support > >>>>>>>> only the EVEX variant of an instruction, and not the VEX variant? > >>>>>>>> > >>>>>>>> These dependences are about what may be reasonably assumed > >>>>>>>> about the way the environment is structured. It doesn't look > >>>>>>>> reasonable to advertise an AVX512 environment to guests while > >>>>>>>> at the same time stating that AVX2 is not present. If this is > >>>>>>>> correct, then the dependency > >>>>> should stay. > >>>>>>>> If Intel plausibly things it might release hardware with !AVX2 > >>>>>>>> but AVX512, then the dependency should be dropped. > >>>>>>> Regarding the dependency between AVX2 and AVX512F, I have ask > >>>>>>> some hardware > >>>>> architecture engineer. > >>>>>>> AVX512 is a superset of AVX2, the most important item being the > >>>>>>> state. If > >>>>> the state for AVX2 isn't enabled (in XCR0), then AVX512 > >>>>>> also can't function. > >>>>>>> So if we want to use AVX512, AVX2 must be supported and enabled. > >>>>>>> The > >>>>> dependence between AVX512F and AVX2 may need be > >>>>>> reserved. > >>>>>> > >>>>>> Ok, so AVX512 strictly depends on AVX2. > >>>>>> > >>>>>> In which case, the python code was correct. The meaning of the > >>>>>> key/value > >>>>> relationship is "if the feature in the key is not present, all > >>>>>> features in the value must also be disabled". > >>>>> Hi jan, what is your opinion? > >>>> There's no opinion to be had here, according to your earlier reply. > >>>> I do, > >>> however, continue to question that model: State and > >>>> instruction set are independent items. Of course YMM state is a > >>>> prereq to > >>> ZMM state, but I do not buy AVX2 insn support being a > >>>> prereq to AVX-512 insns. Yet to me the AVX2 and AVX-512F feature > >>>> flags solely > >>> represent the instructions, while the XSTATE leaf bits > >>>> represent the states. > >>>> > >>> Hi jan, > >>> I get some information from hardware colleague. There is no > >>> dependence about AVX512 instructions and AVX2 instructions. It is > >>> also not states in > > the > >>> official document. > >> As I had assumed. > >> > >>>But I want to know the meaning of the dependence "AVX2: > >>> [AVX512F]" in "gen-cpuid.py" file. > >>>If it is the feature dependence, I think the dependence between > >>> AVX512 and AVX2 may need to add. As we talk before, Zmm is part of > >>> AVX512 feature. > > > >>> If the state for AVX2 isn't enabled (in XCR0), then AVX512 also > >>> can't function. Apart from that, there is no processors with AVX512 > >>> w
Re: [Xen-devel] [PATCH v4] x86/cpuid: AVX-512 Feature Detection
> >>> On 03.08.16 at 03:32, wrote: > >> On 25/07/16 07:09, Kang, Luwei wrote: > >> >>>> First of all - please don't top post. > >> >>>> > >> >>>>> What about remove the dependency between AVX2 and AVX512F > >> >> ( AVX2: > >> >>>> [AVX512F], ) ? > >> >>>> > >> >>>> Yes, that's what I think we want, but we need Andrew's agreement here. > >> >>>> > >> >>> Hi Andrew, what is your opinion ? > >> >> You are in a better position to answer than me. > >> >> > >> >> For a specific instruction which may be VEX and EVEX encoded, is > >> >> the circuitry for a specific instruction shared, or discrete? Is > >> >> there a plausible future where you might support only the EVEX > >> >> variant of an instruction, and not the VEX variant? > >> >> > >> >> These dependences are about what may be reasonably assumed about > >> >> the way the environment is structured. It doesn't look reasonable > >> >> to advertise an AVX512 environment to guests while at the same > >> >> time stating that AVX2 is not present. If this is correct, then > >> >> the dependency > > should stay. > >> >> If Intel plausibly things it might release hardware with !AVX2 but > >> >> AVX512, then the dependency should be dropped. > >> > Regarding the dependency between AVX2 and AVX512F, I have ask some > >> > hardware > > architecture engineer. > >> > > >> > AVX512 is a superset of AVX2, the most important item being the > >> > state. If > > the state for AVX2 isn't enabled (in XCR0), then AVX512 > >> also can't function. > >> > > >> > So if we want to use AVX512, AVX2 must be supported and enabled. > >> > The > > dependence between AVX512F and AVX2 may need be > >> reserved. > >> > >> Ok, so AVX512 strictly depends on AVX2. > >> > >> In which case, the python code was correct. The meaning of the > >> key/value > > relationship is "if the feature in the key is not present, all > >> features in the value must also be disabled". > > > > Hi jan, what is your opinion? > > There's no opinion to be had here, according to your earlier reply. I do, > however, continue to question that model: State and > instruction set are independent items. Of course YMM state is a prereq to ZMM > state, but I do not buy AVX2 insn support being a > prereq to AVX-512 insns. Yet to me the AVX2 and AVX-512F feature flags solely > represent the instructions, while the XSTATE leaf bits > represent the states. > Hi jan, I get some information from hardware colleague. There is no dependence about AVX512 instructions and AVX2 instructions. It is also not states in the official document. But I want to know the meaning of the dependence "AVX2: [AVX512F]" in "gen-cpuid.py" file. If it is the feature dependence, I think the dependence between AVX512 and AVX2 may need to add. As we talk before, Zmm is part of AVX512 feature. If the state for AVX2 isn't enabled (in XCR0), then AVX512 also can't function. Apart from that, there is no processors with AVX512 without AVX2 currently and it is safe to treat AVX2 as part of the thermometer leading to AVX512. Regarding if will have a cpu support avx512 without avx2 in future, it is really hard to say. If it is the instructions dependence, I have no idea to delete this dependence in "gen-cpuid.py" file. So, I want to know your advice. Thanks, Luwei Kang ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4] x86/cpuid: AVX-512 Feature Detection
> On 25/07/16 07:09, Kang, Luwei wrote: > >>>> First of all - please don't top post. > >>>> > >>>>> What about remove the dependency between AVX2 and AVX512F > >> ( AVX2: > >>>> [AVX512F], ) ? > >>>> > >>>> Yes, that's what I think we want, but we need Andrew's agreement here. > >>>> > >>> Hi Andrew, what is your opinion ? > >> You are in a better position to answer than me. > >> > >> For a specific instruction which may be VEX and EVEX encoded, is the > >> circuitry for a specific instruction shared, or discrete? Is there a > >> plausible future where you might support only the EVEX variant of an > >> instruction, and not the VEX variant? > >> > >> These dependences are about what may be reasonably assumed about the > >> way the environment is structured. It doesn't look reasonable to > >> advertise an AVX512 environment to guests while at the same time > >> stating that AVX2 is not present. If this is correct, then the dependency > >> should stay. > >> If Intel plausibly things it might release hardware with !AVX2 but > >> AVX512, then the dependency should be dropped. > > Regarding the dependency between AVX2 and AVX512F, I have ask some hardware > > architecture engineer. > > > > AVX512 is a superset of AVX2, the most important item being the state. If > > the state for AVX2 isn't enabled (in XCR0), then AVX512 > also can't function. > > > > So if we want to use AVX512, AVX2 must be supported and enabled. The > > dependence between AVX512F and AVX2 may need be > reserved. > > Ok, so AVX512 strictly depends on AVX2. > > In which case, the python code was correct. The meaning of the key/value > relationship is "if the feature in the key is not present, all > features in the value must also be disabled". Hi jan, what is your opinion? > > ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4] x86/cpuid: AVX-512 Feature Detection
> >> First of all - please don't top post. > >> > >>> What about remove the dependency between AVX2 and AVX512F > ( AVX2: > >> [AVX512F], ) ? > >> > >> Yes, that's what I think we want, but we need Andrew's agreement here. > >> > > Hi Andrew, what is your opinion ? > > You are in a better position to answer than me. > > For a specific instruction which may be VEX and EVEX encoded, is the circuitry > for a specific instruction shared, or discrete? Is there a plausible future > where you might support only the EVEX variant of an instruction, and not the > VEX variant? > > These dependences are about what may be reasonably assumed about the > way the environment is structured. It doesn't look reasonable to advertise > an AVX512 environment to guests while at the same time stating that AVX2 is > not present. If this is correct, then the dependency should stay. > If Intel plausibly things it might release hardware with !AVX2 but AVX512, > then the dependency should be dropped. Regarding the dependency between AVX2 and AVX512F, I have ask some hardware architecture engineer. AVX512 is a superset of AVX2, the most important item being the state. If the state for AVX2 isn't enabled (in XCR0), then AVX512 also can't function. So if we want to use AVX512, AVX2 must be supported and enabled. The dependence between AVX512F and AVX2 may need be reserved. > > ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4] x86/cpuid: AVX-512 Feature Detection
> -Original Message- > From: Jan Beulich [mailto:jbeul...@suse.com] > Sent: Tuesday, July 5, 2016 3:03 PM > To: Kang, Luwei > Cc: andrew.coop...@citrix.com; chao.p.p...@linux.intel.com; xen- > de...@lists.xen.org > Subject: RE: [PATCH v4] x86/cpuid: AVX-512 Feature Detection > > >>> On 05.07.16 at 04:31, wrote: > > First of all - please don't top post. > > > What about remove the dependency between AVX2 and AVX512F ( AVX2: > [AVX512F], ) ? > > Yes, that's what I think we want, but we need Andrew's agreement here. > Hi Andrew, what is your opinion ? > > From: Jan Beulich [mailto:jbeul...@suse.com] > > Sent: Friday, July 1, 2016 3:56 PM > >>>> On 30.06.16 at 07:50, wrote: > >> --- a/xen/tools/gen-cpuid.py > >> +++ b/xen/tools/gen-cpuid.py > >> @@ -243,6 +243,17 @@ def crunch_numbers(state): > >> # AMD K6-2+ and K6-III processors shipped with 3DNow+, beyond > the > >> # standard 3DNow in the earlier K6 processors. > >> _3DNOW: [_3DNOWEXT], > >> + > >> +# AVX2 is an extension to AVX, providing mainly new integer > instructions. > >> +# In principle, AVX512 only takes YMM register state as a > prerequisite > >> +# and many AVX2 instructions are extended by AVX512F to 512-bit > forms. > >> +AVX2: [AVX512F], > > > > I think this was meant to be "but" or "yet" instead of "and". And then > > I'm not particularly happy about the asymmetry with AVX (which is not > > dependent on SSE or any of its successors, despite extending many of > > them to 256-bit forms). Plus e.g. FMA gets extended too, but isn't being > made a dependency. > > And quite opposite to that, at least some AVX2 instructions get > > extended only by e.g. AVX512BW. > > Nor are any of the extended forms listed to require other than the > > AVX512* bits checked in the doc. > > > > Jan > > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4] x86/cpuid: AVX-512 Feature Detection
What about remove the dependency between AVX2 and AVX512F ( AVX2: [AVX512F], ) ? -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Friday, July 1, 2016 3:56 PM To: andrew.coop...@citrix.com; Kang, Luwei Cc: chao.p.p...@linux.intel.com; xen-devel@lists.xen.org Subject: Re: [PATCH v4] x86/cpuid: AVX-512 Feature Detection >>> On 30.06.16 at 07:50, wrote: > --- a/xen/tools/gen-cpuid.py > +++ b/xen/tools/gen-cpuid.py > @@ -243,6 +243,17 @@ def crunch_numbers(state): > # AMD K6-2+ and K6-III processors shipped with 3DNow+, beyond the > # standard 3DNow in the earlier K6 processors. > _3DNOW: [_3DNOWEXT], > + > +# AVX2 is an extension to AVX, providing mainly new integer > instructions. > +# In principle, AVX512 only takes YMM register state as a > prerequisite > +# and many AVX2 instructions are extended by AVX512F to 512-bit > forms. > +AVX2: [AVX512F], I think this was meant to be "but" or "yet" instead of "and". And then I'm not particularly happy about the asymmetry with AVX (which is not dependent on SSE or any of its successors, despite extending many of them to 256-bit forms). Plus e.g. FMA gets extended too, but isn't being made a dependency. And quite opposite to that, at least some AVX2 instructions get extended only by e.g. AVX512BW. Nor are any of the extended forms listed to require other than the AVX512* bits checked in the doc. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [V3] x86/cpuid: AVX-512 Feature Detection
From table 2-2 I can see that AVX512F = AVX512F & AVX512VL AVX512CD = AVX512F & AVX512VL & AVX512CD AVX512DQ = AVX512F & AVX512VL & AVX512DQ AVX512BW = AVX512F & AVX512VL & AVX512BW AVX512IFMA = AVX512F & AVX512VL & AVX512IFMA AVX512VBMI = AVX512F & AVX512VL & AVX512VBMI From this depends list some AVX512 feature also depends on AVX512VL. So the depends may like this: AVX2: [AVX512F], AVX512F: [AVX512DQ, AVX512IFMA, AVX512CD, AVX512BW, AVX512VBMI], AVX512VL:[AVX512F, AVX512DQ, AVX512IFMA, AVX512CD, AVX512BW, AVX512VBMI] But I have test on current hardware, this machine support AVX512F, AVX512PF, AVX512ER and AVX512CD, do not support AVX512VL. I think there may have some conflict between spec and hardware. The depends about AVX512VL should not add, or the AVX512 feature will not work. So I think below depends is ok. AVX2: [AVX512F], AVX512F: [AVX512DQ, AVX512IFMA, AVX512PF, AVX512ER, AVX512CD, AVX512BW, AVX512VL, AVX512VBMI], Thanks, Luwei Kang -Original Message- From: Kang, Luwei Sent: Wednesday, June 29, 2016 7:28 PM To: xen-devel@lists.xen.org Cc: jbeul...@suse.com; andrew.coop...@citrix.com; Wang, Yong Y ; Peng, Chao P ; Kang, Luwei Subject: [V3] x86/cpuid: AVX-512 Feature Detection AVX-512 is an extention of AVX2. Its spec can be found at: https://software.intel.com/sites/default/files/managed/b4/3a/319433-024.pdf This patch detects AVX-512 features by CPUID. Signed-off-by: Luwei Kang --- [V3] 1.adjust dependencies between features. [V2] 1.one per bit, change from > + xstate_size = max(xstate_size, > + xstate_offsets[_XSTATE_HI_ZMM] + > + xstate_sizes[_XSTATE_HI_ZMM]); to xstate_size = max(xstate_size, xstate_offsets[_XSTATE_OPMASK] + xstate_sizes[_XSTATE_OPMASK]); xstate_size = max(xstate_size, xstate_offsets[_XSTATE_ZMM] + xstate_sizes[_XSTATE_ZMM]); xstate_size = max(xstate_size, xstate_offsets[_XSTATE_HI_ZMM] + xstate_sizes[_XSTATE_HI_ZMM]); 2.change form domain_cpuid(currd, 0x07, 0, &tmp, &_ebx, &tmp, &tmp); to domain_cpuid(currd, 7, 0, &tmp, &_ebx, &tmp, &tmp); 3.add dependencies between features in xen/tools/gen-cpuid.py 4.split the cpuid call just like the way the hvm_cpuid() side works. xen/arch/x86/hvm/hvm.c | 14 ++ xen/arch/x86/traps.c| 22 +- xen/include/public/arch-x86/cpufeatureset.h | 9 + xen/tools/gen-cpuid.py | 11 +++ 4 files changed, 55 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index c89ab6e..7696b1e 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -3474,6 +3474,20 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, xstate_sizes[_XSTATE_BNDCSR]); } +if ( _ebx & cpufeat_mask(X86_FEATURE_AVX512F) ) +{ +xfeature_mask |= XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM; +xstate_size = max(xstate_size, + xstate_offsets[_XSTATE_OPMASK] + + xstate_sizes[_XSTATE_OPMASK]); +xstate_size = max(xstate_size, + xstate_offsets[_XSTATE_ZMM] + + xstate_sizes[_XSTATE_ZMM]); +xstate_size = max(xstate_size, + xstate_offsets[_XSTATE_HI_ZMM] + + xstate_sizes[_XSTATE_HI_ZMM]); +} + if ( _ecx & cpufeat_mask(X86_FEATURE_PKU) ) { xfeature_mask |= XSTATE_PKRU; diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 767d0b0..8fb697b 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -975,7 +975,7 @@ void pv_cpuid(struct cpu_user_regs *regs) switch ( leaf ) { -uint32_t tmp, _ecx; +uint32_t tmp, _ecx, _ebx; case 0x0001: c &= pv_featureset[FEATURESET_1c]; @@ -1157,6 +1157,26 @@ void pv_cpuid(struct cpu_user_regs *regs) xstate_sizes[_XSTATE_YMM]); } +if ( !is_control_domain(currd) && !is_hardware_domain(currd) ) +domain_cpuid(currd, 7, 0, &tmp, &_ebx, &tmp, &tmp); +else +cpuid_count(7, 0, &tmp, &_ebx, &tmp, &tmp); +_ebx &= pv_featureset[FEATURESET_7b0]; + +if ( _ebx & cpufeat_mask(X86_FEATURE_AVX512F) ) +{ +xfeature_mask |= XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM; +x
Re: [Xen-devel] [PATCH] x86/cpuid: AVX-512 Feature Detection
OK, no problem. -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: Tuesday, June 28, 2016 4:47 PM To: Kang, Luwei ; xen-devel@lists.xen.org Cc: jbeul...@suse.com; Wang, Yong Y ; Peng, Chao P Subject: Re: [PATCH] x86/cpuid: AVX-512 Feature Detection On 28/06/16 06:51, Luwei Kang wrote: > @@ -1136,9 +1136,16 @@ void pv_cpuid(struct cpu_user_regs *regs) > case XSTATE_CPUID: > > if ( !is_control_domain(currd) && !is_hardware_domain(currd) > ) > +{ > domain_cpuid(currd, 1, 0, &tmp, &tmp, &_ecx, &tmp); > +domain_cpuid(currd, 0x07, 0, &tmp, &_ebx, &tmp, &tmp); > +} > else > +{ > _ecx = cpuid_ecx(1); > +cpuid_count(0x07, 0, &tmp, &_ebx, &tmp, &tmp); > +} > + In addition to Jan's comments, having _ecx from one leaf and _ebx from a different leaf collected at the same time is liable to cause confusion. Please split the cpuid call for leaf 7 out from here, and put it in the next hunk, just like the way the hvm_cpuid() side works. ~Andrew > _ecx &= pv_featureset[FEATURESET_1c]; > > if ( !(_ecx & cpufeat_mask(X86_FEATURE_XSAVE)) || subleaf >= > 63 ) @@ -1157,6 +1164,14 @@ void pv_cpuid(struct cpu_user_regs *regs) > xstate_sizes[_XSTATE_YMM]); > } > > +if ( _ebx & cpufeat_mask(X86_FEATURE_AVX512F) ) > +{ > +xfeature_mask |= XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM; > +xstate_size = max(xstate_size, > + xstate_offsets[_XSTATE_HI_ZMM] + > + xstate_sizes[_XSTATE_HI_ZMM]); > +} > + > a = (uint32_t)xfeature_mask; > d = (uint32_t)(xfeature_mask >> 32); > c = xstate_size; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/cpuid: AVX-512 Feature Detection
Thanks for your advice, I will make a change right now. -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, June 28, 2016 3:49 PM To: Kang, Luwei Cc: andrew.coop...@citrix.com; Peng, Chao P ; Wang, Yong Y ; xen-devel@lists.xen.org Subject: Re: [PATCH] x86/cpuid: AVX-512 Feature Detection >>> On 28.06.16 at 07:51, wrote: > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -3474,6 +3474,14 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, > unsigned int *ebx, >xstate_sizes[_XSTATE_BNDCSR]); > } > > +if ( _ebx & cpufeat_mask(X86_FEATURE_AVX512F) ) > +{ > +xfeature_mask |= XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM; > +xstate_size = max(xstate_size, > + xstate_offsets[_XSTATE_HI_ZMM] + > + xstate_sizes[_XSTATE_HI_ZMM]); I think this would better be three such statements, one per bit. Otherwise the goal of not putting in assumptions on the relative ordering of bits and save area ranges gets undermined. > @@ -1136,9 +1136,16 @@ void pv_cpuid(struct cpu_user_regs *regs) > case XSTATE_CPUID: > > if ( !is_control_domain(currd) && !is_hardware_domain(currd) > ) > +{ > domain_cpuid(currd, 1, 0, &tmp, &tmp, &_ecx, &tmp); > +domain_cpuid(currd, 0x07, 0, &tmp, &_ebx, &tmp, &tmp); The neighboring line tells you that this should be 7 instead of 0x07. > --- a/xen/include/public/arch-x86/cpufeatureset.h > +++ b/xen/include/public/arch-x86/cpufeatureset.h > @@ -206,15 +206,24 @@ XEN_CPUFEATURE(PQM, 5*32+12) /* Platform > QoS Monitoring */ > XEN_CPUFEATURE(NO_FPU_SEL,5*32+13) /*! FPU CS/DS stored as zero */ > XEN_CPUFEATURE(MPX, 5*32+14) /*S Memory Protection Extensions */ > XEN_CPUFEATURE(PQE, 5*32+15) /* Platform QoS Enforcement */ > +XEN_CPUFEATURE(AVX512F, 5*32+16) /*A AVX-512 Foundation Instructions > */ > +XEN_CPUFEATURE(AVX512DQ, 5*32+17) /*A AVX-512 Doubleword & Quadword > Instrs */ > XEN_CPUFEATURE(RDSEED,5*32+18) /*A RDSEED instruction */ > XEN_CPUFEATURE(ADX, 5*32+19) /*A ADCX, ADOX instructions */ > XEN_CPUFEATURE(SMAP, 5*32+20) /*S Supervisor Mode Access > Prevention */ > +XEN_CPUFEATURE(AVX512IFMA,5*32+21) /*A AVX-512 Integer Fused Multiply > Add */ > XEN_CPUFEATURE(CLFLUSHOPT,5*32+23) /*A CLFLUSHOPT instruction */ > XEN_CPUFEATURE(CLWB, 5*32+24) /*A CLWB instruction */ > +XEN_CPUFEATURE(AVX512PF, 5*32+26) /*A AVX-512 Prefetch Instructions */ > +XEN_CPUFEATURE(AVX512ER, 5*32+27) /*A AVX-512 Exponent & Reciprocal > Instrs */ > +XEN_CPUFEATURE(AVX512CD, 5*32+28) /*A AVX-512 Conflict Detection > Instrs */ > XEN_CPUFEATURE(SHA, 5*32+29) /*A SHA1 & SHA256 instructions */ > +XEN_CPUFEATURE(AVX512BW, 5*32+30) /*A AVX-512 Byte and Word > Instructions */ > +XEN_CPUFEATURE(AVX512VL, 5*32+31) /*A AVX-512 Vector Length Extensions > */ > > /* Intel-defined CPU features, CPUID level 0x0007:0.ecx, word 6 */ > XEN_CPUFEATURE(PREFETCHWT1, 6*32+ 0) /*A PREFETCHWT1 instruction */ > +XEN_CPUFEATURE(AVX512VBMI,6*32+ 1) /*A AVX-512 Vector Byte Manipulation > Instrs */ > XEN_CPUFEATURE(PKU, 6*32+ 3) /*H Protection Keys for Userspace */ > XEN_CPUFEATURE(OSPKE, 6*32+ 4) /*! OS Protection Keys Enable */ This lacks an adjustment to the dependencies between features in xen/tools/gen-cpuid.py. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.7] x86/cpuid: Calculate a guests xfeature_mask from its featureset
OK. -Original Message- From: Andrew Cooper [mailto:am...@hermes.cam.ac.uk] On Behalf Of Andrew Cooper Sent: Friday, June 3, 2016 3:35 PM To: Kang, Luwei ; Xen-devel Cc: Jan Beulich ; Wei Liu ; Han, Huaitong ; Wang, Yong Y Subject: Re: [PATCH for-4.7] x86/cpuid: Calculate a guests xfeature_mask from its featureset On 03/06/2016 03:50, Kang, Luwei wrote: > I have finsh test this patch and it work well, thank Andrew and all. May I take that as a Tested-by: Luwei Kang then please? ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.7] x86/cpuid: Calculate a guests xfeature_mask from its featureset
I have finsh test this patch and it work well, thank Andrew and all. -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: Friday, June 3, 2016 12:48 AM To: Xen-devel Cc: Andrew Cooper ; Jan Beulich ; Wei Liu ; Kang, Luwei ; Han, Huaitong Subject: [PATCH for-4.7] x86/cpuid: Calculate a guests xfeature_mask from its featureset libxc current performs the xstate calculation for guests, and provides the information to Xen to be used when satisfying CPUID traps. (There is further work planned to improve this arrangement, but the worst a buggy toolstack can do is make junk appear in the cpuid leaves for the guest.) dom0 however has no policy constructed for it, and certain fields filter straight through from hardware. Linux queries CPUID.7[0].{EAX/EDX} alone to choose a setting for %xcr0, which is action to take. However, features such as MPX and PKRU are not supported for PV guests. As a result, Linux, using leaked hardware information, fails to set %xcr0 on newer Skylake hardware with PKRU support, and crashes. As an interim solution, dynamically calculate the correct xfeature_mask and xstate_size to report to the guest for CPUID.7[0] queries. This ensures that domains don't see leaked hardware values, even when no cpuid policy is provided. Similarly, CPUID.7[1]{ECX/EDX} represents the applicable settings for MSR_XSS. Xen doesn't support any XSS states in guests, unconditionally clear them for HVM guests. Reported-by: Luwei Kang Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Wei Liu CC: Luwei Kang CC: Huaitong Han --- xen/arch/x86/hvm/hvm.c | 53 ++-- xen/arch/x86/traps.c | 50 - xen/arch/x86/xstate.c| 2 +- xen/include/asm-x86/xstate.h | 32 +- 4 files changed, 114 insertions(+), 23 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index bb98051..72bbed5 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -3362,7 +3362,7 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, switch ( input ) { -unsigned int _ecx, _edx; +unsigned int _ebx, _ecx, _edx; case 0x1: /* Fix up VLAPIC details. */ @@ -3443,6 +3443,51 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, switch ( count ) { case 0: +{ +uint64_t xfeature_mask = XSTATE_FP_SSE; +uint32_t xstate_size = XSTATE_AREA_MIN_SIZE; + +if ( _ecx & cpufeat_mask(X86_FEATURE_AVX) ) +{ +xfeature_mask |= XSTATE_YMM; +xstate_size = MAX(xstate_size, + xstate_offsets[_XSTATE_YMM] + + xstate_sizes[_XSTATE_YMM]); +} + +_ecx = 0; +hvm_cpuid(7, NULL, &_ebx, &_ecx, NULL); + +if ( _ebx & cpufeat_mask(X86_FEATURE_MPX) ) +{ +xfeature_mask |= XSTATE_BNDREGS | XSTATE_BNDCSR; +xstate_size = MAX(xstate_size, + xstate_offsets[_XSTATE_BNDCSR] + + xstate_sizes[_XSTATE_BNDCSR]); +} + +if ( _ebx & cpufeat_mask(X86_FEATURE_PKU) ) +{ +xfeature_mask |= XSTATE_PKRU; +xstate_size = MAX(xstate_size, + xstate_offsets[_XSTATE_PKRU] + + xstate_sizes[_XSTATE_PKRU]); +} + +hvm_cpuid(0x8001, NULL, NULL, &_ecx, NULL); + +if ( _ecx & cpufeat_mask(X86_FEATURE_LWP) ) +{ +xfeature_mask |= XSTATE_LWP; +xstate_size = MAX(xstate_size, + xstate_offsets[_XSTATE_LWP] + + xstate_sizes[_XSTATE_LWP]); +} + +*eax = (uint32_t)xfeature_mask; +*edx = (uint32_t)(xfeature_mask >> 32); +*ecx = xstate_size; + /* * Always read CPUID[0xD,0].EBX from hardware, rather than domain * policy. It varies with enabled xstate, and the correct xcr0 is @@ -3450,6 +3495,8 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, */ cpuid_count(input, count, &dummy, ebx, &dummy, &dummy); break; +} + case 1: *eax &= hvm_featureset[FEATURESET_Da1]; @@ -3463,7 +3510,9 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, cpuid_count(input, count, &dummy, ebx, &dummy, &dummy); } else -*ebx = *ecx = *edx = 0; +*ebx = 0; + +*ecx = *edx = 0;
Re: [Xen-devel] [PATCH] x86/cpuid: fix dom0 crash on skylake machine
Thank you Andrew Cooper, this patch indeed resolve my issue and two point need modify. The code need move ahead of "break;" @@ -1101,6 +1101,9 @@ void pv_cpuid(struct cpu_user_regs *regs) if ( !is_control_domain(currd) && !is_hardware_domain(currd) ) cpuid_count(leaf, subleaf, &tmp, &b, &tmp, &tmp); break; + +a &= (uint32_t)pv_xfeature_mask; +d &= (uint32_t)(pv_xfeature_mask >> 32); } extraneous space after "&". -sanitise_featureset(hvm_featureset); +sanitise_featureset(hvm_featureset, & hvm_xfeature_mask); -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: Wednesday, June 1, 2016 5:04 PM To: Kang, Luwei ; xen-devel@lists.xen.org Cc: jbeul...@suse.com; Han, Huaitong ; Wang, Yong Y Subject: Re: [PATCH] x86/cpuid: fix dom0 crash on skylake machine On 01/06/16 05:58, Luwei Kang wrote: > CPUID.0XD.0X0.EAX is from machine value for dom0, and dom0 kernel will > xsetbv with xfeatures_mask that is from CPUID.0XD.0X0.EAX, but > handle_xsetbv has ingored XSTATE_PKRU with hardware protection fault > emulation, so dom0 kernel will crash on skylake machine with PKRU support. > > Signed-off-by: Luwei Kang > --- > xen/arch/x86/traps.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index > 1ef8401..5e72e44 100644 > --- a/xen/arch/x86/traps.c > +++ b/xen/arch/x86/traps.c > @@ -1100,6 +1100,9 @@ void pv_cpuid(struct cpu_user_regs *regs) > */ > if ( !is_control_domain(currd) && !is_hardware_domain(currd) ) > cpuid_count(leaf, subleaf, &tmp, &b, &tmp, &tmp); > + > +/* PV is not supported by XSTATE_PKRU. */ > +a &= ~XSTATE_PKRU; > break; > } > While this does work, it undoes some of the work I started with my cpuid improvements in 4.7 Does the attached patch also resolve your issue? ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel