Re: [Xen-devel] [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
>>> On 23.02.18 at 16:51,wrote: > On 23/02/18 15:05, Jan Beulich wrote: > On 21.02.18 at 12:36, wrote: >>> --- a/xen/arch/x86/msr.c >>> +++ b/xen/arch/x86/msr.c >>> @@ -175,6 +175,13 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, >>> uint64_t *val) >>> _MSR_MISC_FEATURES_CPUID_FAULTING; >>> break; >>> >>> +case MSR_TSC_AUX: >>> +if ( !cp->extd.rdtscp ) >>> +goto gp_fault; >> Isn't this breaking the PV use without the feature flag set, but >> running in TSC_MODE_PVRDTSCP? I.e. don't you want >> >> if ( !cp->extd.rdtscp && >> d->arch.tsc_mode != TSC_MODE_PVRDTSCP ) >> >> ? Remember there are two cases, one being that the host has the >> feature flag set, but the guest has it clear, and the other being >> that both have it clear. Since in the former case the guest can read >> the MSR through RDTSCP, I think the MSR access ought to be >> allowed too. > > There is at least a 3rd case, of no hardware RDTSCP support, which is > why we also emulate it in emul-invl-op.c Well, that's the "both have it clear" case I've mentioned above. > A guest trying to use PVRDTSC necessarily needs out-of-band signalling > to set it up. I do not think it is reasonable or appropriate to retain > the ABI breakage of completing reads of the MSR when the instruction > should be architecturally unavailable. Well, one can take either position, so I'm not going to object to you not wanting to make the change. >>> --- a/xen/arch/x86/time.c >>> +++ b/xen/arch/x86/time.c >>> @@ -2178,7 +2178,18 @@ void tsc_set_info(struct domain *d, >>> } >>> break; >>> } >>> + >>> d->arch.incarnation = incarnation + 1; >>> + >>> +if ( d->arch.tsc_mode == TSC_MODE_PVRDTSCP ) >>> +{ >>> +struct vcpu *v; >>> + >>> +/* Distribute incarnation into each vcpu's MSR_TSC_AUX. */ >>> +for_each_vcpu ( d, v ) >>> +v->arch.msr->tsc_aux = d->arch.incarnation; >>> +} >> This not needing a lock might warrant a comment (the domain is >> [explicitly or implicitly] paused when coming here). > > This new piece of logic isn't any different to the rest of > tsc_set_info() WRT being paused. It is explicitly paused at this point. True. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
On 23/02/18 15:05, Jan Beulich wrote: On 21.02.18 at 12:36,wrote: >> There are many problems with MSR_TSC_AUX handling. >> >> To begin with, the RDPID instruction reads MSR_TSC_AUX, but it is only the >> RDTSCP feature which enumerates the MSR. Therefore, RDPID functionally >> depends on RDTSCP. > Are you sure this isn't a documentation mistake? No, but nor do I have reason to doubt what is written. It seems plausible and reasonable to me. RDTSCP has been in hardware for many generations now (and appears to be architectural in AMD64 although not picked up by Intel initially), while RDPID is new in Skylake. The documentation is consistent between Vol 2 (CPUID, RDPID) and Vol 4 (MSRs), none of which make any link between RDPID enumeration and TSC_AUX, but plenty of links between RDTSCP and TSC_AUX. > If it indeed isn't, of > course my earlier comments regarding the use of cpu_has_rdpid > would have been wrong (and that patch would be fine without the > requested adjustment). > >> For PV guests, we hide RDTSCP but advertise RDPID. We also silently drop >> writes to MSR_TSC_AUX, which is very antisocial. Therefore, enable RDTSCP >> for >> PV guests, which in turn allows RDPID to work. >> >> To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved >> into the generic MSR policy infrastructure, and becomes common. One >> improvement is that we will now reject invalid values, rather than silently >> truncating and accepting them. This also causes the emulator to reject >> RDTSCP >> for guests without the features. >> >> One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of >> MSR_TSC_AUX and the reported value is actually the migration incarnation. >> The >> previous behaviour of this mode was to silently drop writes, but as it is a >> break in the x86 ABI to start with, switch the semantics to be more sane, and >> have TSC_MODE_PVRDTSCP make the MSR properly read-only. > All of this obviously wants an ack and/or testing by the Oracle folks > (assuming this is still in use somewhere). > >> @@ -1046,7 +1048,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, >> hvm_domain_context_t *h) >> if ( hvm_funcs.tsc_scaling.setup ) >> hvm_funcs.tsc_scaling.setup(v); >> >> -v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux; >> +/* Only accept MSR_TSC_AUX if it isn't being handled by the TSC logic. >> */ >> +if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP ) >> +v->arch.msr->tsc_aux = ctxt.msr_tsc_aux; > Since you talk about range checking in the description, shouldn't > you reject here values with the upper 32 bits non-zero? Probably. In reality all migration streams have it within range, because of the types used on the sending side. > >> --- a/xen/arch/x86/msr.c >> +++ b/xen/arch/x86/msr.c >> @@ -175,6 +175,13 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, >> uint64_t *val) >> _MSR_MISC_FEATURES_CPUID_FAULTING; >> break; >> >> +case MSR_TSC_AUX: >> +if ( !cp->extd.rdtscp ) >> +goto gp_fault; > Isn't this breaking the PV use without the feature flag set, but > running in TSC_MODE_PVRDTSCP? I.e. don't you want > > if ( !cp->extd.rdtscp && > d->arch.tsc_mode != TSC_MODE_PVRDTSCP ) > > ? Remember there are two cases, one being that the host has the > feature flag set, but the guest has it clear, and the other being > that both have it clear. Since in the former case the guest can read > the MSR through RDTSCP, I think the MSR access ought to be > allowed too. There is at least a 3rd case, of no hardware RDTSCP support, which is why we also emulate it in emul-invl-op.c A guest trying to use PVRDTSC necessarily needs out-of-band signalling to set it up. I do not think it is reasonable or appropriate to retain the ABI breakage of completing reads of the MSR when the instruction should be architecturally unavailable. But as you've said and I agree, we definitely need Oracle's input here. >> --- a/xen/arch/x86/time.c >> +++ b/xen/arch/x86/time.c >> @@ -2178,7 +2178,18 @@ void tsc_set_info(struct domain *d, >> } >> break; >> } >> + >> d->arch.incarnation = incarnation + 1; >> + >> +if ( d->arch.tsc_mode == TSC_MODE_PVRDTSCP ) >> +{ >> +struct vcpu *v; >> + >> +/* Distribute incarnation into each vcpu's MSR_TSC_AUX. */ >> +for_each_vcpu ( d, v ) >> +v->arch.msr->tsc_aux = d->arch.incarnation; >> +} > This not needing a lock might warrant a comment (the domain is > [explicitly or implicitly] paused when coming here). This new piece of logic isn't any different to the rest of tsc_set_info() WRT being paused. It is explicitly paused at this point. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
>>> On 21.02.18 at 12:36,wrote: > There are many problems with MSR_TSC_AUX handling. > > To begin with, the RDPID instruction reads MSR_TSC_AUX, but it is only the > RDTSCP feature which enumerates the MSR. Therefore, RDPID functionally > depends on RDTSCP. Are you sure this isn't a documentation mistake? If it indeed isn't, of course my earlier comments regarding the use of cpu_has_rdpid would have been wrong (and that patch would be fine without the requested adjustment). > For PV guests, we hide RDTSCP but advertise RDPID. We also silently drop > writes to MSR_TSC_AUX, which is very antisocial. Therefore, enable RDTSCP for > PV guests, which in turn allows RDPID to work. > > To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved > into the generic MSR policy infrastructure, and becomes common. One > improvement is that we will now reject invalid values, rather than silently > truncating and accepting them. This also causes the emulator to reject RDTSCP > for guests without the features. > > One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of > MSR_TSC_AUX and the reported value is actually the migration incarnation. The > previous behaviour of this mode was to silently drop writes, but as it is a > break in the x86 ABI to start with, switch the semantics to be more sane, and > have TSC_MODE_PVRDTSCP make the MSR properly read-only. All of this obviously wants an ack and/or testing by the Oracle folks (assuming this is still in use somewhere). > @@ -1046,7 +1048,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, > hvm_domain_context_t *h) > if ( hvm_funcs.tsc_scaling.setup ) > hvm_funcs.tsc_scaling.setup(v); > > -v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux; > +/* Only accept MSR_TSC_AUX if it isn't being handled by the TSC logic. */ > +if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP ) > +v->arch.msr->tsc_aux = ctxt.msr_tsc_aux; Since you talk about range checking in the description, shouldn't you reject here values with the upper 32 bits non-zero? > --- a/xen/arch/x86/msr.c > +++ b/xen/arch/x86/msr.c > @@ -175,6 +175,13 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, > uint64_t *val) > _MSR_MISC_FEATURES_CPUID_FAULTING; > break; > > +case MSR_TSC_AUX: > +if ( !cp->extd.rdtscp ) > +goto gp_fault; Isn't this breaking the PV use without the feature flag set, but running in TSC_MODE_PVRDTSCP? I.e. don't you want if ( !cp->extd.rdtscp && d->arch.tsc_mode != TSC_MODE_PVRDTSCP ) ? Remember there are two cases, one being that the host has the feature flag set, but the guest has it clear, and the other being that both have it clear. Since in the former case the guest can read the MSR through RDTSCP, I think the MSR access ought to be allowed too. > --- a/xen/arch/x86/time.c > +++ b/xen/arch/x86/time.c > @@ -2178,7 +2178,18 @@ void tsc_set_info(struct domain *d, > } > break; > } > + > d->arch.incarnation = incarnation + 1; > + > +if ( d->arch.tsc_mode == TSC_MODE_PVRDTSCP ) > +{ > +struct vcpu *v; > + > +/* Distribute incarnation into each vcpu's MSR_TSC_AUX. */ > +for_each_vcpu ( d, v ) > +v->arch.msr->tsc_aux = d->arch.incarnation; > +} This not needing a lock might warrant a comment (the domain is [explicitly or implicitly] paused when coming here). > --- a/xen/tools/gen-cpuid.py > +++ b/xen/tools/gen-cpuid.py > @@ -235,6 +235,9 @@ def crunch_numbers(state): > # absence of any enabled xstate. > AVX: [FMA, FMA4, F16C, AVX2, XOP], > > +# MSR_TSC_AUX is enumerated by RDTSCP, but RDPID also reads TSC_AUX. > +RDTSCP: [RDPID], As per above I'm not convinced this is a valid dependency. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
On Wed, Feb 21, 2018 at 11:36:15AM +, Andrew Cooper wrote: > There are many problems with MSR_TSC_AUX handling. > > To begin with, the RDPID instruction reads MSR_TSC_AUX, but it is only the > RDTSCP feature which enumerates the MSR. Therefore, RDPID functionally > depends on RDTSCP. > > For PV guests, we hide RDTSCP but advertise RDPID. We also silently drop > writes to MSR_TSC_AUX, which is very antisocial. Therefore, enable RDTSCP for > PV guests, which in turn allows RDPID to work. > > To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved > into the generic MSR policy infrastructure, and becomes common. One > improvement is that we will now reject invalid values, rather than silently > truncating and accepting them. This also causes the emulator to reject RDTSCP > for guests without the features. > > One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of > MSR_TSC_AUX and the reported value is actually the migration incarnation. The > previous behaviour of this mode was to silently drop writes, but as it is a > break in the x86 ABI to start with, switch the semantics to be more sane, and > have TSC_MODE_PVRDTSCP make the MSR properly read-only. > > With PV guests getting to use MSR_TSC_AUX properly now, the MSR needs > migrating, so it is moved into the common MSR logic for PV guests. Care must > be taken however to avoid sending the MSR if TSC_MODE_PVRDTSCP is active, as > the receiving side will reject the guest_wrmsr(). The HVM side is tweaked as > well to only send/receive hvm_hw_cpu.msr_tsc_aux when the TSC logic isn't in > control of the value. > > What remains is that tsc_set_info() need to broadcast d->arch.incarnation to > all vCPUs MSR block if in TSC_MODE_PVRDTSCP, so the context switching and > emulation code functions correctly. > > Signed-off-by: Andrew CooperReviewed-by: Roger Pau Monné Just one comment nit below. > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c > index 0539551..e45f6df 100644 > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -792,7 +792,9 @@ static int hvm_save_cpu_ctxt(struct domain *d, > hvm_domain_context_t *h) > > ctxt.tsc = hvm_get_guest_tsc_fixed(v, d->arch.hvm_domain.sync_tsc); > > -ctxt.msr_tsc_aux = hvm_msr_tsc_aux(v); > +/* Only send MSR_TSC_AUX if it isn't being handled by the TSC logic. > */ > +if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP ) > +ctxt.msr_tsc_aux = v->arch.msr->tsc_aux; > > hvm_get_segment_register(v, x86_seg_idtr, ); > ctxt.idtr_limit = seg.limit; > @@ -1046,7 +1048,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, > hvm_domain_context_t *h) > if ( hvm_funcs.tsc_scaling.setup ) > hvm_funcs.tsc_scaling.setup(v); > > -v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux; > +/* Only accept MSR_TSC_AUX if it isn't being handled by the TSC logic. */ ^ set? We actually accept it. Ie: Xen doesn't don't error out when msr_tsc_aux is set and d->arch.tsc_mode == TSC_MODE_PVRDTSCP, or else we would break backwards compatibility. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.
There are many problems with MSR_TSC_AUX handling. To begin with, the RDPID instruction reads MSR_TSC_AUX, but it is only the RDTSCP feature which enumerates the MSR. Therefore, RDPID functionally depends on RDTSCP. For PV guests, we hide RDTSCP but advertise RDPID. We also silently drop writes to MSR_TSC_AUX, which is very antisocial. Therefore, enable RDTSCP for PV guests, which in turn allows RDPID to work. To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved into the generic MSR policy infrastructure, and becomes common. One improvement is that we will now reject invalid values, rather than silently truncating and accepting them. This also causes the emulator to reject RDTSCP for guests without the features. One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of MSR_TSC_AUX and the reported value is actually the migration incarnation. The previous behaviour of this mode was to silently drop writes, but as it is a break in the x86 ABI to start with, switch the semantics to be more sane, and have TSC_MODE_PVRDTSCP make the MSR properly read-only. With PV guests getting to use MSR_TSC_AUX properly now, the MSR needs migrating, so it is moved into the common MSR logic for PV guests. Care must be taken however to avoid sending the MSR if TSC_MODE_PVRDTSCP is active, as the receiving side will reject the guest_wrmsr(). The HVM side is tweaked as well to only send/receive hvm_hw_cpu.msr_tsc_aux when the TSC logic isn't in control of the value. What remains is that tsc_set_info() need to broadcast d->arch.incarnation to all vCPUs MSR block if in TSC_MODE_PVRDTSCP, so the context switching and emulation code functions correctly. Signed-off-by: Andrew Cooper--- CC: Jan Beulich CC: Jun Nakajima CC: Kevin Tian CC: Boris Ostrovsky CC: Suravee Suthikulpanit CC: Wei Liu CC: Roger Pau Monné CC: Konrad Rzeszutek Wilk v2: * Spelling corrections in the commit message * Fix HVM migration issues --- xen/arch/x86/domain.c | 3 +-- xen/arch/x86/domctl.c | 15 ++- xen/arch/x86/hvm/hvm.c | 19 ++- xen/arch/x86/hvm/svm/svm.c | 4 ++-- xen/arch/x86/hvm/vmx/vmx.c | 4 ++-- xen/arch/x86/msr.c | 16 xen/arch/x86/pv/emul-inv-op.c | 4 +--- xen/arch/x86/pv/emul-priv-op.c | 5 - xen/arch/x86/time.c | 11 +++ xen/arch/x86/x86_emulate/x86_emulate.c | 1 + xen/include/asm-x86/hvm/hvm.h | 6 -- xen/include/asm-x86/hvm/vcpu.h | 1 - xen/include/asm-x86/msr.h | 9 + xen/include/public/arch-x86/cpufeatureset.h | 2 +- xen/include/public/arch-x86/hvm/save.h | 5 + xen/tools/gen-cpuid.py | 3 +++ 16 files changed, 72 insertions(+), 36 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 9c3527f..144d6f0 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1534,8 +1534,7 @@ void paravirt_ctxt_switch_to(struct vcpu *v) activate_debugregs(v); if ( cpu_has_rdtscp ) -wrmsr_tsc_aux(v->domain->arch.tsc_mode == TSC_MODE_PVRDTSCP - ? v->domain->arch.incarnation : 0); +wrmsr_tsc_aux(v->arch.msr->tsc_aux); } /* Update per-VCPU guest runstate shared memory area (if registered). */ diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index 8fbbf3a..979afdf 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -1249,6 +1249,7 @@ long arch_do_domctl( static const uint32_t msrs_to_send[] = { MSR_SPEC_CTRL, MSR_INTEL_MISC_FEATURES_ENABLES, +MSR_TSC_AUX, }; uint32_t nr_msrs = ARRAY_SIZE(msrs_to_send); @@ -1284,7 +1285,18 @@ long arch_do_domctl( for ( j = 0; j < ARRAY_SIZE(msrs_to_send); ++j ) { uint64_t val; -int rc = guest_rdmsr(v, msrs_to_send[j], ); +int rc; + +/* + * Skip MSR_TSC_AUX if using TSC_MODE_PVRDTSCP. In this + * case, the MSR is read-only, and should be rejected if + * seen on the restore side. + */ +if ( msrs_to_send[j] == MSR_TSC_AUX && + d->arch.tsc_mode == TSC_MODE_PVRDTSCP ) +continue; + +rc = guest_rdmsr(v, msrs_to_send[j], ); /* * It is the programmers responsibility to ensure that @@ -1373,6 +1385,7 @@ long arch_do_domctl(