Re: [Xen-devel] [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.

2018-02-26 Thread Jan Beulich
>>> On 23.02.18 at 16:51,  wrote:
> On 23/02/18 15:05, Jan Beulich wrote:
> On 21.02.18 at 12:36,  wrote:
>>> --- a/xen/arch/x86/msr.c
>>> +++ b/xen/arch/x86/msr.c
>>> @@ -175,6 +175,13 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, 
>>> uint64_t *val)
>>> _MSR_MISC_FEATURES_CPUID_FAULTING;
>>>  break;
>>>  
>>> +case MSR_TSC_AUX:
>>> +if ( !cp->extd.rdtscp )
>>> +goto gp_fault;
>> Isn't this breaking the PV use without the feature flag set, but
>> running in TSC_MODE_PVRDTSCP? I.e. don't you want
>>
>> if ( !cp->extd.rdtscp &&
>>  d->arch.tsc_mode != TSC_MODE_PVRDTSCP )
>>
>> ? Remember there are two cases, one being that the host has the
>> feature flag set, but the guest has it clear, and the other being
>> that both have it clear. Since in the former case the guest can read
>> the MSR through RDTSCP, I think the MSR access ought to be
>> allowed too.
> 
> There is at least a 3rd case, of no hardware RDTSCP support, which is
> why we also emulate it in emul-invl-op.c

Well, that's the "both have it clear" case I've mentioned above.

> A guest trying to use PVRDTSC necessarily needs out-of-band signalling
> to set it up.  I do not think it is reasonable or appropriate to retain
> the ABI breakage of completing reads of the MSR when the instruction
> should be architecturally unavailable.

Well, one can take either position, so I'm not going to object to
you not wanting to make the change.

>>> --- a/xen/arch/x86/time.c
>>> +++ b/xen/arch/x86/time.c
>>> @@ -2178,7 +2178,18 @@ void tsc_set_info(struct domain *d,
>>>  }
>>>  break;
>>>  }
>>> +
>>>  d->arch.incarnation = incarnation + 1;
>>> +
>>> +if ( d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
>>> +{
>>> +struct vcpu *v;
>>> +
>>> +/* Distribute incarnation into each vcpu's MSR_TSC_AUX. */
>>> +for_each_vcpu ( d, v )
>>> +v->arch.msr->tsc_aux = d->arch.incarnation;
>>> +}
>> This not needing a lock might warrant a comment (the domain is
>> [explicitly or implicitly] paused when coming here).
> 
> This new piece of logic isn't any different to the rest of
> tsc_set_info() WRT being paused.  It is explicitly paused at this point.

True.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.

2018-02-23 Thread Andrew Cooper
On 23/02/18 15:05, Jan Beulich wrote:
 On 21.02.18 at 12:36,  wrote:
>> There are many problems with MSR_TSC_AUX handling.
>>
>> To begin with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
>> RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
>> depends on RDTSCP.
> Are you sure this isn't a documentation mistake?

No, but nor do I have reason to doubt what is written.  It seems
plausible and reasonable to me.

RDTSCP has been in hardware for many generations now (and appears to be
architectural in AMD64 although not picked up by Intel initially), while
RDPID is new in Skylake.

The documentation is consistent between Vol 2 (CPUID, RDPID) and Vol 4
(MSRs), none of which make any link between RDPID enumeration and
TSC_AUX, but plenty of links between RDTSCP and TSC_AUX.

>  If it indeed isn't, of
> course my earlier comments regarding the use of cpu_has_rdpid
> would have been wrong (and that patch would be fine without the
> requested adjustment).
>
>> For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
>> writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP 
>> for
>> PV guests, which in turn allows RDPID to work.
>>
>> To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
>> into the generic MSR policy infrastructure, and becomes common.  One
>> improvement is that we will now reject invalid values, rather than silently
>> truncating and accepting them.  This also causes the emulator to reject 
>> RDTSCP
>> for guests without the features.
>>
>> One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of
>> MSR_TSC_AUX and the reported value is actually the migration incarnation.  
>> The
>> previous behaviour of this mode was to silently drop writes, but as it is a
>> break in the x86 ABI to start with, switch the semantics to be more sane, and
>> have TSC_MODE_PVRDTSCP make the MSR properly read-only.
> All of this obviously wants an ack and/or testing by the Oracle folks
> (assuming this is still in use somewhere).
>
>> @@ -1046,7 +1048,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, 
>> hvm_domain_context_t *h)
>>  if ( hvm_funcs.tsc_scaling.setup )
>>  hvm_funcs.tsc_scaling.setup(v);
>>  
>> -v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux;
>> +/* Only accept MSR_TSC_AUX if it isn't being handled by the TSC logic. 
>> */
>> +if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP )
>> +v->arch.msr->tsc_aux = ctxt.msr_tsc_aux;
> Since you talk about range checking in the description, shouldn't
> you reject here values with the upper 32 bits non-zero?

Probably. In reality all migration streams have it within range, because
of the types used on the sending side.

>
>> --- a/xen/arch/x86/msr.c
>> +++ b/xen/arch/x86/msr.c
>> @@ -175,6 +175,13 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, 
>> uint64_t *val)
>> _MSR_MISC_FEATURES_CPUID_FAULTING;
>>  break;
>>  
>> +case MSR_TSC_AUX:
>> +if ( !cp->extd.rdtscp )
>> +goto gp_fault;
> Isn't this breaking the PV use without the feature flag set, but
> running in TSC_MODE_PVRDTSCP? I.e. don't you want
>
> if ( !cp->extd.rdtscp &&
>  d->arch.tsc_mode != TSC_MODE_PVRDTSCP )
>
> ? Remember there are two cases, one being that the host has the
> feature flag set, but the guest has it clear, and the other being
> that both have it clear. Since in the former case the guest can read
> the MSR through RDTSCP, I think the MSR access ought to be
> allowed too.

There is at least a 3rd case, of no hardware RDTSCP support, which is
why we also emulate it in emul-invl-op.c

A guest trying to use PVRDTSC necessarily needs out-of-band signalling
to set it up.  I do not think it is reasonable or appropriate to retain
the ABI breakage of completing reads of the MSR when the instruction
should be architecturally unavailable.

But as you've said and I agree, we definitely need Oracle's input here.

>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -2178,7 +2178,18 @@ void tsc_set_info(struct domain *d,
>>  }
>>  break;
>>  }
>> +
>>  d->arch.incarnation = incarnation + 1;
>> +
>> +if ( d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
>> +{
>> +struct vcpu *v;
>> +
>> +/* Distribute incarnation into each vcpu's MSR_TSC_AUX. */
>> +for_each_vcpu ( d, v )
>> +v->arch.msr->tsc_aux = d->arch.incarnation;
>> +}
> This not needing a lock might warrant a comment (the domain is
> [explicitly or implicitly] paused when coming here).

This new piece of logic isn't any different to the rest of
tsc_set_info() WRT being paused.  It is explicitly paused at this point.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.

2018-02-23 Thread Jan Beulich
>>> On 21.02.18 at 12:36,  wrote:
> There are many problems with MSR_TSC_AUX handling.
> 
> To begin with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
> RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
> depends on RDTSCP.

Are you sure this isn't a documentation mistake? If it indeed isn't, of
course my earlier comments regarding the use of cpu_has_rdpid
would have been wrong (and that patch would be fine without the
requested adjustment).

> For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
> writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
> PV guests, which in turn allows RDPID to work.
> 
> To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
> into the generic MSR policy infrastructure, and becomes common.  One
> improvement is that we will now reject invalid values, rather than silently
> truncating and accepting them.  This also causes the emulator to reject RDTSCP
> for guests without the features.
> 
> One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of
> MSR_TSC_AUX and the reported value is actually the migration incarnation.  The
> previous behaviour of this mode was to silently drop writes, but as it is a
> break in the x86 ABI to start with, switch the semantics to be more sane, and
> have TSC_MODE_PVRDTSCP make the MSR properly read-only.

All of this obviously wants an ack and/or testing by the Oracle folks
(assuming this is still in use somewhere).

> @@ -1046,7 +1048,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, 
> hvm_domain_context_t *h)
>  if ( hvm_funcs.tsc_scaling.setup )
>  hvm_funcs.tsc_scaling.setup(v);
>  
> -v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux;
> +/* Only accept MSR_TSC_AUX if it isn't being handled by the TSC logic. */
> +if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP )
> +v->arch.msr->tsc_aux = ctxt.msr_tsc_aux;

Since you talk about range checking in the description, shouldn't
you reject here values with the upper 32 bits non-zero?

> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -175,6 +175,13 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, 
> uint64_t *val)
> _MSR_MISC_FEATURES_CPUID_FAULTING;
>  break;
>  
> +case MSR_TSC_AUX:
> +if ( !cp->extd.rdtscp )
> +goto gp_fault;

Isn't this breaking the PV use without the feature flag set, but
running in TSC_MODE_PVRDTSCP? I.e. don't you want

if ( !cp->extd.rdtscp &&
 d->arch.tsc_mode != TSC_MODE_PVRDTSCP )

? Remember there are two cases, one being that the host has the
feature flag set, but the guest has it clear, and the other being
that both have it clear. Since in the former case the guest can read
the MSR through RDTSCP, I think the MSR access ought to be
allowed too.

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -2178,7 +2178,18 @@ void tsc_set_info(struct domain *d,
>  }
>  break;
>  }
> +
>  d->arch.incarnation = incarnation + 1;
> +
> +if ( d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
> +{
> +struct vcpu *v;
> +
> +/* Distribute incarnation into each vcpu's MSR_TSC_AUX. */
> +for_each_vcpu ( d, v )
> +v->arch.msr->tsc_aux = d->arch.incarnation;
> +}

This not needing a lock might warrant a comment (the domain is
[explicitly or implicitly] paused when coming here).

> --- a/xen/tools/gen-cpuid.py
> +++ b/xen/tools/gen-cpuid.py
> @@ -235,6 +235,9 @@ def crunch_numbers(state):
>  # absence of any enabled xstate.
>  AVX: [FMA, FMA4, F16C, AVX2, XOP],
>  
> +# MSR_TSC_AUX is enumerated by RDTSCP, but RDPID also reads TSC_AUX.
> +RDTSCP: [RDPID],

As per above I'm not convinced this is a valid dependency.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.

2018-02-21 Thread Roger Pau Monné
On Wed, Feb 21, 2018 at 11:36:15AM +, Andrew Cooper wrote:
> There are many problems with MSR_TSC_AUX handling.
> 
> To begin with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
> RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
> depends on RDTSCP.
> 
> For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
> writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
> PV guests, which in turn allows RDPID to work.
> 
> To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
> into the generic MSR policy infrastructure, and becomes common.  One
> improvement is that we will now reject invalid values, rather than silently
> truncating and accepting them.  This also causes the emulator to reject RDTSCP
> for guests without the features.
> 
> One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of
> MSR_TSC_AUX and the reported value is actually the migration incarnation.  The
> previous behaviour of this mode was to silently drop writes, but as it is a
> break in the x86 ABI to start with, switch the semantics to be more sane, and
> have TSC_MODE_PVRDTSCP make the MSR properly read-only.
> 
> With PV guests getting to use MSR_TSC_AUX properly now, the MSR needs
> migrating, so it is moved into the common MSR logic for PV guests.  Care must
> be taken however to avoid sending the MSR if TSC_MODE_PVRDTSCP is active, as
> the receiving side will reject the guest_wrmsr().  The HVM side is tweaked as
> well to only send/receive hvm_hw_cpu.msr_tsc_aux when the TSC logic isn't in
> control of the value.
> 
> What remains is that tsc_set_info() need to broadcast d->arch.incarnation to
> all vCPUs MSR block if in TSC_MODE_PVRDTSCP, so the context switching and
> emulation code functions correctly.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

Just one comment nit below.

> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 0539551..e45f6df 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -792,7 +792,9 @@ static int hvm_save_cpu_ctxt(struct domain *d, 
> hvm_domain_context_t *h)
>  
>  ctxt.tsc = hvm_get_guest_tsc_fixed(v, d->arch.hvm_domain.sync_tsc);
>  
> -ctxt.msr_tsc_aux = hvm_msr_tsc_aux(v);
> +/* Only send MSR_TSC_AUX if it isn't being handled by the TSC logic. 
> */
> +if ( d->arch.tsc_mode != TSC_MODE_PVRDTSCP )
> +ctxt.msr_tsc_aux = v->arch.msr->tsc_aux;
>  
>  hvm_get_segment_register(v, x86_seg_idtr, );
>  ctxt.idtr_limit = seg.limit;
> @@ -1046,7 +1048,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, 
> hvm_domain_context_t *h)
>  if ( hvm_funcs.tsc_scaling.setup )
>  hvm_funcs.tsc_scaling.setup(v);
>  
> -v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux;
> +/* Only accept MSR_TSC_AUX if it isn't being handled by the TSC logic. */
   ^ set?

We actually accept it. Ie: Xen doesn't don't error out when
msr_tsc_aux is set and d->arch.tsc_mode == TSC_MODE_PVRDTSCP, or else
we would break backwards compatibility.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 5/5] x86: Rework MSR_TSC_AUX handling from scratch.

2018-02-21 Thread Andrew Cooper
There are many problems with MSR_TSC_AUX handling.

To begin with, the RDPID instruction reads MSR_TSC_AUX, but it is only the
RDTSCP feature which enumerates the MSR.  Therefore, RDPID functionally
depends on RDTSCP.

For PV guests, we hide RDTSCP but advertise RDPID.  We also silently drop
writes to MSR_TSC_AUX, which is very antisocial.  Therefore, enable RDTSCP for
PV guests, which in turn allows RDPID to work.

To support RDTSCP properly for PV guests, the MSR_TSC_AUX handling is moved
into the generic MSR policy infrastructure, and becomes common.  One
improvement is that we will now reject invalid values, rather than silently
truncating and accepting them.  This also causes the emulator to reject RDTSCP
for guests without the features.

One complication is TSC_MODE_PVRDTSCP, in which Xen takes control of
MSR_TSC_AUX and the reported value is actually the migration incarnation.  The
previous behaviour of this mode was to silently drop writes, but as it is a
break in the x86 ABI to start with, switch the semantics to be more sane, and
have TSC_MODE_PVRDTSCP make the MSR properly read-only.

With PV guests getting to use MSR_TSC_AUX properly now, the MSR needs
migrating, so it is moved into the common MSR logic for PV guests.  Care must
be taken however to avoid sending the MSR if TSC_MODE_PVRDTSCP is active, as
the receiving side will reject the guest_wrmsr().  The HVM side is tweaked as
well to only send/receive hvm_hw_cpu.msr_tsc_aux when the TSC logic isn't in
control of the value.

What remains is that tsc_set_info() need to broadcast d->arch.incarnation to
all vCPUs MSR block if in TSC_MODE_PVRDTSCP, so the context switching and
emulation code functions correctly.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Jun Nakajima 
CC: Kevin Tian 
CC: Boris Ostrovsky 
CC: Suravee Suthikulpanit 
CC: Wei Liu 
CC: Roger Pau Monné 
CC: Konrad Rzeszutek Wilk 

v2:
 * Spelling corrections in the commit message
 * Fix HVM migration issues
---
 xen/arch/x86/domain.c   |  3 +--
 xen/arch/x86/domctl.c   | 15 ++-
 xen/arch/x86/hvm/hvm.c  | 19 ++-
 xen/arch/x86/hvm/svm/svm.c  |  4 ++--
 xen/arch/x86/hvm/vmx/vmx.c  |  4 ++--
 xen/arch/x86/msr.c  | 16 
 xen/arch/x86/pv/emul-inv-op.c   |  4 +---
 xen/arch/x86/pv/emul-priv-op.c  |  5 -
 xen/arch/x86/time.c | 11 +++
 xen/arch/x86/x86_emulate/x86_emulate.c  |  1 +
 xen/include/asm-x86/hvm/hvm.h   |  6 --
 xen/include/asm-x86/hvm/vcpu.h  |  1 -
 xen/include/asm-x86/msr.h   |  9 +
 xen/include/public/arch-x86/cpufeatureset.h |  2 +-
 xen/include/public/arch-x86/hvm/save.h  |  5 +
 xen/tools/gen-cpuid.py  |  3 +++
 16 files changed, 72 insertions(+), 36 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9c3527f..144d6f0 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1534,8 +1534,7 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
 activate_debugregs(v);
 
 if ( cpu_has_rdtscp )
-wrmsr_tsc_aux(v->domain->arch.tsc_mode == TSC_MODE_PVRDTSCP
-  ? v->domain->arch.incarnation : 0);
+wrmsr_tsc_aux(v->arch.msr->tsc_aux);
 }
 
 /* Update per-VCPU guest runstate shared memory area (if registered). */
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 8fbbf3a..979afdf 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1249,6 +1249,7 @@ long arch_do_domctl(
 static const uint32_t msrs_to_send[] = {
 MSR_SPEC_CTRL,
 MSR_INTEL_MISC_FEATURES_ENABLES,
+MSR_TSC_AUX,
 };
 uint32_t nr_msrs = ARRAY_SIZE(msrs_to_send);
 
@@ -1284,7 +1285,18 @@ long arch_do_domctl(
 for ( j = 0; j < ARRAY_SIZE(msrs_to_send); ++j )
 {
 uint64_t val;
-int rc = guest_rdmsr(v, msrs_to_send[j], );
+int rc;
+
+/*
+ * Skip MSR_TSC_AUX if using TSC_MODE_PVRDTSCP.  In this
+ * case, the MSR is read-only, and should be rejected if
+ * seen on the restore side.
+ */
+if ( msrs_to_send[j] == MSR_TSC_AUX &&
+ d->arch.tsc_mode == TSC_MODE_PVRDTSCP )
+continue;
+
+rc = guest_rdmsr(v, msrs_to_send[j], );
 
 /*
  * It is the programmers responsibility to ensure that
@@ -1373,6 +1385,7 @@ long arch_do_domctl(