Re: [Xen-devel] [PATCH v2] x86/PV: don't wrongly hide/expose CPUID.OSXSAVE from/to user mode

2016-08-23 Thread Jan Beulich
>>> On 23.08.16 at 11:48,  wrote:
> On 23/08/16 10:41, Jan Beulich wrote:
> On 23.08.16 at 11:24,  wrote:
>>> On 23/08/16 10:00, Jan Beulich wrote:
>>> On 22.08.16 at 19:30,  wrote:
> On 19/08/16 19:07, Andrew Cooper wrote:
>> On 19/08/16 18:09, Andrew Cooper wrote:
>>> On 19/08/16 13:53, Jan Beulich wrote:
 User mode code generally cannot be expected to invoke the PV-enabled
 CPUID Xen supports, and prior to the CPUID levelling changes for 4.7
 (as well as even nowadays on levelling incapable hardware) such CPUID
 invocations actually saw the host CR4.OSXSAVE value, whereas prior to
 this patch
 - on Intel guest user mode always saw the flag clear,
 - on AMD guest user mode saw the flag set even when the guest kernel
   didn't enable use of XSAVE/XRSTOR.
 Fold in the guest view of CR4.OSXSAVE when setting the levelling MSRs,
 just like we do in other CPUID handling.

 To make guest CR4 changes immediately visible via CPUID, also invoke
 ctxt_switch_levelling() from the CR4 write path.
> I was just putting together a patch series to make these changes in a
> more consistent manor, and have found a spanner in the works.
>
> Hiding Xen's view of OSXSAVE from a guests native cpuid will break
> Linux, because of the pile of hacks making up the current PV XSAVE 
> support.
 No, because PV Linux doesn't use native CPUID. We clearly should
 not hide OSXSAVE in the PV variant (and your earlier series did take
 great care to avoid that).
>>> Where? There is no distinction for OSXSAVE.
>>>
>>> The only place in pv_cpuid() which distinguishes native vs emulated
>>> cpuid is the X86_FEATURE_MONITOR handling.
>>>
>>> We distinguish kernel and userspace quite a lot, so the compatibility
>>> breakages are only visible to the kernel.
>> That's actually the part I meant.
> 
> I am sorry, but I now have no idea what you are referring to.  We can't
> reasonably do a kernel/userspace split with masking, because would
> become substantially higher overhead, and Xen isn't necessarily involved
> in the context switch back to userspace.

The kernel is fine as is, and unaffected by masking/faulting, as it
doesn't use native CPUID. Hence we only need to care about user
space, and that's what my patch attempts to correct.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] x86/PV: don't wrongly hide/expose CPUID.OSXSAVE from/to user mode

2016-08-23 Thread Andrew Cooper
On 23/08/16 10:41, Jan Beulich wrote:
 On 23.08.16 at 11:24,  wrote:
>> On 23/08/16 10:00, Jan Beulich wrote:
>> On 22.08.16 at 19:30,  wrote:
 On 19/08/16 19:07, Andrew Cooper wrote:
> On 19/08/16 18:09, Andrew Cooper wrote:
>> On 19/08/16 13:53, Jan Beulich wrote:
>>> User mode code generally cannot be expected to invoke the PV-enabled
>>> CPUID Xen supports, and prior to the CPUID levelling changes for 4.7
>>> (as well as even nowadays on levelling incapable hardware) such CPUID
>>> invocations actually saw the host CR4.OSXSAVE value, whereas prior to
>>> this patch
>>> - on Intel guest user mode always saw the flag clear,
>>> - on AMD guest user mode saw the flag set even when the guest kernel
>>>   didn't enable use of XSAVE/XRSTOR.
>>> Fold in the guest view of CR4.OSXSAVE when setting the levelling MSRs,
>>> just like we do in other CPUID handling.
>>>
>>> To make guest CR4 changes immediately visible via CPUID, also invoke
>>> ctxt_switch_levelling() from the CR4 write path.
 I was just putting together a patch series to make these changes in a
 more consistent manor, and have found a spanner in the works.

 Hiding Xen's view of OSXSAVE from a guests native cpuid will break
 Linux, because of the pile of hacks making up the current PV XSAVE support.
>>> No, because PV Linux doesn't use native CPUID. We clearly should
>>> not hide OSXSAVE in the PV variant (and your earlier series did take
>>> great care to avoid that).
>> Where? There is no distinction for OSXSAVE.
>>
>> The only place in pv_cpuid() which distinguishes native vs emulated
>> cpuid is the X86_FEATURE_MONITOR handling.
>>
>> We distinguish kernel and userspace quite a lot, so the compatibility
>> breakages are only visible to the kernel.
> That's actually the part I meant.

I am sorry, but I now have no idea what you are referring to.  We can't
reasonably do a kernel/userspace split with masking, because would
become substantially higher overhead, and Xen isn't necessarily involved
in the context switch back to userspace.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] x86/PV: don't wrongly hide/expose CPUID.OSXSAVE from/to user mode

2016-08-23 Thread Jan Beulich
>>> On 23.08.16 at 11:24,  wrote:
> On 23/08/16 10:00, Jan Beulich wrote:
> On 22.08.16 at 19:30,  wrote:
>>> On 19/08/16 19:07, Andrew Cooper wrote:
 On 19/08/16 18:09, Andrew Cooper wrote:
> On 19/08/16 13:53, Jan Beulich wrote:
>> User mode code generally cannot be expected to invoke the PV-enabled
>> CPUID Xen supports, and prior to the CPUID levelling changes for 4.7
>> (as well as even nowadays on levelling incapable hardware) such CPUID
>> invocations actually saw the host CR4.OSXSAVE value, whereas prior to
>> this patch
>> - on Intel guest user mode always saw the flag clear,
>> - on AMD guest user mode saw the flag set even when the guest kernel
>>   didn't enable use of XSAVE/XRSTOR.
>> Fold in the guest view of CR4.OSXSAVE when setting the levelling MSRs,
>> just like we do in other CPUID handling.
>>
>> To make guest CR4 changes immediately visible via CPUID, also invoke
>> ctxt_switch_levelling() from the CR4 write path.
>>> I was just putting together a patch series to make these changes in a
>>> more consistent manor, and have found a spanner in the works.
>>>
>>> Hiding Xen's view of OSXSAVE from a guests native cpuid will break
>>> Linux, because of the pile of hacks making up the current PV XSAVE support.
>> No, because PV Linux doesn't use native CPUID. We clearly should
>> not hide OSXSAVE in the PV variant (and your earlier series did take
>> great care to avoid that).
> 
> Where? There is no distinction for OSXSAVE.
> 
> The only place in pv_cpuid() which distinguishes native vs emulated
> cpuid is the X86_FEATURE_MONITOR handling.
> 
> We distinguish kernel and userspace quite a lot, so the compatibility
> breakages are only visible to the kernel.

That's actually the part I meant.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] x86/PV: don't wrongly hide/expose CPUID.OSXSAVE from/to user mode

2016-08-23 Thread Andrew Cooper
On 23/08/16 10:00, Jan Beulich wrote:
 On 22.08.16 at 19:30,  wrote:
>> On 19/08/16 19:07, Andrew Cooper wrote:
>>> On 19/08/16 18:09, Andrew Cooper wrote:
 On 19/08/16 13:53, Jan Beulich wrote:
> User mode code generally cannot be expected to invoke the PV-enabled
> CPUID Xen supports, and prior to the CPUID levelling changes for 4.7
> (as well as even nowadays on levelling incapable hardware) such CPUID
> invocations actually saw the host CR4.OSXSAVE value, whereas prior to
> this patch
> - on Intel guest user mode always saw the flag clear,
> - on AMD guest user mode saw the flag set even when the guest kernel
>   didn't enable use of XSAVE/XRSTOR.
> Fold in the guest view of CR4.OSXSAVE when setting the levelling MSRs,
> just like we do in other CPUID handling.
>
> To make guest CR4 changes immediately visible via CPUID, also invoke
> ctxt_switch_levelling() from the CR4 write path.
>> I was just putting together a patch series to make these changes in a
>> more consistent manor, and have found a spanner in the works.
>>
>> Hiding Xen's view of OSXSAVE from a guests native cpuid will break
>> Linux, because of the pile of hacks making up the current PV XSAVE support.
> No, because PV Linux doesn't use native CPUID. We clearly should
> not hide OSXSAVE in the PV variant (and your earlier series did take
> great care to avoid that).

Where? There is no distinction for OSXSAVE.

The only place in pv_cpuid() which distinguishes native vs emulated
cpuid is the X86_FEATURE_MONITOR handling.

We distinguish kernel and userspace quite a lot, so the compatibility
breakages are only visible to the kernel.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] x86/PV: don't wrongly hide/expose CPUID.OSXSAVE from/to user mode

2016-08-23 Thread Jan Beulich
>>> On 22.08.16 at 19:30,  wrote:
> On 19/08/16 19:07, Andrew Cooper wrote:
>> On 19/08/16 18:09, Andrew Cooper wrote:
>>> On 19/08/16 13:53, Jan Beulich wrote:
 User mode code generally cannot be expected to invoke the PV-enabled
 CPUID Xen supports, and prior to the CPUID levelling changes for 4.7
 (as well as even nowadays on levelling incapable hardware) such CPUID
 invocations actually saw the host CR4.OSXSAVE value, whereas prior to
 this patch
 - on Intel guest user mode always saw the flag clear,
 - on AMD guest user mode saw the flag set even when the guest kernel
   didn't enable use of XSAVE/XRSTOR.
 Fold in the guest view of CR4.OSXSAVE when setting the levelling MSRs,
 just like we do in other CPUID handling.

 To make guest CR4 changes immediately visible via CPUID, also invoke
 ctxt_switch_levelling() from the CR4 write path.
> 
> I was just putting together a patch series to make these changes in a
> more consistent manor, and have found a spanner in the works.
> 
> Hiding Xen's view of OSXSAVE from a guests native cpuid will break
> Linux, because of the pile of hacks making up the current PV XSAVE support.

No, because PV Linux doesn't use native CPUID. We clearly should
not hide OSXSAVE in the PV variant (and your earlier series did take
great care to avoid that).

Jan

> Some PV guests needs to see OSXSAVE in native CPUID before they will
> consider trying to use xsetbv.  I realise I have completely broken this
> on Intel following the mistake in determining how the MSR masks
> functioned, but seeing the value from CR4 of next will be no better.
> 
> Our choices are:
> 1) Always expose Xen's OSXSAVE, even to guest userspace
> 2) Context switch the MSRs even on guest userspace/kernel context switch.
> 
> The latter would allow us to expose Xen's OSXSAVE to guest kernels, but
> expose guest kernels' OSXSAVE to guest userspace.  However, given the
> number of ways for a guest kernel to context switch back to guest
> userspace without Xen's interaction, we can't reliably provide an
> architectural view to guest userspace.
> 
> Option 1 is the easiest patch, and least overhead.
> 
> ~Andrew




___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] x86/PV: don't wrongly hide/expose CPUID.OSXSAVE from/to user mode

2016-08-22 Thread Andrew Cooper
On 19/08/16 19:07, Andrew Cooper wrote:
> On 19/08/16 18:09, Andrew Cooper wrote:
>> On 19/08/16 13:53, Jan Beulich wrote:
>>> User mode code generally cannot be expected to invoke the PV-enabled
>>> CPUID Xen supports, and prior to the CPUID levelling changes for 4.7
>>> (as well as even nowadays on levelling incapable hardware) such CPUID
>>> invocations actually saw the host CR4.OSXSAVE value, whereas prior to
>>> this patch
>>> - on Intel guest user mode always saw the flag clear,
>>> - on AMD guest user mode saw the flag set even when the guest kernel
>>>   didn't enable use of XSAVE/XRSTOR.
>>> Fold in the guest view of CR4.OSXSAVE when setting the levelling MSRs,
>>> just like we do in other CPUID handling.
>>>
>>> To make guest CR4 changes immediately visible via CPUID, also invoke
>>> ctxt_switch_levelling() from the CR4 write path.

I was just putting together a patch series to make these changes in a
more consistent manor, and have found a spanner in the works.

Hiding Xen's view of OSXSAVE from a guests native cpuid will break
Linux, because of the pile of hacks making up the current PV XSAVE support.

Some PV guests needs to see OSXSAVE in native CPUID before they will
consider trying to use xsetbv.  I realise I have completely broken this
on Intel following the mistake in determining how the MSR masks
functioned, but seeing the value from CR4 of next will be no better.

Our choices are:
1) Always expose Xen's OSXSAVE, even to guest userspace
2) Context switch the MSRs even on guest userspace/kernel context switch.

The latter would allow us to expose Xen's OSXSAVE to guest kernels, but
expose guest kernels' OSXSAVE to guest userspace.  However, given the
number of ways for a guest kernel to context switch back to guest
userspace without Xen's interaction, we can't reliably provide an
architectural view to guest userspace.

Option 1 is the easiest patch, and least overhead.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] x86/PV: don't wrongly hide/expose CPUID.OSXSAVE from/to user mode

2016-08-19 Thread Andrew Cooper
On 19/08/16 18:09, Andrew Cooper wrote:
> On 19/08/16 13:53, Jan Beulich wrote:
>> User mode code generally cannot be expected to invoke the PV-enabled
>> CPUID Xen supports, and prior to the CPUID levelling changes for 4.7
>> (as well as even nowadays on levelling incapable hardware) such CPUID
>> invocations actually saw the host CR4.OSXSAVE value, whereas prior to
>> this patch
>> - on Intel guest user mode always saw the flag clear,
>> - on AMD guest user mode saw the flag set even when the guest kernel
>>   didn't enable use of XSAVE/XRSTOR.
>> Fold in the guest view of CR4.OSXSAVE when setting the levelling MSRs,
>> just like we do in other CPUID handling.
>>
>> To make guest CR4 changes immediately visible via CPUID, also invoke
>> ctxt_switch_levelling() from the CR4 write path.
>>
>> Signed-off-by: Jan Beulich 
> I have just rerun a more thorough test, and I clearly got some incorrect
> conclusions to start with.
>
> (XEN) '1' pressed -> Extreme debugging in progress...
> (XEN) Testing OSXSAVE
> (XEN) ** CR4[-], MSR[-], cpuid 0
> (XEN) ** CR4[+], MSR[-], cpuid 0
> (XEN) ** CR4[-], MSR[+], cpuid 0
> (XEN) ** CR4[+], MSR[+], cpuid 1
> (XEN) '2' pressed -> Extreme debugging in progress...
> (XEN) Testing APIC
> (XEN) ** APIC[-], MSR[-], cpuid 0
> (XEN) ** APIC[+], MSR[-], cpuid 0
> (XEN) ** APIC[-], MSR[+], cpuid 0
> (XEN) ** APIC[+], MSR[+], cpuid 1
> (XEN) Watchdog timer detects that CPU21 is stuck!
> ... (an IPI hitting this core while the APIC is hard disabled appears to
> get ignored, and other cores get unhappy)
>
> So on this Sandy Bridge box does match your observation of behaviour,
> and that masks are applied after fast-forwarded state.  I am rerunning
> on other hardware to see how they behave.

From Nehalem:

(XEN) '1' pressed -> Extreme debugging in progress...
(XEN) Testing OSXSAVE
(XEN) feature xsave missing - skipping OSXSAVE check
(XEN) '2' pressed -> Extreme debugging in progress...
(XEN) Testing APIC
(XEN) ** APIC[-], MSR[-], cpuid 0
(XEN) ** APIC[+], MSR[-], cpuid 0
(XEN) ** APIC[-], MSR[+], cpuid 0
(XEN) ** APIC[+], MSR[+], cpuid 1
(XEN) Watchdog timer detects that CPU6 is stuck!


From AMD Excavator:
(XEN) '1' pressed -> Extreme debugging in progress...
(XEN) Testing OSXSAVE
(XEN) ** CR4[-], MSR[-], cpuid 0
(XEN) ** CR4[+], MSR[-], cpuid 0
(XEN) ** CR4[-], MSR[+], cpuid 0
(XEN) ** CR4[+], MSR[+], cpuid 1
(XEN) '2' pressed -> Extreme debugging in progress...
(XEN) Testing APIC
(XEN) ** APIC[-], MSR[-], cpuid 0
(XEN) ** APIC[+], MSR[-], cpuid 0
(XEN) ** APIC[-], MSR[+], cpuid 0
(XEN) ** APIC[+], MSR[+], cpuid 1
(Curiously, no problems at all with short times of APIC hard disable)

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] x86/PV: don't wrongly hide/expose CPUID.OSXSAVE from/to user mode

2016-08-19 Thread Andrew Cooper
On 19/08/16 13:53, Jan Beulich wrote:
> User mode code generally cannot be expected to invoke the PV-enabled
> CPUID Xen supports, and prior to the CPUID levelling changes for 4.7
> (as well as even nowadays on levelling incapable hardware) such CPUID
> invocations actually saw the host CR4.OSXSAVE value, whereas prior to
> this patch
> - on Intel guest user mode always saw the flag clear,
> - on AMD guest user mode saw the flag set even when the guest kernel
>   didn't enable use of XSAVE/XRSTOR.
> Fold in the guest view of CR4.OSXSAVE when setting the levelling MSRs,
> just like we do in other CPUID handling.
>
> To make guest CR4 changes immediately visible via CPUID, also invoke
> ctxt_switch_levelling() from the CR4 write path.
>
> Signed-off-by: Jan Beulich 

I have just rerun a more thorough test, and I clearly got some incorrect
conclusions to start with.

(XEN) '1' pressed -> Extreme debugging in progress...
(XEN) Testing OSXSAVE
(XEN) ** CR4[-], MSR[-], cpuid 0
(XEN) ** CR4[+], MSR[-], cpuid 0
(XEN) ** CR4[-], MSR[+], cpuid 0
(XEN) ** CR4[+], MSR[+], cpuid 1
(XEN) '2' pressed -> Extreme debugging in progress...
(XEN) Testing APIC
(XEN) ** APIC[-], MSR[-], cpuid 0
(XEN) ** APIC[+], MSR[-], cpuid 0
(XEN) ** APIC[-], MSR[+], cpuid 0
(XEN) ** APIC[+], MSR[+], cpuid 1
(XEN) Watchdog timer detects that CPU21 is stuck!
... (an IPI hitting this core while the APIC is hard disabled appears to
get ignored, and other cores get unhappy)

So on this Sandy Bridge box does match your observation of behaviour,
and that masks are applied after fast-forwarded state.  I am rerunning
on other hardware to see how they behave.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] x86/PV: don't wrongly hide/expose CPUID.OSXSAVE from/to user mode

2016-08-19 Thread Jan Beulich
User mode code generally cannot be expected to invoke the PV-enabled
CPUID Xen supports, and prior to the CPUID levelling changes for 4.7
(as well as even nowadays on levelling incapable hardware) such CPUID
invocations actually saw the host CR4.OSXSAVE value, whereas prior to
this patch
- on Intel guest user mode always saw the flag clear,
- on AMD guest user mode saw the flag set even when the guest kernel
  didn't enable use of XSAVE/XRSTOR.
Fold in the guest view of CR4.OSXSAVE when setting the levelling MSRs,
just like we do in other CPUID handling.

To make guest CR4 changes immediately visible via CPUID, also invoke
ctxt_switch_levelling() from the CR4 write path.

Signed-off-by: Jan Beulich 
---
v2: Invert operation on AMD (from OR-ing in to AND-ing out), adjust
title, and extend description.

--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -206,17 +206,30 @@ static void __init noinline probe_maskin
 static void amd_ctxt_switch_levelling(const struct domain *nextd)
 {
struct cpuidmasks *these_masks = _cpu(cpuidmasks);
-   const struct cpuidmasks *masks =
-   (nextd && is_pv_domain(nextd) && 
nextd->arch.pv_domain.cpuidmasks)
-   ? nextd->arch.pv_domain.cpuidmasks : _defaults;
+   const struct cpuidmasks *masks = NULL;
+   unsigned long cr4;
+   uint64_t val__1cd = 0, val_e1cd = 0, val__7ab0 = 0, val__6c = 0;
+
+   if (nextd && is_pv_domain(nextd) && !is_idle_domain(nextd)) {
+   cr4 = current->arch.pv_vcpu.ctrlreg[4];
+   masks = nextd->arch.pv_domain.cpuidmasks;
+   } else
+   cr4 = read_cr4();
+
+   if (!(cr4 & X86_CR4_OSXSAVE))
+   val__1cd |= (uint64_t)cpufeat_mask(X86_FEATURE_OSXSAVE) << 32;
+
+   if (!masks)
+   masks = _defaults;
 
 #define LAZY(cap, msr, field)  \
({  \
-   if (unlikely(these_masks->field != masks->field) && \
+   val_##field = ~val_##field & masks->field;  
\
+   if (unlikely(these_masks->field != val_##field) &&  \
((levelling_caps & cap) == cap))\
{   \
-   wrmsr_amd(msr, masks->field);   \
-   these_masks->field = masks->field;  \
+   wrmsr_amd(msr, val_##field);\
+   these_masks->field = val_##field;   \
}   \
})
 
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -154,7 +154,9 @@ static void __init probe_masking_msrs(vo
 static void intel_ctxt_switch_levelling(const struct domain *nextd)
 {
struct cpuidmasks *these_masks = _cpu(cpuidmasks);
-   const struct cpuidmasks *masks;
+   const struct cpuidmasks *masks = NULL;
+   unsigned long cr4;
+   uint64_t val__1cd = 0, val_e1cd = 0, val_Da1 = 0;
 
if (cpu_has_cpuid_faulting) {
/*
@@ -178,16 +180,27 @@ static void intel_ctxt_switch_levelling(
return;
}
 
-   masks = (nextd && is_pv_domain(nextd) && 
nextd->arch.pv_domain.cpuidmasks)
-   ? nextd->arch.pv_domain.cpuidmasks : _defaults;
+   if (nextd && is_pv_domain(nextd) && !is_idle_domain(nextd)) {
+   cr4 = current->arch.pv_vcpu.ctrlreg[4];
+   masks = nextd->arch.pv_domain.cpuidmasks;
+   } else
+   cr4 = read_cr4();
+
+   /* OSXSAVE cleared by pv_featureset.  Fast-forward CR4 back in. */
+   if (cr4 & X86_CR4_OSXSAVE)
+   val__1cd |= cpufeat_mask(X86_FEATURE_OSXSAVE);
+
+   if (!masks)
+   masks = _defaults;
 
 #define LAZY(msr, field)   \
({  \
-   if (unlikely(these_masks->field != masks->field) && \
+   val_##field |= masks->field;\
+   if (unlikely(these_masks->field != val_##field) &&  \
(msr))  \
{   \
-   wrmsrl((msr), masks->field);\
-   these_masks->field = masks->field;  \
+   wrmsrl((msr), val_##field); \
+   these_masks->field = val_##field;   \
}   \
})
 
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2733,6 +2733,7 @@ static int emulate_privileged_op(struct
 case 4: /* Write CR4 */