Re: [Xen-devel] [PATCH 8/8] xen/x86: Additional SMAP modes to work around buggy 32bit PV guests

2015-06-25 Thread Jan Beulich
>>> On 24.06.15 at 18:31,  wrote:
> @@ -784,6 +791,11 @@ unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu 
> *v)
>  if ( v->domain->arch.vtsc )
>  cr4 |= X86_CR4_TSD;
>  
> +/* Disable SMAP behind unaware 32bit PV guests. */
> +if ( (smap_mode == smap_mode_compat) && is_pv_32bit_vcpu(v) &&
> + ((v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_SMAP) == 0) )
> +cr4 &= ~X86_CR4_SMAP;

There's actually another problem here: The function you modify
is used by paravirt_ctxt_switch_to(). Obviously you will want to
make sure to re-enable SMAP when switching away from the
guest, and not only at the point you finally switch to a suitable
other non-idle vCPU.

And then there's the question whether the above fixup wouldn't
better be deferred until compat_restore_all_guest, minimizing
the "damage" to Xen?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 8/8] xen/x86: Additional SMAP modes to work around buggy 32bit PV guests

2015-06-25 Thread Andrew Cooper
On 25/06/15 16:12, Jan Beulich wrote:
 On 24.06.15 at 18:31,  wrote:
>> --- a/docs/misc/xen-command-line.markdown
>> +++ b/docs/misc/xen-command-line.markdown
>> @@ -1261,11 +1261,32 @@ Set the serial transmit buffer size.
>>  Flag to enable Supervisor Mode Execution Protection
>>  
>>  ### smap
>> -> `= `
>> +> `=  | compat | fixup`
>>  
>>  > Default: `true`
> Knowing that it breaks certain guests, I think the default can't be
> true, but instead needs to become compat. People wanting more
> security at the expense of breaking guests can then pick either
> of true or fixup.

Ok.

> Jan
>


>
>> @@ -730,6 +731,12 @@ static int __init init_pv_cr4_masks(void)
>>  if ( cpu_has_fsgsbase )
>>  pv_cr4_mask &= ~X86_CR4_FSGSBASE;
>>  
>> +/*
>> + * 32bit PV guests may attempt to modify SMAP.
>> + */
>> +if ( cpu_has_smap )
>> +compat_pv_cr4_mask &= ~X86_CR4_SMAP;
> Shouldn't this then be accompanied by exposing the CPUID flag to
> 32-bit guests?

I have become undecided on whether actually exposing SMAP is sensible or
not.  Not exposing it does make these fixes more simple.

>
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -1437,21 +1437,38 @@ static int fixup_page_fault(unsigned long addr, 
>> struct cpu_user_regs *regs)
>>  return 0;
>>  }
>>  
>> -if ( guest_kernel_mode(v, regs) &&
>> - !(regs->error_code & (PFEC_reserved_bit | PFEC_insn_fetch)) &&
>> - (regs->error_code & PFEC_write_access) )
>> -{
>> -if ( VM_ASSIST(d, writable_pagetables) &&
>> - /* Do not check if access-protection fault since the page may
>> -legitimately be not present in shadow page tables */
>> - (paging_mode_enabled(d) ||
>> -  (regs->error_code & PFEC_page_present)) &&
>> - ptwr_do_page_fault(v, addr, regs) )
>> -return EXCRET_fault_fixed;
>> +if ( guest_kernel_mode(v, regs) )
>> +{
>> +if (!(regs->error_code & (PFEC_reserved_bit | PFEC_insn_fetch)) &&
>> +(regs->error_code & PFEC_write_access) )
>> +{
>> +if ( VM_ASSIST(d, writable_pagetables) &&
>> + /* Do not check if access-protection fault since the page 
>> may
>> +legitimately be not present in shadow page tables */
>> + (paging_mode_enabled(d) ||
>> +  (regs->error_code & PFEC_page_present)) &&
>> + ptwr_do_page_fault(v, addr, regs) )
>> +return EXCRET_fault_fixed;
>>  
>> -if ( is_hardware_domain(d) && (regs->error_code & 
>> PFEC_page_present) &&
>> - mmio_ro_do_page_fault(v, addr, regs) )
>> +if ( is_hardware_domain(d) && (regs->error_code & 
>> PFEC_page_present) &&
>> + mmio_ro_do_page_fault(v, addr, regs) )
>> +return EXCRET_fault_fixed;
>> +}
>> +
>> +/*
>> + * SMAP violation behind an unaware 32bit PV guest kernel? Set
>> + * EFLAGS.AC behind its back and try again.
>> + */
>> +if ( (smap_mode == smap_mode_fixup) && is_pv_32bit_domain(d) &&
>> + ((regs->error_code &
>> +   (PFEC_insn_fetch | PFEC_reserved_bit |
>> +PFEC_user_mode | PFEC_page_present)) == PFEC_page_present) 
>> &&
>> + ((v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_SMAP) == 0) &&
>> + ((regs->eflags & X86_EFLAGS_AC) == 0) )
> At least for single bit checks like these, could I talk you into using
> !(x & mask), as is being done elsewhere in the code you modify
> above?

Ok.

>
> Also here as well as in the code enforcing compat mode, I'm not
> sure it's correct to tie this to the current guest setting of CR4.SMAP.
> Instead I think you should latch any attempt by the guest to set
> the bit into a per-domain flag. After all it may itself have reasons to
> play strange things with the bit.

In both cases, if the guest sets CR4.SMAP itself, Xen needs to stop any
fixup behind the guests back.  If the guest subsequently clears SMAP for
a bit, it will expect to stop getting violations, but Xen still wants to
fix things up properly.

>
> Or alternatively the terminology in the comment and command line
> option documentation should be changed from "unaware" to "not
> currently having enabled".

This is better.  I will update the description.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 8/8] xen/x86: Additional SMAP modes to work around buggy 32bit PV guests

2015-06-25 Thread Jan Beulich
>>> On 24.06.15 at 18:31,  wrote:
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1261,11 +1261,32 @@ Set the serial transmit buffer size.
>  Flag to enable Supervisor Mode Execution Protection
>  
>  ### smap
> -> `= `
> +> `=  | compat | fixup`
>  
>  > Default: `true`

Knowing that it breaks certain guests, I think the default can't be
true, but instead needs to become compat. People wanting more
security at the expense of breaking guests can then pick either
of true or fixup.

> @@ -730,6 +731,12 @@ static int __init init_pv_cr4_masks(void)
>  if ( cpu_has_fsgsbase )
>  pv_cr4_mask &= ~X86_CR4_FSGSBASE;
>  
> +/*
> + * 32bit PV guests may attempt to modify SMAP.
> + */
> +if ( cpu_has_smap )
> +compat_pv_cr4_mask &= ~X86_CR4_SMAP;

Shouldn't this then be accompanied by exposing the CPUID flag to
32-bit guests?

> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -1437,21 +1437,38 @@ static int fixup_page_fault(unsigned long addr, 
> struct cpu_user_regs *regs)
>  return 0;
>  }
>  
> -if ( guest_kernel_mode(v, regs) &&
> - !(regs->error_code & (PFEC_reserved_bit | PFEC_insn_fetch)) &&
> - (regs->error_code & PFEC_write_access) )
> -{
> -if ( VM_ASSIST(d, writable_pagetables) &&
> - /* Do not check if access-protection fault since the page may
> -legitimately be not present in shadow page tables */
> - (paging_mode_enabled(d) ||
> -  (regs->error_code & PFEC_page_present)) &&
> - ptwr_do_page_fault(v, addr, regs) )
> -return EXCRET_fault_fixed;
> +if ( guest_kernel_mode(v, regs) )
> +{
> +if (!(regs->error_code & (PFEC_reserved_bit | PFEC_insn_fetch)) &&
> +(regs->error_code & PFEC_write_access) )
> +{
> +if ( VM_ASSIST(d, writable_pagetables) &&
> + /* Do not check if access-protection fault since the page 
> may
> +legitimately be not present in shadow page tables */
> + (paging_mode_enabled(d) ||
> +  (regs->error_code & PFEC_page_present)) &&
> + ptwr_do_page_fault(v, addr, regs) )
> +return EXCRET_fault_fixed;
>  
> -if ( is_hardware_domain(d) && (regs->error_code & PFEC_page_present) 
> &&
> - mmio_ro_do_page_fault(v, addr, regs) )
> +if ( is_hardware_domain(d) && (regs->error_code & 
> PFEC_page_present) &&
> + mmio_ro_do_page_fault(v, addr, regs) )
> +return EXCRET_fault_fixed;
> +}
> +
> +/*
> + * SMAP violation behind an unaware 32bit PV guest kernel? Set
> + * EFLAGS.AC behind its back and try again.
> + */
> +if ( (smap_mode == smap_mode_fixup) && is_pv_32bit_domain(d) &&
> + ((regs->error_code &
> +   (PFEC_insn_fetch | PFEC_reserved_bit |
> +PFEC_user_mode | PFEC_page_present)) == PFEC_page_present) &&
> + ((v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_SMAP) == 0) &&
> + ((regs->eflags & X86_EFLAGS_AC) == 0) )

At least for single bit checks like these, could I talk you into using
!(x & mask), as is being done elsewhere in the code you modify
above?

Also here as well as in the code enforcing compat mode, I'm not
sure it's correct to tie this to the current guest setting of CR4.SMAP.
Instead I think you should latch any attempt by the guest to set
the bit into a per-domain flag. After all it may itself have reasons to
play strange things with the bit.

Or alternatively the terminology in the comment and command line
option documentation should be changed from "unaware" to "not
currently having enabled".

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 8/8] xen/x86: Additional SMAP modes to work around buggy 32bit PV guests

2015-06-25 Thread Andrew Cooper
On 25/06/15 12:18, David Vrabel wrote:
> On 24/06/15 17:31, Andrew Cooper wrote:
>> Experimentally, older Linux guests perform construction of `init` with user
>> pagetable mappings.  This is fine for native systems as such a guest would 
>> not
>> set CR4.SMAP itself.
>>
>> However if Xen uses SMAP itself, 32bit PV guests (whose kernels run in ring1)
>> are also affected.  Older Linux guests end up spinning in a loop assuming 
>> that
>> the SMAP violation pagefaults are spurious, and make no further progress.
>>
>> One option is to disable SMAP completely, but this is unreasonable.  A better
>> alternative is to disable SMAP only in the context of 32bit PV guests, but
>> reduces the effectiveness SMAP security.  A 3rd option is for Xen to fix up
>> behind a 32bit guest if it were SMAP-aware.  It is a heuristic, and does
>> result in a guest-visible state change, but allows Xen to keep CR4.SMAP
>> unconditionally enabled.
> [...]
>> --- a/docs/misc/xen-command-line.markdown
>> +++ b/docs/misc/xen-command-line.markdown
>> @@ -1261,11 +1261,32 @@ Set the serial transmit buffer size.
>>  Flag to enable Supervisor Mode Execution Protection
>>  
>>  ### smap
>> -> `= `
>> +> `=  | compat | fixup`
>>  
>>  > Default: `true`
>>  
>> -Flag to enable Supervisor Mode Access Prevention
>> +Handling of Supervisor Mode Access Prevention.
>> +
>> +32bit PV guest kernels qualify as supervisor code, as they execute in ring 
>> 1.
>> +If Xen uses SMAP protection itself, a PV guest which is not SMAP aware may
>> +suffer unexpected pagefaults which it cannot handle. (Experimentally, there
>> +are 32bit PV guests which fall foul of SMAP enforcement and spin in an
>> +infinite loop taking pagefaults early on boot.)
>> +
>> +Two further SMAP modes are introduced to work around buggy 32bit PV guests 
>> to
>> +prevent functional regressions of VMs on newer hardware.  At any point if 
>> the
>> +guest sets `CR4.SMAP` itself, it is deemed aware, and **compat/fixup** cease
>> +to apply.
> Guests that is not aware of SMAP or do not support it are not "buggy".

Taking and not understanding a SMAP #PF is understandable.  The way it
spins in an infinite loop is unquestionably buggy.

>
>> +
>> +A SMAP mode of **compat** causes Xen to disable `CR4.SMAP` in the context of
>> +an unaware 32bit PV guest.  This prevents the guest from being subject to 
>> SMAP
>> +enforcement, but also prevents Xen from benefiting from the added security
>> +checks.
>> +
>> +A SMAP mode of **fixup** causes Xen to set `EFLAGS.AC` when discovering a 
>> SMAP
>> +pagefault in the context of an unaware 32bit PV guest.  This allows Xen to
>> +retain the added security from SMAP checks, but results in a guest-visible
>> +state change which it might object to.
> What does the PV ABI say about the use of EFLAGS.AC?  Have guests
> historically been allowed to use this bit?  If so, does Xen fiddling
> with it potentially break some guests?

If there were an ABI written down anywhere, I might be able to answer
that question.

32bit PV guest kernels cannot make use of AC themselves; alignment
checking is only available in cpl3.  AC is however able to be changed by
a popf instruction even in cpl3 (which make it very curious as to why
stac/clac are strictly cpl0 instructions).

Fundamentally, smap=fixup might indeed break a PV guest, but testing
shows that RHEL/CentOS 5/6, SLES 11/12 and Debian 6/7 PV guests are all
fine with it.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 8/8] xen/x86: Additional SMAP modes to work around buggy 32bit PV guests

2015-06-25 Thread David Vrabel
On 24/06/15 17:31, Andrew Cooper wrote:
> Experimentally, older Linux guests perform construction of `init` with user
> pagetable mappings.  This is fine for native systems as such a guest would not
> set CR4.SMAP itself.
> 
> However if Xen uses SMAP itself, 32bit PV guests (whose kernels run in ring1)
> are also affected.  Older Linux guests end up spinning in a loop assuming that
> the SMAP violation pagefaults are spurious, and make no further progress.
> 
> One option is to disable SMAP completely, but this is unreasonable.  A better
> alternative is to disable SMAP only in the context of 32bit PV guests, but
> reduces the effectiveness SMAP security.  A 3rd option is for Xen to fix up
> behind a 32bit guest if it were SMAP-aware.  It is a heuristic, and does
> result in a guest-visible state change, but allows Xen to keep CR4.SMAP
> unconditionally enabled.
[...]
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1261,11 +1261,32 @@ Set the serial transmit buffer size.
>  Flag to enable Supervisor Mode Execution Protection
>  
>  ### smap
> -> `= `
> +> `=  | compat | fixup`
>  
>  > Default: `true`
>  
> -Flag to enable Supervisor Mode Access Prevention
> +Handling of Supervisor Mode Access Prevention.
> +
> +32bit PV guest kernels qualify as supervisor code, as they execute in ring 1.
> +If Xen uses SMAP protection itself, a PV guest which is not SMAP aware may
> +suffer unexpected pagefaults which it cannot handle. (Experimentally, there
> +are 32bit PV guests which fall foul of SMAP enforcement and spin in an
> +infinite loop taking pagefaults early on boot.)
> +
> +Two further SMAP modes are introduced to work around buggy 32bit PV guests to
> +prevent functional regressions of VMs on newer hardware.  At any point if the
> +guest sets `CR4.SMAP` itself, it is deemed aware, and **compat/fixup** cease
> +to apply.

Guests that is not aware of SMAP or do not support it are not "buggy".

> +
> +A SMAP mode of **compat** causes Xen to disable `CR4.SMAP` in the context of
> +an unaware 32bit PV guest.  This prevents the guest from being subject to 
> SMAP
> +enforcement, but also prevents Xen from benefiting from the added security
> +checks.
> +
> +A SMAP mode of **fixup** causes Xen to set `EFLAGS.AC` when discovering a 
> SMAP
> +pagefault in the context of an unaware 32bit PV guest.  This allows Xen to
> +retain the added security from SMAP checks, but results in a guest-visible
> +state change which it might object to.

What does the PV ABI say about the use of EFLAGS.AC?  Have guests
historically been allowed to use this bit?  If so, does Xen fiddling
with it potentially break some guests?

David


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 8/8] xen/x86: Additional SMAP modes to work around buggy 32bit PV guests

2015-06-24 Thread Andrew Cooper
Experimentally, older Linux guests perform construction of `init` with user
pagetable mappings.  This is fine for native systems as such a guest would not
set CR4.SMAP itself.

However if Xen uses SMAP itself, 32bit PV guests (whose kernels run in ring1)
are also affected.  Older Linux guests end up spinning in a loop assuming that
the SMAP violation pagefaults are spurious, and make no further progress.

One option is to disable SMAP completely, but this is unreasonable.  A better
alternative is to disable SMAP only in the context of 32bit PV guests, but
reduces the effectiveness SMAP security.  A 3rd option is for Xen to fix up
behind a 32bit guest if it were SMAP-aware.  It is a heuristic, and does
result in a guest-visible state change, but allows Xen to keep CR4.SMAP
unconditionally enabled.

Signed-off-by: Andrew Cooper 
CC: Jan Beulich 
---
 docs/misc/xen-command-line.markdown |   25 ++--
 xen/arch/x86/domain.c   |   16 +++--
 xen/arch/x86/setup.c|   31 +
 xen/arch/x86/traps.c|   43 ---
 xen/include/asm-x86/processor.h |   10 
 5 files changed, 103 insertions(+), 22 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index aa684c0..a63d2b2 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1261,11 +1261,32 @@ Set the serial transmit buffer size.
 Flag to enable Supervisor Mode Execution Protection
 
 ### smap
-> `= `
+> `=  | compat | fixup`
 
 > Default: `true`
 
-Flag to enable Supervisor Mode Access Prevention
+Handling of Supervisor Mode Access Prevention.
+
+32bit PV guest kernels qualify as supervisor code, as they execute in ring 1.
+If Xen uses SMAP protection itself, a PV guest which is not SMAP aware may
+suffer unexpected pagefaults which it cannot handle. (Experimentally, there
+are 32bit PV guests which fall foul of SMAP enforcement and spin in an
+infinite loop taking pagefaults early on boot.)
+
+Two further SMAP modes are introduced to work around buggy 32bit PV guests to
+prevent functional regressions of VMs on newer hardware.  At any point if the
+guest sets `CR4.SMAP` itself, it is deemed aware, and **compat/fixup** cease
+to apply.
+
+A SMAP mode of **compat** causes Xen to disable `CR4.SMAP` in the context of
+an unaware 32bit PV guest.  This prevents the guest from being subject to SMAP
+enforcement, but also prevents Xen from benefiting from the added security
+checks.
+
+A SMAP mode of **fixup** causes Xen to set `EFLAGS.AC` when discovering a SMAP
+pagefault in the context of an unaware 32bit PV guest.  This allows Xen to
+retain the added security from SMAP checks, but results in a guest-visible
+state change which it might object to.
 
 ### snb\_igd\_quirk
 > `=  | cap | `
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 18c3c62..045a2b9 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -687,9 +687,10 @@ void arch_domain_unpause(struct domain *d)
  *  - TSD for vtsc
  *  - DE for %%dr4/5 emulation
  *  - OSXSAVE for xsetbv emulation
+ *  - SMAP for smap_mode_{compat,fixup} handling
  */
 #define PV_CR4_SHADOW   \
-(X86_CR4_TSD | X86_CR4_DE | X86_CR4_OSXSAVE)
+(X86_CR4_TSD | X86_CR4_DE | X86_CR4_OSXSAVE | X86_CR4_SMAP)
 
 /* CR4 bits a guest controls. */
 #define PV_CR4_GUEST (X86_CR4_TSD)
@@ -700,7 +701,7 @@ void arch_domain_unpause(struct domain *d)
  */
 #define PV_CR4_READ \
 (X86_CR4_PAE | X86_CR4_PGE |X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT |   \
- X86_CR4_FSGSBASE | X86_CR4_SMEP | X86_CR4_SMAP)
+ X86_CR4_FSGSBASE | X86_CR4_SMEP)
 
 /*
  * These are the masks of CR4 bits (subject to hardware availability) which a
@@ -730,6 +731,12 @@ static int __init init_pv_cr4_masks(void)
 if ( cpu_has_fsgsbase )
 pv_cr4_mask &= ~X86_CR4_FSGSBASE;
 
+/*
+ * 32bit PV guests may attempt to modify SMAP.
+ */
+if ( cpu_has_smap )
+compat_pv_cr4_mask &= ~X86_CR4_SMAP;
+
 BUILD_BUG_ON(PV_CR4_SHADOW & PV_CR4_READ);
 
 return 0;
@@ -784,6 +791,11 @@ unsigned long pv_guest_cr4_to_real_cr4(const struct vcpu 
*v)
 if ( v->domain->arch.vtsc )
 cr4 |= X86_CR4_TSD;
 
+/* Disable SMAP behind unaware 32bit PV guests. */
+if ( (smap_mode == smap_mode_compat) && is_pv_32bit_vcpu(v) &&
+ ((v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_SMAP) == 0) )
+cr4 &= ~X86_CR4_SMAP;
+
 return cr4;
 }
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index ff34670..36fce42 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -63,10 +63,6 @@
 static bool_t __initdata disable_smep;
 invbool_param("smep", disable_smep);
 
-/* smap: Enable/disable Supervisor Mode Access Prevention (default on). */
-static bool_t __initdata disable_smap;
-invbool_param("smap", disable_smap);
-