On 05/03/2021 15:37, Jan Beulich wrote:
> Since we don't need to encode all of the PTE flags, we have enough bits
> in the shadow entry to store the full GFN. Don't use literal numbers -
> instead derive the involved values. Or, where derivation would become
> too ugly, sanity-check the result (invoking #error to identify failure).
>
> This then allows dropping from sh_l1e_mmio() again the guarding against
> too large GFNs.
>
> Signed-off-by: Jan Beulich <jbeul...@suse.com>
> ---
> I wonder if the respective check in sh_audit_l1_table() is actually
> useful to retain with these changes.
>
> --- a/xen/arch/x86/mm/shadow/types.h
> +++ b/xen/arch/x86/mm/shadow/types.h
> @@ -283,9 +283,17 @@ shadow_put_page_from_l1e(shadow_l1e_t sl
>   * This is only feasible for PAE and 64bit Xen: 32-bit non-PAE PTEs don't
>   * have reserved bits that we can use for this.  And even there it can only
>   * be used if we can be certain the processor doesn't use all 52 address bits.
> + *
> + * For the MMIO encoding (see below) we need the bottom 4 bits for
> + * identifying the kind of entry and a full GFN's worth of bits to encode
> + * the originating frame number.  Set all remaining bits to trigger
> + * reserved bit faults, if (see above) the hardware permits triggering such.
>   */
>  
> -#define SH_L1E_MAGIC 0xffffffff00000001ULL
> +#define SH_L1E_MAGIC_NR_META_BITS 4
> +#define SH_L1E_MAGIC_MASK ((~0ULL << (PADDR_BITS - PAGE_SHIFT + \
> +                                      SH_L1E_MAGIC_NR_META_BITS)) | \
> +                           _PAGE_PRESENT)
>  
>  static inline bool sh_have_pte_rsvd_bits(void)
>  {
> @@ -294,7 +302,8 @@ static inline bool sh_have_pte_rsvd_bits
>  
>  static inline bool sh_l1e_is_magic(shadow_l1e_t sl1e)
>  {
> -    return (sl1e.l1 & SH_L1E_MAGIC) == SH_L1E_MAGIC;
> +    BUILD_BUG_ON(!(PADDR_MASK & SH_L1E_MAGIC_MASK));
> +    return (sl1e.l1 & SH_L1E_MAGIC_MASK) == SH_L1E_MAGIC_MASK;
>  }
>  
>  /* Guest not present: a single magic value */
> @@ -320,20 +329,26 @@ static inline bool sh_l1e_is_gnp(shadow_
>  
>  /*
>   * MMIO: an invalid PTE that contains the GFN of the equivalent guest l1e.
> - * We store 28 bits of GFN in bits 4:32 of the entry.
> + * We store the GFN in bits 4:43 of the entry.
>   * The present bit is set, and the U/S and R/W bits are taken from the guest.
>   * Bit 3 is always 0, to differentiate from gnp above.
>   */
> -#define SH_L1E_MMIO_MAGIC       0xffffffff00000001ULL
> -#define SH_L1E_MMIO_MAGIC_MASK  0xffffffff00000009ULL
> -#define SH_L1E_MMIO_GFN_MASK    0x00000000fffffff0ULL
> +#define SH_L1E_MMIO_MAGIC       SH_L1E_MAGIC_MASK
> +#define SH_L1E_MMIO_MAGIC_BIT   ((_PAGE_PRESENT | _PAGE_RW | _PAGE_USER) + 1)
> +#if SH_L1E_MMIO_MAGIC_BIT & (SH_L1E_MMIO_MAGIC_BIT - 1)
> +# error SH_L1E_MMIO_MAGIC_BIT needs to be a power of 2
> +#endif
> +#if SH_L1E_MMIO_MAGIC_BIT >> SH_L1E_MAGIC_NR_META_BITS
> +# error SH_L1E_MMIO_MAGIC_BIT and SH_L1E_MAGIC_NR_META_BITS are out of sync
> +#endif
> +#define SH_L1E_MMIO_MAGIC_MASK  (SH_L1E_MAGIC_MASK | SH_L1E_MMIO_MAGIC_BIT)
> +#define SH_L1E_MMIO_GFN_MASK    ~(SH_L1E_MMIO_MAGIC_MASK | _PAGE_RW | _PAGE_USER)

In practice, the range is 4:36, because we have to limit shadow guests to 32
bits of gfn for XSA-173 (the width of the superpage backpointer, IIRC).

Also, this property is important for L1TF.  The more guest-controllable
bits we permit in here, the greater the chance of being vulnerable to
L1TF on massive machines.

(I'm a little concerned that I can't spot an L1TF comment which has been
made stale by these changes...  Probably the fault of whichever numpty
prepared the L1TF patches, because I'm certain we discussed this at the
time)

Going from 32 to 36 bits moves the upper safety barrier from TOP-4G to
TOP-64G, but I recall us agreeing that that was ok, especially as the main
safety guesstimate is "no RAM in the top quarter of the address space".
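
To spell that arithmetic out (my reconstruction, not something stated in
the patch):

    #include <assert.h>

    /* The PTE address field is bits 12-51; whatever the magic mask forces
     * to 1 there pins speculative addresses into the top of the space. */

    /* 28-bit GFN at bits 4-31 => address bits 32-51 forced set. */
    static_assert((1ULL << 32) == 4ULL << 30, "old barrier: TOP-4G");

    /* 32-bit GFN at bits 4-35 => address bits 36-51 forced set. */
    static_assert((1ULL << 36) == 64ULL << 30, "new barrier: TOP-64G");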

However, I don't think we want to accidentally creep beyond bit 36, so
I'd suggest that the easy fix here is just adjusting a nibble in the
MMIO masks.
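
Concretely, something like this (a sketch of the shape only, reusing the
patch's names; not the actual change):

    /* Extend the always-set region of the MMIO mask down to bit 36, capping
     * the encodable GFN at the 32 bits XSA-173 already limits us to.  The
     * derived SH_L1E_MMIO_GFN_MASK then shrinks to bits 4-35 automatically. */
    #define SH_L1E_MMIO_MAGIC_MASK  (SH_L1E_MAGIC_MASK |                  \
                                     SH_L1E_MMIO_MAGIC_BIT |              \
                                     (~0ULL << (32 + SH_L1E_MAGIC_NR_META_BITS)))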

~Andrew

