On 05.06.2023 09:48, Jan Beulich wrote:
> On 02.06.2023 15:57, Andrew Cooper wrote:
>> On 02/06/2023 10:56 am, Jan Beulich wrote:
>>> On 01.06.2023 16:48, Andrew Cooper wrote:
>>>> @@ -593,15 +596,85 @@ static bool __init retpoline_calculations(void)
>>>>          return false;
>>>>  
>>>>      /*
>>>> -     * RSBA may be set by a hypervisor to indicate that we may move to a
>>>> -     * processor which isn't retpoline-safe.
>>>> +     * The meaning of the RSBA and RRSBA bits have evolved over time.  The
>>>> +     * agreed upon meaning at the time of writing (May 2023) is thus:
>>>> +     *
>>>> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
>>>> +     *   alternative predictor on underflow.  Skylake uarch and later all 
>>>> have
>>>> +     *   this property.  Broadwell too, when running microcode versions 
>>>> prior
>>>> +     *   to Jan 2018.
>>>> +     *
>>>> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also 
>>>> introduces
>>>> +     *   tagging of predictions with the mode in which they were learned. 
>>>>  So
>>>> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
>>>> +     *
>>>> +     * - CPUs are not expected to enumerate both RSBA and RRSBA.
>>>> +     *
>>>> +     * Some parts (Broadwell) are not expected to ever enumerate this
>>>> +     * behaviour directly.  Other parts have differing enumeration with
>>>> +     * microcode version.  Fix up Xen's idea, so we can advertise them 
>>>> safely
>>>> +     * to guests, and so toolstacks can level a VM safety for migration.
>>>> +     *
>>>> +     * The following states exist:
>>>> +     *
>>>> +     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
>>>> +     * |---+------+-------+-------+--------------------+---------------|
>>>> +     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
>>>> +     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
>>>> +     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
>>>> +     * | 4 |    0 |     1 |     1 | OK                 |               |
>>>> +     * | 5 |    1 |     0 |     0 | OK                 |               |
>>>> +     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
>>>> +     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
>>>> +     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |
>>>>       *
>>>> +     * However, we doesn't need perfect adherence to the spec.  Identify 
>>>> the
>>>> +     * broken cases (so we stand a chance of spotting violated 
>>>> assumptions),
>>>> +     * and fix up Rows 1 and 3 so Xen can use RSBA || RRSBA to identify
>>>> +     * "alternative predictors potentially in use".
>>> Considering that it's rows 2, 6, 7, and 8 which are broken, I find this
>>> comment a little misleading. To me it doesn't become clear whether them
>>> subsequently being left alone (and merely a log message issued) is
>>> intentional.
>>
>> It is intentional.
>>
>> I don't know if these combinations exist in practice anywhere or not. 
>> Intel think they oughtn't to, and it's quite possible that the printk()
>> is unreachable, but given the complexity and shifting meanings over time
>> here, I think it would be unwise to simply assume this to be true.
> 
> I agree.

Thinking of it - would we perhaps want to go a step further an taint the
system in such a case? I would then view this as kind of "Xen not
(security) supported on this hardware." Until we manage to fix (or work
around) the issue.

Jan

Reply via email to