On 05.06.2023 09:48, Jan Beulich wrote: > On 02.06.2023 15:57, Andrew Cooper wrote: >> On 02/06/2023 10:56 am, Jan Beulich wrote: >>> On 01.06.2023 16:48, Andrew Cooper wrote: >>>> @@ -593,15 +596,85 @@ static bool __init retpoline_calculations(void) >>>> return false; >>>> >>>> /* >>>> - * RSBA may be set by a hypervisor to indicate that we may move to a >>>> - * processor which isn't retpoline-safe. >>>> + * The meaning of the RSBA and RRSBA bits have evolved over time. The >>>> + * agreed upon meaning at the time of writing (May 2023) is thus: >>>> + * >>>> + * - RSBA (RSB Alternative) means that an RSB may fall back to an >>>> + * alternative predictor on underflow. Skylake uarch and later all >>>> have >>>> + * this property. Broadwell too, when running microcode versions >>>> prior >>>> + * to Jan 2018. >>>> + * >>>> + * - All eIBRS-capable processors suffer RSBA, but eIBRS also >>>> introduces >>>> + * tagging of predictions with the mode in which they were learned. >>>> So >>>> + * when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA). >>>> + * >>>> + * - CPUs are not expected to enumerate both RSBA and RRSBA. >>>> + * >>>> + * Some parts (Broadwell) are not expected to ever enumerate this >>>> + * behaviour directly. Other parts have differing enumeration with >>>> + * microcode version. Fix up Xen's idea, so we can advertise them >>>> safely >>>> + * to guests, and so toolstacks can level a VM safety for migration. >>>> + * >>>> + * The following states exist: >>>> + * >>>> + * | | RSBA | EIBRS | RRSBA | Notes | Action | >>>> + * |---+------+-------+-------+--------------------+---------------| >>>> + * | 1 | 0 | 0 | 0 | OK (older parts) | Maybe +RSBA | >>>> + * | 2 | 0 | 0 | 1 | Broken | +RSBA, -RRSBA | >>>> + * | 3 | 0 | 1 | 0 | OK (pre-Aug ucode) | +RRSBA | >>>> + * | 4 | 0 | 1 | 1 | OK | | >>>> + * | 5 | 1 | 0 | 0 | OK | | >>>> + * | 6 | 1 | 0 | 1 | Broken | -RRSBA | >>>> + * | 7 | 1 | 1 | 0 | Broken | -RSBA, +RRSBA | >>>> + * | 8 | 1 | 1 | 1 | Broken | -RSBA | >>>> * >>>> + * However, we doesn't need perfect adherence to the spec. Identify >>>> the >>>> + * broken cases (so we stand a chance of spotting violated >>>> assumptions), >>>> + * and fix up Rows 1 and 3 so Xen can use RSBA || RRSBA to identify >>>> + * "alternative predictors potentially in use". >>> Considering that it's rows 2, 6, 7, and 8 which are broken, I find this >>> comment a little misleading. To me it doesn't become clear whether them >>> subsequently being left alone (and merely a log message issued) is >>> intentional. >> >> It is intentional. >> >> I don't know if these combinations exist in practice anywhere or not. >> Intel think they oughtn't to, and it's quite possible that the printk() >> is unreachable, but given the complexity and shifting meanings over time >> here, I think it would be unwise to simply assume this to be true. > > I agree.
Thinking of it - would we perhaps want to go a step further an taint the system in such a case? I would then view this as kind of "Xen not (security) supported on this hardware." Until we manage to fix (or work around) the issue. Jan