Re: Wake-up from suspend to RAM broken under `retbleed=stuff`

2023-01-11 Thread Andrew Cooper
On 11/01/2023 11:45 am, Jan Beulich wrote:
> On 11.01.2023 12:39, Andrew Cooper wrote:
>> The bigger issue with stuff accounting is that nothing AFAICT accounts
>> for the fact that any hypercall potentially empties the RSB in otherwise
>> synchronous program flow.
> But that's not just at hypercall boundaries, but effectively anywhere
> (i.e. whenever the hypervisor decides to de-schedule the vCPU)?

Correct, but only the RET instructions that reliably underflow the RSB
can be usefully attacked.

The %rip values at which Xen decides to de-schedule a vCPU are random
from the point of view of an attacker.

~Andrew
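
To make the mechanism concrete, the accounting being discussed can be
modelled in a few lines of userspace C. This is a toy sketch with
invented names, not the kernel implementation; in reality the counter
lives in percpu data and the "stuffing" is a chain of dummy CALLs
emitted by the generated call thunks:

/* Toy model of the call depth accounting behind retbleed=stuff. */
#include <stdio.h>

#define RSB_DEPTH 16            /* typical return stack buffer size */

static int depth;               /* software estimate of RSB occupancy */

static void account_call(void)
{
	if (depth < RSB_DEPTH)
		depth++;
}

static void account_ret(void)
{
	if (depth > 0) {
		depth--;
		return;
	}
	/*
	 * The accounting says the RSB is empty: a real RET here would
	 * fall back to the indirect branch predictor, which is the
	 * attackable case described above.  retbleed=stuff refills the
	 * RSB with benign entries before letting such a RET execute.
	 */
	puts("underflow imminent: stuff the RSB");
	depth = RSB_DEPTH;
}

int main(void)
{
	int i;

	for (i = 0; i < 3; i++)
		account_call();
	for (i = 0; i < 4; i++)	/* one more RET than CALLs */
		account_ret();
	return 0;
}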


Re: Wake-up from suspend to RAM broken under `retbleed=stuff`

2023-01-11 Thread Jan Beulich
On 11.01.2023 12:39, Andrew Cooper wrote:
> The bigger issue with stuff accounting is that nothing AFAICT accounts
> for the fact that any hypercall potentially empties the RSB in otherwise
> synchronous program flow.

But that's not just at hypercall boundaries, but effectively anywhere
(i.e. whenever the hypervisor decides to de-schedule the vCPU)?

Jan



Re: Wake-up from suspend to RAM broken under `retbleed=stuff`

2023-01-11 Thread Andrew Cooper
On 11/01/2023 11:20 am, Peter Zijlstra wrote:
> On Mon, Jan 09, 2023 at 04:05:31AM +, Joan Bruguera wrote:
>> This fixes wakeup for me on both QEMU and real HW
>> (just a proof of concept, don't merge)
>>
>> diff --git a/arch/x86/kernel/callthunks.c b/arch/x86/kernel/callthunks.c
>> index ffea98f9064b..8704bcc0ce32 100644
>> --- a/arch/x86/kernel/callthunks.c
>> +++ b/arch/x86/kernel/callthunks.c
>> @@ -7,6 +7,7 @@
>>  #include <linux/memory.h>
>>  #include <linux/moduleloader.h>
>>  #include <linux/static_call.h>
>> +#include <linux/suspend.h>
>>  
>>  #include <asm/alternative.h>
>>  #include <asm/asm-offsets.h>
>> @@ -150,6 +151,10 @@ static bool skip_addr(void *dest)
>>  	if (dest >= (void *)hypercall_page &&
>>  	    dest < (void*)hypercall_page + PAGE_SIZE)
>>  		return true;
>> +#endif
>> +#ifdef CONFIG_PM_SLEEP
>> +	if (dest == restore_processor_state)
>> +		return true;
>>  #endif
>>  	return false;
>>  }
>> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
>> index 236447ee9beb..e667894936f7 100644
>> --- a/arch/x86/power/cpu.c
>> +++ b/arch/x86/power/cpu.c
>> @@ -281,6 +281,9 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>>  /* Needed by apm.c */
>>  void notrace restore_processor_state(void)
>>  {
>> +	/* Restore GS before calling anything to avoid crash on call depth accounting */
>> +	native_wrmsrl(MSR_GS_BASE, saved_context.kernelmode_gs_base);
>> +
>>  	__restore_processor_state(&saved_context);
>>  }
> Yeah, I can see why, but I'm not really comfortable with this. TBH, I
> don't see how the whole resume code is correct to begin with. At the
> very least it needs a heavy dose of noinstr.
>
> Rafael, what cr3 is active when we call restore_processor_state()?
>
> Specifically, the problem is that I don't feel comfortable running any
> sort of weird code until all the CR and segment registers have been
> restored; however, write_cr*() are paravirt functions that result in a
> CALL, which then gives us a bit of a chicken and egg problem.
>
> I'm also wondering how well retbleed=stuff works on Xen, if at all. If
> we can ignore Xen, things are a little easier perhaps.
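
To make the chicken-and-egg problem concrete: under paravirt,
write_cr3() becomes a CALL through an ops table, and with
retbleed=stuff every CALL is preceded by a thunk that updates a
per-CPU depth counter addressed relative to GS_BASE. A minimal
userspace model (all names invented, not kernel code):

#include <stdio.h>

struct pv_mmu_ops {
	void (*write_cr3)(unsigned long cr3);
};

static long *pcpu_call_depth;	/* NULL models an unrestored GS_BASE */

static void call_depth_thunk(void)
{
	/*
	 * The real thunk updates a GS-relative percpu counter; if
	 * GS_BASE still holds a stale value on resume, an access like
	 * this is the crash being discussed.
	 */
	(*pcpu_call_depth)++;
}

static void native_write_cr3(unsigned long cr3)
{
	printf("write_cr3(%#lx)\n", cr3);
}

static struct pv_mmu_ops pv_ops = { .write_cr3 = native_write_cr3 };

static void write_cr3(unsigned long cr3)
{
	call_depth_thunk();	/* implicit on every CALL under stuffing */
	pv_ops.write_cr3(cr3);	/* the CALL that write_cr*() turns into */
}

int main(void)
{
	static long depth;

	pcpu_call_depth = &depth;	/* "restore GS_BASE" before any call */
	write_cr3(0x1000);
	return 0;
}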

I really would like retbleed=stuff to work under Xen PV, because then we
can arrange to start turning off some even more expensive mitigations
that Xen does on behalf of guests.

I have a suspicion that these paths will be used under Xen PV, even if
only for dom0.  The abstractions for host S3/4/5 are not great.  That
said, at all points that guest code is executing, even after a logical
S3 resume, it will have a good GS_BASE (assuming the teardown logic
doesn't self-clobber).

The bigger issue with stuff accounting is that nothing AFAICT accounts
for the fact that any hypercall potentially empties the RSB in otherwise
synchronous program flow.

~Andrew
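
A sketch of the accounting gap Andrew describes, again as a userspace
toy model with invented names: a hypercall (or a de-schedule) empties
the hardware RSB while the guest's software counter still claims it is
full, so the RETs that follow can underflow without any stuffing being
triggered.

#include <stdio.h>

#define RSB_DEPTH 16

static int hw_rsb;	/* actual hardware return stack occupancy */
static int sw_depth;	/* what the guest's accounting believes */

static void do_call(void)
{
	if (hw_rsb < RSB_DEPTH)
		hw_rsb++;
	if (sw_depth < RSB_DEPTH)
		sw_depth++;
}

static void do_ret(void)
{
	if (sw_depth > 0)
		sw_depth--;	/* the accounting sees no underflow ... */
	if (hw_rsb > 0)
		hw_rsb--;
	else			/* ... but the hardware does */
		puts("RET on empty RSB: falls back to indirect prediction");
}

static void hypercall(void)
{
	/*
	 * The hypervisor runs its own calls and returns, and may
	 * de-schedule the vCPU entirely; either way the hardware RSB
	 * contents are gone while the guest's counter is untouched.
	 */
	hw_rsb = 0;
}

int main(void)
{
	int i;

	for (i = 0; i < 4; i++)
		do_call();
	hypercall();
	for (i = 0; i < 4; i++)
		do_ret();	/* unstuffed underflow candidates */
	return 0;
}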