On Thu, Jul 11, 2019 at 11:20 PM Jan Kiszka <jan.kis...@siemens.com> wrote:
>
> On 11.07.19 22:48, Richard Weinberger wrote:
> > On Thu, Jul 11, 2019 at 8:30 PM Jan Kiszka <jan.kis...@siemens.com> wrote:
> >>
> >> On 11.07.19 12:25, Richard Weinberger wrote:
> >>> On Thu, Jul 11, 2019 at 12:21 PM Jan Kiszka <jan.kis...@siemens.com> wrote:
> >>>> Can't reproduce so far, even with a while-true loop. Can you share your
> >>>> .config?
> >>>
> >>> Sure, see attachment.
> >>>
> >>
> >> This seems to fix the issue here:
> >>
> >> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> >> index 119fd66d111e..8f647c208cf2 100644
> >> --- a/arch/x86/entry/entry_64.S
> >> +++ b/arch/x86/entry/entry_64.S
> >> @@ -997,8 +997,8 @@ apicinterrupt IRQ_WORK_VECTOR irq_work_interrupt smp_irq_work_interrupt
> >>  \skip_label:
> >>         UNWIND_HINT_REGS
> >>         DISABLE_INTERRUPTS(CLBR_ANY)
> >> -       testl   %ebx, %ebx      /* %ebx: return to kernel mode */
> >> -       jnz     retint_kernel_early
> >> +       testb   $3, CS(%rsp)
> >> +       jz      retint_kernel_early
> >>         jmp     retint_user_early
> >>         .endif
> >>  1001:
> >>
> >> Tests welcome!
> >
> > With that change I can no longer trigger the crash.
>
> Perfect.
>
> > Can you please give more context? I'd like to understand the problem.
> >
>
> We were basing the decision whether to switch GS on return or not on a stale
> register (ebx). That register used to contain the information, but that changed
> with "x86/entry/64: Remove %ebx handling from error_entry/exit". This caused CPU
> state corruptions under certain conditions, apparently only when dealing with
> #DB exceptions, not with the way more frequent #PF.

Ah! Upstream b3681dd548d0 ("x86/entry/64: Remove %ebx handling from
error_entry/exit") changed ebx to CS.
Now things make sense again. :-)

Thanks for the quick fix and the explanation!

-- 
Thanks,
//richard
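
For readers following the thread, the corrected check can be sketched in C.
This is only an illustration under stated assumptions: returns_to_user() is a
hypothetical helper, not a kernel function, and the selector values in the note
below are the usual x86-64 Linux ones rather than anything quoted in this
thread. The low two bits of the saved CS selector hold the privilege level of
the interrupted context, so a non-zero result means the return goes to user
mode. That is the decision made by "testb $3, CS(%rsp)" followed by
"jz retint_kernel_early".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (assumption, not part of the patch or the kernel):
 * models "testb $3, CS(%rsp)" on the CS value saved in the exception frame.
 *
 * Bits 0-1 of a segment selector carry the privilege level. Ring 0 (kernel)
 * gives 0, ring 3 (user) gives 3, so any non-zero value means the
 * interrupted context was not kernel mode.
 */
static bool returns_to_user(uint64_t saved_cs)
{
	return (saved_cs & 3) != 0;
}
```

With the usual x86-64 selectors, returns_to_user(0x33) (user code segment) is
true and returns_to_user(0x10) (kernel code segment) is false. Unlike the old
%ebx test, the result depends only on the saved frame, so it cannot go stale
when the entry code stops maintaining a scratch register.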