On 11.07.19 22:48, Richard Weinberger wrote:
> On Thu, Jul 11, 2019 at 8:30 PM Jan Kiszka <[email protected]> wrote:
>>
>> On 11.07.19 12:25, Richard Weinberger wrote:
>>> On Thu, Jul 11, 2019 at 12:21 PM Jan Kiszka <[email protected]> wrote:
>>>> Can't reproduce so far, even with a while-true loop. Can you share your
>>>> .config?
>>>
>>> Sure, see attachment.
>>>
>>
>> This seems to fix the issue here:
>>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index 119fd66d111e..8f647c208cf2 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -997,8 +997,8 @@ apicinterrupt IRQ_WORK_VECTOR
>> irq_work_interrupt smp_irq_work_interrupt
>>  \skip_label:
>>  	UNWIND_HINT_REGS
>>  	DISABLE_INTERRUPTS(CLBR_ANY)
>> -	testl	%ebx, %ebx		/* %ebx: return to kernel mode */
>> -	jnz	retint_kernel_early
>> +	testb	$3, CS(%rsp)
>> +	jz	retint_kernel_early
>>  	jmp	retint_user_early
>>  	.endif
>>  1001:
>>
>> Tests welcome!
>
> With that change I can no longer trigger the crash.

Perfect.

> Can you please give more context? I'd like to understand the problem.
>

We were basing the decision whether or not to switch GS on return on a stale
register (ebx). That register used to contain the information, but that changed
with "x86/entry/64: Remove %ebx handling from error_entry/exit". This caused CPU
state corruption under certain conditions, apparently only when dealing with
#DB exceptions, not with the far more frequent #PF.

The issue is also present in 4.14, but not in 4.4 and the unmaintained 4.9, as
I first thought.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
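
For illustration only, the check that the patch switches to can be sketched in
C. This is not kernel code: struct saved_frame and frame_from_user() are
made-up names, and the selector values in the comments are the usual x86-64
Linux ones, quoted here as an assumption. The point is that the decision is
derived from the CS value saved on the stack (its low two bits are the
privilege level of the interrupted context) rather than from a scratch
register that may no longer hold what we expect.

```c
#include <stdbool.h>
#include <stdint.h>

/* The low two bits of a segment selector hold the privilege level (RPL). */
#define SEGMENT_RPL_MASK 0x3u

/*
 * Hypothetical stand-in for the register frame saved on exception entry.
 * Only the saved CS selector matters for this sketch.
 */
struct saved_frame {
	uint64_t cs;
};

/*
 * Mirrors "testb $3, CS(%rsp)": a zero result means the interrupted context
 * ran at ring 0 (kernel return path), anything else means it ran in user
 * mode (user return path, where GS has to be switched back).
 */
static bool frame_from_user(const struct saved_frame *frame)
{
	return (frame->cs & SEGMENT_RPL_MASK) != 0;
}
```

With the usual selectors, a frame with cs = 0x10 (kernel code segment) takes
the kernel path and a frame with cs = 0x33 (user code segment) takes the user
path, independent of whatever ebx happens to contain at that point.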
