On 11.07.19 22:48, Richard Weinberger wrote:
> On Thu, Jul 11, 2019 at 8:30 PM Jan Kiszka <jan.kis...@siemens.com> wrote:
>>
>> On 11.07.19 12:25, Richard Weinberger wrote:
>>> On Thu, Jul 11, 2019 at 12:21 PM Jan Kiszka <jan.kis...@siemens.com> wrote:
>>>> Can't reproduce so far, even with a while-true loop. Can you share your 
>>>> .config?
>>>
>>> Sure, see attachment.
>>>
>>
>> This seems to fix the issue here:
>>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index 119fd66d111e..8f647c208cf2 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -997,8 +997,8 @@ apicinterrupt IRQ_WORK_VECTOR irq_work_interrupt smp_irq_work_interrupt
>>  \skip_label:
>>         UNWIND_HINT_REGS
>>         DISABLE_INTERRUPTS(CLBR_ANY)
>> -       testl   %ebx, %ebx      /* %ebx: return to kernel mode */
>> -       jnz     retint_kernel_early
>> +       testb   $3, CS(%rsp)
>> +       jz      retint_kernel_early
>>         jmp     retint_user_early
>>         .endif
>>  1001:
>>
>> Tests welcome!
> 
> With that change I can no longer trigger the crash.

Perfect.

> Can you please give more context? I'd like to understand the problem.
> 

We were deciding whether to switch GS on return based on a stale register
(%ebx). That register used to carry this information, but that changed with
"x86/entry/64: Remove %ebx handling from error_entry/exit". This caused CPU
state corruption under certain conditions, apparently only when dealing with
#DB exceptions, not with the far more frequent #PF.
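
For illustration, the new check can be sketched in C. This is only a model of
what "testb $3, CS(%rsp); jz retint_kernel_early" tests, not kernel code: the
function name and the mask macro are made up for this example, and the
selector values in the usage note are the usual x86-64 Linux ones as far as I
know.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 0-1 of a segment selector hold the requested privilege level. */
#define SELECTOR_RPL_MASK 0x3u

/*
 * Hypothetical helper: decide the return path from the CS value that the
 * CPU saved in the exception frame, instead of trusting a scratch register.
 * Privilege level 0 means the exception interrupted kernel code, so GS must
 * not be switched on the way out.
 */
bool returns_to_kernel(uint64_t saved_cs)
{
	return (saved_cs & SELECTOR_RPL_MASK) == 0;
}
```

With a kernel code selector such as 0x10 this returns true, with a user code
selector such as 0x33 it returns false. The saved CS always reflects the
interrupted context, which is why it is a safer source than %ebx.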

The issue is also present in 4.14, but not in 4.4 and the unmaintained 4.9 as
I first thought.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
