Re: [Xenomai] ipipe: issues with ARM exception handling

Jan Kiszka Fri, 20 Feb 2015 11:02:10 -0800

On 2015-02-20 19:53, Gilles Chanteperdrix wrote:
> On Fri, Feb 20, 2015 at 07:51:19PM +0100, Jan Kiszka wrote:
>> On 2015-02-20 19:38, Gilles Chanteperdrix wrote:
>>> On Fri, Feb 20, 2015 at 07:03:14PM +0100, Jan Kiszka wrote:
>>>> Hi Gilles,
>>>>
>>>> analyzing a lockdep warning on 3.16 with I-pipe enabled, I dug deeper
>>>> into the hard and virtual interrupt state management during exception
>>>> handling on ARM. I think there are several issues:
>>>>
>>>> - ipipe_fault_entry should not fiddle with the root irq state if run
>>>>   over head, only when invoked over root.
>>>> - ipipe_fault_exit must not change the root state unless we entered over
>>>>   head and are about to leave over root - see x86. The current code may
>>>>   keep root incorrectly stalled after an exception, though this will
>>>>   probably be fixed up again in practice quickly.
>>>> - do_sect_fault is only called by do_DataAbort and do_PrefetchAbort,
>>>>   in both cases already wrapped in ipipe_fault_entry/exit, thus it
>>>>   shouldn't invoke them once again.
>>>>
>>>> Room for optimization:
>>>> - ipipe_fault_entry is always called with hard IRQs off from
>>>>   do_page_fault and do_translation_fault. I suspect this applies to the
>>>>   remaining callers (do_DataAbort and do_PrefetchAbort ) as well. Thus
>>>>   the hard IRQ state is actually known at compile time, right?
>>
>> To follow up on this: do_DataAbort and do_PrefetchAbort are always
>> invoked with hard IRQs disabled when a regular exception takes us there.
>> Only the ghost syscall cmpxchg simulates do_DataAbort without adjusting
>> the hardware interrupt state. It's probably easier to adjust that than to
>> account for hw irqs being potentially on at fault entry.
>>
>>>>
>>>> I can hack up patches, but I'd like to confirm first that I'm not
>>>> missing anything subtle or ARM-specific here.
>>>
>>> Just to explain the original hack.
>>>
>>> Some time ago, the fault handlers were executed with irqs enabled on
>>> ARM. The irqs were enabled in entry.S before executing the handlers.
>>>
>>> At some point, this was removed in entry.S and fault handlers
>>> started to be executed with irqs off. On ARM, all faults relax to be
>>> handled in secondary mode. Actually there is one exception, the FPU,
>>> but it goes through a completely different path which had always
>>> been executed with irqs off until recently; now the irqs are reenabled
>>> when accessing user-space to be able to handle faults without
>>> lockups.
>>>
>>> My concern was that the code thus executed could contain assertions
>>> about the root domain being stalled which would fail, so I added
>>> code which stalled root and enabled hardware irqs on fault entry and
>>> unstalled root and disabled hardware irqs on fault exit (which
>>> always happens over the root domain). This should have worked even if
>>> a fault had happened to be handled in head domain, because then the
>>> operation would have been a nop (simply stall, then unstall).
>>>
>>> But Philippe found this dumb approach to fail when working on LPAE,
>>> IIRC. Namely, if the root domain happens to be stalled when
>>> entering a fault over head domain, it would end up unstalled after
>>> the operation. So, I believe the code he added saves the stall state
>>> on fault entry and restores it on fault exit. I checked the details
>>> of Philippe's code at the time and did not find anything wrong.
>>
>> I suspect the LPAE scenario takes the do_page_fault path? Then it should
>> rather be solved by providing the right information to or preventing
>> the execution of
>>
>>      /* Enable interrupts if they were enabled in the parent context. */
>>      if (interrupts_enabled(regs))
>>              local_irq_enable();
>>
>> Now we unconditionally restore the root state saved on entry, overwriting
>> whatever may happen to it during the handler execution - specifically via
>> the snippet above.
> 
> This code is part of the mainline kernel.


Correct. But we can adjust it to take interrupt virtualization and
domain migration into account.
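
To make the stall-state problem discussed above concrete, here is a
minimal toy model in plain user-space C. It is not kernel code, and every
name in it (toy_root, run_naive, run_saving, ...) is made up for this
sketch. It only models the virtual root stall bit: a naive stall/unstall
pair around the handler, and a save/restore pair as I understand the
description quoted above.

```c
/* Toy user-space model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_root {
	bool stalled;	/* virtual interrupt mask of the root domain */
};

/* Naive pairing: stall on entry, unstall on exit. */
static void naive_fault_entry(struct toy_root *root)
{
	root->stalled = true;
}

static void naive_fault_exit(struct toy_root *root)
{
	root->stalled = false;	/* the entry state is lost here */
}

/* Save/restore pairing: remember the entry state and put it back. */
static bool saving_fault_entry(struct toy_root *root)
{
	bool was_stalled = root->stalled;

	root->stalled = true;
	return was_stalled;
}

static void saving_fault_exit(struct toy_root *root, bool was_stalled)
{
	root->stalled = was_stalled;
}

/* Returns the stall bit after one simulated fault, naive pairing. */
bool run_naive(bool stalled_on_entry)
{
	struct toy_root root = { .stalled = stalled_on_entry };

	naive_fault_entry(&root);
	assert(root.stalled);	/* the handler body would run here */
	naive_fault_exit(&root);
	return root.stalled;
}

/*
 * Same with save/restore. If handler_unstalls is set, the simulated
 * handler unstalls root, standing in for the local_irq_enable() call.
 */
bool run_saving(bool stalled_on_entry, bool handler_unstalls)
{
	struct toy_root root = { .stalled = stalled_on_entry };
	bool saved = saving_fault_entry(&root);

	if (handler_unstalls)
		root.stalled = false;
	saving_fault_exit(&root, saved);
	return root.stalled;
}
```

In this model, run_naive(true) leaves root unstalled although it was
stalled on entry, run_saving(true, false) keeps it stalled, and
run_saving(true, true) shows the handler's own change to the root state
being overwritten on exit.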


I also followed hook_ifault_code to find out if there is something
left unaddressed wrt. I-pipe. There is: hardware break/watchpoints. If
triggered over head, they will go to hw_breakpoint_pending directly
without prior domain migration. We probably want another
__ipipe_report_trap here.
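
Roughly what I have in mind, again as a toy user-space model rather than
kernel code. All names are invented for this sketch, and the behaviour of
the report step (migrate to root, return 0 so the regular handler still
runs) is an assumption of the model, not a statement about the real
__ipipe_report_trap.

```c
/* Toy user-space model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum toy_domain { TOY_HEAD, TOY_ROOT };

struct toy_cpu {
	enum toy_domain current;
	bool handler_ran_over_head;
};

/*
 * Stand-in for a trap report. Assumption of this sketch: reporting
 * migrates the caller to root and returns 0, meaning that the regular
 * handler should still run.
 */
static int toy_report_trap(struct toy_cpu *cpu)
{
	cpu->current = TOY_ROOT;
	return 0;
}

/* Stand-in for the breakpoint handler; records the domain it ran over. */
static void toy_breakpoint_handler(struct toy_cpu *cpu)
{
	if (cpu->current == TOY_HEAD)
		cpu->handler_ran_over_head = true;
}

/* Direct dispatch: no report, so no migration. */
bool dispatch_direct(enum toy_domain entry)
{
	struct toy_cpu cpu = { .current = entry, .handler_ran_over_head = false };

	toy_breakpoint_handler(&cpu);
	return cpu.handler_ran_over_head;
}

/* Dispatch that reports the trap first. */
bool dispatch_with_report(enum toy_domain entry)
{
	struct toy_cpu cpu = { .current = entry, .handler_ran_over_head = false };

	if (toy_report_trap(&cpu))
		return false;	/* trap consumed, handler skipped */
	toy_breakpoint_handler(&cpu);
	return cpu.handler_ran_over_head;
}
```

Here dispatch_direct(TOY_HEAD) ends up running the handler over head,
while dispatch_with_report(TOY_HEAD) only runs it after the migration.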

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
