On 05/10/2013 06:22 PM, Jan Kiszka wrote:
> On 2013-05-10 17:44, Gilles Chanteperdrix wrote:
>> On 05/10/2013 05:27 PM, Gilles Chanteperdrix wrote:
>>
>>> On 01/18/2013 04:38 PM, Jan Kiszka wrote:
>>>
>>>> This fixes a nasty bug on SMP boxes: We may migrate to root in the
>>>> context of an IRQ handler, and then also to a different CPU. Therefore,
>>>> we must not use domain contexts read before the invocation but update
>>>> them afterward or use stable information like the domain reference.
>>>>
>>>> Signed-off-by: Jan Kiszka <[email protected]>
>>>> ---
>>>>
>>>> We are still facing stalled, unkillable RT processes despite this fix,
>>>> but at least the head domain status corruption (and related warnings)
>>>> seems to be gone now.
>>>>
>>>> kernel/ipipe/core.c | 5 +++--
>>>> 1 files changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kernel/ipipe/core.c b/kernel/ipipe/core.c
>>>> index 68af0b3..6aa9572 100644
>>>> --- a/kernel/ipipe/core.c
>>>> +++ b/kernel/ipipe/core.c
>>>> @@ -1173,18 +1173,19 @@ static void dispatch_irq_head(unsigned int irq) /* hw interrupts off */
>>>> head->irqs[irq].handler(irq, head->irqs[irq].cookie);
>>>> __ipipe_run_irqtail(irq);
>>>> hard_local_irq_disable();
>>>> + p = ipipe_this_cpu_head_context();
>>>> __clear_bit(IPIPE_STALL_FLAG, &p->status);
>>>>
>>>> /* Are we still running in the head domain? */
>>>> if (likely(__ipipe_current_context == p)) {
>>>> /* Did we enter this code over the head domain? */
>>>> - if (old == p) {
>>>> + if (old->domain == head) {
>>>> /* Yes, do immediate synchronization. */
>>>> if (__ipipe_ipending_p(p))
>>>> __ipipe_sync_stage();
>>>> return;
>>>> }
>>>> - __ipipe_set_current_context(old);
>>>> + __ipipe_set_current_context(ipipe_this_cpu_root_context());
>>>> }
>>>>
>>>> /*
>>>
>>>
>>> Hi Jan,
>>>
>>> I may have had an issue reported to me which resembles the problem
>>> fixed by this patch. If I understand your patch correctly, it implies
>>> that an irq handler may migrate domains.
>
> Likely not the IRQ handler itself but its bottom-half that may perform a
> reschedule to an RT task that decides to migrate to handle a fault or a
> Linux syscall etc.
In my case, it is probably due to the way signals are handled by the
posix skin: xnshadow_relax() is called in a signal handler invoked by
xnpod_schedule() (xnpod_dispatch_signals()), and I suspect the system
goes south when this happens in the context of an irq handler (head
domain irq handlers do invoke xnpod_schedule()). I am probably going to
replace xnshadow_relax() with xnshadow_call_mayday(), as that seems a
cleaner way to force a migration to secondary mode.
--
Gilles.
_______________________________________________
Xenomai mailing list
[email protected]
http://www.xenomai.org/mailman/listinfo/xenomai