On 05/10/2013 05:27 PM, Gilles Chanteperdrix wrote:
> On 01/18/2013 04:38 PM, Jan Kiszka wrote:
>
>> This fixes a nasty bug on SMP boxes: We may migrate to root in the
>> context of an IRQ handler, and then also to a different CPU. Therefore,
>> we must not use domain contexts read before the invocation but update
>> them afterward or use stable information like the domain reference.
>>
>> Signed-off-by: Jan Kiszka <[email protected]>
>> ---
>>
>> We are still facing stalled, unkillable RT processes despite this fix,
>> but at least the head domain status corruption (and related warnings)
>> seems to be gone now.
>>
>> kernel/ipipe/core.c | 5 +++--
>> 1 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/ipipe/core.c b/kernel/ipipe/core.c
>> index 68af0b3..6aa9572 100644
>> --- a/kernel/ipipe/core.c
>> +++ b/kernel/ipipe/core.c
>> @@ -1173,18 +1173,19 @@ static void dispatch_irq_head(unsigned int irq) /*
>> hw interrupts off */
>> head->irqs[irq].handler(irq, head->irqs[irq].cookie);
>> __ipipe_run_irqtail(irq);
>> hard_local_irq_disable();
>> + p = ipipe_this_cpu_head_context();
>> __clear_bit(IPIPE_STALL_FLAG, &p->status);
>>
>> /* Are we still running in the head domain? */
>> if (likely(__ipipe_current_context == p)) {
>> /* Did we enter this code over the head domain? */
>> - if (old == p) {
>> + if (old->domain == head) {
>> /* Yes, do immediate synchronization. */
>> if (__ipipe_ipending_p(p))
>> __ipipe_sync_stage();
>> return;
>> }
>> - __ipipe_set_current_context(old);
>> + __ipipe_set_current_context(ipipe_this_cpu_root_context());
>> }
>>
>> /*
>
>
> Hi Jan,
>
> I may have an issue reported which resembles the problem fixed by this
> patch. If I understand your patch, it means that an irq handler may
> migrate domain. But if I read __ipipe_sync_stage() code:
>
> void __ipipe_do_sync_stage(void)
> {
> struct ipipe_percpu_domain_data *p;
> struct ipipe_domain *ipd;
> int irq;
>
> p = __ipipe_current_context;
> ipd = p->domain;
>
> __set_bit(IPIPE_STALL_FLAG, &p->status);
> smp_wmb();
>
> if (ipd == ipipe_root_domain)
> trace_hardirqs_off();
>
> for (;;) {
> irq = __ipipe_next_irq(p);
> if (irq < 0)
> break;
> /*
> * Make sure the compiler does not reorder wrongly, so
> * that all updates to maps are done before the
> * handler gets called.
> */
> barrier();
>
> if (test_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))
> continue;
>
> if (ipd != ipipe_head_domain)
> hard_local_irq_enable();
>
> if (likely(ipd != ipipe_root_domain)) {
> ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie);
> __ipipe_run_irqtail(irq);
> hard_local_irq_disable();
> } else if (ipipe_virtual_irq_p(irq)) {
> irq_enter();
> ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie);
> irq_exit();
> root_stall_after_handler();
> hard_local_irq_disable();
> while (__ipipe_check_root_resched())
> __ipipe_preempt_schedule_irq();
> } else {
> ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie);
> root_stall_after_handler();
> hard_local_irq_disable();
> }
>
> p = __ipipe_current_context;
> }
>
> if (ipd == ipipe_root_domain)
> trace_hardirqs_on();
>
> __clear_bit(IPIPE_STALL_FLAG, &p->status);
> }
>
> It seems to me that at the end of each loop, we re-read p, but we assume
> that p->domain did not change. So I would add:
>
> ipd = p->domain
>
> right after the line
> p = __ipipe_current_context
>
> But I am not sure that we are supposed to allow irq handlers to migrate
> domains.
And I am not sure that this would not break the logic of
__ipipe_sync_stage, so, maybe replacing:
p = __ipipe_current_context
with
p = ipipe_this_cpu_context(ipd)
would make more sense?
--
Gilles.
_______________________________________________
Xenomai mailing list
[email protected]
http://www.xenomai.org/mailman/listinfo/xenomai