Hi. I'm running an OMAP DM3725-based board with a 2.6.38.8 kernel, a
mid-May 2012 git snapshot of xenomai-2.6, and a backport (really just
application) of the patch that Gilles referenced in the "IRQ latency"
thread last week:
http://git.xenomai.org/?p=ipipe-gch.git;a=commit;h=81bfc05c4716b76e79f5e486baf4c52015a3b92c
Of perhaps critical note is that I'm building Linux with CONFIG_PREEMPT.
The possible problem that I'm about to describe appears specific to
CONFIG_PREEMPT being defined. To date, I have not been able to trigger
the problem if I rebuild the kernel with CONFIG_PREEMPT_NONE.
Almost exactly as with last week's "IRQ latency" thread, I have a GPIO
(164 in my case) configured as an input, and configured to trigger a
real-time interrupt upon a falling edge of the signal. The virtual
interrupt number assigned by Xenomai for that GPIO is 324 (0x144 hex).
What I have found via I-Pipe tracing is that sometimes many, many
milliseconds elapse within the call to __ipipe_handle_irq() from the
inlined function ipipe_handle_chained_irq() in gpio_demux_inner(), e.g.:
:| +begin 0x00000144 -33547 0.925 gpio_demux_inner+0x74
(gpio_irq_handler+0x178)
:| +end 0x00000144 -196+ 1.370 gpio_demux_inner+0x88
(gpio_irq_handler+0x178)
The present I-Pipe patch set for 2.6.38.8 on ARM modifies the
plat-omap-specific implementation of gpio_irq_handler() so that the
GPIO bank interrupt stays masked until just before that function
returns. So, if we somehow get "stuck" in gpio_demux_inner() for an
extended period, the bank interrupt remains masked, and new GPIO line
state transitions will go unhandled during that entire period.
I find that the GPIO interrupt is processed as expected, and any RT
tasks that have work to do are scheduled to do that work.
What appears to go awry is that when it becomes time to return to the
Linux domain, the I-Pipe machinery finds an interrupt pending for that
domain. Rather than gpio_demux_inner() running to completion and winding
back to gpio_irq_handler(), the new interrupt is processed, and it can
be many, many milliseconds before gpio_demux_inner() gets a chance to
run and finish its work.
As it happens, my code is the source of that new (virtual) interrupt,
and it's possible that I'm exercising a use case that hasn't been
considered. On the other hand, I could easily imagine other cases where
interrupts happen to appear at arbitrary points along the timeline.
The sequence of events is as follows:
1. The interrupt fires while the system is running in the Linux domain,
and the real-time interrupt handler for GPIO 164 calls rt_sem_v() to
wake up a native Xenomai kernel task that is blocked on rt_sem_p().
I can see that from Xenomai's perspective, processing of the interrupt
finishes just after that point, e.g. around -33527 in this sample:
:| +func -33558 0.814 __ipipe_grab_irq+0x10 (__irq_svc+0x3c)
:| +begin 0xffffffff -33557 0.851 __ipipe_grab_irq+0x60 (__irq_svc+0x3c)
:| +func -33556+ 1.481 __ipipe_handle_irq+0x14
(__ipipe_grab_irq+0x6c)
:| +func -33555+ 1.037 __ipipe_set_irq_pending+0x10
(__ipipe_handle_irq+0x120)
:| +func -33554 0.925 gpio_irq_handler+0x10
(__ipipe_handle_irq+0x150)
:| +func -33553+ 1.333 omap_mask_ack_irq+0x10
(gpio_irq_handler+0x2c)
:| +func -33552 1.000 irq_get_irq_data+0x10
(gpio_irq_handler+0x34)
:| +(0x01) 0xfb058018 -33551+ 1.777 gpio_irq_handler+0x1b0
(__ipipe_handle_irq+0x150) /* custom trace point */
:| +(0x02) 0x00000010 -33549 0.629 gpio_irq_handler+0x15c
(__ipipe_handle_irq+0x150) /* custom trace point */
:| +func -33548 0.740 gpio_demux_inner+0x10
(gpio_irq_handler+0x178)
:| +begin 0x00000144 -33547 0.925 gpio_demux_inner+0x74
(gpio_irq_handler+0x178)
:| +func -33547 0.740 __ipipe_handle_irq+0x14
(gpio_demux_inner+0x80)
:| +func -33546 0.925 __ipipe_set_irq_pending+0x10
(__ipipe_handle_irq+0x120)
:| +func -33545 0.925 __ipipe_ack_edge_irq+0x10
(__ipipe_handle_irq+0x150)
:| +func -33544 0.962 gpio_ack_irq+0x10
(__ipipe_ack_edge_irq+0x24)
:| +func -33543+ 1.444 __ipipe_walk_pipeline+0x10
(__ipipe_handle_irq+0x1a0)
:| + func -33542 0.962 ipipe_suspend_domain+0x10
(__ipipe_walk_pipeline+0x68)
:| + func -33541+ 1.037 __ipipe_sync_stage+0x14
(ipipe_suspend_domain+0x9c)
:| # func -33540+ 1.592 xnintr_irq_handler+0x14
(__ipipe_sync_stage+0x124)
:| # func -33538 1.000 my_isr_rt+0x10 [phy_srv]
(xnintr_irq_handler+0x60)
/* do a bit of work */
:| # func -33533 0.703 rt_sem_v+0x10 (physrv_htm_isr_rt+0x24
[phy_srv])
:| # func -33532 1.000 xnsynch_wakeup_one_sleeper+0x10
(rt_sem_v+0x64)
:| # func -33531 0.925 xnpod_resume_thread+0x10
(xnsynch_wakeup_one_sleeper+0xac)
:| # [ -1] -<?>- 60 -33530+ 1.851 xnpod_resume_thread+0x64
(xnsynch_wakeup_one_sleeper+0xac)
:| # func -33528+ 1.222 rthal_irq_end+0x10
(xnintr_irq_handler+0x138)
:| # func -33527+ 1.333 __ipipe_end_edge_irq+0x10
(rthal_irq_end+0x3c)
:| # func -33526+ 1.037 __xnpod_schedule+0x14
(xnintr_irq_handler+0x1cc)
:| # [ 1633] -<?>- -1 -33525 0.777 __xnpod_schedule+0x9c
(xnintr_irq_handler+0x1cc)
:| # func -33524+ 1.555 xnsched_pick_next+0x10
(__xnpod_schedule+0xd4)
:| # func -33523 0.888 ipipe_mute_pic+0x10
(__xnpod_schedule+0x17c)
:| # func -33522+ 2.111 omap3_intc_mute+0x10
(ipipe_mute_pic+0x1c)
:| # func -33520 0.703 ipipe_unstall_pipeline_head+0x10
(__xnpod_schedule+0x2ac)
:| + end 0x80000000 -33519+ 1.851 ipipe_unstall_pipeline_head+0x88
(__xnpod_schedule+0x2ac)
2. The task that had been blocked in rt_sem_p() awakens, collects the
data signaled by the GPIO transition, and in turn wakes up another
native Xenomai kernel task via rt_sem_v().
3. The second task invokes a write entry point on an RTDM driver.
4. The RTDM driver may decide to place a copy of the data in a known
location in kernel memory, and call rtdm_nrtsig_pend() to invoke a
non-realtime handler in Linux kernel space. This results in a virtual
interrupt being generated for the Linux domain, e.g.:
: + func -33238+ 1.037 ipipe_trigger_irq+0x10 (my_write+0x84
[my_rtdm_driver])
:| + begin 0x80000001 -33237 0.925 ipipe_trigger_irq+0x74 (my_write+0x84
[my_rtdm_driver])
:| + func -33236+ 1.481 __ipipe_handle_irq+0x14
(ipipe_trigger_irq+0x80)
:| + func -33235+ 1.370 __ipipe_set_irq_pending+0x10
(__ipipe_handle_irq+0x120)
:| + end 0x80000001 -33233+ 1.555 ipipe_trigger_irq+0x90 (my_write+0x84
[my_rtdm_driver])
5. The non-realtime handler schedules a soft IRQ via tasklet_schedule().
6. The tasklet retrieves the data posted by the real-time task.
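For concreteness, the kernel-side shape of steps 1 through 6 is roughly
the following sketch (assuming the Xenomai 2.6 native and RTDM APIs;
my_sem, my_nrtsig, and my_tasklet are placeholder names, not my actual
identifiers, and error handling is omitted):

```
/* Step 1: the RT GPIO interrupt handler wakes a waiting native task */
static int my_isr_rt(xnintr_t *intr)
{
	/* ... collect data from the hardware ... */
	rt_sem_v(&my_sem);		/* wake the task blocked in rt_sem_p() */
	return XN_ISR_HANDLED;
}

/* Steps 3-4: the second task's write lands in the RTDM driver, which
 * stages the data and pends a non-realtime signal to the Linux domain */
static ssize_t my_write(struct rtdm_dev_context *context,
			rtdm_user_info_t *user_info,
			const void *buf, size_t nbyte)
{
	/* ... copy data to a known location in kernel memory ... */
	rtdm_nrtsig_pend(&my_nrtsig);	/* virtual IRQ for the Linux domain */
	return nbyte;
}

/* Step 5: the non-realtime handler defers the rest to a tasklet */
static void my_nrt_handler(rtdm_nrtsig_t *nrtsig, void *arg)
{
	tasklet_schedule(&my_tasklet);	/* step 6 runs in softirq context */
}
```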
My hope would be that steps 5 and 6 would only run after
gpio_demux_inner() had a chance to unwind back to gpio_irq_handler().
Instead, upon transition back to the Linux domain, the new interrupt is
processed first, followed by the soft IRQ, and then various Linux
housekeeping tasks. Here's an example snippet:
: + func -33119 1.000 xnsched_finish_unlocked_switch+0x10
(__xnpod_schedule+0x4c0)
:| + begin 0x80000000 -33118+ 1.037 xnsched_finish_unlocked_switch+0x30
(__xnpod_schedule+0x4c0)
:| # [ 1633] -<?>- -1 -33116+ 1.481 __xnpod_schedule+0x4e4
(xnintr_irq_handler+0x1cc)
:| +func -33115+ 1.222 __ipipe_sync_stage+0x14
(ipipe_suspend_domain+0x9c)
:| #end 0x80000000 -33114 0.851 __ipipe_sync_stage+0xf8
(ipipe_suspend_domain+0x9c)
: #func -33113 0.666 __ipipe_do_IRQ+0x10
(__ipipe_sync_stage+0x1bc)
: #func -33112+ 1.111 asm_do_IRQ+0x10 (__ipipe_do_IRQ+0x1c)
: #func -33111 0.851 irq_enter+0x10 (asm_do_IRQ+0x28)
: #func -33110 0.925 rcu_exit_nohz+0x10 (irq_enter+0x18)
: #func -33109+ 1.407 idle_cpu+0x10 (irq_enter+0x20)
: #func -33108 0.814 ipipe_check_context+0x10
(irq_enter+0x60)
: #func -33107+ 1.074 __ipipe_noack_irq+0x10
(asm_do_IRQ+0x84)
: #func -33106 0.888 irq_exit+0x10 (asm_do_IRQ+0x88)
: #func -33105+ 1.111 ipipe_check_context+0x10
(irq_exit+0x1c)
: #func -33104 1.000 rcu_enter_nohz+0x10 (irq_exit+0x60)
: #func -33103 0.703 idle_cpu+0x10 (irq_exit+0x68)
: #func -33102 0.740 ipipe_check_context+0x10
(irq_exit+0xa8)
:| #begin 0x80000000 -33102 0.888 __ipipe_sync_stage+0x1d4
(ipipe_suspend_domain+0x9c)
:| #end 0x80000000 -33101 0.592 __ipipe_sync_stage+0xf8
(ipipe_suspend_domain+0x9c)
: #func -33100 0.888 irq_enter+0x10
(__ipipe_sync_stage+0x14c)
: #func -33099 0.666 rcu_exit_nohz+0x10 (irq_enter+0x18)
: #func -33099 0.629 idle_cpu+0x10 (irq_enter+0x20)
: #func -33098+ 1.148 ipipe_check_context+0x10
(irq_enter+0x60)
/* __tasklet_schedule */
...
: #func -33091 0.592 __ipipe_restore_root+0x10
(__tasklet_schedule+0x12c)
: #func -33090 0.962 ipipe_check_context+0x10
(__ipipe_restore_root+0x20)
:| #begin 0x80000001 -33089 0.666 __ipipe_restore_root+0x40
(__tasklet_schedule+0x12c)
:| #end 0x80000001 -33088 0.666 __ipipe_restore_root+0x60
(__tasklet_schedule+0x12c)
: #func -33088 0.851 irq_exit+0x10
(__ipipe_sync_stage+0x168)
: #func -33087 0.740 ipipe_check_context+0x10
(irq_exit+0x1c)
: #func -33086+ 1.111 __do_softirq+0x14 (irq_exit+0x5c)
...
We don't seem to get around to finishing gpio_demux_inner() until time -196.
:| +begin 0x80000000 -219+ 1.481 schedule+0x5ac (schedule_timeout+0x2ec)
:| +func -218 1.000
__ipipe_switch_to_notifier_call_chain+0x10 (__switch_to+0x2c)
:| #func -217 0.666 atomic_notifier_call_chain+0x14
(__ipipe_switch_to_notifier_call_chain+0x6c)
:| #func -216 0.629 __atomic_notifier_call_chain+0x14
(atomic_notifier_call_chain+0x28)
:| #func -216 0.962 ipipe_check_context+0x10
(__atomic_notifier_call_chain+0x3c)
:| #func -215 0.703 notifier_call_chain+0x10
(__atomic_notifier_call_chain+0x64)
:| #func -214 0.851 vfp_notifier+0x10
(notifier_call_chain+0x3c)
:| #func -213+ 1.111 thumbee_notifier+0x10
(notifier_call_chain+0x3c)
:| #func -212+ 1.666 ipipe_check_context+0x10
(__atomic_notifier_call_chain+0x70
:| +end 0x80000000 -210+ 1.111 schedule+0x5d4
(preempt_schedule_irq+0x64)
: +func -209 0.666 finish_task_switch+0x10
(schedule+0x60c)
:| +begin 0x80000001 -209 0.703 finish_task_switch+0x40
(schedule+0x60c)
:| #end 0x80000001 -208 0.925 finish_task_switch+0x60
(schedule+0x60c)
: #func -207 0.888 __ipipe_unstall_root+0x10
(finish_task_switch+0x8c)
:| #begin 0x80000000 -206 0.666 __ipipe_unstall_root+0x2c
(finish_task_switch+0x8c)
:| #func -205 0.925 ipipe_check_context+0x10
(__ipipe_unstall_root+0x34)
:| +end 0x80000000 -205 0.740 __ipipe_unstall_root+0x68
(finish_task_switch+0x8c)
: +func -204+ 1.629 ipipe_check_context+0x10
(schedule+0x664)
:| +begin 0x80000001 -202 0.814 preempt_schedule_irq+0x7c
(__ipipe_preempt_schedule_irq+0x84)
:| #end 0x80000001 -201 0.592 preempt_schedule_irq+0x9c
(__ipipe_preempt_schedule_irq+0x84)
: #func -201+ 1.148 ipipe_check_context+0x10
(preempt_schedule_irq+0xa8)
:| #begin 0x80000000 -200+ 3.185 __ipipe_preempt_schedule_irq+0x9c
(__ipipe_sync_stage+0x188)
:| +end 0x00000144 -196+ 1.370 gpio_demux_inner+0x88
(gpio_irq_handler+0x178)
:| +func -195+ 2.962 gpio_demux_inner+0x10
(gpio_irq_handler+0x18c)
:| +(0x02) 0x00000010 -192 0.629 gpio_irq_handler+0x15c
(__ipipe_handle_irq+0x150) /* a custom trace point */
:| +func -191 0.925 gpio_demux_inner+0x10
(gpio_irq_handler+0x178)
:| +begin 0x00000144 -191+ 1.111 gpio_demux_inner+0x74
(gpio_irq_handler+0x178)
:| +func -189+ 1.518 __ipipe_handle_irq+0x14
(gpio_demux_inner+0x80)
:| +func -188+ 2.148 __ipipe_set_irq_pending+0x10
(__ipipe_handle_irq+0x120)
:| +func -186+ 1.074 __ipipe_ack_edge_irq+0x10
(__ipipe_handle_irq+0x150)
:| +func -185+ 1.370 gpio_ack_irq+0x10
(__ipipe_ack_edge_irq+0x24)
:| +func -183+ 1.333 __ipipe_walk_pipeline+0x10
(__ipipe_handle_irq+0x1a0)
:| + func -182 0.740 ipipe_suspend_domain+0x10
(__ipipe_walk_pipeline+0x68)
:| + func -181+ 1.407 __ipipe_sync_stage+0x14
(ipipe_suspend_domain+0x9c)
:| # func -180+ 1.888 xnintr_irq_handler+0x14
(__ipipe_sync_stage+0x124)
/* process the new GPIO interrupt */
I don't presently understand why we get back to running
gpio_demux_inner() at this particular point, rather than at some point
during the prior 33 milliseconds. Note that once gpio_demux_inner() gets
a chance to unwind, we detect the next GPIO transition (which was
generated long before) almost immediately thereafter.
My intuition is that the overall situation may have something to do with
the fact that when CONFIG_PREEMPT is enabled, __ipipe_sync_stage() calls
the CONFIG_PREEMPT-specific, non-empty implementation of
__ipipe_preempt_schedule_irq(), but that's speculation.
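For reference, my (possibly mistaken) reading of the root-domain sync
path is that it ends with something shaped like this (paraphrased, not
a verbatim quote of the I-Pipe source):

```
/* tail of __ipipe_sync_stage(), after draining the root stage's log */
#ifdef CONFIG_PREEMPT
	if (ipd == ipipe_root_domain && need_resched())
		__ipipe_preempt_schedule_irq();	/* can switch away here,
						   before the preempted
						   gpio_demux_inner() loop
						   ever resumes */
#endif
```

The trace above does show __ipipe_preempt_schedule_irq() being entered
from __ipipe_sync_stage()+0x188, which is what prompts the speculation.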
I'm hoping that the Xenomai community can shed some further light on
what's going awry, and perhaps suggest a fix. If possible, I would
prefer to retain the option to build the kernel with CONFIG_PREEMPT, but
without encountering the extended latencies documented above.
Thanks much and Best Regards,
Mitch
_______________________________________________
Xenomai mailing list
[email protected]
http://www.xenomai.org/mailman/listinfo/xenomai