Philippe Gerum via Xenomai <[email protected]> writes:

> François Legal <[email protected]> writes:
>
>> Le Lundi, Juin 21, 2021 16:57 CEST, Philippe Gerum <[email protected]> a 
>> écrit: 
>>  
>>> 
>>> Jan Kiszka <[email protected]> writes:
>>> 
>>> > On 21.06.21 16:28, Philippe Gerum wrote:
>>> >> 
>>> >> Jan Kiszka <[email protected]> writes:
>>> >> 
>>> >>> On 21.06.21 15:54, Philippe Gerum wrote:
>>> >>>>
>>> >>>> Jan Kiszka <[email protected]> writes:
>>> >>>>
>>> >>>>> On 21.06.21 11:39, Philippe Gerum wrote:
>>> >>>>>>
>>> >>>>>> Jan Kiszka <[email protected]> writes:
>>> >>>>>>
>>> >>>>>>> On 18.06.21 20:41, François Legal wrote:
>>> >>>>>>>> Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka 
>>> >>>>>>>> <[email protected]> a écrit: 
>>> >>>>>>>>  
>>> >>>>>>>>> On 16.06.21 15:51, François Legal wrote:
>>> >>>>>>>>>> Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka 
>>> >>>>>>>>>> <[email protected]> a écrit: 
>>> >>>>>>>>>>  
>>> >>>>>>>>>>> On 16.06.21 15:29, François Legal wrote:
>>> >>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka 
>>> >>>>>>>>>>>> <[email protected]> a écrit: 
>>> >>>>>>>>>>>>  
>>> >>>>>>>>>>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
>>> >>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" 
>>> >>>>>>>>>>>>>> <[email protected]> a écrit: 
>>> >>>>>>>>>>>>>>  
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> -----Original Message-----
>>> >>>>>>>>>>>>>>>> From: François Legal <[email protected]> 
>>> >>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
>>> >>>>>>>>>>>>>>>> To: Chen, Hongzhan <[email protected]>
>>> >>>>>>>>>>>>>>>> Cc: [email protected]
>>> >>>>>>>>>>>>>>>> Subject: RE: Large gpio interrupt latency
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" 
>>> >>>>>>>>>>>>>>>> <[email protected]> a écrit: 
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> -----Original Message-----
>>> >>>>>>>>>>>>>>>>>> From: Xenomai <[email protected]> On Behalf Of 
>>> >>>>>>>>>>>>>>>>>> François Legal via Xenomai
>>> >>>>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>>> >>>>>>>>>>>>>>>>>> To: [email protected]
>>> >>>>>>>>>>>>>>>>>> Subject: Large gpio interrupt latency
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> Hello,
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> working on a realtime data recorder (for which I 
>>> >>>>>>>>>>>>>>>>>> submitted a patch to add timestamp retrieval for net 
>>> >>>>>>>>>>>>>>>>>> packets), I am experiencing a strange latency problem 
>>> >>>>>>>>>>>>>>>>>> when taking GPIO interrupts.
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> So basically my app monitors network packets (which, 
>>> >>>>>>>>>>>>>>>>>> when received by the DUT, trigger the toggling of a 
>>> >>>>>>>>>>>>>>>>>> GPIO on the DUT) and one GPIO.
>>> >>>>>>>>>>>>>>>>>> We know (from some other reference instrument) that the 
>>> >>>>>>>>>>>>>>>>>> DUT would toggle the GPIO 1-2ms after receiving a 
>>> >>>>>>>>>>>>>>>>>> specific network packet.
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> My recorder app relies on the timestamping of events 
>>> >>>>>>>>>>>>>>>>>> done in interrupts service routines for GPIOs and 
>>> >>>>>>>>>>>>>>>>>> network interface. By checking the timestamps returned 
>>> >>>>>>>>>>>>>>>>>> by the application, I get an about 50ms delay between 
>>> >>>>>>>>>>>>>>>>>> the network packet and the GPIO (so something between 51 
>>> >>>>>>>>>>>>>>>>>> and 52ms delay).
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Are you toggling the gpio and accessing the gpio device 
>>> >>>>>>>>>>>>>>>>> through your rtdm device, i.e. on a path like 
>>> >>>>>>>>>>>>>>>>> /dev/rtdm/your_gpiocontroller/gpio*?
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Maybe my setup was not very clear.
>>> >>>>>>>>>>>>>>>> I've got a DUT that receives network data, and toggles one 
>>> >>>>>>>>>>>>>>>> of its GPIOs depending on the network data.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> I've got another device running my realtime recording app, 
>>> >>>>>>>>>>>>>>>> that receives the same network data as the DUT, and that 
>>> >>>>>>>>>>>>>>>> has one of its GPIO connected to the DUT GPIO.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> On the recording app, I use the RTDM device to open, ioctl 
>>> >>>>>>>>>>>>>>>> (enable interrupt + timestamp), then select & read the 
>>> >>>>>>>>>>>>>>>> GPIO value.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> When the issue happens, has the recording app side seen 
>>> >>>>>>>>>>>>>>> the same number of network data packets and gpio 
>>> >>>>>>>>>>>>>>> interrupts as the DUT has been toggling?
>>> >>>>>>>>>>>>>>> I am asking because I have met a gpio hardware issue that 
>>> >>>>>>>>>>>>>>> caused missing or spurious gpio interrupts when the gpio 
>>> >>>>>>>>>>>>>>> connects two different boards.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. 
>>> >>>>>>>>>>>>>> Regarding network, the DUT and the recording device are 
>>> >>>>>>>>>>>>>> connected to the same switch with the same port config.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Moreover, I tried changing the interrupt polarity of the 
>>> >>>>>>>>>>>>>> recording device (switched from rising edge to falling 
>>> >>>>>>>>>>>>>> edge), and the result is 1-2ms + ~30ms (the width of the 
>>> >>>>>>>>>>>>>> pulse of the DUT) + 50ms latency
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Can you (or did you already) break down the latencies on the 
>>> >>>>>>>>>>>>> DUT via
>>> >>>>>>>>>>>>> tracing? Is it really the GPIO output path? What is happening 
>>> >>>>>>>>>>>>> in it,
>>> >>>>>>>>>>>>> starting with RT/Linux task switches, mode transitions etc.?
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> I just did it.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> I did put gpio_pin_interrupt as trigger, then ran my 
>>> >>>>>>>>>>>> application, triggered the GPIO.
>>> >>>>>>>>>>>> This is what I get in frozen, but I'm not quite sure how to 
>>> >>>>>>>>>>>> read it or what conclusions I can draw:
>>> >>>>>>>>>>>>  cat frozen
>>> >>>>>>>>>>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release 
>>> >>>>>>>>>>>> #10
>>> >>>>>>>>>>>> ------------------------------------------------------------
>>> >>>>>>>>>>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
>>> >>>>>>>>>>>> Calibrated minimum trace-point overhead: 0.461 us
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>  +----- Hard IRQs ('|': locked)
>>> >>>>>>>>>>>>  |+-- Xenomai
>>> >>>>>>>>>>>>  ||+- Linux ('*': domain stalled, '+': current, '#': 
>>> >>>>>>>>>>>> current+stalled)
>>> >>>>>>>>>>>>  |||                      +---------- Delay flag ('+': > 1 us, 
>>> >>>>>>>>>>>> '!': > 10 us)
>>> >>>>>>>>>>>>  |||                      |        +- NMI noise ('N')
>>> >>>>>>>>>>>>  |||                      |        |
>>> >>>>>>>>>>>>           Type    User Val.   Time    Delay  Function (Parent)
>>> >>>>>>>>>>>> :  +func               -6087+   1.302  load_balance+0x14 
>>> >>>>>>>>>>>> (run_rebalance_domains+0x7e8)
>>> >>>>>>>>>>>> :  +func               -6085      0.826  idle_cpu+0x10 
>>> >>>>>>>>>>>> (load_balance+0x180)
>>> >>>>>>>>>>>> :  +func               -6084      0.892  
>>> >>>>>>>>>>>> find_busiest_group+0x14 (load_balance+0x1a4)
>>> >>>>>>>>>>>> :  +func               -6084      0.757  
>>> >>>>>>>>>>>> update_group_capacity+0x14 (find_busiest_group+0x128)
>>> >>>>>>>>>>>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 
>>> >>>>>>>>>>>> (update_group_capacity+0x30)
>>> >>>>>>>>>>>> :  +func               -6081+   1.535  idle_cpu+0x10 
>>> >>>>>>>>>>>> (find_busiest_group+0x1e4)
>>> >>>>>>>>>>>> :  +func               -6080+   1.410  idle_cpu+0x10 
>>> >>>>>>>>>>>> (find_busiest_group+0x1e4)
>>> >>>>>>>>>>>> :  +func               -6078      0.967  
>>> >>>>>>>>>>>> __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
>>> >>>>>>>>>>>> :  +func               -6077      0.886  
>>> >>>>>>>>>>>> __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
>>> >>>>>>>>>>>> :  +func               -6077      0.820  rcu_bh_qs+0x10 
>>> >>>>>>>>>>>> (__do_softirq+0x1b0)
>>> >>>>>>>>>>>> :  +func               -6076      0.742  ipipe_stall_root+0x10 
>>> >>>>>>>>>>>> (__do_softirq+0x1b4)
>>> >>>>>>>>>>>> :  +func               -6075      0.766  ipipe_root_only+0x10 
>>> >>>>>>>>>>>> (ipipe_stall_root+0x18)
>>> >>>>>>>>>>>> :| +begin   0x80000001 -6074      0.934  
>>> >>>>>>>>>>>> ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>> >>>>>>>>>>>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (ipipe_root_only+0xfc)
>>> >>>>>>>>>>>> :| +begin   0x80000001 -6072      0.895  
>>> >>>>>>>>>>>> ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (ipipe_stall_root+0xb8)
>>> >>>>>>>>>>>> :  #func               -6071      0.814  
>>> >>>>>>>>>>>> __local_bh_enable+0x10 (__do_softirq+0x214)
>>> >>>>>>>>>>>> :  #func               -6070      0.760  ipipe_test_root+0x10 
>>> >>>>>>>>>>>> (__local_bh_enable+0x1c)
>>> >>>>>>>>>>>> :| #begin   0x80000001 -6069      0.907  
>>> >>>>>>>>>>>> ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (ipipe_test_root+0xb8)
>>> >>>>>>>>>>>> :  #func               -6067      0.811  rcu_irq_exit+0x10 
>>> >>>>>>>>>>>> (irq_exit+0x84)
>>> >>>>>>>>>>>> :  #func               -6067      0.781  
>>> >>>>>>>>>>>> ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
>>> >>>>>>>>>>>> :  #func               -6066      0.799  ipipe_root_only+0x10 
>>> >>>>>>>>>>>> (ipipe_test_and_stall_root+0x18)
>>> >>>>>>>>>>>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 
>>> >>>>>>>>>>>> (ipipe_root_only+0xb8)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (ipipe_root_only+0xfc)
>>> >>>>>>>>>>>> :| #begin   0x80000001 -6063      0.895  
>>> >>>>>>>>>>>> ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (ipipe_test_and_stall_root+0xc4)
>>> >>>>>>>>>>>> :  #func               -6061+   1.197  
>>> >>>>>>>>>>>> rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
>>> >>>>>>>>>>>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 
>>> >>>>>>>>>>>> (__ipipe_do_sync_stage+0x2b8)
>>> >>>>>>>>>>>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (__ipipe_grab_irq+0x84)
>>> >>>>>>>>>>>> :| +func               -6058      0.988  
>>> >>>>>>>>>>>> __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
>>> >>>>>>>>>>>> :| +func               -6057      0.976  ipipe_test_root+0x10 
>>> >>>>>>>>>>>> (__ipipe_check_root_interruptible+0x68)
>>> >>>>>>>>>>>> :| +func               -6056      0.829  
>>> >>>>>>>>>>>> __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
>>> >>>>>>>>>>>> :| +end     0x90000000 -6055      0.913  
>>> >>>>>>>>>>>> __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
>>> >>>>>>>>>>>> :  +func               -6054      0.781  ipipe_test_root+0x10 
>>> >>>>>>>>>>>> (cpu_startup_entry+0x12c)
>>> >>>>>>>>>>>> :| +begin   0x80000001 -6053      0.868  
>>> >>>>>>>>>>>> ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>> >>>>>>>>>>>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (ipipe_test_root+0xb8)
>>> >>>>>>>>>>>> :  +func               -6052      0.748  rcu_idle_exit+0x10 
>>> >>>>>>>>>>>> (cpu_startup_entry+0x138)
>>> >>>>>>>>>>>> :  +func               -6051      0.739  
>>> >>>>>>>>>>>> ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
>>> >>>>>>>>>>>> :  +func               -6050      0.775  ipipe_root_only+0x10 
>>> >>>>>>>>>>>> (ipipe_test_and_stall_root+0x18)
>>> >>>>>>>>>>>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 
>>> >>>>>>>>>>>> (ipipe_root_only+0xb8)
>>> >>>>>>>>>>>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (ipipe_root_only+0xfc)
>>> >>>>>>>>>>>> :| +begin   0x80000001 -6048      0.940  
>>> >>>>>>>>>>>> ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (ipipe_test_and_stall_root+0xc4)
>>> >>>>>>>>>>>> :  #func               -6046      0.859  
>>> >>>>>>>>>>>> rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
>>> >>>>>>>>>>>> :  #func               -6045      0.772  
>>> >>>>>>>>>>>> ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
>>> >>>>>>>>>>>> :| #begin   0x80000000 -6044      0.814  
>>> >>>>>>>>>>>> ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
>>> >>>>>>>>>>>> :| #func               -6043+   1.077  ipipe_root_only+0x10 
>>> >>>>>>>>>>>> (ipipe_unstall_root+0x24)
>>> >>>>>>>>>>>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (ipipe_unstall_root+0x84)
>>> >>>>>>>>>>>> :  +func               -6042      0.922  
>>> >>>>>>>>>>>> arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
>>> >>>>>>>>>>>> :  +func               -6041      0.793  ipipe_stall_root+0x10 
>>> >>>>>>>>>>>> (cpu_startup_entry+0xc4)
>>> >>>>>>>>>>>> :  +func               -6040      0.724  ipipe_root_only+0x10 
>>> >>>>>>>>>>>> (ipipe_stall_root+0x18)
>>> >>>>>>>>>>>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 
>>> >>>>>>>>>>>> (ipipe_root_only+0xb8)
>>> >>>>>>>>>>>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (ipipe_root_only+0xfc)
>>> >>>>>>>>>>>> :| +begin   0x80000001 -6037      0.841  
>>> >>>>>>>>>>>> ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (ipipe_stall_root+0xb8)
>>> >>>>>>>>>>>> :  #func               -6036      0.838  
>>> >>>>>>>>>>>> arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
>>> >>>>>>>>>>>> :  #func               -6035      0.745  arm_heavy_mb+0x10 
>>> >>>>>>>>>>>> (arch_cpu_idle_enter+0x1c)
>>> >>>>>>>>>>>> :  #func               -6034      0.916  l2c210_sync+0x10 
>>> >>>>>>>>>>>> (arm_heavy_mb+0x2c)
>>> >>>>>>>>>>>> :  #func               -6033+   1.062  
>>> >>>>>>>>>>>> tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
>>> >>>>>>>>>>>> :  #func               -6032      0.787  rcu_idle_enter+0x10 
>>> >>>>>>>>>>>> (cpu_startup_entry+0x124)
>>> >>>>>>>>>>>> :  #func               -6031      0.745  
>>> >>>>>>>>>>>> ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
>>> >>>>>>>>>>>> :  #func               -6031      0.751  ipipe_root_only+0x10 
>>> >>>>>>>>>>>> (ipipe_test_and_stall_root+0x18)
>>> >>>>>>>>>>>> :| #begin   0x80000001 -6030      0.991  
>>> >>>>>>>>>>>> ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (ipipe_root_only+0xfc)
>>> >>>>>>>>>>>> :| #begin   0x80000001 -6028      0.892  
>>> >>>>>>>>>>>> ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (ipipe_test_and_stall_root+0xc4)
>>> >>>>>>>>>>>> :  #func               -6026      0.922  
>>> >>>>>>>>>>>> rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
>>> >>>>>>>>>>>> :  #func               -6025      0.862  
>>> >>>>>>>>>>>> default_idle_call+0x10 (cpu_startup_entry+0x128)
>>> >>>>>>>>>>>> :  #func               -6024      0.877  arch_cpu_idle+0x10 
>>> >>>>>>>>>>>> (default_idle_call+0x38)
>>> >>>>>>>>>>>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 
>>> >>>>>>>>>>>> (arch_cpu_idle+0xb8)
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Here your system (or this core) went idle, waiting for the next 
>>> >>>>>>>>>>> event.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> :| +func                 -31      0.760  
>>> >>>>>>>>>>>> ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Coming back from idle.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> :| +func                 -31+   1.116  ipipe_root_only+0x10 
>>> >>>>>>>>>>>> (ipipe_unstall_root+0x24)
>>> >>>>>>>>>>>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (ipipe_unstall_root+0x84)
>>> >>>>>>>>>>>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 
>>> >>>>>>>>>>>> (ipipe_unstall_root+0x88)
>>> >>>>>>>>>>>> :| +func                 -28      0.925  gic_handle_irq+0x10 
>>> >>>>>>>>>>>> (__irq_svc+0x6c)
>>> >>>>>>>>>>>> :| +func                 -27      0.904  irq_find_mapping+0x10 
>>> >>>>>>>>>>>> (gic_handle_irq+0x50)
>>> >>>>>>>>>>>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 
>>> >>>>>>>>>>>> (gic_handle_irq+0x58)
>>> >>>>>>>>>>>> :| +begin   0x000000c9   -25      0.826  
>>> >>>>>>>>>>>> ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
>>> >>>>>>>>>>>> :| +func                 -24      0.814  
>>> >>>>>>>>>>>> __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
>>> >>>>>>>>>>>> :| +func                 -23+   1.275  irq_to_desc+0x10 
>>> >>>>>>>>>>>> (__ipipe_dispatch_irq+0x184)
>>> >>>>>>>>>>>> :| +func                 -22+   1.679  irq_to_desc+0x10 
>>> >>>>>>>>>>>> (__ipipe_dispatch_irq+0x198)
>>> >>>>>>>>>>>> :| +func                 -20+   2.092  
>>> >>>>>>>>>>>> ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
>>> >>>>>>>>>>>> :| +func                 -18+   1.413  irq_find_mapping+0x10 
>>> >>>>>>>>>>>> (ucc_gpio_irqhandler+0x84)
>>> >>>>>>>>>>>> :| +begin   0x000000e3   -17      0.757  
>>> >>>>>>>>>>>> ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
>>> >>>>>>>>>>>> :| +func                 -16      0.778  
>>> >>>>>>>>>>>> __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
>>> >>>>>>>>>>>> :| +func                 -15+   1.023  irq_to_desc+0x10 
>>> >>>>>>>>>>>> (__ipipe_dispatch_irq+0x184)
>>> >>>>>>>>>>>> :| +func                 -14+   1.494  irq_to_desc+0x10 
>>> >>>>>>>>>>>> (__ipipe_dispatch_irq+0x198)
>>> >>>>>>>>>>>> :| +func                 -13+   1.014  
>>> >>>>>>>>>>>> __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
>>> >>>>>>>>>>>> :| +func                 -12      0.763  
>>> >>>>>>>>>>>> ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
>>> >>>>>>>>>>>> :| +func                 -11+   1.248  
>>> >>>>>>>>>>>> __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
>>> >>>>>>>>>>>> :| #func                 -10+   1.619  
>>> >>>>>>>>>>>> __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
>>> >>>>>>>>>>>> :| +func                  -8+   1.239  
>>> >>>>>>>>>>>> __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
>>> >>>>>>>>>>>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 
>>> >>>>>>>>>>>> (ucc_gpio_irqhandler+0xa0)
>>> >>>>>>>>>>>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 
>>> >>>>>>>>>>>> (ucc_gpio_irqhandler+0xd4)
>>> >>>>>>>>>>>> :| +func                  -5+   1.353  
>>> >>>>>>>>>>>> __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
>>> >>>>>>>>>>>> :|+ func                  -4+   1.449  
>>> >>>>>>>>>>>> __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
>>> >>>>>>>>>>>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 
>>> >>>>>>>>>>>> (__ipipe_do_sync_stage+0x200)
>>> >>>>>>>>>>>> :|# func                  -1+   1.455  ___xnlock_get+0x10 
>>> >>>>>>>>>>>> (xnintr_irq_handler+0xc0)
>>> >>>>>>>>>>>> <|# func                   0      1.107  
>>> >>>>>>>>>>>> gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> And here we start to process that GPIO interrupt in the RTDM 
>>> >>>>>>>>>>> handler, roughly 30 µs later (part of which is tracing 
>>> >>>>>>>>>>> overhead). So far nothing suspicious.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>>  |# func                   1      1.236  
>>> >>>>>>>>>>>> xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
>>> >>>>>>>>>>>>  |# func                   2      1.224  
>>> >>>>>>>>>>>> rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Here some likely waiting RT task is signalled. Does that one 
>>> >>>>>>>>>>> notice a
>>> >>>>>>>>>>> too high latency?
>>> >>>>>>>>>>
>>> >>>>>>>>>> I may have been wrong speaking of "latency". The problem I'm 
>>> >>>>>>>>>> trying to fix is understanding why my reference instrument 
>>> >>>>>>>>>> indicates a delay from network packet to GPIO on the DUT of 
>>> >>>>>>>>>> about 1 to 2 ms, whereas my Xenomai-powered realtime recorder 
>>> >>>>>>>>>> application, given the same network stream and gpio access, 
>>> >>>>>>>>>> gives me ~50 to 52 ms.
>>> >>>>>>>>>>
>>> >>>>>>>>>> I first thought there was something wrong/delayed on the GPIO 
>>> >>>>>>>>>> interrupt side (the network packet could not have been received 
>>> >>>>>>>>>> before it was sent, right?).
>>> >>>>>>>>>> The trace seems to demonstrate ~30 µs of interrupt latency for 
>>> >>>>>>>>>> the GPIO (a number I expected for interrupt latency on that 
>>> >>>>>>>>>> system). So something is wrong on my system, but I don't know 
>>> >>>>>>>>>> what!
>>> >>>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> Try to trace events, not functions, using regular ftrace 
>>> >>>>>>>>> ("trace-cmd
>>> >>>>>>>>> record -e cobalt* -e sched* -e irq* -e signal*" e.g.). Check when 
>>> >>>>>>>>> the
>>> >>>>>>>>> NIC gets the interrupt and compare that to when the GPIO event is
>>> >>>>>>>>> triggered (or whatever is trigger and reaction). Function tracing 
>>> >>>>>>>>> is for
>>> >>>>>>>>> zooming in when you know where to zoom.
>>> >>>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> So I think I found the answer to my problem.
>>> >>>>>>>> Is there any reason why NET events (in NIC irq functions) are 
>>> >>>>>>>> dated with rtdm_clock_read, whereas GPIO events are dated with 
>>> >>>>>>>> rtdm_clock_read_monotonic?
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>> An inconsistency of the GPIO driver; the UART drivers use 
>>> >>>>>>> rtdm_clock_read for user-exposed timestamping as well. Maybe 
>>> >>>>>>> Philippe can comment on the thoughts behind this deviation.
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>> When running over the I-pipe, rtdm_clock_read() is based on Xenomai's
>>> >>>>>> idea of real time, which is the Cobalt monotonic clock plus an 
>>> >>>>>> arbitrary
>>> >>>>>> offset. For this reason, rtdm_clock_read() is not SMP-consistent
>>> >>>>>> (different CPUs might read different timestamps at the exact same 
>>> >>>>>> time),
>>> >>>>>> nor in sync with Linux's wall clock. For these reasons, I
>>> >>>>>> don't see any practical way to synchronize multiple systems on the 
>>> >>>>>> clock
>>> >>>>>> underlying rtdm_clock_read().
>>> >>>>>>
>>> >>>>>> Therefore, there is no upside in using rtdm_clock_read() for
>>> >>>>>> timestamping in this context, only adding the potential for even more
>>> >>>>>> surprising results due to the mono->real-time offset changing under 
>>> >>>>>> our
>>> >>>>>> feet, since the epoch of the Xenomai real-time clock can be (re)set
>>> >>>>>> during runtime.
>>> >>>>>>
>>> >>>>>> I believe the UART driver is wrong here, it should return timestamps
>>> >>>>>> based on the monotonic source, which best fits the common need: 
>>> >>>>>> getting
>>> >>>>>> timestamps from the local CPU for measuring delays between events
>>> >>>>>> received by drivers and the actions taken by the applications which
>>> >>>>>> consume them, immune from updates to the underlying clock
>>> >>>>>> epoch. Granted, there might be a catch when a timestamp is taken from
>>> >>>>>> IRQ context, which is then consumed from a thread living on a 
>>> >>>>>> different
>>> >>>>>> CPU, if per-CPU clocks are not/badly synchronized. But that would 
>>> >>>>>> happen
>>> >>>>>> the exact same way with rtdm_clock_read() anyway.
>>> >>>>>>
>>> >>>>>> The situation improves when running on top of Dovetail, since Xenomai
>>> >>>>>> now refers to the common linux clocks (mono / real) instead of 
>>> >>>>>> providing
>>> >>>>>> its own idea of time, but the UART driver code predates the Dovetail
>>> >>>>>> port.
>>> >>>>>>  
>>> >>>>>
>>> >>>>> Well, it's not just UART. All hardware drivers - except for GPIO - use
>>> >>>>> rtdm_clock_read. That was no problem in practice for their use cases 
>>> >>>>> so
>>> >>>>> far. One problem is that GPIO timestamps are now not comparable to 
>>> >>>>> others.
>>> >>>>>
>>> >>>>> But IIRC, most Xenomai APIs using absolute timestamps are based on
>>> >>>>> Xenomai's real-time clock. Therefore, providing timestamps for that
>>> >>>>
>>> >>>> /*
>>> >>>>  * The Copperplate clock shall be monotonic unless the threading
>>> >>>>  * library has restrictions to support this over Mercury.
>>> >>>>  *
>>> >>>>  * In the normal case, this means that ongoing delays and timeouts
>>> >>>>  * won't be affected when the host system date is changed. In the
>>> >>>>  * restricted case by contrast, ongoing delays and timeouts may be
>>> >>>>  * impacted by changes to the host system date.
>>> >>>>  *
>>> >>>>  * The implementation maintains a per-clock epoch value, so that
>>> >>>>  * different emulators can have different (virtual) system dates.
>>> >>>>  */
>>> >>>> #ifdef CONFIG_XENO_COPPERPLATE_CLOCK_RESTRICTED
>>> >>>> #define CLOCK_COPPERPLATE  CLOCK_REALTIME
>>> >>>> #else
>>> >>>> #define CLOCK_COPPERPLATE  CLOCK_MONOTONIC
>>> >>>> #endif
>>> >>>>
>>> >>>> So no, only the POSIX API is using what the standard mandates, which is
>>> >>>> CLOCK_REALTIME. All other APIs are based on copperplate, and they are
>>> >>>> using a monotonic source as documented above.
>>> >>>
>>> >>> OK, but that changed in 3.x. At the time the RTDM API was originally
>>> >>> added and then promoted, it was the other way around. We became
>>> >>> inconsistent then.
>>> >>>
>>> >>>>
>>> >>>>> particular clock was the original idea of rtdm_clock_read (which
>>> >>>>> predates rtdm_clock_read_monotonic). GPIO breaks that and should be
>>> >>>>> fixed - unless I'm wrong with that assumption.
>>> >>>>>
>>> >>>>
>>> >>>> We cannot assume the epoch is going to remain stable with
>>> >>>> rtdm_clock_read() the way it is implemented, which is quite a
>>> >>>> problem wrt the common use case. For timestamping, a majority of
>>> >>>> mainline drivers are using ktime_get() or a variant thereof which is
>>> >>>> based on the monotonic clock source, not the _real form. Why would the
>>> >>>> real-time I/O drivers be different?
>>> >>>
>>> >>> We have two cases here:
>>> >>>
>>> >>>  - I-pipe-based version where the realtime clock is under full
>>> >>>    application control -> no problem to use rtdm_clock_read
>>> >> 
>>> >> Well, there is still the issue that rtdm_clock_read() is not immune to
>>> >> some part of userland changing the CLOCK_REALTIME epoch Xenomai-wise
>>> >> e.g. via a call to clock_settime(), which is the same problem as Linux
>>> >> changing the epoch of CLOCK_REALTIME over Dovetail. This would break the
>>> >> application.
>>> >
>>> > Yes, but that's about the application(s) breaking themselves. Nothing
>>> > new, not going to change when we only avoid clock_realtime for stamps
>>> > but still use POSIX services basing timers on that clock. The key point
>>> > is that I-pipe gave that into RT application hands (with all related
>>> > downsides), with Dovetail it's in system hands.
>>> >
>>> >> 
>>> >>>  - Dovetail where we share the realtime clock - with all its tuning -
>>> >>>    with Linux -> here we have a problem with rtdm_clock_read and should
>>> >>>    reconsider its usage (and promotion!)
>>> >>>
>>> >>> For stable 3.1, the proper fix is with GPIO going to rtdm_clock_read.
>>> >> 
>>> >> Wait, you have downstream users already depending on GPIO returning
>>> >> monotonic timestamps, and this is a _stable_ release. So why not fix the
>>> >> other drivers based on the fact that timestamping with rtdm_clock_read()
>>> >> is wrong instead? Same issue, right?
>>> >
>>> > GPIO was broken, but you are right that we may have users relying on
>>> > that breakage now. Obviously, we can change the other drivers for the
>>> > very same reason: they have been working the way they do for more than
>>> > 10 years now.
>>> >
>>> 
>>> Sorry, but in my view, rtdm_clock_read() has been broken since day one.
>>> GPIO had to work around the breakage... :)
>>> 
>>> >> 
>>> >> So the best course of action to sort this out for 3.1.x may be to extend
>>> >> GPIO_RTIOC_TS with, say, GPIO_RTIOC_TS_REAL, which would log and return
>>> >> timestamps based on the Xenomai wallclock. Applications which do want to
>>> >> align on that clock would simply have to issue GPIO_RTIOC_TS_REAL
>>> >> instead of GPIO_RTIOC_TS. This would break backward ABI compat only for
>>> >> users of GPIO_RTIOC_TS_REAL, but that would be much better than
>>> >> introducing a sneaky change in behavior for the GPIO driver.
>>> >
>>> > Yeah, likely the way to go.
>>> >
>>> >> 
>>> >>> For 3.2, I'm not sure yet what to do with rtdm_clock_read.
>>> >>>
>>> >> 
>>> >> The decision looks pretty simple for the common use case: when
>>> >> timestamps are needed for performance/delay measurements, we want to get
>>> >> them from a clock source which won't play funny games, warping back to
>>> >> the future.
>>> >
>>> > You only think of measurements. The other drivers used the stamping also
>>> > for real applications, i.e. to calculate absolute clock-realtime
>>> > timeouts and wait for them to arrive. We will likely need to enhance
>>> > also the other driver APIs to select the desired clocksource, just like
>>> > for GPIO.
>>> 
>>> I'm referring to what is in the GPIO code, which is the problem at hand:
>>> that timestamping was designed since day #1 to provide a way to measure
>>> the scheduling latency.
>>> 
>>> I agree that the only way out is to enable all timestamp consumers to
>>> pick their base clock of choice (mono or wallclock).
>>> 
>>> -- 
>>> Philippe.
>>  
>>  Hi there,
>>
>> as we speak of breaking ABI, why not introduce some specific IOCTL to 
>> select which time source to use for each driver using it, then use a 
>> function pointer to call the right rtdm_time_get_whatever function?
>>
>
> That would mean adding a generic ioctl, and many changes all over the
> place (we would need no function pointer with distinct ioctl
> codes). Doable for sure, but likely too much for 3.1.x.

wrt the function pointer: I mean that a plain branch on some selector
is likely to perform better than a function pointer, with the Spectre
mitigations around.

-- 
Philippe.
