Re: [Xenomai-core] [BUG?] stalled xeno domain
Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Philippe Gerum wrote:
>>>> Jan Kiszka wrote:
>>>>> Philippe Gerum wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> Philippe, do you see any remaining issues, e.g. that the leak
>>>>>>> survived the task termination? Does this have any meaning for
>>>>>>> correct driver and skin code?
>>>>>>
>>>>>> The only way I could see this leakage survive a switch
>>>>>> transition would require it to happen over the root context,
>>>>>> not over a primary context. Was it the case?
>>>>>
>>>>> The task had to leave from primary mode. If I forced it to
>>>>> secondary mode before terminating, the problem did not show up.
>>>>
>>>> But could the code causing the leakage have been run by different
>>>> contexts in sequence, including the root one?
>>>
>>> I don't think so. Bugs in our software aside, there should be no
>>> switch to secondary mode until termination. Moreover, we installed
>>> a SIGXCPU handler, and that one didn't trigger either. I just
>>> constructed a simple test by placing rthal_local_irq_disable() in
>>> rt_timer_spin() and setting up this user space app:
>>>
>>> #include <stdio.h>
>>> #include <signal.h>
>>> #include <unistd.h>
>>>
>>> #include <native/task.h>
>>> #include <native/timer.h>
>>>
>>> RT_TASK task;
>>>
>>> void func(void *arg)
>>> {
>>>     rt_timer_spin(0);
>>> }
>>>
>>> void terminate(int sig)
>>> {
>>>     printf("joining...\n");
>>>     rt_task_join(&task);
>>>     rt_task_delete(&task);
>>>     printf("done\n");
>>> }
>>>
>>> int main(void)
>>> {
>>>     signal(SIGINT, terminate);
>>>     rt_task_spawn(&task, "lockup", 0, 10,
>>>                   T_FPU | T_JOINABLE | T_WARNSW, func, NULL);
>>>     pause();
>>>     return 0;
>>> }
>>>
>>> Should this lock up (as it currently does) or rather continue to
>>> run normally after the RT task terminated? BTW, I'm still not sure
>>> if we are hunting shadows (is IRQs off a legal state for user
>>> space in some skin?) or a real problem - i.e. whether it is worth
>>> the time.
>>
>> I've just tested this frag against the current SVN head, patching
>> rt_timer_spin() as required, and cannot reproduce the lockup. As
>
> Are you sure that you actually used the modified native skin for the
> test?

Yep, checked twice.

>> expected, the incoming root thread reinstates the correct stall bit
>> (i.e. clears it) after the RT thread terminates. Any chance some
>> potentially troublesome stuff exists in your setup?
>
> I just re-verified this behaviour on a slightly different setup
> (still 2.6.15-ipipe-1.2-02, xeno trunk), and I'm going to try this
> on a third box with 2.6.16+tracing soon. So far I still have a stuck
> timer IRQ after the test.
>
> Jan

--
Philippe.
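For reference, the fault injection both posters are testing amounts to
hard-disabling IRQs inside rt_timer_spin() and then returning to user
space. A minimal sketch, assuming a plain TSC busy-wait body: only the
injected rthal_local_irq_disable() call is taken from the thread, the
rest of the body is an assumption, not the actual native skin source.

    /* Sketch of the instrumented rt_timer_spin(); busy-wait body
     * assumed, only the rthal_local_irq_disable() injection comes
     * from the discussion above. */
    void rt_timer_spin(RTIME ns)
    {
            RTIME etime = xnarch_get_cpu_tsc() + xnarch_ns_to_tsc(ns);

            /* Injected fault: hard-disable IRQs and never re-enable
             * them, leaking the stalled state to whatever context
             * runs next. */
            rthal_local_irq_disable();

            while ((SRTIME)(xnarch_get_cpu_tsc() - etime) < 0)
                    ;
    }

With ns == 0, as in the test program, the loop exits immediately; the
whole point is the leaked IRQ state, not the spin itself.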
Re: [Xenomai-core] [BUG?] stalled xeno domain
Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Philippe Gerum wrote:
>>>> Jan Kiszka wrote:
>>>>> Philippe, do you see any remaining issues, e.g. that the leak
>>>>> survived the task termination? Does this have any meaning for
>>>>> correct driver and skin code?
>>>>
>>>> The only way I could see this leakage survive a switch transition
>>>> would require it to happen over the root context, not over a
>>>> primary context. Was it the case?
>>>
>>> The task had to leave from primary mode. If I forced it to
>>> secondary mode before terminating, the problem did not show up.
>>
>> But could the code causing the leakage have been run by different
>> contexts in sequence, including the root one?
>
> I don't think so. Bugs in our software aside, there should be no
> switch to secondary mode until termination. Moreover, we installed a
> SIGXCPU handler, and that one didn't trigger either. I just
> constructed a simple test by placing rthal_local_irq_disable() in
> rt_timer_spin() and setting up this user space app:
>
> [test program snipped]
>
> Should this lock up (as it currently does) or rather continue to run
> normally after the RT task terminated? BTW, I'm still not sure if we
> are hunting shadows (is IRQs off a legal state for user space in
> some skin?) or a real problem - i.e. whether it is worth the time.

I've just tested this frag against the current SVN head, patching
rt_timer_spin() as required, and cannot reproduce the lockup. As
expected, the incoming root thread reinstates the correct stall bit
(i.e. clears it) after the RT thread terminates. Any chance some
potentially troublesome stuff exists in your setup?

--
Philippe.
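The "stall bit" Philippe refers to is the per-domain, per-CPU flag
that tells the I-pipe to log interrupts for a stage instead of
delivering them; the root domain's copy is what local_irq_save() and
local_irq_restore() manipulate under the I-pipe. A rough sketch of how
it could be inspected, assuming the legacy I-pipe layout of that era
(cpudata[].status and IPIPE_STALL_FLAG are from memory; check the
actual ipipe patch before relying on these names):

    #include <linux/ipipe.h>

    /* Assumed legacy I-pipe layout: each domain keeps a per-CPU
     * status word whose IPIPE_STALL_FLAG bit means "log interrupts,
     * do not deliver them" for that stage. */
    static inline int domain_stalled(struct ipipe_domain *ipd, int cpu)
    {
            return test_bit(IPIPE_STALL_FLAG,
                            &ipd->cpudata[cpu].status);
    }

What Philippe describes is the incoming root thread clearing this bit
for the root stage when it resumes after the RT thread exits, which is
why the leak should normally not survive the switch.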
Re: [Xenomai-core] [BUG?] stalled xeno domain
Jan Kiszka wrote:
> Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> Hi Philippe,
>>>
>>> debugging is nice, tracing is still nicer. As you suggested, I
>>> extended the tracer with per-domain stall flags (needs some output
>>> clean-up, preliminary patch attached nevertheless). And here is
>>> the result (full trace attached):
>>>
>>> [trace snipped]
>>>
>>> The '*' marks a stalled domain, root domain on the right. It seems
>>> to me that xnpod_suspend_thread() leaves the xeno domain stalled
>>> on wake-up. This gets recovered somehow later. But here we see a
>>> different effect which finally [...]
Re: [Xenomai-core] [BUG?] stalled xeno domain
Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> Jan Kiszka wrote:
>>>> Hi Philippe,
>>>>
>>>> debugging is nice, tracing is still nicer. As you suggested, I
>>>> extended the tracer with per-domain stall flags (needs some
>>>> output clean-up, preliminary patch attached nevertheless). And
>>>> here is the result (full trace attached):
>>>>
>>>> [trace snipped]
>>>>
>>>> The '*' marks a stalled domain, root domain on the right. It
>>>> seems to me that xnpod_suspend_thread() leaves the xeno domain
>>>> stalled on wake-up. [...]
Re: [Xenomai-core] [BUG?] stalled xeno domain
Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Philippe, do you see any remaining issues, e.g. that the leak
>>> survived the task termination? Does this have any meaning for
>>> correct driver and skin code?
>>
>> The only way I could see this leakage survive a switch transition
>> would require it to happen over the root context, not over a
>> primary context. Was it the case?
>
> The task had to leave from primary mode. If I forced it to secondary
> mode before terminating, the problem did not show up.

But could the code causing the leakage have been run by different
contexts in sequence, including the root one?

--
Philippe.
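Jan's "forced it to secondary" experiment can be expressed with the
native skin's mode-switch call; a sketch of what that variant of the
test task could look like (the exact call site in his code is not
shown in the thread):

    /* Variant of the test task: relax to secondary mode before
     * terminating. Clearing T_PRIMARY migrates the shadow to the
     * Linux domain, so the task then exits from secondary mode. */
    void func(void *arg)
    {
            rt_timer_spin(0);
            rt_task_set_mode(T_PRIMARY, 0, NULL);
    }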
Re: [Xenomai-core] [BUG?] stalled xeno domain
Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Philippe Gerum wrote:
>>>> Jan Kiszka wrote:
>>>>> Philippe, do you see any remaining issues, e.g. that the leak
>>>>> survived the task termination? Does this have any meaning for
>>>>> correct driver and skin code?
>>>>
>>>> The only way I could see this leakage survive a switch transition
>>>> would require it to happen over the root context, not over a
>>>> primary context. Was it the case?
>>>
>>> The task had to leave from primary mode. If I forced it to
>>> secondary mode before terminating, the problem did not show up.
>>
>> But could the code causing the leakage have been run by different
>> contexts in sequence, including the root one?
>
> I don't think so. Bugs in our software aside, there should be no
> switch to secondary mode until termination. Moreover, we installed a
> SIGXCPU handler, and that one didn't trigger either. I just
> constructed a simple test by placing rthal_local_irq_disable() in
> rt_timer_spin() and setting up this user space app:
>
> [test program snipped]
>
> Should this lock up (as it currently does) or rather continue to run
> normally after the RT task terminated? BTW, I'm still not sure if we
> are hunting shadows (is IRQs off a legal state for user space in
> some skin?) or a real problem - i.e. whether it is worth the time.

IRQs off in user space - aside from the particular semantics
introduced by interrupt shielding - is not a correct state, but it is
for kernel-based RT threads, so I would expect the real-time core to
be robust wrt this kind of situation. I'm going to put this issue on
my work queue anyway; I don't like unexplained software thingies
getting too close to the Twilight Zone...

--
Philippe.
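The SIGXCPU handler mentioned above pairs with the T_WARNSW bit
already set in the test program's rt_task_spawn() call: with T_WARNSW,
the nucleus notifies a shadow task via SIGXCPU whenever it migrates to
secondary mode. A sketch of that detection scaffolding (the handler
body is illustrative, not Jan's actual code):

    #include <signal.h>
    #include <stdio.h>

    static void warn_upon_switch(int sig)
    {
            /* printf() is not async-signal-safe - acceptable for a
             * debugging aid, not for production code */
            printf("unexpected switch to secondary mode\n");
    }

    /* Call once from main() before spawning tasks with T_WARNSW: */
    static void install_mode_switch_probe(void)
    {
            signal(SIGXCPU, warn_upon_switch);
    }

Since this never fired in Jan's runs, the test task apparently stayed
in primary mode until termination, which is what makes the leaked
stall state interesting.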
Re: [Xenomai-core] [BUG?] stalled xeno domain
Jan Kiszka wrote:
> Jan Kiszka wrote:
>> Hi Philippe,
>>
>> debugging is nice, tracing is still nicer. As you suggested, I
>> extended the tracer with per-domain stall flags (needs some output
>> clean-up, preliminary patch attached nevertheless). And here is the
>> result (full trace attached):
>>
>> [trace snipped]
>>
>> The '*' marks a stalled domain, root domain on the right. It seems
>> to me that xnpod_suspend_thread() leaves the xeno domain stalled on
>> wake-up. This gets recovered somehow later. But here we see a
>> different effect which finally causes the tick to get [...]
[Xenomai-core] [BUG?] stalled xeno domain
Hi Philippe,

debugging is nice, tracing is still nicer. As you suggested, I
extended the tracer with per-domain stall flags (needs some output
clean-up, preliminary patch attached nevertheless). And here is the
result (full trace attached):

:|    (0x51)  0x00c8  -1113+   1.112  __ipipe_sync_stage+0x142 (ipipe_suspend_domain+0x56)
:|    fn              -1112+   1.789  __ipipe_sync_stage+0xe (ipipe_suspend_domain+0x56)
:|   *(0x50)  0x0064  -1110+   1.293  __ipipe_sync_stage+0x97 (ipipe_suspend_domain+0x56)
:    *fn              -1109+   1.398  do_IRQ+0x8 (__ipipe_sync_stage+0xcf)
:    *fn              -1107+   2.105  __do_IRQ+0xc (do_IRQ+0x21)
:    *fn              -1105+   1.631  handle_IRQ_event+0xd (__do_IRQ+0x9e)
:    *fn              -1104+   2.413  timer_interrupt+0x9 (handle_IRQ_event+0x40)
:    *fn              -1101+   3.022  mark_offset_tsc+0xe (timer_interrupt+0x31)
:    *fn              -1098+   2.721  do_timer+0x9 (timer_interrupt+0x37)
:|   *fn              -1096+   2.112  __ipipe_handle_irq+0xe (common_interrupt+0x18)
:|   *fn              -1093+   1.210  __ipipe_ack_common_irq+0xc (__ipipe_handle_irq+0xc0)
:|   *(0x50)  0x0064  -1092+   1.218  __ipipe_ack_common_irq+0x31 (__ipipe_handle_irq+0xc0)
:|   *fn              -1091+   1.834  mask_and_ack_8259A+0xb (__ipipe_ack_common_irq+0x5d)
:|   *(0x50)  0x0064  -1089+   1.345  __ipipe_ack_common_irq+0x9d (__ipipe_handle_irq+0xc0)
:|   *fn              -1088    0.924  ipipe_suspend_domain+0xb (__ipipe_handle_irq+0x174)
:|   *(0x51)  0x00c8  -1087+   1.172  ipipe_suspend_domain+0x26 (__ipipe_handle_irq+0x174)
:|   *fn              -1086+   2.030  __ipipe_sync_stage+0xe (ipipe_suspend_domain+0x56)
:|  **(0x50)  0x00c8  -1084+   1.330  __ipipe_sync_stage+0x97 (ipipe_suspend_domain+0x56)
:|  **fn              -1082    0.879  xnintr_clock_handler+0x8 (__ipipe_sync_stage+0x12b)
:|  **fn              -1082+   1.218  xnintr_irq_handler+0xb (xnintr_clock_handler+0x18)
:|  **fn              -1080+   1.233  xnpod_announce_tick+0x9 (xnintr_irq_handler+0x2a)
:|  **(0x50)  0x00c8  -1079+   1.736  xnpod_announce_tick+0x20 (xnintr_irq_handler+0x2a)
:|  **fn              -1077+   2.984  xntimer_do_tick_aperiodic+0xe (xnpod_announce_tick+0x36)
:|  **fn              -1074+   2.751  xnthread_timeout_handler+0x8 (xntimer_do_tick_aperiodic+0x18d)
:|  **fn              -1072+   1.864  xnpod_resume_thread+0xe (xnthread_timeout_handler+0x1d)
:|  **(0x50)  0x00c8  -1070+   4.699  xnpod_resume_thread+0x2b (xnthread_timeout_handler+0x1d)
:|  **(0x50)  0x00c8  -1065+   1.586  xnpod_resume_thread+0x2bf (xnthread_timeout_handler+0x1d)
:|  **(0x01)  0x1237  -1063+   2.661  xntimer_do_tick_aperiodic+0x309 (xnpod_announce_tick+0x36)
:|  **(0x50)  0x00c8  -1061+   1.263  xnpod_announce_tick+0x4f (xnintr_irq_handler+0x2a)
:|  **fn              -1060+   1.255  rthal_irq_end+0x8 (xnintr_irq_handler+0x46)
:|  **fn              -1058+   2.007  enable_8259A_irq+0x9 (rthal_irq_end+0x2e)
:|  **fn              -1056+   1.924  xnpod_schedule+0xe (xnintr_irq_handler+0x6e)
:|  **(0x50)  0x00c8  -1054!  15.368  xnpod_schedule+0x53 (xnintr_irq_handler+0x6e)
:|  **fn              -1039!  13.300  __switch_to+0xc (xnpod_schedule+0x689)
:|  **(0x50)  0x00c8  -1026+   1.766  xnpod_schedule+0x951 (xnpod_suspend_thread+0x27c)
:|  **(0x50)  0x00c8  -1024!  18.195  xnpod_suspend_thread+0x2bb (rt_task_sleep+0x54)
:   **fn              -1006+   1.624  __ipipe_syscall_root+0x9 (system_call+0x20)
:   **fn              -1004+   1.406  __ipipe_dispatch_event+0xe (__ipipe_syscall_root+0x58)
:   **fn              -1003+   4.323  hisyscall_event+0xe (__ipipe_dispatch_event+0x5e)
:   **fn               -998+   1.398  __rt_task_sleep+0xa (hisyscall_event+0x23c)
:   **fn               -997+   1.789  __copy_from_user_ll+0xa (__rt_task_sleep+0x1a)
:   **fn               -995!  15.345  rt_task_sleep+0xa (__rt_task_sleep+0x25)
:   **fn               -980    0.879  __ipipe_syscall_root+0x9 (system_call+0x20)
:   **fn               -979+   1.308  __ipipe_dispatch_event+0xe (__ipipe_syscall_root+0x58)
:   **fn               -978+   2.451  hisyscall_event+0xe (__ipipe_dispatch_event+0x5e)
:   **fn               -975+   1.631  sys_rtdm_ioctl+0x8 (hisyscall_event+0x23c)
:   **fn               -974+   1.255  _rtdm_ioctl+0xc (sys_rtdm_ioctl+0x1b)
:   **fn               -972+   4.180  rtdm_context_get+0xa (_rtdm_ioctl+0x20)
:   **fn               -968+   1.203  __ipipe_syscall_root+0x9 (system_call+0x20)
:   **fn               -967+   1.345  __ipipe_dispatch_event+0xe (__ipipe_syscall_root+0x58)

The '*' marks a stalled domain, root domain on the right. It seems to
me that xnpod_suspend_thread() leaves the xeno domain stalled on
wake-up. This gets recovered somehow later. But here we see a
different effect which finally causes the tick to get [...]
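As a reading aid, here is one line of the trace decoded. The '*'
columns are defined by Jan's patch as described above; the remaining
fields follow the stock I-pipe tracer conventions as far as they can
be inferred from this output, so treat the annotations as a best guess
rather than a format specification:

    /*
     * :| **(0x50) 0x00c8 -1054! 15.368 xnpod_schedule+0x53 (xnintr_irq_handler+0x6e)
     *
     * '|'     hardware IRQs are off at this trace point
     * '**'    both domains stalled - Xenomai left, root/Linux right
     * (0x50)  custom trace event, with 0x00c8 as its argument
     * -1054   timestamp in microseconds before the trace was frozen
     * '!'/'+' the delay to the next event (15.368 us here) exceeded
     *         the tracer's highlight thresholds
     * rest    traced function and, in parentheses, its caller
     */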
Re: [Xenomai-core] [BUG?] stalled xeno domain
Jan Kiszka wrote:
> Hi Philippe,
>
> debugging is nice, tracing is still nicer. As you suggested, I
> extended the tracer with per-domain stall flags (needs some output
> clean-up, preliminary patch attached nevertheless). And here is the
> result (full trace attached):
>
> [trace snipped]
>
> The '*' marks a stalled domain, root domain on the right. It seems
> to me that xnpod_suspend_thread() leaves the xeno domain stalled on
> wake-up. This gets recovered [...]