Re: [Xenomai-core] [BUG?] stalled xeno domain

2006-04-17 Thread Philippe Gerum

Jan Kiszka wrote:

Philippe Gerum wrote:


Jan Kiszka wrote:


Philippe Gerum wrote:



Jan Kiszka wrote:



Philippe, do you see any remaining issues, e.g. that the leak
survived
the task termination? Does this have any meaning for correct
driver and
skin code?



The only way I could see this leakage survive a switch transition
would
require it to happen over the root context, not over a primary
context.
Was it the case?




The task had to leave from primary mode. If I forced it to secondary
before terminating, the problem did not show up.



But could the code causing the leakage have been run by different
contexts in sequence, including the root one?




I don't think so. Bugs in our software aside, there should be no switch
to secondary mode until termination. Moreover, we installed a SIGXCPU
handler, and that one didn't trigger either.


I just constructed a simple test by placing rthal_local_irq_disable() in
rt_timer_spin and setting up this user space app:

#include <stdio.h>
#include <signal.h>
#include <native/task.h>
#include <native/timer.h>

RT_TASK task;

void func(void *arg)
{
    rt_timer_spin(0);
}


void terminate(int sig)
{
    printf("joining...\n");
    rt_task_join(&task);
    rt_task_delete(&task);
    printf("done\n");
}


int main()
{
    signal(SIGINT, terminate);
    rt_task_spawn(&task, "lockup", 0, 10, T_FPU | T_JOINABLE | T_WARNSW,
                  func, NULL);
    pause();
    return 0;
}
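
For reference, a minimal sketch of the kernel-side change described above -
leaking a hard IRQ-off state from the native skin's rt_timer_spin() - might
look as follows; the function body is an illustrative assumption and does
not reproduce the actual ksrc/skins/native code:

/* Sketch only: an unbalanced hard-IRQ disable injected into the busy-wait.
 * rthal_local_irq_disable() masks hardware IRQs without saving the previous
 * state, which is the kind of leak the test is probing for. */
void rt_timer_spin(RTIME ns)
{
    RTIME end = rt_timer_tsc() + rt_timer_ns2tsc(ns);

    rthal_local_irq_disable();  /* deliberately never re-enabled here */

    while (rt_timer_tsc() < end)
        cpu_relax();
}

The user-space program itself would presumably be built like any other
native-skin demo, e.g. gcc lockup.c $(xeno-config --xeno-cflags)
$(xeno-config --xeno-ldflags) -lnative (file name and flags assumed; they
depend on the installation).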


Should this lock up (as it currently does), or should it rather continue
to run normally after the RT task has terminated? BTW, I'm still not sure
whether we are hunting shadows (is IRQs-off a legal state for user space
in some skin?) or chasing a real problem - i.e. whether it is worth the time.



I've just tested this frag against the current SVN head, patching
rt_timer_spin() as required, and cannot reproduce the lockup. As



Are you sure that you actually used the modified native skin for the test?



Yep, checked twice.




expected, the incoming root thread reinstates the correct stall bit
(i.e. clears it) after the RT thread terminates. Any chance some
potentially troublesome stuff exists in your setup?




I just re-verified this behaviour on a slightly different setup (still
2.6.15-ipipe-1.2-02, xeno trunk), and I'm going to try this on a third
box with 2.6.16+tracing soon. So far I still have a stuck timer IRQ
after the test.

Jan




--

Philippe.



Re: [Xenomai-core] [BUG?] stalled xeno domain

2006-04-16 Thread Philippe Gerum

Jan Kiszka wrote:

Philippe Gerum wrote:


Jan Kiszka wrote:


Philippe, do you see any remaining issues, e.g. that the leak survived
the task termination? Does this have any meaning for correct driver and
skin code?



The only way I could see this leakage survive a switch transition would
require it to happen over the root context, not over a primary context.
Was it the case?




The task had to leave from primary mode. If I forced it to secondary
before terminating, the problem did not show up.



But could the code causing the leakage have been run by different
contexts in sequence, including the root one?




I don't think so. Bugs in our software aside, there should be no switch
to secondary mode until termination. Moreover, we installed a SIGXCPU
handler, and that one didn't trigger either.


I just constructed a simple test by placing rthal_local_irq_disable() in
rt_timer_spin and setting up this user space app:

#include <stdio.h>
#include <signal.h>
#include <native/task.h>
#include <native/timer.h>

RT_TASK task;

void func(void *arg)
{
    rt_timer_spin(0);
}


void terminate(int sig)
{
    printf("joining...\n");
    rt_task_join(&task);
    rt_task_delete(&task);
    printf("done\n");
}


int main()
{
    signal(SIGINT, terminate);
    rt_task_spawn(&task, "lockup", 0, 10, T_FPU | T_JOINABLE | T_WARNSW,
                  func, NULL);
    pause();
    return 0;
}


Should this lock up (as it currently does), or should it rather continue
to run normally after the RT task has terminated? BTW, I'm still not sure
whether we are hunting shadows (is IRQs-off a legal state for user space
in some skin?) or chasing a real problem - i.e. whether it is worth the time.



I've just tested this frag against the current SVN head, patching 
rt_timer_spin() as required, and cannot reproduce the lockup. As 
expected, the incoming root thread reinstates the correct stall bit 
(i.e. clears it) after the RT thread terminates. Any chance some 
potentially troublesome stuff exists in your setup?


--

Philippe.



Re: [Xenomai-core] [BUG?] stalled xeno domain

2006-04-10 Thread Philippe Gerum

Jan Kiszka wrote:

Jan Kiszka wrote:


Jan Kiszka wrote:


Hi Philippe,

debugging is nice, tracing is still nicer. As you suggested, I extended
the tracer with per-domain stall flags (needs some output clean-up,
preliminary patch attached nevertheless).

And here is the result (full trace attached):



:|(0x51) 0x00c8 -1113+   1.112  __ipipe_sync_stage+0x142 
(ipipe_suspend_domain+0x56)
:|fn-1112+   1.789  __ipipe_sync_stage+0xe 
(ipipe_suspend_domain+0x56)
:|   *(0x50) 0x0064 -1110+   1.293  __ipipe_sync_stage+0x97 
(ipipe_suspend_domain+0x56)
:*fn-1109+   1.398  do_IRQ+0x8 (__ipipe_sync_stage+0xcf)
:*fn-1107+   2.105  __do_IRQ+0xc (do_IRQ+0x21)
:*fn-1105+   1.631  handle_IRQ_event+0xd (__do_IRQ+0x9e)
:*fn-1104+   2.413  timer_interrupt+0x9 
(handle_IRQ_event+0x40)
:*fn-1101+   3.022  mark_offset_tsc+0xe 
(timer_interrupt+0x31)
:*fn-1098+   2.721  do_timer+0x9 (timer_interrupt+0x37)
:|   *fn-1096+   2.112  __ipipe_handle_irq+0xe 
(common_interrupt+0x18)
:|   *fn-1093+   1.210  __ipipe_ack_common_irq+0xc 
(__ipipe_handle_irq+0xc0)
:|   *(0x50) 0x0064 -1092+   1.218  __ipipe_ack_common_irq+0x31 
(__ipipe_handle_irq+0xc0)
:|   *fn-1091+   1.834  mask_and_ack_8259A+0xb 
(__ipipe_ack_common_irq+0x5d)
:|   *(0x50) 0x0064 -1089+   1.345  __ipipe_ack_common_irq+0x9d 
(__ipipe_handle_irq+0xc0)
:|   *fn-10880.924  ipipe_suspend_domain+0xb 
(__ipipe_handle_irq+0x174)
:|   *(0x51) 0x00c8 -1087+   1.172  ipipe_suspend_domain+0x26 
(__ipipe_handle_irq+0x174)
:|   *fn-1086+   2.030  __ipipe_sync_stage+0xe 
(ipipe_suspend_domain+0x56)
:|  **(0x50) 0x00c8 -1084+   1.330  __ipipe_sync_stage+0x97 
(ipipe_suspend_domain+0x56)
:|  **fn-10820.879  xnintr_clock_handler+0x8 
(__ipipe_sync_stage+0x12b)
:|  **fn-1082+   1.218  xnintr_irq_handler+0xb 
(xnintr_clock_handler+0x18)
:|  **fn-1080+   1.233  xnpod_announce_tick+0x9 
(xnintr_irq_handler+0x2a)
:|  **(0x50) 0x00c8 -1079+   1.736  xnpod_announce_tick+0x20 
(xnintr_irq_handler+0x2a)
:|  **fn-1077+   2.984  xntimer_do_tick_aperiodic+0xe 
(xnpod_announce_tick+0x36)
:|  **fn-1074+   2.751  xnthread_timeout_handler+0x8 
(xntimer_do_tick_aperiodic+0x18d)
:|  **fn-1072+   1.864  xnpod_resume_thread+0xe 
(xnthread_timeout_handler+0x1d)
:|  **(0x50) 0x00c8 -1070+   4.699  xnpod_resume_thread+0x2b 
(xnthread_timeout_handler+0x1d)
:|  **(0x50) 0x00c8 -1065+   1.586  xnpod_resume_thread+0x2bf 
(xnthread_timeout_handler+0x1d)
:|  **(0x01) 0x1237 -1063+   2.661  xntimer_do_tick_aperiodic+0x309 
(xnpod_announce_tick+0x36)
:|  **(0x50) 0x00c8 -1061+   1.263  xnpod_announce_tick+0x4f 
(xnintr_irq_handler+0x2a)
:|  **fn-1060+   1.255  rthal_irq_end+0x8 
(xnintr_irq_handler+0x46)
:|  **fn-1058+   2.007  enable_8259A_irq+0x9 
(rthal_irq_end+0x2e)
:|  **fn-1056+   1.924  xnpod_schedule+0xe 
(xnintr_irq_handler+0x6e)
:|  **(0x50) 0x00c8 -1054!  15.368  xnpod_schedule+0x53 
(xnintr_irq_handler+0x6e)
:|  **fn-1039!  13.300  __switch_to+0xc (xnpod_schedule+0x689)
:|  **(0x50) 0x00c8 -1026+   1.766  xnpod_schedule+0x951 
(xnpod_suspend_thread+0x27c)
:|  **(0x50) 0x00c8 -1024!  18.195  xnpod_suspend_thread+0x2bb 
(rt_task_sleep+0x54)
:   **fn-1006+   1.624  __ipipe_syscall_root+0x9 
(system_call+0x20)
:   **fn-1004+   1.406  __ipipe_dispatch_event+0xe 
(__ipipe_syscall_root+0x58)
:   **fn-1003+   4.323  hisyscall_event+0xe 
(__ipipe_dispatch_event+0x5e)
:   **fn -998+   1.398  __rt_task_sleep+0xa 
(hisyscall_event+0x23c)
:   **fn -997+   1.789  __copy_from_user_ll+0xa 
(__rt_task_sleep+0x1a)
:   **fn -995!  15.345  rt_task_sleep+0xa (__rt_task_sleep+0x25)
:   **fn -9800.879  __ipipe_syscall_root+0x9 
(system_call+0x20)
:   **fn -979+   1.308  __ipipe_dispatch_event+0xe 
(__ipipe_syscall_root+0x58)
:   **fn -978+   2.451  hisyscall_event+0xe 
(__ipipe_dispatch_event+0x5e)
:   **fn -975+   1.631  sys_rtdm_ioctl+0x8 
(hisyscall_event+0x23c)
:   **fn -974+   1.255  _rtdm_ioctl+0xc (sys_rtdm_ioctl+0x1b)
:   **fn -972+   4.180  rtdm_context_get+0xa (_rtdm_ioctl+0x20)
:   **fn -968+   1.203  __ipipe_syscall_root+0x9 
(system_call+0x20)
:   **fn -967+   1.345  __ipipe_dispatch_event+0xe 
(__ipipe_syscall_root+0x58)


The '*' characters mark a stalled domain, with the root domain in the
rightmost column. It seems to me that xnpod_suspend_thread() leaves the
xeno domain stalled on wake-up. This gets recovered somehow later.
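
(Side note on reading the columns: each '*' means that the corresponding
domain currently has its stall bit set, so interrupts for that domain are
only logged, not delivered. A conceptual sketch of what sets and clears such
a bit, assuming the usual I-pipe primitives and the rthal_domain descriptor -
illustrative only, not the code path traced above:

    ipipe_stall_pipeline_from(&rthal_domain);   /* set the stall bit: IRQs
                                                   for this domain are only
                                                   logged from here on */
    /* ... critical section ... */
    ipipe_unstall_pipeline_from(&rthal_domain); /* clear it and sync any
                                                   interrupts logged meanwhile */

A bit that remains set after the critical section has been left is the kind
of leak discussed in this thread.)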

But here we see a different effect which finally 

Re: [Xenomai-core] [BUG?] stalled xeno domain

2006-04-10 Thread Jan Kiszka
Philippe Gerum wrote:
 Jan Kiszka wrote:
 Jan Kiszka wrote:

 Jan Kiszka wrote:

 Hi Philippe,

 debugging is nice, tracing is still nicer. As you suggested, I extended
 the tracer with per-domain stall flags (needs some output clean-up,
 preliminary patch attached nevertheless).

 And here is the result (full trace attached):


 :|(0x51) 0x00c8 -1113+   1.112  __ipipe_sync_stage+0x142
 (ipipe_suspend_domain+0x56)
 :|fn-1112+   1.789  __ipipe_sync_stage+0xe
 (ipipe_suspend_domain+0x56)
 :|   *(0x50) 0x0064 -1110+   1.293  __ipipe_sync_stage+0x97
 (ipipe_suspend_domain+0x56)
 :*fn-1109+   1.398  do_IRQ+0x8
 (__ipipe_sync_stage+0xcf)
 :*fn-1107+   2.105  __do_IRQ+0xc (do_IRQ+0x21)
 :*fn-1105+   1.631  handle_IRQ_event+0xd
 (__do_IRQ+0x9e)
 :*fn-1104+   2.413  timer_interrupt+0x9
 (handle_IRQ_event+0x40)
 :*fn-1101+   3.022  mark_offset_tsc+0xe
 (timer_interrupt+0x31)
 :*fn-1098+   2.721  do_timer+0x9
 (timer_interrupt+0x37)
 :|   *fn-1096+   2.112  __ipipe_handle_irq+0xe
 (common_interrupt+0x18)
 :|   *fn-1093+   1.210  __ipipe_ack_common_irq+0xc
 (__ipipe_handle_irq+0xc0)
 :|   *(0x50) 0x0064 -1092+   1.218  __ipipe_ack_common_irq+0x31
 (__ipipe_handle_irq+0xc0)
 :|   *fn-1091+   1.834  mask_and_ack_8259A+0xb
 (__ipipe_ack_common_irq+0x5d)
 :|   *(0x50) 0x0064 -1089+   1.345  __ipipe_ack_common_irq+0x9d
 (__ipipe_handle_irq+0xc0)
 :|   *fn-10880.924  ipipe_suspend_domain+0xb
 (__ipipe_handle_irq+0x174)
 :|   *(0x51) 0x00c8 -1087+   1.172  ipipe_suspend_domain+0x26
 (__ipipe_handle_irq+0x174)
 :|   *fn-1086+   2.030  __ipipe_sync_stage+0xe
 (ipipe_suspend_domain+0x56)
 :|  **(0x50) 0x00c8 -1084+   1.330  __ipipe_sync_stage+0x97
 (ipipe_suspend_domain+0x56)
 :|  **fn-10820.879  xnintr_clock_handler+0x8
 (__ipipe_sync_stage+0x12b)
 :|  **fn-1082+   1.218  xnintr_irq_handler+0xb
 (xnintr_clock_handler+0x18)
 :|  **fn-1080+   1.233  xnpod_announce_tick+0x9
 (xnintr_irq_handler+0x2a)
 :|  **(0x50) 0x00c8 -1079+   1.736  xnpod_announce_tick+0x20
 (xnintr_irq_handler+0x2a)
 :|  **fn-1077+   2.984 
 xntimer_do_tick_aperiodic+0xe (xnpod_announce_tick+0x36)
 :|  **fn-1074+   2.751 
 xnthread_timeout_handler+0x8 (xntimer_do_tick_aperiodic+0x18d)
 :|  **fn-1072+   1.864  xnpod_resume_thread+0xe
 (xnthread_timeout_handler+0x1d)
 :|  **(0x50) 0x00c8 -1070+   4.699  xnpod_resume_thread+0x2b
 (xnthread_timeout_handler+0x1d)
 :|  **(0x50) 0x00c8 -1065+   1.586  xnpod_resume_thread+0x2bf
 (xnthread_timeout_handler+0x1d)
 :|  **(0x01) 0x1237 -1063+   2.661 
 xntimer_do_tick_aperiodic+0x309 (xnpod_announce_tick+0x36)
 :|  **(0x50) 0x00c8 -1061+   1.263  xnpod_announce_tick+0x4f
 (xnintr_irq_handler+0x2a)
 :|  **fn-1060+   1.255  rthal_irq_end+0x8
 (xnintr_irq_handler+0x46)
 :|  **fn-1058+   2.007  enable_8259A_irq+0x9
 (rthal_irq_end+0x2e)
 :|  **fn-1056+   1.924  xnpod_schedule+0xe
 (xnintr_irq_handler+0x6e)
 :|  **(0x50) 0x00c8 -1054!  15.368  xnpod_schedule+0x53
 (xnintr_irq_handler+0x6e)
 :|  **fn-1039!  13.300  __switch_to+0xc
 (xnpod_schedule+0x689)
 :|  **(0x50) 0x00c8 -1026+   1.766  xnpod_schedule+0x951
 (xnpod_suspend_thread+0x27c)
 :|  **(0x50) 0x00c8 -1024!  18.195  xnpod_suspend_thread+0x2bb
 (rt_task_sleep+0x54)
 :   **fn-1006+   1.624  __ipipe_syscall_root+0x9
 (system_call+0x20)
 :   **fn-1004+   1.406  __ipipe_dispatch_event+0xe
 (__ipipe_syscall_root+0x58)
 :   **fn-1003+   4.323  hisyscall_event+0xe
 (__ipipe_dispatch_event+0x5e)
 :   **fn -998+   1.398  __rt_task_sleep+0xa
 (hisyscall_event+0x23c)
 :   **fn -997+   1.789  __copy_from_user_ll+0xa
 (__rt_task_sleep+0x1a)
 :   **fn -995!  15.345  rt_task_sleep+0xa
 (__rt_task_sleep+0x25)
 :   **fn -9800.879  __ipipe_syscall_root+0x9
 (system_call+0x20)
 :   **fn -979+   1.308  __ipipe_dispatch_event+0xe
 (__ipipe_syscall_root+0x58)
 :   **fn -978+   2.451  hisyscall_event+0xe
 (__ipipe_dispatch_event+0x5e)
 :   **fn -975+   1.631  sys_rtdm_ioctl+0x8
 (hisyscall_event+0x23c)
 :   **fn -974+   1.255  _rtdm_ioctl+0xc
 (sys_rtdm_ioctl+0x1b)
 :   **fn -972+   4.180  rtdm_context_get+0xa
 (_rtdm_ioctl+0x20)
 :   **fn -968+   1.203  __ipipe_syscall_root+0x9
 (system_call+0x20)
 :   **fn -967+   1.345  __ipipe_dispatch_event+0xe
 (__ipipe_syscall_root+0x58)

 The '*' characters mark a stalled domain, with the root domain in the
 rightmost column. It seems to me that xnpod_suspend_thread() leaves the
 xeno domain stalled on wake-up.
 

Re: [Xenomai-core] [BUG?] stalled xeno domain

2006-04-10 Thread Philippe Gerum

Jan Kiszka wrote:


Philippe, do you see any remaining issues, e.g. that the leak survived
the task termination? Does this have any meaning for correct driver and
skin code?



The only way I could see this leakage survive a switch transition would
require it to happen over the root context, not over a primary context.
Was it the case?




The task had to leave from primary mode. If I forced it to secondary
before terminating, the problem did not show up.



But could the code causing the leakage have been run by different
contexts in sequence, including the root one?


--

Philippe.



Re: [Xenomai-core] [BUG?] stalled xeno domain

2006-04-10 Thread Philippe Gerum

Jan Kiszka wrote:

Philippe Gerum wrote:


Jan Kiszka wrote:


Philippe, do you see any remaining issues, e.g. that the leak survived
the task termination? Does this have any meaning for correct driver and
skin code?



The only way I could see this leakage survive a switch transition would
require it to happen over the root context, not over a primary context.
Was it the case?




The task had to leave from primary mode. If I forced it to secondary
before terminating, the problem did not show up.



But could the code causing the leakage have been run by different
contexts in sequence, including the root one?




I don't think so. Bugs in our software aside, there should be no switch
to secondary mode until termination. Moreover, we installed a SIGXCPU
handler, and that one didn't trigger either.


I just constructed a simple test by placing rthal_local_irq_disable() in
rt_timer_spin and setting up this user space app:

#include <stdio.h>
#include <signal.h>
#include <native/task.h>
#include <native/timer.h>

RT_TASK task;

void func(void *arg)
{
    rt_timer_spin(0);
}


void terminate(int sig)
{
    printf("joining...\n");
    rt_task_join(&task);
    rt_task_delete(&task);
    printf("done\n");
}


int main()
{
    signal(SIGINT, terminate);
    rt_task_spawn(&task, "lockup", 0, 10, T_FPU | T_JOINABLE | T_WARNSW,
                  func, NULL);
    pause();
    return 0;
}


Should this lock up (as it currently does), or should it rather continue
to run normally after the RT task has terminated? BTW, I'm still not sure
whether we are hunting shadows (is IRQs-off a legal state for user space
in some skin?) or chasing a real problem - i.e. whether it is worth the time.



IRQs off in user space - aside from the particular semantics introduced by
the interrupt shielding - is not a correct state, but it is one for
kernel-based RT threads, so I would expect the real-time core to be robust
with respect to this kind of situation. I'm going to put this issue on my
work queue anyway; I don't like unexplained software thingies getting too
close to the Twilight Zone...
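
To make that distinction concrete, here is a hedged sketch of what a
kernel-based RT task may legitimately do - the helper name and body are
illustrative, the point being the balanced save/restore pair as opposed to
the unbalanced disable provoked by the test above:

void kernel_task_body(void *cookie)   /* hypothetical kernel-side task */
{
    unsigned long flags;

    rthal_local_irq_save(flags);      /* hard IRQs off, previous state kept */
    /* short, bounded critical section */
    rthal_local_irq_restore(flags);   /* balanced restore, nothing leaks */
}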


--

Philippe.



Re: [Xenomai-core] [BUG?] stalled xeno domain

2006-04-09 Thread Philippe Gerum

Jan Kiszka wrote:

Jan Kiszka wrote:


Hi Philippe,

debugging is nice, tracing is still nicer. As you suggested, I extended
the tracer with per-domain stall flags (needs some output clean-up,
preliminary patch attached nevertheless).

And here is the result (full trace attached):



:|(0x51) 0x00c8 -1113+   1.112  __ipipe_sync_stage+0x142 
(ipipe_suspend_domain+0x56)
:|fn-1112+   1.789  __ipipe_sync_stage+0xe 
(ipipe_suspend_domain+0x56)
:|   *(0x50) 0x0064 -1110+   1.293  __ipipe_sync_stage+0x97 
(ipipe_suspend_domain+0x56)
:*fn-1109+   1.398  do_IRQ+0x8 (__ipipe_sync_stage+0xcf)
:*fn-1107+   2.105  __do_IRQ+0xc (do_IRQ+0x21)
:*fn-1105+   1.631  handle_IRQ_event+0xd (__do_IRQ+0x9e)
:*fn-1104+   2.413  timer_interrupt+0x9 
(handle_IRQ_event+0x40)
:*fn-1101+   3.022  mark_offset_tsc+0xe 
(timer_interrupt+0x31)
:*fn-1098+   2.721  do_timer+0x9 (timer_interrupt+0x37)
:|   *fn-1096+   2.112  __ipipe_handle_irq+0xe 
(common_interrupt+0x18)
:|   *fn-1093+   1.210  __ipipe_ack_common_irq+0xc 
(__ipipe_handle_irq+0xc0)
:|   *(0x50) 0x0064 -1092+   1.218  __ipipe_ack_common_irq+0x31 
(__ipipe_handle_irq+0xc0)
:|   *fn-1091+   1.834  mask_and_ack_8259A+0xb 
(__ipipe_ack_common_irq+0x5d)
:|   *(0x50) 0x0064 -1089+   1.345  __ipipe_ack_common_irq+0x9d 
(__ipipe_handle_irq+0xc0)
:|   *fn-10880.924  ipipe_suspend_domain+0xb 
(__ipipe_handle_irq+0x174)
:|   *(0x51) 0x00c8 -1087+   1.172  ipipe_suspend_domain+0x26 
(__ipipe_handle_irq+0x174)
:|   *fn-1086+   2.030  __ipipe_sync_stage+0xe 
(ipipe_suspend_domain+0x56)
:|  **(0x50) 0x00c8 -1084+   1.330  __ipipe_sync_stage+0x97 
(ipipe_suspend_domain+0x56)
:|  **fn-10820.879  xnintr_clock_handler+0x8 
(__ipipe_sync_stage+0x12b)
:|  **fn-1082+   1.218  xnintr_irq_handler+0xb 
(xnintr_clock_handler+0x18)
:|  **fn-1080+   1.233  xnpod_announce_tick+0x9 
(xnintr_irq_handler+0x2a)
:|  **(0x50) 0x00c8 -1079+   1.736  xnpod_announce_tick+0x20 
(xnintr_irq_handler+0x2a)
:|  **fn-1077+   2.984  xntimer_do_tick_aperiodic+0xe 
(xnpod_announce_tick+0x36)
:|  **fn-1074+   2.751  xnthread_timeout_handler+0x8 
(xntimer_do_tick_aperiodic+0x18d)
:|  **fn-1072+   1.864  xnpod_resume_thread+0xe 
(xnthread_timeout_handler+0x1d)
:|  **(0x50) 0x00c8 -1070+   4.699  xnpod_resume_thread+0x2b 
(xnthread_timeout_handler+0x1d)
:|  **(0x50) 0x00c8 -1065+   1.586  xnpod_resume_thread+0x2bf 
(xnthread_timeout_handler+0x1d)
:|  **(0x01) 0x1237 -1063+   2.661  xntimer_do_tick_aperiodic+0x309 
(xnpod_announce_tick+0x36)
:|  **(0x50) 0x00c8 -1061+   1.263  xnpod_announce_tick+0x4f 
(xnintr_irq_handler+0x2a)
:|  **fn-1060+   1.255  rthal_irq_end+0x8 
(xnintr_irq_handler+0x46)
:|  **fn-1058+   2.007  enable_8259A_irq+0x9 
(rthal_irq_end+0x2e)
:|  **fn-1056+   1.924  xnpod_schedule+0xe 
(xnintr_irq_handler+0x6e)
:|  **(0x50) 0x00c8 -1054!  15.368  xnpod_schedule+0x53 
(xnintr_irq_handler+0x6e)
:|  **fn-1039!  13.300  __switch_to+0xc (xnpod_schedule+0x689)
:|  **(0x50) 0x00c8 -1026+   1.766  xnpod_schedule+0x951 
(xnpod_suspend_thread+0x27c)
:|  **(0x50) 0x00c8 -1024!  18.195  xnpod_suspend_thread+0x2bb 
(rt_task_sleep+0x54)
:   **fn-1006+   1.624  __ipipe_syscall_root+0x9 
(system_call+0x20)
:   **fn-1004+   1.406  __ipipe_dispatch_event+0xe 
(__ipipe_syscall_root+0x58)
:   **fn-1003+   4.323  hisyscall_event+0xe 
(__ipipe_dispatch_event+0x5e)
:   **fn -998+   1.398  __rt_task_sleep+0xa 
(hisyscall_event+0x23c)
:   **fn -997+   1.789  __copy_from_user_ll+0xa 
(__rt_task_sleep+0x1a)
:   **fn -995!  15.345  rt_task_sleep+0xa (__rt_task_sleep+0x25)
:   **fn -9800.879  __ipipe_syscall_root+0x9 
(system_call+0x20)
:   **fn -979+   1.308  __ipipe_dispatch_event+0xe 
(__ipipe_syscall_root+0x58)
:   **fn -978+   2.451  hisyscall_event+0xe 
(__ipipe_dispatch_event+0x5e)
:   **fn -975+   1.631  sys_rtdm_ioctl+0x8 
(hisyscall_event+0x23c)
:   **fn -974+   1.255  _rtdm_ioctl+0xc (sys_rtdm_ioctl+0x1b)
:   **fn -972+   4.180  rtdm_context_get+0xa (_rtdm_ioctl+0x20)
:   **fn -968+   1.203  __ipipe_syscall_root+0x9 
(system_call+0x20)
:   **fn -967+   1.345  __ipipe_dispatch_event+0xe 
(__ipipe_syscall_root+0x58)


The '*' characters mark a stalled domain, with the root domain in the
rightmost column. It seems to me that xnpod_suspend_thread() leaves the
xeno domain stalled on wake-up. This gets recovered somehow later.

But here we see a different effect which finally causes the tick to get

[Xenomai-core] [BUG?] stalled xeno domain

2006-04-08 Thread Jan Kiszka
Hi Philippe,

debugging is nice, tracing is still nicer. As you suggested, I extended
the tracer with per-domain stall flags (needs some output clean-up,
preliminary patch attached nevertheless).
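
A rough idea of what sampling those per-domain flags at each trace point
could look like - the field names follow the 2.6.15-era I-pipe layout and
should be read as assumptions rather than as the actual patch:

    /* one stall bit per domain, in pipeline order, shown as '*' columns */
    unsigned long stall_bits = 0;
    struct ipipe_domain *ipd;
    int pos = 0;

    list_for_each_entry(ipd, &__ipipe_pipeline, p_link) {
        if (test_bit(IPIPE_STALL_FLAG,
                     &ipd->cpudata[ipipe_processor_id()].status))
            stall_bits |= 1UL << pos;
        pos++;
    }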

And here is the result (full trace attached):

 :|(0x51) 0x00c8 -1113+   1.112  __ipipe_sync_stage+0x142 
 (ipipe_suspend_domain+0x56)
 :|fn-1112+   1.789  __ipipe_sync_stage+0xe 
 (ipipe_suspend_domain+0x56)
 :|   *(0x50) 0x0064 -1110+   1.293  __ipipe_sync_stage+0x97 
 (ipipe_suspend_domain+0x56)
 :*fn-1109+   1.398  do_IRQ+0x8 (__ipipe_sync_stage+0xcf)
 :*fn-1107+   2.105  __do_IRQ+0xc (do_IRQ+0x21)
 :*fn-1105+   1.631  handle_IRQ_event+0xd (__do_IRQ+0x9e)
 :*fn-1104+   2.413  timer_interrupt+0x9 
 (handle_IRQ_event+0x40)
 :*fn-1101+   3.022  mark_offset_tsc+0xe 
 (timer_interrupt+0x31)
 :*fn-1098+   2.721  do_timer+0x9 (timer_interrupt+0x37)
 :|   *fn-1096+   2.112  __ipipe_handle_irq+0xe 
 (common_interrupt+0x18)
 :|   *fn-1093+   1.210  __ipipe_ack_common_irq+0xc 
 (__ipipe_handle_irq+0xc0)
 :|   *(0x50) 0x0064 -1092+   1.218  __ipipe_ack_common_irq+0x31 
 (__ipipe_handle_irq+0xc0)
 :|   *fn-1091+   1.834  mask_and_ack_8259A+0xb 
 (__ipipe_ack_common_irq+0x5d)
 :|   *(0x50) 0x0064 -1089+   1.345  __ipipe_ack_common_irq+0x9d 
 (__ipipe_handle_irq+0xc0)
 :|   *fn-10880.924  ipipe_suspend_domain+0xb 
 (__ipipe_handle_irq+0x174)
 :|   *(0x51) 0x00c8 -1087+   1.172  ipipe_suspend_domain+0x26 
 (__ipipe_handle_irq+0x174)
 :|   *fn-1086+   2.030  __ipipe_sync_stage+0xe 
 (ipipe_suspend_domain+0x56)
 :|  **(0x50) 0x00c8 -1084+   1.330  __ipipe_sync_stage+0x97 
 (ipipe_suspend_domain+0x56)
 :|  **fn-10820.879  xnintr_clock_handler+0x8 
 (__ipipe_sync_stage+0x12b)
 :|  **fn-1082+   1.218  xnintr_irq_handler+0xb 
 (xnintr_clock_handler+0x18)
 :|  **fn-1080+   1.233  xnpod_announce_tick+0x9 
 (xnintr_irq_handler+0x2a)
 :|  **(0x50) 0x00c8 -1079+   1.736  xnpod_announce_tick+0x20 
 (xnintr_irq_handler+0x2a)
 :|  **fn-1077+   2.984  xntimer_do_tick_aperiodic+0xe 
 (xnpod_announce_tick+0x36)
 :|  **fn-1074+   2.751  xnthread_timeout_handler+0x8 
 (xntimer_do_tick_aperiodic+0x18d)
 :|  **fn-1072+   1.864  xnpod_resume_thread+0xe 
 (xnthread_timeout_handler+0x1d)
 :|  **(0x50) 0x00c8 -1070+   4.699  xnpod_resume_thread+0x2b 
 (xnthread_timeout_handler+0x1d)
 :|  **(0x50) 0x00c8 -1065+   1.586  xnpod_resume_thread+0x2bf 
 (xnthread_timeout_handler+0x1d)
 :|  **(0x01) 0x1237 -1063+   2.661  xntimer_do_tick_aperiodic+0x309 
 (xnpod_announce_tick+0x36)
 :|  **(0x50) 0x00c8 -1061+   1.263  xnpod_announce_tick+0x4f 
 (xnintr_irq_handler+0x2a)
 :|  **fn-1060+   1.255  rthal_irq_end+0x8 
 (xnintr_irq_handler+0x46)
 :|  **fn-1058+   2.007  enable_8259A_irq+0x9 
 (rthal_irq_end+0x2e)
 :|  **fn-1056+   1.924  xnpod_schedule+0xe 
 (xnintr_irq_handler+0x6e)
 :|  **(0x50) 0x00c8 -1054!  15.368  xnpod_schedule+0x53 
 (xnintr_irq_handler+0x6e)
 :|  **fn-1039!  13.300  __switch_to+0xc (xnpod_schedule+0x689)
 :|  **(0x50) 0x00c8 -1026+   1.766  xnpod_schedule+0x951 
 (xnpod_suspend_thread+0x27c)
 :|  **(0x50) 0x00c8 -1024!  18.195  xnpod_suspend_thread+0x2bb 
 (rt_task_sleep+0x54)
 :   **fn-1006+   1.624  __ipipe_syscall_root+0x9 
 (system_call+0x20)
 :   **fn-1004+   1.406  __ipipe_dispatch_event+0xe 
 (__ipipe_syscall_root+0x58)
 :   **fn-1003+   4.323  hisyscall_event+0xe 
 (__ipipe_dispatch_event+0x5e)
 :   **fn -998+   1.398  __rt_task_sleep+0xa 
 (hisyscall_event+0x23c)
 :   **fn -997+   1.789  __copy_from_user_ll+0xa 
 (__rt_task_sleep+0x1a)
 :   **fn -995!  15.345  rt_task_sleep+0xa 
 (__rt_task_sleep+0x25)
 :   **fn -9800.879  __ipipe_syscall_root+0x9 
 (system_call+0x20)
 :   **fn -979+   1.308  __ipipe_dispatch_event+0xe 
 (__ipipe_syscall_root+0x58)
 :   **fn -978+   2.451  hisyscall_event+0xe 
 (__ipipe_dispatch_event+0x5e)
 :   **fn -975+   1.631  sys_rtdm_ioctl+0x8 
 (hisyscall_event+0x23c)
 :   **fn -974+   1.255  _rtdm_ioctl+0xc (sys_rtdm_ioctl+0x1b)
 :   **fn -972+   4.180  rtdm_context_get+0xa 
 (_rtdm_ioctl+0x20)
 :   **fn -968+   1.203  __ipipe_syscall_root+0x9 
 (system_call+0x20)
 :   **fn -967+   1.345  __ipipe_dispatch_event+0xe 
 (__ipipe_syscall_root+0x58)

The '*' characters mark a stalled domain, with the root domain in the
rightmost column. It seems to me that xnpod_suspend_thread() leaves the
xeno domain stalled on wake-up. This gets recovered somehow later.

But here we see a 

Re: [Xenomai-core] [BUG?] stalled xeno domain

2006-04-08 Thread Jan Kiszka
Jan Kiszka wrote:
 Hi Philippe,
 
 debugging is nice, tracing is still nicer. As you suggested, I extended
 the tracer with per-domain stall flags (needs some output clean-up,
 preliminary patch attached nevertheless).
 
 And here is the result (full trace attached):
 
 :|(0x51) 0x00c8 -1113+   1.112  __ipipe_sync_stage+0x142 
 (ipipe_suspend_domain+0x56)
 :|fn-1112+   1.789  __ipipe_sync_stage+0xe 
 (ipipe_suspend_domain+0x56)
 :|   *(0x50) 0x0064 -1110+   1.293  __ipipe_sync_stage+0x97 
 (ipipe_suspend_domain+0x56)
 :*fn-1109+   1.398  do_IRQ+0x8 (__ipipe_sync_stage+0xcf)
 :*fn-1107+   2.105  __do_IRQ+0xc (do_IRQ+0x21)
 :*fn-1105+   1.631  handle_IRQ_event+0xd (__do_IRQ+0x9e)
 :*fn-1104+   2.413  timer_interrupt+0x9 
 (handle_IRQ_event+0x40)
 :*fn-1101+   3.022  mark_offset_tsc+0xe 
 (timer_interrupt+0x31)
 :*fn-1098+   2.721  do_timer+0x9 (timer_interrupt+0x37)
 :|   *fn-1096+   2.112  __ipipe_handle_irq+0xe 
 (common_interrupt+0x18)
 :|   *fn-1093+   1.210  __ipipe_ack_common_irq+0xc 
 (__ipipe_handle_irq+0xc0)
 :|   *(0x50) 0x0064 -1092+   1.218  __ipipe_ack_common_irq+0x31 
 (__ipipe_handle_irq+0xc0)
 :|   *fn-1091+   1.834  mask_and_ack_8259A+0xb 
 (__ipipe_ack_common_irq+0x5d)
 :|   *(0x50) 0x0064 -1089+   1.345  __ipipe_ack_common_irq+0x9d 
 (__ipipe_handle_irq+0xc0)
 :|   *fn-10880.924  ipipe_suspend_domain+0xb 
 (__ipipe_handle_irq+0x174)
 :|   *(0x51) 0x00c8 -1087+   1.172  ipipe_suspend_domain+0x26 
 (__ipipe_handle_irq+0x174)
 :|   *fn-1086+   2.030  __ipipe_sync_stage+0xe 
 (ipipe_suspend_domain+0x56)
 :|  **(0x50) 0x00c8 -1084+   1.330  __ipipe_sync_stage+0x97 
 (ipipe_suspend_domain+0x56)
 :|  **fn-10820.879  xnintr_clock_handler+0x8 
 (__ipipe_sync_stage+0x12b)
 :|  **fn-1082+   1.218  xnintr_irq_handler+0xb 
 (xnintr_clock_handler+0x18)
 :|  **fn-1080+   1.233  xnpod_announce_tick+0x9 
 (xnintr_irq_handler+0x2a)
 :|  **(0x50) 0x00c8 -1079+   1.736  xnpod_announce_tick+0x20 
 (xnintr_irq_handler+0x2a)
 :|  **fn-1077+   2.984  xntimer_do_tick_aperiodic+0xe 
 (xnpod_announce_tick+0x36)
 :|  **fn-1074+   2.751  xnthread_timeout_handler+0x8 
 (xntimer_do_tick_aperiodic+0x18d)
 :|  **fn-1072+   1.864  xnpod_resume_thread+0xe 
 (xnthread_timeout_handler+0x1d)
 :|  **(0x50) 0x00c8 -1070+   4.699  xnpod_resume_thread+0x2b 
 (xnthread_timeout_handler+0x1d)
 :|  **(0x50) 0x00c8 -1065+   1.586  xnpod_resume_thread+0x2bf 
 (xnthread_timeout_handler+0x1d)
 :|  **(0x01) 0x1237 -1063+   2.661  xntimer_do_tick_aperiodic+0x309 
 (xnpod_announce_tick+0x36)
 :|  **(0x50) 0x00c8 -1061+   1.263  xnpod_announce_tick+0x4f 
 (xnintr_irq_handler+0x2a)
 :|  **fn-1060+   1.255  rthal_irq_end+0x8 
 (xnintr_irq_handler+0x46)
 :|  **fn-1058+   2.007  enable_8259A_irq+0x9 
 (rthal_irq_end+0x2e)
 :|  **fn-1056+   1.924  xnpod_schedule+0xe 
 (xnintr_irq_handler+0x6e)
 :|  **(0x50) 0x00c8 -1054!  15.368  xnpod_schedule+0x53 
 (xnintr_irq_handler+0x6e)
 :|  **fn-1039!  13.300  __switch_to+0xc 
 (xnpod_schedule+0x689)
 :|  **(0x50) 0x00c8 -1026+   1.766  xnpod_schedule+0x951 
 (xnpod_suspend_thread+0x27c)
 :|  **(0x50) 0x00c8 -1024!  18.195  xnpod_suspend_thread+0x2bb 
 (rt_task_sleep+0x54)
 :   **fn-1006+   1.624  __ipipe_syscall_root+0x9 
 (system_call+0x20)
 :   **fn-1004+   1.406  __ipipe_dispatch_event+0xe 
 (__ipipe_syscall_root+0x58)
 :   **fn-1003+   4.323  hisyscall_event+0xe 
 (__ipipe_dispatch_event+0x5e)
 :   **fn -998+   1.398  __rt_task_sleep+0xa 
 (hisyscall_event+0x23c)
 :   **fn -997+   1.789  __copy_from_user_ll+0xa 
 (__rt_task_sleep+0x1a)
 :   **fn -995!  15.345  rt_task_sleep+0xa 
 (__rt_task_sleep+0x25)
 :   **fn -9800.879  __ipipe_syscall_root+0x9 
 (system_call+0x20)
 :   **fn -979+   1.308  __ipipe_dispatch_event+0xe 
 (__ipipe_syscall_root+0x58)
 :   **fn -978+   2.451  hisyscall_event+0xe 
 (__ipipe_dispatch_event+0x5e)
 :   **fn -975+   1.631  sys_rtdm_ioctl+0x8 
 (hisyscall_event+0x23c)
 :   **fn -974+   1.255  _rtdm_ioctl+0xc (sys_rtdm_ioctl+0x1b)
 :   **fn -972+   4.180  rtdm_context_get+0xa 
 (_rtdm_ioctl+0x20)
 :   **fn -968+   1.203  __ipipe_syscall_root+0x9 
 (system_call+0x20)
 :   **fn -967+   1.345  __ipipe_dispatch_event+0xe 
 (__ipipe_syscall_root+0x58)
 
 The '*' characters mark a stalled domain, with the root domain in the
 rightmost column. It seems to me that xnpod_suspend_thread() leaves the
 xeno domain stalled on wake-up.
 This gets recovered