Re: [Xenomai-core] Scheduling while atomic

2006-01-30 Thread Philippe Gerum

Jan Kiszka wrote:

Jan Kiszka wrote:


...
[Update] While writing this mail and letting your test run for a while,
I *did* get a hard lock-up. Hold on, digging deeper...




And here are its last words, spoken via serial console:

c31dfab0 0086 c30d1a90 c02a2500 c482a360 0001 0001 0020
   c012e564 0022  0246 c30d1a90 c4866ce0 0033
c482
   c482a360 c4866ca0  c48293a4 c48524e1  
0002
Call Trace:
 [] __ipipe_dispatch_event+0x56/0xdd
 [] e100_hw_init+0x3ad/0xa81 [e100]
 [] xnpod_suspend_thread+0x714/0x76d [xeno_nucleus]
 [] xnsynch_sleep_on+0x76d/0x7a7 [xeno_nucleus]
 [] rt_sem_p+0xa6/0x10a [xeno_native]
 [] __rt_sem_p+0x5d/0x66 [xeno_native]
 [] hisyscall_event+0x1cb/0x2d3 [xeno_nucleus]
 [] __ipipe_dispatch_event+0x56/0xdd
 [] __ipipe_syscall_root+0x53/0xbe
 [] system_call+0x20/0x41
Xenomai: fatal: blocked thread main[863] rescheduled?! (status=0x300082,
sig=0, prev=gatekeeper/0[809])
 CPU   PID  PRI  TIMEOUT  STAT      NAME
   0     0   30        0  00500080  ROOT
   0   864   30        0  00300180  task0
   0   865   29        0  00300288  task1
   0   863    1        0  00300082  main
Timer: oneshot [tickval=1 ns, elapsed=175144731477]

c31e1f14 c4860572 c3188000 c31dfab0 00300082 c02a2500 0286 c02a2500
   c030cbec c012e564 0022 c02a2500 c30d1a90 c30d1a90 0022
0001
   c02a2500 c30d1a90 c08e4623 0028 c31e1fa0 c0266ed5 f610
c030cd80
Call Trace:
 [] __ipipe_dispatch_event+0x56/0xdd
 [] schedule+0x3ef/0x5ed
 [] gatekeeper_thread+0x0/0x179 [xeno_nucleus]
 [] gatekeeper_thread+0x9a/0x179 [xeno_nucleus]
 [] default_wake_function+0x0/0x12
 [] kthread+0x68/0x95
 [] kthread+0x0/0x95
 [] kernel_thread_helper+0x5/0xb

Any bells already ringing?


Yes; the bad news is that this looks like the same bug as the one you reported
recently, which I only partially fixed, it seems. xnshadow_harden() is still not
working properly under certain preemption situations induced by CONFIG_PREEMPT,
and the hardening thread is likely being moved back to the Linux runqueue
unexpectedly while transitioning to Xenomai. The good news is that it's a
well-identified issue, at least...




Will try Gilles' patch now...

Jan








--

Philippe.




Re: [Xenomai-core] Scheduling while atomic

2006-01-19 Thread Jeroen Van den Keybus
Turned off:
  o  nucleus debugging
  o  ipipe stats
  o  ipipe tracing
  o  ishield
 
The problem persists. Also tried three tasks, with the same result. With one task, there is no problem.
 
 
Jeroen.



Re: [Xenomai-core] Scheduling while atomic

2006-01-19 Thread Jeroen Van den Keybus

Hold on. Just crashed without the file access: please disregard last post.
 
 
Jeroen.


[Xenomai-core] Scheduling while atomic

2006-01-19 Thread Jeroen Van den Keybus
Hm.
 
When I remove the output() from both tasks, all seems fine.
 
Jeroen. 


Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jeroen Van den Keybus
> # ./sem 1.0e5
> Running for 2.15 seconds.
> L119: Operation not permitted.

Could be a dirty cleanup. Sorry for that.

> ...ipipe_supend_domain
> ...default_idle
> ...cpu_idle
> ...start_kernel
> ...unknown_bootoption
> Kernel panic

start_kernel? unknown_bootoption? The plot is thickening.

> # ./mutex 1.0e5
> ALERT: No lock! (lockcnt=0) Offending task: task0
> ALERT: No lock! (lockcnt=0) Offending task: task0
> ALERT: No lock! (lockcnt=0) Offending task: task0
> [...]
> [freeze]

The problem is: freezing could be due to Linux getting no CPU time anymore. I think the best test now is to run the semaphore test (it should give no ALERTs) at a reasonable speed (1.0e6 - 1.0e7, which is on average 2 kHz - 200 Hz) for a longer duration. Maybe better 1.0e7, which will yield about 1.5 MB per hour.
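A quick cross-check of those figures (not from the original mail): assuming, as in the test sources quoted later in the thread, that each task sleeps a uniformly random time in [0, tmax] ns, the mean period per task is tmax/2.

/* rate_check.c -- back-of-the-envelope check, standard C only */
#include <stdio.h>

int main(void)
{
    const double tmax[] = { 1.0e6, 1.0e7 };   /* max sleep time in ns */
    int i;

    for (i = 0; i < 2; i++) {
        double hz = 1e9 / (tmax[i] / 2.0);            /* mean rate per task */
        double mb_per_hour = 2 * hz * 3600.0 / 1e6;   /* two tasks, ~1 byte each */
        printf("tmax=%.0e ns -> %.0f Hz per task, ~%.1f MB/hour\n",
               tmax[i], hz, mb_per_hour);
    }
    return 0;
}

This prints roughly 2 kHz (~14 MB/hour) for 1.0e6 and 200 Hz (~1.4 MB/hour) for 1.0e7, which matches the estimate above.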
Many thanks already for testing and helping us out.
 
Jeroen.


Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jeroen Van den Keybus
When testing, also check the dmesg logs please. When nucleus debugging is off, you get only a kernel warning 'scheduling while atomic'.
- Could you replace MAXLONG on line 119 with TM_INFINITE and rerun the sem test (to avoid printf'ing)? You can terminate with Ctrl+C. (The change is sketched below.)


- Could you try to run with tmax somewhat lower (to increase the load): e.g. ./sem 5.0e6 or so. Be careful not to starve Linux. I think 1.0e5 will be too low.
Crashes with me happen within the first 10 secs with 1.0e7.
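Assuming line 119 of the sem test is the rt_sem_p() wait in main(), analogous to the one visible in the TEST_MUTEX.C listing later in the thread, the requested change amounts to no more than:

    /* before: blocks for MAXLONG ns (~2.15 s), then CHECK() prints the timeout */
    CHECK(rt_sem_p(&s, MAXLONG));

    /* after: blocks until Ctrl+C makes sighandler() post the semaphore */
    CHECK(rt_sem_p(&s, TM_INFINITE));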
 
Thanks,
 
Jeroen.
 


Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Hannes Mayer

Jeroen Van den Keybus wrote:
- Could you replace MAXLONG on line 119 with TM_INFINITE and rerun the 
sem test (to avoid printf'ing) You can terminate with Ctrl+C.
- Could you try to run with tmax somewhat lower (to increase the load): 
e.g. ./sem 5.0e6 or so. Be careful not to starve Linux. I think 1.0e5 
will be too low.
 
Crashes with me happen within the first 10 secs with 1.0e7


On Xeno2.0, kernel 2.6.13.4:

# ./sem 1.0e5
Running for 2.15 seconds.
L119: Operation not permitted.

Kernel.log:
Unable to handle kernel NULL pointer dereference at virtual address 0114
[freeze]

# ./sem 1.0e4
Running for 2.15 seconds.

Kernel.log:
(sorry no serial console available yet)
...default_idle
...ipipe_supend_domain
...default_idle
...cpu_idle
...start_kernel
...unknown_bootoption
Kernel panic

# ./mutex 1.0e5
ALERT: No lock! (lockcnt=0) Offending task: task0
ALERT: No lock! (lockcnt=0) Offending task: task0
ALERT: No lock! (lockcnt=0) Offending task: task0
[...]
[freeze]

kernel.log:
[off-screen]
...panic
...do_exit
...printk
...die
...__ipipe_sync_stage
...do_page_fault
...__do_softirw
...irq_exit
...do_IRQ
...restore_raw
...__ipipe_handle_exception
...error_code
...__ipipe_sync_stage
...default_idle
...ipipe_supend_domain
...default_idle
...cpu_idle
...start_kernel
...unknown_bootoption
Kernel panic

Best regards,
Hannes.




Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Hannes Mayer

Jan Kiszka wrote:
[...]

Do you (or anybody else) have a running 2.0.x installation? If so,
please test that setup as well.


Sure :-)

# uname -r
2.6.13.4-adeos-xenomai
# cat /proc/xenomai/version
2.0
# ./mutex
Running for 2.15 seconds.
ALERT: No lock! (lockcnt=0) Offending task: task0
ALERT: No lock! (lockcnt=0) Offending task: task0
ALERT: No lock! (lockcnt=0) Offending task: task0
ALERT: No lock! (lockcnt=0) Offending task: task0
L121: Connection timed out.
# cat dump.txt
101001001010101011000110001[...]

# ./sem
Running for 2.15 seconds.
L119: Connection timed out.
# cat dump.txt
101001muon:/home/xenomai/atomic#

More tests ?

Best regards,
Hannes.



Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jan Kiszka
Jeroen Van den Keybus wrote:
>  Interesting, when writing to 2 different files, I get the same crashes.
> Will test with only one task/fd.

File ops don't matter for me. I took them out of task0/1, and I still
got the crashes. (BTW, this may explain the difference in your backtrace
you reported privately.)

Jan - now really leaving...





Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jan Kiszka
Hannes Mayer wrote:
> Jeroen Van den Keybus wrote:
>> Hello,
>>  
>>  
>> Apparently, the code I shared with Gilles never made it to this forum.
>> Anyway, the issue I'm having here is really a problem and it might be
>> useful if some of you could try it out or comment on it. 
> 
> If you have a makefile for them, I'll give it a try.
> I'm too lazy...err...busy to make one myself(*)  ;-)

Try this, it even works without make:

gcc -o <output> <source.c> `xeno-config --xeno-cflags` \
    `xeno-config --xeno-ldflags` -lnative

Do you (or anybody else) have a running 2.0.x installation? If so,
please test that setup as well.

> 
> Best regards,
> Hannes.
> 
> (*) Jan has the copyright on this line, so I hope it's GPL ;-)

No rights reserved. :)


Ok, have to go home now - getting hungry...

Jan





Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jeroen Van den Keybus

Interesting, when writing to 2 different files, I get the same crashes. Will test with only one task/fd.
 
 
> In fact, my last crash report did contain a difference with yours:
>
> ROOT task was at priority 0
> main was at 1
> task0 was at 30 timeout 3.7ms
> task1 was at 29 timeout 4.9 ms
>
> Does this mean that both tasks were actually not even inside the critical
> section ?

Yep, and this clearly indicates that there is no issue with mutexes, semaphores, or whatever. Please try to capture the full trace and post it to the list. As I said, this kind of topic here is not my home domain...



Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jan Kiszka
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>  > Jeroen Van den Keybus wrote:
>>  > > Gilles,
>>  > > 
>>  > > 
>>  > > I cannot reproduce those messages after turning nucleus debugging on.
>>  > > Instead, I now either get relatively more failing mutexes or even hard
>>  > > lockups with the test program I sent to you. If the computer didn't 
>> crash,
>>  > > dmesg contains 3 Xenomai messages relating to a task being movend to
>>  > > secondary domain after exception #14. As when the computer crashes: I 
>> have
>>  > > written the last kernel panic message on a paper. Please tell if you 
>> want
>>  > > also the addresses or (part of) the call stack.
>>  > > 
>>  > > I'm still wondering if there's a programming error in the mutex test
>>  > > program. After I sent my previous message, and before I turned nucleus
>>  > > debugging on, I managed (by reducing the sleeptimes to max. 5.0e4) to
>>  > > fatally crash the computer, while spewing out countless 'scheduling 
>> while
>>  > > atomic messages'. Is the mutex error reproducible ?
>>  > 
>>  > I was not able to crash my box or generate that scheduler warnings, but
>>  > the attached patch fixes the false positive warnings of unlocked
>>  > mutexes. We had a "leak" in the unlock path when someone was already
>>  > waiting. Anyway, *this* issues should not have caused any other problems
>>  > then the wrong report of rt_mutex_inquire().
>>
>> Actually the patch seem insufficient, the whole block :
>>  {
>>  xnsynch_set_owner(&mutex->synch_base,&task->thread_base);
>>  mutex->owner = task;
>>  mutex->lockcnt = 1;
>>  goto unlock_and_exit;
>>  }
>>
>> should be done after xnsynch_sleep_on in rt_mutex_lock.
>>
> 
> Damn, of course - except for "mutex->owner = task". Then this missing
> xnsync_set_owner() may have caused serious issues? Will test...

Correction: xnsynch_wakeup_one_sleeper() updates synch->owner, so all
fine with my patch in this regard. I guess it is really some migration
issue again.

Jan





Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>  > Jeroen Van den Keybus wrote:
>  > > Gilles,
>  > > 
>  > > 
>  > > I cannot reproduce those messages after turning nucleus debugging on.
>  > > Instead, I now either get relatively more failing mutexes or even hard
>  > > lockups with the test program I sent to you. If the computer didn't 
> crash,
>  > > dmesg contains 3 Xenomai messages relating to a task being movend to
>  > > secondary domain after exception #14. As when the computer crashes: I 
> have
>  > > written the last kernel panic message on a paper. Please tell if you want
>  > > also the addresses or (part of) the call stack.
>  > > 
>  > > I'm still wondering if there's a programming error in the mutex test
>  > > program. After I sent my previous message, and before I turned nucleus
>  > > debugging on, I managed (by reducing the sleeptimes to max. 5.0e4) to
>  > > fatally crash the computer, while spewing out countless 'scheduling while
>  > > atomic messages'. Is the mutex error reproducible ?
>  > 
>  > I was not able to crash my box or generate that scheduler warnings, but
>  > the attached patch fixes the false positive warnings of unlocked
>  > mutexes. We had a "leak" in the unlock path when someone was already
>  > waiting. Anyway, *this* issues should not have caused any other problems
>  > then the wrong report of rt_mutex_inquire().
> 
> Actually the patch seem insufficient, the whole block :
>   {
>   xnsynch_set_owner(&mutex->synch_base,&task->thread_base);
>   mutex->owner = task;
>   mutex->lockcnt = 1;
>   goto unlock_and_exit;
>   }
> 
> should be done after xnsynch_sleep_on in rt_mutex_lock.
> 

Damn, of course - except for "mutex->owner = task". Then this missing
xnsynch_set_owner() may have caused serious issues? Will test...

Jan





Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
 > Jeroen Van den Keybus wrote:
 > > Gilles,
 > > 
 > > 
 > > I cannot reproduce those messages after turning nucleus debugging on.
 > > Instead, I now either get relatively more failing mutexes or even hard
 > > lockups with the test program I sent to you. If the computer didn't crash,
 > > dmesg contains 3 Xenomai messages relating to a task being movend to
 > > secondary domain after exception #14. As when the computer crashes: I have
 > > written the last kernel panic message on a paper. Please tell if you want
 > > also the addresses or (part of) the call stack.
 > > 
 > > I'm still wondering if there's a programming error in the mutex test
 > > program. After I sent my previous message, and before I turned nucleus
 > > debugging on, I managed (by reducing the sleeptimes to max. 5.0e4) to
 > > fatally crash the computer, while spewing out countless 'scheduling while
 > > atomic messages'. Is the mutex error reproducible ?
 > 
 > I was not able to crash my box or generate that scheduler warnings, but
 > the attached patch fixes the false positive warnings of unlocked
 > mutexes. We had a "leak" in the unlock path when someone was already
 > waiting. Anyway, *this* issues should not have caused any other problems
 > then the wrong report of rt_mutex_inquire().

Actually, the patch seems insufficient; the whole block:
{
xnsynch_set_owner(&mutex->synch_base,&task->thread_base);
mutex->owner = task;
mutex->lockcnt = 1;
goto unlock_and_exit;
}

should be done after xnsynch_sleep_on in rt_mutex_lock.
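A self-contained toy model of that ordering (not Xenomai code: the types and the wait stub below are stand-ins, only the owner/lockcnt bookkeeping mirrors the snippets quoted above):

/* toy_mutex.c -- "account for the lock only after the sleep returns" */
#include <stdio.h>
#include <stddef.h>

struct toy_task  { const char *name; };
struct toy_mutex { struct toy_task *owner; int lockcnt; };

/* Stand-in for xnsynch_sleep_on(): pretend the unlocker immediately hands
   the mutex over and wakes us up; returns 0 when woken as the new owner. */
static int toy_sleep_on(struct toy_mutex *m, struct toy_task *me)
{
    m->owner = me;   /* what xnsynch_wakeup_one_sleeper() would have done */
    return 0;
}

static void toy_mutex_lock(struct toy_mutex *m, struct toy_task *me)
{
    if (m->owner == NULL) {          /* uncontended: take it right away */
        m->owner = me;
        m->lockcnt = 1;
        return;
    }
    if (toy_sleep_on(m, me) == 0)
        /* Only after the sleep returns, i.e. after ownership has really
           been transferred by the unlocker, is it correct to account for
           the lock. */
        m->lockcnt = 1;
}

int main(void)
{
    struct toy_task a = { "task0" }, b = { "task1" };
    struct toy_mutex m = { NULL, 0 };

    toy_mutex_lock(&m, &a);          /* uncontended path */
    toy_mutex_lock(&m, &b);          /* "contended" path  */
    printf("owner=%s lockcnt=%d\n", m.owner->name, m.lockcnt);
    return 0;
}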

-- 


Gilles Chanteperdrix.



Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jan Kiszka
Jan Kiszka wrote:
> Jan Kiszka wrote:
>> ...
>> [Update] While writing this mail and letting your test run for a while,
>> I *did* get a hard lock-up. Hold on, digging deeper...
>>
> 
> And here are its last words, spoken via serial console:
> 
> c31dfab0 0086 c30d1a90 c02a2500 c482a360 0001 0001 0020
>c012e564 0022  0246 c30d1a90 c4866ce0 0033
> c482
>c482a360 c4866ca0  c48293a4 c48524e1  
> 0002
> Call Trace:
>  [] __ipipe_dispatch_event+0x56/0xdd
>  [] e100_hw_init+0x3ad/0xa81 [e100]
>  [] xnpod_suspend_thread+0x714/0x76d [xeno_nucleus]
>  [] xnsynch_sleep_on+0x76d/0x7a7 [xeno_nucleus]
>  [] rt_sem_p+0xa6/0x10a [xeno_native]
>  [] __rt_sem_p+0x5d/0x66 [xeno_native]
>  [] hisyscall_event+0x1cb/0x2d3 [xeno_nucleus]
>  [] __ipipe_dispatch_event+0x56/0xdd
>  [] __ipipe_syscall_root+0x53/0xbe
>  [] system_call+0x20/0x41
> Xenomai: fatal: blocked thread main[863] rescheduled?! (status=0x300082,
> sig=0, prev=gatekeeper/0[809])
>  CPU   PID  PRI  TIMEOUT  STAT      NAME
>    0     0   30        0  00500080  ROOT
>    0   864   30        0  00300180  task0
>    0   865   29        0  00300288  task1
>    0   863    1        0  00300082  main
> Timer: oneshot [tickval=1 ns, elapsed=175144731477]
> 
> c31e1f14 c4860572 c3188000 c31dfab0 00300082 c02a2500 0286 c02a2500
>c030cbec c012e564 0022 c02a2500 c30d1a90 c30d1a90 0022
> 0001
>c02a2500 c30d1a90 c08e4623 0028 c31e1fa0 c0266ed5 f610
> c030cd80
> Call Trace:
>  [] __ipipe_dispatch_event+0x56/0xdd
>  [] schedule+0x3ef/0x5ed
>  [] gatekeeper_thread+0x0/0x179 [xeno_nucleus]
>  [] gatekeeper_thread+0x9a/0x179 [xeno_nucleus]
>  [] default_wake_function+0x0/0x12
>  [] kthread+0x68/0x95
>  [] kthread+0x0/0x95
>  [] kernel_thread_helper+0x5/0xb
> 
> Any bells already ringing?
> 
> Will try Gilles' patch now...
> 

Nope, this didn't help.

Ok, this is migration magic. Someone around who hacks this part blindly?

Jan





Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
 > @Gilles: please apply to both trees.

Done. Thanks.

-- 


Gilles Chanteperdrix.



Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Hannes Mayer

Jeroen Van den Keybus wrote:

Hello,
 
 
Apparently, the code I shared with Gilles never made it to this forum. 
Anyway, the issue I'm having here is really a problem and it might be 
useful if some of you could try it out or comment on it. 


If you have a makefile for them, I'll give it a try.
I'm too lazy...err...busy to make one myself(*)  ;-)

Best regards,
Hannes.

(*) Jan has the copyright on this line, so I hope it's GPL ;-)



Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jan Kiszka
Jan Kiszka wrote:
> ...
> [Update] While writing this mail and letting your test run for a while,
> I *did* get a hard lock-up. Hold on, digging deeper...
> 

And here are its last words, spoken via serial console:

c31dfab0 0086 c30d1a90 c02a2500 c482a360 0001 0001 0020
   c012e564 0022  0246 c30d1a90 c4866ce0 0033
c482
   c482a360 c4866ca0  c48293a4 c48524e1  
0002
Call Trace:
 [] __ipipe_dispatch_event+0x56/0xdd
 [] e100_hw_init+0x3ad/0xa81 [e100]
 [] xnpod_suspend_thread+0x714/0x76d [xeno_nucleus]
 [] xnsynch_sleep_on+0x76d/0x7a7 [xeno_nucleus]
 [] rt_sem_p+0xa6/0x10a [xeno_native]
 [] __rt_sem_p+0x5d/0x66 [xeno_native]
 [] hisyscall_event+0x1cb/0x2d3 [xeno_nucleus]
 [] __ipipe_dispatch_event+0x56/0xdd
 [] __ipipe_syscall_root+0x53/0xbe
 [] system_call+0x20/0x41
Xenomai: fatal: blocked thread main[863] rescheduled?! (status=0x300082,
sig=0, prev=gatekeeper/0[809])
 CPU   PID  PRI  TIMEOUT  STAT      NAME
   0     0   30        0  00500080  ROOT
   0   864   30        0  00300180  task0
   0   865   29        0  00300288  task1
   0   863    1        0  00300082  main
Timer: oneshot [tickval=1 ns, elapsed=175144731477]

c31e1f14 c4860572 c3188000 c31dfab0 00300082 c02a2500 0286 c02a2500
   c030cbec c012e564 0022 c02a2500 c30d1a90 c30d1a90 0022
0001
   c02a2500 c30d1a90 c08e4623 0028 c31e1fa0 c0266ed5 f610
c030cd80
Call Trace:
 [] __ipipe_dispatch_event+0x56/0xdd
 [] schedule+0x3ef/0x5ed
 [] gatekeeper_thread+0x0/0x179 [xeno_nucleus]
 [] gatekeeper_thread+0x9a/0x179 [xeno_nucleus]
 [] default_wake_function+0x0/0x12
 [] kthread+0x68/0x95
 [] kthread+0x0/0x95
 [] kernel_thread_helper+0x5/0xb

Any bells already ringing?

Will try Gilles' patch now...

Jan





Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jan Kiszka
Jeroen Van den Keybus wrote:
> Gilles,
> 
> 
> I cannot reproduce those messages after turning nucleus debugging on.
> Instead, I now either get relatively more failing mutexes or even hard
> lockups with the test program I sent to you. If the computer didn't crash,
> dmesg contains 3 Xenomai messages relating to a task being movend to
> secondary domain after exception #14. As when the computer crashes: I have
> written the last kernel panic message on a paper. Please tell if you want
> also the addresses or (part of) the call stack.
> 
> I'm still wondering if there's a programming error in the mutex test
> program. After I sent my previous message, and before I turned nucleus
> debugging on, I managed (by reducing the sleeptimes to max. 5.0e4) to
> fatally crash the computer, while spewing out countless 'scheduling while
> atomic messages'. Is the mutex error reproducible ?

I was not able to crash my box or generate those scheduler warnings, but
the attached patch fixes the false-positive warnings about unlocked
mutexes. We had a "leak" in the unlock path when someone was already
waiting. Anyway, *this* issue should not have caused any other problems
than the wrong report of rt_mutex_inquire().

@Gilles: please apply to both trees.

[Update] While writing this mail and letting your test run for a while,
I *did* get a hard lock-up. Hold on, digging deeper...

Jan
Index: ChangeLog
===
--- ChangeLog   (revision 465)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2006-01-18  Jan Kiszka  <[EMAIL PROTECTED]>
+
+   * ksrc/skins/native/mutex.c (rt_mutex_unlock): Fix leaking lockcnt
+   on unlock with pending waiters.
+
 2006-01-16  Gilles Chanteperdrix  <[EMAIL PROTECTED]>
 
* ksrc/skins/native/task.c (rt_task_create): Use a separate string
Index: ksrc/skins/native/mutex.c
===
--- ksrc/skins/native/mutex.c   (revision 465)
+++ ksrc/skins/native/mutex.c   (working copy)
@@ -461,8 +461,11 @@
 mutex->owner = 
thread2rtask(xnsynch_wakeup_one_sleeper(&mutex->synch_base));
 
 if (mutex->owner != NULL)
+   {
+   mutex->lockcnt = 1;
xnpod_schedule();
-
+   }
+
  unlock_and_exit:
 
 xnlock_put_irqrestore(&nklock,s);




[Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jeroen Van den Keybus
Hello,
 
 
Apparently, the code I shared with Gilles never made it to this forum. Anyway, the issue I'm having here is really a problem and it might be useful if some of you could try it out or comment on it. I might be making a silly programming error here, but the result is invariably erroneous operation or kernel crashes.

 
The program creates a file dump.txt and has two independent threads trying to access it and write a one or a zero there. Inside the writing routine, which is accessed by both threads, a check is made to see if the access is really locked. In my setup, I have tons of ALERTS popping up with this program, meaning that something is wrong with my use of mutex. Could anyone please check and see if a) it is correctly written and b) it fails as well on their machine. It would allow me to focus my actions on the Xenomai setup (which I keep frozen this instant, in order to keep a possible bug predictable) or on my own programming.

 
A second example is also included, which tries to achieve the same goal with a semaphore (initialized to 1). That seems to work, but under heavy load (tmax = 1.0e7), the kernel crashes.
 
Kernel: 2.6.15 Adeos: 1.1-03 gcc: 4.0.2 Ipipe tracing enabled
 
TIA
 
Jeroen.
 
 

/* TEST_MUTEX.C */
/* (The #include targets were eaten by the list archive; the obvious set
   has been restored here.) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <fcntl.h>
#include <values.h>             /* MAXLONG */
#include <sys/mman.h>
#include <native/task.h>
#include <native/timer.h>
#include <native/mutex.h>
#include <native/sem.h>

int fd, err;
RT_MUTEX m;
RT_SEM s;
float tmax = 1.0e7;

#define CHECK(arg) check(arg, __LINE__)

int check(int r, int n)
{
    if (r != 0)
        fprintf(stderr, "L%d: %s.\n", n, strerror(-r));
    return r;
}

/* Called by both tasks while they believe they hold the mutex; alerts
   if the mutex does not appear to be locked. */
void output(char c)
{
    static int cnt = 0;
    int n;
    char buf[2];
    RT_MUTEX_INFO mutexinfo;

    buf[0] = c;
    if (cnt == 80) { buf[1] = '\n'; n = 2; cnt = 0; }
    else { n = 1; cnt++; }

    CHECK(rt_mutex_inquire(&m, &mutexinfo));
    if (mutexinfo.lockcnt <= 0) {
        RT_TASK_INFO taskinfo;
        CHECK(rt_task_inquire(NULL, &taskinfo));
        fprintf(stderr, "ALERT: No lock! (lockcnt=%d) Offending task: %s\n",
                mutexinfo.lockcnt, taskinfo.name);
    }
    if (write(fd, buf, n) != n) {
        fprintf(stderr, "File write error.\n");
        CHECK(rt_sem_v(&s));
    }
}

void task0(void *arg)
{
    CHECK(rt_task_set_mode(T_PRIMARY, 0, NULL));
    while (1) {
        CHECK(rt_task_sleep((float)rand() * tmax / (float)RAND_MAX));
        CHECK(rt_mutex_lock(&m, TM_INFINITE));
        output('0');
        CHECK(rt_mutex_unlock(&m));
    }
}

void task1(void *arg)
{
    CHECK(rt_task_set_mode(T_PRIMARY, 0, NULL));
    while (1) {
        CHECK(rt_task_sleep((float)rand() * tmax / (float)RAND_MAX));
        CHECK(rt_mutex_lock(&m, TM_INFINITE));
        output('1');
        CHECK(rt_mutex_unlock(&m));
    }
}

void sighandler(int arg)
{
    CHECK(rt_sem_v(&s));
}

int main(int argc, char *argv[])
{
    RT_TASK t, t0, t1;

    if ((fd = open("dump.txt", O_CREAT | O_TRUNC | O_WRONLY)) < 0)
        fprintf(stderr, "File open error.\n");
    else {
        if (argc == 2) {
            tmax = atof(argv[1]);
            if (tmax == 0.0)
                tmax = 1.0e7;
        }
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            printf("mlockall() error.\n");

        CHECK(rt_task_shadow(&t, "main", 1, T_FPU));
        CHECK(rt_timer_start(TM_ONESHOT));
        CHECK(rt_mutex_create(&m, "mutex"));
        CHECK(rt_sem_create(&s, "sem", 0, S_PRIO));
        signal(SIGINT, sighandler);

        CHECK(rt_task_create(&t0, "task0", 0, 30, T_FPU));
        CHECK(rt_task_start(&t0, task0, NULL));
        CHECK(rt_task_create(&t1, "task1", 0, 29, T_FPU));
        CHECK(rt_task_start(&t1, task1, NULL));

        printf("Running for %.2f seconds.\n", (float)MAXLONG / 1.0e9);
        CHECK(rt_sem_p(&s, MAXLONG));
        signal(SIGINT, SIG_IGN);

        CHECK(rt_task_delete(&t1));
        CHECK(rt_task_delete(&t1));   /* (sic -- t1 is deleted twice in the original) */
        CHECK(rt_task_delete(&t0));

        CHECK(rt_sem_delete(&s));
        CHECK(rt_mutex_delete(&m));

        rt_timer_stop();

        close(fd);
    }
    return 0;
}
/*/

/* TEST_SEM.C */
/* (Same restored headers as above, minus the mutex skin.) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <fcntl.h>
#include <values.h>             /* MAXLONG */
#include <sys/mman.h>
#include <native/task.h>
#include <native/timer.h>
#include <native/sem.h>

int fd, err;
RT_SEM s, m;
float tmax = 1.0e9;

#define CHECK(arg) check(arg, __LINE__)

int check(int r, int n)
{
    if (r != 0)
        fprintf(stderr, "L%d: %s.\n", n, strerror(-r));
    return r;
}

void output(char c)
{
    static int cnt = 0;
    int n;
    char buf[2];
    RT_SEM_INFO seminfo;

    buf[0] = c;
    if (cnt == 80) { buf[1] = '\n'; n = 2; cnt = 0; }
    else { n = 1; cnt++; }

    CHECK(rt_sem_inquire(&m, &seminfo));
    if (seminfo.count != 0) {
        RT_TASK_INFO taskinfo;
        CHECK(rt_task_inquire(NULL, &taskinfo));
        fprintf(stderr, "ALERT: No lock! (count=%ld) Offending task: %s\n",
                seminfo.count, taskinfo.name);
    }
    if (write(fd, buf, n) != n) {
[the rest of TEST_SEM.C is truncated in the archive]

Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Gilles Chanteperdrix
Jeroen Van den Keybus wrote:
 > Gilles,
 > 
 > 
 > I cannot reproduce those messages after turning nucleus debugging on.
 > Instead, I now either get relatively more failing mutexes or even hard
 > lockups with the test program I sent to you. If the computer didn't crash,
 > dmesg contains 3 Xenomai messages relating to a task being movend to
 > secondary domain after exception #14. As when the computer crashes: I have
 > written the last kernel panic message on a paper. Please tell if you want
 > also the addresses or (part of) the call stack.

Could you try adding a call to mlockall(MCL_CURRENT|MCL_FUTURE)?

Also note that you do not need to protect accesses to a file descriptor
with rt_mutexes. stdio file descriptors are protected with pthread
mutexes, and pthread mutex functions cause thread migration to
secondary mode. And Unix file descriptors are passed to system calls,
which also cause migration to secondary mode.
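A minimal sketch of the mlockall() part of that advice (plain user-space C, not from the original mail; the Xenomai task setup would follow the lock):

/* Lock current and future mappings before any real-time work, so that
   page faults (the "exception #14" messages above) cannot push a
   primary-mode task back into the Linux domain. */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return 1;
    }
    /* ... rt_task_shadow(), rt_task_create(), etc. go here ... */
    return 0;
}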

-- 


Gilles Chanteperdrix.




[Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jeroen Van den Keybus
Gilles,
 
 
I cannot reproduce those messages after turning nucleus debugging on. Instead, I now either get relatively more failing mutexes or even hard lockups with the test program I sent to you. If the computer didn't crash, dmesg contains 3 Xenomai messages relating to a task being moved to the secondary domain after exception #14. As for when the computer crashes: I have written down the last kernel panic message on paper. Please tell me if you also want the addresses or (part of) the call stack.

 
I'm still wondering if there's a programming error in the mutex test program. After I sent my previous message, and before I turned nucleus debugging on, I managed (by reducing the sleep times to max. 5.0e4) to fatally crash the computer, while it spewed out countless 'scheduling while atomic' messages. Is the mutex error reproducible?

 
Tomorrow I'll try the patch.
 
lostage_handler + e/33a
rthal_apc_handler + 3b/46
lostage_handler + 190/33a
rthal_apc_handler + 3b/46
__ipipe_sync_stage + 2a1/2bc
mark_offset_tsc + c1/456
__ipipe_sync_stage + 2a9/2bc
ipipe_unstall_pipeline_from + 189/194 (might be 181/194)
xnpod_delete_thread + ba1/bc3
mcount + 23/2a
taskexit_event + 4f/6c
__ipipe_dispatch_event + 90/173
do_exit + 10f/604
sys_exit + 8/14
syscall_call + 7/b
next_thread + 0/15
syscall_call + 7/b
 
<0> Kernel panic - not syncing: Fatal Exception in interrupt
 
 
Thanks for investigating,
 
Jeroen.

