Re: [Xenomai] segfault in printer_loop()

Jan Kiszka Fri, 10 Nov 2017 02:08:15 -0800

Please always keep the list in CC.

On 2017-11-10 08:34, C Smith wrote:
> The hardware watchpoint did not catch the bad memory access. Here is the
> gdb session based on your advice:
> (gdb) r
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/libthread_db.so.1".
> [New Thread 0xb7fccb40 (LWP 22537)]
> 
> Breakpoint 4, __rt_print_init () at rt_print.c:756
> (gdb) set variable gDebug = 0xC
> (gdb) watch *gDebug
> Hardware watchpoint 10: *gDebug


You need to set the watchpoint on the address of the condition variable
field that is going to be changed, not on the invalid value that is
written to it.
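
To illustrate the distinction, here is a minimal C sketch. It is not taken
from rt_print.c: struct demo_cond, demo_wakeup and the demo_* functions are
hypothetical stand-ins, and which field of the real condition variable gets
overwritten is not established in this thread. The point is only that a
watchpoint has to cover the storage of the field, which is unrelated to
whatever bogus value later lands in it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a condition variable's internals; the real
 * glibc layout differs and is not reproduced here. */
struct demo_cond {
    int lock;
    unsigned int futex;
    void *mutex;              /* field assumed to get overwritten */
};

struct demo_cond demo_wakeup; /* stand-in for printer_wakeup */

/* Where the field is stored: this is what a watchpoint must cover. */
void *demo_watch_target(void)
{
    return &demo_wakeup.mutex;
}

/* Simulates the stray store that leaves a bogus pointer in the field. */
void demo_stray_store(uintptr_t bogus)
{
    demo_wakeup.mutex = (void *)bogus;
}

/* Self-check: the store changes the memory at demo_watch_target(),
 * while the stored value (0xC here) points somewhere else entirely. */
void demo_self_check(void)
{
    demo_stray_store(0xC);
    assert(demo_wakeup.mutex == (void *)(uintptr_t)0xC);
    assert(demo_watch_target() != demo_wakeup.mutex);
}
```

In gdb terms, and assuming the symbol and field names shown in the
"print printer_wakeup" output quoted in this message, that would mean
something like "watch -location printer_wakeup.__data.__mutex" at the
__rt_print_init breakpoint, rather than "watch *gDebug" with gDebug set
to 0xC.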

Jan

> (gdb) cont
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xb7fccb40 (LWP 22537)]
> 0xb7fae0db in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> (gdb) bt
> #0  0xb7fae0db in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/libpthread.so.0
> #1  0xb7fd07de in printer_loop (arg=0x0) at rt_print.c:693
> #2  0xb7faaadf in start_thread () from /lib/libpthread.so.0
> #3  0xb7d3b44e in clone () from /lib/libc.so.6
> (gdb) info threads
>   Id   Target Id         Frame
> * 2    Thread 0xb7fccb40 (LWP 22537) "app" 0xb7fae0db in
> pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
>   1    Thread 0xb7c3f6c0 (LWP 22536) "app" 0xffffe424 in
> __kernel_vsyscall ()
> (gdb) p $_siginfo._sifields._sigfault.si_addr
> $5 = (void *) 0xc
> (gdb) frame 1
> #1  0xb7fd07de in printer_loop (arg=0x0) at rt_print.c:693
> (gdb) print buffers
> $6 = 4
> (gdb) print printer_wakeup
> $7 = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq =
> 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
> __size = '\000' <repeats 47 times>, __align = 0}
> (gdb) print buffer_lock
> $8 = {__data = {__lock = 0, __count = 0, __owner = 0, __kind = 0,
> __nusers = 0, {__spins = 0, __list = {__next = 0x0}}}, __size = '\000'
> <repeats 23 times>, __align = 0}
> 
> That line 693 in my sources is the same line of printer_loop() as usual: 
>     pthread_cond_wait(&printer_wakeup, &buffer_lock);
> I didn't hit breakpoint 4 more than once.
> How else might I catch the bad memory access?
> 
> thanks,
> -C Smith
> 
> 
> On Thu, Nov 9, 2017 at 11:02 PM, Jan Kiszka <[email protected]> wrote:
> 
>     On 2017-11-10 07:58, C Smith wrote:
>     > Agreed, the segfault is inside pthread_cond_wait(); the contents of
>     > the arguments are shown in my previous post.
>     > dmesg says this:
>     > app[12316]: segfault at c ip b76fe0db sp b771c268 error 4 in
>     > libpthread-2.15.so[b76f4000+16000]
>     >
>     > And gdb shows me the same address. After a segfault generated inside
>     > gdb:
>     > p $_siginfo._sifields._sigfault.si_addr
>     > $9 = (void *) 0xc
>     >
>     > I've done further testing, and in gdb I found that my app segfaults
>     > before hitting the first line of main(). Thus I am unable to catch it
>     > in gdb with a hardware watchpoint. I attempted to do so by first
>     > making my app hit a breakpoint on the first line of main(), then
>     > setting a watchpoint on 0xC in gdb and continuing, but I never got a
>     > segfault after that point in over 100 runs.
> 
>     For example, you can set a breakpoint on __rt_print_init.
> 
>     >
>     > Note that the app launches only 1 real-time thread now in these tests
>     > (in the original tests it had 3 threads).
>     > Here is the one way I was able to get the app to run without
>     > segfaulting, even with multiple real-time threads: I set the kernel
>     > boot option maxcpus=1 (on an SMP kernel with 4 cores). I was then able
>     > to run the app over 80 times with no segfault.
>     >
>     > So the segfault is happening on about 10% of runs, in printer_loop(),
>     > apparently before the first line of main(), and I am unable to
>     > catch the bad memory access in a debugger with a watchpoint.
>     > Do you have a suggestion as to how to further debug this?
> 
>     See above. Maybe that alone will give you a hint: if that function, for
>     whatever reason, happens to be called twice, that could explain the
>     issue as well. In that case, please capture the backtraces of all
>     invocations.
> 
>     Jan
> 
> 
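
The suggestion quoted above, capturing a backtrace for every invocation of
the init function, is normally done in gdb (a breakpoint with "bt" and
"continue" attached via "commands"). As a rough alternative when the fault
is hard to reproduce under the debugger, the same information can be logged
from inside the process. The sketch below is an assumption-laden
illustration, not Xenomai code: demo_print_init() is a hypothetical stand-in
for the real init routine, and it relies on glibc's backtrace() and
backtrace_symbols_fd() from <execinfo.h>.

```c
#include <execinfo.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define DEMO_MAX_FRAMES 32

/* Counts how often the init routine has been entered. */
static atomic_int demo_init_calls;

/* Hypothetical stand-in for an init routine suspected of running twice.
 * Logs the invocation number and the caller's backtrace to stderr, then
 * returns the invocation number (1 for the first call). */
int demo_print_init(void)
{
    void *frames[DEMO_MAX_FRAMES];
    int call = atomic_fetch_add(&demo_init_calls, 1) + 1;
    int depth = backtrace(frames, DEMO_MAX_FRAMES);

    fprintf(stderr, "demo_print_init: invocation %d\n", call);
    /* Writes one line per frame directly to the file descriptor. */
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);

    return call;
}
```

Any invocation number above 1 in the log would indicate a second call, and
the backtrace printed with it shows where that call came from. Symbol names
in the output are only as good as the binary's symbol information; linking
with -rdynamic usually helps.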


