2014-07-28 13:09 GMT+02:00 Philippe Gerum <[email protected]>:
> On 07/23/2014 10:31 AM, Kim De Mey wrote:
>> 2014-07-09 17:05 GMT+02:00 Kim De Mey <[email protected]>:
>>>
>>> 2014-07-09 16:16 GMT+02:00 Philippe Gerum <[email protected]>:
>>>> On 07/09/2014 12:23 PM, Kim De Mey wrote:
>>>>>
>>>>> 2014-07-09 11:25 GMT+02:00 Philippe Gerum <[email protected]>:
>>>>>>
>>>>>> On 07/09/2014 10:09 AM, Kim De Mey wrote:
>>>>>>
>>>>>>> Newer version --dump-config output:
>>>>>>> ...
>>>>>>> CONFIG_XENO_COMPILER="gcc version 4.7.0 (Cavium Inc. Version:
>>>>>>> SDK_3_1_0 build 27) "
>>>>>>> ...
>>>>>>>
>>>>>>
>>>>>> Thanks. I can't reproduce this issue yet, so posting the current gdb
>>>>>> backtraces for all the threads still shown by "info threads" after a
>>>>>> deadlock would help.
>>>>>>
>>>>>> TIA,
>>>>>>
>>>>>
>>>>> Below are the backtraces of all the threads. This is a case with two
>>>>> "worker" tasks in deadlock. The "main" and "create_delete" tasks
>>>>> continued in their "tm_wkafter" loop.
>>>>>
>>>>>
>>>>> (gdb) thread a a bt
>>>>>
>>>>> Thread 4 (Thread 9406):
>>>>> #0  0x77cc2d94 in __pthread_mutex_lock_full (mutex=0x77b03300) at
>>>>> pthread_mutex_lock.c:321
>>>>> #1  0x77cc7e64 in __GI___pthread_mutex_lock (mutex=0x77b03300) at
>>>>> pthread_mutex_lock.c:55
>>>>> #2  0x77cf91e0 in threadobj_lock () from
>>>>>
>>>>> /repo/kdemey/buildroot-sdk31/output/host/usr/mips64-buildroot-linux-gnu/sysroot/usr/lib/libcopperplate.so.0
>>>>> #3  0x77cf92e4 in finalize_thread () from
>>>>>
>>>>> /repo/kdemey/buildroot-sdk31/output/host/usr/mips64-buildroot-linux-gnu/sysroot/usr/lib/libcopperplate.so.0
>>>>> #4  0x77cc5704 in __nptl_deallocate_tsd () at pthread_create.c:156
>>>>> #5  0x77cc5940 in start_thread (arg=0x75dff490) at pthread_create.c:315
>>>>> #6  0x77bf1bbc in __thread_start () from
>>>>> output/host/usr/mips64-buildroot-linux-gnu/sysroot/lib32/libc.so.6
>>>>> Backtrace stopped: frame did not save the PC
>>>>>
>>>>> Thread 3 (Thread 31145):
>>>>> #0  0x77cc2d94 in __pthread_mutex_lock_full (mutex=0x77b030b8) at
>>>>> pthread_mutex_lock.c:321
>>>>> #1  0x77cc7e64 in __GI___pthread_mutex_lock (mutex=0x77b030b8) at
>>>>> pthread_mutex_lock.c:55
>>>>> #2  0x77cf91e0 in threadobj_lock () from
>>>>>
>>>>> /repo/kdemey/buildroot-sdk31/output/host/usr/mips64-buildroot-linux-gnu/sysroot/usr/lib/libcopperplate.so.0
>>>>> #3  0x77cf92e4 in finalize_thread () from
>>>>>
>>>>> /repo/kdemey/buildroot-sdk31/output/host/usr/mips64-buildroot-linux-gnu/sysroot/usr/lib/libcopperplate.so.0
>>>>> #4  0x77cc5704 in __nptl_deallocate_tsd () at pthread_create.c:156
>>>>> #5  0x77cc5940 in start_thread (arg=0x766ff490) at pthread_create.c:315
>>>>> #6  0x77bf1bbc in __thread_start () from
>>>>> output/host/usr/mips64-buildroot-linux-gnu/sysroot/lib32/libc.so.6
>>>>> Backtrace stopped: frame did not save the PC
>>>>>
>>>>> Thread 2 (Thread 6970):
>>>>> #0  clock_nanosleep (clock_id=<optimized out>, flags=<optimized out>,
>>>>> req=<optimized out>, rem=<optimized out>)
>>>>>      at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:50
>>>>> #1  0x77cfa0bc in threadobj_sleep () from
>>>>>
>>>>> /repo/kdemey/buildroot-sdk31/output/host/usr/mips64-buildroot-linux-gnu/sysroot/usr/lib/libcopperplate.so.0
>>>>> #2  0x77d2af08 in tm_wkafter () from
>>>>>
>>>>> /repo/kdemey/buildroot-sdk31/output/host/usr/mips64-buildroot-linux-gnu/sysroot/usr/lib/libpsos.so.0
>>>>> #3  0x10000b64 in create_delete (a=0, b=0, c=0, d=0) at
>>>>> delete_child_hangs.c:31
>>>>> #4  0x77d296fc in task_trampoline () from
>>>>>
>>>>> /repo/kdemey/buildroot-sdk31/output/host/usr/mips64-buildroot-linux-gnu/sysroot/usr/lib/libpsos.so.0
>>>>> #5  0x77cf7d84 in thread_trampoline () from
>>>>>
>>>>> /repo/kdemey/buildroot-sdk31/output/host/usr/mips64-buildroot-linux-gnu/sysroot/usr/lib/libcopperplate.so.0
>>>>> #6  0x77cc592c in start_thread (arg=0x77a00490) at pthread_create.c:310
>>>>> #7  0x77bf1bbc in __thread_start () from
>>>>> output/host/usr/mips64-buildroot-linux-gnu/sysroot/lib32/libc.so.6
>>>>> Backtrace stopped: frame did not save the PC
>>>>>
>>>>> Thread 1 (Thread 6969):
>>>>> #0  clock_nanosleep (clock_id=<optimized out>, flags=<optimized out>,
>>>>> req=<optimized out>, rem=<optimized out>)
>>>>>      at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:50
>>>>> #1  0x77cfa0bc in threadobj_sleep () from
>>>>>
>>>>> /repo/kdemey/buildroot-sdk31/output/host/usr/mips64-buildroot-linux-gnu/sysroot/usr/lib/libcopperplate.so.0
>>>>> #2  0x77d2af08 in tm_wkafter () from
>>>>>
>>>>> /repo/kdemey/buildroot-sdk31/output/host/usr/mips64-buildroot-linux-gnu/sysroot/usr/lib/libpsos.so.0
>>>>> #3  0x10000c00 in main (argc=1, argv=0x10011030) at
>>>>> delete_child_hangs.c:43
>>>>>
>>>>
>>>> Ok, unless we've entered the twilight zone, the only reason I can see for
>>>> this happening is that the thread prologue somehow gets hit by an async
>>>> cancellation signal while holding its own lock. If so, the patch below
>>>> would cause an assertion to trigger in such circumstances. You will have to
>>>> turn on --enable-assert in your configuration, leaving debugging entirely
>>>> off so as not to significantly affect the current timings.
>>>>
>>>> diff --git a/include/boilerplate/lock.h b/include/boilerplate/lock.h
>>>> index 6f0218c..b704987 100644
>>>> --- a/include/boilerplate/lock.h
>>>> +++ b/include/boilerplate/lock.h
>>>> @@ -206,7 +206,7 @@ int __check_cancel_type(const char *locktype);
>>>>  #define read_unlock_safe(__lock, __state)      \
>>>>         __do_unlock_safe(__lock, __state)
>>>>
>>>> -#ifdef CONFIG_XENO_DEBUG
>>>> +#ifndef NDEBUG
>>>>  #define mutex_type_attribute PTHREAD_MUTEX_ERRORCHECK
>>>>  #else
>>>>  #define mutex_type_attribute PTHREAD_MUTEX_NORMAL
>>>> diff --git a/lib/copperplate/threadobj.c b/lib/copperplate/threadobj.c
>>>> index 05bb6cb..6547460 100644
>>>> --- a/lib/copperplate/threadobj.c
>>>> +++ b/lib/copperplate/threadobj.c
>>>> @@ -1310,6 +1310,7 @@ int threadobj_cancel(struct threadobj *thobj)
>>>>  static void finalize_thread(void *p) /* thobj->lock free */
>>>>  {
>>>>         struct threadobj *thobj = p;
>>>> +       int ret;
>>>>
>>>>         if (thobj == NULL || thobj == THREADOBJ_IRQCONTEXT)
>>>>                 return;
>>>> @@ -1343,7 +1344,9 @@ static void finalize_thread(void *p) /* thobj->lock free */
>>>>          * waiting for us to start, pending on
>>>>          * wait_on_barrier(). Instead, hand it over to this thread.
>>>>          */
>>>> -       threadobj_lock(thobj);
>>>> +       ret = threadobj_lock(thobj);
>>>> +       assert(ret == 0);
>>>> +       (void)ret;
>>>>         if ((thobj->status & __THREAD_S_SAFE) == 0) {
>>>>                 threadobj_unlock(thobj);
>>>>                 destroy_thread(thobj);
>>>>
>>>
>>> And...the assert kicks in!
>>>
>>> threadobj.c:1348: finalize_thread: Assertion `ret == 0' failed.
>>
>>
>> Philippe,
>>
>> Could you explain this case of the async cancellation signal? I don't
>> fully understand how this happens or why it causes the issue.
>>
>> I also tested once with CONFIG_XENO_ASYNC_CANCEL turned off, and the
>> issue no longer occurs in that configuration.
>>
>>
>
> I have not been able to reproduce this race yet, but the following patch
> would still make sense:
>
> diff --git a/lib/copperplate/threadobj.c b/lib/copperplate/threadobj.c
> index 05bb6cb..48aa032 100644
> --- a/lib/copperplate/threadobj.c
> +++ b/lib/copperplate/threadobj.c
> @@ -1045,9 +1045,11 @@ static int wait_on_barrier(struct threadobj *thobj, int mask)
>                 if (status & mask)
>                         break;
>                 oldstate = thobj->cancel_state;
> +               push_cleanup_lock(&thobj->lock);
>                 __threadobj_tag_unlocked(thobj);
>                 __RT(pthread_cond_wait(&thobj->barrier, &thobj->lock));
>                 __threadobj_tag_locked(thobj);
> +               pop_cleanup_lock(&thobj->lock);
>                 thobj->cancel_state = oldstate;
>         }
>
> @@ -1243,9 +1245,11 @@ static void cancel_sync(struct threadobj *thobj) /* thobj->lock held */
>          */
>         while (thobj->status & __THREAD_S_WARMUP) {
>                 oldstate = thobj->cancel_state;
> +               push_cleanup_lock(&thobj->lock);
>                 __threadobj_tag_unlocked(thobj);
>                 __RT(pthread_cond_wait(&thobj->barrier, &thobj->lock));
>                 __threadobj_tag_locked(thobj);
> +               pop_cleanup_lock(&thobj->lock);
>                 thobj->cancel_state = oldstate;
>         }
>

I have tried this patch but it does not seem to make any difference.
The threads are still deadlocked at the same location.

I've only tested this on my setup with the old toolchain/kernel so
far. I'll also check with the newer version.


Regards,
Kim

_______________________________________________
Xenomai mailing list
[email protected]
http://www.xenomai.org/mailman/listinfo/xenomai
