Using Xenomai 3.0.10 with kernel 4.9.128-05789 on armv7, we're having
problems with locking on an rtdm_waitqueue. The code was written by a
Xenomai-adept developer who has since left for greener pastures.

We have two functions that use rtdm_waitqueue_lock/unlock on the same
rtdm_waitqueue_t to manage access to a shared data structure. One is an
rtdm_task_t that runs every 1 ms; the other is an IOCTL handler.

Problem: In some circumstances, one of the two functions acquires the
lock and accesses the shared data structure. But before the first function
releases the lock, the second function also appears to acquire the lock and
begins its own access of the shared data structure. The second function
releases the lock once its work is complete, and when the first function
then tries to release the lock, it gets an "already unlocked" error from
Xenomai:

[Xenomai] lock 80f10020 already unlocked on CPU #0
          last owner = kernel/xenomai/sched.c:908 (___xnsched_run(), CPU #0)
[<8010ed78>] (unwind_backtrace) from [<8010b5f0>] (show_stack+0x10/0x14)
[<8010b5f0>] (show_stack) from [<801c8c08>] (xnlock_dbg_release+0x12c/0x138)
[<801c8c08>] (xnlock_dbg_release) from [<801be110>] (___xnlock_put+0xc/0x38)
[<801be110>] (___xnlock_put) from [<7f000434>] (myengine_rtdm_waitqueue_unlock_with_num+0xf8/0x13c [engine_rtnet])
[<7f000434>] (myengine_rtdm_waitqueue_unlock_with_num [engine_rtnet]) from [<7f00ace8>] (engine_rtnet_periodic_task+0x604/0x660 [engine_rtnet])
[<7f00ace8>] (engine_rtnet_periodic_task [engine_rtnet]) from [<801c73ac>] (kthread_trampoline+0x68/0xa4)
[<801c73ac>] (kthread_trampoline) from [<80147190>] (kthread+0x108/0x110)
[<80147190>] (kthread) from [<80107cd4>] (ret_from_fork+0x18/0x24)


These waitqueues were originally mutexes, and the above-mentioned adept
committed this change to waitqueues seven years ago with the following
comment: "Use Wait Queue instead of Mutex, because Mutex can't be called
from the non-RT context."

We'd expect that once one of the functions obtains the lock on the
waitqueue, the other would be blocked until the first function releases the
lock. It's quite possible, likely really, that we don't understand the
differences between mutexes and waitqueues. We've looked at the online
Xenomai documentation on waitqueues, but we have not been enlightened.
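
To make that expectation concrete, here is a minimal user-space model in
plain C with POSIX threads. It is not our driver code and uses no Xenomai
API; the names shared_state, worker, and run_exclusion_model are
hypothetical. Two threads stand in for the periodic task and the IOCTL
handler, and the model records the largest number of threads ever inside
the critical section at once. With the semantics we assumed, that number
should never exceed 1.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for our shared data structure. */
struct shared_state {
    pthread_mutex_t lock;
    int holders;      /* threads currently inside the critical section */
    int max_holders;  /* highest value of holders ever observed */
    long counter;     /* payload that is only modified under the lock */
};

struct worker_arg {
    struct shared_state *st;
    int iterations;
};

/* Each worker models one of our two functions: lock, touch data, unlock. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;
    struct shared_state *st = arg->st;

    for (int i = 0; i < arg->iterations; i++) {
        pthread_mutex_lock(&st->lock);
        st->holders++;
        if (st->holders > st->max_holders)
            st->max_holders = st->holders;
        st->counter++;
        st->holders--;
        pthread_mutex_unlock(&st->lock);
    }
    return NULL;
}

/*
 * Runs two workers against one lock. Returns the largest number of
 * simultaneous lock holders seen, or -1 if a thread could not be created.
 * The final counter value is stored in *counter_out.
 */
int run_exclusion_model(int iterations, long *counter_out)
{
    struct shared_state st = { .holders = 0, .max_holders = 0, .counter = 0 };
    struct worker_arg arg = { &st, iterations };
    pthread_t a, b;

    pthread_mutex_init(&st.lock, NULL);
    if (pthread_create(&a, NULL, worker, &arg) != 0) {
        pthread_mutex_destroy(&st.lock);
        return -1;
    }
    if (pthread_create(&b, NULL, worker, &arg) != 0) {
        pthread_join(a, NULL);
        pthread_mutex_destroy(&st.lock);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_mutex_destroy(&st.lock);

    *counter_out = st.counter;
    return st.max_holders;
}
```

Calling run_exclusion_model(1000, &count) should report at most one holder
at a time and leave count at 2000. What we see in the driver looks as if
two holders were inside the critical section at once, which is the part we
cannot explain.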


Would you have any suggestions on things we should do (or not do) to figure
out what's going on?


Many thanks,
Matt
