Thank you Jan, Philippe. Your responses have given us a lot to look into and a lot to learn. We'll come back with a more detailed response once we've gained a little more understanding on our end.
Matt

On Wed, Mar 16, 2022 at 5:09 AM Philippe Gerum <r...@xenomai.org> wrote:
>
> Matt Klass via Xenomai <xenomai@xenomai.org> writes:
>
> > Using Xenomai 3.0.10, with kernel 4.9.128-05789, on armv7, we're having
> > problems with the functionality of rtdm_waitqueues. The code was written
> > by a Xenomai-adept developer who has since left for greener pastures.
> >
> > We have two functions that use rtdm_waitqueue_lock/unlock on the same
> > rtdm_waitqueue_t to manage access to a shared data structure. One is an
> > rtdm_task_t that runs periodically every 1ms; the second is an IOCTL
> > handler.
>
> Is that an RTDM non-rt ioctl() handler?
>
> > Problem: In some circumstances, one of the two functions will acquire
> > the lock and access the shared data structure. But before the first
> > function releases the lock, the second function seems to also acquire
> > the lock and begin its own access of the shared data structure. The
> > second function releases its lock after its work is complete, and then
> > when the first function tries to release the lock, it gets an "already
> > unlocked" error from Xenomai:
> >
> > [Xenomai] lock 80f10020 already unlocked on CPU #0
> >           last owner = kernel/xenomai/sched.c:908 (___xnsched_run(), CPU #0)
> > [<8010ed78>] (unwind_backtrace) from [<8010b5f0>] (show_stack+0x10/0x14)
> > [<8010b5f0>] (show_stack) from [<801c8c08>] (xnlock_dbg_release+0x12c/0x138)
> > [<801c8c08>] (xnlock_dbg_release) from [<801be110>] (___xnlock_put+0xc/0x38)
> > [<801be110>] (___xnlock_put) from [<7f000434>]
> > (myengine_rtdm_waitqueue_unlock_with_num+0xf8/0x13c [engine_rtnet])
> > [<7f000434>] (myengine_rtdm_waitqueue_unlock_with_num [engine_rtnet]) from
> > [<7f00ace8>] (engine_rtnet_periodic_task+0x604/0x660 [engine_rtnet])
> > [<7f00ace8>] (engine_rtnet_periodic_task [engine_rtnet]) from [<801c73ac>]
> > (kthread_trampoline+0x68/0xa4)
> > [<801c73ac>] (kthread_trampoline) from [<80147190>] (kthread+0x108/0x110)
> > [<80147190>] (kthread) from [<80107cd4>] (ret_from_fork+0x18/0x24)
>
> It is difficult to comment on this without seeing the whole code using
> the wait queue; there are several wait() calls for RTDM waitqueues. It
> is possible that the waitqueue construct is being misused.
>
> > These waitqueues were originally mutexes, and the above-mentioned adept
> > committed this change to waitqueues seven years ago with the following
> > comment: "Use Wait Queue instead of Mutex, because Mutex can't be called
> > from the non-RT context."
> >
> > We'd expect that once one of the functions obtains the lock on the
> > waitqueue, the other would be blocked until the first function releases
> > the lock. It's quite possible, likely really, that we don't understand
> > the differences between mutexes and waitqueues. We've looked at the
> > online Xenomai documentation on waitqueues, but we have not been
> > enlightened.
>
> RTDM mutexes follow the common POSIX mutex semantics, with priority
> inheritance forcibly enabled. On the other hand, waitqueues allow any
> number of threads to wait for an arbitrary condition, known only to the
> application, to be met.
>
> Strictly speaking, rtdm_waitqueue_lock/unlock is supposed to bind the
> condition check and the waitqueue access atomically together, in order
> to prevent wakeup signals from being missed (pretty much like the common
> POSIX mutex+condvar logic). Typically, this lock is taken by a waiter
> before it checks the condition and goes to sleep on the associated wq,
> and it is released atomically by the scheduler right before switching
> out that waiter while the condition is still unmet.
>
> So if this is about serializing all accesses to a user-defined shared
> memory, the wq semantics would not fit well, and waitqueue_lock/unlock
> would not serialize anything past the waitqueue handling code itself.
> > Would you have any suggestions on things we should do (or not do) to
> > figure out what's going on?
>
> If the idea is to serialize non-RT (ioctl_nrt handler?) vs RT contexts,
> then no RTDM synchronization object will do; these can only handle
> RT/RT serialization.
>
> Xenomai 3 cannot do write/write serialization between the non-RT and RT
> stages natively (Xenomai 4 can, via the so-called 'stax' objects, but
> this is not going to help you ATM, I guess). If this is read/write, and
> the non-RT ioctl handler is the reader, _and_ the shared data is fairly
> small, then you might resort to some kind of ad hoc sequence lock
> mechanism to implement this.
>
> --
> Philippe.