Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Philippe Gerum wrote:
>>>> Philippe Gerum wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Philippe Gerum wrote:
>>>>>>> Jan Kiszka wrote:
>>>>>>>> Philippe Gerum wrote:
>>>>>>>>> Jan Kiszka wrote:
>>>>>>>>>> commit 728fc8970e2032b3280971788f1223f3ad82d80d
>>>>>>>>>> Author: Jan Kiszka <jan.kis...@siemens.com>
>>>>>>>>>> Date:   Thu Jan 15 11:10:24 2009 +0100
>>>>>>>>>>
>>>>>>>>>>     xnpipe: Fix racy callback handlers
>>>>>>>>>>
>>>>>>>>>>     Invocation of the input, output and alloc handlers must take
>>>>>>>>>>     place under nklock to properly synchronize with
>>>>>>>>>>     xnpipe_disconnect. Change all callers to comply with this
>>>>>>>>>>     policy.
>>>>>>>>>>
>>>>>>>>> That one is under investigation. I agree on the bug report (thanks
>>>>>>>>> btw), but I disagree on the fix. Basically, we can't run all hooks
>>>>>>>>> under nklock. For instance, the alloc_handler may issue kmalloc()
>>>>>>>>> calls when invoked from the Linux write endpoint.
>>>>>>>>
>>>>>>>> You mean it /could/? Because no in-tree user (i.e. native) calls
>>>>>>>> rt-unsafe services from its alloc_handler.
>>>>>>>>
>>>>>>> When you export a public interface, it is better not to make it
>>>>>>> incompatible unless there is no other way to fix a situation. Doing
>>>>>>> so is a last resort for me.
>>>>>>
>>>>>> OTOH, there is nothing documented yet about those callback handlers
>>>>>> or xnpipe_connect. So we could only break _assumptions_ about this
>>>>>> interface.
>>>>>
>>>>> Actually, we would break existing implementations, but I understand
>>>>> your point.
>>>>>
>>>>>> But, of course, I would be happy if we could continue to keep the
>>>>>> critical section length short. I just don't see how to achieve this
>>>>>> without significant restrictions on the callback handlers and their
>>>>>> use cases.
>>>>>>
>>>>> That is because the semantics of those callouts is not that smart. If
>>>>> we have to break the API to solve the locking issue, I want to get
>>>>> the semantics fixed in the same move (which may help a lot in solving
>>>>> the locking issue as well), so we don't end up with two subsequent
>>>>> breakages.
>>>>>
>>>> As suspected, the callback management in the message pipe code had
>>>> very serious issues, including the xnpipe_disconnect / xnpipe_release
>>>> race. At least:
>>>>
>>>> - the latency would simply skyrocket when the input queue (i.e.
>>>> userland to kernel) is flushed from xnpipe_cleanup_user_conn, because
>>>> the latter holds the nklock with interrupts off when doing so. I've
>>>> tested this case on v2.4.x; the results of a latency test while the
>>>> pipe test code is running in parallel are not pretty (i.e. a > 220 us
>>>> latency spot).
>>>>
>>>> - message buffers waiting in the output queue while the latter is
>>>> flushed would NOT be returned to the memory pool when an input handler
>>>> is present, which is the case with the native API that provides one.
>>>> All the confusion came from the output notification and free-buffer
>>>> logic being combined into the output handler. This went unnoticed
>>>> probably because messages sent via the write() endpoint are
>>>> immediately consumed by the receiving side in an RT task that
>>>> preempts, but still, the bug is there.
>>>>
>>>> The disconnect vs release race can be fixed in a (nearly) lockless way
>>>> using a lingering close sequence, which prevents xnpipe_disconnect()
>>>> from wiping out the extra state (e.g. the RT_PIPE struct) until
>>>> xnpipe_release() no longer needs it. In short, if the userland side is
>>>> connected, xnpipe_disconnect() will defer the actual release of that
>>>> extra state to xnpipe_release(); otherwise, it will discard it
>>>> immediately. A new callback, namely .release, has been added for that
>>>> purpose.
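To make the lingering-close sequence above concrete, here is a minimal
sketch of the scheme (simplified, hypothetical names, generic spinlock
primitives, not the actual pipe.c code): whichever endpoint goes away
last ends up running the .release hook that drops the extra state.

#include <linux/spinlock.h>

/*
 * Hypothetical sketch of a lingering close, not the actual pipe.c
 * implementation. The .release hook frees the extra state owned by
 * the kernel-side API (e.g. the RT_PIPE struct).
 */
struct pipe_state {
	int user_connected;	/* userland endpoint still open */
	int kern_lclose;	/* kernel side gone, close lingers */
	void (*release)(void *xstate);
	void *xstate;		/* extra state, e.g. the RT_PIPE */
};

static DEFINE_SPINLOCK(lock);

/* Kernel side tears the connection down. */
static void pipe_disconnect(struct pipe_state *s)
{
	int defer;

	spin_lock(&lock);
	defer = s->user_connected;
	if (defer)
		/* Userland still holds the device open: leave the
		   extra state alone and flag a lingering close. */
		s->kern_lclose = 1;
	spin_unlock(&lock);

	if (!defer)
		s->release(s->xstate);	/* nobody left, discard now */
}

/* Last close() on the userland side. */
static void pipe_release(struct pipe_state *s)
{
	int lingering;

	spin_lock(&lock);
	s->user_connected = 0;
	lingering = s->kern_lclose;
	s->kern_lclose = 0;
	spin_unlock(&lock);

	if (lingering)
		/* The kernel side left first; the extra state was
		   kept alive for us, drop it now. */
		s->release(s->xstate);
}

Note that in both paths the hook runs with the lock dropped, which is
what keeps the critical sections short.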
>>> Just had a short glance so far. Looks good - except that there is
>>> still a use (the cookie, now xstate, from xnpipe_state) after release
>>> (xnpipe_minor_free). Can't we push the minor_free after calling the
>>> release handler?
>>>
>> We should indeed postpone this, just in case the upper layer indexes
>> the extra state on the minor value. We can also simplify a few things
>> by doing so.
>>
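To make the hazard concrete, here is a short sketch (hypothetical names,
generic kernel primitives, not pipe.c itself). If the minor bit were
cleared before the release hook runs, a concurrent connect could grab
the same minor and install fresh state while the hook still indexes the
old one by minor; hence the hook first, the bit flip last. The actual
patch follows below.

#include <linux/bitops.h>

#define NDEVS	32
static unsigned long minor_bitmap[BITS_TO_LONGS(NDEVS)];
static void *xstate_by_minor[NDEVS];	/* e.g. RT_PIPE pointers */

static void pipe_teardown(int minor, void (*release)(void *xstate))
{
	/* The hook may still look up xstate_by_minor[minor]. */
	release(xstate_by_minor[minor]);
	xstate_by_minor[minor] = NULL;

	/*
	 * Only now hand the minor back to the allocator. The write
	 * barrier (a generic stand-in for the
	 * xnarch_write_memory_barrier() discussed below) keeps the
	 * state reset visible before the bit flip allows reuse.
	 */
	smp_wmb();
	clear_bit(minor, minor_bitmap);
}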
>> --- ksrc/nucleus/pipe.c	(revision 4565)
>> +++ ksrc/nucleus/pipe.c	(working copy)
>> @@ -77,11 +77,9 @@
>>
>>  static inline void xnpipe_minor_free(int minor)
>>  {
>> -	if (minor < 0 || minor >= XNPIPE_NDEVS)
>> -		return;
>> -
>> -	__clrbits(xnpipe_bitmap[minor / BITS_PER_LONG],
>> -		  1UL << (minor % BITS_PER_LONG));
>
> xnarch_write_memory_barrier()? Just for the paranoiacs.
>

Yes, we should be that paranoid.

>> +	/* May be called with nklock free. */
>> +	clrbits(xnpipe_bitmap[minor / BITS_PER_LONG],
>> +		1UL << (minor % BITS_PER_LONG));
>>  }
>>
>>  static inline void xnpipe_enqueue_wait(struct xnpipe_state *state, int mask)
>> @@ -413,9 +411,9 @@
>>  		__setbits(state->status, XNPIPE_KERN_LCLOSE);
>>  		xnlock_put_irqrestore(&nklock, s);
>>  	} else {
>> -		xnpipe_minor_free(minor);
>>  		xnlock_put_irqrestore(&nklock, s);
>>  		state->ops.release(state->xstate);
>> +		xnpipe_minor_free(minor);
>>  	}
>>
>>  	if (need_sched)
>> @@ -621,9 +619,9 @@
>>  	__clrbits((__state)->status, XNPIPE_USER_CONN);			\
>>  	if (testbits((__state)->status, XNPIPE_KERN_LCLOSE)) {		\
>>  		clrbits((__state)->status, XNPIPE_KERN_LCLOSE);	\
>> -		xnpipe_minor_free(xnminor_from_state(__state));		\
>>  		xnlock_put_irqrestore(&nklock, (__s));			\
>>  		(__state)->ops.release((__state)->xstate);		\
>> +		xnpipe_minor_free(xnminor_from_state(__state));		\
>>  		xnlock_get_irqsave(&nklock, (__s));			\
>>  	}								\
>>  } while(0)
>>
>
> Jan
>

-- 
Philippe.

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core