Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Philippe Gerum wrote:
>>> Jan Kiszka wrote:
>>>> Philippe Gerum wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Philippe Gerum wrote:
>>>>>>> Jan Kiszka wrote:
>>>>>>>> commit 728fc8970e2032b3280971788f1223f3ad82d80d
>>>>>>>> Author: Jan Kiszka <jan.kis...@siemens.com>
>>>>>>>> Date:   Thu Jan 15 11:10:24 2009 +0100
>>>>>>>>
>>>>>>>>     xnpipe: Fix racy callback handlers
>>>>>>>>
>>>>>>>>     Invocation of input, output and alloc handler must take place
>>>>>>>>     under nklock to properly synchronize with xnpipe_disconnect.
>>>>>>>>     Change all callers to comply with this policy.
>>>>>>>>
>>>>>>> That one is under investigation. I agree on the bug report (thanks
>>>>>>> btw), but I disagree on the fix. Basically, we can't run all hooks
>>>>>>> under nklock. For instance, the alloc_handler may issue kmalloc()
>>>>>>> calls when invoked from the Linux write endpoint.
>>>>>>
>>>>>> You mean it /could/? Because no in-tree user (i.e. native) calls
>>>>>> rt-unsafe services from its alloc_handler.
>>>>>>
>>>>> When you export a public interface, it is better not to make it
>>>>> incompatible unless there is no other way to fix a situation. Doing
>>>>> so is a last resort for me.
>>>>
>>>> OTOH, there is nothing documented yet about those callback handlers or
>>>> xnpipe_connect. So we could only break _assumptions_ about this
>>>> interface.
>>>
>>> Actually, we would break existing implementations, but I understand
>>> your point.
>>>
>>>> But, of course, I would be happy if we could continue to keep the
>>>> critical section length short. I just don't see how to achieve this
>>>> without significant restrictions on the callback handlers and their
>>>> use cases.
>>>>
>>> That is because the semantics of those callouts are not that smart. If
>>> we have to break the API to solve the locking issue, I want to get the
>>> semantics fixed in the same move (which may help a lot in solving the
>>> locking issue as well), so we don't end up with two subsequent
>>> breakages.
>>>
>> As suspected, the callback management in the message pipe code had very
>> serious issues, including the xnpipe_disconnect / xnpipe_release race.
>> At least:
>>
>> - the latency would simply skyrocket when the input queue (i.e. userland
>> to kernel) is flushed from xnpipe_cleanup_user_conn, because the latter
>> holds the nklock with interrupts off while doing so. I've tested this
>> case on v2.4.x; the results of a latency test while the pipe test code
>> runs in parallel are not pretty (i.e. a > 220 us latency spot).
>>
>> - message buffers waiting in the output queue while the latter is
>> flushed would NOT be returned to the memory pool when an input handler
>> is present, which is the case with the native API that provides one.
>> All the confusion stemmed from the output notification and the
>> free-buffer logic being combined into the output handler. This went
>> unnoticed probably because messages sent via the write() endpoint are
>> immediately consumed by the receiving side in an RT task that preempts,
>> but still, the bug is there.
>>
>> The disconnect vs release race can be fixed in a (nearly) lockless way
>> using a lingering close sequence, which prevents xnpipe_disconnect()
>> from wiping out the extra state (e.g. the RT_PIPE struct) before
>> xnpipe_release() is done with it. In short, if the userland side is
>> connected, xnpipe_disconnect() will defer the actual release of that
>> extra state to xnpipe_release(); otherwise, it will discard it
>> immediately. A new callback, namely .release, has been added for that
>> purpose.
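To make the sequence above concrete, here is a condensed sketch of how
the two close paths interlock -- this is not the verbatim tree code;
wait queue flushing, error handling and rescheduling are left out:

/* Condensed sketch of the lingering close handshake. */

/* Kernel-side close (e.g. rt_pipe_delete -> xnpipe_disconnect). */
static void disconnect_path(struct xnpipe_state *state)
{
	spl_t s;

	xnlock_get_irqsave(&nklock, s);

	if (testbits(state->status, XNPIPE_USER_CONN)) {
		/*
		 * Userland still holds the device open: just flag a
		 * lingering close; xnpipe_release() will run the final
		 * .release callout when the last file reference drops.
		 */
		__setbits(state->status, XNPIPE_KERN_LCLOSE);
		xnlock_put_irqrestore(&nklock, s);
	} else {
		/* No user endpoint: discard the extra state at once,
		   calling back outside of nklock. */
		xnlock_put_irqrestore(&nklock, s);
		state->ops.release(state->xstate);
		xnpipe_minor_free(xnminor_from_state(state));
	}
}

/* User-side close (release fop on the message pipe device). */
static void release_path(struct xnpipe_state *state)
{
	spl_t s;

	xnlock_get_irqsave(&nklock, s);
	__clrbits(state->status, XNPIPE_USER_CONN);

	if (testbits(state->status, XNPIPE_KERN_LCLOSE)) {
		/* The kernel side already left; perform the
		   deferred release now. */
		clrbits(state->status, XNPIPE_KERN_LCLOSE);
		xnlock_put_irqrestore(&nklock, s);
		state->ops.release(state->xstate);
		xnpipe_minor_free(xnminor_from_state(state));
		return;
	}

	xnlock_put_irqrestore(&nklock, s);
}

Whichever endpoint goes away last performs the deferred release, and the
callout always runs outside of nklock, which is what keeps the critical
sections short.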
>
> Just had a short glance so far. Looks good - except that there is still
> a use (cookie, now xstate, from xnpipe_state) after release
> (xnpipe_minor_free). Can't we push the minor_free after calling the
> release handler?
>

We should indeed postpone this, just in case the upper layer indexes the
extra state on the minor value. We can also simplify a few things while
doing so.
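To illustrate the hazard with a purely hypothetical upper layer --
my_pipe_table, struct my_pipe_state and my_release_handler are made-up
names, not in-tree code -- a skin that keys its extra state on the minor
would be exposed like this:

/* Hypothetical upper-layer code, for illustration only. */
static struct my_pipe_state *my_pipe_table[XNPIPE_NDEVS];

static void my_release_handler(void *xstate)
{
	struct my_pipe_state *ps = xstate;

	/*
	 * If xnpipe_minor_free() ran before this callout, a
	 * concurrent xnpipe_connect() may already have reused
	 * ps->minor and repopulated my_pipe_table[ps->minor],
	 * so the line below would wipe out somebody else's slot.
	 */
	my_pipe_table[ps->minor] = NULL;
	kfree(ps);
}

Hence the reordering below, which marks the minor as reusable only once
the .release callout has returned: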
--- ksrc/nucleus/pipe.c	(revision 4565)
+++ ksrc/nucleus/pipe.c	(working copy)
@@ -77,11 +77,9 @@
 
 static inline void xnpipe_minor_free(int minor)
 {
-	if (minor < 0 || minor >= XNPIPE_NDEVS)
-		return;
-
-	__clrbits(xnpipe_bitmap[minor / BITS_PER_LONG],
-		  1UL << (minor % BITS_PER_LONG));
+	/* May be called with nklock free. */
+	clrbits(xnpipe_bitmap[minor / BITS_PER_LONG],
+		1UL << (minor % BITS_PER_LONG));
 }
 
 static inline void xnpipe_enqueue_wait(struct xnpipe_state *state, int mask)
@@ -413,9 +411,9 @@
 		__setbits(state->status, XNPIPE_KERN_LCLOSE);
 		xnlock_put_irqrestore(&nklock, s);
 	} else {
-		xnpipe_minor_free(minor);
 		xnlock_put_irqrestore(&nklock, s);
 		state->ops.release(state->xstate);
+		xnpipe_minor_free(minor);
 	}
 
 	if (need_sched)
@@ -621,9 +619,9 @@
 		__clrbits((__state)->status, XNPIPE_USER_CONN);		\
 		if (testbits((__state)->status, XNPIPE_KERN_LCLOSE)) {	\
 			clrbits((__state)->status, XNPIPE_KERN_LCLOSE);	\
-			xnpipe_minor_free(xnminor_from_state(__state));	\
 			xnlock_put_irqrestore(&nklock, (__s));		\
 			(__state)->ops.release((__state)->xstate);	\
+			xnpipe_minor_free(xnminor_from_state(__state));	\
 			xnlock_get_irqsave(&nklock, (__s));		\
 		}							\
 	} while(0)

>> The internal xnpipe API had to be fixed with respect to the new set of
>> callbacks it accepts, but the overall logic remained the same. The
>> handlers were simply made more consistent with what the message pipe
>> machinery expects from them. There is no user-visible change.
>>
>> I have committed the changes that fix all the issues above; the new
>> code survived significant stress testing here, which only means that
>> this seems to be a reasonable framework to start working with.
>
> Will see that I can quickly integrate this in our environment to
> broaden the test base.
>

Great, thanks.

>> Given the severity of the bugs, I definitely plan to backport those
>> changes to v2.4.x at some point (the earlier, the better), since we
>> could not reasonably live with such issues in a stable version.
>>
>
> Yep, would be good.
>
> Thanks,
> Jan
>

--
Philippe.

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core