Philippe Gerum wrote: > Jan Kiszka wrote: >> Philippe Gerum wrote: >>> Philippe Gerum wrote: >>>> Jan Kiszka wrote: >>>>> Philippe Gerum wrote: >>>>>> Jan Kiszka wrote: >>>>>>> Philippe Gerum wrote: >>>>>>>> Jan Kiszka wrote: >>>>>>>>> commit 728fc8970e2032b3280971788f1223f3ad82d80d >>>>>>>>> Author: Jan Kiszka <jan.kis...@siemens.com> >>>>>>>>> Date: Thu Jan 15 11:10:24 2009 +0100 >>>>>>>>> >>>>>>>>> xnpipe: Fix racy callback handlers >>>>>>>>> >>>>>>>>> Invocation of input, output and alloc handler must take place >>>>>>>>> under >>>>>>>>> nklock to properly synchronize with xnpipe_disconnect. Change all >>>>>>>>> callers to comply with this policy. >>>>>>>>> >>>>>>>> That one is under investigation. I agree on the bug report (thanks >>>>>>>> btw), but I >>>>>>>> disagree on the fix. Basically, we can't run all hooks under nklock. >>>>>>>> For >>>>>>>> instance, the alloc_handler may issue kmalloc() calls when issued from >>>>>>>> the Linux >>>>>>>> write endpoint. >>>>>>> You mean it /could/? Because no in-tree user (ie. native) calls >>>>>>> rt-unsafe services from its alloc_handler. >>>>>>> >>>>>> When you export a public interface, it is better not to make it >>>>>> incompatible >>>>>> unless there is no other way to fix a situation. Doing so is last resort >>>>>> for me. >>>>> OTH, there is nothing documented yet about those callback handlers or >>>>> xnpipe_connect. So we could only break _assumptions_ about this >>>>> interface. >>>> Actually, we would break existing implementations, but I understand your >>>> point. >>>> >>>> But, of course, I would be happy if we could continue to keep >>>>> the critical section length short. I just don't see how to achieve this >>>>> without significant restrictions on the callback handlers and their use >>>>> cases. >>>>> >>>> That is because the semantics of those callouts is not that smart. If we >>>> have to >>>> break the API to solve the locking issue, I want to get the semantics >>>> fixed in >>>> the same move (which may help a lot in solving the locking issue as well), >>>> so we >>>> don't end up with two subsequent breakage. >>>> >>> As suspected, the callback management in the message pipe code had very >>> serious >>> issues, including the xnpipe_disconnect / xnpipe_release race. At least: >>> >>> - the latency would simply skyrocket when the input queue (i.e. userland to >>> kernel) is flushed from xnpipe_cleanup_user_conn, because the latter holds >>> the >>> nklock with interrupts off when doing so. I've tested this case on v2.4.x, >>> the >>> results on a latency test when the pipe test code is running in parallel >>> are not >>> pretty (i.e. > 220 us latency spot). >>> >>> - message buffers waiting in the output queue while the latter is flushed >>> would >>> NOT be returned to the memory pool when an input handler is present, which >>> is >>> the case with the native API that provides one. All the confusion went from >>> the >>> output notification and free buffer logic that were combined into the output >>> handler. This went unnoticed probably because messages sent via the write() >>> endpoint are immediately consumed by the receiving side in a RT task that >>> preempts, but still, the bug is there. >>> >>> The disconnect vs release race can be fixed in a (nearly) lockless way >>> using a >>> lingering close sequence, that prevents xnpipe_disconnect() to wipe out the >>> extra state (e.g. RT_PIPE struct) until xnpipe_release() does not need it >>> anymore. In short, if the userland side is connected, xnpipe_disconnect() >>> will >>> defer the actual release of that extra state to xnpipe_release(); >>> otherwise, it >>> will discard it immediately. A new callback, namely .release has been added >>> for >>> that purpose. >> Just had a short glance so far. Looks good - except that there is still >> a use (cookie, now xstate, from xnpipe_state) after release >> (xnpipe_minor_free). Can't we push the minor_free after calling the >> release handler? >> > > We should indeed postpone this just in case the upper layer indexes the extra > state on the minor value. We can also simplify a few things doing so. > > --- ksrc/nucleus/pipe.c (revision 4565) > +++ ksrc/nucleus/pipe.c (working copy) > @@ -77,11 +77,9 @@ > > static inline void xnpipe_minor_free(int minor) > { > - if (minor < 0 || minor >= XNPIPE_NDEVS) > - return; > - > - __clrbits(xnpipe_bitmap[minor / BITS_PER_LONG], > - 1UL << (minor % BITS_PER_LONG));
xnarch_write_memory_barrier()? Just for the paranoiacs. > + /* May be called with nklock free. */ > + clrbits(xnpipe_bitmap[minor / BITS_PER_LONG], > + 1UL << (minor % BITS_PER_LONG)); > } > > static inline void xnpipe_enqueue_wait(struct xnpipe_state *state, int mask) > @@ -413,9 +411,9 @@ > __setbits(state->status, XNPIPE_KERN_LCLOSE); > xnlock_put_irqrestore(&nklock, s); > } else { > - xnpipe_minor_free(minor); > xnlock_put_irqrestore(&nklock, s); > state->ops.release(state->xstate); > + xnpipe_minor_free(minor); > } > > if (need_sched) > @@ -621,9 +619,9 @@ > __clrbits((__state)->status, XNPIPE_USER_CONN); \ > if (testbits((__state)->status, XNPIPE_KERN_LCLOSE)) { \ > clrbits((__state)->status, XNPIPE_KERN_LCLOSE); \ > - xnpipe_minor_free(xnminor_from_state(__state)); \ > xnlock_put_irqrestore(&nklock, (__s)); \ > (__state)->ops.release((__state)->xstate); \ > + xnpipe_minor_free(xnminor_from_state(__state)); \ > xnlock_get_irqsave(&nklock, (__s)); \ > } \ > } while(0) > Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux _______________________________________________ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core