Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Philippe Gerum wrote:
>>>> Philippe Gerum wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Philippe Gerum wrote:
>>>>>>> Jan Kiszka wrote:
>>>>>>>> Philippe Gerum wrote:
>>>>>>>>> Jan Kiszka wrote:
>>>>>>>>>> commit 728fc8970e2032b3280971788f1223f3ad82d80d
>>>>>>>>>> Author: Jan Kiszka <jan.kis...@siemens.com>
>>>>>>>>>> Date:   Thu Jan 15 11:10:24 2009 +0100
>>>>>>>>>>
>>>>>>>>>>     xnpipe: Fix racy callback handlers
>>>>>>>>>>     
>>>>>>>>>>     Invocation of the input, output and alloc handlers must take
>>>>>>>>>>     place under nklock to properly synchronize with
>>>>>>>>>>     xnpipe_disconnect. Change all callers to comply with this
>>>>>>>>>>     policy.
>>>>>>>>>>
>>>>>>>>> That one is under investigation. I agree on the bug report (thanks
>>>>>>>>> btw), but I disagree on the fix. Basically, we can't run all hooks
>>>>>>>>> under nklock. For instance, the alloc_handler may issue kmalloc()
>>>>>>>>> calls when invoked from the Linux write endpoint.
>>>>>>>> You mean it /could/? Because no in-tree user (i.e. native) calls
>>>>>>>> rt-unsafe services from its alloc_handler.
>>>>>>>>
>>>>>>> When you export a public interface, it is better not to make it
>>>>>>> incompatible unless there is no other way to fix a situation. Doing
>>>>>>> so is a last resort for me.
>>>>>> OTOH, there is nothing documented yet about those callback handlers or
>>>>>> xnpipe_connect. So we could only break _assumptions_ about this
>>>>>> interface.
>>>>> Actually, we would break existing implementations, but I understand your 
>>>>> point.
>>>>>
>>>>>> But, of course, I would be happy if we could continue to keep
>>>>>> the critical section length short. I just don't see how to achieve this
>>>>>> without significant restrictions on the callback handlers and their use
>>>>>> cases.
>>>>>>
>>>>> That is because the semantics of those callouts is not that smart. If
>>>>> we have to break the API to solve the locking issue, I want to get the
>>>>> semantics fixed in the same move (which may help a lot in solving the
>>>>> locking issue as well), so we don't end up with two subsequent
>>>>> breakages.
>>>>>
>>>> As suspected, the callback management in the message pipe code had very
>>>> serious issues, including the xnpipe_disconnect / xnpipe_release race.
>>>> At least:
>>>>
>>>> - the latency would simply skyrocket when the input queue (i.e. userland
>>>> to kernel) is flushed from xnpipe_cleanup_user_conn, because the latter
>>>> holds the nklock with interrupts off when doing so. I've tested this
>>>> case on v2.4.x; the results of a latency test while the pipe test code
>>>> is running in parallel are not pretty (i.e. > 220 us latency spot).
>>>>
>>>> - message buffers waiting in the output queue while the latter is
>>>> flushed would NOT be returned to the memory pool when an input handler
>>>> is present, which is the case with the native API that provides one.
>>>> All the confusion stemmed from the output notification and free-buffer
>>>> logic being combined into the output handler. This went unnoticed
>>>> probably because messages sent via the write() endpoint are immediately
>>>> consumed by the receiving side in an RT task that preempts, but still,
>>>> the bug is there.
>>>>
>>>> The disconnect vs release race can be fixed in a (nearly) lockless way
>>>> using a lingering close sequence, which prevents xnpipe_disconnect()
>>>> from wiping out the extra state (e.g. the RT_PIPE struct) until
>>>> xnpipe_release() no longer needs it. In short, if the userland side is
>>>> connected, xnpipe_disconnect() will defer the actual release of that
>>>> extra state to xnpipe_release(); otherwise, it will discard it
>>>> immediately. A new callback, namely .release, has been added for that
>>>> purpose.
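
For the archives, the lingering close described above boils down to
something like the simplified sketch below. The status bits, callbacks and
helpers (XNPIPE_USER_CONN, XNPIPE_KERN_LCLOSE, ops.release,
xnpipe_minor_free, xnminor_from_state) are the ones visible in the patch
further down; the function name and everything else are illustrative only
and untested:

/* Simplified sketch of the disconnect side of the lingering close. */
static void xnpipe_disconnect_sketch(struct xnpipe_state *state)
{
	spl_t s;

	xnlock_get_irqsave(&nklock, s);

	if (testbits(state->status, XNPIPE_USER_CONN)) {
		/*
		 * The userland endpoint is still open: keep the extra
		 * state (e.g. the RT_PIPE struct) around; xnpipe_release()
		 * will run ops.release and free the minor once the last
		 * file reference goes away.
		 */
		__setbits(state->status, XNPIPE_KERN_LCLOSE);
		xnlock_put_irqrestore(&nklock, s);
	} else {
		/* Nobody left on the Linux side, release immediately. */
		xnlock_put_irqrestore(&nklock, s);
		state->ops.release(state->xstate);
		xnpipe_minor_free(xnminor_from_state(state));
	}
}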
>>> Just had a short glance so far. Looks good - except that there is still
>>> a use (cookie, now xstate, from xnpipe_state) after release
>>> (xnpipe_minor_free). Can't we push the minor_free after calling the
>>> release handler?
>>>
>> We should indeed postpone this just in case the upper layer indexes the extra
>> state on the minor value. We can also simplify a few things doing so.
>>
>> --- ksrc/nucleus/pipe.c      (revision 4565)
>> +++ ksrc/nucleus/pipe.c      (working copy)
>> @@ -77,11 +77,9 @@
>>
>>  static inline void xnpipe_minor_free(int minor)
>>  {
>> -    if (minor < 0 || minor >= XNPIPE_NDEVS)
>> -            return;
>> -
>> -    __clrbits(xnpipe_bitmap[minor / BITS_PER_LONG],
>> -              1UL << (minor % BITS_PER_LONG));
> 
> xnarch_write_memory_barrier()? Just for the paranoiacs.
>

Yes, we should be that paranoid.
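
Something along these lines, I suppose (a sketch only, assuming the point
is to make the .release side effects visible before the minor can be
handed out again; untested):

static inline void xnpipe_minor_free(int minor)
{
	/* May be called with nklock free. */
	xnarch_write_memory_barrier(); /* Flush prior updates (e.g. from
					  ops.release) before the minor
					  becomes allocatable again. */
	clrbits(xnpipe_bitmap[minor / BITS_PER_LONG],
		1UL << (minor % BITS_PER_LONG));
}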

>> +    /* May be called with nklock free. */
>> +    clrbits(xnpipe_bitmap[minor / BITS_PER_LONG],
>> +            1UL << (minor % BITS_PER_LONG));
>>  }
>>
>>  static inline void xnpipe_enqueue_wait(struct xnpipe_state *state, int mask)
>> @@ -413,9 +411,9 @@
>>              __setbits(state->status, XNPIPE_KERN_LCLOSE);
>>              xnlock_put_irqrestore(&nklock, s);
>>      } else {
>> -            xnpipe_minor_free(minor);
>>              xnlock_put_irqrestore(&nklock, s);
>>              state->ops.release(state->xstate);
>> +            xnpipe_minor_free(minor);
>>      }
>>
>>      if (need_sched)
>> @@ -621,9 +619,9 @@
>>              __clrbits((__state)->status, XNPIPE_USER_CONN);         \
>>              if (testbits((__state)->status, XNPIPE_KERN_LCLOSE)) {  \
>>                      clrbits((__state)->status, XNPIPE_KERN_LCLOSE); \
>> -                    xnpipe_minor_free(xnminor_from_state(__state)); \
>>                      xnlock_put_irqrestore(&nklock, (__s));          \
>>                      (__state)->ops.release((__state)->xstate);      \
>> +                    xnpipe_minor_free(xnminor_from_state(__state)); \
>>                      xnlock_get_irqsave(&nklock, (__s));             \
>>              }                                                       \
>>      } while(0)
>>
> 
> Jan
> 


-- 
Philippe.

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
