On 06/30/2015 08:40 PM, Jan Kiszka wrote:
> On 2015-06-27 15:05, Philippe Gerum wrote:
>> On 06/27/2015 12:59 PM, Gilles Chanteperdrix wrote:
>>> On Sat, Jun 27, 2015 at 12:50:35PM +0200, Philippe Gerum wrote:
>>>> On 06/27/2015 12:03 PM, Gilles Chanteperdrix wrote:
>>>>> On Fri, Jun 26, 2015 at 10:17:21PM +0200, Jan Kiszka wrote:
>>>>>> On 2015-06-26 18:01, Gilles Chanteperdrix wrote:
>>>>>>> On Fri, Jun 26, 2015 at 04:12:44PM +0200, git repository hosting wrote:
>>>>>>>> Module: xenomai-jki
>>>>>>>> Branch: for-forge
>>>>>>>> Commit: 67a64e164b737759cc171d3b04b05770e1450cb6
>>>>>>>> URL:    
>>>>>>>> http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=67a64e164b737759cc171d3b04b05770e1450cb6
>>>>>>>>
>>>>>>>> Author: Jan Kiszka <[email protected]>
>>>>>>>> Date:   Fri Jun 26 15:11:42 2015 +0200
>>>>>>>>
>>>>>>>> cobalt/kernel: Fix locking for xnthread info manipulations
>>>>>>>>
>>>>>>>> nklock must be held when manipulating bits of xnthread::info. Not all
>>>>>>>> call sites of xnthread_set/clear_info follow this rule so far, directly
>>>>>>>> or indirectly; fix them (and possibly some other races along the way).
>>>>>>>>
>>>>>>>> Signed-off-by: Jan Kiszka <[email protected]>
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>>  include/cobalt/kernel/rtdm/driver.h |    8 ++++----
>>>>>>>>  kernel/cobalt/posix/syscall.c       |    5 +++++
>>>>>>>>  kernel/cobalt/synch.c               |    2 ++
>>>>>>>>  kernel/cobalt/thread.c              |    5 +++++
>>>>>>>>  4 files changed, 16 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/include/cobalt/kernel/rtdm/driver.h 
>>>>>>>> b/include/cobalt/kernel/rtdm/driver.h
>>>>>>>> index c14198b..de476ca 100644
>>>>>>>> --- a/include/cobalt/kernel/rtdm/driver.h
>>>>>>>> +++ b/include/cobalt/kernel/rtdm/driver.h
>>>>>>>> @@ -553,7 +553,7 @@ static inline void rtdm_lock_get(rtdm_lock_t *lock)
>>>>>>>>  {
>>>>>>>>        XENO_BUG_ON(COBALT, !spltest());
>>>>>>>>        spin_lock(lock);
>>>>>>>> -      __xnsched_lock();
>>>>>>>> +      xnsched_lock();
>>>>>>>>  }
>>>>>>>>  
>>>>>>>>  /**
>>>>>>>> @@ -566,7 +566,7 @@ static inline void rtdm_lock_get(rtdm_lock_t *lock)
>>>>>>>>  static inline void rtdm_lock_put(rtdm_lock_t *lock)
>>>>>>>>  {
>>>>>>>>        spin_unlock(lock);
>>>>>>>> -      __xnsched_unlock();
>>>>>>>> +      xnsched_unlock();
>>>>>>>>  }
>>>>>>>>  
>>>>>>>>  /**
>>>>>>>> @@ -584,7 +584,7 @@ static inline rtdm_lockctx_t 
>>>>>>>> __rtdm_lock_get_irqsave(rtdm_lock_t *lock)
>>>>>>>>  
>>>>>>>>        context = ipipe_test_and_stall_head();
>>>>>>>>        spin_lock(lock);
>>>>>>>> -      __xnsched_lock();
>>>>>>>> +      xnsched_lock();
>>>>>>>>  
>>>>>>>>        return context;
>>>>>>>>  }
>>>>>>>> @@ -603,7 +603,7 @@ static inline
>>>>>>>>  void rtdm_lock_put_irqrestore(rtdm_lock_t *lock, rtdm_lockctx_t 
>>>>>>>> context)
>>>>>>>>  {
>>>>>>>>        spin_unlock(lock);
>>>>>>>> -      __xnsched_unlock();
>>>>>>>> +      xnsched_unlock();
>>>>>>>>        ipipe_restore_head(context);
>>>>>>>>  }
>>>>>>>
>>>>>>> These changes are not without risk: isn't there a risk of deadlock
>>>>>>> if one thread calls rtdm_lock_get without holding the nklock and
>>>>>>> another calls it while holding it?
>>>>>>
>>>>>> That could happen, although the "normal" pattern does not involve
>>>>>> explicit nklock acquisition when nesting is used; rather, rtdm_lock is
>>>>>> taken before calling nklock-acquiring services. Not a reasonable
>>>>>> ordering either, but common practice now.
>>>>>>
>>>>>> We could also try to rework the whole scheduler lock so that it avoids
>>>>>> unsafe thread fields, ideally converting it to something more
>>>>>> lightweight, like Linux's preempt_disable/enable().
>>>>>>
>>>>>> BTW, xnsched has two types of flags fields: status requires the
>>>>>> nklock, while lflags must only be modified locally and is thus fine
>>>>>> with interrupt disabling as protection.
>>>>>
>>>>> The sched lock uses lflags. The only point where we need to know
>>>>> that a thread held the scheduler lock is when it is switched in
>>>>> after having voluntarily suspended. So, maybe a solution would be to
>>>>> set that state bit only when a thread holding the scheduler lock
>>>>> voluntarily suspends. That would make the scheduler lock quite
>>>>> lightweight (refcount + scheduler lflags bit).
>>>>>
>>>>
>>>> - XNINLOCK is overkill; it is only tested from xnsched_run() to skip
>>>> the resched procedure as an optimization. As we may access
>>>> sched->lflags in that situation, we may also access
>>>> sched->curr->lock_count to figure this out.
>>>>
>>>> - XNLOCK and xnthread->lock_count are redundant when combined to track
>>>> the lock state internally. This is part of what makes xnsched_lock() and
>>>> friends uselessly complex (the potential for deadlocks you mentioned
>>>> about the RTDM locking changes is definitely a concern; dropping the
>>>> requirement for grabbing the nklock there is very desirable in that
>>>> sense). However, we still need XNLOCK as part of the ABI to convey the
>>>> state bit information between kernel and userland. But referring to
>>>> curr->lock_count should be enough for implementing that logic in the
>>>> scheduler code.
>>>>
>>>> - there are only two places where XNLOCK is accessed within the state
>>>> word of a potentially non-current thread, i.e. from
>>>> __xnthread_set_schedparam() and xnthread_set_mode(). But XNLOCK is only
>>>> tested there, and testing thread->lock_count would work exactly the same
>>>> way.
>>>>
>>>> So, yes, I agree that such code direly needs simplification.
>>>
>>> Ok. I was thinking about using XNINLOCK in place of XNLOCK, but
>>> using lock_count may be even simpler.
>>>
>>> The nklock seems to also be needed for handling the XNLBALERT bit.
>>>
>>
>> Correct, but I'm looking for a way to move this elsewhere.
>>
> 
> BTW, let me know how we should proceed with this issue. Are you looking
> into the scheduler lock? I could split off the RTDM thing from the rest
> if those bits are fine.
> 

Yes, I'm dropping XNLOCK as a runtime bit, keeping it only for ABI
purposes between lib/cobalt and pthread_setmode_np, basically. I'll push
the changes tomorrow. The next step will be to get rid of nklock for
xnsched_lock() entirely; that still needs a bit more thought though.

-- 
Philippe.

_______________________________________________
Xenomai mailing list
[email protected]
http://xenomai.org/mailman/listinfo/xenomai
