Re: [Xenomai] Target frozen with rtcan_virt

Philippe Gerum Wed, 10 Oct 2012 05:43:16 -0700

On 10/10/2012 02:41 PM, Philippe Gerum wrote:
> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>
>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we 
>>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the 
>>>>>>>>>>>>>>> second patch,
>>>>>>>>>>>>>>> we might want to try and reduce the overhead of 
>>>>>>>>>>>>>>> xnpod_unlock_sched.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? 
>>>>>>>>>>>>>> That's a
>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are 
>>>>>>>>>>>>>> also using
>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full 
>>>>>>>>>>>>>> context
>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>>
>>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks 
>>>>>>>>>>>>> are
>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no 
>>>>>>>>>>>>> longer
>>>>>>>>>>>>> a bug.
>>>>>>>>>>>>
>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>
>>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>>
>>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>
>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up 
>>>>>>>>>>> another
>>>>>>>>>>> thread.
>>>>>>>>>>>
>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>>>> talking about.
>>>>>>>>>>
>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the 
>>>>>>>>>> encapsulated
>>>>>>>>>> code, executing it atomically as rescheduling is postponed until the 
>>>>>>>>>> end
>>>>>>>>>> of the block.
>>>>>>>>>
>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>
>>>>>>>> Err... absolutely right.
>>>>>>>>
>>>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>>> first patch is fine.
>>>>>>>
>>>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>
>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>>> semantical change to rtdm_lock/unlock.
>>>>>>
>>>>>> But then we really need something as light-weight as 
>>>>>> preempt_enable/disable.
>>>>>>
>>>>> This is not as lightweight as it might be given that we pair a flag and
>>>>> a counter to achieve this (which saves one data reference in
>>>>> xnpod_schedule() though), but this is a start:
>>>>>
>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>
>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>>> in the nucleus API to manipulate the sched locking counter from a
>>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>>
>>>>
>>>> The problem of xnpod_unlock_sched from my point of view is this section
>>>> of code:
>>>>
>>>>            xnsched_set_self_resched(curr->sched);
>>>>            xnpod_schedule();
>>>>
>>>> It means that we will go to the full blown __xnpod_schedule when
>>>> unlocking the top-most spinlock.
>>>>
>>>> I guess what I meant is that we should have a scheduler bit that is
>>>> simply tested in xnpod_schedule, but then we loose the ability for the
>>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>>> should define another xnpod_lock/unlock pair, maybe something like
>>>> xnpod_preempt_disablle()/xnpod_preempt_enabled().
>>>>
>>>>
>>>
>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
>>
>> Yes, but the reason why we have to go to __xnpod_schedule in
>> xnpod_unlock_sched() is to allow thread to be suspended with the
>> scheduler locked (you added that at some point for the vxworks emulator,
>> if I remember correctly). But without that need, we can put the XNLOCK
>> bit in the sched->status structure, and simply test that bit with all
>> the others in xnpod_schedule(), and forget about the need to call
>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
>>
> 
> You may have to reschedule after an IRQ has interrupted the lock owner
> inside the locked section, in which case you have to make sure to
> reschedule when dropping the sched lock.
> 
> This change was not


s/was not/was/

 directly related to holding the scheduler lock
> across rescheduling points. The motivation was to plug a caveat
> preventing fully non-preemptible sections to be built with a
> rescheduling point in the middle: sometimes the app logic really wants
> all operations to run un-preempted before the caller switches out.
> 
>>>
>>> The main issue we have for really optimizing the implementation, is
>>> hidden deep into the way we refer to the current running context: we do
>>> have to go through the current scheduler structure to get the current
>>> thread, which in turn requires to disable preemption...which is overkill
>>> by design.
>>
>> And the fact that we do it over and over again for even only one skin
>> service implementation.
>>
>>>
>>> core pipelines used in non-legacy mode allow the client code to define a
>>> stack-based thread-info block, but this won't work for 2.6.x which
>>> follows the legacy API. And this will work in 3.x when we will have
>>> gotten rid of these crappy Xenomai-specific stacks for kernel rt threads.
>>>
>>
>> I agree.
>>
> 
> 


-- 
Philippe.

_______________________________________________
Xenomai mailing list
[email protected]
http://www.xenomai.org/mailman/listinfo/xenomai

Re: [Xenomai] Target frozen with rtcan_virt

Reply via email to