On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>
>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>> I think the reason why the first patch fails is that in
>>>>>>>>>>>>>>> rtcan_virt, we have
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> ---> rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> thus the same rescheduling issue with interrupts locked.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Are you not afraid of side effects with the second patch,
>>>>>>>>>>>>>>> since it changes the overall behaviour?
>>>>>>>>>>>>>>> Would you not prefer modifying only rtcan_virt locally?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We should ask Jan's opinion.
>>>>>>>>>>>>>> In any case, if we adopt the second patch, we might want to
>>>>>>>>>>>>>> try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>
>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>>>> a bug.
>>>>>>>>>>>
>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>
>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>
>>>>>>>>>>> but I think we have this pattern
>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>
>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer; if you do:
>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>>>> thread.
>>>>>>>>>>
>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>>> talking about.
>>>>>>>>>
>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>>>> of the block.
>>>>>>>>
>>>>>>>> Err... no. Absolutely not.
>>>>>>>
>>>>>>> Err... absolutely right.
>>>>>>>
>>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>> first patch is fine.
>>>>>>
>>>>>> Which one?
>>>>>> The first one does not seem to work because the rtdm locks
>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>
>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>> semantic change to rtdm_lock/unlock.
>>>>>
>>>>> But then we really need something as lightweight as
>>>>> preempt_enable/disable.
>>>>>
>>>> This is not as lightweight as it might be, given that we pair a flag and
>>>> a counter to achieve this (which saves one data reference in
>>>> xnpod_schedule() though), but this is a start:
>>>>
>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>
>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>> in the nucleus API to manipulate the sched locking counter from a
>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>
>>> The problem of xnpod_unlock_sched from my point of view is this section
>>> of code:
>>>
>>> xnsched_set_self_resched(curr->sched);
>>> xnpod_schedule();
>>>
>>> It means that we will go to the full-blown __xnpod_schedule when
>>> unlocking the top-most spinlock.
>>>
>>> I guess what I meant is that we should have a scheduler bit that is
>>> simply tested in xnpod_schedule, but then we lose the ability for
>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>> should define another xnpod_lock/unlock pair, something like
>>> xnpod_preempt_disable()/xnpod_preempt_enable().
>>>
>>
>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
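The flag-plus-counter scheme discussed here can be sketched as a small userspace state machine: a nesting counter, a resched-pending flag, and a schedule routine that only does real work when the count has dropped to zero. Names and structure are illustrative only, not the actual nucleus implementation.

```c
/* Sketch of a preempt_disable/preempt_enable-style scheduler lock:
 * only the outermost unlock pays for a rescheduling pass. */
static int sched_lock_nesting; /* nesting depth of the scheduler lock */
static int resched_pending;    /* a wakeup happened while locked */
static int schedule_calls;     /* counts full rescheduling passes */

static void xnpod_schedule_sketch(void)
{
    /* Cheap early exit, the equivalent of testing one status bit. */
    if (sched_lock_nesting > 0 || !resched_pending)
        return;
    resched_pending = 0;
    schedule_calls++; /* stand-in for the full __xnpod_schedule pass */
}

static void sched_lock(void)
{
    sched_lock_nesting++;
}

static void sched_unlock(void)
{
    /* Deferred wakeups are honored only at the outermost unlock. */
    if (--sched_lock_nesting == 0)
        xnpod_schedule_sketch();
}

static void wake_waiter(void)
{
    /* E.g. a semaphore post: with the lock held, this only marks
     * the pending reschedule instead of switching immediately. */
    resched_pending = 1;
    xnpod_schedule_sketch();
}
```

With this shape, `RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());` would run foo() and bar() without an intervening context switch, the switch being deferred to the unlock.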
>
> Yes, but the reason why we have to go to __xnpod_schedule in
> xnpod_unlock_sched() is to allow threads to be suspended with the
> scheduler locked (you added that at some point for the vxworks emulator,
> if I remember correctly). But without that need, we could put the XNLOCK
> bit in the sched->status structure, and simply test that bit along with
> all the others in xnpod_schedule(), and forget about the need to call
> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
>
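The status-bit idea above amounts to folding the lock state into the word the scheduler already tests, so the locked case costs one masked compare instead of a call into __xnpod_schedule. A hedged sketch, with made-up bit names and values rather than the real sched->status layout:

```c
#include <stdint.h>

/* Illustrative bits only -- not the actual nucleus definitions. */
#define XNRESCHED (1u << 0) /* rescheduling was requested */
#define XNLOCK    (1u << 1) /* scheduler locked by the current thread */

/* The test xnpod_schedule() would perform on sched->status: reschedule
 * only when requested AND not locked, in a single masked compare. */
static int need_reschedule(uint32_t status)
{
    return (status & (XNRESCHED | XNLOCK)) == XNRESCHED;
}
```

The trade-off named in the message stands: a pure bit test cannot let a thread suspend itself while holding the scheduler lock, since that requires an actual pass through the rescheduling code.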
You may have to reschedule after an IRQ has interrupted the lock owner
inside the locked section, in which case you have to make sure to
reschedule when dropping the sched lock.

This change was not directly related to holding the scheduler lock
across rescheduling points. The motivation was to plug a caveat
preventing fully non-preemptible sections from being built with a
rescheduling point in the middle: sometimes the app logic really wants
all operations to run un-preempted before the caller switches out.

>>
>> The main issue we have for really optimizing the implementation is
>> hidden deep in the way we refer to the current running context: we do
>> have to go through the current scheduler structure to get the current
>> thread, which in turn requires disabling preemption... which is
>> overkill by design.
>
> And the fact that we do it over and over again for even only one skin
> service implementation.
>
>>
>> Core pipelines used in non-legacy mode allow the client code to define a
>> stack-based thread-info block, but this won't work for 2.6.x, which
>> follows the legacy API. And this will work in 3.x, once we have gotten
>> rid of these crappy Xenomai-specific stacks for kernel rt threads.
>>
>
> I agree.

-- 
Philippe.

_______________________________________________
Xenomai mailing list
[email protected]
http://www.xenomai.org/mailman/listinfo/xenomai
