Re: [Xenomai] Target frozen with rtcan_virt

Philippe Gerum Wed, 10 Oct 2012 03:54:36 -0700

On 10/10/2012 12:25 PM, Jan Kiszka wrote:
> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>
>>>>>>>>>>> Many thanks,
>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>
>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>> ....
>>>>>>>>>>>
>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>
>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>
>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>
>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>
>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second 
>>>>>>>>>> patch,
>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also 
>>>>>>>>> using
>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full 
>>>>>>>>> context
>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>
>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>> held, so that posting a semaphore while holding a spin lock is no 
>>>>>>>> longer
>>>>>>>> a bug.
>>>>>>>
>>>>>>> Sounds a bit hacky,
>>>>>>
>>>>>> Well, that is what the linux kernel does.
>>>>>>
>>>>>>  but I think we have this pattern
>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>
>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>> thread.
>>>>>>
>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>> talking about.
>>>>>
>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>> of the block.
>>>>
>>>> Err... no. Absolutely not.
>>>
>>> Err... absolutely right.
>>>
>>> The good news is: we don't need to worry about such kind of locking. In
>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>> in a handler. So it won't disappear when we drop the lock, and your
>>> first patch is fine.
>>
>> Which one? The first one does not seem to work because the rtdm locks
>> seem to be nested. The second one would probably need to find a way to
>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>> RTDM_EXECUTE_ATOMICALLY.
> 
> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
> used much on SMP so far. That looks indeed unresolvable without a
> semantical change to rtdm_lock/unlock.
> 
> But then we really need something as light-weight as preempt_enable/disable.
>


This is not as lightweight as it might be given that we pair a flag and
a counter to achieve this (which saves one data reference in
xnpod_schedule() though), but this is a start:

http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497

So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
in the nucleus API to manipulate the sched locking counter from a
context where the nucleus lock is already held, so that RTDM can rely on
this for RTDM_EXECUTE_ATOMICALLY().

-- 
Philippe.

_______________________________________________
Xenomai mailing list
[email protected]
http://www.xenomai.org/mailman/listinfo/xenomai

Re: [Xenomai] Target frozen with rtcan_virt

Reply via email to