Re: [Xenomai] Target frozen with rtcan_virt

Gilles Chanteperdrix Wed, 10 Oct 2012 06:01:47 -0700

On 10/10/2012 02:57 PM, Philippe Gerum wrote:
> On 10/10/2012 02:55 PM, Gilles Chanteperdrix wrote:
>> On 10/10/2012 02:41 PM, Philippe Gerum wrote:
>>> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>>>
>>>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, 
>>>>>>>>>>>>>>>>>> we have
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> and 
>>>>>>>>>>>>>>>>>> rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second 
>>>>>>>>>>>>>>>>>> patch, 
>>>>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the 
>>>>>>>>>>>>>>>>> second patch,
>>>>>>>>>>>>>>>>> we might want to try and reduce the overhead of 
>>>>>>>>>>>>>>>>> xnpod_unlock_sched.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? 
>>>>>>>>>>>>>>>> That's a
>>>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are 
>>>>>>>>>>>>>>>> also using
>>>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the 
>>>>>>>>>>>>>>>> full context
>>>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks 
>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is 
>>>>>>>>>>>>>>> no longer
>>>>>>>>>>>>>>> a bug.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>>>>
>>>>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>>>
>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up 
>>>>>>>>>>>>> another
>>>>>>>>>>>>> thread.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we 
>>>>>>>>>>>>> are
>>>>>>>>>>>>> talking about.
>>>>>>>>>>>>
>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the 
>>>>>>>>>>>> encapsulated
>>>>>>>>>>>> code, executing it atomically as rescheduling is postponed until 
>>>>>>>>>>>> the end
>>>>>>>>>>>> of the block.
>>>>>>>>>>>
>>>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>>>
>>>>>>>>>> Err... absolutely right.
>>>>>>>>>>
>>>>>>>>>> The good news is: we don't need to worry about such kind of locking. 
>>>>>>>>>> In
>>>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we 
>>>>>>>>>> are
>>>>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>>>>> first patch is fine.
>>>>>>>>>
>>>>>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, 
>>>>>>>>> however,
>>>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>>>
>>>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>>>>> semantical change to rtdm_lock/unlock.
>>>>>>>>
>>>>>>>> But then we really need something as light-weight as 
>>>>>>>> preempt_enable/disable.
>>>>>>>>
>>>>>>> This is not as lightweight as it might be given that we pair a flag and
>>>>>>> a counter to achieve this (which saves one data reference in
>>>>>>> xnpod_schedule() though), but this is a start:
>>>>>>>
>>>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>>>
>>>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>>>>> in the nucleus API to manipulate the sched locking counter from a
>>>>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>>>>
>>>>>>
>>>>>> The problem of xnpod_unlock_sched from my point of view is this section
>>>>>> of code:
>>>>>>
>>>>>>                  xnsched_set_self_resched(curr->sched);
>>>>>>                  xnpod_schedule();
>>>>>>
>>>>>> It means that we will go to the full blown __xnpod_schedule when
>>>>>> unlocking the top-most spinlock.
>>>>>>
>>>>>> I guess what I meant is that we should have a scheduler bit that is
>>>>>> simply tested in xnpod_schedule, but then we loose the ability for the
>>>>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>>>>> should define another xnpod_lock/unlock pair, maybe something like
>>>>>> xnpod_preempt_disablle()/xnpod_preempt_enabled().
>>>>>>
>>>>>>
>>>>>
>>>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
>>>>
>>>> Yes, but the reason why we have to go to __xnpod_schedule in
>>>> xnpod_unlock_sched() is to allow thread to be suspended with the
>>>> scheduler locked (you added that at some point for the vxworks emulator,
>>>> if I remember correctly). But without that need, we can put the XNLOCK
>>>> bit in the sched->status structure, and simply test that bit with all
>>>> the others in xnpod_schedule(), and forget about the need to call
>>>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
>>>>
>>>
>>> You may have to reschedule after an IRQ has interrupted the lock owner
>>> inside the locked section, in which case you have to make sure to
>>> reschedule when dropping the sched lock.
>>
>> Well, in that case the resched bit will have been set already by the irq
>> handler calling xnpod_resume_thread, or other service, you do not have
>> to force it.
>>
> 
> Yes, and this is why you have to call __xnpod_schedule() regardless.
> 
no, xnpod_schedule is enough, it will only call __xnpod_schedule() if
the resched bit is set.


-- 
                                            Gilles.

_______________________________________________
Xenomai mailing list
[email protected]
http://www.xenomai.org/mailman/listinfo/xenomai

Re: [Xenomai] Target frozen with rtcan_virt

Reply via email to