Re: [Xenomai] Target frozen with rtcan_virt

Philippe Gerum Wed, 10 Oct 2012 06:57:31 -0700

On 10/10/2012 03:19 PM, Gilles Chanteperdrix wrote:
> On 10/10/2012 03:18 PM, Philippe Gerum wrote:
>> On 10/10/2012 03:16 PM, Gilles Chanteperdrix wrote:
>>> On 10/10/2012 03:09 PM, Philippe Gerum wrote:
>>>> On 10/10/2012 03:01 PM, Gilles Chanteperdrix wrote:
>>>>> On 10/10/2012 02:57 PM, Philippe Gerum wrote:
>>>>>> On 10/10/2012 02:55 PM, Gilles Chanteperdrix wrote:
>>>>>>> On 10/10/2012 02:41 PM, Philippe Gerum wrote:
>>>>>>>> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>>>>>>>>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>>>>>>>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>>>>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in 
>>>>>>>>>>>>>>>>>>>>>>> rtcan_virt, we have
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, 
>>>>>>>>>>>>>>>>>>>>>>> lock_ctx);
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> and 
>>>>>>>>>>>>>>>>>>>>>>> rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts 
>>>>>>>>>>>>>>>>>>>>>>> locked.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second 
>>>>>>>>>>>>>>>>>>>>>>> patch, 
>>>>>>>>>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt 
>>>>>>>>>>>>>>>>>>>>>> the second patch,
>>>>>>>>>>>>>>>>>>>>>> we might want to try and reduce the overhead of 
>>>>>>>>>>>>>>>>>>>>>> xnpod_unlock_sched.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin 
>>>>>>>>>>>>>>>>>>>>> lock? That's a
>>>>>>>>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we 
>>>>>>>>>>>>>>>>>>>>> are also using
>>>>>>>>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at 
>>>>>>>>>>>>>>>>>>>>> the full context
>>>>>>>>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the 
>>>>>>>>>>>>>>>>>>>>> lock(s).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The second patch idea was to lock the scheduler while 
>>>>>>>>>>>>>>>>>>>> spinlocks are
>>>>>>>>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin 
>>>>>>>>>>>>>>>>>>>> lock is no longer
>>>>>>>>>>>>>>>>>>>> a bug.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes 
>>>>>>>>>>>>>>>>>> up another
>>>>>>>>>>>>>>>>>> thread.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the 
>>>>>>>>>>>>>>>>>> issue we are
>>>>>>>>>>>>>>>>>> talking about.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the 
>>>>>>>>>>>>>>>>> encapsulated
>>>>>>>>>>>>>>>>> code, executing it atomically as rescheduling is postponed 
>>>>>>>>>>>>>>>>> until the end
>>>>>>>>>>>>>>>>> of the block.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Err... absolutely right.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The good news is: we don't need to worry about such kind of 
>>>>>>>>>>>>>>> locking. In
>>>>>>>>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as 
>>>>>>>>>>>>>>> we are
>>>>>>>>>>>>>>> in a handler. So it won't disappear when we drop the lock, and 
>>>>>>>>>>>>>>> your
>>>>>>>>>>>>>>> first patch is fine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Which one? The first one does not seem to work because the rtdm 
>>>>>>>>>>>>>> locks
>>>>>>>>>>>>>> seem to be nested. The second one would probably need to find a 
>>>>>>>>>>>>>> way to
>>>>>>>>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, 
>>>>>>>>>>>>>> however,
>>>>>>>>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan 
>>>>>>>>>>>>> wasn't
>>>>>>>>>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>>>>>>>>>> semantical change to rtdm_lock/unlock.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But then we really need something as light-weight as 
>>>>>>>>>>>>> preempt_enable/disable.
>>>>>>>>>>>>>
>>>>>>>>>>>> This is not as lightweight as it might be given that we pair a 
>>>>>>>>>>>> flag and
>>>>>>>>>>>> a counter to achieve this (which saves one data reference in
>>>>>>>>>>>> xnpod_schedule() though), but this is a start:
>>>>>>>>>>>>
>>>>>>>>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>>>>>>>>
>>>>>>>>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in 
>>>>>>>>>>>> stone
>>>>>>>>>>>> in the nucleus API to manipulate the sched locking counter from a
>>>>>>>>>>>> context where the nucleus lock is already held, so that RTDM can 
>>>>>>>>>>>> rely on
>>>>>>>>>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The problem of xnpod_unlock_sched from my point of view is this 
>>>>>>>>>>> section
>>>>>>>>>>> of code:
>>>>>>>>>>>
>>>>>>>>>>>             xnsched_set_self_resched(curr->sched);
>>>>>>>>>>>             xnpod_schedule();
>>>>>>>>>>>
>>>>>>>>>>> It means that we will go to the full blown __xnpod_schedule when
>>>>>>>>>>> unlocking the top-most spinlock.
>>>>>>>>>>>
>>>>>>>>>>> I guess what I meant is that we should have a scheduler bit that is
>>>>>>>>>>> simply tested in xnpod_schedule, but then we loose the ability for 
>>>>>>>>>>> the
>>>>>>>>>>> threads with the scheduler locked to suspend themselves. So, maybe 
>>>>>>>>>>> we
>>>>>>>>>>> should define another xnpod_lock/unlock pair, maybe something like
>>>>>>>>>>> xnpod_preempt_disablle()/xnpod_preempt_enabled().
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS 
>>>>>>>>>> parlance.
>>>>>>>>>
>>>>>>>>> Yes, but the reason why we have to go to __xnpod_schedule in
>>>>>>>>> xnpod_unlock_sched() is to allow thread to be suspended with the
>>>>>>>>> scheduler locked (you added that at some point for the vxworks 
>>>>>>>>> emulator,
>>>>>>>>> if I remember correctly). But without that need, we can put the XNLOCK
>>>>>>>>> bit in the sched->status structure, and simply test that bit with all
>>>>>>>>> the others in xnpod_schedule(), and forget about the need to call
>>>>>>>>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I 
>>>>>>>>> meant.
>>>>>>>>>
>>>>>>>>
>>>>>>>> You may have to reschedule after an IRQ has interrupted the lock owner
>>>>>>>> inside the locked section, in which case you have to make sure to
>>>>>>>> reschedule when dropping the sched lock.
>>>>>>>
>>>>>>> Well, in that case the resched bit will have been set already by the irq
>>>>>>> handler calling xnpod_resume_thread, or other service, you do not have
>>>>>>> to force it.
>>>>>>>
>>>>>>
>>>>>> Yes, and this is why you have to call __xnpod_schedule() regardless.
>>>>>>
>>>>> no, xnpod_schedule is enough, it will only call __xnpod_schedule() if
>>>>> the resched bit is set.
>>>>>
>>>>
>>>> We are not discussing about the same issue, I'm afraid. We can't
>>>> optimize the schedlock path via a scheduler flag since we have to care
>>>> about lock nesting. Since the only sane option there is to hold such
>>>> counter in the current thread context, the optimization is 100% in the
>>>> way we access this information.
>>>>
>>>
>>> Again, if we do not have to allow threads to suspend while holding the
>>> scheduler lock, the preempt_count also can be in the sched structure.
>>>
>>
>> Sure, but this point is moot. We just _can't_ do that.
>>
> Except if we invent another scheduler lock service, simply for the
> purpose of spinlocks where suspending the spinlock holder does not make
> sense anyway.
>


Oh, well, ok. Ack.

Let's just make sure that we can fold the whole thing into a single set
of services at some point in 3.x though.

The core issue is that we should not even have to expose a scheduler
locking service to userland, but emulating traditional RTOS APIs
requires that. What a braindamage counter-feature for a real-time system
when one thinks of it.

-- 
Philippe.

_______________________________________________
Xenomai mailing list
[email protected]
http://www.xenomai.org/mailman/listinfo/xenomai

Re: [Xenomai] Target frozen with rtcan_virt

Reply via email to