On 10/10/2012 03:19 PM, Gilles Chanteperdrix wrote: > On 10/10/2012 03:18 PM, Philippe Gerum wrote: >> On 10/10/2012 03:16 PM, Gilles Chanteperdrix wrote: >>> On 10/10/2012 03:09 PM, Philippe Gerum wrote: >>>> On 10/10/2012 03:01 PM, Gilles Chanteperdrix wrote: >>>>> On 10/10/2012 02:57 PM, Philippe Gerum wrote: >>>>>> On 10/10/2012 02:55 PM, Gilles Chanteperdrix wrote: >>>>>>> On 10/10/2012 02:41 PM, Philippe Gerum wrote: >>>>>>>> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote: >>>>>>>>> On 10/10/2012 02:09 PM, Philippe Gerum wrote: >>>>>>>>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote: >>>>>>>>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote: >>>>>>>>>>> >>>>>>>>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote: >>>>>>>>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote: >>>>>>>>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote: >>>>>>>>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote: >>>>>>>>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote: >>>>>>>>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote: >>>>>>>>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote: >>>>>>>>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote: >>>>>>>>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote: >>>>>>>>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote: >>>>>>>>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote: >>>>>>>>>>>>>>>>>>>>>>> Hi Gilles, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Many thanks, >>>>>>>>>>>>>>>>>>>>>>> The first patch does not work, the second does. >>>>>>>>>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in >>>>>>>>>>>>>>>>>>>>>>> rtcan_virt, we have >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx); >>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock); >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>>>>>>>>> ---> rtcan_rcv(rx_dev, &skb); >>>>>>>>>>>>>>>>>>>>>>> .... >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock); >>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, >>>>>>>>>>>>>>>>>>>>>>> lock_ctx); >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>> rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem); >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts >>>>>>>>>>>>>>>>>>>>>>> locked. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second >>>>>>>>>>>>>>>>>>>>>>> patch, >>>>>>>>>>>>>>>>>>>>>>> since you change the overall behaviour ? >>>>>>>>>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt >>>>>>>>>>>>>>>>>>>>>> the second patch, >>>>>>>>>>>>>>>>>>>>>> we might want to try and reduce the overhead of >>>>>>>>>>>>>>>>>>>>>> xnpod_unlock_sched. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin >>>>>>>>>>>>>>>>>>>>> lock? That's a >>>>>>>>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we >>>>>>>>>>>>>>>>>>>>> are also using >>>>>>>>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at >>>>>>>>>>>>>>>>>>>>> the full context >>>>>>>>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the >>>>>>>>>>>>>>>>>>>>> lock(s). >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The second patch idea was to lock the scheduler while >>>>>>>>>>>>>>>>>>>> spinlocks are >>>>>>>>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin >>>>>>>>>>>>>>>>>>>> lock is no longer >>>>>>>>>>>>>>>>>>>> a bug. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Sounds a bit hacky, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Well, that is what the linux kernel does. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> but I think we have this pattern >>>>>>>>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do: >>>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar()); >>>>>>>>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes >>>>>>>>>>>>>>>>>> up another >>>>>>>>>>>>>>>>>> thread. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the >>>>>>>>>>>>>>>>>> issue we are >>>>>>>>>>>>>>>>>> talking about. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the >>>>>>>>>>>>>>>>> encapsulated >>>>>>>>>>>>>>>>> code, executing it atomically as rescheduling is postponed >>>>>>>>>>>>>>>>> until the end >>>>>>>>>>>>>>>>> of the block. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Err... no. Absolutely not. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Err... absolutely right. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The good news is: we don't need to worry about such kind of >>>>>>>>>>>>>>> locking. In >>>>>>>>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as >>>>>>>>>>>>>>> we are >>>>>>>>>>>>>>> in a handler. So it won't disappear when we drop the lock, and >>>>>>>>>>>>>>> your >>>>>>>>>>>>>>> first patch is fine. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Which one? The first one does not seem to work because the rtdm >>>>>>>>>>>>>> locks >>>>>>>>>>>>>> seem to be nested. The second one would probably need to find a >>>>>>>>>>>>>> way to >>>>>>>>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, >>>>>>>>>>>>>> however, >>>>>>>>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in >>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY. >>>>>>>>>>>>> >>>>>>>>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan >>>>>>>>>>>>> wasn't >>>>>>>>>>>>> used much on SMP so far. That looks indeed unresolvable without a >>>>>>>>>>>>> semantical change to rtdm_lock/unlock. >>>>>>>>>>>>> >>>>>>>>>>>>> But then we really need something as light-weight as >>>>>>>>>>>>> preempt_enable/disable. >>>>>>>>>>>>> >>>>>>>>>>>> This is not as lightweight as it might be given that we pair a >>>>>>>>>>>> flag and >>>>>>>>>>>> a counter to achieve this (which saves one data reference in >>>>>>>>>>>> xnpod_schedule() though), but this is a start: >>>>>>>>>>>> >>>>>>>>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497 >>>>>>>>>>>> >>>>>>>>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in >>>>>>>>>>>> stone >>>>>>>>>>>> in the nucleus API to manipulate the sched locking counter from a >>>>>>>>>>>> context where the nucleus lock is already held, so that RTDM can >>>>>>>>>>>> rely on >>>>>>>>>>>> this for RTDM_EXECUTE_ATOMICALLY(). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> The problem of xnpod_unlock_sched from my point of view is this >>>>>>>>>>> section >>>>>>>>>>> of code: >>>>>>>>>>> >>>>>>>>>>> xnsched_set_self_resched(curr->sched); >>>>>>>>>>> xnpod_schedule(); >>>>>>>>>>> >>>>>>>>>>> It means that we will go to the full blown __xnpod_schedule when >>>>>>>>>>> unlocking the top-most spinlock. >>>>>>>>>>> >>>>>>>>>>> I guess what I meant is that we should have a scheduler bit that is >>>>>>>>>>> simply tested in xnpod_schedule, but then we loose the ability for >>>>>>>>>>> the >>>>>>>>>>> threads with the scheduler locked to suspend themselves. So, maybe >>>>>>>>>>> we >>>>>>>>>>> should define another xnpod_lock/unlock pair, maybe something like >>>>>>>>>>> xnpod_preempt_disablle()/xnpod_preempt_enabled(). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS >>>>>>>>>> parlance. >>>>>>>>> >>>>>>>>> Yes, but the reason why we have to go to __xnpod_schedule in >>>>>>>>> xnpod_unlock_sched() is to allow thread to be suspended with the >>>>>>>>> scheduler locked (you added that at some point for the vxworks >>>>>>>>> emulator, >>>>>>>>> if I remember correctly). But without that need, we can put the XNLOCK >>>>>>>>> bit in the sched->status structure, and simply test that bit with all >>>>>>>>> the others in xnpod_schedule(), and forget about the need to call >>>>>>>>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I >>>>>>>>> meant. >>>>>>>>> >>>>>>>> >>>>>>>> You may have to reschedule after an IRQ has interrupted the lock owner >>>>>>>> inside the locked section, in which case you have to make sure to >>>>>>>> reschedule when dropping the sched lock. >>>>>>> >>>>>>> Well, in that case the resched bit will have been set already by the irq >>>>>>> handler calling xnpod_resume_thread, or other service, you do not have >>>>>>> to force it. >>>>>>> >>>>>> >>>>>> Yes, and this is why you have to call __xnpod_schedule() regardless. >>>>>> >>>>> no, xnpod_schedule is enough, it will only call __xnpod_schedule() if >>>>> the resched bit is set. >>>>> >>>> >>>> We are not discussing about the same issue, I'm afraid. We can't >>>> optimize the schedlock path via a scheduler flag since we have to care >>>> about lock nesting. Since the only sane option there is to hold such >>>> counter in the current thread context, the optimization is 100% in the >>>> way we access this information. >>>> >>> >>> Again, if we do not have to allow threads to suspend while holding the >>> scheduler lock, the preempt_count also can be in the sched structure. >>> >> >> Sure, but this point is moot. We just _can't_ do that. >> > Except if we invent another scheduler lock service, simply for the > purpose of spinlocks where suspending the spinlock holder does not make > sense anyway. >
Oh, well, ok. Ack. Let's just make sure that we can fold the whole thing into a single set of services at some point in 3.x though. The core issue is that we should not even have to expose a scheduler locking service to userland, but emulating traditional RTOS APIs requires that. What a braindamage counter-feature for a real-time system when one thinks of it. -- Philippe. _______________________________________________ Xenomai mailing list [email protected] http://www.xenomai.org/mailman/listinfo/xenomai
