Re: [Xenomai] RTDM syscalls & switching

Philippe Gerum Fri, 13 May 2016 06:42:11 -0700

On 05/13/2016 07:54 AM, Jan Kiszka wrote:
> On 2016-05-13 00:26, Philippe Gerum wrote:
>> On 05/12/2016 09:27 PM, Jan Kiszka wrote:
>>> On 2016-05-12 21:08, Philippe Gerum wrote:
>>>> On 05/12/2016 08:42 PM, Jan Kiszka wrote:
>>>>> On 2016-05-12 20:35, Philippe Gerum wrote:
>>>>>> On 05/12/2016 08:24 PM, Jan Kiszka wrote:
>>>>>>> On 2016-05-12 20:20, Gilles Chanteperdrix wrote:
>>>>>>>> On Thu, May 12, 2016 at 07:17:15PM +0200, Jan Kiszka wrote:
>>>>>>>>> On 2016-05-12 19:12, Gilles Chanteperdrix wrote:
>>>>>>>>>> On Thu, May 12, 2016 at 06:59:04PM +0200, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On Thu, May 12, 2016 at 06:50:03PM +0200, Jan Kiszka wrote:
>>>>>>>>>>>> On 2016-05-12 18:31, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On Thu, May 12, 2016 at 06:06:16PM +0200, Jan Kiszka wrote:
>>>>>>>>>>>>>> Gilles,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> regarding commit bec5d0dd42 (rtdm: make syscalls conforming rather
>>>>>>>>>>>>>> than current) - I remember a discussion on that topic, but I can no
>>>>>>>>>>>>>> longer find any trace of it. Do you have a pointer?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In any case, I'm confronted with a use case for the old (Xenomai 2),
>>>>>>>>>>>>>> lazy switching behaviour: lightweight, performance-sensitive IOCTL
>>>>>>>>>>>>>> services that can (and should) be called without any switching from
>>>>>>>>>>>>>> both domains.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why not use a plain Linux driver? ioctl_nrt callbacks are
>>>>>>>>>>>>> redundant with plain Linux drivers.
>>>>>>>>>>>>
>>>>>>>>>>>> Because that forces the calling layer to either call the same service
>>>>>>>>>>>> via a plain Linux device if the calling thread is currently relaxed or
>>>>>>>>>>>> go for the RT device if the caller is in primary. Doable, but I would
>>>>>>>>>>>> really like to avoid this pain for the users.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What were the arguments in favour of migrating threads to real-time
>>>>>>>>>>>>>> first?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I currently see the real need only for IOCTLs, but the question is
>>>>>>>>>>>>>> then whether we shouldn't go back to "__xn_exec_current" in all RTDM
>>>>>>>>>>>>>> cases to avoid unwanted migration costs (which are significantly
>>>>>>>>>>>>>> higher than syscall restarts).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do not find commit bec5d0dd42 in xenomai-2.6 git tree, and I do
>>>>>>>>>>>>
>>>>>>>>>>>> Xenomai 2 is still following the lazy scheme - we reverted that commit
>>>>>>>>>>>> later on in 7df0c1d96b. Xenomai 3 changed it again with the commit
>>>>>>>>>>>> above.
>>>>>>>>>>>>
>>>>>>>>>>>>> not remember merging this. However I find commit
>>>>>>>>>>>>> 13bfdd477ab880499d2e8f3b82c49ef4d2cccff0 from 2010 which seems to
>>>>>>>>>>>>> explain the reason pretty clearly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> At the time of the discussion we had concluded that it was the way
>>>>>>>>>>>>> to go. With __xn_exec_current you may enter the ioctl_rt callback
>>>>>>>>>>>>> from secondary domain, which is counter-intuitive, error-prone, and
>>>>>>>>>>>>> forces you to cripple driver code with checks for the current domain.
>>>>>>>>>>>>
>>>>>>>>>>>> Nope, normal drivers are not affected as they just implement those
>>>>>>>>>>>> services in the respective mode they want to support there and have a
>>>>>>>>>>>> simple -ENOSYS for the rest (explicitly in IOCTLs or implicitly by
>>>>>>>>>>>> leaving out the implementation of the counterpart handler).
>>>>>>>>>>>
>>>>>>>>>>> Yes, I got mixed up trying to remember. I think the crux of the
>>>>>>>>>>> problem is that if a thread running in primary mode gets
>>>>>>>>>>> (temporarily) switched to secondary mode by gdb, the ioctl_nrt
>>>>>>>>>>> handler gets invoked, which is almost certainly the wrong thing to
>>>>>>>>>>> do. You want the thread to migrate to primary mode to execute
>>>>>>>>>>> ioctl_rt, which __xn_exec_conforming achieves. Otherwise running an
>>>>>>>>>>> application in gdb causes the application to behave differently.
>>>>>>>>>>
>>>>>>>>>> And trying to avoid this issue does indeed cripple the code with
>>>>>>>>>> checks for rtdm_in_rt_context:
>>>>>>>>>> https://git.xenomai.org/xenomai-2.6.git/tree/ksrc/drivers/analogy/rtdm_interface.c#n194
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I don't remember details here, but this is a special case: The driver
>>>>>>>>> also provides read_nrt - is that really useful for Analogy?
>>>>>>>>>
>>>>>>>>> In most cases, you are fine with not providing the nrt (or rt) handler,
>>>>>>>>> or with a simple
>>>>>>>>>
>>>>>>>>> default:
>>>>>>>>>       return -ENOSYS;
>>>>>>>>>
>>>>>>>>> in your ioctl dispatcher.
>>>>>>>>
>>>>>>>> You are missing the point: if you enter read_nrt, there are two
>>>>>>>> cases:
>>>>>>>> - either the thread is real-time capable and has been relaxed by gdb
>>>>>>>> and you want to switch to read_rt for the reasons I already
>>>>>>>> explained, in that case, you must return -ENOSYS;
>>>>>>>> - or the thread is not real-time capable and the nrt handler
>>>>>>>> applies.
>>>>>>>>
>>>>>>>> So, you need at least
>>>>>>>>
>>>>>>>> read_nrt()
>>>>>>>> {
>>>>>>>>        if (rt_capable)
>>>>>>>>             return -ENOSYS;
>>>>>>>>
>>>>>>>>        /* Do the normal case here */
>>>>>>>> }
>>>>>>>
>>>>>>> Now tell me how many drivers have read_nrt, write_nrt? 1 in-tree.
>>>>>>> recvmsg_nrt, sendmsg_nrt? 0 in-tree. Analogy is special (I'd still like
>>>>>>> to understand why, though). And having some special code in the
>>>>>>> exceptional case is probably better than the side effects we get from
>>>>>>> eagerly switching now.
>>>>>>>
>>>>>>
>>>>>> Sorry, that is exactly the opposite: your use case is exceptional and I
>>>>>> believe is wrong. The normal use case is the one that does not ask the
>>>>>> user to track the current mode for knowing what any random driver would
>>>>>> eventually do depending on the calling context.
>>>>>
>>>>> You still miss the point that this is not required in 99% of the cases.
>>>>> There is no such problem. There is only Analogy.
>>>>>
>>>>
>>>> I'm not discussing Analogy at all, those drivers are still biased by the
>>>> legacy 2.x logic for dealing with modes and need fixing. I have never
>>>> been convinced by the reasoning behind rtdm_in_rt_context(), which
>>>> perfectly illustrates why messing with the call mode is not the
>>>> application's business.
>>>
>>> You still need rtdm_in_rt_context() for the (rare) case of having the
>>> same handler for both service_rt and service_nrt. That didn't change
>>> with any switching strategy adjustment. It can't as long as there are
>>> services behind a syscall that may handle any mode, thus that syscall is
>>> unable to filter for the service in the background. We really need to
>>> differentiate here.
>>>
>>>>
>>>>> Every driver must ensure that a service is only exposed to users in the
>>>>> right mode. That is a functional requirement, and drivers that fail to
>>>>> do so only work by chance (thus with the restricted workload they are
>>>>> tested against). If that is fulfilled, it doesn't matter to the driver
>>>>> when the switch happens. It's pure optimization.
>>>>>
>>>>
>>>> You don't seem to get my point either. Let's proceed differently, please
>>>> sketch the application code that would require __xn_exec_current for
>>>> RTDM calls.
>>>
>>> You cut the more interesting case (migration ping-pong when calling
>>> non-RT drivers from relaxed threads), and I hope you will not forget to
>>> answer this.
>>>
>>
>> I'm not ignoring the question, I have been postponing the answer until I
>> understand why the application could be put in a situation making this
>> migration a problem, and whether another approach would exist for
>> solving that problem within the current scheme.
> 
> These two scenarios are unrelated: this migration issue would still be
> there even if we solved the one below via a different application/driver
> design.
>


Which starts to be an issue only because the caller is a Cobalt shadow
undergoing the SCHED_WEAK policy, calling an RTDM driver for a non-rt
operation very frequently. For this reason, those two scenarios are very
much related.
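
To make that cost concrete, here is a toy user-space model of the two
dispatch policies discussed in this thread. It is a sketch under
assumptions, not Cobalt code: every toy_* name is made up for
illustration, the single -ENOSYS retry stands in for the syscall restart
mentioned earlier, and relaxing a SCHED_WEAK caller after each call is
an assumption about when such a thread goes back to secondary mode.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are Cobalt or RTDM symbols. */
enum toy_mode { TOY_SECONDARY, TOY_PRIMARY };
enum toy_policy { TOY_EXEC_CURRENT, TOY_EXEC_CONFORMING };

struct toy_thread {
    bool rt_capable;      /* true for a Cobalt shadow, SCHED_WEAK included */
    bool weak;            /* assumed: relaxed again after every call */
    enum toy_mode mode;   /* domain the thread currently runs in */
    unsigned migrations;  /* mode switches paid so far */
};

struct toy_driver {
    /* Either handler may be NULL, which the model treats as -ENOSYS. */
    int (*ioctl_rt)(unsigned request);
    int (*ioctl_nrt)(unsigned request);
};

static void toy_migrate(struct toy_thread *t, enum toy_mode target)
{
    if (t->mode != target) {
        t->mode = target;
        t->migrations++;
    }
}

static int toy_call(const struct toy_driver *d, enum toy_mode m, unsigned req)
{
    int (*handler)(unsigned) = (m == TOY_PRIMARY) ? d->ioctl_rt : d->ioctl_nrt;

    return handler ? handler(req) : -ENOSYS;
}

int toy_ioctl(struct toy_thread *t, const struct toy_driver *d,
              enum toy_policy policy, unsigned req)
{
    int ret;

    assert(t != NULL && d != NULL);

    /* Conforming: an rt-capable caller is moved to primary before dispatch. */
    if (policy == TOY_EXEC_CONFORMING && t->rt_capable)
        toy_migrate(t, TOY_PRIMARY);

    ret = toy_call(d, t->mode, req);
    if (ret == -ENOSYS) {
        /* Handler refused this mode: switch to the other one, retry once. */
        enum toy_mode other =
            (t->mode == TOY_PRIMARY) ? TOY_SECONDARY : TOY_PRIMARY;

        if (other == TOY_SECONDARY || t->rt_capable) {
            toy_migrate(t, other);
            ret = toy_call(d, t->mode, req);
        }
    }

    /* Assumed SCHED_WEAK behaviour: drop back to secondary on return. */
    if (t->weak)
        toy_migrate(t, TOY_SECONDARY);

    return ret;
}

/* A non-blocking service that is safe to run from either domain. */
int toy_service(unsigned req)
{
    return (int)req;
}
```

In this model, a SCHED_WEAK caller issuing 1000 requests to a driver
that installs toy_service under both handlers pays 2000 mode switches
with the conforming policy and none with the current one. A driver
providing only the nrt handler costs the same two switches per call
under the conforming policy.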

>>
>>> But let's go to our case:
>>>
>>> We have a non-blocking service in the driver, the classic case of
>>> accessing a privileged resource that userspace can't or shouldn't touch
>>> directly. Think of some kind of register access that requires low-level
>>> synchronization with other threads and interrupt handlers. That service
>>> is called by both RT and non-RT threads (SCHED_WEAK) at higher frequency
>>> (some thousand times per second). The RT threads are obviously on the
>>> time critical path, must not migrate, and that can be achieved perfectly
>>> already by providing that service under ioctl_rt. The non-RT threads
>>> could be migrated to RT, but then they would pay an unneeded price,
>>> contributing to a higher system load, in the worst case overload.
>>> Therefore, the very same service shall be provided under ioctl_nrt as
>>> well. Makes sense?
>>>
>>
>> I understand the conflict with the "rt-always-has-precedence" rule
>> implemented by the conforming state, then I have another question:
>>
>> assuming the nrt thread undergoes the SCHED_WEAK policy because it is
>> mainly operating from the Linux space but still needs to synchronize
>> with the rt side at some point, which kind of high frequency interaction
>> with the rt side is this?
>>
>> Sharing some resource requiring mutual exclusion via a Cobalt synchro,
>> waiting for rt events, something else?
>>
> 
> The synchronization need is first of all only on the hardware access
> (thus inside the driver), not necessarily at application level. In fact,
> there are even scenarios where you only want to exploit the driver as
> permission checker on privileged resource accesses (userspace shall only
> access certain MMIO registers in a page, thus the driver acts as
> gatekeeper). Then there could be no synchronization at all but still the
> need to provide migration-free accesses.
> 

I get the idea of the resource gatekeeper, which does make a lot of sense.
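
A gatekeeper of that kind could be as small as the following sketch. It
assumes a 4 KiB page of 32-bit registers and a hypothetical allowlist of
word offsets; toy_reg_read and the offsets are invented for illustration
and are not part of RTDM. The check neither blocks nor sleeps, which is
what would let one body back both ioctl_rt and ioctl_nrt.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_WORDS 1024u  /* assumed: 4 KiB page of 32-bit registers */

/* Hypothetical allowlist: word offsets user space is allowed to touch. */
static const uint16_t toy_allowed[] = { 0x010, 0x011, 0x040 };

static int toy_offset_allowed(uint32_t word)
{
    for (size_t i = 0; i < sizeof(toy_allowed) / sizeof(toy_allowed[0]); i++)
        if (toy_allowed[i] == word)
            return 1;
    return 0;
}

/*
 * Read one register on behalf of user space.
 * Returns 0 on success, -EINVAL for an offset outside the page and
 * -EPERM for a register that is not on the allowlist.
 */
int toy_reg_read(const volatile uint32_t *page, uint32_t word, uint32_t *out)
{
    if (word >= TOY_PAGE_WORDS)
        return -EINVAL;
    if (!toy_offset_allowed(word))
        return -EPERM;

    *out = page[word];
    return 0;
}
```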

However I still don't get what benefit your caller gets from undergoing
the SCHED_WEAK policy - which implies that it has to share
synchronization points with Cobalt - compared to running as a regular
(glibc) thread, under whichever policy fits.

Leaving the non-RT ioctl call aside, which Cobalt calls does the
SCHED_WEAK thread need to invoke for synchronizing with rt threads?

-- 
Philippe.

_______________________________________________
Xenomai mailing list
Xenomai@xenomai.org
https://xenomai.org/mailman/listinfo/xenomai
