On 02/06/2013 07:26 PM, Jan Kiszka wrote:
> On 2013-02-06 18:51, Gilles Chanteperdrix wrote:
>> On 02/06/2013 06:47 PM, Jan Kiszka wrote:
>>
>>> On 2013-02-06 18:44, Gilles Chanteperdrix wrote:
>>>> On 02/06/2013 06:40 PM, Jan Kiszka wrote:
>>>>
>>>>> On 2013-02-06 18:35, Gilles Chanteperdrix wrote:
>>>>>> On 02/06/2013 06:33 PM, Jan Kiszka wrote:
>>>>>>
>>>>>>> On 2013-02-06 18:09, Gilles Chanteperdrix wrote:
>>>>>>>> On 02/06/2013 06:03 PM, Jan Kiszka wrote:
>>>>>>>>
>>>>>>>>> Gilles,
>>>>>>>>>
>>>>>>>>> do you remember if this core-3.4 change was a performance optimization
>>>>>>>>> or a necessary fix? Also, I don't yet understand why we need all the
>>>>>>>>> #ifdefs except for the first one, which forces fpu.preload to 0.
>>>>>>>>
>>>>>>>>
>>>>>>>> It is a performance optimization: without it, we systematically hit the
>>>>>>>> maximum latency whenever the timer ticks during a context switch that
>>>>>>>> restores the FPU. Note that if you change that, you will probably break
>>>>>>>> -forge.
>>>>>>>
>>>>>>> According to the Intel folks who introduced eagerfpu, xsave, or at least
>>>>>>> xsaveopt (which I haven't implemented yet) is now faster than serializing
>>>>>>> clts/stts. On the other hand, the worst case is a full SSE + AVX restore
>>>>>>> while the target RT task is not depending on the FPU.
>>>>>>
>>>>>>
>>>>>> Without xsave, we never restore the FPU if the RT task never used it.
>>>>>> Does this change with xsave?
>>>>>
>>>>> This would change with eagerfpu which depends on xsave. The kernel
>>>>> sticks with lazy switching in the absence of xsaveopt.
>>>>
>>>>
>>>> I am not sure you understand what I mean, so I am going to rephrase.
>>>> Without xsave, Linux uses lazy FPU restore, and Xenomai uses eager FPU
>>>> restore. But Xenomai's eager FPU restore is a nop if the RT task has never
>>>> used the FPU since its inception (and none of the parents from which it
>>>> was cloned ever used the FPU either). Does Linux eager switching mean the
>>>> same thing?
>>>
>>> eagerfpu means: always call xsaveopt/xrstor, it will optimize the case
>>> that the FPU was unused by the source/destination. And no fiddling with
>>> TS anymore, at no time.
>>
>>
>> I still do not understand this sentence then: "the worst case is a full
>> SSE + AVX restore while the target RT task is not depending on the FPU."
>> If the RT task does not depend on the FPU, why would xsaveopt/xrstor
>> restore SSE and AVX context?
>
> Switching between two tasks that both use the full state space defines
> the maximum latency of the FPU save/restore step. We cannot interrupt
> xsave or xrstor instructions, but we couldn't interrupt fxsave either.
>
> What we can do, though, is to ensure that we have at least a preemption
> point between both. Do we have such a thing so far, a chance to handle a
> Xenomai IRQ between some FPU save for Linux task A and an FPU restore for
> the following task B? If not, the discussion is moot and we are just
> shifting probabilities of the very same worst case.

We can implement unlocked context switch support on x86 as we do on
other platforms. I actually tried that on Atom and it did not really
improve latencies. You have not answered my question, though: why would
xsave/xrstor do anything if the RT thread has never used the FPU (and
none of its parents has used the FPU either)?
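
To make the question concrete, here is a rough sketch of the behaviour I
am describing. It is only a model: struct task, struct cpu and
switch_fpu() are hypothetical names made up for this mail, not the real
Xenomai or Linux structures, and the counters merely stand in for the
fxsave/xsave and fxrstor/xrstor instructions.

```c
#include <stdbool.h>

/* Hypothetical per-task bookkeeping, for illustration only. */
struct task {
    const char *name;
    bool fpu_used; /* true once the task, or a parent it was cloned from, used the FPU */
};

/* Hypothetical per-CPU state: who owns the FPU registers right now. */
struct cpu {
    const struct task *fpu_owner;
    unsigned saves;    /* stands in for fxsave/xsave */
    unsigned restores; /* stands in for fxrstor/xrstor */
};

/* Eager restore that degenerates to a nop for tasks without FPU context. */
static void switch_fpu(struct cpu *cpu, const struct task *next)
{
    if (!next->fpu_used)
        return; /* nothing to restore; the previous owner keeps the registers */
    if (cpu->fpu_owner == next)
        return; /* context of next is already loaded */
    if (cpu->fpu_owner)
        cpu->saves++; /* save the context of the previous owner */
    cpu->restores++;  /* load the context of next */
    cpu->fpu_owner = next;
}
```

In this model, switching from an FPU-using Linux task to an RT task with
fpu_used == false costs neither a save nor a restore, which is the case
I am asking about for xsave/xrstor.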
--
Gilles.
_______________________________________________
Xenomai mailing list
http://www.xenomai.org/mailman/listinfo/xenomai