On Fri, Feb 20, 2015 at 10:58:15AM -0800, Andy Lutomirski wrote:
> -	/* Auto enable eagerfpu for xsaveopt */
> -	if (cpu_has_xsaveopt && eagerfpu != DISABLE)
> +	/* Auto enable eagerfpu for everyone */
> +	if (eagerfpu != DISABLE)
>  		eagerfpu = ENABLE;
So Mel did run
* Borislav Petkov wrote:
> On Tue, Feb 24, 2015 at 04:07:07PM -0800, Andy Lutomirski wrote:
>
> > I'd prefer a different partial solution: encourage
> > everyone to clear the xstate before making syscalls
> > (using e.g. vzeroall). In fact, maybe user code should
> > aggressively clear
* Andy Lutomirski wrote:
> > I'm a big fan of simplifying things, but.
> >
> > SIMD registers were growing in x86, and they are going
> > to grow again, this time four-fold in Intel MIC: from
> > sixteen 256-bit registers to thirty two 512-bit
> > registers.
> >
> > That's 2 kbytes of data.
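The "2 kbytes" figure quoted above can be checked with a short calculation. This is only an illustrative sketch: the helper name `simd_state_bytes` is made up for this note, and it counts just the vector registers named in the message (sixteen 256-bit, then thirty-two 512-bit), ignoring any other xstate components.

```python
def simd_state_bytes(num_registers: int, register_bits: int) -> int:
    """Bytes needed to hold num_registers vector registers of register_bits each."""
    return num_registers * register_bits // 8


# Register counts and widths as quoted in the message above.
before_bytes = simd_state_bytes(16, 256)   # sixteen 256-bit registers
after_bytes = simd_state_bytes(32, 512)    # thirty-two 512-bit registers
growth_factor = after_bytes // before_bytes

if __name__ == "__main__":
    print(f"before: {before_bytes} bytes")
    print(f"after:  {after_bytes} bytes")
    print(f"growth: {growth_factor}x")
```

Under these assumptions the register file grows from 512 bytes to 2048 bytes, which matches the four-fold growth and the 2 kbytes mentioned in the thread.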
On Tue, Feb 24, 2015 at 04:07:07PM -0800, Andy Lutomirski wrote:
> I'd prefer a different partial solution: encourage everyone to clear
> the xstate before making syscalls (using e.g. vzeroall). In fact,
> maybe user code should aggressively clear newly-unused xstate.
We don't trust userspace.
On Tue, Feb 24, 2015 at 11:15 AM, Denys Vlasenko
wrote:
> On Fri, Feb 20, 2015 at 7:58 PM, Andy Lutomirski wrote:
>> We have eager and lazy fpu modes, introduced in:
>>
>> 304bceda6a18 x86, fpu: use non-lazy fpu restore for processors supporting
>> xsave
>>
>> The result is rather messy. There
On Fri, Feb 20, 2015 at 7:58 PM, Andy Lutomirski wrote:
> We have eager and lazy fpu modes, introduced in:
>
> 304bceda6a18 x86, fpu: use non-lazy fpu restore for processors supporting
> xsave
>
> The result is rather messy. There are two code paths in almost all of the
> FPU code, and only one
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 02/23/2015 09:31 PM, Andy Lutomirski wrote:
> On Mon, Feb 23, 2015 at 6:14 PM, Maciej W. Rozycki
> wrote:
>> That's an interesting case too, although not necessarily related.
>> If you say that we always save the FP context eagerly for the
>>
On Mon, Feb 23, 2015 at 6:14 PM, Maciej W. Rozycki wrote:
> On Mon, 23 Feb 2015, Andy Lutomirski wrote:
>
>> >> After a context switch, the instructions from the old task are no
>> >> longer in the pipeline.
>> >
>> > I'd say it's implementation-specific. As I mentioned the i486 aborted
>> >
On Mon, 23 Feb 2015, Andy Lutomirski wrote:
> >> After a context switch, the instructions from the old task are no
> >> longer in the pipeline.
> >
> > I'd say it's implementation-specific. As I mentioned the i486 aborted
> > any transcendental x87 instruction in progress upon taking an
On Mon, 23 Feb 2015, Linus Torvalds wrote:
> We have one traditional special case, which actually did something
> like Maciej's nightmare scenario: the completely broken "FPU errors
> over irq13" IBM PC/AT FPU linkage.
>
> But since we don't actually support old i386 machines any more, we
>
On Mon, Feb 23, 2015 at 4:56 PM, Maciej W. Rozycki wrote:
> On Mon, 23 Feb 2015, Linus Torvalds wrote:
>
>> We have one traditional special case, which actually did something
>> like Maciej's nightmare scenario: the completely broken "FPU errors
>> over irq13" IBM PC/AT FPU linkage.
>>
>> But
On Mon, Feb 23, 2015 at 2:27 PM, Maciej W. Rozycki wrote:
> On Mon, 23 Feb 2015, Rik van Riel wrote:
>
>> > I meant something else -- a slow FPU instruction can retire after a
>> > task has been switched where the FP context has been left intact,
>> > i.e. in the lazy FP context switching case,
On Mon, 23 Feb 2015, Rik van Riel wrote:
> > I meant something else -- a slow FPU instruction can retire after a
> > task has been switched where the FP context has been left intact,
> > i.e. in the lazy FP context switching case, where only the MMU
> > context and GPRs have been replaced.
>
> I
On Mon, Feb 23, 2015 at 1:21 PM, Rik van Riel wrote:
>
> On 02/23/2015 04:17 PM, Maciej W. Rozycki wrote:
>>>
>>> It seems highly unlikely to me that a slow FPU instruction can
>>> retire *after* a subsequent fxsave, which would need to happen
>>> for this to work.
>>
>> I meant something else --
On 02/23/2015 04:17 PM, Maciej W. Rozycki wrote:
> On Sat, 21 Feb 2015, Andy Lutomirski wrote:
>
>>> Additionally I believe long-executing FPU instructions (i.e.
>>> transcendentals) can take advantage of continuing to execute in
>>> parallel where
On Sat, 21 Feb 2015, Andy Lutomirski wrote:
> > Additionally I believe long-executing FPU instructions (i.e.
> > transcendentals) can take advantage of continuing to execute in parallel
> > where the context has already been switched rather than stalling an eager
> > FPU context switch until the
On 02/23, Rik van Riel wrote:
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On 02/23/2015 10:11 AM, Borislav Petkov wrote:
> > On Mon, Feb 23, 2015 at 03:59:29PM +0100, Oleg Nesterov wrote:
> >> Well, but if we want this change then perhaps we should simply
> >> change the default value?
On Mon, Feb 23, 2015 at 10:51:26AM -0500, Rik van Riel wrote:
> However, we would still need the rest of the kernel code to ...
Yeah, let's wait out first and see what the benchmarks say. Mel started
a bunch of them on a couple of boxes here, we'll have results in the
coming days.
--
On 02/23/2015 10:11 AM, Borislav Petkov wrote:
> On Mon, Feb 23, 2015 at 03:59:29PM +0100, Oleg Nesterov wrote:
>> Well, but if we want this change then perhaps we should simply
>> change the default value? This way "AUTO" still can work.
>
> Yeah,
On 02/23/2015 10:03 AM, Borislav Petkov wrote:
> On Mon, Feb 23, 2015 at 07:51:04AM -0500, Rik van Riel wrote:
>> At that point we either load the FPU context, or we set CR0.TS.
>
> Right, but provided eager doesn't bring any slowdown, we can drop
>
On Mon, Feb 23, 2015 at 03:59:29PM +0100, Oleg Nesterov wrote:
> Well, but if we want this change then perhaps we should simply change
> the default value? This way "AUTO" still can work.
Yeah, sure, let's do some measurements first, to see whether this is
even worth it.
Btw, Mel pointed me at
On Mon, Feb 23, 2015 at 07:51:04AM -0500, Rik van Riel wrote:
> At that point we either load the FPU context, or we
> set CR0.TS.
Right, but provided eager doesn't bring any slowdown, we can drop the TS
fiddling altogether and only load FPU context.
--
Regards/Gruss,
Boris.
ECO tip #101:
On 02/20, Andy Lutomirski wrote:
>
> We have eager and lazy fpu modes, introduced in:
>
> 304bceda6a18 x86, fpu: use non-lazy fpu restore for processors supporting
> xsave
>
> The result is rather messy. There are two code paths in almost all of the
> FPU code, and only one of them (the eager
On 02/23/2015 12:22 AM, Andy Lutomirski wrote:
> On Sun, Feb 22, 2015 at 5:45 PM, Rik van Riel
> wrote:
>> One implication of this is that in kernel mode, we can no longer
>> just assume that the user space FPU state is always loaded, and
>> we need
On Sun, Feb 22, 2015 at 5:45 PM, Rik van Riel wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On 02/22/2015 06:06 AM, Borislav Petkov wrote:
>> On Sat, Feb 21, 2015 at 06:18:01PM -0800, Andy Lutomirski wrote:
>>> That's true. The question is whether there are enough of them,
>>> and
On 02/22/2015 06:06 AM, Borislav Petkov wrote:
> On Sat, Feb 21, 2015 at 06:18:01PM -0800, Andy Lutomirski wrote:
>> That's true. The question is whether there are enough of them,
>> and whether twiddling TS is fast enough, that it's worth it.
>
>
On Sun, Feb 22, 2015 at 01:57:36PM +0100, Ingo Molnar wrote:
> This is also very similar to the ~0.6 secs improvement your
> first set of numbers gave.
Yeah, running without --repeat was simply misleading.
> So now that it appears we have consistent numbers, it would
> be nice to check it on
* Borislav Petkov wrote:
> Lazy FPU:
>   219.406449195 seconds time elapsed    ( +- 0.17% )
> Eager FPU:
>   218.791122148 seconds time elapsed    ( +- 0.13% )
> Timing improvement of 0.6 secs on average
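To put the 0.6 secs figure next to the reported run-to-run variation, the numbers above can be compared directly. This is a rough sketch: it treats each "+-" percentage as a simple symmetric band around the mean, which is an assumption made for illustration and not a statement about how perf derives the value. The helper name `noise_band` is hypothetical.

```python
def noise_band(mean_s: float, pct: float) -> tuple[float, float]:
    """Return (low, high) for a symmetric +-pct% band around mean_s."""
    half_width = mean_s * pct / 100.0
    return (mean_s - half_width, mean_s + half_width)


# Elapsed times and +- percentages as quoted above.
lazy_mean, lazy_pct = 219.406449195, 0.17
eager_mean, eager_pct = 218.791122148, 0.13

delta_s = lazy_mean - eager_mean
lazy_low, lazy_high = noise_band(lazy_mean, lazy_pct)
eager_low, eager_high = noise_band(eager_mean, eager_pct)

# The bands overlap if the slower end of the eager band reaches
# past the faster end of the lazy band.
bands_overlap = eager_high > lazy_low

if __name__ == "__main__":
    print(f"delta: {delta_s:.3f} s")
    print(f"lazy band:  {lazy_low:.3f} .. {lazy_high:.3f} s")
    print(f"eager band: {eager_low:.3f} .. {eager_high:.3f} s")
    print(f"bands overlap: {bands_overlap}")
```

With this simple reading the two bands touch, so the difference is of the same order as the reported variation, which fits the caution about measurement methodology elsewhere in the thread.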
On Sun, Feb 22, 2015 at 09:18:40AM +0100, Ingo Molnar wrote:
> - It might make sense to do a 'perf stat --null --repeat'
> measurement as well [without any -e arguments], to make
> sure the rich PMU stats you are gathering are not
> interfering?
Well, the --repeat thing definitely
On Sat, Feb 21, 2015 at 06:18:01PM -0800, Andy Lutomirski wrote:
> That's true. The question is whether there are enough of them, and
> whether twiddling TS is fast enough, that it's worth it.
Yes, and let me make it clear what I'm trying to do here: I want to make
sure that eager FPU handling
On Sun, Feb 22, 2015 at 09:18:40AM +0100, Ingo Molnar wrote:
> So am I interpreting the older and your latest numbers
> correctly in stating that the cost observation has flipped
> around 180 degrees: the first measurement showed eager FPU
> to be a win, but now that we can do more precise
>
* Ingo Molnar wrote:
> - Do you have enough RAM that there's essentially no IO
> in the system worth speaking of? Do you have enough RAM
> to copy a whole kernel tree to /tmp/linux/ and do the
> measurement there, on ramfs?
Doing that will also pin down the page cache: kernel
* Borislav Petkov wrote:
> which spit this:
>
> Lazy FPU:
> 219.127929718 seconds time elapsed
> Eager FPU:
> 220.148034331 seconds time elapsed
> so we have a one-second slowdown and 200K more FPU saves in eager mode.
So am I interpreting the older and your latest numbers
correctly
On Sat, Feb 21, 2015 at 4:34 PM, Maciej W. Rozycki wrote:
> On Sat, 21 Feb 2015, Borislav Petkov wrote:
>
>> Provided I've not made a mistake, this leads me to think that this
>> simple workload and pretty much everything else uses the FPU through
>> glibc which does the SSE memcpy and so on.
On Sat, 21 Feb 2015, Borislav Petkov wrote:
> Provided I've not made a mistake, this leads me to think that this
> simple workload and pretty much everything else uses the FPU through
> glibc which does the SSE memcpy and so on. Which basically kills the
> whole idea behind lazy FPU as
On Sat, Feb 21, 2015 at 08:23:52PM +0100, Ingo Molnar wrote:
> to switch between the modes?
I went all out and did a debugfs file, see patch at the end, which
counts FPU saves. Then I ran this script:
---
#!/bin/bash
D="/sys/kernel/debug/fpu/eager"
echo "Lazy FPU: "
echo 0 > $D
echo -n " FPU
* Borislav Petkov wrote:
> > I'd sleep a lot better if we had some runtime debug
> > flag to be able to do run-to-run comparisons on the
> > same booted up kernel, or so.
>
> Let me take a look whether we could do some knob... The
> nice thing is, code uses use_eager_fpu() to check stuff
>
On Sat, Feb 21, 2015 at 07:39:52PM +0100, Ingo Molnar wrote:
> So the workload improved by ~600,000 usecs, and there's
> 68,000 less calls, so it saved 8.8 usecs per call. Isn't
I think you mean more calls. The eager measurement has more calls. Let
me do some primitive math:
def
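The per-call figure in the quoted text can be reproduced from the two round numbers it gives. This sketch only redoes that division with the quoted values (~600,000 usecs and 68,000 calls). It does not attempt to reconstruct the follow-up math, which is cut off in this listing, and the helper name `usecs_per_call` is made up here.

```python
def usecs_per_call(total_usecs: float, calls: int) -> float:
    """Average microseconds attributed to each call."""
    return total_usecs / calls


# Round figures as quoted: ~600,000 usecs difference, 68,000 calls difference.
per_call = usecs_per_call(600_000, 68_000)

if __name__ == "__main__":
    print(f"{per_call:.1f} usecs per call")
```

Note that the reply disputes the direction of the call-count difference (more calls in eager mode, not fewer), so the quotient is only as meaningful as that premise.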
* Borislav Petkov wrote:
> On Sat, Feb 21, 2015 at 05:38:40PM +0100, Borislav Petkov wrote:
> > My assumption is that libc uses SSE for memcpy and thus the FPU will
> > be used. (I'll trace FPU-specific PMCs later to confirm).
>
> Ok, so I slapped a trace_printk() at the beginning of
* Borislav Petkov wrote:
> plain 3.19:
>
> 234.681331200 seconds time elapsed    ( +- 0.15% )
>
> eagerfpu=ENABLE
>
> 234.066525648 seconds time elapsed    ( +- 0.19% )
hm, a win of more than 600
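The size of the win can be computed exactly from the two elapsed times quoted above. A small sketch, using `Decimal` to avoid floating-point rounding. The variable names are illustrative only, and the calculation says nothing about whether the difference is statistically significant.

```python
from decimal import Decimal

# Elapsed times as quoted above.
plain_3_19_s = Decimal("234.681331200")
eagerfpu_s = Decimal("234.066525648")

delta_s = plain_3_19_s - eagerfpu_s       # difference in seconds
delta_msecs = delta_s * 1_000             # same difference in milliseconds
delta_usecs = delta_s * 1_000_000         # same difference in microseconds

if __name__ == "__main__":
    print(f"delta: {delta_s} s")
    print(f"delta: {delta_msecs} msecs")
    print(f"delta: {delta_usecs} usecs")
```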
On Sat, Feb 21, 2015 at 05:38:40PM +0100, Borislav Petkov wrote:
> My assumption is that libc uses SSE for memcpy and thus the FPU will
> be used. (I'll trace FPU-specific PMCs later to confirm).
Ok, so I slapped a trace_printk() at the beginning of fpu_save_init()
and did a kernel build once
On Sat, Feb 21, 2015 at 10:31:50AM +0100, Ingo Molnar wrote:
> So it would be nice to test this on at least one reasonably old (but
> not uncomfortably old - say 5 years old) system, to get a feel for
> what kind of performance impact it has there.
Yeah, this is exactly what Andy and I were
* Andy Lutomirski wrote:
> We have eager and lazy fpu modes, introduced in:
>
> 304bceda6a18 x86, fpu: use non-lazy fpu restore for processors supporting
> xsave
>
> The result is rather messy. There are two code paths in
> almost all of the FPU code, and only one of them (the
> eager
+ Linus.
I'm sure he'll have something to say about it :-)
On Fri, Feb 20, 2015 at 10:58:15AM -0800, Andy Lutomirski wrote:
> We have eager and lazy fpu modes, introduced in:
>
> 304bceda6a18 x86, fpu: use non-lazy fpu restore for processors supporting
> xsave
>
> The result is rather messy.
We have eager and lazy fpu modes, introduced in:
304bceda6a18 x86, fpu: use non-lazy fpu restore for processors supporting xsave
The result is rather messy. There are two code paths in almost all of the
FPU code, and only one of them (the eager case) is tested frequently, since
most kernel