Re: [Xen-devel] [RFC PATCH] x86/asm/irq: Don't use POPF but STI

2015-04-21 Thread Borislav Petkov
On Tue, Apr 21, 2015 at 02:45:58PM +0200, Ingo Molnar wrote:
 From 6f01f6381e8293c360b7a89f516b8605e357d563 Mon Sep 17 00:00:00 2001
 From: Ingo Molnar mi...@kernel.org
 Date: Tue, 21 Apr 2015 13:32:13 +0200
 Subject: [PATCH] x86/asm/irq: Don't use POPF but STI
 
 So because the POPF instruction is slow and STI is faster on 
 essentially all x86 CPUs that matter, instead of:
 
   81891848:   9d  popfq
 
 we can do:
 
   81661a2e:   41 f7 c4 00 02 00 00test   $0x200,%r12d
  81661a35:   74 01   je 81661a38 <snd_pcm_stream_unlock_irqrestore+0x28>
   81661a37:   fb  sti
   81661a38:
 
 This bloats the kernel a bit, by about 1K on the 64-bit defconfig:
 
     text    data     bss      dec    hex filename
 12258634 1812120 1085440 15156194 e743e2 vmlinux.before
 12259582 1812120 1085440 15157142 e74796 vmlinux.after
 
 The other cost is the extra branching, which adds pressure on the
 branch-prediction hardware and risks branch misses.

Do we care? After we enable interrupts, we'll most likely go somewhere
cache cold anyway, so the branch misses will happen anyway.

The question is, would the cost drop from POPF -> STI cover the increase
in branch-miss overhead?

Hmm, interesting.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] x86/asm/irq: Don't use POPF but STI

2015-04-21 Thread Ingo Molnar

* Borislav Petkov b...@alien8.de wrote:

 On Tue, Apr 21, 2015 at 02:45:58PM +0200, Ingo Molnar wrote:
  From 6f01f6381e8293c360b7a89f516b8605e357d563 Mon Sep 17 00:00:00 2001
  From: Ingo Molnar mi...@kernel.org
  Date: Tue, 21 Apr 2015 13:32:13 +0200
  Subject: [PATCH] x86/asm/irq: Don't use POPF but STI
  
  So because the POPF instruction is slow and STI is faster on 
  essentially all x86 CPUs that matter, instead of:
  
81891848:   9d  popfq
  
  we can do:
  
81661a2e:   41 f7 c4 00 02 00 00test   $0x200,%r12d
  81661a35:   74 01   je 81661a38 <snd_pcm_stream_unlock_irqrestore+0x28>
81661a37:   fb  sti
81661a38:
  
  This bloats the kernel a bit, by about 1K on the 64-bit defconfig:
  
      text    data     bss      dec    hex filename
  12258634 1812120 1085440 15156194 e743e2 vmlinux.before
  12259582 1812120 1085440 15157142 e74796 vmlinux.after
  
  The other cost is the extra branching, which adds pressure on the
  branch-prediction hardware and risks branch misses.
 
 Do we care? [...]

Only if it makes stuff faster.

 [...] After we enable interrupts, we'll most likely go somewhere 
 cache cold anyway, so the branch misses will happen anyway.
 
 The question is, would the cost drop from POPF -> STI cover the
 increase in branch-miss overhead?
 
 Hmm, interesting.

So there's a few places where the POPF is a STI in 100% of the cases. 
It's probably a win there.

But my main worry would be sites that are 'multi use', such as locking 
APIs - for example spin_unlock_irqrestore(): those tend to be called 
from different code paths, and each one has a different IRQ flags 
state.

For example scheduler wakeups done from irqs-off codepaths (it's very 
common), or from irqs-on codepaths (that's very common as well). In 
the former case we won't have a STI, in the latter case we will - and 
both would hit a POPF at the end of the critical section. The 
probability of a branch prediction miss is high in this case.

So the question is: is the POPF/STI performance difference higher than 
the average cost of the branch misses? If yes, then the change is 
probably a win. If not, then it's probably a loss.

My gut feeling is that we should let the hardware do it, i.e. we 
should continue to use POPF - but I can be convinced ...

Thanks,

Ingo



Re: [Xen-devel] [RFC PATCH] x86/asm/irq: Don't use POPF but STI

2015-04-21 Thread Linus Torvalds
On Tue, Apr 21, 2015 at 5:45 AM, Ingo Molnar mi...@kernel.org wrote:

 Totally untested and not signed off yet: because we'd first have to
 make sure (via irq flags debugging) that it's not used in reverse, to
 re-disable interrupts:

Not only might that happen in some place, I *really* doubt that a
conditional 'sti' is actually any faster. The only way it's going to
be measurably faster is if you run some microbenchmark so that the
code is hot and the branch predicts well.

popf is fast for the "no changes to IF" case, and is a smaller
instruction anyway. I'd really hate to make this any more complex
unless somebody has some real numbers for performance improvement
(that is *not* just some cycle timing from a bogus test-case, but real
measurements on a real load).

And even *with* real measurements, I'd worry about the "use popf to
clear IF" case.

   Linus



Re: [Xen-devel] [RFC PATCH] x86/asm/irq: Don't use POPF but STI

2015-04-21 Thread Andy Lutomirski
On Tue, Apr 21, 2015 at 9:12 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
 On Tue, Apr 21, 2015 at 5:45 AM, Ingo Molnar mi...@kernel.org wrote:

 Totally untested and not signed off yet: because we'd first have to
 make sure (via irq flags debugging) that it's not used in reverse, to
 re-disable interrupts:

 Not only might that happen in some place, I *really* doubt that a
 conditional 'sti' is actually any faster. The only way it's going to
 be measurably faster is if you run some microbenchmark so that the
 code is hot and the branch predicts well.

 popf is fast for the "no changes to IF" case, and is a smaller
 instruction anyway. I'd really hate to make this any more complex
 unless somebody has some real numbers for performance improvement
 (that is *not* just some cycle timing from a bogus test-case, but real
 measurements on a real load).

 And even *with* real measurements, I'd worry about the "use popf to
 clear IF" case.

Fair enough.  Maybe I'll benchmark this some day.

--Andy





-- 
Andy Lutomirski
AMA Capital Management, LLC
