Re: [Xen-devel] [RFC PATCH] x86/asm/irq: Don't use POPF but STI
On Tue, Apr 21, 2015 at 02:45:58PM +0200, Ingo Molnar wrote:
> From 6f01f6381e8293c360b7a89f516b8605e357d563 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar mi...@kernel.org
> Date: Tue, 21 Apr 2015 13:32:13 +0200
> Subject: [PATCH] x86/asm/irq: Don't use POPF but STI
>
> So because the POPF instruction is slow and STI is faster on
> essentially all x86 CPUs that matter, instead of:
>
>   81891848:       9d                      popfq
>
> we can do:
>
>   81661a2e:       41 f7 c4 00 02 00 00    test   $0x200,%r12d
>   81661a35:       74 01                   je     81661a38 <snd_pcm_stream_unlock_irqrestore+0x28>
>   81661a37:       fb                      sti
>   81661a38:
>
> This bloats the kernel a bit, by about 1K on the 64-bit defconfig:
>
>       text     data      bss       dec     hex  filename
>   12258634  1812120  1085440  15156194  e743e2  vmlinux.before
>   12259582  1812120  1085440  15157142  e74796  vmlinux.after
>
> the other cost is the extra branching, adding extra pressure to the
> branch prediction hardware and also potential branch misses.

Do we care?

After we enable interrupts, we'll most likely go somewhere cache cold
anyway, so the branch misses will happen anyway.

The question is, would the cost drop from POPF -> STI cover the increase
in branch misses overhead?

Hmm, interesting.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH] x86/asm/irq: Don't use POPF but STI
* Borislav Petkov b...@alien8.de wrote:

> On Tue, Apr 21, 2015 at 02:45:58PM +0200, Ingo Molnar wrote:
> > From 6f01f6381e8293c360b7a89f516b8605e357d563 Mon Sep 17 00:00:00 2001
> > From: Ingo Molnar mi...@kernel.org
> > Date: Tue, 21 Apr 2015 13:32:13 +0200
> > Subject: [PATCH] x86/asm/irq: Don't use POPF but STI
> >
> > So because the POPF instruction is slow and STI is faster on
> > essentially all x86 CPUs that matter, instead of:
> >
> >   81891848:       9d                      popfq
> >
> > we can do:
> >
> >   81661a2e:       41 f7 c4 00 02 00 00    test   $0x200,%r12d
> >   81661a35:       74 01                   je     81661a38 <snd_pcm_stream_unlock_irqrestore+0x28>
> >   81661a37:       fb                      sti
> >   81661a38:
> >
> > This bloats the kernel a bit, by about 1K on the 64-bit defconfig:
> >
> >       text     data      bss       dec     hex  filename
> >   12258634  1812120  1085440  15156194  e743e2  vmlinux.before
> >   12259582  1812120  1085440  15157142  e74796  vmlinux.after
> >
> > the other cost is the extra branching, adding extra pressure to the
> > branch prediction hardware and also potential branch misses.
>
> Do we care?

[...]

Only if it makes stuff faster.

[...]

> After we enable interrupts, we'll most likely go somewhere cache cold
> anyway, so the branch misses will happen anyway.
>
> The question is, would the cost drop from POPF -> STI cover the increase
> in branch misses overhead?
>
> Hmm, interesting.

So there's a few places where the POPF is a STI in 100% of the cases.
It's probably a win there.

But my main worry would be sites that are 'multi use', such as locking
APIs - for example spin_unlock_irqrestore(): those tend to be called
from different code paths, and each one has a different IRQ flags state.

For example scheduler wakeups done from irqs-off codepaths (it's very
common), or from irqs-on codepaths (that's very common as well). In the
former case we won't have a STI, in the latter case we will - and both
would hit a POPF at the end of the critical section. The probability of
a branch prediction miss is high in this case.

So the question is, is the POPF/STI performance difference higher than
the average cost of branch misses. If yes, then the change is probably
a win. If not, then it's probably a loss.

My gut feeling is that we should let the hardware do it, i.e. we should
continue to use POPF - but I can be convinced ...

Thanks,

	Ingo
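The break-even question raised above can be written down as a small cost model. The following is only a sketch under stated assumptions: the function and parameter names are hypothetical, the model ignores code-size and cache effects, and every cycle count has to come from real measurements on a real load, none are supplied here.

```c
#include <assert.h>

/*
 * Expected cycles for the test/je/sti sequence (hypothetical model).
 * All arguments are placeholders to be filled in from measurements:
 *   test_branch_cycles  cost of the test + correctly predicted je
 *   sti_cycles          cost of STI when it is executed
 *   p_irqs_on           fraction of calls whose saved flags have IF set
 *   p_mispredict        fraction of calls where the je is mispredicted
 *   mispredict_penalty  cycles lost per mispredicted branch
 */
static double cond_sti_expected_cycles(double test_branch_cycles,
                                       double sti_cycles,
                                       double p_irqs_on,
                                       double p_mispredict,
                                       double mispredict_penalty)
{
    assert(p_irqs_on >= 0.0 && p_irqs_on <= 1.0);
    assert(p_mispredict >= 0.0 && p_mispredict <= 1.0);

    return test_branch_cycles
         + p_irqs_on * sti_cycles
         + p_mispredict * mispredict_penalty;
}

/* Nonzero when the conditional STI is cheaper on average than POPF. */
static int cond_sti_is_win(double popf_cycles, double cond_sti_cycles)
{
    return cond_sti_cycles < popf_cycles;
}
```

In this model a call site where the saved flags always have IF set keeps p_mispredict near zero, while a 'multi use' site such as spin_unlock_irqrestore() pushes it up, which is where the change could turn into a loss.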
Re: [Xen-devel] [RFC PATCH] x86/asm/irq: Don't use POPF but STI
On Tue, Apr 21, 2015 at 5:45 AM, Ingo Molnar mi...@kernel.org wrote:
>
> Totally untested and not signed off yet: because we'd first have to
> make sure (via irq flags debugging) that it's not used in reverse, to
> re-disable interrupts:

Not only might that happen in some place, I *really* doubt that a
conditional 'sti' is actually any faster. The only way it's going to
be measurably faster is if you run some microbenchmark so that the
code is hot and the branch predicts well.

popf is fast for the "no changes to IF" case, and is a smaller
instruction anyway. I'd really hate to make this any more complex
unless somebody has some real numbers for performance improvement
(that is *not* just some cycle timing from a bogus test-case, but real
measurements on a real load).

And even *with* real measurements, I'd worry about the "use popf to
clear IF" case.

                  Linus
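The "use popf to clear IF" worry can be illustrated with a tiny user-space model. This is a sketch, not kernel code: the function names are hypothetical, and the model tracks only the IF bit (0x200, the mask tested in the quoted disassembly), whereas a real POPF restores other flags as well.

```c
#include <stdint.h>

#define X86_EFLAGS_IF 0x200u /* interrupt-enable flag, bit 9 of EFLAGS */

/*
 * Model of POPF at kernel privilege: IF ends up with whatever value
 * the saved flags hold, so it can be set or cleared.
 */
static uint32_t restore_with_popf(uint32_t cpu_flags, uint32_t saved_flags)
{
    return (cpu_flags & ~X86_EFLAGS_IF) | (saved_flags & X86_EFLAGS_IF);
}

/*
 * Model of the test/je/sti sequence: IF is set when the saved flags
 * have it set, and is otherwise left alone. It can never be cleared.
 */
static uint32_t restore_with_cond_sti(uint32_t cpu_flags, uint32_t saved_flags)
{
    if (saved_flags & X86_EFLAGS_IF)
        cpu_flags |= X86_EFLAGS_IF;
    return cpu_flags;
}
```

The two models agree whenever interrupts are off at the time of the restore. They differ when interrupts are currently on and the saved flags have IF clear: the POPF model disables interrupts again, the conditional-STI model leaves them enabled. That is the "used in reverse" case that would have to be ruled out first.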
Re: [Xen-devel] [RFC PATCH] x86/asm/irq: Don't use POPF but STI
On Tue, Apr 21, 2015 at 9:12 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
> On Tue, Apr 21, 2015 at 5:45 AM, Ingo Molnar mi...@kernel.org wrote:
>>
>> Totally untested and not signed off yet: because we'd first have to
>> make sure (via irq flags debugging) that it's not used in reverse, to
>> re-disable interrupts:
>
> Not only might that happen in some place, I *really* doubt that a
> conditional 'sti' is actually any faster. The only way it's going to
> be measurably faster is if you run some microbenchmark so that the
> code is hot and the branch predicts well.
>
> popf is fast for the "no changes to IF" case, and is a smaller
> instruction anyway. I'd really hate to make this any more complex
> unless somebody has some real numbers for performance improvement
> (that is *not* just some cycle timing from a bogus test-case, but real
> measurements on a real load).
>
> And even *with* real measurements, I'd worry about the "use popf to
> clear IF" case.

Fair enough.  Maybe I'll benchmark this some day.

--Andy

>
>                   Linus

-- 
Andy Lutomirski
AMA Capital Management, LLC