Hello!
New patch to avoid LCP stalls based on feedback from earlier patch. I modified
H.J.'s old patch to perform the peephole2 to split immediate moves to HImode
memory. This is now enabled for Core2, Corei7 and Generic.
2012-04-04 Teresa Johnson tejohn...@google.com
*
Thanks, I will do both and update the comment as suggested by David,
retest and then commit.
Teresa
On Thu, Apr 5, 2012 at 12:41 AM, Uros Bizjak ubiz...@gmail.com wrote:
Hello!
New patch to avoid LCP stalls based on feedback from earlier patch. I modified
H.J.'s old patch to perform the peephole2 to split immediate moves to HImode
memory. This is now enabled for Core2, Corei7 and Generic.
I verified that this enables the splitting to occur in the case that originally
On Wed, Apr 4, 2012 at 5:39 PM, H.J. Lu hjl.to...@gmail.com wrote:
On Wed, Apr 4, 2012 at 5:07 PM, Teresa Johnson tejohn...@google.com wrote:
New patch to avoid LCP stalls based on feedback from earlier patch. I modified
H.J.'s old patch to perform the peephole2 to split immediate moves to HImode
memory.
This patch addresses instructions that incur expensive length-changing prefix
(LCP) stalls on some x86-64 implementations, notably Core2 and Corei7.
Specifically, a move of a 16-bit constant into memory requires a
length-changing prefix and can incur significant penalties. The attached patch
I should add that I have tested performance of this on Core2, Corei7
(Nehalem) and AMD Opteron-based systems. It appears to be
performance-neutral on AMD (only minor perturbations, overall a wash).
For the test case that provoked the optimization, there were nice
improvements on Core2 and Corei7.
Minor update to patch to remove unnecessary check in new movhi_imm_internal
define_insn.
Retested successfully.
Teresa
2012-03-29  Teresa Johnson  <tejohn...@google.com>

	* config/i386/i386.h (ix86_tune_indices): Add
	X86_TUNE_LCP_STALL.
	* config/i386/i386.md
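For readers unfamiliar with the i386 tuning machinery: a flag like this is
normally surfaced as a macro over the tune-features array. A sketch of what
the corresponding i386.h hunk might look like (my reconstruction, modeled on
the neighboring TARGET_* entries in that header, not the committed text):

```
/* Assumed shape, following the existing tune-feature macros in i386.h.  */
#define TARGET_LCP_STALL \
	ix86_tune_features[X86_TUNE_LCP_STALL]
```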
On 03/30/2012 11:03 AM, Teresa Johnson wrote:
+(define_insn "*movhi_imm_internal"
+  [(set (match_operand:HI 0 "memory_operand" "=m")
+	(match_operand:HI 1 "immediate_operand" "n"))]
+  "!TARGET_LCP_STALL"
+{
+  return "mov{w}\t{%1, %0|%0, %1}";
+}
+  [(set (attr "type") (const_string "imov"))
+
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 185920)
+++ config/i386/i386.md	(working copy)
@@ -2262,9 +2262,19 @@
 	]
 	(const_string "SI")))])
+(define_insn "*movhi_imm_internal"
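A note on mechanics, since only the define_insn is quoted here: the narrowing
split itself would be carried out by a peephole2. A rough sketch of such a
pattern (my reconstruction, not the committed one — note the SImode scratch:
loading the immediate into a 16-bit register would itself need the 0x66
prefix plus imm16 and so would still incur the LCP stall):

```
(define_peephole2
  [(match_scratch:SI 2 "r")
   (set (match_operand:HI 0 "memory_operand")
	(match_operand:HI 1 "immediate_operand"))]
  "TARGET_LCP_STALL"
  [(set (match_dup 2) (match_dup 3))
   (set (match_dup 0) (match_dup 4))]
{
  /* Load the constant as a 32-bit immediate, then store the low half.  */
  operands[3] = GEN_INT (INTVAL (operands[1]) & 0xffff);
  operands[4] = gen_lowpart (HImode, operands[2]);
})
```

The match_scratch is what makes this workable after reload: peephole2 runs
post-allocation, so the pattern only fires when a free register is available.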
On Fri, Mar 30, 2012 at 8:19 AM, Richard Henderson r...@redhat.com wrote:
Hi Richard, Jan and H.J.,
Thanks for all the quick responses and suggestions.
I had tested my patch when tuning for an arch without the LCP stalls,
but it didn't hit an issue in reload because it didn't require
rematerialization. Thanks for pointing out this issue.
Regarding the penalty, it can