[Bug target/38134] [4.4 Regression] speed regression with inline-asm sse code

2009-02-03 Thread bonzini at gnu dot org
--- Comment #10 from bonzini at gnu dot org 2009-02-03 09:47 --- Can you try the patch of PR38824? -- bonzini at gnu dot org changed: What|Removed |Added

[Bug target/38134] [4.4 Regression] speed regression with inline-asm sse code

2009-02-03 Thread bonzini at gnu dot org
--- Comment #12 from bonzini at gnu dot org 2009-02-03 11:17 --- What if we forbid altogether memory operands and we *synthesize* them with a peephole2? Anyway, it seems safe to me to declare this a dup of PR38824? -- bonzini at gnu dot org changed: What|Removed

[Bug target/38134] [4.4 Regression] speed regression with inline-asm sse code

2009-02-03 Thread ubizjak at gmail dot com
--- Comment #11 from ubizjak at gmail dot com 2009-02-03 10:36 --- (In reply to comment #10) Can you try the patch of PR38824? I have tried with a similar peephole2 recognizer. The problem is that there is no spare x register to allocate as a temporary, so peephole2 is ineffective in

[Bug target/38134] [4.4 Regression] speed regression with inline-asm sse code

2009-02-03 Thread ubizjak at gmail dot com
--- Comment #13 from ubizjak at gmail dot com 2009-02-03 11:34 --- (In reply to comment #12) What if we forbid altogether memory operands and we *synthesize* them with a peephole2? Anyway, it seems safe to me to declare this a dup of PR38824? I think that we will hit PR 19398

[Bug target/38134] [4.4 Regression] speed regression with inline-asm sse code

2008-11-17 Thread jakub at gcc dot gnu dot org
-- jakub at gcc dot gnu dot org changed:
           What    |Removed |Added
           Priority|P3      |P2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38134

[Bug target/38134] [4.4 Regression] speed regression with inline-asm sse code

2008-11-17 Thread ubizjak at gmail dot com
--- Comment #6 from ubizjak at gmail dot com 2008-11-17 18:11 --- I think that
    addps .LC10(%rip), %xmm0
    mulps %xmm1, %xmm0
    addps .LC11(%rip), %xmm0
    mulps %xmm1, %xmm0
    addps .LC12(%rip), %xmm0
    mulps %xmm1, %xmm0
    addps

[Bug target/38134] [4.4 Regression] speed regression with inline-asm sse code

2008-11-17 Thread tim at klingt dot org
--- Comment #7 from tim at klingt dot org 2008-11-17 18:19 --- Created an attachment (id=16710) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16710&action=view) compressed preprocessed source, gcc-4.4 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38134

[Bug target/38134] [4.4 Regression] speed regression with inline-asm sse code

2008-11-17 Thread tim at klingt dot org
--- Comment #8 from tim at klingt dot org 2008-11-17 18:30 --- Created an attachment (id=16711) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16711&action=view) 16684: compressed preprocessed source, gcc-4.3 -- tim at klingt dot org changed: What|Removed

[Bug target/38134] [4.4 Regression] speed regression with inline-asm sse code

2008-11-17 Thread tim at klingt dot org
--- Comment #9 from tim at klingt dot org 2008-11-17 18:49 --- I have updated the test program and attached preprocessed sources of gcc 4.3 and 4.4. The loop prefix contains: 4.4 (9 invariant loads, one store of a generated constant to the stack): pxor %xmm5, %xmm5

[Bug target/38134] [4.4 Regression] speed regression with inline-asm sse code

2008-11-15 Thread rguenth at gcc dot gnu dot org
-- rguenth at gcc dot gnu dot org changed: What|Removed |Added GCC target triplet||x86_64-*-*-* Keywords|

[Bug target/38134] [4.4 Regression] speed regression with inline-asm sse code

2008-11-15 Thread hjl dot tools at gmail dot com
--- Comment #4 from hjl dot tools at gmail dot com 2008-11-16 00:06 --- (In reply to comment #3) i tried to run the benchmark with -fno-ira, which turned out to be about 20% slower than without the flag. Can you try -O3 -march=core2 -mtune=generic and -O3 -march=core2

[Bug target/38134] [4.4 Regression] speed regression with inline-asm sse code

2008-11-15 Thread hjl dot tools at gmail dot com
--- Comment #5 from hjl dot tools at gmail dot com 2008-11-16 00:08 --- (In reply to comment #3) anyway, i found, that the preprocessed source generated by gcc-4.3 cannot be compiled with gcc-4.4 ... the specific file can be found here