[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2018-06-09 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833 --- Comment #14 from Peter Cordes --- I happened to look at this old bug again recently. re: extracting the high one of the low two 32-bit elements: (In reply to Uroš Bizjak from comment #11) > > Or without SSE4 -mtune=sandybridge (anything that excluded
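
As a concrete illustration of the no-SSE4 extraction being discussed, here is a minimal intrinsics sketch; the helper name and the pshufd-based copy-and-shuffle are illustrative assumptions, not code from the comment:
--cut here--
#include <emmintrin.h>
#include <stdint.h>

/* Extract element 1 (the high half of an int64 sitting in the low two
   32-bit lanes) without SSE4: copy-and-shuffle it down with pshufd,
   then movd it to an integer register.  */
static inline uint32_t extract_elem1_sse2 (__m128i v)
{
  __m128i hi = _mm_shuffle_epi32 (v, _MM_SHUFFLE (1, 1, 1, 1)); /* pshufd $0x55 */
  return (uint32_t) _mm_cvtsi128_si32 (hi);                     /* movd         */
}
--cut here--
Unlike psrldq, pshufd writes a separate destination, so the original vector stays live if it is still needed.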

[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2017-05-30 Thread uros at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833 --- Comment #13 from uros at gcc dot gnu.org --- Author: uros Date: Tue May 30 17:18:25 2017 New Revision: 248691 URL: https://gcc.gnu.org/viewcvs?rev=248691&root=gcc&view=rev Log: PR target/80833 * config/i386/constraints.md (Yd): New

[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2017-05-24 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833 --- Comment #12 from Uroš Bizjak --- (In reply to Peter Cordes from comment #4) > MMX is also a saving in code-size: one fewer prefix byte vs. SSE2 integer > instructions. It's also another set of 8 registers for 32-bit mode. After touching a

[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2017-05-24 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833 --- Comment #11 from Uroš Bizjak --- (In reply to Peter Cordes from comment #0) > A lower-latency xmm->int strategy would be: > > movd %xmm0, %eax > pextrd $1, %xmm0, %edx Proposed patch implements the above for generic
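
A minimal sketch of that strategy with intrinsics, assuming a 32-bit -msse4.1 build (the helper name is made up, not part of the patch):
--cut here--
#include <smmintrin.h>
#include <stdint.h>

/* xmm -> int64 in 32-bit code via registers only: with -msse4.1 this
   should compile to movd %xmm0,%eax / pextrd $1,%xmm0,%edx.  */
static inline int64_t xmm_to_i64_sse41 (__m128i v)
{
  uint32_t lo = (uint32_t) _mm_cvtsi128_si32 (v);    /* movd      */
  uint32_t hi = (uint32_t) _mm_extract_epi32 (v, 1); /* pextrd $1 */
  return (int64_t) (((uint64_t) hi << 32) | lo);
}
--cut here--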

[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2017-05-24 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833 --- Comment #10 from Uroš Bizjak --- (In reply to Peter Cordes from comment #0) > Scalar 64-bit integer ops in vector regs may be useful in general in 32-bit > code in some cases, especially if it helps with register pressure. We have

[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2017-05-24 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833 --- Comment #9 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #8) > movq %xmm0, (%esp) <<-- unneeded store due to RA problem For some reason, reload "fixes" direct DImode register moves, and passes the value via memory.

[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2017-05-24 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833 --- Comment #8 from Uroš Bizjak --- The patch from comment #7 generates: a) DImode move for 32-bit targets:
--cut here--
long long test (long long a)
{
  asm ("" : "+x" (a));
  return a;
}
--cut here--
gcc -O2 -msse4.1 -mtune=intel

[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2017-05-24 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833 --- Comment #7 from Uroš Bizjak --- Created attachment 41412 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41412&action=edit Prototype patch Patch that emits mov/pinsr or mov/pextr pairs for DImode (x86_32) and TImode (x86_64) moves.
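
For the TImode (x86_64) half of that description, a hedged sketch of the equivalent register-only move, assuming -msse4.1 on a 64-bit target (the helper name is made up and this is not the patch's output):
--cut here--
#include <smmintrin.h>
#include <stdint.h>

/* __int128 -> xmm on x86_64 without going through memory:
   movq for the low qword, pinsrq $1 for the high qword (SSE4.1).  */
static inline __m128i u128_to_xmm (unsigned __int128 x)
{
  __m128i v = _mm_cvtsi64_si128 ((long long) (uint64_t) x);         /* movq      */
  return _mm_insert_epi64 (v, (long long) (uint64_t) (x >> 64), 1); /* pinsrq $1 */
}
--cut here--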

[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2017-05-22 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833 --- Comment #6 from Peter Cordes --- (In reply to Richard Biener from comment #5) > There's some related bugs. I think there is no part of the compiler that > specifically tries to avoid store forwarding issues. Ideally the compiler would keep
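
To make the store-forwarding hazard concrete, here is a small sketch of the int64_t -> xmm direction; the #ifdef split and helper name are my own, not taken from the bug:
--cut here--
#include <immintrin.h>
#include <stdint.h>

/* int64 -> xmm in 32-bit code.  Two 32-bit stores of edx:eax followed by
   a 64-bit movq reload stalls in store-forwarding; movd + pinsrd stays in
   registers (needs SSE4.1).  */
static inline __m128i i64_to_xmm (int64_t x)
{
#ifdef __SSE4_1__
  __m128i v = _mm_cvtsi32_si128 ((int32_t) x);         /* movd      */
  return _mm_insert_epi32 (v, (int32_t) (x >> 32), 1); /* pinsrd $1 */
#else
  return _mm_set_epi64x (0, x);                        /* may go via the stack */
#endif
}
--cut here--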

[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2017-05-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833 Richard Biener changed:

           What            |Removed     |Added
----------------------------------------------
 Status                    |UNCONFIRMED |NEW
 Last reconfirmed          |

[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2017-05-19 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833 --- Comment #4 from Peter Cordes --- I don't think it's worth anyone's time to implement this in 2017, but using MMX regs for 64-bit store/load would be faster on really old CPUs that split 128b vector insns into two halves, like K8 and Pentium
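
Purely as an illustration of the idea (nothing beyond the movq point comes from the comment), an MMX-based 64-bit copy could look like this:
--cut here--
#include <mmintrin.h>
#include <stdint.h>

/* 64-bit load/store through an MMX register: one movq each way, so CPUs
   that crack 128-bit SSE ops into two halves do a single op per access.
   Needs emms before any later x87 use.  */
static inline void copy64_mmx (const int64_t *src, int64_t *dst)
{
  __m64 v = *(const __m64 *) src;   /* movq (mem), %mm0 */
  *(__m64 *) dst = v;               /* movq %mm0, (mem) */
  _mm_empty ();                     /* emms */
}
--cut here--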

[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2017-05-19 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833 --- Comment #3 from Peter Cordes --- Atom's movd xmm->int is slower (lat=4, rtput=2) than its movd int->xmm (lat=3, rtput=1), which is the opposite of every other CPU (except Silvermont where they're the same throughput but xmm->int is 1c slower).

[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2017-05-19 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833 --- Comment #2 from Peter Cordes --- On most CPUs, psrldq / movd is optimal for xmm[1] -> int without SSE4. On SnB-family, movd runs on port0, and psrldq can run on port5, so they can execute in parallel. (And the second movd can run the next
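
A sketch of that non-SSE4 sequence as intrinsics (the helper name is made up):
--cut here--
#include <emmintrin.h>
#include <stdint.h>

/* xmm -> int64 without SSE4: movd element 0, shift element 1 down with
   psrldq and movd it too; on SnB-family the shuffle (port 5) and the
   first movd (port 0) can execute in parallel.  */
static inline int64_t xmm_to_i64_sse2 (__m128i v)
{
  uint32_t lo = (uint32_t) _mm_cvtsi128_si32 (v);                     /* movd          */
  uint32_t hi = (uint32_t) _mm_cvtsi128_si32 (_mm_srli_si128 (v, 4)); /* psrldq + movd */
  return (int64_t) (((uint64_t) hi << 32) | lo);
}
--cut here--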

[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2017-05-19 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833 --- Comment #1 from Peter Cordes --- See https://godbolt.org/g/krXH9M for the functions I was looking at.