http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47298
Richard Biener rguenth at gcc dot gnu.org changed:
What|Removed |Added
Status|NEW
Eric Botcazou ebotcazou at gcc dot gnu.org changed:
What|Removed |Added
CC||ebotcazou at
--- Comment #7 from Richard Guenther rguenth at gcc dot gnu.org 2012-07-05
08:38:05 UTC ---
It's a pass ordering issue, cunrolli also can tremendously help vectorization
because it enables vectorization of the loop that is then the innermost loop
--- Comment #8 from Eric Botcazou ebotcazou at gcc dot gnu.org 2012-07-05
08:48:24 UTC ---
> It's a pass ordering issue, cunrolli also can tremendously help vectorization
> because it enables vectorization of the loop that is then the innermost
--- Comment #9 from Richard Guenther rguenth at gcc dot gnu.org 2012-07-05
10:10:28 UTC ---
I have a few patches that try to estimate CSE opportunities exposed by
complete unrolling. In this case the CSE opportunity is the reduction
into C(i,j)
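
The CSE opportunity described in comment #9 can be illustrated with a hand-written C sketch. This is not the test case attached to the bug (which is not shown here); the function names, the array layout and the trip count N = 4 are assumptions made for illustration only. In the rolled form, c[i][j] is read and written on every k iteration. Once the k loop is completely unrolled, the reduction can be kept in a scalar with a single load and store of c[i][j], and the j loop becomes the innermost loop.

```c
#include <stddef.h>

#define N 4 /* assumed small, compile-time-constant trip count */

/* Rolled form: the innermost k loop accumulates into c[i][j],
   as written touching that element on every iteration. */
void matmul_rolled(double a[N][N], double b[N][N], double c[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            for (size_t k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* The k loop unrolled completely by hand. The repeated accesses to
   c[i][j] collapse into one load and one store around a scalar
   accumulator, and the j loop is now the innermost loop. */
void matmul_unrolled(double a[N][N], double b[N][N], double c[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            double acc = c[i][j];
            acc += a[i][0] * b[0][j];
            acc += a[i][1] * b[1][j];
            acc += a[i][2] * b[2][j];
            acc += a[i][3] * b[3][j];
            c[i][j] = acc;
        }
}
```

Both functions perform the additions in the same order, so they are expected to produce the same results. Whether GCC performs this transformation on its own depends on the pass ordering and the unrolling cost estimates discussed in this thread.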
--- Comment #10 from Richard Guenther rguenth at gcc dot gnu.org 2012-07-05
10:11:55 UTC ---
Oh, and you can disable cunrolli already via -fdisable-tree-cunrolli.
--- Comment #11 from Eric Botcazou ebotcazou at gcc dot gnu.org 2012-07-05
10:30:09 UTC ---
> Oh, and you can disable cunrolli already via -fdisable-tree-cunrolli.
Indeed, I always forget that we have it in 4.7 and above.
Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch changed:
What|Removed |Added
Last reconfirmed|
Richard Guenther rguenth at gcc dot gnu.org changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
--- Comment #1 from Richard Guenther rguenth at gcc dot gnu.org 2011-01-14
20:43:15 UTC ---
It's faster for me with -O3 (Athlon64, using -march=native).
--- Comment #2 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch
2011-01-14 20:52:54 UTC ---
(In reply to comment #1)
> It's faster for me with -O3 (Athlon64, using -march=native).
Well, not on
model name : Intel(R) Xeon(R) CPU
--- Comment #3 from Joost VandeVondele Joost.VandeVondele at pci dot uzh.ch
2011-01-14 21:02:04 UTC ---
Actually, on AMD too I get 9.4s at -O2 and 11.8s at -O3
model : 9
model name : AMD Opteron(tm) Processor 6176 SE
stepping: