[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
Richard Biener changed:
What      |Removed           |Added
Component |tree-optimization |rtl-optimization
CC        |

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
Richard Biener changed:
What |Removed |Added
CC   |        |hubicka at gcc dot gnu.org
--- Comment

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #9 from Hongtao.liu ---
1703 : 401cb1: vmovq %xmm1,%r9 (*)
 834 : 401cb6: vmovq %r8,%xmm1
1719 : 401cbb: vmovq %r9,%xmm0 (*)

Look like %r9 is dead after the second (*), and it can be optimized to
1703 :

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #10 from Hongtao.liu ---
(In reply to Richard Biener from comment #8)
> So w/ -Ofast -march=znver2 I get a runtime of 130 seconds, when I add
> -mtune-ctrl=^inter_unit_moves_from_vec,^inter_unit_moves_to_vec then
> this improves to 1

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #11 from Richard Biener ---
(In reply to Richard Biener from comment #8)
> So w/ -Ofast -march=znver2 I get a runtime of 130 seconds, when I add
> -mtune-ctrl=^inter_unit_moves_from_vec,^inter_unit_moves_to_vec then
> this improves t

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #12 from Richard Biener ---
(In reply to Hongtao.liu from comment #10)
> (In reply to Richard Biener from comment #8)
> > So w/ -Ofast -march=znver2 I get a runtime of 130 seconds, when I add
> > -mtune-ctrl=^inter_unit_moves_from_ve

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread hubicka at kam dot mff.cuni.cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #13 from hubicka at kam dot mff.cuni.cz ---
> > According to znver2_cost
> >
> > Cost of sse_to_integer is a little bit less than fp_store, maybe increase
> > sse_to_integer cost(more than fp_store) can helps RA to choose memory
> >
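
The cost argument quoted in this comment can be illustrated with a toy model. The sketch below is only an assumption-laden illustration, not GCC's IRA/LRA logic: the function choose_spill_target, the enum, and the way the costs are summed are all hypothetical, and the parameter names merely echo the cost fields named in the comment. It compares a round trip through a GPR with a round trip through the stack and picks the cheaper one.

```c
#include <assert.h>

/* Toy model only: this is NOT how GCC's register allocator is
   implemented.  All names here are hypothetical. */
enum spill_target { SPILL_TO_GPR, SPILL_TO_MEMORY };

static enum spill_target
choose_spill_target(int sse_to_integer, int integer_to_sse,
                    int fp_store, int fp_load)
{
    /* Costs are abstract non-negative units, as in a cost table. */
    assert(sse_to_integer >= 0 && integer_to_sse >= 0);
    assert(fp_store >= 0 && fp_load >= 0);

    /* Round trip xmm -> GPR -> xmm. */
    int via_gpr = sse_to_integer + integer_to_sse;
    /* Round trip xmm -> stack -> xmm. */
    int via_memory = fp_store + fp_load;

    return via_gpr < via_memory ? SPILL_TO_GPR : SPILL_TO_MEMORY;
}
```

With made-up inputs, choose_spill_target(5, 5, 6, 6) selects the GPR and choose_spill_target(8, 8, 6, 6) selects memory, which is the direction of the change suggested in the comment: raising the GPR transfer cost above the store cost tips the choice toward memory. The numbers are arbitrary and are not the actual znver2_cost values.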

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #14 from Richard Biener ---
(In reply to Hongtao.liu from comment #9)
> 1703 : 401cb1: vmovq %xmm1,%r9 (*)
>  834 : 401cb6: vmovq %r8,%xmm1
> 1719 : 401cbb: vmovq %r9,%xmm0 (*)
>
> Look like %r9 is dead afte

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #15 from Richard Biener ---
Created attachment 52300
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52300&action=edit
LBM_performStreamCollide testcase

This is the relevant function.

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread hubicka at kam dot mff.cuni.cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #16 from hubicka at kam dot mff.cuni.cz ---
>
> Yep, we also have code like
>
> - movabsq $0x3ff03db8fde2ef4e, %r8
> ...
> - vmovq %r8, %xmm11

It is loading random constant to xmm11. Since reg<->xmm moves are relativ
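
For readers following the movabsq/vmovq pair quoted in this comment: the sequence materializes a double from its 64-bit bit pattern by way of a general-purpose register. A minimal C sketch of the same reinterpretation follows, assuming IEEE 754 binary64 doubles. The helper names double_from_bits and bits_from_double are made up for illustration and do not come from GCC or the bug report.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 64-bit pattern as a double.
   This mirrors the effect of "movabsq $imm, %r8; vmovq %r8, %xmm11". */
static double double_from_bits(uint64_t bits)
{
    double d;
    assert(sizeof d == sizeof bits);
    memcpy(&d, &bits, sizeof d);   /* well-defined type punning in C */
    return d;
}

/* Hypothetical helper: the reverse direction, xmm -> GPR. */
static uint64_t bits_from_double(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

Calling double_from_bits(0x3ff03db8fde2ef4eULL) yields the constant that the quoted code places in %xmm11. Loading the same value from the constant pool produces identical bits, so the choice between the two forms is purely a matter of cost, which is what the thread discusses.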

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #17 from Richard Biener ---
So in .reload we have (with unpatched trunk)

401: NOTE_INSN_BASIC_BLOCK 6
462: ax:DF=[`*.LC0']
  REG_EQUAL 9.8506899724167309977929107844829559326171875e-1
407: xmm2:DF=ax:DF
463: ax:D

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #18 from Richard Biener ---
For the case of LBM, what also helps is disabling PRE or using PGO (which sees the useless PRE), given that the path on which the expressions become partially compile-time computable is never taken at runtime. In th

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #19 from rguenther at suse dot de ---
On Thu, 27 Jan 2022, hubicka at kam dot mff.cuni.cz wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
>
> --- Comment #13 from hubicka at kam dot mff.cuni.cz ---
> > > According to z

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #20 from rguenther at suse dot de ---
On Thu, 27 Jan 2022, hubicka at kam dot mff.cuni.cz wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
>
> --- Comment #16 from hubicka at kam dot mff.cuni.cz ---
> >
> > Yep, we als

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread hubicka at kam dot mff.cuni.cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #21 from hubicka at kam dot mff.cuni.cz ---
> I would say so. It saves code size and also uop space unless the two
> can magically fuse to a immediate to %xmm move (I doubt that).

I made simple benchmark

double a=10;
int main()
{
  long int i;
  double sum,val1,val2,val3,val4;
  for (i=0;i<10;i++)
    {
#if

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
H.J. Lu changed:
What   |Removed |Added
Status |NEW     |WAITING
--- Comment #22 from H.J. Lu ---
Is

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
Richard Biener changed:
What   |Removed |Added
Status |WAITING |NEW
--- Comment #23 from Richard Biene

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #24 from Hongtao.liu ---
for

vmovq %rdi, %xmm7 # 503 [c=4 l=4] *movdf_internal/21
..
vmulsd %xmm7, %xmm4, %xmm5 # 320 [c=12 l=4] *fop_df_comm/2
..
movabsq $0x3fef85af6c69b5a6, %rdi # 409 [c=5 l=10] *

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178 --- Comment #25 from Hongtao.liu --- > Guess we need to let RA know mem cost is cheaper than GPR for r249. Reduce sse_store cost?

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-28 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #26 from Vladimir Makarov ---
(In reply to Richard Biener from comment #7)
> make costs in a way that IRA/LRA prefer re-materialization of constants
> from the constant pool over spilling to GPRs (if that's possible at all -
> Vlad?)

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-28 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #27 from Vladimir Makarov ---
(In reply to Richard Biener from comment #17)
> So in .reload we have (with unpatched trunk)
>
> 401: NOTE_INSN_BASIC_BLOCK 6
> 462: ax:DF=[`*.LC0']
> REG_EQUAL 9.850689972416730997792

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-02-09 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #28 from Vladimir Makarov ---
Could somebody benchmark the following patch on zen2 470.lbm.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 9cee17479ba..76619aca8eb 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lr

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #29 from Richard Biener ---
(In reply to Vladimir Makarov from comment #28)
> Could somebody benchmark the following patch on zen2 470.lbm.
>
> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> index 9cee17479ba..76619a

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-02-10 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #30 from Vladimir Makarov ---
(In reply to Richard Biener from comment #29)
> (In reply to Vladimir Makarov from comment #28)
> > Could somebody benchmark the following patch on zen2 470.lbm.
>
> Code generation changes quite a bit,

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-04-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
Richard Biener changed:
What             |Removed             |Added
Last reconfirmed |2021-09-03 00:00:00 |2022-4-11
--- Comment #31 from Richard

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-04-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #32 from Richard Biener ---
So the bad "head" can be fixed via

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c74edd1aaef..8f9f26e0a82 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3580

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-04-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #33 from Richard Biener ---
(In reply to Richard Biener from comment #32)
> The diff with ! added is quite short, I've yet have to measure any
> effect on LBM:
>
> --- streamcollide.s.orig 2022-04-25 11:37:01.638733951 +0200

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-04-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #34 from Richard Biener ---
As noted the effect of

  if(...) {
    ux = 0.005;
    uy = 0.002;
    uz = 0.000;
  }

is PRE of most(!) dependent instructions, creating

  # prephitmp_1099 = PHI <_1098(6), 6.4997172499889149648879538

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-04-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
Richard Biener changed:
What             |Removed |Added
Target Milestone |12.0    |13.0
Priority         |P1

Re: [Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread Jan Hubicka via Gcc-bugs
> > According to znver2_cost
> >
> > Cost of sse_to_integer is a little bit less than fp_store, maybe increase
> > sse_to_integer cost(more than fp_store) can helps RA to choose memory
> > instead of GPR.
>
> That sounds reasonable - GPR<->xmm is cheaper than GPR -> stack -> xmm
> but GPR<->xmm s

Re: [Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

2022-01-27 Thread Jan Hubicka via Gcc-bugs
> I would say so. It saves code size and also uop space unless the two
> can magically fuse to a immediate to %xmm move (I doubt that).

I made simple benchmark

double a=10;
int main()
{
  long int i;
  double sum,val1,val2,val3,val4;
  for (i=0;i<10;i++)
    {
#if