[Bug libstdc++/116140] [15 Regression] 5-35% slowdown of 483.xalancbmk and 523.xalancbmk_r since r15-2356-ge69456ff9a54ba

2024-08-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140 --- Comment #4 from Tamar Christina --- It looks like it's because the old unrolled code for the pointer version did a subtract and used the difference to optimize the IV check away to every 4 elements. This explains the increase in

[Bug libstdc++/116140] [15 Regression] 5-35% slowdown of 483.xalancbmk and 523.xalancbmk_r since r15-2356-ge69456ff9a54ba

2024-08-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140 --- Comment #3 from Tamar Christina --- (In reply to Jan Hubicka from comment #2) > Looking at the change, I do not see how that could disable inlining. It > should only reduce size of the function size estimates in the heuristics. > > I think

[Bug fortran/90608] Inline non-scalar minloc/maxloc calls

2024-08-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608 --- Comment #24 from Tamar Christina --- (In reply to Mikael Morin from comment #23) > (In reply to Mikael Morin from comment #21) > > > > (...) and should be able to submit the first > > series (inline minloc without dim argument) this week. >

[Bug target/115974] sat_add, etc. vector patterns not done for aarch64 (non-sve)

2024-08-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115974 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org

[Bug target/116145] Suboptimal SVE immediate synthesis

2024-07-31 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116145 --- Comment #5 from Tamar Christina --- (In reply to ktkachov from comment #4) > Intersting, thanks for the background. The bigger issue I was seeing was > with a string-matching loop like https://godbolt.org/z/E7b13915E where the > constant

[Bug target/116145] Suboptimal SVE immediate synthesis

2024-07-31 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116145 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org ---

[Bug libstdc++/116140] [15 Regression] 5-35% slowdown of 483.xalancbmk and 523.xalancbmk_r since r15-2356-ge69456ff9a54ba

2024-07-30 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140 --- Comment #1 from Tamar Christina --- Yeah, we've noticed it as well. The weird thing is that the dynamic instruction count went up by a lot. So it looks like some inlining or something did not happen.

[Bug target/116074] [15 regression] ICE when building harfbuzz-9.0.0 on arm64 (related_int_vector_mode, at stor-layout.cc:581)

2024-07-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116074 Tamar Christina changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-07-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 116074, which changed state. Bug 116074 Summary: [15 regression] ICE when building harfbuzz-9.0.0 on arm64 (related_int_vector_mode, at stor-layout.cc:581) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116074

[Bug tree-optimization/116074] [15 regression] ICE when building harfbuzz-9.0.0 on arm64 (related_int_vector_mode, at stor-layout.cc:581)

2024-07-25 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116074 --- Comment #8 from Tamar Christina --- Going with a backend fix instead.

[Bug tree-optimization/116074] [15 regression] ICE when building harfbuzz-9.0.0 on arm64 (related_int_vector_mode, at stor-layout.cc:581)

2024-07-25 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116074 --- Comment #7 from Tamar Christina --- The backend is returning TImode for get_vectype_for_scalar_type for historical reasons where large integer modes were considered struct types and this vector modes. However they're not modes the

[Bug tree-optimization/116074] [15 regression] ICE when building harfbuzz-9.0.0 on arm64 (related_int_vector_mode, at stor-layout.cc:581)

2024-07-25 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116074 Tamar Christina changed: What|Removed |Added Last reconfirmed||2024-07-25

[Bug fortran/90608] Inline non-scalar minloc/maxloc calls

2024-07-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608 --- Comment #22 from Tamar Christina --- (In reply to Mikael Morin from comment #21) > (In reply to Tamar Christina from comment #20) > > Hi Mikael, > > > > I did regression testing on x86_64 and AArch64 and only found one test-ism. > > > > I

[Bug ipa/106783] [12/13/14/15 Regression] ICE in ipa-modref.cc:analyze_function since r12-5247-ga34edf9a3e907de2

2024-07-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106783 --- Comment #8 from Tamar Christina --- (In reply to Jan Hubicka from comment #6) > The problem is that n/=0 is undefined behavior (so we can optimize out call > to function doing divide by zero), while __builtin_trap is observable and we > do

[Bug fortran/90608] Inline non-scalar minloc/maxloc calls

2024-07-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608 --- Comment #20 from Tamar Christina --- Hi Mikael, I did regression testing on x86_64 and AArch64 and only found one test-ism. I think I understand most of the patch to be able to deal with any fallout, would it be ok if I fix the test-ism

[Bug tree-optimization/115531] vectorizer generates inefficient code for masked conditional update loops

2024-07-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-07-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 115531, which changed state. Bug 115531 Summary: vectorizer generates inefficient code for masked conditional update loops https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531 What|Removed

[Bug tree-optimization/115936] [15 Regression] GCN vs. ivopts: replace constant_multiple_of with aff_combination_constant_multiple_p [PR114932]

2024-07-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115936 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug target/115936] [15 Regression] GCN vs. ivopts: replace constant_multiple_of with aff_combination_constant_multiple_p [PR114932]

2024-07-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115936 --- Comment #6 from Tamar Christina --- (In reply to Richard Biener from comment #3) > iv->step should never be a pointer type This is created by SCEV. simple_iv_with_niters in the case where no CHREC is found creates an IV with base == ev,

[Bug target/115936] [15 Regression] GCN vs. ivopts: replace constant_multiple_of with aff_combination_constant_multiple_p [PR114932]

2024-07-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115936 --- Comment #5 from Tamar Christina --- (In reply to Richard Biener from comment #3) > iv->step should never be a pointer type This is created by SCEV. simple_iv_with_niters in the case where no CHREC is found creates an IV with base == ev,

[Bug target/115934] [15 Regression] nvptx vs. ivopts: replace constant_multiple_of with aff_combination_constant_multiple_p [PR114932]

2024-07-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115934 --- Comment #7 from Tamar Christina --- (In reply to Thomas Schwinge from comment #6) > Tamar, Richard, thanks for having a look. > > (In reply to Tamar Christina from comment #4) > > This one looks a bit like costing, [...] > > I see. So we

[Bug target/115936] [15 Regression] GCN vs. ivopts: replace constant_multiple_of with aff_combination_constant_multiple_p [PR114932]

2024-07-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115936 --- Comment #4 from Tamar Christina --- (In reply to Richard Biener from comment #3) > iv->step should never be a pointer type That's what I initially thought too. My suspicion is that there is some code that tries to create the 0 offset.

[Bug target/115934] [15 Regression] nvptx vs. ivopts: replace constant_multiple_of with aff_combination_constant_multiple_p [PR114932]

2024-07-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115934 --- Comment #4 from Tamar Christina --- This one looks a bit like costing, before the patch IVopts had: : inv_expr 1: -element_7(D) inv_expr 2: (signed int) rite_5(D) - (signed int) element_7(D) and after the patch it generates a few

[Bug target/115936] [15 Regression] GCN vs. ivopts: replace constant_multiple_of with aff_combination_constant_multiple_p [PR114932]

2024-07-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115936 Tamar Christina changed: What|Removed |Added Target Milestone|--- |15.0 --- Comment #2 from Tamar

[Bug target/115936] [15 Regression] GCN vs. ivopts: replace constant_multiple_of with aff_combination_constant_multiple_p [PR114932]

2024-07-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115936 Tamar Christina changed: What|Removed |Added Ever confirmed|0 |1 CC|

[Bug target/115934] [15 Regression] nvptx vs. ivopts: replace constant_multiple_of with aff_combination_constant_multiple_p [PR114932]

2024-07-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115934 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

[Bug target/115934] [15 Regression] nvptx vs. ivopts: replace constant_multiple_of with aff_combination_constant_multiple_p [PR114932]

2024-07-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115934 --- Comment #1 from Tamar Christina --- Hi, thanks for the report, could you tell me a target triple I can use for nvptx?

[Bug tree-optimization/115866] missed optimization vectorizing switch statements.

2024-07-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

[Bug tree-optimization/115866] New: missed optimization vectorizing switch statements.

2024-07-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866 Bug ID: 115866 Summary: missed optimization vectorizing switch statements. Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug libstdc++/115799] ranges::find's optimized branching for memchr is not quite right

2024-07-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115799 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org ---

[Bug tree-optimization/104265] Missed vectorization in 526.blender_r

2024-07-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104265 --- Comment #5 from Tamar Christina --- Also for fully masked architectures we can instead of recreating the vectors just mask out the irrelevant values. But we should still order the exits based on complexity.

[Bug tree-optimization/104265] Missed vectorization in 526.blender_r

2024-07-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104265 --- Comment #4 from Tamar Christina --- (In reply to Richard Biener from comment #3) > Note the SLP discovery opportunity is from the "reduction" PHI to the > return which merges control flow to a zero/one flag. Right, so I get what you mean

[Bug fortran/90608] Inline non-scalar minloc/maxloc calls

2024-07-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608 --- Comment #19 from Tamar Christina --- Hi Mikael, It looks like the last version of your patch already gets inlined in the call sites we cared about. Would it be possible for you to upstream it?

[Bug c++/115623] ICE: Segmentation fault in finish_for_cond with novector and almost infinite loop

2024-07-04 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115623 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug tree-optimization/115629] Inefficient if-convert of masked conditionals

2024-07-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115629 --- Comment #6 from Tamar Christina --- (In reply to rguent...@suse.de from comment #5) > > In this case, the second load is conditional on the first load mask, which > > means it's already done an AND. > > And crucially inverting it means you

[Bug libstdc++/88545] std::find compile to memchr in trivial random access cases (patch)

2024-07-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545 --- Comment #12 from Tamar Christina --- I had a bug in the benchmark, I forgot to set taskset, These are the correct ones: ++---+-+-+ | NEEDLE | scalar 1x | vect| memchr |

[Bug tree-optimization/115629] Inefficient if-convert of masked conditionals

2024-07-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115629 --- Comment #4 from Tamar Christina --- (In reply to Richard Biener from comment #3) > So we now tail-merge the two b[i] loading blocks. Can you check SVE > code-gen with this? If that fixes the PR consider adding a SVE testcase. Thanks, the

[Bug libstdc++/88545] std::find compile to memchr in trivial random access cases (patch)

2024-07-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545 --- Comment #11 from Tamar Christina --- (In reply to Jonathan Wakely from comment #9) > Patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653731.html > > Rerunning benchmarks with this patch would be very welcome. OK, I have

[Bug tree-optimization/115120] Bad interaction between ivcanon and early break vectorization

2024-06-25 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115120 --- Comment #5 from Tamar Christina --- considering ivopts bails out on doloop prediction for multiple exits anyway, what do you think about: diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc index

[Bug tree-optimization/115629] New: Inefficient if-convert of masked conditionals

2024-06-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115629 Bug ID: 115629 Summary: Inefficient if-convert of masked conditionals Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug tree-optimization/115531] vectorizer generates inefficient code for masked conditional update loops

2024-06-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

[Bug c++/115623] ICE: Segmentation fault in finish_for_cond with novector and almost infinite loop

2024-06-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115623 --- Comment #4 from Tamar Christina --- novect3.c: In function 'void f(char*, int)': novect3.c:4:9: error: missing loop condition in loop with 'GCC novector' pragma before ';' token 4 | for (;;i++) | should do it, will

[Bug c++/115623] ICE: Segmentation fault in finish_for_cond with novector and almost infinite loop

2024-06-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115623 Tamar Christina changed: What|Removed |Added Status|NEW |ASSIGNED

[Bug tree-optimization/115120] Bad interaction between ivcanon and early break vectorization

2024-06-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115120 --- Comment #4 from Tamar Christina --- You asked why this doesn't happen with a normal vector loop Richi. For a normal loop when IVcannon adds the downward counting loop there are two main differences. 1. for a single exit loop, the downward

[Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452

2024-06-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597 --- Comment #4 from Tamar Christina --- (In reply to Richard Biener from comment #2) > Ah, I feared this would happen - this case seems to be because of a lot of > VEC_PERM nodes(?) which are not handled by the CSE process as well as the >

[Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452

2024-06-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597 --- Comment #3 from Tamar Christina --- > > Can you check whether that fixes the issue? > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > index 9465d94de1a..212d5f97f7d 100644 > --- a/gcc/tree-vect-slp.cc > +++

[Bug middle-end/115597] New: [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452

2024-06-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597 Bug ID: 115597 Summary: [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452 Product: gcc

[Bug middle-end/115534] intermediate stack use not eliminated

2024-06-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534 --- Comment #5 from Tamar Christina --- (In reply to Andrew Pinski from comment #4) > This might be improved by > https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654819.html . Or it > might be the case the vectorizer case needs to be

[Bug tree-optimization/115537] [15 Regression] vectorizable_reduction ICEs after g:d66b820f392aa9a7c34d3cddaf3d7c73bf23f82d

2024-06-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115537 --- Comment #5 from Tamar Christina --- Thanks for the fix! I think the testcase needs SVE enabled to ICE no? shouldn't that be -mcpu=neoverse-v1 and not -mcpu=neoverse-n1?

[Bug middle-end/115534] intermediate stack use not eliminated

2024-06-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534 --- Comment #2 from Tamar Christina --- (In reply to Andrew Pinski from comment #1) > I suspect there is a dup of this already. See the bug which I made this one > blocking for a list of related bugs. Most of the other bugs relate to the

[Bug tree-optimization/115537] New: [15 Regression] vectorizable_reduction ICEs after g:d66b820f392aa9a7c34d3cddaf3d7c73bf23f82d

2024-06-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115537 Bug ID: 115537 Summary: [15 Regression] vectorizable_reduction ICEs after g:d66b820f392aa9a7c34d3cddaf3d7c73bf23f82d Product: gcc Version: 15.0 Status: UNCONFIRMED

[Bug tree-optimization/115534] New: intermediate stack use not eliminated

2024-06-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534 Bug ID: 115534 Summary: intermediate stack use not eliminated Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug tree-optimization/115531] vectorizer generates inefficient code for masked conditional update loops

2024-06-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531 --- Comment #3 from Tamar Christina --- (In reply to Andrew Pinski from comment #1) > I suspect PR 20999 would fix this ... > but we have to be careful since without masked stores, you could still > vectorize this unlike the transformed

[Bug tree-optimization/115531] New: vectorizer generates inefficient code for masked conditional update loops

2024-06-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531 Bug ID: 115531 Summary: vectorizer generates inefficient code for masked conditional update loops Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords:

[Bug target/115464] [14 Backport] ICE when building libaom on arm64 (neon sve bridge usage with tbl/perm)

2024-06-13 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 --- Comment #10 from Tamar Christina --- Thanks for the fix, but I don't think it's sufficient. what I meant with the earlier comment was that the subregs are broken in general, so not just the one generated by the undef fast path. i.e.

[Bug target/115464] ICE when building libaom on arm64 (neon sve bridge usage with tbl/perm)

2024-06-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 --- Comment #7 from Tamar Christina --- (In reply to Tamar Christina from comment #6) > (In reply to Richard Sandiford from comment #5) > > In this kind of situation, we should go through a fresh pseudo rather than > > try to take the subreg

[Bug target/115464] ICE when building libaom on arm64 (neon sve bridge usage with tbl/perm)

2024-06-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 --- Comment #6 from Tamar Christina --- (In reply to Richard Sandiford from comment #5) > In this kind of situation, we should go through a fresh pseudo rather than > try to take the subreg directly. I did try that but fwprop pushed it back

[Bug target/115464] ICE when building libaom on arm64 (neon sve bridge usage with tbl/perm)

2024-06-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 Tamar Christina changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org ---

[Bug target/115464] ICE when building libaom on arm64 (neon sve bridge usage with tbl/perm)

2024-06-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 Tamar Christina changed: What|Removed |Added Last reconfirmed||2024-06-12 CC|

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #15 from Tamar Christina --- (In reply to rguent...@suse.de from comment #14) > On Thu, 6 Jun 2024, tnfchris at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 > > > > --- Comment #13 from Tamar

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #13 from Tamar Christina --- (In reply to rguent...@suse.de from comment #12) > > since we don't care about overflow here, it looks like the stripping should > > be recursive as long as it's a NOP expression between two integral

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #11 from Tamar Christina --- (In reply to Richard Biener from comment #10) > I think the question is why IVOPTs ends up using both the signed and > unsigned variant of the same IV instead of expressing all uses of both with > one

[Bug tree-optimization/54013] Loop with control flow not vectorized

2024-06-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54013 Tamar Christina changed: What|Removed |Added Blocks||115130 --- Comment #4 from Tamar

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #9 from Tamar Christina --- It's taken me a bit of time to track down all the reasons for the speedup with the earlier patch. This comes from two parts: 1. Signed IVs don't get simplified. Due to possible UB with signed overflows

[Bug target/114860] [14/15 regression] [aarch64] 511.povray regresses by ~5.5% with -O3 -flto -march=native -mcpu=neoverse-v2 since r14-10014-ga2f4be3dae04fa

2024-05-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860 --- Comment #9 from Tamar Christina --- (In reply to prathamesh3492 from comment #8) > Hi Tamar, > Using -falign-loops=5 indeed brings back the performance. > The adrp instruction has same address (0x4ae784) by setting -falign-loops=5 > (which

[Bug tree-optimization/115130] (early-break) [meta-bug] early break vectorization

2024-05-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130 Tamar Christina changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed|

[Bug tree-optimization/115130] New: (early-break) [meta-bug] early break vectorization

2024-05-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130 Bug ID: 115130 Summary: (early-break) [meta-bug] early break vectorization Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: meta-bug, missed-optimization

[Bug tree-optimization/115120] Bad interaction between ivcanon and early break vectorization

2024-05-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115120 --- Comment #3 from Tamar Christina --- That makes sense, though I also wonder how it works for scalar multi exit loops, IVops has various checks on single exits. I guess one problem is that the code in IVops that does this uses the exit to

[Bug target/114860] [14/15 regression] [aarch64] 511.povray regresses by ~5.5% with -O3 -flto -march=native -mcpu=neoverse-v2 since r14-10014-ga2f4be3dae04fa

2024-05-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860 --- Comment #7 from Tamar Christina --- Yeah, it's most likely an alignment issue, especially as there's no code changes. We run our benchmarking with different flags so it may be why we don't see it. the loop seems misaligned, you can try

[Bug target/114412] [14/15 Regression] 7% slowdown of 436.cactusADM on aarch64

2024-05-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114412 --- Comment #5 from Tamar Christina --- (In reply to Filip Kastl from comment #4) > (In reply to Tamar Christina from comment #3) > > Hi Filip, > > > > Do you generate these runs with counters based PGO or compiler > > instrumentation? > > >

[Bug target/115087] New: dead block not eliminated in SVE intrinsics code

2024-05-14 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115087 Bug ID: 115087 Summary: dead block not eliminated in SVE intrinsics code Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug target/114412] [14/15 Regression] 7% slowdown of 436.cactusADM on aarch64

2024-05-13 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114412 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org ---

[Bug tree-optimization/114932] Improvement in CHREC can give large performance gains

2024-05-13 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 Tamar Christina changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED

[Bug tree-optimization/114932] Improvement in CHREC can give large performance gains

2024-05-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #6 from Tamar Christina --- Created attachment 58096 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58096=edit exchange2.fppized-bad.f90.187t.ivopts

[Bug tree-optimization/114932] Improvement in CHREC can give large performance gains

2024-05-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #5 from Tamar Christina --- Created attachment 58095 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58095=edit exchange2.fppized-good.f90.187t.ivopts

[Bug tree-optimization/114932] Improvement in CHREC can give large performance gains

2024-05-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #4 from Tamar Christina --- reduced more: --- module brute_force integer, parameter :: r=9 integer block(r, r, 0) contains subroutine brute do do do do do

[Bug tree-optimization/114932] Improvement in CHREC can give large performance gains

2024-05-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #3 from Tamar Christina --- (In reply to Andrew Pinski from comment #2) > > which is harder for prefetchers to follow. > > This seems like a limitation in the HW prefetcher rather than anything else. > Maybe the cost model for

[Bug tree-optimization/114932] New: Improvement in CHREC can give large performance gains

2024-05-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 Bug ID: 114932 Summary: Improvement in CHREC can give large performance gains Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity:

[Bug ipa/92538] Proposal for IPA init() constant propagation

2024-05-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92538 Tamar Christina changed: What|Removed |Added CC||jamborm at gcc dot gnu.org ---

[Bug target/114860] [14/15 regression] [aarch64] 511.povray regresses by ~5.5% with -O3 -flto -march=native -mcpu=neoverse-v2 since r14-10014-ga2f4be3dae04fa

2024-05-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860 --- Comment #3 from Tamar Christina --- I cannot reproduce this even recompiling libc.

[Bug target/114860] [14/15 regression] [aarch64] 511.povray regresses by ~5.5% with -O3 -flto -march=native -mcpu=neoverse-v2 since r14-10014-ga2f4be3dae04fa

2024-04-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org ---

[Bug target/114860] [14/15 regression] [aarch64] 511.povray regresses by ~5.5% with -O3 -flto -march=native -mcpu=neoverse-v2 since r14-10014-ga2f4be3dae04fa

2024-04-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860 --- Comment #1 from Tamar Christina --- Hmm I Am unable to reproduce this with -O3 - flto -mcpu=neoverse-v2 on a neoverse-v2 machine. Is any other option required? Also that code was new in gcc 14 and was partially reverted due to register

[Bug rtl-optimization/114766] ^ constraint modifier unexpectedly affects register class selection.

2024-04-20 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114766 --- Comment #2 from Tamar Christina --- (In reply to Vladimir Makarov from comment #1) > (In reply to Tamar Christina from comment #0) > > The documentation for ^ states: > > If it works for you, we could try to use the patch (although it needs

[Bug tree-optimization/114769] [14 Regression] Suspicious code in vect_recog_sad_pattern() since r14-1832

2024-04-19 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114769 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug tree-optimization/114769] [14 Regression] Suspicious code in vect_recog_sad_pattern() since r14-1832

2024-04-19 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114769 --- Comment #2 from Tamar Christina --- I believe this is safe, but the interface is definitely not the cleanest. vect_recog_absolute_difference has two callers: 1. vect_recog_sad_pattern where if you return true with unprom not set, then

[Bug target/113625] Interesting behavior with and without -mcpu=generic

2024-04-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113625 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org ---

[Bug rtl-optimization/114766] New: ^ constraint modifier unexpectedly affects register class selection.

2024-04-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114766 Bug ID: 114766 Summary: ^ constraint modifier unexpectedly affects register class selection. Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords:

[Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations

2024-04-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug target/114513] [11/12/13/14 Regression] [aarch64] floating-point registers are used when GPRs are preferred

2024-04-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114513 Bug 114513 depends on bug 114741, which changed state. Bug 114741 Summary: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741 What|Removed

[Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations

2024-04-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

[Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations

2024-04-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741 --- Comment #6 from Tamar Christina --- and the exact armv9-a cost model you quoted, also does the right codegen. https://godbolt.org/z/obafoT6cj There is just an inexplicable penalty being applied to the r->r alternative.

[Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations

2024-04-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org,

[Bug tree-optimization/113552] [11/12/13 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-04-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552 Tamar Christina changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 Tamar Christina changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 --- Comment #26 from Tamar Christina --- (In reply to Richard Biener from comment #25) > That means, when the loop takes the early exit we _must_ take that during > the vector iterations. Peeling for gaps means if we would take the early >

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 --- Comment #24 from Tamar Christina --- (In reply to Richard Biener from comment #23) > Maybe easier to understand testcase: > > with -O3 -msse4.1 -fno-vect-cost-model we return 20 instead of 8. Adding > -fdisable-tree-cunroll avoids the

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 --- Comment #22 from Tamar Christina --- note that due to the secondary exit the actual full vector iteration count is 8 scalar elements at VF=4 == 2. And it's this boundary condition where we fail, since ceil (8/4) == 2. any other value would

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 --- Comment #21 from Tamar Christina --- Created attachment 57932 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57932=edit loop.c attached reduced testcase that reproduces the issue and also checks the buffer position and copied values.

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635 --- Comment #6 from Tamar Christina --- (In reply to Jakub Jelinek from comment #4) > Now, with SVE/RISCV vectors the actual vectorization factor is a poly_int > rather than constant. One possibility would be to use VLA arrays in those >

[Bug tree-optimization/114635] New: OpenMP reductions fail dependency analysis

2024-04-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635 Bug ID: 114635 Summary: OpenMP reductions fail dependency analysis Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

  1   2   3   4   5   6   7   8   >