[Bug libstdc++/115799] ranges::find's optimized branching for memchr is not quite right

2024-07-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115799 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org ---

[Bug tree-optimization/104265] Missed vectorization in 526.blender_r

2024-07-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104265 --- Comment #5 from Tamar Christina --- Also, for fully masked architectures we can just mask out the irrelevant values instead of recreating the vectors. But we should still order the exits based on complexity.

[Bug tree-optimization/104265] Missed vectorization in 526.blender_r

2024-07-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104265 --- Comment #4 from Tamar Christina --- (In reply to Richard Biener from comment #3) > Note the SLP discovery opportunity is from the "reduction" PHI to the > return which merges control flow to a zero/one flag. Right, so I get what you mean

[Bug fortran/90608] Inline non-scalar minloc/maxloc calls

2024-07-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608 --- Comment #19 from Tamar Christina --- Hi Mikael, It looks like the last version of your patch already gets inlined in the call sites we cared about. Would it be possible for you to upstream it?

[Bug c++/115623] ICE: Segmentation fault in finish_for_cond with novector and almost infinite loop

2024-07-04 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115623 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug tree-optimization/115629] Inefficient if-convert of masked conditionals

2024-07-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115629 --- Comment #6 from Tamar Christina --- (In reply to rguent...@suse.de from comment #5) > > In this case, the second load is conditional on the first load mask, which > > means it's already done an AND. > > And crucially inverting it means you

[Bug libstdc++/88545] std::find compile to memchr in trivial random access cases (patch)

2024-07-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545 --- Comment #12 from Tamar Christina --- I had a bug in the benchmark: I forgot to set taskset. These are the correct ones: | NEEDLE | scalar 1x | vect | memchr |

[Bug tree-optimization/115629] Inefficient if-convert of masked conditionals

2024-07-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115629 --- Comment #4 from Tamar Christina --- (In reply to Richard Biener from comment #3) > So we now tail-merge the two b[i] loading blocks. Can you check SVE > code-gen with this? If that fixes the PR consider adding a SVE testcase. Thanks, the

[Bug libstdc++/88545] std::find compile to memchr in trivial random access cases (patch)

2024-07-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545 --- Comment #11 from Tamar Christina --- (In reply to Jonathan Wakely from comment #9) > Patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653731.html > > Rerunning benchmarks with this patch would be very welcome. OK, I have

[Bug tree-optimization/115120] Bad interaction between ivcanon and early break vectorization

2024-06-25 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115120 --- Comment #5 from Tamar Christina --- Considering ivopts bails out on doloop prediction for multiple exits anyway, what do you think about: diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc index

[Bug tree-optimization/115629] New: Inefficient if-convert of masked conditionals

2024-06-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115629 Bug ID: 115629 Summary: Inefficient if-convert of masked conditionals Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug tree-optimization/115531] vectorizer generates inefficient code for masked conditional update loops

2024-06-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

[Bug c++/115623] ICE: Segmentation fault in finish_for_cond with novector and almost infinite loop

2024-06-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115623 --- Comment #4 from Tamar Christina --- novect3.c: In function 'void f(char*, int)': novect3.c:4:9: error: missing loop condition in loop with 'GCC novector' pragma before ';' token 4 | for (;;i++) | should do it, will

[Bug c++/115623] ICE: Segmentation fault in finish_for_cond with novector and almost infinite loop

2024-06-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115623 Tamar Christina changed: What|Removed |Added Status|NEW |ASSIGNED

[Bug tree-optimization/115120] Bad interaction between ivcanon and early break vectorization

2024-06-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115120 --- Comment #4 from Tamar Christina --- You asked why this doesn't happen with a normal vector loop, Richi. For a normal loop, when ivcanon adds the downward counting loop there are two main differences. 1. For a single exit loop, the downward

[Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452

2024-06-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597 --- Comment #4 from Tamar Christina --- (In reply to Richard Biener from comment #2) > Ah, I feared this would happen - this case seems to be because of a lot of > VEC_PERM nodes(?) which are not handled by the CSE process as well as the >

[Bug middle-end/115597] [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452

2024-06-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597 --- Comment #3 from Tamar Christina --- > > Can you check whether that fixes the issue? > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > index 9465d94de1a..212d5f97f7d 100644 > --- a/gcc/tree-vect-slp.cc > +++

[Bug middle-end/115597] New: [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452

2024-06-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597 Bug ID: 115597 Summary: [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452 Product: gcc

[Bug middle-end/115534] intermediate stack use not eliminated

2024-06-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534 --- Comment #5 from Tamar Christina --- (In reply to Andrew Pinski from comment #4) > This might be improved by > https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654819.html . Or it > might be the case the vectorizer case needs to be

[Bug tree-optimization/115537] [15 Regression] vectorizable_reduction ICEs after g:d66b820f392aa9a7c34d3cddaf3d7c73bf23f82d

2024-06-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115537 --- Comment #5 from Tamar Christina --- Thanks for the fix! I think the testcase needs SVE enabled to ICE, no? Shouldn't that be -mcpu=neoverse-v1 and not -mcpu=neoverse-n1?

[Bug middle-end/115534] intermediate stack use not eliminated

2024-06-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534 --- Comment #2 from Tamar Christina --- (In reply to Andrew Pinski from comment #1) > I suspect there is a dup of this already. See the bug which I made this one > blocking for a list of related bugs. Most of the other bugs relate to the

[Bug tree-optimization/115537] New: [15 Regression] vectorizable_reduction ICEs after g:d66b820f392aa9a7c34d3cddaf3d7c73bf23f82d

2024-06-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115537 Bug ID: 115537 Summary: [15 Regression] vectorizable_reduction ICEs after g:d66b820f392aa9a7c34d3cddaf3d7c73bf23f82d Product: gcc Version: 15.0 Status: UNCONFIRMED

[Bug tree-optimization/115534] New: intermediate stack use not eliminated

2024-06-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534 Bug ID: 115534 Summary: intermediate stack use not eliminated Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug tree-optimization/115531] vectorizer generates inefficient code for masked conditional update loops

2024-06-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531 --- Comment #3 from Tamar Christina --- (In reply to Andrew Pinski from comment #1) > I suspect PR 20999 would fix this ... > but we have to be careful since without masked stores, you could still > vectorize this unlike the transformed

[Bug tree-optimization/115531] New: vectorizer generates inefficient code for masked conditional update loops

2024-06-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531 Bug ID: 115531 Summary: vectorizer generates inefficient code for masked conditional update loops Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords:

[Bug target/115464] [14 Backport] ICE when building libaom on arm64 (neon sve bridge usage with tbl/perm)

2024-06-13 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 --- Comment #10 from Tamar Christina --- Thanks for the fix, but I don't think it's sufficient. What I meant with the earlier comment was that the subregs are broken in general, so not just the one generated by the undef fast path. i.e.

[Bug target/115464] ICE when building libaom on arm64 (neon sve bridge usage with tbl/perm)

2024-06-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 --- Comment #7 from Tamar Christina --- (In reply to Tamar Christina from comment #6) > (In reply to Richard Sandiford from comment #5) > > In this kind of situation, we should go through a fresh pseudo rather than > > try to take the subreg

[Bug target/115464] ICE when building libaom on arm64 (neon sve bridge usage with tbl/perm)

2024-06-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 --- Comment #6 from Tamar Christina --- (In reply to Richard Sandiford from comment #5) > In this kind of situation, we should go through a fresh pseudo rather than > try to take the subreg directly. I did try that but fwprop pushed it back

[Bug target/115464] ICE when building libaom on arm64 (neon sve bridge usage with tbl/perm)

2024-06-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 Tamar Christina changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org ---

[Bug target/115464] ICE when building libaom on arm64 (neon sve bridge usage with tbl/perm)

2024-06-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 Tamar Christina changed: What|Removed |Added Last reconfirmed||2024-06-12 CC|

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #15 from Tamar Christina --- (In reply to rguent...@suse.de from comment #14) > On Thu, 6 Jun 2024, tnfchris at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 > > > > --- Comment #13 from Tamar

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #13 from Tamar Christina --- (In reply to rguent...@suse.de from comment #12) > > since we don't care about overflow here, it looks like the stripping should > > be recursive as long as it's a NOP expression between two integral

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #11 from Tamar Christina --- (In reply to Richard Biener from comment #10) > I think the question is why IVOPTs ends up using both the signed and > unsigned variant of the same IV instead of expressing all uses of both with > one

[Bug tree-optimization/54013] Loop with control flow not vectorized

2024-06-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54013 Tamar Christina changed: What|Removed |Added Blocks||115130 --- Comment #4 from Tamar

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #9 from Tamar Christina --- It's taken me a bit of time to track down all the reasons for the speedup with the earlier patch. This comes from two parts: 1. Signed IVs don't get simplified. Due to possible UB with signed overflows

[Bug target/114860] [14/15 regression] [aarch64] 511.povray regresses by ~5.5% with -O3 -flto -march=native -mcpu=neoverse-v2 since r14-10014-ga2f4be3dae04fa

2024-05-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860 --- Comment #9 from Tamar Christina --- (In reply to prathamesh3492 from comment #8) > Hi Tamar, > Using -falign-loops=5 indeed brings back the performance. > The adrp instruction has same address (0x4ae784) by setting -falign-loops=5 > (which

[Bug tree-optimization/115130] (early-break) [meta-bug] early break vectorization

2024-05-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130 Tamar Christina changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed|

[Bug tree-optimization/115130] New: (early-break) [meta-bug] early break vectorization

2024-05-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130 Bug ID: 115130 Summary: (early-break) [meta-bug] early break vectorization Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: meta-bug, missed-optimization

[Bug tree-optimization/115120] Bad interaction between ivcanon and early break vectorization

2024-05-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115120 --- Comment #3 from Tamar Christina --- That makes sense, though I also wonder how it works for scalar multi exit loops; IVopts has various checks on single exits. I guess one problem is that the code in IVopts that does this uses the exit to

[Bug target/114860] [14/15 regression] [aarch64] 511.povray regresses by ~5.5% with -O3 -flto -march=native -mcpu=neoverse-v2 since r14-10014-ga2f4be3dae04fa

2024-05-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860 --- Comment #7 from Tamar Christina --- Yeah, it's most likely an alignment issue, especially as there are no code changes. We run our benchmarking with different flags, so that may be why we don't see it. The loop seems misaligned, you can try

[Bug target/114412] [14/15 Regression] 7% slowdown of 436.cactusADM on aarch64

2024-05-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114412 --- Comment #5 from Tamar Christina --- (In reply to Filip Kastl from comment #4) > (In reply to Tamar Christina from comment #3) > > Hi Filip, > > > > Do you generate these runs with counters based PGO or compiler > > instrumentation? > > >

[Bug target/115087] New: dead block not eliminated in SVE intrinsics code

2024-05-14 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115087 Bug ID: 115087 Summary: dead block not eliminated in SVE intrinsics code Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug target/114412] [14/15 Regression] 7% slowdown of 436.cactusADM on aarch64

2024-05-13 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114412 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org ---

[Bug tree-optimization/114932] Improvement in CHREC can give large performance gains

2024-05-13 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 Tamar Christina changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED

[Bug tree-optimization/114932] Improvement in CHREC can give large performance gains

2024-05-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #6 from Tamar Christina --- Created attachment 58096 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58096&action=edit exchange2.fppized-bad.f90.187t.ivopts

[Bug tree-optimization/114932] Improvement in CHREC can give large performance gains

2024-05-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #5 from Tamar Christina --- Created attachment 58095 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58095&action=edit exchange2.fppized-good.f90.187t.ivopts

[Bug tree-optimization/114932] Improvement in CHREC can give large performance gains

2024-05-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #4 from Tamar Christina --- reduced more: --- module brute_force integer, parameter :: r=9 integer block(r, r, 0) contains subroutine brute do do do do do

[Bug tree-optimization/114932] Improvement in CHREC can give large performance gains

2024-05-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #3 from Tamar Christina --- (In reply to Andrew Pinski from comment #2) > > which is harder for prefetchers to follow. > > This seems like a limitation in the HW prefetcher rather than anything else. > Maybe the cost model for

[Bug tree-optimization/114932] New: Improvement in CHREC can give large performance gains

2024-05-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 Bug ID: 114932 Summary: Improvement in CHREC can give large performance gains Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity:

[Bug ipa/92538] Proposal for IPA init() constant propagation

2024-05-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92538 Tamar Christina changed: What|Removed |Added CC||jamborm at gcc dot gnu.org ---

[Bug target/114860] [14/15 regression] [aarch64] 511.povray regresses by ~5.5% with -O3 -flto -march=native -mcpu=neoverse-v2 since r14-10014-ga2f4be3dae04fa

2024-05-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860 --- Comment #3 from Tamar Christina --- I cannot reproduce this even recompiling libc.

[Bug target/114860] [14/15 regression] [aarch64] 511.povray regresses by ~5.5% with -O3 -flto -march=native -mcpu=neoverse-v2 since r14-10014-ga2f4be3dae04fa

2024-04-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org ---

[Bug target/114860] [14/15 regression] [aarch64] 511.povray regresses by ~5.5% with -O3 -flto -march=native -mcpu=neoverse-v2 since r14-10014-ga2f4be3dae04fa

2024-04-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860 --- Comment #1 from Tamar Christina --- Hmm, I am unable to reproduce this with -O3 -flto -mcpu=neoverse-v2 on a neoverse-v2 machine. Is any other option required? Also that code was new in gcc 14 and was partially reverted due to register

[Bug rtl-optimization/114766] ^ constraint modifier unexpectedly affects register class selection.

2024-04-20 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114766 --- Comment #2 from Tamar Christina --- (In reply to Vladimir Makarov from comment #1) > (In reply to Tamar Christina from comment #0) > > The documentation for ^ states: > > If it works for you, we could try to use the patch (although it needs

[Bug tree-optimization/114769] [14 Regression] Suspicious code in vect_recog_sad_pattern() since r14-1832

2024-04-19 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114769 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug tree-optimization/114769] [14 Regression] Suspicious code in vect_recog_sad_pattern() since r14-1832

2024-04-19 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114769 --- Comment #2 from Tamar Christina --- I believe this is safe, but the interface is definitely not the cleanest. vect_recog_absolute_difference has two callers: 1. vect_recog_sad_pattern where if you return true with unprom not set, then

[Bug target/113625] Interesting behavior with and without -mcpu=generic

2024-04-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113625 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org ---

[Bug rtl-optimization/114766] New: ^ constraint modifier unexpectedly affects register class selection.

2024-04-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114766 Bug ID: 114766 Summary: ^ constraint modifier unexpectedly affects register class selection. Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords:

[Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations

2024-04-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug target/114513] [11/12/13/14 Regression] [aarch64] floating-point registers are used when GPRs are preferred

2024-04-18 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114513 Bug 114513 depends on bug 114741, which changed state. Bug 114741 Summary: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741 What|Removed

[Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations

2024-04-17 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

[Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations

2024-04-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741 --- Comment #6 from Tamar Christina --- And the exact armv9-a cost model you quoted also produces the right codegen. https://godbolt.org/z/obafoT6cj There is just an inexplicable penalty being applied to the r->r alternative.

[Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations

2024-04-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org,

[Bug tree-optimization/113552] [11/12/13 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-04-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552 Tamar Christina changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 Tamar Christina changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 --- Comment #26 from Tamar Christina --- (In reply to Richard Biener from comment #25) > That means, when the loop takes the early exit we _must_ take that during > the vector iterations. Peeling for gaps means if we would take the early >

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 --- Comment #24 from Tamar Christina --- (In reply to Richard Biener from comment #23) > Maybe easier to understand testcase: > > with -O3 -msse4.1 -fno-vect-cost-model we return 20 instead of 8. Adding > -fdisable-tree-cunroll avoids the

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 --- Comment #22 from Tamar Christina --- Note that due to the secondary exit the actual full vector iteration count is 8 scalar elements at VF=4 == 2. And it's this boundary condition where we fail, since ceil (8/4) == 2. Any other value would
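
The boundary condition mentioned in this comment can be illustrated with plain integer arithmetic. The following is only a sketch of the arithmetic, not GCC code: `ceil_div` is a hypothetical helper, and VF=4 with 8 scalar elements are the values quoted in the comment. It shows that floor and ceiling division agree only when the division is exact, which is the case the comment singles out.

```python
def ceil_div(n: int, d: int) -> int:
    """Ceiling of n / d for positive integers, without floating point."""
    return -(-n // d)


VF = 4  # vectorization factor quoted in the comment

# Compare floor and ceiling of (scalar iterations / VF) around the exact case.
for scalar_iters in (7, 8, 9):
    floor_iters = scalar_iters // VF
    ceil_iters = ceil_div(scalar_iters, VF)
    exact = scalar_iters % VF == 0
    print(f"{scalar_iters} elements: floor={floor_iters} "
          f"ceil={ceil_iters} exact={exact}")
```

For 8 elements both roundings give 2; for 7 or 9 elements the ceiling is one larger than the floor.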

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 --- Comment #21 from Tamar Christina --- Created attachment 57932 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57932=edit loop.c attached reduced testcase that reproduces the issue and also checks the buffer position and copied values.

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635 --- Comment #6 from Tamar Christina --- (In reply to Jakub Jelinek from comment #4) > Now, with SVE/RISCV vectors the actual vectorization factor is a poly_int > rather than constant. One possibility would be to use VLA arrays in those >

[Bug tree-optimization/114635] New: OpenMP reductions fail dependency analysis

2024-04-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635 Bug ID: 114635 Summary: OpenMP reductions fail dependency analysis Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug target/114577] New: Inefficient codegen for SVE/NEON bridge

2024-04-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114577 Bug ID: 114577 Summary: Inefficient codegen for SVE/NEON bridge Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug target/114510] [14 Regression] missed proping of multiply by 2 into address of load/stores

2024-04-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114510 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org ---

[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523

2024-04-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org

[Bug rtl-optimization/114575] New: [14 Regression] SVE addressing modes broken since g:839bc42772ba7af66af3bd16efed4a69511312ae

2024-04-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114575 Bug ID: 114575 Summary: [14 Regression] SVE addressing modes broken since g:839bc42772ba7af66af3bd16efed4a69511312ae Product: gcc Version: 14.0 Status: UNCONFIRMED

[Bug rtl-optimization/113682] Branches in branchless binary search rather than cmov/csel/csinc

2024-04-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682 --- Comment #9 from Tamar Christina --- (In reply to Andrew Pinski from comment #8) > This might be the path splitting running on the gimple level causing issues > too; see PR 112402 . Ah that's a good shout. It looks like Richi already

[Bug rtl-optimization/113682] Branches in branchless binary search rather than cmov/csel/csinc

2024-04-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682 Tamar Christina changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 --- Comment #20 from Tamar Christina --- This is a bad interaction between early break and peeling for gaps. When peeling for gaps we set bias_for_lowest to 0, which then negates the ceil for the upper bound calculation when the div is exact. We

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 Tamar Christina changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed|

[Bug tree-optimization/114345] FRE missing knowledge of semantics of IFN loads

2024-03-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114345 --- Comment #5 from Tamar Christina --- (In reply to Richard Biener from comment #4) > Well, the shuffling in .LOAD_LANES will be a bit awkward to do, but sure. We > basically lack "constant folding" of .LOAD_LANES and similarly of course > we

[Bug target/114350] New: missing support for SVE widening floating point conversion

2024-03-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114350 Bug ID: 114350 Summary: missing support for SVE widening floating point conversion Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords:

[Bug tree-optimization/114345] FRE missing knowledge of semantics of IFN loads

2024-03-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114345 --- Comment #3 from Tamar Christina --- (In reply to Andrew Pinski from comment #2) > Oh VN does have some knowledge of MASK_STORE and LEN_STORE. Just not > LOAD_LANES . > > > See PR 106365 for MASK_STORE and LEN_STORE implementation.

[Bug tree-optimization/114346] New: vectorizer generates the same IV twice

2024-03-14 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114346 Bug ID: 114346 Summary: vectorizer generates the same IV twice Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug tree-optimization/114345] New: FRE missing knowledge of semantics of IFN loads

2024-03-14 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114345 Bug ID: 114345 Summary: FRE missing knowledge of semantics of IFN loads Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822

2024-03-14 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339 --- Comment #6 from Tamar Christina --- vectorizer generates: mask_patt_21.19_58 = vect_perm_even_49 >= vect_cst__57; mask_patt_21.19_59 = vect_perm_even_55 >= vect_cst__57; vexit_reduc_63 = mask_patt_21.19_58 | mask_patt_21.19_59; if
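
The GIMPLE quoted above compares two vectors against the same constant and ORs the resulting masks into vexit_reduc_63. The snippet is cut off at the "if", so what follows is only a scalar sketch of that comparison and OR step, assuming the combined mask is then tested for any set lane. The lane count and the function names are hypothetical and do not come from the bug report.

```c
#include <stdbool.h>
#include <stddef.h>

#define LANES 4 /* hypothetical vector width, not taken from the report */

/* Lane-wise v[i] >= c, one boolean per lane (models mask_patt_21.19_*). */
static void cmp_ge(const int *v, int c, bool *mask)
{
    for (size_t i = 0; i < LANES; i++)
        mask[i] = v[i] >= c;
}

/* OR the two masks lane-wise (models vexit_reduc_63) and report whether
   any lane is set. The "any lane" test is an assumption about the part
   of the snippet that is truncated. */
static bool any_exit(const int *even0, const int *even1, int c)
{
    bool m0[LANES], m1[LANES];
    bool any = false;

    cmp_ge(even0, c, m0);
    cmp_ge(even1, c, m1);
    for (size_t i = 0; i < LANES; i++)
        any |= (m0[i] | m1[i]);
    return any;
}
```

Combining the masks before testing them means a single branch covers both vectors, which is the shape the quoted statements suggest.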

[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193

2024-03-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #17 from Tamar Christina --- > So doing in the vectorizer sth like the following should get us the best > possible ranges? Ah, probably only global ranges since the SCEV query > itself would still lack context sensitive info (but

[Bug tree-optimization/114234] [14 Regression] verify_ssa failure with early-break vectorisation

2024-03-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114234 Tamar Christina changed: What|Removed |Added Last reconfirmed||2024-03-05

[Bug tree-optimization/114192] scalar code left around following early break vectorization of reduction

2024-03-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114192 Tamar Christina changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED

[Bug target/98877] [AArch64] Inefficient code generated for tbl NEON intrinsics

2024-02-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877 --- Comment #12 from Tamar Christina --- and it's not the first time we have conditional lowering. We already do so for e.g. shifts, where shifting by an amount >= bitsize of a vector element is defined behavior on AArch64.
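
The comment above hinges on out-of-range shift amounts being well defined for AArch64 vector shifts. As a minimal sketch of what that means for a single 8-bit lane of a left shift, assuming an out-of-range amount yields zero: the helper name lane_shl_u8 is hypothetical and this is not GCC's lowering code.

```c
#include <stdint.h>

/* Models one unsigned 8-bit lane of a vector left shift in which an
   amount >= the element width is defined and produces 0 (assumed
   semantics for this sketch). */
static uint8_t lane_shl_u8(uint8_t x, unsigned amount)
{
    if (amount >= 8)
        return 0;                  /* every bit is shifted out of the lane */
    return (uint8_t)(x << amount); /* in range: ordinary shift, truncated */
}
```

In plain C a shift by at least the width of the promoted operand is undefined, so a target-independent form can only be used when the amount is known to be in range, which is presumably why the lowering has to be conditional.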

[Bug target/98877] [AArch64] Inefficient code generated for tbl NEON intrinsics

2024-02-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877 --- Comment #11 from Tamar Christina --- (In reply to Andrew Pinski from comment #10) > (In reply to Tamar Christina from comment #9) > > While RA should be able to deal with this, > > shouldn't we also just lower TBLs in gimple? > > > > This

[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since g:a0b1798042d033fd2cc2c806afbb77875dd2909b

2024-02-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #3 from Tamar Christina --- > > This was a correctness fix btw, so I'm not sure we can easily recover - we > could try using niter information for CHREC_VARIABLE but then there's > variable niter here so I don't see a chance. >

[Bug tree-optimization/114151] New: [14 Regression] weird and inefficient codegen and addressing modes since g:a0b1798042d033fd2cc2c806afbb77875dd2909b

2024-02-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 Bug ID: 114151 Summary: [14 Regression] weird and inefficient codegen and addressing modes since g:a0b1798042d033fd2cc2c806afbb77875dd2909b Product: gcc

[Bug target/98877] [AArch64] Inefficient code generated for tbl NEON intrinsics

2024-02-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877 --- Comment #9 from Tamar Christina --- While RA should be able to deal with this, shouldn't we also just lower TBLs in gimple? There's no reason why this can't be a VEC_PERM_EXPR which would also get the copies removed at the gimple level and

[Bug target/102171] vget_low_*/vget_high_* intrinsics should become BIT_FIELD_REF during gimple

2024-02-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102171 --- Comment #3 from Tamar Christina --- (In reply to Andrew Pinski from comment #2) > I think I am going to implement this (or assign it internally to someone else > to implement). If you do, please also remove them from arm_neon.h and use the

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-02-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 Tamar Christina changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org ---

[Bug tree-optimization/86530] Vectorization failure for a simple loop

2024-02-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86530 --- Comment #8 from Tamar Christina --- (In reply to Andrew Pinski from comment #6) > With my patch for V4QI, we still don't get the best code: > vect_perm_even_271 = VEC_PERM_EXPR 4, 6 }>; > vect_perm_even_273 = VEC_PERM_EXPR 4, 6 }>; >

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7

2024-02-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441 --- Comment #27 from Tamar Christina --- Created attachment 57538 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57538&action=edit proposed1.patch Proposed patch, this gets the gathers and scatters back. Doing regression run.

[Bug tree-optimization/114099] [14 regression] ICE in find_uses_to_rename_use when building darktable-4.6.1

2024-02-25 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114099 --- Comment #8 from Tamar Christina --- Created attachment 57537 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57537&action=edit uses.patch new code seems sensitive to visitation order as get_virtual_phi returns NULL for blocks which don't

[Bug middle-end/114081] [14 regression] ICE in verify_dominators when building php-8.3.3 (error: dominator of 16 should be 111, not 3) since r14-6822

2024-02-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114081 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

[Bug tree-optimization/114068] [14 regression] ICE when building darktable-4.6.1 (error: PHI node with wrong VUSE on edge from BB 25) since r14-8768

2024-02-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114068 --- Comment #14 from Tamar Christina --- patch submitted https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646415.html
