[Bug target/102171] vget_low_*/vget_high_* intrinsics should become BIT_FIELD_REF during gimple

2024-02-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102171 --- Comment #3 from Tamar Christina --- (In reply to Andrew Pinski from comment #2) > I think I am going to implement this (or assign it internally to someone else > to implement). If you do, please also remove them from arm_neon.h and use the n

[Bug target/98877] [AArch64] Inefficient code generated for tbl NEON intrinsics

2024-02-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877 --- Comment #9 from Tamar Christina --- While RA should be able to deal with this, shouldn't we also just lower TBLs in gimple? There is no reason why this can't be a VEC_PERM_EXPR which would also get the copies removed at the gimple level and allo
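The relationship between a table lookup and a permute can be sketched in scalar C. This is only an illustrative model, not GCC's lowering: the function names are hypothetical, the table is assumed to be a single 16-byte register, and the permute is modelled in its one-input form, where the selector wraps modulo the element count. Under those assumptions the two operations agree only when every index is in range, which is why such a lowering would have to be conditional.

```c
#include <stdint.h>

/* Scalar model of a single-register, 16-byte table lookup:
   an out-of-range index produces 0 in that lane.  */
void tbl_model(const uint8_t table[16], const uint8_t idx[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = idx[i] < 16 ? table[idx[i]] : 0;
}

/* Scalar model of a one-input permute: the selector wraps modulo 16.  */
void vec_perm_model(const uint8_t v[16], const uint8_t sel[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = v[sel[i] % 16];
}

/* Returns 1 when every index is in range, i.e. when the two models
   above are guaranteed to produce the same result.  */
int tbl_indices_in_range(const uint8_t idx[16])
{
    for (int i = 0; i < 16; i++)
        if (idx[i] >= 16)
            return 0;
    return 1;
}
```

With in-range indices both models return the same bytes; with an index of 20 the lookup model yields 0 while the permute model wraps to element 4.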

[Bug tree-optimization/114151] New: [14 Regression] weird and inefficient codegen and addressing modes since g:a0b1798042d033fd2cc2c806afbb77875dd2909b

2024-02-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 Bug ID: 114151 Summary: [14 Regression] weird and inefficient codegen and addressing modes since g:a0b1798042d033fd2cc2c806afbb77875dd2909b Product: gcc Version:

[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since g:a0b1798042d033fd2cc2c806afbb77875dd2909b

2024-02-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #3 from Tamar Christina --- > > This was a correctness fix btw, so I'm not sure we can easily recover - we > could try using niter information for CHREC_VARIABLE but then there's > variable niter here so I don't see a chance. > It

[Bug target/98877] [AArch64] Inefficient code generated for tbl NEON intrinsics

2024-02-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877 --- Comment #11 from Tamar Christina --- (In reply to Andrew Pinski from comment #10) > (In reply to Tamar Christina from comment #9) > > While RA should be able to deal with this, > > shouldn't we also just lower TBLs in gimple? > > > > There is no

[Bug target/98877] [AArch64] Inefficient code generated for tbl NEON intrinsics

2024-02-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877 --- Comment #12 from Tamar Christina --- and it's not the first time we have conditional lowering. We already do so for e.g. shifts, where shifting by an amount >= bitsize of a vector element is defined behavior on AArch64.
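The point about shifts can be illustrated with a scalar sketch. In ISO C, shifting a 32-bit value by 32 or more is undefined, so a model of a vector lane whose out-of-range shift is well defined needs an explicit guard. The helper name is hypothetical, and the assumption that an out-of-range left shift yields zero is mine; the exact instruction semantics should be checked against the architecture manual.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of a vector left shift
   in which an amount >= the element width is well defined.
   Assumption: the out-of-range result is 0.  Plain `x << amount`
   would be undefined behaviour in C for amount >= 32.  */
uint32_t lane_shl_u32(uint32_t x, unsigned amount)
{
    return amount >= 32 ? 0u : x << amount;
}
```

A compiler can only replace the intrinsic with a plain shift when it knows the amount is in range, which is the same kind of conditional lowering the comment describes.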

[Bug tree-optimization/114192] scalar code left around following early break vectorization of reduction

2024-03-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114192 Tamar Christina changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED

[Bug tree-optimization/114234] [14 Regression] verify_ssa failure with early-break vectorisation

2024-03-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114234 Tamar Christina changed: What|Removed |Added Last reconfirmed||2024-03-05 Status|UNCONFI

[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193

2024-03-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #17 from Tamar Christina --- > So doing in the vectorizer sth like the following should get us the best > possible ranges? Ah, probably only global ranges since the SCEV query > itself would still lack context sensitive info (but as

[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822

2024-03-14 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339 --- Comment #6 from Tamar Christina --- vectorizer generates: mask_patt_21.19_58 = vect_perm_even_49 >= vect_cst__57; mask_patt_21.19_59 = vect_perm_even_55 >= vect_cst__57; vexit_reduc_63 = mask_patt_21.19_58 | mask_patt_21.19_59; if (

[Bug tree-optimization/114345] New: FRE missing knowledge of semantics of IFN loads

2024-03-14 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114345 Bug ID: 114345 Summary: FRE missing knowledge of semantics of IFN loads Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug tree-optimization/114346] New: vectorizer generates the same IV twice

2024-03-14 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114346 Bug ID: 114346 Summary: vectorizer generates the same IV twice Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Pr

[Bug tree-optimization/114345] FRE missing knowledge of semantics of IFN loads

2024-03-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114345 --- Comment #3 from Tamar Christina --- (In reply to Andrew Pinski from comment #2) > Oh VN does have some knowledge of MASK_STORE and LEN_STORE. Just not > LOAD_LANES . > > > See PR 106365 for MASK_STORE and LEN_STORE implementation. Shouldn'

[Bug target/114350] New: missing support for SVE widening floating point conversion

2024-03-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114350 Bug ID: 114350 Summary: missing support for SVE widening floating point conversion Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization

[Bug tree-optimization/114345] FRE missing knowledge of semantics of IFN loads

2024-03-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114345 --- Comment #5 from Tamar Christina --- (In reply to Richard Biener from comment #4) > Well, the shuffling in .LOAD_LANES will be a bit awkward to do, but sure. We > basically lack "constant folding" of .LOAD_LANES and similarly of course > we

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 Tamar Christina changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed|

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 --- Comment #20 from Tamar Christina --- This is a bad interaction with early break and peeling for gaps. When peeling for gaps we set bias_for_lowest to 0, which then negates the ceil for the upper bound calculation when the div is exact. We

[Bug rtl-optimization/113682] Branches in branchless binary search rather than cmov/csel/csinc

2024-04-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682 Tamar Christina changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0

[Bug rtl-optimization/113682] Branches in branchless binary search rather than cmov/csel/csinc

2024-04-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682 --- Comment #9 from Tamar Christina --- (In reply to Andrew Pinski from comment #8) > This might be the path splitting running on the gimple level causing issues > too; see PR 112402 . Ah that's a good shout. It looks like Richi already agrees

[Bug target/106346] [11/12/13/14 Regression] Potential regression on vectorization of left shift with constants since r11-5160-g9fc9573f9a5e94

2023-07-31 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106346 Tamar Christina changed: What|Removed |Added Target Milestone|11.5|14.0

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-08-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org --- Comme

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-08-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #14 from Tamar Christina --- Or rather, info_for_reduction looks at the original statement if it's a pattern, whereas vect_is_reduction only looks at the direct statement. You'll probably want to check vect_orig_stmt if using info_f

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-08-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #16 from Tamar Christina --- (In reply to Hao Liu from comment #15) > Ah, I see. > > I've sent out a quick fix patch for code review. I'll investigate more > about this and find out the root cause. Thanks! I can reduce a testcase

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-08-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625 --- Comment #18 from Tamar Christina --- Hi, here's the reduced case: > cat analyse.i double x264_weights_analyse___trans_tmp_1; float x264_weights_analyse_ref_mean; x264_weights_analyse() { x264_weights_analyse___trans_tmp_1 = floor(x2

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2023-08-04 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 106346, which changed state. Bug 106346 Summary: [11/12/13/14 Regression] Potential regression on vectorization of left shift with constants since r11-5160-g9fc9573f9a5e94 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=1063

[Bug target/106346] [11/12/13/14 Regression] Potential regression on vectorization of left shift with constants since r11-5160-g9fc9573f9a5e94

2023-08-04 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106346 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug target/95958] [meta-bug] Inefficient arm_neon.h code for AArch64

2023-08-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958 Bug 95958 depends on bug 88212, which changed state. Bug 88212 Summary: IRA Register Coalescing not working for the testcase https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88212 What|Removed |Added ---

[Bug rtl-optimization/88212] IRA Register Coalescing not working for the testcase

2023-08-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88212 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org Re

[Bug target/89967] Inefficient code generation for vld2q_lane_u8 under aarch64

2023-08-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89967 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org

[Bug target/111370] On Aarch64 4% 511.povray_r regression between g:6cd85273071b5f13 (2023-08-23 00:17) and g:e1f096a3cc96c719 (2023-08-25 22:34)

2023-09-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111370 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org Last re

[Bug fortran/90608] Inline non-scalar minloc/maxloc calls

2023-09-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org,

[Bug fortran/90608] Inline non-scalar minloc/maxloc calls

2023-10-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608 --- Comment #9 from Tamar Christina --- (In reply to Mikael Morin from comment #8) > Created attachment 56091 [details] > Rough patch > > Here is a rough patch to make the scalarizer support minloc calls. > It regresses on minloc_1.f90 at least,

[Bug tree-optimization/111770] New: predicated loads inactive lane values not modelled

2023-10-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770 Bug ID: 111770 Summary: predicated loads inactive lane values not modelled Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug target/116145] Suboptimal SVE immediate synthesis

2024-07-31 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116145 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org --- Comme

[Bug target/116145] Suboptimal SVE immediate synthesis

2024-07-31 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116145 --- Comment #5 from Tamar Christina --- (In reply to ktkachov from comment #4) > Interesting, thanks for the background. The bigger issue I was seeing was > with a string-matching loop like https://godbolt.org/z/E7b13915E where the > constant poo

[Bug target/115974] sat_add, etc. vector patterns not done for aarch64 (non-sve)

2024-07-31 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115974 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org

[Bug fortran/90608] Inline non-scalar minloc/maxloc calls

2024-08-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608 --- Comment #24 from Tamar Christina --- (In reply to Mikael Morin from comment #23) > (In reply to Mikael Morin from comment #21) > > > > (...) and should be able to submit the first > > series (inline minloc without dim argument) this week. >

[Bug libstdc++/116140] [15 Regression] 5-35% slowdown of 483.xalancbmk and 523.xalancbmk_r since r15-2356-ge69456ff9a54ba

2024-08-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140 --- Comment #3 from Tamar Christina --- (In reply to Jan Hubicka from comment #2) > Looking at the change, I do not see how that could disable inlining. It > should only reduce size of the function size estimates in the heuristics. > > I think

[Bug libstdc++/116140] [15 Regression] 5-35% slowdown of 483.xalancbmk and 523.xalancbmk_r since r15-2356-ge69456ff9a54ba

2024-08-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140 --- Comment #4 from Tamar Christina --- It looks like it's because the old unrolled code for the pointer version did a subtract and used the difference to optimize the IV check away to every 4 elements. This explains the increase in instruction

[Bug target/116229] [15 Regression] wrong code at -Ofast aarch64 due to missing fneg to generate 0x8000000000000000

2024-08-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116229 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

[Bug target/116229] [15 Regression] wrong code at -Ofast aarch64 due to missing fneg to generate 0x8000000000000000

2024-08-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116229 Tamar Christina changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug tree-optimization/116409] [15 Regression] Recent phiopt change causing ICE with sqrt and -frounding-math

2024-08-19 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116409 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org

[Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5

2024-08-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463 --- Comment #5 from Tamar Christina --- Yeah, this is because they generate different gimple sequences and thus different SLP trees. The core of the problem is there's no canonical form here, and a missing gimple simplification rule: _33 = IM

[Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5

2024-08-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463 --- Comment #11 from Tamar Christina --- (In reply to Richard Biener from comment #6) > I think > > a - ((b * -c) + (d * -e)) -> a + (b * c) + (d * e) > > is a good simplification to be made, but it's difficult to do this with > canonicali
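The simplification quoted above can be written out as two small C functions for comparison. This is only a sketch with hypothetical names. The two forms are algebraically equal, but in floating point the simplified form associates the additions differently, so in general the results may round differently; the failing tests named in the bug title are fast-math tests, where such reassociation is permitted.

```c
/* Original form quoted in the comment: a - ((b * -c) + (d * -e)).  */
double mls_original(double a, double b, double c, double d, double e)
{
    return a - ((b * -c) + (d * -e));
}

/* Simplified form quoted in the comment: a + (b * c) + (d * e).
   Note that this evaluates as (a + b*c) + d*e.  */
double mls_simplified(double a, double b, double c, double d, double e)
{
    return a + (b * c) + (d * e);
}
```

For inputs whose intermediate values are all exactly representable, such as small integers, both functions return the same value.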

[Bug tree-optimization/116520] Multiple condition lead to missing vectorization due to missing early break

2024-08-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116520 --- Comment #3 from Tamar Christina --- (In reply to Richard Biener from comment #2) > The issue seems to be that if-conversion isn't done: > > Can not ifcvt due to multiple exits > > maybe my patched dev tree arrives with a different CFG here

[Bug tree-optimization/116520] Multiple condition lead to missing vectorization due to missing early break

2024-08-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116520 --- Comment #4 from Tamar Christina --- (In reply to Tamar Christina from comment #3) > (In reply to Richard Biener from comment #2) > > The issue seems to be that if-conversion isn't done: > > I wonder if this transformation is really beneficia

[Bug rtl-optimization/116541] [14/15 Regression] Inefficient missing use of reg+reg addressing modes

2024-09-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116541 Tamar Christina changed: What|Removed |Added CC||wilco at gcc dot gnu.org Ever con

[Bug tree-optimization/36010] Loop interchange not performed

2024-09-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36010 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org --- Commen

[Bug middle-end/116575] New: [15 Regression] blender in SPEC2017 ICE in vect_analyze_slp

2024-09-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116575 Bug ID: 116575 Summary: [15 Regression] blender in SPEC2017 ICE in vect_analyze_slp Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords: ice-on-valid-code

[Bug tree-optimization/116577] New: [15 Regression] tonto in SPECCPU 2006 ICEs in vect_lower_load_permutations

2024-09-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116577 Bug ID: 116577 Summary: [15 Regression] tonto in SPECCPU 2006 ICEs in vect_lower_load_permutations Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords: ice

[Bug tree-optimization/116575] [15 Regression] blender in SPEC2017 ICE in vect_analyze_slp

2024-09-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116575 --- Comment #1 from Tamar Christina --- --- int a; float *b, *c; void d() { char *e; for (; a; a++, b += 4, c += 4) if (*e++) { float *f = c; f[0] = b[0]; f[1] = b[1]; f[2] = b[2]; f[3] = b[3]; } } comp

[Bug tree-optimization/116577] [15 Regression] tonto in SPECCPU 2006 ICEs in vect_lower_load_permutations

2024-09-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116577 --- Comment #2 from Tamar Christina --- --- module type type a complex(kind(1.0d0)) j real(kind(1.0d0)) k real(kind(1.0d0)) l end type contains subroutine b(c,g) type(a), dimension(:) :: c target c type(a), dim

[Bug tree-optimization/116577] [15 Regression] tonto in SPECCPU 2006 ICEs in vect_lower_load_permutations

2024-09-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116577 --- Comment #3 from Tamar Christina --- reproducer should be saved with extension .f90

[Bug tree-optimization/116628] [15 Regression] ICE in vect_analyze_loop_1 on aarch64 with -Ofast in TSVC

2024-09-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116628 --- Comment #3 from Tamar Christina --- Still seems to ICE after that commit on last night's trunk https://godbolt.org/z/GnYT7Kx46

[Bug tree-optimization/116628] [15 Regression] ICE in vect_analyze_loop_1 on aarch64 with -Ofast in TSVC

2024-09-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116628 --- Comment #5 from Tamar Christina --- (In reply to Richard Biener from comment #4) > Confirmed. The ICE means we've "fatally" failed to analyze an epilogue > which we do not expect. > > t.c:4:21: note: worklist: examine stmt: .MASK_STORE (

[Bug tree-optimization/116628] [15 Regression] ICE in vect_analyze_loop_1 on aarch64 with -Ofast in TSVC

2024-09-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116628 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

[Bug tree-optimization/116628] [15 Regression] ICE in vect_analyze_loop_1 on aarch64 with -Ofast in TSVC

2024-09-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116628 Tamar Christina changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug tree-optimization/115866] missed optimization vectorizing switch statements.

2024-09-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866 Tamar Christina changed: What|Removed |Added Resolution|FIXED |--- Status|RESOLVED

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-09-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 115866, which changed state. Bug 115866 Summary: missed optimization vectorizing switch statements. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866 What|Removed |Added --

[Bug tree-optimization/115130] [meta-bug] early break vectorization

2024-09-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130 Bug 115130 depends on bug 115866, which changed state. Bug 115866 Summary: missed optimization vectorizing switch statements. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866 What|Removed |Added

[Bug target/116667] New: missing superfluous zero-extends of SVE values

2024-09-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116667 Bug ID: 116667 Summary: missing superfluous zero-extends of SVE values Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug tree-optimization/116684] [vectorization][x86-64] dot_16x1x16_uint8_int8_int32 could be better optimized

2024-09-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116684 Tamar Christina changed: What|Removed |Added CC||victorldn at gcc dot gnu.org --- Comm

[Bug middle-end/109153] New: missed vector constructor optimizations

2023-03-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153 Bug ID: 109153 Summary: missed vector constructor optimizations Product: gcc Version: 13.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal P

[Bug middle-end/109153] missed vector constructor optimizations

2023-03-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153 --- Comment #3 from Tamar Christina --- (In reply to Richard Biener from comment #2) > On the GIMPLE side we should canonicalize here I think, at which point > inserts into a splatted vector become more profitable depends? > > _4 = VEC_PERM_E

[Bug middle-end/109153] missed vector constructor optimizations

2023-03-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

[Bug tree-optimization/109156] New: Support Absolute Difference detection in GCC

2023-03-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109156 Bug ID: 109156 Summary: Support Absolute Difference detection in GCC Product: gcc Version: 13.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug tree-optimization/109154] [13 regression] aarch64 -mcpu=neoverse-v1 microbude performance regression

2023-03-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154 --- Comment #1 from Tamar Christina --- Thanks for the report, taking a look!

[Bug tree-optimization/109156] Support Absolute Difference detection in GCC

2023-03-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109156 --- Comment #2 from Tamar Christina --- (In reply to Richard Biener from comment #1) > (In reply to Tamar Christina from comment #0) > > 2. It looks like all targets that implement SAD do so with an instruction > > that does ABD and then perform
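The relationship between absolute difference (ABD) and sum of absolute differences (SAD) mentioned in the quoted text can be sketched in scalar C. The function names are hypothetical and the element type (unsigned 8-bit with a 32-bit accumulator) is an assumption chosen for illustration; this is not the pattern GCC matches.

```c
#include <stddef.h>
#include <stdint.h>

/* Absolute difference of two unsigned bytes.  */
uint8_t abd_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : b - a);
}

/* Sum of absolute differences: an ABD per element followed by a
   widening accumulation.  */
uint32_t sad_u8(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += abd_u8(a[i], b[i]);
    return sum;
}
```

Written this way, SAD is visibly an ABD step plus a reduction, so recognising ABD on its own covers loops that need the per-element difference without the sum.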

[Bug tree-optimization/109156] Support Absolute Difference detection in GCC

2023-03-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109156 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org -

[Bug target/109154] [13 regression] aarch64 -mcpu=neoverse-v1 microbude performance regression

2023-03-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154 --- Comment #2 from Tamar Christina --- Confirmed. It looks like the extra range information from g:4fbe3e6aa74dae5c75a73c46ae6683fdecd1a75d is leading jump threading down the wrong path. Reduced testcase: --- int etot_0, fasten_main_natpro_ch

[Bug target/109154] [13 regression] jump threading with de-optimizes nested floating point comparisons

2023-03-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154 Tamar Christina changed: What|Removed |Added Summary|[13 regression] aarch64 |[13 regression] jump

[Bug tree-optimization/109230] [13 Regression] Maybe wrong code for opus package on aarch64 since r13-4122-g1bc7efa948f751

2023-03-21 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109230 --- Comment #1 from Tamar Christina --- That patch only fixed the bootstrap, in any case I'm on holidays so have asked someone else to look.

[Bug tree-optimization/109230] [13 Regression] Maybe wrong code for opus package on aarch64 since r13-4122-g1bc7efa948f751

2023-03-21 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109230 --- Comment #11 from Tamar Christina --- Neither of those vec_perms are valid targets for this optimization. It looks like sel.series_p is not doing what I expected. It's matching even elements and ignoring the odd ones.
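The behaviour described here, a strided check that matches the even elements and never looks at the odd ones, can be reproduced with a small scalar model. This is a hypothetical helper written for illustration and is not GCC's vec_perm_indices code; its parameter names and semantics are assumptions based only on what the comment describes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a strided selector check: returns true when
   sel[out_base + i * out_step] == in_base + i * in_step for every such
   index below n.  Elements between the strides are never inspected.
   out_step must be nonzero.  */
bool series_matches(const unsigned *sel, size_t n,
                    size_t out_base, size_t out_step,
                    unsigned in_base, unsigned in_step)
{
    unsigned expected = in_base;
    for (size_t i = out_base; i < n; i += out_step) {
        if (sel[i] != expected)
            return false;
        expected += in_step;
    }
    return true;
}
```

For the selector {0, 7, 2, 5, 4, 3, 6, 1}, a check with out_base 0 and a step of 2 succeeds even though the odd positions are arbitrary, so a caller that wants the whole selector validated has to check the odd positions separately.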

[Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons

2023-03-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154 --- Comment #24 from Tamar Christina --- (In reply to Jakub Jelinek from comment #12) > (In reply to Richard Biener from comment #11) > > _1 should be [-Inf, nextafter (0.0, -Inf)], not [-Inf, -0.0] > The reduced testcase is invalid because it us

[Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons

2023-03-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154 --- Comment #25 from Tamar Christina --- Created attachment 54777 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54777&action=edit extracted codegen

[Bug rtl-optimization/109391] New: Inefficient codegen on AArch64 when structure types are returned

2023-04-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109391 Bug ID: 109391 Summary: Inefficient codegen on AArch64 when structure types are returned Product: gcc Version: 13.0 Status: UNCONFIRMED Keywords: missed-optimi

[Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons

2023-04-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154 --- Comment #42 from Tamar Christina --- Thanks for all the work so far folks! Just to clarify the current state, it looks like the first reduced testcase is now correct. But the larger example as in c26 is still suboptimal, but slightly bette

[Bug tree-optimization/109587] New: Deeply nested loop unrolling overwhelms register allocator

2023-04-21 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109587 Bug ID: 109587 Summary: Deeply nested loop unrolling overwhelms register allocator Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization

[Bug tree-optimization/109587] Deeply nested loop unrolling overwhelms register allocator with -O3

2023-04-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109587 --- Comment #4 from Tamar Christina --- (In reply to Richard Biener from comment #3) > The issue isn't unrolling but invariant motion. We unroll the innermost > loop, vectorize the middle loop and then unroll that as well. That leaves > us wi

[Bug tree-optimization/109587] Deeply nested loop unrolling overwhelms register allocator with -O3

2023-04-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109587 --- Comment #7 from Tamar Christina --- (In reply to Richard Biener from comment #5) > (In reply to Tamar Christina from comment #4) > > (In reply to Richard Biener from comment #3) > > > The issue isn't unrolling but invariant motion. We unrol

[Bug tree-optimization/109154] [13/14 regression] jump threading de-optimizes nested floating point comparisons

2023-04-25 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154 --- Comment #54 from Tamar Christina --- @Jakub, just to check to avoid doing duplicate work, did you intend to do the remaining ifcvt changes or should we?

[Bug tree-optimization/109154] [13/14 regression] jump threading de-optimizes nested floating point comparisons

2023-04-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org -

[Bug target/109632] New: Inefficient codegen when complex numbers are emulated with structs

2023-04-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632 Bug ID: 109632 Summary: Inefficient codegen when complex numbers are emulated with structs Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-opti

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-04-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632 --- Comment #2 from Tamar Christina --- (In reply to Richard Biener from comment #1) > Well, the usual unknown ABI boundary at function entry/exit. Yes, but LLVM gets it right, so it should be a solvable computer science problem. :) Note that th

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-04-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632 --- Comment #3 from Tamar Christina --- Note that even if we can't stop SLP, we should be able to generate equally efficient code by being creative about the instruction selection; that's why I marked it as a target bug :)

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-04-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632 --- Comment #6 from Tamar Christina --- That's an interesting approach, I think it would also fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109391 would it not? Since the int16x8x3_t return would be "scalarized" avoiding the bad expansion?

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-04-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632 --- Comment #9 from Tamar Christina --- Thank you!

[Bug ipa/109711] [14 regression] ICE (tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in verify_range, at value-range.cc:1060) when building ffmpeg-4.4.4 since r14-377-gc92b8be9b52b

2023-05-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711 --- Comment #5 from Tamar Christina --- (In reply to Martin Liška from comment #3) > Hm, on x86_64-linux-gnu, it started with r13-6616-g2246d576f922ba. $ cat prtest2.c void lspf2lpc(); int interpolate_lpc_q_0; void interpolate_lpc(int subfram

[Bug ipa/109711] [14 regression] ICE (tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in verify_range, at value-range.cc:1060) when building ffmpeg-4.4.4 since r14-377-gc92b8be9b52b

2023-05-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711 --- Comment #6 from Tamar Christina --- My own bisect does indeed end up at r14-377-gc92b8be9b52b7e, and I cannot reproduce it on GCC 13.

[Bug rtl-optimization/114575] New: [14 Regression] SVE addressing modes broken since g:839bc42772ba7af66af3bd16efed4a69511312ae

2024-04-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114575 Bug ID: 114575 Summary: [14 Regression] SVE addressing modes broken since g:839bc42772ba7af66af3bd16efed4a69511312ae Product: gcc Version: 14.0 Status: UNCONFIRMED

[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523

2024-04-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org

[Bug target/114510] [14 Regression] missed proping of multiply by 2 into address of load/stores

2024-04-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114510 Tamar Christina changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org --- Comme

[Bug target/114577] New: Inefficient codegen for SVE/NEON bridge

2024-04-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114577 Bug ID: 114577 Summary: Inefficient codegen for SVE/NEON bridge Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal P

[Bug tree-optimization/114635] New: OpenMP reductions fail dependency analysis

2024-04-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635 Bug ID: 114635 Summary: OpenMP reductions fail dependency analysis Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635 --- Comment #6 from Tamar Christina --- (In reply to Jakub Jelinek from comment #4) > Now, with SVE/RISCV vectors the actual vectorization factor is a poly_int > rather than constant. One possibility would be to use VLA arrays in those > cases,

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 --- Comment #21 from Tamar Christina --- Created attachment 57932 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57932&action=edit loop.c attached reduced testcase that reproduces the issue and also checks the buffer position and copied v

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 --- Comment #22 from Tamar Christina --- Note that due to the secondary exit the actual full vector iteration count is 8 scalar elements at VF=4 == 2. And it's this boundary condition where we fail, since ceil (8/4) == 2. Any other value would
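
The arithmetic in the comment above can be sketched as follows. This is only an illustration of the rounding involved, not code from GCC: the helper names ceil_div, floor_div and is_exact_multiple are hypothetical, and the snippet makes no claim about how the vectorizer computes its iteration counts. It shows that 8 scalar iterations at VF=4 is a case where rounding up and rounding down give the same result.

```c
/* Hypothetical helpers for illustration only; not taken from GCC sources. */

/* Vector iterations needed to cover `niters` scalar iterations at
   vectorization factor `vf`, rounding up.  Assumes vf > 0. */
unsigned ceil_div(unsigned niters, unsigned vf)
{
    return (niters + vf - 1) / vf;
}

/* Full vector iterations that fit in `niters` scalar iterations,
   rounding down.  Assumes vf > 0. */
unsigned floor_div(unsigned niters, unsigned vf)
{
    return niters / vf;
}

/* Nonzero when `niters` is an exact multiple of `vf`, i.e. when the
   rounded-up and rounded-down counts coincide and there is no partial
   final vector iteration. */
int is_exact_multiple(unsigned niters, unsigned vf)
{
    return ceil_div(niters, vf) == floor_div(niters, vf);
}
```

With niters = 8 and vf = 4, both ceil_div and floor_div return 2, so is_exact_multiple is nonzero; with niters = 9 the two counts differ (3 versus 2).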

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 --- Comment #24 from Tamar Christina --- (In reply to Richard Biener from comment #23) > Maybe easier to understand testcase: > > with -O3 -msse4.1 -fno-vect-cost-model we return 20 instead of 8. Adding > -fdisable-tree-cunroll avoids the issu

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 --- Comment #26 from Tamar Christina --- (In reply to Richard Biener from comment #25) > That means, when the loop takes the early exit we _must_ take that during > the vector iterations. Peeling for gaps means if we would take the early > exit

[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403 Tamar Christina changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---
