[Bug tree-optimization/116126] vectorize libcpp search_line_fast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

--- Comment #11 from GCC Commits ---
The trunk branch has been updated by Thomas Schwinge:

https://gcc.gnu.org/g:da75309c635c54a6010b146514d456d2a4c6ab33

commit r15-7102-gda75309c635c54a6010b146514d456d2a4c6ab33
Author: Thomas Schwinge
Date:   Tue Jan 21 14:57:37 2025 +0100

    vect: Force alignment peeling to vectorize more early break loops [PR118211]: update 'gcc.dg/vect/vect-switch-search-line-fast.c' for GCN

            PR tree-optimization/118211
            PR tree-optimization/116126

            gcc/testsuite/
            * gcc.dg/vect/vect-switch-search-line-fast.c: Update for GCN.
[Bug tree-optimization/116126] vectorize libcpp search_line_fast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

--- Comment #10 from ak at gcc dot gnu.org ---
Okay, it looks like the test case just avoids the if (...) return problem by
replacing it with if (...) break.

I guess the vectorizer should really be able to do that on its own.
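
For readers following the thread, here is a minimal sketch of the two source shapes comment #10 contrasts. The function names are hypothetical and the bodies are only modelled on the kind of character search libcpp performs; they are not copied from libcpp or from gcc.dg/vect/vect-switch-search-line-fast.c. Both functions compute the same result; the only difference is whether the early exit leaves the function directly or leaves the loop first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not the testsuite source.  */

/* Early exit written as a 'return' from inside the loop body.  */
const unsigned char *
search_with_return (const unsigned char *s, const unsigned char *end)
{
  assert (s != NULL && s <= end);
  for (; s < end; s++)
    if (*s == '\n' || *s == '\r' || *s == '\\' || *s == '?')
      return s;
  return end;
}

/* Same search, with the early exit written as 'break' so that the
   function has a single return after the loop.  */
const unsigned char *
search_with_break (const unsigned char *s, const unsigned char *end)
{
  assert (s != NULL && s <= end);
  for (; s < end; s++)
    if (*s == '\n' || *s == '\r' || *s == '\\' || *s == '?')
      break;
  return s;
}
```

When no stop character is found, both variants return end, so a caller cannot tell them apart.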
[Bug tree-optimization/116126] vectorize libcpp search_line_fast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ak at gcc dot gnu.org

--- Comment #9 from ak at gcc dot gnu.org ---
On x86/avx512f the first variant still fails with

search-line-fast.c:4:60: missed: couldn't vectorize loop
search-line-fast.c:4:60: missed: not vectorized: number of iterations cannot be computed.

and the second variant, with the end condition, fails with

search-line-fast-cond.c:3:18: missed: couldn't vectorize loop
search-line-fast-cond.c:3:18: missed: not vectorized: unsupported control flow in loop.
search-line-fast-cond.c:1:22: note: vectorized 0 loops in function.

The first needs some pattern matching: having the break condition in the loop
vs having it in a while header shouldn't matter.

I think the latter is due to vect_analyze_loop_form:

|if (EDGE_COUNT (bbs[i]->succs) != 1

 [local count: 1044213920]:
# prephitmp_25 = PHI <_24(4), 0(12)>
_10 = _1 == 92;
_13 = _10 | prephitmp_25;
if (_13 != 0)
  goto ; [8.03%]
else
  goto ; [91.97%]

 [local count: 83800315]:
# s_19 = PHI
return s_19;

because the return isn't a jump out of the loop. I'm not sure how arm avoids
that problem.
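
As a hedged illustration of the "break condition in the loop vs in a while header" remark, the sketch below writes an unbounded scan both ways. The names scan_header_cond, scan_body_break and is_stop_char are hypothetical, and the actual files tested in comment #9 are not shown in this thread, so this is only an assumed shape. Neither loop has a bound on s, so the caller must guarantee that a stop character is present in the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the characters the scan stops at.  */
static int
is_stop_char (unsigned char c)
{
  return c == '\n' || c == '\r' || c == '\\' || c == '?';
}

/* Exit test in the loop header.  There is no 'end' bound: the caller
   must guarantee that a stop character exists in the buffer.  */
const unsigned char *
scan_header_cond (const unsigned char *s)
{
  assert (s != NULL);
  while (!is_stop_char (*s))
    s++;
  return s;
}

/* The same scan with the exit test moved into the loop body.  */
const unsigned char *
scan_body_break (const unsigned char *s)
{
  assert (s != NULL);
  for (;;)
    {
      if (is_stop_char (*s))
        break;
      s++;
    }
  return s;
}
```

The two functions are semantically identical, which is the sense in which the placement of the exit test "shouldn't matter" to the vectorizer.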
[Bug tree-optimization/116126] vectorize libcpp search_line_fast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

Tamar Christina changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org

--- Comment #8 from Tamar Christina ---
This seems to now vectorize on 32-bit platforms. It's failing analysis on
64-bit ones.
[Bug tree-optimization/116126] vectorize libcpp search_line_fast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

--- Comment #7 from GCC Commits ---
The master branch has been updated by Tamar Christina:

https://gcc.gnu.org/g:086031c058598512d09bf898e4db3735b3e1f22c

commit r15-6811-g086031c058598512d09bf898e4db3735b3e1f22c
Author: Alex Coplan
Date:   Mon Jun 24 13:54:48 2024 +0100

    vect: Also cost gconds for scalar [PR118211]

    Currently we only cost gconds for the vector loop while we omit costing
    them when analyzing the scalar loop; this unfairly penalizes the vector
    loop in the case of loops with early exits.

    This (together with the previous patches) enables us to vectorize
    std::find with 64-bit element sizes.

    gcc/ChangeLog:

            PR tree-optimization/118211
            PR tree-optimization/116126
            * tree-vect-loop.cc (vect_compute_single_scalar_iteration_cost):
            Don't skip over gconds.
[Bug tree-optimization/116126] vectorize libcpp search_line_fast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

--- Comment #5 from GCC Commits ---
The master branch has been updated by Tamar Christina:

https://gcc.gnu.org/g:f1c6789ab6c5443ccefab96c74b0e862119d1781

commit r15-6809-gf1c6789ab6c5443ccefab96c74b0e862119d1781
Author: Tamar Christina
Date:   Mon Jul 8 12:16:11 2024 +0100

    vect: Fix dominators when adding a guard to skip the vector loop [PR118211]

    The alignment peeling changes exposed a latent missing dominator update
    with early break vectorization, specifically when inserting the vector
    skip edge, since the new edge bypasses the prolog skip block and thus
    has the potential to subvert its dominance. This patch fixes that.

    gcc/ChangeLog:

            PR tree-optimization/118211
            PR tree-optimization/116126
            * tree-vect-loop-manip.cc (vect_do_peeling): Update immediate
            dominators of nodes that were dominated by the prolog skip block
            after inserting vector skip edge. Initialize prolog variable to
            NULL to avoid bogus -Wmaybe-uninitialized during bootstrap.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/118211
            PR tree-optimization/116126
            * g++.dg/vect/vect-early-break_6.cc: New test.

    Co-Authored-By: Alex Coplan
[Bug tree-optimization/116126] vectorize libcpp search_line_fast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

--- Comment #6 from GCC Commits ---
The master branch has been updated by Tamar Christina:

https://gcc.gnu.org/g:f4e259b4a66c81c234608056117836e13606e4c8

commit r15-6810-gf4e259b4a66c81c234608056117836e13606e4c8
Author: Alex Coplan
Date:   Thu Jul 25 16:34:05 2024 +

    vect: Ensure we add vector skip guard even when versioning for aliasing [PR118211]

    This fixes a latent wrong code issue whereby vect_do_peeling determined
    the wrong condition for inserting the vector skip guard. Specifically
    in the case where the loop niters are unknown at compile time we used to
    check:

      !LOOP_REQUIRES_VERSIONING (loop_vinfo)

    but LOOP_REQUIRES_VERSIONING is true for loops which we have versioned
    for aliasing, and that has nothing to do with prolog peeling. I think
    this condition should instead be checking specifically if we aren't
    versioning for alignment.

    As it stands, when we version for alignment, we don't peel, so the
    vector skip guard is indeed redundant in that case.

    With the testcase added (reduced from the Fortran frontend) we would
    version for aliasing, omit the vector skip guard, and then at runtime we
    would peel sufficient iterations for alignment that there wasn't a full
    vector iteration left when we entered the vector body, thus overflowing
    the output buffer.

    gcc/ChangeLog:

            PR tree-optimization/118211
            PR tree-optimization/116126
            * tree-vect-loop-manip.cc (vect_do_peeling): Adjust skip_vector
            condition to only omit the edge if we're versioning for
            alignment.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/118211
            PR tree-optimization/116126
            * gcc.dg/vect/vect-early-break_130.c: New test.
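
To make the role of the vector skip guard concrete, here is a scalar sketch of the prolog / vector / epilogue shape the commit message describes. It is an assumption-laden model, not GCC's generated code: copy_peeled and VF are hypothetical names, the inner lane loop merely stands in for one full-width vector store, and the vector body is written as a do-while because it executes at least once when entered. Dropping the guard would let that body write VF elements even when the prolog has left fewer than VF iterations, which is the overflow described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VF = 4 };  /* Assumed vectorization factor, for illustration only.  */

/* Copy n ints from src to dst using a peeled loop structure.  */
void
copy_peeled (int *dst, const int *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  size_t i = 0;

  /* Prolog: peel scalar iterations until dst + i is aligned to VF ints,
     or until we run out of iterations.  */
  size_t misalign = ((uintptr_t) dst / sizeof (int)) % VF;
  size_t peel = (VF - misalign) % VF;
  if (peel > n)
    peel = n;
  for (; i < peel; i++)
    dst[i] = src[i];

  /* Vector skip guard: only enter the vector body if at least one full
     vector iteration remains after peeling.  */
  if (n - i >= VF)
    do
      {
        /* Stands in for a single VF-wide vector load and store.  */
        for (size_t lane = 0; lane < VF; lane++)
          dst[i + lane] = src[i + lane];
        i += VF;
      }
    while (n - i >= VF);

  /* Scalar epilogue for the leftover iterations.  */
  for (; i < n; i++)
    dst[i] = src[i];
}
```

With the guard in place, a call such as copy_peeled (dst + 1, src, 3) finishes in the prolog and never touches the element after the third one.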
[Bug tree-optimization/116126] vectorize libcpp search_line_fast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

--- Comment #4 from GCC Commits ---
The master branch has been updated by Tamar Christina:

https://gcc.gnu.org/g:0a46245174123ad2802753e7fee689a541570ca0

commit r15-6808-g0a46245174123ad2802753e7fee689a541570ca0
Author: Alex Coplan
Date:   Fri Jun 7 11:13:02 2024 +

    vect: Don't guard scalar epilogue for inverted loops [PR118211]

    For loops with LOOP_VINFO_EARLY_BREAKS_VECT_PEELED we should always
    enter the scalar epilogue, so avoid emitting a guard on entry to the
    epilogue.

    gcc/ChangeLog:

            PR tree-optimization/118211
            PR tree-optimization/116126
            * tree-vect-loop-manip.cc (vect_do_peeling): Avoid emitting an
            epilogue guard for inverted early-exit loops.
[Bug tree-optimization/116126] vectorize libcpp search_line_fast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

--- Comment #3 from GCC Commits ---
The master branch has been updated by Tamar Christina:

https://gcc.gnu.org/g:68326d5d1a593dc0bf098c03aac25916168bc5a9

commit r15-6807-g68326d5d1a593dc0bf098c03aac25916168bc5a9
Author: Alex Coplan
Date:   Mon Mar 11 13:09:10 2024 +

    vect: Force alignment peeling to vectorize more early break loops [PR118211]

    This allows us to vectorize more loops with early exits by forcing
    peeling for alignment to make sure that we're guaranteed to be able to
    safely read an entire vector iteration without crossing a page boundary.

    To make this work for VLA architectures we have to allow compile-time
    non-constant target alignments. We also have to override the result of
    the target's preferred_vector_alignment hook if it isn't a power-of-two
    multiple of the TYPE_SIZE of the chosen vector type.

    gcc/ChangeLog:

            PR tree-optimization/118211
            PR tree-optimization/116126
            * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
            Set need_peeling_for_alignment flag on read DRs instead of
            failing vectorization. Punt on gathers.
            (dr_misalignment): Handle non-constant target alignments.
            (vect_compute_data_ref_alignment): If need_peeling_for_alignment
            flag is set on the DR, then override the target alignment chosen
            by the preferred_vector_alignment hook to choose a safe
            alignment.
            (vect_supportable_dr_alignment): Override
            support_vector_misalignment hook if need_peeling_for_alignment
            is set on the DR: in this case we must return
            dr_unaligned_unsupported in order to force peeling.
            * tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
            peeling by a compile-time non-constant amount.
            * tree-vectorizer.h (dr_vec_info): Add new flag
            need_peeling_for_alignment.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/118211
            PR tree-optimization/116126
            * gcc.dg/tree-ssa/cunroll-13.c: Don't vectorize.
            * gcc.dg/tree-ssa/cunroll-14.c: Likewise.
            * gcc.dg/unroll-6.c: Likewise.
            * gcc.dg/tree-ssa/gen-vect-28.c: Likewise.
            * gcc.dg/vect/vect-104.c: Expect to vectorize.
            * gcc.dg/vect/vect-early-break_108-pr113588.c: Likewise.
            * gcc.dg/vect/vect-early-break_109-pr113588.c: Likewise.
            * gcc.dg/vect/vect-early-break_110-pr113467.c: Likewise.
            * gcc.dg/vect/vect-early-break_3.c: Likewise.
            * gcc.dg/vect/vect-early-break_65.c: Likewise.
            * gcc.dg/vect/vect-early-break_8.c: Likewise.
            * gfortran.dg/vect/vect-5.f90: Likewise.
            * gfortran.dg/vect/vect-8.f90: Likewise.
            * gcc.dg/vect/vect-switch-search-line-fast.c:

    Co-Authored-By: Tamar Christina
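
The page-boundary argument in the commit message above can be checked with a little address arithmetic. The sketch below is illustrative only: prolog_peel_count and read_stays_in_page are hypothetical helpers (not GCC functions), elements are assumed to be one byte wide as in a character search, and the vector size is assumed to be a power of two no larger than the page size. Under those assumptions, a read that starts at a multiple of the vector size never straddles a page, so peeling scalar iterations up to that alignment makes every subsequent full-vector read safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of one-byte scalar iterations to peel so that addr + result is
   a multiple of vec_bytes.  vec_bytes must be a power of two.  */
size_t
prolog_peel_count (uintptr_t addr, size_t vec_bytes)
{
  assert (vec_bytes != 0 && (vec_bytes & (vec_bytes - 1)) == 0);
  return (size_t) (-addr & (vec_bytes - 1));
}

/* Nonzero if the read [addr, addr + vec_bytes) lies within one page.  */
int
read_stays_in_page (uintptr_t addr, size_t vec_bytes, size_t page_bytes)
{
  assert (vec_bytes != 0 && vec_bytes <= page_bytes);
  return addr / page_bytes == (addr + vec_bytes - 1) / page_bytes;
}
```

For example, with 16-byte vectors and an assumed 4096-byte page, a read starting 8 bytes before a page boundary would cross it, whereas peeling 8 scalar iterations first moves the vector read to the start of the next page.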
[Bug tree-optimization/116126] vectorize libcpp search_line_fast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126
Bug 116126 depends on bug 115484, which changed state.

Bug 115484 Summary: [13/14/15 regression] if-to-switch prevents AVX vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115484

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
[Bug tree-optimization/116126] vectorize libcpp search_line_fast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek ---
Just note that on the libcpp side we ensure padding of the cpp buffers, which
is something the vectorizer itself can't do.
[Bug tree-optimization/116126] vectorize libcpp search_line_fast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-07-30
            Version|unknown                     |15.0
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener ---
Confirmed.