[Bug tree-optimization/116126] vectorize libcpp search_line_fast

2025-01-21 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

--- Comment #11 from GCC Commits  ---
The trunk branch has been updated by Thomas Schwinge :

https://gcc.gnu.org/g:da75309c635c54a6010b146514d456d2a4c6ab33

commit r15-7102-gda75309c635c54a6010b146514d456d2a4c6ab33
Author: Thomas Schwinge 
Date:   Tue Jan 21 14:57:37 2025 +0100

vect: Force alignment peeling to vectorize more early break loops
[PR118211]: update 'gcc.dg/vect/vect-switch-search-line-fast.c' for GCN

PR tree-optimization/118211
PR tree-optimization/116126
gcc/testsuite/
* gcc.dg/vect/vect-switch-search-line-fast.c: Update for GCN.

[Bug tree-optimization/116126] vectorize libcpp search_line_fast

2025-01-13 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

--- Comment #10 from ak at gcc dot gnu.org ---
Okay it looks like the test case just avoids the if (...) return problem by
replacing it with if (...) break. I guess the vectorizer should really be able
to do that on its own.
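
For reference, the two loop shapes being contrasted can be sketched as
below.  This is only an illustration, not the actual testcase: the function
names and the exact set of characters searched for are assumptions made
here.

```c
#include <stddef.h>

/* Variant A: the early exit is a return inside the loop body.  */
const unsigned char *
find_special_return (const unsigned char *s, const unsigned char *end)
{
  for (; s < end; s++)
    if (*s == '\n' || *s == '\r' || *s == '\\' || *s == '?')
      return s;
  return end;
}

/* Variant B: the same search, but the early exit is a break and the
   function has a single return after the loop.  */
const unsigned char *
find_special_break (const unsigned char *s, const unsigned char *end)
{
  for (; s < end; s++)
    if (*s == '\n' || *s == '\r' || *s == '\\' || *s == '?')
      break;
  return s;
}
```

Both functions return the same pointer for every input, so the difference
is purely in the control flow presented to the vectorizer.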

[Bug tree-optimization/116126] vectorize libcpp search_line_fast

2025-01-13 Thread ak at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

ak at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ak at gcc dot gnu.org

--- Comment #9 from ak at gcc dot gnu.org ---
On x86/avx512f the first variant still fails with 

search-line-fast.c:4:60: missed: couldn't vectorize loop
search-line-fast.c:4:60: missed: not vectorized: number of iterations cannot be
computed.

and the second variant, with the end condition, fails with

search-line-fast-cond.c:3:18: missed: couldn't vectorize loop
search-line-fast-cond.c:3:18: missed: not vectorized: unsupported control flow
in loop.
search-line-fast-cond.c:1:22: note: vectorized 0 loops in function.

The first needs some pattern matching: having the break condition in the loop
vs having it in a while header shouldn't matter.

I think the latter is due to

vect_analyze_loop_form:
   
  |if (EDGE_COUNT (bbs[i]->succs) != 1


  [local count: 1044213920]:
  # prephitmp_25 = PHI <_24(4), 0(12)>
  _10 = _1 == 92;
  _13 = _10 | prephitmp_25;
  if (_13 != 0)
    goto ; [8.03%]
  else
    goto ; [91.97%]

   [local count: 83800315]:
  # s_19 = PHI 
  return s_19;

because the return isn't a jump out of the loop.
I'm not sure how arm avoids that problem.

[Bug tree-optimization/116126] vectorize libcpp search_line_fast

2025-01-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

Tamar Christina  changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org

--- Comment #8 from Tamar Christina  ---
This seems to now vectorize on 32-bit platforms.

It's failing analysis on 64-bit ones.

[Bug tree-optimization/116126] vectorize libcpp search_line_fast

2025-01-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

--- Comment #7 from GCC Commits  ---
The master branch has been updated by Tamar Christina :

https://gcc.gnu.org/g:086031c058598512d09bf898e4db3735b3e1f22c

commit r15-6811-g086031c058598512d09bf898e4db3735b3e1f22c
Author: Alex Coplan 
Date:   Mon Jun 24 13:54:48 2024 +0100

vect: Also cost gconds for scalar [PR118211]

Currently we only cost gconds for the vector loop while we omit costing
them when analyzing the scalar loop; this unfairly penalizes the vector
loop in the case of loops with early exits.

This (together with the previous patches) enables us to vectorize
std::find with 64-bit element sizes.
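
As a rough C analogue of the kind of loop meant here (a sketch only; the
name find_u64 is hypothetical and this is not the libstdc++ implementation),
consider a linear search over 64-bit elements.  Each iteration contains
conditional branches, which are gconds in GIMPLE, and these are the
statements whose scalar cost was previously left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Linear search with an early exit, over 64-bit elements.  */
const uint64_t *
find_u64 (const uint64_t *first, const uint64_t *last, uint64_t value)
{
  for (; first != last; ++first)
    if (*first == value)   /* early-exit condition */
      break;
  return first;
}
```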

gcc/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop.cc (vect_compute_single_scalar_iteration_cost):
Don't skip over gconds.

[Bug tree-optimization/116126] vectorize libcpp search_line_fast

2025-01-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

--- Comment #5 from GCC Commits  ---
The master branch has been updated by Tamar Christina :

https://gcc.gnu.org/g:f1c6789ab6c5443ccefab96c74b0e862119d1781

commit r15-6809-gf1c6789ab6c5443ccefab96c74b0e862119d1781
Author: Tamar Christina 
Date:   Mon Jul 8 12:16:11 2024 +0100

vect: Fix dominators when adding a guard to skip the vector loop [PR118211]

The alignment peeling changes exposed a latent missing dominator update
with early break vectorization, specifically when inserting the vector
skip edge, since the new edge bypasses the prolog skip block and thus
has the potential to subvert its dominance.  This patch fixes that.

gcc/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop-manip.cc (vect_do_peeling): Update immediate
dominators of nodes that were dominated by the prolog skip block
after inserting vector skip edge.  Initialize prolog variable to
NULL to avoid bogus -Wmaybe-uninitialized during bootstrap.

gcc/testsuite/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* g++.dg/vect/vect-early-break_6.cc: New test.

Co-Authored-By: Alex Coplan 

[Bug tree-optimization/116126] vectorize libcpp search_line_fast

2025-01-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

--- Comment #6 from GCC Commits  ---
The master branch has been updated by Tamar Christina :

https://gcc.gnu.org/g:f4e259b4a66c81c234608056117836e13606e4c8

commit r15-6810-gf4e259b4a66c81c234608056117836e13606e4c8
Author: Alex Coplan 
Date:   Thu Jul 25 16:34:05 2024 +

vect: Ensure we add vector skip guard even when versioning for aliasing
[PR118211]

This fixes a latent wrong code issue whereby vect_do_peeling determined
the wrong condition for inserting the vector skip guard.  Specifically
in the case where the loop niters are unknown at compile time we used to
check:

  !LOOP_REQUIRES_VERSIONING (loop_vinfo)

but LOOP_REQUIRES_VERSIONING is true for loops which we have versioned
for aliasing, and that has nothing to do with prolog peeling.  I think
this condition should instead be checking specifically if we aren't
versioning for alignment.

As it stands, when we version for alignment, we don't peel, so the
vector skip guard is indeed redundant in that case.

With the testcase added (reduced from the Fortran frontend) we would
version for aliasing, omit the vector skip guard, and then at runtime we
would peel sufficient iterations for alignment that there wasn't a full
vector iteration left when we entered the vector body, thus overflowing
the output buffer.
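
The role of the vector skip guard can be modelled in plain C as below.
This is a hand-written sketch and not what the vectorizer emits: VF, the
helper names, and the inner scalar loop standing in for a vector store are
all assumptions made for illustration.  The vector body is bottom-tested,
so once entered it always performs a full VF-wide store; the guard is what
stops that from happening when the prolog has left fewer than VF elements.

```c
#include <stddef.h>
#include <stdint.h>

enum { VF = 4 };   /* hypothetical vectorization factor, in elements */

/* Scalar iterations needed before addr reaches a VF-element boundary.
   Assumes addr is a multiple of elem_size.  */
size_t
prolog_iters (uintptr_t addr, size_t elem_size, size_t vf)
{
  size_t misalign = (addr / elem_size) % vf;
  return (vf - misalign) % vf;
}

void
copy_with_guard (int *dst, const int *src, size_t n)
{
  size_t i = 0;
  size_t peel = prolog_iters ((uintptr_t) dst, sizeof (int), VF);
  if (peel > n)
    peel = n;

  for (; i < peel; i++)          /* prolog: peel for alignment */
    dst[i] = src[i];

  if (n - i >= VF)               /* vector skip guard */
    do
      {
        for (size_t k = 0; k < VF; k++)   /* stands in for one vector store */
          dst[i + k] = src[i + k];
        i += VF;
      }
    while (n - i >= VF);

  for (; i < n; i++)             /* scalar epilogue */
    dst[i] = src[i];
}
```

Dropping the "if (n - i >= VF)" test in this model would let the do-while
body write VF elements even when fewer remain, which corresponds to the
overflow described above.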

gcc/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop-manip.cc (vect_do_peeling): Adjust skip_vector
condition to only omit the edge if we're versioning for
alignment.

gcc/testsuite/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* gcc.dg/vect/vect-early-break_130.c: New test.

[Bug tree-optimization/116126] vectorize libcpp search_line_fast

2025-01-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

--- Comment #4 from GCC Commits  ---
The master branch has been updated by Tamar Christina :

https://gcc.gnu.org/g:0a46245174123ad2802753e7fee689a541570ca0

commit r15-6808-g0a46245174123ad2802753e7fee689a541570ca0
Author: Alex Coplan 
Date:   Fri Jun 7 11:13:02 2024 +

vect: Don't guard scalar epilogue for inverted loops [PR118211]

For loops with LOOP_VINFO_EARLY_BREAKS_VECT_PEELED we should always
enter the scalar epilogue, so avoid emitting a guard on entry to the
epilogue.

gcc/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop-manip.cc (vect_do_peeling): Avoid emitting an
epilogue guard for inverted early-exit loops.

[Bug tree-optimization/116126] vectorize libcpp search_line_fast

2025-01-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

--- Comment #3 from GCC Commits  ---
The master branch has been updated by Tamar Christina :

https://gcc.gnu.org/g:68326d5d1a593dc0bf098c03aac25916168bc5a9

commit r15-6807-g68326d5d1a593dc0bf098c03aac25916168bc5a9
Author: Alex Coplan 
Date:   Mon Mar 11 13:09:10 2024 +

vect: Force alignment peeling to vectorize more early break loops
[PR118211]

This allows us to vectorize more loops with early exits by forcing
peeling for alignment to make sure that we're guaranteed to be able to
safely read an entire vector iteration without crossing a page boundary.
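
The page-boundary argument can be checked with a little address arithmetic.
The sketch below assumes 4096-byte pages and 16-byte vectors purely for
illustration, and the helper names are hypothetical.  It relies only on the
page size being a multiple of the vector size: a vector-sized read that
starts on a vector-size boundary then always stays within one page.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if the nbytes starting at addr span two pages.  */
int
crosses_page (uintptr_t addr, size_t nbytes, size_t page_size)
{
  return (addr / page_size) != ((addr + nbytes - 1) / page_size);
}

/* Number of scalar iterations to peel so that addr becomes aligned to
   vec_bytes.  Assumes addr is a multiple of elem_bytes.  */
size_t
peel_for_alignment (uintptr_t addr, size_t elem_bytes, size_t vec_bytes)
{
  size_t misalign = addr % vec_bytes;
  return ((vec_bytes - misalign) % vec_bytes) / elem_bytes;
}
```

For example, a 16-byte read at address 4088 would touch the next page, but
peeling 8 byte-sized iterations moves the first vector read to address 4096,
which is aligned and therefore cannot cross a page.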

To make this work for VLA architectures we have to allow compile-time
non-constant target alignments.  We also have to override the result of
the target's preferred_vector_alignment hook if it isn't a power-of-two
multiple of the TYPE_SIZE of the chosen vector type.

gcc/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
Set need_peeling_for_alignment flag on read DRs instead of
failing vectorization.  Punt on gathers.
(dr_misalignment): Handle non-constant target alignments.
(vect_compute_data_ref_alignment): If need_peeling_for_alignment
flag is set on the DR, then override the target alignment chosen
by the preferred_vector_alignment hook to choose a safe
alignment.
(vect_supportable_dr_alignment): Override
support_vector_misalignment hook if need_peeling_for_alignment
is set on the DR: in this case we must return
dr_unaligned_unsupported in order to force peeling.
* tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
peeling by a compile-time non-constant amount.
* tree-vectorizer.h (dr_vec_info): Add new flag
need_peeling_for_alignment.

gcc/testsuite/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* gcc.dg/tree-ssa/cunroll-13.c: Don't vectorize.
* gcc.dg/tree-ssa/cunroll-14.c: Likewise.
* gcc.dg/unroll-6.c: Likewise.
* gcc.dg/tree-ssa/gen-vect-28.c: Likewise.
* gcc.dg/vect/vect-104.c: Expect to vectorize.
* gcc.dg/vect/vect-early-break_108-pr113588.c: Likewise.
* gcc.dg/vect/vect-early-break_109-pr113588.c: Likewise.
* gcc.dg/vect/vect-early-break_110-pr113467.c: Likewise.
* gcc.dg/vect/vect-early-break_3.c: Likewise.
* gcc.dg/vect/vect-early-break_65.c: Likewise.
* gcc.dg/vect/vect-early-break_8.c: Likewise.
* gfortran.dg/vect/vect-5.f90: Likewise.
* gfortran.dg/vect/vect-8.f90: Likewise.
* gcc.dg/vect/vect-switch-search-line-fast.c:

Co-Authored-By: Tamar Christina 

[Bug tree-optimization/116126] vectorize libcpp search_line_fast

2024-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126
Bug 116126 depends on bug 115484, which changed state.

Bug 115484 Summary: [13/14/15 regression] if-to-switch prevents AVX 
vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115484

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/116126] vectorize libcpp search_line_fast

2024-07-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Just note that on the libcpp side we ensure padding of the cpp buffers, which
is something the vectorizer itself can't ensure.

[Bug tree-optimization/116126] vectorize libcpp search_line_fast

2024-07-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-07-30
Version|unknown |15.0
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Confirmed.