[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2025-01-16 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

--- Comment #28 from rguenther at suse dot de  ---
On Thu, 16 Jan 2025, segher at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825
> 
> --- Comment #27 from Segher Boessenkool  ---
> > This is a GIMPLE pass which has no idea what the backend will expand
> > __builtin_darn() to.
> 
> So you are saying >90% of builtins now need to say they are pure and const
> (which
> makes totally no sense for things not touching memory)?  It never was like
> this,
> only things that do touch memory at all ever looked at those attributes :-(

All builtins you expect GCC should CSE need to be marked pure or const
(and that was always the case).  I think DARN shouldn't be CSEd though.

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2025-01-16 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

--- Comment #27 from Segher Boessenkool  ---
> This is a GIMPLE pass which has no idea what the backend will expand
> __builtin_darn() to.

So you are saying >90% of builtins now need to say they are pure and const
(which
makes totally no sense for things not touching memory)?  It never was like
this,
only things that do touch memory at all ever looked at those attributes :-(

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2025-01-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

--- Comment #26 from Richard Biener  ---
(In reply to Segher Boessenkool from comment #25)
> No, darn does have a side effect: it returns a random number in the
> destination reg (_deliver_ _a_ _r_andom _n_umber).  It does not touch memory
> at all.
> 
> There are no call insns at all either, of course, so how is that code you
> show
> relevant at all?

This is a GIMPLE pass which has no idea what the backend will expand
__builtin_darn() to.  And yes, on the GIMPLE side builtin calls are calls.
And builtins like __builtin_strftime () do expand to calls.

As said already - if the rev in question caused tons of testsuite fails
please open a _separate_ bugreport listing those so that we can properly
track such a regression.

_This_ bug is about the excessive unrolling and thus code size regression
involving stmts with side-effects (on AVR).

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2025-01-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

--- Comment #25 from Segher Boessenkool  ---
No, darn does have a side effect: it returns a random number in the destination
reg (_deliver_ _a_ _r_andom _n_umber).  It does not touch memory at all.

There are no call insns at all either, of course, so how is that code you show
relevant at all?

The most usual side effect is a store to memory, and that is way way way
scarier
than anything here.  Treating things with a side effect as something that
should
hinder unrolling in any way is not very sensible.

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2025-01-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

--- Comment #24 from Segher Boessenkool  ---
> > Okay, two insns, there's an add insn as well.  But *not* unrolling this 
> > likely
> > makes bigger code already!
> 
> This is because
> 
>   /* If there is call on a hot path through the loop, then
>  there is most probably not much to optimize.  */ 
>   else if (size.num_non_pure_calls_on_hot_path) 
> {
>   if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, "Not unrolling loop %d: "
>  "contains call and code would grow.\n",
>  loop->num);
>   return false;
>   /* If there is pure/const call in the function, then we can
>  still optimize the unrolled loop body if it contains some
>  other interesting code than the calls and code storing or
>  cumulating the return value.  */
>   else if (size.num_pure_calls_on_hot_path
>/* One IV increment, one test, one ivtmp store and
>   one useful stmt.  That is about minimal loop
>   doing pure call.  */
>&& (size.non_call_stmts_on_hot_path
><= 3 + size.num_pure_calls_on_hot_path))
> {
>   if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, "Not unrolling loop %d: " 
>  "contains just pure calls and code would 
> grow.\n",
>  loop->num);
>   return false;
> }

There are no calls at all here.  Or, do builtins count as calls?  That
makes no sense.  Most builtins end up as simple single insns.

> this _only_ gets pre-empted if the loop appears to shrink (from the
> removal of 1/3rd of stmts or other heuristics).  Not estimating the
> calls to go away now makes the loop not shrinking and hit the call
> exception (otherwise we allow some growth).

Completely unrolling this loop gets rid of all the loop setup etc. insns.
That is three insns setup, and one to do the actual looping, statically; and
the loop insn (each iteration) dynamically.  Completely unrolling by 4x always
is a win.

> Not sure if DARN is properly marked pure (aka not writing to memory),
> probably not, since otherwise it would not have been considered
> having side-effects.

It does not touch memory at all.  "pure" and "const" do not mean anything
for insns that do not touch memory.

> So - fix your builtins!

How, what, where?

> Not sure what the point about this
> testcase is - a sum of 4 random numbers should be replaceable
> with one random number & ~0x3.  But sure, darn() probably
> cannot be pure (otherwise we'd CSE multiple calls).

The point of the testcase is ensuring that for every instance of the builtin
a darn insn is generated, instead of doing only one and multiplying by four,
as before (it was an unspec instead of an unspecv).

> So - bad testcase.  Do you have another one to show?

Lots and lots and lots.  But if you cannot agree that what we did before was
good and what we do now is bad, i.e. this is a regression, for *this* testcase,
I don't know how to ever convince you of anything :-(


Segher

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2025-01-15 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

--- Comment #23 from rguenther at suse dot de  ---
On Wed, 15 Jan 2025, hubicka at ucw dot cz wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825
> 
> --- Comment #22 from Jan Hubicka  ---
> >   /* If there is pure/const call in the function, then we can
> >  still optimize the unrolled loop body if it contains some
> >  other interesting code than the calls and code storing or
> >  cumulating the return value.  */
> 
> Note that these days we could optimize around non-pure/const functions
> if modref summary is informative enough.  Common case is when function
> return value by writting to pointer passed as argument.
> I am however not sure how to predict this - i.e. where to draw line
> between useful and useless modref summary.

note this case involves a (target) builtin, we could try to special-case
some (the target could?), we have is_inexpensive_builtin which
unconditionally says true for all target builtins ...

But I think the motivating testcase we have is bad, so I'd like to
see a better one.  Also this doesn't feel like appropriate time to
change such things - I'm not going to backport the side-effect
change either.

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2025-01-15 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

--- Comment #22 from Jan Hubicka  ---
>   /* If there is pure/const call in the function, then we can
>  still optimize the unrolled loop body if it contains some
>  other interesting code than the calls and code storing or
>  cumulating the return value.  */

Note that these days we could optimize around non-pure/const functions
if modref summary is informative enough.  Common case is when function
return value by writting to pointer passed as argument.
I am however not sure how to predict this - i.e. where to draw line
between useful and useless modref summary.

Honza

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2025-01-13 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

--- Comment #21 from rguenther at suse dot de  ---
On Mon, 13 Jan 2025, segher at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825
> 
> --- Comment #20 from Segher Boessenkool  ---
> gcc.target/powerpc/darn-3.c
> 
> I'll quote the whole thing:
> 
> ===
> static int darn32(void) { return __builtin_darn_32(); }
> 
> int four(void)
> {
> int sum = 0;
> int i;
> for (i = 0; i < 4; i++)
> sum += darn32();
> return sum;
> }
> ===
> 
> Okay, two insns, there's an add insn as well.  But *not* unrolling this likely
> makes bigger code already!

This is because

  /* If there is call on a hot path through the loop, then
 there is most probably not much to optimize.  */ 
  else if (size.num_non_pure_calls_on_hot_path) 
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "Not unrolling loop %d: "
 "contains call and code would grow.\n",
 loop->num);
  return false;
  /* If there is pure/const call in the function, then we can
 still optimize the unrolled loop body if it contains some
 other interesting code than the calls and code storing or
 cumulating the return value.  */
  else if (size.num_pure_calls_on_hot_path
   /* One IV increment, one test, one ivtmp store and
  one useful stmt.  That is about minimal loop
  doing pure call.  */
   && (size.non_call_stmts_on_hot_path
   <= 3 + size.num_pure_calls_on_hot_path))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "Not unrolling loop %d: " 
 "contains just pure calls and code would 
grow.\n",
 loop->num);
  return false;
}

this _only_ gets pre-empted if the loop appears to shrink (from the
removal of 1/3rd of stmts or other heuristics).  Not estimating the
calls to go away now makes the loop not shrinking and hit the call
exception (otherwise we allow some growth).

Not sure if DARN is properly marked pure (aka not writing to memory),
probably not, since otherwise it would not have been considered
having side-effects.

So - fix your builtins!  Not sure what the point about this
testcase is - a sum of 4 random numbers should be replaceable
with one random number & ~0x3.  But sure, darn() probably
cannot be pure (otherwise we'd CSE multiple calls).

So - bad testcase.  Do you have another one to show?

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2025-01-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

--- Comment #20 from Segher Boessenkool  ---
gcc.target/powerpc/darn-3.c

I'll quote the whole thing:

===
static int darn32(void) { return __builtin_darn_32(); }

int four(void)
{
int sum = 0;
int i;
for (i = 0; i < 4; i++)
sum += darn32();
return sum;
}
===

Okay, two insns, there's an add insn as well.  But *not* unrolling this likely
makes bigger code already!

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2025-01-13 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

--- Comment #19 from Georg-Johann Lay  ---
(In reply to Segher Boessenkool from comment #18)
> A single insn (always 4 bytes) is too much?!?
Maybe a small test case might help to show the issue.

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2025-01-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

--- Comment #18 from Segher Boessenkool  ---
So, you are saying the change made us realise some insns can never be optimised
away?  It should be obvious in much  more fundamental ways that these insns can
never be optimised away (simply because they are always executed), but heh.

> If you now no longer get "trivial" stuff unrolled then you run into
> the size limit.

A single insn (always 4 bytes) is too much?!?  There is no way to make
non-empty
code smaller, are you saying we will never get *anything* unrolled now?!?

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2025-01-12 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

--- Comment #17 from rguenther at suse dot de  ---
On Sat, 11 Jan 2025, segher at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825
> 
> --- Comment #16 from Segher Boessenkool  ---
> Trivial stuff is no longer unrolled at all after this change.  It does not
> matter
> *at all* whether an insn has a side effect or not, for whether code with such
> an
> insn should be unrolled or not.  Any heuristic like that is totally broken,
> totally
> nonsensical.

There's limits on how much code growth we allow and we estimate the
final code size, reducing that by 1/3 "optimistically" (followup
transforms might optimize), the fix was to make that reduction not
apply to statements that obviously never are valid to optimize.

If you now no longer get "trivial" stuff unrolled then you run into
the size limit.

The previous code, assessing that 1/3rd of all insns with side-effects
are going to be optimized away indeed made no sense.

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2025-01-11 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

--- Comment #16 from Segher Boessenkool  ---
Trivial stuff is no longer unrolled at all after this change.  It does not
matter
*at all* whether an insn has a side effect or not, for whether code with such
an
insn should be unrolled or not.  Any heuristic like that is totally broken,
totally
nonsensical.

In all failing cases I saw none of the code will be eliminated at all ever.

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2025-01-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

--- Comment #15 from Richard Biener  ---
(In reply to Segher Boessenkool from comment #14)
> Many Power testcases fail since this commit.  The commit says it wants to
> change
> what volatile accesses do, but instead, it changes behaviour for anything
> marked
> volatil.  That is a lot of asm, and a whole lot of builtins; the flag simply
> means "there is an unspecified side effect".
> 
> Please repair this?

It's intended - stmts with side effects are not going to be eliminated, the
heuristic was assuming they'd be, to 1/3, but that's obviously not true.

You have to fix the testcases or specifically argue why the heuristic is
now bad.  This should preferably happen in new missed-optimization
or testsuite-fail PRs.

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2025-01-10 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #14 from Segher Boessenkool  ---
Many Power testcases fail since this commit.  The commit says it wants to
change
what volatile accesses do, but instead, it changes behaviour for anything
marked
volatil.  That is a lot of asm, and a whole lot of builtins; the flag simply
means "there is an unspecified side effect".

Please repair this?

[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os

2024-11-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825

Richard Biener  changed:

   What|Removed |Added

  Known to work||15.0
Summary|[12/13/14/15 Regression]|[12/13/14 Regression] Loop
   |Loop unrolling increases|unrolling increases code
   |code size with -Os  |size with -Os

--- Comment #13 from Richard Biener  ---
Fixed on trunk, I'm unlikely going to backport this though as with all
heuristics this is going to cause some regressions (none important though,
hopefully).  There's a workaround by adding #pragma GCC unroll 0 to loops
where this hurts most.

Going to keep it open for tracking/duping anyway.