[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 --- Comment #28 from rguenther at suse dot de --- On Thu, 16 Jan 2025, segher at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 > > --- Comment #27 from Segher Boessenkool --- > > This is a GIMPLE pass which has no idea what the backend will expand > > __builtin_darn() to. > > So you are saying >90% of builtins now need to say they are pure and const > (which > makes totally no sense for things not touching memory)? It never was like > this, > only things that do touch memory at all ever looked at those attributes :-( All builtins you expect GCC should CSE need to be marked pure or const (and that was always the case). I think DARN shouldn't be CSEd though.
[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 --- Comment #27 from Segher Boessenkool --- > This is a GIMPLE pass which has no idea what the backend will expand > __builtin_darn() to. So you are saying >90% of builtins now need to say they are pure and const (which makes totally no sense for things not touching memory)? It never was like this, only things that do touch memory at all ever looked at those attributes :-(
[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 --- Comment #26 from Richard Biener --- (In reply to Segher Boessenkool from comment #25) > No, darn does have a side effect: it returns a random number in the > destination reg (_deliver_ _a_ _r_andom _n_umber). It does not touch memory > at all. > > There are no call insns at all either, of course, so how is that code you > show > relevant at all? This is a GIMPLE pass which has no idea what the backend will expand __builtin_darn() to. And yes, on the GIMPLE side builtin calls are calls. And builtins like __builtin_strftime () do expand to calls. As said already - if the rev in question caused tons of testsuite fails please open a _separate_ bugreport listing those so that we can properly track such a regression. _This_ bug is about the excessive unrolling and thus code size regression involving stmts with side-effects (on AVR).
[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 --- Comment #25 from Segher Boessenkool --- No, darn does have a side effect: it returns a random number in the destination reg (_deliver_ _a_ _r_andom _n_umber). It does not touch memory at all. There are no call insns at all either, of course, so how is that code you show relevant at all? The most usual side effect is a store to memory, and that is way way way scarier than anything here. Treating things with a side effect as something that should hinder unrolling in any way is not very sensible.
[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825
--- Comment #24 from Segher Boessenkool ---
> > Okay, two insns, there's an add insn as well. But *not* unrolling this
> > likely
> > makes bigger code already!
>
> This is because
>
> /* If there is call on a hot path through the loop, then
> there is most probably not much to optimize. */
> else if (size.num_non_pure_calls_on_hot_path)
> {
> if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, "Not unrolling loop %d: "
> "contains call and code would grow.\n",
> loop->num);
> return false;
> /* If there is pure/const call in the function, then we can
> still optimize the unrolled loop body if it contains some
> other interesting code than the calls and code storing or
> cumulating the return value. */
> else if (size.num_pure_calls_on_hot_path
>/* One IV increment, one test, one ivtmp store and
> one useful stmt. That is about minimal loop
> doing pure call. */
>&& (size.non_call_stmts_on_hot_path
><= 3 + size.num_pure_calls_on_hot_path))
> {
> if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, "Not unrolling loop %d: "
> "contains just pure calls and code would
> grow.\n",
> loop->num);
> return false;
> }
There are no calls at all here. Or, do builtins count as calls? That
makes no sense. Most builtins end up as simple single insns.
> this _only_ gets pre-empted if the loop appears to shrink (from the
> removal of 1/3rd of stmts or other heuristics). Not estimating the
> calls to go away now makes the loop not shrinking and hit the call
> exception (otherwise we allow some growth).
Completely unrolling this loop gets rid of all the loop setup etc. insns.
That is three insns setup, and one to do the actual looping, statically; and
the loop insn (each iteration) dynamically. Completely unrolling by 4x always
is a win.
> Not sure if DARN is properly marked pure (aka not writing to memory),
> probably not, since otherwise it would not have been considered
> having side-effects.
It does not touch memory at all. "pure" and "const" do not mean anything
for insns that do not touch memory.
> So - fix your builtins!
How, what, where?
> Not sure what the point about this
> testcase is - a sum of 4 random numbers should be replaceable
> with one random number & ~0x3. But sure, darn() probably
> cannot be pure (otherwise we'd CSE multiple calls).
The point of the testcase is ensuring that for every instance of the builtin
a darn insn is generated, instead of doing only one and multiplying by four,
as before (it was an unspec instead of an unspecv).
> So - bad testcase. Do you have another one to show?
Lots and lots and lots. But if you cannot agree that what we did before was
good and what we do now is bad, i.e. this is a regression, for *this* testcase,
I don't know how to ever convince you of anything :-(
Segher
[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 --- Comment #23 from rguenther at suse dot de --- On Wed, 15 Jan 2025, hubicka at ucw dot cz wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 > > --- Comment #22 from Jan Hubicka --- > > /* If there is pure/const call in the function, then we can > > still optimize the unrolled loop body if it contains some > > other interesting code than the calls and code storing or > > cumulating the return value. */ > > Note that these days we could optimize around non-pure/const functions > if modref summary is informative enough. Common case is when function > return value by writting to pointer passed as argument. > I am however not sure how to predict this - i.e. where to draw line > between useful and useless modref summary. note this case involves a (target) builtin, we could try to special-case some (the target could?), we have is_inexpensive_builtin which unconditionally says true for all target builtins ... But I think the motivating testcase we have is bad, so I'd like to see a better one. Also this doesn't feel like appropriate time to change such things - I'm not going to backport the side-effect change either.
[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 --- Comment #22 from Jan Hubicka --- > /* If there is pure/const call in the function, then we can > still optimize the unrolled loop body if it contains some > other interesting code than the calls and code storing or > cumulating the return value. */ Note that these days we could optimize around non-pure/const functions if modref summary is informative enough. Common case is when function return value by writting to pointer passed as argument. I am however not sure how to predict this - i.e. where to draw line between useful and useless modref summary. Honza
[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825
--- Comment #21 from rguenther at suse dot de ---
On Mon, 13 Jan 2025, segher at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825
>
> --- Comment #20 from Segher Boessenkool ---
> gcc.target/powerpc/darn-3.c
>
> I'll quote the whole thing:
>
> ===
> static int darn32(void) { return __builtin_darn_32(); }
>
> int four(void)
> {
> int sum = 0;
> int i;
> for (i = 0; i < 4; i++)
> sum += darn32();
> return sum;
> }
> ===
>
> Okay, two insns, there's an add insn as well. But *not* unrolling this likely
> makes bigger code already!
This is because
/* If there is call on a hot path through the loop, then
there is most probably not much to optimize. */
else if (size.num_non_pure_calls_on_hot_path)
{
if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "Not unrolling loop %d: "
"contains call and code would grow.\n",
loop->num);
return false;
/* If there is pure/const call in the function, then we can
still optimize the unrolled loop body if it contains some
other interesting code than the calls and code storing or
cumulating the return value. */
else if (size.num_pure_calls_on_hot_path
/* One IV increment, one test, one ivtmp store and
one useful stmt. That is about minimal loop
doing pure call. */
&& (size.non_call_stmts_on_hot_path
<= 3 + size.num_pure_calls_on_hot_path))
{
if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "Not unrolling loop %d: "
"contains just pure calls and code would
grow.\n",
loop->num);
return false;
}
this _only_ gets pre-empted if the loop appears to shrink (from the
removal of 1/3rd of stmts or other heuristics). Not estimating the
calls to go away now makes the loop not shrinking and hit the call
exception (otherwise we allow some growth).
Not sure if DARN is properly marked pure (aka not writing to memory),
probably not, since otherwise it would not have been considered
having side-effects.
So - fix your builtins! Not sure what the point about this
testcase is - a sum of 4 random numbers should be replaceable
with one random number & ~0x3. But sure, darn() probably
cannot be pure (otherwise we'd CSE multiple calls).
So - bad testcase. Do you have another one to show?
[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825
--- Comment #20 from Segher Boessenkool ---
gcc.target/powerpc/darn-3.c
I'll quote the whole thing:
===
static int darn32(void) { return __builtin_darn_32(); }
int four(void)
{
int sum = 0;
int i;
for (i = 0; i < 4; i++)
sum += darn32();
return sum;
}
===
Okay, two insns, there's an add insn as well. But *not* unrolling this likely
makes bigger code already!
[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 --- Comment #19 from Georg-Johann Lay --- (In reply to Segher Boessenkool from comment #18) > A single insn (always 4 bytes) is too much?!? Maybe a small test case might help to show the issue.
[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 --- Comment #18 from Segher Boessenkool --- So, you are saying the change made us realise some insns can never be optimised away? It should be obvious in much more fundamental ways that these insns can never be optimised away (simply because they are always executed), but heh. > If you now no longer get "trivial" stuff unrolled then you run into > the size limit. A single insn (always 4 bytes) is too much?!? There is no way to make non-empty code smaller, are you saying we will never get *anything* unrolled now?!?
[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 --- Comment #17 from rguenther at suse dot de --- On Sat, 11 Jan 2025, segher at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 > > --- Comment #16 from Segher Boessenkool --- > Trivial stuff is no longer unrolled at all after this change. It does not > matter > *at all* whether an insn has a side effect or not, for whether code with such > an > insn should be unrolled or not. Any heuristic like that is totally broken, > totally > nonsensical. There's limits on how much code growth we allow and we estimate the final code size, reducing that by 1/3 "optimistically" (followup transforms might optimize), the fix was to make that reduction not apply to statements that obviously never are valid to optimize. If you now no longer get "trivial" stuff unrolled then you run into the size limit. The previous code, assessing that 1/3rd of all insns with side-effects are going to be optimized away indeed made no sense.
[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 --- Comment #16 from Segher Boessenkool --- Trivial stuff is no longer unrolled at all after this change. It does not matter *at all* whether an insn has a side effect or not, for whether code with such an insn should be unrolled or not. Any heuristic like that is totally broken, totally nonsensical. In all failing cases I saw none of the code will be eliminated at all ever.
[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 --- Comment #15 from Richard Biener --- (In reply to Segher Boessenkool from comment #14) > Many Power testcases fail since this commit. The commit says it wants to > change > what volatile accesses do, but instead, it changes behaviour for anything > marked > volatil. That is a lot of asm, and a whole lot of builtins; the flag simply > means "there is an unspecified side effect". > > Please repair this? It's intended - stmts with side effects are not going to be eliminated, the heuristic was assuming they'd be, to 1/3, but that's obviously not true. You have to fix the testcases or specifically argue why the heuristic is now bad. This should preferably happen in new missed-optimization or testsuite-fail PRs.
[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 Segher Boessenkool changed: What|Removed |Added CC||segher at gcc dot gnu.org --- Comment #14 from Segher Boessenkool --- Many Power testcases fail since this commit. The commit says it wants to change what volatile accesses do, but instead, it changes behaviour for anything marked volatil. That is a lot of asm, and a whole lot of builtins; the flag simply means "there is an unspecified side effect". Please repair this?
[Bug tree-optimization/115825] [12/13/14 Regression] Loop unrolling increases code size with -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115825 Richard Biener changed: What|Removed |Added Known to work||15.0 Summary|[12/13/14/15 Regression]|[12/13/14 Regression] Loop |Loop unrolling increases|unrolling increases code |code size with -Os |size with -Os --- Comment #13 from Richard Biener --- Fixed on trunk, I'm unlikely going to backport this though as with all heuristics this is going to cause some regressions (none important though, hopefully). There's a workaround by adding #pragma GCC unroll 0 to loops where this hurts most. Going to keep it open for tracking/duping anyway.
