--- Comment #16 from steven at gcc dot gnu dot org 2010-02-05 13:33 ---
I'm trying to coerce IVOPTS into producing the following, optimal code in the
GIMPLE optimizers (without much luck, so far):
bb 2:
pretmp.11_26 = (int) s_11(D);
ivtmp.20_28 = (long unsigned int) b_inout_5(D);
--- Comment #17 from rakdver at kam dot mff dot cuni dot cz 2010-02-05 13:58 ---
Subject: Re: Inefficient loop unrolling
But if I adjust the cost to make
index cost only 1 or 2, I get this:
bb 2:
pretmp.11_26 = (int) s_11(D);
ivtmp.25_28 = (long unsigned int)
--- Comment #18 from steven at gcc dot gnu dot org 2010-02-05 14:02 ---
I used -O2 -std=c99 -mcpu=arm9 -funroll-loops and I manually hacked the cost
in GDB to change from:
Address costs:
index costs 6
cst + index costs 2
...to this...:
Address costs:
index costs 1
cst + index
--- Comment #9 from rearnsha at gcc dot gnu dot org 2010-02-04 11:11 ---
(In reply to comment #8)
ldr r2, [r1, #0]
mul r3, r2, r0
str r3, [r1], #4
ldr r2, [r1, #0]
mul r3, r2, r0
str r3, [r1], #4
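
For readers without the original test case at hand: the repeated
ldr / mul / str-with-post-increment pattern above is what one would expect from
an in-place scaling loop after unrolling. The following is only a minimal
sketch under that assumption. The function name scale_in_place, the parameter
n, and the parameter types are hypothetical; b_inout and s are borrowed from
the GIMPLE names in the dumps, and the real test case attached to the bug may
differ.

```c
/* Hypothetical reconstruction, not the test case from the bug report.
 * Each iteration loads one element, multiplies it by s, and stores it
 * back, which maps onto the ldr / mul / str [r1], #4 triple. */
void scale_in_place(int *b_inout, int n, int s)
{
    for (int i = 0; i < n; i++) {
        b_inout[i] *= s;   /* load, multiply, store, then advance by 4 bytes */
    }
}
```

With a loop of this shape, a single pointer induction variable that is
post-incremented on the store is enough, which is why extra index or address
computations in the unrolled body count as inefficiency.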
--- Comment #10 from steven at gcc dot gnu dot org 2010-02-04 11:21 ---
I'm going to crack this bug.
--
steven at gcc dot gnu dot org changed:
What|Removed |Added
--- Comment #11 from rguenth at gcc dot gnu dot org 2010-02-04 11:47 ---
Also try the patches from PR42617 to see if they improve the pre-regalloc
scheduling.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36712
--- Comment #12 from steven at gcc dot gnu dot org 2010-02-04 14:54 ---
With the patches from bug 42617 applied, I get the following:
.file tst.c
.text
.align 2
.global Unroll
.type Unroll, %function
Unroll:
@ args = 0, pretend = 0,
--- Comment #13 from steven at gcc dot gnu dot org 2010-02-04 14:56 ---
With -fno-web, the patches from bug 42617 do not help and the output is the
same as that of comment #8 (second asm dump).
--- Comment #14 from steven at gcc dot gnu dot org 2010-02-04 15:19 ---
Part of the problem comes from the way IVOPTS optimizes the memory access:
;; Generating RTL for gimple basic block 3
;; D.1814_10 = MEM[base: D.1846_29];
(insn 52 51 0 tst.c:6 (set (reg:SI 172 [ D.1814 ])
--- Comment #15 from steven at gcc dot gnu dot org 2010-02-04 16:06 ---
The patches for bug 31849 have been committed, it seems, and they don't help
for this case.
--- Comment #8 from froydnj at gcc dot gnu dot org 2010-01-25 21:10 ---
First, something has gotten better; an arm-eabi gcc (-O2 -std=c99 -mcpu=arm9
-funroll-loops) from 20091209 gives:
Unroll:
@ Function supports interworking.
@ args = 0, pretend = 0, frame = 0
--- Comment #7 from drow at gcc dot gnu dot org 2009-10-15 12:12 ---
I really would like to see this submitted - at least as a starting point for
discussion. You don't need to do anything different than for a small patch; if
you've missed any steps, a reviewer will tell you.
Another
--- Comment #6 from bmei at broadcom dot com 2009-05-21 08:38 ---
I have only submitted small patches before. Adding a pass (which may need a new
command-line option and disabling the old RTL-level unrolling) seems like a big
undertaking to me. I don't know what the procedure is.
My code also contains my own
--- Comment #1 from ramana at gcc dot gnu dot org 2009-05-20 13:19 ---
Can be reproduced with trunk today.
--
ramana at gcc dot gnu dot org changed:
What|Removed |Added
--- Comment #2 from rguenth at gcc dot gnu dot org 2009-05-20 14:09 ---
I think there is no induction variable optimization on RTL anymore.
--- Comment #3 from ramana at gcc dot gnu dot org 2009-05-20 14:14 ---
There was a discussion thread here:
http://gcc.gnu.org/ml/gcc/2008-07/msg00037.html
One of the solutions that Bingfeng was investigating was loop unrolling before
ivopts, which is useful in certain cases.
--- Comment #4 from bmei at broadcom dot com 2009-05-20 14:17 ---
I implemented a tree-level loop-unrolling pass in our private port, which
takes advantage of later tree ivopt pass. It produces much better code than
rtl-level loop unrolling in such scenarios. Not sure whether
--- Comment #5 from dje dot gcc at gmail dot com 2009-05-20 17:51 ---
Subject: Re: Inefficient loop unrolling
I implemented a tree-level loop-unrolling pass in our private port, which
takes advantage of later tree ivopt pass. It produces much better code than
rtl-level loop