--
steven at gcc dot gnu dot org changed:
           What           |Removed      |Added
           Status         |UNCONFIRMED  |NEW
           Ever Confirmed |0            |1
Last
--- Comment #23 from changpeng dot fang at amd dot com 2010-07-21 21:30
---
Fixed
--
changpeng dot fang at amd dot com changed:
What|Removed |Added
--- Comment #21 from changpeng dot fang at amd dot com 2010-06-08 16:23
---
Just for the record, non-constant step prefetching improves 459.GemsFDTD
by 5.5% (under -O3 + prefetch) on amd-linux64 systems. The gains come from
the following set of loops:
NFT.fppized.f90:1268
--- Comment #22 from borntraeger at de dot ibm dot com 2010-06-08 19:42
---
I bootstrapped with patches 0002 and 0003.
The results are also good.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44297
--- Comment #14 from changpeng dot fang at amd dot com 2010-06-07 18:27
---
Here is the current status of my investigation:
(1) 465.tonto regression (~9%):
The regressions mainly come from loops which have array references with both
constant (prefetch_mod = 8) and non-constant
--- Comment #15 from changpeng dot fang at amd dot com 2010-06-07 18:30
---
Created an attachment (id=20860)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20860&action=view)
Don't consider effect of unrolling in the computation of insn-to-prefetch ratio
--
--- Comment #16 from changpeng dot fang at amd dot com 2010-06-07 18:32
---
Created an attachment (id=20861)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20861&action=view)
Limit non-constant step prefetching only to the innermost loops
--
--- Comment #17 from changpeng dot fang at amd dot com 2010-06-07 18:37
---
(In reply to comment #15)
> Created an attachment (id=20860)
> -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20860&action=view) [edit]
> Don't consider effect of unrolling in the computation of insn-to-prefetch
--- Comment #18 from rakdver at kam dot mff dot cuni dot cz 2010-06-07 20:24 ---
Subject: Re: Big spec cpu2006 prefetch regressions on gcc 4.6 on x86

> --- Comment #14 from changpeng dot fang at amd dot com 2010-06-07 18:27 ---
> Here is the current status of my
--- Comment #19 from changpeng dot fang at amd dot com 2010-06-07 22:30
---
Created an attachment (id=20862)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20862&action=view)
Account for prefetch_mod and unroll_factor in the computation of the prefetch
count
Ooops. Attached a wrong
--- Comment #20 from borntraeger at de dot ibm dot com 2010-06-08 05:51
---
both patches look sane. I will test both.
thank you for your work.
--
--- Comment #11 from changpeng dot fang at amd dot com 2010-06-01 17:40
---
(In reply to comment #10)
> Created an attachment (id=20783)
> -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20783&action=view) [edit]
> experimental patch to have separate values for
--- Comment #12 from borntraeger at de dot ibm dot com 2010-06-01 19:30
---
Ok. So I will let you continue to look into that and wait for your results?
Do you have any feedback on separate.patch and its influence on performance?
--
--- Comment #13 from changpeng dot fang at amd dot com 2010-06-01 19:59
---
(In reply to comment #12)
> Ok. So I will let you continue to look into that and wait for your results?
> Do you have any feedback on separate.patch and its influence on performance?
+ for (; groups;
--- Comment #10 from borntraeger at de dot ibm dot com 2010-05-31 08:58
---
Created an attachment (id=20783)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20783&action=view)
experimental patch to have separate values for min_insn_to_prefetch_ratio
Changpeng,
thank you for the
--- Comment #4 from borntraeger at de dot ibm dot com 2010-05-28 07:24
---
Created an attachment (id=20767)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20767&action=view)
Patch that makes loop invariant prefetches backend specific
Three observations:
1. the patch had a bug which
--- Comment #5 from borntraeger at de dot ibm dot com 2010-05-28 07:41
---
An alternative approach might be to have different values for
prefetch-min-insn-to-mem-ratio and min-insn-to-prefetch-ratio
depending on constant/non-constant step size.
--
--- Comment #6 from changpeng dot fang at amd dot com 2010-05-28 16:46
---
(In reply to comment #4)
> Created an attachment (id=20767)
> -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20767&action=view) [edit]
> Patch that makes loop invariant prefetches backend specific
Actually, I
--- Comment #7 from changpeng dot fang at amd dot com 2010-05-28 16:56
---
(In reply to comment #5)
> An alternative approach might be to have different values for
> prefetch-min-insn-to-mem-ratio and min-insn-to-prefetch-ratio
> depending on constant/non-constant step size.
It may be a
--- Comment #8 from changpeng dot fang at amd dot com 2010-05-28 18:30
---
(In reply to comment #4)
> Created an attachment (id=20767)
> -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20767&action=view) [edit]
> Patch that makes loop invariant prefetches backend specific
Three
--- Comment #9 from changpeng dot fang at amd dot com 2010-05-28 18:36
---
(In reply to comment #8)
Looks like this is a fix for the regressions. That is, the regressions are
actually caused by the wrong calculation. This bug could be considered fixed,
even though performance tuning
--- Comment #1 from changpeng dot fang at amd dot com 2010-05-27 20:49
---
The regressions are most likely from the patch that added non-constant step
prefetching:
* From: Andreas Krebbel krebbel at linux dot vnet dot ibm dot com
* To: Christian Borntraeger borntraeger at de
--- Comment #2 from changpeng dot fang at amd dot com 2010-05-27 20:55
---
To me, non-constant step prefetching does not seem to fit into the existing
prefetching framework: a non-constant stride prevents any reuse analysis, and
thus prefetching is done somewhat blindly.
--
--- Comment #3 from changpeng dot fang at amd dot com 2010-05-27 23:51
---
I took a quick look at 434.zeusmp and found that prefetching for the following
simple loop is responsible:
linpck.f: 131:
c
c     code for increment not equal to 1
c
      ix = 1
      smax = abs(sx(1))