Re: Loop unrolling + fusion ?

2009-03-27 Thread Claus Reinke
I can't see any issues with this version of the spec. Thanks. From the silence, we seemed to have lost the innocent bystanders? Anyway, for those who haven't noticed, there is now a feature request ticket (for that good feeling of closing it when this is finally implemented;-) as well as a

Re: Loop unrolling + fusion ?

2009-03-19 Thread Max Bolingbroke
2009/3/19 Claus Reinke claus.rei...@talk21.com: If the map, filter, fold, etc can be unrolled, then the unrolled definitions would be fused, right? So what is missing is fine control (how much to unroll this particular call to map here). The issues is that In stream fusion the combinators like

Re: Loop unrolling + fusion ?

2009-03-19 Thread Max Bolingbroke
2009/3/19 Claus Reinke claus.rei...@talk21.com: Recursion unfolding spec, 2nd attempt. If this is an improvement on the first version, and after correcting any obvious issues, I should put it on the ghc trac wiki somewhere, and create a feature request ticket. I can't see any issues

Re: Loop unrolling + fusion ?

2009-03-19 Thread Claus Reinke
Dear Simon*, thanks for answering my concerns about -fvia-C replacement. Are these answers somewhere in the ghc wiki, or perhaps they'd make a good basis for a useful ghc blog post? So, -fasm will soon be up to speed with -fvia-C in all cases, new native backends are not more difficult than

Re: Loop unrolling + fusion ?

2009-03-18 Thread Claus Reinke
Recursion unfolding spec, 2nd attempt. The main difference is to look at groups of mutually recursive definitions as a whole, rather than trying to think about individual definitions. That step actually seems sufficient to address most of the shortcomings raised so far, such as avoiding runaway

Re: Loop unrolling + fusion ?

2009-03-18 Thread Claus Reinke
{-# INLINE f PEEL n UNROLL m #-} The problem here is that this only works for directly recursive functions which I, for instance, don't normally use in high- performance code. Most of my loops are pipelines of collective combinators like map, filter, fold etc. because these are the ones

RE: Loop unrolling + fusion ?

2009-03-11 Thread Simon Peyton-Jones
Claus, Max | My preferred spec would be roughly | | {-# NOINLINE f #-} | as now | | {-# INLINE f #-} | works as now, which is for non-recursive f only (might in future | be taken as go-ahead for analysis-based recursion unfolding) | | {-# INLINE f PEEL n #-} | inline calls

Re[2]: Loop unrolling + fusion ?

2009-03-11 Thread Bulat Ziganshin
Hello Simon, Thursday, March 12, 2009, 1:29:56 AM, you wrote: For implementation, there are two routes. Either totally built-in, or using a Core-to-Core plug-in. The thing I like about the latter is that it can be done without having GHC HQ in the critical path, because we (I) tend to slow

RE: Loop unrolling + fusion ?

2009-03-10 Thread Simon Peyton-Jones
| What I don't understand yet is the routemap for replacing -fvia-C Good points, Claus. I think the story is as follows: * -fvia-C does not produce much better code, except in exceptionally tight loops, because GHC gives gcc very little scope for optimisation. Simon mentioned something like

Re: Loop unrolling + fusion ?

2009-03-09 Thread Max Bolingbroke
2009/3/9 Roman Leshchinskiy r...@cse.unsw.edu.au: The problem here is that this only works for directly recursive functions which I, for instance, don't normally use in high-performance code. Most of my loops are pipelines of collective combinators like map, filter, fold etc. because these are

Re: Loop unrolling + fusion ?

2009-03-09 Thread Claus Reinke
let f = ..f.. in f{n,m} -PEEL- let f = ..f.. in ..f{n-1,m}.. Probably what you intend here is that you create one copy of the definition every round rather than one per call site, is that right? I don't think so - ultimately, the point of both peeling and unrolling is to unfold a definition

Re: Loop unrolling + fusion ?

2009-03-09 Thread Claus Reinke
{-# INLINE f PEEL n #-} inline calls *into* recursive f (called loop peeling for loops) {-# INLINE f UNROLL m #-} inline recursive calls to f *inside* f (called loop unrolling for loops) {-# INLINE f PEEL n UNROLL m #-} combine the previous two The problem here is that this only works

Re: Loop unrolling + fusion ?

2009-03-09 Thread Simon Marlow
Claus Reinke wrote: That was one of my questions in the optimization and rewrite rules thread: shouldn't -fvia-C be supported (as a non-default option) for at least as long as the alternative isn't a clear win in all cases? The trouble with supporting multiple backends is that the cost in

Re: Loop unrolling + fusion ?

2009-03-09 Thread Max Bolingbroke
2009/3/9 Claus Reinke claus.rei...@talk21.com: let f = ..f.. in f{n,m} -PEEL- let f = ..f.. in ..f{n-1,m}.. Probably what you intend here is that you create one copy of the definition every round rather than one per call site, is that right? I don't think so - ultimately, the point of both

Re: Loop unrolling + fusion ?

2009-03-09 Thread Claus Reinke
let f = ..f.. in f{n,m} -PEEL- let f = ..f.. in ..f{n-1,m}.. Probably what you intend here is that you create one copy of the definition every round rather than one per call site, is that right? I don't think so - ultimately, the point of both peeling and unrolling is to unfold a definition

Re: Loop unrolling + fusion ?

2009-03-09 Thread Max Bolingbroke
2009/3/9 Claus Reinke claus.rei...@talk21.com: But if you annotate all your unrolled and peeled new definitions as NOINLINE, do you still get the optimizations you want? There are probably a few GHC optimizations that can look through non-recursive lets, but RULES are not among those. The

Re: Loop unrolling + fusion ?

2009-03-09 Thread Brandon S. Allbery KF8NH
On 2009 Mar 9, at 9:32, Claus Reinke wrote: One way out would be to treat the whole mutual recursion as a single entity, either implicitly, as I indicated, or explicitly, as I interpret Brandon's somewhat ambiguous comment. In other words, the peel/unroll limits would apply to a whole group

Re: Loop unrolling + fusion ?

2009-03-08 Thread Roman Leshchinskiy
On 07/03/2009, at 09:26, Claus Reinke wrote: My preferred spec would be roughly {-# NOINLINE f #-} as now {-# INLINE f #-}works as now, which is for non-recursive f only (might in future be taken as go-ahead for analysis-based recursion unfolding) {-# INLINE f PEEL n #-} inline

Re: Loop unrolling + fusion ?

2009-03-07 Thread Max Bolingbroke
2009/3/7 Claus Reinke claus.rei...@talk21.com: hmm, appropriate is one of those words that shouldn't occur in specs, not even rough ones, so let's flesh this out a bit, by abstract example. let f = ..f.. in f{n,m} -PEEL- let f = ..f.. in ..f{n-1,m}.. Probably what you intend here is that you

Re: Loop unrolling + fusion ?

2009-03-06 Thread Simon Marlow
Claus Reinke wrote: its loop unroller on this guy haven't succeeded. -funroll-loops and -funroll-all-loops doesn't touch it, That's because the C produced by GHC doesn't look like a loop to GCC. This can be fixed but given that we are moving away from -fvia-C anyway, it probably isn't

Re: Loop unrolling + fusion ?

2009-03-06 Thread Claus Reinke
That was one of my questions in the optimization and rewrite rules thread: shouldn't -fvia-C be supported (as a non-default option) for at least as long as the alternative isn't a clear win in all cases? The trouble with supporting multiple backends is that the cost in terms of testing and

Re: Loop unrolling + fusion ?

2009-03-06 Thread Claus Reinke
The implementation I'm thinking of is basically trivial. You just add the information gathered from the pragmas onto the Ids, then have a dedicated core pass that looks at the pragmas and does it's worker/wrapper thing. The technology to do peeling/unrolling is trivial and there already examples

Re: Loop unrolling + fusion ?

2009-03-06 Thread Claus Reinke
My preferred spec would be roughly {-# NOINLINE f #-} as now {-# INLINE f #-} works as now, which is for non-recursive f only (might in future be taken as go-ahead for analysis-based recursion unfolding) {-# INLINE f PEEL n #-} inline calls *into* recursive f (called loop peeling

Re: Loop unrolling + fusion ?

2009-03-02 Thread Tyson Whitehead
On March 1, 2009 17:31:13 Max Bolingbroke wrote: I am no assembly guru and haven't seen that last form of leaq either, but I'm going to guess that: leaq(%rsi,%rsi,4), %rax Says that rax is rsi * ((1 + 1) * 2 ^ 4) = rsi * 32 leaq0(,%rax,8), %rsi If I recall

Re: Loop unrolling + fusion ?

2009-03-01 Thread Claus Reinke
its loop unroller on this guy haven't succeeded. -funroll-loops and -funroll-all-loops doesn't touch it, That's because the C produced by GHC doesn't look like a loop to GCC. This can be fixed but given that we are moving away from -fvia-C anyway, it probably isn't worth doing. That was

Re: Loop unrolling + fusion ?

2009-03-01 Thread Claus Reinke
So now, since we've gone to such effort to produce a tiny loop like, this, can't we unroll it just a little? it is worth unrolling this guy, so we get the win of both aggressive high level fusion, and aggressive low level loop optimisations? It might be useful to point out that the

Re: Loop unrolling + fusion ?

2009-03-01 Thread Max Bolingbroke
2009/3/1 Claus Reinke claus.rei...@talk21.com: It might be useful to point out that the interaction goes both ways. Not only are fused loops candidates for unrolling, but unrolling can also enable fusion, giving one example of why Core-level unrolling (in addition to backend-level loop

Loop unrolling + fusion ?

2009-02-28 Thread Don Stewart
Hey guys, We have nice fusion frameworks now. E.g. stream fusion on uvector, http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uvector Takes something like this: import Data.Array.Vector import Data.Bits main = print . productU . mapU (*2) . mapU (`shiftL` 2) $

Re: Loop unrolling + fusion ?

2009-02-28 Thread Claus Reinke
import Data.Array.Vector import Data.Bits main = print . productU . mapU (*2) . mapU (`shiftL` 2) $ replicateU (1 :: Int) (5::Int) and turns it into a loop like this: $wfold :: Int# - Int# - Int# $wfold = \ (ww_sWX :: Int#) (ww1_sX1 :: Int#) - case ww1_sX1

Re: Loop unrolling + fusion ?

2009-02-28 Thread Max Bolingbroke
2009/2/28 Don Stewart d...@galois.com: So now, since we've gone to such effort to produce a tiny loop like, this, can't we unroll it just a little? Sadly, my attempts to get GCC to trigger its loop unroller on this guy haven't succeeded. -funroll-loops and -funroll-all-loops doesn't  touch it,

Re: Loop unrolling + fusion ?

2009-02-28 Thread Roman Leshchinskiy
On 01/03/2009, at 04:49, Don Stewart wrote: So now, since we've gone to such effort to produce a tiny loop like, this, can't we unroll it just a little? Sadly, my attempts to get GCC to trigger its loop unroller on this guy haven't succeeded. -funroll-loops and -funroll-all-loops doesn't