I can't see any issues with this version of the spec.
Thanks. From the silence, we seemed to have lost the innocent
bystanders? Anyway, for those who haven't noticed, there is now
a feature request ticket (for that good feeling of closing it when this
is finally implemented;-) as well as a
2009/3/19 Claus Reinke claus.rei...@talk21.com:
If the map, filter, fold, etc can be unrolled, then the unrolled
definitions would be fused, right? So what is missing is fine
control (how much to unroll this particular call to map here).
The issues is that In stream fusion the combinators like
2009/3/19 Claus Reinke claus.rei...@talk21.com:
Recursion unfolding spec, 2nd attempt.
If this is an improvement on the first version, and after correcting any
obvious issues, I should put it on the ghc trac wiki somewhere, and create a
feature request ticket.
I can't see any issues
Dear Simon*,
thanks for answering my concerns about -fvia-C replacement. Are these
answers somewhere in the ghc wiki, or perhaps they'd make a good basis
for a useful ghc blog post?
So, -fasm will soon be up to speed with -fvia-C in all cases, new native
backends are not more difficult than
Recursion unfolding spec, 2nd attempt.
The main difference is to look at groups of mutually recursive
definitions as a whole, rather than trying to think about individual
definitions. That step actually seems sufficient to address most of
the shortcomings raised so far, such as avoiding runaway
{-# INLINE f PEEL n UNROLL m #-}
The problem here is that this only works for directly recursive
functions which I, for instance, don't normally use in high-
performance code. Most of my loops are pipelines of collective
combinators like map, filter, fold etc. because these are the ones
Claus, Max
| My preferred spec would be roughly
|
| {-# NOINLINE f #-}
| as now
|
| {-# INLINE f #-}
| works as now, which is for non-recursive f only (might in future
| be taken as go-ahead for analysis-based recursion unfolding)
|
| {-# INLINE f PEEL n #-}
| inline calls
Hello Simon,
Thursday, March 12, 2009, 1:29:56 AM, you wrote:
For implementation, there are two routes. Either totally built-in,
or using a Core-to-Core plug-in. The thing I like about the latter
is that it can be done without having GHC HQ in the critical path,
because we (I) tend to slow
| What I don't understand yet is the routemap for replacing -fvia-C
Good points, Claus. I think the story is as follows:
* -fvia-C does not produce much better code, except in exceptionally tight
loops, because GHC gives gcc very little scope for optimisation. Simon
mentioned something like
2009/3/9 Roman Leshchinskiy r...@cse.unsw.edu.au:
The problem here is that this only works for directly recursive functions
which I, for instance, don't normally use in high-performance code. Most of
my loops are pipelines of collective combinators like map, filter, fold etc.
because these are
let f = ..f.. in f{n,m} -PEEL- let f = ..f.. in ..f{n-1,m}..
Probably what you intend here is that you create one copy of the
definition every round rather than one per call site, is that right?
I don't think so - ultimately, the point of both peeling and unrolling
is to unfold a definition
{-# INLINE f PEEL n #-}
inline calls *into* recursive f (called loop peeling for loops)
{-# INLINE f UNROLL m #-}
inline recursive calls to f *inside* f (called loop unrolling for
loops)
{-# INLINE f PEEL n UNROLL m #-}
combine the previous two
The problem here is that this only works
Claus Reinke wrote:
That was one of my questions in the optimization and rewrite rules
thread: shouldn't -fvia-C be supported (as a non-default option)
for at least as long as the alternative isn't a clear win in all cases?
The trouble with supporting multiple backends is that the cost in
2009/3/9 Claus Reinke claus.rei...@talk21.com:
let f = ..f.. in f{n,m} -PEEL- let f = ..f.. in ..f{n-1,m}..
Probably what you intend here is that you create one copy of the
definition every round rather than one per call site, is that right?
I don't think so - ultimately, the point of both
let f = ..f.. in f{n,m} -PEEL- let f = ..f.. in ..f{n-1,m}..
Probably what you intend here is that you create one copy of the
definition every round rather than one per call site, is that right?
I don't think so - ultimately, the point of both peeling and unrolling is to
unfold a definition
2009/3/9 Claus Reinke claus.rei...@talk21.com:
But if you annotate all your unrolled and peeled new definitions as
NOINLINE, do you still get the optimizations you want? There are probably a
few GHC optimizations that can look through non-recursive
lets, but RULES are not among those.
The
On 2009 Mar 9, at 9:32, Claus Reinke wrote:
One way out would be to treat the whole mutual recursion as a single
entity, either implicitly, as I indicated, or explicitly, as I
interpret Brandon's somewhat ambiguous comment. In other words, the
peel/unroll limits would apply to a whole group
On 07/03/2009, at 09:26, Claus Reinke wrote:
My preferred spec would be roughly
{-# NOINLINE f #-}
as now
{-# INLINE f #-}works as now, which is for non-recursive f only
(might in future
be taken as go-ahead for analysis-based recursion unfolding)
{-# INLINE f PEEL n #-}
inline
2009/3/7 Claus Reinke claus.rei...@talk21.com:
hmm, appropriate is one of those words that shouldn't occur in specs,
not even rough ones, so let's flesh this out a bit, by abstract example.
let f = ..f.. in f{n,m} -PEEL- let f = ..f.. in ..f{n-1,m}..
Probably what you intend here is that you
Claus Reinke wrote:
its loop unroller on this guy haven't succeeded. -funroll-loops and
-funroll-all-loops doesn't touch it,
That's because the C produced by GHC doesn't look like a loop to GCC.
This can be fixed but given that we are moving away from -fvia-C
anyway, it probably isn't
That was one of my questions in the optimization and rewrite rules
thread: shouldn't -fvia-C be supported (as a non-default option)
for at least as long as the alternative isn't a clear win in all cases?
The trouble with supporting multiple backends is that the cost in terms of
testing and
The implementation I'm thinking of is basically trivial. You just add
the information gathered from the pragmas onto the Ids, then have a
dedicated core pass that looks at the pragmas and does it's
worker/wrapper thing. The technology to do peeling/unrolling is
trivial and there already examples
My preferred spec would be roughly
{-# NOINLINE f #-}
as now
{-# INLINE f #-}
works as now, which is for non-recursive f only (might in future
be taken as go-ahead for analysis-based recursion unfolding)
{-# INLINE f PEEL n #-}
inline calls *into* recursive f (called loop peeling
On March 1, 2009 17:31:13 Max Bolingbroke wrote:
I am no assembly guru and haven't seen that last form of leaq either,
but I'm going to guess that:
leaq(%rsi,%rsi,4), %rax
Says that rax is rsi * ((1 + 1) * 2 ^ 4) = rsi * 32
leaq0(,%rax,8), %rsi
If I recall
its loop unroller on this guy haven't succeeded. -funroll-loops and
-funroll-all-loops doesn't touch it,
That's because the C produced by GHC doesn't look like a loop to GCC. This can be fixed but given
that we are moving away from -fvia-C anyway, it probably isn't worth doing.
That was
So now, since we've gone to such effort to produce a tiny loop like, this,
can't we unroll it just a little?
it is worth unrolling this guy, so we get the win of both aggressive high level
fusion, and aggressive low level loop optimisations?
It might be useful to point out that the
2009/3/1 Claus Reinke claus.rei...@talk21.com:
It might be useful to point out that the interaction goes both ways.
Not only are fused loops candidates for unrolling, but unrolling
can also enable fusion, giving one example of why Core-level
unrolling (in addition to backend-level loop
Hey guys,
We have nice fusion frameworks now. E.g. stream fusion on uvector,
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uvector
Takes something like this:
import Data.Array.Vector
import Data.Bits
main = print . productU . mapU (*2) . mapU (`shiftL` 2) $
import Data.Array.Vector
import Data.Bits
main = print . productU . mapU (*2) . mapU (`shiftL` 2) $ replicateU (1 :: Int)
(5::Int)
and turns it into a loop like this:
$wfold :: Int# - Int# - Int#
$wfold =
\ (ww_sWX :: Int#) (ww1_sX1 :: Int#) -
case ww1_sX1
2009/2/28 Don Stewart d...@galois.com:
So now, since we've gone to such effort to produce a tiny loop like, this,
can't we unroll it just a little? Sadly, my attempts to get GCC to trigger
its loop unroller on this guy haven't succeeded. -funroll-loops and
-funroll-all-loops doesn't touch it,
On 01/03/2009, at 04:49, Don Stewart wrote:
So now, since we've gone to such effort to produce a tiny loop like,
this,
can't we unroll it just a little? Sadly, my attempts to get GCC to
trigger
its loop unroller on this guy haven't succeeded. -funroll-loops and
-funroll-all-loops doesn't
31 matches
Mail list logo