Lennart Augustsson wrote:
So then tail calls should be very cheap when most of the arguments don't
change.
Yes, but the problem tends to be the arguments that change, and the fact that
they are passed on the stack. A C loop would keep the loop-carried variables in
registers. On x86_64 you g
OK, so there are a few extra costs for unknown tail calls.
First, there might be a pipeline stall if the jump address isn't
loaded into a register far enough in advance before the jump.
Second, you need to pass information about the number of arguments in
the caller.
And finally, you need to te
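A minimal illustration of what makes a call "unknown" (hypothetical names; a sketch of the distinction, not of GHC's code generator):

```haskell
-- When the callee arrives as a value, the call site cannot see its arity,
-- so information about the number of arguments has to be carried and
-- checked at run time. A direct call to a top-level function of known
-- arity needs none of that.
applyTwice :: (Int -> Int) -> Int -> Int
applyTwice f x = f (f x)   -- both calls to f are unknown calls

double :: Int -> Int
double x = x * 2

main :: IO ()
main = print (applyTwice double 3)  -- the call to applyTwice itself is known
```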
On Tue, Apr 10, 2007 at 07:59:04PM +0100, Lennart Augustsson wrote:
> So then tail calls should be very cheap when most of the arguments
> don't change.
>
> On Apr 10, 2007, at 10:17 , Simon Marlow wrote:
>
> > Lennart Augustsson wrote:
> >> It's not that hard to figure out an order to permute the arguments
> >> on the stack before a tail call that minimizes that number of
> >> moves
Lennart Augustsson wrote:
It's not that hard to figure out an order to permute the arguments on
the stack before a tail call that minimizes that number of moves and
temporary locations. Lmlc did this 20 years ago. :)
Right, and that's what GHC does too, with a strongly-connected-component
analysis.
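The move-ordering Lennart and Simon describe is the classic "parallel move" problem. A small Haskell sketch (hypothetical `Step` type and names; this uses a leaf-first ordering with one temporary per cycle, a stand-in for — not a copy of — GHC's SCC-based approach):

```haskell
import Data.List (delete, (\\))

-- Hypothetical representation: Move s d means slot d := slot s,
-- Save s parks slot s in a temporary, Restore d writes it back to slot d.
data Step = Move Int Int | Save Int | Restore Int
  deriving (Eq, Show)

-- Order a set of simultaneous stack-slot moves (src, dst), destinations
-- assumed distinct, so that no value is overwritten before it is read.
sequenceMoves :: [(Int, Int)] -> [Step]
sequenceMoves [] = []
sequenceMoves ms =
  case [ m | m@(_, d) <- ms, d `notElem` map fst ms ] of
    -- some destination is no longer read by any pending move: safe to fill
    (m@(s, d) : _) -> Move s d : sequenceMoves (delete m ms)
    -- every destination is still read: the remaining moves form cycles;
    -- break one by parking a source in the temporary, emitting the rest
    -- of that cycle as a chain, and restoring last
    [] ->
      let m@(s, d) = head ms
          rest     = delete m ms
          -- walk this cycle backwards: the move writing s, then the move
          -- writing that move's source, and so on until we reach d
          chain t = case [ c | c@(_, d') <- rest, d' == t ] of
                      (c@(s', _) : _) -> c : chain s'
                      []              -> []
          cyc = chain s
      in  Save s
            : [ Move s' d' | (s', d') <- cyc ]
           ++ [Restore d]
           ++ sequenceMoves (ms \\ (m : cyc))

main :: IO ()
main = mapM_ print
  [ sequenceMoves [(1, 2), (2, 3)]          -- chain: deepest move first
  , sequenceMoves [(1, 2), (2, 3), (3, 1)]  -- 3-cycle: needs the temporary
  ]
```

For the chain this emits `[Move 2 3, Move 1 2]` (zero extra moves), and for the cycle `[Save 1, Move 3 1, Move 2 3, Restore 2]` (exactly one temporary), which is the "minimizes moves and temporary locations" property the thread mentions.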
| 6. The inliner is a bit too greedy. Removing the slow-path code from
| singleton doesn't help because popSingleton is only used once; but
| if I explicitly {-# NOINLINE popSingleton #-}, the code for
| singleton itself becomes much smaller, and inlinable (15% perf
| gain). Plus the
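The effect described in point 6 can be sketched generically (hypothetical names; `slowPath` stands in for the bulky, rarely-taken code, not for the actual popSingleton):

```haskell
-- Keeping the rarely-executed branch out of the hot function's body
-- (here via a NOINLINE pragma) keeps the hot function small enough for
-- GHC to inline it at call sites. Illustrative code, not the original.
{-# NOINLINE slowPath #-}
slowPath :: Int -> Int
slowPath n = sum [1 .. n]   -- stands in for the slow-path work

bump :: Int -> Int
bump n
  | n < 10    = n + 1       -- tiny hot path: bump itself stays inlinable
  | otherwise = slowPath n

main :: IO ()
main = print (bump 3, bump 100)  -- (4,5050)
```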
On Apr 5, 2007, at 19:17 , Claus Reinke wrote:
Stefan O'Rear wrote:
> 2. Parameters are very expensive. Our type of functions that build
> (ignoring CPS for the time being) was MBA# -> Int# -> [ByteString],
> where the Int# is the current write pointer. Adding an extra Int#
> to cache the size of the array (rather than calling sMBA#
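The trade-off in point 2 can be sketched with boxed IO arrays instead of the MBA#/Int# primops from the mail (hypothetical names; the bound is queried once and threaded through the loop as an extra parameter, mirroring the extra Int#):

```haskell
import Data.Array.IO

-- Query the array's upper bound once and pass it along as a loop
-- parameter, instead of re-asking the array on every iteration. The cost
-- is one more argument shuffled on each tail call, which is exactly the
-- tension discussed in the thread. (Boxed sketch, not the original code.)
sumArr :: IOUArray Int Int -> IO Int
sumArr arr = do
  (_, n) <- getBounds arr            -- computed once, cached in a parameter
  let go hi i acc
        | i > hi    = return acc
        | otherwise = do
            x <- readArray arr i
            go hi (i + 1) (acc + x)  -- tail call carries hi, i, acc
  go n 0 0

main :: IO ()
main = do
  arr <- newListArray (0, 4) [1 .. 5] :: IO (IOUArray Int Int)
  print =<< sumArr arr  -- 15
```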
| 5. State# threads clog the optimizer quite effectively. Replacing
| st(n-1)# with realWorld# everywhere I could count on data
| dependencies to do the same job doubled performance.
The idea is that the optimiser should allow you to write at a high level, and
do the bookkeeping for you.