[racket-dev] better x86 performance

2011-04-24 Thread Matthew Flatt
The `assoc' example helped focus my attention on a long-unsolved issue with JIT-generated code, where non-tail calls from JIT-generated code to other JIT-generated code seemed more expensive than they should be. This effect showed up in `assq' and `assoc' through a high relative cost for calling `a

Re: [racket-dev] better x86 performance

2011-04-24 Thread Eli Barzilay
An hour and a half ago, Matthew Flatt wrote:
>
> [...] Later, the `ret' to return from the non-tail call would
> confuse the processor and caused stalls, because the `ret' it wasn't
> matched with its `call'. It's easy enough to put the return address
> in place using `call' when setting up a fra

Re: [racket-dev] better x86 performance

2011-04-24 Thread Robby Findler
On Sun, Apr 24, 2011 at 7:56 PM, Eli Barzilay wrote:
> An hour and a half ago, Matthew Flatt wrote:
>>
>> [...] Later, the `ret' to return from the non-tail call would
>> confuse the processor and caused stalls, because the `ret' it wasn't
>> matched with its `call'.  It's easy enough to put the r

Re: [racket-dev] better x86 performance

2011-04-24 Thread Eli Barzilay
Two minutes ago, Robby Findler wrote:
> On Sun, Apr 24, 2011 at 7:56 PM, Eli Barzilay wrote:
> > An hour and a half ago, Matthew Flatt wrote:
> >>
> >> [...] Later, the `ret' to return from the non-tail call would
> >> confuse the processor and caused stalls, because the `ret' it wasn't
> >> match

Re: [racket-dev] better x86 performance

2011-04-24 Thread Vincent St-Amour
These are impressive speedups! Given how close we were to the fastest Scheme compilers on some of these, that may be enough to give us the lead. I'll run the benchmarks on different implementations tomorrow.

Vincent

At Sun, 24 Apr 2011 17:11:21 -0600, Matthew Flatt wrote:
>
> The `assoc' exam

Re: [racket-dev] better x86 performance

2011-04-26 Thread Vincent St-Amour
Here are the numbers for Racket, Typed Racket, Gambit and Larceny on 32 bits, and without Larceny on 64 bits. Overall, we're competitive, but we're losing pretty hard on deriv.

Vincent

          fastest   gambit  larceny  racket  typed-racket
cpstack   5715 ms   1.13    1        1.46    1.44
dderiv    6200 ms   1.48    1        1.49    1.51
der

Re: [racket-dev] better x86 performance

2011-04-27 Thread Matthew Flatt
At Tue, 26 Apr 2011 17:43:57 -0400, Vincent St-Amour wrote:
> we're losing pretty hard on deriv.

It looks like 20% of the time can be saved by JIT-inlining the `procedure-arity-includes?' check that `map' performs. (And I've been meaning to make that JIT change anyway, since it's an obstacle to us