Two optimizations I have not seen mentioned so far; don't be so Groovy ;-)
1. Replace the the Map<> with an array of primitive ints. Why use an integer as
a key into a hash map when it can be used as an array index?
int[] results = new int[19] // since we need to index values 3..18
2. Replace the N.times{} or (1..N).each{} loops with a good old fashioned
for(int i=0;i<N;i++) loop. I was actually surprised how much of an improvement
this made if @CompileStatic was not used. For statically compiled code this
didn't really make a difference, but for dynamic code the speed up is huge.
Of course, the big winner is using @CompileStatic, which when combined with
using an array results in a ~10x speed up.
For my tests I basically took the code posted by John and James and ran 50
rounds of 1M iterations. Here are some representative total times on my oldish
quad core MacBook Pro:
@CompileDynamic
Map + N.times = 17 seconds
Map + for-loop = 7 seconds
Array + N.times = 13 seconds
Array + for-loop = 3 seconds
@CompileStatic
Map + N.times = 3.4 seconds
Map + for-loop = 3.0 seconds
Array + N.times = 1.4 seconds
Array + for-loop = 1.4 seconds
Interesting, dynamically compiled code using a primitive array and a for-loop
performed just as well as statically compiled code doing it the "Groovy way".
- Keith
> On Mar 28, 2017, at 4:25 PM, Paul Moore <[email protected]> wrote:
>
> I'm very much a newbie with Groovy, so I apologise in advance if this
> is not the right place for questions like this. I couldn't find
> anywhere else that looked like a better option - if there is somewhere
> I should have asked, feel free to redirect me.
>
> I want to write a simulation script using Groovy - this is something
> of a hobby challenge for me, I have a friend who has done a similar
> task in C++, and I'm looking for a more user-friendly language to
> write the code, while not losing too much performance over the C
> version.
>
> The code is basically to simulate "a game". The particular game is
> defined by the user, as a function that generates scores. The program
> runs the game many times, and summarises the distribution of the
> results. Basically, a Monte Carlo simulation. My current code in
> Groovy for this, using "roll 3 dice and add up the results" as the
> target game, looks as follows:
>
> @Grapes(
> @Grab(group='org.apache.commons', module='commons-math3', version='3.6.1')
> )
> import org.apache.commons.math3.random.MersenneTwister
>
> def benchmark = { closure ->
> start = System.currentTimeMillis()
> closure.call()
> now = System.currentTimeMillis()
> now - start
> }
>
> rng = new MersenneTwister()
>
> int roll() {
> rng.nextInt(6) + rng.nextInt(6) + rng.nextInt(6) + 3
> }
>
> int N = 1000000
> def results = [:]
>
> def time = benchmark {
> N.times {
> int n = roll()
> results[n] = results.containsKey(n) ? results[n] + 1 : 1
> }
> }
>
> println "Took ${time/100} sec"
> for (e in results.sort()) {
> println "$e.key: $e.value"
> }
>
> This does exactly what I want, but takes about 5 seconds to run the
> simulation on my PC. My friend's C++ code runs a similar simulation in
> about 0.1 second. That's a massive penalty for Groovy, and likely
> means that for more realistic simulations (which would be a lot more
> complex than 3d6!) I wouldn't even be close to competitive.
>
> The code above is completely unoptimised. I know that Groovy's dynamic
> programming features can introduce some overhead, but I also get the
> impression from the documentation that by careful use of exact types,
> and similar techniques, this can be speeded up a lot (the docs claim
> potentially better than C performance in some cases).
>
> What should I be looking at to optimise the above code? The areas I
> can think of are:
>
> 1. The RNG. I assume that the apache commons code is pretty efficient,
> though. I do want a reasonably decent RNG, and I'd heard that the JVM
> RNG is not sufficient for simulation. For now I'm assuming that this
> is sufficiently good.
> 2. The roll() function. This is the core of the inner loop, and likely
> the big bottleneck. I've declared the type, which I guess is the first
> step, but I don't know what else I can do here. I tried a
> CompileStatic annotation, but that gave me errors about referencing
> the rng variable. I'm not sure what that implies - is my code doing
> something wrong in how it references the global rng variable?
> 3. Collecting the results in a map is likely not ideal. Is there a
> better data structure I should be using? I basically want to be able
> to count how many times each result appears - results will be integers
> (in certain cases I might want non-integers but I can handle them
> exceptionally) but I don't necessarily know the range in advance (so
> I'd avoid a static array unless it gives significant performance
> benefits - when I did a quick test, I got about a second faster
> runtime, noticeable, but not enough to get me anywhere near my sub-1
> second target)
>
> I tried using the GProf profiler to see if that shed any light on what
> I should do. When I ran with a realistic number of iterations it took
> forever and then failed with an out of memory error. Dropping it to
> 10000 iterations, I got
>
> % cumulative self self total self total
> self total
> time seconds seconds calls ms/call ms/call min ms min ms
> max ms max ms name
> 34.5 0.04 0.04 10000 0.00 0.00 0.00 0.00
> 0.91 1.13 blackpool$_run_closure2$_closure3.roll
> 18.5 0.06 0.02 39984 0.00 0.00 0.00 0.00
> 0.53 0.53 java.lang.Integer.plus
> 15.9 0.08 0.02 10000 0.00 0.01 0.00 0.00
> 0.28 1.26 blackpool$_run_closure2$_closure3.doCall
> 11.0 0.10 0.01 30000 0.00 0.00 0.00 0.00
> 0.36 0.36 org.apache.commons.math3.random.MersenneTwister.nextInt
> 5.1 0.10 0.00 10000 0.00 0.00 0.00 0.00
> 0.19 0.19 java.util.LinkedHashMap.containsKey
> 4.4 0.11 0.00 10000 0.00 0.00 0.00 0.00
> 0.02 0.02 java.util.LinkedHashMap.putAt
> 4.0 0.12 0.00 1 5.25 125.62 5.25 125.62
> 5.25 125.62 java.lang.Integer.times
> 3.9 0.12 0.00 9984 0.00 0.00 0.00 0.00
> 0.02 0.02 java.util.LinkedHashMap.getAt
> 2.2 0.12 0.00 1 2.85 128.48 2.85 128.48
> 2.85 128.48 blackpool$_run_closure2.doCall
>
> But I don't really know how to interpret that. (Also, am I somehow
> using GProf wrongly? It seems like it shouldn't run out of memory
> profiling a 5-second program run...)
>
> Can anyone offer any advice on what I should be looking at here?
>
> Thanks,
> Paul
----------------------
Keith Suderman
Research Associate
Department of Computer Science
Vassar College, Poughkeepsie NY
[email protected]