You should also specify how many cores you plan on devoting to your
application. Notice that most of this discussion has been about JVM apps
running on machines with >32 cores. Systems like this aren't exactly common
in my line of work (where we tend to run greater numbers of smaller servers
using
2013/11/6 Dave Tenny
> (To contrast the lengthy discussion and analysis of this topic that is
> *hopefully* the exception and not the rule)
Some of the comments reveal that part of the problem lies with the JVM
memory allocator, which has its throughput limits.
There are known large commercia
Hi,
I believe Clojure's original mission has been to give you tools for handling
concurrency[1] in your programs in a sane way.
However, with the advent of Reducers[2], the landscape is changing quite a
bit.
If you're interested in the concurrency vs. parallelism terminology and
what language const
As a person who has recently been dabbling with clojure for evaluation
purposes I wondered if anybody wanted to post some links about parallel
clojure apps that have been clear and easy parallelism wins for the types
of applications that clojure was designed for. (To contrast the lengthy
discu
Neat, thanks for that. I skimmed it and don't know enough about Java to be
able to tell quickly how easily we can use this to our advantage, but
perhaps somebody else on the list will know.
The disruptor project from LMAX has wrestled with these sorts of issues at
length and achieved astounding leve
The disruptor project from LMAX has wrestled with these sorts of issues at
length and achieved astounding levels of performance on the JVM.
Martin Thompson, the original author of the disruptor, is a leading light
in the JVM performance space; his Mechanical Sympathy blog is a goldmine of
information
Interesting! If that is true of Java (I don't know Java at all), then your
argument seems plausible. Cache-to-main-memory writes still take many more
CPU cycles (an order of magnitude more, last I knew) than
processor-to-cache. I don't think it's so much a bandwidth issue as
latency, AFAIK. Thanks
Adding to this thread from almost a year ago. I don't have conclusive
proof with experiments to show right now, but I do have some experiments
that have led me to what I think is a plausible cause of not just Clojure
programs running more slowly when multi-threaded than when single-threaded,
but a
On Jan 31, 2013, at 10:15 AM, Chas Emerick wrote:
>>
>> Then Wm. Josiah posted a full-application benchmark, which appears to
>> have entirely different performance problems from the synthetic `burn`
>> benchmark. I’d rejected GC as the cause for the slowdown there too, but
>> ATM can’t recall w
On Jan 31, 2013, at 9:23 AM, Marshall Bockrath-Vandegrift wrote:
> Chas Emerick writes:
>
>> The nature of the `burn` program is such that I'm skeptical of the
>> ability of any garbage-collected runtime (lispy or not) to scale its
>> operation across multiple threads.
>
> Bringing you up to sp
Chas Emerick writes:
> Keeping the discussion here would make sense, esp. in light of
> meetup.com's horrible "discussion board".
Excellent. Solves the problem of deciding the etiquette of jumping on
the meetup board for a meetup one has never been involved in. :-)
> The nature of the `burn`
Keeping the discussion here would make sense, esp. in light of meetup.com's
horrible "discussion board".
I don't have a lot to offer on the JVM/Clojure-specific problem beyond what I
wrote in that meetup thread, but Lee's challenge(s) were too hard to resist:
> "Would your conclusion be somethi
Josiah mentioned requesting a free trial of the Zing JVM. Did you ever get
access to that, and were you able to try running your code on it?
Again, I have no direct experience with their product to guarantee you better
results -- just that I've heard good things about their ability to handle
con
FYI we had a bit of a discussion about this at a meetup in Amherst MA
yesterday, and while I'm not sufficiently on top of the JVM or system issues to
have briefed everyone on all of the details, there has been a bit of followup
since the discussion, including results of some different experim
"Wm. Josiah Erikson" writes:
> Am I reading this right that this is actually a Java problem, and not
> clojure-specific? Wouldn't the rest of the Java community have noticed
> this? Or maybe massive parallelism in this particular way isn't
> something commonly done with Java in the industry?
>
>
Am I reading this right that this is actually a Java problem, and not
clojure-specific? Wouldn't the rest of the Java community have noticed
this? Or maybe massive parallelism in this particular way isn't something
commonly done with Java in the industry?
Thanks for the patches though - it's nice
I've posted a patch with some changes here
(https://gist.github.com/4416803); it includes the record change described
here and a small change to interpret-instruction. The benchmark runs >2x
faster than the default, as it did for Marshall.
The patch also modifies the main loop to use a thread pool instead of
agents.
No, it's not the context switching; changing isArray (a native method) to
getAnnotations (a normal JVM method) gives the same time for both the
parallel and serial versions.
Cameron.
On Saturday, December 29, 2012 10:34:42 AM UTC+11, Leonardo Borges wrote:
>
> In that case isn't context switchin
In that case isn't context switching dominating your test?
.isArray isn't expensive enough to warrant the use of pmap
Leonardo Borges
www.leonardoborges.com
On Dec 29, 2012 10:29 AM, "cameron" wrote:
> Hi Lee,
> I've done some more digging and seem to have found the root of the
> problem,
> i
Hi Lee,
I've done some more digging and seem to have found the root of the
problem,
it seems that Java native methods are much slower when called in parallel.
The following code illustrates the problem:
(letfn [(time-native [f]
(let [c (class [])]
(time (dorun (
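The archive truncates Cameron's snippet above; the following is a hedged reconstruction of the kind of test he describes (the function body, the iteration count, and the name `time-native` are assumptions, not his verbatim code): time repeated calls to the native method `Class.isArray`, first serially with `map`, then in parallel with `pmap`.

```clojure
;; Hedged sketch, not Cameron's original code: compare serial vs. parallel
;; timing of a native method call (Class.isArray).
(defn time-native [n]
  (let [^Class c (class [])]
    (println "serial:")
    (time (dorun (map (fn [_] (.isArray c)) (range n))))
    (println "parallel:")
    (time (dorun (pmap (fn [_] (.isArray c)) (range n))))))

(time-native 100000)
```

On the affected machines, the `pmap` timing reportedly comes out slower rather than faster, which is the anomaly under discussion.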
I've been moving house for the last week or so but I'll also give the
benchmark another look.
My initial profiling seemed to show that the parallel version was spending
a significant amount of time in java.lang.Class.isArray,
clojush.pushstate/stack-ref is calling nth on the result of cons, since it
i
On Dec 21, 2012, at 6:59 PM, Meikel Brandmeyer wrote:
>
>> Is there a much simpler way that I overlooked?
>
> I'm not sure it's simpler, but it's more straight-forward, I'd say.
>
Thanks Marshall and Meikel for the struct->record conversion code. I'll
definitely make a change along those lines.
Hi,
Am 22.12.12 00:37, schrieb Lee Spector:
> ;; this is defined elsewhere, and I want push-states to have fields for each
> push-type that's defined here
> (def push-types '(:exec :integer :float :code :boolean :string :zip
> :tag :auxiliary :return :environment)
>
> (d
Lee Spector writes:
> FWIW I used records for push-states at one point but did not observe a
> speedup and it required much messier code, so I reverted to
> struct-maps. But maybe I wasn't doing the right timings. I'm curious
> about how you changed to records without the messiness. I'll include
On Dec 21, 2012, at 5:22 PM, Marshall Bockrath-Vandegrift wrote:
> Not to the bottom of things yet, but found some low-hanging fruit –
> switching the `push-state` from a struct-map to a record gives a flat
> ~2x speedup in all configurations I tested. So, that’s good?
I really appreciate your a
"Wm. Josiah Erikson" writes:
> I hope this helps people get to the bottom of things.
Not to the bottom of things yet, but found some low-hanging fruit –
switching the `push-state` from a struct-map to a record gives a flat
~2x speedup in all configurations I tested. So, that’s good?
I have how
"Wm. Josiah Erikson" writes:
> Then run, for instance: /usr/bin/time -f %E lein run
> clojush.examples.benchmark-bowling
>
> and then, when that has finished, edit
> src/clojush/examples/benchmark_bowling.clj and uncomment
> ":use-single-thread true" and run it again. I think this is a
> succinct
I tried redefining the few places in the code (string_reverse, I think)
that used reverse to use the same version of reverse that I got such great
speedups with in your code, and it made no difference. There are not any
explicit calls to conj in the code that I could find.
On Wed, Dec 19, 2012 at
On Dec 19, 2012, at 11:57 AM, Wm. Josiah Erikson wrote:
> I think this is a succinct, deterministic benchmark that clearly
> demonstrates the problem and also doesn't use conj or reverse.
Clarification: it's not just a tight loop involving reverse/conj, as our
previous benchmark was. It's our
Whoops, sorry about the link. It should be able to be found here:
http://gibson.hampshire.edu/~josiah/clojush/
On Wed, Dec 19, 2012 at 11:57 AM, Wm. Josiah Erikson wrote:
> So here's what we came up with that clearly demonstrates the problem. Lee
> provided the code and I tweaked it until I belie
So here's what we came up with that clearly demonstrates the problem. Lee
provided the code and I tweaked it until I believe it shows the problem
clearly and succinctly.
I have put together a .tar.gz file that has everything needed to run it,
except lein. Grab it here: clojush_bowling_benchmark.ta
On Dec 14, 2012, at 10:41 PM, cameron wrote:
> Until Lee has a representative benchmark for his application it's difficult
> to tell if he's
> experiencing the same problem but there would seem to be a case for changing
> the PersistentList
> implementation in clojure.lang.
We put together a ve
On Dec 15, 2012, at 1:14 AM, cameron wrote:
>
> Originally I was using ECJ (http://cs.gmu.edu/~eclab/projects/ecj/) in java
> for my GP work but for the last few years it's been GEVA with a clojure
> wrapper I wrote (https://github.com/cdorrat/geva-clj).
Ah yes -- I've actually downloaded and
>
> I'd be interested in seeing your GP system. The one we're using evolves
> "Push" programs and I suspect that whatever's triggering this problem with
> multicore utilization is stemming from something in the inner loop of my
> Push interpreter (https://github.com/lspector/Clojush)... but I
Thanks Herwig,
I used your plugin with the following 2 burn variants:
(defn burn-slow [& _]
(count (last (take 1000 (iterate #(reduce conj '() %) (range 1))
(defn burn-fast [& _]
(count (last (take 1000 (iterate #(reduce conj* (list nil) %) (range
1))
Where conj* is just a
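The preview cuts off before `conj*` is defined. A plausible reconstruction (an assumption, not the original definition) is a conj variant that calls the collection interface directly, so the call site only ever sees one concrete list type:

```clojure
;; Assumed reconstruction of conj*: dispatch straight through
;; IPersistentCollection.cons instead of clojure.core/conj's
;; polymorphic path.
(defn conj* [^clojure.lang.IPersistentCollection coll x]
  (.cons coll x))

(conj* (list nil) 1) ;; => (1 nil)
```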
I've created a test harness for this as a leiningen plugin:
https://github.com/bendlas/lein-partest
You can just put
:plugins [[net.bendlas/lein-partest "0.1.0"]]
into your project and run
lein partest your.ns/testfn 6
to run 6 threads/processes in parallel
The plugin then runs the fu
On Dec 13, 2012, at 4:21 PM, cameron wrote:
>
> Have you made any progress on a small deterministic benchmark that reflects
> your applications behaviour (ie. the RNG seed work you were discussing)? I'm
> keen to help, but I don't have time to look at benchmarks that take hours to
> run.
>
>
On Friday, December 14, 2012 5:41:59 AM UTC+11, Wm. Josiah Erikson wrote:
>
> Does this help? Should I do something else as well? I'm curious to try
> running like, say 16 concurrent copies on the 48-way node
>
> Have you made any progress on a small deterministic benchmark that
reflects
Cool. I've requested a free trial.
On Thu, Dec 13, 2012 at 1:53 PM, Andy Fingerhut wrote:
> I'm not saying that I know this will help, but if you are open to trying a
> different JVM that has had a lot of work done on it to optimize it for high
> concurrency, Azul's Zing JVM may be worth a try, t
I'm not saying that I know this will help, but if you are open to trying a
different JVM that has had a lot of work done on it to optimize it for high
concurrency, Azul's Zing JVM may be worth a try, to see if it increases
parallelism for a single Clojure instance in a single JVM, with lots of t
Ah. We'll look into running several clojures in one JVM too. Thanks.
On Thu, Dec 13, 2012 at 1:41 PM, Wm. Josiah Erikson wrote:
> OK, I did something a little bit different, but I think it proves the same
> thing we were shooting for.
>
> On a 48-way 4 x Opteron 6168 with 32GB of RAM. This is Tom
OK, I did something a little bit different, but I think it proves the same
thing we were shooting for.
On a 48-way 4 x Opteron 6168 with 32GB of RAM. This is Tom's "Bowling"
benchmark:
1. multithreaded. Average of 10 runs: 14:00.9
2. singlethreaded. Average of 10 runs: 23:35.3
3. singlethreaded,
See https://github.com/flatland/classlojure for a, nearly, ready-made
solution to running several Clojures in one JVM.
On Wed, Dec 12, 2012 at 5:20 PM, Lee Spector wrote:
>
> On Dec 12, 2012, at 10:45 AM, Christophe Grand wrote:
> > Lee, while you are at benchmarking, would you mind running sev
On Thursday, December 13, 2012 12:51:57 AM UTC+11, Marshall
Bockrath-Vandegrift wrote:
>
> cameron > writes:
>
> > the megamorphic call site hypothesis does sound plausible but I'm
> > not sure where the following test fits in.
>
> ...
>
> > I was toying with the idea of replacing the Empt
On Dec 12, 2012, at 10:45 AM, Christophe Grand wrote:
> Lee, while you are at benchmarking, would you mind running several threads in
> one JVM with one clojure instance per thread? Thus each thread should get
> JITted independently.
I'm not actually sure how to do that. We're starting runs wit
Lee, while you are at benchmarking, would you mind running several threads
in one JVM with one clojure instance per thread? Thus each thread should
get JITted independently.
Christophe
On Wed, Dec 12, 2012 at 4:11 PM, Lee Spector wrote:
>
> On Dec 12, 2012, at 10:03 AM, Andy Fingerhut wrote:
>
On Dec 12, 2012, at 10:03 AM, Andy Fingerhut wrote:
>
> Have you tried running your real application in a single thread in a JVM, and
> then run multiple JVMs in parallel, to see if there is any speedup? If so,
> that would again help determine whether it is multiple threads in a single
> JVM
Lee:
I believe you said that your benchmarking code achieved good speedup when
run as separate JVMs that were each running a single thread, even before making
the changes to the implementation of reverse found by Marshall. I confirmed
that on my own machine as well.
Have you tried runnin
cameron writes:
> the megamorphic call site hypothesis does sound plausible but I'm
> not sure where the following test fits in.
...
> I was toying with the idea of replacing the EmptyList class with a
> PersistentList instance to mitigate the problem
> in at least one common case, however i
Andy Fingerhut writes:
> I'm not practiced in recognizing megamorphic call sites, so I could be
> missing some in the example code below, modified from Lee's original
> code. It doesn't use reverse or conj, and as far as I can tell
> doesn't use PersistentList, either, only Cons.
...
> Can you
Hi Marshall,
the megamorphic call site hypothesis does sound plausible but I'm not
sure where the following test fits in.
If I understand correctly, we believe the problem is that the base case
(a PersistentList$EmptyList instance)
and the normal case (a PersistentList instance) have dif
On Dec 11, 2012, at 1:06 PM, Marshall Bockrath-Vandegrift wrote:
> So I think if you replace your calls to `reverse` and any `conj` loops
> you have in your own code, you should see a perfectly reasonable
> speedup.
Tantalizing, but on investigation I see that our real application actually does
Hm. Interesting. For the record, the exact code I'm running right now that
I'm seeing great parallelism with is this:
(defn reverse-recursively [coll]
  (loop [[r & more :as all] (seq coll)
         acc '()]
    (if all
      (recur more (cons r acc))
      acc)))
(defn burn
([] (loop [i 0
...and, suddenly, the high-core-count Opterons show us what we wanted and
hoped for. If I increase that range statement to 100 and run it on the
48-core node, it takes 50 seconds (before it took 50 minutes), while the
FX-8350 takes 3:31.89 and the 3770K takes 3:48.95. Thanks Marshall! I think
you m
Marshall:
I'm not practiced in recognizing megamorphic call sites, so I could be missing
some in the example code below, modified from Lee's original code. It doesn't
use reverse or conj, and as far as I can tell doesn't use PersistentList,
either, only Cons.
(defn burn-cons [size]
(let [si
And, interestingly enough, suddenly the AMD FX-8350 beats the Intel Core i7
3770K, when before it was very very much not so. So for some reason, this
bug was tickled more dramatically on AMD multicore processors than on Intel
ones.
On Tue, Dec 11, 2012 at 2:54 PM, Wm. Josiah Erikson wrote:
> OK W
OK WOW. You hit the nail on the head. It's "reverse" being called in a pmap
that does it. When I redefine my own version of reverse (I totally cheated
and just stole this) like this:
(defn reverse-recursively [coll]
  (loop [[r & more :as all] (seq coll)
         acc '()]
    (if all
      (recur
Lee Spector writes:
> If the application does lots of "list processing" but does so with a
> mix of Clojure list and sequence manipulation functions, then one
> would have to write private, list/cons-only versions of all of these
> things? That is -- overstating it a bit, to be sure, but perhaps
On Dec 11, 2012, at 11:40 AM, Marshall Bockrath-Vandegrift wrote:
>
>> Or have I missed a currently-available work-around among the many
>> suggestions?
>
> You can specialize your application to avoid megamorphic call sites in
> tight loops. If you are working with `Cons`-order sequences, just u
Lee Spector writes:
> Is the following a fair characterization pending further developments?
>
> If you have a cons-intensive task then even if it can be divided into
> completely independent, long-running subtasks, there is currently no
> known way to get significant speedups by running the subt
Lee,
My reading of this thread is not quite as pessimistic as yours. Here is
my synthesis for the practical application developer in Clojure from
reading and re-reading all of the posts above. Marshall and Cameron, please
feel free to correct me if I screw anything up here royally. ;-)
When
On Dec 11, 2012, at 4:37 AM, Marshall Bockrath-Vandegrift wrote:
> I’m not sure what the next steps are. Open a bug on the JVM? This is
> something one can attempt to circumvent on a case-by-case basis, but
> IMHO has significant negative implications for Clojure’s concurrency
> story.
I've gott
"nicolas.o...@gmail.com" writes:
> What happens if you run it a third time at the end? (The question
> is related to the fact that there appear to be transition states
> between monomorphic and megamorphic call sites, which might lead to
> an explanation.)
Same results, but your comment jog
Interesting. I tried the following:
:jvm-opts ["-Xmx10g" "-Xms10g" "-XX:+AggressiveOpts" "-server"
"-XX:+TieredCompilation" "-XX:ReservedCodeCacheSize=256m" "-XX:TLABSize=1G"
"-XX:+PrintGCDetails" "-XX:+PrintGCTimeStamps" "-XX:+UseParNewGC"
"-XX:+ResizeTLAB" "-XX:+UseTLAB"]
I got a slight slowdown
"Wm. Josiah Erikson" writes:
> Aha. Not only do I get a lot of "made not entrant", I get a lot of
> "made zombie". However, I get this for both runs with map and with
> pmap (and with pmapall as well)
I’m not sure this is all that enlightening. From what I can gather,
“made not entrant” just me
I tried some more performance tuning options in Java, just for kicks, and
didn't get any advantages from them: "-server" "-XX:+TieredCompilation"
"-XX:ReservedCodeCacheSize=256m"
Also, in case it's informative:
[josiah@compute-1-17 benchmark]$ grep entrant compilerOutputCompute-1-1.txt
| wc -l
17
Aha. Not only do I get a lot of "made not entrant", I get a lot of "made
zombie". However, I get this for both runs with map and with pmap (and with
pmapall as well)
For instance, from a pmapall run:
33752 159 clojure.lang.Cons::next (10 bytes) made zombie
33752 164
>
> - Parallel allocation of `Cons` and `PersistentList` instances through
> a Clojure `conj` function remains fast as long as the function only
> ever returns objects of a single concrete type
A possible explanation for this could be JIT Deoptimization. Deoptimization
happens when
cameron writes:
> There does seem to be something unusual about conj and
> clojure.lang.PersistentList in this parallel test case and I don't
> think it's related to the JVMs memory allocation.
I’ve got a few more data-points, but still no handle on what exactly is
going on.
My last benchmark s
The main GC feature here is the Thread-Local Allocation Buffer (TLAB). TLABs
are on by default and are "automatically sized according to allocation
patterns". The size can also be fine-tuned with the -XX:TLABSize=n
configuration option. You may consider tweaking this setting to optimize
runtime. Basic
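As a concrete illustration of the tuning described above, TLAB flags can go into a Leiningen `:jvm-opts` vector, as seen elsewhere in this thread; the values here are hypothetical examples, not recommendations:

```clojure
;; project.clj fragment (illustrative values only)
:jvm-opts ["-XX:+UseTLAB" "-XX:+ResizeTLAB" "-XX:TLABSize=64m"
           "-XX:+PrintGCDetails" "-XX:+PrintGCTimeStamps"]
```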
Hi Marshall,
I think we're definitely on the right track.
If I replace the reverse call with the following function I get a parallel
speedup of ~7.3 on an 8 core machine.
(defn copy-to-java-list [coll]
  (let [lst (java.util.LinkedList.)]
    (doseq [x coll]
      (.addFirst lst x))
    lst))
There's no magic here; everyone tuning their app hits this wall eventually
and ends up tweaking the JVM memory options :)
Luc
>
> On Dec 9, 2012, at 6:25 AM, Softaddicts wrote:
>
> > If the number of object allocation mentioned earlier in this thread are
> > real,
> > yes vm heap management can be a bott
On Dec 9, 2012, at 6:25 AM, Softaddicts wrote:
> If the numbers of object allocations mentioned earlier in this thread are real,
> then yes, VM heap management can be a bottleneck. There has to be some
> locking done somewhere, otherwise the heap would become corrupted :)
>
> The other bottleneck can come from ga
Andy Fingerhut writes:
> My current best guess is the JVM's memory allocator, not Clojure code.
I didn’t mean to imply the problem was in Clojure itself, but I don’t
believe the issue is in the memory allocator either. I now believe the
problem is in a class of JIT optimization HotSpot is perfo
On Dec 9, 2012, at 4:48 AM, Marshall Bockrath-Vandegrift wrote:
>
> It’s like there’s a lock of some sort sneaking in on the `conj` path.
> Any thoughts on what that could be?
My current best guess is the JVM's memory allocator, not Clojure code.
Andy
--
You received this message because you
On Dec 8, 2012, at 9:37 PM, Lee Spector wrote:
>
> On Dec 8, 2012, at 10:19 PM, meteorfox wrote:
>>
>> Now if you run vmstat 1 while running your benchmark you'll notice that the
>> run queue will be most of the time at 8, meaning that 8 "processes" are
>> waiting for CPU, and this is due to m
If the numbers of object allocations mentioned earlier in this thread are real,
then yes, VM heap management can be a bottleneck. There has to be some
locking done somewhere, otherwise the heap would become corrupted :)
The other bottleneck can come from garbage collection which has to freeze
object allocation com
cameron writes:
> Interesting problem: the slowdown seems to be caused by the reverse
> call (actually the calls to conj with a list argument).
Excellent analysis, sir! I think this points things in the right
direction.
> fast-reverse : map-ms: 3.3, pmap-ms 0.7, speedup 4.97
> list-cons
Hi Lee,
Would it be difficult to try the following version of 'pmap'? It doesn't
use futures but executors instead so at least this could help narrow the
problem down... If the problem is due to the high number of futures
spawned by pmap then this should fix it...
(defn- with-thread-pool* [
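The message is truncated before the `with-thread-pool*` definition; a sketch of the idea (the name `pmap-pool` and the structure are assumptions) is an executor-backed `pmap` replacement that submits each task to a fixed pool instead of spawning a future per element:

```clojure
;; Sketch of an executor-backed pmap replacement (assumed shape of the
;; truncated with-thread-pool* code).
(import '(java.util.concurrent Executors ExecutorService Callable Future))

(defn pmap-pool
  "Map f over coll on a fixed pool of n threads; returns results in order."
  [n f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n)
        tasks (doall (map (fn [x] (.submit pool ^Callable (fn [] (f x))))
                          coll))]
    (try
      (doall (map (fn [^Future fut] (.get fut)) tasks))
      (finally (.shutdown pool)))))

(pmap-pool 4 inc [1 2 3]) ;; => (2 3 4)
```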
I forgot to mention, I cut the number of reverse iterations down to 1000
(not 1) so I wouldn't have to wait too long for criterium, the speedup
numbers are representative of the full test though.
Cameron.
On Sunday, December 9, 2012 6:26:16 PM UTC+11, cameron wrote:
>
>
> Interesting probl
Interesting problem: the slowdown seems to be caused by the reverse call
(actually the calls to conj with a list argument).
Calling conj in a multi-threaded environment seems to have a significant
performance impact when using lists
I created some alternate reverse implementations (the fastes
On Dec 8, 2012, at 10:19 PM, meteorfox wrote:
>
> Now if you run vmstat 1 while running your benchmark you'll notice that the
> run queue will be most of the time at 8, meaning that 8 "processes" are
> waiting for CPU, and this is due to memory accesses (in this case, since this
> is not true
On Dec 8, 2012, at 8:16 PM, Marek Šrank wrote:
>
> Yep, reducers don't use lazy seqs. But they return something like
> transformed functions that will be applied when building the collection. So
> you can use them like this:
>
> (into [] (r/map burn (doall (range 4))))
>
> See
> http:
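To make the reducers point concrete, here is a self-contained sketch (`burn-ish` is a hypothetical stand-in for the thread's `burn` function): `r/map` returns a deferred reducible, and `into` or `r/fold` does the actual work, the latter in parallel via fork/join.

```clojure
(require '[clojure.core.reducers :as r])

;; hypothetical stand-in for the thread's `burn` function
(defn burn-ish [_] (reduce + (range 1000)))

;; `into` realizes the reducer eagerly:
(into [] (r/map burn-ish (vec (range 4)))) ;; => [499500 499500 499500 499500]

;; `r/fold` additionally parallelizes over vectors via fork/join:
(r/fold + (r/map inc (vec (range 1000)))) ;; => 500500
```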
Correction regarding the run queue: what I said is not completely correct :S.
But the stalled cycles and memory accesses still hold.
Sorry for the misinformation.
On Friday, December 7, 2012 8:25:14 PM UTC-5, Lee wrote:
>
>
> I've been running compute intensive (multi-day), highly parallelizable
>
Lee:
I ran Linux perf and also watched the run queue (with vmstat), and your
bottleneck is basically memory access. The CPUs are idle 80% of the time due
to stalled cycles. Here's what I got on my machine.
Intel Core i7 4 cores with Hyper thread (8 virtual processors)
16 GiB of Memory
Oracle JVM
an
Lee:
So I ran
On Friday, December 7, 2012 8:25:14 PM UTC-5, Lee wrote:
>
>
> I've been running compute intensive (multi-day), highly parallelizable
> Clojure processes on high-core-count machines and blithely assuming that
> since I saw near maximal CPU utilization in "top" and the like that I
> Just tried, my first foray into reducers, but I must not be understanding
> something correctly:
>
> (time (r/map burn (doall (range 4))))
>
> returns in less than a second on my macbook pro, whereas
>
> (time (doall (map burn (range 4))))
>
> takes nearly a minute.
>
> This feels lik
I'm glad somebody else can duplicate our findings! I get results similar to
this on Intel hardware. On AMD hardware, the disparity is bigger, and
multiple threads of a single JVM invocation on AMD hardware consistently
gives me slowdowns as compared to a single thread. Also, your results are
on
One more possibility to consider:
Single-threaded versions are more likely to keep the working set in the
processor's largest cache, whereas parallel versions that use N times the
working set for N times the parallelism can cause that same cache to thrash to
main memory.
Andy
--
You received
On Dec 8, 2012, at 3:42 PM, Andy Fingerhut wrote:
>
> I'm hoping you realize that (take 1 (iterate reverse value)) is reversing
> a linked list 1 times, each time allocating 1 cons cells (or
> Clojure's equivalent of a cons cell)? For a total of around 100,000,000
> memory allocat
On Dec 7, 2012, at 5:25 PM, Lee Spector wrote:
> The test: I wrote a time-consuming function that just does a bunch of math
> and list manipulation (which is what takes a lot of time in my real
> applications):
>
> (defn burn
> ([] (loop [i 0
> value '()]
>(if (>= i 1
I haven't analyzed your results in detail, but here are some results I had on
my 2GHz 4-core Intel core i7 MacBook Pro vintage 2011.
When running multiple threads within a single JVM invocation, I never got a
speedup of even 2. The highest speedup I measured was 1.82, when I ran
8 threa
Andy: The short answer is yes, and we saw huge speedups. My latest post, as
well as Lee's, has details.
On Friday, December 7, 2012 9:42:03 PM UTC-5, Andy Fingerhut wrote:
>
>
> On Dec 7, 2012, at 5:25 PM, Lee Spector wrote:
>
> >
> > Another strange observation is that we can run multiple inst
Hi guys - I'm the colleague Lee speaks of. Because Jim mentioned running
things on a 4-core Phenom II, I did some benchmarking on a Phenom II X4
945, and found some very strange results, which I shall post here, after I
explain a little function that Lee wrote that is designed to get improved
r
On Dec 8, 2012, at 1:28 PM, Paul deGrandis wrote:
> My experiences in the past are similar to the numbers that Jim is reporting.
>
> I have recently been centering most of my crunching code around reducers.
> Is it possible for you to cook up a small representative test using
> reducers+fork/joi
My experiences in the past are similar to the numbers that Jim is reporting.
I have recently been centering most of my crunching code around reducers.
Is it possible for you to cook up a small representative test using
reducers+fork/join (and potentially primitives in the intermediate steps)?
Pe
On Dec 8, 2012, at 9:36 AM, Marshall Bockrath-Vandegrift wrote:
>
> Although it doesn’t impact your benchmark, `pmap` may be further
> adversely affecting the performance of your actual program. There’s an
> open bug regarding `pmap` and chunked seqs:
>
>http://dev.clojure.org/jira/browse/CL
On Dec 7, 2012, at 9:42 PM, Andy Fingerhut wrote:
>
>
> When you say "we can run multiple instances of the test on the same machine",
> do you mean that, for example, on an 8 core machine you run 8 different JVMs
> in parallel, each doing a single-threaded 'map' in your Clojure code and not
>
Lee Spector writes:
> I'm also aware that the test that produced the data I give below,
> insofar as it uses pmap to do the distribution, may leave cores idle
> for a bit if some tasks take a lot longer than others, because of the
> way that pmap allocates cores to threads.
Although it doesn’t i
Even though this is very surprising (and sad) to hear, I'm afraid I've
got different experiences... My reducer-based parallel minimax is about
3x faster than the serial one, on my 4-core AMD phenom II and a tiny bit
faster on my girlfriend's intel i5 (2 physical cores + 2 virtual). I'm
suspecti