subject:"Poor parallelization performance across 18 cores \(but not 4\)"

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-20 Thread David Iba

Andy: Heh, glad to hear that I'm not the only one facing this issue, and I appreciate the encouragement since it's been kicking my ass the past week :) On the bright side, as someone coming from more of a math background, this has forced me to learn a lot about how cpus/threads/memory/etc.

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-19 Thread Herwig Hochleitner

This reminds me of another thread, where performance issues related to concurrent allocation were explored in depth: https://groups.google.com/d/topic/clojure/48W2eff3caU/discussion

The main takeaway for me was that HotSpot will slow down pretty dramatically as soon as there are two threads

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-19 Thread Andy Fingerhut

David: No new suggestions to add right now. Herwig's suggestion that it could be the Java allocator has some evidence for it given your results. I'm not sure whether this StackOverflow Q on TLAB is fully accurate, but it may provide some useful info:

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-19 Thread Fluid Dynamics

On Thursday, November 19, 2015 at 1:36:59 AM UTC-5, David Iba wrote:
> OK, have a few updates to report:
> - Oracle vs OpenJDK did not make a difference
> - Whenever I run N>1 threads calling any of these functions with swap/vswap, there is some overhead compared to running 18

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-19 Thread David Iba

Yeah, I actually tried using aset as well, and was still seeing these "rogue" threads taking much longer (although the ones that did finish in a normal amount of time had very similar completion times to those running in their own process.) Herwig: I will try those suggestions when I get a

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-18 Thread David Iba

Timothy: Each thread (call of f2) creates its own "local" atom, so I don't think there should be any swap retries. Gianluca: Good idea! I've only tried OpenJDK, but I will look into trying Oracle and report back. Andy: jvisualvm was showing pretty much all of the memory allocated in the

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-18 Thread David Iba

No worries. Thanks, I'll give that a try as well!

On Thursday, November 19, 2015 at 1:04:04 AM UTC+9, tbc++ wrote:
> Oh, then I completely misunderstood the problem at hand here. If that's the case then do the following:
>
> Change "atom" to "volatile!" and "swap!" to "vswap!". See if that

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-18 Thread gianluca torta

By the way, have you tried both Oracle and OpenJDK with the same results?

Gianluca

On Tuesday, November 17, 2015 at 8:28:49 PM UTC+1, Andy Fingerhut wrote:
> David, you say "Based on jvisualvm monitoring, doesn't seem to be GC-related".
>
> What is jvisualvm showing you related to GC and/or

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-18 Thread Timothy Baldridge

This sort of code is pretty much the worst-case situation for atoms (or really for CAS). Clojure's swap! is based on the "compare-and-swap" (CAS) operation that most x86 CPUs provide as an instruction. If we expand swap! it looks something like this:

(loop [old-val @x*] (let [new-val (assoc old-val
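The retry loop that this message starts to expand can be sketched as follows. This is only an illustration, assuming an atom that holds a map; `cas-assoc!` and `counter-map` are hypothetical names, and the code is not the actual source of `swap!`.

```clojure
;; Hypothetical helper: roughly what (swap! a assoc k v) does.
(defn cas-assoc!
  [a k v]
  (loop []
    (let [old-val @a
          new-val (assoc old-val k v)]
      ;; compare-and-set! succeeds only if the atom still holds old-val.
      ;; If another thread changed it in the meantime, re-read and retry.
      (if (compare-and-set! a old-val new-val)
        new-val
        (recur)))))

;; Illustrative usage on a single thread (no retries can happen here).
(def counter-map (atom {}))
(cas-assoc! counter-map :x 1)
```

When many threads hit the same atom, the `recur` branch is taken more often and work is repeated; with one atom per thread, as in the code under discussion, the loop should succeed on the first attempt.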

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-18 Thread Timothy Baldridge

Oh, then I completely misunderstood the problem at hand here. If that's the case then do the following:

Change "atom" to "volatile!" and "swap!" to "vswap!". See if that changes anything.

Timothy

On Wed, Nov 18, 2015 at 9:00 AM, David Iba wrote:
> Timothy: Each thread
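A minimal sketch of the suggested substitution, assuming a thread-local accumulator. The original f2 is not shown in this listing, so `sum-with-atom` and `sum-with-volatile` are hypothetical stand-ins rather than the code under test. `volatile!` and `vswap!` require Clojure 1.7 or later.

```clojure
;; Accumulate with an atom: every update goes through a compare-and-set.
(defn sum-with-atom [n]
  (let [acc (atom 0)]
    (dotimes [i n]
      (swap! acc + i))
    @acc))

;; Same accumulation with a volatile: plain volatile read/write, no CAS.
;; Only safe when a single thread mutates the value, as assumed here.
(defn sum-with-volatile [n]
  (let [acc (volatile! 0)]
    (dotimes [i n]
      (vswap! acc + i))
    @acc))
```

Both functions should return the same result for the same `n`. If the slowdown persists after the swap, that would suggest the cause lies elsewhere than the CAS itself.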

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-18 Thread David Iba

OK, have a few updates to report:
- Oracle vs OpenJDK did not make a difference
- Whenever I run N>1 threads calling any of these functions with swap/vswap, there is some overhead compared to running 18 separate single-run processes in parallel. This overhead seems to increase as N

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-17 Thread Niels van Klaveren

Could you also show how you are running these functions in parallel and timing them? The way you start the functions can have as much impact as the functions themselves.

Regards,
Niels

On Tuesday, November 17, 2015 at 6:38:39 AM UTC+1, David Iba wrote:
> I have functions f1 and f2 below, and

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-17 Thread David Iba

Andy: Interesting. Thanks for educating me on the fact that atom swaps don't use the STM. Your theory seems plausible... I will try those tests next time I launch the 18-core instance, but yeah, not sure how illuminating the results will be.

Niels: along the lines of this (so that each

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-17 Thread David Iba

Correction: that "do" should be a "doall". (My actual test code was a bit different, but each run printed some info when it started, so it doesn't have to do with delayed evaluation of lazy seqs or anything.)

On Tuesday, November 17, 2015 at 6:49:16 PM UTC+9, David Iba wrote:
> Andy:

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-17 Thread Andy Fingerhut

David, you say "Based on jvisualvm monitoring, doesn't seem to be GC-related". What is jvisualvm showing you related to GC and/or memory allocation when you tried the 18-core version with 18 threads in the same process? Even memory allocation could become a point of contention, depending upon

Poor parallelization performance across 18 cores (but not 4)

2015-11-16 Thread David Iba

I have functions f1 and f2 below, and let's say they run in T1 and T2 amount of time when running a single instance/thread. The issue I'm facing is that parallelizing f2 across 18 cores takes anywhere from 2-5X T2, and for more complex funcs takes absurdly long.

(defn f1 []

Re: Poor parallelization performance across 18 cores (but not 4)

2015-11-16 Thread Andy Fingerhut

There is no STM involved if you only have atoms, and no refs, so it can't be STM-related. I have a conjecture, but don't yet have a suggestion for an experiment that would prove or disprove it. The JVM memory model requires that changes to values that should be visible to all threads, like swap!

17 matches
