Andy: Heh, glad to hear that I'm not the only one facing this issue, and I
appreciate the encouragement, since it's been kicking my ass the past week
:) On the bright side, as someone coming from more of a math background,
this has forced me to learn a lot about how CPUs/threads/memory/etc. work.
This reminds me of another thread, where performance issues related to
concurrent allocation were explored in depth:
https://groups.google.com/d/topic/clojure/48W2eff3caU/discussion
The main takeaway for me was that HotSpot will slow down pretty
dramatically as soon as there are two threads allocating concurrently.
David:
No new suggestions to add right now. Herwig's suggestion that it could be
the Java allocator has some evidence for it, given your results. I'm not
sure whether this StackOverflow question on TLABs is fully accurate, but it
may provide some useful info:
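In the meantime, if anyone wants to poke at the TLAB angle directly, HotSpot has a few switches for it (a sketch; `app.jar` is just a stand-in for however you launch the benchmark):

```shell
# Print per-thread TLAB statistics at each GC:
java -XX:+PrintTLAB -jar app.jar

# Disable TLABs entirely; if the single-thread case then slows down to
# match the 18-thread case, contended allocation is a likely culprit:
java -XX:-UseTLAB -jar app.jar

# Pin the TLAB size for an apples-to-apples comparison across runs:
java -XX:TLABSize=1m -XX:-ResizeTLAB -jar app.jar
```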
On Thursday, November 19, 2015 at 1:36:59 AM UTC-5, David Iba wrote:
>
> OK, have a few updates to report:
>
> - Oracle vs OpenJDK did not make a difference
> - Whenever I run N>1 threads calling any of these functions with
> swap/vswap, there is some overhead compared to running 18 separate
> single-run processes in parallel.
Yeah, I actually tried using aset as well, and was still seeing these
"rogue" threads taking much longer (although the ones that did finish in a
normal amount of time had completion times very similar to those running in
their own process).
Herwig: I will try those suggestions when I get a chance.
Timothy: Each thread (call of f2) creates its own "local" atom, so I don't
think there should be any swap retries.
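In case the shape isn't clear, I mean something like this (a simplified sketch; the real f2 obviously does more than assoc):

```clojure
(defn f2 []
  (let [local (atom {})]            ; fresh atom per call -- never shared
    (dotimes [i 1000000]
      (swap! local assoc :k i))     ; no contention, so the CAS inside
    @local))                        ; swap! should succeed on the first try

;; 18 independent copies in parallel:
(doall (map deref (doall (repeatedly 18 #(future (f2))))))
```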
Gianluca: Good idea! I've only tried OpenJDK, but I will look into trying
Oracle and report back.
Andy: jvisualvm was showing pretty much all of the memory allocated in the
No worries. Thanks, I'll give that a try as well!
On Thursday, November 19, 2015 at 1:04:04 AM UTC+9, tbc++ wrote:
>
> Oh, then I completely misunderstood the problem at hand here. If that's
> the case then do the following:
>
> Change "atom" to "volatile!" and "swap!" to "vswap!". See if that changes
> anything.
By the way, have you tried both Oracle and OpenJDK with the same results?
Gianluca
On Tuesday, November 17, 2015 at 8:28:49 PM UTC+1, Andy Fingerhut wrote:
>
> David, you say "Based on jvisualvm monitoring, doesn't seem to be
> GC-related".
>
> What is jvisualvm showing you related to GC and/or memory allocation when
> you tried the 18-core version with 18 threads in the same process?
This sort of code is pretty much the worst-case situation for atoms (or
really for CAS). Clojure's swap! is based on the "compare-and-swap" (CAS)
operation that most x86 CPUs have as an instruction. If we expand swap! it
looks something like this:

(loop [old-val @x*]
  (let [new-val (f old-val)]        ; apply the update function
    (if (compare-and-set! x* old-val new-val)
      new-val
      (recur @x*))))                ; CAS failed: another thread won, retry
Oh, then I completely misunderstood the problem at hand here. If that's
the case then do the following:
Change "atom" to "volatile!" and "swap!" to "vswap!". See if that changes
anything.
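For anyone following along, the suggested change is just this (a sketch; note that volatile! gives no atomicity, which is fine here only because each box stays on a single thread):

```clojure
;; before: every update is a CAS (full fence + possible retry loop)
(let [a (atom 0)]
  (dotimes [_ 1000000] (swap! a inc)))

;; after: every update is a plain volatile write, no CAS
(let [v (volatile! 0)]
  (dotimes [_ 1000000] (vswap! v inc)))
```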
Timothy
On Wed, Nov 18, 2015 at 9:00 AM, David Iba wrote:
> Timothy: Each thread (call of f2) creates its own "local" atom, so I don't
> think there should be any swap retries.
OK, have a few updates to report:
- Oracle vs OpenJDK did not make a difference
- Whenever I run N>1 threads calling any of these functions with
swap/vswap, there is some overhead compared to running 18 separate
single-run processes in parallel. This overhead seems to increase as N
grows.
Could you also show how you are running these functions in parallel and how
you time them? The way you start the functions can have as much impact as
the functions themselves.
Regards,
Niels
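To make the question concrete, here is one common harness (my assumption of roughly what's being done; f2 is the function under test):

```clojure
(defn run-parallel
  "Start n futures running f, wait for all of them, and time the whole run."
  [n f]
  (time
    (let [futs (doall (repeatedly n #(future (f))))] ; doall: start them all now
      (doall (map deref futs)))))                    ; then block on each result

;; e.g. (run-parallel 18 f2)
```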
On Tuesday, November 17, 2015 at 6:38:39 AM UTC+1, David Iba wrote:
>
> I have functions f1 and f2 below, and let's say they run in T1 and T2
> amounts of time when running a single instance/thread.
Andy: Interesting. Thanks for educating me on the fact that atom swaps
don't use the STM. Your theory seems plausible... I will try those tests
next time I launch the 18-core instance, but yeah, I'm not sure how
illuminating the results will be.
Niels: Along the lines of this (so that each
Correction: that "do" should be a "doall". (My actual test code was a bit
different, but each run printed some info when it started, so it doesn't
have to do with delayed evaluation of lazy seqs or anything.)
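For the archives, the pitfall that correction is about: repeatedly/map are lazy, so without doall the futures are only created as deref walks the seq, and the runs end up serialized instead of overlapping. Roughly:

```clojure
;; Serialized by accident: each future is created only when deref
;; reaches it, after the previous one has already finished:
(map deref (repeatedly 18 #(future (f2))))

;; Actually parallel: force creation of all 18 futures up front:
(doall (map deref (doall (repeatedly 18 #(future (f2))))))
```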
On Tuesday, November 17, 2015 at 6:49:16 PM UTC+9, David Iba wrote:
>
> Andy:
David, you say "Based on jvisualvm monitoring, doesn't seem to be
GC-related".
What is jvisualvm showing you related to GC and/or memory allocation when
you tried the 18-core version with 18 threads in the same process?
Even memory allocation could become a point of contention, depending upon
I have functions f1 and f2 below, and let's say they run in T1 and T2
amounts of time when running a single instance/thread. The issue I'm facing
is that parallelizing f2 across 18 cores takes anywhere from 2-5x T2, and
for more complex funcs takes absurdly long.
(defn f1 []
There is no STM involved if you only have atoms and no refs, so it can't
be STM-related.
I have a conjecture, but don't yet have a suggestion for an experiment that
would prove or disprove it.
The JVM memory model requires that changes to values that should be visible
to all threads, like swap!