I'll volunteer to run your code on an Azul box.
- Azul gear has a great profiling tool; I should be able to quickly
tell hot locks / lock contention from other resource bottlenecks.
- Azul gear has far more bandwidth than X86 gear, so if your X86 run is
bandwidth-bound, that won't show up on us.
- Az
...
>parallel (6) : "Elapsed time: 38357.797175 msecs"
>parallel (7) : "Elapsed time: 37756.190205 msecs"
>From 4 to 7 there is no speedup at all.
>This looks an awful lot like you are using a Core i7 with 8 threads but
>only 4 physical cores. What is your hardware?
sorry, I found you have alre
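(A quick way to see what the JVM itself reports, bearing in mind that it counts hardware threads rather than physical cores:)

;; Reports logical processors (hardware threads), not physical cores,
;; e.g. 8 on a 4-core hyperthreaded Core i7.
(.availableProcessors (Runtime/getRuntime))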
On Aug 9, 6:08 am, Nicolas Oury wrote:
> > If I do my pmaptest with a very large Integer (inc 20) instead
> > of (inc 0), it is as slow as the double version. My question is
> > whether Clojure has special handling for small integers, like
> > using primitives for small ints and do
> If I do my pmaptest with a very large Integer (inc 20) instead
> of (inc 0), it is as slow as the double version. My question is
> whether Clojure has special handling for small integers, like
> using primitives for small ints and doing a new Integer for larger
> ones?
>
It seem
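For what it's worth, a quick REPL check of the small-integer question (results assume a stock HotSpot JDK; identical? is true only when both calls return the very same cached object rather than fresh allocations):

;; Integer/valueOf and Long/valueOf cache boxed values in -128..127,
;; so boxing a small result like (inc 0) can avoid allocation, while
;; larger values get a new object each time.
(identical? (Integer/valueOf (int 1)) (Integer/valueOf (int 1)))        ;=> true  (cached)
(identical? (Integer/valueOf (int 1000)) (Integer/valueOf (int 1000)))  ;=> false (fresh allocation)
(identical? (Long/valueOf 1) (Long/valueOf 1))                          ;=> true
(identical? (Long/valueOf 1000) (Long/valueOf 1000))                    ;=> false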
> Johann, if you are still following this thread, could you try running
> this Clojure program on your 8 core machine?
>
> http://github.com/jafingerhut/clojure-benchmarks/blob/3e45bd8f6c3eba4...
>
> This first set of parameters below will do 8 jobs sequentially, each
> doing 10^10 (inc c)'s, whe
Andy,
I just thought I'd mention that for 80 cents you can rent an hour on an
8-core EC2 machine with 7GB of RAM. We use EC2 a lot for such things at
work. It may be an easy way for you to accomplish your goals.
http://aws.amazon.com/ec2/instance-types/
Chad Harrington
chad.harring...@gmail.com
Hi Brad,
I think that there is no global lock for heap allocation, at least for
small objects.
In support of this claim:
http://www.ibm.com/developerworks/java/library/j-jtp09275.html
(see in particular the section "Thread-local allocation", though the article
is really interesting as a whole.)
I am
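A rough way to probe the no-global-lock claim directly from Clojure; this is only a sketch (alloc-test and its arguments are made up here, and a careful measurement would also need to defeat JIT dead-code elimination and account for GC pauses):

;; Time nthreads threads that each allocate per-thread boxed Doubles.
;; If allocation took a global lock, wall-clock time would grow with the
;; thread count; with thread-local allocation buffers it should stay
;; roughly flat until memory bandwidth or GC becomes the bottleneck.
(defn alloc-test [nthreads per-thread]
  (let [work    (fn [] (dotimes [_ per-thread] (Double/valueOf (Math/random))))
        threads (doall (repeatedly nthreads #(Thread. ^Runnable work)))]
    (time
     (do (doseq [^Thread t threads] (.start t))
         (doseq [^Thread t threads] (.join t))))))

;; e.g. compare (alloc-test 1 10000000) with (alloc-test 8 10000000)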
> I'm not sure how to determine why calling 'new Double' each time
> through NewDoubleTest's inner loop causes 2 threads to perform not
> much better than 1. The best possible explanation I've heard is from
> Nicolas Oury -- perhaps we are measuring the bandwidth from cache to
> main memory, not
Johann, if you are still following this thread, could you try running
this Clojure program on your 8 core machine?
http://github.com/jafingerhut/clojure-benchmarks/blob/3e45bd8f6c3eba47f982a0f6083493a9f076d0e9/misc/pmap-testing.clj
This first set of parameters below will do 8 jobs sequentially,
On Aug 6, 11:51 am, John Harrop wrote:
> Cache misses are a possibility; try the integer version with long, so the
> size of the data is the same as with double.
> The other possibility I'd consider likely is that the JDK you were using
> implements caching in Double.valueOf(double). This could b
Cache misses are a possibility; try the integer version with long, so the
size of the data is the same as with double.
The other possibility I'd consider likely is that the JDK you were using
implements caching in Double.valueOf(double). This could be dealt with if
Clojure boxing directly called ne
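That caching hypothesis is easy to check from the REPL (the result shown is what a stock Sun/OpenJDK gives; a different vendor JDK could in principle differ):

;; On a stock JDK, Double/valueOf does NOT cache: every call allocates
;; a new Double, unlike Integer/valueOf for small values.
(identical? (Double/valueOf 0.1) (Double/valueOf 0.1))  ;=> false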
On Aug 6, 10:00 am, Bradbev wrote:
> On Aug 6, 3:07 am, Andy Fingerhut
> wrote:
>
>
>
> > On Aug 5, 6:09 am, Rich Hickey wrote:
>
> > > On Wed, Aug 5, 2009 at 8:29 AM, Johann Kraus
> > > wrote:
>
> > > >> Could it be that your CPU has a single floating-point unit shared by 4
> > > >> cores o
FYI
IEEE doubles are typically 64 bits.
IEEE floats are typically 32 bits.
The wikipedia article is good:
http://en.wikipedia.org/wiki/IEEE_754-2008
The IEEE standard (requires login):
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4610935
I'm not sure how the JVM implements them prec
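On the JVM the widths are fixed (the JVM spec mandates IEEE 754 formats), and the class-library constants confirm it:

;; Width in bits of the JVM's primitive floating-point types:
[Double/SIZE Float/SIZE]  ;=> [64 32]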
On Aug 6, 3:07 am, Andy Fingerhut
wrote:
> On Aug 5, 6:09 am, Rich Hickey wrote:
>
>
>
> > On Wed, Aug 5, 2009 at 8:29 AM, Johann Kraus wrote:
>
> > >> Could it be that your CPU has a single floating-point unit shared by 4
> > >> cores on a single die, and thus only 2 floating-point units total
Hello again,
Another interesting test: replace the double operation with something longer
that doesn't allocate anything (a long chain of math functions on primitive
types...), and see if the parallelism improves.
Best,
Nicolas.
On Thu, Aug 6, 2009 at 2:32 PM, Nicolas Oury wrote:
> Hello,
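Something along the lines Nicolas suggests, as a sketch (primtest is a name made up here, and the exact primitive-coercion idioms vary between Clojure versions):

;; A loop that stays in primitive doubles and allocates nothing per
;; iteration, unlike (inc 0.1), which boxes a Double every time.
(defn primtest [n]
  (loop [i 0, acc 1.0]
    (if (< i n)
      (recur (unchecked-inc i) (Math/sqrt (+ acc (Math/sin acc))))
      acc)))

;; plug it into pmap in place of the boxing workload, e.g.
;; (doall (pmap (fn [_] (primtest 10000000)) (range 8)))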
Hello,
I will hazard a guess. If 98% of the time is spent allocating Doubles, the
program is loading new lines of memory into cache every n Doubles. At some
point down the different levels of cache, there is a cache/main memory common
to both cores, and the bus to this memory has to be shared in so
On Aug 5, 6:09 am, Rich Hickey wrote:
> On Wed, Aug 5, 2009 at 8:29 AM, Johann Kraus wrote:
>
> >> Could it be that your CPU has a single floating-point unit shared by 4
> >> cores on a single die, and thus only 2 floating-point units total for
> >> all 8 of your cores? If so, then that fact, pl
On Wed, Aug 5, 2009 at 8:29 AM, Johann Kraus wrote:
>
>> Could it be that your CPU has a single floating-point unit shared by 4
>> cores on a single die, and thus only 2 floating-point units total for
>> all 8 of your cores? If so, then that fact, plus the fact that each
>> core has its own separ
> Could it be that your CPU has a single floating-point unit shared by 4
> cores on a single die, and thus only 2 floating-point units total for
> all 8 of your cores? If so, then that fact, plus the fact that each
> core has its own separate ALU for integer operations, would seem to
> explain th
Johann:
Could it be that your CPU has a single floating-point unit shared by 4
cores on a single die, and thus only 2 floating-point units total for
all 8 of your cores? If so, then that fact, plus the fact that each
core has its own separate ALU for integer operations, would seem to
explain the
> My guess would be you're seeing the overhead for pmap since the
> (inc 0.1) computation is so cheap. From the docs for pmap:
> "Only useful for computationally intensive functions where the time of
> f dominates the coordination overhead."
I don't think so, as the cheap computation (inc 0.
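For reference, one standard way to make the per-task work dominate pmap's coordination overhead is to hand each task a large chunk rather than a single element (chunked-pmap is a name made up here, not anything from the thread):

;; Split coll into nchunks pieces and map f over each piece in parallel,
;; so each parallel task carries enough work to amortize pmap's overhead.
(defn chunked-pmap [f nchunks coll]
  (let [size (max 1 (long (Math/ceil (/ (count coll) (double nchunks)))))]
    (apply concat (pmap #(doall (map f %)) (partition-all size coll)))))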
Johann Kraus writes:
> Doing this with doubles:
> leads to:
> (time (maptest 8)) : 68044.060324 msecs
> (time (pmaptest 8)) : 35051.174503 msecs
> i.e. a speedup of ~2.
>
> However, the CPU usage indicated by "top" is ~690%. What does the CPU
> do?
My guess would be you're seeing the overhead
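The double version referred to above is elided in the quote; presumably it just swaps the increment, something like the sketch below (pmaptest-double and iters are made-up names here):

;; Same shape as pmaptest, but (inc 0.1) produces a boxed Double on every
;; iteration, whereas (inc 0) can hit the boxed small-integer cache
;; instead of allocating.
(defn pmaptest-double [cores iters]
  (doall (pmap (fn [x] (dotimes [_ iters] (inc 0.1))) (range cores))))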
> However, the CPU usage indicated by "top" is ~690%. What does the CPU do?
100% per core. So with dual quad-core processors, it'd mean roughly 7
cores were being pegged.
Sorry about the copy&paste error. I partially changed len to cores.
The code must look like:
(defn maptest [cores]
  (doall (map (fn [x] (dotimes [_ 10] (inc 0))) (range cores))))
(defn pmaptest [cores]
  (doall (pmap (fn [x] (dotimes [_ 10] (inc 0))) (range cores))))
and
(defn mapt
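The functions are then timed like this (as quoted elsewhere in the thread):

(time (maptest 8))   ; sequential baseline across 8 jobs
(time (pmaptest 8))  ; parallel version; compare the elapsed times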
Hi all,
recently I did some micro-benchmarks of parallel code on my 8-core
computer, but I don't understand this behaviour of pmap. Can anyone
explain it to me? The code is running on a dual quad-core Intel
machine (Xeon X5482, 3.20 GHz).
(defn maptest [cores] (doall (map (fn [x] (dot