I see, so here might be the problem. With more cores, there's less memory 
available per core, and now many of your threads are doing external hashing 
(spilling data to disk), as evidenced by the calls to 
ExternalAppendOnlyMap.spill. Maybe with 10 threads there was enough memory per 
task to do all of its hashing in memory. It's true, though, that these threads 
appear to be CPU-bound, largely due to Java serialization; you could get this 
to run quite a bit faster using Kryo. However, that won't eliminate the 
spilling here.
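
For reference, a minimal sketch of switching the serializer to Kryo (the 
master string, app name, and record class here are just placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local[48]")       // placeholder; use your actual master
      .setAppName("HashingJob")     // placeholder app name
      // Replace the default Java serializer with Kryo for faster
      // (and more compact) serialization during shuffles and spills.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

    val sc = new SparkContext(conf)

Registering your own classes with Kryo can shrink the serialized data 
further, which also reduces the volume spilled to disk.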

Matei

On Jul 14, 2014, at 1:02 PM, lokesh.gidra <lokesh.gi...@gmail.com> wrote:

> I am only playing with 'N' in local[N]. I thought that by increasing N, Spark
> would automatically use more parallel tasks. Isn't that so? Can you please
> tell me how I can modify the number of parallel tasks?
> 
> For me, there are hardly any threads in the BLOCKED state in the jstack
> output. In 'top' I see my application consuming all 48 cores all the time
> with N=48.
> 
> I am attaching two jstack outputs that I took while the application was
> running.
> 
> 
> Lokesh
> 
> lessoutput3.lessoutput3
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n9640/lessoutput3.lessoutput3>
>   
> lessoutput4.lessoutput4
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n9640/lessoutput4.lessoutput4>
>   
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Ideal-core-count-within-a-single-JVM-tp9566p9640.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
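
On the question above about controlling the number of parallel tasks: 
local[N] sets how many task slots run concurrently, while the number of tasks 
in a stage comes from the RDD's partition count. A minimal sketch of raising 
that count (the file path, app name, and task counts are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local[48]")
      .setAppName("ParallelismDemo")           // placeholder app name
      // Default number of tasks used by shuffle operations.
      .set("spark.default.parallelism", "96")

    val sc = new SparkContext(conf)
    val pairs = sc.textFile("input.txt").map(line => (line, 1))  // placeholder input
    // Shuffle operations also accept an explicit task count as a
    // second argument, overriding the default above:
    val counts = pairs.reduceByKey(_ + _, 192)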
