I see, so here might be the problem. With more cores, there's less memory available per core, and now many of your threads are doing external hashing (spilling data to disk), as evidenced by the calls to ExternalAppendOnlyMap.spill. Maybe with 10 threads there was enough memory per task to do all of its hashing in memory. It's true, though, that these threads appear to be CPU-bound, largely due to Java serialization. You could get this to run quite a bit faster using Kryo. However, that won't eliminate the spilling issue here.
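For reference, switching to Kryo in Spark of that era is mostly two configuration properties, plus an optional registrator for compactness; a minimal sketch (MyRecord and MyRegistrator are hypothetical names for illustration, not from this thread):

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical application record type, for illustration only.
case class MyRecord(key: String, value: Long)

// Registering classes lets Kryo write small integer IDs instead of
// full class names in the serialized stream, shrinking shuffle data.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyRecord])
  }
}

val conf = new SparkConf()
  .setMaster("local[48]")
  .setAppName("KryoExample")
  // Replace the default Java serializer with Kryo.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyRegistrator")

val sc = new SparkContext(conf)
```

This is a configuration sketch rather than a runnable program on its own, since it needs a Spark installation; the same two properties can also go in conf/spark-defaults.conf.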
Matei

On Jul 14, 2014, at 1:02 PM, lokesh.gidra <lokesh.gi...@gmail.com> wrote:

> I am only playing with 'N' in local[N]. I thought that by increasing N, Spark
> will automatically use more parallel tasks. Isn't it so? Can you please tell
> me how I can modify the number of parallel tasks?
>
> For me, there are hardly any threads in BLOCKED state in the jstack output. In
> 'top' I see my application consuming all 48 cores all the time with N=48.
>
> I am attaching two jstack outputs that I took while the application was
> running.
>
> Lokesh
>
> lessoutput3.lessoutput3
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n9640/lessoutput3.lessoutput3>
> lessoutput4.lessoutput4
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n9640/lessoutput4.lessoutput4>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Ideal-core-count-within-a-single-JVM-tp9566p9640.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.