I don’t think the number of CPU cores controls the “number of parallel tasks”.
The number of tasks corresponds first and foremost to the number of (DStream)
RDD partitions (see the quick sketch below).
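
For example, here is a minimal spark-shell sketch (the RDD and the partition
count are made up) showing that the partition count is what determines how
many tasks a stage produces:

  // made-up example: an RDD with 8 partitions yields 8 tasks per stage
  val rdd = sc.parallelize(1 to 1000, 8)
  println(rdd.partitions.length)  // 8, so 8 tasks for any stage over this RDD
  // spark.executor.cores only caps how many of those 8 run at the same time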

 

The Spark documentation doesn’t say what is meant by “Task” in terms of
standard multithreading terminology, i.e. a thread or a process, so your point
is a good one.

 

PS: Time and time again, every product, dev team, and company invents its own
terminology, so half the time spent using a product goes into deciphering it
and reinventing the wheel.

 

From: Mulugeta Mammo [mailto:mulugeta.abe...@gmail.com] 
Sent: Thursday, May 28, 2015 7:24 PM
To: Ruslan Dautkhanov
Cc: user
Subject: Re: Value for SPARK_EXECUTOR_CORES

 

Thanks for the valuable information. The blog states:

 

"The cores property controls the number of concurrent tasks an executor can 
run. --executor-cores 5 means that each executor can run a maximum of five 
tasks at the same time. " 
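
As a concrete sketch of how that property is typically set (Spark 1.x style;
the app name and values below are made up):

  import org.apache.spark.{SparkConf, SparkContext}

  // at most 5 concurrent tasks per executor; roughly --executor-cores 5
  val conf = new SparkConf()
    .setAppName("executor-cores-example")
    .set("spark.executor.cores", "5")
    .set("spark.executor.memory", "10g")
  val sc = new SparkContext(conf)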

 

So, I guess the max number of executor cores I can assign is the CPU count 
(which includes the number of threads per core), not just the number of 
physical cores. I just want to be sure about what Spark means by the term 
"cores".

 

Thanks

 

On Thu, May 28, 2015 at 11:16 AM, Ruslan Dautkhanov <dautkha...@gmail.com> 
wrote:

It's not only about cores. Keep in mind spark.executor.cores also affects the 
memory available for each task:

 

From 
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

 

The memory available to each task is (spark.executor.memory * 
spark.shuffle.memoryFraction * spark.shuffle.safetyFraction) / 
spark.executor.cores. Memory fraction and safety fraction default to 0.2 and 
0.8 respectively.
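
To make that concrete with made-up numbers (8 GB of executor memory, 4
executor cores, and the defaults for the two fractions):

  8 GB * 0.2 * 0.8 / 4 cores = 0.32 GB of shuffle memory per task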




I'd test spark.executor.cores with 2, 4, 8, and 16 and see what makes your job 
run faster.

 


-- 
Ruslan Dautkhanov

 

On Wed, May 27, 2015 at 6:46 PM, Mulugeta Mammo <mulugeta.abe...@gmail.com> 
wrote:

My executor has the following spec (lscpu):

 

CPU(s): 16

Core(s) per socket: 4

Socket(s): 2

Thread(s) per core: 2

 

The CPU count is obviously 4 * 2 * 2 = 16. My question is: what value is Spark 
expecting in SPARK_EXECUTOR_CORES, the CPU count (16) or the total number of 
physical cores (4 * 2 = 8)?

 

Thanks

 

 
