When does SparkContext.defaultParallelism have the correct value?

Stephen Coy Mon, 06 Jul 2020 20:35:39 -0700

Hi there,

I have found that if I invoke


sparkContext.defaultParallelism()

too early, it does not return the correct value.

For example, if I write this:

final JavaSparkContext sparkContext = new JavaSparkContext(sparkSession.sparkContext());
final int workerCount = sparkContext.defaultParallelism();

I will get some small number (which I can’t recall right now).

However, if I insert:

sparkContext.parallelize(List.of(1, 2, 3, 4)).collect();

between these two lines, I get the expected value, which is something like 
node_count * node_core_count.

This seems like a hacky workaround to me. Is there a better way to 
get this value initialised properly?
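
One possible alternative to running a throwaway job is to poll the value until it reaches a target. This is only a sketch. It assumes the early low value is caused by executors not having registered with the driver yet, which is not established in this thread, and it assumes the expected core count is known up front. ParallelismWaiter and awaitAtLeast are hypothetical names, not part of Spark.

```java
import java.util.function.IntSupplier;

// Hypothetical helper (not a Spark API): polls a value until it reaches a target.
final class ParallelismWaiter {
    private ParallelismWaiter() {}

    /**
     * Polls `current` until it returns at least `expected` or `timeoutMillis`
     * has elapsed. Returns the last value observed, which may still be below
     * `expected` if the timeout was hit or the thread was interrupted.
     */
    static int awaitAtLeast(IntSupplier current, int expected,
                            long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        int observed = current.getAsInt();
        while (observed < expected && System.nanoTime() < deadline) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                break;
            }
            observed = current.getAsInt();
        }
        return observed;
    }
}
```

With the context from the example above, this could be called as ParallelismWaiter.awaitAtLeast(sparkContext::defaultParallelism, expectedCores, 30_000L, 200L), where expectedCores is a value supplied by the caller.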

FWIW, I need this value to size a connection pool (fs.s3a.connection.maximum) 
correctly in a cluster-independent way.
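
For illustration, the node_count * node_core_count figure could also be derived from configuration rather than from defaultParallelism. The sketch below assumes static allocation with both spark.executor.instances and spark.executor.cores set explicitly; when either is missing (for example under dynamic allocation) it returns a caller-supplied fallback. A plain Map stands in for the Spark configuration here, and PoolSizing and expectedCores are hypothetical names.

```java
import java.util.Map;

// Hypothetical helper (not a Spark API): estimates total executor cores from settings.
final class PoolSizing {
    private PoolSizing() {}

    /**
     * Returns executor count * cores per executor when both settings are
     * present, otherwise `fallback`. The Map is a stand-in for the real
     * Spark configuration object.
     */
    static int expectedCores(Map<String, String> conf, int fallback) {
        String instances = conf.get("spark.executor.instances");
        String cores = conf.get("spark.executor.cores");
        if (instances == null || cores == null) {
            return fallback; // assumption: the caller picks a sensible default
        }
        // multiplyExact fails loudly on overflow instead of wrapping around.
        return Math.multiplyExact(
                Integer.parseInt(instances.trim()),
                Integer.parseInt(cores.trim()));
    }
}
```

The result could then be used as the value for fs.s3a.connection.maximum, on the assumption that one connection per core is the sizing that is wanted.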

Thanks,

Steve C


