Hi there,
We use something like this (where sparkContext is a JavaSparkContext):

/*
 * Force Spark to initialise defaultParallelism by running a dummy parallel
 * operation, then return the resulting defaultParallelism.
 */
private int getWorkerCount(JavaSparkContext sparkContext) {
    sparkContext.parallelize(List.of(1, 2, 3, 4)).collect();
    return sparkContext.defaultParallelism();
}
It's useful for setting certain pool sizes dynamically, such as:

sparkContext.hadoopConfiguration().set("fs.s3a.connection.maximum",
        Integer.toString(workerCount * 2));
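
For context, here is roughly how that hangs together at startup. This is only a sketch: the class name, the main() wrapper and building the JavaSparkContext from a SparkSession are illustrative rather than exactly what we run.

import java.util.List;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

// Illustrative sketch only: size the S3A connection pool from the observed
// parallelism before the job touches S3.
public class WorkerCountExample {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("worker-count-example")
                .getOrCreate();
        JavaSparkContext sparkContext = new JavaSparkContext(spark.sparkContext());

        int workerCount = getWorkerCount(sparkContext);

        // Scale the S3A connection pool with the cluster's parallelism.
        sparkContext.hadoopConfiguration().set("fs.s3a.connection.maximum",
                Integer.toString(workerCount * 2));

        // ... run the actual job here ...

        spark.stop();
    }

    // Force Spark to initialise defaultParallelism by running a dummy
    // parallel operation, then return the resulting defaultParallelism.
    private static int getWorkerCount(JavaSparkContext sparkContext) {
        sparkContext.parallelize(List.of(1, 2, 3, 4)).collect();
        return sparkContext.defaultParallelism();
    }
}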
This works in our Spark 3.0.1 code; we're just migrating to 3.2.1 now.
Cheers,
Steve C
On 8 Jun 2022, at 4:28 pm, Poorna Murali <[email protected]> wrote:
Hi,
I would like to know if it is possible to get the count of live master and
worker Spark nodes running in a system.
Please help to clarify the same.
Thanks & Regards,
Poorna