Hi all,

We have 4 HPC nodes, with Spark installed individually on each node.

Spark runs in local mode (each driver/executor gets 8 cores and 65
GB) through sparklyr/pyspark in RStudio/Posit Workbench. Slurm is used
as the scheduler.
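
For reference, each per-user session is created roughly like this in
pyspark (the app name is just an example):

    from pyspark.sql import SparkSession

    # Local mode: driver and executors run inside one JVM on a single node
    spark = (SparkSession.builder
             .master("local[8]")                    # 8 cores, one node
             .config("spark.driver.memory", "65g")  # all 65 GB in one process
             .appName("local-analysis")             # example name
             .getOrCreate())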

Since this is local mode, we are facing performance issues (there is
only one executor) when dealing with large datasets.

Can I convert these 4 nodes into a Spark standalone cluster? We don't
have Hadoop, so YARN mode is out of scope.

Shall I follow the official documentation for setting up a standalone
cluster? Will it work? Is there anything else I need to be aware of?
Can you please share your thoughts?

Thanks,
Elango
