Hi all,

We have 4 HPC nodes, with Spark installed individually on each node.
Spark runs in local mode (each driver/executor gets 8 cores and 65 GB) through sparklyr/PySpark from RStudio/Posit Workbench, with Slurm as the scheduler. Because local mode gives each job only a single executor, we are running into performance issues with large datasets.

Can I convert these 4 nodes into a Spark standalone cluster? We don't have Hadoop, so YARN mode is out of scope. Is it enough to follow the official documentation for setting up a standalone cluster? Will it work? Is there anything else I need to be aware of?
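For reference, this is roughly how I expect the application side to change once a standalone master is up. A minimal PySpark sketch; the hostname spark://node1:7077 and the resource sizes are placeholders for our environment, not tested values:

```python
from pyspark.sql import SparkSession

# Today each session uses .master("local[8]"); the plan is to point it
# at the standalone master instead. Hostname and sizes are placeholders.
spark = (
    SparkSession.builder
    .master("spark://node1:7077")            # standalone master URL
    .appName("standalone-test")
    .config("spark.executor.memory", "58g")  # leave headroom for OS and worker daemon
    .config("spark.executor.cores", "8")
    .config("spark.cores.max", "32")         # cap: 8 cores x 4 workers
    .getOrCreate()
)
print(spark.range(1_000_000).count())        # quick smoke test across the cluster
spark.stop()
```

Can you please share your thoughts?

Thanks,
Elango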