You need to set up SSH without a password, using keys instead. See "How to connect without password using SSH (passwordless)":
<https://levelup.gitconnected.com/how-to-connect-without-password-using-ssh-passwordless-9b8963c828e8>
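A minimal sketch of that key setup, assuming OpenSSH on every node (the user and node names are yours to fill in):

```shell
# Key-based (passwordless) SSH -- run on the node you launch Spark from.
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
# ed25519 key with an empty passphrase (-N ""); -q keeps it quiet.
ssh-keygen -t ed25519 -N "" -q -f "$HOME/.ssh/id_ed25519"

# The public key must end up in ~/.ssh/authorized_keys on every other node,
# normally with:  ssh-copy-id user@node  (once per node),
# which amounts to the following on each target node:
cat "$HOME/.ssh/id_ed25519.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

Once the key is on every node, `ssh node hostname` should return without a password prompt, which is what Spark's standalone start scripts rely on.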
On Fri, 15 Sep 2023 at 20:55, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> Can these 4 nodes talk to each other through SSH as trusted hosts (on top
> of the network that Sean already mentioned)? Otherwise you need to set it
> up. You can install a LAN if you have another free port at the back of
> your HPC nodes.
>
> You should be able to set up a Hadoop cluster pretty easily. Check this
> old article of mine for the Hadoop set-up:
>
> https://www.linkedin.com/pulse/diy-festive-season-how-install-configure-big-data-so-mich/?trackingId=z7n5tx7tQOGK9tcG9VClkw%3D%3D
>
> Hadoop will provide you with a common storage layer (HDFS) that these
> nodes will be able to share. YARN is your best bet as the resource
> manager, given the reasonably powerful hosts you have. However, for now
> standalone mode will do. Make sure that the metastore you choose (by
> default Spark uses the embedded Hive metastore, Derby :( ) is something
> respectable like a Postgres DB that can handle multiple concurrent Spark
> jobs.
>
> HTH
>
> Mich Talebzadeh,
> Distinguished Technologist, Solutions Architect & Engineer
> London
> United Kingdom
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising
> from such loss, damage or destruction.
>
> On Fri, 15 Sept 2023 at 07:04, Ilango <elango...@gmail.com> wrote:
>
>> Hi all,
>>
>> We have 4 HPC nodes and have installed Spark individually on all of them.
>>
>> Spark is used in local mode (each driver/executor has 8 cores and 65 GB)
>> via sparklyr/PySpark from RStudio/Posit Workbench. Slurm is used as the
>> scheduler.
>> As this is local mode, we are facing performance issues (only one
>> executor) when dealing with large datasets.
>>
>> Can I convert these 4 nodes into a Spark standalone cluster? We don't
>> have Hadoop, so YARN mode is out of scope.
>>
>> Shall I follow the official documentation for setting up a standalone
>> cluster? Will it work? Is there anything else I need to be aware of?
>> Can you please share your thoughts?
>>
>> Thanks,
>> Elango

--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund, Norge
+47 480 94 297
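The standalone conversion asked about above can be sketched roughly as follows, assuming Spark 3.x unpacked at the same path on every node; the hostnames `hpc-node1`..`hpc-node4` and the `SPARK_HOME` default are placeholders, and the start commands are left as comments because they need a live Spark install and passwordless SSH between the nodes:

```shell
# Assumed install location -- adjust to where Spark actually lives on your nodes.
SPARK_HOME="${SPARK_HOME:-$HOME/spark}"
mkdir -p "$SPARK_HOME/conf"

# conf/workers lists one worker hostname per line; the start scripts SSH into
# each of them, which is why key-based SSH matters.
cat > "$SPARK_HOME/conf/workers" <<'EOF'
hpc-node1
hpc-node2
hpc-node3
hpc-node4
EOF

# With the workers file in place, bring the cluster up from the master node:
#   "$SPARK_HOME/sbin/start-master.sh"     # master RPC on 7077, web UI on 8080
#   "$SPARK_HOME/sbin/start-workers.sh"    # starts a worker on every listed host
# and point jobs at the cluster instead of local mode:
#   spark-submit --master spark://hpc-node1:7077 my_job.py
```

The same `spark://hpc-node1:7077` master URL goes into the sparklyr or PySpark session config, so the existing RStudio/Posit workflows only change their master setting.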
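On the metastore point from Mich's reply: Spark picks up the metastore backend from a `hive-site.xml` in its conf directory. A hedged sketch that swaps the embedded Derby database for Postgres; the host `pg-host`, database name `metastore`, and credentials are placeholders, and the Postgres JDBC driver jar must also be on Spark's classpath:

```shell
SPARK_HOME="${SPARK_HOME:-$HOME/spark}"
mkdir -p "$SPARK_HOME/conf"

# Point the metastore at Postgres instead of embedded Derby, so that
# several concurrent Spark jobs can share one metastore.
cat > "$SPARK_HOME/conf/hive-site.xml" <<'EOF'
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://pg-host:5432/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>change-me</value>
  </property>
</configuration>
EOF
```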