Hi,

Can these 4 nodes talk to each other over ssh as trusted hosts, i.e.
without a password prompt (on top of the network that Sean already
mentioned)? If not, you need to set that up. You can set up a dedicated LAN
if you have another free port at the back of your HPC nodes. They should
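
A quick way to verify the trusted-host set-up is to probe each node with ssh in batch mode, which fails instead of prompting for a password. This is only a minimal sketch: the host names node1..node4 are placeholders I made up, and it assumes the OpenSSH client is on the PATH. Run it from each node in turn.

```python
import subprocess


def ssh_check_command(host, timeout=5):
    """Build an ssh command that fails rather than prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never ask for a password
        "-o", f"ConnectTimeout={timeout}",    # give up quickly on dead hosts
        host,
        "true",                               # trivial remote command
    ]


def can_ssh(host, timeout=5):
    """Return True if passwordless ssh to `host` succeeds."""
    try:
        result = subprocess.run(
            ssh_check_command(host, timeout),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh client missing, or the connection hung
        return False
    return result.returncode == 0


def check_nodes(hosts, probe=can_ssh):
    """Map each host name to whether the probe succeeded."""
    return {host: probe(host) for host in hosts}


# Hypothetical usage, with placeholder host names:
# print(check_nodes(["node1", "node2", "node3", "node4"]))
```

Any host that comes back False still needs its public key exchanged before the cluster start-up scripts can reach it.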

You should be able to set up a Hadoop cluster pretty easily. Check this old
article of mine for the Hadoop set-up.

https://www.linkedin.com/pulse/diy-festive-season-how-install-configure-big-data-so-mich/?trackingId=z7n5tx7tQOGK9tcG9VClkw%3D%3D

Hadoop will provide you with a common storage layer (HDFS) that all these
nodes can share. YARN is your best bet as the resource manager, given the
reasonably powerful hosts you have. However, for now the Standalone mode
will do. Make sure that the database backing the Hive Metastore (by default
it is an embedded Derby database :( ) is something respectable like
Postgres that can handle multiple concurrent Spark jobs.
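
To make the last two points concrete, here is a rough sketch of the Spark settings involved, assuming a Standalone master on its default port 7077 and a Postgres database for the Hive Metastore. The host names, database name and credentials are placeholders of mine, the function name is hypothetical, and the Postgres JDBC driver jar would still need to be on Spark's classpath.

```python
def build_spark_conf(master_host, pg_host, pg_db, pg_user, pg_password,
                     master_port=7077, pg_port=5432):
    """Return Spark settings for a Standalone master plus a Postgres-backed metastore.

    All argument values are supplied by the caller; nothing here is
    specific to any particular cluster.
    """
    jdo = "spark.hadoop.javax.jdo.option."  # Hive metastore connection properties
    return {
        # Point the driver at the Standalone master instead of local[*]
        "spark.master": f"spark://{master_host}:{master_port}",
        # Use the Hive catalog rather than the in-memory one
        "spark.sql.catalogImplementation": "hive",
        # Replace the embedded Derby database with Postgres
        jdo + "ConnectionURL": f"jdbc:postgresql://{pg_host}:{pg_port}/{pg_db}",
        jdo + "ConnectionDriverName": "org.postgresql.Driver",
        jdo + "ConnectionUserName": pg_user,
        jdo + "ConnectionPassword": pg_password,
    }


# Hypothetical usage with pyspark (placeholder values):
# from pyspark.sql import SparkSession
# conf = build_spark_conf("node1", "node1", "metastore", "hive", "secret")
# builder = SparkSession.builder.appName("demo")
# for key, value in conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

The same keys can go into spark-defaults.conf instead, so that the Sparklyr and pyspark sessions pick them up without code changes.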

HTH


Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 15 Sept 2023 at 07:04, Ilango <elango...@gmail.com> wrote:

>
> Hi all,
>
> We have 4 HPC nodes and installed spark individually in all nodes.
>
> Spark is used as local mode(each driver/executor will have 8 cores and 65
> GB) in Sparklyr/pyspark using Rstudio/Posit workbench. Slurm is used as
> scheduler.
>
> As this is local mode, we are facing performance issue(as only one
> executor) when it comes dealing with large datasets.
>
> Can I convert this 4 nodes into spark standalone cluster. We dont have
> hadoop so yarn mode is out of scope.
>
> Shall I follow the official documentation for setting up standalone
> cluster. Will it work? Do I need to aware anything else?
> Can you please share your thoughts?
>
> Thanks,
> Elango
>
