You need to set up SSH without a password; use a key instead. See: How to
connect without password using SSH (passwordless)
<https://levelup.gitconnected.com/how-to-connect-without-password-using-ssh-passwordless-9b8963c828e8>
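The article linked above comes down to generating a key pair and pushing the public key to every node. A minimal sketch (node1..node4 are placeholder hostnames for your HPC nodes, not anything from the thread):

```shell
# Make sure ~/.ssh exists with the right permissions.
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"

# Generate an ed25519 key pair once (skipped if one already exists).
# -N "" means no passphrase; use ssh-agent if you prefer a passphrase.
[ -f "$HOME/.ssh/id_ed25519" ] || \
    ssh-keygen -t ed25519 -f "$HOME/.ssh/id_ed25519" -N "" -q

# This public key is what ends up in each node's ~/.ssh/authorized_keys.
cat "$HOME/.ssh/id_ed25519.pub"

# Push the key to the other nodes (placeholder hostnames), then test:
# for host in node1 node2 node3 node4; do
#     ssh-copy-id -i "$HOME/.ssh/id_ed25519.pub" "$host"
# done
# ssh node1 hostname   # should no longer prompt for a password
```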

On Fri, 15 Sep 2023 at 20:55, Mich Talebzadeh <
mich.talebza...@gmail.com> wrote:

> Hi,
>
> Can these 4 nodes talk to each other through SSH as trusted hosts (on top
> of the network that Sean already mentioned)? Otherwise you need to set that
> up. You can set up a LAN if you have another free port at the back of your
> HPC nodes.
>
> You ought to be able to set up a Hadoop cluster pretty easily. Check this
> old article of mine for the Hadoop set-up.
>
>
> https://www.linkedin.com/pulse/diy-festive-season-how-install-configure-big-data-so-mich/?trackingId=z7n5tx7tQOGK9tcG9VClkw%3D%3D
>
> Hadoop will provide you with a common storage layer (HDFS) that these
> nodes will be able to share. YARN is your best bet as the resource manager,
> given the reasonably powerful hosts you have. However, for now standalone
> mode will do. Make sure that the metastore you choose (by default Hive
> uses an embedded metastore called Derby :( ) is something respectable like
> a Postgres DB that can handle multiple concurrent Spark jobs.
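>
> To point the metastore at Postgres rather than the embedded Derby, one
> option is a hive-site.xml in $SPARK_HOME/conf along these lines (the JDBC
> URL, user name and password below are placeholders, not a recommendation):
>
> ```xml
> <configuration>
>   <property>
>     <name>javax.jdo.option.ConnectionURL</name>
>     <value>jdbc:postgresql://metastore-host:5432/metastore</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionDriverName</name>
>     <value>org.postgresql.Driver</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionUserName</name>
>     <value>hiveuser</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionPassword</name>
>     <value>changeme</value>
>   </property>
> </configuration>
> ```
>
> The Postgres JDBC driver jar also needs to be on Spark's classpath.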
>
> HTH
>
>
> Mich Talebzadeh,
> Distinguished Technologist, Solutions Architect & Engineer
> London
> United Kingdom
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 15 Sept 2023 at 07:04, Ilango <elango...@gmail.com> wrote:
>
>>
>> Hi all,
>>
>> We have 4 HPC nodes and have installed Spark individually on all nodes.
>>
>> Spark is used in local mode (each driver/executor has 8 cores and 65
>> GB) from sparklyr/PySpark using RStudio/Posit Workbench. Slurm is used as
>> the scheduler.
>>
>> As this is local mode, we are facing performance issues (there is only
>> one executor) when it comes to dealing with large datasets.
>>
>> Can I convert these 4 nodes into a Spark standalone cluster? We don't
>> have Hadoop, so YARN mode is out of scope.
>>
>> Shall I follow the official documentation for setting up a standalone
>> cluster? Will it work? Do I need to be aware of anything else?
>> Can you please share your thoughts?
>>
>> Thanks,
>> Elango
>>
>
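
On the standalone question quoted above: the official standalone docs should
be enough. A minimal run-book sketch, assuming Spark is unpacked at the same
$SPARK_HOME on all four nodes and passwordless SSH is already working
(node1..node4 are placeholder hostnames, with node1 as the master):

```
# conf/workers on the master lists one worker hostname per line:
# node1
# node2
# node3
# node4

# Either start everything from the master:
# $SPARK_HOME/sbin/start-all.sh      # starts the master and, over SSH, all workers
# ...or start the daemons by hand:
# $SPARK_HOME/sbin/start-master.sh                       # on node1 (web UI :8080, cluster port :7077)
# $SPARK_HOME/sbin/start-worker.sh spark://node1:7077    # on each worker node

# Then point jobs at the cluster instead of local mode:
# spark-submit --master spark://node1:7077 my_job.py
```

One caveat: since Slurm already schedules these nodes, a long-running
standalone cluster will compete with Slurm jobs for the same cores unless
the nodes are reserved for Spark.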

-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297
