Forgot the link to the Wiki: https://wiki.fysik.dtu.dk/niflheim/SLURM

On 12/8/19 9:18 PM, Ole Holm Nielsen wrote:
Hi Dean,

You may want to look at the links on my Slurm Wiki page.  Both the official Slurm documentation and other resources are listed.  I think most of your requirements and questions are addressed in these pages.

My Wiki gives detailed deployment information for a CentOS 7 cluster, but much of this information should be relevant for Ubuntu as well.

/Ole


On 06-12-2019 22:57, Dean Schulze wrote:
I'm doing my first Slurm installation.  The SchedMD docs assume that I already have a cluster available that meets certain (unstated) requirements, but I don't.  I've found a couple of examples showing how to set up a cluster for Slurm on real hardware (nodes) with GPUs:

https://github.com/mknoxnv/ubuntu-slurm
https://github.com/nateGeorge/slurm_gpu_ubuntu

The requirements for a Slurm cluster seem to be:

   Passwordless SSH works between the Slurm controller and the Slurm nodes
   There is shared storage between all the nodes: /storage and /home (NFS)
   UIDs and GIDs are consistent across all the nodes (LDAP or other)
   Hostnames have to be FQDNs
   Slurm will be used to control SSH access to compute nodes
   Compute nodes are DNS-resolvable
   Compute nodes have GPUs and the latest CUDA drivers installed
   Time has to be synchronized across all nodes and the controller (NTP or FreeIPA)
   (If time isn't synchronized properly, the controller might not start)
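A rough way to sanity-check three of the items above (FQDN hostnames, consistent UIDs/GIDs, synchronized clocks) before installing anything is a small script like the one below. It is only a sketch: `NodeFacts`, `check_cluster` and the 1-second skew threshold are hypothetical names and values, not part of Slurm, and it assumes you have already collected each node's hostname, /etc/passwd text and clock reading by some other means (e.g. over SSH).

```python
"""Hypothetical pre-flight checks for a small Slurm test cluster.

Assumes each node's hostname, /etc/passwd text and clock reading were
already gathered (how is out of scope for this sketch).
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeFacts:
    # All fields are hypothetical inputs collected from one node.
    hostname: str
    passwd: str   # raw text of that node's /etc/passwd
    clock: float  # seconds since the epoch, read on that node


def is_fqdn(name: str) -> bool:
    """Rough FQDN test: at least two non-empty dot-separated labels."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


def parse_ids(passwd: str) -> dict[str, tuple[int, int]]:
    """Map user name -> (UID, GID) from /etc/passwd-format text."""
    ids = {}
    for line in passwd.splitlines():
        fields = line.split(":")
        if line.startswith("#") or len(fields) < 4:
            continue  # skip comments and malformed lines
        ids[fields[0]] = (int(fields[2]), int(fields[3]))
    return ids


def check_cluster(nodes: list[NodeFacts], max_skew: float = 1.0) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    for n in nodes:
        if not is_fqdn(n.hostname):
            problems.append(f"{n.hostname}: hostname is not an FQDN")

    # A user present on several nodes must have the same UID/GID everywhere;
    # each node is compared against the first node where the user was seen.
    seen: dict[str, tuple[tuple[int, int], str]] = {}
    for n in nodes:
        for user, ids in parse_ids(n.passwd).items():
            if user in seen and seen[user][0] != ids:
                problems.append(
                    f"{user}: {seen[user][0]} on {seen[user][1]} "
                    f"but {ids} on {n.hostname}")
            seen.setdefault(user, (ids, n.hostname))

    # Clock skew: spread between the earliest and latest readings.
    clocks = [n.clock for n in nodes]
    if clocks and max(clocks) - min(clocks) > max_skew:
        skew = max(clocks) - min(clocks)
        problems.append(f"clock skew {skew:.1f}s exceeds {max_skew}s")
    return problems


if __name__ == "__main__":
    a = NodeFacts("node1.cluster.local", "dean:x:1000:1000::/home/dean:/bin/bash", 100.0)
    b = NodeFacts("node2", "dean:x:1001:1000::/home/dean:/bin/bash", 100.4)
    for problem in check_cluster([a, b]):
        print(problem)
```

Clock readings collected one node at a time over SSH also include network and login latency, so a small measured skew is not necessarily real; checking each node's NTP status directly is the more reliable test.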


My questions are:

   Are the cluster requirements above correct and complete?

   Can I use virtual machines without GPUs for my nodes?
   (This is just to get started.  Eventually I'll have real hardware with GPUs for my nodes.)

   From the Ubuntu link on your download page, I've downloaded these files:

     slurmctld_18.08.6.2-1_amd64.deb      610.9 kB
     slurm-client_18.08.6.2-1_amd64.deb   887.7 kB
     slurm-wlm_18.08.6.2-1_amd64.deb      12.3 kB

   The slurmctld package would be installed on my controller, but what do I install on my nodes?  The slurm-wlm file is very small.  Would I install it on my nodes?  What is the client package for?

