I'm doing my first Slurm installation. The SchedMD docs assume that I already have a cluster that meets certain (unstated) requirements, but I don't. I've found a couple of examples showing how to set up a cluster for Slurm using real hardware (nodes) with GPUs:
https://github.com/mknoxnv/ubuntu-slurm
https://github.com/nateGeorge/slurm_gpu_ubuntu

The requirements for a cluster for Slurm seem to be:

- Passwordless SSH is working between the Slurm controller and the Slurm nodes.
- There is shared storage between all the nodes: /storage & /home (NFS).
- The UIDs and GIDs will be consistent between all the nodes (LDAP or other).
- Hostnames have to be a FQDN.
- Slurm will be used to control SSH access to compute nodes.
- Compute nodes are DNS resolvable.
- Compute nodes have GPUs and the latest CUDA drivers installed.
- Time has to be synchronized across all nodes and the controller (ntp or freeipa). (If time isn't synced properly the controller might not start.)

My questions are:

1. Are the cluster requirements above correct and complete?

2. Can I use virtual machines without GPUs for my nodes? (This is just to get started. Eventually I'll have real hardware with GPUs for my nodes.)

3. From the Ubuntu link on your download page I've downloaded these files:

   slurmctld_18.08.6.2-1_amd64.deb     610.9 kB
   slurm-client_18.08.6.2-1_amd64.deb  887.7 kB
   slurm-wlm_18.08.6.2-1_amd64.deb     12.3 kB

   The slurmctld would be installed on my controller, but what do I install on my nodes? The slurm-wlm file is very small. Would I install it on my nodes? What is the client for?

Thank you.
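
P.S. To make two of the items in the list above concrete (FQDN hostnames and consistent UIDs), here is a minimal sketch of how I would check them. It is only an illustration under my own assumptions, not anything from the Slurm docs: the function names `is_fqdn` and `uid_mismatches` are made up, the "at least two labels" rule is my reading of what FQDN means here, and the per-node user tables are assumed to have been collected already (for example from `getent passwd` on each node).

```python
from __future__ import annotations

import re

# One DNS label: letters, digits and hyphens, 1-63 characters,
# not starting or ending with a hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_fqdn(hostname: str) -> bool:
    """Return True if hostname looks fully qualified.

    Assumption: "fully qualified" means at least two valid
    dot-separated labels, e.g. node1.example.com but not node1.
    """
    name = hostname.rstrip(".")  # tolerate a trailing root dot
    labels = name.split(".")
    return (
        len(name) <= 253
        and len(labels) >= 2
        and all(_LABEL.fullmatch(label) for label in labels)
    )


def uid_mismatches(
    users_by_node: dict[str, dict[str, int]],
) -> dict[str, dict[str, int]]:
    """Return users whose UID differs between nodes.

    users_by_node maps a node name to {username: uid}.
    The result maps each inconsistent username to {node: uid}.
    """
    seen: dict[str, dict[str, int]] = {}
    for node, users in users_by_node.items():
        for user, uid in users.items():
            seen.setdefault(user, {})[node] = uid
    return {
        user: by_node
        for user, by_node in seen.items()
        if len(set(by_node.values())) > 1
    }
```

On a real node I would call something like `is_fqdn(socket.getfqdn())`, and feed `uid_mismatches` one table per host; the same idea would apply to GIDs.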