Depending on the users who will be on this cluster, I'd probably adjust the partition to have a defined, non-infinite MaxTime, and maybe a lower DefaultTime. Otherwise, it would be very easy for someone to start a job that holds all cores until the nodes get rebooted: all they have to do is submit a job with no explicit time limit. That job would use DefaultTime, which itself defaults to MaxTime.
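Something along these lines is what I have in mind. The one-hour default and seven-day maximum are placeholder values I picked for illustration, not a recommendation; choose limits that fit your workloads. The rest of the line is copied from your existing partition definition.

```
# Illustrative limits only (assumed values, adjust to your workloads).
# Jobs submitted without a time limit get DefaultTime;
# no job may request more than MaxTime.
PartitionName=sl Nodes=slnode[1-8] Default=YES DefaultTime=01:00:00 MaxTime=7-00:00:00 State=UP
```

Users who need longer than the default can still ask for it at submission time, for example with sbatch --time=2-00:00:00, up to MaxTime. After editing slurm.conf, running scontrol reconfigure should be enough to pick up the new partition limits.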
On 10/2/20, 7:37 AM, "slurm-users on behalf of John H" <slurm-users-boun...@lists.schedmd.com on behalf of j...@sdf.org> wrote:

Hi All

Hope you are all keeping well in these difficult times.

I have set up a small Slurm cluster of 8 compute nodes (4 x 1-core CPUs, 16GB RAM) without scheduling or accounting as it isn't really needed.

I'm just looking for confirmation it's configured correctly to allow the controller to 'see' all resources and allocate incoming jobs to the most readily available node in the cluster. I can see jobs are being delivered to different nodes but want to ensure I haven't inadvertently done anything to render it suboptimal (even in such a simple use case!)

Thanks very much for any assistance, here is my cfg:

#
# SLURM.CONF
ControlMachine=slnode1
BackupController=slnode2
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm-llnl
SwitchType=switch/none
TaskPlugin=task/none
#
# TIMERS
MinJobAge=86400
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_MEMORY
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#
# COMPUTE NODES
NodeName=slnode[1-8] CPUs=4 Boards=1 SocketsPerBoard=4 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=16017
PartitionName=sl Nodes=slnode[1-8] Default=YES MaxTime=INFINITE State=UP

John