Depending on the users who will be on this cluster, I'd probably adjust the 
partition to have a defined, non-infinite MaxTime, and perhaps a lower 
DefaultTime as well. Otherwise it would be very easy for someone to tie up 
all cores until the nodes are rebooted: all they have to do is submit a job 
with no explicit time limit, which then falls back to DefaultTime, and 
DefaultTime itself defaults to MaxTime (INFINITE here). 
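For example (a sketch only; the one-day MaxTime and four-hour DefaultTime 
below are illustrative, so pick limits that fit your workload): 

    # Cap runaway jobs: hard limit of 1 day, default of 4 hours for jobs
    # submitted without an explicit time limit.
    PartitionName=sl Nodes=slnode[1-8] Default=YES MaxTime=1-00:00:00 DefaultTime=04:00:00 State=UP

After an "scontrol reconfigure", "scontrol show partition sl" will confirm 
the effective limits, and jobs submitted without a time limit will be capped 
at four hours instead of running forever. 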

On 10/2/20, 7:37 AM, "slurm-users on behalf of John H" 
<slurm-users-boun...@lists.schedmd.com on behalf of j...@sdf.org> wrote:

    Hi All

    Hope you are all keeping well in these difficult times.

    I have set up a small Slurm cluster of 8 compute nodes (4 x 1-core CPUs, 
16GB RAM) without scheduling or accounting, as it isn't really needed.

    I'm just looking for confirmation it's configured correctly to allow the 
controller to 'see' all resources and allocate incoming jobs to the most 
readily available node in the cluster. I can see jobs are being delivered to 
different nodes, but I want to ensure I haven't inadvertently done anything 
to render it sub-optimal (even in such a simple use case!)

    Thanks very much for any assistance, here is my cfg:

    #
    # SLURM.CONF
    ControlMachine=slnode1
    BackupController=slnode2
    MpiDefault=none
    ProctrackType=proctrack/pgid
    ReturnToService=1
    SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
    SlurmctldPort=6817
    SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
    SlurmdPort=6818
    SlurmdSpoolDir=/var/spool/slurmd
    SlurmUser=slurm
    StateSaveLocation=/var/spool/slurm-llnl
    SwitchType=switch/none
    TaskPlugin=task/none
    #
    # TIMERS
    MinJobAge=86400
    #
    # SCHEDULING
    FastSchedule=1
    SchedulerType=sched/backfill
    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU_MEMORY
    #
    # LOGGING AND ACCOUNTING
    AccountingStorageType=accounting_storage/none
    ClusterName=cluster
    JobAcctGatherType=jobacct_gather/none
    SlurmctldDebug=3
    SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
    SlurmdDebug=3
    SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
    #
    # COMPUTE NODES
    NodeName=slnode[1-8] CPUs=4 Boards=1 SocketsPerBoard=4 CoresPerSocket=1 
ThreadsPerCore=1 RealMemory=16017
    PartitionName=sl Nodes=slnode[1-8] Default=YES MaxTime=INFINITE State=UP

    John
