Dear all, I am new to this list. I am responsible for several small clusters at our chair.
I set up Slurm 21.08.8-2 on a small cluster (CentOS 7) with 8 nodes:

NodeName=node0[1-8] CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=1

One colleague has to run 20,000 jobs on this machine. Every job starts its program with mpirun on 12 cores. With the standard Slurm behavior, the node running such a job is blocked entirely, so 28 cores sit idle. Since the cluster has only 8 nodes, only 8 jobs can run in parallel.

To work around this, I am trying to start several subtasks with srun inside one batch job (without mpirun for now):

#!/bin/bash
#SBATCH --job-name=test_multi_prog_srun
#SBATCH --nodes=1
#SBATCH --partition=short
#SBATCH --time=02:00:00
#SBATCH --exclusive

srun -vvv --exact -n1 -c1 sleep 20 > srun1.log 2>&1 &
srun -vvv --exact -n1 -c1 sleep 30 > srun2.log 2>&1 &
wait

However, only one task runs; the second waits for the first to complete before it starts. Can someone explain what I am doing wrong?

Thanks in advance,
Regards,
Guillaume

# slurm.conf file
MpiDefault=none
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmUser=root
SwitchType=switch/none
TaskPlugin=task/none
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
AccountingStorageEnforce=limits
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreFlags=job_comment
JobAcctGatherFrequency=30
SlurmctldDebug=error
SlurmdDebug=error
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdLogFile=/var/log/slurmd.log

NodeName=node0[1-8] CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=1 State=UNKNOWN
PartitionName=short Nodes=node[01-08] Default=NO MaxTime=0-02:00:00 State=UP DefaultTime=00:00:00 MinNodes=1 PriorityTier=100
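P.S. To make the motivation concrete, here is a quick back-of-the-envelope calculation (just a sketch; the variable names are only illustrative): with 40 cores per node and 12 cores per job, three jobs fit on each node, so the 8-node cluster could in principle run 24 jobs concurrently instead of 8 once node sharing works.

```shell
#!/bin/bash
# Capacity estimate if nodes could be shared (numbers from the cluster above).
CORES_PER_NODE=40
NODES=8
CORES_PER_JOB=12

# Integer division: how many 12-core jobs fit on one 40-core node.
JOBS_PER_NODE=$(( CORES_PER_NODE / CORES_PER_JOB ))   # 3
TOTAL_JOBS=$(( JOBS_PER_NODE * NODES ))               # 24

echo "Per node: ${JOBS_PER_NODE} jobs, cluster-wide: ${TOTAL_JOBS} jobs (vs. ${NODES} today)"
```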