On 06/15/2022 02:48 PM, Tina Friedrich wrote:
> Hi Guillaume,

Hi Tina,
> in that example you wouldn't need the 'srun' to run more than one task,
> I think.

You are correct. To start a program like sleep I could simply run:

sleep 20s &
sleep 30s &
wait

However, my objective is to use mpirun in combination with srun, to avoid
having to define a rankfile manually.

> I'm not 100% sure, but to me it sounds like you're currently assigning
> whole nodes to jobs rather than cores (i.e. have
> 'SelectType=select/linear' and no OverSubscribe) and find that to be
> wasteful - is that correct?

In my first email I copied parts of my slurm.conf. I'm using

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

and, until now, no OverSubscribe. I tried to activate OverSubscribe=YES
on the partition with

PartitionName=short Nodes=node[01-08] Default=NO MaxTime=0-02:00:00
State=UP DefaultTime=00:00:00 MinNodes=1 PriorityTier=100 OverSubscribe=YES

but it did not solve the issue with

srun -vvv --exact -n1 -c1 sleep 20 > srun1.log 2>&1 &
srun -vvv --exact -n1 -c1 sleep 30 > srun2.log 2>&1 &
wait

> If it is, I'd say the more obvious solution to that would be to change
> the SelectType to either select/cons_res or select/cons_tres, so that
> cores (not nodes) are allocated to jobs?

How can I be sure that my Slurm is actually using the
SelectType=select/cons_res setting defined in my /etc/slurm/slurm.conf?

Thx a lot
Guillaume

> Tina
>
> On 15/06/2022 13:20, Guillaume De Nayer wrote:
>> Dear all,
>>
>> I'm new on this list. I am responsible for several small clusters at our
>> chair.
>>
>> I set up Slurm 21.08.8-2 on a small cluster (CentOS 7) with 8 nodes:
>>
>> NodeName=node0[1-8] CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=20
>> ThreadsPerCore=1
>>
>> One colleague has to run 20,000 jobs on this machine. Every job starts
>> its program with mpirun on 12 cores. The standard Slurm behavior is
>> that the node running such a job is blocked (and 28 cores sit idle).
>> The small cluster has only 8 nodes, so only 8 jobs can run in parallel.
>>
>> In order to solve this problem I'm trying to start some subtasks with
>> srun inside a batch job (without mpirun for now):
>>
>> #!/bin/bash
>> #SBATCH --job-name=test_multi_prog_srun
>> #SBATCH --nodes=1
>> #SBATCH --partition=short
>> #SBATCH --time=02:00:00
>> #SBATCH --exclusive
>>
>> srun -vvv --exact -n1 -c1 sleep 20 > srun1.log 2>&1 &
>> srun -vvv --exact -n1 -c1 sleep 30 > srun2.log 2>&1 &
>> wait
>>
>> However, only one task runs. The second waits for the completion of
>> the first task before starting.
>>
>> Can someone explain to me what I'm doing wrong?
>>
>> Thx in advance,
>> Regards,
>> Guillaume
>>
>> # slurm.conf file
>> MpiDefault=none
>> ProctrackType=proctrack/linuxproc
>> ReturnToService=1
>> SlurmUser=root
>> SwitchType=switch/none
>> TaskPlugin=task/none
>> SchedulerType=sched/backfill
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_Core_Memory
>> AccountingStorageEnforce=limits
>> AccountingStorageType=accounting_storage/slurmdbd
>> AccountingStoreFlags=job_comment
>> JobAcctGatherFrequency=30
>> SlurmctldDebug=error
>> SlurmdDebug=error
>> SlurmctldLogFile=/var/log/slurmctld.log
>> SlurmdLogFile=/var/log/slurmd.log
>>
>> NodeName=node0[1-8] CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=20
>> ThreadsPerCore=1 State=UNKNOWN
>> PartitionName=short Nodes=node[01-08] Default=NO MaxTime=0-02:00:00
>> State=UP DefaultTime=00:00:00 MinNodes=1 PriorityTier=100
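
PS: regarding my own open question above (how to confirm which select
plugin the controller actually loaded, rather than what slurm.conf says):
as far as I know, `scontrol show config` prints the live configuration of
the running slurmctld, so filtering its output should show it. A minimal
sketch (it assumes scontrol can reach the controller):

```shell
# Query the running slurmctld for its effective configuration and keep
# only the select-plugin lines:
scontrol show config | grep -E '^(SelectType|SelectTypeParameters)'
```

If this reports select/linear instead of select/cons_res, the daemons are
not running with the edited /etc/slurm/slurm.conf (e.g. they were not
reconfigured or restarted after the change).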