Hi Gareth,

I think you solved the problem. In my slurm.conf no memory setting was configured (neither in the node definition nor in the partition). I changed that and also added "--mem-per-cpu 1" to the srun call. It seems to work. I will test it now with mpirun.
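For reference, a minimal sketch of the slurm.conf pieces involved. All node names, core counts, and memory values below are made up and must match the actual hardware; the key points are that nodes advertise RealMemory and that memory is made a consumable resource:

```
# slurm.conf sketch (hypothetical names/values)
# Track memory as a consumable resource alongside cores:
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# The node must advertise its memory so the scheduler can account for it:
NodeName=node[01-04] CPUs=40 RealMemory=192000 State=UNKNOWN

# Optionally set a default per-CPU memory, so a job that requests none
# no longer implicitly claims all memory on the node:
PartitionName=short Nodes=node[01-04] DefMemPerCPU=4000 MaxTime=02:00:00 State=UP
```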
Thx a lot for your help!
Regards
Guillaume

On 06/15/2022 11:20 PM, Williams, Gareth (IM&T, Black Mountain) wrote:
> I think the problem might be that you are not requesting memory, so by
> default, all memory on a node is allocated to the job and "cons_res" will not
> allocate a second job to any node. That comes up quite often.
>
> Gareth
>
> -----Original Message-----
> From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of
> Guillaume De Nayer
> Sent: Thursday, 16 June 2022 1:37 AM
> To: slurm-users@lists.schedmd.com
> Subject: Re: [slurm-users] Multiple Program Runs using srun in one Slurm
> batch Job on one node
>
> On 06/15/2022 05:25 PM, Ward Poelmans wrote:
>> Hi Guillaume,
>>
>> On 15/06/2022 16:59, Guillaume De Nayer wrote:
>>>
>>> Perhaps I misunderstand the Slurm documentation...
>>>
>>> I thought that the --exclusive option used in combination with
>>> sbatch will reserve the whole node (40 cores) for the job (submitted
>>> with sbatch). This part is working fine. I can check it with sacct.
>>>
>>> Then, this job starts subtasks on the reserved 40 cores with srun.
>>> Therefore I'm using "-n1 -c1" in combination with "srun". I thought
>>> that it was possible to use the reserved cores inside this job using srun.
>>
>> You're correct. --exclusive will give you all cores on the nodes but
>> only as much memory as requested.
>>
>>
>>> The following slightly modified job without --exclusive and with
>>> --ntasks=2 leads to a similar problem: only one srun is running at a
>>> time. The second starts directly after the first one has finished.
>>>
>>> #!/bin/bash
>>> #SBATCH --job-name=test_multi_prog_srun
>>> #SBATCH --ntasks=2
>>> #SBATCH --partition=short
>>> #SBATCH --time=02:00:00
>>>
>>> srun -vvv --exact -n1 -c1 sleep 20 > srun1.log 2>&1 &
>>> srun -vvv --exact -n1 -c1 sleep 30 > srun2.log 2>&1 &
>>> wait
>>
>> This should work... It works on our cluster. Are you sure they don't
>> run in parallel?
>>
>
> Yes, I'm pretty sure that it does not run in parallel: the command sacct shows
> me only one subtask "RUNNING". Then, when this subtask is marked as
> "COMPLETED", the second one appears and is marked "RUNNING".
>
> Moreover, if I directly connect to the node, only one "sleep" process
> is running.
>
> OK. If it works on your cluster, I perhaps have a problem in my slurm config.
> Which version of slurm are you using on your cluster? And can you share your
> slurm.conf?
>
>> We usually recommend to use GNU parallel or xargs like:
>>
>> xargs -P $SLURM_NTASKS srun -N 1 -n 1 -c 1 --exact sleep 30
>>
>
> OK. I will install "GNU parallel" and also test your xargs command.
>
> Thx a lot!
> Guillaume
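A runnable sketch of the xargs pattern Ward suggests (task count and commands are illustrative). So it can run anywhere, the inner srun is replaced here by a plain sh -c command; inside a Slurm allocation you would put srun back, as noted in the trailing comment:

```shell
#!/bin/bash
# Sketch of the xargs approach from the thread. Each input line from seq
# becomes one task; -P caps how many run concurrently, and -I{} substitutes
# the task number into the command.
NTASKS="${SLURM_NTASKS:-2}"

seq 1 "$NTASKS" \
  | xargs -P "$NTASKS" -I{} \
      sh -c 'echo "task {} start"; sleep 1; echo "task {} done"'

# Inside a Slurm job, the inner command would instead be something like:
#   srun -N 1 -n 1 -c 1 --exact --mem-per-cpu=100M my_program
```

With -P 2 both sleeps overlap, so the whole pipeline finishes in roughly one second rather than two, which is the behaviour the two backgrounded srun calls were meant to achieve.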