Hi all,

I guess this is a simple matter, but I still find it confusing.
I have to run 20 jobs on our supercomputer. Each job takes about 8 hours, and each one needs the previous one to be completed. The queue time limit for jobs is 10 hours.

My first approach was to launch them serially in a loop using srun:

    #!/bin/bash
    for i in {1..20}; do
        srun --time 08:10:00 [options]
    done

However, the SLURM literature keeps saying that srun should only be used for short command-line tests, so some sysadmins would consider this bad practice (see <https://stackoverflow.com/questions/43767866/slurm-srun-vs-sbatch-and-their-parameters>).

My second approach switched to sbatch:

    #!/bin/bash
    for i in {1..20}; do
        sbatch --time 08:10:00 [options]
        [poll the queue until the job is done]
    done

But since sbatch returns the prompt immediately, I had to add code to check for job termination (my current attempt at the polling is sketched in the P.S. below). The polling relies on the sleep command and is prone to race conditions, so sysadmins don't like it either. I gather there is an sbatch --wait option in recent versions of SLURM (see <https://bugs.schedmd.com/show_bug.cgi?id=1685>), but it is not yet available on our system.

Is there any preferable/canonical/friendly way to do this?

Any thoughts would be really appreciated.

Regards,
Nigella
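P.S. In case it helps to see it, here is roughly what my polling loop looks like right now. It is only a sketch: "my_job.sbatch" and the 30-second sleep are placeholders for my actual script and interval, and I parse the job id from sbatch's "Submitted batch job <id>" message because our SLURM predates the --parsable flag.

    #!/bin/bash
    for i in {1..20}; do
        # sbatch prints "Submitted batch job <id>"; grab the 4th field.
        jobid=$(sbatch --time 08:10:00 my_job.sbatch | awk '{print $4}')

        # Poll until the job no longer appears in the queue. This is the
        # racy part I mentioned: the job can vanish between checks, and a
        # crashed job looks the same as a successful one from here.
        while squeue -h -j "$jobid" 2>/dev/null | grep -q "$jobid"; do
            sleep 30
        done
    done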