Inline below

On Tue, Nov 26, 2019 at 5:50 AM Loris Bennett <loris.benn...@fu-berlin.de> wrote:
>
> Hi Nigella,
>
> Nigella Sanders <nigella.sand...@gmail.com> writes:
>
> > Thank you all for such interesting replies.
> >
> > The --dependency option is quite useful, but in practice it has some
> > drawbacks. Firstly, all 20 jobs are queued at once, which some
> > users may interpret as an abuse of common resources.
>
> This doesn't seem a problem to me, since no common resources are being
> used by jobs in the queue. It only becomes a problem if a single person
> can queue enough jobs to consume all the resources *and* you are not using
> any form of fairshare. Otherwise a job submitted later, but with a higher
> priority, will start earlier if the resources become available.
>
> This is not to say that users won't *think* that a large number of jobs
> belonging to other users automatically means that later jobs will be
> disadvantaged. However, that is more an issue of educating your users.
>
> > Even worse, if a job fails, the remaining ones will stay queued forever (?),
> > the first tagged as "DependencyNeverSatisfied" and the rest
> > just as "Dependency".
>
> This is just a consequence of your requirement that "each job ... needs
> the previous one to be completed", but it also isn't a problem, because
> pending jobs don't consume resources for which users compete.

Also, using kill_invalid_depend in your slurm.conf's SchedulerParameters will
automatically remove the jobs from the queue once their dependency can't be
satisfied.

>
> Regards
>
> Loris
>
> > PS: Yarom, by queue time I meant the total run time allowed. In my case,
> > after a job starts running it will be killed if it takes more than 10 hours
> > of execution time. If the partition queue time limit were 10 days,
> > for instance, I guess I could use a single sbatch to launch a script
> > containing the 20 executions as steps with srun.
> >
> > Regards,
> > Nigella
> >
> > On Mon, 25 Nov 2019 at 15:08, Yair Yarom (<ir...@cs.huji.ac.il>)
> > wrote:
> >
> > Hi,
> >
> > I'm not sure what a queue time limit of 10 hours is. If you can't have jobs
> > waiting for more than 10 hours, then it seems to be very short for 8-hour
> > jobs.
> > Generally, a few options:
> > a. The --dependency option (either afterok or singleton)
> > b. The --array option of sbatch with a limit of 1 job at a time (instead of
> > the for loop): sbatch --array=1-20%1
> > c. At the end of the script of each job, call the sbatch line of the next
> > job (this is probably the only option if indeed I understood the queue time
> > limit correctly).
> >
> > And indeed, srun should probably be reserved for strictly interactive jobs.
> >
> > Regards,
> > Yair.
> >
> > On Mon, Nov 25, 2019 at 11:21 AM Nigella Sanders
> > <nigella.sand...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I guess this is a simple matter but I still find it confusing.
> >
> > I have to run 20 jobs on our supercomputer.
> > Each job takes about 8 hours and each one needs the previous one to be
> > completed.
> > The queue time limit for jobs is 10 hours.
> >
> > So my first approach was to launch them serially in a loop using srun:
> >
> > #!/bin/bash
> > for i in {1..20}; do
> >     srun --time 08:10:00 [options]
> > done
> >
> > However, the SLURM literature keeps saying that 'srun' should only be used for
> > short command line tests.
> > So some sysadmins would consider this bad
> > practice (see this).
> >
> > My second approach switched to sbatch:
> >
> > #!/bin/bash
> > for i in {1..20}; do
> >     sbatch --time 08:10:00 [options]
> >     [polling the queue to see if the job is done]
> > done
> >
> > But since sbatch returns to the prompt immediately, I had to add code to
> > check for job termination. Polling makes use of the sleep command and is
> > prone to race conditions, so sysadmins don't like it either.
> >
> > I guess there must be a --wait option in some recent versions of SLURM
> > (see this). Not yet available on our system though.
> >
> > Is there any preferable/canonical/friendly way to do this?
> > Any thoughts would be really appreciated,
> >
> > Regards,
> > Nigella.
> >
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
>
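
For anyone finding this thread later, option (a) from Yair's list could look
roughly like the sketch below. It is only an illustration under some
assumptions: submit_chain and job.sh are made-up names, the 08:10:00 limit is
taken from Nigella's loop, and it relies on sbatch's --parsable flag to get
the job ID back. Adjust the options to your site.

```shell
#!/bin/bash
# Sketch of option (a): chain N jobs with --dependency=afterok.
# submit_chain and job.sh are hypothetical names, not from the thread.

# The body runs in a subshell "( ... )" so its variables stay local.
submit_chain() (
    n=$1        # number of jobs in the chain
    script=$2   # batch script to submit for every step
    prev=""     # job ID of the previously submitted step
    i=1
    while [ "$i" -le "$n" ]; do
        if [ -z "$prev" ]; then
            # First job: no dependency.
            out=$(sbatch --parsable --time=08:10:00 "$script") || exit 1
        else
            # Later jobs: start only if the previous one exited with code 0.
            out=$(sbatch --parsable --time=08:10:00 \
                         --dependency=afterok:"$prev" "$script") || exit 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep only the ID.
        prev=${out%%;*}
        echo "step $i: job $prev"
        i=$((i + 1))
    done
)

# Usage on a login node: submit_chain 20 job.sh
```

All 20 jobs are queued immediately, but only one is ever eligible to run. If
kill_invalid_depend is set in SchedulerParameters, a failed step should cause
the steps that depend on it to be removed from the queue instead of pending
forever.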