Re: [slurm-users] How to deal with jobs that need to be restarted several time

2019-03-13 Thread Selch, Brigitte (FIDF)
Hello,

Yeah, that's it. I can use salloc instead of sbatch. The user can test and run the job within this interactive Slurm allocation.

Thank you,
Brigitte Selch

-----Original Message-----
From: slurm-users On Behalf Of Renfro, Michael
Sent: Tuesday, 12 March 2019 15:33
To:
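A minimal sketch of that workflow, assuming a bash shell. `salloc` grants the allocation once and starts a shell with `SLURM_JOB_ID` set, and each retry is launched as a job step with `srun` inside that shell, so the resources stay held between attempts. The helper name `run_attempt`, the script `./job.sh`, and the `--ntasks`/`--time` values are hypothetical placeholders, not anything from the thread.

```shell
#!/usr/bin/env bash
# Typical session (typed interactively, not run as a script):
#   salloc --ntasks=4 --time=04:00:00   # request resources once (example values)
#   run_attempt ./job.sh                # first try fails; user edits inputs
#   run_attempt ./job.sh                # retry on the same allocation
#   exit                                # leave the salloc shell, releasing resources

# Hypothetical helper: launch one attempt as a job step inside the current
# allocation. salloc exports SLURM_JOB_ID to the shell it starts, so an empty
# value means we are not inside an allocation.
run_attempt () {
  if [ -z "${SLURM_JOB_ID:-}" ]; then
    echo "run_attempt: no Slurm allocation found; start one with salloc first" >&2
    return 1
  fi
  # Pass all arguments through unchanged, preserving spaces within each one.
  srun "$@"
}
```

The environment check only guards against accidentally calling `srun` outside an allocation, where it would queue a brand-new job instead of reusing the held resources.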

Re: [slurm-users] How to deal with jobs that need to be restarted several time

2019-03-12 Thread Renfro, Michael
If the failures happen right after the job starts (or close enough), I’d use an interactive session with srun (or some other wrapper that calls srun, such as fisbatch). Our hpcshell wrapper for srun is just a bash function:

hpcshell () {
  srun --partition=interactive "$@" --pty bash -i
}

[slurm-users] How to deal with jobs that need to be restarted several time

2019-03-12 Thread Selch, Brigitte (FIDF)
Hello,

Some jobs have to be restarted several times before they run successfully. Users start the job, it fails, they make some changes, they start the job again, it fails again ... and so on. So they want to keep the resources until the job runs properly. Is there a possibility to 'inherit'