To: Slurm User Community List
Subject: Re: [slurm-users] How to deal with jobs that need to be restarted
several time
If the failures happen right after the job starts (or close enough), I’d use an
interactive session with srun (or some other wrapper that calls srun, such as
fisbatch).
Our hpcshell wrapper for srun is just a bash function:
hpcshell ()
{
    # Quote "$@" so extra srun options containing spaces are passed intact.
    srun --partition=interactive "$@" --pty bash -i
}
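For illustration, a session with that wrapper might look roughly like the sketch below. The partition name is the one from the function above; the --nodes and --time values and the script name my_job.sh are only assumptions for the example, so adjust them to whatever your site allows.

```shell
# Sketch only: the wrapper again, so this snippet can be sourced on its own.
hpcshell ()
{
    # Extra arguments are forwarded to srun ahead of --pty.
    srun --partition=interactive "$@" --pty bash -i
}

# Hypothetical usage on a login node (values are placeholders):
#   hpcshell --nodes=1 --time=02:00:00
#
# Inside the resulting shell you are already on the allocated node, so the
# job can be started, fixed and started again without going back to the queue:
#   bash my_job.sh      # fails
#   vi my_job.sh        # make the changes
#   bash my_job.sh      # try again
#   exit                # leaving the shell releases the resources
```

The resources stay allocated for as long as the interactive shell is open (or until the time limit is reached), which is the point of working this way for jobs that fail shortly after start.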
Hello,
Some jobs have to be restarted several times until they run.
Users start the job, it fails, they have to make some changes,
they start the job again, it fails again ... and so on.
So they want to keep the resources until the job is running properly.
Is there a possibility to 'inherit' alloc