Hello, I have written a script in my prolog.sh that cancels any Slurm job if the parameter gres=gpu is not present. This is the script I added to my prolog.sh:

    if [ "$SLURM_JOB_PARTITION" = "gpu" ]; then
        if [ ! -z "${GPU_DEVICE_ORDINAL}" ]; then
            echo "GPU ID used is ID: $GPU_DEVICE_ORDINAL"
            # strip the commas, then count the remaining characters
            list_gpu=$(echo "$GPU_DEVICE_ORDINAL" | sed -e "s/,//g")
            Ngpu=$(expr length "$list_gpu")
        else
            echo "No GPU selected"
            Ngpu=0
        fi

        # if 0 gpus were allocated, cancel the job
        if [ "$Ngpu" -eq "0" ]; then
            scancel "${SLURM_JOB_ID}"
        fi
    fi

What the code does is look at the number of GPUs allocated and, if it is 0, cancel the job. It works fine when a user runs sbatch submit.sh (and submit.sh does not contain --gres=gpu:1). However, when a user requests an interactive session without GPUs, the job is killed, but only after hanging for 5-6 minutes:

    jlo@mfe01:~ $ srun --partition=gpu --pty bash --login
    srun: job 4631872 queued and waiting for resources
    srun: job 4631872 has been allocated resources
    srun: Force Terminated job 4631872

...the killing hangs for 5-6 minutes.

Is there anything wrong with my script? Why do I see this hang only when scancel is applied to an interactive session? I would like to remove the hang.

Thanks,

*Fritz Ratnasamy*
Data Scientist
Information Technology
The University of Chicago Booth School of Business
5807 S. Woodlawn
Chicago, Illinois 60637
Phone: +(1) 773-834-4556
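
P.S. A side note on the GPU counting step, unrelated to the hang itself: the script counts the characters left after the commas are removed, so a device list such as "0,10" would be counted as three GPUs. Below is a minimal sketch of counting the comma-separated entries instead. The function name count_gpus is hypothetical (it is not part of Slurm), and the sketch assumes GPU_DEVICE_ORDINAL holds a plain comma-separated list of device IDs.

```shell
# Hypothetical helper (not part of Slurm): count the entries in a
# comma-separated device list such as "0,1" or "0,10,11".
count_gpus() {
    list="$1"

    # An empty list means no GPUs were allocated.
    if [ -z "$list" ]; then
        echo 0
        return
    fi

    # Split the list on commas into the function's positional
    # parameters, then report how many there are.
    old_ifs=$IFS
    IFS=','
    set -- $list
    IFS=$old_ifs
    echo "$#"
}

# Assumption: GPU_DEVICE_ORDINAL is set by Slurm when GPUs are allocated.
Ngpu=$(count_gpus "${GPU_DEVICE_ORDINAL:-}")
echo "Number of GPUs allocated: $Ngpu"
```

With this helper, the rest of the prolog logic (comparing Ngpu to 0 and calling scancel) can stay unchanged.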