Hi there,

I use a bash script to submit multiple single-GPU jobs simultaneously to a
cluster of 18 nodes with 4 GPUs per node.
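
By "simultaneously" I mean a plain loop with no delay between the sbatch
calls, roughly like this (a simplified sketch; "run_eq2.sh" is just a
stand-in name for the job script below):

#!/bin/bash
# Simplified sketch of the submitting script: fire off several
# single-GPU jobs back to back, with no pause between them.
for i in 1 2 3 4; do
    sbatch run_eq2.sh
done

The job script itself looks like this: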

#!/bin/bash
#SBATCH -J jobName
#SBATCH --partition=GPU
#SBATCH --get-user-env
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --gres=gpu:1

source /etc/profile.d/modules.sh
export pmemd="srun $AMBERHOME/bin/pmemd.cuda"
# freegpus uses nvidia-smi to figure out which GPUs are occupied
export CUDA_VISIBLE_DEVICES=$(/programs/bin/freegpus 1 $SLURM_JOB_ID)

${pmemd} -O \
-i eq2.in \
-o eq2.o \
-p CPLX_Neut_Sol.prmtop \
-c eq1.rst7 \
-r eq2.rst7 \
-x eq2.nc \
-ref eq1.rst7

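For context, freegpus is our own program and is essentially a wrapper around
nvidia-smi. In spirit it does something like the following (a rough sketch
only, not the actual code; the second argument, the SLURM job ID, is ignored
here):

#!/bin/bash
# Rough sketch of what freegpus does (not the real code): print the
# indices of the first N GPUs with no compute processes on them,
# comma separated, for use as CUDA_VISIBLE_DEVICES.
nwanted=${1:-1}    # number of free GPUs requested

# UUIDs of GPUs that currently have at least one compute process
busy=$(nvidia-smi --query-compute-apps=gpu_uuid --format=csv,noheader | sort -u)

free=()
while read -r idx uuid; do
    if ! grep -q "$uuid" <<< "$busy"; then
        free+=("$idx")
    fi
done < <(nvidia-smi --query-gpu=index,uuid --format=csv,noheader | tr -d ',')

# print the first $nwanted free indices, e.g. "0" or "0,2"
(IFS=,; echo "${free[*]:0:$nwanted}")

The point is just that it picks GPUs based on what nvidia-smi reports at the
moment the job script runs.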

We recently installed an extra 8 nodes, and I find that when submitting to
those nodes I get four jobs running on a single GPU while the other three
GPUs sit idle. If I wait 30 seconds between submissions, the jobs go on
separate GPUs (the behaviour I want). Submitting the same scripts to the
older nodes works fine. I've reproduced this multiple times. See a video of
the problem here (the quality may be better if you download it first):

https://www.dropbox.com/s/ahc39mvsefnvnps/video1.ogv?dl=0

In the video I show that the output of our "freegpus" program is fine, but
when I submit two jobs to node015 they both end up on the GPU with ID 0.
When I submit two jobs to node003, they go on separate GPUs. I've repeated
this ~10 times. Once in a while the jobs go straight to running instead of
sitting as "PD" for several seconds, and when that happens they do land on
separate GPUs on node015!
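
In case anyone wants to reproduce this, a line like the following just
before the pmemd command (a rough sketch; the log filename is arbitrary)
records which GPU each job thinks it was handed:

# debug: record host, job ID and the GPU this job was given
echo "$(date +%T) $(hostname) job=$SLURM_JOB_ID CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES" \
    >> gpu_assignment.log

Watching nvidia-smi on the node then shows where the processes actually end up.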

It seems like a SLURM bug, so I thought I'd post here.
Any ideas?

Oliver
