Hi,

I run GROMACS 2019 (gmx) on GPUs.
Each of my GPU hosts has 4 GPUs.
I use Slurm, configured with gres=gpu.

1. If I submit a job with --gres=gpu:1, GPU #0 is identified and used (I pass -gpu_id $CUDA_VISIBLE_DEVICES).

2. If I submit a second job, it fails: $CUDA_VISIBLE_DEVICES is 1 and that value is passed to -gpu_id, but the only compatible GPU that gmx identifies is #0.
From the output:

gmx mdrun -v -pin on -deffnm equi_nvt -nt 8 -gpu_id 1 -nb gpu -pme gpu -npme 1 -ntmpi 4

  GPU info:
    Number of GPUs detected: 1
    #0: NVIDIA GeForce GTX 1080 Ti, compute cap.: 6.1, ECC:  no, stat: compatible

Fatal error:
You limited the set of compatible GPUs to a set that included ID #1, but that
ID is not for a compatible GPU. List only compatible GPUs.

3. If I log in to that node and run the mdrun command recorded in the output of the previous step, it selects the right GPU and runs as expected.

$CUDA_DEVICE_ORDER is set to PCI_BUS_ID

I cannot decide whether this is a Slurm configuration error or a problem with GROMACS, as $CUDA_VISIBLE_DEVICES is set correctly by Slurm and I expect GROMACS to detect all 4 GPUs.
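
One possible reading of the output above: the CUDA runtime renumbers whatever devices are listed in $CUDA_VISIBLE_DEVICES starting from 0, so a job restricted to physical GPU 1 would see a single device with ID 0, which matches the "Number of GPUs detected: 1" line. The sketch below only illustrates that renumbering. The function name local_gpu_ids is hypothetical and is not part of Slurm or GROMACS, and whether -gpu_id expects these renumbered IDs on this cluster is an assumption I have not verified.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm or GROMACS): given a
# CUDA_VISIBLE_DEVICES-style list such as "1" or "2,3", print the IDs a
# process restricted to those devices would see, i.e. 0..n-1.
local_gpu_ids() {
    visible=$1
    # An empty list means no devices are visible; print an empty line.
    if [ -z "$visible" ]; then
        echo ""
        return 0
    fi
    # Split on commas to count the entries in the list.
    old_ifs=$IFS
    IFS=,
    set -- $visible
    IFS=$old_ifs
    count=$#
    # Build the renumbered list "0,1,...,count-1".
    ids=""
    i=0
    while [ "$i" -lt "$count" ]; do
        ids="${ids}${ids:+,}${i}"
        i=$((i + 1))
    done
    echo "$ids"
}
```

Under that assumption, the second job could pass -gpu_id "$(local_gpu_ids "$CUDA_VISIBLE_DEVICES")" instead of -gpu_id $CUDA_VISIBLE_DEVICES, which would give -gpu_id 0 when $CUDA_VISIBLE_DEVICES is 1.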

Thanks for your help and suggestions,
Tamas

--

Tamas Hegedus, PhD
Senior Research Fellow
Department of Biophysics and Radiation Biology
Semmelweis University     | phone: (36) 1-459 1500/60233
Tuzolto utca 37-47        | mailto:ta...@hegelab.org
Budapest, 1094, Hungary   | http://www.hegelab.org

