Hello,

We're looking for advice on an salloc/srun setup that uses 1 GPU per task
but where the job makes use of all available GPUs.


*Test #1:*

We want an salloc and srun combination such that each task gets 1 GPU, but
the job's GPU usage is spread across the 4 available devices.  See
gres.conf below.



% salloc -n 12 -c 2 --gres=gpu:1



% srun env | grep CUDA

CUDA_VISIBLE_DEVICES=0

(12 times)



What we want instead is:

CUDA_VISIBLE_DEVICES=0

CUDA_VISIBLE_DEVICES=1

CUDA_VISIBLE_DEVICES=2

CUDA_VISIBLE_DEVICES=3

And so on (12 times), such that each task still gets 1 GPU, but usage is
spread across the 4 available devices (see gres.conf below) rather than
everything landing on a single device (device=0).

That way each task is not waiting on device 0 to free up, as is currently
the case.


What are we missing or misunderstanding?

   - salloc / srun parameter?
   - slurm.conf or gres.conf setting?
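
Or is the intended pattern closer to something like the following?  (This
is only a sketch, based on Test #3 below; we are not sure it is the right
approach, and the exact srun options are our guess.)

% salloc -n 12 -c 2 --gres=gpu:4

% srun -n 1 --gres=gpu:1 env | grep CUDA &    (repeated once per task, 12 times)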



Also see the additional tests below that illustrate the current behavior:



*Test #2*

Here we expect each srun task to be allocated all 4 GPUs.

% salloc -n 12 -c 2 --gres=gpu:4

% srun env | grep CUDA

CUDA_VISIBLE_DEVICES=0,1,2,3

(12 times)



This matches expectation.





*Test #3*

Another test, where we submit multiple sruns in succession:

Here we use a simple sleepCUDA.py script, which sleeps a few seconds and
then prints $CUDA_VISIBLE_DEVICES.
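
(For reference, the script is roughly equivalent to the following, give or
take the exact sleep duration:)

sleep 5 && echo $CUDA_VISIBLE_DEVICES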



% salloc -n 12 -c 2 --gres=gpu:4

% srun --gres=gpu:1 sleepCUDA.py &

% srun --gres=gpu:1 sleepCUDA.py &

% srun --gres=gpu:1 sleepCUDA.py &

% srun --gres=gpu:1 sleepCUDA.py &



Result:

CUDA_VISIBLE_DEVICES=0  (jobid 1)

CUDA_VISIBLE_DEVICES=1  (jobid 2)

CUDA_VISIBLE_DEVICES=2  (jobid 3)

CUDA_VISIBLE_DEVICES=3  (jobid 4)

And so on (but not necessarily in 0,1,2,3 order)

Though a single srun submission here still only uses 1 GPU (device=0), as
before and as expected.

This seems like a step in the right direction, since multiple devices were
used, but it is not quite what we wanted.


And according to: https://slurm.schedmd.com/archive/slurm-16.05.7/gres.html

*“By default, a job step will be allocated all of the generic resources
allocated to the job.” [Test #2]*

*“If desired, the job step may explicitly specify a different generic
resource count than the job.” [Test #3]*



To run Test #3 non-interactively, should we look into creating an sbatch
script (with multiple sruns) instead of using salloc?
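
If so, we imagine something roughly like the untested sketch below (the
option placement is our best guess):

#!/bin/bash
#SBATCH -n 12
#SBATCH -c 2
#SBATCH --gres=gpu:4

# one job step per task, each asking for a single GPU
srun -n 1 --gres=gpu:1 sleepCUDA.py &
srun -n 1 --gres=gpu:1 sleepCUDA.py &
srun -n 1 --gres=gpu:1 sleepCUDA.py &
srun -n 1 --gres=gpu:1 sleepCUDA.py &
# (more steps could follow, up to the 12 allocated tasks)

# wait for all backgrounded steps to finish before the job exits
wait

Would that be the recommended approach?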




*OS:* CentOS 7

*Slurm version:* 16.05.6


*gres.conf*

Name=gpu File=/dev/nvidia0

Name=gpu File=/dev/nvidia1

Name=gpu File=/dev/nvidia2

Name=gpu File=/dev/nvidia3



*slurm.conf (truncated/partial/simplified)*

NodeName=node1 Gres=gpu:4

NodeName=node2 Gres=gpu:4

NodeName=node3 Gres=gpu:4

NodeName=node4 Gres=gpu:4

GresTypes=gpu



No cgroup.conf



Posting the actual .conf files is not practical due to firewalls.


Any advice will be greatly appreciated!

Thank you!
