Hello List,
I'm trying to help a client schedule GPU jobs so that they can utilize
their GPUs to the full. By "to the full" I mean each GPU is occupied by a
GPU job, regardless of possible interference from other jobs or
inefficiency of inter-GPU communication.
The client has a 3-node cluster: 2 nodes each contain 4x NVIDIA P100
GPUs, and 1 node contains 4x NVIDIA K40 GPUs. My client wants to be able
to allocate ONLY the needed number of GPUs to his job. This works as
long as the job doesn't need more than the number of GPUs in one node.
If the client wants to allocate 5 GPUs, I'm not able to allocate 4 GPUs
on one node and 1 GPU on a second... Does Slurm have a solution for this
problem?
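For reference, here is a sketch of what I'm running into (assuming GRES is
configured as `gpu` in gres.conf; the job script name is just a placeholder):

```shell
# Works: 4 GPUs fit on a single node, so a per-node GRES request is fine.
sbatch --nodes=1 --gres=gpu:4 job.sh

# The problem: --gres is a PER-NODE request. There seems to be no way to
# ask for 5 GPUs in total spread unevenly across nodes. The line below
# requests 5 GPUs on EACH of 2 nodes, and no node in the cluster has 5:
sbatch --nodes=2 --gres=gpu:5 job.sh
```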
Best regards,
Geert