Maybe I described the problem unclearly...
Effectively, the problem occurs when the number of GPUs a job needs
is uneven and greater than the number of GPUs hosted by one node. Let me
clarify with some examples:
* Needed 4 GPUs: No problem, fits on one node.
* Needed 5 GPUs: A problem... The closest you can get is to request
2 nodes with 3 GPUs each, which leaves one GPU unused...
Are there possibilities to circumvent this problem?
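For concreteness, here is a sketch of what I would like to be able to
express. This assumes a Slurm release with heterogeneous-job support
(17.11 or later, which may not apply here); `./gpu_app` is a placeholder
for the actual GPU application:

```shell
#!/bin/bash
# Component 0: one node providing 4 GPUs
#SBATCH --nodes=1
#SBATCH --gres=gpu:4
#SBATCH hetjob
# Component 1: a second node providing the remaining 1 GPU
#SBATCH --nodes=1
#SBATCH --gres=gpu:1

# Launch the application across both heterogeneous-job components
srun --het-group=0,1 ./gpu_app
```

With a submission like this, the 5-GPU case would allocate exactly 4+1
GPUs instead of rounding up to 3 GPUs on each of two nodes.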
Best regards,
Geert
On 02/21/2017 09:51 AM, Geert Geurts wrote:
Hello List,
I'm trying to help clients schedule GPU jobs in a way that lets them
utilize their GPUs fully. By using their GPUs fully I mean that each
GPU is occupied by a GPU job, regardless of possible interference from
other jobs or inefficiency in inter-GPU communication.
So the client has a 3-node cluster, with 2 nodes containing 4x NVIDIA
P100 GPUs, and 1 node containing 4x NVIDIA K40 GPUs. My client wants
to be able to allocate ONLY the needed number of GPUs to his job. This
is possible as long as the job doesn't need more than the number of
GPUs in one node. If this client wants to allocate 5 GPUs, I'm not
able to allocate 4 GPUs on one node and 1 GPU on a second... Does
Slurm have a solution for this problem?
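To illustrate, the requests look roughly like this (a sketch; `job.sh`
is a placeholder for the client's batch script, and my understanding is
that the `--gres=gpu:N` count applies per node, not per job):

```shell
# Works: 4 GPUs, all on a single node
sbatch --nodes=1 --gres=gpu:4 job.sh

# 5 GPUs total cannot be expressed directly: --gres=gpu:N is per node,
# so the closest fit is 2 nodes with 3 GPUs each, allocating 6 GPUs
# and leaving one of them unused
sbatch --nodes=2 --gres=gpu:3 job.sh
```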
Best regards,
Geert