Hi.

Isn't that exactly what cgroups are for?
If you use cgroups and request 1 core on a machine with N available, you will only use the one you requested, even if the others are idle. If another job gets scheduled on the same machine, it's because the resources it requested are available.

From my (little) experience, the usual problem is the RAM request: while users tend to estimate the number of cores they need quite correctly, they greatly overestimate the memory, often by 3 orders of magnitude. For this, the 'seff' tool is quite educational, to the point that its output could be useful in the job completion mail :)
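
To make the memory overestimate concrete, here is a minimal Python sketch of the kind of requested-vs-used comparison that 'seff' reports. It is not seff's actual code: the function name memory_report and the numbers (64000 MB requested, 64 MB peak use) are made up for illustration.

```python
import math


def memory_report(requested_mb: float, peak_used_mb: float) -> dict:
    """Compare the memory a job requested with the peak it actually used.

    Hypothetical helper for illustration only; not part of Slurm or seff.
    """
    if requested_mb <= 0 or peak_used_mb <= 0:
        raise ValueError("both values must be positive")
    overrequest = requested_mb / peak_used_mb
    return {
        # Share of the requested memory that was really used, in percent.
        "efficiency_pct": 100.0 * peak_used_mb / requested_mb,
        # How many times more memory was requested than used.
        "overrequest_factor": overrequest,
        # The same factor expressed in orders of magnitude.
        "orders_of_magnitude": math.log10(overrequest),
    }


# Made-up example: 64000 MB requested, 64 MB actually used at peak.
report = memory_report(requested_mb=64000, peak_used_mb=64)
print(report)
```

With numbers like these, the job blocks memory that other jobs could have used on the same node, even though its core request was accurate.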

On 17/12/2021 08:53, Steffen Grunewald wrote:
On Fri, 2021-12-17 at 13:03:32 +0530, Sudeep Narayan Banerjee wrote:
Hello All: Can we please restrict one GPU job on one GPU node?

That is,
a) We submit a GPU job on an empty node (say gpu2) requesting 16 cores,
as that gives the best GPU performance.
b) Then another user floods the CPU cores on gpu2, sharing the GPU
resources. The net result is that the GPU job lost 40% performance in the
next run.

Can we make some changes in the slurm configuration such that when a GPU
job is submitted in a GPU node, no other job can enter that GPU node?

Hi,

your scenario is incomplete :/

In your scenario, a (job_submit?) script could probably change the number
of cores requested to the maximum available, thus avoiding anything else
entering the machine afterwards.
But:

What if some CPU cores of the GPU machine are already in use? Even if that
job behaves nicely at the time the GPU job gets scheduled to the machine,
this doesn't guarantee that this won't change the next moment.

If your GPU machines are of identical configuration, the only feasible way
seems to be to request a full machine.
This won't work that easily if your setup is inhomogeneous, and/or if there
are multiple GPUs in a single machine.

Sometimes there's no technical solution to social problems (assuming that
CPU flooding happens on purpose and knowingly, not by accident), I'm afraid...

Best,
  Steffen


--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
