Hi David,

On 1/6/22 22:39, David Henkemeyer wrote:
> When my team used PBS, we had several nodes that had a TON of CPUs, so many, in fact, that we ended up setting np to a smaller value, in order to not starve the system of memory.
>
> What is the best way to do this with Slurm?  I tried modifying # of CPUs in the slurm.conf file, but I noticed that Slurm enforces that "CPUs" is equal to Boards * SocketsPerBoard * CoresPerSocket * ThreadsPerCore.  This left me with having to "fool" Slurm into thinking there were either fewer ThreadsPerCore, fewer CoresPerSocket, or fewer SocketsPerBoard.  This is a less-than-ideal solution, it seems to me.  At least, it left me feeling like there has to be a better way.

If your goal is to limit the amount of RAM per job, then kernel cgroups are probably the answer. I've collected some information on my Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#cgroup-configuration
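As a rough illustration only (the Wiki page has the full details, and the memory values below are just placeholders you would tune to your nodes), the cgroup-based memory limiting typically involves settings along these lines:

  # cgroup.conf (sketch):
  ConstrainCores=yes
  ConstrainRAMSpace=yes

  # slurm.conf (sketch):
  ProctrackType=proctrack/cgroup
  TaskPlugin=task/affinity,task/cgroup
  SelectType=select/cons_tres
  SelectTypeParameters=CR_Core_Memory
  DefMemPerCPU=4000    # placeholder: roughly RAM-per-core in MB
  MaxMemPerCPU=4000    # placeholder

With something like this, each job is confined to the memory it requested, so oversubscribing memory (the problem you solved in PBS by lowering np) shouldn't happen in the first place.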

If some users need more RAM than is available per core, they have to request a larger number of cores in their job to get it, as in the example below. This makes a lot of sense, IMHO.
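For example (hypothetical numbers and job script name): with DefMemPerCPU=4000, a job needing about 16 GB would simply ask for 4 cores,

  sbatch --ntasks=4 my_job.sh

or request the memory explicitly and let Slurm scale up the core count to satisfy MaxMemPerCPU:

  sbatch --ntasks=1 --mem=16000 my_job.sh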

SchedMD is working on support for cgroups v2; see the talk "Slurm 21.08 and Beyond" by Tim Wickberg, SchedMD: https://slurm.schedmd.com/publications.html

You could probably "fool" Slurm as you describe it, but that shouldn't be necessary.

I hope this helps.

/Ole
