Hi,

You should set

SelectType=select/cons_res

plus one of these:

SelectTypeParameters=CR_Memory
SelectTypeParameters=CR_Core_Memory
SelectTypeParameters=CR_CPU_Memory
SelectTypeParameters=CR_Socket_Memory

to enable memory allocation tracking, as described in the documentation:

https://slurm.schedmd.com/cons_res_share.html
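
For example, the relevant slurm.conf lines would look like this (just a sketch; CR_Core_Memory is picked arbitrarily here, use whichever variant matches your scheduling policy), together with the accounting settings you already have:

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherParams=OverMemoryKill

Note that a change of SelectType needs a restart of slurmctld (and it is safest to restart the slurmd daemons too), not just "scontrol reconfigure". Once it is active, a job that grows past its --mem request should be killed by slurmstepd with an "exceeded memory limit" message in its output.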

Also, the line:

#SBATCH --mem=1GBB

contains "1GBB". Is this same at job script?


Regards,

Ahmet M.


On 24.10.2019 at 23:00, Mike Mosley wrote:
Hello,

We are testing Slurm 19.05 on Linux RHEL 7.5+ with the intent to migrate to it from Torque/Moab in the near future.

One of the things our users are used to is that when their jobs exceed the amount of memory they requested, the job is terminated by the scheduler. We realize that Slurm prefers to use cgroups to contain jobs rather than kill them, but initially we need to have the kill option in place to transition our users.

So, looking at the documentation, it appears that in 19.05, the following needs to be set to accomplish this:

JobAcctGatherParams = OverMemoryKill


Other possibly relevant settings we made:

JobAcctGatherType = jobacct_gather/linux

ProctrackType = proctrack/linuxproc


We have avoided configuring any cgroup parameters for the time being.

Unfortunately, when we submit a job with the following:

#SBATCH --nodes=1

#SBATCH --ntasks-per-node=1

#SBATCH --mem=1GBB


We see the RSS of the job steadily increase beyond the 1GB limit, and it is never killed. Interestingly enough, the proc information shows the ulimit (hard and soft) for the process set to around 1GB.
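
(The payload itself is nothing special, just a process whose RSS keeps growing; for anyone trying to reproduce this, any memory hog will do, e.g. something along the lines of:

srun python3 -c "x = [bytearray(1024 * 1024) for _ in range(2048)]"

which tries to touch roughly 2 GB and, with the limit enforced, should get killed.)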

We have tried various settings without success. Can anyone point out what we are doing wrong?

Thanks,

Mike

--
J. Michael Mosley
University Research Computing
The University of North Carolina at Charlotte
9201 University City Blvd
Charlotte, NC  28223
704.687.7065  |  mmos...@uncc.edu
