Hi,

You should set

SelectType=select/cons_res

plus one of these:

SelectTypeParameters=CR_Memory
SelectTypeParameters=CR_Core_Memory
SelectTypeParameters=CR_CPU_Memory
SelectTypeParameters=CR_Socket_Memory

to enable memory allocation tracking, as described in the documentation:

https://slurm.schedmd.com/cons_res_share.html
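
For example, the relevant slurm.conf lines would look like this (just a sketch; CR_Core_Memory is picked arbitrarily here, use whichever variant matches your scheduling policy), together with the accounting settings you already have:

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherParams=OverMemoryKill

Note that a change of SelectType needs a restart of slurmctld (and it is safest to restart the slurmd daemons too), not just "scontrol reconfigure". Once it is active, a job that grows past its --mem request should be killed by slurmstepd with an "exceeded memory limit" message in its output.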

Also, the line:

#SBATCH --mem=1GBB

contains "1GBB". Is this same at job script?


Regards,

Ahmet M.


On 24.10.2019 at 23:00, Mike Mosley wrote:
Hello,

We are testing Slurm 19.05 on Linux RHEL 7.5+ with the intent to migrate to it from Torque/Moab in the near future.

One of the things our users are used to is that when their jobs exceed the amount of memory they requested, the job is terminated by the scheduler. We realize that Slurm prefers to use cgroups to contain jobs rather than kill them, but initially we need to have the kill option in place to transition our users.

So, looking at the documentation, it appears that in 19.05, the following needs to be set to accomplish this:

JobAcctGatherParams = OverMemoryKill


Other possibly relevant settings we made:

JobAcctGatherType = jobacct_gather/linux

ProctrackType = proctrack/linuxproc


We have avoided configuring any cgroup parameters for the time being.

Unfortunately, when we submit a job with the following:

#SBATCH --nodes=1

#SBATCH --ntasks-per-node=1

#SBATCH --mem=1GBB


We see the RSS of the job steadily increase beyond the 1GB limit, and it is never killed. Interestingly enough, the proc information shows the ulimit (hard and soft) for the process set to around 1GB.
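
(The payload itself is nothing special, just a process whose RSS keeps growing; for anyone trying to reproduce this, any memory hog will do, e.g. something along the lines of:

srun python3 -c "x = [bytearray(1024 * 1024) for _ in range(2048)]"

which tries to touch roughly 2 GB and, with the limit enforced, should get killed.)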

We have tried various settings without success. Can anyone point out what we are doing wrong?

Thanks,

Mike

--
J. Michael Mosley
University Research Computing
The University of North Carolina at Charlotte
9201 University City Blvd
Charlotte, NC  28223
704.687.7065  |  mmos...@uncc.edu
