Hi,
You should set
SelectType=select/cons_res
plus one of these:
SelectTypeParameters=CR_Memory
SelectTypeParameters=CR_Core_Memory
SelectTypeParameters=CR_CPU_Memory
SelectTypeParameters=CR_Socket_Memory
to enable memory allocation tracking, per the documentation:
https://slurm.schedmd.com/cons_res_share.html
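For illustration, a minimal slurm.conf fragment combining these options might look like the sketch below (the JobAcctGather lines mirror the settings mentioned in the original message; adjust to your site's configuration):

```ini
# slurm.conf fragment (sketch, not a complete config)
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory   # track both cores and memory
# With memory tracking enabled, jobs that exceed --mem can be killed:
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherParams=OverMemoryKill
```

After restarting slurmctld, you can confirm the active values with something like `scontrol show config | grep -i -e selecttype -e jobacctgather`.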
Also, this line:
#SBATCH --mem=1GBB
contains "1GBB" (note the extra "B"). Is that typo also present in your actual job script?
Regards,
Ahmet M.
On 24.10.2019 23:00, Mike Mosley wrote:
Hello,
We are testing Slurm 19.05 on Linux RHEL 7.5+ with the intent to migrate
to it from Torque/Moab in the near future.
One of the things our users are used to is that when their jobs exceed
the amount of memory they requested, the job is terminated by the
scheduler. We realize that Slurm prefers to use cgroups to contain
jobs rather than kill them, but initially we need to have the kill
option in place to ease our users' transition.
So, looking at the documentation, it appears that in 19.05, the
following needs to be set to accomplish this:
JobAcctGatherParams = OverMemoryKill
Other possibly relevant settings we made:
JobAcctGatherType = jobacct_gather/linux
ProctrackType = proctrack/linuxproc
We have avoided configuring any cgroup parameters for the time being.
Unfortunately, when we submit a job with the following:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=1GBB
We see the RSS of the job steadily increase beyond the 1GB limit, and
the job is never killed. Interestingly, the proc information shows the
ulimit (hard and soft) for the process set to around 1GB.
We have tried various settings without any success. Can anyone point
out what we are doing wrong?
Thanks,
Mike
--
J. Michael Mosley
University Research Computing
The University of North Carolina at Charlotte
9201 University City Blvd
Charlotte, NC 28223
704.687.7065 | mmos...@uncc.edu