Hi;

The Slurm documentation at these pages:

https://slurm.schedmd.com/slurm.conf.html

https://slurm.schedmd.com/cons_res_share.html


conflicts with the Slurm 19.05 release notes at this page:

https://slurm.schedmd.com/news.html


The documentation pages are probably obsolete, but I don't know of any valid document which describes Slurm version 19.05.

Regards,

Ahmet M.


On 25.10.2019 16:17, Mike Mosley wrote:
Ahmet,

Thank you for taking the time to respond to my question.

Yes, the --mem=1GBB is a typo. It's correct in my script; I just fat-fingered it in the email. :-)

BTW, the exact version I am using is 19.05.2.

Regarding your response, that seems to be more than what I need. I simply want to enforce the memory limits specified by the user at job submission time, which seems to have been the behavior in previous versions of Slurm. What I want is what is described in the 19.05 release notes:

RELEASE NOTES FOR SLURM VERSION 19.05
28 May 2019

NOTE: slurmd and slurmctld will now fatal if two incompatible mechanisms for
      enforcing memory limits are set. This makes incompatible the use of
      task/cgroup memory limit enforcing (Constrain[RAM|Swap]Space=yes) with
      JobAcctGatherParams=OverMemoryKill, which could cause problems when a
      task is killed by one of them while the other is at the same time
      managing that task. The NoOverMemoryKill setting has been deprecated in
      favor of OverMemoryKill, since now the default is *NOT* to have any
      memory enforcement mechanism.

NOTE: MemLimitEnforce parameter has been removed and the functionality that
      was provided with it has been merged into a JobAcctGatherParams. It
      may be enabled by setting JobAcctGatherParams=OverMemoryKill, so now
      job and steps killing by OOM is enabled from the same place.


So, is it really necessary to do what you suggested to get that functionality?

If someone could post just a simple slurm.conf file that forces the memory limits to be honored (and kills the job if they are exceeded), then I could extract what I need from that.
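
For reference, here is the bare-bones fragment we have been experimenting with so far, pieced together from the release notes above and the cons_res suggestion below. The node and partition lines (and the gather frequency) are just placeholders for our real hardware, so please treat it as a sketch rather than something known to work:

    # Treat cores and memory as consumable resources so --mem becomes an allocation limit
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory

    # Poll job memory usage and kill any step that exceeds its requested memory
    JobAcctGatherType=jobacct_gather/linux
    JobAcctGatherFrequency=task=30
    JobAcctGatherParams=OverMemoryKill

    # Process tracking without cgroups for now
    ProctrackType=proctrack/linuxproc

    # Placeholder node/partition definitions (adjust to the actual cluster)
    NodeName=node[01-02] CPUs=16 RealMemory=64000 State=UNKNOWN
    PartitionName=debug Nodes=node[01-02] Default=YES MaxTime=INFINITE State=UP

With something like that in place, I would expect a job submitted with --mem=1G to be killed once its RSS grows past roughly 1GB; if any of those lines is wrong or insufficient, that is exactly what I would like to know.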

Again, thanks for the assistance.

Mike



On Thu, Oct 24, 2019 at 11:27 PM mercan <ahmet.mer...@uhem.itu.edu.tr> wrote:

    Hi;

    You should set

    SelectType=select/cons_res

    plus one of these:

    SelectTypeParameters=CR_Memory
    SelectTypeParameters=CR_Core_Memory
    SelectTypeParameters=CR_CPU_Memory
    SelectTypeParameters=CR_Socket_Memory

    to enable memory allocation tracking, according to the documentation:

    https://slurm.schedmd.com/cons_res_share.html
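
    For example, a minimal fragment in slurm.conf would look like this
    (CR_Core_Memory is only one of the possible choices listed above):

        SelectType=select/cons_res
        SelectTypeParameters=CR_Core_Memory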

    Also, the line:

    #SBATCH --mem=1GBB

    contains "1GBB". Is this same at job script?


    Regards;

    Ahmet M.


    On 24.10.2019 at 23:00, Mike Mosley wrote:
    > Hello,
    >
    > We are testing Slurm 19.05 on Linux RHEL 7.5+ with the intent to
    > migrate to it from Torque/Moab in the near future.
    >
    > One of the things our users are used to is that when their jobs
    > exceed the amount of memory they requested, the job is terminated
    > by the scheduler. We realize that Slurm prefers to use cgroups to
    > contain rather than kill the jobs, but initially we need to have
    > the kill option in place to transition our users.
    >
    > So, looking at the documentation, it appears that in 19.05, the
    > following needs to be set to accomplish this:
    >
    > JobAcctGatherParams = OverMemoryKill
    >
    >
    > Other possibly relevant settings we made:
    >
    > JobAcctGatherType = jobacct_gather/linux
    >
    > ProctrackType = proctrack/linuxproc
    >
    >
    > We have avoided configuring any cgroup parameters for the time being.
    >
    > Unfortunately, when we submit a job with the following:
    >
    > #SBATCH --nodes=1
    >
    > #SBATCH --ntasks-per-node=1
    >
    > #SBATCH --mem=1GBB
    >
    >
    > We see the RSS of the job steadily increase beyond the 1GB limit
    > and it is never killed. Interestingly enough, the proc information
    > shows the ulimit (hard and soft) for the process set to around 1GB.
    >
    > We have tried various settings without any success. Can anyone
    > point out what we are doing wrong?
    >
    > Thanks,
    >
    > Mike
    >
    > --
    > J. Michael Mosley
    > University Research Computing
    > The University of North Carolina at Charlotte
    > 9201 University City Blvd
    > Charlotte, NC  28223
    > 704.687.7065  |  mmos...@uncc.edu



--
J. Michael Mosley
University Research Computing
The University of North Carolina at Charlotte
9201 University City Blvd
Charlotte, NC  28223
704.687.7065  |  mmos...@uncc.edu
