Hi all, I sent this mail from a bad email address this weekend. I apologize if it gets published as a duplicate (I have not found it in the archive yet).
Maybe this is a basic question, but I'm stuck with it. I'm quite new to managing a small cluster with Slurm instead of a local batch scheduler. On the nodes I've set memory limits in slurm.conf:

DefMemPerCPU=2048
MaxMemPerCPU=4096

Requesting 1.2GB of RAM works:

srun --ntasks-per-node=1 --mem-per-cpu=1500M -p tenibre-gpu --pty bash -i

and my testcase can allocate up to 1.5GB:

./a.out
allocation de 1000Mo.........Ok
....
allocation de 1419Mo.........Ok
allocation de 1524Mo.........Ok
Killed

Now I would like to use more memory than MaxMemPerCPU:

srun --ntasks-per-node=1 --mem-per-cpu=12G -p tenibre-gpu --pty bash -i

If I understand the documentation correctly, since mem-per-cpu > MaxMemPerCPU the limit is applied at the task level and Slurm aggregates CPUs and memory. The squeue command shows 3 CPUs aggregated on the node to reach the requested 3*MaxMemPerCPU of memory, so everything seems correct:

JOBID PARTITION   NAME USER  ST TIME START_TIME          TIME_LIMIT CPUS NODELIST(REASON)
497   tenibre-gpu bash begou R  1:23 2021-03-20T14:42:47 12:00:00   3    tenibre-gpu-0

But my task is still unable to exceed the MaxMemPerCPU value:

./a.out
allocation de 1000Mo.........Ok
....
allocation de 4145Mo.........Ok
allocation de 4250Mo.........Ok
Killed

So I'm wrong somewhere, but where? Running the testcase in an ssh session (ssh as root, then su to a basic user) allows using more memory, so it is related to my bad Slurm setup/usage.

Patrick
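P.S. In case anyone wants to reproduce this, a testcase of this kind can be as simple as the sketch below. My a.out is essentially similar; the ~100 MB step size and the messages here are only illustrative, not the exact source:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const size_t step = 100UL << 20;   /* grow the request by ~100 MB each round */
    size_t total = 0;

    for (;;) {
        total += step;
        char *p = malloc(total);
        if (p == NULL) {
            printf("allocation of %zu MB.........failed\n", total >> 20);
            return 1;
        }
        memset(p, 1, total);           /* touch every page so the resident memory really grows */
        printf("allocation of %zu MB.........Ok\n", total >> 20);
        free(p);
    }
}

When the resident memory passes the enforced limit the kernel kills the process, which is where the "Killed" line in the output above comes from.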