The problem may be that SLURM is monitoring Resident Set Size (RSS). I would take a look at your /proc/<pid>/stat file to see what that says.
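For reference, here is a minimal sketch of reading RSS out of /proc/&lt;pid&gt;/stat on Linux. Field 24 of that file is the resident set size in pages; the helper name and the split-on-")" handling of the comm field (which may itself contain spaces) are my own, not anything SLURM-specific:

```python
# Sketch: read RSS from /proc/<pid>/stat (Linux only).
# Field 24 ("rss") is the resident set size in pages; multiply by
# the page size to get bytes. The comm field (field 2) can contain
# spaces, so split after the closing parenthesis instead of naively
# splitting the whole line.
import os

def rss_bytes(pid="self"):
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Everything after ")" is the space-separated fields 3 onward.
    fields = data.rsplit(")", 1)[1].split()
    rss_pages = int(fields[21])  # field 24 overall -> index 21 here
    return rss_pages * os.sysconf("SC_PAGE_SIZE")

print(rss_bytes() // (1024 * 1024), "MiB resident")
```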
Quoting Mike Schachter <[email protected]>:

> Thanks Moe! We have accounting with slurmdbd enabled, and
> have several QOS's set which we manipulate through a python
> plugin we wrote.
>
> The problem seems to be that the actual number set for --mem
> does not correspond to the memory used. The job will get killed
> for --mem=300 but not for --mem=400, despite the fact that the
> job itself consumes 965MB of memory.
>
> mike
>
> On Mon, Sep 10, 2012 at 5:33 PM, Moe Jette <[email protected]> wrote:
>>
>> You need to enable accounting, which samples memory use periodically.
>> The task/cgroup plugin will also enforce memory limits soon.
>>
>> Quoting Mike Schachter <[email protected]>:
>>
>>> Hi there,
>>>
>>> I'm running the following sbatch job:
>>>
>>> #!/bin/sh
>>> #SBATCH -p all -c 1 --mem=400
>>>
>>> python -c "import time; import numpy as np; a = np.ones([11000,
>>> 11000]); time.sleep(50);"
>>>
>>> This job should fail - it allocates roughly 935MB of memory. However,
>>> it only fails when --mem=300. Am I misinterpreting how --mem computes
>>> maximum job memory?
>>>
>>> Thanks!
>>>
>>> mike
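As a sanity check on the numbers in the thread: np.ones defaults to float64 (8 bytes per element), so the 11000x11000 array alone occupies about 923 MiB, and interpreter/NumPy overhead plausibly accounts for the rest of the reported 935-965MB. A quick computation of the array's footprint, without actually allocating it:

```python
# Back-of-the-envelope size of the array from the job script above,
# computed without allocating ~1 GB of memory.
import numpy as np

shape = (11000, 11000)
# np.ones uses float64 by default: 8 bytes per element.
nbytes = int(np.prod(shape)) * np.dtype(np.float64).itemsize
print(nbytes)          # 968000000 bytes
print(nbytes / 2**20)  # ~923.2 MiB
```

Since --mem is given in megabytes, both --mem=300 and --mem=400 sit well below that footprint, so with enforcement working the job should be killed in either case.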
