I did a cat /proc/<pid>/status and the memory sizes all seem accurate,
including VmRSS:

VmPeak:  1089636 kB
VmSize:  1089636 kB
VmLck:         0 kB
VmHWM:    963536 kB
VmRSS:    963536 kB
VmData:   952296 kB
VmStk:       172 kB
VmExe:      2160 kB
VmLib:     79848 kB
VmPTE:      2028 kB

I'd be happy to look into the source code if you could point me to where
the monitoring takes place.

mike

On Tue, Sep 11, 2012 at 9:15 AM, Moe Jette <[email protected]> wrote:
>
> The problem may be that SLURM is monitoring Resident Set Size (RSS). I
> would take a look at your /proc/<pid>/stat file to see what that says.
>
> Quoting Mike Schachter <[email protected]>:
>
>> Thanks Moe! We have accounting with slurmdbd enabled, and
>> have several QOSes set, which we manipulate through a Python
>> plugin we wrote.
>>
>> The problem seems to be that the actual number set for --mem
>> does not correspond to the memory used. The job gets killed
>> for --mem=300 but not for --mem=400, despite the fact that the
>> job itself consumes 965 MB of memory.
>>
>> mike
>>
>> On Mon, Sep 10, 2012 at 5:33 PM, Moe Jette <[email protected]> wrote:
>>>
>>> You need to enable accounting, which samples memory use periodically.
>>> The task/cgroup plugin will also enforce memory limits soon.
>>>
>>> Quoting Mike Schachter <[email protected]>:
>>>
>>>> Hi there,
>>>>
>>>> I'm running the following sbatch job:
>>>>
>>>> #!/bin/sh
>>>> #SBATCH -p all -c 1 --mem=400
>>>>
>>>> python -c "import time; import numpy as np; a = np.ones([11000, 11000]); time.sleep(50);"
>>>>
>>>> This job should fail - it allocates roughly 935 MB of memory. However,
>>>> it only fails when --mem=300. Am I misinterpreting how --mem computes
>>>> the maximum job memory?
>>>>
>>>> Thanks!
>>>>
>>>> mike
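The two RSS readings discussed above can be cross-checked directly. Below is a minimal Python sketch (assuming a Linux /proc layout; this is an illustration, not SLURM's actual jobacct_gather code) that reads VmRSS from /proc/<pid>/status and the rss field from /proc/<pid>/stat and reports both in kB:

```python
import os


def rss_kb_from_status(pid):
    # /proc/<pid>/status reports VmRSS directly in kB.
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return None


def rss_kb_from_stat(pid):
    # /proc/<pid>/stat reports rss in pages (field 24, 1-indexed).
    # The comm field can contain spaces, so split after its closing paren.
    with open("/proc/%d/stat" % pid) as f:
        data = f.read()
    fields = data[data.rindex(")") + 2:].split()
    pages = int(fields[21])  # field 24 overall = index 21 after pid/comm/state shift
    return pages * os.sysconf("SC_PAGE_SIZE") // 1024


if __name__ == "__main__":
    pid = os.getpid()
    print("status VmRSS: %d kB" % rss_kb_from_status(pid))
    print("stat rss:     %d kB" % rss_kb_from_stat(pid))
```

The two numbers should agree to within a few pages when sampled back-to-back; a large, persistent gap would point at the monitoring reading a different quantity than expected.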
