I ran cat /proc/<pid>/status and the memory sizes
all seem accurate, including VmRSS:

VmPeak:  1089636 kB
VmSize:  1089636 kB
VmLck:         0 kB
VmHWM:    963536 kB
VmRSS:    963536 kB
VmData:   952296 kB
VmStk:       172 kB
VmExe:      2160 kB
VmLib:     79848 kB
VmPTE:      2028 kB
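For anyone wanting to check these fields programmatically, here is a minimal sketch that pulls VmRSS out of /proc/<pid>/status. It assumes a Linux /proc filesystem and the same "VmRSS:  <value> kB" field layout shown in the dump above; the helper name is just for illustration:

```python
def get_vm_rss_kb(pid="self"):
    """Return VmRSS in kB for the given pid, or None if not found.

    Assumes a Linux /proc filesystem; lines look like 'VmRSS:  963536 kB'.
    """
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # second field is the kB value
    return None

print(get_vm_rss_kb())  # prints this process's resident set size in kB
```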

I'd be happy to look into the source code if you could
point me to where the monitoring takes place.
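For reference, the footprint of the 11000 x 11000 array from the original job can be estimated directly, since numpy's np.ones defaults to float64 (8 bytes per element). This is just back-of-the-envelope arithmetic, not a measurement:

```python
# Estimated size of an 11000 x 11000 float64 array (np.ones default dtype).
n = 11000
nbytes = n * n * 8          # 8 bytes per float64 element
print(nbytes / 10**6)       # -> 968.0 (MB, decimal)
print(nbytes / 2**20)       # roughly 923 MiB
```

That lines up with the VmRSS of ~963536 kB reported above, allowing for the interpreter's own overhead.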

 mike



On Tue, Sep 11, 2012 at 9:15 AM, Moe Jette <[email protected]> wrote:
>
> The problem may be that SLURM is monitoring Resident Set Size (RSS). I
> would take a look at your /proc/<pid>/stat file to see what that says.
>
> Quoting Mike Schachter <[email protected]>:
>
>>
>> Thanks Moe! We have accounting with slurmdbd enabled, and
>> have several QOS's set which we manipulate through a python
>> plugin we wrote.
>>
>> The problem seems to be that the actual number set for --mem
>> does not correspond to the memory used. The job will get killed
>> for --mem=300 but not for --mem=400, despite the fact that the
>> job itself consumes 965MB of memory.
>>
>>  mike
>>
>>
>>
>> On Mon, Sep 10, 2012 at 5:33 PM, Moe Jette <[email protected]> wrote:
>>>
>>> You need to enable accounting, which samples memory use periodically.
>>> The task/cgroup plugin will also enforce memory limits soon.
>>>
>>> Quoting Mike Schachter <[email protected]>:
>>>
>>>>
>>>> Hi there,
>>>>
>>>> I'm running the following sbatch job:
>>>>
>>>> #!/bin/sh
>>>> #SBATCH -p all -c 1 --mem=400
>>>>
>>>> python -c "import time; import numpy as np; a = np.ones([11000,
>>>> 11000]); time.sleep(50);"
>>>>
>>>> This job should fail - it allocates roughly 935MB of memory. However,
>>>> it only fails when --mem=300. Am I misinterpreting how --mem computes
>>>> maximum job memory?
>>>>
>>>> Thanks!
>>>>
>>>>  mike
>>>
>
