The problem may be that SLURM is monitoring Resident Set Size (RSS). I  
would take a look at your /proc/<pid>/stat file to see what that says.
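For example, here is a quick sketch of how to read the RSS that the accounting plugin would sample (using this shell's own PID, `$$`, as a stand-in for your job's PID):

```shell
# Field 24 of /proc/<pid>/stat is RSS in pages (safe to split on
# whitespace here because our comm field contains no spaces).
rss_pages=$(awk '{print $24}' /proc/$$/stat)
page_kb=$(( $(getconf PAGESIZE) / 1024 ))
echo "RSS: $(( rss_pages * page_kb / 1024 )) MB"

# /proc/<pid>/status reports the same value pre-formatted in kB:
grep VmRSS /proc/$$/status
```

If the VmRSS you see there is well above what --mem allows but the job survives, the sampling interval (or what the accounting plugin counts as job memory) is the next thing to check.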

Quoting Mike Schachter <[email protected]>:

>
> Thanks Moe! We have accounting with slurmdbd enabled, and
> have several QOSes configured, which we manipulate through a
> Python plugin we wrote.
>
> The problem seems to be that the actual number set for --mem
> does not correspond to the memory used. The job will get killed
> for --mem=300 but not for --mem=400, despite the fact that the
> job itself consumes 965MB of memory.
>
>  mike
>
>
>
> On Mon, Sep 10, 2012 at 5:33 PM, Moe Jette <[email protected]> wrote:
>>
>> You need to enable accounting, which samples memory use periodically.
>> The task/cgroup plugin will also enforce memory limits soon.
>>
>> Quoting Mike Schachter <[email protected]>:
>>
>>>
>>> Hi there,
>>>
>>> I'm running the following sbatch job:
>>>
>>> #!/bin/sh
>>> #SBATCH -p all -c 1 --mem=400
>>>
>>> python -c "import time; import numpy as np; a = np.ones([11000, 11000]); time.sleep(50)"
>>>
>>> This job should fail - it allocates roughly 935MB of memory. However,
>>> it only fails when --mem=300. Am I misinterpreting how --mem computes
>>> maximum job memory?
>>>
>>> Thanks!
>>>
>>>  mike
>>
