> -----Original Message-----
> From: Ryan Cox [mailto:[email protected]]
> Sent: Wednesday, 7 January 2015 12:34 PM
> To: slurm-dev
> Subject: [slurm-dev] Re: GresTypes typo in docs
> 
> 
> Gareth,
> 
> Maybe I'm missing something or your configuration is different, but
> /dev/shm is also controlled by cgroups.  If you are using cgroups and
> requiring users to request memory, the GRES setup shouldn't be
> necessary since /dev/shm is accounted for like normal memory (though
> it's in a different memory.stat field from rss).
> 
> For example:
> $ srun -n 1 --mem=100M dd if=/dev/zero of=/dev/shm/DELETEME bs=1M count=500
> slurmstepd: Exceeded step memory limit at some point. oom-killer likely killed a process.
> srun: error: m6-5-7: task 0: Killed
> srun: Force Terminated job step 5401199.0
> 
> Ryan

Ah, thanks. That is news to me. I think we still want some memdir scheduling,
as by default shm is only 50% of the size of memory (at least on our SLES
images). It does raise a bunch of questions about cleaning up cgroups and
about writing to shm from one cgroup and reading from another. On the other
hand, we are looking into this setup as a matter of principle rather than an
immediate real need, and as a way of getting familiar with Slurm's gres, which
we will need for GPU and Phi soon.

We don't currently have the same hard limit you illustrate above. I guess our
cpuset config is currently only limiting access to cores (and I can keep
populating /dev/shm beyond the specified memory limit):
> cat /etc/slurm/cgroup.conf
### 
# Slurm cgroup support configuration file 
### 
#CgroupMountpoint=/sys/fs/cgroup
CgroupMountpoint=/dev/cgroup
CgroupAutomount=yes 
CgroupReleaseAgentDir="/etc/slurm/cgroup" 
ConstrainCores=yes 
#
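
For comparison, a minimal sketch of a variant that would also confine memory,
which is presumably what produces the hard limit in Ryan's example. This is an
assumption rather than something tested here: it relies on the standard
cgroup.conf option ConstrainRAMSpace and on slurm.conf using
TaskPlugin=task/cgroup; the other lines are unchanged from the file above.

```
###
# Slurm cgroup support configuration file (untested sketch)
###
CgroupMountpoint=/dev/cgroup
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
# Assumption: also confine each job/step to the memory it requested,
# so that data written to /dev/shm counts against --mem
ConstrainRAMSpace=yes
```

With something like this in place, the dd test quoted above should hit the
step memory limit instead of filling /dev/shm unchecked.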

BTW. The 'gres without plugin' thread 
(https://groups.google.com/forum/#!topic/slurm-devel/EQQ1_msGLrc) is probably 
more appropriate for this content.  They are related threads but the doco is 
fixed and I figured a separate specific question about getting gres working was 
appropriate. That thread now seems resolved for me.

Gareth

> 
> On 01/05/2015 08:38 PM, [email protected] wrote:
> > Hi,
> >
> > We are busy configuring a Gres for counting /dev/shm space (calling
> > it 'memdir' and not being too worried about enforcement, just
> > separating jobs that request it and need separation) and got caught out
> > by a typo on http://slurm.schedmd.com/gres.html where the example has
> > GresType=gpu,bandwith rather than GresTypes=...
> >
> > Could you please fix the doc!
> >
> > BTW. Slurm was quite ungracious about having that bad entry in
> > slurm.conf
> >
> > Regards,
> >
> > Gareth
> 
> --
> Ryan Cox
> Operations Director
> Fulton Supercomputing Lab
> Brigham Young University
