> -----Original Message-----
> From: Ryan Cox [mailto:[email protected]]
> Sent: Wednesday, 7 January 2015 12:34 PM
> To: slurm-dev
> Subject: [slurm-dev] Re: GresTypes typo in docs
>
>
> Gareth,
>
> Maybe I'm missing something or your configuration is different, but
> /dev/shm is also controlled by cgroups. If you are using cgroups and
> requiring users to request memory, the GRES setup shouldn't be
> necessary since /dev/shm is accounted for like normal memory (though
> it's in a different memory.stat field from rss).
>
> For example:
> $ srun -n 1 --mem=100M dd if=/dev/zero of=/dev/shm/DELETEME bs=1M count=500
> slurmstepd: Exceeded step memory limit at some point. oom-killer likely killed a process.
> srun: error: m6-5-7: task 0: Killed
> srun: Force Terminated job step 5401199.0
>
> Ryan

Aah thanks. That is news to me. I think we still want some memdir
scheduling as by default shm is only 50% the size of memory (at least on
our SLES images). It does raise a bunch of questions about cleaning up
cgroups and writing to shm from one cgroup and reading from another.

On the other hand, we are looking into this setup as a matter of
principle rather than an immediate real need - and as a way of getting
familiar with slurm's gres which we will need for gpu and phi soon.

We don't currently have the same hard limit you illustrate above - I
guess our cpuset config is currently only limiting access to cores (and
I can keep populating /dev/shm until beyond the specified memory limit):

> cat /etc/slurm/cgroup.conf
###
# Slurm cgroup support configuration file
###
#CgroupMountpoint=/sys/fs/cgroup
CgroupMountpoint=/dev/cgroup
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
#

BTW. The 'gres without plugin' thread
(https://groups.google.com/forum/#!topic/slurm-devel/EQQ1_msGLrc) is
probably more appropriate for this content. They are related threads but
the doco is fixed and I figured a separate specific question about
getting gres working was appropriate. That thread now seems resolved for
me.

Gareth

>
> On 01/05/2015 08:38 PM, [email protected] wrote:
> > Hi,
> >
> > We are busy configuring a Gres for counting /dev/shm space (calling
> > it 'memdir' and not being too worried about enforcement, just
> > separating jobs that request it and need separation) and got caught
> > out by a typo on http://slurm.schedmd.com/gres.html where the example
> > has GresType=gpu,bandwith rather than GresTypes=...
> >
> > Could you please fix the doc!
> >
> > BTW. Slurm was quite ungracious about having that bad entry in
> > slurm.conf
> >
> > Regards,
> >
> > Gareth
>
> --
> Ryan Cox
> Operations Director
> Fulton Supercomputing Lab
> Brigham Young University
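
For anyone wanting to check Ryan's point that /dev/shm usage lands in a
different memory.stat field from rss, here is a rough Python sketch that
parses a cgroup memory.stat file into a dictionary. It is only an
illustration: the function names are made up for this example, the
sample numbers are invented, and which field holds tmpfs pages ("cache",
or "shmem" on newer kernels) is an assumption you should confirm against
your kernel's cgroup documentation. The location of memory.stat for a
job step depends on your CgroupMountpoint and is not shown here.

```python
from pathlib import Path


def parse_memory_stat(text):
    """Parse 'name value' lines from memory.stat text into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not a simple "<name> <non-negative integer>" pair.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def read_memory_stat(cgroup_dir):
    """Read memory.stat from a cgroup directory (the caller supplies the path)."""
    return parse_memory_stat((Path(cgroup_dir) / "memory.stat").read_text())


if __name__ == "__main__":
    # Invented sample: a step that wrote about 500 MB to /dev/shm
    # while using very little anonymous memory.
    sample = "cache 524288000\nrss 1048576\nmapped_file 0\n"
    stats = parse_memory_stat(sample)
    for name in ("rss", "cache"):
        print(name, stats.get(name, 0))
```

Calling read_memory_stat() on the directory of a running job step's
memory cgroup should show whether /dev/shm writes are being counted
there at all, which is a quick way to tell a cores-only setup like the
one above apart from one that also constrains memory.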
