Hello, is there a best practice for activating this feature (setting ConstrainDevices=yes)? Do I have to restart the slurmds? Does this affect running jobs?
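For context, this is roughly what I plan to put in place (a minimal sketch; the file paths and the device list in gres.conf are placeholders for our nodes, and I am assuming slurm.conf already uses the cgroup proctrack/task plugins):

    # /etc/slurm/cgroup.conf (sketch)
    CgroupAutomount=yes
    ConstrainCores=yes
    ConstrainRAMSpace=yes
    ConstrainDevices=yes
    # devices every job may always access (null, zero, urandom, ...):
    AllowedDevicesFile=/etc/slurm/cgroup_allowed_devices_file.conf

    # /etc/slurm/gres.conf on the GPU nodes (example device paths)
    Name=gpu File=/dev/nvidia0
    Name=gpu File=/dev/nvidia1

    # already set in slurm.conf
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/affinity,task/cgroup

Since cgroup.conf and gres.conf are read by slurmd, I assume the slurmds have to pick up the change somehow, hence my question about restarts and running jobs.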
We are using Slurm 19.05.

Best,
Stefan

On Tuesday, 25 August 2020 at 17:24:41 CEST, Christoph Brüning wrote:
> Hello,
>
> we're using cgroups to restrict access to the GPUs.
>
> What I found particularly helpful are the slides by Marshall Garey from
> last year's Slurm User Group Meeting:
> https://slurm.schedmd.com/SLUG19/cgroups_and_pam_slurm_adopt.pdf
> (NVML didn't work for us for some reason I cannot recall, but listing
> the GPU device files explicitly was not a big deal)
>
> Best,
> Christoph
>
> On 25/08/2020 16.12, Willy Markuske wrote:
> > Hello,
> >
> > I'm trying to restrict access to GPU resources on a cluster I maintain
> > for a research group. There are two nodes in a partition with GRES
> > GPU resources defined. Users can access these resources by submitting
> > their job to the gpu partition and specifying a gres=gpu.
> >
> > When a user includes the flag --gres=gpu:#, they are allocated that
> > number of GPUs and Slurm allocates them properly. If a user requests
> > only 1 GPU, they only see CUDA_VISIBLE_DEVICES=1. However, if a user
> > does not include the --gres=gpu:# flag, they can still submit a job to
> > the partition and are then able to see all the GPUs. This has led to
> > some bad actors running jobs on all GPUs, including those that other
> > users have allocated, causing OOM errors on the GPUs.
> >
> > Is it possible, and where would I find the documentation on doing so,
> > to require users to define --gres=gpu:# to be able to submit to a
> > partition? So far, reading the GRES documentation doesn't seem to have
> > yielded any word on this issue specifically.
> >
> > Regards,

--
Stefan Stäglich, Universität Freiburg, Institut für Informatik
Georges-Köhler-Allee, Geb. 52, 79110 Freiburg, Germany

E-Mail : staeg...@informatik.uni-freiburg.de
WWW    : gki.informatik.uni-freiburg.de
Telefon: +49 761 203-54216
Fax    : +49 761 203-8222