On Monday, 25 March 2019 2:30:34 AM PDT Peter Steinbach wrote:

> I observed a weird behavior of the '--gres-flags=disable-binding'
> option. With the above .conf files, I created a local slurm cluster with
> 3 computes (2 GPUs and 4 cores each).

First of all, you will want to use cgroups to ensure that processes that do not request GPUs cannot access them.

Secondly, do your CPUs have hyperthreading enabled by some chance? If so then your gres.conf is likely wrong, as you'll want to list the first HT on each core that you want to restrict access to.

From the manual page for gres.conf:

    NOTE: If your cores contain multiple threads only list the first
    thread of each core. The logic is such that it uses core instead
    of thread scheduling per GRES. Also note that since Slurm must be
    able to perform resource management on heterogeneous clusters
    having various core ID numbering schemes, an abstract index will
    be used instead of the physical core index. That abstract id may
    not correspond to your physical core number. Basically Slurm
    starts numbering from 0 to n, being 0 the id of the first
    processing unit (core or thread if HT is enabled) on the first
    socket, first core and maybe first thread, and then continuing
    sequentially to the next thread, core, and socket. The numbering
    generally coincides with the processing unit logical number
    (PU L#) seen in lstopo output.

All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
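
To make the numbering in the gres.conf excerpt concrete, here is a minimal Python sketch of a literal reading of it: IDs start at 0 and advance thread by thread, then core by core, then socket by socket, and only the first thread of each core is listed. The function name and the example topology (one socket, four cores, two threads per core) are assumptions for illustration, not taken from the original cluster. Treat the result as a starting point and check it against the PU L# values that lstopo reports on the node.

```python
def first_thread_ids(sockets, cores_per_socket, threads_per_core):
    """Return the abstract ID of the first hardware thread of every core.

    Hypothetical helper: it assumes IDs are assigned sequentially by
    thread, then core, then socket, as the gres.conf note describes.
    """
    ids = []
    for socket in range(sockets):
        for core in range(cores_per_socket):
            # Each core occupies threads_per_core consecutive IDs;
            # the first of them is the one to list.
            ids.append((socket * cores_per_socket + core) * threads_per_core)
    return ids


# Assumed example node: 1 socket, 4 cores, hyperthreading on (2 threads/core).
print(first_thread_ids(1, 4, 2))  # [0, 2, 4, 6]
```

With hyperthreading off (one thread per core) the same function simply returns 0, 1, 2, 3 for a four-core node, so the IDs only diverge from the plain core count when each core has more than one thread.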