Thanks for the tips, Kilian; they really pointed me in the right direction.

It turns out the issue was that the CPU IDs we were using in gres.conf were based on how our system was identifying them, when they really needed to be in the platform-agnostic format described in the gres.conf docs:

CPU_ID = Board_ID x threads_per_board + Socket_ID x threads_per_socket + Core_ID x threads_per_core + Thread_ID
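To make that concrete, here is a worked example with a made-up topology (not our actual node): a single-board machine with 2 sockets, 8 cores per socket and 2 threads per core has threads_per_board=32, threads_per_socket=16 and threads_per_core=2, so the hardware thread on socket 1, core 3, thread 0 is

CPU_ID = 0 x 32 + 1 x 16 + 3 x 2 + 0 = 22

even though the operating system may report that same thread under a completely different logical CPU number (for instance 7 on a box that interleaves sockets), which is the kind of mismatch we were running into.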
-----Original Message-----
From: Kilian Cavalotti [mailto:kilian.cavalotti.w...@gmail.com]
Sent: Thursday, October 26, 2017 10:39 AM
To: slurm-dev <slurm-dev@schedmd.com>
Cc: Vipin Sirohi <vsir...@nvidia.com>
Subject: [slurm-dev] Re: CPU/GPU Affinity Not Working

Hi Dave,

On Wed, Oct 25, 2017 at 9:23 PM, Dave Sizer <dsi...@nvidia.com> wrote:
> For some reason, we are observing that the preferred CPUs defined in
> gres.conf for GPU devices are being ignored when running jobs. That
> is, in our gres.conf we have gpu resource lines, such as:
>
> Name=gpu Type=kepler File=/dev/nvidia0
> CPUs=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
> Name=gpu Type=kepler File=/dev/nvidia4
> CPUs=8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31

In passing, you can use range notation for CPU indexes, and make it more compact:

Name=gpu Type=kepler File=/dev/nvidia0 CPUs=[0-7,16-23]
Name=gpu Type=kepler File=/dev/nvidia4 CPUs=[8-15,24-31]

> but when we run a job with the second gpu allocated,
> /sys/fs/cgroup/cpuset/slurm/…./cpuset.cpus reports that the job has
> been allocated cpus from the first gpu’s set. It seems as if the
> CPU/GPU affinity in gres.conf is being completely ignored. Slurmd.log
> doesn’t seem to mention anything about it with maximum debug verbosity.

You can try to use DebugFlags=CPU_Bind,gres in your slurm.conf for more details.

> We have tried the following TaskPlugin settings: “task/affinity,task/cgroup”
> and just “task/cgroup”. In both cases we have tried setting
> TaskPluginParam to “Cpuset”. All of these configurations produced the
> same incorrect results.

We use this:

SelectType=select/cons_res
SelectTypeParameters=CR_CORE_MEMORY
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

and for a 4-GPU node which has a gres.conf like this (don't ask, some vendors like their CPU ids alternating between sockets):

NodeName=sh-114-03 name=gpu File=/dev/nvidia[0-1] CPUs=0,2,4,6,8,10,12,14,16,18
NodeName=sh-114-03 name=gpu File=/dev/nvidia[2-3] CPUs=1,3,5,7,9,11,13,15,17,19

we can submit 4 jobs using 1 GPU each, which end up getting a CPU id that matches the allocated GPU:

$ sbatch --array=1-4 -p gpu -w sh-114-03 --gres=gpu:1 --wrap="sleep 100"
Submitted batch job 2669681

$ scontrol -dd show job 2669681 | grep CPU_ID | sort
Nodes=sh-114-03 CPU_IDs=0 Mem=12800 GRES_IDX=gpu(IDX:0)
Nodes=sh-114-03 CPU_IDs=1 Mem=12800 GRES_IDX=gpu(IDX:2)
Nodes=sh-114-03 CPU_IDs=2 Mem=12800 GRES_IDX=gpu(IDX:1)
Nodes=sh-114-03 CPU_IDs=3 Mem=12800 GRES_IDX=gpu(IDX:3)

How do you check which GPU your job has been allocated?

Cheers,
--
Kilian
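On Kilian's closing question: a minimal way to check from inside the job which GPU and CPUs it actually received (assuming Slurm is setting CUDA_VISIBLE_DEVICES for GPU gres; the partition name is just a placeholder) would be something like:

$ srun -p gpu --gres=gpu:1 bash -c 'echo GPU: $CUDA_VISIBLE_DEVICES; taskset -cp $$'

The affinity list printed by taskset can then be compared against the CPUs= line for that GPU in gres.conf, or against the scontrol -dd show job output as in Kilian's example above (keeping in mind that scontrol's CPU_IDs use Slurm's abstract numbering, which may differ from the OS numbering that taskset reports).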