Thanks for the tips, Kilian, this really pointed me in the right direction.  

It turns out the issue was that the CPU IDs we were using in gres.conf were the
ones our system was reporting, when they really need to be in the
platform-agnostic format (CPU_ID = Board_ID x threads_per_board + Socket_ID x
threads_per_socket + Core_ID x threads_per_core + Thread_ID; from the gres.conf
docs).
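
To make that concrete (hypothetical numbers, just to illustrate the formula): on
a single-board node with 2 sockets, 8 cores per socket, and 2 threads per core,
you get threads_per_board=32, threads_per_socket=16, and threads_per_core=2, so
the first thread of the fourth core on the second socket is

CPU_ID = 0 x 32 + 1 x 16 + 3 x 2 + 0 = 22

which isn't necessarily the ID the OS reports for that same thread.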

-----Original Message-----
From: Kilian Cavalotti [mailto:kilian.cavalotti.w...@gmail.com] 
Sent: Thursday, October 26, 2017 10:39 AM
To: slurm-dev <slurm-dev@schedmd.com>
Cc: Vipin Sirohi <vsir...@nvidia.com>
Subject: [slurm-dev] Re: CPU/GPU Affinity Not Working


Hi Dave,

On Wed, Oct 25, 2017 at 9:23 PM, Dave Sizer <dsi...@nvidia.com> wrote:
> For some reason, we are observing that the preferred CPUs defined in 
> gres.conf for GPU devices are being ignored when running jobs.  That 
> is, in our gres.conf we have gpu resource lines, such as:
>
> Name=gpu Type=kepler File=/dev/nvidia0
> CPUs=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
> Name=gpu Type=kepler File=/dev/nvidia4
> CPUs=8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31

In passing, you can use range notation for CPU indexes to make this more
compact:

Name=gpu Type=kepler File=/dev/nvidia0 CPUs=[0-7,16-23]
Name=gpu Type=kepler File=/dev/nvidia4 CPUs=[8-15,24-31]

> but when we run a job with the second gpu allocated, 
> /sys/fs/cgroup/cpuset/slurm/…./cpuset.cpus reports that the job has 
> been allocated cpus from the first gpu’s set.  It seems as if the 
> CPU/GPU affinity in gres.conf is being completely ignored.  Slurmd.log 
> doesn’t seem to mention anything about it with maximum debug verbosity.

You can try to use DebugFlags=CPU_Bind,gres in your slurm.conf for more details.
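
If you'd rather not restart the daemons, the flags can also be flipped at
runtime (assuming your Slurm is recent enough for scontrol's setdebugflags
subcommand):

scontrol setdebugflags +CPU_Bind
scontrol setdebugflags +Gres

then watch slurmd.log for the extra binding details.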

> We have tried the following TaskPlugin settings: “task/affinity,task/cgroup”
> and just “task/cgroup”.  In both cases we have tried setting 
> TaskPluginParam to “Cpuset”.  All of these configurations produced the 
> same incorrect results.

We use this:

SelectType=select/cons_res
SelectTypeParameters=CR_CORE_MEMORY
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

and for a 4-GPU node which has a gres.conf like this (don't ask, some vendors 
like their CPU ids alternating between sockets):

NodeName=sh-114-03 Name=gpu File=/dev/nvidia[0-1] CPUs=0,2,4,6,8,10,12,14,16,18
NodeName=sh-114-03 Name=gpu File=/dev/nvidia[2-3] CPUs=1,3,5,7,9,11,13,15,17,19

we can submit 4 jobs using 1 GPU each, which end up getting a CPU id that 
matches the allocated GPU:

$ sbatch --array=1-4 -p gpu -w sh-114-03 --gres=gpu:1 --wrap="sleep 100"
Submitted batch job 2669681

$ scontrol -dd show job 2669681 | grep CPU_ID | sort
    Nodes=sh-114-03 CPU_IDs=0 Mem=12800 GRES_IDX=gpu(IDX:0)
    Nodes=sh-114-03 CPU_IDs=1 Mem=12800 GRES_IDX=gpu(IDX:2)
    Nodes=sh-114-03 CPU_IDs=2 Mem=12800 GRES_IDX=gpu(IDX:1)
    Nodes=sh-114-03 CPU_IDs=3 Mem=12800 GRES_IDX=gpu(IDX:3)

How do you check which GPU your job has been allocated?
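(For instance, something along these lines from within the job, assuming the
gres plugin is setting CUDA_VISIBLE_DEVICES on your setup:

srun -p gpu --gres=gpu:1 bash -c 'echo $CUDA_VISIBLE_DEVICES'

I'm curious what you're looking at on your side.)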

Cheers,
--
Kilian

