Hi,

I'm not sure how it works in 19.05, but with 18.x it is possible to specify CPU affinity in the file /etc/slurm/gres.conf:
Name=gpu Type=v100 File=/dev/nvidia0 CPUs=0-17,36-53
Name=gpu Type=v100 File=/dev/nvidia1 CPUs=0-17,36-53
Name=gpu Type=v100 File=/dev/nvidia2 CPUs=18-35,54-71
Name=gpu Type=v100 File=/dev/nvidia3 CPUs=18-35,54-71
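For the gres.conf entries to take effect, slurm.conf must also declare the GRES type and advertise the GPUs on the node. A minimal sketch, assuming a node named gpu-node01 with the 2-socket, 18-core, 2-thread topology shown by nvidia-smi below (adjust names and counts to your cluster):

```
# slurm.conf (sketch; NodeName, core/thread counts are example values)
GresTypes=gpu
NodeName=gpu-node01 Gres=gpu:v100:4 Sockets=2 CoresPerSocket=18 ThreadsPerCore=2
```

With the CPUs= ranges in gres.conf mapping each GPU to the cores of its socket, Slurm can then place socket-local jobs.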
You can get the CPU numbers with the command: nvidia-smi topo -m

        GPU0  GPU1  GPU2  GPU3  mlx5_0  CPU Affinity
GPU0     X    NV2   NV2   NV2   NODE    0-17,36-53
GPU1    NV2    X    NV2   NV2   NODE    0-17,36-53
GPU2    NV2   NV2    X    NV2   SYS     18-35,54-71
GPU3    NV2   NV2   NV2    X    SYS     18-35,54-71
mlx5_0  NODE  NODE  SYS   SYS    X

Best regards,
Daniel

On 27.06.2019 15:17, Luis Altenkort wrote:
Hello everyone,

I have several nodes with 2 sockets each and 4 GPUs per socket (i.e. 8 GPUs per node). I now want to tell Slurm that the GPUs with device IDs 0,1,2,3 are connected to socket 0 and GPUs 4,5,6,7 are connected to socket 1, so that I can use the new option --gpus-per-socket. All GPUs on one socket are directly linked via NVLink and use P2P communication. In the end I want to be able to run multi-GPU jobs on GPUs that are all on one socket (and not distributed across sockets or nodes). What do I have to change in my slurm.conf, and how would I submit jobs? Like this?

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=volta
#SBATCH --gpus-per-socket=4
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

Slurm is on version 19.05.0. Thanks in advance!
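For the submission side, a hedged sketch of a batch script that requests one socket's worth of NVLink-connected GPUs (the --gpus-per-socket and --sockets-per-node options exist as of Slurm 19.05; the partition name, task counts, and application binary are assumptions, not taken from the thread):

```
#!/bin/bash
#SBATCH --job-name=nvlink-test        # example job name
#SBATCH --partition=volta
#SBATCH --nodes=1
#SBATCH --sockets-per-node=1          # confine the allocation to a single socket
#SBATCH --gpus-per-socket=4           # all 4 NVLink-linked GPUs on that socket
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1

srun ./my_gpu_app                     # hypothetical application binary
```

Combining --sockets-per-node=1 with --gpus-per-socket=4 is what keeps the four GPUs on one socket rather than letting the scheduler spread them across sockets or nodes.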