For the benefit of anyone else who comes across this, I've managed to resolve 
the issue.

  1.  Remove the affected node entries from slurm.conf on the slurmctld host
  2.  Restart slurmctld
  3.  Re-add the nodes to slurm.conf on the slurmctld host
  4.  Restart slurmctld again

Following this, the Gres= lines in `scontrol show node ...` display the new 
type. I guess this means slurmctld was persisting some state about the previous 
gres type somewhere (though I'm not sure where), and that removing the node 
from slurm.conf and restarting flushed it.
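
In concrete terms, the sequence was roughly the following (assuming slurmctld 
is managed by systemd; adjust for your service manager):

# 1. Comment out / remove the affected NodeName= lines in /etc/slurm/slurm.conf
# 2. Restart the controller so it drops the node entirely
systemctl restart slurmctld
# 3. Re-add the NodeName= lines with the new Gres= type
# 4. Restart again so the node registers with fresh GRES state
systemctl restart slurmctld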

--
Regards,
Ben Roberts

From: Ben Roberts
Sent: 19 June 2023 11:57
To: slurm-users@lists.schedmd.com
Subject: GPU Gres Type inconsistencies

Hi all,

I'm trying to set up GPU Gres Types to correctly identify the installed 
hardware (generation and memory size). I'm using a mix of explicit 
configuration (to set a friendly type name) and autodetection (to handle the 
cores and links detection). I'm seeing two related issues which I don't 
understand.

  1.  The output of `scontrol show node` reports `Gres=gpu:tesla:2` instead of 
the type I'm specifying in the config file (`v100s-pcie-32gb`)
  2.  Scheduling jobs with a generic `--gpus 1` works fine, but attempts to 
specify the gpu type (e.g. `--gres gpu:v100s-pcie-32gb:1`; examples below) fail 
with `error: Unable to allocate resources: Requested node configuration is not 
available`
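
For reference, the submissions look roughly like this (`job.sh` is just a 
placeholder script):

# Generic GPU request - schedules fine
sbatch --gpus 1 job.sh
# Typed GPU request - fails with "Requested node configuration is not available"
sbatch --gres gpu:v100s-pcie-32gb:1 job.sh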

If I've understood the documentation 
(https://slurm.schedmd.com/gres.conf.html#OPT_Type) correctly, I should be able 
to use any substring of the name nvml detects for the card 
(`tesla_v100s-pcie-32gb`) as the Type string. With the gres debug flag set, I 
can see the GPUs are detected and matched up with the static entries in 
gres.conf correctly. I don't see any mention of Type=tesla in the logs, so I'm 
at a loss as to why `scontrol show node` reports `gpu:tesla` instead of 
`gpu:v100s-pcie-32gb` as configured. I presume this mismatch is the cause of 
the scheduling failure: the job spec matches the configured gpu type and should 
be schedulable, but the scheduler doesn't see any resources of that type 
available.
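
To spell out my reading of that doc: since nvml reports the card as 
`tesla_v100s-pcie-32gb`, each of the following Type values should be an 
acceptable match (the first is what I used previously, the second is what I 
have now, and the third is the full detected name, which I've also tried; see 
below):

Name=gpu Type=tesla File=/dev/nvidia0
Name=gpu Type=v100s-pcie-32gb File=/dev/nvidia0
Name=gpu Type=tesla_v100s-pcie-32gb File=/dev/nvidia0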

The "tesla" string is the first "word" of the autodetected type, but I can't 
see why it would be being truncated to just this rather than using the whole 
string. I did previously use the type "tesla" in the config, which worked fine 
since everything matched up, but since does not adequately describe the 
hardware so I need to change this to be more specific. Is there anywhere other 
than slurm.conf or gres.conf where the old gpu type might be persisted and need 
purging?
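
In case it's relevant, the only other place I can think of is slurmctld's own 
saved state; assuming a fairly standard setup, this is where I'd expect that to 
live:

# Print the directory where slurmctld persists node/job state between restarts
scontrol show config | grep -i StateSaveLocation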

I've tried using `scontrol update node=gpu2 gres=gpu:v100s-pcie-32gb:0` to 
manually change the gres type (trying to set the number of GPUs to 2 here is 
rejected, but 0 is accepted). `scontrol reconfig` then causes the `scontrol 
show node` output to update to `Gres=gpu:v100s-pcie-32gb:2` as expected, but 
removes the gpus from CfgTRES. After restarting slurmctld, the Gres and CfgTRES 
entries briefly match up for all nodes, but very shortly afterwards the Gres 
entries revert to `Gres=gpu:tesla:0`, so I'm back to square one.
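
For clarity, the full sequence I tried was roughly (assuming slurmctld is 
restarted via systemd):

# Manually override the node's gres (a count of 2 is rejected, 0 is accepted)
scontrol update node=gpu2 gres=gpu:v100s-pcie-32gb:0
# Re-read the config; Gres= updates in scontrol show node, but the gpus drop out of CfgTRES
scontrol reconfig
scontrol show node gpu2
# Gres and CfgTRES match briefly after this, then revert to gpu:tesla:0
systemctl restart slurmctld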

I've also tried using the full `tesla_v100s-pcie-32gb` string as the type, but 
this has no effect; the gres type is still reported as `gpu:tesla`. This is all 
with Slurm 23.02.3 on Rocky Linux 8.8, using 
cuda-nvml-devel-12-0-12.0.140-1.x86_64. Excerpts from the configs and logs are 
shown below.

Can anyone point me in the right direction on how to solve this? Thanks,

# /etc/slurm/gres.conf
Name=gpu Type=v100s-pcie-32gb File=/dev/nvidia0
Name=gpu Type=v100s-pcie-32gb File=/dev/nvidia1
AutoDetect=nvml

# /etc/slurm/slurm.conf (identical on all nodes)
AccountingStorageTRES=gres/gpu,gres/gpu:v100s-pcie-32gb,gres/gpu:v100-pcie-32gb
EnforcePartLimits=ANY
GresTypes=gpu
NodeName=gpu2 CoresPerSocket=8 CPUs=8 Gres=gpu:v100s-pcie-32gb:2 Sockets=1 ThreadsPerCore=1

# scontrol show node gpu2
NodeName=gpu2 Arch=x86_64 CoresPerSocket=8
   CPUAlloc=0 CPUEfctv=8 CPUTot=8 CPULoad=0.02
   AvailableFeatures=...
   Gres=gpu:tesla:0(S:0)
   NodeAddr=gpu2.example.com NodeHostName=gpu2 Version=23.02.3
   OS=Linux 4.18.0-477.13.1.el8_8.x86_64 #1 SMP Tue May 30 22:15:39 UTC 2023
   RealMemory=331301 AllocMem=0 FreeMem=334102 Sockets=1 Boards=1
   MemSpecLimit=500
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=gpu
   BootTime=2023-06-14T23:03:05 SlurmdStartTime=2023-06-18T23:25:21
   LastBusyTime=2023-06-18T23:23:23 ResumeAfterTime=None
   CfgTRES=cpu=8,mem=331301M,billing=8,gres/gpu=2,gres/gpu:v100s-pcie-32gb=2
   AllocTRES=

# /var/log/slurm/slurmd.log (trimmed to only relevant lines for brevity)
[2023-06-19T11:29:25.629] GRES: Global AutoDetect=nvml(1)
[2023-06-19T11:29:25.629] debug:  gres/gpu: init: loaded
[2023-06-19T11:29:25.629] debug:  gpu/nvml: init: init: GPU NVML plugin loaded
[2023-06-19T11:29:26.265] debug2: gpu/nvml: _nvml_init: Successfully 
initialized NVML
[2023-06-19T11:29:26.265] debug:  gpu/nvml: _get_system_gpu_list_nvml: Systems 
Graphics Driver Version: 525.105.17
[2023-06-19T11:29:26.265] debug:  gpu/nvml: _get_system_gpu_list_nvml: NVML 
Library Version: 12.525.105.17
[2023-06-19T11:29:26.265] debug2: gpu/nvml: _get_system_gpu_list_nvml: NVML API 
Version: 11
[2023-06-19T11:29:26.265] debug2: gpu/nvml: _get_system_gpu_list_nvml: Total 
CPU count: 8
[2023-06-19T11:29:26.265] debug2: gpu/nvml: _get_system_gpu_list_nvml: Device 
count: 2
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: GPU 
index 0:
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     
Name: tesla_v100s-pcie-32gb
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     
UUID: GPU-1ef493da-bf08-60a4-8afb-4db79646f86e
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     PCI 
Domain/Bus/Device: 0:11:0
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     PCI 
Bus ID: 00000000:0B:00.0
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     
NVLinks: -1,0
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     
Device File (minor number): /dev/nvidia0
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     CPU 
Affinity Range - Machine: 0-7
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     Core 
Affinity Range - Abstract: 0-7
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     MIG 
mode: disabled
[2023-06-19T11:29:26.302] debug2: Possible GPU Memory Frequencies (1):
[2023-06-19T11:29:26.302] debug2: -------------------------------
[2023-06-19T11:29:26.302] debug2:     *1107 MHz [0]
[2023-06-19T11:29:26.302] debug2:         Possible GPU Graphics Frequencies 
(196):
[2023-06-19T11:29:26.302] debug2:         ---------------------------------
[2023-06-19T11:29:26.302] debug2:           *1597 MHz [0]
[2023-06-19T11:29:26.302] debug2:           *1590 MHz [1]
[2023-06-19T11:29:26.302] debug2:           ...
[2023-06-19T11:29:26.302] debug2:           *870 MHz [97]
[2023-06-19T11:29:26.302] debug2:           ...
[2023-06-19T11:29:26.302] debug2:           *142 MHz [194]
[2023-06-19T11:29:26.302] debug2:           *135 MHz [195]
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: GPU 
index 1:
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     
Name: tesla_v100s-pcie-32gb
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     
UUID: GPU-0e7d20b1-5a0f-8ef6-5120-970bd26210bb
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     PCI 
Domain/Bus/Device: 0:19:0
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     PCI 
Bus ID: 00000000:13:00.0
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     
NVLinks: 0,-1
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     
Device File (minor number): /dev/nvidia1
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     CPU 
Affinity Range - Machine: 0-7
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     Core 
Affinity Range - Abstract: 0-7
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     MIG 
mode: disabled
[2023-06-19T11:29:26.303] debug2: Possible GPU Memory Frequencies (1):
[2023-06-19T11:29:26.303] debug2: -------------------------------
[2023-06-19T11:29:26.303] debug2:     *1107 MHz [0]
[2023-06-19T11:29:26.303] debug2:         Possible GPU Graphics Frequencies 
(196):
[2023-06-19T11:29:26.303] debug2:         ---------------------------------
[2023-06-19T11:29:26.303] debug2:           *1597 MHz [0]
[2023-06-19T11:29:26.303] debug2:           *1590 MHz [1]
[2023-06-19T11:29:26.303] debug2:           ...
[2023-06-19T11:29:26.303] debug2:           *870 MHz [97]
[2023-06-19T11:29:26.303] debug2:           ...
[2023-06-19T11:29:26.303] debug2:           *142 MHz [194]
[2023-06-19T11:29:26.303] debug2:           *135 MHz [195]
[2023-06-19T11:29:26.303] gpu/nvml: _get_system_gpu_list_nvml: 2 GPU system 
device(s) detected
[2023-06-19T11:29:26.303] Gres GPU plugin: Merging configured GRES with system 
GPUs
[2023-06-19T11:29:26.303] debug2: gres/gpu: _merge_system_gres_conf: 
gres_list_conf:
[2023-06-19T11:29:26.303] debug2:     GRES[gpu] Type:v100s-pcie-32gb Count:1 
Cores(8):(null)  Links:(null) 
Flags:HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT 
File:/dev/nvidia0 UniqueId:(null)
[2023-06-19T11:29:26.303] debug2:     GRES[gpu] Type:v100s-pcie-32gb Count:1 
Cores(8):(null)  Links:(null) 
Flags:HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT 
File:/dev/nvidia1 UniqueId:(null)
[2023-06-19T11:29:26.303] debug:  gres/gpu: _merge_system_gres_conf: Including 
the following GPU matched between system and configuration:
[2023-06-19T11:29:26.303] debug:      GRES[gpu] Type:v100s-pcie-32gb Count:1 
Cores(8):0-7  Links:-1,0 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia0 
UniqueId:(null)
[2023-06-19T11:29:26.303] debug:  gres/gpu: _merge_system_gres_conf: Including 
the following GPU matched between system and configuration:
[2023-06-19T11:29:26.303] debug:      GRES[gpu] Type:v100s-pcie-32gb Count:1 
Cores(8):0-7  Links:0,-1 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia1 
UniqueId:(null)
[2023-06-19T11:29:26.303] debug2: gres/gpu: _merge_system_gres_conf: 
gres_list_gpu
[2023-06-19T11:29:26.303] debug2:     GRES[gpu] Type:v100s-pcie-32gb Count:1 
Cores(8):0-7  Links:-1,0 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia0 
UniqueId:(null)
[2023-06-19T11:29:26.303] debug2:     GRES[gpu] Type:v100s-pcie-32gb Count:1 
Cores(8):0-7  Links:0,-1 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia1 
UniqueId:(null)
[2023-06-19T11:29:26.303] Gres GPU plugin: Final merged GRES list:
[2023-06-19T11:29:26.303]     GRES[gpu] Type:v100s-pcie-32gb Count:1 
Cores(8):0-7  Links:-1,0 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia0 
UniqueId:(null)
[2023-06-19T11:29:26.303]     GRES[gpu] Type:v100s-pcie-32gb Count:1 
Cores(8):0-7  Links:0,-1 Flags:HAS_FILE,HAS_TYPE,ENV_NVML File:/dev/nvidia1 
UniqueId:(null)
[2023-06-19T11:29:26.303] GRES: _set_gres_device_desc : /dev/nvidia0 major 195, 
minor 0
[2023-06-19T11:29:26.303] GRES: _set_gres_device_desc : /dev/nvidia1 major 195, 
minor 1
[2023-06-19T11:29:26.303] GRES: gpu device number 0(/dev/nvidia0):c 195:0 rwm
[2023-06-19T11:29:26.303] GRES: gpu device number 1(/dev/nvidia1):c 195:1 rwm
[2023-06-19T11:29:26.303] Gres Name=gpu Type=v100s-pcie-32gb Count=1 Index=0 
ID=7696487 File=/dev/nvidia0 Cores=0-7 CoreCnt=8 Links=-1,0 
Flags=HAS_FILE,HAS_TYPE,ENV_NVML
[2023-06-19T11:29:26.303] Gres Name=gpu Type=v100s-pcie-32gb Count=1 Index=1 
ID=7696487 File=/dev/nvidia1 Cores=0-7 CoreCnt=8 Links=0,-1 
Flags=HAS_FILE,HAS_TYPE,ENV_NVML
[2023-06-19T11:29:26.303] CPU frequency setting not configured for this node
[2023-06-19T11:29:26.304] slurmd version 23.02.3 started
[2023-06-19T11:29:26.306] slurmd started on Mon, 19 Jun 2023 11:29:26 +0100
[2023-06-19T11:29:26.307] CPUs=8 Boards=1 Sockets=1 Cores=8 Threads=1 
Memory=338063 TmpDisk=2048 Uptime=390381 CPUSpecList=(null) 
FeaturesAvail=(null) FeaturesActive=(null)
[2023-06-19T11:29:26.310] debug:  _handle_node_reg_resp: slurmctld sent back 14 
TRES.

--
Regards,
Ben Roberts
