Re: [slurm-users] NVML autodetect "Failed to get supported memory frequencies" error

2021-03-05 Thread Joshua Baker-LePain
On Fri, 5 Mar 2021 at 7:51am, Kilian Cavalotti wrote:

> Hi Joshua,
>
> On Thu, Mar 4, 2021 at 8:38 PM Joshua Baker-LePain wrote:
>> slurmd: error: _nvml_get_mem_freqs: Failed to get supported memory frequencies
>> slurmd: error: for the GPU : Not Supported
>> slurmd: 4 GPU system device(s) detected

[slurm-users] NVML autodetect "Failed to get supported memory frequencies" error

2021-03-04 Thread Joshua Baker-LePain
rce_gtx_1080 Count:1 Cores(1024):0-27
slurmd: Links:-1,0,0,0 Flags:HAS_FILE,HAS_TYPE File:/dev/nvidia0

My googling has utterly failed me on this. Any help?

Thanks!

--
Joshua Baker-LePain
Wynton Cluster Sysadmin
UCSF

Re: [slurm-users] Setup for backup slurmctld

2020-02-26 Thread Joshua Baker-LePain
?

--
Joshua Baker-LePain
Wynton Cluster Sysadmin
UCSF

[slurm-users] Setup for backup slurmctld

2020-02-26 Thread Joshua Baker-LePain
private network). So, how are folks sharing the StateSaveLocation in such a setup? Any and all recommendations (including those with the 2 slurmctld servers in the same rack) welcome.

Thanks!

--
Joshua Baker-LePain
Wynton Cluster Sysadmin
UCSF