Re: [slurm-users] Wrong hwloc detected?

2021-11-09 Thread Chris Samuel
On 5/11/21 4:47 am, Diego Zuccato wrote: "How can Slurm detect such an old HWLOC version?" Looking at the code, it's not actually checking the hwloc version; it's finding an error condition and suggesting that may be the cause, but it sounds like that's not the case for you. src/plugins/task/cgroup/task

Re: [slurm-users] Wrong hwloc detected?

2021-11-08 Thread Diego Zuccato
Hi Ole. I'm using the packages from Debian stable (Slurm 20.11.4, hwloc 2.4.1), and I checked: hwloc is installed on all the nodes. That's to be expected, since it's a dependency of slurmd: https://packages.debian.org/bullseye/slurmd Since it's a dependency, I "suspect" slurmd is built with hwloc support. Diego On 07/1
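As a quick sanity check on the versions quoted here, the installed hwloc (2.4.1) can be compared against the release named in the slurmstepd message (1.11.5). This is a minimal sketch that compares dotted versions as integer tuples; it only shows that the installed library is newer than the fix, not what Slurm itself checks.

```python
from typing import Tuple

def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted release string such as '2.4.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

# Versions quoted in the thread.
INSTALLED_HWLOC = "2.4.1"  # Debian bullseye package, per the message above
BUGFIX_HWLOC = "1.11.5"    # release named in the slurmstepd error message

def has_bugfix(installed: str, fixed_in: str = BUGFIX_HWLOC) -> bool:
    """True if `installed` is at least the release that carries the fix."""
    return parse_version(installed) >= parse_version(fixed_in)

if __name__ == "__main__":
    print(has_bugfix(INSTALLED_HWLOC))
```

Comparing integer tuples rather than raw strings avoids the lexical trap where "1.9" would sort after "1.11".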

Re: [slurm-users] Wrong hwloc detected?

2021-11-07 Thread Ole Holm Nielsen
Hi Diego, Are you sure that the Slurm software installed on all compute nodes was actually built on a system which had the hwloc packages installed? They should also be installed on the compute nodes. The prerequisite packages are listed here: https://wiki.fysik.dtu.dk/niflheim/Slurm_instal
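One way to check whether a given slurmd binary was built against hwloc is to look for libhwloc among its shared-library dependencies. The sketch below assumes slurmd links hwloc dynamically (as the Debian package dependency suggests) and that the daemon lives at /usr/sbin/slurmd; both are assumptions, and `check_binary` is a hypothetical helper name.

```python
import shutil
import subprocess
from typing import List

def hwloc_libs(ldd_output: str) -> List[str]:
    """Return the libhwloc sonames listed in `ldd` output."""
    libs = []
    for line in ldd_output.splitlines():
        fields = line.split()
        # ldd lines look like: "libfoo.so.1 => /path/libfoo.so.1 (0x...)"
        if fields and fields[0].startswith("libhwloc"):
            libs.append(fields[0])
    return libs

def check_binary(path: str = "/usr/sbin/slurmd") -> List[str]:
    """Run ldd on `path` (assumed slurmd location) and report hwloc libraries."""
    ldd = shutil.which("ldd")
    if ldd is None:
        raise RuntimeError("ldd not found on this system")
    result = subprocess.run([ldd, path], capture_output=True, text=True, check=True)
    return hwloc_libs(result.stdout)

if __name__ == "__main__":
    print(check_binary())
```

An empty list would suggest the binary on that node does not pull in libhwloc at run time; running it on every compute node helps catch a node whose Slurm build differs from the rest.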

Re: [slurm-users] Wrong hwloc detected?

2021-11-05 Thread Diego Zuccato
They aren't using modules, so it must be something system-wide :( But not all jobs are impacted, and it seems a bit random (it doesn't always happen). I'm out of ideas at the moment :( On 05/11/2021 13:10, Ole Holm Nielsen wrote: On 11/5/21 12:47, Diego Zuccato wrote: Some users are report

Re: [slurm-users] Wrong hwloc detected?

2021-11-05 Thread Ole Holm Nielsen
On 11/5/21 12:47, Diego Zuccato wrote: Some users are reporting this error: slurmstepd-str957-mtx-01: error: hwloc_get_obj_below_by_type() failing, task/affinity plugin may be required to address bug fixed in HWLOC version 1.11.5 slurmstepd-str957-mtx-01: error: task[0] unable to set taskset

[slurm-users] Wrong hwloc detected?

2021-11-05 Thread Diego Zuccato
Hello all. Some users are reporting this error: slurmstepd-str957-mtx-01: error: hwloc_get_obj_below_by_type() failing, task/affinity plugin may be required to address bug fixed in HWLOC version 1.11.5 slurmstepd-str957-mtx-01: error: task[0] unable to set taskset '0x0' I checked on that nod
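The taskset value in the error is a hexadecimal CPU bitmask in which bit i stands for CPU i, so '0x0' selects no CPUs at all; the Linux sched_setaffinity() call rejects an empty mask with EINVAL. A minimal sketch for decoding such masks (the helper name `mask_to_cpus` is hypothetical) and comparing them with the CPUs the current process may use:

```python
import os
from typing import List

def mask_to_cpus(mask: str) -> List[int]:
    """Decode a taskset-style hex mask, e.g. '0x5' -> CPUs 0 and 2."""
    value = int(mask, 16)  # int() accepts the '0x' prefix when base is 16
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]

if __name__ == "__main__":
    print(mask_to_cpus("0x0"))  # empty list: no CPUs selected
    # os.sched_getaffinity is only available on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        print(sorted(os.sched_getaffinity(0)))
```

Seen this way, the message suggests the plugin ended up with an empty CPU set for task 0 rather than a wrong one, which is consistent with hwloc_get_obj_below_by_type() failing just before it.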