Hello,

compute-0-12 reports totally buggy NUMA information:
$ cat compute-0-12/sys/devices/system/node/node*/cpumap
00000000,000000ff
00000000,ff00ff00
00000000,00ff0000
0000ffff,00000000
$ cat compute-0-0/sys/devices/system/node/node*/cpumap
00000000,000000ff
00000000,0000ff00
00000000,00ff0000
00000000,ff000000
000000ff,00000000
0000ff00,00000000
00ff0000,00000000
ff000000,00000000

This is likely a BIOS bug, and indeed the BIOS is older on compute-0-0 (3.0a instead of 3.5). I would suggest trying the latest 3.5a from http://www.supermicro.com/support/resources/results.aspx

If that doesn't help, you should ask SuperMicro to provide the old 3.0a (and report the issue).

Brice


On 10/02/2016 21:30, Fabricio Cannini wrote:
> Hello there,
>
> I'm facing an issue with hwloc 1.5.3 (old, I know, but it's the stock
> CentOS 6 package): a single node emits this message whenever I
> run any MPI-enabled program.
>
> ****************************************************************************
> * Hwloc has encountered what looks like an error from the operating system.
> *
> * object (Socket P#0 cpuset 0x0000ffff) intersection without inclusion!
> * Error occurred in topology.c line 718
> *
> * Please report this error message to the hwloc user's mailing list,
> * along with the output from the hwloc-gather-topology.sh script.
> ****************************************************************************
>
> This happens on only one node. Other similar nodes (same hardware, same
> OS, same software) run fine. The OS is CentOS 6.5 x86_64.
>
> In the attachments, 'compute-0-0' is the healthy node, 'compute-0-12'
> is the quirky one.
>
> Is it possible to identify the faulty hardware from the attached outputs?
>
> TIA,
> Fabricio
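The error quoted above is hwloc's consistency check failing. Socket P#0 has cpuset 0x0000ffff (CPUs 0-15), while compute-0-12 reports node1 as 00000000,ff00ff00 (CPUs 8-15 and 24-31), so the two sets intersect without either one containing the other. As a rough sketch, the same comparison can be done by hand on the raw sysfs masks. `mask_relation` is a hypothetical bash helper, not an hwloc tool. It assumes the comma-separated hex format printed by the cat commands above and compares the masks one 32-bit word at a time to stay within bash's integer range.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hwloc): classify how two cpumask strings
# in the sysfs "cpumap" format (e.g. "00000000,ff00ff00") relate, in the
# spirit of hwloc's "intersection without inclusion" check.
mask_relation() {
  # Drop the comma separators between 32-bit words.
  local a=${1//,/} b=${2//,/}
  local len=${#a}
  (( ${#b} > len )) && len=${#b}
  # Round up to whole 32-bit words (8 hex digits each).
  (( len = (len + 7) / 8 * 8 ))
  # Left-pad both masks with zeros so that words line up.
  a=$(printf '%*s' "$len" "$a"); a=${a// /0}
  b=$(printf '%*s' "$len" "$b"); b=${b// /0}

  local any=0 a_in_b=1 b_in_a=1 i wa wb w
  for (( i = 0; i < len; i += 8 )); do
    wa=$((16#${a:i:8}))
    wb=$((16#${b:i:8}))
    w=$(( wa & wb ))          # CPUs present in both masks for this word
    (( w != 0 )) && any=1
    (( w != wa )) && a_in_b=0 # some CPU of a is missing from b
    (( w != wb )) && b_in_a=0 # some CPU of b is missing from a
  done

  if (( !any )); then
    echo "disjoint"
  elif (( a_in_b || b_in_a )); then
    echo "inclusion"
  else
    echo "intersection without inclusion"
  fi
}
```

For example, `mask_relation 0000ffff 00000000,ff00ff00` (Socket P#0 against compute-0-12's node1) prints `intersection without inclusion`, whereas `mask_relation 00000000,000000ff 0000ffff` (node0 against the same socket) prints `inclusion`.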