Hello

compute-0-12 reports inconsistent NUMA information (only 4 nodes instead
of 8, node 1 spans CPUs 8-15 and 24-31, and CPUs 48-63 belong to no node):

$ cat compute-0-12/sys/devices/system/node/node*/cpumap
00000000,000000ff
00000000,ff00ff00
00000000,00ff0000
0000ffff,00000000
$ cat compute-0-0/sys/devices/system/node/node*/cpumap
00000000,000000ff
00000000,0000ff00
00000000,00ff0000
00000000,ff000000
000000ff,00000000
0000ff00,00000000
00ff0000,00000000
ff000000,00000000

This is likely a BIOS bug, and indeed the working node compute-0-0 runs
an older BIOS (3.0a) than compute-0-12 (3.5). I would suggest trying the
latest 3.5a from
http://www.supermicro.com/support/resources/results.aspx
If that doesn't help, you should ask SuperMicro to provide the old 3.0a
(and report the issue to them).
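
To see why hwloc complains, you can compare each cpumap of compute-0-12
with the Socket P#0 cpuset (0x0000ffff) from the error message. Below is
a minimal Python sketch of that comparison. It is not hwloc's actual
code, and the names parse_cpumap and relation are made up for this
example.

```python
# Sketch only: these helper names are hypothetical, not part of hwloc.

def parse_cpumap(text):
    """Turn a sysfs cpumap such as '00000000,ff00ff00' into an integer bitmask."""
    return int(text.strip().replace(",", ""), 16)


def relation(a, b):
    """Describe how CPU set a relates to CPU set b."""
    common = a & b
    if common == 0:
        return "disjoint"
    if a == b:
        return "equal"
    if common == a:
        return "included"   # a is entirely inside b
    if common == b:
        return "contains"   # b is entirely inside a
    return "partial"        # intersection without inclusion


# Socket P#0 cpuset, taken from the hwloc error message.
SOCKET0 = 0x0000ffff

# node0..node3 cpumaps reported by compute-0-12.
COMPUTE_0_12 = [
    "00000000,000000ff",
    "00000000,ff00ff00",
    "00000000,00ff0000",
    "0000ffff,00000000",
]

for index, cpumap in enumerate(COMPUTE_0_12):
    print(f"node{index}: {relation(parse_cpumap(cpumap), SOCKET0)}")
```

Node 1 comes out as "partial": it shares CPUs 8-15 with the socket but
also contains CPUs 24-31, which is the "intersection without inclusion"
that hwloc reports. Node 0 is included in the socket, and nodes 2 and 3
are disjoint from it.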

Brice



On 10/02/2016 21:30, Fabricio Cannini wrote:
> Hello there
>
> I'm facing an issue with hwloc 1.5.3 (old, I know, but it's the stock
> CentOS 6 package) in that a single node emits this message whenever I
> run any MPI-enabled program.
>
> ****************************************************************************
>
> * Hwloc has encountered what looks like an error from the operating
> system.
> *
> * object (Socket P#0 cpuset 0x0000ffff) intersection without inclusion!
> * Error occurred in topology.c line 718
> *
> * Please report this error message to the hwloc user's mailing list,
> * along with the output from the hwloc-gather-topology.sh script.
> ****************************************************************************
>
>
>
> This happens on only one node. Other similar nodes (same hardware, same
> OS, same software) run fine. The OS is CentOS 6.5 x86_64.
>
>
> In the attachments, 'compute-0-0' is the healthy node, 'compute-0-12'
> is the quirky one.
>
>
> Is it possible to pinpoint the faulty hardware from the attached outputs?
>
>
> TIA,
> Fabricio
