Hello
I don't have access to an MI300A, but I worked with AMD several months ago
to solve a very similar issue. It was caused by a buggy ACPI HMAT in the
BIOS.
Try setting HWLOC_USE_NUMA_DISTANCES=0 in the environment to disable the
hwloc code that uses this HMAT info. If the warning goes away, then you
need to get a more recent firmware. That said, it would be annoying if
this old buggy firmware were still in the wild 4 months later.
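
A rough sketch of that check, for anyone who wants to script it. The helper names (hwloc_env, has_os_warning, run_lstopo) are made up for illustration and are not part of hwloc. It assumes lstopo or lstopo-no-graphics is in the PATH, that it accepts "--of console", and that the warning text is the one quoted below.

```python
import os
import shutil
import subprocess

# Substring taken from the warning quoted in this thread.
WARNING_MARKER = "received invalid information from the operating system"


def hwloc_env(disable_numa_distances, base=None):
    """Return a copy of the environment, optionally with HWLOC_USE_NUMA_DISTANCES=0."""
    env = dict(os.environ if base is None else base)
    if disable_numa_distances:
        env["HWLOC_USE_NUMA_DISTANCES"] = "0"
    return env


def has_os_warning(text):
    """True if the hwloc "operating system" warning appears in the given output."""
    return WARNING_MARKER in text


def run_lstopo(disable_numa_distances):
    """Run lstopo and return stdout+stderr, or None if lstopo is not installed."""
    exe = shutil.which("lstopo-no-graphics") or shutil.which("lstopo")
    if exe is None:
        return None
    proc = subprocess.run(
        [exe, "--of", "console"],  # assumption: console output format is available
        env=hwloc_env(disable_numa_distances),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # hwloc warnings are printed on stderr
        text=True,
    )
    return proc.stdout


if __name__ == "__main__":
    default_out = run_lstopo(disable_numa_distances=False)
    disabled_out = run_lstopo(disable_numa_distances=True)
    if default_out is None or disabled_out is None:
        print("lstopo not found in PATH")
    elif has_os_warning(default_out) and not has_os_warning(disabled_out):
        print("Warning disappears with HWLOC_USE_NUMA_DISTANCES=0: suspect the firmware HMAT")
    elif has_os_warning(default_out):
        print("Warning persists: probably not (only) the HMAT")
    else:
        print("No warning seen")
```

If the first message is printed, that matches the buggy-HMAT case described above. Any other result means the variable alone does not explain the warning.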
Brice
On 08/02/2024 at 18:23, Hartman, John wrote:
Is there a timeline for hwloc to support the MI300A? Currently, hwloc
isn’t happy when it encounters one:
* hwloc 2.9.0 received invalid information from the operating system.
*
* Failed with: intersection without inclusion
* while inserting Group0 (cpuset 0x00ff,0x,0x00ff,0x00ff,0x,0x00ff) at Group0 (cpuset 0x,0x,,0x,0x)
* coming from: linux:sysfs:numa
*
* The following FAQ entry in the hwloc documentation may help:
* What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the files generated by the hwloc-gather-topology script.
*
* hwloc will now ignore this invalid topology information and continue.
John
___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users