Re: [hwloc-users] MI300A support

2024-02-08 Thread Brice Goglin

Hello

I don't have access to an MI300A but I worked with AMD several months ago 
to solve a very similar issue. It was caused by a buggy ACPI HMAT in the 
BIOS.


Try setting HWLOC_USE_NUMA_DISTANCES=0 in the environment to disable the 
hwloc code that uses this HMAT info. If the warning goes away then you 
need to get a more recent firmware. That said, it would be annoying if 
this old buggy firmware is still in the wild 4 months later.
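
A minimal sketch of that check, for anyone who wants to script it: run the same command once with the environment untouched and once with HWLOC_USE_NUMA_DISTANCES=0, then compare whether the warning banner shows up. The helper names (warning_present, diagnose) and the returned labels are hypothetical and not part of hwloc; the only hwloc-specific pieces are the environment variable and the banner text quoted in this thread.

```python
import os
import subprocess

# Text taken from the warning banner quoted in this thread.
WARNING_MARKER = "received invalid information from the operating system"


def warning_present(command, numa_distances=None):
    """Run `command` and return True if the hwloc warning appears in its output.

    numa_distances=None leaves HWLOC_USE_NUMA_DISTANCES unset;
    any other value is exported to the child process as a string.
    """
    env = dict(os.environ)
    if numa_distances is None:
        env.pop("HWLOC_USE_NUMA_DISTANCES", None)
    else:
        env["HWLOC_USE_NUMA_DISTANCES"] = str(numa_distances)
    result = subprocess.run(command, env=env, capture_output=True, text=True)
    # Check both streams so the result does not depend on where the banner goes.
    return WARNING_MARKER in (result.stdout + result.stderr)


def diagnose(command):
    """Compare a default run with a run that sets HWLOC_USE_NUMA_DISTANCES=0."""
    if not warning_present(command):
        return "no-warning"
    if warning_present(command, 0):
        return "still-warns"      # the variable does not help here
    return "hmat-suspected"       # warning gone: look for newer firmware


# Example (assumes hwloc's lstopo-no-graphics is installed and in PATH):
#   print(diagnose(["lstopo-no-graphics"]))
```

A "hmat-suspected" result corresponds to the case above where the warning goes away; "still-warns" suggests the problem lies elsewhere and is worth reporting with the hwloc-gather-topology output.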


Brice


On 08/02/2024 at 18:23, Hartman, John wrote:


Is there a timeline for hwloc to support the MI300A? Currently, hwloc 
isn’t happy when it encounters one:

* hwloc 2.9.0 received invalid information from the operating system.
*
* Failed with: intersection without inclusion
* while inserting Group0 (cpuset 
0x00ff,0x,0x00ff,0x00ff,0x,0x00ff) at 
Group0 (cpuset 0x,0x,,0x,0x)
* coming from: linux:sysfs:numa
*
* The following FAQ entry in the hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's 
mailing list,
* along with the files generated by the hwloc-gather-topology script.
*
* hwloc will now ignore this invalid topology information and continue.



John


___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users



[hwloc-users] MI300A support

2024-02-08 Thread Hartman, John
Is there a timeline for hwloc to support the MI300A? Currently, hwloc isn’t 
happy when it encounters one:


* hwloc 2.9.0 received invalid information from the operating system.
*
* Failed with: intersection without inclusion
* while inserting Group0 (cpuset 
0x00ff,0x,0x00ff,0x00ff,0x,0x00ff) at Group0 
(cpuset 0x,0x,,0x,0x)
* coming from: linux:sysfs:numa
*
* The following FAQ entry in the hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the files generated by the hwloc-gather-topology script.
*
* hwloc will now ignore this invalid topology information and continue.


John