[hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-08-24 Thread Ondřej Vlček
Dear all, I have encountered an hwloc error on the AMD Opteron 6300 processor family (see below). I am using hwloc.x86_64 v1.7-3.el7, which is the latest version available in the standard packages for CentOS 7. Is this something that has already been encountered and fixed in newer versions of hwloc

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-08-24 Thread Brice Goglin
Hello, hwloc 1.7 is very old. I am surprised CentOS 7 doesn't have anything more recent, maybe not in "standard" packages? Anyway, this is a very common error on AMD 6200 and 6300 machines. See http://www.open-mpi.org/projects/hwloc/doc/v1.11.0/a00030.php#faq_os_error Assuming your kernel isn't to

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread vlcek
, SUSE etc.) can pick them up automatically as they create their respective next releases. Ondrej -- Original message -- From: Brice Goglin To: Ondrej Certik Date: 24. 8. 2015 15:32:33 Subject: Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family "

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Ondřej Vlček
Dear Brice, thank you for your answer. Neither upgrading the BIOS nor using the latest hwloc helped. Finally we contacted AMD and they fixed a bug in the kernel which caused problems with 12-core AMD processors. They should upstream the changes to kernel.org soon, so that all the distros (CentOS, RHEL,

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Fabian Wein
Hello, I'm new to the list and new to the MPI business, too. Our 4*12 Opteron 6238 system is very similar to the one from the original poster and I get the same error message. Any use in posting my logs? I compiled the latest hwloc, no change. Our system is Ubuntu 14.04 LTS with kernel 3.13.

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Brice Goglin
Hello, Good to know. Did you see/test the kernel patch yet? If possible, could you send a link to the kernel commit when it appears upstream? Thanks, Brice On 27/10/2015 09:21, Ondřej Vlček wrote: > Dear Brice, > thank you for your answer. Neither upgrade of BIOS nor using the latest > hwloc

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Samuel Thibault
Hello, Fabian Wein, on Tue 27 Oct 2015 09:43:22 +0100, wrote: > Is there a way to configure the topology manually? Yes, you can export the current topology to an XML file: lstopo platform.xml then modify the platform, then use export HWLOC_XMLFILE=platform.xml to force using the modified x

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Brice Goglin
Hello, This bug is about L3 cache locality only; everything else should be fine, including cache sizes. Few applications use that locality information, so I assume it doesn't matter for PETSc scaling. We can work around the bug by loading an XML topology. There's no easy way to build that correct XM

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Fabian Wein
Brice, thank you very much for the offer. I attached the xml file .. * hwloc 1.11.1 has encountered what looks like an error from the operating system. * * L3 (cpuset 0x03f0) intersects with NUMANode (P#0 cpuset 0x003f) without inclusion! * Error occurred in topology.c line 981 * ..
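The condition reported in this error can be checked by hand with plain bitmask arithmetic. The following is a minimal Python sketch, not hwloc's own code: the function name is hypothetical, and the two cpuset values are the ones quoted in the error message.

```python
def intersects_without_inclusion(a: int, b: int) -> bool:
    """Return True if cpusets a and b overlap but neither contains the other."""
    common = a & b
    # Overlap exists (common != 0), yet the overlap equals neither set,
    # so neither set is a subset of the other.
    return common != 0 and common != a and common != b


l3_cpuset = 0x03F0    # L3 cpuset from the error message
numa_cpuset = 0x003F  # NUMANode P#0 cpuset from the error message

print(hex(l3_cpuset & numa_cpuset))                          # 0x30
print(intersects_without_inclusion(l3_cpuset, numa_cpuset))  # True
```

The two sets share only bits 4 and 5 (0x30), so the reported L3 straddles the boundary of the NUMA node instead of sitting inside it.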

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Brice Goglin
Here's the fixed XML. For the record, for each NUMA node, I extended the cpusets of the L3 to match the container NUMA node, and moved all L2 objects as children of that L3. Now you may load that XML instead of the native discovery by setting HWLOC_XMLFILE=leo2.xml in your environment. Brice Le
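Setting HWLOC_XMLFILE as described here can also be scoped to a single child process rather than the whole shell session. A minimal Python sketch, assuming the XML file is named leo2.xml as in this message; the helper name run_with_topology is hypothetical, and the child command is a stand-in that only echoes the variable it inherited.

```python
import os
import subprocess
import sys


def run_with_topology(xml_path, command):
    """Run command with HWLOC_XMLFILE set to xml_path, leaving our own environment untouched."""
    env = dict(os.environ, HWLOC_XMLFILE=xml_path)
    return subprocess.run(command, env=env, check=True,
                          capture_output=True, text=True)


# Stand-in for a real hwloc-based program: print the inherited variable.
child = [sys.executable, "-c",
         "import os; print(os.environ['HWLOC_XMLFILE'])"]
result = run_with_topology("leo2.xml", child)
print(result.stdout.strip())  # leo2.xml
```

In practice the stand-in command would be replaced by the actual program that links against hwloc.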

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Ondřej Vlček
Hi Brice, I just tested the patch that AMD made on our system. After that, hwloc read the information about the HW configuration correctly. I asked the AMD developers to inform me as soon as they push the fix. So far I did not receive any info... I may urge them after a week or so. Ondrej > On Tuesday, Oct

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Fabian Wein
Thank you very much for the file. When I try with PETSc, compiled with open-mpi and icc I get -- Failed to parse XML input with the minimalistic parser. If it was not generated by hwloc, try enabling full XML support with libxml2. -

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Brice Goglin
I guess the problem is that your OMPI uses an old hwloc internally. That one may be too old to understand recent XML exports. Try replacing "Package" with "Socket" everywhere in the XML file. Brice On 27/10/2015 15:31, Fabian Wein wrote: > Thank you very much for the file. > > When I try wit
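The suggested replacement can be done with any text editor; for illustration, here is a minimal Python sketch of the same substitution. The function names, the file names in the commented call, and the sample XML fragment are all hypothetical and only serve to show the effect.

```python
from pathlib import Path


def package_to_socket(xml_text: str) -> str:
    """Replace every occurrence of "Package" with "Socket", as suggested above."""
    return xml_text.replace("Package", "Socket")


def convert_file(src: str, dst: str) -> None:
    """Write a converted copy of src to dst; the original file is left untouched."""
    Path(dst).write_text(package_to_socket(Path(src).read_text()))


# Hypothetical one-line fragment, not taken from the attached file.
sample = '<object type="Package" os_index="0"/>'
print(package_to_socket(sample))  # <object type="Socket" os_index="0"/>

# Example call with hypothetical file names:
# convert_file("leo2.xml", "leo2-socket.xml")
```

Writing to a separate output file keeps the original export available for newer hwloc versions.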

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Fabian Wein
On 10/27/2015 03:42 PM, Brice Goglin wrote: I guess the problem is that your OMPI uses an old hwloc internally. That one may be too old to understand recent XML exports. Try replacing "Package" with "Socket" everywhere in the XML file. Thanks! That was it. I now get almost perfectly reproducib

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Brice Goglin
I guess the next step would be to look at how these tasks are placed on the machine. There are 8 NUMA nodes on the machine. Maybe 9 is where it starts placing a second task per NUMA node? For OMPI, --report-bindings may help. I am not sure about MPICH. Brice Le 27/10/2015 15:52, Fabian Wein a é

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-27 Thread Fabian Wein
I want to try that tomorrow. Currently I use open-mpi, is it worth buying intel-mpi? We have the C++ and Fortran compilers but not MPI up to now. Might it be an issue with the hwloc XML file? I wonder whether it would help to temporarily install an older kernel - 3.2 was reported to work - and ge

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-28 Thread Fabian Wein
I hope I'm still on the right list for my current problem. Today we figured out on a similar but older four-Opteron (6100), 48-core system that mpiexec -bind-to numa is the essential key point. I want to do the same on my system. I already installed libnuma such that hwloc configure uses n

Re: [hwloc-users] hwloc error for AMD Opteron 6300 processor family

2015-10-29 Thread Brice Goglin
On 28/10/2015 18:04, Fabian Wein wrote: > I hope I'm still on the right list for my current problem. Hello, It looks like this should go to us...@open-mpi.org now. > - > A request was made to bind a process, but at least one node does NOT > support binding processes to cpus.