I want to try that tomorrow. Currently I use Open MPI; is it worth buying
Intel MPI? We have the C++ and Fortran compilers but no MPI so far.
Might it be an issue with the hwloc XML file? My idea is, if it would help, to
temporarily install an older kernel - 3.2 was reported to work - and ge…
I guess the next step would be to look at how these tasks are placed on
the machine. There are 8 NUMA nodes. Maybe 9 tasks is where it
starts placing a second task per NUMA node?
For OMPI, --report-bindings may help. I am not sure about MPICH.
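For instance, with Open MPI an invocation along these lines should show where each rank lands (the binary name ./a.out and the rank count are placeholders, not from the original thread):

```shell
# Print each rank's binding at launch; with 9 ranks on 8 NUMA nodes,
# the report shows which node receives a second task.
# --map-by numa distributes ranks round-robin over NUMA nodes.
mpirun --report-bindings --map-by numa -np 9 ./a.out
```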
Brice
On 27/10/2015 15:52, Fabian Wein wrote:
On 10/27/2015 03:42 PM, Brice Goglin wrote:
I guess the problem is that your OMPI uses an old hwloc internally. That
one may be too old to understand recent XML exports.
Try replacing "Package" with "Socket" everywhere in the XML file.
Thanks! That was it.
I now get almost perfectly reproducible …
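For the record, the suggested rename can be done mechanically with sed; a small self-contained demonstration (the real target would be the exported topology file, not this stand-in):

```shell
# Stand-in for the exported topology (the real file comes from lstopo)
printf '<object type="Package" os_index="0"/>\n' > platform.xml

# Rename hwloc "Package" objects to the older "Socket" name that
# OMPI's embedded (older) hwloc understands
sed -i 's/Package/Socket/g' platform.xml

cat platform.xml   # -> <object type="Socket" os_index="0"/>
```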
I guess the problem is that your OMPI uses an old hwloc internally. That
one may be too old to understand recent XML exports.
Try replacing "Package" with "Socket" everywhere in the XML file.
Brice
On 27/10/2015 15:31, Fabian Wein wrote:
> Thank you very much for the file.
>
> When I try with …
Thank you very much for the file.
When I try with PETSc, compiled with open-mpi and icc I get
--------------------------------------------------------------------------
Failed to parse XML input with the minimalistic parser. If it was not
generated by hwloc, try enabling full XML support with libxml2.
--------------------------------------------------------------------------
Hi Brice,
I just tested the patch AMD made on our system. After that, hwloc read
the information about the HW configuration correctly. I asked the AMD developers to
inform me as soon as they push the fix. So far I have not received any info... I may
urge them after a week or so.
Ondrej
> On Tuesday, Oct
Here's the fixed XML. For the record, for each NUMA node, I extended the
cpusets of the L3 to match the containing NUMA node, and moved all L2
objects as children of that L3.
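Schematically, the corrected hierarchy for one NUMA node would look like this in hwloc 1.x XML (cpuset values and attributes are illustrative, not copied from leo2.xml):

```xml
<!-- one NUMA node: the L3 cpuset now spans the whole node,
     and the L2 objects sit below it as children -->
<object type="NUMANode" os_index="0" cpuset="0x0000003f">
  <object type="Cache" depth="3" cpuset="0x0000003f">
    <object type="Cache" depth="2" cpuset="0x00000003"> ... </object>
    <object type="Cache" depth="2" cpuset="0x0000000c"> ... </object>
    <object type="Cache" depth="2" cpuset="0x00000030"> ... </object>
  </object>
</object>
```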
Now you may load that XML instead of the native discovery by setting
HWLOC_XMLFILE=leo2.xml in your environment.
Brice
Brice,
thank you very much for the offer. I attached the xml file
..
* hwloc 1.11.1 has encountered what looks like an error from the
operating system.
*
* L3 (cpuset 0x03f0) intersects with NUMANode (P#0 cpuset 0x003f)
* without inclusion!
* Error occurred in topology.c line 981
*
..
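The cpusets in the message show the problem directly: the two masks overlap, but neither contains the other. A quick check, with the values copied from the error above:

```shell
# cpusets from the hwloc error message
l3=$(( 0x03f0 )); numa=$(( 0x003f ))
inter=$(( l3 & numa ))
printf 'intersection: 0x%04x\n' "$inter"   # 0x0030: non-empty
# inclusion would require the intersection to equal one of the two sets
[ "$inter" -ne "$l3" ] && [ "$inter" -ne "$numa" ] \
  && echo "intersects without inclusion"
```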
Hello
This bug is about L3 cache locality only, everything else should be
fine, including cache sizes. Few applications use that locality
information, so I assume it doesn't matter for PETSc scaling.
We can work around the bug by loading an XML topology. There's no easy
way to build that correct XML…
Hello,
Fabian Wein, on Tue 27 Oct 2015 09:43:22 +0100, wrote:
> Is there a way to configure the topology manually?
Yes, you can export the current topology to an xml file:
lstopo platform.xml
then modify the platform description in that file, then use
export HWLOC_XMLFILE=platform.xml
to force using the modified XML.
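Put together, a workaround session might look like this (platform.xml is an arbitrary file name):

```shell
# 1. export the topology hwloc currently sees
lstopo platform.xml
# 2. edit platform.xml by hand (fix cpusets, rename objects, ...)
# 3. make hwloc-based tools use the edited file instead of native discovery
export HWLOC_XMLFILE=platform.xml
lstopo   # now reflects the edited topology
```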
Hello
Good to know. Did you see/test the kernel patch yet? If possible, could
you send a link to the kernel commit when it appears upstream?
Thanks
Brice
On 27/10/2015 09:21, Ondřej Vlček wrote:
> Dear Brice,
> thank you for your answer. Neither upgrade of BIOS nor using the latest
> hwloc
Hello,
I'm new to the list and new to the MPI business, too.
Our 4 x 12-core Opteron 6238 system is very similar to the one from the
original poster and I get the same error message.
Any use in posting my logs?
I compiled the latest hwloc, no change. Our system is Ubuntu 14.04 LTS
with kernel 3.13.
Dear Brice,
thank you for your answer. Neither upgrading the BIOS nor using the latest
hwloc helped. Finally we contacted AMD and they fixed a bug in the kernel which
caused problems with 12-core AMD processors. They should upstream the changes
to kernel.org soon, so that all the distros (CentOS, RHEL, …