[OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-07 Thread Pim Schellart
this error was discussed before and it was possible to sort it out in “insert_object_by_parent”. Is this still being considered? If not, which (top-level) hwloc API call should we look for in the SLURM sources to start debugging? Any help will be most welcome.

Kind regards,
Pim Schellart

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-08 Thread Pim Schellart
both the working and `broken` case. Note that the XML is of course printed (differently) multiple times for each task/core. As always, any help would be appreciated.

Regards,
Pim Schellart

P.S.
$ mpirun --version
mpirun (Open MPI) 1.6.5

Attachments: broken.log, working.log

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-08 Thread Pim Schellart
> ” if you happen to get nodes where all the versions match, and
> “fail” when you get a combination that includes a different version.
>
> Is there some way you can narrow down your search to find the node(s) that
> are picking up the different version?
>
>> On Dec
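
One way to narrow the search suggested above is to collect the version string each node reports and flag the nodes that differ from the majority. The following is a minimal sketch under stated assumptions: the function name `find_version_outliers` is hypothetical, how the strings are gathered (for example by running a version command on every node) is left out, and the sample data is invented for illustration only.

```python
from collections import Counter


def find_version_outliers(versions_by_node):
    """Return the nodes whose reported version differs from the most common one."""
    if not versions_by_node:
        return {}
    counts = Counter(versions_by_node.values())
    # The most frequently reported version is treated as the expected one.
    majority, _ = counts.most_common(1)[0]
    return {
        node: version
        for node, version in sorted(versions_by_node.items())
        if version != majority
    }


# Hypothetical sample: node names follow the "coma##" pattern from the thread,
# and the odd "1.5" entry is made up purely to show the output shape.
reported = {"coma01": "1.8", "coma02": "1.8", "coma03": "1.5", "coma04": "1.8"}
outliers = find_version_outliers(reported)
print(outliers)
```

An empty result only means that the reported strings agree; it does not rule out a mismatch between the library a program was built against and the one it loads at run time.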

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-08 Thread Pim Schellart
>> r install whether OMPI uses the embedded or the
>> system-wide hwloc?
>>
>> Brice
>>
>> On 08/12/2014 17:07, Pim Schellart wrote:
>>> Dear Ralph,
>>>
>>> the nodes are called coma## and as you can see in the

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-09 Thread Pim Schellart
> oad and install an OMPI tarball myself and avoid these
> headaches. This mismatch in required versions is why we embed hwloc as it is
> a critical library for OMPI, and we had to ensure that the version matched
> our internal requirements.
>
>> On Dec 8, 2014, at 8:50

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-09 Thread Pim Schellart
> are completely isolated from each other.
>
>> On Dec 9, 2014, at 12:25 AM, Pim Schellart wrote:
>>
>> The version that “lstopo --version” reports is the same (1.8) on all nodes,
>> but we may indeed be hitting the second issue. We can try to compile a new

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-10 Thread Pim Schellart
is because, for the first job started, the assigned CPU cores are 0 and 1, whereas they are different for jobs started later. I have again attached the output for both the working and the broken case, including the “lstopo --of xml” output, which is produced once per task.

Kind regards,
Pim Schellart
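
To see which cores a given task was actually handed, each task can print its own CPU affinity mask. This is a minimal sketch, assuming a Linux node and assuming that the scheduler's core assignment is reflected in the process affinity mask, which depends on the site's SLURM configuration. The helper name `assigned_cpus` is hypothetical.

```python
import os


def assigned_cpus():
    """Return the sorted list of CPU ids the current process may run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: pid 0 means "the calling process".
        return sorted(os.sched_getaffinity(0))
    # Fallback for platforms without sched_getaffinity: assume no restriction.
    return list(range(os.cpu_count() or 1))


cpus = assigned_cpus()
print("assigned CPUs:", cpus)
```

Running this once per task in a first job and in a later job on the same node would show whether the first job sees cores 0 and 1 while later ones see a different set, as described above.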

Re: [OMPI devel] hwloc out-of-order topology discovery with SLURM 14.11.0 and openmpi 1.6

2014-12-12 Thread Pim Schellart
where exactly we don’t know. Thank you for your help in solving this!

Kind regards,
Pim Schellart

> On 11 Dec 2014, at 04:19, Gilles Gouaillardet wrote:
>
> Ralph,
>
> You are right,
> please disregard my previous post, it was irrelevant.
>
> i just noticed t