this error was discussed before
and it was possible to sort it out in “insert_object_by_parent”. Is this still
being considered? If not, which (top-level) hwloc API call should we look for in
the SLURM sources to start debugging? Any help would be most welcome.
Kind regards,
Pim Schellart
both the working and “broken” case. Note that the XML is, of course, printed
(differently) multiple times, once for each task/core. As always, any help would
be appreciated.
Regards,
Pim Schellart
P.S. $ mpirun --version
mpirun (Open MPI) 1.6.5
” if you happen to get nodes where all the versions match, and
> “fail” when you get a combination that includes a different version.
>
> Is there some way you can narrow down your search to find the node(s) that
> are picking up the different version?
>
>
>> On Dec
r install whether OMPI uses the embedded or the
>> system-wide hwloc?
>>
>> Brice
>>
>>
>>
>>
>> On 08/12/2014 17:07, Pim Schellart wrote:
>>> Dear Ralph,
>>>
>>> the nodes are called coma## and as you can see in the
oad and install an OMPI tarball myself and avoid these
> headaches. This mismatch in required versions is why we embed hwloc: it is
> a critical library for OMPI, and we had to ensure that the version matched
> our internal requirements.
>
>
>> On Dec 8, 2014, at 8:50
are completely isolated from each other.
>
>
>> On Dec 9, 2014, at 12:25 AM, Pim Schellart wrote:
>>
>> The version that “lstopo --version” reports is the same (1.8) on all nodes,
>> but we may indeed be hitting the second issue. We can try to compile a new
is because the first job started is assigned CPU cores 0 and 1, whereas later
jobs are assigned different cores. I have attached the output again (including
the “lstopo --of xml” output, called for each task) for both the working and
broken cases.
Kind regards,
Pim Schellart
where exactly we don’t
know. Thank you for your help in solving this!
Kind regards,
Pim Schellart
> On 11 Dec 2014, at 04:19, Gilles Gouaillardet
> wrote:
>
> Ralph,
>
> You are right,
> please disregard my previous post, it was irrelevant.
>
> I just noticed t