Hi Folks,
I'm seeing some MTT failures from last night in the form of run failures
and/or timeouts.
Is anyone else seeing these?
On the IBM dataplex system I'm seeing assertion failures like this one in
ob1:
c_ring: pml_ob1_component.c:308: mca_pml_ob1_component_fini: Assertion
`((0xdeafbeedULL
I think this is fixed by Nathan's PR:
https://github.com/open-mpi/ompi/pull/653
It's waiting for George's review -- George?
> On Jun 24, 2015, at 7:14 AM, Howard Pritchard wrote:
>
> Hi Folks,
>
> I'm seeing some MTT failures from last nite in the form of run failures
> and/or timeouts.
Jeff,
Attached is the Mellanox internal MPICH test ini file. Please note that we
don't run all of the tests in the MPICH test suite; we remove some of them
during the build stage in the mpich_tests.ini file.
-Devendar
Daniel,
Thanks for the logs.
Another workaround is to run
mpirun --mca coll ^hcoll ...
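The same exclusion can also be set through the environment, which may be handy when the mpirun line is generated by a test harness. This is only a sketch: it relies on Open MPI reading MCA parameters from OMPI_MCA_<name> variables, and the launch line in the comment (process count and ./c_ring path) is a hypothetical placeholder, not taken from the logs.

```shell
# The leading '^' means "exclude": use every coll component except hcoll.
# Single quotes keep the shell from touching the '^' character.
export OMPI_MCA_coll='^hcoll'

# Any later launch from this shell inherits the setting, e.g. (hypothetical):
#   mpirun -np 2 ./c_ring
```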
I was able to reproduce the issue and, surprisingly, it occurs only if
the coll_ml module is loaded *before* the hcoll module.
/* this is not the case on my system, so I had to hack my
mca_base_component_path i