Andrew,

The 2-second timeout is very likely a bug that has since been fixed, so I
strongly suggest you try the latest 2.0.2 release, which came out earlier
this week.

Ralph is referring to another timeout, which is hard-coded to 600 seconds in
master but is still 60 seconds in the v2.0.x branch (FWIW, the MPI standard
says nothing about timeouts, so we hard-coded one to prevent jobs from
hanging forever).
IIRC, the hard-coded timeout is in MPI_Comm_{accept,connect}, and I do not
know whether it is somehow involved in MPI_Comm_spawn.

Cheers,

Gilles

On Saturday, February 4, 2017, r...@open-mpi.org <r...@open-mpi.org> wrote:

> We know v2.0.1 has problems with comm_spawn, and so you may be
> encountering one of those. Regardless, there is indeed a timeout mechanism
> in there. It was added because people would execute a comm_spawn, and then
> would hang and eat up their entire allocation time for nothing.
>
> In v2.0.2, I see it is still hardwired at 60 seconds. I believe we
> eventually realized we needed to make that a variable, but it didn’t get
> into the 2.0.2 release.
>
>
> > On Feb 1, 2017, at 1:00 AM, elistrato...@info.sgu.ru wrote:
> >
> > I am using Open MPI version 2.0.1.
> > _______________________________________________
> > users mailing list
> > users@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/users