Re: [OMPI devel] Launcher in trunk is broken?

2011-10-10 Jeff Squyres
On Oct 10, 2011, at 1:29 PM, Ralph Castain wrote:
>>> From a network point of view, there is a slight issue with the commit
>>> 25245. A direct call to exit will close all pending sockets, with a linger
>>> of 60 seconds (quite bad if you use static ports as an example). There are
>>> proper pr

Re: [OMPI devel] Launcher in trunk is broken?

2011-10-10 Ralph Castain
On Oct 10, 2011, at 11:14 AM, George Bosilca wrote:
> Ralph,
>
> If you don't mind I would like to understand this issue a little bit more.
> What exactly is broken in the termination detection?
>
>> From a network point of view, there is a slight issue with the commit 25245.
>> A direct call

Re: [OMPI devel] Launcher in trunk is broken?

2011-10-10 George Bosilca
Ralph,
If you don't mind I would like to understand this issue a little bit more. What exactly is broken in the termination detection?
> From a network point of view, there is a slight issue with the commit 25245. A
> direct call to exit will close all pending sockets, with a linger of 60
> secon
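The 60-second "linger" George mentions is presumably the TCP TIME_WAIT state (typically 60 seconds on Linux) that connections enter when the process tears them down, for example via exit(). A daemon that restarts on a fixed ("static") port can then fail in bind() with EADDRINUSE. The C sketch below shows one common mitigation, setting SO_REUSEADDR before bind(). It is an illustration under that assumption, not code taken from the Open MPI sources, and the helper names open_static_listener and bound_port are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP listener on 127.0.0.1:port.
 * SO_REUSEADDR lets bind() succeed even if earlier connections on
 * this port are still in TIME_WAIT after a previous process exited.
 * Returns the listening fd, or -1 on error. */
int open_static_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int one = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: return the local port a socket is bound to,
 * or 0 if getsockname() fails. */
unsigned short bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

A restarted daemon would call `open_static_listener(port)` with its fixed port number. On Linux, SO_REUSEADDR does not let two live listeners share the same port; it only relaxes the check against lingering TIME_WAIT connections.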

Re: [OMPI devel] Launcher in trunk is broken?

2011-10-10 Ralph Castain
It wasn't the launcher that was broken, but termination detection, and not for all environments (e.g., it worked fine for slurm). It is a progress-related issue. Should be fixed in r25245.
On Oct 10, 2011, at 8:33 AM, Shamis, Pavel wrote:
> +1, I see the same issue.
>
>> -----Original Message-----

Re: [OMPI devel] RFC: CRS Module for MTCP Checkpointing Package

2011-10-10 George Bosilca
On Oct 7, 2011, at 21:44, Alex Brick wrote:
> I'm a little unclear on this comment.
My comment was about the generic way the entire problem was stated. DMTCP doesn't support checkpointing of Open MPI applications. Instead the correct answer is "DMTCP supports checkpointing of Open MPI applica

Re: [OMPI devel] Launcher in trunk is broken?

2011-10-10 Ralph Castain
I'm back from vacation; I'm building now and will take a look (the build will take a little while).
On Oct 10, 2011, at 8:33 AM, Shamis, Pavel wrote:
> +1, I see the same issue.
>
>> -----Original Message-----
>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>> On Behalf O

Re: [OMPI devel] Launcher in trunk is broken?

2011-10-10 Shamis, Pavel
+1, I see the same issue.
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
> On Behalf Of Yevgeny Kliteynik
> Sent: Monday, October 10, 2011 10:24 AM
> To: OpenMPI Devel
> Subject: [OMPI devel] Launcher in trunk is broken?
>
> It looks like the

[OMPI devel] Launcher in trunk is broken?

2011-10-10 Yevgeny Kliteynik
It looks like the process launcher is broken in the OMPI trunk: if you run any simple test (not necessarily including MPI calls) on 4 or more nodes, the MPI processes won't be killed after the test finishes.
$ mpirun -host host_1,host_2,host_3,host_4 -np 4 --mca btl sm,tcp,self /bin/hostname
Out

Re: [OMPI devel] RFC: CRS Module for MTCP Checkpointing Package

2011-10-10 Jeff Squyres
I think he's asking how MTCP does this without involvement by the MPI implementation.
On Oct 7, 2011, at 9:44 PM, Alex Brick wrote:
> I'm a little unclear on this comment.
>
> DMTCP currently supports checkpointing and restoring sockets over TCP, and we
> are actively working on Infiniband su