On Oct 10, 2011, at 1:29 PM, Ralph Castain wrote:
>>> From a network point of view, there is a slight issue with the commit
>>> 25245. A direct call to exit will close all pending sockets, with a linger
>>> of 60 seconds (quite bad if you use static ports as an example). There are
>>> proper pr
On Oct 10, 2011, at 11:14 AM, George Bosilca wrote:
> Ralph,
>
> If you don't mind I would like to understand this issue a little bit more.
> What exactly is broken in the termination detection?
>
>> From a network point of view, there is a slight issue with the commit 25245.
>> A direct call
Ralph,
If you don't mind I would like to understand this issue a little bit more. What
exactly is broken in the termination detection?
> From a network point of view, there is a slight issue with the commit 25245. A
> direct call to exit will close all pending sockets, with a linger of 60
> secon
It wasn't the launcher that was broken, but termination detection, and not for
all environments (e.g., worked fine for slurm). It is a progress-related issue.
Should be fixed in r25245.
On Oct 10, 2011, at 8:33 AM, Shamis, Pavel wrote:
> +1, I see the same issue.
>
>> -----Original Message-----
On Oct 7, 2011, at 21:44 , Alex Brick wrote:
> I'm a little unclear on this comment.
My comment was about the generic way the entire problem was stated. DMTCP
doesn't support checkpointing of Open MPI applications. Instead the correct
answer is "DMTCP supports checkpointing of Open MPI applica
I'm back from vacation - am building now and will take a look (will be a little
while to build).
On Oct 10, 2011, at 8:33 AM, Shamis, Pavel wrote:
> +1, I see the same issue.
>
>> -----Original Message-----
>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>> On Behalf O
+1, I see the same issue.
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
> On Behalf Of Yevgeny Kliteynik
> Sent: Monday, October 10, 2011 10:24 AM
> To: OpenMPI Devel
> Subject: [OMPI devel] Launcher in trunk is broken?
>
> It looks like the
It looks like the process launcher is broken in the OMPI trunk:
If you run any simple test (not necessarily including MPI calls) on 4 or
more nodes, the MPI processes won't be killed after the test finishes.
$ mpirun -host host_1,host_2,host_3,host_4 -np 4 --mca btl sm,tcp,self \
    /bin/hostname
Out
I think he's asking how MTCP does this without involvement by the MPI
implementation.
On Oct 7, 2011, at 9:44 PM, Alex Brick wrote:
> I'm a little unclear on this comment.
>
> DMTCP currently supports checkpointing and restoring sockets over TCP, and we
> are actively working on Infiniband su