On Oct 7, 2011, at 21:44, Alex Brick wrote:
> I'm a little unclear on this comment.
My comment was about the generic way the entire problem was stated. DMTCP
doesn't support checkpointing of Open MPI applications. Instead, the correct
answer is "DMTCP supports checkpointing of Open MPI applica
I think he's asking how MTCP does this without involvement by the MPI
implementation.
On Oct 7, 2011, at 9:44 PM, Alex Brick wrote:
> I'm a little unclear on this comment.
>
> DMTCP currently supports checkpointing and restoring sockets over TCP, and we
> are actively working on Infiniband su
I'm a little unclear on this comment.
DMTCP currently supports checkpointing and restoring sockets over TCP, and we
are actively working on Infiniband support. However, we feel that value is
added by also working as an Open MPI module, where Open MPI handles all of the
network communication, a
Thanks Alex. Can you answer George's other question about "hand waving"?
On Oct 7, 2011, at 3:59 PM, Alex Brick wrote:
> Yes, we were trying to give some background on the project and use consistent
> branding. Our package is called DMTCP, which includes two components: DMTCP
> (a distri
Yes, we were trying to give some background on the project and use consistent
branding. Our package is called DMTCP, which includes two components: DMTCP
(a distributed checkpointer), and MTCP (a single-process checkpointer, which
can be used both standalone and internally by DMTCP).
This RFC
Way too much hand-waving here.
When you say "certain networks", you mean TCP and potentially SM. However, I doubt
even TCP can be fully supported, not without the preconnect option … or a means
to update the modex information.
george.
On Oct 7, 2011, at 14:56, Josh Hursey wrote:
> From what
From what I have seen during development, this RFC integrates the MTCP
single-process checkpointer into the C/R infrastructure of Open MPI.
The MTCP component of the DMTCP project can be used in isolation,
which is what they are integrating. So they can use DMTCP to
checkpoint/restart an unmodifi
Alex,
It looks like there is a mismatch between what you propose to achieve and the
text in your RFC. You propose to add a new single-process checkpoint-restart
mechanism (MTCP) to the ones already provided in Open MPI. However, most of
the text in your RFC is about DMTCP, which is another lay
Terry --
Please add this to the agenda for Oct 18. I'd like to invite Alex and his
advisor to the Oct 18 teleconf to discuss.
On Oct 6, 2011, at 2:58 AM, Alex Brick wrote:
> WHAT: Bring in the mtcp CRS component
>
> WHY: Add support for the MTCP checkpoint/restart service
>
> WHERE: opal/m