Re: [OMPI devel] RFC: CRS Module for MTCP Checkpointing Package

2011-10-10 Thread George Bosilca
On Oct 7, 2011, at 21:44 , Alex Brick wrote: > I'm a little unclear on this comment. My comment was about the generic way the entire problem was stated. DMTCP doesn't support checkpointing of Open MPI applications. Instead the correct answer is "DMTCP supports checkpointing of Open MPI applica

2011-10-10 Thread Jeff Squyres
I think he's asking how MTCP does this without involvement by the MPI implementation. On Oct 7, 2011, at 9:44 PM, Alex Brick wrote: > I'm a little unclear on this comment. > > DMTCP currently supports checkpointing and restoring sockets over TCP, and we > are actively working on Infiniband su

2011-10-07 Thread Alex Brick
I'm a little unclear on this comment. DMTCP currently supports checkpointing and restoring sockets over TCP, and we are actively working on Infiniband support. However, we feel that value is added by also working as an Open MPI module, where Open MPI handles all of the network communication, a

2011-10-07 Thread Jeff Squyres
Thanks Alex. Can you answer George's other question about "hand waving"? On Oct 7, 2011, at 3:59 PM, Alex Brick wrote: > Yes, we were trying to give some background on the project and use consistent > branding. Our package is called DMTCP, which includes two components: DMTCP > (a distri

2011-10-07 Thread Alex Brick
Yes, we were trying to give some background on the project and use consistent branding. Our package is called DMTCP, which includes two components: DMTCP (a distributed checkpointer), and MTCP (a single process checkpointer, which can be used both standalone and internally by DMTCP). This RFC

2011-10-07 Thread George Bosilca
Way too much hand waving here. When you say "certain networks," you mean TCP and potentially SM. However, I doubt even TCP can be fully supported. Not without the preconnect option … or a means to update the modex information. george. On Oct 7, 2011, at 14:56 , Josh Hursey wrote: > From what

2011-10-07 Thread Josh Hursey
From what I have seen during development, this RFC integrates the MTCP single process checkpointer into the C/R infrastructure of Open MPI. The MTCP component of the DMTCP project can be used in isolation, which is what they are integrating. So they can use DMTCP to checkpoint/restart an unmodifi
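The integration described here is essentially a plugin design: Open MPI's C/R infrastructure selects a CRS component by name, and an MTCP component would delegate the per-process checkpoint and restart to the MTCP library. A minimal sketch of that idea follows, assuming a deliberately simplified interface. The `crs_module_t` struct, `crs_select()`, and the stub `mtcp_checkpoint()`/`mtcp_restart()` functions are hypothetical names for illustration only. They are not Open MPI's actual CRS API and not MTCP's library calls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified CRS interface (NOT Open MPI's real module struct). */
typedef struct {
    const char *name;
    int (*checkpoint)(int pid, const char *snapshot_dir);
    int (*restart)(const char *snapshot_dir);
} crs_module_t;

/* Hypothetical MTCP-backed component: a real one would call into MTCP here. */
static int mtcp_checkpoint(int pid, const char *snapshot_dir) {
    printf("mtcp: checkpoint pid %d into %s\n", pid, snapshot_dir);
    return 0;
}

static int mtcp_restart(const char *snapshot_dir) {
    printf("mtcp: restart from %s\n", snapshot_dir);
    return 0;
}

/* Fallback component: checkpoint/restart unsupported, always fails. */
static int none_checkpoint(int pid, const char *snapshot_dir) {
    (void)pid;
    (void)snapshot_dir;
    return -1;
}

static int none_restart(const char *snapshot_dir) {
    (void)snapshot_dir;
    return -1;
}

/* Table of available components; the framework picks one by name. */
static const crs_module_t crs_modules[] = {
    { "none", none_checkpoint, none_restart },
    { "mtcp", mtcp_checkpoint, mtcp_restart },
};

/* Return the component with the given name, or NULL if none matches. */
const crs_module_t *crs_select(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof crs_modules / sizeof crs_modules[0]; i++) {
        if (strcmp(crs_modules[i].name, name) == 0) {
            return &crs_modules[i];
        }
    }
    return NULL;
}
```

The point of the indirection is that the MPI layer only sees the function-pointer table, so a single-process checkpointer such as MTCP can be swapped in without the rest of the C/R logic knowing which one is in use. Network state (George's concern above) is outside what such a per-process hook handles by itself.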

2011-10-06 Thread George Bosilca
Alex, It looks like there is a mismatch between what you propose to achieve and the text in your RFC. You propose to add a new single-process checkpoint-restart mechanism (MTCP), to the ones already provided in Open MPI. However, most of the text in your RFC is about DMTCP, which is another lay

2011-10-06 Thread Jeff Squyres
Terry -- Please add this to the agenda for Oct 18. I'd like to invite Alex and his advisor to the Oct 18 teleconf to discuss. On Oct 6, 2011, at 2:58 AM, Alex Brick wrote: > WHAT: Bring in the mtcp CRS component > > WHY: Add support for the MTCP checkpoint/restart service > > WHERE: opal/m