WHAT: Bring in the mtcp CRS component

WHY: Add support for the MTCP checkpoint/restart service

WHERE: opal/mca/crs/mtcp

TIMEOUT: Tuesday teleconf, 2011-10-18 (about 1 week from now)

-------------------------------------------
What is MTCP?

MTCP (MultiThreaded CheckPointing; http://dmtcp.sourceforge.net) is an LGPL-licensed, single-process checkpointing package that has been under development for seven years. It operates entirely in user space and requires no special kernel modules or superuser access. Using it standalone is as simple as linking against the MTCP library and adding a call to the mtcp_init function in your code.

MTCP is distributed as part of the DMTCP package, which is currently available as a Debian package.

-------------------------------------------
More details:

The Open MPI MTCP integration is available at:

  https://bitbucket.org/jsquyres/ompi-dmtcp2

The DMTCP parent project website is:

  http://dmtcp.sourceforge.net/

This RFC introduces a new CRS component for Open MPI that uses MTCP to provide transparent checkpointing. The primary advantage of MTCP over the existing BLCR CRS component is that it operates entirely in user space: any user can use it without special kernel modules or superuser access. As with the BLCR component, the MTCP CRS component is transparent to the user process and requires no modification to the user program.

Josh Hursey and Jeff Squyres have been working with the DMTCP authors (based at Northeastern University in Boston, MA, USA) for quite a while and feel that this component is ready to be brought into the Open MPI mainline for inclusion in the 1.7.x series (and possibly the 1.5.x series?). The authors have submitted OMPI 3rd party contribution agreements.

Reply via email to