Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-18 Thread Josh Hursey
Yep. For the checkpoint/continue that patch looks good.
On Tue, Feb 18, 2014 at 11:30 AM, Adrian Reber wrote:
> On Tue, Feb 18, 2014 at 10:21:23AM -0600, Josh Hursey wrote:
> > So when a process is restarted with CRIU, does it resume execution after
> > the criu_dump() or somewhere else?
> >
> Th

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-18 Thread Adrian Reber
On Tue, Feb 18, 2014 at 10:21:23AM -0600, Josh Hursey wrote:
> So when a process is restarted with CRIU, does it resume execution after
> the criu_dump() or somewhere else?
The process is resumed at the same point it was checkpointed with criu_dump().
> In a continue/leave-running mode after chec

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-18 Thread Josh Hursey
So when a process is restarted with CRIU, does it resume execution after the criu_dump() or somewhere else? In a continue/leave-running mode after checkpoint the MPI library does not need to do quite as much work since we can depend on some things not changing (such as the machine name, orted pid,

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-18 Thread Adrian Reber
I think I do not understand your question. So far I have only implemented the checkpoint part and not the restart part. Using criu_dump() the process can be left in three different states. Without any special handling the process is dumped and then killed. I can also tell criu to leave the proces

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-17 Thread Josh Hursey
It looks fine except that the restart state is not flagged. When a process is restarted does it resume execution inside the criu_dump() function? If so, is there a way to tell from its return code (or some other mechanism) that it is being restarted versus continuing after checkpointing? On Mon, F
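
The question above (telling "restarted" apart from "continuing after checkpoint" by the return code) can be sketched roughly as follows. This is only an illustration, not code from the patch: the enum, the helper names, and above all the return-code convention (negative on failure, 0 when the original process continues, positive when execution resumes from a restored image) are assumptions that would need to be checked against the libcriu version in use.

```c
#include <assert.h>

/* Hypothetical states a checkpoint component might want to flag. */
typedef enum {
    CKPT_STATE_CONTINUE, /* checkpoint taken, original process keeps running */
    CKPT_STATE_RESTART,  /* execution resumed from a restored image */
    CKPT_STATE_ERROR     /* the dump itself failed */
} ckpt_state_t;

/*
 * Map the return code of a dump call to a state.
 * ASSUMPTION (not confirmed in this thread): rc < 0 means failure,
 * rc == 0 means the dump succeeded and we are still the original
 * process, rc > 0 means we have just been restored.
 */
static ckpt_state_t classify_dump_result(int rc)
{
    if (rc < 0) {
        return CKPT_STATE_ERROR;
    }
    if (rc > 0) {
        return CKPT_STATE_RESTART;
    }
    return CKPT_STATE_CONTINUE;
}

/* Human-readable name for logging; the names are illustrative only. */
static const char *ckpt_state_name(ckpt_state_t state)
{
    switch (state) {
    case CKPT_STATE_CONTINUE: return "continue";
    case CKPT_STATE_RESTART:  return "restart";
    case CKPT_STATE_ERROR:    return "error";
    }
    assert(!"unhandled ckpt_state_t value");
    return "unknown";
}
```

With something like this, the caller would pass the dump call's return value to classify_dump_result() and only run the heavier restart path when the result is CKPT_STATE_RESTART.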

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-17 Thread Ralph Castain
Great - looks fine to me!!
On Feb 17, 2014, at 11:39 AM, Adrian Reber wrote:
> I have prepared a patch I would like to commit which adds the code to
> actually checkpoint a process. Thanks for the pointers about the string
> variables; I tried to implement it correctly.
>
> CRIU currently has

[OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-17 Thread Adrian Reber
I have prepared a patch I would like to commit which adds the code to actually checkpoint a process. Thanks for the pointers about the string variables; I tried to implement it correctly. CRIU currently has problems with the new OOB usock but I will contact the CRIU developers about this error. U