[OMPI devel] RFC: new OMPI RTE define:

2014-02-17 Thread Jeff Squyres (jsquyres)
WHAT: New OMPI_RTE_EVENT_BASE define WHY: The usnic BTL needs to run some events asynchronously; the ORTE event base already exists and is running asynchronously in MPI processes WHERE: in ompi/mca/rte/rte.h and rte_orte.h TIMEOUT: COB Friday, 21 Feb 2014 MORE DETAIL: The WHY line described i

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-17 Thread Josh Hursey
It looks fine except that the restart state is not flagged. When a process is restarted, does it resume execution inside the criu_dump() function? If so, is there a way to tell from its return code (or some other mechanism) that it is being restarted versus continuing after checkpointing? On Mon, F

Re: [OMPI devel] OPAL_CRS_* meaning

2014-02-17 Thread Josh Hursey
These values indicate the current state of the checkpointing lifecycle. In particular CONTINUE/RESTART are set by the checkpointer in the CRS (all others are used by the INC mechanism). In the opal_crs.checkpoint() call the checkpointer will capture the program state and it is possible to emerge fr

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-17 Thread Ralph Castain
Great - looks fine to me!! On Feb 17, 2014, at 11:39 AM, Adrian Reber wrote: > I have prepared a patch I would like to commit which adds the code to > actually checkpoint a process. Thanks for the pointers about the string > variables; I tried to implement it correctly. > > CRIU currently has

[OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-17 Thread Adrian Reber
I have prepared a patch I would like to commit which adds the code to actually checkpoint a process. Thanks for the pointers about the string variables; I tried to implement it correctly. CRIU currently has problems with the new OOB usock but I will contact the CRIU developers about this error. U

Re: [OMPI devel] [PATCH] Fix typo defining macro _WORD_MASK_

2014-02-17 Thread Jeff Squyres (jsquyres)
+1 On Feb 16, 2014, at 4:55 PM, Andreas Schwab wrote: > diff --git a/opal/util/crc.c b/opal/util/crc.c > index 9cfae94..c2112de 100644 > --- a/opal/util/crc.c > +++ b/opal/util/crc.c > @@ -41,7 +41,7 @@ > #elif (OPAL_ALIGNMENT_LONG == 4) > #define _WORD_MASK_ 0x3 > #else > -#define _WORD_MASK 0x
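
For readers unfamiliar with what a word mask of this kind is for, here is a minimal sketch of an alignment check. It is not the code in opal/util/crc.c: `DEMO_WORD_MASK`, `demo_is_word_aligned` and `demo_bytes_to_alignment` are hypothetical names, and the mask is derived from `sizeof(long)` as an assumption rather than taken from the patch (whose value is cut off above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mask: the low address bits that must be zero for a
 * long-aligned pointer (0x3 when long is 4 bytes wide). */
#define DEMO_WORD_MASK ((uintptr_t)(sizeof(long) - 1))

/* Returns nonzero when p sits on a long-sized boundary. */
static int demo_is_word_aligned(const void *p)
{
    /* The mask trick only works when the word size is a power of two. */
    assert((sizeof(long) & (sizeof(long) - 1)) == 0);
    return ((uintptr_t)p & DEMO_WORD_MASK) == 0;
}

/* Number of leading bytes to process one at a time before p reaches
 * the next long-sized boundary (0 if it is already aligned). */
static size_t demo_bytes_to_alignment(const void *p)
{
    uintptr_t rem = (uintptr_t)p & DEMO_WORD_MASK;
    return rem == 0 ? 0 : sizeof(long) - rem;
}
```

A misspelled macro name in one preprocessor branch matters because any code that uses the correctly spelled name will then fail to compile on exactly the platforms that take that branch.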

Re: [OMPI devel] How to prefer oob/tcp over oob/usock

2014-02-17 Thread Ralph Castain
Sure: "-mca oob tcp" On Feb 17, 2014, at 8:10 AM, Adrian Reber wrote: > With the newly added oob/usock, checkpointing with CRIU stopped working. > Is there a way I can prefer oob/tcp on the command line? > > Adrian

[OMPI devel] How to prefer oob/tcp over oob/usock

2014-02-17 Thread Adrian Reber
With the newly added oob/usock, checkpointing with CRIU stopped working. Is there a way I can prefer oob/tcp on the command line? Adrian

[OMPI devel] OPAL_CRS_* meaning

2014-02-17 Thread Adrian Reber
This is probably for Josh. What is the meaning of the OPAL_CRS_* enums? They are probably used to communicate the state of the CRS modules. OPAL_CRS_ERROR seems to be used in case an error happened. What is the CRS module supposed to set this to if the checkpoint was successful? OPAL_CRS_CONTINUE

Re: [OMPI devel] How to read OPAL_OUTPUT-ed strings

2014-02-17 Thread Ralph Castain
Looking at your cmd line, it looks like you are trying to get diagnostic output from the mapper? If so, that cmd line is totally wrong. First, there are no "OPAL_OUTPUT" calls (at least, that I know of) in the orte layer as I studiously avoid them. Instead, everything is either cap or lower case

Re: [OMPI devel] How to read OPAL_OUTPUT-ed strings

2014-02-17 Thread Jeff Squyres (jsquyres)
OPAL_OUTPUT is the exact equivalent of opal_output(), except that it is compiled out for non-debug builds. So if you did a production build (e.g., a vpath build), OPAL_OUTPUT() will be compiled out. Otherwise, we typically use stream 0 for debugging stuff. On Feb 17, 2014, at 3:21 AM, Alex Mar
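
The "compiled out for non-debug builds" behaviour can be illustrated with a small sketch. This is not the Open MPI source: `demo_output`, `DEMO_OUTPUT`, `DEMO_ENABLE_DEBUG` and `demo_run` are hypothetical names standing in for the real function, macro and configure switch, and the double-parenthesis calling style is an assumption chosen so the sketch works with a plain C89-style macro.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Counts how many messages were actually emitted (for illustration). */
static int demo_output_calls = 0;

/* Hypothetical stand-in for an always-available output function. */
static void demo_output(int stream, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    demo_output_calls++;
    fprintf(stderr, "[stream %d] ", stream);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
}

/* DEMO_ENABLE_DEBUG is a hypothetical build switch. When it is not
 * defined, the macro expands to an empty statement, so the call and
 * its arguments vanish from the compiled program. */
#if defined(DEMO_ENABLE_DEBUG)
#define DEMO_OUTPUT(args) demo_output args
#else
#define DEMO_OUTPUT(args) do { } while (0)
#endif

/* Emits one debug-only message and one unconditional message, then
 * reports how many were actually printed. */
static int demo_run(void)
{
    DEMO_OUTPUT((0, "debug only: mapping %d procs", 2));
    demo_output(0, "always printed: mapping %d procs", 2);
    return demo_output_calls;
}
```

Built without `-DDEMO_ENABLE_DEBUG`, `demo_run()` prints only the second message; built with it, both appear. No run-time parameter can bring back a message that the preprocessor removed, which is why a production build shows nothing from the macro form.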

[OMPI devel] How to read OPAL_OUTPUT-ed strings

2014-02-17 Thread Alex Margolin
Hi, I'm having trouble getting the OPAL_OUTPUT to print. I'm trying the following command line (with no success): `pwd`/osh_install/bin/oshrun --map-by node -np 2 -mca orte_debug true -mca orte_debug_verbose 100 -mca orte_report_silent_errors true -mca orte_map_stddiag_to_stderr true ./examples/