WHAT: New OMPI_RTE_EVENT_BASE define
WHY: The usnic BTL needs to run some events asynchronously; the ORTE event base
already exists and is running asynchronously in MPI processes
WHERE: in ompi/mca/rte/rte.h and rte_orte.h
TIMEOUT: COB Friday, 21 Feb 2014
MORE DETAIL:
The WHY line described i
It looks fine, except that the restart state is not flagged. When a process
is restarted, does it resume execution inside the criu_dump() function? If
so, is there a way to tell from its return code (or some other mechanism)
that it is being restarted versus continuing after checkpointing?
On Mon, F
These values indicate the current state of the checkpointing lifecycle. In
particular CONTINUE/RESTART are set by the checkpointer in the CRS (all
others are used by the INC mechanism). In the opal_crs.checkpoint() call
the checkpointer will capture the program state and it is possible to
emerge fr
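
As a rough illustration of the CONTINUE/RESTART distinction described above, here is a minimal C sketch. The enum, the function name, and the return-code convention (negative for failure, zero when continuing after a checkpoint, positive when resumed from a saved image) are all assumptions made for this example; they are not the real OPAL_CRS_* definitions or a documented checkpointer API.

```c
/* Hypothetical stand-ins for the CRS states discussed in the thread. */
typedef enum {
    DEMO_CRS_CONTINUE,  /* process keeps running after the checkpoint */
    DEMO_CRS_RESTART,   /* process was brought back from a saved image */
    DEMO_CRS_ERROR      /* the checkpoint attempt failed */
} demo_crs_state_t;

/*
 * Translate a checkpointer's return code into a state.
 * Assumed convention (illustration only):
 *   rc < 0  -> failure
 *   rc == 0 -> continuing after a successful checkpoint
 *   rc > 0  -> execution resumed from the checkpoint image
 */
static demo_crs_state_t demo_state_from_dump_rc(int rc)
{
    if (rc < 0) {
        return DEMO_CRS_ERROR;
    }
    return (rc > 0) ? DEMO_CRS_RESTART : DEMO_CRS_CONTINUE;
}
```

A CRS module written along these lines would call its checkpointer once and set the state from the single return value, since both the continuing process and the restarted one emerge from the same call.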
Great - looks fine to me!!
On Feb 17, 2014, at 11:39 AM, Adrian Reber wrote:
> I have prepared a patch I would like to commit which adds the code to
> actually checkpoint a process. Thanks for the pointers about the string
> variables; I tried to implement it correctly.
>
> CRIU currently has
I have prepared a patch I would like to commit which adds the code to
actually checkpoint a process. Thanks for the pointers about the string
variables; I tried to implement it correctly.
CRIU currently has problems with the new OOB usock but I will contact
the CRIU developers about this error. U
+1
On Feb 16, 2014, at 4:55 PM, Andreas Schwab wrote:
> diff --git a/opal/util/crc.c b/opal/util/crc.c
> index 9cfae94..c2112de 100644
> --- a/opal/util/crc.c
> +++ b/opal/util/crc.c
> @@ -41,7 +41,7 @@
> #elif (OPAL_ALIGNMENT_LONG == 4)
> #define _WORD_MASK_ 0x3
> #else
> -#define _WORD_MASK 0x
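
For context on what a word mask like the one in this hunk is typically used for, here is a small C sketch of an alignment check. The names DEMO_WORD_MASK and demo_is_word_aligned are hypothetical, and the mask is derived from sizeof(long) on the assumption that a long's alignment equals its size; the real file selects the value from OPAL_ALIGNMENT_LONG instead.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Low address bits that must all be zero for a long-aligned pointer.
 * Assumption for this sketch: alignment of long == sizeof(long),
 * so a 4-byte long gives 0x3, matching the branch quoted above.
 */
#define DEMO_WORD_MASK ((uintptr_t) (sizeof(long) - 1))

/* Return nonzero when p sits on a long-sized boundary. */
static int demo_is_word_aligned(const void *p)
{
    return ((uintptr_t) p & DEMO_WORD_MASK) == 0;
}
```

A misspelled macro name in only one preprocessor branch goes unnoticed until that branch is actually selected, which is why a one-character fix like this can matter on a less common platform.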
Sure: "-mca oob tcp"
On Feb 17, 2014, at 8:10 AM, Adrian Reber wrote:
> With the newly added oob/usock, checkpointing with CRIU stopped working.
> Is there a way I can prefer oob/tcp on the command line?
>
> Adrian
With the newly added oob/usock, checkpointing with CRIU stopped working.
Is there a way I can prefer oob/tcp on the command line?
Adrian
This is probably for Josh. What is the meaning of the OPAL_CRS_* enums?
They are probably used to communicate the state of the CRS modules.
OPAL_CRS_ERROR seems to be used in case an error happened. What is the
CRS module supposed to set this to if the checkpoint was successful?
OPAL_CRS_CONTINUE
Looking at your cmd line, it looks like you are trying to get diagnostic output
from the mapper? If so, that cmd line is totally wrong. First, there are no
"OPAL_OUTPUT" calls (at least, that I know of) in the orte layer as I
studiously avoid them. Instead, everything is either cap or lower case
OPAL_OUTPUT is the exact equivalent of opal_output(), except that it is
compiled out for non-debug builds.
So if you did a production build (e.g., a vpath build), OPAL_OUTPUT() will be
compiled out. Otherwise, we typically use stream 0 for debugging stuff.
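
The "compiled out for non-debug builds" behaviour can be sketched in C as follows. Everything here is a stand-in: DEMO_ENABLE_DEBUG, demo_output(), and DEMO_OUTPUT() are hypothetical names chosen for the example, and the double-parenthesis call style is a design choice of this sketch, not a statement about the real macro's definition.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical switch standing in for a debug-build configure option. */
#ifndef DEMO_ENABLE_DEBUG
#define DEMO_ENABLE_DEBUG 0
#endif

/* Counts messages actually printed, so the effect is observable. */
static int demo_messages_emitted = 0;

/* Always-available printing function (the lower-case variant). */
static void demo_output(int stream, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    fprintf(stderr, "[stream %d] ", stream);
    vfprintf(stderr, fmt, ap);
    fputc('\n', stderr);
    va_end(ap);
    demo_messages_emitted++;
}

/* Upper-case variant: forwards in debug builds, vanishes otherwise. */
#if DEMO_ENABLE_DEBUG
#define DEMO_OUTPUT(args) demo_output args
#else
#define DEMO_OUTPUT(args) ((void) 0)
#endif
```

With the default DEMO_ENABLE_DEBUG of 0, a call such as DEMO_OUTPUT((0, "mapped %d procs", 2)); produces no code at all, so no runtime parameter can make it print; only the lower-case function remains active.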
On Feb 17, 2014, at 3:21 AM, Alex Mar
Hi,
I'm having trouble getting the OPAL_OUTPUT to print. I'm trying the
following command line (with no success):
`pwd`/osh_install/bin/oshrun --map-by node -np 2 -mca orte_debug true -mca
orte_debug_verbose 100 -mca orte_report_silent_errors true -mca
orte_map_stddiag_to_stderr true ./examples/