[OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-17 Thread Adrian Reber
I have prepared a patch I would like to commit which adds the code to actually checkpoint a process. Thanks for the pointers about the string variables; I tried to implement it correctly. CRIU currently has problems with the new OOB usock but I will contact the CRIU developers about this error. U

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-18 Thread Adrian Reber
ell from its return code (or some other mechanism) > that it is being restarted versus continuing after checkpointing? > > > On Mon, Feb 17, 2014 at 2:00 PM, Ralph Castain wrote: > > > Great - looks fine to me!! > > > > > > On Feb 17, 2014, at 11:39 AM, Adria

Re: [OMPI devel] OPAL_CRS_* meaning

2014-02-18 Thread Adrian Reber
> You can see it used in the opal_cr_inc_core_prep() function in > opal/runtime/opal_cr.c > > -- Josh > > > > On Mon, Feb 17, 2014 at 9:28 AM, Adrian Reber wrote: > > > This is probably for Josh. What is the meaning of the OPAL_CRS_* enums? > > > >

Re: [OMPI devel] C/R and orte_oob

2014-02-18 Thread Adrian Reber
On Fri, Feb 14, 2014 at 02:51:51PM -0800, Ralph Castain wrote: > On Feb 13, 2014, at 11:26 AM, Adrian Reber wrote: > > I tried to implement something like you described. It is not yet event > > driven, but before continuing I wanted to get some feedback if it is at > >

Re: [OMPI devel] C/R and orte_oob

2014-02-18 Thread Adrian Reber
On Tue, Feb 18, 2014 at 06:39:12AM -0800, Ralph Castain wrote: > On Feb 18, 2014, at 6:24 AM, Adrian Reber wrote: > > > On Fri, Feb 14, 2014 at 02:51:51PM -0800, Ralph Castain wrote: > >> On Feb 13, 2014, at 11:26 AM, Adrian Reber wrote: > >>> I tried to imple

Re: [OMPI devel] CRS/CRIU: add code to actually checkpoint a process

2014-02-18 Thread Adrian Reber
e checkpoint only functionality in continue mode the patch can be checked in? Adrian > On Tue, Feb 18, 2014 at 4:08 AM, Adrian Reber wrote: > > > I think I do not understand your question. So far I have only implemented > > the > > checkpoint part and no

[OMPI devel] startup sstore orte/mca/ess/base/ess_base_std_tool.c

2014-02-21 Thread Adrian Reber
To restart a process using orte-restart I need sstore initialized when running as a tool. This is currently missing. The new code is #if OPAL_ENABLE_FT_CR == 1 and should only affect --with-ft builds. The following is the change I want to make: diff --git a/orte/mca/ess/base/ess_base_std_tool.c

[OMPI devel] mca_base_component_distill_checkpoint_ready variable

2014-02-21 Thread Adrian Reber
There is a variable in the FT code which is not defined and therefore currently #ifdef'd out. #if (OPAL_ENABLE_FT == 1) && (OPAL_ENABLE_FT_CR == 1) #ifdef ENABLE_FT_FIXED /* FIXME_FT * * the variable mca_base_component_distill_checkpoint_ready * was removed by commit 8181c8273c4

[OMPI devel] openmpi-1.7.5a1r30797 fails building on SL 5.5

2014-02-22 Thread Adrian Reber
On a Scientific Linux 5.5 system the nightly snapshot openmpi-1.7.5a1r30797 fails to build with the following errors: Making all in romio make[3]: Entering directory `/tmp/adrian/openmpi-compile/openmpi-1.7.5a1r30797/build/ompi/mca/io/romio/romio' make[4]: Entering directory `/tmp/adrian/openmpi-co

[OMPI devel] Fix compiler warnings in FT code

2014-03-03 Thread Adrian Reber
I have a simple patch which fixes the remaining compiler warnings when running with '--with-ft': https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=4dee703a0a2e64972b0c35b7693c11a09f1fbe5f Does anybody see any problems with this patch? Adrian

Re: [OMPI devel] Fix compiler warnings in FT code

2014-03-03 Thread Adrian Reber
mewhere else? or do you have a different way to set those parameters? > > Other than that it looks good to me. > > > On Mon, Mar 3, 2014 at 5:29 AM, Adrian Reber wrote: > > > I have a simple patch which fixes the remaining compiler warnings when > > running with

Re: [OMPI devel] Fix compiler warnings in FT code

2014-03-03 Thread Adrian Reber
r how to set it up at the moment. > > > > > On Mon, Mar 3, 2014 at 7:25 AM, Adrian Reber wrote: > > > I removed a complete function because it was not used: > > > > ../../../../../orte/mca/sstore/stage/sstore_stage_component.c: At top

Re: [OMPI devel] mca_base_component_distill_checkpoint_ready variable

2014-03-03 Thread Adrian Reber
On Fri, Feb 21, 2014 at 10:12:54AM -0700, Nathan Hjelm wrote: > On Fri, Feb 21, 2014 at 05:21:10PM +0100, Adrian Reber wrote: > > There is a variable in the FT code which is not defined and therefore > > currently #ifdef'd out. > > > > #if (OPAL_ENABLE_FT

Re: [OMPI devel] Fix compiler warnings in FT code

2014-03-05 Thread Adrian Reber
Caching = Disabled [dcbz:02880] sstore:stage: open: Compression = Disabled [dcbz:02880] sstore:stage: open: Compression Delay= 0 [dcbz:02880] sstore:stage: open: Skip FileM (Debug Only) = False On Mon, Mar 03, 2014 at 05:42:13PM +0100, Adrian Reber wrote: > I w

Re: [OMPI devel] C/R and orte_oob

2014-03-06 Thread Adrian Reber
On Tue, Feb 18, 2014 at 03:46:58PM +0100, Adrian Reber wrote: > > >>> I tried to implement something like you described. It is not yet event > > >>> driven, but before continuing I wanted to get some feedback if it is at > > >>> least the right start:

Re: [OMPI devel] C/R and orte_oob

2014-03-07 Thread Adrian Reber
On Thu, Mar 06, 2014 at 07:47:22PM -0800, Ralph Castain wrote: > > Sorry for delay - yes, that looks like the right direction. I would > > suggest doing it via the current state machine, though, by simply > > defining another job or proc state in orte/mca/plm/plm_types.h, and > >

Re: [OMPI devel] C/R and orte_oob

2014-03-10 Thread Adrian Reber
On Fri, Mar 07, 2014 at 06:54:18AM -0800, Ralph Castain wrote: > > If you like, I can define the required code in the trunk and let you > > fill in the event functionality. > > That would be great. > >>> > >>> Thanks for your changes. When using --with-ft there are a few compil

[OMPI devel] orte-restart and PATH

2014-03-12 Thread Adrian Reber
I am using orte-restart without setting my PATH to my Open MPI installation. I am running /full/path/to/orte-restart and orte-restart tries to run mpirun to restart the process. This fails on my system because I do not have any mpirun in my PATH. Is it expected for an Open MPI installation to set u

[OMPI devel] usage of mca variables in orte-restart

2014-03-14 Thread Adrian Reber
I am now trying to run orte-restart. As far as I understand it orte-restart analyzes the checkpoint metadata and then tries to exec() mpirun which then starts opal-restart. During the startup of opal-restart (during initialize()) detection of the best CRS module is disabled: /* * Turn of

Re: [OMPI devel] usage of mca variables in orte-restart

2014-03-15 Thread Adrian Reber
egistered. > > -Nathan > > Please excuse the horrible Outlook top-posting. OWA sucks. > > > From: devel [devel-boun...@open-mpi.org] on behalf of Adrian Reber > [adr...@lisas.de] > Sent: Friday, March 14, 2014 3:05 PM > To: de

Re: [OMPI devel] usage of mca variables in orte-restart

2014-03-17 Thread Adrian Reber
On Fri, Mar 14, 2014 at 10:18:06PM +, Hjelm, Nathan T wrote: > The preferred way is to use mca_base_var_find and then call > mca_base_var_[set|get]_value. For performance sake we only look at the > environment when the variable is registered. I believe I found a bug in mca_base_var_set_value

Re: [OMPI devel] usage of mca variables in orte-restart

2014-03-18 Thread Adrian Reber
lue() to select the preferred crs module? Adrian On Mon, Mar 17, 2014 at 08:47:16AM -0600, Nathan Hjelm wrote: > Good catch. Fixing now. > > -Nathan > > On Mon, Mar 17, 2014 at 02:50:02PM +0100, Adrian Reber wrote: > > On Fri, Mar 14, 2014 at 10:18:06PM +, Hjelm, Nathan T w

[OMPI devel] Open MPI and CRIU stdout/stderr

2014-03-19 Thread Adrian Reber
Cross-posting to the criu and openmpi devel mailing lists. To get fault tolerance back into Open MPI I added code to use criu as a checkpoint/restart tool. I can checkpoint a process successfully but I have trouble restarting it. CRIU currently has problems restoring the process which is probably rela

[OMPI devel] Restarting and Pipes

2014-04-10 Thread Adrian Reber
Trying to restart a process I see that orterun has three pipes connected to the processes running under its control (-np 1). orterun: orterun 11562 adrian 15w FIFO 0,8 0t0 5304173 pipe orterun 11562 adrian 16r FIFO 0,8 0t0 5304174 pipe orterun

Re: [OMPI devel] 1-question developer poll

2014-04-16 Thread Adrian Reber
On Wed, Apr 16, 2014 at 10:32:10AM +, Jeff Squyres (jsquyres) wrote: > What source code repository technology(ies) do you use for Open MPI > development? (indicate all that apply) > > - SVN > - Mercurial > - Git git Adrian

Re: [OMPI devel] RFC: Remove heterogeneous support

2014-04-25 Thread Adrian Reber
On Fri, Apr 25, 2014 at 10:29:36AM +, Jeff Squyres (jsquyres) wrote: > On Apr 25, 2014, at 6:13 AM, Gilles Gouaillardet > wrote: > > > it is possible to use qemu in order to emulate unavailable hardware. > > for what it's worth, i am now running a ppc64 qemu emulated virtual > > machine on a

Re: [OMPI devel] r31916 question

2014-06-19 Thread Adrian Reber
The fault tolerance code also needs additional changes because of this commit. I have the changes prepared but not committed. On Wed, Jun 18, 2014 at 03:45:11PM -0700, Ralph Castain wrote: > Huh - thought I got that. Sorry I missed it. Let me take a look and ensure > that the alps ras module is s

[OMPI devel] Segmentation fault in opal_fifo (MTT)

2016-03-01 Thread Adrian Reber
I have seen it before but it was not reproducible. I now have two segfaults in opal_fifo in today's MTT run on master and 2.x: https://mtt.open-mpi.org/index.php?do_redir=2270 https://mtt.open-mpi.org/index.php?do_redir=2271 The thing that is strange about the MTT output is that MTT does not det

Re: [OMPI devel] 1.10.3rc MTT failures

2016-04-25 Thread Adrian Reber
Errors like that (Win::Get_attr: Got wrong value for disp unit) are from my ppc64 machine: https://mtt.open-mpi.org/index.php?do_redir=2295 The MTT setup is checking out the tests from github directly: [Test get: ibm] module = SCM scm_module = Git scm_url = https://github.com/open-mpi/ompi-tests.
