What versions of BLCR and Open MPI are you using?

Have you tried to checkpoint/restart a single (non-MPI) application with BLCR? BLCR ships with some examples, and I would suggest making sure those work before moving on to Open MPI.
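
For reference, a standalone BLCR check might look like the sketch below. It assumes the BLCR utilities cr_run, cr_checkpoint, and cr_restart are installed and on your PATH, that the BLCR kernel modules are loaded, and that cr_checkpoint writes its default context.<pid> file into the current directory. The have_tools helper and the use of sleep as the test program are only illustrative; the examples shipped with BLCR are a better test.

```shell
# have_tools NAME...: succeed only if every named command is on PATH.
have_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool"; return 1; }
    done
}

if have_tools cr_run cr_checkpoint cr_restart; then
    cr_run sleep 30 &              # start a checkpointable (non-MPI) process
    pid=$!
    sleep 1                        # give the process a moment to start
    cr_checkpoint --term "$pid"    # assumed to write context.<pid>, then terminate the process
    wait "$pid"                    # make sure the original process is gone
    cr_restart "context.$pid"      # resume from the checkpoint file
fi
```

If this fails outside of Open MPI, the problem is in the BLCR installation rather than in the MPI build.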

Typically this type of failure is the result of BLCR's cr_init() failing. Do you happen to see an error message like the following:
  Error: crs:blcr: module_init: cr_init failed

You could also try to see if something more subtle is happening by turning on verbosity with the following command line switch:
  -mca crs_base_verbose 10
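
As a rough sketch, the verbose run can be captured to a file and searched for the cr_init message quoted above. The binary path ./helloworld, the log file name, the has_cr_init_failure helper, and the use of -np 1 are assumptions made for illustration; adjust them to your setup.

```shell
# has_cr_init_failure LOG: succeed if LOG contains the BLCR cr_init error quoted above.
has_cr_init_failure() {
    grep -q 'crs:blcr: module_init: cr_init failed' "$1"
}

APP=./helloworld          # placeholder: path to your MPI binary
LOG=crs_verbose.log       # hypothetical log file name

if command -v mpirun >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Capture stdout and stderr so the verbose CRS output can be searched afterwards.
    mpirun -np 1 -am ft-enable-cr -mca crs_base_verbose 10 "$APP" >"$LOG" 2>&1
    if has_cr_init_failure "$LOG"; then
        echo "cr_init failed; check the BLCR installation first"
    else
        echo "no cr_init error found; see $LOG for details"
    fi
fi
```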

Let me know how those go, and we can keep debugging from there.

-- Josh

On May 8, 2009, at 6:47 PM, Kritiraj Sajadah wrote:


Hi Gus,

Thanks for your email. I have /usr/local/bin included in my $PATH. (Not /usr/local/include - it was just a copying mistake).

I checked where mpicc and mpirun are, and I got the following paths:

/usr/local/bin/mpirun
/usr/local/bin/mpicc

The BLCR I am using was downloaded and installed separately.

1) Do you think I may be using the wrong version of BLCR?
There is a directory called blcr within the openmpi tarball (openmpi-1.3/opal/mca/crs/blcr). Should I use this?

2) Do you think it's better to install openmpi in /usr/local/openmpi and blcr in /usr/local/blcr?

3) If so, how do I uninstall the ones I have already?

Thank you

Kritiraj



--- On Fri, 5/8/09, Gus Correa <g...@ldeo.columbia.edu> wrote:

From: Gus Correa <g...@ldeo.columbia.edu>
Subject: Re: [OMPI users] *** An error occurred in MPI_Init
To: "Open MPI Users" <us...@open-mpi.org>
Date: Friday, May 8, 2009, 6:33 PM
PS - Kritiraj

Reading your message more carefully, I saw that you did this:

****
Opened $HOME/.bashrc and added the following:

PATH="/usr/local/include:$PATH"
LD_LIBRARY_PATH="/usr/local/lib:$LD_LIBRARY_PATH"

****

However, this is what you should have done:

****
Open $HOME/.bashrc and add the following:

PATH="/usr/local/bin:$PATH"
LD_LIBRARY_PATH="/usr/local/lib:$LD_LIBRARY_PATH"

****

Note that /usr/local/bin, not /usr/local/include, should be prepended to your PATH!
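
As a sketch, a slightly more defensive form of those .bashrc lines could look like the following. It assumes a POSIX-style shell and the /usr/local prefix used in this thread. The explicit export matters if LD_LIBRARY_PATH was not already in the environment, and the ${VAR:+...} form avoids leaving a trailing colon (an empty entry, which the dynamic linker treats as the current directory) when the variable starts out empty.

```shell
# Prepend the Open MPI install locations (assumed prefix: /usr/local).
# ${VAR:+:$VAR} expands to ":$VAR" only when VAR is set and non-empty.
PATH="/usr/local/bin${PATH:+:$PATH}"
LD_LIBRARY_PATH="/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Export both so child processes such as mpirun and mpicc see them.
export PATH LD_LIBRARY_PATH
```

Open a new shell (or source the file) afterwards so the change takes effect.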


Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


Gus Correa wrote:
Hi Kritiraj

This looks like many other errors reported on this list that are caused by using the wrong MPI compiler wrappers or the wrong mpirun/mpiexec. Typically this is caused by a PATH environment variable that is pointing to the wrong executables (mpicc, mpirun). Most Linux distributions, compilers, etc., come with their own MPI versions, and this can be very confusing.

Try using full path names for mpicc and for mpirun. That is a bulletproof way to get exactly what you want. In your case, use /usr/local/bin (as you configured with --prefix=/usr/local). (Actually, I prefer to configure with a more distinctive prefix, something like /usr/local/openmpi-1.3.2, to avoid any confusion with other MPIs.)

You can also try "which mpicc" and "which mpirun", or "mpicc --showme" and "mpirun --help", to get a bit more information about what you are really using.
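
A small sketch of that check is below. It only compares the directories that mpicc and mpirun resolve to; the same_dir helper is a made-up name for illustration, and a match does not by itself prove that both belong to the Open MPI build you installed.

```shell
# same_dir A B: succeed if the two paths live in the same directory.
same_dir() {
    [ "$(dirname "$1")" = "$(dirname "$2")" ]
}

mpicc_path=$(command -v mpicc || true)
mpirun_path=$(command -v mpirun || true)

if [ -z "$mpicc_path" ] || [ -z "$mpirun_path" ]; then
    echo "mpicc or mpirun is not on PATH"
elif same_dir "$mpicc_path" "$mpirun_path"; then
    echo "mpicc and mpirun both come from $(dirname "$mpicc_path")"
else
    echo "mismatch: $mpicc_path vs $mpirun_path"
fi
```

A mismatch here usually means another MPI installation is shadowing the one you built.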

I hope this helps.
Gus Correa



Kritiraj Sajadah wrote:
Dear All,

I have installed and configured Open MPI with BLCR on my laptop:

1) configure and install blcr

./configure --prefix=/usr/local/ \
    --enable-debug=yes --enable-libcr-tracing=yes \
    --enable-kernel-tracing=yes --enable-testsuite=yes \
    --enable-all-static=yes --enable-static=yes

make
make install

2) configure and install openmpi

./configure --prefix=/usr/local/ --enable-picky \
    --enable-debug --enable-mpi-profile --enable-mpi-cxx \
    --enable-pretty-print-stacktrace --enable-binaries \
    --enable-trace --enable-static=yes --enable-debug \
    --with-devel-headers=1 --with-mpi-param-check=always \
    --with-ft=cr --enable-ft-thread --with-blcr=/usr/local/ \
    --with-blcr-libdir=/usr/local/lib --enable-mpi-threads=yes

make all install

3) add the environment variables.


Opened $HOME/.bashrc and added the following:

PATH="/usr/local/include:$PATH"
LD_LIBRARY_PATH="/usr/local/lib:$LD_LIBRARY_PATH"

Now the problem:

I am trying to checkpoint the following MPI application:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int node;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &node);

    printf("Hello World from Node %d\n", node);

    MPI_Finalize();
    return 0;
}

I am running mpirun as follows:

raj-laptop> mpirun -am ft-enable-cr helloworld

The errors are as follows:


--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

   opal_cr_init() failed failed
   --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[raj-laptop:9439] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[raj-laptop:09439] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 77
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

   ompi_mpi_init: orte_init failed
   --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------


Is it something to do with me running it on a single node, i.e. my laptop? Or is it something to do with configurations or libraries?


Any help would be much appreciated.

Regards,

Raj




   _______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
