I'm unable to reproduce this problem. :( I tried both the svn head
(r17288) and the tarball that you were using (openmpi-1.3a1r17175) on
a similar system without problem.
The error you are seeing may be caused by old connectivity
information in the session directory. You may want to make sure that
/tmp does not contain any "openmpi-session*" directories before
starting mpirun.
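As a rough sketch of that check (the helper name list_ompi_sessions is made up for illustration, and it assumes the session directories sit directly under /tmp and match the "openmpi-session*" pattern mentioned above), something like this should show any leftovers:

```shell
# Hypothetical helper: list leftover Open MPI session directories
# directly under a temp root (defaults to /tmp).
list_ompi_sessions() {
    root="${1:-/tmp}"
    # Only look at the top level of the root; match directories by name prefix.
    find "$root" -maxdepth 1 -type d -name 'openmpi-session*'
}

# Show what is currently there before starting mpirun.
list_ompi_sessions /tmp

# If anything is listed and no mpirun job of yours is still running,
# the directories can be removed by hand, for example:
#   rm -rf /tmp/openmpi-session*
```

If the listing is empty, stale session data is probably not the cause and the clean rebuild is the next thing to try.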
Other than that you may want to try a clean build of Open MPI just to
make sure that you are not seeing anything odd resulting from old
Open MPI install files.
Let me know if that helps.
-- Josh
On Jan 24, 2008, at 12:38 PM, Wong, Wayne wrote:
I'm having some difficulty getting the Open MPI checkpoint/restart
fault tolerance working. I have compiled Open MPI with the
"--with-ft=cr" flag, but when I attempt to run my test program (ring),
the ompi-checkpoint command fails. I have verified that the test
program works fine without the fault tolerance enabled. Here are
the details:
[me@dev1 ~]$ mpirun -np 4 -am ft-enable-cr ring
[me@dev1 ~]$ ps -efa | grep mpirun
me 3052 2820 1 08:25 pts/2 00:00:00 mpirun -np 4 -am ft-enable-cr ring
[me@dev1 ~]$ ompi-checkpoint 3052
[dev1.acme.local:03060] [NO-NAME] ORTE_ERROR_LOG: Unknown error: 5854512 in file sds_singleton_module.c at line 50
[dev1.acme.local:03060] [NO-NAME] ORTE_ERROR_LOG: Unknown error: 5854512 in file runtime/orte_init.c at line 311
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_sds_base_set_name failed
--> Returned value Unknown error: 5854512 (5854512) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
Any help would be appreciated. Thanks.
<ompi_info.txt.gz><config.log.gz>