I'm unable to reproduce this problem. :( I tried both the svn head (r17288) and the tarball that you were using (openmpi-1.3a1r17175) on a similar system without problem.

The error you are seeing may be caused by old connectivity information in the session directory. You may want to make sure that /tmp does not contain any "openmpi-session*" directories before starting mpirun.
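A minimal sketch of that check, assuming the session directories sit directly under /tmp and that your find supports -maxdepth (GNU and BSD find do). The function name list_stale_sessions is only illustrative and is not part of Open MPI:

```shell
#!/bin/sh
# List leftover Open MPI session directories at the top level of a temp dir.
# list_stale_sessions is a hypothetical helper name, not an Open MPI tool.
list_stale_sessions() {
    tmp_root="${1:-/tmp}"
    # -maxdepth 1 keeps the search at the top level of tmp_root
    find "$tmp_root" -maxdepth 1 -type d -name 'openmpi-session*' 2>/dev/null
}

# Print any matches under /tmp; no output means there is nothing to clean up.
list_stale_sessions /tmp

# To delete them, first make sure no mpirun job is still running, then:
# list_stale_sessions /tmp | xargs rm -rf
```

The removal step is left commented out on purpose, since deleting the session directory of a running job would break that job.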

Other than that, you may want to try a clean build of Open MPI, just to make sure that you are not seeing anything odd resulting from old Open MPI install files.

Let me know if that helps.

-- Josh

On Jan 24, 2008, at 12:38 PM, Wong, Wayne wrote:

I'm having some difficulty getting the Open MPI checkpoint/restart fault tolerance working. I have compiled Open MPI with the "--with-ft=cr" flag, but when I attempt to run my test program (ring), the ompi-checkpoint command fails. I have verified that the test program works fine without the fault tolerance enabled. Here are the details:

     [me@dev1 ~]$ mpirun -np 4 -am ft-enable-cr ring
     [me@dev1 ~]$ ps -efa | grep mpirun
     me 3052 2820 1 08:25 pts/2 00:00:00 mpirun -np 4 -am ft-enable-cr ring


     [me@dev1 ~]$ ompi-checkpoint 3052
     [dev1.acme.local:03060] [NO-NAME] ORTE_ERROR_LOG: Unknown error: 5854512 in file sds_singleton_module.c at line 50
     [dev1.acme.local:03060] [NO-NAME] ORTE_ERROR_LOG: Unknown error: 5854512 in file runtime/orte_init.c at line 311
     --------------------------------------------------------------------------
     It looks like orte_init failed for some reason; your parallel process is
     likely to abort. There are many reasons that a parallel process can
     fail during orte_init; some of which are due to configuration or
     environment problems. This failure appears to be an internal failure;
     here's some additional information (which may only be relevant to an
     Open MPI developer):

       orte_sds_base_set_name failed
       --> Returned value Unknown error: 5854512 (5854512) instead of ORTE_SUCCESS

     --------------------------------------------------------------------------

Any help would be appreciated.  Thanks.
<ompi_info.txt.gz><config.log.gz>
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
