The problem was that orted couldn't find ssh or rsh on that machine. I've added my installation to PATH and it now works. One question, though: I will definitely not use MPI_Comm_spawn or any related functionality. Do I still need ssh? If not, is there any way to tell orted that it shouldn't look for ssh, since it won't need it?
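For the record, the check and fix described above can be sketched like this (the `/home/gmaj/local/bin` prefix is only an illustrative assumption for wherever ssh actually lives):

```shell
# Is a remote launcher visible on this machine? orted looks for ssh/rsh
# in PATH when it starts up.
command -v ssh || command -v rsh || echo "no ssh/rsh found in PATH"

# If ssh is installed under a non-standard prefix (path below is
# illustrative), prepend it so orted can find it:
export PATH="/home/gmaj/local/bin:$PATH"

# Verify the launcher is now resolvable:
command -v ssh || echo "still not found"
```

Putting the export into the shell startup file (e.g. ~/.bashrc) makes the fix stick for non-interactive logins as well.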
Regards,
Grzegorz Maj

2010/7/7 Ralph Castain <r...@open-mpi.org>:
> Check your PATH and LD_LIBRARY_PATH - it looks like you are picking up a
> stale binary for orted and/or stale libraries (perhaps getting the default
> OMPI instead of 1.4.2) on the machine where it fails.
>
> On Jul 7, 2010, at 7:44 AM, Grzegorz Maj wrote:
>
>> Hi,
>> I was trying to run some MPI processes as singletons. On some of the
>> machines they crash in MPI_Init. I use exactly the same binaries of my
>> application and the same installation of Open MPI 1.4.2 on two machines,
>> and it works on one of them and fails on the other. This is the command
>> and its output (test is a simple application calling only MPI_Init and
>> MPI_Finalize):
>>
>> LD_LIBRARY_PATH=/home/gmaj/openmpi/lib ./test
>> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>> ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at line 161
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> orte_plm_base_select failed
>> --> Returned value Not found (-13) instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>> ../../orte/runtime/orte_init.c at line 132
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> orte_ess_set_name failed
>> --> Returned value Not found (-13) instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>> ../../orte/orted/orted_main.c at line 323
>> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
>> daemon on the local node in file
>> ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 381
>> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
>> daemon on the local node in file
>> ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 143
>> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
>> daemon on the local node in file ../../orte/runtime/orte_init.c at
>> line 132
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> orte_ess_set_name failed
>> --> Returned value Unable to start a daemon on the local node (-128)
>> instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> ompi_mpi_init: orte_init failed
>> --> Returned "Unable to start a daemon on the local node" (-128)
>> instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> [host01:21865] Abort before MPI_INIT completed successfully; not able
>> to guarantee that all other processes were killed!
>>
>> Any ideas on this?
>>
>> Thanks,
>> Grzegorz Maj
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users