Good heavens - where did you find something that old? Can you use a more recent version?

On Feb 13, 2012, at 4:45 AM, "Richard Bardwell" <rich...@sharc.co.uk> wrote:

> Gentlemen
>
> I am struggling to get MPI working when the hostfile contains different nodes.
> I get the error below. Any ideas ?? I can ssh without password between the two
> nodes. I am running 1.2.8 MPI on both machines.
> Any help most appreciated !!!!!
>
> MPITEST/v8_mpi_test> mpiexec -n 2 --debug-daemons -hostfile test.hst /home/sharc/MPITEST/v8_mpi_test/mpitest
> Daemon [0,0,1] checking in as pid 10490 on host 192.0.2.67
> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_rml_base_select failed
> --> Returned value -13 instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
> Open RTE was unable to initialize properly. The error occured while
> attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received kill_local_procs
> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1158
> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
> [linux-tmpw:10489] ERROR: A daemon on node 192.0.2.68 failed to start as expected.
> [linux-tmpw:10489] ERROR: There may be more information available from
> [linux-tmpw:10489] ERROR: the remote shell (see above).
> [linux-tmpw:10489] ERROR: The daemon exited unexpectedly with status 243.
> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received exit
> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1190
> --------------------------------------------------------------------------
> mpiexec was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
> --------------------------------------------------------------------------