I don't know if anyone has tried to run Open MPI with Globus before.
One requirement that Open MPI currently has is that all nodes must be
able to reach each other via TCP. Is that true in your Globus
environment?
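If you want a quick way to rule that out, something like the sketch
below can probe basic TCP connectivity between hosts. The node names
and the port are placeholders (Open MPI uses dynamically assigned
ports, so this only shows whether the hosts can open TCP connections
to each other at all):

  # Hypothetical node names; port 22 is just a stand-in for
  # "some TCP port that should be open on every node"
  for node in node1 node2; do
    if nc -z -w 3 "$node" 22; then
      echo "$node: TCP reachable"
    else
      echo "$node: NOT reachable"
    fi
  done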
On Mar 10, 2008, at 11:01 AM, Christoph Spielmann wrote:
Hi everybody!
I am trying to get Open MPI and Globus to cooperate. These are the
steps I executed to get Open MPI working:
• export PATH=/opt/openmpi/bin/:$PATH
• /opt/globus/setup/globus/setup-globus-job-manager-fork
checking for mpiexec... /opt/openmpi/bin//mpiexec
checking for mpirun... /opt/openmpi/bin//mpirun
find-fork-tools: creating ./config.status
config.status: creating fork.pm
• restart VDT (includes GRAM, WSGRAM, mysql, rls...)
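To double-check that the generated fork.pm really picked up those
paths, one can grep the job manager module directly (the path below
assumes the standard Globus layout under $GLOBUS_LOCATION; adjust it
if your VDT install puts the perl modules elsewhere):

  grep -n 'mpirun\|mpiexec' \
      $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/fork.pm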
As you can see, the necessary Open MPI executables are recognized
correctly by setup-globus-job-manager-fork. But when I actually try
to execute a simple MPI program using globus-job-run, I get this:
globus-job-run localhost -x '(jobType=mpi)' -np 2 -s ./hypercube 0
[hydra:10168] [0,0,0] ORTE_ERROR_LOG: Error in file runtime/orte_init_stage1.c at line 312
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_pls_base_select failed
--> Returned value -1 instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[hydra:10168] [0,0,0] ORTE_ERROR_LOG: Error in file runtime/orte_system_init.c at line 42
[hydra:10168] [0,0,0] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 52
--------------------------------------------------------------------------
Open RTE was unable to initialize properly. The error occured while
attempting to orte_init(). Returned value -1 instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
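To get more detail on why the launcher selection fails, one can ask
Open MPI directly (the flags below assume Open MPI 1.2.x, where the
launcher framework is called "pls"; for a job started through
globus-job-run, the MCA setting can be passed via the environment
instead):

  # List the process-launch (pls) components this build contains
  ompi_info --param pls all
  # Trace component selection verbosely; the equivalent environment
  # setting for a GRAM job would be OMPI_MCA_pls_base_verbose=10
  mpirun --mca pls_base_verbose 10 -np 2 ./hypercube 0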
The MPI program itself is okay:
which mpirun && mpirun -np 2 hypercube 0
/opt/openmpi/bin/mpirun
Process 0 received broadcast message 'MPI_Broadcast with hypercube topology' from Process 0
Process 1 received broadcast message 'MPI_Broadcast with hypercube topology' from Process 0
From what I read in the mailing list, I think that something is wrong
with the pls and Globus. But I have no idea what could be wrong, let
alone how it could be fixed ;). So if someone has an idea how this
could be fixed, I'd be glad to hear it.
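One thing I could still compare is the environment the job actually
sees in each case, since (just a guess on my part) a GRAM fork job may
start with a stripped-down environment compared to my login shell:

  # Diagnostic sketch: diff the environment under GRAM vs. a plain shell
  globus-job-run localhost /usr/bin/env | sort > env.gram
  env | sort > env.shell
  diff env.gram env.shell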
Regards,
Christoph
--
Jeff Squyres
Cisco Systems