The problem here is that you are attempting to start the application
processes without using our mpirun.  We call this a "standalone" launch.

Unfortunately, OMPI doesn't currently understand how to do a standalone
launch - ORTE will get confused and abort, as you experienced. There are two
ways to fix this:

1. Someone could write a Globus launcher for ORTE. I don't think this would
be terribly hard. You would then use our mpirun to start the job after
getting an allocation via some grid-compatible resource manager.

2. Once we get standalone operations working, you could do what you tried.
You will likely have to write an ESS component for Globus so the processes
can figure out their rank.
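
To make the rank-discovery point in option 2 concrete: in a standalone
launch nothing hands each process its identity, so something on the Globus
side has to supply it. Below is a minimal Python sketch of that idea only,
assuming the launcher exports the rank and the job size as environment
variables. The names GLOBUS_MPI_RANK and GLOBUS_MPI_SIZE are hypothetical,
invented for this example; they are not real Globus or ORTE variables, and
an actual ESS component would be written in C inside ORTE.

```python
import os

# Hypothetical variable names, invented for this sketch. Neither Globus
# nor ORTE is known to define them.
RANK_VAR = "GLOBUS_MPI_RANK"
SIZE_VAR = "GLOBUS_MPI_SIZE"


def discover_identity(env=None):
    """Return (rank, size) for this process from launcher-set variables.

    Raises RuntimeError if the variables are missing, non-numeric, or
    inconsistent, i.e. when the process has no way to learn who it is.
    """
    env = os.environ if env is None else env
    try:
        rank = int(env[RANK_VAR])
        size = int(env[SIZE_VAR])
    except KeyError as missing:
        raise RuntimeError(f"launcher did not set {missing.args[0]}") from None
    except ValueError:
        raise RuntimeError("rank and size must be integers") from None
    # A rank is only meaningful inside the range [0, size).
    if size < 1 or not 0 <= rank < size:
        raise RuntimeError(f"invalid rank {rank} for job size {size}")
    return rank, size


if __name__ == "__main__":
    try:
        rank, size = discover_identity()
        print(f"process {rank} of {size}")
    except RuntimeError as err:
        print(f"cannot determine rank: {err}")
```

The function takes an optional mapping so it can be exercised without
touching the real environment, e.g.
discover_identity({"GLOBUS_MPI_RANK": "1", "GLOBUS_MPI_SIZE": "2"}).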

I have done some prototyping for standalone launch, and expect to have at
least one working example in our development trunk later this month.
However, we currently don't plan to release standalone support until
probably 1.3.2, which likely won't come out for a few months.

Hope that helps
Ralph


On 3/14/08 5:40 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> I don't know if anyone has tried to run Open MPI with Globus before.
> 
> One requirement that Open MPI currently has is that all nodes must be
> reachable from each other via TCP.  Is that true in your Globus
> environment?
> 
> 
> 
> On Mar 10, 2008, at 11:01 AM, Christoph Spielmann wrote:
> 
>> Hi everybody!
>> 
>> I am trying to get Open MPI and Globus to cooperate. These are the
>> steps I executed in order to get Open MPI working:
>> 
>> $ export PATH=/opt/openmpi/bin/:$PATH
>> $ /opt/globus/setup/globus/setup-globus-job-manager-fork
>> checking for mpiexec... /opt/openmpi/bin//mpiexec
>> checking for mpirun... /opt/openmpi/bin//mpirun
>> find-fork-tools: creating ./config.status
>> config.status: creating fork.pm
>> $ restart VDT (includes GRAM, WSGRAM, mysql, rls...)
>> As you can see, the necessary Open MPI executables are recognized
>> correctly by setup-globus-job-manager-fork. But when I actually try
>> to execute a simple MPI program using globus-job-run, I get this:
>> 
>> globus-job-run localhost -x '(jobType=mpi)' -np 2 -s ./hypercube 0
>> [hydra:10168] [0,0,0] ORTE_ERROR_LOG: Error in file runtime/orte_init_stage1.c at line 312
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems.  This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>> 
>>   orte_pls_base_select failed
>>   --> Returned value -1 instead of ORTE_SUCCESS
>> 
>> --------------------------------------------------------------------------
>> [hydra:10168] [0,0,0] ORTE_ERROR_LOG: Error in file runtime/orte_system_init.c at line 42
>> [hydra:10168] [0,0,0] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 52
>> --------------------------------------------------------------------------
>> Open RTE was unable to initialize properly.  The error occured while
>> attempting to orte_init().  Returned value -1 instead of ORTE_SUCCESS.
>> --------------------------------------------------------------------------
>> 
>> The MPI program itself is okay:
>> 
>> which mpirun && mpirun -np 2 hypercube 0
>> /opt/openmpi/bin/mpirun
>> Process 0 received broadcast message 'MPI_Broadcast with hypercube topology' from Process 0
>> Process 1 received broadcast message 'MPI_Broadcast with hypercube topology' from Process 0
>> 
>> 
>> From what I read in the mailing list, I think that something is
>> wrong with the pls and Globus. But I have no idea what could be
>> wrong, let alone how it could be fixed ;). So if someone has an
>> idea how this could be fixed, I'd be glad to hear it.
>> 
>> Regards,
>> 
>> Christoph
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 


