Normally, you would simply set LD_LIBRARY_PATH in your environment to point to the right thing. Alternatively, you could configure OMPI with --enable-mpirun-prefix-by-default. This tells OMPI to automatically add the prefix you configured the system with to your LD_LIBRARY_PATH and PATH envars. It should solve your problem if you don't want to simply set those values in your environment anyway.
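
For example, a minimal sketch of both approaches (the Intel library path below is only a placeholder; adjust both paths to your actual install):

    # At build time: bake the install prefix into what mpirun sets up
    ./configure --prefix=/home/sluke --enable-mpirun-prefix-by-default
    make all install

    # Or at run time, near the top of the job submission script
    # (the Intel path is an example, not your actual layout):
    export PATH=/home/sluke/bin:$PATH
    export LD_LIBRARY_PATH=/home/sluke/lib:/opt/intel/lib/intel64:$LD_LIBRARY_PATH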
Ralph

On Wed, Oct 28, 2009 at 2:10 PM, Luke Shulenburger <lshulenbur...@gmail.com> wrote:

> Thanks for the quick reply. This leads me to another issue I have been
> having with Open MPI as it relates to SGE. The "tight integration" works,
> in that I do not have to give mpirun a hostfile when I use the scheduler,
> but it does not seem to be passing on my environment variables.
> Specifically, because I used the Intel compilers to compile Open MPI, I
> have to be sure to set LD_LIBRARY_PATH correctly in my job submission
> script or Open MPI will not run (giving the error discussed in the FAQ).
> Where I am a little lost is whether this is a problem with the way I
> built Open MPI or whether it is a configuration problem with SGE.
>
> This may be unrelated to my previous problem, but the similarities with
> the environment variables made me think of it.
>
> Thanks for your consideration,
> Luke Shulenburger
> Geophysical Laboratory
> Carnegie Institution of Washington
>
> On Wed, Oct 28, 2009 at 3:48 PM, Ralph Castain <r...@open-mpi.org> wrote:
> > I'm afraid we have never really supported this kind of nested
> > invocation of mpirun. If it works with any version of OMPI, it is
> > totally a fluke - it might work one time, and then fail the next.
> >
> > The problem is that we pass envars to the launched processes to
> > control their behavior, and these conflict with what mpirun needs. We
> > have tried various scrubbing mechanisms (i.e., having mpirun start out
> > by scrubbing the environment of envars that would have come from the
> > initial mpirun), but they all have the unfortunate possibility of
> > removing parameters provided by the user - and that can cause its own
> > problems.
> >
> > I don't know if we will ever support nested operations - occasionally,
> > I do give it some thought, but have yet to find a foolproof solution.
> >
> > Ralph
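
In a wrapper like the script.sh quoted below, the scrubbing Ralph describes would look roughly like this. This is a sketch only, under the (possibly incomplete) assumption that all of the control envars carry the OMPI_ prefix, and per his caveat it is not guaranteed to be foolproof:

    #!/bin/bash
    # Drop every envar the outer mpirun set before starting the inner one.
    # Assumes the relevant variables are all named OMPI_*; as discussed
    # above, this kind of scrubbing is not foolproof.
    for v in $(env | sed -n 's/^\(OMPI_[A-Za-z0-9_]*\)=.*/\1/p'); do
        unset "$v"
    done
    mpirun -np 2 ./mpiprogram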
> > On Wed, Oct 28, 2009 at 1:11 PM, Luke Shulenburger <lshulenbur...@gmail.com> wrote:
> >> Hello,
> >> I am having trouble with a script that calls MPI. Basically my
> >> problem distills to wanting to call a script with:
> >>
> >> mpirun -np # ./script.sh
> >>
> >> where script.sh looks like:
> >>
> >> #!/bin/bash
> >> mpirun -np 2 ./mpiprogram
> >>
> >> Whenever I invoke script.sh normally (as ./script.sh, for instance)
> >> it works fine, but if I do mpirun -np 2 ./script.sh I get the
> >> following error:
> >>
> >> [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 105
> >> [ppv.stanford.edu:08814] [[27860,1],0] could not get route to [[INVALID],INVALID]
> >> [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file base/plm_base_proxy.c at line 86
> >>
> >> I have also tried running with mpirun -d to get some debugging info,
> >> and it appears that the proctable is not being created for the second
> >> mpirun. The command hangs like so:
> >>
> >> [ppv.stanford.edu:08823] procdir: /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/0/0
> >> [ppv.stanford.edu:08823] jobdir: /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/0
> >> [ppv.stanford.edu:08823] top: openmpi-sessions-sluke@ppv.stanford.edu_0
> >> [ppv.stanford.edu:08823] tmp: /tmp
> >> [ppv.stanford.edu:08823] [[27855,0],0] node[0].name ppv daemon 0 arch ffc91200
> >> [ppv.stanford.edu:08823] Info: Setting up debugger process table for applications
> >> MPIR_being_debugged = 0
> >> MPIR_debug_state = 1
> >> MPIR_partial_attach_ok = 1
> >> MPIR_i_am_starter = 0
> >> MPIR_proctable_size = 1
> >> MPIR_proctable:
> >> (i, host, exe, pid) = (0, ppv.stanford.edu, /home/sluke/maintenance/openmpi-1.3.3/examples/./shell.sh, 8824)
> >> [ppv.stanford.edu:08825] procdir: /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/1/0
> >> [ppv.stanford.edu:08825] jobdir: /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/1
> >> [ppv.stanford.edu:08825] top: openmpi-sessions-sluke@ppv.stanford.edu_0
> >> [ppv.stanford.edu:08825] tmp: /tmp
> >> [ppv.stanford.edu:08825] [[27855,1],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 105
> >> [ppv.stanford.edu:08825] [[27855,1],0] could not get route to [[INVALID],INVALID]
> >> [ppv.stanford.edu:08825] [[27855,1],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file base/plm_base_proxy.c at line 86
> >> [ppv.stanford.edu:08825] Info: Setting up debugger process table for applications
> >> MPIR_being_debugged = 0
> >> MPIR_debug_state = 1
> >> MPIR_partial_attach_ok = 1
> >> MPIR_i_am_starter = 0
> >> MPIR_proctable_size = 0
> >> MPIR_proctable:
> >>
> >> In this case, it does not matter what the ultimate mpiprogram I try
> >> to run is; the shell script fails in the same way regardless (I've
> >> tried the hello_f90 executable from the Open MPI examples directory).
> >> Here are some details of my setup:
> >>
> >> I built Open MPI 1.3.3 with the Intel Fortran and C compilers
> >> (version 11.1). The machine uses Rocks with the SGE scheduler, so I
> >> ran configure as ./configure --prefix=/home/sluke --with-sge;
> >> however, this problem persists even if I am running on the head node
> >> outside of the scheduler. I am attaching the resulting config.log to
> >> this email, as well as the output of ompi_info --all and ifconfig. I
> >> hope this gives the experts on the list enough to go on, but I will
> >> be happy to provide any more information that might be helpful.
> >>
> >> Luke Shulenburger
> >> Geophysical Laboratory
> >> Carnegie Institution of Washington
> >>
> >> PS: I have tried this on a machine with openmpi-1.2.6 and cannot
> >> reproduce the error; however, on a second machine with openmpi-1.3.2
> >> I have the same problem.
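
For the record, two more ways to handle the LD_LIBRARY_PATH question above: mpirun can forward selected envars to the processes it launches via its -x option, and SGE's qsub can export the entire submission environment with -V. Roughly (job.sh and the process count are just example values):

    # Forward the submitting shell's LD_LIBRARY_PATH to each MPI process:
    mpirun -x LD_LIBRARY_PATH -np 4 ./mpiprogram

    # Or have SGE export the whole submission environment into the job:
    qsub -V job.sh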