Hello,
I am having trouble with a script that itself calls mpirun.  My
problem boils down to wanting to launch a script with:

mpirun -np # ./script.sh

where script.sh looks like:
#!/bin/bash
mpirun -np 2 ./mpiprogram

Whenever I invoke script.sh normally (as ./script.sh for instance) it
works fine, but if I do mpirun -np 2 ./script.sh I get the following
error:

[ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file rml_oob_send.c at line 105
[ppv.stanford.edu:08814] [[27860,1],0] could not get route to
[[INVALID],INVALID]
[ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file base/plm_base_proxy.c at line 86
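
One way to compare the two invocations is to look at what the script
inherits from its parent.  The following is only a diagnostic sketch, under
the assumption that the outer mpirun hands settings to the processes it
launches through environment variables whose names start with OMPI_.  The
helper names list_ompi_env and count_ompi_env are made up for this example
and are not part of Open MPI.  Nothing in the logs above confirms that the
environment is the cause; this only shows whether the two cases differ.

```shell
#!/bin/bash
# Diagnostic sketch: report which OMPI_* variables this script inherited.
# list_ompi_env and count_ompi_env are hypothetical helpers, not Open MPI tools.

# Print the names (not the values) of all environment variables that
# start with OMPI_, one per line, sorted.
list_ompi_env() {
    env | grep '^OMPI_' | cut -d= -f1 | sort
}

# Print how many such variables are set.
count_ompi_env() {
    list_ompi_env | wc -l | tr -d ' '
}

echo "OMPI_* variables visible to this script: $(count_ompi_env)"
list_ompi_env
```

Running this once directly and once under mpirun -np 1, then comparing the
two listings, would show what the inner mpirun in script.sh sees in each
case.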

I have also tried running with mpirun -d to get some debugging info
and it appears that the proctable is not being created for the second
mpirun.  The command hangs like so:

[ppv.stanford.edu:08823] procdir:
/tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/0/0
[ppv.stanford.edu:08823] jobdir:
/tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/0
[ppv.stanford.edu:08823] top: openmpi-sessions-sluke@ppv.stanford.edu_0
[ppv.stanford.edu:08823] tmp: /tmp
[ppv.stanford.edu:08823] [[27855,0],0] node[0].name ppv daemon 0 arch ffc91200
[ppv.stanford.edu:08823] Info: Setting up debugger process table for
applications
  MPIR_being_debugged = 0
  MPIR_debug_state = 1
  MPIR_partial_attach_ok = 1
  MPIR_i_am_starter = 0
  MPIR_proctable_size = 1
  MPIR_proctable:
    (i, host, exe, pid) = (0, ppv.stanford.edu,
/home/sluke/maintenance/openmpi-1.3.3/examples/./shell.sh, 8824)
[ppv.stanford.edu:08825] procdir:
/tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/1/0
[ppv.stanford.edu:08825] jobdir:
/tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/1
[ppv.stanford.edu:08825] top: openmpi-sessions-sluke@ppv.stanford.edu_0
[ppv.stanford.edu:08825] tmp: /tmp
[ppv.stanford.edu:08825] [[27855,1],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file rml_oob_send.c at line 105
[ppv.stanford.edu:08825] [[27855,1],0] could not get route to
[[INVALID],INVALID]
[ppv.stanford.edu:08825] [[27855,1],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file base/plm_base_proxy.c at line 86
[ppv.stanford.edu:08825] Info: Setting up debugger process table for
applications
  MPIR_being_debugged = 0
  MPIR_debug_state = 1
  MPIR_partial_attach_ok = 1
  MPIR_i_am_starter = 0
  MPIR_proctable_size = 0
  MPIR_proctable:


It does not matter which MPI program the script ultimately tries to
run; the shell script fails in the same way regardless (I have tried
the hello_f90 executable from the Open MPI examples directory).  Here
are some details of my setup:

I have built Open MPI 1.3.3 with the Intel Fortran and C compilers
(version 11.1).  The machine uses Rocks with the SGE scheduler, so I
ran configure as ./configure --prefix=/home/sluke --with-sge;
however, the problem persists even when I run on the head node
outside of the scheduler.  I am attaching the resulting config.log to
this email, as well as the output of ompi_info --all and ifconfig.  I
hope this gives the experts on the list enough to go on, but I will be
happy to provide any more information that might be helpful.

Luke Shulenburger
Geophysical Laboratory
Carnegie Institution of Washington


P.S. I have tried this on a machine with openmpi-1.2.6 and cannot
reproduce the error; however, on a second machine with openmpi-1.3.2 I
see the same problem.

Attachment: config.log.gz
Description: GNU Zip compressed data

Attachment: ifconfigout
Description: Binary data

Attachment: ompi_info
Description: Binary data
