I was initially using 1.1.2 and moved to 1.2b2 because of a hang in
MPI_Bcast() that 1.2b2 claims to fix, and it does appear to be fixed. Each
of my compute nodes has two dual-core Xeons on Myrinet with MX. The problem
is getting Open MPI to run over MX only. My machine file is as follows ...

node-1 slots=4 max-slots=4
node-2 slots=4 max-slots=4
node-3 slots=4 max-slots=4
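
As a sanity check, I'm assuming that grepping ompi_info from the same 1.2b2
prefix is enough to confirm whether the mx BTL and MTL components were
actually built into this install ...

        /usr/local/openmpi-1.2b2/bin/ompi_info | grep mx

I'd expect that to show both an "MCA btl: mx" and an "MCA mtl: mx" line if
the components are there.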

Running mpirun with the minimum number of processes that still reproduces
the error ...
        mpirun --prefix /usr/local/openmpi-1.2b2 -x LD_LIBRARY_PATH
--hostfile ./h1-3 -np 2 --mca btl mx,self ./cpi

This results in the following output ...

:~/Projects/ompi/cpi$ mpirun --prefix /usr/local/openmpi-1.2b2 -x
LD_LIBRARY_PATH --hostfile ./h1-3 -np 2 --mca btl mx,self ./cpi

--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of 
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of 
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
mpirun noticed that job rank 1 with PID 0 on node node-1 exited on signal 1.

---------------- end of output -----------------------
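
If more detail would help, I can rerun the failing case with extra BTL
verbosity; I'm assuming btl_base_verbose is the right knob here, e.g. ...

        mpirun --prefix /usr/local/openmpi-1.2b2 -x LD_LIBRARY_PATH --hostfile ./h1-3 -np 2 --mca btl mx,self --mca btl_base_verbose 100 ./cpi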

I get the same error with the examples included in the ompi-1.2b2
distribution. However, if I change the MCA parameters as follows ...

        mpirun --prefix /usr/local/openmpi-1.2b2 -x LD_LIBRARY_PATH
--hostfile ./h1-3 -np 5 --mca pml cm ./cpi
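
(My understanding is that selecting the cm PML makes Open MPI use the MX MTL
rather than the mx BTL; spelling that out explicitly would presumably look
like the line below, though I haven't verified that it behaves any
differently.)

        mpirun --prefix /usr/local/openmpi-1.2b2 -x LD_LIBRARY_PATH --hostfile ./h1-3 -np 5 --mca pml cm --mca mtl mx ./cpi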

Running up to -np 5 works (one of the processes does get put on the 2nd
node), but running with -np 6 fails with the following ...

[node-2:10464] mx_connect fail for node-2:0 with key aaaaffff (error
Endpoint closed or not connectable!)
[node-2:10463] mx_connect fail for node-2:0 with key aaaaffff (error
Endpoint closed or not connectable!)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
mpirun noticed that job rank 0 with PID 0 on node node-1 exited on signal 1.
3 additional processes aborted (not shown)

----------------- end of mpirun output ---------------------

I don't believe there's anything wrong with the hardware, since I can ping
over MX between the failing node and the master without problems. I also
tried a different set of 3 nodes and got the same error; it always fails on
the second node of whichever group of nodes I choose.
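
In case it's useful, I can post mx_info output from the failing node; as far
as I understand, mx_info also reports how many endpoints the MX driver is
configured for, which might matter when several ranks land on the same node
and each one opens its own endpoint (that last part is my assumption, not
something I've confirmed) ...

        mx_info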
