I was initially using 1.1.2 and moved to 1.2b2 because of a hang in MPI_Bcast() which 1.2b2 reportedly fixes, and it seems to have done so. My compute nodes are two dual-core Xeons on Myrinet with MX. The problem is getting Open MPI running over MX only. My machine file is as follows:
node-1 slots=4 max-slots=4
node-2 slots=4 max-slots=4
node-3 slots=4 max-slots=4

Invoking mpirun with the minimum number of processes needed to reproduce the error:

mpirun --prefix /usr/local/openmpi-1.2b2 -x LD_LIBRARY_PATH --hostfile ./h1-3 -np 2 --mca btl mx,self ./cpi

results in the following output:

:~/Projects/ompi/cpi$ mpirun --prefix /usr/local/openmpi-1.2b2 -x LD_LIBRARY_PATH --hostfile ./h1-3 -np 2 --mca btl mx,self ./cpi
--------------------------------------------------------------------------
Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.
There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
mpirun noticed that job rank 1 with PID 0 on node node-1 exited on signal 1.
---------------- end of output -----------------------

I get that same error with the examples included in the ompi-1.2b2 distribution. However, if I change the MCA parameters as follows:

mpirun --prefix /usr/local/openmpi-1.2b2 -x LD_LIBRARY_PATH --hostfile ./h1-3 -np 5 --mca pml cm ./cpi

then running with up to -np 5 works (one of the processes does get placed on the second node), but running with -np 6 fails with the following:

[node-2:10464] mx_connect fail for node-2:0 with key aaaaffff (error Endpoint closed or not connectable!)
[node-2:10463] mx_connect fail for node-2:0 with key aaaaffff (error Endpoint closed or not connectable!)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems.
This failure appears to be an internal failure; here's some additional
information (which may only be relevant to an Open MPI developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
mpirun noticed that job rank 0 with PID 0 on node node-1 exited on signal 1.
3 additional processes aborted (not shown)
----------------- end of mpirun output ---------------------

I don't believe there's anything wrong with the hardware, since I can ping over MX between the failing node and the master just fine. I also tried a different set of three nodes and got the same error; it always fails on the second node of whichever group of nodes I choose.
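For completeness, the component selections I've been passing on the command line can also be pinned in a per-user MCA parameters file, which rules out shell/quoting issues. This is only a sketch of the equivalent settings (the file path is the standard per-user location, not anything specific to my setup):

```
# ~/.openmpi/mca-params.conf -- per-user Open MPI MCA defaults (sketch)
# Equivalent to: mpirun --mca btl mx,self ...
btl = mx,self

# The alternative selection that partially works for me
# (equivalent to: mpirun --mca pml cm ...) would instead be:
# pml = cm
```

With this file in place, plain `mpirun -np 2 ./cpi` should pick up the same settings as my failing command line.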