Dear all,

We are migrating a code that uses OpenMPI from Ubuntu 10.04 to Ubuntu 12.04, and we have encountered some problems.
Below is a test code that works on Ubuntu 10.04 but fails on Ubuntu 12.04.

The question is: is there a bug in the test code, or is the failure due to a bug in OpenMPI?

Thanks for any help,

David

==============================================================================
OpenMPI versions
==============================================================================

We use the default OpenMPI version on each version of Ubuntu:

$ apt-cache policy openmpi-bin   # On Ubuntu 10.04
openmpi-bin:
  Installed: 1.4.1-2
  Candidate: 1.4.1-2
  Version table:
 *** 1.4.1-2 0
        500 http://ubuntu.lucid.miroir.rocq.inria.fr/ lucid/universe Packages
        100 /var/lib/dpkg/status

$ apt-cache policy openmpi-bin   # On Ubuntu 12.04
openmpi-bin:
  Installed: 1.4.3-2.1ubuntu3
  Candidate: 1.4.3-2.1ubuntu3
  Version table:
 *** 1.4.3-2.1ubuntu3 0
        500 http://ubuntu.precise.miroir.rocq.inria.fr/ precise/universe amd64 Packages
        100 /var/lib/dpkg/status

==============================================================================
Error messages
==============================================================================

The test code given below works on Ubuntu 10.04 but sometimes fails on 12.04, for example with the following output:

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 10 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Error rank 10 tab[0] = 8
Error rank 11 tab[0] = 7
Error rank 12 tab[0] = 6
Error rank 13 tab[0] = 10
Error rank 14 tab[2] = 10
--------------------------------------------------------------------------
mpiexec has exited due to process rank 10 with PID 10284 on
node saphene exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--------------------------------------------------------------------------
[saphene:10273] 4 more processes have sent help message help-mpi-api.txt / mpi-abort
[saphene:10273] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

==============================================================================
Test code
==============================================================================

Here is the code:

#include <iostream>
#include <mpi.h>

using namespace std;

int main(int argc, char** argv)
{
    int ierr;

    ierr = MPI_Init(&argc, &argv);
    if(ierr != MPI_SUCCESS){
        cout << "Error initializing mpi" << endl;
        MPI_Abort(MPI_COMM_WORLD, ierr);
    }

    // get the number of processes
    int numProcess;
    MPI_Comm_size(MPI_COMM_WORLD, &numProcess);

    // get the rank of this process
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for(int it=0; it<20; it++){
        // gather all ranks in an array
        int *tab = new int[numProcess];
        ierr = MPI_Allgather(&rank, 1, MPI_INT, tab, 1, MPI_INT, MPI_COMM_WORLD);
        if(ierr != MPI_SUCCESS){
            cout << "Error MPI_Allgather rank:" << rank << endl;
            MPI_Abort(MPI_COMM_WORLD, ierr);
        }

        // check that everything is ok: entry i must hold the rank of process i
        for(int i=0; i<numProcess; i++){
            if(tab[i] != i){
                cout << "Error rank " << rank << " tab[" << i << "] = " << tab[i] << endl;
                MPI_Abort(MPI_COMM_WORLD, 1);
            }
        }

        delete [] tab;
    }

    MPI_Finalize();
    cout << "Exit normally" << endl;
    return 0;
}
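
To make explicit what the test expects, the property it checks can also be stated without MPI. The following is only a serial sketch of that property, not a fix or a diagnosis: when every rank passes its own rank to MPI_Allgather, entry i of the receive buffer should hold i on every process. The helper names expected_allgather and first_mismatch are made up for this illustration and are not part of OpenMPI.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an MPI call): a serial model of the buffer that
// every process should hold after MPI_Allgather when each rank sends its
// own rank. Entry i holds the value contributed by rank i.
std::vector<int> expected_allgather(int numProcess)
{
    std::vector<int> tab(numProcess);
    for (int i = 0; i < numProcess; i++) {
        tab[i] = i;
    }
    return tab;
}

// Hypothetical helper: the same check as the inner loop of the test code.
// Returns the index of the first entry with tab[i] != i, or -1 if every
// entry matches.
int first_mismatch(const std::vector<int>& tab)
{
    for (std::size_t i = 0; i < tab.size(); i++) {
        if (tab[i] != static_cast<int>(i)) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

In the failing run shown above, rank 10 reported tab[0] = 8; applied to such a buffer, first_mismatch would return 0, whereas on the buffer built by expected_allgather it returns -1.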