Dear all,

We are migrating a code that uses OpenMPI from Ubuntu 10.04 to Ubuntu 12.04, and
we are encountering some problems.

Below is a test code that works on Ubuntu 10.04 but sometimes fails on Ubuntu 12.04.

The question is: is there a bug in the test code, or is it due to a bug in
OpenMPI?

Thanks for any help,
David

==============================================================================
OpenMPI versions
==============================================================================

We use the default OpenMPI package shipped with each version of Ubuntu:

$ apt-cache policy openmpi-bin # On Ubuntu 10.04
openmpi-bin:
  Installed: 1.4.1-2
  Candidate: 1.4.1-2
  Version table:
 *** 1.4.1-2 0
        500 http://ubuntu.lucid.miroir.rocq.inria.fr/ lucid/universe Packages
        100 /var/lib/dpkg/status

$ apt-cache policy openmpi-bin # On Ubuntu 12.04
openmpi-bin:
  Installed: 1.4.3-2.1ubuntu3
  Candidate: 1.4.3-2.1ubuntu3
  Version table:
 *** 1.4.3-2.1ubuntu3 0
        500 http://ubuntu.precise.miroir.rocq.inria.fr/ precise/universe amd64 Packages
        100 /var/lib/dpkg/status

==============================================================================
Error messages
==============================================================================

The test code given below works on Ubuntu 10.04 but sometimes fails on
12.04, for example with the following output:

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 10 in communicator MPI_COMM_WORLD 
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Error rank 10 tab[0] = 8
Error rank 11 tab[0] = 7
Error rank 12 tab[0] = 6
Error rank 13 tab[0] = 10
Error rank 14 tab[2] = 10
--------------------------------------------------------------------------
mpiexec has exited due to process rank 10 with PID 10284 on
node saphene exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--------------------------------------------------------------------------
[saphene:10273] 4 more processes have sent help message help-mpi-api.txt / mpi-abort
[saphene:10273] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

==============================================================================
Test code
==============================================================================

Here is the code:

#include <iostream>
#include <mpi.h>

using namespace std;

int main(int argc, char** argv)
{
        int ierr;
        ierr = MPI_Init(&argc, &argv);

        if(ierr != MPI_SUCCESS){
                cout << "Error initializing mpi" << endl;
                MPI_Abort(MPI_COMM_WORLD, ierr);
        }

        // get the number of processes
        int numProcess;
        MPI_Comm_size(MPI_COMM_WORLD, &numProcess);

        // get the rank of the process
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        
        for(int it=0; it<20; it++){
                // gather the ranks of all processes into an array
                int *tab = new int[numProcess];
                ierr = MPI_Allgather(&rank, 1, MPI_INT, tab, 1, MPI_INT, MPI_COMM_WORLD);

                if(ierr != MPI_SUCCESS){
                        cout << "Error MPI_Allgather rank:" << rank << endl;
                        MPI_Abort(MPI_COMM_WORLD, ierr);
                }

                // check that tab[i] holds the rank of process i
                for(int i=0; i<numProcess; i++){
                        if(tab[i] != i){
                                cout << "Error rank " << rank << " tab[" << i << "] = " << tab[i] << endl;
                                MPI_Abort(MPI_COMM_WORLD, 1);
                        }
                }
                delete [] tab;
        }

        MPI_Finalize();
        cout << "Exit normally" << endl;
        return 0;
}
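
Since the test aborts at the first mismatch, each failing rank reports only one
entry of tab. A possible diagnostic aid, sketched below, is to format the whole
gathered array so that it can be printed before calling MPI_Abort. The helper
names describe_gather and count_mismatches are hypothetical and not part of the
test above; the sketch does not call MPI and assumes the gathered values have
been copied into a std::vector<int>.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not in the original test): counts the entries of the
// gathered array whose value differs from their index.
int count_mismatches(const std::vector<int>& tab)
{
        int n = 0;
        for (std::size_t i = 0; i < tab.size(); ++i) {
                if (tab[i] != static_cast<int>(i)) {
                        ++n;
                }
        }
        return n;
}

// Hypothetical helper (not in the original test): formats the whole gathered
// array on one line and appends '!' to every entry that differs from its index.
std::string describe_gather(int rank, const std::vector<int>& tab)
{
        std::ostringstream out;
        out << "rank " << rank << ":";
        for (std::size_t i = 0; i < tab.size(); ++i) {
                out << ' ' << tab[i];
                if (tab[i] != static_cast<int>(i)) {
                        out << '!';
                }
        }
        return out.str();
}
```

In the test loop, one could build std::vector<int>(tab, tab + numProcess) after
MPI_Allgather and print describe_gather(rank, ...) whenever count_mismatches
returns a non-zero value, which would show whether the array is permuted or
only partially wrong.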
