Dear all,

I have not used Open MPI much before, but I am maintaining a large legacy application. We noticed a bug involving a call to MPI_Allgather, as summarised in this Stack Overflow post: http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results

In the process of looking further into the problem, I noticed that the following function results in strange behaviour:

    void test_all_gather() {

        struct _TEST_ALL_GATHER {
            int node;
        };

        int ierr, size, rank;
        ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
        ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        struct _TEST_ALL_GATHER local;
        struct _TEST_ALL_GATHER *gathered;

        gathered = (struct _TEST_ALL_GATHER*) malloc(size * sizeof(*gathered));

        local.node = rank;

        MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
                      gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
                      MPI_COMM_WORLD);

        int i;
        for (i = 0; i < numnodes; ++i) {
            (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
        }

        FREE(gathered);
    }

At one point, this function printed the following:

    gathered[0].node = 2
    gathered[1].node = 3
    gathered[2].node = 2
    gathered[3].node = 3
    gathered[4].node = 4
    gathered[5].node = 5

Can anyone suggest a place to start looking into why this might be happening? There is a section of the code that calls MPI_Comm_split, but I am not sure if that is related...

Running on Ubuntu 11.10 and a summary of ompi_info:

    Package:                 Open MPI buildd@allspice Distribution
    Open MPI:                1.4.3
    Open MPI SVN revision:   r23834
    Open MPI release date:   Oct 05, 2010
    Open RTE:                1.4.3
    Open RTE SVN revision:   r23834
    Open RTE release date:   Oct 05, 2010
    OPAL:                    1.4.3
    OPAL SVN revision:       r23834
    OPAL release date:       Oct 05, 2010
    Ident string:            1.4.3
    Prefix:                  /usr
    Configured architecture: x86_64-pc-linux-gnu
    Configure host:          allspice
    Configured by:           buildd

Thanks!
Brett
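
For anyone trying to reproduce this: because every rank stores its own MPI_COMM_WORLD rank in local.node, and MPI_Allgather places the contribution of rank i in block i of the receive buffer, a correct result should satisfy gathered[i].node == i on every rank. One way to narrow the problem down might be to check that property in code instead of reading the printed output. The sketch below is plain C with no MPI calls, so it compiles on its own; first_mismatch and count_mismatches are hypothetical helper names, not part of MPI or of the application, and the struct is renamed only because identifiers that begin with an underscore and a capital letter are reserved in C.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the struct in the posted function (renamed, see above). */
struct test_all_gather {
    int node;
};

/*
 * Hypothetical helper: returns the index of the first entry whose node
 * field differs from its position in the buffer, or -1 if every entry i
 * holds the value i.
 */
int first_mismatch(const struct test_all_gather *gathered, int size)
{
    assert(gathered != NULL || size == 0);
    for (int i = 0; i < size; ++i) {
        if (gathered[i].node != i) {
            return i;
        }
    }
    return -1;
}

/*
 * Hypothetical helper: counts how many entries do not hold their own
 * index, which shows how much of the receive buffer is affected.
 */
int count_mismatches(const struct test_all_gather *gathered, int size)
{
    int bad = 0;
    assert(gathered != NULL || size == 0);
    for (int i = 0; i < size; ++i) {
        if (gathered[i].node != i) {
            ++bad;
        }
    }
    return bad;
}
```

In the posted function, these could be called right after MPI_Allgather with the value obtained from MPI_Comm_size, printing the calling rank whenever first_mismatch returns something other than -1. Applied to the six values printed above, entries 0 and 1 are the ones that fail the check.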