Dear all,

I have not used Open MPI much before, but I am maintaining a large legacy
application. We noticed a bug involving a call to MPI_Allgather, as
summarised in this Stack Overflow post:
http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results

In the process of looking further into the problem, I noticed that the
following function results in strange behaviour.

void test_all_gather() {

    struct _TEST_ALL_GATHER {
        int node;
    };

    int ierr, size, rank;
    ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    struct _TEST_ALL_GATHER local;
    struct _TEST_ALL_GATHER *gathered;

    gathered = (struct _TEST_ALL_GATHER*) malloc(size * sizeof(*gathered));

    local.node = rank;

    MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
        gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
        MPI_COMM_WORLD);

    int i;
    for (i = 0; i < size; ++i) {
        (void) printf("gathered[%d].node = %d\n", i, gathered[i].node);
    }

    free(gathered);
}

At one point, this function printed the following, whereas I would expect
gathered[i].node to equal i for every i:
gathered[0].node = 2
gathered[1].node = 3
gathered[2].node = 2
gathered[3].node = 3
gathered[4].node = 4
gathered[5].node = 5
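
Since each process sets local.node to its own rank, entry i of the gathered
array should hold the value i. A minimal sketch of that check in plain C is
below; first_mismatch is a hypothetical helper name of my own, not part of
MPI, and it assumes the node values have been copied into a plain int array.

```c
/* Hypothetical helper, not part of MPI: returns the index of the first
 * entry whose value differs from its own index, or -1 if every entry
 * matches. After a correct MPI_Allgather of ranks, nodes[i] == i. */
int first_mismatch(const int *nodes, int count)
{
    int i;
    for (i = 0; i < count; ++i) {
        if (nodes[i] != i) {
            return i;   /* first entry that does not hold its own rank */
        }
    }
    return -1;          /* all entries are consistent */
}
```

Applied to the six values printed above (2, 3, 2, 3, 4, 5), this reports a
mismatch at index 0; entries 2 to 5 hold the expected values, so only the
first two slots look wrong.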

Can anyone suggest a place to start looking into why this might be
happening? There is a section of the code that calls MPI_Comm_split, but I
am not sure if that is related...

I am running on Ubuntu 11.10. Here is a summary of ompi_info:
Package: Open MPI buildd@allspice Distribution
Open MPI: 1.4.3
Open MPI SVN revision: r23834
Open MPI release date: Oct 05, 2010
Open RTE: 1.4.3
Open RTE SVN revision: r23834
Open RTE release date: Oct 05, 2010
OPAL: 1.4.3
OPAL SVN revision: r23834
OPAL release date: Oct 05, 2010
Ident string: 1.4.3
Prefix: /usr
Configured architecture: x86_64-pc-linux-gnu
Configure host: allspice
Configured by: buildd

Thanks!
Brett
