Since your code prints OK without verifying the correctness of the
result, you are only verifying that Open MPI does not segfault, which
is necessary but not sufficient for correct execution.
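
For what it's worth, a check along the following lines (an untested
sketch that reuses the variables and the sbuf[i] = myid + i
initialization from your test program, quoted below) would catch
silent corruption in the gathered data:

    /* untested sketch: verify the gathered data instead of just printing OK.
       With sbuf[i] = myid + i on every rank, rank r's element i should be r + i. */
    long errors = 0;
    for (i = 0; i < nproc; ++i)
        for (j = 0; j < bufsize; ++j)
            if (rbuf[displ[i] + j] != (int)(i + j))
                ++errors;
    printf("rank %d: %ld mismatches\n", myid, errors);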

It is not uncommon for MPI implementations to have issues near
count=2^31.  I can't speak to how rigorously correct Open MPI is in
this respect.  I've yet to find an implementation that is end-to-end
count-safe, meaning one that supports even zettabyte buffers via MPI
datatypes for collectives, point-to-point, RMA and I/O.

The easy solution in your case is to chop the MPI_Allgatherv into
multiple smaller calls.  When the array of send counts is nearly
uniform, you can instead do N MPI_Allgather calls and one
MPI_Allgatherv, which might help performance in some cases.
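
Here is an untested sketch of the chunked approach; the helper name
chunked_allgather_int and the 16M-element chunk size are placeholders
of mine, not anything provided by Open MPI.  Dropped into your test
file (which already includes stdlib.h and mpi.h), it replaces the
single huge call with a loop of smaller MPI_Allgatherv calls whose
displacements are shifted by the chunk offset:

    /* untested sketch: gather "count" ints from every rank (uniform
       counts, as in your test) in chunks small enough to keep the
       per-call byte count well below 2^31 */
    static void chunked_allgather_int(int *sendbuf, int *recvbuf, long count,
                                      int nproc, MPI_Comm comm)
    {
        const long chunk = 1L << 24;               /* 16M ints per rank per call */
        int *cnts  = malloc(nproc * sizeof(int));
        int *disps = malloc(nproc * sizeof(int));
        long off;
        for (off = 0; off < count; off += chunk) {
            int c = (int)((count - off < chunk) ? (count - off) : chunk);
            int r;
            for (r = 0; r < nproc; ++r) {
                cnts[r]  = c;                       /* uniform chunk count      */
                disps[r] = (int)(r * count + off);  /* rank r's region + offset */
            }
            MPI_Allgatherv(sendbuf + off, c, MPI_INT,
                           recvbuf, cnts, disps, MPI_INT, comm);
        }
        free(cnts);
        free(disps);
    }

In your test this would replace the single call with
chunked_allgather_int(sbuf, rbuf, bufsize, nproc, MPI_COMM_WORLD).
Note that the displacements are still ints, so the total receive
buffer is still limited to 2^31 - 1 elements; beyond that you need
derived datatypes.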

Since most MPI implementations use Send/Recv under the hood for
collectives, you can aid the debugging of this issue by testing
MPI_Send/MPI_Recv with counts approaching 2^31.
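
A minimal standalone probe along these lines (again an untested
sketch; the command-line count is just a convenience) would tell you
whether plain point-to-point breaks at the same message size:

    /* untested sketch: rank 0 sends "count" ints to rank 1, which
       verifies them; run with at least 2 ranks and step the count up
       toward 2^31 - 1, e.g. mpirun -np 2 ./a.out 536870912 */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        long count = (argc > 1) ? atol(argv[1]) : (1L << 29);
        long i, errs = 0;
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int *buf = malloc(count * sizeof(int));
        if (buf == NULL) {
            printf("rank %d: failed to allocate %ld ints\n", rank, count);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        if (rank == 0) {
            for (i = 0; i < count; ++i) buf[i] = (int)i;
            MPI_Send(buf, (int)count, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(buf, (int)count, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            for (i = 0; i < count; ++i)
                if (buf[i] != (int)i) ++errs;
            printf("count = %ld, mismatches = %ld\n", count, errs);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }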

Best,

Jeff

On Mon, Aug 5, 2013 at 6:48 PM, ryan He <ryan.qing...@gmail.com> wrote:
> Dear All,
>
> I wrote a simple test code that uses the MPI_Allgatherv function. The problem
> appears when the send buffer size becomes relatively big.
>
> When Bufsize = 2^28 - 1, run on 4 processors. OK
> When Bufsize = 2^28, run on 4 processors. Error
> [btl_tcp_frag.c:209:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv
> error (0xffffffff85f526f8, 2147483592) Bad address(1)
>
> When Bufsize = 2^29 - 1, run on 2 processors. OK
> When Bufsize = 2^29, run on 2 processors. Error
> [btl_tcp_frag.c:209:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv
> error (0xffffffff964605d0, 2147483632) Bad address(1)
>
> Bufsize is not that close to the int limit, but the readv in
> mca_btl_tcp_frag_recv has a size close to 2147483647. Does anyone have an idea
> why this error occurs? Any suggestions on how to solve or avoid this problem?
>
> The simple test code is attached below:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <time.h>
> #include "mpi.h"
>
> int main(int argc, char **argv)
> {
>     int myid, nproc;
>     long i, j;
>     long size;
>     long bufsize;
>     int *rbuf;
>     int *sbuf;
>     char hostname[MPI_MAX_PROCESSOR_NAME];
>     int len;
>
>     size = (long) 2*1024*1024*1024 - 1;   /* 2^31 - 1 elements */
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &myid);
>     MPI_Comm_size(MPI_COMM_WORLD, &nproc);
>     MPI_Get_processor_name(hostname, &len);
>     printf("I am process %d with pid: %d at %s\n", myid, getpid(), hostname);
>     sleep(2);
>
>     if (myid == 0)
>         printf("size : %ld\n", size);
>
>     /* note: sizeof(int), not sizeof(MPI_INT): MPI_INT is an MPI datatype handle */
>     sbuf = (int *) calloc(size, sizeof(int));
>     if (sbuf == NULL) {
>         printf("fail to allocate memory of sbuf\n");
>         exit(1);
>     }
>     rbuf = (int *) calloc(size, sizeof(int));
>     if (rbuf == NULL) {
>         printf("fail to allocate memory of rbuf\n");
>         exit(1);
>     }
>
>     int *recvCount = calloc(nproc, sizeof(int));
>     int *displ = calloc(nproc, sizeof(int));
>
>     bufsize = 268435456;   /* 2^28 */
>     for (i = 0; i < nproc; ++i) {
>         recvCount[i] = bufsize;     /* every rank contributes bufsize ints */
>         displ[i] = bufsize*i;
>     }
>
>     for (i = 0; i < bufsize; ++i)
>         sbuf[i] = myid + i;
>
>     printf("buffer size: %ld recvCount[0]:%d last displ index:%d\n",
>            bufsize, recvCount[0], displ[nproc-1]);
>     fflush(stdout);
>
>     MPI_Allgatherv(sbuf, recvCount[0], MPI_INT, rbuf, recvCount, displ, MPI_INT,
>                    MPI_COMM_WORLD);
>
>     printf("OK\n");
>     fflush(stdout);
>
>     MPI_Finalize();
>     return 0;
> }
>
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
Jeff Hammond
jeff.scie...@gmail.com
