Hi Xianjun

Suggestions/Questions:

1) Did you check if malloc returns a non-NULL pointer?
Your program assumes it does, but that may not be true,
and in that case the problem is not with MPI.
You can print a message and call MPI_Abort if it doesn't.

2) Have you tried MPI_Isend/MPI_Irecv?
Or perhaps the buffered cousin MPI_Ibsend?

3) Why do you want to send such huge messages in a single call?
Wouldn't it be less trouble to send several smaller
messages instead (see the sketch below)?
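
Something along these lines would cover points 1) and 3). It is only a
rough sketch, untested on your setup; the 1 MiB CHUNK size is an
arbitrary choice of mine, and it keeps your blocking MPI_Send/MPI_Recv
pair:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define CHUNK (1024 * 1024)  /* 1 MiB per message; arbitrary choice */

int main(int argc, char **argv)
{
    int rank;
    size_t Gsize = (size_t)2 * 1024 * 1024 * 1024;  /* 2 GiB */
    size_t offset;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *g = (char *)malloc(Gsize);
    if (g == NULL) {  /* point 1): check what malloc returned */
        fprintf(stderr, "rank %d: malloc of %zu bytes failed\n", rank, Gsize);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* point 3): move the 2 GiB as many small messages, not one huge one */
    for (offset = 0; offset < Gsize; offset += CHUNK) {
        int count = (int)((Gsize - offset < CHUNK) ? (Gsize - offset) : CHUNK);
        if (rank == 0)
            MPI_Send(g + offset, count, MPI_BYTE, 1, 1, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(g + offset, count, MPI_BYTE, 0, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    free(g);
    MPI_Finalize();
    return 0;
}

For point 2), the same loop works with MPI_Isend/MPI_Irecv plus an
MPI_Waitall on the requests, if you want several chunks in flight at once.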

I hope it helps,
Gus Correa

Xianjun wrote:

Hi

Are you running on two processes (mpiexec -n 2)?
Yes

Have you tried to print Gsize?
Yes. I have checked my code several times, and I think the error comes from Open MPI. :)

The command line I used:
"mpirun -hostfile ./Serverlist -np 2 ./test". The "Serverlist" file includes several computers in my network.

The command line I used to build openmpi-1.4.1:
./configure --enable-debug --prefix=/usr/work/openmpi ; make all install;

What interconnect do you use?
It is a normal TCP/IP interconnect with 1 Gb network cards. When I debugged my code (and the Open MPI code), I found that Open MPI does call the "mca_pml_ob1_send_request_start_rdma(...)" function, but I am not quite sure which protocol is used when transferring the 2 GB of data. Do you have any opinions? Thanks

Best Regards
Xianjun Meng

2010/12/7 Gus Correa <g...@ldeo.columbia.edu>

    Hi Xianjun

    Are you running on two processes (mpiexec -n 2)?
    I think this code will deadlock for more than two processes.
    The MPI_Recv won't have a matching send for rank>1.

    Also, this is a C issue, not an MPI one, but you may be
    wrapping around into negative numbers.
    Have you tried to print Gsize?
    It is probably -2147483648 on both 32-bit and 64-bit machines.
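
    (A tiny illustration of that kind of wrap-around, assuming a 32-bit
    int, which is what you get on x86-64 Linux; the overflowing line is
    technically undefined behavior, but it typically wraps like this:)

    #include <stdio.h>

    int main(void)
    {
        int    bad  = 2 * 1024 * 1024 * 1024;           /* int arithmetic: overflows,
                                                           typically -2147483648 */
        size_t good = (size_t)2 * 1024 * 1024 * 1024;   /* size_t arithmetic: 2147483648 */

        printf("bad  = %d\n", bad);
        printf("good = %zu\n", good);
        return 0;
    }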

    My two cents.
    Gus Correa

    Mike Dubman wrote:

        Hi,
        What interconnect and command line do you use? For the InfiniBand
        openib component there is a known issue with large transfers (2 GB):

        https://svn.open-mpi.org/trac/ompi/ticket/2623

        try disabling memory pinning:
        http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned


        regards
        M


        2010/12/6 <xjun.m...@gmail.com>


           hi,

           On my computers (x86-64), sizeof(int) = 4, but
           sizeof(long) = sizeof(double) = sizeof(size_t) = 8. When I checked
           my mpi.h file, the definitions related to sizeof(int) looked
           correct, and I think mpi.h was generated for my build environment
           when I compiled Open MPI. Even so, my code still doesn't work. :(

           Furthermore, I found that the collective routines (such as
           MPI_Allgatherv(...)), which are implemented on top of the
           point-to-point calls, don't work either when the data exceeds 2 GB.

           Thanks
           Xianjun

           2010/12/6 Tim Prince <n...@aol.com>


               On 12/5/2010 7:13 PM, Xianjun wrote:

                   hi,

                    I ran into a problem recently when I was testing the
                    MPI_Send and MPI_Recv functions. When I run the following
                    code, the processes hang and I see no data transmission
                    on my network at all.

                    BTW: I ran this test on two x86-64 computers with 16 GB
                    of memory each, running Linux.

                    #include <stdio.h>
                    #include <mpi.h>
                    #include <stdlib.h>
                    #include <unistd.h>

                    int main(int argc, char** argv)
                    {
                        int localID;
                        int numOfPros;
                        size_t Gsize = (size_t)2 * 1024 * 1024 * 1024;

                        char* g = (char*)malloc(Gsize);

                        MPI_Init(&argc, &argv);
                        MPI_Comm_size(MPI_COMM_WORLD, &numOfPros);
                        MPI_Comm_rank(MPI_COMM_WORLD, &localID);

                        MPI_Datatype MPI_Type_lkchar;
                        MPI_Type_contiguous(2048, MPI_BYTE, &MPI_Type_lkchar);
                        MPI_Type_commit(&MPI_Type_lkchar);

                        if (localID == 0)
                        {
                            MPI_Send(g, 1024*1024, MPI_Type_lkchar, 1, 1,
                                     MPI_COMM_WORLD);
                        }

                        if (localID != 0)
                        {
                            MPI_Status status;
                            MPI_Recv(g, 1024*1024, MPI_Type_lkchar, 0, 1,
                                     MPI_COMM_WORLD, &status);
                        }

                        MPI_Finalize();

                        return 0;
                    }

                You supplied all your constants as 32-bit signed data, so
                even if the count argument of MPI_Send() and MPI_Recv() were
                a larger data type, you would still see this limit. Did you
                look at your <mpi.h>?

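                (Just to spell out the arithmetic behind that limit: the
                message is 1024*1024 elements of a 2048-byte type, i.e.
                exactly 2^31 bytes, one byte past INT_MAX. Whether some byte
                count inside the library really wraps is only my guess, but
                the numbers look like this:)

                #include <stdio.h>
                #include <limits.h>

                int main(void)
                {
                    long long total = 1024LL * 1024 * 2048;  /* message size in bytes */

                    printf("total   = %lld\n", total);     /* 2147483648 = 2^31 */
                    printf("INT_MAX = %d\n", INT_MAX);     /* 2147483647 */
                    printf("as int  = %d\n", (int)total);  /* implementation-defined,
                                                              typically -2147483648 */
                    return 0;
                }
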
                -- 
                Tim Prince
