Hi

Are you running on two processes (mpiexec -n 2)?
Yes

Have you tried to print Gsize?
Yes, I checked my code several times, and I believe the error comes from
Open MPI. :)
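
For reference, a minimal version of that check (using the same declaration
as in my test program) is below; since sizeof(size_t) == 8 here, it should
print 2147483648, so the size itself does not wrap:

/* Sketch: verify that Gsize does not wrap into the negatives.
 * With the (size_t) cast the arithmetic is done in 64 bits;
 * without the cast, 2*1024*1024*1024 overflows a 32-bit int
 * and typically yields -2147483648. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t Gsize = (size_t)2 * 1024 * 1024 * 1024;
    printf("Gsize = %zu\n", Gsize);
    return 0;
}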

The command line I used:
"mpirun -hostfile ./Serverlist -np 2 ./test". The "Serverlist" file include
several computers in my network.

The command line I used to build openmpi-1.4.1:
./configure --enable-debug --prefix=/usr/work/openmpi ; make all install;

What interconnect do you use?
It is a normal TCP/IP interconnect with 1Gb network cards. When I debugged
my code (and the Open MPI code), I found that Open MPI does call the
"mca_pml_ob1_send_request_start_rdma(...)" function, but I was not quite
sure which protocol is used when transferring the 2GB of data. Do you have
any opinions? Thanks
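
In the meantime, in case the problem is a 32-bit byte count somewhere inside
the library, the workaround I am considering is to split the transfer into
chunks below 2GB, roughly like this (a minimal sketch, not yet tested
against openmpi-1.4.1; the 1GB chunk size and the helper names are mine):

/* Send/receive a large buffer in pieces, so that every individual
 * MPI call uses a count that fits comfortably in a signed 32-bit int. */
#include <mpi.h>
#include <stddef.h>

#define CHUNK (1024 * 1024 * 1024)  /* 1GB per message; any value < 2^31 works */

static void send_large(char *buf, size_t len, int dest, int tag)
{
    size_t off = 0;
    while (off < len) {
        int n = (len - off > CHUNK) ? CHUNK : (int)(len - off);
        MPI_Send(buf + off, n, MPI_BYTE, dest, tag, MPI_COMM_WORLD);
        off += (size_t)n;
    }
}

static void recv_large(char *buf, size_t len, int src, int tag)
{
    size_t off = 0;
    while (off < len) {
        int n = (len - off > CHUNK) ? CHUNK : (int)(len - off);
        MPI_Recv(buf + off, n, MPI_BYTE, src, tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        off += (size_t)n;
    }
}

With these helpers, my test program would call send_large(g, Gsize, 1, 1) on
rank 0 and recv_large(g, Gsize, 0, 1) on rank 1.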

Best Regards
Xianjun Meng

2010/12/7 Gus Correa <g...@ldeo.columbia.edu>

> Hi Xianjun
>
> Are you running on two processes (mpiexec -n 2)?
> I think this code will deadlock for more than two processes.
> The MPI_Recv won't have a matching send for rank>1.
>
> Also, this is C, not MPI,
> but you may be wrapping into the negative numbers.
> Have you tried to print Gsize?
> It is probably -2147483648 on both 32-bit and 64-bit machines.
>
> My two cents.
> Gus Correa
>
> Mike Dubman wrote:
>
>> Hi,
>> What interconnect and command line do you use? For the InfiniBand openib
>> component there is a known issue with large (2GB) transfers:
>>
>> https://svn.open-mpi.org/trac/ompi/ticket/2623
>>
>> try disabling memory pinning:
>> http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned
>>
>>
>> regards
>> M
>>
>>
>> 2010/12/6 <xjun.m...@gmail.com>
>>
>>
>>    hi,
>>
>>    On my computers (x86-64), sizeof(int) = 4, but
>>    sizeof(long) = sizeof(double) = sizeof(size_t) = 8. When I checked my
>>    mpi.h file, I found that the definition related to sizeof(int) is
>>    correct. Meanwhile, I think the mpi.h file was generated to match my
>>    computing environment when I compiled Open MPI. So, my code still
>>    doesn't work. :(
>>
>>    Further, I found that collective routines (such as
>>    MPI_Allgatherv(...)), which are implemented on top of point-to-point
>>    messaging, don't work either when the data exceeds 2GB.
>>
>>    Thanks
>>    Xianjun
>>
>>    2010/12/6 Tim Prince <n...@aol.com>
>>
>>
>>        On 12/5/2010 7:13 PM, Xianjun wrote:
>>
>>            hi,
>>
>>            I met a question recently when I tested the MPI_send and
>>            MPI_Recv
>>            functions. When I run the following codes, the processes
>>            hanged and I
>>            found there was not data transmission in my network at all.
>>
>>            BTW: I finished this test on two X86-64 computers with 16GB
>>            memory and
>>            installed Linux.
>>
>>            #include <stdio.h>
>>            #include <mpi.h>
>>            #include <stdlib.h>
>>            #include <unistd.h>
>>
>>            int main(int argc, char** argv)
>>            {
>>                int localID;
>>                int numOfPros;
>>                size_t Gsize = (size_t)2 * 1024 * 1024 * 1024; /* 2GB */
>>
>>                char* g = (char*)malloc(Gsize);
>>
>>                MPI_Init(&argc, &argv);
>>                MPI_Comm_size(MPI_COMM_WORLD, &numOfPros);
>>                MPI_Comm_rank(MPI_COMM_WORLD, &localID);
>>
>>                /* 2048-byte contiguous type: a count of 1024*1024
>>                   then covers the whole 2GB buffer */
>>                MPI_Datatype MPI_Type_lkchar;
>>                MPI_Type_contiguous(2048, MPI_BYTE, &MPI_Type_lkchar);
>>                MPI_Type_commit(&MPI_Type_lkchar);
>>
>>                if (localID == 0)
>>                {
>>                    MPI_Send(g, 1024*1024, MPI_Type_lkchar, 1, 1,
>>                             MPI_COMM_WORLD);
>>                }
>>
>>                if (localID != 0)
>>                {
>>                    MPI_Status status;
>>                    MPI_Recv(g, 1024*1024, MPI_Type_lkchar, 0, 1,
>>                             MPI_COMM_WORLD, &status);
>>                }
>>
>>                MPI_Finalize();
>>
>>                return 0;
>>            }
>>
>>        You supplied all your constants as 32-bit signed data, so, even
>>        if the count for MPI_Send() and MPI_Recv() were a larger data
>>        type, you would see this limit. Did you look at your <mpi.h> ?
>>
>>        --
>>        Tim Prince
>>