Hi Jeff,

As I said in my last message (see below), the patch (or at least the patch I got) doesn't fix the problem for me. Whether I apply it over Open MPI 1.2.5 or 1.2.6rc2, I still get the same problem: the client aborts with a truncation error message while the server freezes, for example when the server is started with 3 processes and the client with 2.
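For reference, the exchange at the heart of both test programs boils down to the pattern below (a condensed sketch of the aclient.c/aserver.c code quoted further down; the helper function and its name are mine, and error checking is omitted):

/* Condensed sketch of the aclient.c/aserver.c programs quoted below:
 * an MPI_Allgather over an inter-communicator obtained from
 * MPI_Comm_connect() on the client or MPI_Comm_accept() on the server. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

static void gather_remote_ranks(MPI_Comm intercomm)   /* helper name is mine */
{
   int comm_rank, remote_size, ii;
   int *rem_rank_tbl;

   MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
   MPI_Comm_remote_size(intercomm, &remote_size);

   /* The receive buffer only needs room for the *remote* group. */
   rem_rank_tbl = malloc(remote_size * sizeof(*rem_rank_tbl));

   /* sendcount = recvcount = 1: each process contributes one int and
    * receives one int per process of the remote group. */
   MPI_Allgather(&comm_rank, 1, MPI_INT,
                 rem_rank_tbl, 1, MPI_INT,
                 intercomm);

   for (ii = 0; ii < remote_size; ii++) {
      printf(" %d", rem_rank_tbl[ii]);
   }
   printf("\n");
   free(rem_rank_tbl);
}

This MPI_Allgather() call is the one that triggers the MPI_ERR_TRUNCATE abort on the client side whenever the two groups have different sizes.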
Feel free to try the two small client and server programs I posted in my first message yourself.

Thanks,

Martin


Subject: [OMPI users] RE : users Digest, Vol 841, Issue 3
From: Audet, Martin (Martin.Audet_at_[hidden])
Date: 2008-03-13 17:04:25

Hi George,

Thanks for your patch, but I'm not sure I got it correctly. The patch I got modifies a few arguments passed to isend()/irecv()/recv() in coll_basic_allgather.c. Here is the patch I applied:

Index: ompi/mca/coll/basic/coll_basic_allgather.c
===================================================================
--- ompi/mca/coll/basic/coll_basic_allgather.c  (revision 17814)
+++ ompi/mca/coll/basic/coll_basic_allgather.c  (working copy)
@@ -149,7 +149,7 @@
     }
 
     /* Do a send-recv between the two root procs. to avoid deadlock */
-    err = MCA_PML_CALL(isend(sbuf, scount, sdtype, 0,
+    err = MCA_PML_CALL(isend(sbuf, scount, sdtype, root,
                              MCA_COLL_BASE_TAG_ALLGATHER,
                              MCA_PML_BASE_SEND_STANDARD,
                              comm, &reqs[rsize]));
@@ -157,7 +157,7 @@
         return err;
     }
 
-    err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, 0,
+    err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, root,
                              MCA_COLL_BASE_TAG_ALLGATHER, comm,
                              &reqs[0]));
     if (OMPI_SUCCESS != err) {
@@ -186,14 +186,14 @@
             return err;
         }
 
-        err = MCA_PML_CALL(isend(rbuf, rsize * rcount, rdtype, 0,
+        err = MCA_PML_CALL(isend(rbuf, rsize * scount, sdtype, root,
                                  MCA_COLL_BASE_TAG_ALLGATHER,
                                  MCA_PML_BASE_SEND_STANDARD, comm, &req));
        if (OMPI_SUCCESS != err) {
            goto exit;
        }
 
-        err = MCA_PML_CALL(recv(tmpbuf, size * scount, sdtype, 0,
+        err = MCA_PML_CALL(recv(tmpbuf, size * rcount, rdtype, root,
                                 MCA_COLL_BASE_TAG_ALLGATHER, comm,
                                 MPI_STATUS_IGNORE));
        if (OMPI_SUCCESS != err) {

However, with this patch I still have the problem. If I start the server with three processes and the client with two, the client prints:

[audet_at_linux15 dyn_connect]$ mpiexec --universe univ1 -n 2 ./aclient '0.2.0:2000'
intercomm_flag = 1
intercomm_remote_size = 3
rem_rank_tbl[3] = { 0 1 2}
[linux15:26114] *** An error occurred in MPI_Allgather
[linux15:26114] *** on communicator
[linux15:26114] *** MPI_ERR_TRUNCATE: message truncated
[linux15:26114] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpiexec noticed that job rank 0 with PID 26113 on node linux15 exited on signal 15 (Terminated).
[audet_at_linux15 dyn_connect]$

and then aborts. The server, on its side, simply hangs (as before).

Regards,

Martin


-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
Sent: March 14, 2008 19:45
To: Open MPI Users
Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails

Yes, please let us know if this fixes it. We're working on a 1.2.6 release; we can definitely put this fix in there if it's correct. Thanks!

On Mar 13, 2008, at 4:07 PM, George Bosilca wrote:

> I dig into the sources and I think you correctly pinpoint the bug.
> It seems we have a mismatch between the local and remote sizes in
> the inter-communicator allgather in the 1.2 series (which explain
> the message truncation error when the local and remote groups have a
> different number of processes). Attached to this email you can find
> a patch that [hopefully] solve this problem. If you can please test
> it and let me know if this solve your problem.
>
> Thanks,
>   george.
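(For what it's worth, the truncation mechanism George refers to is easy to reproduce in isolation. The toy program below is mine and has nothing to do with the Open MPI internals touched by the patch; it just shows that a receiver posting a smaller count than the sender transmits gets MPI_ERR_TRUNCATE, which under the default MPI_ERRORS_ARE_FATAL handler aborts the job exactly like the output above.)

/* Toy illustration (my code, not Open MPI internals): rank 0 sends
 * 3 ints but rank 1 only posts a receive for 2, so the message is
 * truncated and MPI reports MPI_ERR_TRUNCATE.  Run with: mpiexec -n 2 */
#include <mpi.h>

int main(int argc, char **argv)
{
   int rank, buf[3] = { 10, 20, 30 };

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   if (rank == 0) {
      MPI_Send(buf, 3, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* sends 3 ints        */
   } else if (rank == 1) {
      MPI_Recv(buf, 2, MPI_INT, 0, 0, MPI_COMM_WORLD,    /* expects only 2      */
               MPI_STATUS_IGNORE);                       /* -> MPI_ERR_TRUNCATE */
   }

   MPI_Finalize();
   return 0;
}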
>
> <inter_allgather.patch>
>
>
> On Mar 13, 2008, at 1:11 PM, Audet, Martin wrote:
>
>>
>> Hi,
>>
>> After re-checking the MPI standard (www.mpi-forum.org and MPI - The
>> Complete Reference), I'm more and more convinced that my small
>> examples programs establishing a intercommunicator with
>> MPI_Comm_Connect()/MPI_Comm_accept() over an MPI port and
>> exchanging data over it with MPI_Allgather() is correct. Especially
>> calling MPI_Allgather() with recvcount=1 (its third argument)
>> instead of the total number of MPI_INT that will be received (e.g.
>> intercomm_remote_size in the examples) is both correct and
>> consistent with MPI_Allgather() behavior on intracommunicator (e.g.
>> "normal" communicator).
>>
>>    MPI_Allgather(&comm_rank, 1, MPI_INT,
>>                  rem_rank_tbl, 1, MPI_INT,
>>                  intercomm);
>>
>> Also the recvbuf argument (the second argument) of MPI_Allgather()
>> in the examples should have a size of intercomm_remote_size (e.g.
>> the size of the remote group), not the sum of the local and remote
>> groups in the client and sever process. The standard says that for
>> all-to-all type of operations over an intercommunicator, the
>> process send and receives data from the remote group only (anyway
>> it is not possible to exchange data with process of the local group
>> over an intercommunicator).
>>
>> So, for me there is no reason for stopping the process with an
>> error message complaining about message truncation. There should be
>> no truncation, sendcount, sendtype, recvcount and recvtype
>> arguments of MPI_Allgather() are correct and consistent.
>>
>> So again for me the OpenMPI behavior with my example look more and
>> more like a bug...
>>
>> Concerning George comment about valgrind and TCP/IP, I totally
>> agree, messages reported by valgrind are only a clue of a bug,
>> especially in this contex, not a proof of bug. Another clue is that
>> my small examples work perfectly with mpich2 ch3:sock.
>>
>> Regards,
>>
>> Martin Audet
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Thu, 13 Mar 2008 08:21:51 +0100
>> From: jody <jody....@gmail.com>
>> Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails
>> To: "Open MPI Users" <us...@open-mpi.org>
>> Message-ID: <9b0da5ce0803130021l4ead0f91qaf43e4ac7d332...@mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> HI
>> I think the recvcount argument you pass to MPI_Allgather should not be
>> 1 but instead the number of MPI_INTs your buffer rem_rank_tbl can contain.
>> As it stands now, you tell MPI_Allgather that it may only receive 1
>> MPI_INT.
>>
>> Furthermore, i'm not sure, but i think your receive buffer should be
>> large enough to contain messages from *all* processes, and not just
>> from the "far side"
>>
>> Jody
>>
>> .
>>
>>
>> ------------------------------
>>
>> Message: 6
>> Date: Thu, 13 Mar 2008 09:06:47 -0500
>> From: George Bosilca <bosi...@eecs.utk.edu>
>> Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails
>> To: Open MPI Users <us...@open-mpi.org>
>> Message-ID: <82e9ff28-fb87-4ffb-a492-dde472d5d...@eecs.utk.edu>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> I am not aware of any problems with the allreduce/allgather. But, we
>> are aware of the problem with valgrind that report non initialized
>> values when used with TCP. It's a long story, but I can guarantee that
>> this should not affect a correct MPI application.
>>
>>    george.
>>
>> PS: For those who want to know the details: we have to send a header
>> over TCP which contain some very basic information, including the size
>> of the fragment. Unfortunately, we have a 2 bytes gap in the header.
>> As we never initialize these 2 unused bytes, but we send them over the
>> wire, valgrind correctly detect the non initialized data transfer.
>>
>>
>> On Mar 12, 2008, at 3:58 PM, Audet, Martin wrote:
>>
>>> Hi again,
>>>
>>> Thanks Pak for the link and suggesting to start an "orted" deamon,
>>> by doing so my clients and servers jobs were able to establish an
>>> intercommunicator between them.
>>>
>>> However I modified my programs to perform an MPI_Allgather() of a
>>> single "int" over the new intercommunicator to test communication a
>>> litle bit and I did encountered problems. I am now wondering if
>>> there is a problem in MPI_Allreduce() itself for intercommunicators.
>>> Note that the same program run without problems with mpich2
>>> (ch3:sock).
>>>
>>> For example if I start orted as follows:
>>>
>>> orted --persistent --seed --scope public --universe univ1
>>>
>>> and then start the server with three process:
>>>
>>> mpiexec --universe univ1 -n 3 ./aserver
>>>
>>> it prints:
>>>
>>> Server port = '0.2.0:2000'
>>>
>>> Now if I start the client with two process as follow (using the
>>> server port):
>>>
>>> mpiexec --universe univ1 -n 2 ./aclient '0.2.0:2000'
>>>
>>> The server prints:
>>>
>>> intercomm_flag = 1
>>> intercomm_remote_size = 2
>>> rem_rank_tbl[2] = { 0 1}
>>>
>>> which is the correct output. The client then prints:
>>>
>>> intercomm_flag = 1
>>> intercomm_remote_size = 3
>>> rem_rank_tbl[3] = { 0 1 2}
>>> [linux15:30895] *** An error occurred in MPI_Allgather
>>> [linux15:30895] *** on communicator
>>> [linux15:30895] *** MPI_ERR_TRUNCATE: message truncated
>>> [linux15:30895] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> mpiexec noticed that job rank 0 with PID 30894 on node linux15
>>> exited on signal 15 (Terminated).
>>>
>>> As you can see the first messages are correct but the client job
>>> terminate with an error (and the server hang).
>>>
>>> After re-reading the documentation about MPI_Allgather() over an
>>> intercommunicator, I don't see anything wrong in my simple code.
>>> Also if I run the client and server process with valgrind, I get a
>>> few messages like:
>>>
>>> ==29821== Syscall param writev(vector[...]) points to uninitialised byte(s)
>>> ==29821==    at 0x36235C2130: writev (in /lib64/libc-2.3.5.so)
>>> ==29821==    by 0x7885583: mca_btl_tcp_frag_send (in /home/publique/openmpi-1.2.5/lib/openmpi/mca_btl_tcp.so)
>>> ==29821==    by 0x788501B: mca_btl_tcp_endpoint_send (in /home/publique/openmpi-1.2.5/lib/openmpi/mca_btl_tcp.so)
>>> ==29821==    by 0x7467947: mca_pml_ob1_send_request_start_prepare (in /home/publique/openmpi-1.2.5/lib/openmpi/mca_pml_ob1.so)
>>> ==29821==    by 0x7461494: mca_pml_ob1_isend (in /home/publique/openmpi-1.2.5/lib/openmpi/mca_pml_ob1.so)
>>> ==29821==    by 0x798BF9D: mca_coll_basic_allgather_inter (in /home/publique/openmpi-1.2.5/lib/openmpi/mca_coll_basic.so)
>>> ==29821==    by 0x4A5069C: PMPI_Allgather (in /home/publique/openmpi-1.2.5/lib/libmpi.so.0.0.0)
>>> ==29821==    by 0x400EED: main (aserver.c:53)
>>> ==29821==  Address 0x40d6cac is not stack'd, malloc'd or (recently) free'd
>>>
>>> in both MPI_Allgather() and MPI_Comm_disconnect() calls for client
>>> and server with valgrind always reporting that the address in
>>> question are "not stack'd, malloc'd or (recently) free'd".
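(As an aside, the 2-byte gap George mentions in his PS above is easy to picture. The struct and function below are a hypothetical sketch of mine, not the actual Open MPI TCP BTL header, but they produce the same kind of valgrind report:)

/* Hypothetical sketch (not the real mca_btl_tcp header): the compiler
 * inserts 2 padding bytes between 'type' and 'size' to align 'size'.
 * Those bytes are never initialized, yet the whole struct is written
 * to the socket, so valgrind reports "points to uninitialised byte(s)"
 * even though the application data itself is fine. */
#include <unistd.h>

struct frag_header {
   unsigned short type;      /* 2 bytes                        */
                             /* 2 bytes of padding end up here */
   unsigned int   size;      /* fragment length                */
};

static void send_header(int sock_fd, unsigned int frag_size)
{
   struct frag_header hdr;

   hdr.type = 1;
   hdr.size = frag_size;
   /* A memset(&hdr, 0, sizeof(hdr)) before filling the fields would
    * silence the warning without changing the protocol. */
   (void) write(sock_fd, &hdr, sizeof(hdr));
}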
>>>
>>> So is there a problem with MPI_Allgather() on intercommunicators or
>>> am I doing something wrong ?
>>>
>>> Thanks,
>>>
>>> Martin
>>>
>>>
>>> /* aserver.c */
>>> #include <stdio.h>
>>> #include <mpi.h>
>>>
>>> #include <assert.h>
>>> #include <stdlib.h>
>>>
>>> int main(int argc, char **argv)
>>> {
>>>    int comm_rank,comm_size;
>>>    char port_name[MPI_MAX_PORT_NAME];
>>>    MPI_Comm intercomm;
>>>    int ok_flag;
>>>
>>>    int intercomm_flag;
>>>    int intercomm_remote_size;
>>>    int *rem_rank_tbl;
>>>    int ii;
>>>
>>>    MPI_Init(&argc, &argv);
>>>
>>>    MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
>>>    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
>>>
>>>    ok_flag = (comm_rank != 0) || (argc == 1);
>>>    MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);
>>>
>>>    if (!ok_flag) {
>>>       if (comm_rank == 0) {
>>>          fprintf(stderr,"Usage: %s\n",argv[0]);
>>>       }
>>>       MPI_Abort(MPI_COMM_WORLD, 1);
>>>    }
>>>
>>>    MPI_Open_port(MPI_INFO_NULL, port_name);
>>>
>>>    if (comm_rank == 0) {
>>>       printf("Server port = '%s'\n", port_name);
>>>    }
>>>
>>>    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);
>>>
>>>    MPI_Close_port(port_name);
>>>
>>>    MPI_Comm_test_inter(intercomm, &intercomm_flag);
>>>    if (comm_rank == 0) {
>>>       printf("intercomm_flag = %d\n", intercomm_flag);
>>>    }
>>>    assert(intercomm_flag != 0);
>>>
>>>    MPI_Comm_remote_size(intercomm, &intercomm_remote_size);
>>>    if (comm_rank == 0) {
>>>       printf("intercomm_remote_size = %d\n", intercomm_remote_size);
>>>    }
>>>
>>>    rem_rank_tbl = malloc(intercomm_remote_size*sizeof(*rem_rank_tbl));
>>>    MPI_Allgather(&comm_rank, 1, MPI_INT,
>>>                  rem_rank_tbl, 1, MPI_INT,
>>>                  intercomm);
>>>    if (comm_rank == 0) {
>>>       printf("rem_rank_tbl[%d] = {", intercomm_remote_size);
>>>       for (ii=0; ii < intercomm_remote_size; ii++) {
>>>          printf(" %d", rem_rank_tbl[ii]);
>>>       }
>>>       printf("}\n");
>>>    }
>>>    free(rem_rank_tbl);
>>>
>>>    MPI_Comm_disconnect(&intercomm);
>>>
>>>    MPI_Finalize();
>>>
>>>    return 0;
>>> }
>>>
>>>
>>> /* aclient.c */
>>> #include <stdio.h>
>>> #include <unistd.h>
>>>
>>> #include <mpi.h>
>>>
>>> #include <assert.h>
>>> #include <stdlib.h>
>>>
>>> int main(int argc, char **argv)
>>> {
>>>    int comm_rank,comm_size;
>>>    int ok_flag;
>>>    MPI_Comm intercomm;
>>>
>>>    int intercomm_flag;
>>>    int intercomm_remote_size;
>>>    int *rem_rank_tbl;
>>>    int ii;
>>>
>>>    MPI_Init(&argc, &argv);
>>>
>>>    MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
>>>    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
>>>
>>>    ok_flag = (comm_rank != 0) || ((argc == 2) && argv[1] && (*argv[1] != '\0'));
>>>    MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);
>>>
>>>    if (!ok_flag) {
>>>       if (comm_rank == 0) {
>>>          fprintf(stderr,"Usage: %s mpi_port\n", argv[0]);
>>>       }
>>>       MPI_Abort(MPI_COMM_WORLD, 1);
>>>    }
>>>
>>>    while (MPI_Comm_connect((comm_rank == 0) ? argv[1] : 0,
>>>                            MPI_INFO_NULL, 0, MPI_COMM_WORLD,
>>>                            &intercomm) != MPI_SUCCESS) {
>>>       if (comm_rank == 0) {
>>>          printf("MPI_Comm_connect() failled, sleeping and retrying...\n");
>>> \n"); >>> } >>> sleep(1); >>> } >>> >>> MPI_Comm_test_inter(intercomm, &intercomm_flag); >>> if (comm_rank == 0) { >>> printf("intercomm_flag = %d\n", intercomm_flag); >>> } >>> assert(intercomm_flag != 0); >>> MPI_Comm_remote_size(intercomm, &intercomm_remote_size); >>> if (comm_rank == 0) { >>> printf("intercomm_remote_size = %d\n", intercomm_remote_size); >>> } >>> rem_rank_tbl = malloc(intercomm_remote_size*sizeof(*rem_rank_tbl)); >>> MPI_Allgather(&comm_rank, 1, MPI_INT, >>> rem_rank_tbl, 1, MPI_INT, >>> intercomm); >>> if (comm_rank == 0) { >>> printf("rem_rank_tbl[%d] = {", intercomm_remote_size); >>> for (ii=0; ii < intercomm_remote_size; ii++) { >>> printf(" %d", rem_rank_tbl[ii]); >>> } >>> printf("}\n"); >>> } >>> free(rem_rank_tbl); >>> >>> MPI_Comm_disconnect(&intercomm); >>> >>> MPI_Finalize(); >>> >>> return 0; >>> } >>> >>> _______________________________________________ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> -------------- next part -------------- >> A non-text attachment was scrubbed... >> Name: smime.p7s >> Type: application/pkcs7-signature >> Size: 2423 bytes >> Desc: not available >> Url : >> http://www.open-mpi.org/MailArchives/users/attachments/20080313/642d41dd/attachment.bin >> >> ------------------------------ >> >> _______________________________________________ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> End of users Digest, Vol 841, Issue 1 >> ************************************* >> >> _______________________________________________ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > _______________________________________________ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Cisco Systems _______________________________________________ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users