Hi Georges,
Thanks for your patch, but I'm not sure I got it right. The patch I received
modifies a few arguments passed to isend()/irecv()/recv() in
coll_basic_allgather.c. Here is the patch I applied:
Index: ompi/mca/coll/basic/coll_basic_allgather.c
===================================================================
--- ompi/mca/coll/basic/coll_basic_allgather.c (revision 17814)
+++ ompi/mca/coll/basic/coll_basic_allgather.c (working copy)
@@ -149,7 +149,7 @@
}
/* Do a send-recv between the two root procs. to avoid deadlock */
- err = MCA_PML_CALL(isend(sbuf, scount, sdtype, 0,
+ err = MCA_PML_CALL(isend(sbuf, scount, sdtype, root,
MCA_COLL_BASE_TAG_ALLGATHER,
MCA_PML_BASE_SEND_STANDARD,
comm, &reqs[rsize]));
@@ -157,7 +157,7 @@
return err;
}
- err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, 0,
+ err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, root,
MCA_COLL_BASE_TAG_ALLGATHER, comm,
&reqs[0]));
if (OMPI_SUCCESS != err) {
@@ -186,14 +186,14 @@
return err;
}
- err = MCA_PML_CALL(isend(rbuf, rsize * rcount, rdtype, 0,
+ err = MCA_PML_CALL(isend(rbuf, rsize * scount, sdtype, root,
MCA_COLL_BASE_TAG_ALLGATHER,
MCA_PML_BASE_SEND_STANDARD, comm, &req));
if (OMPI_SUCCESS != err) {
goto exit;
}
- err = MCA_PML_CALL(recv(tmpbuf, size * scount, sdtype, 0,
+ err = MCA_PML_CALL(recv(tmpbuf, size * rcount, rdtype, root,
MCA_COLL_BASE_TAG_ALLGATHER, comm,
MPI_STATUS_IGNORE));
if (OMPI_SUCCESS != err) {
However, with this patch applied I still have the problem. If I start the server
with three processes and the client with two, the client prints:
[audet@linux15 dyn_connect]$ mpiexec --universe univ1 -n 2 ./aclient
'0.2.0:2000'
intercomm_flag = 1
intercomm_remote_size = 3
rem_rank_tbl[3] = { 0 1 2}
[linux15:26114] *** An error occurred in MPI_Allgather
[linux15:26114] *** on communicator
[linux15:26114] *** MPI_ERR_TRUNCATE: message truncated
[linux15:26114] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpiexec noticed that job rank 0 with PID 26113 on node linux15 exited on signal
15 (Terminated).
[audet@linux15 dyn_connect]$
and aborts. The server on the other side simply hangs (as before).
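For reference, MPI_ERR_TRUNCATE is reported when an incoming message is larger
than the receive posted for it. In an allgather on an intercommunicator, every
process receives one block of rcount elements from each process in the remote
group, so the receive side has to be sized by the remote group, not the local
one. Below is a minimal sketch of that arithmetic in plain C, with no MPI
calls. The helper name is hypothetical and is not part of Open MPI. It only
illustrates the expected sizes and does not identify where the truncation
actually comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an Open MPI function): number of elements the
 * receive buffer must hold for an allgather on an intercommunicator, where
 * each process receives one block of rcount elements from every process in
 * the remote group. */
static size_t intercomm_allgather_recv_elems(int remote_size, int rcount)
{
    assert(remote_size >= 0 && rcount >= 0);
    return (size_t)remote_size * (size_t)rcount;
}
```

Assuming one element per process, in the run above each of the two clients
would need room for intercomm_allgather_recv_elems(3, 1) elements and each of
the three servers for intercomm_allgather_recv_elems(2, 1) elements.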
Regards,
Martin