Thanks Jeremiah; I filed the following ticket about this: https://svn.open-mpi.org/trac/ompi/ticket/2723
On Feb 10, 2011, at 3:24 PM, Jeremiah Willcock wrote:

> I forgot to mention that this was tested with 3 or 4 ranks, connected via TCP.
>
> -- Jeremiah Willcock
>
> On Thu, 10 Feb 2011, Jeremiah Willcock wrote:
>
>> Here is a small test case that hits the bug on 1.4.1:
>>
>> #include <mpi.h>
>>
>> int arr[1142];
>>
>> int main(int argc, char** argv) {
>>   int rank, my_size;
>>   MPI_Init(&argc, &argv);
>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>   my_size = (rank == 1) ? 1142 : 1088;
>>   MPI_Bcast(arr, my_size, MPI_INT, 0, MPI_COMM_WORLD);
>>   MPI_Finalize();
>>   return 0;
>> }
>>
>> I tried it on 1.5.1, and I get MPI_ERR_TRUNCATE instead, so this might
>> have already been fixed.
>>
>> -- Jeremiah Willcock
>>
>> On Thu, 10 Feb 2011, Jeremiah Willcock wrote:
>>
>>> FYI, I am having trouble finding a small test case that will trigger
>>> this on 1.5; I'm either getting deadlocks or MPI_ERR_TRUNCATE, so it
>>> could have been fixed. What are the triggering rules for the different
>>> broadcast algorithms? It could be that only certain sizes or only
>>> certain BTLs trigger it.
>>>
>>> -- Jeremiah Willcock
>>>
>>> On Thu, 10 Feb 2011, Jeff Squyres wrote:
>>>
>>>> Nifty! Yes, I agree that that's a poor error message. It's probably
>>>> (unfortunately) being propagated up from the underlying
>>>> point-to-point system, where an ERR_IN_STATUS would actually make
>>>> sense.
>>>>
>>>> I'll file a ticket about this. Thanks for the heads up.
>>>>
>>>> On Feb 9, 2011, at 4:49 PM, Jeremiah Willcock wrote:
>>>>
>>>>> On Wed, 9 Feb 2011, Jeremiah Willcock wrote:
>>>>>
>>>>>> I get the following Open MPI error from 1.4.1:
>>>>>>
>>>>>> *** An error occurred in MPI_Bcast
>>>>>> *** on communicator MPI COMMUNICATOR 3 SPLIT FROM 0
>>>>>> *** MPI_ERR_IN_STATUS: error code in status
>>>>>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>>>>>
>>>>>> (hostname and port removed from each line). Since MPI_Bcast does
>>>>>> not return an MPI_Status, I don't know what the error is. Is this
>>>>>> something that people have seen before?
>>>>>
>>>>> For the record, this appears to be caused by specifying inconsistent
>>>>> data sizes on the different ranks in the broadcast operation. The
>>>>> error message could still be improved, though.
>>>>>
>>>>> -- Jeremiah Willcock

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
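
[Editor's note: the root cause above is ranks disagreeing on the broadcast count. A minimal defensive sketch (not from this thread; the count value of 1088 and the variable names are illustrative) is to broadcast the count itself first, so every rank posts a matching size for the payload broadcast:]

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char** argv) {
  int rank;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* Hypothetical: only the root knows the payload size up front. */
  int count = (rank == 0) ? 1088 : 0;

  /* Broadcast the count before the payload, so all ranks agree on
     the message size and no rank can truncate or deadlock. */
  MPI_Bcast(&count, 1, MPI_INT, 0, MPI_COMM_WORLD);

  int* buf = malloc(count * sizeof(int));
  /* The root would fill buf here; the non-root ranks now receive
     exactly count ints, matching what the root sends. */
  MPI_Bcast(buf, count, MPI_INT, 0, MPI_COMM_WORLD);

  free(buf);
  MPI_Finalize();
  return 0;
}

[The extra broadcast costs one small collective, but it guarantees the count-consistency that the MPI standard requires of all ranks in a collective operation.]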