FYI, I am having trouble finding a small test case that will trigger this on 1.5; I'm getting either deadlocks or MPI_ERR_TRUNCATE instead, so it could have been fixed. What are the triggering rules for the different broadcast algorithms? It could be that only certain message sizes or only certain BTLs trigger it.
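
(To take the size-based selection out of the picture while testing, I've been assuming the tuned component's MCA parameters can force a particular algorithm -- if I have the parameter names right, something like:

  mpirun --mca coll_tuned_use_dynamic_rules 1 \
         --mca coll_tuned_bcast_algorithm 3 -np 4 ./bcast_test

where the algorithm number and the ./bcast_test program are just placeholders, and "ompi_info --param coll tuned" should list the valid values.  Please correct me if those aren't the right knobs.)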

-- Jeremiah Willcock

On Thu, 10 Feb 2011, Jeff Squyres wrote:

Nifty!  Yes, I agree that that's a poor error message.  It's probably 
(unfortunately) being propagated up from the underlying point-to-point system, 
where an ERR_IN_STATUS would actually make sense.

I'll file a ticket about this.  Thanks for the heads up.


On Feb 9, 2011, at 4:49 PM, Jeremiah Willcock wrote:

On Wed, 9 Feb 2011, Jeremiah Willcock wrote:

I get the following Open MPI error from 1.4.1:

*** An error occurred in MPI_Bcast
*** on communicator MPI COMMUNICATOR 3 SPLIT FROM 0
*** MPI_ERR_IN_STATUS: error code in status
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

(hostname and port removed from each line).  There is no MPI_Status returned by
MPI_Bcast, so I don't know what the error actually is.  Is this something that
people have seen before?

For the record, this appears to be caused by specifying inconsistent data sizes 
on the different ranks in the broadcast operation.  The error message could 
still be improved, though.
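
In case it helps with a reproducer, the kind of mismatch I mean boils down to something like the sketch below.  It is simplified to MPI_COMM_WORLD rather than the split communicator in my actual code, so take it as illustrative rather than the exact program:

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, count, buf[100];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* The root broadcasts 100 ints, but every other rank only expects 10,
         so the amount of data disagrees across the ranks -- which the MPI
         standard says is erroneous. */
      count = (rank == 0) ? 100 : 10;
      MPI_Bcast(buf, count, MPI_INT, 0, MPI_COMM_WORLD);

      MPI_Finalize();
      return 0;
  }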

-- Jeremiah Willcock


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

