+1 on all that has been said.

As Eugene stated: this is not an internal Open MPI bug.  Your application is 
calling some form of MPI receive with a buffer that is too small for the 
incoming message.  The MPI specification defines this as a truncation error; 
hence, Open MPI gives you an MPI_ERR_TRUNCATE.  You can fix the error by 
posting that MPI receive with a bigger buffer.
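
For illustration, here's a minimal sketch in plain MPI (not Boost.MPI) of how 
the error arises; the 72-byte message size matches the one discussed below, 
but the buffer sizes and ranks are otherwise made up:

    /* Minimal sketch (plain MPI): a receive posted with fewer elements
       than the matching 72-byte send carries raises MPI_ERR_TRUNCATE. */
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            char msg[72] = "payload";          /* sender: 72 bytes        */
            MPI_Send(msg, 72, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            char buf[32];    /* too small: 72 > 32 -> MPI_ERR_TRUNCATE    */
            /* char buf[72];    the fix: at least as big as the message   */
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }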

I believe the issue is further complicated because you are not calling MPI 
directly -- you are calling Boost.MPI, which obscures the actual MPI calls 
being made.  You might need to dive into the Boost.MPI documentation a bit 
more to understand exactly how you are posting a receive that is too small 
for an incoming message.
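
As a hedged sketch of where the size can hide: Boost.MPI's pointer-plus-count 
receive overloads pass that count straight through to the underlying MPI 
receive, so the count bounds the buffer.  The ranks, tag, and sizes below are 
illustrative only:

    // Sketch (Boost.MPI, illustrative values): the count given to irecv
    // bounds the receive; a larger matching send truncates underneath.
    #include <boost/mpi.hpp>
    namespace mpi = boost::mpi;

    int main(int argc, char **argv) {
        mpi::environment env(argc, argv);
        mpi::communicator world;

        if (world.rank() == 1) {
            double results[9] = {0};               // 9 doubles = 72 bytes
            world.send(0, /*tag=*/1, results, 9);
        } else if (world.rank() == 0) {
            double buf[9];
            // Posting fewer than 9 elements here (e.g., "buf, 4") would
            // produce exactly the MPI_ERR_TRUNCATE you are seeing.
            mpi::request req = world.irecv(1, /*tag=*/1, buf, 9);
            req.wait();
        }
        return 0;
    }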

I'm guessing that either you're accidentally posting a receive that is too 
small, or your worker nodes are sending multiple different kinds of messages 
to the master on the same tag, and the messages after the first one are larger 
than 72 bytes (i.e., it's a timing issue: some peer process's 2nd message 
reaches the master at a "bad" time).  If it's the latter, I'd suggest using 
different tags to separate the different kinds of messages being sent, as 
sketched below.
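
Here's a minimal sketch of that tag-separation idea, assuming Boost.MPI; the 
tag values and message contents are made up:

    // Sketch: one tag per message kind, so the fixed-size receive can
    // never match a larger follow-up message.  Tag values are arbitrary.
    #include <boost/mpi.hpp>
    #include <boost/serialization/vector.hpp>
    #include <vector>
    namespace mpi = boost::mpi;

    const int TAG_RESULT = 1;  // fixed-size result: 9 doubles = 72 bytes
    const int TAG_EXTRA  = 2;  // variable-size follow-up data

    int main(int argc, char **argv) {
        mpi::environment env(argc, argv);
        mpi::communicator world;

        if (world.rank() != 0) {                      // worker
            double result[9] = {0};
            std::vector<double> extra(1000);
            world.send(0, TAG_RESULT, result, 9);
            world.send(0, TAG_EXTRA, extra);          // serialized by Boost.MPI
        } else {                                      // master
            for (int w = 1; w < world.size(); ++w) {
                double result[9];
                std::vector<double> extra;
                world.recv(w, TAG_RESULT, result, 9); // small buffer safe here
                world.recv(w, TAG_EXTRA, extra);      // vector sized on receive
            }
        }
        return 0;
    }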

-----

The details of how Open MPI moves messages across the network are somewhat 
irrelevant to this issue.  But if you care how that actually works, check out 
these FAQ items (they're specific to the OpenFabrics transport, but the same 
general method is used in many of Open MPI's transports):

    http://www.open-mpi.org/faq/?category=openfabrics#large-message-tuning-1.3
    http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned



On Jul 12, 2010, at 6:31 AM, jody wrote:

> Hi
> > mpi_irecv(workerNodeID, messageTag, bufferVector[row][column])
> OpenMPI contains no function of this form.
> There is MPI_Irecv, but it takes a different number of arguments.
> 
> Or is this a boost method?
> If yes, I guess you have to make sure that the
> bufferVector[row][column] is large enough...
> Perhaps there is a boost forum you can check out if the problem persists.
> 
> Jody
> 
> 
> On Sun, Jul 11, 2010 at 10:13 AM, Jack Bryan <dtustud...@hotmail.com> wrote:
> > Thanks for your reply.
> > The message size is 72 bytes.
> > The master sends out the message package to each of the 51 nodes.
> > Then, after doing their local work, the worker nodes send back the
> > same-size message to the master.
> > The master uses vector.push_back(new messageType) to store each message
> > from the workers.
> > The master uses
> > mpi_irecv(workerNodeID, messageTag, bufferVector[row][column])
> > to receive the worker messages.
> > The row is the rank ID of each worker; the column is the index of the
> > message from that worker.
> > Each worker may send multiple messages to the master.
> > When the number of worker nodes is large, I get an MPI_ERR_TRUNCATE error.
> > Any help is appreciated.
> > JACK
> > July 10, 2010
> >
> > ________________________________
> > Date: Sat, 10 Jul 2010 23:12:49 -0700
> > From: eugene....@oracle.com
> > To: us...@open-mpi.org
> > Subject: Re: [OMPI users] OpenMPI how large its buffer size ?
> >
> > Jack Bryan wrote:
> >
> > The master node can receive messages (all the same size) from 50 worker
> > nodes. But it cannot receive messages from 51 nodes. That caused a
> > "truncate error".
> >
> > How big was the buffer that the program specified in the receive call?  How
> > big was the message that was sent?
> >
> > MPI_ERR_TRUNCATE means that you posted a receive with an application buffer
> > that turned out to be too small to hold the message that was received.  It's
> > a user application error that has nothing to do with MPI's internal
> > buffers.  MPI's internal buffers don't need to be big enough to hold that
> > message.  MPI could require the sender and receiver to coordinate so that
> > only part of the message is moved at a time.
> >
> > I used the same buffer to get the message in the 50-node case.
> > About the "rendezvous" protocol: what is the meaning of "the sender sends
> > a short portion"?
> > What is the "short portion" -- is it a small part of the message from the
> > sender?
> >
> > It's at least the message header (communicator, tag, etc.) so that the
> > receiver can figure out if this is the expected message or not.  In
> > practice, there is probably also some data in there as well.  The amount of
> > that portion depends on the MPI implementation and, in practice, the
> > interconnect the message traveled over, MPI-implementation-dependent
> > environment variables set by the user, etc.  E.g., with OMPI over shared
> > memory by default it's about 4Kbytes (if I remember correctly).
> >
> > This "rendezvous" protocol" can work automatically in background without
> > programmer
> > indicates in his program ?
> >
> > Right.  MPI actually allows you to force such synchronization with
> > MPI_Ssend, but typically MPI implementations use it automatically for
> > "plain" long sends as well, even if the user did not use MPI_Ssend.
> >
> > The "acknowledgement " can be generated by the receiver only when the
> > corresponding mpi_irecv is posted by the receiver ?
> >
> > Right.
> >
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

