Sorry for my late reply.
And thank you all for your answers and comments.

Oleg,

Same question as Aurélien. You mentioned that you have implemented some piggyback mechanisms in Open MPI.
Are these mechanisms available?
Would it be possible to use them?

Regards.

Thomas Ropars
Aurélien Bouteiller wrote:
Oleg,

Is there an implementation of your techniques in Open MPI? Can we get our greedy, nasty paws on it?

Thanks for the link, Josh.

Aurelien

On Feb 5, 2008, at 08:39, Josh Hursey wrote:

Oleg,

Interesting work. You mentioned late in your email that you believe that adding support for piggybacking to the MPI standard would be the best solution. As you may know, the MPI Forum has reconvened and there is a working group for Fault Tolerance. This working group is discussing a piggybacking interface proposal for the standard, amongst other things. If you are interested in contributing to this conversation, you can find the mailing list here:
 http://lists.cs.uiuc.edu/mailman/listinfo/mpi3-ft

Best,
Josh

On Feb 5, 2008, at 4:58 AM, Oleg Morajko wrote:

Hi,

I've been working on MPI piggyback techniques as part of my PhD work.

Although MPI does not provide native support, there are several different solutions for transmitting piggyback data with every MPI communication. You may find a brief overview in papers [1, 2]. These include copying the original message and the extra data into a bigger buffer, sending an additional message, or changing the send type to a dynamically created wrapper datatype that contains a pointer to the original data and the piggyback data. I have tried all of these mechanisms and they work, but considering the overhead, there is no single best technique that outperforms the others in all scenarios. Jeff Squyres had interesting comments on this subject before (on this mailing list).

Finally, after some benchmarking, I have implemented a hybrid technique that combines the existing mechanisms. For small point-to-point messages, datatype wrapping seems to be the least intrusive, at least considering Open MPI's implementation of derived datatypes. For large point-to-point messages, experiments confirmed that sending an additional message is much cheaper than wrapping (and besides, the intrusion is small, as we are already sending a large message). Moreover, the implementation may interleave the original send with an asynchronous send of the piggyback data. This optimization partially hides the latency of the additional send and lowers the overall intrusion. The same criteria can be applied to collective operations, except for barrier and reduce operations: as the former does not transmit any data and the latter transforms the data, the only solution there is to send additional messages.
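
To make the size-based selection concrete, here is a minimal sketch of how it could look inside a PMPI wrapper. The threshold value, the reserved tag, and the send_wrapped() helper are illustrative assumptions for this sketch, not names from any existing library:

    /* Sketch of a size-based hybrid piggyback send inside a PMPI
     * wrapper. PIGGYBACK_THRESHOLD, PIGGYBACK_TAG and send_wrapped()
     * are illustrative names, not part of any existing library. */
    #include <mpi.h>

    #define PIGGYBACK_THRESHOLD 4096   /* bytes; tune per platform */
    #define PIGGYBACK_TAG       32767  /* tags up to 32767 are always valid */

    /* Datatype-wrapping variant for small messages (not shown here). */
    int send_wrapped(void *buf, int count, MPI_Datatype datatype,
                     int dest, int tag, MPI_Comm comm, int pb_value);

    int MPI_Send(void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        int size, rc;
        int pb_value = 0;              /* e.g. a logical clock */
        MPI_Request req;

        MPI_Type_size(datatype, &size);
        if ((long)size * count < PIGGYBACK_THRESHOLD) {
            /* Small message: wrap payload and piggyback in one datatype. */
            return send_wrapped(buf, count, datatype, dest, tag, comm,
                                pb_value);
        }

        /* Large message: start the small piggyback send asynchronously
         * so its latency overlaps the long payload transfer. */
        PMPI_Isend(&pb_value, 1, MPI_INT, dest, PIGGYBACK_TAG, comm, &req);
        rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
        PMPI_Wait(&req, MPI_STATUS_IGNORE);
        return rc;
    }

The receive-side interception would make the mirror-image choice, posting a matching PIGGYBACK_TAG receive for large messages.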

There is a penalty, of course. Especially for collective operations with very small messages, the intrusion may reach 15%, and that's a lot. It then decreases down to 0.1% for bigger messages, but it is still there. I don't know what your requirements/expectations are for that issue. The only work that reports lower overheads is [3], but they added native piggyback support by changing the underlying MPI implementation.

I think the best possible option is to add piggyback support to MPI as part of the standard. A growing number of runtime tools use this functionality for multiple reasons, and certainly PMPI itself is not enough.
References of interest:

 [1] Shende, S., Malony, A., Morris, A., Wolf, F., "Performance Profiling Overhead Compensation for MPI Programs". 12th EuroPVM-MPI Conference, LNCS, vol. 3666, pp. 359-367, 2005. They review various techniques and come up with datatype wrapping.

 [2] Schulz, M., "Extracting Critical Path Graphs from MPI Applications". Cluster Computing 2005, IEEE International, pp. 1-10, September 2005. They use datatype wrapping.

 [3] Vetter, J., "Dynamic Statistical Profiling of Communication Activity in Distributed Applications". They add piggyback support at the MPI implementation level and report very low overheads (no surprise).

Regards,
Oleg Morajko


On Feb 1, 2008 5:08 PM, Aurélien Bouteiller <boute...@eecs.utk.edu> wrote:

I don't know of any work in that direction for now. Indeed, we plan to eventually integrate at least causal message logging in the pml-v, which also includes piggybacking. Therefore we are open to collaborating with you on this matter. Please let us know :)

Aurelien



On Feb 1, 2008, at 09:51, Thomas Ropars wrote:

Hi,

I'm currently working on optimistic message logging and I would like to implement an optimistic message logging protocol in Open MPI. Optimistic message logging protocols piggyback information about dependencies between processes on the application messages, in order to be able to find a consistent global state after a failure. That's why I'm interested in the problem of piggybacking information on MPI messages.

Is there any work on this problem at the moment?
Has anyone already implemented mechanisms in Open MPI to piggyback data on MPI messages?

Regards,

Thomas

Oleg Morajko wrote:
Hi,

I'm developing a causality chain tracking library and need a mechanism to attach extra data to every MPI message, a so-called piggyback mechanism.

As far as I know, there are a few solutions to this problem, of which the two fundamental ones are the following:

 * Dynamic datatype wrapping - if a user calls MPI_Send with, say, 1024 doubles, the wrapped send call implementation dynamically creates a derived datatype that is a structure composed of a pointer to the 1024 doubles and the extra fields to be piggybacked. The datatype is constructed with absolute addresses to avoid copying the original buffer. The receiving side creates the equivalent datatype to receive the original data and the extra data. The performance of this solution depends on how well the MPI implementation handles derived datatypes, but it seems to be lightweight (see the sketch after this list).

 * Sending the extra data in a separate message -- this seems to have a much more significant overhead.
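
For illustration, here is a minimal sketch of the first variant for a point-to-point send, assuming the extra data is a single int; the function name is made up for the example:

    /* Sketch of datatype wrapping with absolute addresses: the struct
     * type points at the user buffer and at the piggyback field, so
     * neither is copied. Illustrative only. */
    #include <mpi.h>

    int send_with_piggyback(void *buf, int count, MPI_Datatype datatype,
                            int dest, int tag, MPI_Comm comm, int pb_value)
    {
        int          blens[2] = { count, 1 };
        MPI_Aint     disps[2];
        MPI_Datatype types[2] = { datatype, MPI_INT };
        MPI_Datatype wrapped;
        int          rc;

        /* Absolute addresses: the wrapper type is sent from MPI_BOTTOM. */
        MPI_Get_address(buf, &disps[0]);
        MPI_Get_address(&pb_value, &disps[1]);

        MPI_Type_create_struct(2, blens, disps, types, &wrapped);
        MPI_Type_commit(&wrapped);

        rc = MPI_Send(MPI_BOTTOM, 1, wrapped, dest, tag, comm);

        MPI_Type_free(&wrapped);
        return rc;
    }

The receiver builds the matching struct type over its own receive buffer and a local int, and likewise receives from MPI_BOTTOM.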

Do you know any other portable solution?

I have implemented the first solution for P2P operations and it works pretty well. However, there are problems with collective operations. There are two classes of collective calls that are problematic:

1. Single-receiver calls, like MPI_Gather. The sender tasks in a gather can be handled in the same way as a normal send: a data item is wrapped and the extra data is piggybacked with the message. The problem is at the receiver side, when the root gathers N data items that must be received into an array big enough to hold all the items, strided by the datatype extent.

   In particular, it seems impossible to construct a datatype that contains a data item and the extra data (i.e. a structure type with absolute addresses) AND make an array of these datatypes separated by a fixed extent. For example: the data item to receive from every process is a vector of 1024 doubles, and the extra data is a single integer. The user provides a receive buffer with room for N * 1024 doubles; the library allocates an array of N integers to receive the piggybacked data. How can one construct a datatype that can be used to receive the data in MPI_Gather? (A sketch of a fallback follows after this list.)

2. MPI_Reduce calls. There is no problem with datatypes, as the receiver gets a single data item and not an array as in the previous case. The problem is the reduction operator itself (MPI_Op), because these operators do not work with the wrapped datatypes. So I can create a new operator that recognizes the wrapped datatype, extracts the original data (skipping the extra data) and performs the original reduction. The point is how to invoke the original reduction on an existing datatype. I have found that Open MPI internally calls ompi_op_reduce(op, inbuf, rbuf, count, dtype), which solves the problem. However, this makes the code MPI-implementation dependent. Any ideas on more portable options?
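
As an illustration of a fallback for case 1, here is a minimal sketch that sends the payload through the unmodified gather and the per-process piggyback integers through a second, small gather; all names are made up for the example:

    /* Sketch of the separate-message fallback for MPI_Gather: the
     * payload keeps its original layout at the root, and the piggyback
     * ints travel in a second gather, side-stepping the impossible
     * absolute-address array datatype. Illustrative only. */
    #include <mpi.h>

    int gather_with_piggyback(void *sendbuf, int scount, MPI_Datatype stype,
                              void *recvbuf, int rcount, MPI_Datatype rtype,
                              int pb_value, int *pb_recv, /* N ints at root */
                              int root, MPI_Comm comm)
    {
        int rc = MPI_Gather(sendbuf, scount, stype,
                            recvbuf, rcount, rtype, root, comm);
        if (rc != MPI_SUCCESS)
            return rc;

        /* Second collective carries one piggyback int per process. */
        return MPI_Gather(&pb_value, 1, MPI_INT,
                          pb_recv, 1, MPI_INT, root, comm);
    }

For case 2, note that MPI-2.2 later added MPI_Reduce_local(inbuf, inoutbuf, count, datatype, op), which applies an MPI_Op outside a collective and would make an unwrap-then-reduce user operator portable; at the time of this thread, the extra-message fallback was the only portable route.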


Thank you in advance for any comment.

--Oleg



--
Dr. Aurélien Bouteiller
Sr. Research Associate - Innovative Computing Laboratory
Suite 350, 1122 Volunteer Boulevard
Knoxville, TN 37996
865 974 6321








