Hi,

I'm developing a causality chain tracking library and need a mechanism to
attach extra data to every MPI message, a so-called piggyback mechanism.

As far as I know, there are a few solutions to this problem, of which the
two fundamental ones are the following:

   - Dynamic datatype wrapping - if a user calls MPI_Send with, say, 1024
   doubles, the wrapped send implementation dynamically creates a derived
   datatype: a structure composed of the user's buffer of 1024 doubles and
   the extra fields to be piggybacked. The datatype is constructed with
   absolute addresses to avoid copying the original buffer. The receiver
   side creates the equivalent datatype to receive the original data and
   the extra data. The performance of this solution depends on how well the
   MPI implementation handles derived datatypes, but it seems to be
   lightweight.

   - Sending extra data in a separate message - this seems likely to have
   much more significant overhead.

Do you know of any other portable solution?

I have implemented the first solution for P2P operations and it works pretty
well. However, there are problems with collective operations. Two classes of
collective calls are problematic:

   1. Single-receiver calls, like MPI_Gather. The sender tasks in a gather
   can be handled in the same way as a normal send: the data item is wrapped
   and the extra data is piggybacked with the message. The problem is on the
   receiver side, where the root gathers N data items that must be received
   into an array big enough to hold all items, strided by the datatype
   extent.

   In particular, it seems impossible to construct a datatype that
   contains the data item and the extra data (i.e. a structure type with
   absolute addresses) AND to make an array of these datatypes separated by
   a fixed extent. For example: the data item to receive from every process
   is a vector of 1024 doubles. The extra data is a single integer. The user
   provides a receive buffer with room for N * 1024 doubles. The library
   allocates an array of N integers to receive the piggybacked data. How can
   I construct a datatype that can be used to receive this data in
   MPI_Gather?

   2. MPI_Reduce calls. There is no problem with datatypes here, as the
   receiver gets a single data item and not an array as in the previous
   case. The problem is the reduction operator itself (MPI_Op), because
   these operators do not work with wrapped datatypes. So I can create a new
   operator that recognizes the wrapped datatype, extracts the original data
   (skipping the extra data) and performs the original reduction. The
   question is how to invoke the original reduction on an existing datatype.
   I have found that Open MPI internally calls
   ompi_op_reduce(op, inbuf, rbuf, count, dtype), which solves the problem.
   However, this makes the code MPI-implementation dependent. Any ideas on
   more portable options?
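
To illustrate the layout mismatch in case 1, here is a small sketch in
plain C (no MPI calls). The function pb_gap is a hypothetical name, and the
sketch assumes the extra data is one int per rank stored in a separate,
densely packed array. It computes the byte distance between rank i's data
item and rank i's piggyback slot:

```c
/* Hypothetical helper: byte distance from the start of rank i's data item
   (item_len doubles per rank, densely packed in the user's buffer) to
   rank i's slot in a separate array of ints.
   base_gap is that distance for rank 0. */
long pb_gap(long base_gap, long item_len, long i)
{
    long data_offset  = i * item_len * (long)sizeof(double);
    long extra_offset = i * (long)sizeof(int);
    return base_gap + extra_offset - data_offset;
}
```

With count > 1, MPI places successive elements of a datatype one extent
apart, so both fields of a struct type move by the same amount and the
distance between them stays constant. Here the required distance changes by
sizeof(int) - 1024 * sizeof(double) from one rank to the next, so a single
struct type repeated at a fixed extent cannot describe both arrays.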


Thank you in advance for any comments.

--Oleg
