Dear colleagues,
   FWIW, years ago I was looking at this problem and developed my
own solution (for C programs) with this structure:
--Be sure your code that works with ambiguous-length types like
'long' can handle different sizes.  I have replacement unambiguous
typedef names like 'si32', 'ui64' etc. for the usual signed and
unsigned fixed-point numbers.
--Run your source code through a utility that analyzes a specified
set of variables, structures, and unions that will be used in
messages and builds tables giving their included types.  Include
these tables in your makefiles.
--Replace malloc, calloc, realloc, free with my own versions,
where you pass a type argument pointing into this table along
with number of items, etc.  There are separate memory pools for
items that will be passed often, rarely, or never, just to make
things more efficient.
--Do all these allocation calls on the rank 0 processor at program
startup and call a special broadcast routine that sets up data
structures on all the other processors to manage the conversions.
--Replace MPI message-passing and broadcast calls with new routines
that use the type information (stored by the replacement malloc,
calloc, etc.) to determine which variables to lengthen, shorten, or
byte-swap on arrival at the destination.  Regular MPI message passing
is used inside these routines and can be used natively for variables
that never need length changes or byte swapping (e.g. text).  I have a
simple set of routines to gather statistics across nodes with sum,
max, etc. operations, but nothing too fancy.  I do not have versions
of any of the MPI operations that collect or distribute matrices, etc.
--A little routine must be written for every union.  This is called
from the package when a union is received to determine which
member is present so the right conversion can be done.
--There was a hook to handle IBM (hex exponent) vs IEEE floating
point, but the code never got written.
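   To make the idea concrete, here is a minimal sketch in C of the
first few steps above: fixed-width typedefs, a type table, an
allocator that records the type, and a receive-side conversion.  It
is an illustration only, not the actual package.  Every name in it
(field_desc, type_desc, tagged_calloc, tagged_free, convert_in_place,
sample_msg) is hypothetical, the table is written by hand rather than
generated by a utility, and only byte swapping is shown (no length
changes, memory pools, unions, or MPI calls).  It assumes a C11
compiler for max_align_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Unambiguous fixed-width names, in the spirit of 'si32', 'ui64'. */
typedef int32_t  si32;
typedef uint64_t ui64;

/* Hypothetical type-table entry: one convertible field of a struct. */
typedef struct {
    size_t offset;              /* byte offset within the struct */
    size_t size;                /* field width in bytes */
} field_desc;

typedef struct {
    size_t            struct_size;
    size_t            nfields;
    const field_desc *fields;
} type_desc;

/* Hypothetical header kept in front of every tagged allocation. */
typedef union {
    struct { const type_desc *type; size_t count; } info;
    max_align_t align;          /* keeps the payload suitably aligned */
} alloc_hdr;

/* calloc replacement that remembers the type and the item count. */
void *tagged_calloc(const type_desc *type, size_t count)
{
    alloc_hdr *hdr;
    if (type->struct_size != 0 &&
        count > (SIZE_MAX - sizeof *hdr) / type->struct_size)
        return NULL;            /* size computation would overflow */
    hdr = calloc(1, sizeof *hdr + count * type->struct_size);
    if (hdr == NULL)
        return NULL;
    hdr->info.type  = type;
    hdr->info.count = count;
    return hdr + 1;             /* caller sees only the payload */
}

void tagged_free(void *p)
{
    if (p != NULL)
        free((alloc_hdr *)p - 1);
}

/* Reverse the byte order of one field in place. */
static void swap_bytes(unsigned char *p, size_t n)
{
    if (n < 2)
        return;
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {
        unsigned char t = p[i];
        p[i] = p[j];
        p[j] = t;
    }
}

/* Byte-swap every described field of every item in a tagged block,
   using the type information stored by tagged_calloc(). */
void convert_in_place(void *p)
{
    const alloc_hdr *hdr  = (const alloc_hdr *)p - 1;
    const type_desc *t    = hdr->info.type;
    unsigned char   *base = p;
    for (size_t i = 0; i < hdr->info.count; i++)
        for (size_t f = 0; f < t->nfields; f++)
            swap_bytes(base + i * t->struct_size + t->fields[f].offset,
                       t->fields[f].size);
}

/* Example message type and its hand-written table. */
typedef struct { si32 id; ui64 stamp; char name[8]; } sample_msg;

static const field_desc sample_fields[] = {
    { offsetof(sample_msg, id),    sizeof(si32) },
    { offsetof(sample_msg, stamp), sizeof(ui64) },
    /* 'name' is text: it has no entry, so it is never swapped */
};

static const type_desc sample_type = {
    sizeof(sample_msg), 2, sample_fields
};
```

   In this sketch a receiver of the opposite byte order would call
convert_in_place() on the buffer after the ordinary MPI receive has
completed; calling it a second time restores the original values.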
   Because this is all very complicated and demanding on the
programmer, I am not making it publicly available, but will be
glad to send it privately to anyone who really thinks they can
use it and is willing to get their hands dirty.
   George Reeke (private email: re...@rockefeller.edu)






On Tue, 2018-04-03 at 23:39 +0000, Jeff Squyres (jsquyres) wrote:
> On Apr 2, 2018, at 1:39 PM, dpchoudh . <dpcho...@gmail.com> wrote:
> > 
> > Sorry for a pedantic follow up:
> > 
> > Is this (heterogeneous cluster support) something that is specified by
> > the MPI standard (perhaps as an optional component)?
> 
> The MPI standard states that if you send a message, you should receive the 
> same values at the receiver.  E.g., if you sent int=3, you should receive 
> int=3, even if one machine is big endian and the other machine is little 
> endian.
> 
> It does not specify what happens when data sizes are different (e.g., if type 
> X is 4 bits on one side and 8 bits on the other) -- there's no good answers 
> on what to do there.
> 
> > Do people know if
> > MPICH. MVAPICH, Intel MPI etc support it? (I do realize this is an
> > OpenMPI forum)


_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
