Dear colleagues,

FWIW, years ago I was looking at this problem and developed my own solution (for C programs) with this structure:

--Be sure your code that works with ambiguous-length types like 'long' can handle different sizes. I have replacement unambiguous typedef names like 'si32', 'ui64' etc. for the usual signed and unsigned fixed-point numbers.

--Run your source code through a utility that analyzes a specified set of variables, structures, and unions that will be used in messages and builds tables giving their included types. Include these tables in your makefiles.

--Replace malloc, calloc, realloc, free with my own versions, where you pass a type argument pointing into this table along with the number of items, etc. There are separate memory pools for items that will be passed often, rarely, or never, just to make things more efficient.

--Do all these calls on the rank 0 processor at program startup and call a special broadcast routine that sets up data structures on all the other processors to manage the conversions.

--Replace MPI message passing and broadcast calls with new routines that use the type information (stored by malloc, calloc, etc.) to determine what variables to lengthen or shorten or swap on arrival at the destination. Regular MPI message passing is used inside these routines and can be used natively for variables that do not ever need length changes or byte swapping (i.e. text). I have a simple set of routines to gather statistics across nodes with sum, max, etc. operations, but nothing too fancy. I do not have versions of any of the MPI operations that collect or distribute matrices, etc.

--A little routine must be written for every union. This is called from the package when a union is received to determine which member is present so the right conversion can be done.

--There was a hook to handle IBM (hex exponent) vs IEEE floating point, but the code never got written.
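To make the table-driven idea above a bit more concrete, here is a minimal sketch in C. It is not the package described in this message (which is not public); it only illustrates byte swapping on arrival, driven by a per-structure field table. The names field_desc, stats_msg, stats_msg_fields, swap_bytes, host_is_little_endian and convert_in_place are all hypothetical, the table is written by hand rather than generated by a utility, and length changes, memory pools, unions and MPI calls are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unambiguous fixed-width names, in the spirit of 'si32' and 'ui64'. */
typedef int32_t  si32;
typedef uint64_t ui64;

/* Hypothetical table entry: one per numeric field of a message structure. */
typedef struct {
    size_t offset;  /* byte offset of the field within the structure */
    size_t width;   /* field width in bytes: 1, 2, 4 or 8 */
} field_desc;

/* Example message type (an assumption made for this sketch). */
typedef struct {
    si32 count;
    ui64 total;
    char tag[8];    /* text: never swapped, so it has no table entry */
} stats_msg;

/* Hand-written stand-in for a table a source-analysis utility might emit. */
static const field_desc stats_msg_fields[] = {
    { offsetof(stats_msg, count), sizeof(si32) },
    { offsetof(stats_msg, total), sizeof(ui64) },
};

/* Reverse 'width' bytes in place (width must be at least 1). */
static void swap_bytes(unsigned char *p, size_t width)
{
    for (size_t i = 0, j = width - 1; i < j; i++, j--) {
        unsigned char t = p[i];
        p[i] = p[j];
        p[j] = t;
    }
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Convert a received item to host byte order, one table entry at a time.
 * Does nothing when sender and receiver already agree. */
static void convert_in_place(void *item, const field_desc *fields,
                             size_t nfields, int sender_little_endian)
{
    if ((sender_little_endian != 0) == host_is_little_endian())
        return;
    for (size_t i = 0; i < nfields; i++) {
        size_t w = fields[i].width;
        assert(w == 1 || w == 2 || w == 4 || w == 8);
        swap_bytes((unsigned char *)item + fields[i].offset, w);
    }
}
```

A receiving routine would call convert_in_place(&msg, stats_msg_fields, 2, sender_flag) right after the raw bytes arrive, where sender_flag is whatever endianness indicator the sender supplied (how that flag travels is also an assumption here). Text fields such as tag are simply omitted from the table, which matches the point above that text can be sent natively.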

Because this is all very complicated and demanding on the programmer, I am not making it publicly available, but will be glad to send it privately to anyone who really thinks they can use it and is willing to get their hands dirty.

George Reeke
(private email: re...@rockefeller.edu)

On Tue, 2018-04-03 at 23:39 +0000, Jeff Squyres (jsquyres) wrote:
> On Apr 2, 2018, at 1:39 PM, dpchoudh . <dpcho...@gmail.com> wrote:
> >
> > Sorry for a pedantic follow up:
> >
> > Is this (heterogeneous cluster support) something that is specified by
> > the MPI standard (perhaps as an optional component)?
>
> The MPI standard states that if you send a message, you should receive the
> same values at the receiver. E.g., if you sent int=3, you should receive
> int=3, even if one machine is big endian and the other machine is little
> endian.
>
> It does not specify what happens when data sizes are different (e.g., if type
> X is 4 bits on one side and 8 bits on the other) -- there's no good answers
> on what to do there.
>
> > Do people know if
> > MPICH. MVAPICH, Intel MPI etc support it? (I do realize this is an
> > OpenMPI forum)