Hi David

For what it is worth, the method suggested by
Terry Dontje and Richard Troutmann is what is used in several
generations of coupled climate models that we've been using for the
past 8+ years.

The goals are slightly different from yours:
they split along logical boundaries
(i.e. which processes belong to the atmosphere, which to the ocean, etc.),
whereas you want to split along physical boundaries
(i.e. which processes belong to the same computer,
as fuzzy as the notion of "same computer" can be these days).

The variants of this technique that I know of
are slightly different from Terry's suggestion:
they don't split up MPI_COMM_WORLD itself,
but create additional sub-communicators alongside it.
However, the idea is the same.
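
For instance, in the coupled-model case the color passed to
MPI_Comm_split simply encodes which component a process belongs to,
along the lines of this little sketch (only an illustration: the rank
threshold and the names are made up, the real layout is whatever the
model dictates):

#include <mpi.h>

int main(int argc, char **argv)
{
    int world_rank, color;
    MPI_Comm component_comm;   /* extra communicator; MPI_COMM_WORLD stays */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Illustration only: say the first 64 ranks run the atmosphere
     * and the rest run the ocean.                                   */
    color = (world_rank < 64) ? 0 : 1;

    /* Every process keeps MPI_COMM_WORLD and, in addition, joins a
     * per-component sub-communicator.                               */
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &component_comm);

    /* ... intra-component traffic goes over component_comm,
       global operations still use MPI_COMM_WORLD ...                */

    MPI_Comm_free(&component_comm);
    MPI_Finalize();
    return 0;
}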

The upside of this technique, as Terry and Richard point out,
is portability.
These models have been run on IBM Blue Gene machines using IBM's MPI,
on Kraken and Jaguar (Cray XT5 or XT6?) using whatever MPI they
provide there, and I can even run them on our modest Beowulf clusters,
using OpenMPI, MVAPICH2, or even MPICH2.
All MPI calls are completely standard, hence the code is portable.
If the code made calls to the "orte" layer
(or to "P4" in the old days of MPICH), for instance, it wouldn't be.

If portability, especially portability across MPI variants, is important
to you, you may want to implement the functionality you need
this way.
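
Just to make that concrete, here is a bare-bones sketch of a per-node
communicator built from nothing but standard calls
(MPI_Get_processor_name, MPI_Allgather, MPI_Comm_split); it is
untested as written, and the function and variable names are only
placeholders:

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Build a communicator containing only the processes that report the
 * same processor name as the caller, i.e. (usually) the same node.
 * Only standard MPI calls are used, so it should behave the same
 * under OpenMPI, MVAPICH2, MPICH2, IBM MPI, Cray MPI, and so on.     */
static void build_node_comm(MPI_Comm comm, MPI_Comm *node_comm)
{
    char name[MPI_MAX_PROCESSOR_NAME];
    char *all_names;
    int  namelen, rank, size, color, i;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    memset(name, 0, sizeof(name));    /* zero-pad so strncmp below is safe */
    MPI_Get_processor_name(name, &namelen);

    /* Everybody learns everybody else's processor name. */
    all_names = malloc((size_t)size * MPI_MAX_PROCESSOR_NAME);
    MPI_Allgather(name,      MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
                  all_names, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, comm);

    /* Color = lowest rank reporting the same name, so all processes
     * on a node agree on it without any hashing (and therefore
     * without hash collisions).                                      */
    color = rank;
    for (i = 0; i < size; i++) {
        if (strncmp(name, all_names + (size_t)i * MPI_MAX_PROCESSOR_NAME,
                    MPI_MAX_PROCESSOR_NAME) == 0) {
            color = i;
            break;
        }
    }
    free(all_names);

    /* The key argument keeps the original rank order within each node. */
    MPI_Comm_split(comm, color, rank, node_comm);
}

Once such a node communicator exists, MPI_Comm_rank and MPI_Comm_size
on it already give you the local rank and the number of local workers,
which is essentially what the my_MPI_Local_* functions described below
ask for.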

And to the MPI insiders/developers, a plea from a mere user:
whatever you take to the Forum,
please keep this functionality (creating new communicators, splitting
old ones, getting the processor name, etc.) in the standard,
although the extensions suggested by Ralph Castain and Eugene Loh
would certainly be welcome.

Cheers,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

David Mathog wrote:
> The answer is yes - sort of...
>
> In OpenMPI, every process has information about not only its own local
> rank, but the local rank of all its peers regardless of what node they
> are on. We use that info internally for a variety of things.
> Now the "sort of". That info isn't exposed via an MPI API at this
> time. If that doesn't matter, then I can tell you how to get it - it's
> pretty trivial to do.

Please tell me how to do this using the internal information.
For now I will use that to write these functions (which might at some
point correspond to standard functions, or not):

my_MPI_Local_size(MPI_Comm comm, int *lmax, int *lactual)
my_MPI_Local_rank(MPI_Comm comm, int *lrank)

These will return N for lmax, a value M in 1->N for lactual, and a value
in 1->M for lrank, for any worker on a machine corresponding to a
hostfile line like:

node123.cluster slots=N

As usual, this could get complicated.  There are probably race
conditions on lactual vs. lrank as the workers start, but I'm guessing
the lrank to lmax relationship won't have that problem.  Similarly, the
meaning of "local" is pretty abstract. For now all that is intended is
"a group of equivalent cores within a single enclosure, where
communication between them is strictly internal to the enclosure, and
where all have equivalent access to the local disks and the network
interface(s)".  Other ways to define "local" might make more sense on
more complex hardware.
Another function that logically belongs with these is:

my_MPI_Local_list(MPI_Comm comm, int *llist, int *lactual)

I don't need it now, but can imagine applications that would.  This
would return the (current)  lactual value and the corresponding list of
rank numbers of all the local workers.  The array llist must be of size
lmax.
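
To pin the intended semantics down, something like the rough sketch
below is what I have in mind.  It is written against a node-local
communicator, however that may be obtained; the comm argument is
assumed to already be that local communicator, which is of course an
assumption about the plumbing, not a statement about the final
interface:

#include <mpi.h>

/* Rough sketch of the intended semantics only.  "comm" is assumed to
 * already be a node-local communicator; these are not existing MPI
 * or OpenMPI functions.                                              */

int my_MPI_Local_rank(MPI_Comm comm, int *lrank)
{
    int r;
    MPI_Comm_rank(comm, &r);
    *lrank = r + 1;                  /* 1..M, as described above */
    return MPI_SUCCESS;
}

int my_MPI_Local_size(MPI_Comm comm, int *lmax, int *lactual)
{
    MPI_Comm_size(comm, lactual);    /* M: workers actually running here */
    *lmax = *lactual;                /* N from "slots=N" is not visible
                                        through standard MPI; placeholder */
    return MPI_SUCCESS;
}

int my_MPI_Local_list(MPI_Comm comm, int *llist, int *lactual)
{
    int wrank;
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);  /* global rank of this worker */
    MPI_Comm_size(comm, lactual);
    /* Each local worker contributes its global rank; llist must be
     * able to hold lmax entries.                                     */
    MPI_Allgather(&wrank, 1, MPI_INT, llist, 1, MPI_INT, comm);
    return MPI_SUCCESS;
}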


Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
