All this seems like a terrific effort.
Is it really justified, especially if it can't cope with the diversity of real-world applications?

I suspect that people who are clever enough to write complex parallel codes involving collective primitives might be able to check their arguments themselves.
If possible, I suggest that "easy" arguments should be checked, and that multidimensional buffers should not. And of course the documentation should make this point clear enough.

If the F03 standard allows full argument checking, let it be provided for an F03 interface only.

Pierre V.


Michael Kluskens wrote:
On Nov 2, 2006, at 11:53 AM, Jeff Squyres wrote:

Adding Craig Rasmussen from LANL into the CC list...

On Oct 31, 2006, at 10:26 AM, Michael Kluskens wrote:

OpenMPI tickets 39 & 55 deal with problems with the Fortran 90
large interface with regard to:

#39: MPI_IN_PLACE in MPI_REDUCE <https://svn.open-mpi.org/trac/ompi/ticket/39>
#55: MPI_GATHER with arrays of different dimensions <https://svn.open-mpi.org/trac/ompi/ticket/55>

Attached is a patch to deal with these two issues as applied
against OpenMPI-1.3a1r12364.

Thanks for the patch!  Before committing this, though, I think more
needs to be done, and I want to understand it before doing so (part
of this is me thinking it out while I write this e-mail...).  Also,
be aware that SC is 1.5 weeks away, so I may not be able to address
this issue before then (SC tends to be all-consuming).

Understood, just didn't wish to see this die or get worse.

1. The "same type" heuristic for the "large" F90 module was not
intended to cover all possible scenarios.  You're absolutely right
that assuming the same dimension makes no sense for some of the
interfaces.  The problem is that the obvious alternative (all
possible scenarios) creates an exponential number of interfaces (in
the millions).

I think it can be limited by including reasonable scenarios.  As is,
it's not very useful, but it can at least be patched by the
end-builder.

So "large" was an attempt to provide *some* of the
interfaces -- but [your] experience has shown that this can do more
harm than good (i.e., make some legal MPI applications uncompilable
because we provide *some* interfaces to MPI_GATHER, but not all).

This is a serious issue in my opinion.  I suspect that virtually
every use of MPI_GATHER and the others would fail with the large
interfaces as is, thereby ensuring that no one would be able to use
the large interfaces on a multiuser system.
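
To make this concrete, here is the sort of perfectly legal call that a
same-dimension-only "large" module refuses to compile (a hedged sketch of
my own, with made-up names): each rank sends a scalar and the root
gathers into a 1-D array.

  ! Sketch only: a legal MPI_GATHER that mixes buffer dimensions.  Under
  ! "use mpi" with a large module that only provides matching send/recv
  ! dimensions, this call matches no specific interface and fails to
  ! compile, even though the F77 binding accepts it.
  subroutine gather_example(comm)
    use mpi
    integer, intent(in) :: comm
    integer :: myval             ! scalar (0-D) send buffer
    integer :: allvals(64)       ! 1-D receive buffer (assumed >= nprocs)
    integer :: rank, ierr
    call MPI_COMM_RANK(comm, rank, ierr)
    myval = rank
    call MPI_GATHER(myval, 1, MPI_INTEGER, allvals, 1, MPI_INTEGER, &
                    0, comm, ierr)
  end subroutine gather_example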

1a. It gets worse because of MPI's semantics for MPI_GATHER.  You
pointed out one scenario -- it doesn't make sense to supply "integer"
for both the sendbuf and recvbuf because the root will need an
integer array to receive all the values (similar logic applies to
MPI_SCATTER and other collectives -- so what you did for MPI_GATHER
would need to be applied to several others as well).

Agreed.  I limited my patch to what I could test with working code
and could justify time-wise.

1b. But even worse than that is the fact that, for MPI_GATHER, the
receive buffer is not relevant on non-root processes.  So it's valid
for *any* type to be passed for non-root processes (leading to the
exponential interface explosion described above).

I would consider this to be very bad programming practice and not a
good idea to support in the large interface regardless of the cost.
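
For what it's worth, here is the legal-but-ugly pattern in question (my
own hedged sketch, hypothetical names): MPI says recvbuf, recvcount and
recvtype are significant only at the root, so a non-root rank may pass a
dummy of a completely different type, and supporting that in F90 is
exactly what blows up the number of interfaces.

  ! Sketch: legal per the MPI standard, since the recv arguments are
  ! ignored on non-root ranks, but it forces the F90 module to accept an
  ! INTEGER send buffer paired with a REAL recv buffer.
  subroutine gather_nonroot_example(comm)
    use mpi
    integer, intent(in) :: comm
    integer :: sendval, rootvals(64), rank, ierr
    real    :: unused(1)         ! dummy recv buffer; ignored off-root
    call MPI_COMM_RANK(comm, rank, ierr)
    sendval = rank
    if (rank == 0) then
       call MPI_GATHER(sendval, 1, MPI_INTEGER, rootvals, 1, MPI_INTEGER, &
                       0, comm, ierr)
    else
       call MPI_GATHER(sendval, 1, MPI_INTEGER, unused, 1, MPI_INTEGER, &
                       0, comm, ierr)
    end if
  end subroutine gather_nonroot_example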

One issue is that derived datatypes will never (?) work with the
large interfaces; for that matter, I would guess that derived
datatypes probably don't work with the medium and possibly the small
interfaces.  I don't know if there is a way around that issue at all
in F90/F95; some places may have to do two installations.  I don't
think giving up on all interfaces that conflict with derived types is
a good solution.
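
To illustrate the derived-type problem (again a hedged sketch with
made-up names, not something from the patch): the generated interfaces
enumerate only intrinsic types, so a buffer of user-defined type cannot
match any of them, even though the underlying F77 binding would take it.

  ! Sketch: sending an array of a user-defined type.  particle_mpi_type
  ! is assumed to have been built elsewhere with MPI_TYPE_STRUCT (or
  ! MPI_TYPE_CREATE_STRUCT) and committed.  Under "use mpi" with the
  ! medium/large module this typically fails to compile, because no
  ! interface lists a type(particle) choice buffer.
  subroutine send_particles(dest, particle_mpi_type, comm)
    use mpi
    integer, intent(in) :: dest, particle_mpi_type, comm
    type particle
       real    :: x, y, z
       integer :: id
    end type particle
    type(particle) :: parts(100)
    integer :: ierr
    call MPI_SEND(parts, 100, particle_mpi_type, dest, 0, comm, ierr)
  end subroutine send_particles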

So having *some* interfaces for MPI_GATHER can be a problem for both
1a and 1b -- perfectly valid/legal MPI apps will fail to compile.

I'm not sure what the right balance is here -- how do we allow for
both 1a and 1b without creating millions of interfaces?  Your patch
created MPI_GATHER interfaces for all the same types, but allowing
any dimension mix.  With the default max dimension level of 4 in
OMPI's interfaces, this created 90 new interfaces for MPI_GATHER,
calculated (and verified with some grep/wc'ing):

For src buffer of dimension:    0   1   2   3   4
Create this many recvbuf types: 4 + 4 + 3 + 2 + 1 = 14

An alternative would be to allow the same and one less dimension for
large (called dim+1 below), and make all dimensions optional in some
way.  I know that having these extra interfaces allowed me to find
serious oversights on my part by letting me compile with the large
interfaces.

For each src/recvbuf combination, create this many interfaces:

(char + logical + (integer * 4) + (real * 2) + (complex * 2)) = 10

Where 4, 2, and 2 are the number of integer, real, and complex types
supported by the compiler on my machines (e.g., gfortran on OSX/intel
and Linux/EM64T).

So this created 14 * 10 = 140 interfaces, as opposed to the 50 that
were there before the patch (5 dimensions of src/recvbuf * 10 types =
50), resulting in 90 new interfaces.

This effort will need to be duplicated by several other collectives:

- allgather, allgatherv
- alltoall, alltoallv, alltoallw
- gather, gatherv
- scatter, scatterv

So an increase of 9 * 90 = 810 new interfaces.  Not too bad,
considering the alternative (exponential).  But consider that the
"large" interface only has (by my count via egrep/wc) 4013
interfaces.  This would be increasing its size by about 20%.  This is
certainly not a show-stopper, but something to consider.

Without some increase (all or dim+1), I suspect the large interfaces
will be useless for anyone (or any site) calling one of these 10
routines anywhere in their program.

Note that if you go higher than OMPI's default 4 dimensions, the
number of new interfaces gets considerably larger (e.g., for 7
dimensions you get 35 send/recv type combinations instead of 14, so
(35 * 10 * 9) = 3150 total interfaces just for the collectives, if
I did my math right).
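
If it helps to sanity-check the 14 and 35 above: with the rule the patch
uses (recv rank anywhere from max(1, send rank) up to the maximum
dimension d), the pair count is d + d*(d+1)/2.  A throwaway function (my
own sketch, not part of the patch) makes the arithmetic explicit:

  ! Send rank 0 contributes d pairs; send rank k (k = 1..d) contributes
  ! d-k+1 pairs, for a total of d + d*(d+1)/2.
  integer function ncombos(d)
    integer, intent(in) :: d
    ncombos = d + d*(d+1)/2      ! d = 4 gives 14, d = 7 gives 35
  end function ncombos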

2. You also identified another scenario that needs to be fixed --
support for MPI_IN_PLACE in certain collectives (MPI_REDUCE is not
the only collective that supports it).  It doesn't seem to be a Good
Idea to allow the INTEGER type to be mixed with any other type for
send/recvbuf combinations, just to allow MPI_IN_PLACE.  This
potentially adds in send/recvbuf signatures that we want to disallow
(even though they are potentially valid MPI applications!) -- e.g.,
INTEGER and FLOAT.  What if a user accidentally supplied an INTEGER
for the sendbuf that wasn't MPI_IN_PLACE?  That's what the type
system is supposed to be preventing.

I don't know enough about the type system of F90, but it strikes me
that we should be able to create a unique type for MPI_IN_PLACE
(don't know why I didn't think of this before for some of the MPI
sentinel values... :-\ ) and therefore have a safe mechanism for this
sentinel value.

This would be a very good approach, allowing the large interfaces to
be used with MPI_IN_PLACE while preventing this alternative error.
That's a bit more complicated than I'm ready to patch myself.
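
To make the unique-type idea concrete, here is a minimal sketch of the
shape it could take (my own guess, with made-up names; the real thing
would live in the generated mpi module and forward to the F77 layer):

  ! Sketch only, not OMPI's implementation: give MPI_IN_PLACE its own
  ! derived type, so "in place" overloads can be declared explicitly
  ! instead of opening sendbuf up to every intrinsic type.
  module mpi_in_place_sketch
    implicit none
    type mpi_in_place_kind
       integer :: dummy
    end type mpi_in_place_kind
    type(mpi_in_place_kind), parameter :: MPI_IN_PLACE = mpi_in_place_kind(0)

    interface MPI_Reduce
       module procedure mpi_reduce_in_place_int1
       ! ... one overload per recvbuf type/rank, plus the usual two-buffer ones
    end interface
  contains
    subroutine mpi_reduce_in_place_int1(sendbuf, recvbuf, count, datatype, &
                                        op, root, comm, ierr)
      type(mpi_in_place_kind), intent(in) :: sendbuf
      integer, intent(inout) :: recvbuf(:)
      integer, intent(in)    :: count, datatype, op, root, comm
      integer, intent(out)   :: ierr
      ! A real implementation would call the F77 binding here, substituting
      ! the library's actual MPI_IN_PLACE sentinel for the send buffer.
      ierr = 0
    end subroutine mpi_reduce_in_place_int1
  end module mpi_in_place_sketch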

This would add 10 interfaces for every function that supports
MPI_IN_PLACE; a pretty small increase.

This same technique should probably be applied to some of the other
sentinel values, such as MPI_ARGVS_NULL and MPI_STATUSES_IGNORE.

I agree on that as well, but don't have experience using these to
understand all their issues.

---------------

All that being said, what does it mean?

I think #2 is easily enough fixed (it just requires the time to do
so), and has minimal impact on the number of interfaces.
Implementing MPI sentinel values with unique types also makes user
apps that much safer (i.e., they won't accidentally pass in an
incorrect type that would be mistaken -- by the interface -- for a
valid signature).

Or pass the sentinel values into places they should not be passed.

#1 is still a problem.  No matter how we slice it, we're going to
leave out valid combinations of send/recv buffers that will prevent
potentially legal MPI applications from compiling.  This is as
opposed to not having F90 interfaces for the 2-choice-buffer
functions at all, which would mean that F90 apps using MPI_GATHER
(for example) would simply fall back to the F77 interfaces where no
type checking is done.  End result: all MPI F90 apps can compile.
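
The flip side is easy to show (hedged sketch, made-up names): the classic
mistake below compiles cleanly against the implicit F77 bindings and
fails, if at all, only at run time, whereas an explicit F90 interface
rejects it at compile time.

  ! Sketch: the trailing ierror argument is missing.  With include
  ! 'mpif.h' the compiler has no argument-list information and accepts
  ! the call; with "use mpi" and an explicit interface for MPI_GATHER it
  ! is flagged.
  subroutine broken_gather(comm)
    include 'mpif.h'
    integer, intent(in) :: comm
    integer :: sendval, recvvals(4)
    sendval = 42
    call MPI_GATHER(sendval, 1, MPI_INTEGER, recvvals, 1, MPI_INTEGER, &
                    0, comm)           ! ierror forgotten
  end subroutine broken_gather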

Simply put, with the trivial, small, and medium module sizes, all
valid MPI F90 applications can compile and run.

Well, maybe not, as I pointed out above with derived types; again,
that's not a reason to ditch the interfaces completely, since they do
more good than harm.

  With the large size,
unless we do the exponential interface explosion, we will be
potentially excluding some legal MPI F90 applications -- they *will
not be able to compile* (without workarounds).  This is what I meant
by ticket 55's title "F90 "large" interface may not entirely make
sense".

So there are multiple options here:

1. Keep chasing a "good" definition of "large" such that most/all
current MPI F90 apps can compile.  The problem is that this target
can change over time, and keep requiring maintenance.

2. Stop pursuing "large" because of the problems mentioned above.
This has the potential problem of not providing type safety to F90
MPI apps for the MPI collective interfaces, but at least all apps can
compile, and there's only a small number of 2-choice-buffer functions
that do not get the type safety from F90 (i.e., several MPI
collective functions).

3. Start implementing the proposed F03 MPI interfaces that don't have
the same problems as the F90 MPI interfaces.

I have to admit that I'm leaning more towards #2 (and I wish that
someone who has the time would do #3!) and discarding #1...

I dislike #2 intensely because then I and others couldn't at least  
patch the interface scripts before building OpenMPI.

#1 is preferred; just give the users/builders clear notice that they
may not cover everything, and perhaps a hint as to which directory
has the files to be patched to extend the large interface a bit
further.

#3 would be nice but I don't see enough F03 support in enough  
compilers at this time.  I don't even have a book on the F03 changes  
and I program Fortran most of the day virtually every weekday.  It  
took our group till about 2000 to start using Fortran 90 and almost  
everything we do is in Fortran.

Michael



-- 
Support the SAUVONS LA RECHERCHE movement:
http://recherche-en-danger.apinc.org/

       _/_/_/_/    _/       _/       Dr. Pierre VALIRON
      _/     _/   _/      _/   Laboratoire d'Astrophysique
     _/     _/   _/     _/    Observatoire de Grenoble / UJF
    _/_/_/_/    _/    _/    BP 53  F-38041 Grenoble Cedex 9 (France)
   _/          _/   _/    http://www-laog.obs.ujf-grenoble.fr/~valiron/
  _/          _/  _/     Mail: pierre.vali...@obs.ujf-grenoble.fr
 _/          _/ _/      Phone: +33 4 7651 4787  Fax: +33 4 7644 8821
_/          _/_/        
