On Fri, Sep 25, 2009 at 10:12:33PM -0400, Greg Fischer wrote:
> 
>    It looks like the buffering operations consume about 15% as much time
>    as the allreduce operations.  Not huge, but not trivial, all the same.
>    Is there any way to avoid the buffering step?

That depends on how the arrays array and phim were allocated.
If they are contiguous in memory, at least with respect to the
first two dimensions, i.e., if they are allocated like

allocate(array(1:im, 1:jm, 1:something, ..., ..., ...))

and similarly for phim (i.e., the first dimension is exactly 1:im,
the second 1:jm, and the third starts at 1) then you should be able
to do

call MPI_Allreduce(array(1,1,1,nl,0,ng), phim(1,1,1,nl,0,ng),   &
                   im*jm*kmloc(coords(2)+1), MPI_REAL, MPI_SUM, &
                   ang_com, ierr)
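
Just to illustrate the idea, here is a minimal, self-contained sketch
of that pattern.  The sizes, the nl and ng indices, and the bounds of
the last three dimensions are placeholders (I don't know your actual
values), and MPI_COMM_WORLD and im*jm*km stand in for ang_com and
im*jm*kmloc(coords(2)+1):

program allreduce_nocopy
  implicit none
  include 'mpif.h'
  ! Placeholder sizes and indices; substitute your own.
  integer, parameter :: im = 8, jm = 8, km = 8
  integer, parameter :: nl = 1, ng = 1
  real, allocatable  :: array(:,:,:,:,:,:), phim(:,:,:,:,:,:)
  integer :: ierr

  call MPI_Init(ierr)

  ! First dimension exactly 1:im, second exactly 1:jm, third starting
  ! at 1: the im*jm*km elements of one (nl,0,ng) slab are then
  ! contiguous in memory.
  allocate(array(1:im, 1:jm, 1:km, 1:2, 0:1, 1:2))
  allocate(phim(1:im, 1:jm, 1:km, 1:2, 0:1, 1:2))
  array = 1.0

  ! Reduce one slab directly: no copy buffers, no unpacking loop.
  call MPI_Allreduce(array(1,1,1,nl,0,ng), phim(1,1,1,nl,0,ng), &
                     im*jm*km, MPI_REAL, MPI_SUM,               &
                     MPI_COMM_WORLD, ierr)

  call MPI_Finalize(ierr)
end program allreduce_nocopy

Run with, e.g., mpirun -np 4: every element of the (nl,0,ng) slab of
phim comes back as 4.0 on every rank, and everything outside that
slab is left untouched, which is also what your buffered version does.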

Cheers,
Martin

-- 
Martin Siegert
Head, Research Computing
WestGrid Site Lead
IT Services                                phone: 778 782-4691
Simon Fraser University                    fax:   778 782-4242
Burnaby, British Columbia                  email: sieg...@sfu.ca
Canada  V5A 1S6

>    On Thu, Sep 24, 2009 at 6:03 PM, Eugene Loh <eugene....@sun.com>
>    wrote:
> 
>    Greg Fischer wrote:
> 
>      (I apologize in advance for the simplistic/newbie question.)
>      I'm performing an ALLREDUCE operation on a multi-dimensional array.
>      This operation is the biggest bottleneck in the code, and I'm
>      wondering if there's a way to do it more efficiently than what I'm
>      doing now.  Here's a representative example of what's happening:
>         ir=1
>         do ikl=1,km
>           do ij=1,jm
>             do ii=1,im
>               albuf(ir)=array(ii,ij,ikl,nl,0,ng)
>               ir=ir+1
>             enddo
>           enddo
>         enddo
>         agbuf=0.0
>         call mpi_allreduce(albuf, agbuf, im*jm*kmloc(coords(2)+1), &
>                            mpi_real, mpi_sum, ang_com, ierr)
>         ir=1
>         do ikl=1,km
>           do ij=1,jm
>             do ii=1,im
>               phim(ii,ij,ikl,nl,0,ng)=agbuf(ir)
>               ir=ir+1
>             enddo
>           enddo
>         enddo
>      Is there any way to just do this in one fell swoop, rather than
>      buffering, transmitting, and unbuffering?  This operation is looped
>      over many times.  Are there savings to be had here?
> 
>    There are three steps here:  buffering, transmitting, and unbuffering.
>    Any idea how the run time is distributed among those three steps?
>    E.g., if most of the time is spent in the MPI call, then combining
>    all three steps into one is unlikely to buy you much... and might
>    even hurt; in that case, there may be some tuning of collective
>    algorithms to do instead.  I don't have any experience doing this
>    with OMPI.  I'm just saying it makes some sense to isolate the
>    problem a little bit more.
