v1.1 does not have the tuned collectives (I think, but now I'm not 100%
sure anymore), or at least they were not active by default. The first
version with the tuned collectives will be 1.2. The current decision
function (from the nightly builds) targets high performance networks
with 2 characte
George
I found the info I think you were referring to. Thanks. I then experimented
essentially randomly with different algorithms for allreduce. But the issue
with really bad performance for certain message sizes persisted with v1.1.
The good news is that the upgrade to 1.2 fixed my worst problem
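For reference, below is a minimal sketch of the kind of micro-benchmark that
exposes this sort of size-dependent behaviour: it times MPI_Allreduce over a
doubling range of message sizes. It is an illustration only, not Tony's actual
test code.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int reps = 50;
        /* sweep message sizes from 1 K to 1 M doubles */
        for (int n = 1024; n <= (1 << 20); n *= 2) {
            double *in  = malloc(n * sizeof(double));
            double *out = malloc(n * sizeof(double));
            for (int i = 0; i < n; i++)
                in[i] = (double)i;

            MPI_Barrier(MPI_COMM_WORLD);
            double t0 = MPI_Wtime();
            for (int r = 0; r < reps; r++)
                MPI_Allreduce(in, out, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
            double t = (MPI_Wtime() - t0) / reps;

            if (rank == 0)
                printf("%8d doubles   %10.6f s per MPI_Allreduce\n", n, t);
            free(in);
            free(out);
        }
        MPI_Finalize();
        return 0;
    }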
Tony,
What do you mean by TCP? Are you using an Ethernet interconnect?
I have noticed a similar slowdown using LAM/MPI and the MPI_Alltoall
primitive on our Solaris 10 cluster using gigabit ethernet and TCP. For
a large number of nodes I could even come to a complete hangup. Part of
the problem
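For context, a bare-bones MPI_Alltoall example follows. Because every rank
sends a block to every other rank, the volume each node injects grows
linearly with the node count, which is one reason this primitive stresses a
TCP/gigabit-ethernet fabric at larger scales. Illustrative only, not the
poster's actual code.

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        const int block = 4096;   /* doubles sent to each peer */
        double *sendbuf = malloc((size_t)nprocs * block * sizeof(double));
        double *recvbuf = malloc((size_t)nprocs * block * sizeof(double));
        for (int i = 0; i < nprocs * block; i++)
            sendbuf[i] = (double)rank;

        /* every rank sends `block` doubles to every other rank, so the
           traffic injected per rank grows linearly with nprocs */
        MPI_Alltoall(sendbuf, block, MPI_DOUBLE,
                     recvbuf, block, MPI_DOUBLE, MPI_COMM_WORLD);

        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }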
We have nightly 1.2 tarballs too (just not listed on the web page).
To clarify our development process:
- we develop on the trunk. We typically keep that version number
higher than that of any existing or upcoming release (e.g., right now, it's
"1.3" because the trunk will someday be branched for
On Nov 1, 2006, at 10:27 AM, George Bosilca wrote:
PS: BTW, which version of Open MPI are you using? The one that delivers
the best performance for the collective communications (at least on
high performance networks) is the nightly release of the 1.2 branch.
As far as I can see the only nightly
On Oct 28, 2006, at 6:51 PM, Tony Ladd wrote:
George
Thanks for the references. However, I was not able to figure out
whether what I am asking is so trivial that it is simply passed over
or so subtle that it has been overlooked (I suspect the former).
No. The answer to your question was in the ar
George
Thanks for the references. However, I was not able to figure out whether what
I am asking is so trivial that it is simply passed over or so subtle that it
has been overlooked (I suspect the former). The binary tree algorithm in
MPI_Allreduce takes a time proportional to 2*N*log_2(M) where N is the vec
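A back-of-envelope comparison of the two cost models discussed in this thread,
writing N for the message size in bytes and P for the number of processes,
bandwidth term only (latency ignored). The numbers below are illustrative, not
measurements from any of the clusters mentioned.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double B = 125e6;   /* ~1 Gbit/s expressed in bytes per second */
        const double N = 1e6;     /* message size in bytes (1 MB vector) */

        for (int P = 2; P <= 64; P *= 2) {
            /* tree-based reduce + broadcast: every byte crosses the wire
               about 2*log2(P) times per process */
            double t_tree = 2.0 * N * log2((double)P) / B;
            /* reduce-scatter + allgather: about 2*(P-1)/P vector's worth
               of traffic per process, nearly independent of P */
            double t_rsag = 2.0 * N * (P - 1) / P / B;
            printf("P=%2d   tree %.3f s   reduce-scatter/allgather %.3f s\n",
                   P, t_tree, t_rsag);
        }
        return 0;
    }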
No documentation yet. If you want to understand how it works and what
exactly "highly optimized" means please look for the collectives
papers on this page (http://www.netlib.org/utk/people/JackDongarra/
papers.htm). In a few words, we have multiple algorithms and we tune
them based on the net
George
Thanks for the info. When you say "highly optimized" do you mean algorithmically,
by tuning, or both? In particular I wonder whether OMPI's optimized collectives use
the divide and conquer strategy to maximize network bandwidth.
Sorry to be dense but I could not find documentation on how to access the o
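Below is a sketch of the "divide and conquer" allreduce Tony is asking about:
a reduce-scatter by recursive halving followed by an allgather by recursive
doubling. It assumes the communicator size is a power of two and that the
count divides evenly; it illustrates the idea and is not Open MPI's
implementation.

    #include <mpi.h>
    #include <stdlib.h>

    /* In-place sum allreduce on `buf` (count doubles).  Assumes the
       communicator size P is a power of two and count % P == 0. */
    void allreduce_sum_rsag(double *buf, int count, MPI_Comm comm)
    {
        int rank, P;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &P);
        double *tmp = malloc((size_t)count / 2 * sizeof(double));

        /* Phase 1: reduce-scatter by recursive halving.  After the loop
           each rank owns one fully reduced block of count/P elements. */
        int chunk = count, offset = 0;
        for (int mask = P >> 1; mask > 0; mask >>= 1) {
            int partner = rank ^ mask;
            chunk /= 2;
            int send_off, recv_off;
            if ((rank & mask) == 0) {       /* keep lower half of the block */
                send_off = offset + chunk;
                recv_off = offset;
            } else {                        /* keep upper half of the block */
                send_off = offset;
                recv_off = offset + chunk;
            }
            MPI_Sendrecv(buf + send_off, chunk, MPI_DOUBLE, partner, 0,
                         tmp,            chunk, MPI_DOUBLE, partner, 0,
                         comm, MPI_STATUS_IGNORE);
            for (int i = 0; i < chunk; i++)
                buf[recv_off + i] += tmp[i];
            offset = recv_off;
        }

        /* Phase 2: allgather by recursive doubling, reassembling the
           full reduced vector on every rank. */
        for (int mask = 1; mask < P; mask <<= 1) {
            int partner = rank ^ mask;
            int partner_off = (rank & mask) ? offset - chunk : offset + chunk;
            MPI_Sendrecv(buf + offset,      chunk, MPI_DOUBLE, partner, 0,
                         buf + partner_off, chunk, MPI_DOUBLE, partner, 0,
                         comm, MPI_STATUS_IGNORE);
            if (partner_off < offset)
                offset = partner_off;
            chunk *= 2;
        }
        free(tmp);
    }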
There are 2 different collectives in Open MPI. One is a basic
implementation and one is highly optimized. The only problem is that
we optimized them based on the network, number of nodes and message
size. As you can imagine ... not all the networks are the same ...
which leads to troubles on
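Purely as an illustration, a decision function of the kind George describes
might pick an algorithm from the message size and communicator size, along
these lines. The thresholds and algorithm names below are made up and are not
Open MPI's actual rules.

    #include <stddef.h>

    /* Hypothetical algorithm choices; names and thresholds are invented
       for illustration, not taken from Open MPI's tuned component. */
    enum allreduce_alg {
        ALG_RECURSIVE_DOUBLING,
        ALG_REDUCE_SCATTER_ALLGATHER,
        ALG_SEGMENTED_RING
    };

    static enum allreduce_alg pick_allreduce(size_t msg_bytes, int comm_size)
    {
        if (msg_bytes < 8192 || comm_size <= 4)
            return ALG_RECURSIVE_DOUBLING;          /* latency-bound regime */
        if (msg_bytes < (size_t)1 << 20)
            return ALG_REDUCE_SCATTER_ALLGATHER;    /* bandwidth-bound, mid-size */
        return ALG_SEGMENTED_RING;                  /* very large messages */
    }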
1) I think OpenMPI does not use optimal algorithms for collectives. But
neither does LAM. For example, the MPI_Allreduce time scales as log_2 N where N is
the number of processors. MPICH uses optimized collectives and the
MPI_Allreduce time is essentially independent of N. Unfortunately MPICH has never
had a