If I remember correctly, both Intel MPI and MVAPICH2 bind processes by
default. OMPI does not. There are many cases where the "bind by
default" behavior gives better default performance. (There are also
cases where it can give catastrophically worse performance.) Anyhow, it
seems possible to me that this accounts for the difference you're seeing.
To play with binding in OMPI, you can try adding "--bind-to-socket
--bysocket" to your mpirun command line, though what to try can depend
on what version of OMPI you're using as well as details of your
processor (HyperThreads?), your application, etc. There's a FAQ entry
at http://www.open-mpi.org/faq/?category=tuning#using-paffinity
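
To see whether binding is actually in effect, each rank can print its own CPU affinity. The following is only a rough sketch, assuming Linux and Python 3. The script name check_binding.py and the helper describe_binding are made up for illustration, and it relies on mpirun exporting OMPI_COMM_WORLD_RANK to each process. Comparing against os.cpu_count() is a heuristic: inside a cgroup or container a process can see fewer CPUs without being bound by MPI.

```python
import os


def describe_binding(allowed_cpus, total_cpus):
    """Summarize how many CPUs a process may run on (hypothetical helper)."""
    allowed = sorted(allowed_cpus)
    if total_cpus is None:
        state = "binding unknown"
    elif len(allowed) >= total_cpus:
        # The process may run anywhere, so it is probably not bound.
        state = "not bound"
    else:
        state = "bound"
    return f"{state}: {len(allowed)} of {total_cpus} CPUs allowed {allowed}"


def main():
    # Rank as exported by Open MPI's mpirun; "?" when run outside mpirun.
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "?")
    if not hasattr(os, "sched_getaffinity"):
        print(f"rank {rank}: os.sched_getaffinity is not available here")
        return
    allowed = os.sched_getaffinity(0)  # 0 means the calling process
    print(f"rank {rank}: {describe_binding(allowed, os.cpu_count())}")


if __name__ == "__main__":
    main()
```

Running it once as "mpirun -np 4 python3 check_binding.py" and once with "--bind-to-socket --bysocket" added should show whether the allowed CPU set shrinks, keeping in mind that the option spelling depends on the OMPI version.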

On 12/27/2011 6:45 AM, Ralph Castain wrote:
It depends a lot on the application and how you ran it. Can you
provide some info? For example, if you oversubscribed the node, then
we dial down the performance to provide better cpu sharing. Another
point: we don't bind processes by default while other MPIs do. Etc.
So more info (like the mpirun command line you used, which version you
used, how OMPI was configured, etc.) would help.

On Dec 27, 2011, at 6:35 AM, Eric Feng wrote:
Can anyone help me?
I see a similar performance issue when comparing against MVAPICH2, which
is much faster in every MPI function in the real application but about
the same in the IMB benchmark.
------------------------------------------------------------------------
*From:* Eric Feng <hpc_benchm...@yahoo.com>
*To:* "us...@open-mpi.org" <us...@open-mpi.org>
*Sent:* Friday, December 23, 2011 9:12 PM
*Subject:* [OMPI users] Openmpi performance issue
Hello,
I am running into a performance issue with Open MPI, and I hope the
experts here can give me some help.
I have an application that makes a lot of sendrecv and isend/irecv
calls, and therefore waitall calls. When I run it with Intel MPI, it is
around 30% faster than with Open MPI. However, if I test sendrecv using
IMB, Open MPI is even faster than Intel MPI. With the real application,
the profiling results show that Open MPI is much slower than Intel MPI
in all MPI functions. So this is not a problem with one particular
function; there is an overall drawback somewhere. Can anyone suggest
what to tune to make the real application run faster?