Open MPI version: 1.4.3
Platform: IBM P5, 32 processors, 256 GB memory, Symmetric Multi-Threading (SMT) enabled
Application: starts 48 processes and communicates via one-sided MPI (MPI_Get,
MPI_Put) synchronized with MPI_Barrier; many transfers, large amounts of data
Issue: When built against Open MPI rather than IBM's MPI ('poe' from the HPC
Toolkit), the application runs 3-5 times slower.
I suspect IBM's MPI implementation exploits some knowledge it has about these
data transfers that Open MPI is not using.
Any suggestions?
Thanks,
Brian Price