On Oct 15, 2009, at 2:14 AM, Sangamesh B wrote:
> I've run ibpingpong tests. They are working fine.
> Sorry for the delay in replying.
Good.
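(As a reference point, one common low-level check is the ibv_rc_pingpong example that ships with libibverbs; whether that is the exact test you ran is an assumption on my part.)

    # On one node, start the server side:
    ibv_rc_pingpong
    # On a second node, point the client at the server:
    ibv_rc_pingpong <server-hostname>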
> Are there any additional tests available which will make sure that
> "there is no problem with IB software and Open MPI. The problem is
> with Application or IB hardware"?
George mentioned that using "--mca btl openib,self" only allows OMPI
to use those two transports. So you should be good there: with those
command line options, it'll either run on IB or it will fail to run
if the IB is not working.
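For example (the application name and process count below are
placeholders, not something from your setup):

    mpirun --mca btl openib,self -np 4 ./your_mpi_app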
Unfortunately, OMPI currently only gives you a negative
acknowledgement when you're *not* using high-performance networks --
it doesn't give you a positive acknowledgement when it *is* using a
high-performance network (because that is by far the more common
case).
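That said, if you want some visibility into which BTLs get selected,
one option is to raise the BTL framework's verbosity. This is a rough
sketch (the exact output varies across OMPI versions, and the
verbosity level here is arbitrary):

    mpirun --mca btl openib,self --mca btl_base_verbose 50 -np 4 ./your_mpi_app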
> Because we've faced some critical problems:
> http://www.open-mpi.org/community/lists/users/2009/10/10843.php
This one *appears* to be an application issue. But there was no
information provided beyond the initial posting, so it's impossible to
say.
> http://www.open-mpi.org/community/lists/users/2009/09/10700.php
Pasha had a good reply to this post:
http://www.open-mpi.org/community/lists/users/2009/09/10705.php
If he's right (and he usually is :-) ), then one of your IB ports
went from ACTIVE to DOWN during the run, potentially indicating bad
hardware (i.e., Open MPI simply reported the error -- it's
possible/likely that Open MPI didn't *cause* the error). Pasha
suggested using ibdiagnet to verify your fabric. Failing that, you
might want to contact your IB/cluster vendor for assistance with a
layer-0 diagnostic of your IB fabric.
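Concretely, a first pass might look something like the following
(exact tool availability depends on your OFED install; this is a
sketch rather than a prescribed procedure):

    # On each node, verify that every HCA port reports "State: Active":
    ibstat
    # Sweep the fabric for errors; run from a node that can reach the subnet manager:
    ibdiagnet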
Hope that helps!
--
Jeff Squyres
jsquy...@cisco.com