On Oct 15, 2009, at 2:14 AM, Sangamesh B wrote:

     I've run ibpingpong tests. They are working fine.

Sorry for the delay in replying.

Good.

     Are there any additional tests available which will make sure that "there is no problem with IB software and Open MPI. The problem is with Application or IB hardware"?

George mentioned that using "--mca btl openib,self" will only allow OMPI to use those two networks. So you should be good there -- with those command-line options, it will either run over IB or it will fail to run if the IB is not working.

Unfortunately, OMPI currently only has a negative acknowledgement when you're *not* using high-performance networks -- it doesn't give you a positive acknowledgement when it *is* using a high-performance network (because this is the much more common case).
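For example, here's a minimal sketch of such a run (the hostfile name, process count, and application are placeholders; adjust them to your cluster):

    # Restrict Open MPI to the openib BTL (plus "self" for loopback).
    # If the IB fabric is unusable, the job will error out instead of
    # silently falling back to another interconnect such as TCP.
    mpirun --mca btl openib,self -np 4 --hostfile myhosts ./my_mpi_app

If that runs to completion across nodes, the inter-node traffic went over IB; if the openib BTL can't be brought up, mpirun will abort with an error rather than quietly using a different network.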

    Because we've faced some critical problems:

     http://www.open-mpi.org/community/lists/users/2009/10/10843.php

This one *appears* to be an application issue. But there was no information provided beyond the initial posting, so it's impossible to say.

     http://www.open-mpi.org/community/lists/users/2009/09/10700.php

Pasha had a good reply to this post:

    http://www.open-mpi.org/community/lists/users/2009/09/10705.php

If he's right (and he usually is :-) ), then one of your IB ports went from ACTIVE to DOWN during the run, potentially indicating bad hardware (i.e., Open MPI simply reported the error -- it's possible/likely that Open MPI didn't *cause* the error). Pasha suggested using ibdiagnet to verify your fabric. Failing that, you might want to contact your IB/cluster vendor for assistance with a layer-0 diagnostic of your IB fabric.
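For reference, a rough sketch of that kind of check, assuming the standard OFED / infiniband-diags utilities are installed (run the first two on each node):

    # Verify that the HCA port state is ACTIVE (not DOWN) on every node
    ibv_devinfo | grep -i state
    ibstat

    # Run a fabric-wide diagnostic from one node and look for reported
    # link or port errors in its output
    ibdiagnet

If a port shows DOWN, or ibdiagnet reports link errors, that points at cabling, switch, or HCA hardware rather than at Open MPI.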

Hope that helps!

--
Jeff Squyres
jsquy...@cisco.com
