Jan,
I guess that you have the OFED drivers installed on your machines. You can do basic network verification with the ibdiagnet utility (http://linux.die.net/man/1/ibdiagnet), which is part of the OFED installation.
Regards,
Pasha


Jeff Squyres wrote:
On May 4, 2009, at 9:50 AM, jan wrote:

Thank you Jeff. I have passed the mail to the IB vendor, Dell (the
blade was ordered from Dell Taiwan), but he told me that he didn't
understand "layer 0 diagnostics". Could you help us get more
information about "layer 0 diagnostics"? Thanks again.


Layer 0 = your physical network layer. Specifically: ensure that your IB network is actually functioning properly at both the physical and driver layers. Cisco was an IB vendor for several years; I can tell you from experience that it is *not* enough to just plug everything in and run a few trivial tests to ensure that network traffic seems to be passed properly. You need to have your vendor run a full set of layer 0 diagnostics to ensure that all the cables are good, all the HCAs are good, all the drivers are functioning properly, etc. This involves running diagnostic network testing patterns, checking various error counters on the HCAs and IB switches, etc.

This is something that Dell should know how to do.
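
As a rough illustration of the "checking various error counters on the HCAs" step, here is a minimal Python sketch that reads the per-port counters the Linux IB drivers expose under /sys/class/infiniband. The function name read_ib_error_counters and the choice of counters in ERROR_COUNTERS are assumptions made for this example, not part of OFED or of any vendor procedure, and it is not a substitute for a full diagnostic run.

```python
from pathlib import Path

# Counters that often point at physical-layer trouble when nonzero.
# This selection is an assumption for illustration, not a vendor-approved list.
ERROR_COUNTERS = (
    "symbol_error",
    "link_error_recovery",
    "link_downed",
    "port_rcv_errors",
    "local_link_integrity_errors",
)


def read_ib_error_counters(root="/sys/class/infiniband"):
    """Return {(device, port, counter): value} for every nonzero error counter."""
    nonzero = {}
    base = Path(root)
    if not base.is_dir():
        # No IB devices (or no driver loaded) on this machine.
        return nonzero
    # Layout: <root>/<device>/ports/<port>/counters/<counter>
    for counters_dir in sorted(base.glob("*/ports/*/counters")):
        port = counters_dir.parent.name
        device = counters_dir.parent.parent.parent.name
        for name in ERROR_COUNTERS:
            try:
                value = int((counters_dir / name).read_text().strip())
            except (OSError, ValueError):
                # Counter missing or unreadable on this device; skip it.
                continue
            if value != 0:
                nonzero[(device, port, name)] = value
    return nonzero


if __name__ == "__main__":
    found = read_ib_error_counters()
    if not found:
        print("no nonzero error counters found on local HCA ports")
    for (device, port, name), value in found.items():
        print(f"{device} port {port}: {name} = {value}")
```

This only looks at the HCA ports of the node it runs on, so it would have to be run on every node. The counters on the IB switches are not visible this way and need a fabric-wide tool such as ibdiagnet.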

I say all this because the problem that you are seeing *seems* to be a network-related problem, not an OMPI-related problem. One can never know for sure, but it is fairly clear that the very first step in your case is to verify that the network is functioning 100% properly. FWIW: this was standard operating procedure when Cisco was selling IB hardware.

