On Apr 25, 2009, at 11:59 AM, Anton Starikov wrote:

I can confirm that I have exactly the same problem, also on a Dell
system, even with the latest Open MPI.

Our system is:

Dell M905
OpenSUSE 11.1
kernel: 2.6.27.21-0.1-default
ofed-1.4-21.12 from SUSE repositories.
OpenMPI-1.3.2


What I can also add is that this does not only affect Open MPI. If these
messages are triggered after mpirun:
[node032][[9340,1],11][btl_openib_component.c:3002:poll_device] error
polling HP CQ with -2 errno says Success

then the IB stack hangs. You cannot even reload it; you have to reboot the node.



Open MPI should not be able to cause something that severe. Specifically: Open MPI should not be able to hang the OFED stack. Have you run layer 0 diagnostics to verify that your fabric is clean? You might want to contact your IB vendor to find out how to do that.

--
Jeff Squyres
Cisco Systems
