Hello,

I ran into a strange problem with QLogic OFED and Open MPI. When I submit
two jobs (through SGE) to the same node, the second job fails with:

(ipath/PSM)[10292]: can't open /dev/ipath, network down (err=26)

I'm pretty sure InfiniBand itself is working, since the other job runs fine.

Here are the details of the configuration:

Qlogic HCA: InfiniPath_QMH7342 (2 ports but only one connected to a switch)
qlogic_ofed-1.5.3-7.0.0.0.35 (rocks cluster roll)
openmpi 1.5.4 (./configure --with-psm --with-openib --with-sge)

-------------

To work around this problem I recompiled Open MPI without PSM support, but
then I ran into another problem:

The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory.  This typically can indicate that the
memlock limits are set too low.  For most HPC installations, the
memlock limits should be set to "unlimited".  The failure occured
here:

  Local host:    compute-0-6.local
  OMPI source:   btl_openib.c:329
  Function:      ibv_create_srq()
  Device:        qib0
  Memlock limit: *unlimited*
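
One way to double-check the memlock limit that a process really sees when it
is started under SGE (which can differ from what a login shell reports) is to
run a small script like the one below inside the job. This is only a sketch:
the helper names `format_limit` and `memlock_limits` are made up for
illustration, and it relies only on Python's standard `resource` module.

```python
import resource


def format_limit(value):
    """Return a readable form of an rlimit value (given in bytes)."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d bytes" % value


def memlock_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK of the current process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print("memlock soft limit: %s" % soft)
    print("memlock hard limit: %s" % hard)
```

Running it once from an interactive shell and once from a job script submitted
through SGE on the same node would show whether both environments report the
same limits.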
