Hello!

I've built Open MPI 1.6.1rc3 with MXM support. But when I try to launch
an application using this MTL, it hangs, and I can't figure out why.

If I launch it with np below 128, everything works fine, since MXM isn't
used. I've tried setting the threshold to 0 and launching 2 processes, with
the same result: it hangs on startup.
What could be causing this problem?

Here is the command I execute:
/opt/openmpi/1.6.1/mxm-test/bin/mpirun \
                -np $NP \
                -hostfile hosts_fdr2 \
                --mca mtl mxm \
                --mca btl ^tcp \
                --mca mtl_mxm_np 0 \
                -x OMP_NUM_THREADS=$NT \
                -x LD_LIBRARY_PATH \
                --bind-to-core \
                -npernode 16 \
                --mca coll_fca_np 0 -mca coll_fca_enable 0 \
                ./IMB-MPI1 -npmin $NP Allreduce Reduce Barrier Bcast \
                Allgather Allgatherv

I'm running the tests on nodes with Intel SB processors and FDR. Open MPI
was configured with the following parameters:
CC=icc CXX=icpc F77=ifort FC=ifort ./configure \
                --prefix=/opt/openmpi/1.6.1rc3/mxm-test \
                --with-mxm=/opt/mellanox/mxm \
                --with-fca=/opt/mellanox/fca \
                --with-knem=/usr/share/knem
I'm using the latest OFED from Mellanox, 1.5.3-3.1.0, on CentOS 6.1 with the
default kernel, 2.6.32-131.0.15.
Compilation against the default MXM (1.0.601) failed, so I installed the
latest version from Mellanox: 1.1.1227.

Best regards, Pavel Mezentsev.
