Dear all,

I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2. I have 2 compute nodes for testing; each node has a single quad-core CPU.
Here is my submission script and PE configuration:

$ cat hpl-8cpu.sge
#!/bin/bash
#
#$ -N HPL_8cpu_IB
#$ -pe mpi-fu 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
cd /home/admin/hpl-2.0
# For IB
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS -machinefile $TMPDIR/machines ./bin/goto-openmpi-gcc/xhpl

I've verified that the mpirun command runs correctly from the command line.

$ qconf -sp mpi-fu
pe_name            mpi-fu
slots              8
user_lists         NONE
xuser_lists        NONE
start_proc_args    /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args     /opt/sge/mpi/stopmpi.sh
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE

I've also checked $TMPDIR/machines after submitting the job, and it is correct:

node0002
node0002
node0002
node0002
node0001
node0001
node0001
node0001

However, I found that when I explicitly specify "-machinefile $TMPDIR/machines", all 8 MPI processes are spawned on a single node, i.e. node0002. If I instead omit "-machinefile $TMPDIR/machines" from the mpirun line, i.e.

/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS ./bin/goto-openmpi-gcc/xhpl

then the MPI processes start correctly: 4 processes on node0001 and 4 on node0002. Is this normal behaviour for Open MPI?

Also, if the nodes have an IB interface, for example with the IB hostnames being node0001-clust and node0002-clust, will Open MPI automatically use the IB interface? And if each node has 2 IB ports on which IB bonding has been configured, will Open MPI automatically benefit from the doubled bandwidth?

Thanks a lot.

Best Regards,
PN
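P.S. In case it is relevant to the -machinefile behaviour: my working assumption (please correct me if this is wrong) is that if this Open MPI build includes the SGE/gridengine components, mpirun can read the PE allocation from SGE by itself under tight integration, so the machinefile is not needed. This is only a sketch of how I have been checking that:

# Check whether this Open MPI build was compiled with SGE (gridengine) support;
# I assume a gridengine component should show up in the output if it was.
/opt/openmpi-gcc/bin/ompi_info | grep -i gridengine

# Inside the SGE job script, relying on the tight integration and letting
# mpirun pick up the slot allocation from SGE (no -machinefile):
/opt/openmpi-gcc/bin/mpirun -np $NSLOTS ./bin/goto-openmpi-gcc/xhpl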
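P.P.S. For the IB question, my current plan (again just a sketch based on my reading of the FAQ, so please correct me if I've misunderstood) is to check whether the openib BTL was built in and, if necessary, request it explicitly rather than relying on the -clust hostnames:

# List the BTL components known to this Open MPI build;
# I assume "openib" should appear here if IB support was compiled in.
/opt/openmpi-gcc/bin/ompi_info | grep -i btl

# Explicitly restrict the job to InfiniBand (openib), shared memory (sm)
# and self, while keeping the ordinary node0001/node0002 hostnames:
/opt/openmpi-gcc/bin/mpirun --mca btl openib,sm,self -np $NSLOTS \
    ./bin/goto-openmpi-gcc/xhpl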