Dear Rolf,

Thanks for your reply. I've created another PE and changed the submission script to explicitly specify the hostnames with "--host". However, the result is the same.
# qconf -sp orte
pe_name            orte
slots              8
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE

$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
cd /home/admin/hpl-2.0
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS --host node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002 ./bin/goto-openmpi-gcc/xhpl

# pdsh -a ps ax --width=200 | grep hpl
node0002: 18901 ?  S    0:00 /opt/openmpi-gcc/bin/mpirun -v -np 8 --host node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002 ./bin/goto-openmpi-gcc/xhpl
node0002: 18902 ?  RLl  0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18903 ?  RLl  0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18904 ?  RLl  0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18905 ?  RLl  0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18906 ?  RLl  0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18907 ?  RLl  0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18908 ?  RLl  0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18909 ?  RLl  0:28 ./bin/goto-openmpi-gcc/xhpl

As the pdsh output shows, all eight xhpl processes ended up on node0002 again. Any hint on how to debug this situation?
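Following the suggestion below to just run hostname, a minimal diagnostic job could look like the sketch below. It is only a sketch: it assumes this Open MPI 1.3.1 install supports mpirun's --display-allocation and --display-map options, and it reuses the orte PE and paths from above.

$ cat placement-test.sge
#!/bin/bash
#
#$ -N PLACEMENT_TEST
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
# Show exactly what SGE granted to this job
echo "=== PE_HOSTFILE ==="
cat $PE_HOSTFILE
# Ask Open MPI to print the allocation it sees and the rank-to-node map,
# then have every rank report the node it actually landed on
echo "=== mpirun ==="
/opt/openmpi-gcc/bin/mpirun -np $NSLOTS --display-allocation --display-map hostname

If hostname lands on both nodes without --host but piles onto one node when --host is added back, that points at the explicit --host list rather than at the PE configuration.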
Also, if I have 2 IB ports in each node and IB bonding has been configured, will Open MPI automatically benefit from the doubled bandwidth?

Thanks a lot.

Best Regards,
PN

2009/4/1 Rolf Vandevaart <rolf.vandeva...@sun.com>

> On 03/31/09 11:43, PN wrote:
>
>> Dear all,
>>
>> I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2.
>> I have 2 compute nodes for testing; each node has a single quad-core CPU.
>>
>> Here is my submission script and PE config:
>>
>> $ cat hpl-8cpu.sge
>> #!/bin/bash
>> #
>> #$ -N HPL_8cpu_IB
>> #$ -pe mpi-fu 8
>> #$ -cwd
>> #$ -j y
>> #$ -S /bin/bash
>> #$ -V
>> #
>> cd /home/admin/hpl-2.0
>> # For IB
>> /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS -machinefile $TMPDIR/machines ./bin/goto-openmpi-gcc/xhpl
>>
>> I've tested that the mpirun command runs correctly from the command line.
>>
>> $ qconf -sp mpi-fu
>> pe_name            mpi-fu
>> slots              8
>> user_lists         NONE
>> xuser_lists        NONE
>> start_proc_args    /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
>> stop_proc_args     /opt/sge/mpi/stopmpi.sh
>> allocation_rule    $fill_up
>> control_slaves     TRUE
>> job_is_first_task  FALSE
>> urgency_slots      min
>> accounting_summary TRUE
>>
>> I've checked $TMPDIR/machines after submitting; it was correct:
>> node0002
>> node0002
>> node0002
>> node0002
>> node0001
>> node0001
>> node0001
>> node0001
>>
>> However, I found that if I explicitly specify "-machinefile $TMPDIR/machines", all 8 MPI processes are spawned on a single node, i.e. node0002.
>>
>> If I omit "-machinefile $TMPDIR/machines" from the mpirun line, i.e.
>> /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS ./bin/goto-openmpi-gcc/xhpl
>> the MPI processes start correctly: 4 processes on node0001 and 4 on node0002.
>>
>> Is this normal behaviour of Open MPI?
>>
>
> I just tried it both ways and I got the same result both times. The processes are split between the nodes. Perhaps to be extra sure, you can just run hostname? And for what it is worth, as you have seen, you do not need to specify a machines file. Open MPI will use the ones that were allocated by SGE. You can also change your parallel queue to not run any scripts. Like this:
>
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
>
>> Also, I wondered: if I have an IB interface (for example, the IB hostnames become node0001-clust and node0002-clust), will Open MPI automatically use the IB interface?
>>
> Yes, it should use the IB interface.
>
>> And if I have 2 IB ports in each node and IB bonding has been configured, will Open MPI automatically benefit from the double bandwidth?
>>
>> Thanks a lot.
>>
>> Best Regards,
>> PN
>
> --
> =========================
> rolf.vandeva...@sun.com
> 781-442-3043
> =========================
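On the IB questions, a quick way to confirm which transport Open MPI actually picks, and to see what the openib BTL knows about the HCA ports, is sketched below. This assumes a standard Open MPI 1.3.1 build with the openib BTL compiled in, and reuses the paths and xhpl binary from the scripts above.

# Confirm the openib BTL is present in this build
/opt/openmpi-gcc/bin/ompi_info | grep openib

# List the openib BTL parameters (per-HCA/per-port selection knobs
# show up here, if this version exposes them)
/opt/openmpi-gcc/bin/ompi_info --param btl openib

# Force IB-only transport: if the job still runs, traffic is going over
# the openib BTL rather than TCP. "sm" covers on-node traffic, "self" loopback.
/opt/openmpi-gcc/bin/mpirun -np $NSLOTS --mca btl openib,sm,self ./bin/goto-openmpi-gcc/xhpl

Note that the openib BTL talks to the HCAs through verbs rather than through the IPoIB interfaces, so an OS-level bond of the two ports is not what decides whether both ports carry MPI traffic; as far as I understand, that depends on Open MPI's own handling of multiple active ports, which is worth checking in the openib parameter list above.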