Dear Rolf,

Thanks for your reply.
I've created another PE and changed the submission script to explicitly
specify the hostnames with "--host".
However, the result is the same.

# qconf -sp orte
pe_name            orte
slots              8
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
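
In case it matters, I believe the PE also has to be listed in the queue's
pe_list before jobs submitted with "-pe orte" can get slots from it. A quick
way to check is below; the queue name "all.q" is only an assumption, ours may
be named differently:

# qconf -sq all.q | grep pe_list          # the list should contain "orte"
# qconf -aattr queue pe_list orte all.q   # add it only if it is missing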

$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
cd /home/admin/hpl-2.0
/opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS \
    --host node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002 \
    ./bin/goto-openmpi-gcc/xhpl


# pdsh -a ps ax --width=200|grep hpl
node0002: 18901 ?        S      0:00 /opt/openmpi-gcc/bin/mpirun -v -np 8 --host node0001,node0001,node0001,node0001,node0002,node0002,node0002,node0002 ./bin/goto-openmpi-gcc/xhpl
node0002: 18902 ?        RLl    0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18903 ?        RLl    0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18904 ?        RLl    0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18905 ?        RLl    0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18906 ?        RLl    0:29 ./bin/goto-openmpi-gcc/xhpl
node0002: 18907 ?        RLl    0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18908 ?        RLl    0:28 ./bin/goto-openmpi-gcc/xhpl
node0002: 18909 ?        RLl    0:28 ./bin/goto-openmpi-gcc/xhpl

Any hints on how to debug this situation?
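
In case it helps, this is the minimal test job I plan to try next. It only
prints the allocation SGE hands over and the node map Open MPI builds from
it, without any --host or -machinefile. I believe --display-allocation and
--display-map are the right mpirun options for 1.3.1; please correct me if
they are spelled differently:

$ cat alloc-test.sge
#!/bin/bash
#$ -N OMPI_alloc_test
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
# Show the allocation exactly as SGE hands it to the job
echo "PE_HOSTFILE = $PE_HOSTFILE"
cat $PE_HOSTFILE
# Let Open MPI read the SGE allocation itself and print the map it builds
/opt/openmpi-gcc/bin/mpirun -np $NSLOTS --display-allocation --display-map hostname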

Also, if each node has 2 IB ports with IB bonding configured, will
Open MPI automatically benefit from the doubled bandwidth?
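
On my side I also plan to verify the IB path with something like the
following. This is only a sketch; the openib BTL component and MCA names are
what I believe apply here, please correct me if that is wrong:

# Is the openib BTL built into this installation, and what parameters does it have?
$ /opt/openmpi-gcc/bin/ompi_info | grep openib
$ /opt/openmpi-gcc/bin/ompi_info --param btl openib
# A run that insists on IB rather than silently falling back to TCP
$ /opt/openmpi-gcc/bin/mpirun -np $NSLOTS --mca btl openib,self,sm ./bin/goto-openmpi-gcc/xhpl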

Thanks a lot.

Best Regards,
PN

2009/4/1 Rolf Vandevaart <rolf.vandeva...@sun.com>

> On 03/31/09 11:43, PN wrote:
>
>> Dear all,
>>
>> I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2
>> I have 2 compute nodes for testing, each node has a single quad core CPU.
>>
>> Here is my submission script and PE config:
>> $ cat hpl-8cpu.sge
>> #!/bin/bash
>> #
>> #$ -N HPL_8cpu_IB
>> #$ -pe mpi-fu 8
>> #$ -cwd
>> #$ -j y
>> #$ -S /bin/bash
>> #$ -V
>> #
>> cd /home/admin/hpl-2.0
>> # For IB
>> /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS -machinefile $TMPDIR/machines
>> ./bin/goto-openmpi-gcc/xhpl
>>
>> I've tested the mpirun command can be run correctly in command line.
>>
>> $ qconf -sp mpi-fu
>> pe_name            mpi-fu
>> slots              8
>> user_lists         NONE
>> xuser_lists        NONE
>> start_proc_args    /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
>> stop_proc_args     /opt/sge/mpi/stopmpi.sh
>> allocation_rule    $fill_up
>> control_slaves     TRUE
>> job_is_first_task  FALSE
>> urgency_slots      min
>> accounting_summary TRUE
>>
>>
>> I've checked the $TMPDIR/machines after submit, it was correct.
>> node0002
>> node0002
>> node0002
>> node0002
>> node0001
>> node0001
>> node0001
>> node0001
>>
>> However, I found that if I explicitly specify "-machinefile
>> $TMPDIR/machines", all 8 MPI processes are spawned on a single node,
>> i.e. node0002.
>>
>> However, if I omit "-machinefile $TMPDIR/machines" on the mpirun line,
>> i.e.
>> /opt/openmpi-gcc/bin/mpirun -v -np $NSLOTS ./bin/goto-openmpi-gcc/xhpl
>>
>> The MPI processes then start correctly, 4 processes on node0001 and 4
>> processes on node0002.
>>
>> Is this normal behaviour of Open MPI?
>>
>
> I just tried it both ways and I got the same result both times.  The
> processes are split between the nodes.  Perhaps to be extra sure, you can
> just run hostname?  And for what it is worth, as you have seen, you do not
> need to specify a machines file.  Open MPI will use the nodes that were
> allocated by SGE.  You can also change your parallel environment to not
> run any scripts.  Like this:
>
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
>
>
>> Also, I wondered: if I have an IB interface, for example if the IB
>> hostnames become node0001-clust and node0002-clust, will Open MPI
>> automatically use the IB interface?
>>
> Yes, it should use the IB interface.
>
>>
>> How about if each node has 2 IB ports with IB bonding configured? Will
>> Open MPI automatically benefit from the doubled bandwidth?
>>
>> Thanks a lot.
>>
>> Best Regards,
>> PN
>>
>>
>
>
> --
>
> =========================
> rolf.vandeva...@sun.com
> 781-442-3043
> =========================
>
