On Nov 11, 2014, at 16:13, Ralph Castain wrote:

> This clearly displays the problem - if you look at the reported “allocated
> nodes”, you see that we only got one node (cn6050). This is why we mapped all
> your procs onto that node.
>
> So the real question is - why? Can you show us the content of PE_HOSTFILE?
>
>
>> On Nov 11, 2014, at 4:51 AM, SLIM H.A. <[email protected]> wrote:
>>
>> Dear Reuti and Ralph,
>>
>> Below is the output of the run for Open MPI 1.8.3 with this line:
>>
>> mpirun -np $NSLOTS --display-map --display-allocation --cpus-per-proc 1 $exe
>>
>>
>> master=cn6050
>> PE=orte
>> JOB_ID=2482923
>> Got 32 slots.
>> slots:
>> cn6050 16 par6.q@cn6050 <NULL>
>> cn6045 16 par6.q@cn6045 <NULL>
The above looks like the PE_HOSTFILE, so it should be 16 slots per node. I wonder
whether any environment variable was reset which normally allows Open MPI to
discover that it's running inside SGE. I.e., are SGE_ROOT, JOB_ID, ARC and
PE_HOSTFILE untouched before the job starts?

Supplying "-np $NSLOTS" shouldn't be necessary, though.

-- Reuti

>> Tue Nov 11 12:37:37 GMT 2014
>>
>> ====================== ALLOCATED NODES ======================
>> cn6050: slots=16 max_slots=0 slots_inuse=0 state=UP
>> =================================================================
>> Data for JOB [57374,1] offset 0
>>
>> ======================== JOB MAP ========================
>>
>> Data for node: cn6050 Num slots: 16 Max slots: 0 Num procs: 32
>> Process OMPI jobid: [57374,1] App: 0 Process rank: 0
>> Process OMPI jobid: [57374,1] App: 0 Process rank: 1
>> …
>> Process OMPI jobid: [57374,1] App: 0 Process rank: 31
>>
>>
>> Also
>> ompi_info | grep grid
>> gives MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.8.3)
>> and
>> ompi_info | grep psm
>> gives MCA mtl: psm (MCA v2.0, API v2.0, Component v1.8.3)
>> because the interconnect is TrueScale/QLogic
>>
>> and
>>
>> setenv OMPI_MCA_mtl "psm"
>>
>> is set in the script. This is the PE:
>>
>> pe_name            orte
>> slots              4000
>> user_lists         NONE
>> xuser_lists        NONE
>> start_proc_args    /bin/true
>> stop_proc_args     /bin/true
>> allocation_rule    $fill_up
>> control_slaves     TRUE
>> job_is_first_task  FALSE
>> urgency_slots      min
>>
>> Many thanks
>>
>> Henk
>>
>>
>> From: users [mailto:[email protected]] On Behalf Of Ralph Castain
>> Sent: 10 November 2014 05:07
>> To: Open MPI Users
>> Subject: Re: [OMPI users] oversubscription of slots with GridEngine
>>
>> You might also add the --display-allocation flag to mpirun so we can see what
>> it thinks the allocation looks like. If there are only 16 slots on the node,
>> it seems odd that OMPI would assign 32 procs to it unless it thinks there is
>> only 1 node in the job, and oversubscription is allowed (which it won’t be
>> by default if it read the GE allocation).
>>
>>
>> On Nov 9, 2014, at 9:56 AM, Reuti <[email protected]> wrote:
>>
>> Hi,
>>
>>
>> On Nov 9, 2014, at 18:20, SLIM H.A. <[email protected]> wrote:
>>
>> We switched on hyper-threading on our cluster with two eight-core sockets
>> per node (32 threads per node).
>>
>> We configured gridengine with 16 slots per node to allow the 16 extra
>> threads for kernel process use, but this apparently does not work. A printout
>> of the gridengine hostfile shows that for a 32-slot job, 16 slots are
>> placed on each of two nodes as expected. Including the Open MPI --display-map
>> option shows that all 32 processes are incorrectly placed on the head node.
>>
>> You mean the master node of the parallel job, I assume.
>>
>>
>> Here is part of the output
>>
>> master=cn6083
>> PE=orte
>>
>> What allocation rule was defined for this PE - "control_slaves yes" is set?
>>
>>
>> JOB_ID=2481793
>> Got 32 slots.
>> slots:
>> cn6083 16 par6.q@cn6083 <NULL>
>> cn6085 16 par6.q@cn6085 <NULL>
>> Sun Nov 9 16:50:59 GMT 2014
>> Data for JOB [44767,1] offset 0
>>
>> ======================== JOB MAP ========================
>>
>> Data for node: cn6083 Num slots: 16 Max slots: 0 Num procs: 32
>> Process OMPI jobid: [44767,1] App: 0 Process rank: 0
>> Process OMPI jobid: [44767,1] App: 0 Process rank: 1
>> ...
>> Process OMPI jobid: [44767,1] App: 0 Process rank: 31
>>
>> =============================================================
>>
>> I found some related mailings about a new warning in 1.8.2 about
>> oversubscription, and I tried a few options to keep Open MPI from using the
>> extra threads for MPI tasks, without success, e.g. variants of
>>
>> --cpus-per-proc 1
>> --bind-to-core
>>
>> and some others. Gridengine treats hardware threads as cores==slots (?), but
>> the content of $PE_HOSTFILE suggests it distributes the slots sensibly, so it
>> seems some Open MPI option is required to get 16 cores per node?
>>
>> Was Open MPI configured with --with-sge?
>>
>> -- Reuti
>>
>>
>> I tried 1.8.2, 1.8.3 and also 1.6.5.
>>
>> Thanks for some clarification that anyone can give.
>>
>> Henk
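For reference, a minimal sketch of the check Reuti suggests, assuming a csh job
script (as the setenv line above implies); the PE name "orte" and the 32 slots
are taken from this thread, and my_mpi_program is a placeholder for the real
binary:

#!/bin/csh
#$ -pe orte 32
#$ -cwd
# Print the variables Open MPI's gridengine component looks for; all four
# should still be set and unmodified when the script starts under SGE.
echo SGE_ROOT=$SGE_ROOT
echo ARC=$ARC
echo JOB_ID=$JOB_ID
echo PE_HOSTFILE=$PE_HOSTFILE
cat $PE_HOSTFILE
set exe = ./my_mpi_program   # placeholder
setenv OMPI_MCA_mtl "psm"
# No -np here: with working SGE integration, mpirun should take the process
# count and the host list from the allocation itself.
mpirun --display-allocation --display-map $exe

If any of the four variables comes back empty, the gridengine support cannot
activate and mpirun sees only the node it was started on, which would match the
single-node allocation shown above.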

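If the allocation still comes up as a single node, a possible stopgap (untested
here) is to hand mpirun an explicit hostfile built from $PE_HOSTFILE, whose
columns, as shown above, are hostname, slot count, queue and processor range;
the file name hostfile.$JOB_ID is arbitrary:

awk '{print $1 " slots=" $2}' $PE_HOSTFILE > hostfile.$JOB_ID
mpirun -np $NSLOTS --hostfile hostfile.$JOB_ID $exe

This only sidesteps the gridengine RAS component, so it masks rather than fixes
the detection problem; tight integration (control_slaves TRUE, as in the PE
above) is still what lets Open MPI start its remote daemons via qrsh.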