Am 11.11.2014 um 19:29 schrieb Ralph Castain:

> 
>> On Nov 11, 2014, at 10:06 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>> 
>> Am 11.11.2014 um 17:52 schrieb Ralph Castain:
>> 
>>> 
>>>> On Nov 11, 2014, at 7:57 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>> 
>>>> Am 11.11.2014 um 16:13 schrieb Ralph Castain:
>>>> 
>>>>> This clearly displays the problem - if you look at the reported 
>>>>> “allocated nodes”, you see that we only got one node (cn6050). This is 
>>>>> why we mapped all your procs onto that node.
>>>>> 
>>>>> So the real question is - why? Can you show us the content of PE_HOSTFILE?
>>>>> 
>>>>> 
>>>>>> On Nov 11, 2014, at 4:51 AM, SLIM H.A. <h.a.s...@durham.ac.uk> wrote:
>>>>>> 
>>>>>> Dear Reuti and Ralph
>>>>>> 
>>>>>> Below is the output of the run for openmpi 1.8.3 with this line
>>>>>> 
>>>>>> mpirun -np $NSLOTS --display-map --display-allocation --cpus-per-proc 1 
>>>>>> $exe
>>>>>> 
>>>>>> 
>>>>>> master=cn6050
>>>>>> PE=orte
>>>>>> JOB_ID=2482923
>>>>>> Got 32 slots.
>>>>>> slots:
>>>>>> cn6050 16 par6.q@cn6050 <NULL>
>>>>>> cn6045 16 par6.q@cn6045 <NULL>
>>>> 
>>>> The above looks like the PE_HOSTFILE. So it should be 16 slots per node.
>>> 
>>> Hey Reuti
>>> 
>>> Is that the standard PE_HOSTFILE format? I’m looking at the ras/gridengine 
>>> module, and it looks like it is expecting a different format. I suspect 
>>> that is the problem.
>> 
>> Well, the fourth column can be a processor range in older versions of SGE 
>> and the binding in newer ones, but the first three columns were always this 
>> way.
> 
> Hmmm…perhaps I’m confused here. I guess you’re saying that just the last two 
> lines of his output contain the PE_HOSTFILE, as opposed to the entire thing?

Yes. The entire thing looks like output of the OP's job script. Only the last 
two lines should be the content of the PE_HOSTFILE.
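
(Just a thought, not something that is in the OP's script: the job script could 
dump the raw file verbatim, e.g.

    echo "PE_HOSTFILE is $PE_HOSTFILE"
    cat "$PE_HOSTFILE"

so that we see exactly what Open MPI gets to parse, without any other job script 
output mixed in.)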


> If so, I’m wondering if that NULL he shows in there is the source of the 
> trouble. The parser doesn’t look like it would handle that very well, though 
> I’d need to test it. Is that NULL expected? Or is the NULL not really in the 
> file?

I must admit here: for me the fourth column is either literally UNDEFINED or 
the tuple cpu,core (e.g. 0,0) when core binding is switched on. But it's never 
<NULL>, neither literally nor as the byte 0x00. Maybe the OP can tell us which 
GE version he uses.
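
(To illustrate - this is not the actual ras/gridengine parser, just a standalone 
sh sketch of what ends up in each column; any literal <NULL> in the fourth field 
would show up as col4 here:

    # print host name, slot count, queue and whatever the 4th column holds
    while read host slots queue rest; do
        echo "host=$host slots=$slots queue=$queue col4='$rest'"
    done < "$PE_HOSTFILE"

With a plain PE I would expect col4 to be UNDEFINED, or cpu,core pairs like 0,0 
when binding is requested, but never <NULL>.)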

-- Reuti


>> -- Reuti
>> 
>> 
>>> Ralph
>>> 
>>>> 
>>>> I wonder whether any of the environment variables that normally allow 
>>>> Open MPI to discover that it's running inside SGE were reset.
>>>> 
>>>> I.e. SGE_ROOT, JOB_ID, ARC and PE_HOSTFILE are untouched before the job 
>>>> starts?
>>>> 
>>>> Supplying "-np $NSLOTS" shouldn't be necessary though.
>>>> 
>>>> -- Reuti
>>>> 
>>>> 
>>>> 
>>>>>> Tue Nov 11 12:37:37 GMT 2014
>>>>>> 
>>>>>> ======================   ALLOCATED NODES   ======================
>>>>>>      cn6050: slots=16 max_slots=0 slots_inuse=0 state=UP
>>>>>> =================================================================
>>>>>> Data for JOB [57374,1] offset 0
>>>>>> 
>>>>>> ========================   JOB MAP   ========================
>>>>>> 
>>>>>> Data for node: cn6050  Num slots: 16   Max slots: 0    Num procs: 32
>>>>>>      Process OMPI jobid: [57374,1] App: 0 Process rank: 0
>>>>>>      Process OMPI jobid: [57374,1] App: 0 Process rank: 1
>>>>>> 
>>>>>> …
>>>>>>      Process OMPI jobid: [57374,1] App: 0 Process rank: 31
>>>>>> 
>>>>>> 
>>>>>> Also
>>>>>> ompi_info | grep grid
>>>>>> gives                 MCA ras: gridengine (MCA v2.0, API v2.0, Component 
>>>>>> v1.8.3)
>>>>>> and
>>>>>> ompi_info | grep psm
>>>>>> gives                 MCA mtl: psm (MCA v2.0, API v2.0, Component v1.8.3)
>>>>>> because the interconnect is TrueScale/QLogic.
>>>>>> 
>>>>>> and
>>>>>> 
>>>>>> setenv OMPI_MCA_mtl "psm"
>>>>>> 
>>>>>> is set in the script. This is the PE
>>>>>> 
>>>>>> pe_name           orte
>>>>>> slots             4000
>>>>>> user_lists        NONE
>>>>>> xuser_lists       NONE
>>>>>> start_proc_args   /bin/true
>>>>>> stop_proc_args    /bin/true
>>>>>> allocation_rule   $fill_up
>>>>>> control_slaves    TRUE
>>>>>> job_is_first_task FALSE
>>>>>> urgency_slots     min
>>>>>> 
>>>>>> Many thanks
>>>>>> 
>>>>>> Henk
>>>>>> 
>>>>>> 
>>>>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph 
>>>>>> Castain
>>>>>> Sent: 10 November 2014 05:07
>>>>>> To: Open MPI Users
>>>>>> Subject: Re: [OMPI users] oversubscription of slots with GridEngine
>>>>>> 
>>>>>> You might also add the --display-allocation flag to mpirun so we can see 
>>>>>> what it thinks the allocation looks like. If there are only 16 slots on 
>>>>>> the node, it seems odd that OMPI would assign 32 procs to it unless it 
>>>>>> thinks there is only 1 node in the job, and oversubscription is allowed 
>>>>>> (which it won’t be by default if it read the GE allocation)
>>>>>> 
>>>>>> 
>>>>>> On Nov 9, 2014, at 9:56 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> 
>>>>>> Am 09.11.2014 um 18:20 schrieb SLIM H.A. <h.a.s...@durham.ac.uk>:
>>>>>> 
>>>>>> We switched on hyper-threading on our cluster with two eight-core 
>>>>>> sockets per node (32 threads per node).
>>>>>> 
>>>>>> We configured gridengine with 16 slots per node to leave the 16 extra 
>>>>>> threads free for kernel processes, but this apparently does not work. A 
>>>>>> printout of the gridengine hostfile shows that for a 32-slot job, 16 
>>>>>> slots are placed on each of two nodes, as expected. Including the openmpi 
>>>>>> --display-map option shows that all 32 processes are incorrectly placed 
>>>>>> on the head node.
>>>>>> 
>>>>>> You mean the master node of the parallel job, I assume.
>>>>>> 
>>>>>> 
>>>>>> Here is part of the output
>>>>>> 
>>>>>> master=cn6083
>>>>>> PE=orte
>>>>>> 
>>>>>> What allocation rule was defined for this PE, and is "control_slaves 
>>>>>> TRUE" set?
>>>>>> 
>>>>>> 
>>>>>> JOB_ID=2481793
>>>>>> Got 32 slots.
>>>>>> slots:
>>>>>> cn6083 16 par6.q@cn6083 <NULL>
>>>>>> cn6085 16 par6.q@cn6085 <NULL>
>>>>>> Sun Nov  9 16:50:59 GMT 2014
>>>>>> Data for JOB [44767,1] offset 0
>>>>>> 
>>>>>> ========================   JOB MAP   ========================
>>>>>> 
>>>>>> Data for node: cn6083  Num slots: 16   Max slots: 0    Num procs: 32
>>>>>>    Process OMPI jobid: [44767,1] App: 0 Process rank: 0
>>>>>>    Process OMPI jobid: [44767,1] App: 0 Process rank: 1
>>>>>> ...
>>>>>>    Process OMPI jobid: [44767,1] App: 0 Process rank: 31
>>>>>> 
>>>>>> =============================================================
>>>>>> 
>>>>>> I found some related mailings about a new warning in 1.8.2 about 
>>>>>> oversubscription, and I tried a few options to stop openmpi from using 
>>>>>> the extra threads for MPI tasks, without success, e.g. variants of
>>>>>> 
>>>>>> --cpus-per-proc 1 
>>>>>> --bind-to-core 
>>>>>> 
>>>>>> and some others. Gridengine treats hardware threads as cores==slots (?), 
>>>>>> but the content of $PE_HOSTFILE suggests it distributes the slots 
>>>>>> sensibly, so it seems an openmpi option is required to get 16 cores per 
>>>>>> node?
>>>>>> 
>>>>>> Was Open MPI configured with --with-sge?
>>>>>> 
>>>>>> -- Reuti
>>>>>> 
>>>>>> 
>>>>>> I tried 1.8.2, 1.8.3 and also 1.6.5.
>>>>>> 
>>>>>> Thanks for any clarification anyone can give.
>>>>>> 
>>>>>> Henk
>>>>>> 
>>>>>> 
