Hi,

Please note the difference between "set linear:1:0,0" and
"set linear:1". The first means: give me one core, starting
at socket 0, core 0 (so here you are obviously requesting
core 0 on socket 0). The second means that you want one
core on the host, and the execution daemon decides
which one.
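
To make the distinction concrete, here is a minimal Python sketch that splits
the strategy part of a binding request (the text after the leading "set") into
its components. The function name parse_linear_binding and the returned
dictionary are made up for illustration and are not part of any Grid Engine
API; only the two request forms, linear:X and linear:X:S,C, are taken from
this thread.

```python
def parse_linear_binding(request):
    """Split 'linear:<amount>[:<socket>,<core>]' into its parts.

    Illustrative helper only; it is not how SGE parses the request.
    """
    parts = request.split(":")
    if parts[0] != "linear" or len(parts) not in (2, 3):
        raise ValueError("not a linear binding request: %r" % request)
    amount = int(parts[1])
    if len(parts) == 2:
        # linear:X -> no start position; the daemon chooses the core(s)
        return {"amount": amount, "start": None}
    # linear:X:S,C -> explicit start at socket S, core C
    socket, core = (int(value) for value in parts[2].split(","))
    return {"amount": amount, "start": (socket, core)}


print(parse_linear_binding("linear:1"))      # start is None
print(parse_linear_binding("linear:1:0,0"))  # start is (0, 0)
```

A request whose "start" is not None pins the job to a fixed position, which is
why it only works as long as nothing else occupies that core.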

By design, core selection in SGE is done on the execd,
while in Univa Grid Engine we moved it to the qmaster
itself (which has many advantages because of its global
view of cluster, job, and core usage).

When the execd in your case tries to bind the job, it finds
that a different job already uses this core, and therefore
SGE simply does not do any binding for the job (in order to
avoid overallocation).
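
As a rough illustration of that check, the following Python sketch reads an
m_topology_inuse string like the ones in the qhost -F output quoted further
down in this thread. It assumes that "S" starts a socket, an uppercase "C" is
a free core, and a lowercase "c" is a core already in use. The function
core_is_free is a hypothetical helper, not SGE code, and it only mimics the
behavior described above.

```python
def core_is_free(topology_inuse, socket, core):
    """Return True if the given core on the given socket is not in use.

    Assumes 'S' opens a socket, 'C' is a free core, 'c' is a used core.
    Any other characters are ignored in this sketch.
    """
    sockets = []
    for ch in topology_inuse:
        if ch == "S":
            sockets.append([])
        elif ch in "Cc":
            if not sockets:
                raise ValueError("core listed before any socket")
            sockets[-1].append(ch == "C")
    return sockets[socket][core]


# compute-0-1 in the quoted output: socket 0, core 0 is already taken
print(core_is_free("ScCCCCCSCCCCCC", 0, 0))  # False
# compute-0-0 in the quoted output: the same position is free
print(core_is_free("SCCCCCCSCCCCCC", 0, 0))  # True
```

With a fixed start position such as 0,0, a False result here corresponds to
the case where the job ends up with binding NONE.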

I guess your linear:1:0,0 request is not intentional. It
only makes sense in scenarios where you use your
host exclusively for one job.

This is probably caused by your JSV script, which sets binding_strategy
to "linear" (linear:X:S,C) instead of "linear_automatic" (linear:X). Obviously
the naming of the JSV parameter argument is unfortunate.

Might this be the reason?

Cheers

Daniel


On 27.06.2014 at 12:58, Txema Heredia <txema.llis...@gmail.com> wrote:

> On 27/06/14 12:32, Reuti wrote:
>> On 27.06.2014 at 12:24, Txema Heredia wrote:
>> 
>>> On 27/06/14 11:31, Reuti wrote:
>>>> Hi,
>>>> 
>>>> On 26.06.2014 at 17:56, Txema Heredia wrote:
>>>> 
>>>>> <snip>
>>>>> 
>>>>> # qstat -j 4561291 -cb | grep "job_name\|binding\|queue_list"
>>>>> job_name:                   c0-1
>>>>> hard_queue_list:            *@compute-0-1.local
>>>>> binding:                    set linear:1:0,0
>>>>> binding    1:               NONE
>>>>> 
>>>>> What am I missing here? What can be different in my nodes?
>>>> Does `qhost -F` output the fields:
>>>> 
>>>> $ qhost -F
>>>> ...
>>>>    hl:m_topology=SC
>>>>    hl:m_topology_inuse=SC
>>>>    hl:m_socket=1.000000
>>>>    hl:m_core=1.000000
>>>> 
>>>> for this machine?
>>>> 
>>>> -- Reuti
>>> Yes, qhost -F reports that for all nodes:
>>> 
>>> # qhost -F | grep "compute\|hl:m_"
>>> compute-0-0             lx26-amd64     12  0.60   94.6G   10.1G 9.8G   53.8M
>>>   hl:m_topology=SCCCCCCSCCCCCC
>>>   hl:m_topology_inuse=SCCCCCCSCCCCCC
>>>   hl:m_socket=2.000000
>>>   hl:m_core=12.000000
>>> compute-0-1             lx26-amd64     12  7.21   94.6G   14.9G 9.8G   86.6M
>>>   hl:m_topology=SCCCCCCSCCCCCC
>>>   hl:m_topology_inuse=ScCCCCCSCCCCCC
>>>   hl:m_socket=2.000000
>>>   hl:m_core=12.000000
>>> ...
>>> 
>>> 
>>> But the inuse topology is blatantly wrong.
>> What version of SGE are you using? Maybe the "PLPA" which was used in former 
>> versions doesn't support this particular CPU's topology. It was replaced by 
>> "hwloc" later on.
>> 
>> -- Reuti
>> 
> Originally it was SGE 6.2u5, but later on I substituted the sge_qmaster 
> binary for OGS/GE 2011.11p1 (due to a problem with parallel jobs and 
> -hold_jid)
> 
> 
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
