Hi,

please note the difference between "set linear:1:0,0" and "set linear:1". The first means: give me one core, starting at socket 0, core 0 (i.e. you are explicitly requesting core 0 on socket 0). The second means that you want one core on the host, and the execution daemon takes care of which one.
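To make the contrast concrete, here is a small shell sketch of the two request forms. The helper name `make_binding` is made up for illustration; only the `linear:...` strings themselves follow SGE's binding syntax:

```shell
#!/bin/sh
# Hypothetical helper (make_binding is NOT an SGE tool, just a sketch):
# build the argument string for qsub's -binding option.
make_binding() {
    amount=$1
    if [ $# -eq 3 ]; then
        # Explicit start point: begin at socket $2, core $3.
        # With amount=1 this requests exactly that core.
        echo "linear:${amount}:$2,$3"
    else
        # Automatic: the execution daemon picks a free core.
        echo "linear:${amount}"
    fi
}

make_binding 1 0 0   # prints: linear:1:0,0
make_binding 1       # prints: linear:1
```

It could then be used like `qsub -binding "$(make_binding 1)" job.sh`, which yields the second, automatic form.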
So by design, the core selection is done on the execd in SGE, while in Univa Grid Engine we moved it to the qmaster itself (which has many advantages thanks to the global view of the cluster's job and core usage). If the execd in your case now tries to bind the job and finds that a different job already uses this core, SGE simply does no binding at all for the job (in order to avoid overallocation). I guess your linear:1:0,0 request is not intentional - it only makes sense in scenarios where the host is used exclusively by one job. This is probably caused by your JSV script, which sets binding_strategy to "linear" (linear:X:S,C) instead of "linear_automatic" (linear:X). Admittedly, the naming of the JSV parameter argument is unfortunate. Might this be the reason?

Cheers

Daniel

On 27.06.2014, at 12:58, Txema Heredia <txema.llis...@gmail.com> wrote:

> On 27/06/14 12:32, Reuti wrote:
>> On 27.06.2014, at 12:24, Txema Heredia wrote:
>>
>>> On 27/06/14 11:31, Reuti wrote:
>>>> Hi,
>>>>
>>>> On 26.06.2014, at 17:56, Txema Heredia wrote:
>>>>
>>>>> <snip>
>>>>>
>>>>> # qstat -j 4561291 -cb | grep "job_name\|binding\|queue_list"
>>>>> job_name:        c0-1
>>>>> hard_queue_list: *@compute-0-1.local
>>>>> binding:         set linear:1:0,0
>>>>> binding    1:    NONE
>>>>>
>>>>> What am I missing here? What can be different in my nodes?
>>>> Does `qhost -F` output the fields:
>>>>
>>>> $ qhost -F
>>>> ...
>>>> hl:m_topology=SC
>>>> hl:m_topology_inuse=SC
>>>> hl:m_socket=1.000000
>>>> hl:m_core=1.000000
>>>>
>>>> for this machine?
>>>>
>>>> -- Reuti
>>> Yes, qhost -F reports that for all nodes:
>>>
>>> # qhost -F | grep "compute\|hl:m_"
>>> compute-0-0  lx26-amd64  12  0.60  94.6G  10.1G  9.8G  53.8M
>>>    hl:m_topology=SCCCCCCSCCCCCC
>>>    hl:m_topology_inuse=SCCCCCCSCCCCCC
>>>    hl:m_socket=2.000000
>>>    hl:m_core=12.000000
>>> compute-0-1  lx26-amd64  12  7.21  94.6G  14.9G  9.8G  86.6M
>>>    hl:m_topology=SCCCCCCSCCCCCC
>>>    hl:m_topology_inuse=ScCCCCCSCCCCCC
>>>    hl:m_socket=2.000000
>>>    hl:m_core=12.000000
>>> ...
>>>
>>> But the inuse topology is blatantly wrong.
>> What version of SGE are you using? Maybe the "PLPA" which was used in former versions doesn't support this particular CPU's topology. It was replaced by "hwloc" later on.
>>
>> -- Reuti
>>
> Originally it was SGE 6.2u5, but later on I substituted the sge_qmaster binary for OGS/GE 2011.11p1 (due to a problem with parallel jobs and -hold_jid)
>
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
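As a side note on reading the m_topology_inuse strings quoted above: SGE marks a bound core by lowercasing its letter, so the single 'c' in ScCCCCCSCCCCCC is the one core currently in use on compute-0-1. A tiny shell sketch to count busy cores (count_busy_cores is a made-up name, not an SGE command):

```shell
#!/bin/sh
# Count cores marked as in use in an m_topology_inuse string.
# SGE lowercases a core's letter ('c' instead of 'C') once a job
# has been bound to it.
count_busy_cores() {
    printf '%s' "$1" | tr -cd 'c' | wc -c
}

count_busy_cores "ScCCCCCSCCCCCC"   # one busy core (socket 0, core 0)
count_busy_cores "SCCCCCCSCCCCCC"   # all cores free
```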
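If the JSV really is the culprit, a client-side JSV along these lines might apply Daniel's suggested fix. This is an untested sketch: it assumes the stock shell JSV framework shipped with SGE, and the parameter names binding_socket and binding_core (alongside binding_strategy from the discussion above) are taken from the sge_jsv documentation and may differ in your version:

```shell
#!/bin/sh
# Sketch only: rewrite a fixed-start "linear" binding request into
# "linear_automatic" so the execd may pick any free core.
. "${SGE_ROOT}/util/resources/jsv/jsv_include.sh"

jsv_on_start()
{
   return
}

jsv_on_verify()
{
   if [ "$(jsv_get_param binding_strategy)" = "linear" ]; then
      jsv_set_param binding_strategy "linear_automatic"
      # The fixed socket,core start point is meaningless for
      # linear_automatic, so drop it (param names are assumptions).
      jsv_del_param binding_socket
      jsv_del_param binding_core
      jsv_correct "binding changed to linear_automatic"
      return
   fi
   jsv_accept "job accepted"
   return
}

jsv_main
```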