Am 16.03.2011 um 13:18 schrieb Erik Soyez:

> Well, that's probably true, "exclusive" resources are not the best choice.
> But the concept could still work if you defined that resource as an ordinary
> "consumable", e.g.:
>
> Complex definition:
> ------------------------------------------------------------------------
> exclusive    excl    INT    <=    YES    YES    0    0
> ------------------------------------------------------------------------
>
> Exec host definition (each host):
> ------------------------------------------------------------------------
> complex_values exclusive=1
> ------------------------------------------------------------------------
>
> sge request (e.g. Nx6-CPU jobs):
> ------------------------------------------------------------------------
> -soft -l exclusive=0.1665
> ------------------------------------------------------------------------
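For context, the 0.1665 value appears to be chosen per slot so that a 6-slot chunk consumes almost the whole per-host capacity of 1; this interpretation is an assumption, as the thread does not spell it out:

```shell
# Assumed arithmetic behind the suggestion: each host offers exclusive=1
# and every slot of a job soft-requests 0.1665 of it.
awk 'BEGIN {
    one_chunk = 6 * 0.1665      # 0.999 -- a single 6-slot chunk just fits
    two_chunks = 2 * one_chunk  # 1.998 -- a second chunk would not
    printf "one chunk:  %.3f (fits: %s)\n", one_chunk,  (one_chunk  <= 1 ? "yes" : "no")
    printf "two chunks: %.3f (fits: %s)\n", two_chunks, (two_chunks <= 1 ? "yes" : "no")
}'
```

Under this reading, a *hard* request for two such chunks on one host would be rejected (1.998 > 1), which is exactly why the request has to be soft.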
Exactly this soft consumable is the main problem:

Unable to run job: denied: soft requests on consumables like "exclusive" are not supported.

There was a discussion on the former mailing list about how to change this behavior.

-- Reuti


> Erik Soyez.
>
>
> On Wed, 16 Mar 2011, Reuti wrote:
>
>> Am 16.03.2011 um 12:28 schrieb Erik Soyez:
>>
>>> Good day Alex,
>>>
>>> you could try implementing an "exclusive" resource and requesting it with
>>> "-soft", e.g. "-soft -l exclusive" in the sge_request file as a default.
>>
>> Won't this block the nodes completely? As soon as one job is occupying
>> 6 slots, the second job can't start, as the "-soft -l exclusive" can't be
>> revoked again in the future once the soft request was granted. I think
>> this is the main reason why soft consumables are denied: the intended
>> behavior is not really clear (this could be changed so that granted soft
>> requests are handled as hard requests later on).
>>
>>
>>> I have never tried this combination, but have a look at "man complex";
>>> it's just an idea.... Erik Soyez.
>>>
>>>
>>> On Wed, 16 Mar 2011, Alex Phillips wrote:
>>>
>>>> Dear List,
>>>>
>>>> We have a cluster of 1920 cores spread over 160 nodes (12 cores/node). We
>>>> only run one code in one queue, with jobs of between 48 and 256 cores
>>>> using an MPI PE.
>>>>
>>>> When benchmarking our code we found a 14-15% speedup by running on 6
>>>> cores/node compared with 12 cores/node. We also found that if we ran on
>>>> 6 cores/node, with a second job on the other 6 cores/node, we still had
>>>> a 5-6% speedup.
>>>>
>>>> So I have configured our MPI PE with allocation_rule = 6, and this works;
>>>> however, as the cluster fills up, the scheduler starts a second job on
>>>> some nodes before all the nodes are busy. How can we configure the
>>>> scheduler to run one job on all the nodes before starting a second job?
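Erik's consumable could be set up along these lines; this is only a hedged sketch using standard qconf subcommands (-sc/-Mc, -sel, -mattr), untested here, and as Reuti points out the scheduler rejects the -soft request on a consumable anyway:

```shell
# Sketch only: register the "exclusive" consumable and give each exec host
# a capacity of 1. Column layout follows complex(5):
#   name shortcut type relop requestable consumable default urgency
qconf -sc > complex_list.txt
echo "exclusive excl INT <= YES YES 0 0" >> complex_list.txt
qconf -Mc complex_list.txt

# Attach the capacity to every execution host.
for host in $(qconf -sel); do
    qconf -mattr exechost complex_values exclusive=1 "$host"
done

# Cluster-wide default in $SGE_ROOT/$SGE_CELL/common/sge_request:
#   -soft -l exclusive=0.1665
```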
>>>> I have tried defining the number of slots as a complex value on the
>>>> execution hosts; I've tried -np_load_avg, np_load_avg, slots, and -slots
>>>> as the load_formula, but I can't get it to work.
>>>>
>>>> I've read
>>>> http://blogs.sun.com/sgrell/entry/grid_engine_scheduler_hacks_least but
>>>> I can't set the allocation rule to $pe_slots, as we only want to run on
>>>> 6 cores/node, not 12.
>>>>
>>>> Any suggestions?

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
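For the original question, the approach in the referenced blog post (least-used-host-first scheduling) is usually expressed through the scheduler configuration. A hedged sketch, with parameter names taken from sched_conf(5); whether this fully solves the allocation_rule=6 case here is untested:

```shell
# Sketch: make the scheduler prefer hosts with the fewest used slots, so
# idle nodes fill up before a second job lands on a half-busy one.
#
# Edit the scheduler configuration (opens $EDITOR):
#   qconf -msconf
# and set:
#   queue_sort_method    load
#   load_formula         slots
#
# Verify the current settings:
qconf -ssconf | grep -E 'queue_sort_method|load_formula'
```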
