Hi,

> Am 03.08.2015 um 18:20 schrieb Carl G. Riches <[email protected]>:
> 
> On Sat, 1 Aug 2015, Reuti wrote:
> 
>> Hi,
>> 
>> Am 31.07.2015 um 23:00 schrieb Carl G. Riches:
>> 
>>>> <snip>
>>> Thanks for these pointers.  After going through these and others, I think I 
>>> have a basic understanding of share-tree and functional share policies. I'm 
>>> not clear on whether or not these can be combined, but I think so.
>>> 
>>> The goal is to limit user access to CPU when there is contention to a 
>>> queue.  The limit would apply to all users and the limit would be 12% of 
>>> all CPUs in a queue.
>> 
>> How did you come up with 12%? It will never add up to exactly 100%. Is the intent 
>> to have fair-share scheduling over a time frame then?
>> 
> 
> The 12% comes from the users' goal of how many CPUs out of the total 
> available a single user can have when the queue(s) are full.  Each user's 
> share should be measured at the time a job is dispatched (functional share?) 
> but with some fair-sharing over a time frame (share-tree?).  That is, if a 
> user has submitted 10,000 jobs (or 1000 10-core parallel jobs) to the queue 
> at one time and the queue doesn't have that many available slots/CPUs, the 
> sharing policy should dispatch other users' jobs in preference to the 
> large-volume user (but still allowing that user to have some slots available 
> to process jobs).  In the case of users that dump huge numbers of jobs to the 
> queues at a time, the sharing policy should remember that high level of use 
> for a short time (no more than a month).

For the "weight_user" and related values you can keep the default of 0.25 
(they are used for the functional policy only). More important is the value of 
"weight_ticket" in relation to similar entries like "weight_waiting_time": 
just give "weight_ticket" a higher value than the other entries.
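As a sketch, the relevant part of the scheduler configuration (edited via 
`qconf -msconf`) could look like the following; the field names are from 
sge_sched_conf(5), but the exact values are only illustrative assumptions, 
not a tested setup:

```
# excerpt from `qconf -msconf` -- illustrative values only
weight_ticket             10000.000000   # tickets dominate the priority
weight_waiting_time       0.000000       # keep well below weight_ticket
weight_urgency            0.100000
weight_priority           0.100000
```

The point is only the relation between the entries: as long as 
"weight_ticket" outweighs the others, the share-tree tickets drive the 
dispatch order.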

The 12% you can achieve by defining a default user in the share tree as a leaf 
with e.g. 12000 shares out of a total of 100000 ("weight_tickets_share 100000") 
in the share-tree dialog, just below the "root" entry, i.e. "Add Node" + "Add 
Leaf".
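Instead of the dialog, such a tree can also be loaded from a file with 
`qconf -Astree`. A minimal sketch in the node-file format of share_tree(5) is 
below; the node ids are arbitrary, and the 12000 shares match the percentages 
above only under the assumption that sibling nodes hold the remaining 88000 
shares:

```
id=0
name=Root
type=0
shares=1
childnodes=1
id=1
name=default
type=0
shares=12000
childnodes=NONE
```

Every user without an explicit leaf of his own is then treated according to 
the "default" node.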

While "enforce_user auto" in SGE's configuration will take care that each user 
gets his own user object automatically once he submits a job, it's important to 
set "auto_user_delete_time" to 0. This user object is the actual object which 
holds the recorded past usage; if it is deleted (either by hand or 
automatically), the knowledge of the past usage is gone.
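A minimal sketch of the corresponding entries in the global cluster 
configuration (`qconf -mconf`); parameter names are from sge_conf(5), the 
layout is only illustrative:

```
# excerpt from `qconf -mconf` -- sketch
enforce_user              auto   # create a user object on first job submission
auto_user_delete_time     0      # never delete auto users -> past usage survives
```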

To record the reserved usage and not only the actual one (i.e. when a user 
requested more than one core but runs only a serial job), it's necessary to set:

"execd_params ACCT_RESERVED_USAGE=TRUE SHARETREE_RESERVED_USAGE=TRUE"

in SGE's configuration too.

-- Reuti


> Carl
> 
>> 
>> 
>>> There are no "projects" or "departments" at this time.  Would these 
>>> parameter settings achieve that goal?
>>> 
>>> 
>>> algorithm                         default
>>> schedule_interval                 0:0:15
>>> maxujobs                          0
>>> queue_sort_method                 load
>>> job_load_adjustments              np_load_avg=0.50
>>> load_adjustment_decay_time        0:7:30
>>> load_formula                      np_load_avg
>>> schedd_job_info                   true
>>> flush_submit_sec                  0
>>> flush_finish_sec                  0
>>> params                            none
>>> reprioritize_interval             0:0:0
>>> halftime                          168
>>> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
>>> compensation_factor               5.000000
>>> weight_user                       0.640000
>>> weight_project                    0.120000
>>> weight_department                 0.120000
>>> weight_job                        0.120000
>>> weight_tickets_functional         1000000
>>> weight_tickets_share              1000
>>> share_override_tickets            TRUE
>>> share_functional_shares           TRUE
>>> max_functional_jobs_to_schedule   2000
>>> report_pjob_tickets               TRUE
>>> max_pending_tasks_per_job         50
>>> halflife_decay_list               none
>>> policy_hierarchy                  OFS
>>> weight_ticket                     1000.000000
>>> weight_waiting_time               0.000000
>>> weight_deadline                   3600000.000000
>>> weight_urgency                    0.100000
>>> weight_priority                   0.100000
>>> max_reservation                   0
>>> default_duration                  INFINITY
>>> 
>>> 
>>> Must we also define a "default" user in some manner such that this policy 
>>> is applied?  If so, how do we do that?
>>> 
>>> 
>>> Thanks,
>>> Carl
>> 
>> 


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
