On Thu, 6 Aug 2015, Reuti wrote:
Hi,
On 03.08.2015 at 18:20, Carl G. Riches <[email protected]> wrote:
On Sat, 1 Aug 2015, Reuti wrote:
Hi,
On 31.07.2015 at 23:00, Carl G. Riches wrote:
<snip>
Thanks for these pointers. After going through these and others, I think I
have a basic understanding of the share-tree and functional share policies. I'm
not clear on whether the two can be combined, but I believe they can.
The goal is to limit user access to CPU when there is contention to a queue.
The limit would apply to all users and the limit would be 12% of all CPUs in a
queue.
How did you come up with 12%? It will never add up to exactly 100%. Is the
intent then to have fair-share scheduling over a time frame?
The 12% comes from the users' goal for the fraction of all available CPUs that
a single user can hold when the queue(s) are full. Each user's share should be
measured at the time a job is dispatched (functional share?) but with some
fair-sharing over a time frame (share-tree?). That is, if a user has submitted
10,000 jobs (or 1000 10-core parallel jobs) to the queue at one time and the
queue doesn't have that many available slots/CPUs, the sharing policy should
dispatch other users' jobs in preference to the large-volume user (but still
allowing that user to have some slots available to process jobs). In the case
of users that dump huge numbers of jobs to the queues at a time, the sharing
policy should remember that high level of use for a short time (no more than a
month).
For the "weight_user" and so on values you could keep the default value of 0.25 (they are used for the
functional policy only). More important are the values for "weight_ticket" in relation to the alike entries
"weight_waiting_time". Just give the "weight ticktet" a higher value than the other entries.
You can achieve the 12% by defining a default user as a leaf in the share tree
with e.g. 12000 shares out of a total of 100000, and by setting
"weight_tickets_share" (e.g. 10000) in the scheduler configuration. In qmon's
share-tree dialog, add the leaf just below the "root" entry, i.e.
"Add Node" + "Add Leaf".
While "enforce_user auto" in SGE's configuration will take care that each user gets his
own entry automatically once he submitted a job, it's important to change the
"auto_user_delete_time 0" to zero. This is the actual object which will hold the values
recorded for the past usage. If the user object is deleted (either by hand or automatically), the
knowledge of the past usage is gone.
To record the reserved usage and not only the actual usage (i.e. when a user
requests more than one core but runs only a serial job), it's also necessary to
set:

   execd_params ACCT_RESERVED_USAGE=TRUE SHARETREE_RESERVED_USAGE=TRUE

in SGE's configuration.
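Taken together, the relevant lines of the global cluster configuration
("qconf -mconf") would look roughly like this:

   enforce_user              auto
   auto_user_delete_time     0
   execd_params              ACCT_RESERVED_USAGE=TRUE SHARETREE_RESERVED_USAGE=TRUE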
--
Reuti
Thanks for your help and advice. We're getting closer to understanding how
this works. I'm not sure how to create the user, though; our version of qmon
doesn't have a dialog that matches your description. We are using:
Open Grid Scheduler/Grid Engine 2011.11p1
If I were to do this
via qconf -auser, would the user configuration parameters look like this?
   name             default-user
   oticket          0
   fshare           12000
   delete_time      0
   default_project  NONE
Thanks again!
Carl
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users